Here are a number of tips to ease the process of using the Census 2000 data with SAS.
You can use the obs= option to read just a small number of records while you test your program. Running your program on a small segment of the data saves time in the initial stages of debugging. Here is an example using options obs=20 to read just the first 20 observations.
options obs=20;
data "c:\census2000\sf1_file05";
infile "c:\census2000\ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
Then, when you are more confident that your program runs correctly, you can process all of the observations by setting obs=max, for example.
options obs=max;
data "c:\census2000\sf1_file05";
infile "c:\census2000\ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
You can use the keep statement to keep just the variables that you are interested in. This will reduce the size of your data file. Here is an example that keeps just a handful of variables, although you would probably want to keep more.
data "c:\census2000\sf1_file05";
infile "c:\census2000\ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
keep P012A001 P012A002 P012A003 P012A004 P012A005 P012A006 P012A007
P012A008 P012A009 P012A010 ;
run;
Note that since these variables were consecutive in the file, we could use P012A001--P012A010 to indicate that we wish to keep P012A001 through P012A010. Note that this is 2 dashes, and that the range is based on the consecutive position of the variables in the file, not on the numbers in their names. This is illustrated below.
data "c:\census2000\sf1_file05";
infile "c:\census2000\ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
keep P012A001--P012A010 ;
run;
You can also use the drop statement to drop variables you do not want. The syntax is just like keep, except that the variables that are listed are dropped from the resulting data file.
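As a minimal sketch of the drop statement, assuming you want to discard only the last four variables read (a hypothetical choice made for illustration, not one the setups on this page make), the data step might look like this. The input statement here reads only the first 19 fields of each record so that the step is complete as written.

```sas
data "c:\census2000\sf1_file05";
infile "c:\census2000\ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
/* Only the first 19 fields are read in this sketch. */
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 ;
/* Assumed choice: drop the last four variables, listed as a positional range. */
drop P012A011--P012A014 ;
run;
```

The drop statement is usually the more convenient of the two when you want nearly everything and only a few variables need to go.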
You may want to keep just some observations in your file. You can use the if statement in SAS to do this. Here is an example using the geography file where we keep just the observations where the summary level is 40.
data "c:\census2000\sf1_geo";
infile "c:\census2000\cageo.txt" lrecl=2000;
input @1 FILEID $6. @7 STUSAB $2. @9 SUMLEV 3. @12 GEOCOMP 2. @14 CHARITER 3.
@17 CIFSN 2. @19 LOGRECNO 7. @26 REGION 1. @27 DIVISION 1. @28 STATECE 2.
@30 STATE 2. @32 COUNTY 3. (rest of variables omitted to save space) ;
if sumlev = 40 then output;
run;
All of the examples we have shown so far are based on SAS for Windows. The setup will look slightly different if you are using SAS for UNIX. Here is an example of what a setup might look like on UNIX. Note that c:\census2000 is not used, since it refers to a folder in Windows. This setup would read the raw data file ca00005.uf1 from the current directory and store the file sf1_file05.sas7bdat in the current directory.
data "sf1_file05";
infile "ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
If you were using the RS/6000 Cluster at UCLA, you might store the raw data file under /u/7day/. For example, if your userid is joebruin, you would probably store the file in /u/7day/joebruin, and your setup might look like this.
data "/u/7day/joebruin/sf1_file05";
infile "/u/7day/joebruin/ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
If you were using Nicco at Social Science Computing, you might store the raw data file under /tmpSTAT/. For example, if your userid is joebruin, you would probably store the file in /tmpSTAT/joebruin, and your setup might look like this.
data "/tmpSTAT/joebruin/sf1_file05";
infile "/tmpSTAT/joebruin/ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
In SAS for UNIX, you can read the compressed file on the fly without needing to decompress it first. SAS will decompress the file as it reads it if you show it how. Here is an example showing how you can do this on Nicco at Social Science Computing. The relevant portions are the FILENAME statement, which pipes the output of unzip into SAS, and the infile statement, which reads from in instead of from a file name.
FILENAME in PIPE "unzip -pa /tmpSTAT/ca00005_uf1.zip" LRECL=2000;
data "/tmpSTAT/sf1_file05";
infile in dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
If you were to do this on the RS/6000 Cluster, the syntax would be like that shown below.
FILENAME in PIPE "/local/bin/unzip -pa /u/7day/joebruin/ca00005_uf1.zip" LRECL=2000;
data "/u/7day/joebruin/sf1_file05";
infile in dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
If you choose to unzip the file first on Windows, be sure to use the -a option, which converts text files to the line endings of the local system, for example.
unzip -a ca00005_uf1.zip
In SAS for Windows you can also read the compressed file on the fly without needing to decompress it first. SAS will decompress the file as it reads it if you show it how. To do this, you first need to download and install unzip, which you can obtain at ftp://ftp.info-zip.org/pub/infozip/WIN32/ or, in particular, by downloading and installing ftp://ftp.info-zip.org/pub/infozip/WIN32/unz542xN.exe . We assume here that you have installed it in c:\unzip and that you are reading the file ca00005_uf1.zip from c:\census2000\ . The relevant portions of the example below are the FILENAME statement and the infile statement that reads from in.
FILENAME in PIPE "c:\unzip\unzip -pa c:\census2000\ca00005_uf1.zip" LRECL=2000;
data "c:\census2000\sf1_file05";
infile in dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
proc print data="c:\census2000\sf1_file05";
run;
The RS/6000 Cluster and Nicco have copies of the SF1 data files that you can use. On the RS/6000 Cluster the data files are in a directory called /u/datalib01/cusgdbl/census2000/sf1/ and on Nicco they are in a directory called /home2/census . If you are using the RS/6000 Cluster, you can read the ca00005_uf1.zip file like this.
FILENAME in PIPE
"/local/bin/unzip -pa /u/datalib01/cusgdbl/census2000/sf1/ca00005_uf1.zip" LRECL=2000;
data "/u/7day/joebruin/sf1_file05";
infile in dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
On Nicco, you can read the data like this.
FILENAME in PIPE
"unzip -pa /home2/census/ca00005_uf1.zip" LRECL=2000;
data "/tmpSTAT/joebruin/sf1_file05";
infile in dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
You will generally wish to merge the "geography" file with one of the other data files. Assume that you have read the "geography" file with this setup...
data "c:\census2000\sf1_geo";
infile "c:\census2000\cageo.txt" lrecl=2000;
input @1 FILEID $6. @7 STUSAB $2. @9 SUMLEV 3. @12 GEOCOMP 2. @14 CHARITER 3.
@17 CIFSN 2. @19 LOGRECNO 7. @26 REGION 1. @27 DIVISION 1. @28 STATECE 2.
(rest of variables omitted to save space) ;
run;
and that you read file number five using this setup...
data "c:\census2000\sf1_file05";
infile "c:\census2000\ca00005.uf1" dlm="," dsd lrecl=2000;
length fileid $ 8 stusab $ 8 ;
input FILEID STUSAB CHARITER CIFSN LOGRECNO P012A001 P012A002 P012A003 P012A004
P012A005 P012A006 P012A007 P012A008 P012A009 P012A010 P012A011 P012A012
P012A013 P012A014 (rest of variables omitted to save space) ;
run;
You could then merge these two files like this.
data "c:\census2000\geo_file05";
merge "c:\census2000\sf1_geo" "c:\census2000\sf1_file05" ;
by logrecno;
run;
The file c:\census2000\geo_file05 would then be the merged version of both of the files. If you were using UNIX, you would replace file names such as c:\census2000\sf1_file05 with file names that are appropriate for UNIX.
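As a rough sketch of the same merge on UNIX, assuming both SAS data files were stored under the hypothetical directory /u/7day/joebruin used in the RS/6000 examples, the setup might look like this. The two proc sort steps are an added precaution rather than part of the setups above: a merge with a by statement expects both files to be in logrecno order, and sorting guarantees that even if the files already happen to be ordered.

```sas
/* Assumed paths: both data files live in /u/7day/joebruin. */
/* Sort each file in place by the merge key. */
proc sort data="/u/7day/joebruin/sf1_geo";
by logrecno;
run;
proc sort data="/u/7day/joebruin/sf1_file05";
by logrecno;
run;
/* Match records from the two files on logrecno. */
data "/u/7day/joebruin/geo_file05";
merge "/u/7day/joebruin/sf1_geo" "/u/7day/joebruin/sf1_file05" ;
by logrecno;
run;
```

Because FILEID is read with a different length in the two setups, SAS may note in the log that the variable has multiple lengths; the merged file takes the length from the first file listed.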
All of the examples above assume that you are using SAS version 8. Note that some of the setups we provide may use variable names that are over 8 characters long, so they will not work in SAS version 6. You can either switch to SAS version 8 or shorten the variable names to 8 characters or fewer.
Another difference is that SAS version 8 allows you to refer to SAS data files by a file name in quotation marks, for example
data "c:\census2000\sf1_geo";
infile "c:\census2000\cageo.txt" lrecl=2000;
input @1 FILEID $6. @7 STUSAB $2. @9 SUMLEV 3. @12 GEOCOMP 2. @14 CHARITER 3.
@17 CIFSN 2. @19 LOGRECNO 7. @26 REGION 1. @27 DIVISION 1. @28 STATECE 2.
(rest of variables omitted to save space) ;
run;
In SAS version 6, however, you need to use the libname statement, for example.
libname out "c:\census2000\";
data out.sf1_geo;
infile "c:\census2000\cageo.txt" lrecl=2000;
input @1 FILEID $6. @7 STUSAB $2. @9 SUMLEV 3. @12 GEOCOMP 2. @14 CHARITER 3.
@17 CIFSN 2. @19 LOGRECNO 7. @26 REGION 1. @27 DIVISION 1. @28 STATECE 2.
(rest of variables omitted to save space) ;
run;
These pages are Copyrighted (c) by UCLA Academic Technology Services