| infile | Identifies an external raw data file to read |
| data | Begins a data step which manipulates datasets |
| input | Lists variable names in the input file |
| datalines | Indicates internal data |
| set | Reads a SAS data set |
| proc contents | Displays the contents of a data set (variable names, types, and lengths) |
| proc print | Prints observations of variables in a data set |
We will start by importing an Excel file into SAS with the SAS Import Wizard. The variable names are on the first line of the Excel file.
File
Import Data
Choose Excel .xls format (this is the default)
Click on Next
Click on Browse to select a file: c:\sas_data\hs0.xls
The default option is to read variable names from the first line; leave it as it is.
Click on Next
Enter a name (hs0) for the data set
Click on Finish
Below is the SAS syntax to import the same Excel file.

    proc import datafile="c:\sas_data\hs0.xls" out=hs0;
    run;
Both of the methods above (menus or syntax) work for other file formats, such as comma-separated or tab-delimited files, and Stata or SPSS datasets. Now we can look at the data or even modify them if we want.
Explorer
Libraries
Work
Double click on hs0
Edit
Edit Mode
Click on data to modify data
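As a rough sketch of the syntax route for a comma-separated file, the step below imports a .csv version of the same data with proc import. The file name hs0.csv and the output name hs0_csv are assumptions made for illustration. The getnames=no statement assumes the file has no header row, in which case SAS names the variables VAR1, VAR2, and so on.

```sas
* Sketch only: the path and the data set name are assumed for illustration;
proc import datafile="c:\sas_data\hs0.csv"
    out=hs0_csv      /* hypothetical output data set name */
    dbms=csv         /* read the file as comma-separated values */
    replace;         /* overwrite hs0_csv if it already exists */
    getnames=no;     /* assume no header row; variables become VAR1, VAR2, ... */
run;
```

If the file did have variable names on its first line, getnames=yes would be used instead.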
One of the more commonly used ASCII data formats is the comma-separated-values (.csv) format. Files of this type can be read in through the Import Wizard or proc import as shown above, or with a little programming. We will now show how to read a .csv file with a SAS data step. The following segment is the beginning of the hs0 file in .csv format. This data file does not have variable names on the first line. Also notice that the ninth line (id 84) has two consecutive commas near the end, which means there is a missing value between them. To read the data correctly, we use the dsd option in the infile statement.

    0,70,4,1,1,"general",57,52,41,47,57
    1,121,4,2,1,"vocati",68,59,53,63,61
    0,86,4,3,1,"general",44,33,54,58,31
    0,141,4,3,1,"vocati",63,44,47,53,56
    0,172,4,2,1,"academic",47,52,57,53,61
    0,113,4,2,1,"academic",44,52,51,63,61
    0,50,3,2,1,"general",50,59,42,53,61
    0,11,1,2,1,"academic",34,46,45,39,36
    0,84,4,2,1,"general",63,57,54,,51
    0,48,3,2,1,"academic",57,55,52,50,51
    0,75,4,2,1,"vocati",60,46,51,53,61
    0,60,5,2,1,"academic",57,65,51,63,61
    0,95,4,3,1,"academic",73,60,71,61,71

The following data step reads the data file and names the resulting data set temp. The input statement gives the names of the variables in the same order as they appear in the comma-separated file. The $ after prgtype tells SAS that prgtype is a string variable, that is, a variable that can contain letters as well as numbers. The length statement tells SAS that prgtype is a string (as in the input statement, the $ indicates a string variable) with a length of ten characters (indicated by the 10 following the $). By default, SAS allows a string variable read this way to hold 8 or fewer characters. If the string is to be longer, you have to tell SAS with the length statement. Note that once the length statement has declared the variable a string, the $ after prgtype in the input statement is no longer necessary; including it does no harm.

    data temp;
      infile 'c:\sas_data\hs0.csv' delimiter=',' dsd;
      length prgtype $10;
      input gender id race ses schtyp prgtype $ read write math science socst;
    run;
Once we have entered the data, we can list the first ten observations to check that the inputting was successful. Note that proc print "prints" the data to the output window, not to a physical printer.
    proc print data = temp (obs=10);
    run;
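Because the length statement is what lets prgtype hold up to ten characters, one way to confirm that it took effect is proc contents, which lists the type and length of each variable. A minimal sketch, assuming the data set temp created above exists in the Work library:

```sas
/* List variable names, types, and lengths for temp.
   prgtype should appear as a character variable of length 10. */
proc contents data=temp;
run;
```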
Another type of commonly used ASCII data format is fixed format. It always requires a codebook to specify which column corresponds to which variable. Here is a small example of this type of data with a codebook.

     195  094951
     26386161941
     38780081841
     479700  870
     56878163690
     66487182960
     786  069  0
     88194193921
     98979090781
    107868180801

| variable name | column number |
| id | 1-2 |
| a1 | 3-4 |
| t1 | 5-6 |
| gender | 7 |
| a2 | 8-9 |
| t2 | 10-11 |
| tgender | 12 |
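The data step below reads the file with column input: each variable name in the input statement is followed by the columns it occupies, taken from the codebook.

    data fixed;
      infile "c:\sas_data\schdat.fix";
      input id 1-2 a1 3-4 t1 5-6 gender 7 a2 8-9 t2 10-11 tgender 12;
    run;

    proc print data = fixed;
    run;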
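In column input, a numeric field that is entirely blank is read as a missing value. A quick way to see how many values were read and how many are missing for each variable is proc means with the n and nmiss statistics. A minimal sketch, assuming the data set fixed from the step above:

```sas
/* Count non-missing (n) and missing (nmiss) values for every numeric variable */
proc means data=fixed n nmiss;
run;
```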
Sometimes we want to enter data directly within a SAS program. The datalines statement marks the start of the internal data, and a semicolon on a line by itself marks the end.

    data hsb10;
      input id female race ses schtype $ prog read write math science socst;
    datalines;
    147 1 1 3 pub 1 47 62 53 53 61
    108 0 1 2 pub 2 34 33 41 36 36
    18 0 3 2 pub 3 50 33 49 44 36
    153 0 1 2 pub 3 39 31 40 39 51
    50 0 2 2 pub 2 50 59 42 53 61
    51 1 2 1 pub 2 42 36 42 31 39
    102 0 1 1 pub 1 52 41 51 53 56
    57 1 1 2 pub 1 71 65 72 66 56
    160 1 1 2 pub 1 55 65 55 50 61
    136 0 1 2 pub 1 65 59 70 63 51
    ;
    run;

    proc print data=hsb10;
    run;
So far, all the SAS data sets that we have created are temporary. When we quit SAS, all temporary data sets will be gone. To save a SAS data file to disk we can use a data step. The example below saves the dataset temp from above as c:\sas_data\hs0 (SAS will automatically add the file extension .sas7bdat to the file name hs0).
    data 'c:\sas_data\hs0';
      set temp;
    run;
We can use permanent SAS data files by referring to them by their path and file name.
    proc print data='c:\sas_data\hs0';
    run;
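A common alternative to quoting the full path each time is to assign a library reference with the libname statement and then use two-level names. In the sketch below, the libref mydata is a hypothetical name chosen for illustration; the folder is the one used above, and temp is assumed to still exist in the current session.

```sas
/* mydata is an assumed libref pointing at the folder used above */
libname mydata 'c:\sas_data';

/* Save temp as a permanent data set (c:\sas_data\hs0.sas7bdat) */
data mydata.hs0;
    set temp;
run;

/* Read the permanent data set back using its two-level name */
proc print data=mydata.hs0 (obs=10);
run;
```

The libref lasts only for the current SAS session, so the libname statement has to be submitted again in a new session before mydata.hs0 can be used.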
These pages are Copyrighted (c) by UCLA Academic Technology Services