In this FAQ we will cover four ways of converting a SAS file to Stata:
1. Manually Converting your SAS file to a Stata File using a .csv file
2. Manually Converting your SAS file to a Stata File using a .xpt file
3. Using the savastata macro to convert your SAS file to Stata
4. Using Stat/Transfer to convert your SAS file to Stata
We then have a section showing how you can verify the transfer.
1. Manually Converting your SAS file to a Stata File using a .csv file
We will use a two-stage procedure to move the SAS file into Stata. In the first stage, we will convert the SAS file to an ASCII file; in the second stage, we will read the ASCII file into Stata.
You can use proc export to convert a SAS data file to a raw data file. Let's say that we have a SAS file on our computer called hsb2.sas7bdat, located in a directory called c:\data, and we wish to convert it into a Stata file called hsb2.dta. Here is the SAS program that will convert the SAS file into an ASCII file called hsb2.csv.
libname in "c:\data\";
proc export data=in.hsb2 outfile="c:\data\hsb2.csv" dbms=csv replace;
run;
Now, here are the Stata commands to read in the ASCII file and save it.
. cd c:\data
. insheet using hsb2.csv
. save hsb2
Note that the insheet command does not require any variable names because they are included in the ASCII file.
After the transfer is complete, we recommend that you verify the transfer as described in section 5.
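In more recent Stata releases (13 and later), insheet has been superseded by import delimited, although insheet continues to work. As a minimal sketch, assuming a newer Stata and the same hsb2.csv produced by proc export above, the second stage might look like the following. The varnames(1) option states explicitly that the first row holds the variable names, and the replace option on save is our assumption that overwriting an existing hsb2.dta is acceptable.

```stata
* Sketch for Stata 13 or later (an assumption; the original example uses insheet)
cd "c:\data"

* Read the comma-separated file; row 1 holds the variable names written by proc export
import delimited using "hsb2.csv", varnames(1) clear

* Save as a Stata data file; replace overwrites any existing hsb2.dta
save hsb2, replace
```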
2. Manually Converting your SAS file to a Stata File using a .xpt file
Starting with Stata version 8.2, it is possible to save and use SAS XPORT (.xpt) files directly from within Stata. If your Stata 8 is not fully up to date, see Installing, Customizing, Updating Stata to make sure your copy of Stata is fully up to date. We will use a two-stage procedure to move the SAS file into Stata. In the first stage, we will convert the SAS file to a .xpt file; in the second stage, we will read the .xpt file into Stata.
Let's say that we have a SAS file on our computer called hsb2.sas7bdat, located in a directory called c:\data, and we wish to convert it into a Stata file called hsb2.dta. First, we will convert it into a SAS .xpt file as shown below.
libname out XPORT "c:\data\hsb2.xpt";
data out.hsb2;
  set "c:\data\hsb2";
run;
Now, here are the Stata commands to read in the .xpt file and save it.
. cd c:\data
. fdause hsb2
. compress
. save hsb2
Note that we added the compress command to store the variables in the most economical way, but you can omit this step if you wish. We then use the save command to save the file as a Stata data file. After the transfer is complete, we recommend that you verify the transfer as described in section 5.
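The fdause command was renamed in later releases: as far as we are aware, it became import sasxport in Stata 12 and import sasxport5 in Stata 16, and Stata 16 also added import sas, which reads a .sas7bdat file without the intermediate .xpt stage. The following is a hedged sketch of both routes, assuming Stata 16 or later and the same file locations as above. The replace option on save is our addition.

```stata
* Sketch assuming Stata 16 or later; older releases use fdause or import sasxport
cd "c:\data"

* Route A: read the XPORT file written by the SAS step above
import sasxport5 "hsb2.xpt", clear
compress
save hsb2, replace

* Route B: read the SAS data file directly, skipping the .xpt stage
* (this overwrites the hsb2.dta saved by Route A; use only one route in practice)
import sas using "hsb2.sas7bdat", clear
compress
save hsb2, replace
```

Whichever route you use, the verification steps in section 5 apply unchanged.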
3. Using the savastata macro to convert your SAS file to Stata
Rather than doing all of these steps manually, you can use the savastata macro created by the Computer Services Group at the Carolina Population Center to automatically convert your SAS data file to a Stata data file. Their web page includes examples of how to install the macro and how to use it. (Note that savastata can often detect automatically where your Stata is located if you installed it in a conventional location. Otherwise, you may need to do a bit of configuration, as described at the savastata web site.) Say that you downloaded the savastata macro and stored it on your computer as c:\sasmacros\savastata.mac, and suppose you wish to convert the SAS file c:\data\hsb2.sas7bdat to a Stata file and put the Stata file in c:\data. You can do this as follows.
data hsb2;
  set "c:\data\hsb2";
run;
%include "c:\sasmacros\savastata.mac";
%savastata(c:\data, -check);
Note that savastata reads the most recently created SAS data file, hence it read hsb2 from this example. We told it that we wanted to save the Stata file in c:\data and we used the -check option to ask for additional information to help us verify that the conversion was successful (more on this in a moment). If we check the log file, we see messages suggesting this was successful.
1864 %include "c:\sasmacros\savastata.mac";
3714 %savastata(c:\data, -check);
NOTE: Savastata has successfully saved the
* Stata 8.0 SE data file c:\data\hsb2.dta.
* Stata reports that the dataset has 200 observations
* and 11 variables.
*
* You have requested to have savastata provide 2 check files:
* c:\data\hsb2_SAScheck.lst and
* c:\data\hsb2_STATAcheck.log
*
It indicates that it created c:\data\hsb2.dta and that file has 200 observations and 11 variables (which is the same as our SAS file). Also, because we used the -check option, it created c:\data\hsb2_SAScheck.lst (which contains output from proc means, proc contents and a brief proc print output for the SAS file) and c:\data\hsb2_STATAcheck.log (which contains output from summarize, describe and a brief list command for the Stata file). So, we can compare these two files (see below) and see that the results suggest the conversion was successful. In comparing the results we see that the Ns, means, standard deviations, etc. are all the same (based on comparing the proc means with the summarize). The number of observations, number of variables and variable names all match (based on comparing the proc contents with the describe). And the first five observations look the same (based on comparing the proc print with the list).
hsb2_SAScheck.lst
The MEANS Procedure
Variable Label N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------------------------
id 200 100.5000000 57.8791845 1.0000000 200.0000000
female 200 0.5450000 0.4992205 0 1.0000000
race 200 3.4300000 1.0394722 1.0000000 4.0000000
ses 200 2.0550000 0.7242914 1.0000000 3.0000000
schtyp type of school 200 1.1600000 0.3675260 1.0000000 2.0000000
prog type of program 200 2.0250000 0.6904772 1.0000000 3.0000000
read reading score 200 52.2300000 10.2529368 28.0000000 76.0000000
write writing score 200 52.7750000 9.4785860 31.0000000 67.0000000
math math score 200 52.6450000 9.3684478 33.0000000 75.0000000
science science score 200 51.8500000 9.9008908 26.0000000 74.0000000
socst social studies score 200 52.4050000 10.7357935 26.0000000 71.0000000
-------------------------------------------------------------------------------------------
The CONTENTS Procedure
Data Set Name: WORK.HSB2 Observations: 200
Member Type: DATA Variables: 11
Engine: V8 Indexes: 0
Created: 13:03 Thursday, May 22, 2003 Observation Length: 88
Last Modified: 13:03 Thursday, May 22, 2003 Deleted Observations: 0
Protection: Compressed: NO
Data Set Type: Sorted: NO
Label:
<some output omitted>
-----Variables Ordered by Position-----
# Variable Type Len Pos Label
----------------------------------------------------------
1 id Num 8 0
2 female Num 8 8
3 race Num 8 16
4 ses Num 8 24
5 schtyp Num 8 32 type of school
6 prog Num 8 40 type of program
7 read Num 8 48 reading score
8 write Num 8 56 writing score
9 math Num 8 64 math score
10 science Num 8 72 science score
11 socst Num 8 80 social studies score
Obs id female race ses schtyp prog read write math science socst
1 70 0 4 1 1 1 57 52 41 47 57
2 121 1 4 2 1 3 68 59 53 63 61
3 86 0 4 3 1 1 44 33 54 58 31
4 141 0 4 3 1 3 63 44 47 53 56
5 172 0 4 2 1 2 47 52 57 53 61
hsb2_STATAcheck.log
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
id | 200 100.5 57.87918 1 200
female | 200 .545 .4992205 0 1
race | 200 3.43 1.039472 1 4
ses | 200 2.055 .7242914 1 3
schtyp | 200 1.16 .367526 1 2
-------------+--------------------------------------------------------
prog | 200 2.025 .6904772 1 3
read | 200 52.23 10.25294 28 76
write | 200 52.775 9.478586 31 67
math | 200 52.645 9.368448 33 75
science | 200 51.85 9.900891 26 74
-------------+--------------------------------------------------------
socst | 200 52.405 10.73579 26 71
Contains data from c:\data\hsb2.dta
obs: 200 Savastata created this dataset
on 21MAY03
vars: 11 22 May 2003 13:03
size: 3,200 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id int %8.0g
female byte %8.0g
race byte %8.0g
ses byte %8.0g
schtyp byte %8.0g type of school
prog byte %8.0g type of program
read byte %8.0g reading score
write byte %8.0g writing score
math byte %8.0g math score
science byte %8.0g science score
socst byte %8.0g social studies score
-------------------------------------------------------------------------------
Sorted by:
+-----------------------------------------------------------------------------------+
| id female race ses schtyp prog read write math science socst |
|-----------------------------------------------------------------------------------|
1. | 70 0 4 1 1 1 57 52 41 47 57 |
2. | 121 1 4 2 1 3 68 59 53 63 61 |
3. | 86 0 4 3 1 1 44 33 54 58 31 |
4. | 141 0 4 3 1 3 63 44 47 53 56 |
5. | 172 0 4 2 1 2 47 52 57 53 61 |
+-----------------------------------------------------------------------------------+
4. Using Stat/Transfer to convert your SAS file to Stata
If you have Stat/Transfer, or have access to it, converting SAS to Stata is fast and easy.
After the transfer is complete, we recommend that you verify the transfer as described in section 5.
5. Verifying the transfer
Regardless of how you convert from SAS to Stata, it's a good idea to verify that the transfer worked properly. To this end, we will run some procedures in both SAS and Stata. First, the SAS statements:
PROC CONTENTS DATA=hsb2 position;
RUN;
PROC MEANS DATA=hsb2;
RUN;
PROC PRINT DATA=hsb2(obs=5);
RUN;
Below, the proc contents output lists the variables in the original file, the proc means output shows the means of the variables, and the proc print output shows the first five observations. These values can be compared to the corresponding values in the Stata file.
The CONTENTS Procedure
Data Set Name: WORK.HSB2 Observations: 200
Member Type: DATA Variables: 11
Engine: V8 Indexes: 0
Created: 14:10 Wednesday, May 21, 2003 Observation Length: 88
Last Modified: 14:10 Wednesday, May 21, 2003 Deleted Observations: 0
Protection: Compressed: NO
Data Set Type: Sorted: NO
Label:
<some output omitted>
-----Variables Ordered by Position-----
# Variable Type Len Pos Label
-----------------------------------------------------------
1 id Num 8 0
2 female Num 8 8
3 race Num 8 16
4 ses Num 8 24
5 schtyp Num 8 32 type of school
6 prog Num 8 40 type of program
7 read Num 8 48 reading score
8 write Num 8 56 writing score
9 math Num 8 64 math score
10 science Num 8 72 science score
11 socst Num 8 80 social studies score
The MEANS Procedure
Variable Label N Mean Std Dev Minimum Maximum
-------------------------------------------------------------------------------------------
id 200 100.5000000 57.8791845 1.0000000 200.0000000
female 200 0.5450000 0.4992205 0 1.0000000
race 200 3.4300000 1.0394722 1.0000000 4.0000000
ses 200 2.0550000 0.7242914 1.0000000 3.0000000
schtyp type of school 200 1.1600000 0.3675260 1.0000000 2.0000000
prog type of program 200 2.0250000 0.6904772 1.0000000 3.0000000
read reading score 200 52.2300000 10.2529368 28.0000000 76.0000000
write writing score 200 52.7750000 9.4785860 31.0000000 67.0000000
math math score 200 52.6450000 9.3684478 33.0000000 75.0000000
science science score 200 51.8500000 9.9008908 26.0000000 74.0000000
socst social studies score 200 52.4050000 10.7357935 26.0000000 71.0000000
-------------------------------------------------------------------------------------------
Obs id female race ses schtyp prog read write math science socst
1 70 0 4 1 1 1 57 52 41 47 57
2 121 1 4 2 1 3 68 59 53 63 61
3 86 0 4 3 1 1 44 33 54 58 31
4 141 0 4 3 1 3 63 44 47 53 56
5 172 0 4 2 1 2 47 52 57 53 61
Now let's generate the same information in Stata for comparison using the describe, summarize and list commands.
. describe
Contains data from hsb2.dta
obs: 200 Savastata created this dataset
on 21MAY03
vars: 11 21 May 2003 14:10
size: 3,200 (99.9% of memory free)
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
id int %8.0g
female byte %8.0g
race byte %8.0g
ses byte %8.0g
schtyp byte %8.0g type of school
prog byte %8.0g type of program
read byte %8.0g reading score
write byte %8.0g writing score
math byte %8.0g math score
science byte %8.0g science score
socst byte %8.0g social studies score
-------------------------------------------------------------------------------
Sorted by:
. summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
id | 200 100.5 57.87918 1 200
female | 200 .545 .4992205 0 1
race | 200 3.43 1.039472 1 4
ses | 200 2.055 .7242914 1 3
schtyp | 200 1.16 .367526 1 2
-------------+--------------------------------------------------------
prog | 200 2.025 .6904772 1 3
read | 200 52.23 10.25294 28 76
write | 200 52.775 9.478586 31 67
math | 200 52.645 9.368448 33 75
science | 200 51.85 9.900891 26 74
-------------+--------------------------------------------------------
socst | 200 52.405 10.73579 26 71
. list in 1/5
+-----------------------------------------------------------------------------------+
| id female race ses schtyp prog read write math science socst |
|-----------------------------------------------------------------------------------|
1. | 70 0 4 1 1 1 57 52 41 47 57 |
2. | 121 1 4 2 1 3 68 59 53 63 61 |
3. | 86 0 4 3 1 1 44 33 54 58 31 |
4. | 141 0 4 3 1 3 63 44 47 53 56 |
5. | 172 0 4 2 1 2 47 52 57 53 61 |
+-----------------------------------------------------------------------------------+
Everything looks good. The values from the SAS file are the same as for the Stata file, suggesting that the file was converted successfully and without error.
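Rather than comparing the listings only by eye, some of these checks can be scripted in Stata. The sketch below is one possible approach, not part of the original procedure: it uses assert with the values reported by proc contents and proc means above (200 observations, 11 variables, and for read a mean of 52.23, minimum 28 and maximum 76). The choice of read as the spot-checked variable and the tolerance of 1e-6 are our assumptions.

```stata
* Load the converted file
use "c:\data\hsb2.dta", clear

* Counts reported by proc contents: 200 observations, 11 variables
describe, short
assert r(N) == 200
assert r(k) == 11

* Spot-check one variable against proc means (read: mean 52.23, min 28, max 76)
summarize read
assert r(N) == 200
assert abs(r(mean) - 52.23) < 1e-6
assert r(min) == 28 & r(max) == 76

* Every variable should have no missing values, matching N = 200 in proc means
foreach v of varlist _all {
    quietly count if missing(`v')
    assert r(N) == 0
}
```

If any assert fails, Stata stops with an "assertion is false" error at that line, which points to the first discrepancy.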
These pages are Copyrighted (c) by UCLA Academic Technology Services