UCLA Academic Technology Services

SAS FAQ
How do I convert a SAS file to a Stata file?

In this FAQ we will cover four situations for converting SAS to Stata:

1. Manually converting your SAS file to a Stata file via a .csv file
2. Manually converting your SAS file to a Stata file via a .xpt file
3. Using the savastata macro to convert your SAS file to Stata
4. Using Stat/Transfer to convert your SAS file to Stata

We then have a section showing how you can verify the transfer.

1. Manually converting your SAS file to a Stata file via a .csv file

We will use a two-stage procedure to move the SAS file into Stata. In the first stage we convert the SAS file to an ASCII file, and in the second stage we read the ASCII file into Stata.

You can use proc export to convert a SAS data file to a raw data file. Let's say that we have a SAS file on our computer called hsb2.sas7bdat that is located in a directory called c:\data, and we wish to convert it into a Stata file called hsb2.dta. Here is the SAS program that converts the SAS file into an ASCII file called hsb2.csv.

libname in "c:\data\";
proc export data=in.hsb2 outfile="c:\data\hsb2.csv" dbms=csv replace;
run;

Now, here are the Stata commands to read in the ASCII file and save it.

. cd c:\data
. insheet using hsb2.csv
. save hsb2

Note that the insheet command does not require a list of variable names because proc export writes them to the first line of the ASCII file.
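
The same import can be written with its options spelled out, which may help when the defaults guess wrong. This is only a sketch in do-file form (no "." prompt), assuming the same c:\data directory and hsb2.csv file as above; the replace option on save is an addition here and overwrites an existing hsb2.dta.

```stata
* Sketch: read hsb2.csv with explicit insheet options (do-file form).
cd c:\data

* comma : values are separated by commas
* names : the first line of the file holds the variable names
* clear : discard any data currently in memory before reading
insheet using hsb2.csv, comma names clear

* Look at what was read before saving it.
describe

* replace overwrites an existing hsb2.dta; omit it to keep Stata's default protection.
save hsb2, replace
```

In Stata 13 and later, import delimited is the documented successor to insheet, so newer installations may prefer that command.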

After the transfer is complete, we recommend that you verify the transfer as described in section 5.

2. Manually converting your SAS file to a Stata file via a .xpt file

Starting with Stata version 8.2, it is possible to save and use SAS XPORT (.xpt) files directly from within Stata.  If your Stata 8 is not fully up to date, see Installing, Customizing, Updating Stata to bring your copy up to date. We will use a two-stage procedure to move the SAS file into Stata. In the first stage we convert the SAS file to a .xpt file, and in the second stage we read the .xpt file into Stata.

Let's say that we have a SAS file on our computer called hsb2.sas7bdat that is located in a directory called c:\data, and we wish to convert it into a Stata file called hsb2.dta. First, we will convert it into a SAS .xpt file as shown below.

libname out XPORT "c:\data\hsb2.xpt";

data out.hsb2;
  set "c:\data\hsb2";
run;

Now, here are the Stata commands to read in the .xpt file and save it.

. cd c:\data
. fdause hsb2
. compress
. save hsb2

Note that we added the compress command to store the variables in the most economical way, but you can omit this step if you wish.  We then use the save command to save the file as a Stata data file. After the transfer is complete, we recommend that you verify the transfer as described in section 5.
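
Before loading a transport file it can be useful to look at what it contains. The sketch below, in do-file form, assumes the same c:\data\hsb2.xpt file as above and uses fdadescribe, the companion command to fdause in the Stata releases that provide it; the clear and replace options are additions that overwrite data in memory and on disk.

```stata
* Sketch: inspect, then load, a SAS XPORT file (do-file form).
cd c:\data

* List the member and its variables without loading any data.
fdadescribe hsb2.xpt

* Load the file, discarding any data currently in memory.
fdause hsb2.xpt, clear

* Store each variable in the smallest type that holds its values.
compress

* replace overwrites an existing hsb2.dta; omit it to keep Stata's default protection.
save hsb2, replace
```

In Stata 12 and later these commands were renamed to import sasxport and export sasxport, so the names above apply to the older releases this FAQ was written for.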

3. Using the savastata macro to convert your SAS file to Stata

Rather than doing all of these steps manually, you can use the savastata macro created by the Computer Services Group at the Carolina Population Center to automatically convert your SAS data file to a Stata data file.  Their web page includes examples of how to install the macro and how to use it. (Note that savastata can often detect automatically where your Stata is located if you installed it in a conventional location.  Otherwise, you may need to do a bit of configuration, as described at the savastata web site.)  Say that you downloaded the savastata macro and stored it on your computer as c:\sasmacros\savastata.mac . And suppose you wish to convert the SAS file c:\data\hsb2.sas7bdat to a Stata file and put the Stata file in c:\data .  You can do so as follows.

data hsb2;
  set "c:\data\hsb2";
run;

%include "c:\sasmacros\savastata.mac";
%savastata(c:\data, -check);

Note that savastata reads the most recently created SAS data file, so in this example it read hsb2.  We told it that we wanted to save the Stata file in c:\data and we used the -check option to ask for additional information to help us verify that the conversion was successful (more on this in a moment).  If we check the log file, we see messages suggesting this was successful.

1864  %include "c:\sasmacros\savastata.mac";
3714  %savastata(c:\data, -check);
NOTE: Savastata has successfully saved the               *
Stata 8.0 SE data file c:\data\hsb2.dta.    *
Stata reports that the dataset has 200 observations   *
and 11 variables.                                    *
*
You have requested to have savastata provide 2 check files: *
c:\data\hsb2_SAScheck.lst and *
c:\data\hsb2_STATAcheck.log  *

It indicates that it created c:\data\hsb2.dta and that file has 200 observations and 11 variables (which is the same as our SAS file).  Also, because we used the -check option, it created c:\data\hsb2_SAScheck.lst (which contains output from proc means, proc contents and a brief proc print output for the SAS file) and c:\data\hsb2_STATAcheck.log (which contains output from summarize, describe and a brief list command for the Stata file).  So, we can compare these two files (see below) and see that the results suggest the conversion was successful.  In comparing the results we see that the Ns, means, standard deviations, etc. are all the same (based on comparing the proc means with the summarize).  The number of observations, number of variables and variable names all match (based on comparing the proc contents with the describe).  And the first five observations look the same (based on comparing the proc print with the list).

hsb2_SAScheck.lst
The MEANS Procedure

Variable  Label                   N          Mean       Std Dev       Minimum       Maximum
-------------------------------------------------------------------------------------------
id                              200   100.5000000    57.8791845     1.0000000   200.0000000
female                          200     0.5450000     0.4992205             0     1.0000000
race                            200     3.4300000     1.0394722     1.0000000     4.0000000
ses                             200     2.0550000     0.7242914     1.0000000     3.0000000
schtyp    type of school        200     1.1600000     0.3675260     1.0000000     2.0000000
prog      type of program       200     2.0250000     0.6904772     1.0000000     3.0000000
read      reading score         200    52.2300000    10.2529368    28.0000000    76.0000000
write     writing score         200    52.7750000     9.4785860    31.0000000    67.0000000
math      math score            200    52.6450000     9.3684478    33.0000000    75.0000000
science   science score         200    51.8500000     9.9008908    26.0000000    74.0000000
socst     social studies score  200    52.4050000    10.7357935    26.0000000    71.0000000
-------------------------------------------------------------------------------------------

The CONTENTS Procedure

Data Set Name: WORK.HSB2                             Observations:         200
Member Type:   DATA                                  Variables:            11 
Engine:        V8                                    Indexes:              0  
Created:       13:03 Thursday, May 22, 2003          Observation Length:   88 
Last Modified: 13:03 Thursday, May 22, 2003          Deleted Observations: 0  
Protection:                                          Compressed:           NO 
Data Set Type:                                       Sorted:               NO 
Label:                                                                        

<some output omitted>

           -----Variables Ordered by Position-----
 
 #    Variable    Type    Len    Pos    Label
----------------------------------------------------------
 1    id          Num       8      0                        
 2    female      Num       8      8                        
 3    race        Num       8     16                        
 4    ses         Num       8     24                        
 5    schtyp      Num       8     32    type of school      
 6    prog        Num       8     40    type of program     
 7    read        Num       8     48    reading score       
 8    write       Num       8     56    writing score       
 9    math        Num       8     64    math score          
10    science     Num       8     72    science score       
11    socst       Num       8     80    social studies score

Obs    id   female   race   ses   schtyp   prog   read   write   math   science   socst
  1    70      0       4     1       1       1     57      52     41       47       57 
  2   121      1       4     2       1       3     68      59     53       63       61 
  3    86      0       4     3       1       1     44      33     54       58       31 
  4   141      0       4     3       1       3     63      44     47       53       56 
  5   172      0       4     2       1       2     47      52     57       53       61 

hsb2_STATAcheck.log
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       200       100.5    57.87918          1        200
      female |       200        .545    .4992205          0          1
        race |       200        3.43    1.039472          1          4
         ses |       200       2.055    .7242914          1          3
      schtyp |       200        1.16     .367526          1          2
-------------+--------------------------------------------------------
        prog |       200       2.025    .6904772          1          3
        read |       200       52.23    10.25294         28         76
       write |       200      52.775    9.478586         31         67
        math |       200      52.645    9.368448         33         75
     science |       200       51.85    9.900891         26         74
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71

Contains data from c:\data\hsb2.dta
  obs:           200                          Savastata created this dataset
                                                on 21MAY03
 vars:            11                          22 May 2003 13:03
 size:         3,200 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              int    %8.0g                  
female          byte   %8.0g                  
race            byte   %8.0g                  
ses             byte   %8.0g                  
schtyp          byte   %8.0g                  type of school 
prog            byte   %8.0g                  type of program 
read            byte   %8.0g                  reading score 
write           byte   %8.0g                  writing score 
math            byte   %8.0g                  math score 
science         byte   %8.0g                  science score 
socst           byte   %8.0g                  social studies score 
-------------------------------------------------------------------------------
Sorted by:  

     +-----------------------------------------------------------------------------------+
     |  id   female   race   ses   schtyp   prog   read   write   math   science   socst |
     |-----------------------------------------------------------------------------------|
  1. |  70        0      4     1        1      1     57      52     41        47      57 |
  2. | 121        1      4     2        1      3     68      59     53        63      61 |
  3. |  86        0      4     3        1      1     44      33     54        58      31 |
  4. | 141        0      4     3        1      3     63      44     47        53      56 |
  5. | 172        0      4     2        1      2     47      52     57        53      61 |
     +-----------------------------------------------------------------------------------+
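
The counts reported in the savastata log can also be confirmed from inside Stata rather than by eye. This is a minimal sketch in do-file form, assuming the converted file is c:\data\hsb2.dta and that the expected counts are the 200 observations and 11 variables shown in the output above.

```stata
* Sketch: confirm the size and variable names of the converted file.
use "c:\data\hsb2.dta", clear

* c(N) and c(k) hold the number of observations and variables in memory.
assert c(N) == 200
assert c(k) == 11

* confirm variable stops with an error if any listed name is missing.
confirm variable id female race ses schtyp prog read write math science socst
```

Each assert is silent when the condition holds and stops the do-file with an error when it does not, so a clean run suggests the counts and names match.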

4. Using Stat/Transfer to convert your SAS file to Stata

If you have Stat/Transfer, or have access to it, converting SAS to Stata is fast and easy.

After the transfer is complete, we recommend that you verify the transfer as described in section 5.

5. Verifying the transfer

Regardless of how you convert from SAS to Stata, it is a good idea to verify that the transfer worked properly. To this end we will run some procedures in both SAS and Stata. First, the SAS statements:

PROC CONTENTS DATA=hsb2 position;
RUN;
PROC MEANS DATA=hsb2;
RUN;
PROC PRINT DATA=hsb2(obs=5);
RUN;

Below, the proc contents output shows the variables in the original file, the proc means output shows the means of the variables, and the proc print output shows the first five observations.  These values can be compared to the corresponding values in the Stata file.

The CONTENTS Procedure

Data Set Name: WORK.HSB2                              Observations:         200
Member Type:   DATA                                   Variables:            11
Engine:        V8                                     Indexes:              0
Created:       14:10 Wednesday, May 21, 2003          Observation Length:   88
Last Modified: 14:10 Wednesday, May 21, 2003          Deleted Observations: 0
Protection:                                           Compressed:           NO
Data Set Type:                                        Sorted:               NO
Label:

<some output omitted>

           -----Variables Ordered by Position-----

 #    Variable    Type    Len    Pos    Label
-----------------------------------------------------------
 1    id          Num       8      0
 2    female      Num       8      8
 3    race        Num       8     16
 4    ses         Num       8     24
 5    schtyp      Num       8     32    type of school
 6    prog        Num       8     40    type of program
 7    read        Num       8     48    reading score
 8    write       Num       8     56    writing score
 9    math        Num       8     64    math score
10    science     Num       8     72    science score
11    socst       Num       8     80    social studies score

The MEANS Procedure

Variable  Label                   N          Mean       Std Dev       Minimum       Maximum     
-------------------------------------------------------------------------------------------     
id                              200   100.5000000    57.8791845     1.0000000   200.0000000     
female                          200     0.5450000     0.4992205             0     1.0000000     
race                            200     3.4300000     1.0394722     1.0000000     4.0000000     
ses                             200     2.0550000     0.7242914     1.0000000     3.0000000     
schtyp    type of school        200     1.1600000     0.3675260     1.0000000     2.0000000     
prog      type of program       200     2.0250000     0.6904772     1.0000000     3.0000000     
read      reading score         200    52.2300000    10.2529368    28.0000000    76.0000000     
write     writing score         200    52.7750000     9.4785860    31.0000000    67.0000000     
math      math score            200    52.6450000     9.3684478    33.0000000    75.0000000     
science   science score         200    51.8500000     9.9008908    26.0000000    74.0000000     
socst     social studies score  200    52.4050000    10.7357935    26.0000000    71.0000000     
------------------------------------------------------------------------------------------- 

Obs    id   female   race   ses   schtyp   prog   read   write   math   science   socst
  1    70      0       4     1       1       1     57      52     41       47       57
  2   121      1       4     2       1       3     68      59     53       63       61
  3    86      0       4     3       1       1     44      33     54       58       31
  4   141      0       4     3       1       3     63      44     47       53       56
  5   172      0       4     2       1       2     47      52     57       53       61

Now let's generate the same information in Stata for comparison using the describe, summarize and list commands.

. describe

Contains data from hsb2.dta
  obs:           200                          Savastata created this dataset
                                                on 21MAY03
 vars:            11                          21 May 2003 14:10
 size:         3,200 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
id              int    %8.0g                  
female          byte   %8.0g                  
race            byte   %8.0g                  
ses             byte   %8.0g                  
schtyp          byte   %8.0g                  type of school 
prog            byte   %8.0g                  type of program 
read            byte   %8.0g                  reading score 
write           byte   %8.0g                  writing score 
math            byte   %8.0g                  math score 
science         byte   %8.0g                  science score 
socst           byte   %8.0g                  social studies score 
-------------------------------------------------------------------------------
Sorted by:  

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          id |       200       100.5    57.87918          1        200
      female |       200        .545    .4992205          0          1
        race |       200        3.43    1.039472          1          4
         ses |       200       2.055    .7242914          1          3
      schtyp |       200        1.16     .367526          1          2
-------------+--------------------------------------------------------
        prog |       200       2.025    .6904772          1          3
        read |       200       52.23    10.25294         28         76
       write |       200      52.775    9.478586         31         67
        math |       200      52.645    9.368448         33         75
     science |       200       51.85    9.900891         26         74
-------------+--------------------------------------------------------
       socst |       200      52.405    10.73579         26         71

. list in 1/5

     +-----------------------------------------------------------------------------------+
     |  id   female   race   ses   schtyp   prog   read   write   math   science   socst |
     |-----------------------------------------------------------------------------------|
  1. |  70        0      4     1        1      1     57      52     41        47      57 |
  2. | 121        1      4     2        1      3     68      59     53        63      61 |
  3. |  86        0      4     3        1      1     44      33     54        58      31 |
  4. | 141        0      4     3        1      3     63      44     47        53      56 |
  5. | 172        0      4     2        1      2     47      52     57        53      61 |
     +-----------------------------------------------------------------------------------+

Everything looks good. The values from the SAS file are the same as those in the Stata file, suggesting that the file was converted successfully.
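
The visual comparison can be backed up with a few assertions in Stata. The sketch below is in do-file form and assumes the converted file is c:\data\hsb2.dta; the expected values for read and for the first observation are copied from the SAS proc means and proc print output above, and the tolerance of 1e-6 is an arbitrary choice to allow for the rounding in that printed output.

```stata
* Sketch: check a few values in the Stata file against the SAS output.
use "c:\data\hsb2.dta", clear

* summarize leaves its results in r(N), r(mean), r(sd), r(min) and r(max).
summarize read
assert r(N) == 200
assert abs(r(mean) - 52.23) < 1e-6
assert abs(r(sd) - 10.2529368) < 1e-6
assert r(min) == 28 & r(max) == 76

* Spot-check the first observation against the first row of proc print.
assert id[1] == 70 & read[1] == 57 & socst[1] == 57
```

The same pattern can be repeated for the other variables; an assert that fails stops the do-file and points to the value that did not survive the transfer.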

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.