|
|
|
||||
|
|
|||||
When you have two data files, you may want to combine them by stacking them one on top of the other (referred to as concatenating files). Below we have a file called dads and a file containing moms.
dads famid name inc 2 Art 22000 1 Bill 30000 3 Paul 25000 moms famid name inc 1 Bess 15000 3 Pat 50000 2 Amy 18000
Below we have stacked (concatenated) these files creating a file we called momdad. These examples will show how to concatenate files in SAS.
momdad famid name inc 2 Art 22000 1 Bill 30000 3 Paul 25000 1 Bess 15000 3 Pat 50000 2 Amy 18000
The SAS program below creates a SAS data file called dads and a file called moms. It then combines them (concatenates them) creating a file called dadmom.
* Here is a file with information about dads with their family id name and income ; DATA dads; INPUT famid name $ inc ; CARDS; 2 Art 22000 1 Bill 30000 3 Paul 25000 ; RUN; * Here is a file with information about moms with their family id name and income ; DATA moms; INPUT famid name $ inc ; CARDS; 1 Bess 15000 3 Pat 50000 2 Amy 18000 ; RUN; * We can combine these files by stacking them one on top the other ; * by setting them both together in the same data step as shown below ; DATA dadmom; SET dads moms; RUN; * Let's use PROC PRINT to look at the result ; PROC PRINT DATA=dadmom; RUN;
The output of this program is shown below.
OBS FAMID NAME INC 1 2 Art 22000 2 1 Bill 30000 3 3 Paul 25000 4 1 Bess 15000 5 3 Pat 50000 6 2 Amy 18000
The output from this program shows that the files were combined properly. The dads and moms are stacked together in one file. But, there is a little problem. We can't tell the dads from the moms. Let's try doing this again but in such a way that we can tell which observations are the moms and which are the dads.
In order to tell the dads from the moms, let's create a variable called momdad in the dads and moms data files that will contain dad for the dads data file and mom for the moms data file. When we combine the two files together the momdad variable will tell us who the moms and dads are.
DATA dads; INPUT famid name $ inc ; momdad = "dad"; CARDS; 2 Art 22000 1 Bill 30000 3 Paul 25000 ; RUN;DATA moms; INPUT famid name $ inc ; momdad = "mom"; CARDS; 1 Bess 15000 3 Pat 50000 2 Amy 18000 ; RUN;DATA dadmom; SET dads moms; RUN;* Now when we do the proc print you can see the dads from the moms ;PROC PRINT DATA=dadmom; RUN;
The output of this program is shown below.
OBS FAMID NAME INC MOMDAD 1 2 Art 22000 dad 2 1 Bill 30000 dad 3 3 Paul 25000 dad 4 1 Bess 15000 mom 5 3 Pat 50000 mom 6 2 Amy 18000 mom
Here we get a more desirable result, because we can tell the dads from the moms by looking at the variable momdad. This required some thinking ahead because we had to put momdad in both the dads data file and the moms data file before we merged the data files.
You can see the problem illustrated below.DATA dads; INPUT famid name $ dadinc ; DATALINES; 2 Art 22000 1 Bill 30000 3 Paul 25000 ; RUN; DATA moms; INPUT famid name $ mominc ; DATALINES; 1 Bess 15000 3 Pat 50000 2 Amy 18000 ; RUN; DATA momdad; SET dads(IN=dad) moms(IN=mom); IF dad=1 THEN momdad="dad"; IF mom=1 THEN momdad="mom"; run;PROC PRINT DATA=momdad; RUN;
Solution #1. The most obvious solution is to choose appropriate variable names for the original files (i.e., name the variable inc in both the moms and dads file). This solution is not always possible since you might be concatenating files that you did not originally create. To save space, we omit illustrating this solution.OBS FAMID NAME DADINC MOMINC DAD MOM MOMDAD 1 2 Art 22000 . 1 0 dad 2 1 Bill 30000 . 1 0 dad 3 3 Paul 25000 . 1 0 dad 4 1 Bess . 15000 0 1 mom 5 3 Pat . 50000 0 1 mom 6 2 Amy . 18000 0 1 mom
Solution #2. If solution #1 is not possible, then this problem can be addressed using an if statement in a data step.
DATA momdad;
SET dads(IN=dad) moms(IN=mom);
IF dad=1 THEN
DO;
momdad="dad";
inc=dadinc;
END;
IF mom=1 THEN
DO;
momdad="mom";
inc=mominc;
END;
RUN;
PROC PRINT DATA=momdad;
RUN;
The results are shown below,
where inc now has the income for both the moms and dads.
Solution 3. Another way you can fix this is by using the rename option on the set statement of a data step to rename the variables just before the files are combined.OBS FAMID NAME DADINC MOMINC DAD MOM MOMDAD INC 1 2 Art 22000 . 1 0 dad 22000 2 1 Bill 30000 . 1 0 dad 30000 3 3 Paul 25000 . 1 0 dad 25000 4 1 Bess . 15000 0 1 mom 15000 5 3 Pat . 50000 0 1 mom 50000 6 2 Amy . 18000 0 1 mom 18000
The output for Solution 3 is below.DATA momdad; SET dads(RENAME=(dadinc=inc)) moms(RENAME=(mominc=inc)); RUN; PROC PRINT DATA=momdad; RUN;
OBS FAMID NAME INC 1 2 Art 22000 2 1 Bill 30000 3 3 Paul 25000 4 1 Bess 15000 5 3 Pat 50000 6 2 Amy 18000
The output is below.DATA dads; INPUT famid name $3. inc; DATALINES; 2 Art 22000 1 Bob 30000 3 Tom 25000 RUN; DATA moms; INPUT famid name $4. inc; DATALINES; 1 Bess 15000 3 Rory 50000 2 Jane 18000 RUN; DATA momdad; SET dads moms; RUN; PROC PRINT DATA=momdad; RUN;
Note that the names for the moms are truncated to be length 3. This is because the length for names in the dads file is 3. To fix this, use the length statement in the data step that merges the two files.OBS FAMID NAME INC 1 2 Art 22000 2 1 Bob 30000 3 3 Tom 25000 4 1 Bes 15000 5 3 Ror 50000 6 2 Jan 18000
The output is below.DATA momdad; LENGTH name $ 4; SET dads moms; RUN; PROC PRINT DATA=momdad; RUN;
OBS NAME FAMID INC 1 Art 2 22000 2 Bob 1 30000 3 Tom 3 25000 4 Bess 1 15000 5 Rory 3 50000 6 Jane 2 18000
Solution #1. Code the variables in the two files in the same way. For example, code fulltime using 0/1 for both files with 1 indicating working fulltime. This is the simplest solution if you are creating the files yourself. We will omit illustrating this solution to save space.DATA dads; INPUT famid name $ inc fulltime; DATALINES; 2 Art 22000 0 1 Bill 30000 1 3 Paul 25000 1 ; RUN; DATA moms; INPUT famid name $ inc fulltime $1.; DATALINES; 1 Bess 15000 N 3 Pat 50000 Y 2 Amy 18000 N ; RUN;
Solution #2. You may not have created the original raw data files, so solution #1 may not be possible for you. In that case, you can create a new variable in each file that has the same coding and will be compatible when you merge the files. Below we illustrate this strategy.
For the dads file, we make a variable called full that is the same as fulltime, and save the file as dads2, dropping fulltime. For the moms, we create full by recoding fulltime, and save the file as moms2, also dropping fulltime. The files dads2 and moms2 both have the variable full coded the same way (0/1 where 1=works full time) so we can combine those files together.
The results are shown below.DATA dads; SET dads; full=fulltime; DROP fulltime; RUN; DATA moms; SET moms; IF fulltime="Y" THEN full=1; IF fulltime="N" THEN full=0; DROP fulltime; RUN; DATA momdad; SET dads moms; RUN;PROC PRINT DATA=momdad; RUN;
OBS FAMID NAME INC FULL 1 2 Art 22000 0 2 1 Bill 30000 1 3 3 Paul 25000 1 4 1 Bess 15000 0 5 3 Pat 50000 1 6 2 Amy 18000 0
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services