Ten variables in the PUMS provide information on race, ethnicity, national origin, and nationality. They can be used separately or in combination to paint a rich, detailed portrait of individuals and groups. For a breakdown of the race, ethnicity, and national origin variable codes, see the appendices to the 1990 PUMS documentation, which are available at the ISSR data archives or at ATS.
Following a discussion of the ten variables is an example SAS program which can be used to create a race/ethnicity variable with five race/ethnic categories.
The RACE variable is extremely specific for some groups (Tongans, Apaches) and extremely broad for others (Blacks, Whites). It has also often been criticized for offering no categories for mixed-race individuals. While some mixed-race persons report themselves as belonging to a single race, many report their race as "other". Because Latino origin is coded separately (as Hispanic ethnicity) and not as a race, the self-reported race code for "other" (race=37) also includes many Latinos, most of whom are mestizos (mixed European and Amerindian) or of other racial mixes. In all, over 13% of the population of California is coded as "other" for race.
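The overlap between the "other" race code and Hispanic origin can be inspected directly. The following is a minimal sketch, not part of the original program: it assumes the same placeholder dataset name used elsewhere on this page (your_censusfile_here) and borrows the Hispanic code ranges from the RACE5 example program. The names othercheck, other_race, and hisp are hypothetical. The counts are unweighted; for population estimates, add a WEIGHT statement with the person weight described in the PUMS documentation.

```sas
* Sketch only - othercheck, other_race and hisp are hypothetical names;
data othercheck;
    set your_censusfile_here(keep = race hispanic);
    * 1 if self-reported race is coded as other (race = 37), else 0;
    other_race = (race = 37);
    * 1 if Hispanic by the code ranges used in the RACE5 program, else 0;
    hisp = ((1 LE hispanic LE 5) or (hispanic GE 210));
run;

* Cross-tabulate the two flags to see how many other-race cases are Hispanic;
proc freq data=othercheck;
    tables other_race*hisp;
run;
```

The row percentages for other_race = 1 show what share of persons coded as "other" are also coded as Hispanic.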
These are the place of birth and recoded place of birth variables. POB has separate codes for U.S. states and territories and for foreign countries (see Appendix I of the 1990 PUMS documentation).
These are codes for each respondent's primary and secondary ethnic ancestries, not to be confused with place of birth. Most of the codes are for nations, but some are for previously independent regions (Tibet, Sicily) and a few are for ethnic groups without territory. The assignment of codes to extra-territorial ethnicities is not consistent: there is a code for Rom (Gypsies) but not for Jews or Palestinians. There is, however, a code for Israel, which provides no information regarding ethnic ancestry.
This is a set of ancestry codes for persons of Hispanic origin. They overlap to a great degree with the ancestry variables. Persons of Hispanic origin may be born anywhere, including the U.S., and may have any U.S. citizenship status. Separate Hispanic origin codes are assigned to Spain and to each of the nations of Spanish America. In some cases, a range of codes is assigned to a nation, with the separate codes corresponding to different ways of phrasing responses to the question. Note that there are no Hispanic codes for persons of Brazilian or Portuguese origin; these persons can be identified using the ancestry variables.
LANG1 is a flag variable which denotes whether the respondent speaks a language other than English at home. LANG2 contains codes for the specific language other than English, if any, that the respondent speaks at home. See Appendix I for specific language codes.
This variable gives each respondent's U.S. citizenship category.
IMMIGR contains the categories for respondents' year of immigration to the U.S. There is some dissatisfaction with this variable because it is ambiguous: it is not clear from the long form whether respondents are being asked when they first came to the U.S., when they most recently came to the U.S., when they came to the U.S. and decided to stay, or when they became U.S. citizens. Some researchers believe that treating these values as dates of first entry may yield a downwardly biased measure of how long respondents have actually been in the U.S.
The SAS program below creates a five-category race/ethnicity variable called RACE5, which reduces much of the complexity of racial and ethnic coding in the PUMS. Once the variable is created, PROC FREQ is used to show its distribution. The values of RACE5 are as follows:
1=White (Non-Hispanic)
2=Black (Non-Hispanic)
3=Asian (Non-Hispanic)
4=Other Race (Non-Hispanic, mostly Native American or Pacific Islander)
5=All Hispanics (except European Spanish)
data test;
    set your_censusfile_here(keep = race hispanic);
    * Start as missing so that unclassified cases are easy to spot;
    race5 = .;
    * White (non-Hispanic);
    IF (race = 1) THEN race5 = 1;
    * Black (non-Hispanic);
    IF (race = 2) THEN race5 = 2;
    * Asian (non-Hispanic);
    IF (6 LE race LE 24) THEN race5 = 3;
    * Other Race (non-Hispanic);
    IF (3 LE race LE 5) or (race GE 25) THEN race5 = 4;
    * Hispanic (except European Spanish), assigned last so that it overrides the race-based codes above;
    IF (1 LE hispanic LE 5) or (hispanic GE 210) THEN race5 = 5;
run;

PROC FREQ DATA=test;
    TABLES race5;
RUN;
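The PROC FREQ output above shows RACE5 as bare numbers. A minimal sketch of one way to attach the category labels and to check the recode against the original RACE codes follows. It assumes the dataset test created by the program above. The format name race5f is a hypothetical choice for this example and is not part of the PUMS documentation.

```sas
* Sketch only - race5f is a hypothetical format name chosen for this example;
proc format;
    value race5f
        1 = 'White (Non-Hispanic)'
        2 = 'Black (Non-Hispanic)'
        3 = 'Asian (Non-Hispanic)'
        4 = 'Other Race (Non-Hispanic)'
        5 = 'Hispanic (except European Spanish)';
run;

* Labelled distribution of RACE5 plus a listing of RACE by RACE5 to check the recode;
proc freq data=test;
    tables race5;
    tables race*race5 / list missing;
    format race5 race5f.;
run;
```

In the RACE by RACE5 listing, each original race code should appear with its expected category or with the Hispanic category, and the MISSING option makes any cases left unclassified visible.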
Originally revised: 21 Jan 97
These pages are Copyrighted (c) by UCLA Academic Technology Services