Understanding the 1990 Public Use Microdata Sample (PUMS)

Overview

The U.S. Census Bureau provides results of the decennial census in many forms. Most of these are summarized results produced by the Census Bureau in a predefined tabular format. The Public Use Microdata Sample (PUMS) data is different. It gives researchers access to the actual responses collected by the Census Bureau, released only after confidentiality protections have been applied. Researchers are therefore free to analyze the data in the way most appropriate for their research, without the restriction of predefined tables. This freedom makes the PUMS data one of the most popular forms of census data and provides rich research opportunities. This article presents a general introduction to the 1990 PUMS data structure and its contents.

The 1990 PUMS data is a selected sample of raw data within specific geographic areas extracted from the actual Census Long-Form Questionnaires. The 1990 PUMS data reflects the U.S. housing and population status on April 1, 1990. The Census Bureau protects the privacy of the individual respondents by editing and removing all identifying information before releasing the PUMS data to researchers.

The 1990 PUMS data contains records for households, with information on the characteristics of each housing unit and the people living in it. In effect, the researcher can produce customized tabulations and statistical analyses of census data to meet the needs of specific research projects, while benefiting from Census Bureau data collection techniques and very large sample sizes that might not otherwise be feasible.

For 1990, the U.S. Census Bureau provides two independently drawn samples of the Long-Form Questionnaires: the 5% PUMS data and the 1% PUMS data. The primary difference between these two samples is the geographic area associated with the sample. In general, the PUMS 5% data is based on counties; in states where counties are not defined, county-equivalent geographic areas are used. For example, in the State of Louisiana, parishes are the county equivalent. The PUMS 1% data is based on metropolitan areas and is a smaller sample. The remainder of this article deals specifically with the 1990 PUMS 5% data.

Distribution of Census Questionnaires

The U.S. Census Bureau distributed the census questionnaires to every housing unit in the United States. Both Long- and Short-Form Questionnaires were mailed or hand delivered at varying rates, depending on the population and the density of housing units. Taking into account the varying rates, approximately 15.9% of U.S. housing units received the Census Long-Form Questionnaire and the rest of the housing units received the Short-Form Questionnaires.

The U.S. Census Bureau varied the distribution rate of the Long-Form Questionnaire using three rates (1 in 6, 1 in 2, and 1 in 8), depending on the population. The Census Bureau chose to vary the rates "to provide relatively more reliable estimates for small populations" and "to decrease the respondent burden in more densely populated areas." (Census of Population and Housing, 1990: Public Use Microdata Samples Technical Documentation.) According to the Census Bureau documentation, a 1 in 6 rate was used unless the precensus estimates taken in 1988 or work done in 1989 indicated that one of the other rates was appropriate. The other rates were used under the following conditions:

Housing units on American Indian reservations, Tribal Jurisdiction Statistical Areas, Alaska Native villages, and Trust Lands were sampled with the Long-Form Questionnaires according to the same criteria as other governmental areas. The sampling rates, however, were based on the size of the American Indian and Alaska Native populations. In Hawaii, the same sampling rates were used for "census-designated places," because the Census Bureau does not recognize Hawaii's incorporated places. (Census of Population and Housing, 1990: Public Use Microdata Samples Technical Documentation.)

Privacy Protection

The PUMS data, like all other data released by the Census Bureau in print or on electronic media, is subject to strict confidentiality measures. These measures are imposed by law under Title 13 of the United States Code, which protects the confidentiality of individual respondents. Under these laws, questionnaire responses can be used only for statistical purposes, and Census Bureau employees are sworn to protect respondents' identities.

PUMS records are selected as a stratified random sample after all of the confidentiality editing has been performed. The Census Bureau edits the long-form responses by:

Since the PUMS data contains only a small fraction of the total population, the chance that a specific individual is included in the data is small.

Selection of PUMS 5% Data

The PUMS data was selected with a stratified systematic procedure that gave the Long-Form Questionnaire responses from each housing unit an equal probability of being included. The strata were defined so that there would be a high degree of homogeneity among the responses from the households within each stratum with respect to characteristics of major interest.
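The general idea of stratified systematic selection can be illustrated with a minimal Python sketch. This is not the Census Bureau's actual procedure: the stratum definition (`tenure`), the sampling interval, and the function name `stratified_systematic_sample` are assumptions made only for illustration. Within each stratum, the sketch picks a random starting position and then takes every k-th unit, so that every unit has the same chance of selection.

```python
import random
from collections import defaultdict

def stratified_systematic_sample(units, stratum_of, interval, seed=0):
    """Take every `interval`-th unit within each stratum, from a random start.

    Hypothetical helper for illustration; not the Census Bureau's code.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)

    sample = []
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        start = rng.randrange(interval)          # random start in [0, interval)
        sample.extend(members[start::interval])  # systematic selection
    return sample

# Made-up housing units: 200 renters and 400 owners (illustrative only).
units = [{"id": i, "tenure": "owner" if i % 3 else "renter"} for i in range(600)]
sample = stratified_systematic_sample(units, lambda u: u["tenure"], interval=20)
print(len(sample))
```

Because the interval is applied separately inside each stratum, each stratum contributes to the sample in proportion to its size, which is the property that stratification is meant to preserve.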

Strata were defined for each of the three major categories for a total of 1,049 strata:

Structure of the 1990 PUMS 5% Data

The 1990 PUMS 5% data is structured as a series of groups of related records for a specified geographic area. The related records are of two types: one for the housing unit and zero or more for the people living in the housing unit. These records are referred to as the household record and the person record, respectively. The geographic information on the household record applies to both the household and the people living in the housing unit.
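A file organized this way can be read by attaching each person record to the most recent household record. The Python sketch below assumes, purely for illustration, that each line begins with a record-type character ("H" for household, "P" for person) and that person records immediately follow their household record; the actual record layout should be checked against the technical documentation. The function name `group_households` and the sample lines are hypothetical.

```python
def group_households(lines):
    """Group person records under the household record that precedes them.

    Assumes (for illustration) that column 1 holds "H" or "P".
    """
    households = []
    for line in lines:
        record_type = line[:1]
        if record_type == "H":
            households.append({"household": line, "persons": []})
        elif record_type == "P":
            if not households:
                raise ValueError("person record before any household record")
            households[-1]["persons"].append(line)
        else:
            raise ValueError(f"unknown record type: {record_type!r}")
    return households

# Made-up lines standing in for fixed-width PUMS records.
sample_lines = [
    "H0000001",  # household with two people
    "P0000001",
    "P0000001",
    "H0000002",  # household record with no person records
    "H0000003",  # household record with one person record
    "P0000003",
]
groups = group_households(sample_lines)
person_counts = [len(g["persons"]) for g in groups]
print(person_counts)
```

Keeping the person records nested under their household record makes it straightforward to carry the household's geographic information down to each person during analysis.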

Geographic Information

The geographic information in the 1990 PUMS 5% data gives the location of the housing unit only in the broadest terms. It consists primarily of three variables: a code for the region of the U.S. containing the state (e.g., New England, Middle Atlantic, Pacific), a unique state code, and a Public Use Microdata Area (PUMA) code. A PUMA code is a 5-digit code that is unique within the state and is supplied by the State Data Center. Other descriptive variables that delineate the geographic areas are also included in the geographic information.

In general, a PUMA is based on a county and the places within the county. There may be more than one PUMA code in a county if the population is high. For example, if the population exceeds 200,000 persons, then it is possible for the State Data Center to designate more than one PUMA code within that county. However, the U.S. Census Bureau requires the State Data Center to define each PUMA code in such a way that the geographic area specified by the code contains at least 100,000 people. In the California PUMS 5% data, some counties have a large number of PUMA codes due to the very large populations.
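The 100,000-person minimum can be expressed as a simple check. The Python sketch below uses made-up PUMA codes and populations; the function name `undersized_pumas` is hypothetical and the numbers are not taken from any actual PUMS file.

```python
MIN_PUMA_POPULATION = 100_000  # minimum population per PUMA, as stated above

def undersized_pumas(puma_populations):
    """Return the PUMA codes whose population falls below the minimum."""
    return sorted(
        code
        for code, population in puma_populations.items()
        if population < MIN_PUMA_POPULATION
    )

# Illustrative values only: a proposed split of one large county into PUMAs.
proposed = {"00101": 140_000, "00102": 95_000, "00103": 210_000}
too_small = undersized_pumas(proposed)
print(too_small)
```

In this made-up example the second area would have to be merged with a neighbor or redrawn before it could serve as a PUMA.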

Household Information

The household record contains information about each household. Housing units are of three general kinds: occupied, vacant, or group quarters (e.g., hospitals, prisons, military bases, dormitories). If the housing unit is occupied, then the household record will be followed by one person record for each person living in the housing unit. If the housing unit is vacant, then only the household record is present and there will be no corresponding person record. If the household record is for group quarters, then there will be exactly one corresponding person record, and the household record will be a "dummy record," containing only geographic information with all other variables coded as "not applicable."

The household record contains geographic information and many variables regarding the housing unit, including:

The household record also contains some general information about the people living in the household, along with weights that, when applied, allow the researcher to use PUMS 5% data to produce estimates of the 100% characteristics of the housing units. For more information on the 100% characteristics, see the 1990 Census of Population and Housing Summary Tape File 1 Technical Documentation and see the article "Understanding the Census STF3A File for California".

Person Information

A person record focuses on an individual member of the household. The primary purpose of the person record is to give detailed information on each individual in the household. The person record contains variables on:

Like the household record, the person record contains weights that, when applied, allow the researcher to use PUMS 5% data to produce estimates of the 100% characteristics of the population.
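Applying a weight means that each sampled record stands in for that many units in the full population, so an estimated total is the sum of the weights of the records that match a condition. The Python sketch below illustrates this with made-up person records; the field names `age` and `weight` and the function name `weighted_total` are assumptions for illustration, not the variable names used in the PUMS file.

```python
def weighted_total(records, weight_key, predicate=lambda record: True):
    """Estimate a population total by summing weights of matching records."""
    return sum(r[weight_key] for r in records if predicate(r))

# Illustrative person records; the weights are invented for this example.
persons = [
    {"age": 34, "weight": 20},
    {"age": 70, "weight": 18},
    {"age": 8, "weight": 25},
    {"age": 66, "weight": 22},
]

estimated_population = weighted_total(persons, "weight")
estimated_65_plus = weighted_total(persons, "weight", lambda r: r["age"] >= 65)
print(estimated_population, estimated_65_plus)
```

The same pattern applies to household records, using the household weight to estimate totals for housing units rather than for people.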

Originally revised: 11 Oct 96
