UCLA Academic Technology Services

Understanding the Census STF3A File For California

STF3A files differ in structure from the files most familiar to researchers. They consist of summary data: contingency tables of counts, or aggregate data in the form of means. Cases (observations) are geographic divisions such as counties, cities, and census tracts. The file is hierarchical; for example, block groups are nested within census tracts. Rather than furnishing a different type of information at each level of the hierarchy, the file aggregates the information presented at the lower levels (e.g., block groups) and presents it again at the higher levels (e.g., tracts).

The discussion that follows concentrates on the structure of these files, how to interpret and use the tables, and how to determine appropriate geographies. While this discussion pertains primarily to the 1990 California STF3A, most of the information should apply to the STF3A for other states as well. It is presented here to help you make informed decisions in research involving the U.S. Census.

Structure of the Tables

One difficulty researchers new to the U.S. Census often have is understanding what a variable is in a file of tables. Typically, you think of a variable as a characteristic such as gender, income, or ancestry. In the STF3A file, by contrast, concepts such as gender serve only to label groups of variables. Each variable contains the count or mean that would appear in one cell of a table, while the rows and columns of this logical table correspond to the concepts of interest to the researcher. This table structure does not exist in the file; you must impose it on the file in order to interpret the information. Let us examine an actual table. Table 1 is a table of citizenship by age for Los Angeles County. It is easy to read in tabular form; unfortunately, the data must be stored in linear form.


Table 1: Citizenship and Age in Los Angeles County


The data would appear in the STF3A like this:

   1928844   54409   340041   4039254  738289  1762327
   P37_1     P37_2   P37_3    P37_4    P37_5   P37_6

The record columns holding each cell value are associated with a variable name. Possible variable names appear below the data to help clarify the discussion. Since this is table P37 in the 1990 STF3A, a reasonable set of names for the six variables containing the table would be P37_1 through P37_6. Table 2 shows how these variable names correspond to the cells of Table 1. Notice that the counts for each citizenship category in the under 18 age group precede those for the 18 and over age group: all the rows of one column are stored before the rows of the next column. Also, the totals are not stored in the data, since they can easily be obtained by summing the cells of the corresponding row or column.


Table 2: Relationship of Variable Names to the Cells in Table 1
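To make the column-by-column storage order concrete, here is a minimal Python sketch that rebuilds the logical table from the six cells shown above and recomputes the totals the file omits. It assumes only what the text states: three citizenship categories (rows) by two age groups (columns), with all rows of the first column stored before those of the second. Rows are labeled by position because the category names are not reproduced here; `to_table` is a hypothetical helper, not part of any census software.

```python
# The six P37 cells, in file order, for Los Angeles County (from the record above)
p37 = [1928844, 54409, 340041, 4039254, 738289, 1762327]

N_ROWS = 3  # citizenship categories
N_COLS = 2  # age groups: under 18, then 18 and over

def to_table(cells, n_rows, n_cols):
    """Rebuild a logical table from cells stored column by column.

    Cell k (counting from 0) belongs to row k % n_rows and column k // n_rows.
    """
    if len(cells) != n_rows * n_cols:
        raise ValueError("cell count does not match the table shape")
    return [[cells[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]

table = to_table(p37, N_ROWS, N_COLS)
# The same layout applied to the variable names reproduces the name-to-cell mapping
names = to_table([f"P37_{k}" for k in range(1, 7)], N_ROWS, N_COLS)

# Totals are not stored in the file, so recompute them
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(col_totals)

for r, (row, cell_names) in enumerate(zip(table, names), start=1):
    print(f"row {r}: {dict(zip(cell_names, row))}  total={row_totals[r - 1]}")
print("column totals:", col_totals, "grand total:", grand_total)
```

The same function works for any STF3A table once you know its number of rows and columns, which is the structure you have to impose on the file yourself.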


When working with a file of tables, it is important to remember that concepts such as age and citizenship are a logical structure used to interpret the variables in the file; they are not themselves variables in the file. Thus, you can collapse cells, for example by adding the counts for naturalized citizens and non-citizens into a single category, but you cannot create a table of citizenship by ancestry, since the STF3A contains no table relating these two subjects. When using the STF3A, you are limited to the tables presented. If you need information beyond these prepackaged tables, or the tables in other STFs, you may be able to obtain it from the Public Use Microdata Samples (PUMS). PUMS provides the greatest flexibility in working with census variables.
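Collapsing cells amounts to summing existing variables into a new one. The sketch below shows the idea for the Los Angeles County record. Which cells hold the naturalized and non-citizen counts is an assumption here (cells 2-3 and 5-6, the last two rows of each age column); confirm it against the STF3A table documentation. The derived variable names are hypothetical.

```python
# One geographic record's P37 cells, keyed by the variable names used above
record = {"P37_1": 1928844, "P37_2": 54409, "P37_3": 340041,
          "P37_4": 4039254, "P37_5": 738289, "P37_6": 1762327}

# ASSUMPTION: cells 2-3 (under 18) and 5-6 (18 and over) are the naturalized
# and non-citizen counts; confirm against the STF3A table documentation.
COLLAPSE = {
    "U18_NATZ_OR_NONCIT": ["P37_2", "P37_3"],  # hypothetical derived names
    "A18_NATZ_OR_NONCIT": ["P37_5", "P37_6"],
}

def collapse(rec, groups):
    """Return derived variables, each the sum of the listed cells."""
    return {new: sum(rec[name] for name in cells) for new, cells in groups.items()}

derived = collapse(record, COLLAPSE)
print(derived)
```

Summing like this is valid for counts; cells that hold means cannot be combined by simple addition.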

Geographic Organization of the File

In summary data, the basic unit of analysis is the specific geographic area. There is a definite hierarchy of geographic areas in the census. This hierarchy proceeds from the entire U.S. down to the block level. The basic hierarchy follows:

   United States
     Region
       Division
         State
           County
             County Subdivision
               Place
                 Census Tract/Block Numbering Area
                   Block Group
                     Block

Unfortunately, this hierarchy is not strictly nested: counties or county subdivisions may contain only parts of a place, and places may contain only parts of census tracts and block groups. This means that the choice of which records to read is critical. In addition to this hierarchy, the STF3A contains information for geographic areas such as American Indian reservations, trust lands, metropolitan statistical areas, and urbanized areas.

Summary Levels

Summary levels are assigned by the Census Bureau to designate which levels of the hierarchy apply and to indicate the smallest level of geography presented. For example, a case may contain information on a county within a state, or on a place within a county and state. Researchers need to select the summary level appropriate to the geography desired for analysis. Some of the most important summary levels follow:

  40   State
  50      County in State 
  60         Sub-County in County in State
  70           Place in Sub-County in County in State
  80              Tract in Place in Sub-County in County
                    in State 
  90              Block Group in Congressional District in
                    Urban/Rural in Urbanized Area/Remainder
                    in (variety of American Indian geographic
                    designations/Remainder) in Tract in Place
                    in Sub-County in County in State
  140   Tract in County in State 
  150   Block Group in Tract in County in State 
  155   County in Place in State 
  160   Place in State 
  170   Consolidated City in State
  

For example, summary level 50 contains tables at the county level, and the county variable contains the county code. Information is often presented more than once in the file: summary level 80 cases are at the tract level, as are summary level 140 cases. Because summary level 80 records pertain to tracts within place, within sub-county, within county, tracts that cross place boundaries may be broken into part tracts. Summary level 140, on the other hand, represents tracts within county within state, and tracts never cross county boundaries, so these records are for complete tracts. Use summary level 140 if you need complete tracts, and summary level 80 if you need place information while working with tracts. When the lowest geographic area in a record's hierarchy is split into parts by a higher geographic area, the variable PARTFLAG has a value of 1.
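Choosing a summary level is, in practice, a filter on each record's summary level code. The minimal sketch below selects records by level, adds summary level 80 part tracts back into whole tracts, and compares the result with summary level 140. It assumes the summary level has been read into a numeric field named `SUMLEV` and a count into `POP`; both names are placeholders, and every value is made up for illustration. CNTY, TRACTBNA, and PARTFLAG are the variables named in the text.

```python
from collections import defaultdict

# Hypothetical records; SUMLEV and POP are placeholder names, all values made up.
records = [
    {"SUMLEV": 140, "CNTY": "001", "TRACTBNA": "101100", "PARTFLAG": 0, "POP": 4200},
    {"SUMLEV": 80,  "CNTY": "001", "TRACTBNA": "101100", "PARTFLAG": 1, "POP": 3000},
    {"SUMLEV": 80,  "CNTY": "001", "TRACTBNA": "101100", "PARTFLAG": 1, "POP": 1200},
    {"SUMLEV": 140, "CNTY": "001", "TRACTBNA": "101200", "PARTFLAG": 0, "POP": 5100},
    {"SUMLEV": 80,  "CNTY": "001", "TRACTBNA": "101200", "PARTFLAG": 0, "POP": 5100},
]

def select(recs, sumlev):
    """Keep only the records at one summary level."""
    return [r for r in recs if r["SUMLEV"] == sumlev]

def combine_part_tracts(recs, count_field="POP"):
    """Add summary level 80 pieces into whole tracts keyed by (CNTY, TRACTBNA).

    Valid for counts only: means and medians cannot be recombined by summing.
    """
    totals = defaultdict(int)
    for r in select(recs, 80):
        totals[(r["CNTY"], r["TRACTBNA"])] += r[count_field]
    return dict(totals)

complete = {(r["CNTY"], r["TRACTBNA"]): r["POP"] for r in select(records, 140)}
rebuilt = combine_part_tracts(records)
split = sorted({r["TRACTBNA"] for r in select(records, 80) if r["PARTFLAG"] == 1})

print(rebuilt == complete)  # True for these made-up numbers
print(split)                # tracts that appear in more than one part
```

With real files it is worth checking rebuilt tracts against summary level 140 rather than assuming they match.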

Summary level 90 presents block groups within a variety of designations between the tract and block group levels, most of which are irrelevant in studies of California's most populous counties. However, if you are interested specifically in breakdowns by congressional district or by urban versus rural area, that information is available in summary level 90. You will also find information on various designations of American Indian/Alaska Native areas. If a block group crosses one of these boundaries, the block group will appear in separate parts.

Counties

The Census Bureau assigns a three-digit FIPS code to each county or county equivalent. The code is unique within a state and is stored in the variable CNTY. The codes follow an alphabetical listing of the counties in each state. For analysis purposes, counties lie entirely within states, so there are no partial counties.
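Because CNTY is unique only within a state, combining data for several states calls for a key that includes the state code. A minimal sketch follows; it zero-pads both parts, assuming the codes may have been read as numbers (which drops leading zeros). The function name is hypothetical. California's state FIPS code is 06 and Los Angeles County's county code is 037.

```python
CA_STATE_FIPS = 6  # California's state FIPS code ("06")

def county_key(state, cnty):
    """Return a five-character state+county code such as '06037'.

    Accepts strings or integers; zero-padding restores leading zeros
    lost when a code was read as a number.
    """
    state, cnty = int(state), int(cnty)
    if not (0 < state < 100 and 0 < cnty < 1000):
        raise ValueError(f"code out of range: {state}, {cnty}")
    return f"{state:02d}{cnty:03d}"

print(county_key(CA_STATE_FIPS, "037"))  # Los Angeles County
```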

County Subdivisions

County subdivisions include census county divisions, minor civil divisions, and census subareas. County subdivisions lie entirely within counties. Two variables identify county subdivisions, COUSUBCE and COUSUBFP: COUSUBCE is a three-digit census code that uniquely identifies a county subdivision within its county, while COUSUBFP is a five-digit FIPS code that uniquely identifies it within the state.
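The difference in scope between the two codes matters when you use them as keys. In this sketch with made-up codes, two subdivisions in different counties share a COUSUBCE value, so COUSUBCE needs CNTY alongside it, while COUSUBFP alone is enough within a state. The helper function names are hypothetical.

```python
# Hypothetical county subdivision records; all codes are made up.
subdivisions = [
    {"CNTY": "001", "COUSUBCE": "005", "COUSUBFP": "90010"},
    {"CNTY": "003", "COUSUBCE": "005", "COUSUBFP": "90020"},
]

def key_from_census_code(rec):
    # COUSUBCE is unique only within a county, so pair it with CNTY
    return (rec["CNTY"], rec["COUSUBCE"])

def key_from_fips_code(rec):
    # COUSUBFP is already unique within the state
    return rec["COUSUBFP"]

bare = {r["COUSUBCE"] for r in subdivisions}
by_census = {key_from_census_code(r) for r in subdivisions}
by_fips = {key_from_fips_code(r) for r in subdivisions}
print(len(bare), len(by_census), len(by_fips))
```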

Places

Places are wholly within counties but may be divided between county subdivisions. If you extract summary level 70 records, you may find that you have extracted parts of places, but if you extract summary level 160 (Place in State) records, you will have complete places for California. Both a four-digit census code, PLACECE, and a five-digit FIPS code, PLACEFP, identify places uniquely within a state. Places include both census designated places and incorporated places.

Census Tracts

Census tracts are unique within county; they are numbered from 0001 to 9499.99. Although tract numbers need only be unique within a county, numbering does not always restart in each county, because tract numbers are also unique within the Standard Metropolitan Statistical Area (SMSA). The variable TRACTBNA holds a four-digit basic number and may have a two-digit suffix; it is often useful to treat the basic number and the suffix as two separate variables. Tracts are wholly contained within counties but not within places. This means that you can obtain full tracts by specifying summary level 140, but may have to combine part tracts if you extract summary level 80 records. The variable PARTFLAG has a value of 1 if the tract is in more than one part. Tracts generally have between 2,500 and 8,000 persons, and the intention is to keep them stable from census to census. Occasionally, because of population growth, a tract is split into two or more tracts; when this happens, a two-digit suffix is added to the basic tract number. Tracts that were revised or created for the 1990 Census usually have a suffix in the 80 to 90 range. Tracts may also be combined between 1980 and 1990 if their population decreased sufficiently.
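Splitting TRACTBNA into its basic number and suffix can be sketched as follows. This assumes TRACTBNA is read as a fixed-width six-character string whose last two characters are the suffix, and that a missing suffix appears as blanks or "00"; check the actual record layout before relying on either assumption. `split_tract` and `tract_label` are hypothetical helpers.

```python
from typing import Optional, Tuple

def split_tract(tractbna: str) -> Tuple[int, Optional[int]]:
    """Split a TRACTBNA value into (basic number, suffix or None)."""
    field = tractbna.ljust(6)  # ASSUMPTION: six characters, suffix may be blank
    basic, suffix = field[:4], field[4:6].strip()
    if not basic.isdigit():
        raise ValueError(f"unexpected TRACTBNA value: {tractbna!r}")
    # ASSUMPTION: blank or "00" both mean the tract has no suffix
    return int(basic), (int(suffix) if suffix and suffix != "00" else None)

def tract_label(tractbna: str) -> str:
    """Format a tract the way it is usually written, e.g. '1234.02'."""
    basic, suffix = split_tract(tractbna)
    return str(basic) if suffix is None else f"{basic}.{suffix:02d}"

print(split_tract("123402"), tract_label("123402"))
print(split_tract("2071  "), tract_label("2071"))
```

Keeping the basic number as its own variable makes it easy to group tracts that share a basic number, which is how tracts split from a single earlier tract can be identified.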

Block Groups

Block groups are unique within tract, but other geographic boundaries may cut through them. They are complete in summary level 150, because in that summary level they are nested only within tract. They may be in separate parts in summary level 90, because in that summary level they may cross place or congressional district boundaries. Block groups are also nested within various Indian Lands designations in summary level 90, but for most of the population of California this is not a consideration: for most Californians, the values of the Indian Lands designations are 9, 99, 999, or 9999, codes indicating areas that are not Indian Lands. Therefore, these classifications are not relevant for most of California.
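When working with summary level 90, a simple check for the "not Indian Lands" codes listed above lets you confirm that a block group's Indian Lands designations can be ignored. A minimal sketch; `DESIG_A` and `DESIG_B` are hypothetical stand-ins for the real designation fields, whose names are not given here.

```python
# The codes listed in the text as meaning "not Indian Lands"
NOT_INDIAN_LANDS = {"9", "99", "999", "9999"}

def is_not_indian_land(code) -> bool:
    """True when a designation code is one of the 'not Indian Lands' values."""
    return str(code).strip() in NOT_INDIAN_LANDS

def outside_indian_lands(record, fields):
    """True when every listed designation field holds a 'not Indian Lands' code."""
    return all(is_not_indian_land(record[f]) for f in fields)

# Hypothetical block group record with placeholder field names
bg = {"DESIG_A": "9999", "DESIG_B": "99"}
print(outside_indian_lands(bg, ["DESIG_A", "DESIG_B"]))
```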

Comparison of PUMS and STF3A

The STF3A files contain tables, and the variables in these files are cells. You can explore only the tables presented in the file. If you need information not found in the tables, or not organized to meet your research needs, you have to use the PUMS files, where the unit of analysis is a household or a person. PUMS contains variables such as age, gender, income, and education; for example, the value of the variable AGE in the PUMS file is the age of the person. Use the PUMS file for statistical analysis at the person level.
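With person-level records, any pair of variables can be cross-tabulated, including citizenship by ancestry, which no STF3A table provides. The sketch below uses made-up records; apart from AGE, which the text mentions, the field names and category labels are placeholders, not actual PUMS codes. The results are unweighted counts of sample records, not population estimates.

```python
from collections import Counter

# Hypothetical person-level records; field names other than AGE, and all
# values, are made up and are not the actual PUMS codes.
persons = [
    {"AGE": 12, "CITIZEN": "native", "ANCESTRY": "X"},
    {"AGE": 40, "CITIZEN": "naturalized", "ANCESTRY": "Y"},
    {"AGE": 25, "CITIZEN": "not a citizen", "ANCESTRY": "Y"},
    {"AGE": 67, "CITIZEN": "native", "ANCESTRY": "X"},
]

def age_group(age):
    return "under 18" if age < 18 else "18 and over"

# Each Counter key is one cell of a table you define yourself
citizenship_by_age = Counter((age_group(p["AGE"]), p["CITIZEN"]) for p in persons)
citizenship_by_ancestry = Counter((p["CITIZEN"], p["ANCESTRY"]) for p in persons)

print(citizenship_by_age)
print(citizenship_by_ancestry)
```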

The advantages of using the STF3A include a larger sample, smaller geographic units, and data adjusted to be estimates of the population. The STF3A is based on the sample of respondents who received the U.S. Census long-form questionnaire, which in 1990 was approximately a 1-in-6 sample of households. The sampling rate for persons living in group quarters was 1 in 6 individuals. Governmental units of fewer than 2,500 persons were sampled at a 1-in-2 rate, and households in census tracts with more than 2,000 households were sampled at a 1-in-8 rate. The PUMS files, on the other hand, are based on a subsample of these households and individuals, containing approximately 5% of the individuals or households in a given geographic area. The STF3A contains information down to the block group level, as well as for census tracts, places, and counties. PUMS, in contrast, identifies geography no smaller than part of a place, and sometimes only a group of counties. The numbers in the STF3A tables are adjusted to be estimates of the population values; with PUMS, the researcher bears the burden of producing those estimates.
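Producing a population estimate from PUMS typically means summing a weight over the records that meet a condition. A minimal sketch, assuming each person record carries a weight; `WEIGHT` is a placeholder name (consult the PUMS documentation for the actual weight variable), and the values are made up. In a roughly 5% sample, each record stands for about 1 / 0.05 = 20 persons.

```python
# Hypothetical PUMS person records; WEIGHT is a placeholder name and the
# values are made up.
persons = [
    {"AGE": 12, "WEIGHT": 20},
    {"AGE": 40, "WEIGHT": 18},
    {"AGE": 25, "WEIGHT": 22},
]

def weighted_count(records, predicate, weight_field="WEIGHT"):
    """Estimate a population count by summing the weights of matching records."""
    return sum(r[weight_field] for r in records if predicate(r))

adults = weighted_count(persons, lambda r: r["AGE"] >= 18)
everyone = weighted_count(persons, lambda r: True)
print(adults, everyone)  # 40 60
```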

You will want to use the STF3A if you need, or can make do with, summary data, and you either need larger samples or finer levels of geography. You will want to use PUMS if you are interested in doing model-building or if the tables of interest to you are not available in STF3A. If you use PUMS, you will have to accept the smaller sample size and the larger-grained geographic areas.

Originally revised: 11 Oct 96


These pages are Copyrighted (c) by UCLA Academic Technology Services

