UCLA Academic Technology Services

Using Pre-Defined Formats to Label 1990 PUMS Data

The census data use numeric codes to represent the values of the variables. To know what the codes mean, you usually need to refer to the codebook, and doing so repeatedly while reading PROC FREQ output is cumbersome. For example, the program in Example 1 runs PROC FREQ on the variable sex. As Output 1 shows, it is unclear whether 0 represents males or females.

Example 1. PROC FREQ on Sex Without Formats

   PROC FREQ DATA="c:\census\to90pump";
     TABLES sex;
   RUN;
   

Output 1. Output From PROC FREQ Without Formats

                                 Cumulative  Cumulative
      SEX   Frequency   Percent   Frequency    Percent
      -------------------------------------------------
       0       2995      49.6        2995       49.6
       1       3047      50.4        6042      100.0
   

By contrast, Example 2 runs the same PROC FREQ but displays the formatted values for sex. As Output 2 shows, the values are now clearly labeled Male and Female.

Example 2. PROC FREQ on Sex With Formats

   * this creates the formats;
   %INCLUDE 'c:\census\pum90.format.sas';

   * this illustrates how to use the formats;
   PROC FREQ DATA="c:\census\to90pump";
     TABLES sex;
     FORMAT sex sex.;
   RUN;

Output 2. Output From PROC FREQ with Formats

                                    Cumulative  Cumulative
         SEX   Frequency   Percent   Frequency    Percent
      ----------------------------------------------------
      Male         2995      49.6        2995       49.6
      Female       3047      50.4        6042      100.0

Two changes were made to Example 2 to display the formatted values. First, the line

     %INCLUDE 'c:\census\pum90.format.sas';  

was added, which reads in the ATS pre-defined formats for the PUMS 90 data. You can download that file here. Second, the line

     FORMAT sex sex.;

was added to the PROC FREQ step, which instructs SAS to display the variable "sex" using the format "sex.".
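
The contents of pum90.format.sas are not reproduced on this page. As a rough sketch, assuming the file defines its formats with PROC FORMAT, a definition consistent with Outputs 1 and 2 (0 displayed as Male, 1 as Female) might look like the following. The format name sexdemo is hypothetical, chosen so that it does not overwrite the ATS format sex.

```sas
* hypothetical format, for illustration only (not the ATS definition);
PROC FORMAT;
  VALUE sexdemo 0 = 'Male'
                1 = 'Female';
RUN;

* apply the illustrative format in the same way as in Example 2;
PROC FREQ DATA="c:\census\to90pump";
  TABLES sex;
  FORMAT sex sexdemo.;
RUN;
```

A format changes only how values are displayed. The variable sex still holds the codes 0 and 1 in the data file.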

ATS has created pre-defined formats for many of the variables in the 1990 PUMS data files (i.e., us90pump, us90pumh, ca90pump, ca90pumh, to90pump, to90pumh). The file format.list lists every variable that has a format, along with the corresponding format name.
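
Besides consulting format.list, you can ask SAS to print the definition of a format once it has been created. The sketch below assumes that pum90.format.sas stores its formats in the default WORK catalog; if the file writes them to a different library, the LIBRARY= option would need to name that library instead.

```sas
* create the formats (assumed to be stored in the WORK catalog);
%INCLUDE 'c:\census\pum90.format.sas';

* print the code-to-label mapping for the format "sex.";
PROC FORMAT LIBRARY=WORK FMTLIB;
  SELECT sex;
RUN;
```

Omitting the SELECT statement prints every format in the catalog, which can be lengthy.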

In general, the format has the same name as the variable, with a trailing period (e.g. the format for "sex" is "sex."). However, SAS does not permit a format name to end with a number, so the format for a variable whose name ends with a number was given an "f" as a suffix (e.g. the format for "units1" is "units1f."). Because format names were limited to eight characters, adding the suffix sometimes meant that the variable name had to be abbreviated (e.g. the format for "vacancy1" is "vcancy1f.").
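
As an illustration of the naming rule, the sketch below applies the two formats mentioned above. It assumes that units1 and vacancy1 are found in the housing file to90pumh; check the codebook or format.list to confirm which file holds each variable.

```sas
* create the formats;
%INCLUDE 'c:\census\pum90.format.sas';

* variable names ending in a number take a format name with an "f" suffix;
* (the data file name here is an assumption);
PROC FREQ DATA="c:\census\to90pumh";
  TABLES units1 vacancy1;
  FORMAT units1 units1f. vacancy1 vcancy1f.;
RUN;
```

A single FORMAT statement can pair several variables with their formats, as shown here.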


Originally revised: 15 Oct 96


These pages are Copyrighted (c) by UCLA Academic Technology Services