Creating and using formats and format libraries in SAS

SAS formats can serve many purposes. The most obvious is to format numeric and character variables in output. But you can also use formats create a data set, or you can create a format file from a data set. This page will illustrate how to do these things, as well as how to create and maintain a format library.

The most common way to create a format is use proc format. SAS provides many predefined formats for both numeric and character variables. You can also create your own formats for variables. For now, let's just use the formats that SAS has provided. In the following example, we will format the variables wage and birthdate. First, let's input a small example data set.

We will use proc print to see that the data were input correctly.

data temp;
input employee_id jobcat $ birthdate wage rating ;
cards;
1254 one 14120 11000 2
9936 two 14169 5000 0
7529 two 14187 9000 1
9154 one 14208 10000 2
7741 two 14201 9500 3
8896 two 14163 9600 4
6658 one 14365 12000 4
7854 two 14179 9600 3
9458 two 14196 8999 1
7887 two 14171 9050 0
;
run;

proc print data = temp;
run;
       employee_
Obs        id       jobcat    birthdate     wage    rating

  1       1254       one        14120      11000       2
  2       9936       two        14169       5000       0
  3       7529       two        14187       9000       1
  4       9154       one        14208      10000       2
  5       7741       two        14201       9500       3
  6       8896       two        14163       9600       4
  7       6658       one        14365      12000       4
  8       7854       two        14179       9600       3
  9       9458       two        14196       8999       1
 10       7887       two        14171       9050       0

As you can see, the output is difficult to read. The dates are not clear, and the wages could be presented with dollar signs and perhaps commas. We can use a format statement in proc print to accomplish these things.

proc print data = temp;
format birthdate date9. wage dollar9.2;
run;
       employee_
Obs        id       jobcat    birthdate         wage    rating

  1       1254       one      29AUG1998    $11000.00       2
  2       9936       two      17OCT1998    $5,000.00       0
  3       7529       two      04NOV1998    $9,000.00       1
  4       9154       one      25NOV1998    $10000.00       2
  5       7741       two      18NOV1998    $9,500.00       3
  6       8896       two      11OCT1998    $9,600.00       4
  7       6658       one      01MAY1999    $12000.00       4
  8       7854       two      27OCT1998    $9,600.00       3
  9       9458       two      13NOV1998    $8,999.00       1
 10       7887       two      19OCT1998    $9,050.00       0

Now let's consider the variables jobcat and rating. Without a codebook, their meaning is unclear. We can create our own formats so that the meaning of these values is clear.

proc format;
value $jc 'one' = 'management'
          'two' = 'non-management';
value rate 
           0 = 'terrible'
           1 = 'poor'
           2 = 'fair'
	   3 = 'good'
	   4 = 'excellent';
run;

We include our new formats on the format statement in the proc print just as we did the other formats.

proc print data = temp;
format birthdate date9. wage dollar9.2 jobcat $jc. rating rate.;
run;
       employee_
Obs        id           jobcat        birthdate         wage    rating

  1       1254      management        29AUG1998    $11000.00    fair
  2       9936      non-management    17OCT1998    $5,000.00    terrible
  3       7529      non-management    04NOV1998    $9,000.00    poor
  4       9154      management        25NOV1998    $10000.00    fair
  5       7741      non-management    18NOV1998    $9,500.00    good
  6       8896      non-management    11OCT1998    $9,600.00    excellent
  7       6658      management        01MAY1999    $12000.00    excellent
  8       7854      non-management    27OCT1998    $9,600.00    good
  9       9458      non-management    13NOV1998    $8,999.00    poor
 10       7887      non-management    19OCT1998    $9,050.00    terrible

When looking at the ratings, we see that three of the rating categories are acceptable, but two are not. We may want to create a second format indicating acceptable and unacceptable performance. A feature new to SAS version 8 is the multilabel option in proc format. This option allows you to two formats that apply to one variable. The multiple labels can be used in some procs, including proc tabulate, proc means and proc summary.  In the example below, we will use the mulitlabel option for the label called ratenew, and then we will use proc means to see the results.  Note that we need to use the mlf option on the class statement in the proc means to use the multiple labels.  The order = option is also used, and its purpose is to reorder the categories in the output.

proc format;
value $jc 'one' = 'management'
          'two' = 'non-management';
value ratenew (multilabel)
           0 = 'terrible'
           1 = 'poor'
           2 = 'fair'
		   3 = 'good'
		   4 = 'excellent'
		   0 - 1 = 'unacceptable performance'
		   2 - 4 = 'acceptable performance';
run;

proc means data = temp;
var wage;
class rating / mlf order = freq;
format rating ratenew.;
run;
                                   Analysis Variable : wage

                             N
rating                     Obs    N           Mean        Std Dev        Minimum        Maximum
-----------------------------------------------------------------------------------------------
acceptable performance       6    6       10283.33        1008.79        9500.00       12000.00

unacceptable performance     4    4        8012.25        2008.31        5000.00        9050.00

fair                         2    2       10500.00    707.1067812       10000.00       11000.00

terrible                     2    2        7025.00        2863.78        5000.00        9050.00

poor                         2    2        8999.50      0.7071068        8999.00        9000.00

good                         2    2        9550.00     70.7106781        9500.00        9600.00

excellent                    2    2       10800.00        1697.06        9600.00       12000.00
-----------------------------------------------------------------------------------------------

There are three SAS keywords that you can use when creating your formats. They are other, high and low. The keyword other is used to label any value that is not otherwise assigned a label. The keywords high and low can be used when these values are not known for the variable being labeled, or if the data set, and hence these values, may change. The keyword low will not label missing values (which are assigned the lowest possible number in SAS).

Now that we have created some formats, we need to consider how to store them. Like SAS data sets, SAS formats can be either temporary or permanent. Temporary formats remain only for the duration of your current SAS session. One advantage of temporary formats is that, because they are defined in your SAS program and are not stored in a separate file, you do not have to remember to keep the format file with the data file. Permanent formats are a type of SAS file that are usually stored in a SAS format catalog. You can create your own format catalog, or you can use a default format catalog that SAS creates.

There are at least three ways of storing the formats. If you do not use the library = option on the proc format statement, the formats are by default stored in work.formats and exist only for the current SAS session. If you specify only a libref, the formats are permanently stored in libref.formats. For example, if you created a libref called myformats, your formats would be stored in myformats.formats. The formats that we have created and used so far are temporary formats. If we wanted to make them permanent and save them in a file called myfmtlib.formats, we could do the following.

libname myfmtlib 'g:\SAS\';

proc format library = myfmtlib;
value $jc 'one' = 'management'
          'two' = 'non-management';
value rate 
           0 = 'terrible'
           1 = 'poor'
           2 = 'fair'
	   3 = 'good'
	   4 = 'excellent';
run;


If you use the library = option on the proc formats statement and specify a libref = catalog, then the formats are permanently stored in that catalog.

proc format library = myfmtlib.cat1;
value $jc 'one' = 'management'
          'two' = 'non-management';
value rate 
           0 = 'terrible'
           1 = 'poor'
           2 = 'fair'
	   3 = 'good'
	   4 = 'excellent';
run;

Now that you have created and stored you formats, the next question is when can you use them. Formats can be used in both procs and data step statements, including format statements; put statements; attrib statements; put functions in assignment, where and if statements; and format = options. Some of these uses will be illustrated below.

What do you do if you have created a format, but can't remember in which catalog you saved it? You can use the fmtsearch system option to list the catalogs that you want SAS to search for the format. By default, SAS looks in SAS supplied formats are always searched first, and work.formats is searched second. If a libref is assigned, say myformats, then myformats.formats is searched. You can override the default search order by using the following system option.

options fmtsearch = (myfmtlib myfmtlib.cat1);

In this example, SAS supplied formats would be searched first, followed by work.formats, then myfmtlib.formats, and finally myfmtlib.cat1.

You can also view the entries in your SAS format catalogs. Entries are the individual formats stored in a catalog. You can use either proc catalog or proc format to get information about the entries in your catalog. In proc catalog, you need to use the contents statement. When using proc format, you need to use the fmtlib option on the proc format statement. You can use either the select or the exclude statement to choose the formats for which you want information. (You would use the select statement if you want information on only a few formats, and you would use the exclude statement if you wanted information on all formats except a few.  Just choose the one that means the least typing!)

proc catalog catalog = myfmtlib.cat1;
contents;
run;
quit;
                        Contents of Catalog MYFMTLIB.CAT1

#    Name    Type               Create Date          Modified Date    Description
---------------------------------------------------------------------------------
1    RATE    FORMAT      28JUL2003:10:31:24     28JUL2003:10:31:24
2    JC      FORMATC     28JUL2003:10:31:24     28JUL2003:10:31:24
proc format library = myfmtlib.cat1 fmtlib;
select $jc rate;
title 'Formats for temp data set';
run;
quit;
Formats for temp data set                                                                      

----------------------------------------------------------------------------
|       FORMAT NAME: RATE     LENGTH:    9   NUMBER OF VALUES:    5        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH   9  FUZZ: STD        |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. 8.2     28JUL2003:10:31:24)|
|----------------+----------------+----------------------------------------|
|               0|               0|terrible                                |
|               1|               1|poor                                    |
|               2|               2|fair                                    |
|               3|               3|good                                    |
|               4|               4|excellent                               |
----------------------------------------------------------------------------


----------------------------------------------------------------------------
|       FORMAT NAME: $JC      LENGTH:   16   NUMBER OF VALUES:    2        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH  16  FUZZ:        0   |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. 8.2     28JUL2003:10:31:24)|
|----------------+----------------+----------------------------------------|
|one             |one             |management                              |
|two             |two             |non-management                          |
----------------------------------------------------------------------------

Unfortunately, it sometimes happens that the format library for the format files get separated from the data set, such as when a friend sends you a data set and forgets to send the formats. If the formats are permanently assigned, and you try to read the data into SAS, SAS will issue an error because it can't find the formats and stop processing. If getting the format file is not an option, you can use the nofmterr system option. This will tell SAS to read the data set without the formats. SAS will replace the missing formats with the w. or $w. default format, and SAS will issue a warning in the log telling you that it couldn't find the format file.

options nofmterr;

At some point you may find that you want to add or delete or modify some of the entries in your format. For example, if a new job category was added to our example above, say temporary workers, then we might want to add to our format $jc. There are several ways that you can do this in SAS. All of the methods involve changing the format from a format into a data set, making the desired modifications, and then changing the data set back into a format. To change a format into a data set or to change a data set into a format, you use proc format. If you are changing a format into a data set, you would use the cntlout = option on the proc format statement. If you are changing a data set into a format, you would use the cntlin = option on the proc format statement. Hence, in general, you use proc format with a cntlout = option to change the format into a data set, choose your preferred method of modifying the data set, and then use proc format with the cntlin = option to change the data set back into a format. You can then use proc format with the fmtlib option to see the results of your manipulations.

The first method that we will describe involves using proc fsedit, as shown below. This is an interactive procedure, so we will just delete one entry from the format $jc by closing the fsedit window once it opens. You can use the pull-down windows to add, copy and delete entries in the format. The first proc format is used to create a data set called fmtdata from the format $jc. Then proc fsedit is used to make the modifications to the fmtdata data set. The next proc format is used to turn the modified data set back into a format, and the last proc format is used with the fmtlib option to show us the results.

proc format lib = myfmtlib cntlout = fmtdata;
select $jc;
run;

proc fsedit data = work.fmtdata;
run;


proc format library = myfmtlib cntlin = fmtdata;
run;

proc format library = myfmtlib fmtlib;
select $jc;
run;
Formats for temp data set                                                                      

----------------------------------------------------------------------------
|       FORMAT NAME: $JC      LENGTH:   14   NUMBER OF VALUES:    2        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH  16  FUZZ:        0   |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. 8.2     28JUL2003:10:52:00)|
|----------------+----------------+----------------------------------------|
|one             |one             |management                              |
|two             |two             |non-management                          |
----------------------------------------------------------------------------

A second way to edit formats is with proc sql. In this example, we will add a new entry to the format $jc called "temporary worker". The proc format is used to create the data set called fmtdata and the select statement is used to tell SAS which format to put into the data set. The second proc format is used to return the data set to a format, and the third proc format is used with the fmtlib option on the proc format statement to show us the results.  (NOTE:  We will reissue the proc format with one minor change, adding two spaces to 'non-management', so that when we add the longer label, 'temporary worker', the last two letters won't be cut off.)

proc format library = myfmtlib;
value $jc 'one' = 'management'
          'two' = 'non-management  ';
value rate 
           0 = 'terrible'
           1 = 'poor'
           2 = 'fair'
	   3 = 'good'
           4 = 'excellent';
run;

proc format lib = myfmtlib cntlout = fmtdata;
select $jc;
run;

proc sql;
insert into fmtdata
set fmtname = '$jc',
start = 'zero',
end = 'zero',
label = 'temporary worker';
quit;

proc format library = myfmtlib cntlin = fmtdata;
run;

proc format library = myfmtlib fmtlib;
select $jc;
run;

Formats for temp data set                                                                     

----------------------------------------------------------------------------
|       FORMAT NAME: $JC      LENGTH:   16   NUMBER OF VALUES:    3        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH  16  FUZZ:        0   |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. 8.2     28JUL2003:10:55:30)|
|----------------+----------------+----------------------------------------|
|one             |one             |management                              |
|two             |two             |non-management                          |
|zero            |zero            |temporary worker                        |
----------------------------------------------------------------------------

A third method of modifying formats is to use a data step. First, you use proc format to create a data set from the format, as we did in the previous two examples. Next, you write your data step. You can add as many new labels as you like. Then you use proc format to return the data set to a format, and finally you use proc format with the fmtlib option to see the results.

proc format lib = myfmtlib cntlout = fmtdata;
select rate;
run;

data work.fmtdata;
set work.fmtdata end = last;
output;
if last then do;
fmtname = 'rate';
start = '5';
end = '5';
label = 'stupendous';
output;
end;
run;

proc format library = myfmtlib cntlin = fmtdata;
run;

proc format library = myfmtlib fmtlib;
select rate;
run;
Formats for temp data set                                                                     

----------------------------------------------------------------------------
|       FORMAT NAME: RATE     LENGTH:   10   NUMBER OF VALUES:    6        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH   9  FUZZ: STD        |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. 8.2     28JUL2003:10:57:00)|
|----------------+----------------+----------------------------------------|
|               0|               0|terrible                                |
|               1|               1|poor                                    |
|               2|               2|fair                                    |
|               3|               3|good                                    |
|               4|               4|excellent                               |
|               5|               5|stupendous                              |
----------------------------------------------------------------------------

There are two more options that we will consider. One is called the fuzz option. This allows you to create a format and instead of specifying the values to which that format should apply, you can specify a range around the value (this range is called "fuzz").  We will use a different data set for this example, and we will input the new data set after we set up the formats.

proc format;
value close (fuzz = .25) 1 = "one"
                         2 = "two"
                     other = "other";
run;

data temp1;
input number;
put number = close.;
cards;
1
1.5
1.24
1.31
1.13
2.27
2.33
2
2.12
2.01
;
run;

proc print data =  temp1;
format number close.;
run;
Obs    number

  1    one
  2    other
  3    one
  4    other
  5    one
  6    other
  7    other
  8    two
  9    two
 10    two

Finally, we will consider nested formats. In nested formats, you create a series of formats, and then in the same proc format, you can assign those formats to a range of values. In the example below, we create two formats, called dog and cat, and then we assign these formats to a range of values in the third value statement. Nest formats should be used carefully, and it is not recommended to use more than one level of nesting because nesting requires lots of computer resources. 

proc format;
value dog 1 = 'Collie'
          2 = 'Bassett'
	  3 = 'Shepard';
value cat 4 = 'Tabby'
          5 = 'Persian'
	  6 = 'Lynx';
value pet 1 - 3 = [dog10.]
          4 - 6 = [cat10.];
run;

data pets;
input animal;
cards;
1
4
3
2
5
6
4
3
2
;
run;

proc print data = pets;
format animal pet.;
run;
Obs    animal

 1     Collie
 2     Tabby
 3     Shepard
 4     Bassett
 5     Persian
 6     Lynx
 7     Tabby
 8     Shepard
 9     Bassett

Finally, let's consider outputting a file using the formats that we have created.  We will create a text file called states, and instead of outputting the numbers that we used to input the data, we will use a put statement in the data step and output the labels that we created in proc format and assigned in the data step.

proc format;
value st 1 = "CA"
         2 = "OR"
         3 = "WA"
	 4 = "HI";
run;
data wstates;
input state population;
file 'g:\sas\states.txt';
put state st. "," population;
cards;
1 100
3 120
2 110
4 111
;
run;

We can see the resulting file by opening it in a text editor, such as NotePad.

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.