Stata Learning Module
Using IF with Stata commands

This module shows the use of if with common Stata commands.

Let's use the auto data file.

sysuse auto 

For this module, we will focus on the variables make, rep78, foreign, mpg, and price. We can use the keep command to keep just these five variables.

keep make rep78 foreign mpg price  

Let's make a table of rep78 by foreign to look at the repair histories of the foreign and domestic cars.

tabulate rep78 foreign  
            |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         1 |         2          0 |         2 
         2 |         8          0 |         8 
         3 |        27          3 |        30 
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        48         21 |        69  

Suppose we wanted to focus on just the cars with repair histories of four or better. We can use if suffix to do this.

tabulate rep78 foreign if rep78 >=4 
            |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         4 |         9          9 |        18 
         5 |         2          9 |        11 
-----------+----------------------+----------
     Total |        11         18 |        29  

Let's make the above table using the column and nofreq options. The command column requests column percentages while the command nofreq suppresses cell frequencies. Note that column and nofreq come after the comma. These are options on the tabulate command and options need to be placed after a comma.

tabulate rep78 foreign if rep78 >=4, column nofreq 
           |        foreign
     rep78 |         0          1 |     Total
-----------+----------------------+----------
         4 |     81.82      50.00 |     62.07 
         5 |     18.18      50.00 |     37.93 
-----------+----------------------+----------
     Total |    100.00     100.00 |    100.00  

The use of if is not limited to the tabulate command. Here, we use it with the list command.

list if rep78 >= 4
                   make      price        mpg      rep78    foreign 
  3.        AMC Spirit       3799         22          .          0  
  5.     Buick Electra       7827         15          4          0  
  7.        Buick Opel       4453         26          .          0  
 15.      Chev. Impala       5705         16          4          0  
 20.        Dodge Colt       3984         30          5          0  
 24.       Ford Fiesta       4389         28          4          0  
 29.      Merc. Bobcat       3829         22          4          0  
 30.      Merc. Cougar       5379         14          4          0  
 33.        Merc. XR-7       6303         14          4          0  
 35.           Olds 98       8814         21          4          0  
 38.     Olds Delta 88       4890         18          4          0  
 43.       Plym. Champ       4425         34          5          0  
 45.     Plym. Sapporo       6486         26          .          0  
 47.    Pont. Catalina       5798         18          4          0  
 51.     Pont. Phoenix       4424         19          .          0  
 53.         Audi 5000       9690         17          5          1  
 55.          BMW 320i       9735         25          4          1  
 56.        Datsun 200       6229         23          4          1  
 57.        Datsun 210       4589         35          5          1  
 58.        Datsun 510       5079         24          4          1  
 59.        Datsun 810       8129         21          4          1  
 61.      Honda Accord       5799         25          5          1  
 62.       Honda Civic       4499         28          4          1  
 63.         Mazda GLC       3995         30          4          1  
 64.       Peugeot 604      12990         14          .          1  
 66.            Subaru       3798         35          5          1  
 67.     Toyota Celica       5899         18          5          1  
 68.    Toyota Corolla       3748         31          5          1  
 69.     Toyota Corona       5719         18          5          1  
 70.         VW Dasher       7140         23          4          1  
 71.         VW Diesel       5397         41          5          1  
 72.         VW Rabbit       4697         25          4          1  
 73.       VW Scirocco       6850         25          4          1  
 74.         Volvo 260      11995         17          5          1  

Did you see that some of the observations had a value of  '.' for rep78? These are missing values. For example, the value of rep78 for the AMC Spirit is missing. Stata treats a missing value as positive infinity, the highest number possible. So, when we said list if  rep78 >= 4, Stata included the observations where rep78 was '.' as well.

If we wanted to include just the valid (non-missing) observations that are greater than or equal to 4, we can do the following to tell Stata we want only observations where rep78 >= 4 and rep78 is not missing.

list if rep78 >= 4  &  !missing(rep78) 
                   make      price        mpg      rep78    foreign 
  5.     Buick Electra       7827         15          4          0  
 15.      Chev. Impala       5705         16          4          0  
 20.        Dodge Colt       3984         30          5          0  
 24.       Ford Fiesta       4389         28          4          0  
 29.      Merc. Bobcat       3829         22          4          0  
 30.      Merc. Cougar       5379         14          4          0  
 33.        Merc. XR-7       6303         14          4          0  
 35.           Olds 98       8814         21          4          0  
 38.     Olds Delta 88       4890         18          4          0  
 43.       Plym. Champ       4425         34          5          0  
 47.    Pont. Catalina       5798         18          4          0  
 53.         Audi 5000       9690         17          5          1  
 55.          BMW 320i       9735         25          4          1  
 56.        Datsun 200       6229         23          4          1  
 57.        Datsun 210       4589         35          5          1  
 58.        Datsun 510       5079         24          4          1  
 59.        Datsun 810       8129         21          4          1  
 61.      Honda Accord       5799         25          5          1  
 62.       Honda Civic       4499         28          4          1  
 63.         Mazda GLC       3995         30          4          1  
 66.            Subaru       3798         35          5          1  
 67.     Toyota Celica       5899         18          5          1  
 68.    Toyota Corolla       3748         31          5          1  
 69.     Toyota Corona       5719         18          5          1  
 70.         VW Dasher       7140         23          4          1  
 71.         VW Diesel       5397         41          5          1  
 72.         VW Rabbit       4697         25          4          1  
 73.       VW Scirocco       6850         25          4          1  
 74.         Volvo 260      11995         17          5          1   

This code will also yield the same output as above.

list if rep78 >= 4 & rep78 != .

We can use if with most Stata commands. Here, we get summary statistics for price for cars with repair histories of 1 or 2. Note the double equal (==) represents IS EQUAL TO and the pipe ( | ) represents OR.

summarize price if rep78 == 1 | rep78 == 2
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
   price |      10        5687   3216.375       3667      14500   

A simpler way to say this would be...

summarize price if rep78 <= 2 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
   price |      10        5687   3216.375       3667      14500   

Likewise, we can do this for cars with repair history of 3, 4 or 5.

summarize price if rep78 == 3 | rep78 == 4 | rep78 == 5 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
   price |      59    6223.847   2880.454       3291      15906   

Additionally, we can use this code to designate a range of values. Here is a summary of price for the values 3 through 5 in rep78.

summarize price if inrange(rep78,3,5)
Variable |       Obs        Mean    Std. Dev.       Min        Max
----------+--------------------------------------------------------
   price |        59    6223.847    2880.454       3291      15906

Let's simplify this by saying rep78 >= 3.

summarize price if rep78 >= 3 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
   price |      64    6239.984   2925.843       3291      15906   

Did you see the mistake we made? We accidentally included the missing values because we forgot to exclude them. We really needed to say.

summarize price if rep78 >= 3 & !missing(rep78) 
Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
   price |      59    6223.847   2880.454       3291      15906   

Taking a random sample

It is also possible to take a simple random sample of your data using the sample command. This information can be found on our STATA FAQ page: How can I draw a random sample of my data?

Summary

Most Stata commands can be followed by if, for example

Summarize if rep78 equals 2

summarize if rep78 == 2 

Summarize if rep78 is greater than or equal to 2

summarize if rep78 >= 2 

Summarize if rep78 greater than 2

summarize if rep78 >  2 

Summarize if rep78 less than or equal to 2

summarize if rep78 <= 2

Summarize if rep78 less than 2

summarize if rep78 <2

Summarize if rep78 not equal to 2

summarize if rep78 != 2 

If expressions can be connected with

|   for OR
& for AND

Missing Values

Missing values are represented as '.' and are the highest value possible. Therefore, when values are missing, be careful with commands like

summarize if rep78 >  3
summarize if rep78 >= 3
summarize if rep78 != 3

to omit missing values, use

summarize if rep78 >  3 & !missing(rep78)
summarize if rep78 >= 3 & !missing(rep78)
summarize if rep78 != 3 & !missing(rep78)

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.