[http://www.ats.ucla.edu/stat/_headers/header1.htm][http://www.ats.ucla.edu/stat/sas/faq/header.htm][http://www.ats.ucla.edu/stat/_headers/header2.htm]

SAS FAQ:
How do I generate a variogram for spatial data in SAS?

When analyzing geospatial data, describing the spatial pattern of a measured variable is of great importance.  SAS has several procedures that allow you to explore such patterns. Proc variogram is one such procedure. It allows you to generate a pairwise distance/difference dataset from your existing data and calculate the variogram.  The variogram illustrates how differences in a measured variable Z vary as the distances between the points at which Z is measured increase.

Let's look at an example.  Our dataset, ozone, contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month.  The dataset includes the station number, the latitude and longitude of the station, and the average of the highest eight hour daily averages. This data, and other spatial datasets, can be downloaded from the University of Illinois's Spatial Analysis Lab.

proc print data=ozone (obs = 5); run; 

Obs    Station     Av8top      Lat         Lon

  1       60      7.22581    34.1358    -117.924
  2       69      5.89919    34.1761    -118.315
  3       72      4.05289    33.8236    -118.188
  4       74      7.18145    34.1994    -118.535
  5       75      6.07661    34.0669    -117.751

In this dataset, we can use the latitude and longitude values as our coordinates and the average level of ozone as our measured variable of interest in calculating the variogram.  Unless you have prior knowledge about the distances between locations, you will first need to use proc variogram to generate a dataset that includes the pairwise distances between all of the points in your data.  In the code below, we indicate that we will output a new dataset (using the outpair option) called pairwise.  On the compute statement, we indicate novar because we are not yet computing the variogram.  On the coordinates statement, we specify our x-coordinate and y-coordinate.

proc variogram data=ozone outpair = pairwise; 
  compute novar; 
  coordinates xc=Lon yc=Lat; 
run; 

The output dataset, pairwise, contains an observation for each possible ordered pair of observations from ozone.  For a dataset of size n, a pairwise dataset of size n(n-1) will be generated. For a pair, the dataset will contain the coordinates of both locations, the distance between them, and the cosine of the angle between the two locations.

proc print data = pairwise (obs = 10);
run;

Obs  VARNAME     X1        Y1       X2        Y2    V1  V2  DISTANCE     COS
  1  Station  -117.924  34.1358  -118.315  34.1761  60  69   0.39373   0.99475
  2  Station  -117.924  34.1358  -118.188  33.8236  60  72   0.40880   0.64552
  3  Station  -117.924  34.1358  -118.535  34.1994  60  74   0.61441   0.99463
  4  Station  -117.924  34.1358  -117.751  34.0669  60  75   0.18549  -0.92848
  5  Station  -117.924  34.1358  -118.210  33.9292  60  84   0.35295   0.81064
  6  Station  -117.924  34.1358  -118.060  34.0150  60  85   0.18201   0.74783
  7  Station  -117.924  34.1358  -118.226  34.0672  60  87   0.31045   0.97527
  8  Station  -117.924  34.1358  -118.107  34.0833  60  88   0.19070   0.96136
  9  Station  -117.924  34.1358  -118.535  34.3875  60  89   0.66090   0.92466
 10  Station  -117.924  34.1358  -118.454  34.0508  60  91   0.53732   0.98741

In order to calculate the variogram, we will need to specify a lag distance and a maximum number of lags.  The lag distance is the size of the bins into which the pairwise distances are grouped.  Then for each distance bin, the variance of the pairwise differences in the measured variable (in this example, average ozone) is calculated.  Depending on the arrangement of points, the numbers of pairs in each bin can vary wildly and tends to be lower for greater distances.  Small numbers of pairs can lead to unreliable variances and possible deceptive variograms.  Thus, a maximum difference should be chosen with this in mind.  To determine appropriate values for both, we can look at a proc univariate of the pairwise distances, focusing on the quantiles portion of the output.

proc univariate data = pairwise;
  var distance;
run;

The UNIVARIATE Procedure
Variable:  DISTANCE  (Distance Between Pairs)

                            Moments
N                         992    Sum Weights                992
Mean               0.73668247    Sum Observations    730.789006
Std Deviation      0.45046795    Variance            0.20292138
Skewness           0.99590373    Kurtosis            0.90397751
Uncorrected SS     739.454532    Corrected SS        201.095086
Coeff Variation    61.1481847    Std Error Mean      0.01430237

              Basic Statistical Measures
    Location                    Variability
Mean     0.736682     Std Deviation            0.45047
Median   0.646603     Variance                 0.20292
Mode     0.072670     Range                    2.32455
                      Interquartile Range      0.59736

NOTE: The mode displayed is the smallest of 496 modes with a count of 2.

Quantiles (Definition 5)

Quantile        Estimate
100% Max       2.3972202
99%            2.1353806
95%            1.6735046
90%            1.3333395
75% Q3         0.9922617
50% Median     0.6466028
25% Q1         0.3949009
10%            0.2301238
5%             0.1690395
1%             0.0900813
0% Min         0.0726699

            Extreme Observations
-------Lowest------        -----Highest-----
     Value      Obs           Value      Obs
 0.0726699      527         2.24559      772
 0.0726699       31         2.35137      108
 0.0830621      669         2.35137      604
 0.0830621      173         2.39722      255
 0.0852947      917         2.39722      751

Given the above quantiles, a reasonable choice for a lag distance might be 0.1 and we can start with 12 as our number of lags.  The code to implement this again uses proc variogram, but with different options than the previous call. We will output a new dataset (using the outvar option) called variopoints.  On the compute statement, we indicate the lag distance and number of lags we selected based on the quantiles.  We again specify our x-coordinate and y-coordinate in the coordinates line, and then indicate the measured variable of interest on the var statement.

proc variogram data=ozone outvar = variopoints; 
  compute lagd = .1 maxlag = 12; 
  coordinates xc=Lon yc=Lat;
  var Av8top; 
run; 

proc print data = variopoints; run;

Obs    VARNAME    LAG    COUNT    DISTANCE    AVERAGE     VARIOG      COVAR
  1    Av8top      -1      32       .         6.88687     .          4.72220
  2    Av8top       0       0       .          .          .           .
  3    Av8top       1      21      0.11652    7.17801    1.84850     3.71954
  4    Av8top       2      37      0.20608    6.48717    1.98780     2.68354
  5    Av8top       3      43      0.29988    6.45338    2.44336     2.73521
  6    Av8top       4      51      0.39856    6.94997    3.08333     1.41148
  7    Av8top       5      50      0.50010    6.63555    4.43721    -0.13814
  8    Av8top       6      47      0.60300    6.78446    4.57383    -0.55327
  9    Av8top       7      41      0.70342    6.96270    6.73215    -2.07707
 10    Av8top       8      38      0.80175    7.02013    6.47101    -1.40847
 11    Av8top       9      30      0.90026    6.64823    6.72505    -2.18947
 12    Av8top      10      31      1.00109    7.44071    9.36003    -3.28606
 13    Av8top      11      26      1.09800    6.88141    6.36359    -1.29684
 14    Av8top      12      17      1.19387    7.37454    6.03411    -1.14203

We can see that our new dataset contains one observation for each of the 12 lag distances.  Additionally, there is one observation where lag equals -1 that includes locations paired with themselves and one observation where lag equals 0 for the instances where we might see multiple observations from the same location.  In the variable count, we can see how many pairs of observations fell into each lag class.  None of these counts appear particularly low, so we can proceed with 12 lags.  Had we encountered low counts, we may have considered decreasing the number of lags (if the low counts appeared at high distances) or expanding the size of a lag (if the low counts appeared at a range of distances).  The variable variog contains the estimated variogram for each distance.

Finally, we can plot the variogram using proc gplot with the dataset we generated.


symbol1 i=join l=1 c=blue; 
axis1 minor=none label=(c=black 'Lag'); 
axis2 label=(angle=90 rotate=0 c=black 'Variogram'); 

proc gplot data=variopoints; 
  plot variog*distance; 
run;

This plot suggests that points separated by short distances are correlated and the correlation decreases as the distance between points increases. This can be seen in the increase in the variogram up to a lag distance of about 0.7.  If our outcome variable is free of spatial correlation, we will expect the variogram to appear relatively constant across all distances.  Proc variogram includes many options that were not explored on this page. Please see the SAS documentation for more information on other variogram options. 

See also

References

 

[http://www.ats.ucla.edu/stat/sas/footer.htm]