When analyzing geospatial data, describing the spatial pattern of a measured variable is of great importance. SAS has several procedures that allow you to explore such patterns. Proc variogram is one such procedure. It allows you to generate a pairwise distance/difference dataset from your existing data and calculate the variogram. The variogram illustrates how differences in a measured variable Z vary as the distances between the points at which Z is measured increase.
Let's look at an example. Our dataset, ozone, contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number, the latitude and longitude of the station, and the average of the highest eight hour daily averages. This data, and other spatial datasets, can be downloaded from the University of Illinois's Spatial Analysis Lab.
proc print data=ozone (obs = 5); run; Obs Station Av8top Lat Lon 1 60 7.22581 34.1358 -117.924 2 69 5.89919 34.1761 -118.315 3 72 4.05289 33.8236 -118.188 4 74 7.18145 34.1994 -118.535 5 75 6.07661 34.0669 -117.751
In this dataset, we can use the latitude and longitude values as our coordinates and the average level of ozone as our measured variable of interest in calculating the variogram. Unless you have prior knowledge about the distances between locations, you will first need to use proc variogram to generate a dataset that includes the pairwise distances between all of the points in your data. In the code below, we indicate that we will output a new dataset (using the outpair option) called pairwise. On the compute statement, we indicate novar because we are not yet computing the variogram. On the coordinates statement, we specify our x-coordinate and y-coordinate.
proc variogram data=ozone outpair = pairwise; compute novar; coordinates xc=Lon yc=Lat; run;
The output dataset, pairwise, contains an observation for each possible ordered pair of observations from ozone. For a dataset of size n, a pairwise dataset of size n(n-1) will be generated. For a pair, the dataset will contain the coordinates of both locations, the distance between them, and the cosine of the angle between the two locations.
proc print data = pairwise (obs = 10); run; Obs VARNAME X1 Y1 X2 Y2 V1 V2 DISTANCE COS 1 Station -117.924 34.1358 -118.315 34.1761 60 69 0.39373 0.99475 2 Station -117.924 34.1358 -118.188 33.8236 60 72 0.40880 0.64552 3 Station -117.924 34.1358 -118.535 34.1994 60 74 0.61441 0.99463 4 Station -117.924 34.1358 -117.751 34.0669 60 75 0.18549 -0.92848 5 Station -117.924 34.1358 -118.210 33.9292 60 84 0.35295 0.81064 6 Station -117.924 34.1358 -118.060 34.0150 60 85 0.18201 0.74783 7 Station -117.924 34.1358 -118.226 34.0672 60 87 0.31045 0.97527 8 Station -117.924 34.1358 -118.107 34.0833 60 88 0.19070 0.96136 9 Station -117.924 34.1358 -118.535 34.3875 60 89 0.66090 0.92466 10 Station -117.924 34.1358 -118.454 34.0508 60 91 0.53732 0.98741
In order to calculate the variogram, we will need to specify a lag distance and a maximum number of lags. The lag distance is the size of the bins into which the pairwise distances are grouped. Then for each distance bin, the variance of the pairwise differences in the measured variable (in this example, average ozone) is calculated. Depending on the arrangement of points, the numbers of pairs in each bin can vary wildly and tends to be lower for greater distances. Small numbers of pairs can lead to unreliable variances and possible deceptive variograms. Thus, a maximum difference should be chosen with this in mind. To determine appropriate values for both, we can look at a proc univariate of the pairwise distances, focusing on the quantiles portion of the output.
proc univariate data = pairwise;
var distance;
run;
The UNIVARIATE Procedure
Variable: DISTANCE (Distance Between Pairs)
Moments
N 992 Sum Weights 992
Mean 0.73668247 Sum Observations 730.789006
Std Deviation 0.45046795 Variance 0.20292138
Skewness 0.99590373 Kurtosis 0.90397751
Uncorrected SS 739.454532 Corrected SS 201.095086
Coeff Variation 61.1481847 Std Error Mean 0.01430237
Basic Statistical Measures
Location Variability
Mean 0.736682 Std Deviation 0.45047
Median 0.646603 Variance 0.20292
Mode 0.072670 Range 2.32455
Interquartile Range 0.59736
NOTE: The mode displayed is the smallest of 496 modes with a count of 2.
Quantiles (Definition 5)
Quantile Estimate
100% Max 2.3972202
99% 2.1353806
95% 1.6735046
90% 1.3333395
75% Q3 0.9922617
50% Median 0.6466028
25% Q1 0.3949009
10% 0.2301238
5% 0.1690395
1% 0.0900813
0% Min 0.0726699
Extreme Observations
-------Lowest------ -----Highest-----
Value Obs Value Obs
0.0726699 527 2.24559 772
0.0726699 31 2.35137 108
0.0830621 669 2.35137 604
0.0830621 173 2.39722 255
0.0852947 917 2.39722 751
Given the above quantiles, a reasonable choice for a lag distance might be 0.1 and we can start with 12 as our number of lags. The code to implement this again uses proc variogram, but with different options than the previous call. We will output a new dataset (using the outvar option) called variopoints. On the compute statement, we indicate the lag distance and number of lags we selected based on the quantiles. We again specify our x-coordinate and y-coordinate in the coordinates line, and then indicate the measured variable of interest on the var statement.
proc variogram data=ozone outvar = variopoints; compute lagd = .1 maxlag = 12; coordinates xc=Lon yc=Lat; var Av8top; run; proc print data = variopoints; run; Obs VARNAME LAG COUNT DISTANCE AVERAGE VARIOG COVAR 1 Av8top -1 32 . 6.88687 . 4.72220 2 Av8top 0 0 . . . . 3 Av8top 1 21 0.11652 7.17801 1.84850 3.71954 4 Av8top 2 37 0.20608 6.48717 1.98780 2.68354 5 Av8top 3 43 0.29988 6.45338 2.44336 2.73521 6 Av8top 4 51 0.39856 6.94997 3.08333 1.41148 7 Av8top 5 50 0.50010 6.63555 4.43721 -0.13814 8 Av8top 6 47 0.60300 6.78446 4.57383 -0.55327 9 Av8top 7 41 0.70342 6.96270 6.73215 -2.07707 10 Av8top 8 38 0.80175 7.02013 6.47101 -1.40847 11 Av8top 9 30 0.90026 6.64823 6.72505 -2.18947 12 Av8top 10 31 1.00109 7.44071 9.36003 -3.28606 13 Av8top 11 26 1.09800 6.88141 6.36359 -1.29684 14 Av8top 12 17 1.19387 7.37454 6.03411 -1.14203
We can see that our new dataset contains one observation for each of the 12 lag distances. Additionally, there is one observation where lag equals -1 that includes locations paired with themselves and one observation where lag equals 0 for the instances where we might see multiple observations from the same location. In the variable count, we can see how many pairs of observations fell into each lag class. None of these counts appear particularly low, so we can proceed with 12 lags. Had we encountered low counts, we may have considered decreasing the number of lags (if the low counts appeared at high distances) or expanding the size of a lag (if the low counts appeared at a range of distances). The variable variog contains the estimated variogram for each distance.
Finally, we can plot the variogram using proc gplot with the dataset we generated.
symbol1 i=join l=1 c=blue; axis1 minor=none label=(c=black 'Lag'); axis2 label=(angle=90 rotate=0 c=black 'Variogram'); proc gplot data=variopoints; plot variog*distance; run;

This plot suggests that points separated by short distances are correlated and the correlation decreases as the distance between points increases. This can be seen in the increase in the variogram up to a lag distance of about 0.7. If our outcome variable is free of spatial correlation, we will expect the variogram to appear relatively constant across all distances. Proc variogram includes many options that were not explored on this page. Please see the SAS documentation for more information on other variogram options.
[http://www.ats.ucla.edu/stat/sas/footer.htm]