UCLA Academic Technology Services

SAS FAQ:
How do I fit a variogram model to my spatial data in SAS using Proc Variogram?

We often examine data with the aim of making predictions, and spatial data analysis is no exception. Given measurements of a variable at a set of points in a region, we might like to predict its value at points in the region where it was not measured (interpolation) or, possibly, at points outside the region that we believe will behave similarly (extrapolation). We can base these predictions on the measured values alone, by kriging, or we can incorporate covariates and make predictions from a regression model with spatially correlated errors. In both scenarios, we first need to fit a variogram model to our data.

You can fit a variogram model graphically, using proc variogram to calculate the empirical variogram and then plotting candidate models over it, or you can fit several variogram models using proc mixed and compare the model fits. This page walks through the first approach. For an example of the second, see SAS FAQ: How do I fit a variogram model to my spatial data in SAS using Proc Mixed? Before visually fitting a variogram model, you must first calculate and plot the variogram; for details on how to do this, see SAS FAQ: How do I generate a variogram for spatial data in SAS? Once you have calculated and plotted your variogram, assess its shape to determine which variogram model is most appropriate for your dataset.

A variogram can follow several shapes, and in fitting a variogram model we aim to describe that shape mathematically. Some commonly used variogram models are the spherical, exponential, and Gaussian models. In all three, the variogram increases with distance at small distances and then levels off toward a plateau called the sill. This shape suggests a spatial correlation that is positive and strong at small distances and weakens as distance increases, until reaching a certain distance d (the range). Pairs of points separated by more than d appear uncorrelated.
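The link between "the variogram levels off" and "points become uncorrelated" can be made explicit. For a second-order stationary process with no nugget effect, the semivariogram and covariance satisfy gamma(h) = C(0) - C(h), where the sill C(0) is the process variance, so the implied correlation is rho(h) = 1 - gamma(h)/sill. The Python sketch below assumes a spherical model with a sill of 7.5 and a range of 30 (the values used later on this page); the function names are ours, for illustration only.

```python
def spherical_semivariogram(h, sill, rng):
    """Spherical model: rises from 0 and reaches the sill exactly at h = rng."""
    if h >= rng:
        return sill
    t = h / rng
    return sill * (1.5 * t - 0.5 * t ** 3)


def implied_correlation(h, sill, rng):
    """Correlation implied by a bounded variogram with no nugget:
    rho(h) = 1 - gamma(h) / sill."""
    return 1.0 - spherical_semivariogram(h, sill, rng) / sill


if __name__ == "__main__":
    # Correlation falls from 1 at distance 0 to exactly 0 at and beyond the range.
    for h in (0, 10, 20, 30, 45):
        print(f"h = {h:>2}: correlation = {implied_correlation(h, 7.5, 30):.3f}")
```

For the spherical model the correlation is exactly zero beyond the range; for the exponential and Gaussian models it only approaches zero, which is why their "range" is often read as a practical rather than an exact cutoff.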

This page walks through an example with these three models because they are relatively straightforward and because all three are supported as spatial covariance models in proc krige2d and proc mixed, which may be useful for later analyses of our outcome variable.

We will use the thick dataset provided in the SAS documentation for proc variogram, which records coal seam thickness (thick) at 75 locations given by east and north coordinates.


data thick; 
  input east north thick @@; 
  datalines; 
   0.7  59.6  34.1   2.1  82.7  42.2   4.7  75.1  39.5  
   4.8  52.8  34.3   5.9  67.1  37.0   6.0  35.7  35.9 
   6.4  33.7  36.4   7.0  46.7  34.6   8.2  40.1  35.4    
  13.3   0.6  44.7  13.3  68.2  37.8  13.4  31.3  37.8 
  17.8   6.9  43.9  20.1  66.3  37.7  22.7  87.6  42.8  
  23.0  93.9  43.6  24.3  73.0  39.3  24.8  15.1  42.3 
  24.8  26.3  39.7  26.4  58.0  36.9  26.9  65.0  37.8  
  27.7  83.3  41.8  27.9  90.8  43.3  29.1  47.9  36.7 
  29.5  89.4  43.0  30.1   6.1  43.6  30.8  12.1  42.8 
  32.7  40.2  37.5  34.8   8.1  43.3  35.3  32.0  38.8 
  37.0  70.3  39.2  38.2  77.9  40.7  38.9  23.3  40.5 
  39.4  82.5  41.4  43.0   4.7  43.3  43.7   7.6  43.1 
  46.4  84.1  41.5  46.7  10.6  42.6  49.9  22.1  40.7 
  51.0  88.8  42.0  52.8  68.9  39.3  52.9  32.7  39.2 
  55.5  92.9  42.2  56.0   1.6  42.7  60.6  75.2  40.1 
  62.1  26.6  40.1  63.0  12.7  41.8  69.0  75.6  40.1 
  70.5  83.7  40.9  70.9  11.0  41.7  71.5  29.5  39.8 
  78.1  45.5  38.7  78.2   9.1  41.7  78.4  20.0  40.8 
  80.5  55.9  38.7  81.1  51.0  38.6  83.8   7.9  41.6 
  84.5  11.0  41.5  85.2  67.3  39.4  85.5  73.0  39.8  
  86.7  70.4  39.6  87.2  55.7  38.8  88.1   0.0  41.6 
  88.4  12.1  41.3  88.4  99.6  41.2  88.8  82.9  40.5  
  88.9   6.2  41.5  90.6   7.0  41.5  90.7  49.6  38.9  
  91.5  55.4  39.0  92.9  46.8  39.1  93.4  70.9  39.7  
  94.8  71.5  39.7  96.2  84.3  40.3  98.2  58.2  39.5 
; 

Before fitting a variogram model, we will first calculate the variogram and graph it.

proc variogram data=thick outv=outv; 
  /* lag classes 7 units wide, up to 15 lags; robust also computes the robust estimate */
  compute lagd=7 maxlag=15 robust; 
  coordinates xc=east yc=north; 
  var thick; 
run; 

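symbol1 i=join l=1 c=blue; 
axis1 minor=none label=(c=black 'Distance'); 
axis2 label=(angle=90 rotate=0 c=black 'Variogram'); 

/* haxis= and vaxis= are needed for the axis statements to take effect */
proc gplot data=outv; 
  plot variog*distance / haxis=axis1 vaxis=axis2; 
run;
quit;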
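To make clear what proc variogram computes, here is a minimal Python sketch of the two estimators requested above: the classical (Matheron) semivariance, gamma(h) = sum of (z_i - z_j)^2 / (2 N(h)), which SAS reports as VARIOG, and the Cressie-Hawkins robust estimator requested by the robust option (we believe it is stored as RVARIO in outv). Pairs are grouped into lag classes centered on multiples of lagd with half-width lagd/2; we believe this mirrors proc variogram's default lag tolerance, but treat the binning details (class boundaries, how distances are averaged) as assumptions to check against the SAS documentation before comparing numbers. The function name and return layout are hypothetical.

```python
import math
from collections import defaultdict


def empirical_semivariogram(x, y, z, lagd, maxlag):
    """Classical and Cressie-Hawkins robust semivariogram by lag class.

    Assumption: pair (i, j) falls in lag class k = floor(d / lagd + 0.5),
    i.e. classes are centered on k * lagd with half-width lagd / 2.
    Returns {k: (mean distance, pair count, classical, robust)}
    for every nonempty class k <= maxlag.
    """
    sq_sum = defaultdict(float)    # sum of squared differences per class
    root_sum = defaultdict(float)  # sum of |difference| ** 0.5 per class
    dist_sum = defaultdict(float)  # sum of pair distances per class
    count = defaultdict(int)       # number of pairs per class

    n = len(z)
    for i in range(n):
        for j in range(i + 1, n):  # each pair counted once
            d = math.hypot(x[i] - x[j], y[i] - y[j])
            k = math.floor(d / lagd + 0.5)
            if k > maxlag:
                continue
            diff = z[i] - z[j]
            sq_sum[k] += diff * diff
            root_sum[k] += math.sqrt(abs(diff))
            dist_sum[k] += d
            count[k] += 1

    result = {}
    for k in sorted(count):
        m = count[k]
        classical = sq_sum[k] / (2 * m)
        # Cressie-Hawkins: 2 * gamma = (mean |diff| ** 0.5) ** 4 / (0.457 + 0.494 / m)
        robust = (root_sum[k] / m) ** 4 / (2 * (0.457 + 0.494 / m))
        result[k] = (dist_sum[k] / m, m, classical, robust)
    return result
```

With the thick data loaded into Python lists, empirical_semivariogram(east, north, thick, lagd=7, maxlag=15) would produce one entry per nonempty lag class, analogous to the rows of outv.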

Next, we will calculate theoretical variogram values under three different models and overlay them on our data's variogram. Note that different references parameterize a given shape in slightly different ways. The formulations used here follow chapter 11 of SAS for Mixed Models, Second Edition, by Littell et al. We assume a range of 30 and a scale (sill) of 7.5, as suggested in the SAS documentation. For the spherical model, this means that the semivariance (half the variance of the differences in measured thickness) levels off at exactly 7.5 at a distance of 30. The Gaussian and exponential models approach the sill only asymptotically: with this parameterization they reach about 95% of it at distances of roughly √3 × 30 ≈ 52 and 3 × 30 = 90, respectively. Depending on the model, some rescaling of the range might therefore be useful.
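For reference, the three models computed in the data step below can be written as follows, with c_0 the sill (scale) and a_0 the range parameter. These formulas are transcribed from the data-step code on this page; other parameterizations (for example, ones that put a factor of 3 in the exponent) give different curves for the same a_0.

```latex
\begin{aligned}
\gamma_{\text{Gau}}(h) &= c_0\left[1 - \exp\!\left(-\frac{h^2}{a_0^2}\right)\right] \\
\gamma_{\text{sph}}(h) &=
  \begin{cases}
    c_0\left[\tfrac{3}{2}\,\tfrac{h}{a_0} - \tfrac{1}{2}\left(\tfrac{h}{a_0}\right)^3\right], & h < a_0 \\
    c_0, & h \ge a_0
  \end{cases} \\
\gamma_{\text{exp}}(h) &= c_0\left[1 - \exp\!\left(-\frac{h}{a_0}\right)\right]
\end{aligned}
```

As a worked check with c_0 = 7.5 and a_0 = 30: at h = 30 the spherical model equals the sill, 7.5, while the Gaussian and exponential models both equal 7.5(1 - e^{-1}) ≈ 4.74.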


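data outv_models;
  /* declare type first: otherwise its length is set by the first value   */
  /* assigned ('data', 4 characters) and the other labels are truncated   */
  length type $ 11;
  set outv;
  c0 = 7.5;   /* sill (scale) */
  a0 = 30;    /* range */

  /* empirical semivariogram from proc variogram */
  vari = variog; type = 'data'; output;

  /* Gaussian model */
  vari = c0*(1 - exp(-(distance**2)/(a0**2)));
  type = 'Gaussian'; output;

  /* spherical model: reaches the sill exactly at distance a0 */
  if distance < a0 then vari = c0*(1.5*(distance/a0) - 0.5*((distance/a0)**3));
  else vari = c0;
  type = 'spherical'; output;

  /* exponential model */
  vari = c0*(1 - exp(-distance/a0));
  type = 'exponential'; output;
run; 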

/* symbol statements are matched, in order, to the sorted values of type */
symbol1 i=join c=blue line=20; 
symbol2 i=join c=black line=1; 
symbol3 i=join c=green line=21; 
symbol4 i=join c=red line=23;
axis1 minor=none label=(c=black 'Distance'); 
axis2 order=(0 to 9 by 1) minor=none label=(angle=90 rotate=0 c=black 'Variogram');  

proc gplot data=outv_models; 
  plot vari*distance=type / haxis=axis1 vaxis=axis2; 
run;
quit;

Looking at the graph of the three theoretical variograms and the variogram calculated from our data, we can see that all three theoretical curves follow the general increase-then-level-off shape of the data, but the Gaussian variogram appears to match the data most closely. Thus, if we were to model coal seam thickness in this dataset with spatially correlated errors, we would specify a Gaussian spatial covariance structure (for example, type=sp(gau) in proc mixed). This is consistent with the findings of the model-fitting approach to selecting a variogram model.
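The visual judgment can be supplemented with a simple numerical check. The Python sketch below is our own addition, not part of the SAS workflow on this page: it scores each candidate model, with the sill and range fixed at 7.5 and 30, by the sum of squared differences from the empirical semivariogram, weighting each lag class by its number of pairs (we believe this is the COUNT variable in outv). The function names are hypothetical; the inputs would be the distance, variog, and count columns of outv, exported however is convenient, with any lag classes that have a missing distance dropped first.

```python
import math


def gaussian(h, c0, a0):
    return c0 * (1 - math.exp(-(h * h) / (a0 * a0)))


def spherical(h, c0, a0):
    if h >= a0:
        return c0
    t = h / a0
    return c0 * (1.5 * t - 0.5 * t ** 3)


def exponential(h, c0, a0):
    return c0 * (1 - math.exp(-h / a0))


MODELS = {"Gaussian": gaussian, "spherical": spherical, "exponential": exponential}


def weighted_sse(distances, semivariances, counts, model, c0=7.5, a0=30.0):
    """Pair-count-weighted sum of squared differences between an empirical
    semivariogram and a theoretical model with fixed sill c0 and range a0."""
    return sum(
        n * (g - model(h, c0, a0)) ** 2
        for h, g, n in zip(distances, semivariances, counts)
    )


def rank_models(distances, semivariances, counts, c0=7.5, a0=30.0):
    """Return (model name, weighted SSE) pairs, best-scoring model first."""
    scores = {
        name: weighted_sse(distances, semivariances, counts, f, c0, a0)
        for name, f in MODELS.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```

rank_models(distance, variog, count)[0][0] gives the name of the best-scoring model. A lower score only means a closer match for these fixed parameters; it does not replace the likelihood-based comparison of the proc mixed approach.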

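This graph also shows how the three theoretical variograms differ in shape. With the parameters used here, the spherical model rises most steeply near the origin and reaches its sill exactly at the range; the exponential model also rises immediately, is concave throughout, and approaches the sill only gradually; and the Gaussian model starts out flat (it is parabolic near the origin), giving an S-shaped curve that catches up with the exponential at the range and levels off sooner. For further variogram shapes to consider, please see the references below.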
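These shape differences can be quantified by asking how quickly each model approaches its sill. This Python sketch uses the same parameterizations as the data step (sill 7.5, range parameter 30) and finds, by bisection, the distance at which each curve first reaches 95% of the sill, a common "practical range" convention; the helper names are ours.

```python
import math

C0, A0 = 7.5, 30.0  # sill and range parameter used on this page


def gaussian(h):
    return C0 * (1 - math.exp(-(h * h) / (A0 * A0)))


def spherical(h):
    if h >= A0:
        return C0
    t = h / A0
    return C0 * (1.5 * t - 0.5 * t ** 3)


def exponential(h):
    return C0 * (1 - math.exp(-h / A0))


def distance_to_fraction(model, fraction=0.95, hi=1000.0, tol=1e-9):
    """Smallest h with model(h) >= fraction * C0, found by bisection.
    Assumes model is nondecreasing in h and reaches the target before hi."""
    lo = 0.0
    target = fraction * C0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if model(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    for name, f in (("spherical", spherical), ("Gaussian", gaussian),
                    ("exponential", exponential)):
        print(f"{name:<12} gamma(5) = {f(5):.2f}   "
              f"95% of sill at h = {distance_to_fraction(f):.1f}")
```

Near the origin the spherical model is highest and the Gaussian lowest, while the distances to 95% of the sill come out at roughly 24, 52, and 90 for the spherical, Gaussian, and exponential models.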


References


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.