|
|
|
||||
|
Help the Stat Consulting Group by
giving a gift
| |||||
|
Loading
|
|||||
Kriging is a technique used in the analysis of spatial data. Once the spatial correlation structure of a variable has been identified, the data from the measured locations can be used to estimate the variable at locations where it had not been measured. This extrapolation from measured locations to unmeasured locations is called kriging. This page will walk through an example of ordinary kriging using proc krige2d in SAS. It is not rigorous in its presentation of kriging nor exhaustive in its presentation of proc krige2d, but hopefully illustrates the basic use of the technique and procedure.
There are several assumptions underlying ordinary kriging:
We assumed second-order stationarity when we fit a variogram model in the related page SAS FAQ: How do I fit a variogram model to my spatial data in SAS using Proc Variogram?. We will be using the same dataset and the Gaussian variogram model that appeared to best fit our data. The dataset contains measurements of coal seam thickness at 75 locations.
In ordinary kriging, we allow the variable values from the measured locations to act as covariates for the values at unmeasured locations. For each predicted location, we fit a linear function to these covariates (opting for all measured points or, more often, a subset of points in the neighborhood of the predicted point), applying the variance-covariance structure implied in the variogram. This generates an unbiased linear prediction and a standard error of the value at the location.
First, let's read in the data.
data thick; input east north thick @@; datalines; 0.7 59.6 34.1 2.1 82.7 42.2 4.7 75.1 39.5 4.8 52.8 34.3 5.9 67.1 37.0 6.0 35.7 35.9 6.4 33.7 36.4 7.0 46.7 34.6 8.2 40.1 35.4 13.3 0.6 44.7 13.3 68.2 37.8 13.4 31.3 37.8 17.8 6.9 43.9 20.1 66.3 37.7 22.7 87.6 42.8 23.0 93.9 43.6 24.3 73.0 39.3 24.8 15.1 42.3 24.8 26.3 39.7 26.4 58.0 36.9 26.9 65.0 37.8 27.7 83.3 41.8 27.9 90.8 43.3 29.1 47.9 36.7 29.5 89.4 43.0 30.1 6.1 43.6 30.8 12.1 42.8 32.7 40.2 37.5 34.8 8.1 43.3 35.3 32.0 38.8 37.0 70.3 39.2 38.2 77.9 40.7 38.9 23.3 40.5 39.4 82.5 41.4 43.0 4.7 43.3 43.7 7.6 43.1 46.4 84.1 41.5 46.7 10.6 42.6 49.9 22.1 40.7 51.0 88.8 42.0 52.8 68.9 39.3 52.9 32.7 39.2 55.5 92.9 42.2 56.0 1.6 42.7 60.6 75.2 40.1 62.1 26.6 40.1 63.0 12.7 41.8 69.0 75.6 40.1 70.5 83.7 40.9 70.9 11.0 41.7 71.5 29.5 39.8 78.1 45.5 38.7 78.2 9.1 41.7 78.4 20.0 40.8 80.5 55.9 38.7 81.1 51.0 38.6 83.8 7.9 41.6 84.5 11.0 41.5 85.2 67.3 39.4 85.5 73.0 39.8 86.7 70.4 39.6 87.2 55.7 38.8 88.1 0.0 41.6 88.4 12.1 41.3 88.4 99.6 41.2 88.8 82.9 40.5 88.9 6.2 41.5 90.6 7.0 41.5 90.7 49.6 38.9 91.5 55.4 39.0 92.9 46.8 39.1 93.4 70.9 39.7 94.8 71.5 39.7 96.2 84.3 40.3 98.2 58.2 39.5 ;
Recall that we decided on a range value of 30 and a scale value of 7.5 when fitting the variogram. Before kriging, we will need to choose a set of unmeasured locations to predict. To do this, let's look at the range of our measured locations with a proc univariate on east and north.
proc univariate data = thick; var east north; run; The UNIVARIATE Procedure Variable: east [...output omitted...] Quantile Estimate 100% Max 98.2 99% 98.2 95% 93.4 90% 90.6 75% Q3 83.8 50% Median 46.7 25% Q1 24.8 10% 7.0 5% 4.8 1% 0.7 0% Min 0.7 Variable: north [...output omitted...] Quantile Estimate 100% Max 99.6 99% 99.6 95% 90.8 90% 84.3 75% Q3 73.0 50% Median 51.0 25% Q1 15.1 10% 7.0 5% 4.7 1% 0.0 0% Min 0.0
From the values seen above, we can see that our measured points all fall in a 100 by 100 unit grid. We can plot the measured points to see what our data raw data looks like.
proc g3d data=thick; scatter north*east=thick; run;

Let's assume we wish to predict the seam thickness over the entire 100 by 100 grid--that is, to try to fit a surface to the 3-D scatterplot above. We can indicate to SAS what exact locations to estimate. Additionally, we need to choose which points should be used in calculating a fitted value for each point. We will refer to this as the "neighborhood" of each point. We can define a neighborhood based on a radius about a point (neighborhood = all points with distance d of the given point), a number of points (neighborhood = n points closest to the given point), or a combination of the two (neighborhood = all points within distance d, decreasing d if number of points within d exceeds max or increasing d if number of points within d is less than min).
Your neighborhood definition should reflect what you already know about your data. For example, the range seen in the variogram for this dataset was 30 and the shape of the Gaussian distribution requires rescaling this by the square root of 3 (see Littel, et. al. for how to scale ranges by distribution). Thus, assuming a Gaussian model, the range is closer to 50. We want our radius to be at least this big in order to fully reflect the spatial correlations in the data.
In the code below, we use proc krige2d to implement our kriging plan. We indicate our coordinate variables, north and east, and describe the locations of the points to predict in the grid statement, opting to predict values at the corners of 5 by 5 unit squares dividing our 100 by 100 unit region (441 total locations). In the pred statement, we indicate the predicted variable and our neighborhood definition (a radius of 60 units). Finally, in the model statement we provide the form of the variogram model and the parameter values. We will generate a dataset, est, containing estimated values at the indicated points.
proc krige2d data=thick outest=est; coord xc=east yc=north; grid x=0 to 100 by 5 y=0 to 100 by 5; pred var=thick r=60; model scale=7.5 range=30 form=gauss; run;
proc print data = est (obs = 10); run; Obs LABEL VARNAME GXC GYC NPOINTS ESTIMATE STDERR 1 Pred1.Model1 thick 0 0 23 43.9408 0.68260 2 Pred1.Model1 thick 0 5 25 43.2266 0.65560 3 Pred1.Model1 thick 0 10 28 41.6828 0.55909 4 Pred1.Model1 thick 0 15 29 40.4906 0.43590 5 Pred1.Model1 thick 0 20 31 38.9601 0.30185 6 Pred1.Model1 thick 0 25 32 37.4150 0.19440 7 Pred1.Model1 thick 0 30 32 36.1701 0.12705 8 Pred1.Model1 thick 0 35 35 34.9685 0.08115 9 Pred1.Model1 thick 0 40 39 33.8376 0.04872 10 Pred1.Model1 thick 0 45 40 33.0759 0.03615
In the dataset created by proc krige2d, we can see the coordinates of the predicted points, the number of measured points used in each estimate, the kriging estimate of our variable, and the standard error of the estimate. We can now plot a surface based on our kriging estimates.
proc g3d data=est;
plot gyc*gxc=estimate ;
label gyc = 'North'
gxc = 'East'
estimate = 'Estimated Thickness';
run;

In evaluating our estimates, we can look at the surface of standard errors to see where our most stable estimates can be found.
proc g3d data=est;
plot gyc*gxc=STDERR;
label gyc = 'North'
gxc = 'East'
STDERR= 'Standard Errors';
run;

In this plot, we can see that the points on the edges and especially at the corners of our grid have the highest standard errors. Because of their locations on the edges of the measured area, these points probably have far fewer points in their radius-defined neighborhood than points closer to the center of the grid. We can look at a plot of the number of points used in a location's prediction to examine this.
proc g3d data=est;
plot gyc*gxc=NPOINTS;
label gyc = 'North'
gxc = 'East'
NPOINTS = 'Number of Points';
run;

Our suspicions are confirmed: the edge points have far fewer points contributing to their estimates, generally resulting in higher standard errors than non-edge points. However, we can see that the most central location appears to have the most neighbors, but is an elevated point in the standard errors surface. Additional points do not always improve estimates, especially if the additional points are outside the range of spatial correlation. Your grid and neighborhood options can be used to focus on the locations of greatest interest or uncertainty. Consult the SAS documentation for proc krige2d for details.
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services