UCLA Academic Technology Services HomeServicesClassesContactJobs

R FAQ:
How do I generate a variogram for spatial data in R?

When analyzing geospatial data, describing the spatial pattern of a measured variable is of great importance.  A common way of visualizing the spatial autocorrelation of a variable is a variogram plot.  This can be done in R.  There are several libraries with variogram capabilities.  We will show how to generate a variogram using the geoR library.  To install, enter install.packages("geoR") and then library(geoR) in R.

Let's look at an example. Our dataset, ozone, contains ozone measurements from thirty-two locations in the Los Angeles area aggregated over one month. The dataset includes the station number (Station), the latitude and longitude of the station (Lat and Lon), and the average of the highest eight hour daily averages (Av8top). This data, and other spatial datasets, can be downloaded from the University of Illinois's Spatial Analysis Lab. By generating a variogram, we will be able to look at the variance of the differences of Av8top among pairs of stations at different distances.  We can look at a sample of our data and then a summary of the distances between the stations.

ozone<-read.table("http://www.ats.ucla.edu/stat/R/faq/ozone.csv", sep=",", header=T)
head(ozone, n=10)

   Station   Av8top      Lat       Lon
1       60 7.225806 34.13583 -117.9236
2       69 5.899194 34.17611 -118.3153
3       72 4.052885 33.82361 -118.1875
4       74 7.181452 34.19944 -118.5347
5       75 6.076613 34.06694 -117.7514
6       84 3.157258 33.92917 -118.2097
7       85 5.201613 34.01500 -118.0597
8       87 4.717742 34.06722 -118.2264
9       88 6.532258 34.08333 -118.1069
10      89 7.540323 34.38750 -118.5347

dists <- dist(ozone[,3:4])
summary(dists)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.07267 0.39550 0.64660 0.73670 0.99190 2.39700

Next, we can calculate a variogram using the latitude and longitude of the stations. For this, we will use the variog command. We will indicate the distance intervals we wish to consider.  Based on the summary of distances, we can look at 10 lag intervals of .15.  To do this, we will first create a breaks vector of the endpoints of our intervals.  To variog, we provide our coordinate variables and the "data", the variable of interest.  Then, we can look at the variog output that we will be plotting, the semi-variance, and the number of pairs counted in each interval.  We want to check that our variogram is not calculating the semi-variance on small numbers of pairs. 


breaks = seq(0, 1.5, l = 11)
v1 <- variog(coords = ozone[,3:4], data = ozone[,2], breaks = breaks)

v1.summary <- cbind(c(1:10), v1$v, v1$n)
colnames(v1.summary) <- c("lag", "semi-variance", "# of pairs")

v1.summary

      lag semi-variance # of pairs
 [1,]   1      1.848505         21
 [2,]   2      1.841220         57
 [3,]   3      3.120452         74
 [4,]   4      4.375926         67
 [5,]   5      5.910725         71
 [6,]   6      7.097913         55
 [7,]   7      7.896033         44
 [8,]   8      6.571356         37
 [9,]   9      4.071090         23
[10,]  10      3.317602         16

Now, we can plot the variogram:


plot(v1, type = "b", main = "Variogram: Av8top") 

In the summary, we can see lag distances up to 10*.15 = 1.5, the number of pairs that are this far apart in the dataset, and the semi-variance. As we can see from the plot, the semi-variance increases until the lag distance exceeds 1.

Variograms in other packages:

References:


How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.