|
|
|
||||
|
Help the Stat Consulting Group by
giving a gift
| |||||
|
Loading
|
|||||
Singular value decomposition (SVD) is a type of matrix factorization. For more details on SVD, the Wikipedia page is a good starting point. On this page, we provide four examples of data analysis using SVD in R.
For a square matrix A with a non-zero determinant, there exists an inverse matrix B such that AB = I and BA = I. For a matrix that is not square, generalized inverse matrices have some (but not all) of the properties of an inverse matrix. SVD can be used to find a generalized inverse matrix. In the example below, we use SVD to find a generalized inverse B to the matrix A such that ABA = A. We compare our generalized inverse with the one generated by the ginv command.
library(MASS)
a <- matrix(c(1,1,1,1,1,1,1,1,1,
1,1,1,0,0,0,0,0,0,
0,0,0,1,1,1,0,0,0,
0,0,0,0,0,0,1,1,1), 9, 4)
a.svd <- svd(a)
a.svd$d
3.464e+00 1.732e+00 1.732e+00 9.688e-17
ds <- diag(1 / a.svd$d[1:3])
u <- a.svd$u
v <- a.svd$v
us <- as.matrix(u[, 1:3])
vs <- as.matrix(v[, 1:3])
(a.ginv <- vs %*% ds %*% t(us))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333
[2,] 0.25000 0.25000 0.25000 -0.08333 -0.08333 -0.08333 -0.08333 -0.08333 -0.08333
[3,] -0.08333 -0.08333 -0.08333 0.25000 0.25000 0.25000 -0.08333 -0.08333 -0.08333
[4,] -0.08333 -0.08333 -0.08333 -0.08333 -0.08333 -0.08333 0.25000 0.25000 0.25000
# using the function ginv defined in MASS
ginv(a)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333 0.08333
[2,] 0.25000 0.25000 0.25000 -0.08333 -0.08333 -0.08333 -0.08333 -0.08333 -0.08333
[3,] -0.08333 -0.08333 -0.08333 0.25000 0.25000 0.25000 -0.08333 -0.08333 -0.08333
[4,] -0.08333 -0.08333 -0.08333 -0.08333 -0.08333 -0.08333 0.25000 0.25000 0.25000
The code below requires the rimage package. It reads in a jpeg (pansy.jpg) and plots it in R, first in color (when the image is stored as three matrices--one red, one green, one blue) and then in grayscale (when the image is stored as one matrix). Then, using SVD, we can essentially compress the image. Note that we can recover the image to varying degrees of detail as we recreate the image from different numbers of dimensions from our SVD matrices. You can see how many dimensions are needed before you have an image that cannot be differentiated from the original.
library(rimage)
x <- read.jpeg("pansy.jpg")
dim(x)
[1] 600 465 3
plot(x, useRaster = TRUE)
r <- imagematrix(x, type="grey")
dim(r)
[1] 600 465
plot(r, useRaster= TRUE)
r.svd <- svd(r)
d <- diag(r.svd$d)
dim(d)
[1] 465 465
u <- r.svd$u
v <- r.svd$v
plot(1:length(r.svd$d), r.svd$d)
# first approximation
u1 <- as.matrix(u[-1, 1])
v1 <- as.matrix(v[-1, 1])
d1 <- as.matrix(d[1, 1])
l1 <- u1 %*% d1 %*% t(v1)
l1g <- imagematrix(l1, type = "grey")
plot(l1g, useRaster = TRUE)
# more approximation
depth <- 5
us <- as.matrix(u[, 1:depth])
vs <- as.matrix(v[, 1:depth])
ds <- as.matrix(d[1:depth, 1:depth])
ls <- us %*% ds %*% t(vs)
lsg <- imagematrix(ls, type = "grey")
plot(lsg, useRaster = TRUE)
This example uses the Stata dataset auto.dta. PCA can be achieved using SVD. Below, we first use the prcomp command in R and then plot the variances of the principal components (i.e. the square roots of the eigenvalues). These values can also be found through spectral decomposition on the correlation matrix or by SVD on the variable matrix after standardizing each variable.
llibrary(foreign)
auto <- read.dta("http://statistics.ats.ucla.edu/stat/data/auto.dta")
pca.m1 <- prcomp(~ trunk + weight + length + headroom, data = auto, scale = TRUE)
screeplot(pca.m1)
# spectral decomposition: eigen values and eigen vectors
xvars <- with(auto, cbind(trunk, weight, length, headroom))
corr <- cor(xvars)
a <- eigen(corr)
(std <- sqrt(a$values))
[1] 1.7379 0.8075 0.5264 0.2249
(rotation <- a$vectors)
[,1] [,2] [,3] [,4]
[1,] -0.5068 -0.2327 0.8249 0.092146
[2,] -0.5221 0.4536 -0.2677 0.670840
[3,] -0.5361 0.3903 -0.1370 -0.735833
[4,] -0.4280 -0.7667 -0.4786 -0.005704
# svd approach
df <- nrow(xvars) - 1
zvars <- scale(xvars)
z.svd <- svd(zvars)
z.svd$d/sqrt(df)
[1] 1.7379 0.8075 0.5264 0.2249
z.svd$v
[,1] [,2] [,3] [,4]
[1,] 0.5068 -0.2327 0.8249 -0.092146
[2,] 0.5221 0.4536 -0.2677 -0.670840
[3,] 0.5361 0.3903 -0.1370 0.735833
[4,] 0.4280 -0.7667 -0.4786 0.005704
This example uses the Stata dataset cerealnut.dta. Multi-dimensional scaling can also be achieved using SVD. The plots generated using cmdscale and the coordinates generated from the SVD steps are mirrored about the x/x1 = 0 axis, but are otherwise identical.
cnut <- read.dta("http://statistics.ats.ucla.edu/stat/data/cerealnut.dta")
# centering the variables
mds.data <- as.matrix(sweep(cnut[, -1], 2, colMeans(cnut[, -1])))
dismat <- dist(mds.data)
mds.m1 <- cmdscale(dismat, k = 8, eig = TRUE)
mds.m1$eig
[1] 1.584e+05 1.087e+05 1.056e+04 3.827e+02 6.976e+01 1.252e+01 5.756e+00
[8] 2.224e+00 6.467e-12 5.759e-12 2.660e-12 2.376e-12 2.214e-12 1.020e-12
[15] 7.264e-13 5.392e-13 3.527e-13 -5.217e-14 -3.928e-13 -6.939e-13 -1.011e-12
[22] -1.098e-12 -1.448e-12 -3.814e-12 -1.241e-11
mds.m1 <- cmdscale(dismat, k = 2, eig = TRUE)
x <- mds.m1$points[, 1]
y <- mds.m1$points[, 2]
plot(x, y)
text(x + 20, y, label = cnut$brand)
# eigenvalues
xx <- svd(mds.data %*% t(mds.data))
xx$d
[1] 1.584e+05 1.087e+05 1.056e+04 3.827e+02 6.976e+01 1.252e+01 5.756e+00 2.224e+00
[9] 8.195e-12 5.658e-12 4.377e-12 3.204e-12 2.204e-12 2.142e-12 1.816e-12 1.708e-12
[17] 1.292e-12 1.239e-12 1.146e-12 8.363e-13 6.949e-13 4.491e-13 2.551e-13 1.103e-13
[25] 5.319e-14
# coordinates
xxd <- xx$v %*% sqrt(diag(xx$d))
x1 <- xxd[, 1]
y1 <- xxd[, 2]
plot(x1, y1)
text(x1 + 20, y1, label = cnut$brand)
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services