| Function | Description |
|---|---|
| t.test | t-tests: one-sample, two-sample, and paired |
| tapply | applies a function to each cell of a ragged array (for example, to each group of a variable) |
| var | calculates the variance |
| lm | fits a linear model (regression) |
| anova | extracts the ANOVA table from an lm object |
| summary | generic function that provides a synopsis of an object |
| fitted | extracts the fitted values from an lm object |
| resid | extracts the residuals from an lm object |
| plot | generic function, used here to produce the default diagnostic plots of an lm object and to draw a scatter plot of two continuous variables |
| glm | fits generalized linear models (for example, logistic regression) |
| wilcox.test | nonparametric Wilcoxon signed-rank and rank-sum tests |
| kruskal.test | nonparametric Kruskal-Wallis rank sum test |
Read in the hs1 data over the internet with the read.table function. We also use the attach function to place the data set on R's search path, so that its variables can be referred to by name.
rm(list = ls())   # clear all objects from the workspace
hs1 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hs1.csv", header = TRUE, sep = ",")
attach(hs1)       # make the columns of hs1 available by name
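This is a chi-square test of independence for the two-way table of female by ses. Calling summary on a table object reports the Pearson chi-square statistic, its degrees of freedom, and the p-value.
tab1 <- table(female, ses)
# chi-square test of independence
summary(tab1)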
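As a self-contained check, the sketch below uses simulated data (the data frame sim and its columns are hypothetical stand-ins for female and ses) to show that summary of a two-way table reports the same Pearson chi-square statistic as chisq.test, which also returns the expected counts.

```r
# Simulated stand-in for the hs1 variables female and ses (hypothetical data)
set.seed(1)
sim <- data.frame(
  female = sample(0:1, 200, replace = TRUE),
  ses    = sample(1:3, 200, replace = TRUE)
)

tab <- table(sim$female, sim$ses)  # 2 x 3 contingency table

# summary() of a table runs a Pearson chi-square test of independence
s <- summary(tab)

# chisq.test() gives the same statistic; Yates' continuity correction only
# applies to 2 x 2 tables, and correct = FALSE turns it off explicitly anyway
ct <- chisq.test(tab, correct = FALSE)

c(summary = s$statistic, chisq.test = unname(ct$statistic))
ct$expected  # expected cell counts under independence
```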
This is the one-sample t-test, testing whether the sample of writing scores was drawn from a population with a mean of 50.
t.test(write, mu=50)
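The one-sample t statistic is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean. The sketch below computes it by hand on a simulated sample (x is a hypothetical stand-in for write) and compares it with t.test.

```r
# Hypothetical sample standing in for the write scores
set.seed(2)
x  <- round(rnorm(40, mean = 52, sd = 9))
mu <- 50

# t = (sample mean - mu) / (s / sqrt(n)), with n - 1 degrees of freedom
n      <- length(x)
t_hand <- (mean(x) - mu) / (sd(x) / sqrt(n))
p_hand <- 2 * pt(-abs(t_hand), df = n - 1)   # two-sided p-value

tt <- t.test(x, mu = mu)
c(by_hand = t_hand, t.test = unname(tt$statistic))
c(by_hand = p_hand, t.test = tt$p.value)
```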
This is the paired t-test, testing whether the mean of the differences between write and read is zero (that is, whether the mean of write equals the mean of read).
t.test(write, read, paired=TRUE)
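A paired t-test is a one-sample t-test on the within-pair differences. The sketch below illustrates this with simulated paired scores (wr and rd are hypothetical stand-ins for write and read).

```r
# Hypothetical paired scores standing in for write and read
set.seed(3)
rd <- round(rnorm(30, 52, 10))
wr <- round(rd + rnorm(30, 1, 6))

paired <- t.test(wr, rd, paired = TRUE)
diffs  <- t.test(wr - rd, mu = 0)   # one-sample test on the differences

# Both calls give the same t statistic, degrees of freedom, and p-value
c(paired = unname(paired$statistic), on_diffs = unname(diffs$statistic))
```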
This is the two-sample independent t-test. We can use either the by function or the tapply function to look at the variance of write within each group of female. The first t.test call below assumes equal variances (the pooled-variance t-test); the second assumes unequal variances (the Welch t-test), which is the default in the t.test function.
by(write, female, var)
tapply(write, female, var)
# assuming equal variances
t.test(write ~ female, var.equal = TRUE)
# assuming unequal variances (Welch; the default)
t.test(write ~ female, var.equal = FALSE)
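The sketch below, on simulated data (the data frame sim and its columns score and group are hypothetical), confirms that calling t.test without var.equal gives the Welch result, and computes the pooled-variance t statistic by hand.

```r
set.seed(4)
sim <- data.frame(
  score = c(rnorm(25, 50, 8), rnorm(35, 54, 12)),
  group = rep(c(0, 1), c(25, 35))
)

default <- t.test(score ~ group, data = sim)                    # no var.equal given
welch   <- t.test(score ~ group, data = sim, var.equal = FALSE)
pooled  <- t.test(score ~ group, data = sim, var.equal = TRUE)

# Pooled-variance t statistic by hand: group 0 minus group 1
x1 <- sim$score[sim$group == 0]
x2 <- sim$score[sim$group == 1]
n1 <- length(x1); n2 <- length(x2)
sp2 <- ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
t_pooled <- (mean(x1) - mean(x2)) / sqrt(sp2 * (1 / n1 + 1 / n2))

c(default = unname(default$statistic), welch = unname(welch$statistic),
  pooled = unname(pooled$statistic), by_hand = t_pooled)
```

The Welch test also uses fractional (Satterthwaite) degrees of freedom, while the pooled test uses n1 + n2 - 2.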
In R you can run a one-way ANOVA either with the aov function or with the anova function applied to a model fitted by lm. Both give the same ANOVA table. The anova function extracts the ANOVA table from the linear model fitted by lm. The aov function fits the model (it calls lm internally), and we use the summary function to display the ANOVA table.
anova(lm(write ~ factor(prog)))
summary(aov(write ~ factor(prog)))
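To see that the two routes agree without downloading hs1, the sketch below uses R's built-in PlantGrowth data (plant weight under three treatment groups) in place of write and prog.

```r
# PlantGrowth ships with R: weight under three groups (ctrl, trt1, trt2)
data(PlantGrowth)

a_lm  <- anova(lm(weight ~ group, data = PlantGrowth))
a_aov <- summary(aov(weight ~ group, data = PlantGrowth))[[1]]

a_lm    # Df, Sum Sq, Mean Sq, F value, Pr(>F)
a_aov   # the same table produced through aov()
```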
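The following is an example of a two-way factorial ANOVA. Note that anova reports sequential (Type I) sums of squares: each term is adjusted only for the terms listed before it, so with unbalanced data the results depend on the order of the terms in the formula.
m2 <- lm(write ~ factor(prog) * factor(female))
anova(m2)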
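The order dependence of Type I sums of squares can be seen on a deliberately unbalanced design. In the sketch below the factors a and b and the response y are hypothetical simulated data with cell sizes 15, 5, 5, and 15.

```r
set.seed(6)
# Unbalanced 2 x 2 design: cell sizes 15, 5, 5, 15 (hypothetical factors a and b)
a <- factor(rep(c("a1", "a2"), each = 20))
b <- factor(c(rep("b1", 15), rep("b2", 5), rep("b1", 5), rep("b2", 15)))
y <- rnorm(40) + (a == "a2") + 0.5 * (b == "b2")

ab <- anova(lm(y ~ a * b))  # a entered first
ba <- anova(lm(y ~ b * a))  # b entered first

# The sequential SS for a changes with the order of entry
c(a_first = ab["a", "Sum Sq"], a_second = ba["a", "Sum Sq"])

# In either order the sequential SS (including Residuals) add up to the total SS
c(sum(ab[["Sum Sq"]]), sum(ba[["Sum Sq"]]), sum((y - mean(y))^2))
```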
Here is an analysis of covariance (ANCOVA). In this example, prog is the categorical predictor and read is the continuous covariate.
anova(lm(write ~ factor(prog) + read))
summary(aov(write ~ factor(prog) + read))
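Because the sums of squares are sequential, the covariate listed last is adjusted for the categorical predictor, and its 1-df F statistic equals the square of its t statistic from summary of the lm fit. The sketch below checks this on simulated data; program, pretest, and score are hypothetical stand-ins for prog, read, and write.

```r
set.seed(7)
n <- 90
program <- factor(sample(1:3, n, replace = TRUE))  # hypothetical 3-level factor
pretest <- rnorm(n, 52, 10)                        # hypothetical covariate
score   <- 20 + c(0, 3, -2)[program] + 0.6 * pretest + rnorm(n, 0, 7)

fit <- lm(score ~ program + pretest)
tab <- anova(fit)

# pretest is entered last, so its sequential F equals the squared t from summary()
F_pre <- tab["pretest", "F value"]
t_pre <- coef(summary(fit))["pretest", "t value"]
c(F = F_pre, t_squared = t_pre^2)
```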
Ordinary least squares (OLS) regression.
summary(lm(write ~ female + read))
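The OLS coefficients solve the normal equations (X'X) b = X'y. The sketch below solves them directly on simulated data (fem, rd, and wr are hypothetical stand-ins for female, read, and write) and compares the result with lm. lm itself uses a QR decomposition, which is numerically more stable, so this is only an illustration.

```r
set.seed(8)
n   <- 100
fem <- rbinom(n, 1, 0.5)                          # hypothetical 0/1 indicator
rd  <- rnorm(n, 52, 10)                           # hypothetical predictor
wr  <- 20 + 5 * fem + 0.55 * rd + rnorm(n, 0, 7)  # hypothetical response

X <- cbind(Intercept = 1, fem = fem, rd = rd)      # design matrix
beta_hat <- solve(crossprod(X), crossprod(X, wr))  # solves (X'X) b = X'y

fit <- lm(wr ~ fem + rd)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))
```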
When applied to an lm object, the generic plot function produces several diagnostic plots: residuals versus fitted values, a normal Q-Q plot of the standardized residuals, a scale-location plot, and residuals versus leverage. If you are interested in only one or a few of these plots, use the which argument of plot; for example, plot(lm2, which = 1) draws only the residuals versus fitted plot.
lm2 <- lm(write ~ read + socst)
summary(lm2)
# diagnostic plots of lm2, arranged in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(lm2)
par(mfrow = c(1, 1))
Let's take a closer look at the object lm2 we just created. Notice that an object can have many components of different types and different sizes.
class(lm2)
names(lm2)
length(lm2)
length(lm2$residuals)
length(lm2$coefficients)
lm2$coefficients
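The same inspection can be tried on R's built-in cars data set. The sketch below lists the type and size of each component of an lm object and shows that accessor functions such as coef and residuals return the same values as extraction with $.

```r
fit <- lm(dist ~ speed, data = cars)  # cars ships with R (50 rows)

class(fit)                                   # "lm"
names(fit)                                   # coefficients, residuals, qr, model, ...
sapply(fit, function(comp) class(comp)[1])   # components differ in type...
lengths(fit)                                 # ...and in size

# Accessor functions return the same values as $ extraction
all.equal(coef(fit), fit$coefficients)
all.equal(residuals(fit), fit$residuals)
```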
The fitted function will extract the fitted values from the lm object and the resid function will extract the residuals.
write[1:20]
fitted(lm2)[1:20]
resid(lm2)[1:20]
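Each observed value is the sum of its fitted value and its residual, and with an intercept in the model the OLS residuals sum to zero up to rounding. The sketch below checks both facts on the built-in cars data.

```r
fit <- lm(dist ~ speed, data = cars)

# Each observed value splits into fitted value + residual
head(data.frame(observed = cars$dist, fitted = fitted(fit), resid = resid(fit)))
all.equal(unname(fitted(fit) + resid(fit)), cars$dist)

# With an intercept in the model, the residuals sum to (numerically) zero
sum(resid(fit))
```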
To demonstrate logistic regression, we create a dichotomous variable called honcomp (honors composition). honcomp equals 1 when write >= 60 and 0 otherwise. This variable is created purely for illustration.
honcomp <- as.integer(write >= 60)   # 1 if write >= 60, otherwise 0
honcomp[1:20]
The glm function fits generalized linear models, including logistic regression. To fit a logistic model, specify family = binomial; the default link function for the binomial family is the logit. Exponentiating the coefficients gives odds ratios.
lr <- glm(honcomp ~ female + read, family = binomial)
summary(lr)
# odds ratios
exp(coef(lr))
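glm estimates a logistic model by iteratively reweighted least squares, which for the canonical logit link is the same as Newton-Raphson. The sketch below implements a bare Newton-Raphson loop on simulated data (x and y are hypothetical) and compares the estimates with glm. It is an illustration only, with no convergence check or handling of separation.

```r
set.seed(42)
n <- 200
x <- rnorm(n)                                   # hypothetical predictor
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))       # hypothetical 0/1 outcome

X    <- cbind(1, x)                             # design matrix with intercept
beta <- c(0, 0)                                 # starting values
for (i in 1:25) {                               # fixed number of Newton steps
  p    <- plogis(drop(X %*% beta))              # fitted probabilities
  W    <- p * (1 - p)                           # binomial variance weights
  grad <- crossprod(X, y - p)                   # score vector X'(y - p)
  hess <- crossprod(X * W, X)                   # information matrix X'WX
  beta <- beta + drop(solve(hess, grad))        # Newton-Raphson update
}

fit <- glm(y ~ x, family = binomial)
cbind(newton = beta, glm = coef(fit))
exp(beta)                                       # odds ratios
```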
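The one-sample Wilcoxon signed-rank test is a nonparametric analog to the one-sample t-test and is obtained with the wilcox.test function. The hypothesized location (median) is specified with the mu argument. Strictly speaking this is the signed-rank test rather than the sign test; the sign test uses only the signs of the differences from mu.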
wilcox.test(write, mu=50)
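If the sign test itself is wanted, one way to get it in base R is an exact binomial test on the number of observations above the hypothesized median. The sketch below does this on a simulated sample (x is a hypothetical stand-in for write) and prints the signed-rank p-value for comparison.

```r
set.seed(12)
x  <- rnorm(50, 53, 9)   # hypothetical scores standing in for write
m0 <- 50                 # hypothesized median

above     <- sum(x > m0)
nonzero   <- sum(x != m0)              # observations equal to m0 would be dropped
sign_test <- binom.test(above, nonzero, p = 0.5)
sign_test$p.value

wilcox.test(x, mu = m0)$p.value        # signed-rank test, for comparison
```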
The Wilcoxon signed-rank test is the nonparametric analog to the paired t-test. It is also obtained with the wilcox.test function, by setting paired = TRUE.
wilcox.test(write, read, paired = TRUE)
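As with the paired t-test, the paired signed-rank test is the one-sample test applied to the differences. Its statistic V is the sum of the ranks of |d| over the positive differences d. The sketch below checks this on simulated continuous data (wr and rd are hypothetical), which has no ties or zero differences.

```r
set.seed(13)
rd <- rnorm(30, 52, 10)         # hypothetical stand-in for read
wr <- rd + rnorm(30, 1, 6)      # hypothetical stand-in for write

w_paired <- wilcox.test(wr, rd, paired = TRUE)
w_diffs  <- wilcox.test(wr - rd, mu = 0)

# V = sum of the ranks of |d| over the positive differences
d      <- wr - rd
V_hand <- sum(rank(abs(d))[d > 0])
c(paired = unname(w_paired$statistic), diffs = unname(w_diffs$statistic),
  by_hand = V_hand)
```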
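The Wilcoxon rank-sum test (also known as the Mann-Whitney U test) is the nonparametric analog to the independent two-sample t-test. Use the formula interface to compare write between the two groups defined by female.
wilcox.test(write ~ female)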
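The W statistic reported by wilcox.test is the rank sum of the first group minus n1(n1 + 1)/2, with ranks taken over the pooled sample. The sketch below checks this on simulated data; the data frame sim and its columns are hypothetical.

```r
set.seed(14)
sim <- data.frame(score  = c(rnorm(20, 50, 9), rnorm(25, 55, 9)),
                  female = rep(0:1, c(20, 25)))

rs <- wilcox.test(score ~ female, data = sim)

# W = rank sum of the first group (female == 0) minus n1 (n1 + 1) / 2
r      <- rank(sim$score)
n1     <- sum(sim$female == 0)
W_hand <- sum(r[sim$female == 0]) - n1 * (n1 + 1) / 2
c(wilcox.test = unname(rs$statistic), by_hand = W_hand)
```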
The Kruskal-Wallis test is the nonparametric analog to the one-way ANOVA.
kruskal.test(write, ses)
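Without ties, the Kruskal-Wallis statistic is H = 12 / (N(N + 1)) * sum(R_j^2 / n_j) - 3(N + 1), where R_j is the rank sum and n_j the size of group j. The sketch below computes H on simulated continuous data (score and grp are hypothetical) and compares it with kruskal.test; with ties, kruskal.test also applies a tie correction.

```r
set.seed(15)
score <- c(rnorm(15, 50, 8), rnorm(20, 53, 8), rnorm(25, 56, 8))
grp   <- factor(rep(1:3, c(15, 20, 25)))   # hypothetical 3-level grouping

N  <- length(score)
r  <- rank(score)
Rj <- tapply(r, grp, sum)      # rank sum per group
nj <- tapply(r, grp, length)   # group sizes
H  <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)

kw <- kruskal.test(score, grp)
c(by_hand = H, kruskal.test = unname(kw$statistic))
```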
Unless you are going to continue working with the hs1 data, it is a good idea to detach the data frame when you are done.
detach(hs1)
These pages are Copyrighted (c) by UCLA Academic Technology Services