You can download the datasets used in the Stata 1 & 2 Classes directly into Stata over the Internet. Use the following commands:
. net from http://www.ats.ucla.edu/stat/stata/notes
net get data
You can also download the data files for the Stata, SAS, and SPSS classes as a Winzip file by clicking on cldata.zip
Stata is a powerful yet easy-to-use statistical package that runs on Windows, Macintosh, and Unix platforms. This two-hour class is designed for people who are just getting started using Stata. The class will involve demonstration of Stata commands for statistics, graphics, and data management. Students will also get a chance to try out Stata during the hands-on portions of the class.
Class Notes
The Stata 2 Class is meant for students who have taken the Stata 1 Class or who have the equivalent knowledge. The Stata 2 Class is more concerned with computer and data management commands than with the statistical procedures. The two-hour class will combine demonstration and explanation along with hands-on practice.
Class Notes
Splitting & Combining Files
Intro to Graphics
I log, I do
Controlling Your Computer
Statistics Revisited
Searching for Help
Learning More about Stata
Exploring on your own, the Codebook for LA High School
Click here for all of the class notes and extra resources in one file (for easy printing)
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services
We begin by showing how some of the commands work. We will not be showing all of the options for any of the commands.
. cd
mkdir
use
summarize
univar
graph
correlate
tabulate
help
search
Type in the commands at the same time as the class instructor. Don't worry if you fall behind. There will be time to catch up when we move on to the next unit in a few minutes.
. mkdir statacls
cd statacls
log using unit1, text
use http://www.ats.ucla.edu/stat/stata/notes/hsb1
The mkdir command creates a new directory where we can store all of our data and log files. The cd command stands for change directory, in this case, to change to the statacls directory. The log command starts a log file called unit1 that keeps a record of the commands and output during the Stata session. The use command loads a Stata dataset into memory for use. We include the "http..." because the data resides on our web server and we are loading it into memory via the Internet.
hsb1 is a sample of 200 cases from the High School and Beyond Study (Rock, Hilton, Pollack, Ekstrom & Goertz, 1985). It includes the following variables: id female race ses schtyp prog read write math science socst. You can see the names of the variables in Stata's 'Variables' window.
. summarize read math science
summarize read, detail
The summarize command displays basic descriptive statistics: n, mean, standard deviation, min and max. The detail option provides more descriptive statistics, including the variance, skewness, kurtosis, the median and other percentiles.
. univar read math science
The univar command displays basic descriptive statistics and the five number summary: min, 25th percentile, median, 75th percentile, and max.
. graph read
The graph command is used to create one and two dimensional graphs. When graph is used with a single variable, you get a histogram.
. correlate read write math
The correlate command displays a matrix of Pearson correlations for the variables listed.
. graph read math
This version of the graph command uses two variables and displays a scatterplot.
. tabulate ses
tabulate ses, nolabel
The tabulate command with one variable creates a frequency distribution table. Note that the nolabel option shows the numeric values of the variable instead of the "value label."
. tabulate ses female
tabulate ses female, chi2
The tabulate command with two variables creates a two-way table or crosstabulation. With the chi2 option the command includes the chi-square value along with its p-value.
. help tabulate
help tab
search residual
The help command followed by the name of a Stata command brings up the on-line help system for that command. With help you must spell the name completely and correctly. The search command looks for the term in help files, Stata Technical Bulletins and Stata FAQs.
The log command is now used to close the log file. You can view the log file, unit1.log, with any text editor or word processor or you can enter the type command:
. log close
type unit1.log
A common thing that you might want to do is to copy your Stata output and/or graphs to Word. You can see the Stata Frequently Asked Question How do I Copy Stata Output and Stata Graphs into Word? to learn more about this.
. use http://www.ats.ucla.edu/stat/stata/notes/hsb1
summarize read write math
summarize read, detail
graph read
correlate read write math
graph read math
tabulate ses
tabulate ses, nolabel
tabulate ses female
tabulate ses female, chi2
help summarize
search homogeneity
The Stata Class Notes are available on the World Wide Web by visiting ...
http://www.ats.ucla.edu/stat/stata/notes/
Note: .dta is the extension for Stata-format files. Stata automatically includes .dta on files when they are saved. You do not have to include the .dta when reading datasets using Stata.
Again, we will not be showing all of the options for any of the commands.
. list
describe
codebook
display
inspect
if exp qualifier
in range qualifier
. list
list female ses read science
The list command without any variable names displays the values of all the variables for all the cases. list with variable names displays values for only the variables named after the command.
Note about --more--
Stata displays --more-- whenever it fills up the computer screen. Pressing the
'space bar' will display the next screen, and so on, until all of the information has been
displayed. To get out of --more--, you can click on the 'break' button, select
'Break' from the pull-down 'Tools' menu, or press the 'q' key.
. list female ses read science if science==.
list female ses read science if ses=="high"
list female ses read science if ses==3
The if exp qualifier allows you to list values for those cases for which the exp is "true." The first list displays all cases for which science is missing. Stata uses "." to indicate missing values. The if ses=="high" version does not work because ses is a numeric variable, which is why if ses==3 does work.
| ~ | not |
| == | equal |
| ~= | not equal |
| != | not equal |
| > | greater than |
| >= | greater than or equal |
| < | less than |
| <= | less than or equal |
| & | and |
| | | or |
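One thing to watch when combining these operators with missing values: Stata treats "." as larger than any number, so a "greater than" test also picks up missing cases. Here is a minimal sketch of that behavior. The three-row dataset is made up for illustration; only the variable name science comes from the notes.

```stata
* Hypothetical three-row dataset (values are made up for illustration)
clear
input id science
1 45
2 61
3 .
end

* Missing (.) counts as larger than any number, so case 3 is listed too
list if science > 50

* Exclude the missing case explicitly
list if science > 50 & science != .
```

The second list is the one you usually want when the variable has missing data.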
. list female ses read math in 50/60
list female ses read math in f/10
list female ses read math in -10/l
The in range qualifier allows you to list values for the subset of cases in the range. The range f/10 is the same as 1/10 and -10/l is the same as 191/200.
. describe
describe displays a summary of a Stata dataset, describing the variables and other information.
. codebook
codebook female ses race
codebook displays information about variables' names, labels and values.
. inspect female ses race
inspect displays information about the values of variables and is useful for checking data accuracy.
. summarize math
display 9.368^2
This use of display functions as a calculator. We found the standard deviation of math to be 9.368, and we use the display command to get the variance (the square of the standard deviation).
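Instead of retyping the standard deviation, you can let display pick it up from the results that summarize leaves behind. The sketch below assumes nothing beyond base Stata; to keep it self-contained it uses the small grades table from these notes (midterm and final) rather than hsb1, so the numbers will differ from the 9.368 above.

```stata
* Small grades table from the notes, entered by hand
clear
input str10 name midterm final
"Smith" 79 84
"Jones" 87 86
"Brown" 91 94
"Adraktas" 80 84
end

summarize midterm

* summarize stores its results in r(); square the stored standard deviation
display r(sd)^2

* summarize also stores the variance directly
display r(Var)
```

The two display commands should print the same value, which avoids rounding error from retyping.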
The if and in qualifiers work with almost all Stata commands. See how these commands from Unit 1 work with the qualifiers.
. summarize read write math if ses==3
correlate read write math in 50/150
graph read math if female==0
tabulate ses if read>50
tabulate ses female in 1/60
. list
list female ses read science
list female ses read science if science==.
list female ses read science if ses=="high"
list female ses read science if ses==3
list female ses read science in f/10
list female ses read science in -10/l
describe
codebook female ses race
display 27.8^3.2
display (57.5 - 52.23)/10.25294
Note: The dataset hsb2.dta is the same as hsb1.dta but without the missing data for science or the race values coded 5.
By now you know that we will not be showing all of the options for any of the commands.
. clear
edit
save
infile
insheet
. clear
The clear command clears out the dataset that is currently in memory. We need to do this before we can create or read a new dataset.
name      midterm  final
Smith     79       84
Jones     87       86
Brown     91       94
Adraktas  80       84
. edit
The edit command opens up a spreadsheet-like window in which you can enter and change data. You can also get to the 'Data Editor' from the pull-down 'Window' menu or by clicking on the 'Data Editor' icon on the tool bar.
Enter values and press return. Double click on the column head and you can change the name of the variables. When you are done click the 'close box' for the 'Data Editor' window.
. save grades
save grades, replace
The save command will save the dataset as grades.dta. Editing the dataset changes data in the computer's memory; it does not change the data that is stored on the computer's disk. The replace option allows you to save a changed file to the disk, replacing the original file.
. list
summarize
Let's list the contents and run some statistics on the new dataset.
. clear
infile str10 name midterm final using ascii.raw
The infile command is used to read data from an external ASCII file. The names of the variables are given, followed by the keyword using, which in turn is followed by the name of the file. str10 is not a variable name but indicates that name is a string variable up to 10 characters long.
The ASCII file called ascii.raw looks like this:
"Smith" 79 84
"Jones" 87 86
"Brown" 91 94
"Adraktas" 80 84
. clear
type spread.raw
insheet using spread.raw
list
The insheet command is used to read data from a file created by a spreadsheet or database program. The values in the file must be either comma or tab delimited. The variable names are included in the first line of the file.
The spreadsheet file called spread.raw looks like this:
name,midterm,final
Smith,79,84
Jones,87,86
Brown,91,94
Adraktas,80,84
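If you are running a newer Stata, note that insheet was superseded by import delimited in Stata 13. The sketch below assumes Stata 13 or later. It writes the grades table to a comma-delimited file and reads it back; the file name grades_demo.csv is made up for this example and is not one of the class files.

```stata
* Grades table from the notes, entered by hand
clear
input str10 name midterm final
"Smith" 79 84
"Jones" 87 86
"Brown" 91 94
"Adraktas" 80 84
end

* Write a comma-delimited file (hypothetical file name)
export delimited using grades_demo.csv, replace

* Read it back; variable names are taken from the first line of the file
clear
import delimited using grades_demo.csv
list

* Remove the demonstration file
erase grades_demo.csv
```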
. clear
edit
save grades
save grades, replace
list
summarize
clear
infile str10 name midterm final using ascii.raw
clear
insheet using spread.raw
The datasets ascii.raw and spread.raw can be loaded directly into Stata, over the
Internet, using the following commands:
infile str10 name midterm final using http://www.ats.ucla.edu/stat/stata/notes/ascii.raw
insheet using http://www.ats.ucla.edu/stat/stata/notes/spread.raw
. generate
replace
recode
egen
. use hsb2, clear
generate total = read + write
summarize read write total
generate total = read + write + math
replace total = read + write + math
. summarize read write math total
generate highses = ses
tabulate highses ses
. recode highses 3=1 1 2=0
tabulate highses ses
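The generate command allows you to create new variables. The replace command allows you to change an existing variable. Note that the second generate total command above fails because total already exists, which is why replace is needed. The recode command allows you to change specific values of a variable.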
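The same kind of 0/1 indicator can also be built without recode. The sketch below shows two ways, one with generate plus replace and an if qualifier, and one with a logical expression. The five-row dataset is made up for illustration; only the names ses and highses come from the notes.

```stata
* Hypothetical five-row dataset, including one missing value
clear
input id ses
1 1
2 2
3 3
4 3
5 .
end

* Way 1: start at 0 for non-missing cases, then replace where ses is 3
generate highses = 0 if ses != .
replace highses = 1 if ses == 3

* Way 2: a logical expression is 1 when true and 0 when false
generate highses2 = (ses == 3) if ses != .

list
```

Both versions leave the indicator missing where ses is missing, which is usually what you want.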
. * standard scores for read
egen zread = std(read)
list read zread
summarize read zread
* mean of read for each level of ses
egen rmean = mean(read), by(ses)
list ses read rmean
* median of read for each level of prog
egen mread = median(read), by(prog)
list prog read mread
* rank of read
egen rread = rank(read)
list read rread
egen stands for extended generate and is an extremely powerful command that has many options for creating new variables. Only a few of these options are demonstrated above. Here is a list of some of the other options:
| count | number of non-missing values |
| diff | compares variables, 1 if different, 0 otherwise |
| fill | fill with a pattern |
| group | creates a group id from a list of variables |
| iqr | interquartile range |
| ma | moving average |
| max | maximum value |
| mean | mean |
| median | median |
| min | minimum value |
| pctile | percentile |
| rank | rank |
| rmean | mean across variables |
| sd | standard deviation |
| std | standard scores |
| sum | sums |
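To see what std does, you can compute the standard scores by hand and compare. This is a sketch with five made-up reading scores, not the hsb2 data; it relies only on the r(mean) and r(sd) results that summarize stores.

```stata
* Hypothetical reading scores
clear
input id read
1 57
2 68
3 44
4 63
5 47
end

* Standard scores from egen
egen zread = std(read)

* The same thing by hand: subtract the mean, divide by the standard deviation
summarize read
generate zread2 = (read - r(mean)) / r(sd)

list read zread zread2

* Standard scores should have mean (nearly) 0 and standard deviation 1
summarize zread
```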
. generate tot = read + write + math
summarize tot
replace tot = read + math + science
summarize tot
generate newprog = prog
recode newprog 1/3=2 2=1
tabulate newprog
egen aread = mean(read), by(prog)
list prog read aread
The dataset hsb2.dta can be loaded directly into Stata, over the Internet, using the
following commands:
use http://www.ats.ucla.edu/stat/stata/notes/hsb2
. ttest
oneway
anova
test
regress
predict
. use hsb2, clear
We will continue to use the High School and Beyond dataset. The file 'hsb2.dta' has correct and complete data for all of the variables.
. ttest write=50
This example involves the single-sample t-test, testing whether the sample was drawn from a population with a mean of 50. By the way, the standardized writing test in this sample was normed nationally with a mean of 50.
. ttest write=read
This example makes use of the t-test for dependent samples. In this case, we are testing whether there is a significant difference between the write and the read test scores.
. ttest write, by(female)
ttest write, by(female) unequal
sdtest write, by(female)
The t-test for independent groups comes in two varieties: pooled variance and unequal variance. We want to look at the differences in writing test scores between males and females. We will begin with the ttest for independent groups with pooled variances and compare the results to the ttest for independent groups using unequal variance.
There is a test for heterogeneity of variance, sdtest, but it is overly sensitive to nonnormality and statisticians do not recommend using it to screen for heterogeneity of variance.
. oneway write prog, tabulate
anova write prog
sort prog
by prog: summarize write
table prog, contents(n write mean write sd write)
Here are two different ways to perform a one-way analysis of variance (ANOVA). They both give the exact same answer. The most visible difference is that oneway includes a test for homogeneity of variance.
. anova write female prog female*prog
This example demonstrates a 2 X 3 factorial analysis of variance (female has two levels and prog has three).
. regress write read
regress write read, beta
predict pre1
generate pre2 = 23.95944 + .5517051*read
list pre1 pre2
graph pre1 write read, symbol(io) connect(L.)
graph pre1 write read, symbol(io) connect(L.) jitter(2)
These are two examples of simple linear regression. The first one displays confidence intervals for the regression coefficients while the second one displays standardized regression coefficients along with the 'regular' regression coefficients. The predict command computes a predicted write score for each observation. Compare 'pre1' with 'pre2', which was created using the generate command. The graph command, in this example, displays a scatter plot of write against read along with the regression line of write on read. The second example uses the jitter option to help see the points where there are multiple observations on one point.
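Rather than retyping the coefficients into generate, you can refer to them as _b[read] and _b[_cons] right after regress. The sketch below uses six made-up pairs of read and write scores, not the hsb2 data, so the coefficients will not match the 23.95944 and .5517051 shown above.

```stata
* Hypothetical read and write scores
clear
input read write
57 52
68 59
44 33
63 44
47 52
52 52
end

regress write read

* Predicted values from predict
predict pre1

* The same predictions built from the stored coefficients
generate pre2 = _b[_cons] + _b[read]*read

list read write pre1 pre2
```

Using the stored coefficients keeps pre2 in step with the model if you later change the regression.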
. regress write read math
regress write read math female
This time we have two examples of a multiple regression, the first one with two predictor variables and the second one with three.
anova & regress are just two of many estimation procedures available in Stata. A partial list is given in the table below:
| anova | analysis of variance and covariance |
| arch | autoregressive conditional heteroskedasticity family of estimators |
| arima | autoregressive integrated moving average models |
| bsqreg | quantile regression with bootstrapped standard errors |
| clogit | conditional logistic regression |
| cnreg | censored-normal regression |
| cnsreg | constrained linear regression |
| ereg | maximum-likelihood exponential distribution models |
| glm | generalized linear models |
| glogit | weighted least squares logit on grouped data |
| gprobit | weighted least squares probit on grouped data |
| ivreg | instrumental variable and two-stage least squares regression |
| lnormal | maximum-likelihood lognormal distribution models |
| logistic | logistic regression |
| logit | maximum-likelihood logit regression |
| mlogit | maximum-likelihood multinomial logit models |
| mvreg | multivariate regression |
| nbreg | maximum-likelihood negative binomial regression |
| nl | nonlinear least squares |
| ologit | maximum-likelihood ordered logit |
| oprobit | maximum-likelihood ordered probit |
| poisson | maximum-likelihood poisson regression |
| probit | maximum-likelihood probit estimation |
| qreg | quantile regression |
| reg3 | three-stage least squares regression |
| regress | linear regression |
| rreg | robust regression using IRLS |
| sureg | seemingly unrelated regression |
| tobit | tobit regression |
| vwls | variance-weighted least squares regression |
| zinb | zero-inflated negative binomial model |
| zip | zero-inflated poisson models |
test & predict are commands that can be used in conjunction with estimation procedures. There are too many combinations of estimation, predict and test to get into in this class, other than to say that they provide very powerful tools for researchers and are worth the time spent learning them.
. use hsb2, clear
ttest math=50
ttest math, by(sch)
oneway math sci
oneway math prog
anova math prog ses prog*ses
test ses / prog*ses
regress science math
regress science math, beta
regress science math read
regress science math read write
. order
rename
label data
label variable
label define
label values
replace
recode
note:
notes
save, replace
Let's begin by using a new dataset, schdat.dta, which looks like this:
id  a1  t1  gender  a2  t2  tgender
1   95  88  0       94  95  1
2   63  86  1       61  94  1
3   87  80  0       81  84  1
4   79  70  0       79  87  0
5   68  78  1       63  69  0
6   64  87  1       82  96  0
7   86  75  0       69  76  0
8   81  94  1       93  92  1
9   89  79  0       90  78  1
10  78  68  1       80  80  1
. use schdat, clear
describe
The describe command tells us the names of the variables but doesn't provide much more information. Here's the scoop on the data: a1 and a2 are scores on two assignments, t1 and t2 are the scores on the midterm and final respectively, gender is the gender of the student (1=female and 0=male). The variable tgender is the gender of the teacher and is also scored 1=female and 0=male. None of this is obvious from looking at the data, so let's get organized.
. order id gender tgender a1 a2 t1 t2
rename a1 assign1
rename a2 assign2
rename t1 midterm
rename t2 final
rename gender female
rename tgender tfemale
The order command changes the order of the variables. The rename commands change the names of some of the variables to more meaningful ones. This is a good start but we really need to add some labels to make things clear.
. label data "Fall 1999 Stat 100 Scores"
label variable female "student gender"
label variable tfemale "teacher gender"
generate totavg = (assign1 + assign2 + midterm + final) / 4
label variable totavg "total score, divided by 4"
describe
The label data command places a label on the whole dataset. The label variable command makes labels that help explain individual variables. The generate command makes totavg the average of the two assignments, the midterm and the final. Next we need to assign value labels to female and tfemale and make a variable with the grade in the class.
Let's make labels showing that female and tfemale are coded 1=female and 0=male.
. label define sex 1 "female" 0 "male"
label values female sex
label values tfemale sex
describe
tab1 female tfemale
tab1 female tfemale, nolabel
The label define command creates a definition for the values 0 and 1 called sex. The label values command connects the values defined for sex with the values in female and tfemale.
. generate grade = totavg
recode grade 0/60=0 60/70=1 70/80=2 80/90=3 90/100=4
label define abcdf 0 "F" 1 "D" 2 "C" 3 "B" 4 "A"
label values grade abcdf
list grade totavg
The generate and recode commands make a new variable grade going from 0 to 4. Using label define and label values, the values of grade are labeled F through A.
. note: gender is self-report
note: the final was a take-home exam
notes
save schdat2
use schdat2, clear
The note: (note the colon, ":") command allows you to place notes into the dataset. The command notes displays the notes. The save command saves the dataset as schdat2.dta; if that file already exists, add the replace option to overwrite it.
The dataset schdat.dta can be loaded directly into Stata, over the Internet, using the
following command:
use http://www.ats.ucla.edu/stat/stata/notes/schdat
. drop cases
keep cases
append
drop variables
keep variables
sort
merge
. use hsb2, clear
drop if female==0
save hsbf
. use hsb2, clear
drop if female==1
save hsbm
tabulate female
We started with hsb2 which had 200 cases and created two new datasets hsbm and hsbf using the drop command. hsbm has only males (n = 91) while hsbf has only females (n = 109).
The exact same thing can be accomplished using the keep command instead of the drop command.
. use hsb2, clear
keep if female==0
save hsbm, replace
tabulate female
. use hsb2, clear
keep if female==1
save hsbf, replace
tabulate female
Note that drop if female==0 is equivalent to keep if female==1.
. use hsbm, clear
append using hsbf
. tabulate female
The append command concatenates two datasets, that is, sticks them together vertically, one after the other. Normally, the append command would be followed by a save command, however in this case, we already have the hsb2 dataset which is the same as hsbm and hsbf appended together.
. use hsb2, clear
drop female-prog
sort id
save hsbv1
describe
. use hsb2, clear
drop read-socst
sort id
save hsbv2
describe
We started with hsb2 which had 200 cases and 11 variables. Using the drop command we created hsbv1 which has six variables and hsbv2 which also has six variables. The variable id appears in both datasets.
As before, the exact same thing can be accomplished using the keep command instead of the drop command.
. use hsb2, clear
keep id read-socst
sort id
save hsbv1, replace
describe
. use hsb2, clear
keep id female-prog
sort id
save hsbv2, replace
describe
This time drop female-prog is equivalent to keep id read-socst.
. use hsbv1, clear
merge id using hsbv2
describe
The merge command sticks two datasets together horizontally, one next to the other. Both datasets must be sorted by the merge variable, in this case, id, before being merged. Normally, the merge command would be followed by a save command, however in this case, we already have the hsb2 dataset which is the same as hsbv1 and hsbv2 merged by id.
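The merge id using form is the syntax these notes were written for. As a hedged sketch, assuming Stata 11 or later, the same kind of one-to-one merge is written merge 1:1, and the _merge variable it creates can be used to check that every case matched. The two tiny datasets and the file name merge_demo_scores are made up for this example.

```stata
* First hypothetical file: id and a reading score
clear
input id read
1 57
2 68
3 44
end
sort id
save merge_demo_scores, replace

* Second hypothetical file: id and gender
clear
input id female
1 0
2 1
3 0
end
sort id

* One-to-one merge on id (Stata 11 or later syntax)
merge 1:1 id using merge_demo_scores

* _merge is 3 for cases found in both files
tabulate _merge
list

* Remove the demonstration file
erase merge_demo_scores.dta
```

If any case shows a _merge value other than 3, it was present in only one of the two files.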
. use hsb2, clear
drop if female~=1
save hsbf, replace
use hsb2, clear
drop if female~=0
save hsbm, replace
append using hsbf
. stem
graph
graph types
histogram
box
bar
oneway
twoway
matrix
. kdensity
pnorm
rvfplot
rvpplot
. use hsb2, clear
stem math, lines(2)
The stem command produces a stem-and-leaf diagram. The lines(2) option sets the output to two lines per digit, which in this case, makes the output a little cleaner.
. graph math, histogram bin(11) normal
kdensity math, normal
The graph command produces many types of graphic plots. The histogram option naturally produces histograms. The bin(11) option indicates how many categories to break the data into. Eleven was chosen so as to be similar to the stem command above. The kdensity command produces a type of smoothed histogram. In both histogram and kdensity, the normal option superimposes a normal curve on the graph.
. sort prog
graph math, box by(prog) total
. graph read math socst, box
The box option produces box-and-whisker plots. The by(prog) option produces a box plot for each level of the variable prog, but only if the data have been sorted on prog. The total option produces a box plot for all the observations, across all levels of prog. The second box plot example produces separate box plots for each of the variables listed.
. graph math, bar by(prog) means
graph read math socst, bar means
The bar option produces vertical bar charts. The first bar chart looks at 'math' for each level of 'prog.' It is necessary for the data to be sorted by 'prog' which we did in the previous step. The means option produces bar graphs of means.
The second example produces a bar chart of means for the three variables listed after the graph command.
. graph math read science, oneway
The oneway option produces a one-dimensional frequency plot. Notice how easy it is to compare the frequency distributions of two or more variables simultaneously.
. graph math read, twoway
graph math read, twoway oneway
graph math read, twoway box
The twoway option produces a bivariate scatterplot. Three examples are given: 1) The scatterplot only, 2) the scatterplot along with oneway plots of the marginal distributions, 3) the scatterplot along with box plots of the marginal distributions.
. graph math read science ses, matrix half
The matrix option produces a bivariate scatterplot for each pair of the variables listed. The half option suppresses the symmetric upper portion of the output, producing larger individual plots.
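The graph commands in this unit use the syntax these notes were written for. Stata's graphics were redesigned in version 8, so if you are on a newer Stata you will likely need different commands. Here is a rough sketch of close equivalents, assuming Stata 8 or later; the six-row dataset is made up for illustration and the plots will not look like the hsb2 ones.

```stata
* Hypothetical scores for six students in three programs
clear
input math read science prog
41 57 47 1
53 68 63 2
54 44 58 3
47 63 53 2
57 47 53 1
51 52 61 3
end

* Histogram with 11 bins and a normal curve overlaid
histogram math, bin(11) normal

* Box plot of math for each level of prog (no sorting needed)
graph box math, over(prog)

* Scatterplot with math on the y-axis and read on the x-axis
twoway scatter math read

* Scatterplot matrix, lower half only
graph matrix math read science, half
```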
. pnorm math
The pnorm command produces a normal probability plot.
. regress math read science ses
rvfplot, yline(0)
rvpplot read, yline(0)
rvpplot science, yline(0)
rvpplot ses, yline(0)
It is easy to create various residual plots using the rv commands. The rvfplot command produces a plot of the residuals vs the predicted values (fitted). The rvpplot command produces plots of residuals vs independent variables (predictors). The yline(0) option produces a horizontal line at zero on the y-axis.
. use hsb2, clear
stem math, lines(2)
graph math, histogram bin(11) normal
sort prog
graph math, box by(prog) total
graph math read science, oneway
graph math read, twoway box
graph math read science ses, matrix half
pnorm math
regress math read science ses
rvfplot, yline(0)
rvpplot read, yline(0)
rvpplot science, yline(0)
rvpplot ses, yline(0)
. log using filename.log
log close
log off
log on
type
do
. log using summary.log
use hsb2
generate lang = read + write
summarize read write lang
log close
type summary.log
The command log using summary.log opens a log file called summary.log that records everything you type and all of the output from the commands as a text file.
The command log close closes and saves the current log file.
The command type displays the contents of a file to the screen.
. log using resid.log
use hsb2
regress read write math science
rvfplot
predict r, rstudent
sort r
log off
list r
log on
list if abs(r) > 2.5
log close
type resid.log
This set of commands is much like the ones before, except that we use log off to temporarily suspend the log, and log on to resume the log. As before, we finish with log close to close and save the current log file.
Sometimes you may want to use the same commands on more than one file but you don't want to have to type them in more than once. Other times it's easier to collect all of your Stata commands together in one place and run them all at once rather than one at a time. A do-file allows you to place commands in a file and run them all at once. Any command that you can type in on the command line can be placed in a do-file.
Do-files are created with the do-file editor or any other text editor. Here are some commands that could be placed in a do-file:
set more off
use hsb2, clear
generate lang = read + write
label variable lang "language score"
tabulate lang
tabulate lang female
tabulate lang prog
tabulate lang schtyp
summarize lang, detail
table female, contents(n lang mean lang sd lang)
table prog, contents(n lang mean lang sd lang)
table ses, contents(n lang mean lang sd lang)
correlate lang math science socst
regress lang math science female
set more on
Let's look at a do-file that contains these commands that is on our floppy disk.
. type hsbbatch.do
Now let's "do" the file hsbbatch.do
. do hsbbatch
Notice that all of the commands scrolled off of the screen without prompting you with "-more-". This is because we started the do-file with set more off.
The above do-file, hsbbatch, did not save the results. Let's improve it so it makes a log of its results. The additions are the first two lines and the last line. Notice we start with capture log close to close the log (in case it was open) and then the log using command starts logging our results to hsbbatch.log.
capture log close
log using hsbbatch.log, replace
set more off
use hsb2, clear
generate lang = read + write
label variable lang "language score"
tabulate lang
tabulate lang female
tabulate lang prog
tabulate lang schtyp
summarize lang, detail
table female, contents(n lang mean lang sd lang)
table prog, contents(n lang mean lang sd lang)
table ses, contents(n lang mean lang sd lang)
correlate lang math science socst
regress lang math science female
set more on
log close
Now let's "do" the file hsbbatch.do
. do hsbbatch
If we like, we could "run" the file hsbbatch.do and it would not show the results.
. run hsbbatch
Either way, we can see the results with the type command, i.e.,
. type hsbbatch.log
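The earlier point about using the same commands on more than one file can be sketched with a do-file that takes the dataset name as an argument. This is only an illustration and is not one of the class files: the file name describeany.do and the macro name dsname are hypothetical, and the sketch assumes the datasets are in the current directory.

```stata
* describeany.do (hypothetical file name, not part of the class files)
* usage:  do describeany hsb2
args dsname                      // first word typed after the do-file name
capture log close                // close a log if one was left open
log using `dsname'.log, replace  // one log file per dataset
set more off
use `dsname', clear
describe
summarize
set more on
log close
```

Typing do describeany hsb2 and then do describeany schdat would run the same commands on each file and leave a separate log for each.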
The Stata Class Notes are available on the World Wide Web by visiting ...
http://www.ats.ucla.edu/stat/stata/notes/
The datasets schdat.dta and hsb2.dta can be loaded directly into Stata, over the
Internet, using the following command:
use http://www.ats.ucla.edu/stat/stata/notes/hsb2
. cd -- change directory
pwd -- print working directory
dir -- directory listing
ls -- directory listing
type -- type (display) a file to the screen
mkdir -- make a new directory
copy -- copy a file
erase -- erase (delete) a file
Many of these commands are similar to Unix or DOS commands.
. pwd
pwd
dir
dir *.do
ls *.raw
type hsbbatch.do
type ascii.raw
type cls2.log
type schdat.dta
The pwd command displays the current working directory, while the dir command displays a directory listing. The ls command is the same as dir. The "*" is called a wild card and is used to match a number of files that have a common prefix or suffix. The type command displays the contents of a file on the screen. Notice that when you use type with a Stata data file you get gibberish. That is because Stata data files have their own special binary format. type only works with files that are in ASCII (plain text) format, such as .raw, .do, .log, .hlp and .ado files.
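As a small sketch of the wild card and of what to use in place of type for a Stata data file, the commands below list files by pattern and then look inside hsb2.dta with describe and list. It assumes hsb2.dta is in the current directory.

```stata
* list only the Stata data files in the current directory
dir *.dta
* list every file whose name begins with "hsb"
dir hsb*
* a .dta file is not plain text, so load it and inspect it instead of typing it
use hsb2, clear
describe
list in 1/5
```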
. mkdir stata2
copy hsb2.dta stata2\hsbnew.dta
cd stata2
dir
erase hsbnew.dta
dir
cd ..
pwd
The mkdir command creates a new sub-directory. The copy command makes a copy of a file. The erase command deletes a file. Typing the command cd .. takes you up one level in the directory structure.
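The same steps can be written so that they can be rerun and so that the path works on more than one platform. This is a hedged sketch rather than the class's own method: it assumes hsb2.dta is in the current directory, and it uses a forward slash in the path, which Stata accepts on Windows as well as on Macintosh and Unix.

```stata
capture mkdir stata2                       // capture suppresses the error if stata2 already exists
copy hsb2.dta stata2/hsbnew.dta, replace   // replace overwrites an earlier copy
confirm file stata2/hsbnew.dta             // stops with an error if the copy is not there
erase stata2/hsbnew.dta
```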
. pwd
dir
ls *.raw
mkdir stata2
copy hsb2.dta stata2\hsbnew.dta
cd stata2
dir
erase hsbnew.dta
dir
cd ..
The dataset hsb2.dta can be loaded directly into Stata, over the Internet, using the
following command:
use http://www.ats.ucla.edu/stat/stata/notes/hsb2
. tab1
tab2
ttest
hotel
regress
rreg
logistic
sw
xi
anova
signtest
signrank
ranksum
kwallis
We haven't done any actual statistics in a while so let's try a few stat commands.
[by varlist:] command [varlist] [if exp] [in range] [, options]
Items inside the square brackets are either optional or not available for every command. By the way, this syntax applies to all Stata commands, not just the statistical commands. In order to use by ...:, you must first sort the data on the by variable(s).
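The pieces of the syntax diagram can be tried one at a time. The sketch below uses variables from hsb2; the cutoff of 50 and the range 1/20 are arbitrary values chosen only for illustration.

```stata
use hsb2, clear
* by requires the data to be sorted on the by variable first
sort female
by female: summarize write
* if restricts the command to observations that satisfy an expression
summarize write if read >= 50
* in restricts the command to a range of observation numbers
summarize write in 1/20
* bysort sorts and applies by in a single step
bysort prog: summarize write, detail
```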
. use hsb2, clear
tabulate female race ses
tab1 female race ses
Use tab1 when you want to do a series of one-way frequency tables. The tabulate command itself accepts at most two variables, so typing tabulate female race ses produces an error rather than a three-way table.
. tab2 female race ses
Use tab2 when you want to do a series of two-way frequency tables. In this case, you get female by race, female by ses, and race by ses.
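When a two-way table is wanted within each level of a third variable, one option is to combine tabulate with by. The sketch below also shows two common tabulate options, chi2 and row. It assumes hsb2.dta is in the current directory.

```stata
use hsb2, clear
* one two-way table with a chi-square test and row percentages
tabulate female ses, chi2 row
* female by race, repeated separately for each level of ses
bysort ses: tabulate female race
```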
. ttest write = read
This is the paired (dependent-samples) t-test, which compares the means of two variables measured on the same observations.
. ttest write, by(female)
This is the standard two-sample independent t-test with pooled (equal) variances.
. ttest write, by(female) unequal
This is the two-sample independent t-test with separate variances.
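One way to decide between the pooled and the separate-variance forms is to compare the group variances first. The sketch below uses sdtest for that comparison; treating its result as the deciding rule is an assumption of this example, not a recommendation from the class.

```stata
use hsb2, clear
* compare the standard deviations of write in the two groups
sdtest write, by(female)
* pooled-variance and separate-variance versions side by side
ttest write, by(female)
ttest write, by(female) unequal
```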
. hotel read write math, by(female)
hotel performs Hotelling's T-squared test, the multivariate analog of the univariate t-test.
. use hsb2
regress write read science female
This is the plain vanilla OLS regression.
. regress
regress without any arguments redisplays the last regression analysis.
. regress write read science female, robust
The robust option is used to compute robust standard errors when the residuals are not i.i.d. This option does not affect the estimates of the regression coefficients.
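The claim that only the standard errors change can be checked by storing both sets of results and displaying them side by side. This sketch assumes a Stata release that has estimates store and the vce(robust) spelling of the option; the names ols and rob are arbitrary labels chosen for this example.

```stata
use hsb2, clear
regress write read science female
estimates store ols
regress write read science female, vce(robust)
estimates store rob
* coefficients should match across columns; standard errors should differ
estimates table ols rob, b(%9.4f) se(%9.4f)
```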
. rreg write read science female
The rreg command is used for robust regression. It is used when there is concern about outliers or about skewed distributions. Robust regression will produce regression coefficients and standard errors that are different from those of OLS.
. generate honcomp = (write >= 60) /* honors composition */
tabulate honcomp
logistic honcomp read science female
logit
To demonstrate logistic regression we will create a dichotomous variable, honcomp (honors composition). Important Note: we are not recommending that continuous variables be converted into dichotomous variables. Note the code honcomp = (write >= 60) for creating the 0/1 variable. The logistic command produces output in odds ratios while logit produces coefficients.
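The relationship between the two commands can also be seen from the logit side. In the sketch below, logit reports coefficients and the or option redisplays the same model as odds ratios. The if !missing(write) qualifier is an addition for this example: without it, an expression such as (write >= 60) would code a missing write as 1, because Stata treats missing values as larger than any number.

```stata
use hsb2, clear
* 0/1 indicator; observations with missing write stay missing
generate honcomp = (write >= 60) if !missing(write)
logit honcomp read science female
* redisplay the same model as odds ratios
logit, or
```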
. sw regress write read science female, pr(.05)
The sw command is used for stepwise regression. The pr option sets the significance level for removing a variable from the model.
. sw regress write read science female, pe(.05)
The pe option sets the significance level for entering a variable into the model.
. sw regress write read science female, pe(.05) pr(.1)
When both the pr and pe options are included, variables can be removed and can reenter at a later step.
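More recent Stata releases write stepwise as a prefix command. The sketch below restates the three models above in that form; it assumes a release that supports the stepwise prefix, and it is offered as an equivalent spelling rather than as a change in method.

```stata
use hsb2, clear
* backward elimination: remove terms with p >= .05
stepwise, pr(.05): regress write read science female
* forward selection: add terms with p < .05
stepwise, pe(.05): regress write read science female
* backward stepwise: removed terms may reenter (pr must exceed pe)
stepwise, pr(.1) pe(.05): regress write read science female
```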
. xi: regress math i.prog read
test Iprog_2 Iprog_3
xi: regress math i.prog*read
test Iprog_2 Iprog_3
test IpXrea_2 IpXrea_3
The xi command is used to dummy code categorical variables. The variable prog has three levels and requires two dummy coded variables. The expression i.prog*read automatically creates a model that includes the dummy coded effect for prog, the continuous variable read and the product (interaction) of prog and read.
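In newer Stata releases the same models can be fit without the xi prefix by using factor-variable notation. The sketch below is an alternative spelling under that assumption (Stata 11 or later); testparm is used for the joint tests so that the generated dummy variable names do not need to be typed.

```stata
use hsb2, clear
* main effects of prog (as indicators) and read
regress math i.prog read
testparm i.prog
* prog, read, and the prog-by-read interaction
regress math i.prog##c.read
testparm i.prog
testparm i.prog#c.read
```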
The test command is used to test the collective effect of prog or prog*read using the dummy coded variables.
. anova write prog female prog*female
This is a standard factorial anova.
. anova
anova without any arguments redisplays the last anova.
. anova write prog female prog*female read, continuous(read)
This is an analysis of covariance. The continuous option indicates the covariate.
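Newer Stata releases use factor-variable notation in anova, where the interaction is written with # and a covariate is marked with c. in place of the continuous option. The sketch below shows that spelling and, as a fallback, the older command run under version control. Both lines assume Stata 11 or later.

```stata
use hsb2, clear
* analysis of covariance in factor-variable notation
anova write prog female prog#female c.read
* the older syntax can still be run under version control
version 10: anova write prog female prog*female read, continuous(read)
```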
. anova write prog female / prog*female /
This is a random-effects factorial anova in which prog*female is the error term for the prog and female main effects.
. use spf, clear
anova y b s, repeated(s)
This is a randomized block design with repeated measures on subjects.
. anova y a / s|a b a*b, repeated(b)
This is a split-plot factorial with a as the between subjects factor and b as the within subjects factor. Subjects nested in a is the error term for the a main effect.
. signtest write = 55
The signtest is the nonparametric analog to the single-sample t-test.
. signrank write = read
The signrank test is the nonparametric analog to the dependent-sample t-test.
. ranksum write, by(female)
The ranksum test is the nonparametric analog to the independent two-sample t-test.
. kwallis write, by(prog)
The kwallis test is the nonparametric analog to the one-way anova.
. use hsb2
tab1 female race ses
tab2 female race ses
ttest read = write
ttest read, by(female) unequal
hotel read write math, by(female)
anova read prog ses prog*ses math, continuous(math)
anova read prog ses / prog*ses /
use spf
anova y b s, repeated(s)
use hsb2
regress read write science female, robust
rreg read write science female
sw regress read write science female, pr(.05)
xi: regress read write female i.ses i.prog
signtest read = 55
signrank read = write
ranksum read, by(female)
kwallis read, by(ses)
The datasets hsb2.dta and spf.dta can be loaded directly into Stata, over the Internet,
using the following commands:
use http://www.ats.ucla.edu/stat/stata/notes/hsb2
use http://www.ats.ucla.edu/stat/stata/notes/spf
. help
search
tutorial
. help if
help anova
help regress
help regression
The help command can be used from the command line or from the Help window. To use help the command must be spelled correctly and the full name of the command must be used. help contents will list all commands that can be accessed using help.
. search if
search regression
search ttest, manual
search tukey
search gould, author
search gould, author stb
search gould, author faq
net search missing data
findit missing data
The search command searches for information in Stata manuals, FAQs, and Stata Technical Bulletins (STBs). The search options include: manual which restricts searches to the Stata Manual; author when searching for an author by name; stb which restricts searches to STBs; faq which restricts searches to FAQs.
The search command can be used from either the command line or the Help window.
The findit command combines search and net search. You must have an up-to-date version of Stata to use findit.
. tutorial
tutorial regress
Each copy of Stata comes with a built-in tutorial. Typing tutorial brings up information about the tutorials. tutorial regress will bring up the tutorial on regression.
The Training and Consulting unit of ATS has developed a series of self-paced learning modules for Stata. These modules can be found on the World Wide Web at www.ats.ucla.edu/stat/stata/modules/
. help if
search if
help regress
help regression
search regression
search ttest, manual
search Tukey
search gould, author
search gould, author stb
search gould, author faq
1 Jul 1999 - pbe
ATS Web Pages
ATS has created a variety of web pages to help you learn and use Stata. The web pages are located at http://www.ats.ucla.edu/stat/stata/
The pages include information about
The Stata page at http://www.ats.ucla.edu/stat/stata/ also has a search engine to help find relevant pages for you. We recommend one-word searches, choosing the word that most distinguishes your question.
Consulting Services
If you visit our main page at http://www.ats.ucla.edu/stat/ you can find the section about our Consulting Services, which gives our consulting schedule and explains how to visit us at Walk In consulting or send us questions via email.
We are constantly updating our pages. If you would like to receive occasional email notices notifying you of updates to our web pages and announcements of our Stata Classes, visit http://www.ats.ucla.edu/cfapps/listserv/joinleaveform.cfm and join the ATSstat-L list.
Searching the Internet
If you don't find what you want there, you can search the internet for information about Stata using a search engine like http://www.google.com or http://www.altavista.com . With our ATS web pages we recommend one-word searches; when searching the entire internet, however, a single word will usually return thousands (or even millions) of pages. Here are some example searches that we can run in http://www.google.com, which finds pages that have ALL of the terms given. (Many other search engines find pages that have ANY of the terms you supply, and it is often necessary to place a + in front of every term to request pages that have ALL of the terms, e.g. +Stata +regression +logistic.)
- Perhaps you want to learn more about Regression in Stata, so we can search for Stata regression
use http://www.ats.ucla.edu/stat/stata/notes/lahigh
| Variable Name | Variable | Values |
| id | Student ID number | Each student has a unique four-digit ID number. Records from School Alpha begin with 1 and records from School Beta begin with 2. |
| gender | Gender | 1 = Female 2 = Male |
| ethnic | Ethnicity | 1 = Native American 2 = Asian 3 = African-American 4 = Hispanic 5 = White 6 = Filipino 7 = Pacific Islander |
| school | School | 1 = School Alpha (n=159) 2 = School Beta (n=157) |
| mathpr | CTBS Math PR Score | percentile rank |
| langpr | CTBS Lang PR Score | percentile rank |
| mathnce | CTBS Math NCE Score | NCE score (see note 1) |
| langnce | CTBS Language NCE Score | NCE score (see note 1) |
| biling | Bilingual Status | 0 = No Bilingual Status (Native English Speaker) 1 = IFEP (Foreign language spoken in home but student tested English Proficient) 2 = RFEP (Formerly LEP but transitioned to English) 3 = LEP (Currently in Bilingual Program) |
| daysabs | Number of days absent | Number of days |
Note 1: NCE stands for normal curve equivalent. It is a type of standardized score with mean = 50 and standard deviation = 21.06.
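The codes in the codebook can be attached to the data as value labels so that tables print the category names. The sketch below is an illustration only: the label names genderlbl, schoollbl and bilinglbl are hypothetical, the short label text is abbreviated from the codebook, and it assumes the dataset URL above still resolves and that the variables are stored as numeric codes.

```stata
use http://www.ats.ucla.edu/stat/stata/notes/lahigh, clear
* define value labels from the codebook (label names are hypothetical)
label define genderlbl 1 "Female" 2 "Male"
label define schoollbl 1 "School Alpha" 2 "School Beta"
label define bilinglbl 0 "None" 1 "IFEP" 2 "RFEP" 3 "LEP"
* attach the labels to the variables
label values gender genderlbl
label values school schoollbl
label values biling bilinglbl
* check the result
tabulate school gender
codebook biling daysabs
```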
30Sep99