page 34 Figure 3.1 Four datasets, due to Anscombe (1973), with identical least-squares regressions. In (a), the linear regression is an accurate summary; in (b), the linear regression distorts the curvilinear relationship between Y and X; in (c), the linear regression is drawn toward an outlier; in (d), the linear regression "chases" the influential observation at the right.
get file 'd:\quartet.sav'.
(a)
IGRAPH /X1 = VAR(x1) /Y = VAR(y1) /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL /SCATTER.
(b)
IGRAPH /X1 = VAR(x2) /Y = VAR(y2) /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL /SCATTER.
(c)
IGRAPH /X1 = VAR(x3) /Y = VAR(y3) /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL /SCATTER.
(d)
IGRAPH /X1 = VAR(x4) /Y = VAR(y4) /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL /SCATTER.
page 37 Figure 3.2 Distribution of average income for 102 occupations in the Canadian occupational prestige data. The histograms both use bins of width 1000; histogram (a) employs bins that start at 0, while (b) employs bins that start at 500.
get file 'd:\prestige.sav'.
NOTE: To specify the number of bins (in this case, 30), you need to use IGRAPH. Otherwise, you could simply use GRAPH with the /HISTOGRAM subcommand.
(a)
IGRAPH /X1 = VAR(income) /Y = $count /Histogram X1INTERVAL NUM = 30.
(b)
IGRAPH /X1 = VAR(income) /Y = $count /Histogram X1INTERVAL NUM = 30 X1START = 16.5.
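For comparison, the simpler GRAPH form mentioned in the note might look like the sketch below. It assumes prestige.sav is still the active file and leaves the binning to SPSS, so the result will not necessarily match either panel of the figure.

```spss
* Sketch only: default binning chosen by SPSS, not the bins used in the figure.
GRAPH /HISTOGRAM=income.
```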
page 38 Figure 3.3 Stem-and-leaf display for average income in the Canadian occupational prestige data.
EXAMINE VARIABLES=income /PLOT STEMLEAF /COMPARE GROUP /STATISTICS NONE /NOTOTAL.
Cases (valid, missing, and total):

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Average income, dollars | 102 | 100.0% | 0 | .0% | 102 | 100.0% |
Average income, dollars Stem-and-Leaf Plot
Frequency Stem & Leaf
2.00 0 . 69
2.00 1 . 68
5.00 2 . 34589
15.00 3 . 001114445667999
14.00 4 . 00123345666777
14.00 5 . 00111245567899
12.00 6 . 112344556899
8.00 7 . 01445789
15.00 8 . 000122344788888
2.00 9 . 25
1.00 10 . 4
3.00 11 . 003
2.00 12 . 34
.00 13 .
2.00 14 . 01
5.00 Extremes (>=14558)
Stem width: 1000
Each leaf: 1 case(s)
page 40 Figure 3.4 Naive density estimator for average income in the Canadian occupational prestige data, using a window half-width of h = 500. Note the roughness of the estimator. A 'one-dimensional scatterplot' of the data values appears at the bottom of the graph.
NOTE: We were unable to reproduce this graph in SPSS.
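For reference, the estimator named in the caption can be written as a formula. The notation below (the symbols for the estimate and for the data values) is an assumption made for this sketch and is not taken from this page; here n = 102 occupations and h = 500.

```latex
% Naive density estimate at x: the share of observations falling within
% h of x, divided by the window width 2h.
\hat{p}(x) = \frac{\#\{\, i : x - h \le X_i < x + h \,\}}{2nh},
\qquad n = 102,\quad h = 500
```

Because the count changes abruptly whenever an observation enters or leaves the window, the resulting curve is a step function, which is the roughness the caption points out.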
page 41 Figure 3.5 Kernel (solid line) and adaptive-kernel (broken line) density estimators for average income in the Canadian occupational prestige data, using a normal kernel and a window half-width of h = 800. Note the 'images' of the normal kernel (i.e., the bumps) near the right of the display where data are sparse.
NOTE: We were unable to reproduce this graph in SPSS.
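As a reference for what the solid line represents, a fixed-window kernel estimate with a normal kernel has the standard form below. The notation is assumed for this sketch rather than taken from this page; n = 102 and h = 800.

```latex
% Fixed-window kernel density estimate with a standard normal kernel K.
\hat{p}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{x - X_i}{h} \right),
\qquad K(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2},
\qquad h = 800
```

Each observation contributes one copy of the kernel centered at its own value, which is why isolated observations in the sparse right tail show up as separate bumps. The adaptive-kernel version lets the window width vary with the local density of the data; its exact form is not given here.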
page 46 Normal quantile comparison plot for average income in the Canadian occupational prestige data. Note the positive skew.
NOTE: We were unable to reproduce this graph in SPSS.
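A related display that SPSS does offer is the normal Q-Q plot requested through the NPPLOT keyword of EXAMINE. The sketch below is offered only as an approximation, assuming prestige.sav is still the active file; the output may differ in detail from the figure in the text.

```spss
* Sketch only: normal Q-Q plots for income, which may not match the figure in the book.
EXAMINE VARIABLES=income /PLOT NPPLOT /STATISTICS NONE /NOTOTAL.
```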
page 47 Figure 3.11 Boxplot for income in the Canadian occupational prestige data. The central box is drawn between the hinges; the position of the median is marked in the box; and outlying observations are displayed individually.
IGRAPH /Y = VAR(income) /BOX OUTLIERS = ON EXTREME = ON MEDIAN = ON WHISKER = T.
page 51 Figure 3.12 Scatterplot of scores on a 10-item vocabulary test versus years of education. Although there are n = 968 observations in the dataset, most of the plotted points fall on top of one another. The least-squares regression line is shown on the plot.
get file 'd:\vocab.sav'.
IGRAPH /X1 = VAR(educ) /Y = VAR(vocab) /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT /SCATTER COINCIDENT = NONE.
page 51 Figure 3.13 Jittered scatterplot for vocabulary score versus years of education. A uniformly distributed random quantity between -1/2 and +1/2 was added to each score for both variables. The original least-squares regression line is shown on the plot.
NOTE: To apply jitter to a graph created with the IGRAPH command, double-click the graph to open the graph editor, then double-click one of the data points. This opens a dialog box with three tabs at the top. Select the third tab, "jittering", and click the check box to add jitter to all scale variables. Next, select the amount of jittering that you want, from zero to ten percent. Note that too much jitter can move points off the graph.

Another way to see how many observations are represented by a single point on the scatterplot is with "sunflowers." To add sunflowers, create the scatterplot with the GRAPH command and then open SPSS's chart editor by double-clicking the graph. Select "chart" from the menu across the top, then select "options". In the bottom left there is a check box labeled "show sunflowers"; click it to activate sunflowers. The "sunflower options" button controls how many observations each petal represents, the resolution, and whether the point is at the center of the petals or at the mean. When you are finished, click "OK" and close the chart editor. The changes made in the chart editor will then take effect on your graph in the output window.
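The jittering described in the figure caption (a uniform random quantity between -1/2 and +1/2 added to each variable) can also be sketched in syntax instead of through the editor. The names educ_j and vocab_j are made up for this illustration, and vocab.sav is assumed to be the active file.

```spss
* Sketch only: educ_j and vocab_j are hypothetical names for the jittered copies.
COMPUTE educ_j = educ + RV.UNIFORM(-0.5, 0.5).
COMPUTE vocab_j = vocab + RV.UNIFORM(-0.5, 0.5).
EXECUTE.
GRAPH /SCATTERPLOT(BIVAR)=educ_j WITH vocab_j.
```

Because the random draws differ from run to run, the plot will not be identical each time. A regression line fitted to the jittered copies would also differ slightly from the original least-squares line shown in the figure, so none is requested here.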
page 52 Figure 3.14 Number of interlocking directorate and executive positions by nation of control, for 248 dominant Canadian firms.
get file 'd:\ornstein.sav'.
IGRAPH /X1 = VAR(nation) TYPE = CATEGORICAL /Y = VAR(intrlcks) TYPE = SCALE /CATORDER VAR(nation) (ASCENDING VALUES OMITEMPTY) /BOX OUTLIERS = ON EXTREME = ON MEDIAN = ON WHISKER = T.
page 54 Figure 3.15 Scatterplot matrix for occupational prestige, level of education, and level of income, for 45 US occupations. The least-squares regression line is shown on each plot. Three unusual observations were identified interactively using a 'mouse.'
get file 'd:\duncan.sav'.
GRAPH /SCATTERPLOT(MATRIX)=prestige income educ.
page 55 Figure 3.16 Davis's data on measured and reported weight, by gender. Data points for men are represented by asterisks, for women by circles. The line on the plot is Y = X.
get file 'd:\davis.sav'.
NOTE: We do not know how to add the Y = X line in SPSS.
GRAPH /SCATTERPLOT(BIVAR)=reptwt WITH measwt.
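To distinguish men from women as the caption describes, the scatterplot can be split by a grouping variable with the BY keyword. In the sketch below, gender is an assumed variable name; this page does not say what the variable is called in davis.sav, so substitute the actual name.

```spss
* Sketch only: gender is an assumed variable name, not confirmed for davis.sav.
GRAPH /SCATTERPLOT(BIVAR)=reptwt WITH measwt BY gender.
```

The default output marks the groups in its own way, so the asterisk and circle symbols mentioned in the caption would likely still have to be set in the chart editor.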
These pages are Copyrighted (c) by UCLA Academic Technology Services