
SPSS FAQ
How can I display overlapping data points on a scatterplot?

Scatterplots are often a good way of displaying data.  Oftentimes, however, two or more observations will have the same values on the variables being graphed. When this happens, the points are graphed on top of each other, and you cannot tell from the scatterplot how many data points each symbol on the graph represents.  Consider the data set below.  Although there are four variables listed on the data list command, there are really only three variables in the data set: id, var1 and var2.  The variable wt is really the number of observations for each combination of values for var1 and var2.  After reading in the data, we will do a crosstab to clearly show how many observations have the same values for var1 and var2.  Then we will make a scatterplot of the data.

data list list / id var1 var2 wt.
begin data
1 1 1 4
2 1 2 7
3 1 3 6
4 2 1 9
5 2 2 5
6 2 3 11
7 3 1 1
8 3 2 2
9 3 3 3
10 4 1 12
11 4 2 8
12 4 3 10
end data.
save outfile 'a:\jit.sav'.

weight by wt.

crosstabs tables = var1 by var2.

Case Processing Summary

                                  Cases
                 Valid           Missing           Total
               N   Percent     N   Percent     N   Percent
VAR1 * VAR2   78   100.0%      0     .0%      78   100.0%


VAR1 * VAR2 Crosstabulation
Count

                      VAR2
              1.00    2.00    3.00    Total
VAR1   1.00      4       7       6       17
       2.00      9       5      11       25
       3.00      1       2       3        6
       4.00     12       8      10       30
Total           26      22      30       78

Using the graph command

graph
 /scatter var1 with var2.

[Figure: Scatter of var2 var1]

From the crosstab above we can see that there are 78 observations in the data set.  However, there are only 12 points on the scatterplot.  This means that most of the data are stacked one on top of the other on the scatterplot, and we cannot tell how many observations each point represents.  This problem can be solved in two different ways, depending on which graph command you use.  If you use the graph command, you can use sunflowers.  If you use the igraph command, you can use jitter.  Both of these methods are shown below.

To solve this problem using the graph command, let's make the scatterplot again and then modify it with SPSS's chart editor so that we can clearly see how many observations each point represents.  With the graph command, SPSS uses "sunflowers".  To create a sunflower, small lines, called petals, are added to a point on the scatterplot to indicate how many observations that point represents.  For example, if there are three observations at a point, then three lines will be added.  The symbols are called "sunflowers" because they look somewhat like a sunflower.

graph
 /scatter var1 with var2. 

[Figure: Scatter of var2 var1]

To use SPSS's chart editor, double-click on the graph.  This will open the chart editor.  Next, select "chart" from the menu across the top, and then select "options".  In the bottom left, there is a check box labeled "show sunflowers".  Single-click in the box to activate the use of sunflowers.  By clicking on the "sunflower options" button, you can control how many observations each petal represents, the resolution, and whether the point is at the center of the petals or at the mean.  When you are finished, click on "OK" and then close the chart editor.  The changes that you made in the chart editor will then take effect on your graph in the output window.

Using the igraph command

Another way to indicate how many data points each point on the scatterplot represents is to use jitter (adding or subtracting a small amount from each value so that the points are not exactly on top of each other).  In this way, overlapping points are separated by a small amount, allowing you to see how many points are really there.  To do this in SPSS, you must first create the graph using the igraph command, as shown below.  Note that jittering in SPSS does not work with weighted variables, so we will re-enter the data without the weight variable and add eight new cases.  The overlapping data points will be on the diagonal in the graph.

data list list / id var1 var2.
begin data
1 1 1 
2 1 1 
3 1 1 
4 2 2 
5 2 2
6 2 2
7 3 3
8 3 3
9 3 3
10 4 4
11 4 4 
12 4 4
13 1 4
14 4 1
15 1 2
16 2 4
17 3 1
18 3 2
19 4 2
20 1 1
end data.
igraph
 /x1 = var(var1) 
 /y = var(var2)
 /scatter coincident = none.

To add the jitter, double-click on the graph to open the graph editor.  Next, double-click on one of the data points in the graph.  This will open a dialogue box with three tabs at the top.  Select the third tab, "jittering".  Click on the check-box to jitter all scale variables and then indicate the percent of jittering that you want.  The range of jittering is zero to ten percent.  Click on the "apply" button to preview the changes and the "OK" button to accept the changes.  Be aware that if you request too much jittering, you can jitter points off of the graph.  If you would like to get rid of the legend that says "Cloud is jittered", right-click on it and select "hide key".
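
The data above were retyped because jittering does not work with weighted data.  As an alternative, the weighted file saved earlier can be expanded so that each observation becomes its own case.  The following is only a sketch: it assumes that a:\jit.sav from the first example is still available, and the output file name a:\jit_long.sav is made up for this illustration.  Each original case is written out wt times, which should give 78 cases, matching the total in the crosstab.

```spss
* Sketch only: a:\jit_long.sav is a hypothetical file name.
* Read the file saved in the first example and make sure weighting is off.
get file = 'a:\jit.sav'.
weight off.

* Write each case to the new file wt times.
loop #i = 1 to wt.
xsave outfile = 'a:\jit_long.sav' /keep = id var1 var2.
end loop.
execute.

* Open the expanded file, which has one case per observation.
get file = 'a:\jit_long.sav'.
```

The igraph command shown above can then be run on the expanded file, and jitter can be added in the same way.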

Using the ggraph command

The ggraph command was introduced in SPSS version 14.  You can also use it to create a scatterplot with jittered points.  In the second to last line of the command, on the element statement, simply change point(position(var1*var2)) to point.jitter(position(var1*var2)).

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=var1 var2 MISSING=LISTWISE
  REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: var1=col(source(s), name("var1"), unit.category())
 DATA: var2=col(source(s), name("var2"), unit.category())
 GUIDE: axis(dim(1), label("var1"))
 GUIDE: axis(dim(2), label("var2"))
 SCALE: cat(dim(1))
 SCALE: cat(dim(2))
 ELEMENT: point.jitter(position(var1*var2))
END GPL.
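
Jitter can also be added by hand before plotting, by adding a small random amount to each value with the compute command.  The following is a minimal sketch, not part of the original example: the variable names var1j and var2j, the seed, and the jitter width of 0.1 are all assumptions chosen for illustration.  It assumes an unweighted file with one case per observation, such as the second data set above.

```spss
* Sketch only: var1j, var2j, the seed and the width 0.1 are arbitrary choices.
* Setting the seed makes the random jitter reproducible.
set seed = 12345.

* Add a uniform random amount between -0.1 and 0.1 to each value.
compute var1j = var1 + rv.uniform(-0.1, 0.1).
compute var2j = var2 + rv.uniform(-0.1, 0.1).
execute.

* Plot the jittered copies; the original variables are left unchanged.
graph
 /scatter var1j with var2j.
```

Because the jitter is stored in new variables, the original values remain available for analysis.  A larger width separates the points more but moves them further from their true values.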

These pages are Copyrighted (c) by UCLA Academic Technology Services

