[R] Plot survey data

John Fox jfox at mcmaster.ca
Sun Sep 14 14:31:20 CEST 2003

Dear Anupam,

At 12:07 AM 9/14/2003 -0400, TyagiAnupam at aol.com wrote:
>Hi John,
>thanks for the suggestion.  What would one consider a large range?

If the largest case weight corresponds to a probability of inclusion of 1, 
then the probability of inclusion for other cases is weight/max.weight, and 
the expected number of cases in the plot is n*average.weight/max.weight. 
You don't want this number to be too small. In your case, the ratio of the 
average to the maximum weight is 872.8/67150, or about 0.013; assuming about 
200,000 valid observations, you'd have about 2600 points in the plot, which 
seems reasonable (but see below).
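The arithmetic above can be checked with a short R sketch. The weight vector here is simulated (hypothetical log-normal weights standing in for finalwt, with some values set missing), so only the formulas, not the numbers, carry over to the real data.

```r
# 'finalwt' is simulated here (hypothetical log-normal weights) so the sketch
# runs on its own; substitute the real case-weight vector.
set.seed(123)
finalwt <- rlnorm(1000, meanlog = 6, sdlog = 1)
finalwt[sample(1000, 50)] <- NA          # some missing values

w <- finalwt[!is.na(finalwt)]            # only valid cases can be plotted
n <- length(w)

# Inclusion probabilities, scaled so the largest weight gets probability 1
p.include <- w / max(w)

# Expected number of points in the plot: n * mean(weight) / max(weight),
# which equals the sum of the inclusion probabilities
expected.points <- n * mean(w) / max(w)
all.equal(expected.points, sum(p.include))   # TRUE

# The figures quoted in this thread: mean 872.8, max 67150, n = 200,000
200000 * 872.8 / 67150                   # about 2600
```

Writing the expected count as the sum of the inclusion probabilities makes it clear why a few very large weights (a large range) shrink the plot: they inflate max(w) and so pull every other probability toward zero.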

> > summary(finalwt)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>     1.8   192.1   462.7   872.8  1018.0 67150.0
>The sample is large: about 250,000.
>How large a sample should one draw from the sample? There are also plenty of
>missing values.

Since you can't plot the missing values, the effective n is the number of 
valid cases. Making a scatterplot with a very large number of points is 
almost surely going to be uninformative (irrespective of the issue of 
weighting), and I'd consider an alternative, such as a bivariate 
nonparametric density estimate, possibly showing outlying points 
individually. (My second suggestion should produce a similar result.)
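One way to get a weighted bivariate density estimate is sketched below. As far as I know, MASS::kde2d() does not accept case weights, so this is a hypothetical hand-written helper (weighted.kde2d is not a function in R or any package): a Gaussian product kernel in which each case contributes in proportion to its weight. The bandwidths hx and hy are assumptions you would have to choose, for example by trial and error, and the data are simulated.

```r
# Hypothetical helper (not from R or a package): weighted bivariate
# Gaussian kernel density estimate evaluated on a regular grid.
# The bandwidths hx and hy are assumptions left to the user.
weighted.kde2d <- function(x, y, w, hx, hy, n.grid = 50) {
  ok <- complete.cases(x, y, w)              # drop cases with missing values
  x <- x[ok]; y <- y[ok]
  w <- w[ok] / sum(w[ok])                    # weights rescaled to sum to 1
  gx <- seq(min(x), max(x), length.out = n.grid)
  gy <- seq(min(y), max(y), length.out = n.grid)
  # n.grid-by-n matrices of kernel values in each coordinate
  kx <- outer(gx, x, function(g, xi) dnorm((g - xi) / hx) / hx)
  ky <- outer(gy, y, function(g, yi) dnorm((g - yi) / hy) / hy)
  # z[i, j] = sum over cases k of w[k] * kx[i, k] * ky[j, k]
  z <- kx %*% (w * t(ky))
  list(x = gx, y = gy, z = z)
}

# Simulated data standing in for the survey variables and weights
set.seed(1)
n <- 2000
x <- rnorm(n); y <- x + rnorm(n)
w <- rlnorm(n, meanlog = 6, sdlog = 1)

dens <- weighted.kde2d(x, y, w, hx = 0.3, hy = 0.4)
contour(dens$x, dens$y, dens$z)
```

With about 200,000 valid cases, each kernel matrix holds 50 x 200,000 doubles (roughly 80 MB), so a coarser grid or processing the cases in chunks may be needed.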

If you want to sample, I'd proceed by trial and error, adjusting both the 
sample size and point size. For example, you can decrease the sample size 
by scaling the inclusion probabilities (weight/max.weight) down by a constant 
factor. A rule for how many points to include would be 
hard to come by, because a reasonable answer depends upon the configuration 
of the data, but I suspect that several tens of thousands of points would 
generally be too many.
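The trial-and-error sampling described above might look like the following R sketch. The helper sample.for.plot and its scale argument are hypothetical names introduced here: each valid case is included independently with probability scale * w / max(w) (capped at 1), so a scale below 1 shrinks the sample, and the cex argument to plot() controls point size. The data are simulated.

```r
# Hypothetical helper: draw a subsample for plotting, including each valid
# case independently with probability scale * w / max(w), capped at 1.
# 'scale' is an assumed tuning constant; values below 1 shrink the sample.
sample.for.plot <- function(x, y, w, scale = 1) {
  ok <- complete.cases(x, y, w)            # only valid cases can be plotted
  x <- x[ok]; y <- y[ok]; w <- w[ok]
  p <- pmin(1, scale * w / max(w))         # inclusion probabilities
  keep <- runif(length(p)) < p             # independent Bernoulli draws
  data.frame(x = x[keep], y = y[keep])
}

# Simulated data standing in for the survey variables and weights
set.seed(2)
n <- 10000
x <- rnorm(n); y <- x + rnorm(n)
w <- rlnorm(n, meanlog = 6, sdlog = 1)
x[sample(n, 500)] <- NA                    # some missing values

# Try a few scale factors and see how many points each one yields
for (s in c(1, 0.5)) {
  pts <- sample.for.plot(x, y, w, scale = s)
  cat("scale =", s, ":", nrow(pts), "points\n")
}

pts <- sample.for.plot(x, y, w, scale = 0.5)
plot(pts$x, pts$y, pch = 16, cex = 0.3)    # adjust cex along with scale
```

Because the sampling is random, it can be worth redrawing the plot with a different seed to check that any apparent structure is not an artifact of one particular sample.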

Perhaps someone else will have better ideas.


>In a message dated 9/12/03 10:48:06 PM Pacific Daylight Time,
>jfox at mcmaster.ca writes:
> > Dear Anupam,
> >
> > I may be wrong, but I don't think that there's any standard method to use
> > in plotting with case weights. I can think of two approaches, however: (1)
> > If you have a large sample, and if the range of the weights isn't too
> > large, you could sample your observations with probability of inclusion in
> > the plot proportional to the case weights. (2) You could plot the points
> > with "size" proportional to the square root of the case weights (i.e., area
> > proportional to the weights).
> >
> > I hope that this helps,
> >  John
> >
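The second approach in the quoted message, symbols whose area is proportional to the case weights, can be sketched in R as below. The data are simulated, and max.cex is an assumed tuning constant. Because cex scales a symbol's linear size, setting it proportional to the square root of the weight makes the symbol's area proportional to the weight. symbols() with circle radii proportional to sqrt(w) does the same thing, with inches setting the size of the largest circle.

```r
# Simulated data standing in for the survey variables and case weights
set.seed(3)
n <- 300
x <- rnorm(n); y <- 0.5 * x + rnorm(n)
w <- rlnorm(n, meanlog = 6, sdlog = 1)

# Symbol size (cex) proportional to sqrt(weight), so symbol area is
# proportional to the weight; max.cex is an assumed tuning constant.
max.cex <- 3
cex.w <- max.cex * sqrt(w / max(w))
plot(x, y, cex = cex.w, pch = 1)

# Alternative: circles with radii proportional to sqrt(weight); 'inches'
# sets the radius of the largest circle
symbols(x, y, circles = sqrt(w), inches = 0.15)
```

Open symbols (pch = 1) are used so that overlapping points remain visible. With a very wide range of weights, most symbols will be tiny, which is the same difficulty that limits the sampling approach.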

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
