# [R] Plot survey data

John Fox jfox at mcmaster.ca
Sun Sep 14 14:31:20 CEST 2003

```
Dear Anupam,

At 12:07 AM 9/14/2003 -0400, TyagiAnupam at aol.com wrote:
>Hi John,
>thanks for the suggestion.  What would one consider as large range?

If the largest case weight corresponds to a probability of inclusion of 1,
then the probability of inclusion for other cases is weight/max.weight, and
the expected number of cases in the plot is n*average.weight/max.weight.
You don't want this number to be too small. In your case, the ratio of the
average to maximum weight is 0.013; assuming about 200,000 valid
observations, you'd have about 2600 points in the plot, which seems
reasonable (but see below).
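
As a rough check of that arithmetic, here is a sketch using the mean and
maximum from your summary(finalwt) output and my assumed count of about
200,000 valid cases (the variable names are just for illustration):

    mean.wt <- 872.8      # mean of finalwt, from your summary
    max.wt  <- 67150      # maximum of finalwt, from your summary
    n.valid <- 200000     # assumed number of valid observations

    mean.wt/max.wt             # ratio of average to maximum weight, about 0.013
    n.valid*mean.wt/max.wt     # expected number of points, about 2600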

> > summary(finalwt)
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>     1.8   192.1   462.7   872.8  1018.0 67150.0
>The sample is large: about 250,000.
>How large a sample should one draw from the sample? There are also plenty of
>missing values.

Since you can't plot the missing values, the effective n is the number of
valid cases. Making a scatterplot with a very large number of points is
almost surely going to be uninformative (irrespective of the issue of
weighting), and I'd consider an alternative, such as a bivariate
nonparametric density estimate, possibly showing outlying points
individually. (My second suggestion should produce a similar result.)

If you want to sample, I'd proceed by trial and error, adjusting both the
sample size and point size. For example, you can decrease the sample size
by scaling the weights down. A rule for how many points to include would be
hard to come by, because a reasonable answer depends upon the configuration
of the data, but I suspect that several tens of thousands of points would
generally be too many.
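
Something along the following lines might serve as a starting point. This
is only a sketch: x and y are hypothetical names for the two variables to
be plotted, and shrink is an illustrative scaling factor, not anything
standard.

    # keep only cases with no missing values on x, y, or the weights
    valid <- complete.cases(x, y, finalwt)
    xv <- x[valid]
    yv <- y[valid]
    wv <- finalwt[valid]

    # inclusion probability is weight/max.weight; shrink < 1 scales the
    # weights down and so reduces the expected number of points
    shrink <- 1
    p <- shrink*wv/max(wv)

    # include each case independently with probability p
    keep <- runif(length(wv)) < p
    sum(keep)     # number of points actually drawn

    # adjust cex (point size) and shrink by trial and error
    plot(xv[keep], yv[keep], cex=0.5)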

Perhaps someone else will have better ideas.

Regards,
John

>In a message dated 9/12/03 10:48:06 PM Pacific Daylight Time,
>jfox at mcmaster.ca writes:
>
> > Dear Anupam,
> >
> > I may be wrong, but I don't think that there's any standard method to use
> > in plotting with case weights. I can think of two approaches, however: (1)
> > If you have a large sample, and if the range of the weights isn't too
> > large, you could sample your observations with probability of inclusion in
> > the plot proportional to the case weights. (2) You could plot the points
> > with "size" proportional to the square root of the case weights (i.e.,
> > area proportional to the weights).
> >
> > I hope that this helps,
> >  John

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University