[R] Need ideas on how to show spikes in my data and how to code it in R

Thomas Fröjd tfrojd at gmail.com
Mon Jun 23 21:40:48 CEST 2008


I have recently been analyzing birthweight data from a clinic. The
data has an obvious defect: there is digit preference for certain
weights, which makes them overrepresented. This shows up as spikes in
the histogram at well-rounded weights like 2, 2.5, 3, etc. I would
like to show this to government officials but can't figure out how to
present the finding in an easy-to-understand manner.
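
One simple way to put a number on the spikes, before any plotting, might
be to count how many recorded weights are exact multiples of 500 g. The
sketch below runs on simulated data only: the vector name `clinic_g`,
the weights in grams, and the 30% heaping rate are all assumptions made
for illustration, not properties of the real clinic data.

```r
set.seed(42)

# Hypothetical clinic weights in grams, recorded to the nearest 100 g,
# with 30% of them rounded further to the nearest 500 g (assumption)
w <- rnorm(3000, mean = 3300, sd = 550)
heaped <- runif(3000) < 0.3
clinic_g <- ifelse(heaped, round(w / 500) * 500, round(w / 100) * 100)

# Share of recorded weights that are exact multiples of 500 g.
# Without digit preference roughly 1 in 5 of the 100 g values would be.
on_500 <- clinic_g %% 500 == 0
mean(on_500)

# Counts by position within each 500 g step (0, 100, 200, 300, 400)
table(clinic_g %% 500)
```

A share well above one fifth would be a compact figure to quote
alongside the plot.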

My idea is this:

I have a dataset of 20 000 childbirths from another nation that I
would like to plot in a graph over the histograms of birth weights
from the clinic. This dataset doesn't share the digit preference
problem. The idea is similar to how people sometimes plot a fitted
normal density function over a histogram to show how the data is
distributed.
To do this I need three steps, none of which I have succeeded with so far.

1.       Shift and rescale the reference dataset so that its mean and
standard deviation match those of my clinic birth weight data.

2.       Scale the data so they can be plotted on the same axis. The
reference dataset has around 20 000 observations and my data from the
clinic only around 3000, so I have to correct for this; otherwise the
plot of the reference dataset will be much bigger in the graph.

3.       Plot both on the same graph: the reference dataset as a
density plot and my dataset as a histogram, that is, weight bins on
the x axis and number of observations on the y axis. I should add that
my reference dataset isn't truly continuous but recorded at 100g
intervals. Both datasets therefore have the same grouping; however,
plotting both as histograms would probably make the graph harder to
understand for a person with little training in statistics. This means
that the reference dataset "density function" has to be smoothed somehow.
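
The three steps above could be sketched roughly as follows. This is only
an illustration on simulated data: the vector names `clinic` and `ref`,
the simulated heaping, the weights in kg, the 100 g bin width, and the
bandwidth of 0.15 are all assumptions, not the real datasets or a
recommended setting. The reference weights are standardized and rescaled
to the clinic mean and standard deviation, smoothed with `density()`,
and the density is multiplied by the number of clinic observations times
the bin width so that it is on the same count scale as the histogram.

```r
set.seed(1)

# Simulated stand-ins for the real data (assumption): weights in kg
ref    <- round(rnorm(20000, mean = 3.5, sd = 0.5), 1)  # recorded at 100 g
clinic <- rnorm(3000, mean = 3.3, sd = 0.55)
heaped <- runif(3000) < 0.3                    # 30% rounded to nearest 0.5 kg
clinic[heaped] <- round(clinic[heaped] * 2) / 2
clinic <- round(clinic, 1)

# Step 1: give the reference data the clinic mean and standard deviation
ref_adj <- (ref - mean(ref)) / sd(ref) * sd(clinic) + mean(clinic)

# Step 2: smooth the reference data and rescale the density to counts.
# Expected count per bin = density * number of clinic births * bin width.
binwidth <- 0.1
dens     <- density(ref_adj, bw = 0.15)  # wider than the 100 g grid (assumption)
expected <- dens$y * length(clinic) * binwidth

# Step 3: histogram of clinic counts with the smoothed reference curve on top.
# Breaks sit at x.x5 so that each recorded 100 g value is centred in its bin.
breaks <- seq(floor(min(clinic)) - 0.05, ceiling(max(clinic)) + 0.05,
              by = binwidth)
h <- hist(clinic, breaks = breaks, col = "grey80", border = "white",
          xlab = "Birth weight (kg)", ylab = "Number of births",
          main = "Clinic birth weights vs. rescaled reference")
lines(dens$x, expected, lwd = 2, col = "red")
```

Because the curve and the histogram then cover the same total area, bars
that stick out far above the curve at 2, 2.5, 3, etc. can be read
directly as excess births recorded at those weights. The bandwidth would
need tuning on the real data so that the 100g grid of the reference
dataset does not show through as ripples.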

I would be very thankful for help with any of these steps. Also, if you
think this approach is wrong for some reason, please tell me.

Best regards

Thomas Fröjd
