[R] Query about the functions used in tapply

lalitha viswanath lalithaviswanath at yahoo.com
Thu May 4 21:19:49 CEST 2006


Hi
I am trying to make an x-y plot of the values of a
certain variable against bins,
i.e. the x-axis goes from 0 to 0.7 in increments of
0.02, while the y-axis is the average of the values for
all the points in that interval.
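A minimal sketch of the binning described above, in R since that is the language of the post. The break points are the ones from the original code; the `example_ages` values are made up for illustration. It shows that `seq(0, 0.7, 0.02)` yields 36 break points and therefore 35 bins, and that `cut()` builds right-closed intervals by default, so an age of exactly 0 gets no bin.

```r
# Break points from the original post: 0, 0.02, 0.04, ..., 0.7
breaks <- seq(0, 0.7, 0.02)
length(breaks)       # 36 break points, hence 35 bins

# Hypothetical ages, chosen to show the edge cases
example_ages <- c(0, 0.01, 0.39, 0.4)

# cut() uses right-closed intervals such as (0.38,0.4] by default,
# so 0 lies outside the first bin (0,0.02] and becomes NA;
# include.lowest = TRUE would put it in the first bin instead.
bins <- cut(example_ages, breaks)
as.character(bins)   # NA "(0,0.02]" "(0.38,0.4]" "(0.38,0.4]"
nlevels(bins)        # 35
```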

So I first used cut to break the data into
intervals, then applied tapply with mean as the
function, and plotted the results.
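A small sketch of the cut-then-tapply pipeline on hypothetical data (the `age` and `clock_rate` vectors here are invented stand-ins for the real columns). It illustrates that `tapply()` returns one value per level of the factor, with `NA` for bins that contain no points.

```r
# Hypothetical stand-ins for sorted_inp$age and sorted_inp$clock_rate
age        <- c(0.010, 0.015, 0.385, 0.390, 0.395)
clock_rate <- c(10,    12,    5,     6,     40)

# Step 1: assign each age to a 0.02-wide interval
bins <- cut(age, seq(0, 0.7, 0.02))

# Step 2: split clock_rate by bin and apply mean() to each group.
# The result has one entry per level (35 here); empty bins are NA.
bin_means <- tapply(clock_rate, bins, mean)
bin_means[c("(0,0.02]", "(0.02,0.04]", "(0.38,0.4]")]
# (0,0.02] = 11, (0.02,0.04] = NA, (0.38,0.4] = 17
```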

I also replaced mean with median.
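To see how mean and median can disagree within a single bin, here is a sketch with one hypothetical bin holding several low values and one large one; only `FUN` changes between the two `tapply()` calls.

```r
# Hypothetical data: six points that all fall in the (0.38,0.4] bin,
# five low clock rates and one large one
age        <- c(0.385, 0.386, 0.387, 0.388, 0.389, 0.390)
clock_rate <- c(5, 6, 7, 8, 9, 60)
bins <- cut(age, seq(0, 0.7, 0.02))

# Same call, only FUN differs
bin_mean   <- tapply(clock_rate, bins, mean)
bin_median <- tapply(clock_rate, bins, median)

bin_mean["(0.38,0.4]"]    # 15.83333, pulled up by the single 60
bin_median["(0.38,0.4]"]  # 7.5, the average of the two middle values (7 and 8)
```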

The code I used is included further below.

However, the values plotted on the y-axis do not
seem to be correct.

For example, in the interval 0.38-0.4 there are a huge
number of points with a y-value below 20, but very few
with a y-value above 20. Yet the plotted median is
still around 20.
Looking at the data, it does not seem plausible that
more than 50% of the points have a clock_rate (plotted
on the y-axis) above 20.
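One way to test this intuition directly, sketched on hypothetical data: compute, per bin, the share of points with a clock_rate at or above 20 next to the median. If a bin's median is near 20, roughly half of that bin's points should be at or above 20. `share_at_least` is a hypothetical helper, not part of base R.

```r
# Hypothetical data: seven points in (0.38,0.4], two in (0,0.02]
age        <- c(0.381, 0.383, 0.385, 0.387, 0.389, 0.391, 0.393, 0.010, 0.012)
clock_rate <- c(3, 8, 12, 15, 18, 19, 25, 30, 35)
bins <- cut(age, seq(0, 0.7, 0.02))

# Hypothetical helper: fraction of values at or above a threshold
share_at_least <- function(x, threshold = 20) mean(x >= threshold)

bin_share  <- tapply(clock_rate, bins, share_at_least)
bin_median <- tapply(clock_rate, bins, median)

# Side by side for the non-empty bins only
keep <- !is.na(bin_median)
data.frame(bin = names(bin_median)[keep],
           median = as.vector(bin_median[keep]),
           share_at_least_20 = as.vector(bin_share[keep]))
# (0,0.02]:   median 32.5, share 1
# (0.38,0.4]: median 15,   share 1/7
```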

Is there something about the way these functions work
with tapply that I am missing?
Are there any obvious mistakes I should look for?
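A few checks that can catch common mistakes, sketched on a small hypothetical data frame named after the one in the post (the values are invented): whether `clock_rate` is really numeric, how many ages fall outside the breaks and are therefore dropped, and how many points sit behind each plotted value.

```r
# Hypothetical data frame named after the one in the post; values invented
sorted_inp <- data.frame(
  age        = c(0, 0.010, 0.385, 0.390, 0.395, 0.750),
  clock_rate = c(14, 10, 5, 6, 40, 22)
)
rows <- seq_len(nrow(sorted_inp))   # the post uses rows 1:290

# Check 1: clock_rate must be numeric. If it had been read in as a
# factor, convert with as.numeric(as.character(x)), not as.numeric(x),
# which returns the internal level codes (see R FAQ 7.10).
is.numeric(sorted_inp$clock_rate)

bins <- cut(sorted_inp$age[rows], seq(0, 0.7, 0.02))

# Check 2: ages outside (0, 0.7] get an NA bin and tapply() leaves
# them out without a warning (here age 0 and age 0.75)
sum(is.na(bins))

# Check 3: how many points each plotted value is based on
counts <- table(bins)
counts[counts > 0]
```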

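# Bin the first 290 ages into 0.02-wide, right-closed intervals (0,0.02], ..., (0.68,0.7]
SWfac <- cut(sorted_inp$age[1:290], seq(0, 0.7, 0.02))
# Mean clock_rate per bin (NA for empty bins); median was also tried in place of mean
SLmean <- tapply(sorted_inp$clock_rate[1:290], SWfac, mean)
# Plot the per-bin values against bin index, then label the x-axis with the bin intervals
plot(SLmean, type = "b", xaxt = "n")
axis(1, seq_along(SLmean), levels(SWfac))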

I also made a simple x-y scatter plot of the same 290 rows
in Excel (without binning them), and the concentration
of points at low clock rates does not suggest that the
medians should be as high as the plotted ones.
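A sketch of comparing the two views in one R plot, on hypothetical data: the raw, unbinned points (as in the Excel scatter) with the per-bin medians drawn on top at each interval's midpoint. Plotting at midpoints rather than at bin index 1, 2, ... is a choice made here so both layers share the same x scale; the data values are invented.

```r
# Hypothetical data standing in for sorted_inp[1:290, ]
age        <- c(0.010, 0.015, 0.385, 0.390, 0.392, 0.395, 0.398)
clock_rate <- c(10, 12, 5, 6, 7, 8, 40)

breaks     <- seq(0, 0.7, 0.02)
bins       <- cut(age, breaks)
bin_median <- tapply(clock_rate, bins, median)

# Midpoint of each interval, so the medians line up with the raw ages
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Raw points first (the unbinned view), then the per-bin medians on top;
# empty bins are NA and simply leave gaps in the red line
plot(age, clock_rate, xlim = range(breaks), col = "grey50",
     xlab = "age", ylab = "clock_rate")
points(mids, as.vector(bin_median), type = "b", pch = 19, col = "red")
```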

Hoping to hear back.
Regards
Lalitha




More information about the R-help mailing list