[R] lattice histogram problem with integer values and nint

Richard and Barbara Males rbmales at gmail.com
Thu May 15 21:27:29 CEST 2008


I have been puzzling over this for a day.

Summary
I have an integer variable (day of year) in about 170,000 rows that I
want to plot as a histogram.  hist works.  The lattice histogram with
nint does not work (spurious spikes in the display).  The lattice
histogram using breaks=c(0:365) works fine.  Each spike value appears
to be the sum of two adjacent bins.  Is this a familiar problem, and
what is the recommended work-around?  I would also like to know how to
get the bin counts from the lattice histogram object, as I would with
the $counts component returned by hist.

Thanks in advance.


Detail

I have a dataset of approximately 170,000 rows, with a DayOfYear
field.  I want a histogram of the number of rows in each day of the
year.  I set up breaks from 0:365, and use this with hist, and the
lattice histogram, e.g.

histogram(dfTemp$DayOfYear,breaks=breaklist,type="count")

If I use hist to display this, all values are under 600 and everything is fine.

If I use the lattice histogram on the full 365 days, either with
nint=365 or with breaks set from (0:366), I get 26 equally spaced
spurious peaks above 800 (that is, 26 days reported with bad values).
A table command on this field shows that the highest count of rows in
a day is 553.  When I use hist (not histogram), the plot looks fine.
When I use plot(table(dataframe$DayOfYear)), the plot looks fine.  If
I subset the data to only days below 340, the plot looks fine.  With
days below 341, I get one of these spikes, at about day 170, going up
to about 900.  With days below 342, I get two spikes, both over 900.
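
To make the "sum of two adjacent bins" symptom concrete, here is a
minimal sketch in base R of one way it can arise.  It assumes that nint
equal-width bins are laid across the range of the data; I have not
confirmed that this is what happens in my case.  The data and the value
of nint are made up for illustration.

```r
# Hypothetical data: 100 rows for each integer value 1..10.
x <- rep(1:10, each = 100)

# Assumption: nint equal-width bins are laid across the data range.
nint   <- 8
breaks <- seq(min(x), max(x), length.out = nint + 1)  # bin width 9/8, not 1

per_value <- as.vector(table(x))                      # 100 for every value
per_bin   <- hist(x, breaks = breaks, plot = FALSE)$counts

print(per_value)
print(per_bin)  # first and last bins each hold two integer values (200 rows)
```

Whenever the bin width is not exactly 1, some bins cover two integer
values and show up as a spike of roughly double height.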

If I set breaks to 0:366, and add a small increment to my integer values, e.g.

histogram(df2006NonRecVessels$DayOfYear+.001,breaks=breaklist,type="count")

All is well under this approach.
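
As a check on the counting only (not on the lattice display), the
sketch below uses base hist on synthetic data to compare the shifted
values against per-day counts.  It also shows a variant that leaves the
data alone and centres each bin on an integer.  The variable names, the
synthetic data and the half-integer breaks are my own assumptions for
illustration.

```r
# Synthetic stand-in for the DayOfYear column (hypothetical data).
set.seed(42)
day <- sample(1:365, 20000, replace = TRUE)

per_day <- tabulate(day, nbins = 365)  # reference count for each day

# Work-around from above: nudge values off the bin edges, breaks at 0:366.
shifted <- hist(day + 0.001, breaks = 0:366, plot = FALSE)$counts

# Variant: keep the data as is and centre each bin on an integer.
centred <- hist(day, breaks = seq(0.5, 365.5, by = 1), plot = FALSE)$counts

# Day k lands in bin k + 1 of the shifted version and bin k of the centred one.
print(all(shifted[-1] == per_day))
print(all(centred == per_day))
```

With half-integer breaks no data value sits on a bin edge, so the
result should not depend on how edge values are assigned.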


I have searched to see whether this is a known problem, but did not
find anything.

Also, I can get the count in each bin for hist as
> xx <- hist(df2006NonRecVessels$DayOfYear, breaks=breaklist)
> xx$counts  # this gives me the counts

I am unclear how to get equivalent information on bin contents from
the lattice-generated histogram object. It appears to be in
panel.args, but I am unclear on the exact syntax.
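
For reference, this is the kind of thing I am after, as a tentative
sketch.  It assumes that the trellis object keeps the panel data in
panel.args[[1]]$x and the breaks in panel.args.common$breaks, and that
the counts are not stored in the object and have to be recomputed with
hist.  I am not sure these assumptions hold in every lattice version.
The data here are synthetic.

```r
library(lattice)

# Synthetic stand-in for the DayOfYear column (hypothetical data).
set.seed(7)
day <- sample(1:365, 20000, replace = TRUE)
breaklist <- 0:365

h <- histogram(~ day, breaks = breaklist, type = "count")

# Assumption: data and breaks are stored in these components.
x_panel <- h$panel.args[[1]]$x
brk     <- h$panel.args.common$breaks
if (is.null(brk)) brk <- breaklist  # fall back if the breaks are not stored

# Recompute the bin counts from the stored data and breaks.
counts <- hist(x_panel, breaks = brk, plot = FALSE)$counts
print(head(counts))
```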


Richard M. Males
Cincinnati, Ohio, USA


