[R] How to separate a data set by its factors

David Winsemius dwinsemius at comcast.net
Thu Dec 24 22:56:42 CET 2009


On Dec 24, 2009, at 3:24 PM, James Rome wrote:

> I have a large data set of airport data and wish to analyze it by hour
> and day of the week. hour and day of the week are factors.
>
> I can do something such as:
> histogram(~() | , type="count", breaks=60)
> which displays the data the way I want it in principle,  but the plots
> are too small to read. I added layout=c(7,6,4) to the argument list, but
> then I only get the first page of plots. How do I see the other pages?

I was not aware that layout had a paging argument, but that just shows
you that there are large gaps in my knowledge. If I munge one of the
examples on the xyplot help page, I get (ugly) multi-page output:

library(lattice)

# Write every page to a PDF file instead of the screen device
pdf("test.pdf")
xyplot(Sepal.Length + Sepal.Width ~ Petal.Length + Petal.Width | Species,
       data = iris, scales = "free", layout = c(2, 1, 2),
       auto.key = list(x = .6, y = .7, corner = c(0, 0)))
dev.off()

You may not be getting what you expect, but it may be that your plots
are all being created, just too quickly to be seen: on a screen device
each new page replaces the previous one. Try printing to a more durable
"canvas", such as a PDF file.
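
To make the paging behaviour concrete, here is a minimal sketch under some
loud assumptions: the data frame d, its simulated contents, and the output
file name are all hypothetical stand-ins for the real airport data; only the
column names DAY, Hour and Arrival.Val come from the original question. It
stores the lattice plot as an object and prints it to a PDF device, so each
page of panels ends up as a page of the file.

```r
library(lattice)

set.seed(1)
# Hypothetical stand-in for the airport data: 3 days x 6 hours x 50 counts
d <- expand.grid(DAY  = factor(c("Mon", "Tue", "Wed")),
                 Hour = factor(0:5),
                 rep  = 1:50)
d$Arrival.Val <- rpois(nrow(d), lambda = 5)

# layout = c(columns, rows): 6 panels per page, so 18 panels need 3 pages.
# The number of pages is worked out by lattice when it is not supplied.
p <- histogram(~ Arrival.Val | Hour * DAY, data = d,
               type = "count", layout = c(3, 2))

# Hypothetical output file; a larger page keeps the panels readable
out <- file.path(tempdir(), "arrivals.pdf")
pdf(out, width = 11, height = 8.5)
print(p)   # explicit print() is needed inside functions, loops, or source()
dev.off()
```

Because Hour is the first conditioning variable it varies fastest, so with
six hours per day each page of this sketch should hold one day.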

> And I would like to add a Poisson Distribution fit to each of these
> plots (see below), but am clueless as to how to go about it.
>
> I would like to fit a distribution to the count data for each
> combination of day and hour, and I am unable to see how to do this in a
> vector manner.  For example, I tried
> density((Arrival.Val | DAY*Hour), na.rm=TRUE)
> which does not work.

I should think that this would be informative:

glm(Arrival.Val ~ DAY*Hour, family="poisson")

Since DAY and Hour are factors, you will get a large number of
estimates, one for each combination of DAY and Hour. You can use the
typical regression functions, such as summary() for the coefficient
estimates and predict() for the fitted values.
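
As a rough illustration of that workflow, the sketch below fits the model to
simulated data and pulls out one fitted Poisson mean per cell. The data frame
d, the simulated rates, and the helper objects cells and cell.means are
assumptions made for the example, not part of the original data. With the
full DAY*Hour interaction the model is saturated in the two factors, so the
fitted means should agree with the plain per-cell averages that tapply()
returns.

```r
set.seed(42)
# Hypothetical stand-in for the airport data: 3 days x 3 hours x 30 counts
d <- expand.grid(DAY  = factor(c("Mon", "Tue", "Wed")),
                 Hour = factor(c(6, 12, 18)),
                 rep  = 1:30)
d$Arrival.Val <- rpois(nrow(d),
                       lambda = 4 + as.integer(d$DAY) + 2 * as.integer(d$Hour))

# Poisson regression with main effects and the DAY:Hour interaction
fit <- glm(Arrival.Val ~ DAY * Hour, family = poisson, data = d)
summary(fit)   # coefficient table, on the log scale

# One row per DAY/Hour combination, with its fitted Poisson mean
cells <- expand.grid(DAY = levels(d$DAY), Hour = levels(d$Hour))
cells$lambda.hat <- predict(fit, newdata = cells, type = "response")

# The same quantities computed directly, one mean per factor combination
cell.means <- tapply(d$Arrival.Val, list(d$DAY, d$Hour), mean)
```

The tapply() call is also one answer to the "by its factored subsets" part of
the question: split(), by() and aggregate() apply a function to each subset
in a similar way.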

>
> I think my question boils down to "how do you replace a whole data set
> by its factored subsets in all of the usual R commands?"
>
> I am climbing up a steep R learning curve, and so would appreciate some
> help.
>
> Thanks,


David Winsemius, MD
Heritage Laboratories
West Hartford, CT



