[R] as.data.frame.table() to convert by() output to a data frame

David Winsemius dwinsemius at comcast.net
Thu Nov 26 06:17:59 CET 2009


On Nov 25, 2009, at 9:54 PM, Michael Ash wrote:

> I remain confused by the difference between
>
> library(MASS)
> data(Cars93)
>
> as.data.frame(tapply(Cars93$Price,
>     list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers), median))
> as.data.frame.table(tapply(Cars93$Price,
>     list(Cars93$Origin, Cars93$AirBags, Cars93$Passengers), median))
>

The printed display may not show it, but applying str() to both results
makes the difference clear. The first is similar to what one might get
by cbind'ing the results of the inner function (which would not work in
this case on a list object): you get two rows (USA, non-USA) of 18
variables (the AirBags-by-Passengers cells), all medians or NA. The
second has one row per combination of the (USA/non-USA) * (AirBags) *
(Passengers) cross, with the associated median in a column that
as.data.frame.table labels "Freq", since a count is the usual element of
a contingency table. I think of as.data.frame.table as simply another
way of accomplishing as.data.frame(table()). Is that not how you were
intending it?
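A minimal sketch of the difference, assuming the MASS package (which supplies Cars93) is installed; the object names arr, wide and long are illustrative only. Running str() on each result shows the wide two-row layout versus the long one-row-per-cell layout:

```r
library(MASS)  # provides the Cars93 data set

# 3-d array of median prices: Origin x AirBags x Passengers
# (cells with no cars come back as NA)
arr <- with(Cars93, tapply(Price, list(Origin, AirBags, Passengers), median))

# as.data.frame() on a >2-d array keeps the first dimension as rows and
# collapses the remaining dimensions into columns
wide <- as.data.frame(arr)

# as.data.frame.table() gives one row per cell of the cross-classification:
# the factor levels go into Var1..Var3 and the medians into "Freq"
long <- as.data.frame.table(arr)

str(wide)
str(long)
```

The factors arrive as Var1 to Var3 because the tapply() index list is unnamed; naming it, e.g. list(Origin = Cars93$Origin, ...), should carry those names through as column names.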

>
> I clearly want the latter, but that's not clear from the  
> documentation.

Sometimes it is helpful to look at the code as well:

 > as.data.frame.table
function (x, row.names = NULL, ..., responseName = "Freq",
    stringsAsFactors = TRUE)
{
    x <- as.table(x)
    ex <- quote(data.frame(do.call("expand.grid", c(dimnames(x),
        stringsAsFactors = stringsAsFactors)), Freq = c(x),
        row.names = row.names))
    names(ex)[3L] <- responseName
    eval(ex)
}
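Reading the body above, the factor columns take their names from names(dimnames(x)) and the value column is renamed by responseName, which suggests one way to tidy the "slightly strange names" from the original by() question. A sketch with a simulated, hypothetical microdata data frame (the city and race values and the column name ratio90_50 are made up for illustration):

```r
set.seed(1)

# Hypothetical stand-in for the poster's microdata: income by city and race
microdata <- data.frame(
  city   = rep(c("Boston", "Springfield"), each = 100),
  race   = rep(c("A", "B"), times = 100),
  income = rlnorm(200, meanlog = 10, sdlog = 0.8)
)

# Naming the INDICES list gives the by() result named dimnames
cityrace.by <- by(microdata,
                  list(city = microdata$city, race = microdata$race),
                  function(x) quantile(x$income, probs = 0.9) /
                              quantile(x$income, probs = 0.5))

# Column names come from names(dimnames()); responseName replaces "Freq"
cityrace.data <- as.data.frame.table(cityrace.by, responseName = "ratio90_50")
cityrace.data
```

This relies on FUN returning a single number per group, so that by() can simplify its result to an atomic array.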

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
>
>
> Best,
> Michael
>
>
> On Wed, Nov 25, 2009 at 5:55 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Nov 25, 2009, at 4:11 PM, Michael Ash wrote:
>>
>>> Dear all,
>>>
>>> This seems to be working, but I'd like to make sure that I'm not doing
>>> anything wrong.
>>>
>>> I am using by() to construct a complicated summary statistic by
>>> several factors in my data (specifically, the 90-50 income ratio by
>>> city and race).
>>>
>>> cityrace.by <- by(microdata, list(microdata$city,microdata$race),
>>> function (x) quantile(x$income, probs=0.9) / quantile(x$income,
>>> probs=0.5) )
>>>
>>> I would now like to use the data created by by() as a dataset with
>>> city-race as the unit of observation.
>>>
>>> However, cityrace.data <- as.data.frame(cityrace.by) does not work because
>>> "Error in as.data.frame.default(city.by) :
>>>  cannot coerce class "by" into a data.frame"
>>>
>>> The following is not a documented use of as.data.frame.table(), but it
>>> seems to work. It gives the columns slightly strange names, including
>>> "Freq" for the statistic computed by by(), but otherwise the data
>>> frame is indexed by city and race, with the 90-50 ratio as the
>>> variable:
>>>
>>> cityrace.data <- as.data.frame.table(cityrace.by)
>>
>> If the by-object you get happens to be a 2-d array, then why not? Tables are
>> matrices, after all.
>>
>>> tt <- table(c(1,1), c(1,1))
>>> tt
>>
>>    1
>>  1 2
>>> is.matrix(tt)
>> [1] TRUE
>>
>>> --
>>
>>
>> David Winsemius, MD
>> Heritage Laboratories
>> West Hartford, CT
>
> -- 
> Michael Ash, Associate Professor
>  of Economics and Public Policy
> Department of Economics and CPPA
> University of Massachusetts
> Amherst, MA 01003
> Tel +1-413-545-6329 Fax +1-413-545-2921
> Email mash at econs.umass.edu
> http://people.umass.edu/maash



