[R] Need help with table() and apply()

Stuart Luppescu slu at ccsr.uchicago.edu
Sun Nov 20 23:57:35 CET 2011


On Sun, 2011-11-20 at 17:43 -0500, jim holtman wrote:
> It might be good if you told us the problem you are trying to solve.
> Why do you have factors in the dataframe?  Can you just have the
> values?  Do you want to count the 'levels' of the factors in a row, or
> do you want to count the numbers they represent (in your case they are
> the same, so I wonder why you used factors).
> 
> Here is one way of doing it to count what the 'level' values are:
> 
> > apply(df, 1, function(x) tabulate(as.integer(x), nbins = 4))
>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,]    2    3    2    2    1    2    1    2    2     2
> [2,]    1    4    7    3    6    5    0    1    1     2
> [3,]    3    1    1    4    2    1    6    5    5     3
> [4,]    4    2    0    1    1    2    3    2    2     3
> >
> 
> So tell us what you want to do, not how you want to do it.

I see. The reason I converted the original numeric values into factors
with 4 levels was so that table() would report 0 counts for levels that
did not occur. Your method works very well and will save me the extra
step of converting to factors. Also, thanks for explaining why cbind()
turns factors into their integer codes. I appreciate the help.
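A minimal sketch of that factor-free route, assuming the ratings are kept as plain integer codes 1 to 4 in a matrix (the small matrix `m` is hypothetical illustration data, not the thread's): `tabulate()` with `nbins = 4` reports a 0 for any value that never occurs, and `t()` turns the result into one row per observation instead of one column per observation.

```r
# Hypothetical example data: 3 observations (rows) x 5 ratings (columns),
# each rating an integer code from 1 to 4.
m <- rbind(c(1, 2, 2, 4, 4),
           c(3, 3, 3, 3, 1),
           c(2, 2, 4, 4, 4))

# tabulate() counts occurrences of 1..nbins; values that never occur get 0.
# apply(..., 1, ...) returns one column per row of m, so transpose the result
# to get one row per observation and one column per rating value.
counts <- t(apply(m, 1, tabulate, nbins = 4))
colnames(counts) <- 1:4
counts
#      1 2 3 4
# [1,] 1 2 0 2
# [2,] 1 0 4 0
# [3,] 0 2 0 3
```

Without the `t()`, the layout is the same as the 4 x 10 output quoted above: one column per observation.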

> 
> 2011/11/20 jim holtman <jholtman at gmail.com>:
> > The answer to your question as to why you had to convert back to
> > factors is that you "undid" the factors when you did the 'cbind' to
> > create the dataframe.  Here is what you should have done:
> >
> >> df <- data.frame(rating.1 , rating.2 , rating.3 , rating.4 ,
> > +                          rating.5 , rating.6 , rating.7 , rating.8 ,
> > +                          rating.9 , rating.10)
> >>
> >> str(df)
> > 'data.frame':   10 obs. of  10 variables:
> >  $ rating.1 : Factor w/ 4 levels "1","2","3","4": 4 1 2 4 3 2 4 1 2 1
> >  $ rating.2 : Factor w/ 4 levels "1","2","3","4": 2 3 2 3 2 2 1 3 3 3
> >  $ rating.3 : Factor w/ 4 levels "1","2","3","4": 3 1 1 3 2 1 3 3 1 3
> >  $ rating.4 : Factor w/ 4 levels "1","2","3","4": 4 2 2 2 2 4 3 3 3 4
> >  $ rating.5 : Factor w/ 4 levels "1","2","3","4": 1 2 2 2 1 2 3 3 4 4
> >  $ rating.6 : Factor w/ 4 levels "1","2","3","4": 3 2 2 1 2 2 3 3 3 2
> >  $ rating.7 : Factor w/ 4 levels "1","2","3","4": 3 4 2 2 4 3 4 4 4 4
> >  $ rating.8 : Factor w/ 4 levels "1","2","3","4": 4 1 3 1 3 1 4 4 3 3
> >  $ rating.9 : Factor w/ 4 levels "1","2","3","4": 4 4 2 3 2 4 3 2 3 2
> >  $ rating.10: Factor w/ 4 levels "1","2","3","4": 1 2 1 3 2 2 3 1 1 1
> >
> > Notice that the factors are maintained.
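The same check can be done programmatically instead of by reading `str()` output. A minimal sketch, where `r1` and `r2` are hypothetical stand-ins for `rating.1`, `rating.2`, and so on:

```r
# Hypothetical stand-ins for the thread's rating.N factors.
r1 <- factor(c(4, 1, 2), levels = 1:4)
r2 <- factor(c(2, 3, 2), levels = 1:4)

df <- data.frame(r1, r2)

# data.frame() keeps each column's class and its full set of levels.
sapply(df, is.factor)   # TRUE for both columns
levels(df$r1)           # "1" "2" "3" "4"
```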
> >
> > When having problems, break up the steps and see what happens at each
> > one.  Here is the output of your 'cbind':
> >
> >> x <- (cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
> > +                          rating.5 , rating.6 , rating.7 , rating.8 ,
> > +                          rating.9 , rating.10)
> > + )
> >> str(x)
> >  int [1:10, 1:10] 4 1 2 4 3 2 4 1 2 1 ...
> >  - attr(*, "dimnames")=List of 2
> >  ..$ : NULL
> >  ..$ : chr [1:10] "rating.1" "rating.2" "rating.3" "rating.4" ...
> >>
> >
> > Notice that it is just an integer matrix.
> >
> > Also, if you had looked at the help page for cbind (?cbind), you would
> > have seen:
> >
> > In the default method, all the vectors/matrices must be atomic (see
> > vector) or lists. Expressions are not allowed. Language objects (such
> > as formulae and calls) and pairlists will be coerced to lists: other
> > objects (such as names and external pointers) will be included as
> > elements in a list result. Any classes the inputs might have are
> > discarded (in particular, factors are replaced by their internal
> > codes).
> >
> > Notice the last sentence.
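To see what "replaced by their internal codes" means when the labels are not 1, 2, 3, ..., here is a small sketch with hypothetical factors `f` and `g` whose labels differ from their codes:

```r
# Levels are sorted, so f has levels "10", "20" (codes 1, 2)
# and g has levels "a", "b" (codes 1, 2).
f <- factor(c(20, 10, 20))
g <- factor(c("b", "a", "a"))

x <- cbind(f, g)
x           # integer matrix: column f holds 2 1 2, column g holds 2 1 1
typeof(x)   # "integer"; the labels "10", "20", "a", "b" are gone
```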
> >
> > 2011/11/20 Stuart Luppescu <slu at ccsr.uchicago.edu>:
> >> Hello, I am having trouble getting counts of values in rows of a data
> >> frame. I'm trying to use apply, but it's not working.
> >>
> >> This gives a sample of the kind of data I'm working with:
> >>
> >> rating.1 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
> >> rating.2 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
> >> rating.3 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
> >> rating.4 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
> >> rating.5 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
> >> rating.6 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
> >> rating.7 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
> >> rating.8 <- factor(sample(1:4, size=10, replace=TRUE), levels=1:4)
> >> rating.9 <- factor(sample(2:4, size=10, replace=TRUE), levels=1:4)
> >> rating.10 <- factor(sample(1:3, size=10, replace=TRUE), levels=1:4)
> >>
> >> df <- as.data.frame(cbind(rating.1 , rating.2 , rating.3 , rating.4 ,
> >>                          rating.5 , rating.6 , rating.7 , rating.8 ,
> >>                          rating.9 , rating.10))
> >>
> >> for(i in 1:10) {
> >>  df[,i] <- factor(df[,i], levels=1:4)
> >> }
> >>
> >> [Aside: why does the original df have columns of class "integer" when
> >> the original data are factors? Why is it necessary to reconvert them
> >> into factors? Also, is it possible to do this without a for loop?]
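On the last question: one common loop-free idiom, sketched here on a hypothetical two-column data frame, is to `lapply()` the conversion over the columns and assign the result back into `df[]`, which keeps the object a data frame. Building the data frame with `data.frame()` in the first place, as suggested earlier in the thread, avoids the conversion altogether.

```r
# Hypothetical data frame of integer codes, like the one produced by
# as.data.frame(cbind(...)) in the question.
df <- data.frame(rating.1 = c(4L, 1L, 2L),
                 rating.2 = c(2L, 3L, 2L))

# lapply() calls factor(column, levels = 1:4) on every column; assigning
# into df[] (rather than df) keeps the data.frame class, names and row names.
df[] <- lapply(df, factor, levels = 1:4)
str(df)
```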
> >>
> >> If I do this:
> >>
> >> apply(df[,1:10], 1, table)
> >>
> >> I get a 4x10 array, the contents of which I do not understand.
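A likely reason the result is hard to read, sketched on hypothetical data: `apply()` first converts the data frame with `as.matrix()`, which turns factor columns into character vectors, so each row is tabulated without any knowledge of the unused levels. When every row's table happens to have the same length, `apply()` binds the tables into a matrix with one column per row of the data frame (the 4 x 10 result); otherwise it returns a list, as it does here.

```r
# Hypothetical data: two factor columns sharing the levels 1:4.
df <- data.frame(r1 = factor(c(1, 2, 2), levels = 1:4),
                 r2 = factor(c(1, 3, 4), levels = 1:4))

# apply() works on as.matrix(df), a character matrix: the levels are lost.
m <- as.matrix(df)
typeof(m)        # "character"

# Row 1 contains only "1", rows 2 and 3 contain two distinct values, so the
# per-row tables differ in length and apply() returns a list.
res <- apply(df, 1, table)
is.list(res)     # TRUE
```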
> >>
> >> apply(df[,1:10], 2, table)
> >>
> >> gives 10 tables for the columns, but it leaves out factor levels which
> >> do not occur. For example,
> >>
> >> $ rating.6 : 'table' int [1:3(1d)] 7 1 2
> >>  ..- attr(*, "dimnames")=List of 1
> >>  .. ..$ : chr [1:3] "1" "2" "3"
> >>
> >> lapply(df[, 1:10], table)
> >>
> >> gives tables of the columns keeping the levels with 0 counts:
> >>
> >> $ rating.6 : 'table' int [1:4(1d)] 7 1 2 0
> >>  ..- attr(*, "dimnames")=List of 1
> >>  .. ..$ : chr [1:4] "1" "2" "3" "4"
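The difference comes from how the columns reach `table()`. `lapply()` (and `sapply()`) pass each column of the data frame as it is, still a factor, so all four levels are counted; `apply()` goes through `as.matrix()` first and the levels are lost. A sketch on hypothetical data, using `sapply()` so that the equal-length tables collapse into a levels-by-columns matrix:

```r
# Hypothetical data: two factor columns sharing the levels 1:4.
df <- data.frame(r1 = factor(c(1, 2, 2), levels = 1:4),
                 r2 = factor(c(1, 3, 3), levels = 1:4))

# Each column reaches table() as a factor, so unused levels are counted as 0.
# All tables have length 4, so sapply() returns a 4 x 2 matrix with the
# levels as row names and the column names as column names.
col_counts <- sapply(df, table)
col_counts
```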
> >>
> >> But I really want tables of the rows. Do I have to write my own function
> >> to count the numbers of values?
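A custom counting function is not strictly needed. One option, sketched on a hypothetical three-column data frame, is to restore the levels inside the function given to `apply()`, so that every row's table has all four entries and the results bind into a matrix; `tabulate()` on the integer codes is another route.

```r
# Hypothetical data: three factor columns sharing the levels 1:4.
df <- data.frame(r1 = factor(c(1, 2, 2), levels = 1:4),
                 r2 = factor(c(1, 3, 4), levels = 1:4),
                 r3 = factor(c(4, 3, 2), levels = 1:4))

# apply() hands each row over as a character vector, so
# factor(x, levels = 1:4) restores the full set of levels before table()
# counts them. t() puts observations in rows and rating values in columns.
row_counts <- t(apply(df, 1, function(x) table(factor(x, levels = 1:4))))
row_counts
#      1 2 3 4
# [1,] 2 0 0 1
# [2,] 0 1 2 0
# [3,] 0 2 0 1
```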
> >>
> >> Thanks in advance.
> >>
> >> --
> >> Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
> >> University of Chicago -=- CCSR
> >> Father of 才文 and 智奈美  -=- Kernel 3.0.6-gentoo
> >>  You say yourself it wasn't reproducible. So it could have been anything
> >> that "crashed" your R, cosmic radiation, a bolt of lightning reversing a
> >> bit in your computer memory, ... :-) -- Martin Maechler (replying to a
> >> bug report) R-devel (July 2005)
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
> >
> > --
> > Jim Holtman
> > Data Munger Guru
> >
> > What is the problem that you are trying to solve?
> > Tell me what you want to do, not how you want to do it.
> >
> 
> 
> 

-- 
Stuart Luppescu -=- slu .at. ccsr.uchicago.edu
University of Chicago -=- CCSR
Father of 才文 and 智奈美 -=- Kernel 3.0.6-gentoo
 There are actual error messages, and until you show them, we can not
 help as the mind reading machine is currently off for repairs.
 -- Dirk Eddelbuettel (after reports about errors with R CMD check)
    R-help (July 2010)


