[R] Convert dataframe to table with counts where column names become row names

William Dunlap wdunlap at tibco.com
Thu Aug 6 23:17:32 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius
> Sent: Thursday, August 06, 2009 1:40 PM
> To: ghinkle
> Cc: r-help at r-project.org
> Subject: Re: [R] Convert dataframe to table with counts where 
> column names become row names
> 
> 
> On Aug 6, 2009, at 1:14 PM, ghinkle wrote:
> 
> >
> > Can anyone explain how best to go from a dataframe to a table (or  
> > better yet
> > a new dataframe) of counts, where the row names in the new table (or
> > dataframe) are the column names of the original df.
> >
> > start w/
> > DF1 =
> >           Pos1  Pos2 Pos3 ....
> > oligo1   G       C     A
> > oligo2   U       U     A
> > oligo3   G       C     C
> > oligo4   C       G     U
> > oligo5   A       A     G
> > .....
> 
>  > apply(DF1, 2, table)
>    Pos1 Pos2 Pos3
> A    1    1    2
> C    1    2    1
> G    2    1    1
> U    1    1    1

Note that apply() gives the wrong answer here if some column
of DF1 doesn't contain all 4 bases (G, A, U, C).  E.g.,
   > apply(DF1,2,table) # good
     Pos1 Pos2 Pos3
   A    1    1    2
   C    1    2    1
   G    2    1    1
   U    1    1    1
   > apply(DF1[-5,],2,table) # bad
        Pos1 Pos2 Pos3
   [1,]    1    2    2
   [2,]    2    1    1
   [3,]    1    1    1
apply() generally has problems with data.frames, since it
converts them to matrices, losing information unless all
columns are the same type and that type is atomic (no
attributes or non-trivial class).
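
To make the conversion problem concrete, here is a minimal sketch. The
DF1 below is a hypothetical reconstruction of the five rows shown in the
question, and `mixed` is an invented two-column example; neither comes
from the original poster's actual data.

```r
# Hypothetical reconstruction of the five rows shown in the question.
DF1 <- data.frame(
  Pos1 = c("G", "U", "G", "C", "A"),
  Pos2 = c("C", "U", "C", "G", "A"),
  Pos3 = c("A", "A", "C", "U", "G"),
  row.names = paste0("oligo", 1:5),
  stringsAsFactors = FALSE
)

# apply() works on as.matrix(DF1), which here is a character matrix.
m <- as.matrix(DF1)
is.character(m)

# With mixed column types, the conversion coerces everything to character.
mixed <- data.frame(id = 1:2, base = c("G", "A"), stringsAsFactors = FALSE)
typeof(as.matrix(mixed))

# Without oligo5, each column's table() has only three entries, and the
# entries are not the same bases in every column. That mismatch is what
# makes the combined result misaligned.
counts <- lapply(DF1[-5, ], table)
lapply(counts, names)
```

Looking at the names of each per-column table before combining them is a
cheap way to spot this kind of misalignment.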

sapply() will avoid that conversion to matrix and often does
better.  However, if not every column had all 4 bases sapply
would fail in the same way as apply.  In this case it would
be best to either preprocess the data.frame so that all the
columns are factors with the same 4 levels or to have the
call to sapply (or apply) do this conversion on the fly.  E.g.,

   > sapply(DF1a[-5,], function(x) table(factor(x,
   +     levels=c("G","A","U","C"))))
     Pos1 Pos2 Pos3
   G    2    1    0
   A    0    0    2
   U    1    1    1
   C    1    2    1
   > apply(DF1a[-5,], 2, function(x) table(factor(x,
   +     levels=c("G","A","U","C"))))
     Pos1 Pos2 Pos3
   G    2    1    0
   A    0    0    2
   U    1    1    1
   C    1    2    1
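
The other option mentioned above, preprocessing the data.frame once,
might look like the following sketch. DF1 is again a hypothetical
reconstruction of the question's data, and the names `bases`, `DF1f`
and `DF2` are chosen here for illustration only.

```r
# Hypothetical reconstruction of the five rows shown in the question.
DF1 <- data.frame(
  Pos1 = c("G", "U", "G", "C", "A"),
  Pos2 = c("C", "U", "C", "G", "A"),
  Pos3 = c("A", "A", "C", "U", "G"),
  row.names = paste0("oligo", 1:5),
  stringsAsFactors = FALSE
)
bases <- c("G", "A", "U", "C")

# Preprocess once: make every column a factor with the same four levels.
DF1f <- DF1
DF1f[] <- lapply(DF1, factor, levels = bases)

# Every table() now has four entries in the same order, so sapply() can
# combine them safely even when a base is missing from a column.
counts <- sapply(DF1f[-5, ], table)

# Transpose so positions become rows, as in the requested DF2 layout.
DF2 <- as.data.frame(t(counts))
DF2
```

Converting up front keeps later calls short, at the cost of changing the
column type from character to factor in the copy.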

By the way, another approach avoids the *apply calls and just
makes a two-way table of value by column name:

   > table(as.matrix(DF1), colnames(DF1)[col(DF1)])
      
       Pos1 Pos2 Pos3
     A    1    1    2
     C    1    2    1
     G    2    1    1
     U    1    1    1

This approach won't give an incorrect (misaligned) answer if some
base is not in the dataset, but, of course, it will not include a base
in the table if it is not in the whole dataset.  Calling factor() with
the desired levels in the call to table would fix that.
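
A sketch of that factor() fix, under the assumption that the data look
like the rows in the question. The subset `sub` (rows chosen so that U
never occurs) and the other variable names are illustrative only.

```r
# Hypothetical reconstruction of the five rows shown in the question.
DF1 <- data.frame(
  Pos1 = c("G", "U", "G", "C", "A"),
  Pos2 = c("C", "U", "C", "G", "A"),
  Pos3 = c("A", "A", "C", "U", "G"),
  row.names = paste0("oligo", 1:5),
  stringsAsFactors = FALSE
)
bases <- c("G", "A", "U", "C")

# Keep only oligo1 and oligo3, so that U does not occur anywhere.
sub <- DF1[c(1, 3), ]
m <- as.matrix(sub)

# Plain two-way table: the U row is missing entirely.
tab_plain <- table(m, colnames(m)[col(m)])

# Fixing the levels keeps a row of zeros for U.
tab <- table(factor(m, levels = bases), colnames(m)[col(m)])
tab

# t(tab) gives positions as rows and bases as columns.
t(tab)
```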

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

> 
> 
> Since tables are really matrices, the t() operation would 
> bring you to  
> your goal:
> 
>  > t( apply(DF1, 2, table) )
>       A C G U
> Pos1 1 1 2 1
> Pos2 1 2 1 1
> Pos3 2 1 1 1
> 
> 
> >
> > End with
> >
> > DF2 =
> >           G  A  U C
> > Pos1   2   1 1 1
> > Pos2   1  1  1  2
> > Pos3  1  2  1  1
> > ....
> >
> > I know how to generate the counts of each one column at a time using
> > "table(DF1$Pos1)".
> > Is there a way to do this in one step?  Should I just write a for  
> > loop for
> > each of the columns?
> ---
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 