[R] Calculating proportions from a data frame rather than a table

Deepayan Sarkar deepayan.sarkar at gmail.com
Thu Oct 4 00:22:06 CEST 2007


On 10/3/07, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> Thank you. It comes close but not exactly what I wanted. I had to
> scrap my column that contained character values. That column noted the
> name of the study. Let me try to show you here
>
> Best if viewed in courier font
>
> > coinfection
>           study HPV6 HPV11 CoInfect other
> 1  Wiatrak 2004   31    23        4     0
> 2 Draganov 2006    6    14        3     0
> 3  Gabbott 1997   19    24        1     0
> 4   Gerein 2005   17    14        0     7
> 5  Michael 2005    8     5        0     1
> 6    Rabah 2001   29    32        0     0
> 7  Maloney 2006    4     4        7     0
>
> > str(coinfection)
> 'data.frame':   7 obs. of  5 variables:
>   $ study   : chr  "Wiatrak 2004" "Draganov 2006" "Gabbott 1997" "Gerein 2005" ...
>   $ HPV6    : num  31 6 19 17 8 29 4
>   $ HPV11   : num  23 14 24 14 5 32 4
>   $ CoInfect: num  4 3 1 0 0 0 7
>   $ other   : num  0 0 0 7 1 0 0
>
> I had tried the following and was getting nowhere
> > as.table(coinfection)
> Error in as.table.default(coinfection) : cannot coerce into a table
> > as.table(coinfection[,-1])
> Error in as.table.default(coinfection[, -1]) :
>         cannot coerce into a table
>
> Thanks to you I was able to make some progress.
>
> > as.table(as.matrix(coinfection))
>   study         HPV6 HPV11 CoInfect other
> 1 Wiatrak 2004  31   23    4        0
> 2 Draganov 2006  6   14    3        0
> 3 Gabbott 1997  19   24    1        0
> 4 Gerein 2005   17   14    0        7
> 5 Michael 2005   8    5    0        1
> 6 Rabah 2001    29   32    0        0
> 7 Maloney 2006   4    4    7        0
> SO FAR THIS LOOKS GOOD BUT THEN LOOK
>
>
> > prop.table(as.table(as.matrix(coinfection)),1)#the main reason for doing this
> Error in sum(..., na.rm = na.rm) : invalid 'type' (character) of argument
>
> > prop.table(as.table(as.matrix(coinfection[,-1])),1)#this is to get rid of the variable called "study"
>         HPV6      HPV11   CoInfect      other
> 1 0.53448276 0.39655172 0.06896552 0.00000000
> 2 0.26086957 0.60869565 0.13043478 0.00000000
> 3 0.43181818 0.54545455 0.02272727 0.00000000
> 4 0.44736842 0.36842105 0.00000000 0.18421053
> 5 0.57142857 0.35714286 0.00000000 0.07142857
> 6 0.47540984 0.52459016 0.00000000 0.00000000
> 7 0.26666667 0.26666667 0.46666667 0.00000000
>
> WORKS PERFECTLY, EXACTLY WHAT I WANTED EXCEPT I HAVE LOST THE NAME OF
> THE STUDY AND HAVE TO GO BACK TO LOOK AT WHICH DATA BELONGS TO WHICH
> STUDY. THIS WOULD NOT HAVE HAPPENED IF I HAD THE DATA IN ITS RAWEST
> FORM: A TWO COLUMN DATA FRAME WHERE COLUMN ONE WAS THE STUDY AND
> COLUMN 2 WAS A FACTOR (LEVELS BEING hpv 6, hpv 11, coinfection,
> other). SUCH A DATA FRAME WOULD HAVE HAD 253 rows. Then I could have
> used table(column1,column2) and I could have got all this data as a
> table and the study name would be preserved. It is not that big a deal
> that I have to look elsewhere to find the study name but it seems
> silly that I cannot analyze data that is not in the raw state. I am
> sure there is a way. I just do not know it.
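
The raw form is not needed for this, but for reference it can be rebuilt from the counts. A rough sketch, assuming the data frame printed above; the names `types`, `counts`, `raw` and `tab` are made up for illustration:

```r
# Data copied from the printout in the quoted message.
coinfection <- data.frame(
  study = c("Wiatrak 2004", "Draganov 2006", "Gabbott 1997", "Gerein 2005",
            "Michael 2005", "Rabah 2001", "Maloney 2006"),
  HPV6     = c(31, 6, 19, 17, 8, 29, 4),
  HPV11    = c(23, 14, 24, 14, 5, 32, 4),
  CoInfect = c(4, 3, 1, 0, 0, 0, 7),
  other    = c(0, 0, 0, 7, 1, 0, 0),
  stringsAsFactors = FALSE
)

types  <- c("HPV6", "HPV11", "CoInfect", "other")
counts <- as.matrix(coinfection[, types])

# One row per patient: repeat each (study, type) pair by its count.
# as.vector() reads the matrix column by column, so the study names cycle
# fastest and each type is held for one full pass over the studies.
n <- as.vector(counts)
raw <- data.frame(
  study = rep(rep(coinfection$study, times = length(types)), times = n),
  type  = factor(rep(rep(types, each = nrow(coinfection)), times = n),
                 levels = types)
)

# Cross-tabulate; the study names become the row names of the table.
tab <- table(raw$study, raw$type)
prop.table(tab, 1)
```

Note that table() sorts the study names alphabetically, so the row order differs from the original data frame.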

Try making $study the row names (which they are for your "table"), and
everything should be fine:

row.names(coinfection) <- coinfection$study
coinfection$study <- NULL
prop.table(as.matrix(coinfection), 1) # row proportions, as in your call; etc
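
Spelled out in full, this might look like the following. It is only a sketch: the data frame is retyped from the printout in the quoted message, and `props` is just an illustrative name.

```r
# Data copied from the printout in the quoted message.
coinfection <- data.frame(
  study = c("Wiatrak 2004", "Draganov 2006", "Gabbott 1997", "Gerein 2005",
            "Michael 2005", "Rabah 2001", "Maloney 2006"),
  HPV6     = c(31, 6, 19, 17, 8, 29, 4),
  HPV11    = c(23, 14, 24, 14, 5, 32, 4),
  CoInfect = c(4, 3, 1, 0, 0, 0, 7),
  other    = c(0, 0, 0, 7, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Move the study names into the row names, then drop the character column
# so that as.matrix() returns a numeric matrix rather than a character one.
row.names(coinfection) <- coinfection$study
coinfection$study <- NULL

# margin = 1 gives proportions within each row, i.e. within each study.
props <- prop.table(as.matrix(coinfection), 1)
round(props, 3)
```

The as.table() step is not required here, since prop.table() accepts a plain numeric matrix and keeps its row and column names.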

-Deepayan


