[R] Calculating proportions from a data frame rather than a table

Farrel Buchinsky fjbuch at gmail.com
Thu Oct 4 00:13:34 CEST 2007


Thank you. That comes close, but it is not exactly what I wanted. I had
to drop the column that contained character values. That column holds
the name of the study. Let me try to show you here.

Best if viewed in courier font

> coinfection
          study HPV6 HPV11 CoInfect other
1  Wiatrak 2004   31    23        4     0
2 Draganov 2006    6    14        3     0
3  Gabbott 1997   19    24        1     0
4   Gerein 2005   17    14        0     7
5  Michael 2005    8     5        0     1
6    Rabah 2001   29    32        0     0
7  Maloney 2006    4     4        7     0

> str(coinfection)
'data.frame':   7 obs. of  5 variables:
  $ study   : chr  "Wiatrak 2004" "Draganov 2006" "Gabbott 1997" "Gerein 2005" ...
  $ HPV6    : num  31 6 19 17 8 29 4
  $ HPV11   : num  23 14 24 14 5 32 4
  $ CoInfect: num  4 3 1 0 0 0 7
  $ other   : num  0 0 0 7 1 0 0

I had tried the following and was getting nowhere:
> as.table(coinfection)
Error in as.table.default(coinfection) : cannot coerce into a table
> as.table(coinfection[,-1])
Error in as.table.default(coinfection[, -1]) :
        cannot coerce into a table

Thanks to you I was able to make some progress.

> as.table(as.matrix(coinfection))
  study         HPV6 HPV11 CoInfect other
1 Wiatrak 2004  31   23    4        0
2 Draganov 2006  6   14    3        0
3 Gabbott 1997  19   24    1        0
4 Gerein 2005   17   14    0        7
5 Michael 2005   8    5    0        1
6 Rabah 2001    29   32    0        0
7 Maloney 2006   4    4    7        0
So far this looks good, but then look:


> prop.table(as.table(as.matrix(coinfection)), 1)  # the main reason for doing this
Error in sum(..., na.rm = na.rm) : invalid 'type' (character) of argument
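
The error above is consistent with how as.matrix() treats a data frame
that mixes character and numeric columns: the whole matrix becomes
character, so sum() inside prop.table() has nothing numeric to add. A
minimal sketch on a made-up two-row data frame (the name toy and its
values are hypothetical, for illustration only):

```r
# Hypothetical toy data: one character column and two numeric columns.
toy <- data.frame(
  study = c("Study A", "Study B"),
  HPV6  = c(31, 6),
  HPV11 = c(23, 14),
  stringsAsFactors = FALSE
)

# With the character column included, every cell is coerced to character.
m_all <- as.matrix(toy)
print(typeof(m_all))

# With the character column dropped, the matrix stays numeric,
# which is what prop.table() needs.
m_num <- as.matrix(toy[, -1])
print(typeof(m_num))
```

Dropping the character column before the conversion is therefore what
makes the row proportions computable.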

> prop.table(as.table(as.matrix(coinfection[, -1])), 1)  # this is to get rid of the variable called "study"
        HPV6      HPV11   CoInfect      other
1 0.53448276 0.39655172 0.06896552 0.00000000
2 0.26086957 0.60869565 0.13043478 0.00000000
3 0.43181818 0.54545455 0.02272727 0.00000000
4 0.44736842 0.36842105 0.00000000 0.18421053
5 0.57142857 0.35714286 0.00000000 0.07142857
6 0.47540984 0.52459016 0.00000000 0.00000000
7 0.26666667 0.26666667 0.46666667 0.00000000

This works perfectly and is exactly what I wanted, except that I have
lost the name of the study and have to go back to see which row belongs
to which study. This would not have happened if I had the data in its
rawest form: a two-column data frame where column 1 was the study and
column 2 was a factor (levels being HPV 6, HPV 11, coinfection, other).
Such a data frame would have had 253 rows. Then I could have used
table(column1, column2) to get all of this as a table, and the study
name would have been preserved. It is not that big a deal that I have
to look elsewhere to find the study name, but it seems silly that I
cannot analyze data that are not in the raw state. I am sure there is
a way. I just do not know it.
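
One possible way to keep the study names, sketched here and not
necessarily the best one: move the study column into the row names of
the numeric matrix before calling prop.table(). The data frame is
rebuilt from the printed output above; the object names counts, props,
cells and long are my own choices for illustration.

```r
# Rebuild the data frame from the values printed earlier in this message.
coinfection <- data.frame(
  study    = c("Wiatrak 2004", "Draganov 2006", "Gabbott 1997",
               "Gerein 2005", "Michael 2005", "Rabah 2001",
               "Maloney 2006"),
  HPV6     = c(31, 6, 19, 17, 8, 29, 4),
  HPV11    = c(23, 14, 24, 14, 5, 32, 4),
  CoInfect = c(4, 3, 1, 0, 0, 0, 7),
  other    = c(0, 0, 0, 7, 1, 0, 0),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns and label the rows with the study names.
counts <- as.matrix(coinfection[, -1])
rownames(counts) <- coinfection$study

# Row-wise proportions; each row is now labelled by its study.
props <- prop.table(counts, 1)
print(round(props, 3))

# Optional: expand the counts back to the "raw" form, one row per case.
cells <- as.data.frame(as.table(counts))          # columns Var1, Var2, Freq
long  <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Var1", "Var2")]
names(long) <- c("study", "type")

# Tabulating the raw form gives back the original counts.
print(table(long$study, long$type))
```

The first half is all that is needed for the proportions. The second
half only shows that the aggregated counts and the raw two-column form
carry the same information.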




On 10/3/07, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> I think that what you need to do is
>
>         as.table(as.matrix(dff))
>
> E.g.
>
>         melvin <- data.frame(x=c(3,1,3,2),y=c(3,3,4,5))
>         clyde   <- as.table(as.matrix(melvin))
>         prop.table(clyde,1)
>
>            x         y
> A 0.5000000 0.5000000
> B 0.2500000 0.7500000
> C 0.4285714 0.5714286
> D 0.2857143 0.7142857
>
> HTH.
>
>                         cheers,
>
>                                 Rolf Turner
>



-- 
Farrel Buchinsky
GrandCentral Tel: (412) 567-7870


