[R] Problem with rowMeans()

Erik Iverson iverson at biostat.wisc.edu
Fri Jun 13 01:48:25 CEST 2008



ss wrote:
> It is:
> 
>  > data <- 
> read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', 
> row.names = NULL ,header=TRUE, fill=TRUE)
>  > class(data[3])
> [1] "data.frame"
>  >
> 

Oops, I should have asked for class(data[[3]]) and
is.numeric(data[[3]]), with double brackets.

See ?Extract
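
To make the difference concrete, here is a small sketch on a made-up data frame (the name 'df' and its columns are only for illustration, not your data). Single brackets return a one-column data.frame, while double brackets return the column itself:

```r
# Hypothetical toy data frame; 'df' and its columns are illustrative only
df <- data.frame(id = c("a", "b", "c"), value = c(1.5, NA, 3.2))

sub_df  <- df[2]     # single bracket: a one-column data.frame
sub_vec <- df[[2]]   # double bracket: the column itself (a numeric vector)

class(sub_df)        # "data.frame"
class(sub_vec)       # "numeric"
is.numeric(sub_df)   # FALSE, because a data.frame is a list, not a numeric vector
is.numeric(sub_vec)  # TRUE, even though one entry is NA
```

So is.numeric(data[3]) returning FALSE says nothing about the column's contents; is.numeric(data[[3]]) is the informative check.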

> 
> And if I try to use as.matrix(read.table()), I got:
> 
>  >data 
> <-as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
> + row.names = NULL ,header=TRUE, fill=TRUE))
>  > data[1:4,1:4]
>      Probe_ID       Gene_Symbol M16012391010920 M16012391010525
> [1,] "A_23_P105862" "13CDNA73"  "-1.6"          " 0.16"       
> [2,] "A_23_P76435"  "15E1.2"    "0.18"          " 0.59"       
> [3,] "A_24_P402115" "15E1.2"    "1.63"          "-0.62"       
> [4,] "A_32_P227764" "15E1.2"    "-0.76"         "-0.42" 
> 
> You see they are surrounded by "".
> 
> I don't see such if I just use >read.table
> 

That is because a matrix (an object of class 'matrix') holds a single 
type.  With character columns present, as.matrix() converts everything 
to character (including the numbers), which you certainly do NOT want.
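
A minimal sketch of that coercion, using a made-up two-row data frame (all names here are illustrative assumptions). If a matrix is really needed, one option is to keep only the numeric columns before converting:

```r
# Hypothetical toy data; names are illustrative only
df <- data.frame(id = c("p1", "p2"), s1 = c(-1.6, 0.18), s2 = c(0.16, 0.59),
                 stringsAsFactors = FALSE)

m_all <- as.matrix(df)                          # mixed columns: all character
m_num <- as.matrix(df[sapply(df, is.numeric)])  # numeric columns only

typeof(m_all)    # "character"
typeof(m_num)    # "double"
rowMeans(m_num)  # works, since the matrix is numeric
```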

You want a data.frame.  I will provide an example of what I think you are 
after.

Try the following commands and see how they compare to your situation.  
They behave as expected for me.

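# A factor column plus two numeric columns
test <- data.frame(x = factor(rep(c("A", "B"), each = 13)),
                   y = rnorm(26), z = rnorm(26))

test

class(test)            # "data.frame"

is.numeric(test[[2]])  # TRUE

is.numeric(test[[3]])  # TRUE

rowMeans(test)         # error: the factor column x makes this non-numeric

rowMeans(test[2:3])    # works: only the numeric columns y and z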
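
If a column that ought to be numeric is not, as appears to be the case for your column 3, something along these lines may help locate the offending entries. This is only a sketch on a made-up vector; the stray token "n/a" is an assumption, and I do not know what is actually in your file:

```r
# Hypothetical column as it might look after read.table(): one stray token
# ("n/a" here is made up) forces the whole column to character
x <- c("-1.6", "0.18", "n/a", NA, "1.63")

# Convert via character so this also works if the column is a factor
x_num <- suppressWarnings(as.numeric(as.character(x)))

# Entries that were present but did not parse as numbers
bad <- which(!is.na(x) & is.na(x_num))
bad      # positions of the offending entries
x[bad]   # the offending values themselves
```

On your data you would use data[[3]] in place of x.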

>  > data <- 
> read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', 
> row.names = NULL ,header=TRUE, fill=TRUE)
>  > data[1:4,1:4]
>       Probe_ID Gene_Symbol M16012391010920 M16012391010525
> 1 A_23_P105862    13CDNA73            -1.6            0.16
> 2  A_23_P76435      15E1.2            0.18            0.59
> 3 A_24_P402115      15E1.2            1.63           -0.62
> 4 A_32_P227764      15E1.2           -0.76           -0.42
> 
> 
> Thanks,
>       Allen
> 
> 
> 
> On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson <iverson at biostat.wisc.edu 
> <mailto:iverson at biostat.wisc.edu>> wrote:
> 
> 
> 
>     ss wrote:
> 
>         Hi Wacek,
> 
>         Yes, data is a data frame, not a matrix.
> 
>             is.numeric(data[3])
> 
>         [1] FALSE
> 
> 
>     what is class(data[3])
> 
> 
>         But I looked at column 3 and it looks okay. There are a few NAs,
>         and I did not find anything strange.
> 
>         Any suggestions?
> 
>         Thanks,
>              Allen
> 
> 
> 
>         On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk <
>         Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
>         <mailto:Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>> wrote:
> 
>             ss wrote:
> 
>                 Thank you very much, Wacek! It works very well.
>                 But there is a minor problem. I did the following:
> 
>                     data <-
> 
>                 read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>                 +row.names = NULL ,header=TRUE, fill=TRUE)
> 
>             looks like you have a data frame, not a matrix
> 
> 
>                     dim(data)
> 
>                 [1] 23963    85
> 
>                     data[1:4,1:4]
> 
>                      Probe_ID Gene_Symbol M16012391010920 M16012391010525
>                 1 A_23_P105862    13CDNA73            -1.6            0.16
>                 2  A_23_P76435      15E1.2            0.18            0.59
>                 3 A_24_P402115      15E1.2            1.63           -0.62
>                 4 A_32_P227764      15E1.2           -0.76           -0.42
> 
>                     data1<-data[sapply(data, is.numeric)]
>                     dim(data1)
> 
>                 [1] 23963    82
> 
>                     data1[1:4,1:4]
> 
>                   M16012391010525 M16012391010843 M16012391010531 M16012391010921
>                 1            0.16           -0.23           -1.40            0.90
>                 2            0.59            0.28           -0.30            0.08
>                 3           -0.62           -0.62           -0.22           -0.18
>                 4           -0.42            0.01            0.28           -0.79
> 
>                 You will notice that, after using 'data[sapply(data,
>                 is.numeric)]' to get data1, the first sample in data,
>                 'M16012391010920', is missing from data1.
> 
>                 Any further suggestions?
> 
>             surely there must be an entry in column 3 that makes it
>             non-numeric.  what does is.numeric(data[3]) say?  (NAs should
>             not make a column non-numeric, unless there are only NAs
>             there, which is not the case here.)  check your data for
>             non-numeric entries in column 3, there can be a typo.
> 
>             vQ
> 
> 
>                [[alternative HTML version deleted]]
> 
>         ______________________________________________
>         R-help at r-project.org <mailto:R-help at r-project.org> mailing list
> 
>         https://stat.ethz.ch/mailman/listinfo/r-help
>         PLEASE do read the posting guide
>         http://www.R-project.org/posting-guide.html
>         and provide commented, minimal, self-contained, reproducible code.
> 
>


