[R] Problem with rowMeans()

Fri Jun 13 02:16:21 CEST 2008

ss wrote:
> Thanks, Erik. I will try your code soon.
> 
> I did this first:
> 
>  > data <- 
> read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt', 
> row.names = NULL ,header=TRUE, fill=TRUE)
>  > class(data[[3]])
> [1] "factor"
>  > is.numeric(data[[3]])
> [1] FALSE
>  >
> 
> So it is not numeric but 'factor' instead.
> Can I convert this column to numeric?

That depends.  My first question, if I were you, would be 'Why does 
read.table assign the class factor to this column?'

Then read ?factor, paying particular attention to,

   In particular,
      'as.numeric' applied to a factor is meaningless, and may happen by
      implicit coercion.  To transform a factor 'f' to its original
      numeric values, 'as.numeric(levels(f))[f]' is recommended and
      slightly more efficient than 'as.numeric(as.character(f))'.
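
To see what that passage means in practice, here is a minimal sketch with 
a toy factor.  The values are made up for illustration and are not taken 
from your file:

```r
# Toy factor standing in for a column that read.table stored as a factor.
f <- factor(c("-1.6", "0.18", "1.63", "-0.76"))

# as.numeric() on the factor itself returns the internal integer codes
# (positions in levels(f)), not the numbers you see printed.
codes <- as.numeric(f)

# Both idioms mentioned in ?factor recover the original values.
vals1 <- as.numeric(levels(f))[f]
vals2 <- as.numeric(as.character(f))

print(codes)
print(vals1)
print(vals2)
```

Note that both idioms only give sensible results once every level really 
is a number; any level that is not will come back as NA with a warning.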

You might also try levels(data[[3]]), but the list will be long.  The 
goal is to find the value(s) that are causing read.table to assign the 
class 'factor' to this column.  You have lots of values though, so I 
might try something like the following:

setdiff(levels(data[[3]]), 
as.character(as.numeric(levels(data[[3]])[data[[3]]])))

and look at what that returns (you'll get a warning about NAs being 
introduced by coercion).  Hopefully that tells you which entries are 
not numeric.
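
A slightly more direct variant, as a sketch only: test each level on its 
own and keep the ones that fail to convert.  The factor 'x' and its stray 
entries ("n/a", "null") below are hypothetical stand-ins for your column:

```r
# Toy column: numeric strings plus two stray, non-numeric entries.
x <- factor(c("-1.6", "0.18", "n/a", "1.63", "0.50", "null"))

lv <- levels(x)

# Levels that cannot be parsed as numbers become NA; the coercion
# warning is expected here, so it is suppressed.
bad <- lv[is.na(suppressWarnings(as.numeric(lv)))]
print(bad)

# For comparison: a round-trip through as.character() can also list
# valid numbers whose text differs from R's own formatting ("0.50").
roundtrip <- setdiff(lv, as.character(suppressWarnings(as.numeric(lv))))
print(roundtrip)
```

If the stray entries turn out to be a missing-value code, the na.strings 
argument of read.table can map them to NA when the file is read.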

I see your new email, so that's that!

Good luck,
Erik

> 
> Allen
> 
> On Thu, Jun 12, 2008 at 7:48 PM, Erik Iverson
> <iverson at biostat.wisc.edu> wrote:
> 
> 
> 
>     ss wrote:
> 
>         It is:
> 
>          > data <-
>         read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>         row.names = NULL ,header=TRUE, fill=TRUE)
>          > class(data[3])
>         [1] "data.frame"
>          >
> 
> 
>     Oops, should have said  class(data[[3]]) and
>     is.numeric(data[[3]])
> 
>     See ?Extract
> 
> 
> 
>         And if I try to use as.matrix(read.table()), I got:
> 
>          > data <-
>         as.matrix(read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>         + row.names = NULL, header=TRUE, fill=TRUE))
>          > data[1:4,1:4]
>             Probe_ID       Gene_Symbol M16012391010920 M16012391010525
>         [1,] "A_23_P105862" "13CDNA73"  "-1.6"          " 0.16"      
>         [2,] "A_23_P76435"  "15E1.2"    "0.18"          " 0.59"      
>         [3,] "A_24_P402115" "15E1.2"    "1.63"          "-0.62"      
>         [4,] "A_32_P227764" "15E1.2"    "-0.76"         "-0.42"
>         You see they are surrounded by "".
> 
>         I don't see such if I just use >read.table
> 
> 
>     That is because matrices (objects of class 'matrix') are of
>     homogeneous type.  as.matrix() therefore converts every column to
>     character (including the numbers), which you certainly do NOT want.
> 
>     You want a data.frame; I will provide an example of what I think
>     you are after.
> 
>     Try the following commands and see how they compare to your
>     situation: these work for me.
> 
>     test <- data.frame(x = factor(rep(c("A", "B"), each = 13)), y =
>     rnorm(26), z = rnorm(26))
> 
>     test
> 
>     class(test)
> 
>     is.numeric(test[[2]])
> 
>     is.numeric(test[[3]])
> 
>     rowMeans(test)
> 
>     rowMeans(test[2:3])
> 
>          > data <-
>         read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>         row.names = NULL ,header=TRUE, fill=TRUE)
>          > data[1:4,1:4]
>              Probe_ID Gene_Symbol M16012391010920 M16012391010525
>         1 A_23_P105862    13CDNA73            -1.6            0.16
>         2  A_23_P76435      15E1.2            0.18            0.59
>         3 A_24_P402115      15E1.2            1.63           -0.62
>         4 A_32_P227764      15E1.2           -0.76           -0.42
> 
> 
>         Thanks,
>              Allen
> 
> 
> 
>         On Thu, Jun 12, 2008 at 7:34 PM, Erik Iverson
>         <iverson at biostat.wisc.edu> wrote:
> 
> 
> 
>            ss wrote:
> 
>                Hi Wacek,
> 
>                Yes, data is a data frame, not a matrix.
> 
>                    is.numeric(data[3])
> 
>                [1] FALSE
> 
> 
>            What is class(data[3])?
> 
> 
>                But I looked at column 3 and it looks okay.  There are
>                a few NAs, and I did not find anything strange.
> 
>                Any suggestions?
> 
>                Thanks,
>                     Allen
> 
> 
> 
>                On Thu, Jun 12, 2008 at 7:01 PM, Wacek Kusnierczyk
>                <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no> wrote:
> 
>                    ss wrote:
> 
>                        Thank you very much, Wacek! It works very well.
>                        But there is a minor problem. I did the following:
> 
>                            data <-
>                            read.table('E-TABM-1-processed-data-1342561271_log2_with_symbols.txt',
>                            + row.names = NULL, header=TRUE, fill=TRUE)
> 
>                    looks like you have a data frame, not a matrix
> 
> 
>                            dim(data)
> 
>                        [1] 23963    85
> 
>                            data[1:4,1:4]
> 
>                              Probe_ID Gene_Symbol M16012391010920 M16012391010525
>                        1 A_23_P105862    13CDNA73            -1.6            0.16
>                        2  A_23_P76435      15E1.2            0.18            0.59
>                        3 A_24_P402115      15E1.2            1.63           -0.62
>                        4 A_32_P227764      15E1.2           -0.76           -0.42
> 
>                            data1<-data[sapply(data, is.numeric)]
>                            dim(data1)
> 
>                        [1] 23963    82
> 
>                            data1[1:4,1:4]
> 
>                          M16012391010525 M16012391010843 M16012391010531 M16012391010921
>                        1            0.16           -0.23           -1.40            0.90
>                        2            0.59            0.28           -0.30            0.08
>                        3           -0.62           -0.62           -0.22           -0.18
>                        4           -0.42            0.01            0.28           -0.79
> 
>                        You will notice that, after using
>                        'data[sapply(data, is.numeric)]' to get data1,
>                        the first sample in data, 'M16012391010920',
>                        is missing from data1.
> 
>                        Any further suggestions?
> 
>                    surely there must be an entry in column 3 that makes it
>                    non-numeric.  what does is.numeric(data[3]) say?  (NAs
>                    should not make a column non-numeric, unless there are
>                    only NAs there, which is not the case here.)  check your
>                    data for non-numeric entries in column 3; there may be
>                    a typo.
> 
>                    vQ
> 
> 
> 
>                ______________________________________________
>                R-help at r-project.org mailing list
>                https://stat.ethz.ch/mailman/listinfo/r-help
>                PLEASE do read the posting guide
>                http://www.R-project.org/posting-guide.html
>                and provide commented, minimal, self-contained,
>                reproducible code.
> 
> 
>