[BioC] DESeq not recognizing row.names

Steve Lianoglou lianoglou.steve at gene.com
Fri May 24 23:51:25 CEST 2013


Hi Alicia,

On Fri, May 24, 2013 at 2:38 PM, Alicia R. Pérez-Porro
<alicia.r.perezporro at gmail.com> wrote:
> Hi,
>
> I'm trying to use DESeq to find the differentially expressed genes in my
> datasets, and I'm finding that DESeq is not recognizing my row.names, so
> I can't create my cds.
>
> My .csv input file looks like:
>
> transcript_id,C4,CRL_2APR10,CRL_1_15JUL11,CRL_2_15JUL11
> comp1000201_c0_seq1,5.00,0.00,0.00,0.00
> comp1000297_c0_seq1,7.00,0.00,0.00,0.00
> comp100036_c0_seq1,0.00,0.00,0.00,0.00
> comp10003_c1_seq1,2.00,0.00,0.00,0.00
> comp100041_c0_seq1,3.00,0.00,0.00,0.00
> comp100041_c0_seq2,0.00,0.00,0.00,0.00
> comp100041_c0_seq3,0.00,0.00,0.00,0.00
> comp100051_c0_seq1,0.00,0.00,0.00,0.00
> comp1000890_c0_seq1,3.00,0.00,0.00,0.00
>
> This is what i'm running:
>
>> spercysts_vs_embryos = read.csv (
> +   file.choose(),
> +   header = TRUE,
> +   row.names=1,
> +   sep = ",",
> +   dec = ".")
>
>> head(spercysts_vs_embryos)
>                     C4 CRL_2APR10 CRL_1_15JUL11 CRL_2_15JUL11
> comp1000201_c0_seq1  5          0             0             0
> comp1000297_c0_seq1  7          0             0             0
> comp100036_c0_seq1   0          0             0             0
> comp10003_c1_seq1    2          0             0             0
> comp100041_c0_seq1   3          0             0             0
> comp100041_c0_seq2   0          0             0             0
>
>>cond = factor(c("SP", "SP", "EB", "EB"))
>
>> spercysts_vs_embryosDesign = data.frame(
> +   row.names = colnames( spercysts_vs_embryos ),
> +   condition = c( "SP", "SP", "EB", "EB" ),
> +   libType = c( "paired-end", "paired-end", "paired-end", "paired-end" ) )
>> spercysts_vs_embryosDesign
>               condition    libType
> C4                   SP paired-end
> CRL_2APR10           SP paired-end
> CRL_1_15JUL11        EB paired-end
> CRL_2_15JUL11        EB paired-end
>
>> str(spercysts_vs_embryos)
> 'data.frame': 307048 obs. of  4 variables:
>  $ C4           : num  5 7 0 2 3 0 0 0 3 0 ...
>  $ CRL_2APR10   : num  0 0 0 0 0 0 0 0 0 0 ...
>  $ CRL_1_15JUL11: num  0 0 0 0 0 0 0 0 0 10 ...
>  $ CRL_2_15JUL11: num  0 0 0 0 0 0 0 0 0 3 ...
>
> So, everything looks fine to me. But when I try to create my cds:

Everything isn't fine :-) The values in your columns have to be whole
numbers, not just of type "numeric". If you look at the source code of
`newCountDataSet`, you'll see right at the very top:

    countData <- as.matrix(countData)
    if (any(round(countData) != countData))
        stop("The countData is not integer.")

Which looks like the error you are getting here:

>> cds <-newCountDataSet(spercysts_vs_embryos, cond )
> Error in newCountDataSet(spercysts_vs_embryos, cond) :
>   The countData is not integer.

So the problem isn't NAs in your data.frame. Your first problem is that
some of the numbers in your count matrix do not round to themselves.
Comparing each value with its rounded version is a quick and easy way
to catch values that aren't whole numbers. Count data should consist of
whole numbers, and DESeq requires count data.
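
To see how that test behaves, here is a minimal sketch on toy data. The
matrices and the helper name `passes_check` are made up for illustration;
only the `round(m) != m` comparison comes from the source quoted above.

```r
# Toy matrices with hypothetical values (not taken from the original file)
whole      <- matrix(c(5, 7, 0, 2),   nrow = 2)  # stored as double, all whole
fractional <- matrix(c(5, 7.5, 0, 2), nrow = 2)  # contains one fractional value

# Hypothetical helper wrapping the same comparison newCountDataSet uses:
# TRUE when every value equals its rounded version
passes_check <- function(m) !any(round(m) != m)

passes_check(whole)       # TRUE
passes_check(fractional)  # FALSE
```

Note that the storage type is "double" in both cases. What the test cares
about is whether the values are whole, so a single fractional entry
anywhere in the table is enough to trigger the error.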

So, instead of checking for NA here:

> So, if i check what is happening:
>
>> which( is.na(spercysts_vs_embryos), arr.ind=TRUE )
>      row col

You might try to check which numbers are suspect:

R> which(round(spercysts_vs_embryos) != spercysts_vs_embryos, arr.ind=TRUE)
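
As a rough sketch of what that check returns and what you could do next,
here is the same idea on a small made-up data.frame. The object names
(`counts`, `bad`, `counts_int`) and the values are hypothetical. Whether
rounding is an acceptable fix depends on how your counts were produced, so
treat the last step as one option, not a recommendation.

```r
# Hypothetical stand-in for spercysts_vs_embryos, with two fractional values
counts <- data.frame(
  C4         = c(5, 7.25, 0),
  CRL_2APR10 = c(0, 0, 1.5),
  row.names  = c("comp1", "comp2", "comp3")
)

# Row/column positions of every value that is not a whole number
bad <- which(round(counts) != counts, arr.ind = TRUE)
bad

# The offending values themselves
as.matrix(counts)[bad]

# One possible fix, assuming rounding is acceptable for your data:
# round everything and store the result as an integer matrix
counts_int <- round(as.matrix(counts))
storage.mode(counts_int) <- "integer"
```

If `bad` comes back with zero rows, every value is already whole and the
problem lies elsewhere.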

HTH,
-steve

--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


