[BioC] FW: DESeq analysis

Fatemehsadat Seyednasrollah fatsey at utu.fi
Fri Jun 29 13:26:26 CEST 2012


________________________________________
From: Fatemehsadat Seyednasrollah
Sent: Friday, June 29, 2012 12:00 PM
To: narges [guest]
Subject: RE: DESeq analysis

Hi,
First thanks a lot for your answer.
Actually I have used a subset of a public data from Bowtie(the Montgomery)
 and below are the reduced
codes of my work both from edgeR and DEseq. I wanted to know have I done
something wrong to obtain
very different answers ( 85 from DESeq and 407 from edgeR) or it is natural
to have this hude difference
and it is related to the algorithms?

edgeR:
> g1 <- read.delim ("count1.txt", row.names = 1)
> head(g1)
                NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F
ENSG00000000003        0        0        0        0        1        0        0
ENSG00000000005        0        0        0        0        0        0        0
ENSG00000000419       10       24       19       20       19        8       14
ENSG00000000457       17       15       13       18       21       18       21
ENSG00000000460        2        3        5        2        4        6        8
ENSG00000000938       20        4       35       16       10       17       19
                NA10847F
ENSG00000000003        0
ENSG00000000005        0
ENSG00000000419        6
ENSG00000000457       15
ENSG00000000460        2
ENSG00000000938        9
> group <- factor(rep(c("Male", "Female"), each= 4))
> dge <- DGEList(counts = g1 , group = group )
Calculating library sizes from column totals.
> dge <- calcNormFactors(dge)
> dge <- estimateCommonDisp(dge)
> sqrt (dge$common.dispersion)
[1] 0.3858996
> test <- exactTest(dge)
> head(test$table)
                     logFC    logCPM    PValue
ENSG00000000003 -2.3441897 -3.042057 1.0000000
ENSG00000000005  0.0000000      -Inf 1.0000000
ENSG00000000419  0.5777309  3.850993 0.2791539
ENSG00000000457 -0.3054489  4.080866 0.5592668
ENSG00000000460 -0.7792622  1.966865 0.3274528
ENSG00000000938  0.3909100  3.997866 0.4269672
> sum (test$table$PValue <0.01)
[1] 407


DESeq:

> g1 <- read.table("count1.txt", header = TRUE, row.names = 1)
> conds <- factor(rep(c("Male", "Female"), each= 4))
> dataPack <- data.frame(row.names = colnames(g1), condition =rep( c("Male", "Female"), each= 4))
> dataPack
         condition
NA06994M      Male
NA07051M      Male
NA07347M      Male
NA07357M      Male
NA07000F    Female
NA07037F    Female
NA07346F    Female
NA10847F    Female
> cds <- newCountDataSet(g1, conds)
> head(cds)
CountDataSet (storageMode: environment)
assayData: 1 features, 8 samples
  element names: counts
protocolData: none
phenoData
  sampleNames: NA06994M NA07051M ... NA10847F (8 total)
  varLabels: sizeFactor condition
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
> head(counts(cds)
+ )
                NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F
ENSG00000000003        0        0        0        0        1        0        0
ENSG00000000005        0        0        0        0        0        0        0
ENSG00000000419       10       24       19       20       19        8       14
ENSG00000000457       17       15       13       18       21       18       21
ENSG00000000460        2        3        5        2        4        6        8
ENSG00000000938       20        4       35       16       10       17       19
                NA10847F
ENSG00000000003        0
ENSG00000000005        0
ENSG00000000419        6
ENSG00000000457       15
ENSG00000000460        2
ENSG00000000938        9
> cds <- estimateSizeFactors(cds)
> sizeFactors(cds)
 NA06994M  NA07051M  NA07347M  NA07357M  NA07000F  NA07037F  NA07346F  NA10847F
0.8599841 1.1102643 1.0869086 1.1157556 1.1056726 1.0666049 0.9152017 0.9402086
> head(counts(cds, normalized= TRUE))

> cds <- estimateDispersions(cds)
> result <- nbinomTest(cds, "Male", "Female")
> nrow(subset(result, result$pval <0.01))
[1] 85

Again thank you so much
With Best Regards,
Narges________________________________________
From: narges [guest] [guest at bioconductor.org]
Sent: Tuesday, June 26, 2012 7:17 PM
To: bioconductor at r-project.org; Fatemehsadat Seyednasrollah
Subject: DESeq analysis

Hi all

I am doing some RNA seq analysis with DESeq. I have applied the nbinomTest to my dataset which I know have many differentially expressed genes but the first problem is that the result values for "padj"column is almost NA and sometimes 1. and when I want to have a splice from my fata frame the result is not meaningful for me.

 -- output of sessionInfo():

res <- nbinomTest(cds, "Male", "Female")

> head(res)
               id   baseMean baseMeanA  baseMeanB foldChange log2FoldChange
1 ENSG00000000003  0.1130534  0.000000  0.2261067        Inf            Inf
2 ENSG00000000005  0.0000000  0.000000  0.0000000        NaN            NaN
3 ENSG00000000419 14.3767155 17.162610 11.5908205  0.6753530     -0.5662863
4 ENSG00000000457 17.0174761 15.342800 18.6921526  1.2183013      0.2848710
5 ENSG00000000460  3.9414822  2.855099  5.0278659  1.7610131      0.8164056
6 ENSG00000000938 16.0894945 18.350117 13.8288718  0.7536122     -0.4081058
       pval padj
1 0.9959638    1
2        NA   NA
3 0.3208560    1
4 0.5942512    1
5 0.4840607    1
6 0.5409953    1


> res1 <- res[res$padj<0.1,]
> head(res1)
       id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj
NA   <NA>       NA        NA        NA         NA             NA   NA   NA
NA.1 <NA>       NA        NA        NA         NA             NA   NA   NA
NA.2 <NA>       NA        NA        NA         NA             NA   NA   NA
NA.3 <NA>       NA        NA        NA         NA             NA   NA   NA
NA.4 <NA>       NA        NA        NA         NA             NA   NA   NA
NA.5 <NA>       NA        NA        NA         NA             NA   NA   NA

my first question is that why although I know there are some differentially expressed genes in the my data, all the padj values are NA or 1 and the second question is this "NA.1" , "NA.2", ..... which are emerged as the first column of object "res1"instead of name of genes

Thank you so much
Regards

--
Sent via the guest posting facility at bioconductor.org.



More information about the Bioconductor mailing list