[BioC] Limma to find differentially expressed genes

James W. MacDonald jmacdon at uw.edu
Fri Apr 19 15:04:05 CEST 2013


Hi Sandy,

On 4/19/2013 8:03 AM, Sandy [guest] wrote:
>
> I have my matrix designed in the following way which I name as mat1
>
>                   Probes  sample1  sample1 sample2 sample2 sample3 sample3 sample4 sample4
>                           rep1      rep2    rep1   rep2    rep1    rep2    rep1    rep2
>                   ------------------------------------------------------------------------
>                     gene1   5.098   5.076   5.072  4.677  7.450   7.456   8.564   8.555
>                     gene2   8.906   8.903   6.700  6.653  6.749   6.754   7.546   7.540
>                     gene3   7.409   7.398   5.392  5.432  6.715   6.724   5.345   5.330
>                     gene4   4.876   4.869   5.864  5.981  4.280   4.290   4.267   4.255
>                     gene4   3.567   3.560   3.554  3.425  8.500   8.564   6.345   6.330
>                     gene5   2.569   2.560   8.600  8.645  5.225   5.234   7.345   7.333
>
> I use the limma package to find the DEG's
>
>        Group<- factor(c("p1", "p1", "p2", "p2","p3", "p4","p4")
>        design<- model.matrix(~0 + Group)
>        colnames(design)<- gsub("Group","", colnames(design))
>        fit<- lmFit(mat1[,1:4],design)
>        contrast.matrix<-makeContrasts(p1-p2,levels=design)
>        fit2<-contrasts.fit(fit,contrast.matrix)
>        fit2<-eBayes(fit2)
>        sel.diif<-p.adjust(fit2$F.p.value,method="fdr")<0.05
>        deg<-mat1[,1:4][sel.diif,]
>
> So will "deg" just give me those genes which are significant in sample one versus two. I am interested in those genes which are significant only in first sample but not in the second sample and am not sure if this is the right approach.

Two things. First, you shouldn't be digging around in the 'fit2' object 
if you don't know what you are doing. If you read the user's guide, you 
will see that the correct sequence of events for you would be (after the 
eBayes step):

topGenes <- topTable(fit2, coef = 1, p.value = 0.05, number = Inf)

Then if you want the underlying data

deg <- mat[row.names(topGenes),1:4]

Second, I think you might misunderstand a fundamental concept here. 
There is no such thing as genes being significant in one sample but not 
in the second. A microarray can't really tell you if a gene is expressed 
or not. All it can tell you is if a gene appears to be expressed at a 
significantly different level in one sample versus another. That is what 
you are testing here, whether or not genes in p1 are expressed at a 
different level than in p2.

So this raises a question; when you say 'significant only in first 
sample but not in the second sample' are you instead asking for genes 
that are UP in p1 as compared to p2? If so, you can further filter 
results in the topGenes object by the logFC column, selecting only those 
with a positive value.

Best,

Jim


>
>
>   -- output of sessionInfo():
>
> R version 2.15.2 (2012-10-26)
> Platform: i686-redhat-linux-gnu (32-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
>   [1] limma_3.14.4         csSAM_1.2.1          GOstats_2.24.0       RSQLite_0.10.0       DBI_0.2-5
>   [6] graph_1.36.2         Category_2.22.0      AnnotationDbi_1.20.5 affy_1.36.1          Biobase_2.16.0
> [11] BiocGenerics_0.4.0   R.utils_1.23.2       R.oo_1.13.0          R.methodsS3_1.4.2
>
> loaded via a namespace (and not attached):
>   [1] affyio_1.22.0         annotate_1.36.0       AnnotationForge_1.0.3 BiocInstaller_1.8.3   genefilter_1.40.0
>   [6] GO.db_2.8.0           GSEABase_1.18.0       IRanges_1.16.6        parallel_2.15.2       preprocessCore_1.18.0
> [11] RBGL_1.34.0           splines_2.15.2        stats4_2.15.2         survival_2.36-14      tools_2.15.2
> [16] XML_3.9-4             xtable_1.6-0          zlibbioc_1.4.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list