[BioC] Using limma to identify differentially expressed genes

Ovokeraye Achinike-Oduaran ovokeraye at gmail.com
Wed Apr 11 13:28:12 CEST 2012


Hi Jim,

I digress a bit (sorry David). But I was looking at your code
combining both getGEO and getGEOSuppFiles. The analysis you did I'm
guessing is based on the raw files because you had to read in an
affybatch object. I'm having a challenge with making my covdesc.txt
file to work for me with read.affy(), so I'm wondering if your
combination is a way to retrieve the phenotypic data without having to
manually create the text file. In other words, my question is: does
this combination of getGEO() and getGEOSuppFiles() make it possible to
boycott the use of the manually created covdesc.txt file in
read.affy()?

Thanks.

Avoks


On Tue, Apr 10, 2012 at 5:14 PM, David Westergaard <david at harsk.dk> wrote:
> Hello,
>
> I've been trying to use limma to identify genes from the following
> data: http://www.ebi.ac.uk/arrayexpress/experiments/E-GEOD-21340 -
> It's a simple control vs. disease experiment
>
>
> # SDRF downloaded from above page
> SDRF <- read.table(file="E-GEOD-21340.sdrf.txt",header=TRUE,stringsAsFactors=FALSE,sep="\t")
>
> # Looking to compare Family-history negativer versus Diabetis
> controls <- SDRF[grep("Control, Family History
> Neg",SDRF$Comment..Sample_source_name.),]
> disease <- SDRF[grep("^DM",SDRF$Characteristics.DiseaseState.),]
> Batch <- rbind(controls,disease)
>
> # Read in CEL files
> mixture.batch <- ReadAffy(filenames=Batch$Array.Data.File)
>
> # Preprocess data
> mixture.processed <- expresso(mixture.batch, bgcorrect.method = "rma",
> normalize.method = "quantiles", pmcorrect.method = "pmonly",
> summary.method = "medianpolish")
>
> # Get data in matrix
> signals <- exprs(mixture.prepared)
> cl <- ifelse(colnames(signals) %in% disease$Array.Data.File,1,0)
>
> # Do design matrix and fit
> design <- model.matrix(~factor(cl))
> fit <- lmFit(signals,design)
> fit <- eBayes(fit)
> topTable(fit2,coef=2)
>
> Which yields the following:
>               ID  logFC AveExpr     t P.Value adj.P.Val     B
> 7513    208004_at -0.323 5.43 -4.65 0.00191     0.999 -3.10
> 11225 211829_s_at  0.340 5.07 4.36 0.00278     0.999 -3.17
> 5950    206424_at -0.907    6.65 -4.15 0.00363     0.999 -3.23
> 1354  201826_s_at -0.447 8.37 -4.13 0.00374     0.999 -3.24
> 19782   220418_at  0.392 5.43 4.02 0.00431     0.999 -3.27
> 8889  209396_s_at  1.899    7.47  4.01 0.00437     0.999 -3.28
> 5005    205478_at -0.931    9.22 -3.94 0.00481     0.999 -3.30
> 9469  209983_s_at  0.412 5.72 3.92 0.00492     0.999 -3.31
> 2936    203409_at  0.506    6.93  3.87 0.00531     0.999 -3.32
> 5054  205527_s_at  0.331 6.80 3.84 0.00549     0.999 -3.33
>
> I'm abit puzzled over the adjusted P-values. Can it really be true
> that ALL of the adjusted P-values are 0.999, or did I make a rookie
> mistake somewhere?
>
> Best Regards,
> David Westergaard
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list