[R] stuck with affy / limma

Tue Mar 30 00:19:26 CEST 2010

Hi Maxim,

This is the wrong list for this question, please subscribe to and ask
this question on the bioconductor list.

Info on how to subscribe is here:

http://www.bioconductor.org/docs/mailList.html

-steve

On Mon, Mar 29, 2010 at 4:24 PM, Maxim <deeepersound at googlemail.com> wrote:
> Hi,
>
> I have a question concerning the analysis of some affymetrix chips. I
> downloaded some of the data from GEO GSE11324 (see below). In doing so I'm
> stuck after I identified the probesets with significant changes. I have
> problems in assigning probeset specific gene names as well as getting the
> genomic coordinates. Furthermore I have no clue how to deal with the fact,
> that most genes have different probesets with differential transcriptional
> outcomes.
>
>
> I did this based on the affy and limma manuals like:
>
> targets file:
> Name FileName Target
> 0h1 GSM286031.CEL control
> 0h2 GSM286032.CEL control
> 0h3 GSM286033.CEL control
> 3h1 GSM286034.CEL three
> 3h2 GSM286035.CEL three
> 3h3 GSM286036.CEL three
> 6h1 GSM286037.CEL six
> 6h2 GSM286038.CEL six
> 6h3 GSM286039.CEL six
>
>
> library(affy)
> library(limma)
> library(vsn)
>
> pd <- read.AnnotatedDataFrame("er_for_affy.txt", header = TRUE, row.names =
> 2)
> pData(pd)
> #### load
> a <- ReadAffy(filenames = rownames(pData(pd)), phenoData = pd, verbose =
> TRUE)
> #### normalize
> x <- expresso(a, bg.correct = FALSE, normalize.method = "vsn",
> normalize.param
> = list(subsample = 1000), pmcorrect.method = "pmonly", summary.method =
> "medianpolish")
> ### genes with highest variation
> library(hgu133plus2.db)
> rsd <- apply(exprs(x), 1, sd)
> sel <- order(rsd, decreasing = TRUE)[1:250]
>
>
> u<-mget(row.names(exprs(x)[sel,]),hgu133plus2SYMBOL)
> heatmap(exprs(x)[sel, ], labRow=u)
> ### in this case it works to extract the gene symbol
>
>
> ### limma contrasts
> design <- model.matrix(~ -1+factor(c(1,1,1,2,2,2,3,3,3)))
> colnames(design) <- c("control", "three", "six")
> fit <- lmFit(x, design)
> meanSdPlot(x)
> contrast.matrix <- makeContrasts(three-control, six-control, levels=design)
> fit2 <- contrasts.fit(fit, contrast.matrix)
> fit2 <- eBayes(fit2)
> #### top list
> topTable(fit2, coef=1, adjust="BH", number=20, sort.by="M")
> library(hgu133plus2.db)
> u<-mget(row.names(fit2),hgu133plus2SYMBOL)
>
> How can I produce a topTable result with according gene names, somehow I do
> not understand the genelist argument?
>
> Next, I would like to produce a standard clustering of the "fold changes"
> observed within (averaged) contrasts 1 (three - control) and 2 (six -
> control) and a heatmap presentation of the results. How to extract for
> example all fold-changes of those genes with a p-value<0.001 in at least one
> of the two contrasts?
>
> The coordinates of the genes I seem to get with
> v<-mget(row.names(fit2),hgu133plus2CHRLOC)
> v<-mget(row.names(fit2),hgu133plus2CHRLOCEND)
> But again I do not know, how to implement it into my fit2 object or topTable
> results. Furthermore there are many probesets with multiple mappings, should
> these not be excluded from the analysis?
>
> Actually, in the end I'd like to get the corresponding genes' coordinates in
> a way, that the maximum region size is given, eg in case of genes with
> multiple TSSs.
>
> As mentioned above, I do not know how to deal with the fact, that many genes
> are represented with mutliple probesets, often with different fold changes -
> is there a general recipe to deal with this question? Furthermore there are
> many probesets with multiple mappings, should these not be excluded from the
> analysis?
>
> I know it's a lot of questions, so is there a general source of information,
> that may help me to overcome the hurdles?
>
> Maxim
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact