[BioC] Affy data analysis

James W. MacDonald jmacdon at uw.edu
Mon Apr 16 18:02:28 CEST 2012


Hi Himanshu,

Please don't take discussions off list. We like to think of the list 
archives as a repository of knowledge, and if you take conversations 
off-list, it eliminates that aspect.

On 4/16/2012 10:35 AM, hsharm03 at students.poly.edu wrote:
> Dear James,
> Thanks a lot for your reply. I will definitely do that. Also, I 
> wanted to ask where I can get the annotations for the HT430mgpm 
> array. Once I get the table of top genes, I would like to annotate 
> them and use them for pathway analysis. Where can I get the 
> annotation .db package for it? Also, which method would you recommend 
> for getting the eset? I was thinking of using RMA. Should I be 
> comparing all of my controls to each tumor separately? Is that right? 
> I am sorry to be asking you so many questions, but I am new to this 
> field and have been thinking about it for many days.

The mouse4302.db annotation package should suffice. It is my 
understanding that the chip you used is the same, without the MM probes. 
If not, it is easy enough to make your own with the annotation file you 
can get from Affymetrix and the AnnotationDbi package. See

http://bioconductor.org/packages/release/bioc/vignettes/AnnotationDbi/inst/doc/makeProbePackage.pdf

if you want to make your own.
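A minimal R sketch of the annotation step. It assumes the Bioconductor packages AnnotationDbi and mouse4302.db are installed, and that the HT430mgpm probe set IDs match those in mouse4302.db (as suggested above, not verified here). The helper name `annotate_probes` is hypothetical; it only wraps `AnnotationDbi::select()`.

```r
# Hypothetical helper: look up gene annotation for Affymetrix probe set IDs.
# Assumes the Bioconductor packages AnnotationDbi and mouse4302.db are
# installed, and that the HT430mgpm probe set IDs match those in mouse4302.db.
annotate_probes <- function(probe_ids, chip_db = "mouse4302.db") {
  # Fail with a clear message if the annotation package is not installed
  if (!requireNamespace(chip_db, quietly = TRUE)) {
    stop("annotation package '", chip_db, "' is not installed")
  }
  # A ChipDb package exports an object with the same name as the package
  db <- getExportedValue(chip_db, chip_db)
  # One row per probe/gene match; a probe set that maps to several genes
  # appears on several rows
  AnnotationDbi::select(db,
                        keys = as.character(probe_ids),
                        columns = c("SYMBOL", "ENTREZID", "GENENAME"),
                        keytype = "PROBEID")
}
```

Assuming the limma fit was made on an ExpressionSet whose feature names are the probe set IDs, passing the row names of a `topTable()` result to `annotate_probes()` would return symbols, Entrez IDs and gene names for the top probes. With the fake matrix in the example further down, the row names are just row numbers, so they would not be valid keys.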

As for deciding how to analyze your data, that is up to you. I am more 
than willing to help with questions about how to use BioC packages, but 
cannot give analysis advice.

Best,

Jim


> Thanks a lot for your help so far. It is really helpful. Hope to 
> hear back from you.
> Thanks,
> Himanshu Sharma.
>
> > Date: Mon, 16 Apr 2012 09:49:10 -0400
> > From: jmacdon at uw.edu
> > To: hsharm03 at students.poly.edu
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] Affy data analysis
> >
> > Hi Himanshu Sharma,
> >
> > On 4/14/2012 6:42 PM, hsharm03 at students.poly.edu wrote:
> > > Dear all,
> > > I have data from affy HT430mgpm and I need to analyze the data for
> > > differential expression and pathway analysis. I have 3 wildtype
> > > controls (Wt neurospheres 2 and 3) for the control analysis. I have
> > > two other tumors (1509 and 1701) for the analysis. From the CEL
> > > files, it doesn’t appear that we did replicates for the tumors, just
> > > one each, the rationale at the time being that we had wanted to
> > > first quickly scan the tumors for common signatures. Those genes that
> > > are clearly highly expressed should, however, represent additional
> > > oncogenic signatures that may stem from the same or related
> > > activating pathways. For now, my analysis of the controls should give
> > > me accurate expression data for the controls. The tumors will have to
> > > be compared across the samples to look for the low-hanging fruit.
> > > I am not sure how to go about doing this, since I have 3 replicates
> > > for the control but only 1 each for the different tumors. What
> > > strategy should I use for my analysis?
> >
> > You can just analyze your data as indicated in the limma User's Guide.
> > Note that although you only have one sample for each of the tumor
> > samples, since you have three replicates for the control you end up
> > with 2 residual degrees of freedom, so you can actually fit a model
> > and compute contrasts.
> > Here is an example using some fake data:
> >
> > > library(limma)
> > > x <- matrix(rnorm(5e5), ncol = 5)
> > > design <- model.matrix(~factor(rep(1:3, c(3,1,1))))
> > > fit <- lmFit(x, design)
> > > fit2 <- eBayes(fit)
> > > topTable(fit2, coef = 2)
> >           logFC         t      P.Value adj.P.Val         B
> > 27913 -5.164721 -4.474076 7.678459e-06 0.6669534 -4.402008
> > 98975  4.907831  4.251539 2.124031e-05 0.6669534 -4.421736
> > 90287  4.800002  4.158128 3.209996e-05 0.6669534 -4.429717
> > 41684 -4.754741 -4.118920 3.808058e-05 0.6669534 -4.433015
> > 43210 -4.711426 -4.081397 4.478309e-05 0.6669534 -4.436141
> > 46761  4.705393  4.076171 4.580108e-05 0.6669534 -4.436574
> > 37345 -4.687702 -4.060846 4.891387e-05 0.6669534 -4.437841
> > 98788  4.633203  4.013635 5.981260e-05 0.6669534 -4.441714
> > 46584  4.606493  3.990496 6.595873e-05 0.6669534 -4.443596
> > 72789 -4.603451 -3.987861 6.669534e-05 0.6669534 -4.443809
> > > topTable(fit2, coef = 3)
> >           logFC         t      P.Value adj.P.Val         B
> > 19401 -5.232576 -4.532857 5.822486e-06 0.5822486 -1.796077
> > 883    4.813581  4.169892 3.048726e-05 0.8544860 -2.252617
> > 87408 -4.667879 -4.043673 5.263993e-05 0.8544860 -2.402452
> > 76730  4.641339  4.020682 5.805112e-05 0.8544860 -2.429249
> > 50261  4.533133  3.926946 8.605996e-05 0.8544860 -2.536920
> > 63980  4.502927  3.900780 9.591473e-05 0.8544860 -2.566524
> > 783   -4.498102 -3.896600 9.758446e-05 0.8544860 -2.571235
> > 59496 -4.441207 -3.847313 1.194575e-04 0.8544860 -2.626398
> > 92491  4.427735  3.835642 1.252750e-04 0.8544860 -2.639357
> > 22351 -4.420041 -3.828977 1.287163e-04 0.8544860 -2.646741
> >
> > As you can see, limma is happy to run the analysis without any
> > replication for two of the sample types.
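To make the degrees-of-freedom argument concrete, here is a base-R sketch (no limma needed) using the same 3 + 1 + 1 layout as the design matrix in the example above, with hypothetical group labels `control`, `t1509` and `t1701`. It fits one simulated gene with ordinary `lm()` rather than limma's `lmFit()`/`eBayes()`, so it shows only the unmoderated per-gene fit; `eBayes()` additionally shrinks each gene's variance toward a common prior value.

```r
# Hypothetical group labels: three wild-type controls, then one array each
# for tumors 1509 and 1701
group <- factor(c("control", "control", "control", "t1509", "t1701"),
                levels = c("control", "t1509", "t1701"))

# Treatment-contrast design: intercept (control mean) plus one column per
# tumor, the same layout as model.matrix(~factor(rep(1:3, c(3,1,1))))
design <- model.matrix(~ group)
print(colnames(design))

n_arrays <- nrow(design)         # 5 arrays
n_coefs  <- qr(design)$rank      # 3 estimable coefficients
resid_df <- n_arrays - n_coefs   # 5 - 3 = 2 residual degrees of freedom
print(resid_df)

# Fit one simulated "gene" by ordinary least squares
set.seed(1)
y <- rnorm(5)
fit <- lm(y ~ group)

# Each unreplicated tumor is fitted exactly (zero residual) ...
print(round(residuals(fit), 10))
# ... so the residual variance comes from the three controls alone
print(df.residual(fit))
print(c(model_sigma2 = summary(fit)$sigma^2, control_var = var(y[1:3])))
```

With this labelling, the second and third coefficients (`groupt1509`, `groupt1701`) are each tumor minus the control mean, which are the coefficients ranked by the two `topTable()` calls in the example above.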
> >
> > Best,
> >
> > Jim
> >
> >
> > > Thanks,
> > > Himanshu Sharma
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
> >

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


