[BioC] Affy data analysis

James W. MacDonald jmacdon at uw.edu
Tue Apr 17 21:15:49 CEST 2012


Hi Himanshu,

On 4/17/2012 2:54 PM, hsharm03 at students.poly.edu wrote:
> Dear James,
> I was able to get the topTable using the method you told me to follow 
> from the limma manual. But when I try to annotate the genes with 
> Entrez IDs, it requires probe IDs, and the topTable we got does not 
> have an ID column. What should I do?

If you are using an ExpressionSet when you run lmFit(), then you should 
automatically get the ID column in your topTable() output. If not, note 
that the row.names for your topTable() output correspond to the rows of 
your data, so you can get the probeset IDs that way.
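
A minimal sketch of the row-name route, assuming a current limma release and a plain matrix as input. The probe names (`probe_1`, ...) are made up for illustration, and the data are fake, as in the earlier example:

```r
library(limma)

set.seed(1)                                        # reproducible fake data
x <- matrix(rnorm(5e3), ncol = 5)                  # 1000 fake probesets, 5 arrays
rownames(x) <- paste0("probe_", seq_len(nrow(x)))  # hypothetical probeset IDs
design <- model.matrix(~ factor(rep(1:3, c(3, 1, 1))))

fit2 <- eBayes(lmFit(x, design))
tt <- topTable(fit2, coef = 2)

# The row names of the topTable() output are the row names of the input data
ids <- rownames(tt)
tt$ID <- ids                                       # keep them as an explicit column
head(tt)
```

The `ids` vector is what you would then hand to whatever annotation lookup you use to get Entrez Gene IDs. Which annotation package matches your array is something to check for yourself; it is not shown here.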

Best,

Jim


> Thanks,
> Himanshu Sharma.
>
> > Date: Mon, 16 Apr 2012 09:49:10 -0400
> > From: jmacdon at uw.edu
> > To: hsharm03 at students.poly.edu
> > CC: bioconductor at r-project.org
> > Subject: Re: [BioC] Affy data analysis
> >
> > Hi Himanshu Sharma,
> >
> > On 4/14/2012 6:42 PM, hsharm03 at students.poly.edu wrote:
> > > Dear all,
> > > I have data from affy HT430mgpm and I need to analyze the data for
> > > differential expression and pathway analysis. I have 3 wildtype
> > > controls (Wt neurospheres 2 and 3) for the control analysis. I have
> > > two other tumors (1509 and 1701) for the analysis. From the cel
> > > files, it doesn't appear that we did replicates for the tumors, just
> > > one each, the rationale at the time being that we wanted to first
> > > quickly scan the tumors for common signatures. Those genes that are
> > > clearly highly expressed should, however, represent additional
> > > oncogenic signatures that may stem from the same or related
> > > activating pathways.
> > >
> > > For now, my analysis of the controls should give me accurate
> > > expression data for the controls. The tumors will have to be compared
> > > across the samples to look for the low-hanging fruit. I am not sure
> > > how to go about doing this, since I have 3 replicates for the control
> > > but 1 each for the different tumors. What strategy should I use for
> > > my analysis?
> >
> > You can just analyze your data as indicated in the limma User's Guide.
> > Note that although you only have one sample for each of the tumor
> > types, since you have three replicates for the control you end up with
> > 2 residual degrees of freedom, so you can actually fit a model and
> > compute contrasts. Here is an example using some fake data:
> >
> > > x <- matrix(rnorm(5e5), ncol = 5)
> > > design <- model.matrix(~factor(rep(1:3, c(3,1,1))))
> > > fit <- lmFit(x, design)
> > > fit2 <- eBayes(fit)
> > > topTable(fit2, 2)
> >           logFC         t      P.Value adj.P.Val         B
> > 27913 -5.164721 -4.474076 7.678459e-06 0.6669534 -4.402008
> > 98975  4.907831  4.251539 2.124031e-05 0.6669534 -4.421736
> > 90287  4.800002  4.158128 3.209996e-05 0.6669534 -4.429717
> > 41684 -4.754741 -4.118920 3.808058e-05 0.6669534 -4.433015
> > 43210 -4.711426 -4.081397 4.478309e-05 0.6669534 -4.436141
> > 46761  4.705393  4.076171 4.580108e-05 0.6669534 -4.436574
> > 37345 -4.687702 -4.060846 4.891387e-05 0.6669534 -4.437841
> > 98788  4.633203  4.013635 5.981260e-05 0.6669534 -4.441714
> > 46584  4.606493  3.990496 6.595873e-05 0.6669534 -4.443596
> > 72789 -4.603451 -3.987861 6.669534e-05 0.6669534 -4.443809
> > > topTable(fit2, 3)
> >           logFC         t      P.Value adj.P.Val         B
> > 19401 -5.232576 -4.532857 5.822486e-06 0.5822486 -1.796077
> > 883    4.813581  4.169892 3.048726e-05 0.8544860 -2.252617
> > 87408 -4.667879 -4.043673 5.263993e-05 0.8544860 -2.402452
> > 76730  4.641339  4.020682 5.805112e-05 0.8544860 -2.429249
> > 50261  4.533133  3.926946 8.605996e-05 0.8544860 -2.536920
> > 63980  4.502927  3.900780 9.591473e-05 0.8544860 -2.566524
> > 783   -4.498102 -3.896600 9.758446e-05 0.8544860 -2.571235
> > 59496 -4.441207 -3.847313 1.194575e-04 0.8544860 -2.626398
> > 92491  4.427735  3.835642 1.252750e-04 0.8544860 -2.639357
> > 22351 -4.420041 -3.828977 1.287163e-04 0.8544860 -2.646741
> >
> > As you can see, limma is happy to run the analysis without any
> > replication for two of the sample types.
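
The same idea can be written with named groups and explicit contrasts, which also makes the residual degrees of freedom easy to check. This is only a sketch on fake data: the group labels `wt`, `t1509` and `t1701` are hypothetical names standing in for the three controls and the two tumors, and the choice of contrasts is an assumption about which comparisons are of interest.

```r
library(limma)

set.seed(1)                                   # reproducible fake data
x <- matrix(rnorm(5e3), ncol = 5)             # 1000 fake probesets, 5 arrays
group <- factor(rep(c("wt", "t1509", "t1701"), c(3, 1, 1)),
                levels = c("wt", "t1509", "t1701"))
design <- model.matrix(~ 0 + group)           # one column per sample type
colnames(design) <- levels(group)

fit <- lmFit(x, design)
unique(fit$df.residual)                       # 5 arrays - 3 coefficients

# Each tumor versus the controls, and the two tumors against each other
cm <- makeContrasts(t1509 - wt, t1701 - wt, t1509 - t1701, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cm))
topTable(fit2, coef = 1)                      # first contrast: t1509 - wt
```

Note that the variance estimate for every contrast comes entirely from the three control replicates, so the unreplicated tumors are assumed to vary like the controls.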
> >
> > Best,
> >
> > Jim
> >
> >
> > > Thanks,
> > > Himanshu Sharma
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
> >
> >



