[BioC] read.phenoData vs read.AnnotatedDataFrame

Johnstone, Alice Alice.Johnstone at esr.cri.nz
Fri Jul 27 02:05:17 CEST 2007


 

Your initial question was about differences between a previous analysis
and current analysis. Are there still differences, or are you now more
interested in finding the 'right' analysis?
If there are still differences, you'll need to
(a) do a bit more investigation into where the differences are occurring
-- it seems like the phenotypic data are ok, the remaining issues could
be differences in the assayData of eset.rma, or in the limma-based
analysis of identical data sets. Can you narrow this down?
(b) provide more detail about how the 'comparison of the treatment to
the controls gets mixed up'.
If what you're really interested in is the correct way to analyze an
experiment, then the I'd suggest reviewing the limma manual and if that
is not helpful starting a new thread on the list with a subject that
emphasizes limma experimental design rather than differences between
read.phenoData and read.AnnotatedDataFrame.
Hope that helps,
Martin

There are still differences, the analysis is on the same data - but run
on different computers with different versions of R 2.5 vs 2.4 which
allows the read.phenoData.  
The comparison gets "mixed up" where with read.AnnotatedDataFrame and
R2.5.0 I run it as tr1-SAL and tr2-SAL
But when I compare this to the analysis done with R2.4 and
read.phenoData it looks like their tr2-SAL corresponds to my tr1-SAL,
but inversed ie I have negative logFC where theirs is positive.  Their
tr1-SAL would correspond to tr2-tr1 in my analysis.  As I want to
compare the treatment to the control, one of these is clearly comparing
the wrong groups.
If this is not a problem that anyone else has come across I will try and
run these side by side to compare at each step in case there is
something that I have overlooked.  The only other thing I can think of
going by other posts is perhaps the packages need updating on the
computer to run correctly.  I believe the analysis is correct, but which
one I use to carry on with my experiments has me in a pickle!

"Johnstone, Alice" <Alice.Johnstone at esr.cri.nz> writes:

>  
>
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Thursday, 26 July 2007 11:49 a.m.
> To: Johnstone, Alice
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
>
> Hi Alice --
>
> "Johnstone, Alice" <Alice.Johnstone at esr.cri.nz> writes:
>> Using R2.5.0 and Bioconductor I have been following code to analysis 
>> Affymetrix expression data: 2 treatments vs control.  The original 
>> code was run last year and used the read.phenoData command, however 
>> with the newer version I get the error message Warning messages:
>> read.phenoData is deprecated, use read.AnnotatedDataFrame instead The

>> phenoData class is deprecated, use AnnotatedDataFrame (with
>> ExpressionSet) instead
>> I use the read.AnnotatedDataFrame command, but when it comes to the 
>> end of the analysis the comparison of the treatment to the controls 
>> gets mixed up compared to what you get using the original 
>> read.phenoData ie it looks like the 3 groups get labelled wrong and 
>> so
>
>> the comparisons are different (but they can still be matched up).
>> My questions are,
>> 1) do you need to set up your target file differently when using 
>> read.AnnotatedDataFrame - what is the standard format?
>
> I can't quite tell where things are going wrong for you, so it would 
> help if you can narrow down where the problem occurs.  I think 
> read.AnnotatedDataFrame should be comparable to read.phenoData. Does
>> pData(pd)
> look right? What about
>> pData(Data)
> and
>> pData(eset.rma)
>
> All these are the same, with the correct target for the files.
>> pData(eset.rma)
>    Filename Target
> 1   m01.cel     tr2
> 2   m02.cel     tr2
> 3   m03.cel     tr2
> 4   m04.cel     tr2
> 5   m05.cel     tr2
> 6   s01.cel     cont
> 7   s02.cel     cont
> 8   s03.cel     cont
> 9   s04.cel     cont
> 10  s06.cel     cont
> 11  b08.cel     tr1
> 12  b09.cel     tr1
> 13  b10.cel     tr1
> 14  b11.cel     tr1
> 15  b12.cel     tr1
>
>
> ? It's not important but pData(pd)$Target is the same as pd$Target.
> Since the analysis is on eset.rma, it probably makes sense to use the 
> pData from there to construct your design matrix
>> targs<-factor(eset.rma$Target)
>> design<-model.matrix(~0+targs)
>> colnames(design)<-levels(targs)
> Does design look right?
>
> Yes the design looks right to me
>> design
>    tr1 tr2 cont
> 1    0  1   0
> 2    0  1   0
> 3    0  1   0
> 4    0  1   0
> 5    0  1   0
> 6    0  0   1
> 7    0  0   1
> 8    0  0   1
> 9    0  0   1
> 10   0  0   1
> 11   1  0   0
> 12   1  0   0
> 13   1  0   0
> 14   1  0   0
> 15   1  0   0
> attr(,"assign")
> [1] 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$targs
> [1] "contr.treatment"
>
>
>> I have three columns sample, filename and target.
>> 2) do you need to use a different model matrix to what I have?  
>> 3) do you use a different command for making the contrasts?
>
> Depends on the question! If you're performing the same analysis as 
> last year, then the model matrix and contrasts have to be the same!
>
> The question was simply whether the treatments were significantly 
> different to the control.
>> cont.wt
>       Contrasts
> Levels tr1-cont tr2-cont
>    tr1       1      0
>    tr2       0      1
>    cont     -1     -1
>
>
> Anything obvious? cheers
>
>
>> I have included my code below if that is of any assistance.
>> Many Thanks!
>> Alice
>>  
>>  
>>  
>> ##Read data
>> pd<-read.AnnotatedDataFrame("targets.txt",header=T,row.name="sample")
>> Data<-ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
>> ##normalisation
>> eset.rma<-rma(Data)
>> ##analysis
>> targs<-factor(pData(pd)$Target)
>> design<-model.matrix(~0+targs)
>> colnames(design)<-levels(targs)
>> fit<-lmFit(eset.rma,design)
>> cont.wt<-makeContrasts("tr1-cont","tr2-cont",leve
>> l
>> s=
>> design)
>> fit2<-contrasts.fit(fit,cont.wt)
>> fit2.eb<-eBayes(fit2)
>> testconts<-classifyTestsF(fit2.eb,p.value=0.01)
>> topTable(fit2.eb,coef=2,n=300)
>> topTable(fit2.eb,coef=1,n=300)
>>  
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> Martin Morgan
> Bioconductor / Computational Biology
> http://bioconductor.org
>
>

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org



More information about the Bioconductor mailing list