[BioC] Using Ringo

Joern Toedling toedling at ebi.ac.uk
Wed Aug 1 12:59:17 CEST 2007


Hello Christoph,

since this issue might be of interest to other users of Ringo as well, I
have CCed this to the Bioconductor mailing list.
Please find my answers to your remarks below.

> However, at one point I seem to have a misconception about how data should
> be processed with Ringo: Based on your previous e-mail, I concluded that it
> would be best to have separate RGList and ExpressionSet objects for each
> separate analysis, such as for different histone modifications, and also to
> hold promoter array slide 1 (chr to chr10) and slide 2 (chr10 to chrY) in
> separate objects. That is, only biological replicates of the same histone
> modification on the same array slide are packaged into a single RGList and
> ExpressionSet object.
>   

Sorry, apparently my answer and the example in the vignette were
slightly misleading here. I do not see why different histone
modifications measured on the same array platform should be kept in
separate RGLists. I usually keep them in the same RGList, unless there's
a very good reason to separate the histone modifications (or TF-ChIP
data or other hybridizations), such as a very strong batch effect (huge
differences between the raw data that were measured on different days
and/or in different labs etc.) whose presence or absence you should
assess during the quality assessment step on the raw data (boxplots of
raw data distribution etc.).
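
As a rough illustration of such a check, here is a minimal sketch in base R.
The intensities are simulated and the array names are made up for the
example; with a real RGList you would look at its raw channel elements
(such as RG$R and RG$G) instead of the hypothetical matrix 'raw'.

```r
# Simulated raw intensities: 1000 probes x 4 arrays (made-up data, not from
# any real experiment). The two "day2" arrays are deliberately shifted to
# mimic a strong batch effect.
set.seed(1)
raw <- cbind(
  H3K4me3_day1 = rlnorm(1000, meanlog = 7, sdlog = 1),
  H3K9ac_day1  = rlnorm(1000, meanlog = 7, sdlog = 1),
  H3K4me3_day2 = rlnorm(1000, meanlog = 9, sdlog = 1),
  H3K9ac_day2  = rlnorm(1000, meanlog = 9, sdlog = 1)
)
log.raw <- log2(raw)

# One box per array: arrays from one batch sitting at a clearly different
# level than the others hint at a batch effect.
boxplot(log.raw, las = 2, ylab = "log2 raw intensity")

# The same comparison as numbers: per-array medians on the log2 scale.
array.medians <- apply(log.raw, 2, median)
print(round(array.medians, 2))
```

If the arrays group by hybridization day or lab rather than by histone
modification, that would be a reason to consider separating them.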

There is another advantage in combining the histone modifications into
a single RGList. VSN and other between-array normalizations aim at
making different arrays more consistent and comparable to each other. If
you only supply one array per RGList, you will only be able to perform
sub-optimal comparisons between these different resulting
MALists/ExpressionSets.

Samples measured on different array platforms, such as the second part
of the genome represented by different probes on a separate array,
however, should be kept in a separate RGList. I would then normalize the
two RGLists separately and obtain the MALists of both.

MA1 <- preprocess(RG1[RG1$genes$Status=="Probe", ], returnMAList=TRUE)
MA2 <- preprocess(RG2[RG2$genes$Status=="Probe", ], returnMAList=TRUE)

and then combine the MALists:
MA.comb <- rbind(MA1,MA2) # results in one MAList
X.comb <- Ringo:::asExprSet(MA.comb) # results in one ExpressionSet
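
To make the idea behind the rbind step concrete, here is a small sketch that
uses plain matrices as stand-ins for the 'M' elements of the two MALists.
The probe and sample names are invented for the example. The point is that
the probes of both slides are stacked as rows while the samples stay in the
columns, so both objects need the same samples in the same order.

```r
# Stand-ins for the M-value matrices of two slides: rows are probes,
# columns are samples (all names and values are made up).
M1 <- matrix(c(0.5, -0.2, 1.1, 0.3), nrow = 2,
             dimnames = list(c("slide1_probeA", "slide1_probeB"),
                             c("sample1", "sample2")))
M2 <- matrix(c(0.0, 0.7, -0.4, 0.9, 0.2, -1.0), nrow = 3,
             dimnames = list(c("slide2_probeA", "slide2_probeB",
                               "slide2_probeC"),
                             c("sample1", "sample2")))

# Stacking probes only makes sense if the columns refer to the same samples.
stopifnot(identical(colnames(M1), colnames(M2)))

# Row-wise combination: 2 + 3 probes, still 2 samples.
M.comb <- rbind(M1, M2)
print(dim(M.comb))
```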

Then you should also generate one common probeAnno environment out of
your two files. The script "makeProbeAnno.R" in the 'scripts' directory
of the package contains such an example, too. Since I considerably
streamlined that script after the BioC2.0 release, please use the
development version of Ringo from
http://www.bioconductor.org/packages/2.1/bioc/html/Ringo.html

> This works perfectly fine for those cases where I have at least two
> biological replicates. In these cases, preprocess() runs without error and
> returns a normalized ExpressionSet. However, for some chromatin
> modifications I have only a single array but no replicates, and for these
> the normalization fails with the following error message:
>
> > MA <- preprocess(RG[RG$genes$Status=="Probe", ]) # normalization excluding
> #   any random probes, which are spotted in duplicate and cause trouble
> Normalizing...
> vsn: 385301 x 2 matrix (1 stratum). 100% done.
> Error in `colnames<-`(`*tmp*`, value = "1") : 
>         attempt to set colnames on object with less than two dimensions
> In addition: Warning messages:
> 1: The function 'vsn' is deprecated, could you please use 'vsn2' instead. 
> 2: The exprSet class is deprecated, use ExpressionSet instead 
> 3: The exprSet class is deprecated, use ExpressionSet instead 
> 4: The exprSet class is deprecated, use ExpressionSet instead 
> 5: The exprSet class is deprecated, use ExpressionSet instead 
> 6: The exprSet class is deprecated, use ExpressionSet instead 
>
> What solution would you suggest? Am I getting something terribly wrong?
>
>   

No, this is nothing to worry about. It is due to R's behavior of coercing
matrices with only one column into vectors unless you explicitly stop it
from doing so, and limma's "normalizeBetweenArrays" does not. Setting
aside the question of how meaningful normalization is with only one
sample, if you have only one sample in the MAList you should manually
convert the element 'M' of the MAList into a matrix before converting it
into an ExpressionSet, like this:
MA.comb$M <- as.matrix(MA.comb$M)
X.comb <- Ringo:::asExprSet(MA.comb) # results in one ExpressionSet

I will add such a line into the "asExprSet" function, too. Thank you for
pointing this out.
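
For readers unfamiliar with this behavior, here is a minimal base R sketch
of the dimension dropping and of the two usual ways around it. It only
illustrates the general R behavior with a toy matrix 'm' (a made-up
example); it is not the actual code path inside Ringo or limma.

```r
# A toy matrix with 3 rows ("probes") and 2 columns ("arrays").
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

# Selecting a single column drops the dimensions: the result is a vector.
v <- m[, 1]
print(is.null(dim(v)))

# Setting column names on that vector fails, much like in the error above.
failed <- tryCatch({
  colnames(v) <- "1"
  FALSE
}, error = function(e) TRUE)
print(failed)

# Way 1: prevent the coercion when subsetting.
kept <- m[, 1, drop = FALSE]

# Way 2: turn the vector back into a one-column matrix afterwards,
# which is what the as.matrix() call on MA.comb$M does.
restored <- as.matrix(v)
colnames(restored) <- "1"
print(dim(restored))
```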

Regards,
Joern

-- 
Joern Toedling
EMBL - European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge CB10 1SD
United Kingdom
Phone  +44(0)1223 492566
Email  toedling at ebi.ac.uk


