[BioC] RLMM questions

Benilton Carvalho bcarvalh at jhsph.edu
Tue Aug 22 22:02:17 CEST 2006


To use CRLMM, you should install oligo, available at:

http://www.bioconductor.org/packages/1.9/bioc/html/oligo.html

You will also need a platform design environment specific to the
array you're using.  Some can be downloaded from:

http://www.biostat.jhsph.edu/~bcarvalh/research.html

Best,

b.

On Aug 22, 2006, at 2:52 PM, Henrik Bengtsson wrote:

> Hi.
>
> On 8/22/06, Amit Bahl <abahl at mail.med.upenn.edu> wrote:
>>
>> I have a custom Affy array which allows several applications
>> (expression profiling, genotyping, etc...) on a single chip.  I want
>> to use RLMM to analyze our genotyping data, but have a couple of
>> questions:
>>
>> 1) Instead of normalizing to the scale of the training set (which I
>> don't have), does it make sense to normalize all arrays to each other
>> using quantile normalization?
>
> It depends on the type of data, but most likely yes.  If you work with
> extreme data such as cancer data, there might be too many copy-number
> differences for the assumptions behind quantile normalization to
> hold.
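
For readers unfamiliar with the technique mentioned above, here is a
minimal sketch of quantile normalization across arrays.  It is written
in plain Python for illustration only and is not code from RLMM, oligo
or any Affymetrix tool; the function name quantile_normalize and the
list-of-lists input are assumptions made for this example.  Ties are
broken by probe order rather than averaged, which real implementations
may handle differently.

```python
def quantile_normalize(arrays):
    """Give every array the same empirical intensity distribution.

    `arrays` is a list of equal-length lists of probe intensities,
    one inner list per array (chip). Hypothetical helper, for
    illustration only.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Sort each array, then average across arrays at each rank to
    # obtain the shared reference distribution.
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices in increasing order of intensity (stable sort,
        # so tied probes keep their original relative order).
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            # The probe with the k-th smallest intensity receives the
            # k-th smallest reference value.
            out[probe] = reference[rank]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(chips))
```

After normalization every array contains exactly the same set of
values, only in a different probe order.  That is why the method
assumes the arrays share one underlying intensity distribution, and
why many copy-number differences can violate that assumption.
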
>
>> If I do this, then instead of using a
>> raw file intermediate, I could go from an abatch object directly to
>> the norm files (what is the format of these files?).  This is also
>> appealing as gtype_cel_to_pq chokes on my CDF file, probably due to
>> the mixed design.
>
> I can't tell you about 'abatch' objects, but I know that Affymetrix'
> gtype_cel_to_pq tool is designed for the 100K SNP chips, which have
> exactly 20 PM and 20 MM probes per SNP (probeset).  This is not the
> case for, say, the 500K chips.  The simple reason for this assumption
> is that the tool outputs a tab-delimited ASCII file (*.raw) containing
> a table whose rows all have the same length.  Such a table is a poor
> fit for CEL data from SNPs with different numbers of probes.
>
>>
>> 2) Once I have norm files, I can create the theta file - but is there
>> a way to do unsupervised classification from the results in the theta
>> file (that is, how do I avoid the internal regions file altogether
>> or make a compatible uninformative one)?  Of course, I could always
>> define my own conservative decision regions in the unit square.
>>
>> 3) My genotyping probe-sets don't all have 20 PM probes, does RLMM
>> explicitly require this?
>
> If you mean the RLMM package, the answer is yes.  The RLMM
> method/algorithm itself works on a SNP-by-SNP basis and does not
> require equally sized SNPs.
>
>>
>> 4) I'm also interested in checking how much the various quartet
>> offsets contribute to classification results.  Are the 20 probes in
>> the raw or norm file ordered by offset and strand?
>
> I looked at this many months ago and, if I remember correctly, the
> answer is that the probes are ordered as they are in the CDF file,
> where all sense probes come first, followed by the anti-sense probes.
> However, just by looking at the *.raw file, you cannot tell how many
> sense and anti-sense probes a specific SNP has; it varies and it is
> *not* the case that it is always 20-20.
>
> If you are going to do serious (long-term) investigation of SNP data,
> I recommend that you move away from the *.raw file format; it was a
> temporary solution and will soon be forgotten.  It is also extremely
> slow to work with ASCII files - much better to work with binary CEL
> files directly.
>
> For low-level access to CDF and CEL data, I would recommend that you
> look at the 'affxparser' package, and also the 'affyio' package.
> Currently, they complement each other.  The latter has been around
> longer (hence probably fewer bugs); the former builds on top of
> Affymetrix' open-source libraries and also tries to minimize memory
> usage by allowing you to work on a subset of probesets across
> 100-1000s of CEL files.  Both will allow you to pull information from
> the CDF about probe distributions etc. for the SNPs.
>
> In the bigger picture, for doing RLMM and similar, I would recommend
> that you look at the 'oligo' package, which is under development but
> is being designed for doing SNP analysis in R.  You might also want to
> look at the Affymetrix Power Tools (APT) (non-R), which implement
> BRLMM, an extension to RLMM that lets SNPs borrow information
> from other SNPs in order to get better genotype call regions.  See
> also CRLMM in 'oligo'.
>
> Cheers
>
> Henrik
>
>>
>> -Amit
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
