[BioC] Question about hgu133plus2cdf?

James W. MacDonald jmacdon at uw.edu
Thu Mar 15 15:07:49 CET 2012


Hi Fabrice,

On 3/15/2012 9:52 AM, Fabrice Tourre wrote:
> Dear Nico,
>
> Thank you very much for your explanation.
>
> I am wondering why the hgu133plus2cdf in Bioc is not based on the
> custom CDF from Dai et al. It seems that unique mapping is better.

The default cdf packages from BioC are based on the manufacturer's data,
as are the probe and annotation packages. We create these packages as a
service to our end users, without making any claims about the suitability
of these packages for any use (which, I might add, is true of all BioC
packages, not just the metadata packages).

It is not in our interest (nor yours, I might guess) for us to decide
which mapping is 'better', and then restrict what we supply. We have
made the MBNI packages available via biocLite() for something like 6-7
years now, so that the end user has easy access to whatever mapping
they feel is appropriate to their analysis, and we leave that decision
up to the end user.
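
To make that kind of user-side decision concrete: the SNP filtering Nico suggests (quoted below) comes down to dropping every probe whose genomic interval contains a known SNP position. What follows is only a minimal, language-agnostic sketch in Python, not anything shipped with BioC. The function name, the tuple layout and the toy coordinates are all hypothetical, and real probe mappings and SNP positions would have to come from your own alignment and SNP source. In R, the IRanges overlap functions would be the natural tool for the same step.

```python
from bisect import bisect_left
from collections import defaultdict


def probes_without_snps(probes, snps):
    """Return the IDs of probes whose interval contains no SNP.

    probes: iterable of (probe_id, chrom, start, end), 1-based inclusive
    snps:   iterable of (chrom, pos)
    (Both layouts are assumptions made for this illustration.)
    """
    # Build a sorted list of SNP positions per chromosome.
    snp_index = defaultdict(list)
    for chrom, pos in snps:
        snp_index[chrom].append(pos)
    for positions in snp_index.values():
        positions.sort()

    kept = []
    for probe_id, chrom, start, end in probes:
        positions = snp_index.get(chrom, [])
        # First SNP at or after the probe start; a hit if it is also <= end.
        i = bisect_left(positions, start)
        overlaps_snp = i < len(positions) and positions[i] <= end
        if not overlaps_snp:
            kept.append(probe_id)
    return kept


# Hypothetical toy data: three 25-mer probes and two SNPs.
probes = [
    ("p1", "chr1", 100, 124),
    ("p2", "chr1", 200, 224),
    ("p3", "chr2", 100, 124),
]
snps = [("chr1", 110), ("chr2", 125)]

clean = probes_without_snps(probes, snps)
print(clean)
```

Whether you then drop whole probesets or rebuild a CDF from the surviving probes is exactly the kind of choice we leave to the end user.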

Best,

Jim


>
> On Thu, Mar 15, 2012 at 9:16 PM, Nicolas Delhomme <delhomme at embl.de> wrote:
>> Dear Fabrice,
>>
>> The hgu133plus2cdf in Bioc is based on the information provided by Affymetrix.
>>
>> The custom CDF from the website you mention contains probes re-aligned to the human genome, and only those probes that have a unique mapping are used. See their publication: Dai et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Research (2005) vol. 33 (20) pp. e175.
>>
>> That won't solve your SNP problem, but for that you can use the hgu133plus2probe package, which contains the probe sequences, or the one provided by Dai et al. Based on these sequences and their mapping, you should be able to filter out the probes that contain SNPs you're not interested in. The IRanges functionality might prove helpful for this. Whether you then drop the whole probe-set or try to create your own CDF is up to you.
>>
>> If you want to create your own CDF, check the vignette of the makecdfenv package: vignette("makecdfenv"). You might also want to make sure your new probe-sets are valid. This paper is a good starting point for that: Lu et al. Transcript-based redefinition of grouped oligonucleotide probe sets using AceView: high-resolution annotation for microarrays. BMC Bioinformatics (2007) vol. 8 pp. 108.
>>
>> HTH,
>>
>> Nico
>>
>>
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>>
>> Genome Biology Computational Support
>>
>> European Molecular Biology Laboratory
>>
>> Tel: +49 6221 387 8310
>> Email: nicolas.delhomme at embl.de
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>> ---------------------------------------------------------------
>>
>> On 15 Mar 2012, at 13:44, Fabrice Tourre wrote:
>>
>>> Dear list,
>>>
>>> I am now analysing hgu133plus2 arrays. I want a CDF from which probes
>>> containing SNPs have been removed, because I want to remove the noise
>>> caused by single nucleotide polymorphisms (SNPs) in different samples.
>>> I also do not want probesets whose sequences map to multiple genome
>>> positions.
>>>
>>> In Bioconductor, there is a package hgu133plus2cdf. I also noticed
>>> that there is a website providing a custom CDF file for hgu133plus2.
>>>
>>> The website is:
>>> http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp
>>> HGU133Plus2 (Version 15.0.0, ENTREZG)
>>>
>>> Are these two CDF files the same?
>>>
>>> Or is the package hgu133plus2cdf built directly from the Affymetrix CDF file?
>>>
>>> Thank you very much in advance.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


