[BioC] Inconsistent annotation of an Affymetrix probeset on the Rat 230 2.0 chip

Robert Gentleman rgentlem at fhcrc.org
Thu Jul 3 06:32:07 CEST 2008


Hi,
  It is actually a bit simpler than Mark has suggested.

  biocLite("rat2302probe")

  will get the probe sequences used (at least as reported in late March,
  but they should not change).

  then you could ask Herve to build a BSgenome package for Rat, and use 
Biostrings to do the matching...
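A minimal sketch of the Biostrings route. It assumes rat2302probe, Biostrings and a rat BSgenome package are installed; BSgenome.Rnorvegicus.UCSC.rn6 is only an example build (substitute whichever assembly you trust). The column names are those of the probe table shipped in rat2302probe. The code counts exact hits of the 25-mer probes of 1375535_at on both strands of every sequence in the genome:

```r
## Sketch only: assumes rat2302probe, Biostrings and
## BSgenome.Rnorvegicus.UCSC.rn6 (an example build) are installed.
library(rat2302probe)
library(Biostrings)
library(BSgenome.Rnorvegicus.UCSC.rn6)

genome <- BSgenome.Rnorvegicus.UCSC.rn6

## probe sequences belonging to the probeset in question
probes <- as.data.frame(rat2302probe)
ps <- probes[probes$Probe.Set.Name == "1375535_at", ]
probe_seqs <- DNAStringSet(ps$sequence)

## preprocess the probes and their reverse complements once, so that
## each genome sequence is scanned for all probes in a single pass
pd_fwd <- PDict(probe_seqs)
pd_rev <- PDict(reverseComplement(probe_seqs))

## total number of exact probe hits on each genome sequence, both strands
hits <- vapply(seqnames(genome), function(chr) {
  subject <- genome[[chr]]
  sum(countPDict(pd_fwd, subject)) + sum(countPDict(pd_rev, subject))
}, integer(1))

## sequences with at least one hit; hits scattered over many
## chromosomes would be a warning sign
hits[hits > 0]
```

If all probes land on a single locus, that locus is the best evidence for what the probeset measures, whatever either annotation says.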

  or save the probes and use BLAT or any other string matcher (e.g. MAQ)
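A minimal sketch of saving the probes for an external aligner, assuming rat2302probe and Biostrings are installed; the output file name is hypothetical. Each probe is named by probeset and array (x, y) position so that every alignment can be traced back to an individual probe:

```r
## Sketch only: assumes rat2302probe and Biostrings are installed.
library(rat2302probe)
library(Biostrings)

probes <- as.data.frame(rat2302probe)
ps <- probes[probes$Probe.Set.Name == "1375535_at", ]

probe_seqs <- DNAStringSet(ps$sequence)
## name each probe by probeset and (x, y) position on the array
names(probe_seqs) <- paste(ps$Probe.Set.Name, ps$x, ps$y, sep = "_")

## FASTA file usable as the query for BLAT, MAQ, etc.
## ("1375535_at_probes.fa" is a hypothetical file name)
writeXStringSet(probe_seqs, "1375535_at_probes.fa")
```

The file can then serve as the BLAT query, e.g. `blat rn6.2bit 1375535_at_probes.fa out.psl`, where rn6.2bit is a placeholder for whatever genome file you align against. BLAT's default -minScore (30) appears to be higher than a perfectly matching 25-mer can score, so it presumably needs lowering (for instance -minScore=20); check the usage message of your BLAT version.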

  best wishes
    Robert

Mark Cowley wrote:
> Hi Christoph,
> I would recommend obtaining the sequences of the actual probes that make
> up this probeset (from NetAffx), then aligning them to the latest genome
> using BLAT, so that you can convince yourself which mRNA these
> probes are most likely to detect.
> I find that aligning the probes often tells you far more than the
> Affymetrix consensus sequence ever will.
> Be very concerned if your probes start aligning all over the genome!
> 
> cheers,
> Mark
> 
> On 03/07/2008, at 3:47 AM, Marc Carlson wrote:
> 
>> Christoph Preuss wrote:
>>> Hi everyone,
>>>
>>> We analyzed a global expression microarray data set using gcrma for the
>>> normalization step and limma for finding differentially expressed
>>> genes. One of the most significant probesets in terms of differential
>>> expression (ProbeSetID "1375535_at") is annotated as follows:
>>> Probeset "1375535_at"
>>> - Gene Symbol: Lpin1
>>> - Location: Chr 6
>>>
>>> in the Bioconductor package "rat2302" / "rat2302.db".
>>>
>>> We also looked at the Affymetrix web site, where the same probeset is
>>> annotated as "Transcribed sequence" on chromosome X.
>>>
>>> Affymetrix Annotation RG 230 2.0 Chip:
>>> - ProbeSetID: 1375535_at
>>> - Target Sequence:
>>>
>>> >RAT230_2:1375535_AT
>>> gaagttagagagctgtttccccactttacattttaaaatatgtatgccaggatntaatca
>>> ttcctttaagtgtacacttcaaggagagatgtgccgaataagaaaatagctttctctagc
>>> gtgaagggttttgcgtccgccgagttcttaaggtcttttttaagagctactgtgtatgag
>>> tgtgtgtatgtgtgcgcatgcatgttcctgcgactagtcattcattcacatggtgatcag
>>> acaacaatgggagctggttcgtctaccttatcttgtgggtcctggagttcaatctcagat
>>> catcaggctgggcagcaagtgccttcaccctccgagccatcttgccatcccacagctgag
>>> cgtctaatatgacattgccgatga
>>>
>>> Interestingly, in a blastn search the target sequence given for this
>>> probeset matches only a mouse sequence, not even a rat mRNA.
>>>
>>> Which annotation should we trust?
>>> Is there any way to validate the probeset annotation?
>>> Many thanks in advance for any help.
>>>
>>> cheers,
>>>
>>> Christoph Preuss
>>>
>>> (Leibniz-Institute for Arteriosclerosis Research, University of
>>> Muenster, Germany)
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>> Hi Christoph,
>>
>> I can only really speak for the Bioconductor annotations, which are
>> generated from public sources together with an initial mapping of each
>> probe or probeset to a public accession (usually a GenBank accession,
>> an Entrez Gene ID or a related type of ID). In the case of "1375535_at",
>> the probeset is an Affymetrix probeset, so for this initial mapping we
>> are ultimately at the mercy of Affymetrix to tell us accurately what the
>> probeset is. After that we do the rest ourselves from public sources:
>> we join the probeset-to-ID mapping onto additional information gathered
>> from public sources (primarily NCBI) to build the rest of the package.
>> The file that you get from Affymetrix may contain much of the same data
>> as our packages, but unless they document it somewhere, I don't think
>> we know for certain where they collected all of their information from.
>> The only information that we ever take from them is the initial
>> mapping of their probesets onto public accessions.
>>
>> I dug up the latest Affymetrix mapping files that we used to generate
>> this package and investigated. In the file that I have (which was
>> collected in late March), the probeset you listed is given as Lpin1,
>> located on chromosome 6, which agrees completely with the information
>> that we gathered from NCBI and GoldenPath at that time. As of this
>> morning, NCBI still lists this gene as Lpin1 (lipin 1) on chromosome 6.
>> However, there is also a field right next to it in the Affymetrix file,
>> called "Alignments", which lists the X chromosome. When I pull up an
>> even more recent file from Affymetrix, they no longer list the location
>> of this gene (the value has been replaced with "---"), nor the gene's
>> name or symbol. But they still list chromosome "X" in the alignment
>> field, and they have even assigned different accessions to this
>> probeset.
>> So the short answer is that Affymetrix has changed its mind about
>> what this probeset is measuring.
>>
>>
>> I hope this helps you,
>>
>>
>>   Marc
>>

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org