[BioC] how deal with multiplicate affy probes?

Sean Davis sdavis2 at mail.nih.gov
Fri Mar 26 17:48:30 CET 2004


Just to add two cents to the discussion:

In our lab, we have a similar problem.  We have oligo arrays (oligos
slightly longer than affy) designed against a set of transcripts as defined
a couple of years ago.  Of course, over time, the annotations and predicted
transcripts have changed, so we have resorted to blatting all of the oligos
(blat is probably not ideal for short oligos) against ensembl transcripts,
refseq, and genbank est (and then mapping to unigene).

Determining meaningful blat (or blast) cutoffs is difficult, if not
impossible, because hybridization may not be directly related to a score or
even to %identity (some probes hybridize better than others).  So for each
new build of the transcripts from the different annotators (NCBI, ensembl,
etc.) we construct a database of the blat hits, so that one can examine the
characteristics of a suspect probe or set of probes in the context of the
expression data.  E.g., 2 probes that hit the same transcript may or may not
behave the same way, and if they don't, it is useful to have quick access
to blat information that might explain the effect.

In summary, the process of blat -> assign probe to transcript -> interpret
based on this single assignment may not be adequate in some situations.
Having all results on hand in database form has been useful for us.
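
To make the idea concrete, here is a minimal sketch of such a hit database,
written in Python with the standard sqlite3 module.  It is only an
illustration: the table layout, the column names, the function names, and
the example probe and transcript IDs are all made up for this sketch and are
not our actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per (probe, transcript) blat hit.
SCHEMA = """
CREATE TABLE blat_hit (
    probe_id      TEXT NOT NULL,
    transcript_id TEXT NOT NULL,
    source        TEXT NOT NULL,   -- e.g. 'ensembl', 'refseq', 'est'
    build         TEXT NOT NULL,   -- label of the annotation build
    pct_identity  REAL NOT NULL,
    match_length  INTEGER NOT NULL
);
CREATE INDEX idx_hit_probe ON blat_hit (probe_id);
CREATE INDEX idx_hit_transcript ON blat_hit (transcript_id);
"""

# Made-up example rows, purely for illustration.
EXAMPLE_HITS = [
    ("oligo_001", "TX_A", "ensembl", "build1", 100.0, 50),
    ("oligo_001", "TX_B", "refseq", "build1", 94.0, 47),
    ("oligo_002", "TX_A", "ensembl", "build1", 98.0, 50),
    ("oligo_003", "TX_C", "est", "build1", 100.0, 50),
]


def make_db(hits):
    """Create an in-memory database and load every hit, with no cutoff applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO blat_hit VALUES (?, ?, ?, ?, ?, ?)", hits)
    conn.commit()
    return conn


def hits_for_probe(conn, probe_id):
    """All stored hits for one suspect probe, best identity first."""
    cur = conn.execute(
        "SELECT transcript_id, source, build, pct_identity, match_length "
        "FROM blat_hit WHERE probe_id = ? "
        "ORDER BY pct_identity DESC, match_length DESC, transcript_id",
        (probe_id,),
    )
    return cur.fetchall()


def probes_sharing_transcript(conn, transcript_id):
    """Probes that hit the same transcript, to compare in the expression data."""
    cur = conn.execute(
        "SELECT DISTINCT probe_id FROM blat_hit "
        "WHERE transcript_id = ? ORDER BY probe_id",
        (transcript_id,),
    )
    return [row[0] for row in cur]


def multi_transcript_probes(conn, min_identity=0.0):
    """Probes with hits to more than one transcript at or above min_identity."""
    cur = conn.execute(
        "SELECT probe_id, COUNT(DISTINCT transcript_id) "
        "FROM blat_hit WHERE pct_identity >= ? "
        "GROUP BY probe_id HAVING COUNT(DISTINCT transcript_id) > 1 "
        "ORDER BY probe_id",
        (min_identity,),
    )
    return cur.fetchall()
```

The point of the design is that no cutoff is baked in at load time: every
hit is kept, and any identity threshold is just a query parameter that can
be changed when a probe behaves unexpectedly in the expression data.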

Finally, as noted earlier in the thread, blatting or blasting against the
genome does not get you the same information as searching against
transcripts.

Sean

On 3/26/04 4:59 AM, "Ron Ophir" <lsophir at wisemail.weizmann.ac.il> wrote:

> Hi All,
> There is a project called GeneAnnot, from the people behind GeneCards, that
> implements this idea by "blatting" each probe against many RNA annotation
> resources and integrating the results into two scores that define the
> specificity and the sensitivity of the whole probe set. It was done on
> U95 sets; please encourage them to do it for other chips.
> You can find the papers at:
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14725348
> and at
> http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14962929
> and please have a look at
> http://genecards.weizmann.ac.il/geneannot/
> regards,
> Ron
> 
>>>> <Arne.Muller at aventis.com> 03/26/04 11:26 AM >>>
> Hi,
> 
> you may be able to automate this by blasting all the target sequences (as
> Lawrence suggested) against the Ensembl confirmed or predicted genes (i.e.
> not the complete genome but just the genes). Then only look at matches
> with >95% sequence identity (not sure about this cut-off). In your
> analysis, ignore all probe sets that do not have a confident match (<95%
> sequence id). One could say this is the actual informative subset of probe
> sets on the chip.
> 
> Note that you can still get >1 match with >95% id per probe set! In my
> opinion the corresponding *single* expression measure is meaningless, since
> you cannot measure >1 expression level with the same probe set ...
> For cases with >1 gene per probe set (as determined by blast using the
> target sequence) you may need to fall back to the single probe level, where
> you may find that one of the genes has >95% sequence id in many probes
> whereas the other doesn't.
> 
> As I said above you could automate this, but it's not an easy task ... :-(
> 
>   regards,
> 
>   Arne
> 
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com
> 
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Johnnidis,
>> Jonathan
>> Sent: 26 March 2004 00:53
>> To: Lawrence Paul Petalidis; Michael Seewald
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: RE: [BioC] how deal with multiplicate affy probes?
>> 
>> 
>> Thank you for your suggestions.  However, in this instance
>> I'm not interested in particular transcripts but rather an
>> entire range of transcripts (several hundred), so I'm not
>> sure it would be feasible to individually look up and verify
>> every single probe set...
>> Jonathan
>> 
>> -----Original Message-----
>> From: Lawrence Paul Petalidis [mailto:lpp22 at cam.ac.uk]
>> Sent: Thursday, March 25, 2004 6:35 PM
>> To: Michael Seewald; Johnnidis, Jonathan
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: RE: [BioC] how deal with multiplicate affy probes?
>> 
>> 
>> Hello,
>> As a note following on from Michael Seewald's message, I totally agree
>> that there is a STRONG need to BLAST probe set sequences. I tend to use
>> the probe set target sequence instead of the individual probe sequences,
>> however. You will be surprised to see the inconsistency of the Affy
>> annotation; in many cases _at probes are really not unique at all. So if
>> you are really interested in a transcript, BLAST it to make sure you are
>> actually seeing what you think you are.
>> 
>> Best regards to all, Lawrence
>> 
>> ______________________________
>> Lawrence Paul Petalidis
>> Ph.D. Candidate
>> 
>> University of Cambridge
>> Department of Pathology
>> ______________________________
>> 
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch
>> [mailto:bioconductor-bounces at stat.math.ethz.ch]On Behalf Of Michael
>> Seewald
>> Sent: 25 March 2004 20:48
>> To: Johnnidis, Jonathan
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] how deal with multiplicate affy probes?
>> 
>> 
>> 
>> As a rule of thumb: If statistics based on a given probe set's data tell
>> you that a transcript is significantly deregulated, you can usually trust
>> it and discard every other probe set for that transcript!
>> 
>> The thing to look at is the probe design itself: Download the probe set
>> from NetAffx and blast the single probes against the genome (e.g. in
>> ensembl). You will be surprised how many probes match up with introns or
>> genomic regions that do not correspond to any cDNA!
>> 
>> 2 examples: There are 4 probe sets for human Wnt6 (HG-U133AB), 2 match
>> with the sense (!) strand and have to be discarded. Out of >12 probe sets
>> for human CD44, only 4 have probes that are completely matching the
>> transcripts. >8 can be discarded.
>> 
>> Best,
>> Michael
>> 
>> PS: www.ensembl.org is always a good place to check probe sets. Their
>> mapping of probe sets does not show the location of single probes,
>> though...
>>
>> PPS: In affymetrix.com you can check out the "Details" view for a probe
>> set. There you can discover that 2 probe sets of Wnt6 map to the (-)
>> strand, which is bad. It doesn't tell you, however, that many probe sets
>> match intron regions.
>> 
>> 
>> On Sat, 20 Mar 2004, Johnnidis, Jonathan wrote:
>>> I'm a new list member and am not quite sure if this question is
>>> appropriate for the list, but will shoot anyway. I'm analyzing a bunch of
>>> data from Affy MgU74Av2 chips and am a bit perplexed as to how to treat
>>> conflicting expression data from multiplicate probe sets (that is, a gene
>>> that has >1 probe set designed against it; for example, 97569_r_at and
>>> 97658_r_at are both probes for the Insulin gene).
>> 
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>> 


