[BioC] Does the strand of a microarray probe matter?

Thu Nov 20 21:48:25 CET 2008

Hi Nick, and others,

Apologies for not making my question more clear, but I guess there have 
been some interesting answers anyway. I was in fact thinking of 
expression arrays. And my main interest was from the standpoint of probe 
annotation.

It now does seem pretty clear that there are many regions in the genome 
that encode transcripts on both strands. If a probe is designed to such 
a region, the expression microarrays will be measuring both transcripts, 
and you will essentially have a "perfectly" cross-hybridizing probe.
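
To make the overlap case concrete, here is a minimal sketch in Python (not taken from any annotation package). The gene names, coordinates and the `Feature` type are made up for illustration, and I am assuming the probe's "strand" means the strand of the transcript it was designed against:

```python
from typing import NamedTuple

class Feature(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    strand: str  # "+" or "-"

def overlaps(a: Feature, b: Feature) -> bool:
    """True if the two features share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def genes_hit(probe: Feature, genes: list, strand_aware: bool) -> list:
    """Names of genes overlapping the probe, optionally restricted to its strand."""
    hits = [g for g in genes if overlaps(probe, g)]
    if strand_aware:
        hits = [g for g in hits if g.strand == probe.strand]
    return sorted(g.name for g in hits)

# Hypothetical locus: two genes on opposite strands whose ends overlap.
genes = [
    Feature("geneA", "chr1", 1000, 5000, "+"),
    Feature("geneB", "chr1", 4500, 9000, "-"),
]
probe = Feature("probe_1", "chr1", 4600, 4624, "+")

print(genes_hit(probe, genes, strand_aware=False))  # ['geneA', 'geneB']
print(genes_hit(probe, genes, strand_aware=True))   # ['geneA']
```

A strand-blind mapping reports both genes for this probe, while a strand-aware one reports only geneA. Which of the two answers matches what the array actually measured depends on whether the labelling protocol preserves strand.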

Now, annotation-wise, what should we do? Ignore such probes? At least 
flag them up? The problem is that many Bioconductor annotation packages 
only allow a single gene to be assigned to each probe. So, in many cases 
you may be led to believe that your experiment has measured differential 
expression for a particular gene (with its set of GO terms, KEGG 
pathways, etc.) when in fact the changing gene was the one on the other 
strand.

These "problems" tend to show up on the list occasionally, for example 
when people find out that different databases (Ensembl/Biomart, NCBI, 
the manufacturer or a BioC annotation package) list different genes for 
the same probe. Not all, but many of these differences have been due to 
overlapping transcripts. In fact, Ensembl recently patched their probe 
mapping pipeline to be "strand-aware". If you think that this would 
affect only a tiny portion of probes, think again: the proportion of 
Affymetrix probes affected on the human and mouse genomes was around 10%:

http://osdir.com/ml/science.biology.ensembl.devel/2008-06/msg00052.html

Also, from talking to some of the NuID/Illumina mapping people it seems 
that they simply don't consider the strand of the probe. But they do 
calculate a "uniqueness" score to avoid probes that map to multiple genes.

In the end, I would ideally prefer "cross-hybridizing" probes (of 
whatever sort) to be annotated in a way that lets them be identified. 
But I have no idea how much of a nightmare that would be for the 
developers of the current annotation packages...
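
Just to illustrate the kind of annotation I have in mind, here is a rough Python sketch. It is not how any existing annotation package stores its data; the function name, the input triples and the field names are all hypothetical. Each probe keeps every gene it maps to, split by strand, plus a flag for ambiguity:

```python
from collections import defaultdict

def annotate_probes(hits):
    """Build a per-probe annotation from (probe_id, gene_id, same_strand) triples.

    same_strand is True when the gene lies on the strand the probe was
    designed against, False when it lies on the opposite strand.
    """
    sense = defaultdict(set)
    antisense = defaultdict(set)
    for probe_id, gene_id, same_strand in hits:
        (sense if same_strand else antisense)[probe_id].add(gene_id)

    table = {}
    for probe_id in sorted(set(sense) | set(antisense)):
        s = sorted(sense.get(probe_id, ()))
        a = sorted(antisense.get(probe_id, ()))
        table[probe_id] = {
            "sense_genes": s,
            "antisense_genes": a,
            # Flag any probe that could be reporting on more than one gene.
            "ambiguous": len(s) + len(a) > 1,
        }
    return table

# Hypothetical mapping results.
hits = [
    ("p1", "geneA", True),
    ("p1", "geneB", False),
    ("p2", "geneC", True),
]
for probe_id, row in annotate_probes(hits).items():
    print(probe_id, row)
```

With something like this, a user could still pick a single "primary" gene per probe for GO or KEGG lookups, but could also filter on the `ambiguous` flag before interpreting a differential expression result.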

Many thanks,

Cei

Nick Henriquez wrote:
> Dear Cei, Steve,
> 
> There are two versions of the correct answer depending on whether we are
> talking about an expression or CGH/SNP type array;
> 
> If we are using an EXPRESSION array
> 
> 1) It does not matter on which strand the gene resides.
> 2) It is not a matter of bad probe design. It is either a negative control or a
> misnomer derived from genome annotation.
> 
> For ANY probe to hybridise it has to be the RC of cDNA and therefore the DNA
> homologue of the original RNA sequence. (I'll let you work that one out for
> yourself).
> 
> If the probe WAS encoded on "the opposite strand" your labelled target would
> not hybridise as it would be the reverse complement of the actual sequence. 
> 
> The annotation "opposite strand" stems from the convention that we call one
> strand the "coding strand" and the other strand the non-coding or "opposite"
> strand. By definition then a gene cannot be encoded by the "opposite"
> strand. 
> 
> However, what often happens when sequencing genomes is that we find several
> genes encoded on one strand (which we will then call the coding strand) and
> then somewhat later also one or more genes on the "opposite" strand. This
> annotation is (wrongly in my opinion) retained when genomes are assembled
> and thus part of the annotation of the probes.
> 
> So an opposite strand probe is at best a kind of negative control, at worst
> a misnomer annotation retained when the genome was assembled. Mostly we now
> try to use terms like + and - but even that has the drawback that we
> generally associate + with coding and - with noncoding. As we all know BOTH
> strands encode functional RNAs of various kinds including those coding for
> proteins.....
> 
> If we are talking about DNA targets, e.g. a SNP array
> 
> 1) It does not matter on which strand a gene resides, any overlap is a
> matter of coincidence- "genes" are rare events on the genome.
> 2) It is not a matter of bad probe design. Usually it simply does not matter
> and this is a sequence that was used historically without knowledge of the
> gene (often discovered later). Sometimes the sequence on the coding strand
> may have a problem with background or sequence similarity. To get around
> this one can try to use the RC (i.e. "opposite strand" sequence) which is
> often different enough. Of course if more than 2 similar sequences exist the
> problem remains as we can use this trick only once.
> 
> Hope this helps,
> 
> Nick
> 
> N.V. Henriquez, Senior Research Associate
> Dept. Of Neurodegenerative Diseases
> Institute of Neurology, UCL, 
> Queen Square House rm 124
> Queen Square
> London WC1N 3BG
> 
> 
> 
> 
> Message: 8
> Date: Wed, 19 Nov 2008 10:45:52 -0500
> From: Steve Lianoglou <mailinglist.honeypot at gmail.com>
> Subject: Re: [BioC] Does the strand of a microarray probe matter?
> To: Cei Abreu-Goodger <cei at ebi.ac.uk>
> Cc: Bioconductor Newsgroup <bioconductor at stat.math.ethz.ch>
> Message-ID: <7710F044-03D5-4572-8EE4-2DB96F4C348C at gmail.com>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> Hi Cei,
> 
> On Nov 19, 2008, at 3:51 AM, Cei Abreu-Goodger wrote:
> 
>> Hello all,
>>
>> Related issues have arisen before, where the probe of a particular  
>> array platform was annotated to a gene on the opposite strand. But I  
>> was just asked if this even matters, or should it simply be  
>> considered a case of bad probe design.
>>
>> Does the protocol for different manufacturer's arrays always produce  
>> amplified product of both strands for the transcript to be measured?  
>> I could imagine that protocols that amplify based on poly-A tails  
>> would tend to produce an anti-sense biased amplification product  
>> (older Affy arrays?), whereas those based on random priming could  
>> produce products of both strands (and so the actual strand that is  
>> on the array becomes meaningless).
>>
>> Does someone know what is the case in particular for Illumina  
>> Beadarrays?
> 
> 
> I've never worked on the bench-side of a microarray experiment, but  
> for gene expression arrays I was under the impression that most  
> protocols:
> 
> (i) extract the RNA from cell lysate using their poly-A tails as  
> targets
> (ii) reverse transcribe to cDNA and amplify the cDNA w/ random primers.
> (iii) hybridize amplified cDNA to the array
> 
> If that's the case, I don't think that the strand of the probe should  
> be an issue.
> 
> I'd be interested, of course, to hear other people's thoughts on this,  
> too (while this info should be easily available from the  
> manufacturer's site, or the Methods section of many papers, let's see  
> if the lazy-web can help :-).
> 
> -steve
> 
> --
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
> 
> http://cbio.mskcc.org/~lianos
> 
