[BioC] Development of GCRMA-like methods

Sat Feb 7 05:40:15 MET 2004

although all the statements below may be true in theory, in
practice we see something slightly different.

say you want to predict intensities for a probe using its sequence
information. we have tried prediction using models such as
those based on nearest-neighbor models, and they don't work nearly
as well as statistical models (which use training data) such as
the one suggested by Naef.

Naef's idea of simply modeling the log of the sequence effect as an
additive linear model using the position/base as predictors (with the
across-position effect for a fixed base modeled as a smooth function of
position) works much better at prediction. the data demonstrate that
having a C near the middle results in high intensities and having an A
near the middle results in low intensities. the closer to the middle, the
larger the effect. adding interactions (to account for nearest-neighbor
effects) does not seem to help prediction at all. both Naef and Jean Wu
find this.
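
the additive position/base model described above can be sketched as
follows. this is only an illustration under stated assumptions, not Naef's
or GCRMA's actual code: the cubic polynomial standing in for the "smooth
function of position", the choice of T as the reference base, the 25-base
probe length, and all function names are assumptions of this sketch.

```python
import numpy as np

PROBE_LEN = 25   # assumed probe length (25-mers)
BASES = "ACG"    # T is the reference base, so its effect is absorbed by the intercept
DEGREE = 3       # assumed polynomial degree for the smooth position effect


def design_row(seq, degree=DEGREE):
    """Build one row of the design matrix for a probe sequence.

    For each non-reference base b and each power j, the feature is the
    sum over positions k where seq[k] == b of x_k**j, with x_k the
    position rescaled to [-1, 1]. No interaction terms are included.
    """
    if len(seq) != PROBE_LEN:
        raise ValueError(f"expected a {PROBE_LEN}-base probe, got {len(seq)}")
    x = np.linspace(-1.0, 1.0, PROBE_LEN)
    powers = np.vander(x, degree + 1, increasing=True)  # shape (25, degree + 1)
    row = [1.0]  # intercept
    for b in BASES:
        mask = np.array([c == b for c in seq], dtype=float)
        row.extend(mask @ powers)
    return np.array(row)


def fit_affinity_model(seqs, log_intensities, degree=DEGREE):
    """Least-squares fit of log intensity on the position/base features."""
    X = np.vstack([design_row(s, degree) for s in seqs])
    y = np.asarray(log_intensities, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_log_intensity(seq, coef, degree=DEGREE):
    """Predicted log intensity for one probe sequence."""
    return float(design_row(seq, degree) @ coef)
```

with training probes `seqs` and their observed log intensities `y`, one
would call `coef = fit_affinity_model(seqs, y)` and then
`predict_log_intensity(new_seq, coef)` for a new probe. a nearest-neighbor
variant would add dinucleotide features to `design_row`; the sketch leaves
them out, in line with the observation that interactions do not help
prediction.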

G and T don't appear to have much of an effect, although there is some.
To see the plots you can look at Figure 3 in Naef and Magnasco's paper
"Solving the riddle of the ..." Physical Review E, 68:011906, 2003.

we demonstrate that Naef's simple additive model also works for
predicting intensities in arrays where one expects only non-specific
binding (NSB); see
http://www.biostat.jhsph.edu/~ririzarr/papers/p177-irizarry.pdf
you can see the same ATGC effect for NSB on
page 66 of http://www.biostat.jhsph.edu/~ririzarr/Talks/jnj-affy.pdf

if anybody has empirical evidence (in microarray data) supporting some
of the statements below, i would be interested in seeing it.

On Thu, 5 Feb 2004, Richard Finney wrote:

> Hay.  Just a couple of notes on your questions ...
> 
> > > My understanding is that:
> > > The first of these papers shows that MM intensity
> > > is related to GC content, and weights MM values
> > > towards the average distribution of the binding of
> > > MM with similar GC contents.
> 
> The signal intensity is more a function of the
> CT content.  The lights attach to the back of the
> Gs and As on the target cDNA.
> Remarkably, this is synergistic:
> the more Cs and Ts you have lined up together,
> the more the signal.  The location of the Cs and Ts
> is also important.  They are stronger in the middle.
> 
> 
> > > 
> > > The second proposes that most MM>PM occur because
> > > when the middle PM A/G is changed to MM T/C the
> > > smaller size of the substituted pyrimidine (C or T)
> > > allows room for the label on the target RNA (U or C)
> > > which would otherwise interfere with the binding to
> > > the PM.
> 
> Label gets put on the cDNA.  RNA is converted to
> cDNA at the last step.
> 
> Cross hybridization comes from everywhere.
> The Gs and As overwhelm the correct hybridization
> at low levels of expression.  At higher levels,
> the PM goes above MM.  It's not just that middle
> base, it's what's around it, too.  The more Cs and
> Ts surrounding the mismatch spot, the stronger
> the signal.
> 
> 
> > > 
> > > Are there plans to combine these ideas and would
> > > there be any benefit from doing so? Would
> > > sub-setting the MM based on both GC content and the
> > > middle base provide more accurate distributions to
> > > weight the MM's to? The majority of the MM>PM would
> > > have C (or T) as their middle base and 'averaging'
> > > them must surely distort things for the MM's with A
> > > or G?
> > >
> > > Finally Fig 3 in ref (2) shows nice fits of the
> > > positional effect due to having individual bases at
> > > different positions (1-25) in the PM probe. What
> > > would such fits look like for the MM probes, would
> > > it be similar or random/distorted due to
> > > non-specific binding? And would it help in answering
> > > why C has a smaller effect than G on intensity - or
> > > is this already known?
> 
> I don't have the Fig.; but MM should look the same
> with a dip at the mismatch spot.
> 
> Nota Bene : The signal strength is also a function
> of the probability that the target RNA will fold
> and a function of the distance from the 3' end.
> 