[BioC] M vs A plot

Tue Feb 17 16:43:58 MET 2004

Dear Naomi (and everybody),

	Thank you for your reply.
	Since the normalized curve is displayed by the same overall command
that performed the normalization, it is not clear to me why you suggest
that the displayed curve is fitted with a different bandwidth than the
one that was input. Also, once the data are normalized, shouldn't the
default parameters yield a flat curve?
	In any event, I managed to flatten the curve by applying two successive
loess normalizations with default parameters.  Do you see any disadvantage
to that procedure?
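
As a rough way to inspect what a second pass does, here is a minimal sketch in base R. It uses loess() directly rather than marray, and the simulated bias curve, the noise level, degree = 1, and the helper names loess_pass and curve_range are all assumptions made for illustration, not the marray internals.

```r
# Simulated M-vs-A values for one print-tip group; the bias curve and
# noise level are invented for illustration.
set.seed(2)
A <- seq(6, 15, length.out = 500)
M <- 0.04 * (A - 10)^2 + rnorm(length(A), sd = 0.05)

# One normalization pass: residuals from a loess fit of M on A.
# span = 0.75 is the default of base R's loess(); degree = 1 is an assumption.
loess_pass <- function(M, A, span = 0.75) {
  residuals(loess(M ~ A, span = span, degree = 1))
}

# Flatness check: vertical range of the loess curve drawn through the data.
curve_range <- function(M, A, span = 0.75) {
  diff(range(fitted(loess(M ~ A, span = span, degree = 1))))
}

once  <- loess_pass(M, A)     # first normalization
twice <- loess_pass(once, A)  # second normalization applied to the first result

r_raw   <- curve_range(M, A)
r_once  <- curve_range(once, A)
r_twice <- curve_range(twice, A)
print(round(c(raw = r_raw, once = r_once, twice = r_twice), 3))
```

With a least-squares (linear) smoother S, two passes leave (I - S)^2 M, which is the same as subtracting the single fit (2S - S^2) M. That combined fit is more flexible than S alone, so it can flatten more curvature but can also absorb more genuine intensity-dependent signal and noise.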

Thanks and best wishes,
Rich
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist
Herbert Irving Comprehensive Cancer Center
Oncoinformatics Core
Lecturer
Department of Biomedical Informatics
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

In Memoriam, Julius Schwartz

On Feb 17, 2004, at 9:11 AM, Naomi Altman wrote:

> Dear Richard and other participants in this discussion,
>
> Loess uses a kernel weight to downweight the effects of more distant 
> data values in the local regression.  For bandwidth greater than 1, 
> all of the data values are used, but the more distant values are still 
> downweighted.  As the bandwidth increases, there is less 
> downweighting.
>
> If you increase the bandwidth during the normalization process, the 
> curve used for normalization gets flatter, until at very high 
> bandwidth you are just doing ordinary linear regression.  The 
> normalized values are the residuals from this curve.  As a result, if 
> there is curvature on the MvA plots, large bandwidths lead to 
> normalized data that still have curvature (since only the linear trend 
> is removed).  Small bandwidths lead to normalized data that are 
> flatter.
>
> My understanding is that Richard then visualized the curvature in the 
> normalized data also using loess.  On these plots we are looking at 
> the curve fitted to the normalized data.  So, if a large bandwidth is 
> used to fit these display curves, they should be flat.  However, if a 
> large bandwidth is used for normalization and the default bandwidth 
> is used to visualize the normalized data, the display curves will show 
> the curvature that the normalization left behind.
>
> --Naomi
>
>
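
The effect described above can be sketched with base R's loess() on simulated data. This is only an illustration under stated assumptions: the data are invented, the helper names normalize and display_range are hypothetical, the spans 5 and 0.4 are arbitrary choices for "large" and "small", and marray's own normalization functions are not used.

```r
# Simulated MvA data with a curved intensity-dependent bias.
# All values here are invented for illustration only.
set.seed(1)
A <- seq(6, 15, length.out = 500)
M <- 0.04 * (A - 10)^2 - 0.3 + rnorm(length(A), sd = 0.1)

# Normalize M by subtracting a local-linear loess fit of M on A.
normalize <- function(M, A, span) {
  M - fitted(loess(M ~ A, span = span, degree = 1))
}

# Leftover curvature, measured as the vertical range of a display curve
# fitted to the normalized values with loess() defaults (span = 0.75).
display_range <- function(Mnorm, A) {
  diff(range(fitted(loess(Mnorm ~ A))))
}

wide   <- display_range(normalize(M, A, span = 5),   A)  # nearly a straight-line fit
narrow <- display_range(normalize(M, A, span = 0.4), A)  # follows the curve closely

cat("display-curve range after span = 5 normalization:  ", round(wide, 3), "\n")
cat("display-curve range after span = 0.4 normalization:", round(narrow, 3), "\n")
```

A larger printed range means the default-bandwidth display curve still bends after normalization; a range near zero means it is close to flat.
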
> At 05:03 PM 2/9/2004, Richard Friedman wrote:
>> Dear Sean (Wolfgang, Naomi, and Everybody),
>>
>>         The original command that I used was
>>
>> > ira.norm <- maNorm(ira.raw, norm = "p")
>>
>> The command that I used with the altered span is
>>
>>
>> > ira.f8.norm <- maNormMain(ira.raw,
>> +     f.loc = list(maNormLoess(x = "maA", y = "maM", z = "maPrintTip",
>> +                              w = NULL, subset = TRUE, span = 0.8)),
>> +     Mloc = TRUE, Mscale = TRUE, echo = FALSE)
>>
>> This command still gave pronounced curvature in the middle of one
>> of the print-tip blocks and at the ends of several print-tip blocks.
>> I did not use a span greater than 0.8 because that was contraindicated
>> in either the microarray or the loess literature.
>> Thank you all for your suggestion of going to vsn. However,
>> as this program is new to me, I ask if anyone knows a rule of thumb
>> as to how flat the print-tip loess line should be in order to be
>> acceptable. I would prefer not to change horses unless necessary.
>>
>> Thanks and best wishes,
>> Rich
>>
>> On Jan 30, 2004, at 2:48 PM, Sean Davis wrote:
>>
>>> Richard,
>>>
>>> The print-tip-loess lines should (I think) be straight and on the
>>> x-axis (y=0) after print-tip normalization.  If that isn't the case,
>>> perhaps you could post exactly the commands you used to do your
>>> normalization.  That may help people determine better what is going on.
>>>
>>> In reference to ridding you of intensity-dependent variability,
>>> loess normalization is designed to locally center the data but does
>>> not, in itself, deal with variability that may be intensity-dependent.
>>> For that problem, you may need to look into something like vsn or
>>> another scaling method.
>>>
>>> Sean
>>>
>>>
>>> On 1/30/04 2:35 PM, "Richard Friedman" 
>>> <friedman at cancercenter.columbia.edu>
>>> wrote:
>>>
>>>> Mick,
>>>>
>>>> Thanks for the help. What concerns me, however, is not a single
>>>> point being an outlier, but that the loess fit to all the points for
>>>> a few print-tips deviates significantly from a straight line
>>>> practically collinear with the x-axis (abscissa). The two test cases
>>>> on which I learned to use marray, the apoE data that comes with spot
>>>> and the swirl data that comes with marray, both had significantly
>>>> expressed genes; however, they also had flat normalized lowess
>>>> curves. Significant curvature in the lowess curve leads me to be
>>>> concerned that the spots associated with that region of the curve
>>>> are improperly normalized.
>>>>
>>>> Can anyone out there give me:
>>>>
>>>> 1. Guidelines as to how flat the lowess curve should be for the
>>>>  data to be considered normalized.
>>>>
>>>> 2. Advice as to what to do if the printtip normalization option
>>>>  in marray did not remove intensity dependence.
>>>>
>>>> If anyone is willing to look at the M vs A curve, I would be 
>>>> grateful.
>>>>
>>>> Thanks and best wishes,
>>>> Rich
>>>>
>>>>
>>>>
>>>> On Fri, 30 Jan 2004, michael watson (IAH-C) wrote:
>>>>
>>>>> Richard
>>>>>
>>>>> The nature of any normalisation means that we will always have
>>>>> outliers - those spots that deviate from all the rest.  There could
>>>>> be two reasons - that spot represents a differentially expressed
>>>>> gene or the spot is unreliable and comes from a "bad" spot.
>>>>>
>>>>> I'd take the common sense approach to these outliers:
>>>>>
>>>>> i) Check any replicate spots - if all replicate spots are outliers
>>>>> then you have evidence that it's a differentially expressed gene.
>>>>> However, if the replicates disagree, this is evidence that the
>>>>> outlier comes from an unreliable / bad measurement.
>>>>>
>>>>> ii) Go take a look at the spot on the original image.  Does it
>>>>> look "good"?
>>>>>
>>>>> You are likely always to find outliers after normalisation.  This
>>>>> is, after all, what we are looking for, isn't it?  The key is to be
>>>>> able to say, when you see an outlier, if that spot is of reliable
>>>>> quality or not.
>>>>>
>>>>> Thanks
>>>>> Mick
>>>>>
>>>>> -----Original Message-----
>>>>> From: Richard Friedman [mailto:friedman at cancercenter.columbia.edu]
>>>>> Sent: 29 January 2004 22:26
>>>>> To: 'Bioconductor Mail List'
>>>>> Cc: IRA A TABAS
>>>>> Subject: [BioC] M vs A plot
>>>>>
>>>>>
>>>>> Dear Bioconductors,
>>>>>
>>>>> I have normalized a series of arrays using print-tip normalization.
>>>>> Whereas the systematic error in the unnormalized data was pronounced,
>>>>> the systematic error on the normalized arrays was greatly reduced.
>>>>> The M vs. A curve was flat for most of the 48 print-tips. However,
>>>>> for a few print-tips, M deviates from zero for A > 12; in one case,
>>>>> M rises as high as M = 1/2 at A = 15. This involves only a small
>>>>> fraction of the spots (it is hard to estimate what proportion).
>>>>>
>>>>> Does this sound serious?
>>>>>
>>>>> If so, what should I do about it?
>>>>>
>>>>> Is anyone willing to look at the JPEG file? (I did not attach it
>>>>> because I don't know if I am allowed to do so.)
>>>>>
>>>>> Thanks and best wishes,
>>>>> Rich
>>>>>
>>>>> "Spring, Summer, and Winter.
>>>>> Then Fall came along,
>>>>> and that's the end of our song,
>>>>> and the pigeons never hibernate at all".
>>>>> -Rose Friedman, age 7
>>>>> (These are the correct lyrics and supersede
>>>>> the version previously at the end of my sig)
>>>>>
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>>>>
>>>
>>> Naomi S. Altman                                814-865-3791 (voice)
>>> Associate Professor
>>> Bioinformatics Consulting Center
>>> Dept. of Statistics                              814-863-7114 (fax)
>>> Penn State University                  814-865-1348 (Statistics)
>>> University Park, PA 16802-2111
>
>