[BioC] objective criterion for identification of outlying arrays by pca

Richard Friedman friedman at cancercenter.columbia.edu
Fri Nov 4 14:48:53 CET 2011


Dear Kevin and List,

	I read your paper with great interest, but the method described there
seems to be implemented mainly in Matlab. I am not a Matlab user. Is there
a user-friendly R version that requires no more R scripting from the user
than is typical of most Bioconductor packages?

Thanks and best wishes,
Rich
------------------------------------------------------------
Richard A. Friedman, PhD
Associate Research Scientist,
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Lecturer,
Department of Biomedical Informatics (DBMI)
Educational Coordinator,
Center for Computational Biology and Bioinformatics (C2B2)/
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Room 824
Irving Cancer Research Center
Columbia University
1130 St. Nicholas Ave
New York, NY 10032
(212)851-4765 (voice)
friedman at cancercenter.columbia.edu
http://cancercenter.columbia.edu/~friedman/

I am a Bayesian. When I see a multiple-choice question on a test and I
don't know the answer I say "eeney-meaney-miney-moe".

Rose Friedman, Age 14

On Nov 2, 2011, at 11:12 AM, Kevin R. Coombes wrote:

> The squared Mahalanobis distance (also known as Hotelling's T^2
> statistic) from the center of a D-dimensional principal component
> space (under some sensible null hypothesis) should follow a
> chi-squared distribution with D degrees of freedom.  You can thus
> test for outliers using the p-value associated with the chi-squared
> statistic.  (We used this idea for QC in a serum proteomics study a
> long time ago: Coombes et al., Clin Chem 2003; 49:1615-23.)
>
>    Kevin
>
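
The chi-squared test Kevin describes can be sketched in a few lines. What
follows is a minimal illustration in standard-library Python, not the code
from the cited paper and not an R or Bioconductor function. It assumes the
principal component scores have already been computed elsewhere (one row per
array, one column per retained component, each component with nonzero
variance) and that the scores are uncorrelated, so the squared Mahalanobis
distance reduces to a sum of squared standardized scores. The names
chi2_sf and pc_outlier_pvalues, the example scores, and the 0.05 cutoff are
all hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: df = 2 gives exp(-y), df = 1 gives erfc(sqrt(y)).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def pc_outlier_pvalues(scores):
    """Return (squared distance, p-value) for each row of PC scores.

    Assumes the columns are uncorrelated PC scores, so the covariance
    matrix is treated as diagonal.
    """
    n = len(scores)
    d = len(scores[0])
    means = [sum(row[j] for row in scores) / n for j in range(d)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in scores) / (n - 1)
        for j in range(d)
    ]
    results = []
    for row in scores:
        t2 = sum((row[j] - means[j]) ** 2 / variances[j] for j in range(d))
        results.append((t2, chi2_sf(t2, d)))
    return results


# Made-up scores on the first two PCs for ten arrays; the last one is extreme.
scores = [
    (1.0, 0.5), (-1.2, 0.3), (0.8, -0.7), (-0.5, -0.4), (0.3, 0.9),
    (-0.9, 0.6), (1.1, -0.2), (-0.4, -0.8), (0.6, 0.2), (9.0, -0.4),
]
results = pc_outlier_pvalues(scores)

# Indices of arrays whose p-value falls below the (arbitrary) 0.05 cutoff.
flagged = [i for i, (t2, p) in enumerate(results) if p < 0.05]
```

With the variances estimated from the same arrays being tested, a single
array's squared distance cannot exceed (n - 1)^2 / n, so with very few arrays
no array can reach a small p-value and the chi-squared reference
distribution is only approximate.
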
> On 11/2/2011 9:11 AM, James W. MacDonald wrote:
>> Hi Rich,
>>
>> On 11/2/2011 10:04 AM, Richard Friedman wrote:
>>> Dear Bioconductor List,
>>>
>>>    Does anyone know of an objective criterion for identifying
>>> outlying arrays by PCA?
>>
>> I don't know an objective criterion for this. However, unless the  
>> 'outlier' is ridiculously bad, you might be better off using array  
>> weights to down-weight the offending array(s). In limma, the  
>> arrayWeights() and arrayWeightsSimple() functions allow you to  
>> generate weights that you can then feed into lmFit().
>>
>> Best,
>>
>> Jim
>>
>>
>>>
>>>    I usually do this subjectively. However, the experimental
>>> investigator whom I am helping has a different subjective sense
>>> than I do, so I wonder whether there is a hard-and-fast criterion.
>>>
>>> Thanks and best wishes,
>>> Rich
>>