[R] PCA sensitive to outliers?

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Apr 23 07:10:31 CEST 2012


On Mon, Apr 23, 2012 at 12:01 AM, Michael <comtech.usa at gmail.com> wrote:
> Yes, but that is not a good review or survey... thanks

But the packages listed there do have their own documentation and
vignettes. For instance the rrcov package seems to have a nice
vignette about its design as well as methods it implements, and
references to these methods for further reading:

http://cran.r-project.org/web/packages/rrcov/vignettes/rrcov.pdf

You'll see at least a few mentions of PCA, which will lead you to
other packages, papers, etc.
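
As a rough illustration of the outlier problem from the original
question, here is a small sketch. It is not taken from the rrcov
vignette: it assumes only the stats and MASS packages that ship with R,
and the simulated data and the helper align() are made up for the
example. It compares classical PCA with PCA on a robust covariance
estimate:

```r
# Sketch only: simulated data, not from the thread.
set.seed(1)
n <- 100
t <- rnorm(n, sd = 3)    # strong signal along the (1, 1) direction
e <- rnorm(n, sd = 0.3)  # small noise along the (1, -1) direction
clean <- cbind(x1 = (t + e) / sqrt(2), x2 = (t - e) / sqrt(2))

# Five gross outliers far away along the (1, -1) direction
out <- cbind(x1 = 20 + rnorm(5), x2 = -20 + rnorm(5))
x <- rbind(clean, out)

# Classical PCA
classical <- prcomp(x)

# PCA on a robust (MCD) covariance estimate, via princomp's covmat argument
robust <- princomp(x, covmat = MASS::cov.rob(x, method = "mcd"))

# Hypothetical helper: |cosine| between a loading vector and (1, 1)/sqrt(2)
align <- function(v) abs(sum(v * c(1, 1) / sqrt(2)))

align(classical$rotation[, 1])  # first classical component
align(robust$loadings[, 1])     # first robust component
```

A value near 1 means the first component follows the bulk of the data,
and a value near 0 means it follows the outliers. With this setup the
classical fit should be pulled toward the five outliers while the robust
fit should not. The dedicated functions in rrcov are the more complete
route; this only shows the idea.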

Enjoy,

-steve

>
> On Sun, Apr 22, 2012 at 9:47 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>
>> As I believe I already told you, look at the CRAN Robust task view.
>>
>> -- Bert
>>
>> On Sun, Apr 22, 2012 at 6:29 PM, Michael <comtech.usa at gmail.com> wrote:
>> > Even in R, there are so many "robust PCA" methods... is there any
>> > survey or review of all these different methods?
>> >
>> > On Sun, Apr 22, 2012 at 6:58 PM, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>> >
>> >> On Sun, Apr 22, 2012 at 4:43 PM, Michael <comtech.usa at gmail.com> wrote:
>> >> > I actually tried "robustPca" in "pcaMethods" on bioconductor.
>> >> >
>> >> > It keeps giving me the warning "Input data is not complete"...
>> >> >
>> >> > Reading into the function:
>> >> >
>> >> > When there are no NAs, it will give this warning...
>> >> >
>> >> > It seems that there is a bug in this code...
>> >> >
>> >> > Is it reliable at all?
>> >> >
>> >> > ---------------------
>> >> >
>> >> >
>> >> > > robustPca
>> >> > function (Matrix, nPcs = 2, verbose = interactive(), ...)
>> >> > {
>> >> >    nas <- is.na(Matrix)
>> >> >    if (!any(nas) & verbose) {
>> >> >        cat("Input data is not complete.\n")
>> >> >        cat("Scores, R2 and R2cum may be inaccurate, handle with care\n")
>> >> >    }
>> >>
>> >> That seems to issue the notes when there are *not any missing* values
>> >> and verbose is TRUE.  I would submit a bug report to the author.
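
To make the point above concrete, here is a minimal sketch in base R.
The two helper functions are hypothetical and written only for
illustration; they are not the pcaMethods source. The second one assumes
the intent was to print the note only when NAs are present, which is a
guess from the message text rather than something confirmed by the
package author:

```r
# Mirrors the condition quoted above: TRUE means the note would be printed.
note_as_quoted <- function(Matrix, verbose = TRUE) {
  nas <- is.na(Matrix)
  !any(nas) & verbose
}

# Assumed intent: print the note only when something is actually missing.
note_if_missing <- function(Matrix, verbose = TRUE) {
  nas <- is.na(Matrix)
  any(nas) && verbose
}

complete <- matrix(1:6, nrow = 2)
with_na <- complete
with_na[1, 1] <- NA

note_as_quoted(complete)   # TRUE: note printed although nothing is missing
note_as_quoted(with_na)    # FALSE: no note although a value is missing
note_if_missing(complete)  # FALSE
note_if_missing(with_na)   # TRUE
```

If that reading is right, the note on complete data would be a matter of
the check being inverted and would not by itself say anything about the
PCA results.
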
>> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Fri, Apr 20, 2012 at 9:58 AM, Kevin Wright <kw.stat at gmail.com> wrote:
>> >> >
>> >> >> You can also have a look at the pcaMethods package on Bioconductor.
>> >> >>
>> >> >> Kevin
>> >> >>
>> >> >>
>> >> >> On Thu, Apr 19, 2012 at 11:20 PM, Michael <comtech.usa at gmail.com> wrote:
>> >> >>
>> >> >>>  Hi all,
>> >> >>>
>> >> >>> I found that the PCA gave chaotic results when there are big
>> >> >>> changes in a few data points.
>> >> >>>
>> >> >>> Are there "improved" versions of PCA in R that can help with this
>> >> >>> problem?
>> >> >>>
>> >> >>> Please give me some pointers...
>> >> >>>
>> >> >>> Thank you!
>> >> >>>
>> >> >>>        [[alternative HTML version deleted]]
>> >> >>>
>> >> >>> ______________________________________________
>> >> >>> R-help at r-project.org mailing list
>> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> >>> and provide commented, minimal, self-contained, reproducible code.
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Kevin Wright
>> >> >>
>> >> >>
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Joshua Wiley
>> >> Ph.D. Student, Health Psychology
>> >> Programmer Analyst II, Statistical Consulting Group
>> >> University of California, Los Angeles
>> >> https://joshuawiley.com/
>> >>
>> >
>>
>>
>>
>> --
>>
>> Bert Gunter
>> Genentech Nonclinical Biostatistics
>>
>> Internal Contact Info:
>> Phone: 467-7374
>> Website:
>>
>> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>
>



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


