[BioC] microarray outlier detection

Devon Ryan dpryan at dpryan.com
Fri Aug 30 23:13:36 CEST 2013


To expound on what Peter Langfelder wrote, some people get into the unwise habit early in their careers of removing what they think are outlier datapoints/samples simply because it makes their data look cleaner. This is a really bad idea because you end up chronically underestimating biological variability, which will inevitably come back to haunt you. I would argue that, regardless of what some statistical test (one the lab person likely doesn't understand) might say, if you can't immediately eyeball a sample as an outlier in a PCA plot or via hierarchical clustering, you probably shouldn't remove it. Try discussing with this person his/her reasons for thinking that there are outliers; it's likely that he/she has simply fallen into this trap.
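
For what it's worth, here is a rough sketch of the kind of eyeballing I mean, using only base R. Everything in it is an assumption for illustration: `expr` is a simulated stand-in for your normalized expression matrix (probesets in rows, samples in columns), and the group labels and sample names are made up to match a 4 x 3 design. Swap in your own matrix and take the plots as a starting point rather than a recipe.

```r
# Hypothetical stand-in for a normalized expression matrix
# (rows = probesets, columns = samples); replace with your own data.
set.seed(1)
n_genes <- 1000
groups  <- factor(rep(c("A", "B", "C", "D"), each = 3))
expr <- matrix(rnorm(n_genes * 12), nrow = n_genes,
               dimnames = list(paste0("g", seq_len(n_genes)),
                               paste0(groups, rep(1:3, 4))))

# Give each simulated group its own block of 50 shifted genes,
# so that there is some group structure to look at.
for (i in seq_along(levels(groups))) {
  rows <- ((i - 1) * 50 + 1):(i * 50)
  cols <- groups == levels(groups)[i]
  expr[rows, cols] <- expr[rows, cols] + 2
}

# PCA on samples: prcomp() expects observations in rows, hence t().
pca <- prcomp(t(expr))
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)  # percent variance per PC

# Hierarchical clustering of samples on 1 - Pearson correlation.
hc <- hclust(as.dist(1 - cor(expr)), method = "average")

# Plot both side by side; a real outlier should be obvious in at least one.
par(mfrow = c(1, 2))
plot(pca$x[, 1], pca$x[, 2], col = as.integer(groups), pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", pct[1]),
     ylab = sprintf("PC2 (%.1f%%)", pct[2]),
     main = "PCA of samples")
text(pca$x[, 1], pca$x[, 2], labels = colnames(expr), pos = 3, cex = 0.7)
plot(hc, main = "Sample clustering", xlab = "", sub = "")
```

If no sample sits visibly apart from everything else in either plot, that is a good reason to keep them all.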

Good luck,
Devon

____________________________________________
Devon Ryan, Ph.D.
Email: dpryan at dpryan.com
Molecular and Cellular Cognition Lab
German Centre for Neurodegenerative Diseases (DZNE)
Ludwig-Erhard-Allee 2
53175 Bonn, Germany

On Aug 30, 2013, at 10:32 PM, guest [guest] wrote:

> 
> Dear users,
> 
> I have Human Gene 2.0 ST arrays: 12 samples in total, in 4 groups with 3 replicates each. The lab person would like to remove one sample from each group as an outlier, but in the PCA plot the samples do not cluster, so it is hard to single out any sample as an outlier. I wonder whether there is a package or function for outlier detection on microarrays.
> 
> Thanks,
> 
> 
> -- output of sessionInfo(): 
> 
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] pd.hugene.2.0.st_3.8.0                oligo_1.24.1                          oligoClasses_1.22.0                   hugene20sttranscriptcluster.db_2.12.1
> [5] org.Hs.eg.db_2.9.0                    RSQLite_0.11.4                        DBI_0.2-7                             AnnotationDbi_1.22.6                 
> [9] Biobase_2.20.1                        BiocGenerics_0.6.0                    limma_3.16.6                         
> 
> loaded via a namespace (and not attached):
> [1] affxparser_1.32.3     affyio_1.28.0         BiocInstaller_1.10.3  Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8      
> [7] ff_2.2-11             foreach_1.4.1         GenomicRanges_1.12.4  IRanges_1.18.2        iterators_1.0.6       preprocessCore_1.22.0
> [13] splines_3.0.1         stats4_3.0.1          zlibbioc_1.6.0       
> 
> --
> Sent via the guest posting facility at bioconductor.org.


