[BioC] outlier removal from gene chip

fhong at salk.edu fhong at salk.edu
Tue Sep 19 20:20:26 CEST 2006


Dear Weiwei,
The definition of an outlier is not clear-cut, and no data point should be
treated as an outlier unless there is a reason to believe it is one. A simple
way to detect candidates is the 1.5*IQR criterion, for which you can write
your own code (one or two lines). Let me know if you come across any other
methods for detecting outliers.
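
As a minimal sketch of the 1.5*IQR rule mentioned above, something like the
following should work in base R. The function name flag_outliers and its
argument k are only illustrative (they do not come from any Bioconductor
package), and the test vector is assembled from the V57 values quoted below
plus the minimum reported in the summary:

```r
# Flag values outside [Q1 - k*IQR, Q3 + k*IQR].
# flag_outliers and k are hypothetical names used for illustration only.
flag_outliers <- function(x, k = 1.5) {
  q <- unname(quantile(x, c(0.25, 0.75), na.rm = TRUE))
  fence <- k * (q[2] - q[1])          # k times the interquartile range
  x < q[1] - fence | x > q[2] + fence
}

# V57 from the ten rows quoted in the original message, plus the
# minimum (-14086.750) reported in the summary output.
v57 <- c(1.935484, -1.831978, 3.921053, -6.807692, -1.033708,
         -9.382979, 4.878788, -6.020833, -1.178947, -1.172058,
         -14086.750)

out <- flag_outliers(v57)
v57[out]                              # values flagged by the rule
cleaned <- replace(v57, out, NA)      # mask flagged values rather than drop them
```

Setting flagged values to NA instead of deleting them keeps the rows aligned
with the probe ids. Whether a flagged value should really be discarded is
still a judgement call, as noted above.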

Fangxin


> dear listers:
>
> I have a question on whether bioconductor has some tool-kit to detect
> outliers and remove them.
>
> my original dataset looks like this:
>             V1       V51       V53        V55       V57
> 1   -493249600  1.459459 -3.069444  -1.300000  1.935484
> 2  -1613096495 -1.139269 -5.525281 -16.592593 -1.831978
> 3   1626196571 -3.500000 -1.011662   2.223881  3.921053
> 4  -1397009217 -3.571429  1.685714  -1.180297 -6.807692
> 5   1428659728 -1.405405 -1.469004  -4.779754 -1.033708
> 6    459853658 -2.158879 -7.510823  -1.085581 -9.382979
> 7    530182506 -1.431677 -1.336343  -3.126437  4.878788
> 8   1173842263  1.215385  1.856410  -2.059794 -6.020833
> 9        28847  2.407895 -2.048889  -1.730337 -1.178947
> 10 -1961875610  2.864159 -2.301234  -4.733264 -1.172058
>
> V1: internal probe id
> The remaining columns are different samples. The cells are fold-changes
> of disease/normal.
>
> summary of the sample columns( V51, ... V57) gives the following:
>       V51                V53                 V55                V57
>  Min.   :-482.000   Min.   : -55.7342   Min.   :-122.074   Min.   :-14086.750
>  1st Qu.:  -2.159   1st Qu.:  -1.7312   1st Qu.:  -2.125   1st Qu.:    -1.831
>  Median :  -1.199   Median :  -1.0416   Median :  -1.200   Median :    -1.080
>  Mean   :  -0.918   Mean   :   0.1662   Mean   :  -1.027   Mean   :    -1.874
>  3rd Qu.:   1.441   3rd Qu.:   1.5721   3rd Qu.:   1.419   3rd Qu.:     1.521
>  Max.   : 198.434   Max.   :1478.1639   Max.   :  95.768   Max.   :   683.519
>
>
> My question is: is there any package that can detect those outliers
> (like -14086.750), remove them, and compute an "average" for each gene
> (instead of each probe)?
>
> Thank you.
>
> Weiwei
>
> --
> Weiwei Shi, Ph.D
> Research Scientist
> GeneGO, Inc.
>
> "Did you always know?"
> "No, I did not. But I believed..."
> ---Matrix III
>


--------------------
Fangxin Hong  Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong at salk.edu
(Phone): 858-453-4100 ext 1105


