[BioC] outlier removal from gene chip

Weiwei Shi helprhelp at gmail.com
Tue Sep 19 22:35:36 CEST 2006


Thanks for all the suggestions here.

I will proceed without removing those "outliers" for now and will post
updated results if necessary.
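
For reference, here is a minimal sketch of the 1.5 IQR criterion that
Fangxin describes below. It is written in Python purely for illustration
and is not taken from any Bioconductor package. The function name
iqr_outlier_flags, the multiplier argument k, and the choice to append the
reported minimum of V57 to the ten quoted rows are all assumptions made for
the example. It only flags values; whether to remove them is still the
judgment call discussed below.

```python
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Return one boolean per value: True if it lies outside the Tukey fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR (hypothetical helper, k=1.5
    matches the "1.5 IQR" rule mentioned in the thread).
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Column V57 from the ten rows quoted below, plus the reported minimum
# (-14086.750) appended as an assumed extra row for illustration.
v57 = [1.935484, -1.831978, 3.921053, -6.807692, -1.033708,
       -9.382979, 4.878788, -6.020833, -1.178947, -1.172058,
       -14086.750]

flags = iqr_outlier_flags(v57)
flagged = [v for v, is_out in zip(v57, flags) if is_out]
print(flagged)
```

On these eleven values only -14086.750 falls outside the fences. The rule
is applied per sample column here; applying it per probe or per gene
instead is an equally arbitrary choice that depends on the design.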

On 9/19/06, Kasper Daniel Hansen <khansen at stat.berkeley.edu> wrote:
>
> On Sep 19, 2006, at 12:18 PM, Weiwei Shi wrote:
>
> > My current approach uses mahalanobis() distance.
> >
> > To Sean:
> > do you think that example (-14k) is OK?
>
> That example could be a case of the gene being expressed in one
> condition and not being expressed in another. I do not remember where
> the data are from (or if you have even described that) or platform
> or ..., but I would agree with Sean and say that you do not want to
> blindly remove the genes. Note that we are not advising that you
> shouldn't remove the gene, just that you should take a careful look
> at the data and try to decide what to do.
>
> As Fangxin clearly writes, it is hard to really know what is an outlier.
>
> Kasper
>
>
> >
> > On 9/19/06, fhong at salk.edu <fhong at salk.edu> wrote:
> >> Dear Weiwei,
> >> The definition of an outlier is not clear, and no data point should be
> >> treated as an outlier unless there is reason to believe it is one. A
> >> simple way to detect outliers is the 1.5 x IQR criterion (flag values
> >> more than 1.5 IQR below the first quartile or above the third), for
> >> which you can write your own code (one or two lines). Let me know if
> >> there are other methods for detecting outliers.
> >>
> >> Fangxin
> >>
> >>
> >>> Dear listers:
> >>>
> >>> I have a question about whether Bioconductor has a toolkit to detect
> >>> outliers and remove them.
> >>>
> >>> My original dataset looks like this:
> >>>             V1       V51       V53        V55       V57
> >>> 1   -493249600  1.459459 -3.069444  -1.300000  1.935484
> >>> 2  -1613096495 -1.139269 -5.525281 -16.592593 -1.831978
> >>> 3   1626196571 -3.500000 -1.011662   2.223881  3.921053
> >>> 4  -1397009217 -3.571429  1.685714  -1.180297 -6.807692
> >>> 5   1428659728 -1.405405 -1.469004  -4.779754 -1.033708
> >>> 6    459853658 -2.158879 -7.510823  -1.085581 -9.382979
> >>> 7    530182506 -1.431677 -1.336343  -3.126437  4.878788
> >>> 8   1173842263  1.215385  1.856410  -2.059794 -6.020833
> >>> 9        28847  2.407895 -2.048889  -1.730337 -1.178947
> >>> 10 -1961875610  2.864159 -2.301234  -4.733264 -1.172058
> >>>
> >>> V1: internal probe ID
> >>> The remaining columns are different samples. The cells are fold changes
> >>> of disease/normal.
> >>>
> >>> A summary of the sample columns (V51, ..., V57) gives the following:
> >>>       V51                V53                 V55                V57
> >>>  Min.   :-482.000   Min.   : -55.7342   Min.   :-122.074   Min.   :-14086.750
> >>>  1st Qu.:  -2.159   1st Qu.:  -1.7312   1st Qu.:  -2.125   1st Qu.:    -1.831
> >>>  Median :  -1.199   Median :  -1.0416   Median :  -1.200   Median :    -1.080
> >>>  Mean   :  -0.918   Mean   :   0.1662   Mean   :  -1.027   Mean   :    -1.874
> >>>  3rd Qu.:   1.441   3rd Qu.:   1.5721   3rd Qu.:   1.419   3rd Qu.:     1.521
> >>>  Max.   : 198.434   Max.   :1478.1639   Max.   :  95.768   Max.   :   683.519
> >>>
> >>>
> >>> My question is: is there a package that can detect those outliers
> >>> (like -14086.750), remove them, and compute an "average" for each gene
> >>> (instead of each probe)?
> >>>
> >>> Thank you.
> >>>
> >>> Weiwei
> >>>
> >>> --
> >>> Weiwei Shi, Ph.D
> >>> Research Scientist
> >>> GeneGO, Inc.
> >>>
> >>> "Did you always know?"
> >>> "No, I did not. But I believed..."
> >>> ---Matrix III
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor at stat.math.ethz.ch
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives:
> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>>
> >>>
> >>
> >>
> >> --------------------
> >> Fangxin Hong  Ph.D.
> >> Plant Biology Laboratory
> >> The Salk Institute
> >> 10010 N. Torrey Pines Rd.
> >> La Jolla, CA 92037
> >> E-mail: fhong at salk.edu
> >> (Phone): 858-453-4100 ext 1105
> >>
> >>
> >
> >
>
>


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


