[BioC] classification issues - normalization and standardization

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Jul 19 17:53:08 CEST 2011


Hi Theresa,

On Tue, Jul 19, 2011 at 3:47 AM, Theresa Brandt
<theresabrandt80 at gmail.com> wrote:
> Hi Steve,
>  Thank you very much for your help. Now it is clear to me. I can do the
> array normalization (like rma) on the whole data set. Then I have to split
> the dataset and I can do things like filtering of genes or gene
> standardization only on a training set.
>  I was confused after reading a book "Bioconductor Case Studies". In the
> chapter about supervised machine learning they performed non-specific gene
> filtering and gene standardization on the whole dataset. But I would rather
> trust that you are right.

I wouldn't trust that I am right ... the people who wrote that book
have some serious credentials. :-)

There are arguably lots of things you can do to all of your data --
especially if you do not use the labels on your data as part of your
preprocessing. I was just suggesting what I might do in your
situation, that's all. I have never read the book you mentioned, but
judging by the folks who wrote it, I would imagine that what they are
doing in that particular scenario is also valid.
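
For concreteness, here is a minimal sketch of the conservative route
described in the quoted message: filter and standardize genes using
the training samples only, then apply those same choices to the
held-out samples. It is written in plain Python rather than R, it is
not taken from the book, and the function names, the toy matrix, and
the choice of a variance filter are all assumptions made for
illustration. Neither step looks at class labels.

```python
import statistics

def top_variance_genes(train, k):
    """Non-specific filter: indices of the k genes with the highest
    variance across the training samples. Class labels are never used."""
    genes = list(zip(*train))  # transpose: one tuple per gene
    variances = [statistics.variance(g) for g in genes]
    order = sorted(range(len(genes)), key=lambda j: variances[j], reverse=True)
    return sorted(order[:k])

def fit_gene_scaler(train):
    """Per-gene mean and standard deviation, estimated on training samples only."""
    genes = list(zip(*train))
    means = [statistics.fmean(g) for g in genes]
    sds = [statistics.stdev(g) or 1.0 for g in genes]  # avoid dividing by zero
    return means, sds

def apply_gene_scaler(samples, means, sds):
    """Standardize any samples with previously fitted means and sds."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in samples]

# Hypothetical toy matrix: rows are arrays (samples), columns are genes,
# assumed to be already normalized across the whole data set (e.g. by rma).
expr = [
    [1.0, 5.0, 2.0],
    [3.0, 5.1, 8.0],
    [5.0, 4.9, 5.0],
    [9.0, 5.0, 4.0],
]
train, test = expr[:3], expr[3:]

# Choose genes and scaling parameters from the training samples only ...
keep = top_variance_genes(train, k=2)
train_f = [[row[j] for j in keep] for row in train]
test_f = [[row[j] for j in keep] for row in test]
means, sds = fit_gene_scaler(train_f)

# ... then apply the same genes and parameters to both sets.
train_z = apply_gene_scaler(train_f, means, sds)
test_z = apply_gene_scaler(test_f, means, sds)
```

In a cross-validation setting, the same two fitting steps would be
repeated inside each fold, so that the held-out samples never
influence which genes are kept or how they are scaled.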

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


