[BioC] DNA micro-array normalization

Wed Feb 17 17:42:19 CET 2010

Hi Avehna,

Please don't take conversations off list. The list is considered to be a 
resource that people in your situation can use in the future to answer 
questions themselves.

avehna wrote:
> On Tue, Feb 16, 2010 at 3:50 PM, James W. MacDonald 
> <jmacdon at med.umich.edu> wrote:
> 
>     To add to this: these data are almost certainly MAS5-processed, as
>     I don't know of any other algorithm that gives a detection
>     p-value. In addition, the range of 0 - 9000 indicates that these
>     data are not logged (log-transforming them is the next step for
>     you). People normally use log base 2 so that a difference of 1 or
>     -1 indicates two-fold up- or down-regulation.
> 
> 
> OK. But in this case, what would be the reference point? Wouldn't the 
> up- or down-regulation be relative to the control? Before writing to the 
> list I browsed several tutorials, and I'm still missing this part. Should 
> it be log2(treatment/control)? (It's not clear from what I have read.)

Yes. Or, since you will already have taken logs, it is 
mean(log2(treatment)) - mean(log2(control)), which you will notice is the 
numerator of your t-statistic.
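To make the arithmetic concrete, here is a minimal Python sketch (Python is used only for illustration; the thread itself is about R/Bioconductor). It computes the difference of the group means of the log2 signals and shows that the same quantity is the numerator of a two-sample t-statistic. Assumptions: all signals are strictly positive, the replicate values are made up, and the unpooled (Welch) denominator is one choice among several, since the thread only says "t-statistic".

```python
import math
import statistics


def log2_fold_change(treatment, control):
    """Mean log2 signal of treatment minus mean log2 signal of control.

    Assumes all signals are strictly positive (log2 of 0 is undefined).
    """
    t_logs = [math.log2(x) for x in treatment]
    c_logs = [math.log2(x) for x in control]
    return statistics.mean(t_logs) - statistics.mean(c_logs)


def welch_t(treatment, control):
    """Welch two-sample t-statistic on log2 signals.

    The numerator is exactly log2_fold_change(); the denominator is the
    unpooled standard error (an assumption; a pooled-variance t also works).
    """
    t_logs = [math.log2(x) for x in treatment]
    c_logs = [math.log2(x) for x in control]
    se = math.sqrt(statistics.variance(t_logs) / len(t_logs)
                   + statistics.variance(c_logs) / len(c_logs))
    return log2_fold_change(treatment, control) / se


# Made-up signals for one probe set, 3 replicates per group.
control = [200.0, 220.0, 180.0]
treatment = [400.0, 440.0, 360.0]

lfc = log2_fold_change(treatment, control)
print(round(lfc, 6))  # 1.0, i.e. about two-fold up in treatment
print(welch_t(treatment, control))
```

A value of 1 means two-fold up and -1 two-fold down. For p-values you would still need the t distribution, for example R's t.test() (which uses the Welch form by default) or scipy.stats.ttest_ind(..., equal_var=False) in Python.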

> 
> 
>     MAS5 data are normalized after the fact, so you should log-transform
>     them and then look at plots of the densities to see whether they
>     appear to have been normalized. The default is a scale
>     normalization, so you are just looking for the densities to be in
>     the same general vicinity rather than overlaying each other.
> 
> 
> OK. Could you send me some helpful references about that?

http://media.affymetrix.com/support/technical/whitepapers/sadd_whitepaper.pdf
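As a rough illustration of what "scale normalization" means, here is a minimal Python sketch, assuming the MAS5 approach described in the Affymetrix document linked above: each array is multiplied by a single factor so that its trimmed mean (lowest and highest 2% of signals dropped) equals a target intensity. The target of 500 matches the default `sc = 500` of `mas5()` in the affy package, but check the settings actually used for your data. The per-array median of the log2 signals stands in here for a density plot, and the arrays are made up.

```python
import math
import statistics


def trimmed_mean(values, trim=0.02):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    return statistics.mean(ordered[k:len(ordered) - k])


def scale_normalize(arrays, target=500.0, trim=0.02):
    """Rescale each array by one factor so its trimmed mean equals `target`."""
    result = []
    for signals in arrays:
        factor = target / trimmed_mean(signals, trim)
        result.append([s * factor for s in signals])
    return result


def log2_medians(arrays, floor=1.0):
    """Median log2 signal per array, a crude numeric check in place of density plots.

    Signals below `floor` (e.g. the zeros in a 0-9000 range) are raised to it
    before taking log2; the floor value is an arbitrary choice.
    """
    return [statistics.median(math.log2(max(s, floor)) for s in signals)
            for signals in arrays]


# Made-up example: array_b is uniformly twice as bright as array_a.
array_a = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
array_b = [0.0, 200.0, 400.0, 600.0, 800.0, 1000.0]

before = log2_medians([array_a, array_b])
normalized = scale_normalize([array_a, array_b])
after = log2_medians(normalized)
print(before[1] - before[0])  # about 1: a two-fold offset between arrays
print(after[1] - after[0])    # 0.0 for this toy example
```

Because MAS5 applies one multiplicative factor per array, real scaled arrays will line up in location but not necessarily in shape, which is why the densities only need to sit in the same general vicinity.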

Best,

Jim

>  
> 
> 
>     If you could get the original CEL files, you would be much better off.
> 
> 
> I will try!
> 
> Best, and thank you so much for your help,
> 
> Avhena.
> 
>     Best,
> 
>     Jim
> 
> 
> 
> 
>     michael watson (IAH-C) wrote:
> 
>         This is definitely processed data, and without access to the
>         original data or a description of the analysis methodology, your
>         options are limited.
> 
>         Personally, I'd do a test for normality on the "Signal" values,
>         and if they turn out to be normal, I'd run a simple t-test
>         (control vs treatment) on each gene and correct the p-values for
>         multiple testing.
> 
>         Simple stuff, but it should work.
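Michael doesn't say which multiple-testing correction to use. As one common choice (an assumption here), this minimal Python sketch applies the Benjamini-Hochberg false discovery rate adjustment, following the same rule as R's `p.adjust(p, method = "BH")`. The per-gene p-values would come from the t-tests (for example R's t.test() or scipy.stats.ttest_ind); the values below are made up, and the normality check is not shown.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is multiplied by m / rank, then made monotone by taking a
    running minimum from the largest p-value downwards, capped at 1.
    """
    m = len(pvalues)
    # Indices sorted from the largest p-value to the smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of p
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up per-gene t-test p-values.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Genes would then be called differentially expressed if their adjusted p-value falls below the chosen FDR (0.05 is a common, but arbitrary, cut-off).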
>         ________________________________________
>         From: bioconductor-bounces at stat.math.ethz.ch
>         [bioconductor-bounces at stat.math.ethz.ch] On Behalf Of
>         avehna [avhena at gmail.com]
>         Sent: 16 February 2010 19:47
>         To: bioconductor at stat.math.ethz.ch
>         Subject: [BioC] DNA micro-array normalization
> 
>         Hi There,
> 
>         I've got a DNA microarray dataset that looks like this:
> 
>         Probe            Signal  Detection  Detection_p-value  Descriptions
>         AFFX-BioB-5_at   181     P          0.00011            "E. coli  GEN=bioB  DB_XREF=gb:J04423.1"
>         AFFX-BioB-M_at   227.3   P          0.000044           "E. coli  GEN=bioB  DB_XREF=gb:J04423.1"
>         AFFX-BioC-5_at   499.2   P          0.000052           "E. coli  GEN=bioC  DB_XREF=gb:J04423.1"
> 
>         I have control and treatment groups with 3 replicates each.
> 
>         But I'm not sure whether these data have already been
>         normalized. Also, this is not the typical Affymetrix
>         format...
> 
>         Could you help me in this regard? What is the typical signal
>         range for raw Affymetrix data? (These data range from 0 to 9000.)
> 
>         If the data have already been normalized, can I calculate the
>         means (for treatment and control) and then the differential
>         expression of genes without taking the "Detection" column
>         into account?
> 
>         (I guess I will need to build my ExpressionSet from scratch)
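Building the expression matrix from such exports amounts to reading one table per sample and lining the Signal columns up by probe ID. The minimal Python sketch below assumes each sample was exported as a separate tab-delimited file with the column names shown in the message (Probe, Signal, Detection, Detection_p-value, Descriptions); the function names are hypothetical, the first sample reuses rows from the message, and the second sample's values are made up.

```python
import csv
import io


def read_mas5_table(handle):
    """Map probe ID -> (Signal, Detection call) for one sample.

    Assumes a tab-delimited export whose header uses the column names
    Probe, Signal, Detection, Detection_p-value, Descriptions.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Probe"]: (float(row["Signal"]), row["Detection"]) for row in reader}


def build_matrix(samples):
    """Combine {sample name: file handle} into (sample names, {probe: [signals]}).

    Only probes present in every sample are kept; columns follow the order
    of the `samples` dict.
    """
    tables = {name: read_mas5_table(handle) for name, handle in samples.items()}
    names = list(tables)
    shared = set.intersection(*(set(t) for t in tables.values()))
    matrix = {p: [tables[n][p][0] for n in names] for p in sorted(shared)}
    return names, matrix


HEADER = "Probe\tSignal\tDetection\tDetection_p-value\tDescriptions\n"
# First sample: rows copied from the message; second sample: made-up values.
control_1 = io.StringIO(
    HEADER
    + 'AFFX-BioB-5_at\t181\tP\t0.00011\t"E. coli  GEN=bioB  DB_XREF=gb:J04423.1"\n'
    + 'AFFX-BioC-5_at\t499.2\tP\t0.000052\t"E. coli  GEN=bioC  DB_XREF=gb:J04423.1"\n'
)
treatment_1 = io.StringIO(
    HEADER
    + 'AFFX-BioB-5_at\t350\tP\t0.0001\t"E. coli  GEN=bioB  DB_XREF=gb:J04423.1"\n'
    + 'AFFX-BioC-5_at\t510\tP\t0.00005\t"E. coli  GEN=bioC  DB_XREF=gb:J04423.1"\n'
)

names, matrix = build_matrix({"control_1": control_1, "treatment_1": treatment_1})
print(names)
print(matrix)
```

The result is a probe-by-sample table of Signal values, which is the shape of data an ExpressionSet's expression matrix holds; the Detection calls are kept by read_mas5_table in case you later decide to filter on them.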
> 
>         Thanks a lot (I'm a newbie in Bioconductor and microarray
>         analysis). I would appreciate your help!
> 
>         Avhena
> 
> 
>         _______________________________________________
>         Bioconductor mailing list
>         Bioconductor at stat.math.ethz.ch
>         <mailto:Bioconductor at stat.math.ethz.ch>
>         https://stat.ethz.ch/mailman/listinfo/bioconductor
>         Search the archives:
>         http://news.gmane.org/gmane.science.biology.informatics.conductor
> 
> 
> 
>     -- 
>     James W. MacDonald, M.S.
>     Biostatistician
>     Douglas Lab
>     University of Michigan
>     Department of Human Genetics
>     5912 Buhl
>     1241 E. Catherine St.
>     Ann Arbor MI 48109-5618
>     734-615-7826
>     **********************************************************
>     Electronic Mail is not secure, may not be read every day, and should
>     not be used for urgent or sensitive issues
> 
> 
