[BioC] Problems normalizing scanarray express data with limma

Gordon K Smyth smyth at wehi.EDU.AU
Thu Jan 12 05:52:13 CET 2012


Dear Matthew,

This question hasn't been asked for many years!  It used to be quite a 
common question, see for example:

https://stat.ethz.ch/pipermail/bioconductor/2005-July/009886.html

The problem is not that you have an extra row, but rather that you have 
too few rows.  Your arrays have 16 blocks (4 x 4) with 6 rows and 14 
columns of spots in each block.  So limma assumes your arrays have 
4x4x6x14 = 1344 spots, but your files actually contain only 1152 rows of 
data.  The reason is almost certainly that a number of empty spots have 
been removed from the files.
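
The mismatch can be checked with a few lines of base R.  This is only a 
sketch: the list below is typed in by hand from the $printer values shown 
in the quoted session, and the names expected_spots, rows_in_file and 
missing_spots are made up for illustration.

```r
# Printer layout as reported in the quoted session (typed in by hand here).
printer <- list(ngrid.r = 4, ngrid.c = 4, nspot.r = 6, nspot.c = 14)

# Number of spots implied by the layout: blocks x spots per block.
expected_spots <- prod(unlist(printer))

# Number of data rows actually read from each file (5 shown + 1147 more).
rows_in_file <- 1152

# Spots that the layout expects but the files do not contain.
missing_spots <- expected_spots - rows_in_file

print(expected_spots)   # 1344
print(missing_spots)    # 192
```

In a live session the same comparison could presumably be made between the 
product of the layout values and nrow(RG$R).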

One easy workaround is simply to do global loess instead of 
print-tip-loess normalization:

   MA <- normalizeWithinArrays(RG, method="loess")

Another workaround is to construct a block index variable, numbering the 
16 blocks from 1 to 16:

   block <- 4*(RG$genes[,"Array Row"]-1) + RG$genes[,"Array Column"]

and then to use the solution that I suggested back in July 2005 (the post 
linked above).
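
To see what the block index looks like, here is a small self-contained 
sketch in base R.  The data frame is a toy stand-in for RG$genes, not the 
real annotation: it builds a full 4 x 4 x 6 x 14 layout and then drops 12 
spots from each block (an arbitrary choice made only so that 1152 rows 
remain).  It does not reproduce the July 2005 solution itself.

```r
# Toy stand-in for RG$genes: every spot of a 4 x 4 grid of 6 x 14 blocks.
genes <- expand.grid(1:14, 1:6, 1:4, 1:4)
names(genes) <- c("Spot Column", "Spot Row", "Array Column", "Array Row")

# Pretend some spots were removed from the files: here, 12 per block.
drop <- genes[, "Spot Row"] == 6 & genes[, "Spot Column"] > 2
genes <- genes[!drop, ]

# Block index: blocks are numbered 1..16, row by row across the array.
block <- 4 * (genes[, "Array Row"] - 1) + genes[, "Array Column"]

# Count the spots that remain in each block.
spots_per_block <- table(block)
print(nrow(genes))        # 1152
print(spots_per_block)    # 72 spots in each of the 16 blocks
```

With real data the counts per block need not be equal, which is exactly 
why the regular printer layout cannot be used directly.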


As for deleting the 74 lines of headers and so forth, have you tried 
simply running

   RG <- read.maimages(targets, source="scanarrayexpress", sep=",")

on your original unedited files?  The whole reason for having a 
"scanarrayexpress" method for read.maimages() is that it takes care of all 
the editing and reading for you.
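
For the general question of how to tell R that the data start after a 
number of header lines, the following base R sketch may help.  It is not 
how read.maimages() is implemented, and the toy file layout is 
hypothetical rather than the real ScanArray Express format; the names 
path, header_at and spots are made up for illustration.

```r
# Write a toy file: a few scanner header lines, then a CSV table.
# This layout is hypothetical, not the real ScanArray Express format.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Scanner info line 1",
  "Scanner info line 2",
  "Array Row,Array Column,Ch1 Median,Ch2 Median",
  "1,1,102,120",
  "1,2,100,154"
), path)

# Find the line holding the column names instead of counting by hand.
lines <- readLines(path)
header_at <- grep("^Array Row,", lines)[1]

# Skip everything before that line; keep column names with spaces intact.
spots <- read.csv(path, skip = header_at - 1, check.names = FALSE)
print(spots)
```

If the number of header lines is already known, passing it directly as 
skip (for example skip = 74) does the same job without the grep() step.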

Best wishes
Gordon


> Date: Tue, 10 Jan 2012 14:34:53 -0500
> From: Matthew Ouellette <ouellet5 at uwindsor.ca>
> To: bioconductor at r-project.org
> Subject: [BioC] Problems normalizing scanarray express data with limma
>
> Hello,
>
> I'm having trouble analyzing my custom arrays with limma.  I've searched
> the archives and I seem to be running into a similar problem that was
> previously dealt with here (
> https://stat.ethz.ch/pipermail/bioconductor/2005-October/010482.html).
>
> I'm also using outputs from a scanarray express, although I've modified my
> .csv's accordingly and removed the final line of useless data as indicated
> in the archives.  Also, being an R newbie I wasn't sure how to tell R that
> my data started after some 74 lines of headers (output info from the
> scanner), so I deleted those headers out as well (and input $printer info
> manually), leaving only a header for the columns of intensity data.   For
> simplicity's sake I've pasted below a shortened session of what I'm trying
> to do (my apologies for the lengthy e-mail).  I appreciate the help and
> comments.
>
>
>
> R version 2.14.0 (2011-10-31)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
> [R.app GUI 1.42 (5933) i386-apple-darwin9.8.0]
>
>> setwd("***")
>> library(limma)
>> targets<-readTargets()
>> RG <-read.maimages(targets, source="scanarrayexpress",annotation=c("Array
> Row", "Array Column", "Spot Row", "Spot Column", "Name", "ID"),
> other.columns=c("Ch1 SignalNoiseRatio", "Ch2 SignalNoiseRatio"), sep=",")
> Read 01-13_B.csv
> Read 01-13_M.csv
> Read 01-13_T.csv
>> RG$printer <-getLayout2("ChinookBOT.gal")
>> spottypes<-readSpotTypes()
>> RG$genes$Status<- controlStatus(spottypes, RG)
> Matching patterns for: Name
> Found 1116 oligo
> Found 21 blank
> Found 15 serial
> Setting attributes: values Color
>> show(RG)
> An object of class "RGList"
> $G
>     01-13_B 01-13_M 01-13_T
> [1,]     102     119     239
> [2,]     100     122     339
> [3,]     102     135     251
> [4,]      90     112     242
> [5,]     110     141     239
> 1147 more rows ...
>
> $Gb
>     01-13_B 01-13_M 01-13_T
> [1,]      89      94     147
> [2,]      88      84     181
> [3,]      88      91     161
> [4,]      92      90     175
> [5,]      86      87     154
> 1147 more rows ...
>
> $R
>     01-13_B 01-13_M 01-13_T
> [1,]     120     678     202
> [2,]     154     610     312
> [3,]     146     614     306
> [4,]     108     654     310
> [5,]     122     710     291
> 1147 more rows ...
>
> $Rb
>     01-13_B 01-13_M 01-13_T
> [1,]     108     119     135
> [2,]     109     137     159
> [3,]     113     124     169
> [4,]     115     124     180
> [5,]     119     104     159
> 1147 more rows ...
>
> $targets
>     FileName Cy3 Cy5
> 1 01-13_B.csv  B1  B2
> 2 01-13_M.csv  M1  M2
> 3 01-13_T.csv  T1  T2
>
> $genes
>  Array Row Array Column Spot Row Spot Column     Name      ID Status
> 1         1            1        1           1 HEATH049 Gene A4  oligo
> 2         1            1        1           2 HEATH049 Gene A4  oligo
> 3         1            1        1           3 HEATH049 Gene A4  oligo
> 4         1            1        1           4 HEATH113 Gene A8  oligo
> 5         1            1        1           5 HEATH113 Gene A8  oligo
> 1147 more rows ...
>
> $source
> [1] "scanarrayexpress"
>
> $other
> $Ch1 SignalNoiseRatio
>     01-13_B 01-13_M 01-13_T
> [1,]    3.06    2.55    3.02
> [2,]    2.72    3.06    2.35
> [3,]    2.68    3.60    3.34
> [4,]    2.51    3.12    0.95
> [5,]    3.33    3.82    2.66
> 1147 more rows ...
>
> $Ch2 SignalNoiseRatio
>     01-13_B 01-13_M 01-13_T
> [1,]    2.31   12.41    2.85
> [2,]    2.42   11.82    3.57
> [3,]    2.66   11.71    4.14
> [4,]    1.75   14.41    0.65
> [5,]    2.09   15.90    4.62
> 1147 more rows ...
>
>
> $printer
> $ngrid.r
> [1] 4
>
> $ngrid.c
> [1] 4
>
> $nspot.r
> [1] 6
>
> $nspot.c
> [1] 14
>
>
>> MA<- normalizeWithinArrays(RG)
> Error in normalizeWithinArrays(RG) :
>  printer layout information does not match M row dimension
>
>
>
> -- 
> Matthew Ouellette, M.Sc. Candidate
> Great Lakes Institute for Environmental Research
> University of Windsor
