[BioC] Defining Weights in marrayNorm.

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Tue Aug 19 10:13:32 MEST 2003

Thank you, that is pretty much as I expected :-D

My only point being that the very nature of an marrayRaw object allows each individual array to have different weights to every other array, so it is a little confusing that the maNormMain function applies only one set of weights to multiple arrays. 

Anyway, I have taken enough help from this list, so here is some contributed code that *should* do what I want:

# raw data should be in an marrayRaw object called "data"
# layout object should be in an marrayLayout object called "layout"
# the marrayRaw object must have a maW slot

# number of arrays
x = 3

#create layout here if none exists
#layout = read.marrayLayout(ngr = 12, ngc = 4, nsr = 14, nsc = 15)

# type of normalisation - 'maPrintTip' will give print-tip Loess, NULL will
# give Loess
z = "maPrintTip"

# define matrices to hold the weighted data
# 100080 is the number of spots on my array
mam = matrix(nrow = 10080, ncol = x)
maa = matrix(nrow = 10080, ncol = x)
maw = matrix(nrow = 10080, ncol = x)

# create individual weighted 
for (i in seq(1,x)) {
    temp = maNormMain(data[,i], f.loc = list(maNormLoess(x="maA", y="maM",
    z=z, w=data[,i]@maW)))
    mam[,i] = maM(temp)
    maa[,i] = maA(temp)
    maw[,i] = data[,i]@maW

# create a marray Norm object
my.weight.norm = new('marrayNorm',
maA = maa,
maM = mam,
maW = maw,
maLayout = layout,
maNormCall = maNormCall(maNormMain(data, f.loc = list(maNormLoess(x="maA", y="maM", z="maPrintTip", w=data at maW)))))

-----Original Message-----
From: Marcus Davy [mailto:MDavy at hortresearch.co.nz]
Sent: 19 August 2003 04:16
To: michael.watson at bbsrc.ac.uk; jean at biostat.ucsf.edu;
bioconductor at stat.math.ethz.ch
Subject: RE: [BioC] Defining Weights in marrayNorm.


For multiple arrays as far as I can see there is currently no way that

you can make a call to maNormMain that will allow maLoess to utilise 
weights from EACH column of your maW matrix. 
maNormMain uses a controlling function maNormLoess specified as a LIST
calls in the arguement:

 f.loc=list(maNormLoess(x="maA",y="maM", z="maPrintTip", w=Flagweights,

maNormLoess calls maLoess. Inside the code for maLoess the actual loess
has an arguement weights=args$w, where args$w will be the vector of

 fit <- loess(y ~ x, weights = args$w, subset = args$subset, 
            span = args$span, na.action = args$na.action, degree =
            family = args$family, control = args$control)

What was probably happening in your call

data.weight.norm = maNormMain(data, f.loc = list(maNormLoess(x="maA",
y="maM", z="maPrintTip", w=data at maW)))

was that only the first column of data at maW was being used as weights
for the 
the normalization vectors x="maA data for each array", 
and y="maM data for each array" in the loess smoother.

cars.lo <- loess(dist ~ speed, cars)
plot(cars.lo$x, cars.lo$y)

weights <- matrix(rbinom(100,1,0.4),nc=2)
cars.lo <- loess(dist ~ speed, cars, weights=weights)

lines(cars.lo$x,cars.lo$fit, col="red")

cars.lo <- loess(dist ~ speed, cars, weights=weights[1:50,1])
lines(cars.lo$x,cars.lo$fit, lty=8, col="green")


>>> "michael watson (IAH-C)" <michael.watson at bbsrc.ac.uk> 14/08/2003
2:01:44 AM >>>

In my continuing search to get this working, I have made progress :-D 
But I think I found a bug/feature...

OK here is what I did. 

I took a GenePix file.  I made a copy of it.  I added column
"SpotWeight" to both.  In one of the files I set the weights all to 1. 
In the other, I set all of the weights to be between 0 and 0.5 (random
numbers).  I just wanted to see if I could get it working.


> data = read.GenePix(fnames = files, name.Gf = "F532 Median", name.Gb
= "B532 Median", name.Rf = "F635 Median", name.Rb = "B635 Median",
name.W = "SpotWeight", layout=layout)
> maW(data)

produces a nice lovely vector of my weights, so far so good.  By
chance, the first column was the one with all 1's - I think this is

> data.norm = maNorm(data, norm = "printTipLoess")

This works great and just produces normalised data as if maW didn't
exist - we expect this from the code, maNorm() function does not use


> data.weight.norm = maNormMain(data, f.loc = list(maNormLoess(x="maA",
y="maM", z="maPrintTip", w=data at maW)))

is my big hope.  And it doesn't throw any errors :-D.  However, it does
just produce M values as if maW doesn't exist.  I am about to throw in
the towel when I think I should try something.  So I try:

> data.weight.norm = maNormMain(data[,1], f.loc =
list(maNormLoess(x="maA", y="maM", z="maPrintTip", w=data[,1]@maW)))

This again turns up the now familiar M values, unaffected by maW.  But
of course, in my first data set maW is all set to 1, so of course thats
what it SHOULD produce.  So i try:

> data.weight.norm = maNormMain(data[,2], f.loc =
list(maNormLoess(x="maA", y="maM", z="maPrintTip", w=data[,2]@maW)))

and guess what?  It works!  Hurrah!  My M values have been affected by
maW, they are different to normal and I can only assume maNormMain is
calculating weighted normalised M values according to maW.

But wait - isn't this a little incorrect?  The marrayRaw class allows
me to have different weights for different spots for all of my arrays. 
So why when I normalise using maNormMain() do I have to do it on an
array-by-array basis?  Surely:

> data.weight.norm = maNormMain(data, f.loc = list(maNormLoess(x="maA",
y="maM", z="maPrintTip", w=data at maW)))

should work in that when it is normalising the nth marrayRaw data set
in "data", it should use the nth set of weights in data at maW...?  Instead
what it appears to have done is take the first column of data at maW, and
by chance in my case that was all 1's so I noticed.  If those hadn't
been 1's but had been legitimate weights for the first array, I don't
think I would have noticed.... and had all of my arrays normalised
according to weights for the first array.... :-(

Anyway, I believe I have cracked it now in that I can weight normalise
all ninety of my arrays.  The fact that i have to make 90 calls to
maNormMain and prodcue 90 normalised data sets is a nuisance rather than
anything else, though I do believe what i have said above makes sense, I
hope someone agrees :-)  In most other respects the marray* classes, and
bioconductor in general, are fantastic, so I hope I don't appear
unappreciative ;-)


Bioconductor mailing list
Bioconductor at stat.math.ethz.ch 

The contents of this e-mail are privileged and/or confidential to the
named recipient and are not to be used by any other person and/or
organisation. If you have received this e-mail in error, please notify 
the sender and delete all material pertaining to this e-mail.

More information about the Bioconductor mailing list