[BioC] predict vsn with reference

Wed Oct 3 13:45:16 CEST 2007

Dear Chris,

Thank you for this very useful feedback! Indeed, you have discovered an 
oversight in the "predict" function, which led to wrong results when the 
fit object had been obtained from a "by-reference" fit (I had never come 
across this use case before).

I have adjusted this in version 3.3.1 of the package, which is posted 
here: http://www.ebi.ac.uk/~huber/pub

I need to see whether it can still be included in the BioC 2.1 release; 
otherwise it will shortly be in the new devel branch for 2.2.

There is also a little script, chris.R, which (as far as I understand) 
recapitulates the synthetic data example from your post, but with real 
data. The CCl4 data package can be obtained with "biocLite". The script's 
output is in the PNG file. Please have a look at whether this now fixes 
your problem!

 > sessionInfo()
R version 2.6.0 RC (2007-10-01 r43050)
i686-pc-linux-gnu

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] CCl4_1.0.6             vsn_3.3.1              limma_2.11.14
[4] affy_1.15.12           preprocessCore_0.99.22 affyio_1.5.11
[7] Biobase_1.15.36        fortunes_1.3-3

loaded via a namespace (and not attached):
[1] grid_2.6.0     lattice_0.16-5

Best wishes
   Wolfgang

------------------------------------------------------------------
Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber

chrisk wrote:
> I'm having difficulty using the 'reference' argument of vsn to put data 
> from a new microarray onto the scale of an existing set of arrays, when 
> all the arrays are normalised using a shared set of controls.
> 
> I think the problem is that I don't understand how offsets are handled: 
> when a reference is used, predicted values for the data used to create a 
> vsn object are different from the values stored in that vsn object. For 
> example, if I have data from 2 arrays in 'a' and want to put array b 
> onto their scale, this is what I'm doing:
> 
> library(vsn)
> set.seed(214)
> vals<-runif(1000)
> a<-matrix(rep(vals,2)+0.1*rnorm(2000),1000,2)
> b<-vals+0.1*rnorm(1000)
> aVsn<-vsn2(a)
> bVsn<-vsn2(b,reference=aVsn)
> 
> The values stored in bVsn are now on the same scale as the 'a' arrays:
> 
> plot(exprs(aVsn)[,2],exprs(bVsn)); abline(0,1)
> 
> However, the predictions from bVsn, using the data b, are offset from 
> these values:
> 
> plot(exprs(bVsn),predict(bVsn,b)); abline(0,1)
> 
> This is an issue when these comparable spots are only a reference set of 
> probes for a larger array:
> 
> aFull<-rbind(a,matrix(runif(20000),10000,2))
> bFull<-c(b,runif(10000))
> 
> I've been calculating values for the 'a' arrays using:
> 
> aFullVal<-predict(aVsn,aFull)
> 
> but if I use the same approach for the b array I cease to be on the same 
> scale as the 'a' arrays:
> 
> bFullVal<-predict(bVsn,bFull)
> 
> plot(aFullVal[1:1000,1],bFullVal[1:1000,1]); abline(0,1)
> 
> I can get back to the scale by adding back the mean difference:
> 
> offset<-mean(exprs(bVsn)-predict(bVsn,b))
> bFullVal2<-bFullVal+offset
> plot(aFullVal[1:1000,1],bFullVal2[1:1000,1]); abline(0,1)
> 
> But I don't really understand what this offset is or where it comes from 
> (particularly in this toy example where the offset is much larger than 
> any real difference between a and b, though I guess I haven't put in 
> anything that actually needs variance stabilisation).
> 
> So it would be good to know (i) whether correcting for whatever the 
> offset turns out to be is a reasonable approach (especially when b 
> actually comprises several arrays), and (ii) whether there is any less 
> arbitrary way to calculate values for array b while staying on the scale 
> of the 'a' arrays (e.g. using parameter values directly).
> 
> Any help much appreciated,
> 
> Chris
> 
>> sessionInfo()
> R version 2.5.1 (2007-06-27)
> i486-pc-linux-gnu
> 
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=en_GB.UTF-8;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] "tools"     "stats"     "graphics"  "grDevices" "utils"     "datasets"
> [7] "methods"   "base"
> 
> other attached packages:
>      vsn    limma     affy   affyio  Biobase
>  "2.2.0" "2.10.5" "1.14.2"  "1.4.1" "1.14.1"