[BioC] A couple VSN related questions
Wolfgang Huber
huber at ebi.ac.uk
Sun Feb 3 18:08:18 CET 2008
Dear Leo
> Thank you very much for your thorough and clear answer! I have an
> additional question: is the "sigsq" slot of a vsn object the estimated
> variance (which the name and actual value seem to indicate) or the standard
> deviation (as documented on the man page)? Also is it before or after the
> log(2) transformation (I guess it is before, but want to make sure)?
>
> Thanks again for taking your time to answer my questions!
thank you, you are right. It is the variance squared; the man page is
fixed
in vsn >= 3.4.6. sigma^2 is defined in the vignette "Likelihood
calculations
for vsn". It is computed before that affine scaling and shifting that we
talked about in the previous mail.
Best wishes
Wolfgang
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber
> -----Original Message-----
> From: Wolfgang Huber [mailto:huber at ebi.ac.uk]
> Sent: Wednesday, January 30, 2008 7:55 PM
> To: Leo J. LEE
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] A couple VSN related questions
>
> Dear Leo,
>
> answers please see below.
>
>> I am new to Bioconductor and R (having the latest version, Bioconductor
> 2.1
>> and R 2.6.1, installed) so please bear with me if I ask something
>> naive/stupid. Anyway, the most important reason why I come to
> Bioconductor
>> is to use the vsn package, and after trying it a bit, I have the following
>> questions:
>>
>> 1. It is clear from the documentation (and a previous email from Wolfgang)
>> that the real computation of vsn2 is written in C. Is it possible for me
> to
>> access the C source code (and compile and run it myself)? If yes, this
> will
>> be truly great since I have to do other computational analysis outside of
> R
>> anyway.
>
> Yes, it is all open source. The most obvious way is to download the
> .tar.gz file from
> http://www.bioconductor.org/packages/devel/bioc/html/vsn.html
> and unpack it. See the file vsn2.c. There is also public, anonymous svn
> access, which is described on the bioc website.
>
>> 2. I am trying to follow the computation performed by VSN as documented in
>> the "Introduction to VSN" vignette. Loading the kidney dataset, I have
> the
>> following data before and after VSN:
>>> exprs(kidney)[1:10,]
>> channels
>> spots green red
>> 1 815.32 937.02
>> 2 671.66 765.03
>> 3 713.93 713.59
>> 4 703.97 656.70
>> 5 493.59 472.10
>> 6 477.92 453.13
>> 7 346.55 385.47
>> 8 377.34 421.86
>> 9 132.80 121.77
>> 10 148.65 130.41
>>> fit = vsn2(kidney)
>> vsn: 8704 x 2 matrix (1 stratum). 100% done.
>>> fit at hx[1:10,]
>> channels
>> spots green red
>> 1 9.711781 9.870624
>> 2 9.453765 9.597638
>> 3 9.533994 9.506128
>> 4 9.515436 9.398666
>> 5 9.067212 8.995503
>> 6 9.028838 8.948561
>> 7 8.673814 8.771463
>> 8 8.762664 8.868634
>> 9 7.957576 7.926483
>> 10 8.016070 7.957696
>>
>> Now I am trying to reproduce the result from parameters estimated by VSN
>> using
>>> coef(fit)[1,,]
>> [,1] [,2]
>> [1,] -0.08444849 -5.927967
>> [2,] -0.06787093 -5.957545
>>> offset <- coef(fit)[1,,1];
>>> scale <- exp(coef(fit)[1,,2]);
>>> ynorm_my = asinh(exprs(kidney)*scale + offset);
>>> ynorm_my[1:10,]
>> channels
>> spots green red
>> 1 1.4820851 1.6139176
>> 2 1.2851048 1.4029668
>> 3 1.3588520 1.3584152
>> 4 1.3272734 1.2650499
>> 5 1.0353039 0.9986861
>> 6 0.9954236 0.9530607
>> 7 0.7626212 0.8400535
>> 8 0.8148209 0.8976601
>> 9 0.2661626 0.2376892
>> 10 0.3115130 0.2662458
>>
>> which is clearly different from the result produced by VSN. Could anybody
>> tell me what I have done wrong? Thank you so much!
>
> Try this:
>
> library("vsn")
> data("kidney")
> fit = vsn2(kidney)
> shift = coef(fit)[1,,1]
> scale = exp(coef(fit)[1,,2])
>
> nk = t(asinh(t(exprs(kidney))*scale + shift))
> nk = nk/log(2) - fit at hoffset
>
> identical(nk , exprs(fit))
> [1] TRUE
>
>
> This is a subtle point. It is mentioned (admittedly too briefly) in the
> vsn2 man page, section "Note on overall scale and location of the glog
> transformation":
>
> The data are returned on a glog scale to base 2. More precisely,
> the transformed data are subject to the transformation
> glog2(f(b)*x+a) + c, where glog2(u) = log2(u+sqrt(u*u+1)) =
> asinh(u)/log(2) is called the generalised logarithm, a and b are
> the fitted model parameters (see references), f is a parameter
> transformation [4], and the overall constant offset c is computed
> from b such that for large x the transformation approximately
> corresponds to the log2 function. The offset c is inconsequential
> for all differential expression calculations, but many users like
> to see the data in a range that they are familiar with.
>
>
> So the difference to what you did is (i) to go from glog to glog_2 scale
> (the division by log(2)), and (ii) to subtract fit at hoffset. The
> computation of fit at hoffset is done towards the end of the "vsnMatrix"
> function.
>
> Best wishes
> Wolfgang
>
>
More information about the Bioconductor
mailing list