[BioC] About subsampling of VST in lumi

ligia at ebi.ac.uk ligia at ebi.ac.uk
Fri Dec 14 22:56:32 CET 2007


Hi Pan,

Thanks for your email.
The problem I reported is not caused by the down-sampling step controlled by
the "nSupport" parameter, but by a subsequent step in "vst": if the number
of selected probes with high variance (selInd) is above 5000, then only a
random subset (5000) of these probes is used to fit the linear model between
variance and mean of probe beads (these are the steps I mentioned in my last
email). Couldn't this value (5000) just be another parameter to "vst"?

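To illustrate what I mean, here is a minimal sketch. It is not lumi's actual
code: the function name "fitVarianceModel" and the argument "maxFitPoints"
are made up for illustration, and the data are simulated. It mirrors the
fitting step quoted further down, with the hard-coded 5000 exposed as an
argument that can be set to Inf to switch the subsampling off.

```r
# Hypothetical helper, NOT part of lumi: the fitting step from "vst" with the
# hard-coded 5000 replaced by an argument (maxFitPoints is an assumed name).
fitVarianceModel <- function(u, std, c3, selInd = rep(TRUE, length(u)),
                             maxFitPoints = 5000) {
    selInd <- selInd & (std^2 > c3)
    dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
    # Subsample only when the cap is finite and actually exceeded.
    if (is.finite(maxFitPoints) && nrow(dd) > maxFitPoints) {
        dd <- dd[sample(seq_len(nrow(dd)), maxFitPoints), ]
    }
    lmm <- lm(y ~ x1, data = dd)
    c(c1 = unname(coef(lmm)[2]), c2 = unname(coef(lmm)[1]))
}

# Simulated stand-in for per-probe means (u) and standard deviations (std).
set.seed(1)
u   <- runif(5500, 100, 10000)
std <- sqrt(25 + (0.1 * u + 5 + rnorm(5500, sd = 2))^2)
c3  <- 25

# With the cap disabled, all 5500 rows are used and the fit is deterministic.
full1 <- fitVarianceModel(u, std, c3, maxFitPoints = Inf)
full2 <- fitVarianceModel(u, std, c3, maxFitPoints = Inf)
identical(full1, full2)

# With the default cap, each call draws a different random subset of rows.
sub1 <- fitVarianceModel(u, std, c3)
sub2 <- fitVarianceModel(u, std, c3)

# Fixing the seed before each call makes the subsampled fit repeatable.
set.seed(42); seeded1 <- fitVarianceModel(u, std, c3)
set.seed(42); seeded2 <- fitVarianceModel(u, std, c3)
identical(seeded1, seeded2)
```

With the current code, calling set.seed() before each lumiT() call should
presumably give the same effect as the last two lines, but I would still
prefer an explicit argument over having to remember the seed.
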
Thanks for your help,
Ligia



> Hi Ligia,
>
> Thanks for your report.
> Yes, we use down-sampling to speed up the parameter estimation. If you want
> to use all the data points, you can set the parameter "nSupport" of the vst
> function to the length of the vector. I will add this to the vignette or
> help file. Thanks!
>
>
> Pan
>
>
> On 12/14/07 5:18 AM, "ligia at ebi.ac.uk" <ligia at ebi.ac.uk> wrote:
>
>> Dear Pan Du,
>>
>> From what I understand when looking at "vst", the random subsampling that
>> affects my data occurs at step 4 below:
>>
>> 1       if (c3 != 0) {
>> 2            selInd <- selInd & (std^2 > c3)
>> 3            dd <- data.frame(y = sqrt(std[selInd]^2 - c3), x1 = u[selInd])
>> 4            if (nrow(dd) > 5000) dd <- dd[sample(1:nrow(dd), 5000), ]
>> 5            lmm <- lm(y ~ x1, dd)
>> 6            c1 <- lmm$coef[2]
>> 7            c2 <- lmm$coef[1]
>> 8        }
>>
>> because my "dd" data frame has around 5500 rows. Maybe it would be nice
>> to have the option to turn this off, or add the option to provide the max
>> value allowed for nrow(dd)...
>>
>> Cheers,
>> Lígia
>>
>>
>>> Dear Ligia
>>>
>>> I believe this is because they randomly subsample the data to "speed
>>> processing", see the man page and the nSupport parameter.
>>>
>>> I cc Pan Du with the suggestion to make the explanation of this in the
>>> man page more clear. Is there an option to switch off the random
>>> subsampling?
>>>
>>>   Best wishes
>>> Wolfgang
>>>
>>>
>>>
>>> ligia at ebi.ac.uk wrote:
>>>> Hi Wolfgang,
>>>>
>>>> I noticed a peculiar behaviour in the lumi package: when I apply the
>>>> variance stabilizing transformation, it gives slightly different results
>>>> each time I run the method. See below for a subset of the data:
>>>>
>>>>
>>>>> load("dat.rda")
>>>>> library("lumi")
>>>>
>>>>> x1 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>> 2007-12-13 10:56:35 , processing array  1
>>>> 2007-12-13 10:56:35 , processing array  2
>>>> 2007-12-13 10:56:35 , processing array  3
>>>> 2007-12-13 10:56:35 , processing array  4
>>>>
>>>>> x2 <- lumiT(dat, method="vst", ifPlot=!TRUE)
>>>> 2007-12-13 10:56:36 , processing array  1
>>>> 2007-12-13 10:56:36 , processing array  2
>>>> 2007-12-13 10:56:36 , processing array  3
>>>> 2007-12-13 10:56:37 , processing array  4
>>>>
>>>>
>>>>> table(exprs(x1)==exprs(x2))
>>>>
>>>> FALSE  TRUE
>>>> 88705     3
>>>>
>>>>> range(exprs(x1)-exprs(x2))
>>>> [1] -0.05682931  0.03592777
>>>>
>>>>> sessionInfo()
>>>> R version 2.7.0 Under development (unstable) (2007-11-29 r43558)
>>>> i686-pc-linux-gnu
>>>>
>>>> locale:
>>>> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8
>>>> ;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAM
>>>> E=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION
>>>> =C
>>>>
>>>> attached base packages:
>>>> [1] tools     stats     graphics  grDevices utils     datasets  methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>>  [1] lumi_1.5.10            annotate_1.15.6        AnnotationDbi_1.1.6
>>>>  [4] RSQLite_0.6-0          DBI_0.2-3              mgcv_1.3-29
>>>>  [7] affy_1.15.7            preprocessCore_0.99.12 affyio_1.5.7
>>>> [10] Biobase_1.17.6
>>>>
>>>> Cheers,
>>>> Ligia
>>>
>>>
>>> --
>>>
>>> Best wishes
>>>    Wolfgang
>>>
>>> ------------------------------------------------------------------
>>> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>>>
>>
>>
>
>
> ---------------------------------------------------
> Pan Du, PhD
> Research Assistant Professor
> Robert H. Lurie Comprehensive Cancer Center
> Northwestern University
> 676 ST Clair St., #1200
> Chicago, IL 60611
> Office (312)695-4781
> dupan at northwestern.edu
> ---------------------------------------------------
>
>
>
>
>


