[BioC] VSN: minimum number of controls?

Martin Morgan mtmorgan at fhcrc.org
Sat Apr 3 21:16:35 CEST 2010


Hi Eric --

On 04/02/2010 02:41 PM, Eric E. Snyder wrote:
> Hello,
> 
> In my first project with R and BioConductor, I am analyzing some small
> microarrays, starting with variance normalization with vsn.  Using
> Wolfgang Huber's VSN.pdf tutorial I was able to do the exercise with the
> "kidney" dataset without trouble.  However, when trying to run:
> 
>>  fit = vsn2( noDNAcontrols )
> Error in .local(x, reference, strata, ...) :
>   One or more of the strata contain less than 42 elements.
> Please reduce the number of strata so that there is enough in each stratum.

It is always good to provide sessionInfo() so that we know the details of
the software you're using:

> library(vsn)
> sessionInfo()
R version 2.10.1 Patched (2010-03-27 r51570)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] vsn_3.14.0    Biobase_2.6.1

loaded via a namespace (and not attached):
[1] affy_1.24.2          affyio_1.14.0        grid_2.10.1
[4] lattice_0.18-3       limma_3.2.3          preprocessCore_1.8.0

It is then good to try for a reproducible example, or at least enough info
for others to reproduce your error. I started with example(vsn2) and then

> vsn2(kidney[1:20,])
Error in vsnMatrix(exprs(x), reference, strata, ...) :
  One or more of the strata contain less than 42 elements.
Please reduce the number of strata so that there is enough in each stratum.

My guess is that noDNAcontrols is a matrix-like object with rows and
columns transposed, i.e., samples x features rather than features x
samples. What are class(noDNAcontrols) and dim(noDNAcontrols)? You might as
well copy and paste the output directly from R.
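
As a rough sketch of the check I have in mind, here is a made-up stand-in
for noDNAcontrols (the name x, the simulated values, and the 853 x 6 layout
are assumptions, not your data). It only shows how class(), dim() and t()
reveal and fix a samples x features layout; it does not call vsn2, and the
42 is simply the number quoted in the error message.

```r
# Hypothetical stand-in: 853 patients as rows, 6 no-DNA controls as columns.
set.seed(1)
x <- matrix(rnorm(853 * 6, mean = 100, sd = 10), nrow = 853, ncol = 6)

class(x)   # what kind of object this is
dim(x)     # number of rows, then number of columns

# Transpose so that features (controls) are rows and samples are columns.
x_fs <- t(x)
dim(x_fs)

# Compare the number of feature rows with the 42 from the error message.
enough_rows <- nrow(x_fs) >= 42
enough_rows
```

If the feature count after transposing is still below the number in the
error message, the orientation alone would not explain the problem.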

> using my own data, I got the error above.  I finally got around the
> error by simulating a dataset containing 50 controls (my original data
> had only 6).  Surprisingly, even 42 controls was insufficient.
> 
> A collaborator, using the same dataset, was able to run vsn successfully
> using an earlier version of R (2.9.0) and Bioconductor (version ?).
> 
> Is anyone familiar with this problem?
> 
> I see two ways forward:
> 
> 1.  Find the appropriate (old) version of Bioconductor and analyze with
> the original controls.
> 
> 2.  Use the current R/Bioconductor releases and either find a software
> patch or a work-around.
> 
> As for #2, maybe it is not unreasonable to use >42 controls on most
> microarrays.  However, this particular dataset is from a series of small
> protein arrays (each probed with patient serum then visualized with
> labeled anti-IgG) that contain only 214 antigens and 6 no DNA (meaning
> "no protein") controls per patient (with a total 853 patients in the
> dataset).  Consequently, it is not possible to run a huge number of
> controls, given the number of experimental cells per slide.
> 
> On a related note, in my effort to inflate the controls that I did have
> into a sufficiently large number, I used "rnorm" to simulate/synthesize
> the data.  Here "noDNAstats" is a 2 x 853 matrix consisting of the mean
> and standard deviation from the patients' noDNAcontrols in the first and
> second rows, respectively.
> 
> i=1
> noDNAsim50 = rnorm(50, noDNAstats[1,i], noDNAstats[2,i])
> for(i in c( 2:ncol(noDNAstats) ) ){
>     noDNAsim50 = cbind(noDNAsim50, rnorm(50, noDNAstats[1,i],
>     noDNAstats[2,i]))
> }
> 
> My understanding was that rnorm would create a dataset of the requested
> size with the requested mean and SD.  The numbers I get are in the same
> ballpark but the means and SD are not the same.  Am I missing something?

At one level this looks OK, but there isn't enough info to reproduce it, or
to see precisely what your problem is. Can you be more specific, maybe
with a simpler example, say creating a matrix with two columns, where
you specify mean and sd as numbers directly rather than 'hidden' in a
matrix that we don't have access to?
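
Something along these lines is the kind of simpler example I mean. The
means, standard deviations, seed and object names are made up for
illustration. The sketch builds a two-column matrix with rnorm and then
compares the sample mean and sd of each column with the values requested;
rnorm draws random values from a distribution with those parameters, so the
sample statistics are estimates rather than exact matches. The last step,
rescaling each column to hit the requested values exactly, is an optional
extra and not something rnorm does for you.

```r
set.seed(42)          # arbitrary seed, only so the draw is reproducible
n <- 50
mu <- c(10, 100)      # requested means, one per column (made-up values)
sigma <- c(2, 20)     # requested standard deviations (made-up values)

# One column per (mean, sd) pair; this replaces a loop that grows by cbind.
sim <- mapply(function(m, s) rnorm(n, mean = m, sd = s), mu, sigma)

dim(sim)              # n rows, one column per (mean, sd) pair
colMeans(sim)         # near mu, but not equal to it
apply(sim, 2, sd)     # near sigma, but not equal to it

# Optional: standardize each column, then rescale to the requested values.
exact <- sweep(sweep(scale(sim), 2, sigma, "*"), 2, mu, "+")
colMeans(exact)
apply(exact, 2, sd)
```

With something this small it is easy to post both the code and the output.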

Martin
> 
> Thanks!
> eesnyder


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


