[BioC] Surrogate variable analysis fails with “subscript out of bounds”

Vebjorn Ljosa ljosa at broad.mit.edu
Thu Jul 5 22:49:29 CEST 2012


On Mon, Jul 2, 2012 at 2:22 PM, Jeff Leek <jtleek at gmail.com> wrote:
>
> This problem is likely because of the small number of genes/features
> you are considering (453) and the high dimension of the response
> variable (12). With so many different levels of the response variable,
> many features are likely significantly associated with the response.
> Part of the iteration in the sva algorithm is to downweight features
> strongly associated with the response, so the whole data set is being
> down-weighted to 0.
>
> I would suggest running only one iteration of sva. Usually it takes a
> very small number of iterations to converge, and since your data are
> relatively low dimensional in the number of features, this may be the
> best that you can do if you are doing artifact discovery.

Thanks for responding. When I tried using only one iteration, I soon
came across a dataset where even a single iteration was too much:
http://www.broadinstitute.org/~ljosa/svaproblem/trainData4.txt
http://www.broadinstitute.org/~ljosa/svaproblem/trainpheno4.txt

Would it be reasonable to detect the case when `dats` is all zeros
after the first iteration and to return an empty set of surrogate
variables in that case?
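The check proposed above can be sketched in a few lines. This is a minimal Python illustration, not code from the sva package (which is written in R). It makes one hypothetical simplification: each iteration multiplies every feature (row) of the data by a weight that shrinks toward 0 as that feature's association with the response grows. The names `apply_feature_weights`, `is_all_zero`, `surrogate_variables_or_empty` and `estimate` are made up for illustration. `estimate` is a placeholder for the decomposition step that would actually produce the surrogate variables. Only the name `dats` comes from the question itself.

```python
from typing import Callable, List, Sequence

# A features x samples matrix stored as a list of rows (one row per feature).
Matrix = List[List[float]]


def apply_feature_weights(dat: Matrix, weights: Sequence[float]) -> Matrix:
    """Scale each feature (row) by its weight.

    Hypothetical simplification of one reweighting step: features strongly
    associated with the response are assumed to get weights near 0.
    """
    if len(dat) != len(weights):
        raise ValueError("expected one weight per feature (row)")
    return [[w * x for x in row] for row, w in zip(dat, weights)]


def is_all_zero(mat: Matrix, tol: float = 0.0) -> bool:
    """Return True if every entry has absolute value <= tol."""
    return all(abs(x) <= tol for row in mat for x in row)


def surrogate_variables_or_empty(
    dat: Matrix,
    weights: Sequence[float],
    estimate: Callable[[Matrix], List[List[float]]],
    tol: float = 0.0,
) -> List[List[float]]:
    """Return surrogate variables, or [] if weighting zeroed out the data."""
    dats = apply_feature_weights(dat, weights)
    if is_all_zero(dats, tol):
        # Every feature was down-weighted to 0, so no signal is left to
        # decompose: report "no surrogate variables" instead of failing.
        return []
    # 'estimate' is a hypothetical stand-in for the decomposition (e.g. an SVD).
    return estimate(dats)


if __name__ == "__main__":
    dat = [[1.0, -1.0, 2.0], [0.5, 0.0, -0.5]]

    def fake_estimate(m: Matrix) -> List[List[float]]:
        # Hypothetical placeholder returning one fixed surrogate variable.
        return [[0.1, 0.2, 0.3]]

    print(surrogate_variables_or_empty(dat, [0.0, 0.0], fake_estimate))  # []
    print(surrogate_variables_or_empty(dat, [1.0, 0.0], fake_estimate))  # [[0.1, 0.2, 0.3]]
```

Returning an empty result lets the caller treat "no surrogate variables found" as an ordinary outcome rather than an indexing error. A small positive `tol` would also catch weights that end up tiny but not exactly zero.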

Vebjorn
