[BioC] M values; and dist functions

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Apr 26 15:22:34 CEST 2011


Hi John,


On Tue, Apr 26, 2011 at 6:24 AM, john herbert <arraystruggles at gmail.com> wrote:
> It would be helpful to get some clarification on some, supposedly, simple
> array facts;
>
> Part1)
>
> For 2 colour arrays, Mvalues!
>
> Am I correct in thinking that M values from a two colour array are the same
> as log2 fold change?
> Cy5 = case, Cy3 = control and M = log2( case/control), so a log fold change
> of -1 is 2 fold down-regulated etc?

That is correct, with the (obvious) caveat that there is no hard and
fast rule for which sample gets labeled with Cy5 and which with Cy3,
and dye swaps are often done to control for dye bias (which flips the
sign of M on the swapped arrays). Those scenarios are (I think)
covered in the limma manual, btw.
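
As a minimal sketch of the arithmetic (the intensities below are made up,
and I'm assuming Cy5 = case and Cy3 = control as in your question):

```r
# Hypothetical background-corrected intensities for three spots
cy5 <- c(2000, 500, 1000)   # case (assumed to be labeled with Cy5)
cy3 <- c(1000, 1000, 1000)  # control (assumed to be labeled with Cy3)

# M value = log2 fold change of case over control
M <- log2(cy5 / cy3)
# M is 1, -1, 0: 2-fold up, 2-fold down, unchanged

# On a dye-swapped array the labels are exchanged, so the same biology
# shows up with the opposite sign
M_swap <- log2(cy3 / cy5)

# Flip the sign of the swapped array before averaging the pair
M_case_vs_control <- (M - M_swap) / 2
```

With real data the two arrays won't be exact mirror images, which is
the point of doing the swap in the first place.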

> For a 1 colour array, Mvalues will arise from array1 = case and array 2 =
> control
> So log2(array1/array2) is again the equivalent of log2 fold change.

Yup ... just make sure you normalize your arrays together.
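
To illustrate what I mean by "together", here is a rough base-R sketch
of quantile normalization followed by the log ratio. The function
quantile_normalize() and the simulated data are my own stand-ins, not
anything from marray or limma; in practice you'd more likely reach for
something like limma's normalizeBetweenArrays().

```r
# Hypothetical helper: force every column (array) onto the same distribution
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted <- apply(x, 2, sort)                         # sort each array
  means  <- rowMeans(sorted)                          # shared reference distribution
  out    <- apply(ranks, 2, function(r) means[r])     # map ranks back to values
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Made-up log2 intensities for 5 probes on two one-colour arrays
log_int <- cbind(array1 = rnorm(5, mean = 8), array2 = rnorm(5, mean = 9))

norm <- quantile_normalize(log_int)

# On the log2 scale, log2(array1 / array2) is just a difference
M <- norm[, "array1"] - norm[, "array2"]
```

After this step both columns contain the same set of values (in
different orders), so the overall intensity offset between the two
arrays no longer leaks into M.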

> with both these scenarios, I am right in stating that the raw fluorescent
> signals are themselves log2 transformed to make plot distributions close to
> normal?

Yes.

> Part2)
>
> I use the marray package to extract array data, normalise and an impute
> package to replace missing values for M values.
> I make myself an expression set object using "new"
>
> I then want to generate a dist object
>
>> dd = dist(exprs(exampleSet))
> Error in vector("double", length) :
>  cannot allocate vector of length 582309001
>
> or
>
>> dd = dist(exampleSet)
> Error in vector("double", length) :
>  cannot allocate vector of length 582309001
>
> It is probable I need to reduce the data set first as most genes are not
> differentially expressed (as is the assumption with microarrays).
> It would be great to understand these types of things more.

Hmmm ...

Well, R does have a limit on vector length that is a result of it
using 32-bit integers (for indexing, I guess) -- so, I think the max
length of a vector is 2^31 - 1 == .Machine$integer.max == 2147483647.

But that's still bigger than 582309001. This works on my machine, for instance:

R> x <- integer(582309001)

It takes a ton (~2 GB) of memory, but it still works. I don't have
enough free memory to make a "numeric" (aka double) vector of that
length, though, since doubles take 8 bytes each instead of 4. Maybe
you're hitting the memory limits of your machine?
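
For what it's worth, here is a back-of-the-envelope check of where that
number comes from. dist() works on the rows of its input and stores one
double per pair of rows, so the length in your error message should be
n * (n - 1) / 2 for n rows. The variable names are mine:

```r
len <- 582309001                   # vector length from the error message

# Solve n * (n - 1) / 2 == len for n (positive root of the quadratic)
n <- (1 + sqrt(1 + 8 * len)) / 2   # 34127

# Sanity check: 34127 rows give exactly that many pairwise distances
n_pairs <- n * (n - 1) / 2

# Memory needed, in GiB
gib_double  <- len * 8 / 2^30      # doubles, 8 bytes each: roughly 4.3
gib_integer <- len * 4 / 2^30      # integers, 4 bytes each: roughly 2.2
```

So, if I've read the error right, you are asking for distances between
all 34127 rows (features) of your matrix, which needs more than 4 GiB
in one contiguous chunk.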

How much RAM do you have? Are you running R in 32-bit or 64-bit mode?
What's the result of:

R> sessionInfo()
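
If memory does turn out to be the problem, one workaround along the
lines you suggest is to filter down to the most variable features
before calling dist(). This is only a sketch on simulated data: the
matrix m stands in for exprs(exampleSet), and the cutoff of 100
features is arbitrary.

```r
set.seed(42)
# Hypothetical stand-in for exprs(exampleSet): 1000 features x 6 arrays
m <- matrix(rnorm(1000 * 6), nrow = 1000,
            dimnames = list(paste0("gene", 1:1000), paste0("array", 1:6)))

# Variance of each feature across arrays
v <- apply(m, 1, var)

# Keep the 100 most variable features (arbitrary cutoff)
keep <- order(v, decreasing = TRUE)[1:100]

# Distances between the retained features: 100 * 99 / 2 = 4950 values
dd <- dist(m[keep, ])

# dist() compares rows, so transpose to get distances between arrays instead
dd_arrays <- dist(t(m))            # 6 * 5 / 2 = 15 values
```

Note the last line: if what you actually want is the distance between
arrays rather than between genes, transposing makes the result tiny.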

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


