[BioC] Boxplots, different results?

Tue Jul 11 15:53:04 CEST 2006

Hi Ligia,

thanks for your reply... your answer makes sense, in that both methods
perhaps remove different numbers of spots. However, the number of spots
could never be larger than the total! :-)

My arrays have 10752 spots each (48 blocks in a 12x4 layout, 14x16 spots per block).

When I use approach (a)
    (a)   A<-boxplot(MAw$M[,1],MAw$M[,2],MAw$M[,3])
and look at the number of observations, I get a different number, between
9000 and 9500, for each slide. That's okay: it's removing missing values
on a "per column" basis.
    A$n
    [1] 9181 9435 9331
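
A minimal sketch of this per-column behaviour on made-up data (the matrix `M` below is a hypothetical stand-in, not MAw$M): when each column is passed as a separate vector, missing values are dropped within each vector independently, so `$n` should equal the non-missing count of each column.

```r
# Toy stand-in for three slides (hypothetical data, not the real arrays).
M <- cbind(s1 = c(1, NA, 3, 4, 5),
           s2 = c(NA, NA, 3, 4, 5),
           s3 = c(1, 2, 3, 4, 5))

# plot = FALSE returns the summary statistics without drawing anything.
A <- boxplot(M[, 1], M[, 2], M[, 3], plot = FALSE)

n_a <- A$n                          # observations actually used per box
n_expected <- colSums(!is.na(M))    # non-missing values per column
print(n_a)
print(n_expected)
```

The two printed vectors agree, and neither can exceed the number of rows.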

Now, when I try approach (b)
    (b)   B<-boxplot(MAw$M ~ col(MAw$M[,1:3]))
The number of observations is identical for the three slides, 
consistent with what you say about removing the same spots across 
slides... but the values are larger than the total!
    B$n
    [1] 16218 16218 16218
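
The "same spots removed on every slide" reading corresponds to row-wise (complete-case) deletion. A small sketch on made-up data, using complete.cases() as a stand-in for whatever the formula method does internally (that equivalence is an assumption, not something confirmed in this thread):

```r
# Toy stand-in for three slides (hypothetical data, not the real arrays).
M <- cbind(s1 = c(1, NA, 3, 4, 5),
           s2 = c(NA, NA, 3, 4, 5),
           s3 = c(1, 2, 3, 4, 5))

# Keep only the rows (spots) with no missing value on any slide.
keep <- complete.cases(M)
M_complete <- M[keep, , drop = FALSE]

n_rowwise <- colSums(!is.na(M_complete))   # identical for every column
n_percol  <- colSums(!is.na(M))            # per-column counts, for comparison
print(n_rowwise)
print(n_percol)
```

Row-wise deletion gives the same count for every slide, and that count can only be smaller than or equal to the per-column counts, which is why values above the total are surprising.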

And when I try the 'split' function (very useful, thanks for pointing
me to it, by the way; I'm still rather inexperienced in R), I get yet
another result:
    (c)   C<-boxplot(split(MAw$M, col(MAw$M[,1:3])))
    C$n
    [1] 17924 18500 19027

Now the values are different on each slide, but all larger than the 
maximum 10752...

Before anybody asks:

> dim(MAw)
[1] 10752     6

So I'm very confused...
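
One possible explanation, offered as a sketch rather than a diagnosis: dim(MAw) reports six columns, while the grouping term col(MAw$M[,1:3]) covers only three. If MAw$M itself has six columns, split() recycles the shorter grouping vector over the longer data, so each group pools two columns and its count can exceed the number of spots per slide. The toy matrix `M` below is hypothetical and only illustrates the recycling; the formula call may be affected in a similar way, but that is not verified here.

```r
# Hypothetical 4-spot, 6-slide matrix with a few missing values.
M <- matrix(as.numeric(1:24), nrow = 4, ncol = 6)
M[1, 1] <- NA
M[3, 2] <- NA
M[2, 4] <- NA

# Grouping built from 3 columns (12 labels) applied to all 24 values:
# split() recycles the labels, so columns 4-6 fall into groups 1-3 as well.
pooled <- split(M, col(M[, 1:3]))
n_pooled <- boxplot(pooled, plot = FALSE)$n

# Restricting the data to the same three columns keeps one column per group.
matched <- split(M[, 1:3], col(M[, 1:3]))
n_matched <- boxplot(matched, plot = FALSE)$n

print(lengths(pooled))   # values per group before NA removal
print(n_pooled)          # counts when two columns are pooled per group
print(n_matched)         # counts when data and grouping have the same shape
```

In this toy case the pooled counts (6, 7, 8) exceed the 4 spots per slide, while the matched counts (3, 3, 4) equal the per-column non-missing counts.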

Jose

Quoting ligia at ebi.ac.uk:

> Hi, Jose
>
> I also noticed this behaviour some time ago.
> It is related to the way the two approaches handle missing data.
>
> For example, if you save the output of boxplot in each case, you can see
> that:
> a = boxplot(MA$M[,1],MA$M[,2],MA$M[,3])
> b = boxplot(MAw$M ~ col(MAw$M[,1:3]))
>
> the number of observations is different:
>
> a$n
> b$n
>
> This is because option (b) removes the NA entries that are common to all the
> columns in MAw$M, so you'll have fewer data points in each vector.
>
> However, if you use the command "split", this will work, giving the same
> results as option (a):
>
> boxplot(split(MAw$M, col(MAw$M[,1:3])))
>
>
> Best wishes,
> Ligia
>
>
>>
>> Hi,
>>
>> I am using limma to analyse my cDNA expression arrays (2 channel).
>>
>> I am looking at boxplots generated from the M values of my arrays (MA =
>> product of 'normalizeWithinArrays'), but I am not sure I understand the
>> syntax or what the 'boxplot' function is doing.
>>
>> This is because I get slightly different plots if I try (a) or (b) below,
>> which I thought would be equivalent. Am I missing something?
>>
>> (a)
>> boxplot(MA$M[,1],MA$M[,2],MA$M[,3])
>>
>> (b)
>> boxplot(MAw$M ~ col(MAw$M[,1:3]))
>>
>> The differences are noticeable on the spots outside the "whiskers". The
>> main box and whiskers themselves *appear* to be the same. I guess some
>> defaults must be different when defining the data as a formula or
>> explicitly naming the vectors... but I'm not finding an obvious note as
>> to which ones they may be?
>>
>> thanks for your help,
>>
>> Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK