[BioC] affy code archeology: expresso behavior for liwong/invariantset

James W. MacDonald jmacdon at uw.edu
Mon Jan 28 17:08:13 CET 2013


Hi David,

On 1/28/2013 10:14 AM, Martin Morgan wrote:
> On 01/24/2013 01:41 PM, David Eby [guest] wrote:
>>
>> Hello,
>>
>> I am in the midst of updating some inherited legacy code from R 2.8 
>> and affy_1.20.2 to R 2.15.2 and affy_1.36.0, trying to fix some 
>> longstanding bugs.
>>
>
> That's a lot of water under the bridge (4 1/2 years? do you still have 
> a working installation?)...
>
>> Everything is going well except for some very different results 
>> coming out of an expresso call in one of my regression tests, 
>> specifically from:
>>     eset <- expresso(afbatch, normalize.method="invariantset",
>>                      bg.correct=FALSE, pmcorrect.method="pmonly",
>>                      summary.method="liwong", verbose=TRUE)
>
> ...there is this at least in svn
>
> affy$ svn log -r57229
> ------------------------------------------------------------------------
> r57229 | jmacdon | 2011-08-04 12:18:56 -0700 (Thu, 04 Aug 2011) | 1 line
>
> fixed bug in normalize.AffyBatch.invariantset
> ------------------------------------------------------------------------
>
> and I've added Jim to the email for any insights.

Thanks for the detective work, Martin. This was on my to-do list for 
today, and now I can just press the easy button.

I don't recall how this came up, but there was definitely a bug. The 
basic idea is that we want to say which of the arrays is the reference, 
based on one of four criteria. The default is to base this on the array 
that has the median overall intensity (where overall intensity is 
defined as the mean intensity).

Prior to the fix we weren't correctly figuring out which array had the 
median overall intensity, and were instead simply picking the 
trunc(N/2)-th array, where N is the number of arrays. So if you had 8 
arrays, this function would choose the 4th array as the reference 
regardless of its intensity.

Now it actually chooses the array with the median overall intensity, so 
the difference between your old results and the new is caused by 
switching the array that is used as reference.
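
To make the difference concrete, here is a rough sketch in plain R. It is 
not the affy source: pick_reference_old and pick_reference_new are made-up 
names, the intensities are invented, and for an even number of arrays the 
sketch assumes the lower of the two middle ranks counts as "the median".

```r
# Sketch only, NOT the affy source. Function names are hypothetical.
# 'm' holds one overall (mean) intensity per array.

# Behavior described for the old code: take the trunc(N/2)-th array by
# position, without looking at the intensities at all.
pick_reference_old <- function(m) {
  trunc(length(m) / 2)
}

# Behavior described for the fixed code: take the array whose overall
# intensity is the median one. Assumption: with an even N, use the lower
# of the two middle ranks.
pick_reference_new <- function(m) {
  r <- rank(m, ties.method = "first")
  which(r == trunc(median(r)))
}

# Eight invented mean intensities, deliberately not in sorted order
m <- c(310, 120, 450, 980, 200, 275, 150, 600)

pick_reference_old(m)  # 4: the 4th array, which here is the brightest one
pick_reference_new(m)  # 6: the array ranked 4th of 8 by intensity
```

With these invented numbers the old rule picks the brightest array as the 
reference, while the new rule picks one from the middle of the range, which 
is the kind of switch that rescales every normalized value.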

Best,

Jim
>
> Martin
>
>
>> It's reporting expression values approx. 0.4-0.7x the values in the 
>> earlier setup (examples below).  I take for granted that many things 
>> have changed both at the R level and at the package level given the 
>> large leap in versions, but the difference seemed a bit odd since the 
>> results for other methods (RMA, GCRMA, MAS5) remained reasonably 
>> consistent in the update.
>>
>> It's quite possible that there is a package conflict or that I've 
>> accidentally broken something in the code base, though the legacy 
>> code itself is largely unchanged and the other regression tests seem 
>> to hold up pretty well.  I have looked at a number of things already, 
>> but before heading further down that path I wanted to check whether 
>> the implementation itself might have changed in a way where these 
>> differences would be expected.  Has there been a major change to 
>> liwong/invariantset somewhere along the way between 1.20.2 and 1.36.0?
>>
>> Checking with other members of the team, we're actually completely OK 
>> with the differences if they are in line with known changes to the 
>> affy package and can be explained to our users (this is a GenePattern 
>> module).
>>
>> Here is some example output for reference.  This is for a cut-down 
>> data set; I've seen similar results with 20 samples.  I can provide 
>> more if it would be helpful.
>> From the original setup (using write.table(exprs(eset))):
>> "CL20030502207AA.CEL" "CL20030502208AA.CEL" "CL20030502307AA.CEL" "CL20030502308AA.CEL"
>> "1007_s_at" 228.013212425507 214.877883873475 287.677963272274 306.997621485651
>> "1053_at" 193.206766296132 168.017787218035 169.430151901596 157.819950438341
>> <...snip...>
>>
>> From the new setup:
>> "CL20030502207AA.CEL" "CL20030502208AA.CEL" "CL20030502307AA.CEL" "CL20030502308AA.CEL"
>> "1007_s_at" 500.169414439674 461.734001304198 704.700664579873 735.43106514776
>> "1053_at" 321.921661422307 251.464504650157 261.491260842992 228.793486504634
>> <...snip...>
>>
>> Thanks in advance!
>> Regards,
>> David
>>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


