[BioC] RMA and justRMA error

Wed Aug 16 02:25:48 CEST 2006

Hi Ben
I traced my problem down a bit more.  I transfer the CEL files by FTP as a 
.ZIP archive. If I uncompress them using WinZip on Windows, the files are 
OK.  However, I was using unzip on Linux, and this seems to alter the file 
contents.  Although the 1st quartile, median and 3rd quartile appear to be 
consistent (in the files I have checked), the min value and the max value 
seem to differ.  So unzip is extracting the files without reporting an 
error (gzip and gunzip don't appear to handle WinZip .ZIP archives), but 
it is clearly changing some of the characters.

Sorry, this is not a BioC problem.   But do you know if this is a known 
problem, or if there is a parameter that I should specify?
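One way to confirm whether the Linux extraction really changes the bytes is to compare checksums of the two copies. The sketch below is only an illustration: same_bytes() is a hypothetical helper, the temporary files stand in for CEL files, and the "bad" copy simulates one possible cause (a text-mode conversion that drops carriage returns). Info-ZIP unzip does have -a/-aa options that convert files it treats as text, and -b to treat all files as binary, but whether those options were involved here is an assumption, not something established in this thread.

```r
# Hypothetical helper: TRUE when two files are byte-for-byte identical,
# judged by their MD5 checksums (tools::md5sum returns NA for unreadable files).
same_bytes <- function(path_a, path_b) {
  sums <- unname(tools::md5sum(c(path_a, path_b)))
  !anyNA(sums) && sums[1] == sums[2]
}

# Temporary files standing in for one CEL file extracted three ways.
original <- tempfile(fileext = ".CEL")
copy_ok  <- tempfile(fileext = ".CEL")
copy_bad <- tempfile(fileext = ".CEL")

bytes <- as.raw(c(0x40, 0x0d, 0x0a, 0x01, 0x02, 0x0d, 0x0a, 0xff))
writeBin(bytes, original)
writeBin(bytes, copy_ok)
# Simulate a text-mode conversion that removes every carriage return (0x0d).
writeBin(bytes[bytes != as.raw(0x0d)], copy_bad)

print(same_bytes(original, copy_ok))
print(same_bytes(original, copy_bad))
```

In practice the two paths would be the WinZip-extracted and the unzip-extracted copy of the same CEL file.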

Thanks so much for all of your help
Regards
Aedin

***unzip details. I am using FC4***
UnZip 5.51 of 22 May 2004, by Info-ZIP.  Maintained by C. Spieler. 
Compiled with gcc 4.0.2 20051125 (Red Hat 4.0.2-8) for Unix (Linux ELF) 
on Feb 6 2006.

Ben Bolstad wrote:

>If you can send me the original CEL file I can take a look to see if it
>is something that I think should be a detectable parsing error.
>
>Ben
>
>
>On Tue, 2006-08-15 at 19:48 -0400, aedin wrote:
>  
>
>>Thanks Ben
>>Sorry I thought the same parser would apply to each method.  I found the 
>>culprit file using the approach you list below. 
>>
>>It was not obvious in any of the normal plots (hist, boxplot etc) as 
>>only one probeset had a ridiculous value (it was 5.6 x10^14).  This 
>>would completely skew a mean but not a median. 
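The point about the mean and the median can be checked with simulated numbers. This is a minimal sketch on made-up data: the uniform "intensities" are an assumption chosen only for illustration, and only the value 5.6e14 comes from the message above.

```r
set.seed(1)

# 999 made-up intensities in a plausible range, plus the one absurd value.
clean     <- runif(999, min = 50, max = 20000)
corrupted <- c(clean, 5.6e14)

summary_stats <- c(
  mean_clean       = mean(clean),
  mean_corrupted   = mean(corrupted),
  median_clean     = median(clean),
  median_corrupted = median(corrupted)
)
print(summary_stats)

# The quartiles barely move either, which is why hist() and boxplot()
# can look normal while max() gives the problem away.
print(quantile(corrupted, c(0.25, 0.5, 0.75)))
print(max(corrupted))
```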
>>
>>Should I be wary of this CEL file and discard it, or, if it looks OK in the 
>>hist and boxplot, should I try to keep it?   Do you know what would cause 
>>this?  How frequently does this occur?
>>
>>Thanks for your help
>>Aedin
>>
>>
>>Ben Bolstad wrote:
>>
>>    
>>
>>>The parsing code does not necessarily detect all potential corruptions.
>>>And you will find that gcrma() will quite happily process the "corrupt"
>>>data I show below.
>>>
>>>The error itself is from the density() function. If you could isolate
>>>the array that is causing trouble using say something like this:
>>>
>>>for (i in 1:4){
>>>  cat(i,"\n")  # print the array index before processing it
>>>  blah <- bg.correct.rma(Dilution.Corrupted[,i])  # last index printed = failing array
>>>}
>>>
>>>Then perhaps we could look at it a little closer.
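A variant of the loop above that does not stop at the first failure can be written with tryCatch(). This is a sketch under stated assumptions: find_failing() and stand_in() are hypothetical names, the data are a plain numeric matrix of made-up values, and stand_in() merely imitates a processing step that errors on implausible input. With real data, the function applied to each array would be something like bg.correct.rma, and the subsetting would have to follow the AffyBatch conventions rather than matrix columns.

```r
# Hypothetical helper: apply f to every column and return the indices
# of the columns for which f raised an error.
find_failing <- function(arrays, f) {
  failed <- vapply(seq_len(ncol(arrays)), function(i) {
    tryCatch({
      f(arrays[, i])
      FALSE
    }, error = function(e) {
      message("array ", i, " failed: ", conditionMessage(e))
      TRUE
    })
  }, logical(1))
  which(failed)
}

# Stand-in for the real processing step (an assumption for illustration):
# it errors when a column holds a non-finite or absurdly large value.
stand_in <- function(x) {
  if (any(!is.finite(x)) || max(x) > 1e6) stop("implausible intensity")
  invisible(x)
}

set.seed(7)
arrays <- matrix(runif(400, min = 50, max = 20000), ncol = 4)
arrays[1, 3] <- 5.6e14  # corrupt a single value in the third array

failing <- find_failing(arrays, stand_in)
print(failing)
```

Collecting all failures in one pass is useful when more than one file in a batch may be damaged.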
>>>
>>>best,
>>>
>>>Ben
>>>
>>>
>>>
>>>On Tue, 2006-08-15 at 18:13 -0400, aedin wrote:
>>> 
>>>>Dear Ben
>>>>Thanks for your reply. However if the data were corrupted, surely they
>>>>would not be read by ReadAffy and gcrma?
>>>>Aedin
>>>>
>>>>Ben Bolstad wrote: 
>>>>   
>>>>>Typically, when I have encountered others who have had this error occur
>>>>>it is because they have corrupted data. For instance this piece of
>>>>>demonstration code will generate the same error:
>>>>>
>>>>>
>>>>>library(affy);library(affydata)
>>>>>data(Dilution)
>>>>>Dilution.Corrupted <- Dilution
>>>>>pm(Dilution.Corrupted)[1,1] <- 30000000  
>>>>># that is an extreme value outside the
>>>>># range of normal raw probe intensities
>>>>>
>>>>>eset <- rma(Dilution.Corrupted)
>>>>>
>>>>>
>>>>>My suggestion would be to examine things along those lines.
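Examining things "along those lines" could start with a per-array range check before any background correction. The sketch below uses a plain matrix of made-up intensities rather than an AffyBatch; with affy loaded, a matrix of PM intensities could be taken from pm(). The upper bound of 65535 is an assumption (the largest value a 16-bit scanner reading can take) and should be adjusted if it does not apply to the platform in question.

```r
set.seed(42)

# Made-up raw intensities: 1000 probes on 4 arrays.
raw <- matrix(runif(4000, min = 50, max = 20000), ncol = 4,
              dimnames = list(NULL, paste0("array", 1:4)))
raw[1, 1] <- 30000000  # same kind of corruption as in the demonstration above

upper_bound <- 65535   # assumed ceiling for a plausible raw intensity

# Largest value on each array; any array above the bound is suspect.
per_array_max <- apply(raw, 2, max)
suspect <- names(per_array_max)[per_array_max > upper_bound]

print(per_array_max)
print(suspect)
```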
>>>>>
>>>>>Best,
>>>>>
>>>>>Ben
>>>>>
>>>>>On Tue, 2006-08-15 at 15:01 -0400, aedin wrote:
>>>>> 
>>>>>>Dear BioC
>>>>>>I know that this error is reported a few times on the Bioc mailing list, 
>>>>>>however no resolution to it is available in the archives (or at least 
>>>>>>none that google and I could find).  I get the same error whether I use 
>>>>>>R 2.3.1 or the devel version.  I enclose the devel version error.
>>>>>>
>>>>>>The CEL files are read in by ReadAffy and are processed OK by gcrma, 
>>>>>>but they fall over when I try to run rma or justRMA.
>>>>>>
>>>>>>Thanks for your help
>>>>>>Aedin
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>df = justRMA(filenames=filenam[125:130])
>>>>>>>              
>>>>>>>
>>>>>>Background correcting
>>>>>>Error in density.default(x, kernel = "epanechnikov", n = 2^14) :
>>>>>>       need at least 2 points to select a bandwidth automatically
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>df = ReadAffy(filenames=filenam[125:130])
>>>>>>>df
>>>>>>>              
>>>>>>>
>>>>>>AffyBatch object
>>>>>>size of arrays=1164x1164 features (63518 kb)
>>>>>>cdf=HG-U133_Plus_2 (54675 affyids)
>>>>>>number of samples=6
>>>>>>number of genes=54675
>>>>>>annotation=hgu133plus2
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>df.rma= rma(df)
>>>>>>>              
>>>>>>>
>>>>>>Background correcting
>>>>>>Error in density.default(x, kernel = "epanechnikov", n = 2^14) :
>>>>>>       need at least 2 points to select a bandwidth automatically
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>library(gcrma)
>>>>>>>df.gcrma= gcrma(df)
>>>>>>>              
>>>>>>>
>>>>>>Adjusting for optical effect......Done.
>>>>>>Computing affinities.Done.
>>>>>>Adjusting for non-specific binding......Done.
>>>>>>Normalizing
>>>>>>Calculating Expression
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>sessionInfo()
>>>>>>>              
>>>>>>>
>>>>>>R version 2.4.0 Under development (unstable) (2006-08-06 r38809)
>>>>>>i686-pc-linux-gnu
>>>>>>
>>>>>>locale:
>>>>>>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>>>>
>>>>>>attached base packages:
>>>>>>[1] "splines"   "tools"     "methods"   "stats"     "graphics"  "grDevices"
>>>>>>[7] "utils"     "datasets"  "base"
>>>>>>
>>>>>>other attached packages:
>>>>>>hgu133plus2probe   hgu133plus2cdf            gcrma      matchprobes
>>>>>>       "1.12.0"         "1.12.0"          "2.5.1"          "1.5.0"
>>>>>>           affy           affyio          Biobase            made4
>>>>>>       "1.11.6"          "1.1.5"        "1.11.24"          "1.7.1"
>>>>>>  scatterplot3d             ade4
>>>>>>       "0.3-24"          "1.4-1"
>>>>:-) 
>>>>
>>>>-- 
>>>>Aedín Culhane
>>>>Research Associate in Prof. J Quackenbush Lab
>>>>Harvard School of Public Health, Dana-Farber Cancer Institute
>>>>
>>>>
>>>>44 Binney Street, Mayer 232
>>>>Department of Biostatistics
>>>>Dana-Farber Cancer Institute
>>>>Boston, MA 02115
>>>>USA
>>>>
>>>>Phone: +1 (617) 632 2468
>>>>Fax:   +1 (617) 632 5444
>>>>Email: aedin at jimmy.harvard.edu
>>>>Web URL: http://www.hsph.harvard.edu/researchers/aculhane.html
>>>>
