[BioC] Unable to Generate QC Report for mogene10stv1

James W. MacDonald jmacdon at med.umich.edu
Mon Dec 20 15:33:45 CET 2010


Hi Rick,

On 12/17/2010 9:24 PM, Rick Frausto wrote:
> Hey Jim,
>
> Ok, I will give that a go. The only problem is an ExpressionSet contains all
> of the necessary information for further analysis (e.g. phenodata,
> featuredata and annotation, etc - including, treatment type, cell type, time
> points, replicates). I am still learning how to include all of these for a
> complete ExpressionSet. As a starting point I've loaded a txt file
> containing some of this information (gene abbrev, ontology, probeset ID)
> which I created using Affymetrix's Expression Console software, without
> replicate, time point and cell type info. Doing this I've gotten as far as
> creating a minimal ExpressionSet, which I guess the functions you mention
> below do just that but with the information contained in the CEL file only.
>
> In any case, since as you say, the functions in the online manual create a
> proper ExpressionSet why would I get the issue of duplication?

Oh yeah, the original question ;-D. Try running QCReport() again, and
when it errors out run traceback() and send the output. Also include the
output of sessionInfo().
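
A minimal sketch of that sequence, assuming 'mydata' is still the object from your first message and that you are working in an interactive R session. The requireNamespace()/exists() guard is only an assumption added so the snippet also runs where affyQCReport or 'mydata' is missing; it is not required.

```r
# Assumption: 'mydata' is the object you passed to QCReport() before, and
# "ExampleQC.pdf" is simply the file name used in your first message.
if (requireNamespace("affyQCReport", quietly = TRUE) && exists("mydata")) {
  library(affyQCReport)
  QCReport(mydata, file = "ExampleQC.pdf")  # expected to stop with the error
}

# Run this right after the error: it prints the chain of calls that led to it
# (or "No traceback available" if nothing has failed yet).
traceback()

# R version, platform, and the versions of all attached and loaded packages.
sessionInfo()
```

Paste the printed output of the last two calls into your reply.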

Jim


>
> In regards to the 64-bit discussion. It may have very well made enough of a
> difference as it did not come up with the memory error the last time I tried
> it. Going to upgrade to 8GB RAM anyways, can't hurt.
>
> Cheers,
> Rick
>
>
> On 17/12/10 7:20 AM, "James W. MacDonald"<jmacdon at med.umich.edu>  wrote:
>
>> Hi Rick,
>>
>> On 12/16/2010 4:13 PM, Rick Frausto wrote:
>>> Hi Jim,
>>>
>>> How do I run an RMA analysis without a proper ExpressionSet? Honest answer, I
>>> don't know, I just put in a command line from a manual I found online and it
>>> spit out some result- see #3 Affy packages in following link (
>>> http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#biocon_intro).
>>
>> You are mistaken. All of the functions mentioned there result in a
>> proper ExpressionSet. And if you just do
>>
>> library(affy)
>> abatch <- ReadAffy()   # reads all CEL files in the working directory
>> eset <- rma(abatch)    # returns an ExpressionSet
>>
>> Then you will 100% surely get an ExpressionSet.
>>
>>>
>>> Perhaps you don't need an ExpressionSet until after the preprocessing, at
>>> least that is what I get from the "An Introduction to Bioconductor's
>>> ExpressionSet Class" written by Seth Falcon, Martin Morgan and Robert
>>> Gentleman. Everything seemed to be going smoothly until I tried to get a QC
>>> Report.
>>>
>>> Now, the answer for why I would want to do such a thing is easy. Simply that
>>> I don't know any better :) Just started working with R a few days ago, but
>>> I'm learning.
>>>
>>>
>>> Apparently Snow Leopard running on 32bit can only utilize about 3.2GB of
>>> RAM, whereas 64bit can make use of all 4GB. I'll switch to the 64 bit OS and
>>> see if it makes a difference.
>>
>> Well, it won't be much different. The reason a 32-bit OS can only use
>> about 3.2 Gb of RAM is that the OS needs some to run. The 64-bit OS also
>> needs to use some RAM, so you won't get all 4 Gb there either. The issue
>> is how much RAM can be allocated to a single process, and on a 64-bit OS
>> that gets bumped up significantly.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>> Thanks for your insight!
>>>
>>> Cheers,
>>> Rick
>>>
>>>
>>>
>>>
>>> On 16/12/10 11:31 AM, "James W. MacDonald"<jmacdon at med.umich.edu>   wrote:
>>>
>>>> Hi Rick,
>>>>
>>>> On 12/16/2010 12:57 PM, Rick Frausto wrote:
>>>>> Thanks Jim! How much memory would I need, I currently have 4GB, but have
>>>>> quite a few other programs running in the background...I'll see if closing
>>>>> them helps. Perhaps setting up an "ExpressionSet" would solve the problem.
>>>>> I just started reading up on how to set one of these up yesterday. Will do
>>>>> this and see if the duplicates will go away.
>>>>>
>>>>> The "mydata" originates from CEL files and then I run the RMA analysis on
>>>>> it, but I didn't actually set up a proper ExpressionSet. I'm guessing that
>>>>> doing this might reduce the QCReport PDF file size quite considerably since
>>>>> I won't have any duplication and will make further analysis easier.
>>>>
>>>> How do you run an RMA analysis without setting up a proper
>>>> ExpressionSet? The default behavior is to create one. In addition, why
>>>> would you want to do such a thing? The ExpressionSet class is
>>>> specifically designed to contain these sorts of data.
>>>>
>>>>
>>>>>
>>>>> I'm running Snow Leopard OSX which can be set up as 64bit. Would running as
>>>>> 64bit still necessitate more RAM?
>>>>
>>>> Probably. The difference isn't efficiency, but the ability to address
>>>> more RAM. A 32-bit OS can still address all the available memory that
>>>> you will have with just 4 Gb RAM, so you need to bump that up if you
>>>> want to do all the chips together. As for how much, I don't know. Since
>>>> RAM isn't that expensive these days, you might look at maxing your box out.
>>>>
>>>> Best,
>>>>
>>>> Jim
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> Thanks again,
>>>>> Rick
>>>>>
>>>>>
>>>>> On 15/12/10 7:45 AM, "James W. MacDonald"<jmacdon at med.umich.edu>    wrote:
>>>>>
>>>>>> Hi Rick,
>>>>>>
>>>>>> On 12/14/2010 3:55 PM, Rick Frausto wrote:
>>>>>>> Dear All,
>>>>>>>
>>>>>>> I have recently entered the world of R. Through some trial and error I'm
>>>>>>> becoming more familiar with R and the relevant Bioconductor Affy packages.
>>>>>>> I'm a molecular and cell biologist with rudimentary statistical knowledge
>>>>>>> and even less knowledge with respect to R.
>>>>>>>
>>>>>>> When I enter the following:
>>>>>>>
>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>>>
>>>>>>> I get some errors in return.
>>>>>>>
>>>>>>> Loading required package: lattice
>>>>>>> Error: cannot allocate vector of size 437.4 Mb
>>>>>>
>>>>>> This indicates that you need more RAM, as you are running out of memory.
>>>>>>
>>>>>>> In addition: Warning message:
>>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>>       some row.names duplicated:
>>>>>>>
>>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,
>>>>>>> 54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,
>>>>>>> 103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,
>>>>>>> 147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,
>>>>>>> 173,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,
>>>>>>> 210,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,
>>>>>>> 252,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,
>>>>>>> 296,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,
>>>>>>> 338,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,
>>>>>>> 382,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,
>>>>>>> 407,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,
>>>>>>> 449,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,
>>>>>>> 495,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [... truncated]
>>>>>>
>>>>>> What exactly is 'mydata', and how did you generate it? The above warning
>>>>>> indicates that you have duplicate row names, which IIRC isn't possible
>>>>>> to do with an ExpressionSet.
>>>>>>
>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error code=12)
>>>>>>> *** error: can't allocate region
>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error code=12)
>>>>>>> *** error: can't allocate region
>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>
>>>>>> More lack of memory errors.
>>>>>>
>>>>>>
>>>>>>> Error in help(dt[i], package = pkg[i], htmlhelp = TRUE) :
>>>>>>>       unused argument(s) (htmlhelp = TRUE)
>>>>>>> In addition: Warning messages:
>>>>>>> 1: In data(package = .packages(all.available = TRUE)) :
>>>>>>>       datasets have been moved from package 'base' to package 'datasets'
>>>>>>> 2: In data(package = .packages(all.available = TRUE)) :
>>>>>>>       datasets have been moved from package 'stats' to package 'datasets'
>>>>>>> starting httpd help server ... done
>>>>>>>
>>>>>>> Would someone be able to diagnose the problem and suggest a solution?
>>>>>>
>>>>>> First, get more RAM. Second, you will be better off using a 64-bit OS.
>>>>>> Depending on your hardware, you might be able to just install a 64-bit
>>>>>> version of R.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Jim
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> If it is useful, I am using the following R software: R for Mac OS X GUI
>>>>>>> 1.35-dev Leopard build 32-bit. If there is any other info that would be
>>>>>>> useful please let me know.
>>>>>>>
>>>>>>> I had a read of the AffyQCReport Package pdf and I have added the following
>>>>>>> line: QCReport(ReadAffy(widget=TRUE)). Then I tried
>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf") again. It now
>>>>>>> seems to be doing something, in other words it doesn't go to the error, yet,
>>>>>>> but it's been processing for about 10 minutes. I am analyzing 35 chips.
>>>>>>>
>>>>>>> Perhaps it would work if I tried to generate each QCReport page separately
>>>>>>> rather than as a whole.
>>>>>>>
>>>>>>> Cordially,
>>>>>>> Rick
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>
>>>
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues 


