[BioC] Unable to Generate QC Report for mogene10stv1

Rick Frausto ricardo.frausto at sydney.edu.au
Fri Jan 7 20:14:43 CET 2011


Hi James,

Below is the information you requested: the output of traceback() and
sessionInfo(). It doesn't seem like much to me, but perhaps it will help.
As you answer a lot of e-mails, I thought I'd remind you that this is in
regard to the "some row.names duplicated" error.

Hope your holidays were good!

-Rick

[R.app GUI 1.35 (5632) x86_64-apple-darwin9.8.0]

[Workspace restored from /Users/rickfrausto/.RData]
[History restored from /Users/rickfrausto/.Rapp.history]

> library(affy)
Loading required package: Biobase

Welcome to Bioconductor

  Vignettes contain introductory material. To view, type
  'openVignette()'. To cite Bioconductor, see
  'citation("Biobase")' and for packages 'citation(pkgname)'.

> mydata <- ReadAffy()
> eset <- rma(mydata)
Background correcting
Normalizing
Calculating Expression
> write.exprs(eset, file="mydata.txt")
> mypm <- pm(mydata)
> mymm <- mm(mydata)
> myaffyids <- probeNames(mydata)
> result <- data.frame(myaffyids, mypm, mymm)
> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
Loading required package: lattice
Warning message:
In data.row.names(row.names, rowsi, i) :
  some row.names duplicated:
4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,
53,54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,
99,102,103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,
139,141,142,147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,
168,169,170,171,173,175,176,179,180,183,184,185,186,191,192,195,197,198,
199,200,202,206,207,210,219,220,227,228,229,230,233,234,235,240,241,243,
245,246,248,249,250,251,252,253,257,259,260,266,271,272,276,277,280,281,
284,286,287,289,290,291,292,296,297,298,302,304,305,306,310,311,312,313,
317,318,319,321,322,324,334,337,338,339,340,341,345,346,350,351,356,359,
362,364,366,367,370,371,373,376,378,382,383,384,385,386,387,388,389,391,
394,395,397,398,399,400,402,403,405,406,407,409,410,411,415,416,418,419,
425,431,432,433,434,435,440,441,443,445,447,449,450,452,454,455,456,461,
464,466,470,472,473,481,487,488,491,492,493,494,495,496,497,498,499,501,
502,504,506,507,509,511,513,515,516,51 [... truncated]
Error in plot(qc(object)) :
  error in evaluating the argument 'x' in selecting a method for function 'plot'
> traceback()
2: plot(qc(object))
1: QCReport(mydata, file = "ExampleQC.pdf")
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] affyQCReport_1.28.1   lattice_0.19-13       mogene10stv1cdf_2.7.0
[4] affy_1.28.0           Biobase_2.10.0

loaded via a namespace (and not attached):
 [1] affyio_1.18.0         affyPLM_1.26.0        annotate_1.28.0
 [4] AnnotationDbi_1.12.0  Biostrings_2.18.2     DBI_0.2-5
 [7] gcrma_2.22.0          genefilter_1.32.0     grid_2.12.0
[10] IRanges_1.8.7         preprocessCore_1.12.0 RColorBrewer_1.0-2
[13] RSQLite_0.9-4         simpleaffy_2.26.1     splines_2.12.0
[16] survival_2.36-2       tools_2.12.0          xtable_1.5-6
> 
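
A note on the numbers in that warning: they look like positions of repeated
names rather than the names themselves. probeNames() returns one probe-set
name per probe, so a vector like myaffyids necessarily contains repeats.
The base-R sketch below uses a made-up toy vector (probe_ids is hypothetical,
not data from this thread) to show how such positions can be located and how
labels could be made unique. Whether this is what QCReport() trips over
internally is an assumption, not something established in this thread.

```r
# Hypothetical toy IDs standing in for probeNames(mydata): several probes
# share one probe-set name, so the names repeat.
probe_ids <- c("ps1", "ps1", "ps2", "ps2", "ps2", "ps3")

# Positions of every second-or-later occurrence of a name.
dup_positions <- which(duplicated(probe_ids))

# Number of entries in total versus number of distinct names.
n_total  <- length(probe_ids)
n_unique <- length(unique(probe_ids))

# One way to obtain unique labels if row names are required.
unique_ids <- make.unique(probe_ids)

print(dup_positions)
print(unique_ids)
```

Running the same three calls on myaffyids would show how many probe-set
names repeat on the real chips.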




On 20/12/10 6:33 AM, "James W. MacDonald" <jmacdon at med.umich.edu> wrote:

> Hi Rick,
> 
> On 12/17/2010 9:24 PM, Rick Frausto wrote:
>> Hey Jim,
>> 
>> Ok, I will give that a go. The only problem is an ExpressionSet contains all
>> of the necessary information for further analysis (e.g. phenodata,
>> featuredata and annotation, etc - including, treatment type, cell type, time
>> points, replicates). I am still learning how to include all of these for a
>> complete ExpressionSet. As a starting point I've loaded a txt file
>> containing some of this information (gene abbrev, ontology, probeset ID)
>> which I created using Affymetrix's Expression Console software, without
>> replicate, time point and cell type info. Doing this I've gotten as far as
>> creating a minimal ExpressionSet, which I guess the functions you mention
>> below do just that but with the information contained in the CEL file only.
>> 
>> In any case, since as you say, the functions in the online manual create a
>> proper ExpressionSet why would I get the issue of duplication?
> 
> Oh yeah, the original question ;-D. Try running QCReport() again, and
> when it errors out run traceback() and send the output. Also include the
> output of sessionInfo().
> 
> Jim
> 
> 
>> 
>> Regarding the 64-bit discussion: it may very well have made enough of a
>> difference, as the memory error did not come up the last time I tried
>> it. Going to upgrade to 8GB RAM anyway; can't hurt.
>> 
>> Cheers,
>> Rick
>> 
>> 
>> On 17/12/10 7:20 AM, "James W. MacDonald"<jmacdon at med.umich.edu>  wrote:
>> 
>>> Hi Rick,
>>> 
>>> On 12/16/2010 4:13 PM, Rick Frausto wrote:
>>>> Hi Jim,
>>>> 
>>>> How do I run an RMA analysis without a proper ExpressionSet? Honest
>>>> answer: I don't know. I just put in a command line from a manual I found
>>>> online and it spit out some result; see "#3 Affy packages" at the
>>>> following link:
>>>> http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#biocon_intro
>>> 
>>> You are mistaken. All of the functions mentioned there result in a
>>> proper ExpressionSet. And if you just do
>>> 
>>> abatch<- ReadAffy()
>>> eset<- rma(abatch)
>>> 
>>> Then you will 100% surely get an ExpressionSet.
>>> 
>>>> 
>>>> Perhaps you don't need an ExpressionSet until after the preprocessing, at
>>>> least that is what I get from the "An Introduction to Bioconductor's
>>>> ExpressionSet Class" written by Seth Falcon, Martin Morgan and Robert
>>>> Gentleman. Everything seemed to be going smoothly until I tried to get a QC
>>>> Report.
>>>> 
>>>> Now, the answer for why I would want to do such a thing is easy. Simply
>>>> that I don't know any better :) Just started working with R a few days
>>>> ago, but I'm learning.
>>>> 
>>>> 
>>>> Apparently Snow Leopard running on 32bit can only utilize about 3.2GB of
>>>> RAM, whereas 64bit can make use of all 4GB. I'll switch to the 64 bit OS
>>>> and see if it makes a difference.
>>> 
>>> Well, it won't be much different. The reason a 32-bit OS can only use
>>> about 3.2 Gb of RAM is that the OS needs some to run. The 64-bit OS also
>>> needs to use some RAM, so you won't get all 4 Gb there either. The issue
>>> is how much RAM can be allocated to a single process, and on a 64-bit OS
>>> that gets bumped up significantly.
>>> 
>>> Best,
>>> 
>>> Jim
>>> 
>>> 
>>> 
>>>> 
>>>> Thanks for your insight!
>>>> 
>>>> Cheers,
>>>> Rick
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 16/12/10 11:31 AM, "James W. MacDonald"<jmacdon at med.umich.edu>   wrote:
>>>> 
>>>>> Hi Rick,
>>>>> 
>>>>> On 12/16/2010 12:57 PM, Rick Frausto wrote:
>>>>>> Thanks Jim! How much memory would I need? I currently have 4GB, but have
>>>>>> quite a few other programs running in the background...I'll see if
>>>>>> closing them helps. Perhaps setting up an "ExpressionSet" would solve
>>>>>> the problem. I just started reading up on how to set one of these up
>>>>>> yesterday. Will do this and see if the duplicates will go away.
>>>>>> 
>>>>>> The "mydata" originates from CEL files and then I run the RMA analysis on
>>>>>> it, but I didn't actually set up a proper ExpressionSet. I'm guessing
>>>>>> that doing this might reduce the QCReport PDF file size quite
>>>>>> considerably since I won't have any duplication and will make further
>>>>>> analysis easier.
>>>>> 
>>>>> How do you run an RMA analysis without setting up a proper
>>>>> ExpressionSet? The default behavior is to create one. In addition, why
>>>>> would you want to do such a thing? The ExpressionSet class is
>>>>> specifically designed to contain these sorts of data.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> I'm running Snow Leopard OSX which can be set up as 64bit. Would running
>>>>>> as 64bit still necessitate more RAM?
>>>>> 
>>>>> Probably. The difference isn't efficiency, but the ability to address
>>>>> more RAM. A 32-bit OS can still address all the available memory that
>>>>> you will have with just 4 Gb RAM, so you need to bump that up if you
>>>>> want to do all the chips together. As for how much, I don't know. Since
>>>>> RAM isn't that expensive these days, you might look at maxing your box
>>>>> out.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Thanks again,
>>>>>> Rick
>>>>>> 
>>>>>> 
>>>>>> On 15/12/10 7:45 AM, "James W. MacDonald"<jmacdon at med.umich.edu> wrote:
>>>>>> 
>>>>>>> Hi Rick,
>>>>>>> 
>>>>>>> On 12/14/2010 3:55 PM, Rick Frausto wrote:
>>>>>>>> Dear All,
>>>>>>>> 
>>>>>>>> I have recently entered the world of R. Through some trial and error
>>>>>>>> I'm becoming more familiar with R and the relevant Bioconductor Affy
>>>>>>>> packages. I'm a molecular and cell biologist with rudimentary
>>>>>>>> statistical knowledge and even less knowledge with respect to R.
>>>>>>>> 
>>>>>>>> When I enter the following:
>>>>>>>> 
>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>>>> 
>>>>>>>> I get some errors in return.
>>>>>>>> 
>>>>>>>> Loading required package: lattice
>>>>>>>> Error: cannot allocate vector of size 437.4 Mb
>>>>>>> 
>>>>>>> This indicates that you need more RAM, as you are running out of memory.
>>>>>>> 
>>>>>>>> In addition: Warning message:
>>>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>>>       some row.names duplicated:
>>>>>>>> 
>>>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,
>>>>>>>> 53,54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,
>>>>>>>> 99,102,103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,
>>>>>>>> 139,141,142,147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,
>>>>>>>> 168,169,170,171,173,175,176,179,180,183,184,185,186,191,192,195,197,198,
>>>>>>>> 199,200,202,206,207,210,219,220,227,228,229,230,233,234,235,240,241,243,
>>>>>>>> 245,246,248,249,250,251,252,253,257,259,260,266,271,272,276,277,280,281,
>>>>>>>> 284,286,287,289,290,291,292,296,297,298,302,304,305,306,310,311,312,313,
>>>>>>>> 317,318,319,321,322,324,334,337,338,339,340,341,345,346,350,351,356,359,
>>>>>>>> 362,364,366,367,370,371,373,376,378,382,383,384,385,386,387,388,389,391,
>>>>>>>> 394,395,397,398,399,400,402,403,405,406,407,409,410,411,415,416,418,419,
>>>>>>>> 425,431,432,433,434,435,440,441,443,445,447,449,450,452,454,455,456,461,
>>>>>>>> 464,466,470,472,473,481,487,488,491,492,493,494,495,496,497,498,499,501,
>>>>>>>> 502,504,506,507,509,511,513,515,516,51 [... truncated]
>>>>>>> 
>>>>>>> What exactly is 'mydata', and how did you generate it? The above error
>>>>>>> indicates that you have duplicate row names, which IIRC isn't possible
>>>>>>> to do with an ExpressionSet.
>>>>>>> 
>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error code=12)
>>>>>>>> *** error: can't allocate region
>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error code=12)
>>>>>>>> *** error: can't allocate region
>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>> 
>>>>>>> More lack of memory errors.
>>>>>>> 
>>>>>>> 
>>>>>>>> Error in help(dt[i], package = pkg[i], htmlhelp = TRUE) :
>>>>>>>>       unused argument(s) (htmlhelp = TRUE)
>>>>>>>> In addition: Warning messages:
>>>>>>>> 1: In data(package = .packages(all.available = TRUE)) :
>>>>>>>>       datasets have been moved from package 'base' to package 'datasets'
>>>>>>>> 2: In data(package = .packages(all.available = TRUE)) :
>>>>>>>>       datasets have been moved from package 'stats' to package 'datasets'
>>>>>>>> starting httpd help server ... done
>>>>>>>> 
>>>>>>>> Would someone be able to diagnose the problem and suggest a solution?
>>>>>>> 
>>>>>>> First, get more RAM. Second, you will be better off using a 64-bit OS.
>>>>>>> Depending on your hardware, you might be able to just install a 64-bit
>>>>>>> version of R.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Jim
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> If it is useful, I am using the following R software: R for Mac OS X
>>>>>>>> GUI 1.35-dev Leopard build 32-bit. If there is any other info that
>>>>>>>> would be useful please let me know.
>>>>>>>> 
>>>>>>>> I had a read of the AffyQCReport Package pdf and I have added the
>>>>>>>> following line: QCReport(ReadAffy(widget=TRUE)). Then I tried
>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf") again.
>>>>>>>> It now seems to be doing something, in other words it doesn't go to
>>>>>>>> the error, yet, but it's been processing for about 10 minutes. I am
>>>>>>>> analyzing 35 chips.
>>>>>>>> 
>>>>>>>> Perhaps it would work if I tried to generate each QCReport page
>>>>>>>> separately rather than as a whole.
>>>>>>>> 
>>>>>>>> Cordially,
>>>>>>>> Rick
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> Bioconductor mailing list
>>>>>>>> Bioconductor at r-project.org
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>> Search the archives:
>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>> 
>>>> 
>> 
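
As a quick check on the memory messages quoted above: the mmap size in the
malloc line and the "437.4 Mb" in R's error appear to describe the same
request, assuming R reports Mb in units of 2^20 bytes. The sketch also works
out the 4 GB ceiling of a 32-bit address space, which fits Jim's point that
the real limit is how much a single process can be allocated. Variable names
are illustrative only.

```r
# Size of the failed request, copied from the malloc message in the thread.
bytes_requested <- 458665984

# Assumption: R's "Mb" means 2^20 bytes.
mb_requested <- bytes_requested / 2^20

# The same request expressed as a count of 8-byte doubles.
doubles_requested <- bytes_requested / 8

# Upper bound on what a 32-bit address space can address, in GB (2^30 bytes).
addressable_32bit_gb <- 2^32 / 2^30

print(round(mb_requested, 1))
print(addressable_32bit_gb)
```

A single contiguous block of this size has to fit inside whatever address
space the process still has free, which is why a 32-bit R session can fail
on it even when the machine has 4GB installed.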

-- 
Rick Frausto
PhD Candidate
The University of Sydney
School of Molecular Bioscience G08
Camperdown, NSW 2006 AUSTRALIA
ricardo.frausto at sydney.edu.au
Phone: 61 2 9036 5354
Lab of Iain L. Campbell


