[BioC] Unable to Generate QC Report for mogene10stv1

Tue Jan 11 20:57:45 CET 2011

Thanks for all your help Jim!

On 11/01/11 6:58 AM, "James W. MacDonald" <jmacdon at med.umich.edu> wrote:

> Hi Rick,
> 
> On 1/10/2011 4:57 PM, Rick Frausto wrote:
>> Hi Jim,
>> 
>> You're right...
>> 
>>> any(duplicated(unlist(indexProbes(mydata, "both"))))
>> [1] TRUE
>>> 
>> 
>> Figured it would be something simple, almost always is. Guess since the MM
>> values are only really necessary for calculating a "real" PM value I should
>> generally still be ok with using R Bioconductor packages for downstream
>> analysis of these chips?? For example, using eset<-rma() to normalize my
>> data should still be ok.
> 
> Yep. RMA only uses PM values, so this will be fine. You only get into
> trouble when trying to use mas5 based methods.
> 
>> 
>> By the way, the documentation on the AffyQCReport function regarding
>> signalDist() states that "The first is a boxplot plot of the all pm
>> intensities and the second plot consists of kernel density estimates of
>> these intensities." From this it would seem to a novice like me that it only
>> uses PM values, clearly I'm not correct. I guess these are PM values
>> adjusted for the MM signal.
> 
> Nope, they aren't adjusted for MM, they just include the MM values as
> well. Here is a little primer on how to see what is going on.
> 
> If you load the affyQCReport package and then type signalDist at the R
> prompt, you will get this:
> 
>> signalDist
> function (object)
> {
>      par(mfrow = c(2, 1))
>      ArrayIndex = as.character(1:length(sampleNames(object)))
>      boxplot(object, names = ArrayIndex, ylab = "Log2(Intensity)",
>          xlab = "Array Index")
>      hist(x = object, lt = 1:length(ArrayIndex), col = 1:length(ArrayIndex),
>          which = "both")
>      temppar <- par()
>      legend(((temppar$xaxp[2] - temppar$xaxp[1])/temppar$xaxp[3]) *
>          (temppar$xaxp[3] - 1) + temppar$xaxp[1], temppar$yaxp[2],
>          as.character(ArrayIndex), lt = 1:length(ArrayIndex),
>          col = 1:length(ArrayIndex), cex = 0.5)
> }
> <environment: namespace:affyQCReport>
> 
> So you can see that we are calling boxplot() as well as hist() on the
> 'object', which is an AffyBatch. Let's see what boxplot() and hist() do.
> 
>> boxplot
> standardGeneric for "boxplot" defined from package "graphics"
> 
> function (x, ...)
> standardGeneric("boxplot")
> <environment: 0x184ea378>
> Methods may be defined for arguments: x
> Use  showMethods("boxplot")  for currently available ones.
> 
> So this is an S4 generic, and its methods are slightly harder to get to,
> but let's follow the prescription on the last line.
> 
>> showMethods(boxplot, class = "AffyBatch", includeDefs = TRUE)
> Function: boxplot (package graphics)
> x="AffyBatch"
> function (x, ...)
> {
>      .local <- function (x, which = "both", range = 0, main, ...)
>      {
>          tmp <- description(x)
>          if (missing(main) && (is(tmp, "MIAME")))
>              main <- tmp@title
>          tmp <- unlist(indexProbes(x, which))
>          tmp <- tmp[seq(1, length(tmp), len = 5000)]
>          boxplot(data.frame(log2(intensity(x)[tmp, ])), main = main,
>              range = range, ...)
>      }
>      .local(x, ...)
> }
> 
> Note two things here. I added in class = "AffyBatch", because there may
> be other boxplot methods for other objects, and we really don't care
> about them. Additionally, I included includeDefs = TRUE, which will
> cause the function to be output.
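
A minimal sketch of the same lookup on a toy S4 class, in case it helps to try the showMethods() idiom without an AffyBatch to hand. ToyBatch and toySummary are made-up names for illustration only and are not part of affy or any other package. Because the method has a 'which' argument that the generic lacks, R wraps the body in a .local() function, which would explain why the AffyBatch method above looks the way it does.

```r
library(methods)

# Hypothetical toy class standing in for AffyBatch (illustration only).
setClass("ToyBatch", representation(pm = "numeric", mm = "numeric"))

# A generic with the same (x, ...) signature as the boxplot generic.
setGeneric("toySummary", function(x, ...) standardGeneric("toySummary"))

# The method adds a 'which' argument that the generic does not have,
# so R stores the body inside a .local() wrapper.
setMethod("toySummary", "ToyBatch", function(x, which = "both", ...) {
    values <- switch(which,
                     pm = x@pm,
                     mm = x@mm,
                     both = c(x@pm, x@mm))
    range(values)
})

# Print only the method for this class, including its definition.
showMethods("toySummary", classes = "ToyBatch", includeDefs = TRUE)

# getMethod() returns the same definition as an object.
toy_method <- getMethod("toySummary", "ToyBatch")

toy <- new("ToyBatch", pm = c(5, 7, 9), mm = c(1, 2, 3))
toy_default <- toySummary(toy)                # uses which = "both"
toy_pm_only <- toySummary(toy, which = "pm")  # overrides the default
```

showMethods() prints the definition, while getMethod() returns it as an object if you would rather inspect it programmatically.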
> 
> The .local function has a default of which = 'both', and you see that
> argument is used for the call to indexProbes (also note that there is a
> '...' argument to .local, that could be used to pass in a which = "pm"
> in signalDist() to override the default, but it is not, so the help page
> is incorrect). If you look at ?indexProbes, you will see this in the
> methods section:
> 
> indexProbes 'signature(object = "AffyBatch", which =
>            "character")': returns a list with locations of the probes in
>            each probe set. The affyID corresponding to the probe set to
>            retrieve can be specified in an optional parameter
>            'genenames'. By default, all the affyIDs are retrieved. The
>            names of the elements in the list returned are the affyIDs.
>            'which' can be "pm", "mm", or "both". If "both" then perfect
>            match locations are given followed by mismatch locations.
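
The point about the '...' argument can be sketched with two small wrappers. This is only an illustration under assumed names (probe_values, dist_fixed and dist_forward are hypothetical and do not exist in affy or affyQCReport): a wrapper that never passes 'which' always gets the "both" default, whereas one that forwards '...' lets the caller pick "pm".

```r
# Hypothetical inner function: like the .local() above, it defaults to "both".
probe_values <- function(probes, which = "both", ...) {
    which <- match.arg(which, c("both", "pm", "mm"))
    switch(which,
           pm = probes$pm,
           mm = probes$mm,
           both = c(probes$pm, probes$mm))
}

# Wrapper that never passes 'which', so the default always applies.
dist_fixed <- function(probes) probe_values(probes)

# Wrapper that forwards extra arguments, so callers can choose.
dist_forward <- function(probes, ...) probe_values(probes, ...)

# Made-up values, for illustration only.
probes <- list(pm = c(10, 11, 12), mm = c(1, 2, 3))
n_fixed <- length(dist_fixed(probes))                  # PM and MM together
n_pm    <- length(dist_forward(probes, which = "pm"))  # PM only
```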
> 
> The warning you get comes from here:
> 
> tmp <- unlist(indexProbes(x, which))
> tmp <- tmp[seq(1, length(tmp), len = 5000)]
> boxplot(data.frame(log2(intensity(x)[tmp, ])), main = main,
>              range = range, ...)
> 
> Which is basically getting a subset of 5000 probes to create the
> boxplot. Since half of your indices from indexProbes() will be NA, a
> bunch of the tmp variable will be NAs as well. We can re-create the
> warning you get below with a little example:
> 
>> x <- matrix(rnorm(100), ncol = 10)
>> row.names(x) <- letters[1:10]
>> z <- data.frame(x[c(1,2,3,NA,4,5,NA),])
> Warning message:
> In data.row.names(row.names, rowsi, i) :
>    some row.names duplicated: 7 --> row.names NOT used
> 
> Best,
> 
> Jim
> 
> 
>> 
>> Thanks for figuring this out for me. Let me know if these and other related
>> questions would be better served as standalone e-mails.
>> 
>> Cheers,
>> Rick
>> 
>> 
>> 
>> On 10/01/11 7:04 AM, "James W. MacDonald"<jmacdon at med.umich.edu>  wrote:
>> 
>>> Hi Rick,
>>> 
>>> After all that, the reason is really simple. You are trying to use
>>> affyQCReport on a PM-only chip, which isn't going to work out so well. I
>>> don't have any mogene data around to play with (and don't have the time
>>> to go searching), so I will have to make some educated guesses.
>>> 
>>> Internally in signalDist() you are calling boxplot() and hist() on your
>>> AffyBatch. And the default for both functions is to use both PM and MM
>>> probes. I'm betting that
>>> 
>>> any(duplicated(unlist(indexProbes(mydata, "both"))))
>>> 
>>> returns TRUE, indicating that indexProbes doesn't work correctly on a
>>> PM-only chip, which is fair enough, as it was never designed to do so.
>>> 
>>> And plot(qc(mydata)) will never work, as it relies on computing a
>>> Wilcoxon signed-rank test between the PM and MM probes, and since you
>>> don't have MM probes, well you get the picture...
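
A rough illustration of why a paired PM/MM test cannot be computed on a PM-only chip. This uses made-up intensities and base R's wilcox.test(); it is not the code that qc() actually runs, just the same kind of signed-rank comparison. The all-NA mm_missing vector is an assumption standing in for what a chip without mismatch probes would supply.

```r
# Made-up intensities for eight probe pairs (illustration only).
pm <- c(120, 135, 150, 110, 180, 160, 140, 170)
mm <- pm - c(11, 12, 13, 14, 15, 16, 17, 18)

# With both PM and MM present, the paired signed-rank test runs normally.
res <- wilcox.test(pm, mm, paired = TRUE, alternative = "greater")

# On a PM-only chip there is nothing to pair with; model that as all NA.
mm_missing <- rep(NA_real_, length(pm))
res_missing <- tryCatch(
    wilcox.test(pm, mm_missing, paired = TRUE, alternative = "greater"),
    error = function(e) e
)

# TRUE when the test could not be computed at all.
is_error <- inherits(res_missing, "error")
```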
>>> 
>>> Best,
>>> 
>>> Jim
>>> 
>>> 
>>> 
>>> On 1/7/2011 6:56 PM, Rick Frausto wrote:
>>>> Hi Jim,
>>>> 
>>>> Ok, so after doing a bit of reading and re-reading I was eventually able to
>>>> generate each page in a quartz window that the "QCReport" function should
>>>> also generate. I found which ones give me the errors. So, there should be 6
>>>> pages in total. Page 2 gives me the duplication error and page 3 gives me
>>>> the error in evaluating the argument x. The other pages are ok and are
>>>> generated as expected.
>>>> 
>>>> In brief, page 2 is supposed to be generated with the "signalDist(mydata)"
>>>> command. Page 3 is supposed to be generated with the "plot(qc(mydata))"
>>>> command.
>>>> 
>>>> So, I guess there must be particular requirements for these commands that
>>>> I'm missing. I've included the session below along with traceback() and
>>>> sessionInfo().
>>>> 
>>>> 
>>>> R version 2.12.0 (2010-10-15)
>>>> Copyright (C) 2010 The R Foundation for Statistical Computing
>>>> ISBN 3-900051-07-0
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>> 
>>>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>>>> You are welcome to redistribute it under certain conditions.
>>>> Type 'license()' or 'licence()' for distribution details.
>>>> 
>>>>     Natural language support but running in an English locale
>>>> 
>>>> R is a collaborative project with many contributors.
>>>> Type 'contributors()' for more information and
>>>> 'citation()' on how to cite R or R packages in publications.
>>>> 
>>>> Type 'demo()' for some demos, 'help()' for on-line help, or
>>>> 'help.start()' for an HTML browser interface to help.
>>>> Type 'q()' to quit R.
>>>> 
>>>> [R.app GUI 1.35 (5632) x86_64-apple-darwin9.8.0]
>>>> 
>>>> [Workspace restored from /Users/rickfrausto/.RData]
>>>> [History restored from /Users/rickfrausto/.Rapp.history]
>>>> 
>>>>> library(simpleaffy)
>>>> Loading required package: affy
>>>> Loading required package: Biobase
>>>> 
>>>> Welcome to Bioconductor
>>>> 
>>>>     Vignettes contain introductory material. To view, type
>>>>     'openVignette()'. To cite Bioconductor, see
>>>>     'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>> 
>>>> Loading required package: genefilter
>>>> Loading required package: gcrma
>>>> 
>>>> Attaching package: 'simpleaffy'
>>>> 
>>>> The following object(s) are masked _by_ '.GlobalEnv':
>>>> 
>>>>       getBioC
>>>> 
>>>>> library(affy)
>>>>> mydata<- ReadAffy()
>>>>> eset<- rma(mydata)
>>>> Background correcting
>>>> Normalizing
>>>> Calculating Expression
>>>>> library(affycoretools); affystart(plot=T, express="rma")
>>>> Loading required package: GO.db
>>>> Loading required package: AnnotationDbi
>>>> Loading required package: DBI
>>>> Loading required package: KEGG.db
>>>> Background correcting
>>>> Normalizing
>>>> Calculating Expression
>>>> Please give the x-coordinate for a legend.30
>>>> Please give the y-coordinate for a legend.80
>>>> ExpressionSet (storageMode: lockedEnvironment)
>>>> assayData: 34760 features, 35 samples
>>>>     element names: exprs
>>>> protocolData
>>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>>     varLabels: ScanDate
>>>>     varMetadata: labelDescription
>>>> phenoData
>>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>>     varLabels: sample
>>>>     varMetadata: labelDescription
>>>> featureData: none
>>>> experimentData: use 'experimentData(object)'
>>>> Annotation: mogene10stv1
>>>>> write.exprs(eset, file="mydata.txt")
>>>>> x<- data.frame(exprs(eset), exprs(eset_PMA), assayDataElement(eset_PMA,
>>>> "se.exprs")); x<- x[,sort(names(x))]; write.table(x, file="mydata_PMA.xls",
>>>> quote=F, col.names = NA, sep="\t")
>>>> Error in exprs(eset_PMA) :
>>>>     error in evaluating the argument 'object' in selecting a method for
>>>> function 'exprs'
>>>>> mypm<- pm(mydata)
>>>>> mymm<- mm(mydata)
>>>>> myaffyids<- probeNames(mydata)
>>>>> result<- data.frame(myaffyids, mypm, mymm)
>>>>> eset; pData(eset)
>>>> ExpressionSet (storageMode: lockedEnvironment)
>>>> assayData: 34760 features, 35 samples
>>>>     element names: exprs
>>>> protocolData
>>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>>     varLabels: ScanDate
>>>>     varMetadata: labelDescription
>>>> phenoData
>>>>     sampleNames: A_WT1_NT_2hr.CEL B_WT1_NT_2hr.CEL ...
>>>>       ZI_ST1KO_HIL6_12hr.CEL (35 total)
>>>>     varLabels: sample
>>>>     varMetadata: labelDescription
>>>> featureData: none
>>>> experimentData: use 'experimentData(object)'
>>>> Annotation: mogene10stv1
>>>>                          sample
>>>> A_WT1_NT_2hr.CEL            1
>>>> B_WT1_NT_2hr.CEL            2
>>>> C_WT1_NT_12hr.CEL           3
>>>> D_WT1_NT_12hr.CEL           4
>>>> E_WT1_HIL6_2hr.CEL          5
>>>> F_WT1_HIL6_2hr.CEL          6
>>>> G_WT1_HIL6_12hr.CEL         7
>>>> H_WT1_HIL6_12hr.CEL         8
>>>> I_FF_NT_2hr.CEL             9
>>>> J_FF_NT_2hr.CEL            10
>>>> K_FF_NT_12hr.CEL           11
>>>> L_FF_NT_12hr.CEL           12
>>>> M_FF_HIL6_2hr.CEL          13
>>>> N_FF_HIL6_2hr.CEL          14
>>>> O_FF_HIL6_12hr.CEL         15
>>>> P_FF_HIL6_12hr.CEL         16
>>>> Q_WT2_NT_2hr.CEL           17
>>>> R_WT2_NT_2hr.CEL           18
>>>> S_WT2_NT_12hr.CEL          19
>>>> T_WT2_NT_12hr.CEL          20
>>>> U_WT2_HIL6_2hr.CEL         21
>>>> V_WT2_HIL6_2hr.CEL         22
>>>> W_WT2_HIL6_12hr.CEL        23
>>>> X_WT2_HIL6_12hr.CEL        24
>>>> Y_DD_NT_2hr.CEL            25
>>>> Z_DD_NT_2hr.CEL            26
>>>> ZA_DD_NT_12hr.CEL          27
>>>> ZB_DD_NT_12hr.CEL          28
>>>> ZC_DD_HIL6_2hr.CEL         29
>>>> ZD_DD_HIL6_2hr.CEL         30
>>>> ZE_DD_HIL6_12hr.CEL        31
>>>> ZF_DD_HIL6_12hr.CEL        32
>>>> ZG_ST1KO_NT_2hr.CEL        33
>>>> ZH_ST1KO_HIL6_2hr.CEL      34
>>>> ZI_ST1KO_HIL6_12hr.CEL     35
>>>>> data.frame(eset)
>>>>                          X10338001 X10338003 X10338004 X10338017 X10338025
>>>> A_WT1_NT_2hr.CEL        11.71717 10.183620  9.440631  12.79412  8.823529
>>>> B_WT1_NT_2hr.CEL        11.78778 10.027760  9.489226  12.98544  8.843002
>>>>                          X10338026 X10338029 X10338035 X10338036 X10338037
>>>> A_WT1_NT_2hr.CEL        13.22585  9.405038  8.853564  9.379031  3.661987
>>>> B_WT1_NT_2hr.CEL        13.29043  9.575309  8.772872  9.513050  3.514885
>>>>                          X10338041 X10338042 X10338044 X10338047 X10338056
>>>> A_WT1_NT_2hr.CEL        10.94638 10.116516  11.88296  8.872839  3.133222
>>>> B_WT1_NT_2hr.CEL        11.23276 10.134084  12.03381  7.568584  3.088548
>>>>                          X10338059 X10338060 X10338063 X10338064 X10338065
>>>> 
>>>> JIM, I TRUNCATED THIS LIST, BUT THOUGHT IT MIGHT BE USEFUL IN DIAGNOSING
>>>> THE PROBLEMS I'M HAVING. SESSION IS CONTINUED BELOW.
>>>> 
>>>>> library(affyQCReport)
>>>> Loading required package: lattice
>>>>> titlePage(mydata)
>>>> [1] TRUE
>>>>> signalDist(mydata)
>>>> Warning message:
>>>> In data.row.names(row.names, rowsi, i) :
>>>>     some row.names duplicated:
>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,
>>>> 54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,
>>>> 103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,
>>>> 147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,
>>>> 173,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,
>>>> 210,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,
>>>> 252,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,
>>>> 296,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,
>>>> 338,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,
>>>> 382,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,
>>>> 407,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,
>>>> 449,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,
>>>> 495,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [... truncated]
>>>>> plot(qc(mydata))
>>>> Error in plot(qc(mydata)) :
>>>>     error in evaluating the argument 'x' in selecting a method for function
>>>> 'plot'
>>>>> borderQC1(mydata)
>>>> [1] TRUE
>>>>> borderQC2(mydata)
>>>> [1] TRUE
>>>>> correlationPlot(mydata)
>>>> [1] TRUE
>>>>> titlePage(mydata)
>>>> [1] TRUE
>>>>> titlePage(mydata)
>>>> Error in polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05)) :
>>>>     plot.new has not been called yet
>>>>> correlationPlot(mydata)
>>>> [1] TRUE
>>>>> titlePage(mydata)
>>>> Error in polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05)) :
>>>>     plot.new has not been called yet
>>>> In addition: Warning message:
>>>> Display list redraw incomplete
>>>>> borderQC1(mydata)
>>>> [1] TRUE
>>>>> titlePage(mydata)
>>>> [1] TRUE
>>>>> titlePage(mydata)
>>>> Error in polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05)) :
>>>>     plot.new has not been called yet
>>>>> traceback()
>>>> 2: polygon(c(0, 0, 0.9, 0.9, 0), c(0.05, 0.95, 0.95, 0.05, 0.05))
>>>> 1: titlePage(mydata)
>>>>> sessionInfo()
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>> 
>>>> locale:
>>>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>> 
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>> 
>>>> other attached packages:
>>>>    [1] affyQCReport_1.28.1   lattice_0.19-13       affycoretools_1.22.0
>>>>    [4] KEGG.db_2.4.5         GO.db_2.4.5           RSQLite_0.9-4
>>>>    [7] DBI_0.2-5             AnnotationDbi_1.12.0  mogene10stv1cdf_2.7.0
>>>> [10] simpleaffy_2.26.1     gcrma_2.22.0          genefilter_1.32.0
>>>> [13] affy_1.28.0           Biobase_2.10.0
>>>> 
>>>> loaded via a namespace (and not attached):
>>>>    [1] affyio_1.18.0         affyPLM_1.26.0        annaffy_1.22.0
>>>>    [4] annotate_1.28.0       biomaRt_2.6.0         Biostrings_2.18.2
>>>>    [7] Category_2.16.0       GOstats_2.16.0        graph_1.28.0
>>>> [10] grid_2.12.0           GSEABase_1.12.2       IRanges_1.8.7
>>>> [13] limma_3.6.9           preprocessCore_1.12.0 RBGL_1.26.0
>>>> [16] RColorBrewer_1.0-2    RCurl_1.4-3           splines_2.12.0
>>>> [19] survival_2.36-2       tools_2.12.0          XML_3.2-0
>>>> [22] xtable_1.5-6
>>>>> 
>>>> 
>>>> On 7/01/11 12:47 PM, "James W. MacDonald"<jmacdon at med.umich.edu>   wrote:
>>>> 
>>>>> Hi Rick,
>>>>> 
>>>>> What happens if you load the simpleaffy package first?
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Jim
>>>>> 
>>>>> On 1/7/2011 2:14 PM, Rick Frausto wrote:
>>>>>> Hi James,
>>>>>> 
>>>>>> Below is the information that you requested - traceback() and
>>>>>> sessionInfo().
>>>>>> Doesn't seem like much to me, but perhaps you can help. As you answer a
>>>>>> lot of e-mails, thought I'd remind you that this is in regards to the
>>>>>> "some row.names duplicated" error.
>>>>>> 
>>>>>> Hope your holidays were good!
>>>>>> 
>>>>>> -Rick
>>>>>> 
>>>>>> [R.app GUI 1.35 (5632) x86_64-apple-darwin9.8.0]
>>>>>> 
>>>>>> [Workspace restored from /Users/rickfrausto/.RData]
>>>>>> [History restored from /Users/rickfrausto/.Rapp.history]
>>>>>> 
>>>>>>> library(affy)
>>>>>> Loading required package: Biobase
>>>>>> 
>>>>>> Welcome to Bioconductor
>>>>>> 
>>>>>>      Vignettes contain introductory material. To view, type
>>>>>>      'openVignette()'. To cite Bioconductor, see
>>>>>>      'citation("Biobase")' and for packages 'citation(pkgname)'.
>>>>>> 
>>>>>>> mydata<- ReadAffy()
>>>>>>> eset<- rma(mydata)
>>>>>> Background correcting
>>>>>> Normalizing
>>>>>> Calculating Expression
>>>>>>> write.exprs(eset, file="mydata.txt")
>>>>>>> mypm<- pm(mydata)
>>>>>>> mymm<- mm(mydata)
>>>>>>> myaffyids<- probeNames(mydata)
>>>>>>> result<- data.frame(myaffyids, mypm, mymm)
>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>> Loading required package: lattice
>>>>>> Warning message:
>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>      some row.names duplicated:
>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,
>>>>>> 54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,
>>>>>> 103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,
>>>>>> 147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,
>>>>>> 173,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,
>>>>>> 210,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,
>>>>>> 252,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,
>>>>>> 296,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,
>>>>>> 338,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,
>>>>>> 382,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,
>>>>>> 407,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,
>>>>>> 449,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,
>>>>>> 495,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [... truncated]
>>>>>> Error in plot(qc(object)) :
>>>>>>      error in evaluating the argument 'x' in selecting a method for
>>>>>> function 'plot'
>>>>>>> traceback()
>>>>>> 2: plot(qc(object))
>>>>>> 1: QCReport(mydata, file = "ExampleQC.pdf")
>>>>>>> sessionInfo()
>>>>>> R version 2.12.0 (2010-10-15)
>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>> 
>>>>>> locale:
>>>>>> [1] en_AU.UTF-8/en_AU.UTF-8/C/C/en_AU.UTF-8/en_AU.UTF-8
>>>>>> 
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>> 
>>>>>> other attached packages:
>>>>>> [1] affyQCReport_1.28.1   lattice_0.19-13       mogene10stv1cdf_2.7.0
>>>>>> [4] affy_1.28.0           Biobase_2.10.0
>>>>>> 
>>>>>> loaded via a namespace (and not attached):
>>>>>>     [1] affyio_1.18.0         affyPLM_1.26.0        annotate_1.28.0
>>>>>>     [4] AnnotationDbi_1.12.0  Biostrings_2.18.2     DBI_0.2-5
>>>>>>     [7] gcrma_2.22.0          genefilter_1.32.0     grid_2.12.0
>>>>>> [10] IRanges_1.8.7         preprocessCore_1.12.0 RColorBrewer_1.0-2
>>>>>> [13] RSQLite_0.9-4         simpleaffy_2.26.1     splines_2.12.0
>>>>>> [16] survival_2.36-2       tools_2.12.0          xtable_1.5-6
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On 20/12/10 6:33 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi Rick,
>>>>>>> 
>>>>>>> On 12/17/2010 9:24 PM, Rick Frausto wrote:
>>>>>>>> Hey Jim,
>>>>>>>> 
>>>>>>>> Ok, I will give that a go. The only problem is an ExpressionSet contains
>>>>>>>> all of the necessary information for further analysis (e.g. phenodata,
>>>>>>>> featuredata and annotation, etc - including, treatment type, cell type,
>>>>>>>> time points, replicates). I am still learning how to include all of these
>>>>>>>> for a complete ExpressionSet. As a starting point I've loaded a txt file
>>>>>>>> containing some of this information (gene abbrev, ontology, probeset ID)
>>>>>>>> which I created using Affymetrix's Expression Console software, without
>>>>>>>> replicate, time point and cell type info. Doing this I've gotten as far as
>>>>>>>> creating a minimal ExpressionSet, which I guess the functions you mention
>>>>>>>> below do just that but with the information contained in the CEL file
>>>>>>>> only.
>>>>>>>> 
>>>>>>>> In any case, since as you say, the functions in the online manual create
>>>>>>>> a proper ExpressionSet why would I get the issue of duplication?
>>>>>>> 
>>>>>>> Oh yeah, the original question ;-D. Try running QCReport() again, and
>>>>>>> when it errors out run traceback() and send the output. Also include the
>>>>>>> output of sessionInfo().
>>>>>>> 
>>>>>>> Jim
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> In regards to the 64-bit discussion. It may have very well made enough of
>>>>>>>> a difference as it did not come up with the memory error the last time I
>>>>>>>> tried it. Going to upgrade to 8GB RAM anyways, can't hurt.
>>>>>>>> 
>>>>>>>> Cheers,
>>>>>>>> Rick
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 17/12/10 7:20 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi Rick,
>>>>>>>>> 
>>>>>>>>> On 12/16/2010 4:13 PM, Rick Frausto wrote:
>>>>>>>>>> Hi Jim,
>>>>>>>>>> 
>>>>>>>>>> How do I run an RMA analysis without a proper ExpressionSet? Honest
>>>>>>>>>> answer, I don't know, I just put in a command line from a manual I found
>>>>>>>>>> online and it spit out some result - see #3 Affy packages in following
>>>>>>>>>> link (
>>>>>>>>>> http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#biocon_intro
>>>>>>>>>> ).
>>>>>>>>> 
>>>>>>>>> You are mistaken. All of the functions mentioned there result in a
>>>>>>>>> proper ExpressionSet. And if you just do
>>>>>>>>> 
>>>>>>>>> abatch<- ReadAffy()
>>>>>>>>> eset<- rma(abatch)
>>>>>>>>> 
>>>>>>>>> Then you will 100% surely get an ExpressionSet.
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Perhaps you don't need an ExpressionSet until after the preprocessing,
>>>>>>>>>> at least that is what I get from the "An Introduction to Bioconductor's
>>>>>>>>>> ExpressionSet Class" written by Seth Falcon, Martin Morgan and Robert
>>>>>>>>>> Gentleman. Everything seemed to be going smoothly until I tried to get
>>>>>>>>>> a QC Report.
>>>>>>>>>> 
>>>>>>>>>> Now, the answer for why I would want to do such a thing is easy. Simply
>>>>>>>>>> that I don't know any better :) Just started working with R a few days
>>>>>>>>>> ago, but I'm learning.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Apparently Snow Leopard running on 32bit can only utilize about 3.2GB of
>>>>>>>>>> RAM, whereas 64bit can make use of all 4GB. I'll switch to the 64 bit OS
>>>>>>>>>> and see if it makes a difference.
>>>>>>>>> 
>>>>>>>>> Well, it won't be much different. The reason a 32-bit OS can only use
>>>>>>>>> about 3.2 Gb of RAM is that the OS needs some to run. The 64-bit OS also
>>>>>>>>> needs to use some RAM, so you won't get all 4 Gb there either. The issue
>>>>>>>>> is how much RAM can be allocated to a single process, and on a 64-bit OS
>>>>>>>>> that gets bumped up significantly.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> 
>>>>>>>>> Jim
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Thanks for your insight!
>>>>>>>>>> 
>>>>>>>>>> Cheers,
>>>>>>>>>> Rick
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 16/12/10 11:31 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Rick,
>>>>>>>>>>> 
>>>>>>>>>>> On 12/16/2010 12:57 PM, Rick Frausto wrote:
>>>>>>>>>>>> Thanks Jim! How much memory would I need, I currently have 4GB, but have
>>>>>>>>>>>> quite a few other programs running in the background...I'll see if
>>>>>>>>>>>> closing them helps. Perhaps setting up an "ExpressionSet" would solve the
>>>>>>>>>>>> problem. I just started reading up on how to set one of these up
>>>>>>>>>>>> yesterday. Will do this and see if the duplicates will go away.
>>>>>>>>>>>> 
>>>>>>>>>>>> The "mydata" originates from CEL files and then I run the RMA analysis
>>>>>>>>>>>> on it, but I didn't actually set up a proper ExpressionSet. I'm guessing
>>>>>>>>>>>> that doing this might reduce the QCReport PDF file size quite
>>>>>>>>>>>> considerably since I won't have any duplication and will make further
>>>>>>>>>>>> analysis easier.
>>>>>>>>>>> 
>>>>>>>>>>> How do you run an RMA analysis without setting up a proper
>>>>>>>>>>> ExpressionSet? The default behavior is to create one. In addition, why
>>>>>>>>>>> would you want to do such a thing? The ExpressionSet class is
>>>>>>>>>>> specifically designed to contain these sorts of data.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> I'm running Snow Leopard OSX which can be set up as 64bit. Would running
>>>>>>>>>>>> as 64bit still necessitate more RAM?
>>>>>>>>>>> 
>>>>>>>>>>> Probably. The difference isn't efficiency, but the ability to address
>>>>>>>>>>> more RAM. A 32-bit OS can still address all the available memory that
>>>>>>>>>>> you will have with just 4 Gb RAM, so you need to bump that up if you
>>>>>>>>>>> want to do all the chips together. As for how much, I don't know. Since
>>>>>>>>>>> RAM isn't that expensive these days, you might look at maxing your box
>>>>>>>>>>> out.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> 
>>>>>>>>>>> Jim
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks again,
>>>>>>>>>>>> Rick
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 15/12/10 7:45 AM, "James W. MacDonald"<jmacdon at med.umich.edu>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi Rick,
>>>>>>>>>>>> 
>>>>>>>>>>>> On 12/14/2010 3:55 PM, Rick Frausto wrote:
>>>>>>>>>>>> Dear All,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have recently entered the world of R. Through some trial and error I'm
>>>>>>>>>>>> becoming more familiar with R and the relevant Bioconductor Affy
>>>>>>>>>>>> packages. I'm a molecular and cell biologist with rudimentary statistical
>>>>>>>>>>>> knowledge and even less knowledge with respect to R.
>>>>>>>>>>>> 
>>>>>>>>>>>> When I enter the following:
>>>>>>>>>>>> 
>>>>>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf")
>>>>>>>>>>>> 
>>>>>>>>>>>> I get some errors in return.
>>>>>>>>>>>> 
>>>>>>>>>>>> Loading required package: lattice
>>>>>>>>>>>> Error: cannot allocate vector of size 437.4 Mb
>>>>>>>>>>>> 
>>>>>>>>>>>> This indicates that you need more RAM, as you are running out of
>>>>>>>>>>>> memory.
>>>>>>>>>>>> 
>>>>>>>>>>>> In addition: Warning message:
>>>>>>>>>>>> In data.row.names(row.names, rowsi, i) :
>>>>>>>>>>>>          some row.names duplicated:
>>>>>>>>>>>> 4,8,9,13,14,15,16,24,25,26,27,28,29,30,31,36,37,38,39,47,48,49,50,51,52,53,
>>>>>>>>>>>> 54,58,59,60,64,65,66,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,102,
>>>>>>>>>>>> 103,104,108,109,110,111,114,119,120,121,122,127,134,136,137,138,139,141,142,
>>>>>>>>>>>> 147,148,149,152,153,156,157,158,159,162,163,164,165,166,167,168,169,170,171,
>>>>>>>>>>>> 173,175,176,179,180,183,184,185,186,191,192,195,197,198,199,200,202,206,207,
>>>>>>>>>>>> 210,219,220,227,228,229,230,233,234,235,240,241,243,245,246,248,249,250,251,
>>>>>>>>>>>> 252,253,257,259,260,266,271,272,276,277,280,281,284,286,287,289,290,291,292,
>>>>>>>>>>>> 296,297,298,302,304,305,306,310,311,312,313,317,318,319,321,322,324,334,337,
>>>>>>>>>>>> 338,339,340,341,345,346,350,351,356,359,362,364,366,367,370,371,373,376,378,
>>>>>>>>>>>> 382,383,384,385,386,387,388,389,391,394,395,397,398,399,400,402,403,405,406,
>>>>>>>>>>>> 407,409,410,411,415,416,418,419,425,431,432,433,434,435,440,441,443,445,447,
>>>>>>>>>>>> 449,450,452,454,455,456,461,464,466,470,472,473,481,487,488,491,492,493,494,
>>>>>>>>>>>> 495,496,497,498,499,501,502,504,506,507,509,511,513,515,516,51 [... truncated]
>>>>>>>>>>>> 
>>>>>>>>>>>> What exactly is 'mydata', and how did you generate it? The above error
>>>>>>>>>>>> indicates that you have duplicate row names, which IIRC isn't possible
>>>>>>>>>>>> to do with an ExpressionSet.
>>>>>>>>>>>> 
>>>>>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error
>>>>>>>>>>>> code=12)
>>>>>>>>>>>> *** error: can't allocate region
>>>>>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>>>>>> R(9062,0xa05c5540) malloc: *** mmap(size=458665984) failed (error
>>>>>>>>>>>> code=12)
>>>>>>>>>>>> *** error: can't allocate region
>>>>>>>>>>>> *** set a breakpoint in malloc_error_break to debug
>>>>>>>>>>>> 
>>>>>>>>>>>> More lack of memory errors.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Error in help(dt[i], package = pkg[i], htmlhelp = TRUE) :
>>>>>>>>>>>>          unused argument(s) (htmlhelp = TRUE)
>>>>>>>>>>>> In addition: Warning messages:
>>>>>>>>>>>> 1: In data(package = .packages(all.available = TRUE)) :
>>>>>>>>>>>>          datasets have been moved from package 'base' to package
>>>>>>>>>>>> 'datasets'
>>>>>>>>>>>> 2: In data(package = .packages(all.available = TRUE)) :
>>>>>>>>>>>>          datasets have been moved from package 'stats' to package
>>>>>>>>>>>> 'datasets'
>>>>>>>>>>>> starting httpd help server ... done
>>>>>>>>>>>> 
>>>>>>>>>>>> Would someone be able to diagnose the problem and suggest a solution?
>>>>>>>>>>>> 
>>>>>>>>>>>> First, get more RAM. Second, you will be better off using a 64-bit OS.
>>>>>>>>>>>> Depending on your hardware, you might be able to just install a 64-bit
>>>>>>>>>>>> version of R.
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> 
>>>>>>>>>>>> Jim
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> If it is useful, I am using the following R software: R for Mac OS X GUI
>>>>>>>>>>>> 1.35-dev Leopard build 32-bit. If there is any other info that would be
>>>>>>>>>>>> useful please let me know.
>>>>>>>>>>>> 
>>>>>>>>>>>> I had a read of the AffyQCReport Package pdf and I have added the
>>>>>>>>>>>> following line: QCReport(ReadAffy(widget=TRUE)). Then I tried
>>>>>>>>>>>> library(affyQCReport); QCReport(mydata, file="ExampleQC.pdf") again. It
>>>>>>>>>>>> now seems to be doing something, in other words it doesn't go to the
>>>>>>>>>>>> error, yet, but it's been processing for about 10 minutes. I am
>>>>>>>>>>>> analyzing 35 chips.
>>>>>>>>>>>> 
>>>>>>>>>>>> Perhaps it would work if I tried to generate each QCReport page
>>>>>>>>>>>> separately rather than as a whole.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cordially,
>>>>>>>>>>>> Rick
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>>>> Search the archives:
>>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>>>> 

-- 
Rick Frausto
PhD Candidate
The University of Sydney
School of Molecular Bioscience G08
Camperdown, NSW 2006 AUSTRALIA
ricardo.frausto at sydney.edu.au
Phone: 61 2 9036 5354
Lab of Iain L. Campbell