[BioC] Error in calculating P-values with Genefilter function
James W. MacDonald
jmacdon at uw.edu
Tue Jun 11 16:27:57 CEST 2013
And this brings me back to the admonition that you should always do
something like
class(X)
or
dim(X)
first. Depending on how you are running R, if X really were a 35555 x 7
matrix or data.frame and you just typed X at the prompt, you would get
the whole thing output to your screen (or as much of it as the max.print
option allows). I run R under emacs, and sometimes it isn't possible to get R
to stop that nonsense without doing a kill command at a terminal prompt.
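
As a rough sketch of what I mean (the X below is a made-up stand-in of the
same size, not your data, and v is a hypothetical vector like the one you
found in frame 2), these calls summarize an object without dumping it to
the screen. They also show how a vector that merely holds two numbers
differs from a matrix:

```r
## Hypothetical stand-in for a large expression matrix (not the poster's data)
X <- matrix(rnorm(35555 * 7), ncol = 7)

class(X)      # "matrix" (R >= 4.0.0 also reports "array")
dim(X)        # 35555 7
str(X)        # compact summary instead of the full contents
head(X, 3)    # only the first three rows

## A plain vector that just contains those two numbers has no dim attribute
v <- c(35555, 7)
dim(v)        # NULL
is.matrix(v)  # FALSE

## apply() requires a dim attribute, so it stops on the vector;
## tryCatch() captures the error message rather than halting the script
res <- tryCatch(apply(v, 1, mean), error = function(e) conditionMessage(e))
res
```

If class() and dim() report something other than what you expect, that is
usually the first clue about where things went wrong.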
Best,
Jim
On 6/11/2013 10:21 AM, Bradley Cattrysse wrote:
> Hi Jim,
>
> I see what you mean. I was thinking it was giving me the number of observations in X. I will poke around some more, thanks again for the help!
> Brad
>
> ----- Original Message -----
> From: "James W. MacDonald"<jmacdon at uw.edu>
> To: "Bradley Cattrysse"<bcattrys at uoguelph.ca>
> Cc: Bioconductor at r-project.org
> Sent: Tuesday, June 11, 2013 10:13:37 AM
> Subject: Re: [BioC] Error in calculating P-values with Genefilter function
>
> Hi Brad,
>
> On 6/10/2013 10:34 AM, Bradley Cattrysse wrote:
>> Hi Jim,
>> Thanks for the additional help in trying to solve this problem. I used the options(error=recover) command and poked around like you said and found that probe 56 was giving the function a problem (like the NA in row 432 in yours). I removed that row from the data set and tried to re-run the p-value calculation to see if that would solve the problem. Although I think it solved that problem, I am now experiencing a different error with the function. There is a problem in the apply(expr, 1, flist) frame of genefilter:
>>
>>> Anova7_P0.01<-genefilter(check,Func7P0.01)
>> Error in apply(expr, 1, flist) : dim(X) must have a positive length
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: genefilter(check, Func7P0.01)
>> 2: apply(expr, 1, flist)
>>
>> Selection: 2
>> Called from: genefilter(check, Func7P0.01)
>> Browse[1]> ls()
>> [1] "dl" "FUN" "MARGIN" "X"
>> Browse[1]> X
>> [1] 35555 7
>> Browse[1]> dim(X)
>> NULL
> It doesn't say that the dimensions of X are 35555 x 7. It says that X is
> a vector with two numbers in it, (35555 and 7) and that the dimensions
> of X are NULL, which stands to reason as it is a vector, which has no
> dimensional attributes.
>
> You might try poking around in frame 1. Usually I get better results
> when I look one frame higher than I think I should.
>
> Best,
>
> Jim
>
>
>
>> It says that dim(X) must have a positive length. When I browse X it says it has 35555 rows and 7 columns, which is correct for the data set. But then when I browse the dimensions of X it says NULL. I'm not sure why this is. Do you have any idea what I should do to troubleshoot this?
>>
>> Thanks again I really appreciate the help troubleshooting!
>> Brad
>>
>>
>>
>> ----- Original Message -----
>> From: "James W. MacDonald"<jmacdon at uw.edu>
>> To: "Bradley Cattrysse"<bcattrys at uoguelph.ca>
>> Cc: Bioconductor at r-project.org
>> Sent: Tuesday, June 4, 2013 12:21:35 PM
>> Subject: Re: [BioC] Error in calculating P-values with Genefilter function
>>
>> Hi Brad,
>>
>> Please don't take things off-list (e.g., in future, use reply-all). We
>> like to think of the list archives as a searchable repository of
>> knowledge, and if we go off-list, that aspect is lost.
>>
>> On 6/4/2013 11:53 AM, Bradley Cattrysse wrote:
>>> Hi Jim,
>>>
>>> Thank you for the help. When I run options(error=recover) it does show where the error is occurring, specifying that it is happening in fun(x), just as when I use the traceback() function. I'm not sure how to diagnose from there. We are analyzing an 8-array set, but we have deemed one array may be problematic. It works perfectly on the 8-array set, but when I drop one array I get the error. If you have any additional ideas that may help in diagnosing this problem the help would be greatly appreciated!
>> Ideally what will happen is that when you error out, you will be able to
>> figure out what the problem is by looking at the various frames that are
>> available to you. As an example (which indicates that my original idea
>> is not correct):
>>
>> dat <- matrix(rnorm(10000), ncol=10)
>> dat[432,1:5] <- NA ## make sure it will break
>> library(genefilter)
>> fact <- factor(rep(1:2, each=5))
>> f <- filterfun(Anova(fact, p=0.01))
>> options(error=recover)
>> genefilter(dat, f)
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: genefilter(dat, f)
>> 2: apply(expr, 1, flist)
>> 3: FUN(newX[, i], ...)
>> 4: fun(x)
>> 5: lm(x ~ cov)
>> 6: model.matrix(mt, mf, contrasts)
>> 7: model.matrix.default(mt, mf, contrasts)
>> 8: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>>
>> Selection: 3   <------------ I chose to enter frame #3
>> Called from: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>> Browse[1]> ls()   <------------ What's in here?
>> [1] "fun" "x"
>> Browse[1]> x   <------------ What is x?
>> [1] NA NA NA NA NA 0.2737152
>> [7] 0.4907177 -0.1716024 0.2109492 1.0631105
>>
>> You can then hit enter and look at other frames. This isn't an exact
>> science. For example, frame 2 is hard to figure out:
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: genefilter(dat, f)
>> 2: apply(expr, 1, flist)
>> 3: FUN(newX[, i], ...)
>> 4: fun(x)
>> 5: lm(x ~ cov)
>> 6: model.matrix(mt, mf, contrasts)
>> 7: model.matrix.default(mt, mf, contrasts)
>> 8: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
>>
>> Selection: 2
>> Called from: model.matrix.default(mt, mf, contrasts)
>> Browse[1]> ls()
>> [1] "ans" "d" "d2" "d.ans" "d.call" "dl" "dn"
>> [8] "dn.ans" "dn.call" "ds" "FUN" "i" "MARGIN" "newX"
>> [15] "s.ans" "s.call" "tmp" "X"
>>
>> That's a lot of stuff, and fairly cryptic. But we can get some info here:
>>
>> Browse[1]> i
>> [1] 432
>>
>> So we know this is row 432, where we put the NAs. You just need to poke
>> around in the various frames to try to figure out what is wrong with
>> your data, and why you get the errors. It is always safest to do
>> something like
>>
>> Browse[1]> class(X)
>> [1] "matrix"
>> Browse[1]> dim(X)
>> [1] 1000 10
>>
>> rather than just typing X to see what it is, as sometimes these things
>> are really big and you might get stuck with lots of data being output to
>> your screen.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>
>>
>>> Thanks again,
>>> Brad
>>>
>>>
>>>
>>> ----- Original Message -----
>>> From: "James W. MacDonald"<jmacdon at uw.edu>
>>> To: "Brad Cattrysse [guest]"<guest at bioconductor.org>
>>> Cc: bioconductor at r-project.org, bcattrys at uoguelph.ca, "genefilter Maintainer"<maintainer at bioconductor.org>
>>> Sent: Monday, June 3, 2013 2:27:19 PM
>>> Subject: Re: [BioC] Error in calculating P-values with Genefilter function
>>>
>>> Hi Brad,
>>>
>>> On 6/3/2013 2:12 PM, Brad Cattrysse [guest] wrote:
>>>> To whom it may concern,
>>>>
>>>> I am having trouble with the genefilter function in R. I am attempting to extract genes from 7 arrays using a p-value of 0.01 using the following code:
>>>>
>>>> Func7P0.01<-filterfun(Anova(class7,p=0.01))
>>>> Func7P0.01
>>>> Anova7_P0.01<-genefilter(SCDexprs7,Func7P0.01)
>>>> Anova7_P0.01
>>>>
>>>> Creating Func7P0.01 works fine, but when I run genefilter using my data matrix and Func7P0.01 I get the following error:
>>>>
>>>>
>>>>> Anova7_P0.01<-genefilter(SCDexprs7,Func7P0.01)
>>>> Error in if (fstat< p) return(TRUE) :
>>>> missing value where TRUE/FALSE needed
>>>>
>>>>
>>>> and when I run traceback(), I get:
>>>>
>>>>> traceback()
>>>> 4: fun(x)
>>>> 3: FUN(newX[, i], ...)
>>>> 2: apply(expr, 1, flist)
>>>> 1: genefilter(SCDexprs7, Func7P0.01)
>>>>
>>>>
>>>> I'm not entirely sure what is going on, but when I extract genes from the same 7 arrays, plus another array (8 arrays total) using the same code structure (below) it works fine.
>>> My best guess would be that you have some missing data for a particular
>>> gene, and when you only have seven arrays you get to a point where you
>>> don't have enough data of one type to fit a linear model, so the code here
>>>
>>> m1 <- lm(x ~ cov)
>>> m2 <- lm(x ~ 1)
>>> av <- anova(m2, m1)
>>>
>>> from Anova() breaks.
>>>
>>> Try doing
>>>
>>> options(error = recover)
>>>
>>> and then run genefilter. You will error out at the point where things
>>> are breaking, and can look at the variables being analyzed at that point
>>> to see what the problem is.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>> Func8P0.01<-filterfun(Anova(class8,p=0.01))
>>>> Func8P0.01
>>>> Anova8_P0.01<-genefilter(SCDexprs8,Func8P0.01)
>>>> Anova8_P0.01
>>>>
>>>>
>>>> Any help with this matter would be greatly appreciated as I am not sure what else to try.
>>>>
>>>> Thanks in advance!
>>>> Brad Cattrysse
>>>>
>>>>
>>>> -- output of sessionInfo():
>>>>
>>>>> sessionInfo()
>>>> R version 3.0.0 (2013-04-03)
>>>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>>>
>>>> locale:
>>>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] pd.mogene.1.1.st.v1_3.8.0 RSQLite_0.11.3
>>>> [3] DBI_0.2-6 ggplot2_0.9.3.1
>>>> [5] e1071_1.6-1 class_7.3-7
>>>> [7] pvac_1.8.0 pgmm_1.0
>>>> [9] mclust_4.1 cluster_1.14.4
>>>> [11] genefilter_1.42.0 oligoData_1.8.0
>>>> [13] oligo_1.24.0 Biobase_2.20.0
>>>> [15] oligoClasses_1.22.0 BiocGenerics_0.6.0
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] affxparser_1.32.0 affy_1.38.1 affyio_1.28.0
>>>> [4] annotate_1.38.0 AnnotationDbi_1.22.5 BiocInstaller_1.10.1
>>>> [7] Biostrings_2.28.0 bit_1.1-10 codetools_0.2-8
>>>> [10] colorspace_1.2-2 dichromat_2.0-0 digest_0.6.3
>>>> [13] ff_2.2-11 foreach_1.4.0 GenomicRanges_1.12.2
>>>> [16] grid_3.0.0 gtable_0.1.2 IRanges_1.18.0
>>>> [19] iterators_1.0.6 labeling_0.1 MASS_7.3-26
>>>> [22] munsell_0.4 plyr_1.8 preprocessCore_1.22.0
>>>> [25] proto_0.3-10 RColorBrewer_1.0-5 reshape2_1.2.2
>>>> [28] scales_0.2.3 splines_3.0.0 stats4_3.0.0
>>>> [31] stringr_0.6.2 survival_2.37-4 tools_3.0.0
>>>> [34] XML_3.95-0.2 xtable_1.7-1 zlibbioc_1.6.0
>>>> --
>>>> Sent via the guest posting facility at bioconductor.org.
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099