[BioC] Error in calculating P-values with Genefilter function

Bradley Cattrysse bcattrys at uoguelph.ca
Mon Jun 10 16:34:01 CEST 2013


Hi Jim,
Thanks for the additional help in trying to solve this problem. I used options(error=recover) and poked around as you suggested, and found that probe 56 was giving the function a problem (like the NA in row 432 in your example). I removed that row from the data set and re-ran the p-value calculation to see if that would solve the problem. I think it solved that problem, but I am now getting a different error from the function. The problem is in the apply(expr, 1, flist) frame of genefilter:

> Anova7_P0.01<-genefilter(check,Func7P0.01)
Error in apply(expr, 1, flist) : dim(X) must have a positive length

Enter a frame number, or 0 to exit   

1: genefilter(check, Func7P0.01)
2: apply(expr, 1, flist)

Selection: 2
Called from: genefilter(check, Func7P0.01)
Browse[1]> ls()
[1] "dl"     "FUN"    "MARGIN" "X"     
Browse[1]> X
[1] 35555     7
Browse[1]> dim(X)
NULL


The error says that dim(X) must have a positive length. When I print X it shows 35555 and 7, which are the correct numbers of rows and columns for the data set. But when I ask for the dimensions of X, it returns NULL. I'm not sure why this is. Do you have any idea how I should troubleshoot this?
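One possible reading of the browser output above, offered as a guess rather than a diagnosis: X prints as two bare numbers and dim(X) is NULL, which is how a plain length-2 vector behaves. That would fit `check` holding the dimensions of the data (for example, the result of calling dim() on the matrix) rather than the matrix itself. The sketch below uses base R only; `check_like` and `m` are hypothetical stand-ins, not objects from the session above.

```r
# Hypothetical stand-in for `check`: a bare vector of two numbers.
check_like <- c(35555, 7)

print(check_like)       # two values on one line, like the X shown in the browser
dim(check_like)         # NULL: a plain vector has no dim attribute
is.matrix(check_like)   # FALSE

# apply() needs an object with dimensions, so it stops with the same message.
msg <- tryCatch(
  apply(check_like, 1, identity),
  error = function(e) conditionMessage(e)
)
msg

# For comparison, a real matrix of the stated size does have dimensions.
m <- matrix(0, nrow = 35555, ncol = 7)
dim(m)
is.matrix(m)            # TRUE
```

If this is what happened, checking class(check) and str(check) at the top level, before calling genefilter, should show it straight away.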

Thanks again, I really appreciate the help troubleshooting!
Brad



----- Original Message -----
From: "James W. MacDonald" <jmacdon at uw.edu>
To: "Bradley Cattrysse" <bcattrys at uoguelph.ca>
Cc: Bioconductor at r-project.org
Sent: Tuesday, June 4, 2013 12:21:35 PM
Subject: Re: [BioC] Error in calculating P-values with Genefilter function

Hi Brad,

Please don't take things off-list (i.e., in future, use reply-all). We 
like to think of the list archives as a searchable repository of 
knowledge, and if we go off-list, that aspect is lost.

On 6/4/2013 11:53 AM, Bradley Cattrysse wrote:
> Hi Jim,
>
> Thank you for the help. When I run options(error=recover) it does show where the error is occurring, specifying that it is happening in fun(x), as it does when I use the traceback() function. I'm not sure how to diagnose from there. We are analyzing an 8 array set, but we have deemed that one array may be problematic. It works perfectly on the 8 array set, but when I drop one array I get the error. If you have any additional ideas that may help in diagnosing this problem, the help would be greatly appreciated!

Ideally what will happen is that when you error out, you will be able to 
figure out what the problem is by looking at the various frames that are 
available to you. As an example (which indicates that my original idea 
is not correct):

dat <- matrix(rnorm(10000), ncol=10)
dat[432,1:5] <- NA ## make sure it will break
library(genefilter)
fact <- factor(rep(1:2, each=5))
f <- filterfun(Anova(fact, p=0.01))
options(error=recover)
genefilter(dat, f)

Enter a frame number, or 0 to exit

1: genefilter(dat, f)
2: apply(expr, 1, flist)
3: FUN(newX[, i], ...)
4: fun(x)
5: lm(x ~ cov)
6: model.matrix(mt, mf, contrasts)
7: model.matrix.default(mt, mf, contrasts)
8: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])

Selection: 3    <------------ I chose to enter frame #3
Called from: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])
Browse[1]> ls()    <------------ What's in here?
[1] "fun" "x"
Browse[1]> x    <------------ What is x?
  [1]         NA         NA         NA         NA         NA  0.2737152
  [7]  0.4907177 -0.1716024  0.2109492  1.0631105

You can then hit enter and look at other frames. This isn't an exact 
science. For example, frame 2 is hard to figure out:

Enter a frame number, or 0 to exit

1: genefilter(dat, f)
2: apply(expr, 1, flist)
3: FUN(newX[, i], ...)
4: fun(x)
5: lm(x ~ cov)
6: model.matrix(mt, mf, contrasts)
7: model.matrix.default(mt, mf, contrasts)
8: `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]])

Selection: 2
Called from: model.matrix.default(mt, mf, contrasts)
Browse[1]> ls()
  [1] "ans"     "d"       "d2"      "d.ans"   "d.call"  "dl"      "dn"
  [8] "dn.ans"  "dn.call" "ds"      "FUN"     "i"       "MARGIN"  "newX"
[15] "s.ans"   "s.call"  "tmp"     "X"

That's a lot of stuff, and fairly cryptic. But we can get some info here:

Browse[1]> i
[1] 432

So we know this is row 432, where we put the NAs. You just need to poke 
around in the various frames to try to figure out what is wrong with 
your data, and why you get the errors. It is always safest to do 
something like

Browse[1]> class(X)
[1] "matrix"
Browse[1]> dim(X)
[1] 1000   10

rather than just typing X to see what it is, as sometimes these things 
are really big and you might get stuck with lots of data being output to 
your screen.
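As a non-interactive complement to poking around in the frames, the rows that cannot be fitted can also be listed up front. The following is a minimal sketch in base R, assuming the same toy `dat` and `fact` as in the example above. `row_p`, `n_obs`, `no_group_data`, `res` and `failed` are hypothetical names, and `row_p` only mirrors the lm()/anova() lines quoted from Anova() in this thread; it is not the genefilter function itself.

```r
set.seed(1)                        # only so the toy data are reproducible
dat <- matrix(rnorm(10000), ncol = 10)
dat[432, 1:5] <- NA                # same deliberate breakage as above
fact <- factor(rep(1:2, each = 5))

# Step 1: count non-missing values per group for every row.
# Result has one row per gene and one column per factor level.
n_obs <- t(apply(dat, 1, function(x) tapply(!is.na(x), fact, sum)))
no_group_data <- which(rowSums(n_obs == 0) > 0)
no_group_data                      # rows where some group is entirely missing

# Step 2: try the model fit row by row and record failures as NA
# instead of stopping at the first error.
row_p <- function(x, cov) {
  m1 <- lm(x ~ cov)
  m2 <- lm(x ~ 1)
  anova(m2, m1)[["Pr(>F)"]][2]
}
res <- vapply(
  seq_len(nrow(dat)),
  function(i) tryCatch(row_p(dat[i, ], cov = fact),
                       error = function(e) NA_real_),
  numeric(1)
)
failed <- which(is.na(res))
failed                             # rows where the fit errored or gave no p-value
```

With the toy data both steps point at row 432. On real data the two lists may differ, since a fit can also return a missing p-value for reasons other than an empty group.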

Best,

Jim





>
> Thanks again,
> Brad
>
>
>
> ----- Original Message -----
> From: "James W. MacDonald"<jmacdon at uw.edu>
> To: "Brad Cattrysse [guest]"<guest at bioconductor.org>
> Cc: bioconductor at r-project.org, bcattrys at uoguelph.ca, "genefilter Maintainer"<maintainer at bioconductor.org>
> Sent: Monday, June 3, 2013 2:27:19 PM
> Subject: Re: [BioC] Error in calculating P-values with Genefilter function
>
> Hi Brad,
>
> On 6/3/2013 2:12 PM, Brad Cattrysse [guest] wrote:
>> To whom it may concern,
>>
>> I am having trouble with the genefilter function in R. I am attempting to extract genes from 7 arrays using a p-value of 0.01 using the following code:
>>
>> Func7P0.01<-filterfun(Anova(class7,p=0.01))
>> Func7P0.01
>> Anova7_P0.01<-genefilter(SCDexprs7,Func7P0.01)
>> Anova7_P0.01
>>
>> Creating Func7P0.01 works fine, but when I run genefilter using my data matrix and Func7P0.01 I get the following error:
>>
>>
>>> Anova7_P0.01<-genefilter(SCDexprs7,Func7P0.01)
>> Error in if (fstat < p) return(TRUE) :
>>     missing value where TRUE/FALSE needed
>>
>>
>> and when I run traceback(), I get:
>>
>>> traceback()
>> 4: fun(x)
>> 3: FUN(newX[, i], ...)
>> 2: apply(expr, 1, flist)
>> 1: genefilter(SCDexprs7, Func7P0.01)
>>
>>
>> I'm not entirely sure what is going on, but when I extract genes from the same 7 arrays, plus another array (8 arrays total) using the same code structure (below) it works fine.
> My best guess would be that you have some missing data for a particular
> gene, and when you only have seven arrays you get to a point where you
> don't have enough data of one type to fit a linear model, so the code here
>
>          m1 <- lm(x ~ cov)
>          m2 <- lm(x ~ 1)
>          av <- anova(m2, m1)
>
> from Anova() breaks.
>
> Try doing
>
> options(error = recover)
>
> and then run genefilter. You will error out at the point where things
> are breaking, and can look at the variables being analyzed at that point
> to see what the problem is.
>
> Best,
>
> Jim
>
>
>
>>
>> Func8P0.01<-filterfun(Anova(class8,p=0.01))
>> Func8P0.01
>> Anova8_P0.01<-genefilter(SCDexprs8,Func8P0.01)
>> Anova8_P0.01
>>
>>
>> Any help with this matter would be greatly appreciated as I am not sure what else to try.
>>
>> Thanks in advance!
>> Brad Cattrysse
>>
>>
>>    -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>    [1] pd.mogene.1.1.st.v1_3.8.0 RSQLite_0.11.3
>>    [3] DBI_0.2-6                 ggplot2_0.9.3.1
>>    [5] e1071_1.6-1               class_7.3-7
>>    [7] pvac_1.8.0                pgmm_1.0
>>    [9] mclust_4.1                cluster_1.14.4
>> [11] genefilter_1.42.0         oligoData_1.8.0
>> [13] oligo_1.24.0              Biobase_2.20.0
>> [15] oligoClasses_1.22.0       BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>>    [1] affxparser_1.32.0     affy_1.38.1           affyio_1.28.0
>>    [4] annotate_1.38.0       AnnotationDbi_1.22.5  BiocInstaller_1.10.1
>>    [7] Biostrings_2.28.0     bit_1.1-10            codetools_0.2-8
>> [10] colorspace_1.2-2      dichromat_2.0-0       digest_0.6.3
>> [13] ff_2.2-11             foreach_1.4.0         GenomicRanges_1.12.2
>> [16] grid_3.0.0            gtable_0.1.2          IRanges_1.18.0
>> [19] iterators_1.0.6       labeling_0.1          MASS_7.3-26
>> [22] munsell_0.4           plyr_1.8              preprocessCore_1.22.0
>> [25] proto_0.3-10          RColorBrewer_1.0-5    reshape2_1.2.2
>> [28] scales_0.2.3          splines_3.0.0         stats4_3.0.0
>> [31] stringr_0.6.2         survival_2.37-4       tools_3.0.0
>> [34] XML_3.95-0.2          xtable_1.7-1          zlibbioc_1.6.0
>> --
>> Sent via the guest posting facility at bioconductor.org.
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


