[BioC] Gene filtering

Heike Pospisil pospisil at zbh.uni-hamburg.de
Mon Feb 14 10:10:07 CET 2005


Hello Adai,

thanks a lot for your help. It works very well!

And once again, sorry for the (first) uncertain information I gave.

Best wishes,
Heike

Adaikalavan Ramasamy wrote:

>[[Ignore the previous mail. I hit the send button too soon]]
>
>I never used genefilter and filterfun so I would not be able to advice
>on this and hope the suggestions below solves your problem. On a
>personal note, I just calculate and store the p-values/statistics
>directly. This may be more efficient for the following reasons
>
>1) to generate various lists of interesting genes at different p-value
>cutoffs. This is often required by the biologists who might want a high
>confidence subset for biological validation and maybe a broader subset
>for computation validation (e.g. pathway analysis)
>
>2) to rank genes by p-values
>
>3) to adjust p-values for multiple hypothesis
>
>
>Here is one way how you can do this :
>
>mat <- matrix( rnorm(100000, sd=5), nc=100 ) 
>rownames(mat) <- paste("g", 1:1000, sep="")
># replace 'mat' with your exprs(data)
>
>g <- rep(1:2, each=50)                 
># Class information e.g. 50 normal and 50 tumour
># again replace this with your own groups
>
>stats <- t( apply( mat, 1, function(z) { 
>                x <- z[ which( g==1 ) ]
>                y <- z[ which( g==2 ) ]
>
>                t.p <- t.test(x, y)$p.value
>                w.p <- wilcox.test(x, y)$p.value
>                fc  <- mean(x, na.rm=T) - mean(y, na.rm=T)
>                return( c(t.pval=t.p, wilcox.pval=w.p, fold.change=fc) )
>               }))
># You can modify the above to include further tests etc.
>
>
>Hopefully you can get something compact like the following (Note : your
>results will vary due to random number generation).
>
>      t.pval wilcox.pval fold.change
>g1 0.2376890   0.2655573   1.0214440
>g2 0.1513874   0.2931174  -1.2895703
>g3 0.4788188   0.5014898  -0.7349789
>g4 0.2021780   0.1302305   1.3201382
>g5 0.2537569   0.2655573  -1.1256882
>g6 0.5881588   0.7020112  -0.5907285
>.. .........   .........  ..........
>
>
>Now, you can generate various lists such as 
>
>list1 <- names( which( stats[ , "t.pval"] < 0.05 ) )
>list2 <- names( which( stats[ , "fold.change"] > 1 ) )
>intersect( list1, list2 )
>
>I guess this is probably a matter of taste. Hope this helps.
>
>Regards, Adai
>
>
>
>On Fri, 2005-02-11 at 10:08 -0500, James W. MacDonald wrote:
>  
>
>>Heike Pospisil wrote:
>>    
>>
>>>Hello Adaikalavan
>>>
>>>      
>>>
>>>>I think justRMA() uses nearly all the memory you have access to, so it
>>>>it only able to handle small computations afterwards. What I would
>>>>suggest is try saving the exprSet and exit. Then start from a fresh R
>>>>session and do your analysis from that. See below.
>>>> 
>>>>
>>>>        
>>>>
>>>Thanks for your suggestion. Saving and loading the exprSet work and 
>>>help. But, unfortunately, my filter function do not work.
>>>
>>>ff1<-ttest(data,.001,na.rm=TRUE)
>>>ff2<-filterfun(ff1)
>>>wh2<-genefilter(exprs(data), ff2)
>>>
>>>No idea :-(
>>>
>>>Best wishes.
>>>Heike
>>>
>>>      
>>>
>>I think you are setting up ff1 incorrectly. As an example, let's say 
>>that your exprSet contains 10 samples, the first 5 are e.g., 
>>experimental, and the second 5 are control. Then you would set up ff1 
>>like this:
>>
>>ff1 <- ttest(5, 0.001, na.rm = TRUE)
>>
>>-or-
>>
>>cl <- c(rep(1,5), rep(2,5))
>>ff1 <- ttest(cl, 0.001, na.rm = TRUE)
>>
>>The second method can be used if the samples are not contiguous (e.g., 
>>they are ordered exp, cont, exp, cont, etc).
>>
>>cl <- c(rep(c(1,2), 5)
>>ff1 <- ttest(cl, 0.001, na.rm = TRUE)
>>
>>HTH,
>>
>>Jim
>>
>>
>>
>>    
>>
>
>
>  
>

-- 
Dr. Heike Pospisil
Center for Bioinformatics, University of Hamburg
Bundesstrasse 43, 20146 Hamburg, Germany
phone: +49-40-42838-7303 fax: +49-40-42838-7312



More information about the Bioconductor mailing list