[BioC] kOverA(k, A=100, na.rm=TRUE)

Tue Jul 1 09:29:49 CEST 2008

Hi,

Roberts, Raymond wrote:
> sessionInfo()
>> sessionInfo()
> R version 2.7.0 (2008-04-22) 
> i386-pc-mingw32 
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
>  [1] grid      splines   stats     graphics  grDevices datasets 
>  [7] tools     utils     methods   base     
> 
> other attached packages:
>  [1] RColorBrewer_1.0-2     goProfiles_1.2.0      
>  [3] aroma.affymetrix_0.9.3 aroma.apd_0.1.3       
>  [5] R.huge_0.1.5           affxparser_1.12.2     
>  [7] aroma.core_0.9.3       sfit_0.1.5            
>  [9] aroma.light_1.8.1      digest_0.3.1          
> [11] matrixStats_0.1.2      R.rsp_0.3.4           
> [13] R.cache_0.1.7          farms_1.3             
> [15] MASS_7.2-41            RMySQL_0.6-0          
> [17] gplots_2.6.0           gdata_2.4.2           
> [19] gtools_2.5.0           affycoretools_1.12.0  
> [21] annaffy_1.12.1         KEGG.db_2.2.0         
> [23] gcrma_2.12.1           matchprobes_1.12.0    
> [25] biomaRt_1.14.0         RCurl_0.9-3           
> [27] GOstats_2.6.0          Category_2.6.0        
> [29] genefilter_1.20.0      survival_2.34-1       
> [31] RBGL_1.16.0            annotate_1.18.0       
> [33] xtable_1.5-2           GO.db_2.2.0           
> [35] AnnotationDbi_1.2.1    RSQLite_0.6-8         
> [37] DBI_0.2-4              graph_1.18.1          
> [39] limma_2.14.5           affy_1.18.2           
> [41] preprocessCore_1.2.0   affyio_1.8.0          
> [43] rJava_0.5-1            Biobase_2.0.1         
> [45] R.utils_1.0.2          R.oo_1.4.3            
> [47] R.methodsS3_1.0.1     
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.10 XML_1.94-0.1   
> 
> 

  Please show the code and the output - clearly there is a problem where 
there should not be one, and by not giving us the whole story you do 
make it rather hard to diagnose.

> 
> There are only 78 shared between the two outputs.  kOverA(2,7) produces
> 361 genes, kOverA(2,6) produces 121 genes.  This seems like a major
> problem that I encountered in a completely different script, which was
> not mine so I just scrapped using it.  

So all kOverA is doing is asking whether at least k of the arrays are 
larger than A. In your case k is 2 and once it is 7, another time 6.
The code for kOverA is so simple as to suggest there is no problem there

function (k, A = 100, na.rm = TRUE)
{
     function(x) {
         if (na.rm)
             x <- x[!is.na(x)]
         sum(x > A) >= k
     }
}

this is applied row-wise to your expression matrix.

Suppose your ExpressionSet is called myExpr
then you can mimic kOverA as follows

kO6 = apply(exprs(myExpr), 1, function(x) sum(x > 6) >= 2)
kO7 = apply(exprs(myExpr), 1, function(x) sum(x > 7) >= 2)

so perhaps you could run both of those and tell us what happens?
and how they compare with the genes you selected - and please do show 
enough of the code so that someone other than yourself might be able to 
find the source of the error

Robert

> 
> There are no apparent similarities between the genes being included in
> both output sets.  Is there another function I can use that will do the
> same filtering but may be more reliable?
> 
> 
> Ray
> 
> Lovelace Respiratory Research Institute
> 
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Robert Gentleman [mailto:rgentlem at fhcrc.org] 
> Sent: Monday, June 30, 2008 2:27 PM
> To: Roberts, Raymond
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] kOverA(k, A=100, na.rm=TRUE)
> 
> Hi,
>   That behavior seems somewhat unusual.
> Could you perhaps see which genes are selected for
> kOverA(2,7) and which for kOverA(2,6)?  And then look at the expression 
> values to see what might be going on..
> 
> Also please include the output of sessionInfo, as requested in the 
> posting guide.
> 
>   Robert
> 
> 
> Roberts, Raymond wrote:
>> Dear List,
>>
>>  
>>
>> I'm using genefilter and Limma to analyze the human exon array.  I did
>> my preprocessing with aroma.affymetrix and then loaded it into limma
> and
>> used genefilter to isolate the significant genes.  At least that is
> what
>> I think I am doing.  Here is my script starting at the function I have
>> the question about
>>
>>  
>>
>>  
>>
>> ##Filter for differentialy expressed genes
>>
>> f1<-kOverA(2,7)
>>
>> ffun<-filterfun(f1)
>>
>> which<-genefilter(exprs(RMASet), ffun)
>>
>> rma_no<-summary(which)
>>
>> rmafiltset<-RMASet[which,]
>>
>>  
>>
>>  
>>
>>  
>>
>> grps <- paste(pData(rmafiltset)$hours)
>>
>> lev <- c("time0", "time3", "time6", "time12", "time24", "time48")
>>
>> f <- factor(grps, levels = lev)
>>
>> design <- model.matrix(~ 0 + f)
>>
>> colnames(design) <- lev
>>
>> rmafit <- lmFit(rmafiltset, design)
>>
>>  
>>
>>  
>>
>>  
>>
>> contrast.matrix <- makeContrasts("time3-time0", "time6-time3",
>> "time12-time6", "time24-time12", "time48-time24", "time6-time0",
>> "time12-time0", "time24-time0", "time48-time0", levels=design)
>>
>> rmafit2 <- contrasts.fit(rmafit, contrast.matrix)
>>
>> rmafit3 <- eBayes(rmafit2)
>>
>>  
>>
>>  
>>
>> pval<-0.05
>>
>> l2foldch<-  0.58496250072
>>
>>  
>>
>> rmaresults<-decideTests(rmafit3, p.value=pval, lfc=l2foldch)
>>
>>  
>>
>>  
>>
>> rmalist1 <- rmaresults[,1]!=0
>>
>> rmalist2 <- rmaresults[,2]!=0
>>
>> rmalist3 <- rmaresults[,3]!=0
>>
>> rmalist4 <- rmaresults[,4]!=0
>>
>> rmalist5 <- rmaresults[,5]!=0
>>
>> rmalist6 <- rmaresults[,6]!=0
>>
>> rmalist7 <- rmaresults[,7]!=0
>>
>> rmalist8 <- rmaresults[,8]!=0
>>
>> rmalist9 <- rmaresults[,9]!=0
>>
>>  
>>
>> rmagroup1 <- rmafiltset[rmalist1,]
>>
>> rmagroup2 <- rmafiltset[rmalist2,]
>>
>> rmagroup3 <- rmafiltset[rmalist3,]
>>
>> rmagroup4 <- rmafiltset[rmalist4,]
>>
>> rmagroup5 <- rmafiltset[rmalist5,]
>>
>> rmagroup6 <- rmafiltset[rmalist6,]
>>
>> rmagroup7 <- rmafiltset[rmalist7,]
>>
>> rmagroup8 <- rmafiltset[rmalist8,]
>>
>> rmagroup9 <- rmafiltset[rmalist9,]
>>
>>  
>>
>> rmaglist1 <- featureNames(rmagroup1)
>>
>> rmaglist2 <- featureNames(rmagroup2)
>>
>> rmaglist3 <- featureNames(rmagroup3)
>>
>> rmaglist4 <- featureNames(rmagroup4)
>>
>> rmaglist5 <- featureNames(rmagroup5)
>>
>> rmaglist6 <- featureNames(rmagroup6)
>>
>> rmaglist7 <- featureNames(rmagroup7)
>>
>> rmaglist8 <- featureNames(rmagroup8)
>>
>> rmaglist9 <- featureNames(rmagroup9)
>>
>>  
>>
>> rmaunion <- union(rmaglist1, union(rmaglist2, union(rmaglist3,
>> union(rmaglist4, union(rmaglist5, union(rmaglist6, union(rmaglist7,
>> union(rmaglist8, rmaglist9))))))))
>>
>>  
>>
>> rmaunionset <- rmafiltset[rmaunion,]
>>
>>  
>>
>> plotdata2 <- exprs(rmaunionset)
>>
>> colnames(plotdata2) <- paste(pData(RMASet)$label)
>>
>>  
>>
>>  
>>
>> I go on to create a heatmap and html output of significant genes.   My
>> problem is that when I change kOverA(2, 7) to, say, kOverA(2, 6), I
> get
>> fewer genes, I thought this should increase the number of genes I'm
>> seeing in the final output.  If I increase from 7 to 8 I still get a
>> shorter list.  It seems like 7 is the value that generates the maximum
>> output.  If anyone could explain why this is happening I would really
>> appreciate it.   If I should post more information please just let me
>> know.
>>
>>  
>>
>>  
>>
>> Ray Roberts
>>
>>  
>>
>> Lovelace Respiratory Research Institute
>>
>>  
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org