[BioC] kOverA(k, A=100, na.rm=TRUE)

Tue Jul 1 00:13:04 CEST 2008

sessionInfo()
> sessionInfo()
R version 2.7.0 (2008-04-22) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
 [1] grid      splines   stats     graphics  grDevices datasets 
 [7] tools     utils     methods   base     

other attached packages:
 [1] RColorBrewer_1.0-2     goProfiles_1.2.0      
 [3] aroma.affymetrix_0.9.3 aroma.apd_0.1.3       
 [5] R.huge_0.1.5           affxparser_1.12.2     
 [7] aroma.core_0.9.3       sfit_0.1.5            
 [9] aroma.light_1.8.1      digest_0.3.1          
[11] matrixStats_0.1.2      R.rsp_0.3.4           
[13] R.cache_0.1.7          farms_1.3             
[15] MASS_7.2-41            RMySQL_0.6-0          
[17] gplots_2.6.0           gdata_2.4.2           
[19] gtools_2.5.0           affycoretools_1.12.0  
[21] annaffy_1.12.1         KEGG.db_2.2.0         
[23] gcrma_2.12.1           matchprobes_1.12.0    
[25] biomaRt_1.14.0         RCurl_0.9-3           
[27] GOstats_2.6.0          Category_2.6.0        
[29] genefilter_1.20.0      survival_2.34-1       
[31] RBGL_1.16.0            annotate_1.18.0       
[33] xtable_1.5-2           GO.db_2.2.0           
[35] AnnotationDbi_1.2.1    RSQLite_0.6-8         
[37] DBI_0.2-4              graph_1.18.1          
[39] limma_2.14.5           affy_1.18.2           
[41] preprocessCore_1.2.0   affyio_1.8.0          
[43] rJava_0.5-1            Biobase_2.0.1         
[45] R.utils_1.0.2          R.oo_1.4.3            
[47] R.methodsS3_1.0.1     

loaded via a namespace (and not attached):
[1] cluster_1.11.10 XML_1.94-0.1   

There are only 78 shared between the two outputs.  kOverA(2,7) produces
361 genes, kOverA(2,6) produces 121 genes.  This seems like a major
problem that I encountered in a completely different script, which was
not mine so I just scrapped using it.  

There are no apparent similarities between the genes being included in
both output sets.  Is there another function I can use that will do the
same filtering but may be more reliable?

Ray

Lovelace Respiratory Research Institute

-----Original Message-----
From: Robert Gentleman [mailto:rgentlem at fhcrc.org] 
Sent: Monday, June 30, 2008 2:27 PM
To: Roberts, Raymond
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] kOverA(k, A=100, na.rm=TRUE)

Hi,
  That behavior seems somewhat unusual.
Could you perhaps see which genes are selected for
kOverA(2,7) and which for kOverA(2,6)?  And then look at the expression 
values to see what might be going on..

Also please include the output of sessionInfo, as requested in the 
posting guide.

  Robert

Roberts, Raymond wrote:
> Dear List,
> 
>  
> 
> I'm using genefilter and Limma to analyze the human exon array.  I did
> my preprocessing with aroma.affymetrix and then loaded it into limma
and
> used genefilter to isolate the significant genes.  At least that is
what
> I think I am doing.  Here is my script starting at the function I have
> the question about
> 
>  
> 
>  
> 
> ##Filter for differentialy expressed genes
> 
> f1<-kOverA(2,7)
> 
> ffun<-filterfun(f1)
> 
> which<-genefilter(exprs(RMASet), ffun)
> 
> rma_no<-summary(which)
> 
> rmafiltset<-RMASet[which,]
> 
>  
> 
>  
> 
>  
> 
> grps <- paste(pData(rmafiltset)$hours)
> 
> lev <- c("time0", "time3", "time6", "time12", "time24", "time48")
> 
> f <- factor(grps, levels = lev)
> 
> design <- model.matrix(~ 0 + f)
> 
> colnames(design) <- lev
> 
> rmafit <- lmFit(rmafiltset, design)
> 
>  
> 
>  
> 
>  
> 
> contrast.matrix <- makeContrasts("time3-time0", "time6-time3",
> "time12-time6", "time24-time12", "time48-time24", "time6-time0",
> "time12-time0", "time24-time0", "time48-time0", levels=design)
> 
> rmafit2 <- contrasts.fit(rmafit, contrast.matrix)
> 
> rmafit3 <- eBayes(rmafit2)
> 
>  
> 
>  
> 
> pval<-0.05
> 
> l2foldch<-  0.58496250072
> 
>  
> 
> rmaresults<-decideTests(rmafit3, p.value=pval, lfc=l2foldch)
> 
>  
> 
>  
> 
> rmalist1 <- rmaresults[,1]!=0
> 
> rmalist2 <- rmaresults[,2]!=0
> 
> rmalist3 <- rmaresults[,3]!=0
> 
> rmalist4 <- rmaresults[,4]!=0
> 
> rmalist5 <- rmaresults[,5]!=0
> 
> rmalist6 <- rmaresults[,6]!=0
> 
> rmalist7 <- rmaresults[,7]!=0
> 
> rmalist8 <- rmaresults[,8]!=0
> 
> rmalist9 <- rmaresults[,9]!=0
> 
>  
> 
> rmagroup1 <- rmafiltset[rmalist1,]
> 
> rmagroup2 <- rmafiltset[rmalist2,]
> 
> rmagroup3 <- rmafiltset[rmalist3,]
> 
> rmagroup4 <- rmafiltset[rmalist4,]
> 
> rmagroup5 <- rmafiltset[rmalist5,]
> 
> rmagroup6 <- rmafiltset[rmalist6,]
> 
> rmagroup7 <- rmafiltset[rmalist7,]
> 
> rmagroup8 <- rmafiltset[rmalist8,]
> 
> rmagroup9 <- rmafiltset[rmalist9,]
> 
>  
> 
> rmaglist1 <- featureNames(rmagroup1)
> 
> rmaglist2 <- featureNames(rmagroup2)
> 
> rmaglist3 <- featureNames(rmagroup3)
> 
> rmaglist4 <- featureNames(rmagroup4)
> 
> rmaglist5 <- featureNames(rmagroup5)
> 
> rmaglist6 <- featureNames(rmagroup6)
> 
> rmaglist7 <- featureNames(rmagroup7)
> 
> rmaglist8 <- featureNames(rmagroup8)
> 
> rmaglist9 <- featureNames(rmagroup9)
> 
>  
> 
> rmaunion <- union(rmaglist1, union(rmaglist2, union(rmaglist3,
> union(rmaglist4, union(rmaglist5, union(rmaglist6, union(rmaglist7,
> union(rmaglist8, rmaglist9))))))))
> 
>  
> 
> rmaunionset <- rmafiltset[rmaunion,]
> 
>  
> 
> plotdata2 <- exprs(rmaunionset)
> 
> colnames(plotdata2) <- paste(pData(RMASet)$label)
> 
>  
> 
>  
> 
> I go on to create a heatmap and html output of significant genes.   My
> problem is that when I change kOverA(2, 7) to, say, kOverA(2, 6), I
get
> fewer genes, I thought this should increase the number of genes I'm
> seeing in the final output.  If I increase from 7 to 8 I still get a
> shorter list.  It seems like 7 is the value that generates the maximum
> output.  If anyone could explain why this is happening I would really
> appreciate it.   If I should post more information please just let me
> know.
> 
>  
> 
>  
> 
> Ray Roberts
> 
>  
> 
> Lovelace Respiratory Research Institute
> 
>  
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org