[BioC] easyRNASeq with Ensembl GRCh37 help

Aki Hoji akh22 at pitt.edu
Mon Sep 16 20:17:27 CEST 2013


Hi, 

I've been trying to generate an output file for DESeq2 with easyRNASeq. The input file is a BAM file generated by TopHat2/Bowtie2 against Ensembl GRCh37.72, which was part of Illumina's iGenomes package. I followed the easyRNASeq overview and the examples posted on the BioC mailing list, and ran the following:

testcount <- easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens",
                        chr.sizes = "auto", readLength = 100L,
                        annotationMethod = "gtf", annotationFile = "Ensemble.gtf",
                        count = "exons", outputFormat = "DESeq",
                        filenames = "4673Bsorted.bam")

Then I got this error:

Checking arguments... 
Fetching annotations... 
Read 2280612 records
Error in easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto",  : 
  The number of conditions: 0 did not correspond to the number of samples: 1
In addition: Warning messages:
1: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto",  :
  You enforce UCSC chromosome conventions, however the provided chromosome size list is not compliant. Correcting it.
2: In .Method(..., deparse.level = deparse.level) :
  number of columns of result is not a multiple of vector length (arg 1)
3: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto",  :
  There are 966272 features/exons defined in your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
4: In easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens", chr.sizes = "auto",  :
  You enforce UCSC chromosome conventions, however the provided annotation is not compliant. Correcting it.
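Reading the error literally ("number of conditions: 0 ... number of samples: 1"), it looks like outputFormat = "DESeq" expects one condition label per sample. Below is a minimal sketch of what I assume that would look like: a `conditions` vector named by the BAM file names. The argument name `conditions` is inferred from the error message and should be checked against ?easyRNASeq; the label "sampleA" is hypothetical. The real call only runs if the BAM file is present.

```r
## Assumption: with outputFormat = "DESeq", easyRNASeq wants one condition label
## per sample, given as a vector whose names are the BAM file names.
bam_files <- c("4673Bsorted.bam")

## "sampleA" is a hypothetical condition label, not taken from the post.
conditions <- setNames(c("sampleA"), bam_files)

## Mirror the failing check: one condition per sample, names matching the files.
stopifnot(length(conditions) == length(bam_files),
          identical(names(conditions), bam_files))

## Only attempt the real call when the BAM file exists and easyRNASeq loads.
if (all(file.exists(bam_files)) &&
    suppressWarnings(require("easyRNASeq", quietly = TRUE))) {
  testcount <- easyRNASeq(filesDirectory = getwd(), organism = "Hsapiens",
                          chr.sizes = "auto", readLength = 100L,
                          annotationMethod = "gtf", annotationFile = "Ensemble.gtf",
                          count = "exons", outputFormat = "DESeq",
                          filenames = bam_files,
                          conditions = conditions)  # assumed argument name
}
```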

As far as I can tell, I am not explicitly enforcing the UCSC chromosome convention, and chr.sizes can be set to "auto" since the input is a BAM file. I am stuck at this point, and any help or pointers would be really appreciated.
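To double-check the naming question, here is a small sketch that reads the sequence names from the BAM header and tests whether they carry the UCSC "chr" prefix. Ensembl GRCh37 names chromosomes "1", "2", ..., "MT", whereas UCSC uses "chr1", ..., "chrM". `scanBamHeader()` comes from Rsamtools, which is already attached per the sessionInfo below. The helper `uses_ucsc_names()` is hypothetical, and a simple prefix test is only an approximation of whatever check easyRNASeq performs internally.

```r
## Hypothetical helper: TRUE if every sequence name carries the UCSC "chr" prefix.
## This is a simplified check, not easyRNASeq's own validation logic.
uses_ucsc_names <- function(seqnames) {
  length(seqnames) > 0 && all(grepl("^chr", seqnames))
}

## Example names in the two conventions.
ensembl_names <- c("1", "2", "X", "MT")
ucsc_names    <- c("chr1", "chr2", "chrX", "chrM")

## Inspect the real BAM header when the file and Rsamtools are available.
bam_file <- "4673Bsorted.bam"
if (file.exists(bam_file) && requireNamespace("Rsamtools", quietly = TRUE)) {
  header <- Rsamtools::scanBamHeader(bam_file)[[1]]
  bam_names <- names(header$targets)  # sequence names from the @SQ header lines
  print(head(bam_names))
  print(uses_ucsc_names(bam_names))
}
```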

Thanks. 

AH

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] easyRNASeq_1.6.0       ShortRead_1.18.0       latticeExtra_0.6-26   
 [4] RColorBrewer_1.0-5     Rsamtools_1.12.4       DESeq_1.12.1          
 [7] lattice_0.20-23        locfit_1.5-9.1         BSgenome_1.28.0       
[10] GenomicRanges_1.12.5   Biostrings_2.28.0      IRanges_1.18.3        
[13] edgeR_3.2.4            limma_3.16.7           biomaRt_2.16.0        
[16] Biobase_2.20.1         genomeIntervals_1.16.0 BiocGenerics_0.6.0    
[19] intervals_0.14.0       BiocInstaller_1.10.3  

loaded via a namespace (and not attached):
 [1] annotate_1.38.0      AnnotationDbi_1.22.6 bitops_1.0-6        
 [4] DBI_0.2-7            genefilter_1.42.0    geneplotter_1.38.0  
 [7] grid_3.0.1           hwriter_1.3          RCurl_1.95-4.1      
[10] RSQLite_0.11.4       splines_3.0.1        stats4_3.0.1        
[13] survival_2.37-4      tools_3.0.1          XML_3.95-0.2        
[16] xtable_1.7-1         zlibbioc_1.6.0 


