[BioC] Annotation files for bacterial genome RNAseq
James W. MacDonald
jmacdon at uw.edu
Fri Jun 20 16:07:25 CEST 2014
On 6/20/2014 5:56 AM, José Luis Lavín wrote:
> Dear list members,
> I have to analyze bacterial transcriptomic data and I have a doubt about
> how to proceed.
> I have downloaded the reference genome FASTA from the NCBI and also a gff
> file containing the annotation of that reference. I can map the reads to
> the genome and so on, but when the time comes to generate the table of
> counts for the Differential Expression (DE) analysis, I have no clear idea
> on how to use the gff annotation file to assign reads to the genomic
> features.
> I've looked for solutions like HTSeq, but to my understanding this program
> will generate a table of counts per alignment file (for instance, one table
> per BAM file), which would require merging all the independent tables
> one by one to generate the full table of counts for the DE analysis...
> To sum up: is there any R package that can generate a single table of
> counts from multiple BAM files using a gff annotation file (or similar),
> for a genome that is not included in the UCSC catalog of reference
> organisms (as is the case for the bacterium I have to analyze)?
As already mentioned, there is the easyRNASeq package. In addition, you
could use a combination of 'base' Bioconductor packages to do this.
library(GenomicFeatures)
library(GenomicAlignments)
tx <- makeTranscriptDbFromGFF(<your gff file>)
You might need other arguments; I don't work with prokaryotes as a rule,
so cannot advise, but you might need to say something about circular
genomes and whatnot.
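As a rough, untested sketch of that first step: the helper name build_txdb, the file name and the sequence name below are hypothetical placeholders, and I am assuming the circ_seqs argument is the right way to flag a circular chromosome, so please check ?makeTranscriptDbFromGFF for your installed version. Newer GenomicFeatures releases call this function makeTxDbFromGFF().

```r
library(GenomicFeatures)

# build_txdb() is a hypothetical helper, not part of any package.
# gff_file:      path to the GFF3 annotation downloaded from NCBI
# circular_seqs: sequence name(s) from column 1 of the GFF that are circular
build_txdb <- function(gff_file, circular_seqs = character(0)) {
  if (!file.exists(gff_file)) {
    stop("GFF file not found: ", gff_file)
  }
  # Function name as used in this thread; later releases renamed it
  # makeTxDbFromGFF().
  makeTranscriptDbFromGFF(gff_file, format = "gff3", circ_seqs = circular_seqs)
}

# Example call with placeholder names:
# tx <- build_txdb("reference.gff", circular_seqs = "NC_000000.1")
# isCircular(seqinfo(tx))   # was the chromosome recorded as circular?
# length(genes(tx))         # how many genes were parsed from the GFF?
```

Checking the number of genes against what you expect for the organism is a cheap way to see whether the GFF was parsed sensibly.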
align.to.this <- exonsBy(tx)
or
align.to.this <- transcriptsBy(tx)
or
align.to.this <- genes(tx)
Again, you need to decide at what level you want to align, using your
knowledge of prokaryotic biology to do 'the right thing'.
bfl <- BamFileList(<vector of bam files>)
olaps <- summarizeOverlaps(align.to.this, bfl)
Then your counts are in
assays(olaps)$counts
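Putting the steps together, something along these lines might serve as a starting point. It is a sketch, not tested on prokaryotic data. The helper name count_reads and the file names are hypothetical, counting at the gene level and ignore.strand = TRUE (an unstranded library) are assumptions you should revisit, and newer GenomicFeatures releases call the GFF function makeTxDbFromGFF().

```r
library(GenomicFeatures)    # makeTranscriptDbFromGFF(), genes()
library(GenomicAlignments)  # summarizeOverlaps(); also attaches Rsamtools

# count_reads() is a hypothetical helper, not part of any package.
# It returns one matrix of counts: rows are genes, columns are BAM files.
count_reads <- function(gff_file, bam_files) {
  paths <- c(gff_file, bam_files)
  missing <- paths[!file.exists(paths)]
  if (length(missing) > 0) {
    stop("file(s) not found: ", paste(missing, collapse = ", "))
  }

  # Build the annotation and pick the features to count against.
  tx <- makeTranscriptDbFromGFF(gff_file, format = "gff3")
  features <- genes(tx)

  # yieldSize makes the BAM files be read in chunks to limit memory use.
  bfl <- BamFileList(bam_files, yieldSize = 2000000)

  # Assumption: single-end reads from an unstranded library.
  olaps <- summarizeOverlaps(features, bfl, mode = "Union",
                             ignore.strand = TRUE)

  assays(olaps)$counts
}

# Example call with placeholder file names:
# counts <- count_reads("reference.gff", c("sample1.bam", "sample2.bam"))
# write.table(counts, "counts.txt", sep = "\t", quote = FALSE)
```

The result is a single table for all BAM files, so there is no need to merge per-sample tables by hand before the DE analysis.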
You would be well served to read the vignettes for GenomicFeatures,
Rsamtools, and probably GenomicRanges, IRanges and GenomeInfoDb.
> Thanks in advance
> P.S. I came across the "Rsubread" package, but...
> package ‘Rsubread’ is not available (for R version 3.1.0)
> > sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>  LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
>  LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>  LC_TIME=English_United Kingdom.1252
> attached base packages:
>  stats graphics grDevices utils datasets methods base
> other attached packages:
>  BiocInstaller_1.14.2
> loaded via a namespace (and not attached):
>  tools_3.1.0
James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099