[BioC] Annotation files for bacterial genome RNAseq

James W. MacDonald jmacdon at uw.edu
Fri Jun 20 16:07:25 CEST 2014


Hi Jose,

On 6/20/2014 5:56 AM, José Luis Lavín wrote:
> Dear list members,
>
> I have to analyze bacterial transcriptomic data and I have a question about
> how to proceed.
>
> I have downloaded the reference genome FASTA from NCBI and also a GFF
> file containing the annotation of that reference. I can map the reads to
> the genome and so on, but when the time comes to generate the table of
> counts for the differential expression (DE) analysis, I have no clear idea
> how to use the GFF annotation file to assign reads to the genomic
> features.
>
> I've looked at solutions like HTSeq, but as I understand it, this program
> generates a table of counts per alignment file (i.e. one table per BAM
> file), which means merging all the independent tables one by one to build
> the full count table for the DE analysis...
>
> To sum up: is there any R package that can generate a single table of
> counts from multiple BAM files using a GFF annotation file (or similar),
> for a genome that is not included in the UCSC catalog of reference
> organisms (as is the case for the bacterium I have to analyze)?
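For comparison, merging per-file tables of the kind htseq-count writes (two columns: feature id and count, plus summary rows whose ids begin with "__", such as __no_feature) takes only a few lines of base R. This is a minimal sketch, assuming each table has already been read into a data frame with columns named `id` and `count`; those column names, the helper `merge_counts()`, the sample names and all the numbers are hypothetical. Features missing from a sample are filled with 0.

```r
# Hypothetical per-sample tables in htseq-count's two-column layout
# (feature id, count); gene ids and counts are made up for illustration.
sample1 <- data.frame(id = c("geneA", "geneB", "__no_feature"),
                      count = c(10L, 5L, 3L), stringsAsFactors = FALSE)
sample2 <- data.frame(id = c("geneB", "geneA", "__no_feature"),
                      count = c(7L, 2L, 1L), stringsAsFactors = FALSE)
tables <- list(s1 = sample1, s2 = sample2)

# Hypothetical helper: combine a (preferably named) list of count tables
# into one features-by-samples integer matrix.
merge_counts <- function(tables) {
  # union of feature ids across all samples
  ids <- sort(unique(unlist(lapply(tables, function(tab) as.character(tab$id)))))
  counts <- matrix(0L, nrow = length(ids), ncol = length(tables),
                   dimnames = list(ids, names(tables)))
  for (i in seq_along(tables)) {
    tab <- tables[[i]]
    # match() places each sample's counts on the common row order;
    # ids absent from this sample keep their initial 0
    counts[match(as.character(tab$id), ids), i] <- as.integer(tab$count)
  }
  # drop htseq-count's summary rows (__no_feature, __ambiguous, ...)
  counts[!startsWith(ids, "__"), , drop = FALSE]
}

merged <- merge_counts(tables)
merged
```

With real files, the list could be built with something like `lapply(files, read.table, sep = "\t", col.names = c("id", "count"), stringsAsFactors = FALSE)`, where `files` is a hypothetical vector of htseq-count output paths; naming the list (e.g. `names(tables) <- files`) gives the matrix its column names.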


As already mentioned, there is the easyRNASeq package. In addition, you 
could use a combination of 'base' Bioconductor packages to do this.

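library(GenomicFeatures)
gff.file <- "annotation.gff"  # hypothetical path: point this at your own GFF file
tx <- makeTranscriptDbFromGFF(gff.file, format = "gff3")  # renamed makeTxDbFromGFF() in later releases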

You might need other arguments; I don't work with prokaryotes as a rule, 
so I can't advise, but you might need to tell it which sequences are 
circular (the circ_seqs argument) and whatnot.

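align.to.this <- exonsBy(tx, by = "tx")          # exons grouped by transcript (the default)
or
align.to.this <- transcriptsBy(tx, by = "gene")  # transcripts grouped by gene (the default)
or
align.to.this <- genes(tx)                       # one range per gene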

Again, you need to decide at what level you want to count reads, using 
your knowledge of prokaryotic biology to do 'the right thing'.

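library(Rsamtools)          # BamFileList()
library(GenomicAlignments)  # summarizeOverlaps() lives here, not in Rsamtools
bam.files <- c("sample1.bam", "sample2.bam")  # hypothetical paths: use your own BAM files
bfl <- BamFileList(bam.files)
olaps <- summarizeOverlaps(align.to.this, bfl)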

Then your counts are in

assays(olaps)$counts
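To see what summarizeOverlaps() returns without any BAM files at hand, it can also be called directly on an in-memory GAlignments object. This is a minimal sketch, assuming GenomicRanges and GenomicAlignments are installed; the sequence name "chr1", the two genes and the three 50-bp reads are all made up. In real use the features would come from genes(tx) or exonsBy(tx) and the reads from the BamFileList.

```r
suppressPackageStartupMessages({
  library(GenomicRanges)      # GRanges(), IRanges(), Rle()
  library(GenomicAlignments)  # GAlignments(), summarizeOverlaps(); attaches SummarizedExperiment for assays()
})

# Two made-up plus-strand genes on a made-up sequence "chr1":
# geneA covers 1-400, geneB covers 501-900.
feats <- GRanges("chr1", IRanges(start = c(1, 501), width = 400), strand = "+")
names(feats) <- c("geneA", "geneB")

# Three made-up 50-bp plus-strand reads: two inside geneA, one inside geneB.
reads <- GAlignments(seqnames = Rle(factor("chr1"), 3L),
                     pos = c(10L, 100L, 600L),
                     cigar = rep("50M", 3),
                     strand = Rle(strand(rep("+", 3))))

# "Union" is the default counting mode; it is spelled out here for clarity.
olaps <- summarizeOverlaps(feats, reads, mode = "Union")
assays(olaps)$counts
```

Each read falls entirely inside one gene, so the count matrix has one column with 2 for geneA and 1 for geneB.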

You would be well served to read the vignettes for GenomicFeatures, 
GenomicAlignments, Rsamtools, and probably GenomicRanges, IRanges and 
GenomeInfoDb.

Best,

Jim


>
> Thanks in advance
>
> JL
>
> PS. I came across the "Rsubread" package, but...
>
> package ‘Rsubread’ is not available (for R version 3.1.0)
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] BiocInstaller_1.14.2
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099


