[BioC] Regarding extraction of 3' and 5'UTRs and exonic region of a gene.

Hervé Pagès hpages at fhcrc.org
Thu Jun 27 19:31:03 CEST 2013

Hi Abdul,

Suggested workflow:

1. Build the list of genes involved in the particular cancer you're
    interested in. Could be a vector of gene ids or transcript ids (not
    all transcripts are necessarily linked to a gene).

    Suggested tools (not exhaustive): the GO.db and org.Hs.eg.db packages,
    maybe the DO.db package, etc. I'm not sure which tool would be best
    for this. But maybe you already have your list of genes?

2. Use the TxDb.Hsapiens.UCSC.hg19.knownGene + GenomicFeatures packages
    to extract the coordinates of the 5'UTRs and 3'UTRs.
    Use the fiveUTRsByTranscript() and threeUTRsByTranscript() functions
    for this. They'll return the result in a GRangesList object (you'll
    have to become a bit familiar with those objects first).

3. Use the BSgenome.Hsapiens.UCSC.hg19 package and the
    extractTranscriptsFromGenome() function from the GenomicFeatures
    package to extract the UTR sequences.
    The name of the function is misleading but it can be used to extract
    CDS or UTR sequences in addition to transcript sequences.
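
The three steps above can be sketched roughly as follows. This is only a
minimal sketch under stated assumptions, not a tested pipeline: the two
gene symbols are placeholders rather than a curated breast cancer gene
set, mapping them with mapIds() from org.Hs.eg.db is just one possible
way to get Entrez gene ids, and the exonsBy() call is added because the
original question also asked for exonic regions. Note also that in
current GenomicFeatures releases extractTranscriptsFromGenome() has been
replaced by extractTranscriptSeqs(), which is what the sketch uses.

```r
# Requires the Bioconductor packages loaded below to be installed.
library(GenomicFeatures)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(BSgenome.Hsapiens.UCSC.hg19)
library(org.Hs.eg.db)

txdb   <- TxDb.Hsapiens.UCSC.hg19.knownGene
genome <- BSgenome.Hsapiens.UCSC.hg19

# Step 1: placeholder gene list (an assumption for illustration only).
symbols  <- c("BRCA1", "BRCA2")
gene_ids <- mapIds(org.Hs.eg.db, keys = symbols,
                   column = "ENTREZID", keytype = "SYMBOL")

# Transcripts grouped by gene (this TxDb uses Entrez gene ids as names);
# keep only the genes of interest and collect their transcript names.
tx_by_gene <- transcriptsBy(txdb, by = "gene")
tx_by_gene <- tx_by_gene[names(tx_by_gene) %in% gene_ids]
tx_names   <- unlist(tx_by_gene)$tx_name

# Step 2: UTR and exon coordinates as GRangesList objects, with one
# list element per transcript, named by transcript name.
keep_selected <- function(grl) grl[names(grl) %in% tx_names]
five_utrs  <- keep_selected(fiveUTRsByTranscript(txdb, use.names = TRUE))
three_utrs <- keep_selected(threeUTRsByTranscript(txdb, use.names = TRUE))
exons      <- keep_selected(exonsBy(txdb, by = "tx", use.names = TRUE))

# Step 3: extract the sequences; each result is a DNAStringSet with one
# sequence per transcript (the ranges of a transcript are concatenated).
five_utr_seqs  <- extractTranscriptSeqs(genome, five_utrs)
three_utr_seqs <- extractTranscriptSeqs(genome, three_utrs)
exon_seqs      <- extractTranscriptSeqs(genome, exons)
```

Transcripts without an annotated UTR (non-coding transcripts, for
example) are simply absent from the UTR results, so five_utrs and
three_utrs can be shorter than exons.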

If you've never used these tools before, it will take you some time to
become familiar with them. Your best friends are the man pages for the
individual functions/classes you're going to run into (don't miss the
examples section) and the vignettes in the GenomicRanges and
GenomicFeatures packages.
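
For instance, the man pages and vignettes can be opened from an
interactive R session along these lines (a small sketch; it assumes the
packages are installed, and the topics shown are just examples):

```r
library(GenomicRanges)
library(GenomicFeatures)

# Man page for the UTR extractors (see its examples section).
?fiveUTRsByTranscript

# Man page for the class of the objects they return.
?"GRangesList-class"

# Open the list of vignettes shipped with each package.
browseVignettes("GenomicRanges")
browseVignettes("GenomicFeatures")
```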

Let us know if you have specific questions or run into specific problems
(show us what you've done and explain the problem -- don't forget your

Good luck,

On 06/27/2013 01:58 AM, Abdul Rawoof wrote:
> Hello everyone,
> Could anyone show me how I can extract the *3' and 5' UTRs and
> exonic regions* of all *human genes* from the *Ensembl and KEGG databases* that
> are involved in a particular cancer, especially *breast cancer*, using
> R/Bioconductor?
> Thanks in advance.
> Abdul Rawoof

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
