[BioC] GRanges list and reduce function

Martin Morgan mtmorgan at fhcrc.org
Mon Aug 25 19:13:57 CEST 2014


On 08/25/2014 05:31 AM, Asma rabe wrote:
> Hi Vincent, Martin,
>
> Thank you very much for your kind explanation.
>
> For Martin:
>
>>For exons grouped by _gene_, it's possible that genes are annotated to contain exons from different chromosomes
>
> How can genes be annotated to contain exons from many different chromosomes?

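I don't know in general, but they are! You can see the reason for some of these
in the output below: the same exon is annotated on chr6 and again on each of the
alternate haplotype sequences (chr6_cox_hap2, chr6_dbb_hap3, ...). There are
more interesting examples.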

 > exByGn[elementLengths(unique(seqnames(exByGn))) > 1]
GRangesList of length 277:
$100126314
GRanges with 7 ranges and 2 metadata columns:
             seqnames               ranges strand |   exon_id   exon_name
                <Rle>            <IRanges>  <Rle> | <integer> <character>
   [1]           chr6 [30552109, 30552194]      + |     87067        <NA>
   [2]  chr6_cox_hap2 [ 2064162,  2064247]      + |    278963        <NA>
   [3]  chr6_dbb_hap3 [ 1845750,  1845835]      + |    280931        <NA>
   [4] chr6_mann_hap4 [ 1900181,  1900266]      + |    282770        <NA>
   [5]  chr6_mcf_hap5 [ 1933963,  1934048]      + |    284213        <NA>
   [6]  chr6_qbl_hap6 [ 1845017,  1845102]      + |    286075        <NA>
   [7] chr6_ssto_hap7 [ 1884391,  1884476]      + |    287961        <NA>

$100128977
GRanges with 4 ranges and 2 metadata columns:
              seqnames               ranges strand | exon_id exon_name
   [1]           chr17 [43920722, 43921527]      - |  227980      <NA>
   [2]           chr17 [43972846, 43972879]      - |  227981      <NA>
   [3] chr17_ctg5_hap1 [  894694,   894727]      + |  289539      <NA>
   [4] chr17_ctg5_hap1 [  946013,   946818]      + |  289540      <NA>

...
<275 more elements>
---
seqlengths:
                   chr1                  chr2 ...        chrUn_gl000249
              249250621             243199373 ...                 38502
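
To try the same check without installing the TxDb package, here is a minimal
sketch on a toy GRangesList. The labels geneA and geneB and the chr10
coordinates are made up for illustration. It uses lengths() where the code above
uses elementLengths(), since later Bioconductor releases renamed
elementLengths() to elementNROWS().

```r
library(GenomicRanges)

# Toy exons; geneA / geneB are hypothetical labels, not real gene ids
gr <- GRanges(
    seqnames = c("chr6", "chr6_cox_hap2", "chr10", "chr10"),
    ranges = IRanges(start = c(30552109, 2064162, 100, 500), width = 86),
    strand = c("+", "+", "-", "-")
)

# Group exons by the toy gene label, giving a GRangesList
exByGn <- split(gr, c("geneA", "geneA", "geneB", "geneB"))

# One entry per gene: number of distinct sequences its exons sit on
nSeq <- lengths(unique(seqnames(exByGn)))

# Keep only genes annotated on more than one sequence
multi <- exByGn[nSeq > 1]
names(multi)
```

On real data, replacing the toy exByGn with exonsBy(txdb, "gene") should give
the same kind of subset as shown above.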

>
>
>
> Best Regards,
> Asma
>
>
> On Fri, Aug 15, 2014 at 11:56 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
>
>     On 08/15/2014 03:20 AM, Asma rabe wrote:
>
>         Hi ,
>
>
>         I need a GRanges object with exon data for a few chromosomes. I got a
>         GRangesList of transcripts and their exons as follows:
>
>
>         library("TxDb.Hsapiens.UCSC.hg19.knownGene")
>
>         txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>
>         tx_Exons <- exonsBy(txdb)
>
>
>
>         1- How do I use reduce on a GRangesList? How do I get the unique exons
>         only and exclude redundant exons?
>
>
>     I'm not sure what this means -- you've asked for exons grouped by
>     transcript, and there are not 'extra' exons in each transcript. Did you want
>     exonsBy(txdb, "gene") ?
>
>     reduce(tx_Exons) reduces within each transcript (list element); I'm not sure
>     what you'd really like to do?
>
>
>
>         2- How do I select exons of certain chromosomes only, e.g. chr10? I tried
>         the following, but I wonder why I got a GRangesList of empty GRanges?
>
>
>     if you want to select transcripts where all exons are in certain
>     chromosomes, note that
>
>        seqnames(tx_Exons) %in% "chr10"
>
>     returns an RleList, and
>
>        all(seqnames(tx_Exons) %in% "chr10")
>
>     asks element-wise whether all elements of each Rle are TRUE, returning a
>     logical vector of the same length as tx_Exons. So
>
>        tx_Exons[all(seqnames(tx_Exons) %in% "chr10")]
>
>     returns the transcripts with all exons on chr10. For exons grouped by
>     _gene_, it's possible that genes are annotated to contain exons from
>     different chromosomes
>
>         exByGn = exonsBy(txdb, "gene")
>         table(elementLengths(runLength(seqnames(exByGn))))
>
>
>         1     2     3     4     5     6     7     8
>     23182    77     4     3    19    38    76    60
>
>     and only exons in chr10, preserving grouping by gene and removing genes
>     without any exons in chr10, are
>
>         chr10 <- exByGn[seqnames(exByGn) %in% "chr10"]
>
>
>     This is what you did below. The result is not empty; it contains many
>     transcripts left with no exons because none of their exons are on chr10,
>     plus those deeper in the list that do have exons on chr10. Here I remove
>     the elements with 0 exons.
>
>         chr10[elementLengths(chr10) != 0]
>
>
>     Martin
>
>
>
>         chr10<-tx_Exons[seqnames(tx_Exons)=="chr10",]
>
>
>             chr10
>
>         GRangesList of length 80922:
>         $1
>         GRanges with 0 ranges and 3 metadata columns:
>              seqnames    ranges strand |   exon_id   exon_name exon_rank
>                 <Rle> <IRanges>  <Rle> | <integer> <character> <integer>
>
>         $2
>         GRanges with 0 ranges and 3 metadata columns:
>                seqnames ranges strand | exon_id exon_name exon_rank
>
>         $3
>         GRanges with 0 ranges and 3 metadata columns:
>                seqnames ranges strand | exon_id exon_name exon_rank
>
>         ...
>         <80919 more elements>
>         ---
>         seqlengths:
>                             chr1                  chr2 ...        chrUn_gl000249
>                        249250621             243199373 ...                 38502
>
>             length(chr10)
>         [1] 80922
>             length(tx_Exons)
>         [1] 80922
>
>
>         Thank you
>
>
>
>     --
>     Computational Biology / Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N.
>     PO Box 19024 Seattle, WA 98109
>
>     Location: Arnold Building M1 B861
>     Phone: (206) 667-2793
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


