[BioC] iranges

Valerie Obenchain vobencha at fhcrc.org
Fri Aug 8 18:11:22 CEST 2014


Hi,

Please 'reply all' when responding so communication stays on the list.

If you are working with stranded ranges you should use the GRanges 
container. IRanges is not strand-aware and does not have a strand 
argument. You can see the function signature on the man page by typing

?IRanges

> Usage:
>
>      ## IRanges constructor:
>      IRanges(start=NULL, end=NULL, width=NULL, names=NULL)
>
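
As a minimal sketch of the difference, a GRanges object can be built by wrapping an IRanges and supplying the strand separately. The coordinates below are made up for illustration and do not come from any real annotation.

```r
library(GenomicRanges)

# Hypothetical coordinates, chosen only for illustration.
gr <- GRanges(seqnames = "chr3R",
              ranges   = IRanges(start = c(101, 201), end = c(150, 260)),
              strand   = c("+", "-"))

strand(gr)  # the strand is stored alongside the ranges
width(gr)   # width is end - start + 1 for each range
```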

Load a transcript database (TxDb) object and extract transcripts by gene:
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
tx <- transcriptsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, "gene")

Select a gene with transcripts on the negative strand:
gene <- tx[[3]]
gene
> GRanges with 4 ranges and 2 metadata columns:
>       seqnames               ranges strand |     tx_id     tx_name
>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>   [1]    chr3R [12632936, 12655767]      - |     21863 FBtr0306337
>   [2]    chr3R [12633349, 12653845]      - |     21864 FBtr0083388
>   [3]    chr3R [12633349, 12655300]      - |     21865 FBtr0083387
>   [4]    chr3R [12633349, 12655474]      - |     21866 FBtr0300485

GRanges can be manipulated with resize(), trim(), shift(), flank(), 
narrow() and several other methods. To see them type (with the backticks)

?`intra-range-methods`

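and select the page for GRanges. It sounds like resize() is what you're 
looking for. On a GRanges it is strand-aware: with the default 
fix = "start", ranges on the negative strand are anchored at their 5' 
end, which is the 'end' coordinate. Use fix = "end" to anchor at the 3' 
end instead.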

resize(gene, width = 10)
> GRanges with 4 ranges and 2 metadata columns:
>       seqnames               ranges strand |     tx_id     tx_name
>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>   [1]    chr3R [12655758, 12655767]      - |     21863 FBtr0306337
>   [2]    chr3R [12653836, 12653845]      - |     21864 FBtr0083388
>   [3]    chr3R [12655291, 12655300]      - |     21865 FBtr0083387
>   [4]    chr3R [12655465, 12655474]      - |     21866 FBtr0300485
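
To make the anchoring explicit, here is a small sketch comparing the two values of 'fix' on a single negative-strand range. The range itself is hypothetical and only there to keep the arithmetic easy to follow.

```r
library(GenomicRanges)

# A single hypothetical range on the negative strand.
gr <- GRanges("chr3R", IRanges(start = 1001, end = 2000), strand = "-")

# fix = "start" (the default) keeps the 5' end of the feature, which for
# the negative strand is the 'end' coordinate.
five_prime <- resize(gr, width = 10, fix = "start")

# fix = "end" keeps the 3' end of the feature, which for the negative
# strand is the 'start' coordinate.
three_prime <- resize(gr, width = 10, fix = "end")

ranges(five_prime)
ranges(three_prime)
```

For a range on the positive strand the two results would be swapped, which is why IRanges alone (no strand) cannot give the answer you expect for reverse-strand genes.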

If you have sequence data instead of range data, the XStringSet family 
is more appropriate. For examples of manipulating sequences see Section 
E on the XStringSet man page. The functions you want are narrow() or 
subseq().

library(Biostrings)
?XStringSet
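
A rough sketch of the sequence-based approach, using a short made-up DNA string rather than a real gene: for a gene on the reverse strand, the transcript is the reverse complement of the genomic sequence, so the last 10 bases of the transcript come from the first 10 bases of the genomic region.

```r
library(Biostrings)

# Hypothetical 20-base genomic sequence (forward strand).
genomic <- DNAString("ACGTACGTTTGGCCAATTGC")

# Sequence of a feature on the reverse strand.
transcript <- reverseComplement(genomic)

# Last 10 bases of the reverse-strand sequence.
last10 <- subseq(transcript, end = length(transcript), width = 10)

# The same bases, obtained from the forward strand: take the first 10
# genomic bases and reverse-complement them.
first10_rc <- reverseComplement(subseq(genomic, start = 1, width = 10))

identical(as.character(last10), as.character(first10_rc))
```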


Valerie


On 08/08/2014 08:38 AM, carol white wrote:
> I have a problem when I want to take the width from the end of a
> sequence on the reverse strand.
> If I take the nucleotide sequence of a gene that is on the reverse
> strand from the NCBI web site and extract, for example, 10 or 20 bp
> from the end, I don't get the same result as I do with IRanges. As I
> have already given the strand as a parameter to the IRanges function,
> I assumed that the sequence had already been reverse-complemented by
> IRanges. I don't have this problem with genes that are on the forward
> strand, nor when I take the subsequence from the beginning of the
> sequence.
>
> Regards,
> On Friday, August 8, 2014 5:28 PM, Valerie Obenchain
> <vobencha at fhcrc.org> wrote:
>
>
> Did you provide 'start', 'end' and 'width' and get a confusing answer?
> If yes, please show your example.
>
> Thanks.
> Valerie
>
>
>
> On 08/08/2014 08:23 AM, Valerie Obenchain wrote:
>  > Hi Carol,
>  >
>  > The 'end' is the end of the range. When you specify ranges with 'end'
>  > and 'width' the range will always end at the 'end' value.
>  >
>  >  > IRanges(end = 10, width = c(5, 10))
>  > IRanges of length 2
>  >     start end width
>  > [1]     6  10     5
>  > [2]     1  10    10
>  >
>  >
>  > Similar reasoning for 'start' and 'width':
>  >
>  >  > IRanges(start = 10, width = c(5, 10))
>  > IRanges of length 2
>  >     start end width
>  > [1]    10  14     5
>  > [2]    10  19    10
>  >
>  >
>  > Valerie
>  >
>  >
>  >
>  > On 08/08/2014 01:29 AM, carol white wrote:
>  >> Hi,
>  >> How does width work together with start and end in IRanges? I thought
>  >> that if I use the end with a width, then the sequence of length width
>  >> counted back from the end is taken. However, in my case, when I use
>  >> widths of, for example, 20 and 10, the corresponding sequences of
>  >> length 20 and 10 do not match at the end but at the beginning. Did I
>  >> misunderstand something?
>  >>
>  >> Regards,
>  >>
>  >> Carol
>  >>
>  >> _______________________________________________
>  >> Bioconductor mailing list
>  >> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>  >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>  >> Search the archives:
>  >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>  >>
>  >
>  >
>
>
> --
> Valerie Obenchain
> Program in Computational Biology
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, Seattle, WA 98109
>
> Email: vobencha at fhcrc.org <mailto:vobencha at fhcrc.org>
> Phone: (206) 667-3158
>
>


