[BioC] export function alters ranges in output BED file

Hervé Pagès hpages at fhcrc.org
Sun Sep 14 21:18:17 CEST 2014


Hi Dolev,

This is due to different conventions to represent ranges:

- Bioconductor uses 1-based start and end positions for ranges.

- The BED format and other UCSC file formats use 0-based start
   positions and 1-based end positions for ranges:

     http://genome.ucsc.edu/FAQ/FAQformat.html#format1

The import() and export() functions in rtracklayer are aware of that and
make the correction for you.
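The arithmetic behind this correction is small. The sketch below is written in Python rather than R purely as a language-neutral illustration. The helper names (granges_to_bed, bed_to_granges, granges_covers, bed_covers) are hypothetical, and this is not how rtracklayer implements export(). It converts a 1-based closed range to a BED interval (0-based start, exclusive end). It then checks that a 1-based position, such as a VCF POS, falls inside the original range exactly when it falls inside the converted interval. The shifted start therefore should not by itself cause false positives, assuming the tool that reads the BED file honours the UCSC convention.

```python
from typing import Tuple

# Hypothetical helpers illustrating the coordinate arithmetic only;
# they are not part of rtracklayer or any other library.


def granges_to_bed(start: int, end: int) -> Tuple[int, int]:
    """1-based closed [start, end] -> BED (0-based chromStart, exclusive chromEnd)."""
    # Only the start moves. The exclusive 0-based end equals the
    # inclusive 1-based end, so `end` is written unchanged.
    return start - 1, end


def bed_to_granges(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Inverse of granges_to_bed (the correction applied on import)."""
    return chrom_start + 1, chrom_end


def granges_covers(start: int, end: int, pos: int) -> bool:
    """Does the 1-based closed range contain the 1-based position `pos`?"""
    return start <= pos <= end


def bed_covers(chrom_start: int, chrom_end: int, pos: int) -> bool:
    """Does the BED interval contain the 1-based position `pos` (e.g. a VCF POS)?"""
    # 1-based `pos` is 0-based base `pos - 1`, which lies in the half-open
    # interval [chrom_start, chrom_end) exactly when this holds:
    return chrom_start <= pos - 1 < chrom_end


if __name__ == "__main__":
    # Three of the GRanges rows from the question, as (start, end).
    rows = [(37067241, 37067242), (37067125, 37067126), (37061954, 37061955)]
    for start, end in rows:
        bed = granges_to_bed(start, end)
        print("3", *bed, sep="\t")
        # Both representations describe the same bases, so overlap tests agree.
        for pos in range(start - 2, end + 3):
            assert granges_covers(start, end, pos) == bed_covers(*bed, pos)
```

Run as a script, it prints the first three BED columns for those rows (for example `3	37067240	37067242`), which match the corresponding lines of the BED output in the question.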

Hope this helps,

H.


On 09/14/2014 07:42 AM, do r wrote:
> Hello
>
> I am attempting to use the export() function to generate a BED file from a
> GRanges object.
> However, the ranges in the output file are altered: one is subtracted
> from the start coordinate.
> For example:
>
>    [987]        3 [37035154, 37035155]      +   |  Class 4     MLH1    c.116+1G>A
>    [988]        3 [37067241, 37067242]      +   |  Class 4     MLH1     c.1153C>T
>    [989]        3 [37067125, 37067126]      +   |  Class 4     MLH1   c.1039-2A>T
>    [990]        3 [37067125, 37067126]      +   |  Class 4     MLH1   c.1039-2A>G
>    [991]        3 [37061954, 37061955]      +   |  Class 4     MLH1   c.1038+1G>C
>
> results in this output:
> 3	37067240	37067242	.	0	+
> 3	37067124	37067126	.	0	+
> 3	37067124	37067126	.	0	+
> 3	37061953	37061955	.	0	+
>
> Since I later intend to search for intersections between the
> ranges in the BED file and variants in a VCF file (using Tabix), I am
> afraid that this subtraction may lead to false positives.
>
> What is the reason for this subtraction from the start, and is there
> any way to suppress it?
>
> thanks in advance
>
> Dolev Rahat
>
>
> sessionInfo:
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rtracklayer_1.24.2   GenomicRanges_1.16.4 GenomeInfoDb_1.0.2   IRanges_1.22.10      BiocGenerics_0.10.0
> [6] BiocInstaller_1.14.2 stringr_0.6.2
>
> loaded via a namespace (and not attached):
>  [1] BatchJobs_1.3           BBmisc_1.7              BiocParallel_0.6.1      Biostrings_2.32.1
>  [5] bitops_1.0-6            brew_1.0-6              BSgenome_1.32.0         checkmate_1.4
>  [9] codetools_0.2-9         DBI_0.3.0               digest_0.6.4            fail_1.2
> [13] foreach_1.4.2           GenomicAlignments_1.0.6 iterators_1.0.7         Rcpp_0.11.2
> [17] RCurl_1.95-4.3          Rsamtools_1.16.1        RSQLite_0.11.4          sendmailR_1.1-2
> [21] stats4_3.1.0            tools_3.1.0             XML_3.98-1.1            XVector_0.4.0
> [25] zlibbioc_1.10.0
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


