[BioC] export function alters ranges in output BED file
Hervé Pagès
hpages at fhcrc.org
Sun Sep 14 21:18:17 CEST 2014
Hi Dolev,
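This is due to different conventions for representing ranges:
- Bioconductor uses 1-based start and end positions, so both ends of a
range are inclusive.
- The BED format and several other UCSC file formats use 0-based start
positions and 1-based end positions (equivalently, 0-based half-open
intervals):
http://genome.ucsc.edu/FAQ/FAQformat.html#format1
The import() and export() functions in rtracklayer are aware of this and
make the conversion for you. The range written to the BED file therefore
covers exactly the same bases as the range in your GRanges object, and
as long as the downstream tool treats the file as BED, the shifted start
should not introduce false positives.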
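To make the arithmetic concrete, here is a minimal sketch in plain Python. It is not rtracklayer code and not how export() is implemented; the function names (to_bed, from_bed, covered_1based, covered_bed) are made up for illustration. It converts one of the ranges from the question and checks that both representations name the same bases.

```python
# Illustrative only: plain-Python arithmetic, not rtracklayer's implementation.
# All function names here are hypothetical.

def to_bed(start, end):
    """1-based closed range [start, end] -> BED-style 0-based half-open [start - 1, end)."""
    return start - 1, end

def from_bed(bed_start, bed_end):
    """BED-style 0-based half-open range -> 1-based closed range."""
    return bed_start + 1, bed_end

def covered_1based(start, end):
    """Set of 1-based positions covered by a 1-based closed range."""
    return set(range(start, end + 1))

def covered_bed(bed_start, bed_end):
    """Set of 1-based positions covered by a BED-style range."""
    # range() is already half-open; add 1 to name each base in 1-based terms.
    return {p + 1 for p in range(bed_start, bed_end)}

granges_range = (37067241, 37067242)   # as shown in the GRanges object
bed_range = to_bed(*granges_range)     # as written to the BED file

print(bed_range)                                   # (37067240, 37067242)
print(from_bed(*bed_range) == granges_range)       # True: the round trip is lossless
print(covered_1based(*granges_range) == covered_bed(*bed_range))  # True: same bases
```

Only the start changes, and the end is left alone, which matches the BED lines shown in the question.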
Hope this helps,
H.
On 09/14/2014 07:42 AM, do r wrote:
> Hello
>
> I am attempting to use the export() function to generate a BED file from a
> GRanges object.
> However, the ranges in the output file are altered so that the start
> coordinate is decreased by one.
> For example:
>
> [987] 3 [37035154, 37035155] + | Class 4 MLH1 c.116+1G>A
> [988] 3 [37067241, 37067242] + | Class 4 MLH1 c.1153C>T
> [989] 3 [37067125, 37067126] + | Class 4 MLH1 c.1039-2A>T
> [990] 3 [37067125, 37067126] + | Class 4 MLH1 c.1039-2A>G
> [991] 3 [37061954, 37061955] + | Class 4 MLH1 c.1038+1G>C
>
> results in this output:
> 3 37067240 37067242 . 0 +
> 3 37067124 37067126 . 0 +
> 3 37067124 37067126 . 0 +
> 3 37061953 37061955 . 0 +
>
> Since I intend later to search for intersections between the
> ranges in the BED file and variants in a VCF file (using Tabix), I am
> afraid that this subtraction may lead to false positives.
>
> What is the reason for this subtraction from the start, and is there
> any way to suppress it?
>
> thanks in advance
>
> Dolev Rahat
>
>
> sessionInfo:
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.24.2 GenomicRanges_1.16.4 GenomeInfoDb_1.0.2 IRanges_1.22.10 BiocGenerics_0.10.0
> [6] BiocInstaller_1.14.2 stringr_0.6.2
>
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.3 BBmisc_1.7 BiocParallel_0.6.1 Biostrings_2.32.1
> [5] bitops_1.0-6 brew_1.0-6 BSgenome_1.32.0 checkmate_1.4
> [9] codetools_0.2-9 DBI_0.3.0 digest_0.6.4 fail_1.2
> [13] foreach_1.4.2 GenomicAlignments_1.0.6 iterators_1.0.7 Rcpp_0.11.2
> [17] RCurl_1.95-4.3 Rsamtools_1.16.1 RSQLite_0.11.4 sendmailR_1.1-2
> [21] stats4_3.1.0 tools_3.1.0 XML_3.98-1.1 XVector_0.4.0
> [25] zlibbioc_1.10.0
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319