[BioC] Rsamtools Package Dereferences Symbolic Links

Cook, Malcolm MEC at stowers.org
Wed Oct 17 19:15:10 CEST 2012


Hi Martin,

Great to know the root cause and that a fix is available.

The interim workaround you suggest won't work for me since I am using other
tools (IGV) which expect the index being in the same directory as the
(symlink to the) bam.

Tilde expansion is something I would expect from a login shell.  I am not
surprised that samtools does not understand this syntax.

I would expect the file system API to understand .. as the parent directory
and . as the current directory (`pwd`).  Thus, samtools does not need to do
anything to 'understand' them; it can just pass them along to the file
system's open call.
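A minimal R sketch of the point above, assuming a Unix-like system: relative components such as `.` and `..` are resolved by the operating system when a file is opened, so a relative path can be passed through without first being expanded or normalized. The directory and file names are made up for illustration.

```r
# Illustrative names only. R's file functions expand "~" themselves, but
# "." and ".." are left for the operating system to resolve at open time.
parent <- tempfile("parent_")
child  <- file.path(parent, "child")
dir.create(child, recursive = TRUE)
writeLines("hello", file.path(parent, "example.txt"))

old_wd <- setwd(child)                     # work from inside the child directory
via_dotdot <- readLines("../example.txt")  # parent reached through ".."
via_dot    <- readLines("./../example.txt")
setwd(old_wd)                              # restore the original working directory

via_dotdot   # "hello"
via_dot      # "hello"
```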

Cheers,

Malcolm


On 10/17/12 11:54 AM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:

>Thanks all for your contributions.
>
>There are two things going on, both fixed in the devel version 1.11.2 (the
>best work-around is probably, as Lucas has discovered, to arrange it so that
>the sym links and index files are not in the same directory).
>
>- samtools expects the index file name to be given _without_ .bai; Rsamtools
>now tries to tolerate index file names that the user (e.g., Lucas and Vince)
>specifies with .bai, by checking for and stripping .bai prior to opening the
>file.
>
>- samtools doesn't do tilde expansion, i.e., ~/myfile.bam does not work.
>Rsamtools tried to help the user out by using path.expand (which does tilde
>expansion) and, for good measure, normalizePath, which would replace something
>like ../bams/mybam.bam with the full path to mybam.bam, as well as
>dereferencing symlinks. But samtools seems to know about ./ and ../ and seems
>to be happy with symlinks, so we no longer do normalizePath.
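The first fix can be pictured with a hypothetical helper, `strip_bai()`; this is a sketch of the idea, not the actual Rsamtools code. It assumes, as described above, that samtools appends .bai to whatever index name it receives, so a user-supplied name that already ends in .bai has to be trimmed first.

```r
# Hypothetical helper, not taken from the Rsamtools sources.
# samtools appends ".bai" to the index name it is given, so a trailing
# ".bai" supplied by the user is removed before the name is passed on.
strip_bai <- function(index) {
    sub("\\.bai$", "", index)
}

# Passing the full index file name straight through doubles the suffix:
paste0("link_to_original.bam.bai", ".bai")             # "link_to_original.bam.bai.bai"

# After stripping, appending ".bai" gives back the real index file name:
paste0(strip_bai("link_to_original.bam.bai"), ".bai")  # "link_to_original.bam.bai"

# Names given without ".bai" are left alone:
strip_bai("renamed_index.bam")                         # "renamed_index.bam"
```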
>
>This also allows indexing in the directory in which the symlink occurs, rather
>than in the de-referenced directory.
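A sketch of the second fix in R, assuming a Unix-like system where file.symlink() is supported: path.expand() leaves a symlink path untouched, while normalizePath() resolves it to the link's target. The directory names mirror Lucas's example but are created under tempdir(), the BAM file is an empty placeholder, and the "<bam path>.bai" naming is used only to show which directory an index would end up in.

```r
# Unix-like systems only; names are illustrative and live under tempdir().
root         <- tempfile("symlink_demo_")
readonly_dir <- file.path(root, "readonly_dir")
writable_dir <- file.path(root, "writable_dir")
dir.create(readonly_dir, recursive = TRUE)
dir.create(writable_dir, recursive = TRUE)

target <- file.path(readonly_dir, "original.bam")
file.create(target)                       # empty stand-in for a real BAM file
link <- file.path(writable_dir, "link_to_original.bam")
file.symlink(target, link)                # link -> target

kept     <- path.expand(link)             # tilde expansion only: path unchanged
resolved <- normalizePath(link)           # canonical path: symlink dereferenced

# With an index named "<bam path>.bai", the path used decides the directory:
paste0(kept, ".bai")       # next to the link, in writable_dir
paste0(resolved, ".bai")   # next to the target, in readonly_dir
```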
>
>I'd be interested in knowing if this causes alternative errors.
>
>Martin
>
>On 10/16/2012 10:10 AM, Lucas Swanson wrote:
>> I can try to clarify with a specific example.
>>
>> My original (unindexed) bam file is in a directory to which I do not have
>> write access:
>> ~/R_tests/readonly_dir/original.bam
>>
>> So I create a symbolic link to it in a directory to which I do have write
>> access:
>> ~/R_tests/writable_dir/link_to_original.bam -> ~/R_tests/readonly_dir/original.bam
>>
>> And then I create an index for it in the writable directory:
>> ~/R_tests/writable_dir/link_to_original.bam.bai
>>
>> Then I start up R and get the following:
>> $ R --vanilla
>>
>> R version 2.14.2 (2012-02-29)
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>>    Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>>> library(Rsamtools)
>> Loading required package: IRanges
>>
>> Attaching package: ‘IRanges’
>>
>> The following object(s) are masked from ‘package:base’:
>>
>>      cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
>>      pmin, pmin.int, rbind, rep.int, setdiff, table, union
>>
>> Loading required package: GenomicRanges
>> Loading required package: Biostrings
>>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/link_to_original.bam.bai")
>>> my.bam
>> class: BamFile
>> path: /home/lswanson/R_tests/readonly_dir/original.bam
>> index: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
>> isOpen: FALSE
>>> open(my.bam)
>> Error in open.BamFile(my.bam) : failed to load BAM index
>>    file: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
>> In addition: Warning message:
>> In open.BamFile(my.bam) : [bam_index_load] fail to load BAM index.
>>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/link_to_original.bam")
>>> my.bam
>> class: BamFile
>> path: /home/lswanson/R_tests/readonly_dir/original.bam
>> index: /home/lswanson/R_tests/readonly_dir/original.bam
>> isOpen: FALSE
>>> open(my.bam)
>> Error in open.BamFile(my.bam) : failed to load BAM index
>>    file: /home/lswanson/R_tests/readonly_dir/original.bam
>> In addition: Warning message:
>> In open.BamFile(my.bam) : [bam_index_load] fail to load BAM index.
>>
>>
>> Note that, even though I used the path
>> "~/R_tests/writable_dir/link_to_original.bam" (note WRITABLE_DIR) when I
>> created the BamFile object, when I view the details of the BamFile object,
>> it has changed the path to "/home/lswanson/R_tests/readonly_dir/original.bam"
>> (note READONLY_DIR).
>>
>> This change of the (symbolic link) writable_dir path that I pass to the
>> BamFile constructor to the (target) readonly_dir path is what I mean by the
>> dereferencing of the symbolic link path.
>>
>> Note that, if I change the name of the index file to:
>> ~/R_tests/writable_dir/renamed_index.bam.bai
>>
>> I can get it to work by excluding the ".bai" from the end of the "index"
>> path I use in the BamFile constructor:
>>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/renamed_index.bam")
>>> my.bam
>> class: BamFile
>> path: /home/lswanson/R_tests/readonly_dir/original.bam
>> index: /home/lswanson/R_tests/writable_dir/renamed_index.bam
>> isOpen: FALSE
>>> open(my.bam)
>>> my.bam
>> class: BamFile
>> path: /home/lswanson/R_tests/readonly_dir/original.bam
>> index: /home/lswanson/R_tests/writable_dir/renamed_index.bam
>> isOpen: TRUE
>>
>> The behaviour that I expect is that when I make a command like:
>>> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam")
>> Then the "path" attribute of my.bam should be
>> "/home/lswanson/R_tests/writable_dir/link_to_original.bam", NOT
>> "/home/lswanson/R_tests/readonly_dir/original.bam"
>>
>> It looks like another aspect of the problem is that the "index" parameter of
>> the BamFile constructor is not actually expecting the full path to the index
>> file, since it automatically adds ".bai" to whatever path it is given (even
>> if the path already ends with ".bai"). And the error message is a little
>> misleading, since when it says:
>> Error in open.BamFile(my.bam) : failed to load BAM index
>>    file: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
>> the path that it is actually trying to open as the index file is:
>> /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai.bai
>> which, of course, does not exist.
>>
>> ~Lucas
>>
>> ________________________________________
>> From: Vincent Carey [stvjc at channing.harvard.edu]
>> Sent: Tuesday, October 16, 2012 8:19 AM
>> To: Cook, Malcolm
>> Cc: Lucas Swanson; bioconductor at r-project.org
>> Subject: Re: [BioC] Rsamtools Package Dereferences Symbolic Links
>>
>> I did not find these notes to be particularly clear.  I knew that BamFile
>> allows both file and index to be specified separately.
>>
>> In the following, ex1.bam is a symbolic link in the current working folder,
>> and ex1.bam.bai is the physical index file.
>>
>>> X = BamFile("ex1.bam", index="./ex1.bam.bai")
>>> X
>> class: BamFile
>> path: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/ex1.bam
>> index: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/FO.../ex1.bam.bai
>> isOpen: FALSE
>> yieldSize: NA
>>> open(X)
>> Error in open.BamFile(X) : failed to load BAM index
>>    file: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/FOO/ex1.bam.bai
>> In addition: Warning message:
>> In open.BamFile(X) : [bam_index_load] fail to load BAM index.
>>
>> I suppose the "dereferencing" refers to the fact that FO... is not present
>> in the path report on X.
>>
>> I was surprised that the error was thrown.
>>
>>> sessionInfo()
>> R Under development (unstable) (2012-10-07 r60889)
>> Platform: x86_64-apple-darwin10.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.US-ASCII/en_US.US-ASCII/en_US.US-ASCII/C/en_US.US-ASCII/en_US.US-ASCII
>>
>> attached base packages:
>> [1] stats     graphics  grDevices datasets  utils     tools     methods
>> [8] base
>>
>> other attached packages:
>> [1] Rsamtools_1.10.1     Biostrings_2.26.1    GenomicRanges_1.10.1
>> [4] IRanges_1.16.2       BiocGenerics_0.4.0   BiocInstaller_1.8.2
>> [7] weaver_1.24.0        codetools_0.2-8      digest_0.5.2
>>
>> loaded via a namespace (and not attached):
>> [1] bitops_1.0-4.1  parallel_2.16.0 stats4_2.16.0   zlibbioc_1.4.0
>>
>>
>> On Tue, Oct 16, 2012 at 11:00 AM, Cook, Malcolm <MEC at stowers.org> wrote:
>> +1
>>
>> I too have noticed this.  Further, and consistent with this, if you use the
>> Rsamtools package to create indices for symlinks pointing to bamfiles, the
>> indices are created in the target directory.
>>
>> I think if there is a code change to address this issue by allowing control
>> over whether links are dereferenced, the DEFAULT should be NOT to
>> dereference like this.
>>
>> --Malcolm Cook
>>
>>
>> On 10/15/12 9:24 PM, "Lucas Swanson" <lswanson at bcgsc.ca> wrote:
>>
>>> Hello,
>>>
>>> I am attempting to use your Rsamtools Bioconductor package.
>>> Unfortunately, I am having a bit of trouble. You see, my BAM files are in a
>>> directory to which I do not have write access, and are too large for me to
>>> copy to my own directory. So I created symbolic links in my own directory,
>>> pointing to the BAM files, and then indexed them in my own directory.
>>> However, when I try to use these symbolic links, the Rsamtools package
>>> dereferences the links and looks for the indexes in the original directory
>>> (to which I do not have write access), rather than in my own directory.
>>>
>>> Is there any way to prevent Rsamtools from dereferencing symbolic links?
>>> (That is, not replacing paths to symbolic links with paths to the target of
>>> the link.)
>>>
>>> ~Thank you,
>>> Lucas Swanson
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
>
>-- 
>Computational Biology / Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N.
>PO Box 19024 Seattle, WA 98109
>
>Location: Arnold Building M1 B861
>Phone: (206) 667-2793


