[BioC] Rsamtools Package Dereferences Symbolic Links

Lucas Swanson lswanson at bcgsc.ca
Tue Oct 16 19:10:50 CEST 2012


I can try to clarify with a specific example.

My original (unindexed) bam file is in a directory to which I do not have write access:
~/R_tests/readonly_dir/original.bam

So I create a symbolic link to it in a directory to which I do have write access:
~/R_tests/writable_dir/link_to_original.bam -> ~/R_tests/readonly_dir/original.bam

And then I create an index for it in the writable directory:
~/R_tests/writable_dir/link_to_original.bam.bai

Then I start up R and get the following:
$ R --vanilla

R version 2.14.2 (2012-02-29)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rsamtools)
Loading required package: IRanges

Attaching package: ‘IRanges’

The following object(s) are masked from ‘package:base’:

    cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
    pmin, pmin.int, rbind, rep.int, setdiff, table, union

Loading required package: GenomicRanges
Loading required package: Biostrings
> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/link_to_original.bam.bai")
> my.bam
class: BamFile 
path: /home/lswanson/R_tests/readonly_dir/original.bam
index: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
isOpen: FALSE 
> open(my.bam)
Error in open.BamFile(my.bam) : failed to load BAM index
  file: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
In addition: Warning message:
In open.BamFile(my.bam) : [bam_index_load] fail to load BAM index.
> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/link_to_original.bam")
> my.bam
class: BamFile 
path: /home/lswanson/R_tests/readonly_dir/original.bam
index: /home/lswanson/R_tests/readonly_dir/original.bam
isOpen: FALSE 
> open(my.bam)
Error in open.BamFile(my.bam) : failed to load BAM index
  file: /home/lswanson/R_tests/readonly_dir/original.bam
In addition: Warning message:
In open.BamFile(my.bam) : [bam_index_load] fail to load BAM index.


Note that, even though I used the path "~/R_tests/writable_dir/link_to_original.bam" (note WRITABLE_DIR) when I created the BamFile object, the details of the object show that its path has been changed to "/home/lswanson/R_tests/readonly_dir/original.bam" (note READONLY_DIR).

This replacement of the symbolic-link path (in writable_dir) that I pass to the BamFile constructor with the path of the link's target (in readonly_dir) is what I mean by dereferencing the symbolic link.
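
The same effect can be reproduced outside Rsamtools with a minimal sketch in base R. It assumes (I have not confirmed this in the Rsamtools source) that the constructor canonicalises its path with something like normalizePath(), which follows symbolic links on Unix-alikes, whereas path.expand() only expands a leading "~". The temporary directory and the empty stand-in file are made up for illustration.

```r
# Sketch only: temporary stand-ins for the real BAM file and its link.
# Assumes a Unix-alike on which file.symlink() is permitted.
root <- tempfile("R_tests_")
dir.create(file.path(root, "readonly_dir"), recursive = TRUE)
dir.create(file.path(root, "writable_dir"))
root <- normalizePath(root)  # canonicalise the temp dir itself first

target <- file.path(root, "readonly_dir", "original.bam")
link   <- file.path(root, "writable_dir", "link_to_original.bam")
stopifnot(file.create(target), file.symlink(target, link))

kept     <- path.expand(link)    # only expands "~"; the link is left alone
resolved <- normalizePath(link)  # canonical path; follows the link

print(basename(dirname(kept)))      # directory of the link
print(basename(dirname(resolved)))  # directory of the link's target
```

If the assumption holds, the second result is the counterpart of the "path:" line that the BamFile object reports, and the first is what I would expect it to report instead.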

Note that, if I change the name of the index file to:
~/R_tests/writable_dir/renamed_index.bam.bai

I can get it to work, by excluding the ".bai" from the end of the "index" path I use in the BamFile constructor:
> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam", index="~/R_tests/writable_dir/renamed_index.bam")
> my.bam
class: BamFile 
path: /home/lswanson/R_tests/readonly_dir/original.bam
index: /home/lswanson/R_tests/writable_dir/renamed_index.bam
isOpen: FALSE 
> open(my.bam)
> my.bam
class: BamFile 
path: /home/lswanson/R_tests/readonly_dir/original.bam
index: /home/lswanson/R_tests/writable_dir/renamed_index.bam
isOpen: TRUE 

The behaviour that I expect is that when I run a command like:
> my.bam <- BamFile("~/R_tests/writable_dir/link_to_original.bam")
the "path" attribute of my.bam should be "/home/lswanson/R_tests/writable_dir/link_to_original.bam", NOT "/home/lswanson/R_tests/readonly_dir/original.bam".

It looks like another aspect of the problem is that the "index" parameter of the BamFile constructor does not actually expect the full path to the index file, since ".bai" is automatically appended to whatever path it is given (even if the path already ends with ".bai"). The error message is also a little misleading, since when it says:
Error in open.BamFile(my.bam) : failed to load BAM index
  file: /home/lswanson/R_tests/writable_dir/link_to_original.bam.bai
The path that it is actually trying to open as the index file is:
/home/lswanson/R_tests/writable_dir/link_to_original.bam.bai.bai
Which, of course, does not exist.
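
Until that is changed, one workaround is to strip a trailing ".bai" before passing the path as "index". A minimal sketch in base R follows; strip_bai is a hypothetical helper of my own, not part of Rsamtools, and it relies on the suffix-appending behaviour observed above with this version.

```r
# Hypothetical helper (not an Rsamtools function): drop one trailing ".bai"
# so that the ".bai" appended when the index is loaded yields the real file name.
strip_bai <- function(index_path) {
  sub("\\.bai$", "", index_path)
}

index_file <- "~/R_tests/writable_dir/renamed_index.bam.bai"
index_arg  <- strip_bai(index_file)
print(index_arg)

# The result would then be passed as, e.g.,
#   BamFile("~/R_tests/writable_dir/link_to_original.bam", index = index_arg)
```

This only helps when the stripped path is not itself a symbolic link to the BAM file. With the original index name, stripping ".bai" gives the path of the link, which is dereferenced to readonly_dir as in my second attempt above, so the index still has to be renamed.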

~Lucas

________________________________________
From: Vincent Carey [stvjc at channing.harvard.edu]
Sent: Tuesday, October 16, 2012 8:19 AM
To: Cook, Malcolm
Cc: Lucas Swanson; bioconductor at r-project.org
Subject: Re: [BioC] Rsamtools Package Dereferences Symbolic Links

I did not find these notes to be particularly clear.  I knew that BamFile allows both file and index to be specified separately.

In the following, ex1.bam is a symbolic link in current working folder, and ex1.bam.bai is the physical index file.

> X = BamFile("ex1.bam", index="./ex1.bam.bai")
> X
class: BamFile
path: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/ex1.bam
index: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/FO.../ex1.bam.bai
isOpen: FALSE
yieldSize: NA
> open(X)
Error in open.BamFile(X) : failed to load BAM index
  file: /Users/stvjc/ExternalSoft/Rpacks/Rsamtools/inst/extdata/FOO/ex1.bam.bai
In addition: Warning message:
In open.BamFile(X) : [bam_index_load] fail to load BAM index.

I suppose the "dereferencing" refers to the fact that FO... is not present in the path reported for X.

I was surprised that the error was thrown.

> sessionInfo()
R Under development (unstable) (2012-10-07 r60889)
Platform: x86_64-apple-darwin10.8.0/x86_64 (64-bit)

locale:
[1] en_US.US-ASCII/en_US.US-ASCII/en_US.US-ASCII/C/en_US.US-ASCII/en_US.US-ASCII

attached base packages:
[1] stats     graphics  grDevices datasets  utils     tools     methods
[8] base

other attached packages:
[1] Rsamtools_1.10.1     Biostrings_2.26.1    GenomicRanges_1.10.1
[4] IRanges_1.16.2       BiocGenerics_0.4.0   BiocInstaller_1.8.2
[7] weaver_1.24.0        codetools_0.2-8      digest_0.5.2

loaded via a namespace (and not attached):
[1] bitops_1.0-4.1  parallel_2.16.0 stats4_2.16.0   zlibbioc_1.4.0


On Tue, Oct 16, 2012 at 11:00 AM, Cook, Malcolm <MEC at stowers.org> wrote:
+1

I too have noticed this.  Further, consistent with this, if you use the
Rsamtools package to create indices for symlinks pointing to BAM files,
the indices are created in the target directory.

I think if there is a code change to address this issue by allowing
control over whether links are dereferenced, the DEFAULT should be NOT to
dereference like this.

--Malcolm Cook


On 10/15/12 9:24 PM, "Lucas Swanson" <lswanson at bcgsc.ca> wrote:

>Hello,
>
>I am attempting to use your Rsamtools Bioconductor package.
>Unfortunately, I am having a bit of trouble. You see, my BAM files are in
>a directory to which I do not have write access, and are too large for me
>to copy to my own directory. So I created symbolic links in my own
>directory, pointing to the BAM files, and then indexed them in my own
>directory. However, when I try to use these symbolic links the Rsamtools
>package dereferences the links, and looks for the indexes in the original
>directory (to which I do not have write access), rather than in my own
>directory.
>
>Is there any way to prevent Rsamtools from dereferencing symbolic links?
>(That is, not replacing paths to symbolic links with paths to the target
>of the link)
>
>~Thank you,
>Lucas Swanson
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at r-project.org
>https://stat.ethz.ch/mailman/listinfo/bioconductor
>Search the archives:
>http://news.gmane.org/gmane.science.biology.informatics.conductor
