[BioC] rtracklayer import.bed pipe inconsistency

Nathan Sheffield nathan.sheffield at duke.edu
Tue Aug 30 11:08:40 CEST 2011


Thanks, I figured it might just come down to updating.

And yes, I did mean nonstandard columns -- even though it's not "really" 
a BED file, it's still helpful to be able to import just the first 3 
columns so that my script can handle any type of BED-like file, standard 
or not.
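
For what it's worth, here is a minimal base-R sketch of the "first three columns only" idea. The helper name read_bed3 is made up for illustration and is not part of rtracklayer. It assumes a whitespace-delimited file with no track/browser header lines, and it returns a plain data frame rather than an rtracklayer object. The extra columns in the toy file are invented placeholders.

```r
# Hypothetical helper (not part of rtracklayer): keep only chrom/start/end
# from a whitespace-delimited BED-like file, ignoring any further columns.
read_bed3 <- function(path) {
  # The widest row decides how many columns read.table must be told about.
  n_fields <- max(count.fields(path, quote = "", comment.char = "#"))
  stopifnot(n_fields >= 3)
  n_extra <- n_fields - 3

  bed <- read.table(
    path,
    header = FALSE, quote = "", comment.char = "#", fill = TRUE,
    col.names = c("chrom", "chromStart", "chromEnd",
                  paste0("extra", seq_len(n_extra))),
    # A colClasses entry of "NULL" makes read.table skip that column.
    colClasses = c("character", "integer", "integer", rep("NULL", n_extra)),
    stringsAsFactors = FALSE
  )

  # BED starts are 0-based; add one to get 1-based closed intervals.
  data.frame(chrom = bed$chrom, start = bed$chromStart + 1L,
             end = bed$chromEnd, stringsAsFactors = FALSE)
}

# Toy file: two of the regions from this thread plus made-up extra columns.
tmp <- tempfile(fileext = ".bed")
writeLines(c("chr17\t60212869\t60218774\tpeakA\t0.7",
             "chr1\t108503808\t108508915\tpeakB\t1.2"), tmp)
res <- read_bed3(tmp)
print(res)
```

The resulting chrom/start/end columns could then be handed to whichever range constructor the installed Bioconductor version provides.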

The ability to select columns will be helpful in the future, thanks.

-Nathan

On 08/30/2011 01:05 AM, Michael Lawrence wrote:
> And also, just btw,
>
> What do you mean by a BED file with more than three columns? rtracklayer can
> read those in just fine, unless they are non-standard columns, in which
> case you really don't have a BED file anyway.
>
> With newer versions of rtracklayer, one can specify the colnames
> argument to select only the desired BED columns. Passing character()
> would give you your desired result.
>
> Michael
>
> On Mon, Aug 29, 2011 at 4:02 PM, Michael Lawrence <michafla at gene.com>
> wrote:
>
>     I can't reproduce this:
>
>     >  import(pipe("cat ~/tmp/pipe-test.bed"), format="bed")
>
>     RangedData with 4 rows and 0 value columns across 3 spaces
>          space                 ranges |
>       <factor>              <IRanges> |
>     1     chr1 [108503809, 108508915] |
>     2    chr17 [ 60212870,  60218774] |
>     3     chr8 [ 86373507,  86380637] |
>     4     chr8 [ 99303547,  99307608] |
>     >  sessionInfo()
>     R version 2.14.0 Under development (unstable) (--)
>     Platform: i686-pc-linux-gnu (32-bit)
>
>     locale:
>     [1] C
>
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>     [1] rtracklayer_1.13.12 RCurl_1.5-0         bitops_1.0-4.1
>
>     loaded via a namespace (and not attached):
>     [1] BSgenome_1.21.3      Biostrings_2.21.6    GenomicRanges_1.5.21
>     [4] IRanges_1.11.16      XML_3.2-0            zlibbioc_0.1.6
>
>
>     I don't remember this being an issue in the past, but who knows. My
>     only recommendation is to upgrade your R and rtracklayer.
>
>     Michael
>
>
>     On Mon, Aug 29, 2011 at 8:56 AM, Nathan Sheffield
>     <nathan.sheffield at duke.edu> wrote:
>
>         Hi,
>
>         I am having trouble importing a BED file after running it
>         through pipe() in R. Maybe it's a bug in rtracklayer's
>         import.bed? Or maybe I'm missing a setting. Can anyone help
>         with this?
>
>         I have a bed file ("code25.bed") with 4 lines:
>         chr17 60212869 60218774
>         chr1 108503808 108508915
>         chr8 86373506 86380637
>         chr8 99303546 99307608
>
>         I can read it into R with read.table like so:
>
>             read.table("Aug5/codeBed/code25.bed")
>
>              V1        V2        V3
>         1 chr17  60212869  60218774
>         2  chr1 108503808 108508915
>         3  chr8  86373506  86380637
>         4  chr8  99303546  99307608
>
>         I want to use rtracklayer to import it and get a GenomicRanges
>         object, so I try import.bed, which also works:
>
>             import.bed("Aug5/codeBed/code25.bed")
>
>         RangedData with 4 rows and 0 value columns across 3 spaces
>                 space                 ranges |
>           <character>              <IRanges> |
>         1        chr1 [108503809, 108508915] |
>         2       chr17 [ 60212870,  60218774] |
>         3        chr8 [ 86373507,  86380637] |
>         4        chr8 [ 99303547,  99307608] |
>
>         Now, I want this to work on BED files with more than 3 columns,
>         just in case. I can do this with a command-line pipe using cut,
>         like so:
>
>             read.table(pipe(paste("cut -f1,2,3 ",
>             "Aug5/codeBed/code25.bed")))
>
>              V1        V2        V3
>         1 chr17  60212869  60218774
>         2  chr1 108503808 108508915
>         3  chr8  86373506  86380637
>         4  chr8  99303546  99307608
>
>         So this gives the exact same output as the first read.table
>         above. However, when I try to pass this pipe to import.bed,
>         something strange happens:
>
>             import.bed(pipe(paste("cut -f1,2,3 ",
>             "Aug5/codeBed/code25.bed")))
>
>         RangedData with 5 rows and 0 value columns across 3 spaces
>                 space                 ranges |
>           <character>              <IRanges> |
>         1        chr1 [108503809, 108508915] |
>         2       chr17 [ 60212870,  60218774] |
>         3       chr17 [ 60212870,  60218774] |
>         4        chr8 [ 86373507,  86380637] |
>         5        chr8 [ 99303547,  99307608] |
>
>         Not sure why, but it has duplicated one of the regions and now
>         has 5, instead of 4. This is a problem with import.bed combined
>         with pipe, and has nothing to do with cut:
>
>             import.bed(pipe("cat Aug5/codeBed/code25.bed"))
>
>         RangedData with 5 rows and 0 value columns across 3 spaces
>                 space                 ranges |
>           <character>              <IRanges> |
>         1        chr1 [108503809, 108508915] |
>         2       chr17 [ 60212870,  60218774] |
>         3       chr17 [ 60212870,  60218774] |
>         4        chr8 [ 86373507,  86380637] |
>         5        chr8 [ 99303547,  99307608] |
>
>
>         Any ideas?
>
>         -Nathan Sheffield
>         Duke University, Computational Biology Program
>
>         sessionInfo follows:
>
>         R version 2.12.0 (2010-10-15)
>         Platform: x86_64-unknown-linux-gnu (64-bit)
>         locale:
>           [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>           [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>           [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>           [7] LC_PAPER=no_NO.UTF-8       LC_NAME=C
>           [9] LC_ADDRESS=C               LC_TELEPHONE=C
>         [11] LC_MEASUREMENT=no_NO.UTF-8 LC_IDENTIFICATION=C
>
>         attached base packages:
>         [1] stats     graphics  grDevices utils     datasets  methods   base
>
>         other attached packages:
>         [1] rtracklayer_1.10.6  RCurl_1.4-3         bitops_1.0-4.1
>         [4] GenomicRanges_1.2.1 IRanges_1.8.7
>
>         loaded via a namespace (and not attached):
>         [1] Biobase_2.10.0    Biostrings_2.18.0 BSgenome_1.18.0   XML_3.2-0
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org
>         https://stat.ethz.ch/mailman/listinfo/bioconductor
>         Search the archives:
>         http://news.gmane.org/gmane.science.biology.informatics.conductor


