[BioC] how to operate on a DNAStringSet object

Steve Lianoglou mailinglist.honeypot at gmail.com
Thu Mar 21 22:05:40 CET 2013


Hi,

On Thu, Mar 21, 2013 at 4:48 PM, Chris Seidel <seidel at phaget4.org> wrote:
[aggressive clipping]

> What's odd, is that this actually works:
>
> DNAStringSet(do.call(c,unlist(myRandomizedseqs)))
>
> *IF* the sequences are NOT NAMED.

This (or something similar) has come up before on the ML, but I don't
have time to search for it right now. In that thread I suggested using
"unname" defensively to sidestep these corner cases. Perhaps that will
help you find the thread when searching the archives. In any event,
you could do:

R> DNAStringSet(do.call(c, unname(unlist(...))))

Now that I look at your example, I think the thread I'm talking about
might have been slightly different, but I guess this should still work
in your case.
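
As a rough illustration of why names can matter here: do.call() turns
the names of its argument list into argument names for the function it
calls. The sketch below shows only that mechanism, using plain
character vectors and made-up names ("first", "second"). Whether this
is exactly what trips up the DNAString case is an assumption on my
part.

```r
# Hypothetical stand-in for a named list of sequences.
args <- list(first = "GATACA", second = "GATCCTAA")

# With names, this is the call c(first = "GATACA", second = "GATCCTAA"),
# so the list names travel along as argument names.
named <- do.call(c, args)

# After unname(), it is the call c("GATACA", "GATCCTAA").
unnamed <- do.call(c, unname(args))

names(named)    # the list names survive
names(unnamed)  # NULL
```

Dropping the names before the call means the method being dispatched
to only ever sees positional arguments.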

> How does one operate on the sequences of a DNAStringSet object without
> getting a list back, or without a for loop? I'm sure there's some
> elegant one-liner that completely escapes me.

To randomize the sequences, you could do:

R> xx <- DNAStringSet(c("GATACA", "GATCCTAA"))
R> endoapply(xx, sample)
  A DNAStringSet instance of length 2
    width seq
[1]     6 ACGATA
[2]     8 GTCATAAC

Where did that come from, right?

Note that a DNAStringSet is an IRanges::Vector, and you'll find lots
of things in the IRangesOverview vignette, which at first might seem
too long/detailed to read, but will be worth your time.

Not sure how fast this will be on a large XStringSet object, though. It
may not buy you any more speed than the for loop, but I can't test that
right now. Perhaps lapply(xx, sample) might be faster (note that it
returns a plain list rather than a DNAStringSet), but I'll leave this
as an exercise for the reader.
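
If you want to check that a shuffle is doing what you expect, something
along these lines might help. It is only a sketch: it assumes the
Biostrings package is installed, shuffle_chars() is a hypothetical
helper I made up for the comparison, and I have not timed either
approach.

```r
library(Biostrings)

set.seed(1)  # only so that repeated runs give the same shuffle
xx <- DNAStringSet(c("GATACA", "GATCCTAA"))

# Approach from above: shuffle each DNAString in place.
shuffled <- endoapply(xx, sample)

# Hypothetical alternative: shuffle plain character strings in base R,
# then rebuild the DNAStringSet from the result.
shuffle_chars <- function(s) {
  paste(sample(strsplit(s, "")[[1]]), collapse = "")
}
rebuilt <- DNAStringSet(vapply(as.character(xx), shuffle_chars,
                               character(1), USE.NAMES = FALSE))

# A shuffle should keep each sequence's width and base composition.
same_width <- all(width(shuffled) == width(xx)) &&
  all(width(rebuilt) == width(xx))
same_bases <- all(alphabetFrequency(shuffled, baseOnly = TRUE) ==
                  alphabetFrequency(xx, baseOnly = TRUE)) &&
  all(alphabetFrequency(rebuilt, baseOnly = TRUE) ==
      alphabetFrequency(xx, baseOnly = TRUE))
```

Wrapping each approach in system.time() on a realistically sized object
would be one way to settle the speed question.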

HTH,
-steve

-- 
Steve Lianoglou
Defender of The Thesis
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


