[BioC] Easy way to convert CharacterList to character, collapsing each element?

Ryan C. Thompson rct at thompsonclan.org
Tue Dec 17 01:21:29 CET 2013


Thanks! I look forward to seeing this in the next release.


On 12/16/2013 04:16 PM, Hervé Pagès wrote:
> Hi Ryan,
>
> Here is one way to do this using Biostrings:
>
>   library(Biostrings)
>
>   strunsplit <- function(x, sep=",")
>   {
>     if (!is(x, "XStringSetList"))
>         x <- Biostrings:::XStringSetList("B", x)
>     if (!isSingleString(sep))
>         stop("'sep' must be a single character string")
>
>     ## unlist twice.
>     unlisted_x <- unlist(x, use.names=FALSE)
>     unlisted_ans0 <- unlist(unlisted_x, use.names=FALSE)
>
>     ## insert 'sep'.
>     unlisted_x_width <- width(unlisted_x)
>     x_partitioning <- PartitioningByEnd(x)
>     at <- cumsum(unlisted_x_width)[-end(x_partitioning)] + 1L
>     unlisted_ans <- replaceAt(unlisted_ans0, at, value=sep)
>
>     ## relist.
>     ans_width <- sum(relist(unlisted_x_width, x_partitioning))
>     x_eltlens <- width(x_partitioning)
>     idx <- which(x_eltlens >= 2L)
>     ans_width[idx] <- ans_width[idx] + (x_eltlens[idx] - 1L) * nchar(sep)
>     relist(unlisted_ans, PartitioningByWidth(ans_width))
>   }
>
> Then:
>
>   > x <- CharacterList(A=c("id35", "id2", "id18"), B=NULL, C="id4",
>   +                    D=c("id2", "id4"))
>   > strunsplit(x)
>     A BStringSet instance of length 4
>       width seq                                               names
>   [1]    13 id35,id2,id18                                     A
>   [2]     0                                                   B
>   [3]     3 id4                                               C
>   [4]     7 id2,id4                                           D
>
> I'll add this to Biostrings.
>
> Cheers,
> H.
>
>
> On 12/16/2013 03:04 PM, Ryan C. Thompson wrote:
>> Hi all,
>>
>> I have some annotation data in a DataFrame, and of course since
>> annotations are not one-to-one, some of the columns are CharacterList or
>> similar classes. I would like to know if there is an efficient way to
>> collapse a CharacterList to a character vector of the same length, such
>> that for elements of length > 1, those elements are collapsed with a
>> given separator. The following is what I came up with, but it is very
>> slow for large CharacterLists:
>>
>> library(stringr)
>> library(plyr)
>> flatten.CharacterList <- function(x, sep=",") {
>>    if (is.list(x)) {
>>      x[!is.na(x)] <- laply(x[!is.na(x)], str_c, collapse=sep,
>>                            .parallel=TRUE)
>>      x <- as(x, "character")
>>    }
>>    x
>> }
>>
>> -Ryan
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
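
For reference, the collapse discussed in this thread can also be sketched in
base R on an ordinary list. This is only a minimal illustration, not the
Biostrings implementation quoted above: the name collapse_list is made up
for this example, it loops over elements with vapply() rather than working
on the unlisted data, and it assumes the input is a plain list (a
CharacterList would presumably need converting first, e.g. with as.list()).

```r
## Hypothetical helper (not part of Biostrings): collapse each element of a
## plain list of character vectors into a single string.
collapse_list <- function(x, sep = ",") {
  stopifnot(is.list(x),
            is.character(sep), length(sep) == 1L, !is.na(sep))
  ## paste() on a zero-length vector with 'collapse' returns "", so NULL
  ## or empty elements become empty strings, as in the output above.
  vapply(x, function(el) paste(as.character(el), collapse = sep),
         character(1), USE.NAMES = TRUE)
}

x <- list(A = c("id35", "id2", "id18"), B = NULL, C = "id4",
          D = c("id2", "id4"))
res <- collapse_list(x)
print(res)
## A = "id35,id2,id18", B = "", C = "id4", D = "id2,id4"
```

Note that NA elements are pasted as the string "NA" here, so they would need
separate handling if they must be preserved.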