[BioC] consensusString Function

Patrick Aboyoun paboyoun at fhcrc.org
Wed Apr 7 00:29:12 CEST 2010


Erik, Heidi, and Wolfgang,
To bring this thread full circle, Biostrings::consensusString didn't 
support ambiguity letters in input strings for BioC <= 2.5 (R <= 2.10). 
This support has been added to BioC 2.6 (R 2.11), but as Wolfgang 
mentioned there was a bug in the code. This bug has now been fixed in 
BioC 2.6 and will be available for download from bioconductor.org within 
36 hours. Here are some examples of consensusString operating on 
DNAStringSet objects containing ambiguity letters:

 >   consensusString(DNAStringSet(c("NNNN","ACTG")))
[1] "ACTG"
 >   consensusString(DNAStringSet(c("AANN","ACTG")))
[1] "AMTG"
 >   consensusString(DNAStringSet(c("ACAG","ACAR")))
[1] "ACAR"
 >   consensusString(DNAStringSet(c("ACAG","ACAR", "ACAG")))
[1] "ACAG"
 > sessionInfo()
R version 2.11.0 alpha (2010-04-04 r51591)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.15.27 IRanges_1.5.74

loaded via a namespace (and not attached):
[1] Biobase_2.7.5



On 4/6/10 2:36 PM, Wolfgang Huber wrote:
> Hi Erik, Herv'e
>
> please always provide the output of sessionInfo(), and a complete 
> reproducible example (you let Heidi and the others guess that you're 
> talking about the Biostrings package).
>
> With a more recent version of Biostrings, I get:
>
> library("Biostrings")
> consensusString( DNAStringSet(c("AAAA","ACTG")) )
> # [1] "AMWR"
>
> consensusString( DNAStringSet(c("AAAB","ACTG")) )
> # Error in FUN(newX[, i], ...) :
>   'ambiguityMap' is missing some combinations of row names
>
> And going into the debugger where the error is caused, i.e. within the 
> function "consensusLetter", the expression
>
> i <- paste(all_letters[col >= threshold], collapse = "")
>
> is evaluated where
> Browse[1]> all_letters
> [1] "A" "C" "G" "T" "B"    ## length is 5
> Browse[1]> col
> [1] 1 0 0 0                ## length is 4
> Browse[1]> i
> [1] "AB"                   ## recycling rule was applied
>
> which seems unintended and with some more insight will probably lead 
> to the fixing of a bug.
>
>     Best wishes
>     Wolfgang
>
>
> > sessionInfo()
> R version 2.12.0 Under development (unstable) (2010-04-06 r51617)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=C              LC_NUMERIC=C            LC_TIME=C
>  [4] LC_COLLATE=C            LC_MONETARY=C LC_MESSAGES=it_IT.UTF-8
>  [7] LC_PAPER=C              LC_NAME=C               LC_ADDRESS=C
> [10] LC_TELEPHONE=C          LC_MEASUREMENT=C        LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] Biostrings_2.15.26 IRanges_1.5.74     fortunes_1.3-7
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.7.5
>
> Heidi Dvinge ha scritto:
>> Hello Erik,
>>
>> unfortunately I'm not familiar with the Biostrings package, so I can't
>> tell you why this doesn't work, but until someone else can answer, this
>> seems to be a work-around.
>>
>> Apparently, consensusString doesn't handle Ns.
>>
>>> test <- DNAStringSet(c("AANN","ACTG"))
>>> consensusString(test)
>> Error in .local(x, ...) :
>>   'threshold' must be a numeric in (0, 1/sum(rowSums(x) > 0)]
>>
>> If there are no Ns things are okay though.
>>
>>> test2 <- DNAStringSet(c("AAAA","ACTG"))
>>> consensusString(test2)
>> [1] "AMWR"
>>
>> However, Ns seem acceptable if the consensus matrix is calculated first,
>> although they might result in ?s where no consensus could be found.
>>
>>> test3 <- consensusMatrix(test)
>>> consensusString(test3)
>> [1] "A???"
>>
>> HTH
>> \Heidi
>>
>>
>>> Hello,
>>>
>>> I am trying to get a consensus string for a DNAStringSet, but I am 
>>> getting
>>> an error.  The documentation for consensusString says the argument 
>>> "x" is
>>> either a consensus matrix or an XStringSet.  So this should work, 
>>> right?:
>>>> myDNAStringSet <- DNAStringSet(c("NNNN","ACTG"))
>>>> consensusString(myDNAStringSet)
>>> Error in .local(x, ...) :
>>>   'threshold' must be a numeric in (0, 1/sum(rowSums(x) > 0)]
>>>
>>> Specifying a threshold in the arguments doesn't seem to make a 
>>> difference.
>>>
>>> Thanks!,
>>> Erik
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list