[BioC] DNAStringSet_translate error in predictCoding()

Valerie Obenchain vobencha at fhcrc.org
Wed Jun 18 22:17:58 CEST 2014


Hi Jörg,

It looks like your sessionInfo() output was cut off and I can't tell 
what version of VariantAnnotation you have.

Versions >= 1.10.0 detect structural variants and store ALT as either a 
CharacterList or a DNAStringSetList. Since you have a DNAStringSetList, 
all values should be valid bases.

Does this return TRUE?

     hasOnlyBaseLetters(unlist(alt(vcf)))

Are there any non-base characters in the matrix this returns?

     alphabetFrequency(unlist(alt(vcf)))
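
A minimal sketch of how the two checks above could be combined to locate 
(and, if wanted, drop) the records that contain non-base letters. It 
assumes Biostrings is installed. The toy `alt` list is only a stand-in for 
alt(vcf), and the names `flat`, `freq`, `bad`, `drop` and `clean` are 
illustrative. Whether an ambiguity letter such as N is really what 
triggers the error in this file is an assumption until the checks are run 
on the real data.

```r
library(Biostrings)  # DNAStringSet, DNAStringSetList, alphabetFrequency

# Toy stand-in for alt(vcf): the third record has two ALT alleles,
# the fourth contains the IUPAC ambiguity letter N.
alt <- DNAStringSetList(
    DNAStringSet("C"),
    DNAStringSet("T"),
    DNAStringSet(c("G", "A")),
    DNAStringSet("ACN")
)

flat <- unlist(alt)  # a single DNAStringSet holding every ALT allele

# TRUE only if every allele consists solely of A, C, G and T
all_ok <- hasOnlyBaseLetters(flat)

# With baseOnly=TRUE the matrix has columns A, C, G, T and "other";
# a non-zero "other" count flags an allele with a non-base letter.
freq <- alphabetFrequency(flat, baseOnly = TRUE)
bad <- freq[, "other"] > 0

# Map the per-allele flags back onto records: a record is dropped
# if any of its ALT alleles was flagged.
drop <- any(relist(bad, alt))
clean <- alt[!drop]
```

With a real VCF object the same logical vector could presumably be used 
to subset the records, e.g. vcf[!drop], before calling predictCoding().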


To help further I'll need the version of VariantAnnotation and a 
reproducible example.

Valerie



On 06/17/2014 05:45 AM, "Dr. Jörg Linde" wrote:
> Dear bioconductor team,
>
> I have a problem with predictCoding() from the VariantAnnotation package,
> which throws the same error as described here:
> https://stat.ethz.ch/pipermail/bioconductor/2012-November/048940.html
>
> However, after reading my vcf, it clearly has a DNAStringSetList in
> its ALT variable.
> The problem remains when using vcftools to remove indels from the vcf.
> As far as I can see, there are some ALTs with two possibilities.
> Is there anything else which could cause the problem?
>
> I am also aware of this thread
> https://stat.ethz.ch/pipermail/bioconductor/2012-October/048370.html
> but I can't figure out how to remove those lines causing the problem.
>
> Thank you very much
> Jörg
>
>   vcf=readVcf("file.vcf","hg")
>   coding <- predictCoding(vcf, txdb, seqSource=fa)
> Error in .Call2("DNAStringSet_translate", x, DNA_BASE_CODES, lkup, skipcode,  :
>    in 'x[[6655]]': not a base at pos 3
>  > alt(vcf)
> DNAStringSetList of length 142721
> [[1]] C
> [[2]] T
> [[3]] G
> [[4]] G
> [[5]] G
> [[6]] C
> [[7]] C
> [[8]] A
> [[9]] G
> [[10]] C
> ..
> <142711 more elements>
>  > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C


-- 
Valerie Obenchain
Program in Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, Seattle, WA 98109

Email: vobencha at fhcrc.org
Phone: (206) 667-3158


