[BioC] problems with strand in predictCoding

Thu Apr 26 08:59:25 CEST 2012

Hi all,

Thanks for the feedback and discussion on this topic. A patch for 
predictCoding() has been applied to 1.2.7 release/1.3.9 devel. In the 
devel branch I've created a subfunction of predictCoding() for more 
convenient testing. It accepts a GRangesList (expected to be the 
cdsbytx). The behavior is as follows,

library(BSgenome.Hsapiens.UCSC.hg19)
fun <- VariantAnnotation:::.predictCodingGRangesList
cdsbytx <- GRangesList(tx1=GRanges(seqnames="chr1",

IRanges(c(10001, 10010), width=5),
                                                            strand="+", 
cds_rank=c(1, 2)),
                                      tx2=GRanges(seqnames="chr1",

IRanges(c(10100, 10001), width=5),
                                                            strand="-", 
cds_rank=c(1,2)))
variant=DNAStringSet(c("G", "G", "C", "T", "G"))
query <- GRanges(seqnames="chr1",
                              ranges=IRanges(c(rep(10003, 3), 10011, 
10101), width=1),
                              strand=c("+", "-", "*", "*", "*"), 
variant=variant)
names(query) <- LETTERS[1:5]
coding <- fun(query, cdsbytx, Hsapiens, variant)

 > coding
GRanges with 6 ranges and 13 elementMetadata cols:
     seqnames         ranges strand |        variant      varAllele    
CDSLOC
<Rle> <IRanges> <Rle> | <DNAStringSet> <DNAStringSet> <IRanges>
   A     chr1 [10003, 10003]      + |              G              G    
[3, 3]
   B     chr1 [10003, 10003]      - |              G              C    
[8, 8]
   C     chr1 [10003, 10003]      + |              C              C    
[3, 3]
   C     chr1 [10003, 10003]      - |              C              G    
[8, 8]
   D     chr1 [10011, 10011]      + |              T              T    
[7, 7]
   E     chr1 [10101, 10101]      - |              G              C    
[4, 4]
                  PROTEINLOC   QUERYID        TXID       CDSID      GENEID
<CompressedIntegerList> <integer> <character> <character> <character>
   A                       1         1         tx1 <NA> <NA>
   B                       3         2         tx2 <NA> <NA>
   C                       1         3         tx1 <NA> <NA>
   C                       3         3         tx2 <NA> <NA>
   D                       3         4         tx1 <NA> <NA>
   E                       2         5         tx2 <NA> <NA>
       CONSEQUENCE       REFCODON       VARCODON         REFAA         VARAA
<factor> <DNAStringSet> <DNAStringSet> <AAStringSet> <AAStringSet>
   A    synonymous            TAA            TAG             *             *
   B nonsynonymous            GTT            GCT             V             A
   C nonsynonymous            TAA            TAC             *             Y
   C nonsynonymous            GTT            GGT             V             G
   D nonsynonymous            CCT            TCT             P             S
   E nonsynonymous            GGG            CGG             G             R

Note the strand of the output reflects the strand of the subject that 
was hit. query "C" is unstranded and hits a subject on both the "+" and 
"-" strand.

A few more changes you should be aware of in the devel branch,

- Column names of the output have changed to be consistent with select() 
in AnnotationDbi.
- If the variant is nonsynonymous and the VARCODON has a stop codon not 
present in the REFCODON the CONSEQUENCE is now 'nonsense'
instead of 'nonsynonymous'. I'm working on annotating the loss of start 
codons as well.

Valerie

On 04/25/12 08:28, Valerie Obenchain wrote:
> On 04/25/2012 12:28 AM, Bernd wrote:
>> Hi there,
>>
>> Jeremiah Degenhardt<degenhardt.jeremiah at ...>  writes:
>>
>>> Hello BioC,
>>>
>>> I am using the predictCoding function in the VariantAnnotation package
>>> and have run into a few issues.
>>>
>>> The first issue that I found is a small one. While I can supply any
>>> txdb object that I want to the predictCoding, the function seems to
>>> still be looking for TxDb.Hsapiens.UCSC.hg19.knownGene. Running
>>> predict coding without this package installed results in the following
>>> error:
>>>
>>> Error in isCircular(TxDb.Hsapiens.UCSC.hg19.knownGene) :
>>>    error in evaluating the argument 'x' in selecting a method for
>>> function 'isCircular': Error: object
>>> ''
>> Since I am using the library TxDb.Mmusculus.UCSC.mm9.knownGene this 
>> is a problem
>> for my analysis of snps in mice. I get the same error by using this 
>> code:
>>
>> library(BSgenome.Mmusculus.UCSC.mm9)
>> library(TxDb.Mmusculus.UCSC.mm9.knownGene)
>>
>> txdb = TxDb.Mmusculus.UCSC.mm9.knownGene
>>
>> snp1 = GRanges(seqnames="chr2", ranges=IRanges(start=28920573, width=1),
>> strand="+", varAllele=DNAStringSet("A"))
>>
>> predictCoding(query=snp1, subject=txdb, seqSource=Mmusculus,
>> varAllele=values(snp1)$varAllele)
>>
>> Is there any solution?
>> Tanks a lot!
>>
>
> The txdb error has been fixed in VariantAnnotation 1.2.6 in release 
> and 1.3.8 in devel. I'm currently working on the predictCoding strand 
> issue and will post back here when it's fixed.
>
> Valerie
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: 
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor