[BioC] Error in locateVariants

Thu Jul 25 00:34:48 CEST 2013

Thanks for sending the files. Please keep the conversation 'on-list' by 
cc'ing bioconductor at r-project.org.

There was a bug in locateVariants() related to 'features' not having 
txid or cdsid. That is now fixed in devel (1.7.38) and release (1.6.7). 
Thanks for reporting this.

A couple things to note:

(1) The second argument to readVcf() is 'genome'. Here you are setting 
the genome to the string 'Pinfestans.fa' string which is probably not 
what you intended to do.

     vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa")

(2) When 'features' is a GRangesList it must be an appropriate 
extraction to match the 'region' argument (explained on the man page). 
You are interested in CodingVariants so the GRangesList must be 
equivalent to a cdsBy(txdb, "transcript") extraction. The 
'myGRangesList' object is of length 1 with 801 different seqlevels in 
the single list element.

myGRangesList<-GRangesList(myGRanges)
 > length(myGRangesList)
[1] 1
 > length(seqlevels(myGRangesList))
[1] 801

The idea is that cds regions from the same transcript (i.e., same 
seqlevel) are grouped in list elements. Assuming that the seqlevels in 
this file are transcript names, you could subset by 'type' and split on 
seqname to get cds by transcripts.

 > cds <- myGRanges[myGRanges$type == "CDS"]
 > cdsby <- split(cds, seqnames(cds))
 > length(cdsby)
[1] 801

An alternative approach to is to make a TranscriptDb from your gff file 
with makeTranscriptDbFromGFF() in GenomicFeatures. Either the txdb or 
cdsby extraction can be used as 'features'.

     txdb <- makeTranscriptDbFromGFF("annotation.gff3")
     cdsby <- cdsBy(txdb, "transcript")

Oops. Looks like this gff file has something strange going on. I'll send 
the report to Marc.

 >  txdb <- 
makeTranscriptDbFromGFF("PI_T30-4_FINAL_CALLGENES_3.annotation.gff3")
Error in .parse_attrCol(attrCol, file, colnames) :
   Some attributes do not conform to 'tag=value' format

At any rate, makeTranscriptDbFromGFF() can be a useful function to 
convert gff to txdb for well behaved files.

Valerie

On 07/24/2013 01:34 PM, Valerie Obenchain wrote:
> Hi Jolly,
>
> I need a reproducible example in order to help. Are 'rd' and
> 'myGRangesList' too large to send? Also provide the output of
> sessionInfo().
>
> Valerie
>
>
> On 07/24/2013 11:38 AM, Jolly Shrivastava wrote:
>> Hello Valerie,
>>
>> I am running VariantAnnotation package to locate the position of SNPs
>> using
>> locateVariants.
>>
>> The commands that I gave to extract vcf file and the gff file are as
>> follows:
>>
>> ###Reading the vcf file
>> vcf1<-readVcf("CGTACG_filteredSNPS.vcf","Pinfestans.fa")
>> rd<-rowData(vcf1)
>>
>> ####imported and converted the gff3 file to GRangesList object
>> subject<-import.gff("annotation.gff3")
>> myGRanges<-as(subject,"GRanges")
>> myGRangesList<-GRangesList(myGRanges)
>>
>>
>> ### made sure that both of them have same sequence levels
>> rd<-keepSeqlevels(rd,intersect(seqlevels(rd),seqlevels(myGRangesList))
>> myGRangesList<-keepSeqlevels(myGRangesList,intersect(seqlevels(rd),seqlevels(myGRangesList))
>>
>>
>> ###Ran the locateVariants command
>> variants_found<-locateVariants(rd,myGRangesList,CodingVariants())
>>
>> ###Got the following error
>> Error in DataFrame(...) : different row counts implied by arguments
>>
>> Can you please point to what I am doing wrong?
>>
>> Regards
>> Jolly
>>
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor