[BioC] VariantAnnotation: Performance and memory issues in readVcf

Ulrich Bodenhofer bodenhofer at bioinf.jku.at
Thu May 16 14:34:26 CEST 2013


Thanks for your reply, Valerie!

 > [...]
 > You mention that one file gives you 2 warnings but another gives you
 > 50. Are the other 50 warnings the same?


I checked the warning messages again and it turned out that I was wrong:
the "duplicate keys" message does not appear multiple times. Consistent
with the ScanVcfParam example I sent yesterday, it appears only twice.
All other warning messages (at least the ones that I can see with
warnings()) are the following:

    unpackVcf field 'AD': NAs introduced by coercion
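
For readers unfamiliar with this message, here is a minimal base-R sketch
of how such a warning typically arises. The sample strings are
hypothetical, and it is only an assumption that unpackVcf converts the AD
field in a comparable way: a value that cannot be parsed as an integer,
such as the VCF missing-value marker ".", becomes NA and triggers the
warning.

```r
# Hypothetical AD strings; "." is the VCF marker for a missing value.
ad <- c("12,3", ".", "7,0")

# Split on commas and convert to integer. The "." entry cannot be parsed,
# so R returns NA for it and warns "NAs introduced by coercion".
parsed <- as.integer(unlist(strsplit(ad, ",")))

print(parsed)
```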

R only reports the first 50 warnings, so I do not know how often this one
appears, but my estimate is that it appears once per record in the VCF
subset (8,757 in my example). Do you think that this many warnings could
cause the observed performance bottleneck? Whether or not this is the
source of the problem, the lesson I learned is that I should always
restrict myself to the minimum necessary information when reading a VCF
file. So thanks to you and Vincent for your great help!
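
A rough sketch of how both points could be checked: counting every
warning instead of relying on the 50 that warnings() keeps, and reading
only selected fields. count_warnings() is a hypothetical helper written
for illustration, the simulated AD strings are made up, and the
readVcf() part assumes that VariantAnnotation is installed and uses the
example file shipped with the package rather than the file discussed in
this thread.

```r
# Hypothetical helper: evaluate `expr`, record every warning message,
# and muffle the warnings so they do not flood the console.
count_warnings <- function(expr) {
  msgs <- character(0)
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, counts = table(msgs))
}

# Simulation: 8,757 made-up AD strings, each containing one value that
# cannot be converted, so each conversion raises exactly one warning.
fields <- rep("3,.", 8757)
res <- count_warnings(lapply(strsplit(fields, ","), as.integer))
print(sum(res$counts))

# Reading only what is needed (runs only if VariantAnnotation is available).
if (requireNamespace("VariantAnnotation", quietly = TRUE)) {
  library(VariantAnnotation)
  fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
  # Skip all INFO fields and import only the GT genotype field.
  param <- ScanVcfParam(info = NA, geno = "GT")
  res_vcf <- count_warnings(readVcf(fl, "hg19", param = param))
  print(res_vcf$counts)
}
```

Running the same helper once with and once without a restrictive
ScanVcfParam, wrapped in system.time(), would give a first indication of
whether the warnings and the run time go together.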

Best regards,
Ulrich


