[BioC] dbSNP strand information

Alex Gutteridge alexg at ruggedtextile.com
Tue Feb 28 23:23:20 CET 2012


I notice the GRanges returned by the dbSNP (SNPlocs) packages have 
strand '*'. Does anyone know how safe I am in assuming that the variant 
alleles also given by the package actually correspond to the '+' strand? 
This seems to be the case for the 20 or so I have checked manually, but 
maybe I have just been lucky.
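
For concreteness, the manual check amounts to something like the sketch 
below. It is plain Python for illustration only and does not use the 
SNPlocs API; the function name and the toy records are hypothetical. A SNP 
is counted as consistent with the '+' strand if the '+' strand reference 
base is one of its reported alleles.

```python
def consistent_with_plus(ref_base, alleles):
    """Return True if the '+' strand reference base is one of the alleles."""
    return ref_base.upper() in {a.upper() for a in alleles}


# Hypothetical toy records: ('+' strand reference base, reported alleles).
records = [
    ("C", ["C", "T"]),
    ("A", ["A", "G"]),
    ("G", ["C", "T"]),  # reference base only matches after complementing
]

# Count how many records pass the check and express it as a fraction.
n_ok = sum(consistent_with_plus(ref, alleles) for ref, alleles in records)
fraction_ok = n_ok / len(records)
```

Note that A/T and C/G SNPs pass this check whichever strand the alleles 
are reported on, so they say nothing about strand either way.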

I ask this in the context of trying to use predictCoding in the 
VariantAnnotation package to find coding SNPs. For SNPs in genes on the 
'-' strand I have found that I have to complement the alleles given by 
dbSNP to get the correct result. I just want to make sure that treating 
the alleles as '+' strand is a reasonable assumption in the vast 
majority (say >99%) of cases.
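
The complementing step can be sketched as follows. This is plain Python 
for illustration, not VariantAnnotation or SNPlocs code; the function 
names are hypothetical, and it assumes alleles may be given either as 
plain bases or as IUPAC ambiguity codes and are always reported on the 
'+' strand.

```python
# Complement table for the four bases plus the IUPAC ambiguity codes
# (e.g. R = A/G complements to Y = C/T; S, W and N are self-complementary).
_COMPLEMENT = str.maketrans("ACGTRYKMBDHVSWN", "TGCAYRMKVHDBSWN")


def complement_alleles(alleles):
    """Complement a string of bases or IUPAC codes, base by base."""
    return alleles.upper().translate(_COMPLEMENT)


def alleles_for_gene(alleles, gene_strand):
    """Return alleles as read on the gene's strand.

    Assumption: the input alleles are reported on the '+' strand, so they
    only need complementing for genes on the '-' strand.
    """
    if gene_strand == "-":
        return complement_alleles(alleles)
    return alleles.upper()
```

For a single-base SNP a plain complement is enough; anything longer than 
one base would need a reverse complement instead.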

I realise from my reading of the SNPlocs.Hsapiens.dbSNP.20110815 manual 
that some SNPs will be incorrect anyway (it mentions ~0.1% of SNPs not 
mapping to the reference at all). That level of failure is acceptable, 
but anything higher would be a worry.

-- 
Alex Gutteridge


