[BioC] Finding coding SNPs with predictCoding

Alex Gutteridge alexg at ruggedtextile.com
Wed Feb 22 12:56:37 CET 2012


Hi,

I'm trying to find coding SNPs for a list of genes. I'm trying to use 
predictCoding() from VariantAnnotation in the devel version, but I'm 
getting an error I can't get to the bottom of. Here is the script:

library(BSgenome.Hsapiens.UCSC.hg19)
library(VariantAnnotation)
library(SNPlocs.Hsapiens.dbSNP.20110815)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

#List of genes to retrieve coding snps for
entrez.ids = c('6233')

#Transcriptdb
txdb19 = TxDb.Hsapiens.UCSC.hg19.knownGene

#SNPs - renamed seq levels to be compatible with txdb
snps   = getSNPlocs(c("ch1","ch2"),as.GRanges=T)
renamed.levels = gsub("ch","chr",seqlevels(snps))
names(renamed.levels) = seqlevels(snps)
snps = renameSeqlevels(snps,renamed.levels)

#CDS grouped by gene
gene.list = cdsBy(txdb19, by="gene")
vsd.list = gene.list[entrez.ids]

#For each gene
for(eg in entrez.ids){
	#Find all SNPs
	snp.idx = queryHits(findOverlaps(snps, vsd.list[eg][[1]]))
	eg.snps = snps[snp.idx]
	#Find variant alleles as iupac
	iupac   = values(eg.snps)[,"alleles_as_ambig"]
	#Replicate snps according to number of variant alleles
	eg.snps.exp = rep(eg.snps, nchar(IUPAC_CODE_MAP[iupac]))
	#Create list of all variants
	variant.alleles = 
DNAStringSet(strsplit(paste(IUPAC_CODE_MAP[iupac],collapse=""),"")[[1]])
	#Find coding SNPs
	aa = 
predictCoding(eg.snps.exp,txdb19,seqSource=Hsapiens,varAllele=variant.alleles)
}

########

But the predictCoding call gives this error:

Error in .setSeqNames(x, value) :
   The replacement value for isActiveSeq must be a logical vector, with 
names that match the seqlevels of the object

As you can see I had to redo the seqlevels for the SNPs because they 
were ch1, ch2, etc... rather than chr1, chr2 etc... Have I done that 
correctly or introduced some inconsistency there?

######

> sessionInfo()
R Under development (unstable) (2012-02-19 r58418)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.6.4
  [2] GenomicFeatures_1.7.25
  [3] SNPlocs.Hsapiens.dbSNP.20110815_0.99.6
  [4] VariantAnnotation_1.1.51
  [5] Rsamtools_1.7.35
  [6] AnnotationDbi_1.17.21
  [7] Biobase_2.15.3
  [8] BSgenome.Hsapiens.UCSC.hg19_1.3.17
  [9] BSgenome_1.23.2
[10] Biostrings_2.23.6
[11] GenomicRanges_1.7.25
[12] IRanges_1.13.24
[13] BiocGenerics_0.1.4
[14] Gviz_0.99.0
[15] BiocInstaller_1.3.7

loaded via a namespace (and not attached):
  [1] biomaRt_2.11.1     bitops_1.0-4.1     DBI_0.2-5          
lattice_0.20-0
  [5] Matrix_1.0-3       RColorBrewer_1.0-5 RCurl_1.91-1       
RSQLite_0.11.1
  [9] rtracklayer_1.15.7 snpStats_1.5.4     splines_2.15.0     
survival_2.36-12
[13] tools_2.15.0       XML_3.9-4          zlibbioc_1.1.1

-- 
Alex Gutteridge



More information about the Bioconductor mailing list