[BioC] Finding coding SNPs with predictCoding

Hervé Pagès hpages at fhcrc.org
Wed Feb 22 19:58:52 CET 2012


Hi Alex,

On 02/22/2012 03:56 AM, Alex Gutteridge wrote:
> Hi,
>
> I'm trying to find coding SNPs for a list of genes. I'm trying to use
> predictCoding() from VariantAnnotation in the devel version, but I'm
> getting an error I can't get to the bottom of. Here is the script:
>
> library(BSgenome.Hsapiens.UCSC.hg19)
> library(VariantAnnotation)
> library(SNPlocs.Hsapiens.dbSNP.20110815)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>
> #List of genes to retrieve coding snps for
> entrez.ids = c('6233')
>
> #Transcriptdb
> txdb19 = TxDb.Hsapiens.UCSC.hg19.knownGene
>
> #SNPs - renamed seq levels to be compatible with txdb
> snps = getSNPlocs(c("ch1","ch2"),as.GRanges=T)
> renamed.levels = gsub("ch","chr",seqlevels(snps))
> names(renamed.levels) = seqlevels(snps)
> snps = renameSeqlevels(snps,renamed.levels)

Or, more concisely, instead of the 3 above lines:

seqlevels(snps) <- gsub("ch", "chr", seqlevels(snps))

>
> #CDS grouped by gene
> gene.list = cdsBy(txdb19, by="gene")
> vsd.list = gene.list[entrez.ids]
>
> #For each gene
> for(eg in entrez.ids){
> #Find all SNPs
> snp.idx = queryHits(findOverlaps(snps, vsd.list[eg][[1]]))

Use vsd.list[[eg]] instead of vsd.list[eg][[1]]. It gives the
same result but is more concise. Also here, in your particular
example, your snp doesn't hit multiple CDS but it could. So
you might want to use unique:

snp.idx = unique(queryHits(findOverlaps(snps, vsd.list[[eg]])))


> eg.snps = snps[snp.idx]
> #Find variant alleles as iupac
> iupac = values(eg.snps)[,"alleles_as_ambig"]
> #Replicate snps according to number of variant alleles
> eg.snps.exp = rep(eg.snps, nchar(IUPAC_CODE_MAP[iupac]))
> #Create list of all variants
> variant.alleles =
> DNAStringSet(strsplit(paste(IUPAC_CODE_MAP[iupac],collapse=""),"")[[1]])
> #Find coding SNPs
> aa =
> predictCoding(eg.snps.exp,txdb19,seqSource=Hsapiens,varAllele=variant.alleles)
>
> }
>
> ########
>
> But the predictCoding call gives this error:
>
> Error in .setSeqNames(x, value) :
> The replacement value for isActiveSeq must be a logical vector, with
> names that match the seqlevels of the object
>
> As you can see I had to redo the seqlevels for the SNPs because they
> were ch1, ch2, etc... rather than chr1, chr2 etc... Have I done that
> correctly or introduced some inconsistency there?

The error message doesn't help much but I think the pb is that you
didn't rename chMT properly. Try to do this:

   seqlevels(snps) <- gsub("chrMT", "chrM", seqlevels(snps))

before you start the for(eg in entrez.ids){..} loop again.

Cheers,
H.

>
> ######
>
>> sessionInfo()
> R Under development (unstable) (2012-02-19 r58418)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] grid stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.6.4
> [2] GenomicFeatures_1.7.25
> [3] SNPlocs.Hsapiens.dbSNP.20110815_0.99.6
> [4] VariantAnnotation_1.1.51
> [5] Rsamtools_1.7.35
> [6] AnnotationDbi_1.17.21
> [7] Biobase_2.15.3
> [8] BSgenome.Hsapiens.UCSC.hg19_1.3.17
> [9] BSgenome_1.23.2
> [10] Biostrings_2.23.6
> [11] GenomicRanges_1.7.25
> [12] IRanges_1.13.24
> [13] BiocGenerics_0.1.4
> [14] Gviz_0.99.0
> [15] BiocInstaller_1.3.7
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.11.1 bitops_1.0-4.1 DBI_0.2-5 lattice_0.20-0
> [5] Matrix_1.0-3 RColorBrewer_1.0-5 RCurl_1.91-1 RSQLite_0.11.1
> [9] rtracklayer_1.15.7 snpStats_1.5.4 splines_2.15.0 survival_2.36-12
> [13] tools_2.15.0 XML_3.9-4 zlibbioc_1.1.1
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list