[BioC] GGtools: transScores function returning cis eQTL

James Peters [guest] guest at bioconductor.org
Thu Oct 4 17:56:01 CEST 2012

Hi there,

I'm using the excellent GGtools to package to look for eQTLs. When I use the transScore function to look for trans eQTL I find that some of the eQTL-gene pairs I discover are closer than the distance specified by the argument 'radius' (used to define what is cis; hence eSNPs closer to the target gene than radius should not be returned by transScores).

Any suggestions on what I'm doing wrong would be greatly appreciated.

Best wishes


An example using GGdata looking only at genes and SNPs on chromosome 1. Here I define trans as > 100,000 bps.


# get radius used to define cis
rad <- 1e+05

trS <-transScores("GGdata", snpchr = as.character(1), rhs = ~1, K = 20, targdirpref = "tsco", geneApply = lapply, chrnames = paste("chr", as.character(1), sep = ""), radius = rad, renameChrs = NULL, probesToKeep = NULL,
batchsize = 200, genegran = 50, shortfac = 10, wrapperEndo= function(x) MAFfilter(x, low=0.1),
geneannopk = "illuminaHumanv1.db", snpannopk = "SNPlocs.Hsapiens.dbSNP.20111119", gchrpref = "",
schrpref = "ch")

# get gene indexes
TOPind <- geneIndcol(trS, 1)

# construct results table using probe ids, chi squared #association scores and snp loci
TOP <- data.frame(cbind(probeid= featureNames(getSS("GGdata",1))[TOPind], score= topScores(trS), snpids = locusNames(trS)))

TOP$score <- as.numeric(as.character(TOP$score))

# set chi squared score threshold for significance for trans eQTL (equivalent to p val < 1e-11)
threshold <- 46.328476

tophits <- TOP[which(TOP$score > threshold),]

# map probe to gene symbol, entrez id and chromosome
PROBE2SYMS <- select(illuminaHumanv1.db, keys = as.character(tophits$probeid), cols = c("SYMBOL","ENTREZID","CHR"))

tophits <- cbind(PROBE2SYMS, tophits)

# get snp details
mysnps <- as.character(tophits$snpids)
annotSNPs <- AnnotateSNPList(mysnps, file="")

selectSNPinfo <- annotSNPs[,c("marker","chr","chrpos")]
# get annotation on the genes using NCBI
geneinfo <- GetGeneInfo(tophits$ENTREZID)
# ori gives info on whether the gene is on sense or #anti-sense strand
geneinfo <- geneinfo[,c("locusID","GeneLowPoint","GeneHighPoint","ori")]

y <- cbind(tophits, selectSNPinfo)
if(identical(y$marker, as.character(y$snpids))==FALSE){stop("The SNP ids don't match up")}

# merge gene info with snp info and trans eQTL results
final <- cbind(geneinfo, y)
# check
if(identical(final$locusID, final$ENTREZID)==FALSE){stop("The Entrez ids don't match up")}

# clean up the dataframe (remove duplicate cols, make col names clearer)
colnames(final)[1] <- 'EntrezID'
colnames(final)[8] <- 'geneCHR'
colnames(final)[13] <- 'snpCHR'
colnames(final)[14] <- 'snpPos'
final <- final[,-c(7,9,12)]

# calculate distance from snp to gene start position
diff <- rep(NA, nrow(final))
for (i in 1:nrow(final)){
  if(final$ori[i] == '+'){
    diff[i] <- abs(final$GeneLowPoint[i] - final$snpPos[i])  
  } else if
  (final$ori[i] == '-'){
    diff[i] <- abs(final$GeneHighPoint[i] - final$snpPos[i])

# eQTL which are in fact cis
final[which(diff < rad),]

 -- output of sessionInfo(): 

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] splines   stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] SNPlocs.Hsapiens.dbSNP.20111119_0.99.6 SNPlocs.Hsapiens.dbSNP.20110815_0.99.6 GGdata_1.0.19                          illuminaHumanv1.db_1.12.2              org.Hs.eg.db_2.7.1                    
 [6] RSQLite_0.11.2                         DBI_0.2-5                              AnnotationDbi_1.18.4                   Biobase_2.16.0                         GGtools_4.4.0                         
[11] Rsamtools_1.8.6                        Biostrings_2.24.1                      GenomicRanges_1.8.13                   IRanges_1.14.4                         BiocGenerics_0.2.0                    
[16] GGBase_3.18.0                          snpStats_1.6.0                         Matrix_1.0-9                           lattice_0.20-10                        survival_2.36-14                      
[21] NCBI2R_1.4.3                          

loaded via a namespace (and not attached):
 [1] annotate_1.34.1          biomaRt_2.12.0           bit_1.1-8                bitops_1.0-4.1           BSgenome_1.24.0          ff_2.2-7                 genefilter_1.38.0       
 [8] GenomicFeatures_1.8.3    grid_2.15.1              RCurl_1.95-0.1.2         rtracklayer_1.16.3       tools_2.15.1             VariantAnnotation_1.2.11 XML_3.95-0              
[15] xtable_1.7-0 

Sent via the guest posting facility at bioconductor.org.

More information about the Bioconductor mailing list