[BioC] find gene symbols immediately flanking before and after a SNP position
Adaikalavan Ramasamy
adaikalavan.ramasamy at gmail.com
Fri Feb 15 22:18:23 CET 2013
Dear Valerie,
Apologies for the delay. Your suggestions solves my problem perfectly.
Thank you for your help and also for writing the package. I have
blogged on how to do this at
http://adairama.wordpress.com/2013/02/15/functionally-annotate-snps-and-indels-in-bioconductor/
And here are R codes for posterity and if anyone is interested.
library(VariantAnnotation)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
#########################
## 1. Prepare the query file ##
#########################
input <- rbind.data.frame( c("rs3753344", "chr1", 1142150),
c("rs12191877", "chr6", 31252925), c("rs881375", "chr9", 123652898) )
colnames(input) <- c("rsid", "chr", "pos")
input$pos <- as.numeric(as.character(input$pos))
input
# rsid chr pos
# 1 rs3753344 chr1 1142150
# 2 rs12191877 chr6 31252925
# 3 rs881375 chr9 123652898
target <- with(input, GRanges( seqnames = Rle(chr),
ranges = IRanges(pos,
end=pos, names=rsid),
strand = Rle(strand("*")) ) )
###############
## 2. Annotate ##
###############
loc <- locateVariants(target, TxDb.Hsapiens.UCSC.hg19.knownGene, AllVariants())
names(loc) <- NULL
out <- as.data.frame(loc)
out$names <- names(target)[ out$QUERYID ]
out <- out[ , c("names", "seqnames", "start", "end", "LOCATION",
"GENEID", "PRECEDEID", "FOLLOWID")]
out <- unique(out)
out
# names seqnames start end LOCATION GENEID
PRECEDEID FOLLOWID
# 1 rs3753344 chr1 1142150 1142150 promoter 8784
<NA> <NA>
# 5 rs3753344 chr1 1142150 1142150 intergenic
<NA> 8784 7293
# 6 rs12191877 chr6 31252925 31252925 intron 3106
<NA> <NA>
# 7 rs881375 chr9 123652898 123652898 intergenic <NA>
26147 7185
##############################################
## 3. Convert from Entrez Gene ID to Gene Symbols ##
##############################################
library(org.Hs.eg.db)
Symbol2id <- as.list( org.Hs.egSYMBOL2EG )
id2Symbol <- rep( names(Symbol2id), sapply(Symbol2id, length) )
names(id2Symbol) <- unlist(Symbol2id)
x <- unique( with(out, c(levels(GENEID), levels(PRECEDEID), levels(FOLLOWID))) )
table( x %in% names(id2Symbol) ) # good, all found
out$GENESYMBOL <- id2Symbol[ as.character(out$GENEID) ]
out$PRECEDESYMBOL <- id2Symbol[ as.character(out$PRECEDEID) ]
out$FOLLOWSYMBOL <- id2Symbol[ as.character(out$FOLLOWID) ]
out
# names seqnames start end LOCATION
# 1 rs3753344 chr1 1142150 1142150 promoter
# 5 rs3753344 chr1 1142150 1142150 intergenic
# 6 rs12191877 chr6 31252925 31252925 intron
# 7 rs881375 chr9 123652898 123652898 intergenic
#
# GENEID PRECEDEID FOLLOWID GENESYMBOL PRECEDESYMBOL FOLLOWSYMBOL
# 1 8784 <NA> <NA> TNFRSF18
<NA> <NA>
# 5 <NA> 8784 7293 <NA>
TNFRSF18 TNFRSF4
# 6 3106 <NA> <NA> HLA-B
<NA> <NA>
# 7 <NA> 26147 7185 <NA>
PHF19 TRAF1
Many thanks. I hope this helps someone in the future!
Regards, Adai
On Wed, Feb 13, 2013 at 11:53 PM, Valerie Obenchain <vobencha at fhcrc.org> wrote:
>
> Hi Adai,
>
> When using precede() and follow() the 'x' and 'subject' arguments can be any of the following combinations,
>
> > showMethods("precede")
> Function: precede (package IRanges)
> x="ANY", subject="SummarizedExperiment"
> x="GenomicRanges", subject="GenomicRanges"
> x="GenomicRanges", subject="missing"
> x="Ranges", subject="RangesORmissing"
> x="SummarizedExperiment", subject="ANY"
> x="SummarizedExperiment", subject="SummarizedExperiment"
>
>
> The function transcriptsBy() returns a GRangesList. Instead try using the transcripts() function which will return a GRanges,
>
> tx <- transcripts(txdb)
>
>
> Another function worth exploring is locateVariants() in the VariantAnnotation package. See the examples on the ?locateVariants man page to make sure the seqnames (chromosome names) in your data and the txdb match. You can try using the AllVariants() region
>
> loc <- locateVariants(query, subject, AllVariants())
>
> or IntergenicVariants() if you are sure the snp is intergenic.
>
> loc <- locateVariants(query, subject, IntergenicVariants())
>
> In these examples, 'query' is a GRanges of your data and 'subject' is the txdb you made from UCSC.
>
> Valerie
>
>
>
>
>
>
> On 02/13/2013 02:38 PM, Adaikalavan Ramasamy wrote:
>>
>> Dear all,
>>
>> I have a list of several hundred SNP that I would like to annotate
>> functionally and am able to do this via websites such as SeattleSeq.
>> However, for intergenic SNPs it does not give me the neighbouring
>> genes. Therefore, I have tried to find genes immediately flanking a
>> SNP (one left and right) in R. I note that this question has been
>> asked previously. I am trying to follow one of the previous
>> suggestions (https://stat.ethz.ch/pipermail/bioconductor/2010-December/037185.html).
>> I been struggling with this for the last two days but I think I am
>> getting something fundamentally wrong.
>>
>> I have chosen the following two SNPs (among several thousands). I am
>> expecting to see the following kind of output:
>> rs881375 (chr9:123652898) is located between PHF19 and TRAF1
>> rs12191877 (chr6:31252925) is located between RPL3P2 and WASF5P
>>
>>
>> First, I code the query up as a GRange object:
>>
>> rsid <- c("rs881375", "rs12191877")
>> chr <- c("chr9", "chr6")
>> pos <- c(123652898, 31252925)
>>
>> library(GenomicFeatures)
>>
>> target <- GRanges(
>> seqnames = Rle( chr ),
>> ranges = IRanges(pos, end=pos, names=rsid),
>> strand = Rle(strand( rep("*", length(chr)) ))
>> )
>>
>> # GRanges with 2 ranges and 0 metadata columns:
>> # seqnames ranges strand
>> # <Rle> <IRanges> <Rle>
>> # rs881375 chr9 [123652898, 123652898] *
>> # rs12191877 chr6 [ 31252925, 31252925] *
>> # ---
>> # seqlengths:
>> # chr6 chr9
>> # NA NA
>>
>> txdb <- makeTranscriptDbFromUCSC("hg19") # about 5 min
>> tx <- transcriptsBy(txdb)
>>
>>
>> But when I try
>> precede( target, tx )
>> follow( target, tx )
>>
>> I get the following message:
>> Error in function (classes, fdef, mtable) :
>> unable to find an inherited method for function ‘precede’ for
>> signature ‘"GRanges", "GRangesList"’
>>
>> Any help would be very much appreciated. I am happy to try other
>> packages or websites if available. Many thanks.
>>
>> Regards, Adai
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
More information about the Bioconductor
mailing list