[BioC] How to get a unique line of annotation for each specific genomic position using the biomaRt package

Mao Jianfeng jianfeng.mao at gmail.com
Tue Feb 8 15:24:03 CET 2011


Dear Steve,

Thank you for your help. Could you give me some more guidance on
this annotation problem?

#########################
(1)
#########################
I want each of my SNPs to have exactly one line of annotation, with
each attribute in a separate column. If an attribute has multiple
values (for example, when multiple GO terms are shared at that
location), I would like to put them all in the same column, separated
by a symbol such as ";", ":" or "|".

For example, I have SNPs like this:
# SNPs,chr,start,end
SNP_1,1,43,43
SNP_2,2,56,56

I would like the annotations to look like this:
# SNPs,chr,start,end,go_term
SNP_1,1,43,43,go_1:go_3
SNP_2,2,56,56,go_100:go_1000
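
To make (1) concrete, here is a minimal base-R sketch of the collapsing step I have in mind. It is only an illustration under assumptions: the data frame `annot` and its column names are made up to mirror my example, and it assumes the annotations are already in one row per SNP and GO term (the layout shown in (2) below).

```r
# Hypothetical example data: one row per (SNP, GO term) pair,
# mirroring the long format of (2); not real biomaRt output.
annot <- data.frame(
  SNPs    = c("SNP_1", "SNP_1", "SNP_2", "SNP_2"),
  chr     = c(1, 1, 2, 2),
  start   = c(43, 43, 56, 56),
  end     = c(43, 43, 56, 56),
  go_term = c("go_1", "go_3", "go_100", "go_1000"),
  stringsAsFactors = FALSE
)

# Group by SNP and position, and paste all distinct GO terms of a group
# into one ":"-separated string (terms keep their original row order).
collapsed <- aggregate(
  go_term ~ SNPs + chr + start + end,
  data = annot,
  FUN  = function(x) paste(unique(x), collapse = ":")
)

# Sort by SNP name and reset row names for a tidy one-row-per-SNP table
collapsed <- collapsed[order(collapsed$SNPs), ]
rownames(collapsed) <- NULL
print(collapsed, row.names = FALSE)
```

Changing `collapse = ":"` to `";"` or `"|"` gives the other separators.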

#########################
(2)
#########################
Alternatively, I would like each SNP position to be combined with its
annotation results, so that I know which SNP each annotation line
corresponds to. I do not know how to do that using Bioconductor
packages. See the following example:

For example, I have SNPs like this:
# SNPs,chr,start,end
SNP_1,1,43,43
SNP_2,2,56,56

I would like the annotations to look like this:
# SNPs,chr,start,end,go_term
SNP_1,1,43,43,go_1
SNP_1,1,43,43,go_3
SNP_2,2,56,56,go_100
SNP_2,2,56,56,go_1000
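
A minimal sketch of (2), assuming the simplest (if slow, one query per SNP) approach is acceptable: annotate each SNP on its own and tag every returned row with that SNP, so no row loses track of where it came from. `annotate_snps()` and `fake_query()` are hypothetical helpers; `fake_query()` only stands in for a real getBM() call so the example runs offline, and the getBM() wrapper in the comment is an untested assumption.

```r
# Annotate each SNP separately and tag every returned row with its SNP.
# `query_fun` is any function (chr, start, end) -> data.frame. With biomaRt
# it could wrap getBM(), e.g. (untested; `mart` is an assumed Mart object):
#   function(chr, start, end)
#     getBM(attributes = c("ensembl_gene_id", "go_biological_process_id"),
#           filters = c("chromosome_name", "start", "end"),
#           values = list(chr, start, end), mart = mart)
annotate_snps <- function(snps, query_fun) {
  pieces <- lapply(seq_len(nrow(snps)), function(i) {
    hits <- query_fun(snps$chr[i], snps$start[i], snps$end[i])
    if (nrow(hits) == 0) return(NULL)  # SNP without annotation: dropped
    # Repeat the SNP row once per annotation row, then bind side by side
    cbind(snps[rep(i, nrow(hits)), , drop = FALSE], hits, row.names = NULL)
  })
  do.call(rbind, pieces)  # NULL pieces are ignored by rbind
}

snps <- data.frame(SNPs = c("SNP_1", "SNP_2"), chr = c(1, 2),
                   start = c(43, 56), end = c(43, 56),
                   stringsAsFactors = FALSE)

# Hypothetical stand-in for getBM(), returning made-up GO terms
fake_query <- function(chr, start, end) {
  terms <- if (chr == 1) c("go_1", "go_3") else c("go_100", "go_1000")
  data.frame(go_term = terms, stringsAsFactors = FALSE)
}

long <- annotate_snps(snps, fake_query)
print(long)
```

The long table produced this way can then be collapsed into one row per SNP as in (1).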

Jian-Feng,

2011/2/8 Steve Lianoglou <mailinglist.honeypot at gmail.com>:
> Hi,
>
> On Tue, Feb 8, 2011 at 5:49 AM, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:
>> Dear listers,
>>
>> I am new to Bioconductor.
>>
>> I have genomic variations (SNPs, indels, CNVs) with
>> chromosome:start:end coordinates in GFF/BED/VCF format. Each genomic
>> variation is defined by a specific genomic position (in base pairs).
>>
>> For example:
>> # SNPs,chr,start,end
>> SNP_1,1,43,43
>> SNP_2,2,56,56
>>
>> I would like to annotate these genomic variations with various
>> gene/protein/pathway-centric annotations (as listed in the BioMart
>> databases). I tried the R/Bioconductor biomaRt package, but I failed
>> to get a unique line of annotation for each specific genomic position.
>> Could you give me any pointers on that?
>
> Could you explain a bit more about what you mean when you say "get a
> unique line of annotation"?
>
> The only informative things your `getBM` query returns are the gene ID
> for the location and the GO term evidence code
> (go_biological_process_linkage_type). If you add, say,
> "go_biological_process_id", you also get the biological process GO
> terms associated with the position, i.e.:
>
> result <- getBM(attributes=c("chromosome_name","start_position","ensembl_gene_id",
>  "go_biological_process_linkage_type", "go_biological_process_id"),
>  filters = c("chromosome_name", "start", "end"),
>  values = list(chr, start, end), mart=alyr, uniqueRows = TRUE)
>
> If your problem is that some positions have more than one row, like so:
>
> chromosome_name start_position     ensembl_gene_id  ...  go_biological_process_id
>              1          33055   scaffold_100013.1  ...  GO:0006355
>              1          33055   scaffold_100013.1  ...  GO:0006886
>              1          33055   scaffold_100013.1  ...  GO:0006913
>              1          33055   scaffold_100013.1  ...  GO:0007165
>              1          33055   scaffold_100013.1  ...  GO:0007264
>
> this happens because multiple GO terms are shared at that location. You
> can just pick one, but you'll have to decide how you want to do that.
>
> If you want to somehow summarize each chromosome/start_position into
> one row, you can easily iterate over the data grouped by this
> combination with, say, the ddply function from the plyr package:
>
> library(plyr)
> summary <- ddply(result, .(chromosome_name, start_position), function(x) {
>  # x will have all of the rows for a given chromosome_name / start_position
>  # combo. We can arbitrarily just return the first row, but you'll likely
>  # want to do something smarter:
>  x[1,]
> })
>
> If you look at `summary`, you'll have one row per position.
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>



-- 
Jian-Feng, Mao

the Institute of Botany,
Chinese Academy of Botany,


