[BioC] ChIPpeakAnno, getAnnotation question

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Mon Aug 29 16:56:47 CEST 2011


Dear Daria,

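By default, getAnnotation uses featureType "TSS". The featureType parameter
currently accepts exactly one of the following values (case sensitive):
"TSS", "miRNA", "Exon", "5utr", "3utr" or "ExonPlusUtr". For example, use
"5utr" for the 5' UTR. Only one feature type is used per call, so supplying
all six at once gives the same result as the default.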
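
As a minimal sketch of requesting several feature types, the loop below calls
getAnnotation once per type and keeps the results in a named list. The
variable names (featureTypes, annotations) are only illustrative, and
downloading every type from Ensembl may take a while.

```r
library(biomaRt)
library(ChIPpeakAnno)

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# featureType takes a single value, so request each feature type separately.
featureTypes <- c("TSS", "miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")
annotations <- lapply(featureTypes, function(ft) {
  getAnnotation(mart, featureType = ft)
})
names(annotations) <- featureTypes

# Each element can then be inspected on its own, e.g. the 5' UTR annotation.
head(as.data.frame(annotations[["5utr"]]))
```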

You were right that with parameter featureType set to TSS, getAnnotation
returns the gene coordinates. If you think it is useful to have transcript
coordinates, I will be happy to add featureType transcript. Thanks!
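
In the meantime, transcript coordinates can probably be retrieved directly
with biomaRt. The sketch below is a workaround rather than part of
ChIPpeakAnno. It assumes the attribute names shown are available in the
Ensembl mart you connect to (listAttributes(mart) will confirm this), and the
tss column is a name made up for illustration.

```r
library(biomaRt)

mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Query transcript-level coordinates directly from BioMart.
# Attribute names are assumed; check them with listAttributes(mart).
transcripts <- getBM(
  attributes = c("ensembl_transcript_id", "chromosome_name",
                 "transcript_start", "transcript_end", "strand"),
  mart = mart
)

# The TSS is the transcript start on the plus strand (strand == 1)
# and the transcript end on the minus strand (strand == -1).
transcripts$tss <- ifelse(transcripts$strand == 1,
                          transcripts$transcript_start,
                          transcripts$transcript_end)
head(transcripts)
```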

Best regards,

Julie





On 8/25/11 4:00 PM, "Daria Goranskaya" <daria.goranskaya at gmail.com> wrote:

> Dear Julie:
> 
> I'm a PhD student in bioinformatics at Karolinska Institutet, Stockholm.
> I've been using ChIPpeakAnno for my data and I found something strange
> when getting annotation with the getAnnotation function. Could you take
> a look at the following?
> 
>> library("biomaRt")
>> library("ChIPpeakAnno")
>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
> 
> When I tried to get TSS, I got genes rather than transcripts. And the
> last two commands gave the same results! That's strange, because the
> first command should return plenty of other features besides TSS.
> I also got an error when asking for 5UTR.
> 
> How should I use this function to get the annotation features I need?
> Thank you in advance!
> 
> Best regards,
> Daria
> 
> 
> P.S. Here is the whole R history:
> 
>> library("biomaRt")
>> library("ChIPpeakAnno")
>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>> Ensembl5utr<-as.data.frame(getAnnotation(mart, featureType=c("5utr")))
> Warnings:
> 1: In getAnnotation(mart, featureType = c("5utr")) :
>   Following duplicated IDs found, only one of entries of the
> duplicated id will be returned!
> 2: In getAnnotation(mart, featureType = c("5utr")) :
>   
> ENST00000400678ENST00000400776ENST00000400776ENST00000546775ENST00000550740ENS
> T00000546775ENST00000550740ENST00000546775ENST00000451927ENST00000400890ENST00
> 000550740ENST00000552764ENST00000546832ENST00000549120ENST00000550740ENST00000
> 546775ENST00000552764ENST00000546736ENST00000346061ENST00000447903ENST00000272
> 035ENST00000413237ENST00000447903ENST00000346061ENST00000418749ENST00000400681
> ENST00000418749ENST00000346061ENST00000400840ENST00000262316ENST00000420545ENS
> T00000450643ENST00000454039ENST00000338527ENST00000219431ENST00000436333ENST00
> 000397817ENST00000551377ENST00000368372ENST00000314367ENST00000431099ENST00000
> 382389ENST00000399951ENST00000323434ENST00000331302ENST00000399951ENST00000551
> 377ENST00000456528ENST00000521270ENST00000521145ENST00000523418ENST00000308811
> ENST00000523162ENST00000522866ENST00000518414ENST00000521270ENST00000320552ENS
> T00000398612ENST00000325113ENST00000525282ENST00000540150ENST00000342593ENST00
> 000399012ENST00000445062ENST00000429181ENST00000399012ENST000004
> [... truncated]
>> head(EnsemblAnnotation)
>   space start   end width           names strand
> 1     1 11869 14412  2544 ENSG00000223972      1
> 2     1 14363 29806 15444 ENSG00000227232     -1
> 3     1 29554 31109  1556 ENSG00000243485      1
> 4     1 30366 30503   138 ENSG00000221311      1
> 5     1 34554 36081  1528 ENSG00000237613     -1
> 6     1 62948 63887   940 ENSG00000240361      1
> 
>               description
> 1          DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1
> [Source:HGNC Symbol;Acc:37102]
> 2                         WAS protein family homolog 7 pseudogene
> [Source:HGNC Symbol;Acc:38034]
> 3                                                microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 4                                                microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 5                   family with sequence similarity 138, member A
> [Source:HGNC Symbol;Acc:32334]
> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene
> [Source:HGNC Symbol;Acc:31276]
>> head(EnsemblTSS)
>   space start   end width           names strand
> 1     1 11869 14412  2544 ENSG00000223972      1
> 2     1 14363 29806 15444 ENSG00000227232     -1
> 3     1 29554 31109  1556 ENSG00000243485      1
> 4     1 30366 30503   138 ENSG00000221311      1
> 5     1 34554 36081  1528 ENSG00000237613     -1
> 6     1 62948 63887   940 ENSG00000240361      1
> 
>               description
> 1          DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1
> [Source:HGNC Symbol;Acc:37102]
> 2                         WAS protein family homolog 7 pseudogene
> [Source:HGNC Symbol;Acc:38034]
> 3                                                microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 4                                                microRNA 1302-10
> [Source:HGNC Symbol;Acc:38233]
> 5                   family with sequence similarity 138, member A
> [Source:HGNC Symbol;Acc:32334]
> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene
> [Source:HGNC Symbol;Acc:31276]
>> head(Ensembl5utr)
>   space  start    end width           names strand
> 1     1  35737  36081   345 ENST00000417324     -1
> 2     1 367640 367658    19 ENST00000426406      1
> 3     1 622035 622053    19 ENST00000332831     -1
> 4     1 721320 721405    86 ENST00000358533      1
> 5     1 860260 860328    69 ENST00000420190      1
> 6     1 860530 860569    40 ENST00000437963      1
> 
>    description
> 1        family with sequence similarity 138, member A [Source:HGNC
> Symbol;Acc:32334]
> 2 olfactory receptor, family 4, subfamily F, member 29 [Source:HGNC
> Symbol;Acc:31275]
> 3 olfactory receptor, family 4, subfamily F, member 16 [Source:HGNC
> Symbol;Acc:15079]
> 4             Transmembrane protein FLJ78588
> [Source:UniProtKB/Swiss-Prot;Acc:A6NHI5]
> 5             sterile alpha motif domain containing 11 [Source:HGNC
> Symbol;Acc:28706]
> 6             sterile alpha motif domain containing 11 [Source:HGNC
> Symbol;Acc:28706]
> 
> 
> 
> 


