[BioC] ChIPpeakAnno, getAnnotation question

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Mon Aug 29 18:46:21 CEST 2011


Daria,

The warnings you experienced with 5utr have been fixed, and transcript has
been added as an option for featureType. Please download version 2.0.2.

Thanks for your input!

Best regards,

Julie
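
For readers following the thread, here is a minimal sketch of requesting one feature type per call. It assumes ChIPpeakAnno 2.0.2 or later and that the new option is spelled "transcript"; fetch_by_feature is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of ChIPpeakAnno): call the annotation
# function once per feature type and collect the results in a named list.
# The default for `fetch` is only evaluated if no replacement is supplied.
fetch_by_feature <- function(mart, feature_types,
                             fetch = ChIPpeakAnno::getAnnotation) {
  stopifnot(is.character(feature_types), length(feature_types) >= 1)
  out <- lapply(feature_types, function(ft) fetch(mart, featureType = ft))
  names(out) <- feature_types
  out
}

# Usage: needs biomaRt, ChIPpeakAnno and network access, so it is not run here.
if (FALSE) {
  library(biomaRt)
  mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")
  anno <- fetch_by_feature(mart, c("TSS", "5utr", "transcript"))
  head(as.data.frame(anno[["5utr"]]))
}
```

Keeping one feature type per call means each result can be inspected or converted with as.data.frame separately, as in the original commands.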


On 8/29/11 10:56 AM, "Julie Zhu" <julie.zhu at umassmed.edu> wrote:

> Dear Daria,
> 
> By default, getAnnotation assumes featureType TSS. Currently, the parameter
> featureType accepts a single value, one of the following feature types
> (case sensitive): "TSS", "miRNA", "Exon", "5utr", "3utr" or "ExonPlusUtr".
> For example, use 5utr for the 5' UTR.
> 
> You were right that with the parameter featureType set to TSS, getAnnotation
> returns the gene coordinates. If you think it would be useful to have
> transcript coordinates, I will be happy to add featureType transcript. Thanks!
> 
> Best regards,
> 
> Julie
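
One possible reason the six-element vector behaved like "TSS" can be sketched in base R. This assumes getAnnotation resolves featureType with match.arg, which the thread does not confirm; pick_feature is a hypothetical stand-in, not the package's code. With match.arg, a vector identical to the full set of choices is treated as "argument not specified", so the first choice is used.

```r
# Hypothetical stand-in for an argument resolved with match.arg();
# whether getAnnotation does this internally is an assumption.
pick_feature <- function(featureType = c("TSS", "miRNA", "Exon",
                                         "5utr", "3utr", "ExonPlusUtr")) {
  match.arg(featureType)
}

all_types <- c("TSS", "miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")

pick_feature()            # "TSS": the first choice is the default
pick_feature(all_types)   # "TSS": identical to the full choice vector
pick_feature("5utr")      # "5utr"

# Matching is case sensitive, so "5UTR" is rejected with an error,
# which is caught here and replaced by a marker string.
wrong_case <- tryCatch(pick_feature("5UTR"), error = function(e) "error")
```

If that assumption holds, it would explain why the first two commands in the question returned the same gene-level table.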
> 
> 
> 
> 
> 
> On 8/25/11 4:00 PM, "Daria Goranskaya" <daria.goranskaya at gmail.com> wrote:
> 
>> Dear Julie:
>> 
>> I'm a PhD student in bioinformatics at Karolinska Institutet, Stockholm.
>> I've been using ChIPpeakAnno for my data and I found something strange
>> when getting annotation with the getAnnotation function.  Could you take
>> a look at the following?
>> 
>>> library("biomaRt")
>>> library("ChIPpeakAnno")
>>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>> 
>> When I tried to get TSS, I got genes rather than transcripts. And the
>> last two commands gave the same results! That's strange, because the
>> first command should return plenty of features other than TSS.
>> Also, I got warnings when asking for 5utr.
>> 
>> How should I use this function to get the necessary annotation features?
>> Thank you in advance!
>> 
>> Best regards,
>> Daria
>> 
>> 
>> P.S. Here is the whole R history:
>> 
>>> library("biomaRt")
>>> library("ChIPpeakAnno")
>>> mart<-useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
>>> EnsemblAnnotation<-as.data.frame(getAnnotation(mart,
>>> featureType=c("TSS","miRNA", "Exon", "5utr", "3utr", "ExonPlusUtr")))
>>> EnsemblTSS<-as.data.frame(getAnnotation(mart, featureType=c("TSS")))
>>> Ensembl5utr<-as.data.frame(getAnnotation(mart, featureType=c("5utr")))
>> Warnings:
>> 1: In getAnnotation(mart, featureType = c("5utr")) :
>>   Following duplicated IDs found, only one of entries of the
>> duplicated id will be returned!
>> 2: In getAnnotation(mart, featureType = c("5utr")) :
>>   ENST00000400678ENST00000400776ENST00000400776ENST00000546775ENST00000550740
>>   ENST00000546775ENST00000550740ENST00000546775ENST00000451927ENST00000400890
>>   ENST00000550740ENST00000552764ENST00000546832ENST00000549120ENST00000550740
>>   ENST00000546775ENST00000552764ENST00000546736ENST00000346061ENST00000447903
>>   ENST00000272035ENST00000413237ENST00000447903ENST00000346061ENST00000418749
>>   ENST00000400681ENST00000418749ENST00000346061ENST00000400840ENST00000262316
>>   ENST00000420545ENST00000450643ENST00000454039ENST00000338527ENST00000219431
>>   ENST00000436333ENST00000397817ENST00000551377ENST00000368372ENST00000314367
>>   ENST00000431099ENST00000382389ENST00000399951ENST00000323434ENST00000331302
>>   ENST00000399951ENST00000551377ENST00000456528ENST00000521270ENST00000521145
>>   ENST00000523418ENST00000308811ENST00000523162ENST00000522866ENST00000518414
>>   ENST00000521270ENST00000320552ENST00000398612ENST00000325113ENST00000525282
>>   ENST00000540150ENST00000342593ENST00000399012ENST00000445062ENST00000429181
>>   ENST00000399012ENST000004
>> [... truncated]
>>> head(EnsemblAnnotation)
>>   space start   end width           names strand
>> 1     1 11869 14412  2544 ENSG00000223972      1
>> 2     1 14363 29806 15444 ENSG00000227232     -1
>> 3     1 29554 31109  1556 ENSG00000243485      1
>> 4     1 30366 30503   138 ENSG00000221311      1
>> 5     1 34554 36081  1528 ENSG00000237613     -1
>> 6     1 62948 63887   940 ENSG00000240361      1
>> 
>>               description
>> 1          DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1
>> [Source:HGNC Symbol;Acc:37102]
>> 2                         WAS protein family homolog 7 pseudogene
>> [Source:HGNC Symbol;Acc:38034]
>> 3                                                microRNA 1302-10
>> [Source:HGNC Symbol;Acc:38233]
>> 4                                                microRNA 1302-10
>> [Source:HGNC Symbol;Acc:38233]
>> 5                   family with sequence similarity 138, member A
>> [Source:HGNC Symbol;Acc:32334]
>> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene
>> [Source:HGNC Symbol;Acc:31276]
>>> head(EnsemblTSS)
>>   space start   end width           names strand
>> 1     1 11869 14412  2544 ENSG00000223972      1
>> 2     1 14363 29806 15444 ENSG00000227232     -1
>> 3     1 29554 31109  1556 ENSG00000243485      1
>> 4     1 30366 30503   138 ENSG00000221311      1
>> 5     1 34554 36081  1528 ENSG00000237613     -1
>> 6     1 62948 63887   940 ENSG00000240361      1
>> 
>>               description
>> 1          DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 11 like 1
>> [Source:HGNC Symbol;Acc:37102]
>> 2                         WAS protein family homolog 7 pseudogene
>> [Source:HGNC Symbol;Acc:38034]
>> 3                                                microRNA 1302-10
>> [Source:HGNC Symbol;Acc:38233]
>> 4                                                microRNA 1302-10
>> [Source:HGNC Symbol;Acc:38233]
>> 5                   family with sequence similarity 138, member A
>> [Source:HGNC Symbol;Acc:32334]
>> 6 olfactory receptor, family 4, subfamily G, member 11 pseudogene
>> [Source:HGNC Symbol;Acc:31276]
>>> head(Ensembl5utr)
>>   space  start    end width           names strand
>> 1     1  35737  36081   345 ENST00000417324     -1
>> 2     1 367640 367658    19 ENST00000426406      1
>> 3     1 622035 622053    19 ENST00000332831     -1
>> 4     1 721320 721405    86 ENST00000358533      1
>> 5     1 860260 860328    69 ENST00000420190      1
>> 6     1 860530 860569    40 ENST00000437963      1
>> 
>>    description
>> 1        family with sequence similarity 138, member A [Source:HGNC
>> Symbol;Acc:32334]
>> 2 olfactory receptor, family 4, subfamily F, member 29 [Source:HGNC
>> Symbol;Acc:31275]
>> 3 olfactory receptor, family 4, subfamily F, member 16 [Source:HGNC
>> Symbol;Acc:15079]
>> 4             Transmembrane protein FLJ78588
>> [Source:UniProtKB/Swiss-Prot;Acc:A6NHI5]
>> 5             sterile alpha motif domain containing 11 [Source:HGNC
>> Symbol;Acc:28706]
>> 6             sterile alpha motif domain containing 11 [Source:HGNC
>> Symbol;Acc:28706]


