[BioC] ChIPpeakAnno::getEnrichedGo crashes but I don't know why

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Thu Jan 6 14:46:44 CET 2011


Eric,

You might want to take a look at the getBM function in the biomaRt package to
see if you can do the conversion. Thanks!
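
A minimal sketch of that conversion, not a tested recipe. It assumes the
Ensembl gene mart exposes "ensembl_exon_id" as both a filter and an attribute
alongside "ensembl_gene_id", and that the mart object is the one created with
useMart earlier in the thread. The helper names fetch_exon_gene_map and
exons_to_genes are hypothetical, and the gene IDs in the toy table are
placeholders, not real Ensembl IDs. The getBM call is wrapped in a function
and not run here because it needs network access.

```r
# Sketch only: exon-to-gene ID conversion built around biomaRt::getBM.
# Assumption: the mart exposes "ensembl_exon_id" and "ensembl_gene_id".

# Query the mart for exon/gene pairs (needs network access; not called below).
fetch_exon_gene_map <- function(exon_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_exon_id", "ensembl_gene_id"),
    filters    = "ensembl_exon_id",
    values     = unique(exon_ids),
    mart       = mart
  )
}

# Replace exon IDs with gene IDs using a two-column mapping table.
# Exons missing from the table become NA; if an exon appears more than once
# in the table, only the first gene listed is used.
exons_to_genes <- function(exon_ids, exon_gene_map) {
  idx <- match(exon_ids, exon_gene_map$ensembl_exon_id)
  exon_gene_map$ensembl_gene_id[idx]
}

# Toy mapping table with placeholder gene IDs, standing in for a getBM result.
toy_map <- data.frame(
  ensembl_exon_id = c("ENSE00001749374", "ENSE00001643382"),
  ensembl_gene_id = c("ENSG_PLACEHOLDER_A", "ENSG_PLACEHOLDER_B"),
  stringsAsFactors = FALSE
)

gene_ids <- exons_to_genes(
  c("ENSE00001643382", "ENSE00001749374", "ENSE00000000000"),
  toy_map
)
print(gene_ids)
```

With a real mart, the table returned by fetch_exon_gene_map would take the
place of toy_map. The resulting vector could then stand in for the exon IDs
before calling getEnrichedGO; how getEnrichedGO treats NA entries is not
something this sketch checks, so dropping them first may be the safer choice.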

Best regards,

Julie


On 1/5/11 5:31 PM, "Eric Cabot" <elcabot at gmail.com> wrote:

> Julie,
> 
>    I don't know how to do that in R. I guess I can always export the
> annotations, do the association in Perl and re-import them as a vector to
> use with getEnrichedGO.
> 
> Eric
> 
> Zhu, Lihua (Julie) wrote:
>> Eric,
>> 
>> You could convert the exon IDs associated with the peaks to Ensembl gene
>> IDs and input these Ensembl gene IDs to the getEnrichedGO function.
>> 
>> Best regards,
>> 
>> Julie
>> 
>> 
>> On 1/5/11 4:09 PM, "Eric Cabot" <elcabot at gmail.com> wrote:
>> 
>>> Hi Julie,
>>> 
>>>    It may be a while before I get back to you on this, because I did my
>>> mapping and ChIP-Seq analysis with Hg19 (NCBI 37), not Hg18 (NCBI 36).
>>> I'm also a little concerned about using transcription start site
>>> annotations rather than exons, because the binding domains are not
>>> thought to be restricted to only promoters.  Any suggestions?
>>> 
>>> Eric
>>> 
>>> 
>>> 
>>> Zhu, Lihua (Julie) wrote:
>>>> Eric,
>>>> 
>>>> The annotated dataset contains exon IDs rather than gene IDs, whereas
>>>> getEnrichedGO, called with feature_id_type="ensembl_gene_id", expects
>>>> Ensembl gene IDs. For a list of supported feature_id_type values,
>>>> please type ?getEnrichedGO.
>>>> 
>>>> To use the getEnrichedGO function, first get the TSS annotation:
>>>> 
>>>> TSS.human.NCBI36 = getAnnotation(ENSEMBLE_GENES_MART, featureType="TSS")
>>>> 
>>>> or use the built-in TSS dataset:
>>>> 
>>>> data(TSS.human.NCBI36)
>>>> 
>>>> Then annotate your peaks with TSS.human.NCBI36 followed by getEnrichedGO
>>>> call.
>>>> 
>>>> Please let me know if this works for you.
>>>> 
>>>> Best regards,
>>>> 
>>>> Julie
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On 1/5/11 12:29 PM, "Eric Cabot" <elcabot at gmail.com> wrote:
>>>> 
>>>>> Hi Julie,
>>>>> 
>>>>>   Thank you for your response.
>>>>> 
>>>>> Here is the sessionInfo and traceback output and also a few lines of
>>>>> "my_annotated_regions".
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Eric Cabot
>>>>> 
>>>>>> sessionInfo()
>>>>> R version 2.12.1 (2010-12-16)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>> 
>>>>> locale:
>>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>> 
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>> 
>>>>> other attached packages:
>>>>>   [1] ChIPpeakAnno_1.6.0                  limma_3.6.9
>>>>>   [3] org.Hs.eg.db_2.4.6                  GO.db_2.4.5
>>>>>   [5] RSQLite_0.9-4                       DBI_0.2-5
>>>>>   [7] AnnotationDbi_1.12.0                BSgenome.Ecoli.NCBI.20080805_1.3.16
>>>>>   [9] BSgenome_1.18.2                     GenomicRanges_1.2.2
>>>>> [11] Biostrings_2.18.2                   IRanges_1.8.8
>>>>> [13] multtest_2.6.0                      Biobase_2.10.0
>>>>> [15] biomaRt_2.6.0
>>>>> 
>>>>> loaded via a namespace (and not attached):
>>>>> [1] MASS_7.3-9      RCurl_1.5-0     splines_2.12.1  survival_2.36-2
>>>>> [5] tools_2.12.1    XML_3.2-0
>>>>> my_enrichedGO<-getEnrichedGO(my_annotated_regions,orgAnn="org.Hs.eg.db",
>>>>>     maxP=0.01,multiAdj=FALSE,minGOterm=1,feature_id_type="ensembl_gene_id")
>>>>> Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { :
>>>>>    argument is of length zero
>>>>>> traceback()
>>>>> 2: addAncestors(this.GO[this.GO[, 3] == "BP", ], "bp")
>>>>> 1: getEnrichedGO(FC2_annotated_regions, orgAnn = "org.Hs.eg.db",
>>>>>         maxP = 0.01, multiAdj = FALSE, minGOterm = 1, feature_id_type = "ensembl_gene_id")
>>>>> 
>>>>> 
>>>>> 
>>>>>> as.data.frame(my_annotated_regions[1:15,])
>>>>>     space     start       end width                   names    peak strand
>>>>> 1      1 241997936 241998205   270 R-10060 ENSE00001749374 R-10060      +
>>>>> 2      1 237109743 237110002   260 R-10082 ENSE00001643382 R-10082      +
>>>>> 3      1 236080267 236080415   149 R-10086 ENSE00001807176 R-10086      +
>>>>> 4      1 233853245 233853514   270 R-10096 ENSE00001776382 R-10096      +
>>>>> 5      1 233727956 233728104   149 R-10097 ENSE00001442190 R-10097      +
>>>>> 6      1 230728554 230728823   270 R-10108 ENSE00001731401 R-10108      +
>>>>> 7      1 229687129 229687277   149 R-10113 ENSE00001439385 R-10113      +
>>>>> 8      1 228943263 228943412   150 R-10121 ENSE00001903546 R-10121      +
>>>>> 9      1 218358885 218359176   292 R-10157 ENSE00001439386 R-10157      +
>>>>> 10     1 212254259 212254408   150 R-10179 ENSE00001624346 R-10179      +
>>>>> 11     1 210086264 210086513   250 R-10184 ENSE00001903225 R-10184      +
>>>>> 12     1 209863549 209863698   150 R-10185 ENSE00001336255 R-10185      +
>>>>> 13     1 207437117 207437264   148 R-10190 ENSE00001742112 R-10190      +
>>>>> 14     1 190352400 190352548   149 R-10246 ENSE00001782518 R-10246      +
>>>>> 15     1 184432607 184432755   149 R-10260 ENSE00001283926 R-10260      +
>>>>>             feature start_position end_position insideFeature distancetoFeature
>>>>> 1  ENSE00001749374      241995237    241996089    downstream              2699
>>>>> 2  ENSE00001643382      237144639    237145008      upstream            -34896
>>>>> 3  ENSE00001807176      236078715    236078821    downstream              1552
>>>>> 4  ENSE00001776382      233807017    233807237    downstream             46228
>>>>> 5  ENSE00001442190      233749750    233750272      upstream            -21794
>>>>> 6  ENSE00001731401      230728406    230728586    overlapEnd               148
>>>>> 7  ENSE00001439385      229685652    229685769    downstream              1477
>>>>> 8  ENSE00001903546      228882063    228882416    downstream             61200
>>>>> 9  ENSE00001439386      218303137    218303294    downstream             55748
>>>>> 10 ENSE00001624346      212253973    212254092    downstream               286
>>>>> 11 ENSE00001903225      210111538    210111622      upstream            -25274
>>>>> 12 ENSE00001336255      209859550    209859630    downstream              3999
>>>>> 13 ENSE00001742112      207438342    207438381      upstream             -1225
>>>>> 14 ENSE00001782518      190331193    190331400    downstream             21207
>>>>> 15 ENSE00001283926      184446520    184446737      upstream            -13913
>>>>>     shortestDistance fromOverlappingOrNearest
>>>>> 1              1847             NearestStart
>>>>> 2             34637             NearestStart
>>>>> 3              1446             NearestStart
>>>>> 4             46008             NearestStart
>>>>> 5             21646             NearestStart
>>>>> 6                32             NearestStart
>>>>> 7              1360             NearestStart
>>>>> 8             60847             NearestStart
>>>>> 9             55591             NearestStart
>>>>> 10              167             NearestStart
>>>>> 11            25025             NearestStart
>>>>> 12             3919             NearestStart
>>>>> 13             1078             NearestStart
>>>>> 14            21000             NearestStart
>>>>> 15            13765             NearestStart
>>>>> 
>>>>> 
>>>>> Zhu, Lihua (Julie) wrote:
>>>>>> Hi Eric,
>>>>>> 
>>>>>> Could you please post the session information with sessionInfo() command?
>>>>>> Could you please also send a few ensembl IDs in your annotated dataset?
>>>>>> Thanks!
>>>>>> 
>>>>>> Best regards,
>>>>>> 
>>>>>> Julie
>>>>>> 
>>>>>> 
>>>>>> On 1/4/11 6:51 PM, "Eric Cabot" <elcabot at gmail.com> wrote:
>>>>>> 
>>>>>>> I am a relatively new Bioconductor user and I am trying to analyze some
>>>>>>> ChIP-seq results that came from QuEST using the ChIPpeakAnno package.
>>>>>>> 
>>>>>>> After importing the regions of interest into RangedData objects and
>>>>>>> doing the following:
>>>>>>> 
>>>>>>> 
>>>>>>> ENSEMBLE_GENES_MART<-useMart(biomart="ensembl",
>>>>>>>     dataset="hsapiens_gene_ensembl")
>>>>>>> ENSEMBL_ExonPlus_Annotation<-getAnnotation(ENSEMBLE_GENES_MART,
>>>>>>> featureType="ExonPlusUtr")
>>>>>>> 
>>>>>>> 
>>>>>>> I had no problem annotating and generating a Venn diagram to show the
>>>>>>> overlaps between my three sets of peaks. To annotate, I used:
>>>>>>> 
>>>>>>> annotated_regions=annotatePeakInBatch(myranged,
>>>>>>> AnnotationData=ENSEMBL_ExonPlus_Annotation)
>>>>>>> 
>>>>>>> 
>>>>>>> But I cannot seem to get the getEnrichedGO function to work on this (or my
>>>>>>> other two annotated regions). Here is a typical command line:
>>>>>>> 
>>>>>>> 
>>>>>>> my_enrichedGO<-getEnrichedGO(annotated_regions,orgAnn="org.Hs.eg.db",
>>>>>>>     maxP=0.01,multiAdj=TRUE,minGOterm=1,
>>>>>>>     multiAdjMethod="BH",feature_id_type="ensembl_gene_id")
>>>>>>> 
>>>>>>> and here is a typical error message:
>>>>>>> 
>>>>>>> enrichedGO<-getEnrichedGO(annotated_regions,orgAnn="org.Hs.eg.db",
>>>>>>>     maxP=0.01,multiAdj=TRUE,minGOterm=1,feature_id_type="ensembl_gene_id")
>>>>>>> Error in if (class(go.ids) != "matrix" | dim(go.ids)[2] < 4) { :
>>>>>>>    argument is of length zero
>>>>>>> 
>>>>>>> 
>>>>>>> Which leads me to ask:
>>>>>>> 
>>>>>>> 1) Is this error message supposed to be meaningful to me (i.e., a user),
>>>>>>> or is it something that I should be sending to the developer of the
>>>>>>> package?
>>>>>>> 
>>>>>>> 2) Is there anything obvious from this that suggests what corrective
>>>>>>> action I should be taking?
>>>>>>> 
>>>>>>> 
>>>>>>> Eric Cabot
>>>>>>> University of Wisconsin
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> Bioconductor mailing list
>>>>>>> Bioconductor at r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>> Search the archives:
>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>> 