[BioC] ChIPpeakAnno 2.12.2 bug

Dan Tenenbaum dtenenba at fhcrc.org
Tue Aug 19 07:13:00 CEST 2014



----- Original Message -----
> From: "Matthew S Zinkgraf -FS" <mszinkgraf at fs.fed.us>
> To: bioconductor at r-project.org
> Sent: Monday, August 18, 2014 11:49:07 AM
> Subject: Re: [BioC] ChIPpeakAnno 2.12.2 bug
> 
> Hi
> 
> Thanks for working on this but I am still getting the same result
> even with 2.13.1. The annotatePeakInBatch is not assigning peaks to
> features correctly when genes are located on the negative strand.
> Below is an example that produces an incorrect result when using
> annotatePeakInBatch. I think the problem is located in the  output=
> "nearestStart" because "overlapping" produces the correct result.
> Specifically the second line of the annPeak output should be
> assigning peak1 to overlapEnd instead of overlapStart because the
> gene feature is located on the negative strand.
> 
> 
> 
> Matt
> 
> 
> > library(ChIPpeakAnno)
> >
> >
> > peak<-RangedData(IRanges(start=as.numeric(as.character(c(1650,2806,8361))),
> > end=as.numeric(as.character(c(1860,3006,8591)))),
> > space=c("Chr01","Chr01","Chr01"),strand=as.integer(1))
> > row.names(peak)<-c("peak1","peak2","peak3")
> >
> >
> > #load subset of data from Populus v210 gff3 file
> > genome<-as.data.frame(rbind(c("Chr01","phytozome8_0",
> >      "gene",1660,2502, ".",     "-",
> >     ".","Potri.001G000100"),c("Chr01", "phytozome8_0",
> >     "gene",2906,6646,    ".",        "-",
> >     ".","Potri.001G000200"),c("Chr01", "phytozome8_0",
> >   "gene",8391,8860, ".",     "+",     ".","Potri.001G000300")))
> >
> >
> 
> > GENOME<-GFF2RangedData(genome)
> 
> Warning message:
> 
> In DataFrame(...) : NAs introduced by coercion
> 
> > row.names(GENOME)<-genome[,9]
> >
> >   annPeak = annotatePeakInBatch(peak, AnnotationData=GENOME,
> >   PeakLocForDistance ="middle",output="both")
> Warning messages:
> 1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by
> coercion
> 2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by
> coercion
> >
> > View(annPeak)
> 
> space
> 
> start
> 
> end
> 
> width
> 
> names
> 
> peak
> 
> strand
> 
> feature
> 
> start_position
> 
> end_position
> 
> insideFeature
> 
> distancetoFeature
> 
> shortestDistance
> 
> Chr01
> 
> 8361
> 
> 8591
> 
> 231
> 
> peak3 Potri.001G000300
> 
> peak3
> 
> +
> 
> Potri.001G000300
> 
> 8391
> 
> 8860
> 
> overlapStart
> 
> 85
> 
> 30
> 
> Chr01
> 
> 1650
> 
> 1860
> 
> 211
> 
> peak1 Potri.001G000100
> 
> peak1
> 
> -
> 
> Potri.001G000100
> 
> 1660
> 
> 2502
> 
> overlapStart
> 
> 747
> 
> 10
> 
> Chr01
> 
> 2806
> 
> 3006
> 
> 201
> 
> peak2 Potri.001G000100
> 
> peak2
> 
> -
> 
> Potri.001G000100
> 
> 1660
> 
> 2502
> 
> downstream
> 
> -404
> 
> 304
> 
> Chr01
> 
> 2806
> 
> 3006
> 
> 201
> 
> peak2 Potri.001G000200
> 
> peak2
> 
> -
> 
> Potri.001G000200
> 
> 2906
> 
> 6646
> 
> overlapEnd
> 
> 3740
> 
> 100
> 
> 
> 
> >
> > sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> 
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>  [3] LC_TIME=C            LC_COLLATE=C
>  [5] LC_MONETARY=C        LC_MESSAGES=C
>  [7] LC_PAPER=C           LC_NAME=C
>  [9] LC_ADDRESS=C         LC_TELEPHONE=C
> [11] LC_MEASUREMENT=C     LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] grid      parallel  stats     graphics  grDevices utils
> [7] datasets  methods   base
> 
> other attached packages:
> [1] ChIPpeakAnno_2.13.1  Biostrings_2.32.1
>  [3] XVector_0.4.0        IRanges_1.22.10
>  [5] biomaRt_2.20.0       VennDiagram_1.6.7
>  [7] GOSemSim_1.22.0      Rcpp_0.11.2
>  [9] BiocInstaller_1.14.2 edgeR_3.6.7
> [11] limma_3.20.8         hash_2.2.6
> [13] GSEABase_1.26.0      annotate_1.42.1
> [15] GOstats_2.30.0       graph_1.42.0
> [17] Category_2.30.0      GO.db_2.14.0
> [19] RSQLite_0.11.4       DBI_0.2-7
> [21] Matrix_1.1-4         AnnotationDbi_1.26.0
> [23] GenomeInfoDb_1.0.2   Biobase_2.24.0
> [25] BiocGenerics_0.10.0
> 
> loaded via a namespace (and not attached):
> [1] AnnotationForge_1.6.1   BBmisc_1.7
>  [3] BSgenome_1.32.0         BatchJobs_1.3
>  [5] BiocParallel_0.6.1      GenomicAlignments_1.0.5
> [7] GenomicFeatures_1.16.2  GenomicRanges_1.16.4
>  [9] MASS_7.3-33             RBGL_1.40.0
> [11] RCurl_1.95-4.3          Rsamtools_1.16.1
> [13] XML_3.98-1.1            bitops_1.0-6
> [15] brew_1.0-6              checkmate_1.2
> [17] codetools_0.2-8         digest_0.6.4
> [19] fail_1.2                foreach_1.4.2
> [21] genefilter_1.46.1       iterators_1.0.7
> [23] lattice_0.20-29         multtest_2.20.0
> [25] rtracklayer_1.24.2      sendmailR_1.1-2
> [27] splines_3.1.0           stats4_3.1.0
> [29] stringr_0.6.2           survival_2.37-7
> [31] tools_3.1.0             xtable_1.7-3
> [33] zlibbioc_1.10.0
> 

I can't speak about the ChIPpeakAnno issue, but you've installed the devel version of ChIPpeakAnno into an installation that's otherwise using the release version of Bioconductor. This will cause problems all by itself. 

I suggest either having a separate version of R that runs the devel version, or upgrade your current installation per http://www.bioconductor.org/developers/how-to/useDevel/ .

Then see if the problem persists.

Dan


> 
> >
> 
> 
> 
> 
> 
> 
> From: Zhu, Lihua (Julie) [mailto:Julie.Zhu at umassmed.edu]
> Sent: Wednesday, August 13, 2014 2:47 PM
> To: Zinkgraf, Matthew S -FS
> Cc: bioconductor at r-project.org
> Subject: Re: ChIPpeakAnno 2.12.2 bug
> 
> Matt,
> 
> Many thanks for helping us identifying the bug introduced in 2.12.2!
> The bug has been fixed in version 2.13.1. Please let me know if you
> have any issues.
> 
> For future correspondence, could you please keep the discussion in
> the bioconductor list so others can contribute or benefit? Thanks!
> 
> Best regards,
> 
> Julie
> 
> On 8/13/14 1:18 PM, "Zinkgraf, Matthew S -FS" <mszinkgraf at fs.fed.us>
> wrote:
> Hello Julie
> I recently updated ChIPpeakAnno from version 2.10 to 2.12.2 and I am
> getting incorrect results from annotatePeakInBatch.
> More specifically the output column 'insideFeature' is not taking
> feature strand into account when it is assigning peaks to gene
> features, the function thinks all gene features are on the positive
> strand even though my genome ranged data contains strand
> information. Also all other calculations from annotatePeakInBatch
> gives the same results as in 2.10.
> 
> Please let me know if you would like more information about the bug I
> am getting.
> Thanks
> 
> Matt
> 
> 
> ---
> 
> Matthew S. Zinkgraf, PhD
> Postdoctoral Researcher
> USDA, Forest Service
> Davis, CA 95618
> 530-759-1739
> mszinkgraf at fs.fed.us <mailto:mszinkgraf at fs.fed.us>
> 
> 
> 
> 
> 
> 
> 
> This electronic message contains information generated by the USDA
> solely for the intended recipients. Any unauthorized interception of
> this message or the use or disclosure of the information it contains
> may violate the law and subject the violator to civil or criminal
> penalties. If you believe you have received this message in error,
> please notify the sender and delete the email immediately.
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list