[BioC] ChIPpeakAnno 2.12.2 bug

Zinkgraf, Matthew S -FS mszinkgraf at fs.fed.us
Mon Aug 18 20:49:07 CEST 2014


Hi

Thanks for working on this but I am still getting the same result even with 2.13.1. The annotatePeakInBatch is not assigning peaks to features correctly when genes are located on the negative strand. Below is an example that produces an incorrect result when using annotatePeakInBatch. I think the problem is located in the  output= "nearestStart" because "overlapping" produces the correct result. Specifically the second line of the annPeak output should be assigning peak1 to overlapEnd instead of overlapStart because the gene feature is located on the negative strand.



Matt


> library(ChIPpeakAnno)
>
>
> peak<-RangedData(IRanges(start=as.numeric(as.character(c(1650,2806,8361))), end=as.numeric(as.character(c(1860,3006,8591)))), space=c("Chr01","Chr01","Chr01"),strand=as.integer(1))
> row.names(peak)<-c("peak1","peak2","peak3")
>
>
> #load subset of data from Populus v210 gff3 file
> genome<-as.data.frame(rbind(c("Chr01","phytozome8_0",      "gene",1660,2502, ".",     "-",     ".","Potri.001G000100"),c("Chr01", "phytozome8_0",     "gene",2906,6646,    ".",        "-",     ".","Potri.001G000200"),c("Chr01", "phytozome8_0",   "gene",8391,8860, ".",     "+",     ".","Potri.001G000300")))
>
>

> GENOME<-GFF2RangedData(genome)

Warning message:

In DataFrame(...) : NAs introduced by coercion

> row.names(GENOME)<-genome[,9]
>
>   annPeak = annotatePeakInBatch(peak, AnnotationData=GENOME, PeakLocForDistance ="middle",output="both")
Warning messages:
1: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by coercion
2: In mapply(FUN = f, ..., SIMPLIFY = FALSE) : NAs introduced by coercion
>
> View(annPeak)

space

start

end

width

names

peak

strand

feature

start_position

end_position

insideFeature

distancetoFeature

shortestDistance

Chr01

8361

8591

231

peak3 Potri.001G000300

peak3

+

Potri.001G000300

8391

8860

overlapStart

85

30

Chr01

1650

1860

211

peak1 Potri.001G000100

peak1

-

Potri.001G000100

1660

2502

overlapStart

747

10

Chr01

2806

3006

201

peak2 Potri.001G000100

peak2

-

Potri.001G000100

1660

2502

downstream

-404

304

Chr01

2806

3006

201

peak2 Potri.001G000200

peak2

-

Potri.001G000200

2906

6646

overlapEnd

3740

100



>
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=C            LC_COLLATE=C
 [5] LC_MONETARY=C        LC_MESSAGES=C
 [7] LC_PAPER=C           LC_NAME=C
 [9] LC_ADDRESS=C         LC_TELEPHONE=C
[11] LC_MEASUREMENT=C     LC_IDENTIFICATION=C

attached base packages:
[1] grid      parallel  stats     graphics  grDevices utils
[7] datasets  methods   base

other attached packages:
[1] ChIPpeakAnno_2.13.1  Biostrings_2.32.1
 [3] XVector_0.4.0        IRanges_1.22.10
 [5] biomaRt_2.20.0       VennDiagram_1.6.7
 [7] GOSemSim_1.22.0      Rcpp_0.11.2
 [9] BiocInstaller_1.14.2 edgeR_3.6.7
[11] limma_3.20.8         hash_2.2.6
[13] GSEABase_1.26.0      annotate_1.42.1
[15] GOstats_2.30.0       graph_1.42.0
[17] Category_2.30.0      GO.db_2.14.0
[19] RSQLite_0.11.4       DBI_0.2-7
[21] Matrix_1.1-4         AnnotationDbi_1.26.0
[23] GenomeInfoDb_1.0.2   Biobase_2.24.0
[25] BiocGenerics_0.10.0

loaded via a namespace (and not attached):
[1] AnnotationForge_1.6.1   BBmisc_1.7
 [3] BSgenome_1.32.0         BatchJobs_1.3
 [5] BiocParallel_0.6.1      GenomicAlignments_1.0.5
[7] GenomicFeatures_1.16.2  GenomicRanges_1.16.4
 [9] MASS_7.3-33             RBGL_1.40.0
[11] RCurl_1.95-4.3          Rsamtools_1.16.1
[13] XML_3.98-1.1            bitops_1.0-6
[15] brew_1.0-6              checkmate_1.2
[17] codetools_0.2-8         digest_0.6.4
[19] fail_1.2                foreach_1.4.2
[21] genefilter_1.46.1       iterators_1.0.7
[23] lattice_0.20-29         multtest_2.20.0
[25] rtracklayer_1.24.2      sendmailR_1.1-2
[27] splines_3.1.0           stats4_3.1.0
[29] stringr_0.6.2           survival_2.37-7
[31] tools_3.1.0             xtable_1.7-3
[33] zlibbioc_1.10.0


>






From: Zhu, Lihua (Julie) [mailto:Julie.Zhu at umassmed.edu]
Sent: Wednesday, August 13, 2014 2:47 PM
To: Zinkgraf, Matthew S -FS
Cc: bioconductor at r-project.org
Subject: Re: ChIPpeakAnno 2.12.2 bug

Matt,

Many thanks for helping us identifying the bug introduced in 2.12.2! The bug has been fixed in version 2.13.1. Please let me know if you have any issues.

For future correspondence, could you please keep the discussion in the bioconductor list so others can contribute or benefit? Thanks!

Best regards,

Julie

On 8/13/14 1:18 PM, "Zinkgraf, Matthew S -FS" <mszinkgraf at fs.fed.us> wrote:
Hello Julie
I recently updated ChIPpeakAnno from version 2.10 to 2.12.2 and I am getting incorrect results from annotatePeakInBatch.
More specifically the output column 'insideFeature' is not taking feature strand into account when it is assigning peaks to gene features, the function thinks all gene features are on the positive strand even though my genome ranged data contains strand information. Also all other calculations from annotatePeakInBatch gives the same results as in 2.10.

Please let me know if you would like more information about the bug I am getting.
Thanks

Matt


---

Matthew S. Zinkgraf, PhD
Postdoctoral Researcher
USDA, Forest Service
Davis, CA 95618
530-759-1739
mszinkgraf at fs.fed.us <mailto:mszinkgraf at fs.fed.us>







This electronic message contains information generated by the USDA solely for the intended recipients. Any unauthorized interception of this message or the use or disclosure of the information it contains may violate the law and subject the violator to civil or criminal penalties. If you believe you have received this message in error, please notify the sender and delete the email immediately.

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list