[BioC] ChIPpeakAnno annotatePeakInBatch error message

Zhu, Julie Julie.Zhu at umassmed.edu
Thu May 27 20:26:12 CEST 2010


Hi Dario,

Thanks for the vigorous test of the new feature!

The peak dataset contains chrX_random, which is not in the feature dataset. I added an is.na check on the strand, which should fix the problem. I have also attached the annotated dataset. Please let me know if you encounter any problems.
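The failure described above can be sketched in base R. This is a hypothetical reproduction, not ChIPpeakAnno's source. The helper names `isForwardUnguarded` and `isForwardGuarded` are invented. The right-hand comparison value `"+"` is also an assumption, because it is cut off in the reported error message. The sketch shows why a missing strand makes `if` fail, and how checking `is.na()` first avoids it.

```r
# Hypothetical sketch, not ChIPpeakAnno's source: strand values looked up for three peaks.
# The third peak lies on a chromosome (e.g. chrX_random) with no feature, so its strand is NA.
strand <- c("+", "-", NA)

# Unguarded test mirroring the pattern in the reported error. For NA, NA == "1" is NA,
# so `||` evaluates its right-hand side; NA == "+" is also NA, and `if (NA)` stops.
# The comparison value "+" is an assumption (it is cut off in the original error).
isForwardUnguarded <- function(s) {
  s <- as.character(s)
  if (s == "1" || s == "+") TRUE else FALSE
}

msg <- tryCatch(isForwardUnguarded(strand[3]),
                error = function(e) conditionMessage(e))
print(msg)  # the "missing value where TRUE/FALSE needed" error, captured as text

# Guarded test: is.na() is checked first, so the comparisons never see a missing strand.
isForwardGuarded <- function(s) {
  s <- as.character(s)
  if (is.na(s)) return(NA)  # no matching feature: leave the answer missing
  s == "1" || s == "+"
}

guarded <- vapply(strand, isForwardGuarded, logical(1), USE.NAMES = FALSE)
print(guarded)  # TRUE FALSE NA
```

Returning `NA` for an unmatched peak keeps the missing value visible downstream, rather than silently assigning it to one strand.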

Best regards,

Julie


On 5/26/10 11:00 PM, "Dario Strbenac" <D.Strbenac at garvan.org.au> wrote:

Hello,

Yes, I encountered the same problem again, this time when running the code on my full data table. Here is my script. All the files it refers to are web-accessible, so you can replicate the problem too. I am definitely using version 1.5.3 of the package.

library(ChIPpeakAnno)
library(IRanges)  # provides RangedData() and IRanges()

# Peaks: CpG islands from a headerless, tab-separated BED file
CpGIslandsTable <- read.table("http://129.94.136.7/file_dump/dario/hg18_CpG_Islands.bed", sep = '\t', stringsAsFactors = FALSE)
# Features: gene annotation with name, chr, strand, start and end columns
genesTable <- read.csv("http://129.94.136.7/file_dump/dario/humanGenomeAnnotation.csv", stringsAsFactors = FALSE)
colnames(CpGIslandsTable) <- c("chr", "start", "end", "name")

peaksRangedData <- RangedData(space = CpGIslandsTable$chr, ranges = IRanges(start = CpGIslandsTable$start, end = CpGIslandsTable$end))
featuresRangedData <- RangedData(name = genesTable$name, space = genesTable$chr, strand = genesTable$strand, ranges = IRanges(start = genesTable$start, end = genesTable$end))
featureLoc <- "TSS"  # defined, but not passed to annotatePeakInBatch() below

annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData, PeakLocForDistance = "middle")
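Some peaks may lie on chromosomes that are absent from the annotation, such as chrX_random. One possible workaround is to drop those peaks before building the RangedData objects. This is an assumption and was not proposed in this thread. The base-R sketch below uses made-up toy tables (`peaks`, `genes`) in place of CpGIslandsTable and genesTable.

```r
# Hypothetical toy tables standing in for CpGIslandsTable and genesTable.
peaks <- data.frame(chr = c("chr1", "chrX_random", "chr2"),
                    start = c(100, 500, 900), end = c(200, 600, 1000),
                    stringsAsFactors = FALSE)
genes <- data.frame(name = c("geneA", "geneB"), chr = c("chr1", "chr2"),
                    start = c(50, 800), end = c(150, 950), strand = c("+", "-"),
                    stringsAsFactors = FALSE)

# Keep only peaks on chromosomes that also occur in the annotation.
keep <- peaks$chr %in% genes$chr
dropped <- unique(peaks$chr[!keep])
if (length(dropped) > 0) {
  message("Dropping peaks on unannotated chromosomes: ", paste(dropped, collapse = ", "))
}
peaksFiltered <- peaks[keep, , drop = FALSE]
print(peaksFiltered)
```

The filtered table could then be passed to RangedData() in the same way as in the script above. Note that any dropped peaks will simply receive no annotation.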

> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-pc-mingw32

locale:
[1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252    LC_MONETARY=English_Australia.1252 LC_NUMERIC=C                       LC_TIME=English_Australia.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] ChIPpeakAnno_1.5.3                  limma_3.4.0                         org.Hs.eg.db_2.4.1                  GO.db_2.4.1                         RSQLite_0.9-0
 [6] DBI_0.2-5                           AnnotationDbi_1.10.1                BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.0                     GenomicRanges_1.0.1
[11] Biostrings_2.16.0                   IRanges_1.6.0                       multtest_2.4.0                      Biobase_2.8.0                       biomaRt_2.4.0

loaded via a namespace (and not attached):
[1] MASS_7.3-5      RCurl_1.3-1     splines_2.11.0  survival_2.35-8 XML_2.8-1

---- Original message ----
>Date: Mon, 24 May 2010 22:57:47 -0400
>From: "Zhu, Julie" <Julie.Zhu at umassmed.edu>
>Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message
>To: "D.Strbenac at garvan.org.au" <D.Strbenac at garvan.org.au>, "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>
>   Hi Dario,
>
>   Please download the development version 1.5.3 of ChIPpeakAnno
>   and let me know if you encounter any problems.
>   Thanks!
>
>   Best regards,
>
>   Julie
>
>   annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData, PeakLocForDistance = "middle")
>   RangedData with 6 rows and 9 value columns across 2 spaces
>             space               ranges |        peak      strand     feature start_position end_position insideFeature distancetoFeature
>       <character>            <IRanges> | <character> <character> <character>      <numeric>    <numeric>   <character>         <numeric>
>   1 1        chr1 [ 2000010,  2000310] |           1           +           1          1e+06      2.0e+06    downstream           1000160
>   2 2        chr1 [19000000, 19000300] |           2           -           2          1e+07      2.0e+07        inside            999850
>   3 2        chr1 [30000000, 30000300] |           3           -           2          1e+07      2.0e+07      upstream         -10000150
>   4 4        chr2 [     300,      600] |           4           +           4          1e+03      5.0e+03      upstream              -550
>   6 6        chr2 [  100000,   100300] |           6           +           6          1e+04      1.5e+04    downstream             90150
>   5 5        chr2 [    5500,     5800] |           5           -           5          6e+03      7.0e+03    downstream              1350
>       shortestDistance fromOverlappingOrNearest
>              <numeric>              <character>
>   1 1               10             NearestStart
>   2 2           999700             NearestStart
>   3 2         10000000             NearestStart
>   4 4              400             NearestStart
>   6 6            85000             NearestStart
>   5 5              200             NearestStart
>
>   > sessionInfo()
>   R version 2.11.0 (2010-04-22)
>   i386-apple-darwin9.8.0
>
>   locale:
>   [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
>   attached base packages:
>   [1] stats     graphics  grDevices utils     datasets  methods   base
>
>   other attached packages:
>    [1] ChIPpeakAnno_1.5.3                  limma_3.4.0                         org.Hs.eg.db_2.4.1
>    [4] GO.db_2.4.1                         RSQLite_0.9-0                       DBI_0.2-5
>    [7] AnnotationDbi_1.10.1                BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.1
>   [10] GenomicRanges_1.0.1                 Biostrings_2.16.0                   IRanges_1.6.1
>   [13] multtest_2.4.0                      Biobase_2.8.0                       biomaRt_2.4.0
>
>
>   On 5/24/10 5:10 AM, "Dario Strbenac" <D.Strbenac at garvan.org.au> wrote:
>
>     Hello,
>
>     I made another small example of using annotatePeakInBatch
>     to demonstrate it to a friend, but it crashed. It is similar
>     to the other example but uses different data. I'm not sure
>     why this is happening.
>
>     Here is my small example:
>
>     peaksT <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
>                          start = c(2000010, 19000000, 30000000, 300, 5500, 100000),
>                          end = c(2000310, 19000300, 30000300, 600, 5800, 100300))
>     featuresT <- data.frame(name = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6"),
>                             chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
>                             start = c(1000000, 10000000, 15000000, 1000, 6000, 10000),
>                             end = c(2000000, 20000000, 22000000, 5000, 7000, 15000),
>                             strand = c('+', '-', '+', '+', '-', '+'))
>
>     require(ChIPpeakAnno)
>
>     peaksRangedData <- RangedData(space = peaksT$chr,
>                                   ranges = IRanges(start = peaksT$start, end = peaksT$end))
>     featuresRangedData <- RangedData(name = featuresT$name, space = featuresT$chr,
>                                      strand = featuresT$strand,
>                                      ranges = IRanges(start = featuresT$start, end = featuresT$end))
>     featureLoc <- "TSS"
>
>     annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData,
>                         PeakLocForDistance = "middle")
>
>     Error in if (as.character(r.n$strand[i]) == "1" || as.character(r.n$strand[i]) ==  :
>       missing value where TRUE/FALSE needed
>
>     My sessionInfo is:
>
>     R version 2.11.0 (2010-04-22)
>     x86_64-unknown-linux-gnu
>
>     locale:
>      [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>      [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>      [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>      [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>     [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>      [1] ChIPpeakAnno_1.5.2                  limma_3.4.0
>      [3] org.Hs.eg.db_2.4.1                  GO.db_2.4.1
>      [5] RSQLite_0.9-0                       DBI_0.2-5
>      [7] AnnotationDbi_1.10.0                BSgenome.Ecoli.NCBI.20080805_1.3.16
>      [9] BSgenome_1.16.1                     GenomicRanges_1.0.1
>     [11] Biostrings_2.16.0                   IRanges_1.6.2
>     [13] multtest_2.4.0                      Biobase_2.8.0
>     [15] biomaRt_2.4.0
>
>     loaded via a namespace (and not attached):
>     [1] MASS_7.3-6      RCurl_1.4-2     splines_2.11.0  survival_2.35-8
>     [5] XML_3.1-0
>
>     Thanks,
>            Dario.
>
>     --------------------------------------
>     Dario Strbenac
>     Research Assistant
>     Cancer Epigenetics
>     Garvan Institute of Medical Research
>     Darlinghurst NSW 2010
>     Australia
>
>     _______________________________________________
>     Bioconductor mailing list
>     Bioconductor at stat.math.ethz.ch
>     https://stat.ethz.ch/mailman/listinfo/bioconductor
>     Search the archives:
>     http://news.gmane.org/gmane.science.biology.informatics.conductor


--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia




