[BioC] handling missing data with ucscTableQuery in rtracklayer package

Ana Rodrigues arodrigues at salk.edu
Tue Feb 9 20:47:17 CET 2010


Dear all,
I am using the rtracklayer package to fetch phastCons scores for
certain positions from the UCSC genome browser.  For some of these
positions, there will be no phastCons score available (ie no
alignment), and the code below will crash, rather than spit out an
error an move on to the next position.

library("rtracklayer")

get.phastCons.rtl <- function(pos, ref="ce6") {
  if(!exists("session")) {
    session <<- browserSession("UCSC")
  }

  pos <- unlist(strsplit(pos, "[:-]"))
  pos.track <- GenomicData(IRanges(start=as.numeric(pos[2]),
                                   end=as.numeric(pos[3])),
                           strand = "+", chrom = pos[1],
                           genome = ref)
  track(session, "pos") <- pos.track
  query <- ucscTableQuery(session, "multiz6way",
                          GenomicRanges(as.numeric(pos[2]),
                                        as.numeric(pos[3]),
                                        pos[1]))
  tableName(query) <- "phastCons6way"
  return(track(query)$score)
}

positions <- c("chrX:16039269-16039273",
               "chrX:16039357-16039361",
               "chrX:16039609-16039613") # no phastCons score

# this works
pc.scores.works <- sapply(positions[1:2], get.phastCons.rtl)

# this crashes with an error and no scores get reported
pc.scores.fails <- sapply(positions, get.phastCons.rtl)

Here is the error:
Error in read.table(con, colClasses = bedClasses, as.is = TRUE) :
  no lines available in input

I am wondering if you could give me some advice on making this
function able to cope with missing phastCons data, so that I can apply
it for any given position?

Thank you in advance for any help you can provide,
Ana



> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-apple-darwin9.8.0

locale:
[1] en_US.US-ASCII/en_US.UTF-8/C/C/en_US.US-ASCII/en_US.US-ASCII

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.6.0 RCurl_1.3-1       bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] Biobase_2.6.1      Biostrings_2.14.11 BSgenome_1.14.2    IRanges_1.4.11
[5] XML_2.6-0


> traceback()
35: stop("no lines available in input")
34: read.table(con, colClasses = bedClasses, as.is = TRUE)
33: DataFrame(read.table(con, colClasses = bedClasses, as.is = TRUE))
32: .local(con, variant, trackLine, genome, ...)
31: fun(con, ...)
30: fun(con, ...)
29: import(con, format, ...)
28: import(con, format, ...)
27: import(text = lines, format = "bed", variant = "bedGraph", genome = genome)
26: import(text = lines, format = "bed", variant = "bedGraph", genome = genome)
25: .local(con, genome, ...)
24: fun(con, ...)
23: fun(con, ...)
22: import(con, format, ...)
21: import(con, format, ...)
20: import(format = subformat, text = text, ...)
19: import(format = subformat, text = text, ...)
18: FUN(1L[[1L]], ...)
17: lapply(seq_along(trackLines), makeTrackSet)
16: lapply(seq_along(trackLines), makeTrackSet)
15: import.ucsc(con, "wig", TRUE, genome = genome)
14: import.ucsc(con, "wig", TRUE, genome = genome)
13: .local(con, genome, ...)
12: fun(con, ...)
11: fun(con, ...)
10: import(con, format, ...)
9: import(con, format, ...)
8: import(text = output, format = format)
7: import(text = output, format = format)
6: .local(object, ...)
5: track(query)
4: track(query)
3: FUN(c("chrX:16039269-16039273", "chrX:16039357-16039361",
"chrX:16039609-16039613"
   )[[3L]], ...)
2: lapply(X, FUN, ...)
1: sapply(positions, get.phastCons.rtl)

-- 
Ana P. C. Rodrigues, Ph.D.
Salk Institute for Biological Studies
T +1 858.453.4100 x1067
W bioinformatics.salk.edu/people/ana/



More information about the Bioconductor mailing list