[BioC] Find Affy probes within a particular region

Martin Morgan mtmorgan at fhcrc.org
Tue Jun 17 18:19:54 CEST 2008


Daniel Brewer <daniel.brewer at icr.ac.uk> writes:

> Hi,
>
> I was wondering what the best way is to find which Affymetrix probes are
> within a specific genomic region (chromosome, start, stop).  I am not
> sure whether Biomart or the annotation.db packages can do this, as they
> both go to some common ID first.  The annotation.db stuff also seems to
> have only one piece of position information.  The other option is to
> download the annotation file from Affymetrix and load that in, but I
> would prefer to avoid that if at all possible.
>
> Has anyone got any ideas?

Not sure whether this is a good idea or not, but...

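## create a data frame of probe genomic location
library(annotate)  # provides getAnnMap(); the chip package must be installed
makeLookup <- function(pkg) {
    ## some entries come without names (i.e., without chromosomes); drop them
    filt <- function(x) !is.null(names(x))
    lst <- Filter(filt, as.list(getAnnMap("CHRLOC", pkg)))
    ## one row per location; the names of each entry are the chromosomes
    data.frame(id=rep(names(lst), sapply(lst, length)),
               pos=unlist(lst, use.names=FALSE),
               chr=unlist(lapply(lst, names), use.names=FALSE),
               row.names=NULL)
}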

This gives us:

> lookup <- makeLookup("hgu95av2.db")
> head(lookup)
         id       pos chr
1   1000_at -30032926  16
2   1001_at  43539250   1
3 1002_f_at  96512452  10
4 1003_s_at 118269310  11
5 1003_s_at 118259776  11
6   1004_at 118269310  11
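
The negative positions in this output follow the CHRLOC convention of marking locations on the antisense strand with a leading minus sign. A minimal sketch of making that explicit, using the first three rows shown above as a hand-built table (the 'strand' and 'start' column names are my own choice for illustration, not something makeLookup() produces):

```r
## Toy table copied from the first rows of the output above
lookup <- data.frame(id  = c("1000_at", "1001_at", "1002_f_at"),
                     pos = c(-30032926, 43539250, 96512452),
                     chr = c("16", "1", "10"),
                     stringsAsFactors = FALSE)

## Split the signed position into a strand flag and an unsigned coordinate
lookup$strand <- ifelse(lookup$pos < 0, "-", "+")   # hypothetical column
lookup$start  <- abs(lookup$pos)                    # hypothetical column
```

With the strand kept in its own column, range comparisons can use 'start' directly instead of calling abs() each time.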

then...

## find probes in a single region
contains <- function(chr, start, end, table) {
    apos <- abs(table$pos)  # negative positions mark the antisense strand
    idx <- table$chr == chr & apos >= start & apos <= end
    table[idx,]
}

> contains(10, 96000000, 97000000, lookup)
             id       pos chr
3     1002_f_at  96512452  10
525   1455_f_at  96688429  10
550   1477_s_at  96433367  10
4321 34078_s_at  96512452  10
6798   36320_at  96152175  10
7509 36937_s_at -96987321  10
9367   38548_at -96786519  10

One could use 'contains' with mapply to get multiple regions, but
perhaps there's a more efficient way for such bulk queries.
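
A rough sketch of the mapply idea, assuming the regions of interest are held in a data frame with 'chr', 'start' and 'end' columns (that layout and the 'regions' values are made up for illustration). The lookup table here is a hand-built subset of the rows shown above so that the example runs without any annotation package:

```r
## Toy lookup table: a few rows copied from the output above
lookup <- data.frame(
    id  = c("1000_at", "1001_at", "1002_f_at", "1003_s_at", "36937_s_at"),
    pos = c(-30032926, 43539250, 96512452, 118269310, -96987321),
    chr = c("16", "1", "10", "11", "10"),
    stringsAsFactors = FALSE)

## Same function as above, repeated so the block is self-contained
contains <- function(chr, start, end, table) {
    apos <- abs(table$pos)
    idx <- table$chr == chr & apos >= start & apos <= end
    table[idx, ]
}

## Hypothetical regions, one row per query
regions <- data.frame(chr   = c("10", "16", "1"),
                      start = c(96000000, 30000000, 1),
                      end   = c(97000000, 31000000, 1000),
                      stringsAsFactors = FALSE)

## One data frame of hits per region; SIMPLIFY=FALSE keeps the result a list
hits <- mapply(contains, regions$chr, regions$start, regions$end,
               MoreArgs = list(table = lookup),
               SIMPLIFY = FALSE, USE.NAMES = FALSE)

## Stack the per-region results into a single data frame
allHits <- do.call(rbind, hits)
```

Each call still scans the whole table, so this is convenient rather than fast; regions with no matches simply contribute a zero-row data frame.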

I'm not sure what your concern is about having just one 'location';
the probes are a common length, so you could incorporate this into the
'idx' calculation in contains().
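
One way the length could enter the 'idx' calculation, as a sketch only: the function name containsWhole, the 'width' argument and its default of 25 are assumptions for illustration, as is treating the unsigned position as the first base of the feature and extending it by 'width' regardless of strand.

```r
## Toy lookup table: two chromosome 10 rows copied from the output above
lookup <- data.frame(id  = c("1002_f_at", "36937_s_at"),
                     pos = c(96512452, -96987321),
                     chr = c("10", "10"),
                     stringsAsFactors = FALSE)

## Keep only features lying entirely inside [start, end].
## 'width' is a hypothetical common feature length (assumed, not looked up).
containsWhole <- function(chr, start, end, table, width = 25) {
    first <- abs(table$pos)        # strand sign dropped, as in contains()
    last  <- first + width - 1     # assumed last base of the feature
    idx <- table$chr == chr & first >= start & last <= end
    table[idx, ]
}

## 36937_s_at starts inside this region but would run past its end
res <- containsWhole("10", 96000000, 96987330, lookup)
```

Whether a feature that only partly overlaps the region should count is a design choice; requiring 'first <= end & last >= start' instead would keep partial overlaps.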

Probably someone else will offer up a ready-made solution.

Martin

> Many thanks
>
> -- 
> **************************************************************
> Daniel Brewer, Ph.D.
>
> Institute of Cancer Research
> Molecular Carcinogenesis
> Email: daniel.brewer at icr.ac.uk
> **************************************************************

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


