[BioC] AffyID mapping question

Mon Jul 2 17:53:32 CEST 2012

Hi Jiayi,

Side note: please CC the bioconductor list when replying to emails so
they can stay online -- you'll get better help (more eyeballs on your
problem), and the list can be used as a resource to others.

I guess this might be a pain using the "guest posting" stuff -- but
subscribing to the mailing list is easy, and you'll learn a lot by
skimming the posts that come through here.

OK -- now to solve your problem:

On Mon, Jul 2, 2012 at 11:03 AM, Jiayi Hou <houj2 at vcu.edu> wrote:
> Hey Steve,
>
> Sorry let me put it this way, so when a probeset hybridized to a given gene,
> the gene has a chromosomal location in terms of base pair. For a given gene,
> on average there may be 2-3 probesets attach to the same gene. However,
> these 2-3 probesets carrying different sequence of base pairs, are expected
> to attach to the different location in the given gene. What I am looking
> for is where precisely these probesets attach to the gene.

Thanks, that's a bit clearer now.

In the past I've done this with a little elbow grease: you can get the
probe sequence info for the chip you're using from this package:

http://bioconductor.org/packages/2.10/data/annotation/html/htmg430aprobe.html

There's a short vignette on matching probe sequences (against each
other, which isn't all that helpful for you, but can be a start) using
the Biostrings package here:

http://bioconductor.org/packages/2.10/bioc/vignettes/Biostrings/inst/doc/matchprobes.pdf

You can extend the examples there by matching your probes against the
mouse genome using the appropriate BSgenome package
(BSgenome.Mmusculus.UCSC.mm9).
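
As a rough sketch of what that matching looks like (toy sequences here,
not real probes; the object names are made up for illustration),
matchPDict from Biostrings finds every exact hit of a set of same-width
probes in a subject sequence. For the real thing you would presumably
swap in the "sequence" column of the probe package's data frame and a
chromosome such as Mmusculus$chr1, and repeat the search with
reverseComplement(probes) to catch hits on the other strand:

```r
library(Biostrings)

# Toy stand-ins (assumptions for illustration only): a short "genome"
# and two 8-mer "probes". Real Affy probes are 25-mers.
subject <- DNAString("TTTTACGTACGTAAGGCCTTAAGGTTTT")
probes  <- DNAStringSet(c("ACGTACGT", "GGCCTTAA"))

# PDict requires all patterns to have the same width.
pdict <- PDict(probes)
hits  <- matchPDict(pdict, subject)

# One row per exact match: which probe, and where it lands on the subject.
starts <- startIndex(hits)
hit_table <- data.frame(
  probe = rep(seq_along(probes), lengths(starts)),
  start = unlist(starts),
  end   = unlist(endIndex(hits))
)
hit_table
```

Keep in mind that the probes are designed against transcripts, so a
probe that spans an exon junction will not have an exact match in the
genomic sequence.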

Alternatively, you can follow section 4.1 of the biomaRt vignette here:

http://bioconductor.org/packages/2.10/bioc/vignettes/biomaRt/inst/doc/biomaRt.pdf

For example:

R> ensembl <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
R> affyids <- c("202763_at","209310_s_at","207500_at")
R> getBM(attributes=c('affy_hg_u133_plus_2', 'hgnc_symbol',
                'chromosome_name','start_position','end_position', 'band'),
      filters = 'affy_hg_u133_plus_2', values = affyids, mart = ensembl)

  affy_hg_u133_plus_2 hgnc_symbol chromosome_name start_position end_position  band
1           202763_at       CASP3               4      185548850    185570663 q35.1
2         209310_s_at       CASP4              11      104813593    104840163 q22.3
3           207500_at       CASP5              11      104864962    104893895 q22.3

You'll have to change the "mart/dataset" you are using, as well as the
chip IDs (and the matching affy attribute/filter name), but you should
get the idea.
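
If you're not sure what the mouse equivalents are called, something
along these lines should list the candidates. This is only a sketch: it
assumes the mouse dataset is named "mmusculus_gene_ensembl" and that the
array attributes have "affy" in their names, as they do in the human
example above, and it needs a network connection to the Ensembl mart:

```r
library(biomaRt)

# Assumed dataset name for mouse; listDatasets(useMart("ensembl")) shows
# what is actually available.
mouse <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")

# Every attribute the mart offers, narrowed down to the Affymetrix ones.
attrs <- listAttributes(mouse)
attrs[grep("affy", attrs$name), c("name", "description")]
```

Whichever name corresponds to your chip is what you would pass in both
the attributes and filters arguments of getBM.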

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact