[BioC] Find Affy probes within a particular region

Sean Davis sdavis2 at mail.nih.gov
Tue Jun 17 19:15:30 CEST 2008


On Tue, Jun 17, 2008 at 12:35 PM, James W. MacDonald
<jmacdon at med.umich.edu> wrote:
> Hi Daniel,
>
> Daniel Brewer wrote:
>>
>> Hi,
>>
>> I was wondering what the best way to find which Affymetrix probes are
>> within a specific genomic region (chromosome, start, stop).  I am not
>> sure if Biomart nor the annotation.db can do this as they both go to
>> some common ID first.  The annotation.db stuff seems to only have one
>> position information too.  The other option is to dwonload the
>> annotation file from Affymetrix and load that in, but I would prefer to
>> avoid that if at all possible.
>
> So it depends on exactly what you are trying to do here. The annotation
> package will give you the chromosome and start location for the probeset.
> You can use biomaRt to get the chromosome, start, stop for the probeset as
> well:
>
>> getBM(c("chromosome_name", "start_position","end_position"),
>> "affy_hg_u133_plus_2", "1007_s_at", mart)
>  chromosome_name start_position end_position
> 1               6       30956784     30975912
>
> Of course the question posed is to find things within a specific region, not
> find the region given the probeset ID. If this is what you want, then an
> annotation.db package might be what you want. Something like
>
> library(mouse4302.db)
> tab <- toTable(mouse4302CHRLOC)
> ind <- tab[,3] == 6 & abs(tab[,2]) <= 53043660 &
>        abs(tab[,2]) >= 50000000
> tab[ind,]
>
>> head(tab[ind,])
>         probe_id start_location Chromosome
> 1481   1416884_at       51420614          6
> 1482   1432628_at       51420614          6
> 1483 1439421_x_at       51420614          6
> 1484 1448504_a_at       51420614          6
> 1485   1454305_at       51420614          6
> 2397 1422483_a_at      -50512561          6
>
> If you then want the probeset start, end, you could feed the Affy IDs to
> biomaRt.
>
> But you specifically said probes, so you might actually want probes instead
> of probesets. If you want to get that specific then you could use the
> Biostrings package along with one of the genome packages (warning - HUGE
> download) and a probe package to get the locations of these probes. I'm not
> going to give code for that. Instead see the vignette in the BSgenome
> package, which has lots of examples.

Another alternative is to use the Ensembl core databases directly
rather than biomaRt.  This is a bit tedious, but there is a treasure
trove of information embedded therein.  To get the probe locations for
all mapped affy probes on the HG_U133A array, you could do:

> library(RMySQL)
> con <- dbConnect(MySQL(),user='anonymous',host='ensembldb.ensembl.org',dbname='homo_sapiens_core_47_36i')
> ### Warning--BIG SQL command coming....
> a=dbGetQuery(con,sprintf("select op.probeset as probeset,op.name as probe_name,s.name as chromosome,of.seq_region_start as start,of.seq_region_end as end,of.seq_region_strand as strand from oligo_feature of inner join oligo_probe op on op.oligo_probe_id=of.oligo_probe_id inner join oligo_array oa on oa.oligo_array_id=op.oligo_array_id inner join seq_region s on s.seq_region_id=of.seq_region_id where oa.name='%s' order by probeset",'HG_U133A'))
> a[1:5,]
   probeset probe_name chromosome    start      end strand
1 1007_s_at   467:181;          6 30975290 30975314      1
2 1007_s_at   365:115;          6 30975523 30975547      1
3 1007_s_at   532:139;          6 30975672 30975696      1
4 1007_s_at    86:557;          6 30975472 30975496      1
5 1007_s_at    308:15;          6 30975838 30975862      1

This takes a minute or so to complete (at least from the US).  You can
then use normal R to do filtering on this dataframe.  Hope that helps
somewhat.

Sean

>>
>> Has anyone got any ideas.
>>
>> Many thanks
>>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>



More information about the Bioconductor mailing list