[BioC] Extract gene name from chromosome position

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 25 14:44:49 CET 2009


Andreia Fonseca <andreia.fonseca at gmail.com> writes:

> Hi,
>
> Using biomaRt you can get the gene name, ensembl_id, and entrezgene id, using
> your chromosome positions as filters. Check the biomaRt PDF; it has good
> examples that explain how to do that.

IRanges::findOverlaps can help do this for many reads.  To do so,
represent the biomaRt data as a RangedData object

  library(IRanges)
  library(biomaRt)
  ## mouse genes from Ensembl
  mart <- useMart('ensembl', 'mmusculus_gene_ensembl')
  attr <- c("chromosome_name", "start_position", "end_position",
            "ensembl_gene_id")
  ## restrict to the autosomes, chromosomes 1 to 19
  bm <- getBM(attr, "chromosome_name", as.character(1:19), mart)
  ## one range per gene, grouped by chromosome
  rd <- with(bm,
             RangedData(IRanges(start_position, end_position),
                        space=chromosome_name, ensembl_gene_id))
  
and also your reads. (If the reads come from ShortRead, then

  reads <- as(ShortRead::readAligned(<...>), "RangesList")

but it will be more memory efficient to input just the read
chromosome, start, and aligned width). Then

  olap <- findOverlaps(reads, rd)

and associate the results (e.g., number of hits per gene) with the
biomaRt information for down-stream calculations with

  rd[["hits"]] <- tabulate(subjectHits(olap), nrow(rd))
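
As a rough, self-contained illustration of the steps above, here is a
sketch on made-up coordinates. It assumes the GenomicRanges package and
uses GRanges objects in place of the RangedData / RangesList objects
above; the gene ids, positions, and the 36 bp read width are invented
for the example and are not biomaRt output.

```r
library(GenomicRanges)  # assumption: GRanges stands in for RangedData here

# Toy gene annotation (made-up ids and coordinates)
genes <- GRanges(seqnames = c("1", "1", "2"),
                 ranges = IRanges(start = c(100, 500, 100),
                                  end = c(200, 600, 300)),
                 ensembl_gene_id = c("geneA", "geneB", "geneC"))

# Toy reads built from chromosome, start, and aligned width only
reads <- GRanges(seqnames = c("1", "1", "2", "2"),
                 ranges = IRanges(start = c(150, 180, 250, 900), width = 36))

# One entry per (read, gene) pair that overlaps
olap <- findOverlaps(reads, genes)

# Count hits per gene; nbins keeps genes with zero hits in the result
genes$hits <- tabulate(subjectHits(olap), nbins = length(genes))
genes$hits  # 2 0 1
```

Reads 1 and 2 fall in geneA, read 3 falls in geneC, and read 4 overlaps
nothing, so geneB gets a count of zero rather than being dropped.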

Martin



> Good luck,
> Andreia
>
> On Tue, Nov 24, 2009 at 12:18 PM, Ramzi TEMANNI <ramzi.temanni at gmail.com> wrote:
>
>> Hi,
>> I have a list of reads that map to the genome at certain positions
>> (chromosome | position), and I would like to know which function lets
>> me get the corresponding gene names.
>> Thanks in advance for your help,
>> Regards,
>> Ramzi
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


