[BioC] Extract gene name from chromosome position
Martin Morgan
mtmorgan at fhcrc.org
Wed Nov 25 14:44:49 CET 2009
Andreia Fonseca <andreia.fonseca at gmail.com> writes:
> Hi,
>
> Using biomaRt you can get the gene name, Ensembl gene ID, and Entrez Gene
> ID, using your chromosome positions as filters. Check the biomaRt PDF
> vignette; it has good examples that explain just how to do that.
IRanges::findOverlaps can help do this for many reads. To do so,
represent the biomaRt data as a RangedData object:

    library(IRanges)
    library(biomaRt)
    ## gene coordinates for mouse chromosomes 1-19
    mart <- useMart('ensembl', 'mmusculus_gene_ensembl')
    attr <- c("chromosome_name", "start_position", "end_position",
              "ensembl_gene_id")
    bm <- getBM(attributes=attr, filters="chromosome_name",
                values=as.character(1:19), mart=mart)
    ## one range per gene, grouped by chromosome ('space')
    rd <- with(bm,
               RangedData(IRanges(start_position, end_position),
                          space=chromosome_name, ensembl_gene_id))

Represent your reads the same way. If the reads come from ShortRead,
then

    reads <- as(ShortRead::readAligned(<...>), "RangesList")

works, but it is more memory efficient to input just each read's
chromosome, start, and aligned width. Then find the overlaps:

    olap <- findOverlaps(reads, rd)

and associate the results (e.g., the number of hits per gene) with the
biomaRt information for downstream calculations:

    rd[["hits"]] <- tabulate(subjectHits(olap), nrow(rd))

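To make the overlap-and-count step concrete without installing IRanges, here is a minimal base-R sketch on made-up toy data. The tables `genes_df` and `reads_df` and the helper `overlapping_genes` are hypothetical illustrations, not part of IRanges or biomaRt. The nested search compares every read against every gene, so it is far slower than findOverlaps on real data; it only shows what the counting computes.

```r
# Hypothetical toy gene table, shaped like the getBM() result
genes_df <- data.frame(
  chromosome_name = c("1", "1", "2"),
  start_position  = c(100L, 500L, 50L),
  end_position    = c(200L, 800L, 400L),
  ensembl_gene_id = c("geneA", "geneB", "geneC"),
  stringsAsFactors = FALSE
)

# Hypothetical reads stored as chromosome, start, and aligned width only
reads_df <- data.frame(
  chromosome = c("1", "1", "1", "2", "3"),
  start      = c(150L, 190L, 600L, 10L, 5L),
  width      = c(36L, 36L, 36L, 36L, 36L),
  stringsAsFactors = FALSE
)
reads_df$end <- reads_df$start + reads_df$width - 1L

# Row indices of genes overlapped by read i: same chromosome and the
# closed intervals [start, end] intersect
overlapping_genes <- function(i) {
  which(genes_df$chromosome_name == reads_df$chromosome[i] &
        genes_df$start_position <= reads_df$end[i] &
        genes_df$end_position >= reads_df$start[i])
}
hit_list <- lapply(seq_len(nrow(reads_df)), overlapping_genes)

# Analogue of subjectHits(olap): one gene index per (read, gene) overlap
subject_hits <- unlist(hit_list)

# Hits per gene, with zeros for genes no read touches
genes_df$hits <- tabulate(subject_hits, nbins = nrow(genes_df))
print(genes_df)
```

With this toy data geneA receives two reads, geneB one, and geneC none. The chromosome 2 read falls before geneC, and the chromosome 3 read matches no gene. Passing `nbins` is what keeps the zero count for geneC, as `nrow(rd)` does in the tabulate call.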
Martin
> Good luck,
> Andreia
>
> On Tue, Nov 24, 2009 at 12:18 PM, Ramzi TEMANNI <ramzi.temanni at gmail.com> wrote:
>
>> Hi
>> I have a list of reads that map to the genome at certain positions
>> (chromosome | position), and I would like to know which function lets
>> me extract the gene name for each position.
>> Thanks in advance for your help,
>> Regards,
>> Ramzi
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793