[BioC] How to retrieve gene information

Martin Morgan mtmorgan at fhcrc.org
Wed Dec 26 18:26:58 CET 2007


Hi Allen --

The 'org.*' annotation packages also provide organism-centric
annotations. These packages have environment-like key-value structures
that typically map Entrez identifiers to other information. The CHRLOC
variables map Entrez ids to named integer vectors. Names on the vector
are chromosome names; values are base start positions, with the sign
encoding the strand (positive for +, negative for -). Thus

> library(annotate)
> library(org.Hs.eg.db)
> 
> filt <- function(eid) {
+     loc <- abs(eid) # either strand
+     any(names(eid)=="3" & loc>1e8 & loc<1.1e8)
+ }
> found <- unlist(eapply(org.Hs.egCHRLOC, filt))
> sum(found) # named genes found
[1] 33
> 
> eids <- ls(org.Hs.egCHRLOC[found]) # subset; extract entrez ids
> head(lookUp(eids, "org.Hs.eg.db", "GENENAME")) # first 6
$`214`
[1] "activated leukocyte cell adhesion molecule"

$`868`
[1] "Cas-Br-M (murine) ecotropic retroviral transforming sequence b"

$`961`
[1] "CD47 molecule"

$`1295`
[1] "collagen, type VIII, alpha 1"

$`6152`
[1] "ribosomal protein L24"

$`9666`
[1] "zinc finger DAZ interacting protein 3"
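
The filter above hard-codes chromosome 3 and the 100-110 Mb window. As a
rough sketch of the same idea with the region passed in as arguments, the
following runs in base R without any annotation package. The environment
'chrloc', its ids and positions, and the helper 'in_region' are all made up
for illustration; they only mimic the shape of org.Hs.egCHRLOC as described
above, and with the real package one would presumably pass org.Hs.egCHRLOC
to eapply() instead.

```r
# Toy stand-in for org.Hs.egCHRLOC (hypothetical ids and positions):
# keys are Entrez-like ids, values are named integer vectors whose names
# are chromosomes and whose values are signed start positions.
chrloc <- new.env()
assign("1001", c("3" = 100500000L), envir = chrloc)   # chr 3, + strand, inside
assign("1002", c("3" = -105000000L), envir = chrloc)  # chr 3, - strand, inside
assign("1003", c("3" = 50000000L), envir = chrloc)    # chr 3, outside window
assign("1004", c("X" = 105000000L), envir = chrloc)   # wrong chromosome
assign("1005", NA_integer_, envir = chrloc)           # no known location

# Hypothetical helper: TRUE if any start position of a gene lies on
# chromosome 'chr' strictly between 'start' and 'end', on either strand.
in_region <- function(loc, chr, start, end) {
    pos <- abs(loc)  # drop the strand sign
    hit <- names(loc) == chr & pos > start & pos < end
    isTRUE(any(hit, na.rm = TRUE))  # unmapped genes count as not found
}

# eapply() passes the extra arguments on to in_region() for every key
found <- unlist(eapply(chrloc, in_region, chr = "3", start = 1e8, end = 1.1e8))
hits <- sort(names(found)[found])
print(hits)
```

For the toy data, 'hits' should hold only the two ids placed on chromosome 3
inside the window, one from each strand. The resulting ids could then be
handed to lookUp() as in the session above.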

The annotation packages are built from a snapshot of their data
sources (see org.Hs.eg_dbInfo()), which might differ from the data
sources used for biomaRt; the merits of a snapshot include (a)
reproducibility and (b) relative speed of computation (no internet
download required).

Martin


toedling at ebi.ac.uk writes:

> Hi Allen,
>
> have a look at the biomaRt package:
> http://www.bioconductor.org/packages/2.2/bioc/html/biomaRt.html
> and see the vignette section 4.5 Task 5.
> You may want to select additional attributes returned, such as
> "description" etc. Use the "listAttributes" function to obtain a list of
> possible attributes.
>
> Best and Merry Christmas,
> Joern
>
>> Dear list,
>>
>> I am wondering: if I supply chromosome information and the start and end
>> positions of a region, could any Bioconductor package return a list of
>> genes with short descriptions within that region?
>>
>> Thanks and Merry Christmas!
>>
>> Allen
>>
>>
>

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


