[BioC] Genome Browser and R
Martin Morgan
mtmorgan at fhcrc.org
Thu Oct 24 19:53:06 CEST 2013
On 10/24/2013 09:37 AM, khadeeja ismail wrote:
>
>
> Hi,
> I am working with some 450k array probes which I need to look up in the Genome Browser to see what type of regions these probes are located in. For example, whether the CpG site (+/- 100kb) overlaps with any of the following in the GM12878 track.
>
>
> Layered H3K27Ac
> Layered H3K4Me1
> Layered H3K4Me3
> Transcription
> DNase Clusters
> DNase Clusters V1
> Txn Fac ChIP V3
> Txn Factor ChIP
These tracks are available in AnnotationHub:
library(AnnotationHub)
hub = AnnotationHub()
m = metadata(hub)
and then
> head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)])
[1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk"
[2] "wgEncodeBroadHistoneNhaH3k27acStdPk"
[3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk"
[4] "wgEncodeBroadHistoneK562H3k27acStdPk"
[5] "wgEncodeBroadHistoneGm12878H3k27acStdPk"
[6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk"
> xx = hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData
Retrieving 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData'
> head(xx)
GRanges with 6 ranges and 5 metadata columns:
      seqnames               ranges strand |        name     score signalValue
         <Rle>            <IRanges>  <Rle> | <character> <integer>   <numeric>
  [1]    chr22 [17091048, 17091199]      * |           .       579   11.651761
  [2]    chr22 [17305774, 17306441]      * |           .       531   10.111585
  [3]    chr22 [17517314, 17517945]      * |           .       527    9.991400
  [4]    chr22 [17518132, 17518819]      * |           .       837   19.847850
         pValue    qValue
      <numeric> <numeric>
  [1]       2.4        -1
  [2]      15.4        -1
  [3]     100.0        -1
  [4]      15.3        -1
 [ reached getOption("max.print") -- omitted 2 rows ]
and then xx is ready for findOverlaps or other GRanges operations. There's a
vignette in AnnotationHub
http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html
AnnotationHub is also mentioned in the annotation work flow, and the
AnnotatingRanges work flow is relevant too
http://bioconductor.org/help/workflows/annotation/annotation/
http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/
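
As a rough sketch of the overlap step done in batch, assuming the probes are
already in a GRanges object: the probe positions, the probe_id names, and the
DNase coordinates below are invented for illustration, and the toy tracks
stand in for objects like xx retrieved from AnnotationHub. Each probe is
widened by 100 kb on either side and then checked against every track in a
named list.

```r
library(GenomicRanges)

# Hypothetical CpG probe positions (made-up hg19 coordinates); real positions
# would come from the 450k array annotation.
probes <- GRanges(
    seqnames = "chr22",
    ranges = IRanges(start = c(17100000, 17900000, 30000000), width = 1),
    probe_id = c("probe_a", "probe_b", "probe_c")
)

# Extend each probe by 100 kb on both sides to get the +/- 100kb window.
windows <- probes + 100000

# Toy stand-ins for tracks retrieved from AnnotationHub (such as 'xx' above).
tracks <- list(
    H3K27Ac = GRanges("chr22", IRanges(start = c(17091048, 17305774),
                                       end = c(17091199, 17306441))),
    # Invented coordinates, not a real DNase track.
    DNase = GRanges("chr22", IRanges(start = 17950000, end = 17950500))
)

# One logical column per track: does the probe window overlap any range?
hits <- sapply(tracks, function(track) overlapsAny(windows, track))
rownames(hits) <- probes$probe_id
hits
```

The result is a logical matrix with one row per probe and one column per
track, so a long list of probes is handled in a single call per track rather
than one by one. Using countOverlaps in place of overlapsAny would give the
number of overlapping ranges instead of TRUE / FALSE.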
It would be interesting and useful to have this as a stand-alone work flow, so
if you do pursue this route and are interested in writing up a work flow then
let me know...
Martin
>
>
> I would like to do it as a batch and not one by one, since the list of probes is long. I have tried querying the Genome Browser database and also the rtracklayer package in R, but have not been successful. It would be great if anyone could give me any ideas on how it can be done.
>
> Thanking you,
> Khadeeja
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793