[BioC] Genome Browser and R

Martin Morgan mtmorgan at fhcrc.org
Thu Oct 24 19:53:06 CEST 2013


On 10/24/2013 09:37 AM, khadeeja ismail wrote:
>
>
> Hi,
> I am working with some 450k array probes that I need to look up in the Genome Browser to see what types of regions they are located in; for example, whether the CpG site (+/- 100 kb) overlaps any of the following tracks for GM12878.
>
>
> Layered H3K27Ac
> Layered H3K4Me1
> Layered H3K4Me3
> Transcription
> DNase Clusters
> DNase Clusters V1
> Txn Fac ChIP V3
> Txn Factor ChIP

These tracks are available in AnnotationHub:

   library(AnnotationHub)
   hub = AnnotationHub()
   m = metadata(hub)

and then

 > head(m$Description[grep("H3k27Ac", m$Description, ignore.case=TRUE)])
[1] "wgEncodeBroadHistoneHsmmtH3k27acStdPk"
[2] "wgEncodeBroadHistoneNhaH3k27acStdPk"
[3] "wgEncodeBroadHistoneA549H3k27acEtoh02Pk"
[4] "wgEncodeBroadHistoneK562H3k27acStdPk"
[5] "wgEncodeBroadHistoneGm12878H3k27acStdPk"
[6] "wgEncodeSydhHistoneMcf7H3k27acUcdPk"

 > xx = hub$goldenpath.hg19.encodeDCC.wgEncodeBroadHistone.wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData
Retrieving 'goldenpath/hg19/encodeDCC/wgEncodeBroadHistone/wgEncodeBroadHistoneGm12878H3k27acStdPk.broadPeak_0.0.1.RData'

 > head(xx)
GRanges with 6 ranges and 5 metadata columns:
       seqnames               ranges strand |        name     score signalValue
          <Rle>            <IRanges>  <Rle> | <character> <integer>   <numeric>
   [1]    chr22 [17091048, 17091199]      * |           .       579   11.651761
   [2]    chr22 [17305774, 17306441]      * |           .       531   10.111585
   [3]    chr22 [17517314, 17517945]      * |           .       527    9.991400
   [4]    chr22 [17518132, 17518819]      * |           .       837   19.847850
          pValue    qValue
       <numeric> <numeric>
   [1]       2.4        -1
   [2]      15.4        -1
   [3]     100.0        -1
   [4]      15.3        -1
  [ reached getOption("max.print") -- omitted 2 rows ]

and then xx is ready for findOverlaps or other GRanges operations. There's a 
vignette in AnnotationHub

   http://bioconductor.org/packages/release/bioc/html/AnnotationHub.html

AnnotationHub is also mentioned in the annotation work flow, and the 
AnnotatingRanges work flow is relevant too

  http://bioconductor.org/help/workflows/annotation/annotation/
  http://bioconductor.org/help/workflows/annotation/AnnotatingRanges/
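
For the batch part of the question, here is a minimal sketch of the overlap 
step using GenomicRanges. Everything in it is made up for illustration: the 
probe IDs, the probe coordinates, the second track, and the variable names 
(probes, tracks, flank, windows, hits) are assumptions, not real 450k 
annotation or ENCODE data. It only shows the shape of the computation: widen 
each CpG site to +/- 100 kb, then test each window against each track.

```r
# Toy example: annotate CpG probes by overlap with several peak tracks.
# All probe IDs and probe coordinates below are hypothetical.
library(GenomicRanges)

# Hypothetical probes, each a single-base CpG position
probes <- GRanges(
    seqnames = c("chr22", "chr22", "chr22"),
    ranges   = IRanges(start = c(17150000, 18000000, 30000000), width = 1),
    probe_id = c("cg_a", "cg_b", "cg_c")
)

# Stand-ins for GRanges objects retrieved from AnnotationHub (like 'xx')
tracks <- list(
    H3K27Ac = GRanges("chr22",
                      IRanges(start = c(17091048, 17305774),
                              end   = c(17091199, 17306441))),
    H3K4Me3 = GRanges("chr22", IRanges(start = 18050000, end = 18051000))
)

# Widen each CpG site to a window of +/- 100 kb around its position
flank <- 100000L
windows <- resize(probes, width = 2L * flank + 1L, fix = "center")

# One logical column per track: does the probe window overlap any peak?
hits <- sapply(tracks, function(track) overlapsAny(windows, track))
rownames(hits) <- probes$probe_id
hits
```

With real data, tracks would hold the objects retrieved from the hub and 
probes would be built from the array annotation; findOverlaps can replace 
overlapsAny when you need to know which peaks were hit rather than just 
whether any were.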

It would be interesting and useful to have this as a stand-alone work flow, so 
if you do pursue this route and are interested in writing one up then let 
me know...

Martin

>
>
> I would like to do it as a batch and not one by one since the list of probes is long. I have tried querying the Genome Browser database and also the rtracklayer package in R but have not been successful. It would be great if anyone could give me any ideas on how it can be done.
>
> Thanking you,
> Khadeeja


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


