[BioC] Positional Details with Features through UniProt.ws Ultimately to display as tracks in ggbio
Tengfei Yin
tengfei.yin at sbgenomics.com
Wed Aug 27 00:37:49 CEST 2014
Hey Anne,
So sorry for the late reply.
Ideally, biovizBase would have a mapper function that converts protein
coordinates to genomic coordinates, so you wouldn't have to do it yourself.
Until I have that, a workaround is to massage your protein domain data into
a GRanges object that uses genomic coordinates and stores the domain
function as a metadata column. You can then plot that object as rectangles
in a separate track and use the color legend to indicate domain function.
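To make the workaround concrete, here is a minimal sketch of the coordinate
conversion, not a finished mapper. It assumes you already have the genomic
start/end of each coding exon segment; the helper name protein_to_genome, the
seqname "chrToy", and every coordinate and feature position below are
hypothetical placeholders rather than real FTH1 or UniProt values.

```r
# Map a protein interval (1-based residues) to genomic intervals.
# cds_start / cds_end: genomic coordinates of the coding part of each exon.
# Hypothetical helper for illustration; not part of biovizBase or ggbio.
protein_to_genome <- function(aa_start, aa_end, cds_start, cds_end,
                              strand = "+") {
  # Put the CDS segments in transcript (5' to 3') order.
  ord <- order(cds_start, decreasing = (strand == "-"))
  cds_start <- cds_start[ord]
  cds_end <- cds_end[ord]
  width <- cds_end - cds_start + 1
  seg_last <- cumsum(width)          # last CDS offset held by each segment
  seg_first <- seg_last - width + 1  # first CDS offset held by each segment
  # Residue p occupies CDS bases 3p - 2 through 3p.
  nt_start <- 3 * aa_start - 2
  nt_end <- 3 * aa_end
  hit <- which(seg_first <= nt_end & seg_last >= nt_start)
  lo <- pmax(nt_start, seg_first[hit]) - seg_first[hit]  # 0-based offsets
  hi <- pmin(nt_end, seg_last[hit]) - seg_first[hit]     # within each segment
  if (strand == "-") {
    data.frame(start = cds_end[hit] - hi, end = cds_end[hit] - lo)
  } else {
    data.frame(start = cds_start[hit] + lo, end = cds_start[hit] + hi)
  }
}

# Toy inputs (placeholders, NOT real FTH1 or UniProt coordinates).
cds_start <- c(101, 201)
cds_end <- c(110, 220)
features <- data.frame(
  feature = c("Domain", "Metal binding"),
  aa_start = c(2, 4),
  aa_end = c(8, 4)
)

# One row per genomic piece; a feature that spans an intron yields two rows.
pieces <- do.call(rbind, lapply(seq_len(nrow(features)), function(i) {
  g <- protein_to_genome(features$aa_start[i], features$aa_end[i],
                         cds_start, cds_end, strand = "+")
  cbind(feature = features$feature[i], g)
}))

# Build the GRanges with the feature type as a metadata column.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)
  gr <- GRanges("chrToy", IRanges(pieces$start, pieces$end), strand = "+",
                feature = pieces$feature)
}
```

With ggbio loaded, something like autoplot(gr, aes(fill = feature)) should
draw the pieces as rectangles colored by feature type, and the resulting plot
can be passed to tracks() next to the isoform plots. Features that cross an
exon boundary come back as several ranges, which is what you want when they
are drawn against spliced alignments.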
I will try to develop a more general approach for this. If you want,
please send me an example RData file or some example data so we can work
on it together.
PS: in case I miss a request, feel free to use the GitHub issues page:
<https://github.com/tengfei/ggbio/issues>
cheers
Tengfei
On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu>
wrote:
> Dear all,
>
> biocLite("UniProt.ws")
> library(UniProt.ws)
>
>
> select(UniProt.ws,keys=("P02794"),columns=c("DOMAINS","FEATURES"),keytype="UNIPROTKB")
> Getting extra data for P02794 NA NA etc
> UNIPROTKB DOMAINS
> 1 P02794 Ferritin-like diiron domain (1)
>
>
> FEATURES
> 1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator
> methionine (1); Metal binding (6); Modified residue (4); Sequence conflict
> (1); Turn (2)
>
> What I want are the positional details for each of these features — which
> are visible through the uniprot web page.
> FTH1 is 183 amino acids in length. There are 6 metal binding sites, each
> at a specific position.
> This information is there since you can have the web site return the
> positional details. I would like them so I may manipulate them with new
> evidential information.
>
> Ultimately I wish to display them with tracks from ggbio —
> pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
> param = ScanBamParam(which =
> genesymbol["FTH1"],what=c("seq")),
> use.names = TRUE)
>
> FTH1.ga <- geom_alignment(data = txdb,which=genesymbol["FTH1"])
>
> So here I have sample information which I have aligned to the reference
> genome. I retrieve that information from a bam file.
> # create the GAlignments objects for each isoform
> FTH1.isoform.1 <- pb.53A.pos.ga[c(7)]
> FTH1.isoform.2 <- pb.53A.pos.ga[c(15)]
> FTH1.isoform.3 <- pb.53A.pos.ga[c(13)]
> FTH1.isoform.4 <- pb.53A.pos.ga[c(8)]
> FTH1.isoform.5 <- pb.53A.pos.ga[c(2)]
> FTH1.isoform.6 <- pb.53A.pos.ga[c(1)]
>
>
> p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
> p2 <- autoplot(FTH1.isoform.2, fill = "blue", color = "blue")
> p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
> p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
> p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
> p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")
>
> tracks( FTH1=p1.FTH1,
> "Iso 1"=p1,
> "Iso 2"=p2,
> "Iso 3"=p3,
> "Iso 4"=p4,
> "Iso 5"=p5,
> "Iso 6"=p6)
>
>
> I then can autoplot each of the separate isoforms. What I want to do
> however, is annotate the isoforms so that they each show the coding region
> with the full height of the bar, and a reduced height for the non-coding
> regions.
>
> Additionally, I want to color the graphic with the details for the
> protein, such as the metal binding sites, domains, etc. So that
> computationally I can generate an informative picture which explains what
> is lost or gained in separate isoforms.
>
> Thoughts?
>
> Anne
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] UniProt.ws_2.4.2
> [2] RCurl_1.95-4.3
> [3] bitops_1.0-6
> [4] RSQLite_0.11.4
> [5] DBI_0.2-7
> [6] biomaRt_2.20.0
> [7] BiocInstaller_1.14.2
> [8] GenomicAlignments_1.0.5
> [9] BSgenome_1.32.0
> [10] Rsamtools_1.16.1
> [11] Biostrings_2.32.1
> [12] XVector_0.4.0
> [13] ggbio_1.12.8
> [14] ggplot2_1.0.0
> [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
> [16] GenomicFeatures_1.16.2
> [17] AnnotationDbi_1.26.0
> [18] Biobase_2.24.0
> [19] GenomicRanges_1.16.4
> [20] GenomeInfoDb_1.0.2
> [21] IRanges_1.22.10
> [22] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
> [1] BatchJobs_1.3 BBmisc_1.7 BiocParallel_0.6.1
> [4] biovizBase_1.12.1 brew_1.0-6 checkmate_1.2
> [7] cluster_1.15.2 codetools_0.2-8 colorspace_1.2-4
> [10] dichromat_2.0-0 digest_0.6.4 fail_1.2
> [13] foreach_1.4.2 Formula_1.1-2 grid_3.1.0
> [16] gridExtra_0.9.1 gtable_0.1.2 Hmisc_3.14-4
> [19] iterators_1.0.7 labeling_0.2 lattice_0.20-29
> [22] latticeExtra_0.6-26 MASS_7.3-33 munsell_0.4.2
> [25] plyr_1.8.1 proto_0.3-10 RColorBrewer_1.0-5
> [28] Rcpp_0.11.2 reshape2_1.4 rtracklayer_1.24.2
> [31] scales_0.2.4 sendmailR_1.1-2 splines_3.1.0
> [34] stats4_3.1.0 stringr_0.6.2 survival_2.37-7
> [37] tcltk_3.1.0 tools_3.1.0 VariantAnnotation_1.10.5
> [40] XML_3.98-1.1 zlibbioc_1.10.0
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
--
Tengfei Yin, PhD
Product Manager
Seven Bridges Genomics
sbgenomics.com
One Broadway FL 7
Cambridge, MA 02142
(617) 866-0446