[BioC] Positional Details with Features through UniProt.ws Ultimately to display as tracks in ggbio

Wed Aug 27 00:37:49 CEST 2014

Hey Anne,

So sorry for the late reply.

Ideally, biovizBase would have some kind of mapper function to convert
protein space to genomic space, so you wouldn't have to do it yourself.
Until I have that, a hack would be to massage your protein domain data
into a GRanges object in genomic coordinates, with the domain function as
a metadata column, and then create a separate track that plots the object
as rectangles and uses the color legend to indicate domain function.
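
A minimal sketch of that hack, assuming the features have already been
converted to genomic coordinates. The coordinates below are placeholders
made up for illustration (not the real FTH1 feature positions), and the
names domains, feature and p.domains are hypothetical:

```r
library(GenomicRanges)
library(ggbio)

# Placeholder genomic coordinates, NOT real FTH1 feature positions.
# In practice these would come from mapping UniProt residue positions
# onto the coding exons of the transcript.
domains <- GRanges(
  seqnames = "chr11",
  ranges   = IRanges(start = c(1000, 1400, 1450, 1900),
                     end   = c(1800, 1402, 1452, 1902)),
  strand   = "-",
  feature  = c("Domain", "Metal binding", "Metal binding", "Metal binding")
)

# One rectangle per feature, filled by the 'feature' metadata column,
# so the fill legend indicates the domain function.
p.domains <- autoplot(domains, aes(fill = feature))
```

The resulting plot can then be passed to tracks() next to the isoform
plots, for example tracks("Iso 1" = p1, Domains = p.domains), as long as
all tracks share the same genomic x-axis.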

I will try to develop a more general approach for doing this. If you want,
please send me an example RData file or other example data, so we can work
on that together.
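
In the meantime, here is a rough base-R sketch of what a protein-to-genome
mapper could do. protein_to_genome is a hypothetical helper, not an
existing biovizBase or ggbio function, and it assumes you already have the
CDS exon coordinates of a single transcript, that the protein starts at the
first CDS nucleotide, and that residue k covers CDS nucleotides 3k-2 to 3k:

```r
# Hypothetical helper: map a residue range of a protein to genomic ranges,
# given the CDS exons (genomic start/end, 1-based inclusive) of one transcript.
protein_to_genome <- function(aa_start, aa_end, cds_start, cds_end, strand) {
  stopifnot(length(cds_start) == length(cds_end), strand %in% c("+", "-"))

  # Put the CDS exons in transcription order (5' to 3' of the transcript).
  ord <- order(cds_start, decreasing = (strand == "-"))
  cds_start <- cds_start[ord]
  cds_end   <- cds_end[ord]

  # Exon i covers CDS nucleotides first[i]..last[i].
  widths <- cds_end - cds_start + 1
  last   <- cumsum(widths)
  first  <- last - widths + 1

  # Residue k occupies CDS nucleotides 3k-2 .. 3k.
  nt_start <- 3 * aa_start - 2
  nt_end   <- 3 * aa_end
  stopifnot(nt_start >= 1, nt_start <= nt_end, nt_end <= sum(widths))

  # Clip the requested CDS interval to each exon it overlaps
  # (0-based offsets from the 5' end of the exon).
  hit  <- which(last >= nt_start & first <= nt_end)
  from <- pmax(nt_start, first[hit]) - first[hit]
  to   <- pmin(nt_end, last[hit]) - first[hit]

  if (strand == "+") {
    data.frame(start = cds_start[hit] + from, end = cds_start[hit] + to)
  } else {
    # On the minus strand the 5' end of an exon is its genomic end.
    data.frame(start = cds_end[hit] - to, end = cds_end[hit] - from)
  }
}

# Toy transcript with two CDS exons (made-up coordinates): residues 2-3
# span the exon junction and map to 104-106 and 201-203.
protein_to_genome(2, 3, cds_start = c(101, 201), cds_end = c(106, 209),
                  strand = "+")
```

A feature that spans an exon junction comes back as several rows, one per
exon, which is the shape needed to build a GRanges for plotting.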

PS: so that I don't miss your request, feel free to use the GitHub issues
page here: <https://github.com/tengfei/ggbio/issues>

cheers

Tengfei

On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu>
wrote:

> Dear all,
>
> biocLite("UniProt.ws")
> library(UniProt.ws)
>
>
>  select(UniProt.ws,keys=("P02794"),columns=c("DOMAINS","FEATURES"),keytype="UNIPROTKB")
> Getting extra data for P02794 NA NA etc
>   UNIPROTKB                         DOMAINS
> 1    P02794 Ferritin-like diiron domain (1)
>
>
>   FEATURES
> 1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator
> methionine (1); Metal binding (6); Modified residue (4); Sequence conflict
> (1); Turn (2)
>
> What I want are the positional details for each of these features, which
> are visible through the UniProt web page.
> FTH1 is 183 amino acids in length.  There are 6 metal binding sites, each
> at a specific position.
> This information is there since you can have the web site return the
> positional details.  I would like them so I may manipulate them with new
> evidential information.
>
> Ultimately I wish to display them with tracks from ggbio:
> pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
>                                  param = ScanBamParam(which = genesymbol["FTH1"],
>                                                       what = c("seq")),
>                                  use.names = TRUE)
>
> FTH1.ga <- geom_alignment(data = txdb,which=genesymbol["FTH1"])
>
> So here I have sample information which I have aligned to the reference
> genome.  I retrieve that information from a bam file.
> # create the GAlignments objects for each isoform
> FTH1.isoform.1  <- pb.53A.pos.ga[c(7)]
> FTH1.isoform.2  <- pb.53A.pos.ga[c(15)]
> FTH1.isoform.3  <- pb.53A.pos.ga[c(13)]
> FTH1.isoform.4  <- pb.53A.pos.ga[c(8)]
> FTH1.isoform.5  <- pb.53A.pos.ga[c(2)]
> FTH1.isoform.6  <- pb.53A.pos.ga[c(1)]
>
>
> p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
> p2 <- autoplot(FTH1.isoform.2, fill =  "blue", color = "blue")
> p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
> p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
> p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
> p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")
>
> tracks( FTH1=p1.FTH1,
>        "Iso 1"=p1,
>        "Iso 2"=p2,
>        "Iso 3"=p3,
>        "Iso 4"=p4,
>        "Iso 5"=p5,
>        "Iso 6"=p6)
>
>
> I can then autoplot each of the separate isoforms.  What I want to do,
> however, is annotate the isoforms so that they each show the coding region
> with the full height of the bar, and a reduced height for the non-coding
> regions.
>
> Additionally, I want to color the graphic with the details for the
> protein, such as the metal binding sites, domains, etc.  So that
> computationally I can generate an informative picture which explains what
> is lost or gained in separate isoforms.
>
> Thoughts?
>
> Anne
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>  [1] UniProt.ws_2.4.2
>  [2] RCurl_1.95-4.3
>  [3] bitops_1.0-6
>  [4] RSQLite_0.11.4
>  [5] DBI_0.2-7
>  [6] biomaRt_2.20.0
>  [7] BiocInstaller_1.14.2
>  [8] GenomicAlignments_1.0.5
>  [9] BSgenome_1.32.0
> [10] Rsamtools_1.16.1
> [11] Biostrings_2.32.1
> [12] XVector_0.4.0
> [13] ggbio_1.12.8
> [14] ggplot2_1.0.0
> [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
> [16] GenomicFeatures_1.16.2
> [17] AnnotationDbi_1.26.0
> [18] Biobase_2.24.0
> [19] GenomicRanges_1.16.4
> [20] GenomeInfoDb_1.0.2
> [21] IRanges_1.22.10
> [22] BiocGenerics_0.10.0
>
> loaded via a namespace (and not attached):
>  [1] BatchJobs_1.3            BBmisc_1.7               BiocParallel_0.6.1
>  [4] biovizBase_1.12.1        brew_1.0-6               checkmate_1.2
>  [7] cluster_1.15.2           codetools_0.2-8          colorspace_1.2-4
> [10] dichromat_2.0-0          digest_0.6.4             fail_1.2
> [13] foreach_1.4.2            Formula_1.1-2            grid_3.1.0
> [16] gridExtra_0.9.1          gtable_0.1.2             Hmisc_3.14-4
> [19] iterators_1.0.7          labeling_0.2             lattice_0.20-29
> [22] latticeExtra_0.6-26      MASS_7.3-33              munsell_0.4.2
> [25] plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5
> [28] Rcpp_0.11.2              reshape2_1.4             rtracklayer_1.24.2
> [31] scales_0.2.4             sendmailR_1.1-2          splines_3.1.0
> [34] stats4_3.1.0             stringr_0.6.2            survival_2.37-7
> [37] tcltk_3.1.0              tools_3.1.0              VariantAnnotation_1.10.5
> [40] XML_3.98-1.1             zlibbioc_1.10.0
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>

-- 
Tengfei Yin, PhD
Product Manager
Seven Bridges Genomics
sbgenomics.com
One Broadway FL 7
Cambridge, MA 02142
(617) 866-0446
