[BioC] Bioconductor Digest, Vol 138, Issue 27

Wed Aug 27 19:29:20 CEST 2014

Hey Laurent,

Thanks a lot. I just quickly went through it, and this is very useful for me! I
will look into it.

cheers

Tengfei

On Wed, Aug 27, 2014 at 6:42 AM, Laurent Gatto <lg390 at cam.ac.uk> wrote:

>
> Dear Anne and Tengfei,
>
> The Pbase mapping vignette [1] is an initial description of mapping protein
> coordinates back to the genome. My plan is to implement what is
> described in the vignette in the package, but I haven't had time to do so
> yet.
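
As a rough illustration of what mapping protein coordinates back to the genome involves, here is a minimal base R sketch. It is not the Pbase implementation: the function name `protein_to_genome`, its arguments, and the toy coordinates are all assumptions made for this example. It assumes the coding exon segments of one transcript are known as 1-based, inclusive genomic intervals.

```r
# Hypothetical helper (not part of Pbase or biovizBase): map 1-based
# amino-acid positions to the genomic span of their codons.
# cds_start, cds_end: genomic start/end of each coding exon segment.
# strand: "+" or "-".
protein_to_genome <- function(aa_pos, cds_start, cds_end, strand = "+") {
  # Put the coding segments in transcription order.
  ord <- order(cds_start, decreasing = (strand == "-"))
  cds_start <- cds_start[ord]
  cds_end <- cds_end[ord]
  # Genomic position of every coding nucleotide, in transcript order.
  nt <- unlist(Map(function(s, e) if (strand == "-") e:s else s:e,
                   cds_start, cds_end))
  first <- nt[3 * (aa_pos - 1) + 1]  # first base of the codon
  third <- nt[3 * aa_pos]            # third base of the codon
  # A codon that spans an exon junction yields a range covering the intron.
  data.frame(aa = aa_pos,
             start = pmin(first, third),
             end = pmax(first, third))
}

# Toy transcript with two coding segments (made-up coordinates).
protein_to_genome(c(1, 3, 5), cds_start = c(101, 201),
                  cds_end = c(106, 209), strand = "+")
```

Enumerating every coding nucleotide keeps the sketch short; for many transcripts a range-based approach would be the better choice.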
>
> Please do not hesitate to comment or make suggestions that would be
> useful to you or inter-operable with your use cases.
>
> Best wishes,
>
> Laurent
>
> [1]
> http://bioconductor.org/packages/devel/bioc/vignettes/Pbase/inst/doc/mapping.html
>
>
> On 27 August 2014 11:00, bioconductor-request at r-project.org wrote:
>
> > Message: 25
> > Date: Tue, 26 Aug 2014 18:37:49 -0400
> > From: Tengfei Yin <tengfei.yin at sbgenomics.com>
> > To: Anne Deslattes Mays <ad376 at georgetown.edu>
> > Cc: Anne Deslattes Mays Cc Routing Num 255071981
> >       <adeslat at sbresearchllc.com>,    Bioconductor mailing list
> >       <bioconductor at r-project.org>
> > Subject: Re: [BioC] Positional Details with Features through
> >       UniProt.ws Ultimately to display as tracks in ggbio
> > Message-ID:
> >       <CAGkUe7VoKqS4GuVBcoB5C2_23myhJC51LMu1nV7g1_4k2iNHoA at mail.gmail.com>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hey Anne,
> >
> > So sorry for the late reply.
> >
> > Ideally, I should have some kind of mapper function in biovizBase to help
> > map protein space to genomic space, so you don't have to do it yourself.
> > Until I have that, a hack would be to massage your protein domain data
> > into a GRanges object in genomic coordinates, with the domain function as
> > a metadata column, and then create a separate track that plots the object
> > as rectangles and uses a color legend to indicate domain function.
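
The hack described above could look roughly like the following sketch. The domain table holds made-up toy coordinates (not real FTH1 annotation), and the object names `domains`, `gr` and `p.domains` are assumptions for illustration. The GenomicRanges and ggbio steps only run if those packages are installed.

```r
# Hypothetical protein domains, already expressed in genomic coordinates.
# All values are toy data for illustration only.
domains <- data.frame(
  seqnames = "chrToy",
  start    = c(1001, 1201, 1405),
  end      = c(1090, 1260, 1407),
  strand   = "-",
  domain   = c("Ferritin-like diiron", "Helix", "Metal binding"),
  stringsAsFactors = FALSE
)

if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  # The domain function is kept as a metadata column of the GRanges object.
  gr <- GenomicRanges::makeGRangesFromDataFrame(domains,
                                                keep.extra.columns = TRUE)
  if (requireNamespace("ggbio", quietly = TRUE)) {
    # One track whose fill color (and legend) encodes the domain function.
    p.domains <- ggbio::autoplot(gr, ggplot2::aes(fill = domain))
  }
}
```

If the plot is created, `p.domains` can be added to a `tracks()` call next to the isoform plots, for example as `"Domains" = p.domains`.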
> >
> > I will try to develop a more general approach for doing this. If you want,
> > please send me an example RData file or example data, so we can work on
> > that together.
> >
> > PS: so that I don't miss your request, feel free to use the GitHub issues
> > page here: <https://github.com/tengfei/ggbio/issues>
> >
> > cheers
> >
> > Tengfei
> >
> >
> >
> >
> > On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu>
> > wrote:
> >
> >> Dear all,
> >>
> >> biocLite("UniProt.ws")
> >> library(UniProt.ws)
> >>
> >> select(UniProt.ws, keys = "P02794", columns = c("DOMAINS", "FEATURES"),
> >>        keytype = "UNIPROTKB")
> >> Getting extra data for P02794 NA NA etc
> >>   UNIPROTKB                         DOMAINS
> >> 1    P02794 Ferritin-like diiron domain (1)
> >>
> >>
> >>   FEATURES
> >> 1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator
> >> methionine (1); Metal binding (6); Modified residue (4); Sequence conflict
> >> (1); Turn (2)
> >>
> >> What I want are the positional details for each of these features, which
> >> are visible through the UniProt web page.
> >> FTH1 is 183 amino acids in length. There are 6 metal binding sites, each
> >> at a specific position.
> >> This information is there, since you can have the web site return the
> >> positional details. I would like them so I may manipulate them with new
> >> evidential information.
> >>
> >> Ultimately I wish to display them with tracks from ggbio:
> >> pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
> >>                                  param = ScanBamParam(which = genesymbol["FTH1"],
> >>                                                       what = c("seq")),
> >>                                  use.names = TRUE)
> >>
> >> FTH1.ga <- geom_alignment(data = txdb,which=genesymbol["FTH1"])
> >>
> >> So here I have sample information which I have aligned to the reference
> >> genome.  I retrieve that information from a bam file.
> >> # create the GAlignments objects for each isoform
> >> FTH1.isoform.1  <- pb.53A.pos.ga[c(7)]
> >> FTH1.isoform.2  <- pb.53A.pos.ga[c(15)]
> >> FTH1.isoform.3  <- pb.53A.pos.ga[c(13)]
> >> FTH1.isoform.4  <- pb.53A.pos.ga[c(8)]
> >> FTH1.isoform.5  <- pb.53A.pos.ga[c(2)]
> >> FTH1.isoform.6  <- pb.53A.pos.ga[c(1)]
> >>
> >>
> >> p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
> >> p2 <- autoplot(FTH1.isoform.2, fill =  "blue", color = "blue")
> >> p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
> >> p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
> >> p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
> >> p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")
> >>
> >> tracks( FTH1=p1.FTH1,
> >>        "Iso 1"=p1,
> >>        "Iso 2"=p2,
> >>        "Iso 3"=p3,
> >>        "Iso 4"=p4,
> >>        "Iso 5"=p5,
> >>        "Iso 6"=p6)
> >>
> >>
> >> I can then autoplot each of the separate isoforms. What I want to do,
> >> however, is annotate the isoforms so that they each show the coding region
> >> with the full height of the bar, and a reduced height for the non-coding
> >> regions.
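
One way to prepare data for that kind of display is to cut each exon at the CDS boundaries and attach a bar height to every piece. The base R sketch below assumes the exon intervals and the genomic CDS start and end of an isoform are already known. The function name `split_coding`, the 1 versus 0.5 heights, and the coordinates are assumptions for illustration, not ggbio functionality.

```r
# Hypothetical helper: split exons into coding ("cds") and non-coding ("utr")
# pieces, given the genomic CDS boundaries (1-based, inclusive).
split_coding <- function(exon_start, exon_end, cds_from, cds_to) {
  piece <- function(s, e, type) {
    data.frame(start = s, end = e, type = type, stringsAsFactors = FALSE)
  }
  pieces <- list()
  for (i in seq_along(exon_start)) {
    s <- exon_start[i]
    e <- exon_end[i]
    # Part of the exon that lies before the CDS.
    if (s < cds_from) {
      pieces[[length(pieces) + 1]] <- piece(s, min(e, cds_from - 1), "utr")
    }
    # Part of the exon that overlaps the CDS.
    if (e >= cds_from && s <= cds_to) {
      pieces[[length(pieces) + 1]] <-
        piece(max(s, cds_from), min(e, cds_to), "cds")
    }
    # Part of the exon that lies after the CDS.
    if (e > cds_to) {
      pieces[[length(pieces) + 1]] <- piece(max(s, cds_to + 1), e, "utr")
    }
  }
  out <- do.call(rbind, pieces)
  # Full bar height for coding pieces, half height for non-coding pieces.
  out$height <- ifelse(out$type == "cds", 1, 0.5)
  out
}

# Toy isoform with three exons and a CDS running from 150 to 550.
split_coding(c(100, 300, 500), c(200, 400, 600), cds_from = 150, cds_to = 550)
```

The resulting table can be turned into a GRanges object with `type` and `height` as metadata columns and drawn as rectangles of different heights.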
> >>
> >> Additionally, I want to color the graphic with the details for the
> >> protein, such as the metal binding sites, domains, etc., so that
> >> computationally I can generate an informative picture which explains what
> >> is lost or gained in separate isoforms.
> >>
> >> Thoughts?
> >>
> >> Anne
> >> R version 3.1.0 (2014-04-10)
> >> Platform: x86_64-apple-darwin13.1.0 (64-bit)
> >>
> >> locale:
> >> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> >>
> >> attached base packages:
> >> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> >> [8] base
> >>
> >> other attached packages:
> >>  [1] UniProt.ws_2.4.2
> >>  [2] RCurl_1.95-4.3
> >>  [3] bitops_1.0-6
> >>  [4] RSQLite_0.11.4
> >>  [5] DBI_0.2-7
> >>  [6] biomaRt_2.20.0
> >>  [7] BiocInstaller_1.14.2
> >>  [8] GenomicAlignments_1.0.5
> >>  [9] BSgenome_1.32.0
> >> [10] Rsamtools_1.16.1
> >> [11] Biostrings_2.32.1
> >> [12] XVector_0.4.0
> >> [13] ggbio_1.12.8
> >> [14] ggplot2_1.0.0
> >> [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
> >> [16] GenomicFeatures_1.16.2
> >> [17] AnnotationDbi_1.26.0
> >> [18] Biobase_2.24.0
> >> [19] GenomicRanges_1.16.4
> >> [20] GenomeInfoDb_1.0.2
> >> [21] IRanges_1.22.10
> >> [22] BiocGenerics_0.10.0
> >>
> >> loaded via a namespace (and not attached):
> >>  [1] BatchJobs_1.3            BBmisc_1.7               BiocParallel_0.6.1
> >>  [4] biovizBase_1.12.1        brew_1.0-6               checkmate_1.2
> >>  [7] cluster_1.15.2           codetools_0.2-8          colorspace_1.2-4
> >> [10] dichromat_2.0-0          digest_0.6.4             fail_1.2
> >> [13] foreach_1.4.2            Formula_1.1-2            grid_3.1.0
> >> [16] gridExtra_0.9.1          gtable_0.1.2             Hmisc_3.14-4
> >> [19] iterators_1.0.7          labeling_0.2             lattice_0.20-29
> >> [22] latticeExtra_0.6-26      MASS_7.3-33              munsell_0.4.2
> >> [25] plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5
> >> [28] Rcpp_0.11.2              reshape2_1.4             rtracklayer_1.24.2
> >> [31] scales_0.2.4             sendmailR_1.1-2          splines_3.1.0
> >> [34] stats4_3.1.0             stringr_0.6.2            survival_2.37-7
> >> [37] tcltk_3.1.0              tools_3.1.0              VariantAnnotation_1.10.5
> >> [40] XML_3.98-1.1             zlibbioc_1.10.0
> >>         [[alternative HTML version deleted]]
> >>
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> Laurent Gatto
> http://cpu.sysbiol.cam.ac.uk/
>

-- 
Tengfei Yin, PhD
Product Manager
Seven Bridges Genomics
sbgenomics.com
One Broadway FL 7
Cambridge, MA 02142
(617) 866-0446
