[BioC] Bioconductor Digest, Vol 138, Issue 27

Laurent Gatto lg390 at cam.ac.uk
Wed Aug 27 12:42:13 CEST 2014


Dear Anne and Tengfei, 

The Pbase mapping vignette [1] is an initial description of how to map
protein coordinates back to the genome. I plan to implement what the
vignette describes in the package, but I haven't had time to do so
yet.
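
To give a rough idea of the arithmetic involved, here is a minimal sketch.
It is not the Pbase implementation and not taken from the vignette. It
assumes that residue i occupies CDS nucleotides 3i-2 to 3i and that the
coding exons are given in 5' to 3' order. The exon coordinates and the
feature positions are made up, proteinToGenome() is a hypothetical helper,
and only the plus strand is handled (a minus-strand gene would need the
offsets counted from the exon ends instead).

```r
library(GenomicRanges)

## Hypothetical coding exons of a toy plus-strand transcript, 5' to 3'.
cds <- GRanges("chr1",
               IRanges(start = c(101, 201, 301), end = c(130, 230, 330)),
               strand = "+")

## Hypothetical helper: map one protein range (in residues) to the
## genomic ranges it covers, split at exon boundaries.
proteinToGenome <- function(aaStart, aaEnd, cds) {
    stopifnot(all(strand(cds) == "+"))    # sketch: plus strand only
    ntStart <- 3 * aaStart - 2            # first CDS nucleotide of the range
    ntEnd   <- 3 * aaEnd                  # last CDS nucleotide of the range
    exEnd   <- cumsum(width(cds))         # exon ends in CDS coordinates
    exStart <- exEnd - width(cds) + 1     # exon starts in CDS coordinates
    s <- pmax(ntStart, exStart)           # overlap of the range with each exon
    e <- pmin(ntEnd, exEnd)
    keep <- s <= e                        # exons the range actually touches
    GRanges(seqnames(cds)[keep],
            IRanges(start(cds)[keep] + s[keep] - exStart[keep],
                    start(cds)[keep] + e[keep] - exStart[keep]),
            strand = strand(cds)[keep])
}

## Made-up protein features (positions in residues), not real annotations.
features <- data.frame(feature = c("Domain", "Metal binding"),
                       start   = c(2, 9),
                       end     = c(15, 9),
                       stringsAsFactors = FALSE)

## One GRanges for all features, with the feature type as a metadata column.
pieces <- lapply(seq_len(nrow(features)), function(i) {
    gr <- proteinToGenome(features$start[i], features$end[i], cds)
    gr$feature <- features$feature[i]
    gr
})
featureGR <- do.call(c, pieces)
```

The resulting object has one row per exon-level piece and a `feature`
column, which is the shape Tengfei suggests below for a separate track,
for instance with something like autoplot(featureGR, aes(fill = feature))
in ggbio.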

Please do not hesitate to comment or to suggest anything that would be
useful to you or would interoperate with your use cases.

Best wishes,

Laurent

[1] http://bioconductor.org/packages/devel/bioc/vignettes/Pbase/inst/doc/mapping.html


On 27 August 2014 11:00, bioconductor-request at r-project.org wrote:

> Message: 25
> Date: Tue, 26 Aug 2014 18:37:49 -0400
> From: Tengfei Yin <tengfei.yin at sbgenomics.com>
> To: Anne Deslattes Mays <ad376 at georgetown.edu>
> Cc: Anne Deslattes Mays <adeslat at sbresearchllc.com>,
> 	Bioconductor mailing list <bioconductor at r-project.org>
> Subject: Re: [BioC] Positional Details with Features through
> 	UniProt.ws Ultimately to display as tracks in ggbio
> Message-ID:
> 	<CAGkUe7VoKqS4GuVBcoB5C2_23myhJC51LMu1nV7g1_4k2iNHoA at mail.gmail.com>
> Content-Type: text/plain; charset="UTF-8"
>
> Hey Anne,
>
> So sorry for the late reply.
>
> Ideally, I should have some kind of mapper function in biovizBase to help
> map protein space to genomic space, so you don't have to do it yourself.
> Until I have that, a hack would be to massage your protein domain data
> into a GRanges object that uses genomic coordinates, with the domain
> function as a column, and then create a separate track that plots the
> object as rectangles and uses a color legend to indicate domain function.
>
> I will try to develop a more general approach for doing this. If you want,
> please send me an example RData file or example data, so we can work on
> that together.
>
> PS: so that I don't miss your request, feel free to use the GitHub issues
> page: <https://github.com/tengfei/ggbio/issues>
>
> cheers
>
> Tengfei
>
>
>
>
> On Sat, Aug 16, 2014 at 6:57 AM, Anne Deslattes Mays <ad376 at georgetown.edu>
> wrote:
>
>> Dear all,
>>
>> biocLite("UniProt.ws")
>> library(UniProt.ws)
>>
>>
>>  select(UniProt.ws,keys=("P02794"),columns=c("DOMAINS","FEATURES"),keytype="UNIPROTKB")
>> Getting extra data for P02794 NA NA etc
>>   UNIPROTKB                         DOMAINS
>> 1    P02794 Ferritin-like diiron domain (1)
>>
>>
>>   FEATURES
>> 1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator
>> methionine (1); Metal binding (6); Modified residue (4); Sequence conflict
>> (1); Turn (2)
>>
>> What I want are the positional details for each of these features, which
>> are visible on the UniProt web page.
>> FTH1 is 183 amino acids in length. There are 6 metal binding sites, each
>> at a specific position.
>> This information exists, since the web site can return the positional
>> details. I would like to retrieve them so that I can manipulate them with
>> new evidential information.
>>
>> Ultimately I wish to display them with tracks from ggbio:
>> pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
>>                                  param = ScanBamParam(which = genesymbol["FTH1"],
>>                                                       what = c("seq")),
>>                                  use.names = TRUE)
>>
>> FTH1.ga <- geom_alignment(data = txdb,which=genesymbol["FTH1"])
>>
>> So here I have sample information which I have aligned to the reference
>> genome.  I retrieve that information from a bam file.
>> # create the GAlignments objects for each isoform
>> FTH1.isoform.1  <- pb.53A.pos.ga[c(7)]
>> FTH1.isoform.2  <- pb.53A.pos.ga[c(15)]
>> FTH1.isoform.3  <- pb.53A.pos.ga[c(13)]
>> FTH1.isoform.4  <- pb.53A.pos.ga[c(8)]
>> FTH1.isoform.5  <- pb.53A.pos.ga[c(2)]
>> FTH1.isoform.6  <- pb.53A.pos.ga[c(1)]
>>
>>
>> p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
>> p2 <- autoplot(FTH1.isoform.2, fill =  "blue", color = "blue")
>> p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
>> p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
>> p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
>> p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")
>>
>> tracks( FTH1=p1.FTH1,
>>        "Iso 1"=p1,
>>        "Iso 2"=p2,
>>        "Iso 3"=p3,
>>        "Iso 4"=p4,
>>        "Iso 5"=p5,
>>        "Iso 6"=p6)
>>
>>
>> I can then autoplot each of the separate isoforms. What I want to do,
>> however, is annotate the isoforms so that each shows the coding region
>> at the full height of the bar and the non-coding regions at a reduced
>> height.
>>
>> Additionally, I want to color the graphic with the details for the
>> protein, such as the metal binding sites, domains, etc., so that I can
>> computationally generate an informative picture that explains what
>> is lost or gained in separate isoforms.
>>
>> Thoughts?
>>
>> Anne
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-apple-darwin13.1.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>  [1] UniProt.ws_2.4.2
>>  [2] RCurl_1.95-4.3
>>  [3] bitops_1.0-6
>>  [4] RSQLite_0.11.4
>>  [5] DBI_0.2-7
>>  [6] biomaRt_2.20.0
>>  [7] BiocInstaller_1.14.2
>>  [8] GenomicAlignments_1.0.5
>>  [9] BSgenome_1.32.0
>> [10] Rsamtools_1.16.1
>> [11] Biostrings_2.32.1
>> [12] XVector_0.4.0
>> [13] ggbio_1.12.8
>> [14] ggplot2_1.0.0
>> [15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
>> [16] GenomicFeatures_1.16.2
>> [17] AnnotationDbi_1.26.0
>> [18] Biobase_2.24.0
>> [19] GenomicRanges_1.16.4
>> [20] GenomeInfoDb_1.0.2
>> [21] IRanges_1.22.10
>> [22] BiocGenerics_0.10.0
>>
>> loaded via a namespace (and not attached):
>>  [1] BatchJobs_1.3            BBmisc_1.7               BiocParallel_0.6.1
>>  [4] biovizBase_1.12.1        brew_1.0-6               checkmate_1.2
>>  [7] cluster_1.15.2           codetools_0.2-8          colorspace_1.2-4
>> [10] dichromat_2.0-0          digest_0.6.4             fail_1.2
>> [13] foreach_1.4.2            Formula_1.1-2            grid_3.1.0
>> [16] gridExtra_0.9.1          gtable_0.1.2             Hmisc_3.14-4
>> [19] iterators_1.0.7          labeling_0.2             lattice_0.20-29
>> [22] latticeExtra_0.6-26      MASS_7.3-33              munsell_0.4.2
>> [25] plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5
>> [28] Rcpp_0.11.2              reshape2_1.4             rtracklayer_1.24.2
>> [31] scales_0.2.4             sendmailR_1.1-2          splines_3.1.0
>> [34] stats4_3.1.0             stringr_0.6.2            survival_2.37-7
>> [37] tcltk_3.1.0              tools_3.1.0              VariantAnnotation_1.10.5
>> [40] XML_3.98-1.1             zlibbioc_1.10.0

-- 
Laurent Gatto
http://cpu.sysbiol.cam.ac.uk/


