[BioC] Positional Details with Features through UniProt.ws Ultimately to display as tracks in ggbio

Anne Deslattes Mays ad376 at georgetown.edu
Sat Aug 16 12:57:09 CEST 2014


Dear all,

biocLite("UniProt.ws")
library(UniProt.ws)

 select(UniProt.ws,keys=("P02794"),columns=c("DOMAINS","FEATURES"),keytype="UNIPROTKB")
Getting extra data for P02794 NA NA etc
  UNIPROTKB                         DOMAINS
1    P02794 Ferritin-like diiron domain (1)
                                                                                                                                                        FEATURES
1 Chain (2); Domain (1); Erroneous initiation (1); Helix (6); Initiator methionine (1); Metal binding (6); Modified residue (4); Sequence conflict (1); Turn (2)

What I want are the positional details for each of these features, which are visible on the UniProt web page.
FTH1 is 183 amino acids long. It has 6 metal binding sites, each at a specific position.
The information exists, since the web site can return the positional details. I would like to retrieve them in R so that I can manipulate them alongside new evidence.
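
To make the request concrete, here is a minimal sketch in base R of the kind of table I am after. It assumes the feature positions can be obtained as GFF-style tab-separated text. The coordinates below are placeholders for illustration only, not the actual UniProt annotations for P02794, and the column names are my own choice.

```r
# Illustrative GFF-style lines. The start/end values are PLACEHOLDERS,
# not the real UniProt feature coordinates for P02794.
gff_text <- c(
  "P02794\tUniProtKB\tChain\t2\t183\t.\t.\t.\tNote=placeholder",
  "P02794\tUniProtKB\tMetal binding\t10\t10\t.\t.\t.\tNote=placeholder",
  "P02794\tUniProtKB\tMetal binding\t20\t20\t.\t.\t.\tNote=placeholder",
  "P02794\tUniProtKB\tHelix\t30\t45\t.\t.\t.\tNote=placeholder"
)

# Read the lines into a data.frame with one row per feature.
# The column names are assumed labels for the nine GFF fields.
features <- read.delim(
  text = gff_text, header = FALSE, stringsAsFactors = FALSE,
  col.names = c("accession", "source", "type", "start", "end",
                "score", "strand", "phase", "attributes")
)

# Keep only the metal binding sites, with their residue positions.
metal <- features[features$type == "Metal binding", c("type", "start", "end")]
print(metal)
```

With a table like this, each feature type could be subset and recolored independently before plotting.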

Ultimately I wish to display them with tracks from ggbio:
pb.53A.pos.ga <- readGAlignmentsFromBam(pb.53A.pos.bamfile,
                                 param = ScanBamParam(which = genesymbol["FTH1"],what=c("seq")),
                                 use.names = TRUE)

FTH1.ga <- geom_alignment(data = txdb,which=genesymbol["FTH1"])

So here I have sample information which I have aligned to the reference genome. I retrieve that information from a BAM file.
# create the GAlignments objects for each isoform
FTH1.isoform.1  <- pb.53A.pos.ga[c(7)]
FTH1.isoform.2  <- pb.53A.pos.ga[c(15)]
FTH1.isoform.3  <- pb.53A.pos.ga[c(13)]
FTH1.isoform.4  <- pb.53A.pos.ga[c(8)]
FTH1.isoform.5  <- pb.53A.pos.ga[c(2)]
FTH1.isoform.6  <- pb.53A.pos.ga[c(1)]


p1 <- autoplot(FTH1.isoform.1, fill = "brown", color = "brown")
p2 <- autoplot(FTH1.isoform.2, fill =  "blue", color = "blue")
p3 <- autoplot(FTH1.isoform.3, fill = "brown", color = "brown")
p4 <- autoplot(FTH1.isoform.4, fill = "brown", color = "brown")
p5 <- autoplot(FTH1.isoform.5, fill = "brown", color = "brown")
p6 <- autoplot(FTH1.isoform.6, fill = "brown", color = "brown")

tracks( FTH1=p1.FTH1,
       "Iso 1"=p1,
       "Iso 2"=p2,
       "Iso 3"=p3,
       "Iso 4"=p4,
       "Iso 5"=p5,
       "Iso 6"=p6)


I can then autoplot each of the separate isoforms. What I want to do, however, is annotate the isoforms so that each shows the coding region at the full height of the bar and the non-coding regions at a reduced height.
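
As a rough sketch of the full-height versus reduced-height idea, the base R function below splits exon ranges into coding and non-coding pieces and assigns each a bar height. The function name `split_by_cds`, the column names, and the example coordinates are all hypothetical. The resulting table could presumably be drawn as rectangles (for example with ggplot2's `geom_rect`), but that step is not shown here.

```r
# Split exon ranges into coding ("cds") and non-coding ("utr") pieces.
# exons: data.frame with columns start and end (genomic, 1-based, inclusive).
# cds_start, cds_end: bounds of the coding region in the same coordinates.
split_by_cds <- function(exons, cds_start, cds_end) {
  pieces <- lapply(seq_len(nrow(exons)), function(i) {
    s <- exons$start[i]
    e <- exons$end[i]
    out <- list()
    # Non-coding part of the exon that lies before the coding region
    if (s < cds_start) {
      out[[length(out) + 1]] <- data.frame(
        start = s, end = min(e, cds_start - 1), type = "utr",
        stringsAsFactors = FALSE)
    }
    # Part of the exon that overlaps the coding region
    cs <- max(s, cds_start)
    ce <- min(e, cds_end)
    if (cs <= ce) {
      out[[length(out) + 1]] <- data.frame(
        start = cs, end = ce, type = "cds", stringsAsFactors = FALSE)
    }
    # Non-coding part of the exon that lies after the coding region
    if (e > cds_end) {
      out[[length(out) + 1]] <- data.frame(
        start = max(s, cds_end + 1), end = e, type = "utr",
        stringsAsFactors = FALSE)
    }
    do.call(rbind, out)
  })
  segs <- do.call(rbind, pieces)
  rownames(segs) <- NULL
  # Full height for coding pieces, half height for non-coding pieces
  segs$height <- ifelse(segs$type == "cds", 1, 0.5)
  segs
}

# Hypothetical three-exon transcript with a coding region from 150 to 550
exons <- data.frame(start = c(100, 300, 500), end = c(200, 400, 600))
segs <- split_by_cds(exons, cds_start = 150, cds_end = 550)
print(segs)
```

The first and last exons are each cut in two at the coding boundary, while the middle exon stays whole as a coding piece.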

Additionally, I want to color the graphic with the protein details, such as the metal binding sites and domains, so that I can computationally generate an informative picture that explains what is lost or gained in each isoform.
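
To place a protein feature on an isoform track, its residue position would need converting to genomic coordinates. The base R sketch below does this for a single residue by walking the coding bases in transcript order. It is an illustration under simplifying assumptions, not an established method: the function name `residue_to_genome` and the example coordinates are hypothetical, the input is assumed to hold only the coding portions of the exons, and the coding sequence is assumed to start at the first base of the first codon.

```r
# Map a 1-based protein residue to the genomic positions of its codon.
# cds_exons: data.frame with columns start and end giving the coding parts
#            of each exon (genomic, 1-based, inclusive).
# strand:    "+" or "-".
residue_to_genome <- function(residue, cds_exons, strand = "+") {
  # Order the exons as they are transcribed (descending on the minus strand)
  ord <- order(cds_exons$start, decreasing = (strand == "-"))
  # Genomic position of every coding base, in transcript order
  coding <- unlist(lapply(ord, function(i) {
    pos <- seq(cds_exons$start[i], cds_exons$end[i])
    if (strand == "-") rev(pos) else pos
  }))
  # Residue k is encoded by coding bases 3k-2, 3k-1 and 3k
  codon <- (3 * residue - 2):(3 * residue)
  stopifnot(max(codon) <= length(coding))
  coding[codon]
}

# Hypothetical coding exons: 50 bases, then 101 bases
cds_exons <- data.frame(start = c(150, 300), end = c(199, 400))

first_plus  <- residue_to_genome(1, cds_exons, "+")   # codon inside exon 1
split_plus  <- residue_to_genome(17, cds_exons, "+")  # codon spans the junction
first_minus <- residue_to_genome(1, cds_exons, "-")   # read from the far end
print(split_plus)
```

Residue 17 in this toy example falls across the exon junction, which is the case that makes a simple offset calculation insufficient.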

Thoughts?

Anne
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] UniProt.ws_2.4.2                        
 [2] RCurl_1.95-4.3                          
 [3] bitops_1.0-6                            
 [4] RSQLite_0.11.4                          
 [5] DBI_0.2-7                               
 [6] biomaRt_2.20.0                          
 [7] BiocInstaller_1.14.2                    
 [8] GenomicAlignments_1.0.5                 
 [9] BSgenome_1.32.0                         
[10] Rsamtools_1.16.1                        
[11] Biostrings_2.32.1                       
[12] XVector_0.4.0                           
[13] ggbio_1.12.8                            
[14] ggplot2_1.0.0                           
[15] TxDb.Hsapiens.UCSC.hg19.knownGene_2.14.0
[16] GenomicFeatures_1.16.2                  
[17] AnnotationDbi_1.26.0                    
[18] Biobase_2.24.0                          
[19] GenomicRanges_1.16.4                    
[20] GenomeInfoDb_1.0.2                      
[21] IRanges_1.22.10                         
[22] BiocGenerics_0.10.0                     

loaded via a namespace (and not attached):
 [1] BatchJobs_1.3            BBmisc_1.7               BiocParallel_0.6.1      
 [4] biovizBase_1.12.1        brew_1.0-6               checkmate_1.2           
 [7] cluster_1.15.2           codetools_0.2-8          colorspace_1.2-4        
[10] dichromat_2.0-0          digest_0.6.4             fail_1.2                
[13] foreach_1.4.2            Formula_1.1-2            grid_3.1.0              
[16] gridExtra_0.9.1          gtable_0.1.2             Hmisc_3.14-4            
[19] iterators_1.0.7          labeling_0.2             lattice_0.20-29         
[22] latticeExtra_0.6-26      MASS_7.3-33              munsell_0.4.2           
[25] plyr_1.8.1               proto_0.3-10             RColorBrewer_1.0-5      
[28] Rcpp_0.11.2              reshape2_1.4             rtracklayer_1.24.2      
[31] scales_0.2.4             sendmailR_1.1-2          splines_3.1.0           
[34] stats4_3.1.0             stringr_0.6.2            survival_2.37-7         
[37] tcltk_3.1.0              tools_3.1.0              VariantAnnotation_1.10.5
[40] XML_3.98-1.1             zlibbioc_1.10.0       