[BioC] how to modify a Biostrings object

Patrick Aboyoun paboyoun at fhcrc.org
Mon Feb 9 04:20:19 CET 2009


Paul,
Herve Pages is the author of this material and probably has a more  
clever response than mine, but the replaceLetterAt function (see  
help(replaceLetterAt) for details) in the Biostrings package has a  
method for DNAString instances that you can leverage.

> library(BSgenome.Hsapiens.UCSC.hg18)
> chr9 <- Hsapiens [['chr9']]
> hg18.9q32 <- subseq (chr9, 113900001, 116700000)
> unmasked(hg18.9q32)
   2800000-letter "DNAString" instance
seq:  
TGGCTGCACCACAGCCATCCCACTAGGGGGAGCTGT...GTGACTGGAGCGTGGAATGTGGAGTTAGGCAAACTT
> replaceLetterAt(unmasked(hg18.9q32), c(1, 3, 5), "ACG")
   2800000-letter "DNAString" instance
seq:  
AGCCGGCACCACAGCCATCCCACTAGGGGGAGCTGT...GTGACTGGAGCGTGGAATGTGGAGTTAGGCAAACTT
> new.hg18.9q32 <- replaceLetterAt(unmasked(hg18.9q32), c(1, 3, 5), "ACG")
> masks(new.hg18.9q32) <- masks(hg18.9q32)
> new.hg18.9q32
   2800000-letter "MaskedDNAString" instance (# for masking)
seq:  
AGCCGGCACCACAGCCATCCCACTAGGGGGAGCTGT...GTGACTGGAGCGTGGAATGTGGAGTTAGGCAAACTT
masks:
   maskedwidth maskedratio active names                               desc
1           0   0.0000000   TRUE AGAPS                      assembly gaps
2           0   0.0000000   TRUE   AMB   intra-contig ambiguities (empty)
3     1407815   0.5027911  FALSE    RM                       RepeatMasker
4       21329   0.0076175  FALSE   TRF Tandem Repeats Finder [period<=12]
all masks together:
   maskedwidth maskedratio
       1408618   0.5030779
all active masks together:
   maskedwidth maskedratio
             0           0


Patrick


Quoting Paul Shannon <pshannon at systemsbiology.org>:

> I wish to introduce some mutations into a segment of Hg18 chromosome 9,
> as part of a simulation I wish to run.  I am having difficulty figuring
> out how to modify the BSgenome-related data structures.
>
> I create my (wild-type) variables like this:
>
>   chr9 <- Hsapiens [['chr9']]
>   hg18.9q32 <<- subseq (chr9, 113900001, 116700000)
>
> How would one create a mutant variant of hg18.9q32?  My wish is to
> incrementally -- one gene at a time -- introduce SNPs (and later,
> indels and translocations). Sometimes these changes are intentionally
> random, sometimes I want them at very specific places, to 'spike in' an
> interesting mutation.
>
> I see that one can injectSNPs into a genome:
>
>    Hsapiens.snp <- injectSNPs (Hsapiens, "SNPlocs.Hsapiens.dbSNP.20071016")
>
> I could find no documentation for the SNPlocs data structure.
>
> hg18.9q32 appears to be a MaskedDNAString, apparently derived from
> Biostring and XString, all of which seem to be immutable.
>
> Is there any way to do this?
>
> Let me apologize in advance if this is documented and I missed it.  I
> have combed the vignettes and the help pages and found nothing.
>
> Many thanks,
>
>   - Paul
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list