[BioC] insert Ns for repeat masked regions

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Mar 2 04:03:05 CET 2011


Hi,

On Tue, Mar 1, 2011 at 5:50 PM, rna seq <rna.seeker at gmail.com> wrote:
> Hello List,
>
> I am trying to retrieve a sequence of ~1000 nts using the getSeq() function
> from the BSgenome package
>
> I would like to replace the repeat masked regions with Ns
>
> using something similar to the injectSNPs() function from the
> SNPlocs.Hsapiens.dbSNP package.
>
> So far I can grab sequence from the genome either masked: getSeq(Hsapiens,
> "chr21",  33665196, 33665435,  as.character=FALSE)
>
> or unmasked: getSeq(hg19snp, "chr21",  33665196, 33665435,
> as.character=FALSE)
>
> The problem is that the masked function returns a gap:
>
> TCCCAGGATGTGACATTGTTTGCCAGTGCAGAGGC...GGAGCTTTGGAAGAAGAGAGAGTTGACTACGGAAA
>
>  and I would like the gap to be filled with Ns?

I'm not sure there is actually a gap there: the '...' in the middle is
just how XString objects "show" themselves in R when the sequence is too
long to print on one line.

Is that what you're talking about?

Look at the result you get when you set as.character=TRUE:

R> library(BSgenome.Hsapiens.UCSC.hg19)
R> getSeq(Hsapiens, "chr21",  33665196, 33665435,  as.character=TRUE)
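
If you want to convince yourself that nothing is missing, here is a minimal
sketch of the same display behaviour. It assumes the Biostrings package is
installed and uses a made-up 240-letter sequence (the names s, x and full
are just for illustration, and this is not your chr21 region):

```r
library(Biostrings)

# Made-up 240-nt sequence for illustration only (not the chr21 region above)
s <- paste(rep("ACGT", 60), collapse = "")
x <- DNAString(s)

x                        # on a typical console the middle is shown as "..."
length(x)                # 240, so no letters were dropped
full <- as.character(x)  # the complete sequence as a plain character string
nchar(full)              # 240
identical(full, s)       # TRUE
```

The "..." only appears when the object is printed; the underlying sequence
keeps every letter.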

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


