[BioC] XStringViews with strand?

Hervé Pagès hpages at fhcrc.org
Thu Jun 4 20:14:51 CEST 2009


Hi Cei,

Cei Abreu-Goodger wrote:
> Hi all,
> 
> I'm trying to read in a fasta sequence, extract the "gene sequences" and 
> write these out to a fasta file. I can read the sequences with 
> read.DNAStringSet(), obtain an XStringViews object with Views(), but I'm 
> having trouble knowing how to obtain the reverse complement sequence for 
> the genes on the "-" strand. I can get them with a reverseComplement()
> of the XStringViews object, but I can't overwrite the elements of this 
> object. So my solution involves dealing separately with all the genes on 
> the "+" strand and those on the "-" strand. Is there an easier way?
> 
> An example:
> 
> file <- system.file("extdata", "someORF.fa", package="Biostrings")
> x <- read.DNAStringSet(file, "fasta")[[1]]
> 
> names   <- c("a","b","c")
> starts  <- c(10,384,947)
> ends    <- starts+20
> strands <- c("+","-","+")
> 
> myViews <- Views(x, start=starts, end=ends)
> names(myViews) <- names
> 
> revViews <- reverseComplement(myViews[strands=="-"])
> posViews <- myViews[strands=="+"]

Yes it has to be done separately. The content of a view cannot be modified
because that would mean modifying the underlying subject so you would end
up with a subject that is a mix of + and - strand. And what should be done
when 2 views overlap, 1 being associated with a gene on the + strand and
the other one with a gene on the - strand?

In the near future we will support replacement of the elements of a DNAStringSet
object through "[<-" and "[[<-". Then it will be possible to reverseComplement
some of its elements with something like this:

   x[strands == "-"] <- reverseComplement(x[strands == "-"])

But that will be for XStringSet only, not for XStringViews.

These developments will take place in the devel version of Biostrings in the
next few weeks.

Cheers,
H.


> 
> 
> Many thanks,
> 
> Cei
> 
> sessionInfo()
> R version 2.9.0 (2009-04-17)
> i386-apple-darwin8.11.1
> 
> locale:
> en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8
> 
> attached base packages:
> [1] tools     stats     graphics  grDevices datasets  utils     methods
> [8] base
> 
> other attached packages:
> [1] Biostrings_2.12.1 IRanges_1.2.2     Biobase_2.4.1
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: 
> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list