[BioC] obtain DNA sequence

Tue Sep 1 20:31:17 CEST 2009

Hi Patrick,

Thanks for your response. I will look into IRanges and XString.
I also tried your code; however, it gives me the following error:

> mymat
    Chr     Start      Stop
1  chr9  79466420  79466570
2  chr6  50495860  50496010
3  chr8  19687900  19688050
4  chrX  90313740  90313890
5  chr4 117732780 117732930
6 chr11   4090400   4090550

> uniqueChr <- unique(mymat[,"Chr"])
> extractedDNA <- character(nrow(mymat))
> for (chr in uniqueChr) {
+   selected <- which(mymat[,"Chr"] == chr)
+   extractedDNA[selected] <- as.character(Views(Mmusculus[[chr]], 
+ mymat[selected,"Start"], mymat[selected,"End"]))
+ }

Error in newViews(subject, start = start, end = end, names = names, Class = "XStringViews") : 
  'start' and 'end' must be numeric vectors
In addition: Warning message:
In Views(Mmusculus[[chr]], mymat[selected, "Start"], mymat[selected,  :
  masks were dropped
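
One thing that stands out in the transcript above: the printed table names its third column "Stop", while the loop indexes "End". That mismatch may be related to the error, though the thread does not confirm it. A minimal base-R check is sketched below, assuming mymat is a data frame like the one printed (the toy values are the first three rows of that table):

```r
# Toy copy of the first three rows of the table printed above (an assumption:
# the real mymat may be a matrix or may have been read in with other types).
mymat <- data.frame(
  Chr   = c("chr9", "chr6", "chr8"),
  Start = c(79466420, 50495860, 19687900),
  Stop  = c(79466570, 50496010, 19688050),
  stringsAsFactors = FALSE
)

# The loop asks for a column called "End"; check which names actually exist.
has_end  <- "End"  %in% colnames(mymat)
has_stop <- "Stop" %in% colnames(mymat)

# Views() reports that 'start' and 'end' must be numeric vectors, so confirm
# that both coordinate columns are numeric.
coords_numeric <- is.numeric(mymat[, "Start"]) && is.numeric(mymat[, "Stop"])

# If the coordinates were read in as factors or strings, going through
# as.character() first converts them safely.
mymat$Start <- as.numeric(as.character(mymat$Start))
mymat$Stop  <- as.numeric(as.character(mymat$Stop))
```

If has_end is FALSE and has_stop is TRUE, indexing with "Stop" instead of "End" is the first thing to try.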

Simon

-----Original Message-----
From: Patrick Aboyoun [mailto:paboyoun at fhcrc.org] 
Sent: Tuesday, September 01, 2009 2:21 PM
To: Biddie, Simon (NIH/NCI) [F]
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] obtain DNA sequence

Simon,
Below is code that addresses your explicit question:

mymat <- <<the matrix you have below>>
uniqueChr <- unique(mymat[,"Chr"])
extractedDNA <- character(nrow(mymat))
for (chr in uniqueChr) {
  selected <- which(mymat[,"Chr"] == chr)
  extractedDNA[selected] <- as.character(Views(Mmusculus[[chr]],
    mymat[selected,"Start"], mymat[selected,"End"]))
}

The question I have for you is whether you have tried using the IRanges 
framework to represent your ranges. It would make this type of 
processing easier to perform. There are also write functions, such as 
write.XStringSet and write.XStringViews, that provide export 
functionality without requiring you to coerce the DNA sequences into 
character vectors.
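
For reference, the whole task (all rows, one output file) might look like the sketch below. It assumes mymat is a data frame with numeric coordinates in columns named Chr, Start and Stop, as in the printed table, and it uses base R's write.csv for the export rather than the Biostrings write functions mentioned above. The file name sequences.csv is made up for illustration.

```r
library(BSgenome.Mmusculus.UCSC.mm9)  # provides the Mmusculus genome object

# Assumption: two rows copied from the table in this thread, as a data frame.
mymat <- data.frame(
  Chr   = c("chr9", "chr6"),
  Start = c(79466420, 50495860),
  Stop  = c(79466570, 50496010),
  stringsAsFactors = FALSE
)

extractedDNA <- character(nrow(mymat))
for (chr in unique(mymat$Chr)) {
  selected <- which(mymat$Chr == chr)
  # Load each chromosome once and take all of its ranges in one Views() call.
  chrViews <- Views(Mmusculus[[chr]],
                    start = mymat$Start[selected],
                    end   = mymat$Stop[selected])
  extractedDNA[selected] <- as.character(chrViews)
}

# Write coordinates and sequences side by side.
# "sequences.csv" is a placeholder file name.
write.csv(cbind(mymat, Sequence = extractedDNA),
          file = "sequences.csv", row.names = FALSE)
```

Grouping by chromosome keeps the number of chromosome loads at one per distinct value of Chr, which matters with up to 15,000 rows.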

Patrick

Biddie, Simon (NIH/NCI) [F] wrote:
> Dear All,
>
> I am trying to obtain DNA sequences (mouse) from chromosome coordinates. I am relatively new to R and Bioconductor and would appreciate any help.
>
> I have a matrix in the following style:
>
>     Chr     Start      Stop
> 1  chr9  79466420  79466570
> 2  chr6  50495860  50496010
> 3  chr8  19687900  19688050
> 4  chrX  90313740  90313890
> 5  chr4 117732780 117732930
> 6 chr11   4090400   4090550
>
> I can use the following code to obtain a single sequence by typing in the chromosome number, start and stop manually:
>
> > library(BSgenome.Mmusculus.UCSC.mm9)
> > seq1 = subseq(Mmusculus$chr9,79466420,79466570)
> > as(seq1, "character")
>
> How would I do this for all the rows in the matrix and write the output to a single txt or csv file, without having to type each row (I have up to 15,000!) one at a time? Please find the sessionInfo below.
>
> Thank you for any help,
>
> Simon
>
> > sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] BSgenome.Mmusculus.UCSC.mm9_1.3.11 BSgenome_1.10.5
> [3] Biostrings_2.10.22                 IRanges_1.0.16
> [5] R.utils_1.1.3                      R.oo_1.4.6
> [7] R.methodsS3_1.0.3
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.1         lattice_0.17-25    Matrix_0.999375-23
>