[BioC] using and combining of "subseq"

Hervé Pagès hpages at fhcrc.org
Fri Jun 17 02:45:00 CEST 2011


On 11-06-15 11:09 AM, Harris A. Jaffee wrote:
> x1 = "AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC"
> x2 = "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"
> X = DNAStringSet(c(x1, x2))
>
>  > X
>   A DNAStringSet instance of length 2
>     width seq
> [1]    40 AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC
> [2]    40 TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA
>
>> start1 = 1
>> end1 = 10
>>
>> start2 = 21
>> end2 = 25
>
> s1 = subseq(X, start1, end1)
> s2 = subseq(X, start2, end2)
> answer = DNAStringSet(paste(s1, s2, sep=""))

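Or use 'answer = xscat(s1, s2)', which concatenates the sequences
directly instead of going through character vectors with paste(). It
would be more efficient here, especially if 's1' and 's2' contain
hundreds of thousands of sequences.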
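
For reference, a minimal sketch of the whole thing on the example data
from this thread. The 'starts', 'ends', 'pieces' and 'answer2' names in
the second half are made up for illustration and are not part of
Biostrings; that part is only one possible way to handle more than two
regions, assuming every region lies within every sequence.

```r
library(Biostrings)

# Example sequences from the original post
X <- DNAStringSet(c("AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC",
                    "TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"))

# Extract the two regions from every sequence, then concatenate them
# element-wise with xscat() instead of paste()
s1 <- subseq(X, start = 1, end = 10)
s2 <- subseq(X, start = 21, end = 25)
answer <- xscat(s1, s2)

# Copy the names over explicitly, as the original post does
names(answer) <- names(X)

# Possible generalization to any number of regions.
# 'starts' and 'ends' are hypothetical vectors, one entry per region.
starts <- c(1, 21)
ends   <- c(10, 25)
pieces <- Map(function(s, e) subseq(X, start = s, end = e), starts, ends)
answer2 <- do.call(xscat, pieces)
```

Here 'answer' and 'answer2' should hold the same two 15-letter
sequences as the 'answer' object shown in Harris's message.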

Cheers,
H.

>
>  > answer
>   A DNAStringSet instance of length 2
>     width seq
> [1]    15 AAAAAAAAAAGGGGG
> [2]    15 TTTTTTTTTTCCCCC
>
> On Jun 15, 2011, at 7:36 AM, Kristian Ullrich wrote:
>
>> Hello Biostrings curators,
>>
>> again the question to you:
>>
>> Is there an easier way to solve the following:
>>
>> R-code:
>> ####################
>> ####################
>> library(Biostrings)
>>
>> #example sequence
>> seq.list=list()
>> seq.list[1]="AAAAAAAAAATTTTTTTTTTGGGGGGGGGGCCCCCCCCCC"
>> seq.list[2]="TTTTTTTTTTGGGGGGGGGGCCCCCCCCCCAAAAAAAAAA"
>> fas.seq = DNAStringSet(unlist(seq.list))
>>
>> #defining start and end points of subseq
>> start1 = 1
>> end1 = 10
>>
>> start2 = 21
>> end2 = 25
>>
>> #creating first and second subseq
>> first.subseq = subseq(fas.seq,start1,end1)
>> second.subseq = subseq(fas.seq,start2,end2)
>>
>> new.seq =
>> DNAStringSet(apply(sapply(list(first.subseq,second.subseq),as.character),1,function(x)
>> paste(x,collapse="")))
>> names(new.seq) = names(fas.seq)
>> ####################
>> ####################
>>
>> I basically want to combine subseqs from one DNAStringset, something
>> like:
>>
>> subseq(DNAStringSet, start = c(start1,start2), end = c(end1,end2))
>>
>> would be nice.
>>
>> Thank you in anticipation
>>
>> Kristian Ullrich
>> --
>> Kristian Ullrich
>>
>> Leibniz Institute of Plant Biochemistry
>> Weinberg 3
>> D-06120 Halle (Saale), Germany
>> phone +49 345 5582 1221
>> fax +49 345 5582 1209
>> mail kullrich at ipb-halle.de
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


