[BioC] getSeq on a GRangesList

Martin Morgan mtmorgan at fhcrc.org
Thu Jun 23 11:35:48 CEST 2011


On 06/22/2011 04:43 PM, Michael Lawrence wrote:
> I think you want something like extractTranscriptsFromGenome(). It would be
> nice if getSeq delegated to that for GRangesList.
>
> Michael
>
> On Wed, Jun 22, 2011 at 8:58 AM, Michael Cho <remhc at channing.harvard.edu> wrote:
>
>> Hi,
>>
>> I'm trying to grab sequence from a GRangesList.  Each GRanges element has
>> several sequences which I am concatenating together and using to write a
>> fasta file (with the name of the GRanges as the name of the fasta
>> sequence).
>> I can't seem to figure out a better / faster way than to loop like this:
>>
>> seq<-getSeq(Hsapiens,myGRangesList[[i]],as.character=FALSE)
>> writeFASTA(paste(seq,collapse=""), file="myOutput.fasta",
>> desc=names(myGRangesList[i]), append=TRUE)
>>
>> Any thoughts?

Maybe something like this, for some GRangesList grl:

seqs <- getSeq(Hsapiens, unlist(grl, use.names=FALSE),
                as.character=TRUE)
elt <- rep(names(grl), elementLengths(grl))
sapply(split(seqs, elt), paste, collapse="")

Optionally wrap the final line in DNAStringSet(). This makes just one
call to getSeq, and working on character() in the sapply / split / paste
step should avoid a lot of S4 class construction overhead. Note that
split() returns the groups in sorted name order rather than in the
original order of grl. I'll also mention scanFa / getSeq,FaFile-method
in the devel version of Rsamtools, which one might use if the original
sequence were coming from a fasta file rather than a BSgenome package.
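
As a rough sketch of the split / paste step on toy data, using base R
only: the vectors below are made-up stand-ins for what getSeq(...,
as.character=TRUE), names(grl) and elementLengths(grl) would return,
not real output. It assumes the list names are unique, and it builds
the grouping factor explicitly so the result keeps the original list
order.

```r
# Hypothetical stand-ins for the real objects (assumptions, not real data):
seqs      <- c("ACGT", "GG", "TTA", "C", "AAAA")  # one string per range
grp_names <- c("tx2", "tx1", "tx3")               # plays names(grl)
grp_sizes <- c(2L, 1L, 2L)                        # plays elementLengths(grl)

# One group label per range, repeated by group size.
elt <- rep(grp_names, grp_sizes)

# Fix the factor levels so split() keeps list order instead of sorting.
f <- factor(elt, levels = unique(grp_names))

# Concatenate the sequences within each group into a single string.
joined <- vapply(split(seqs, f), paste, character(1), collapse = "")

# Interleave ">name" header lines with the sequences, FASTA-style.
fasta_lines <- c(rbind(paste0(">", names(joined)), joined))
```

The fasta_lines vector could then be written out in a single call, for
example with writeLines(), instead of appending to the file once per
list element.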

Martin


>> Thanks,
>> Michael
>>
>>         [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
