[BioC] writing a fasta file in blocks

Martin Morgan mtmorgan at fhcrc.org
Wed Jun 9 03:50:58 CEST 2010


On 06/09/2010 02:31 AM, Kasper Daniel Hansen wrote:
> Doing what Fahim suggests internally in writeFASTA has been on my todo
> list for a while, and it will significantly speed up the writing of
> fasta files with many small records.  Guess I should do it now, and
> cross it off my list.
> 
> But Fahim: I am not sure it is possible to do what you want to do with
> the current function (at least if you are using Biostrings), but I
> could be wrong.  If you want to investigate further, note that the
> file can be a connection (?connection).
> 
> Kasper
> 
> On Tue, Jun 8, 2010 at 4:26 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>> On Mon, Jun 7, 2010 at 11:12 PM, Fahim Md <fahim.md at gmail.com> wrote:
>>
>>> I have a data file in the format shown below. It has two fields:
>>> sequence and probe set name.
>>>    sequence                                       Probe Set Name
>>> GCTACTTTACTCCAGAATTTTGTTA      1367452_at.1
>>> TTAGAAAGCCGCAATTTGGTCCCGC    1367452_at.2
>>> GCCACATCCTGACTACTGCAGTATA     1367452_at.3
>>> ............
>>> AAAAAAAAGGGGGGGTCCCCCCCC     1234567_at.1
>>>
>>>
>>> Now, I want to convert that into FASTA format as follows
>>>
>>>> 1367452_at.1
>>> GCTACTTTACTCCAGAATTTTGTTA
>>>> 1367452_at.2
>>> TTAGAAAGCCGCAATTTGGTCCCGC
>>>> 1367452_at.3
>>> GCCACATCCTGACTACTGCAGTATA
>>> .......
>>> ....
>>>> 1234567_at.1
>>> AAAAAAAAAAAAACCCCCCCCCCCC
>>>
>>>
>>> I am getting the required output by using the writeFASTA(..) function, but it
>>> is too slow because I am using a loop, and every iteration accesses the file
>>> to write into it.
>>>
>>> Is there any way I can write this FASTA information into some
>>> variable and, once I am done, write that variable to the required
>>> file?

For short sequences where line wrapping is not important, you might
input the data with

  df = read.table(...)

and the like, create a template for the output

  fasta = character(2 * nrow(df))

then fill it in (no loop required)

  fasta[c(TRUE, FALSE)] = paste(">", df[["Probe.Set.Name"]], sep = "")
  fasta[c(FALSE, TRUE)] = as.character(df[["sequence"]])

and save it

  write(fasta, "/some/file.fasta")
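
Putting the steps together, a minimal self-contained sketch might look
like the following. The two-row data frame, its column names, and the
temporary output file are assumptions made for illustration only; real
data would come from read.table().

```r
# Illustrative input (assumed); in practice df would come from read.table(...).
df = data.frame(
    sequence = c("GCTACTTTACTCCAGAATTTTGTTA", "TTAGAAAGCCGCAATTTGGTCCCGC"),
    Probe.Set.Name = c("1367452_at.1", "1367452_at.2"),
    stringsAsFactors = FALSE
)

# Two output lines per record: one header line and one sequence line.
fasta = character(2 * nrow(df))
fasta[c(TRUE, FALSE)] = paste(">", df[["Probe.Set.Name"]], sep = "")
fasta[c(FALSE, TRUE)] = as.character(df[["sequence"]])

# Write everything in a single call; the temporary path is a placeholder.
out = tempfile(fileext = ".fasta")
writeLines(fasta, out)
```

The logical index c(TRUE, FALSE) is recycled along fasta, so it selects
the odd positions (headers), while c(FALSE, TRUE) selects the even
positions (sequences). The file is opened and written only once.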

Martin

>>>
>>>
>> Hi, Fahim.
>>
>> Probably most appropriate for here and not bioc-devel.  Perhaps a
>> reproducible code example and some sessionInfo() would be helpful.
>>
>> Sean
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


