[BioC] How to prepare the input data to writeFASTA? Examples of CharacterToFASTArecords ...

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Jul 17 04:32:04 CEST 2009


Hi,

On Jul 16, 2009, at 9:16 PM, <mauede at alice.it> wrote:

> I realize the function writeFASTA expects a list with two items,
> respectively, description and sequence.
> However, just passing a list won't work (please, see code at the  
> bottom of this message)

Sorry ... perhaps we're not understanding your problem. Doesn't the
reply I sent earlier today work? If I'm not mistaken, you said that
you have two variables holding the description and sequence info for
your data, like so:

library(Biostrings)
desc <- paste("gene", 1:10, " some other stuff", sep="")
seqs <- replicate(10,paste(sample(c('A','C','G', 'T'), 50,  
replace=TRUE), collapse=""))

Because this works for me:

fasta.list <- lapply(seq_along(desc), function(i)
                     list(desc=desc[i], seq=seqs[i]))
writeFASTA(fasta.list, 'test.fa')
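For reference, here is a base-R sketch of what a writer consuming this list-of-records layout (each element a list with desc and seq) produces. It is not Biostrings' own code: the helper name write_fasta_records is hypothetical, it writes each sequence on a single line, and it adds the ">" prefix itself, the way writeFASTA does.

```r
# Hypothetical helper (NOT part of Biostrings): writes a list of records,
# each a list with elements `desc` and `seq`, as FASTA text.
write_fasta_records <- function(records, file) {
  lines <- unlist(lapply(records, function(rec) {
    # The writer adds ">" to the description; `desc` must not contain it.
    c(paste0(">", rec$desc), rec$seq)
  }))
  writeLines(lines, file)   # each sequence goes on one line (no wrapping)
  invisible(lines)
}

# Same shape of input as in the example above, with shorter toy data.
desc <- paste("gene", 1:3, " some other stuff", sep = "")
seqs <- c("ACGTACGT", "GGGCCCAA", "TTTTAAAA")
fasta.list <- lapply(seq_along(desc), function(i)
                     list(desc = desc[i], seq = seqs[i]))

out <- tempfile(fileext = ".fa")
write_fasta_records(fasta.list, out)
readLines(out)
```

The file alternates header lines (">gene1 some other stuff", ...) with sequence lines.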


> I saw there is the helper function CharacterToFASTArecords(x) that
> presumably generates the right input data format.
> It would be very useful to get some example of
> CharacterToFASTArecords(x) usage.
> The on-line documentation reads:
> "For CharacterToFASTArecords, the (possibly named) character vector  
> to be converted to a list of FASTA records as one returned by  
> readFASTA"
> Since I have description and sequence in separate variables ... I do
> not know how to use it.

That function expects the description to be in the "names" attribute  
of your character vector. For example, taking the same variables from  
above:

names(seqs) <- paste("gene", 1:10, sep='')
fasta.list <- CharacterToFASTArecords(seqs)
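To show what that conversion amounts to, here is a base-R sketch of the documented behaviour (names become descriptions, values become sequences). It is not the function's actual source; char_to_records is a hypothetical name, and unlike the real function it simply refuses unnamed input rather than guessing what descriptions to use.

```r
# Hypothetical stand-in for CharacterToFASTArecords (NOT Biostrings code):
# turns a named character vector into a list of list(desc=, seq=) records.
char_to_records <- function(x) {
  stopifnot(is.character(x), !is.null(names(x)))
  # Pair each name (description) with its sequence. Map() would copy
  # names(x) onto the outer list, so unname() drops them.
  unname(Map(function(d, s) list(desc = d, seq = s), names(x), unname(x)))
}

seqs <- c("ACGTACGT", "GGGCCCAA")
names(seqs) <- paste("gene", 1:2, sep = "")
recs <- char_to_records(seqs)
str(recs)
```

Each element of recs has the same desc/seq layout as the lapply() version built from separate desc and seqs variables.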

>      zz <- file (filname,"w")
>           write(miRNA.rec, zz, append = FALSE)
>           write(miRNA.seq,zz, append = TRUE)

I don't understand why you're writing to this file manually if it is
supposed to be your FASTA file, and then also calling writeFASTA on it.

> #
>       geneDesc <- paste (">",gene.id, "|",  
> gene.map[i,"ensembl_transcript_id"], sep="")
>          geneSeq <-  gene.seq[i,"3utr"]
>          gene.string <- list(desc=geneDesc, seq=geneSeq)
>          writeFASTA (gene.string, zz)

For starters, you shouldn't paste the ">" into the description (the
desc element), as writeFASTA will take care of it.
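A small base-R illustration of why: if the description already starts with ">", a writer that prepends its own ">" yields a doubled ">>" header. The IDs gene1 and tx1 are made-up placeholders for gene.id and the ensembl_transcript_id value, and paste0(">", ...) stands in for the header line the writer emits (an assumption about the output format, not Biostrings' code).

```r
# Description built the way the quoted code does, with a leading ">".
# "gene1" and "tx1" are placeholder IDs for illustration only.
geneDesc <- paste(">", "gene1", "|", "tx1", sep = "")

# A writer that adds ">" itself would then produce a doubled prefix.
doubled <- paste0(">", geneDesc)

# Fix: build the description without ">" (or strip one that is there).
geneDesc <- sub("^>", "", geneDesc)
header <- paste0(">", geneDesc)
```

Building geneDesc as paste(gene.id, "|", ..., sep="") from the start avoids the sub() step entirely.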

Assuming your seqs and desc variables are as I wrote above, just use
the example as I gave it ... it'll work.

-steve

btw - I'm not sure cross-posting to r-help is necessary, as this is
BioC-specific, so I removed it from the reply.

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

Contact Info: http://cbio.mskcc.org/~lianos
