[BioC] fasta sequence is too long to be read

Steve Lianoglou mailinglist.honeypot at gmail.com
Tue Nov 29 19:06:38 CET 2011


Hi,

On Tue, Nov 29, 2011 at 12:32 PM, wang peter <wng.peter at gmail.com> wrote:
> hello, all
>
> i met this problem
>
> rm(list=ls())
> library(ShortRead);
> fastafile="unigenes.fasta"
> seqs <- readFasta(fastafile);
>
>
> Error in .read.fasta.in.XStringSet(efp_list, nrec, skip, use.names,
> elementType,  :
>  reading FASTA file unigenes.fasta: cannot read line 474, line is too long

How long is it?
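
One way to check, as a rough sketch: `line_length` below is a
hypothetical helper name (not part of any package) that uses awk to
print the number of characters on a given line of a file.

```shell
# Hypothetical helper: print the number of characters on line N of a file.
# Usage: line_length FILE N
line_length() {
    # Stop reading as soon as the requested line has been reported.
    awk -v n="$2" 'NR == n { print length($0); exit }' "$1"
}
```

For the error above that would be something like:

$ line_length unigenes.fasta 474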

You can always try opening the file in your favorite editor and
introducing a line break there to split the sequence into two
lines, perhaps.

I suspect you can use the *nix `fold` command line utility to ensure
that none of your lines is longer than, say, 100 chars, e.g. from the
command line:

$ fold -w 100 unigenes.fasta > unigenes.fold.fasta

Just make sure that none of your description lines in the fasta file
(the ones that start with ">whatever") are longer than whatever you
set `-w` to be; otherwise `fold` will wrap those too, and the
spilled-over text will be read as sequence.
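
If some description lines are too long for that, a possible
alternative is to wrap only the sequence lines. The following is a
minimal sketch, not something I've run on your file: `wrap_fasta` is a
hypothetical helper name, and it assumes description lines start with
">" in the first column.

```shell
# Hypothetical helper: wrap only the sequence lines of a FASTA file,
# leaving ">" description lines untouched.
# Usage: wrap_fasta FILE [WIDTH]   (WIDTH defaults to 100)
wrap_fasta() {
    awk -v w="${2:-100}" '
        /^>/ { print; next }    # description line: pass through as-is
        {
            # Sequence line: emit it in chunks of at most w characters.
            # Empty lines produce no output, so blank lines are dropped.
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)
        }
    ' "$1"
}
```

It would be used much like the `fold` call:

$ wrap_fasta unigenes.fasta 100 > unigenes.fold.fasta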

HTH,
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


