[BioC] A problem

Martin Morgan mtmorgan at fhcrc.org
Wed Aug 21 13:35:19 CEST 2013


On 08/20/2013 01:52 AM, tooba AbaC wrote:
> Dear Sir/Madam,
>
> I have a problem with the "readDNAStringSet" function. I've used this function
> on a file that contains a number of sequences in FASTA format (Glycolysis
>   Gluconeogenesis - Bos taurus.txt). But the function doesn't work for
> another similar file (tca.txt).
> The error is:
> "Error in .Call2("read_fasta_in_XStringSet", efp_list, nrec, skip,
> use.names,
> reading FASTA file tca.txt: ">" expected at beginning of line 1"
>
> I'm sure about the format (both of them are FASTA)! I've checked it! The
> second file looks exactly like the first one!
> My second file contains a Unicode character that marks
> the byte order of the file (U+FEFF). One of my friends inspected
> the file with the Linux tool "less", which shows the byte order mark. After
> removing it (e.g. "tail --bytes=+4 tca.txt > tca2.txt"; result attached)
> everything works as expected.
> What's the problem with the readDNAStringSet function?

It's expecting plain ASCII characters, not UTF-8 with a byte order mark or some
other encoding; this is a very standard assumption in bioinformatics. With the
mark present, the first bytes of the file are not ">", hence the error. Likely
you opened your file in an editor that changed its encoding; don't do that, or
if you do, be sure to save the file as 'plain text'.
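
For anyone who would rather script the clean-up than use "tail", here is a
minimal sketch in Python. It assumes the stray mark is the 3-byte UTF-8
encoding of U+FEFF (EF BB BF), which matches the three bytes that the
"tail --bytes=+4" command skips. The names strip_utf8_bom and clean_fasta are
hypothetical helpers for illustration; they are not part of Biostrings or any
other Bioconductor package.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte order mark, if one is present.

    Assumption: the mark is the UTF-8 form of U+FEFF (bytes EF BB BF).
    Other encodings (e.g. UTF-16) are not handled here.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def clean_fasta(src: str, dst: str) -> bool:
    """Copy src to dst, dropping a leading UTF-8 byte order mark.

    Returns True if a mark was found and removed, False otherwise.
    Files are opened in binary mode so no other bytes are altered.
    """
    with open(src, "rb") as fh:
        raw = fh.read()
    cleaned = strip_utf8_bom(raw)
    with open(dst, "wb") as fh:
        fh.write(cleaned)
    return cleaned != raw
```

Something like clean_fasta("tca.txt", "tca2.txt") should then leave a file
whose first byte is ">", which is what the FASTA reader is looking for.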

Martin

>
> Regards,
> Tooba Abbassi Daloii
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


