[R] big data file getting truncated

Philipp Pagel p.pagel at gsf.de
Wed Aug 13 10:33:10 CEST 2003


> I used the following commands
>  mydata <- read.table("dataALLAMLtrain.txt", header=TRUE, sep="\t", row.names=NULL)
> It reads data without any error
> Now if I use
> edit(mydata)
> It shows only 3916 entries, whereas the actual file contains 7129 entries.
> So it seems R is truncating the data. How  can I load the complete file?

Others have already recommended checking the size of the data.frame
with dim() and the file with wc. If it turns out that there really is
a difference in size, the next thing would be to get an idea which lines
are affected: are "random" lines missing, or is everything ok up to line
3916 and then it stops? In either case - have a close look at the
missing lines, or at the last line present plus the first one missing:
is there anything special about them?
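A minimal sketch of that check (using the file name from the original
post):

```r
# Compare the number of rows R actually read with the number of
# data lines in the file (minus one for the header).
mydata <- read.table("dataALLAMLtrain.txt", header = TRUE,
                     sep = "\t", row.names = NULL)
dim(mydata)                                    # rows and columns as seen by R
length(readLines("dataALLAMLtrain.txt")) - 1   # data lines actually in the file
```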

But actually I have a feeling that this may be your problem:

read.table uses both '"' and "'" for quoting by default. Gene
descriptions love to contain things like "5'" and "3'".
=> Try quote='' in the read.table call.
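Something along these lines (again assuming the file name from the
original post) should do it, and count.fields() can help confirm the
diagnosis:

```r
# Disable quoting entirely so stray ' or " characters in the gene
# descriptions cannot swallow the lines that follow them.
mydata <- read.table("dataALLAMLtrain.txt", header = TRUE,
                     sep = "\t", row.names = NULL, quote = "")

# With quoting disabled, every line should report the same field count;
# run it again without quote = "" and merged lines show up as lines
# with far too many fields.
table(count.fields("dataALLAMLtrain.txt", sep = "\t", quote = ""))
```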


Dr. Philipp Pagel                                Tel.  +49-89-3187-3675
Institute for Bioinformatics / MIPS              Fax.  +49-89-3187-3585
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1
85764 Neuherberg, Germany
