[R] problem formatting data frames

Thomas Lumley tlumley at u.washington.edu
Wed Jul 17 17:54:27 CEST 2002


On Wed, 17 Jul 2002 VBMorozov at lbl.gov wrote:

>
>  Dear R-guRus:
> I have a problem with the format of my data in R.
> Let's say I have a HUGE text table which consists of columns of
> numerical data, separated by tabs, but in some places rows of text
> (error messages, etc) are inserted in between rows of numerical data.
> Because the data file is so huge and because I have thousands of these
> files, it's impractical to go through these files manually and
> remove the text rows; I'd like R to do it for me.
> The following command works:
>
> MyDataFrame<-data.frame(read.table("MyFile"))
>
> but instead of numerical data in my frame I get "factor" data, because
> of these text inserts. How do I filter them out??

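The simplest case would be if the error messages always began with the
same character (e.g. "E").  In that case you could use comment.char="E" in
read.table to say that lines beginning with E are comments.  Bear in mind
that read.table ignores everything from the comment character to the end of
any line, so this is only safe if that character never appears in the
numeric data itself (e.g. in exponents such as 1.5E-3).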
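A minimal sketch of the comment.char approach, assuming every error line
starts with an uppercase "E" and the numeric columns never contain one.  The
sample lines and the temporary file are made up for illustration; with real
data you would pass your own file name instead.

```r
# Hypothetical sample input: tab-separated numbers with error lines that
# all begin with "E" (stands in for one of the real data files).
sample_lines <- c("1\t2.5\t3",
                  "ERROR: sensor timeout",
                  "4\t5.0\t6",
                  "7\t8.5\t9")
tmp <- tempfile()
writeLines(sample_lines, tmp)

# Everything from "E" to the end of a line is treated as a comment, so the
# error line becomes blank and is skipped; the remaining columns are numeric.
MyDataFrame <- read.table(tmp, sep = "\t", comment.char = "E")
unlink(tmp)
str(MyDataFrame)
```

Since read.table already returns a data frame, there is no need to wrap the
call in data.frame().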

Otherwise you will probably need to read the file line by line and remove
the error messages.  The most computationally efficient solution would
probably be to use something like Perl to preprocess the file, but you
could do it in R.

For example:

  Read the file as lines of text
  Use grep() to find which lines contain only numbers
  Write those lines to a temporary file
  Read the temporary file with read.table()
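The four steps above can be sketched in R roughly as follows.  This assumes
whitespace-separated (tab or space) numeric columns and treats any line that
is not made up entirely of numbers as an error message; the function name
read_numeric_lines and the regular expression are illustrative, not part of
R itself.

```r
# Hypothetical helper implementing the four steps; extra arguments are
# passed on to read.table().
read_numeric_lines <- function(file, ...) {
  # Step 1: read the file as lines of text
  lines <- readLines(file)
  # Step 2: keep only lines consisting of numbers separated by whitespace
  # (optional sign, optional decimal point, optional exponent)
  num <- "[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?"
  pattern <- paste0("^[[:space:]]*", num,
                    "([[:space:]]+", num, ")*[[:space:]]*$")
  keep <- grep(pattern, lines)
  # Step 3: write those lines to a temporary file (removed on exit)
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(lines[keep], tmp)
  # Step 4: read the temporary file with read.table()
  read.table(tmp, ...)
}

# Usage on a made-up file with an error message in the middle
demo <- tempfile()
writeLines(c("1\t2.5\t3",
             "Warning: bad reading at t=12",
             "4\t5\t6",
             "-7\t8.5e-1\t9"), demo)
MyDataFrame <- read_numeric_lines(demo)
unlink(demo)
str(MyDataFrame)
```

Unlike the comment.char trick, this does not depend on the error messages
sharing a first character, at the cost of reading each file twice.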

Something similar is done by read.fwf(), which reads fixed format data
files.
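For comparison, a minimal read.fwf() call on a made-up fixed-format file;
the widths argument gives the number of characters in each field.  The file
contents here are purely illustrative.

```r
# Hypothetical fixed-format file: a 2-character field followed by a
# 3-character field on each line, with no separators.
fwf_file <- tempfile()
writeLines(c("101.5", "202.5"), fwf_file)

# Split each line into fields of 2 and 3 characters
fixed <- read.fwf(fwf_file, widths = c(2, 3))
unlink(fwf_file)
str(fixed)
```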


	-thomas

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


