[R] R tools for large files

David Khabie-Zeitoune dave at evocapital.com
Tue Aug 26 11:03:04 CEST 2003


A starting point might be the string-splitting function strsplit.

For example,

> X = c("1,4,5", "1,2,5", "5,1,2")
> strsplit(X, ",")
[[1]]
[1] "1" "4" "5"

[[2]]
[1] "1" "2" "5"

[[3]]
[1] "5" "1" "2"

This returns a list of the parsed vectors. Next you can do something
like:
> Z = data.frame(matrix(unlist(strsplit(X, ",")), nrow = 3, byrow=TRUE))
> Z
  X1 X2 X3
1  1  4  5
2  1  2  5
3  5  1  2 
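
One caveat on the above: strsplit returns strings, so the columns of Z hold text rather than numbers. A minimal sketch of two ways to end up with numeric columns follows. It assumes every element of X has the same number of fields, and the names parts, M, Z1 and Z2 are only illustrative. The second route relies on the text argument of read.csv, which is not available in very old versions of R; there, read.csv(textConnection(X), header = FALSE) should do the same job.

```r
X <- c("1,4,5", "1,2,5", "5,1,2")

# Route 1: split on commas, convert to numbers, fill a matrix row by row.
# Assumes every string has the same number of comma-separated fields.
parts <- strsplit(X, ",")
M <- matrix(as.numeric(unlist(parts)), nrow = length(X), byrow = TRUE)
Z1 <- data.frame(M)

# Route 2: treat the character vector as if it were the lines of a file,
# so the usual type conversion of read.csv applies.
Z2 <- read.csv(text = X, header = FALSE)
```

Route 1 names the columns X1, X2, X3 and Route 2 names them V1, V2, V3; either can be changed afterwards with names().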




-----Original Message-----
From: Ted.Harding at nessie.mcc.ac.uk [mailto:Ted.Harding at nessie.mcc.ac.uk]
Sent: 26 August 2003 09:00
To: R-help
Subject: Re: [R] R tools for large files


This has been an interesting thread! My first reaction to Murray's query
was to think "use standard Unix tools, especially awk", 'awk' being a
compact, fast, efficient program with great powers for processing lines
of data files (and in particular extracting, subsetting and transforming
database-like files e.g. CSV-type).

Of course, that became a sub-thread in its own right.

But -- and here I know I'm missing a trick which is why I'm responding
now so that someone who knows the trick can tell me -- while I normally
use 'awk' "externally" (i.e. I filter a data file through an 'awk'
program outside of R and then read the resulting file into R), I began
to think about doing it from within R.

Something on the lines of

  X <- system("cat raw_data | awk '...' ", intern=TRUE)

would create an object X which is a character vector, each element of
which is one line from the output of the command "cat ...... ".

E.g. if "raw_data.csv" starts out as

  1,2,3,4,5
  1,3,4,2,5
  5,4,3,2,1
  5,3,4,1,2

then

  X<-system("cat raw_data.csv |
  awk 'BEGIN{FS=\",\"}{if($3>$2){print $1 \",\" $4 \",\" $5}}'",
  intern=TRUE)

gives

  > X
  [1] "1,4,5" "1,2,5" "5,1,2"
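
As an aside, the intermediate character vector can probably be avoided altogether by handing the command to pipe() and reading from that connection. The sketch below is not from the original message. It assumes a Unix-like system with awk on the PATH, writes the four sample rows to a temporary file standing in for raw_data.csv, and uses awk's -F, option in place of the BEGIN block. The names raw_file, cmd and Z are illustrative.

```r
# Write the four sample rows to a temporary file (stand-in for raw_data.csv).
raw_file <- tempfile(fileext = ".csv")
writeLines(c("1,2,3,4,5", "1,3,4,2,5", "5,4,3,2,1", "5,3,4,1,2"), raw_file)

# Same filter as above: keep rows where field 3 exceeds field 2,
# and print fields 1, 4 and 5 separated by commas.
cmd <- sprintf("awk -F, '$3 > $2 { print $1 \",\" $4 \",\" $5 }' %s",
               shQuote(raw_file))

# pipe() exposes the command's output as a connection; read.csv opens it,
# reads the lines as a data frame, and closes it again.
Z <- read.csv(pipe(cmd), header = FALSE)
```

Here Z should be a data frame with integer columns V1, V2, V3, one row per line that awk printed.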

Now my Question:
How do I convert X into the dataframe I would have got if I had read
this output from a file instead of into the character vector X?

In other words, how to convert a vector of character strings, each of
which is in comma-separated format as above, into the rows of a
data-frame (or matrix, come to that)?

With thanks,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 26-Aug-03                                       Time: 08:59:48
------------------------------ XFMail ------------------------------
