[R] gsub with regular expression

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jun 25 17:11:21 CEST 2010


On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk
<residuo.solow at gmail.com> wrote:
> If I have a text with 7 words per line and I would like to put first
> and second word joined in a vector and the rest of words one per
> column in a matrix how can I do it?
>
> First 2 lines of my text file:
> "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido"
> "2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"
>
> Results:
>
> Vector:
> 2008/12/31 12:23:31
> 2010/02/01 02:35:31
>
> Matrix
> "numero" 343.233.233 "Rodeo"   "Vaca"   "Ruido"
> "palabra" 111.111.222 "abejorro" "Rodeo" "Vaca"
>

Here are two solutions.  Both solutions are three statements long
(read in the data, display the vector, display the matrix).  To read
from a file instead, replace textConnection(Lines) with "myfile.dat",
say, in each.
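As a rough illustration of that substitution, the sketch below reads the same two lines once from a text connection and once from a file. The temporary file is only a stand-in for "myfile.dat" and is an assumption made for the example.

```r
# Sample data from the question, as one string with an embedded newline
Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

# Read from a text connection (closing it explicitly afterwards)
con <- textConnection(Lines)
from_text <- readLines(con)
close(con)

# Read from a file; tempfile() stands in for "myfile.dat" (hypothetical name)
f <- tempfile(fileext = ".dat")
writeLines(Lines, f)
from_file <- readLines(f)

# Both routes give the same character vector, one element per line
identical(from_text, from_file)
```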

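1. Here is a sub solution:

# Lines is the two-line character string defined in solution 2 below
L <- readLines(textConnection(Lines))
sub("(\\S+ \\S+) .*", "\\1", L)  # the vector: first two words of each line
sub("\\S+ \\S+ ", "", L)         # the remaining words, one string per line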
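The second sub() call returns one string per line rather than a matrix with one word per column. If a real matrix is wanted, one possible sketch (assuming every line has the same number of words, separated by single spaces, as in the sample data) is to split the remainder with strsplit() and bind the pieces row-wise. The names stamp, rest and m are just illustrative.

```r
# The two sample lines from the question, one element per line
L <- c("2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido",
       "2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca")

# Vector: keep only the first two words of each line
stamp <- sub("^(\\S+ \\S+) .*$", "\\1", L)

# Remaining words: drop the first two words, then split on single spaces
rest <- sub("^\\S+ \\S+ ", "", L)
m <- do.call(rbind, strsplit(rest, " ", fixed = TRUE))

stamp  # character vector of length 2
m      # 2 x 5 character matrix, one word per column
```

Note that m is a character matrix, so 343.233.233 is kept as a string rather than a number.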


2. Here is a solution using zoo:

Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

library(zoo)

z <- read.zoo(textConnection(Lines), index = 1:2,
           FUN = function(x) paste(x[,1], x[,2]))

time(z) # the vector
coredata(z) # the matrix


Another possibility would be to convert to chron or POSIXct at the
same time as reading it in:

# chron
library(chron)
z <- read.zoo(textConnection(Lines), index = 1:2,
 FUN = function(x) as.chron(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))

# POSIXct
z <- read.zoo(textConnection(Lines), index = 1:2,
 FUN = function(x) as.POSIXct(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))
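For comparison, the date-time conversion on its own can be sketched in base R without zoo or chron. This is only an illustration: the time zone is not stated in the question, so tz = "UTC" below is an assumption, and stamp and tt are illustrative names.

```r
# The two sample lines from the question, one element per line
L <- c("2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido",
       "2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca")

# First two words of each line: the date and the time
stamp <- sub("^(\\S+ \\S+) .*$", "\\1", L)

# Convert to POSIXct; tz = "UTC" is an assumption, not given in the question
tt <- as.POSIXct(stamp, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

tt        # date-time vector
diff(tt)  # elapsed time between the two records
```

Once the values are POSIXct rather than character, they sort and subtract as times, which is the main reason to convert while reading.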


