[R] Split data frame and create a new column

arun smartpink111 at yahoo.com
Sat Nov 17 17:59:47 CET 2012


Hi,

Just a modification of Rui's function:

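# x is the character vector defined in Rui's message quoted below
fun1 <- function(x) {
  # split away the lag prefix, station letter and the ignored max/mean/10 parts
  r1 <- unlist(strsplit(x, "L\\d+|G|P|S|max|mean|10"))
  r1 <- r1[r1 != ""]
  r2 <- r1[!grepl("_", r1)]   # pollutant names
  r3 <- integer(length(x))    # lag defaults to 0; becomes character once strings are assigned
  r3[grepl("^L\\d+", x)] <- gsub("L(\\d+).*", "\\1", x[grep("^L\\d+", x)])
  r3[grepl("_\\d+$", x)] <- gsub("_", "", r1[grepl("_", r1)])
  r4 <- gsub(".*(G|P|S).*", "\\1", x)   # station letter
  data.frame(col1 = r2, col2 = r3, col3 = r4)
}
fun1(x)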
# col1 col2 col3
#1   o3    1    G
#2   o3    1    P
#3   o3    2    G
#4  nox    0    P
#5 pm25   01    S
#6   co   03    S
#7  nox   04    P
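
The same split can also be sketched with a single anchored pattern and sub() back-references. This is only a sketch, not a tested general solution: the name fun2 and the column names are my own, and it assumes every string has the layout optional L<digits>, pollutant name, optional max/mean, station letter (G, P or S), then either 10 or _<digits>. Pollutant names that themselves contain a capital G, P or S would need a different pattern.

```r
# Sample strings from the original question
x <- c("L1o3maxG10", "L1o3P10", "L2o3G10", "noxP10",
       "pm25S_01", "comeanS_03", "noxP_04")

# fun2 is a hypothetical name; the layout encoded in `pat` is an assumption
fun2 <- function(x) {
  # groups: 2 = lag prefix digits, 3 = pollutant, 5 = station, 7 = lag suffix digits
  pat <- "^(L(\\d+))?(.+?)(max|mean)?([GPS])(10|_(\\d+))$"
  ok <- grepl(pat, x, perl = TRUE)
  if (!all(ok)) stop("unexpected format: ", paste(x[!ok], collapse = ", "))
  prefix <- sub(pat, "\\2", x, perl = TRUE)   # "" when there is no L prefix
  suffix <- sub(pat, "\\7", x, perl = TRUE)   # "" when there is no _NN suffix
  # prefer the prefix, then the suffix, otherwise lag 0
  lag <- ifelse(nzchar(prefix), prefix, ifelse(nzchar(suffix), suffix, "0"))
  data.frame(pollutant = sub(pat, "\\3", x, perl = TRUE),
             lag = lag,
             station = sub(pat, "\\5", x, perl = TRUE),
             stringsAsFactors = FALSE)
}

res <- fun2(x)
res
```

Unlike fun1, this version stops with an error when a string does not fit the assumed layout, instead of silently returning misaligned columns.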

A.K.



----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: Zlatan <pollaroid at gmail.com>
Cc: r-help at r-project.org
Sent: Saturday, November 17, 2012 10:22 AM
Subject: Re: [R] Split data frame and create a new column

Hello,

I don't know if this is general-purpose, but try:


x <- scan(what = "character", text="
L1o3maxG10
L1o3P10
L2o3G10
noxP10
pm25S_01
comeanS_03
noxP_04")

fun <- function(x){
    r1 <- unlist(strsplit(x, "L[[:digit:]]+|G|P|S"))
    r1 <- r1[nchar(r1) != 0]
    r1 <- r1[rep(c(TRUE, FALSE), length(r1)/2)]
    r1 <- unlist(strsplit(r1, "max|mean"))
    r1 <- r1[nchar(r1) != 0]

    r2 <- integer(length(x))
    w2 <- grep("L[[:digit:]]+", x)
    re2 <- regexpr("L[[:digit:]]+", x)
    re2 <- unlist(strsplit(regmatches(x, re2), "L"))
    re2 <- re2[nchar(re2) != 0]
    r2[w2] <- re2
    w2 <- grep("G_|P_|S_", x)
    re2 <- regmatches(x, regexpr("(G_|P_|S_)[[:digit:]]+", x))
    re2 <- unlist(strsplit(re2, "G_|P_|S_"))
    re2 <- re2[nchar(re2) != 0]
    r2[w2] <- re2

    r3 <- regmatches(x, regexpr("G|P|S", x))

    data.frame(r1, r2, r3)
}

fun(x)
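
A side note on why the lag column can hold values like "01" next to "0": the vector starts out as integer zeros, but assigning character values into part of it coerces the whole vector to character. A tiny sketch with made-up values (the name lag_demo is only for illustration):

```r
lag_demo <- integer(3)              # 0 0 0, stored as integer
lag_demo[c(1, 3)] <- c("01", "2")   # assigning strings coerces the whole vector
lag_demo                            # "01" "0" "2"
class(lag_demo)                     # "character"
```

So the lag column is a character column; wrap it in as.integer() if numeric lags are wanted, at the cost of losing the leading zeros that distinguish the averaged lags.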


Hope this helps,

Rui Barradas
On 16-11-2012 00:05, Zlatan wrote:
> I need to split a data frame column into 3 columns. The column I want to split
> contains the lag index (prefix L1 or L2, or suffix 01, 03, 04), the station
> name (shown in the sample data as capital G, P and S) and the pollutant
> name. Names with no “L” prefix or 01/03/04 suffix are lag 0. Lag 01 is the average
> of lags 0 and 1, and 04 is the average of days 0 to 4. How can one do that in R?
> I will ignore the other components (e.g. 10, max or mean).
> 
> 
> 
> Current state:
> 
> L1o3maxG10
> L1o3P10
> L2o3G10
> noxP10
> pm25S_01
> comeanS_03
> noxP_04
> 
> What I want to get:
> 
> pollutant  Lag    station
> o3    1    G
> o3    1    P
> o3    2    G
> nox    0    P
> pm25    01    S
> co    03    S
> nox    04    P
> 
> 
> Thanks
> 
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Split-data-frame-and-create-a-new-column-tp4649683.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
