[R] Combining CSV data

arun smartpink111 at yahoo.com
Tue Jun 11 23:09:45 CEST 2013




HI,
You could use:
result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
colnames(result3)[5:7]<- paste0("DataComment",1:3)
A.K.
________________________________
From: Shreya Rawal <rawal.shreya at gmail.com>
To: arun <smartpink111 at yahoo.com> 
Sent: Tuesday, June 11, 2013 4:22 PM
Subject: Re: [R] Combining CSV data



Hey Arun,

I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))

So with these commands I am able to get the data in this format:

> fileA <- read.csv(text = "Row_ID_CR,   Data1,    Data2,    Data3
+ 1,                   aa,          bb,          cc
+ 2,                   dd,          ee,          ff", as.is = TRUE)
> 
> fileB <- read.csv(text = "Row_ID_N,   Src_Row_ID,   DataN1
+ 1a,               1,                   This is comment 1
+ 2a,               1,                   This is comment 2
+ 3a,               2,                   This is comment 1
+ 4a,               1,                   This is comment 3", as.is = TRUE)
> 
> # get rid of leading/trailing blanks on comments
> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
> 
> # merge together
> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
> 
> # now partition by Row_ID_CR and aggregate the comments
> result2 <- do.call(rbind, 
+     lapply(split(result, result$Row_ID_CR), function(.grp){
+         cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
+     })
+ )

Row_ID_CR                 Data1        Data2        Data3                                                 comment
1         1                    aa           bb           cc                                              This is comment 1| This is comment 2| This is comment 3
2         2                    dd           ee           ff                                                This is comment 1| This is Comment 2

I can even split the last column by this: strsplit(as.character(result2$comment), split='\\|')

[[1]]
[1] "This is comment 1" "This is comment 2" " This is comment 3"

[[2]]
[1] "This is comment 1" "This is comment 2"


but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)

Thanks!!
Shreya






On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink111 at yahoo.com> wrote:

Hi,
>If the dataset is like this with the comments in the order:
>
>dat2<-read.table(text="
>Row_ID_N,  Src_Row_ID,  DataN1
>1a,              1,                  This is comment 1
>2a,              1,                  This is comment 2
>3a,              2,                  This is comment 1
>4a,              1,                  This is comment 3
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>dat3<-read.table(text="
>Row_ID_N,  Src_Row_ID,  DataN1
>1a,              1,                  This is comment 1
>2a,              1,                  This is comment 2
>3a,              2,                  This is comment 1   #
>
>4a,              1,                  This is comment 3
>5a,         2,                  This is comment 2  #
>
>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>
>library(stringr)
>library(plyr)
>fun1<- function(data1,data2){
>    data2$DataN1<- str_trim(data2$DataN1)  
>        res<- merge(data1,data2,by.x=1,by.y=2)
>    res1<- res[,-5]
>    res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>    Mx1<- max(sapply(res2[,5],length))
>    res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>                                  c(x,rep(NA,Mx1-length(x)))
>
>                                  })),stringsAsFactors=FALSE)
>    colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>    res3
>    }   
>
>     
>fun1(dat1,dat2)
>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>#1         1                   aa           bb           cc This is comment 1
>
>#2         2                   dd           ee           ff This is comment 1
>#       DataComment2      DataComment3
>#1 This is comment 2 This is comment 3
>#2              <NA>              <NA>
>
> fun1(dat1,dat3)
>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>#1         1                   aa           bb           cc This is comment 1
>
>#2         2                   dd           ee           ff This is comment 1
> #      DataComment2      DataComment3
>#1 This is comment 2 This is comment 3
>
>#2 This is comment 2              <NA>
>
>
>Otherwise, you need to provide an example that matches the real dataset.
>A.K.
>
>________________________________
>From: Shreya Rawal <rawal.shreya at gmail.com>
>To: arun <smartpink111 at yahoo.com>
>Cc: R help <r-help at r-project.org>
>Sent: Tuesday, June 11, 2013 12:22 PM
>
>Subject: Re: [R] Combining CSV data
>
>
>
>Hi Arun,
>
>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
>
>Thanks for your help, I appreciate!   
>
>
>
>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink111 at yahoo.com> wrote:
>
>HI,
>>I am not sure about your DataN1 column.  If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
>>  My previous solution is not helpful in situations like these:
>>
>>dat2<-read.table(text="
>>Row_ID_N,  Src_Row_ID,  DataN1
>>1a,              1,                  This is comment 1
>>2a,              1,                  This is comment 2
>>3a,              2,                  This is comment 2
>>4a,              1,                  This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>dat3<-read.table(text="
>>
>>Row_ID_N,  Src_Row_ID,  DataN1
>>1a,              1,                  This is comment 1
>>2a,              1,                  This is comment 2
>>3a,              2,                  This is comment 3
>>4a,              1,                  This is comment 3
>>5a,         2,                  This is comment 2
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>
>>library(stringr)
>>library(plyr)
>>fun1<- function(data1,data2){
>>    data2$DataN1<- str_trim(data2$DataN1)   
>>        res<- merge(data1,data2,by.x=1,by.y=2)
>>    res1<- res[,-5]
>>    res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>>    Mx1<- max(sapply(res2[,5],length))
>>    res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>>                                  indx<- as.numeric(gsub("[[:alpha:]]","",x))
>>                                  x[match(seq(Mx1),indx)]
>>                                  })),stringsAsFactors=FALSE)
>>
>>    colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>    res3
>>    }          
>>fun1(dat1,dat2)
>>
>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>#1         1                   aa           bb           cc This is comment 1
>>#2         2                   dd           ee           ff              <NA>
>>
>>#       DataComment2      DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 This is comment 2              <NA>
>> fun1(dat1,dat3)
>>
>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>#1         1                   aa           bb           cc This is comment 1
>>#2         2                   dd           ee           ff              <NA>
>>
>>#       DataComment2      DataComment3
>>#1 This is comment 2 This is comment 3
>>#2 This is comment 2 This is comment 3
>>
>>
>>
>>A.K.
>>
>>
>>----- Original Message -----
>>
>>From: arun <smartpink111 at yahoo.com>
>>To: Shreya Rawal <rawal.shreya at gmail.com>
>>Cc: R help <r-help at r-project.org>
>>Sent: Monday, June 10, 2013 6:41 PM
>>Subject: Re: [R] Combining CSV data
>>
>>Hi,
>>Try this:
>>
>>dat1<-read.table(text="
>>Row_ID_CR,  Data1,    Data2,    Data3
>>1,                  aa,          bb,          cc
>>2,                  dd,          ee,          ff
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>
>>dat2<-read.table(text="
>>Row_ID_N,  Src_Row_ID,  DataN1
>>1a,              1,                  This is comment 1
>>2a,              1,                  This is comment 2
>>3a,              2,                  This is comment 1
>>4a,              1,                  This is comment 3
>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>library(stringr)
>>dat2$DataN1<-str_trim(dat2$DataN1)
>>res<- merge(dat1,dat2,by.x=1,by.y=2)
>> res1<-res[,-5]
>>library(plyr)
>> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
>> res2
>> # Row_ID_CR                Data1        Data2        Data3
>>#1         1                   aa           bb           cc
>>#2         2                   dd           ee           ff
>>#                                                   DataN1
>>#1 This is comment 1, This is comment 2, This is comment 3
>>#2                                       This is comment 1
>>
>>
>>
>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
>> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>>res3
>>#  Row_ID_CR                Data1        Data2        Data3      DataComment1
>>#1         1                   aa           bb           cc This is comment 1
>>#2         2                   dd           ee           ff This is comment 1
>>#       DataComment2      DataComment3
>>#1 This is comment 2 This is comment 3
>>#2              <NA>              <NA>
>>
>>A.K.
>>
>>
>>----- Original Message -----
>>From: Shreya Rawal <rawal.shreya at gmail.com>
>>To: r-help at r-project.org
>>Cc:
>>Sent: Monday, June 10, 2013 4:38 PM
>>Subject: [R] Combining CSV data
>>
>>Hello R community,
>>
>>I am trying to combine two CSV files that look like this:
>>
>>File A
>>
>>Row_ID_CR,   Data1,    Data2,    Data3
>>1,                   aa,          bb,          cc
>>2,                   dd,          ee,          ff
>>
>>
>>File B
>>
>>Row_ID_N,   Src_Row_ID,   DataN1
>>1a,               1,                   This is comment 1
>>2a,               1,                   This is comment 2
>>3a,               2,                   This is comment 1
>>4a,               1,                   This is comment 3
>>
>>And the output I am looking for is, comparing the values of Row_ID_CR and
>>Src_Row_ID
>>
>>Output
>>
>>ROW_ID_CR,    Data1,    Data2,    Data3,    DataComment1,
>>DataComment2,          DataComment3
>>1,                      aa,         bb,         cc,        This is
>>comment1,    This is comment2,     This is comment 3
>>2,                      dd,          ee,         ff,          This is
>>comment1
>>
>>
>>I am a novice R user, I am able to replicate a left join but I need a bit
>>more in the final result.
>>
>>
>>Thanks!!
>>
>>    [[alternative HTML version deleted]]
>>
>>______________________________________________
>>R-help at r-project.org mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>>
>>
>



More information about the R-help mailing list