[R] splitting column into two

arun smartpink111 at yahoo.com
Mon Mar 11 07:15:49 CET 2013


Hi,

Try this:
dat1<- read.table(text="
 V1,V2,V3,V4,V5,V6,V7
 chr1,564563,564598,564588 564589,1336,+,134
 chr1,564620,564649,564644 564645,94,+,10
 chr1,565369,565404,565371 565372,217,+,8
 chr1,565463,565541,565480 565481,1214,+,15
 chr1,565653,565697,565662 565663,1031,+,28
 chr1,565861,565922,565883 565884,316,+,12
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(reshape2)
# Drop V4 (column 4) and append its two space-separated values as new columns
dat2<- cbind(dat1[,-4],colsplit(dat1$V4,pattern=" ",names=c("peak_start","peak_end")))
dat2
#     V1     V2     V3   V5 V6  V7 peak_start peak_end
#1  chr1 564563 564598 1336  + 134     564588   564589
#2  chr1 564620 564649   94  +  10     564644   564645
#3  chr1 565369 565404  217  +   8     565371   565372
#4  chr1 565463 565541 1214  +  15     565480   565481
#5  chr1 565653 565697 1031  +  28     565662   565663
#6  chr1 565861 565922  316  +  12     565883   565884
library(data.table)

datNew<- data.table(dat2)
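
If the new columns should sit where V4 was rather than at the end, something along these lines might work. This is only a sketch in base R on a cut-down toy version of the data (fewer rows and columns than above), and it assumes a plain data.frame, not a data.table, because the column indexing below behaves differently there. The names parts, peaks and pos are just for illustration.

```r
# Toy data in the same shape as above: V4 holds "start end" as one string.
dat1 <- data.frame(
  V1 = c("chr1", "chr1"),
  V2 = c(564563L, 564620L),
  V3 = c(564598L, 564649L),
  V4 = c("564588 564589", "564644 564645"),
  V5 = c(1336L, 94L),
  stringsAsFactors = FALSE
)

# Split V4 on the space; each list element is a character vector of length 2.
parts <- strsplit(dat1$V4, " ", fixed = TRUE)
peaks <- data.frame(
  peak_start = as.numeric(vapply(parts, `[`, character(1), 1)),
  peak_end   = as.numeric(vapply(parts, `[`, character(1), 2))
)

# Rebuild the data frame with the new columns in the slot V4 used to occupy.
pos  <- match("V4", names(dat1))
dat2 <- cbind(dat1[seq_len(pos - 1)], peaks, dat1[-seq_len(pos)])
names(dat2)
```

Looking the position up by name (match) rather than hard-coding 4 means the same lines should carry over to a wide table such as one with 625 columns. For a data.table, tstrsplit() together with := and setcolorder() would be another route, not shown here.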
A.K.



----- Original Message -----
From: "deconstructed.morning at gmail.com" <deconstructed.morning at gmail.com>
To: smartpink111 at yahoo.com
Cc: 
Sent: Sunday, March 10, 2013 5:48 PM
Subject: Re: splitting column into two

Hello,
I saw your solution for this question and I want to ask what I should do when I have a very large file that looks like this:


> clusters<-data.table(CTSS[, grep("V1$|V2$|V3$|V4$|V5$|V6$|V7$", names(CTSS))])
> head(clusters)
     V1     V2     V3            V4   V5 V6  V7
1: chr1 564563 564598 564588 564589 1336  + 134
2: chr1 564620 564649 564644 564645   94  +  10
3: chr1 565369 565404 565371 565372  217  +   8
4: chr1 565463 565541 565480 565481 1214  +  15
5: chr1 565653 565697 565662 565663 1031  +  28
6: chr1 565861 565922 565883 565884  316  +  12

What I want is to replace column V4, which contains two numbers separated by a space, with two numeric columns. I have tried this:
new <- cbind(CTSS,colsplit(CTSS$V4, ' ', c('peak_start', 'peak_end')) )

but instead of replacing the column, it keeps V4 as it is and adds the two new columns at the very end (after 625 columns). Please let me know if you have a better solution.

Thank you,
Nanami




<quote author='arun kirshna'>
Hi,
Maybe this helps:
dat1<-read.table(text="
0111 0214 0203 0404 1112 0513 0709 1010 0915 0813 
0112 0314 0204 0504 1132 0543 0789 1020 0965 0823
",sep="",header=FALSE,colClasses=rep("character",10)) 
res<-do.call(data.frame,lapply(dat1,function(x)
do.call(rbind,lapply(strsplit(x,""),function(y)
c(paste0(y[1],y[2]),paste0(y[3],y[4]))))))
colnames(res)<-paste0("V",1:20)
res
#  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
#1 01 11 02 14 02 03 04 04 11  12  05  13  07  09  10  10  09  15  08  13
#2 01 12 03 14 02 04 05 04 11  32  05  43  07  89  10  20  09  65  08  23
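
Since every field here has a fixed width of four characters, the same split could probably also be done with substr() instead of strsplit(). A rough sketch on a three-column subset of the data above (the name halves is just for illustration):

```r
dat1 <- read.table(text = "
0111 0214 0203
0112 0314 0204
", sep = "", header = FALSE, colClasses = rep("character", 3))

# For each column, take characters 1-2 and 3-4 as two separate columns.
halves <- lapply(dat1, function(x) cbind(substr(x, 1, 2), substr(x, 3, 4)))

# Bind the two-column pieces side by side and rename V1, V2, ...
res <- as.data.frame(do.call(cbind, halves), stringsAsFactors = FALSE)
colnames(res) <- paste0("V", seq_len(ncol(res)))
res
```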
A.K.
</quote>
Quoted from: 
http://r.789695.n4.nabble.com/splitting-column-into-two-tp4656108p4656111.html



