[R] Splitting columns and forming new data files in R

arun smartpink111 at yahoo.com
Tue Apr 8 15:46:24 CEST 2014


Hi,
Try:
#Tmin,Tmax,Tmean,Precip 

#"Tmean" -999.9 in all files
#working directory is "sample" 

#created folder "final"
list.files()
#[1] "coordinates.csv" "final"           "Precip"          "Tmax"
#[5] "Tmin"
Coord <- read.csv(list.files(pattern=".csv"),header=TRUE,stringsAsFactors=FALSE)
lfile <- list.files()[!grepl(".csv|final",list.files())]
files <-  paste(paste(getwd(),lfile,sep="/"), list.files(lfile),sep="/")
lst1 <- split(files,gsub(".*\\/(.*)\\.csv","\\1",files)) 

names1 <- gsub(".*\\/(.*)\\/.*\\.csv","\\1",lst1[[1]])
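
To see what the two gsub() calls pull out, here is a small sketch on made-up paths. The "/home/user/sample" prefix is only an assumption for illustration; the folder and file names follow the listing above.

```r
# Hypothetical full paths, in the order paste() would produce them
files <- c("/home/user/sample/Precip/G100.csv",
           "/home/user/sample/Tmax/G100.csv",
           "/home/user/sample/Tmin/G100.csv",
           "/home/user/sample/Precip/G101.csv",
           "/home/user/sample/Tmax/G101.csv",
           "/home/user/sample/Tmin/G101.csv")

# Site code: the file name without directory or extension
site <- gsub(".*\\/(.*)\\.csv", "\\1", files)

# Group the paths by site, one list element per site
lst1 <- split(files, site)

# Variable name: the folder that directly contains each file
names1 <- gsub(".*\\/(.*)\\/.*\\.csv", "\\1", lst1[[1]])

names(lst1)
names1
```

Each element of lst1 then holds the Precip, Tmax and Tmin paths for one site, and names1 labels them in that order.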
lst1New <- lapply(lst1, function(x) {
  # read each variable's file for this site; keep the first 104 columns
  lst2 <- setNames(lapply(x, function(y) {
    dat <- read.table(y, sep=" ", header=TRUE, stringsAsFactors=FALSE)
    dat[,1:104]
  }), names1)
  dat2 <- do.call(cbind, lst2)
  indx <- grepl("Sim", names(dat2))
  dat3 <- dat2[indx]           # simulation columns
  dat4 <- dat2[!indx][,1:4]    # first four non-simulation columns
  names(dat4) <- gsub(".*\\.", "", names(dat4))
  # one data frame per simulation
  lapply(split(names(dat3), gsub(".*\\.", "", names(dat3))), function(x) {
    dat5 <- cbind(dat4, dat3[,x])
    dat5$Tmean <- -999.9
    dat6 <- dat5[,c(1:4,7:6,8,5)]   # reorder to Tmin,Tmax,Tmean,Precip
    # coordinates of the site go into the names of columns 2 and 3
    colnames(dat6)[2:3] <- Coord[match(unique(dat6$Site), Coord$Site),3:2]
    dat7 <- dat6[,-4]
    colnames(dat7)[-(2:3)] <- NA
    dat7
  })
})

sapply(lst1New,length) 

#G100 G101 G102 G103 

# 100  100  100  100
lapply(names(lst1New), function(x) {
  nm1 <- paste(x, names(lst1New[[x]]), sep="_")
  nm2 <- paste0(paste(paste0(getwd(),"/final"), nm1, sep="/"), ".csv")
  lapply(seq_along(lst1New[[x]]), function(i) {
    x1 <- lst1New[[x]][i]
    write.table(x1, nm2[i], quote=FALSE, row.names=FALSE)
  })
})

length(list.files(paste0(getwd(),"/final")))
#[1] 400 
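
If the coordinates should appear as a plain first row with no column names at all, one possible alternative is to write the coordinate line first and then append the data. This is only a minimal sketch: write_with_coords() is a hypothetical helper, and the data frame, coordinates and file name below are made up for illustration rather than taken from the attached files.

```r
# Hypothetical helper: first line = lat/long, remaining lines = data, no header
write_with_coords <- function(dat, lat, long, path) {
  # coordinate row
  writeLines(paste(lat, long), con = path)
  # data rows appended below, without row or column names
  write.table(dat, path, append = TRUE, quote = FALSE,
              row.names = FALSE, col.names = FALSE)
}

# Made-up example data in the requested column order
dat <- data.frame(Year = 1961, Month = 1, Day = 1:3,
                  Tmin = -999.9, Tmax = -999.9, Tmean = -999.9,
                  Precip = c(0, 2.38, 0))

out <- file.path(tempdir(), "G100_sim001.csv")
write_with_coords(dat, lat = 49.53, long = -96.7, path = out)
readLines(out)
```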


A.K.

On Tuesday, April 8, 2014 1:17 AM, Zilefac Elvis <zilefacelvis at yahoo.com> wrote:

Hi AK,
Please I need your help. I finally solved the previous task I sent to you.
I have Precip,Tmin and Tmax in three different folders (attached).
Each folder has 4 files with identical names in all folders (we can match case).

Within each file are [,YYYY MM DD sim001...sim100] (some files may have more than 100 simulations; use only the first 100).

Q1) Open all three folders, go to file 1 (e.g. G100), copy column 1 (sim001, do not copy the date) and paste it in a new folder called "final". Do the same for column 2 (sim002),...,column 100 (sim100). So from file 1 alone with 100 sims, you will have 100 files in "final". The files in "final" should be labelled, for example, as G100_sim001, G100_sim002,...,G100_sim100; G101_sim001, G101_sim002 etc.

The format of all files in "final" is similar to:

50 -110.7
1961 1 1 -999.9 -999.9 -999.9 0
1961 1 2 -999.9 -999.9 -999.9 2.38
1961 1 3 -999.9 -999.9 -999.9 0
1961 1 4 -999.9 -999.9 -999.9 0
1961 1 5 -999.9 -999.9 -999.9 0
1961 1 6 -999.9 -999.9 -999.9 0
1961 1 7 -999.9 -999.9 -999.9 0
1961 1 8 -999.9 -999.9 -999.9 0
1961 1 9 -999.9 -999.9 -999.9 5.19
1961 1 10 -999.9 -999.9 -999.9 0
1961 1 11 -999.9 -999.9 -999.9 0
1961 1 12 -999.9 -999.9 -999.9 0
1961 1 13 -999.9 -999.9 -999.9 0
1961 1 14 -999.9 -999.9 -999.9 0
1961 1 15 -999.9 -999.9 -999.9 0


The columns after the date should be [Tmin,Tmax,Tmean,Precip]. Please do not include column names in output. Output files are .csv.

*Fill column "Tmean" with -999.9 in all files.

Therefore, using the sample I have provided, you will have 4sites*100 sims = 400 files in folder "final".
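
The naming scheme and file count described above can be sketched as follows. The site codes G100 to G103 are those of the sample, and the zero-padded "simNNN" labels are assumed from the example names.

```r
sites <- c("G100", "G101", "G102", "G103")
sims  <- sprintf("sim%03d", 1:100)   # zero-padded simulation labels

# All site/simulation combinations, with the site varying slowest
fnames <- paste0(rep(sites, each = length(sims)), "_", sims, ".csv")

head(fnames, 3)
length(fnames)   # 4 sites * 100 sims
```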



Q2) From the attached coordinates file, please copy the Lat and Long corresponding to the Site and paste them in the first row of every file starting with that site code.

For example, all files beginning with G100_sim... will have their first row similar to:

       49.53 -96.7
1961 1 1 -999.9 -999.9 -999.9 0


This looks very cumbersome for me to handle. 

Thanks very much.
Atem.



