[R] data format

arun smartpink111 at yahoo.com
Tue Feb 19 16:22:20 CET 2013


Hi,
Try this:
el <- read.csv("el.csv", header=TRUE, sep="\t", stringsAsFactors=FALSE)
# One data.frame per station
elsplit <- split(el, el$st)
# Every day from 1930-01-01 to 2010-12-31
datetrial <- data.frame(date1=seq.Date(as.Date("1930.1.1",format="%Y.%m.%d"), as.Date("2010.12.31",format="%Y.%m.%d"), by="day"))
# Build a Date from columns 2-4 (year, month, day), keep column 5 as discharge, then sort by date
elsplit1 <- lapply(elsplit, function(x) data.frame(date1=as.Date(paste(x[,2],x[,3],x[,4],sep="-"),format="%Y-%m-%d"), discharge=x[,5]))
elsplit2 <- lapply(elsplit1, function(x) x[order(x[,1]),])
library(plyr)
# Add the missing days (discharge is NA there), then fill NA with the character "-9999.000"
elsplit3 <- lapply(elsplit2, function(x) join(datetrial, x, by="date1", type="full"))
elsplit4 <- lapply(elsplit3, function(x) {x[,2][is.na(x[,2])] <- "-9999.000"; x})
# Format as "YYYY.MM.DD", then replace a leading 0 in month (char 6) or day (char 9) with a space
elsplit5 <- lapply(elsplit4, function(x) {x[,1] <- format(x[,1],"%Y.%m.%d"); x})
elsplit6 <- lapply(elsplit5, function(x) {
  substr(x[,1],6,6) <- ifelse(substr(x[,1],6,6)=="0", " ", substr(x[,1],6,6))
  substr(x[,1],9,9) <- ifelse(substr(x[,1],9,9)=="0", " ", substr(x[,1],9,9))
  x
})
 elsplit6[[1]][1:4,]
#       date1 discharge
#1 1930. 1. 1 -9999.000
#2 1930. 1. 2 -9999.000
#3 1930. 1. 3 -9999.000
#4 1930. 1. 4 -9999.000

 length(elsplit6)
#[1] 124
 tail(elsplit6[[124]],25)
#           date1 discharge
#29561 2010.12. 7 -9999.000
#29562 2010.12. 8 -9999.000
#29563 2010.12. 9 -9999.000
#29564 2010.12.10 -9999.000
#29565 2010.12.11 -9999.000
#29566 2010.12.12 -9999.000
#29567 2010.12.13 -9999.000
#29568 2010.12.14 -9999.000
#29569 2010.12.15 -9999.000
#29570 2010.12.16 -9999.000
#29571 2010.12.17 -9999.000
#29572 2010.12.18 -9999.000
#29573 2010.12.19 -9999.000
#29574 2010.12.20 -9999.000
#29575 2010.12.21 -9999.000
#29576 2010.12.22 -9999.000
#29577 2010.12.23 -9999.000
#29578 2010.12.24 -9999.000
#29579 2010.12.25 -9999.000
#29580 2010.12.26 -9999.000
#29581 2010.12.27 -9999.000
#29582 2010.12.28 -9999.000
#29583 2010.12.29 -9999.000
#29584 2010.12.30 -9999.000
#29585 2010.12.31 -9999.000

 str(head(elsplit6,3))
#List of 3
# $ AGOMO:'data.frame':    29585 obs. of  2 variables:
#  ..$ date1    : chr [1:29585] "1930. 1. 1" "1930. 1. 2" "1930. 1. 3" "1930. 1. 4" ...
#  ..$ discharge: chr [1:29585] "-9999.000" "-9999.000" "-9999.000" "-9999.000" ...
# $ AGONO:'data.frame':    29585 obs. of  2 variables:
#  ..$ date1    : chr [1:29585] "1930. 1. 1" "1930. 1. 2" "1930. 1. 3" "1930. 1. 4" ...
#  ..$ discharge: chr [1:29585] "-9999.000" "-9999.000" "-9999.000" "-9999.000" ...
# $ ANZMA:'data.frame':    29585 obs. of  2 variables:
#  ..$ date1    : chr [1:29585] "1930. 1. 1" "1930. 1. 2" "1930. 1. 3" "1930. 1. 4" ...
#  ..$ discharge: chr [1:29585] "-9999.000" "-9999.000" "-9999.000" "-9999.000" ...
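
Since el.csv is not shared on the list, here is a minimal, self-contained sketch of the gap-filling step on made-up data. It is not the code above: it swaps plyr's join() for base merge(), shortens the calendar to five days so the result is easy to inspect, and assumes the column names st, year, month, day and discharge. All values are hypothetical.

```r
# Toy stand-in for el.csv (hypothetical values; the real file was not shared).
el <- data.frame(
  st        = c("A", "A", "B"),
  year      = c(2004, 2004, 2009),
  month     = c(1, 1, 1),
  day       = c(1, 2, 1),
  discharge = c(232, 334, 323),
  stringsAsFactors = FALSE
)

# Short calendar for illustration; the thread uses 1930-01-01 to 2010-12-31.
calendar <- data.frame(
  date1 = seq(as.Date("2003-12-30"), as.Date("2004-01-03"), by = "day")
)

fill_station <- function(x) {
  obs <- data.frame(
    date1     = as.Date(paste(x$year, x$month, x$day, sep = "-"), format = "%Y-%m-%d"),
    discharge = x$discharge
  )
  # all.x = TRUE keeps every calendar day; days without data get NA.
  # Unlike a full join, observations outside the calendar are dropped.
  out <- merge(calendar, obs, by = "date1", all.x = TRUE)
  # Assigning a string coerces the whole discharge column to character.
  out$discharge[is.na(out$discharge)] <- "-9999.000"
  out
}

filled <- lapply(split(el, el$st), fill_station)
filled$A
```

With all.x = TRUE every station ends up with exactly one row per calendar day, which may or may not be what is wanted if a station has observations outside 1930 to 2010.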


As for the spacing between date1 and discharge, I haven't handled it, since you didn't mention whether it is needed in the data.frame or not.
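
If the spacing does matter, one possible approach is to build each output line as a fixed-width string. The sketch below is only an illustration on two made-up rows: the 9-character field width for discharge is an assumption, and sprintf() on the date parts is used here in place of the substr() replacement above.

```r
# Two example rows (hypothetical values); discharge is kept as character.
x <- data.frame(
  date1     = as.Date(c("1930-01-01", "2004-12-31")),
  discharge = c("-9999.000", "114.56"),
  stringsAsFactors = FALSE
)

d <- as.POSIXlt(x$date1)
# %2d pads one-digit months and days with a space instead of a zero.
date_lbl <- sprintf("%4d.%2d.%2d", d$year + 1900L, d$mon + 1L, d$mday)

# formatC() right-aligns the character values in a 9-character field
# (the width of 9 is an assumption, chosen to fit "-9999.000").
lines_out <- paste(date_lbl, formatC(x$discharge, width = 9))
lines_out

# A numeric -9999 prints without trailing zeros unless formatted explicitly.
missing_lbl <- formatC(-9999, format = "f", digits = 3)
missing_lbl
```

The resulting character vector could then be written out with writeLines(lines_out, "somefile.txt"), where the file name is just a placeholder.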

A.K.






________________________________
From: eliza botto <eliza_botto at hotmail.com>
To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
Sent: Tuesday, February 19, 2013 10:01 AM
Subject: RE:



THANKS ARUN..
IT'S A CHARACTER....
SORRY FOR NOT TELLING YOU IN ADVANCE

ELISA


> Date: Tue, 19 Feb 2013 07:00:03 -0800
> From: smartpink111 at yahoo.com
> Subject: Re: 
> To: eliza_botto at hotmail.com
> 
> 
> 
> Hi,
> One more doubt.
> You mentioned -9999.000.  Is it going to be a number or a character string like "-9999.000"?  If it is a number, the final result will show -9999.
> Arun
> 
> 
> 
> 
> ________________________________
> From: eliza botto <eliza_botto at hotmail.com>
> To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
> Sent: Tuesday, February 19, 2013 9:16 AM
> Subject: RE:
> 
> 
> 
> How can you be wrong, Arun?? You are right.....
> 
> elisa
> 
> 
> > Date: Tue, 19 Feb 2013 06:15:31 -0800
> > From: smartpink111 at yahoo.com
> > Subject: Re: 
> > To: eliza_botto at hotmail.com
> > 
> > Hi Elisa,
> > 
> > Just one question about the format of the date.  Is it the same format as the previous one, i.e. a 0 replaced by one space when the month or day is less than 10?  Also, if I am correct, there is one list element per station name, right?
> > Arun
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > ________________________________
> > From: eliza botto <eliza_botto at hotmail.com>
> > To: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com> 
> > Sent: Tuesday, February 19, 2013 8:35 AM
> > Subject: 
> > 
> > 
> > 
> > 
> > 
> > Dear Arun,
> > [A text file is also attached in case the formatting gets changed; el is the data file.]
> > Attached to this email is the Excel file which contains the data. The data has the following form:
> > 
> > col1.        col2.  col3.  col4.  col5.
> > stationname  year   month  day    discharge
> > A            2004   1      1      232
> > A            2004   1      2      334
> > .............................
> > ........................
> > B            2009   1      1      323
> > B            2009   1      2      332
> > 
> > 
> > The stations have data starting and ending in different years, but I want every station's series to start in 1930 and end in 2010, with -9999.000 for the days where data is missing. I want to make a list which looks like the following:
> > 
> > [[A]]
> > 1930. 1. 1 -9999.000
> > 1930. 1. 2 -9999.000
> > 1930. 1. 3 -9999.000
> > 1930. 1. 4 -9999.000
> > 1930. 1. 5 -9999.000
> > 1930. 1. 6 -9999.000
> > 1930. 1. 7 -9999.000
> > 1930. 1. 8 -9999.000
> > 1930. 1. 9 -9999.000
> > 1930. 1.10 -9999.000
> > 1930. 1.11 -9999.000
> > 1930. 1.12 -9999.000
> > 1930. 1.13 -9999.000
> > ....................
> > ....................
> > ....................
> > 2004. 1. 1   232.0
> > 2004. 1. 2   334.0
> > ..................
> > ..................
> > 2004.12. 1   113.56
> > ....
> > ...
> > 2004.12.31   114.56
> > 
> > [[B]]
> > 1930. 1. 1 -9999.000
> > 1930. 1. 2 -9999.000
> > 1930. 1. 3 -9999.000
> > 1930. 1. 4 -9999.000
> > 1930. 1. 5 -9999.000
> > 1930. 1. 6 -9999.000
> > 1930. 1. 7 -9999.000
> > 1930. 1. 8 -9999.000
> > 1930. 1. 9 -9999.000
> > 1930. 1.10 -9999.000
> > 1930. 1.11 -9999.000
> > 1930. 1.12 -9999.000
> > 1930. 1.13 -9999.000
> > ....................
> > ....................
> > ....................
> > 2007. 1. 1    23.0
> > 2007. 1. 2    33.0
> > ..................
> > ..................
> > 2007.12. 1   13.56
> > ....
> > ...
> > 2007.12.31    4.56
> > 
> > 
> > Besides the missing years at the start and end, there are stations like "BRRSD" where data exists only for the years 2001, 2002, 2009 and 2010. I want -9999.000 to be inserted for each day of 2003, 2004, 2005, 2006, 2007 and 2008, as data is not available for them.
> > The date format should be as written above. Just one request: please do not share my data file on the R forum.
> > 
> > Thank you so very much in advance
> > 
> > elisa


