[R] Melt and Rbind/Rbindlist

jim holtman jholtman at gmail.com
Sun Feb 1 01:22:49 CET 2015


It would have been nice if you had at least supplied a subset (~10 lines)
from a couple of files so we could see what the data looks like and test
out any solution. Since you are using 'data.table', you should probably
also use 'fread' for reading in the data. Here is a possible approach that
reads the data into a list and then combines it into a single, large
data.table:

-------
library(data.table)

myDTs <- lapply(filelist, function(.file) {
  tmp1 <- fread(.file, sep = ",")
  # wide -> long; name the value column "temp" so it can be averaged below
  tmp2 <- melt(tmp1, id.vars = "FIPS", value.name = "temp")
  # substr() positions assume read.csv-style names such as "X2045.01.01";
  # fread() does not add the "X" prefix, so check names(tmp1) and adjust
  tmp2[, `:=`(year  = as.numeric(substr(variable, 2, 5)),
              month = as.numeric(substr(variable, 7, 8)),
              day   = as.numeric(substr(variable, 10, 11)))]
  tmp2  # return value
})

bigDT <- rbindlist(myDTs)  # rbind all the data.tables together

# then you should be able to do:

mean.temp <- bigDT[, list(temp.mean = mean(temp)),
                   by = c("FIPS", "year", "month")]
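To try the steps above without the real files, here is a minimal, self-contained sketch. Two small in-memory tables stand in for the yearly CSVs that fread() would return; the FIPS codes, temperatures, the helper name reshape_year() and the "X2045.01.01"-style column names are all hypothetical, chosen only for illustration. It melts each year, stacks them once with rbindlist(), and takes monthly means per county:

```r
library(data.table)

# Hypothetical stand-ins for two yearly files: one row per county (FIPS),
# one column per day, named read.csv-style as "XYYYY.MM.DD".
yr2045 <- data.table(FIPS = c(1001, 1003),
                     X2045.01.01 = c(276, 280),
                     X2045.01.02 = c(278, 282),
                     X2045.02.01 = c(270, 272))
yr2046 <- data.table(FIPS = c(1001, 1003),
                     X2046.01.01 = c(274, 284),
                     X2046.01.02 = c(276, 286))

# Hypothetical helper: same per-file steps as the lapply() body above
reshape_year <- function(dt) {
  # wide -> long: one row per FIPS and day, value column named "temp"
  long <- melt(dt, id.vars = "FIPS", value.name = "temp")
  long[, `:=`(year  = as.numeric(substr(variable, 2, 5)),
              month = as.numeric(substr(variable, 7, 8)),
              day   = as.numeric(substr(variable, 10, 11)))]
  long
}

# lapply() collects each year's table in a list; rbindlist() stacks them once
bigDT <- rbindlist(lapply(list(yr2045, yr2046), reshape_year))

# mean temperature per county, year and month
mean.temp <- bigDT[, list(temp.mean = mean(temp)),
                   by = c("FIPS", "year", "month")]
print(mean.temp)
```

Binding once at the end, rather than growing a table inside the loop, avoids copying the accumulated data on every iteration.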


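As for the NULL in your own attempt: in R the value of a 'for' statement is always NULL, so 'mean.temp <- for (...) {...}' can never hold the means, and each pass also overwrites 'tmp2'. A minimal base-R sketch of the difference, using a trivial hypothetical per-iteration result (i and its square) in place of the per-file work:

```r
# The value of a for() statement is always NULL, whatever the body computes
res <- for (i in 1:3) { i^2 }
print(is.null(res))  # TRUE

# Instead, store one result per iteration in a pre-sized list ...
out <- vector("list", 3)
for (i in 1:3) out[[i]] <- data.frame(i = i, sq = i^2)

# ... or let lapply() build the list, then bind everything in one call
out2 <- lapply(1:3, function(i) data.frame(i = i, sq = i^2))
all_rows <- do.call(rbind, out2)  # data.table::rbindlist(out2) is generally faster
print(all_rows)
```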


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, Jan 31, 2015 at 5:57 PM, Shouro Dasgupta <shouro at gmail.com> wrote:

> I have climate data for 20 years for US counties (FIPS) in csv format; each
> file represents one year of data. I have extracted the data and reshaped
> the yearly data files using melt():
>
> for (i in filelist) {
>   tmp1 <- as.data.table(read.csv(i,header=T, sep=","))
>   tmp2 <- melt(tmp1, id="FIPS")
>   tmp2$year <- as.numeric(substr(tmp2$variable,2,5))
>   tmp2$month <- as.numeric(substr(tmp2$variable,7,8))
>   tmp2$day <- as.numeric(substr(tmp2$variable,10,11))
> }
>
>
> Should I *rbind* in the loop here, since I have the memory?
> So, for file i, tmp2 looks like this:
>
> FIPS     temp year month     date
> 1001 276.7936 2045     1 1/1/2045
> 1003 276.7936 2045     1 1/1/2045
> 1005 279.6452 2045     1 1/1/2045
> 1007 276.7936 2045     1 1/1/2045
> 1009 272.3748 2045     1 1/1/2045
> 1011 279.6452 2045     1 1/1/2045
>
>
> My goal is to calculate the mean by FIPS code by month/week. However, when I
> use the following code, I get a NULL value.
>
> mean.temp<- for (i in filelist) {tmp2[, list(temp.mean=lapply(.SD, mean),
>   by=c("FIPS","year","month"), .SDcols=c("temp")]}
>
>
> This works fine for individual years, but not with *for (i in filelist)*.
> What am I doing wrong? Can I include an rbind/rbindlist in the loop to make
> a big data.frame? Any suggestions will be highly appreciated. Thank you.
>
> Sincerely,
>
> Shouro
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
