# [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

David L Carlson dcarlson at tamu.edu
Fri May 13 15:59:04 CEST 2016

```
You can do this with split/unsplit:

> mydf.split <- split(mydf, mydf$blocks)
> str(mydf.split)
List of 3
 $ a:'data.frame':      5 obs. of  3 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 1 1 1 1 1
  ..$ v1    : num [1:5] 19 15 17 22 16
  ..$ v2    : num [1:5] 35 31 35 31 39
 $ b:'data.frame':      5 obs. of  3 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 2 2 2 2 2
  ..$ v1    : num [1:5] 12 24 25 22 18
  ..$ v2    : num [1:5] 31 19 35 32 38
 $ c:'data.frame':      5 obs. of  3 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 3 3 3 3 3
  ..$ v1    : num [1:5] 17 14 21 21 22
  ..$ v2    : num [1:5] 27 25 23 23 27
> mydf.split2 <- lapply(mydf.split, function(x) data.frame(x,
+      v1mod=mynorm(x$v1)))
> str(mydf.split2)
List of 3
 $ a:'data.frame':      5 obs. of  4 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 1 1 1 1 1
  ..$ v1    : num [1:5] 19 15 17 22 16
  ..$ v2    : num [1:5] 35 31 35 31 39
  ..$ v1mod : num [1:5] 0.571 0 0.286 1 0.143
 $ b:'data.frame':      5 obs. of  4 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 2 2 2 2 2
  ..$ v1    : num [1:5] 12 24 25 22 18
  ..$ v2    : num [1:5] 31 19 35 32 38
  ..$ v1mod : num [1:5] 0 0.923 1 0.769 0.462
 $ c:'data.frame':      5 obs. of  4 variables:
  ..$ blocks: Factor w/ 3 levels "a","b","c": 3 3 3 3 3
  ..$ v1    : num [1:5] 17 14 21 21 22
  ..$ v2    : num [1:5] 27 25 23 23 27
  ..$ v1mod : num [1:5] 0.375 0 0.875 0.875 1
> mydf2 <- unsplit(mydf.split2, mydf$blocks)
> str(mydf2)
'data.frame':   15 obs. of  4 variables:
 $ blocks: Factor w/ 3 levels "a","b","c": 1 1 1 1 1 2 2 2 2 2 ...
 $ v1    : num  19 15 17 22 16 12 24 25 22 18 ...
 $ v2    : num  35 31 35 31 39 31 19 35 32 38 ...
 $ v1mod : num  0.571 0 0.286 1 0.143 ...
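
# The steps above can be sketched as one self-contained script, extended to
# both v1 and v2. Assumptions not in the thread: the set.seed() value, the
# helper name add_norm_cols and the v2mod column name are illustrative only.
mynorm <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

set.seed(1)  # arbitrary seed, only so the random example data are repeatable
mydf <- data.frame(blocks = rep(c("a", "b", "c"), each = 5),
                   v1 = round(runif(15, 10, 25), 0),
                   v2 = round(rnorm(15, 30, 5), 0))

# Hypothetical helper: append a normalised copy of each selected column
add_norm_cols <- function(x, cols) {
  x[paste0(cols, "mod")] <- lapply(x[cols], mynorm)
  x
}

# Split by block, transform each piece, then reassemble in the original row order
pieces <- lapply(split(mydf, mydf$blocks), add_norm_cols, cols = c("v1", "v2"))
mydf2 <- unsplit(pieces, mydf$blocks)

# ave() in base R applies a function within groups and returns a vector aligned
# with the original rows, so it should give the same v1mod without the split
v1mod_ave <- ave(mydf$v1, mydf$blocks, FUN = mynorm)
stopifnot(isTRUE(all.equal(mydf2$v1mod, v1mod_ave)))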

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Massimo Bressan
Sent: Friday, May 13, 2016 6:56 AM
To: r-help at r-project.org
Subject: [R] apply formula over columns by subset of rows in a dataframe (to get a new dataframe)

hi

I need to apply a user-defined formula to some selected columns of a dataframe, separately within each group of rows (blocks), and get back a new dataframe

I’ve managed to get the calculations right, but I’m not at all satisfied with the form of the results

please refer to my reproducible example

##########
# my user function (an example)
mynorm <- function(x) {(x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))}

# my dataframe to apply the formula by blocks
mydf<-data.frame(blocks=rep(c("a","b","c"),each=5), v1=round(runif(15,10,25),0), v2=round(rnorm(15,30,5),0))

#my attempts (not satisfied by final output)

tapply(mydf$v1, mydf$blocks, mynorm)

byf<-factor(mydf$blocks)
aggregate(mydf[2:3], list(byf), mynorm)
aggregate(mydf[2:3], list(mydf$blocks), mynorm, simplify = FALSE)

###########

please can anyone give me some hints on how to properly proceed?

I need a dataframe with all variables as final result
sorry but I’m sort of definitely stuck with this…

thanks

