[R] lapply with data frame

Bill.Venables at csiro.au Bill.Venables at csiro.au
Mon Mar 1 08:23:00 CET 2010


Oops!  My caveat about untested code was certainly appropriate.  The normalization code below will not work.  
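
A minimal sketch of why that first attempt fails, assuming a small data frame built from the table in the original question (the name `pieces` is only illustrative). When the function given to tapply returns a vector rather than a single value, tapply returns a list with one element per group, so indexing it by group does not give one number per row:

```r
# Example data taken from the table in the original question.
data <- data.frame(
  id    = 1:6,
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2)
)

# The function returns a whole vector per group, so the result is a list.
pieces <- tapply(data$value, data$group, function(x) x / sum(x))
is.list(pieces)

# Indexing by group repeats each group's full vector once per row,
# which is not a usable column of six numbers.
per_row <- pieces[data$group]
length(per_row)
lengths(per_row)
```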

Here is probably what I was thinking of doing:

data <- within(data, norm <- value / tapply(value, group, sum)[group])

The same caveats apply here as below!
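
As a rough check of the line above, here is a small sketch that assumes the example table from the original question as the data frame. The comparison with ave() is an addition for illustration, not something suggested in the thread; ave() is a base R function that returns a result aligned with the rows of its input.

```r
# Example data taken from the table in the original question.
data <- data.frame(
  id    = 1:6,
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(3.2, 3.0, 3.1, 5.5, 6.0, 6.2)
)

# tapply(value, group, sum) gives one total per group; indexing the totals
# by group expands them back to one total per row before dividing.
data <- within(data, norm <- value / tapply(value, group, sum)[group])

# Alternative sketch (an assumption, not from the thread): ave() applies
# the function within each group and keeps the original row order.
norm_ave <- ave(data$value, data$group, FUN = function(x) x / sum(x))

# Within each group the normalized values should sum to 1.
group_sums <- tapply(as.numeric(data$norm), data$group, sum)
```

Both approaches should give the same numbers; the within() version stores the column as a one-dimensional array, which as.numeric() turns back into a plain vector if that matters downstream.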


________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Bill.Venables at csiro.au [Bill.Venables at csiro.au]
Sent: 01 March 2010 17:18
To: noah at smartmediacorp.com; r-help at r-project.org
Subject: [ExternalEmail] Re: [R] lapply with data frame

Data frames are lists.  Each column of the data frame is a component of the list.  So in, e.g.

lapply(data, function(x) x)

the function would receive each column of the data frame in turn.
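
A small sketch of this, using a made-up two-column data frame (the names `data` and `cls` here are only illustrative): lapply calls the function once per column and returns a list named after the columns.

```r
# Hypothetical two-column data frame for illustration.
data <- data.frame(id = 1:3, value = c(3.2, 3.0, 3.1))

# A data frame is a list whose components are its columns.
is.list(data)

# The function is called once per column and receives that whole column.
cls <- lapply(data, class)
names(cls)
```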

To apply a function to each row of the data frame (which needs some care, because apply first coerces the data frame to a matrix), one tool you can use is apply(...):

apply(data, 1, function(x) ...)

The form of the result depends on what the function returns.  If the function returns a vector of length greater than one, those vectors form the *columns* of the matrix that apply returns, not its rows.
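
A short sketch of both points, using a made-up numeric data frame (all names here are illustrative assumptions): each row's result becomes a column of the output, and a data frame with mixed column types is coerced to a character matrix before the function ever sees a row.

```r
# Hypothetical all-numeric data frame: three rows, two columns.
data <- data.frame(x = 1:3, y = c(10, 20, 30))

# The function returns two values per row, so the result has
# two rows and three columns: one column per original row.
res <- apply(data, 1, function(r) c(total = sum(r), largest = max(r)))
dim(res)

# Transpose to get one row of results per original row.
res_by_row <- t(res)

# The care needed: mixed column types are coerced to character.
mixed <- as.matrix(data.frame(id = 1:2, group = c("A", "B")))
is.character(mixed)
```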

For the normalization problem, here is one way to do it:

data <- within(data, norm <- tapply(value, group, function(x) x/sum(x))[group])


Warning 1: the second of these assignment operators may not be replaced by '='.
Warning 2: untested code!

________________________________________
From: r-help-bounces at r-project.org [r-help-bounces at r-project.org] On Behalf Of Noah Silverman [noah at smartmediacorp.com]
Sent: 28 February 2010 12:37
To: r-help at r-project.org
Subject: [R] lapply with data frame

I'm a bit confused on how to use lapply with a data.frame.

For example:

lapply(data, function(x) print(x))

WHAT exactly is passed to the function?  Is it each ROW in the data
frame, one by one, or each column, or the entire frame in one shot?

What I want to do is apply a function to each row in the data frame.  Is
lapply the right way?

A second application is to normalize a column value by group.  For
example, if I have the following table:
id    group    value    norm
1     A        3.2
2     A        3.0
3     A        3.1
4     B        5.5
5     B        6.0
6     B        6.2
etc...

The long version would be:
for (g in unique(data$group)) {
     rows <- data$group == g    # rows belonging to this group
     data$norm[rows] <- data$value[rows] / sum(data$value[rows])
}

There must be a faster way to do this with lapply.  (Ideally, I'd then
use mclapply to run on multi-cores and really crank up the speed.)

Any suggestions?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
