[R] Mean-Centering Question

David Winsemius dwinsemius at comcast.net
Sun Dec 9 04:12:20 CET 2012


On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:

> Hello,
>
> I'm trying to create a custom function that "mean-centers" data and
> can be applied across many columns.
>
> Here is an example dataset, which is similar to my dataset:
>
>
dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header=TRUE, sep=",")
>
> I want to mean-center the "Units" and "AveragePrice" Columns.
>
> So, I created this function:
>
> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }

I needed to modify this to avoid an error: aggregate() passes each
column to FUN as a plain vector, and colMeans() expects an array of at
least two dimensions:

specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }

aggregate(dat[3:4], dat[1], FUN=specialFunction2)

     Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
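
For readers who want the centered values in long form, aligned with the original rows, rather than as the matrix columns that aggregate() returns (Units.1, Units.2, ...), here is a minimal sketch using ave(), which applies a function within groups and returns a vector of the same length as its input. The helper name center_log and the data frame centered are hypothetical and do not appear in the thread.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper: log-transform a vector, then subtract its mean.
center_log <- function(v) log(v) - mean(log(v), na.rm = TRUE)

# Keep the identifying columns, then add one centered column per variable.
centered <- dat[c("Location", "TimePeriod")]
for (col in c("Units", "AveragePrice")) {
  # ave() applies center_log within each Location and keeps row order.
  centered[[col]] <- ave(as.numeric(dat[[col]]), dat$Location, FUN = center_log)
}

print(centered)
```

The first three rows of centered should agree with the Los Angeles target values quoted in the question.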

>
> If I use only "one" column in the first argument of the "by" function,
> everything is fine.  For example, the following code will work fine:
>
> by(data[c("Units")],
> data["Location"],
> specialFunction)
>
> But the following code will "not" work, because I have "two" columns
> in the first argument...
>
> by(data[c("Units", "AveragePrice")],
> data["Location"],
> specialFunction)

OK. So then I tried this with your function and was surprised to see
that it also runs without error (although the second row of each group
does not match your target values):

 > by(dat[c("Units", "AveragePrice")],
+ dat["Location"],
+ specialFunction)
Location: Los Angeles
      Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827
------------------------------------------------------------------
Location: New York
      Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357
------------------------------------------------------------------
Location: Paris
      Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819
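
A likely explanation for the second row of each group in the output above, offered as a sketch rather than something stated in the thread: colMeans() returns a length-2 vector, and subtracting it from a 3 x 2 data frame recycles that vector down the columns, so some cells have the other column's mean subtracted. One alternative is sweep() with MARGIN = 2, which subtracts each column's own mean. The object names la, recycled, lx, and swept are made up for illustration.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Work on one group so the effect is easy to see.
la <- dat[dat$Location == "Los Angeles", c("Units", "AveragePrice")]

# Original approach: the two column means are recycled element by element,
# so row 2 of Units has the AveragePrice mean subtracted (and vice versa).
recycled <- log(la) - colMeans(log(la), na.rm = TRUE)

# Sketch of an alternative: sweep() subtracts each column's mean
# from that same column (MARGIN = 2 means "by column").
lx <- as.matrix(log(la))
swept <- sweep(lx, 2, colMeans(lx, na.rm = TRUE))

print(recycled)
print(swept)
```

In this sketch, rows 1 and 3 agree between the two results; only row 2 differs, which is the pattern visible in the by() output.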

>
> Does anyone have any ideas as to what I am doing wrong?

I guess I don't. I cannot reproduce the problem, and my other methods
worked as well. This also works with your version and with mine, but I
get the deprecation message for `mean.data.frame` from mine:

 > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
$`Los Angeles`
      Units AveragePrice
1  0.21368    0.0717903
2  2.27351   -2.3517586
3 -0.20831    0.0010827

$`New York`
      Units AveragePrice
4  0.23547     0.101474
5  3.47628    -3.653655
6 -0.14521    -0.014357

$Paris
      Units AveragePrice
7  0.21933      0.11733
8  4.52537     -4.62322
9 -0.17063     -0.06819
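
The split()/lapply() approach can also be combined with unsplit() to reassemble the per-group pieces into one data frame in the original row order. The sketch below assumes a hypothetical helper, center_cols, that centers each column separately, so each value has its own column's group mean subtracted. It is one way to reach the target values listed in the question, not necessarily what either poster intended.

```r
dat <- read.table(text = "Location,TimePeriod,Units,AveragePrice
Los Angeles,5/1/11,61,5.42
Los Angeles,5/8/11,49,4.69
Los Angeles,5/15/11,40,5.05
New York,5/1/11,259,6.4
New York,5/8/11,187,5.3
New York,5/15/11,177,5.7
Paris,5/1/11,672,6.26
Paris,5/8/11,514,5.3
Paris,5/15/11,455,5.2", header = TRUE, sep = ",")

# Hypothetical helper: log-transform a data frame of numeric columns,
# then subtract each column's mean from that column.
center_cols <- function(x) {
  lx <- log(x)
  lx[] <- lapply(lx, function(v) v - mean(v, na.rm = TRUE))
  lx
}

# Split by Location, center each piece, and put the rows back in place.
pieces <- lapply(split(dat[3:4], dat$Location), center_cols)
result <- unsplit(pieces, dat$Location)

print(result)
```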

>
> Please note that I'm trying to get the following results (for the "Los
> Angeles" group):
>
> Los Angeles "Units" variable (Mean-Centered)
> 0.213682659
> -0.005370907
> -0.208311751
>
> Los Angeles "AveragePrice" variable (Mean-Centered)
> 0.071790268
> -0.072872965
> 0.001082696

-- 

David Winsemius, MD
Alameda, CA, USA



