[R] Mean-Centering Question

arun smartpink111 at yahoo.com
Sun Dec 9 17:26:38 CET 2012


Hi,

You could also use:
newFunction1 <- function(x) { t(t(log(x)) - colMeans(log(x))) }

res1 <- by(dat1[c("Units", "AveragePrice")], dat1["Location"], newFunction1)
res1
#Location: Los Angeles
#         Units AveragePrice
#1  0.213682659  0.071790268
#2 -0.005370907 -0.072872965
#3 -0.208311751  0.001082696
#------------------------------------------------------------ 
#Location: New York
 #       Units AveragePrice
#4  0.23546592   0.10147433
#5 -0.09025352  -0.08711684
#6 -0.14521240  -0.01435749
#------------------------------------------------------------ 
#Location: Paris
 #       Units AveragePrice
#7  0.21933200   0.11733164
#8 -0.04870308  -0.04914172
#9 -0.17062892  -0.06818992


newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
res <- by(dat1[c("Units", "AveragePrice")], dat1["Location"], newFunction)
res
#Location: Los Angeles
 #        Units AveragePrice
#1  0.213682659  0.071790268
#2 -0.005370907 -0.072872965
#3 -0.208311751  0.001082696
#------------------------------------------------------------ 
#Location: New York
 #       Units AveragePrice
#4  0.23546592   0.10147433
#5 -0.09025352  -0.08711684
#6 -0.14521240  -0.01435749
#------------------------------------------------------------ 
#Location: Paris
 #       Units AveragePrice
#7  0.21933200   0.11733164
#8 -0.04870308  -0.04914172
#9 -0.17062892  -0.06818992

# identical(res, res1) will be FALSE, because the list elements of res are data frames while those of res1 are matrices.
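
To see that class difference concretely, here is a minimal sketch. It assumes dat1 holds the same nine rows as the example data quoted further down in this thread (dat1 is not defined in this message, so the data frame below is only a stand-in), and it inspects just the first group.

```r
# Stand-in for dat1 (assumption: same values as the example data in this thread)
dat1 <- data.frame(
  Location     = rep(c("Los Angeles", "New York", "Paris"), each = 3),
  Units        = c(61, 49, 40, 259, 187, 177, 672, 514, 455),
  AveragePrice = c(5.42, 4.69, 5.05, 6.4, 5.3, 5.7, 6.26, 5.3, 5.2)
)

newFunction1 <- function(x) t(t(log(x)) - colMeans(log(x)))          # returns a matrix
newFunction  <- function(x) sweep(log(x), 2, colMeans(log(x)), "-")  # returns a data frame

cols <- c("Units", "AveragePrice")
res1 <- by(dat1[cols], dat1["Location"], newFunction1)
res  <- by(dat1[cols], dat1["Location"], newFunction)

class(res1[[1]])   # matrix (first group, Los Angeles)
class(res[[1]])    # data.frame

# Compare the numbers only, ignoring class and other attributes
same_values <- all.equal(as.matrix(res[[1]]), res1[[1]], check.attributes = FALSE)
same_values
```

If only the numbers matter, a comparison like the all.equal() call above may be more useful than identical(), which also compares classes and attributes.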

A.K.


----- Original Message -----
From: "Ray DiGiacomo, Jr." <rayd at liondatasystems.com>
To: R Help <r-help at r-project.org>
Cc: 
Sent: Saturday, December 8, 2012 11:11 PM
Subject: Re: [R] Mean-Centering Question

Hi David and Arun,

Thanks for looking into this.  I think I have found a solution.

The "by" function runs without errors, but both values returned in the
second row of the "Los Angeles" output are incorrect.  These incorrect
values are marked with asterisks in the quoted output below.

I think my original custom function was causing the incorrect values
because the subtraction inside it was combining objects that had different
dimensions (a data frame minus a vector of column means), and I think there
was some "recycling" happening.
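
The recycling can be checked by hand on the Los Angeles rows. This is a sketch under the assumption that subtracting a length-2 vector from a data frame recycles the vector down the rows of each column in turn, which is consistent with the numbers in the quoted output; the object names (x, lx, m, wrong) are made up for the illustration.

```r
# Los Angeles rows from the example data in this thread
x  <- data.frame(Units = c(61, 49, 40), AveragePrice = c(5.42, 4.69, 5.05))
lx <- log(x)
m  <- unname(colMeans(lx))   # m[1] = mean log Units, m[2] = mean log AveragePrice

# What the original function computes: m is recycled as
# (m1, m2, m1) for column 1 and (m2, m1, m2) for column 2
wrong <- lx - m

# Reproduce the same pattern by hand
units_by_hand <- lx$Units        - m[c(1, 2, 1)]
price_by_hand <- lx$AveragePrice - m[c(2, 1, 2)]

all.equal(wrong$Units, units_by_hand)
all.equal(wrong$AveragePrice, price_by_hand)

wrong[2, ]   # the second row is the one that gets the "other" column's mean
```

With three rows and two columns, only the middle row ends up paired with the wrong mean, which would explain why rows 1 and 3 looked correct.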

Using the "sweep" function fixes the problem.  This is what I did to fix
things:

# here is my "new" custom function
newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }

# this gives the correct values
by(PullData[c("Units","AveragePrice")],
PullData[c("StoreLocation")],
        newFunction)
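
As a quick sanity check on the sweep() version, the sketch below confirms that each centered column averages to zero for one group, and compares the result with base R's scale(), which is a different function swapped in here only for cross-checking. The object names (x, centered, alt) are illustrative.

```r
# Los Angeles rows from the example data in this thread
x <- data.frame(Units = c(61, 49, 40), AveragePrice = c(5.42, 4.69, 5.05))

newFunction <- function(x) { sweep(log(x), 2, colMeans(log(x)), "-") }
centered <- newFunction(x)

colMeans(centered)   # both columns should be zero up to rounding error

# Cross-check: scale() with center = TRUE, scale = FALSE also subtracts column means
alt <- scale(log(x), center = TRUE, scale = FALSE)
agrees <- all.equal(as.matrix(centered), alt, check.attributes = FALSE)
agrees
```

Note that scale() returns a matrix rather than a data frame, so sweep() may still be the better fit when a data frame result is wanted.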

- Ray





On Sat, Dec 8, 2012 at 7:12 PM, David Winsemius <dwinsemius at comcast.net> wrote:

>
> On Dec 8, 2012, at 3:54 PM, Ray DiGiacomo, Jr. wrote:
>
>  Hello,
>>
>> I'm trying to create a custom function that "mean-centers" data and can be
>> applied across many columns.
>>
>> Here is an example dataset, which is similar to my dataset:
>>
>>
>>  dat <- read.table(text="Location,TimePeriod,Units,AveragePrice
>
> Los Angeles,5/1/11,61,5.42
> Los Angeles,5/8/11,49,4.69
> Los Angeles,5/15/11,40,5.05
> New York,5/1/11,259,6.4
> New York,5/8/11,187,5.3
> New York,5/15/11,177,5.7
> Paris,5/1/11,672,6.26
> Paris,5/8/11,514,5.3
> Paris,5/15/11,455,5.2", header=TRUE, sep=",")
>
>
>> I want to mean-center the "Units" and "AveragePrice" Columns.
>>
>> So, I created this function:
>>
>> specialFunction <- function(x){ log(x) - colMeans(log(x), na.rm = T) }
>>
>
> I needed to modify this to avoid errors relating to how colMeans is
> expecting its arguments:
>
> specialFunction2 <- function(x){ log(x) - mean(log(x), na.rm = T) }
>
> aggregate(dat[3:4], dat[1], FUN=specialFunction2)
>
>      Location    Units.1    Units.2    Units.3 AveragePrice.1 AveragePrice.2 AveragePrice.3
> 1 Los Angeles  0.2136827 -0.0053709 -0.2083118      0.0717903     -0.0728730      0.0010827
> 2    New York  0.2354659 -0.0902535 -0.1452124      0.1014743     -0.0871168     -0.0143575
> 3       Paris  0.2193320 -0.0487031 -0.1706289      0.1173316     -0.0491417     -0.0681899
>
>
>
>> If I use only "one" column in the first argument of the "by" function,
>> everything is fine.  For example, the following code will work fine:
>>
>> by(data[c("Units")],
>> data["Location"],
>> specialFunction)
>>
>> But the following code will "not" work, because I have "two" columns in
>> the first argument...
>>
>> by(data[c("Units", "AveragePrice")],
>> data["Location"],
>> specialFunction)
>>
>
> OK. So then I tried this with your function and was surprised to see that
> it also works:
>
> > by(dat[c("Units", "AveragePrice")],
> + dat["Location"],
> + specialFunction)
> Location: Los Angeles
>      Units AveragePrice
> 1  0.21368    0.0717903
> 2  *2.27351   -2.3517586*
> 3 -0.20831    0.0010827
> ------------------------------------------------------------------
> Location: New York
>      Units AveragePrice
> 4  0.23547     0.101474
> 5  3.47628    -3.653655
> 6 -0.14521    -0.014357
> ------------------------------------------------------------------
> Location: Paris
>      Units AveragePrice
> 7  0.21933      0.11733
> 8  4.52537     -4.62322
> 9 -0.17063     -0.06819
>
>
>
>> Does anyone have any ideas as to what I am doing wrong?
>>
>
> I guess I don't. Cannot reproduce and my other methods worked as well. This
> also works with your version and with mine, but I get the deprecation
> message for `mean.data.frame` from mine:
>
> > lapply( split(dat[3:4], dat[1]) , FUN=specialFunction )
> $`Los Angeles`
>      Units AveragePrice
> 1  0.21368    0.0717903
> 2  2.27351   -2.3517586
> 3 -0.20831    0.0010827
>
> $`New York`
>      Units AveragePrice
> 4  0.23547     0.101474
> 5  3.47628    -3.653655
> 6 -0.14521    -0.014357
>
> $Paris
>      Units AveragePrice
> 7  0.21933      0.11733
> 8  4.52537     -4.62322
> 9 -0.17063     -0.06819
>
>
>
>> Please note that I'm trying to get the following results (for the "Los
>> Angeles" group):
>>
>> Los Angeles "Units" variable (Mean-Centered)
>> 0.213682659
>> -0.005370907
>> -0.208311751
>>
>> Los Angeles "AveragePrice" variable (Mean-Centered)
>> 0.071790268
>> -0.072872965
>> 0.001082696
>>
>
> --
>
> David Winsemius, MD
> Alameda, CA, USA
>
>

