[R] R newbie | sapply and FUN error

egc forum.query at gmail.com
Thu May 20 23:42:32 CEST 2010


Greetings -

While I've used R a fair bit for basic statistical machinations, I've
not used it for data manipulation - I've used SAS for 20+ years (and
SAS really shines in data handling). So, I've started the process of
trying to figure out 'how to do in R what I can do in my sleep in SAS'
- specifically with respect to data manipulation. So, these are
decidedly 'newbie' level questions.

So, starting very simple. I created a tiny example CSV file, which I
call test.csv.

Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8

Now, all I want to do is read it in, and derive a new variable which
is a Z-transform of 'cost'. Here is what I've tried so far:

> prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
>  print(prices$cost);

So far, so good (being able to pull in the data is a good thing).
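
For anyone who wants to reproduce this without the file on disk, here is a minimal sketch that reads the same seven rows from an inline string. The variable name csv_text is only illustrative, and the text= argument of read.csv is assumed to be available in the R version in use.

```r
# Same rows as test.csv, supplied inline so no file path is needed.
# 'csv_text' is a hypothetical name used only for this illustration.
csv_text <- "Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8"

# header = TRUE and sep = "," are already the read.csv defaults;
# na.strings = "." mirrors the SAS-style missing-value marker.
prices <- read.csv(text = csv_text, header = TRUE, na.strings = ".")

str(prices)         # one column of labels (Loc) and one integer column (cost)
print(prices$cost)
```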

Now, while I'm sure there are lots of ways to do what I want, I'm
going to brute force it, by calculating column mean and column SD for
'cost', generate the Z-transformed value, and then add it to the
dataframe. However, here is where I'm having problems. After about an
hour of searching, I realized I need to use an 'apply' function to
apply a function (say, mean) to a column in a dataframe. But, I can't
seem to get it to work successfully (and this is the gist of the
question).
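
As a point of comparison, the brute-force plan described above can be sketched without any apply-family function, because mean() and sd() accept a numeric vector directly. This is only one possible approach; the column name z_cost is an assumption made for the example.

```r
# Rebuild the example data so the sketch is self-contained.
prices <- data.frame(
  Loc  = c("A", "C", "D", "F", "H", "K", "M"),
  cost = c(1, 3, 2, 3, 4, 3, 8)
)

# mean() and sd() work on the column vector as a whole.
cost_mean <- mean(prices$cost, na.rm = TRUE)
cost_sd   <- sd(prices$cost, na.rm = TRUE)

# Z-transform: subtract the mean, divide by the standard deviation.
# 'z_cost' is a hypothetical column name.
prices$z_cost <- (prices$cost - cost_mean) / cost_sd

# scale() does the same centering and scaling; it returns a one-column
# matrix, so as.numeric() flattens it for comparison.
z_scaled <- as.numeric(scale(prices$cost))

print(prices)
```

Assigning to prices$z_cost is what adds the new variable to the dataframe.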

If I try

> result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);


Works perfectly.

But, if I simply change FUN=mean to FUN=sd, not so successful:

If I try

> result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
> print(result);

Throws the following error:

Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
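
A likely explanation for the difference, sketched below: sapply() has no MARGIN argument of its own (that belongs to apply()), so MARGIN=2 is passed along to FUN. mean() has a '...' in its argument list and quietly ignores the extra argument, while sd() has no '...' and rejects it.

```r
prices <- data.frame(cost = c(1, 3, 2, 3, 4, 3, 8))

# Check which of the two functions can absorb unexpected arguments.
has_dots_mean <- "..." %in% names(formals(mean.default))  # TRUE
has_dots_sd   <- "..." %in% names(formals(sd))            # FALSE

# Without MARGIN, both calls work; na.rm is still forwarded to FUN.
col_mean <- sapply(prices["cost"], mean, na.rm = TRUE)
col_sd   <- sapply(prices["cost"], sd,   na.rm = TRUE)

# apply() is the function that takes MARGIN (2 = operate over columns).
col_sd_apply <- apply(prices["cost"], 2, sd, na.rm = TRUE)

print(col_mean)
print(col_sd)
print(col_sd_apply)
```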

Further, if I try

> result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
> print(result);

it prints 7 values corresponding to the value of each element of the
data set - meaning, it's treating prices$cost as a row vector. Which
makes no sense to me. What do I have to do to use prices$cost as the
first argument in the sapply call? If I can't, why not?
is.vector(prices$cost) shows TRUE, so why can't I take the mean over
the vector?
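
A small sketch of what seems to be going on: sapply() calls FUN once per element of its first argument. For a plain vector that means once per value, whereas a dataframe is a list of columns, so prices['cost'] has a single element (the whole column).

```r
cost   <- c(1, 3, 2, 3, 4, 3, 8)
prices <- data.frame(cost = cost)

# On a vector, FUN sees one value at a time, so the "mean" of each
# single value is just the value itself.
per_element <- sapply(cost, mean)

# Calling mean() directly treats the vector as a whole.
whole <- mean(cost)

# A dataframe is a list of columns.
n_cols   <- length(prices["cost"])  # 1: sapply calls FUN once, on the column
n_values <- length(prices$cost)     # 7: sapply calls FUN seven times

print(per_element)
print(whole)
```

So the way to take the mean over prices$cost is to pass it straight to mean(), with no sapply around it.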

At any rate, I'll start from here. Being able to apply functions to
column(s) of a dataframe seems pretty fundamental, so I'd like to
start by understanding the basics.

Thanks in advance.


