[R] R newbie | sapply and FUN error

David Winsemius dwinsemius at comcast.net
Fri May 21 00:15:55 CEST 2010


On May 20, 2010, at 5:42 PM, egc wrote:

> Greetings -
>
> While I've used R a fair bit for basic statistical machinations, I've
> not used it for data manipulation; I've used SAS for 20+ years (and
> SAS really shines in data handling). So, I've started the process of
> trying to figure out 'how to do in R what I can do in my sleep in SAS',
> specifically wrt data manipulation. So, these are decidedly
> 'newbie' level questions.
>
> So, starting very simple. I created a tiny example CSV file, which I
> call test.csv.
>
> Loc,cost
> A,1
> C,3
> D,2
> F,3
> H,4
> K,3
> M,8
>
> Now, all I want to do is read it in, and derive a new variable which
> is a Z-transform of 'cost'. Here is what I've tried so far:
>
>> prices <- read.csv("c:/documents and settings/user/desktop/test.csv",header=TRUE,sep=",",na.strings=".");
>>  print(prices$cost);
>
> So far, so good (being able to pull in the data is a good thing).
>
> Now, while I'm sure there are lots of ways to do what I want, I'm
> going to brute force it by calculating the column mean and column SD for
> 'cost', generating the Z-transformed value, and then adding it to the
> dataframe. However, here is where I'm having problems. After about an
> hour of searching, I realized I need to use an 'apply' function to
> apply a function (say, mean) to a column in a dataframe. But I can't
> seem to get it to work (and this is the gist of the
> question).
>
> If I try
>
>> result <- sapply(prices['cost'],MARGIN=2,FUN=mean,na.rm=TRUE);
>> print(result);

I suspect you are missing the easy way to do this:

mean(prices$cost)
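
A minimal sketch of the difference between the two ways of picking out the column, assuming the test.csv values are typed inline through read.csv()'s text= argument instead of being read from disk. prices['cost'] (single brackets) is a one-column data frame, while prices$cost or prices[["cost"]] is the numeric vector itself, which is what mean() expects; in current R versions, mean() on a whole data frame returns NA with a warning.

```r
# Rebuild the example data inline (same values as test.csv above)
prices <- read.csv(text = "Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8", na.strings = ".")

# Single brackets keep the data-frame structure: a 1-column data frame
str(prices["cost"])

# $ and [[ ]] both extract the column itself as a numeric vector
cost <- prices$cost
identical(cost, prices[["cost"]])   # TRUE

# mean() works directly on the vector; no apply function is needed
mean(cost, na.rm = TRUE)            # 24/7, about 3.428571
```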

>
>
> Works perfectly.
>
> But, if I simply change FUN=mean to FUN=sd, not so successful:
>
> If I try
>
>> result <- sapply(prices['cost'],MARGIN=2,FUN=sd,na.rm=TRUE);
>> print(result);
>

Try:

result <- sd(prices$cost)

Many R functions are vectorized: they work on a whole vector without an
explicit loop or apply function.
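
Because mean(), sd() and arithmetic are all vectorized, the Z-transform asked about above needs no apply call at all. A minimal sketch, again assuming the test.csv values are typed inline; the new column name z is only an illustrative choice. The base function scale() does the same centring and scaling and is used here purely as a cross-check.

```r
prices <- read.csv(text = "Loc,cost
A,1
C,3
D,2
F,3
H,4
K,3
M,8", na.strings = ".")

# Column mean and standard deviation, ignoring any NA values
m <- mean(prices$cost, na.rm = TRUE)
s <- sd(prices$cost, na.rm = TRUE)

# Vector arithmetic is element-wise, so every z value is computed at once
prices$z <- (prices$cost - m) / s

# scale() returns a 1-column matrix; as.vector() drops the matrix shape
z_check <- as.vector(scale(prices$cost))
all.equal(prices$z, z_check)   # TRUE

print(prices)
```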


> Throws the following error:
>
> Error in FUN(X[[1L]], ...) : unused argument(s) (MARGIN = 2)
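
The error comes from MARGIN, which is an argument of apply(), not of sapply(). sapply() forwards any extra named arguments on to FUN through its `...`. mean() accepts `...` and silently ignores MARGIN, which is why the mean() version appeared to work; sd() only has the arguments x and na.rm, so it rejects MARGIN. A small sketch of that behaviour, using the cost values from test.csv typed in directly:

```r
# mean() has a ... argument, so an unknown MARGIN = 2 is absorbed and ignored
mean(c(1, 2, 3), MARGIN = 2)   # 2

# sd() has no ..., so the same extra argument is an error
msg <- tryCatch(sd(c(1, 2, 3), MARGIN = 2),
                error = function(e) conditionMessage(e))
print(msg)   # mentions the unused argument MARGIN = 2

# Dropping MARGIN makes the sapply() call work for either function
df <- data.frame(cost = c(1, 3, 2, 3, 4, 3, 8))
sapply(df["cost"], sd, na.rm = TRUE)
```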
>
> Further, if I try
>
>> result <- sapply(prices$cost,MARGIN=2,FUN=mean,na.rm=TRUE);
>> print(result);
>
> it prints 8 values corresponding to the value of each element of the
> data set, meaning it's treating prices$cost as a row vector, which
> makes no sense to me. What do I have to do to use prices$cost as the
> first argument in the sapply call?

Don't use sapply. "sapply" is generally used to produce a vector or
list as a result; if you only want a scalar, it's not the right
tool.
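
To see why the prices$cost version returned one number per element: sapply() treats its first argument as a collection and calls FUN once on each element, so with a numeric vector every call is mean() of a single number, which is just that number. A data frame, by contrast, is a list of columns, so sapply() visits whole columns. A minimal sketch with the test.csv cost values typed in directly:

```r
cost <- c(1, 3, 2, 3, 4, 3, 8)

# One call to mean() per element: mean(1), mean(3), ... so nothing is averaged
per_element <- sapply(cost, mean)
per_element                  # 1 3 2 3 4 3 8

# Over a data frame, sapply() calls mean() once per column
per_column <- sapply(data.frame(cost = cost), mean)
per_column                   # named value "cost", 24/7 (about 3.428571)

# For a single column, just call the function on the vector
mean(cost)
```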


> If I can't, why not?
> is.vector(prices$cost) shows TRUE, so why can't I take the mean over
> the vector?
>
> At any rate, I'll start from here. Being able to apply functions to
> column(s) of a dataframe seems pretty fundamental, so I'd like to
> start by understanding the basics.
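
For the general case of applying a function to several columns, here is a short sketch using the built-in mtcars data set (not from the original post; it is chosen only because it has several numeric columns). It contrasts three equivalent ways to get column means: sapply() over the data frame's columns, apply() with MARGIN = 2 on a matrix (where MARGIN actually belongs), and colMeans().

```r
data(mtcars)   # built-in example data from the datasets package
cols <- mtcars[c("mpg", "hp", "wt")]

# sapply() iterates over the columns of a data frame (no MARGIN needed)
by_sapply <- sapply(cols, mean, na.rm = TRUE)

# apply() works on matrices; MARGIN = 2 means "by column"
by_apply <- apply(as.matrix(cols), MARGIN = 2, FUN = mean, na.rm = TRUE)

# colMeans() is a purpose-built shortcut for the same result
by_colmeans <- colMeans(cols, na.rm = TRUE)

all.equal(by_sapply, by_apply)      # TRUE
all.equal(by_sapply, by_colmeans)   # TRUE

# Any function works the same way, e.g. standard deviations per column
sapply(cols, sd, na.rm = TRUE)
```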
>
> Thanks in advance.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


