[R] Newbie-ish question on iteratively applying function to dataframe

Wed Mar 16 17:32:58 CET 2011

Brilliant - that was really useful!

On Tue, Mar 15, 2011 at 3:46 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
> Hi Claus,
>
> On Tue, Mar 15, 2011 at 9:33 AM, Claus O'Rourke <claus.orourke at gmail.com> wrote:
>> Hi,
>> I am trying to iteratively apply a function to a selection of columns
>> in a dataframe. I've had a look around and from what I have read, I
>> should be using some version of the apply function, but I'm really
>> having some headaches with it.
>
> I would just do it in a loop (see below)
>>
>> Let me be more specific with an example.
>>
>> Say I have a data frame similar to the following
>>
>> A     x     y     z     r1    r2    r3    r4
>> 0.1  0.2  0.1 ...
>> 0.1  0.3 ...
>> 0.2 ...
>>
>> i.e., a number of columns, each of the same length, and all containing
>> real numbers. Of these columns, I want to model one variable, say A,
>> as a function of other variables, say x, y, z, and any one of my r1,
>> r2, r3, ... variables.
>>
>> i.e., I want to model
>> A ~ x + y + z + r1
>> A ~ x + y + z + r2
>> ....
>> A ~ x + y + z + rn
>>
>> But where the number of 'r' variables I will have will be large, and I
>> don't know the specific number of these variables in advance.
>>
>> My question first is, how can I select all the columns in a dataframe
>> that have a heading that matches a string pattern?
>
> ?grep
>
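A minimal sketch of what the ?grep pointer could look like in practice. The toy data frame and the "rate" column are made up for illustration, and the anchored pattern "^r[0-9]+$" is an assumption about how the r columns are named, not something stated in the thread.

```r
# Toy data frame; all column names and values here are hypothetical.
dat <- data.frame(A = 1:3, x = 4:6, y = 7:9, z = 10:12,
                  r1 = 13:15, r2 = 16:18, rate = 19:21)

# value = TRUE returns the matching names rather than their positions.
# "^r[0-9]+$" matches "r" followed only by digits, so "rate" is skipped.
r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)

# Subset the data frame to just those columns (drop = FALSE keeps it a
# data frame even if only one column matches).
r_dat <- dat[, r_cols, drop = FALSE]
```

An unanchored pattern such as "r" would also pick up any other column whose name merely contains that letter, which is why the sketch anchors it.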
>>
>> And then related to this, what would be the best way of repeatedly
>> applying my modelling function to the result?
>
> Well, I don't know about the "best" way. But why not just
>
> set.seed(21)
> dat <- as.data.frame(matrix(rnorm(100000), ncol=100,
>     dimnames=list(1:1000, c("A", "x", "y", "z", paste("r", 1:96, sep="")))))
>
> mods <- list()
> for(i in grep("r", names(dat), value=TRUE)) {
>    mods[[i]] <- lm(as.formula(paste("A ~ x + y + z + ", i)), data=dat)
> }
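Since the original question asked about the apply family, here is a sketch of the same loop written with lapply(). It is an alternative phrasing, not the method suggested in the thread; the smaller simulated data set and the helper name fit_one are assumptions made for illustration.

```r
set.seed(21)
# Small simulated data set with the same layout as in the thread,
# just fewer rows and columns (100 rows, 6 r columns).
dat <- as.data.frame(matrix(rnorm(1000), ncol = 10,
    dimnames = list(NULL, c("A", "x", "y", "z", paste0("r", 1:6)))))

r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)

# fit_one is a hypothetical helper. reformulate() builds the formula
# A ~ x + y + z + <r> from character vectors, avoiding paste/as.formula.
fit_one <- function(r) {
  lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat)
}

# One fitted model per r column; setNames() makes the result a named
# list, so mods[["r3"]] retrieves the model that includes r3.
mods <- lapply(setNames(r_cols, r_cols), fit_one)
```

The result is the same kind of named list that the for loop builds, so either form should work with the later sapply() call.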
>
> Note that you should be cautious about making any inferences based on
> this kind of method. In the example above 9 r variables are
> "significant" at the .05 level, even though the data was generated
> "randomly":
>
> sort(sapply(mods, function(x) coef(summary(x))[5, 4]))
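One common way to act on that caution is to adjust the p-values for the number of models fitted. The sketch below is an addition, not part of the original reply: it rebuilds the models on the same simulated layout and applies p.adjust() with the Holm and Benjamini-Hochberg methods. The choice of those two methods is an assumption; which correction is appropriate depends on the analysis.

```r
set.seed(21)
dat <- as.data.frame(matrix(rnorm(100000), ncol = 100,
    dimnames = list(NULL, c("A", "x", "y", "z", paste0("r", 1:96)))))

r_cols <- grep("^r[0-9]+$", names(dat), value = TRUE)
mods <- lapply(setNames(r_cols, r_cols), function(r)
  lm(reformulate(c("x", "y", "z", r), response = "A"), data = dat))

# p-value of the r term in each model, looked up by row and column name
# rather than by position [5, 4].
p_raw <- sapply(r_cols, function(r) coef(summary(mods[[r]]))[r, "Pr(>|t|)"])

# Adjust for having run 96 tests.
p_holm <- p.adjust(p_raw, method = "holm")  # controls family-wise error rate
p_bh   <- p.adjust(p_raw, method = "BH")    # controls false discovery rate

# Count how many r terms fall below 0.05 before and after adjustment.
counts <- c(raw  = sum(p_raw  < 0.05),
            holm = sum(p_holm < 0.05),
            BH   = sum(p_bh   < 0.05))
print(counts)
```

Adjusted p-values are never smaller than the raw ones, so the adjusted counts can only stay the same or shrink.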
>
> Best,
> Ista
>>
>> Many thanks for any help for this occasional R amateur.
>>
>> Claus
>>
>
>
>
> --
> Ista Zahn
> Graduate student
> University of Rochester
> Department of Clinical and Social Psychology
> http://yourpsyche.org
>