[R] Accessing specific data.frame columns within function

Greg Snow 538280 at gmail.com
Fri Feb 5 19:42:07 CET 2016


You are trying to use shortcuts where shortcuts are not appropriate,
and as a result you are taking a much longer route than if you had not
used them at all; see fortune(312) in the fortunes package.

You should really reread the help page: help("[[") and section 6.1 of
An Introduction to R.
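
The key point on that help page can be sketched roughly as follows. The
data frame `df` and the variable `col` are made up for illustration; they
are not from the original question.

```r
# Hypothetical example data; the names df and col are illustrative only.
df <- data.frame(k = c(1, 1, 1), m = c(4, 2, 4))
col <- "m"

by_variable <- df[[col]]  # `[[` evaluates col, so this is column "m"
by_dollar   <- df$col     # `$` looks for a column literally named "col"

by_variable               # the values of column m
is.null(by_dollar)        # TRUE, since there is no column called "col"
```

So when the column name lives in a variable, `[[` is the tool to reach
for, and `$` is best kept for names typed literally.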

Basically you should be able to do something like:

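f <- function(data, oldnames) {
  # keep only the rows where the second named column equals 4
  data <- data[ data[[ oldnames[2] ]] == 4, ]
  # new column d: first named column squared plus the second
  data[['d']] <- data[[ oldnames[1] ]]^2 + data[[ oldnames[2] ]]
  data
}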

Or maybe a little more readable (but not as good a golf score):

f <- function(data, oldnames) {
  aa <- oldnames[1]   # name of the column to treat as "a"
  cc <- oldnames[2]   # name of the column to treat as "c"
  data <- data[ data[[ cc ]] == 4, ]
  data[['d']] <- data[[ aa ]]^2 + data[[ cc ]]
  data
}

I could have used a and c instead of aa and cc, but the doubled
letters mean less confusion with the `c` function in R.
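
As a quick check, here is the more readable version applied to the
example data from the question below. This is only a sketch on that one
small data frame; the name `res` is introduced here for illustration.

```r
# The readable version of f, repeated so this block runs on its own.
f <- function(data, oldnames) {
  aa <- oldnames[1]
  cc <- oldnames[2]
  data <- data[ data[[ cc ]] == 4, ]
  data[['d']] <- data[[ aa ]]^2 + data[[ cc ]]
  data
}

# Example data from the original question.
y <- data.frame(k = c(1, 1, 1), l = c(2, 2, 5), m = c(4, 2, 4))

res <- f(y, c("k", "m"))
res
#   k l m d
# 1 1 2 4 5
# 3 1 5 4 5
```

The original column names are preserved, and no renaming back and forth
is needed.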

Also you should read (and heed) the Warning section on the help page
for subset (?subset).
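
To illustrate the kind of surprise that warning is about, here is a
small sketch. The helper names `bad_filter` and `good_filter` are
hypothetical, and this shows only one way that the non-standard
evaluation in subset can go wrong inside a function.

```r
y <- data.frame(k = c(1, 1, 1), l = c(2, 2, 5), m = c(4, 2, 4))

# Hypothetical helper: tries to filter on a column whose name is passed in.
bad_filter <- function(data, colname) {
  # colname is not a column of data, so subset finds the function argument
  # and compares the string "m" to 4, which is FALSE for every row
  subset(data, colname == 4)
}

# Hypothetical helper using standard indexing instead.
good_filter <- function(data, colname) {
  # which() also drops rows where the comparison is NA
  data[which(data[[colname]] == 4), , drop = FALSE]
}

nrow(bad_filter(y, "m"))   # 0
nrow(good_filter(y, "m"))  # 2
```

Note that bad_filter fails silently: it returns an empty data frame
rather than raising an error.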

On Thu, Feb 4, 2016 at 9:13 PM, Clark Kogan <kogan.clark at gmail.com> wrote:
> Hello,
>
> I am trying to write a function that adds a few columns to a data.frame. The
> function uses the columns in a specific way. For instance, it might take a^2
> + c to produce a column d. Or it might do more complex manipulations that I
> don't think I need to discuss here. I want to keep x as a data.frame when I
> pass it into the function, as I want to use some data.frame functionality on
> x.
>
> Furthermore, I don't want the names in x to have to be specific. I want to
> be able to specify which columns the function should treat as "a" and "c".
>
> The way I am currently doing it is to pass the names of the columns
> that I want to treat as a and c.
>
> f <- function(data,oldnames) {
>   newnames <- c("a","c")
>   ix <- match(oldnames,names(data))
>   names(data)[ix] <- newnames
>   data <- subset(data,c==4)
>   data$d <- data$a^2 + data$c
>   ix <- match(newnames,names(data))
>   names(data)[ix] <- oldnames
>   data
> }
>
> y <- data.frame(k=c(1,1,1),l=c(2,2,5),m=c(4,2,4))
> f(y,c("k","m"))
>
> The way that I am doing it does not seem all that elegant or standard
> practice. My question is: are there potential problems programming with
> data.frames in this way, and are there standard practice methods of
> referencing data.frame names that deal with these problems?
>
> Thanks!
>
> Clark



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com


