[R] Trying to understand the magic of lm redux

Stephen Ellison S@E|||@on @end|ng |rom LGCGroup@com
Mon Jun 3 13:27:40 CEST 2019


If you want to pass a data frame (not just its name) plus some of its column names to your function, the easiest way is to put the _quoted_ names in a character vector, as previously posted. For example:

f1 <- function(dfrm, colnames) {
	print(dfrm[, colnames])
	# or, if you like: for(nn in colnames) print(dfrm[[nn]])
	# or:              for(nn in colnames) print(dfrm[, nn])
}
# Using your data frame (renamed df1 to avoid confusion with df(), the density function for the F distribution)
f1(df1, c("a", "b")) # c("a", "b") is a character vector
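To make the call above reproducible, here is a minimal sketch. The contents of df1 are my assumption (your actual data were not shown), and this variant returns the subset rather than printing it so the result can be inspected; drop = FALSE is an optional extra that keeps a data frame even when only one column is requested.

```r
# Hypothetical stand-in for the original poster's data frame
df1 <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5), c = c("x", "y", "z"))

f1 <- function(dfrm, colnames) {
  # colnames is a character vector of column names
  dfrm[, colnames, drop = FALSE]  # drop = FALSE: stay a data frame for one column
}

res <- f1(df1, c("a", "b"))  # two columns, three rows
one <- f1(df1, "a")          # still a data frame, not a bare vector
print(res)
```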


If you want to avoid quotes and a vector (why?), you could use a one-sided formula:
f1f <- function (form, data) { #that way round to look like lm
	print(data[,all.vars(form)]) 
	#OR (to print one at a time)
	#for(nn in all.vars(form)) print(data[[nn]])
}
f1f(~a+b, df1) #Almost any combination of formula symbols will work but '+' is enough
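The formula version works because all.vars() extracts the variable names from a formula as a character vector, which is then an ordinary column index. A small sketch, again with a made-up df1 and with the function returning rather than printing:

```r
# Hypothetical data frame; the original data were not shown
df1 <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

vars  <- all.vars(~ a + b)        # names of the variables in the formula
vars2 <- all.vars(~ log(a) * b)   # function names such as log are not included

f1f <- function(form, data) {
  data[, all.vars(form), drop = FALSE]
}

sub <- f1f(~ a + b, df1)
print(vars)
```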


Finally, some comments on why your code didn't work:
> demo <- function(first,second,df)
	#If you call this as demo(a, b, df), the objects a and b must exist outside the data frame, or the function won't find them and you'll get errors like "object 'b' not found"

>   # None of the following work
>   print(df[,all.vars(first)])
	#all.vars() fails because first is not an expression or call object; it is just a vector (and only that if a exists outside the data frame at all).

>   print(df[,first])
	#Won't work because 'a' is unknown when the function is called; if it did exist, its value would be used as a vector of column names or column numbers.

>   print(df[,"first"])
	#"first" is not a column name in df

>   print(df[,all.vars(second)])
	#Fails for the same reasons as for 'first' above

>   df[,"sum"] <- print(df[,first])+print(df[,second])
	#Fails principally because df[,first] and df[,second] don't work, for the reasons above.
	#It would work if first and second were legitimate index vectors, BUT only because print() returns its argument invisibly.
	#As a side effect it will print both columns before returning their values.
	# If print() didn't return its argument, you'd be asking R to add together two printouts, which would be
	# about as sensible as asking it to add two pieces of paper together to get a vector of numbers.
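For illustration only, here is a sketch of both points: print() handing back its argument, and a version of the function that works with quoted column names. The name demo2 and the data are my own inventions, not from your post (I avoided demo because utils already has a function of that name).

```r
# Hypothetical data frame with two numeric columns
df1 <- data.frame(a = 1:3, b = 4:6)

# print() displays the vector and returns it invisibly, so x holds the same values
x <- print(df1[, "a"])

# A working variant: first and second are character column names
demo2 <- function(first, second, df) {
  df[, "sum"] <- df[, first] + df[, second]  # no print() needed for the arithmetic
  df
}

out <- demo2("a", "b", df1)
print(out)
```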

It's quicker to say what _does_ work...

First, look up ?"[" and see what kinds of value it takes and how it works.
It will tell you that row and col in something like df1[row, col] must be vectors, and that they can be logical, numeric or character.
If they are logical, df1[row, col] returns the rows and columns corresponding to TRUE. Their lengths should normally equal the number of rows and columns, respectively, in df1, because each TRUE must sit at the same position in the vector row (say) as the row you want from the data frame; a shorter logical vector is recycled, which is rarely what you intend.
If they are numeric, you'll get the relevant rows and columns counting from top left.
If they are character vectors, you'll get the rows and columns with those names.
Mixing and matching works: row and col need not be of the same type.

So:
df1[, "a"]
df1[,1]
df1[,c(TRUE, FALSE)]
will all work and return the first column.

and
demo(first="a", second="b", df1)
will work in your function, as would demo("a", "b", df1)

demo(a, b, df1) will NOT work unless a and b are vectors that are legitimate indices for df1

Pretty much nothing else will work either, unless it returns a vector that [ can interpret as an index.
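The indexing rules above can be checked side by side. This is only a sketch: df1 here is an invented two-column data frame, so that c(TRUE, FALSE) has one entry per column.

```r
# Hypothetical two-column data frame
df1 <- data.frame(a = 1:3, b = 4:6)

by_name <- df1[, "a"]             # character index
by_pos  <- df1[, 1]               # numeric index
by_lgl  <- df1[, c(TRUE, FALSE)]  # logical index, one entry per column

# Mixing types: logical rows, character column
mixed <- df1[df1$a > 1, "b"]

# An unquoted name works only if it holds a legitimate index
a <- "a"
via_var <- df1[, a]

print(identical(by_name, by_pos) && identical(by_pos, by_lgl))
```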


There is one more cunning variant of [] for matrices, which is a matrix of indices; I'll leave that for ?"[" to explain.
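As a brief taste of that variant (a sketch with a made-up matrix; ?"[" remains the authority): each row of the index matrix is a (row, column) pair, and the result is a vector with one element per pair.

```r
m <- matrix(1:6, nrow = 2)       # 2 x 3 matrix, filled column by column
idx <- cbind(c(1, 2), c(3, 1))   # pairs (1, 3) and (2, 1)
picked <- m[idx]                 # the elements m[1, 3] and m[2, 1]
print(picked)
```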


