[R] Referencing variable names rather than column numbers

Ista Zahn istazahn at gmail.com
Sat Dec 5 17:43:49 CET 2009


As baptiste noted, you can select columns by name with a character vector:

cor(pollute[, c("Pollution", "Temp", "Industry")])
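For anyone who wants to try this without downloading pollute.txt, here is a minimal sketch. The data frame below is a made-up stand-in that only borrows the seven column names from the original post; the values are invented for illustration and are not Crawley's data.

```r
# Toy stand-in for pollute.txt: same column names, invented values.
pollute <- data.frame(
  Pollution  = c(10, 13, 12, 17, 56, 36),
  Temp       = c(70.3, 61.0, 56.7, 51.9, 49.1, 54.0),
  Industry   = c(213, 91, 453, 454, 412, 80),
  Population = c(582, 132, 716, 515, 158, 80),
  Wind       = c(6.0, 8.2, 8.7, 9.0, 9.0, 9.0),
  Rain       = c(7.05, 48.52, 20.66, 12.95, 43.37, 40.25),
  Wet.days   = c(36, 100, 67, 86, 127, 114)
)

# Adjacent columns: names versus positions 1:3.
adjacent_by_name  <- cor(pollute[, c("Pollution", "Temp", "Industry")])
adjacent_by_index <- cor(pollute[, 1:3])

# Non-adjacent columns: names versus positions 2, 4 and 6.
scattered_by_name  <- cor(pollute[, c("Temp", "Population", "Rain")])
scattered_by_index <- cor(pollute[, c(2, 4, 6)])

# Both comparisons should report TRUE if the two forms pick the same columns.
print(all.equal(adjacent_by_name, adjacent_by_index))
print(all.equal(scattered_by_name, scattered_by_index))
```

The character vector can name columns in any order, so this form also covers the "second, fourth and sixth columns" case from the original question.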

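However,

cor(pollute[,"Pollution":"Industry"])

will not work, because the colon operator (:) needs numeric operands and
cannot build a sequence from two strings. To select a range of columns by
name, look up the positions of the two endpoints first: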

cor(pollute[ ,which(names(pollute)=="Pollution"):which(names(pollute)=="Industry")])
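The lookup above can be wrapped up so it does not have to be retyped each time. What follows is only a sketch under stated assumptions: the data frame is an invented stand-in using the column names from the original post, and col_range() is a hypothetical helper written for this example, not a base R function. The last two lines use base R's subset(), whose select argument accepts unquoted names and name ranges; that is a different technique from the which() approach above.

```r
# Toy stand-in for pollute.txt: same column names, invented values.
pollute <- data.frame(
  Pollution  = c(10, 13, 12, 17, 56, 36),
  Temp       = c(70.3, 61.0, 56.7, 51.9, 49.1, 54.0),
  Industry   = c(213, 91, 453, 454, 412, 80),
  Population = c(582, 132, 716, 515, 158, 80),
  Wind       = c(6.0, 8.2, 8.7, 9.0, 9.0, 9.0),
  Rain       = c(7.05, 48.52, 20.66, 12.95, 43.37, 40.25),
  Wet.days   = c(36, 100, 67, 86, 127, 114)
)

# The colon operator tries to coerce its operands to numbers,
# so a "range" between two strings raises an error.
colon_fails <- inherits(
  try(suppressWarnings("Pollution":"Industry"), silent = TRUE),
  "try-error"
)

# col_range() is a hypothetical helper (not part of base R): it looks up
# the positions of two column names and returns every column between them.
col_range <- function(df, from, to) {
  pos <- match(c(from, to), names(df))
  if (anyNA(pos)) {
    stop("unknown column name: ",
         paste(c(from, to)[is.na(pos)], collapse = ", "))
  }
  df[, pos[1]:pos[2], drop = FALSE]
}

range_cor <- cor(col_range(pollute, "Pollution", "Industry"))

# Alternative: subset() evaluates 'select' with column names standing in
# for their positions, so unquoted names and name ranges both work.
subset_range_cor     <- cor(subset(pollute, select = Pollution:Industry))
subset_scattered_cor <- cor(subset(pollute, select = c(Temp, Population, Rain)))
```

The subset() form is the closest to the Stata-style syntax in the original question. Its help page describes it as a convenience function intended for interactive use, so inside functions or scripts the explicit position lookup may be the safer choice.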

-Ista

On Sat, Dec 5, 2009 at 11:22 AM, John-Paul Ferguson
<ferguson_john-paul at gsb.stanford.edu> wrote:
> I apologize for how basic a question this is. I am a Stata user who
> has begun using R, and the syntax differences still trip me up. The
> most basic questions, involving as they do general terms, can be the
> hardest to find solutions for through search.
>
> Assume for the moment that I have a dataset that contains seven
> variables: Pollution, Temp, Industry, Population, Wind, Rain and
> Wet.days. (This actual dataset is taken from Michael Crawley's
> "Statistics: An Introduction Using R" and is available as
> "pollute.txt" in
> http://www.bio.ic.ac.uk/research/crawley/statistics/data/zipped.zip.)
> Assume I have attached pollute. Then
>
> cor(pollute)
>
> will give me the correlation table for these seven variables. If I
> would prefer only to see the correlations between, say, Pollution,
> Temp and Industry, I can get that with
>
> cor(pollute[,1:3])
>
> or with
>
> cor(pollute[1:3])
>
> Similarly, I can see the correlations between Temp, Population and Rain with
>
> cor(pollute[,c(2,4,6)])
>
> or with
>
> cor(pollute[c(2,4,6)])
>
> This is fine for a seven-variable dataset. When I have 250 variables,
> though, I start to pale at looking up column indexes over and over. I
> know from reading the list archives that I can extract the column
> index of Industry, for example, by typing
>
> which("Industry"==names(pollute))
>
> but doing that before each command seems dire. Accustomed to using Stata
> as I am, I am inclined to check the correlation of the first three or
> the second, fourth and sixth columns by substituting the column names
> for the column indexes--something like the following:
>
> cor(pollute[Pollution:Industry])
> cor(pollute[c(Temp,Population,Rain)])
>
> These however throw errors.
>
> I know that many commands in R are perfectly happy to take variable
> names--the regression models, for example--but that some do not. And
> so I ask you two general questions:
>
> 1. Is there a syntax for referring to variable names rather than
> column indexes in situations like these?
> 2. Is there something that I should look for in a command's help file
> that often indicates whether it can take column names rather than
> indexes?
>
> Again, apologies for asking something that has likely been asked
> before. I would appreciate any suggestions that you have.
>
> Best,
> John-Paul Ferguson
> Assistant Professor of Organizational Behavior
> Stanford University Graduate School of Business
> 518 Memorial Way, K313
> Stanford, CA 94305

-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org