[R] Defining partial list of variables

Tue Jan 5 17:15:52 CET 2021

I may not properly understand the context of this discussion and, in
particular, what the my.formula() function does. But if I do, the following,
from ?formula, seems relevant and would indicate that the discussion is
unnecessary:

"There are two special interpretations of . in a formula. The usual one is
in the context of a data argument of model fitting functions and means ‘all
columns not otherwise in the formula’:"

This means you can fit different models just by indexing (by number) the
columns you wish to use in the data argument, viz:

y <- runif(100)
dat <- data.frame(matrix(runif(500), ncol = 5))
names(dat) <- letters[1:5]
head(dat)

## Use columns 1,3, and 5 only
mdl1 <- lm(y ~ ., data = dat[,c(1,3,5)])

## Result:
summary(mdl1)

Call:
lm(formula = y ~ ., data = dat[, c(1, 3, 5)])

Residuals:
     Min       1Q   Median       3Q      Max
-0.52334 -0.27494  0.01245  0.28637  0.51998

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.51461    0.08236   6.248 1.14e-08 ***
a            0.01516    0.10928   0.139    0.890
c            0.03517    0.10399   0.338    0.736
e           -0.09437    0.10967  -0.861    0.392
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.299 on 96 degrees of freedom
Multiple R-squared:  0.008256, Adjusted R-squared:  -0.02274
F-statistic: 0.2664 on 3 and 96 DF,  p-value: 0.8495
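
The same idea should also work when the columns are picked by name rather
than by number, which is closer to what the thread is asking for. Below is a
minimal sketch, not a definitive recipe: the vector name `keep` and the seed
are my own illustrative choices, and reformulate() (from the stats package)
is shown only as one base-R way to build a formula from a character vector.
I do not know whether my.formula() does anything like it.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
y   <- runif(100)
dat <- data.frame(matrix(runif(500), ncol = 5))
names(dat) <- letters[1:5]

keep <- c("a", "c", "e")           # hypothetical list of wanted columns

## Same approach as above, but indexing the columns by name
mdl2 <- lm(y ~ ., data = dat[, keep])

## Alternative: build the formula explicitly from the character vector
f    <- reformulate(keep, response = "y")
mdl3 <- lm(f, data = dat)

f                                  # the constructed formula
all.equal(coef(mdl2), coef(mdl3))  # both fits use the same predictors
```

Either way the variable names are written once, as an ordinary character
vector, and no non-standard evaluation is needed.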

If I have misunderstood and this is unhelpful, just ignore without comment.
You don't need to waste time explaining it to me.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Jan 5, 2021 at 4:49 AM Heinz Tuechler <tuechler using gmx.at> wrote:

> What about the Cs()-function in Hmisc?
> library(Hmisc)
> Cs(a,b,c)
> [1] "a" "b" "c"
>
> Steven Yen wrote on 05.01.2021 13:29:
> > Thanks Eric. Yes, "unlist" makes a difference. Below, I am not doing
> > regression but summary to keep the example simple.
> >
> >  > set.seed(123)
> >  > data<-matrix(runif(1:25),nrow=5)
> >  > colnames(data)<-c("x1","x2","x3","x4","x5"); data
> >              x1        x2        x3         x4        x5
> > [1,] 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393
> > [2,] 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034
> > [3,] 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068
> > [4,] 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698
> > [5,] 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058
> >  > j<-strsplit(gsub("[\n ]","","x1,x3,x5"),",")
> >  > j<-unlist(j); j
> > [1] "x1" "x3" "x5"
> >  > summary(data[,j])
> >         x1               x3               x5
> >   Min.   :0.2876   Min.   :0.1029   Min.   :0.6405
> >   1st Qu.:0.4090   1st Qu.:0.4533   1st Qu.:0.6557
> >   Median :0.7883   Median :0.5726   Median :0.6928
> >   Mean   :0.6617   Mean   :0.5527   Mean   :0.7746
> >   3rd Qu.:0.8830   3rd Qu.:0.6776   3rd Qu.:0.8895
> >   Max.   :0.9405   Max.   :0.9568   Max.   :0.9943
> >
> > On 2021/1/5 07:08 PM, Eric Berger wrote:
> >> wrap it in unlist
> >>
> >> xx <- unlist(strsplit( .... ))
> >>
> >>
> >>
> >> On Tue, Jan 5, 2021 at 12:59 PM Steven Yen <styen using ntu.edu.tw> wrote:
> >>
> >>     Thanks Eric. Perhaps I should know when to stop. The approach
> >>     produces a slightly different variable list (note the [[1]]).
> >>     Consequently, I was not able to use xx in defining my regression
> >>     formula.
> >>
> >>     > x<-colnames(subset(mydata,select=c(
> >>
> >>     +    hhsize,urban,male,
> >>     +    age3045,age4659,age60, # age1529
> >>     +    highsc,tert,           # primary
> >>     +    gov,nongov,            # unemp
> >>     +    married))); x
> >>      [1] "hhsize"  "urban"   "male"    "age3045" "age4659" "age60"   "highsc"  "tert"
> >>      [9] "gov"     "nongov"  "married"
> >>     > xx<-strsplit(gsub("[\n ]","",
> >>     +    "hhsize,urban,male,
> >>     +     age3045,age4659,age60,
> >>     +     highsc,tert,
> >>     +     gov,nongov,
> >>     +     married"
> >>     + ),","); xx
> >>     [[1]]
> >>      [1] "hhsize"  "urban"   "male"    "age3045" "age4659" "age60"   "highsc"  "tert"
> >>      [9] "gov"     "nongov"  "married"
> >>
> >>     > eq1<-my.formula(y="cig",x=x); eq1
> >>     cig ~ hhsize + urban + male + age3045 + age4659 + age60 + highsc +
> >>         tert + gov + nongov + married
> >>     > eq2<-my.formula(y="cig",x=xx); eq2
> >>     cig ~ c("hhsize", "urban", "male", "age3045", "age4659", "age60",
> >>         "highsc", "tert", "gov", "nongov", "married")
> >>
> >>     On 2021/1/5 06:01 PM, Eric Berger wrote:
> >>>     If your column names have no spaces the following should work
> >>>
> >>>      x<-strsplit(gsub("[\n ]","",
> >>>      "hhsize,urban,male,
> >>>     + gov,nongov,married"),","); x
> >>>
> >>>     On Tue, Jan 5, 2021 at 11:47 AM Steven Yen <styen using ntu.edu.tw> wrote:
> >>>
> >>>         Here we go! BUT, it works great for a continuous line. With
> >>>         line break(s), I got the nuisance "\n" inserted.
> >>>
> >>>         > x<-strsplit("hhsize,urban,male,gov,nongov,married",","); x
> >>>         [[1]]
> >>>         [1] "hhsize"  "urban"   "male"    "gov"     "nongov"  "married"
> >>>
> >>>         > x<-strsplit("hhsize,urban,male,
> >>>         +             gov,nongov,married",","); x
> >>>         [[1]]
> >>>         [1] "hhsize"            "urban"             "male"              "\n            gov"
> >>>         [5] "nongov"            "married"
> >>>
> >>>         On 2021/1/5 05:34 PM, Eric Berger wrote:
> >>>>
> >>>>         zx<-strsplit("age,exercise,income,white,black,hispanic,base,somcol,grad,employed,unable,homeowner,married,divorced,widowed",",")
> >>>>
> >>>>
> >>>>
> >>>>         On Tue, Jan 5, 2021 at 11:01 AM Steven Yen <styen using ntu.edu.tw> wrote:
> >>>>
> >>>>             Thank you, Jeff. IMO, we are all here to make R work
> >>>>             better to suit our various needs. All I am asking is an
> >>>>             easier way to define variable list zx, differently from
> >>>>             the way z0, x0, and treat are defined.
> >>>>
> >>>>              > zx<-colnames(subset(mydata,select=c(
> >>>>             + age,exercise,income,white,black,hispanic,base,somcol,grad,employed,
> >>>>             + unable,homeowner,married,divorced,widowed)))
> >>>>              > z0<-c("fruit","highblood")
> >>>>              > x0<-c("vgood","poor")
> >>>>              > treat<-"depression"
> >>>>              > eq1 <-my.formula(y="depression",x=zx,z0)
> >>>>              > eq2 <-my.formula(y="bmi", x=zx,x0)
> >>>>              > eq2t<-my.formula(y="bmi", x=zx,treat)
> >>>>              > eqs<-list(eq1,eq2); eqs
> >>>>             [[1]]
> >>>>             depression ~ age + exercise + income + white + black + hispanic +
> >>>>                 base + somcol + grad + employed + unable + homeowner + married +
> >>>>                 divorced + widowed + fruit + highblood
> >>>>
> >>>>             [[2]]
> >>>>             bmi ~ age + exercise + income + white + black + hispanic + base +
> >>>>                 somcol + grad + employed + unable + homeowner + married +
> >>>>                 divorced + widowed + vgood + poor
> >>>>
> >>>>              > eqt<-list(eq1,eq2t); eqt
> >>>>             [[1]]
> >>>>             depression ~ age + exercise + income + white + black + hispanic +
> >>>>                 base + somcol + grad + employed + unable + homeowner + married +
> >>>>                 divorced + widowed + fruit + highblood
> >>>>
> >>>>             [[2]]
> >>>>             bmi ~ age + exercise + income + white + black + hispanic + base +
> >>>>                 somcol + grad + employed + unable + homeowner + married +
> >>>>                 divorced + widowed + depression
> >>>>
> >>>>             On 2021/1/5 04:18 PM, Jeff Newmiller wrote:
> >>>>             > IMO if you want to hardcode a formula then simply
> >>>>             > hardcode a formula. If you want 20 formulas, write 20
> >>>>             > formulas. Is that really so bad?
> >>>>             >
> >>>>             > If you want to have an abbreviated way to specify sets
> >>>>             > of variables without conforming to R syntax then put
> >>>>             > them into data files and read them in using a format of
> >>>>             > your choice.
> >>>>             >
> >>>>             > But using NSE to avoid using quotes for entering what
> >>>>             > amounts to in-script data is abuse of the language
> >>>>             > justified by laziness... the amount of work you put
> >>>>             > yourself and anyone else who reads your code through is
> >>>>             > excessive relative to the benefit gained.
> >>>>             >
> >>>>             > NSE has its strengths... but as a method of creating
> >>>>             > data objects it sucks. Note that even the tidyverse
> >>>>             > (now) requires you to use quotes when you are not
> >>>>             > directly referring to something that already exists. And
> >>>>             > if you were... you might as well be creating a formula.
> >>>>             >
> >>>>             > On January 4, 2021 11:14:54 PM PST, Steven Yen <styen using ntu.edu.tw> wrote:
> >>>>             >> I constantly define variable lists from a data frame (e.g., to define a
> >>>>             >> regression equation). Line 3 below does just that. Placing each variable
> >>>>             >> name in quotation marks is too much work especially for a long list so I
> >>>>             >> do that with line 4. Is there an easier way to accomplish this -- to
> >>>>             >> define a list of variable names containing "a","c","e"? Thank you!
> >>>>             >>
> >>>>             >>> data<-as.data.frame(matrix(1:30,nrow=6))
> >>>>             >>> colnames(data)<-c("a","b","c","d","e"); data
> >>>>             >>    a  b  c  d  e
> >>>>             >> 1 1  7 13 19 25
> >>>>             >> 2 2  8 14 20 26
> >>>>             >> 3 3  9 15 21 27
> >>>>             >> 4 4 10 16 22 28
> >>>>             >> 5 5 11 17 23 29
> >>>>             >> 6 6 12 18 24 30
> >>>>             >>> x1<-c("a","c","e"); x1 # line 3
> >>>>             >> [1] "a" "c" "e"
> >>>>             >>> x2<-colnames(subset(data,select=c(a,c,e))); x2 # line 4
> >>>>             >> [1] "a" "c" "e"
> >>>>             >>
> >>>>             >> ______________________________________________
> >>>>             >> R-help using r-project.org <mailto:R-help using r-project.org>
> >>>>             mailing list -- To UNSUBSCRIBE and more, see
> >>>>             >> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>             <https://stat.ethz.ch/mailman/listinfo/r-help>
> >>>>             >> PLEASE do read the posting guide
> >>>>             >> http://www.R-project.org/posting-guide.html
> >>>>             <http://www.R-project.org/posting-guide.html>
> >>>>             >> and provide commented, minimal, self-contained,
> >>>>             reproducible code.