[R] Defining partial list of variables

Steven Yen styen at ntu.edu.tw
Tue Jan 5 13:29:03 CET 2021


Thanks Eric. Yes, "unlist" makes a difference. Below I use summary() rather
than a regression, to keep the example simple.

 > set.seed(123)
 > data<-matrix(runif(1:25),nrow=5)
 > colnames(data)<-c("x1","x2","x3","x4","x5"); data
             x1        x2        x3         x4        x5
[1,] 0.2875775 0.0455565 0.9568333 0.89982497 0.8895393
[2,] 0.7883051 0.5281055 0.4533342 0.24608773 0.6928034
[3,] 0.4089769 0.8924190 0.6775706 0.04205953 0.6405068
[4,] 0.8830174 0.5514350 0.5726334 0.32792072 0.9942698
[5,] 0.9404673 0.4566147 0.1029247 0.95450365 0.6557058
 > j<-strsplit(gsub("[\n ]","","x1,x3,x5"),",")
 > j<-unlist(j); j
[1] "x1" "x3" "x5"
 > summary(data[,j])
        x1               x3               x5
  Min.   :0.2876   Min.   :0.1029   Min.   :0.6405
  1st Qu.:0.4090   1st Qu.:0.4533   1st Qu.:0.6557
  Median :0.7883   Median :0.5726   Median :0.6928
  Mean   :0.6617   Mean   :0.5527   Mean   :0.7746
  3rd Qu.:0.8830   3rd Qu.:0.6776   3rd Qu.:0.8895
  Max.   :0.9405   Max.   :0.9568   Max.   :0.9943
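The same idea can be packaged as a small reusable helper. This is a minimal sketch. It assumes a hypothetical function name `vars()`, which does not come from the thread or from any package. It uses the `[[:space:]]` character class, so tabs and Windows line endings are removed along with spaces and newlines:

```r
# Hypothetical helper (not from the thread): turn a comma-separated
# string, possibly spread over several lines, into a character vector.
vars <- function(s) {
  s <- gsub("[[:space:]]", "", s)   # drop spaces, tabs, \r and \n
  unlist(strsplit(s, ","))          # strsplit() returns a list; flatten it
}

set.seed(123)
data <- matrix(runif(25), nrow = 5,
               dimnames = list(NULL, paste0("x", 1:5)))

# Names may be typed across several lines inside one string
j <- vars("x1, x3,
           x5")
print(j)
summary(data[, j])
```

Because the result is an ordinary character vector, it can be used directly for column selection or passed to a formula builder.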

On 2021/1/5 7:08 PM, Eric Berger wrote:
> wrap it in unlist
>
> xx <- unlist(strsplit( .... ))
>
>
>
> On Tue, Jan 5, 2021 at 12:59 PM Steven Yen <styen using ntu.edu.tw> wrote:
>
>     Thanks Eric. Perhaps I should know when to stop. The approach
>     returns a list rather than a character vector (note the [[1]]),
>     so I was not able to use xx in defining my regression formula.
>
>     > x<-colnames(subset(mydata,select=c(
>
>     +    hhsize,urban,male,
>     +    age3045,age4659,age60, # age1529
>     +    highsc,tert,           # primary
>     +    gov,nongov,            # unemp
>     +    married))); x
>      [1] "hhsize"  "urban"   "male"    "age3045" "age4659" "age60"   "highsc"  "tert"
>      [9] "gov"     "nongov"  "married"
>     > xx<-strsplit(gsub("[\n ]","",
>     +    "hhsize,urban,male,
>     +     age3045,age4659,age60,
>     +     highsc,tert,
>     +     gov,nongov,
>     +     married"
>     + ),","); xx
>     [[1]]
>      [1] "hhsize"  "urban"   "male"    "age3045" "age4659" "age60"   "highsc"  "tert"
>      [9] "gov"     "nongov"  "married"
>
>     > eq1<-my.formula(y="cig",x=x); eq1
>     cig ~ hhsize + urban + male + age3045 + age4659 + age60 + highsc +
>         tert + gov + nongov + married
>     > eq2<-my.formula(y="cig",x=xx); eq2
>     cig ~ c("hhsize", "urban", "male", "age3045", "age4659", "age60",
>         "highsc", "tert", "gov", "nongov", "married")
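The `[[1]]` appears because strsplit() always returns a list, with one element per input string. This is a minimal sketch of extracting the character vector and building the formula with base R's reformulate(). my.formula() is the poster's own function and its definition is not shown, so reformulate() here is a swapped-in base R alternative. It does not describe how my.formula() works.

```r
# strsplit() returns a list with one element per input string,
# which is why the console printed [[1]].
s  <- "hhsize,urban,male,gov,nongov,married"
xx <- strsplit(s, ",")
is.list(xx)          # TRUE

x <- xx[[1]]         # equivalently unlist(xx): a plain character vector
is.character(x)      # TRUE

# Base R's reformulate() builds a formula from a character vector
eq <- reformulate(x, response = "cig")
print(eq)
```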
>
>     On 2021/1/5 6:01 PM, Eric Berger wrote:
>>     If your column names have no spaces, the following should work:
>>
>>      x<-strsplit(gsub("[\n ]","",
>>      "hhsize,urban,male,
>>     + gov,nongov,married"),","); x
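The pattern `"[\n ]"` removes only newlines and spaces. This is a short illustration that uses a made-up input string containing a tab and a Windows-style `\r\n` line ending. It shows how the POSIX class `[[:space:]]` covers all of these characters:

```r
s <- "hhsize,urban,\tmale,\r\ngov"

# Newline-and-space pattern: the tab and the carriage return survive
kept <- unlist(strsplit(gsub("[\n ]", "", s), ","))
print(kept)   # elements: "hhsize" "urban" "\tmale" "\rgov"

# [[:space:]] matches spaces, tabs, \r and \n
clean <- unlist(strsplit(gsub("[[:space:]]", "", s), ","))
print(clean)  # elements: "hhsize" "urban" "male" "gov"
```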
>>
>>     On Tue, Jan 5, 2021 at 11:47 AM Steven Yen <styen using ntu.edu.tw> wrote:
>>
>>         Here we go! But while it works great for a single continuous line,
>>         with line break(s) I get a nuisance "\n" inserted.
>>
>>         > x<-strsplit("hhsize,urban,male,gov,nongov,married",","); x
>>         [[1]]
>>         [1] "hhsize"  "urban"   "male"    "gov" "nongov"  "married"
>>
>>         > x<-strsplit("hhsize,urban,male,
>>         +             gov,nongov,married",","); x
>>         [[1]]
>>         [1] "hhsize"            "urban"             "male"              "\n            gov"
>>         [5] "nongov"            "married"
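Another option is to trim whitespace from each piece after splitting, rather than deleting all whitespace first. This is a minimal sketch using base R's trimws(). Unlike the gsub() approach, it strips only leading and trailing whitespace, including the "\n" seen above. Any space inside a name is left untouched.

```r
s <- "hhsize,urban,male,
            gov,nongov,married"

# Split first, then strip leading/trailing whitespace from each piece
x <- trimws(strsplit(s, ",")[[1]])
print(x)
```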
>>
>>         On 2021/1/5 5:34 PM, Eric Berger wrote:
>>>         zx<-strsplit("age,exercise,income,white,black,hispanic,base,somcol,grad,employed,unable,homeowner,married,divorced,widowed",",")
>>>
>>>
>>>
>>>         On Tue, Jan 5, 2021 at 11:01 AM Steven Yen <styen using ntu.edu.tw> wrote:
>>>
>>>             Thank you, Jeff. IMO, we are all here to make R work better
>>>             to suit our various needs. All I am asking for is an easier
>>>             way to define the variable list zx, differently from the way
>>>             z0, x0, and treat are defined.
>>>
>>>              > zx<-colnames(subset(mydata,select=c(
>>>             + age,exercise,income,white,black,hispanic,base,somcol,grad,employed,
>>>             + unable,homeowner,married,divorced,widowed)))
>>>              > z0<-c("fruit","highblood")
>>>              > x0<-c("vgood","poor")
>>>              > treat<-"depression"
>>>              > eq1 <-my.formula(y="depression",x=zx,z0)
>>>              > eq2 <-my.formula(y="bmi", x=zx,x0)
>>>              > eq2t<-my.formula(y="bmi", x=zx,treat)
>>>              > eqs<-list(eq1,eq2); eqs
>>>             [[1]]
>>>             depression ~ age + exercise + income + white + black + hispanic +
>>>                 base + somcol + grad + employed + unable + homeowner + married +
>>>                 divorced + widowed + fruit + highblood
>>>
>>>             [[2]]
>>>             bmi ~ age + exercise + income + white + black + hispanic + base +
>>>                 somcol + grad + employed + unable + homeowner + married +
>>>                 divorced + widowed + vgood + poor
>>>
>>>              > eqt<-list(eq1,eq2t); eqt
>>>             [[1]]
>>>             depression ~ age + exercise + income + white + black + hispanic +
>>>                 base + somcol + grad + employed + unable + homeowner + married +
>>>                 divorced + widowed + fruit + highblood
>>>
>>>             [[2]]
>>>             bmi ~ age + exercise + income + white + black + hispanic + base +
>>>                 somcol + grad + employed + unable + homeowner + married +
>>>                 divorced + widowed + depression
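When several equations share one block of regressors, you can define that block once and combine it with c(). This sketch assumes that base R's reformulate() is an acceptable stand-in for the poster's my.formula(), whose definition is not shown. The variable names are copied from the message above.

```r
# Shared regressors, typed as one comma-separated string
zx <- strsplit(paste0("age,exercise,income,white,black,hispanic,base,",
                      "somcol,grad,employed,unable,homeowner,married,",
                      "divorced,widowed"), ",")[[1]]
z0    <- c("fruit", "highblood")
x0    <- c("vgood", "poor")
treat <- "depression"

# reformulate() used in place of my.formula() (an assumption)
eq1  <- reformulate(c(zx, z0),    response = "depression")
eq2  <- reformulate(c(zx, x0),    response = "bmi")
eq2t <- reformulate(c(zx, treat), response = "bmi")

eqs <- list(eq1, eq2);  print(eqs)
eqt <- list(eq1, eq2t); print(eqt)
```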
>>>
>>>             On 2021/1/5 4:18 PM, Jeff Newmiller wrote:
>>>             > IMO if you want to hardcode a formula then simply hardcode a
>>>             > formula. If you want 20 formulas, write 20 formulas. Is that
>>>             > really so bad?
>>>             >
>>>             > If you want to have an abbreviated way to specify sets of
>>>             > variables without conforming to R syntax then put them into
>>>             > data files and read them in using a format of your choice.
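Jeff's suggestion of keeping variable sets in data files might look like the sketch below. It assumes a plain-text file with one variable name per line, which is an illustrative format choice. The file is written to a temporary path only so that the example is self-contained.

```r
# Create an example variable-list file (normally maintained by hand)
path <- tempfile(fileext = ".txt")
writeLines(c("hhsize", "urban", "male", "", "  gov", "nongov", "married"), path)

# Read it back: trim stray whitespace and drop blank lines
x <- trimws(readLines(path))
x <- x[nzchar(x)]
print(x)

unlink(path)  # remove the temporary file
```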
>>>             >
>>>             > But using NSE (non-standard evaluation) to avoid using quotes
>>>             > for entering what amounts to in-script data is abuse of the
>>>             > language justified by laziness... the amount of work you put
>>>             > yourself and anyone else who reads your code through is
>>>             > excessive relative to the benefit gained.
>>>             >
>>>             > NSE has its strengths... but as a method of creating data
>>>             > objects it sucks. Note that even the tidyverse (now) requires
>>>             > you to use quotes when you are not directly referring to
>>>             > something that already exists. And if you were... you might
>>>             > as well be creating a formula.
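This follows the point that one might as well write a formula. Base R's all.vars() returns the variable names in a formula as a character vector. You can therefore type names unquoted in a one-sided formula and convert them afterwards. This is a minimal sketch, and the object name `rhs` is only illustrative.

```r
# Unquoted names in a one-sided formula; it may span several lines
rhs <- ~ hhsize + urban + male +
         gov + nongov + married

x <- all.vars(rhs)   # character vector of the names, in order of appearance
print(x)

# all.vars() also covers the original a/c/e example
print(all.vars(~ a + c + e))
```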
>>>             >
>>>             > On January 4, 2021 11:14:54 PM PST, Steven Yen <styen using ntu.edu.tw> wrote:
>>>             >> I constantly define variable lists from a data frame (e.g., to
>>>             >> define a regression equation). Line 3 below does just that.
>>>             >> Placing each variable name in quotation marks is too much work,
>>>             >> especially for a long list, so I do that with line 4 instead.
>>>             >> Is there an easier way to accomplish this, i.e., to define a
>>>             >> list of variable names containing "a", "c", "e"? Thank you!
>>>             >>
>>>             >>> data<-as.data.frame(matrix(1:30,nrow=6))
>>>             >>> colnames(data)<-c("a","b","c","d","e"); data
>>>             >>    a  b  c  d  e
>>>             >> 1 1  7 13 19 25
>>>             >> 2 2  8 14 20 26
>>>             >> 3 3  9 15 21 27
>>>             >> 4 4 10 16 22 28
>>>             >> 5 5 11 17 23 29
>>>             >> 6 6 12 18 24 30
>>>             >>> x1<-c("a","c","e"); x1 # line 3
>>>             >> [1] "a" "c" "e"
>>>             >>> x2<-colnames(subset(data,select=c(a,c,e))); x2 # line 4
>>>             >> [1] "a" "c" "e"
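For completeness, the kind of non-standard evaluation that Jeff cautions against can be written in a few lines of base R. This sketch uses a hypothetical helper, qvars(), which is not part of any package. It handles only bare names, so an argument such as `a + b` would come back as the string "a + b".

```r
# Hypothetical helper: capture unquoted arguments and return them as strings
qvars <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # drop the `list` head of the call
  vapply(exprs, deparse, character(1), USE.NAMES = FALSE)
}

data <- as.data.frame(matrix(1:30, nrow = 6))
colnames(data) <- c("a", "b", "c", "d", "e")

x3 <- qvars(a, c, e)
print(x3)
data[, x3]
```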
>>>             >>
>>>             >> ______________________________________________
>>>             >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>             >> https://stat.ethz.ch/mailman/listinfo/r-help
>>>             >> PLEASE do read the posting guide
>>>             >> http://www.R-project.org/posting-guide.html
>>>             >> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>



