[R] [FORGED] Regression with factors ?

David Winsemius dwinsemius at comcast.net
Mon Jul 11 18:55:50 CEST 2016


> On Jul 11, 2016, at 7:28 AM, stn021 <stn021 at gmail.com> wrote:
> 
> Hello,
> 
> thank you for the replies. Sorry about the html-email, I forgot.
> Should be OK with this email.
> 
> 
> Don't be fooled by the apparent simplicity of the problem. I have
> tried to reduce it to only a single relatively simple question.

It would be useful to know whether this is a design effort and the data is not yet recorded or this is an analysis effort for data that is "in the can".
> 
> The idea here is to model cooperation of two persons. The model is
> about one specific aspect of that cooperation, namely that two persons
> with similar abilities may be able to produce better results than two
> very different persons.
> 
> That is only one part of the model with other parts modeling for
> example the fact that of course two persons with a higher degree of
> ability will produce better results per se.
> 
> 
> It is not classic regression with factors. That can be easily done by
> something like lm( y ~ (p1-p2)^2 ).

No. The caret "^" is an interaction operator in the formula context (not a power operator) and the minus sign causes variable removal.

Read:

?formula

If you want a calculated value that is the squared difference of two variables, you need to create it either by wrapping the expression in `I()` inside the formula or as a new column in the data frame before passing it to the regression function.


> 
> This expands to lm( y ~ p1^2 - 2*p1*p2 + p2^2 ).

Used in a formula, p1^2 is exactly equal to p1.
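
As a minimal sketch of the difference, one can ask terms() what a formula actually contains. The data frame `d` and its values are made up purely for illustration, and p1 and p2 are assumed to be numeric here:

```r
# Made-up numeric data, only to show how the formula is parsed
set.seed(1)
d <- data.frame(p1 = runif(10), p2 = runif(10))
d$y <- 1 - (d$p1 - d$p2)^2

# Without I(): "-" removes p2 and "^2" is an interaction power
labels_plain <- attr(terms(y ~ (p1 - p2)^2), "term.labels")

# p1^2 is p1 crossed with itself, i.e. just p1
labels_power <- attr(terms(y ~ p1^2), "term.labels")

# With I(): the arithmetic is evaluated as written
labels_I <- attr(terms(y ~ I((p1 - p2)^2)), "term.labels")

print(labels_plain)
print(labels_power)
print(labels_I)

# The squared difference enters lm() as a single numeric regressor
fit <- lm(y ~ I((p1 - p2)^2), data = d)
print(coef(fit))
```

Only the I() version carries the squared difference into the model matrix.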


> This contains a
> multiplication and for lm() this implies interactions between the
> factor-levels and produces one parameter for each combination of
> factor-levels that occurs in the data. That is not what the question
> is about.
> 
> Also p1 and p2 are different levels of the same factor, while for lm()
> it would be two different factors with different levels.

Given your apparent lack of knowledge about R's formula syntax, we are also now unclear if you are using the word "factor" in the colloquial sense or as a technical term for discrete (factor) variables in R. What kind of values can p1 and p2 take?


> As for the sensical part: this has a real world application therefore
> it makes sense.
> 
> Also it is not so difficult to solve with non-linear optimization. I
> was hoping to be able to use R for that purpose because then the
> results could easily be checked with statistical tests.
> 
> So my question is not "how to solve" but "how to solve with R".
> 
> 
> As for the excess degrees of freedom, in real observations there would
> of course be added noise due to either random variations or factors
> not included in the model. So to generate a more reality-conforming
> example I could add some random normal-distributed noise to the
> dependent variable y. I previously left that part out because to me it
> did not seem relevant.

Knowing the nature of the outcome variable is generally important in statistical design.
> 
> 
> Would you like me to make a complete example dataset with more records
> and noise ?

Yes. And preferably do it with R code.
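
Something along these lines would do. It is only a sketch: the values for Charlie and Dave, the noise level, and the number of replicates are my assumptions, chosen to be consistent with the four observations in the original post.

```r
set.seed(42)

# Alice and Bob are from the original post; Charlie and Dave are one choice
# consistent with the posted y values (assumption)
ability <- c(Alice = 0.2, Bob = 0.4, Charlie = 0.6, Dave = 0.8)
f <- 1   # multiplier, as in the original post
o <- 1   # offset, as in the original post

# All unordered pairs of persons, each observed 5 times (assumed design)
pairs <- t(combn(names(ability), 2))
idx <- rep(seq_len(nrow(pairs)), each = 5)
dat <- data.frame(x1 = pairs[idx, 1], x2 = pairs[idx, 2],
                  stringsAsFactors = FALSE)

# Noise-free response plus normal noise (sd = 0.02 is an assumption)
mu <- f * (o - (ability[dat$x1] - ability[dat$x2])^2)
dat$y <- unname(mu) + rnorm(nrow(dat), sd = 0.02)

# Store the persons as factors that share one set of levels
dat$x1 <- factor(dat$x1, levels = names(ability))
dat$x2 <- factor(dat$x2, levels = names(ability))

str(dat)
head(dat)
```

Giving x1 and x2 the same levels makes it explicit that both columns refer to the same set of persons.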

> 
> The answer I look for would be the numerical values of the
> factor-levels and numerical values for the multiplier (f) and the
> offset (o), with p1 and p2 given as names (here: persons) and y given
> as some level of achievement they reach by cooperating.
> 
> y = f * ( o - ( p1 - p2 )^2 )
> 
> Is that what you meant by "answer" ?

Not really. We would expect to see some data, at least dummy data, in a form that could be used for testing and demonstration.  The nature of "f" is particularly unclear (in large part because the science or "reality" is not described.)  Is it a function?  The "o" is probably going to be returned as an "(Intercept)". You started out with `lm` which would have little to do with non-linear optimization. You then said it "would not be so difficult" to do non-linear optimization of "something" which was not really specified with any substance. Without data and code it still reads as a salad of fragments of terminology lacking reference to a well-described scientific substrate. 
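
For what it is worth, if the model really is y = f * ( o - ( p1 - p2 )^2 ) with one unknown number per person, a least-squares fit could be sketched with optim(). Everything below is an assumption for illustration: the simulated data, the starting values, and two constraints that are needed because the parameters are not otherwise identifiable. The multiplier f is fixed at 1, since any rescaling of the person values can be absorbed into f and o, and the first person is pinned at 0, since only differences enter the model.

```r
set.seed(123)

# Simulated stand-in data (assumed person values and noise level)
ability <- c(Alice = 0.2, Bob = 0.4, Charlie = 0.6, Dave = 0.8)
persons <- names(ability)
pairs <- t(combn(persons, 2))
idx <- rep(seq_len(nrow(pairs)), each = 5)
dat <- data.frame(x1 = factor(pairs[idx, 1], levels = persons),
                  x2 = factor(pairs[idx, 2], levels = persons))
dat$y <- 1 - unname(ability[as.character(dat$x1)] -
                    ability[as.character(dat$x2)])^2 +
  rnorm(nrow(dat), sd = 0.02)

# Sum of squared errors; par = c(o, values of persons 2..n), f fixed at 1
sse <- function(par, dat) {
  o <- par[1]
  a <- c(0, par[-1])                      # first person pinned at 0
  pred <- o - (a[as.integer(dat$x1)] - a[as.integer(dat$x2)])^2
  sum((dat$y - pred)^2)
}

# Start away from equal person values, where the gradient vanishes
start <- c(1, 0.1, 0.2, 0.3)
fit <- optim(start, sse, dat = dat, method = "BFGS")

est <- c(o = fit$par[1], setNames(c(0, fit$par[-1]), persons))
print(round(est, 3))
print(fit$value)
```

The estimated person values are relative to the first person, and their mirror image (all signs flipped) fits equally well, so only the differences between persons are meaningful.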

An "answer" would be:

Describe an experiment or a well designed set of observations with a specific outcome. Describe the hypotheses. Present code or data with a desired analysis plan. Ask about specific problems in the R code.



An off-topic question would be:

Help me design my psychology class project.


-- 
David.


> 
> 
> THX
> stefan
> 
> 
> 
> 
> 2016-07-10 2:27 GMT+02:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:
>> 
>> I have seen less sensical questions.
>> 
>> It would be nice if the example were a bit more complete (as in it should have excess degrees of freedom and an answer) and less like a homework problem (which are off topic here). It would of course also be helpful if the OP were to conform to the Posting Guide, particularly in respect to using plain text email.
>> 
>> It looks like the kind of nonlinear optimization problem that evolutionary algorithms are often applied to. It doesn't look (to me) like a typical problem that factors get applied to in formulas though, because multiple instances of the same factor variable are present.
>> --
>> Sent from my phone. Please excuse my brevity.
>> 
>> On July 9, 2016 4:59:30 PM PDT, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>> On 09/07/16 20:52, stn021 wrote:
>>>> Hello,
>>>> 
>>>> I would like to analyse a model like this:
>>>> 
>>>> y = 1 *  ( 1 - ( x1 - x2 )  ^ 2   )
>>>> 
>>>> x1 and x2 are not continuous variables but factors, so the observations
>>>> contain the level.
>>>> Its numerical value is unknown and is to be estimated with the model.
>>>> 
>>>> 
>>>> The observations look like this:
>>>> 
>>>> y     x1     x2
>>>> 0.96  Alice  Bob
>>>> 0.84  Alice  Charlie
>>>> 0.96  Bob    Charlie
>>>> 0.64  Dave   Alice
>>>> etc.
>>>> 
>>>> Each person has a numerical value. Here for example Alice = 0.2 and
>>>> Bob = 0.4
>>>> 
>>>> Then y = 0.96 = 1* ( 1- ( 0.2-0.4 ) ^ 2 ) , see first observation.
>>>> 
>>>> How can this be done in R ?
>>> 
>>> 
>>> This question makes about as little sense as it is possible to imagine.
>>> 
>>> cheers,
>>> 
>>> Rolf Turner
>> 
> 

David Winsemius
Alameda, CA, USA


