[R] Generate a series of new vars that correlate with existing var

Greg Snow Greg.Snow at intermountainmail.org
Mon Apr 9 17:03:33 CEST 2007


Olivier,

Having read your thoughts and thought this over, I now think this would make a good wiki page or other tutorial. There are probably more people out there who would like to do this but have not had the theory class that teaches the details of how the method works. Rather than provide a function that hides those details, I would prefer to spell out the method (with some explanation) for them to follow, along with the checks that need to be done along the way.
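As a rough illustration of what "spelling out the method" might look like, here is a minimal sketch. It is in Python only so that it needs nothing beyond the standard library; in R the Cholesky step would be chol(). It assumes the common Cholesky-based approach: build the target correlation matrix with the existing variable first, center and orthonormalize the existing variable together with fresh random noise, then multiply by the lower Cholesky factor. The function names (cholesky, orthonormal_columns, correlated_with, corr) are hypothetical. Setting the unspecified correlations among the new variables to 0 is also an assumption. This is not necessarily the code posted earlier in the thread.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a; raises ValueError if a is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0:
                    raise ValueError("correlation matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def orthonormal_columns(cols):
    """Center each column, then Gram-Schmidt so the columns are orthogonal with unit length."""
    out = []
    for c in cols:
        m = sum(c) / len(c)
        v = [x - m for x in c]
        for u in out:
            dot = sum(p * q for p, q in zip(v, u))
            v = [p - dot * q for p, q in zip(v, u)]
        norm = math.sqrt(sum(p * p for p in v))  # assumes x is not constant
        out.append([p / norm for p in v])
    return out


def correlated_with(x, rs, seed=None):
    """Return len(rs) new variables whose sample correlation with x is exactly rs[j].

    Assumption: the new variables are meant to be uncorrelated with each other (0 is
    inserted for those unspecified correlations). Results are centered with unit
    length; rescale and shift them to get any desired mean and spread.
    """
    rng = random.Random(seed)
    n, k = len(x), len(rs)
    # Target correlation matrix, existing variable first.
    R = [[1.0 if i == j else 0.0 for j in range(k + 1)] for i in range(k + 1)]
    for j, r in enumerate(rs, start=1):
        R[0][j] = R[j][0] = r
    L = cholesky(R)  # fails here if R is not positive definite
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
    Z = orthonormal_columns([list(x)] + noise)  # Z[0] is the standardized x
    # Column j of Z L^T; column 0 would just reproduce Z[0], so it is skipped.
    return [[sum(L[j][m] * Z[m][i] for m in range(j + 1)) for i in range(n)]
            for j in range(1, k + 1)]


def corr(a, b):
    """Pearson sample correlation."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    sa = math.sqrt(sum((p - ma) ** 2 for p in a))
    sb = math.sqrt(sum((q - mb) ** 2 for q in b))
    return sab / (sa * sb)


x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
y1, y2 = correlated_with(x, [0.5, -0.3], seed=1)
print(corr(x, y1), corr(x, y2), corr(y1, y2))
```

Because the noise is orthonormalized against x before the Cholesky factor is applied, the printed sample correlations should equal 0.5, -0.3 and 0 up to floating-point rounding, rather than only approximately in expectation. The design keeps the positive-definiteness check visible: asking for correlations of 0.8 with two mutually uncorrelated new variables makes cholesky raise ValueError.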

For example, this method only works if the desired correlation matrix is positive definite (one way to check this is that all the eigenvalues are positive); a modification of the method can still work if the matrix is positive semi-definite. For the example correlations that started this thread, inserting 0 for the unspecified correlations happened to work, but if the specified correlations had been high enough, 0 would not have worked (can you imagine a case where x1 is highly positively correlated with both x2 and x3, yet x2 and x3 are independent of each other?). A tutorial page can explain which test to run and what to do if it fails, whereas a function would tend to hide this important detail and send its target users to the mailing list with questions about cryptic error messages.
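To make the x1/x2/x3 case concrete: suppose x1 has correlation rho with both x2 and x3, and the unspecified x2/x3 correlation is filled in as 0. The resulting matrix has eigenvalues 1 and 1 ± rho·√2, so it is positive definite only when rho < 1/√2 ≈ 0.707. The sketch below checks this numerically with a plain cyclic Jacobi eigenvalue routine, in Python with the standard library only (in R one would simply inspect eigen(R)$values). The names jacobi_eigenvalues and example_matrix are hypothetical. The values rho = 0.5 and rho = 0.8 are illustrative, not the ones from the original question.

```python
import math


def jacobi_eigenvalues(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations (A <- P^T A P)."""
    n = len(a)
    A = [row[:] for row in a]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Choose the rotation angle that zeroes A[p][q].
                theta = (A[q][q] - A[p][p]) / (2 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def example_matrix(rho):
    """x1 correlated rho with x2 and x3; x2 and x3 filled in as 0 (independent)."""
    return [[1.0, rho, rho], [rho, 1.0, 0.0], [rho, 0.0, 1.0]]


for rho in (0.5, 0.8):
    ev = jacobi_eigenvalues(example_matrix(rho))
    status = "positive definite" if min(ev) > 0 else "NOT positive definite"
    print(rho, [round(v, 4) for v in ev], status)
```

With rho = 0.5 the smallest eigenvalue is 1 - 0.5·√2 > 0, so filling in 0 is fine. With rho = 0.8 it is 1 - 0.8·√2 < 0, so no data set can have that correlation structure and the generation step has to fail.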

I am happy to work on a tutorial page, and the wiki currently seems the logical place to put it. However, I have never wikied before (is that the proper verb? :-), so is there anyone out there who would be willing to help with that side of things? Or can someone propose a better alternative?

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111
 
 

> -----Original Message-----
> From: Olivier ETERRADOSSI [mailto:olivier.eterradossi at ema.fr] 
> Sent: Friday, April 06, 2007 2:04 AM
> To: Greg Snow
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] Generate a series of new vars that correlate
> with existing var
> 
> Hello Greg (and List),
> Thanks for your reply and reflections (and sorry for my
> "frenglish"...).
> Of course you're right, and in hindsight I agree with all
> your views.
> My suggestion was probably, first of all, a mark of
> appreciation for your solution ;-).
> Here is the reasoning that led me there, though I now see
> that I was probably misunderstanding what makes up the "core" of R:
> 1) The question of how to generate such related pairs of vectors is
> nearly a FAQ, as you point out in your reply.
> 2) It seemed to me that it is often asked by newbies or
> users with relatively limited statistical knowledge.
> 3) Reaching your solution requires a good understanding of
> correlation, as well as of matrix properties and
> operators. My guess was that the people described above
> generally lack it.
> 4) From my own experience, I believed that the core of R was
> dedicated either to basics or to fairly complicated
> algorithms that handle or produce results that appear "simple"
> or "classical".
> 5) From the same experience, I could not imagine which
> non-core package such a function should "obviously" be
> added to. I assumed that, likewise, a person searching
> for the function could have trouble locating it.
> I had not looked at your TeachingDemos package until now
> (I will), but I know of other kinds of people searching for this,
> often not statisticians, who need to generate such
> data and would not think of looking there.
> In the end, all this mainly shows that I did not understand
> the R philosophy as well as I thought!
> Thanks, and regards. Olivier
> 
> Greg Snow wrote:
> > Olivier,
> >
> > I have thought of adding something like this to a package,
> > but here is my current thinking on the issue.
> >
> > This question (or a similar one) has been asked a few times, so
> > there is some demand for a general answer. I see three approaches:
> >
> > 1. Have an example of the necessary steps archived in a
> >    publicly available place.
> > 2. Write a function and include it in a non-core package.
> > 3. Add it to the core of R or a core package.
> >
> > Number 1 is already under way, since these e-mails will be part
> > of the archive, though anyone is welcome to add it to the
> > Wiki as well if they think that would be useful.
> >
> > Your suggestion is number 3, but I would argue that 2 is
> > better than 3 for the simple reason that anything added to
> > the core is implied to be of top quality and to offer pretty much
> > any option that most people would think of. Putting it in a
> > non-core package makes it available with fewer implications
> > about quality.
> >
> > The question then becomes: what options do we make
> > available? Do we have users specify the entire correlation
> > structure, or just assume the new variables will be
> > independent of each other? What should the function do if
> > the set of correlations results in a matrix that is not
> > positive definite? What if the user wants to have 2 fixed
> > variables? And there are other questions.
> >
> > My current thinking is that the process is simple enough
> > that it is easier to do by hand than to remember all the
> > options of a function. There are already people who run
> > bootstrap and permutation tests without loading the
> > packages that implement them, because it is quicker to write the
> > code by hand than to remember the syntax of the functions. I
> > think this type of data generation falls into the same
> > category. But if you, or someone else, think there is
> > enough justification for a function to do this, and can
> > specify what options it should have, I will be happy to add
> > it to my TeachingDemos package (this seems an appropriate
> > place, since one situation where I want to generate data
> > with a specific correlation structure is when creating
> > examples for students).
> >
> >
> > Hope this helps,
> >
> >   
> 
> --
> Olivier ETERRADOSSI
> Maître-Assistant
> CMGD / Equipe "Propriétés Psycho-Sensorielles des Matériaux"
> Ecole des Mines d'Alès
> Hélioparc, 2 av. P. Angot, F-64053 PAU CEDEX 9
> tel std: +33 (0)5.59.30.54.25 tel direct: +33 (0)5.59.30.90.35
> fax: +33 (0)5.59.30.63.68
> http://www.ema.fr
> 
>


