[R] Joining two datasets - recursive procedure?

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Thu Mar 19 02:38:26 CET 2015


I don't understand your description. The standard practice on this list is to provide a reproducible R example [1] of the kind of data you are working with (and any code you have tried) to go along with your description. In this case, that would be two dputs of your input data frames and a dput of an output data frame (generated by hand from your input data frame). (Probably best to not use the full number of input values just to keep the size down.) We could then make an attempt to generate code that goes from input to output.
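
As a rough sketch of what such a post could contain: the object names file1, file2 and file3 are only illustrative, and the few rows are borrowed from the listing quoted below, so treat this as an assumption about the data layout rather than the poster's actual files.

```r
# Trimmed-down stand-ins for the two input files (object names are hypothetical)
file1 <- data.frame(
  V1 = c("A", "A", "B", "B"),
  V2 = c("A", "B", "C", "D"),
  V4 = c(24.251, 1.065, 0.294, 2.731),
  stringsAsFactors = FALSE
)
file2 <- data.frame(
  V3 = c("A", "B", "C"),
  V4 = c(1.575, 4.294, 10.044),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates each object;
# paste that printed output into a plain-text email
dput(file1)
dput(file2)

# Skeleton of the desired result: every (V1, V2) row crossed with every V3.
# merge() on data frames with no common columns returns the Cartesian product.
file3 <- merge(file1[c("V1", "V2")], file2["V3"])
file3$V4 <- NA_real_  # to be replaced by hand with the values you expect
dput(file3)
```

Anyone on the list can then paste the dput output back into R and start from exactly the same objects.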

Of course, if you post that hard work using HTML then it will get corrupted (much like the text below from your earlier emails) and we won't be able to use it. Please learn to post from your email software using plain text when corresponding with this mailing list.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On March 18, 2015 9:05:37 AM PDT, Luca Meyer <lucam1968 at gmail.com> wrote:
>Thanks for your input Michael,
>
>The continuous variable I have measures quantities (down to the 3rd
>decimal place), so unfortunately they are not frequencies.
>
>Any more specific suggestions on how that could be tackled?
>
>Thanks & kind regards,
>
>Luca
>
>
>===
>
>Michael Friendly wrote:
>I'm not sure I understand completely what you want to do, but
>if the data were frequencies, it sounds like a task for fitting a
>loglinear model with the model formula
>
>~ V1*V2 + V3
>
>On 3/18/2015 2:17 AM, Luca Meyer wrote:
>> Hello,
>>
>> I am facing a quite challenging task (at least to me) and I was wondering
>> if someone could advise how R could assist me to speed the task up.
>>
>> I am dealing with a dataset with 3 discrete variables and one continuous
>> variable. The discrete variables are:
>>
>> V1: 8 modalities
>> V2: 13 modalities
>> V3: 13 modalities
>>
>> The continuous variable V4 is a decimal number always greater than zero in
>> the marginals of each of the 3 variables but it is sometimes equal to zero
>> (and sometimes negative) in the joint tables.
>>
>> I have got 2 files:
>>
>> => one with distribution of all possible combinations of V1xV2 (some of
>> which are zero or negative) and
>> => one with the marginal distribution of V3.
>>
>> I am trying to build the long and narrow dataset V1xV2xV3 in such a way
>> that each V1xV2 cell does not get modified and V3 fits as closely as
>> possible to its marginal distribution. Does it make sense?
>>
>> To be even more specific, my 2 input files look like the following.
>>
>> FILE 1
>> V1,V2,V4
>> A, A, 24.251
>> A, B, 1.065
>> (...)
>> B, C, 0.294
>> B, D, 2.731
>> (...)
>> H, L, 0.345
>> H, M, 0.000
>>
>> FILE 2
>> V3, V4
>> A, 1.575
>> B, 4.294
>> C, 10.044
>> (...)
>> L, 5.123
>> M, 3.334
>>
>> What I need to achieve is a file such as the following
>>
>> FILE 3
>> V1, V2, V3, V4
>> A, A, A, ???
>> A, A, B, ???
>> (...)
>> D, D, E, ???
>> D, D, F, ???
>> (...)
>> H, M, L, ???
>> H, M, M, ???
>>
>> Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I
>> recover exactly FILE 1 and that if I aggregate on V3 I can recover a file
>> as close as possible to FILE 2 (ideally the same file).
>>
>> Can anyone suggest how I could do that with R?
>>
>> Thank you very much indeed for any assistance you are able to provide.
>>
>> Kind regards,
>>
>> Luca
>
>	[[alternative HTML version deleted]]


