[R] Joining two datasets - recursive procedure?

Luca Meyer lucam1968 at gmail.com
Wed Mar 18 07:17:32 CET 2015


Hello,

I am facing quite a challenging task (at least for me) and I was wondering
whether someone could advise how R could help me speed it up.

I am dealing with a dataset with 3 discrete variables and one continuous
variable. The discrete variables are:

V1: 8 modalities
V2: 13 modalities
V3: 13 modalities

The continuous variable V4 is a decimal number always greater than zero in
the marginals of each of the 3 variables but it is sometimes equal to zero
(and sometimes negative) in the joint tables.

I have got 2 files:

=> one with the distribution of all possible combinations of V1xV2 (some of
which are zero or negative) and
=> one with the marginal distribution of V3.

I am trying to build the long and narrow dataset V1xV2xV3 in such a way
that each V1xV2 cell does not get modified and V3 fits its marginal
distribution as closely as possible. Does that make sense?

To be even more specific, my 2 input files look like the following.

FILE 1
V1,V2,V4
A, A, 24.251
A, B, 1.065
(...)
B, C, 0.294
B, D, 2.731
(...)
H, L, 0.345
H, M, 0.000

FILE 2
V3, V4
A, 1.575
B, 4.294
C, 10.044
(...)
L, 5.123
M, 3.334

What I need to achieve is a file such as the following

FILE 3
V1, V2, V3, V4
A, A, A, ???
A, A, B, ???
(...)
D, D, E, ???
D, D, F, ???
(...)
H, M, L, ???
H, M, M, ???
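
For illustration, the empty skeleton of FILE 3 could be built with a cross join. This is only a sketch: the data frames file1 and file2 are hypothetical stand-ins that hold a few rows copied from the excerpts above, not the real files, and merge() with by = NULL is used to pair every V1xV2 row with every V3 level.

```r
# Toy stand-ins for the two input files (a few rows copied from the
# excerpts above; the real files are larger).
file1 <- data.frame(V1 = c("A", "A", "H", "H"),
                    V2 = c("A", "B", "L", "M"),
                    V4 = c(24.251, 1.065, 0.345, 0.000))
file2 <- data.frame(V3 = c("A", "B", "C"),
                    V4 = c(1.575, 4.294, 10.044))

# Cross join: by = NULL makes merge() return the Cartesian product,
# so every V1 x V2 row is paired with every V3 level.
file3 <- merge(file1[, c("V1", "V2")], file2["V3"], by = NULL)

# Sort into the V1, V2, V3 order shown in FILE 3.
file3 <- file3[order(file3$V1, file3$V2, file3$V3), ]
rownames(file3) <- NULL

# Placeholder for the values that still have to be computed (the ???).
file3$V4 <- NA_real_
```

With the full inputs the same call would give one row per V1xV2xV3 combination; only the V4 column remains to be filled.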

Please notice that FILE 3 needs to be such that if I aggregate on V1+V2 I
recover exactly FILE 1 and that if I aggregate on V3 I can recover a file
as close as possible to FILE 2 (ideally the same file).
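
One simple way to meet both requirements, offered only as a sketch and not as the only possible answer, is to spread each V1xV2 cell over V3 in proportion to the V3 marginal shares. This assumes V3 is independent of V1xV2, which the post does not state. The objects joint12 and marg3 below are made-up toy inputs (a 2x2 table with one zero and one negative cell, and a 3-level marginal with the same grand total), not the real data.

```r
# Hypothetical toy inputs: a small V1 x V2 table and a V3 marginal
# that share the same grand total (6).
joint12 <- matrix(c(4, -1, 0, 3), nrow = 2,
                  dimnames = list(V1 = c("A", "B"), V2 = c("A", "B")))
marg3 <- c(A = 1, B = 2, C = 3)

# Share of the total carried by each V3 level.
share3 <- marg3 / sum(marg3)

# Spread every V1 x V2 cell over V3 in proportion to those shares.
cube <- outer(joint12, share3)                     # V1 x V2 x V3 array
dimnames(cube) <- c(dimnames(joint12), list(V3 = names(marg3)))

# Long and narrow layout, like FILE 3.
file3 <- as.data.frame(as.table(cube), responseName = "V4")

# Summing over V3 gives back the V1 x V2 table unchanged.
back12 <- apply(cube, c(1, 2), sum)
stopifnot(isTRUE(all.equal(back12, joint12, check.attributes = FALSE)))

# Summing over V1 and V2 gives the fitted V3 marginal.
v3_fit <- apply(cube, 3, sum)
```

Because the shares sum to one, every V1xV2 cell (including zero and negative ones) is preserved exactly. The fitted V3 marginal equals the target whenever the two files have the same grand total; if the totals differ, it matches the target only up to that scaling factor.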

Can anyone suggest how I could do that with R?

Thank you very much indeed for any assistance you are able to provide.

Kind regards,

Luca
