[R] r-data partitioning considering two variables (character and numeric)

Ahmed Attia @hmed@ti@80 @ending from gm@il@com
Tue Aug 28 00:54:04 CEST 2018


I would like to partition the following dataset (dataGenotype) into
training and testing sets based on two variables, Genotype and
stand_ID, so that all rows of a given stand stay together. For
example, for Genotype H13, stand_ID 7 may go to training while
stand_IDs 18 and 21 go to testing.

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

The desired output is the following:

A-training

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       7         5/18/2006       1940.1075   11.33995
H13       7         11/1/2008       10898.9597  23.20395
H13       7         4/14/2009       12830.1284  23.77395
H15       9         3/31/2006       3570.06     14.7898
H15       9         11/1/2008       15138.8383  26.2088
H15       9         4/14/2009       17035.4688  26.8778
U21       67        11/4/2005       2812.8425   13.60485
U21       67        4/14/2009       13468.455   24.6203

B-testing

Genotype  stand_ID  Inventory_date  stemC       mheight
H13       18        11/3/2005       2726.42     13.4432
H13       18        6/30/2008       12226.1554  24.091967
H13       18        4/14/2009       14141.68    25.0922
H13       21        5/18/2006       4981.7158   15.7173
H13       21        4/14/2009       20327.0667  27.9155
H15       20        1/18/2005       3016.881    14.1886
H15       20        10/4/2006       8330.4688   20.19425
H15       20        6/30/2008       13576.5     25.4774
H15       32        2/1/2006        3426.2525   14.31815
U21       3         1/9/2006        3660.416    15.09925
U21       3         6/30/2008       13236.29    24.27634
U21       3         4/14/2009       16124.192   25.79562

I tried the following code:

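library(caret)
# Samples individual rows (about 20% for training), using stand_ID as the outcome
dataPartitioning <- createDataPartition(dataGenotype$stand_ID, times = 1,
                                        p = 0.2, list = FALSE)
train <- dataGenotype[dataPartitioning, ]
test <- dataGenotype[-dataPartitioning, ]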

I also tried:

createDataPartition(unique(dataGenotype$stand_ID), times = 1, p = 0.2, list = FALSE)

Neither attempt produced the desired output: the data are partitioned
within each stand_ID rather than by whole stands. For example, one row
of stand_ID 7 goes to training and two rows of stand_ID 7 go to
testing. How can I partition the data by Genotype and stand_ID
together?
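
To make the intended behaviour concrete, here is a minimal base-R sketch of a stand-level split. It is only an illustration under assumptions not stated above: exactly one stand per Genotype is drawn at random for training, the seed is arbitrary, and the helper names (stands, train_rows, key, train_key) are hypothetical. The small data frame is a stand-in for dataGenotype with only the two grouping columns.

```r
# Stand-in for dataGenotype: only the two grouping columns from the post.
dataGenotype <- data.frame(
  Genotype = rep(c("H13", "H15", "U21"), times = c(8, 7, 5)),
  stand_ID = rep(c(7, 18, 21, 9, 20, 32, 3, 67),
                 times = c(3, 3, 2, 3, 3, 1, 3, 2)),
  stringsAsFactors = FALSE
)

set.seed(1)  # arbitrary seed, used only to make the draw reproducible

# One row per Genotype/stand_ID combination, so sampling happens on stands.
stands <- unique(dataGenotype[, c("Genotype", "stand_ID")])

# Assumption: draw one stand per genotype for training.
# Sampling positions (not stand IDs) avoids sample()'s special case
# when it is handed a single number.
train_rows <- unlist(lapply(
  split(seq_len(nrow(stands)), stands$Genotype),
  function(idx) idx[sample.int(length(idx), 1)]
))
train_stands <- stands[train_rows, ]

# Build a combined key so every row of a selected stand moves together.
key <- paste(dataGenotype$Genotype, dataGenotype$stand_ID)
train_key <- paste(train_stands$Genotype, train_stands$stand_ID)

train <- dataGenotype[key %in% train_key, ]
test <- dataGenotype[!key %in% train_key, ]
```

Because the sampling unit is the Genotype/stand_ID combination rather than the row, no stand can end up split between train and test. The number of stands drawn per genotype is a design choice and could be replaced by a proportion.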



Ahmed Attia


