[R] applying a set of rules to each row

David Winsemius dwinsemius at comcast.net
Wed Jan 26 21:52:21 CET 2011


I remember something about the degree of nesting of ifelse calls being  
limited to 7 deep (???)  that makes me worry about this approach. You  
may want to look at the arules package or the data.table package or  
the sqldf package for approaches that are specifically constructed  
with this sort of processing in mind.

-- 
David.

On Jan 26, 2011, at 3:42 PM, KATSCHKE, ADRIAN CIV DFAS wrote:

> Yes. That is exactly what I would like to have running. Here is the  
> first attempt I made at using a nested ?ifelse statement for one of  
> the retirement plans. The variables are all there but with different  
> names. ageYOSstart is ageFedStart, SCDCivLeave is srvCompDT. I  
> haven't gotten this working. I am not sure that it is the correct  
> way to do what I would like.
>
> ## Regular retirement eligibility date for FERS employees
> retData.All$regRetireDT2[retData.All$retireSystem == "FERS"] <-  
> with(retData.All[retData.All$retireSystem == "FERS",],
>                                 ifelse(DOB < "01/01/53", ## Born  
> before 1953 minimum retirement age of 55
>                                        ifelse(ageYOSstart < 26,  
> dates(DOB+(365.25*55)),
>                                               ifelse((ageYOSstart >=  
> 26 & ageYOSstart < 31), dates(SCDCivLeave*(365.25*30)),
>                                                       
> ifelse((ageYOSstart >= 31 & ageYOSstart < 41), dates(DOB+(365.25*60)),
>                                                              
> ifelse((ageYOSstart >= 41 & ageYOSstart < 43),
>                                                                     
> dates(SCDCivLeave+(365.25*20)),
>                                                                     
> ifelse((ageYOSstart >= 43 & ageYOSstart < 58),
>                                                                           dates 
> (DOB+(365.25*62)),
>                                                                           ifelse 
> (ageYOSstart >= 58,
>                                                                                  dates 
> (SCDCivLeave+(365.25*5)), NA)))))),
>                                         ifelse((DOB < "12/31/69" &  
> DOB > "01/01/53"), ## Born between 1953 and 1969 MRA of 56
>                                                 ifelse(ageYOSstart <  
> 27, dates(DOB+(365.25*56)),
>                                                         
> ifelse((ageYOSstart >= 27 & ageYOSstart < 31),
>                                                                 
> dates(SCDCivLeave+(365.25*30)),
>                                                                 
> ifelse((ageYOSstart >= 31 & ageYOSstart < 41),
>                                                                       dates 
> (DOB+(365.25*60)),
>                                                                       ifelse 
> ((ageYOSstart >= 41 & ageYOSstart < 43),
>                                                                              dates 
> (SCDCivLeave+(365.25*20)),
>                                                                              ifelse 
> ((ageYOSstart >= 43 & ageYOSstart < 58),
>                                                                                     dates 
> (DOB+(365.25*62)),
>                                                                                     ifelse 
> (ageYOSstart >= 58,
>                                                                                            dates 
> (SCDCivLeave+(365.25*5)),
>                                                                                            NA 
> ))))))),
>                                         ifelse(DOB >= "01/01/69", ##  
> Born after 1969 Min Retire Age of 57
>                                                ifelse(ageYOSstart <  
> 28, dates(DOB+(365.25*57)),
>                                                        
> ifelse((ageYOSstart >= 28 & ageYOSstart < 31),
>                                                               
> dates(SCDCivLeave+(365.25*30)),
>                                                               
> ifelse((ageYOSstart >= 31 & ageYOSstart < 41),
>                                                                      
> dates(DOB+(365.25*20)),
>                                                                      
> ifelse((ageYOSstart >= 41 & ageYOSstart < 43),
>                                                                            dates 
> (SCDCivLeave+(365.25*20)),
>                                                                            ifelse 
> ((ageYOSstart >= 43 & ageYOSstart < 57),
>                                                                                   dates 
> (DOB+(365.25*62)),
>                                                                                   ifelse 
> (ageYOSstart >= 58,
>                                                                                          dates 
> (SCDCivLeave+(365.25*5)),
>                                                                                          NA 
> ))))))), NA))
>
> Adrian
>
>
>> If I understand you correctly, you want ?ifelse, which works on the
>> full logical vectors of rules applied to the variables, not
>> if....else, which works on only a single logical.
>
>> -- Bert Gunter
>
> On Wed, Jan 26, 2011 at 12:18 PM, KATSCHKE, ADRIAN CIV DFAS
> <ADRIAN.KATSCHKE at dfas.mil> wrote:
>> All,
>>
>> I would like to apply a set of rules to each row of the sample data  
>> set
>> below. The rule sets are the guidelines for determining an  
>> individual's
>> date for retirement eligibility. The rules are found in this  
>> document,
>> http://www.opm.gov/feddata/RetirementPaperFinal_v4.pdf. I am only
>> interested in the top two categories for retirement eligibility, the
>> CSRS and FERS plans.
>>
>> The data set has four variables Date of Birth (DOB), service  
>> computation
>> date (srvCompDT), retirement plan (retirePlan), and the age at  
>> which the
>> employee entered federal service (ageFedStart). The service  
>> computation
>> date is used to compute the date eligible for retirement. The  
>> retirement
>> plan indicates what system the employee is enrolled under.
>>
>> The data does contain a few other retirement plans, for now I want to
>> just ignore those plans. I have labeled plans as 1-CSRS and 2-FERS,  
>> and
>> 3-Other. My first attempt at applying the rules was through a complex
>> nesting of ifelse statements, this was not very successful and quite
>> difficult to follow. I then wrote a function and tried using "apply"
>> unsuccessfully. The function is shown below.
>>
>> I would like to put a short script or function together that would  
>> allow
>> for an efficient application of the rules to each of the employees.  
>> I am
>> trying to avoid a loop, because my data set is quite large, and I may
>> need to update my data set regularly and re-run the analysis and  
>> reports
>> that will come from this work.
>>
>> Any advice or guidance on building the function or code to apply the
>> rules would be quite helpful.
>>
>> retireHelp <-
>> structure(list(DOB = structure(c(-6642, -5134, -3444, -5598,
>> -4356, 5737, -4894, -1951, -2950, 2467, 6945, 4908, -7930, -7236,
>> -7727, -77, 4158, -7892, -6028, -7132, -5959, 2309, -2494, -3513,
>> -383, -216, -3369, -5861, 3674, -10265, -8986, -5023, -4862,
>> 1526, -1022, 2175, -11790, -278, -7275, -5084, -1842, 430, -2220,
>> -7444, 440, 4285, -7812, 3335, -7271, -6825, -1098, -1670, -10219,
>> -7131, 5963, 704, -7662, 4219, -2813, 5147, -7334, -8223, -5922,
>> -7497, -9276, -1291, -11640, -5631, 518, -7268, -2105, -5901,
>> -690, -8146, -7059, 133, 1176, -6091, -2895, -6020, -4724, -3616,
>> -5059, -8253, -2604, -12400, -4776, -3671, -9326, -7000, -5574,
>> -3248, 4255, -1358, -6255, 8, -7115, -1701, -5227, 9, -517, -8674,
>> -2554, -4069, -2077, -9872, -6534, 2970, -8307, -3020, -1343,
>> -8897, -2304, -7424, 2078, -8274, -5559, -8888, -9262, -8473,
>> -4088, -2429, -8006, -1091, 5015, 2765, 4036, 3101, -3743, 5103,
>> -10018, -12095, -7646, -5966, -6208, -5784, -1325, -4288, -1665,
>> -1409, 4685, -7881, -3413, 2738, -2201, 1217, -5113, 206, -1292,
>> -1725, 10, -2978, -1895, -830, -105, -2395, -3496, -8244, -9956,
>> -6494, -4678, -4077, 575, 2013, -3411, 3824, -4356, 4523, -5836,
>> -6350, -5337, -41, -2001, -6632, -970, -6790, -2828, -4061, 476,
>> 5854, -9648, -4227, 850, 2619, -7747, -2672, 4069, -12618, -6898,
>> -4178, -1772, -1643, -2064, -157, 4551, -8688, -6087, -2040,
>> -7239, -783), format = "m/d/y", origin = structure(c(1, 1, 1970
>> ), .Names = c("month", "day", "year")), class = c("dates", "times"
>> )), srvCompDT = structure(c(743, 12429, 3585, 4364, 13227, 13578,
>> 13591, 8585, 9587, 13913, 14753, 13247, 2246, 1439, 8845, 7018,
>> 12625, -552, 5688, 7080, 13255, 13549, 12709, 13969, 13997, 9532,
>> 13689, 1226, 13549, 4093, 13423, 13801, 3181, 14809, 13353, 9457,
>> 7745, 8986, 4759, 4486, 6449, 11172, 8669, 3344, 13745, 12275,
>> 5081, 13605, 8006, 3048, 6330, 13521, 5254, 1733, 14095, 8516,
>> 4848, 13521, 5970, 14697, 8291, 139, 11435, 3567, 8961, 5775,
>> 3602, 1409, 11577, 12163, 12258, 13156, 9472, 7963, 1362, 10332,
>> 9557, 3997, 7509, 4691, 3133, 5877, 6782, 11449, 13283, 8040,
>> 11565, 3425, 7860, 1790, 10778, 13199, 12625, 5889, 3317, 9831,
>> 1068, 8040, 7123, 9104, 12836, 7928, 12764, 8922, 5324, -1004,
>> 1806, 10263, 5635, 10310, 5625, 8861, 14613, 3896, 10316, 5725,
>> 12751, 6113, 2997, 112, 5707, 4987, -1018, 8055, 13885, 13073,
>> 14585, 14865, 14935, 14390, 9735, 7654, 4557, 661, 1638, 1112,
>> 14011, 3086, 7032, 13942, 13325, 6735, 13900, 12673, 10148, 14193,
>> 14767, 8447, 6114, 10688, 13544, 7106, 8587, 14753, 7886, 12280,
>> 11946, 13662, 3332, 2108, 13977, 6203, 8369, 13857, 8369, 11486,
>> 8306, 12466, 12639, 7270, 4325, 13843, 14026, 14039, 6147, 7676,
>> 5781, 7038, 9187, 14640, 6174, 11491, 13913, 13787, 13465, 8854,
>> 13152, 1826, 1412, 4317, 5794, 5548, 8951, 12947, 12639, 5345,
>> 5961, 4637, 6465, 13717), format = "m/d/y", origin = structure(c(1,
>> 1, 1970), .Names = c("month", "day", "year")), class = c("dates",
>> "times")), retirePlan = c(1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>> 1, 3, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 1,
>> 2, 2, 2, 2, 3, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1,
>> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1,
>> 3, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2,
>> 1, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,
>> 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2,
>> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>> 1, 2, 2, 2, 2, 2, 3, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2,
>> 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
>>    ageFedStart = c(20.22, 48.08, 19.24, 27.27, 48.14, 21.47,
>>    50.61, 28.85, 34.32, 31.34, 21.38, 22.83, 27.86, 23.75, 45.37,
>>    19.43, 23.18, 20.1, 32.08, 38.91, 52.61, 30.77, 41.62, 47.86,
>>    39.37, 26.69, 46.7, 19.4, 27.04, 39.31, 61.35, 51.54, 22.02,
>>    36.37, 39.36, 19.94, 53.48, 25.36, 32.95, 26.2, 22.7, 29.41,
>>    29.81, 29.54, 36.43, 21.88, 35.3, 28.12, 41.83, 27.03, 20.34,
>>    41.59, 42.36, 24.27, 22.26, 21.39, 34.25, 25.47, 24.05, 26.15,
>>    42.78, 22.89, 47.52, 30.29, 49.93, 19.35, 41.73, 19.27, 30.28,
>>    53.2, 39.32, 52.18, 27.82, 44.1, 23.06, 27.92, 22.95, 27.62,
>>    28.48, 29.33, 21.51, 25.99, 32.42, 53.94, 43.5, 55.96, 44.74,
>>    19.43, 47.05, 24.07, 44.77, 45.03, 22.92, 19.84, 26.21, 26.89,
>>    22.4, 26.67, 33.81, 24.9, 36.56, 45.45, 41.94, 35.57, 20.26,
>>    24.28, 22.83, 19.97, 38.17, 36.5, 19.08, 48.62, 46.32, 30.99,
>>    22.55, 38.33, 50.13, 41.07, 33.56, 23.5, 26.82, 20.3, 19.13,
>>    25.04, 24.28, 28.22, 28.88, 32.21, 51.14, 25.43, 54.08, 54.07,
>>    33.41, 18.14, 21.48, 18.88, 41.99, 20.19, 23.81, 42.03, 23.66,
>>    40.02, 47.4, 27.2, 33.81, 35.53, 54.43, 22.56, 20.28, 33.98,
>>    37.05, 27.61, 28.7, 42.66, 21.88, 40.18, 42.28, 59.98, 36.38,
>>    23.55, 51.07, 28.15, 21.34, 32.43, 32.25, 20.98, 34.67, 21.75,
>>    50.58, 37.29, 26.45, 38.01, 43.88, 56.59, 19.49, 39.61, 23.57,
>>    30.39, 23.85, 24.05, 43.32, 43.03, 35.76, 30.58, 58.08, 31.56,
>>    24.87, 39.55, 22.75, 23.26, 20.71, 19.69, 30.16, 35.88, 22.14,
>>    38.42, 32.99, 18.28, 37.52, 39.7)), .Names = c("DOB", "srvCompDT",
>> "retirePlan", "ageFedStart"), row.names = c(NA, 200L), class =
>> "data.frame")
>>
>> rrDT <- function(retSys, ageFedStart, birthDT, serviceCompDT){
>>    if(retSys == "CSRS") {
>>        if(ageFedStart < 25) rtDT <- dates(birthDT+(365.25*55))
>>        else if (ageFedStart >= 25 & ageFedStart < 30) rtDT <-
>> dates(serviceCompDT+(365.25*30))
>>        else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>> dates(birthDT+(365.25*60))
>>        else if (ageFedStart >= 40 & ageFedStart < 45) rtDT <-
>> dates(serviceCompDT+(365.25*20))
>>        else if (ageFedStart >= 45 & ageFedStart < 60) rtDT <-
>> dates(birthDT+(365.25*65))
>>        else if (ageFedStart >= 60) rtDT <-
>> dates(serviceCompDT+(365.25*5))
>>        else rtDT <- NA
>>    }
>>    else if (retSys == "FERS") {
>>        if (birthDT < "01/01/53") {
>>            if(ageFedStart < 25) rtDT <- dates(birthDT+(365.25*55))
>>            else if (ageFedStart >= 25 & ageFedStart < 30) rtDT <-
>> dates(serviceCompDT+(365.25*30))
>>            else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>> dates(birthDT+(365.25*60))
>>            else if (ageFedStart >= 40 & ageFedStart < 42) rtDT <-
>> dates(serviceCompDT+(365.25*20))
>>            else if (ageFedStart >= 42 & ageFedStart < 57) rtDT <-
>> dates(birthDT+(365.25*62))
>>            else if (ageFedStart >= 57) rtDT <-
>> dates(serviceCompDT+(365.25*5))
>>            else rtDT <- NA
>>        }
>>        else if (birthDT >= "01/01/53" & birthDT < "01/01/70") {
>>            if(ageFedStart < 26) rtDT <- dates(birthDT+(365.25*56))
>>            else if (ageFedStart >= 27 & ageFedStart < 30) rtDT <-
>> dates(serviceCompDT+(365.25*30))
>>            else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>> dates(birthDT+(365.25*60))
>>            else if (ageFedStart >= 40 & ageFedStart < 42) rtDT <-
>> dates(serviceCompDT+(365.25*20))
>>            else if (ageFedStart >= 42 & ageFedStart < 57) rtDT <-
>> dates(birthDT+(365.25*62))
>>            else if (ageFedStart >= 57) rtDT <-
>> dates(serviceCompDT+(365.25*5))
>>            else rtDT <- NA
>>        }
>>        else if (birthDT >= "01/01/70"){
>>            if(ageFedStart < 27) rtDT <- dates(birthDT+(365.25*56))
>>            else if (ageFedStart >= 27 & ageFedStart < 30) rtDT <-
>> dates(serviceCompDT+(365.25*30))
>>            else if (ageFedStart >= 30 & ageFedStart < 40) rtDT <-
>> dates(birthDT+(365.25*60))
>>            else if (ageFedStart >= 40 & ageFedStart < 42) rtDT <-
>> dates(serviceCompDT+(365.25*20))
>>            else if (ageFedStart >= 42 & ageFedStart < 57) rtDT <-
>> dates(birthDT+(365.25*62))
>>            else if (ageFedStart >= 57) rtDT <-
>> dates(serviceCompDT+(365.25*5))
>>            else rtDT <- NA
>>        }
>>    }
>>    else rtDT <- NA
>>    return(rtDT)
>> }
>>
>> Adrian R. Katschke
>> Data Analytics Specialist
>> Human Capital Program Office
>> Human Resources
>> PH: 317-212-7813
>> DSN: 699-7813
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> -- 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT



More information about the R-help mailing list