[R] creating multiple new variables from one data.frame according to columns and rows in that frame

jim holtman jholtman at gmail.com
Wed Nov 4 15:29:37 CET 2009


My guess is that we are being affected by R FAQ 7.31 (good old floating
point numbers).  The test 'age %in% 5:50' only keeps exact matches, so an
age stored as, say, 8.999999999999 instead of 9 because of round-off is
silently dropped.  Something like the following might be better:

age < 5 | (abs(age - round(age)) < 0.001)

This should give TRUE for all ages under 5 and for all ages that are
within 0.001 of a whole year.  Take a look at your data where you think
values might be missing and set 'options(digits=20)' to print out the
full values.
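Here is a small illustration of the problem. It uses made-up ages: the values 9 - 1e-12 and 20 + 1e-10 are hypothetical stand-ins for whole years that picked up round-off when the ages were computed. The exact %in% test drops those two rows, the tolerance test keeps them, and printing with more digits shows the hidden error.

```r
# Hypothetical ages: 9 and 20 carry tiny round-off errors, as computed values can
age <- c(4.5, 5, 9 - 1e-12, 20 + 1e-10, 20.5)

# Exact matching: 9 - 1e-12 and 20 + 1e-10 are not members of 5:50
exact <- age < 5 | age %in% 5:50

# Tolerance test: keep ages below 5 and ages within 0.001 of a whole year
near_whole <- age < 5 | abs(age - round(age)) < 0.001

print(data.frame(age, exact, near_whole))

# The default 7 significant digits hide the error; more digits reveal it
# (options(digits = 17) would do the same for every print in the session)
print(age, digits = 17)
sprintf("%.17g", age)
```

Monthly ages are 1/12 (about 0.083) apart, so a tolerance of 0.001 cannot accidentally pick up a non-integer month.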

On Wed, Nov 4, 2009 at 8:03 AM, Hayes, Daniel <D.J.Hayes at liverpool.ac.uk> wrote:
> Jim Holtman,
> Thank you for your reply.
> Your script is very concise and I think it could help me.
> However, when I run it on my real data object (musigma.lat.m), the age range from 5 to 50 skips certain whole years (see script below).
> I am not sure why that is, and no error is given.
> Hoping you can help.
>
> Thank you in advance for your time and energy.
> All the best,
> Daniel
>
>> dput(musigma.lat.m[580:620,])
> structure(list(age = c(48.25, 48.3333333333333, 48.4166666666667,
> 48.5, 48.5833333333333, 48.6666666666667, 48.75, 48.8333333333333,
> 48.9166666666667, 49, 49.0833333333333, 49.1666666666667, 49.25,
> 49.3333333333333, 49.4166666666667, 49.5, 49.5833333333333, 49.6666666666667,
> 49.75, 49.8333333333333, 49.9166666666667, 50, 0, 0.0833333333333333,
> 0.166666666666667, 0.25, 0.333333333333333, 0.416666666666667,
> 0.5, 0.583333333333333, 0.666666666666667, 0.75, 0.833333333333333,
> 0.916666666666667, 1, 1.08333333333333, 1.16666666666667, 1.25,
> 1.33333333333333, 1.41666666666667, 1.5), country = structure(c(1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Bolivia", "Brazil",
> "Colombia", "Dominican Rep.", "El Salvador", "Guatemala", "Guyana",
> "Haiti", "Honduras", "Nicaragua", "Paraguay", "Peru", "Suriname"
> ), class = "factor"), mu = c(10.7198320154036, 10.7193221119285,
> 10.7188036231439, 10.7182764259851, 10.7177406001273, 10.7171962535812,
> 10.7166435245754, 10.7160826629999, 10.7155141252060, 10.7149385933270,
> 10.7143568116012, 10.7137696820872, 10.7131779280271, 10.7125822168258,
> 10.7119832145823, 10.7113816139594, 10.7107780960397, 10.7101732860418,
> 10.7095677728307, 10.7089620284128, 10.7083564497153, 10.7077512971194,
> 11.548875536071, 11.4634458099448, 11.4113675486745, 11.3384424250672,
> 11.2435706626324, 11.1313969585720, 11.0086560681222, 10.8827443523793,
> 10.7598371816865, 10.6440424747848, 10.5382165128003, 10.4442220905656,
> 10.3633207905823, 10.2961499250469, 10.2427320635721, 10.2025802100475,
> 10.1749531325293, 10.1590477762319, 10.1540156426321), sigma = c(0.0947487228789027,
> 0.0947760295260326, 0.0948033853581562, 0.0948307832769866, 0.094858216728106,
> 0.0948856796527442, 0.0949131660004063, 0.0949406718763748, 0.0949681949273155,
> 0.0949957322607503, 0.0950232806230888, 0.095050836445582, 0.0950783958990592,
> 0.0951059550037287, 0.0951335102859937, 0.0951610590705954, 0.0951885984623664,
> 0.0952161256367413, 0.0952436392777666, 0.0952711384472643, 0.0952986226318235,
> 0.0953260918098295, 0.108394172852678, 0.112555919942990, 0.114345649992535,
> 0.115763779372203, 0.116984886895669, 0.118065092089138, 0.119029362771532,
> 0.119887968678076, 0.120638553936562, 0.121278180095107, 0.121810743569063,
> 0.122245010348365, 0.122590801228219, 0.122858869689557, 0.123059216409329,
> 0.123199542683827, 0.123286339009648, 0.123324768295488, 0.123319375423601
> )), .Names = c("age", "country", "mu", "sigma"), row.names = c("580",
> "581", "582", "583", "584", "585", "586", "587", "588", "589",
> "590", "591", "592", "593", "594", "595", "596", "597", "598",
> "599", "600", "601", "602", "603", "604", "605", "606", "607",
> "608", "609", "610", "611", "612", "613", "614", "615", "616",
> "617", "618", "619", "620"), class = "data.frame")
>>
>> result <- lapply(split(musigma.lat.m, musigma.lat.m$country), function(.ctry){
> +      # keep all < 5 and only integers over 5
> +      subset(.ctry, .ctry$age < 5 | .ctry$age %in% 5:50)
> +  })
>>
>> result
> $Bolivia
>            age country       mu      sigma
> 1    0.00000000 Bolivia 11.42168 0.10148719
> 2    0.08333333 Bolivia 11.33625 0.10538375
> 3    0.16666667 Bolivia 11.28417 0.10705943
> 4    0.25000000 Bolivia 11.21125 0.10838720
> 5    0.33333333 Bolivia 11.11637 0.10953050
> ...
> 59   4.83333333 Bolivia 10.49080 0.10671819
> 60   4.91666667 Bolivia 10.48562 0.10653400
> 109  9.00000000 Bolivia 10.43279 0.10180158
> 133 11.00000000 Bolivia 10.33394 0.10160484
> 169 14.00000000 Bolivia 10.24878 0.09946659
> 193 16.00000000 Bolivia 10.20148 0.09694376
> 205 17.00000000 Bolivia 10.16589 0.09573946
>
> $Brazil
>             age country       mu      sigma
> 602   0.00000000  Brazil 11.54888 0.10839417
> 603   0.08333333  Brazil 11.46345 0.11255592
> 604   0.16666667  Brazil 11.41137 0.11434565
> 605   0.25000000  Brazil 11.33844 0.11576378
> ...
> 660   4.83333333  Brazil 10.61799 0.11398118
> 661   4.91666667  Brazil 10.61281 0.11378445
> 710   9.00000000  Brazil 10.55999 0.10872996
> 734  11.00000000  Brazil 10.46113 0.10851983
> 770  14.00000000  Brazil 10.37597 0.10623606
> 794  16.00000000  Brazil 10.32867 0.10354153
>
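Applied to the script above, the fix is to replace the exact %in% test with the tolerance test suggested at the top of the thread (age < 5 | abs(age - round(age)) < 0.001). Below is a minimal sketch on a small hypothetical stand-in for musigma.lat.m. It assumes that the ages were built by repeatedly adding 1/12. The thread does not say how they were built; this is just one way such round-off can arise.

```r
# Hypothetical stand-in for musigma.lat.m: monthly ages 0-50 for two countries.
# Ages are built by repeatedly adding 1/12 (an assumption), which accumulates round-off.
ages <- cumsum(c(0, rep(1/12, 600)))
dat <- data.frame(
  age     = rep(ages, times = 2),
  country = factor(rep(c("Bolivia", "Brazil"), each = length(ages)))
)

result <- lapply(split(dat, dat$country), function(.ctry) {
  # keep every monthly age below 5 and, from 5 on, only ages within 0.001 of a whole year
  subset(.ctry, age < 5 | abs(age - round(age)) < 0.001)
})

# Each country should keep 60 monthly rows (0 to 59/12) plus 46 whole years (5 to 50)
sapply(result, nrow)
```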
> -----Original Message-----
> From: jim holtman [mailto:jholtman at gmail.com]
> Sent: 04 November 2009 03:12
> To: Hayes, Daniel
> Cc: r-help at lists.R-project.org
> Subject: Re: [R] creating multiple new variables from one data.frame according to columns and rows in that frame
>
> try this:
>
>> x <- read.table(textConnection("          Age(yrs) country       mu     sigma
> + 1   0.00000000   Bolivia 11.42168 0.1014872
> + 2   0.08333333   Bolivia 11.33625 0.1053837
> + 3   0.16666667   Bolivia 11.28417 0.1070594
> + 4   0.25000000   Bolivia 11.21125 0.1083872
> + 5   0.33333333   Bolivia 11.11637 0.1095305
> + 5.1   5  Bolivia 11.11637 0.1095305
> + 5.2   5.5   Bolivia 11.11637 0.1095305
> + 5.3   6   Bolivia 11.11637 0.1095305
> + 5.4   20   Bolivia 11.11637 0.1095305
> + 5.5   20.1   Bolivia 11.11637 0.1095305
> + 5.6   50   Bolivia 11.11637 0.1095305
> + 602  0.00000000  Brazil 11.54888 0.10839417
> + 603  0.08333333  Brazil 11.46345 0.11255592
> + 604  0.16666667  Brazil 11.41137 0.11434565
> + 605  0.25000000  Brazil 11.33844 0.11576378
> + 606  0.33333333  Brazil 11.24357 0.11698489"), header=TRUE)
>> closeAllConnections()
>> result <- lapply(split(x, x$country), function(.ctry){
> +     # keep all < 5 and only integers over 5
> +     subset(.ctry, .ctry$Age.yrs. < 5 | .ctry$Age.yrs. %in% 5:50)
> + })
>>
>> result
> $Bolivia
>       Age.yrs. country       mu     sigma
> 1    0.00000000 Bolivia 11.42168 0.1014872
> 2    0.08333333 Bolivia 11.33625 0.1053837
> 3    0.16666667 Bolivia 11.28417 0.1070594
> 4    0.25000000 Bolivia 11.21125 0.1083872
> 5    0.33333333 Bolivia 11.11637 0.1095305
> 5.1  5.00000000 Bolivia 11.11637 0.1095305
> 5.3  6.00000000 Bolivia 11.11637 0.1095305
> 5.4 20.00000000 Bolivia 11.11637 0.1095305
> 5.6 50.00000000 Bolivia 11.11637 0.1095305
>
> $Brazil
>      Age.yrs. country       mu     sigma
> 602 0.00000000  Brazil 11.54888 0.1083942
> 603 0.08333333  Brazil 11.46345 0.1125559
> 604 0.16666667  Brazil 11.41137 0.1143456
> 605 0.25000000  Brazil 11.33844 0.1157638
> 606 0.33333333  Brazil 11.24357 0.1169849
>
>
> On Tue, Nov 3, 2009 at 9:31 AM, Hayes, Daniel <D.J.Hayes at liverpool.ac.uk> wrote:
>> Dear R-helpers,
>>
>> I have a data.frame (bcpe.lat.m) containing 13 countries, ages 0-50 years in monthly steps, and the corresponding mu and sigma (see below).
>>
>> *        I would like to limit the age range to include all 12 months for the first 5 years and only whole years for all ages thereafter, for each of the countries present in the data frame.
>>
>> *        I would like to create separate data.frames according to the country the data is from (Bolivia.bcpe.lat.m, Brazil.bcpe.lat.m, etc.).
>>
>>
>> I have tried using c(seq(0,5,1/12), seq(5,50,1)) to select the desired ages, but am unsure how to repeat that sequence for consecutive countries.
>> I have tried using split(bcpe.lat.m, bcpe.lat.m$country), but end up with a list from which I can no longer select the specific ages I want, and all the data still remains in one variable.
>> I have also looked at 'by', 'apply' and things like 'for (i in 1:13)'.
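Another way around the round-off, closer to the target-vector idea above, is to compare ages as whole months rather than fractional years. round(age * 12) is an exact integer, so an exact %in% match against the wanted months is safe. The test is applied row by row, so the sequence does not need to be repeated for each country. This integer-month approach is an alternative sketch, not something proposed in the thread, and the ages below are hypothetical.

```r
# Hypothetical monthly ages 0-50 years, built with accumulated round-off
age <- cumsum(c(0, rep(1/12, 600)))

# Convert to whole months; round() absorbs the tiny floating-point error
month <- round(age * 12)

# Target ages in months: every month below 5 years, then every 12th month up to 50 years
wanted <- c(0:59, seq(60, 600, by = 12))

keep <- month %in% wanted
sum(keep)
```

On a data frame, the same idea would be subset(df, round(age * 12) %in% wanted), which works for all countries at once.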
>>
>> Help with either or both steps would be greatly appreciated.
>>
>> Greetings from Formentera,
>> Daniel
>>
>>           Age(yrs) country       mu     sigma
>> 1   0.00000000   Bolivia 11.42168 0.1014872
>> 2   0.08333333   Bolivia 11.33625 0.1053837
>> 3   0.16666667   Bolivia 11.28417 0.1070594
>> 4   0.25000000   Bolivia 11.21125 0.1083872
>> 5   0.33333333   Bolivia 11.11637 0.1095305
>> ...
>> 602  0.00000000  Brazil 11.54888 0.10839417
>> 603  0.08333333  Brazil 11.46345 0.11255592
>> 604  0.16666667  Brazil 11.41137 0.11434565
>> 605  0.25000000  Brazil 11.33844 0.11576378
>> 606  0.33333333  Brazil 11.24357 0.11698489
>> ...
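For the second step, split() already returns one data frame per country, collected in a named list, so each country is one element of that list. If separate variables such as Bolivia.bcpe.lat.m are really wanted, assign() can create them. Below is a sketch using the first two rows per country from the listing above. The variable names are hypothetical. make.names() is used because labels such as "Dominican Rep." are not syntactic R names.

```r
# Hypothetical toy version of bcpe.lat.m: first two monthly rows for two countries,
# values copied from the listing above
bcpe.lat.m <- data.frame(
  age     = c(0, 0.08333333, 0, 0.08333333),
  country = factor(c("Bolivia", "Bolivia", "Brazil", "Brazil")),
  mu      = c(11.42168, 11.33625, 11.54888, 11.46345),
  sigma   = c(0.1014872, 0.1053837, 0.10839417, 0.11255592)
)

# One data frame per country in a named list; drop = TRUE skips unused factor levels
by_country <- split(bcpe.lat.m, bcpe.lat.m$country, drop = TRUE)
by_country[["Brazil"]]

# Optional: one variable per country, e.g. Bolivia.bcpe.lat.m
for (ctry in names(by_country)) {
  var_name <- paste0(make.names(ctry), ".bcpe.lat.m")
  assign(var_name, by_country[[ctry]])
}
ls(pattern = "\\.bcpe\\.lat\\.m$")
```

Keeping the list is usually easier to work with, since lapply() or sapply() can then loop over all countries at once.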
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>






