[R] Problem with "apply"

Alan Cohen CohenA at smh.toronto.on.ca
Wed Apr 22 20:56:10 CEST 2009


Hi R users,

I am trying to assign ages to age classes for a large data set (123,000 records), and using a for-loop was too slow, so I wrote a function and used apply.  However, the function does not properly assign the first two classes (the rest are fine).  It appears that when age is one digit, it does not get assigned properly.  

I tried to provide a small-scale work-up (at the end of the email) but it does not reproduce the problem; the best I can do is to provide my code and the output below.  As you can see, I've confirmed that age is numeric, that all values are integers, and that pieces of the code work independently.  Any thoughts would be appreciated.  

To add to the mystery, I get different problems depending on which rows of my data set I select.  mds[1:100,] gives the problem above, as do mds[100:200,], mds[150:250,] and mds[10000:10100,].  However, with mds[200:300,], mds[250:350,] and mds[1000:1100,], only ages with 3 digits are correctly assigned; all ages <100 are returned as NA.
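
One possible cause, stated as an assumption because the post does not show the other columns of mds: if mds is a data frame with at least one character or factor column, apply() first converts it with as.matrix(). The result is then a character matrix in which numeric columns are passed through format() and padded to a common width. An age of 3 would reach the function as " 3" (or "  3" when the selected rows include a three-digit age), which matches nothing in 0:4. The data frame `toy` below is a hypothetical stand-in for mds, not the real data:

```r
# Hypothetical stand-in for mds: one numeric column plus one character column.
toy <- data.frame(age = c(3, 82, 40),
                  id  = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# apply() coerces a data frame with as.matrix() before calling FUN.
m <- as.matrix(toy)
print(m[, "age"])            # character strings, padded to equal width

# The same test as in ageassign(), on the padded strings and on the raw column.
padded <- m[, "age"] %in% 0:4
direct <- toy$age %in% 0:4
print(padded)
print(direct)
```

If this is what is happening, it would also fit the small-scale example not reproducing the problem, since xx is an all-numeric matrix and is passed to the function unchanged.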

I'm using R v 2.8.1 on Windows XP.

Cheers,
Alan Cohen
Centre for Global Health Research, 
Toronto,ON

> ageassign <- function(x){
+   y <- NA
+   if (x[11] %in% c(0:4)) {y <- "0-4"}
+   else if (x[11] %in% c(5:14)) {y <- "5-14" }
+   else if (x[11] %in% c(15:29)) {y <- "15-29" }
+   else if (x[11] %in% c(30:69)) {y <- "30-69"}
+   else if (x[11] %in% c(70:79)) {y <- "70-79"}
+   else if (x[11] %in% c(80:125)) {y <- "80+"}
+   return(y)
+ }
> jj <- apply(mds[1:100,],1,FUN=ageassign)
> jj
      1       2       3       4       5       6       7       8       9      10      11      12      13 
     NA   "80+" "30-69" "30-69"   "80+"      NA "30-69" "30-69" "70-79" "15-29" "15-29" "30-69" "70-79" 
     14      15      16      17      18      19      20      21      22      23      24      25      26 
  "80+"      NA "30-69" "30-69" "30-69"   "80+"   "80+" "15-29" "70-79" "30-69" "70-79" "70-79" "30-69" 
     27      28      29      30      31      32      33      34      35      36      37      38      39 
"70-79"   "80+"      NA   "80+" "70-79"      NA "15-29" "15-29"      NA      NA "70-79" "30-69" "30-69" 
     40      41      42      43      44      45      46      47      48      49      50      51      52 
"70-79" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "70-79" "15-29" "30-69"      NA "15-29" "30-69" 
     53      54      55      56      57      58      59      60      61      62      63      64      65 
"30-69"      NA "70-79" "30-69" "30-69" "30-69" "30-69" "15-29" "30-69" "30-69" "70-79" "30-69"      NA 
     66      67      68      69      70      71      72      73      74      75      76      77      78 
"30-69" "30-69" "30-69" "30-69" "30-69"   "80+" "30-69"   "80+" "70-79" "30-69" "30-69" "30-69"      NA 
     79      80      81      82      83      84      85      86      87      88      89      90      91 
"30-69" "30-69" "30-69"      NA   "80+" "30-69" "30-69" "30-69"      NA "15-29" "30-69" "30-69" "30-69" 
     92      93      94      95      96      97      98      99     100 
"30-69" "30-69" "30-69" "30-69" "70-79" "30-69" "30-69" "30-69" "30-69" 
> mds[1:100,11]
  [1]  3 82 40 35 82  1 37 57 71 22 21 52 73 86  1 43 60 63 84 88 29 73 69 75 73 43 75 83  4 83 77  1 27
 [34] 15  1  6 76 51 45 71 54 64 69 70 48 38 74 26 37  4 18 63 59  8 78 63 67 62 50 21 66 69 75 57  4 50
 [67] 58 60 61 62 83 69 92 75 30 49 69  1 69 63 69  0 93 64 59 69  2 25 32 60 66 67 54 53 64 79 59 49 59
[100] 64
> table(mds[,11])

   0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19 
3123 6441 3856 2884 1968 1615 1386 1088 1098  721  943  681  511  380  426  835  571  555  719  653 
  20   21   22   23   24   25   26   27   28   29   30   31   32   33   34   35   36   37   38   39 
 879  715  672  631  655  773  680  713  769  538  685  566  729  702  652  766  683  723  821  675 
  40   41   42   43   44   45   46   47   48   49   50   51   52   53   54   55   56   57   58   59 
 774  650  908  892  784  925  781 1043 1161  924 1087  827 1261 1356 1297 1272 1277 1614 1831 1523 
  60   61   62   63   64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79 
1702 1251 1954 2157 1901 2090 1874 2705 3085 2529 2488 1777 2701 2586 2308 2020 1801 2269 2486 1856 
  80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96   97   98   99 
1762 1047 1413 1326  967 1013  753  870  884  531  601  277  364  301  193  288  149  174  169  470 
 100  101  102  103  104  105  106  107  108  114  115  117  118  120  125 
  15    2    5    7    2    4    1    1    2    1    1    2    2    2    1 
> mode(mds[,11])
[1] "numeric"

> mds[1,11] %in% c(0:4)
[1] TRUE
> if (mds[1,11] %in% c(0:4)) {y <- "0-4"}
> y
[1] "0-4"

> xx <- matrix(trunc(runif(30,0,125)),15,2)
> aassign <- function(x){
+   y <- NA
+   if (x[2] %in% c(0:4)) {y <- "0-4"}
+   else if (x[2] %in% c(5:14)) {y <- "5-14" }
+   else if (x[2] %in% c(15:29)) {y <- "15-29" }
+   else if (x[2] %in% c(30:69)) {y <- "30-69"}
+   else if (x[2] %in% c(70:79)) {y <- "70-79"}
+   else if (x[2] %in% c(80:125)) {y <- "80+"}
+   return(y)
+ }
> jj <- apply(xx,1,FUN=aassign)
> t(xx)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,]   23   98  107   94   76  103  106   40   66    11   109   101    96    37    18
[2,]   11   57   58   91   43  123  103   77    4    79    64    10     8   105    76
> jj
 [1] "5-14"  "30-69" "30-69" "80+"   "30-69" "80+"   "80+"   "70-79" "0-4"   "70-79" "30-69" "5-14" 
[13] "5-14"  "80+"   "70-79"
> 
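
For comparison, a minimal sketch of the same classification done on the age column directly, without apply() or a loop. It uses cut() from base R, which is a swapped-in technique and not the approach in the post. The vector `age` is made-up data standing in for mds[,11], and the names `breaks`, `labels` and `age_class` are chosen here for illustration. The upper break of 126 is an assumption made so that integer ages above 125 still come back as NA, as they would from ageassign().

```r
# Hypothetical ages; in the post these would come from mds[, 11].
age <- c(3, 82, 40, 1, 15, 6, 79, 125, 0, 14)

# Lower bounds of each class; right = FALSE gives intervals [0,5), [5,15), ...
breaks <- c(0, 5, 15, 30, 70, 80, 126)
labels <- c("0-4", "5-14", "15-29", "30-69", "70-79", "80+")

# cut() returns a factor; convert to character to match ageassign()'s output.
age_class <- as.character(cut(age, breaks = breaks, labels = labels,
                              right = FALSE))
print(age_class)
```

Because cut() works on the whole numeric vector at once, the values are never converted to strings, and it should scale to 123,000 records without difficulty.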



