[R] Replace missing value within group with non-missing value

Leask, Graham g.leask at aston.ac.uk
Sun Apr 7 10:22:03 CEST 2013


Hi Bill,

Thank you for your suggestion.

I shall try running the code and test as you suggest.

Is there a straightforward way to routinely test the structure of a complex survey data set such as this?
For example, with a multinomial choice model such as this, the data are only correct if each
observation set of, say, 6 choices contains exactly 1 selected choice and 1 non-missing month. This can,
however, be difficult to check when dealing with very large datasets.

Presumably, if more than one choice in a set is positive, this will show up as the model failing to converge
due to singularity, but it should have been detected at the data cleaning stage.
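
A minimal sketch of the kind of check meant here, assuming the column names dn, obs, choice and mth from the sample data in this thread. The helper name check_choice_sets and the toy data are hypothetical, made up for illustration only:

```r
# Hypothetical helper (not from any package): list the choice sets that do
# not have exactly one selected choice and exactly one non-missing mth.
check_choice_sets <- function(dat) {
    # one factor level per (dn, obs) combination that actually occurs
    key <- interaction(dat$dn, dat$obs, drop = TRUE)
    nChosen <- tapply(dat$choice %in% 1, key, sum)   # selected choices per set
    nMonths <- tapply(!is.na(dat$mth), key, sum)     # non-missing mth per set
    res <- data.frame(set = names(nChosen),
                      nChosen = as.vector(nChosen),
                      nMonths = as.vector(nMonths),
                      stringsAsFactors = FALSE)
    res[res$nChosen != 1 | res$nMonths != 1, , drop = FALSE]
}

# Made-up data: set 4.1 is fine, 4.2 has two choices, 4.3 has none
toy <- data.frame(dn = 4, obs = rep(1:3, each = 3),
                  choice = c(0, 0, 1,  1, 1, 0,  0, 0, 0),
                  mth = c(NA, NA, 487,  488, 488, NA,  NA, NA, NA))
bad <- check_choice_sets(toy)
```

A result with zero rows would mean every set has exactly one selected choice and one non-missing month.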

Best wishes


Graham

-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com] 
Sent: 06 April 2013 22:49
To: Rui Barradas; Leask, Graham
Cc: r-help at r-project.org
Subject: RE: [R] Replace missing value within group with non-missing value

> Anyway, try replacing the lapply instruction with this.
> 
> tmp <- lapply(sp, function(x){
> 		idx <- which(!is.na(x$mth))[1]
> 		if(length(idx) > 0)
> 			x$mth <- x$mth[idx]
> 		x
> 	})

Note that
    which(anyLogicalVector)[1]
always has length 1, because of the subscript [1], so the 'if' statement may as well be omitted.
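
A small illustration of that point, as a sketch using only base R: when there is no TRUE value at all, subscripting the empty result of which() with [1] still gives a vector of length 1, and its single element is NA.

```r
idx_none <- which(c(FALSE, FALSE))[1]  # no TRUE values at all
idx_some <- which(c(FALSE, TRUE))[1]   # first TRUE is in position 2

length(idx_none)   # the [1] subscript pads the empty result to length 1
is.na(idx_none)    # and that single element is NA
idx_some
```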

There are 3 cases the above code does not detect or deal with.
  (a) nrow(x)==0
  (b) all(is.na(x$mth))
  (c) length(which(!is.na(x$mth))) > 1
Case (a) causes the function to stop in the way you saw:
  > f <- function(x) { # the function passed to lapply
  +    idx <- which(!is.na(x$mth))[1]
  +    if (length(idx) > 0)
  +       x$mth <- x$mth[idx]
  +    x
  + }
  > f(data.frame(mth=integer()))
  Error in `$<-.data.frame`(`*tmp*`, "mth", value = NA_integer_) : 
    replacement has 1 rows, data has 0
but (b) and (c) may indicate some errors in your data and cause some surprises down the line.
  >  f(data.frame(mth=c(NA,NA)))
    mth
  1  NA
  2  NA
  >  f(data.frame(mth=c(NA,2,3)))
   mth
  1   2
  2   2
  3   2
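
One way empty groups like case (a) can arise (an assumption about the full data set, not something confirmed in this thread) is that split() on a list of several factors creates a group for every combination of levels, including combinations that never occur. A minimal sketch with made-up data, using the drop = TRUE argument of split() to keep only the combinations that are present:

```r
# Made-up data: doctor 5 has no observation number 2
dat_toy <- data.frame(dn = c(4, 4, 5), obs = c(1, 2, 1), mth = c(487, NA, 490))

sp_all  <- split(dat_toy, list(dat_toy$dn, dat_toy$obs))               # all combinations
sp_drop <- split(dat_toy, list(dat_toy$dn, dat_toy$obs), drop = TRUE)  # observed ones only

sapply(sp_all, nrow)    # includes a 0-row group for dn 5, obs 2
sapply(sp_drop, nrow)   # no 0-row groups
```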

You could have your code check whether there is exactly one non-missing value for mth in each non-empty group and warn if that assumption is not true for some group (while still returning some reasonable result). The following does
that:
f2 <- function (x)  {
    idx <- !is.na(x$mth) # logical vector with length nrow(x)
    nNotNA <- sum(idx)
    if (nNotNA > 1) {
        warning("more than one non-missing mth value in group, using the first")
        idx[cumsum(idx) > 1] <- FALSE
    }
    else if (nrow(x) > 0 && nNotNA == 0) {
        warning("no non-missing values in group, all mth values will be NA")
        idx[1] <- TRUE
    }
    x$mth <- x$mth[idx]
    x
}

The warning messages do not say where in 'sp' the problem arose.  You could change your lapply call so that the group number is included in the warning:
   lapply(seq_along(sp), function(i) {
      x <- sp[[i]]
      ... same code as in f2, but add the group number, i,  to the end of warnings ...
           warning("more than one ... in group number ", i)
      ...
   })
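
Spelled out in full, the outline above might look like the following sketch. The wrapper name fill_mth_by_group and the toy data are hypothetical; the body is the f2 logic with the group number i appended to each warning.

```r
# Hypothetical wrapper: apply the f2 logic to each group in 'sp' and
# report the group number in any warning.
fill_mth_by_group <- function(sp) {
    lapply(seq_along(sp), function(i) {
        x <- sp[[i]]
        idx <- !is.na(x$mth)   # logical vector with length nrow(x)
        nNotNA <- sum(idx)
        if (nNotNA > 1) {
            warning("more than one non-missing mth value in group number ", i,
                    ", using the first", call. = FALSE)
            idx[cumsum(idx) > 1] <- FALSE
        } else if (nrow(x) > 0 && nNotNA == 0) {
            warning("no non-missing mth values in group number ", i,
                    ", all mth values will be NA", call. = FALSE)
            idx[1] <- TRUE
        }
        x$mth <- x$mth[idx]
        x
    })
}

# Made-up data: group 1 is fine, group 2 has two months, group 3 has none
dat_toy <- data.frame(dn = 4, obs = rep(1:3, each = 2),
                      mth = c(NA, 487, 488, 489, NA, NA))
sp_toy <- split(dat_toy, list(dat_toy$dn, dat_toy$obs), drop = TRUE)
```

Calling do.call(rbind, fill_mth_by_group(sp_toy)) would then reassemble the data frame, with one warning for each problem group.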

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Rui Barradas
> Sent: Saturday, April 06, 2013 10:24 AM
> To: Leask, Graham
> Cc: r-help at r-project.org
> Subject: Re: [R] Replace missing value within group with non-missing 
> value
> 
> Hello,
> 
> I've just run my code with your data and found no error. Anyway, try 
> replacing the lapply instruction with this.
> 
> 
> tmp <- lapply(sp, function(x){
> 		idx <- which(!is.na(x$mth))[1]
> 		if(length(idx) > 0)
> 			x$mth <- x$mth[idx]
> 		x
> 	})
> 
> 
> Rui Barradas
> 
> On 06-04-2013 18:12, Leask, Graham wrote:
> > Hi Arun,
> >
> > How odd. Directly pasting the code from your email precisely repeats the error.
> > See below. Any thoughts on the cause of this anomaly?
> >
> >> dput(head(dat,50))
> > structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 1, 1, 1, 1, 2, 2, 
> > 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 
> > 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9), choice = 
> > c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
> > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
> > 0, 0, 1, 0, 0), br = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 
> > 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 
> > 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA, 
> > 487, NA, NA, 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489, 
> > NA, NA, NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA, 
> > NA, NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names = 
> > c("dn", "obs", "choice", "br", "mth"), row.names = c("1", "2", "3", 
> > "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
> > "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
> > "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
> > "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
> > "49", "50"), class = "data.frame")
> >> sp <- split(dat, list(dat$dn, dat$obs))
> >>   names(sp) <- NULL
> >>   tmp <- lapply(sp, function(x){
> > +          idx <- which(!is.na(x$mth))[1]
> > +          x$mth <- x$mth[idx]
> > +          x
> > +      })
> > Error in `$<-.data.frame`(`*tmp*`, "mth", value = NA_real_) :
> >    replacement has 1 rows, data has 0
> >>   head(do.call(rbind, tmp),7)
> > Error in do.call(rbind, tmp) : object 'tmp' not found
> >
> > Best wishes
> >
> >
> > Graham
> >
> > -----Original Message-----
> > From: arun [mailto:smartpink111 at yahoo.com]
> > Sent: 06 April 2013 17:25
> > To: Leask, Graham
> > Cc: Rui Barradas
> > Subject: Re: [R] Replace missing value within group with non-missing 
> > value
> >
> > Hello,
> > By running Rui's code, I am getting this:
> > sp <- split(dat, list(dat$dn, dat$obs))
> >   names(sp) <- NULL
> >   tmp <- lapply(sp, function(x){
> >           idx <- which(!is.na(x$mth))[1]
> >           x$mth <- x$mth[idx]
> >           x
> >       })
> >   head(do.call(rbind, tmp),7)
> >     dn obs choice br mth
> > 1   4   1      0  1 487
> > 2   4   1      0  2 487
> > 3   4   1      0  3 487
> > 4   4   1      0  4 487
> > 5   4   1      0  5 487
> > 6   4   1      1  6 487
> > 7   4   2      0  1 488
> >
> > Couldn't reproduce the error you cited.
> > A.K.
> >
> >
> >
> >
> > ----- Original Message -----
> > From: "Leask, Graham" <g.leask at aston.ac.uk>
> > To: Rui Barradas <ruipbarradas at sapo.pt>
> > Cc: "r-help at r-project.org" <r-help at r-project.org>
> > Sent: Saturday, April 6, 2013 12:16 PM
> > Subject: Re: [R] Replace missing value within group with non-missing 
> > value
> >
> > Hi Rui,
> >
> > Data as follows
> >
> > structure(list(dn = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
> > 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), obs = c(1, 1, 1, 1, 1, 1, 2, 2, 
> > 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 
> > 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 9, 9), choice = 
> > c(0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 
> > 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 
> > 0, 0, 1, 0, 0), br = c(1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 
> > 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 
> > 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2), mth = c(NA, NA, NA, NA, NA, 
> > 487, NA, NA, 488, NA, NA, NA, NA, NA, NA, NA, NA, 488, NA, NA, 489, 
> > NA, NA, NA, NA, NA, NA, NA, NA, 489, NA, NA, NA, NA, NA, 489, NA, 
> > NA, NA, NA, NA, 490, NA, NA, NA, NA, NA, 491, NA, NA)), .Names = 
> > c("dn", "obs", "choice", "br", "mth"), row.names = c("1", "2", "3", 
> > "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
> > "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", 
> > "27", "28", "29", "30", "31", "32", "33", "34", "35", "36", "37", 
> > "38", "39", "40", "41", "42", "43", "44", "45", "46", "47", "48", 
> > "49", "50"), class = "data.frame")
> >
> > Best wishes
> >
> >
> > Graham
> >
> > -----Original Message-----
> > From: Rui Barradas [mailto:ruipbarradas at sapo.pt]
> > Sent: 06 April 2013 16:32
> > To: Leask, Graham
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Replace missing value within group with non-missing 
> > value
> >
> > Hello,
> >
> > Can't you post a data example? If your dataset is named 'dat' use
> >
> > dput(head(dat, 50))  # paste the output of this in a post
> >
> >
> > Rui Barradas
> >
> > On 06-04-2013 15:34, Leask, Graham wrote:
> >> Hi Rui,
> >>
> >> Thank you for your suggestion, which is very much appreciated.
> >> Unfortunately, running this code produces the following error.
> >>
> >> error in '$<-.data.frame' ('*tmp*', "mth", value = NA_real_) :
> >>        replacement has 1 rows, data has 0
> >>
> >> I'm sure there must be an elegant solution to this problem?
> >>
> >> Best wishes
> >>
> >>
> >>
> >> Graham
> >>
> >> On 6 Apr 2013, at 12:15, "Rui Barradas" <ruipbarradas at sapo.pt> wrote:
> >>
> >>> Hello,
> >>>
> >>> That's not a very good way of posting your data; preferably paste
> >>> the output of ?dput in a post.
> >>> Something along the lines of the following might do what you want.
> >>> It seems that the groups are established by 'dn' and 'obs' numbers.
> >>> If so, try
> >>>
> >>>
> >>> # Make up some data
> >>> dat <- data.frame(dn = 4, obs = rep(1:5, each = 6), mth = NA)
> >>> dat$mth[6] <- 487
> >>> dat$mth[9] <- 488
> >>> dat$mth[18] <- 488
> >>> dat$mth[21] <- 489
> >>> dat$mth[30] <- 489
> >>>
> >>>
> >>> sp <- split(dat, list(dat$dn, dat$obs))
> >>> names(sp) <- NULL
> >>> tmp <- lapply(sp, function(x){
> >>>           idx <- which(!is.na(x$mth))[1]
> >>>           x$mth <- x$mth[idx]
> >>>           x
> >>>       })
> >>> do.call(rbind, tmp)
> >>>
> >>>
> >>> Hope this helps,
> >>>
> >>> Rui Barradas
> >>>
> >>>
> >>> On 06-04-2013 11:33, Leask, Graham wrote:
> >>>> Dear List members
> >>>>
> >>>> I have a large dataset organised in choice groups; see the sample
> >>>> below.
> >>>>
> >>>>
> >>>>      +---------------------------------------------------------------------------------------+
> >>>>      | dn  obs  choice     acid  br                date      cdate  situat~n  mth  year  set |
> >>>>      |---------------------------------------------------------------------------------------|
> >>>>   1. |  4    1       0    LOSEC   1                   .          .              .     .    1 |
> >>>>   2. |  4    1       0   NEXIUM   2                   .          .              .     .    1 |
> >>>>   3. |  4    1       0   PARIET   3                   .          .              .     .    1 |
> >>>>   4. |  4    1       0  PROTIUM   4                   .          .              .     .    1 |
> >>>>   5. |  4    1       0   ZANTAC   5                   .          .              .     .    1 |
> >>>>   6. |  4    1       1    ZOTON   6  23aug2000 01:00:00  23aug2000        NS  487  2000    1 |
> >>>>   7. |  4    2       0    LOSEC   1                   .          .              .     .    2 |
> >>>>   8. |  4    2       0   NEXIUM   2                   .          .              .     .    2 |
> >>>>   9. |  4    2       1   PARIET   3  25sep2000 01:00:00  25sep2000         L  488  2000    2 |
> >>>>  10. |  4    2       0  PROTIUM   4                   .          .              .     .    2 |
> >>>>  11. |  4    2       0   ZANTAC   5                   .          .              .     .    2 |
> >>>>  12. |  4    2       0    ZOTON   6                   .          .              .     .    2 |
> >>>>  13. |  4    3       0    LOSEC   1                   .          .              .     .    3 |
> >>>>  14. |  4    3       0   NEXIUM   2                   .          .              .     .    3 |
> >>>>  15. |  4    3       0   PARIET   3                   .          .              .     .    3 |
> >>>>  16. |  4    3       0  PROTIUM   4                   .          .              .     .    3 |
> >>>>  17. |  4    3       0   ZANTAC   5                   .          .              .     .    3 |
> >>>>  18. |  4    3       1    ZOTON   6  20sep2000 00:00:00  20sep2000         R  488  2000    3 |
> >>>>  19. |  4    4       0    LOSEC   1                   .          .              .     .    4 |
> >>>>  20. |  4    4       0   NEXIUM   2                   .          .              .     .    4 |
> >>>>  21. |  4    4       1   PARIET   3  27oct2000 00:00:00  27oct2000        NL  489  2000    4 |
> >>>>  22. |  4    4       0  PROTIUM   4                   .          .              .     .    4 |
> >>>>  23. |  4    4       0   ZANTAC   5                   .          .              .     .    4 |
> >>>>  24. |  4    4       0    ZOTON   6                   .          .              .     .    4 |
> >>>>  25. |  4    5       0    LOSEC   1                   .          .              .     .    5 |
> >>>>  26. |  4    5       0   NEXIUM   2                   .          .              .     .    5 |
> >>>>  27. |  4    5       0   PARIET   3                   .          .              .     .    5 |
> >>>>  28. |  4    5       0  PROTIUM   4                   .          .              .     .    5 |
> >>>>  29. |  4    5       0   ZANTAC   5                   .          .              .     .    5 |
> >>>>  30. |  4    5       1    ZOTON   6  23oct2000 03:00:00  23oct2000        NS  489  2000    5 |
> >>>>      +---------------------------------------------------------------------------------------+
> >>>>
> >>>> I wish to fill in the missing values in each choice set, which is
> >>>> delineated by dn (Doctor), obs (Observation number) and choices (1 to 6).
> >>>> For each choice set one choice is chosen, and that row contains the full
> >>>> time information for the choice set, i.e. in set 1 choice 6 was chosen
> >>>> and shows the month 487. The other 5 choices show mth as missing. I want
> >>>> to fill these with the correct mth.
> >>>>
> >>>> I am sure there must be an elegant way to do this in R?
> >>>>
> >>>>
> >>>> Best wishes
> >>>>
> >>>>
> >>>>
> >>>> Graham
> >>>>
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list 
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.


