[R] Conditional looping over a set of variables in R

David Herzberg davidh at wpspublish.com
Fri Oct 22 19:36:10 CEST 2010


Bill, thanks so much for this. I'll get a chance to test it later today, and will post the outcome.


David S. Herzberg, Ph.D.
Vice President, Research and Development 
Western Psychological Services
12031 Wilshire Blvd.
Los Angeles, CA 90025-1251
Phone: (310)478-2061 x144
FAX: (310)478-7838
email: davidh at wpspublish.com



-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com] 
Sent: Friday, October 22, 2010 9:52 AM
To: David Herzberg; r-help at r-project.org
Subject: RE: [R] Conditional looping over a set of variables in R

You were a bit vague about the format of your data.
I'm assuming all columns were numeric and the entries are one of 0, 1, and NA (missing value).  I made a little function to generate random data of that format for testing purposes:

makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) {
    # pMissing is the proportion of missing values
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), 
        nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

E.g.,

  > set.seed(168)
  > d <- makeData(15,3)
  > d
      X1 X2 X3
   1   1  1  1
   2   0  0 NA
   3   0  1  0
   4   0  0 NA
   5   0  1  1
   6   0  0 NA
   7   1  0  0
   8   0  1  1
   9   0  0  1
  10   1  1 NA
  11   0  0  1
  12   0  0  0
  13  NA NA NA
  14   0  0  0
  15   1  0  0
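
If the real data are exported from SPSS with '.' standing for a missing score, the '.' has to become NA on the way into R before any of this applies. A minimal sketch, assuming a comma-separated export: the column names are borrowed from the SPSS syntax in the original post, and the three rows of data are made up for illustration.

```r
# Hypothetical three-item extract; "." marks a missing score.
txt <- "LC1a_score,LC2a_score,LC3a_score
1,1,1
0,0,.
.,.,."

# na.strings = "." turns the SPSS-style missing marker into NA,
# so the columns are read as numbers rather than as text.
items <- read.csv(text = txt, na.strings = ".")
str(items)
```

With a real file, the same na.strings argument would be passed along with the file name instead of text=.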

I think the following function does what you want.
The algorithm is pretty similar to what you showed.

  columnOfFirstOne <- function(data) {
      # col will be return value, one entry per row of data.
      # Fill it with NA's: NA in output will mean there were no 1's in row
      col <- rep(as.integer(NA), nrow(data))
      for (j in seq_len(ncol(data))) { # loop over columns
          # For each entry in 'col', if it has not been set yet
          # and the entry in the j'th column of data is 1 (and not
          # missing), then set it to the column number.
          col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
      }
      col # return this from function
  }

With the above data we get
  > columnOfFirstOne(d)
   [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1
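
For comparison, here is a row-by-row version that follows the SPSS logic more literally (look along each case until the first 1 turns up). It is only a sketch: firstOneByRow is a name made up for this example, and the small data frame is invented so the result can be checked by eye.

```r
firstOneByRow <- function(data) {
    m <- as.matrix(data)
    # which() ignores NA comparisons, and [1] gives NA when the
    # row contains no 1 at all.
    unname(apply(m, 1, function(scores) which(scores == 1)[1]))
}

# Made-up data: four cases, three items.
small <- data.frame(X1 = c(1, 0, 0, NA),
                    X2 = c(1, 0, 1, NA),
                    X3 = c(1, NA, 0, NA))
res <- firstOneByRow(small)
print(res)
```

This calls an R function once per row, so it will probably be slower than looping over the 140 columns as columnOfFirstOne does, but it may be easier to read next to the SPSS code.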

It seems quick enough for a dataset of your size
  > dd <- makeData(nrow=1500, ncol=140)
  > system.time(columnOfFirstOne(dd)) # time in seconds
     user  system elapsed 
     0.08    0.00    0.08
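
If the loop ever does become a bottleneck, a loop-free variant is possible with max.col(). This is a sketch under the same assumptions about the data (entries are 0, 1, or NA). firstOneMaxCol and the small data frame are invented for the example; LCfirst1 is the variable name from the SPSS code.

```r
firstOneMaxCol <- function(data) {
    m <- as.matrix(data)
    # TRUE only where the score is present and equal to 1.
    hit <- !is.na(m) & m == 1
    # Leftmost maximum in each row of the 0/1 matrix, i.e. the first 1.
    col <- max.col(hit + 0L, ties.method = "first")
    # Rows with no 1 at all would otherwise report column 1.
    col[rowSums(hit) == 0] <- NA_integer_
    col
}

# Made-up data: four cases, three items.
small <- data.frame(X1 = c(1, 0, 0, NA),
                    X2 = c(1, 0, 1, NA),
                    X3 = c(1, NA, 0, NA))

# Store the result as a new variable, using only the item columns.
itemCols <- c("X1", "X2", "X3")
small$LCfirst1 <- firstOneMaxCol(small[, itemCols])
print(small)
```

Selecting the item columns explicitly matters once LCfirst1 has been added to the data frame, since otherwise the new column would be scanned as if it were an item.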
 
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of David Herzberg
> Sent: Friday, October 22, 2010 8:34 AM
> To: r-help at r-project.org
> Subject: [R] Conditional looping over a set of variables in R
> 
> Here's the problem I'm trying to solve in R: I have a data frame that 
> consists of about 1500 cases (rows) of data from kids who took a test 
> of listening comprehension. The columns are their scores (1 = correct, 
> 0 = incorrect,  . = missing) on 140 test items. The items are numbered 
> sequentially and are ordered by increasing difficulty as you go from 
> left to right across the columns. I want R to go through the data and 
> find the first correct response for each case. Because of basal and 
> ceiling rules, many cases have missing data on many items before the 
> first correct response appears.
> 
> For each case, I want R to evaluate the item responses sequentially 
> starting with item 1. If the score is 0 or missing, proceed to the 
> next item and evaluate it. If the score is 1, stop the operation for 
> that case, record the item number of that first correct response in a 
> new variable, proceed to the next case, and restart the operation.
> 
> In SPSS, this operation would be carried out with LOOP, VECTOR, and DO 
> IF, as follows (assuming the data set is already loaded):
> 
> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT 
> RESPONSE, SET IT EQUAL TO 0.
> numeric LCfirst1.
> comp LCfirst1 = 0.
> 
> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
> vector x=LC1a_score to LC140a_score.
> 
> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
> LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME 
> THE LOOP RUNS.
> loop #i=1 to 140 if (LCfirst1 = 0).
> 
> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH 
> ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES 
> THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM 
> RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR 
> ELEMENTS ARE EVALUATED.
> THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE 
> VECTOR UNTIL A '1' IS ENCOUNTERED.
> + do if x(#i) = 1.
> 
> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, 
> WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
> + comp x(#i) = 99.
> 
> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE 
> VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM 
> NUMBER OF THE FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE 
> OF LCfirst1 ALSO CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND 
> THE PROGRAM MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
> + comp LCfirst1 = #i.
> + end if.
> end loop.
> exe.
> 
> After several hours of trying to translate this procedure to R, I'm 
> stumped. I played around with creating a list to hold the item 
> responses variables (analogous to 'vector' in SPSS), but when I tried 
> to use the list in an R procedure, I kept getting a warning along the 
> lines of  'the list contains > 1 element, only the first element will 
> be used'. So perhaps a list is not the appropriate class to 'hold' 
> these variables?
> 
> It seems that some nested arrangement of 'for' 'while' and/or 'lapply' 
> will allow me to recreate the operation described above? How do I set 
> up the indexing operation analogous to 'loop #i' in SPSS?
> 
> Any help is appreciated, and I'm happy to provide more information if 
> needed.
> 
> David S. Herzberg, Ph.D.
> Vice President, Research and Development Western Psychological 
> Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com
> 
> 
> 


