[R] Conditional looping over a set of variables in R

Tue Oct 26 08:41:24 CEST 2010

Hi

r-help-bounces at r-project.org wrote on 25.10.2010 20:41:55:

> Adrienne, there's one glitch when I implement your solution below. When the
> loop encounters a case with no data at all (that is, all 140 item responses
> are missing), it aborts and prints this error message: " ERROR: argument is
> of length zero".
>
> I wonder if there's a logical condition I could add that would enable R to
> skip these empty cases and continue executing on the next case that
> contains data.
> 
> Thanks, Dave
> 
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com
> 
> 
> 
> From: wootten.adrienne at gmail.com [mailto:wootten.adrienne at gmail.com]
> On Behalf Of Adrienne Wootten
> Sent: Friday, October 22, 2010 9:09 AM
> To: David Herzberg
> Cc: r-help at r-project.org
> Subject: Re: [R] Conditional looping over a set of variables in R
> 
> David,
> 
> here I'm referring to your data as testmat, a matrix of 140 columns and 1500
> rows, but the same or similar notation can be applied to data frames in R.
> If I understand correctly, you are looking for the first response (column)
> where you got a value of 1.  I'm assuming also that since your missing values
> are characters then your two numeric values are also characters.  keeping
> all this in mind, try something like this.

If you only need to know which column in each row holds the first
occurrence of 1 (or any other value), you can drop the loop and use R's
vectorized tools instead.

> set.seed(111)
> mat<-matrix(sample(1:3, 20, replace=T),5,4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]    3    1    2    1
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2
> mat.w<-which(mat==1, arr.ind=T)
> tapply(mat.w[,2], mat.w[,1], min)
2 3 4 5 
2 3 3 2 
> mat[2, ]<-NA
> mat
     [,1] [,2] [,3] [,4]
[1,]    2    2    2    2
[2,]   NA   NA   NA   NA
[3,]    2    2    1    3
[4,]    2    2    1    1
[5,]    2    1    1    2

This approach also handles NA values smoothly; the all-NA row simply drops
out of the result:

> mat.w<-which(mat==1, arr.ind=T)
> tapply(mat.w[,2], mat.w[,1], min)
3 4 5 
3 3 2 
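
Note that the tapply result has entries only for rows that contain a 1. If
you need one value per case (for example to add it as a new column), the
following is a minimal sketch of one way to fill it in, with NA for cases
that have no 1. The names first and first.col are only illustrative, and the
matrix is typed in by hand so the example does not depend on the random
number generator.

```r
# Same matrix as above, after row 2 has been set to NA
mat <- rbind(c( 2,  2,  2,  2),
             c(NA, NA, NA, NA),
             c( 2,  2,  1,  3),
             c( 2,  2,  1,  1),
             c( 2,  1,  1,  2))

# Row/column positions of every 1; NA cells are ignored by which()
mat.w <- which(mat == 1, arr.ind = TRUE)

# Smallest column index per row that has at least one 1
first.col <- tapply(mat.w[, 2], mat.w[, 1], min)

# One slot per case, NA by default; the names of first.col are row numbers
first <- rep(NA_integer_, nrow(mat))
first[as.integer(names(first.col))] <- first.col
first
```

Rows 1 and 2 stay NA here, because row 1 has no 1 and row 2 has no data at
all.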

You can then modify this output as needed, since it carries both the row
and the column information. I am sure there are other, maybe better,
options, e.g.

> lll<-as.list(as.data.frame(t(mat)))
> unlist(lapply(lll, function(x) min(which(x==1))))
 V1  V2  V3  V4  V5 
Inf Inf   3   3   2
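
The Inf values above come from calling min() on an empty result. If NA would
be more convenient for cases without any 1, a possible variant is to take
the first element of which() instead. The sketch below also shows how a
plain loop could skip empty cases with next, which is what Dave asked about.
The names first_one, first and first.loop are only illustrative, and I
assume here that the data are numeric with NA for missing values.

```r
# Same matrix as above, typed in by hand
mat <- rbind(c( 2,  2,  2,  2),
             c(NA, NA, NA, NA),
             c( 2,  2,  1,  3),
             c( 2,  2,  1,  1),
             c( 2,  1,  1,  2))

# Index of the first 1 in a vector; NA if there is none
first_one <- function(x) which(x == 1)[1]

# Apply it to every row
first <- apply(mat, 1, first_one)

# Loop version: skip cases where every response is missing
first.loop <- rep(NA_integer_, nrow(mat))
for (i in seq_len(nrow(mat))) {
  if (all(is.na(mat[i, ]))) next        # empty case, go to the next one
  hit <- which(mat[i, ] == 1)           # columns holding a 1 in this row
  if (length(hit) > 0) first.loop[i] <- hit[1]
}
```

Both versions should give the same vector, with NA for the first two cases
and the column numbers 3, 3 and 2 for the remaining ones.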

Regards
Petr

> 
> # your extra variable which will eventually contain the first correct
> # response for each case
> first = c()
> 
> for(i in 1:nrow(testmat)){
> 
> c = 1
> 
> while( c<=ncol(testmat) | testmat[i,c] != "1" ){
> 
> if( testmat[i,c] == "1"){
> 
> first[i] = c
> break # will exit the while loop once it finds the first correct answer,
>       # and then jump to the next case
> 
>  } else {
> 
> c=c+1 # proceed to the next column if not
> 
> }
> 
> }
> 
> }
> 
> 
> Hope this helps you out a bit.
> 
> Adrienne Wootten
> NCSU
> 
> On Fri, Oct 22, 2010 at 11:33 AM, David Herzberg
> <davidh at wpspublish.com<mailto:davidh at wpspublish.com>> wrote:
> Here's the problem I'm trying to solve in R: I have a data frame that consists
> of about 1500 cases (rows) of data from kids who took a test of listening
> comprehension. The columns are their scores (1 = correct, 0 = incorrect, . =
> missing) on 140 test items. The items are numbered sequentially and are
> ordered by increasing difficulty as you go from left to right across the
> columns. I want R to go through the data and find the first correct response
> for each case. Because of basal and ceiling rules, many cases have missing
> data on many items before the first correct response appears.
>
> For each case, I want R to evaluate the item responses sequentially starting
> with item 1. If the score is 0 or missing, proceed to the next item and
> evaluate it. If the score is 1, stop the operation for that case, record the
> item number of that first correct response in a new variable, proceed to the
> next case, and restart the operation.
>
> In SPSS, this operation would be carried out with LOOP, VECTOR, and DO IF, as
> follows (assuming the data set is already loaded):
> 
> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT 
> RESPONSE, SET IT EQUAL TO 0.
> numeric LCfirst1.
> comp LCfirst1 = 0.
> 
> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
> vector x=LC1a_score to LC140a_score.
> 
> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS LCfirst1 = 0. "#i" IS
> AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME THE LOOP RUNS.
> loop #i=1 to 140 if (LCfirst1 = 0).
> 
> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH ELEMENT OF
> THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES THE FIRST ELEMENT OF
> THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM RESPONSES). AS THE LOOP RUNS
> AND #i INCREASES, SUBSEQUENT VECTOR ELEMENTS ARE EVALUATED. THE do if
> STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE VECTOR UNTIL A '1' IS
> ENCOUNTERED.
> + do if x(#i) = 1.
> 
> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT, WHICH
> RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
> + comp x(#i) = 99.
> 
> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE VALUE OF
> LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM NUMBER OF THE
> FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE OF LCfirst1 ALSO
> CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND THE PROGRAM MOVES TO THE
> NEXT CASE AND RESTARTS THE LOOP.
> + comp LCfirst1 = #i.
> + end if.
> end loop.
> exe.
> 
> After several hours of trying to translate this procedure to R, I'm stumped. I
> played around with creating a list to hold the item responses variables
> (analogous to 'vector' in SPSS), but when I tried to use the list in an R
> procedure, I kept getting a warning along the lines of  'the list contains > 1
> element, only the first element will be used'. So perhaps a list is not the
> appropriate class to 'hold' these variables?
>
> It seems that some nested arrangement of 'for' 'while' and/or 'lapply' will
> allow me to recreate the operation described above? How do I set up the
> indexing operation analogous to 'loop #i' in SPSS?
>
> Any help is appreciated, and I'm happy to provide more information if needed.
> 
> David S. Herzberg, Ph.D.
> Vice President, Research and Development
> Western Psychological Services
> 12031 Wilshire Blvd.
> Los Angeles, CA 90025-1251
> Phone: (310)478-2061 x144
> FAX: (310)478-7838
> email: davidh at wpspublish.com<mailto:davidh at wpspublish.com>
> 
> 
> 
>        [[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org<mailto:R-help at r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.