[R] Advice on recoding a variable depending on another which contains NAs

Anthony Staines anthony.staines at dcu.ie
Sun Nov 20 00:31:04 CET 2011

Dear colleagues,

I would be very grateful for your help with the following. I 
have banged my head off this question several times in the 
past, and repeatedly over the last week. I have looked in 
the usual places and found no obvious solution, though I fear 
that may just mean I failed to recognize one when I saw it.

I am scoring 8000 psychometric tests (the SCQ, if you have 
heard of it). On this test the scoring rule depends on a 
single variable, SCQ1: if it is answered yes, the final score 
is a function of 39 variables; if no, of 31 variables.

I've calculated both of these scores (SCQScore1 and 
SCQScore2) for all the children in my study, and I wish to 
create a final score that is SCQScore1 when SCQ1 is 1 (yes) 
and SCQScore2 when SCQ1 is 2 (no). SCQ1 also has missing 
values, and for the moment I have chosen to set the final 
score to SCQScore1 for these. [[This is a debatable choice, 
but I am not asking your advice on that choice!]]
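The rule above can be written as a single vectorized expression with nested `ifelse()`. This is a minimal sketch, assuming SCQ1 is coded 1 (yes), 2 (no) or NA; the small data frame is made up for illustration, and the sentinel 99 is kept for any other code. The key point is that `is.na(x) | x == 1` is never NA, because `TRUE | NA` evaluates to `TRUE`.

```r
# Made-up toy data standing in for the real data frame `d`
d <- data.frame(
  SCQ1      = c(1, 2, NA, 2, 3),   # 3 is a deliberately miscoded value
  SCQScore1 = c(10, 12, 7, 15, 9),
  SCQScore2 = c(8, 11, 6, 13, 9)
)

# Missing SCQ1 or SCQ1 == 1 -> SCQScore1 (the outer test has no NAs).
# SCQ1 == 2 -> SCQScore2; any other code -> sentinel 99.
d$SCQScore <- ifelse(is.na(d$SCQ1) | d$SCQ1 == 1,
                     d$SCQScore1,
                     ifelse(d$SCQ1 == 2, d$SCQScore2, 99))

print(d$SCQScore)   # 10 11 7 13 99
print(d$SCQ1)       # the NA in SCQ1 is left in place
```

`ifelse()` evaluates both branches in full and then picks element by element, which is fine for a few thousand rows.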

## Distinct value for any SCQ1 values I've missed
d$SCQScore <- 99

## Talks using phrases/sentences, so sum SCQ2:SCQ40
d$SCQScore[d$SCQ1 == 1] <- d$SCQScore1[d$SCQ1 == 1]

## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[d$SCQ1 == 2] <- d$SCQScore2[d$SCQ1 == 2]

## SCQ1 is missing
d$SCQScore[is.na(d$SCQ1)] <- d$SCQScore1[is.na(d$SCQ1)]

This fails on the second assignment
(d$SCQScore[d$SCQ1 == 1] <- d$SCQScore1[d$SCQ1 == 1])
with the error message
"NAs are not allowed in subscripted assignments",
presumably because SCQ1 does indeed contain missing values,
so d$SCQ1 == 1 is NA rather than FALSE for those children.
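The error can be reproduced on a three-element vector. This sketch uses made-up names (`x`, `key`, `value`) in place of the SCQ variables. It shows that the comparison itself produces NA, that R then refuses an assignment whose logical index contains NA when the right-hand side has more than one element, and that `which()`, which returns only the positions that are TRUE, sidesteps the problem.

```r
x     <- c(0, 0, 0)
key   <- c(1, NA, 2)     # plays the role of SCQ1, with a missing value
value <- c(10, 20, 30)

print(key == 1)          # TRUE NA FALSE: comparing with NA gives NA

# Capture the error message instead of stopping the script
msg <- tryCatch({
  x[key == 1] <- value[key == 1]
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)               # "NAs are not allowed in subscripted assignments"

# which() drops the NA positions, so the index is well defined
idx <- which(key == 1)
x[idx] <- value[idx]
print(x)                 # 10 0 0
```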

This can be fixed, got around, or otherwise bypassed by 
creating a new variable, SCQ1, outside the data frame and 
with no missing values, as shown:

## Working copy of SCQ1 with NA recoded to 3; d$SCQ1 keeps its NAs
SCQ1 <- d$SCQ1
SCQ1[is.na(SCQ1)] <- 3

## Talks using phrases/sentences, so sum SCQ2:SCQ40
d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
## We don't know if he/she can talk, so guess: sum SCQ2:SCQ40
d$SCQScore[SCQ1 == 3] <- d$SCQScore1[SCQ1 == 3]
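The same three assignments can be made NA-safe without a working copy by swapping `==` for `%in%`, which returns FALSE (never NA) when the left-hand value is NA. A minimal sketch on made-up toy data, keeping the structure of the code above:

```r
# Made-up toy data standing in for the real data frame `d`
d <- data.frame(
  SCQ1      = c(1, 2, NA, 2, 3),
  SCQScore1 = c(10, 12, 7, 15, 9),
  SCQScore2 = c(8, 11, 6, 13, 9)
)

## Distinct value for any SCQ1 values not handled below
d$SCQScore <- 99

## NA %in% 1 is FALSE, so these logical indices contain no NAs
d$SCQScore[d$SCQ1 %in% 1] <- d$SCQScore1[d$SCQ1 %in% 1]
d$SCQScore[d$SCQ1 %in% 2] <- d$SCQScore2[d$SCQ1 %in% 2]
d$SCQScore[is.na(d$SCQ1)] <- d$SCQScore1[is.na(d$SCQ1)]

print(d$SCQScore)   # 10 11 7 13 99
print(d$SCQ1)       # SCQ1 itself is untouched and still contains its NA
```

Wrapping a comparison in `which()`, as in `d$SCQScore[which(d$SCQ1 == 1)]`, would work equally well here.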

This kind of conditional recoding is a common problem in my 
little world. Is there a better/less klutzy/smarter way of 
solving it than creating a new variable each time? Please 
bear in mind that it is critical, for later analysis, to 
keep the missing values in SCQ1.
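Since this kind of recoding comes up repeatedly, one option is to wrap it in a function, so the NA-free working copy exists only inside the function and the data frame's SCQ1 keeps its missing values. This is a minimal sketch under stated assumptions: `pick_score` and its arguments are hypothetical names, and the selector codes are assumed to match column positions (1 for SCQScore1, 2 for SCQScore2). It uses matrix indexing, `m[cbind(i, j)]`, to pick one element per row.

```r
# Hypothetical helper, not part of base R or the original post.
# For each row i, return scores[i, selector[i]]; rows with a missing selector
# use column `na_col`, and codes that are not valid column numbers get `other`.
pick_score <- function(selector, scores, na_col = 1, other = 99) {
  scores <- as.matrix(scores)
  col <- selector                          # local copy; NAs recoded only here
  col[is.na(col)] <- na_col
  ok  <- col %in% seq_len(ncol(scores))    # valid column numbers
  out <- rep(other, length(col))
  # Two-column matrix index: each row of cbind(...) is one (row, column) pair
  out[ok] <- scores[cbind(which(ok), col[ok])]
  out
}

# Made-up toy data standing in for the real data frame `d`
d <- data.frame(
  SCQ1      = c(1, 2, NA, 2, 3),
  SCQScore1 = c(10, 12, 7, 15, 9),
  SCQScore2 = c(8, 11, 6, 13, 9)
)
d$SCQScore <- pick_score(d$SCQ1, d[, c("SCQScore1", "SCQScore2")])
print(d$SCQScore)   # 10 11 7 13 99
```

Because R passes arguments by value, recoding `col` inside the function never touches `d$SCQ1`. The same call also works unchanged if a third alternative score is added as another column.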

Best wishes,
Anthony Staines
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9, Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
