[R] Advice on recoding a variable depending on another which contains NAs

Anthony Staines anthony.staines at dcu.ie
Sun Nov 20 00:31:04 CET 2011

```
Dear colleagues,

I would be very grateful for your help with the following. I
have banged my head off this question several times in the
past, and repeatedly over the last week. I have looked in
the usual places and found no obvious solution. I fear that
this just means I didn't recognize it, but I'd be very

I am scoring 8000 psychometric tests - the SCQ, if you have
heard of it. On this test the scoring rule depends on one
variable SCQ1 - if this is answered yes, the final score is
a function of 39 variables, and if no, of 31 variables.

I've calculated both of these scores (SCQScore1 and
SCQScore2) for all the children in my study, and I wish to
create a final score, which is SCQScore1 when SCQ1 is 1, and
SCQScore2 when SCQ1 is 2. There are also missing values for
SCQ1, and I have chosen, for the moment, to set the final
score to SCQScore1 for these. [[This is a debatable choice,

d$SCQScore <- 99
##Distinct value for any other values I've missed

d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences, so sum SCQ2:SCQ40

d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40

d$SCQScore[is.na(d$SCQ1)] <- d$SCQScore1[is.na(d$SCQ1)]
## SCQ1 is missing

This fails on line 2
(d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1])
with the error message
"NAs are not allowed in subscripted assignments",
presumably because SCQ1 does indeed contain missing values.

This can be fixed, got around, or otherwise bypassed, by
creating a new variable SCQ1, with no missing values, as
shown :-

SCQ1 <- d$SCQ1
SCQ1[is.na(SCQ1)] <- 3

d$SCQScore[SCQ1 == 1] <- d$SCQScore1[SCQ1 == 1]
## Talks using phrases/sentences, so sum SCQ2:SCQ40
d$SCQScore[SCQ1 == 2] <- d$SCQScore2[SCQ1 == 2]
## Can't do this, so sum SCQ8:SCQ40
d$SCQScore[SCQ1 == 3] <- d$SCQScore1[SCQ1 == 3]
## We don't know if he/she can talk, so guess - sum SCQ2:SCQ40

This type of thing is a common problem in my little world.
Is there a better/less klutzy/smarter way of solving it than
creating a new variable each time? Please bear in mind that
it is critical, for later analysis, to keep the missing
values in SCQ1.

Best wishes,
Anthony Staines
--
Anthony Staines, Professor of Health Systems,
School of Nursing and Human Sciences, DCU, Dublin 9,Ireland.
Tel:- +353 1 700 7807. Mobile:- +353 86 606 9713
http://astaines.eu/
```
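
One possible way to avoid the temporary copy, offered as a minimal sketch rather than a definitive answer: the error arises because `SCQ1 == 1` evaluates to `NA` wherever SCQ1 is missing, and R refuses an `NA` index in a subscripted assignment. `%in%` returns `FALSE` instead of `NA` for missing values, so the index is always NA-free and `d$SCQ1` itself is left untouched. The toy data frame and the helper names `use1` and `use2` below are made up for illustration; only the column names come from the post.

```r
# Toy data standing in for the real data frame `d` (values are invented).
d <- data.frame(
  SCQ1      = c(1, 2, NA, 1, 2),
  SCQScore1 = c(10, 11, 12, 13, 14),
  SCQScore2 = c(20, 21, 22, 23, 24)
)

# `%in%` yields FALSE (never NA) for missing values, so these
# logical indices are safe to use in a subscripted assignment.
use1 <- d$SCQ1 %in% 1 | is.na(d$SCQ1)  # answered yes, or SCQ1 missing
use2 <- d$SCQ1 %in% 2                  # answered no

d$SCQScore <- 99                       # distinct value for any other codes
d$SCQScore[use1] <- d$SCQScore1[use1]
d$SCQScore[use2] <- d$SCQScore2[use2]
```

Wrapping each comparison in `which()`, as in `which(d$SCQ1 == 1)`, should work similarly, since `which()` drops `NA` entries. Either way the missing values in `d$SCQ1` are preserved for later analysis.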