[R] Unexpected behavior when giving a value to a new variable based on the value of another variable

peter dalgaard pdalgd at gmail.com
Mon Sep 1 20:10:11 CEST 2014


On 01 Sep 2014, at 13:08 , Angel Rodriguez <angel.rodriguez at matiainstituto.net> wrote:

> Thank you John, Jim, Jeff and both Davids for your answers.
> 
> After trying different combinations of values for the variable samplem, it looks like if age is greater than 65, R applies the correct code 1 whatever the value of samplem, but if age is less than 65, it just copies the values of samplem to sample. I do not understand why it does so.
> 

It's because indexed assignment is really (white lie alert: it's actually worse)

N$sample <- `[<-`(`$`(N, `sample`), index, value)

and since N$sample isn't there from the outset, partial matching kicks in for the `$` bit and makes the right-hand side equivalent to the same thing with `samplem`. The result still gets assigned to N$sample, but the value is the same as what N$samplem would get from

N$samplem[N$age >= 65] <- 1
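A minimal sketch of the above, assuming the data frame looks like the printout further down in this message (the age and samplem values are copied from there; the names partial and exact are only for illustration). It shows `$` falling back to a partial match when there is no column called sample, and the indexed assignment then starting from a copy of samplem:

```r
# Data reconstructed from the printout later in this message
N <- data.frame(age     = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))

# There is no column named "sample", so `$` partially matches "samplem"
partial <- N$sample

# `[[` matches exactly by default, so it finds nothing
exact <- N[["sample"]]

# The right-hand side starts from the partially matched samplem values,
# sets the entries with age >= 65 to 1, and stores the result as N$sample
N$sample[N$age >= 65] <- 1
print(N)
```

The rows with age below 65 end up with whatever samplem held, which is the copying described in the question.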

Notice the difference if you do

> N$sample <- NA
> N$sample[N$age >= 65] <- 1 
> N
  age samplem sample
1  67      NA      1
2  62       1     NA
3  74       1      1
4  61       1     NA
5  60       1     NA
6  55       1     NA
7  60       1     NA
8  59       1     NA
9  58      NA     NA
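
The same column can also be built in one step, which avoids the indexed assignment altogether. A sketch, assuming the same data as in the printout above; the column name sample2 is made up for the comparison and is not from the original post:

```r
N <- data.frame(age     = c(67, 62, 74, 61, 60, 55, 60, 59, 58),
                samplem = c(NA, 1, 1, 1, 1, 1, 1, 1, NA))

# Two-step version: create the column first, then fill in by index
N$sample <- NA
N$sample[N$age >= 65] <- 1

# One-step version: the whole column is computed and assigned at once,
# so nothing is ever extracted with `$` from a column that does not exist
N$sample2 <- ifelse(N$age >= 65, 1, NA)

print(N)
```

Either way the column exists, or is created whole, before anything is indexed, so `$` never has to fall back on partial matching.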

-pd

> In any case, Jim's syntax works very well, although I do not understand why either.
> 
> Answering to Jim, I just wanted a variable that could identify individuals with some characteristics (not only age, as in this example that has been oversimplified).
> 
> Best regards,
> 
> Angel Rodriguez-Laso
> 
> 
> -----Mensaje original-----
> De: John McKown [mailto:john.archie.mckown at gmail.com]
> Enviado el: vie 29/08/2014 14:46
> Para: Angel Rodriguez
> CC: r-help
> Asunto: Re: [R] Unexpected behavior when giving a value to a new variable based on the value of another variable
> 
> On Fri, Aug 29, 2014 at 3:53 AM, Angel Rodriguez
> <angel.rodriguez at matiainstituto.net> wrote:
>> 
>> Dear subscribers,
>> 
>> I've found that if there is a variable in the dataframe with a name very similar to a new variable, R does not give the correct values to this latter variable based on the values of a third variable:
>> 
>> 
> <snip>
>> 
>> Any clue for this behavior?
>> 
> <snip>
>> 
>> Thank you very much.
>> 
>> Angel Rodriguez-Laso
>> Research project manager
>> Matia Instituto Gerontologico
> 
> That is unusual, but appears to be documented in a section from
> 
> ?`[`
> 
> <quote>
> Character indices
> 
> Character indices can in some circumstances be partially matched (see
> pmatch) to the names or dimnames of the object being subsetted (but
> never for subassignment). Unlike S (Becker et al p. 358), R never
> uses partial matching when extracting by [, and partial matching is
> not by default used by [[ (see argument exact).
> 
> Thus the default behaviour is to use partial matching only when
> extracting from recursive objects (except environments) by $. Even in
> that case, warnings can be switched on by
> options(warnPartialMatchDollar = TRUE).
> 
> Neither empty ("") nor NA indices match any names, not even empty nor
> missing names. If any object has no names or appropriate dimnames,
> they are taken as all "" and so match nothing.
> </quote>
> 
> Note the comment about "partial matching" in the middle paragraph in
> the quote above.
> 
> -- 
> There is nothing more pleasant than traveling and meeting new people!
> Genghis Khan
> 
> Maranatha! <><
> John McKown
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


