[R] Strange behavior when sampling rows of a data frame

Rui Barradas ruipbarradas using sapo.pt
Fri Jun 19 19:37:51 CEST 2020


Hello,


Thanks, I hadn't thought of that.

But, why? Is it evaluated once before assignment and a second time when 
the assignment occurs?
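
One way to check this without trace() is a small counting sketch. The names 
idx(), n_calls and df_chk are made up for illustration; idx() returns a 
fixed index, so the data stay intact and only the number of evaluations is 
observed.

```r
# Hypothetical helper: returns a fixed row index and counts its own calls.
n_calls <- 0L
idx <- function() {
  n_calls <<- n_calls + 1L
  c(2L, 5L, 7L)
}

df_chk <- data.frame(unit = 1:10)
df_chk$treated <- FALSE

# The index expression is written only once in the nested replacement ...
df_chk[idx(), ]$treated <- TRUE

# ... and the counter records how many times it was actually evaluated.
n_calls
```

If the index were evaluated only once, n_calls would be 1. Because idx() 
always returns the same rows, df_chk itself comes out as intended here; the 
problem only shows up when the index expression is random.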

Tracing both `sample` and `[<-` shows two calls to `sample`:


trace(sample)
trace(`[<-`)
df[sample(nrow(df), 3),]$treated <- TRUE
trace: sample(nrow(df), 3)
trace: `[<-`(`*tmp*`, sample(nrow(df), 3), , value = list(unit = c(7L,
6L, 8L), treated = c(TRUE, TRUE, TRUE)))
trace: sample(nrow(df), 3)
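
The trace is consistent with reading the nested replacement as an 
extraction followed by an assignment, each with its own draw from sample(). 
Below is a sketch of that reading, spelled out by hand. The names df_m, 
first_draw, second_draw and rows are only illustrative, and this is my 
interpretation of the trace, not a literal transcript of what R does 
internally.

```r
set.seed(2020)
df_m <- data.frame(unit = 1:10)
df_m$treated <- FALSE

first_draw  <- sample(nrow(df_m), 3)  # index used when the rows are extracted
second_draw <- sample(nrow(df_m), 3)  # index used when the rows are written back

rows <- df_m[first_draw, ]            # extraction step
rows$treated <- TRUE                  # inner `$<-` on the extracted copy
df_m[second_draw, ] <- rows           # outer `[<-` with a different index

df_m
```

The rows taken from first_draw are written over the rows in second_draw, 
unit column included, which would explain both the duplicated and the 
missing units in the original report.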


Regards,

Rui Barradas


At 17:20 on 19/06/2020, William Dunlap wrote:
> The first subscript argument is getting evaluated twice.
> > trace(sample)
> > set.seed(2020); df[i<-sample(10,3), ]$Treated <- TRUE
> trace: sample(10, 3)
> trace: sample(10, 3)
> > i
> [1]  1 10  4
> > set.seed(2020); sample(10,3)
> trace: sample(10, 3)
> [1] 7 6 8
> > sample(10,3)
> trace: sample(10, 3)
> [1]  1 10  4
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com <http://tibco.com>
>
>
> On Fri, Jun 19, 2020 at 8:46 AM Rui Barradas <ruipbarradas using sapo.pt 
> <mailto:ruipbarradas using sapo.pt>> wrote:
>
>     Hello,
>
>     I don't have an explanation for why this happens, but it looks like
>     a bug. If it is one, where is it: in `[<-.data.frame` or in
>     `[<-.default`?
>
>     A workaround is to subset the column vector and assign into it:
>
>
>     set.seed(2020)
>     df2 <- data.frame(unit = 1:10)
>     df2$treated <- FALSE
>
>     df2$treated[sample(nrow(df2), 3)] <- TRUE
>     df2
>     #  unit treated
>     #1     1   FALSE
>     #2     2   FALSE
>     #3     3   FALSE
>     #4     4   FALSE
>     #5     5   FALSE
>     #6     6    TRUE
>     #7     7    TRUE
>     #8     8    TRUE
>     #9     9   FALSE
>     #10   10   FALSE
>
>
>     Or
>
>
>     set.seed(2020)
>     df3 <- data.frame(unit = 1:10)
>     df3$treated <- FALSE
>
>     df3[sample(nrow(df3), 3), "treated"] <- TRUE
>     df3
>     # result as expected
>
>
>     Hope this helps,
>
>     Rui  Barradas
>
>
>
>     At 13:49 on 19/06/2020, Sébastien Lahaie wrote:
>     > I ran into some strange behavior in R when trying to assign a
>     > treatment to rows in a data frame. I'm wondering whether any R
>     > experts can explain what's going on.
>     >
>     > First, let's assign a treatment to 3 out of 10 rows as follows.
>     >
>     >> df <- data.frame(unit = 1:10)
>     >> df$treated <- FALSE
>     >> s <- sample(nrow(df), 3)
>     >> df[s,]$treated <- TRUE
>     >> df
>     >    unit treated
>     > 1     1   FALSE
>     > 2     2    TRUE
>     > 3     3   FALSE
>     > 4     4   FALSE
>     > 5     5    TRUE
>     > 6     6   FALSE
>     > 7     7    TRUE
>     > 8     8   FALSE
>     > 9     9   FALSE
>     > 10   10   FALSE
>     >
>     > This is as expected. Now we'll just skip the intermediate step of
>     > saving the sampled indices, and apply the treatment directly as
>     > follows.
>     >
>     >> df <- data.frame(unit = 1:10)
>     >> df$treated <- FALSE
>     >> df[sample(nrow(df), 3),]$treated <- TRUE
>     >> df
>     >    unit treated
>     > 1     6    TRUE
>     > 2     2   FALSE
>     > 3     3   FALSE
>     > 4     9    TRUE
>     > 5     5   FALSE
>     > 6     6   FALSE
>     > 7     7   FALSE
>     > 8     5    TRUE
>     > 9     9   FALSE
>     > 10   10   FALSE
>     >
>     > Now the data frame still has 10 rows with 3 assigned to the
>     > treatment. But the units are garbled. Units 1 and 4 have
>     > disappeared, for instance, and there are duplicates for 6 and 9,
>     > one assigned to treatment and the other to control. Why would this
>     > happen?
>     >
>     > Thanks,
>     > Sebastien
>     >
>     > ______________________________________________
>     > R-help using r-project.org <mailto:R-help using r-project.org> mailing list
>     -- To UNSUBSCRIBE and more, see
>     > https://stat.ethz.ch/mailman/listinfo/r-help
>     > PLEASE do read the posting guide
>     http://www.R-project.org/posting-guide.html
>     > and provide commented, minimal, self-contained, reproducible code.
>
>     -- 
>     Este e-mail foi verificado em termos de vírus pelo software
>     antivírus Avast.
>     https://www.avast.com/antivirus
>
>
