[R] Potential bug/unexpected behaviour in model matrix

Andrew Simmons
Thu Aug 26 21:26:42 CEST 2021


Hello,


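I'm not so sure this is a bug; from the documentation, it appears to be
behaving as intended. By default, 'setkey' physically sorts the rows of the
data.table by the key columns, so 'model.matrix' sees the rows in the new
order. I would suggest calling 'setkey' with 'physical = FALSE' to avoid
reordering the rows. Something like: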


x <- data.table::data.table(V1 = 9:0)
y <- data.table::copy(x)


data.table::setkey(x, V1, physical = TRUE)
data.table::setkey(y, V1, physical = FALSE)


print(x)
print(y)


attr(x, "index")
attr(y, "index")


'x' does not have an "index" attribute because its rows were physically
reordered. 'y' does have one because its rows weren't reordered; the sort
order is stored in the index instead. I hope this helps!
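
Applied to the example from the original post, a minimal sketch might look
like the following. The names 'dtKeyed', 'dtIndexed', 'designKeyed',
'designIndexed' and 'designSorted' are made up for illustration, and this
assumes a data.table version whose 'setkey' accepts 'physical', as in the
snippet above:

```r
# Data and reference design matrix from the original post
df <- data.frame(basespaceID = 8:1,
                 group = paste0(rep(c("a", "b"), 4), "_",
                                sort(rep(c("1", "2"), 4))))
designDF <- model.matrix(~0 + group, data = df)

# Default setkey: rows are physically sorted by basespaceID
dtKeyed <- data.table::as.data.table(df)
data.table::setkey(dtKeyed, basespaceID)
designKeyed <- model.matrix(~0 + group, data = dtKeyed)

# physical = FALSE: row order is kept, the ordering is stored as an index
dtIndexed <- data.table::as.data.table(df)
data.table::setkey(dtIndexed, basespaceID, physical = FALSE)
designIndexed <- model.matrix(~0 + group, data = dtIndexed)

# The same rows as df, sorted by the key column using base R only
designSorted <- model.matrix(~0 + group, data = df[order(df$basespaceID), ])

all(designDF == designIndexed)
#TRUE
all(designDF == designKeyed)
#FALSE
all(designKeyed == designSorted)
#TRUE
```

The last comparison suggests the keyed rows are in ascending 'basespaceID'
order rather than in alphabetical order of 'group'.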



On Thu, Aug 26, 2021 at 1:02 PM Leonidas Lundell <leo.lundell using sund.ku.dk>
wrote:

> Dear R-project,
>
> Apologies if I am sending this to the wrong list, and thank you for your
> enormous contribution.
>
> I discovered a subtle interaction between the data.table package and
> model.matrix function that influences the output to the point that you will
> get completely erroneous results:
>
> df  <- data.frame(basespaceID = 8:1, group = paste0(rep(c("a", "b"), 4),
> "_", sort(rep(c("1", "2"), 4))))
> designDF <- model.matrix(~0 + group, data = df)
>
> dt <- data.table::as.data.table(df)
> designDT <- model.matrix(~0 + group, data = dt)
>
> all(designDF == designDT)
> #TRUE
>
> data.table::setkey(dt, "basespaceID")
> designDTkeyed <- model.matrix(~0 + group, data = dt)
>
> all(designDF == designDTkeyed)
> #FALSE
>
> # It seems that a keyed data.table reorders the rows of the design matrix
> # by alphabetical order:
>
> designDFreordered <- model.matrix(~0 + group, data = df[8:1,])
> all(designDFreordered == designDTkeyed)
> #TRUE
>
> And my sessionInfo if that’s of any help:
>
> sessionInfo()
>
> R version 4.1.0 (2021-05-18)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Big Sur 11.5.2
>
> Matrix products: default
> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] data.table_1.14.0
>
> loaded via a namespace (and not attached):
>  [1] umap_0.2.7.0      Rcpp_1.0.7        knitr_1.33        magrittr_2.0.1
>  [5] maps_3.3.0        lattice_0.20-44   rlang_0.4.11      stringr_1.4.0
>  [9] tools_4.1.0       grid_4.1.0        xfun_0.25         png_0.1-7
> [13] audio_0.1-7       RSpectra_0.16-0   htmltools_0.5.1.1 shapefiles_0.7
> [17] askpass_1.1       openssl_1.4.4     yaml_2.2.1        digest_0.6.27
> [21] zip_2.2.0         Matrix_1.3-4      beepr_1.3         evaluate_0.14
> [25] rmarkdown_2.10    openxlsx_4.2.4    sp_1.4-5          stringi_1.7.3
> [29] compiler_4.1.0    fossil_0.4.0      jsonlite_1.7.2    reticulate_1.20
> [33] foreign_0.8-81
>
> Best regards
>
> Leonidas Lundell
> Postdoc
> Barres & Zierath group
>
> University of Copenhagen
> Novo Nordisk Foundation
> Center for Basic Metabolic Research
>
> mailto:leo.lundell using sund.ku.dk
