[Rd] sparse.model.matrix Generates Non-Existent Factor Levels if Ord.factor Columns Present

Ben Bolker bbolker at gmail.com
Thu Feb 8 13:51:12 CET 2018


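  color and clarity are ordered factors, so sparse.model.matrix is
generating orthogonal-polynomial contrasts for them (see ?contr.poly).
The .L, .Q, .C, ^4, ... suffixes are the linear, quadratic, cubic and
higher-order polynomial terms, not factor levels. This is by design.
What are you trying to do? Are you interested in fac2sparse?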
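
To illustrate the contrasts point, here is a minimal sketch in base R. It
uses model.matrix rather than sparse.model.matrix so it runs without the
Matrix package; as far as I know sparse.model.matrix accepts the same
contrasts.arg argument, but treat that as an assumption and check
?sparse.model.matrix. The factor f and its levels are made up for the
example. Note also that with "- 1" in the formula the first factor gets a
full set of indicator columns whatever its contrasts are, which is
presumably why cut looked fine in the original output.

```r
# Base R only; `f` is a made-up ordered factor used purely for illustration.
f <- factor(c("lo", "mid", "hi", "mid"),
            levels = c("lo", "mid", "hi"), ordered = TRUE)

# Ordered factors default to contr.poly, giving ".L"/".Q" polynomial columns.
poly_names <- colnames(model.matrix(~ f))

# Option 1: ask for treatment (dummy) contrasts for this one term.
trt_names <- colnames(
  model.matrix(~ f, contrasts.arg = list(f = "contr.treatment"))
)

# Option 2: drop the ordering but keep the level order intact.
g <- factor(f, ordered = FALSE)
unordered_names <- colnames(model.matrix(~ g))

# For comparison: going through as.character() re-sorts levels alphabetically.
h <- factor(as.character(f))

print(poly_names)
print(trt_names)
print(unordered_names)
print(levels(g))
print(levels(h))
```

Either option yields level-named dummy columns. Option 2 differs from the
factor(as.character(...)) workaround in the quoted message only in that it
preserves the original level order instead of sorting alphabetically.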

On 18-02-07 11:00 PM, Dario Strbenac wrote:
> Good day,
> 
> Sometimes, sparse.model.matrix outputs a dgCMatrix which has column names consisting of factor levels that were not in the original dataset. The first factor appears to be correctly transformed, but the following factors don't. For example:
> 
> diamonds <- as.data.frame(ggplot2::diamonds)
>> colnames(sparse.model.matrix(~ . -1, diamonds))
>  [1] "carat"        "cutFair"      "cutGood"      "cutVery Good" "cutPremium"   "cutIdeal"     "color.L"      "color.Q"      "color.C"      "color^4"      "color^5"     
> [12] "color^6"      "clarity.L"    "clarity.Q"    "clarity.C"    "clarity^4"    "clarity^5"    "clarity^6"    "clarity^7"    "depth"        "table"        "price"       
> [23] "x"            "y"            "z"
> 
> The variables color and clarity don't have factor levels which have been suffixed to them in the transformed matrix. The values in those columns are also wrong. Changing the Ord.factor columns into simply being factors fixes the problem. 
> 
>> diamonds[, "cut"] <- factor(as.character(diamonds[, "cut"]))
>> diamonds[, "color"] <- factor(as.character(diamonds[, "color"]))
>> diamonds[, "clarity"] <- factor(as.character(diamonds[, "clarity"]))
> 
>> colnames(sparse.model.matrix(~ . -1, diamonds)) # No more invented factor levels.
>  [1] "carat"        "cutFair"      "cutGood"      "cutIdeal"     "cutPremium"   "cutVery Good" "colorE"       "colorF"       "colorG"       "colorH"      
> [11] "colorI"       "colorJ"       "clarityIF"    "claritySI1"   "claritySI2"   "clarityVS1"   "clarityVS2"   "clarityVVS1"  "clarityVVS2"  "depth"       
> [21] "table"        "price"        "x"            "y"            "z"
> 
> Can it be made to work correctly for both plain and ordered factors?
> 
>> sessionInfo()
> R Under development (unstable) (2018-02-06 r74231)
> Platform: i386-w64-mingw32/i386 (32-bit)
> 
> other attached packages:
> [1] Matrix_1.2-12
> 
> loaded via a namespace (and not attached):
>  [1] colorspace_1.3-2 scales_0.5.0     compiler_3.5.0   lazyeval_0.2.1  
>  [5] plyr_1.8.4       pillar_1.1.0     gtable_0.2.0     tibble_1.4.2    
>  [9] Rcpp_0.12.15     ggplot2_2.2.1    grid_3.5.0       rlang_0.1.6     
> [13] munsell_0.4.3    lattice_0.20-35
> 
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia