[R] Advice: How to best ensure column values match in different vectors?

R. Michael Weylandt michael.weylandt at gmail.com
Thu Aug 9 06:32:46 CEST 2012


On Wed, Aug 8, 2012 at 10:58 AM, DG Christensen <dgc at enservio.com> wrote:
> Hello all, I would like some advice on how to order elements in a vector.
>
> Background: my company is running a k-means clustering model on our
> historical data warehouse of products, which will produce a matrix of
> cluster centers.  Then, on our production web servers, we will take
> newly created products and find the cluster that is closest to the new
> product (we're calling this "scoring" the product).  Simple stuff.  The
> complex part is that the data source for the model is different from the
> source of the new product.
>
> My concern is how to best ensure that the order of the product
> attributes in the clustering model matches the attributes of the new
> product vector.  Here's what I'm considering doing:
>
> Say my company keeps the attributes height, width, and length on our
> products (in reality we'll have over 200 attributes).  I will create a
> constant of the column (i.e. attribute) names:
>
>     PRODUCT.ATTRIBUTE.COLS  <- c("H","W","L")
>     PRODUCT.ATTRIBUTE.COUNT <- length( PRODUCT.ATTRIBUTE.COLS )
>
> All new vectors (both during modeling and scoring) will be created with
> NaN values:
>
>     product.vector <- rep(NaN, PRODUCT.ATTRIBUTE.COUNT)
>     names( product.vector ) <- PRODUCT.ATTRIBUTE.COLS
>
> The vector will then be populated with attribute values like this.  The
> values will be retrieved from whatever DB we're using:
>
>     product.vector["H"] <- height.from.db
>     product.vector["W"] <- width.from.db
>     product.vector["L"] <- length.from.db
>
> Is this a reasonable way to do this?  If so, one thing I'd like to add
> is error checking that validates that the attribute name exists, so if
> the code attempted to do:
>
>     product.vector["WEIGHT"] <- weight.from.db
>
> it would throw some sort of error.  What's the best way for handling
> that?  Can I set the length of the vector to a fixed size?

Hi DG,

You can define your own class whose assignment method raises an error
when the name being assigned to doesn't exist:

E.g.,

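# Tag an atomic object so that assignments go through the method below
as.strictvec <- function(x){
    stopifnot(is.atomic(x))
    class(x) <- c("strictvec", class(x))
    x
}

# Refuse to assign to a column name that does not already exist
`[<-.strictvec` <- function(x, i, j, value){
    # Only check character column indices; j is missing for x[i] <- value
    if (!missing(j) && is.character(j)) stopifnot(j %in% colnames(x))
    NextMethod()
}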

z <- matrix(1:3, ncol = 3); colnames(z) <- letters[1:3]

z.strict <- as.strictvec(z)

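# Note: a plain matrix already rejects an unknown column name
# ("subscript out of bounds"); it is a plain named vector that grows silently.
z[, "d"] <- 5
z.strict[, "a"] <- 5 # Fine, "a" is an existing column
z.strict[, "d"] <- 5 # Error!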
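
Since your example uses a named vector rather than a matrix, here is a
rough sketch of the same idea for that case. The names as.strictnamed
and "strictnamed" are made up for illustration, the error messages are
my own, and it only guards `[<-`, so `[[<-` and `names<-` can still
change the vector:

```r
# Hypothetical helper: tag a named atomic vector as "strictnamed"
as.strictnamed <- function(x) {
    stopifnot(is.atomic(x), !is.null(names(x)))
    structure(x, class = c("strictnamed", oldClass(x)))
}

# Reject assignments that would add a new name or grow the vector
`[<-.strictnamed` <- function(x, i, value) {
    if (!missing(i)) {
        if (is.character(i) && !all(i %in% names(x)))
            stop("unknown name(s): ",
                 paste(setdiff(i, names(x)), collapse = ", "))
        if (is.numeric(i) && any(i > length(x)))
            stop("index beyond the fixed length of ", length(x))
    }
    NextMethod()
}

PRODUCT.ATTRIBUTE.COLS <- c("H", "W", "L")
product.vector <- as.strictnamed(
    setNames(rep(NaN, length(PRODUCT.ATTRIBUTE.COLS)), PRODUCT.ATTRIBUTE.COLS))

product.vector["H"] <- 10         # fine, "H" is a known attribute
# product.vector["WEIGHT"] <- 5   # would stop: unknown name(s): WEIGHT
```

The numeric check is what gives you the "fixed size" behaviour: an
index past the current length is refused instead of extending the
vector.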

Adapt as needed.
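
On the ordering question itself, one option is not to rely on position
at all and instead reorder the new product by the column names of the
centers matrix just before scoring. A minimal sketch, assuming the
centers carry the attribute names as colnames (kmeans() keeps the
column names of its input); the numbers and the object names centers,
aligned, d2 and closest are invented for illustration:

```r
# Hypothetical cluster centers, columns in the model's order
centers <- matrix(c( 1,  2,  3,
                    10, 20, 30),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("H", "W", "L")))

# A new product whose attributes arrive in a different order
product.vector <- c(L = 29, H = 11, W = 19)

# Fail loudly if the two sides disagree on the set of attributes
stopifnot(setequal(names(product.vector), colnames(centers)))

# Reorder by name so positions line up with the columns of `centers`
aligned <- product.vector[colnames(centers)]

# Squared Euclidean distance to each center; the smallest one wins
d2 <- rowSums(sweep(centers, 2, aligned)^2)
closest <- which.min(d2)
```

With this, the order in which the attributes are filled in on the
scoring side stops mattering, as long as the names agree.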

Cheers,
Michael


>
> Thanks for any guidance,
> DG
>


