[R] Function to "lump" factors together?

David Wolfskill r at catwhisker.org
Tue Oct 18 03:45:01 CEST 2011


Sorry about the odd terminology, but I suspect that my intent might be
completely missed had I used "aggregate" or "classify" (each of which
appears to have some rather special meanings in statistical analysis and
modeling).

I have some data about software builds; one of the characteristics of
each is the name of the branch.

A colleague has generated some fairly interesting graphs from the data,
but he's treating each unique branch as if it were a separate factor.

Last I checked, I had 276 unique branches, but these could be
aggregated, classified, or "lumped" into about 8 - 10 categories; I
believe it would be useful and helpful for me to be able to do precisely
that.

A facility that could work for this purpose (one that we use in our
"continuous build" driver) is the Bourne shell "case" statement.  Such a
construct might look like:

	case "$branch" in
	trunk)    factor="trunk";;
	IB*)      factor="IB";;
	DEV*)     factor="DEV";;
	PVT*)     factor="PVT";;
	RELEASE*) factor="RELEASE";;
	*)        factor="UNK";;
	esac

Which would assign one of 6 values to "factor" depending on the value of
"branch" -- using "UNK" as a default if nothing else matched.

Mind, the patterns there are "Shell Patterns" ("globs"), not regular
expressions.
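
To make the intent concrete, here is a minimal sketch of that lumping
wrapped in a shell function and applied to a handful of branch names.
The function name lump_branch and the sample branch names are invented
for illustration; they are not taken from our build driver or our data.

```sh
#!/bin/sh
# Map one branch name to a coarse category using shell globs.
# lump_branch is a hypothetical helper name, used only for this sketch.
lump_branch() {
	case "$1" in
	trunk)    echo "trunk" ;;     # exact match only
	IB*)      echo "IB" ;;        # anything starting with "IB"
	DEV*)     echo "DEV" ;;
	PVT*)     echo "PVT" ;;
	RELEASE*) echo "RELEASE" ;;
	*)        echo "UNK" ;;       # default when nothing else matched
	esac
}

# Made-up branch names; print each one next to its category.
for branch in trunk IB_1234 DEV_foo PVT_bar RELEASE_2_1 scratch; do
	printf '%s\t%s\n' "$branch" "$(lump_branch "$branch")"
done
```

The first pattern that matches wins, so the catch-all "*" has to come
last.  What I am after in R is the same many-to-few mapping, applied to
the whole column of branch names at once.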

I've looked at R functions match(), pmatch(), charmatch(), and switch();
while each looks as if it might be coercible to get the result I want,
it also looks to require iteration over the thousands of entries I have
-- as well as using the functions in question in a fairly "unnatural"
way.

I could also write my own function that iterates over the entries,
generating factors from the branch names -- but I can't help but think
that this task can't be so uncommon that nobody has already written a
function for it.  And I'd really rather avoid "re-inventing the wheel"
here.

So: would someone please supply a clue?

Thanks!

Peace,
david
-- 
David H. Wolfskill				r at catwhisker.org
Depriving a girl or boy of an opportunity for education is evil.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.