[R] TD Matrix

Huntsinger, Reid reid_huntsinger at merck.com
Fri Mar 18 02:11:59 CET 2005


Do you mean when you encounter a new term? I would think document *length*
wouldn't matter; presumably you already have a list of terms. If so, you
could treat each document as a vector of integer term codes and then use
tabulate() to get the column for that document.
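Here is a minimal R sketch of that suggestion. It assumes the documents are already tokenized into character vectors and that a fixed vocabulary `terms` was compiled beforehand. The example documents, `terms`, and the helper `doc_column()` are hypothetical. match() turns each token into its term code, and tabulate() counts the codes. Because nbins is fixed at the vocabulary size, every column has the same length whatever the document's length.

```r
# Hypothetical vocabulary, compiled before any documents are processed.
terms <- c("a", "and", "bird", "cat", "dog", "mat", "on", "sat", "the")

# Hypothetical documents of different lengths, already split into tokens.
docs <- list(
  d1 = c("the", "cat", "sat", "on", "the", "mat"),
  d2 = c("the", "dog", "sat"),
  d3 = c("a", "cat", "and", "a", "dog", "and", "a", "bird")
)

# Turn one document into its column of term counts.
doc_column <- function(tokens, terms) {
  codes <- match(tokens, terms)        # integer term code for each token
  codes <- codes[!is.na(codes)]        # drop tokens not in the vocabulary
  tabulate(codes, nbins = length(terms))  # one count per term, fixed length
}

# All columns have length(terms) entries, so sapply() returns a
# terms x documents matrix with the document names as column names.
tdm <- sapply(docs, doc_column, terms = terms)
rownames(tdm) <- terms
print(tdm)
```

Adding a document later no longer changes the number of rows, for example `tdm <- cbind(tdm, d4 = doc_column(c("the", "bird"), terms))`. With many documents, calling sapply() once over the whole list avoids copying the matrix on every cbind().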

If you're using all terms that appear in any document, and you don't want to
compile a list of terms first, then you might consider building a sparse
representation, as in the SparseM package, and using the sparse linear
algebra routines there. Just an idea, though.
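The sparse idea can be sketched in base R under a few assumptions. The documents and the helper `doc_triplets()` are hypothetical. The counts are kept in plain coordinate (triplet) form, one row per nonzero (term, document, count), rather than in a SparseM class. Adding a document means appending its triplets, so nothing has to be resized, and the vocabulary is fixed only at the end by factor(). The dense matrix built at the end is for inspection only.

```r
# Hypothetical tokenized documents of different lengths.
docs <- list(
  d1 = c("the", "cat", "sat", "on", "the", "mat"),
  d2 = c("the", "dog", "sat"),
  d3 = c("a", "cat", "and", "a", "dog", "and", "a", "bird")
)

# One document -> its nonzero (term, doc, count) triplets.
# table() only reports terms that occur in the document, so no zeros are stored
# and no vocabulary has to be known in advance.
doc_triplets <- function(tokens, j) {
  counts <- table(tokens)
  data.frame(term = names(counts), doc = j, count = as.vector(counts),
             stringsAsFactors = FALSE)
}

chunks <- Map(doc_triplets, docs, seq_along(docs))

# A new document, even one with unseen terms, is just another chunk.
chunks$d4 <- doc_triplets(c("the", "bird", "sang"), length(chunks) + 1)

triplets <- do.call(rbind, chunks)

# Only now is the vocabulary fixed: factor() gives each term a row index.
term_f <- factor(triplets$term)
i <- as.integer(term_f)
j <- triplets$doc
x <- triplets$count

# Dense view for checking only. If the Matrix package is installed (an
# assumption), Matrix::sparseMatrix(i = i, j = j, x = x) would build a sparse
# matrix from the same triplets instead.
dense <- matrix(0, nlevels(term_f), length(chunks),
                dimnames = list(levels(term_f), names(chunks)))
dense[cbind(i, j)] <- x
print(dense)
```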

Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Ryan Steckel
Sent: Thursday, March 17, 2005 6:01 PM
To: r-help at stat.math.ethz.ch
Subject: [R] TD Matrix


I'm trying to create a term-document matrix where the columns are the
documents, the rows are the terms in the documents, and each cell holds a
term-frequency weight for that term in that document. My problem is that
the documents are all different lengths. So when I add a new document, if
its length is greater than the maximum document length in the matrix, I
have to resize the matrix and do a cbind operation.

Does anyone know of an easier way?

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html



