[R] Text analysis question

Andrew Perrin clists at perrin.socsci.unc.edu
Thu Jun 12 00:10:53 CEST 2003


I'm grappling with a problem and would appreciate any thoughts on it.

I'm revising a paper for resubmission to a journal. For the paper, I've
coded each "turn" in a series of conversations with several binary codes.
(A turn is one package of statements made by one speaker, starting with
the beginning of the speech and ending when the speaker stops or is
interrupted.) The reviewers want me to justify the decision I made to code
each turn individually, ignoring (for this analysis) the turns that
surround each turn.

My thought is to run a logistic regression, predicting the
presence/absence of a code in a given turn, with independent variables
being the number of turns that have elapsed since each code was last used
in the conversation. No problem so far. The problem is how to treat what
are essentially missing data: until a code has first been used in a
conversation, the number of turns since its last use is undefined. If I
simply omit cases in which one or more variables are missing, it's a very
conservative test, since it includes only turns for which all codes have
already occurred at least once in the conversation.
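
To make the predictors concrete, here is a minimal sketch in Python of how
the "turns since last use" variables might be built, and of which turns
survive if incomplete cases are dropped. The function name, the
representation of a turn as a set of codes, and the toy conversation are
all assumptions made for illustration, not my actual data or coding scheme.

```python
def turns_since_last_use(turns, codes):
    """For each turn, count the turns elapsed since each code last appeared.

    `turns` is a list of sets holding the codes present in each turn, in
    order. Returns one dict per turn mapping code -> lag, with None when
    the code has not yet occurred earlier in the conversation.
    """
    last_seen = {}  # code -> index of the most recent turn carrying it
    rows = []
    for i, present in enumerate(turns):
        # Lags are computed from earlier turns only, before this turn's
        # own codes are recorded.
        rows.append(
            {c: (i - last_seen[c]) if c in last_seen else None for c in codes}
        )
        for c in present:
            last_seen[c] = i
    return rows


# Hypothetical five-turn conversation with two binary codes, "A" and "B".
conversation = [{"A"}, set(), {"B"}, {"A", "B"}, set()]
lags = turns_since_last_use(conversation, ["A", "B"])

# Complete-case analysis keeps only turns where every lag is defined.
complete = [i for i, row in enumerate(lags) if None not in row.values()]
```

In this toy example only the last two turns are complete cases, which is
the sense in which dropping incomplete cases is conservative.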

An alternative is to set the number of turns that have elapsed since the
last use of a code to a suitably high number--probably 1 + the total number
of turns elapsed in the conversation--which would let me include all
statements (including those that introduce codes into a conversation) but
also would inflate the influence of prior use on current use by
postulating a nonexistent use "just before" the conversation.
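
A sketch of this alternative, again in Python and under the same
assumptions (hypothetical function name, toy lag values, turns counted
from position 0): each undefined lag is replaced by 1 + the number of
turns elapsed so far, which is exactly the lag the code would have had if
it had been used one turn before the conversation began.

```python
def fill_never_used(lag_rows):
    """Replace None lags with 1 + the number of turns elapsed so far.

    `lag_rows[i]` maps code -> lag for the turn at 0-based position i, so
    i turns have elapsed before it. The fill value i + 1 equals the lag
    implied by a use one turn before the conversation started.
    """
    return [
        {code: (i + 1 if lag is None else lag) for code, lag in row.items()}
        for i, row in enumerate(lag_rows)
    ]


# Hypothetical lags for the first four turns of a conversation; None
# marks a code that has not yet been used.
lag_rows = [
    {"A": None, "B": None},
    {"A": 1, "B": None},
    {"A": 2, "B": None},
    {"A": 3, "B": 1},
]
filled = fill_never_used(lag_rows)
```

Every turn is now usable, but the filled values are indistinguishable from
real lags of the same size, which is the inflation I'm worried about.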

I hope this is clear enough to be informative. I'd be interested in any
thoughts folks might have.

Thanks,
Andy Perrin


----------------------------------------------------------------------
Andrew J Perrin - http://www.unc.edu/~aperrin
Assistant Professor of Sociology, U of North Carolina, Chapel Hill
clists at perrin.socsci.unc.edu * andrew_perrin (at) unc.edu



