[R] data frames, na.omit, and sums
millerj at truman.edu
Mon Dec 5 01:55:06 CET 2005
New to R, I'm in the middle of a project that I'm using to force myself
to learn R. I'm running into some behavior that I don't understand, and
I need some advice. In the last week I've gotten some great advice
from the list on visualizing my data, and I was hoping people could
help me get over another barrier I've encountered to my progress.
Before I describe what I'm trying to do and where I'm stuck with R,
let me quickly outline what I need help with:
(1) summing over the non-NA entries in each row of a data frame, and
(2) using na.omit() and na.action() with rows of data from a frame.
I have a data frame that contains information about when my academic
department offered courses and their enrollments. The data frame
looks something like
sem     year  C1e  C1s  C2e  C2s
Fall    1991   10    2   NA   NA
Spring  1992    3    1    8    1
Summer  1992   NA   NA  100   10
where C?e represents a specific course's enrollment that semester and
C?s represents the number of sections of that course offered. The
frame is filled with integers and NAs. The data frame is of medium
size, with about 180 columns and 45 rows.
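For anyone who wants to experiment, the three sample rows above can be rebuilt as a small data frame. This is only a sketch: the name `courses` is made up for illustration, and the real frame has about 180 columns rather than 6.

```r
# Hypothetical reconstruction of the sample rows; `courses` is an invented name.
courses <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991L, 1992L, 1992L),
  C1e  = c(10L, 3L, NA),    # enrollment in course 1
  C1s  = c(2L, 1L, NA),     # sections of course 1
  C2e  = c(NA, 8L, 100L),   # enrollment in course 2
  C2s  = c(NA, 1L, 10L),    # sections of course 2
  stringsAsFactors = FALSE
)

# Show the type of each column: one text column, the rest integer.
str(courses)
```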
I need to cull some basic information from this dataset such as:
(1) total number of sections offered each semester (and each year),
(2) total number of credit hours generated each semester (and each year), and
(3) the student-to-faculty ratio of the department each semester (and each year).
From a mathematical standpoint, how to do each of these is obvious
to me. But having to negotiate working within data frames and with
matrices that have NA entries has really gotten me confused
and frustrated. (I have no programming background.)
To calculate (1) above for semester (rows), I know how to select the
"sections" columns using grep(). What I'd like to do is sum the
selected frame's non-NA entries row-by-row. For some reason, I was
able to do this earlier today using the rowsum() function with
na.rm=TRUE, but now it's not working. It complains of non-numeric
entries. (In fact, I was able to use the rowsum() function to
calculate (1) for each year.) When I try to convert the data frame
(or a sub-frame) to a matrix, my integers turn into strings/
characters, and I have no idea what to do with that!
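A minimal sketch of the row-by-row sum described above, assuming the sample layout from earlier in this post (the names `courses`, `sec_cols`, `sections_per_sem` and `sections_per_year` are invented for illustration). Two things may be worth checking. First, base R has both rowsum(), which adds up rows within groups, and rowSums(), which adds across each row; the latter is the one that matches "sum each row's non-NA entries". Second, as.matrix() on a frame that still contains the text column `sem` produces a character matrix, which could explain the integers turning into strings.

```r
# Hypothetical sample data; `courses` is an invented name.
courses <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991L, 1992L, 1992L),
  C1e  = c(10L, 3L, NA), C1s = c(2L, 1L, NA),
  C2e  = c(NA, 8L, 100L), C2s = c(NA, 1L, 10L),
  stringsAsFactors = FALSE
)

# Converting the whole frame drags the text column along, so everything
# becomes character; dropping the two label columns keeps it numeric.
is.character(as.matrix(courses))           # TRUE
is.numeric(as.matrix(courses[, -(1:2)]))   # TRUE

# Pick the "sections" columns: a C, some digits, then a trailing s.
sec_cols <- grep("^C[0-9]+s$", names(courses))

# Sum each row, skipping NA entries.
sections_per_sem <- rowSums(courses[, sec_cols], na.rm = TRUE)
names(sections_per_sem) <- paste(courses$sem, courses$year)
sections_per_sem

# Add the semester totals up within each year.
sections_per_year <- tapply(sections_per_sem, courses$year, sum)
sections_per_year
```

On the sample rows this gives 2, 2 and 10 sections per semester, and 2 and 12 sections for 1991 and 1992.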
To calculate (2) above for a semester, I know how to select the
enrollment columns using grep(). What I'd like to do is calculate
the total credits generated by taking the dot product of each row
with a vector whose components are the credit hour values of each
course in my data frame. Of course, I'd have to account for the NA
values in my data frame, but in the past I've had decent luck with
using na.omit() and na.action() to select the non-NA components of a
vector. Unfortunately, na.omit() is absolutely not working with my
data frame; it just returns the names of all the columns!
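A rough sketch of the dot-product idea, again on the sample layout. The credit-hour values (3 and 4) and the names `courses`, `credit_hours`, `enr` and `total_credits` are assumptions made up for illustration. One relevant fact: on a data frame, na.omit() drops every row that contains any NA, so if each semester has at least one course that was not offered, no rows survive and only the column names are printed. That may be what is happening here. Treating NA as zero enrollment sidesteps the problem for this calculation.

```r
# Hypothetical sample data; `courses` is an invented name.
courses <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991L, 1992L, 1992L),
  C1e  = c(10L, 3L, NA), C1s = c(2L, 1L, NA),
  C2e  = c(NA, 8L, 100L), C2s = c(NA, 1L, 10L),
  stringsAsFactors = FALSE
)

# na.omit() works row-wise on a data frame: only complete rows remain.
nrow(na.omit(courses))   # 1 (only Spring 1992 has no NA)

# Pick the enrollment columns and turn them into a numeric matrix.
enr_cols <- grep("^C[0-9]+e$", names(courses))
enr <- as.matrix(courses[, enr_cols])

# Assumed credit hours per course, named to match the enrollment columns.
credit_hours <- c(C1e = 3, C2e = 4)

# A course not offered generates no credit hours, so count NA as 0.
enr[is.na(enr)] <- 0

# One dot product per row: enrollments times credit hours.
total_credits <- as.vector(enr %*% credit_hours[colnames(enr)])
names(total_credits) <- paste(courses$sem, courses$year)
total_credits
```

With these made-up credit values the sample rows come to 30, 41 and 400 credit hours.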
Until I get (1) and (2) figured out, I have no hope of figuring out (3).
Thank you for reading this far into this post. If you have any
suggestions for how I can get na.omit() and summing to work for me,
I'd appreciate hearing from you.
Jason E. Miller, Ph.D.
Associate Professor of Mathematics
Truman State University