[R] data frames, na.omit, and sums

Jason Miller millerj at truman.edu
Mon Dec 5 01:55:06 CET 2005


Dear R-helpers,

New to R, I'm in the middle of a project that I'm using to force myself
to learn R.  I'm running into some behavior that I don't understand, and
I need some advice.  In the last week I've gotten some great advice  
from the list on visualizing my data, and I was hoping people could  
help me get over another barrier I've encountered to my progress.

Before I describe what I'm trying to do and where I'm stuck with R,  
let me quickly outline what I need help with:
(1) summing over the non-NA entries in each row of a data frame, and
(2) using na.omit() and na.action() with rows of data from a frame.

I have a data frame that contains information about when my academic  
department offered courses and their enrollments.  The data frame  
looks something like

sem     year    C1e C1s C2e C2s
Fall    1991    10  2   NA  NA
Spring  1992    3   1   8   1
Summer  1992    NA  NA  100 10

where C?e represents a specific course's enrollment that semester and  
C?s represents the number of sections of that course offered.  The  
frame is filled with integers and NAs.  The data frame is of medium  
size, with about 180 columns and 45 rows.

I need to cull some basic information from this dataset such as:
(1) total number of sections offered each semester (and each year),
(2) total number of credit hours generated each semester (and each  
year), and
(3) the student-to-faculty ratio of the department each semester (and  
each year).

From a mathematical standpoint, how to do each of these is obvious
to me.  But having to negotiate working within data frames and with
matrices that have NA entries has really gotten me confused and
frustrated.  (I have no programming background.)

To calculate (1) above for each semester (row), I know how to select the
"sections" columns using grep().  What I'd like to do is sum the
selected frame's non-NA entries row-by-row.  For some reason, I was
able to do this earlier today using the rowsum() function with
na.rm=TRUE, but now it's not working.  It complains of non-numeric
entries.  (In fact, I was able to use the rowsum() function to
calculate (1) for each year.)  When I try to convert the data frame
(or a sub-frame) to a matrix, my integers turn into character strings,
and I have no idea what to do with that!
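For illustration, here is a minimal sketch of the row-by-row sum described
above.  It is built only on the three example rows, and it assumes the
section columns are named like C1s, C2s, and so on.  It uses rowSums(),
which is a different function from rowsum(), and it selects the numeric
columns before summing.  Converting a frame that still contains the
character column sem to a matrix forces every entry to character, which
may be what is happening here.  The names courses, sec_cols,
sections_per_sem and sections_per_year are made up for this example.

```r
# Toy data frame modeled on the three example rows (not the real data)
courses <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# Select only the "sections" columns (assumed naming pattern: C<number>s)
sec_cols <- grep("^C[0-9]+s$", names(courses), value = TRUE)

# Sum across each row, skipping NA entries
sections_per_sem <- rowSums(courses[, sec_cols, drop = FALSE], na.rm = TRUE)

# Add the per-semester totals up within each year
sections_per_year <- tapply(sections_per_sem, courses$year, sum)
```

Keeping sem and year out of the selection is the point of the design: every
remaining column is numeric, so no conversion to character takes place.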

To calculate (2) above for a semester, I know how to select the  
enrollment columns using grep().  What I'd like to do is calculate  
the total credits generated by taking the dot product of each row  
with a vector whose components are the credit hour values of each  
course in my data frame.  Of course, I'd have to account for the NA
values in my data frame, but in the past I've had decent luck with
using na.omit() and na.action() to select the non-NA components of a
vector.  Unfortunately, na.omit() is absolutely not working with my
data frame; it just returns the names of all the columns!
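A minimal sketch of the dot-product idea, again on the three example rows
only.  The credit-hour values (3 and 4) are invented for illustration, and
the names enr_cols, credit_hours, enr and total_credits are hypothetical.
Applied to a data frame, na.omit() drops every row that contains at least
one NA.  With about 180 columns that is probably every row, which would
leave an empty frame that prints as column names only.  The sketch
therefore treats an NA enrollment as zero instead of omitting anything.

```r
# Toy data frame modeled on the three example rows (not the real data)
courses <- data.frame(
  sem  = c("Fall", "Spring", "Summer"),
  year = c(1991, 1992, 1992),
  C1e  = c(10, 3, NA),
  C1s  = c(2, 1, NA),
  C2e  = c(NA, 8, 100),
  C2s  = c(NA, 1, 10)
)

# na.omit() on a data frame removes whole rows; only Spring 1992 is complete
complete_rows <- nrow(na.omit(courses))

# Select the enrollment columns (assumed naming pattern: C<number>e)
enr_cols <- grep("^C[0-9]+e$", names(courses), value = TRUE)

# Hypothetical credit hours per course, named to match the enrollment columns
credit_hours <- c(C1e = 3, C2e = 4)

# Numeric matrix of enrollments; a course not offered contributes zero
enr <- as.matrix(courses[, enr_cols, drop = FALSE])
enr[is.na(enr)] <- 0

# Dot product of each row with the credit-hour vector
total_credits <- as.vector(enr %*% credit_hours[enr_cols])
```

Indexing credit_hours by enr_cols keeps the vector in the same order as the
matrix columns, so each enrollment is multiplied by the credit value of its
own course.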

Until I get (1) and (2) figured out, I have no hope of figuring out (3).

Thank you for reading this far into this post.  If you have any  
suggestions for how I can get na.omit() and summing to work for me,  
I'd appreciate hearing from you.

Jason Miller


================================================================
Jason E. Miller, Ph.D.
Associate Professor of Mathematics
Truman State University
Kirksville, MO
http://pyrite.truman.edu/~millerj/
660.785.7430



