[R] proper way to process dataframe by rows

Duncan Murdoch murdoch at stats.uwo.ca
Mon Nov 29 04:38:23 CET 2004


On Sun, 28 Nov 2004 21:25:24 -0500, Jack Tanner <ihok at hotmail.com>
wrote:

>This is a best practices / style question.
>
>The way I use RODBC is something like this:
>
> > foo <- sqlQuery(db, "select * from foo")
> > apply(foo, 1, function(row) {...})
>
>That is, I use apply to iterate over each result -- row -- in the 
>RODBC-produced dataframe. Is this how one generally wants to do this?
>
>My concern is that when apply iterates over the rows, it uses 
>as.matrix() to convert the dataframe to a character representation of 
>itself. Thus my database's carefully planned data types (that RODBC 
>carefully preserved when returning query results) get completely lost as 
>I process the data. I've taken to judiciously sprinkling as.numeric() 
>and friends here and there, but this is just begging for bugs.
>
>In other words, what is the smart way to process a dataframe by rows? Or 
>is there, by chance, a specific technique or practice that is available 
>for RODBC results but not for dataframes in general?

I would just use a for() loop if I didn't care too much about speed.
If I did, I'd avoid dealing with rows of dataframes:  row access
through dataframe indexing is slow.  Depending on what your function
does, you're probably better off extracting the columns of the
dataframe as vectors, and working with those.
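
As a minimal sketch of both suggestions, assuming a small made-up
data frame `foo` standing in for the RODBC result (the column names
`id`, `name` and `price` are hypothetical, as is the per-row
calculation):

```r
# Hypothetical stand-in for a query result with mixed column types.
foo <- data.frame(id = 1:3,
                  name = c("a", "b", "c"),
                  price = c(1.5, 2.0, 3.25),
                  stringsAsFactors = FALSE)

# For comparison: apply() converts the data frame with as.matrix(),
# so with a character column present every value arrives as character.
apply_classes <- apply(foo, 1, function(row) class(row[["price"]]))

# A for() loop over row indices keeps each column's type, because
# foo[i, ] is a one-row data frame rather than a character vector.
totals_loop <- numeric(nrow(foo))
for (i in seq_len(nrow(foo))) {
  row <- foo[i, ]
  totals_loop[i] <- row$id * row$price
}

# Usually faster: extract the columns as vectors and work on those.
totals_vec <- foo$id * foo$price
```

Both `totals_loop` and `totals_vec` hold the same numeric values; the
vector version simply avoids indexing the data frame once per row.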

Duncan Murdoch



