[R] Use of apply rather than a loop

Sharpie chuck at sharpsteen.net
Sat Dec 5 00:16:19 CET 2009



Dennis Fisher wrote:
> 
> Colleagues,
> 
> R 2.9.0 on all platforms
> 
> I have a dataset that contains three columns of interest: IDs, serial  
> elapsed times, and a marker.  Representative data:
> Subject   Time    Marker
> 1         100.5   0
> 1         101     0
> 1         102     1
> 1         103     0
> 1         105     0
> 
> For each subject, I would like to find the time associated with  
> Marker == 1, then replace Time with Time - Time[Marker == 1].
> The result for this subject would be:
> Subject   Time    Marker
> 1         -1.5    0
> 1         -1      0
> 1         0       1
> 1         1       0
> 1         3       0
> 
> One proviso: some subjects do not have Marker == 1; for these  
> subjects, I would like Time to remain unchanged.
> 
> At present, I am looping over each subject.  The number of subjects is  
> large, so this process is quite slow.  I assume that one of the apply  
> functions could speed this up markedly, but I am not facile with them.  
> Any help would be appreciated.
> 
> Dennis
> 

The best way to approach this problem would probably be to use a function
like by(), which splits your data into subsets and then executes a function
on each subset. That function would search the subset for the row where
Marker == 1. If such a row exists, its time would be subtracted from the
Time column; if not, the subset would be returned unchanged.

Instead of the by() function, which is part of base R, I will use ddply()
from the plyr package. by() would return a list of sub-data frames, one for
each subject; among other things, ddply() will reassemble this list into a
single data frame. Assuming your data frame is called "testData":

  # Define a function that will adjust the times for a subject.
  adjustTimes <- function( subjectData ){
    
    # Find the row containing the marker value 1.
    markerLocation <- match( 1, subjectData[[ "Marker" ]] )

    # If the marker is absent, match() returns NA, so we skip the following
    # block and leave this subject's times unchanged.
    if( !is.na( markerLocation ) ){

      markerTime <- subjectData[[ 'Time' ]][ markerLocation ]
      subjectData[[ 'Time' ]] <- subjectData[[ 'Time' ]] - markerTime

    }

    return( subjectData )

  }

  library( plyr )

  # Split the testData by subject, apply the adjustTimes function to
  # each subset and then recombine the results back into a data frame.
  processedData <- ddply( testData, 'Subject', adjustTimes )
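
If speed is the main concern, one loop-free alternative (a different technique from the ddply() call above, offered only as a sketch) is base R's ave(), which computes one value per subject and returns it aligned with the original rows. The data below is assumed for illustration: the question's subject 1 plus a hypothetical subject 2 with no marker. As with match() in adjustTimes(), only the first Marker == 1 row per subject is used.

```r
# Assumed example data: subject 2 is hypothetical and has no marker.
testData <- data.frame(
  Subject = c(1, 1, 1, 1, 1, 2, 2),
  Time    = c(100.5, 101, 102, 103, 105, 10, 20),
  Marker  = c(0, 0, 1, 0, 0, 0, 0)
)

# For each row, look up its subject's marker time. Rows that are not
# markers contribute NA; a subject with no marker row gets NA throughout.
markerTimes <- ave(
  ifelse(testData$Marker == 1, testData$Time, NA),
  testData$Subject,
  FUN = function(x) x[!is.na(x)][1]  # first marker time, or NA if none
)

# Subtract the marker time where one exists; otherwise subtract 0.
processedData <- testData
processedData$Time <- testData$Time - ifelse(is.na(markerTimes), 0, markerTimes)
print(processedData)
```

Because ave() works on whole columns instead of building and recombining one data frame per subject, it may be noticeably faster when there are many subjects.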


Hope this helps!

-Charlie
-- 
View this message in context: http://n4.nabble.com/Use-of-apply-rather-than-a-loop-tp948941p948957.html
Sent from the R help mailing list archive at Nabble.com.
