[Rd] Any interest in "merge" and "by" implementations specifically for sorted data?

Kevin B. Hendricks kevin.hendricks at sympatico.ca
Sun Jul 30 16:11:21 CEST 2006


Hi Bill,

After playing with this some more and adding an implementation to  
handle NAs in the data vector, I have run into the question of what to  
return when every data value for a particular bin (or level) in the  
data vector is NA and the user has selected na.rm=T.

1. Should it return a count of 0 for that bin and NA for that bin  
from all of the other functions?  If so, isn't it strange to return  
an NA simply because no valid data is left in that bin once the user  
asked for na.rm=T?

2. Or do I have to literally rebuild the final result vector,  
removing all "unused" bins before returning the results?  And  
wouldn't that cause problems, since not all of the levels in 1:ngroups  
would be returned, and which levels go missing would differ from one  
variable to the next?

I personally prefer approach 1.  If I give an igroup function my  
groups and tell it na.rm=T for my data vector, I really want all group  
levels returned, not just the ones that had valid data in them.  If a  
particular group had no valid data, I would want its count to be 0 and  
all of the other functions to return NA for that bin.
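A minimal sketch of approach 1, written in Python purely for illustration (it is not Bill's code and not an R API). The function name igroup_summary is hypothetical, None stands in for R's NA, and group codes are assumed to be 1-based integers in 1:ngroups, as with R factor codes. NAs are always dropped, i.e. na.rm=T semantics.

```python
from typing import Dict, List, Optional, Sequence

NA = None  # stand-in for R's NA in this sketch (assumption)


def igroup_summary(x: Sequence[Optional[float]],
                   group: Sequence[int],
                   ngroups: int) -> Dict[str, list]:
    """Hypothetical per-group Counts/Sums/Maxs/Mins with na.rm=T (approach 1).

    Every level 1..ngroups is returned. A group with no non-NA values gets
    a count of 0 and NA for Sums, Maxs and Mins.
    """
    counts: List[int] = [0] * ngroups
    sums: List[Optional[float]] = [NA] * ngroups
    maxs: List[Optional[float]] = [NA] * ngroups
    mins: List[Optional[float]] = [NA] * ngroups
    for value, g in zip(x, group):
        if value is NA:
            continue  # na.rm=T: missing values are skipped entirely
        i = g - 1  # convert 1-based group code to a 0-based index
        if counts[i] == 0:
            # The first valid value seeds every statistic for this group.
            sums[i] = maxs[i] = mins[i] = value
        else:
            sums[i] += value
            maxs[i] = max(maxs[i], value)
            mins[i] = min(mins[i], value)
        counts[i] += 1
    return {"Counts": counts, "Sums": sums, "Maxs": maxs, "Mins": mins}


# Group 2 contains only NAs, so it keeps count 0 and NA everywhere else.
res = igroup_summary([1.0, NA, 3.0, NA, 5.0], [1, 2, 1, 2, 3], 3)
print(res["Counts"])  # [2, 0, 1]
print(res["Sums"])    # [4.0, None, 5.0]
```

Every level in 1:ngroups appears in every result vector, so results for different variables stay aligned. Note that this deliberately differs from base R, where sum() of an empty vector is 0 rather than NA.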

Is that what you are returning in that case?

Also, do you always return Sums, Maxs, and Mins as "numeric", or do  
you sometimes return "integer" values when an "integer" data vector is  
passed in?

Are "Counts" always returned as "integer", or do you always set them  
to "numeric", or does that vary with the type of the data vector  
passed in?

Do you handle "complex" data vectors in a similar fashion (i.e., using  
the modulus, or length, of each complex value as its value for Maxs,  
Mins, etc.)?
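If Maxs and Mins for complex data were defined by modulus (Python's abs() of a complex number, the same quantity as R's Mod()), a group's maximum would be the element with the largest modulus. The sketch below is hypothetical and again in Python with None as NA; keeping the first value seen on ties is an assumption, not established behavior.

```python
from typing import List, Optional, Sequence, Tuple

NA = None  # stand-in for R's NA in this sketch (assumption)


def igroup_maxmin_by_modulus(
    x: Sequence[Optional[complex]], group: Sequence[int], ngroups: int
) -> Tuple[List[Optional[complex]], List[Optional[complex]]]:
    """Hypothetical per-group Max/Min of complex values, ordered by modulus.

    NAs are skipped (na.rm=T); a group with no valid values stays NA.
    On ties in modulus, the first value encountered is kept.
    """
    maxs: List[Optional[complex]] = [NA] * ngroups
    mins: List[Optional[complex]] = [NA] * ngroups
    for value, g in zip(x, group):
        if value is NA:
            continue
        i = g - 1  # 1-based group code -> 0-based index
        if maxs[i] is NA or abs(value) > abs(maxs[i]):
            maxs[i] = value
        if mins[i] is NA or abs(value) < abs(mins[i]):
            mins[i] = value
    return maxs, mins


maxs, mins = igroup_maxmin_by_modulus([3 + 4j, 1 + 0j, NA, 2j], [1, 1, 2, 2], 2)
# maxs == [3+4j, 2j] and mins == [1+0j, 2j]
```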

Thanks,

Kevin
