[R] suggestions/improvements for recoding strategy

Peter Ehlers ehlers at ucalgary.ca
Tue May 18 05:40:46 CEST 2010


Sorry, my attempt wasn't quite good enough. I didn't
consider the possibility of a 'negative' value in a
character/factor column.  To fix that, see inline below.


On 2010-05-17 14:32, Peter Ehlers wrote:
> On 2010-05-17 12:54, Henrique Dallazuanna wrote:
>> Try this:
>>
>> newData <- sapply(numdat, function(x)
>>     lapply(strsplit(as.character(x), '-'),
>>            function(.x) mean(as.numeric(.x))))
>
> There's a potential problem if numdat contains negative numbers.
> It would be better to restrict the recoding to character or
> factor columns.
>
> cl <- sapply(numdat, class)
> idx <- which(cl %in% c('character','factor'))
> g <- function(x){
> sapply(strsplit(as.character(x),"-"),
> function(.x) mean(as.numeric(.x), na.rm=TRUE))
> }

Replace function g() with

g <- function(x){
     # split each entry on "-"; a leading "-" yields "" as first piece
     sapply(strsplit(as.character(x),"-"),
       function(.x) ifelse(.x[1] == "",
                           -as.numeric(.x[2]),    # "-a"  ->  -a
                           mean(as.numeric(.x)))  # "a-b" ->  (a+b)/2
     )
}

Since strsplit("-3", "-") produces a list whose single
component is c("", "3"), we recognize any list component of
the form c("", "a") as representing -a.
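
A quick check of that behaviour, as a minimal sketch: the vector x
below is a made-up test case, not part of the original data, and g()
is repeated so that the snippet runs on its own.

```r
# g() as defined above, repeated so this snippet is self-contained
g <- function(x) {
  sapply(strsplit(as.character(x), "-"),
         function(.x) ifelse(.x[1] == "",
                             -as.numeric(.x[2]),
                             mean(as.numeric(.x))))
}

# What strsplit() returns for a range, a negative value and a plain value
pieces <- strsplit(c("1-2", "-3", "4"), "-")
# pieces[[1]] is c("1", "2"); pieces[[2]] is c("", "3"); pieces[[3]] is "4"

# Hypothetical test vector (an assumption, not from the thread's data)
x <- c("1-2", "-3", "4", "")
res <- g(x)
# res is c(1.5, -3, 4, NA): the empty string comes back as NA
```

Note that an entry such as "-3-4" (a range starting at a negative
value) is not covered by this rule; it splits into c("", "3", "4")
and would be returned as -3.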

   -Peter Ehlers

>
> newData <- numdat
> for(i in idx) newData[,i] <- g(newData[,i])
> newData
>
> -Peter Ehlers
>
>>
>> On Mon, May 17, 2010 at 3:29 PM, Juliet Hannah
>> <juliet.hannah at gmail.com> wrote:
>>
>>> I am recoding some data. Many values that should be 1.5 are recorded
>>> as 1-2. Some example data and my solution is below. I am curious about
>>> better approaches or any other suggestions. Thanks!
>>>
>>> # example input data
>>>
>>> myData<- read.table(textConnection("id, v1, v2, v3
>>> a,1,2,3
>>> b,1-2,,3-4
>>> c,,3,4"),header=TRUE,sep=",")
>>> closeAllConnections()
>>>
>>> # the first column is IDs so remove that
>>>
>>> numdat<- myData[,-1]
>>>
>>> # function to change dashes: 1-2 to 1.5
>>>
>>> myrecode<- function(mycol)
>>> {
>>> newcol<- mycol
>>> newcol<- gsub("1-2","1.5",newcol)
>>> newcol<- gsub("2-3","2.5",newcol)
>>> newcol<- gsub("3-4","3.5",newcol)
>>> newcol<- as.numeric(newcol)
>>>
>>> }
>>>
>>> newData<- data.frame(do.call(cbind,lapply(numdat,myrecode)))
>>>
>


