[R] Writing a function to return column position XXXX

Uwe Ligges ligges at statistik.tu-dortmund.de
Wed Jan 25 11:42:35 CET 2012



On 25.01.2012 02:16, R. Michael Weylandt wrote:
> I think you are getting stuck on the same regexp problem as before
> (i.e., once again the dollar sign is being interpreted as the
> beginning

You meant "end".
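
A small sketch of the difference, using a made-up vector x (not the
poster's data): as a regular expression, an unescaped "$" matches the
empty string at the end of each element, so it matches everything.
Escaping it, or passing fixed = TRUE, matches the literal character.

```r
# Hypothetical example vector; only the first element has a dollar sign.
x <- c("$251,895.80", "36106", "AL")

# Unescaped "$" is the end-of-string anchor, so every element matches.
anchor <- grepl("$", x)

# Either of these looks for a literal dollar sign instead.
escaped <- grepl("\\$", x)
literal <- grepl("$", x, fixed = TRUE)
```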

Uwe


> of the line rather than an actual dollar sign)
>
> If I understand your question, might I suggest something much easier?
>
> x = data.frame(a = c("$1034.23","1,230"), b = c(4,5))
> sapply(x, function(x) as.numeric(gsub("[\\$,]","",x)))
>
> That is, go through each column of the data frame, replace anything
> that's either a literal dollar sign or a comma with the empty string
> (i.e., remove it), and then convert the result to numeric. If a column
> is already numeric, this simply returns it unaltered, so I think it's
> safe to apply to each column.
>
> M
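
Note that sapply() as used above returns a numeric matrix rather than a
data frame, and it converts every column, so a column of state codes
such as "AL" would become NA. If only the columns containing a dollar
sign should be converted, something along the following lines may work.
This is a sketch only: the data are made up to resemble the original
post, and has.dollar is a name chosen here for illustration.

```r
# Hypothetical data, modelled on the example in the original post.
cand2 <- data.frame(
  can_sta     = c("AL", "AR"),
  ind_ite_con = c("$251,895.80", "$186,918.00"),
  ind_uni_con = c("$22,874.43", "$25,391.00"),
  stringsAsFactors = FALSE
)

# TRUE for columns in which at least one value contains a literal "$".
has.dollar <- sapply(cand2, function(col) any(grepl("$", col, fixed = TRUE)))

# Clean each flagged column separately. lapply() returns a list of
# numeric vectors, so the assignment leaves cand2 a data frame.
cand2[has.dollar] <- lapply(cand2[has.dollar], function(col) {
  as.numeric(gsub("[$,]", "", col))
})
```

Columns without a dollar sign are left as they were.
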
>
> On Tue, Jan 24, 2012 at 11:07 AM, Dan Abner<dan.abner99 at gmail.com>  wrote:
>> Hi everyone,
>>
>> I am using Michael's approach (grepl()) to identify which columns
>> contain $ signs. I was hoping to incorporate this into a line of
>> code that would automatically 1) find which columns contain $ signs,
>> 2) strip the $ and commas, and 3) convert the result to a numeric
>> vector.
>>
>> I have the following:
>>
>> col.id<-function(x) any(grepl("\\$",x))
>>
>> cand2[which(sapply(cand2,col.id))]<-
>>         as.numeric(gsub("[$,]","",cand2[which(sapply(cand2,col.id))]))
>>
>> However, I am doing something wrong: while the code correctly
>> identifies the columns containing $ signs, it also returns ALL NA for
>> those columns.
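
A likely reason for the NAs: gsub() is being handed a data frame rather
than a single column. The sketch below reproduces the effect on a
made-up one-column data frame (the name sub and its values are
illustrative, not the poster's data).

```r
# Hypothetical one-column data frame standing in for cand2[...].
sub <- data.frame(ind_ite_con = c("$251,895.80", "$141,373.60"),
                  stringsAsFactors = FALSE)

# gsub() needs a character vector, so the data frame (a list) is coerced
# as a whole: each column collapses into one deparsed string.
stripped <- gsub("[$,]", "", sub)

# That string is not a number, so the conversion yields NA (a warning is
# suppressed here to keep the example quiet).
converted <- suppressWarnings(as.numeric(stripped))
```

Here stripped has length one (one string per column, not one per
value), which is why the whole column ends up NA on assignment.
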
>>
>> See my initial message for this thread for example data.
>>
>> Any assistance is appreciated.
>>
>> Thanks!
>>
>> Dan
>>
>>
>> On Tue, Jan 24, 2012 at 9:04 AM, R. Michael Weylandt
>> <michael.weylandt at gmail.com>  wrote:
>>> Either
>>>
>>> any(grepl("$",x, fixed = TRUE)) # You probably want grepl not grep
>>> any(grepl("\\$",x) )
>>> ? regexpr # $ has a special meaning in regular expressions
>>>
>>> Michael
>>>
>>> PS -- Stop with HTML postings (seriously, it actually does mess up
>>> what the rest of us see and I think it causes trouble for the archives
>>> as well)
>>>
>>> On Tue, Jan 24, 2012 at 8:49 AM, Dan Abner<dan.abner99 at gmail.com>  wrote:
>>>> Hello everyone,
>>>>
>>>> I am writing my own function to return the column index of all variables
>>>> (these are currently character vectors) in a data frame that contain a
>>>> dollar sign ($). A small piece of the data looks like this:
>>>>
>>>>     can_sta  can_zip   ind_ite_con  ind_uni_con
>>>>     AL       36106     $251,895.80  $22,874.43
>>>>     AL       35802     $141,373.60  $7,100.00
>>>>     AL       35201     $273,208.50  $18,193.66
>>>>     AR       72404     $186,918.00  $25,391.00
>>>>     AR       72217     $451,127.00  $27,255.23
>>>>     AR       7.28E+08  $58,336.22   $5,293.82
>>>>
>>>>
>>>> So far I have:
>>>>
>>>>
>>>> col.id<-function(x) any(grep("$",x))
>>>> sapply(cand2,col.id)
>>>>
>>>> However, this returns TRUE for all columns (even those that do not contain
>>>> the $).
>>>>
>>>> Any assistance is appreciated.
>>>>
>>>> Thank you,
>>>>
>>>> Dan
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.


