[R] Replace Text but not from within a word

Marc Schwartz marc_schwartz at me.com
Tue Feb 28 15:50:18 CET 2017


> On Feb 28, 2017, at 8:36 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> 
> For tasks like this, you will probably want to make sure to import the data as character data rather than as a factor.  E.g.
> 
> dat <- read.csv( "myfile.csv", header=FALSE, as.is=TRUE )
> 
> You can check what you have with the str() function.


Jeff,

Strictly speaking, that does not matter for this particular task.

gsub() and its relatives coerce a factor to character internally (via as.character()), so they work just fine on a factor and return a character vector:

text <- factor(c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC"))

> text
[1] BOEING CO          ENGMANTAYLOR CO    SAGINAW COUNTY INC
Levels: BOEING CO ENGMANTAYLOR CO SAGINAW COUNTY INC

> gsub(" CO$", "", text)
[1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"

Beyond this, whether to use 'as.is' is more a matter of personal preference.
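
A minimal sketch of the point above, using the same three names. The object names (text_fac, text_chr, res_fac, res_chr, text_fac2) are only illustrative. It checks that gsub() gives the same character result for factor and character input, and shows one way to keep a factor if that is what you want, by editing its levels:

```r
# Same data as above, once as a factor and once as a character vector.
text_fac <- factor(c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC"))
text_chr <- as.character(text_fac)

# gsub() returns a character vector in both cases.
res_fac <- gsub(" CO$", "", text_fac)
res_chr <- gsub(" CO$", "", text_chr)

identical(res_fac, res_chr)   # same result for both input types
class(res_fac)                # "character", not "factor"

# To keep a factor, apply the substitution to the levels instead.
text_fac2 <- text_fac
levels(text_fac2) <- gsub(" CO$", "", levels(text_fac2))
str(text_fac2)
```

Either way the substitution itself is the same; only the type of the result differs.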

Regards,

Marc


> -- 
> Sent from my phone. Please excuse my brevity.
> 
> On February 28, 2017 5:19:40 AM PST, Marc Schwartz <marc_schwartz at me.com> wrote:
>> 
>>> On Feb 28, 2017, at 3:38 AM, Harshal Athawale <pgcim15.harshal at spjimr.org> wrote:
>>> 
>>> I am new to R.
>>> 
>>> I have a file that contains the names of companies.
>>> 'data.frame': 494 obs. of  1 variable:
>>> $ V1: Factor w/ 470 levels "3-d engineering corp",..: 293 134 339 359 143 399 122 447 398 384 ...
>>> 
>>> Problem: I would like to remove "CO" (as it is the most frequent word). I
>>> would like "CO" to be removed from BOEING CO --> BOEING but not from
>>> SAGINAW *CO*UNTY INC.
>>> 
>>>> text = c("BOEING CO","ENGMANTAYLOR CO","SAGINAW COUNTY INC")
>>> 
>>>> gsub(x = text, pattern = "CO", replacement = "")
>>> 
>>> [1] "BOEING "          "ENGMANTAYLOR "    "SAGINAW UNTY INC"
>>> 
>>> Thanks in advance.
>>> 
>>> - Sam
>> 
>> 
>> Hi,
>> 
>> See ?regex and ?grep for some details and examples on how to construct
>> the expression used for matching, as well as some of the references
>> therein.
>> 
>> In this case, you want to use something along the lines of:
>> 
>>> gsub(" CO$", "", text)
>> [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"
>> 
>> where the "CO" is preceded by a space and followed by the "$", which is
>> a special character that indicates the end of the string to be matched.
>> 
>> Regards,
>> 
>> Marc Schwartz
>> 
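
The anchored pattern quoted above handles names that end in "CO". As a sketch that goes beyond what was asked in the thread: if "CO" could also appear as a separate word in the middle of a name, a word-boundary pattern is one alternative. The name "ACME CO INC" and the object names are hypothetical, added only for illustration, and perl = TRUE is used here so that \\b and \\s follow Perl-style regex rules.

```r
# "ACME CO INC" is a hypothetical name, not from the original data.
text <- c("BOEING CO", "ENGMANTAYLOR CO", "SAGINAW COUNTY INC", "ACME CO INC")

# End-of-string anchor: only a trailing " CO" is removed.
res_end <- gsub(" CO$", "", text)
res_end
# [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"
# [4] "ACME CO INC"

# Word boundaries: "CO" is removed wherever it stands alone as a word,
# together with any whitespace directly before it. "COUNTY" is untouched
# because there is no word boundary between "CO" and "UNTY".
res_word <- gsub("\\s*\\bCO\\b", "", text, perl = TRUE)
res_word
# [1] "BOEING"             "ENGMANTAYLOR"       "SAGINAW COUNTY INC"
# [4] "ACME INC"
```

Which of the two is appropriate depends on whether a mid-name "CO" should be kept or dropped in your data.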

