[R] Help with text separation

David Winsemius dwinsemius at comcast.net
Mon Nov 14 18:05:13 CET 2011


On Nov 14, 2011, at 4:20 AM, Michael Griffiths wrote:

> Good morning R list,
>
> My apologies if this has *already* been answered elsewhere, but I have not
> found the answer that I am looking for.
>
> I have a character string, i.e.
>
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*' and to
> remove said '*'. However, I would also like to remove the variable name
> preceding the '*', the math operator preceding that, and also the variable
> name after the '*'. So, here I would like to remove '+L*M'.

This would be a very narrow implementation that requires the
+/spc/alnum/spc/*/alnum sequence exactly:

 > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
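
To make the "exact sequence" reading concrete, here is a sketch of a pattern with no optional separators: one plus sign, one space, a run of alphanumerics, one space, an asterisk, one space, and a run of alphanumerics. It is an illustration only, not the pattern shown above; the names strict_pat, form_spaced and form_tight and the shortened formulas are made up for the example.

```r
# Hypothetical shortened inputs, not the original poster's full string
form_spaced <- "~ A + B + K + L * M"
form_tight  <- "~ A + B + K + L*M"

# Exactly: "+", one space, alnum run, one space, "*", one space, alnum run
strict_pat <- "\\+\\s[[:alnum:]]+\\s\\*\\s[[:alnum:]]+"

strict_spaced <- sub(strict_pat, "", form_spaced)
strict_tight  <- sub(strict_pat, "", form_tight)

print(strict_spaced)  # the "+ L * M" term is removed
print(strict_tight)   # left unchanged: "L*M" has no spaces around "*"
```

A pattern this strict leaves any string alone whose spacing differs from the template, which may or may not be what is wanted.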

This is a more general implementation using the "*" quantifier, which
matches the preceding item 0 or more times.

 form <- c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M',
           '~ A + B + C + C / D + E + E / F + G + H + I + J + K + L*M',
           '~ A + B + C + C / D + E + E / F + G + H + I + J + K +Llll*M')
 > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[2] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[3] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
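
The pattern can be read piece by piece, as in the annotated sketch below. The inputs are shortened for readability, and the final step that strips the trailing space left after "K" is an addition that was not part of the reply above; the names pat, removed and cleaned are chosen for the example.

```r
# Shortened versions of the three spacing variants
form <- c("~ A + B + K + L * M",
          "~ A + B + K + L*M",
          "~ A + B + K +Llll*M")

# \\+*          zero or more literal "+"
# \\s*          zero or more whitespace characters
# [[:alnum:]]*  zero or more letters/digits (the name before "*")
# \\s*          zero or more whitespace characters
# \\*           one literal "*"
# .             any single character (a space, or the first
#               character of the name after "*")
# [[:alnum:]]*  zero or more letters/digits (rest of that name)
pat <- "\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*"

removed <- sub(pat, "", form)          # drop the first product term
cleaned <- sub("\\s+$", "", removed)   # strip trailing whitespace
print(cleaned)
```

Note that sub() replaces only the first match in each string; if a formula could contain several product terms, gsub() with the same pattern would be the function to try.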


---stripped out code---

-- 
David Winsemius, MD
West Hartford, CT