[R] Help with text separation

Sarah Goslee sarah.goslee at gmail.com
Mon Nov 14 13:09:27 CET 2011


Hi,

On Mon, Nov 14, 2011 at 4:20 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
> Good morning R list,
>
> My apologies if this has *already* been answered elsewhere, but I have not found
> the answer that I am looking for.
>
> I have a character string, i.e.
>
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*' and to
> remove said '*'. However, I would also like to remove the preceding
> variable name before the '*', the math operator preceding this, and also
> the variable name after the '*'. So, here I would like to remove '+ L * M'.

You just want to get rid of them? gsub() it is.

I've changed your formula a little bit to better demonstrate what's going on:
> form<-c('~ A + B * C + C / D + E + E / F * G + H + I + J + K + L * M')
> gsub(" \\+ [A-Z] \\* [A-Z]", "", form)
[1] "~ A + C / D + E + E / F * G + H + I + J + K"

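That regular expression will take out a space, a literal + (written \\+ in
the R string, because + is a regex metacharacter), any capital letter, a
space, a literal * (likewise escaped as \\*), a space, and any capital letter.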

It will take out all occurrences of that sequence, but won't take out
occurrences of * not in that sequence.
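To see exactly which pieces gsub() will remove, and where each one starts,
you can run the same pattern through gregexpr() and regmatches(), both in base
R. A minimal sketch using the modified formula from above:

```r
form <- '~ A + B * C + C / D + E + E / F * G + H + I + J + K + L * M'
pat  <- " \\+ [A-Z] \\* [A-Z]"

# gregexpr() finds every match; its result holds the start positions
m <- gregexpr(pat, form)
starts <- as.vector(m[[1]])   # character offsets where each match begins
print(starts)

# regmatches() extracts the matched substrings themselves
hits <- regmatches(form, m)[[1]]
print(hits)

# "E / F * G" survives: the character before F is '/', not '+'
cleaned <- gsub(pat, "", form)
print(cleaned)
```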

If your formula has no spaces, the pattern doesn't need them either: just
take them out of the regular expression as well.
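If the formula may or may not contain spaces, one option is to make them
optional with \\s* (zero or more whitespace characters). The sketch below
also assumes, beyond the original question, that variable names can be longer
than one letter (letters, digits, '.' and '_', starting with a letter or '.');
adjust that character class if your names differ.

```r
# Assumed variable-name pattern: a letter or '.', then letters, digits,
# '.' or '_' (an illustration, not R's full syntactic-name rule)
name <- "[A-Za-z.][A-Za-z0-9._]*"

# '+', optional spaces, name, optional spaces, '*', optional spaces, name
pat <- paste0("\\s*\\+\\s*", name, "\\s*\\*\\s*", name)

spaced   <- '~ A + B * C + C / D + L * M'
unspaced <- '~A+B*C+C/D+L*M'
longer   <- '~ age + dose*weight + sex'

gsub(pat, "", spaced)     # both interaction terms removed
gsub(pat, "", unspaced)   # same result without any spaces
gsub(pat, "", longer)     # multi-letter names are handled too
```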

Not that strsplit() was remotely the right tool here, but you can
split into characters without a separator:
> form <- 'abcd'
> strsplit(form, '')
[[1]]
[1] "a" "b" "c" "d"
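If you do want the individual pieces (as the strsplit() approach needed),
one way that does not depend on spaces is to extract tokens with gregexpr()
and regmatches() instead of splitting. This is only a sketch: tokenize() is a
hypothetical helper, and its token pattern (runs of letters, digits, '.' and
'_', or a single ~ + - * / operator) is an assumption about what your
formulas contain.

```r
# Hypothetical helper: pull out name tokens and single-character operators,
# skipping any whitespace between them
tokenize <- function(s) {
  # extend the operator class for others such as ':' or '^'
  regmatches(s, gregexpr("[A-Za-z0-9._]+|[-~+*/]", s))[[1]]
}

tokenize('~ A + B * C')
tokenize('~A+B*C')   # same tokens, no spaces required
```

Wrapped as list(tokenize(form)), the result can stand in for
strsplit(form, ' ') in the loop from the original question.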

Sarah

> So far I have come up with the following code:
>
> parts<-strsplit(form,' ')
> index<-which(unlist(parts)=="*")
> # seq_along() avoids looping over 1:0 when there is no '*'
> for (i in seq_along(index)){
>    # drop the '*', the name after it, the name before it,
>    # and the operator before that name
>    parts[[1]][index[i]]<-list(NULL)
>    parts[[1]][index[i]+1]<-list(NULL)
>    parts[[1]][index[i]-1]<-list(NULL)
>    parts[[1]][index[i]-2]<-list(NULL)
> }
> new.form<-unlist(parts)
>
> # paste the remaining pieces back together with no separator
> form<-paste(new.form, collapse="")
>
> However, as you can see, I have had to use strsplit in what I consider a
> rather clumsy manner, as the character string (form) has to be in a certain
> format: all variables and maths operators require a space between them in
> order for strsplit to work in the manner I require.
>
> I would very much like to accomplish what the above code already does, but
> without requiring the aforementioned spaces in the initial character string.
>
> If the list can offer help, I would be most appreciative.
>
> Yours
>
> Mike Griffiths
>
>
>
-- 
Sarah Goslee
http://www.functionaldiversity.org



More information about the R-help mailing list