[R] regexpr mystery can not remove trailing spaces

Petr PIKAL petr.pikal at precheza.cz
Wed Jun 2 16:55:35 CEST 2010


Hi

I still have the original data, for which sub(' +$', '', ...) did not
work, saved in Excel, so I could try them again.

> grep("\t", as.character(becva$V1[1]))
integer(0)
> grep("\n", as.character(becva$V1[1]))
integer(0)
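
When checks for tabs and newlines both come back empty, one further way to see what the trailing characters really are is to look at their code points. The sketch below is only an illustration: the string x and the non-breaking space (U+00A0) in it are assumptions, since this thread never establishes which character was actually in the Excel data.

```r
# Hypothetical example: the trailing characters are non-breaking spaces
# (U+00A0), which print just like ordinary spaces. This is an assumption,
# not something confirmed for the becva data.
x <- "02.06.10 12:40\u00a0\u00a0\u00a0"

# Code points of the last three characters; an ordinary space would be 32.
codes <- tail(utf8ToInt(enc2utf8(x)), 3)
print(codes)

# A pattern that names only the ordinary space leaves the length unchanged.
unchanged <- nchar(sub(" +$", "", x)) == nchar(x)
print(unchanged)

# Naming the suspect character explicitly in the bracket removes it.
cleaned <- sub("[ \u00a0]+$", "", x)
print(cleaned)
```

If the code points of the real data turn out to be 32, the trailing characters are ordinary spaces and the cause lies elsewhere.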

and Jim's solutions work as expected

> sub('[[:space:]]+$', '', becva$V1[1])
[1] "02.06.10 12:40"
> sub('\\W+$', '', becva$V1[1])
[1] "02.06.10 12:40"
> sub('+$', '', becva$V1[1])
[1] "02.06.10 12:40   "
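
For reference, the patterns tried in this thread differ as sketched below. The string x is a made-up value with three ordinary trailing spaces, and the calls use R's default regular expression engine (no perl = TRUE), as in the transcripts quoted further down.

```r
x <- "02.06.10 12:40   "   # three ordinary trailing spaces (illustrative)

# \\w matches word characters, so \\w+$ finds nothing at the end of x
# and the string comes back unchanged.
r_w <- sub("\\w+$", "", x)

# \\W matches non-word characters, so \\W+$ matches the trailing spaces.
r_W <- sub("\\W+$", "", x)

# With single brackets, [:space:] is just the set of the characters
# ":", "s", "p", "a", "c", "e"; the first match in x is the colon.
r_colon <- sub("[:space:]", "", x)

# With double brackets, [[:space:]] is the whitespace class.
r_space <- sub("[[:space:]]+$", "", x)

print(c(r_w, r_W, r_colon, r_space))
```

Note that \\W+$ would also strip trailing punctuation, so [[:space:]]+$ is the narrower choice when only whitespace should go.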

With data updated directly from the internet there is no problem, and
all of the above commands work. There may be some issue with the Excel
data, which is not worth solving.

Thanks to you all.

Regards
Petr


Joris Meys <jorismeys at gmail.com> wrote on 02.06.2010 16:11:05:

> Hi Petr,
> 
> Matt may very well have been right. As I copied the dput from the mail,
> any white space is converted to spaces apparently. Still, it might be
> possible the white spaces in your original data are tabs or even newline
> characters. You can check that easily with
> 
> grep("\t", as.character(becva$V1[1]))
> grep("\n", as.character(becva$V1[1]))
> 
> Cheers
> Joris
> 
> 

> On Wed, Jun 2, 2010 at 3:54 PM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> Hi
> 
> thanks. I am puzzled about what was wrong. Now even
> 
> sub(' +$', '', bbb[1])
> 
> works. I am checking the water throughput in a nearby river and copying
> the data from the internet. So I wonder if there was some change recently,
> as during floods they update it at intervals of about 10 minutes.
> 
> Regards
> Petr
> 
> 
> jim holtman <jholtman at gmail.com> wrote on 02.06.2010 15:44:42:
> 
> > You had the wrong case on 'w' and the wrong expression with
> > '[:space:]'; see below
> >
> > > bbb <- c("02.06.10 12:40   ", "02.06.10 12:00   ", "02.06.10 11:00   ",
> > + "02.06.10 10:00   ", "02.06.10 09:00   ", "02.06.10 08:00   ",
> > + "02.06.10 07:00   ", "02.06.10 06:00   ", "02.06.10 05:00   ",
> > + "02.06.10 04:00   ", "02.06.10 03:00   ", "02.06.10 02:00   ",
> > + "02.06.10 01:00   ", "02.06.10 00:00   ", "01.06.10 23:00   ",
> > + "01.06.10 22:00   ", "01.06.10 21:00   ", "01.06.10 20:00   ",
> > + "01.06.10 19:00   ", "01.06.10 18:00   ", "01.06.10 17:00   ",
> > + "01.06.10 16:00   ", "01.06.10 15:00   ", "01.06.10 14:00   ",
> > + "01.06.10 13:00   ", "01.06.10 05:00   ", "31.05.10 05:00   ",
> > + "30.05.10 05:00   ", "29.05.10 05:00   ", "28.05.10 05:00   ",
> > + "27.05.10 05:00   ")
> > >  sub('\\W+$', '', bbb[1])
> > [1] "02.06.10 12:40"
> > > sub('[[:space:]]+$', '', bbb[1])
> > [1] "02.06.10 12:40"
> > >
> >
> >
> > On Wed, Jun 2, 2010 at 9:22 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> > > Hi
> > >
> > >> dput(bbb)
> > > c("02.06.10 12:40   ", "02.06.10 12:00   ", "02.06.10 11:00   ",
> > > "02.06.10 10:00   ", "02.06.10 09:00   ", "02.06.10 08:00   ",
> > > "02.06.10 07:00   ", "02.06.10 06:00   ", "02.06.10 05:00   ",
> > > "02.06.10 04:00   ", "02.06.10 03:00   ", "02.06.10 02:00   ",
> > > "02.06.10 01:00   ", "02.06.10 00:00   ", "01.06.10 23:00   ",
> > > "01.06.10 22:00   ", "01.06.10 21:00   ", "01.06.10 20:00   ",
> > > "01.06.10 19:00   ", "01.06.10 18:00   ", "01.06.10 17:00   ",
> > > "01.06.10 16:00   ", "01.06.10 15:00   ", "01.06.10 14:00   ",
> > > "01.06.10 13:00   ", "01.06.10 05:00   ", "31.05.10 05:00   ",
> > > "30.05.10 05:00   ", "29.05.10 05:00   ", "28.05.10 05:00   ",
> > > "27.05.10 05:00   ")
> > >>
> > >
> > > For simplicity I changed the name and put it into a single variable.
> > > I also reinstalled R, updating to a recent R-devel
> > >
> > >> sub('\\w+$', '', bbb[1])
> > > [1] "02.06.10 12:40   "
> > >> sub('[:space:]', '', bbb[1])
> > > [1] "02.06.10 1240   "
> > >>
> > >
> > > I also tried Matt's suggestion but it did not help.
> > >
> > > Regards
> > > Petr
> > >
> > > Joris Meys <jorismeys at gmail.com> wrote on 02.06.2010 14:35:19:
> > >
> > >> Could you provide us with dput(becva$V1[1])?
> > >> Cheers
> > >> Joris
> > >
> > >> On Wed, Jun 2, 2010 at 2:07 PM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
> > >> Dear all
> > >>
> > >> I encountered a strange problem with regexp replacement
> > >>
> > >> I made this character object
> > >>
> > >> str <- "02.06.10 12:40     "
> > >>
> > >> > str(str)
> > >>  chr "02.06.10 12:40      "
> > >>
> > >> I read in an object which seems to be quite similar
> > >>
> > >> > str(as.character(becva$V1)[1])
> > >>  chr "02.06.10 12:40   "
> > >>
> > >> However I can not remove trailing spaces from it
> > >>
> > >> > sub(' +$', '', as.character(becva$V1[1]))
> > >>
> > >> [1] "02.06.10 12:40   "
> > >> > sub(' +$', '', str)
> > >> [1] "02.06.10 12:40"
> > >> >
> > >>
> > >> Does anybody have an idea what to do?
> > >>
> > >> $version.string
> > >> [1] "R version 2.12.0 Under development (unstable) (2010-04-25 r51820)"
> > >>
> > >> on Windows
> > >>
> > >> Regards
> > >> Petr
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
> > >>
> > >>
> > >>
> > >> --
> > >> Joris Meys
> > >> Statistical Consultant
> > >>
> > >> Ghent University
> > >> Faculty of Bioscience Engineering
> > >> Department of Applied mathematics, biometrics and process control
> > >>
> > >> Coupure Links 653
> > >> B-9000 Gent
> > >>
> > >> tel : +32 9 264 59 87
> > >> Joris.Meys at Ugent.be
> > >> -------------------------------
> > >> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
> > >
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 646 9390
> >
> > What is the problem that you are trying to solve?

> 
> 
> 


