[R] using an array of strings with strsplit, issue when including a space in split criteria

Tony Breyal tony.breyal at googlemail.com
Tue Sep 8 12:57:12 CEST 2009


UPDATE:

I'm not sure why, but when I ran the same code again on my Windows XP
64-bit machine it did not work this time, even though it worked
previously. R was started with the Rgui --vanilla command.

> x <- c("Weekly sales figures to 30 August 2008 published 5 September", "Weekly sales figures to 6 September 2008 published 11 September")
> strsplit(x, 'published ', fixed=TRUE)
[[1]]
[1] "Weekly sales figures to 30 August 2008 "
[2] "5 September"

[[2]]
[1] "Weekly sales figures to 6 September 2008 published 11 September"

O/S: Windows XP 64bit Pro; Service Pack 2
> sessionInfo()
R version 2.9.2 (2009-08-24)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
>
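
One possible cross-check, offered as a sketch under assumptions rather than a known fix: compare the fixed split against the default regular-expression split, and look at the raw bytes of each string to rule out a non-ASCII character (for example a non-breaking space) hiding after "published". The names has_non_ascii, res_fixed, res_regex, pieces_fixed and pieces_regex are illustrative only.

```r
txt <- c("sales to 23 August 2008 published 29 August",
         "sales to 6 September 2008 published 11 September")

# TRUE for any element that contains a byte outside the 7-bit ASCII range
has_non_ascii <- sapply(txt,
                        function(s) any(as.integer(charToRaw(s)) > 127L),
                        USE.NAMES = FALSE)

# Split the same input two ways: literal match and regular-expression match
res_fixed <- strsplit(txt, "published ", fixed = TRUE)
res_regex <- strsplit(txt, "published ")

# Number of pieces per element for each approach
pieces_fixed <- sapply(res_fixed, length)
pieces_regex <- sapply(res_regex, length)

print(has_non_ascii)
print(pieces_fixed)
print(pieces_regex)
print(identical(res_fixed, res_regex))
```

If the two splits disagree on an affected machine while has_non_ascii is all FALSE, that would point towards the fixed=TRUE code path rather than the input data.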



On 8 Sep, 09:47, Tony Breyal <tony.bre... at googlemail.com> wrote:
> After further investigation, it appears that the problem is specific to
> my Vista PC. I am able to get the correct results using R 2.9.2 on a
> Windows XP 64-bit machine. However, I do not know why this does not work
> on my Vista PC. The following was done after rebooting Vista.
>
> From CMD.exe I ran the following line:
>
> C:\Program Files\R\R-2.9.2\bin>Rgui --vanilla
>
> This opened up R.
>
> ### R 2.9.2 START ###
>
> > txt <- c("sales to 23 August 2008 published 29 August",
> + "sales to 6 September 2008 published 11 September")
>
> > strsplit(txt, 'published', fixed=TRUE)
>
> [[1]]
> [1] "sales to 23 August 2008 " " 29 August"
>
> [[2]]
> [1] "sales to 6 September 2008 " " 11 September"
>
> > strsplit(txt, 'published ', fixed=TRUE)
>
> [[1]]
> [1] "sales to 23 August 2008 " "29 August"
>
> [[2]]
> [1] "sales to 6 September 2008 published 11 September"
>
> > sessionInfo()
>
> R version 2.9.2 (2009-08-24)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> ### R 2.9.2 END ###
>
> The exact same thing happened when I used R 2.9.0 and R 2.8.1 on this
> same Vista computer.
>
> ### R 2.9.0 ###
>
> > sessionInfo()
>
> R version 2.9.0 (2009-04-17)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices datasets  utils     methods   base
>
> other attached packages:
> [1] rcom_2.1-3     rscproxy_1.3-1
>
> loaded via a namespace (and not attached):
> [1] tools_2.9.0
>
> ### R 2.8.1 ###
>
> > sessionInfo()
>
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
> Kingdom.1252;LC_MONETARY=English_United
> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
>
> My computer details are:
> Windows Vista Ultimate
> Service Pack 1
> Manufacturer: Dell
> Rating: 3.4
> Processor: Intel Core 2 Duo CPU E6750 @ 2.66 GHz
> Memory (RAM): 4.00 GB
> System type: 32-bit Operating System
>
> 2009/9/8 Gabor Grothendieck <ggrothendi... at gmail.com>:
>
>
>
> > I am using the exact same version of R as you also on Vista
> > but can't reproduce your result.  For me it splits properly.
>
> > Try starting R like this (modify path if needed) from the
> > Windows cmd line:
>
> > \Program Files\R\R-2.9.2\bin\Rgui --vanilla
>
> > and then try it.
>
> > On Mon, Sep 7, 2009 at 11:40 AM, Tony Breyal <tony.bre... at googlemail.com> wrote:
> >> Dear all,
>
> >> I'm having a problem understanding why a split does not occur in
> >> the second use of the function strsplit below:
>
> >> # text strings
> >>> txt <- c("sales to 23 August 2008 published 29 August",
> >> + "sales to 6 September 2008 published 11 September")
>
> >> # first use
> >>> strsplit(txt, 'published', fixed=TRUE)
> >> [[1]]
> >> [1] "sales to 23 August 2008 " " 29 August"
>
> >> [[2]]
> >> [1] "sales to 6 September 2008 " " 11 September"
>
> >> # second use, but with a space ' ' in the split
> >>> strsplit(txt, 'published ', fixed=TRUE)
> >> [[1]]
> >> [1] "sales to 23 August 2008 " "29 August"
>
> >> [[2]]
> >> [1] "sales to 6 September 2008 published 11 September"
>
> >> Thank you kindly for any help in advance.
> >> Tony
>
> >> O/S: Win Vista Ultimate
> >>> sessionInfo()
> >> R version 2.9.2 (2009-08-24)
> >> i386-pc-mingw32
>
> >> locale:
> >> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>
> >> attached base packages:
> >> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> >> other attached packages:
> >> [1] RODBC_1.3-0
>
> >> ______________________________________________
> >> R-h... at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Tony Breyal
>



