[R] need help with excel data

William Dunlap wdunlap at tibco.com
Thu Jan 22 03:05:47 CET 2015


The following is one way to parse your file using R (using R-3.1.2 on
Windows in a US English locale).  I downloaded it from Google Docs in
tab-separated format.  I could not get read.table() to do the job, but I
don't completely understand the encoding/fileEncoding business there.

> file <- "exampX.xlsx - examp.tsv" # the name Google Docs suggested
> lines <- readLines(file, encoding="UTF-8")
Warning message:
In readLines(file, encoding = "UTF-8") :
  incomplete final line found on 'exampX.xlsx - examp.tsv'
> fields <- strsplit(lines, "\t")
> txt <- vapply(fields, function(x)x[2], "") # 2nd field of each line
> nmbrs <- regmatches(txt, gregexpr("[[:digit:]]+(\\*[[:digit:]]+)*", txt))
> lines[16:20]
[1] "1.97\tл.а. 11 35*46 27*46" "1.61\tсамбо 9 31*36 29*45"
[3] "1.17\tс.п. 4  37*29 39*30" "1.54\tушу 9 31*39 30*38"
[5] "1.73\tсамбо 6 32*39 29*39"
> nmbrs[16:20]
[[1]]
[1] "11"    "35*46" "27*46"

[[2]]
[1] "9"     "31*36" "29*45"

[[3]]
[1] "4"     "37*29" "39*30"

[[4]]
[1] "9"     "31*39" "30*38"

[[5]]
[1] "6"     "32*39" "29*39"

If you want to split those "x*y" into "x" and "y" you can use
the pattern "[[:digit:]]+" instead of the one I used.
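
A minimal sketch of that variant, carried one step further to a table
with one number per column.  The first three strings are copied from the
output above; the fourth is made up to cover the "12*23, 34*56" form
mentioned in the question.  The names nums, label, width, mat and result
are only illustrative, and treating everything before the first digit as
the label is an assumption about the data.

```r
# Second-column values: three from the transcript, plus one made-up
# row (hypothetical label) using the comma-separated form.
txt <- c("л.а. 11 35*46 27*46",
         "самбо 9 31*36 29*45",
         "с.п. 4  37*29 39*30",
         "пример 12*23, 34*56")

# Every run of digits is its own match, so "35*46" gives "35" and "46".
# Commas and asterisks are simply skipped because they are not digits.
nums <- regmatches(txt, gregexpr("[[:digit:]]+", txt))

# Assumption: the label is whatever precedes the first digit.
label <- sub("[[:space:]]*[[:digit:]].*$", "", txt)

# Rows can hold different counts of numbers; indexing past the end of a
# vector yields NA, which pads short rows to a common width.
width <- max(vapply(nums, length, 1L))
padded <- lapply(nums, function(x) as.integer(x)[seq_len(width)])
mat <- matrix(unlist(padded), ncol = width, byrow = TRUE)

# One row per input line: the label, then one column per number.
result <- data.frame(label = label, mat, stringsAsFactors = FALSE)
print(result)
```

If the result needs to go back into a spreadsheet, write.csv(result,
"out.csv", row.names = FALSE) is one option (the file name is just a
placeholder); the Cyrillic labels may need a suitable fileEncoding
argument depending on the platform.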


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jan 21, 2015 at 12:31 PM, Dr Polanski <n.polyanskij at gmail.com> wrote:

> Hi all!
>
> Sorry to bother you. I am trying to learn some R via Coursera courses and
> other internet sources, but haven’t managed to get far yet.
>
> Now I need to do some things that I hope are not too difficult and that I
> think R can do, but I have no idea how to make it do them.
>
> I have a big set of empirical data which was obtained by my colleagues
> and stored in an inconvenient way: all of the data is in two cells of an
> Excel table.
> An example of the data is in the attached file (the link):
>
>
> https://drive.google.com/file/d/0B64YMbf_hh5BS2tzVE9WVmV3bFU/view?usp=sharing
>
> So the first column has a number and the second has a whole vector (I
> guess that is what it is), which looks like
> «some words in Cyrillic (the length varies)» followed by a set of numbers
> «12*23 34*45» (another problem is that sometimes it is «12*23, 34*56»).
>
> The number of rows is about 3000, so it is impossible to do manually.
>
> What I need at the end is to have it split into separate Excel cells:
> - what is written in words - |  12  | 23 | 34 | 45 |
>
> Do you think it is possible to do so using R (or something else?)
>
> Thank you very much in advance, and sorry for asking for help with such a
> stupid question. The problem is that I am trying, and yet I haven’t even
> managed to install openSUSE onto my laptop - only Ubuntu! :)
>
>
> Thank you very much!
