[R] spliting first 10 words in a string

Gaj Vidmar gaj.vidmar at mf.uni-lj.si
Tue Nov 2 11:24:58 CET 2010


Though <forbidden> in this list, in Excel it's just (literally!) five clicks 
away!
(with the column in question selected)
Data -> Text to Columns -> Delimited -> tick Space -> Finish
Pa je! (~Voila in Slovenian)
(then import back to R, keeping only the first 10 columns if so desired)
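
For completeness, the same split can probably be done without leaving R. The following is only a minimal sketch under stated assumptions: the data frame `df`, its contents, and the new column names `Opis1` to `Opis10` are made up for illustration, and `Opis` is assumed to be a factor, as in the str() output quoted below.

```r
# Hypothetical stand-in for the poster's data frame; Opis is a factor,
# as in the quoted str() output. The values are invented for illustration.
df <- data.frame(
  VrtinaID = c(1L, 1L, 2L),
  Opis = factor(c("(MIVKA) DROBEN MELJAST PESEK, GOST, SIVORJAV",
                  "",
                  "ena dva tri stiri pet sest sedem osem devet deset enajst"))
)

n_words <- 10

# Convert the factor to character, trim it, and split on runs of whitespace.
tokens <- strsplit(trimws(as.character(df$Opis)), "[[:space:]]+")

# Keep the first n_words tokens per row; rows with fewer words get NA padding.
first_words <- t(vapply(tokens,
                        function(x) x[seq_len(n_words)],
                        character(n_words)))
colnames(first_words) <- paste0("Opis", seq_len(n_words))

# Attach the ten new columns to the original data frame.
df_split <- cbind(df, as.data.frame(first_words, stringsAsFactors = FALSE))
```

Splitting on "[[:space:]]+" rather than a single " " avoids empty "words" where the text contains double spaces; punctuation stays attached to the words, just as it would with Text to Columns.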

Regards,
Assist. Prof. Gaj Vidmar, PhD
University Rehabilitation Institute, Republic of Slovenia

Irrelevant P.S. Long ago, before embarking on the path that eventually led 
mainly to statistics, I studied geology for two years, so (and also because 
I know what the poster's institute does) I can even roughly imagine what 
these data are.

"Matevž Pavlič" <matevz.pavlic at gi-zrmk.si> wrote in message 
news:AD5CA6183570B54F92AA45CE2619F9B9D96994 at gi-zrmk.si...
> Hi,
>
> I am sorry, will try to be more exact from now on...
>
> I have a data.frame with a field called Opis. It contains sentences that 
> I would like to split into words, or fields in the data.frame... When I say 
> columns, I mean as in an Excel table. I would like to split "Opis" into ten 
> fields from the first ten words in the Opis field.
> Here is an example of my data.frame.
>
> 'data.frame':   22928 obs. of  12 variables:
> $ VrtinaID        : int  1 1 1 1 2 2 2 2 2 2 ...
> $ ZapStev         : int  1 2 3 4 1 2 3 4 5 6 ...
> $ GlobinaOd       : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> $ GlobinaDo       : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> $ Opis            : Factor w/ 12754 levels "","(MIVKA) DROBEN MELJAST 
> PESEK, GOST, SIVORJAV",..: 2060 11588 2477 11660 7539 3182 7884 9123 2500 
> 4756 ...
> $ ACklasifikacija : Factor w/ 290 levels "","(CL)","(CL)/(SC)",..: 154 125 
> 101 101 NA 106 125 80 106 101 ...
> $ GeolNastOd      : num  0 0.8 9.2 10.1 0 0.9 2.6 4.9 6.8 7.3 ...
> $ GeolNastDo      : num  0.8 9.2 10.1 11 0.9 2.6 4.9 6.8 7.3 8.2 ...
> $ GeolNastOpis    : Factor w/ 113 levels "","B. M. S.",..: 56 53 53 53 56 
> 53 53 53 53 53 ...
> $ NacinVrtanjaOd  : num  0e+00 1e+09 1e+09 1e+09 0e+00 ...
> $ NacinVrtanjaDo  : num  1.1e+01 1.0e+09 1.0e+09 1.0e+09 1.0e+01 ...
> $ NacinVrtanjaOpis: Factor w/ 43 levels "","H. N.","IZKOP",..: 26 1 1 1 26 
> 1 1 1 1 1 ...
>
> Hope that explains better...
> Thank you, m
>
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Monday, November 01, 2010 10:13 PM
> To: Matevž Pavlič
> Cc: r-help at r-project.org
> Subject: Re: [R] spliting first 10 words in a string
>
>
> On Nov 1, 2010, at 4:39 PM, Matevž Pavlič wrote:
>
>> Hi all,
>>
>>
>>
>> I have a columnn with text that has quite a few words in it. I would
>> like to split these words in separate columns, but just first ten
>> words in the string. Is that possible in R?
>>
>>
>
> Not sure what a column means to you. It's not a precisely defined R
> type or class. (And you are requested to offer a concrete example
> rather than making us guess.)
>
> >words <-"I have a columnn with text that has quite a few words in
> it. I would like to split these words in separate columns, but just
> first ten words in the string. Is that possible in R?"
>
> > strsplit(words, " ")[[1]][1:10]
>  [1] "I"       "have"    "a"       "columnn" "with"    "text"
> "that"    "has"     "quite"   "a"
>
>
> Or if in a dataframe:
>
> > words <-c("I have a columnn with text that has quite a few words in
> it.",   "I would like to split these words in separate columns", "but
> just first ten words in the string. Is that possible in R?")
> > worddf <- data.frame(words=words)
>
> > # as.character() guards against the column being stored as a factor
> > t(sapply(strsplit(as.character(worddf$words), " "), "[", 1:10) )
>      [,1]  [,2]    [,3]    [,4]      [,5]    [,6]    [,7]    [,8]
> [1,] "I"   "have"  "a"     "columnn" "with"  "text"  "that"  "has"
> [2,] "I"   "would" "like"  "to"      "split" "these" "words" "in"
> [3,] "but" "just"  "first" "ten"     "words" "in"    "the"   "string."
>      [,9]       [,10]
> [1,] "quite"    "a"
> [2,] "separate" "columns"
> [3,] "Is"       "that"
>
>
> -- 
> David Winsemius, MD
> West Hartford, CT
>


