[R] Reading word by word in a dataset

Thu Nov 4 14:00:30 CET 2004

Thanks, Tony.
Your reply gave me a very good idea: using "flush" in
scan(). With it I managed to get my little job done.
My next question: what if I want to extract only the
prices, i.e. the 2nd column in my example?
I did it the following way. Is this the right way to
do it, or is there a smarter or more efficient way?

> system("more mtx.ex.1")
i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan
>
> scan(file="mtx.ex.1", what=list(NULL,""), flush=TRUE)[[2]]
Read 3 records
[1] "10$" "5$"  "7$"
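
For reference, here is a self-contained sketch of the same scan() call. It assumes a newer R in which scan() accepts a `text` argument (not necessarily available in the R 2.0.0 used in this thread). It inlines the three sample lines instead of reading mtx.ex.1, and it adds one step that is not in the original: stripping the trailing "$" so the prices can be used as numbers.

```r
# Sample lines inlined for illustration (assumes scan() supports text=)
lines <- c("i1-apple 10$ New_York",
           "i2-banana 5$ London",
           "i3-strawberry 7$ Japan")

# what = list(NULL, ""): skip field 1, read field 2 as character;
# flush = TRUE ignores everything after field 2 on each line
price_txt <- scan(text = lines, what = list(NULL, ""), flush = TRUE,
                  quiet = TRUE)[[2]]

# Extra step (not in the original): drop the trailing "$", convert to numeric
prices <- as.numeric(sub("\\$$", "", price_txt))
print(price_txt)
print(prices)
```

In `what = list(NULL, "")` the NULL element skips the first field and "" reads the second as character, so the approach above looks like a reasonable way to do it. `quiet = TRUE` only suppresses the "Read 3 records" message.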

Cheers,

John

 --- Tony Plate <tplate at acm.org> wrote: 
> To make this work when not all rows have the same
> number of fields, the "flush" argument to scan()
> looks like a good fit (it skips everything after the
> first field on the line):
> 
> With the following copied to the clipboard:
> 
> i1-apple        10$   New_York
> i2-banana
> i3-strawberry   7$    Japan
> 
> do:
> 
>  > scan("clipboard", "", flush=T)
> Read 3 items
> [1] "i1-apple"      "i2-banana"     "i3-strawberry"
>  > sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
> Read 3 items
> [1] "apple"      "banana"     "strawberry"
>  >
> 
> -- Tony Plate
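
A self-contained version of Tony's example, as a sketch: it assumes scan()'s `text` argument (available in newer R than the one in this thread) in place of the Windows-only "clipboard" connection.

```r
# Tony's ragged example, inlined for illustration (assumes scan() supports text=)
lines <- c("i1-apple        10$   New_York",
           "i2-banana",
           "i3-strawberry   7$    Japan")

# what = "" reads character fields; flush = TRUE keeps only the first
# field of each line, so the short second line causes no trouble
first <- scan(text = lines, what = "", flush = TRUE, quiet = TRUE)

# Remove the leading "i1-" style prefix (letters/digits, then a dash)
fruit <- sub("^[A-Za-z0-9]*-", "", first)
print(fruit)
```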
> 
> At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
> >      Uwe and Andy's solutions are great for many
> > applications but won't work if not all rows have
> > the same number of fields.  Consider, for example,
> > the following modification of Lee's example:
> >i1-apple        10$   New_York
> >i2-banana
> >i3-strawberry   7$    Japan
> >
> >      If I copy this to "clipboard" and run Andy's
> > code, I get the following:
> > > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
> >Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
> >    line 2 did not have 3 elements
> >
> >      We can get around this by using "scan", then
> > splitting things apart much as Uwe described:
> > > dat <-
> >+ scan("clipboard", character(0), sep="\n")
> >Read 3 items
> > > dash <- regexpr("-", dat)
> > > dat2 <- substring(dat, pmax(0, dash)+1)
> > >
> > > blank <- regexpr(" ", dat2)
> > > if(any(blank<0))
> >+   blank[blank<0] <- nchar(dat2[blank<0])
> > > substring(dat2, 1, blank)
> >[1] "apple "      "banana"      "strawberry "
> >
> >      hope this helps.  spencer graves
> >
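
Note that the output above keeps a trailing blank ("apple ", "strawberry ") because substring(dat2, 1, blank) ends at the blank itself. Here is a sketch of the same approach that stops one character earlier, assuming the data are inlined rather than read from the clipboard:

```r
# Spencer's ragged example, inlined for illustration
dat <- c("i1-apple        10$   New_York",
         "i2-banana",
         "i3-strawberry   7$    Japan")

# Drop everything up to and including the first "-"
# (pmax(0, ...) leaves lines without a dash unchanged)
dash <- regexpr("-", dat)
dat2 <- substring(dat, pmax(0, dash) + 1)

# Position of the first blank; regexpr() returns -1 when there is none
blank <- regexpr(" ", dat2)

# Stop one character before the blank, or at the end of the string
# when the line contains no blank at all
end <- ifelse(blank > 0, blank - 1, nchar(dat2))
fruit <- substring(dat2, 1, end)
print(fruit)
```

A single regular expression, sub("^[^-]*-([^ ]*).*$", "\\1", dat), should give the same result on these lines.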
> >Uwe Ligges wrote:
> >
> >>Liaw, Andy wrote:
> >>
> >>>Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
> >>>
> >>>
> >>>>read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
> >>>
> >>>
> >>>              V1
> >>>1      i1-apple
> >>>2     i2-banana
> >>>3 i3-strawberry
> >>
> >>
> >>
> >>... and if only the words after "-" are of
> >>interest, the statement can be followed by
> >>
> >>  sapply(strsplit(...., "-"), "[", 2)
> >>
> >>
> >>Uwe Ligges
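
Combining Andy's read.table() call with Uwe's strsplit() step, here is a sketch that also copes with Spencer's ragged example by adding `fill = TRUE`, which pads short lines with empty fields. It assumes read.table()'s `text` argument (available in newer R than the 2.0.0 mentioned here) instead of the clipboard.

```r
# Spencer's ragged example, inlined for illustration (assumes text= support)
lines <- c("i1-apple        10$   New_York",
           "i2-banana",
           "i3-strawberry   7$    Japan")

# Andy's colClasses trick keeps only column 1; fill = TRUE pads short
# lines instead of stopping with "line 2 did not have 3 elements"
dat <- read.table(text = lines, fill = TRUE,
                  colClasses = c("character", "NULL", "NULL"))

# Uwe's follow-up: split at "-" and keep the second piece
fruit <- sapply(strsplit(dat$V1, "-"), "[", 2)
print(fruit)
```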
> >>
> >>
> >>
> >>>HTH,
> >>>Andy
> >>>
> >>>
> >>>>From: j lee
> >>>>
> >>>>Hello All,
> >>>>
> >>>>I'd like to read the first word of each line into
> >>>>a new file.
> >>>>Given the following data file, how can I get the
> >>>>first words: apple, banana, strawberry?
> >>>>
> >>>>i1-apple        10$   New_York
> >>>>i2-banana       5$    London
> >>>>i3-strawberry   7$    Japan
> >>>>
> >>>>Has a similar question already been posted to the
> >>>>list? I am fairly new to R, with a few months of
> >>>>experience now.
> >>>>
> >>>>Cheers,
> >>>>
> >>>>John
> >>>>
> >>>>______________________________________________
> >>>>R-help at stat.math.ethz.ch mailing list
> >>>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>PLEASE do read the posting guide! 
> >>>>http://www.R-project.org/posting-guide.html
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
> >--
> >Spencer Graves, PhD, Senior Development Engineer
> >O:  (408)938-4420;  mobile:  (408)655-4567
> >
> 
>