[R] Extract from a text file

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Wed Jun 1 04:05:39 CEST 2016


You need to go back and study how I made my solution reproducible and make your problem reproducible. 
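
A minimal sketch of that kind of reproducibility, using a made-up three-line vector in place of the real file contents (the vector and its header line are assumptions for illustration, not the actual data):

```r
# Hypothetical stand-in for lines read from the real file; in practice
# this would come from something like readLines("c:/data/test.txt").
indta <- c(
  "header line that is not of interest",
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610"
)

# dput() prints R source code that recreates the object, so the
# output can be pasted into a plain-text email.
dput(indta)

# Round trip: parsing the deparsed text gives back the same vector.
recreated <- eval(parse(text = deparse(indta)))
identical(indta, recreated)
```

Posting the dput() output of a few representative lines lets readers run the code against the same input that is causing trouble.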

You probably also ought to spend some time comparing the regex pattern to your actual data... the point of this list is to learn how to construct these solutions yourself.
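
One way to make that comparison, sketched under stated assumptions: the file contents below are invented, and `n_skip` and `check` are illustrative names. Two points about the quoted code are worth checking first. readLines() has no `skip` argument, so unwanted leading lines have to be dropped after reading. Backslashes are escape characters in R string literals, so a Windows path is usually written with forward slashes or doubled backslashes.

```r
# Hypothetical file standing in for the real one: two header lines to
# skip, one record the pattern matches, and one record worded slightly
# differently ("Group" capitalised).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "header 1",
  "header 2",
  "Mean of weight  group 1, SE of mean  :  72.289037489555276",
  " 11.512956539215610",
  "Average weight of Group 2, SE of Mean :  83.940053900595013",
  " 10.198495690144522"
), tmp)

n_skip <- 2  # number of leading lines to drop (46 in the quoted code)
all_lines <- readLines(tmp)
indta <- all_lines[-seq_len(n_skip)]

pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"

# Show each line next to whether the pattern matches it
check <- data.frame(matched = grepl(pattern, indta), line = indta)
print(check)
```

Any "first line" that shows `matched = FALSE` points to a difference between the pattern and the real data, such as capitalisation or wording.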
-- 
Sent from my phone. Please excuse my brevity.

On May 31, 2016 6:26:31 PM PDT, Val <valkremk at gmail.com> wrote:
>Thank you so much Jeff. It worked for this example.
>
>When I read it from a file (c:\data\test.txt) it did not work
>
>KLEM="c:\data"
>KR=paste(KLEM,"\test.txt",sep="")
>indta <- readLines(KR, skip=46)  # not interested in the first 46 lines
>
>pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
>firstlines <- grep( pattern, indta )
># Replace the matched portion (entire string) with the first capture string
>v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
># Replace the matched portion (entire string) with the second capture string
>v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
># Convert the lines just after the first lines to numeric
>v3 <- as.numeric( indta[ firstlines + 1 ] )
># put it all into a data frame
>result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>
>result
>[1] Group Mean  SE
><0 rows> (or 0-length row.names)
>
>Thank you in advance
>
>
>On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller
><jdnewmil at dcn.davis.ca.us> wrote:
>> Please learn to post in plain text (the setting is in your email
>> client... somewhere), as HTML is "What We See Is Not What You Saw" on
>> this mailing list.  In conjunction with that, try reading some of the
>> fine material mentioned in the Posting Guide about making reproducible
>> examples like this one:
>>
>> # You could read in a file
>> # indta <- readLines( "out.txt" )
>> # but there is no "current directory" in an email
>> # so here I have used the dput() function to make source code
>> # that creates a self-contained R object
>>
>> indta <- c(
>> "Mean of weight  group 1, SE of mean  :  72.289037489555276",
>> " 11.512956539215610",
>> "Average weight of group 2, SE of Mean :  83.940053900595013",
>> "  10.198495690144522",
>> "group 3 mean , SE of Mean     :                78.310441258245469",
>> " 13.015876679555",
>> "Mean of weight of group 4, SE of Mean               : 76.967516495101669",
>> " 12.1254882985", "")
>>
>> # Regular expression patterns are discussed all over the internet
>> # in many places OTHER than R
>> # You can start with ?regex, but there are many fine tutorials also
>>
>> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
>> # For this task the regex has to match the whole "first line" of each set
>> #  ^ =match starting at the beginning of the string
>> #  .* =any character, zero or more times
>> #  "group " =match these characters
>> #  ( =first capture string starts here
>> #  \\d = any digit (first backslash for R, second backslash for
>> #     regex)
>> #  + =one or more of the preceding (any digit)
>> #  ) =end of first capture string
>> #  [^:] =any non-colon character
>> #  * =zero or more of the preceding (non-colon character)
>> #  : =match a colon exactly
>> #  " *" =match zero or more spaces
>> #  ( =second capture string starts here
>> #  [ =start of a set of equally acceptable characters
>> #  -+ =either of these characters is acceptable
>> #  0-9 =any digit would be acceptable
>> #  . =a period is acceptable (this is inside the [])
>> #  eE =in case you get exponential notation input
>> #  ] =end of the set of acceptable characters (number)
>> #  * =number of acceptable characters can be zero or more
>> #  ) =second capture string stops here
>> #  .* =zero or more of any character (just in case)
>> #  $ =at end of pattern, requires that the match reach the end
>> #     of the string
>>
>> # identify indexes of strings that match the pattern
>> firstlines <- grep( pattern, indta )
>> # Replace the matched portion (entire string) with the first capture string
>> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
>> # Replace the matched portion (entire string) with the second capture string
>> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
>> # Convert the lines just after the first lines to numeric
>> v3 <- as.numeric( indta[ firstlines + 1 ] )
>> # put it all into a data frame
>> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>>
>> Figuring out how to deliver your result (output) is a separate question
>> that depends on where you want it to go.
>>
>>
>> On Mon, 30 May 2016, Val wrote:
>>
>>> Hi all,
>>>
>>> I have a messy text file, and from this text file I want to extract
>>> some information. Here is the text file (out.txt). One record has two
>>> lines. The mean comes in the first line and the SE of the mean is on
>>> the second line. Here is a sample of the data.
>>>
>>> Mean of weight  group 1, SE of mean  :  72.289037489555276
>>> 11.512956539215610
>>> Average weight of group 2, SE of Mean :  83.940053900595013
>>>  10.198495690144522
>>> group 3 mean , SE of Mean     :                78.310441258245469
>>> 13.015876679555
>>> Mean of weight of group 4, SE of Mean               : 76.967516495101669
>>> 12.1254882985
>>>
>>> I want to produce the following table. How do I read it first and then
>>> produce a
>>>
>>>
>>> Gr1  72.289037489555276   11.512956539215610
>>> Gr2  83.940053900595013   10.198495690144522
>>> Gr3  78.310441258245469   13.015876679555
>>> Gr4  76.967516495101669   12.1254882985
>>>
>>>
>>> Thank you in advance
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>> ---------------------------------------------------------------------------
>> Jeff Newmiller                        The     .....       .....  Go Live...
>> DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>>                                       Live:   OO#.. Dead: OO#..  Playing
>> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> ---------------------------------------------------------------------------
