[R] Ignoring initial rows in a text file import

David Winsemius dwinsemius at comcast.net
Tue Jun 1 03:53:28 CEST 2010


On May 31, 2010, at 9:14 PM, jim holtman wrote:

> This is the case where '\t' is the tab character; '\\t' would give you
> the backslash followed by 't' which is not what you want.

Both versions "worked", in the sense that each returned the number of the
line containing "\tBegin Main\t":

 > txt <- "\n\tBegin Main\t"
 > in.str <- readLines(textConnection(txt))
 > grep("\tBegin", in.str)
[1] 2
 > grep("\\tBegin", in.str)
[1] 2

It's possible that Newmiller's explanation contained the answer but it  
was too abstract for me to really grasp.
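
A minimal sketch of one likely reason both patterns match, assuming grep()
is using its default regular-expression engine rather than fixed = TRUE. In
"\tBegin" the R parser turns \t into a real tab before grep() ever sees the
pattern. In "\\tBegin" the pattern holds a backslash followed by "t", and the
regex engine then reads that pair as an escape for a tab. The names pat_tab
and pat_esc are illustrative only.

```r
in.str <- c("", "\tBegin Main\t")

pat_tab <- "\tBegin"    # the R parser stores a literal tab, then "Begin"
pat_esc <- "\\tBegin"   # stores a backslash, "t", then "Begin"

nchar(pat_tab)          # 6 characters
nchar(pat_esc)          # 7 characters

# As regular expressions, both patterns match the line with the tab.
hit_tab <- grep(pat_tab, in.str)
hit_esc <- grep(pat_esc, in.str)

# With fixed = TRUE there is no regex layer, so the doubled form looks
# for a literal backslash followed by "t" and finds nothing.
hit_tab_fixed <- grep(pat_tab, in.str, fixed = TRUE)
hit_esc_fixed <- grep(pat_esc, in.str, fixed = TRUE)
```

So doubling does matter for escapes that only the regex engine understands
(for example "\\." or "\\d"); for a tab, either spelling ends up matching
the same character.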

-- 
David.

>
> On Mon, May 31, 2010 at 8:19 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On May 31, 2010, at 8:14 PM, jim holtman wrote:
>>
>>> try this:
>>>
>>> input  <- readLines("yourfile.txt")
>>> # determine start
>>> start <- grep("\tBegin Main\t", input)[1]  # first line if many
>>
>> Puzzled. I thought backslashes in grepping patterns needed to be
>> doubled? I guess not.
>>
>> --
>> David.
>>
>>> if (!is.na(start) && start > 1){  # grep(...)[1] is NA when nothing matches
>>>   input <- tail(input, -(start - 1))  # delete heading lines
>>> }
>>> # find lines you want to delete
>>> breaks <- grep("\tBreak\t", input)
>>> if (length(breaks) > 0){
>>>   input <- input[-breaks]
>>> }
>>> # now read in your data
>>> real_input <- read.table(textConnection(input), header=TRUE)
>>> closeAllConnections()
>>>
>>>
>>>
>>> On Mon, May 31, 2010 at 7:51 PM, Kevin Burnham <kburnham at gmail.com> wrote:
>>>>
>>>> I am trying to import a series of text files generated by stimulus
>>>> presentation software.  The problem that I am having is that the number
>>>> of rows I need to skip is not fixed (depending on subject's pretest
>>>> behavior), nor is the first row of the data I want always the same (the
>>>> stimuli were presented in random order).  So I need to bring in the .txt
>>>> file (using readLines?), look for the row containing the text "Begin Main"
>>>> (see exact row below) and start reading data to a table from that point.
>>>>
>>>>  [13] "Main Group\t1000\tBegin Main\tBegin Main\tBegin Main\t\t \tPressed\t(any response)\tC\t25860\t\t\t\t\t"
>>>>
>>>> I would also like it to ignore the row:
>>>> [173] "Main Group\t1000\tBreak\tBreak\tpause3\t\t \tPressed\t(any response)\tC\t47610\t\t\t\t\t"
>>>>
>>>> which will always be the same number of rows after the "Begin Main" row.
>>>>
>>>> Thanks,
>>>> Kevin Burnham
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>>
>>>
>>> --
>>> Jim Holtman
>>> Cincinnati, OH
>>> +1 513 646 9390
>>>
>>> What is the problem that you are trying to solve?
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>

David Winsemius, MD
West Hartford, CT


