[R] How to fetch specific part from a number of Text files?

Charles C. Berry cberry at tajo.ucsd.edu
Mon Dec 15 23:00:29 CET 2008


On Mon, 15 Dec 2008, megh wrote:

>
> Thanks Charles for this reply. I have started according to your suggestion
> and hopefully I can do it. In the mean time what I was thinking, instead of
> calling my text files by their names, is there any mechanism to call them by
> the order they are stored in that directory?


I am not sure what that order would be. If you mean 'how would I order 
files by (say) creation date?', see

 	?file.info

Eventually you need a string that has the file name in it or a connection 
object (see ?connection)  that accesses the file(s).
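
As a rough sketch of what such an ordering might look like: the two helper names below are made up for illustration, and the `Volume_` pattern and m-d-Y date format are assumptions based on the file names quoted further down, so adjust them to suit.

```r
# Hypothetical helper: list the Volume_*.txt files in `dir`, ordered by
# the m-d-Y date embedded in each file name (e.g. "Volume_4-18-2008.txt").
files_by_name_date <- function(dir) {
  files <- list.files(dir, pattern = "^Volume_.*\\.txt$", full.names = TRUE)
  stamp <- sub("^Volume_(.*)\\.txt$", "\\1", basename(files))
  files[order(as.Date(stamp, format = "%m-%d-%Y"))]
}

# Alternative: order by modification time as reported by file.info().
files_by_mtime <- function(dir) {
  files <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  files[order(file.info(files)$mtime)]
}
```

Either way the result is an ordinary character vector of file names, so something like `files <- files_by_name_date("c:/")` (the path is an assumption) followed by `readLines(files[1])` would read the first file in that order.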


> Means, suppose, I have total
> 1000 text files in that directory and therefore I create a vector like
> sel.no <- c(1:1000). Next I use the i-th element of the vector "sel.no" to
> access the i-th file?

Hmmm. Something about this question is telling me you are either a novice 
programmer or really unfamiliar with R or perhaps you just need that extra 
cup of coffee.

In any but the latter case, let me suggest that it helps to reread the 
Intro to R (and any other books/manuals you might have), read help pages 
for possibly relevant functions, and to run example( file.info ), say, to 
get a handle on functions you are trying to learn. Also, rereading the 
_posting guide_ is helpful as it is, in part, a guide to figuring out 
things in R.
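
For anyone following along, here is a minimal sketch of the outline quoted below (readLines, grep, textConnection, read.table). The function name `read_section` is hypothetical, and the sketch assumes that headers look exactly like "section : 2" and that the rows in between are whitespace-separated numbers; real files may need a different pattern.

```r
# Hypothetical helper: return the rows of one section of a file as a matrix.
# Assumes headers of the form "section : N" and whitespace-separated numeric
# rows between consecutive headers.
read_section <- function(file, section = 2) {
  txt <- readLines(file)
  hdr <- grep("^section[ ]*:", txt)                 # all section header lines
  this <- grep(paste0("^section[ ]*:[ ]*", section, "[ ]*$"), txt)
  if (length(this) != 1)
    stop("section not found (or not unique) in ", file)
  nxt <- hdr[hdr > this]                            # headers after the wanted one
  last <- if (length(nxt)) nxt[1] - 1 else length(txt)
  dash.lines <- txt[seq_len(last - this) + this]    # lines after the header, up to `last`
  dash.lines <- dash.lines[nzchar(trimws(dash.lines))]  # drop blank lines
  con <- textConnection(dash.lines)
  on.exit(close(con))                               # close even if read.table fails
  as.matrix(read.table(con))
}
```

Once this works on a single file, stepping through all of them could be as simple as `sec2 <- lapply(files, read_section, section = 2)`, where `files` is the character vector returned by list.files() or Sys.glob().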


HTH,

Chuck



>
> With regards,
>
>
>
> Charles C. Berry wrote:
>>
>> On Mon, 15 Dec 2008, megh wrote:
>>
>>>
>>> Hi all,
>>>
>>> In my c: drive I have possibly 1,000 notepad files, with .txt extension.
>>> They
>>> are named as the dates on which they were saved i.e. 1st file name is
>>> "Volume_4-18-2008", 2nd one is "Volume_4-21-2008", 3rd one
>>> "Volume_4-22-2008" and so on............
>>>
>>> Also, content of each file are in same format like :
>>>
>>> ******** content of 1st file *************
>>> section : 1
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> section : 2
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> section : 3
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> section : 4
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>> -----       ---------      ----------    -----------
>>>
>>> Here all files have 4-sections, just like shown here but contents within
>>> each section (i.e. dashed line here) differs file to file.
>>>
>>> What I have to do is fetch the contents of "section : 2" from each
>>> file and then save it to an R object, a matrix or list, for further analysis.
>>>
>>> Can you ppl please tell me how to do that?
>>
>> Here is the outline:
>>
>>  	*) use list.files() or Sys.glob() to get a list of the files
>>
>>  	*) write a function that takes the file name as its arg, uses
>>             readLines() to swallow the text and uses grep() to find the
>>             'section' lines. Then put the 'dashes' in between two section
>>             lines into a separate object (say, dash.lines). Then use
>>
>>  		as.matrix( read.table( con <- textConnection( dash.lines ) ) )
>>  		close(con)
>>
>>  	  to get the numeric values or maybe
>>
>>  		sapply( strsplit(dash.lines, "[ ]+"), as.numeric)
>>
>>  	*) debug this on one file
>>
>>
>>  	*) use lapply  to step thru the list of file names.
>>
>> See
>>
>>  	?list.files
>>  	?Sys.glob
>>  	?readLines
>>  	?grep
>>  	?textConnection
>>  	?strsplit
>>  	?sapply
>>
>> HTH,
>>
>> Chuck
>>
>>
>>>
>>> Thanks and regards,
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-fetch-specific-part-from-a-number-of-Text-files--tp21011017p21011017.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> Charles C. Berry                            (858) 534-2098
>>                                              Dept of Family/Preventive Medicine
>> E mailto:cberry at tajo.ucsd.edu	            UC San Diego
>> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>>
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-fetch-specific-part-from-a-number-of-Text-files--tp21011017p21020032.html
> Sent from the R help mailing list archive at Nabble.com.
>
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


