[R] Value Lookup from File without Slurping

r at quantide.com r at quantide.com
Fri Jan 16 11:52:47 CET 2009


I agree on the database solution.
Databases are the right tool for this kind of problem.
Just consider the start-up cost of setting up the database: it can be
a very time-consuming task for someone who is not familiar with database
technology.

Calling file() does not actually read the file. The function simply
opens a connection to the file without reading any of its contents.
countLines() should do something like "wc -l" from a bash shell, i.e. it
scans the file once to count the lines without keeping them in memory.
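
That said, the line count is not strictly needed: a connection can be
read until readLines() returns nothing, which avoids the first pass
altogether. A minimal sketch, assuming the same space-separated
"key value" layout as in the quoted code below; the function name
lookup_values and the chunk_size argument are made up for illustration:

```r
# Single pass over the file: keep reading chunks from the open connection
# until readLines() returns an empty vector, so no line count is required.
lookup_values <- function(path, keys, chunk_size = 10000L) {
  con <- file(path, open = "r")   # opens the connection, reads nothing yet
  on.exit(close(con))
  out <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break            # end of file reached
    fields <- strsplit(lines, " ", fixed = TRUE)
    key <- vapply(fields, `[`, character(1), 1L)
    hit <- key %in% keys
    if (any(hit)) {
      # collect matches per chunk instead of growing a vector line by line
      out[[length(out) + 1L]] <-
        as.numeric(vapply(fields[hit], `[`, character(1), 2L))
    }
  }
  if (length(out) == 0L) numeric(0) else unlist(out)
}
```

With the names from the quoted code this would be called as
lookup_values("test.txt", c("AAC", "ATT")). Only chunk_size lines are
held in memory at any time.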

I would say that if this is a one-time job, this solution should work
even though it is not the fastest. If the job is a repetitive one,
then a database solution is surely better.
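
For the repetitive case, a rough sketch of what the database route could
look like. It assumes the DBI and RSQLite packages are installed and the
same "key value" file layout; the table name lookup, its columns id and
value, and both helper functions are hypothetical choices, not something
prescribed in this thread:

```r
library(DBI)  # assumption: DBI and RSQLite are installed

# Hypothetical helper: load the file into an indexed SQLite table in chunks,
# so the whole file is never held in memory.
load_lookup_table <- function(con, path, chunk_size = 10000L) {
  dbExecute(con, "CREATE TABLE IF NOT EXISTS lookup (id TEXT, value REAL)")
  input <- file(path, open = "r")
  on.exit(close(input))
  repeat {
    lines <- readLines(input, n = chunk_size)
    if (length(lines) == 0L) break
    fields <- strsplit(lines, " ", fixed = TRUE)
    chunk <- data.frame(
      id    = vapply(fields, `[`, character(1), 1L),
      value = as.numeric(vapply(fields, `[`, character(1), 2L)),
      stringsAsFactors = FALSE
    )
    dbWriteTable(con, "lookup", chunk, append = TRUE)
  }
  # the index is what makes repeated lookups cheap
  dbExecute(con, "CREATE INDEX IF NOT EXISTS lookup_id ON lookup (id)")
  invisible(con)
}

# Hypothetical helper: fetch the values for a set of keys, in file order.
query_values <- function(con, keys) {
  marks <- paste(rep("?", length(keys)), collapse = ", ")
  sql <- paste0("SELECT value FROM lookup WHERE id IN (", marks,
                ") ORDER BY rowid")
  dbGetQuery(con, sql, params = as.list(keys))$value
}
```

The load is paid once, e.g. con <- dbConnect(RSQLite::SQLite(), "lookup.sqlite")
followed by load_lookup_table(con, "test.txt"); every later lookup is
then just query_values(con, c("AAC", "ATT")).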

A.


Wacek Kusnierczyk wrote:
> if the file is really large, reading it twice may add considerable penalty:
>
> r at quantide.com wrote:
>   
>> Something like this should work
>>
>> library(R.utils)
>> out = numeric()
>> qr = c("AAC", "ATT")
>> n =countLines("test.txt")
>>     
>
> # 1st pass
>
>   
>> file = file("test.txt", "r")
>> for (i in 1:n){
>>     
>
> # 2nd pass
>
>   
>> line = readLines(file, n = 1)
>> A = strsplit (line, split = " ")[[1]][1]
>> if(is.element(A, qr)) {
>> value = as.numeric(strsplit (line, split = " ")[[1]][2])
>> out = c(out, value)
>> }
>> }
>>     
>
> if this is a one-go task, counting the lines does not pay, and why
> bother.  if this is a repetitive task, a database-based solution will
> probably be a better idea.
>
> vQ
>
>



