[R] Reading large files

Gabor Grothendieck ggrothendieck at gmail.com
Sat Feb 6 19:13:52 CET 2010


No.
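
A workaround is to let read.csv.sql bring the columns in as SQLite
types them and then coerce them in R.  A minimal sketch (the file
name and the column names qty and code are hypothetical):

  library(sqldf)
  DF <- read.csv.sql("myfile.csv")
  DF$qty  <- as.numeric(DF$qty)    # force numeric
  DF$code <- as.character(DF$code) # force character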

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
<SATISH.VADLAMANI at fritolay.com> wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help at r-project.org
> Subject: Re: [R] Reading large files
>
> It's just any Windows batch command string that filters stdin to
> stdout.  What the command consists of should not be important.  An
> invocation of perl that runs a perl script that filters stdin to
> stdout might look like this:
>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
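>
> A one-line filter can even be given inline; the substitution below
> (semicolons to commas) is only an illustration:
>
>  read.csv.sql("myfile.dat", filter = 'perl -pe "s/;/,/g"')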
>
> For an actual example see the source of read.csv2.sql which defaults
> to using a Windows vbscript program as a filter.
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
> <SATISH.VADLAMANI at fritolay.com> wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions that I can use read.csv.sql with an embedded Perl (or gawk) filter. I just want to mention that I am running on Windows. I am going to read the documentation for the filter argument and see whether it can take a decent-sized Perl script and use its output as input.
>>
>> Suppose that I write a Perl script that parses this fixed-width file and creates a CSV file. Can I embed this within the read.csv.sql call, or can the filter only be a single statement? If you know the answer, please let me know. Otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
>> Satish
>>
>>
>> -----Original Message-----
>> From: jim holtman [mailto:jholtman at gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help at r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In Perl, the 'unpack' function makes it very easy to parse fixed-width data.
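>>
>> As a sketch, such a filter can be hooked straight into read.csv.sql
>> (the script name and the field widths 8, 10 and 5 are made up):
>>
>>  ## hypothetical contents of unpack.pl -- split each fixed-width
>>  ## line and emit comma-separated output on stdout:
>>  ##   while (<>) {
>>  ##       print join(",", unpack("A8 A10 A5", $_)), "\n";
>>  ##   }
>>  read.csv.sql("myfile.dat", header = FALSE,
>>               filter = "perl unpack.pl")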
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>> <ggrothendieck at gmail.com> wrote:
>>> Note that the filter= argument on read.csv.sql can be used to pass the
>>> input through a filter written in perl, [g]awk or another language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing
>>> fixed-width fields (see
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html), which makes
>>> this very easy, but perl or whatever you are most used to would
>>> be fine too.
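>>>
>>> A sketch of such a filter (the field widths 8, 10 and 5 are
>>> made up):
>>>
>>>  ## hypothetical contents of myfilter.awk:
>>>  ##   BEGIN { FIELDWIDTHS = "8 10 5"; OFS = "," }
>>>  ##   { $1 = $1; print }   # rebuild the record comma-separated
>>>  read.csv.sql("myfile.dat", header = FALSE,
>>>               filter = "gawk -f myfilter.awk")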
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>> <SATISH.VADLAMANI at fritolay.com> wrote:
>>>> Hi Gabor:
>>>> Thanks. My files are all in fixed-width format, and there are a lot of them. It would take me some effort to convert them to CSV; I guess this cannot be avoided? I can write some Perl scripts to convert the fixed-width files to CSV and then start with your suggestion. Could you let me know your thoughts on this approach?
>>>> Satish
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
>>>> Sent: Friday, February 05, 2010 5:16 PM
>>>> To: Vadlamani, Satish {FLNA}
>>>> Cc: r-help at r-project.org
>>>> Subject: Re: [R] Reading large files
>>>>
>>>> If your problem is just how long it takes to load the file into R, try
>>>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>>>> create an SQLite database and table layout for you, read the file into
>>>> the database (without going through R, so R can't slow this down),
>>>> extract all or a portion into R based on the sql argument you give it,
>>>> and then remove the database.  See the examples on the home page:
>>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
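>>>>
>>>> A minimal sketch (the file and column names are hypothetical; the
>>>> table is referred to as "file" in the sql argument):
>>>>
>>>>  library(sqldf)
>>>>  ## load myfile.csv into a temporary SQLite database, pull only
>>>>  ## the matching rows into R, then drop the database
>>>>  DF <- read.csv.sql("myfile.csv",
>>>>      sql = "select * from file where qty > 100")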
>>>>
>>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>> <SATISH.VADLAMANI at fritolay.com> wrote:
>>>>>
>>>>> Matthew:
>>>>> If it is going to help, here is the explanation. I have an end state in
>>>>> mind; it is given below under the "End State" header. In order to get
>>>>> there, I need to start somewhere, right? I started with an 850 MB file
>>>>> and could not load it in what I think is a reasonable time (I waited
>>>>> for an hour).
>>>>>
>>>>> There are references to 64-bit. How will that help? This is a machine
>>>>> with 4 GB of RAM, and there is no paging activity when loading the
>>>>> 850 MB file.
>>>>>
>>>>> I have seen other threads on the same types of questions. I did not
>>>>> see any clear-cut answers, or errors that I might be making in the
>>>>> process. If I am missing something, please let me know. Thanks.
>>>>> Satish
>>>>>
>>>>>
>>>>> End State
>>>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>>>
>>>>>
>>>>> -----
>>>>> Satish Vadlamani
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>


