[R] Importing big plain files from ERP-System/Data Mining with R

r-help.20.stefan817@spamgourmet.com r-help.20.stefan817 at spamgourmet.com
Tue Oct 26 14:11:31 CEST 2004


On Tue, 26 Oct 2004 r-help.20.stefan817 at spamgourmet.com wrote:

>> how can I import really big plain text data files (several GB) from an

>Unlikely unless you have a 64-bit platform.

Why? I have a 32-bit Windows XP platform running R 2.0.0. With ACL 8.21, for example, 10 GB files were no problem.

>Only starting with R 2.0.0 can some 32-bit versions of R access files >
>2Gb, and to import the file into R you need enough address space in R for
>the object, which is normally more than the file size.

Is this really so? I want to summarize the data or compute clusters, so only the aggregated information should need to be in memory. Does R first import the whole file and only then calculate with it? ACL's concept is to leave the file itself on the hard disk, scanning it for each calculation and keeping only the calculation in memory. (Certainly not very fast, but probably the only workable method for big files.)
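That ACL-style out-of-core scanning can be imitated in R: open a connection, read the file in fixed-size chunks, and keep only running aggregates in memory. A minimal sketch, assuming a "|"-delimited file like the SAP dumps below; the generated demo file and the choice of field are illustrative, not part of any real dump:

```r
# Sketch: scan a big delimited file chunk by chunk, holding only
# running aggregates in memory (never the whole file).
# The demo file written here is an assumption for illustration.
tmp <- tempfile()
writeLines(sprintf("|X|001  |%03d  |", 1:25000), tmp)  # fake 25000-record dump

con <- file(tmp, open = "r")
total <- 0; n <- 0
repeat {
  lines <- readLines(con, n = 10000)              # one chunk at a time
  if (length(lines) == 0) break                   # end of file reached
  fields <- strsplit(lines, "|", fixed = TRUE)    # split on the "|" separator
  vals <- as.numeric(vapply(fields, `[`, "", 4))  # 4th element = 3rd column
  total <- total + sum(vals)                      # accumulate aggregates only
  n <- n + length(vals)
}
close(con)
c(records = n, mean = total / n)                  # the data itself is discarded
```

Each pass over the file answers one aggregate question, which mirrors the ACL approach: slow, but the memory footprint stays constant regardless of file size.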

>Almost certainly not if the unmentioned platform is Windows, but you could 
>access the data from a DBMS.

I can do that as well, but with several limitations.

>> ERP-System (SAP-Tables) to R?
>> The header of these files is always similar, for example:
>> 
>> Tabelle:        T009
>> Angezeigte Felder:  7 von  7  Feststehende Führungsspalten: 2  Listbreite
>> 0250
>> ----------------------------------------------------------------------
>> |X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |
>> ----------------------------------------------------------------------
>> |X|001  |01   |X    |     |012  |02   |ABC                           |
>> |X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |
>> |X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |
>> ----------------------------------------------------------------------
>> 
>> (the first 5 rows shown are included in each downloaded table; row #4
>> holds the field names; length of one row > 1023 bytes, count of fields
>> > 256, size = several GB, count of records = several million)
>> 
>> What is an appropriate way to read such tables in?
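One sketch for the exact format shown above: skip the five header lines, keep only the "|"-delimited data rows, and take the column names from row 4. The sample vector reproduces the T009 excerpt; a real file would come from `readLines("T009.txt")` instead, and for multi-GB files this parsing step would be combined with the chunked reading discussed earlier:

```r
# Sketch: parse one SAP table dump of the format quoted above.
# The inline sample stands in for the real file, which would be read
# with readLines(); the 5-header-row layout is taken from the post.
dump <- c(
  "Tabelle:        T009",
  "Angezeigte Felder:  7 von  7  Feststehende F\u00fchrungsspalten: 2  Listbreite 0250",
  "----------------------------------------------------------------------",
  "|X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |",
  "----------------------------------------------------------------------",
  "|X|001  |01   |X    |     |012  |02   |ABC                           |",
  "|X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |",
  "|X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |",
  "----------------------------------------------------------------------")

raw  <- dump                                   # or: readLines("T009.txt")
hdr  <- strsplit(raw[4], "|", fixed = TRUE)[[1]][-1]   # field names, row 4
rows <- grep("^\\|", raw[-(1:5)], value = TRUE)        # data rows only,
                                                       # dashed rules drop out
flds <- strsplit(rows, "|", fixed = TRUE)
tab  <- as.data.frame(do.call(rbind, lapply(flds, `[`, seq_along(hdr) + 1)),
                      stringsAsFactors = FALSE)
names(tab) <- sub(" +$", "", hdr)              # trim padded column names
tab
```

Since the dashed rules and the `|X|` data rows are distinguishable by their first character, no byte counting is needed; fields keep their fixed-width padding and can be trimmed or converted (e.g. `as.numeric`) per column afterwards.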

Greetings
Stefan
