[R] Reading very large text files into R

Dr Eberhard W Lisse
Thu Sep 29 20:23:14 CEST 2022


To me this file looks like a CSV with 15 fields on each line, not 16;
the last field is empty except on the one line which has the 'B'.
The 14th is always empty.

I also note that it does not seem to have a new line at the end.
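
Both observations can be checked without any extra tools. The following
is only a sketch using plain awk and tail: the function names are made up
for illustration, and the field count simply splits on every comma, so it
assumes that no field contains a quoted comma.

```shell
#!/bin/sh
# Hypothetical helpers, not part of qsv or csview.

# Print "<number of fields> <number of lines>" for each field count seen.
# Naive: splits on every comma, so quoted commas would be miscounted.
field_counts() {
    awk -F',' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Succeed only if the file is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, leaving an empty string.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

Calling field_counts on the renamed sample.csv should print a single line
if every row really has the same number of fields, and ends_with_newline
returns a non-zero status if the final newline is missing.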


I can strongly recommend QSV for manipulating CSV files and CSVIEW for
looking at them.

After renaming the file for convenience you can do something like

	qsv input --trim-fields --trim-headers sample.csv \
		| qsv select -n "1,2,6,7,8,9,10" \
		| qsv rename "date,c2,type,c4,c5,c6,c7" \
		| csview -i5 -np0

and get something like

		┌──┬────────────────┬──────┬───────┬────┬────┬──┬──┐
		│# │      date      │  c2  │ type  │ c4 │ c5 │c6│c7│
		├──┼────────────────┼──────┼───────┼────┼────┼──┼──┤
		│1 │1980-01-01 10:00│226918│WAHRAIN│5124│1001│0 │  │
		│2 │1980-01-01 10:00│228562│WAHRAIN│491 │1001│0 │  │
		│3 │1980-01-01 10:00│231581│WAHRAIN│5213│1001│0 │  │
		│4 │1980-01-01 10:00│232671│WAHRAIN│487 │1001│0 │  │
		│5 │1980-01-01 10:00│232913│WAHRAIN│5243│1001│0 │  │
		│6 │1980-01-01 10:00│234362│WAHRAIN│5265│1001│0 │  │
		│7 │1980-01-01 10:00│234682│WAHRAIN│5271│1001│0 │  │
		│8 │1980-01-01 10:00│235389│WAHRAIN│5279│1001│0 │  │
		│9 │1980-01-01 10:00│236466│WAHRAIN│497 │1001│0 │  │
		│10│1980-01-01 10:00│243350│SREW   │484 │1001│0 │  │
		│11│1980-01-01 10:00│243350│WAHRAIN│484 │1001│0 │0 │
		└──┴────────────────┴──────┴───────┴────┴────┴──┴──┘

As the files do not have headers, you could, if you have multiple files,
even do something like

	qsv cat rows s*.csv \
		| qsv input --trim-fields --trim-headers \
		| qsv select -n "1,2,6,7,8,9,10" \
		| qsv rename "date,c2,type,c4,c5,c6,c7" \
		| qsv dedup 2>/dev/null -o readmeintoR.csv
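
If qsv is not at hand, roughly the same merge can be sketched with awk
and sort. This is an assumption-laden stand-in, not what qsv does
internally: merge_csv is a made-up name, the split is on every comma (no
quoted commas), and the rows come out sorted rather than in file order.

```shell
#!/bin/sh
# Hypothetical helper: merge headerless CSV files into one file.
# Usage: merge_csv OUTPUT INPUT...
merge_csv() {
    out=$1
    shift
    {
        # Header row with the same names as in the qsv rename step.
        printf '%s\n' 'date,c2,type,c4,c5,c6,c7'
        # Trim blanks around each field, keep columns 1,2,6-10,
        # then drop duplicate rows.
        awk -F',' -v OFS=',' '{
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
            print $1, $2, $6, $7, $8, $9, $10
        }' "$@" | LC_ALL=C sort -u
    } > "$out"
}
```

It would be called as merge_csv readmeintoR.csv s*.csv; the output name
must not itself match the input pattern.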


If it REALLY were a file with varying numbers of fields per line, you
could use CSVQ and do something like

	cat s*csv \
		| csvq --format CSV --no-header --allow-uneven-fields \
			"SELECT c1 as date, c2, c6 as type, c7 as c4,
				c8 as c5, c9 as c6, c10 as c7
			FROM stdin" \
		| qsv input --trim-fields --trim-headers \
		| qsv dedup 2>/dev/null -o readmeintoR.csv

And, finally, depending on how long reading the CSV takes, I would save
the result as an RDS file (saveRDS()), which readRDS() loads very fast.


greetings, el

On 2022-09-29 17:26 , Nick Wray wrote:
> Hi Bert   
> 
> Right Thing is, I didn't know that there even was an instruction like
> read.csv(text = "...  your text...  ") so at any rate I can paste the
> original text files in by hand if there's no shorter cut
> Thanks v much Nick
[...]


