[R] Reading fixed column format

Duncan Murdoch murdoch at stats.uwo.ca
Wed Sep 13 13:06:36 CEST 2006


Anupam Tyagi wrote:
> Barry Rowlingson <B.Rowlingson <at> lancaster.ac.uk> writes:
>
>   
>>> None of these seem to read non-contiguous variables from columns, or 
>>> maybe I am missing something. "read.fwf" is not meant for large
>>> files, according to a post in the archives. Thanks for the pointers. I
>>> have read the R data input and output. Anupam.
>>>       
>>   First up, how 'large' is your 'large ASCII file'? How many rows and 
>> columns?
>>     
>
> There are 356,112 records and 326 variables. It has a fixed record length of
> 1283 positions, therefore "cut -b" cannot be used.
>  
>   
>>   Secondly, what are 'non-contiguous' variables?
>>     
>
> When I do not want to read all columns. For example, I would like to read the
> following:
>
> StartingColumn  VariableName  FieldLength
> 1               STATE         2
> 24              INTVID        3
> 27              DISPCODE      3
> 30              PSU           10
>   

read.fwf() can handle the skipped columns: you give a negative width for 
each run of columns to skip (see the man page).  It reads the file in 
blocks (the buffersize argument sets how many lines at a time), so the 
large size of the original file shouldn't be a problem.
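
A minimal sketch of how the four fields listed above could be read, assuming
the start columns and lengths are exactly as given in the question.  The
temporary file and its two records are made up for illustration, and the
trailing columns up to position 1283 are simply left unread.

```r
# Layout taken from the table in the question: start column and field length.
spec <- data.frame(
  name  = c("STATE", "INTVID", "DISPCODE", "PSU"),
  start = c(1, 24, 27, 30),
  len   = c(2, 3, 3, 10),
  stringsAsFactors = FALSE
)

# Convert start/length pairs into read.fwf() widths.
# A gap between two fields becomes a negative width (columns to skip).
end    <- spec$start + spec$len - 1
gap    <- spec$start - c(0, head(end, -1)) - 1
widths <- as.vector(rbind(-gap, spec$len))
widths <- widths[widths != 0]          # drop zero-length gaps

# Two made-up records, only to make the example runnable.
rec <- function(state, intvid, disp, psu) {
  paste0(state, strrep("x", 21), intvid, disp, psu, strrep("x", 10))
}
tmp <- tempfile(fileext = ".txt")
writeLines(c(rec("01", "123", "110", "2006000001"),
             rec("36", "045", "120", "2006000002")), tmp)

# colClasses = "character" keeps leading zeros such as "01".
dat <- read.fwf(tmp, widths = widths, col.names = spec$name,
                colClasses = "character", buffersize = 2000)
print(dat)
```

Here widths works out to c(2, -21, 3, 3, 10): read two columns, skip 21,
then read three adjacent fields.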

Duncan Murdoch

> Sometimes I would also like to format the data after it has been read. For
> example, the ASCII file has price in columns 100 to 105 written as 005999. I
> want to read this and format it as 59.99 (omitting leading zeros in the price).
>
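One possible way to do this after reading, sketched under the assumption
that the field always carries two implied decimal places (as the 005999 to
59.99 example suggests).  The sample values other than "005999" are made up.

```r
# Price field as read from the file (character, with leading zeros).
raw <- c("005999", "000250", "120000")

# as.numeric() drops the leading zeros; dividing by 100 restores the
# two implied decimal places.
price <- as.numeric(raw) / 100

# Optional: a character version with exactly two decimals for display.
shown <- sprintf("%.2f", price)
print(shown)
```

Keeping price numeric and formatting only for display avoids having to
convert back before doing arithmetic.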
>   
>>   Perhaps if you posted the first few lines and columns of the file then 
>> we might get an idea of how to read it in.
>>     
>
> I have not even downloaded the data onto my computer yet, because I am not sure
> I can read it in. The zipped file is 67MB. Using similar data a few years ago, I
> recall the unzipped file to be about 350--400 MB. I had used MySQL then, but it
> took some doing to get it in, and there were things that did not seem to work as
> I wanted them to---I could not figure out how to label the variables. I usually
> do not have to work with a dataframe of more than 10-30 MB at a time.
>
> It would be good to have a facility in R which defines the meta-data, that
> is, the labelling and structure of the dataset: positions of variables, their
> names, their labels, and their levels (e.g. for ordered choice or group
> variables: yes, sometimes, no type responses). This could be saved as a
> separate object and passed to a function that gets the named variables from
> the ASCII file (names of variables to get can be given as arguments), attaches
> the meta-data, and creates a dataframe with all the meta-data attached. The
> meta-data of the dataframe could include notes at dataframe and variable
> level, and other information. This information would be passed on to the
> plotting functions and used when formatting the output of statistical
> procedures.
>
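A rough sketch of what such a meta-data object might look like in plain R,
using attributes and ordered factors.  The function apply_meta(), the
RESPONSE variable, its codes and all label texts are hypothetical names
chosen for illustration, not an existing facility.

```r
# Hypothetical meta-data: one entry per variable, with a label and,
# for coded variables, the codes and their level names.
meta <- list(
  STATE    = list(label = "State code"),
  RESPONSE = list(label  = "Example ordered response",
                  codes  = c(1, 2, 3),
                  levels = c("yes", "sometimes", "no"))
)

# Attach the meta-data to a data frame that has already been read in.
apply_meta <- function(df, meta) {
  for (v in intersect(names(df), names(meta))) {
    m <- meta[[v]]
    x <- df[[v]]
    if (!is.null(m[["levels"]])) {
      # Coded variable: turn the codes into an ordered factor.
      x <- factor(x, levels = m[["codes"]], labels = m[["levels"]],
                  ordered = TRUE)
    }
    attr(x, "label") <- m[["label"]]   # variable-level note
    df[[v]] <- x
  }
  df
}

df  <- data.frame(STATE = c("01", "36"), RESPONSE = c(1, 3),
                  stringsAsFactors = FALSE)
out <- apply_meta(df, meta)
str(out)
```

Base plotting and modelling functions do not pick up a "label" attribute on
their own, so this only covers storing the information, not using it.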
> I agree with Michael Kobovy that this is a very helpful list, and people do
> not owe less than what one paid for the software :)
>
> Anupam.
