[R] difficulties in reading a .prn file
P.Dalgaard at biostat.ku.dk
Wed Oct 29 17:52:12 CET 2008
jim holtman wrote:
> I would guess that your separator is not really a tab like you think
> it is. Take a small subset of the data, bring it up in a text editor,
> check the contents and then try to read it. Always start small to see
> if it is working the way you think it should. Also it seems to have a
> header, so why are you ignoring it? It may make your numeric columns
> look like factors, which is probably not what you want.
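One way to run that check from inside R rather than a text editor is sketched below. The three-line sample is made up for illustration (it is not the poster's file), and the temporary file name is an assumption; `readLines()` and `count.fields()` are base R.

```r
# Hypothetical three-line sample separated by spaces, not tabs.
sample_lines <- c(
  "ID DATE Time RRR var1 var2",
  "54678611 39356 0.1572569 RW 4500 *",
  "54678612 39356 0.1583333 RW 7100 5500"
)
tmp <- tempfile(fileext = ".prn")
writeLines(sample_lines, tmp)

# Look at the raw text of the first few lines.
head_lines <- readLines(tmp, n = 3)
print(head_lines)

# Does any line actually contain a tab character?
has_tab <- grepl("\t", head_lines, fixed = TRUE)

# Number of fields per line when splitting on tabs vs. on any whitespace.
fields_tab <- count.fields(tmp, sep = "\t")
fields_ws  <- count.fields(tmp, sep = "")

print(has_tab)
print(fields_tab)
print(fields_ws)
```

If the tab-based count is 1 on every line while the whitespace-based count matches the expected number of columns, the file is probably not tab-separated.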
Also, there seem to be 38 columns, not 29...
Does it not work with plain whitespace separation? I.e.:
read.table("file.prn", header=TRUE, na.strings="*")
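A small sketch of the difference, again on a made-up sample rather than the real file (the data and the temporary file are assumptions for illustration only). With the default `sep = ""`, `read.table()` splits on any run of whitespace, and `na.strings = "*"` turns the asterisks into `NA`.

```r
# Hypothetical sample: header line plus two rows, space-separated.
sample_lines <- c(
  "ID DATE Time RRR var1 var2",
  "54678611 39356 0.1572569 RW 4500 *",
  "54678612 39356 0.1583333 RW 7100 5500"
)
tmp <- tempfile(fileext = ".prn")
writeLines(sample_lines, tmp)

# Forcing a tab separator on a file without tabs: each line is one field.
bad <- read.table(tmp, header = FALSE, sep = "\t", na.strings = "*")
dim(bad)

# Default whitespace separation, header used for names, "*" read as NA.
dat <- read.table(tmp, header = TRUE, na.strings = "*")
dim(dat)   # rows and columns actually parsed
str(dat)   # check that numeric columns are numeric, not factors
```

Comparing `dim(bad)` with `dim(dat)` on a small sample like this is a quick way to confirm which separator the file really uses before reading all the rows.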
> On Wed, Oct 29, 2008 at 12:19 PM, <jass at in.gr> wrote:
>> I am having problems correctly reading a huge .prn file of almost 450,000 rows and 29 columns.
>> The variables consist of characters, dates, times, and numeric values.
>> I use read.table("file.prn", header=F, sep="\t", na.strings="*"), where the missing values are coded as "*".
>> R reads the file without complaint, but when I ask for the dimensions of the data frame I get the right number of rows but only 1 column:
>>  422344 1
>> It is as if it reads the whole row as one column.
>> When I ask for the first 3 lines, for example, the output shows that R is reading everything as a factor, and I get something like this:
>> ID DATE Time RRR VEl Leng Weig Sub var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15
>>  54678611 39356 0.1572569 RW 89 2014 21400 V11A11 4500 7200 4700 5000 * * * * * * * * * * * 0 527 594 567 * * * * * * * * * * *
>>  54678612 39356 0.1583333 RW 81 1716 33000 T11O3 7100 9100 5700 5600 5500 * * * * * * * * * * 0 397 605 133 133 * * * * * * * * * *
>> 422344 Levels: ID DATE Time RRR VEl Leng Weig Sub var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14 var15 VAR1 VAR2 VAR3 VAR4 VAR5 VAR6 VAR7 VAR8 VAR9 VAR10 VAR11 VAR12 VAR13 VAR14 VAR15 ..
>> Is there any solution? Any suggestion?
>> And what is going on with the "*"? Is there any suggestion for this as well?
>> Thanks for your time!
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907