[R] difficulties in reading a .prn file

jim holtman jholtman at gmail.com
Wed Oct 29 17:43:31 CET 2008


I would guess that your separator is not really a tab like you think
it is.  Take a small subset of the data, bring it up in a text editor,
check the contents and then try to read it.  Always start small to see
if it is working the way you think it should.  Also, it seems to have a
header, so why are you ignoring it?  Reading the header line as data may
make your numeric columns look like factors, which is probably not what
you want.
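
A minimal sketch of that "start small" check in R.  The three sample
lines are a made-up, shortened stand-in for the top of the real file,
and the assumption that the columns are separated by runs of spaces
(not tabs) is only a guess from the output quoted below; check it
against the actual file.

```r
# Hypothetical stand-in for the first lines of "file.prn": space-padded
# columns, "*" for missing values, and no tab characters (an assumption).
sample_lines <- c(
  "      ID    DATE       Time  RRR  VEl  Leng   Weig     Sub  var1  var2",
  "54678611   39356  0.1572569   RW   89  2014  21400  V11A11  4500     *",
  "54678612   39356  0.1583333   RW   81  1716  33000   T11O3  7100  9100"
)
tmp <- tempfile(fileext = ".prn")
writeLines(sample_lines, tmp)

# Step 1: look at the raw text of a few lines before parsing anything.
first <- readLines(tmp, n = 3)
has_tab <- grepl("\t", first, fixed = TRUE)
print(has_tab)   # all FALSE would mean sep = "\t" cannot split the lines

# Step 2: reproduce the symptom. With sep = "\t" and no tabs in the
# file, each whole line becomes a single field, i.e. one column.
bad <- read.table(tmp, header = FALSE, sep = "\t", na.strings = "*")
print(dim(bad))

# Step 3: read a small subset with the header and the default sep = "",
# which splits on any run of whitespace. "*" is turned into NA.
small <- read.table(tmp, header = TRUE, na.strings = "*", nrows = 2,
                    stringsAsFactors = FALSE)
str(small)
```

If str() on the small subset shows the expected column count and
numeric types, the same call without nrows can be tried on the full
file.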

On Wed, Oct 29, 2008 at 12:19 PM,  <jass at in.gr> wrote:
>
> Hello,
>
> I am having problems properly reading a huge .prn file of almost 450.000 rows and 29 columns.
> The variables consist of characters, dates, times, and numeric values.
> I use read.table("file.prn", header=F, sep="\t", na.strings="*"), where the missing values are declared as "*".
> R reads it like that, but when I ask for the dimensions of the data frame I get the right number of rows but only 1 column...
> dim(file)
> [1] 422344      1
>
> It is as if it somehow reads the whole row as one column.
> When I ask for the first 3 lines, for example, I see that R is reading everything as factors, and I get something like this:
>
>  data12L[1:3,]
> ID       DATE        Time      RRR      VEl       Leng     Weig      Sub       var1     var2     var3     var4     var5     var6     var7     var8     var9    var10    var11    var12    var13    var14    var15    VAR1    VAR2    VAR3    VAR4    VAR5    VAR6    VAR7    VAR8    VAR9   VAR10   VAR11   VAR12   VAR13   VAR14   VAR15
> [2]     54678611       39356   0.1572569    RW          89        2014       21400              V11A11      4500      7200      4700      5000         *         *         *         *         *         *         *         *         *         *         *         0       527       594       567         *         *         *         *         *         *         *         *         *         *         *
> [3]     54678612       39356   0.1583333    RW           81        1716       33000               T11O3      7100      9100      5700      5600      5500         *         *         *         *         *         *         *         *         *         *         0       397       605       133       133         *         *         *         *         *         *         *         *         *         *
>
> 422344 Levels:        ID       DATE        Time             RRR     VEl    Leng    Weig             Sub     var1     var2     var3     var4     var5     var6     var7     var8     var9    var10    var11    var12    var13    var14    var15    VAR1    VAR2    VAR3    VAR4    VAR5    VAR6    VAR7    VAR8    VAR9   VAR10   VAR11   VAR12   VAR13   VAR14   VAR15 ..
>
> Is there any solution? Any suggestion?
> And what is going on with the "*"? Is there any suggestion for this as well???
> Thanks for your time!
>
> Ismini
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?


