[R] Variable length datafile import problem

John Kane jrkrideau at yahoo.ca
Sun Feb 20 20:33:50 CET 2011


Hi Ingo, 

Sorry for being so slow to get back to you.  I've had a bit of a problem with my internet connection.

Just how large is the data set?  You might want to have a look at this thread on the size of R data files: http://r.789695.n4.nabble.com/Boundaries-of-R-td3312593.html

In any case, from a bit more poking around in the file it looks like the last column, Column LN in Calc, is the problem. Its only value is a NaN in row 57.  If I remove it I can read in the rest of the file.  In fact, simply changing the NaN to 0 (zero) makes the file readable.
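
One possible explanation, offered as a guess rather than a diagnosis: as documented in ?read.table, the number of columns is determined from the first five lines of input unless col.names is supplied, so a row that is wider than those lines can upset the import. A minimal sketch, using a made-up ragged file in place of Test.dat (the file contents and the names tmp, n_fields and max_cols are only for illustration):

```r
# Hypothetical stand-in for Test.dat: the sixth line is wider than
# any of the first five, and its last field is NaN.
tmp <- tempfile(fileext = ".dat")
writeLines(c(
  "a\t1\t2",
  "b\t3",
  "c\t4\t5",
  "d\t6",
  "e\t7\t8",
  "f\t9\t10\t11\tNaN"
), tmp)

# Count the fields on every line, not only the first five.
n_fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")
max_cols <- max(n_fields)

# Supplying col.names tells read.table how many columns to expect;
# fill = TRUE pads the shorter rows.
rawData <- read.table(tmp, sep = "\t", header = FALSE, fill = TRUE,
                      quote = "", comment.char = "",
                      col.names = paste0("V", seq_len(max_cols)),
                      stringsAsFactors = FALSE)
dim(rawData)
```

If the widest row in the real file sits below line five, comparing max(n_fields) with ncol() of the original import should show whether this is what is going on.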

I've had a look at it in Calc and in jEdit but cannot see anything suspicious there.  I suspect there must be something funny in there, since Row 32 also ends with NaN and seems to read in properly.
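
The rows that end in NaN can also be compared in the raw text rather than in a spreadsheet, which sometimes shows differences an editor hides. A rough sketch, again on a made-up file (the contents, including the trailing space on the last line, and the names tmp, last_field and nan_rows are assumptions for illustration only):

```r
# Hypothetical stand-in for Test.dat: two rows end in NaN, but the
# second one is wider and carries a trailing space.
tmp <- tempfile(fileext = ".dat")
writeLines(c("a\t1\tNaN", "b\t2\t3", "c\t4\t5\tNaN "), tmp)

lines  <- readLines(tmp)
fields <- strsplit(lines, "\t", fixed = TRUE)

# Last field of each line (NA for an empty line).
last_field <- vapply(fields,
                     function(x) if (length(x)) x[length(x)] else NA_character_,
                     character(1))

# Rows whose last field contains NaN, with their field counts and the
# quoted last field so that stray whitespace becomes visible.
nan_rows <- which(grepl("NaN", last_field, fixed = TRUE))
data.frame(row      = nan_rows,
           n_fields = lengths(fields)[nan_rows],
           last     = encodeString(last_field[nan_rows], quote = '"'))
```

On the real file, differences in n_fields or in the quoted last field between rows 32 and 57 would be the thing to look for.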

BTW what are the NaN's doing there?



--- On Fri, 2/18/11, Ingo Reinhold <ingor at kth.se> wrote:

> From: Ingo Reinhold <ingor at kth.se>
> Subject: RE: [R] Variable length datafile import problem
> To: "John Kane" <jrkrideau at yahoo.ca>, "r-help at r-project.org" <r-help at r-project.org>
> Received: Friday, February 18, 2011, 3:16 AM
> Hi John, 
> 
> seems there is no easy way. I'll just precondition it with
> AWK as described here http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg53401.html
> 
> There are some remarks in the thread that R is not supposed
> to read too large files for "political" reasons. Maybe
> that's it.
> 
> Many thanks again for the effort. 
> 
> Ingo
> ________________________________________
> From: John Kane [jrkrideau at yahoo.ca]
> Sent: Thursday, February 17, 2011 11:54 AM
> To: Ingo Reinhold
> Subject: RE: [R] Variable length datafile import problem
> 
> Generally most of the gurus are on this list. 
> Hopefully someone will take an interest in the problem.
> 
> I suspect that there may be some kind of weird value in the
> file that is upsetting the import.  Given the results I
> got when I removed the data past BD and then at AL it seems
> that the problem might be within this range.
> 
> You could try removing half the data between those columns
> and see what happens, then repeat if something turns up.
> It's tedious but unless someone with a better grasp of
> variable length data import can help it's the best I can
> suggest.
> 
> BTW you only replied to me.  You should make sure to
> cc the list otherwise readers won't realise that I am being
> of no help.
> 
> If you still have the problem by Saturday e-mail me or post
> to the list and I'll try to spend some more time messing
> about with the problem.
> 
> Sorry to be of so little help.
> --- On Thu, 2/17/11, Ingo Reinhold <ingor at kth.se> wrote:
> 
> > From: Ingo Reinhold <ingor at kth.se>
> > Subject: RE: [R] Variable length datafile import problem
> > To: "John Kane" <jrkrideau at yahoo.ca>
> > Received: Thursday, February 17, 2011, 5:36 AM
> > Hi John,
> >
> > as it seems we're hitting the wall here, can you maybe
> > recommend another mailing list with "gurus" (as you put it)
> > that may be able to help?
> >
> > Regards,
> >
> > Ingo
> > ________________________________________
> > From: John Kane [jrkrideau at yahoo.ca]
> > Sent: Thursday, February 17, 2011 11:25 AM
> > To: Ingo Reinhold
> > Subject: RE: [R] Variable length datafile import problem
> >
> > Hi Ingo,
> >
> > I've had a bit of time to examine the file and I must say
> > that, at the moment, I have no idea what is going on.
> > I tried the old cut-the-file-into-pieces trick but just came up
> > with even more anomalous results.
> >
> > My first attempt removed all the data past column AL in an
> > OOo Calc spreadsheet.  This created a rectangular
> > dataset.  It imported into R with no problem, with 38 columns
> > as expected.
> >
> > Then I went back to the original data file (test.dat) and
> > removed all the data past column BD in an OOo Calc spreadsheet.
> >
> > This imported a file with only 38 columns.
> >
> > Something very funny is happening but at the moment I have no
> >
> > --- On Wed, 2/16/11, Ingo Reinhold <ingor at kth.se> wrote:
> >
> > > From: Ingo Reinhold <ingor at kth.se>
> > > Subject: RE: [R] Variable length datafile import problem
> > > To: "John Kane" <jrkrideau at yahoo.ca>
> > > Received: Wednesday, February 16, 2011, 1:59 AM
> > > Hi John,
> > >
> > > V1 should be just a character. However I figured something
> > > out myself. The import looks OK in terms of columns when
> > > adding the flush=TRUE option.
> > >
> > > I am still very confused about the dimensions that the
> > > imported data shows. Loading my data file into something
> > > like OOspreadsheet shows me a maximum of about 245, which
> > > does not correspond to the 146 generated by R. Any idea
> > > where this saturation comes from?
> > >
> > > Thanks,
> > >
> > > Ingo
> > > ________________________________________
> > > From: John Kane [jrkrideau at yahoo.ca]
> > > Sent: Wednesday, February 16, 2011 1:57 AM
> > > To: Ingo Reinhold
> > > Subject: RE: [R] Variable length datafile import problem
> > >
> > > Is rawData$V1 intended to be factor or character?
> > >
> > > str(rawData) gives
> > > $ V1  : Factor w/ 54 levels "-232.0","-234.0",..: 41 41 41 41 41 41 41 41 41 41 ...
> > >
> > > If you were not expecting a factor you might try
> > > options(stringsAsFactors = FALSE) before importing the
> > > data.
> > >
> > > --- On Tue, 2/15/11, Ingo Reinhold <ingor at kth.se> wrote:
> > >
> > > > From: Ingo Reinhold <ingor at kth.se>
> > > > Subject: RE: [R] Variable length datafile import problem
> > > > To: "John Kane" <jrkrideau at yahoo.ca>
> > > > Received: Tuesday, February 15, 2011, 3:35 PM
> > > > Dear all,
> > > >
> > > > I have changed the file-ending with no change in the
> > > > result. I don't think that this should matter.
> > > >
> > > > http://dl.dropbox.com/u/2414056/Test.dat
> > > > is a test file which represents the structure I am trying to
> > > > read. So far I have used
> > > >
> > > > rawData=read.table("Test.txt", fill=TRUE, sep="\t", header=FALSE);
> > > >
> > > > Looking at rawData$V1 then gives me a distorted
> > > > view of my original first column.
> > > >
> > > > Thanks,
> > > >
> > > > Ingo
> > >
> > >
> > >
> >
> >
> >
> >
> 
> 
> 




