[R] trouble with read.table and colClasses='raw'

Greg Snow Greg.Snow at imail.org
Thu Feb 11 19:32:19 CET 2010


Another possibility is to write the function that converts from character to raw yourself (possibly wrapping as.raw() around as.integer()), so that read.table() knows what to do.
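
A minimal sketch of that idea follows. It is an illustration under stated assumptions and was not tried on the original 600K-column file. The class name "hexByte" is a hypothetical marker invented for this example. The converter assumes the file holds two-digit hexadecimal text, which is how R displays raw values (as.raw(10) prints as 0a). If the file held plain decimal digits, as.raw(as.integer(from)) would serve instead. The sample rows are the ones Johan posted. For comparison, the last lines read the same text as integer and convert afterwards.

```r
library(methods)  # provides setClass() and setAs()

# Hypothetical marker class: it exists only so that colClasses can name it.
setClass("hexByte")

# Teach as() how to turn text such as "00" or "0a" into raw bytes.
# strtoi(..., base = 16L) parses the text as hexadecimal.
setAs("character", "hexByte",
      function(from) as.raw(strtoi(from, base = 16L)))

# Sample rows copied from the thread (first three columns only).
txt <- "X10 X20 X30
00   00    01
00   02   02
00   00  00
00   01  01
00  00  00"

# Route 1: read.table() converts each column with as(<character>, "hexByte").
x <- read.table(text = txt, header = TRUE, colClasses = "hexByte")

# Route 2: read the columns as integer, then convert each one to raw.
y <- read.table(text = txt, header = TRUE, colClasses = "integer")
y[] <- lapply(y, as.raw)

str(x)
```

For the real file, the same colClasses value would be passed to read.table("data", header=TRUE, ...). The integer route only works while every value consists of decimal digits; a byte written as 0a would not parse as an integer.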

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Greg Snow
> Sent: Thursday, February 11, 2010 11:06 AM
> To: Johan Jackson; Don MacQueen
> Cc: r-help at r-project.org
> Subject: Re: [R] trouble with read.table and colClasses='raw'
> 
> The read.table function does not know how to convert the character
> representation that it reads into raw variables.  Try using 'integer'
> for the colClasses to read the data in as integers, then convert those
> back to raw (if that is really what you need).
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> > On Behalf Of Johan Jackson
> > Sent: Thursday, February 11, 2010 10:29 AM
> > To: Don MacQueen
> > Cc: r-help at r-project.org
> > Subject: Re: [R] trouble with read.table and colClasses='raw'
> >
> > Hi Don and all,
> >
> > I guess we're getting somewhere. Thanks. The file (first three columns,
> > first five rows) looks like this:
> >
> > X10 X20 X30
> > 00  00  01
> > 00  02  02
> > 00  00  00
> > 00  01  01
> > 00  00  00
> >
> >
> > I guess R is reading 00 as a character? But here's the weird thing: this
> > data (a raw matrix in R) was written out by R itself:
> >
> > write.table(dat,"data",col.names=TRUE,row.names=FALSE,quote=FALSE)
> >
> > *If* I understand correctly, then this seems like very *bad behavior* on
> > R's part: you should be able to write out a matrix and read it right back
> > into R without hassles like this (but every time I blame R, it turns out
> > to be user error, so...),
> >
> > JJ
> >
> >
> >
> > On Thu, Feb 11, 2010 at 9:59 AM, Don MacQueen <macq at llnl.gov> wrote:
> >
> > > The error message says there is no method for converting from
> > > 'character' to 'raw'.
> > > Apparently, R is seeing character data in the file, and is trying to
> > > convert it to raw, since you specified raw, and it can't.
> > >
> > > See, for example,
> > >
> > > > as('aa','raw')
> > > Error in as("aa", "raw") :
> > >   no method or default for coercing "character" to "raw"
> > >
> > > (same error message)
> > >
> > > So I would ask, what are your data, really? Why are you asking for raw?
> > > Have you checked the help page for raw to make sure it's what you want?
> > >
> > > -Don
> > >
> > > At 5:23 PM +0100 2/11/10, Ivan Calandra wrote:
> > >
> > >> Well, it's too complicated for me! Here is what I would do (limited,
> > >> since I'm still a newbie):
> > >>
> > >> 1) The syntax seems correct; it should work. The problem is somewhere
> > >> else, coming from your own file. Did you try skipping the colClasses
> > >> argument, to see what it looks like? If you can import it that way, try
> > >> str(x) to see what you have. It might help you.
> > >> 2) I've never had that much data to import, and for me read.table works
> > >> well.
> > >>
> > >> You might want to wait for the experts!
> > >>
> > >> Ivan
> > >>
> > >> On 2/11/2010 17:14, Johan Jackson wrote:
> > >>
> > >>>  Hi Ivan,
> > >>>
> > >>>  Thanks for the reply. Damn IT! My original post was screwed up. HERE
> > >>>  is what I did:
> > >>>
> > >>>  x <- read.table("data",header=TRUE,colClasses=rep('raw',600000))
> > >>>  #returns error:  no method or default for coercing "character" to "raw"
> > >>>
> > >>>  I've read the ?read.table and the colClasses argument. I'm still unclear:
> > >>>
> > >>>  1) colClasses is a character vector, is that right? That seems to be
> > >>>  what the help says, but I get an error when I do the above.
> > >>>
> > >>>  2) what is the most efficient way to read in huge amounts of data? In
> > >>>  the past I found that scan() and readLines() were slower than read.table.
> > >>>
> > >>>  Thanks,
> > >>>
> > >>>  JJ
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>  On Thu, Feb 11, 2010 at 8:53 AM, Ivan Calandra
> > >>>  <ivan.calandra at uni-hamburg.de> wrote:
> > >>>
> > >>>     Hi!
> > >>>
> > >>>     "colClasses: character. A vector of classes to be assumed for the
> > >>>     columns."
> > >>>     I'm not an R expert and I don't know what your "flat file raw" is, but
> > >>>     the colClasses argument defines whether each column will be treated
> > >>>     as containing "factor", "logical", "integer", etc.
> > >>>     For more on read.table, read the manual "R Data Import/Export",
> > >>>     available on the R-project website.
> > >>>
> > >>>     I don't know if it helps, but I hope it does!
> > >>>
> > >>>
> > >>>     Ivan
> > >>>
> > >>>     On 2/11/2010 16:36, Johan Jackson wrote:
> > >>>     > Hi all,
> > >>>     >
> > >>>     > First off, it is surprising that there are no examples of how to use
> > >>>     > read.table() under ?read.table !
> > >>>     >
> > >>>     > I am trying to read in a flat file of type 'raw'. It has 1000 rows
> > >>>     > and 600K columns. I have the RAM to accomplish this, but can't get
> > >>>     > the data into R using read.table:
> > >>>     >
> > >>>     > x<- read.table("data",header=TRUE,colClasses=rep(,600000))
> > >>>     > #returns error:  no method or default for coercing "character" to "raw"
> > >>>     >
> > >>>     > Then I thought that maybe the colClasses vector needed to actually
> > >>>     > *be* the mode needed (here's where an example under ?read.table
> > >>>     > would help):
> > >>>     >
> > >>>     > x<- read.table("data",header=TRUE,colClasses=rep(as.raw(1),600000))
> > >>>     >
> > >>>     > I waited on the latter command for a couple of hours before killing
> > >>>     > the process. What should the colClasses argument be?
> > >>>     >
> > >>>     > Should I be using another method to read the data into R? Previous
> > >>>     > experience using scan() and readLines() showed that read.table() was
> > >>>     > faster, at least for those examples, so I've stopped trying to use
> > >>>     > those other functions.
> > >>>     >
> > >>>     > Thank you,
> > >>>     >
> > >>>     > JJ
> > >>>     >
> > >>>     > ______________________________________________
> > >>>     > R-help at r-project.org mailing list
> > >>>     > https://stat.ethz.ch/mailman/listinfo/r-help
> > >>>     > PLEASE do read the posting guide
> > >>>     > http://www.R-project.org/posting-guide.html
> > >>>     > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > > --
> > > --------------------------------------
> > > Don MacQueen
> > > Environmental Protection Department
> > > Lawrence Livermore National Laboratory
> > > Livermore, CA, USA
> > > 925-423-1062
> > >
> 


