[R] trouble with read.table and colClasses='raw'

Greg Snow Greg.Snow at imail.org
Thu Feb 11 19:05:47 CET 2010


The read.table function does not know how to convert the character representation that it reads into raw variables.  Try using 'integer' for the colClasses to read the data in as integers, then convert those back to raw (if that is really what you need).
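
As a rough sketch of that suggestion (not tried on anything like your 600K-column file): the toy file below is modeled on the rows you posted, and the names tmp, x, m, r, h and r2 are just placeholders. The second half rests on an assumption worth checking against your file, namely that raw values are written out as two hex digits, so a field such as "0a" would not read as an integer.

```r
# Toy file modeled on the posted rows; a temp file stands in for "data".
tmp <- tempfile()
writeLines(c("X10 X20 X30",
             "00 00 01",
             "00 02 02",
             "00 00 00"), tmp)

# Read every column as integer; an unnamed colClasses is recycled across columns.
x <- read.table(tmp, header = TRUE, colClasses = "integer")

# Convert back to a raw matrix. as.raw() drops attributes, so restore them.
m <- as.matrix(x)
r <- as.raw(m)
dim(r) <- dim(m)
dimnames(r) <- dimnames(m)

# Assumption: fields are two hex digits. If values above 09 occur (e.g. "0a"),
# read the columns as character and parse them as base 16 instead.
h <- as.matrix(read.table(tmp, header = TRUE, colClasses = "character"))
r2 <- as.raw(strtoi(h, base = 16L))
dim(r2) <- dim(h)
dimnames(r2) <- dimnames(h)
```

If every value in the file is between 00 and 09, the two routes should give the same raw matrix; the character route is only needed when hex digits a-f show up.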

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Johan Jackson
> Sent: Thursday, February 11, 2010 10:29 AM
> To: Don MacQueen
> Cc: r-help at r-project.org
> Subject: Re: [R] trouble with read.table and colClasses='raw'
> 
> Hi Don and all,
> 
> I guess we're getting somewhere. Thanks. The file (first three columns,
> first five rows) looks like this:
> 
> X10 X20 X30
> 00   00    01
> 00   02   02
> 00   00  00
> 00   01  01
> 00  00  00
> 
> 
> I guess R is reading 00 as a character? But here's the weird thing: this
> data (a raw matrix in R) was written out by R itself:
> 
> write.table(dat,"data",col.names=TRUE,row.names=FALSE,quote=FALSE)
> 
> *If* I understand correctly, then this seems like very *bad behavior* on
> R's part: you should be able to write out a matrix and read it right back
> into R without hassles like this (but every time I blame R, it turns out
> to be user error, so...),
> 
> JJ
> 
> 
> 
> On Thu, Feb 11, 2010 at 9:59 AM, Don MacQueen <macq at llnl.gov> wrote:
> 
> > The error message says there is no method for converting from 'character'
> > to 'raw'.
> > Apparently, R is seeing character data in the file, and is trying to
> > convert it to raw, since you specified raw, and it can't.
> >
> > See, for example,
> >
> >   > as('aa','raw')
> >   Error in as("aa", "raw") :
> >     no method or default for coercing "character" to "raw"
> >
> > (same error message)
> >
> > So I would ask, what are your data, really? Why are you asking for raw?
> > Have you checked the help page for raw to make sure it's what you want?
> >
> > -Don
> >
> > At 5:23 PM +0100 2/11/10, Ivan Calandra wrote:
> >
> >>
> >> Well, it's too complicated for me! Here is what I would do (limited,
> >> since I'm still a newbie):
> >>
> >> 1) The syntax seems correct, so it should work. The problem is somewhere
> >> else, coming from your own file. Did you try skipping the colClasses
> >> argument, to see what it looks like? If you can import it that way, try
> >> str(x) to see what you have. It might help you.
> >> 2) I've never had that much data to import, and for me read.table works
> >> well.
> >>
> >> You might want to wait for the experts!
> >>
> >> Ivan
> >>
> >> On 2/11/2010 17:14, Johan Jackson wrote:
> >>
> >>>  Hi Ivan,
> >>>
> >>>  Thanks for the reply. Damn IT! My original post was screwed up. HERE
> >>>  is what I did:
> >>>
> >>>  x <- read.table("data",header=TRUE,colClasses=rep('raw',600000))
> >>>  #returns error:  no method or default for coercing "character" to "raw"
> >>>
> >>>  I've read the ?read.table and the colClasses argument. I'm still
> >>>  unclear:
> >>>
> >>>  1) colClasses is a character vector, is that right? That seems to be
> >>>  what the help says, but I get an error when I do the above.
> >>>
> >>>  2) what is the most efficient way to read in huge amounts of data? In
> >>>  the past I found that scan() and readLines() were slower than
> >>>  read.table.
> >>>
> >>>  Thanks,
> >>>
> >>>  JJ
> >>>
> >>>
> >>>
> >>>
> >>>  On Thu, Feb 11, 2010 at 8:53 AM, Ivan Calandra
> >>>  <ivan.calandra at uni-hamburg.de> wrote:
> >>>
> >>>     Hi!
> >>>
> >>>     "colClasses: character. A vector of classes to be assumed for the
> >>>     columns."
> >>>     I'm not an R expert and I don't know what your "flat file raw" is,
> >>>     but the colClasses argument is to define whether the column will be
> >>>     treated as containing "factors", "logical", "integer" etc...
> >>>     For more on read.table, read the manual "R Data Import/Export"
> >>>     available on the R-project website.
> >>>
> >>>     I don't know if it helps, but I hope it does!
> >>>
> >>>
> >>>     Ivan
> >>>
> >>>     On 2/11/2010 16:36, Johan Jackson wrote:
> >>>     > Hi all,
> >>>     >
> >>>     > First off, it is surprising that there are no examples of how to
> >>>     > use read.table() under ?read.table !
> >>>     >
> >>>     > I am trying to read in a flat file of type 'raw'. It has 1000 rows
> >>>     > and 600K columns. I have the RAM to accomplish this, but can't get
> >>>     > the data into R using read.table:
> >>>     >
> >>>     > x<- read.table("data",header=TRUE,colClasses=rep(,600000))
> >>>     > #returns error:  no method or default for coercing "character" to "raw"
> >>>     >
> >>>     > Then I thought that maybe the colClasses vector needed to actually
> >>>     > *be* the mode needed (here's where an example under ?read.table
> >>>     > would help):
> >>>     >
> >>>     > x<- read.table("data",header=TRUE,colClasses=rep(as.raw(1),600000))
> >>>     >
> >>>     > I waited on the latter command for a couple of hours before killing
> >>>     > the process. What should the colClasses argument be?
> >>>     >
> >>>     > Should I be using another method to read the data into R? Previous
> >>>     > experience using scan() and readLines() showed that read.table() was
> >>>     > faster, at least for those examples, so I've stopped trying to use
> >>>     > those other functions.
> >>>     >
> >>>     > Thank you,
> >>>     >
> >>>     > JJ
> >>>     >
> >>>     >       [[alternative HTML version deleted]]
> >>>     >
> >>>     > ______________________________________________
> >>>     > R-help at r-project.org mailing list
> >>>     > https://stat.ethz.ch/mailman/listinfo/r-help
> >>>     > PLEASE do read the posting guide
> >>>     > http://www.R-project.org/posting-guide.html
> >>>     > and provide commented, minimal, self-contained, reproducible code.
> >>>     >
> >>>     >
> >>>
> >>>
> >>
> >
> >
> > --
> > --------------------------------------
> > Don MacQueen
> > Environmental Protection Department
> > Lawrence Livermore National Laboratory
> > Livermore, CA, USA
> > 925-423-1062
> >
> >
> 


