[R] read.table: deciding automatically between two colClasses values

Oliver Kullmann O.Kullmann at swansea.ac.uk
Sun Aug 28 17:50:16 CEST 2011


Hi Josh,

thanks, that worked!
For the record, here is a function to determine the
number of strings, space-separated, in the first line
of a file:

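# Removes leading and trailing whitespace from string x:
trim = function(x) gsub("^\\s+|\\s+$", "", x)

# The number of strings in the first line of the file with name f.
# Splitting on "\\s+" rather than " " means that runs of spaces or
# tabs between two strings do not produce empty strings:
lengthfirstline = function(f) {
  length(unlist(strsplit(trim(readLines(f, n = 1L)), "\\s+")))
}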
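
Combined with Josh's suggestion, the choice between "4" and "6" could
look roughly as follows. This is only a sketch: the names
choose_colclasses and read_experiment are made up for illustration, and
it assumes that the header has exactly IntN + 10 fields (IntN integer
columns, one numeric, one integer, eight numeric), so that 14 fields
means IntN = 4 and 16 fields means IntN = 6.

```r
# Hypothetical helper: derive colClasses from the number of
# whitespace-separated fields in the header line of file f.
# Assumption: the layout is IntN integer columns, one numeric,
# one integer, then 8 numeric columns.
choose_colclasses <- function(f) {
  first <- readLines(f, n = 1L)              # reads only the first line
  first <- gsub("^\\s+|\\s+$", "", first)    # trim surrounding whitespace
  n <- length(strsplit(first, "\\s+")[[1]])  # number of header fields
  int_n <- n - 10L
  if (!(int_n %in% c(4L, 6L)))
    stop("unexpected number of header fields: ", n)
  c(rep("integer", int_n), "numeric", "integer", rep("numeric", 8))
}

# Hypothetical wrapper around read.table that needs no user intervention.
read_experiment <- function(filename, ...) {
  read.table(file = filename, header = TRUE,
             colClasses = choose_colclasses(filename), ...)
}
```

Stopping on an unexpected field count seems safer to me than guessing,
since a wrong colClasses would otherwise only show up as an error from
read.table.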

Oliver


On Sun, Aug 28, 2011 at 07:23:07AM -0700, Joshua Wiley wrote:
> Hi Oliver,
> 
> Look at ?readLines
> 
> I imagine something like:
> 
> tmp <- readLines(filename, n = 1L)
> # (do stuff with the first line to decide)
> IntN <- 6  # (or 4)
> NumN <- 8  # (or whatever)
> E <- read.table(file = filename, header = TRUE, colClasses =
>   c(rep("integer", IntN), "numeric", "integer", rep("numeric", NumN)), ...)
> 
> Cheers,
> 
> Josh
> 
> On Sun, Aug 28, 2011 at 7:13 AM, Oliver Kullmann
> <O.Kullmann at swansea.ac.uk> wrote:
> > Hello,
> >
> > I have a function for reading a data-frame from a file, which contains
> >
> >  E = read.table(file = filename,
> >        header = TRUE,
> >        colClasses = c(rep("integer",6),"numeric","integer",rep("numeric",8)),
> >        ...)
> >
> > Now a small variation arose, where
> >
> > colClasses = c(rep("integer",4),"numeric","integer",rep("numeric",8))
> >
> > needed to be used (so just a small change).
> > I want this to be convenient for the user, so no user intervention should
> > be needed; the function should choose between the two different values
> > "4" and "6" here according to the header line.
> >
> > Now this seems to be a problem: I found only count.fields, which
> > however cannot be told to read just the first line. Reading the
> > whole file (just to inspect the first line) is awkward, and these
> > files typically have millions of lines. The only way to influence
> > count.fields seems to be via skip, but I could only use this to skip to
> > the last line, which still reads the whole file, and I also don't know
> > the number of lines in the file.
> >
> > Perhaps one could catch an error, when the first invocation of
> > read.table fails, and try the second one. However tryCatch doesn't
> > seem to make it simple to write something like
> >
> > E = try(expr1 otherwise expr2)
> >
> > (if expr1 fails, evaluate expr2 instead) ?
> >
> > Oliver
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> 
>
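
On the tryCatch question in the quoted message: a small wrapper seems to
give the "expr1 otherwise expr2" behaviour. This is a sketch only;
"otherwise" is a made-up name, not a base R function, and it relies on
R's lazy evaluation of function arguments, so expr2 is evaluated only
if expr1 signals an error.

```r
# Evaluate expr1; if it signals an error, evaluate expr2 instead.
# 'otherwise' is a hypothetical helper name, not part of base R.
otherwise <- function(expr1, expr2) {
  tryCatch(expr1, error = function(e) expr2)
}

# The first expression fails, so the second one is used:
otherwise(stop("first attempt failed"), "fallback")  # returns "fallback"
```

Note that this falls back on any error from the first expression, not
only on a colClasses mismatch, so inspecting the header line first is
the more direct approach.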


