[R] Example function for bigglm (biglm) data input from file

Yeh, Richard C richard.c.yeh at bankofamerica.com
Mon Jan 22 20:01:53 CET 2007

This is to submit a commented example function for use in the data
argument to the bigglm(biglm) function, when you want to read the data
from a file (instead of a URL), or rescale or modify the data before
fitting the model.  In the hope that this may be of help to someone out
there:

make.data <- function (filename, chunksize, ...) {
  # conn lives in the enclosing environment, so it
  # persists between calls to the returned function.
  conn <- NULL;
  function (reset=FALSE) { 
    if (reset) {
      # Close any connection left over from an earlier pass.
      if (!is.null(conn)) {
        close(conn);
      }
      # This is for a file.
      # For other methods, see: help("connections")
      # and replace the following definition of conn
      # (and possibly the read.table call).
      conn <<- file (description=filename, open="r");
    } else {
      # It's best that the file you use has no header 
      # line, because when you use the connection to 
      # read each excerpt, any header won't get re-read.
      # If you choose to skip the first line, then the 
      # first line of each excerpt will be skipped.
      rval <- read.table (conn, nrows=chunksize, 
        skip=0, header=FALSE, ...);
      if (nrow(rval)==0) {
        # Then we have reached the end of the input.
        # Clean up:
        close(conn);
        conn <<- NULL;
        rval <- NULL;
      } else {
        # We did not reach the end of the input,
        # so this function will return data.
        # Here, you can define any derived fields
        # or put instructions to rescale input data
        # that you want done after the data are read
        # but before they are used for fitting.
        # For example:
        rval$rescaled_column <- rval$original_column / 1000000.0;
        # If you don't want to do anything like this,
        # then delete this "else" clause, and make
        # the end of the function resemble the URL 
        # example in bigglm.
      }
      return(rval);
    }
  }
}
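
To see what the returned function does, it can be driven by hand in the way a caller such as bigglm is expected to use it: one call with reset=TRUE, then repeated calls with reset=FALSE until NULL comes back. The following is only a sketch under assumptions: the temporary file, its two columns, the chunk size of 10, and the name make.data.demo are all made up for illustration, and the reader is a compact copy so that the snippet runs on its own. It also relies on col.names being supplied, since read.table then returns a zero-row data frame at end of input rather than stopping with an error.

```r
# Hypothetical stand-in data: 25 rows, two columns, no header line.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(x = 1:25, y = (1:25) * 1000000),
            file = tmp, row.names = FALSE, col.names = FALSE)

# Compact copy of the chunk reader (name is illustrative only).
make.data.demo <- function(filename, chunksize, ...) {
  conn <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- file(description = filename, open = "r")
      invisible(NULL)
    } else {
      rval <- read.table(conn, nrows = chunksize, header = FALSE, ...)
      if (nrow(rval) == 0) {
        # End of input: release the connection and signal with NULL.
        close(conn)
        conn <<- NULL
        return(NULL)
      }
      rval$rescaled_column <- rval$original_column / 1000000.0
      rval
    }
  }
}

reader <- make.data.demo(tmp, chunksize = 10,
                         colClasses = c("integer", "numeric"),
                         col.names = c("first", "original_column"))

# Drive the closure by hand: one reset, then read until NULL.
reader(reset = TRUE)
chunk.rows <- integer(0)
total <- 0
repeat {
  chunk <- reader(reset = FALSE)
  if (is.null(chunk)) break
  chunk.rows <- c(chunk.rows, nrow(chunk))
  total <- total + sum(chunk$rescaled_column)
}
print(chunk.rows)
print(total)
unlink(tmp)
```

With 25 rows and a chunk size of 10, the loop should see three chunks (10, 10 and 5 rows) before the NULL, and the derived column should be present in every chunk.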

a <- make.data ( filename = "myfile", chunksize = 1000000, 
  # In our definition of make.data, any remaining 
  # arguments get passed to the read.table function by 
  # the ... argument.
  # Define column types:
  colClasses = c("character", "character", 
    "integer", "numeric", "numeric"),
  # Define the column names in the call:
  # (recall that we cannot rely on the file header)
  col.names = c("fromState", "toState",
    "first", "original_column", "second")
);


bigglm (formula = toState ~ 1 + first + rescaled_column,
  data = a, family = binomial(link='logit'), 
  weights = ~second);
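
The comments in make.data point to help("connections") for inputs other than a plain file. One possible way to avoid editing the function for each input type is to pass in a function that opens the connection. This is a sketch, not part of biglm: make.data.conn, open.conn, count.rows and the small gzip file are names and data invented here. It also checks that a second reset=TRUE starts again from the first row, which matters because bigglm may pass over the data more than once.

```r
# Hypothetical generalisation: open.conn is any zero-argument function
# that returns a freshly opened, readable connection.
make.data.conn <- function(open.conn, chunksize, ...) {
  conn <- NULL
  function(reset = FALSE) {
    if (reset) {
      if (!is.null(conn)) close(conn)
      conn <<- open.conn()
      invisible(NULL)
    } else {
      rval <- read.table(conn, nrows = chunksize, header = FALSE, ...)
      if (nrow(rval) == 0) {
        close(conn)
        conn <<- NULL
        rval <- NULL
      }
      rval
    }
  }
}

# Stand-in data written through a gzip connection (7 rows, no header).
gz.path <- tempfile(fileext = ".gz")
gz.out <- gzfile(gz.path, open = "w")
write.table(data.frame(first = 1:7, second = (1:7) / 2), file = gz.out,
            row.names = FALSE, col.names = FALSE)
close(gz.out)

b <- make.data.conn(function() gzfile(gz.path, open = "r"),
                    chunksize = 3,
                    col.names = c("first", "second"))

# One full pass: reset, then read chunks until NULL, counting rows.
count.rows <- function(reader) {
  reader(reset = TRUE)
  n <- 0L
  while (!is.null(chunk <- reader(reset = FALSE))) n <- n + nrow(chunk)
  n
}

pass1 <- count.rows(b)
pass2 <- count.rows(b)   # the second pass reopens the file from the start
print(c(pass1, pass2))
unlink(gz.path)
```

Both passes should count all 7 rows. The same reader could be handed a function that calls file() or url() instead of gzfile(), leaving the chunking logic untouched.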

