[R] reading in only one column from text file

Seth Falcon sfalcon at fhcrc.org
Tue Mar 7 22:56:53 CET 2006


"mark salsburg" <mark.salsburg at gmail.com> writes:
> How do I manipulate the read.table function to read in only the 2nd
> column???

If your data is small, you can read in all columns and then subset the
resulting data frame.  Try that first.
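A minimal sketch of the read-everything-then-subset approach. It assumes a small comma-separated file with a header row; the file contents and the name `path` are made up for illustration. The last call shows a related option that `read.table` (and so `read.csv`) documents: a `colClasses` entry of "NULL" skips that column, so only the wanted column is stored. The whole file is still scanned.

```r
## Hypothetical example file: three columns plus a header row
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,label",
             "1,0.5,a",
             "2,1.5,b",
             "3,2.5,c"), path)

## Small data: read every column, then keep only the 2nd
df <- read.csv(path)
col2 <- df[, 2]   # a plain vector; df[, 2, drop = FALSE] keeps a data frame

## "NULL" skips a column, NA lets read.csv guess the type as usual.
## Give one entry per column in the file.
only2 <- read.csv(path, colClasses = c("NULL", NA, "NULL"))
```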

Perhaps there is a nicer way to do this that I don't know about, but
recently I coded up the following to allow for a "streamy" read.table,
one that reads the file in batches of rows.  I've adjusted a few
things, but haven't tested the result.  It may not work as is, but it
should give you an idea.

+ seth


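readBatch <- function(con, batch.size) {
    colClasses <- rep("character", 20) ## fix for your data
    ## adjust to pick out the columns that you want; drop=FALSE keeps
    ## a single selected column (e.g. [, 2]) as a data frame so the
    ## batches can still be rbind-ed together
    read.csv(con, colClasses=colClasses, as.is=TRUE,
             nrows=batch.size, header=FALSE)[, 1:2, drop=FALSE]
}

readTableStreamily <- function(filePath) {
    BATCH_SIZE <- 5000 ## no idea what a good value is; depends on file and RAM
    con <- file(filePath, 'r')
    on.exit(close(con)) ## close the file even if something fails
    ## the header line is read as a one-row batch of strings
    colNames <- unlist(readBatch(con, batch.size=1), use.names=FALSE)
    chunks <- list()
    i <- 1
    done <- FALSE
    while (!done) {
        ## read.csv signals an error once the connection is exhausted,
        ## which serves as the end-of-file test here; note that any
        ## other read error also ends the loop silently
        done <- tryCatch({
            cat(".")
            chunks[[i]] <- readBatch(con, batch.size=BATCH_SIZE)
            i <- i + 1
            FALSE
        }, error=function(e) TRUE)
    }
    cat("\n")
    df <- do.call("rbind", chunks)
    names(df) <- colNames
    df
}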
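A variation on the same streaming idea, not from the original post: read raw lines in batches with readLines(), which returns a zero-length vector at end of file instead of signalling an error, and parse each batch with read.csv(text = ...), using "NULL" in colClasses so only the wanted column is kept. The function name readColumnStreamily and its arguments are hypothetical, and it assumes a simple comma-separated file with a header row and no newlines inside quoted fields. Column types are guessed per batch, so mixed batches may be coerced by rbind.

```r
## Hypothetical helper: stream a CSV, keeping only column `column`
readColumnStreamily <- function(filePath, column, nCols, batchSize = 5000) {
    con <- file(filePath, "r")
    on.exit(close(con))                  # always release the file
    header <- readLines(con, n = 1)      # first line holds the column names
    ## "NULL" skips a column; NA lets read.csv guess the type
    colClasses <- rep("NULL", nCols)
    colClasses[column] <- NA
    chunks <- list()
    repeat {
        lines <- readLines(con, n = batchSize)
        if (length(lines) == 0) break    # end of file, no error involved
        chunks[[length(chunks) + 1]] <-
            read.csv(text = lines, header = FALSE, colClasses = colClasses)
    }
    if (length(chunks) == 0) return(NULL)  # header only, no data rows
    out <- do.call(rbind, chunks)
    names(out) <- strsplit(header, ",", fixed = TRUE)[[1]][column]
    out
}
```

For example, `readColumnStreamily(path, column = 2, nCols = 3)` returns a one-column data frame holding the 2nd column of a three-column file.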