[R] Importing data from text file with mixed format

William Dunlap wdunlap at tibco.com
Sun Oct 25 22:30:53 CET 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of delnatan
> Sent: Saturday, October 24, 2009 8:32 PM
> To: r-help at r-project.org
> Subject: [R] Importing data from text file with mixed format
> 
> 
> Hi,
> I'm having difficulty importing my textfile that looks 
> something like this:
> 
> #begin text file
> Timepoint 1
> ObjectNumber     Volume     SurfaceArea
> 1                      5.3          9.7
> 2                      4.9          8.3
> 3                      5.0          9.1
> 4                      3.5          7.8
> 
> Timepoint 2
> ObjectNumber     Volume     SurfaceArea
> 1                      5.1          9.0
> 2                      4.7          8.9
> 3                      4.3          8.3
> 4                      4.2          7.9
> 
> ... #goes on to Timepoint 80
> 
> How would I import this data into a list containing 
> data.frame for each
> timepoint?
> I'd like my data to be organized like this:
> 
> >myList
> [[1]]
>    ObjectNumber     Volume     SurfaceArea
> 1  1                      5.3          9.7
> 2  2                      4.9          8.3
> 3  3                      5.0          9.1
> 4  4                      3.5          7.8
> 
> [[2]]
>   ObjectNumber     Volume     SurfaceArea
> 1 1                      5.1          9.0
> 2 2                      4.7          8.9
> 3 3                      4.3          8.3
> 4 4                      4.2          7.9

The following function reads that text file into a single data.frame
with an added Timepoint column, a format I usually find more
convenient than a list of data.frames.  You can use
split(data, data$Timepoint) to get the list format you asked for.
If you keep the single data.frame, you can use the melt and cast
functions from the reshape package to rearrange it.

readMyData <- function (file) {
    # read every line in the file
    lines <- readLines(file)
    # drop empty lines
    lines <- grep("^[[:space:]]*$", lines, value=TRUE, invert=TRUE)
    # find and check header lines
    isHeaderLine <- regexpr("^ObjectNumber", lines) > 0
    if (sum(isHeaderLine)==0)
        stop("No header lines of form 'ObjectNumber ...'")
    if (length(u <- unique(lines[isHeaderLine]))>1)
        stop("Header lines vary: ", paste(sQuote(head(u)), collapse=", "))
    col.names <- strsplit(lines[which(isHeaderLine)[1]],
        "[[:space:]]+")[[1]]
    # after making column names from header lines, drop header lines
    lines <- lines[!isHeaderLine]
    # process Timepoint lines
    isTimepointLine <- regexpr("^Timepoint", lines) > 0    
    if (sum(isTimepointLine)==0)
        stop("No lines of form 'Timepoint <number>'")
    timepoints <- sub("^Timepoint[[:space:]]*", "",
        lines[isTimepointLine])
    timepoints <- as.integer(timepoints)
    if (any(is.na(timepoints)))
        stop("Non-integer found in a Timepoint line: ",
            sQuote(lines[isTimepointLine][which(is.na(timepoints))[1]]))
    nRowsPerTimepoint <-
        diff(c(which(isTimepointLine),length(isTimepointLine)+1)) - 1
    # drop Timepoint lines.  Remaining lines should be data lines
    lines <- lines[!isTimepointLine]
    # An error in read.table means there were lines we should have
    # dropped
    result <- read.table(header=FALSE,
        row.names=NULL,
        col.names=col.names,
        textConnection(lines))
    # Add Timepoint column
    result$Timepoint <- rep(timepoints, nRowsPerTimepoint)
    result 
}
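
The row-counting step in the function above can be checked on its
own.  The sketch below uses a made-up 'lines' vector (blank and header
lines already removed, and blocks of unequal size on purpose) to show
how diff() on the positions of the Timepoint lines gives the number of
data rows in each block, and how rep() then produces one label per row.

```r
# Hypothetical input: only Timepoint lines and data lines remain
lines <- c("Timepoint 1", "1 5.3 9.7", "2 4.9 8.3",
           "Timepoint 2", "1 5.1 9.0", "2 4.7 8.9", "3 4.3 8.3")
isTimepointLine <- regexpr("^Timepoint", lines) > 0
timepoints <- as.integer(sub("^Timepoint[[:space:]]*", "",
    lines[isTimepointLine]))
# positions of the Timepoint lines, plus a sentinel one past the end
starts <- c(which(isTimepointLine), length(lines) + 1)
# gap between consecutive starts, minus the Timepoint line itself
nRowsPerTimepoint <- diff(starts) - 1
# one Timepoint label per data row
labels <- rep(timepoints, nRowsPerTimepoint)
print(nRowsPerTimepoint)
print(labels)
```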

E.g.,
> data <- readMyData("c:/temp/t.txt")
> data
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2
> split(data, data$Timepoint)
$`1`
  ObjectNumber Volume SurfaceArea Timepoint
1            1    5.3         9.7         1
2            2    4.9         8.3         1
3            3    5.0         9.1         1
4            4    3.5         7.8         1

$`2`
  ObjectNumber Volume SurfaceArea Timepoint
5            1    5.1         9.0         2
6            2    4.7         8.9         2
7            3    4.3         8.3         2
8            4    4.2         7.9         2
> library(reshape)
> mdata <- melt(data, id=c("ObjectNumber","Timepoint"))
> cast(mdata, Timepoint~variable, fun.aggregate=c,
+      subset=variable=="SurfaceArea")
  Timepoint SurfaceArea_X1 SurfaceArea_X2 SurfaceArea_X3 SurfaceArea_X4
1         1            9.7            8.3            9.1            7.8
2         2            9.0            8.9            8.3            7.9
> cast(mdata, ObjectNumber~variable, fun.aggregate=c,
+      subset=variable=="SurfaceArea")
  ObjectNumber SurfaceArea_X1 SurfaceArea_X2
1            1            9.7            9.0
2            2            8.3            8.9
3            3            9.1            8.3
4            4            7.8            7.9
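
If the list of data.frames is all you want, a more direct route is
possible.  The following is only a sketch, not a replacement for the
checks in readMyData: it assumes the file starts with a Timepoint
line, that every block has its own header line, and it uses the text=
argument of read.table, which may not exist in older versions of R.
The 'lines' vector stands in for readLines(file).

```r
# Sample input as a character vector (stands in for readLines(file))
lines <- c(
    "Timepoint 1",
    "ObjectNumber     Volume     SurfaceArea",
    "1                      5.3          9.7",
    "2                      4.9          8.3",
    "",
    "Timepoint 2",
    "ObjectNumber     Volume     SurfaceArea",
    "1                      5.1          9.0",
    "2                      4.7          8.9")
# drop empty lines
lines <- lines[!grepl("^[[:space:]]*$", lines)]
isTimepointLine <- grepl("^Timepoint", lines)
# block id: goes up by one at each Timepoint line
block <- cumsum(isTimepointLine)
# keep header and data lines, grouped by block
chunks <- split(lines[!isTimepointLine], block[!isTimepointLine])
# parse each chunk; its first line is the header
myList <- lapply(chunks, function(chunk)
    read.table(text=chunk, header=TRUE))
# name the list elements by the number on each Timepoint line
names(myList) <- sub("^Timepoint[[:space:]]*", "",
    lines[isTimepointLine])
print(myList)
```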

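If you would rather not depend on the reshape package, the reshape
function in base R can produce a similar wide layout.  This is a
sketch under the assumption that each (Timepoint, ObjectNumber) pair
occurs only once; the data.frame is typed in by hand here so that the
example runs by itself, and the column names it produces
(SurfaceArea.1, ...) differ from those of cast.

```r
# Same values as the example data, entered by hand
data <- data.frame(
    ObjectNumber = rep(1:4, 2),
    Volume       = c(5.3, 4.9, 5.0, 3.5, 5.1, 4.7, 4.3, 4.2),
    SurfaceArea  = c(9.7, 8.3, 9.1, 7.8, 9.0, 8.9, 8.3, 7.9),
    Timepoint    = rep(1:2, each=4))
# one row per Timepoint, one SurfaceArea column per ObjectNumber
wide <- reshape(data[, c("Timepoint", "ObjectNumber", "SurfaceArea")],
    idvar="Timepoint", timevar="ObjectNumber", direction="wide")
print(wide)
```
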
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> -Daniel
> -- 
> View this message in context: 
> http://www.nabble.com/Importing-data-from-text-file-with-mixed-format-tp26045031p26045031.html
> Sent from the R help mailing list archive at Nabble.com.
> 



