[R] Reading a file with read.csv: two character rows not interpreted as I hope

jim holtman jholtman at gmail.com
Wed Oct 31 04:38:39 CET 2007


Try:

inFile <- file("file.csv", "r")

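Then use inFile wherever you have "file.csv", and close(inFile) when you
are done.  Passing the file name reopens the file from the top on each
call; the open connection keeps its position, so each read continues
where the previous one stopped.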
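
A minimal sketch of the whole sequence with a file connection, assuming a
comma-separated file laid out like your sample.  The temporary file and its
contents are made up for illustration; substitute your own path.

```r
# Hypothetical stand-in for "file.csv": write a small sample to a temp file
path <- tempfile(fileext = ".csv")
writeLines(c(
  'wavelength,SampleA,SampleB',
  'color,"green","black"',
  'class,"Class 1","Class 2"',
  '403,1.94E-01,2.14E-01',
  '409,1.92E-01,1.89E-01'
), path)

inFile <- file(path, "r")  # open once; the read position is kept between calls

# line 1: column names
c.names <- scan(inFile, what = "", nlines = 1, sep = ",", quiet = TRUE)
# lines 2-3: color and class, kept as character
c.options <- read.table(inFile, as.is = TRUE, nrows = 2, sep = ",")
# remaining lines: the numeric data
c.data <- read.table(inFile, sep = ",")

close(inFile)
colnames(c.data) <- c.names

str(c.options)  # inspect the two option rows
str(c.data)     # inspect the data columns
```

Because the text rows are consumed before the last read.table call, that call
only sees numbers and has no reason to create factors.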

On 10/30/07, Bryan Hanson <hanson at depauw.edu> wrote:
> Jim, thanks for the suggestion.  There is still something subtle &
> non-intuitive going on here.  I adapted your code with minor changes as
> follows (I had to add the sep argument) but get different behavior:
>
> c.names <- scan("file.csv", what='', nlines=1, sep=",")  # read column names
> c.options <- read.table("file.csv", as.is=TRUE, nrows=2, sep=",") # get lines 2-3
> c.data <- read.table("file.csv", sep=",")  # rest of the data
> colnames(c.data) <- c.names
>
> Your code works perfectly (you knew that!).  My adaptation runs, but
> c.options contains the first two lines, not lines 2 & 3, and c.data contains
> the contents of the entire file as *factors* (data type of c.names &
> c.options is correct - character). How strange!
>
> Also, and this is an observation rather than a question: in your code, you
> call scan and get the first line as characters, then you do read.table which
> gets lines 2 & 3 presumably because the first line, from read.table's
> perspective is a hidden label (?), then the second time you use read.table
> the hidden first line is ignored, as are the two lines with character data.
> I really don't understand these behaviors, which is probably why I'm having
> trouble parsing the file!
>
> Thanks, Bryan
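
There is no hidden label line.  A connection that is already open remembers
how far it has been read, so each call starts where the previous one stopped;
a file name is reopened and read from the top every time.  A small sketch of
the difference, using a throwaway temporary file whose contents are made up
for illustration:

```r
# Hypothetical three-line file, just to show where each read starts
path <- tempfile()
writeLines(c("line 1", "line 2", "line 3"), path)

# Passing the file name: every call reopens the file at the top
a1 <- readLines(path, n = 1)
a2 <- readLines(path, n = 1)   # reads the first line again

# Passing an open connection: the second call continues after the first
con <- file(path, "r")
b1 <- readLines(con, n = 1)
b2 <- readLines(con, n = 1)    # reads the following line
close(con)

print(c(a1, a2))
print(c(b1, b2))
```

That is also why, with the file name, c.options held lines 1 and 2 and c.data
held the whole file: the text rows made every column character, and in R 2.6.0
read.table converts character columns to factors unless as.is=TRUE is given.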
>
> On 10/30/07 8:40 PM, "jim holtman" <jholtman at gmail.com> wrote:
>
> > Here is one way.  You will probably use 'file' instead of textConnection
> >
> >> x.in <- textConnection('wavelength SampleA SampleB SampleC SampleD
> > +  color "green" "black" "black" "green"
> > +  class "Class 1" "Class 2" "Class 2" "Class 1"
> > +  403 1.94E-01 2.14E-01 2.11E-01 1.83E-01
> > +  409 1.92E-01 1.89E-01 2.00E-01 1.82E-01
> > +  415 1.70E-01 1.99E-01 1.94E-01 1.86E-01
> > +  420 1.59E-01 1.91E-01 2.16E-01 1.74E-01
> > +  426 1.50E-01 1.66E-01 1.72E-01 1.58E-01
> > +  432 1.42E-01 1.50E-01 1.62E-01 1.48E-01')
> >>
> >> c.names <- scan(x.in, what='', nlines=1)  # read column names
> > Read 5 items
> >> c.options <- read.table(x.in, as.is=TRUE, nrows=2) # get lines 2-3
> >> c.data <- read.table(x.in)  # rest of the data
> >> colnames(c.data) <- c.names
> >> close(x.in)
> >> c.options  # here are lines 2-3
> >      V1      V2      V3      V4      V5
> > 1 color   green   black   black   green
> > 2 class Class 1 Class 2 Class 2 Class 1
> >> c.data  # your data
> >   wavelength SampleA SampleB SampleC SampleD
> > 1        403   0.194   0.214   0.211   0.183
> > 2        409   0.192   0.189   0.200   0.182
> > 3        415   0.170   0.199   0.194   0.186
> > 4        420   0.159   0.191   0.216   0.174
> > 5        426   0.150   0.166   0.172   0.158
> > 6        432   0.142   0.150   0.162   0.148
> >
> >
> > On 10/30/07, Bryan Hanson <hanson at depauw.edu> wrote:
> >> Hi Folks... 'been playing with this for a while, with no luck, so I'm hoping
> >> someone knows it off the top of their head...  Difficult to find this nuance
> >> in the archives, as so many msgs deal with read.csv!
> >>
> >> I'm trying to read a data file with the following structure (a little piece
> >> of the actual data, they are actually csv just didn't paste with the
> >> commas):
> >>
> >>  wavelength SampleA SampleB SampleC SampleD
> >>  color "green" "black" "black" "green"
> >>  class "Class 1" "Class 2" "Class 2" "Class 1"
> >>  403 1.94E-01 2.14E-01 2.11E-01 1.83E-01
> >>  409 1.92E-01 1.89E-01 2.00E-01 1.82E-01
> >>  415 1.70E-01 1.99E-01 1.94E-01 1.86E-01
> >>  420 1.59E-01 1.91E-01 2.16E-01 1.74E-01
> >>  426 1.50E-01 1.66E-01 1.72E-01 1.58E-01
> >>  432 1.42E-01 1.50E-01 1.62E-01 1.48E-01
> >>
> >> Columns after the first one are sample names.  2nd row is the list of colors
> >> to use in later plotting.  3rd row is the class for later manova.  The rest
> >> of it is x data in the first column with y1, y2...following for plotting.
> >>
> >> I can read the file w/o the color or class rows with read.csv just fine,
> >> makes a nice data frame with proper data types.  The problem comes when
> >> parsing the 2nd and 3rd rows.  Here's the code:
> >>
> >> data = read.csv("filename", header=TRUE) # read in data
> >> color = data[1,]; color = color[-1] # capture color info & throw out 1st value
> >> class = data[2,]; class = class[-1] # capture category info & throw out 1st value
> >>
> >> cleaned.data = data[-(1:2),] # remove color & category info for matrix operations
> >> freq = data[,1] # capture frequency info
> >>
> >> What happens is that freq is parsed as a factor, and color and class are
> >> parsed as data frames of factors.
> >> I need color and class to be characters which I can pass to functions in the
> >> typical way one uses colors and levels.
> >> I need the freq & the cleaned.data info as numeric for plotting.
> >>
> >> I don't feel I'm far off from things working, but that's where you all come
> >> in!  Seems like an argument of as.something is needed, but the ones I've
> >> tried don't work.  Would it help to put color and class above the x,y data
> >> in the file, then clean it off?
> >>
> >> Btw, I'm on a Mac using R 2.6.0.
> >>
> >> Thanks in advance, Bryan
> >> *************
> >> Bryan Hanson
> >> Professor of Chemistry & Biochemistry
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


