[R] Reading a file with read.csv: two character rows not interpreted as I hope

Bryan Hanson hanson at depauw.edu
Wed Oct 31 14:50:00 CET 2007


OK, I fixed it myself!  Here's the code.  Of course, it mostly seems simple
once one gets it working... Thanks Jim.  Bryan

# get the first three lines with sample info in character format
sample.info = read.table(input.file.name, sep=",", as.is=TRUE, nrows=3)
sample.names = sample.info[1,]
sample.colors = sample.info[2,]; sample.colors = as.character(sample.colors[-1])
sample.class = sample.info[3,]; sample.class = as.character(sample.class[-1])
data = read.table(input.file.name, sep=",", skip=3)
colnames(data) = sample.names
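
To try the two-pass approach above without the original file, here is a minimal sketch. It assumes a small comma-separated copy of the sample data from this thread, written to a temporary file; the file contents are a stand-in, not the real data. The names row is also passed through as.character(), which is a small addition to the code above.

```r
# Stand-in for the real data file: a few rows of the sample from this thread.
input.file.name <- tempfile(fileext = ".csv")
writeLines(c(
  "wavelength,SampleA,SampleB,SampleC,SampleD",
  "color,green,black,black,green",
  "class,Class 1,Class 2,Class 2,Class 1",
  "403,1.94E-01,2.14E-01,2.11E-01,1.83E-01",
  "409,1.92E-01,1.89E-01,2.00E-01,1.82E-01",
  "415,1.70E-01,1.99E-01,1.94E-01,1.86E-01"
), input.file.name)

# First pass: only the three text rows, kept as character (as.is = TRUE).
sample.info   <- read.table(input.file.name, sep = ",", as.is = TRUE, nrows = 3)
sample.names  <- as.character(sample.info[1, ])
sample.colors <- as.character(sample.info[2, -1])
sample.class  <- as.character(sample.info[3, -1])

# Second pass: skip those rows so every remaining column is purely numeric.
data <- read.table(input.file.name, sep = ",", skip = 3)
colnames(data) <- sample.names

sapply(data, class)   # column types of the numeric block
sample.colors         # character vector, usable as a col= argument
```

Reading the file twice keeps the text rows away from the numeric block, so read.table never has to guess a type for a column that mixes the two.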


On 10/30/07 10:53 PM, "Bryan Hanson" <hanson at depauw.edu> wrote:

Jim, thanks for the suggestion.  There is still something subtle &
non-intuitive going on here.  I adapted your code with minor changes as
follows (I had to add the sep argument) but get different behavior:

c.names <- scan("file.csv", what='', nlines=1, sep=",")  # read column names
c.options <- read.table("file.csv", as.is=TRUE, nrows=2, sep=",")  # get lines 2-3
c.data <- read.table("file.csv", sep=",")  # rest of the data
colnames(c.data) <- c.names

Your code works perfectly (you knew that!).  My adaptation runs, but
c.options contains the first two lines, not lines 2 & 3, and c.data contains
the contents of the entire file as *factors* (data type of c.names &
c.options is correct - character). How strange!

Also, and this is an observation rather than a question: in your code, you
call scan and get the first line as characters, then you do read.table which
gets lines 2 & 3 presumably because the first line, from read.table's
perspective is a hidden label (?), then the second time you use read.table
the hidden first line is ignored, as are the two lines with character data.
I really don't understand these behaviors, which is probably why I'm having
trouble parsing the file!

Thanks, Bryan
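
A note on the behavior described above, offered as a sketch rather than a definitive account. When scan or read.table is given a file name, it opens the file, reads from the first line, and closes it again, so every call starts at the top. That would explain why c.options held lines 1-2 and why c.data held the whole file: with the three text rows included, no column is purely numeric, and versions of R before 4.0.0 turned such columns into factors by default. An already-open connection, by contrast, keeps its position, so successive calls pick up where the previous one stopped; no hidden label is involved. The example assumes a small comma-separated copy of the sample data written to a temporary file (the name tmp and the file contents are stand-ins).

```r
# Stand-in data file: a few rows of the sample from this thread, as CSV.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "wavelength,SampleA,SampleB,SampleC,SampleD",
  "color,green,black,black,green",
  "class,Class 1,Class 2,Class 2,Class 1",
  "403,1.94E-01,2.14E-01,2.11E-01,1.83E-01",
  "409,1.92E-01,1.89E-01,2.00E-01,1.82E-01",
  "415,1.70E-01,1.99E-01,1.94E-01,1.86E-01"
), tmp)

# Passing the file name: the file is reopened, so reading starts at line 1.
by.name <- read.table(tmp, as.is = TRUE, nrows = 2, sep = ",")
by.name[1, 1]   # first field of line 1, not of line 2

# Passing an open connection: each call continues where the last one stopped.
con <- file(tmp, open = "r")
c.names   <- scan(con, what = "", nlines = 1, sep = ",", quiet = TRUE)  # line 1
c.options <- read.table(con, as.is = TRUE, nrows = 2, sep = ",")        # lines 2-3
c.data    <- read.table(con, sep = ",")                                 # the rest
close(con)
colnames(c.data) <- c.names
```

This is the same pattern as the textConnection example quoted below, with file() supplying the connection.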

On 10/30/07 8:40 PM, "jim holtman" <jholtman at gmail.com> wrote:

> Here is one way.  You will probably use 'file' instead of textConnection
>
> > x.in <- textConnection('wavelength SampleA SampleB SampleC SampleD
> +  color "green" "black" "black" "green"
> +  class "Class 1" "Class 2" "Class 2" "Class 1"
> +  403 1.94E-01 2.14E-01 2.11E-01 1.83E-01
> +  409 1.92E-01 1.89E-01 2.00E-01 1.82E-01
> +  415 1.70E-01 1.99E-01 1.94E-01 1.86E-01
> +  420 1.59E-01 1.91E-01 2.16E-01 1.74E-01
> +  426 1.50E-01 1.66E-01 1.72E-01 1.58E-01
> +  432 1.42E-01 1.50E-01 1.62E-01 1.48E-01')
> >
> > c.names <- scan(x.in, what='', nlines=1)  # read column names
> Read 5 items
> > c.options <- read.table(x.in, as.is=TRUE, nrows=2)  # get lines 2-3
> > c.data <- read.table(x.in)  # rest of the data
> > colnames(c.data) <- c.names
> > close(x.in)
> > c.options  # here are lines 2-3
>      V1      V2      V3      V4      V5
> 1 color   green   black   black   green
> 2 class Class 1 Class 2 Class 2 Class 1
> > c.data  # your data
>   wavelength SampleA SampleB SampleC SampleD
> 1        403   0.194   0.214   0.211   0.183
> 2        409   0.192   0.189   0.200   0.182
> 3        415   0.170   0.199   0.194   0.186
> 4        420   0.159   0.191   0.216   0.174
> 5        426   0.150   0.166   0.172   0.158
> 6        432   0.142   0.150   0.162   0.148
> 
> 
> On 10/30/07, Bryan Hanson <hanson at depauw.edu> wrote:
>> Hi Folks... 'been playing with this for a while, with no luck, so I'm hoping
>> someone knows it off the top of their head...  Difficult to find this nuance
>> in the archives, as so many msgs deal with read.csv!
>>
>> I'm trying to read a data file with the following structure (a little piece
>> of the actual data, they are actually csv just didn't paste with the
>> commas):
>> 
>>  wavelength SampleA SampleB SampleC SampleD
>>  color "green" "black" "black" "green"
>>  class "Class 1" "Class 2" "Class 2" "Class 1"
>>  403 1.94E-01 2.14E-01 2.11E-01 1.83E-01
>>  409 1.92E-01 1.89E-01 2.00E-01 1.82E-01
>>  415 1.70E-01 1.99E-01 1.94E-01 1.86E-01
>>  420 1.59E-01 1.91E-01 2.16E-01 1.74E-01
>>  426 1.50E-01 1.66E-01 1.72E-01 1.58E-01
>>  432 1.42E-01 1.50E-01 1.62E-01 1.48E-01
>> 
>> Columns after the first one are sample names.  2nd row is the list of colors
>> to use in later plotting.  3rd row is the class for later manova.  The rest
>> of it is x data in the first column with y1, y2...following for plotting.
>>
>> I can read the file w/o the color or class rows with read.csv just fine,
>> makes a nice data frame with proper data types.  The problem comes when
>> parsing the 2nd and 3rd rows.  Here's the code:
>> 
>> data = read.csv("filename", header=TRUE) # read in data
>> color = data[1,]; color = color[-1] # capture color info & throw out 1st value
>> class = data[2,]; class = class[-1] # capture category info & throw out 1st value
>>
>> cleaned.data = data[-1,] # remove color & category info for matrix operations
>> cleaned.data = cleaned.data[-1,]
>> freq = data[,1] # capture frequency info
>> 
>> What happens is that freq is parsed as factors, and the color and class are
>> parsed as data frames of factors.
>> I need color and class to be characters which I can pass to functions in the
>> typical way one uses colors and levels.
>> I need the freq & the cleaned.data info as numeric for plotting.
>>
>> I don't feel I'm far off from things working, but that's where you all come
>> in!  Seems like an argument of as.something is needed, but the ones I've
>> tried don't work.  Would it help to put color and class above the x,y data
>> in the file, then clean it off?
>>
>> Btw, I'm on a Mac using R 2.6.0.
>>
>> Thanks in advance, Bryan
>> *************
>> Bryan Hanson
>> Professor of Chemistry & Biochemistry
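
On the factor problem described in the original post above, a small sketch of what is probably going on: a column that holds both text ("green") and numbers cannot be numeric, so it is read as text, and R versions before 4.0.0 converted text columns to factors by default. The example uses a made-up two-column fragment of the sample data, and stringsAsFactors = TRUE is set explicitly to mimic that older default. It also shows why as.numeric() alone is not enough on a factor.

```r
# A column that mixes text and numbers cannot be read as numeric.
# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0.
csv.lines <- c("wavelength,SampleA",
               "color,green",
               "403,1.94E-01",
               "409,1.92E-01")
mixed <- read.csv(text = csv.lines, header = TRUE, stringsAsFactors = TRUE)
class(mixed$SampleA)   # a factor, because of the "green" entry

y <- mixed$SampleA[-1]                   # drop the "green" entry; still a factor
codes  <- as.numeric(y)                  # internal level codes, not the data
values <- as.numeric(as.character(y))    # the numbers as written in the file
```

Converting through as.character() recovers the values, but reading the text rows and the numeric block separately avoids the conversion altogether.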
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



