[Rd] alternative read.arff function for the package foreign

Juan Manuel Barreneche jumanbar at gmail.com
Thu Oct 22 02:45:23 CEST 2015


Hello everyone, I guess this is really directed to the R Core Team, but I
understand that this is the best channel to submit this (please correct me
if I'm wrong!).

I would like to submit a function for consideration, as an upgrade for the
current read.arff in package foreign. The code is on GitHub:

https://raw.githubusercontent.com/jumanbar/misc/master/R/read.arff.R

This function is a modified version of the one found in the foreign
package. These changes aim to correct a problem I found with the standard
read.arff: the levels of factors do not match what is explicitly written in
the original arff file.

For example, if a nominal attribute in some arff data file has this line in
the header:

@attribute X {'A', 'B', 'C'}

but the data contain only instances of 'A' and 'B' and none of 'C', then
what R imports is:

dat <- read.arff("data.arff")
levels(dat$X)
[1] "a"  "b"

Not only are the levels in lowercase, but one level has also disappeared.
This is troublesome, especially if I wish to export my data frame to an
arff file using write.arff.
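
The missing level can be illustrated in base R without any arff file. This
is only a sketch of the general behaviour, not the code in foreign: a factor
whose levels are inferred from the values keeps only what occurs in the
data, while a factor given the declared levels keeps all of them. It does
not cover the lowercase conversion, and the vector x is made up for the
illustration.

```r
# Values as they might appear in the data section: 'C' never occurs.
x <- c("A", "B", "A")

# Levels inferred from the data: the declared but unused level is lost.
inferred <- factor(x)
levels(inferred)

# Levels taken from the header declaration: all three survive.
declared <- factor(x, levels = c("A", "B", "C"))
levels(declared)
```

Unused levels matter on export because a writer that lists the factor's
levels can only list the ones the factor still carries.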

With this version of read.arff, when dealing with the aforementioned case,
I get:

levels(dat$X)
[1] "A"  "B"  "C"
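
One way to obtain that result is to take the levels from the header line
itself rather than from the data. The following is a minimal sketch under
that assumption; parse_nominal_levels is a hypothetical helper written for
this message, not a function in foreign and not necessarily what the linked
file does.

```r
# Hypothetical helper: extract the declared levels from a nominal
# '@attribute' line, keeping their case and order as written.
# Limitation: a comma inside a quoted value would be split incorrectly.
parse_nominal_levels <- function(line) {
  # Keep the text between the first '{' and the last '}'.
  body <- sub("^[^{]*\\{(.*)\\}[^}]*$", "\\1", line)
  # Split on commas, then strip surrounding whitespace and quotes.
  parts <- trimws(strsplit(body, ",", fixed = TRUE)[[1]])
  gsub("^['\"]|['\"]$", "", parts)
}

declared_levels <- parse_nominal_levels("@attribute X {'A', 'B', 'C'}")

# Build the factor from the data values using the declared levels.
X <- factor(c("A", "B", "A"), levels = declared_levels)
levels(X)
```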

I can also set a couple of parameters that help me tune my workflow to
better fit my needs (for example, reading only a limited number of lines
when I just want to run a couple of quick tests and therefore don't need
the whole dataset).
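
As a rough illustration of reading only a few rows, here is a sketch in
base R. The name read_arff_head and its argument n are hypothetical and are
not the interface of the function linked above. It assumes a dense data
section with comma-separated values, and it ignores the attribute
declarations entirely, so columns come back with default names and types.

```r
# Hypothetical helper: return the first 'n' data rows of an arff file.
# For simplicity it reads all lines first and then keeps only 'n' rows.
read_arff_head <- function(file, n = 10) {
  lines <- readLines(file)
  # Locate the '@data' marker, ignoring case and leading whitespace.
  start <- grep("^[[:space:]]*@data", lines, ignore.case = TRUE)[1]
  data_lines <- lines[-seq_len(start)]
  # Drop blank lines and arff comments, which start with '%'.
  data_lines <- data_lines[!grepl("^[[:space:]]*(%|$)", data_lines)]
  read.csv(text = head(data_lines, n), header = FALSE, quote = "'\"",
           strip.white = TRUE, stringsAsFactors = FALSE)
}

# Small made-up file to try it on.
tmp <- tempfile(fileext = ".arff")
writeLines(c("@relation demo",
             "@attribute X {'A', 'B', 'C'}",
             "@attribute Y numeric",
             "@data",
             "'A', 1",
             "'B', 2",
             "'A', 3"), tmp)
first_two <- read_arff_head(tmp, n = 2)
```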

Thanks for your time,
Juan Manuel

--

MSc. Juan M. Barreneche Sarasola
