[R] reading in csv files, some of which have column names and some of which don't

Sarah Goslee @@r@h@go@|ee @end|ng |rom gm@||@com
Tue Aug 13 20:39:43 CEST 2019


Like Bert, I can't see an easy approach for files whose data are
character rather than numeric. But here's a simple approach for
distinguishing files that may have a character header row but contain
numeric data.



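readheader <- function(filename) {

    # read only the first line, as data rather than as column names
    possibleheader <- read.table(filename, nrows=1, sep=",", header=FALSE)

    if(all(sapply(possibleheader, is.numeric))) {
        # every field in the first row is numeric: no header
        infile <- read.table(filename, sep=",", header=FALSE)
    } else {
        # at least one non-numeric field: treat the first row as a header
        infile <- read.table(filename, sep=",", header=TRUE)
    }

    infile
}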



#### file noheader.csv ####

1,1,1
2,2,2
3,3,3


#### file hasheader.csv ####

a,b,c
1,1,1
2,2,2
3,3,3

########################

> readheader("noheader.csv")
  V1 V2 V3
1  1  1  1
2  2  2  2
3  3  3  3
> readheader("hasheader.csv")
  a b c
1 1 1 1
2 2 2 2
3 3 3 3

Sarah

On Tue, Aug 13, 2019 at 2:00 PM Christopher W Ryan <cryan using binghamton.edu> wrote:
>
> Alas, we spend so much time and energy on data wrangling . . . .
>
> I'm given a collection of csv files to work with---"found data". They arose
> via saving Excel files to csv format. They all have the same column
> structure, except that some were saved with column names and some were not.
>
> I have a code snippet that I've used before to traverse a directory and
> read into R all the csv files of a certain filename pattern within it, and
> combine them all into a single dataframe:
>
> library(dplyr)
> ## specify the csv files that I will want to access
> files.to.read <- list.files(path = "H:/EH", pattern = "WICLeadLabOrdersDone.+",
>     all.files = FALSE, full.names = TRUE, recursive = FALSE,
>     ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)
>
> ## function to read csv files back in
> read.csv.files <- function(filename) {
>     bb <- read.csv(filename, colClasses = "character", header = TRUE)
>     bb
> }
>
> ## now read the csv files, as all character
> b <- lapply(files.to.read, read.csv.files)
>
> ddd <- bind_rows(b)
>
> But this assumes that all files have column names in their first row. In
> this case, some don't. Any advice how to handle it so that those with
> column names and those without are read in and combined properly? The only
> thing I've come up with so far is:
>
> ## function to read csv files back in
> ## Unfortunately, some of the csv files are saved with column headers, and
> ## some are saved without them.
> ## This presents a problem when defining the function to read them:
> ## header = TRUE or header = FALSE?
> ## The best solution I can think of as of 13 August 2019 is to use
> ## header = FALSE and skip the first row of every file. This will
> ## sacrifice one record from each csv of about 80 files
> read.csv.files <- function(filename) {
>     bb <- read.csv(filename, colClasses = "character", header = FALSE,
>                    skip = 1)
>     bb
> }
>
> This sacrifices about 80 out of about 1600 records. For my purposes in this
> instance, this may be acceptable, but of course I'd rather not.
>
> Thanks.
>
> --Chris Ryan
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Sarah Goslee (she/her)
http://www.numberwright.com
