[R] reading in multiple data sets in 2 loops

Boris Steipe boris.steipe at utoronto.ca
Sun Feb 7 02:18:22 CET 2016


Computing filenames is a dangerous, backwards approach. If you already _have_ files, it's wrong to create filenames from assumptions. Rather, you need to capture the existing filenames with an appropriate call to list.files(), and then process that vector. Computing filenames only has a place when you are creating new files.
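A minimal sketch of that approach, assuming all files sit in one directory and follow the naming pattern from the original question. The temporary directory and the three dummy files are hypothetical stand-ins so the example runs anywhere; with real data, dataDir would point at the actual folder.

```r
# Hypothetical setup: write three small CSV files into a temporary
# directory so the example is self-contained.
dataDir <- file.path(tempdir(), "second_gen")
dir.create(dataDir, showWarnings = FALSE)
for (suffix in c("1_2", "1_3", "2_3")) {
  write.csv(data.frame(id = 1:3, y = c(1.5, 2.5, 3.5)),
            file.path(dataDir,
                      paste0("pheno_1000ind_4000m_add_h70_prog_", suffix, ".csv")),
            row.names = FALSE)
}

# Capture the filenames that actually exist instead of computing them.
# list.files() returns them sorted alphabetically.
fileNames <- list.files(dataDir,
                        pattern = "^pheno_1000ind_4000m_add_h70_prog_[0-9]+_[0-9]+\\.csv$",
                        full.names = TRUE)

# Read every file into one list and name each element after its file.
data <- lapply(fileNames, read.csv)
names(data) <- basename(fileNames)
```

A single named list keeps the 1000+ data frames together, so they can be processed with further lapply() calls instead of juggling d1, d2, d3, ... as separate variables.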

Cheers,
Boris




On Feb 6, 2016, at 6:27 PM, William Dunlap via R-help <r-help at r-project.org> wrote:

>    I tried the following but it does not work:
> 
>    data <- lapply(
>     paste(("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",[1:2],"_",[2:3],".csv",sep=''),
>    read.csv, header=TRUE, sep=',' )
>    names(data) <- paste("d", LETTERS[1:3], sep='')
> 
> I tried that and R complained about syntax errors - unexpected commas,
> mismatched parentheses, illegal square brackets, etc.
> 
> Using lapply like this is a perfectly fine way to solve the problem, but you
> need to get the details right.  I find it easier to break that statement
> into parts and make sure each part is working.  E.g., after a minimal
> cleanup of your code the file names would be computed as
>    fileNames <- paste("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",
>                       1:2, "_", 2:3, ".csv", sep='')
>    print(fileNames) # do they look right?
>    # You said you wanted 1_2, 1_3, 2_3, but paste() pairs its arguments
>    # elementwise, so this gives you only 2 of them (1_2 and 2_3).
> Or perhaps you want all the files in that directory with a given pattern:
>    fileNames <- dir("C:/Research3/simulation1/second_gen",
>                     pattern="^pheno_1000ind_4000m_add_h70_prog_[[:digit:]]+_[[:digit:]]+\\.csv$",
>                     full.names=TRUE, ignore.case=TRUE)
>    head(fileNames) # keep at it until the fileNames vector looks good
>    tail(fileNames)
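If the names really must be computed (listing the files that exist is usually safer), one way to get every pair i < j (1_2, 1_3, 2_3, ...) is combn(). This is a sketch under an assumption: the upper index nMax = 3 is hypothetical and would have to match the real data.

```r
# Hypothetical upper limit on the indices used in the filenames.
nMax <- 3

# combn() returns a 2-row matrix with one column per pair i < j:
# (1,2), (1,3), (2,3) for nMax = 3.
pairs <- combn(nMax, 2)

fileNames <- paste0("C:/Research3/simulation1/second_gen/",
                    "pheno_1000ind_4000m_add_h70_prog_",
                    pairs[1, ], "_", pairs[2, ], ".csv")
print(fileNames)

# For comparison: paste0() recycles its arguments elementwise,
# so 1:2 and 2:3 only yield the pairs 1_2 and 2_3.
elementwise <- paste0("prog_", 1:2, "_", 2:3)
print(elementwise)
```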
> 
> Then read the data from the files with
>    data <- lapply(fileNames, read.csv, header=TRUE, sep=",")
> If there are errors reading the files in csv format, you could try
>    data <- lapply(fileNames, function(fileName) {
>        cat(fileName, "\n")
>        read.csv(fileName, header=TRUE, sep=",")
>    })
> so you can see the name of the first offending file.
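A variation on the cat() idea, assuming you would rather have the failing filename inside the error message itself: wrap read.csv() in tryCatch() and re-raise the error with the name attached. readOne is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: read one CSV file; on failure, re-raise the
# error with the offending filename included in the message.
readOne <- function(fileName) {
  tryCatch(
    read.csv(fileName, header = TRUE),
    error = function(e) {
      stop("could not read '", fileName, "': ", conditionMessage(e),
           call. = FALSE)
    }
  )
}

# Usage: lapply() stops at the first bad file and names it.
# data <- lapply(fileNames, readOne)
```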
> 
> When you attach names, you probably want to get them from the fileNames
> variable, perhaps just the digits part:
>    names(data) <- gsub("^.*_([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1",
>                        fileNames, ignore.case=TRUE)
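A quick check of the naming step on two hypothetical filenames. A greedy ".*" placed directly before the digit group gives back only as many characters as it must, so a multi-digit first index loses its leading digits; anchoring on the underscore in front of the first index keeps them. sub() is enough here because the anchored pattern can match only once.

```r
# Hypothetical filenames, one with multi-digit indices.
fileNames <- c("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_1_2.csv",
               "C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_12_345.csv")

# Greedy ".*" right before the digits: "12_345" is cut down to "2_345".
greedyNames <- sub("^.*([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", fileNames)

# Requiring an underscore before the first index keeps every digit.
fixedNames <- sub("^.*_([[:digit:]]+_[[:digit:]]+)\\.csv$", "d_\\1", fileNames)

print(greedyNames)
print(fixedNames)
```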
> 
> 
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> On Fri, Feb 5, 2016 at 9:53 PM, Reka Howard <howardr at iastate.edu> wrote:
> 
>> Hello,
>> I have over 1000 csv data sets I need to read into R, so I want to read
>> them in using a loop. The data sets are named as
>> pheno_1000ind_4000m_add_h70_prog_1_2.csv,
>> pheno_1000ind_4000m_add_h70_prog_1_3.csv, ... so I need 2 loops (for the
>> last 2 numbers in the names). What I would like to do is the following:
>> 
>> setwd("C:/Research3/simulation1/second_gen")
>> d1<-read.csv("pheno_1000ind_4000m_add_h70_prog_1_2.csv")
>> d2<-read.csv("pheno_1000ind_4000m_add_h70_prog_1_3.csv")
>> d3<-read.csv("pheno_1000ind_4000m_add_h70_prog_2_3.csv")
>> .
>> .
>> .
>> 
>> I am wondering how I can accomplish this with a loop. Any suggestion is
>> appreciated!
>> I tried the following but it does not work:
>> 
>> data <- lapply(
>> 
>> paste(("C:/Research3/simulation1/second_gen/pheno_1000ind_4000m_add_h70_prog_",[1:2],"_",[2:3],".csv",sep=''),
>> read.csv, header=TRUE, sep=',' )
>> names(data) <- paste("d", LETTERS[1:3], sep='')
>> 
>> Thanks!
>> Reka
>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 


