[R] exporting tables from an access database using parallel foreach

Vivek Sutradhara viveksutra at gmail.com
Fri Nov 20 18:38:14 CET 2015


Hi
I want to extract data from a Microsoft access database having many tables
with more than 1e7 rows. I find that the following code works to export a
table to a rds file :
#####################
setwd('C:/sFolder')
library(RODBC);library(DBI)
ch<-odbcConnect("sample")

#No. of rows in the table not known
rowN<-1e6  # no. of rows defined
db<-sqlFetch(ch,"Table1",max=rowN,as.is=TRUE)
file<-paste0('Table1',1,'.rds')
df1<-saveRDS(db,file1)

rm(db);gc()   # garbage collection to free up the memory

# To successively obtain more chunks from the access database
for (i in 2:10) {
  rm(df);gc()
  df<-sqlFetchMore(ch,"Table1",max=rowN,as.is=TRUE)
  file<-paste0('Table1',i,'.rds')
  df1<-saveRDS(df,file)
  if (dim(df)[1]<rowN)
    break
}
rm(df);gc()
odbcCloseAll()
##############################

I would like to know the following :
1. Is there any way to extract data from a table by just specifying the row
number range. I have extracted data before. Instead of repeating the
operations, I would just like to obtain data from, let's say, 8e6 to 9e6
row range. I cannot do this now. I have to successively use the
sqlfetchMore command. I would like to know if it is possible to straight
away go to the 8e6 to 9e6 row range.
2. Is it possible to use the foreach package in the extraction step (in
place of the for loop above). I am planning to use the foreach command in
parallel later for processing the data in the multiple files. I just wonder
if it is possible to do parallel processing for the data extraction also.
Thanks,
Vivek Sutradhara

	[[alternative HTML version deleted]]



More information about the R-help mailing list