[R] ESRI shape file import and time-space models

Tue Feb 18 23:13:03 CET 2003

Thanks to R. Herold for the suggested change from readBin to readChar for
the field type in the field header descriptions.  The code below is the
revised read.dbf function.  Fan's odbc.dbase function is much faster than my
read.dbf() function, and it is definitely better for large files.  Thanks
also to Fan for the comments.  Finally, I am working on putting together a
package to read and write shapefiles.  

Regards,
Benjamin Stabler
Transportation Planning Analysis Unit
Oregon Department of Transportation
555 13th Street NE, Suite 2
Salem, OR 97301  Ph: 503-986-4104

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~
#Read DBF format
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~~~~~~~~~~~~~

read.dbf <- function(dbf.name) {

	infile<-file(dbf.name,"rb")

	#Header
	file.version <- readBin(infile,integer(), 1, size=1, endian="little")
	file.year <- readBin(infile,integer(), 1, size=1, endian="little")
	file.month <- readBin(infile,integer(), 1, size=1, endian="little")
	file.day <- readBin(infile,integer(), 1, size=1, endian="little")
	num.records <- readBin(infile,integer(), 1, size=4, endian="little")
	header.length <- readBin(infile,integer(), 1, size=2, endian="little")
	record.length <- readBin(infile,integer(), 1, size=2, endian="little")
	#Skip the 20 remaining bytes of the 32 byte file header
	file.temp <- readBin(infile,integer(), 20, size=1, endian="little")
	header <- list(file.version,file.year, file.month, file.day,
		num.records, header.length, record.length)
	names(header) <- c("file.version","file.year","file.month","file.day",
		"num.records","header.length","record.length")
	rm(file.version,file.year, file.month, file.day, num.records,
		header.length, record.length)

	#Calculate the number of fields
	num.fields <- (header$header.length-32-1)/32
	field.name <- NULL
	field.type <- NULL
	field.length <- NULL

	#Field Descriptions (32 bytes each)
	for (i in 1:num.fields) {
		field.name.test <- readBin(infile,character(), 1, size=10,
			endian="little")
		field.name <- c(field.name,field.name.test)
		if (nchar(field.name.test)!=10) {
			file.temp <- readBin(infile,integer(),
				10-(nchar(field.name.test)), 1, endian="little")
		}
		field.type <- c(field.type,readChar(infile, 1))
		file.temp <- readBin(infile,integer(), 4, 1, endian="little")
		field.length <- c(field.length,
			readBin(infile,integer(), 1, 1, endian="little"))
		file.temp <- readBin(infile,integer(), 15, 1, endian="little")
	}

	#Create a table of the field info
	fields <- data.frame(NAME=field.name,TYPE=field.type,LENGTH=field.length)
	#Set all fields with length<0 equal to correct number of characters
	fields$LENGTH[fields$LENGTH<0]<-(256+fields$LENGTH[fields$LENGTH<0])
	#Read in end of attribute descriptions terminator - should be
	#integer value 13
	file.temp <- readBin(infile,integer(), 1, 1, endian="little")
	#Increase the length of field 1 by one to account for the space at
	#the beginning of each record
	fields$LENGTH[1]<-fields$LENGTH[1]+1
	#Add fields to the header list
	header <- c(header,fields=NULL)
	header$fields <- fields

	#Read in all the records data and the end of file value - should be
	#value 26
	all.records <- readBin(infile, integer(),
		header$num.records*header$record.length, size=1, endian="little")
	file.temp <- readBin(infile,integer(), 1, 1, endian="little")
	close(infile)

	#Compress the binary values using run length encoding
	all.records <- rle(all.records)
	#Swap ASCII decimal codes for ASCII character codes
	#Codes 34 and 96 (double quote and backtick) are not in the table, and
	#code 92 (backslash) maps to a space; unmatched codes also become a
	#space through nomatch=1 below
	ascii <- c(32,46,48:57,65:90,97:122,33,35:45,47,58:64,91:95,123:126)
	ascii.values <- c(" ",".",as.character(0:9),LETTERS,letters,
		"!","#","$","%","&","'","(",")","*","+",",","-",
		"/",":",";","<","=",">","?","@","["," ","]","^","_","{","|","}","~")
	all.records$values <-
ascii.values[match(as.character(all.records$values),as.character(ascii),
nomatch=1)]
	all.records <- inverse.rle(all.records)

	#Create a matrix of the ASCII data by record
	base.data <- t(matrix(all.records,header$record.length,header$num.records))
	rm(all.records)

	#Function to collapse the ASCII codes, string split them and replace
	#" " with ""
	format.record <- function(record) {
		record <- paste(record,collapse="")
		#Start and end position of each field within the record
		field.end <- cumsum(fields$LENGTH)
		field.start <- c(1, field.end[-length(field.end)]+1)
		record <- substring(record, field.start, field.end)
		record <- gsub(" + ","", record)
		record
	}
	#Format the base.data ASCII record stream
	dbf <- as.data.frame(t(apply(base.data,1,format.record)))
	#Set the numeric fields to numeric
	for (i in 1:ncol(dbf)) {
		if(fields$TYPE[i]=="C") { dbf[[i]] <- as.character(dbf[[i]]) }
		if(fields$TYPE[i]=="N") { dbf[[i]] <- as.numeric(as.character(dbf[[i]])) }
		if(fields$TYPE[i]=="F") {
			dbf[[i]] <- as.numeric(as.character(dbf[[i]]))
			warning("Possible trouble converting numeric field in the DBF\n")
		}
	}
	colnames(dbf) <- as.character(fields$NAME)
	list(dbf=dbf, header=header)
}
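
As a check on the field descriptor layout that the loop above walks through,
the 32-byte header entry quoted in Ralf's message (field "KLIN", type "C",
length 2) can be decoded directly from a raw vector.  This is only a sketch:
parse.field.descriptor is a hypothetical helper, not part of read.dbf, and it
assumes the layout implied by the reads above (an 11-byte zero-padded name, a
1-byte type code, 4 bytes skipped, then a 1-byte length).

```r
# Hypothetical helper (not part of read.dbf): decode one 32-byte field
# descriptor, assuming the layout described in the lead-in.
parse.field.descriptor <- function(desc) {
	stopifnot(is.raw(desc), length(desc) == 32)
	# Bytes 1-11: field name, cut at the first zero byte
	name.bytes <- desc[1:11]
	nul <- which(name.bytes == as.raw(0))
	if (length(nul) > 0) name.bytes <- name.bytes[seq_len(nul[1] - 1)]
	list(
		name   = rawToChar(name.bytes),
		# Byte 12: one-character type code, not zero-terminated
		type   = rawToChar(desc[12]),
		# Byte 17: field length; as.integer() on a raw byte is unsigned (0-255)
		length = as.integer(desc[17])
	)
}

# The header entry from the quoted message, as raw bytes
klin <- as.raw(c(0x4B,0x4C,0x49,0x4E, rep(0,7), 0x43, 0x2B,0,0,0,
	0x02, rep(0,15)))
fd <- parse.field.descriptor(klin)
print(fd)
```

In this entry the type byte 43 is followed by 2B rather than 00, so reading
it as a zero-terminated string runs on past the type code; readChar(infile, 1)
takes exactly one byte instead.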

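The length<0 correction in read.dbf is there because readBin() treats
one-byte integers as signed by default, so a field length above 127 comes
back negative.  A small sketch of the effect on a single raw byte follows;
the signed=FALSE argument is an alternative that readBin() offers for sizes
1 and 2, not what the function above uses.

```r
# One byte with value 200, as it would appear for a 200-character field
b <- as.raw(200)

# Default: size=1 integers are read as signed, so 200 comes back as -56
signed.len <- readBin(b, integer(), 1, size=1, endian="little")

# The correction used in read.dbf: add 256 to negative lengths
fixed.len <- if (signed.len < 0) 256 + signed.len else signed.len

# Alternative: ask readBin for an unsigned byte directly
unsigned.len <- readBin(b, integer(), 1, size=1, signed=FALSE,
	endian="little")

print(c(signed.len, fixed.len, unsigned.len))
```
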
>-----Original Message-----
>From: R. Herold [mailto:ralf.herold at charite.de]
>Sent: Sunday, February 16, 2003 8:49 AM
>To: r-help at stat.math.ethz.ch
>Cc: STABLER Benjamin
>Subject: Re: Re: [R] ESRI shape file import and time-space models
>
>
>Thanks for providing your functions, especially those for 
>reading and writing dBase files (read.dbf and write.dbf), 
>which presumably are of general interest because there is 
>no other implementation for reading and writing these 
>formats (apart from ODBC), as far as I know. 
>
>However, I suggest changing one byte character readBin to 
>readChar as the latter does not expect zero-terminated
>strings which were not present in my dBase-III-files' headers.
>One such header entry for example was (hex): 
>
>4B 4C 49 4e 00 00 00 00 00 00 00 43 2B 00 00 00
>02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
>
>(field "KLIN", type "C" [note "43" followed by "2B", 
> not "00"], width/length 2, padded to size 32)
>
>## ---------------------------------------------------
>## From read.dbf: 
>## Field Descriptions (32 bytes each)
>for (i in 1:num.fields) {
>  field.name.test <- readBin (infile, character(), 1, size=10, endian="little")
>  field.name      <- c (field.name, field.name.test)
>  if (nchar (field.name.test)!=10) {
>    file.temp <- readBin (infile,integer(), 10 - (nchar (field.name.test)), 1, endian="little")
>  }
>  ## RH 2003-02-16: replaced readBin by readChar in next line 
>  field.type   <- c (field.type, readChar (infile, 1))  
>  ## RH 2003-02-16: incremented by 1 to 4 items in next line 
>  ## to compensate for above change 
>  file.temp    <- readBin (infile, integer(),  4, 1, endian="little")  
>  field.length <- c (field.length, readBin (infile, integer(), 1, 1, endian="little")) 
>  file.temp    <- readBin (infile, integer(), 15, 1, endian="little")
>}
>## ---------------------------------------------------
>
>An enhancement might be to also set the appropriate type for 
>date fields, maybe like this (although I don't know internals
>of dBase date and time storage variants): 
>
>## ---------------------------------------------------
>## From read.dbf: 
>## Set the numeric fields to numeric
>for (i in 1:ncol(dbf)) {
>  ## RH 2003-02-16: added next line for date type setting 
>  if(fields$TYPE[i]=="D") {dbf[,i] <- strptime (as.character (dbf[,i]), format="%Y%m%d")}
>  if(fields$TYPE[i]=="C") {dbf[,i] <- as.character (dbf[,i])}
>  if(fields$TYPE[i]=="N") {dbf[,i] <- as.numeric (as.character (dbf[,i]))}
>  if(fields$TYPE[i]=="F") {dbf[,i] <- as.numeric (as.character (dbf[,i]))
>                           warning("Possible trouble converting numeric field in the DBF\n") 
>                          } 
>} 
>## ---------------------------------------------------
>
>Thanks and greetings - Ralf Herold 
>
>-- Dr. med. Ralf Herold  
>| Koordinationszentrale Kompetenznetz
>| Pädiatrische Onkologie und Hämatologie  
>| http://www.kinderkrebsinfo.de/   
>| Charité Campus Virchow-Klinikum  
>| Medizinische Fakultät Humboldt-Universität  
>| D-13353 Berlin, Augustenburger Platz 1  
>| Raum 4.3425 4. Etage Mittelallee 8  
>| Tel. +49 (30) 450-566834 Fax -566906  
>| mailto:ralf.herold at charite.de  
>
>> ----- Original Message ----- 
>> From: Benjamin.STABLER at odot.state.or.us
>> To: Ekkehardt.Altpeter at bag.admin.ch
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] ESRI shape file import and time-space models
>> Date: Fri, 14 Feb 2003 08:29:12 -0800
>[...]
>> Attached are some functions that I wrote to read and write 
>> shapefiles and
>> dbfs easily from within R.  You do not need any additional 
>> libraries or C
>> code.  I am still working out a few bugs but I have tested it 
>[...]
>> Benjamin Stabler
>> Transportation Planning Analysis Unit
>> Oregon Department of Transportation
>> 555 13th Street NE, Suite 2
>> Salem, OR 97301  Ph: 503-986-4104
>[...]
>