Title: | Delayed Read for 'GDAL' Vector Data Sources |
Version: | 0.2.0 |
Description: | Lazy read for drawings. A 'dplyr' back end for data sources supported by 'GDAL' vector drivers, that allows working with local or remote sources as if they are in-memory data frames. Basic features work with any drawing format ('GDAL vector data source') supported by the 'sf' package. |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | sf (≥ 0.7-0), methods, DBI, tibble, dbplyr, magrittr, dplyr |
URL: | https://github.com/hypertidy/lazysf, https://hypertidy.github.io/lazysf/ |
BugReports: | https://github.com/hypertidy/lazysf/issues |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
Collate: | 'SFSQLConnection.R' 'SFSQLDriver.R' 'SFSQLResult.R' 'connect.R' 'lazysf-package.R' 'lazysf.R' 'utils-pipe.R' 'zzz.R' |
NeedsCompilation: | no |
Packaged: | 2025-04-03 05:51:22 UTC; mdsumner |
Author: | Michael Sumner |
Maintainer: | Michael Sumner <mdsumner@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-03 06:10:02 UTC |
lazysf: Delayed Read for 'GDAL' Vector Data Sources
Description
Lazy read for drawings. A 'dplyr' back end for data sources supported by 'GDAL' vector drivers, that allows working with local or remote sources as if they are in-memory data frames. Basic features work with any drawing format ('GDAL vector data source') supported by the 'sf' package.
Package Options
There is a debug option, options(lazysf.query.debug = TRUE), which, if set, causes the generated SQL statement to be printed before every call to sf::st_read(). It also prints the number of rows actually read.
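A minimal sketch of switching the debug option on for one preview and then restoring the previous setting. It assumes the lazysf and sf packages are installed (the guard skips the read otherwise) and uses the nc.gpkg file shipped with sf; the variable names are illustrative only.

```r
# Turn on query debugging, keeping the previous setting so it can be restored.
old <- options(lazysf.query.debug = TRUE)

# The read below assumes 'lazysf' and 'sf' are installed.
if (requireNamespace("lazysf", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
  # Printing runs the preview query, so the generated SQL should be echoed.
  print(lazysf::lazysf(f))
}

# Record the state while debugging was active, then restore the old setting.
debug_was_on <- isTRUE(getOption("lazysf.query.debug"))
options(old)
```

Restoring the saved options keeps the extra output from leaking into later code in the same session.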
Author(s)
Maintainer: Michael Sumner mdsumner@gmail.com (ORCID)
See Also
Useful links:
Report bugs at https://github.com/hypertidy/lazysf/issues
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
SFSQL
Description
SFSQL driver, used with DBI::dbConnect() to connect to a data source readable by sf.
Usage
SFSQL()
See Also
lazysf, dbConnect
Examples
SFSQL()
Class SFSQLConnection (and methods)
Description
SFSQLConnection objects are created by passing SFSQL() as the first argument to DBI::dbConnect().
They are a subclass of the DBI::DBIConnection class.
The "Usage" section lists the class methods overridden by lazysf.
Usage
## S4 method for signature 'SFSQLConnection'
show(object)
## S4 method for signature 'SFSQLConnection'
dbSendQuery(conn, statement, ...)
## S4 method for signature 'SFSQLConnection,character'
dbReadTable(conn, name, ...)
## S4 method for signature 'SFSQLConnection'
dbListTables(conn, ...)
## S4 method for signature 'SFSQLConnection,ANY'
dbExistsTable(conn, name, ...)
## S4 method for signature 'SFSQLConnection'
dbDisconnect(conn, ...)
See Also
The corresponding generic functions DBI::dbSendQuery(), DBI::dbDisconnect(), DBI::dbReadTable(), DBI::dbExistsTable(), and DBI::dbListTables().
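The connection methods listed in "Usage" can be exercised roughly as follows. This is a sketch rather than a definitive workflow: it assumes the DBI, sf, and lazysf packages are installed, uses the nc.gpkg file shipped with sf, and the variable names are illustrative.

```r
# Check that the packages this sketch assumes are available.
pkgs <- c("DBI", "sf", "lazysf")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

  # Open a connection to the data source through the SFSQL driver.
  con <- DBI::dbConnect(lazysf::SFSQL(), f)
  methods::show(con)

  # List the layers in the source and check that the first one exists.
  layers <- DBI::dbListTables(con)
  has_first <- DBI::dbExistsTable(con, layers[1])

  DBI::dbDisconnect(con)
}
```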
Class SFSQLDriver.
Description
SFSQLDriver objects are created by SFSQL() and used to select the correct method in DBI::dbConnect().
They are a subclass of the DBI::DBIDriver class, and used purely for dispatch.
Usage
## S4 method for signature 'SFSQLDriver,ANY'
dbDataType(dbObj, obj, ...)
## S4 method for signature 'SFSQLDriver'
dbIsValid(dbObj, ...)
## S4 method for signature 'SFSQLDriver'
dbUnloadDriver(drv, ...)
## S4 method for signature 'SFSQLDriver'
dbGetInfo(dbObj, ...)
Details
The "Usage" section lists the class methods overridden by lazysf.
The DBI::dbUnloadDriver() method is a no-op.
Class SFSQLResult (and methods)
Description
SFSQLResult objects are created by DBI::dbSendQuery() or DBI::dbSendStatement(), and encapsulate the result of an SQL statement.
They are a subclass of the DBI::DBIResult class.
The "Usage" section lists the class methods overridden by lazysf.
Usage
## S4 method for signature 'SFSQLResult'
show(object)
## S4 method for signature 'SFSQLResult'
dbFetch(res, n = -1, ...)
## S4 method for signature 'SFSQLResult'
dbClearResult(res, ...)
## S4 method for signature 'SFSQLResult'
dbHasCompleted(res, ...)
See Also
The corresponding generic functions DBI::dbFetch(), DBI::dbClearResult(), and DBI::dbHasCompleted().
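A sketch of the result lifecycle using the methods listed in "Usage": send a query, fetch the rows, then clear the result. It assumes the DBI, sf, and lazysf packages are installed and reuses the query from the dbConnect example; the variable names are illustrative.

```r
# Check that the packages this sketch assumes are available.
pkgs <- c("DBI", "sf", "lazysf")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
  con <- DBI::dbConnect(lazysf::SFSQL(), f)

  # dbSendQuery() returns an SFSQLResult wrapping the statement.
  res <- DBI::dbSendQuery(con, 'SELECT * FROM "nc.gpkg"')
  methods::show(res)

  # Fetch all rows (n = -1 is the default), then check and release the result.
  dat <- DBI::dbFetch(res)
  DBI::dbHasCompleted(res)
  DBI::dbClearResult(res)

  DBI::dbDisconnect(con)
}
```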
dbConnect
Description
dbConnect for drawings that may be read by package sf
Usage
## S4 method for signature 'SFSQLDriver'
dbConnect(drv, DSN = "", readonly = TRUE, ...)
Arguments
drv |
SFSQLDriver created by |
DSN |
data source name, may be a file, or folder path, database connection string, or URL |
readonly |
open in readonly mode ( |
... |
ignored |
Details
The available 'OGRSQL' dialect is documented by GDAL: https://gdal.org/user/ogr_sql_dialect.html
Examples
afile <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
db <- dbConnect(SFSQL(), afile)
dbSendQuery(db, 'SELECT * FROM "nc.gpkg"')
Delayed (lazy) read for GDAL vector
Description
A lazy data frame for GDAL drawings ('vector data sources'). lazysf is DBI
compatible and designed to work with dplyr. It should work with any data source
(file, url, connection string) readable by the sf package function st_read().
Usage
lazysf(x, layer, ...)
## S3 method for class 'character'
lazysf(x, layer, ..., query = NA)
## S3 method for class 'SFSQLConnection'
lazysf(x, layer, ..., query = NA)
Arguments
x |
the data source name (file path, url, or database connection string
|
layer |
layer name (varies by driver, may be a file name without
extension); in case |
... |
ignored |
query |
SQL query to pass in directly |
Details
Lazy means that the usual behaviour of reading the entirety of a data source into memory is avoided. Printing the output results in a preview query being run and displayed (the top few rows of data).
The output of lazysf() is a 'tbl_SFSQLConnection' that extends 'tbl_dbi' and
may be used with functions and workflows in the normal DBI way, see SFSQL() for
the lazysf DBI support.
The kind of query that may be run depends on the format; see the list on the GDAL vector drivers page. For some details see the GDALSQL vignette.
When dplyr is attached the lazy data frame can be used with the usual verbs
(filter, select, distinct, mutate, transmute, arrange, left_join, pull,
collect etc.). To see the result as a SQL query rather than a data frame
preview use dplyr::show_query().
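A brief sketch of inspecting the generated SQL instead of the data preview. It assumes the dplyr, sf, and lazysf packages are installed and reuses the nc.gpkg pipeline from the Examples; the variable names are illustrative.

```r
# Check that the packages this sketch assumes are available.
pkgs <- c("dplyr", "sf", "lazysf")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(dplyr)
  f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)

  # Build a lazy pipeline; no rows are read at this point.
  lsf <- lazysf::lazysf(f) %>%
    select(AREA, FIPS, geom) %>%
    filter(AREA < 0.1)

  # Print the SQL that would be sent to GDAL rather than a data preview.
  show_query(lsf)
}
```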
To obtain an in-memory data frame use an explicit collect() or st_as_sf().
A call to collect() is triggered by st_as_sf() and will add the sf class
to the output. A result that does not contain a geometry column cannot be
converted to an sf data frame. Using collect() on its own returns an
unclassed data.frame and may include a classed sfc geometry column.
As well as collect() it's also possible to use tibble::as_tibble() or
as.data.frame() or pull(), which all force computation and retrieve the
result.
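The ways of forcing computation described above might look like this in practice. This is a sketch under the assumption that the dplyr, sf, and lazysf packages are installed; it reuses the nc.gpkg pipeline from the Examples, and the variable names are illustrative.

```r
# Check that the packages this sketch assumes are available.
pkgs <- c("dplyr", "sf", "lazysf")
have_pkgs <- all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs) {
  library(dplyr)
  f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
  lsf <- lazysf::lazysf(f) %>%
    select(AREA, FIPS, geom) %>%
    filter(AREA < 0.1)

  # Each call below runs the query and brings the result into memory.
  local_df <- collect(lsf)          # data frame, may carry an sfc column
  base_df  <- as.data.frame(lsf)    # base data.frame
  fips     <- pull(lsf, FIPS)       # a single column as a vector
  sf_df    <- sf::st_as_sf(lsf)     # sf data frame (needs a geometry column)
}
```

Choosing st_as_sf() only makes sense when the query keeps the geometry column, as this one does with geom.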
Value
a 'tbl_SFSQLConnection', extending 'tbl_lazy' (something that works
with dplyr verbs, and only shows a preview until you commit the result via
collect()); see Details
Examples
# online sources can work
geojson <- file.path("https://raw.githubusercontent.com/SymbolixAU",
                     "geojsonsf/master/inst/examples/geo_melbourne.geojson")
lazysf(geojson)
## normal file stuff
## (Geopackage is an actual database so with SELECT we must be explicit re geom-column)
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
lazysf(f)
lazysf(f, query = "SELECT AREA, FIPS, geom FROM \"nc.gpkg\" WHERE AREA < 0.1")
lazysf(f, layer = "nc.gpkg") %>% dplyr::select(AREA, FIPS, geom) %>% dplyr::filter(AREA < 0.1)
## the famous ESRI Shapefile (not an actual database)
## so if we SELECT we must be ex
shp <- lazysf(system.file("shape/nc.shp", package = "sf", mustWork = TRUE))
library(dplyr)
shp %>%
  filter(NAME %LIKE% 'A%') %>%
  mutate(abc = 1.3) %>%
  select(abc, NAME, `_ogr_geometry_`) %>%
  arrange(desc(NAME)) #%>% show_query()
## a multi-layer file
system.file("extdata/multi.gpkg", package = "lazysf", mustWork = TRUE)
Force computation of a GDAL query
Description
Convert lazysf to an in memory data frame or sf object
Usage
## S3 method for class 'tbl_SFSQLConnection'
st_as_sf(x, ...)
collect(x, ...)
Arguments
x |
output of |
... |
passed to |
Format
An object of class function of length 1.
Details
collect() retrieves data into a local table, preserving grouping and ordering.
st_as_sf() retrieves data into a local sf data frame (will succeed only if there is a geometry column of class sfc).
Value
a data frame from collect(), sf data frame from st_as_sf() (only if it contains an sfc geometry column)
See Also
lazysf
Examples
f <- system.file("gpkg/nc.gpkg", package = "sf", mustWork = TRUE)
lsf <- lazysf(f) %>% dplyr::select(AREA, FIPS, geom) %>% dplyr::filter(AREA < 0.1)
st_as_sf(lsf)