[R] Seeking to validate data quality requirements - should I develop a package?

Architector Data Tools dtwadd at googlemail.com
Fri Aug 4 11:56:17 CEST 2017


I am planning to develop an R package to manage all aspects of data
quality. I am very experienced in data quality, but fairly new to R. I
have looked for a suitable data quality package, and am surprised to
find little that suits my requirements.  Developing the package
would be an ambitious effort involving several contributors (whom I
have already identified, and who also do not have much R experience
yet), so I am seeking some confidence that the effort is worthwhile.

The package will be highly configurable so it can be applied to pretty
much any situation, and will implement sophisticated data quality
capabilities, including:

(a) DEFINITION: integration with a data dictionary (perhaps metaData),
and with highly configurable and expressive data quality rules

(b) MONITORING & DETECTION: automated data quality monitoring and
alerting against any data source. Automatically raise and update
quality issues

(c) ANALYSIS & ROOT CAUSE: data quality dashboard, alerts,
drill-downs, plot trends, including perhaps a machine learning aspect
that detects noteworthy events in quality measurements for inclusion
in executive reports

(d) WORKFLOW: basic data quality management workflow (i.e. implement
'inbox' and 'actions', probably via Shiny)

The requirements will be drawn from my professional experience (as
interim head of data quality at a global bank), although this project
is not sponsored by either my employer or any of my consulting
clients. I do, however, expect the package to be of interest to
financial services organisations that rely on good-quality data for
their financial and risk models, and to be useful for any other
process that relies on good data.

To sum up, if anyone can point to a data quality package that means I
don’t have to develop one, that would be great. Alternatively, any
comments of support would also be very useful!

David

David Twaddell
Architector Data Tools
Tel: +44 20 3239 1099 | +44 7447 936 984
Web: www.architector.co.uk


