[R] Seeking to validate data quality requirements - should I develop a package?

Bert Gunter bgunter.4567 at gmail.com
Fri Aug 4 16:15:45 CEST 2017


Sounds like you'll be reinventing square wheels.

Searching "data quality package" on rseek.org brought up many hits.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Fri, Aug 4, 2017 at 2:56 AM, Architector Data Tools via R-help
<r-help at r-project.org> wrote:
> I am planning to develop an R package to manage all aspects of data
> quality. I am very experienced in data quality, but fairly new to R. I
> have tried to find a suitable data quality package, and am surprised
> not to find much to suit my requirements.  Developing the package
> would be an ambitious effort, involving several contributors (whom I
> have already identified, and who also have little R experience so
> far). So I am seeking some confidence that the effort is worthwhile.
>
> The package will be highly configurable so it can be applied to pretty
> much any situation, and will implement sophisticated data quality
> capabilities, including:
>
> (a) DEFINITION: integration with a data dictionary (perhaps metaData),
> and with highly configurable and expressive data quality rules
>
> (b) MONITORING & DETECTION: automated data quality monitoring and
> alerting against any data source, with quality issues raised and
> updated automatically
>
> (c) ANALYSIS & ROOT CAUSE: data quality dashboard, alerts,
> drill-downs and trend plots, perhaps including a machine learning
> component that detects noteworthy events in quality measurements for
> inclusion in executive reports
>
> (d) WORKFLOW: basic data quality management workflow (i.e. implement
> 'inbox' and 'actions', probably via Shiny)
>
> The requirements will be drawn from my professional experience (as
> interim head of data quality at a global bank), although this project
> is not sponsored either by my employer or any of my consulting
> clients. I do, however, expect the package to be of interest to
> financial service organisations who rely on good quality data for
> their financial and risk models, and for any other process that relies
> on good data.
>
> To sum up, if anyone can point me to an existing data quality package
> that means I don’t have to develop one, that would be great.
> Alternatively, any comments of support would also be very useful!
>
> David
>
> David Twaddell
> Architector Data Tools
> Tel: +44 20 3239 1099 | +44 7447 936 984
> Web: www.architector.co.uk
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


