[R] "Design patterns" for data munging?

Aron Lindberg aron.lindberg at case.edu
Fri Feb 20 15:05:40 CET 2015


Hi All,


The most difficult challenge that I face in “learning R” is data munging. I have reviewed Hadley’s Advanced R programming guide and familiarized myself with data structures, subsetting, plyr, dplyr, tidyr, the lapply() family of functions, basic string manipulation and grepping, SQL, etc. I’ve also written a few dozen functions that do basic data munging tasks. Further, I’ve already reviewed resources like the Coursera course “Computing for Data Analysis” (https://www.coursera.org/course/compdata) and DataCamp's data.table course.


However, many of the tasks commonly solved by the tools mentioned above seem to apply mainly to datasets with fairly well-structured variables that need to be transformed and subsetted in various ways, and these tasks are often not so difficult.



Much of my work involves querying APIs or SQL databases, or scraping websites, and then assembling lists of various things that can be transformed into social networks, timestamped sequences of events, etc. Solutions to many tricky problems in this area still seem to require creative leaps of imagination that I can understand once I see them, but I have trouble seeing how I could ever come up with them independently.


Therefore I ask - what do I need to learn to become better at solving tricky data munging problems?


I realize a common answer may be: solve many data munging problems. I understand that this is clearly a factor; however, I’m trying to figure out whether there is more tangible guidance.


* Is there something like “design patterns” for data munging? 
* Would doing a course in algorithms help? (I’ve reviewed parts of "Guide to Programming and Algorithms Using R", http://www.springer.com/computer/swe/book/978-1-4471-5327-6, but many of the problems are mathematical and seem far removed from the kinds of problems that I’m trying to solve.)
* Is there something like SelectorGadget (http://selectorgadget.com/) for R objects?
* Could something like OpenRefine (http://openrefine.org/) make these tasks easier?


Best,
Aron

-- 
Aron Lindberg


Doctoral Candidate, Information Systems
Weatherhead School of Management 
Case Western Reserve University
aronlindberg.github.io


