[R] Cleaning up messy Excel data

Greg Snow 538280 at gmail.com
Sat Mar 3 00:41:24 CET 2012


Try sending your clients a data set (data frame, table, etc) as an MS
Access data table instead.  They can still view the data as a table,
but will have to go to much more effort to mess up the data, more
likely they will do proper edits without messing anything up (mixing
characters in with numbers, have more sexes than your biology teacher
told you about, add extra lines at top or bottom that makes reading
back into R more difficult, etc.)

I have had a few clients that I talked into using MS Access from the
start to enter their data, there was often a bit of resistance at
first, but once they tried it and went through the process of
designing the database up front they ended up thanking me and believed
that the entire data entry process was easier and quicker than had the
used excel as they originally planned.

Access is still part of MS office, so they don't need to learn R or in
any way break their chains from being prisoners of bill, but they will
be more productive in more ways than just interfacing with you.

Access (databases in general) force you to plan things out and do the
correct thing from the start.  It is possible to do the right thing in
Excel, but Excel does not encourage (let alone force) you to do the
right thing, but makes it easy to do the wrong thing.

On Thu, Mar 1, 2012 at 6:15 AM, jim holtman <jholtman at gmail.com> wrote:
> But there are some important reasons to use Excel.  In my work there
> are a lot of people that I have to send the equivalent of a data.frame
> to who want to look at the data and possibly slice/dice the data
> differently and then send back to me updates.  These folks do not know
> how to use R, but do have Microsoft Office installed on their
> computers and know how to use the different products.
>
> I have been very successful in conveying what I am doing for them by
> communicating via Excel spreadsheets.  It is also an important medium
> in dealing with some international companies who provide data via
> Excel and expect responses back via Excel.
>
> When dealing with data in a tabular form, Excel does provide a way for
> a majority of the people I work with to understand the data.  Yes,
> there are problems with some of the ways that people use Excel, and
> yes I have had to invest time in scrubbing some of the data that I get
> from them, but if I did not, then I would probably not have a job
> working for them.  I use R exclusively for the analysis that I do, but
> find it convenient to use Excel to provide a communication mechanism
> to the majority of the non-R users that I have to deal with.  It is a
> convenient "work-around" because I would never get them to invest the
> time to learn R.
>
> So in the real world these is a need to Excel and we are not going to
> cause it to go away; we have to learn how to live with it, and from my
> standpoint, it has definitely benefited me in being able to
> communicate with my users and continuing to provide them with results
> that they are happy with.  They refer to letting me work my "magic" on
> the data; all they know is they see the result via Excel and in the
> background R is doing the heavy lifting that they do not have to know
> about.
>
> On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner <rolf.turner at xtra.co.nz> wrote:
>> On 01/03/12 04:43, John Kane wrote:
>>>
>>> (mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
>>> str(mydata)
>>>
>>> newdata<- as.character(mydata)
>>>
>>> newdata[newdata==">2"]<- 0
>>> newdata<- as.numeric(newdata)
>>> str(newdata)
>>>
>>> We really need to keep Excel (and other spreadsheets) out of peoples
>>> hands.
>>
>>
>> Amen, bro'!!!
>>
>>    cheers,
>>
>>        Rolf Turner
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com



More information about the R-help mailing list