[R] Cleaning up messy Excel data

Jim Lemon jim at bitwrit.com.au
Sat Mar 3 00:58:14 CET 2012

Unfortunately, a lot of people who use MS Office don't have or know how 
to use MS Access. Where I work now (as in the past) I have to tie 
someone to their chair, give them a few pokes with the cattle prod and 
then show them that a CSV file will load straight into Excel before I 
can convince them that they can use such a heretical data format. You 
don't want to know what I have to do to convince them that they can view 
my listings in HTML.


PS - Always give them a _copy_ of the CSV file.

On 03/03/2012 10:41 AM, Greg Snow wrote:
> Try sending your clients a data set (data frame, table, etc) as an MS
> Access data table instead.  They can still view the data as a table,
> but will have to go to much more effort to mess up the data, more
> likely they will do proper edits without messing anything up (mixing
> characters in with numbers, have more sexes than your biology teacher
> told you about, add extra lines at top or bottom that makes reading
> back into R more difficult, etc.)
> I have had a few clients that I talked into using MS Access from the
> start to enter their data, there was often a bit of resistance at
> first, but once they tried it and went through the process of
> designing the database up front they ended up thanking me and believed
> that the entire data entry process was easier and quicker than had the
> used excel as they originally planned.
> Access is still part of MS office, so they don't need to learn R or in
> any way break their chains from being prisoners of bill, but they will
> be more productive in more ways than just interfacing with you.
> Access (databases in general) force you to plan things out and do the
> correct thing from the start.  It is possible to do the right thing in
> Excel, but Excel does not encourage (let alone force) you to do the
> right thing, but makes it easy to do the wrong thing.
> On Thu, Mar 1, 2012 at 6:15 AM, jim holtman<jholtman at gmail.com>  wrote:
>> But there are some important reasons to use Excel.  In my work there
>> are a lot of people that I have to send the equivalent of a data.frame
>> to who want to look at the data and possibly slice/dice the data
>> differently and then send back to me updates.  These folks do not know
>> how to use R, but do have Microsoft Office installed on their
>> computers and know how to use the different products.
>> I have been very successful in conveying what I am doing for them by
>> communicating via Excel spreadsheets.  It is also an important medium
>> in dealing with some international companies who provide data via
>> Excel and expect responses back via Excel.
>> When dealing with data in a tabular form, Excel does provide a way for
>> a majority of the people I work with to understand the data.  Yes,
>> there are problems with some of the ways that people use Excel, and
>> yes I have had to invest time in scrubbing some of the data that I get
>> from them, but if I did not, then I would probably not have a job
>> working for them.  I use R exclusively for the analysis that I do, but
>> find it convenient to use Excel to provide a communication mechanism
>> to the majority of the non-R users that I have to deal with.  It is a
>> convenient "work-around" because I would never get them to invest the
>> time to learn R.
>> So in the real world these is a need to Excel and we are not going to
>> cause it to go away; we have to learn how to live with it, and from my
>> standpoint, it has definitely benefited me in being able to
>> communicate with my users and continuing to provide them with results
>> that they are happy with.  They refer to letting me work my "magic" on
>> the data; all they know is they see the result via Excel and in the
>> background R is doing the heavy lifting that they do not have to know
>> about.
>> On Wed, Feb 29, 2012 at 4:41 PM, Rolf Turner<rolf.turner at xtra.co.nz>  wrote:
>>> On 01/03/12 04:43, John Kane wrote:
>>>> (mydata<- as.factor(c("1","2","3", ">2", "5", ">2")))
>>>> str(mydata)
>>>> newdata<- as.character(mydata)
>>>> newdata[newdata==">2"]<- 0
>>>> newdata<- as.numeric(newdata)
>>>> str(newdata)
>>>> We really need to keep Excel (and other spreadsheets) out of peoples
>>>> hands.
>>> Amen, bro'!!!
>>>     cheers,
>>>         Rolf Turner

More information about the R-help mailing list