[R] New vocabulary on a Friday afternoon. Was: Improving data processing efficiency
Greg.Snow at imail.org
Fri Jun 6 21:14:20 CEST 2008
I still like the number 4 option, so I think we need to come up with a formal definition for a "junk" of data. I read somewhere that Tukey coined the word "bit" as it applies to computers, so we statisticians can share the credit/blame for "junks" of data.
My proposal for a statistical/data definition of the word junk:
A quantity of data just large enough to get the client excited about the "great" dataset they provided, but not large enough to draw any useful conclusions.
Example sentence: We just received another junk of data from the boss; who gets to give him the bad news that it still does not prove his pet theory?
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
greg.snow at imail.org
> -----Original Message-----
> From: Patrick Burns [mailto:pburns at pburns.seanet.com]
> Sent: Friday, June 06, 2008 12:58 PM
> To: Gabor Grothendieck
> Cc: Greg Snow; r-help at r-project.org
> Subject: Re: [R] Improving data processing efficiency
> My guess is that number 2 is closest to the mark.
> Typing too fast is unfortunately not one of my habitual attributes.
> Gabor Grothendieck wrote:
> > On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org
> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
> >>> Sent: Friday, June 06, 2008 12:04 PM
> >>> To: Daniel Folkinshteyn
> >>> Cc: r-help at r-project.org
> >>> Subject: Re: [R] Improving data processing efficiency
> >>> That is going to be situation dependent, but if you have a
> >>> reasonable upper bound, then that will be much easier and not far
> >>> from optimal.
> >>> If you pick the possibly too small route, then increasing the size
> >>> in largish junks is much better than adding a row at a time.
> >> Pat,
> >> I am unfamiliar with the use of the word "junk" as a unit of measure
> >> for data objects. I figure there are a few different possibilities:
> >> 1. You are using the term intentionally, meaning that you suggest he
> >> increase the size in terms of old cars and broken pianos rather than
> >> used-up pens and broken pencils.
> >> 2. This was a Freudian slip based on your opinion of some datasets
> >> you have seen.
> >> 3. Somewhere between your mind and the final product "jumps/chunks"
> >> became "junks" (possibly a Microsoft "correction", or just typing too
> >> fast combined with number 2).
> >> 4. "junks" is an official measure of data/object size that I need to
> >> learn more about (the history of the term possibly being related to
> >> 2 and 3 above).
> > 5. Chinese sailing vessel.
> > http://en.wikipedia.org/wiki/Junk_(ship)
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.