[R] The Future of R | API to Public Databases

Benjamin Weber mail at bwe.im
Sat Jan 14 12:44:01 CET 2012


Spencer

I greatly appreciate your input. What we need is a standard for
statistics. That could reinvent the way we see data.
The recent crisis is the best proof that we are lost in an
information overload of our own making. The traditional approach no
longer works.

Finding the right members for the initial committee would be the
hardest but most important part.

Another point is that I am only a 21-year-old student, with limited
financial means to commit to this kind of work.
But I have my motivation, which is the *real* engine for advancing an
idea. I am open to working on it in my spare time. Over time I would
become an expert in my own field; that is implicit in such a decision.
I have no background as a statistician, but I know how relevant data
is.
A newcomer may bring a fresh perspective. Starting from scratch can
be beneficial.
It will be even harder for someone like me to convince experienced
professionals to move beyond their own conventional schemes and
procedures, because my approach would pay no respect to the
established ones. "Why the hell should I know better than the
experts?" I respect individual solutions; they may work in a specific
situation, but they make it impossible to put everything together into
the big picture that is ultimately required.

I am really interested in leading the initiative for such a new
standard. My problem is how to start.

Would a scientific paper proposing the development of a standard
be a starting point?

Benjamin

On 14 January 2012 08:19, Spencer Graves
<spencer.graves at structuremonitoring.com> wrote:
>      A traditional way to exit a chaotic situation like the one you describe
> is to try to establish a standards committee, invite participation from
> suppliers and users of whatever (data in this case), apply for registration
> with the International Organization for Standardization (ISO), organize
> meetings, draft and circulate a proposed standard, etc.  A statistician who
> had published maybe 100 papers and 3 books told me that his work on ISO 9000
> (I think) made a larger contribution to humanity than anything else he had
> done.  Work on standards is one of the most boring, tedious activities I can
> imagine -- and can potentially be the most impactful thing one does in this
> life:  If you have an ISO standard number for something, people who are
> starting something new may find it and follow it.  People who are working to
> upgrade something may tell their management, "Let's follow this standard."
> Customers sometimes tell their suppliers, "If you follow the standard, you
> might get more customers."
>
>
>      I think you could get support for such a standards effort from the
> American Association for the Advancement of Science, the American Economic
> Association, the American Statistical Association, and many other
> organizations, including many on-line science journals that today pressure
> authors to put the data behind their published papers in the public
> domain, downloadable from their web site, etc.
>
>
>      IMHO.
>      Spencer
>
>
> On 1/13/2012 3:39 PM, Benjamin Weber wrote:
>>
>> The whole issue comes down to a mismatch between (1) the publisher of the
>> data and (2) the user at the rendezvous point.
>> Neither the publisher nor the user knows anything about the
>> rendezvous point. Both want to meet but never do in reality.
>> The user wastes time finding the rendezvous point defined by the
>> publisher.
>> The publisher assumes an arbitrary rendezvous point. Given the number of
>> publishers, the variety of fields and the preferences of each expert,
>> we end up with today's data world. Everyone has to waste precious
>> time finding the rendezvous point. Only experts know which
>> corner to focus their search on - but even they need time to
>> find what they want.
>> However, each expert (in each profession) believes that his approach
>> is the best one in the world.
>> In the end we have a state of total confusion, where only experts can
>> handle the information and non-experts cannot even access the data
>> without diving fully into the flood of data and their specialities.
>> That's my point: Data is not accessible.
>>
>> The discussion should follow a strategic approach:
>> - Is the classical csv file (in all its varieties) the simplest and best
>> way?
>> - Isn't it the responsibility of the R community to recommend
>> standards for different kinds of data?
>> If such a rendezvous point existed, the publisher would know a
>> specific point that is favorable from the user's point of view. That
>> is missing.
>> Only a rendezvous point defined by the community can be a 'known'
>> rendezvous point for all stakeholders, globally.
>>
>> I do believe that the publisher's greatest interest is data
>> accessibility. Where is the toolkit we provide to enable them to
>> serve us the data exactly as we want it? No, we just try to build even
>> more packages, only to be lost in the noise of information.
>>
>> I disagree with the proposed solution of a maintained package or a
>> bunch of packages that just combine connections to the existing
>> databases and keep them up to date. It is only a question of time before
>> the user gets lost there. Such an approach is neither feasible nor
>> efficient.
>>
>> We should just tell them where we would like to meet.
>>
>> Benjamin
>>
>> On 14 January 2012 04:58, Brian Diggs <diggsb at ohsu.edu> wrote:
>>>
>>> On 1/13/2012 2:26 PM, MacQueen, Don wrote:
>>>>
>>>> It's a nice idea, but I wouldn't be optimistic about it happening:
>>>>
>>>> Each of these public databases no doubt has its own more or less unique
>>>> API, and the people likely to know the API well enough to write R code to
>>>> access any particular database will be specialists in that field. They
>>>> likely won't know much if anything about other public databases. The
>>>> likelihood of a group forming to develop ** and maintain ** a single R
>>>> package to access the no-doubt huge variety of public databases strikes me
>>>> as small.
>>>
>>>
>>> I agree. The more reasonable model is a collection of packages, each of
>>> which can access a particular data source.
>>>
>>>
>>>> However, this looks like a great opportunity for a new CRAN Task View. The
>>>> task view would simply identify which packages connect to which public
>>>> databases. (sorry, I can't volunteer)
>>>
>>>
>>> A CRAN Task View would be well suited for this. I have tagged these sorts of
>>> packages on crantastic with the "onlineData" tag when I happen to notice
>>> one, but I have not made a concerted effort to find all packages.  A Task
>>> View would be even better.
>>>
>>> http://crantastic.org/tags/onlineData
>>>
>>>
>>>> -Don
>>>>
>>>> p.s.
>>>> I can mention openair as a package that has tools to access public
>>>> databases.
>>>
>>>
>>> Tagged it.
>>>
>>> --
>>> Brian S. Diggs, PhD
>>> Senior Research Associate, Department of Surgery
>>> Oregon Health & Science University
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> Spencer Graves, PE, PhD
> President and Chief Technology Officer
> Structure Inspection and Monitoring, Inc.
> 751 Emerson Ct.
> San José, CA 95126
> ph:  408-655-4567
> web:  www.structuremonitoring.com


