[R] analyzing results from Tuesday's US elections
Spencer Graves
Sun Nov 8 09:24:52 CET 2020
On 2020-11-07 23:39, Abby Spurdle wrote:
>> What can you tell me about plans to analyze data from this year's
>> general election, especially to detect possible fraud?
> I was wondering if there are any R packages with out-of-the-box
> functions for this sort of thing.
> Can you please let us know if you find any.
>> I might be able to help with such an effort. I have NOT done
>> much with election data, but I have developed tools for data analysis,
>> including web scraping, and included them in R packages available on the
>> Comprehensive R Archive Network (CRAN) and GitHub.
> Do you have a URL for detailed election results?
> Or even better, a nice R-friendly CSV file...
> I recognize that the results aren't complete.
> And that such a file may need to be updated later.
> But that doesn't necessarily prevent modelling now.
I asked because I don't know of any such file. With the increasingly
vicious, widespread and systematic attacks on the integrity of elections
in the US, I think it would be good to have a central database of
election results, with tools that regularly scrape the websites of local and
state election authorities. Whenever new data were posted, the software
would update the central repository and send emails to anyone
interested. That could simplify data acquisition, because historical
data would already be available there. It would also provide one standard
format for the entire US, and perhaps for the world.
This could be extremely valuable in exposing electoral fraud, thereby
reducing its magnitude and effectiveness. This is a global problem, but
it seems to have gotten dramatically worse in the US in recent years.
I'd like to join -- or organize -- a team of people working on this.
If we can create the database and data analysis tools in a package like
Ecfun on CRAN, I think we can interest college professors, especially those
teaching statistics to political science students, who would love to
involve their students in something like this. They could access data
in real time during class, analyze it using standard tools that we could
develop, and involve their students in discussing what it means and what
it doesn't. They could discuss Bayesian sequential updating and quality
control concepts using data that are real and relevant to the lives of
their students. It could help get students excited about both
statistics and elections.
Such a project may already exist. I know there are projects at some
major universities that sound like they might support this. However,
in the limited time I've invested in this so far, I didn't find any
that seemed to provide easy access to such data and an easy way to join.
Ballotpedia has such data, but it doesn't want help analyzing them, and it
asked for a few hundred dollars for the data from one election cycle in
Missouri, which is what I requested. I can get those data for free from
the website of the Missouri Secretary of State.
I thought I might next ask the Carter Center about this. However,
I'm totally consumed with other priorities right now. I don't plan
to do anything on this in the short term -- unless I can find
If such a central database doesn't exist -- and maybe even if it does
-- I thought it might be good to make all the data available in a
standard format in Wikidata, a project of the Wikimedia Foundation,
which is also the parent organization of Wikipedia. Then I could help
create software and documentation on how to scrape data from the
websites of the election organizations that publish it and
automatically update Wikidata, while also sending emails to people who
express interest in those election results. Then we could create
software for analyzing such data and make it available, e.g., on
Wikiversity, another project of the Wikimedia Foundation, with the
R code in Ecfun or some other CRAN package.
If we start now, I think we could have something mediocre in time for
various local elections that occur next year with improvements for the
2022 US Congressional elections and something even better for the 2024
US presidential elections.
Thanks for asking.