[R] Lobbying database

Spencer Graves spencer.graves at structuremonitoring.com
Fri Apr 13 06:53:14 CEST 2012


Hi, Joseph:


	  What are your priorities regarding the US lobbying database?


	  At 
"http://www.senate.gov/legislative/Public_Disclosure/LDA_reports.htm", I 
see 4 links:


		    * Search the Lobbying Database (LD-1, LD-2)


		    * Download a Lobbying Documents Database


		    * Search the Contributions Database (LD-203)

	
		    * Downloadable Contributions Databases


	  Am I correct that the two "Search" links are to databases that 
contain lots of nonsense, and the task is to download the "Lobbying 
Documents" and maybe also the "Contributions" database, run a number of 
checks, screen out the nonsense and create search capabilities similar 
to what is offered at this web site but without the garbage?


	  I downloaded one file from 
"http://www.senate.gov/legislative/Public_Disclosure/database_download.htm". 
  I see that it's "xml" inside.  I have not worked with XML much before, 
but it doesn't look too difficult just from a casual perusal -- and R 
has an "XML" package.
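
	  To make that concrete, here is a minimal sketch of how such a file 
might be read with the "XML" package and given a first screening pass.  
The element and attribute names (PublicFilings, Filing, Registrant, 
Client, ID, Year, Amount) and the inline sample records are assumptions 
made up for illustration; they would need to be checked against a real 
downloaded file.  The "missing amount" check is likewise only a 
placeholder for whatever checks turn out to matter.

```r
# Sketch only: element/attribute names and sample records are assumed,
# not taken from an actual Senate download.
library(XML)

sample_xml <- '<PublicFilings>
  <Filing ID="A1" Year="2011" Amount="50000">
    <Registrant RegistrantName="Example Lobbying LLC"/>
    <Client ClientName="Example Client Inc"/>
  </Filing>
  <Filing ID="B2" Year="2011" Amount="">
    <Registrant RegistrantName="Another Firm"/>
    <Client ClientName="Another Client"/>
  </Filing>
</PublicFilings>'

# Parse the text into an in-memory XML document.
doc <- xmlParse(sample_xml, asText = TRUE)

# Pull one attribute per <Filing> node with XPath and collect the
# results into a data frame, one row per filing.
filings <- data.frame(
  id         = xpathSApply(doc, "//Filing", xmlGetAttr, "ID"),
  year       = as.integer(xpathSApply(doc, "//Filing", xmlGetAttr, "Year")),
  amount     = as.numeric(xpathSApply(doc, "//Filing", xmlGetAttr, "Amount")),
  registrant = xpathSApply(doc, "//Filing/Registrant",
                           xmlGetAttr, "RegistrantName"),
  client     = xpathSApply(doc, "//Filing/Client",
                           xmlGetAttr, "ClientName"),
  stringsAsFactors = FALSE
)

# Hypothetical screening rule: keep only filings with a usable amount.
clean <- filings[!is.na(filings$amount), ]
```

	  For a real download, xmlParse() would be given the path of the 
extracted file instead of a text string.  Flattening the records into a 
data frame early seems a reasonable choice, because later checks and 
searches can then use ordinary R subsetting.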


	  Also, do you have a list of publications by others who have done things 
with these data?  I'd like to contact them to find out what tools they 
have that they'd be willing to share, the priorities they would suggest for a 
project like this, etc.  The project already exists on R-Forge at 
"https://r-forge.r-project.org/R/?group_id=84".  Currently, that only 
contains a very brief statement of intent.  However, that's clear 
evidence that I've done something, and it's available in an environment that would 
support collaboration from others who might be interested in contributing.


	  I thought I'd first ask interested researchers for their input on 
priorities and the circumstances under which they might use and even 
contribute to a project like this.  I also plan to review the 41 packages 
contributed to the Comprehensive R Archive Network (CRAN) with 
"political science" mentioned on a help page.  Some of those identify 
political science professors, whom I plan to contact with similar 
questions.  After I've done this, I plan to send a broader invitation to 
"R-help at r-project.org" to see if I can get volunteers there.  With a 
modest amount of luck, this will generate both advice on the most 
important things to consider here AND volunteers to help produce the 
tools needed to make it all happen.


	  Comments?
	  Best Wishes,
	  Spencer


p.s.  A journey of a thousand miles can be achieved in a year at 3 miles 
per day or 20 miles per week.


#######################################


       Does the database you identified 
(http://www.senate.gov/legislative/Public_Disclosure/LDA_reports.htm) 
pertain to all branches of government (Senate, House, executive) or only 
the US Senate?


       I ask, because I'd like a terse name for the project like 
"USSenateLobbying" or just "USlobbying".  Which more accurately 
describes these data?  Or would you recommend something different?  (The 
name should not include blank space, though it can include a period ".".)


       I recommend we create the desired software using, at least in 
part, the free, open source software language R (www.r-project.org).  I 
propose we structure the code in a "package" to be developed on a 
subversion repository, R-Forge (r-forge.r-project.org), and submitted to 
the Comprehensive R Archive Network (CRAN).  I have substantial 
experience with R, CRAN, and R package creation, including use of R-Forge.


       Thanks,
       Spencer


p.s.  R is the language of choice for a large and growing number of 
people engaged in new statistical algorithm development, with almost 
3700 contributed packages currently downloadable from any of 84 mirrors 
in 38 countries.  I like it partly because it promotes good development 
practices encouraging simultaneous development of documentation and 
code.  Creating a package on R-Forge makes it easy to involve a team of 
volunteers, none of whom ever need to meet face to face.  We can start 
as soon as we have a name.  After initiation, we can notify developers 
of other R packages designed for political science applications to seek 
their suggestions and possible collaboration.  With a little luck we may 
be able to obtain help from professors and similar researchers at 
Harvard, Stanford and elsewhere.


-- 
Spencer Graves, PE, PhD
President and Chief Technology Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
web:  www.structuremonitoring.com


