[R] navel-gazing

Joshua Wiley jwiley.psych at gmail.com
Fri Sep 17 23:05:49 CEST 2010


I have been tinkering around with this for a bit, and I am proud to
share navel gazer 1.0.

If no arguments are passed, it will look up the top 50 authors on the
r-help list, for the given month in the given year. You can also
specify one or more months as a character vector (e.g., "August" or
c("August", "September") ). The same goes for years.  Thanks to some
help from Henrique (although I promise this was not the only reason I
wanted it), it will not try to pull a month beyond the current one.
You can also choose a different list (such as r-devel).  If you have
the time, you can set argument entire = TRUE, which will look up every
month from whatever year(s) you specified (or the current year if you
did not). It will return a named list with each element corresponding
to one month. Also, by default it will create a dotplot in lattice
(though this may be turned off via plot = FALSE). Finally, you can
specify how many authors you want. It defaults to 50.

It is also available at here:
http://gist.github.com/584910

#######################################

navel.gazer <- function(month = NULL, year = NULL, entire = FALSE,
                         list = "r-help", n = 50, plot = TRUE) {
  # Ben Bolker came up with most of the code
  # Henrique Dallazuanna provided an edit to the z <- line of code
  # Brian Diggs provided capwords() to properly count Peter Dalgaard
  # Joshua Wiley adapted all of it to one function


  if(is.null(month)) {
    month <- format(Sys.Date(), format = "%B")
  }
  if(isTRUE(entire)) {
    month <- unique(months(as.Date(1:365, "2000-01-01")))
  }
  if(is.null(year)) {
    year <- format(Sys.Date(), format = "%Y")
  }
  if(length(year) > 1) {
    tmp <- vector(mode = "list", length = length(year))
    for(i in seq_along(year)) {
      tmp[[i]] <- paste(year[i], month, sep = "-")
    }
    times <- unlist(tmp)
  } else {
    times <- paste(year, month, sep = "-")
  }

  require(zoo)

  times <- sort(as.yearmon(times, "%Y-%B"))
  current <- as.yearmon(Sys.Date(), "%Y-%m")
  times <- format(times[times <= current], "%Y-%B")

  # Function to extract the names
  # Originally by Ben Bolker
  namefun <- function(x) {
    gsub("\\n","",gsub("^.+<I>","",gsub("</I>.+$","",x)))
  }

  # Based on a suggestion by Brian Diggs
  # Capitalizes the first letter of each word
  capwords <- function(s, strict = FALSE) {
    cap <- function(s) paste(toupper(substring(s,1,1)),
                         {s <- substring(s,2); if(strict) tolower(s) else s},
                         sep = "", collapse = " " )
    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
  }

  # Collects the author names for the relevant month and list
  # from the R archives
  # Originally by Ben Bolker
  grabber <- function(month, list, n) {
    baseurl <- "https://stat.ethz.ch/pipermail/"
    require(RCurl)
    # z <- getURL(paste(baseurl,list,"/",month,"/author.html",sep=""))
    z <- getURL(paste(baseurl,list,"/", month,"/author.html",sep=""),
                ssl.verifypeer = FALSE)
    zz <- strsplit(z,"<LI>")[[1]]
    cnames <- capwords(sapply(zz[3:(length(zz)-1)],namefun))
    rr <- rev(sort(table(cnames)))
    output <- rr[1:n]
    return(output)
  }

  # Create dot plots of the number of posts
  # lattice dotplot() code primarily by Ben Bolker
  plotter <- function(dat) {
    require(lattice)
    if(length(dat) > 1) {
      old.par <- par(no.readonly = TRUE)
      on.exit(par(old.par))
      par("ask" = TRUE)
    }
    for(i in seq_along(dat)) {
      print(dotplot(~rev(dat[[i]]), xlab = "Number of posts",
                    main = names(dat)[i]))
    }
    invisible()
  }

  numbers <- lapply(times, function(x)
                    {grabber(month = x, list = list, n = n)})
  names(numbers) <- times

  if(plot) {
    plotter(dat = numbers)
  }

  return(numbers)
}

#######################################


Two examples:

navel.gazer(year = 2009, entire = TRUE)
navel.gazer(month = "September", year = 2007)

This basically works out to be a tribute to David Winsemius:
navel.gazer(n = 1, entire = TRUE)

Hope you get a bit of fun out of this.  I certainly enjoyed writing it!

Josh

On Tue, Aug 17, 2010 at 1:10 PM, Henrique Dallazuanna <wwwhsd at gmail.com> wrote:
> I think that gsub example on help page is more clear:
>
> library(XML)
>
> # could be used the XML package to get the names
> cnames <- gsub('\n', '', head(tail(sapply(getNodeSet(htmlParse(z, asText =
> TRUE), "//i"), xmlValue), -3), -3))
> gsub("(\\w)(\\w*)", "\\U\\1\\L\\2", cnames, perl=TRUE)
>
>
>
> On Tue, Aug 17, 2010 at 4:44 PM, Brian Diggs <diggsb at ohsu.edu> wrote:
>
>> Since Peter Dalgaard is splitting his considerable contributions between
>> "Peter Dalgaard" and "peter dalgaard", I made the following changes (which
>> shouldn't be a problem unless e e cummings becomes a regular poster):
>>
>> # from base::chartr documentation
>> capwords <- function(s, strict = FALSE) {
>>    cap <- function(s) paste(toupper(substring(s,1,1)),
>>                  {s <- substring(s,2); if(strict) tolower(s) else s},
>>                             sep = "", collapse = " " )
>>    sapply(strsplit(s, split = " "), cap, USE.NAMES = !is.null(names(s)))
>> }
>>
>> cnames <- capwords(sapply(zz[3:(length(zz)-1)],namefun))
>>
>>
>>
>>
>> On 8/17/2010 10:00 AM, Henrique Dallazuanna wrote:
>>
>>> Ben,
>>>
>>> I change the line:
>>>
>>> z<- getURL(paste(baseurl,list,"/", month,"/author.html",sep=""))
>>>
>>> to
>>>
>>> z<- getURL(paste(baseurl,list,"/", month,"/author.html",sep=""),
>>> ssl.verifypeer = FALSE)
>>>
>>> because don't work for me.
>>>
>>> Nice!
>>>
>>> On Tue, Aug 17, 2010 at 1:47 PM, Ben Bolker<bbolker at gmail.com>  wrote:
>>>
>>>  month<- "2010-August"
>>>> list<- "r-help"
>>>> ##list<- "r-sig-ecology"
>>>> ##list<- "r-sig-mixed-models"
>>>> ## month<- "2010q3"
>>>> n<- 50
>>>> baseurl<- "https://stat.ethz.ch/pipermail/"
>>>> library(RCurl)
>>>> z<- getURL(paste(baseurl,list,"/",month,"/author.html",sep=""))
>>>> zz<- strsplit(z,"<LI>")[[1]]
>>>> namefun<- function(x) {
>>>>  gsub("\\n","",gsub("^.+<I>","",gsub("</I>.+$","",x)))
>>>> }
>>>>
>>>> cnames<- sapply(zz[3:(length(zz)-1)],namefun)
>>>> rr<- rev(sort(table(cnames)))
>>>>
>>>>
>>>> library(lattice)
>>>> dotplot(~rev(rr[1:n]),xlab="Number of posts")
>>>>
>>>> dotplot(~rev(rr[1:n]),xlab="Number of posts",
>>>>        scales=list(x=list(log=10)))
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>>
>>>
>> --
>> Brian Diggs
>> Senior Research Associate, Department of Surgery, Oregon Health & Science
>> University
>>
>
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
>        [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/



More information about the R-help mailing list