[R] How to clean up missing values in a list of lists

Aron Lindberg aron.lindberg at case.edu
Tue Feb 10 15:46:19 CET 2015


Hi,


I’m trying to query the GitHub API, and I’m running into some data munging issues, so I was hoping someone on the list might advise.


Here’s my code. To run it, you need to replace client_id and client_secret with your own GitHub authorization credentials.


library(github)
library(RCurl)
library(httpuv)
library(jsonlite)


# Authenticate with GitHub (replace with your own credentials)
ctx <- interactive.login("client_id", "client_secret")


# Fetch the list of files changed in pull request number i
pull <- function(i){
  get.pull.request.files(owner = "rails", repo = "rails", id = i, ctx = get.github.context(), per_page = 1000)
}


# Download the pull request IDs (one per line) and parse them
ids <- read.csv(text = getURL("https://gist.githubusercontent.com/aronlindberg/a3d135a303664046c94a/raw/e42a0734ec4542eccf5f4d5bdeed5afbdd1720e9/pull_ids"), header = FALSE)


# Query the files of every pull request
pull_lists <- lapply(ids$V1, pull)


# Extract the "filename" field from each entry of one response's content
get_files <- function(pull_lists){
  sapply(pull_lists$content, "[[", "filename")
}


file_lists <- lapply(pull_lists, get_files)


Everything works fine until the last command, which fails with:


Error in FUN(X[[1L]], ...) : subscript out of bounds
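For reference, this error can be reproduced offline with mock data. The structure of the two mock responses (an ok flag, and a content field holding either a list of file entries or an error body with a message field) is an assumption about what get.pull.request.files returns, not something confirmed here. The sketch shows that "[[" with a missing name returns NULL on a list, but fails with "subscript out of bounds" when an element is a plain character string, as in an API error body.

```r
# Mock responses; the field names (ok, content, filename, message)
# are assumptions made for illustration only.
good <- list(ok = TRUE,
             content = list(list(filename = "a.rb"), list(filename = "b.rb")))
bad <- list(ok = FALSE,
            content = list(message = "Not Found"))

# Each element of good$content is a list, so "[[" finds "filename"
files <- sapply(good$content, "[[", "filename")
print(files)  # "a.rb" "b.rb"

# On a list, a missing name simply gives NULL rather than an error
print(is.null(list(message = "Not Found")[["filename"]]))  # TRUE

# Each element of bad$content is a character string, and "[[" with
# an absent name on an atomic vector raises the error from the post
err <- tryCatch(sapply(bad$content, "[[", "filename"),
                error = function(e) conditionMessage(e))
print(err)  # "subscript out of bounds"
```

So a merely missing "filename" field would not by itself stop sapply(); an element that is not a list at all would.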


I’ve read this Stack Overflow question: http://stackoverflow.com/questions/18461499/subscript-out-of-bounds-on-character-vector


which leads me to believe that the error occurs because some of the entries are missing when I run file_lists <- lapply(pull_lists, get_files). However, I cannot figure out how to clean up the data. I have tried something along these lines:


clean_files <- function(pull_lists){
  pull_lists$content[which(nchar(pull_lists$content) == NULL)] <- NA
}


clean_lists <- lapply(pull_lists, clean_files)


But that simply replaces *every* value with NA (the same happens if I change == NULL to < 1 or < 2).
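One likely explanation, shown on a simplified mock in which a character vector stands in for the real content field: comparing with NULL yields a zero-length logical, so nothing is replaced, and an R function returns the value of its last expression, which for an assignment is the assigned NA. That would make every element of clean_lists NA whatever condition is used. The variant clean_files_fixed is hypothetical; it only illustrates returning the modified object and does not by itself address the extraction error.

```r
# Simplified mock: a character vector stands in for the real
# content field of one API response.
x <- list(content = c("a", "", "ccc"))

# Comparing anything with NULL yields a zero-length logical, so
# which() selects no positions and nothing is replaced.
cond <- nchar(x$content) == NULL
print(cond)         # logical(0)
print(which(cond))  # integer(0)

# Same body as above: the last expression is an assignment,
# and its value (NA) becomes the function's return value.
clean_files <- function(pull_lists){
  pull_lists$content[which(nchar(pull_lists$content) == NULL)] <- NA
}
res <- clean_files(x)
print(res)  # NA

# Hypothetical variant: test for empty strings and return the
# modified object explicitly.
clean_files_fixed <- function(pull_lists){
  pull_lists$content[which(nchar(pull_lists$content) == 0)] <- NA
  pull_lists
}
print(clean_files_fixed(x)$content)  # "a" NA "ccc"
```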


How can I make this code work?
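A possible sketch, assuming each successful response keeps its file entries in content as a list of lists with a filename field (an assumption, not verified against the github package): skip any element that is not such a list before extracting the names. get_files_safe is a hypothetical name, and the mock pull_lists below stands in for real API responses. Filtering on a success flag in the response, if one is available, would be another option.

```r
# Hypothetical helper: keep only entries that are lists with a
# "filename" field, then extract that field as a character vector.
get_files_safe <- function(pull) {
  entries <- Filter(function(x) is.list(x) && !is.null(x[["filename"]]),
                    pull$content)
  vapply(entries, function(x) x[["filename"]], character(1))
}

# Mock responses (assumed structure): one with files, one error
# body, and one with no files at all.
pull_lists <- list(
  list(content = list(list(filename = "a.rb"), list(filename = "b.rb"))),
  list(content = list(message = "Not Found")),
  list(content = list())
)

file_lists <- lapply(pull_lists, get_files_safe)
str(file_lists)
```

Responses that carry an error body or no files then yield an empty character vector instead of stopping the whole lapply() call, and vapply() guarantees each result is character.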


Best,
Aron


-- 
Aron Lindberg


Doctoral Candidate, Information Systems
Weatherhead School of Management 
Case Western Reserve University
aronlindberg.github.io



More information about the R-help mailing list