[R] Subsetting a list of lists using lapply

David Winsemius dwinsemius at comcast.net
Fri Feb 20 19:56:01 CET 2015


On Feb 20, 2015, at 6:13 AM, Aron Lindberg wrote:

> Hmm… Chuck’s solution may actually be problematic because there are several entries which at the deepest level are called “sha” but should not be included, such as:
> 
> input[[67]]$content[[1]]$commit$tree$sha
> 
> 
> and
> 
> input[[67]]$content[[1]]$parents[[1]]$sha
> 
> Only the “sha” entries that fit the following subsetting pattern should be included:
> 
> 
> input[[i]]$content[[1]]$sha[1]
> 
> 
> It’s getting thornier!
> 
> To be fair to Rolf’s solution (which probably can be updated to solve the problem), I’ve posted the complete dput here:
> 
> https://gist.githubusercontent.com/aronlindberg/92700c04c88ff112e4f7/raw/0f3cd8468f4dc82267be3cec72d53a7a04f5c449/dput.R
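
For the unlist()-based approach quoted further down, one possible way to leave out the nested "sha" entries is to match the full flattened name instead of any name containing "sha". This is only a sketch: the helper name shas_by_path and the demo object are made up for illustration, and it assumes that the top level of input is unnamed, so that the wanted leaves flatten to the name "content.sha" (or "content.sha1" when sha has more than one value).

```r
# Hypothetical helper (not from the thread): flatten with unlist() and keep
# only leaves whose full flattened name is "content.sha" or "content.sha1".
shas_by_path <- function(input) {
  flat <- unlist(input)
  keep <- grepl("^content\\.sha1?$", names(flat))
  unname(flat[keep])
}

# Made-up data mimicking the structure described above
demo <- list(
  list(content = list(list(
    sha     = "aaa",                                  # wanted
    commit  = list(tree = list(sha = "bbb")),         # not wanted
    parents = list(list(sha = "ccc"))                 # not wanted
  ))),
  list(content = list())                              # contributes nothing
)
shas_by_path(demo)
```

Note that flattened names cannot tell content[[1]] apart from content[[2]], so this would only be suitable if each content holds at most one item with a "sha".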

I didn't try it on the larger example, but this works on the smaller one:

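 get_shas <- function(input) {
   # Pull the "content" element out of each top-level entry
   x <- lapply(input, "[[", "content")
   # Take the first item of each "content" list
   y <- lapply(x, "[[", 1)
   # Return the "sha" value where one is present, otherwise NA
   lapply(y, function(yy) {
     if ("sha" %in% names(yy)) yy[["sha"]] else NA_character_
   })
 }
 sha_lists <- get_shas(input)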

It delivers one entry for every top-level element of the input object, which is either the value of "sha" or NA. I think that is not a bad thing, because it lets you figure out where the values are coming from.
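
If the full dataset has the irregular shapes described in the quoted message below (content[[1]] sometimes a list, sometimes a character vector, and content sometimes an empty list()), a more defensive variant might look like the following. It is a sketch that has not been run on the real data: the name get_first_sha and the demo object are made up for illustration, and it assumes that every top-level entry is itself a list and that "sha", where present, is an atomic vector.

```r
# Hypothetical helper (not from the thread): returns one string per
# top-level element of `input`, with NA where no matching "sha" exists.
get_first_sha <- function(input) {
  vapply(input, function(entry) {
    content <- entry[["content"]]
    # content may be missing (NULL) or an empty list()
    if (length(content) == 0L) return(NA_character_)
    first <- content[[1L]]
    # first may be a list or a character vector; names() works on both
    if (!("sha" %in% names(first))) return(NA_character_)
    sha <- first[["sha"]]
    if (length(sha) == 0L) return(NA_character_)
    # mirror the sha[1] in the requested subsetting pattern
    as.character(sha[[1L]])
  }, character(1))
}

# Made-up data covering the shapes mentioned in the thread
demo <- list(
  list(content = list(list(sha = "aaa",
                           commit = list(tree = list(sha = "bbb"))))),
  list(content = list(c(sha = "ccc", path = "x"))),
  list(content = list()),
  list(content = list(c("no", "names")))
)
shas <- get_first_sha(demo)
which(!is.na(shas))   # positions in `input` that actually had a "sha"
```

Because the result is a plain character vector of the same length as the input, the NA positions show directly which entries had no usable "sha".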

> 
> -- 
> Aron Lindberg
> Doctoral Candidate, Information Systems
> Weatherhead School of Management
> Case Western Reserve University
> aronlindberg.github.io
> 
> On Fri, Feb 20, 2015 at 8:25 AM, Aron Lindberg <aron.lindberg at case.edu>
> wrote:
> 
>> Thanks Chuck and Rolf.
>> While Rolf’s code also works on the dput that I actually gave you (a smaller subset of the full dataset), it failed to work on the larger dataset, because there are further exceptions:
>> input[[i]]$content[[1]] is sometimes a list, sometimes a character vector, and sometimes input[[i]]$content simply returns list().
>> Chuck’s solution, however, bypasses this and works on the full dataset (which is 8 MB, which is why I didn’t upload it as a gist).
>> Best,
>> Aron
>> On Fri, Feb 20, 2015 at 12:44 AM, Charles Berry <ccberry at ucsd.edu> wrote:
>>> Aron Lindberg <aron.lindberg <at> case.edu> writes:
>>>> 
>>>> Hi Everyone,
>>>> 
>>>> I'm working on a thorny subsetting problem involving a list of lists. I've put a
>>>> dput of the data here:
>>>> 
>>>> 	https://gist.githubusercontent.com/aronlindberg/b916dee897d051ac5be5/raw/a78cbf873a7e865c3173f943ff6309ea688c653b/dput
>>>> 
>>> IIUC, you want the value of every list element that is named "sha" and 
>>> that name will only apply to atomic objects.
>>> If so, this should do it. 
>>>> input <- dget("/tmp/dpt")
>>>> shas <- unlist( input, use.names=FALSE )[ grepl( "sha", names(unlist(input)))]
>>>> input[[67]]$content[[1]]$sha
>>> [1] "58cf43ecdc1beb7e1043e9de612ecc817b090f15"
>>>> which(input[[67]]$content[[1]]$sha == shas )
>>> [1] 194
>>> HTH,
>>> Chuck
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


