[R] Reduce woes

Stefan Kruger stefan.kruger at gmail.com
Mon Aug 1 13:52:28 CEST 2016


That seems like sage advice :)

Thanks

Stefan

On 29 July 2016 at 22:06, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:

> Having experienced some frustration myself when I first started with R
> many years ago, I can relate to your apparent frustration. However, if you
> would like to succeed in using R I strongly recommend learning R and not
> trying to write Haskell or Erlang or C or Fortran or any other language
> when writing in R. I am sure there are many things R could do better, and
> once you understand how R actually works you might even be in a position to
> contribute some improvements. But thinking in those other languages with an
> R interpreter in front of you is going to just make you more frustrated.
>
> For one thing, everything in R is a vector... even lists. Appending to a
> list is not O(1) as it would be for a linked list. Thus it is preferred to
> find algorithms that pre-allocate memory for results. Map (lapply) is 1:1
> to encourage that.  Reduce is N:1 because it is simpler that way. Use Map
> to make a grouping vector that you can use to select which elements you
> want to process and then map over that subset of your input data or
> aggregate over the whole thing.
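A minimal sketch of the subset-then-map approach described above, applied to the pseudo-code example quoted further down. The name `src` (used instead of `source`, which would mask the base function) and the generated names "item1", "item2" are assumptions for illustration only.

```r
# Input from the quoted pseudo-code; renamed to avoid masking base::source().
src <- list(c(1, 2, 3, 4), c(8, 7, 6, 5, 4, 3, 7), c(5, 4))

# Grouping vector: TRUE for the elements we want to process.
keep <- vapply(src, function(item) length(item) > 2, logical(1))

# Map over the selected subset only; lapply() sizes the result up front.
result <- lapply(src[keep], length)

# Hypothetical names standing in for generate_some_name().
names(result) <- paste0("item", seq_along(result))

str(result)
```

Nothing is appended inside a loop here, so the result never has to grow one element at a time.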
>
> Also, names are attributes of the list vector... one name per element.
> Not all list operations maintain that attribute so you often have to
> explicitly copy names from source to destination.
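A small sketch of the point about names, using the `data` list from elsewhere in this thread. Mapping over positions is just one example of an operation that drops the names attribute; `lens` is a name made up for the illustration.

```r
data <- list(one = c(1, 1), three = c(3), two = c(2, 2))

# lapply() over the list itself carries the names across.
names(lapply(data, length))        # "one" "three" "two"

# Mapping over positions does not: the input has no names to copy.
lens <- lapply(seq_along(data), function(i) length(data[[i]]))
names(lens)                        # NULL

# Copy the names from source to destination explicitly.
lens <- setNames(lens, names(data))
str(lens)
```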
>
> Oh and "source" is a common base R function... and so it is generally
> advised to not re-use common names in the global environment.
> --
> Sent from my phone. Please excuse my brevity.
>
> On July 29, 2016 8:43:16 AM PDT, Stefan Kruger <stefan.kruger at gmail.com>
> wrote:
> >>> I still don't understand why you want Reduce to do lapply's
> >>> job. Reduce maps many to one and lapply maps many to
> >>> many.
> >
> >Say you want to map a function over a subset of a vector or list? With the
> >generalised version of Reduce you map many-to-one, but the one can be a
> >'complex' structure. lapply() and friends not only map many-to-many, but
> >X-to-X - the resulting list will be the same length as the source. This
> >frequently gets used in Elixir, Erlang, Haskell etc as a means of
> >processing a pipeline or stream - start with a vector, select a subset
> >based on some predicate, turn this subset into an entirely different
> >object/list/
> >
> >In iterative-fashion pseudo code
> >
> >source = list(c(1,2,3,4), c(8,7,6,5,4,3,7), c(5,4))
> >result = { }
> >foreach (item in source) {
> >    if (length(item) > 2) {
> >        result[generate_some_name()] = length(item)
> >    }
> >}
> >
> >That's an example of what I want to do. It maps many (a subset of the
> >vectors in source) to one (the result named list). It's a map-filter - but
> >even more general than your typical map-filter in that you can change the
> >data structure - e.g. map a function over a vector, use a subset of the
> >results, and turn those into a list or S3 object.
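For comparison, the pseudo-code above can also be written as a fold in R, since Reduce() accepts a list as its initial value. This is only a sketch: `src` replaces `source` to avoid masking the base function, and the "itemN" naming scheme is a hypothetical stand-in for generate_some_name().

```r
src <- list(c(1, 2, 3, 4), c(8, 7, 6, 5, 4, 3, 7), c(5, 4))

result <- Reduce(function(acc, item) {
    if (length(item) > 2) {
        # Hypothetical naming scheme in place of generate_some_name().
        acc[[paste0("item", length(acc) + 1)]] <- length(item)
    }
    acc  # always hand the accumulator on, changed or not
}, src, list())

str(result)
```

The accumulator grows one element at a time, so for long inputs the subset-then-map style is likely to be cheaper.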
> >
> >
> >Stefan
> >
> >
> >
> >On 29 July 2016 at 15:54, William Dunlap <wdunlap at tibco.com> wrote:
> >
> >> Reduce (like lapply) apparently uses the [[ operator to
> >> extract components from the list given to it. X[[i]] does
> >> not attach names(X)[i] to its output (where would it put it?).
> >> Hence your se
> >>
> >> To help understand what these functions are doing try
> >> putting print statements in your test functions:
> >> > data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
> >> > r <- Reduce(function(acc, item) {
> >> +     cat("acc="); str(acc)
> >> +     cat("item="); str(item)
> >> +     length(item)
> >> + }, data, init = list())
> >> acc= list()
> >> item= num [1:2] 1 1
> >> acc= int 2
> >> item= num 3
> >> acc= int 1
> >> item= num [1:2] 2 2
> >> > data2 <- list(one = c(oneA=1, onB=1), three = c(threeA=3),
> >> +               two = c(twoA=2, twoB=2))
> >> > r <- Reduce(function(acc, item) {
> >> +     cat("acc="); str(acc)
> >> +     cat("item="); str(item)
> >> +     length(item)
> >> + }, data2, init = list())
> >> acc= list()
> >> item= Named num [1:2] 1 1
> >>  - attr(*, "names")= chr [1:2] "oneA" "onB"
> >> acc= int 2
> >> item= Named num 3
> >>  - attr(*, "names")= chr "threeA"
> >> acc= int 1
> >> item= Named num [1:2] 2 2
> >>  - attr(*, "names")= chr [1:2] "twoA" "twoB"
> >>
> >>
> >> I still don't understand why you want Reduce to do lapply's
> >> job. Reduce maps many to one and lapply maps many to
> >> many.
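Since `[[` hands the reducing function only the value, one possible workaround (a sketch, not the only option) is to fold over names(data) and look each element up by name, so that the name is available inside the function. The argument name `nm` is chosen for the illustration.

```r
data <- list(one = c(1, 1), three = c(3), two = c(2, 2))

# Reduce over the names rather than the values.
r <- Reduce(function(acc, nm) {
    acc[[nm]] <- length(data[[nm]])  # assign by name into the list accumulator
    acc
}, names(data), list())

str(r)
```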
> >>
> >>
> >>
> >> Bill Dunlap
> >> TIBCO Software
> >> wdunlap tibco.com
> >>
> >> On Fri, Jul 29, 2016 at 1:37 AM, Stefan Kruger
> ><stefan.kruger at gmail.com>
> >> wrote:
> >>
> >>> Jeremiah -
> >>>
> >>> neat - that's one step closer, but one small thing I still don't
> >>> understand:
> >>>
> >>> > data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
> >>> > r = Reduce(function(acc, item) {
> >>> +     append(acc, setNames(length(item), names(item)))
> >>> + }, data, list())
> >>> > str(r)
> >>> List of 3
> >>>  $ : int 2
> >>>  $ : int 1
> >>>  $ : int 2
> >>>
> >>> I wanted the names to remain, but it seems like the "data" parameter loses
> >>> its names when consumed by the Reduce()? If I print "item" inside the
> >>> reducing function, it's not got the names. I'm probably missing some
> >>> central tenet of R here.
> >>>
> >>> As to your comment of this being lapply() implemented by Reduce() - as I
> >>> understand lapply()  (or map() in other functional languages), it's limited
> >>> to returning a list/vector of the same length as the original. Consider
> >>> this contrived example:
> >>>
> >>> > r = Reduce(function(acc, item) {
> >>> +     if (length(item) > 1) { append(acc, setNames(length(item), names(item))) }
> >>> + }, data, list())
> >>> > str(r)
> >>>  int 2
> >>> > r
> >>> [1] 2
> >>>
> >>> I don't think you could achieve that with lapply()?
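A likely explanation for the bare `int 2` above: an `if` with no `else` evaluates to NULL when its condition is false, so the accumulator is thrown away at the "three" element, and the last step then appends to NULL. The sketch below shows that, plus one way to get the filtered result by combining Filter() with lapply().

```r
data <- list(one = c(1, 1), three = c(3), two = c(2, 2))

# An if without else yields NULL when the condition is FALSE.
is.null(if (FALSE) 1)   # TRUE

# Filter() keeps the matching elements (names included); lapply() maps them.
r <- lapply(Filter(function(item) length(item) > 1, data), length)

str(r)
```

lapply() alone does return one element per input, but applied to a filtered list it gives the shorter result.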
> >>>
> >>> Thanks
> >>>
> >>> Stefan
> >>>
> >>>
> >>> On 28 July 2016 at 20:19, jeremiah rounds <roundsjeremiah at gmail.com>
> >>> wrote:
> >>>
> >>> > Basically using Reduce as an lapply in that example, but I think
> >that
> >>> was
> >>> > caused by how people started talking about things in the first
> >place =)
> >>> But
> >>> > the point is the accumulator can be anything as far as I can tell.
> >>> >
> >>> > On Thu, Jul 28, 2016 at 12:14 PM, jeremiah rounds <
> >>> > roundsjeremiah at gmail.com> wrote:
> >>> >
> >>> >> Re:
> >>> >> "What I'm trying to
> >>> >> work out is how to have the accumulator in Reduce not be the same
> >type
> >>> as
> >>> >> the elements of the vector/list being reduced - ideally it could
> >be an
> >>> S3
> >>> >> instance, list, vector, or data frame."
> >>> >>
> >>> >> Pretty sure that is not true.  See code that follows.  I would
> >never
> >>> >> solve this task in this way though so no comment on the use of
> >Reduce
> >>> for
> >>> >> what you described.  (Note the accumulation of "functions" in a
> >list is
> >>> >> just a demo of possibilities).  You could accumulate in an
> >environment
> >>> too
> >>> >> and potentially gain a lot of copy efficiency.
> >>> >>
> >>> >>
> >>> >> lookup = list()
> >>> >> lookup[[as.character(1)]] = function() print("1")
> >>> >> lookup[[as.character(2)]] = function() print("2")
> >>> >> lookup[[as.character(3)]] = function() print("3")
> >>> >>
> >>> >> data = list(c(1,2), c(1,4), c(3,3), c(2,30))
> >>> >>
> >>> >>
> >>> >> r = Reduce(function(acc, item) {
> >>> >> append(acc, list(lookup[[as.character(min(item))]]))
> >>> >> }, data,list())
> >>> >> r
> >>> >> for(f in r) f()
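A rough sketch of the environment idea mentioned above, using the names-to-lengths task from this thread rather than the lookup example. The variable names `env` and `nm` are assumptions made for the illustration.

```r
data <- list(one = c(1, 1), three = c(3), two = c(2, 2))

# Environments are modified in place, so the accumulator is not copied
# at each step the way a growing list may be.
env <- Reduce(function(acc, nm) {
    assign(nm, length(data[[nm]]), envir = acc)
    acc
}, names(data), new.env())

# Convert back to a named list, in the original order.
result <- mget(names(data), envir = env)

str(result)
```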
> >>> >>
> >>> >>
> >>> >> On Thu, Jul 28, 2016 at 5:09 AM, Stefan Kruger <
> >>> stefan.kruger at gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >>> Ulrik - many thanks for your reply.
> >>> >>>
> >>> >>> I'm aware of many simple solutions as the one you suggest, both
> >>> iterative
> >>> >>> and functional style - but I'm trying to learn how to bend
> >Reduce()
> >>> for
> >>> >>> the
> >>> >>> purpose of using it in more complex processing tasks. What I'm
> >trying
> >>> to
> >>> >>> work out is how to have the accumulator in Reduce not be the
> >same
> >>> type as
> >>> >>> the elements of the vector/list being reduced - ideally it could
> >be
> >>> an S3
> >>> >>> instance, list, vector, or data frame.
> >>> >>>
> >>> >>> Here's a more realistic example (in Elixir, sorry)
> >>> >>>
> >>> >>> Given two lists:
> >>> >>>
> >>> >>> 1. data: maps an id string to a vector of revision strings
> >>> >>> 2. dict: maps known id/revision pairs as a string to true (or 1)
> >>> >>>
> >>> >>> find the items in data not already in dict, returned as a named
> >list.
> >>> >>>
> >>> >>> ```elixir
> >>> >>> data = %{
> >>> >>>     "id1" => ["rev1.1", "rev1.2"],
> >>> >>>     "id2" => ["rev2.1"],
> >>> >>>     "id3" => ["rev3.1", "rev3.2", "rev3.3"]
> >>> >>> }
> >>> >>>
> >>> >>> dict = %{
> >>> >>>     "id1/rev1.1" => 1,
> >>> >>>     "id1/rev1.2" => 1,
> >>> >>>     "id3/rev3.1" => 1
> >>> >>> }
> >>> >>>
> >>> >>> # Find the items in data not already in dict. Return as a
> >grouped map
> >>> >>>
> >>> >>> Map.keys(data)
> >>> >>>     |> Enum.flat_map(fn id -> Enum.map(data[id], fn rev -> {id,
> >rev}
> >>> end)
> >>> >>> end)
> >>> >>>     |> Enum.filter(fn {id, rev} -> !Dict.has_key?(dict,
> >>> "#{id}/#{rev}")
> >>> >>> end)
> >>> >>>     |> Enum.reduce(%{}, fn ({k, v}, d) -> Map.update(d, k, [v],
> >>> &[v|&1])
> >>> >>> end)
> >>> >>> ```
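One way the same task might look in R, as a vectorised sketch rather than a line-by-line translation of the Elixir pipeline. The names `ids`, `revs` and `unknown` are made up for the illustration, and revisions come out in their original order, whereas the Elixir reduce prepends.

```r
data <- list(
    id1 = c("rev1.1", "rev1.2"),
    id2 = c("rev2.1"),
    id3 = c("rev3.1", "rev3.2", "rev3.3")
)
dict <- c("id1/rev1.1" = 1, "id1/rev1.2" = 1, "id3/rev3.1" = 1)

# Flatten to parallel vectors of ids and revisions (the flat_map step).
ids  <- rep(names(data), lengths(data))
revs <- unlist(data, use.names = FALSE)

# Keep the pairs whose "id/rev" key is not in dict (the filter step).
unknown <- !(paste(ids, revs, sep = "/") %in% names(dict))

# Regroup the remaining revisions by id (the reduce step).
result <- split(revs[unknown], ids[unknown])

str(result)
```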
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>> On 28 July 2016 at 12:03, Ulrik Stervbo
> ><ulrik.stervbo at gmail.com>
> >>> wrote:
> >>> >>>
> >>> >>> > Hi Stefan,
> >>> >>> >
> >>> >>> > in that case, lapply(data, length) should do the trick.
> >>> >>> >
> >>> >>> > Best wishes,
> >>> >>> > Ulrik
> >>> >>> >
> >>> >>> > On Thu, 28 Jul 2016 at 12:57 Stefan Kruger
> ><stefan.kruger at gmail.com
> >>> >
> >>> >>> > wrote:
> >>> >>> >
> >>> >>> >> David - many thanks for your response.
> >>> >>> >>
> >>> >>> >> What I tried to do was to turn
> >>> >>> >>
> >>> >>> >> data <- list(one = c(1, 1), three = c(3), two = c(2, 2))
> >>> >>> >>
> >>> >>> >> into
> >>> >>> >>
> >>> >>> >> result <- list(one = 2, three = 1, two = 2)
> >>> >>> >>
> >>> >>> >> that is creating a new list which has the same names as the
> >first,
> >>> but
> >>> >>> >> where the values are the vector lengths.
> >>> >>> >>
> >>> >>> >> I know there are many other (and better) trivial ways of
> >achieving
> >>> >>> this -
> >>> >>> >> my aim is less the task itself, and more figuring out if this
> >can
> >>> be
> >>> >>> done
> >>> >>> >> using Reduce() in the fashion I showed in the other examples
> >I
> >>> gave.
> >>> >>> It's
> >>> >>> >> a
> >>> >>> >> building block of doing map-filter-reduce type pipelines that
> >I'd
> >>> >>> like to
> >>> >>> >> understand how to do in R.
> >>> >>> >>
> >>> >>> >> Fumbling in the dark, I tried:
> >>> >>> >>
> >>> >>> >> Reduce(function(acc, item) { setNames(c(acc, length(data[item])), item) },
> >>> >>> >>        names(data), accumulate=TRUE)
> >>> >>> >>
> >>> >>> >> but setNames sets all the names, not adding one - and acc is
> >still
> >>> a
> >>> >>> >> vector, not a list.
> >>> >>> >>
> >>> >>> >> It looks like 'lambda.tools.fold()' and possibly
> >'purrr.reduce()'
> >>> aim
> >>> >>> at
> >>> >>> >> doing what I'd like to do - but I've not been able to figure
> >out
> >>> quite
> >>> >>> >> how.
> >>> >>> >>
> >>> >>> >> Thanks
> >>> >>> >>
> >>> >>> >> Stefan
> >>> >>> >>
> >>> >>> >>
> >>> >>> >>
> >>> >>> >> On 27 July 2016 at 20:35, David Winsemius
> ><dwinsemius at comcast.net>
> >>> >>> wrote:
> >>> >>> >>
> >>> >>> >> >
> >>> >>> >> > > On Jul 27, 2016, at 8:20 AM, Stefan Kruger <
> >>> >>> stefan.kruger at gmail.com>
> >>> >>> >> > wrote:
> >>> >>> >> > >
> >>> >>> >> > > Hi -
> >>> >>> >> > >
> >>> >>> >> > > I'm new to R.
> >>> >>> >> > >
> >>> >>> >> > > In other functional languages I'm familiar with you can
> >often
> >>> >>> seed a
> >>> >>> >> call
> >>> >>> >> > > to reduce() with a custom accumulator. Here's an example
> >in
> >>> >>> Elixir:
> >>> >>> >> > >
> >>> >>> >> > > map = %{"one" => [1, 1], "three" => [3], "two" => [2, 2]}
> >>> >>> >> > > map |> Enum.reduce(%{}, fn ({k,v}, acc) ->
> >Map.update(acc, k,
> >>> >>> >> > > Enum.count(v), nil) end)
> >>> >>> >> > > # %{"one" => 2, "three" => 1, "two" => 2}
> >>> >>> >> > >
> >>> >>> >> > > In R-terms that's reducing a list of vectors to become a
> >new
> >>> list
> >>> >>> >> mapping
> >>> >>> >> > > the names to the vector lengths.
> >>> >>> >> > >
> >>> >>> >> > > Even in JavaScript, you can do similar things:
> >>> >>> >> > >
> >>> >>> >> > > list = { one: [1, 1], three: [3], two: [2, 2] };
> >>> >>> >> > > var result = Object.keys(list).reduceRight(function (acc,
> >>> item) {
> >>> >>> >> > >  acc[item] = list[item].length;
> >>> >>> >> > >  return acc;
> >>> >>> >> > > }, {});
> >>> >>> >> > > // result == { two: 2, three: 1, one: 2 }
> >>> >>> >> > >
> >>> >>> >> > > In R, from what I can gather, Reduce() is restricted such
> >that
> >>> any
> >>> >>> >> init
> >>> >>> >> > > value you feed it is required to be of the same type as
> >the
> >>> >>> elements
> >>> >>> >> of
> >>> >>> >> > the
> >>> >>> >> > > vector you're reducing -- so I can't build up. So whilst
> >I can
> >>> >>> do, say
> >>> >>> >> > >
> >>> >>> >> > >> Reduce(function(acc, item) { acc + item }, c(1,2,3,4,5),
> >96)
> >>> >>> >> > > [1] 111
> >>> >>> >> > >
> >>> >>> >> > > I can't use Reduce to build up a list, vector or data
> >frame?
> >>> >>> >> > >
> >>> >>> >> > > What am I missing?
> >>> >>> >> > >
> >>> >>> >> > > Many thanks for any pointers,
> >>> >>> >> >
> >>> >>> >> > This builds a list:
> >>> >>> >> >
> >>> >>> >> > > Reduce(function(acc, item) { c(acc , item) },
> >c(1,2,3,4,5), 96,
> >>> >>> >> > accumulate=TRUE)
> >>> >>> >> > [[1]]
> >>> >>> >> > [1] 96
> >>> >>> >> >
> >>> >>> >> > [[2]]
> >>> >>> >> > [1] 96  1
> >>> >>> >> >
> >>> >>> >> > [[3]]
> >>> >>> >> > [1] 96  1  2
> >>> >>> >> >
> >>> >>> >> > [[4]]
> >>> >>> >> > [1] 96  1  2  3
> >>> >>> >> >
> >>> >>> >> > [[5]]
> >>> >>> >> > [1] 96  1  2  3  4
> >>> >>> >> >
> >>> >>> >> > [[6]]
> >>> >>> >> > [1] 96  1  2  3  4  5
> >>> >>> >> >
> >>> >>> >> > But you are not saying what you want. The other examples
> >were
> >>> doing
> >>> >>> >> > something with names but you provided no names for the R
> >example.
> >>> >>> >> >
> >>> >>> >> > This would return a list of named vectors:
> >>> >>> >> >
> >>> >>> >> > > Reduce(function(acc, item) { setNames( c(acc,item),
> >1:(item+1))
> >>> >>> },
> >>> >>> >> > c(1,2,3,4,5), 96, accumulate=TRUE)
> >>> >>> >> > [[1]]
> >>> >>> >> > [1] 96
> >>> >>> >> >
> >>> >>> >> > [[2]]
> >>> >>> >> >  1  2
> >>> >>> >> > 96  1
> >>> >>> >> >
> >>> >>> >> > [[3]]
> >>> >>> >> >  1  2  3
> >>> >>> >> > 96  1  2
> >>> >>> >> >
> >>> >>> >> > [[4]]
> >>> >>> >> >  1  2  3  4
> >>> >>> >> > 96  1  2  3
> >>> >>> >> >
> >>> >>> >> > [[5]]
> >>> >>> >> >  1  2  3  4  5
> >>> >>> >> > 96  1  2  3  4
> >>> >>> >> >
> >>> >>> >> > [[6]]
> >>> >>> >> >  1  2  3  4  5  6
> >>> >>> >> > 96  1  2  3  4  5
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >> > > Stefan
> >>> >>> >> > >
> >>> >>> >> > >
> >>> >>> >> > >
> >>> >>> >> > > --
> >>> >>> >> > > Stefan Kruger <stefan.kruger at gmail.com>
> >>> >>> >> > >
> >>> >>> >> > >       [[alternative HTML version deleted]]
> >>> >>> >> > >
> >>> >>> >> > > ______________________________________________
> >>> >>> >> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and
> >more,
> >>> see
> >>> >>> >> > > https://stat.ethz.ch/mailman/listinfo/r-help
> >>> >>> >> > > PLEASE do read the posting guide
> >>> >>> >> > http://www.R-project.org/posting-guide.html
> >>> >>> >> > > and provide commented, minimal, self-contained,
> >reproducible
> >>> code.
> >>> >>> >> >
> >>> >>> >> > David Winsemius
> >>> >>> >> > Alameda, CA, USA
> >>> >>> >> >
> >>> >>> >> >
> >>> >>> >>
> >>> >>> >>
> >>> >>> >>
> >>> >>> >
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>
> >>> >>
> >>> >
> >>>
> >>>
> >>>
> >>
> >>
>
>


-- 
Stefan Kruger <stefan.kruger at gmail.com>
