Enumeration is Ruby's term for iteration, and Ruby's conventions for it differ significantly from those of other programming languages. Let us first establish some terms:
Enumerable, a built-in mixin. Other languages usually call this "iterable".
Enumerator, a built-in class. Other languages usually call this "iterator".
The most basic way to iterate over something in Ruby is to call the #each
method on an object, with a block:
[1, 2, 3].each { |x| puts x }
The #each method forms the backbone of enumeration in Ruby. For example, it is
very common to make something enumerable by including the mixin Enumerable in
a class, and the only prerequisite for using this mixin is implementing the
#each method. This allows the class to get a lot of methods "for free", e.g.
#any? and #all? can both be implemented in terms of #each.
Let's say we want to make an object that produces 1, 2, and 3. The easy
way is to write (1..3), but let's do this from scratch:
class Podium
  include Enumerable
  def each
    yield 1
    yield 2
    yield 3
  end
end
Here, we are on the other side of the fence: rather than using #each, we are
now implementing #each. The yield keyword is of interest: it calls the
block that is passed to the method. In other words, the following code is
equivalent to the code above:
class Podium
  include Enumerable
  def each(&blk)
    blk.call(1)
    blk.call(2)
    blk.call(3)
  end
end
The & sigil in &blk binds the block, provided by the caller, to the name
blk. If the caller failed to provide a block, then blk becomes nil. The
following two are equivalent:
yield v if block_given?
blk.call(v) unless blk.nil?
Remember that a block by itself is not an object. Consult "What is the difference between a block, a proc, and a lambda in Ruby?" for more information on this topic.
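To make the equivalence concrete, here is a minimal sketch that puts the two implementations side by side. The class names PodiumWithYield and PodiumWithBlock are made up for this illustration, and both variants add a guard for the missing-block case, which the listings above leave out.

```ruby
# Two hypothetical variants of Podium: one uses `yield`, the other an
# explicit block parameter. The class names are illustrative only.
class PodiumWithYield
  include Enumerable

  def each
    return unless block_given? # no block given: nothing to call
    yield 1
    yield 2
    yield 3
  end
end

class PodiumWithBlock
  include Enumerable

  def each(&blk)
    return if blk.nil? # blk is nil when the caller passes no block
    blk.call(1)
    blk.call(2)
    blk.call(3)
  end
end

[PodiumWithYield, PodiumWithBlock].each do |klass|
  podium = klass.new
  # These methods come from Enumerable and are built on top of #each.
  p podium.to_a                # [1, 2, 3]
  p podium.any? { |x| x > 2 }  # true
  p podium.all? { |x| x > 2 }  # false
end
```

Both classes print the same three results, which is the point: Enumerable only cares that #each hands every element to the block, not how the block is invoked.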
Warning: if you are coming from Python, the yield keyword in Ruby does not turn a function into a generator. The strict Ruby equivalent of a Python generator is called a Fiber, and fibers are usually not used for iteration, unlike their Python counterparts. Hang tight for a straightforward Ruby analog to Pythonic iterative generators! We will explore them near the end.
The #each method effectively divorces how to iterate over a container from
what to do with each iterated element. The method handles the how, and the
given block handles the what. In other words, the method is responsible for
materializing every element inside the container.
What if we only want the very first element from #each, and we do not have
access to other methods, such as #[]? That is where an enumerator comes in.
Starting from Ruby 1.9, the convention is that if we call #each without a
block, then it returns an enumerator object. We can then call the #next method
on that enumerator to produce just one element. We can see this behavior in a
built-in array:
irb(main):001:0> a = [3, 5, 7]
=> [3, 5, 7]
irb(main):002:0> e = a.each
=> #<Enumerator: [3, 5, 7]:each>
irb(main):003:0> e.next
=> 3
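Beyond #next, an enumerator offers a few more ways to step through elements by hand. The sketch below is an illustration rather than part of the original walkthrough; the helper name walk_enumerator is hypothetical, while #peek, #rewind, and the StopIteration exception are part of Ruby's Enumerator API.

```ruby
# Walk an array's enumerator by hand and record what each call returns.
def walk_enumerator(array)
  e = array.each # no block given, so we get an Enumerator
  seen = []
  seen << e.next # first element
  seen << e.peek # look at the next element without consuming it
  seen << e.next # consume the element we just peeked at
  e.rewind       # start over from the beginning
  seen << e.next
  seen
end

p walk_enumerator([3, 5, 7]) # [3, 5, 5, 3]

# Calling #next past the end raises StopIteration. Kernel#loop rescues
# that exception and ends the loop quietly.
e = [3, 5, 7].each
loop { puts e.next }
```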
Our Podium class doesn't have this behavior yet, but Ruby provides the
Object#to_enum method on all objects to make it easy to support this behavior:
class Podium
  include Enumerable
  def each
    return to_enum unless block_given?
    yield 1
    yield 2
    yield 3
  end
end
It works exactly the same way:
irb(main):011:0> p = Podium.new
=> #<Podium:0x00007fe189942358>
irb(main):012:0> e = p.each
=> #<Enumerator: #<Podium:0x00007fe189942358>:each>
irb(main):013:0> e.next
=> 1
A deeper question is: why would we want to materialize one element at a time? The value lies in space efficiency: by supporting one-at-a-time materialization, we can operate in a low-memory environment without fear of running out of memory when given an enormous data source.
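As a small illustration of one-at-a-time materialization, consider a source that never ends. The Naturals class below is hypothetical and exists only for this sketch; it uses the same to_enum convention as Podium.

```ruby
# Hypothetical class: an endless source of natural numbers.
class Naturals
  include Enumerable

  def each
    return to_enum unless block_given?
    n = 0
    loop do
      yield n
      n += 1
    end
  end
end

e = Naturals.new.each
p e.next # 0
p e.next # 1
p e.next # 2

# Enumerable#first stops as soon as it has enough elements.
p Naturals.new.first(5) # [0, 1, 2, 3, 4]
```

Collecting such a source into an array would never finish, yet pulling a handful of elements from it is cheap because nothing beyond the requested elements is ever produced.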
In Ruby, the #to_proc method is the convention for converting something into a
callable. For example, the built-in type Symbol implements #to_proc, so that
:odd?.to_proc produces a proc equivalent to the block { |x| x.odd? }. This is
so common that the unary operator &, applied to an argument in a method call,
calls #to_proc for us and passes the result as the block, allowing us to
abbreviate the whole thing to &:odd?.
The choice of the punctuation & visually connects to writing &blk in a
parameter list, where & similarly means "capture a block as a proc".
For a deeper dive, consult "How Does Symbol#to_proc Work?".
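The sketch below shows the conversion at work, first with a symbol and then with an object of our own. The Multiplier class is hypothetical and made up for this example; the only thing it relies on is that & calls #to_proc on whatever it is given.

```ruby
# Symbol#to_proc builds a proc that calls the named method on its argument.
is_odd = :odd?.to_proc
p is_odd.call(3) # true

# These two calls select the same elements.
p [1, 2, 3, 4].select { |x| x.odd? } # [1, 3]
p [1, 2, 3, 4].select(&:odd?)        # [1, 3]

# Any object that responds to #to_proc works with &.
class Multiplier
  def initialize(factor)
    @factor = factor
  end

  def to_proc
    ->(x) { x * @factor }
  end
end

p [1, 2, 3].map(&Multiplier.new(10)) # [10, 20, 30]
```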
A common idiom in Ruby is to chain multiple enumerable methods in order to produce a list of results. For example, if we want to take the numbers from 0 to 99 (inclusive), add one to each of them, pick only the odd ones, and sum them up, we could write:
(0..99)
  .map(&:succ)
  .select(&:odd?)
  .reduce(0, &:+)
The (0..99) is syntax sugar for constructing a Range object. The desugared form
is Range.new(0, 99). This chain of methods looks great until we look at what
each step of the process does:
irb(main):001:0> (0..99).map(&:succ)
=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
It turns out that each call produces an entirely new array! If our data source had a million elements, we would be creating an additional, intermediate million-long array for each additional method we add to the chain.
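To see the intermediates explicitly, the chain can be run one step at a time while keeping a handle on each result. This is only a sketch; the helper name eager_steps is made up for the illustration.

```ruby
# Run the eager chain step by step and keep a handle on each intermediate.
def eager_steps(range)
  incremented = range.map(&:succ)          # a brand-new Array
  odds        = incremented.select(&:odd?) # another brand-new Array
  total       = odds.reduce(0, &:+)
  { incremented: incremented, odds: odds, total: total }
end

steps = eager_steps(0..99)
p steps[:incremented].class                # Array
p steps[:incremented].size                 # 100
p steps[:odds].size                        # 50
p steps[:incremented].equal?(steps[:odds]) # false: two distinct objects
p steps[:total]                            # 2500
```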
In the case of arrays, we can cheat this cost away by using mutative methods:
(0..99)
  .to_a
  .map!(&:succ)
  .select!(&:odd?)
  .reduce(0, &:+)
Here, we first create a new array from (0..99) using #to_a, which guarantees
us access to those mutative methods. But this forces us to remember when
mutative methods are available, and which ones are possible. A cleaner
solution would be for every method call to avoid doing any work
until we actually want the result!
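One more reason to be wary of the mutative chain: Array#select! returns nil when it removes nothing, so a chain that continues after it can fail depending on the data. The sketch below, with the hypothetical helper name sum_odd_successors_in_place, uses Array#keep_if instead, which always returns the array.

```ruby
# Sum the odd successors of a range while mutating a single scratch array.
def sum_odd_successors_in_place(range)
  scratch = range.to_a    # one new Array; the Range itself is untouched
  scratch.map!(&:succ)    # mutates scratch in place
  scratch.keep_if(&:odd?) # like select!, but always returns the array
  scratch.reduce(0, &:+)
end

p sum_odd_successors_in_place(0..99) # 2500

# The caveat in action: select! returns nil when nothing is removed.
p [1, 3, 5].select!(&:odd?) # nil
p [1, 3, 5].keep_if(&:odd?) # [1, 3, 5]
```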
In Rails, when dealing with data models, ActiveRecord actually does avoid doing any work until you want the result. Every method you call gives back a relation object, which records the SQL query it has built up so far. It's not until you try to use a model object that the query gets executed!
It turns out that there is a method called Enumerator#lazy that gives us the
ability to eliminate intermediates for free:
(0..99)
  .each
  .lazy
  .map(&:succ)
  .select(&:odd?)
  .reduce(0, &:+)
The call above does not build any intermediate arrays. This is so common, in
fact, that another convention is that if an object supports returning an
enumerator from #each, it also supports calling #lazy as a shorthand for
.each.lazy:
(0..99)
  .lazy
  .map(&:succ)
  .select(&:odd?)
  .reduce(0, &:+)
We can now apply the same behavior to Podium:
class Podium
  include Enumerable
  def each
    return to_enum unless block_given?
    yield 1
    yield 2
    yield 3
  end
  def lazy
    each.lazy
  end
end
We can verify that it's not until we actually try to use each value that those values are calculated:
irb(main):015:0> p = Podium.new
=> #<Podium:0x00007fb64cf13ce0>
irb(main):016:0> p.lazy.map(&:succ).reject(&:odd?)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: #<Podium:0x00007fb64cf13ce0>:each>>:map>:reject>
irb(main):017:0> p.lazy.map(&:succ).reject(&:odd?).each { |i| puts i }
2
4
=> nil
The Enumerator::Lazy class, used to implement Enumerator#lazy, also supports
a #force method that lets us force an eager evaluation all at once:
irb(main):018:0> p.lazy.map(&:succ).reject(&:odd?).force
=> [2, 4]
Its behavior is the same as calling #to_a.
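One way to observe the laziness directly is to record which source elements a pipeline actually touches. The sketch below does this against an endless range; the helper name first_doubled_multiples_of_three and the touched array are made up for the illustration.

```ruby
# Double each number, keep the multiples of three, and stop after `count`
# results, recording every source element the pipeline touched on the way.
def first_doubled_multiples_of_three(count)
  touched = []
  result = (1..Float::INFINITY)
    .lazy
    .map { |x| touched << x; x * 2 }
    .select { |x| (x % 3).zero? }
    .first(count)
  [result, touched]
end

result, touched = first_doubled_multiples_of_three(2)
p result  # [6, 12]
p touched # [1, 2, 3, 4, 5, 6]
```

Only six source elements are ever visited, even though the range has no end. Without the call to #lazy, the #map step would try to process the entire range and never return.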
So far we've built an enumerator by having an object wrap itself in an
enumerator. It turns out there is a second way to do this: by passing a block to
Enumerator.new:
def podium
  Enumerator.new do |yielder|
    yielder.yield 1
    yielder.yield 2
    yielder.yield 3
  end
end
It can be used in exactly the same way:
irb(main):008:0> p = podium
=> #<Enumerator: #<Enumerator::Generator:0x00007fdaa15e9d08>:each>
irb(main):009:0> p.lazy.map(&:succ).reject(&:odd?)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: #<Enumerator::Generator:0x00007fdaa15e9d08>:each>>:map>:reject>
irb(main):010:0> p.lazy.map(&:succ).reject(&:odd?).each { |i| puts i }
2
4
=> nil
Making an enumerator from scratch is very handy if we find ourselves not needing a class.
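The block form is especially convenient when the sequence carries state between elements. As a sketch, here is an endless Fibonacci sequence; the method name fibonacci is hypothetical and chosen only for this example.

```ruby
# Hypothetical helper: an endless Fibonacci sequence built with Enumerator.new.
def fibonacci
  Enumerator.new do |yielder|
    a, b = 0, 1
    loop do
      yielder.yield a
      a, b = b, a + b
    end
  end
end

p fibonacci.take(10)                      # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
p fibonacci.lazy.select(&:even?).first(4) # [0, 2, 8, 34]
```

The block only runs as far as the consumer asks, so the endless loop inside it is harmless as long as callers use methods that stop early, such as #take, #first, or a lazy chain.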
If you're coming from Python, the block form of Enumerator.new is the closest Ruby analog to Python generators. In fact, the low-level way to use an enumerator works almost exactly the same way:
irb(main):008:0> p = podium
=> #<Enumerator: #<Enumerator::Generator:0x00007fa2021f2b78>:each>
irb(main):009:0> p.next
=> 1
irb(main):010:0> p.next
=> 2
irb(main):011:0> p.next
=> 3
irb(main):012:0> p.next
Traceback (most recent call last):
        2: from (irb):12
        1: from (irb):12:in `next'
StopIteration (iteration reached an end)
Enumerators in Ruby are also powered by fibers, which brings us to the question: when should one be used over the other? If the intent is to make something iterable, then enumerators are a better choice, because they include the Enumerable mixin by default. Otherwise, if the purpose is to use coroutines as a way to perform flow control, then fibers provide a much more direct API.
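For comparison, here is a minimal sketch of the same 1, 2, 3 sequence written directly against the Fiber API. The helper name podium_fiber is made up for this example; Fiber.new, Fiber.yield, #resume, and FiberError are part of Ruby's core library.

```ruby
# Hypothetical helper: a fiber that hands back 1, 2, and 3 on successive resumes.
def podium_fiber
  Fiber.new do
    Fiber.yield 1
    Fiber.yield 2
    3 # the block's final value is returned by the last resume
  end
end

fiber = podium_fiber
p fiber.resume # 1
p fiber.resume # 2
p fiber.resume # 3

# Resuming a finished fiber raises FiberError.
begin
  fiber.resume
rescue FiberError => e
  puts "fiber is finished: #{e.class}"
end
```

Note that the fiber offers none of the Enumerable methods: there is no #map, #select, or #lazy here, only #resume. That is the trade-off described above.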