Iterators
Generators are a simple way of creating iterator functions, i.e. functions that you can call repeatedly to obtain a new value each time. The iteration protocol is described in ?iterator.
Here is a simple iterator that iterates over the elements of 1:3:
library(coro)
iterator <- as_iterator(1:3)
# Call the iterator to retrieve new values
iterator()
#> [1] 1
iterator()
#> [1] 2
Once the iterator is exhausted, it returns a sentinel value that signals to its caller that there are no more values available:
# This is the last value
iterator()
#> [1] 3
# This is the exhaustion sentinel
iterator()
#> .__exhausted__.
In R we normally don’t use this sort of iteration to work with vectors. Instead, we use the idiomatic techniques of vectorised programming. Iterator functions are useful for very specific tasks:
Iterating over chunks of data when the full dataset doesn’t fit in memory.
Generating sequences when you don’t know in advance how many elements you will need. These sequences may be complex and even infinite.
The iterator protocol is designed to be free of dependencies. However, the easiest way to create an iterator is by using the generator factories provided in this package.
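Because the protocol only involves plain functions and the exhaustion sentinel, an iterator can also be written by hand as a closure. The following is a minimal sketch of the chunking use case mentioned above. chunk_iterator() is a hypothetical helper, not part of coro, and the sketch assumes that exhausted() returns the sentinel that is_exhausted() detects:

```r
library(coro)

# Hypothetical helper (not part of coro): returns an iterator function
# that hands out successive chunks of `x`, each with at most `size` elements.
chunk_iterator <- function(x, size) {
  start <- 1L
  function() {
    # Signal exhaustion once every element has been handed out
    if (start > length(x)) {
      return(exhausted())
    }
    end <- min(start + size - 1L, length(x))
    chunk <- x[start:end]
    start <<- end + 1L
    chunk
  }
}

chunks <- chunk_iterator(1:5, size = 2)
chunks()
#> [1] 1 2
chunks()
#> [1] 3 4
chunks()
#> [1] 5
is_exhausted(chunks())
#> [1] TRUE
```

In a real chunking task the closure would read the next block from a file or a database connection instead of subsetting a vector that is already in memory.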
Generators
Generators create functions that can yield, i.e. suspend themselves. When a generator reaches a yield(value) statement it returns the value as if you called return(value). However, calling the generator again resumes the function right where it left off. Because they preserve their state between invocations, generators are ideal for creating iterator functions.
generator() creates an iterator factory. This is a function that returns fresh iterator functions:
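The examples in the rest of this section call a factory named generate_abc(). As a minimal sketch consistent with the outputs shown, it can be assumed to yield the first three lowercase letters in turn:

```r
library(coro)

# Assumed definition: a generator factory that yields "a", "b" and "c".
# Each call to generate_abc() returns a fresh, independent iterator.
generate_abc <- generator(function() {
  for (x in letters[1:3]) {
    yield(x)
  }
})
```

Note that generate_abc itself is the factory; the iterator is the function you get back from calling it.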
# Create the iterator
abc <- generate_abc()
# Use the iterator by invoking it
abc()
#> [1] "a"
abc()
#> [1] "b"
Once the last loop in a generator has finished iterating (here there is only one), it returns the exhaustion sentinel:
# Last value
abc()
#> [1] "c"
# Exhaustion sentinel
abc()
#> .__exhausted__.
abc()
#> .__exhausted__.
You can also create infinite iterators that can’t be exhausted:
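As a minimal sketch, an infinite generator can wrap yield() in a repeat loop that never terminates on its own. The name generate_natural_numbers and its from argument are chosen here for illustration:

```r
library(coro)

# Illustrative factory: yields from, from + 1, from + 2, ... without end.
# The repeat loop never finishes, so the exhaustion sentinel is never returned.
generate_natural_numbers <- generator(function(from = 1L) {
  x <- from
  repeat {
    yield(x)
    x <- x + 1L
  }
})

natural_numbers <- generate_natural_numbers(from = 10L)
natural_numbers()
#> [1] 10
natural_numbers()
#> [1] 11
```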
Iterating
Iterating manually over an iterator function is a bit tricky because you have to watch out for the exhaustion sentinel:
abc <- generate_abc()
while (!is_exhausted(x <- abc())) {
  print(x)
}
#> [1] "a"
#> [1] "b"
#> [1] "c"
A simpler way is to iterate with a for loop using the iterate() helper. Within iterate(), for understands the iterator protocol:
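A minimal sketch of the for-loop form follows. It uses loop(), the name under which recent coro releases export this helper; if your installed version provides iterate(), as in the text above, substitute that name. The definition of generate_abc() is an assumption that matches the outputs shown elsewhere in this section:

```r
library(coro)

# Assumed definition of the factory used throughout this section
generate_abc <- generator(function() {
  for (x in letters[1:3]) {
    yield(x)
  }
})

# The for loop stops by itself when the iterator returns the
# exhaustion sentinel, so no explicit is_exhausted() check is needed.
loop(for (x in generate_abc()) {
  print(x)
})
#> [1] "a"
#> [1] "b"
#> [1] "c"
```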
You can also collect all remaining values of an iterator in a list with collect():
abc <- generate_abc()
collect(abc)
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
#>
#> [[3]]
#> [1] "c"
Beware that trying to exhaust an infinite iterator is a programming error. This causes an infinite loop that never returns, forcing the user to interrupt R with ctrl-c. Make sure that you iterate over an infinite iterator only a finite number of times:
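Two ways of bounding the iteration are sketched below: calling the iterator by hand a fixed number of times, and capping collect() with its n argument. The generate_natural_numbers() factory is an illustrative assumption:

```r
library(coro)

# Illustrative infinite generator: 1, 2, 3, ...
generate_natural_numbers <- generator(function(from = 1L) {
  x <- from
  repeat {
    yield(x)
    x <- x + 1L
  }
})

numbers <- generate_natural_numbers()

# Call the iterator a fixed number of times
for (i in 1:3) {
  print(numbers())
}
#> [1] 1
#> [1] 2
#> [1] 3

# Or collect at most `n` of the remaining values
next_two <- collect(numbers, n = 2)
str(next_two)
#> List of 2
#>  $ : int 4
#>  $ : int 5
```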
Adapting generators
A generator factory can take another iterator as argument to modify its values. This pattern is called adapting:
library(magrittr)
adapt_toupper <- generator(function(i) {
  for (x in i) {
    yield(toupper(x))
  }
})
ABC <- generate_abc() %>% adapt_toupper()
ABC()
#> [1] "A"
ABC()
#> [1] "B"
Once the modified iterator is exhausted, the adaptor automatically closes as well:
ABC()
#> [1] "C"
ABC()
#> .__exhausted__.
As a user, you might not want to create an iterator factory for a one-off adaptor. In this case you can use gen() instead of generator(). This enables a more Pythonic style of working with iterators:
abc <- generate_abc()
ABC <- gen(for (x in abc) yield(toupper(x)))
collect(ABC)
#> [[1]]
#> [1] "A"
#>
#> [[2]]
#> [1] "B"
#>
#> [[3]]
#> [1] "C"
Or you can use the general-purpose adaptor adapt_map(). It maps a function over each value of an iterator:
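The sketch below shows the idea. So that the example is self-contained, adapt_map() is defined locally as a generator factory; this definition and its argument names (.i, .fn) are assumptions made for illustration, not a documented signature. generate_abc() is likewise assumed to yield "a", "b" and "c":

```r
library(coro)

# Assumed definition of the factory used throughout this section
generate_abc <- generator(function() {
  for (x in letters[1:3]) {
    yield(x)
  }
})

# Illustrative adaptor: applies `.fn` to each value of the iterator `.i`
adapt_map <- generator(function(.i, .fn) {
  for (x in .i) {
    yield(.fn(x))
  }
})

ABC <- adapt_map(generate_abc(), toupper)
ABC()
#> [1] "A"
ABC()
#> [1] "B"
```

Because the adaptor takes the iterator as its first argument, it also composes with the pipe, as in generate_abc() %>% adapt_map(toupper).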