F...ancy PHP: lazy evaluation and concurrency


November 8, 2018

"F...ancy PHP" series answers a challenge: what features of the more well-regarded languages does PHP already have? What others can it emulate?

In this article I will cover how to use PHP to achieve, or simulate, two other common features of functional languages: lazy evaluation and concurrency.

Lazy evaluation

Lazy evaluation means that a program delays evaluation of an expression until its value is needed. (The opposite term is eager evaluation.) Lazy evaluation allows you to create infinite data structures, and, with careful use - increase performance through a lower memory footprint, and skipping some calculations. This strategy is commonly paired with memoization, that is, caching the execution results for a given combination of arguments.

Built-in PHP data structures (arrays) and functions are not lazy (compare: array_map in PHP vs map in Clojure.) However, the language itself does contain features that allow us to implement lazy evaluation: iterators and generators.

Iterators

An Iterator is an interface defining functionality of in-order traversal, for example inspecting elements of a singly linked list. PHP iterators can usually be traversed multiple times by using method rewind().

To create a lazily evaluated collection, you can therefore write a class that implements Iterator interface, and generates subsequent collection elements on-demand, whenever someone calls methods current() or next().

In an attempt to provide composability, standard PHP library provides a number of pre-implemented iterators, mixing actual first-order functionality (eg. glob iterator), lazy evaluation strategy helpers (eg. filter iterator), attempts at modelling traversal of nested collections (anything that implements RecursiveIterator.) Unfortunately, they're poorly documented and sometimes work in unintuitive ways.

Iterators have existed in PHP since version 5.0, released in year 2004.

Generators

Generators, on the other hand, are functions that can "freeze" their current state mid-execution and return an intermediate execution value. This is achieved through operator yield (translated: freeze-and-return).

You could also say that in PHP, generators are functions that return single-use iterators. Every call to a generator function actually returns an instance of an internal class Generator, which implements the Iterator interface. In other words, generator functions are conveniences that allow you to skip the boilerplate of implementing Iterator over and over again.

To provide memoization of sorts and turn a generator into a reusable iterator, you might think to wrap the generator in a CachingIterator. Unfortunately, it doesn't work in that way: it's more like an equivalent of tee). After the first full run, you need to call getCache() to fetch the cached results. Calling rewind() on CachingIterator will cause it to attempt to rewind the internal generator and crash. So in reality it's more convenient to write your own wrapper.

A generator can finish execution prematurely through a return statement or a thrown exception. If you want to prevent memory leaks and ensure that resources opened by a generator are always freed, wrap the sensitive part in a try ... finally construct.

Generators were added to PHP in version 5.5 which hit the ground in 2013, nine years after iterators.

Concurrency

Promises, futures, channels, coroutines are the different ways in which other languages provide concurrency, that is, the ability to decompose a program into chunks that can run independently and out-of-order.

Coroutines are functions that can suspend execution mid-way and pass control to some other piece of code. As it happens, we can (sort of) implement coroutines with generators.

Passing control to and from PHP generators

In the following situation:

A generator has the following options of passing control:

  • $optional = yield $value passes control to caller temporarily. When passing control back to callee, caller can pass an optional value (see send(), below)
  • yield from $hiddenDependency, a variation on yield, passes control to first $hiddenDependency and then caller. (It creates a funnel of values from $otherGenerator to the calling code that lasts until $otherGenerator has finished yielding values.)
  • return $value passes control to caller permanently, and marks callee as closed. (It functions like a break of the generator loop.)

And the calling code has the following ways of passing control to a generator:

  • Iterator::current() passes control to generator and returns the most recently yielded value (throws when called on a closed generator)
  • Iterator::next() passes control to generator (throws when called on an closed generator)
  • Generator::send($value) passes control to a generator and passes $value to it. (You can think of Generator::next() as calling Generator::send(null))

And since we're doing an overview of useful generator methods, you should also know about these:

  • Iterator::valid() checks if the generator is closed (finished generating values)
  • Generator::getReturn() fetches the returned value of a closed generator(current() is called on an open generator and getReturn() on a closedone: you can think of the latter as a program exit code.)

Implementing coroutines with generators

First, let's look at an example of generators used for their primary purpose: lazily generating a sequence of values, in batches.

function lineReader($resource) {
    // generator can relinquish control without yielding any meaningful value
    $batchSize = yield;

    while (is_int($batchSize) && $batchSize > 0) {
        $lines = [];

        for ($i = 0; $i < $batchSize && !feof($resource); $i++) {
            $lines []= rtrim(fgets($resource), "\n\r");
        }

        $batchSize = yield $lines;
    }
}

function stringEncoder(Generator $batchSource) {
    // always fetch lines in batches of 3
    while ($lines = $batchSource->send(3)) {
        foreach ($lines as $line) {
            yield urlencode($line);
        }
    }
}

$stdin = fopen('php://stdin', 'r');
$reader = lineReader($stdin);
$encoder = stringEncoder($reader);

foreach ($encoder as $line) {
    if ($line == '.') {
        break;
    }

    print("$line\n");
}

fclose($stdin);

Now, there is a feature of coroutines that generators do not perform out of the box, and that is explicitly specifying the coroutine to which the control passes. You can yield from in a loop, kinda-sorta, but I think that'll eventually run out of stack space.

So instead, you can provide something like a symbols array and a call loop yourself. Simples!

function ping() {
    yield;
    for ($i = 0; $i < 5; $i++) {
        print("Ping! $i\n");
        yield 'pong';
    }
}

function pong() {
    yield;
    for ($i = 0; $i < 5; $i++) {
        print("Pong! $i\n");
        yield 'ping';
    }
}

$instructions = ['ping', 'pong'];
$instructions = array_combine($instructions, array_map('call_user_func', $instructions));

$instruction = 'ping';

while (!empty($instruction) && isset($instructions[$instruction])) {
    $instructions[$instruction]->next();
    $instruction = $instructions[$instruction]->current();
}
Tags: hack php