La Lambda

By | 2013-07-21

Let’s say we’ve loaded a database table into memory and now we want to do some work with it. I’m imagining a case where we’ve selected a small subset of records from an enormous table (i.e. it wasn’t cheap to fetch) and we want to pull records that meet particular conditions from our subset. Let’s say it’s customer transaction records, tens from millions, now we want to display them in various ways in a UI. We’d rather not increase load on the database for these UI manipulations, so we’re doing them locally in our client. (I appreciate that I’m contriving a situation to hang the following examples around)

The subset of records we’re working with will be represented by a list<TTransactionRecord> object, CustomerTransactions. Let’s say that we want this filtered into two lists – all the debits and all the credits. Here’s a first pass:

// Example #1
void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    // Expensive database read -- do this once
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionDebits;
    list<TTransactionRecord> TransactionCredits;

    // Filter for our needs locally
    for( const auto &transaction : CustomerTransactions ) {
        // Predicate #1
        if( transaction.isDebit() ) {
            TransactionDebits.push_back(transaction);
        }
    }
    for( const auto &transaction : CustomerTransactions ) {
        // Predicate #2
        if( transaction.isCredit() ) {
            TransactionCredits.push_back(transaction);
        }
    }
}

Fairly typical. I’ve purposely split the search into two loops, one for each predicate (a predicate just being a fancy word for any object/function that can be tested for boolean response on a given input).

Whenever you find yourself looping through a container, your first thought should be to reach for one of the C++ algorithms to replace your loop. We’ll be looking at one in particular here, copy_if(). It’s used like this:

    #include <algorithm>
    // ...
    copy_if( input.cbegin(), input.cend(), output.begin(), PREDICATE );

The proviso is that the output iterator must be able to cope with allocating new memory when it is assigned to and incremented. vector<> in particular does not support that and will SEGFAULT if you use it as above. The STL has got you covered though. We replace the simple output iterator with a smart iterator:

    #include <algorithm>
    #include <iterator>
    // ...
    copy_if( input.cbegin(), input.cend(), back_inserter(output), PREDICATE );

Magic; back_inserter() creates an iterator that converts assignments to, effectively, push_back(), which vector<> will handle regardless of the memory it already has allocated.

PREDICATE is the special part here. Before the output is assigned to, the element in question is passed to the function referenced by PREDICATE, which gives a boolean response. Essentially it provides the “if” part of our loop.

Let’s rewrite using copy_if().

// Example #2
bool customerDebitPredicate( const TTransactionRecord &O ) {
    return O.isDebit();
}
bool customerDebitPredicate( const TTransactionRecord &O ) {
    return O.isCredit();
}

void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionDebits;
    list<TTransactionRecord> TransactionCredits;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionDebits), customerDebitPredicate);
    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionCredits), customerCreditPredicate);
}

It’s arguable whether this is actually an improvement. It’s certainly not reduced the number of lines of code enough to be considered simpler; and worse it’s separated the predicate definitions from their usage. If we’re reading this code we have now to stop when we reach the copy_if() to go and see what the customerDebitPredicate() actually does; and it’s not immediately obvious at that point that customerDebitPredicate is a function. However, we’re not done yet, this is only a step along the way, so let’s hold our objections for now.

Let’s take an aside. What is a function? Disregarding how it’s implemented, from our point of view as a user of the language, a function is a reference of a type that supports using the unary operator () on it. Compare that with say, the unary postincrement operator ++. What, fundamentally, is the difference between these two constructs?

    x();
    y++;

The answer is: none. The default implementation of operator()() (for that is the C++ way of naming the operator we’re talking about) is to call the reference (for those types that are callable – function references). C++ grants us the ability to add support for operator()() to our own types though, so where this is illegal:

    int x;
    x();

This might not be:

    TCustomType x;
    x();

The times it isn’t illegal is when we have done this:

class TCustomType {
  public:
    void operator()() {
        // ... whatever ...
    }
}

Objects of this form are often called function objects or functors. They’re objects that “look like” functions. With this in mind, let’s make another change to our filter function:

// Example #3
class TTransactionDebitPredicate {
  public:
    bool operator()( const TTransactionRecord &O ) { return O.isDebit(); }
}
class TTransactionCreditPredicate {
  public:
    bool operator()( const TTransactionRecord &O ) { return O.isCredit(); }
}

void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionDebits;
    list<TTransactionRecord> TransactionCredits;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionDebits), TTransactionDebitPredicate() );
    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionCredits), TTransactionCreditPredicate() );
}

Note that the copy_if() calls have changed, we have to pass instantiations (albeit temporary ones) of the predicate objects, as the operator()() implementations are not static, they must be called on an instance.

You might now be thinking, “this is even worse, it’s even more complicated than it was”. Yes, it is; and in C++98 and in this particular case, what I’ve done here wouldn’t be justified. I’ll return to this later, let it slide temporarily. For now, I’d like to take a detour to show why this particular construction can be made to be useful.

What if we were searching with a predicate more complicated than a simple boolean? Let’s say we want to filter the transactions so that we get all those above five pounds and all those below zero (i.e. refunds).

// Example #4
class TTransactionBigPredicate {
  public:
    bool operator()( const TTransactionRecord &O ) {
        return O.value() > 5;
    }
}
class TTransactionSmallPredicate {
  public:
    bool operator()( const TTransactionRecord &O ) {
        return O.value() < 0;
    }
}

void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;
    list<TTransactionRecord> TransactionSmalls;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs), TTransactionBigPredicate() );
    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionSmalls), TTransactionSmallPredicate() );
}

We’ve gone to all the effort of writing a class to test for greater-than-5. It’s so specific in purpose that it’s unlikely it’ll be needed anywhere else. The compiler will almost certainly optimise the class and call away, so it’s unlikely to have a runtime cost, but it has a think-time cost. Why bother? The answer is that we wouldn’t; but with one small change we can suddenly make it worth the bother, and reveal the key power of functors. The key change is to add a constructor. For clarity, I’ll just do the greater-than-5 version…

// Example #5
class TTransactionGreaterThanPredicate {
  public:
    TTransactionGreaterThanPredicate(int t) : mThreshold(t) {}
    bool operator()( const TTransactionRecord &O ) {
        return O.value() > mThreshold;
    }
  protected:
    int mThreshold;
}

void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs), TTransactionGreaterThanPredicate(5) );
}

Look carefully at the magic contained here. copy_if() requires a function with a very particular function signature: bool()(const TTransactionRecord &O). We cannot change that signature, so we cannot parameterise that call to suit us. However, by passing a functor that can supply a function of that signature, we are able to parameterise the functor however we like via its constructor. Here we’ve used a simple literal, but it could be anything – a reference to another transaction say; an object that reads from a UI settings box; one that fetches a threshold from a file. We can add an arbitrary level of complexity to the predicate without having to change the search function one bit, copy_if() remains the same. This can be incredibly powerful. Even more so once you realise that you can even template the functor.

// Example #6
template<typename T>
class TGreaterThanPredicate {
  public:
    TGreaterThanPredicate(int t) : mThreshold(t) {}
    bool operator()( const T &O ) {
        return O.value() > mThreshold;
    }
  protected:
    int mThreshold;
}

void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs),
        TGreaterThanPredicate<TTransactionRecord>(5) );
}

This is not a practical example, I can’t think of a time you would use this exact form, but it demonstrates the principle. TGreaterThanPredicate<> will supply ‘greater than’ functionality for any class that supplies a .value() member.

While we’ve, hopefully, now shown how the extra complexity can be useful and worth having it would be nice if we could get it to be a little less verbose. With C++98 we were stuck. With C++11, we have a new syntax that allows just that: lambdas.

In most respects we’ve already seen lambdas. They are, in operation, exactly like one of the functors we’ve already seen. Lambdas are simply syntactic sugar to make it possible to define an anonymous functor inline.

Remember our first objection to splitting the predicates into their own functions? That we made it necessary to break off our reading of the copy_if() to read the predicate definition? Lambdas remove that necessity. The syntax for defining a (simple) lambda inline is this:

auto refToLamba = [](bool param){ return param; };

The trigger is the “[]”, and you then write as you would a function definition, with whatever parameters you wish (those parameters will usually be determined by the caller you’re passing the lambda to). Let’s rewrite example #4 with lambdas to show that happening:

// Example #7
void filterTransactions() {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;
    list<TTransactionRecord> TransactionSmalls;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs),
        [](const TTransactionRecord &O){ return O.isDebit();} );
    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionSmalls),
        [](const TTransactionRecord &O){ return O.isDebit();} );
}

The body of the lambda is the body of the operator()(const TTransactionRecord &O) that we wrote in example #4. The compiler effectively generates that functor as a temporary when we use the lambda syntax; and we’re now back to being able to see the predicate inline, but also not having had to do any of the looping work ourselves.

One final piece in the puzzle: just as we were able to add parameters to our functors, we can add parameters to our lambda. There would be little point in using a literal, as we did in example #5, we can just write that literal in the lambda body. However, it can be useful to do what’s called variable capture. Let’s consider a slightly different filter function, that itself takes a parameter.

// Example #8
void filterTransactions(int threshold) {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs),
        [](const TTransactionRecord &O){ return O.value() > threshold;} );
}

This will fail. The lambda body is not considered part of the scope of filterTransactions() and so has no access to the threshold variable. What we need is the parameter to be passed as a constructor argument exactly as we did in example #5. Lambdas fortunately have a syntax for that.

// Example #9
void filterTransactions(int threshold) {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;

    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs),
        [=threshold](const TTransactionRecord &O){ return O.value() > threshold;} );
}

The [=threshold] is the equivalent of adding an int threshold data member to the functor, and passing it by value to the functor constructor (just like example #5). Lambdas offer additional syntax that let us pass variables in the enclosing scope by reference instead of by value. That means we can modify them. That gives us quite a powerful facility. For example:

// Example #10
void filterTransactions(int threshold) {
    list<TTransactionRecord> CustomerTransactions;
    CustomerTransactions = dbFetchCustomerTransactions(CUSTOMER_ID);

    list<TTransactionRecord> TransactionBigs;

    int rejectCount = 0;
    copy_if(CustomerTransactions.cbegin(), CustomerTransactions.end(),
        back_inserter(TransactionBigs),
        [=threshold, &rejectCount](const TTransactionRecord &O) {
            if( O.value() > threshold ) {
                return true;
            }
            rejectCount++;
            return false;
        });
    // rejectCount now tells us how many records _didn't_ match the
    // predicate
}

There are some additional syntax features of lambdas, but you can simply look them up. One that’s particularly worth bearing in mind is the ability to pass this as a lambda parameter. That means you can get access to more than just the enclosing scope variables, you can get access to the enclosing object.

Leave a Reply