Not Everything Is A Reference

By | 2013-07-01

I like Java as a language. Kind of. It’s strongly and statically typed, has object-oriented syntax, reflection, exceptions, and interfaces (interfaces being one of the few features that I think C++ is missing). Dynamically typed languages encourage sloppiness and do nothing to help you avoid it, and the overriding feature of all good software is that it is readable (i.e. not sloppy). However, as with many languages, I feel that Java was designed because “C++ is too hard”. Further, like many languages designed that way, it ends up in difficulties of its own, because fundamentally you can’t hide complexity. As the saying usually attributed to Einstein goes, “Make things as simple as possible; but not simpler.”

This article is a discussion of one of those difficulties.

Consider this Java code:

    class SomeClass extends SomeOtherClass {
        private final String NAME = "SomeClass";

        @Override
        public String getName() {
            return NAME;
        }
    }

Here’s the same in C++11:

    class SomeClass : public SomeOtherClass {
        public:
            const string &getName() const override {
                return NAME;
            }
        private:
            static const string NAME;
    };
    const string SomeClass::NAME = "SomeClass";

These, while superficially similar, are very different. In C++ we have a way of defining constants. We have a way of returning references to constants, and a way of saying a particular method doesn’t alter the object. In Java we have none of those: the final keyword only stops the NAME reference from being reassigned, and says nothing about the object it refers to.

This lack of expressive ability is, presumably, to keep things simple. Unfortunately, wanting to keep things simple doesn’t make them so. Imagine this code:

    public class references {
        abstract static class SomeOtherClass {
            abstract public String getName();
        }
    
        static class SomeClass extends SomeOtherClass {
            private final String NAME = "SomeClass";
    
            @Override
            public String getName() {
                return NAME;
            }
        }
    
        public static void main(String[] args) {
            SomeClass x = new SomeClass();
            String someString = x.getName();
    
            System.out.println(x.getName());
            someString[0] = 's';
            System.out.println(x.getName());
        }
    }

This is invalid. Java Strings are immutable, so the attempt to alter the first character isn’t supported. Instead we have to reach for StringBuilder or String.replace() to create a new string from an existing string. In this case, that’s exactly what’s necessary: we can’t be allowed to modify constants.

Except, what if we want to modify a non-const string? Since all Strings are immutable, we have to copy every string before we modify it. Worse, the immutability of a String only comes about because String has been carefully coded so that it is immutable. We will have the same problem with any container type we might create of our own. We’ll have to make it immutable, and then remember to replace every method that would have modified it in place with a version that copies the object and modifies the copy.

All to avoid “const”, and all because Java only understands references to objects.
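
For contrast, here’s a minimal C++ sketch of what const buys us for a container of our own. Samples, sumOf() and constDemo() are names I’ve made up for illustration; the point is only that one ordinary, mutable class can be modified in place by its owner and still be handed out read-only, with no immutable twin and no copy-then-modify methods.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container for this sketch; not part of the article's code.
class Samples {
  public:
    // Non-const methods: these modify the object in place.
    void add(int v) { values_.push_back(v); }
    void scale(int factor) {
        for (int &v : values_) v *= factor;
    }

    // Const methods: the only ones callable through a const reference.
    int at(std::size_t i) const { return values_.at(i); }
    std::size_t size() const { return values_.size(); }

  private:
    std::vector<int> values_;
};

// A function that only reads takes a const reference.
int sumOf(const Samples &s) {
    int total = 0;
    for (std::size_t i = 0; i < s.size(); ++i) total += s.at(i);
    // s.scale(2);   // would not compile: s is const here
    return total;
}

// The owner keeps a non-const object and modifies it without copying.
int constDemo() {
    Samples s;
    s.add(1);
    s.add(2);
    s.scale(10);      // in place: no copy-then-modify needed
    return sumOf(s);  // passed read-only, still no copy
}
```

Calling constDemo() returns 30. Uncommenting the s.scale(2) line inside sumOf() turns a would-be runtime surprise into a compile error, which is the whole job of const.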

C++11’s new move semantics, strangely, let C++ go in the opposite direction: avoiding references. I say “strangely” because at first sight, and for the experienced C++ programmer, references are what let you write fast software. Everyone knows you should avoid copying data where possible. You should pass around references. We end up passing pointers or references (pointers in disguise) around. The problem is that this approach has its own pitfalls:

    SomeClass &manipulateSomeClasses(const SomeClass &A, const SomeClass &B)
    {
        SomeClass *result = new SomeClass();
        // ... Combine A and B into result ...
        return *result;
    }

Who is responsible for freeing up the memory allocated here? We’re returning a reference, so unless the caller immediately takes the address of this return value, it’s not going to happen. Further, unless they assign the answer as part of a reference construction, there is no reasonable way to get at the returned reference:

    SomeClass &reference( manipulateSomeClasses(A,B) );
    // ... work on reference ...
    delete &reference;

Yuck, deleting with the address-of operator. Notice as well that the caller of manipulateSomeClasses() has to know an implementation detail: that its storage is allocated on the free store rather than automatic storage. Most of us have simply given in at this point and either passed a destination object in as a parameter, or returned a pointer and taken over ownership of it. Neither is pretty (passing a destination parameter is classic C, but makes the code less self-documenting), and both still require far too much knowledge of internals.
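
To make those two workarounds concrete, here’s a rough sketch of both. The SomeClass here is a hypothetical stand-in with a single int member, “combine” is assumed to mean adding the two values, and combineInto(), combineNew() and workaroundDemo() are invented names.

```cpp
#include <cassert>

// Hypothetical stand-in for the article's SomeClass.
struct SomeClass {
    int value = 0;
};

// Workaround 1: the caller supplies the destination object.
// Nothing in the call site says which argument is the output.
void combineInto(const SomeClass &a, const SomeClass &b, SomeClass &out) {
    out.value = a.value + b.value;  // assumed meaning of "combine"
}

// Workaround 2: return a pointer to free-store memory.
// The caller takes ownership and has to remember to delete it.
SomeClass *combineNew(const SomeClass &a, const SomeClass &b) {
    SomeClass *result = new SomeClass();
    result->value = a.value + b.value;
    return result;
}

int workaroundDemo() {
    SomeClass a, b;
    a.value = 2;
    b.value = 3;

    SomeClass viaOut;
    combineInto(a, b, viaOut);             // result arrives through a parameter

    SomeClass *viaPtr = combineNew(a, b);  // result arrives as an owned pointer
    int total = viaOut.value + viaPtr->value;
    delete viaPtr;                         // forget this and the memory leaks
    return total;
}
```

Both versions work, but in neither does the signature alone tell the caller what they need to know: which parameter is written to in the first, and who deletes the result in the second.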

So what do we do instead? Return by value?

    SomeClass manipulateSomeClasses(const SomeClass &A, const SomeClass &B)
    {
        SomeClass result;
        // ... Combine A and B into result ...
        return result;
    }

Assuming SomeClass has all the requisite copy constructors and assignment operators, this will at least protect us from a memory leak. If SomeClass holds a lot of data though, returning results like this can be slow. Unless the compiler manages to elide the copy, the returned value has to be copy-constructed into its target in the caller from the automatic result that’s about to be destructed.

Or at least it was in C++98. C++11 lets you create a move constructor. We’ve seen previously how useful that is for construction, but it might not be entirely clear that a move constructor can be called implicitly in these return-by-value situations. The above return-by-value example, provided a C++11 move-constructor has been implemented, is not slow. In fact it’s wonderfully fast.

    SomeClass x( manipulateSomeClasses(A,B) );

x here in the old days would be copy-constructed. Now, the compiler notes that the return value of manipulateSomeClasses() is a temporary, and so the move-constructor can be used. The move constructor effectively reassigns the storage of that temporary to the non-temporary, x – nothing is copied.
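
Here’s a small sketch that makes this visible, assuming a stand-in class (Big, which is not from the code above) that counts how often its copy and move constructors run. combine() and moveDemo() are likewise names invented for the example. How many moves you see depends on how much the compiler elides, so the only thing the sketch relies on is that returning a local by value never falls back to the copy constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for SomeClass that counts copies and moves.
class Big {
  public:
    static int copies;
    static int moves;

    explicit Big(std::size_t n) : data_(n, 1) {}

    // Copy constructor: duplicates the whole buffer.
    Big(const Big &other) : data_(other.data_) { ++copies; }

    // Move constructor: takes over the other object's buffer.
    Big(Big &&other) noexcept : data_(std::move(other.data_)) { ++moves; }

    std::size_t size() const { return data_.size(); }

  private:
    std::vector<int> data_;
};
int Big::copies = 0;
int Big::moves = 0;

// Return by value: the local is moved out (or the move is elided
// entirely), but it is never copied.
Big combine(const Big &a, const Big &b) {
    Big result(a.size() + b.size());
    return result;
}

bool moveDemo() {
    Big a(1000), b(2000);
    Big x(combine(a, b));   // built from a temporary: move or elision
    Big y(std::move(x));    // explicit move: one move-constructor call
    return Big::copies == 0 && y.size() == 3000;
}
```

moveDemo() returns true: three thousand ints change hands twice and the copy constructor is never called. Big::moves will be at least 1 (the explicit std::move) and possibly more, depending on what the compiler chose to elide.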

We now have no worries about speed, and no worries about remembering to free up resources.

Let’s return to where we started then.

    #include <iostream>
    #include <string>
    using namespace std;

    class SomeOtherClass {
      public:
        virtual const string &getName() const = 0;
    };

    class SomeClass : public SomeOtherClass {
        public:
            const string &getName() const override {
                return NAME;
            }
        private:
            static const string NAME;
    };
    const string SomeClass::NAME = "SomeClass";

    int main()
    {
        SomeClass x;
        string name(x.getName());
        cerr << "name = " << name << "  " << x.getName() << endl;
        name[0] = 's';
        cerr << "name = " << name << "  " << x.getName() << endl;

        // outputs:
        // name = SomeClass  SomeClass
        // name = someClass  SomeClass

        return 0;
    }

This one will work exactly as you might expect, with no need for builder classes or unnecessary copies (in fact the copy here is necessary, but the compiler will know that and do the right thing). In other words: our code is more readable than the Java equivalent, and will be faster when it’s possible to be faster. Importantly, the user of SomeClass doesn’t need to care about its internals: it could be complex or simple, use lots of storage or none. The caller treats it as they would any built-in type.
