2012-11-19

Dependency Injection slowness solved by Doctrine Proxies

Dependency Injection Containers and Performance

Dependency Injection Containers are a vital tool for developers of complex and modular applications.
Using a Dependency Injection Container in your application brings great benefits, allowing you to compose complex object graphs without compromises or unnecessary ugliness (e.g. static methods).

By using a Dependency Injection Container you automatically gain several benefits:

Absence of hardcoded dependencies:: Your objects do not handle instantiation of their dependencies, so you have one less problem to handle.
Better separation of concerns:: Splitting problems across multiple objects becomes easier, as the container helps you glue them all together.
Mocking is much easier:: Since you compose your instances from other dependencies that each solve a small problem, mocking those objects becomes really easy, and so does writing tests for your application.

But there is one major pitfall: since your objects no longer handle the instantiation of their dependencies, you end up building huge object graphs, even if you're not using all of those objects.
Take for example the following code:

<?php

class A {}

class B {}

class C {}

class D {
    public function __construct(A $a, B $b, C $c)
    {
        // ...
    }
}

class HelloWorld
{
    public function __construct(D $d)
    {
        // ...
    }

    public function sayHello()
    {
        return 'Hello World';
    }
}

The example is obviously nonsense, but this actually happens in your MVC controllers, where you may have 3 or 4 actions, none of which uses all of the controller's dependencies.

As you can see, to call HelloWorld#sayHello() we must instantiate 5 objects: A, B, C, D and HelloWorld.

While this is robust code that will hardly break if A, B, C and D are correctly unit-tested, it obviously has performance issues: sayHello() needs none of A, B, C or D, yet all of them are built.
Those issues become particularly noticeable when one of these objects needs to allocate a lot of resources or to perform costly operations such as opening a file or a socket to a remote machine.

Using pure dependency injection yields stability, but introduces performance drawbacks, especially in PHP, where the object graph is rebuilt on each dispatched request.
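The following sketch makes that cost concrete by counting how many objects get built when the graph is wired eagerly, as a container would do. The `Counted` base class and its static counter are hypothetical, added only for illustration.

```php
<?php

// Hypothetical base class, used only to count how many objects get built
abstract class Counted
{
    public static $instances = 0;

    public function __construct()
    {
        self::$instances++;
    }
}

class A extends Counted {}

class B extends Counted {}

class C extends Counted {}

class D extends Counted
{
    public function __construct(A $a, B $b, C $c)
    {
        parent::__construct();
    }
}

class HelloWorld extends Counted
{
    public function __construct(D $d)
    {
        parent::__construct();
    }

    public function sayHello()
    {
        return 'Hello World';
    }
}

// Wiring the whole graph eagerly, as a container would do
$helloWorld = new HelloWorld(new D(new A(), new B(), new C()));

echo $helloWorld->sayHello(), PHP_EOL;               // Hello World
echo Counted::$instances, ' objects built', PHP_EOL; // 5 objects built
```

sayHello() uses none of A, B, C or D, yet all five objects were constructed before it could be called.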

Service Location (to the rescue?)

To solve the performance issues, some may be tempted to start using a Service Locator within their services:

<?php

class HelloWorld
{
    private $serviceLocator;
    private $d;

    public function __construct(ServiceLocator $serviceLocator)
    {
        $this->serviceLocator = $serviceLocator;
    }

    public function sayHello()
    {
        return 'Hello World';
    }

    public function doSomethingWithD()
    {
        // D is looked up by name only the first time it is needed
        if ( ! $this->d) {
            $this->d = $this->serviceLocator->get('D');
        }

        return $this->d->doSomething();
    }
}

As you can see, this solves the performance issue by retrieving an instance of D only when we really need it: performance!

However, by doing so we introduced some new problems:

Our object cannot exist without a service locator:: this hurts testability, since we need to mock the service locator in order to test HelloWorld, and mocking a service locator is not so easy.
Our object depends on the implementation of the service locator:: portability of our code is reduced, since it will work only with a specific service locator implementing the ServiceLocator contract.
Instantiation of dependencies moved to our code:: retrieving D should not be a problem solved by our code. Now that we have introduced this logic, we must also test it.
Hardcoded service name in our code:: this makes our class very error-prone unless we write extensive integration tests each time we ship our code. It also makes our code incompatible with anything that shares the same ServiceLocator instance and expects a service named 'D' with a different contract.
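To illustrate the first and last points, here is a sketch of what a test for the service-locator version of HelloWorld has to set up. The post does not define the ServiceLocator contract, so the `ServiceLocator` interface, the `ArrayServiceLocator` stub and `StubD` below are all hypothetical.

```php
<?php

// Hypothetical contract: the post does not define ServiceLocator
interface ServiceLocator
{
    public function get($name);
}

// Hypothetical test double: a locator backed by an array of services
class ArrayServiceLocator implements ServiceLocator
{
    private $services;

    public function __construct(array $services)
    {
        $this->services = $services;
    }

    public function get($name)
    {
        if ( ! isset($this->services[$name])) {
            throw new InvalidArgumentException("No service named '$name'");
        }

        return $this->services[$name];
    }
}

class D
{
    public function doSomething()
    {
        return 'real D';
    }
}

class StubD extends D
{
    public function doSomething()
    {
        return 'stubbed D';
    }
}

class HelloWorld
{
    private $serviceLocator;
    private $d;

    public function __construct(ServiceLocator $serviceLocator)
    {
        $this->serviceLocator = $serviceLocator;
    }

    public function doSomethingWithD()
    {
        if ( ! $this->d) {
            $this->d = $this->serviceLocator->get('D');
        }

        return $this->d->doSomething();
    }
}

// The test has to know the magic string 'D' hardcoded inside HelloWorld
$hello = new HelloWorld(new ArrayServiceLocator(array('D' => new StubD())));
echo $hello->doSomethingWithD(), PHP_EOL; // stubbed D

// Registering the stub under any other name only fails at runtime
$broken = new HelloWorld(new ArrayServiceLocator(array('d_service' => new StubD())));
try {
    $broken->doSomethingWithD();
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // No service named 'D'
}
```

The test works only if the stub is registered under exactly the string hardcoded in HelloWorld; a mismatch on either side surfaces only when the code runs.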

We solved one performance problem and introduced at least 4 new ones!
Not really nice, eh? Not at all.

If you are already using service location, STOP DOING IT NOW and please read the rest of this post.

There must be a better solution... After all, all we want is to avoid instantiating A, B, C and D if we aren't using them.
That doesn't sound so hard!

Doctrine Proxies to the rescue!

The idea is not new, and Lukas Smith already discussed it on the Symfony2 issue tracker.

Since I was already playing around with code generation for Doctrine, I decided to implement those concepts with Doctrine Proxies.

What are Doctrine Proxies?

Doctrine Proxies are a PHP implementation of the proxy pattern used to achieve lazy loading of objects from a persistent storage.
Doctrine implements this pattern with Virtual Proxies that behave like Ghost Objects.

The concept behind proxies is quite simple: each time a method of the proxy is called, if the proxy is not yet initialized, its initialization logic is triggered (usually, filling its fields with data loaded from a DB).
After that, the code that the method call was originally meant to execute is run.

Doctrine achieves this by generating a class that extends the original class, overrides all of its public methods and adds the code required to trigger lazy loading:

<?php

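class UserProxy extends User
{
    protected $initialized = false;

    public function getUsername()
    {
        if ( ! $this->initialized) {
            $this->initialize();
        }

        return parent::getUsername();
    }

    protected function initialize()
    {
        $this->initialized = true;
        // ... fill the fields with data loaded from the DB ...
    }
}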

The previous snippet is just a simplified example and isn't very flexible. As you may know, Doctrine is a set of libraries focused on data persistence, and the first version of its proxies was designed specifically for loading objects from a database.

A patch I'm working on enhances the implementation, allowing many different uses of the proxy pattern. This is mainly possible thanks to closures (anonymous functions) that hold the initialization logic:

<?php

class UserProxy extends User
{
    /** @var Closure|null */
    protected $initializer;

    public function __setInitializer(Closure $initializer)
    {
        $this->initializer = $initializer;
    }

    public function getUsername()
    {
        if ($this->initializer !== null) {
            // Clear the initializer first, so that it runs only once
            $initializer       = $this->initializer;
            $this->initializer = null;
            call_user_func($initializer, $this);
        }

        return parent::getUsername();
    }
}

Using a Closure as an initializer now enables us to swap the initialization logic used for our proxy object. I won't get into details, but this is a requirement for our next step.
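Here is a minimal sketch of swapping initialization logic, assuming a closure-based proxy like the one above. The `User` class, the `__setUsername()` helper and the two initializers are hypothetical; a real Doctrine proxy would load its data through the persistence layer instead. The initializer is cleared before it is invoked, so each proxy runs its initialization logic only once.

```php
<?php

class User
{
    protected $username;

    public function getUsername()
    {
        return $this->username;
    }
}

// Closure-based proxy: the initializer is cleared before it runs,
// so the initialization logic is executed only once
class UserProxy extends User
{
    /** @var Closure|null */
    protected $initializer;

    public function __setInitializer(Closure $initializer)
    {
        $this->initializer = $initializer;
    }

    // Hypothetical helper that lets initializers fill the proxy's state
    public function __setUsername($username)
    {
        $this->username = $username;
    }

    public function getUsername()
    {
        if ($this->initializer !== null) {
            $initializer       = $this->initializer;
            $this->initializer = null;
            call_user_func($initializer, $this);
        }

        return parent::getUsername();
    }
}

$calls = 0;

// Two interchangeable initializers: a pretend DB lookup and a test fixture
$fromDatabase = function (UserProxy $proxy) use (&$calls) {
    $calls++;
    $proxy->__setUsername('alice'); // pretend this value came from a DB row
};
$fromFixture = function (UserProxy $proxy) {
    $proxy->__setUsername('test-user');
};

$user = new UserProxy();
$user->__setInitializer($fromDatabase);
echo $user->getUsername(), PHP_EOL; // alice
echo $user->getUsername(), PHP_EOL; // alice (initializer not called again)

$fixtureUser = new UserProxy();
$fixtureUser->__setInitializer($fromFixture);
echo $fixtureUser->getUsername(), PHP_EOL; // test-user

echo $calls, PHP_EOL; // 1
```

The proxy class stays the same in both cases; only the closure decides where the data comes from.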

Why proxies?

Let's get back to the example with A, B, C, D and HelloWorld, this time introducing a proxy:

<?php

class A {}

class B {}

class C {}

class D
{
    protected $a;
    protected $b;
    protected $c;

    public function __construct(A $a, B $b, C $c)
    {
        $this->a = $a;
        $this->b = $b;
        $this->c = $c;
    }

    public function doSomething()
    {
        return 'Did something with ' . get_class($this->a) . ', '
            . get_class($this->b) . ', ' . get_class($this->c);
    }
}

class D_Proxy extends D
{
    private $serviceLocator;
    private $original;
    private $initialized = false;

    // Replaces D's constructor: no A, B or C is required to build the proxy
    public function __construct(ServiceLocator $serviceLocator)
    {
        $this->serviceLocator = $serviceLocator;
    }

    private function initialize()
    {
        $this->initialized = true;
        $this->original    = $this->serviceLocator->get('D');
    }

    public function doSomething()
    {
        if ( ! $this->initialized) {
            $this->initialize();
        }

        // Forward the call to the real, fully built instance of D
        return $this->original->doSomething();
    }
}

class HelloWorld
{
    private $d;

    public function __construct(D $d)
    {
        $this->d = $d;
    }

    public function sayHello()
    {
        return 'Hello World';
    }

    public function doSomethingWithD()
    {
        return $this->d->doSomething();
    }
}

Wait... What? OK, let's slow this down a bit:

You can now pass an instance of D_Proxy to HelloWorld. Since D_Proxy extends D, it can be used wherever a D is expected, respecting the Liskov substitution principle.
The proxy is uninitialized and empty (we have replaced its constructor).
When doSomething is called on the proxy, the real instance of D is retrieved from a service locator and stored in the $original property.
The method call is then forwarded to $this->original->doSomething().
Since the original object is fully populated with instances of A, B and C, the code works as expected.

We successfully avoided instantiating A, B, C and D when calling sayHello! Awesome!
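Putting the pieces together, the sketch below counts how many services a factory-based locator builds: sayHello() builds nothing, while the first call to doSomethingWithD() builds A, B, C and D. The `ServiceLocator` class with its `set()`/`get()` methods and `$built` counter is an assumption made for this example, and D_Proxy is written by hand rather than generated.

```php
<?php

// Hypothetical locator: builds each service once, from a factory closure
class ServiceLocator
{
    public $built = 0; // number of real services constructed so far
    private $factories = array();
    private $instances = array();

    public function set($name, Closure $factory)
    {
        $this->factories[$name] = $factory;
    }

    public function get($name)
    {
        if ( ! isset($this->instances[$name])) {
            $this->built++;
            $this->instances[$name] = call_user_func($this->factories[$name], $this);
        }

        return $this->instances[$name];
    }
}

class A {}

class B {}

class C {}

class D
{
    public function __construct(A $a, B $b, C $c) {}

    public function doSomething()
    {
        return 'Did something';
    }
}

// Hand-written stand-in for the generated proxy
class D_Proxy extends D
{
    private $serviceLocator;
    private $original;

    // Replaces D's constructor: no A, B or C is needed here
    public function __construct(ServiceLocator $serviceLocator)
    {
        $this->serviceLocator = $serviceLocator;
    }

    public function doSomething()
    {
        if ($this->original === null) {
            $this->original = $this->serviceLocator->get('D');
        }

        return $this->original->doSomething();
    }
}

class HelloWorld
{
    private $d;

    public function __construct(D $d)
    {
        $this->d = $d;
    }

    public function sayHello()
    {
        return 'Hello World';
    }

    public function doSomethingWithD()
    {
        return $this->d->doSomething();
    }
}

$locator = new ServiceLocator();
$locator->set('A', function () { return new A(); });
$locator->set('B', function () { return new B(); });
$locator->set('C', function () { return new C(); });
$locator->set('D', function (ServiceLocator $sl) {
    return new D($sl->get('A'), $sl->get('B'), $sl->get('C'));
});

$helloWorld = new HelloWorld(new D_Proxy($locator));

echo $helloWorld->sayHello(), PHP_EOL;         // Hello World
$builtAfterHello = $locator->built;            // 0: nothing built yet

echo $helloWorld->doSomethingWithD(), PHP_EOL; // Did something
$builtAfterUse = $locator->built;              // 4: A, B, C and D
```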

But wait: didn't I just say that service location is evil?

Yes it is, but D_Proxy is generated code (don't worry about how it is generated) and:

Its code generation is based on how the dependency injection container defines the instantiation of D, so the hardcoded 'D' within the proxy code comes from the current DIC definitions. This lets our DIC handle collisions between service names, and hardcoded magic strings disappear from our code base.
It abstracts the problem of lazy initialization of a service away from us. We don't need to test the generated code ourselves: that is the job of the implementor of the proxy generator (me).
It has roughly the same performance impact as writing lazy initialization logic into our classes' methods by hand (a similar number of function calls).
Turning proxies on or off does not change the functionality provided by our applications: they're just a performance tweak and do not affect how our logic is dispatched.
Proxies actually allow cyclic dependencies. Since objects are lazily initialized, if A depends on B, B depends on A, and one of the two is proxied, the lazy initialization mechanism prevents an infinite loop in our instantiation logic. I didn't think of this initially, but it turns out to be a nice and powerful side effect.
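Here is a sketch of the cyclic dependency case, assuming a hypothetical minimal `Container` that shares one instance per service name. A needs B and B needs A: without a proxy, resolving 'A' would ask for 'B', which would ask for 'A' again, forever. Here A receives a hand-written `B_Proxy`, and the real B is only resolved once A already exists in the container.

```php
<?php

// Hypothetical minimal container: one shared instance per service name
class Container
{
    private $factories = array();
    private $instances = array();

    public function set($name, Closure $factory)
    {
        $this->factories[$name] = $factory;
    }

    public function get($name)
    {
        if ( ! isset($this->instances[$name])) {
            $this->instances[$name] = call_user_func($this->factories[$name], $this);
        }

        return $this->instances[$name];
    }
}

class A
{
    private $b;

    public function __construct(B $b)
    {
        $this->b = $b;
    }

    public function talkToB()
    {
        return 'A -> ' . $this->b->name();
    }
}

class B
{
    private $a;

    public function __construct(A $a)
    {
        $this->a = $a;
    }

    public function name() { return 'B'; }

    public function talkToA()
    {
        return 'B -> ' . get_class($this->a);
    }
}

// Hand-written stand-in for a generated lazy proxy of B
class B_Proxy extends B
{
    private $container;
    private $original;

    // Replaces B's constructor: no A is needed to build the proxy
    public function __construct(Container $container)
    {
        $this->container = $container;
    }

    private function getOriginal()
    {
        if ($this->original === null) {
            $this->original = $this->container->get('B');
        }

        return $this->original;
    }

    public function name()    { return $this->getOriginal()->name(); }

    public function talkToA() { return $this->getOriginal()->talkToA(); }
}

$container = new Container();
// A receives a lazy proxy of B instead of asking the container for 'B' eagerly
$container->set('A', function (Container $c) {
    return new A(new B_Proxy($c));
});
$container->set('B', function (Container $c) {
    return new B($c->get('A'));
});

$a = $container->get('A');                     // no infinite loop
echo $a->talkToB(), PHP_EOL;                   // A -> B
echo $container->get('B')->talkToA(), PHP_EOL; // B -> A
```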

General usage directions

Proxies also have some limitations though:

Cannot benefit from the initializer pattern/setter injection:: since calling any proxy method other than its constructor triggers initialization, setter injection cannot be used on a proxy without defeating the whole purpose of the performance tweak.
Cannot proxy dynamic services:: you can apply this proxy pattern only when you can assume that calling $serviceLocator->get('D'); will actually return an instance of D. If the returned type varies depending on e.g. environment variables, this code will break.
Must be synchronized:: changing the implementation of our services requires us to re-generate the proxies so that they respect the contract of the service class. Since generated PHP code is hard to cache (opcode caches cannot act on serialized data), we need to save proxies to a predictable location on our system, so that we can autoload them instead of generating them over and over. This also means that we have to delete them when we change our code, so that the generator can rewrite them.
Add constant overhead to method calls:: If your object is lightweight, you may not need to proxy it, especially if its methods get called thousands of times.
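The setter injection limitation is easy to see with a hand-written stand-in for a generated proxy. In this sketch, `Mailer`, its `setTransport()` setter and `Mailer_Proxy` are hypothetical; the proxy forwards every public method to the real instance, so injecting a dependency through the setter builds the real Mailer immediately.

```php
<?php

class Mailer
{
    private $transport;

    public function setTransport($transport)
    {
        $this->transport = $transport;
    }

    public function send($message)
    {
        return $this->transport . ': ' . $message;
    }
}

// Hand-written stand-in for a generated proxy: every public method
// except the constructor initializes the real Mailer first
class Mailer_Proxy extends Mailer
{
    private $initializer;
    private $original;

    public function __construct(Closure $initializer)
    {
        $this->initializer = $initializer;
    }

    private function getOriginal()
    {
        if ($this->original === null) {
            $this->original = call_user_func($this->initializer);
        }

        return $this->original;
    }

    public function setTransport($transport)
    {
        $this->getOriginal()->setTransport($transport);
    }

    public function send($message)
    {
        return $this->getOriginal()->send($message);
    }
}

$built = 0;
$mailer = new Mailer_Proxy(function () use (&$built) {
    $built++;
    return new Mailer();
});

$builtBeforeSetter = $built; // 0: the proxy is still lazy

// Setter injection forces the real Mailer to be built right away,
// even if send() is never called during this request
$mailer->setTransport('smtp');
$builtAfterSetter = $built;  // 1

echo $mailer->send('hi'), PHP_EOL; // smtp: hi
```

A container that wires this service through setters would initialize it as soon as the wiring happens, so the proxy saves nothing.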

Examples/benchmarks

If you want to read more about the proxy implementation I proposed for Zend Framework 2, you can check the corresponding pull request.
If you are interested in how proxy generation works in Doctrine, you can check my current work on Doctrine Common.
If performance is your concern, read cordoval's blog post about the results of the last PHPPeru hack day we held together.
I am also starting work on implementing this idea for Symfony2. Not quite there yet :-)

Conclusions

I can conclude that proxies are a good solution to the performance issues introduced by Dependency Injection Containers. They also allow us to get rid of service location completely and to focus on writing clean, robust classes that are easy to test.

They surely add some hidden magic to our code, and Matthew Weier O'Phinney has already told me that some newcomers may be confused by the additional calls they will see in stack traces when looking at exceptions. Since proxies are an optional feature, I'm not really concerned about it.

I also worked with Luis Cordova on organizing the topics for the last PHPPeru hack day, and the participants had no major trouble understanding the problems and the solutions suggested by the proxy approach, so I'm quite confident that it will be adopted in ZF2 and SF2 soon.

Anyway, proxies are not a requirement to get our applications working. They are just steroids for our services, and I'd definitely suggest using them.

Tags: zend, zendframework, zf2, symfony, doctrine, dependency injection, service location, proxies

Marco Pivetta (Ocramius)