Dependency Injection slowness solved by Doctrine Proxies
Dependency Injection Containers and Performance
Dependency Injection Containers are a vital tool for developers of complex and modular applications.
Using a Dependency Injection Container in your application brings great benefits, allowing you to compose
complex object graphs without compromises or unnecessary ugliness (e.g. static methods).
By using a Dependency Injection Container you automatically gain several benefits:
- Absence of hardcoded dependencies: your objects do not handle instantiation of their dependencies, so you have one less problem to handle.
- Better separation of concerns: splitting problems across multiple objects becomes easier, as the container helps you glue them all together.
- Mocking is much easier: since you compose your instances with other dependencies that solve small problems, mocking those objects becomes really easy, and so does writing tests for your application.
But there is one major pitfall: since your objects do not handle instantiation of their dependencies anymore,
you are now building huge object graphs, even if you're not using all of those objects.
Take for example the following code:
<?php

class A {}
class B {}
class C {}

class D
{
    public function __construct(A $a, B $b, C $c)
    {
        // ...
    }
}

class HelloWorld
{
    public function __construct(D $d)
    {
        // ...
    }

    public function sayHello()
    {
        return 'Hello World';
    }
}
The example is obviously nonsense, but this actually happens in your MVC controllers, where you may have 3 or 4 actions and none of them uses all of the dependencies of the controller itself.
As you notice, to call HelloWorld#sayHello() we are required to instantiate 5 objects: A, B, C, D, HelloWorld.
While this is robust code that will hardly break if A, B, C and D are correctly unit-tested, we are obviously having performance issues.
Those issues become particularly noticeable when one of these objects needs to allocate a lot of resources or
to perform costly operations such as opening a file or a socket to a remote machine.
Using pure dependency injection yields stability, but introduces performance drawbacks, especially in PHP, where the object graph is rebuilt on each dispatched request.
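To make the cost visible, here is a minimal sketch of the same object graph with a counter added to every constructor. The InstanceCounter class is a hypothetical helper introduced only for this illustration; it is not part of the original example or of any container.

```php
<?php

// Hypothetical helper (not in the original example): counts constructor calls.
final class InstanceCounter
{
    public static $count = 0;
}

class A { public function __construct() { InstanceCounter::$count++; } }
class B { public function __construct() { InstanceCounter::$count++; } }
class C { public function __construct() { InstanceCounter::$count++; } }

class D
{
    public function __construct(A $a, B $b, C $c)
    {
        InstanceCounter::$count++;
    }
}

class HelloWorld
{
    public function __construct(D $d)
    {
        InstanceCounter::$count++;
    }

    public function sayHello()
    {
        return 'Hello World';
    }
}

// Roughly what a container does for us: the whole graph is built up front.
$helloWorld = new HelloWorld(new D(new A(), new B(), new C()));
$greeting   = $helloWorld->sayHello();

// Five constructors ran, although sayHello() uses none of A, B, C or D.
$created = InstanceCounter::$count;
```

In a real application the constructors would not just bump a counter: they might open files or sockets, which is where the overhead starts to hurt.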
Service Location (to the rescue?)
To solve the performance issues, some may be tempted to start using a Service Locator within their services:
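<?php

class HelloWorld
{
    /** @var ServiceLocator */
    private $serviceLocator;

    /** @var D|null fetched lazily on first use */
    private $d;

    public function __construct(ServiceLocator $serviceLocator)
    {
        $this->serviceLocator = $serviceLocator;
    }

    public function sayHello()
    {
        return 'Hello World';
    }

    public function doSomethingWithD()
    {
        // D is only retrieved (and built) the first time it is needed
        if ( ! $this->d) {
            $this->d = $this->serviceLocator->get('D');
        }

        $this->d->doSomething();
    }
}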
As you have noticed, this solves the performance issue by allowing us to retrieve an instance of D only when we really need it: performance!
Anyway, by doing so we introduced some new problems:
- Our object cannot exist without a service locator: this makes testability hard, since we will need to mock the service locator in order to test HelloWorld, and mocking a service locator is not so easy.
- Our object depends on the implementation of the service locator: portability of our code is reduced, since it will work only with a specific service locator implementing the ServiceLocator contract.
- Instantiation of dependencies moved to our code: instantiation of D should not be a problem solved by our code. We introduced it in our code now, so we must test it.
- Hardcoded service name in our code: this makes our class very error prone if we don't write extensive integration tests each time we ship our code. Also, it makes our code incompatible with anything sharing the same ServiceLocator instance and requiring an instance named 'D', but with different expectations.
We solved a performance problem to introduce at least 4 new ones!
Not really nice, eh? Not at all.
If you are already using service location, STOP DOING IT NOW and please read the rest of this post.
There must be a better solution... After all, what we want to avoid is instantiating A, B, C, D altogether if we aren't using them.
That doesn't sound so hard!
Doctrine Proxies to the rescue!
The idea is not new, and Lukas Smith already discussed it on the Symfony2 issue tracker.
Since I was already playing around with code generation for Doctrine, I decided to implement those concepts with Doctrine Proxies.
What are Doctrine Proxies?
Doctrine Proxies are a PHP implementation of the proxy pattern used to achieve lazy loading of objects from a persistent storage.
Doctrine implements this pattern by having Virtual Proxies that behave like Ghost Objects.
The concept behind proxies is quite simple: each time a method of the proxy is called, if the proxy is not
initialized, initialization logic is triggered (which usually corresponds to filling its fields with data
coming from a DB).
After that, the original code that was supposed to be executed with that method call is run.
Doctrine achieves this by generating a class that inherits from the original object, overrides all of its public API, and adds the code required to trigger lazy loading:
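<?php

class UserProxy extends User
{
    protected $initialized = false;

    public function getUsername()
    {
        if ( ! $this->initialized) {
            // initialize() is a placeholder for the loading logic (e.g. a DB lookup)
            initialize($this);
            $this->initialized = true;
        }

        return parent::getUsername();
    }
}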
The previous snippet is just a simplified example, and isn't very flexible. As you may know, Doctrine is a set of libraries focusing on persistence of data, and the first version of proxies was focused on loading objects from a database.
The implementation has been enhanced with a patch I'm working on, now allowing many different uses of the proxy pattern. This is mainly possible because of lambda functions used as initialization logic holders:
<?php

class UserProxy extends User
{
    /** @var Closure|null */
    protected $initializer;

    public function __setInitializer(Closure $initializer)
    {
        $this->initializer = $initializer;
    }

    public function getUsername()
    {
        if ($this->initializer !== null) {
            // Clear the initializer before running it, so that it is executed only once
            $initializer       = $this->initializer;
            $this->initializer = null;

            call_user_func($initializer);
        }

        return parent::getUsername();
    }
}
Using a Closure as an initializer now enables us to swap the initialization logic used for our proxy object. I won't get into details, but this is a requirement for our next step.
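As a rough illustration of what swapping the initialization logic looks like in practice, here is a self-contained sketch. It is not Doctrine's actual generated code: the User class, the 'john' value and the use of Closure::call() (PHP 7.0+) to let the closure fill the proxy's protected fields are all assumptions made for this example.

```php
<?php

class User
{
    protected $username;

    public function getUsername()
    {
        return $this->username;
    }
}

// Hand-written stand-in for a generated proxy.
class UserProxy extends User
{
    /** @var Closure|null */
    protected $initializer;

    public function __setInitializer(Closure $initializer)
    {
        $this->initializer = $initializer;
    }

    public function getUsername()
    {
        if ($this->initializer !== null) {
            $initializer       = $this->initializer;
            $this->initializer = null; // make sure it runs only once

            // Closure::call() binds $this to the proxy for the duration of the call,
            // so the closure may populate protected fields.
            $initializer->call($this);
        }

        return parent::getUsername();
    }
}

$loads = 0;
$proxy = new UserProxy();

// The initializer could load from a database, a file, a web service...
// Here it just sets a fixed value and counts how often it runs.
$proxy->__setInitializer(function () use (&$loads) {
    $loads++;
    $this->username = 'john';
});

$loadsBeforeUse = $loads;                // nothing has been loaded yet
$first          = $proxy->getUsername(); // triggers the initializer
$second         = $proxy->getUsername(); // served from the populated field
```

The proxy class knows nothing about where the data comes from; only the closure does, which is what makes the same proxy usable outside of persistence.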
Why proxies?
Let's get back to the example with A, B, C, D, HelloWorld, but we'll introduce a proxy now:
<?php

class A {}
class B {}
class C {}

class D
{
    private $a;
    private $b;
    private $c;

    public function __construct(A $a, B $b, C $c)
    {
        $this->a = $a;
        $this->b = $b;
        $this->c = $c;
    }

    public function doSomething()
    {
        // A, B and C define no __toString(), so their class names are used here
        return 'Did something with ' . get_class($this->a) . ', ' . get_class($this->b) . ', ' . get_class($this->c);
    }
}

class D_Proxy extends D
{
    private $serviceLocator;
    private $original;
    private $initialized = false;

    public function __construct(ServiceLocator $serviceLocator)
    {
        // The parent constructor is intentionally not called: the proxy stays empty
        $this->serviceLocator = $serviceLocator;
    }

    private function initialize()
    {
        $this->initialized = true;
        $this->original    = $this->serviceLocator->get('D');
    }

    public function doSomething()
    {
        if ( ! $this->initialized) {
            $this->initialize();
        }

        return $this->original->doSomething();
    }
}

class HelloWorld
{
    private $d;

    public function __construct(D $d)
    {
        $this->d = $d;
    }

    public function sayHello()
    {
        return 'Hello World';
    }

    public function doSomethingWithD()
    {
        return $this->d->doSomething();
    }
}
Wait... What? Ok, let's slow this down a bit:
- You can now pass an instance of D_Proxy to HelloWorld. Since D_Proxy extends D, it respects the Liskov substitution principle.
- The proxy is uninitialized, and it is empty (we have replaced its constructor).
- When doSomething is called on the proxy, the real instance of D is retrieved from a service locator, and put into the original property.
- The method call is proxied to $this->original->doSomething();.
- Since the original object is fully populated with instances of A, B and C, code works as expected.
We successfully avoided instantiating A, B, C and D when calling sayHello! Awesome!
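To check that claim, the sketch below wires the proxy together with a tiny service locator and counts how often the real D is constructed. The ServiceLocator class (with its set() and get() methods) and the $constructed counter are hypothetical and exist only for this illustration; a real container would provide its own equivalent.

```php
<?php

// Hypothetical minimal locator: maps service names to factory closures.
class ServiceLocator
{
    private $factories = [];
    private $instances = [];

    public function set($name, Closure $factory)
    {
        $this->factories[$name] = $factory;
    }

    public function get($name)
    {
        if ( ! isset($this->instances[$name])) {
            $factory                = $this->factories[$name];
            $this->instances[$name] = $factory();
        }

        return $this->instances[$name];
    }
}

class A {}
class B {}
class C {}

class D
{
    public static $constructed = 0; // added only to observe instantiation

    public function __construct(A $a, B $b, C $c)
    {
        self::$constructed++;
    }

    public function doSomething()
    {
        return 'Did something';
    }
}

class D_Proxy extends D
{
    private $serviceLocator;
    private $original;

    public function __construct(ServiceLocator $serviceLocator)
    {
        // Deliberately does not call parent::__construct()
        $this->serviceLocator = $serviceLocator;
    }

    public function doSomething()
    {
        if ($this->original === null) {
            $this->original = $this->serviceLocator->get('D');
        }

        return $this->original->doSomething();
    }
}

class HelloWorld
{
    private $d;

    public function __construct(D $d) { $this->d = $d; }
    public function sayHello() { return 'Hello World'; }
    public function doSomethingWithD() { return $this->d->doSomething(); }
}

$locator = new ServiceLocator();
$locator->set('D', function () {
    return new D(new A(), new B(), new C());
});

$helloWorld = new HelloWorld(new D_Proxy($locator));

$greeting   = $helloWorld->sayHello();
$afterHello = D::$constructed; // still 0: the real D was never built

$result   = $helloWorld->doSomethingWithD();
$afterUse = D::$constructed;   // 1: built on first real use
```

HelloWorld itself is unchanged and has no idea that it received a proxy, which is the whole point.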
But wait: didn't I just say that service location is evil?
Yes it is, but D_Proxy is generated code (don't worry about how it is generated) and:
- Its code generation is based on how the dependency injection container defined that D should be instantiated, thus the hardcoded 'D' within the proxy code comes from the current DIC definitions. This lets our DIC handle collisions between service names, and hardcoded magic strings disappear from our code base.
- It abstracts the problem of lazy initialization of a service for us. The generated code doesn't need to be tested as that is something done by the implementor of the proxy generator (me).
- It has the same performance impact as introducing lazy initialization logic in our classes' methods (similar amount of system calls).
- Turning proxies on or off does not change the functionality provided by our applications. They're just a performance tweak. They do not affect how our logic is dispatched.
- Proxies actually allow cyclic dependencies. Since objects are lazily initialized, if A depends on B, and B depends on A, and one of those two is proxied, the lazy initialization mechanism will prevent us from triggering an infinite loop in our instantiation logic. This is actually a thing I didn't think of initially, but it turns out to be a nice and powerful side effect.
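The cyclic dependency case can be sketched as follows. This is a hand-written stand-in for what a generated proxy might do, under the assumption that the proxy receives a factory closure instead of a service locator; the name() and partner() methods are invented for the example.

```php
<?php

class A
{
    private $b;

    public function __construct(B $b) { $this->b = $b; }
    public function name() { return 'A'; }
    public function partner() { return $this->b->name(); }
}

class B
{
    private $a;

    public function __construct(A $a) { $this->a = $a; }
    public function name() { return 'B'; }
    public function partner() { return $this->a->name(); }
}

// Lazy stand-in for B: the real B is only built on first method call.
class B_Proxy extends B
{
    private $factory;
    private $original;

    public function __construct(Closure $factory)
    {
        // Parent constructor intentionally skipped
        $this->factory = $factory;
    }

    private function getOriginal()
    {
        if ($this->original === null) {
            $factory        = $this->factory;
            $this->original = $factory();
        }

        return $this->original;
    }

    public function name() { return $this->getOriginal()->name(); }
    public function partner() { return $this->getOriginal()->partner(); }
}

// Without the proxy we would need a B to build A and an A to build B.
$a      = null;
$bProxy = new B_Proxy(function () use (&$a) {
    return new B($a); // by the time this runs, $a already exists
});
$a = new A($bProxy);

$fromA = $a->partner();      // A asks its B for its name
$fromB = $bProxy->partner(); // B (via the proxy) asks its A for its name
```

The cycle is broken because constructing the proxy does not require an A; the real B is only created once A is already available.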
General usage directions
Proxies also have some limitations though:
- Cannot benefit from the initializer pattern/setter injection: since any call to a proxy method other than its constructor causes its initialization, setter injection cannot be used on a proxy, or it will render the underlying performance tweak useless.
- Cannot proxy dynamic services: you can apply this proxy pattern only when assuming that calling $serviceLocator->get('D'); will actually return an instance of D. If the return type varies depending on e.g. environment variables, this code will break.
- Must be synchronized: changing the implementation of our services requires us to re-generate proxies so that they respect the contract of the service class. Since generated code in PHP is hard to put into a cache (because opcode caches cannot act on serialized data), we need to save proxies to a predictable location in our system in order to autoload them and avoid generating them over and over. That also means that we have to delete them when we change our code, so that we can let the generator rewrite them.
- Add constant overhead to method calls: if your object is lightweight, you may not need to proxy it, especially if its methods get called thousands of times.
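The synchronization point can be automated in several ways. One possible approach, which is an assumption of this example and not necessarily what any of the implementations mentioned here do, is to compare file modification times and regenerate a proxy only when it is missing or older than the class it wraps. The function name proxyNeedsRegeneration() is hypothetical.

```php
<?php

/**
 * Hypothetical helper: tells whether the proxy file for a class must be (re)written.
 *
 * @param string $sourceFile file defining the original service class
 * @param string $proxyFile  predictable location of the generated proxy
 *
 * @return bool
 */
function proxyNeedsRegeneration($sourceFile, $proxyFile)
{
    // Make sure we do not read stale values from PHP's stat cache
    clearstatcache();

    if ( ! is_file($proxyFile)) {
        return true;
    }

    return filemtime($proxyFile) < filemtime($sourceFile);
}

// Usage with two temporary files standing in for a class and its proxy
$source = tempnam(sys_get_temp_dir(), 'src');
$proxy  = tempnam(sys_get_temp_dir(), 'prx');

touch($source, 1000);
touch($proxy, 2000);
$upToDate = proxyNeedsRegeneration($source, $proxy); // proxy is newer than source

touch($source, 3000);
$outdated = proxyNeedsRegeneration($source, $proxy); // source changed afterwards

unlink($proxy);
$missing = proxyNeedsRegeneration($source, $proxy);  // proxy was deleted

unlink($source);
```

A check like this costs a couple of stat calls per proxied class, so it is probably best suited to development mode, with proxies generated once at deploy time in production.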
Examples/benchmarks
- If you want to read further on the proxy implementation I proposed for Zend Framework 2 you can check the corresponding pull request.
- If you are interested in how proxy generation works in Doctrine, you can check my current work on doctrine common.
- If performance is your concern, read about the results of the last PHPPeru hack day I had with cordoval on his blog.
- I am also starting work to implement this idea for Symfony 2. Not quite there yet :-)
Conclusions
I can conclude that proxies are a good solution to the performance issues introduced by Dependency Injection Containers. They also allow us to completely get rid of service location and to focus on writing clean and robust classes that are easy to test.
They surely add some hidden magic to our code, and I've already been told by Matthew Weier O'Phinney that some newcomers may be confused by the additional calls they will see in stack traces when looking at exceptions. Since proxies are an optional feature, I'm not really concerned about it.
I also worked with Luis Cordova in organizing the topics for the last PHPPeru hack day, and the participants didn't have big problems in understanding the problems and solutions suggested by the proxy approach, so I'm quite confident about having it adopted in ZF2 and SF2 soon.
Anyway, proxies are not a requirement to get our application working. They are just steroids for our services, and I'd certainly suggest that you use them.