Doctrine ORM Hydration Performance Optimization
PRE-REQUISITE: Please note that this article explains complexity in internal ORM operations with the Big-O notation. Consider reading this article, if you are not familiar with the Big-O syntax.
What is hydration?
Doctrine ORM, like most ORMs, is performing a process called Hydration when converting database results into objects.
This process usually involves reading a record from a database result and then converting the column values into an object's properties.
Here is a little pseudo-code snippet that shows what a mapper is actually doing under the hood:
<?php
$results = [];
$reflectionFields = $mappingInformation->reflectionFields();
foreach ($resultSet->fetchRow() as $row) {
$object = new $mappedClassName;
foreach ($reflectionFields as $column => $reflectionField) {
$reflectionField->setValue($object, $row[$column]);
}
$results[] = $object;
}
return $results;
That's a very basic example, but this gives you an idea of what an ORM is doing for you.
As you can see, this is an O(N)
operation (assuming a constant number of reflection fields).
There are multiple ways to speed up this particular process, but we can only remove constant overhead from it, and not actually reduce it to something more efficient.
When is hydration expensive?
Hydration starts to become expensive with complex resultsets.
Consider the following SQL query:
SELECT
u.id AS userId,
u.username AS userUsername,
s.id AS socialAccountId,
s.username AS socialAccountUsername,
s.type AS socialAccountType
FROM
user u
LEFT JOIN
socialAccount s
ON s.userId = u.id
Assuming that the relation from user
to socialAccount
is a one-to-many
,
this query retrieves all the social accounts for all the users in our application
A resultset may be as follows:
userId | userUsername | socialAccountId | socialAccountUsername | socialAccountType |
---|---|---|---|---|
1 | ocramius@gmail.com | 20 | ocramius | |
1 | ocramius@gmail.com | 21 | @ocramius | |
1 | ocramius@gmail.com | 22 | ocramiusaethril | Last.fm |
2 | grandpa@example.com | NULL |
NULL |
NULL |
3 | grandma@example.com | 85 | awesomegrandma9917 |
As you can see, we are now joining 2 tables in the results, and the ORM has to perform more complicated operations:
- Hydrate 1
-
User
object for ocramius@gmail.com - Hydrate 3
-
SocialAccount
instances intoUser#$socialAccounts
for ocramius@gmail.com, while skipping re-hydratingUser
ocramius@gmail.com - Hydrate 1
-
User
object for grandpa@example.com - Skip hydrating
-
User#$socialAccounts
for grandpa@example.com, as no social accounts are associated - Hydrate 1
-
User
object for grandma@example.com - Hydrate 1
-
SocialAccount
instance intoUser#$socialAccounts
for grandma@example.com
DOCS This operation is what is done by Doctrine ORM when you use the DQL Fetch Joins feature.
Fetch joins are a very efficient way to hydrate multiple records without resorting to multiple queries, but there are two performance issues with this approach (both not being covered by this article):
- Empty records require some useless looping inside the ORM internals (see grandpa@example.com's social account). This is a quick operation, but we can't simply ignore those records upfront.
-
If multiple duplicated records are being joined (happens a lot in
many-to-many
associations), then we want to de-duplicate records by keeping a temporary in-memory identifier map.
Additionally, our operation starts to become more complicated, as it is now O(n * m)
, with
n
and m
being the records in the user
and the socialAccount
tables.
What the ORM is actually doing here is normalizing data that was fetched in a de-normalized resultset, and that is going through your CPU and your memory.
Bringing hydration cost to an extreme
The process of hydration becomes extremely expensive when more than 2 LEFT JOIN
operations clauses are part of our queries:
SELECT
u.id AS userId,
u.username AS userUsername,
sa.id AS socialAccountId,
sa.username AS socialAccountUsername,
sa.type AS socialAccountType,
s.id AS sessionId,
s.expiresOn AS sessionExpiresOn,
FROM
user u
LEFT JOIN
socialAccount sa
ON sa.userId = u.id
LEFT JOIN
session s
ON s.userId = u.id
This kind of query produces a much larger resultset, and the results are duplicated by a lot:
userId | user Username | social Account Id | social Account Username | social Account Type | session Id | session Expires On |
---|---|---|---|---|---|---|
1 | ocramius@gmail.com | 20 | ocramius | ocramius-macbook | 2015-04-20 22:08:56 | |
1 | ocramius@gmail.com | 21 | @ocramius | ocramius-macbook | 2015-04-20 22:08:56 | |
1 | ocramius@gmail.com | 22 | ocramiusaethril | Last.fm | ocramius-macbook | 2015-04-20 22:08:56 |
1 | ocramius@gmail.com | 20 | ocramius | ocramius-android | 2015-04-20 22:08:56 | |
1 | ocramius@gmail.com | 21 | @ocramius | ocramius-android | 2015-04-20 22:08:56 | |
1 | ocramius@gmail.com | 22 | ocramiusaethril | Last.fm | ocramius-android | 2015-04-20 22:08:56 |
2 | grandpa@example.com | NULL |
NULL |
NULL |
NULL |
NULL |
3 | grandma@example.com | 85 | awesomegrandma | home-pc | 2015-04-15 10:05:31 |
If you try to re-normalize this resultset, you can actually see how many useless de-duplication operation have to happen.
That is because the User
ocramius@gmail.com has multiple active sessions on
multiple devices, as well as multiple social accounts.
SLOW!
The hydration operations on this resultset are O(n * m * q)
, which I'm going to simply
generalize as O(n ^ m)
, with n
being the amount of results, and m
being the amount of joined tables.
Here is a graphical representation of O(n ^ m)
:
Yes, it is bad.
How to avoid O(n ^ m)
hydration?
O(n ^ m)
can be avoided with some very simple, yet effective approaches.
No, it's not "don't use an ORM", you muppet.
Avoiding one-to-many
and many-to-many
associations
Collection valued associations are as useful as problematic, as you never know how much data you are going to load.
Unless you use fetch="EXTRA_LAZY"
and Doctrine\Common\Collections\Collection#slice()
wisely, you will probably make your app crash if you initialize a very large collection of associated objects.
Therefore, the simplest yet most limiting advice is to avoid collection-valued associations whenever they are not strictly necessary.
Additionally, reduce the amount of bi-directional associations to the strict necessary.
After all, code that is not required should not be written in first place.
Multi-step hydration
The second approach is simpler, and allows us to exploit how the ORM's UnitOfWork
is working
internally.
In fact, we can simply split hydration for different associations into different queries, or multiple steps:
SELECT
u.id AS userId,
u.username AS userUsername,
s.id AS socialAccountId,
s.username AS socialAccountUsername,
s.type AS socialAccountType
FROM
user u
LEFT JOIN
socialAccount s
ON s.userId = u.id
We already know this query: hydration for it is O(n * m)
, but that's the best we can do,
regardless of how we code it.
SELECT
u.id AS userId,
u.username AS userUsername,
s.id AS sessionId,
s.expiresOn AS sessionExpiresOn,
FROM
user u
LEFT JOIN
session s
ON s.userId = u.id
This query is another O(n * m)
hydration one, but we are now only loading the user sessions
in the resultsets, avoiding duplicate results overall.
By re-fetching the same users, we are telling the ORM to re-hydrate those objects (which are now in memory,
stored in the UnitOfWork
): that fills the User#$sessions
collections.
Also, please note that we could have used a JOIN
instead of a LEFT JOIN
, but that
would have triggered lazy-loading on the sessions for the grandpa@example.com User
Additionally, we could also skip the userUsername field from the results, as it already is in memory and well known.
SOLUTION:
We now reduced the hydration complexity from O(n ^ m)
to O(n * m * k)
, with
n
being the amount of User
instances, m
being the amount of associated
to-many
results, and k
being the amount of associations that we want to hydrate.
Coding multi-step hydration in Doctrine ORM
Let's get more specific and code the various queries represented above in DQL.
Here is the O(n ^ m)
query (in this case, O(n ^ 3)
):
return $entityManager
->createQuery('
SELECT
user, socialAccounts, sessions
FROM
User user
LEFT JOIN
user.socialAccounts socialAccounts
LEFT JOIN
user.sessions sessions
')
->getResult();
This is how you'd code the multi-step hydration approach:
$users = $entityManager
->createQuery('
SELECT
user, socialAccounts
FROM
User user
LEFT JOIN
user.socialAccounts socialAccounts
')
->getResult();
$entityManager
->createQuery('
SELECT PARTIAL
user.{id}, sessions
FROM
User user
LEFT JOIN
user.sessions sessions
')
->getResult(); // result is discarded (this is just re-hydrating the collections)
return $users;
I'd also add that this is the only legitimate use-case for partial hydration that I ever had, but it's a personal opinion/feeling.
Other alternatives (science fiction)
As you may have noticed, all this overhead is caused by normalizing de-normalized data coming from the DB.
Other solutions that we may work on in the future include:
- Generating hydrator code - solves constant overhead issues, performs better with JIT engines such as HHVM
- Leveraging the capabilities of powerful engines such as PostgreSQL, which comes with JSON support (since version 9.4), and would allow us to normalize the fetched data to some extent
- Generate more complex SQL, creating an own output format that is "hydrator-friendly" (re-inventing the wheel here seems like a bad idea)
Research material
Just so you stop thinking that I pulled out all these thought out of thin air, here is a repository with actual code examples that you can run, measure, compare and patch yourself:
https://github.com/Ocramius/Doctrine2StepHydration
Give it a spin and see the results for yourself!