Marco Pivetta (Ocramius)

Doctrine ORM Hydration Performance Optimization

PRE-REQUISITE: Please note that this article explains complexity in internal ORM operations with the Big-O notation. Consider reading this article, if you are not familiar with the Big-O syntax.

What is hydration?

Doctrine ORM, like most ORMs, is performing a process called Hydration when converting database results into objects.

This process usually involves reading a record from a database result and then converting the column values into an object's properties.

Here is a little pseudo-code snippet that shows what a mapper is actually doing under the hood:

<?php

$results          = [];
$reflectionFields = $mappingInformation->reflectionFields();

foreach ($resultSet->fetchRow() as $row) {
    $object = new $mappedClassName;

    foreach ($reflectionFields as $column => $reflectionField) {
        $reflectionField->setValue($object, $row[$column]);
    }

    $results[] = $object;
}

return $results;

That's a very basic example, but this gives you an idea of what an ORM is doing for you.

As you can see, this is an O(N) operation (assuming a constant number of reflection fields).

There are multiple ways to speed up this particular process, but we can only remove constant overhead from it, and not actually reduce it to something more efficient.

When is hydration expensive?

Hydration starts to become expensive with complex resultsets.

Consider the following SQL query:

SELECT
    u.id       AS userId,
    u.username AS userUsername,
    s.id       AS socialAccountId,
    s.username AS socialAccountUsername,
    s.type     AS socialAccountType
FROM
    user u
LEFT JOIN
    socialAccount s
        ON s.userId = u.id

Assuming that the relation from user to socialAccount is a one-to-many, this query retrieves all the social accounts for all the users in our application

A resultset may be as follows:

userId userUsername socialAccountId socialAccountUsername socialAccountType
1 ocramius@gmail.com 20 ocramius Facebook
1 ocramius@gmail.com 21 @ocramius Twitter
1 ocramius@gmail.com 22 ocramiusaethril Last.fm
2 grandpa@example.com NULL NULL NULL
3 grandma@example.com 85 awesomegrandma9917 Facebook

As you can see, we are now joining 2 tables in the results, and the ORM has to perform more complicated operations:

Hydrate 1
User object for ocramius@gmail.com
Hydrate 3
SocialAccount instances into User#$socialAccounts for ocramius@gmail.com, while skipping re-hydrating User ocramius@gmail.com
Hydrate 1
User object for grandpa@example.com
Skip hydrating
User#$socialAccounts for grandpa@example.com, as no social accounts are associated
Hydrate 1
User object for grandma@example.com
Hydrate 1
SocialAccount instance into User#$socialAccounts for grandma@example.com

DOCS This operation is what is done by Doctrine ORM when you use the DQL Fetch Joins feature.

Fetch joins are a very efficient way to hydrate multiple records without resorting to multiple queries, but there are two performance issues with this approach (both not being covered by this article):

  • Empty records require some useless looping inside the ORM internals (see grandpa@example.com's social account). This is a quick operation, but we can't simply ignore those records upfront.
  • If multiple duplicated records are being joined (happens a lot in many-to-many associations), then we want to de-duplicate records by keeping a temporary in-memory identifier map.

Additionally, our operation starts to become more complicated, as it is now O(n * m), with n and m being the records in the user and the socialAccount tables.

What the ORM is actually doing here is normalizing data that was fetched in a de-normalized resultset, and that is going through your CPU and your memory.

Bringing hydration cost to an extreme

The process of hydration becomes extremely expensive when more than 2 LEFT JOIN operations clauses are part of our queries:

SELECT
    u.id         AS userId,
    u.username   AS userUsername,
    sa.id        AS socialAccountId,
    sa.username  AS socialAccountUsername,
    sa.type      AS socialAccountType,
    s.id         AS sessionId,
    s.expiresOn  AS sessionExpiresOn,
FROM
    user u
LEFT JOIN
    socialAccount sa
        ON sa.userId = u.id
LEFT JOIN
    session s
        ON s.userId = u.id

This kind of query produces a much larger resultset, and the results are duplicated by a lot:

userId user Username social Account Id social Account Username social Account Type session Id session Expires On
1 ocramius@gmail.com 20 ocramius Facebook ocramius-macbook 2015-04-20 22:08:56
1 ocramius@gmail.com 21 @ocramius Twitter ocramius-macbook 2015-04-20 22:08:56
1 ocramius@gmail.com 22 ocramiusaethril Last.fm ocramius-macbook 2015-04-20 22:08:56
1 ocramius@gmail.com 20 ocramius Facebook ocramius-android 2015-04-20 22:08:56
1 ocramius@gmail.com 21 @ocramius Twitter ocramius-android 2015-04-20 22:08:56
1 ocramius@gmail.com 22 ocramiusaethril Last.fm ocramius-android 2015-04-20 22:08:56
2 grandpa@example.com NULL NULL NULL NULL NULL
3 grandma@example.com 85 awesomegrandma Facebook home-pc 2015-04-15 10:05:31

If you try to re-normalize this resultset, you can actually see how many useless de-duplication operation have to happen.

That is because the User ocramius@gmail.com has multiple active sessions on multiple devices, as well as multiple social accounts.

SLOW! The hydration operations on this resultset are O(n * m * q), which I'm going to simply generalize as O(n ^ m), with n being the amount of results, and m being the amount of joined tables.

Here is a graphical representation of O(n ^ m):

Boy, that escalated quickly

Yes, it is bad.

How to avoid O(n ^ m) hydration?

O(n ^ m) can be avoided with some very simple, yet effective approaches.

No, it's not "don't use an ORM", you muppet.

Avoiding one-to-many and many-to-many associations

Collection valued associations are as useful as problematic, as you never know how much data you are going to load.

Unless you use fetch="EXTRA_LAZY" and Doctrine\Common\Collections\Collection#slice() wisely, you will probably make your app crash if you initialize a very large collection of associated objects.

Therefore, the simplest yet most limiting advice is to avoid collection-valued associations whenever they are not strictly necessary.

Additionally, reduce the amount of bi-directional associations to the strict necessary.

After all, code that is not required should not be written in first place.

Multi-step hydration

The second approach is simpler, and allows us to exploit how the ORM's UnitOfWork is working internally.

In fact, we can simply split hydration for different associations into different queries, or multiple steps:

SELECT
    u.id         AS userId,
    u.username   AS userUsername,
    s.id         AS socialAccountId,
    s.username   AS socialAccountUsername,
    s.type       AS socialAccountType
FROM
    user u
LEFT JOIN
    socialAccount s
        ON s.userId = u.id

We already know this query: hydration for it is O(n * m), but that's the best we can do, regardless of how we code it.

SELECT
    u.id        AS userId,
    u.username  AS userUsername,
    s.id        AS sessionId,
    s.expiresOn AS sessionExpiresOn,
FROM
    user u
LEFT JOIN
    session s
        ON s.userId = u.id

This query is another O(n * m) hydration one, but we are now only loading the user sessions in the resultsets, avoiding duplicate results overall.

By re-fetching the same users, we are telling the ORM to re-hydrate those objects (which are now in memory, stored in the UnitOfWork): that fills the User#$sessions collections.

Also, please note that we could have used a JOIN instead of a LEFT JOIN, but that would have triggered lazy-loading on the sessions for the grandpa@example.com User

Additionally, we could also skip the userUsername field from the results, as it already is in memory and well known.

SOLUTION: We now reduced the hydration complexity from O(n ^ m) to O(n * m * k), with n being the amount of User instances, m being the amount of associated to-many results, and k being the amount of associations that we want to hydrate.

Coding multi-step hydration in Doctrine ORM

Let's get more specific and code the various queries represented above in DQL.

Here is the O(n ^ m) query (in this case, O(n ^ 3)):

return $entityManager
    ->createQuery('
        SELECT
            user, socialAccounts, sessions 
        FROM
            User user
        LEFT JOIN
            user.socialAccounts socialAccounts
        LEFT JOIN
            user.sessions sessions
    ')
    ->getResult();

This is how you'd code the multi-step hydration approach:

$users = $entityManager
    ->createQuery('
        SELECT
            user, socialAccounts
        FROM
            User user
        LEFT JOIN
            user.socialAccounts socialAccounts
    ')
    ->getResult();

$entityManager
    ->createQuery('
        SELECT PARTIAL
            user.{id}, sessions
        FROM
            User user
        LEFT JOIN
            user.sessions sessions
    ')
    ->getResult(); // result is discarded (this is just re-hydrating the collections)

return $users;

I'd also add that this is the only legitimate use-case for partial hydration that I ever had, but it's a personal opinion/feeling.

Other alternatives (science fiction)

As you may have noticed, all this overhead is caused by normalizing de-normalized data coming from the DB.

Other solutions that we may work on in the future include:

  • Generating hydrator code - solves constant overhead issues, performs better with JIT engines such as HHVM
  • Leveraging the capabilities of powerful engines such as PostgreSQL, which comes with JSON support (since version 9.4), and would allow us to normalize the fetched data to some extent
  • Generate more complex SQL, creating an own output format that is "hydrator-friendly" (re-inventing the wheel here seems like a bad idea)

Research material

Just so you stop thinking that I pulled out all these thought out of thin air, here is a repository with actual code examples that you can run, measure, compare and patch yourself:

https://github.com/Ocramius/Doctrine2StepHydration

Give it a spin and see the results for yourself!

Tags: php, doctrine, orm, hydration, performance, speed