Skip to content
On this page
developed by

Performance

Before we created the DbToolsBundle, we looked at other PHP tools to anonymize data. After a quick search, we can't find a project that seems to be over the top and most of the solutions we found was unmaintained.

Also, all these projects used the same methodology:

  1. Load the data from the database
  2. Anonymize it in PHP
  3. Persist updated data in database

This approach has a benefit: you can use an external library - such as Faker - to anonymize your data. But it also has a big drawback: it is very slow.

For the DbToolsBundle, we choose another methodology: we anonymize with SQL queries. We use database management systems for what they are good at: processing huge amounts of data.

INFO

Anonymizing through SQL, the DbToolsBundle generates long and complex update queries. To get great performance for each one of the platform it addresses, a meticulous work has been made. Queries are optimized differently depending on the database platform on which they will be executed.

Learn more about these optimizations.

We get pretty good results with this approach. And to demonstrate the DbToolsBundle capabilities, we created a benchmark app.

In this app, you will find a Symfony application that uses 4 different DBAL Doctrine connections (SQLite, PostgreSQL, MariaDB and MySQL).

For each one of these connections, we defined the same 3 entities: Customer, Address and Order.

php
#[ORM\Entity(repositoryClass: CustomerRepository::class)]
class Customer
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column]
    private ?int $id = null;

    #[ORM\Column(length: 255)]
    #[Anonymize(type: 'email')]
    private ?string $email = null;

    #[ORM\Column(length: 255)]
    #[Anonymize(type: 'password')]
    private ?string $password = null;

    #[ORM\Column(length: 255)]
    #[Anonymize(type: 'lastname')]
    private ?string $lastname = null;

    #[ORM\Column(length: 255)]
    #[Anonymize(type: 'firstname')]
    private ?string $firstname = null;

    #[ORM\Column]
    #[Anonymize(type: 'integer', options: ['min' => 10, 'max' => 99])]
    private ?int $age = null;

    #[ORM\OneToMany(mappedBy: 'customer', targetEntity: Address::class, orphanRemoval: true)]
    private Collection $addresses;

    #[ORM\Column(length: 255)]
    private ?string $telephone = null;

    #[ORM\OneToMany(mappedBy: 'customer', targetEntity: Order::class, orphanRemoval: true)]
    private Collection $orders;

    // ...
}
php
#[ORM\Entity(repositoryClass: AddressRepository::class)]
#[Anonymize(type: 'address', options: [
    'street_address' => 'street',
    'postal_code' => 'zip_code',
    'locality' => 'city',
    'country' => 'country',
])]
class Address
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column]
    private ?int $id = null;

    #[ORM\ManyToOne(inversedBy: 'addresses')]
    #[ORM\JoinColumn(nullable: false)]
    private ?Customer $customer = null;

    #[ORM\Column(length: 255)]
    private ?string $street = null;

    #[ORM\Column(length: 255)]
    private ?string $zipCode = null;

    #[ORM\Column(length: 255)]
    private ?string $city = null;

    #[ORM\Column(length: 255)]
    private ?string $country = null;

    //...
}
php
#[ORM\Entity(repositoryClass: OrderRepository::class)]
#[ORM\Table(name: '`order`')]
class Order
{
    #[ORM\Id]
    #[ORM\GeneratedValue]
    #[ORM\Column]
    private ?int $id = null;

    #[ORM\ManyToOne(inversedBy: 'orders')]
    #[ORM\JoinColumn(nullable: false)]
    private ?Customer $customer = null;

    #[ORM\Column(length: 255)]
    #[Anonymize(type: 'fr-fr.phone')]
    private ?string $telephone = null;

    #[ORM\Column(length: 255)]
    #[Anonymize(type: 'email')]
    private ?string $email = null;

    #[ORM\Column]
    #[Anonymize(type: 'float', options: ['min' => 10, 'max' => 99])]
    private ?float $amount = null;

    #[ORM\Column]
    private ?\DateTimeImmutable $createdAt = null;

    #[ORM\ManyToOne]
    #[ORM\JoinColumn(nullable: false)]
    private ?Address $billingAddress = null;

    #[ORM\ManyToOne]
    #[ORM\JoinColumn(nullable: false)]
    private ?Address $shippingAddress = null;

    #[ORM\Column(type: Types::TEXT, nullable: true)]
    #[Anonymize(type: 'lorem')]
    private ?string $note = null;

    //...
}

Then, we executed the db-tools:anonymize command on different backups with different amounts of data.

Here are the results:

CustomerAddressOrderPostgreSQLSQLiteMariaDBMySQL
100 000--5s7s20s53s
500 000--9s10s37s3m 44s
1 000 000--16s16s1m 23s36m 56s
-200 000-6s10s26s42s
--1 000 00016s11s1m 15s25m 1s
100 000200 000-7s10s32s1m 16s
100 000200 0001 000 00024s25s1m 40s36m 47s
NB1: The indicated values are approximate and vary slightly each time the command is executed. A 10% margin of error can be assumed.
NB2: Each database vendor docker image has been used as is. Without any tweaking. This could explain the bad results for MySQL compared to MariaDB's ones.

Released under the MIT License.