| name | debugging-doctrine-performance |
| description | Use PROACTIVELY when you implement, review, audit, debug, or optimize bulk operations (imports, mass edits, variant creation, batch updates) in Doctrine/Shopsys codebases. Triggers on: slow batch operations, 'Doctrine CPU spiral', N+1 queries in loops, creating/updating 100+ entities, flush discipline issues, or side effect problems. Covers: creating bulk handlers, reviewing performance issues, auditing existing code for anti-patterns, and refactoring single-item loops to bulk paths. |
Doctrine in a Large Shopsys-Based E-commerce Codebase: Developer Guide for Predictable Performance & Maintainability
This guide is meant for teams working in a Shopsys / Symfony / Doctrine codebase where application code extends vendor framework code and where business operations often touch many tables (products, prices, visibilities, translations, URLs, exports, etc.).
The goal is not to "avoid Doctrine", but to use Doctrine in a way that stays predictable under load and remains maintainable for humans (and AI assistants) over time.
1) Core principles (the mental model)
Principle A: Every request/use case should have a clear "unit of work"
Treat each controller action / CLI command as an explicit "unit of work" with:
- clear transaction boundary
- clear flush boundary
- explicit side effects (recalculations, exports, visibility refresh, URLs, cache invalidations)
If you can't explain "what happens" in a short ordered list, the code is too implicit.
Principle B: "Flush ownership" must be explicit
The #1 source of confusion is flushes happening deep in call stacks.
It makes performance unpredictable and breaks reasoning (because a random helper can flush and trigger huge UnitOfWork work).
Team rule (recommended):
- Only the Facade layer is allowed to call flush().
- Lower layers (Factory, Repository) must not call flush() unless the method name clearly states it, e.g. saveAndFlush() / createAndFlush(), and it's reviewed as intentional.
Principle C: Bulk operations must have bulk code paths
Any operation that can create/update N items (variants, imports, mass edit, batch status changes) must not reuse single-item Facade methods in a loop.
Looping "rich" facade methods usually multiplies:
- DB statements
- entity hydrations
- flushes
- recalculation work
Instead: implement a bulk API or a dedicated bulk service (e.g., *BulkCreator).
Principle D: Hot paths should be "SQL-shaped"
Even if you keep Doctrine ORM, your algorithm should feel like:
- preload everything needed
- compute changes in memory (pure logic)
- write changes in as few flushes/statements as possible
- schedule side effects once
If the code needs repeated "ask DB for a little thing" inside loops, you'll get N+1 sooner or later.
2) Do / Don't rules (general, but based on real problems)
2.1 Flush discipline
Do
- Do: persist() inside loops, flush() once at the end.
- Do: if IDs are required mid-process, flush in chunks (e.g. 100-500), then continue.
- Do: design bulk operations so they can survive clear() if you need to control memory.
Don't
- Don't: flush per entity in a loop.
- Don't: call framework/vendor helpers that flush implicitly from inside loops.
- Don't: call a "do everything" facade method repeatedly for batch creation.
Use case hint: If an action creates 100+ items, a per-item flush will almost always produce "Doctrine CPU spiral" symptoms.
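The chunked variant of these rules can be sketched as follows. This is a minimal illustration, not Shopsys code: UnitOfWorkPort, persistInChunks() and RecordingUnitOfWork are hypothetical names introduced here so the example runs without a database. In a real codebase you would pass Doctrine's EntityManagerInterface, which provides persist(), flush() and clear().

```php
<?php

// Hypothetical minimal port. Doctrine's EntityManagerInterface offers
// persist(), flush() and clear() with the semantics assumed here.
interface UnitOfWorkPort
{
    public function persist(object $entity): void;
    public function flush(): void;
    public function clear(): void;
}

/**
 * Persists all entities and flushes once per chunk instead of once per entity.
 *
 * @param iterable<object> $entities
 * @return int number of flushes performed
 */
function persistInChunks(UnitOfWorkPort $em, iterable $entities, int $chunkSize = 200): int
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('chunkSize must be at least 1');
    }

    $pending = 0;
    $flushes = 0;
    foreach ($entities as $entity) {
        $em->persist($entity);
        if (++$pending === $chunkSize) {
            $em->flush();
            // clear() detaches every managed entity: re-fetch anything still needed.
            $em->clear();
            $pending = 0;
            $flushes++;
        }
    }

    // Flush the last, partially filled chunk.
    if ($pending > 0) {
        $em->flush();
        $em->clear();
        $flushes++;
    }

    return $flushes;
}

// In-memory stand-in (assumption for this sketch) that only counts calls.
final class RecordingUnitOfWork implements UnitOfWorkPort
{
    public int $persisted = 0;
    public int $flushes = 0;
    public int $clears = 0;

    public function persist(object $entity): void { $this->persisted++; }
    public function flush(): void { $this->flushes++; }
    public function clear(): void { $this->clears++; }
}
```

Because clear() detaches everything, code after a chunk boundary should hold IDs rather than entity references, which is what "survive clear()" means in practice.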
2.2 Side effects are not free (URLs, visibilities, prices, exports, caches)
Do
- Do: separate writes from side effects:
  - write core rows (products/orders/whatever)
  - then apply side effects explicitly (URLs, visibilities, exports, recalcs)
- Do: decide "immediate vs delayed" consciously:
  - immediate = correct right after redirect, but slower
  - delayed = fast response, eventual consistency via cron/queue
Don't
- Don't: run expensive recalculation or export logic N times.
- Don't: rely on side effects happening "somewhere deep" in a facade chain.
Use case hint: Visibility recalculation and price recalculation are classic "run once per family/batch", not "per item".
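One possible way to get "once per family" behavior is to collect the affected family IDs during the write loop and drain them after the flush. The class below is a hypothetical sketch (FamilyRecalculationQueue is not a Shopsys class); the actual recalculation call is left to the caller.

```php
<?php

// Hypothetical helper: collects family IDs so each family is recalculated once,
// no matter how many of its items were touched in the batch.
final class FamilyRecalculationQueue
{
    /** @var array<int, true> family ID => marker (array keys de-duplicate) */
    private array $familyIds = [];

    public function schedule(int $familyId): void
    {
        $this->familyIds[$familyId] = true;
    }

    /**
     * Returns the scheduled family IDs once and empties the queue.
     *
     * @return int[]
     */
    public function drain(): array
    {
        $ids = array_keys($this->familyIds);
        $this->familyIds = [];

        return $ids;
    }
}

// Usage sketch: schedule inside the loop, recalculate after the single flush.
$queue = new FamilyRecalculationQueue();
foreach ([7, 7, 9, 7] as $parentIdOfTouchedItem) {
    $queue->schedule($parentIdOfTouchedItem);
}
foreach ($queue->drain() as $familyId) {
    // Call the real recalculator here, once per family.
}
```

The same shape works for delayed processing: instead of recalculating in the second loop, write the drained IDs to whatever cron or queue mechanism the project uses.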
2.3 Ban lazy loading in hot paths
Do
- Do: use repository queries that return all data needed (join or scalar queries).
- Do: preload reference data into maps (IDs → data) before loops.
Don't
- Don't: iterate entities and call getters that trigger lazy loads in tight loops.
- Don't: call repository findOneBy() repeatedly for the same lookup pattern.
Use case hint: Anything like "get translation/group/setting/price for each item" can hide N+1.
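The "preload into a map" rule can be illustrated with a small in-memory sketch. CountingGroupRepository and labelItems() are hypothetical names; the repository only counts calls so the difference between per-item lookups and one bulk lookup is visible without a database.

```php
<?php

// Hypothetical repository stand-in: every method call represents one SQL query.
final class CountingGroupRepository
{
    public int $queries = 0;

    /** @param array<int, string> $namesById */
    public function __construct(private array $namesById)
    {
    }

    // Per-item lookup: calling this in a loop is the N+1 pattern.
    public function findNameById(int $id): ?string
    {
        $this->queries++;

        return $this->namesById[$id] ?? null;
    }

    /**
     * Bulk lookup: one query for all requested IDs.
     *
     * @param int[] $ids
     * @return array<int, string> group ID => name
     */
    public function getNamesByIds(array $ids): array
    {
        $this->queries++;

        return array_intersect_key($this->namesById, array_flip($ids));
    }
}

/**
 * @param array<int, array{id: int, groupId: int}> $items
 * @return array<int, string> item ID => group name
 */
function labelItems(CountingGroupRepository $repository, array $items): array
{
    // Preload: collect the distinct group IDs and fetch them in one call.
    $groupIds = array_values(array_unique(array_column($items, 'groupId')));
    $namesByGroupId = $repository->getNamesByIds($groupIds);

    // Loop body does array lookups only, no repository access.
    $labels = [];
    foreach ($items as $item) {
        $labels[$item['id']] = $namesByGroupId[$item['groupId']] ?? 'unknown';
    }

    return $labels;
}
```

With three items in two groups this performs a single lookup call, where the findNameById() loop would perform three.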
2.4 Prefer "ID-first" logic
If your loop logic only needs IDs or scalars:
- fetch scalars (DBAL/scalar DQL)
- avoid hydrating full entity graphs
This reduces both memory and UnitOfWork overhead.
2.5 Avoid "edit() as a hammer"
Large CRUD methods often do far more than "save fields".
They may trigger:
- parameter delete+insert
- image processing
- URL generation + uniqueness
- recalculation scheduling
- export scheduling
- visibility refresh
Do
- create targeted methods for targeted changes (e.g., "set template", "link variants", "mark for export")
Don't
- call "save everything" methods inside bulk flows or loops just to update one column.
3) Team conventions that prevent future disasters
Convention 1: Name methods by behavior
Examples:
- create() (no flush)
- createAndFlush() (flush inside, rare, justified)
- createBulk() / bulkInsert() (explicit bulk path)
- scheduleRecalculations() (explicit side effects)
This helps reviewers and AI assistants understand what happens.
Convention 2: Layering (where things belong)
A maintainable structure in Shopsys typically looks like:
- Controller
  - input validation, form handling
- Facade
  - transaction boundary (via framework listener for HTTP, explicit for CLI)
  - orchestration of business logic
  - flush boundary
  - explicit side effects policy (immediate vs delayed)
- Factory
  - entity creation, pure logic, no DB, no flushing
- Repository
  - queries, persistence helpers (ideally no flush unless explicit)
- Schedulers/Recalculators
  - explicit calls, preferably once per use case (price, visibility, availability, export)
Convention 3: Bulk mode must be explicit
Introduce a simple policy object like:
BulkOperationOptions { immediateRecalc=false, immediateExport=false, chunkSize=200 }
So code doesn't "accidentally" execute immediate work in bulk flows.
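A minimal sketch of such a policy object is shown below. The three fields and their defaults come from the line above; the validation and the use of named arguments are assumptions added for illustration.

```php
<?php

// Sketch of the policy object. Defaults are the "safe for bulk" values:
// nothing expensive runs immediately unless the caller opts in.
// On PHP 8.1+ the properties could additionally be declared readonly.
final class BulkOperationOptions
{
    public function __construct(
        public bool $immediateRecalc = false,
        public bool $immediateExport = false,
        public int $chunkSize = 200,
    ) {
        // Validation is an assumption of this sketch, not part of the guide.
        if ($chunkSize < 1) {
            throw new InvalidArgumentException('chunkSize must be at least 1');
        }
    }
}

// Bulk flows take the defaults; an interactive single-item flow opts in explicitly.
$bulk = new BulkOperationOptions();
$interactive = new BulkOperationOptions(immediateRecalc: true, chunkSize: 50);
```

Passing the object down explicitly means a reviewer can see at the call site whether a flow asked for immediate work.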
4) Practical review checklist (copy into PR template)
When reviewing a change in a "potential hot path" (mass edit/import/variant creation/list export):
- Flush count
  - Is flush() called in loops? (Must be "no".)
  - Are there implicit flushes hidden in helpers/facades?
- Query count
  - Any repository call inside loops that looks like lookup-by-id/group/translation/settings?
  - Any "per item" findOneBy() pattern?
- Lazy loading
  - Does the loop call getters that can trigger lazy loads?
- Side effects
  - Are recalculations/exports/visibility refresh happening per item?
  - Can they be done once per batch/family?
- Memory & UnitOfWork
  - Are thousands of entities being created/managed unnecessarily?
  - Is clear() used safely (not randomly)?
5) Tooling recommendations (to enforce the rules)
- Add a debug/profiler habit for hot actions:
- log query counts per request for selected routes
- add "budget thresholds" (e.g., max 200 queries for admin bulk actions)
- Use Blackfire (you already do) and track:
- flush count
- entity count
- query count
- Consider adding dev-only detection:
- Doctrine SQL logger with request summary
- fail tests if query count explodes for key actions
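A query budget can be as small as a counter with a threshold. The sketch below is a hypothetical, framework-free helper (QueryBudget and QueryBudgetExceeded are names invented here); wiring record() into Doctrine's SQL logging or a DBAL middleware is left out because it depends on the DBAL version in use.

```php
<?php

final class QueryBudgetExceeded extends RuntimeException
{
}

// Hypothetical helper: collects executed SQL and fails when a budget is exceeded.
final class QueryBudget
{
    /** @var string[] */
    private array $statements = [];

    public function __construct(private int $maxQueries)
    {
    }

    public function record(string $sql): void
    {
        $this->statements[] = $sql;
    }

    public function count(): int
    {
        return count($this->statements);
    }

    /**
     * Most frequently repeated statements, a quick hint at N+1 patterns.
     *
     * @return array<string, int> SQL => number of executions
     */
    public function mostRepeated(int $limit = 3): array
    {
        $counts = array_count_values($this->statements);
        arsort($counts);

        return array_slice($counts, 0, $limit, true);
    }

    public function assertWithinBudget(): void
    {
        if ($this->count() > $this->maxQueries) {
            throw new QueryBudgetExceeded(sprintf(
                'Query budget exceeded: %d executed, %d allowed. Most repeated: %s',
                $this->count(),
                $this->maxQueries,
                json_encode($this->mostRepeated())
            ));
        }
    }
}
```

In a functional test, calling assertWithinBudget() after the action under test turns a query explosion into a failing test rather than a production incident.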
6) Concrete example (Option 1: keep Doctrine, make side effects explicit)
This is the pattern that works best in Shopsys-style codebases: keep Doctrine for compatibility, but force explicit orchestration.
Example A: Command handler for creating many related items
(Keeping this example relatively close to real use cases like "create variants", but the pattern applies to imports, mass edit, bulk status updates, etc.)
Controller (thin)
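public function bulkCreateAction(Request $request): Response
{
    $command = new CreateManyItemsCommand(
        parentId: (int) $request->get('id'),
    );
    $result = $this->createManyItemsHandler->handle($command);

    // Render or redirect based on $result as the UI requires;
    // an empty 204 response keeps this sketch minimal.
    return new Response('', Response::HTTP_NO_CONTENT);
}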
Handler (transaction + explicit phases + explicit side effects)
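final class CreateManyItemsHandler
{
    // Dependencies ($em, repositories, planner, factory, sideEffects)
    // are constructor-injected; omitted here for brevity.

    public function handle(CreateManyItemsCommand $cmd): CreateManyItemsResult
    {
        return $this->em->wrapInTransaction(function () use ($cmd) {
            // Phase 1: preload everything needed (no lazy loads later)
            $parent = $this->parentRepository->getById($cmd->parentId);
            $referenceMap = $this->referenceRepository->getMapFor($cmd);

            // Phase 2: compute the changes in memory (pure logic, no DB access)
            $newItemsData = $this->planner->plan($parent, $referenceMap, $cmd);

            // Phase 3: create and persist entities without flushing
            $newEntities = [];
            foreach ($newItemsData as $data) {
                $entity = $this->factory->create($data);
                $this->em->persist($entity);
                $newEntities[] = $entity;
            }

            // Phase 4: single flush, so the new rows exist before side effects run
            $this->em->flush();

            // Phase 5: apply side effects once for the whole batch
            $this->sideEffects->applyForBulkCreate($parent, $newEntities, $cmd->options);

            return new CreateManyItemsResult();
        });
    }
}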
SideEffects service (single place to reason about "what else happens")
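final class SideEffects
{
    // Writers, scheduler and refresher are constructor-injected; omitted here for brevity.

    public function applyForBulkCreate($parent, array $children, BulkOptions $opts): void
    {
        // Bulk writes: one call per side effect, never one call per child
        if ($opts->createVisibilityRows) {
            $this->visibilityWriter->bulkCreateRows($children);
        }
        if ($opts->createFriendlyUrls) {
            $this->urlWriter->bulkCreateUrls($children);
        }

        // Delayed work: only mark here, the heavy processing runs later
        $this->scheduler->markForExport($parent, $children);
        $this->scheduler->markForRecalculation($parent, $children);

        // Immediate work: opt-in, once per family
        if ($opts->immediateRefreshFamilyVisibility) {
            $this->visibilityRefresher->refreshFamily($parent->getId());
        }
    }
}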
Why this helps your whole team:
One handler is "the truth" of the use case. No more guessing which facade triggers which flush or refresh.
Example B: Split low-level writers from Facade
A useful internal refactor pattern for bulk operations:
- *BulkCreator / *Writer: does DB writes via DBAL, minimal policy, no flush (or flush only when explicitly asked)
- *Facade: composes factories, writers, and side effects under explicit options
This reduces the "one method does everything" anti-pattern and keeps Facades readable.
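As an illustration of what a low-level writer might do, the sketch below builds one multi-row INSERT statement with positional placeholders. buildMultiRowInsert() and the table and column names in the usage note are hypothetical. The function only builds the SQL and the parameter list; executing them is left to the caller, for example through DBAL's Connection::executeStatement($sql, $params).

```php
<?php

/**
 * Builds a single multi-row INSERT with positional placeholders.
 *
 * Table and column names are interpolated as-is, so they must come from code,
 * never from user input.
 *
 * @param string[] $columns
 * @param array<int, array<string, scalar|null>> $rows each row keyed by column name
 * @return array{0: string, 1: array<int, scalar|null>} [SQL, flat parameter list]
 */
function buildMultiRowInsert(string $table, array $columns, array $rows): array
{
    if ($columns === [] || $rows === []) {
        throw new InvalidArgumentException('At least one column and one row are required');
    }

    // One "(?, ?, ...)" group per row.
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';

    // Flatten values in column order so they line up with the placeholders.
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            if (!array_key_exists($column, $row)) {
                throw new InvalidArgumentException(sprintf('Row is missing column "%s"', $column));
            }
            $params[] = $row[$column];
        }
    }

    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );

    return [$sql, $params];
}
```

A writer would typically call this per chunk, e.g. array_chunk($rows, $options->chunkSize), because databases limit the number of bound parameters in one statement.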
Final note (how to use this guide)
- Use these rules for every batch-ish operation: imports, bulk variant creation, mass edit, product duplication, feed regeneration triggers.
- The goal is that every developer can answer:
- how many flushes happen?
- how many queries happen?
- what side effects happen?
- are they immediate or delayed?
7) Diagnostic output format
When analyzing routes or code paths, always produce a Route Flow Schema showing the call hierarchy with annotations:
ProductController::detailAction() [line 309]
├── ProductDetailViewFacade::getVisibleProductDetail($id) ← Elasticsearch ✅
├── [IF isMainVariant] getVariantsParameterTemplate($product) ⚠️⚠️⚠️ CRITICAL
│   ├── ProductFacade::getById() - loads Doctrine entity
│   ├── $product->getParameterTemplate()->getParameters() - lazy load
│   ├── LOOP: parameterTemplateParameterPositionFacade->getParameterPositionInParameterTemplate()
│   ├── parameterFacade->extractProductParameterValuesDataIncludedVariants() ⚠️
│   │   └── LOOP for all variants: productParameterValueRepository queries
│   ├── LOOP: parameterFacade->getParameterValueById() ⚠️ N+1
│   └── LOOP: parameterFacade->getParameterValuesByGroupId() ⚠️ N+1
├── GtmFacade::onProductDetailPage()
├── Template: Front/Content/Product/detail.html.twig
│   ├── BreadcrumbController::indexAction()
│   ├── CartController::productActionAction() (x2)
│   └── [FOR variants loop] Heavy Twig iteration ⚠️
└── Layout (layoutWithoutPanel.html.twig)
    ├── [CACHED 24h] CategoryController::panelAction()
    └── FlashMessageController::indexAction()
Annotation legend:
✅ = Good pattern (Elasticsearch, single query, proper batching)
⚠️ = Concern (lazy loading, potential N+1, uncached heavy operation)
⚠️⚠️⚠️ CRITICAL = Severe issue requiring immediate attention
← Elasticsearch = Data fetched from Elasticsearch (good)
LOOP: = Operation inside a loop (watch for N+1)
[IF condition] = Conditional branch
[FOR x loop] = Template loop iteration
[CACHED Xh] = Cached with duration
[line N] = Source code line reference
(xN) = Called N times
After the schema, provide:
- ✅ Good Patterns - what's done well
- ⚠️ Concerns - potential issues with code references
- 🚨 Critical - must-fix issues with specific line numbers and fix suggestions