원클릭으로 Manus에서 모든 스킬 실행

sysdesign-autocomplete-trie-fuzzy

Use when designing autocomplete / search suggestions — weighted trie for top-k prefix matches, query sampling, fuzzy matching, and pre-serve moderation.

Manus에서 실행

스타0

포크0

업데이트2026년 4월 23일 02:35

출처

danilods

danilods/matilha-sysdesign-pack

GitHub 저장소 열기 Creator 저장소 보기

설치 명령

다운로드

Manus에서 실행

유용한 대상SOC

소프트웨어 개발자컴퓨터 및 수학직15-1252L4

SKILL.md

readonly

name	sysdesign-autocomplete-trie-fuzzy
description	Use when designing autocomplete / search suggestions — weighted trie for top-k prefix matches, query sampling, fuzzy matching, and pre-serve moderation.
category	sysdesign
version	1.0.0
requires	[]
optional_companions	[]

When this fires

Fires on prompts like "design autocomplete for search", "sugestão de queries em tempo real", "type-ahead for product names", or "como o Google sugere enquanto digito". Also fires when an existing autocomplete is listed as too slow, too stale, or too permissive ("it suggested a banned term"). The skill installs the four load-bearing pieces: a weighted trie for top-K-by-prefix, query sampling so the index does not grow with every search, fuzzy matching for typos and phonetics, and a content- moderation filter that runs before any suggestion leaves the server.

Preconditions

The team can describe where suggestions come from — historical search queries, catalog entries, or both. Autocomplete with no corpus is a different problem (cold-start generation, not retrieval).
There is a latency budget on the keystroke path (typical: <50ms from keystroke to rendered suggestions). Without it, the design collapses to "just query Elasticsearch".
The team can articulate what counts as a "popular" query — a rough threshold (e.g., queries seen more than 10 times in the last 30 days). This threshold drives sampling.
A moderation source of truth exists or can be bootstrapped (banned- terms list, slur filter, regulatory terms). Shipping autocomplete without this will surface the platform's worst content.

Execution Workflow

Model the data structure. A weighted trie stores prefixes as paths and associates each node with the top-K completions ranked by weight (query frequency, recency, personalization signal). Reads are O(prefix length + K) — fast enough for the keystroke path.
Precompute per-node top-K at index time, not query time. Walking the subtree on every keystroke burns the latency budget; store the top-K list directly at each node and refresh it during index updates.
Introduce query sampling explicitly. Do not index every query the platform has ever seen — the long tail is noise and personal data. Keep queries above a popularity threshold; log everything but index only the sampled set. This keeps the trie memory-bounded and the suggestions quality-gated.
Layer fuzzy matching for typos. Options, in increasing cost order: (a) deletion-based indexing (Wang et al.) — precompute variants with one deletion per position; (b) Levenshtein automata against the trie — higher precision, more memory; (c) phonetic indexing (Soundex/Metaphone) for proper-name-heavy corpora. Pick one based on the corpus, not on all three.
Place moderation at the edge of the suggestion service — filter candidates against the banned list after trie retrieval and before the response leaves the server. Moderation at index time is not enough; terms get added to the list after indexing, and you must still block them on the read path.
Design the update path. The trie is read-heavy; build the new trie offline, swap atomically (pointer flip or rolling update), and keep the old trie around for rollback during the swap window. Avoid mutating the live trie in place.
Capacity plan. Store the trie in memory on the suggestion service (Redis with custom structures or an in-process C/Rust structure are both common). Shard by prefix when memory per instance exceeds comfort; shard routing is then a prefix-hash lookup.
Ingestion must absorb spikes. Accept and log every search query regardless of index state; the sampler runs offline on the log. A spike that overruns the indexer should not drop searches.

Rules: Do

Precompute top-K at each trie node; refresh during index build, not during read.
Sample queries before indexing — the long tail is not content, it is noise plus PII.
Run moderation after retrieval and before response, even if the corpus was pre-filtered. Lists evolve faster than indexes.
Build the new trie offline and swap atomically. In-place mutation of a live read structure is how autocomplete services become unreliable.
Log all search traffic for the sampler; index a subset. Log and index are different systems with different retention rules.

Rules: Don't

Don't index every query. The platform's long tail is mostly unique and mostly unhelpful as suggestions; you will also accidentally index PII typed by users (emails, phone numbers).
Don't run Elasticsearch on the keystroke path without a trie cache in front. ES is fine as a fallback, not as a default for type-ahead.
Don't skip phonetic/fuzzy for human-name corpora. Proper names are where typos are most common and exact-prefix suggestions feel broken.
Don't mutate the trie live. Rebuild and swap, or pay for it later with stale partial writes.
Don't rely on index-time moderation alone. The banned list is updated after incidents; the read-path filter is the enforcement point.

Expected Behavior

After this skill, the autocomplete design has a clear separation: logs capture everything, a sampler selects popular queries, a trie- builder produces a versioned artifact, and the read service serves from the current trie with a post-retrieval moderation pass. Latency on the keystroke path stays inside the budget and is decoupled from total corpus size.

Conversations about "why didn't it suggest X" get a specific answer (X was below sampling threshold, or X was moderated out) rather than "the search is weird".

Quality Gates

Keystroke-path p99 latency stays under the declared budget under realistic load.
Moderation filter runs on every response; tested with a list-update scenario that post-dates the index.
Index updates are atomic — no partial-tree reads possible.
Sampling threshold is explicit and tunable; memory usage has a documented ceiling per shard.
Ingestion logs all queries even during indexer downtime.

Companion Integration

Pairs with sysdesign-rate-limiting-strategies (autocomplete endpoints are abuse magnets), sysdesign-cdn-object-store (static parts of the suggestion UI), and sysdesign-monitoring-4-golden-signals for the keystroke-path SLIs. With matilha-ux-pack installed, ux-perceived-performance covers debounce, cancellation, and render cadence on the client — the frontend half of a fast autocomplete. Methodology phase: 20-30 (spec + plan) when building new, 10 (discovery) when diagnosing an existing autocomplete.

Output Artifacts

Architecture diagram with ingestion → log → sampler → trie builder → read service → moderation filter → response.
Data-structure note describing the trie node, top-K storage, and fuzzy strategy chosen.
Sampling-threshold document, dated, with the rationale.
Banned-terms integration point marked on the diagram.
Capacity plan with memory ceiling per shard and rebuild cadence.

Example Constraint Language

Use "must" for: post-retrieval moderation, atomic index swap, a latency budget on the keystroke path.
Use "should" for: a fuzzy strategy matched to the corpus (phonetic for names, deletion-based for short words, Levenshtein automata for precision-heavy cases), a sampling threshold above the noise floor.
Use "may" for: personalization layered on top of the global top-K, per-locale tries when the corpus splits cleanly by language.

Troubleshooting

"Suggestions feel stale": rebuild cadence is too slow. Check the sampler → builder → swap latency end-to-end; the bottleneck is usually the builder step for large corpora.
"P99 blows up under typing bursts": either the trie is evicted from cache (check memory pressure) or the service is saturating on the fuzzy step. Profile; Levenshtein automata on every keystroke is expensive.
"It suggested a banned term": moderation ran only at index time, and the term was added after the last build. Add the filter at response time; both layers should exist.
"Ingestion dies during traffic spikes": the ingestion path is coupled to the indexer. Decouple: log first, sample later. The indexer can lag; logging cannot.

Concrete Example

A marketplace launches product-search autocomplete backed directly by Elasticsearch queries on every keystroke. p99 lands at 240ms (budget was 50ms) and during peak, the search cluster starts shedding writes. The team introduces a weighted trie: last 30 days of queries above ten occurrences become the sampled corpus, top-10 completions are precomputed per node, and phonetic indexing is layered for product names. A banned-terms filter runs on every response. P99 drops to 18ms, ES load halves, and when a new banned term is added after a press incident, it is filtered within minutes without a reindex.

Sources

[[concepts/design-cases]] — Design Autocomplete case
[[concepts/nfr-system-design]] — latency and accuracy sections
Synthesized from Zhiyong Tan, Acing the System Design Interview (chapter 11, Design Autocomplete).

이 저장소의 다른 Skills

같은 저장소

matilha-sysdesign-trigger

danilods/matilha-sysdesign-pack

Use when the user mentions system design, scalability, distributed systems, latency, throughput, availability, CAP theorem, database design, caching, rate limiting, CDN, microservices, SLA, NFR, capacity planning, bottleneck, message queue, or any topic related to large-scale system architecture and infrastructure design. Fires independently of compose to ensure matilha-sysdesign-pack skills activate whenever system design domain appears.

2026-04-260

sysdesign-cdn-object-store

danilods/matilha-sysdesign-pack

Use when serving images, video, or downloads globally — places a CDN in front of an object store, names invalidation strategy, and handles private-content auth.

2026-04-230

sysdesign-dead-letter-queue

danilods/matilha-sysdesign-pack

Use when handling message failures in a queue or stream — installs a DLQ with retry policy, backoff, alerting on growth, and a reprocess-after-fix flow.

2026-04-230

sysdesign-dual-write-event-sourcing

danilods/matilha-sysdesign-pack

Use when a design writes to two stores on one action (DB + Kafka, DB + cache, DB + search) or when deciding whether event sourcing is worth its complexity.

2026-04-230

sysdesign-event-streaming-kafka

danilods/matilha-sysdesign-pack

Use when deciding between Kafka and a simpler queue — picks Kafka for decoupling, ordered delivery, and replay, or rejects it for low-volume point-to-point work.

2026-04-230

sysdesign-idempotency-patterns

danilods/matilha-sysdesign-pack

Use when designing write endpoints that may be retried — install idempotency keys, dedup stores, and at-most-once semantics to prevent duplicate effects.

2026-04-230

name	sysdesign-autocomplete-trie-fuzzy
description	Use when designing autocomplete / search suggestions — weighted trie for top-k prefix matches, query sampling, fuzzy matching, and pre-serve moderation.
category	sysdesign
version	1.0.0
requires	[]
optional_companions	[]

When this fires

Preconditions

The team can describe where suggestions come from — historical search queries, catalog entries, or both. Autocomplete with no corpus is a different problem (cold-start generation, not retrieval).
There is a latency budget on the keystroke path (typical: <50ms from keystroke to rendered suggestions). Without it, the design collapses to "just query Elasticsearch".
The team can articulate what counts as a "popular" query — a rough threshold (e.g., queries seen more than 10 times in the last 30 days). This threshold drives sampling.
A moderation source of truth exists or can be bootstrapped (banned- terms list, slur filter, regulatory terms). Shipping autocomplete without this will surface the platform's worst content.

Execution Workflow

Model the data structure. A weighted trie stores prefixes as paths and associates each node with the top-K completions ranked by weight (query frequency, recency, personalization signal). Reads are O(prefix length + K) — fast enough for the keystroke path.
Precompute per-node top-K at index time, not query time. Walking the subtree on every keystroke burns the latency budget; store the top-K list directly at each node and refresh it during index updates.
Introduce query sampling explicitly. Do not index every query the platform has ever seen — the long tail is noise and personal data. Keep queries above a popularity threshold; log everything but index only the sampled set. This keeps the trie memory-bounded and the suggestions quality-gated.
Layer fuzzy matching for typos. Options, in increasing cost order: (a) deletion-based indexing (Wang et al.) — precompute variants with one deletion per position; (b) Levenshtein automata against the trie — higher precision, more memory; (c) phonetic indexing (Soundex/Metaphone) for proper-name-heavy corpora. Pick one based on the corpus, not on all three.
Place moderation at the edge of the suggestion service — filter candidates against the banned list after trie retrieval and before the response leaves the server. Moderation at index time is not enough; terms get added to the list after indexing, and you must still block them on the read path.
Design the update path. The trie is read-heavy; build the new trie offline, swap atomically (pointer flip or rolling update), and keep the old trie around for rollback during the swap window. Avoid mutating the live trie in place.
Capacity plan. Store the trie in memory on the suggestion service (Redis with custom structures or an in-process C/Rust structure are both common). Shard by prefix when memory per instance exceeds comfort; shard routing is then a prefix-hash lookup.
Ingestion must absorb spikes. Accept and log every search query regardless of index state; the sampler runs offline on the log. A spike that overruns the indexer should not drop searches.

Rules: Do

Precompute top-K at each trie node; refresh during index build, not during read.
Sample queries before indexing — the long tail is not content, it is noise plus PII.
Run moderation after retrieval and before response, even if the corpus was pre-filtered. Lists evolve faster than indexes.
Build the new trie offline and swap atomically. In-place mutation of a live read structure is how autocomplete services become unreliable.
Log all search traffic for the sampler; index a subset. Log and index are different systems with different retention rules.

Rules: Don't

Don't index every query. The platform's long tail is mostly unique and mostly unhelpful as suggestions; you will also accidentally index PII typed by users (emails, phone numbers).
Don't run Elasticsearch on the keystroke path without a trie cache in front. ES is fine as a fallback, not as a default for type-ahead.
Don't skip phonetic/fuzzy for human-name corpora. Proper names are where typos are most common and exact-prefix suggestions feel broken.
Don't mutate the trie live. Rebuild and swap, or pay for it later with stale partial writes.
Don't rely on index-time moderation alone. The banned list is updated after incidents; the read-path filter is the enforcement point.

Expected Behavior

Conversations about "why didn't it suggest X" get a specific answer (X was below sampling threshold, or X was moderated out) rather than "the search is weird".

Quality Gates

Keystroke-path p99 latency stays under the declared budget under realistic load.
Moderation filter runs on every response; tested with a list-update scenario that post-dates the index.
Index updates are atomic — no partial-tree reads possible.
Sampling threshold is explicit and tunable; memory usage has a documented ceiling per shard.
Ingestion logs all queries even during indexer downtime.

Companion Integration

Output Artifacts

Architecture diagram with ingestion → log → sampler → trie builder → read service → moderation filter → response.
Data-structure note describing the trie node, top-K storage, and fuzzy strategy chosen.
Sampling-threshold document, dated, with the rationale.
Banned-terms integration point marked on the diagram.
Capacity plan with memory ceiling per shard and rebuild cadence.

Example Constraint Language

Use "must" for: post-retrieval moderation, atomic index swap, a latency budget on the keystroke path.
Use "should" for: a fuzzy strategy matched to the corpus (phonetic for names, deletion-based for short words, Levenshtein automata for precision-heavy cases), a sampling threshold above the noise floor.
Use "may" for: personalization layered on top of the global top-K, per-locale tries when the corpus splits cleanly by language.

Troubleshooting

"Suggestions feel stale": rebuild cadence is too slow. Check the sampler → builder → swap latency end-to-end; the bottleneck is usually the builder step for large corpora.
"P99 blows up under typing bursts": either the trie is evicted from cache (check memory pressure) or the service is saturating on the fuzzy step. Profile; Levenshtein automata on every keystroke is expensive.
"It suggested a banned term": moderation ran only at index time, and the term was added after the last build. Add the filter at response time; both layers should exist.
"Ingestion dies during traffic spikes": the ingestion path is coupled to the indexer. Decouple: log first, sample later. The indexer can lag; logging cannot.

Concrete Example

Sources

[[concepts/design-cases]] — Design Autocomplete case
[[concepts/nfr-system-design]] — latency and accuracy sections
Synthesized from Zhiyong Tan, Acing the System Design Interview (chapter 11, Design Autocomplete).