mariadb-errors-galera-conflicts

Name: Mariadb Errors Galera Conflicts
Author: Impertio-Studio


Updated: May 19, 2026, 23:38


Repository: Impertio-Studio/MariaDB-Claude-Skill-Package


name	mariadb-errors-galera-conflicts
description	Use when a Galera cluster reports certification failures, ER_LOCK_DEADLOCK fires at COMMIT in a multi-master setup, wsrep_local_cert_failures keeps climbing, a hot-row is contended across nodes, or the cluster appears to be in split-brain. Prevents the common mistake of treating Galera ER_LOCK_DEADLOCK as a bug, retrying without backoff, ignoring hot-row design, running a 2-node Galera without an arbitrator, or "fixing" symptoms by disabling wsrep_on. Covers certification-based replication semantics, ER_LOCK_DEADLOCK at COMMIT (Galera-specific path), wsrep_local_cert_failures and wsrep_local_bf_aborts status variables, hot-row detection plus sharding redesign, split-brain prevention with pc.weight and garbd, application retry strategy with exponential backoff, gcache exhaust leading to forced SST. Keywords: Galera, certification failure, wsrep_local_cert_failures, wsrep_local_bf_aborts, ER_LOCK_DEADLOCK at commit, ER_QUERY_INTERRUPTED, hot row, multi-master conflict, split brain, pc.weight, pc.ignore_sb, garbd, arbitrator, retry transaction, my galera keeps deadlocking, Galera-specific deadlock, write-set certification, wsrep_provider_options, gcache exhausted, forced SST, wsrep_cluster_status NON_PRIMARY, why does my commit fail on galera, how do I fix galera certification errors, what is a galera split brain, how do I stop galera deadlocks
license	MIT
compatibility	Designed for Claude Code. Requires MariaDB 10.6-LTS, 10.11-LTS, 11.x, 12.x.
metadata	{"author":"OpenAEC-Foundation","version":"1.0"}

MariaDB Galera Cluster Conflicts and Certification Failures

How to recognise, diagnose, and recover from Galera-specific failure modes : certification deadlocks at COMMIT, hot-row contention across nodes, split-brain and Non-Primary partitions, gcache exhaustion forcing a full SST. Galera is a synchronous multi-master cluster built on certification-based replication ; its failure surface is different from standalone InnoDB and must be reasoned about as a cluster property, not a per-node property.

Quick Reference

Galera ER_LOCK_DEADLOCK (error 1213) returned at COMMIT is NORMAL behavior, NOT a bug. The certification protocol detects write-write conflicts between nodes only at commit-time. ALWAYS retry the entire transaction.
wsrep_local_cert_failures is a monotonic counter of certification failures on THIS node. Growing fast = hot-row contention across nodes. Redesign with sharded keys.
wsrep_local_bf_aborts counts local transactions aborted by an incoming applier (brute-force abort). High value alongside cert-failures confirms multi-master write conflict pressure.
ALWAYS retry with exponential backoff (50 ms base, 2x growth, max 5 attempts, optional jitter) to avoid retry-storms that compound the conflict.
A 2-node Galera cluster is UNSAFE : on a network partition, or when one node fails without a clean shutdown, each survivor holds exactly half of the votes, so no side keeps quorum and the cluster stops serving. ALWAYS run 3 nodes minimum, OR 2 full nodes + 1 garbd arbitrator.
pc.weight (provider option, default 1, dynamic) biases quorum on asymmetric DC topology. Give the nodes in the primary DC a higher weight so that DC keeps quorum after a WAN split, even if it has already lost one of its own nodes.
pc.ignore_sb=ON is DANGEROUS in multi-master : it allows writes on a disconnected partition. Use ONLY on a read-only standby or single-master topology, NEVER as a "quick fix" for split-brain alarms.
wsrep_cluster_status='NON_PRIMARY' means this node lost quorum : it rejects writes and, unless wsrep_dirty_reads=ON, reads as well. Rejoin it to the partition that still holds the Primary Component ; do NOT force pc.bootstrap blindly.
NEVER disable wsrep_on to "make the error go away". Writes made with wsrep_on=OFF are not replicated, so this node's data silently diverges from the rest of the cluster.
Galera replicates ONLY write-sets (modified rows). Long SELECT-only transactions on hot data do NOT cause certification failures on the executing node.

Galera Failure-Mode Identification

Symptom	Where it surfaces	What it means
`ER_LOCK_DEADLOCK` (1213, SQLSTATE 40001) returned at the `COMMIT` line	Application client on any node	Certification conflict : another node committed a write-set that conflicts with this one. Retry full transaction.
`wsrep_local_cert_failures` rising fast	`SHOW STATUS LIKE 'wsrep_local_cert_failures'`	This node is losing the certification race repeatedly. Indicates hot-row contention or cross-node write-storm.
`wsrep_local_bf_aborts` rising fast	`SHOW STATUS LIKE 'wsrep_local_bf_aborts'`	Local in-flight transactions are being killed by remote applier threads (brute-force abort). Same root cause as cert-failures from the other direction.
Application sees `ER_QUERY_INTERRUPTED`	Application client	Mid-flight transaction killed by an incoming higher-priority write-set. Treat identically to 1213 : retry.
`wsrep_cluster_status='NON_PRIMARY'` on a subset of nodes	`SHOW STATUS LIKE 'wsrep_cluster_status'`	Quorum lost on those nodes. They reject writes, and reads too unless wsrep_dirty_reads=ON. Identify and recover the minority partition.
All nodes reachable but cluster-size shrunk	`SHOW STATUS LIKE 'wsrep_cluster_size'` returns fewer than the expected number of nodes	One or more nodes have left the primary component. Check their `wsrep_local_state_comment` for cause.
Joining node never finishes IST, falls back to full SST	Joiner log ; joiner `wsrep_local_state_comment` stays `Joining` while the donor shows `Donor/Desynced`	The donor's `gcache` no longer holds the write-sets the joiner needs, so IST is impossible and a full SST is forced.

The most common confusion : a transaction that touched no obviously-contended row on the LOCAL node still gets ER_LOCK_DEADLOCK at COMMIT. That is the certification protocol working as designed. The conflict was detected against a write-set that ANOTHER node committed during this transaction's lifetime.

Decision Tree

Galera cluster reporting errors
|
+-- ER_LOCK_DEADLOCK / ER_QUERY_INTERRUPTED at COMMIT ?
|     |
|     +-- Is wsrep_on=ON on this node ? (confirm with SHOW VARIABLES)
|     |     NO  : this is standalone InnoDB. See mariadb-errors-deadlocks instead.
|     |     YES : continue.
|     |
|     +-- Has wsrep_local_cert_failures jumped > 10x baseline ?
|     |     YES : hot-row contention. Identify the hot key (see methods.md).
|     |           Redesign : shard the row or use a sequence.
|     |     NO  : transient cross-node write race. Retry transaction with backoff.
|     |
|     +-- Has the same logical transaction exhausted its 5-attempt retry budget ?
|           YES : design problem. Stop retrying, fix the root cause (hot-row, lock-order).
|           NO  : exponential backoff retry, max 5 attempts.
|
+-- wsrep_cluster_status='NON_PRIMARY' ?
|     |
|     +-- How many nodes total in the deployment ?
|     |     2 : you have no quorum protection. Add garbd or a 3rd node BEFORE recovery.
|     |     3+ : a minority partition lost quorum. Find the majority, validate its data.
|     |
|     +-- Is this an asymmetric DC layout (2 DC-A, 1 DC-B) ?
|     |     YES : verify pc.weight is biased to the larger DC. See examples.md.
|     |     NO  : continue to recovery.
|     |
|     +-- Recover by joining the minority node BACK to the primary component.
|           Do NOT force pc.bootstrap unless ALL nodes are NON_PRIMARY and you confirmed
|           which node has the latest committed data (via wsrep_last_committed).
|
+-- Joiner stuck in Joining while the donor shows Donor/Desynced ?
      |
      +-- Check joiner log for IST failure : gcache exhausted on donor.
      +-- Increase gcache.size on a healthy node, or restart the joiner with SST forced.
      +-- See anti-patterns.md : gcache too small under write-storm.
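The same tree can be encoded as a small triage function, for example to drive a runbook bot. This is a sketch only: the `Observation` fields are hypothetical names for facts an operator has already gathered with the SHOW STATUS / SHOW VARIABLES queries in this skill, and the 10x threshold is the one the tree uses.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical snapshot of what an operator has already checked."""
    wsrep_on: bool
    commit_conflict: bool = False        # 1213 / 1317 seen at COMMIT
    cert_rate_vs_baseline: float = 1.0   # current cert-failure rate / normal rate
    retries_exhausted: bool = False      # same transaction used its whole retry budget
    cluster_status: str = "PRIMARY"      # wsrep_cluster_status on this node
    total_nodes: int = 3                 # data nodes + garbd in the deployment
    asymmetric_dc: bool = False          # e.g. 2 nodes in DC-A, 1 in DC-B
    joiner_stuck: bool = False           # joiner never reaches Synced


def triage(o: Observation) -> str:
    """Walk the decision tree top to bottom and return the first matching action."""
    if o.commit_conflict:
        if not o.wsrep_on:
            return "standalone InnoDB deadlock: see mariadb-errors-deadlocks"
        if o.cert_rate_vs_baseline > 10:
            return "hot-row contention: find the hot key, shard it or use a sequence"
        if o.retries_exhausted:
            return "design problem: stop retrying, fix hot row or lock order"
        return "transient cross-node race: retry the whole transaction with backoff"
    if o.cluster_status == "NON_PRIMARY":
        if o.total_nodes <= 2:
            return "no quorum protection: add garbd or a third node before recovery"
        if o.asymmetric_dc:
            return "check pc.weight favours the larger DC, then rejoin the minority"
        return "rejoin the minority to the primary component; bootstrap only if ALL nodes are NON_PRIMARY"
    if o.joiner_stuck:
        return "gcache exhausted on donor: enlarge gcache.size or force a full SST"
    return "no Galera-specific failure pattern matched"
```

Keeping the branches in the same order as the tree means the function and the runbook cannot silently disagree about priorities.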

Certification-Based Replication Mental Model

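A Galera transaction is local-only until COMMIT. At COMMIT, the modified rows are bundled as a write-set and broadcast to all nodes in a single total order, so every node receives every write-set in the same sequence. Each node then certifies the write-set against the write-sets ordered ahead of it that the originating node had not yet seen when it replicated this one. If a conflict is found (the same primary key modified concurrently), certification fails and the transaction is aborted on the originating node with ER_LOCK_DEADLOCK. Because the order and the test are identical on every node, all nodes reach the same verdict without exchanging votes.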

Three consequences :

The error always lands at COMMIT, NEVER mid-transaction. Standalone InnoDB deadlocks land on the offending statement ; Galera certification deadlocks land on commit. This changes where the retry-handler must sit in the application code.
The originating node sees the failure even though no local lock was held by another local transaction. The conflicting transaction can be on a different node entirely.
Long-running READ-ONLY transactions do NOT cause certification failures. Only WRITE-sets are certified. A reporting transaction can run for hours without triggering this path (it has other risks ; see mariadb-impl-galera-cluster).
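To make the first-committer-wins rule concrete, here is a toy model of certification. It is an illustration under simplifying assumptions, not Galera's implementation: one certification key per modified primary-key value, and an unbounded history (real Galera uses richer key types and purges old entries). The class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    node: str
    last_seen_seqno: int  # last global seqno the origin node had committed when it replicated this write-set
    keys: frozenset       # simplified: one key per modified primary-key value


@dataclass
class Certifier:
    """Every node runs this same deterministic check in the same total order."""
    seqno: int = 0
    history: list = field(default_factory=list)  # (seqno, keys) of certified write-sets

    def certify(self, ws: WriteSet) -> bool:
        # Conflict: a write-set ordered after ws's snapshot touched an overlapping key.
        for seqno, keys in self.history:
            if seqno > ws.last_seen_seqno and keys & ws.keys:
                return False  # origin node reports ER_LOCK_DEADLOCK at COMMIT
        self.seqno += 1
        self.history.append((self.seqno, ws.keys))
        return True


c = Certifier()
first = WriteSet("node1", 0, frozenset({"sku-42"}))
second = WriteSet("node2", 0, frozenset({"sku-42"}))  # concurrent, same row
print(c.certify(first), c.certify(second))
```

node1's write-set is ordered first and passes; node2's fails because a write-set it had not seen modified the same key. If node2 retries after applying seqno 1, its new write-set carries `last_seen_seqno=1` and passes. A read-only transaction never produces a write-set, so it never reaches this check.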

Key Status Variables

Variable	Type	Meaning
`wsrep_local_cert_failures`	monotonic counter	Local transactions that failed certification on this node and were rolled back
`wsrep_local_bf_aborts`	monotonic counter	Local transactions aborted by an incoming applier (brute-force abort)
`wsrep_cluster_status`	gauge	`PRIMARY` (quorum present), `NON_PRIMARY` (quorum lost), `DISCONNECTED`
`wsrep_local_state`	gauge	Internal FSM state number
`wsrep_local_state_comment`	gauge	Human-readable state (e.g. `Synced`, `Donor/Desynced`, `Joining`, `Joined`)
`wsrep_cluster_size`	gauge	Number of nodes currently in the primary component
`wsrep_last_committed`	gauge	Sequence number of the last committed write-set on this node
`wsrep_provider_options`	system variable	Active provider-option string (includes `pc.weight`, `pc.ignore_sb`, `gcache.size`, etc.)

ALWAYS alert on RATE OF CHANGE of wsrep_local_cert_failures and wsrep_local_bf_aborts, NEVER on the absolute value. A non-zero counter is normal on any healthy multi-master cluster.

-- 10.6+ : Galera health snapshot
SHOW STATUS LIKE 'wsrep_local_cert_failures';
SHOW STATUS LIKE 'wsrep_local_bf_aborts';
SHOW STATUS LIKE 'wsrep_cluster_status';
SHOW STATUS LIKE 'wsrep_cluster_size';
SHOW STATUS LIKE 'wsrep_local_state_comment';
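The rate-of-change rule can be applied by sampling the two counters with SHOW STATUS at a fixed interval and comparing per-second deltas. A minimal sketch, assuming a baseline rate you have measured on your own cluster; the 10x factor mirrors the decision tree, while `floor_per_s` (so a zero baseline does not alert on a single failure) and all names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    t: float            # seconds, e.g. time.monotonic() when SHOW STATUS ran
    cert_failures: int  # wsrep_local_cert_failures
    bf_aborts: int      # wsrep_local_bf_aborts


def rates(prev: Sample, cur: Sample) -> tuple:
    """Per-second increase of both counters between two samples."""
    dt = cur.t - prev.t
    if dt <= 0:
        raise ValueError("samples must be in increasing time order")
    if cur.cert_failures < prev.cert_failures or cur.bf_aborts < prev.bf_aborts:
        # Monotonic counters only go down after a server restart: discard this interval.
        raise ValueError("counter reset detected")
    return ((cur.cert_failures - prev.cert_failures) / dt,
            (cur.bf_aborts - prev.bf_aborts) / dt)


def should_alert(prev: Sample, cur: Sample, baseline_per_s: float,
                 factor: float = 10.0, floor_per_s: float = 0.1) -> bool:
    """Alert on the cert-failure RATE exceeding factor x baseline, never on the raw value."""
    cert_rate, _bf_rate = rates(prev, cur)
    return cert_rate > max(factor * baseline_per_s, floor_per_s)
```

With counters already in the thousands, six new certification failures in 60 s (0.1/s) stay quiet against a 0.1/s baseline, while 120 in 60 s (2/s) fires.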

See references/methods.md for the complete wsrep status-variable reference and a Galera error-code matrix.

Application Retry Strategy

Retry the ENTIRE transaction, not the failing statement. Use exponential backoff with random jitter to avoid synchronized retries (retry-storm). Log every retry. Treat the retry budget as a circuit-breaker : exhausting it means a design problem, not a transient failure.

# Python 3.10+, mariadb 1.1+, MariaDB 10.6+ Galera cluster
import logging
import random
import time

import mariadb

MAX_RETRIES = 5
BASE_BACKOFF_SECONDS = 0.05  # 50 ms
MAX_JITTER_SECONDS = 0.025   # random jitter de-synchronises competing retries

# 1213 ER_LOCK_DEADLOCK (certification failure / BF abort), 1317 ER_QUERY_INTERRUPTED
GALERA_CERT_ERRORS = {1213, 1317}

def run_with_galera_retry(conn, work_fn):
    """Run work_fn(cursor) in one transaction, retrying the WHOLE transaction."""
    for attempt in range(1, MAX_RETRIES + 1):
        cur = conn.cursor()
        try:
            cur.execute("START TRANSACTION")
            work_fn(cur)
            conn.commit()  # Galera certification conflicts usually surface here
            return
        except mariadb.Error as e:
            conn.rollback()  # discard partial work before retrying or re-raising
            # Connector/Python exposes the server error code as e.errno (e.args[0] is the message)
            if getattr(e, "errno", None) not in GALERA_CERT_ERRORS:
                raise
            if attempt == MAX_RETRIES:
                logging.error("Galera retries exhausted ; design review needed")
                raise
            backoff = BASE_BACKOFF_SECONDS * (2 ** (attempt - 1)) + random.uniform(0, MAX_JITTER_SECONDS)
            logging.warning("Galera conflict on attempt %d, backoff %.3fs", attempt, backoff)
            time.sleep(backoff)
        finally:
            cur.close()

Retry budget is 3 to 5 attempts. If a transaction needs more than that under normal load, the workload has a hot-row, missing index, or bad lock-order. Retries mask the problem ; they do not fix it.
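It helps to know what the budget costs in wall-clock time. With a 50 ms base, 2x growth and 5 attempts, `run_with_galera_retry` sleeps only before attempts 2 to 5, so a transaction that exhausts its budget spends 0.05 + 0.1 + 0.2 + 0.4 = 0.75 s sleeping, plus at most 4 x 25 ms of jitter. The helper below is an illustrative sketch (its name and parameters are not from the original) that computes the schedule so it can be checked against a request timeout.

```python
import random


def backoff_schedule(max_attempts=5, base_s=0.05, growth=2.0, jitter_s=0.0, rng=None):
    """Sleeps before attempts 2..max_attempts; nothing is slept after the final failure."""
    rng = rng or random.Random()
    return [base_s * growth ** (attempt - 1) + rng.uniform(0, jitter_s)
            for attempt in range(1, max_attempts)]


sched = backoff_schedule()
print([round(s, 3) for s in sched], round(sum(sched), 3))
```

If the worst-case total approaches the caller's timeout, lower the attempt count rather than the base delay: short, synchronised retries are what turn a conflict into a retry-storm.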

See references/examples.md for retry implementations in PHP, Node.js, and Java.

Hot-Row Redesign

A single primary key contended by writers on multiple nodes will deadlock under any sustained write load. Galera cannot make this go away ; the certification protocol detects the conflict and one node always loses. Common hot-row patterns :

Global counter table : UPDATE counter SET v = v + 1 on the only row.
Single-row inventory : UPDATE stock SET qty = qty - 1 WHERE sku = ? on a popular SKU.
Single-row rate-limit bucket : UPDATE rl SET count = count + 1 WHERE `key` = ?.
Sequence-emulated-as-row : a single-row current_value table.

The fix is design-level, not tuning-level. Shard the hot key across N rows, aggregate at read time.

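-- 10.6+ : sharded counter, N=16 shards, one row per shard
CREATE TABLE counter_sharded (
  shard TINYINT UNSIGNED NOT NULL,
  v BIGINT NOT NULL DEFAULT 0,
  PRIMARY KEY (shard)
) ENGINE=InnoDB;  -- Galera replication requires InnoDB

-- Seed shards 0..15 (seq_0_to_15 comes from the Sequence storage engine)
INSERT INTO counter_sharded (shard, v)
SELECT seq, 0 FROM seq_0_to_15;

-- Increment : pick ONE random shard. RAND() inside WHERE is re-evaluated for
-- every row and can update zero or several shards, so evaluate it once first.
SET @shard = FLOOR(RAND() * 16);
UPDATE counter_sharded SET v = v + 1 WHERE shard = @shard;

-- Read : sum across shards
SELECT SUM(v) AS total FROM counter_sharded;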
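Choosing the shard in the application, and binding it as a parameter, keeps every increment a single-row primary-key UPDATE. With uniform random shards, two concurrent writers still collide with probability 1/16. The per-node ownership option below is an assumption added here, not part of the original pattern: each node increments only its own slice of shards, so writers connected to different nodes never touch the same row. `pick_shard`, `INCREMENT_SQL` and `node_index` are hypothetical names.

```python
import random
from typing import Optional

N_SHARDS = 16  # must match the rows seeded into counter_sharded
INCREMENT_SQL = "UPDATE counter_sharded SET v = v + 1 WHERE shard = ?"


def pick_shard(node_index: Optional[int] = None, n_nodes: int = 3,
               n_shards: int = N_SHARDS, rng=random) -> int:
    """Return the shard to increment.

    node_index=None: uniform random shard over all shards.
    node_index=k: a random shard owned by node k (the node this connection writes to).
    """
    if node_index is None:
        return rng.randrange(n_shards)
    per_node = n_shards // n_nodes
    if per_node == 0 or not 0 <= node_index < n_nodes:
        raise ValueError("need 0 <= node_index < n_nodes <= n_shards")
    return node_index * per_node + rng.randrange(per_node)
```

With MariaDB Connector/Python (qmark placeholders) the call would be `cur.execute(INCREMENT_SQL, (pick_shard(),))`. With 16 shards and 3 nodes, each node owns 5 shards; shard 15 is never incremented but is still included in `SUM(v)`, so totals stay correct.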

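For ID generation, do not emulate a sequence with a single-row UPDATE counter. Galera already interleaves AUTO_INCREMENT values per node (wsrep_auto_increment_control, ON by default), and MariaDB 10.3+ offers a native SEQUENCE object. Sequences are gap-tolerant by design, but on a Galera cluster check the sequence restrictions for your version : recent releases reject a cached sequence unless it uses INCREMENT BY 0, which takes its step and offset from auto_increment_increment / auto_increment_offset, the per-node values Galera manages.

-- 10.6+ : SEQUENCE instead of a single-row counter (INCREMENT BY 0 for Galera)
CREATE SEQUENCE order_id_seq START WITH 1 INCREMENT BY 0;
INSERT INTO orders (id, ...) VALUES (NEXTVAL(order_id_seq), ...);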

See references/examples.md for additional hot-row redesigns (stock, rate-limit, queue table).

Split-Brain Prevention

A split-brain occurs when a network partition divides the cluster into groups that each believe the other is dead and both keep accepting writes. Galera prevents this with the Primary Component (PC) algorithm : only the partition holding strictly more than half of the previous primary component's votes (node count, weighted by pc.weight) keeps accepting writes ; the minority becomes NON_PRIMARY and stops serving queries. A 2-node cluster is the weak case : any partition splits the votes exactly in half, so neither side keeps quorum and the whole cluster stops serving.

Three deployment patterns prevent split-brain :

3 full nodes (preferred) : any single partition leaves a 2-node majority.
2 full nodes + 1 garbd arbitrator : garbd is a quorum-only daemon that participates in voting but stores no data. Place garbd on a third host so that a 2-host cluster has 3 quorum participants.
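Asymmetric DC with pc.weight : 2 nodes in DC-A with pc.weight=2, 1 node in DC-B with pc.weight=1. On a WAN split, DC-A keeps writes ; DC-B goes NON_PRIMARY. Plain node count already favours DC-A in a 2-vs-1 split ; the weights keep DC-A primary even after it has lost one of its own nodes (weight 2 against 1).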
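The Galera documentation states the weighted-quorum rule as: a component stays Primary only if its total weight is strictly greater than half the weight of the previous Primary Component, after subtracting nodes that left gracefully. The function below is a simplified model of one such transition (names are hypothetical), useful for checking a topology on paper before deploying it.

```python
from typing import Iterable, Mapping


def has_quorum(prev_primary: Mapping[str, int], partition: Iterable[str],
               left_gracefully: Iterable[str] = ()) -> bool:
    """Weighted quorum check for one side of a partition.

    prev_primary: node -> pc.weight for every member of the last Primary Component.
    partition: members that can still see each other.
    left_gracefully: members that shut down cleanly and no longer count.
    """
    total = sum(prev_primary.values()) - sum(prev_primary[n] for n in set(left_gracefully))
    mine = sum(prev_primary[n] for n in set(partition))
    return 2 * mine > total  # strictly more than half: an exact 50/50 split loses


two_nodes = {"n1": 1, "n2": 1}
print(has_quorum(two_nodes, ["n1"]), has_quorum(two_nodes, ["n2"]))
```

Both halves of a 2-node cluster lose quorum, whereas a clean shutdown of one node keeps the other Primary because the leaver is subtracted. With `{"a1": 2, "a2": 2, "b1": 1}`, DC-A wins the WAN split; after a1 fails, the new component `{"a2": 2, "b1": 1}` still gives a2 quorum on a later split (2 > 1.5), which default weights of 1 would not (1 is not > 1).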

-- 10.6+ : set pc.weight at runtime on a node (repeat it in my.cnf so it survives a restart)
SET GLOBAL wsrep_provider_options = 'pc.weight=2';

# Start garbd on a third host (no MariaDB server needed, only the Galera arbitrator package)
# --group must match wsrep_cluster_name on the data nodes
garbd --group prod_cluster \
      --address gcomm://node1.example.com,node2.example.com \
      --log /var/log/garbd.log

See references/methods.md for the full pc.* and gcs.* provider-option reference, and references/examples.md for asymmetric-DC weighted-quorum deployments.

Recovering a Non-Primary Partition

If wsrep_cluster_status='NON_PRIMARY' on every node (rare ; usually the result of a multi-DC outage), one node must bootstrap a new primary component. ALWAYS pick the node with the highest wsrep_last_committed value (the latest committed data).

-- 10.6+ : query the seqno on each NON_PRIMARY node BEFORE bootstrapping
SHOW STATUS LIKE 'wsrep_last_committed';

-- 10.6+ : on the node with the highest seqno only
SET GLOBAL wsrep_provider_options = 'pc.bootstrap=true';

The other nodes then rejoin via IST (if their gcache is still in range) or SST (full state transfer otherwise). Forcing pc.bootstrap on the wrong node loses committed data and is unrecoverable without a backup.
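Choosing the bootstrap node can be scripted from the values collected on each node. A minimal sketch: the highest-seqno rule is the one above; the extra check that all nodes report the same state UUID (the wsrep_local_state_uuid status variable) is an added safeguard, since seqnos are only comparable within one cluster history. The function name and input shape are hypothetical.

```python
from typing import Mapping, Tuple


def pick_bootstrap_node(states: Mapping[str, Tuple[str, int]]) -> str:
    """states: node -> (wsrep_local_state_uuid, wsrep_last_committed) from each NON_PRIMARY node."""
    if not states:
        raise ValueError("no node reported its state")
    uuids = {uuid for uuid, _ in states.values()}
    if len(uuids) != 1:
        raise ValueError(f"nodes disagree on state UUID {sorted(uuids)}; seqnos are not comparable")
    if any(seqno < 0 for _, seqno in states.values()):
        raise ValueError("a node reports an unknown seqno (-1); recover its position first")
    best = max(seqno for _, seqno in states.values())
    # Any node holding the highest seqno is safe; sort for a deterministic answer.
    return sorted(node for node, (_, seqno) in states.items() if seqno == best)[0]
```

Refusing to answer when the inputs are inconsistent is deliberate: a wrong bootstrap choice is the one step in this recovery that cannot be undone without a backup.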

See references/anti-patterns.md for the "force pc.bootstrap blindly" anti-pattern.

gcache Exhaustion and Forced SST

When a node falls behind and tries to rejoin via IST (incremental state transfer), the donor node must still hold the missing write-sets in its gcache ring buffer. Default gcache.size=128M. On a write-heavy cluster, 128M holds only seconds of write-sets ; a node that was down for minutes will need a full SST instead.

Symptoms of forced SST : the joiner never reaches Synced and its wsrep_local_state_comment stays Joining while the donor shows Donor/Desynced ; under the rsync method the donor is also blocked for writes for the whole transfer. The fix : size gcache.size for the expected outage window. For a 10 MB/s sustained write rate and a 1-hour outage budget : 10 MB/s × 3600 s = 36 GB of gcache.

-- 10.6+ : check active provider options including gcache.size
SHOW VARIABLES LIKE 'wsrep_provider_options';

gcache.size is NOT dynamic ; it cannot be resized on a running node with SET GLOBAL wsrep_provider_options. Set it in my.cnf (for example wsrep_provider_options="gcache.size=36G", semicolon-separated with any other provider options) and restart the node.
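The write rate that gcache must absorb can be measured rather than guessed. A minimal sketch, assuming you sample wsrep_replicated_bytes + wsrep_received_bytes (write-set bytes this node sent and received) twice during a busy period; the `headroom` multiplier is an illustrative assumption, not a documented default.

```python
def gcache_size_bytes(bytes_start: int, bytes_end: int, interval_s: float,
                      outage_window_s: float, headroom: float = 1.0) -> int:
    """Estimate gcache.size from two samples of
    wsrep_replicated_bytes + wsrep_received_bytes taken interval_s apart."""
    if interval_s <= 0 or bytes_end < bytes_start:
        raise ValueError("need a positive interval and non-decreasing counters")
    write_rate = (bytes_end - bytes_start) / interval_s  # write-set bytes per second
    return int(write_rate * outage_window_s * headroom)


# 600 MB of write-sets in 60 s (10 MB/s) and a 1-hour outage budget
print(gcache_size_bytes(0, 600_000_000, 60, 3600))
```

This reproduces the 36 GB figure above. Sample at peak load, not an average day, since the outage you are sizing for may happen during the peak.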

SST Method Selection

Method	Status	Donor blocked	Use when
`mariabackup`	Recommended default	NO (non-blocking, InnoDB-aware)	Always, for production
`rsync`	Legacy (still the server default)	YES (donor read-locks during transfer)	Only for legacy setups ; avoid
`mysqldump`	Deprecated for SST	YES (logical dump, very slow)	Never on production

ALWAYS set wsrep_sst_method=mariabackup explicitly in my.cnf : the server default is rsync, which blocks the donor for the whole transfer. The rsync and mysqldump methods are kept for backward compatibility, not for new deployments.

See references/methods.md for the SST method comparison and mariadb-impl-galera-cluster for full setup.

What This Skill Does NOT Cover

Initial Galera cluster setup and my.cnf configuration : see mariadb-impl-galera-cluster.
Standalone InnoDB deadlocks (not Galera) : see mariadb-errors-deadlocks.
Asynchronous replication lag (non-Galera) : see mariadb-errors-replication-lag.
Backup and restore including mariabackup standalone use : see mariadb-impl-backup-restore.
General performance tuning (buffer pool, IO, query cache) : see mariadb-impl-performance-tuning.

Reference Links

references/methods.md : Galera error-code matrix, complete wsrep_* status variable reference, certification flow diagram, hot-row redesign patterns, pc.* / gcs.* / evs.* provider-option reference.
references/examples.md : 10+ working examples (reproduce certification failure, monitor wsrep_local_cert_failures, hot-row sharding redesign, application retry loop, pc.weight asymmetric DC, garbd setup, gcache sizing, SST recovery, NON_PRIMARY bootstrap).
references/anti-patterns.md : 8+ real anti-patterns (treating commit-deadlock as bug, retry without backoff, hot-row design, 2-node without garbd, pc.ignore_sb in multi-master, disabling wsrep_on, gcache too small, force pc.bootstrap blindly).

Source Verification

All facts in this skill were verified via WebFetch against MariaDB official documentation :

mariadb.com/kb/en/galera-cluster-status-variables/ : wsrep_local_cert_failures (monotonic counter, total local transactions that failed certification), wsrep_local_bf_aborts (monotonic counter, local transactions aborted by replication applier threads), wsrep_cluster_status (PRIMARY / NON_PRIMARY / DISCONNECTED), wsrep_local_state, wsrep_local_state_comment.
mariadb.com/kb/en/galera-cluster-system-variables/ : wsrep_on (default OFF, must be ON to join cluster), wsrep_provider (path to libgalera_smm.so), wsrep_cluster_address (gcomm://node1,node2,node3), wsrep_retry_autocommit (default 1, range 0-10000), wsrep_max_ws_size (default 2 GB), wsrep_sync_wait (session, bitmask 0-15), wsrep_certify_nonPK (default ON).
mariadb.com/docs/galera-cluster/reference/wsrep-variable-details/wsrep_provider_options : pc.weight (default 1, dynamic, integer), pc.ignore_sb (default false, dynamic, dangerous in multi-master), pc.recovery (default true, not dynamic), gcache.size (default 128M, not dynamic), evs.suspect_timeout (default PT5S).
mariadb.com/docs/server/reference/error-codes/mariadb-error-codes-1200-to-1299 : 1213 ER_LOCK_DEADLOCK "Deadlock found when trying to get lock; try restarting transaction", 1205 ER_LOCK_WAIT_TIMEOUT "Lock wait timeout exceeded; try restarting transaction".
Vooronderzoek (preliminary research) Cluster-3 §2 : certification-based replication semantics, wsrep_local_cert_failures as the canonical indicator, pc.weight for asymmetric DC quorum, 2-node Galera plus garbd as the minimum-viable HA shape.
LESSONS L-003 : Galera upstream galeracluster.com is bot-blocked ; MariaDB KB pages are the canonical source.
