running-tests
// running tests at various levels from smoke tests to full suite to randomized tests
| name | running-tests |
| description | running tests at various levels from smoke tests to full suite to randomized tests |
This skill is for running tests systematically, starting with fast/focused tests and progressing to slower/broader tests. This ordering allows failures to be caught early, minimizing wasted time.
This skill is designed to be run as a subagent to avoid cluttering the invoking agent's context. The output is either confirmation that all tests passed, or a report of failures.
Since subagents cannot ask for clarification, the invoking agent must gather this information before launching:
Changed files/modules: Which files or modules were changed, so the subagent can identify appropriate smoke tests and focused tests.
Test levels to run: Which levels to execute. Options:
The subagent prompt should include: "Run tests for changes in <files/modules>."
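As a rough sketch of how the invoking agent might assemble that prompt, the helper below joins a list of changed paths into the sentence quoted above. The function name `build_subagent_prompt` is hypothetical and not part of stellar-core.

```shell
# Hypothetical helper: builds the subagent prompt from the changed paths.
build_subagent_prompt() {
    if [ "$#" -eq 0 ]; then
        echo "error: no changed files/modules given" >&2
        return 1
    fi
    local joined
    joined="$(printf '%s, ' "$@")"   # "a, b, c, "
    joined="${joined%, }"            # strip the trailing ", "
    printf 'Run tests for changes in %s.\n' "$joined"
}
```

Calling it with no arguments fails on purpose, since the subagent cannot ask which files changed.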
To reduce noise and keep agent context manageable, always use these flags:
# Recommended flags for quiet output
--ll fatal # Only log fatal errors (not info/debug messages)
-r simple # Use simple reporter (minimal output)
--disable-dots # Don't print progress dots
--abort # Stop on first failure (don't run remaining tests)
Example:
./stellar-core test --ll fatal -r simple --disable-dots --abort "test name"
If you need more detail to diagnose a failing test, raise the log level from
fatal to info, debug, or trace, for example with --ll debug or --ll trace.
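One way to avoid retyping the quiet flags is a small wrapper function. The sketch below is only an illustration: the function name `run_quiet_test` and the environment variables `STELLAR_CORE` (path to the binary) and `TEST_LL` (log level override) are assumptions of this example, not something stellar-core defines.

```shell
# Hypothetical wrapper around the quiet-output flags listed above.
# STELLAR_CORE and TEST_LL are made-up variables for this sketch.
run_quiet_test() {
    local bin="${STELLAR_CORE:-./stellar-core}"
    local level="${TEST_LL:-fatal}"   # raise to debug or trace when diagnosing
    # Any extra arguments (test name, tag filter) are passed through unchanged.
    "$bin" test --ll "$level" -r simple --disable-dots --abort "$@"
}
```

For example, `run_quiet_test "test name"` runs one test quietly, and `TEST_LL=debug run_quiet_test "test name"` reruns it with more logging.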
Many tests are protocol-specific and can behave differently across protocol versions. Use these flags to control which protocol versions are tested:
--version <N> # Run tests for a specific protocol version
--all-versions # Run tests for all supported protocol versions
For focused testing during development, test with the current protocol version,
which is the default. The full test suite should eventually be run with
--all-versions.
Tests use a deterministic PRNG. By default, the seed varies, but you can set a specific seed for reproducibility:
--rng-seed <N> # Use a specific RNG seed for reproducibility
This is useful for reproducing failures or for baseline checks that require consistent output.
Tests are run in order of increasing cost. Stop at the first failure.
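The stop-at-first-failure ordering can be sketched as a small driver loop. This is an illustration only: the function name `run_levels` and the `label::command` argument format are conventions invented for this example.

```shell
# Hypothetical driver: runs each level in order and stops at the first failure.
# Each argument has the form "label::command" (a convention of this sketch).
run_levels() {
    local entry label cmd
    for entry in "$@"; do
        label="${entry%%::*}"   # text before the first "::"
        cmd="${entry#*::}"      # text after the first "::"
        echo "== $label"
        if ! sh -c "$cmd"; then
            echo "FAILED at: $label"
            return 1
        fi
    done
    echo "ALL PASSED"
}
```

Cheaper levels go first in the argument list, so an expensive level is never started once a cheaper one has failed.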
Run 2-3 specific tests that are most likely to catch breakage in the changed code. These should complete in seconds.
To identify smoke tests:
# Run a specific test by name (use quotes for exact match)
./stellar-core test --ll fatal -r simple --abort "exact test name"
Run all tests in the test file(s) related to the change. This typically takes a few minutes.
# Run tests matching a tag pattern
./stellar-core test --ll fatal -r simple --abort "[ModuleName*]"
# Run tests from a specific area
./stellar-core test --ll fatal -r simple --abort "[ledgertxn]"
# Combine tags (AND logic - must match all)
./stellar-core test --ll fatal -r simple --abort "[tx][soroban]"
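Because adjacent tags combine with AND logic, a filter can be built by concatenating bracketed tag names. A minimal sketch, with `tag_filter` as a hypothetical helper name:

```shell
# Hypothetical helper: turns tag names into a combined "[a][b]" filter.
tag_filter() {
    local t out=""
    for t in "$@"; do
        out="${out}[${t}]"
    done
    echo "$out"
}
```

The result should be quoted when passed on, for example `./stellar-core test --ll fatal -r simple --abort "$(tag_filter tx soroban)"`, so the shell does not treat the brackets as a glob pattern.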
Ledger/Transaction tests:
- "[ledgertxn]" - LedgerTxn operations
- "[tx][payment]" - Payment transaction tests
- "[tx][createaccount]" - CreateAccount tests
- "[tx][offers]" - Offer/DEX tests
- "[tx][soroban]" - Soroban (smart contract) transaction tests

Bucket/BucketList tests:
- "[bucket]" - General bucket tests
- "[bucketlist]" - BucketList specific tests
- "[bucketmergemap]" - Bucket merge map tests

Herder tests:
- "[herder]" - General herder tests
- "[txset]" - Transaction set tests
- "[transactionqueue]" - Transaction queue tests
- "[quorumintersection]" - Quorum intersection tests
- "[upgrades]" - Protocol upgrade tests

Overlay/Network tests:
- "[overlay]" - Overlay network tests
- "[flood]" - Transaction flooding tests
- "[PeerManager]" - Peer management tests

Crypto/Utility tests:
- "[crypto]" - Cryptography tests
- "[decoder]" - Base32/64 encoding tests
- "[timer]" - VirtualClock timer tests
- "[cache]" - Cache implementation tests

Soroban-specific tests:
- "[soroban]" - All Soroban tests
- "[soroban][archival]" - State archival tests
- "[soroban][upgrades]" - Soroban upgrade tests

Run the complete unit test suite. This may take 10-30 minutes.
make check
Or directly with quiet output:
./stellar-core test --ll fatal -r simple --disable-dots --abort
For faster execution, use parallel partitions via make check:
# Run with partitions equal to CPU cores
NUM_PARTITIONS=$(nproc) make check
The full test suite should be run with all protocol versions:
ALL_VERSIONS=1 NUM_PARTITIONS=$(nproc) make check
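The commands above rely on `nproc`, which is part of GNU coreutils and may be missing on some systems. As a hedged sketch, a fallback for choosing the partition count could look like this; `cpu_count` is a hypothetical helper name.

```shell
# Hypothetical helper: prints the number of online CPUs, falling back to 1.
cpu_count() {
    local n=""
    if command -v nproc >/dev/null 2>&1; then
        n="$(nproc)"
    elif command -v getconf >/dev/null 2>&1; then
        n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)"
    fi
    # Use 1 if detection failed or returned something non-numeric.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    echo "$n"
}
```

It would then be used as `NUM_PARTITIONS="$(cpu_count)" make check`.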
To test with SQLite only (faster, no Postgres dependency):
./configure --disable-postgres --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
NUM_PARTITIONS=$(nproc) make check
This validates that transaction test execution produces the same metadata hashes as fixed baselines stored in the repository. This catches unintended changes to transaction semantics.
Important: Always use --rng-seed 12345 for baseline checks to ensure
deterministic results.
# Check transaction tests against current protocol baseline
./stellar-core test "[tx]" --all-versions --rng-seed 12345 --ll fatal \
--abort -r simple --check-test-tx-meta test-tx-meta-baseline-current
For next-protocol testing (when preparing protocol upgrades):
./stellar-core test "[tx]" --all-versions --rng-seed 12345 --ll fatal \
--abort -r simple --check-test-tx-meta test-tx-meta-baseline-next
If an intentional change alters transaction metadata, the check fails and indicates which baselines differ; those baselines then need updating.
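Since the two baseline commands differ only in the baseline directory, they can be folded into one function. The sketch below assumes a hypothetical function name `check_tx_meta_baseline` and a hypothetical `STELLAR_CORE` variable for the binary path; the flags themselves are copied from the commands above.

```shell
# Hypothetical wrapper for the tx-meta baseline check ("current" or "next").
check_tx_meta_baseline() {
    local bin="${STELLAR_CORE:-./stellar-core}"
    local which="${1:-current}"
    case "$which" in
        current|next) ;;
        *) echo "usage: check_tx_meta_baseline [current|next]" >&2; return 2 ;;
    esac
    # The seed is fixed at 12345 so the metadata hashes are reproducible.
    "$bin" test "[tx]" --all-versions --rng-seed 12345 --ll fatal \
        --abort -r simple --check-test-tx-meta "test-tx-meta-baseline-$which"
}
```

Rejecting any argument other than `current` or `next` keeps a typo from silently checking against a nonexistent baseline directory.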
When to run: Only needed for changes touching memory management, pointers, concurrency, or threading code. Skip for simple logic changes, config changes, or test-only changes.
Run tests with sanitizers enabled to catch memory errors and undefined behavior. This requires reconfiguring and rebuilding.
AddressSanitizer (--enable-asan) catches memory errors: buffer overflows, use-after-free, memory leaks.
./configure --enable-asan --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
ThreadSanitizer (--enable-threadsanitizer) catches data races and threading issues.
./configure --enable-threadsanitizer --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
UndefinedBehaviorSanitizer (--enable-undefinedcheck) catches undefined behavior such as integer overflow and null pointer dereference.
./configure --enable-undefinedcheck --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
When to run: Only for changes to core data structures or when Level 4 sanitizers found something suspicious. Usually overkill.
Run with C++ standard library debugging enabled. Slower but catches more issues.
./configure --enable-extrachecks --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
./stellar-core test --ll fatal -r simple --disable-dots --abort
Before running tests at Levels 4-6, also verify the build succeeds with
--disable-tests (the production configuration):
./configure --disable-tests --enable-ccache --enable-sdfprefs
make clean && make -j $(nproc)
This doesn't run tests but ensures the production build works.
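The build variants in this section all follow the same reconfigure-and-rebuild pattern and differ in a single configure flag. As an illustrative sketch, that mapping could be captured as below; the short variant names (`asan`, `tsan`, `ubsan`, `extrachecks`, `production`) and the function name `configure_cmd_for` are made up for this example, while the flags are the ones shown above.

```shell
# Hypothetical helper: prints the configure command for a build variant.
configure_cmd_for() {
    local flag
    case "${1:-}" in
        asan)        flag="--enable-asan" ;;
        tsan)        flag="--enable-threadsanitizer" ;;
        ubsan)       flag="--enable-undefinedcheck" ;;
        extrachecks) flag="--enable-extrachecks" ;;
        production)  flag="--disable-tests" ;;
        *) echo "unknown variant: ${1:-}" >&2; return 1 ;;
    esac
    echo "./configure $flag --enable-ccache --enable-sdfprefs"
}
```

The function only prints the command, so it can be reviewed before being run and followed by `make clean && make -j $(nproc)`.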
When a test fails:
Report the results:
## Test Results: PASS
All test levels completed successfully:
- Level 1 (Smoke): 3 tests, 2.1s
- Level 2 (Focused): 47 tests, 1m 12s
- Level 3 (Full Suite): 1,234 tests, 18m 45s
- Level 3b (TX Meta Baseline): OK
Build verification:
- --disable-tests: OK
Or on failure:
## Test Results: FAIL
Failed at Level 2 (Focused Unit Tests)
**Failing test:** `LedgerManagerTests.processTransactionRejectsEmpty`
**File:** src/ledger/LedgerManagerTests.cpp:142
**Error:**
REQUIRE( result == TRANSACTION_REJECTED )
with expansion:
TRANSACTION_SUCCESS == TRANSACTION_REJECTED
**Analysis:** The test expects empty transactions to be rejected, but the
new code path is allowing them through. See LedgerManager.cpp:98 where the
empty check appears to be missing.
Levels completed before failure:
- Level 1 (Smoke): 3 tests, 2.1s ✓
For most changes (logic fixes, new features, refactors):
--all-versions

For memory-sensitive changes (pointers, allocations, C++ containers):
For concurrency changes (threading, async, locks):
For test-only changes or documentation:
- Stop at the first failure (--abort flag)
- Use --ll fatal -r simple --disable-dots for quiet output
- Run with --all-versions before considering complete
- Use --rng-seed 12345 for tx-meta baseline checks

Report to the invoking agent: