| name | mem0-test-integration |
| description | Verify a Mem0 integration produced by /mem0-integrate. Runs in the same workspace on the same branch (loose coupling) — installs dependencies, runs the repo's native test suite, then exercises a real end-to-end smoke flow against the user's API key. Produces a scorecard. TRIGGER when: user has just run /mem0-integrate and says "verify", "test the integration", "run /mem0-test-integration", or when a .mem0-integration/ directory exists and tests have not been run yet on the current branch. DO NOT TRIGGER when: the user wants to run general project tests (defer to the repo's native test command), or when no prior /mem0-integrate run exists in the current branch (ask them to run /mem0-integrate first). This skill ONLY catches compile and runtime bugs by design. Logical integration errors — wrong data stored, wrong time retrieved, wrong user scoping — are on the human reviewer.
|
| license | Apache-2.0 |
| metadata | {"author":"mem0ai","version":"0.1.0","category":"ai-memory","tags":"memory, integration, testing, tdd, platform, oss","coupling":"loose","mem0_tested_versions":"mem0ai (PyPI) >=2.0.0,<3.0.0; mem0ai (npm) >=3.0.0,<4.0.0"} |
mem0-test-integration
Verifies what /mem0-integrate produced. Runs in the same workspace,
on the same feature branch. Loose coupling — fast, catches compile and
runtime bugs, does not catch logical errors.
Canonical sources (use these, not ambient knowledge)
All static checks and smoke-test shapes validate against these URLs.
WebFetch each before running step 3.
Read the Delegated skill: field in .mem0-integration/plan.md — if it
names a skill URL, fetch that skill and use its example blocks as the
reference for both static checks (step 3) and the smoke test (step 5).
Non-invasiveness contract
Every check in this skill assumes the integration is additive and
feature-flagged (see /mem0-integrate "Integration principles").
Specifically:
product.json must contain a feature_flag field.
- Steps 4–6 run in two passes:
- Pass A — flag unset. All pre-existing tests must pass, smoke/E2E
skip. The repo must behave like
main. Any failure here is a
hard fail — do not let the self-heal loop attempt a patch.
- Pass B — flag set. New tests must pass, smoke and E2E run.
- If Pass A fails, the scorecard marks
non_invasive: false and sets
overall: fail with a distinct reason code the integrator's heal
loop refuses to touch.
Preconditions
Refuse to start unless ALL of the following are true:
.mem0-integration/ directory exists in the repo root.
.mem0-integration/product.json, goal.md, and plan.md are readable
and internally consistent (JSON parses, docs non-empty).
- Current branch name begins with
mem0-integrate/ (set by the companion
skill). Prevents accidental runs on unrelated branches.
- Working tree is clean. The skill never modifies source files; any dirty
state means the integration is mid-edit and not ready to verify.
- The same API key the integration used is available in the environment
(
MEM0_API_KEY for Platform, OPENAI_API_KEY for OSS — read which from
product.json). Interactive mode asks if missing; CI mode exits 2.
Exit with a written rationale on any precondition failure. Never attempt
to "fix up" state.
Pipeline
1. Read the contract
Load:
product.json → which language, which product (Platform vs OSS), which
mem0 version, write_site, read_site.
plan.md → the mechanical contract (write pattern, read pattern,
preserved behavior).
goal.md → the intent (displayed in the scorecard only; not tested).
2. Install dependencies
Route by language from product.json:
| Language | Command |
|---|
| Python | pip install -e . if editable, else pip install -r requirements.txt. Then pip install mem0ai if not already present at the pinned version. |
| TypeScript / JavaScript | npm install (or pnpm install / yarn install if detected by lockfile). |
If install fails → exit code 2 with stderr tail. Never move to testing
if dependencies don't resolve.
3. Static sanity checks (fast, local, no API calls)
-
Import check: does the write-site file import the expected Mem0
surface? Authoritative list comes from ## Identify the User's Setup
in https://docs.mem0.ai/llms.txt:
- Platform Python →
from mem0 import MemoryClient
- Platform TS →
import MemoryClient from "mem0ai"
- OSS Python →
from mem0 import Memory
- OSS TS →
import { Memory } from "mem0ai/oss"
If plan.md names a delegated skill (e.g., Vercel AI), use that
skill's import signature instead of the list above. Mismatch → fail
with line number.
-
Version check: installed mem0ai version falls in the range from
this skill's mem0_tested_versions. Out of range → warn but continue.
-
Type check (TS tracks only): run tsc --noEmit or tsup --dts.
Non-zero → fail.
-
Lint (if the repo has a linter configured): run the repo's own
lint command. Lint failures from this skill's changes → fail; pre-existing
lint failures → surface as a warning.
-
Eager-init check: grep the write_site and read_site files (paths
from product.json) for MemoryClient( or Memory( at module scope —
i.e., not inside a function, method, or class body. MemoryClient()
validates the API key in __init__ (network call) and OSS Memory()
can eagerly initialize embedding/LLM providers — module-level
instantiation hits the wire on import and breaks Pass A's test
collection whenever the key is unset. Hit → fail with file:line and
the lazy-init guidance from /mem0-integrate step 8 constraint #7.
4. Run the repo's native test suite (two passes)
| Language | Test command (in priority order) |
|---|
| Python | pytest with the test files from step 5 of the companion skill, else python -m unittest discover. |
| TypeScript / JavaScript | npm test if defined in package.json; else auto-detect vitest or jest. |
Pass A — feature_flag unset. Run the entire pre-existing suite
(excluding the new test_mem0_* files). Must be 100% green. Any
failure here marks non_invasive: false in the scorecard and is
a hard fail — the integrator's self-heal loop refuses to touch it.
Pass B — feature_flag set (value from product.json). Run the
full suite including the new tests. All must pass.
Isolate integration-introduced failures using git diff main..HEAD --name-only. A test file that exists on main and fails only under
the integration branch (flag set or unset) counts against the
scorecard regardless of pass. A test file that already failed on main
is surfaced as pre_existing_unrelated and does not count — but is
still reported so the user can clean it up.
Capture output to .mem0-integration/test-stdout-flag-off.log and
.mem0-integration/test-stdout-flag-on.log. Scorecard reports pass/fail
per pass.
5. Smoke test (real API call, shortest round-trip)
Scripted end-to-end flow tailored to product.json. The call shapes
below are the minimal ones; if plan.md names a delegated skill, use
that skill's minimal example verbatim instead — it is the canonical
shape for the detected stack.
Platform (Python):
from mem0 import MemoryClient
c = MemoryClient() # uses MEM0_API_KEY
uid = f"mem0-test-integration-{os.urandom(4).hex()}"
c.add([{"role": "user", "content": "I prefer aisle seats"}], user_id=uid)
hits = c.search("seat preference", user_id=uid)
assert any("aisle" in h.get("memory", "") for h in hits), hits
c.delete_all(user_id=uid) # clean up
Platform (TS): same shape with MemoryClient from "mem0ai".
OSS (Python / TS): uses Memory() / new Memory() with default config
(OpenAI LLM via OPENAI_API_KEY, local Qdrant). If the repo ships a
docker-compose.yml with a Qdrant service, the skill starts it first and
tears it down after. If no backing store is reachable → fail with a
clear message naming the fix.
The smoke test always uses a disposable random user_id prefixed with
mem0-test-integration- so a failed cleanup doesn't pollute the user's
real data. A background tidy step deletes any prefix-matching entries
older than 24 hours on the next run.
Capture output to .mem0-integration/smoke-stdout.log.
6. E2E integration test (run the app, exercise the flow)
Unit tests + smoke prove the SDK works in isolation. This step is the
real signal: does memory actually appear in the app's user-visible
output when the integration runs end-to-end?
Requires plan.md to contain an E2E recipe: section (authored by
/mem0-integrate step 5). If absent → status skipped (not fail),
note in scorecard that the repo has no runnable entry point.
Recipe fields the skill reads:
start — shell command to launch the app using $PORT for any network
port. Run in background with stdout/stderr teed to
.mem0-integration/e2e-app.log.
ready_probe — how to detect readiness. url=... status=... polls an
HTTP endpoint; log="..." waits for a substring in e2e-app.log;
sleep=N waits N seconds (last resort). 60-second hard timeout.
compose_services — optional. If set, bring them up via
docker compose up -d <services> before start, tear them down with
docker compose down at the end.
write_call — triggers the Mem0 write path exactly once. Output is
captured and surfaced on failure. 60-second hard timeout.
write_async_wait_ms — pause after write_call to let async memory
flushes land. Default 0.
read_call — triggers the Mem0 read path. Typically a fresh session
or new request that should surface the stored memory.
read_assert — substring, regex=..., or jsonpath=<expr>=<value>
that must appear in read_call's stdout. This is the E2E pass gate.
Execution order:
- Allocate an ephemeral TCP port; export as
PORT.
- Set
MEM0_USER_ID to a disposable mem0-test-integration-<rand> value
and export it, so the app can use the same scoping the smoke test does
if the recipe wants cleanup.
- Bring up
compose_services if named.
- Run
start in the background.
- Poll
ready_probe until success or 60s timeout. Timeout → fail.
- Run
write_call. Non-zero exit → fail (but continue to cleanup).
- Sleep
write_async_wait_ms.
- Run
read_call.
- Evaluate
read_assert against read_call's stdout. Miss → fail.
- Cleanup (always, even on failure): SIGTERM the app, SIGKILL after
5s,
docker compose down if services were started, delete_all
memories matching mem0-test-integration-* on Platform scenarios.
On any failure, the scorecard includes:
- Last 40 lines of
e2e-app.log
- Full
write_call output
- Full
read_call output
- The expected vs actual for
read_assert
7. Scorecard
Write .mem0-integration/scorecard.md and .mem0-integration/scorecard.json:
{
"timestamp": "2026-04-20T14:03:11Z",
"branch": "mem0-integrate/remember-user-preferences",
"product": "platform",
"language": "python",
"mem0_version": "2.0.0",
"non_invasive": true,
"feature_flag": "MEM0_ENABLED",
"results": {
"install": {"status": "pass", "duration_ms": 12043},
"static_checks":{"status": "pass", "duration_ms": 812},
"unit_tests_flag_off": {"status": "pass", "duration_ms": 3920, "count": 47,
"reason": "all pre-existing tests green with flag unset"},
"unit_tests_flag_on": {"status": "pass", "duration_ms": 4321, "count": 49},
"smoke_test": {"status": "pass", "duration_ms": 2890, "memory_id": "mem_..."},
"e2e_test": {"status": "pass", "duration_ms": 14200,
"ready_probe_ms": 3100, "write_exit": 0,
"read_assert_matched": true}
},
"friction": {
"dependency_install_retries": 0,
"pre_existing_test_failures": 0,
"warnings": ["mem0ai 2.0.0 pinned; consider 2.0.1 for fix X"]
},
"overall": "pass"
}
The markdown version is human-readable and includes:
- Goal doc + plan doc reprinted at top (so reviewers don't have to hunt).
- Each check with pass/fail + log excerpt.
- Friction summary.
- Verbatim warnings from mem0 SDK (if any — e.g., deprecated field usage).
- Explicit "NOT checked" section listing what loose coupling misses:
"Whether the stored data is what the user wants stored. Whether search
runs at the right moment. Whether user_id matches the actual session
scope. Human review required."
8. Report + exit
- Print the scorecard path + overall pass/fail to stdout.
- Do not commit the scorecard files. They live in
.mem0-integration/,
which is gitignored. The user can inspect and optionally pin.
- On fail: print the first failing step's log tail (last 40 lines) and
stop. Do not attempt to fix anything.
Artifacts (all under .mem0-integration/)
| File | Purpose | Retention |
|---|
scorecard.md | Human-readable verdict. | Overwritten per run. |
scorecard.json | Machine-readable verdict. Consumed by the CI scorecard workflow later. | Overwritten per run. |
test-stdout-flag-off.log | Step 4 Pass A (pre-existing suite, flag unset). | Overwritten per run. |
test-stdout-flag-on.log | Step 4 Pass B (full suite, flag set). | Overwritten per run. |
smoke-stdout.log | Full output from step 5. | Overwritten per run. |
e2e-app.log | Background app stdout/stderr from step 6. | Overwritten per run. |
e2e-calls.log | write_call + read_call invocations and outputs. | Overwritten per run. |
Modes
| Mode | Trigger | Behavior |
|---|
| Interactive (default) | TTY present, MEM0_TEST_CI unset | Asks for missing keys, prints friendly summaries. |
| CI | MEM0_TEST_CI=1 | Keys must be in env, no prompts, non-zero exit on any fail. JSON scorecard goes to stdout's tail for workflow parsing. |
Invocation
/mem0-test-integration # interactive, all steps
/mem0-test-integration --ci # non-interactive
/mem0-test-integration --skip-smoke # no API calls, no E2E
/mem0-test-integration --skip-e2e # unit + smoke only (faster CI)
/mem0-test-integration --only-smoke # just smoke
/mem0-test-integration --only-e2e # just E2E (assumes deps installed)
Composition: --skip-* can stack (--skip-smoke --skip-e2e = static +
unit only, zero API cost). --only-* is mutually exclusive with all
other flags.
Exit codes
| Code | Meaning |
|---|
| 0 | All checks passed. |
| 1 | Precondition failed (no .mem0-integration/, wrong branch, dirty tree). |
| 2 | Missing env key (CI mode) or dependency install failure. |
| 3 | Static sanity check failed (wrong import, type error). |
| 4 | Unit tests failed (Pass B — integration itself broken). |
| 5 | Smoke test failed. |
| 6 | E2E test failed (ready_probe timeout, write/read call failed, or read_assert miss). |
| 7 | Non-invasiveness violation: Pass A failed (pre-existing tests broke). Integrator's heal loop refuses to touch this. |
| 8 | Internal error (skill bug — report it). |
Explicitly out of scope
- Modifying source files. The skill is read-only against the repo.
If verification exposes a bug, re-run
/mem0-integrate on the same
goal + plan; do not hand-patch.
- Fixing broken tests. Failing unit tests are a signal that the
integration is wrong, not that the tests are wrong. The skill does
not "try a different test."
- Deep logical correctness. The E2E step proves "something the user
said earlier comes back later," which is a useful but shallow signal.
It does NOT prove the integration picks the right facts to store,
scopes
user_id correctly across real users, or handles conflict
resolution well. That's human review territory.
- Self-healing. This skill never modifies source files. The paired
/mem0-integrate skill in its default --heal mode consumes the
scorecard produced here and drives its own remediation loop. Exit
code 7 (non-invasiveness violation) is the explicit signal the heal
loop must stop and surface to the user.
- Cross-branch comparisons. No
main baseline diffing. The
scorecard reflects this branch only.
- Running against production data. Every smoke test uses a disposable
random user_id and cleans up after. Never touches any other user's data.