| name | build-afm-nightly-publish |
| description | Build, test, and publish an afm-next nightly release — full from-scratch build, user testing pause, GitHub release, and Homebrew tap update. Use when user types /build-afm-nightly-publish or asks to publish a nightly build. |
| user_invocable | true |
Build & Publish AFM Nightly
Build afm from scratch (works from a fresh clone), let the user test it, then publish a GitHub pre-release and update the Homebrew tap.
Usage
/build-afm-nightly-publish — full pipeline: build + test + publish
/build-afm-nightly-publish --skip-build — skip build, use existing release binary
Prerequisites
The publish script (Scripts/publish-next.sh) requires:
gh CLI authenticated with push access to scouzi1966/maclocal-api
homebrew-afm repo at ../homebrew-afm (relative to repo root) or TAP_DIR env var — auto-cloned if missing
vesta-mac repo at ../vesta-mac (relative to repo root) or VESTA_DIR env var — required for PEP 503 wheel index update. Auto-cloned if missing.
- All build prerequisites from
/build-afm (Xcode, Swift, Node.js, etc.)
promptfoo CLI (npm install -g promptfoo or npx promptfoo) — required for the promptfoo agentic eval suite
wrangler CLI (npm install -g wrangler) — required for Cloudflare Pages deploy of wheel index
Instructions
Step 1: Validate Environment
Run these checks and present results to the user:
uname -m
sw_vers -productVersion
xcode-select -p
swift --version
git --version
node --version
npm --version
command -v promptfoo && promptfoo --version 2>/dev/null | head -1
gh auth status
GH_USER=$(gh api user -q .login)
echo "GitHub user: $GH_USER"
gh api repos/scouzi1966/maclocal-api -q '.permissions.push'
gh api repos/scouzi1966/homebrew-afm -q '.permissions.push'
TAP_DIR="${TAP_DIR:-$(cd "$(git rev-parse --show-toplevel)/.." && pwd)/homebrew-afm}"
if [ ! -f "$TAP_DIR/afm-next.rb" ]; then
echo "Tap repo missing at $TAP_DIR — cloning..."
gh repo clone scouzi1966/homebrew-afm "$TAP_DIR"
fi
test -f "$TAP_DIR/afm-next.rb" && echo "Tap OK: $TAP_DIR" || echo "FAILED to clone tap repo"
VESTA_DIR="${VESTA_DIR:-$(cd "$(git rev-parse --show-toplevel)/.." && pwd)/vesta-mac}"
if [ ! -d "$VESTA_DIR/.git" ]; then
echo "vesta-mac repo missing at $VESTA_DIR — cloning..."
gh repo clone scouzi1966/vesta-mac "$VESTA_DIR"
fi
test -d "$VESTA_DIR/.git" && echo "vesta-mac OK: $VESTA_DIR" || echo "FAILED to clone vesta-mac"
Present as a checklist. If the GitHub user is not scouzi1966 or push access is false for either repo, STOP immediately and tell the user:
This skill publishes releases and updates the Homebrew tap for scouzi1966/maclocal-api.
Only the repository owner (scouzi1966) can run it. You are authenticated as: <username>
Do NOT proceed unless: (1) all build checks pass, (2) GitHub user has push access to BOTH repos, (3) tap repo is available.
Step 2: Build from Scratch (True Clean Build)
A nightly release must be built from a completely clean state. swift package clean is NOT sufficient — it leaves behind cached modules, package resolution state, and precompiled headers that can mask stale code:
| Cached artifact | Location | What swift package clean does |
|---|
| Compiled .o/.swiftmodule | .build/arm64-apple-macosx/release/ | Removes |
| Module cache (PCM/PCH) | .build/arm64-apple-macosx/release/ModuleCache/ | Keeps (~400MB) |
| Cloned SPM dependencies | .build/repositories/ | Keeps (~300MB) |
| Package resolution lock | .build/workspace-state.json | Keeps |
| Xcode DerivedData | ~/Library/Developer/Xcode/DerivedData/*maclocal* | Keeps (if exists) |
Before running the build script, nuke all cached state:
rm -rf .build
rm -rf ~/Library/Developer/Xcode/DerivedData/*maclocal* \
~/Library/Developer/Xcode/DerivedData/*MacLocal* \
~/Library/Developer/Xcode/DerivedData/*afm* 2>/dev/null || true
test -d .build && echo "FAIL: .build still exists" || echo "OK: .build removed"
Then run the full build:
./Scripts/build-from-scratch.sh
IMPORTANT: Never add --skip-submodules, --skip-patches, or --skip-webui. This is a release build — everything must be from scratch.
Why this matters: Stale ModuleCache can cause the compiler to use old .swiftmodule files from a previous build, meaning your patches compile but the binary links against the cached (unpatched) version. Stale workspace-state.json can resolve a different version of MLX Swift than what the pin specifies. Both failures are silent — the build succeeds, the binary runs, but behavior is wrong.
If the user passed --skip-build, skip the clean and build steps, but still run all Step 2b verification checks against the existing binary:
test -x .build/arm64-apple-macosx/release/afm || test -x .build/release/afm
Step 2b: Post-Build Verification ("What Could Go Wrong")
The build script reports success, but do not trust its output alone. Independently verify every critical artifact. The build script could succeed (exit 0) while:
- Patches silently failed to apply (vendor reverted by
git submodule update)
- xgrammar compiled but wasn't linked (missing from Package.swift targets)
- MLX Swift resolved a wrong version (pin not applied to Package.swift)
- Metallib bundle missing (Metal shaders won't load at runtime → crash)
- WebUI assets missing (llama.cpp web interface won't serve)
- BuildInfo.swift not restored (leaves dirty working tree)
Run all of these checks. Present results as a table. If ANY check fails, STOP and investigate before proceeding.
Check 1: Patches byte-identical to vendor targets
The patch script says "Applied" but git submodule update can silently revert files. Verify every patch file is byte-for-byte identical to its vendor target using the actual arrays from Scripts/apply-mlx-patches.sh:
python3 -c "
import os
# These arrays MUST match Scripts/apply-mlx-patches.sh — if they drift, the check is wrong.
# Read them from the script itself to stay in sync.
patches = [
('Qwen3VL.swift','Libraries/MLXVLM/Models/Qwen3VL.swift'),
('Qwen3Next.swift','Libraries/MLXLLM/Models/Qwen3Next.swift'),
# ... all 20 entries from PATCH_FILES/TARGET_PATHS arrays ...
]
ok = fail = 0
for pf, tp in patches:
src, tgt = f'Scripts/patches/{pf}', f'vendor/mlx-swift-lm/{tp}'
if not os.path.exists(tgt):
print(f'MISSING: {pf} -> {tp}'); fail += 1
else:
with open(src,'rb') as a, open(tgt,'rb') as b:
if a.read() == b.read():
print(f'MATCH: {pf}'); ok += 1
else:
print(f'MISMATCH: {pf}'); fail += 1
print(f'\n{ok}/{ok+fail} patches verified')
"
Why this matters: If even one patch is stale, the compiled binary has upstream code instead of our optimized/fixed version. This has happened when git submodule update --init --recursive runs AFTER apply-mlx-patches.sh — it silently reverts patches.
Check 2: MLX Swift pinned AND resolved to exact version
grep 'mlx-swift.*exact' vendor/mlx-swift-lm/Package.swift
python3 -c "
import json
d = json.load(open('Package.resolved'))
for p in d.get('pins', []):
if 'mlx' in p.get('identity','').lower():
print(f'{p[\"identity\"]}: {p[\"state\"].get(\"version\",\"?\")}')
"
Why this matters: The pin in Package.swift is a request, but Package.resolved is what was actually fetched and compiled against. A stale workspace-state.json or Package.resolved from a previous build can cause SPM to use a cached resolution even after the pin changes. Nuking .build/ in Step 2 prevents this, but verify anyway.
Check 3: xgrammar submodule present and at expected version
git submodule status vendor/xgrammar
cd vendor/xgrammar && git describe --tags --always && cd -
Why this matters: xgrammar is a C++ library compiled from source. If the submodule is missing or at the wrong version, the EBNF grammar constraint feature either doesn't exist or has different behavior.
Check 4: xgrammar symbols linked into the binary
strings .build/arm64-apple-macosx/release/afm | grep -c 'xgrammar/cpp/'
strings .build/arm64-apple-macosx/release/afm | grep 'XGrammarService'
nm -a .build/arm64-apple-macosx/release/afm 2>/dev/null | grep -c 'xgrammar'
Why this matters: xgrammar could be in the source tree but excluded from the Swift Package Manager target graph. The binary would build fine but grammar-constrained decoding would silently fail at runtime.
Check 5: Metallib bundle present
METALLIB=".build/arm64-apple-macosx/release/MacLocalAPI_MacLocalAPI.bundle/default.metallib"
test -f "$METALLIB" && echo "OK: metallib $(du -h "$METALLIB" | cut -f1)" || echo "FAIL: metallib missing"
Why this matters: Without the metallib, MLX GPU kernels can't load. The server starts but crashes on first inference. The build script checks this, but verify independently.
Check 6: WebUI assets present
test -f "Resources/webui/index.html.gz" && echo "OK: webui assets" || echo "FAIL: webui missing"
Why this matters: The llama.cpp web UI is served at / — without it, browser access shows nothing.
Check 7: BuildInfo.swift is clean (not left with injected SHA)
grep 'static let version' Sources/MacLocalAPI/BuildInfo.swift
git diff Sources/MacLocalAPI/BuildInfo.swift
Why this matters: The build script injects the git SHA into BuildInfo.swift during compilation then restores it. If restore fails, the working tree is dirty and the next git commit could accidentally commit the injected version.
Check 8: Binary is stripped and reasonable size
ls -lh .build/arm64-apple-macosx/release/afm
nm -gU .build/arm64-apple-macosx/release/afm 2>/dev/null | wc -l
Check 9: Relocated binary does NOT crash (pip install simulation)
This is the most critical distribution check. SPM auto-generates resource_bundle_accessor.swift with a hardcoded absolute build path. If any code path calls Bundle.module, the binary will fatalError when installed via pip or Homebrew (because the build path no longer exists). This has shipped broken nightlies before.
TMPDIR=$(mktemp -d)
cp .build/arm64-apple-macosx/release/afm "$TMPDIR/"
cp .build/arm64-apple-macosx/release/MacLocalAPI_MacLocalAPI.bundle/default.metallib "$TMPDIR/"
MACAFM_MLX_MODEL_CACHE=/Volumes/edata/models/vesta-test-cache \
"$TMPDIR/afm" mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -s "hello" --max-tokens 5 2>&1 | head -3
EXIT_CODE=${PIPESTATUS[0]}
rm -rf "$TMPDIR"
if [ "$EXIT_CODE" -ne 0 ]; then
echo "FAIL: Relocated binary crashed (exit $EXIT_CODE)"
echo "This means Bundle.module fatalError is still reachable."
echo "Check MLXMetalLibrary.swift — it must NOT call Bundle.module."
else
echo "PASS: Relocated binary works"
fi
Why this matters: This is the exact layout pip creates: macafm_next/bin/afm + macafm_next/bin/default.metallib. If this test fails, every pip user gets a crash on first run. This check is non-negotiable — if it fails, do NOT publish.
Root cause if it fails: Someone added a Bundle.module call somewhere in the codebase. Search for it:
grep -r 'Bundle\.module' Sources/ --include='*.swift'
Check 10: No Bundle.module calls in source code
HITS=$(grep -r 'Bundle\.module' Sources/ --include='*.swift' | grep -v '//' | grep -v '^\s*//' | wc -l)
if [ "$HITS" -gt 0 ]; then
echo "FAIL: Found $HITS Bundle.module call(s) in source code"
grep -rn 'Bundle\.module' Sources/ --include='*.swift' | grep -v '//'
echo "This WILL crash when installed via pip or Homebrew."
else
echo "PASS: No Bundle.module calls"
fi
Why this matters: Even a single Bundle.module call anywhere in the code path triggers the auto-generated fatalError. This is a regression guard — any future code that adds Bundle.module will be caught here before it ships.
Check 11: Info.plist embedded with privacy usage descriptions
macOS 26 SIGABRTs any process that requests privacy-sensitive APIs (Speech Recognition, microphone, camera, contacts, etc.) without a matching *UsageDescription key in its Info.plist. Currently required for PR #107's Apple Speech feature (afm speech, POST /v1/audio/transcriptions, chat input_audio content parts). Any future privacy-API integration needs its key added here too.
BIN=.build/arm64-apple-macosx/release/afm
if otool -l "$BIN" | grep -q '__info_plist'; then
PLIST_SIZE=$(otool -l "$BIN" | grep -A4 __info_plist | grep 'size' | awk '{print $2}')
echo "PASS: __info_plist section present ($PLIST_SIZE bytes)"
else
echo "FAIL: Missing __TEXT,__info_plist section"
echo "Check Package.swift linker flags (-Xlinker -sectcreate ...) and Sources/MacLocalAPI/Info.plist"
fi
if strings "$BIN" | grep -q 'NSSpeechRecognitionUsageDescription'; then
echo "PASS: NSSpeechRecognitionUsageDescription key embedded"
else
echo "FAIL: NSSpeechRecognitionUsageDescription missing from embedded plist"
echo "afm speech / /v1/audio/transcriptions will SIGABRT on macOS 26"
fi
plutil -lint Sources/MacLocalAPI/Info.plist
Why this matters: Without the embedded plist, running afm speech -f foo.wav (or any endpoint that calls SFSpeechRecognizer) crashes before returning any output. The build script Scripts/build-from-scratch.sh already enforces this — but also check here so the publish flow fails loudly if someone bypasses the build script or if Package.swift's linker flags get reverted in a merge.
Root cause if it fails:
Sources/MacLocalAPI/Info.plist was deleted or renamed
- Package.swift's
linkerSettings lost the -Xlinker -sectcreate -Xlinker __TEXT -Xlinker __info_plist -Xlinker … flags
- Someone added a new privacy-API usage (microphone, camera) without adding the corresponding
*UsageDescription key to Info.plist
Full plist must also include CFBundleIdentifier, CFBundleName, CFBundleExecutable — these establish TCC identity. Changing CFBundleIdentifier later would force existing users to re-grant Speech Recognition permission.
Check 12: Report all vendor/submodule pin levels
Present the exact version of every submodule and SPM dependency so the user can verify the build reproduces the expected dependency tree. Also fetch the latest release tag from each upstream repo to show if we're behind.
echo "=== Git Submodules ==="
git submodule status
echo "=== SPM Resolved Versions ==="
python3 -c "
import json
d = json.load(open('Package.resolved'))
for p in sorted(d.get('pins', []), key=lambda x: x.get('identity','')):
v = p['state'].get('version') or p['state'].get('revision','?')[:12]
print(f' {p[\"identity\"]}: {v}')
"
echo "=== Upstream Latest Releases ==="
for repo in ml-explore/mlx-swift ml-explore/mlx-swift-lm mlc-ai/xgrammar ggml-org/llama.cpp huggingface/swift-transformers huggingface/swift-huggingface; do
tag=$(gh api "repos/$repo/releases/latest" -q '.tag_name' 2>/dev/null || echo "?")
echo " $repo: $tag"
done
This is informational — no pass/fail. But if a resolved version is unexpected (e.g., mlx-swift != 0.30.3), STOP.
Present verification results
| # | Check | What could go wrong | Result |
|---|
| 1 | Patches byte-identical (N/N) | submodule update reverted patches | PASS/FAIL |
| 2 | MLX Swift pin + resolved 0.30.3 | stale resolution → SDPA NaN crashes | PASS/FAIL |
| 3 | xgrammar at expected tag | missing submodule → no grammar constraints | PASS/FAIL |
| 4 | xgrammar linked in binary | compiled but not linked → silent runtime failure | PASS/FAIL |
| 5 | Metallib bundle present | missing → crash on first inference | PASS/FAIL |
| 6 | WebUI assets present | missing → no browser UI | PASS/FAIL |
| 7 | BuildInfo.swift clean | dirty working tree → accidental commit | PASS/FAIL |
| 8 | Binary stripped, reasonable size | unstripped → bloated download | PASS/FAIL |
| 9 | Relocated binary works (pip sim) | Bundle.module fatalError → crash on pip install | PASS/FAIL |
| 10 | No Bundle.module in source | regression guard → future crash on relocated binary | PASS/FAIL |
| 11 | Info.plist embedded + NSSpeechRecognitionUsageDescription | macOS 26 SIGABRTs Speech Recognition without UsageDescription key | PASS/FAIL |
Then present two separate tables for vendor pins:
Git Submodules:
| Submodule | Source | Pinned Commit | Upstream Latest | Notes |
|---|
vendor/mlx-swift-lm | Submodule | git submodule status hash + tag | gh api repos/.../releases/latest | Our patched fork |
vendor/xgrammar | Submodule | git submodule status hash + tag | gh api repos/.../releases/latest | C++ grammar engine |
vendor/llama.cpp | Submodule | git submodule status hash + tag | gh api repos/.../releases/latest | WebUI only |
SPM Dependencies (from Package.resolved):
| Package | Source | Resolved Version | Upstream Latest | Notes |
|---|
mlx-swift | SPM (exact pin) | Package.resolved version | gh api repos/.../releases/latest | 0.30.4+ has SDPA NaN — pinned to exact 0.30.3 |
swift-transformers | SPM (from) | Package.resolved version | gh api repos/.../releases/latest | Tokenizer/chat templates |
swift-huggingface | SPM (from) | Package.resolved version | gh api repos/.../releases/latest | HF hub downloads |
swift-jinja | SPM (transitive) | Package.resolved version | — | Jinja2 template engine |
vapor | SPM (from) | Package.resolved version | — | HTTP framework |
Populate the "Upstream Latest" column by querying gh api repos/OWNER/REPO/releases/latest -q '.tag_name'. This lets the user see at a glance if we're behind upstream on any dependency.
If ANY check fails, STOP. Do not proceed to user testing or publishing.
Step 3: Present Binary and Enter Test/Fix/Rebuild Loop
After all verification checks pass, get the binary path and version:
BIN=".build/arm64-apple-macosx/release/afm"
[ -x "$BIN" ] || BIN=".build/release/afm"
echo "Binary: $(cd "$(dirname "$BIN")" && pwd)/$(basename "$BIN")"
$BIN --version
Report to the user:
- Binary path (absolute)
- Version string
- Verification results table (from Step 2b)
Then use AskUserQuestion to pause and let the user decide what to do next:
Question: "The build is verified. What would you like to do?"
Options:
- "Publish as-is" — Skip testing, go straight to GitHub release and tap update
- "Run tests" — Run automated tests, then decide (see test scope question below)
- "I'll test manually" — Pause here while the user tests the binary themselves
- "Cancel" — Abort without publishing
If user selects "Run tests"
First, list available models in the cache and let the user pick:
MACAFM_MLX_MODEL_CACHE=/Volumes/edata/models/vesta-test-cache ./Scripts/list-models.sh
Question: "Which model to test with?"
Present the available models as options (show model name and size). The user picks one.
Then ask the test scope:
Question: "Which tests to run?"
Options:
- "Assertions only (all tiers including unit)" — Run
/test-afm-assertions with full tier (deterministic pass/fail tests, ~15 min/model)
- "Comprehensive only" — Run
/test-macafm smart analysis (AI-scored quality evaluation)
- "Both" — Run assertions first, then comprehensive (most thorough, ~30 min/model)
- "Full nightly suite" — Run assertions + comprehensive + promptfoo agentic evals (most complete, ~60 min)
Invoke the appropriate skill(s) with the selected model. Do NOT re-ask the model question — pass it through to the test skill(s).
If test scope includes promptfoo
Run the full promptfoo agentic eval suite after assertions/comprehensive complete:
AFM_MODEL=MODEL \
AFM_BINARY=.build/arm64-apple-macosx/release/afm \
MACAFM_MLX_MODEL_CACHE=/Volumes/edata/models/vesta-test-cache \
./Scripts/feature-promptfoo-agentic/run-promptfoo-agentic.sh all
This manages its own server lifecycle (starts/stops across 8 server profiles) and runs ~137 test cases across 16 configs:
- Structured (6+4 tests):
json_schema response format and stress tests
- Tool calling (7 tests × 3 profiles): default, adaptive-xml, adaptive-xml-grammar
- Tool call quality (6 tests × 3 profiles): BFCL-inspired when-to-call decisions
- Grammar constraints (17 tests × 8 server phases): schema/tools enforcement, concurrent, prefix-cache, mixed-strict, header assertions
- Agentic (4 tests × 3 profiles): Multi-turn coding workflows
- Frameworks (8 tests × 3 profiles): Agent framework tool schemas (OpenCode, Pi, OpenClaw, Hermes shapes)
- OpenCode (37 tests × 3 profiles): Primary-source OpenCode built-in tools
- PI (20 tests × 3 profiles): Pi coding-agent tools
- OpenClaw (12 tests × 3 profiles): OpenClaw tool coverage
- Hermes (12 tests × 3 profiles): Hermes agentic framework tools
Output: JSON reports in $AFM_PROMPTFOO_OUT_DIR (default: /Volumes/edata/promptfoo/data/maclocal-api/current/).
Interpreting promptfoo results:
- Server-side features (structured, toolcall, grammar schema/tools/headers): Should be 100% pass. Failures here indicate server bugs.
- Concurrent grammar failures: Known race condition in
--concurrent 2 grammar path — not a release blocker.
- OpenCode/PI/agentic failures: Model quality at task complexity — the model can't always pick the right tool or produce correct arguments for complex multi-tool scenarios. Not server bugs.
- adaptive-xml profile failures scoring lower than default: The adaptive-xml parser produces slightly different formatting that quality judges may score lower. Compare with default profile to confirm it's not a regression.
To extract pass/fail summary from all result files:
python3 -c "
import json, os, glob
files = sorted(glob.glob('$AFM_PROMPTFOO_OUT_DIR/*MODEL_SLUG*.json'))
total_pass = total_fail = 0
for f in files:
name = os.path.basename(f).replace('-MODEL_SLUG.json','')
d = json.load(open(f))
stats = d.get('results', d).get('stats', {})
p, fa = stats.get('successes', 0), stats.get('failures', 0)
total_pass += p; total_fail += fa
status = 'PASS' if fa == 0 else f'FAIL ({fa})'
print(f'{name}: {p}/{p+fa} — {status}')
print(f'\nTOTAL: {total_pass}/{total_pass+total_fail}')
"
After tests complete, present results and ask:
Question: "Tests complete. What next?"
Options:
- "Publish" — Results are acceptable, proceed to release
- "Fix and rebuild" — There are issues to fix before releasing
- "Cancel" — Abort
If user selects "I'll test manually"
Wait for the user to come back. When they do, ask:
Question: "Ready to proceed?"
Options:
- "Publish" — Testing passed, proceed to release
- "Fix and rebuild" — There are issues to fix before releasing
- "Cancel" — Abort
If user selects "Fix and rebuild" (from any path above)
The user will make code changes (or ask you to). After changes are made:
- Re-run Step 2 (full clean build:
rm -rf .build + ./Scripts/build-from-scratch.sh)
- Re-run Step 2b (all 8 verification checks)
- Return to Step 3 (present binary and ask again)
This loop repeats until the user selects "Publish" or "Cancel". Each iteration is a full clean rebuild — never do an incremental build for a release.
Version and changelog selection
Before publishing, ask the user two questions via AskUserQuestion:
Question 1 — Version: Determine the suggested version by reading Sources/MacLocalAPI/BuildInfo.swift and extracting the version (strip leading v). Present it to the user:
"Release version? The base version from BuildInfo.swift is X.Y.Z. The full nightly version will be X.Y.Z-next.<sha>.<date>."
Options:
- "X.Y.Z (from BuildInfo.swift)" — Use the version from BuildInfo.swift (recommended)
- "Custom version" — Enter a different base version
Question 2 — Changelog since: Show both the last nightly tag AND the last stable release tag so the user can choose the right baseline:
LAST_NIGHTLY=$(git tag -l 'nightly-*' --sort=-creatordate | head -1)
if [ -n "$LAST_NIGHTLY" ]; then
NIGHTLY_DATE=$(git log -1 --format='%ci' "$LAST_NIGHTLY" 2>/dev/null | cut -d' ' -f1)
NIGHTLY_COUNT=$(git rev-list "${LAST_NIGHTLY}..HEAD" --count 2>/dev/null)
echo "Last nightly: $LAST_NIGHTLY ($NIGHTLY_DATE) — $NIGHTLY_COUNT commits since"
fi
LAST_STABLE=$(git tag -l 'v*' --sort=-version:refname | grep -v 'nightly\|next' | head -1)
if [ -n "$LAST_STABLE" ]; then
STABLE_DATE=$(git log -1 --format='%ci' "$LAST_STABLE" 2>/dev/null | cut -d' ' -f1)
STABLE_COUNT=$(git rev-list "${LAST_STABLE}..HEAD" --count 2>/dev/null)
echo "Last stable: $LAST_STABLE ($STABLE_DATE) — $STABLE_COUNT commits since"
fi
echo "--- Commits since last nightly ---"
git log --oneline "${LAST_NIGHTLY}..HEAD" 2>/dev/null
Present both reference points and ask:
"Generate changelog from which point?"
Options:
- "Since last nightly
<tag> (N commits)" — Default for routine nightlies (incremental changelog)
- "Since last stable release
<tag> (N commits)" — Use for the first nightly after a stable release, or when you want the full delta since the last official version
- "Custom commit SHA" — Enter a specific commit SHA
Guidance for which to pick:
- Routine nightly (there have been nightlies since the last stable release): use "since last nightly" — the changelog shows only what's new since the previous nightly
- First nightly after a stable release (no nightlies since last
v* tag): use "since last stable release" — the changelog shows everything new in this development cycle
- No previous tags at all: use "Custom commit SHA" or omit
--since entirely (the script will include all commits)
Changelog filtering — exclude reverted/superseded commits: When reviewing the commit list for the changelog, omit commits whose work was later removed or fully replaced. For example, if a commit adds a Python bridge and a later commit removes it, neither should appear in the release notes — the net effect is zero. Only include commits that contribute to the final state of the codebase at HEAD. The publish-next.sh script includes all commits mechanically; you should review and curate the changelog before presenting it to the user.
If the user selects "since last stable release", pass --since <stable-tag-sha> to the publish script.
If the user provides a custom SHA, pass --since <sha>.
If the user selects "since last nightly", no --since flag is needed (the script defaults to this).
Do NOT proceed to Step 4 unless the user selects "Publish".
Step 4: Publish Release
Run the publish script with --skip-build (already built in Step 2), the confirmed version, and optional --since:
./Scripts/publish-next.sh --skip-build --version <confirmed-version>
./Scripts/publish-next.sh --skip-build --version <confirmed-version> --since <commit-sha>
This script handles everything:
- Packages the binary + metallib bundle + webui into
afm-next-arm64.tar.gz
- Generates changelog from commits since the last
nightly-* release (with both Homebrew and pip install instructions in the release notes)
- Creates a GitHub pre-release tagged
nightly-YYYYMMDD-SHORTSHA
- Updates the
nightly tag to point to HEAD
- Updates
afm-next.rb in the homebrew-afm tap (url, version, sha256)
- Commits and pushes the tap update
- Builds a nightly wheel (
macafm-next) via Scripts/build-nightly-wheel.sh
- Uploads the wheel to the GitHub release and updates the PEP 503 index on kruks.ai via
Scripts/update-wheel-index.sh (requires vesta-mac repo at ../vesta-mac and wrangler for Cloudflare Pages deploy)
Step 4b: Update README Release Link
After publishing, update the nightly release notes link in README.md to point to the new release tag:
- Read
README.md and find the nightly release notes link in the Install table
- Replace the old
nightly-* tag in the URL with the new release tag (e.g., nightly-20260312-a49c207)
- If the base version changed, also update the link text (e.g.,
v0.9.7-next → v0.9.8-next)
- Commit with message:
Update nightly release link to YYYYMMDD-SHORTSHA
- Push to remote
Do not skip this step. The README is the main page users see — it must always point to the latest nightly.
Step 5: Verify & Report
After the publish script completes, verify and report:
SHORT_SHA=$(git rev-parse --short HEAD)
DATE=$(date -u +%Y%m%d)
RELEASE_TAG="nightly-${DATE}-${SHORT_SHA}"
gh release view "$RELEASE_TAG" --repo scouzi1966/maclocal-api --json tagName,url,assets -q '.url'
TAP_DIR="${TAP_DIR:-$(cd "$(git rev-parse --show-toplevel)/.." && pwd)/homebrew-afm}"
grep 'version "' "$TAP_DIR/afm-next.rb"
Report to the user:
Step 5b: Archive Test Results
Copy all test reports from this nightly run to the versioned nightly archive:
DATE=$(date +%Y-%m-%d)
mkdir -p test-reports/nightly/$DATE
cp test-reports/assertions-report-*.html test-reports/assertions-report-*.jsonl \
test-reports/multi-assertions-report-*.html test-reports/multi-assertions-report-*.jsonl \
test-reports/nightly/$DATE/ 2>/dev/null || true
AFM_PROMPTFOO_OUT_DIR="${AFM_PROMPTFOO_OUT_DIR:-/Volumes/edata/promptfoo/data/maclocal-api/current}"
cp "$AFM_PROMPTFOO_OUT_DIR"/*.json test-reports/nightly/$DATE/ 2>/dev/null || true
cp test-reports/smart-analysis-*.md test-reports/nightly/$DATE/ 2>/dev/null || true
git add test-reports/nightly/$DATE/
git commit -m "Add nightly test results for $DATE ($(git rev-parse --short HEAD))"
git push
This maintains the test history at test-reports/nightly/YYYY-MM-DD/ for cross-nightly comparison.
Error Handling
- Build failure: Show error output, suggest running
/build-afm first to diagnose
- Verification failure (Step 2b): Do not proceed. Investigate the specific check that failed. If patches are stale, re-run
./Scripts/apply-mlx-patches.sh. If resolution is wrong, rm -rf .build and rebuild.
- Test failures (Step 3): Present the failures and let the user decide: fix and rebuild, publish anyway, or cancel. Do NOT automatically fix test failures — that's the user's decision.
- gh release create failure: Check
gh auth status, check if tag already exists (gh release view <tag>)
- Tap push failure: Check if
../homebrew-afm is on the right branch and has no uncommitted changes
- User cancels at any point: Clean exit, no publish. The built binary remains available for manual use.