| name | ops-logs-query |
| description | Internal — for Boundless team members only. Query AWS CloudWatch logs for Boundless services (provers, slasher, distributor, order stream, order generator, indexer, signal) on prod/staging environments. Use when the user asks to look at service logs, debug service behavior from log output, search logs for a request ID, or investigate errors using CloudWatch. Do NOT use for debugging local code changes, reviewing PRs, or investigating issues in the codebase itself. |
Logs Query
Query AWS CloudWatch Logs for Boundless services on prod/staging.
Prerequisites
- Read network_secrets.toml from the repo root. Extract the AWS credentials for the target environment from [aws.prod] or [aws.staging] (access_key_id, secret_access_key). If the file is not present, recommend that the user create it -- instructions and credentials are in the Boundless runbook.
- Export credentials before running any queries:
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"
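As a rough sketch of the extraction step, the helper below pulls one key out of one section of network_secrets.toml with awk. The function name toml_get and the ENVIRONMENT variable are illustrative, not part of the repo, and the parser assumes flat key = "value" lines under [aws.prod] or [aws.staging]; anything more elaborate needs a real TOML parser.

```shell
# Hypothetical helper: print the value of <key> inside [<section>] of a simple TOML file.
# Assumes one `key = "value"` pair per line; this is not a general TOML parser.
toml_get() {
  # usage: toml_get <file> <section> <key>
  awk -v section="[$2]" -v key="$3" '
    { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }          # trim the line
    line ~ /^\[.*\]$/ { in_section = (line == section); next } # track the current section
    in_section {
      eq = index(line, "=")
      if (eq == 0) next
      k = substr(line, 1, eq - 1); gsub(/[ \t]+$/, "", k)
      if (k != key) next
      v = substr(line, eq + 1)
      gsub(/^[ \t]*"|"[ \t]*$/, "", v)                         # strip surrounding quotes
      print v
      exit
    }
  ' "$1"
}

ENVIRONMENT="staging"   # assumption: "staging" or "prod"
if [ -f network_secrets.toml ]; then
  export AWS_ACCESS_KEY_ID="$(toml_get network_secrets.toml "aws.$ENVIRONMENT" access_key_id)"
  export AWS_SECRET_ACCESS_KEY="$(toml_get network_secrets.toml "aws.$ENVIRONMENT" secret_access_key)"
  export AWS_DEFAULT_REGION="us-west-2"
fi
```

If either variable comes back empty, check the section and key names against the file before running any queries.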
Finding Log Groups
Prover log groups
Prover log groups follow the pattern /boundless/bento/<hostname>. The hostnames are defined in the Pulumi config files:
- Staging: infra/cw-monitoring/Pulumi.staging.yaml
- Prod: infra/cw-monitoring/Pulumi.production.yaml
Read the relevant Pulumi config to find current hostnames. As of now:
| Log Group | Environment | Chain ID | Label (approx., in network_address_labels.json) | Description |
|---|---|---|---|---|
| /boundless/bento/prover-84532-staging-nightly | staging | 84532 (Base Sepolia) | | Staging nightly prover |
| /boundless/bento/prover-8453-prod-release | prod | 8453 (Base) | BPLatitudeRelease | Prod release prover. Legacy name: /boundless/bento/base-mainnet-prover-release |
| /boundless/bento/prover-8453-prod-nightly | prod | 8453 (Base) | BPLatitudeNightly | Prod nightly prover. Legacy name: /boundless/bento/base-mainnet-prover-nightly |
| /boundless/bento/prover-84532-prod-nightly | prod | 84532 (Base Sepolia) | | Prod Base Sepolia nightly prover |
| /boundless/bento/prover-11155111-prod-nightly | prod | 11155111 (Eth Sepolia) | | Prod Eth Sepolia nightly prover |
| /boundless/bento/prover-01 | prod | 8453 (Base) | BPProver01DC | Prod datacenter prover 01 |
| /boundless/bento/prover-02 | prod | 8453 (Base) | BPProver02DC | Prod datacenter prover 02 |
The label column lists the network_address_labels.json label that the log group is believed to correspond to. Names may not match exactly -- always confirm against network_address_labels.json and Pulumi config before relying on the mapping.
If these are out of date, check the Pulumi config files for the current list.
We only have prover log groups for provers we operate. External provers do not have queryable logs.
Provers we operate are labeled with a BP prefix in network_address_labels.json (e.g. BP1, BP2, BPNightlyAWS). When investigating any issue, always highlight what our BP provers are doing -- did they skip, fail, drop, or fulfill? This should be called out explicitly even when the investigation is not specifically about our provers.
Discovering log groups for other services
Other services (slasher, distributor, order stream, order generator, indexer backend, indexer API, etc.) have log groups that follow a naming convention but may change. Discover them dynamically rather than hardcoding.
Log group naming convention: l-<staging|prod>-<chain_id>-<service-name>-<chain_id>-<resource>
Example: l-staging-167000-indexer-api-167000-lambda
Known chain IDs:
- 84532 = Base Sepolia
- 8453 = Base Mainnet
- 167000 = Taiko Mainnet
- 11155111 = Eth Sepolia
- 1 = Eth Mainnet
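The naming convention and chain ID list above can be sketched as two small helpers. The function names log_group_name and chain_name are made up for illustration. Because actual names may change, treat the result as a starting guess or search prefix and confirm it with describe-log-groups.

```shell
# Hypothetical helper: assemble a name following
# l-<staging|prod>-<chain_id>-<service-name>-<chain_id>-<resource>
log_group_name() {
  # usage: log_group_name <staging|prod> <chain_id> <service-name> <resource>
  printf 'l-%s-%s-%s-%s-%s\n' "$1" "$2" "$3" "$2" "$4"
}

# Hypothetical helper: map a known chain ID to its network name.
chain_name() {
  case "$1" in
    84532)    echo "Base Sepolia" ;;
    8453)     echo "Base Mainnet" ;;
    167000)   echo "Taiko Mainnet" ;;
    11155111) echo "Eth Sepolia" ;;
    1)        echo "Eth Mainnet" ;;
    *)        echo "unknown"; return 1 ;;
  esac
}

# Reproduces the example name given above.
log_group_name staging 167000 indexer-api lambda
```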
To find log groups for a specific environment and chain, search by prefix. The prefix starts with the environment (staging/prod) and can optionally be extended with the chain ID and service name:
aws logs describe-log-groups \
--log-group-name-prefix "l-staging-84532" \
--query 'logGroups[].logGroupName' --output table
To find log groups for a specific service across all chains in an environment:
aws logs describe-log-groups \
--query 'logGroups[?contains(logGroupName, `staging`) && contains(logGroupName, `indexer`)].logGroupName' \
--output table
To list all log groups in an environment:
aws logs describe-log-groups \
--log-group-name-prefix "l-staging" \
--query 'logGroups[].logGroupName' --output table
Also check bento prover log groups:
aws logs describe-log-groups \
--log-group-name-prefix "/boundless/bento" \
--query 'logGroups[].logGroupName' --output table
Some services have multiple log groups for different components (e.g. an indexer may have separate groups for the backend worker and the API lambda). When investigating an issue, check all matching log groups.
Service name patterns
Common service name fragments to search for:
| Service | Search fragments |
|---|---|
| Indexer API | indexer-api |
| Indexer backend | indexer, market-indexer, rewards-indexer |
| Order stream | order-stream |
| Order generator | order-generator, og |
| Slasher | slasher |
| Distributor | distributor |
| Signal | prod-8453-signal (no l- prefix) |
| Prover (bento) | /boundless/bento/prover or /boundless/bento/*-prover-* |
Querying Logs
Always filter by time range. Log groups are high-volume and queries without time bounds will be slow or hit limits.
Use aws logs filter-log-events for searching. Key parameters:
- --log-group-name: required
- --start-time / --end-time: Unix milliseconds (required -- always set these)
- --filter-pattern: CloudWatch filter syntax for searching log content
- --output json: pipe through jq for readability
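Passing epoch seconds where milliseconds are expected is an easy mistake with these parameters. The guard below is a minimal sketch, with the hypothetical name check_window_ms, that rejects a window unless both bounds look like epoch milliseconds and the start precedes the end. The 13-digit check is a heuristic that holds for dates between 2001 and 2286.

```shell
# Hypothetical guard: validate a --start-time / --end-time pair before querying.
check_window_ms() {
  # usage: check_window_ms <start_ms> <end_ms>
  for ts in "$1" "$2"; do
    case "$ts" in
      ''|*[!0-9]*) echo "not a non-negative integer: '$ts'" >&2; return 1 ;;
    esac
    # Epoch milliseconds have exactly 13 digits for dates from 2001 to 2286.
    if [ "${#ts}" -ne 13 ]; then
      echo "expected 13-digit epoch milliseconds, got '$ts' (seconds?)" >&2
      return 1
    fi
  done
  if [ "$1" -ge "$2" ]; then
    echo "start must be earlier than end" >&2
    return 1
  fi
}

# Intended use (not executed here):
#   check_window_ms "$START_MS" "$END_MS" && aws logs filter-log-events ...
```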
Computing timestamps
Convert human-readable times to Unix milliseconds:
START=$(date -j -u -f "%Y-%m-%dT%H:%M:%SZ" "2026-03-30T00:00:00Z" +%s 2>/dev/null || date -d "2026-03-30T00:00:00Z" +%s)
START_MS=$((START * 1000))
END=$(date -j -u -f "%Y-%m-%dT%H:%M:%SZ" "2026-03-31T00:00:00Z" +%s 2>/dev/null || date -d "2026-03-31T00:00:00Z" +%s)
END_MS=$((END * 1000))
For relative times:
NOW_MS=$(( $(date +%s) * 1000 ))
ONE_HOUR_AGO_MS=$(( ($(date +%s) - 3600) * 1000 ))
SIX_HOURS_AGO_MS=$(( ($(date +%s) - 21600) * 1000 ))
ONE_DAY_AGO_MS=$(( ($(date +%s) - 86400) * 1000 ))
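For repeated use, the same conversions can be wrapped in functions. This is only a convenience sketch: iso_to_ms and ago_ms are hypothetical names, and the first function relies on the same BSD-then-GNU date fallback shown above.

```shell
# Hypothetical helper: ISO-8601 UTC timestamp -> Unix milliseconds.
iso_to_ms() {
  # usage: iso_to_ms 2026-03-30T00:00:00Z
  # Try BSD/macOS date first, then fall back to GNU date.
  secs=$(date -j -u -f "%Y-%m-%dT%H:%M:%SZ" "$1" +%s 2>/dev/null ||
    date -u -d "$1" +%s) || return 1
  echo $((secs * 1000))
}

# Hypothetical helper: Unix milliseconds for "<seconds> ago".
ago_ms() {
  # usage: ago_ms 3600
  echo $(( ($(date +%s) - $1) * 1000 ))
}

START_MS=$(iso_to_ms "2026-03-30T00:00:00Z")
END_MS=$(iso_to_ms "2026-03-31T00:00:00Z")
echo "window: $(( (END_MS - START_MS) / 3600000 )) hours"
```

Printing the window length is a cheap sanity check that the bounds are in the right order and the right unit.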
Searching by request ID
This is the most common query pattern. Request IDs appear in log messages as hex values (e.g. 0x2a); replace 0xREQUEST_ID below with the actual ID:
aws logs filter-log-events \
--log-group-name "$LOG_GROUP" \
--start-time "$ONE_HOUR_AGO_MS" \
--end-time "$NOW_MS" \
--filter-pattern '"0xREQUEST_ID"' \
--output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
Searching by request digest
aws logs filter-log-events \
--log-group-name "$LOG_GROUP" \
--start-time "$ONE_HOUR_AGO_MS" \
--end-time "$NOW_MS" \
--filter-pattern '"0xDIGEST"' \
--output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
Searching for errors
aws logs filter-log-events \
--log-group-name "$LOG_GROUP" \
--start-time "$ONE_HOUR_AGO_MS" \
--end-time "$NOW_MS" \
--filter-pattern '"ERROR"' \
--output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
Searching across multiple log groups
When a service has multiple log groups, query each one:
for LG in "l-staging-84532-indexer-api-84532-lambda" "l-staging-84532-market-indexer-84532-task"; do
  echo "=== $LG ==="
  aws logs filter-log-events \
    --log-group-name "$LG" \
    --start-time "$ONE_HOUR_AGO_MS" \
    --end-time "$NOW_MS" \
    --filter-pattern '"ERROR"' \
    --output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
  sleep 1  # brief pause between API calls
done
Pagination
filter-log-events returns a nextToken when there are more results:
TOKEN=""
while true; do
  if [ -n "$TOKEN" ]; then
    # Subsequent pages: pass the token from the previous response
    RESP=$(aws logs filter-log-events \
      --log-group-name "$LOG_GROUP" \
      --start-time "$START_MS" \
      --end-time "$END_MS" \
      --filter-pattern '"0xREQUEST_ID"' \
      --next-token "$TOKEN" \
      --output json)
  else
    # First page: no token yet
    RESP=$(aws logs filter-log-events \
      --log-group-name "$LOG_GROUP" \
      --start-time "$START_MS" \
      --end-time "$END_MS" \
      --filter-pattern '"0xREQUEST_ID"' \
      --output json)
  fi
  echo "$RESP" | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
  # Stop when the response carries no nextToken
  TOKEN=$(echo "$RESP" | jq -r '.nextToken // empty')
  [ -z "$TOKEN" ] && break
  sleep 1
done
CloudWatch Filter Pattern Syntax
- "exact phrase" -- matches logs containing the exact phrase (quotes required)
- ?term1 ?term2 -- OR: matches logs containing either term
- "term1" "term2" -- AND: matches logs containing both terms
- "ERROR" "request_id" -- combined terms: matches only logs containing both ERROR and request_id
Checking for Recent Deployments
When investigating fulfillment rate drops, prover downtime, or success rate alarms for provers we operate, always check for recent deployments first. Nightly deployments restart the bento Docker Compose stack and can cause extended outages if the new image is broken.
Deployment events appear in the bento prover log groups (e.g. /boundless/bento/prover-11155111-prod-nightly). Look for these patterns:
aws logs filter-log-events \
--log-group-name "$LOG_GROUP" \
--start-time "$START_MS" \
--end-time "$END_MS" \
--filter-pattern '?"Stopping Docker Compose" ?"Starting Docker Compose"' \
--output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
A deployment cycle looks like:
- "Stopping Docker Compose services" -- old containers torn down
- "Image ghcr.io/boundless-xyz/boundless/broker:<tag> Pulling" -- new image pulled (the tag contains the git commit, e.g. nightly-3b8a71f)
- "Container bento-broker-1 Created" / "Starting" -- new containers come up
- Optionally: "dependency failed to start: container ... is unhealthy" -- a container failed its healthcheck, cascading to broker failure
- If "Starting Docker Compose" is missing or appears long after "Stopping Docker Compose", the broker may be in graceful shutdown drain -- search ?"starting graceful shutdown" ?"in-progress orders to complete" ?"Cancelling critical tasks". The broker waits up to 2h (SHUTDOWN_GRACE_PERIOD_SECS) for committed orders before exiting; during this window bento_active=0 and channel closed errors from the chain monitor are expected, not an outage.
Deployments are significant events -- they restart the broker (causing a brief gap in telemetry and fulfillments even when healthy) and deploy new code that could introduce bugs or behavior changes. Always note when a deployment occurred relative to the issue being investigated.
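When a "Stopping Docker Compose" event has no matching "Starting" event yet, a quick calculation helps decide whether the gap is still inside the 2h graceful shutdown window. The sketch below assumes the event timestamp has already been pulled from the log output. drain_status is a hypothetical helper, and the 2h constant mirrors the SHUTDOWN_GRACE_PERIOD_SECS value quoted above rather than reading the broker's actual configuration.

```shell
# Assumption: grace period of 2h, as stated above, expressed in milliseconds.
SHUTDOWN_GRACE_PERIOD_MS=$((2 * 60 * 60 * 1000))

# Hypothetical helper: compare elapsed time since the "Stopping" event with the grace period.
drain_status() {
  # usage: drain_status <stopping_event_ms> <reference_ms>
  elapsed=$(( $2 - $1 ))
  if [ "$elapsed" -lt 0 ]; then
    echo "invalid"
    return 1
  fi
  if [ "$elapsed" -le "$SHUTDOWN_GRACE_PERIOD_MS" ]; then
    echo "within-grace"   # consistent with a graceful shutdown drain
  else
    echo "past-grace"     # longer than a drain alone would explain
  fi
}

# Intended use: drain_status "$STOP_EVENT_MS" "$(( $(date +%s) * 1000 ))"
```

A "past-grace" result is only a prompt to look closer, for example for healthcheck failures, not proof of an outage.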
If the broker stopped fulfilling shortly after a deployment, check for:
- Healthcheck failures: ?"unhealthy" ?"failed to start" ?"Error dependency" -- a dependency container (often rest_api) failed, preventing the broker from starting
- Container crashes: ?"exit" ?"Exited" ?"Restarting" -- the broker or a dependency crashed after startup
- Image tag: compare the deployed image tag (git commit hash) against the git log to identify what changed
aws logs filter-log-events \
--log-group-name "$LOG_GROUP" \
--start-time "$START_MS" \
--end-time "$END_MS" \
--filter-pattern '?"unhealthy" ?"failed to start" ?"Exited" ?"Error dependency"' \
--output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
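For the image tag comparison, the commit hash can be cut out of the image reference with plain parameter expansion. This sketch assumes tags shaped like the nightly-3b8a71f example, with the short commit after the last hyphen. commit_from_image is a hypothetical helper name.

```shell
# Hypothetical helper: extract the short commit hash from an image reference.
commit_from_image() {
  # usage: commit_from_image ghcr.io/boundless-xyz/boundless/broker:nightly-3b8a71f
  tag="${1##*:}"               # text after the last ':'  -> nightly-3b8a71f
  printf '%s\n' "${tag##*-}"   # text after the last '-'  -> 3b8a71f
}

commit_from_image "ghcr.io/boundless-xyz/boundless/broker:nightly-3b8a71f"

# With the previous and new commits in hand, list what changed (run inside the repo):
#   git log --oneline "$OLD_COMMIT..$NEW_COMMIT"
```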
Secondary Fulfillment in Logs
When a prover locks an order but fails to fulfill it before the lock expires, the order becomes available for secondary fulfillment by any other prover, who earns the slash collateral as reward. In broker logs, secondary fulfillment attempts appear as FulfillAfterLockExpire entries. When investigating expired or slashed requests, search our BP prover logs for the request ID to see if they evaluated the secondary fulfillment opportunity:
aws logs filter-log-events \
--log-group-name "$LOG_GROUP" \
--start-time "$START_MS" \
--end-time "$END_MS" \
--filter-pattern '"0xREQUEST_ID" "FulfillAfterLockExpire"' \
--output json | jq '.events[] | {timestamp: (.timestamp / 1000 | todate), message: .message}'
If the request ID doesn't appear at all, the prover never saw the secondary opportunity. If it appears with skip or error messages, note the reason -- common issues include the order being unprofitable at the slash collateral price, insufficient remaining deadline, or the prover being at capacity. Always check whether our BP provers attempted secondary fulfillment on orders that expired after being locked.
Tips
- Keep time windows as narrow as possible (minutes or hours, not days)
- Start with a request ID filter, then broaden if needed
- Log messages are typically structured (JSON or key=value), so jq is useful for parsing
- If the output is very large, add | head -50 or redirect it to a file
- Use the --limit flag to cap results per API call (default 10000)
- When unsure which log group to query, discover them first with describe-log-groups
- Some services span multiple log groups -- check all matching groups when investigating