truefoundry-monitor

Last updated: April 15, 2026 at 05:35

Monitors TrueFoundry deployment rollouts after deploy/apply. Polls status, checks pod health and readiness, fetches logs on failure, and reports a final summary. Use after deploying or applying a manifest to track rollout progress.

Source: truefoundry/tfy-deploy-skills

name	truefoundry-monitor
description	Monitors TrueFoundry deployment rollouts after deploy/apply. Polls status, checks pod health and readiness, fetches logs on failure, and reports a final summary. Use after deploying or applying a manifest to track rollout progress.
license	MIT
compatibility	Requires Bash, curl, and access to a TrueFoundry instance
allowed-tools	Bash(/tfy-api.sh )

Routing note: For ambiguous user intents, use the shared clarification templates in references/intent-clarification.md.

Monitor Deployment

Track a TrueFoundry deployment rollout to completion, reporting status at each stage and diagnosing failures automatically.

When to Use

After tfy apply or tfy deploy to track rollout progress
User says "monitor", "watch deployment", "is my deploy done", "check rollout"
Called automatically by the deploy skill after a successful apply/deploy

When NOT to Use

User wants to deploy → prefer deploy skill; ask if the user wants another valid path
User wants to list all apps → prefer applications skill; ask if the user wants another valid path
User wants to read historical logs → prefer logs skill; ask if the user wants another valid path

CRITICAL BEHAVIOR RULES

RULE 1: Once monitoring starts, you MUST poll until a terminal state or timeout. Do NOT stop early. Do NOT ask the user "should I keep checking?" — just keep checking.

RULE 2: Do NOT end your response while the deployment is in a non-terminal state (INITIALIZED, BUILDING, BUILD_SUCCESS, ROLLOUT_STARTED, DEPLOY_FAILED_WITH_RETRY) unless the 10-minute timeout has been reached. If you are about to stop and the status is non-terminal, you are violating this rule; continue polling.

RULE 3: Between each poll, briefly tell the user what you're waiting for. Do NOT silently loop, but also do NOT ask for permission to continue.

Required Information

Before monitoring, you need:

Workspace FQN (TFY_WORKSPACE_FQN) — HARD RULE: Never auto-pick. Always ask the user to confirm.
Application name — the service or job name being deployed

If invoked right after a deploy, both should already be known from the deploy context.

Execution Priority

For all status checks, use MCP tool calls first:

tfy_applications_list(filters={"workspace_fqn": "WORKSPACE_FQN", "application_name": "APP_NAME"})

If MCP tool calls are unavailable, fall back to direct API via tfy-api.sh.

When using direct API, set TFY_API_SH to the full path of this skill's scripts/tfy-api.sh. See references/tfy-api-setup.md for paths per agent.

Monitoring Flow

Step 1: Initial Status Check

TFY_API_SH=~/.claude/skills/truefoundry-monitor/scripts/tfy-api.sh
bash "$TFY_API_SH" GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=APP_NAME'

Extract from the response at data[0] (the application object):

deployment.currentStatus.status — the deployment status enum
deployment.currentStatus.transition — current transition (e.g., BUILDING, DEPLOYING)
deployment.currentStatus.state.isTerminalState — boolean, most reliable terminal check
deployment.currentStatus.state.display — human-readable state

Step 2: Poll Until Terminal State

The API response has two key fields: status (the deployment status) and transition (what's happening now). Use state.isTerminalState as the authoritative check for whether to stop polling.

Status values (from deployment.currentStatus.status):

| Status | Terminal? | Action |
| --- | --- | --- |
| `INITIALIZED` | No | Report "Deployment initialized, waiting...", continue polling |
| `BUILDING` | No | Report "Build in progress", continue polling |
| `BUILD_SUCCESS` | No | Report "Build succeeded, deploying...", continue polling |
| `BUILD_FAILED` | Yes | Fetch build logs, report failure |
| `ROLLOUT_STARTED` | No | Report "Rollout started", continue polling |
| `DEPLOY_SUCCESS` | Yes | Report success with endpoint URL |
| `DEPLOY_FAILED` | Yes | Fetch pod logs, diagnose failure |
| `DEPLOY_FAILED_WITH_RETRY` | No | Report "Deploy failed, retrying...", continue polling |
| `PAUSED` | Yes | Report paused/stopped |
| `FAILED` | Yes | Report general failure |
| `CANCELLED` | Yes | Report cancelled |
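
As a rough sketch, the status table above can be encoded as a lookup that returns the progress message for each status. The function name `status_message` and the wording used for the terminal states are illustrative assumptions, not part of tfy-api.sh; the decision to stop polling should still come from `state.isTerminalState`.

```shell
# Map a value of deployment.currentStatus.status to a short progress message.
# Hypothetical helper: messages for non-terminal states follow the table,
# messages for terminal states are paraphrased for illustration.
status_message() {
  case "$1" in
    INITIALIZED)              echo "Deployment initialized, waiting..." ;;
    BUILDING)                 echo "Build in progress" ;;
    BUILD_SUCCESS)            echo "Build succeeded, deploying..." ;;
    BUILD_FAILED)             echo "Build failed" ;;
    ROLLOUT_STARTED)          echo "Rollout started" ;;
    DEPLOY_SUCCESS)           echo "Deployment succeeded" ;;
    DEPLOY_FAILED)            echo "Deployment failed" ;;
    DEPLOY_FAILED_WITH_RETRY) echo "Deploy failed, retrying..." ;;
    PAUSED)                   echo "Deployment paused" ;;
    FAILED)                   echo "Deployment failed (general failure)" ;;
    CANCELLED)                echo "Deployment cancelled" ;;
    *)                        echo "Unknown status: $1" ;;
  esac
}
```

The catch-all branch keeps the monitor reporting something useful if the API returns a status that is not listed in the table.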

Transition values (from deployment.currentStatus.transition):

| Transition | Meaning |
| --- | --- |
| `BUILDING` | Image build is in progress |
| `DEPLOYING` | Pods are being created/updated |
| `REUSING_EXISTING_BUILD` | Skipping build, reusing cached image |
| `COMPONENTS_DEPLOYING` | Multi-component deployment in progress |
| `WAITING` | Waiting for resources |

Best practice: Always check deployment.currentStatus.state.isTerminalState === true to decide whether to stop polling, rather than matching individual status strings. The state.display field gives a human-friendly label.
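
A minimal sketch of that check, assuming `jq` is available (the skill itself only lists Bash and curl as requirements, so `jq` is an added dependency here). The helper names `is_terminal` and `current_status` are hypothetical; both read the `/api/svc/v1/apps` response on stdin and look only at `data[0]`.

```shell
# Assumption: jq is installed. Both helpers read the API response on stdin.

# Exit 0 only when data[0] reports isTerminalState == true.
# A false or missing field, or unparsable JSON, gives a non-zero exit,
# so the caller keeps polling.
is_terminal() {
  jq -e '.data[0].deployment.currentStatus.state.isTerminalState == true' >/dev/null
}

# Print the status enum of data[0], for example ROLLOUT_STARTED.
current_status() {
  jq -r '.data[0].deployment.currentStatus.status'
}

# Usage sketch (not executed here):
#   response=$(bash "$TFY_API_SH" GET '/api/svc/v1/apps?workspaceFqn=WORKSPACE_FQN&applicationName=APP_NAME')
#   if printf '%s' "$response" | is_terminal; then
#     printf '%s' "$response" | current_status
#   fi
```

Treating unparsable output as non-terminal is a design choice in this sketch: a transient API error then leads to another poll rather than a premature stop, and the 10-minute timeout still bounds the loop.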

Polling schedule:

First 2 minutes: check every 15 seconds
Minutes 2-5: check every 30 seconds
After 5 minutes: check every 60 seconds
Timeout after 10 minutes — report current state and suggest the user check manually
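
The schedule above could be expressed as a small helper that maps elapsed seconds to the next sleep interval. This is only a sketch: the name `poll_interval` is hypothetical, and the handling of the exact boundaries (120 s moves to the 30-second tier, 300 s to the 60-second tier, 600 s counts as timed out) is an assumption, since the schedule does not say which side the boundaries fall on.

```shell
# Print the number of seconds to wait before the next poll, given the
# seconds elapsed since monitoring started. Returns 1 (and prints nothing)
# once the 10-minute timeout has been reached.
poll_interval() {
  if [ "$1" -ge 600 ]; then
    return 1          # timeout: caller should report current state and stop
  elif [ "$1" -lt 120 ]; then
    echo 15           # first 2 minutes
  elif [ "$1" -lt 300 ]; then
    echo 30           # minutes 2-5
  else
    echo 60           # after 5 minutes
  fi
}
```

A polling loop would call `poll_interval "$elapsed"` before each sleep and exit the loop when it returns non-zero, then print the timeout summary.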

Between polls, tell the user what you're waiting for. Do not silently loop. Do NOT ask "should I continue?" — just continue.

Step 3: On Success

When state.isTerminalState is true and status is DEPLOY_SUCCESS:

Report the final status
Show replicas ready (e.g., "2/2 replicas ready")
Show the endpoint URL if the service has an exposed port
Optionally run a quick health check on the endpoint:

# Only if the service exposes an HTTP port
curl -sf -o /dev/null -w '%{http_code}' "https://ENDPOINT_URL/health" || true

Report the HTTP status code. Do not fail the monitor if the health check fails — just report it.

Step 4: On Failure

When status is BUILD_FAILED, DEPLOY_FAILED, FAILED, or CANCELLED:

Fetch recent logs using the logs skill or direct API:

# Get the app ID first from the status response
TFY_API_SH=~/.claude/skills/truefoundry-monitor/scripts/tfy-api.sh

# Fetch recent logs (last 5 minutes)
bash "$TFY_API_SH" GET '/api/svc/v1/logs/WORKSPACE_ID/download?applicationFqn=APP_FQN&startTs=START_TS&endTs=END_TS'

Identify the failure cause from the logs (OOMKilled, CrashLoopBackOff, ImagePullBackOff, port mismatch, etc.)
Suggest a fix based on the error:

| Error Pattern | Suggested Fix |
| --- | --- |
| `OOMKilled` | Increase `memory_limit` in manifest |
| `CrashLoopBackOff` | Check startup command and logs for crash reason |
| `ImagePullBackOff` | Verify image URI and registry credentials |
| Port mismatch | Ensure manifest port matches what the app listens on |
| `Readiness probe failed` | Check health probe path and startup time |
| Build error | Check Dockerfile and build logs |

Report summary with: error type, relevant log excerpt (max 20 lines), and suggested fix
Do NOT auto-retry. Present the diagnosis and let the user decide next steps.
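
One way to sketch the pattern-to-fix lookup from the table above is a `case` match over the fetched log text. The function name `suggest_fix`, the first-match-wins ordering, and the fallback message are assumptions for illustration. Port mismatches and build errors are left out because the table gives no literal log string for them; they still need a manual read of the logs.

```shell
# Print a suggested fix for the first known error pattern found in the
# log text passed as $1 (may be multi-line). Hypothetical helper.
suggest_fix() {
  case "$1" in
    *OOMKilled*)                echo "Increase memory_limit in manifest" ;;
    *CrashLoopBackOff*)         echo "Check startup command and logs for crash reason" ;;
    *ImagePullBackOff*)         echo "Verify image URI and registry credentials" ;;
    *"Readiness probe failed"*) echo "Check health probe path and startup time" ;;
    *)                          echo "No known pattern matched; review the full logs" ;;
  esac
}
```

The output is only a suggestion to present alongside the log excerpt; consistent with the rule above, nothing here retries or redeploys.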

Presenting Status Updates

Use a consistent format for each status update:

Monitoring: my-service in cluster:workspace
Status: ROLLOUT_STARTED | Transition: DEPLOYING
Display: Deploying (1/2 replicas ready)
Elapsed: 45s
Next check in 15s...
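
For illustration, the update format above could be produced by a pair of small helpers. The names `format_elapsed` and `print_status_update` and the argument order are hypothetical; the elapsed-time style ("45s", "1m 32s") follows the examples in this section.

```shell
# Render a number of seconds as "45s" or "1m 32s".
format_elapsed() {
  if [ "$1" -lt 60 ]; then
    printf '%ss\n' "$1"
  else
    printf '%sm %ss\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
  fi
}

# Print one status update.
# Args: app name, workspace, status, transition, display text,
#       elapsed seconds, seconds until next check.
print_status_update() {
  printf 'Monitoring: %s in %s\n' "$1" "$2"
  printf 'Status: %s | Transition: %s\n' "$3" "$4"
  printf 'Display: %s\n' "$5"
  printf 'Elapsed: %s\n' "$(format_elapsed "$6")"
  printf 'Next check in %ss...\n' "$7"
}
```

Calling `print_status_update my-service cluster:workspace ROLLOUT_STARTED DEPLOYING "Deploying (1/2 replicas ready)" 45 15` reproduces the sample update shown above.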

Final summary on success:

Deployment complete: my-service
Status: DEPLOY_SUCCESS
Replicas: 2/2 ready
Endpoint: https://my-service-ws.example.com
Health check: 200 OK
Total time: 1m 32s

Final summary on failure:

Deployment failed: my-service
Status: DEPLOY_FAILED
Error: CrashLoopBackOff — container exited with code 1
Log excerpt:
  > ModuleNotFoundError: No module named 'flask'
Suggested fix: Add 'flask' to requirements.txt and redeploy

<success_criteria>

Success Criteria

Deployment status is tracked from current state to a terminal state
User sees clear progress updates at each polling interval
On success: replicas, endpoint URL, and optional health check are reported
On failure: logs are fetched, root cause is identified, and a fix is suggested
Monitor times out gracefully after 10 minutes with a status summary
The user is never left waiting without feedback

</success_criteria>

Composability

Before monitoring: Use deploy skill to deploy, then monitor
On failure: Use logs skill for deeper log analysis
Check app details: Use applications skill for full app info
Fix and redeploy: Use deploy skill to apply fixes

Error Handling

Application Not Found

Application "APP_NAME" not found in workspace "WORKSPACE_FQN".
Check:
- Application name is spelled correctly
- The deploy/apply command completed successfully
- You're checking the correct workspace

Timeout

Monitoring timed out after 10 minutes.
Current status: ROLLOUT_STARTED | Transition: DEPLOYING
The deployment is still in progress. Check manually:
- TrueFoundry dashboard: TFY_BASE_URL
- Or run this skill again to resume monitoring

Permission Denied

Cannot access this application. Check your API key permissions for this workspace.