researching-performance-problems

Name: Researching Performance Problems
Author: torrust

Updated: 13 April 2026, 15:01

SKILL.md

name	researching-performance-problems
description	Workflow for investigating server and service performance bottlenecks in the torrust-tracker-demo repository. Use when debugging uptime degradation, high load, latency spikes, dropped packets, or suspected capacity limits. Triggers on "performance issue", "high load", "uptime drop", "bottleneck", "capacity", "scaling", "degraded service", "newTrackon uptime".
metadata	{"author":"torrust","version":"1.0"}

Researching Performance Problems

Overview

This skill provides a practical workflow to investigate performance and uptime problems without jumping to conclusions too early.

Use it to gather reproducible evidence, separate signal from noise, and decide whether the root cause is load, network path issues, application behavior, or infrastructure sizing.

When To Use

Use this workflow when any of these symptoms appear:

Uptime drops (for example, external probes reporting availability below the expected SLA)
High load average or CPU saturation
Increased request errors, timeouts, or dropped UDP traffic
Suspected bottlenecks requiring tuning or scale-up decisions

Core Principles

Capture evidence before changing infrastructure.
Keep raw artifacts in issue-scoped evidence folders.
Separate temporary conclusions from confirmed root cause.
Account explicitly for differences between monitoring sources (for example, tracker_stats vs tracker_metrics).
Avoid exposing secrets or client-identifying payloads in stored evidence.

Recommended Workflow

1) Create Issue-Scoped Evidence Folder

Use:

docs/issues/evidence/ISSUE-<N>/

Add one file per capture with:

Context
Exact commands
Raw output (sanitized if needed)
Short notes/findings
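A minimal sketch of scaffolding one evidence file per capture, assuming the date-prefixed naming used by the ISSUE-19 files listed later in this document (YYYY-MM-DD-<slug>.md). The function name new_evidence_file and the Markdown section layout are hypothetical, not part of the repository.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

# Section names follow the checklist above; the Markdown layout itself is an assumption.
SECTIONS = ("Context", "Exact commands", "Raw output (sanitized if needed)", "Short notes/findings")


def new_evidence_file(root: Path, issue: int, slug: str, when: Optional[datetime] = None) -> Path:
    """Create docs/issues/evidence/ISSUE-<N>/<YYYY-MM-DD>-<slug>.md with empty sections."""
    when = when or datetime.now(timezone.utc)
    folder = root / "docs" / "issues" / "evidence" / f"ISSUE-{issue}"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{when:%Y-%m-%d}-{slug}.md"
    lines = [f"# {slug}", "", f"Captured (UTC): {when.isoformat()}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", ""]
    # Mode "x" raises FileExistsError instead of silently overwriting an earlier capture.
    with path.open("x", encoding="utf-8") as fh:
        fh.write("\n".join(lines))
    return path
```

Refusing to overwrite keeps raw artifacts append-only: a second capture on the same day needs a distinct slug.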

2) Collect Baseline Host Snapshot

Capture at minimum:

date -u, uptime, free -h, df -h, ss -u -s
docker compose ps
Snapshots of the top CPU- and memory-consuming processes
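The baseline list above can be captured in one pass with a small runner that records each exact command next to its raw output. This is a sketch under assumptions: it uses ps sorted by %cpu and %mem as a stand-in for the "top process" snapshots (ps reports a lifetime average, so `top -b -n 1` is the better choice for an instantaneous view), and the names run_capture and BASELINE_COMMANDS are hypothetical.

```python
import shlex
import subprocess
from datetime import datetime, timezone

# (command, max output lines or None). The ps invocations assume procps-ng on Linux.
BASELINE_COMMANDS = (
    ("date -u", None), ("uptime", None), ("free -h", None), ("df -h", None),
    ("ss -u -s", None), ("docker compose ps", None),
    ("ps -eo pid,comm,%cpu,%mem --sort=-%cpu", 15),
    ("ps -eo pid,comm,%cpu,%mem --sort=-%mem", 15),
)


def run_capture(commands=BASELINE_COMMANDS, timeout=30.0):
    """Run each command and return the exact commands plus raw output as text.

    Missing binaries, non-zero exits and timeouts are recorded rather than
    raised, so one broken tool does not lose the rest of the snapshot.
    """
    lines = [f"Captured (UTC): {datetime.now(timezone.utc).isoformat()}", ""]
    for cmd, max_lines in commands:
        lines.append(f"$ {cmd}")
        status = None
        try:
            result = subprocess.run(shlex.split(cmd), capture_output=True, text=True, timeout=timeout)
            output = (result.stdout + result.stderr).rstrip().splitlines()
            status = result.returncode
        except FileNotFoundError:
            output = ["[command not found]"]
        except subprocess.TimeoutExpired:
            output = [f"[timed out after {timeout}s]"]
        if max_lines is not None and len(output) > max_lines:
            dropped = len(output) - max_lines
            output = output[:max_lines] + [f"[{dropped} more lines truncated]"]
        if status:  # non-zero exit status
            output.append(f"[exit status {status}]")
        lines += output + [""]
    return "\n".join(lines)
```

The returned text can be pasted under "Exact commands" and "Raw output" in the evidence file after sanitizing.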

3) Collect Kernel and Network Pressure Signals

Common checks:

UDP counters (/proc/net/snmp, nstat)
Interface errors and drops (ip -s link)
Packet path checks during incidents (tcpdump)
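Because /proc/net/snmp holds cumulative counters, a single read says little; two reads a few seconds apart show whether UDP errors are growing right now. A minimal sketch, assuming the standard Linux layout in which each protocol has a header line of field names followed by a value line with the same prefix (nstat reports the same counters). The helper names are hypothetical.

```python
import time
from pathlib import Path

# UDP fields most relevant to dropped traffic; RcvbufErrors points at receive-buffer pressure.
WATCHED = ("InDatagrams", "NoPorts", "InErrors", "RcvbufErrors", "SndbufErrors")


def parse_snmp(text):
    """Parse /proc/net/snmp into {"Udp": {"InDatagrams": 123, ...}, ...}."""
    tables = {}
    lines = text.strip().splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, _, names = header.partition(":")
        vproto, _, nums = values.partition(":")
        if proto != vproto:
            raise ValueError(f"unexpected layout near {proto!r}")
        tables[proto] = dict(zip(names.split(), map(int, nums.split())))
    return tables


def udp_deltas(before, after, keys=WATCHED):
    """Counter growth between two snapshots of /proc/net/snmp."""
    b, a = parse_snmp(before)["Udp"], parse_snmp(after)["Udp"]
    return {k: a[k] - b[k] for k in keys if k in a and k in b}


def sample_udp(interval=10.0, path="/proc/net/snmp"):
    """Linux only: read the counters twice, `interval` seconds apart."""
    before = Path(path).read_text()
    time.sleep(interval)
    after = Path(path).read_text()
    return udp_deltas(before, after)
```

A non-zero RcvbufErrors delta during an incident window is a cue to look at socket receive-buffer sizing before assuming an application fault.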

4) Collect Application and Service Signals

Service-level counters and errors from Prometheus
Aggregated log category counts (avoid raw sensitive payload dumps)
Container-level CPU/memory/network (docker stats --no-stream)
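Aggregated log categories can be produced without keeping payloads: count lines per level and, for warnings and errors, count a masked message template instead of the raw line. This sketch assumes the log lines contain an uppercase level token (as tracing-style output usually does, for example from `docker compose logs --since 1h <service>`); the masking rules and function name are illustrative assumptions, not the repository's tooling.

```python
import re
from collections import Counter

LEVEL_RE = re.compile(r"\b(ERROR|WARN|INFO|DEBUG|TRACE)\b")
# Mask variable parts so only the message "shape" is counted (assumed patterns).
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<ip>"),  # IPv4 with optional port
    (re.compile(r"\b[0-9a-fA-F]{16,}\b"), "<hex>"),                 # long hex ids and hashes
    (re.compile(r"\d+"), "<n>"),                                    # any remaining number
]


def categorize(lines, top=10):
    """Return (counts per level, most common masked WARN/ERROR templates)."""
    levels = Counter()
    templates = Counter()
    for line in lines:
        m = LEVEL_RE.search(line)
        if not m:
            levels["UNKNOWN"] += 1
            continue
        level = m.group(1)
        levels[level] += 1
        if level in ("ERROR", "WARN"):
            msg = line[m.end():].strip()  # drops the timestamp that precedes the level
            for pattern, token in MASKS:
                msg = pattern.sub(token, msg)
            templates[(level, msg[:120])] += 1
    return levels, templates.most_common(top)
```

Only the counters and masked templates would go into the evidence file; IPv6 addresses are not masked here, so the output still needs a sanitization pass.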

5) Use Prometheus Deliberately

Validate scrape source mapping before analysis.
Prefer counter-style series (*_total) for increases/rates.
Use query_range over 24h to 72h windows for time correlation, rather than relying only on point-in-time queries.
Explicitly document caveats when metric names suggest gauge-like behavior.
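A sketch of the two points above, assuming Prometheus is reachable at a base URL such as http://localhost:9090 (an assumption) and using a placeholder metric, example_errors_total, rather than a real tracker metric name. It builds a /api/v1/query_range request over a 72h window and shows why counter series are preferred: a drop in a *_total series is treated as a reset (for example a restart), so the increase is still computed correctly. Unlike PromQL increase(), this sum does not extrapolate to the window edges.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def query_range_url(base, promql, hours=72, step="5m", end=None):
    """Build a Prometheus HTTP API query_range URL covering the last `hours` hours."""
    end = end if end is not None else time.time()
    params = {"query": promql, "start": f"{end - hours * 3600:.0f}", "end": f"{end:.0f}", "step": step}
    return f"{base.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


def fetch_matrix(url, timeout=30):
    """Return the list of series ({"metric": ..., "values": [[ts, "v"], ...]})."""
    with urlopen(url, timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


def counter_increase(values):
    """Sum counter growth across samples, treating any decrease as a counter reset."""
    total, prev = 0.0, None
    for _, raw in values:
        v = float(raw)
        if prev is not None:
            total += v - prev if v >= prev else v  # after a reset the counter restarts near 0
        prev = v
    return total
```

If a series named like a counter ever decreases without a restart, it is behaving like a gauge, and that caveat belongs in the evidence file.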

6) Correlate with External Uptime

Correlate internal metrics with external probe windows:

newTrackon status changes
Synthetic probes from multiple locations
Host/network pressure windows
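Correlating the sources above mostly means checking whether time windows overlap. A minimal sketch, assuming the external "down" windows (for example transcribed from newTrackon status changes) and internal pressure windows (for example from Prometheus range queries) have already been reduced to (start, end) datetime pairs in the same time zone; the function names and the 5-minute slack are hypothetical choices.

```python
from datetime import timedelta


def overlaps(a_start, a_end, b_start, b_end, slack=timedelta(0)):
    """True if two windows overlap or are separated by at most `slack`."""
    return a_start <= b_end + slack and b_start <= a_end + slack


def correlate(probe_down, pressure, slack=timedelta(minutes=5)):
    """Pair each external down window with the internal pressure windows near it."""
    return [
        ((d_start, d_end), [(p_start, p_end) for p_start, p_end in pressure
                            if overlaps(d_start, d_end, p_start, p_end, slack)])
        for d_start, d_end in probe_down
    ]


def explained_fraction(correlated):
    """Share of down windows that coincide with at least one pressure window."""
    if not correlated:
        return 0.0
    return sum(1 for _, hits in correlated if hits) / len(correlated)
```

The slack absorbs probe scheduling and clock skew; a low explained fraction suggests looking beyond host pressure, for example at the network path.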

7) Decide on Remediation Path

Tune first if the bottleneck is clearly at the configuration level.
Scale up if pressure is persistent and tuning is insufficient.
Validate the impact of each change using the same evidence method.
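One way to keep before/after validation honest is to compare the same ratio over equal-length windows captured the same way. A minimal sketch, assuming errors and requests are counter increases (for example from the query_range approach in step 5) over each window; the dictionary keys and function names are hypothetical.

```python
def error_ratio(errors, requests):
    """Errors per request; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def compare_windows(before, after):
    """Compare error ratios before and after a change.

    `before` and `after` are dicts with "window_s", "errors" and "requests".
    Windows of different lengths are rejected, since they are not comparable evidence.
    """
    if before["window_s"] != after["window_s"]:
        raise ValueError("compare equal-length windows captured the same way")
    b = error_ratio(before["errors"], before["requests"])
    a = error_ratio(after["errors"], after["requests"])
    return {"before": b, "after": a, "delta": a - b}
```

A negative delta supports the change, but it is only one signal; external uptime over the same windows should move in the same direction.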

Common Things To Check

CPU contention between reverse proxy and tracker process.
UDP receive buffer errors and queue/backlog pressure.
Request error ratio trend by protocol and port.
IPv4/IPv6 differences in failure behavior.
Restart windows vs uptime drops (restart alone rarely explains large drops).
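Breaking the error ratio down by protocol, port and IP family covers two of the checks above at once. This sketch assumes aggregated counter increases (not per-request logs, which would conflict with the safety notes) already labelled with protocol, port and family; the sample ports are illustrative values, not confirmed configuration.

```python
from collections import defaultdict


def error_ratios(samples):
    """Aggregate error ratio per (protocol, port, family).

    `samples` is an iterable of dicts with "protocol", "port", "family",
    "errors" and "requests", where the last two are counter increases.
    Returns None for groups that saw no requests.
    """
    sums = defaultdict(lambda: [0, 0])  # group -> [errors, requests]
    for s in samples:
        key = (s["protocol"], s["port"], s["family"])
        sums[key][0] += s["errors"]
        sums[key][1] += s["requests"]
    return {key: (e / n if n else None) for key, (e, n) in sorted(sums.items())}
```

Computing the ratio per window (for example per hour) and comparing the resulting series shows whether one protocol or family degrades first.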

Output Template (per investigation phase)

Create a progress/conclusions note containing:

What was done
What was learned
Temporary conclusions
Candidate actions (immediate, short-term tuning, scaling)
Open questions
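The template above can be rendered as a skeleton note so every investigation phase records the same five sections. A minimal sketch; the heading format and the progress_note name are assumptions, loosely modelled on the ISSUE-19 progress file listed below.

```python
from datetime import date

NOTE_SECTIONS = (
    "What was done",
    "What was learned",
    "Temporary conclusions",
    "Candidate actions (immediate, short-term tuning, scaling)",
    "Open questions",
)


def progress_note(issue, phase, entries, today=None):
    """Render a progress/conclusions note; sections without entries get a placeholder."""
    today = today or date.today()
    out = [f"# ISSUE-{issue}: {phase} ({today.isoformat()})", ""]
    for section in NOTE_SECTIONS:
        out += [f"## {section}", ""]
        out += [f"- {item}" for item in entries.get(section, [])] or ["- (none yet)"]
        out.append("")
    return "\n".join(out)
```

Keeping empty sections visible makes it obvious when, for example, temporary conclusions have not yet been separated from open questions.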

Example From This Repository

Use ISSUE-19 evidence as a reference implementation:

docs/issues/evidence/ISSUE-19/2026-04-13-baseline-server-snapshot.md
docs/issues/evidence/ISSUE-19/2026-04-13-host-kernel-tracker-sanitized-diagnostics.md
docs/issues/evidence/ISSUE-19/2026-04-13-prometheus-source-mapping.md
docs/issues/evidence/ISSUE-19/2026-04-13-prometheus-3h-summaries.md
docs/issues/evidence/ISSUE-19/2026-04-13-progress-and-temporary-conclusions.md

Safety Notes

Do not commit secrets, tokens, credentials, or private keys.
Avoid storing raw logs with full client-identifying request payloads when not strictly necessary.
Prefer aggregated counts when raw logs are high-volume and privacy-sensitive.
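Before raw output is committed, a redaction pass can strip the most common client-identifying fields. This is a sketch under assumptions: it targets IPv4 and IPv6 addresses, 40-character hex strings (such as hex-encoded info-hashes), and a hypothetical list of sensitive query parameters. It is not exhaustive, so the result still needs a manual review.

```python
import re
from ipaddress import IPv6Address

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Broad IPv6 candidates; each one is validated so timestamps like 12:00:01 are kept.
IPV6_CANDIDATE_RE = re.compile(r"[0-9A-Fa-f]*:[0-9A-Fa-f:]+")
# Query parameters assumed to be sensitive in announce URLs or API calls.
QUERY_SECRET_RE = re.compile(r"\b(info_hash|peer_id|key|token|passkey)=[^&\s]+", re.IGNORECASE)
HEX_ID_RE = re.compile(r"\b[0-9a-fA-F]{40}\b")


def _mask_ipv6(match):
    text = match.group(0)
    try:
        IPv6Address(text)
    except ValueError:
        return text  # not an address, leave it untouched
    return "<ipv6>"


def sanitize(text):
    """Redact likely client-identifying values from captured output."""
    text = QUERY_SECRET_RE.sub(lambda m: f"{m.group(1)}=<redacted>", text)
    text = HEX_ID_RE.sub("<hash>", text)
    text = IPV4_RE.sub("<ipv4>", text)
    return IPV6_CANDIDATE_RE.sub(_mask_ipv6, text)
```

Ports are kept on purpose, since the error breakdown by port is still useful evidence after redaction.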

Related Skills (same repository)

check-udp-conntrack.md

from "torrust/torrust-tracker-demo"

Workflow for checking whether UDP packet loss or uptime degradation may be caused by conntrack saturation on the torrust-tracker-demo server. Use when diagnosing UDP timeouts, low newTrackon uptime, packet drops, conntrack pressure, UDP receive-buffer errors, or when validating whether conntrack tuning is still healthy.

open-github-issue.md

from "torrust/torrust-tracker-demo"

Step-by-step process for creating and opening a GitHub issue in the torrust-tracker-demo repository. Use when asked to open, create, or file an issue. Covers writing the draft file, human review, opening on GitHub, renaming the file, and committing. Triggers on "open issue", "create issue", "new issue", "file issue", "draft issue".

scale-up-server.md

from "torrust/torrust-tracker-demo"

Step-by-step workflow for resizing (scaling up) the Hetzner server in the torrust-tracker-demo stack. Use when asked to resize, scale up, or upgrade the server plan. Covers pre-resize preparation, graceful shutdown, provider panel action, post-resize recovery, and evidence capture. Triggers on "resize server", "scale up", "upgrade server plan", "Hetzner resize", "change server type".

open-pull-request.md

from "torrust/torrust-tracker-demo"

Create and open a pull request in the torrust-tracker-demo repository. Use when asked to open a PR, create a pull request, submit a PR, or push changes and create a PR. Triggers on "open PR", "create PR", "submit PR", "push and open PR", "open pull request".

commit.md

from "torrust/torrust-tracker-demo"

Guide for committing changes in the torrust-tracker-demo repository. Covers running all linters locally before committing to ensure CI passes. Triggers on "commit", "how to commit", "before committing", "pre-commit checks", "run linters", "check before commit".

create-issue-branch.md

from "torrust/torrust-tracker-demo"

Create a git branch to start working on a GitHub issue in the torrust-tracker-demo repository. Use when asked to start working on an issue, create a branch for an issue, or check out a new branch. Triggers on "create branch", "new branch", "start working on issue", "branch for issue", "checkout branch".
