| name | hls |
| description | High-Level Synthesis: C/C++ algorithm analysis, HLS directive optimisation, synthesis execution, and co-simulation verification. Use when converting C/C++ to synthesisable RTL, optimising for latency/throughput/area targets using pragmas, or verifying that generated RTL matches the golden C model. |
| version | 1.0.0 |
| author | chuanseng-ng |
| license | MIT |
| allowed-tools | Read, Write, Bash |
Skill: High-Level Synthesis (HLS)
Invocation
- If invoked by a user presenting an HLS task: immediately spawn the
digital-chip-design-agents:hls-orchestrator agent and pass the full user
request and any available context. Do not execute stages directly.
- If invoked by the hls-orchestrator mid-flow: do not spawn a new agent.
Treat this file as read-only — return the requested stage rules, sign-off
criteria, or loop-back guidance to the calling orchestrator.
Spawning the orchestrator from within an active orchestrator run causes recursive
delegation and must never happen.
Pre-run Context
Before executing or advising on any stage, read the following files if they exist:
memory/hls/knowledge.md — known failure patterns, successful tool flags, PDK/tool quirks.
Incorporate its guidance into every stage decision. If absent, proceed without it.
memory/hls/run_state.md — current run identity (run_id, design_name, tool,
last_stage). Use this to resume correctly after interruption. If absent, a new run
is starting; the orchestrator will create this file before the first stage.
This pre-run read applies whether this skill is loaded by a user or called by the
orchestrator mid-flow. It ensures the fix database is consulted before any diagnosis step.
Purpose
Convert C/C++/SystemC algorithmic descriptions to synthesisable RTL.
Covers algorithm analysis for HLS compatibility, pragma/directive optimisation,
and co-simulation to verify RTL matches the golden C model.
Supported EDA Tools
Open-Source
- Bambu HLS (bambu): open-source HLS from Politecnico di Milano
- LegUp HLS: FPGA-targeted HLS built on LLVM
- Calyx / Futil: infrastructure for HLS compilers (academic)
- MLIR/CIRCT (circt-opt): compiler infrastructure for hardware design
Proprietary
- Xilinx Vitis HLS (vitis_hls): C/C++ to RTL for AMD/Xilinx devices
- Cadence Stratus (stratus): SystemC/C++ HLS for ASIC and FPGA
- Siemens Catapult (catapult): algorithmic synthesis from C++/SystemC
Stage: algorithm_analysis
HLS-Hostile Patterns (must fix before synthesis)
- Dynamic memory (malloc/new) → replace with fixed-size static arrays
- Recursive functions → convert to iterative with explicit stack
- Pointer aliasing → use the restrict qualifier (C99; __restrict in most C++ compilers) or restructure accesses
- System calls (printf, file I/O) → wrap in #ifndef __SYNTHESIS__ ... #endif
- Function pointers → replace with switch/case dispatch
- Data-dependent loop bounds → add maximum bound + early-exit flag
- Floating-point → evaluate fixed-point (ap_fixed<W,I> for Vitis HLS)
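As a hedged illustration of two of the rewrites listed above (a data-dependent loop bound and a system call), the sketch below scans a buffer for a zero terminator. The names `MAX_LEN` and `find_terminator` and the zero-terminator convention are assumptions made for this example; they do not come from any HLS tool.

```cpp
#include <cassert>
#include <cstdio>

// Assumed compile-time upper bound on the trip count (hypothetical value).
constexpr int MAX_LEN = 64;

// Returns the index of the first zero in buf, or MAX_LEN if there is none.
// The loop bound is the constant MAX_LEN; the data-dependent exit is
// expressed as a flag that masks the remaining iterations.
int find_terminator(const int buf[MAX_LEN]) {
    int index = MAX_LEN;
    bool found = false;
    for (int i = 0; i < MAX_LEN; ++i) {
        if (!found && buf[i] == 0) {
            index = i;
            found = true;
        }
    }
#ifndef __SYNTHESIS__
    // Host-only check and debug output, compiled out when __SYNTHESIS__ is defined.
    assert(index >= 0 && index <= MAX_LEN);
    std::printf("terminator index = %d\n", index);
#endif
    return index;
}
```

Because the trip count is fixed, the tool can report a static latency for the loop. The cost is that all MAX_LEN iterations always execute, so this form suits cases where a predictable schedule matters more than average-case speed.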
Analysis Steps
- Identify innermost critical loop — the performance bottleneck
- Analyse loop-carried dependencies — limit achievable II
- Classify memory access: sequential (burst-able) vs random (expensive)
- Calculate latency bounds: trip_count × body_latency for the unpipelined loop, and (trip_count − 1) × II + body_latency once the loop is pipelined
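The arithmetic behind these analysis steps can be sketched as three small helpers. This is a minimal sketch under textbook assumptions (a single loop, a fixed body latency, one dominant loop-carried dependency); the function names are hypothetical, and real tools also apply resource constraints that are not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Unpipelined loop: each iteration finishes before the next one starts.
inline int64_t sequential_latency(int64_t trip_count, int64_t body_latency) {
    assert(trip_count >= 0 && body_latency >= 0);
    return trip_count * body_latency;
}

// Pipelined loop: a new iteration starts every `ii` cycles, and the last
// iteration still needs the full body latency to drain.
inline int64_t pipelined_latency(int64_t trip_count, int64_t body_latency, int64_t ii) {
    assert(trip_count >= 0 && body_latency >= 0 && ii >= 1);
    return trip_count == 0 ? 0 : (trip_count - 1) * ii + body_latency;
}

// Lower bound on II from a loop-carried dependency: a value that takes
// `dep_latency` cycles to produce is consumed `distance` iterations later.
inline int64_t recurrence_min_ii(int64_t dep_latency, int64_t distance) {
    assert(dep_latency >= 1 && distance >= 1);
    return (dep_latency + distance - 1) / distance;  // ceiling division
}
```

For a 1024-iteration loop with a 5-cycle body, `sequential_latency(1024, 5)` gives 5120 cycles and `pipelined_latency(1024, 5, 1)` gives 1028 cycles. An accumulator whose adder takes 4 cycles and feeds the very next iteration gives `recurrence_min_ii(4, 1) == 4`, which is the kind of dependency that prevents II=1.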
QoR Metrics to Evaluate
- All HLS-hostile patterns resolved
- Critical loop identified with dependency graph
- Theoretical II lower bound computed
Output Required
- Algorithm analysis report
- Fixed-point type recommendations (if applicable)
- Critical loop dependency graph
Stage: directive_planning
Pipelining and Throughput
#pragma HLS PIPELINE II=1
#pragma HLS DATAFLOW
#pragma HLS LOOP_FLATTEN
#pragma HLS LOOP_MERGE
Latency and Unrolling
#pragma HLS UNROLL factor=4
#pragma HLS UNROLL
Memory and Interfaces
#pragma HLS ARRAY_PARTITION variable=buf cyclic factor=4
#pragma HLS INTERFACE mode=axis port=data
#pragma HLS INTERFACE mode=m_axi port=mem
#pragma HLS INTERFACE mode=s_axilite port=ctrl
Resource Binding
#pragma HLS BIND_OP variable=prod op=mul impl=dsp
#pragma HLS ALLOCATION operation instances=mul limit=4
Strategy by Target
| Target | Primary Directives |
|---|---|
| Low latency | UNROLL + PIPELINE II=1 |
| High throughput | PIPELINE + DATAFLOW + ARRAY_PARTITION |
| Low area | ALLOCATION limits + no UNROLL |
| Balanced | PIPELINE II=1 inner loop + ARRAY_PARTITION |
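To make the "Balanced" row concrete, here is a hedged sketch of a dot product annotated with Vitis HLS style pragmas. The kernel name `dot_product`, the length `N`, and the factor of 4 are assumptions chosen for illustration. A host compiler ignores the pragmas, so the function also runs as plain C++; the II actually achieved depends on the tool, device, and clock.

```cpp
#include <cstdint>

constexpr int N = 64;      // assumed vector length (hypothetical)
constexpr int FACTOR = 4;  // matches the unroll and partition factors below
static_assert(N % FACTOR == 0, "N must be a multiple of FACTOR");

int32_t dot_product(const int16_t a[N], const int16_t b[N]) {
    // Cyclic partitioning lets FACTOR consecutive elements be read per cycle.
#pragma HLS ARRAY_PARTITION variable=a cyclic factor=4
#pragma HLS ARRAY_PARTITION variable=b cyclic factor=4
    // Independent partial sums shorten the loop-carried accumulation chain.
    int32_t partial[FACTOR] = {0, 0, 0, 0};
#pragma HLS ARRAY_PARTITION variable=partial complete
    for (int i = 0; i < N; i += FACTOR) {
#pragma HLS PIPELINE II=1
        for (int j = 0; j < FACTOR; ++j) {
#pragma HLS UNROLL
            partial[j] += static_cast<int32_t>(a[i + j]) * static_cast<int32_t>(b[i + j]);
        }
    }
    // Final reduction of the FACTOR partial sums.
    int32_t sum = 0;
    for (int j = 0; j < FACTOR; ++j) {
#pragma HLS UNROLL
        sum += partial[j];
    }
    return sum;
}
```

The partition factor is chosen to match the unroll factor so that each unrolled multiply has its own memory port. A mismatch between the two is a common reason for a pipelined loop to miss its II target.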
QoR Metrics to Evaluate
- Achieved II: ≤ design_state.constraints.hls.target_ii (one of target_ii or target_latency_cycles must be set; prefer target_ii if both; see the Constraint Validation section)
- Latency: ≤ design_state.constraints.hls.target_latency_cycles cycles (one of target_ii or target_latency_cycles must be set)
- Area: within budget
- No directive synthesis errors
Output Required
- Annotated source with all directives and justifications
- Directive justification table
Stage: hls_synthesis
Domain Rules
- Synthesise at target clock period
- Check HLS report: latency, II, resource usage
- Compare achieved vs target; loop back to directive_planning on a miss
- Flag any warnings: unresolved dependencies, failed II, inferred latches
- Verify interface protocols match system integration requirements
QoR Metrics to Evaluate
- II: matches or beats design_state.constraints.hls.target_ii (one of target_ii or target_latency_cycles must be set; prefer target_ii if both)
- Latency: within design_state.constraints.hls.target_latency_cycles cycles (one of target_ii or target_latency_cycles must be set)
- Area: within budget
- No latch inference warnings
Output Required
- HLS synthesis report (latency, II, resource summary)
- Generated RTL files
- Unresolved warnings with justification
Stage: rtl_qc
Domain Rules
- Run lint on HLS-generated RTL (same rules as rtl-design skill)
- Verify no latches in generated RTL
- Verify interface signal names match integration requirements
- Check all registers reset correctly
QoR Metrics to Evaluate
- Lint: 0 errors
- No latches inferred
- Interface ports match integration spec
Output Required
- Lint report on HLS-generated RTL
Stage: cosimulation
Domain Rules
- C testbench drives RTL through HLS wrapper
- RTL outputs compared against C golden model automatically
- Measure actual latency and II: latency must match the HLS report within cosim_tolerance_pct (default 5%); II must match exactly
- Exercise all code paths; test boundary conditions
Common Failures
| Failure | Fix |
|---|---|
| Output mismatch | Check fixed-point overflow; increase bit widths |
| AXI handshake error | Fix INTERFACE pragma configuration |
| Latency differs | Verify loop bounds are static |
| X propagation | Initialise all variables in C source |
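The "Output mismatch" row can be reproduced on the host without any vendor headers. The sketch below is a plain C++ emulation of a signed fixed-point word that truncates and wraps; this is assumed here to approximate the default behaviour of types such as ap_fixed and is not the vendor implementation. The function name `quantise_wrap` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantises `value` to a signed fixed-point word of `total_bits` bits, of
// which `int_bits` (sign included) are integer bits. Fractional bits are
// truncated toward negative infinity and overflow wraps like a
// two's-complement register. Returns the real value the stored word represents.
double quantise_wrap(double value, int total_bits, int int_bits) {
    assert(total_bits > 0 && total_bits <= 32 && int_bits <= total_bits);
    const int frac_bits = total_bits - int_bits;
    const double scale = static_cast<double>(int64_t{1} << frac_bits);
    int64_t raw = static_cast<int64_t>(std::floor(value * scale));
    const int64_t modulus = int64_t{1} << total_bits;
    raw = ((raw % modulus) + modulus) % modulus;  // wrap into [0, 2^total_bits)
    if (raw >= modulus / 2) {
        raw -= modulus;                           // reinterpret as signed
    }
    return static_cast<double>(raw) / scale;
}
```

With 8 total bits and 4 integer bits the representable range is [-8, 8), so `quantise_wrap(9.5, 8, 4)` wraps to -6.5. Widening to 10 total bits with 6 integer bits keeps the same fractional precision and returns 9.5, which is the "increase bit widths" fix from the table.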
QoR Metrics to Evaluate
- Co-simulation: 100% output match with C golden model
- Latency measured: within design_state.constraints.hls.cosim_tolerance_pct% of HLS report (default: 5%)
- II measured: matches HLS report exactly
- No simulation errors or X propagation
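The two timing criteria above can be expressed as a single predicate. This is a minimal sketch: the `CosimResult` struct and `cosim_timing_ok` function are hypothetical names, and the values would in practice be parsed from the tool's synthesis and co-simulation reports.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical container for the numbers taken from the two reports.
struct CosimResult {
    int64_t reported_latency;  // cycles, from the HLS synthesis report
    int64_t measured_latency;  // cycles, from co-simulation
    int reported_ii;
    int measured_ii;
};

// Latency may deviate by up to tolerance_pct percent; II must match exactly.
bool cosim_timing_ok(const CosimResult& r, double tolerance_pct = 5.0) {
    assert(r.reported_latency > 0);
    const double diff = static_cast<double>(r.measured_latency - r.reported_latency);
    const double deviation_pct =
        100.0 * std::fabs(diff) / static_cast<double>(r.reported_latency);
    return deviation_pct <= tolerance_pct && r.measured_ii == r.reported_ii;
}
```

For example, a reported latency of 1000 cycles and a measured latency of 1040 cycles passes at the default 5% tolerance, while 1060 cycles fails. Any II difference fails regardless of latency.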
Output Required
- Co-simulation pass/fail report
- Latency and II measurement log
Stage: hls_signoff
Sign-off Checklist
Output Required
- HLS RTL package (generated .v/.sv files)
- Co-simulation pass report
- HLS QoR report (latency, II, area)
- Interface documentation
Constraint Validation
See plugins/meta/skills/pipeline-orchestration/SKILL.md §Constraints Schema for the authoritative schema and stage-entry validation rule.
Required at entry (algorithm_analysis) — at least one must be non-null:
constraints.hls.target_ii: target initiation interval (preferred when both are set)
constraints.hls.target_latency_cycles: target latency in clock cycles
Optional (schema defaults apply when absent):
constraints.hls.cosim_tolerance_pct (default: 5) — acceptable co-simulation latency deviation %
constraints.clock.clk_mhz — target clock for synthesis (used if set; otherwise tool default)
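The entry rule can be sketched as a small selection function. This assumes C++17 for `std::optional`; the `HlsConstraints` struct is a hypothetical mirror of the constraints.hls fields named above, and the authoritative schema remains the one in the pipeline-orchestration skill.

```cpp
#include <cstdint>
#include <optional>

enum class HlsGoal { InitiationInterval, LatencyCycles, Invalid };

// Hypothetical in-memory mirror of constraints.hls; an empty optional
// stands for a null or absent field.
struct HlsConstraints {
    std::optional<int> target_ii;
    std::optional<int64_t> target_latency_cycles;
    double cosim_tolerance_pct = 5.0;  // default applied when absent
};

// Picks the goal that drives optimisation, or Invalid when neither
// required field is set and the algorithm_analysis stage must not start.
HlsGoal select_goal(const HlsConstraints& c) {
    if (c.target_ii) {
        return HlsGoal::InitiationInterval;  // preferred when both are set
    }
    if (c.target_latency_cycles) {
        return HlsGoal::LatencyCycles;
    }
    return HlsGoal::Invalid;
}
```

Returning an explicit `Invalid` value, rather than silently defaulting to II=1, keeps the decision to refuse stage entry with the caller.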
Memory
Write on stage completion
After each stage completes (regardless of whether an orchestrator session is active),
write or overwrite one JSON record in memory/hls/experiences.jsonl keyed by
run_id. This ensures data is persisted even if the flow is interrupted or called
without full orchestrator context.
Use run_id = hls_<YYYYMMDD>_<HHMMSS> (set once at flow start; reuse on each
stage update). Every JSON record written must include a top-level "run_id" field
whose value matches this key — this is what makes overwrites unambiguous. Set
signoff_achieved: false until the final sign-off stage completes.
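One way to build the run_id string in this format is with `std::strftime`. This is a hedged sketch: the helper names `make_run_id` and `make_run_id_now` are hypothetical, and the use of local time rather than UTC is an assumption, since the text above does not specify a time zone.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Formats a run identifier as hls_<YYYYMMDD>_<HHMMSS> from a broken-down time.
std::string make_run_id(const std::tm& t) {
    char buf[32];
    const std::size_t n = std::strftime(buf, sizeof buf, "hls_%Y%m%d_%H%M%S", &t);
    return std::string(buf, n);
}

// Convenience wrapper that uses the current local time (assumed time zone).
// Returns an empty string if the time cannot be converted.
std::string make_run_id_now() {
    const std::time_t now = std::time(nullptr);
    const std::tm* local = std::localtime(&now);
    return local ? make_run_id(*local) : std::string();
}
```

The identifier should be generated once at flow start and stored, not regenerated per stage, so that every record for the run carries the same key.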
Run state (write before first stage, update after each stage)
Write memory/hls/run_state.md as the first action before launching any tool:
run_id: hls_<YYYYMMDD>_<HHMMSS>
design_name: <design>
tool: <primary tool>
start_time: <ISO-8601>
last_stage: <first stage name>
Update last_stage after each stage completes. This file lets wakeup-loop prompts
and resumed sessions identify the correct run without relying on in-memory state.
Create the file and parent directories if they do not exist.
Optional: claude-mem index
If mcp__plugin_ecc_memory__add_observations is available in this session, emit each
applied fix as an observation to entity chip-design-hls-fixes after writing to
experiences.jsonl. Skip silently if the tool is absent — JSONL is the canonical record.