| name | breenix-interrupt-syscall-development |
| description | Enforce pristine interrupt/syscall paths with no logging, diagnostics, or heavy operations. Use when developing interrupt handlers, syscall entry/exit, context switches, or timer code. |
Breenix Interrupt and Syscall Path Development
CRITICAL PATH REQUIREMENTS - MANDATORY RULES
These requirements are NON-NEGOTIABLE. Violations will cause catastrophic performance degradation and subtle timing bugs that are extremely difficult to debug.
ABSOLUTELY FORBIDDEN in interrupt/syscall paths:
- NO logging or serial output
  - No serial_println! or println! of any kind
  - No debug prints, trace messages, or diagnostics
  - This includes "temporary" debug code that you plan to remove later
  - EXCEPTION: Only in initialization code that runs once at boot, before interrupts are enabled
- NO page table walks or memory diagnostics
  - No calls to translate_address() or similar
  - No memory allocation diagnostics
  - No stack usage analysis
  - No heap introspection
- NO function calls that allocate or take locks
  - No heap allocations (Box, Vec, String, etc.)
  - No lock acquisitions (except purpose-built lock-free critical sections)
  - No panic!() - handlers must be infallible or use explicit error codes
- Interrupt handlers must complete in < 1000 cycles
  - On modern x86_64 at 2GHz, this is ~500 nanoseconds
  - Timer fires every 10ms = 20,000,000 cycles
  - You have 0.005% of that budget
  - Serial output takes 10,000+ cycles per character
- Context switch path must be deterministic and fast
  - No conditional diagnostics based on process state
  - No complex validation beyond assertions
  - Register save/restore must be branchless where possible
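The cycle budget quoted in the list above can be checked with a few lines of integer arithmetic. This is a minimal sketch: the 2 GHz clock and 100 Hz timer come from the figures above, while the 115200 baud rate and 8N1 framing used for the serial estimate are assumptions chosen for illustration (an emulated UART may behave differently from real hardware).

```rust
/// Cycles between two timer interrupts for a CPU at `cpu_hz` and a timer at `timer_hz`.
pub const fn cycles_per_tick(cpu_hz: u64, timer_hz: u64) -> u64 {
    cpu_hz / timer_hz
}

/// Wall-clock nanoseconds that `cycles` cycles take at `cpu_hz`.
/// (Fine for small cycle counts; the multiplication overflows above ~1.8e10 cycles.)
pub const fn cycles_to_ns(cycles: u64, cpu_hz: u64) -> u64 {
    cycles * 1_000_000_000 / cpu_hz
}

/// Share of one timer period used by a handler, in parts per million.
pub const fn budget_ppm(handler_cycles: u64, tick_cycles: u64) -> u64 {
    handler_cycles * 1_000_000 / tick_cycles
}

/// Cycles that elapse while one character leaves a UART at `baud`,
/// ASSUMING 8N1 framing (10 bits on the wire per character).
pub const fn cycles_per_serial_char(cpu_hz: u64, baud: u64) -> u64 {
    cpu_hz * 10 / baud
}

pub const CPU_HZ: u64 = 2_000_000_000; // 2 GHz, as in the list above
pub const TIMER_HZ: u64 = 100; // one tick every 10 ms
pub const HANDLER_BUDGET_CYCLES: u64 = 1_000;
```

With these inputs the tick period is 20,000,000 cycles, the handler budget is 500 ns, and 50 parts per million equals the 0.005% quoted above. Under the assumed baud rate a single character costs well over 100,000 cycles, which is why even one print statement exceeds the budget many times over.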
Why These Rules Exist
Violating these rules causes a specific failure pattern that is extremely difficult to diagnose:
Case Study: The trace_iretq_to_ring3 Bug (December 2024)
Symptom: Userspace process appeared to be created successfully but never executed a single instruction. Process remained stuck at RIP=0x40000000 (entry point) through 2,389 scheduler iterations.
Root Cause: Heavy diagnostics in trace_iretq_to_ring3():
unsafe fn trace_iretq_to_ring3(frame: &TrapFrame) {
    serial_println!("=== IRETQ to Ring 3 ===");
    serial_println!(" RIP: {:#x}", frame.rip);
    serial_println!(" RSP: {:#x}", frame.rsp);
    let phys_addr = current_page_table.translate_address(VirtAddr::new(frame.rip));
}
What Actually Happened:
- Kernel prepared perfect interrupt frame for userspace
- The iretq instruction successfully transitioned to Ring 3
- CPU immediately took timer interrupt within 100-500 cycles
- Userspace RIP never changed from 0x40000000 - not even one instruction executed
- Timer preempted userspace before it could do anything
- Scheduler selected same process again
- Loop repeated 2,389 times with zero forward progress
Why This Happened:
Serial output in trace_iretq_to_ring3() consumed ~10,000-50,000 cycles. Timer fires every 20,000,000 cycles. The logging pushed the kernel dangerously close to the next timer interrupt. By the time iretq completed:
- Timer interrupt was already pending or fired within microseconds
- Userspace had 100-500 cycles of execution time (enough for ~50-250 instructions theoretically)
- But context switch overhead consumed most of that window
- Process never got to execute even its first mov instruction
Evidence:
- 2,389 scheduler loop iterations with ZERO userspace progress
- RIP frozen at 0x40000000 across all iterations
- All registers frozen in initial state
- Timer interrupt dominated CPU time
The Fix:
Remove ALL diagnostics from the interrupt return path:
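unsafe fn return_to_ring3(frame: &TrapFrame) {
    // Assumes TrapFrame is laid out exactly like the hardware iretq frame
    // (RIP, CS, RFLAGS, RSP, SS). Rust's asm! rejects rsp as an operand,
    // so the frame address is loaded into rsp from a scratch register.
    core::arch::asm!(
        "mov rsp, {frame}",
        "iretq",
        frame = in(reg) frame as *const TrapFrame as u64,
        options(noreturn)
    );
}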
After this fix: Userspace executed immediately and correctly.
Key Insight
You cannot debug interrupt paths by adding logging to interrupt paths.
The act of observation changes the system behavior so dramatically that what you're debugging no longer exists. It's a kernel-level Heisenbug - the diagnostic itself destroys the evidence.
APPROVED PATTERNS - What IS Allowed
These operations are acceptable in interrupt/syscall paths:
1. Atomic Counter Increments (for statistics)
TIMER_TICKS.fetch_add(1, Ordering::Relaxed);
SYSCALL_COUNT.fetch_add(1, Ordering::SeqCst);
2. Simple Flag Checks
if should_reschedule.load(Ordering::Acquire) {
    schedule();
}
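One way the flag in pattern 2 might be produced and consumed is sketched below. The names NEED_RESCHED, TICKS_LEFT, QUANTUM_TICKS, on_timer_tick and take_resched_request are hypothetical and do not come from Breenix; the sketch also assumes a single CPU with interrupts disabled inside the handler, so the two-step countdown reset cannot race.

```rust
use core::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Hypothetical quantum length, in timer ticks.
const QUANTUM_TICKS: u32 = 10;

static NEED_RESCHED: AtomicBool = AtomicBool::new(false);
static TICKS_LEFT: AtomicU32 = AtomicU32::new(QUANTUM_TICKS);

/// Timer-handler side: count down the quantum and raise the flag when it expires.
/// Only atomic operations are used: no locks, no allocation, no output.
#[inline(always)]
pub fn on_timer_tick() {
    // fetch_sub returns the previous value; a previous value of 1 means
    // this tick used up the last slot of the quantum.
    if TICKS_LEFT.fetch_sub(1, Ordering::Relaxed) == 1 {
        TICKS_LEFT.store(QUANTUM_TICKS, Ordering::Relaxed);
        NEED_RESCHED.store(true, Ordering::Release);
    }
}

/// Interrupt-exit side: consume the request so it is acted on exactly once.
#[inline(always)]
pub fn take_resched_request() -> bool {
    NEED_RESCHED.swap(false, Ordering::AcqRel)
}
```

Using swap rather than a load followed by a store means the request cannot be lost or handled twice if another interrupt arrives between the check and the clear.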
3. Direct Register Manipulation
let cr3_value: u64;
unsafe {
    // Read the current page-table base (asm! uses Intel syntax by default).
    core::arch::asm!(
        "mov {}, cr3",
        out(reg) cr3_value
    );
}
4. Calling Other Minimal Inline Functions
#[inline(always)]
fn save_context(regs: &mut Registers) {
    regs.rax = read_rax();
    regs.rbx = read_rbx();
}
5. Assertions (but use sparingly)
debug_assert!(is_kernel_address(stack_ptr));
debug_assert_eq!(cs & 3, 0);
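debug_assert! and debug_assert_eq! compile to nothing when debug assertions are disabled, which is the default for release builds, so they are safe for sanity checks. Plain assert! is always compiled in and can panic, so keep it out of these paths.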
DEBUGGING ALTERNATIVES - How to Debug Without Breaking the Path
Since you cannot add logging to interrupt paths, use these techniques instead:
1. QEMU Interrupt Tracing (BEST OPTION)
cargo run --bin qemu-uefi -- -d int,cpu_reset -D /tmp/qemu.log
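grep "v=" /tmp/qemu.log
This logs every interrupt and exception that QEMU delivers, together with the CPU register state at the moment of delivery. No kernel instrumentation is required. In the x86 log format each record carries the vector as a hexadecimal v= field (for example v=20 for vector 0x20). The log is produced by QEMU's TCG emulation, so expect no output when running under a hardware accelerator such as KVM.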
2. GDB Breakpoints (SECOND BEST)
cargo run --bin qemu-uefi -- -s -S
gdb target/x86_64-breenix/release/breenix
(gdb) target remote :1234
(gdb) break timer_interrupt_handler
(gdb) continue
GDB lets you inspect state without adding any code to the kernel. The guest is halted while it sits at a breakpoint, so the handler itself still runs unmodified.
3. Conditional Compilation (USE SPARINGLY)
#[cfg(debug_assertions)]
{
    INTERRUPT_COUNT.fetch_add(1, Ordering::Relaxed);
}
Only for counters and flags - never for serial output.
4. Post-Interrupt Diagnostics
fn timer_interrupt_handler() {
    TIMER_TICKS.fetch_add(1, Ordering::Relaxed);
    acknowledge_interrupt();
    schedule_if_needed();
}
pub fn print_timer_stats() {
    let ticks = TIMER_TICKS.load(Ordering::Relaxed);
    serial_println!("Timer ticks: {}", ticks);
}
Accumulate data in the hot path, log it in a cold path.
5. Hardware Performance Counters (ADVANCED)
let start = read_tsc();
critical_operation();
let end = read_tsc();
CYCLE_COUNTS[operation_id].fetch_add(end - start, Ordering::Relaxed);
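A running total alone hides the worst case, which is what the < 1000 cycle budget is really about. The sketch below keeps a total, a sample count and a maximum using only atomics. CycleStats and its methods are hypothetical names; the timestamps are passed in as plain values so the example stays portable, and in the kernel they would come from whatever TSC reader backs read_tsc() above.

```rust
use core::sync::atomic::{AtomicU64, Ordering};

/// Lock-free cycle statistics for one instrumented operation.
pub struct CycleStats {
    total: AtomicU64,
    samples: AtomicU64,
    worst: AtomicU64,
}

impl CycleStats {
    pub const fn new() -> Self {
        Self {
            total: AtomicU64::new(0),
            samples: AtomicU64::new(0),
            worst: AtomicU64::new(0),
        }
    }

    /// Hot path: three relaxed atomic updates, no locks, no output.
    #[inline(always)]
    pub fn record(&self, start: u64, end: u64) {
        // wrapping_sub avoids an overflow panic if the counter wrapped.
        let elapsed = end.wrapping_sub(start);
        self.total.fetch_add(elapsed, Ordering::Relaxed);
        self.samples.fetch_add(1, Ordering::Relaxed);
        self.worst.fetch_max(elapsed, Ordering::Relaxed);
    }

    /// Cold path: mean cycles per sample, or None before the first sample.
    pub fn mean(&self) -> Option<u64> {
        let n = self.samples.load(Ordering::Relaxed);
        if n == 0 {
            None
        } else {
            Some(self.total.load(Ordering::Relaxed) / n)
        }
    }

    /// Cold path: largest single measurement seen so far.
    pub fn worst(&self) -> u64 {
        self.worst.load(Ordering::Relaxed)
    }
}
```

Because new() is a const fn, a CycleStats value can live in a static and be read out later from a cold path such as a stats-printing command.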
CODE REVIEW CHECKLIST
Before merging ANY code that touches interrupt or syscall paths, ask these questions:
Mandatory Rejection Criteria
Careful Review Required
Acceptable Operations
Testing Requirements
IMPLEMENTATION WORKFLOW
When implementing interrupt or syscall handlers:
1. Design the Pristine Path First
pub extern "C" fn syscall_entry() {
}
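As a rough illustration of what a pristine path can contain, the sketch below dispatches on the syscall number and reports failure through a return code rather than a panic. Everything in it is assumed for the example: the syscall numbers are hypothetical, the current PID is passed in as a parameter, and the ENOSYS value of 38 is borrowed from Linux rather than taken from Breenix.

```rust
/// Hypothetical syscall numbers, for illustration only.
pub const SYS_GETPID: u64 = 0;
pub const SYS_YIELD: u64 = 1;

/// Error code for an unknown syscall (Linux's ENOSYS value, assumed here).
pub const ENOSYS: i64 = 38;

/// Minimal dispatch: one match, no logging, no allocation, no locks.
/// Unknown numbers produce a negative error code instead of a panic,
/// so the handler stays infallible.
#[inline(always)]
pub fn dispatch(num: u64, current_pid: u64) -> i64 {
    match num {
        SYS_GETPID => current_pid as i64,
        SYS_YIELD => 0, // the caller would raise a reschedule flag here
        _ => -ENOSYS,
    }
}
```

Keeping the fallback arm explicit means a malformed request from userspace costs a single comparison and never reaches code that can fail.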
2. Add Atomic Counters for Observability
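// A const item lets the non-Copy AtomicU64 be repeated in the array initializer.
const ZERO: AtomicU64 = AtomicU64::new(0);
static SYSCALL_COUNTS: [AtomicU64; 256] = [ZERO; 256];
pub extern "C" fn syscall_entry(syscall_num: u64) {
    // Mask the index so an out-of-range number cannot hit a bounds-check panic
    // (numbers >= 256 alias onto the low slots).
    SYSCALL_COUNTS[(syscall_num & 0xFF) as usize].fetch_add(1, Ordering::Relaxed);
}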
3. Test in Release Mode
cargo run --release --bin qemu-uefi
4. Verify with External Tracing
cargo run --release --bin qemu-uefi -- -d int -D /tmp/qemu.log
5. Only Then Add Debug-Only Diagnostics (if absolutely necessary)
#[cfg(all(debug_assertions, feature = "interrupt-trace"))]
{
    INTERRUPT_TRACE_BUFFER[index].store(rip, Ordering::Relaxed);
}
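The trace buffer in step 5 could be built as a fixed-size ring of atomics, as sketched below. This is one possible shape, not the Breenix implementation: TRACE, NEXT, TRACE_SLOTS, trace_rip and snapshot are hypothetical names, and the sketch assumes that losing the oldest entries on wrap-around is acceptable.

```rust
use core::sync::atomic::{AtomicU64, AtomicUsize, Ordering};

/// Number of slots; must be a power of two so the index can be masked.
pub const TRACE_SLOTS: usize = 64;

// A const item lets the non-Copy AtomicU64 be repeated in the array initializer.
const EMPTY: AtomicU64 = AtomicU64::new(0);
static TRACE: [AtomicU64; TRACE_SLOTS] = [EMPTY; TRACE_SLOTS];
static NEXT: AtomicUsize = AtomicUsize::new(0);

/// Hot path: one fetch_add and one store. No locks, no allocation, no output.
#[inline(always)]
pub fn trace_rip(rip: u64) {
    let slot = NEXT.fetch_add(1, Ordering::Relaxed) & (TRACE_SLOTS - 1);
    TRACE[slot].store(rip, Ordering::Relaxed);
}

/// Cold path: copy the most recent entries into `out`, oldest first,
/// and return how many entries were copied.
pub fn snapshot(out: &mut [u64; TRACE_SLOTS]) -> usize {
    let written = NEXT.load(Ordering::Relaxed);
    let count = written.min(TRACE_SLOTS);
    let first = written - count;
    for i in 0..count {
        out[i] = TRACE[(first + i) & (TRACE_SLOTS - 1)].load(Ordering::Relaxed);
    }
    count
}
```

The hot path only ever writes raw values; turning them into text happens in snapshot's caller, outside any interrupt or syscall path.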
ANTI-PATTERNS TO AVOID
Anti-Pattern 1: "Temporary" Debug Logging
fn timer_handler() {
    serial_println!("Timer fired");
}
Why it's wrong: You'll forget to remove it, or it will get copied to other handlers. When the bug appears in production, the logging will be the bug.
Anti-Pattern 2: Conditional Diagnostics on Process State
fn context_switch(next: &Process) {
    if next.is_userspace() {
        serial_println!("Switching to PID {}", next.pid);
    }
}
Why it's wrong: Userspace processes get slower context switches than kernel processes. Creates Heisenbugs where timing depends on process type.
Anti-Pattern 3: "Just This Once" Memory Allocation
fn interrupt_handler() {
    let msg = format!("Interrupt at {:#x}", rip);
    log_message(&msg);
}
Why it's wrong: What if the allocator is out of memory? What if the allocator lock is held? Handler must never fail.
Anti-Pattern 4: Complex Validation
fn syscall_handler(num: u64, args: &[u64]) {
    validate_user_memory(args[0]);
    validate_file_descriptor(args[1]);
}
Why it's wrong: Validation can be expensive. Do it before entering the critical path, or use hardware-based validation (page faults).
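A check that stays cheap enough for the entry path is a pure address-range comparison, leaving the question of whether the pages are actually mapped to the page-fault handler. The sketch below is an assumption-laden example: USER_SPACE_END and user_range_ok are hypothetical names, and the boundary value is the end of the lower canonical half on x86_64 with 4-level paging, which may not match the Breenix memory layout.

```rust
/// ASSUMED exclusive upper bound of user space: the end of the lower
/// canonical half on x86_64 with 4-level paging.
pub const USER_SPACE_END: u64 = 0x0000_8000_0000_0000;

/// Returns true if [addr, addr + len) lies entirely below USER_SPACE_END.
/// This is an add and a compare: no page-table walk, no locks.
/// It does NOT prove the range is mapped; a page fault covers that case.
#[inline(always)]
pub const fn user_range_ok(addr: u64, len: u64) -> bool {
    match addr.checked_add(len) {
        Some(end) => end <= USER_SPACE_END,
        None => false, // addr + len wrapped around the address space
    }
}
```

The checked_add matters: without it, a huge length could wrap the end address back into the user range and pass the comparison.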
SUCCESS METRICS
You've implemented a pristine interrupt/syscall path if:
- Userspace makes forward progress immediately
  - RIP changes on every scheduler quantum
  - Processes complete in expected time
  - No infinite loops at entry point
- Interrupt frequency is stable
  - Timer fires at configured rate (e.g., 100 Hz = every 10ms)
  - No interrupt storms
  - QEMU logs show regular intervals
- Zero overhead in release builds
  - No serial output
  - No allocations
  - Cycle counts < 1000 per handler
- Debuggable with external tools
  - QEMU tracing shows correct behavior
  - GDB breakpoints work
  - Performance counters provide visibility
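The first metric, RIP changing on every quantum, can be checked mechanically from saved-RIP samples gathered by a cold path or extracted from a QEMU log. The sketch below is a heuristic only: StallDetector is a hypothetical helper, and a process spinning in a legitimate one-instruction loop would trigger it too.

```rust
/// Flags a process whose saved RIP has not moved for `threshold`
/// consecutive observations. Intended for cold-path or host-side use.
pub struct StallDetector {
    last_rip: Option<u64>,
    unchanged: u32,
    threshold: u32,
}

impl StallDetector {
    pub const fn new(threshold: u32) -> Self {
        Self { last_rip: None, unchanged: 0, threshold }
    }

    /// Feed one RIP sample per scheduler quantum.
    /// Returns true once the RIP has repeated `threshold` times in a row.
    pub fn observe(&mut self, rip: u64) -> bool {
        if self.last_rip == Some(rip) {
            self.unchanged = self.unchanged.saturating_add(1);
        } else {
            self.last_rip = Some(rip);
            self.unchanged = 0;
        }
        self.unchanged >= self.threshold
    }
}
```

Fed the samples from the case study above, where RIP stayed at 0x40000000 for 2,389 iterations, a detector with a small threshold would have flagged the stall within the first few quanta.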
WHEN TO USE THIS SKILL
Invoke this skill when:
- Implementing new interrupt handlers (timer, keyboard, syscall, page fault)
- Modifying syscall entry/exit paths
- Implementing context switch logic
- Adding scheduler hooks
- Debugging performance issues where userspace seems "stuck"
- Reviewing PRs that touch interrupt.asm, syscall/entry.asm, or timer.rs
- Investigating "works in debug, fails in release" bugs (timing-related)
RELATED SKILLS
- breenix-kernel-debug-loop: For iterative debugging with external tools
- breenix-systematic-debugging: For documenting root cause analysis
- breenix-code-quality-check: For enforcing zero-warning builds
- breenix-interrupt-trace: For QEMU-based interrupt tracing
- breenix-gdb-attach: For breakpoint-based debugging
REFERENCES
- Intel Software Developer Manual Vol 3, Chapter 6: Interrupt and Exception Handling
- AMD64 Architecture Manual Vol 2, Chapter 8: Exceptions and Interrupts
- Linux kernel: arch/x86/entry/entry_64.S - pristine syscall entry path
- FreeBSD: sys/amd64/amd64/exception.S - interrupt handling
VERSION HISTORY
- v1.0 (2024-12-02): Initial skill creation after fixing trace_iretq_to_ring3 bug