| name | memory-debugging |
| description | This skill should be used when debugging memory-related issues in the Breenix kernel including page faults, double faults, frame allocation problems, page table issues, heap allocation failures, stack overflows, and virtual memory mapping errors. |
Memory Debugging for Breenix
Debug kernel memory issues including page faults, allocator problems, and page table errors.
Purpose
Memory bugs in kernel development are among the most difficult to debug. This skill provides systematic approaches for diagnosing page faults, double faults, allocator issues, and page table problems specific to Breenix.
When to Use
- Page faults: Accessing unmapped or incorrectly mapped memory
- Double faults: Stack issues or cascading exceptions
- Frame allocation failures: Out of memory or allocator bugs
- Page table problems: Wrong mappings, missing entries, incorrect flags
- Heap allocation issues: OOM, corruption, leaks
- Stack overflows: Exceeding stack size, missing guard pages
- Virtual address conflicts: Multiple processes mapping same address
Memory Subsystems in Breenix
1. Physical Memory (Frame Allocator)
Location: kernel/src/memory/frame_allocator.rs
What it does: Manages physical memory frames (4KB pages)
Common issues:
- Running out of frames
- Double allocation of same frame
- Frames not freed properly
- Initialization failures
Debug approach:
log::debug!("Allocating frame at {:?}", frame);
log::debug!("Frame allocator: {} frames used", count);
log::info!("Physical memory: {} MiB usable", memory_mb);
2. Virtual Memory (Page Tables)
Location: kernel/src/memory/ (process_memory.rs, kernel_page_table.rs)
What it does: Maps virtual addresses to physical frames
Common issues:
- Missing page table entries
- Wrong flags (PRESENT, WRITABLE, USER_ACCESSIBLE)
- Shared page table entries causing conflicts
- Kernel mappings not copied to process page tables
Debug approach:
log::debug!("Mapping page {:?} to frame {:?} with flags {:?}",
page, frame, flags);
let result = page_table.translate_addr(addr);
log::debug!("Address {:?} translates to {:?}", addr, result);
3. Heap Allocator
Location: Uses Rust's #[global_allocator]
Size: 1024 KiB
Common issues:
- Out of heap memory
- Heap corruption
- Allocations during early boot (before heap init)
Debug approach:
log::info!("Heap: 1024 KiB");
4. Kernel Stacks
Location: kernel/src/memory/kernel_stack.rs
Layout: 8KB stacks with 4KB guard pages at 0xffffc900_0000_0000
Common issues:
- Stack overflow into guard page
- Kernel stack not mapped in process page table
- IST stack issues for double faults
Debug approach:
log::debug!("Allocated kernel stack {} at {:?}", id, addr);
log::debug!("Stack bottom: {:?}, top: {:?}", bottom, top);
Common Memory Errors
Error 1: Page Fault
Symptoms:
PAGE FAULT at 0x... Error Code: 0x...
Error Code Decoding:
Bit 0 (P): 0 = Page not present
1 = Protection violation
Bit 1 (W): 0 = Read access
1 = Write access
Bit 2 (U): 0 = Kernel mode
1 = User mode
Bit 3 (R): 1 = Reserved bit set
Bit 4 (I): 1 = Instruction fetch
Common Causes:
1. Accessing unmapped memory
let ptr = 0x12345000 as *const u64;
unsafe { *ptr }
Diagnosis:
- Check if address should be mapped
- Verify page table has entry for this address
- Confirm physical frame was allocated
Fix: Map the page before accessing
2. Writing to read-only page
let ptr = read_only_page as *mut u64;
unsafe { *ptr = 42; }
Diagnosis:
- Check page table flags
- Verify WRITABLE flag is set
- Confirm not writing to kernel code/data
Fix: Add WRITABLE flag or don't write to read-only pages
3. User accessing kernel page
let ptr = 0xFFFF_8000_0000_0000 as *const u64;
unsafe { *ptr }
Diagnosis:
- Check if address is in kernel space (upper half)
- Verify page doesn't have USER_ACCESSIBLE flag
- Confirm userspace should not access this
Fix: Don't allow userspace to access kernel memory
4. Accessing kernel stack not mapped in process page table
Diagnosis:
- Check if kernel stack is mapped in process page table
- Verify TSS RSP0 points to valid kernel stack
- Look for double fault during syscalls (int 0x80)
Fix: Copy kernel stack mappings to process page table
Error 2: Double Fault
Symptoms:
DOUBLE FAULT - Error Code: 0x...
Instruction Pointer: 0x...
Stack Pointer: 0x...
What it means: Exception occurred while handling another exception
Common Causes:
1. Kernel stack not mapped during exception
Sequence:
1. Exception occurs (page fault, etc.)
2. CPU tries to switch to kernel stack
3. Kernel stack not mapped in current page table
4. Page fault accessing kernel stack
5. DOUBLE FAULT
Diagnosis:
- Check which exception triggered the double fault
- Verify kernel stack is mapped
- Check TSS RSP0 value
- Look at instruction pointer (where was CPU when it faulted?)
Fix: Ensure kernel stack mapped in all page tables
2. Stack overflow
Sequence:
1. Recursive function or large stack allocation
2. Stack exceeds allocated size
3. Writes into guard page
4. Page fault (guard page not mapped)
5. Page fault handler needs stack
6. DOUBLE FAULT
Diagnosis:
- Check stack pointer value
- Compare against stack bounds
- Look for recursive calls
- Check for large stack allocations
Fix: Increase stack size or fix code causing overflow
3. Exception handler itself faults
Sequence:
1. Exception occurs
2. Handler tries to access unmapped memory
3. Page fault inside handler
4. DOUBLE FAULT
Diagnosis:
- Review exception handler code
- Check what handler was executing
- Verify handler doesn't access invalid addresses
Fix: Fix bug in exception handler
Error 3: Page Already Mapped
Symptoms:
Error: Attempted to map already-mapped page
Common Causes:
1. Shared page table levels
Diagnosis:
- Check if page table levels are shared between processes
- Verify each process has independent L3/L2/L1 tables
- Look at PML4 entry copying code
Fix: Deep copy page table levels, don't share
2. Mapping same address twice
page_table.map_to(page, frame, flags, allocator)?;
page_table.map_to(page, frame, flags, allocator)?;
Diagnosis:
- Check if page is already mapped before mapping
- Look for duplicate mapping calls
- Verify cleanup properly unmaps pages
Fix: Check before mapping or unmap first
Error 4: Out of Memory
Symptoms:
Error: Frame allocator out of memory
Common Causes:
1. Too many allocations
loop {
allocator.allocate_frame();
}
Diagnosis:
- Log total memory available
- Count allocations vs deallocations
- Check for memory leaks
Fix: Free frames when done, or increase memory
2. Memory leaks
let frame = allocator.allocate_frame()?;
Diagnosis:
- Track allocation/deallocation counts
- Look for allocations without corresponding frees
- Use systematic allocation patterns
Fix: Properly free all allocated frames
Debugging Techniques
Technique 1: Add Checkpoint Logging
Add logging at critical memory operations:
log::debug!("CHECKPOINT: Before page table operation");
page_table.map_to(page, frame, flags, allocator)?;
log::debug!("CHECKPOINT: After page table operation");
log::debug!("CHECKPOINT: Before memory access");
unsafe { *(addr as *const u64) };
log::debug!("CHECKPOINT: After memory access");
If crash happens between checkpoints, you know where to focus.
Technique 2: Verify Assumptions
Check assumptions about memory state:
match page_table.translate_addr(addr) {
Some(phys) => log::debug!("Address {:?} mapped to {:?}", addr, phys),
None => log::warn!("Address {:?} NOT MAPPED", addr),
}
log::debug!("Allocated frame: {:?}", frame);
assert!(frame.start_address().as_u64() > 0);
log::debug!("Page mapped with flags: {:?}", flags);
assert!(flags.contains(PageTableFlags::PRESENT));
Technique 3: Dump Page Table State
Add functions to dump page table state:
pub fn dump_page_table_entry(page_table: &PageTable, addr: VirtAddr) {
let result = page_table.translate_addr(addr);
log::debug!("Address: {:?}", addr);
log::debug!(" Translation: {:?}", result);
}
Technique 4: Use kernel-debug-loop
Fast iteration for memory issues:
kernel-debug-loop/scripts/quick_debug.py \
--signal "MEMORY OPERATION COMPLETE" \
--timeout 10
Technique 5: Compare Working vs Broken State
If something used to work:
git checkout working_commit
kernel-debug-loop/scripts/quick_debug.py ... > working.log
git checkout broken_commit
kernel-debug-loop/scripts/quick_debug.py ... > broken.log
diff -u working.log broken.log
Memory Issue Patterns
Pattern: Syscall Page Fault
Scenario: Page fault when userspace calls int 0x80
Diagnosis:
- Kernel stack not mapped in process page table
- Ring 3 → Ring 0 transition fails
Fix: copy_kernel_stack_to_process()
Reference: DIRECT_EXECUTION_FIX.md
Pattern: Context Switch Double Fault
Scenario: Double fault during context switch between processes
Diagnosis:
- Wrong page table activated
- Current stack not mapped in new page table
Fix: Ensure kernel stacks globally mapped
Reference: Global kernel page table architecture
Pattern: Process Creation "Already Mapped"
Scenario: Second process creation fails with "page already mapped"
Diagnosis:
- Processes sharing page table levels
- First process's mappings conflict with second
Fix: Deep copy page table levels
Reference: PAGE_TABLE_FIX.md
Pattern: Heap Allocation Panic
Scenario: Panic during heap allocation
Diagnosis:
- Out of heap memory
- Heap not initialized yet
- Heap corruption
Fix: Increase heap size, defer allocation, or fix corruption
Integration with Other Skills
With kernel-debug-loop
kernel-debug-loop/scripts/quick_debug.py \
--signal "MEMORY TEST COMPLETE" \
--timeout 15
With systematic-debugging
Document complex memory bugs:
# Problem
Page fault at 0x10001082 during sys_write
# Root Cause
User buffer not mapped in process page table
# Solution
Verify user addresses before access
# Evidence
Before: PAGE FAULT
After: Successful syscall
With log-analysis
echo '"PAGE FAULT"' > /tmp/log-query.txt
./scripts/find-in-logs
echo '"Allocated\|Deallocated"' > /tmp/log-query.txt
./scripts/find-in-logs
Best Practices
- Log extensively: Memory operations should be well-logged
- Verify before access: Check addresses are mapped
- Use checkpoints: Narrow down failure location
- Test incrementally: Add small changes, test frequently
- Understand architecture: Know how page tables work
- Reference working code: Look at similar working code
- Document patterns: Save solutions for future reference
Quick Reference
Page Fault Error Codes
0x0: Read from unmapped page (kernel mode)
0x1: Read from non-present page (kernel mode)
0x2: Write to unmapped page (kernel mode)
0x3: Write to non-present page (kernel mode)
0x4: Read from unmapped page (user mode)
0x6: Write to unmapped page (user mode)
Memory Regions
0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF: User space
0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF: Kernel space
0xFFFF_C900_0000_0000: Kernel stacks
Key Files
kernel/src/memory/frame_allocator.rs - Physical memory
kernel/src/memory/process_memory.rs - Process page tables
kernel/src/memory/kernel_page_table.rs - Kernel mappings
kernel/src/memory/kernel_stack.rs - Kernel stack allocator
Summary
Memory debugging requires:
- Understanding memory subsystems (frame allocator, page tables, heap, stacks)
- Systematic diagnosis of page faults and double faults
- Checkpoint logging to isolate failures
- Verification of page table state and mappings
- Reference to past fixes (DIRECT_EXECUTION_FIX, PAGE_TABLE_FIX)
- Integration with fast iteration tools
Memory bugs are complex but systematic debugging always works.