| name | fragmentation-aware-packing |
| description | Choose placements that preserve useful residual capacity. Use for bin packing, GPU sharing, accelerator placement, and multi-resource scheduling where stranded capacity hurts future fit. |
Fragmentation-Aware Packing
Use this skill when several feasible placements exist and the choice affects future capacity.
Core Idea
A placement is not good just because it fits. Good placements preserve useful residual capacity. With fractional GPUs, this often means packing small compatible jobs together while preserving whole or scarce GPU slots. The same idea applies to any slots, bins, or resources with discrete capacities.
Marginal Fragmentation
For each feasible placement, compute a local before/after estimate:
- Measure current free capacity by resource type and slot.
- Copy the target machine or bin state.
- Compute fragmentation_before.
- Apply the candidate placement.
- Compute fragmentation_after.
- Set marginal_fragmentation = fragmentation_after - fragmentation_before.
    best = None
    for placement in feasible_placements:
        target_before = copy(target_state)
        fragmentation_before = estimate_fragmentation(target_before, workload_types)
        target_after = apply(placement, target_before)
        fragmentation_after = estimate_fragmentation(target_after, workload_types)
        marginal_fragmentation = fragmentation_after - fragmentation_before
        score = weighted_action_score(
            marginal_fragmentation=marginal_fragmentation,
            other_component_deltas=estimate_other_deltas(placement)
        )
        best = lower_score(best, placement, score)
    choose best
Respect hard feasibility first. Use marginal_fragmentation as an input to the weighted action score, not as the only decision rule.
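As a rough illustration, the loop above might look like the following in Python. This is a sketch under stated assumptions, not a reference implementation: the state is reduced to a list of free GPU units per slot, `toy_fragmentation`, `apply_placement`, and `choose_placement` are hypothetical names, the 100-unit slot size is assumed, and the caller is expected to pass only placements that already satisfy hard feasibility.

```python
from typing import Callable, List, Optional, Tuple

FULL_SLOT = 100  # assumed capacity of one GPU slot, in units

State = List[int]            # free GPU units per slot (toy state)
Placement = Tuple[int, int]  # (slot index, GPU units requested)


def toy_fragmentation(state: State) -> int:
    """Hypothetical estimator: free units stranded in partly used slots."""
    return sum(free for free in state if 0 < free < FULL_SLOT)


def apply_placement(placement: Placement, state: State) -> State:
    """Return a new state with the placement applied; the input is not mutated."""
    slot, units = placement
    after = list(state)
    after[slot] -= units
    return after


def choose_placement(
    state: State,
    feasible_placements: List[Placement],
    other_delta: Callable[[Placement], float] = lambda placement: 0.0,
    fragmentation_weight: float = 1.0,
) -> Optional[Placement]:
    """Pick the feasible placement with the lowest weighted marginal score."""
    best, best_score = None, None
    fragmentation_before = toy_fragmentation(state)
    for placement in feasible_placements:
        fragmentation_after = toy_fragmentation(apply_placement(placement, state))
        marginal_fragmentation = fragmentation_after - fragmentation_before
        score = fragmentation_weight * marginal_fragmentation + other_delta(placement)
        # Strict "<" keeps the earliest placement on ties (deterministic order).
        if best_score is None or score < best_score:
            best, best_score = placement, score
    return best


# One empty slot and one half-used slot; a 50-unit job could go on either.
print(choose_placement([100, 50], [(0, 50), (1, 50)]))  # (1, 50)
```

Filling the half-used slot lowers the toy estimate by 50 units, while opening the empty slot raises it by 50, so the second placement wins. A large enough `other_delta` can still reverse that choice, which is the point of treating fragmentation as one input to the score.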
Estimating Fragmentation
When workload shape probabilities are available, such as workload_types from cluster_config.json, use them to estimate which free capacity is likely to be useful:
    fragmentation = 0
    for workload_type in workload_types_from_cluster_config:
        if workload_type.gpu_type is incompatible with target.gpu_type:
            continue
        can_fit = (
            target.cpu_free >= workload_type.cpu_units
            and target.memory_free >= workload_type.memory_units
            and any(slot.free_gpu_units >= workload_type.gpu_units
                    for slot in target.gpu_slots)
        )
        compatible_free_gpu = sum(slot.free_gpu_units for slot in target.gpu_slots)
        if not can_fit:
            fragmentation += workload_type.probability * compatible_free_gpu
        else:
            small_fragments = sum(
                slot.free_gpu_units
                for slot in target.gpu_slots
                if 0 < slot.free_gpu_units < workload_type.gpu_units
            )
            fragmentation += workload_type.probability * small_fragments
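A runnable version of this estimate could look like the sketch below. Several details are assumptions made for illustration: the `WorkloadType` and `Target` dataclasses are hypothetical, each GPU slot is reduced to an integer count of free units, "compatible" is taken to mean an exact `gpu_type` match, and the sample workload mix is invented rather than read from cluster_config.json.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkloadType:
    gpu_type: str
    cpu_units: int
    memory_units: int
    gpu_units: int
    probability: float


@dataclass
class Target:
    gpu_type: str
    cpu_free: int
    memory_free: int
    gpu_slots: List[int]  # free GPU units per slot (simplified slot model)


def estimate_fragmentation(target: Target, workload_types: List[WorkloadType]) -> float:
    """Probability-weighted free GPU units that likely workloads cannot use."""
    fragmentation = 0.0
    free_gpu = sum(target.gpu_slots)
    for wt in workload_types:
        # Assumption: compatibility is an exact gpu_type match.
        if wt.gpu_type != target.gpu_type:
            continue
        can_fit = (
            target.cpu_free >= wt.cpu_units
            and target.memory_free >= wt.memory_units
            and any(free >= wt.gpu_units for free in target.gpu_slots)
        )
        if not can_fit:
            # Nothing on this target is usable for this workload shape.
            fragmentation += wt.probability * free_gpu
        else:
            # Only leftovers too small for this shape count as fragments.
            small = sum(free for free in target.gpu_slots if 0 < free < wt.gpu_units)
            fragmentation += wt.probability * small
    return fragmentation


# Invented workload mix: half 50-unit jobs, half 100-unit jobs.
workload_types = [
    WorkloadType("A100", 4, 16, 50, 0.5),
    WorkloadType("A100", 4, 16, 100, 0.5),
]
split = Target("A100", 32, 128, [50, 50])    # two half-used slots
packed = Target("A100", 32, 128, [0, 100])   # one full slot, one empty slot

print(estimate_fragmentation(split, workload_types))   # 50.0
print(estimate_fragmentation(packed, workload_types))  # 0.0
```

Both targets have 100 free GPU units, but the split layout scores 50.0 because the 100-unit workload, which arrives half the time in this invented mix, cannot use any of it.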
Intuitive Example
If two 50-unit GPU jobs can share one 100-unit GPU slot, placing both on the same slot leaves another full slot free. Placing them on two separate slots creates two 50-unit leftovers, which may be harder for future 75- or 100-unit jobs to use.
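The two-slot example can be checked with a few lines of Python. This is only an illustrative sketch: `place` and `fits` are hypothetical helpers, and each slot is modeled as a plain integer of free units.

```python
from typing import List


def place(free: List[int], slot: int, units: int) -> List[int]:
    """Return a copy of the free-unit list after placing a job on one slot."""
    if free[slot] < units:
        raise ValueError("job does not fit in the chosen slot")
    after = list(free)
    after[slot] -= units
    return after


def fits(free: List[int], units: int) -> bool:
    """A job fits if any single slot has enough free units."""
    return any(slot_free >= units for slot_free in free)


empty = [100, 100]
shared = place(place(empty, 0, 50), 0, 50)  # both jobs on slot 0 -> [0, 100]
split = place(place(empty, 0, 50), 1, 50)   # one job per slot   -> [50, 50]

for future_job in (75, 100):
    print(future_job, fits(shared, future_job), fits(split, future_job))
# 75 True False
# 100 True False
```

Both layouts leave 100 free units in total; only the shared layout keeps them in a shape that a 75- or 100-unit job can use.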
The same pattern appears outside GPUs: two small tasks may belong in one bin so another bin remains available for a large task. When scores are close, use stable tie-breaks such as urgency, priority, smaller harmless leftovers, and deterministic target order.
Weighted Objective Context
Fragmentation is one objective component. A placement with slightly worse fragmentation may still be better if it substantially improves another weighted component, such as waiting, lateness, resource activation, or unserved-work cost. Conversely, a placement with excellent fragmentation may be bad if it causes a large cost elsewhere.
Use the before/after fragmentation estimate as one delta in a general score:
    weighted_marginal_score =
        fragmentation_weight * marginal_fragmentation
        + other_weight_1 * delta_other_component_1
        + other_weight_2 * delta_other_component_2
        + deterministic_tie_break
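One way to turn this score into code is sketched below. The component names, weights, and candidate deltas are invented for illustration, and `weighted_marginal_score` and `rank_key` are hypothetical helpers. Note one deliberate substitution: instead of adding a tie-break term to the score, the sketch uses the candidate's position in a fixed target order as a secondary sort key, so a tie-break can never outweigh a real score difference.

```python
from typing import Dict, List, Tuple


def weighted_marginal_score(deltas: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * delta over every objective component (lower is better)."""
    return sum(weights[name] * delta for name, delta in deltas.items())


def rank_key(
    deltas: Dict[str, float], weights: Dict[str, float], target_order: int
) -> Tuple[float, int]:
    """Sort key: weighted score first, deterministic target order second."""
    return (weighted_marginal_score(deltas, weights), target_order)


# Invented weights and deltas, for illustration only.
weights = {"fragmentation": 1.0, "waiting": 2.0}
candidates: List[Tuple[str, Dict[str, float]]] = [
    ("pack", {"fragmentation": -5.0, "waiting": 0.0}),
    ("spread", {"fragmentation": 10.0, "waiting": -10.0}),
]

best_name, best_deltas = min(
    candidates,
    key=lambda item: rank_key(item[1], weights, candidates.index(item)),
)
print(best_name, weighted_marginal_score(best_deltas, weights))  # spread -10.0
```

Here "spread" has the worse fragmentation delta, yet it wins because its waiting improvement is weighted heavily enough to dominate. With different weights the same candidates could rank the other way.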