| name | convert-model-to-be |
| description | Convert a GGUF model to big-endian format for IBM i / PowerPC systems and publish it to the huggingface.co/models-for-i collection. Use when the user asks to convert a GGUF model to big-endian (BE), add a model to the models-for-i HF collection, or prepare a HuggingFace model for use on IBM i with Llama.cpp. Triggers on phrases like "convert X to big endian", "upload X to models-for-i", "add this model to the collection". |
Convert Model To Big-Endian
Workflow for taking a little-endian GGUF on HuggingFace, converting it to big-endian for IBM i, and publishing it under huggingface.co/models-for-i. Assumes the project root (containing gguf_convert_endian.py) is the working directory.
Workflow
- Verify the source repo and pick a quantization
- Check tensor-type compatibility (dry-run)
- Download the file
- Convert in place to big-endian
- Rename with
-be suffix and write the README
- Create the HF repo and upload (with xet disabled)
Execute steps sequentially; do not skip the compatibility check โ it saves a wasted multi-GB download when the converter doesn't yet support a tensor type used by the model.
1. Pick a quantization
Unsloth and similar publishers ship many quants per model. The converter supports only a subset of GGML tensor types: F32, F16, BF16, Q4_0, Q8_0, Q4_K, Q5_K, Q6_K, MXFP4. Any other type (Q2_K, Q3_K, Q5_0, Q5_1, IQ*, UD-*) will fail.
Default: Q4_K_M โ matches the precedent set by Ministral-3-8B-Instruct-2512 already in the collection and produces tensors in {Q4_K, Q6_K, F32}. Only deviate if the user names a specific quant or Q4_K_M is unavailable.
List what's available:
uv run python -c "
from huggingface_hub import HfApi
for s in HfApi().repo_info('<owner>/<repo>', files_metadata=True).siblings:
if s.rfilename.endswith('.gguf'):
print(f'{s.rfilename:55s} {s.size/(1024*1024):8.1f} MB')
"
2. Compatibility check
The converter script's own --dry-run validates types and aborts on the first unsupported tensor. Since gguf.GGUFReader must read the whole file, you have to download first (step 3) before you can dry-run. But also enumerate tensor types up front so you can foresee failures and plan a fix before committing to a huge download:
uv run python -c "
import gguf
r = gguf.GGUFReader('models/<name>/<file>.gguf', 'r')
from collections import Counter
print(Counter(t.tensor_type.name for t in r.tensors))
supported = {'F32','F16','BF16','Q4_0','Q8_0','Q4_K','Q5_K','Q6_K','MXFP4'}
bad = [(t.name, t.tensor_type.name) for t in r.tensors if t.tensor_type.name not in supported]
for n,k in bad: print('UNSUPPORTED:', n, k)
"
If any unsupported tensor types appear, extend gguf_convert_endian.py before running the conversion. See references/extending_converter.md for block layouts and the pattern.
3. Download
mkdir -p models/<name>
hf download <owner>/<repo> <quant-file>.gguf --local-dir models/<name>
4. Convert
The converter modifies the file in-place and prompts YES, I am sure> before writing. Pipe YES in:
echo "YES" | uv run gguf_convert_endian.py models/<name>/<quant-file>.gguf big
Verify:
uv run python -c "import gguf; print(gguf.GGUFReader('models/<name>/<quant-file>.gguf','r').endianess.name)"
5. Rename and write README
Add the -be suffix matching the collection convention:
mv models/<name>/<quant-file>.gguf models/<name>/<quant-file>-be.gguf
Write models/<name>/README.md using this template (fill <license>, <base-model>, <name>, <quant>):
---
license: <license>
base_model:
- <base-model>
tags:
- BE
---
Big Endian GGUF <name> (<quant>), converted from [<source-owner>/<source-repo>](https://huggingface.co/<source-owner>/<source-repo>) for use on IBM i with Llama.cpp.
<base-model> is the original upstream model (e.g. google/gemma-4-E2B-it), NOT the GGUF republisher. If uncertain, inspect the source repo's own README for its base_model frontmatter.
6. Create repo and upload
uv run python -c "
from huggingface_hub import HfApi
print(HfApi().create_repo('models-for-i/<name>', repo_type='model', exist_ok=True))
"
Then upload with Xet disabled:
HF_HUB_DISABLE_XET=1 hf upload models-for-i/<name> models/<name> \
--repo-type model \
--commit-message "Add big-endian <quant> converted from <source-owner>/<source-repo>" \
--exclude ".cache/*"
Why HF_HUB_DISABLE_XET=1: the default Xet uploader hangs after partial uploads against this collection (sockets stall in CLOSE_WAIT, ~72% in; observed twice). Plain HTTPS/LFS uploads cleanly at 7-9 MB/s with no intervention needed. Always use the flag until the upstream xet-core bug is fixed.
Verify the commit:
uv run python -c "
from huggingface_hub import HfApi
info = HfApi().repo_info('models-for-i/<name>', files_metadata=True)
for s in info.siblings: print(f'{s.rfilename:45s} {s.size}')
"
Troubleshooting
ValueError: Cannot handle type <QTYPE> for tensor '<name>' โ the file uses a tensor type the converter doesn't support. Extend gguf_convert_endian.py per references/extending_converter.md. The file is untouched at this point (the check runs before writes).
Upload appears stuck at N% for 5+ minutes with xet enabled โ kill the process (kill $(pgrep -f "python3.*hf upload")) and retry with HF_HUB_DISABLE_XET=1. The xet cache dedups already-uploaded chunks, but the LFS path is more reliable here.
Auth: hf auth whoami must show models-for-i in orgs. If not, run hf auth login with a write-scope token.