| name | dify-docs-format-check-cjk |
| description | Check formatting compliance in changed Chinese and Japanese documentation against writing-guides/formatting-guide.md, tools/translate/formatting-zh.md, and tools/translate/formatting-ja.md. Use after finalizing a translation batch, or when the user says "check formatting (zh/ja)", "check CJK formatting", or "format audit" on translated content. |
CJK Formatting Check (Chinese + Japanese)
Purpose
Verify Chinese (zh/) and Japanese (ja/) documentation against every rule in:
- writing-guides/formatting-guide.md: general rules
- tools/translate/formatting-zh.md: Chinese-specific rules
- tools/translate/formatting-ja.md: Japanese-specific rules
Mechanical rules are enforced by the linter script (check-format-cjk.py); judgment-call rules are checked by reading the file. The linter automatically selects the right rule set based on whether the path starts with zh/ or ja/.
Before Starting
Audit the entire file, not just the diff. Default to files currently under review, detected via:
- git diff --name-only
- git diff --cached --name-only
- Untracked files from git status --porcelain (lines starting with ??)
Filter for .mdx and .md files under zh/ or ja/. If no files are detected, ask the user which files to check.
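The detection and filtering step can be sketched in Python as follows. This is a minimal illustration, not part of the skill: the helper names `parse_porcelain_untracked` and `filter_cjk_docs` are hypothetical, and the sketch assumes repo-relative paths with forward slashes.

```python
from pathlib import PurePosixPath

DOC_SUFFIXES = {".md", ".mdx"}
CJK_PREFIXES = ("zh", "ja")


def parse_porcelain_untracked(porcelain_output):
    """Return paths from `git status --porcelain` lines that start with '??'.

    Assumption: paths are plain. Git may quote paths that contain special
    characters, and this sketch does not unquote them.
    """
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("?? ")
    ]


def filter_cjk_docs(paths):
    """Keep .md/.mdx files under zh/ or ja/, de-duplicated, order preserved."""
    seen = set()
    kept = []
    for raw in paths:
        path = PurePosixPath(raw)
        is_doc = path.suffix in DOC_SUFFIXES
        is_cjk = bool(path.parts) and path.parts[0] in CJK_PREFIXES
        if is_doc and is_cjk and raw not in seen:
            seen.add(raw)
            kept.append(raw)
    return kept
```

Feeding it the combined output of the three git commands yields the audit list; an empty result is the cue to ask the user which files to check.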
Because translations are typically produced as a zh/ja pair from the same English source, it is natural to audit both languages in a single session.
Checks
Part 1: Deterministic (run the linter)
Run the linter:
python3 .claude/skills/dify-docs-format-check-cjk/check-format-cjk.py <file> [<file> ...]
The script selects zh or ja rules based on the file path. Rules fall into three groups: shared (both zh and ja), zh-only, and ja-only.
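As an illustration of path-based selection, the sketch below maps a file path to the rule groups that would apply. It is an assumption about how such a dispatch could look, not the actual code of check-format-cjk.py; `select_language` and `rule_groups_for` are hypothetical names.

```python
def select_language(path):
    """Return 'zh' or 'ja' based on the leading path segment, else None."""
    if path.startswith("zh/"):
        return "zh"
    if path.startswith("ja/"):
        return "ja"
    return None


def rule_groups_for(path):
    """Return the rule groups to run: shared rules plus one language group."""
    lang = select_language(path)
    if lang is None:
        # Not a zh/ or ja/ file, so no CJK rule set applies in this sketch.
        return []
    return ["shared", f"{lang}-only"]
```

For example, `rule_groups_for("ja/guides/intro.mdx")` selects the shared group and the ja-only group, while an `en/` path selects nothing.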
Shared structural rules (zh + ja)
- F-title-missing: frontmatter missing title.
- F-desc-trailing-period: description ends with a period.
- F-quote-needed / F-quote-unnecessary / F-single-quote: frontmatter quoting rules.
- F-blank-after-fm: no blank line between frontmatter close and body.
- H-trailing-hash, H-blank-before, H-blank-after, H-skip-level: heading structure rules from the general guide.
- B-trailing-colon-inside: colon inside **...**.
- L-asterisk-bullet, L-nested-indent, L-blank-before, L-blank-after: list structure.
- C-no-language, C-blank-before, C-blank-after: code block rules.
- Li-click-here, Li-http-external: link rules.
- I-raw-img-tag, I-alt-too-long, I-caption-alt-mismatch, I-filename-*: image rules (same set as the EN skill).
- M-tab-no-title: Mintlify component rules.
- S-double-blank: spacing.
- P-em-dash-spaces, P-en-dash-spaces: general punctuation.
Shared CJK rules (zh + ja)
- CJK-latin-spacing: CJK character directly adjacent to a Latin letter, digit, or backtick without a space. Exceptions: punctuation boundary, start/end of line, inside code/URLs.
- CJK-halfwidth-punct: half-width punctuation , . : ; ? ! ( ) directly adjacent to a CJK character. (Slash and other exceptions are handled by the language-specific rules below.)
- CJK-bold-no-space: bold span **...** adjacent to a CJK character without a space on each side.
- CJK-link-no-space: markdown link text adjacent to a CJK character without a space on each side.
- CJK-italic: *text* italic used on a CJK span. Chinese and Japanese should use bold, never italic.
- CJK-em-dash: em dash (—) or double em dash (——) appears in CJK text. Restructure the sentence.
- CJK-disclaimer-missing: translation disclaimer (<Note> ⚠️ ...) missing directly below the frontmatter.
- CJK-cross-lang-link: internal link begins with the wrong language prefix (e.g., /en/... inside a zh/ file).
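To make the CJK-latin-spacing rule concrete, here is a minimal sketch of how such a check could be written. It is not the linter's implementation. The character ranges are an assumed approximation (hiragana, katakana, and the main Han block), inline code is masked so its contents are ignored, and the URL exception is not handled.

```python
import re

# Assumed approximation of "CJK character": hiragana, katakana, common Han.
CJK = r"\u3040-\u30ff\u4e00-\u9fff"
LATIN = r"A-Za-z0-9`"

# Zero-width lookahead so that overlapping adjacencies are all reported.
ADJACENT = re.compile(rf"(?=[{CJK}][{LATIN}]|[{LATIN}][{CJK}])")
INLINE_CODE = re.compile(r"`[^`]*`")


def mask_inline_code(line):
    """Replace the inside of each inline code span with 'x', keeping columns."""
    return INLINE_CODE.sub(
        lambda m: "`" + "x" * (len(m.group()) - 2) + "`", line
    )


def find_latin_spacing(line):
    """Return 0-based columns where CJK touches a Latin letter, digit, or backtick."""
    masked = mask_inline_code(line)
    return [m.start() for m in ADJACENT.finditer(masked)]
```

Because full-width punctuation such as 。 falls outside the assumed ranges, a Latin word followed directly by CJK punctuation is not reported, which matches the punctuation-boundary exception.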
Heading rules
- H-heading-end-punct: CJK heading ends with sentence-ending punctuation (for example 。, ！, or ？).
Chinese-only rules
- ZH-ascii-ellipsis: ... used where the Chinese ellipsis …… is expected.
- ZH-fullwidth-slash: full-width slash ／ used; must be /.
- ZH-quotes: mainland-style double or single quotation marks "", '' used; must be corner brackets 「」 (single) or 『』 (nested).
- ZH-range-hyphen: numeric range uses a hyphen (-) or a dash; must use ～.
- ZH-percent-space: space between a digit and % or °.
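Three of the Chinese-only rules are simple enough to sketch with regular expressions. The patterns and messages below are illustrative assumptions, not the linter's own; in particular, the ellipsis pattern only fires when ... directly follows a Han character, and `check_zh_line` is a hypothetical helper.

```python
import re

# (rule id, pattern, message); patterns are assumed approximations.
ZH_RULES = [
    ("ZH-ascii-ellipsis", re.compile(r"[\u4e00-\u9fff]\.{3}"),
     "use …… instead of ..."),
    ("ZH-fullwidth-slash", re.compile("／"),
     "use / instead of ／"),
    ("ZH-percent-space", re.compile(r"\d\s+[%°]"),
     "remove the space before % or °"),
]


def check_zh_line(line, line_no):
    """Return findings for one line in the report's 'Line n [rule-id]: message' shape."""
    findings = []
    for rule_id, pattern, message in ZH_RULES:
        if pattern.search(line):
            findings.append(f"Line {line_no} [{rule_id}]: {message}")
    return findings
```

A real check would also need to skip code blocks and inline code, which this sketch leaves out.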
Japanese-only rules
- JA-fullwidth-digit: full-width digit used (０, １, ...).
- JA-fullwidth-latin: full-width Latin letter used (Ａ, Ｂ, ...).
- JA-fullwidth-space: full-width space (U+3000) used.
- JA-sentence-too-long: sentence longer than 80 Japanese characters.
- JA-go-prefix: ご prefix used on a verb in the "avoid" list (ご確認ください, ご参照ください, ご入力ください). ご利用 is allowed.
- JA-heading-sentence-ending: heading ends with a polite sentence ending such as 〜ます, indicating a full sentence rather than a noun phrase.
- JA-style-mix: the file contains both です/ます and だ/である forms in body text. Only one register should be used.
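The full-width checks and the style-mix heuristic can be sketched as below. This is an assumed approximation rather than the linter's code: `check_ja_text` is a hypothetical helper, the full-width ranges come from the Unicode Halfwidth and Fullwidth Forms block, and the register detection only looks at endings that sit directly before 。.

```python
import re

FULLWIDTH_DIGIT = re.compile(r"[\uff10-\uff19]")              # ０ to ９
FULLWIDTH_LATIN = re.compile(r"[\uff21-\uff3a\uff41-\uff5a]")  # Ａ to Ｚ, ａ to ｚ
FULLWIDTH_SPACE = re.compile("\u3000")                         # ideographic space

# Heuristic register detection: sentence endings directly before 。
POLITE_END = re.compile(r"(です|ます)。")
PLAIN_END = re.compile(r"(だ|である)。")


def check_ja_text(text):
    """Return the ids of the sketched JA rules that the text triggers."""
    ids = []
    if FULLWIDTH_DIGIT.search(text):
        ids.append("JA-fullwidth-digit")
    if FULLWIDTH_LATIN.search(text):
        ids.append("JA-fullwidth-latin")
    if FULLWIDTH_SPACE.search(text):
        ids.append("JA-fullwidth-space")
    if POLITE_END.search(text) and PLAIN_END.search(text):
        ids.append("JA-style-mix")
    return ids
```

The style-mix result is only a candidate for review: quoted text or example sentences can legitimately use a different register from the body.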
Part 2: Judgment-call review (LLM reads each file)
For each changed file, read it and look for:
Translatable elements
- Tab titles (<Tab title="...">) translated with the glossary value.
- Frame captions and image alt text translated.
- Bold UI labels translated, matching the codebase i18n file (web/i18n/zh-Hans/ for zh, web/i18n/ja-JP/ for ja). Cross-check with the terminology-check skill.
- Natural-language prompt examples inside code blocks translated. Variable placeholders ({{variable_name}}) stay unchanged.
Anchor translation (zh/ja)
- Cross-references [text](#anchor) or [text](/path#anchor) use the translated heading slug, not the English original. Example: the heading ## 响应 produces #响应, not #response.
Chinese style
- Enumeration comma 、 used for parallel items within a sentence (where Chinese separates items that would use commas in English).
- Chinese ellipsis …… used correctly; not combined with 等 in the same phrase.
- Arabic numerals used for technical content (3 种, not 三种). This is a judgment call, since some idiomatic expressions use Chinese numerals.
- Translationese patterns from tools/translate/formatting-zh.md: redundant 你的, unnecessary 会, 当...时 wrappers, 能够 instead of 能, 可以 where 可 would suffice. Flag sentences that read as machine-translated.
Japanese style
- です/ます maintained in body text.
- Headings are noun phrases, not full sentences.
- Katakana long-vowel mark rules: short loanwords (3 morae or fewer) keep the trailing ー; longer ones drop it. Established compound katakana terms (ワークフロー, ナレッジベース) match the glossary.
- Middle dot ・ only where needed for readability; established compound terms have no middle dot.
- Translationese patterns from tools/translate/formatting-ja.md: redundant あなたの, 〜することができます instead of 〜できます, 〜とき/〜場合 wrappers, stacked の. Flag sentences that read as machine-translated.
General
- Terminology matches the glossary (writing-guides/glossary.md). Prefer invoking the terminology-check skill for rigorous checking rather than duplicating its logic here.
Output Format
## CJK Formatting Check Results
### File: {path} ({lang})
**Deterministic violations** ({n})
- Line {n} [{rule-id}]: {message}
- ...
**Judgment-call findings** ({n})
- Line {n}: {description of issue} - {rule area}
- ...
**Clean checks**
- ✅ Disclaimer present
- ✅ No bold/CJK spacing issues
- ...
Group by file. If a file has no issues at all, report a single ✅ line.
Important
- Do NOT modify any files. This is a read-only audit.
- When a deterministic rule flags something that is clearly intentional on inspection (e.g., a half-width comma inside a Latin acronym, a mainland quotation mark quoted from an external source), surface it but note the ambiguity so the user can decide.
- The terminology-check skill is the authoritative source for glossary and UI-label verification. If a finding overlaps, defer to that skill and just note the overlap.
- Japanese sentence-length and style-mix checks are heuristic. They flag candidates for human review; they are not definitive rule violations.