code-tape subtitle postprocessor ONNX

This is the browser-local ONNX export of the code-tape subtitle post-processing model. It is the default LLM used by the code-tape web app for the "纠错并生成章节" ("correct and generate chapters") workflow.

The model receives ASR subtitle segments plus code context and returns strict JSON:

sparse subtitle corrections for frontend/code terminology;
playback chapter jump points derived from subtitle timestamps;
no Markdown, no explanation, no extra wrapper text.

This model does not perform ASR. In code-tape, ASR is handled separately by Whisper; this ONNX model only post-processes the resulting subtitle text.

Repository role

code-tape publishes this model family in three forms:

Repository	Purpose
`ceilf6/code-tape-subtitle-postprocessor-lora`	LoRA adapter for reproducibility and continued fine-tuning.
`ceilf6/code-tape-subtitle-postprocessor-merged`	Full merged Hugging Face model.
`ceilf6/code-tape-subtitle-postprocessor-onnx`	This Transformers.js-compatible ONNX export for browser-local inference.

Use this repository when integrating with the browser app.

Intended contract

Input payload:

{
  "context": {
    "fileName": "SubtitlePanel.tsx",
    "code": "await postProcessor.process({ track, context });",
    "runtimeOutput": "",
    "glossary": ["SubtitlePanel", "postProcessor", "chapters"]
  },
  "inputSegments": [
    { "id": "subtitle-1", "text": "这里创建 hugging face 字幕 post processor" },
    { "id": "subtitle-2", "text": "最后生成 corrections 和 chapters" }
  ],
  "timeline": [
    { "id": "subtitle-1", "startMs": 0, "endMs": 1600 },
    { "id": "subtitle-2", "startMs": 1600, "endMs": 3300 }
  ]
}

Expected output shape:

{
  "segments": [
    { "id": "subtitle-1", "text": "这里创建 Hugging Face 字幕 postProcessor" }
  ],
  "chapters": [
    { "title": "创建字幕后处理器", "startMs": 0, "endMs": 1600 },
    { "title": "生成纠错和章节", "startMs": 1600, "endMs": 3300 }
  ]
}

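`segments` is a sparse change set: it lists only the subtitles whose text changed. Omitted subtitle segments are treated as unchanged by the application, which is why `subtitle-2` does not appear in the example above.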
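A minimal sketch of how a sparse change set can be merged back into the source subtitles. The function name `applySparseCorrections` and the `SubtitleSegment` type are illustrative assumptions, not the code-tape implementation.

```typescript
// Hypothetical type mirroring the { id, text } objects in the contract above.
interface SubtitleSegment {
  id: string;
  text: string;
}

// Returns a new array in source order. Segments without a correction are
// passed through untouched, which matches the "omitted means unchanged" rule.
function applySparseCorrections(
  source: SubtitleSegment[],
  corrections: SubtitleSegment[],
): SubtitleSegment[] {
  const correctedTextById = new Map<string, string>();
  for (const correction of corrections) {
    correctedTextById.set(correction.id, correction.text);
  }
  return source.map((segment) => {
    const corrected = correctedTextById.get(segment.id);
    return corrected === undefined ? segment : { ...segment, text: corrected };
  });
}

// Usage with the example payload from this section.
const exampleMerged = applySparseCorrections(
  [
    { id: "subtitle-1", text: "这里创建 hugging face 字幕 post processor" },
    { id: "subtitle-2", text: "最后生成 corrections 和 chapters" },
  ],
  [{ id: "subtitle-1", text: "这里创建 Hugging Face 字幕 postProcessor" }],
);
console.log(exampleMerged.map((segment) => segment.text));
```

Corrections that reference an id missing from the source are silently ignored here; rejecting them is a separate validation concern.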

Browser usage

import { pipeline } from "@huggingface/transformers";

const generator = await pipeline(
  "text-generation",
  "ceilf6/code-tape-subtitle-postprocessor-onnx",
  { device: "wasm", dtype: "q8" },
);

const messages = [
  {
    role: "system",
    content: [
      "You are the code-tape subtitle post-processing model.",
      "Only output one JSON object.",
      "Goal: correct ASR subtitle text for frontend/code terms and create playback chapter jump points.",
      'Output shape: {"segments":[{"id":"subtitle-1","text":"corrected text"}],"chapters":[{"title":"问题分析","startMs":0,"endMs":1000}]}',
    ].join("\n"),
  },
  {
    role: "user",
    content: JSON.stringify({
      context: { fileName: "Counter.tsx", code: "", runtimeOutput: "", glossary: ["useState"] },
      inputSegments: [{ id: "subtitle-1", text: "这里用 use state" }],
      timeline: [{ id: "subtitle-1", startMs: 0, endMs: 1200 }],
    }),
  },
];

const output = await generator(messages, {
  max_new_tokens: 384,
  do_sample: false,
  return_full_text: false,
});
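The `output` value still has to be reduced to a single JSON object before it can be used. The sketch below assumes the pipeline returns an array of `{ generated_text }` entries, where `generated_text` is either a plain string or, for chat input, the message list with the assistant reply last. That shape is an assumption to verify against the installed Transformers.js version; `extractGeneratedText` and `parseModelJson` are hypothetical helper names.

```typescript
// Assumed result shapes; check them against your Transformers.js version.
type ChatMessage = { role: string; content: string };
type GenerationResult = { generated_text: string | ChatMessage[] };

// Pulls the newly generated text out of the first result, or "" if absent.
function extractGeneratedText(output: GenerationResult[]): string {
  const first = output[0];
  if (!first) return "";
  const generated = first.generated_text;
  if (typeof generated === "string") return generated;
  const last = generated[generated.length - 1];
  return last !== undefined && last.role === "assistant" ? last.content : "";
}

// Strict parse: the whole reply must be one JSON object, with no Markdown
// fence or wrapper text around it. Anything else yields null.
function parseModelJson(raw: string): Record<string, unknown> | null {
  try {
    const parsed: unknown = JSON.parse(raw.trim());
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    return parsed as Record<string, unknown>;
  } catch {
    return null;
  }
}
```

With these helpers, `parseModelJson(extractGeneratedText(output))` returns either an object to validate further or `null`, in which case the caller keeps the original subtitles.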

In production, code-tape loads the validated WASM q8 path directly. The q4 and q4f16 exports were not published for the current v12 artifact because they produced malformed JSON in local Transformers.js smoke testing. The application also handles browser cache write failures and validates every model response before applying it.

Integration notes

Public browser loading does not require a Hugging Face token.
Keep prompts short. The code-tape app budgets source code, runtime output, and output token count to keep local inference responsive.
Validate JSON before use. If a response contains invalid JSON, unknown segment ids, duplicate ids, empty text, overlapping chapters, or chapters outside the subtitle timeline, the integration must reject it and fall back safely.
This model should run after ASR, not before ASR.
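The validation rules listed above can be sketched as a single function that returns `null` whenever a response should be rejected. This is an illustration under stated assumptions, not the code-tape validator: the name `validateResult` and all type names are hypothetical, and requiring chapters to be sorted by start time is an assumption drawn from the ordering check mentioned under Evaluation.

```typescript
interface Segment { id: string; text: string }
interface Chapter { title: string; startMs: number; endMs: number }
interface TimelineEntry { id: string; startMs: number; endMs: number }
interface PostProcessResult { segments: Segment[]; chapters: Chapter[] }

// Takes an already-parsed JSON value. Returns a cleaned result, or null so
// the caller can fall back to the unmodified subtitles.
function validateResult(
  value: unknown,
  timeline: TimelineEntry[],
): PostProcessResult | null {
  if (typeof value !== "object" || value === null || timeline.length === 0) return null;
  const { segments, chapters } = value as { segments?: unknown; chapters?: unknown };
  if (!Array.isArray(segments) || !Array.isArray(chapters)) return null;

  const knownIds = new Set(timeline.map((entry) => entry.id));
  const seenIds = new Set<string>();
  const cleanSegments: Segment[] = [];
  for (const segment of segments) {
    if (typeof segment !== "object" || segment === null) return null;
    const { id, text } = segment as { id?: unknown; text?: unknown };
    if (typeof id !== "string" || typeof text !== "string") return null;
    // Reject unknown ids, duplicate ids, and empty text.
    if (!knownIds.has(id) || seenIds.has(id) || text.trim() === "") return null;
    seenIds.add(id);
    cleanSegments.push({ id, text }); // rebuilding drops any extra fields
  }

  const trackStart = Math.min(...timeline.map((entry) => entry.startMs));
  const trackEnd = Math.max(...timeline.map((entry) => entry.endMs));
  let previousEnd = trackStart;
  const cleanChapters: Chapter[] = [];
  for (const chapter of chapters) {
    if (typeof chapter !== "object" || chapter === null) return null;
    const { title, startMs, endMs } = chapter as {
      title?: unknown; startMs?: unknown; endMs?: unknown;
    };
    if (typeof title !== "string") return null;
    if (typeof startMs !== "number" || typeof endMs !== "number") return null;
    if (!Number.isFinite(startMs) || !Number.isFinite(endMs)) return null;
    // Each chapter must start at or after the previous one ends (no overlap,
    // ascending order) and must stay inside the subtitle timeline.
    if (startMs < previousEnd || endMs <= startMs || endMs > trackEnd) return null;
    previousEnd = endMs;
    cleanChapters.push({ title, startMs, endMs });
  }
  return { segments: cleanSegments, chapters: cleanChapters };
}
```

Returning a rebuilt object rather than the input also strips unexpected properties, such as timing fields placed inside `segments`.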

Training and export lineage

Fine-tune a LoRA adapter from HuggingFaceTB/SmolLM2-135M-Instruct.
Merge the adapter into a full Hugging Face model.
Export/quantize the merged model to ONNX for @huggingface/transformers browser inference.

Evaluation

code-tape evaluates this model family with project-specific checks:

JSON parseability;
sparse segment reference validity;
glossary preservation after sparse corrections are applied to the source subtitles;
chapter ordering, overlap, and bounds within the subtitle timeline.

No broad general-purpose benchmark score is claimed.

Current v12 smoke result

On the code-tape validation prompt with the inputSegments plus timeline contract:

q8 load: 651 ms;
q8 generation: 1274 ms;
JSON valid: yes;
unknown segment ids: 0;
extra timing fields inside segments: 0.

Limitations

The model is small and domain-specific; malformed JSON is possible.
It is optimized for frontend/code explanation subtitles, not arbitrary subtitles.
It cannot transcribe audio.
Long subtitle tracks should be split before local browser inference.
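One simple way to split a long track is by segment count, keeping each chunk's `inputSegments` and `timeline` aligned. The sketch below is an assumption about how that could look: `splitTrack`, the `TrackChunk` type, and the default of 8 segments per chunk are all hypothetical, and the right chunk size depends on the prompt budget of the integration.

```typescript
interface InputSegment { id: string; text: string }
interface TimelineEntry { id: string; startMs: number; endMs: number }
interface TrackChunk { inputSegments: InputSegment[]; timeline: TimelineEntry[] }

// Splits a track into consecutive chunks of at most maxSegments subtitles.
// Each chunk carries only the timeline entries for its own segments.
function splitTrack(
  inputSegments: InputSegment[],
  timeline: TimelineEntry[],
  maxSegments = 8, // hypothetical default, tune per prompt budget
): TrackChunk[] {
  if (!Number.isInteger(maxSegments) || maxSegments < 1) {
    throw new RangeError("maxSegments must be a positive integer");
  }
  const timingById = new Map<string, TimelineEntry>();
  for (const entry of timeline) {
    timingById.set(entry.id, entry);
  }
  const chunks: TrackChunk[] = [];
  for (let start = 0; start < inputSegments.length; start += maxSegments) {
    const slice = inputSegments.slice(start, start + maxSegments);
    const sliceTimeline: TimelineEntry[] = [];
    for (const segment of slice) {
      const timing = timingById.get(segment.id);
      if (timing !== undefined) sliceTimeline.push(timing);
    }
    chunks.push({ inputSegments: slice, timeline: sliceTimeline });
  }
  return chunks;
}
```

Each chunk can then be sent as its own request with the same `context`. Chapters come back per chunk, so a chapter cannot span a chunk boundary with this approach.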

Privacy and security

The intended path is browser-local inference. Audio transcription, subtitle correction, and chapter generation can run without sending media or subtitles to a hosted inference API.

Do not include secrets, private source code, credentials, or access tokens in prompts unless you control the full runtime and storage environment.