
Xilu016 Stem Compare

src.xil_pipeline.XILU016_stem_compare

Cross-reference Whisper transcripts from xil-stem-verify against the parsed script.

Reads a stem_verify JSON report (produced by XILU015) and a parsed script JSON (produced by XILP001), joins on seq, and flags dialogue stems where the Whisper transcript differs significantly from the scripted line.
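The join described above can be pictured with a minimal sketch. The field names used here (`seq`, `type`, `text`, `transcript`) and the helper `join_on_seq` are assumptions made for illustration; the real report schemas and the module's internal functions may differ.

```python
def join_on_seq(parsed_entries, stem_entries):
    """Pair each parsed dialogue entry with its stem (or None), keyed on seq.

    Hypothetical helper: the entry layout is assumed, not taken from the module.
    """
    # Index the stem_verify entries by seq for constant-time lookup.
    stems_by_seq = {stem["seq"]: stem for stem in stem_entries}
    pairs = []
    for entry in parsed_entries:
        # Only dialogue entries are compared (assumed "type" field).
        if entry.get("type") != "dialogue":
            continue
        # A missing stem is kept as None so it can be flagged later.
        pairs.append((entry, stems_by_seq.get(entry["seq"])))
    return pairs


# Illustrative data only; not taken from a real episode.
parsed = [
    {"seq": 1, "type": "dialogue", "text": "Hello there."},
    {"seq": 2, "type": "sfx", "text": "DOOR SLAM"},
    {"seq": 3, "type": "dialogue", "text": "Who is it?"},
]
stems = [
    {"seq": 1, "transcript": {"text": "Hello there."}},
    {"seq": 2, "transcript": {"text": ""}},
]
pairs = join_on_seq(parsed, stems)
```

In this sketch, seq 1 is paired with its stem, seq 2 is skipped because it is not dialogue, and seq 3 is paired with `None` because the verify report has no stem for it.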

Status codes

ok               similarity >= threshold (silent; not written to flags list)
garbled          similarity < threshold and transcript is non-empty
silent           transcript.text is empty/whitespace (Whisper heard nothing)
no_stem          dialogue entry in parsed has no matching stem in the verify report
not_transcribed  stem exists but transcript is null (xil-stem-verify --no-transcribe)

SFX stems are always excluded, because Whisper output on sound effects is meaningless.
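The status codes can be read as a short decision chain. The sketch below is one way to express it, under stated assumptions: the helper name `classify`, the stem layout (`transcript` holding a dict with a `text` key), and the use of `difflib.SequenceMatcher` as the similarity measure are all illustrative choices. The module's actual similarity metric is not shown in this page.

```python
from difflib import SequenceMatcher


def classify(script_text, stem, threshold=0.75):
    """Return a status code for one dialogue line (hypothetical helper)."""
    if stem is None:
        # No matching stem in the verify report.
        return "no_stem"
    transcript = stem.get("transcript")
    if transcript is None:
        # Stem exists but was never transcribed.
        return "not_transcribed"
    heard = transcript.get("text") or ""
    if not heard.strip():
        # Whisper produced empty or whitespace-only text.
        return "silent"
    # Assumed metric: case-insensitive character-level ratio in [0, 1].
    similarity = SequenceMatcher(None, script_text.lower(), heard.lower()).ratio()
    return "ok" if similarity >= threshold else "garbled"


status_missing = classify("Hello there.", None)
status_untranscribed = classify("Hello there.", {"transcript": None})
status_silent = classify("Hello there.", {"transcript": {"text": "   "}})
status_ok = classify("Hello there.", {"transcript": {"text": "Hello there."}})
status_garbled = classify("Hello there.", {"transcript": {"text": "zzzzqqqq"}})
```

The order of the checks matters: the missing-stem and null-transcript cases are ruled out before any text is compared, so a similarity score is only computed when there is a non-empty transcript to compare against.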

Usage

xil-stem-compare --episode S01E01
xil-stem-compare --show the413 --episode S01E01 --threshold 0.70
xil-stem-compare --stem-verify path/to/report.json --parsed path/to/parsed.json
xil-stem-compare --episode S01E01 --output compare_S01E01.json
xil-stem-compare --episode S01E01 --csv

logger module-attribute

logger = get_logger(__name__)

get_parser

get_parser() -> argparse.ArgumentParser
Source code in src/xil_pipeline/XILU016_stem_compare.py
def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="xil-stem-compare",
        description="Cross-reference Whisper transcripts against parsed script dialogue to flag garbled stems.",
    )
    parser.add_argument("--show", "-s", default=None, metavar="SLUG",
                        help="Show slug (default: resolved from project.json)")
    parser.add_argument("--episode", "-e", default=None, metavar="TAG",
                        help="Episode tag, e.g. S01E01 (derives both JSON paths if not overridden)")
    parser.add_argument("--stem-verify", default=None, metavar="FILE",
                        help="Path to stem_verify JSON (default: <workspace>/parsed/<slug>/stem_verify_<episode>.json)")
    parser.add_argument("--parsed", default=None, metavar="FILE",
                        help="Path to parsed script JSON (default: <workspace>/parsed/<slug>/parsed_<episode>.json)")
    parser.add_argument("--threshold", type=float, default=0.75, metavar="FLOAT",
                        help="Similarity below this marks a stem as garbled (default: 0.75)")
    parser.add_argument("--output", "-o", default=None, metavar="FILE",
                        help="Write full JSON report to this file (in addition to terminal output)")
    parser.add_argument("--csv", action="store_true",
                        help="Print flagged entries as CSV to stdout instead of the banner summary")
    return parser
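The help strings for --stem-verify and --parsed describe default paths derived from the workspace, show slug, and episode tag. A minimal sketch of that derivation follows; the helper name `default_report_paths` and the way the workspace root is passed in are assumptions, since the module's own resolution logic is not shown here.

```python
from pathlib import Path


def default_report_paths(workspace, slug, episode):
    """Build the default stem_verify and parsed JSON paths (hypothetical helper).

    Follows the pattern given in the help text:
    <workspace>/parsed/<slug>/stem_verify_<episode>.json
    <workspace>/parsed/<slug>/parsed_<episode>.json
    """
    base = Path(workspace) / "parsed" / slug
    return base / f"stem_verify_{episode}.json", base / f"parsed_{episode}.json"


# "/tmp/ws" is a placeholder workspace root used only for illustration.
stem_verify_path, parsed_path = default_report_paths("/tmp/ws", "the413", "S01E01")
```

An explicit --stem-verify or --parsed value would presumably take precedence over these derived paths, as the "if not overridden" wording in the --episode help suggests.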

main

main() -> None

CLI entry point for Whisper transcript vs. script cross-reference.

Source code in src/xil_pipeline/XILU016_stem_compare.py
def main() -> None:
    """CLI entry point for Whisper transcript vs. script cross-reference."""
    configure_logging()
    args = get_parser().parse_args()
    with run_banner():
        _run(args)