Cross-reference Whisper transcripts from xil-stem-verify against the parsed script.
Reads a stem_verify JSON report (produced by XILU015) and a parsed script JSON
(produced by XILP001), joins on seq, and flags dialogue stems where the Whisper
transcript differs significantly from the scripted line.
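The join described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: only the `seq` key comes from the description, while the `type` field, its `"dialogue"` value, and the function name `join_on_seq` are assumptions made for the example.

```python
from typing import Optional


def join_on_seq(parsed_entries: list, stems: list) -> list:
    """Pair each dialogue entry with its stem record, or None if missing."""
    # Index the verify report once so each lookup is O(1).
    by_seq = {stem["seq"]: stem for stem in stems}
    pairs = []
    for entry in parsed_entries:
        # Assumed field: entries carry a "type" that separates dialogue from SFX.
        if entry.get("type") != "dialogue":
            continue
        stem: Optional[dict] = by_seq.get(entry["seq"])
        pairs.append((entry, stem))
    return pairs
```

A dialogue entry paired with `None` is one that has no matching stem in the verify report.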
Status codes:
    ok               similarity >= threshold (silent; not written to the flags list)
    garbled          similarity < threshold and the transcript is non-empty
    silent           transcript.text is empty or whitespace (Whisper heard nothing)
    no_stem          dialogue entry in the parsed script has no matching stem in the verify report
    not_transcribed  stem exists but transcript is null (xil-stem-verify --no-transcribe)
SFX stems are always excluded because Whisper output on sound effects is meaningless.
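The status codes above amount to a small decision ladder, which might look like the sketch below. The similarity metric is not stated here, so `difflib.SequenceMatcher` is used purely as a stand-in; the `normalize` helper and the `classify` name are likewise hypothetical. The `transcript` and `text` keys follow the wording of the status descriptions.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting noise is not scored."""
    return " ".join(text.lower().split())


def classify(scripted: str, stem: Optional[dict], threshold: float = 0.75) -> str:
    """Return a status code for one dialogue entry and its (possibly missing) stem."""
    if stem is None:
        return "no_stem"          # nothing in the verify report for this seq
    transcript = stem.get("transcript")
    if transcript is None:
        return "not_transcribed"  # stem exists but was never transcribed
    heard = transcript.get("text") or ""
    if not heard.strip():
        return "silent"           # Whisper heard nothing
    # Assumed metric: difflib ratio in [0, 1]; the real tool may score differently.
    score = SequenceMatcher(None, normalize(scripted), normalize(heard)).ratio()
    return "ok" if score >= threshold else "garbled"
```

The checks are ordered so that a missing or untranscribed stem is never scored, which keeps `garbled` reserved for cases where Whisper actually produced text.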
Usage::

    xil-stem-compare --episode S01E01
    xil-stem-compare --show the413 --episode S01E01 --threshold 0.70
    xil-stem-compare --stem-verify path/to/report.json --parsed path/to/parsed.json
    xil-stem-compare --episode S01E01 --output compare_S01E01.json
    xil-stem-compare --episode S01E01 --csv
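For the `--csv` form, emitting flagged entries could be done along these lines. The column names in `FIELDS`, the shape of each flag dict, and the function name are all hypothetical; the actual CSV layout is not documented here.

```python
import csv
import sys

# Hypothetical column set; the real tool's CSV header may differ.
FIELDS = ["seq", "status", "similarity", "scripted", "heard"]


def write_flags_csv(flags, out=sys.stdout):
    """Write every non-ok entry as one CSV row, preceded by a header row."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for flag in flags:
        if flag.get("status") != "ok":  # ok entries stay silent
            writer.writerow(flag)
```

Passing a file-like object as `out` makes the function easy to exercise without capturing stdout.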
logger
module-attribute
logger = get_logger(__name__)
get_parser
get_parser() -> argparse.ArgumentParser
Source code in src/xil_pipeline/XILU016_stem_compare.py
def get_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="xil-stem-compare",
        description="Cross-reference Whisper transcripts against parsed script dialogue to flag garbled stems.",
    )
    parser.add_argument("--show", "-s", default=None, metavar="SLUG",
                        help="Show slug (default: resolved from project.json)")
    parser.add_argument("--episode", "-e", default=None, metavar="TAG",
                        help="Episode tag, e.g. S01E01 (derives both JSON paths if not overridden)")
    parser.add_argument("--stem-verify", default=None, metavar="FILE",
                        help="Path to stem_verify JSON (default: <workspace>/parsed/<slug>/stem_verify_<episode>.json)")
    parser.add_argument("--parsed", default=None, metavar="FILE",
                        help="Path to parsed script JSON (default: <workspace>/parsed/<slug>/parsed_<episode>.json)")
    parser.add_argument("--threshold", type=float, default=0.75, metavar="FLOAT",
                        help="Similarity below this marks a stem as garbled (default: 0.75)")
    parser.add_argument("--output", "-o", default=None, metavar="FILE",
                        help="Write full JSON report to this file (in addition to terminal output)")
    parser.add_argument("--csv", action="store_true",
                        help="Print flagged entries as CSV to stdout instead of the banner summary")
    return parser
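The help strings for `--stem-verify` and `--parsed` document default paths built from the workspace, show slug, and episode tag. A minimal sketch of that derivation is shown below; how the workspace and slug are actually resolved is not covered here, so both are plain parameters, and `default_paths` is a hypothetical name.

```python
from pathlib import Path


def default_paths(workspace: Path, slug: str, episode: str) -> tuple:
    """Build the documented default locations for both input files."""
    base = workspace / "parsed" / slug
    stem_verify = base / f"stem_verify_{episode}.json"
    parsed = base / f"parsed_{episode}.json"
    return stem_verify, parsed
```

An explicit `--stem-verify` or `--parsed` argument would simply replace the corresponding element of the returned pair.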
main
CLI entry point for Whisper transcript vs. script cross-reference.
Source code in src/xil_pipeline/XILU016_stem_compare.py
def main() -> None:
    """CLI entry point for Whisper transcript vs. script cross-reference."""
    configure_logging()
    args = get_parser().parse_args()
    with run_banner():
        _run(args)