XILP000 Script Scanner
src.xil_pipeline.XILP000_script_scanner
Pre-flight scanner for production scripts.
Reads a raw markdown script, applies the same two-pass normalization that XILP001 uses, then scans every ALL-CAPS candidate line and reports which speakers and sections are recognized vs. unknown — before any parsing state machine runs.
Use this to catch missing KNOWN_SPEAKERS or SECTION_MAP entries before they cause silent failures in XILP001.
Usage:
python XILP000_script_scanner.py "scripts/<script>.md"
python XILP000_script_scanner.py "scripts/<script>.md" --json
is_all_caps_candidate
Return True if line is a bare ALL-CAPS line worth classifying.
Excludes dividers, stage directions, scene headers, and very short or very long strings. Anything that passes is either a speaker name, a section header, or an unrecognized ALL-CAPS label.
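The exclusion rules above can be sketched as a small predicate. This is only an illustration: the length limits, the divider pattern, and the assumption that stage directions are bracketed and scene headers start with INT. or EXT. are hypothetical, and the real thresholds live in the module source.

```python
import re

# Hypothetical limits and patterns; the real module's values may differ.
MIN_LEN = 3
MAX_LEN = 40
DIVIDER_RE = re.compile(r"^[-=*_#\s]+$")


def is_all_caps_candidate_sketch(line: str) -> bool:
    """Return True if the line looks like a bare ALL-CAPS label."""
    text = line.strip()
    # Reject very short or very long strings.
    if not (MIN_LEN <= len(text) <= MAX_LEN):
        return False
    # Reject dividers such as --- or ===.
    if DIVIDER_RE.match(text):
        return False
    # Assumption: stage directions are wrapped in [ ] or ( ).
    if text[0] in "[(":
        return False
    # Assumption: scene headers start with INT. or EXT.
    if text.startswith(("INT.", "EXT.")):
        return False
    # Must contain at least one letter and no lowercase letters.
    return any(c.isalpha() for c in text) and text == text.upper()
```

Anything this predicate accepts still has to be classified as a speaker, a section header, or an unrecognized label.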
Source code in src/xil_pipeline/XILP000_script_scanner.py
load_and_normalize
Read path and apply the two-pass markdown normalization.
Returns a list of individual lines (including blank lines) after both strip_markdown_escapes and strip_markdown_formatting have been applied.
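A rough sketch of what a two-pass normalization might look like follows. The two helper bodies are stand-ins written for this example, not the real strip_markdown_escapes and strip_markdown_formatting; the pass order is assumed, and the sketch takes a string rather than a file path to stay self-contained.

```python
import re


def strip_markdown_escapes_sketch(text: str) -> str:
    """Pass 1 (assumed): drop the backslash in front of escaped punctuation."""
    return re.sub(r"\\([\\`*_{}\[\]()#+\-.!|>~])", r"\1", text)


def strip_markdown_formatting_sketch(text: str) -> str:
    """Pass 2 (assumed): drop heading markers and bold/italic wrappers."""
    text = re.sub(r"^#{1,6}\s+", "", text, flags=re.MULTILINE)  # headings
    text = re.sub(r"(\*\*|__)(.+?)\1", r"\2", text)  # bold
    text = re.sub(r"(\*|_)(.+?)\1", r"\2", text)  # italic
    return text


def normalize_sketch(raw: str) -> list[str]:
    """Apply both passes, then split into lines, keeping blank lines."""
    cleaned = strip_markdown_formatting_sketch(strip_markdown_escapes_sketch(raw))
    return cleaned.split("\n")
```

For example, a heading line such as `## ACT ONE` and a bold cue such as `**ADAM**` both come out as bare ALL-CAPS lines, which is what the later scans look for.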
Source code in src/xil_pipeline/XILP000_script_scanner.py
scan_script
scan_script(lines: list[str], known_speakers: list[str] | None = None, speaker_keys: dict[str, str] | None = None) -> dict
Scan normalized lines and classify every ALL-CAPS candidate.
Parameters:
- lines (list[str]) – Normalized script lines.
- known_speakers (list[str] | None, default: None) – Ordered list of speaker display names (longest-first). Defaults to the module-level speakers from XILP001.
- speaker_keys (dict[str, str] | None, default: None) – Mapping from display names to normalized keys. Defaults to the module-level speakers from XILP001.
Returns a dict:
    {
        "sections": [{"text": str, "slug": str, "line": int}, ...],
        "speakers": {key: {"display": str, "count": int, "lines": [int, ...]}, ...},
        "unrecognized": [{"text": str, "lines": [int, ...]}, ...],
    }
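As an illustration of how a caller might consume this structure, here is a minimal sketch that flattens a result dict into summary lines. The summarize_scan helper and the sample values are hypothetical; only the dict layout is taken from the description above.

```python
def summarize_scan(result: dict) -> list[str]:
    """Flatten a scan result into one summary string per finding."""
    out = []
    for sec in result["sections"]:
        out.append(f"section {sec['slug']} at line {sec['line']}")
    for key, info in sorted(result["speakers"].items()):
        out.append(f"speaker {key} ({info['display']}): {info['count']} cue(s)")
    for item in result["unrecognized"]:
        where = ", ".join(str(n) for n in item["lines"])
        out.append(f"UNRECOGNIZED {item['text']!r} at line(s) {where}")
    return out


# Hand-written sample in the documented shape (not real scanner output).
sample = {
    "sections": [{"text": "ACT ONE", "slug": "act-one", "line": 4}],
    "speakers": {"adam": {"display": "ADAM", "count": 2, "lines": [6, 10]}},
    "unrecognized": [{"text": "MYSTERY VOICE", "lines": [14]}],
}
summary = summarize_scan(sample)
```

A non-empty "unrecognized" list is the signal that a speaker or section entry is probably missing.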
Source code in src/xil_pipeline/XILP000_script_scanner.py
scan_direction_texts
Audit direction texts against an existing SFX config.
Classifies each unique direction text as:
- matched: key exists in sfx_effects with a source field
- hinted: key not in sfx_effects but script has a pipe-hint (new source)
- new: key not in sfx_effects and no hint (will need generation prompt)
Parameters:
- lines (list[str]) – Normalized script lines.
- sfx_effects (dict) – The effects dict from sfx_<TAG>.json.
Returns a dict with keys matched, hinted, new, each a list of {"text": str, "hint": str | None, "lines": [int, ...]} dicts.
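The three-way classification can be sketched as follows. Several details are assumptions made for the example: directions arrive as (line_number, raw_text) pairs, the hint is whatever follows the first pipe character, and the direction text itself is the lookup key in sfx_effects. An entry that exists but lacks a source field is not covered by the description, so this sketch lets it fall through to hinted or new.

```python
def audit_directions_sketch(directions, sfx_effects):
    """Group direction texts, then bucket each as matched, hinted, or new.

    directions: list of (line_number, raw_text) pairs (hypothetical shape).
    """
    grouped = {}
    for line_no, raw in directions:
        # Assumption: a pipe separates the direction text from its hint.
        text, _, hint = raw.partition("|")
        text = text.strip()
        hint = hint.strip() or None
        rec = grouped.setdefault(text, {"text": text, "hint": hint, "lines": []})
        rec["hint"] = rec["hint"] or hint
        rec["lines"].append(line_no)

    report = {"matched": [], "hinted": [], "new": []}
    for text, rec in grouped.items():
        entry = sfx_effects.get(text)
        if isinstance(entry, dict) and "source" in entry:
            report["matched"].append(rec)
        elif rec["hint"]:
            report["hinted"].append(rec)
        else:
            report["new"].append(rec)
    return report
```

Entries under "new" are the ones that will still need a generation prompt before the SFX config is complete.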
Source code in src/xil_pipeline/XILP000_script_scanner.py
scan_vintage_filter_pairing
Check that every VINTAGE FILTER ENGAGES has a matching DISENGAGES.
Returns a list of unpaired marker dicts:
{"text": str, "line": int, "type": "ENGAGES" | "DISENGAGES"}
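One plausible way to implement the pairing check is a single pass that remembers the currently open ENGAGES marker. The sketch below assumes 1-based line numbers and that filters do not nest, neither of which is stated above.

```python
def find_unpaired_filters_sketch(lines):
    """Return unpaired VINTAGE FILTER markers in the documented dict shape."""
    unpaired = []
    open_marker = None  # the ENGAGES marker still waiting for a DISENGAGES
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        upper = text.upper()
        if "VINTAGE FILTER DISENGAGES" in upper:
            if open_marker is None:
                # DISENGAGES with nothing engaged.
                unpaired.append({"text": text, "line": line_no, "type": "DISENGAGES"})
            open_marker = None
        elif "VINTAGE FILTER ENGAGES" in upper:
            if open_marker is not None:
                # A second ENGAGES before the first one was closed.
                unpaired.append(open_marker)
            open_marker = {"text": text, "line": line_no, "type": "ENGAGES"}
    if open_marker is not None:
        # The script ended while the filter was still engaged.
        unpaired.append(open_marker)
    return unpaired
```

An empty list means every ENGAGES marker was closed before the next one or the end of the script.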
Source code in src/xil_pipeline/XILP000_script_scanner.py
scan_preamble_postamble
Check whether PREAMBLE and POSTAMBLE sections are present.
Parameters:
Returns a dict {"preamble": bool, "postamble": bool}.
Source code in src/xil_pipeline/XILP000_script_scanner.py
scan_ambience_coverage
Check that every looping AMBIENCE direction has a stop marker.
A stop marker is either [AMBIENCE: STOP] or a direction ending in FADES OUT.
Returns a list of unclosed ambience dicts:
{"text": str, "line": int}
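The coverage check could be sketched as below. The [AMBIENCE: STOP] marker comes from the description; the rest is assumed for the example: every direction sits on its own line inside square brackets, every other [AMBIENCE: ...] line starts a loop, a stop marker closes all loops that are open at that point, and line numbers are 1-based.

```python
import re

# Assumed syntax for an ambience direction line.
AMBIENCE_RE = re.compile(r"^\[AMBIENCE:\s*(.+?)\]$")


def find_unclosed_ambience_sketch(lines):
    """Return ambience directions that never reach a stop marker."""
    open_items = []
    for line_no, line in enumerate(lines, start=1):
        text = line.strip()
        match = AMBIENCE_RE.match(text)
        is_direction = text.startswith("[") and text.endswith("]")
        if match and match.group(1).upper() == "STOP":
            open_items.clear()  # explicit stop marker
        elif is_direction and text[1:-1].rstrip().upper().endswith("FADES OUT"):
            open_items.clear()  # any direction ending in FADES OUT
        elif match:
            open_items.append({"text": text, "line": line_no})
    return open_items
```

Whatever remains in the returned list was still looping when the script ended.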
Source code in src/xil_pipeline/XILP000_script_scanner.py
format_report
Render scan results as a human-readable text report.
Source code in src/xil_pipeline/XILP000_script_scanner.py
harvest_cast
Scan all scripts for CAST: entries and optionally add new ones to speakers.json.
Reports every character declared in any CAST: block that is absent from the current speakers.json. With apply=True, appends those entries.
Source code in src/xil_pipeline/XILP000_script_scanner.py
backfill_cast
backfill_cast(scripts_dir: str, speakers_path: str | None, parsed_dir: str | None = None, dry_run: bool = True) -> None
Add CAST: blocks to scripts that don't have one.
Speaker lists are inferred from existing parsed JSON (most reliable) or by body-scanning the raw script against the known speakers.json list.
Source code in src/xil_pipeline/XILP000_script_scanner.py
get_parser
Source code in src/xil_pipeline/XILP000_script_scanner.py
main
Source code in src/xil_pipeline/XILP000_script_scanner.py