Mix Common
src.xil_pipeline.mix_common
Shared multi-track mixing utilities for the audio pipeline.
Provides timeline construction and per-layer audio building used by XILP003 (automated two-pass mix) and XILP005 (DAW layer export). Both stages classify stems by direction_type from the parsed script JSON, then build foreground (dialogue/SFX) and background (ambience/music) layers independently before combining.
Module Attributes
BACKGROUND_DIRECTION_TYPES: direction_type values routed to the background layer rather than the foreground timeline.
AMBIENCE_LEVEL_DB: Default dB reduction applied to ambience overlays in the automated mix (Option A). 0 for DAW export (Option C).
MUSIC_LEVEL_DB: Default dB reduction applied to music overlays in the automated mix. 0 for DAW export.
BACKGROUND_DIRECTION_TYPES
module-attribute
StemPlan
dataclass
Resolved metadata for a single audio stem file.
Attributes:
-
seq(int) –Sequence number extracted from the stem filename.
-
filepath(str) –Absolute or relative path to the MP3 stem file.
-
direction_type(str | None) –Parsed direction category for this entry ("SFX", "MUSIC", "AMBIENCE", "BEAT"), or None for dialogue stems.
-
entry_type(str | None) –Parsed entry classification ("dialogue", "direction", etc.), or None if not in index.
-
foreground_override(bool) –When True, forces the stem into the foreground timeline even if direction_type would normally route it to the background (e.g. preamble intro music that must play sequentially, not as an overlay).
Source code in src/xil_pipeline/mix_common.py
__init__
__init__(seq: int, filepath: str, direction_type: str | None, entry_type: str | None, text: str | None = None, scene: str | None = None, foreground_override: bool = False, volume_percentage: float | None = None, ramp_in_seconds: float | None = None, ramp_out_seconds: float | None = None, play_duration: float | None = None, tts_model: str | None = None, pre_trimmed: bool = False, loop: bool = True) -> None
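The routing rule implied by direction_type and foreground_override can be sketched as follows. This is an illustration only, not the module's code: MiniPlan is a hypothetical, trimmed-down stand-in for StemPlan, and the contents of BACKGROUND_TYPES are assumed from the description of AMBIENCE and MUSIC as background categories.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed contents; the real BACKGROUND_DIRECTION_TYPES may differ.
BACKGROUND_TYPES = frozenset({"AMBIENCE", "MUSIC"})


@dataclass
class MiniPlan:
    """Hypothetical stand-in carrying only the fields needed for routing."""
    seq: int
    direction_type: Optional[str]
    foreground_override: bool = False


def is_background(plan: MiniPlan) -> bool:
    """Return True when the stem would be overlaid rather than sequenced."""
    if plan.foreground_override:
        # The override wins even for MUSIC or AMBIENCE stems.
        return False
    return plan.direction_type in BACKGROUND_TYPES
```

Under these assumptions, a MUSIC stem with foreground_override=True is treated like dialogue: it is sequenced on the foreground timeline instead of being overlaid.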
extract_seq
Extract the sequence number from a stem filename.
Stems are named {seq:03d}_{section}[-{scene}]_{speaker}.mp3.
Legacy preamble stems used an n prefix (n{abs(seq):03d}_...mp3)
and are still parsed for backward compatibility.
Parameters:
-
filepath(str) –Path like stems/S01E01/001_preamble_tina.mp3 or stems/S01E01/003_cold-open_adam.mp3.
Returns:
-
int–Integer sequence number (e.g. "003" → 3).
Source code in src/xil_pipeline/mix_common.py
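A minimal sketch of the filename parsing described above. The function name extract_seq_sketch is hypothetical, and returning a negative number for the legacy n prefix is an assumption inferred from the n{abs(seq):03d} pattern; the real function may handle legacy stems differently.

```python
import os
import re

# Optional legacy "n" prefix, then the digits, then an underscore.
_SEQ_PATTERN = re.compile(r"^(n?)(\d+)_")


def extract_seq_sketch(filepath: str) -> int:
    """Parse the leading sequence number from a stem filename."""
    name = os.path.basename(filepath)
    match = _SEQ_PATTERN.match(name)
    if match is None:
        raise ValueError(f"no sequence prefix in {name!r}")
    value = int(match.group(2))
    # Assumption: legacy "n" stems encode a negative sequence number.
    return -value if match.group(1) else value
```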
load_entries_index
Load a parsed script JSON and return a {seq: entry} index.
Parameters:
-
parsed_path(str) –Path to the parsed script JSON produced by XILP001.
Returns:
Source code in src/xil_pipeline/mix_common.py
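The {seq: entry} index can be pictured with the sketch below. The JSON layout is not shown in this reference, so the top-level "entries" key and the per-entry "seq", "type", and "direction_type" fields are assumptions made for illustration; index_entries_sketch is a hypothetical name and works on already-loaded data rather than a file path.

```python
import json


def index_entries_sketch(parsed: dict) -> dict[int, dict]:
    """Map each entry's sequence number to the entry itself."""
    return {int(entry["seq"]): entry for entry in parsed.get("entries", [])}


# Hypothetical parsed-script payload, inlined so the example is self-contained.
sample = json.loads(
    '{"entries": ['
    '{"seq": 1, "type": "dialogue"}, '
    '{"seq": 2, "type": "direction", "direction_type": "SFX"}'
    ']}'
)
index = index_entries_sketch(sample)
```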
collect_stem_plans
collect_stem_plans(stems_dir: str, entries_index: dict[int, dict], sfx_config=None) -> list[StemPlan]
Collect and classify all MP3 stems in a stems directory.
Uses the entries index to look up each stem's direction_type
and entry_type by sequence number. Stems whose seq is not in
the index are logged as stale and skipped.
Parameters:
-
stems_dir(str) –Directory containing episode stem MP3 files.
-
entries_index(dict[int, dict]) –{seq: entry} mapping from load_entries_index.
-
sfx_config –Optional models.SfxConfiguration; when provided, resolves per-effect or category-default volume/ramp values into MUSIC and AMBIENCE plans.
Returns:
Source code in src/xil_pipeline/mix_common.py
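The stale-stem rule (stems whose seq is missing from the index are skipped) might look like the sketch below. It works on a plain list of filenames instead of scanning a directory, and select_known_stems is a hypothetical helper, not part of the module.

```python
import re


def select_known_stems(
    filenames: list[str], entries_index: dict[int, dict]
) -> tuple[list[tuple[int, str]], list[str]]:
    """Split filenames into (seq, name) pairs found in the index and stale names."""
    kept: list[tuple[int, str]] = []
    stale: list[str] = []
    for name in sorted(filenames):
        match = re.match(r"(\d+)_", name)
        if match is None or int(match.group(1)) not in entries_index:
            # The real function logs these; here they are only collected.
            stale.append(name)
            continue
        kept.append((int(match.group(1)), name))
    return kept, stale
```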
apply_phone_filter
Apply a phone-speaker audio filter to an audio segment.
Cuts frequencies below 300 Hz and above 3000 Hz, then boosts volume by 5 dB to simulate a telephone speaker.
Parameters:
-
segment(AudioSegment) –Input audio segment to filter.
Returns:
-
AudioSegment–Filtered audio segment.
Source code in src/xil_pipeline/mix_common.py
apply_vintage_filter
Apply a vintage (1960s-era) audio filter to an audio segment.
Rolls off high frequencies above 5 kHz (1960s tape/broadcast ceiling) and low rumble below 150 Hz, then reduces volume by 3 dB to simulate the compressed, mid-forward quality of aged tape recording.
Parameters:
-
segment(AudioSegment) –Input audio segment to filter.
Returns:
-
AudioSegment–Filtered audio segment.
Source code in src/xil_pipeline/mix_common.py
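To put the two gain figures in perspective, the standard decibel-to-amplitude conversion shows how much each filter scales the signal. This is generic audio arithmetic rather than code from the module; db_to_amplitude_ratio is a hypothetical helper name.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


phone_boost = db_to_amplitude_ratio(5)    # phone filter: about 1.78x amplitude
vintage_cut = db_to_amplitude_ratio(-3)   # vintage filter: about 0.71x amplitude
```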
build_foreground
build_foreground(stem_plans: list[StemPlan], cast_config: dict, gap_ms: int = 600, vintage_scenes: list[str] | None = None) -> tuple[AudioSegment, dict[int, int]]
Build the foreground audio track and a full-episode timeline.
Iterates stems in sequence order. Foreground stems (dialogue, SFX, BEAT) are concatenated with silence gaps and their positions are recorded in the timeline. Background stems (AMBIENCE, MUSIC) are recorded in the timeline at the current foreground cursor position but do not advance it — they are overlaid at that cue point in a separate background pass.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
cast_config(dict) –{speaker_key: {"pan": float, "filter": str | bool | None}} for per-speaker audio effects. filter accepts False/None (no filter), True/"phone" (phone filter), "vintage", or a comma-separated combination such as "vintage,phone".
-
gap_ms(int, default:600) –Silence inserted between foreground stems in ms.
-
vintage_scenes(list[str] | None, default:None) –Optional list of scene labels (e.g. ["scene-3", "scene-4"]) whose dialogue stems receive an additional vintage filter pass. Applied after the per-speaker filter chain.
Returns:
-
tuple[AudioSegment, dict[int, int]]–Tuple of (foreground_audio, timeline) where timeline maps sequence numbers to millisecond offsets within the foreground track.
Source code in src/xil_pipeline/mix_common.py
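The accepted values for the cast_config filter field can be normalized into an ordered list of filter names, as in this sketch. parse_filter_spec is a hypothetical helper, and applying a comma-separated combination in the order written is an assumption.

```python
def parse_filter_spec(value) -> list[str]:
    """Normalize a cast_config filter value into a list of filter names."""
    if value is None or value is False:
        return []            # no filter
    if value is True:
        return ["phone"]     # True is shorthand for the phone filter
    # Strings may name one filter or a comma-separated combination.
    return [name.strip() for name in str(value).split(",") if name.strip()]
```

For example, parse_filter_spec("vintage,phone") yields ["vintage", "phone"], while False and None both yield an empty list.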
build_ambience_layer
build_ambience_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, level_db: float = AMBIENCE_LEVEL_DB) -> AudioSegment
Build the ambience background layer.
Each AMBIENCE stem is looped from its cue point to the start of
the next background cue (AMBIENCE or MUSIC) or the end of the
track, whichever comes first. The level_db parameter controls
ducking; use 0 for DAW layer export so the producer controls
levels in-DAW.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
timeline(dict[int, int]) –Cue-point timestamps from build_foreground.
-
total_ms(int) –Total foreground track length in milliseconds.
-
level_db(float, default:AMBIENCE_LEVEL_DB) –Volume adjustment applied to the clip before looping. Negative values duck the ambience below dialogue.
Returns:
-
AudioSegment–Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with ambience looped at each cue point, and labels is a list of (start_sec, end_sec, text) tuples spanning each looped region.
Source code in src/xil_pipeline/mix_common.py
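The boundary rule for ambience (loop from the cue point until the next background cue or the end of the track) can be sketched without any audio. ambience_regions is a hypothetical helper that takes (start_ms, direction_type) pairs for background stems only; the real function works from StemPlan objects and the timeline.

```python
def ambience_regions(
    cues: list[tuple[int, str]], total_ms: int
) -> list[tuple[int, int]]:
    """Return (start_ms, end_ms) spans that each AMBIENCE cue would fill."""
    ordered = sorted(cues)
    regions: list[tuple[int, int]] = []
    for i, (start, kind) in enumerate(ordered):
        if kind != "AMBIENCE":
            continue
        # The next background cue of either kind ends this loop.
        next_start = ordered[i + 1][0] if i + 1 < len(ordered) else total_ms
        end = min(next_start, total_ms)
        if end > start:
            regions.append((start, end))
    return regions
```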
build_vintage_filter_layer
build_vintage_filter_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, level_db: float = 0) -> tuple[AudioSegment, list[tuple]]
Build the vintage filter crackle layer.
Loops the crackle source between each VINTAGE FILTER ENGAGES
marker and the next VINTAGE FILTER DISENGAGES marker (or track
end if no DISENGAGES follows). Use level_db=0 for DAW export so
the producer controls levels in-DAW.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
timeline(dict[int, int]) –Cue-point timestamps from build_foreground.
-
total_ms(int) –Total foreground track length in milliseconds.
-
level_db(float, default:0) –Volume adjustment applied to each looped region.
Returns:
-
tuple[AudioSegment, list[tuple]]–Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with crackle looped at each active span, and labels is a list of (start_sec, end_sec, text) tuples for each looped region.
Source code in src/xil_pipeline/mix_common.py
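The ENGAGES/DISENGAGES pairing can be sketched as a small state machine over marker cue points. vintage_spans is a hypothetical helper, and matching markers by substring on the direction text is an assumption; the module may identify markers differently.

```python
def vintage_spans(
    markers: list[tuple[int, str]], total_ms: int
) -> list[tuple[int, int]]:
    """Pair each ENGAGES marker with the next DISENGAGES, or the track end."""
    spans: list[tuple[int, int]] = []
    open_start = None
    for cue_ms, text in markers:
        upper = text.upper()
        # Check DISENGAGES first because it contains the substring ENGAGES.
        if "DISENGAGES" in upper:
            if open_start is not None:
                spans.append((open_start, cue_ms))
                open_start = None
        elif "ENGAGES" in upper and open_start is None:
            open_start = cue_ms
    if open_start is not None:
        # No DISENGAGES followed, so the span runs to the end of the track.
        spans.append((open_start, total_ms))
    return spans
```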
build_music_layer
build_music_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, level_db: float = MUSIC_LEVEL_DB, include_foreground_override: bool = False) -> AudioSegment
Build the music/sting background layer.
Each MUSIC stem is overlaid at its cue point without looping.
Use level_db=0 for DAW layer export so levels are set in-DAW.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
timeline(dict[int, int]) –Cue-point timestamps from build_foreground.
-
total_ms(int) –Total foreground track length in milliseconds.
-
level_db(float, default:MUSIC_LEVEL_DB) –Volume adjustment applied before overlaying.
-
include_foreground_override(bool, default:False) –When True, preamble/postamble MUSIC stems (foreground_override=True) are placed at their timeline position in this layer. Used by DAW export so the operator can see and mix them; set to False (default) for the integrated mix where they play sequentially via build_foreground instead.
Returns:
-
AudioSegment–Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with music stings overlaid at their cue positions, and labels is a list of (start_sec, end_sec, text) tuples for each sting.
Source code in src/xil_pipeline/mix_common.py
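The effect of include_foreground_override on which MUSIC stems land in this layer can be sketched as a selection step. Plans are represented here as plain dicts for brevity, and music_cues is a hypothetical helper; the real function also overlays the audio.

```python
def music_cues(
    plans: list[dict],
    timeline: dict[int, int],
    include_foreground_override: bool = False,
) -> list[tuple[int, int]]:
    """Return (seq, cue_ms) pairs for MUSIC stems that belong in the layer."""
    cues: list[tuple[int, int]] = []
    for plan in plans:
        if plan.get("direction_type") != "MUSIC":
            continue
        if plan.get("foreground_override", False) and not include_foreground_override:
            # Preamble/postamble music already plays in the foreground track.
            continue
        if plan["seq"] in timeline:
            cues.append((plan["seq"], timeline[plan["seq"]]))
    return cues
```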
build_dialogue_layer
build_dialogue_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, cast_config: dict, vintage_scenes: list[str] | None = None) -> tuple
Build an isolated dialogue layer for DAW export.
Places only dialogue stems (entry_type == "dialogue") at their
foreground timeline positions in a full-length silent segment.
Filter and pan effects are applied per speaker as configured.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
timeline(dict[int, int]) –Cue-point timestamps from build_foreground.
-
total_ms(int) –Total track length in milliseconds.
-
cast_config(dict) –Per-speaker audio settings.
-
vintage_scenes(list[str] | None, default:None) –Optional list of scene labels whose dialogue stems receive an additional vintage filter pass (same as build_foreground).
Returns:
-
tuple–Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with dialogue stems at their timeline positions, and labels is a list of (start_sec, end_sec, speaker) tuples for Audacity label export.
Source code in src/xil_pipeline/mix_common.py
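Since the labels are intended for Audacity label export, a sketch of writing them out may help. Audacity's label track text format is one label per line with start, end, and text separated by tabs, times in seconds. format_audacity_labels is a hypothetical helper; how the pipeline actually serializes labels is not shown in this reference.

```python
def format_audacity_labels(labels: list[tuple[float, float, str]]) -> str:
    """Render (start_sec, end_sec, text) tuples as Audacity label track text."""
    return "".join(
        f"{start:.6f}\t{end:.6f}\t{text}\n" for start, end, text in labels
    )
```

The resulting string can be written to a .txt file and imported as a label track.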
build_foreground_timeline_only
build_foreground_timeline_only(stem_plans: list[StemPlan], gap_ms: int = 600) -> tuple[int, dict[int, int]]
Build a foreground timeline without decoding audio.
Lightweight variant of build_foreground that reads MP3
durations via mutagen header inspection instead of loading full
audio via pydub. Enables --dry-run --timeline without
expensive audio decoding.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
gap_ms(int, default:600) –Silence gap between foreground stems in ms.
Returns:
-
tuple[int, dict[int, int]]–Tuple of (total_ms, timeline) where timeline maps sequence numbers to millisecond offsets.
Source code in src/xil_pipeline/mix_common.py
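The cursor logic shared with build_foreground can be sketched using known durations only. timeline_from_durations is a hypothetical helper taking (seq, duration_ms, is_background) tuples; placing the gap before every foreground stem except the first, and recording background cues at the cursor before that gap, are assumptions about details this reference does not spell out.

```python
def timeline_from_durations(
    stems: list[tuple[int, int, bool]], gap_ms: int = 600
) -> tuple[int, dict[int, int]]:
    """Return (total_ms, timeline) from stem durations alone."""
    timeline: dict[int, int] = {}
    cursor = 0
    placed_foreground = False
    for seq, duration_ms, is_background in sorted(stems):
        if is_background:
            # Background stems mark a cue point but do not advance the cursor.
            timeline[seq] = cursor
            continue
        if placed_foreground:
            cursor += gap_ms
        timeline[seq] = cursor
        cursor += duration_ms
        placed_foreground = True
    return cursor, timeline
```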
compute_dialogue_labels
Compute dialogue label tuples without loading audio.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list.
-
timeline(dict[int, int]) –Cue-point timestamps from a foreground build.
Returns:
-
list[tuple]–List of 7-element tuples (start_s, end_s, speaker, None, None, None, snippet) where snippet is the first 5 words of the dialogue text (or None if no text is available). Positions [3]–[5] are None (dialogue has no ramp or play_duration); position [6] carries the snippet for the HTML tooltip.
Source code in src/xil_pipeline/mix_common.py
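The 7-element label shape can be illustrated with a single-stem sketch. dialogue_label is a hypothetical helper, and passing the stem duration in milliseconds as an argument is an assumption; this reference does not say how the real function obtains durations.

```python
def dialogue_label(start_ms: int, duration_ms: int, speaker: str, text):
    """Build one (start_s, end_s, speaker, None, None, None, snippet) tuple."""
    words = text.split() if text else []
    # The snippet is the first five words, or None when there is no text.
    snippet = " ".join(words[:5]) if words else None
    start_s = start_ms / 1000
    end_s = (start_ms + duration_ms) / 1000
    return (start_s, end_s, speaker, None, None, None, snippet)
```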
compute_ambience_labels
compute_ambience_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> list[tuple[float, float, str]]
Compute ambience label tuples without loading audio.
Uses the same boundary logic as build_ambience_layer.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list.
-
timeline(dict[int, int]) –Cue-point timestamps.
-
total_ms(int) –Total episode duration in ms.
Returns:
Source code in src/xil_pipeline/mix_common.py
compute_vintage_filter_labels
compute_vintage_filter_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> list[tuple[float, float, str]]
Compute vintage filter label tuples without loading audio.
Uses the same boundary logic as build_vintage_filter_layer.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list.
-
timeline(dict[int, int]) –Cue-point timestamps.
-
total_ms(int) –Total episode duration in ms.
Returns:
Source code in src/xil_pipeline/mix_common.py
compute_music_labels
compute_music_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, include_foreground_override: bool = False) -> list[tuple[float, float, str]]
Compute music label tuples without loading audio.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list.
-
timeline(dict[int, int]) –Cue-point timestamps.
-
total_ms(int) –Total episode duration in ms.
-
include_foreground_override(bool, default:False) –When True, include preamble/postamble MUSIC stems. Mirror of the same flag on build_music_layer.
Returns:
Source code in src/xil_pipeline/mix_common.py
compute_sfx_labels
compute_sfx_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> list[tuple[float, float, str]]
Compute SFX/BEAT label tuples without loading audio.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list.
-
timeline(dict[int, int]) –Cue-point timestamps.
-
total_ms(int) –Total episode duration in ms.
Returns:
Source code in src/xil_pipeline/mix_common.py
build_sfx_layer
build_sfx_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> AudioSegment
Build an isolated SFX layer for DAW export.
Places only one-shot SFX and BEAT stems (direction_type in
("SFX", "BEAT")) at their foreground timeline positions.
Parameters:
-
stem_plans(list[StemPlan]) –Classified stem list from collect_stem_plans.
-
timeline(dict[int, int]) –Cue-point timestamps from build_foreground.
-
total_ms(int) –Total track length in milliseconds.
Returns:
-
AudioSegment–Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with SFX stems at their timeline positions, and labels is a list of (start_sec, end_sec, text) tuples for each one-shot effect.