Mix Common

src.xil_pipeline.mix_common

Shared multi-track mixing utilities for the audio pipeline.

Provides timeline construction and per-layer audio building used by XILP003 (automated two-pass mix) and XILP005 (DAW layer export). Both stages classify stems by direction_type from the parsed script JSON, then build foreground (dialogue/SFX) and background (ambience/music) layers independently before combining.

Module Attributes

BACKGROUND_DIRECTION_TYPES: direction_type values routed to the background layer rather than the foreground timeline.

AMBIENCE_LEVEL_DB: Default dB reduction applied to ambience overlays in the automated mix (Option A). 0 for DAW export (Option C).

MUSIC_LEVEL_DB: Default dB reduction applied to music overlays in the automated mix. 0 for DAW export.

logger module-attribute

logger = get_logger(__name__)

BACKGROUND_DIRECTION_TYPES module-attribute

BACKGROUND_DIRECTION_TYPES: frozenset[str] = frozenset({'AMBIENCE', 'MUSIC', 'VINTAGE FILTER'})

AMBIENCE_LEVEL_DB module-attribute

AMBIENCE_LEVEL_DB: float = -10.0

MUSIC_LEVEL_DB module-attribute

MUSIC_LEVEL_DB: float = -6.0
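
For a sense of scale, a gain in decibels corresponds to a linear amplitude multiplier of 10^(dB/20). The sketch below applies that standard conversion to the two defaults above. db_to_amplitude_ratio is a hypothetical helper written for illustration and is not part of mix_common.

```python
AMBIENCE_LEVEL_DB = -10.0  # default documented above
MUSIC_LEVEL_DB = -6.0      # default documented above


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier (hypothetical helper)."""
    return 10 ** (db / 20.0)


ambience_ratio = db_to_amplitude_ratio(AMBIENCE_LEVEL_DB)  # roughly 0.32
music_ratio = db_to_amplitude_ratio(MUSIC_LEVEL_DB)        # roughly 0.50
```

Under these defaults, ambience sits at about a third of its original amplitude and music at about half.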

StemPlan dataclass

Resolved metadata for a single audio stem file.

Attributes:

  • seq (int) –

    Sequence number extracted from the stem filename.

  • filepath (str) –

    Absolute or relative path to the MP3 stem file.

  • direction_type (str | None) –

    Parsed direction category for this entry ("SFX", "MUSIC", "AMBIENCE", "BEAT"), or None for dialogue stems.

  • entry_type (str | None) –

    Parsed entry classification ("dialogue", "direction", etc.), or None if not in index.

  • foreground_override (bool) –

    When True, forces the stem into the foreground timeline even if direction_type would normally route it to the background (e.g. preamble intro music that must play sequentially, not as an overlay).

Source code in src/xil_pipeline/mix_common.py
@dataclass
class StemPlan:
    """Resolved metadata for a single audio stem file.

    Attributes:
        seq: Sequence number extracted from the stem filename.
        filepath: Absolute or relative path to the MP3 stem file.
        direction_type: Parsed direction category for this entry
            (``"SFX"``, ``"MUSIC"``, ``"AMBIENCE"``, ``"BEAT"``),
            or ``None`` for dialogue stems.
        entry_type: Parsed entry classification (``"dialogue"``,
            ``"direction"``, etc.), or ``None`` if not in index.
        foreground_override: When ``True``, forces the stem into the
            foreground timeline even if ``direction_type`` would normally
            route it to the background (e.g. preamble intro music that
            must play sequentially, not as an overlay).
    """

    seq: int
    filepath: str
    direction_type: str | None
    entry_type: str | None
    text: str | None = None
    scene: str | None = None
    foreground_override: bool = False
    volume_percentage: float | None = None
    ramp_in_seconds: float | None = None
    ramp_out_seconds: float | None = None
    play_duration: float | None = None
    tts_model: str | None = None
    pre_trimmed: bool = False
    loop: bool = True

    @property
    def is_background(self) -> bool:
        """True if this stem belongs in the background layer."""
        if self.foreground_override:
            return False
        return self.direction_type in BACKGROUND_DIRECTION_TYPES

seq instance-attribute

seq: int

filepath instance-attribute

filepath: str

direction_type instance-attribute

direction_type: str | None

entry_type instance-attribute

entry_type: str | None

text class-attribute instance-attribute

text: str | None = None

scene class-attribute instance-attribute

scene: str | None = None

foreground_override class-attribute instance-attribute

foreground_override: bool = False

volume_percentage class-attribute instance-attribute

volume_percentage: float | None = None

ramp_in_seconds class-attribute instance-attribute

ramp_in_seconds: float | None = None

ramp_out_seconds class-attribute instance-attribute

ramp_out_seconds: float | None = None

play_duration class-attribute instance-attribute

play_duration: float | None = None

tts_model class-attribute instance-attribute

tts_model: str | None = None

pre_trimmed class-attribute instance-attribute

pre_trimmed: bool = False

loop class-attribute instance-attribute

loop: bool = True

is_background property

is_background: bool

True if this stem belongs in the background layer.
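
The routing rule can be illustrated with a trimmed-down stand-in. MiniStemPlan below is hypothetical and keeps only the fields that is_background reads; the real StemPlan carries many more.

```python
from dataclasses import dataclass
from typing import Optional

BACKGROUND_DIRECTION_TYPES = frozenset({"AMBIENCE", "MUSIC", "VINTAGE FILTER"})


@dataclass
class MiniStemPlan:
    """Hypothetical, reduced stand-in for StemPlan."""

    seq: int
    direction_type: Optional[str]
    foreground_override: bool = False

    @property
    def is_background(self) -> bool:
        # The override wins over the direction_type lookup.
        if self.foreground_override:
            return False
        return self.direction_type in BACKGROUND_DIRECTION_TYPES


dialogue = MiniStemPlan(seq=3, direction_type=None)
ambience = MiniStemPlan(seq=4, direction_type="AMBIENCE")
intro_music = MiniStemPlan(seq=1, direction_type="MUSIC", foreground_override=True)
```

Here dialogue and intro_music stay in the foreground, while ambience is routed to the background layer.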

__init__

__init__(seq: int, filepath: str, direction_type: str | None, entry_type: str | None, text: str | None = None, scene: str | None = None, foreground_override: bool = False, volume_percentage: float | None = None, ramp_in_seconds: float | None = None, ramp_out_seconds: float | None = None, play_duration: float | None = None, tts_model: str | None = None, pre_trimmed: bool = False, loop: bool = True) -> None

extract_seq

extract_seq(filepath: str) -> int

Extract the sequence number from a stem filename.

Stems are named {seq:03d}_{section}[-{scene}]_{speaker}.mp3. Legacy preamble stems used an n prefix (n{abs(seq):03d}_...mp3) and are still parsed for backward compatibility.

Parameters:

  • filepath (str) –

    Path like stems/S01E01/001_preamble_tina.mp3 or stems/S01E01/003_cold-open_adam.mp3.

Returns:

  • int

    Integer sequence number (e.g. "003" → 3). Legacy n-prefixed stems yield a negative value (e.g. "n001" → -1).

Source code in src/xil_pipeline/mix_common.py
def extract_seq(filepath: str) -> int:
    """Extract the sequence number from a stem filename.

    Stems are named ``{seq:03d}_{section}[-{scene}]_{speaker}.mp3``.
    Legacy preamble stems used an ``n`` prefix (``n{abs(seq):03d}_...mp3``)
    and are still parsed for backward compatibility.

    Args:
        filepath: Path like ``stems/S01E01/001_preamble_tina.mp3`` or
            ``stems/S01E01/003_cold-open_adam.mp3``.

    Returns:
        Integer sequence number (e.g. ``"003"`` → ``3``).
    """
    basename = os.path.splitext(os.path.basename(filepath))[0]
    prefix = basename.split("_")[0]
    if prefix.startswith("n") and prefix[1:].isdigit():
        return -int(prefix[1:])
    return int(prefix)
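
A short usage sketch follows. The function body is repeated from the listing above so the snippet runs on its own; the n001_... and preamble_intro.mp3 filenames are made-up examples.

```python
import os


def extract_seq(filepath: str) -> int:
    # Same logic as the listing above, repeated so the snippet is standalone.
    basename = os.path.splitext(os.path.basename(filepath))[0]
    prefix = basename.split("_")[0]
    if prefix.startswith("n") and prefix[1:].isdigit():
        return -int(prefix[1:])
    return int(prefix)


current = extract_seq("stems/S01E01/003_cold-open_adam.mp3")  # 3
legacy = extract_seq("stems/S01E01/n001_preamble_tina.mp3")   # -1

# A file without a numeric prefix raises ValueError, which callers can catch.
try:
    extract_seq("stems/S01E01/preamble_intro.mp3")
    unnumbered_raises = False
except ValueError:
    unnumbered_raises = True
```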

load_entries_index

load_entries_index(parsed_path: str) -> dict[int, dict]

Load a parsed script JSON and return a {seq: entry} index.

Parameters:

  • parsed_path (str) –

    Path to the parsed script JSON produced by XILP001.

Returns:

  • dict[int, dict]

    Dict mapping each sequence number to its full entry dict.

Source code in src/xil_pipeline/mix_common.py
def load_entries_index(parsed_path: str) -> dict[int, dict]:
    """Load a parsed script JSON and return a ``{seq: entry}`` index.

    Args:
        parsed_path: Path to the parsed script JSON produced by XILP001.

    Returns:
        Dict mapping each sequence number to its full entry dict.
    """
    with open(parsed_path, encoding="utf-8") as f:
        data = json.load(f)
    return {entry["seq"]: entry for entry in data["entries"]}
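
To show the shape of the resulting index, the sketch below writes a tiny parsed-script file and indexes it with the same two steps as load_entries_index. The payload is a hypothetical minimal example; real XILP001 output is assumed to carry more fields.

```python
import json
import os
import tempfile

# Hypothetical minimal payload with the "entries" list the loader expects.
parsed = {
    "entries": [
        {"seq": 1, "type": "dialogue", "speaker": "tina", "text": "Hello."},
        {"seq": 2, "type": "direction", "direction_type": "AMBIENCE",
         "text": "AMBIENCE: RAIN"},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    parsed_path = os.path.join(tmp, "parsed.json")
    with open(parsed_path, "w", encoding="utf-8") as f:
        json.dump(parsed, f)

    # Read the JSON, then key every entry by its seq.
    with open(parsed_path, encoding="utf-8") as f:
        data = json.load(f)
    entries_index = {entry["seq"]: entry for entry in data["entries"]}
```

Because the index is a plain dict comprehension, a later entry with a repeated seq would replace an earlier one.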

collect_stem_plans

collect_stem_plans(stems_dir: str, entries_index: dict[int, dict], sfx_config=None) -> list[StemPlan]

Collect and classify all MP3 stems in a stems directory.

Uses the entries index to look up each stem's direction_type and entry_type by sequence number. Stems whose seq is not in the index are logged as stale and skipped.

Parameters:

  • stems_dir (str) –

    Directory containing episode stem MP3 files.

  • entries_index (dict[int, dict]) –

    {seq: entry} mapping from load_entries_index.

  • sfx_config

    Optional models.SfxConfiguration; when provided, resolves per-effect or category-default volume/ramp values into MUSIC and AMBIENCE plans.

Returns:

  • list[StemPlan]

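    List of StemPlan instances. File-backed plans follow sorted filename order; synthetic stop markers are appended afterwards, so callers that need strict sequence order should sort by seq.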

Source code in src/xil_pipeline/mix_common.py
def collect_stem_plans(
    stems_dir: str, entries_index: dict[int, dict], sfx_config=None
) -> list[StemPlan]:
    """Collect and classify all MP3 stems in a stems directory.

    Uses the entries index to look up each stem's ``direction_type``
    and ``entry_type`` by sequence number. Stems whose seq is not in
    the index are logged as stale and skipped.

    Args:
        stems_dir: Directory containing episode stem MP3 files.
        entries_index: ``{seq: entry}`` mapping from
            :func:`load_entries_index`.
        sfx_config: Optional :class:`~models.SfxConfiguration`; when provided,
            resolves per-effect or category-default volume/ramp values into
            MUSIC and AMBIENCE plans.

    Returns:
        List of :class:`StemPlan` instances sorted by sequence number.
    """
    stem_files = sorted(glob.glob(os.path.join(stems_dir, "*.mp3")))
    plans = []
    for filepath in stem_files:
        try:
            seq = extract_seq(filepath)
        except ValueError:
            continue  # preamble_*.mp3 and other non-seq files handled separately
        entry = entries_index.get(seq, {})
        entry_type = entry.get("type")

        # Cross-check filename suffix vs parsed entry type to catch stale stems.
        # SFX/direction stems always end with `_sfx`; dialogue stems end with a
        # speaker key (never `_sfx`).
        basename = os.path.splitext(os.path.basename(filepath))[0]
        suffix = basename.rsplit("_", 1)[-1]
        is_sfx_stem = suffix == "sfx"

        # Seq not in parsed JSON at all — stale stem from a prior run (e.g. postamble
        # injected before a new section was appended to the script).
        if not entry:
            logger.warning("Stale stem skipped: %s (seq %d not in parsed JSON)",
                           os.path.basename(filepath), seq)
            continue

        # Header entries (section_header, scene_header) never have stems
        if entry_type not in ("dialogue", "direction"):
            logger.warning("Stale stem skipped: %s (seq %d is now a %s entry)",
                           os.path.basename(filepath), seq, entry_type)
            continue
        if is_sfx_stem and entry_type == "dialogue":
            logger.warning("Stale stem skipped: %s (seq %d is now a dialogue entry)",
                           os.path.basename(filepath), seq)
            continue
        if not is_sfx_stem and entry_type == "direction":
            logger.warning("Stale stem skipped: %s (seq %d is now a direction entry)",
                           os.path.basename(filepath), seq)
            continue
        # Check speaker suffix matches parsed speaker for dialogue stems
        if entry_type == "dialogue" and entry.get("speaker"):
            expected_suffix = f"_{entry['speaker']}"
            if not basename.endswith(expected_suffix):
                logger.warning("Stale stem skipped: %s (seq %d speaker is now %s)",
                               os.path.basename(filepath), seq, entry["speaker"])
                continue

        plan = StemPlan(
            seq=seq,
            filepath=filepath,
            direction_type=entry.get("direction_type"),
            entry_type=entry_type,
            text=entry.get("text"),
            scene=entry.get("scene"),
        )
        # Preamble intro music (seq < 0) and postamble outro music both play
        # sequentially in the foreground rather than as background overlays.
        if plan.direction_type == "MUSIC" and (
            entry.get("section") in ("preamble", "postamble") or plan.seq < 0
        ):
            plan.foreground_override = True
        vol, ri, ro, pd = _resolve_audio_params(plan, sfx_config)
        plan.volume_percentage = vol
        plan.ramp_in_seconds = ri
        plan.ramp_out_seconds = ro

        plan.play_duration = pd
        if entry_type == "dialogue" and _MutagenMP3 is not None:
            try:
                audio = _MutagenMP3(filepath)
                if audio.tags:
                    comm = audio.tags.get("COMM::eng")
                    if comm and comm.text:
                        plan.tts_model = comm.text[0]
            except Exception:
                pass
        src_entry = _find_effect_entry(sfx_config, plan.text) if sfx_config else None
        if src_entry is not None:
            if src_entry.loop is False:
                plan.loop = False
            # Derive play_duration from duration_seconds for source-based clips.
            # duration_seconds controls API generation length for generated effects;
            # for source= entries it has no effect unless we convert it here.
            if (src_entry.source is not None
                    and plan.play_duration is None
                    and src_entry.duration_seconds > 0):
                try:
                    clip_ms = _mp3_duration_ms(filepath)
                    if clip_ms > 0:
                        target_ms = src_entry.duration_seconds * 1000
                        plan.play_duration = min(100.0, target_ms / clip_ms * 100.0)
                except Exception:
                    logger.debug("Could not read duration for %s — duration_seconds ignored",
                                 os.path.basename(filepath))
        plans.append(plan)

    # Deduplicate: if multiple stems share the same seq (e.g. old and new
    # section names for an SFX entry), keep only the first one and warn.
    deduped = []
    seen_plan_seqs: set[int] = set()
    for plan in plans:
        if plan.seq in seen_plan_seqs:
            logger.warning("Duplicate stem skipped: %s (seq %d already loaded)",
                           os.path.basename(plan.filepath), plan.seq)
            continue
        seen_plan_seqs.add(plan.seq)
        deduped.append(plan)
    plans = deduped

    # Inject synthetic stop markers for ambience-end directives in the index.
    # "AMBIENCE: STOP" and "AMBIENCE: * FADES OUT" have no stem file on disk
    # but must appear in the timeline so build_ambience_layer can use their
    # cue position as the loop end boundary.
    seen_seqs = {p.seq for p in plans}
    for seq, entry in entries_index.items():
        if seq in seen_seqs:
            continue
        text = entry.get("text", "")
        if entry.get("direction_type") == "AMBIENCE" and (
            text == "AMBIENCE: STOP" or text.endswith("FADES OUT")
        ):
            plans.append(StemPlan(
                seq=seq,
                filepath="",  # sentinel: no audio — skip in layer builders
                direction_type="AMBIENCE",
                entry_type=entry.get("type"),
                text=text,
            ))
        if entry.get("direction_type") == "VINTAGE FILTER" and "DISENGAGES" in text:
            plans.append(StemPlan(
                seq=seq,
                filepath="",  # sentinel: no audio — boundary only
                direction_type="VINTAGE FILTER",
                entry_type=entry.get("type"),
                text=text,
            ))

    return plans
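
The stale-stem checks can be condensed into a single decision function. The sketch below is a hypothetical helper (stale_reason is not part of mix_common) that mirrors the order of the checks in the listing above and returns a reason string instead of logging.

```python
from typing import Optional


def stale_reason(basename: str, entry: dict) -> Optional[str]:
    """Return why a stem looks stale, or None if it matches its entry.

    Hypothetical helper condensing the checks in collect_stem_plans.
    """
    if not entry:
        return "seq not in parsed JSON"
    entry_type = entry.get("type")
    if entry_type not in ("dialogue", "direction"):
        return f"seq is now a {entry_type} entry"
    # Direction stems end in _sfx; dialogue stems end in a speaker key.
    is_sfx_stem = basename.rsplit("_", 1)[-1] == "sfx"
    if is_sfx_stem and entry_type == "dialogue":
        return "seq is now a dialogue entry"
    if not is_sfx_stem and entry_type == "direction":
        return "seq is now a direction entry"
    speaker = entry.get("speaker")
    if entry_type == "dialogue" and speaker and not basename.endswith(f"_{speaker}"):
        return f"speaker is now {speaker}"
    return None


adam_line = {"type": "dialogue", "speaker": "adam"}
ok = stale_reason("003_cold-open_adam", adam_line)
wrong_kind = stale_reason("003_cold-open_sfx", adam_line)
wrong_speaker = stale_reason("003_cold-open_tina", adam_line)
missing = stale_reason("099_postamble_sfx", {})
```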

apply_phone_filter

apply_phone_filter(segment: AudioSegment) -> AudioSegment

Apply a phone-speaker audio filter to an audio segment.

Cuts frequencies below 300 Hz and above 3000 Hz, then boosts volume by 5 dB to simulate a telephone speaker.

Parameters:

  • segment (AudioSegment) –

    Input audio segment to filter.

Returns:

  • AudioSegment

    Filtered audio segment.

Source code in src/xil_pipeline/mix_common.py
def apply_phone_filter(segment: AudioSegment) -> AudioSegment:
    """Apply a phone-speaker audio filter to an audio segment.

    Cuts frequencies below 300 Hz and above 3000 Hz, then boosts
    volume by 5 dB to simulate a telephone speaker.

    Args:
        segment: Input audio segment to filter.

    Returns:
        Filtered audio segment.
    """
    return segment.high_pass_filter(300).low_pass_filter(3000) + 5

apply_vintage_filter

apply_vintage_filter(segment: AudioSegment) -> AudioSegment

Apply a vintage (1960s-era) audio filter to an audio segment.

Rolls off high frequencies above 5 kHz (1960s tape/broadcast ceiling) and low rumble below 150 Hz, then reduces volume by 3 dB to simulate the compressed, mid-forward quality of aged tape recording.

Parameters:

  • segment (AudioSegment) –

    Input audio segment to filter.

Returns:

  • AudioSegment

    Filtered audio segment.

Source code in src/xil_pipeline/mix_common.py
def apply_vintage_filter(segment: AudioSegment) -> AudioSegment:
    """Apply a vintage (1960s-era) audio filter to an audio segment.

    Rolls off high frequencies above 5 kHz (1960s tape/broadcast ceiling)
    and low rumble below 150 Hz, then reduces volume by 3 dB to simulate
    the compressed, mid-forward quality of aged tape recording.

    Args:
        segment: Input audio segment to filter.

    Returns:
        Filtered audio segment.
    """
    mono = segment.set_channels(1).set_channels(segment.channels)
    return mono.low_pass_filter(5000).high_pass_filter(150) - 3

build_foreground

build_foreground(stem_plans: list[StemPlan], cast_config: dict, gap_ms: int = 600, vintage_scenes: list[str] | None = None) -> tuple[AudioSegment, dict[int, int]]

Build the foreground audio track and a full-episode timeline.

Iterates stems in sequence order. Foreground stems (dialogue, SFX, BEAT) are concatenated with silence gaps and their positions are recorded in the timeline. Background stems (AMBIENCE, MUSIC) are recorded in the timeline at the current foreground cursor position but do not advance it — they are overlaid at that cue point in a separate background pass.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from collect_stem_plans.

  • cast_config (dict) –

    {speaker_key: {"pan": float, "filter": str | bool | None}} for per-speaker audio effects. filter accepts False/None (no filter), True/"phone" (phone filter), "vintage", or a comma-separated combination such as "vintage,phone".

  • gap_ms (int, default: 600 ) –

    Silence inserted between foreground stems in ms.

  • vintage_scenes (list[str] | None, default: None ) –

    Optional list of scene labels (e.g. ["scene-3", "scene-4"]) whose dialogue stems receive an additional vintage filter pass. Applied after the per-speaker filter chain.

Returns:

  • tuple[AudioSegment, dict[int, int]] –

    Tuple of (foreground_audio, timeline) where timeline maps sequence numbers to millisecond offsets within the foreground track.

Source code in src/xil_pipeline/mix_common.py
def build_foreground(
    stem_plans: list[StemPlan],
    cast_config: dict,
    gap_ms: int = 600,
    vintage_scenes: list[str] | None = None,
) -> tuple[AudioSegment, dict[int, int]]:
    """Build the foreground audio track and a full-episode timeline.

    Iterates stems in sequence order. Foreground stems (dialogue, SFX,
    BEAT) are concatenated with silence gaps and their positions are
    recorded in the timeline. Background stems (AMBIENCE, MUSIC) are
    recorded in the timeline at the current foreground cursor position
    but do not advance it — they are overlaid at that cue point in a
    separate background pass.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        cast_config: ``{speaker_key: {"pan": float, "filter": str | bool | None}}``
            for per-speaker audio effects.  ``filter`` accepts ``False``/``None``
            (no filter), ``True``/``"phone"`` (phone filter), ``"vintage"``, or a
            comma-separated combination such as ``"vintage,phone"``.
        gap_ms: Silence inserted between foreground stems in ms.
        vintage_scenes: Optional list of scene labels (e.g. ``["scene-3",
            "scene-4"]``) whose dialogue stems receive an additional vintage
            filter pass.  Applied after the per-speaker filter chain.

    Returns:
        Tuple of ``(foreground_audio, timeline)`` where ``timeline``
        maps sequence numbers to millisecond offsets within the
        foreground track.
    """
    foreground = AudioSegment.empty()
    timeline: dict[int, int] = {}
    current_ms = 0
    vf_engaged = _vf_engaged_seqs(stem_plans)

    for plan in sorted(stem_plans, key=lambda p: p.seq):
        # Record cue position for ALL stems (both fg and bg).
        # Background stems don't advance current_ms — they overlay.
        timeline[plan.seq] = current_ms

        if plan.is_background:
            continue

        segment = AudioSegment.from_file(plan.filepath)

        # Trim SFX/BEAT stems to play_duration before advancing the timeline
        # so dialogue placement reflects the shortened clip, not the full file.
        if plan.play_duration is not None and not plan.pre_trimmed:
            segment = segment[:max(1, int(len(segment) * plan.play_duration / 100.0))]

        # Apply volume_percentage to SFX/BEAT stems in the foreground.
        if plan.direction_type in ("SFX", "BEAT") and plan.volume_percentage is not None:
            segment = segment + _volume_pct_to_db(plan.volume_percentage)

        # Apply per-speaker effects to dialogue stems.
        basename = os.path.splitext(os.path.basename(plan.filepath))[0]
        speaker = basename.rsplit("_", 1)[-1]
        if speaker in cast_config:
            segment = _apply_speaker_filters(segment, cast_config[speaker].get("filter"))
            segment = segment.pan(cast_config[speaker].get("pan", 0.0))

        # Vintage filter for dialogue: script-direction spans take precedence;
        # fall back to scene-scoped vintage_scenes list.
        if plan.entry_type == "dialogue":
            if plan.seq in vf_engaged:
                segment = apply_vintage_filter(segment)
            elif vintage_scenes and plan.scene in vintage_scenes:
                segment = apply_vintage_filter(segment)

        foreground += segment + AudioSegment.silent(duration=gap_ms)
        current_ms += len(segment) + gap_ms

    return foreground, timeline
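
The cursor arithmetic can be followed without any audio. The sketch below is a simplified model, assuming each stem is reduced to a hypothetical (seq, duration_ms, is_background) tuple; it ignores trimming, filters, and panning.

```python
def sketch_timeline(stems, gap_ms=600):
    """Model the cue bookkeeping of build_foreground (hypothetical helper).

    stems: iterable of (seq, duration_ms, is_background) tuples.
    Returns (timeline, total_ms).
    """
    timeline = {}
    current_ms = 0
    for seq, duration_ms, is_background in sorted(stems):
        # Every stem records the cursor position as its cue point.
        timeline[seq] = current_ms
        if is_background:
            continue  # overlays do not advance the cursor
        current_ms += duration_ms + gap_ms
    return timeline, current_ms


stems = [
    (1, 2000, False),  # dialogue
    (2, 30000, True),  # ambience, cued where the next foreground stem starts
    (3, 1500, False),  # SFX
    (4, 4000, False),  # dialogue
]
timeline, total_ms = sketch_timeline(stems)
# timeline == {1: 0, 2: 2600, 3: 2600, 4: 4700}; total_ms == 9300
```

Note that the ambience stem at seq 2 shares its cue point with the SFX at seq 3, since it does not consume any foreground time.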

build_ambience_layer

build_ambience_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, level_db: float = AMBIENCE_LEVEL_DB) -> tuple[AudioSegment, list[tuple]]

Build the ambience background layer.

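Each AMBIENCE stem is looped from its cue point to the start of the next background cue or the end of the track, whichever comes first. A background cue is any stem whose is_background is true (AMBIENCE, MUSIC, or VINTAGE FILTER without a foreground override), including the synthetic AMBIENCE stop markers. Stems with loop set to False are truncated to the span rather than repeated. The level_db parameter controls ducking; use 0 for DAW layer export so the producer controls levels in-DAW.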

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from collect_stem_plans.

  • timeline (dict[int, int]) –

    Cue-point timestamps from build_foreground.

  • total_ms (int) –

    Total foreground track length in milliseconds.

  • level_db (float, default: AMBIENCE_LEVEL_DB ) –

    Volume adjustment applied to the clip before looping. Negative values duck the ambience below dialogue.

Returns:

  • tuple[AudioSegment, list[tuple]] –

    Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with ambience looped at each cue point, and labels is a list of tuples, one per looped region, each beginning with (start_sec, end_sec, text).

Source code in src/xil_pipeline/mix_common.py
def build_ambience_layer(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
    level_db: float = AMBIENCE_LEVEL_DB,
) -> tuple[AudioSegment, list[tuple]]:
    """Build the ambience background layer.

    Each AMBIENCE stem is looped from its cue point to the start of
    the next background cue (AMBIENCE or MUSIC) or the end of the
    track, whichever comes first. The ``level_db`` parameter controls
    ducking; use ``0`` for DAW layer export so the producer controls
    levels in-DAW.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        timeline: Cue-point timestamps from :func:`build_foreground`.
        total_ms: Total foreground track length in milliseconds.
        level_db: Volume adjustment applied to the clip before looping.
            Negative values duck the ambience below dialogue.

    Returns:
        Tuple of ``(layer, labels)`` where *layer* is a full-length
        :class:`~pydub.AudioSegment` with ambience looped at each cue
        point, and *labels* is a list of ``(start_sec, end_sec, text)``
        tuples spanning each looped region.
    """
    layer = AudioSegment.silent(duration=total_ms)
    labels: list[tuple[float, float, str]] = []
    ambience_plans = sorted(
        (p for p in stem_plans if p.direction_type == "AMBIENCE"),
        key=lambda p: p.seq,
    )
    if not ambience_plans:
        return layer, labels

    # All background cue ms values (AMBIENCE + MUSIC) sorted by position.
    bg_cues: list[tuple[int, int]] = sorted(
        (
            (timeline.get(p.seq, 0), p.seq)
            for p in stem_plans
            if p.is_background
        ),
        key=lambda t: t[0],
    )

    for plan in ambience_plans:
        if not plan.filepath:  # AMBIENCE: STOP marker — boundary only, no audio
            continue
        start_ms = timeline.get(plan.seq, 0)
        if start_ms >= total_ms:
            continue

        # End at the next background cue after this one, or track end.
        end_ms = total_ms
        for cue_ms, cue_seq in bg_cues:
            if cue_seq > plan.seq and cue_ms > start_ms:
                end_ms = min(cue_ms, total_ms)
                break

        duration_needed = end_ms - start_ms
        if duration_needed <= 0:
            continue

        try:
            clip = AudioSegment.from_file(plan.filepath)
        except (CouldntDecodeError, OSError) as exc:
            logger.warning("Skipping corrupt ambience stem: %s (%s)", plan.filepath, exc)
            continue
        ramp_in_ms = int((plan.ramp_in_seconds or 0) * 1000)
        ramp_out_ms = int((plan.ramp_out_seconds or 0) * 1000)
        looped = _loop_clip(clip, duration_needed) if plan.loop else clip[:duration_needed]
        looped = _apply_clip_effects(
            looped, plan.volume_percentage, ramp_in_ms, ramp_out_ms, level_db
        )
        layer = layer.overlay(looped, position=start_ms)
        label_text = plan.text or plan.direction_type or "AMBIENCE"
        labels.append((
            start_ms / 1000.0, end_ms / 1000.0, label_text,
            plan.ramp_in_seconds, plan.ramp_out_seconds,
            None, None, plan.volume_percentage, plan.seq,
        ))

    return layer, labels
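
The end-boundary search can be isolated from the audio handling. The sketch below is a hypothetical helper (ambience_span is not part of mix_common) that takes the seqs of all background stems and returns the span one ambience cue would cover.

```python
def ambience_span(seq, timeline, background_seqs, total_ms):
    """Return (start_ms, end_ms) for one ambience cue (hypothetical helper)."""
    start_ms = timeline.get(seq, 0)
    # Background cues as (position_ms, seq), ordered by position.
    bg_cues = sorted(
        ((timeline.get(s, 0), s) for s in background_seqs),
        key=lambda t: t[0],
    )
    end_ms = total_ms
    for cue_ms, cue_seq in bg_cues:
        # The first later cue that starts after this one closes the span.
        if cue_seq > seq and cue_ms > start_ms:
            end_ms = min(cue_ms, total_ms)
            break
    return start_ms, end_ms


timeline = {2: 2600, 5: 9300, 8: 20000}
background_seqs = [2, 5, 8]
first = ambience_span(2, timeline, background_seqs, total_ms=30000)
last = ambience_span(8, timeline, background_seqs, total_ms=30000)
# first == (2600, 9300); last == (20000, 30000)
```

The last cue has no later background cue, so its span runs to the end of the track.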

build_vintage_filter_layer

build_vintage_filter_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, level_db: float = 0) -> tuple[AudioSegment, list[tuple]]

Build the vintage filter crackle layer.

Loops the crackle source between each VINTAGE FILTER ENGAGES marker and the next VINTAGE FILTER DISENGAGES marker (or track end if no DISENGAGES follows). Use level_db=0 for DAW export so the producer controls levels in-DAW.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from collect_stem_plans.

  • timeline (dict[int, int]) –

    Cue-point timestamps from build_foreground.

  • total_ms (int) –

    Total foreground track length in milliseconds.

  • level_db (float, default: 0 ) –

    Volume adjustment applied to each looped region.

Returns:

  • tuple[AudioSegment, list[tuple]] –

    Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with crackle looped at each active span, and labels is a list of (start_sec, end_sec, text) tuples for each looped region.

Source code in src/xil_pipeline/mix_common.py
def build_vintage_filter_layer(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
    level_db: float = 0,
) -> tuple[AudioSegment, list[tuple]]:
    """Build the vintage filter crackle layer.

    Loops the crackle source between each ``VINTAGE FILTER ENGAGES``
    marker and the next ``VINTAGE FILTER DISENGAGES`` marker (or track
    end if no DISENGAGES follows).  Use ``level_db=0`` for DAW export so
    the producer controls levels in-DAW.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        timeline: Cue-point timestamps from :func:`build_foreground`.
        total_ms: Total foreground track length in milliseconds.
        level_db: Volume adjustment applied to each looped region.

    Returns:
        Tuple of ``(layer, labels)`` where *layer* is a full-length
        :class:`~pydub.AudioSegment` with crackle looped at each active
        span, and *labels* is a list of ``(start_sec, end_sec, text)``
        tuples for each looped region.
    """
    layer = AudioSegment.silent(duration=total_ms)
    labels: list[tuple[float, float, str]] = []
    vf_plans = sorted(
        (p for p in stem_plans if p.direction_type == "VINTAGE FILTER"),
        key=lambda p: p.seq,
    )
    if not vf_plans:
        return layer, labels

    # All VF cue positions sorted — used to find the end boundary for each span.
    vf_cues: list[tuple[int, int]] = sorted(
        ((timeline.get(p.seq, 0), p.seq) for p in vf_plans),
        key=lambda t: t[0],
    )

    for plan in vf_plans:
        # Skip DISENGAGES: either the filepath sentinel ("") or a real file whose
        # text says DISENGAGES (happens when the producer generated an SFX stem for
        # the DISENGAGES direction entry before collect_stem_plans could inject the
        # empty-filepath sentinel, causing the real file to win deduplication).
        if not plan.filepath or "DISENGAGES" in (plan.text or ""):
            continue
        start_ms = timeline.get(plan.seq, 0)
        if start_ms >= total_ms:
            continue

        # End at the next VF cue after this one, or track end.
        end_ms = total_ms
        for cue_ms, cue_seq in vf_cues:
            if cue_seq > plan.seq and cue_ms > start_ms:
                end_ms = min(cue_ms, total_ms)
                break

        duration_needed = end_ms - start_ms
        if duration_needed <= 0:
            continue

        try:
            clip = AudioSegment.from_file(plan.filepath)
        except (CouldntDecodeError, OSError) as exc:
            logger.warning("Skipping corrupt vintage filter stem: %s (%s)", plan.filepath, exc)
            continue
        ramp_in_ms = int((plan.ramp_in_seconds or 0) * 1000)
        ramp_out_ms = int((plan.ramp_out_seconds or 0) * 1000)
        looped = _loop_clip(clip, duration_needed) if plan.loop else clip[:duration_needed]
        looped = _apply_clip_effects(
            looped, plan.volume_percentage, ramp_in_ms, ramp_out_ms, level_db
        )
        layer = layer.overlay(looped, position=start_ms)
        label_text = plan.text or "VINTAGE FILTER"
        labels.append((
            start_ms / 1000.0, end_ms / 1000.0, label_text,
            plan.ramp_in_seconds, plan.ramp_out_seconds,
            None, None, plan.volume_percentage, plan.seq,
        ))

    return layer, labels

build_music_layer

build_music_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, level_db: float = MUSIC_LEVEL_DB, include_foreground_override: bool = False) -> tuple[AudioSegment, list[tuple]]

Build the music/sting background layer.

Each MUSIC stem is overlaid at its cue point without looping. Use level_db=0 for DAW layer export so levels are set in-DAW.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from collect_stem_plans.

  • timeline (dict[int, int]) –

    Cue-point timestamps from build_foreground.

  • total_ms (int) –

    Total foreground track length in milliseconds.

  • level_db (float, default: MUSIC_LEVEL_DB ) –

    Volume adjustment applied before overlaying.

  • include_foreground_override (bool, default: False ) –

    When True, preamble/postamble MUSIC stems (foreground_override=True) are placed at their timeline position in this layer. Used by DAW export so the operator can see and mix them; set to False (default) for the integrated mix where they play sequentially via build_foreground instead.

Returns:

  • tuple[AudioSegment, list[tuple]] –

    Tuple of (layer, labels) where layer is a full-length pydub.AudioSegment with music stings overlaid at their cue positions, and labels is a list of (start_sec, end_sec, text) tuples for each sting.

Source code in src/xil_pipeline/mix_common.py
def build_music_layer(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
    level_db: float = MUSIC_LEVEL_DB,
    include_foreground_override: bool = False,
) -> tuple[AudioSegment, list[tuple]]:
    """Build the music/sting background layer.

    Each MUSIC stem is overlaid at its cue point without looping.
    Use ``level_db=0`` for DAW layer export so levels are set in-DAW.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        timeline: Cue-point timestamps from :func:`build_foreground`.
        total_ms: Total foreground track length in milliseconds.
        level_db: Volume adjustment applied before overlaying.
        include_foreground_override: When ``True``, preamble/postamble
            MUSIC stems (``foreground_override=True``) are placed at
            their timeline position in this layer.  Used by DAW export
            so the operator can see and mix them; set to ``False``
            (default) for the integrated mix where they play
            sequentially via :func:`build_foreground` instead.

    Returns:
        Tuple of ``(layer, labels)`` where *layer* is a full-length
        :class:`~pydub.AudioSegment` with music stings overlaid at
        their cue positions, and *labels* is a list of 9-element tuples
        ``(start_sec, end_sec, text, ramp_in_seconds, ramp_out_seconds,
        play_duration, None, volume_percentage, seq)``, one per sting.
    """
    layer = AudioSegment.silent(duration=total_ms)
    labels: list[tuple] = []
    for plan in sorted(stem_plans, key=lambda p: p.seq):
        if plan.direction_type != "MUSIC":
            continue
        if plan.foreground_override and not include_foreground_override:
            continue  # preamble/postamble music plays in foreground, not here
        start_ms = timeline.get(plan.seq, 0)
        if start_ms >= total_ms:
            continue
        clip = AudioSegment.from_file(plan.filepath)
        if plan.play_duration is not None and not plan.pre_trimmed:
            clip = clip[:max(1, int(len(clip) * plan.play_duration / 100.0))]
        ramp_in_ms = int((plan.ramp_in_seconds or 0) * 1000)
        ramp_out_ms = int((plan.ramp_out_seconds or 0) * 1000)
        clip = _apply_clip_effects(
            clip, plan.volume_percentage, ramp_in_ms, ramp_out_ms, level_db
        )
        layer = layer.overlay(clip, position=start_ms)
        label_text = plan.text or plan.direction_type or "MUSIC"
        labels.append((
            start_ms / 1000.0, (start_ms + len(clip)) / 1000.0, label_text,
            plan.ramp_in_seconds, plan.ramp_out_seconds, plan.play_duration,
            None, plan.volume_percentage, plan.seq,
        ))
    return layer, labels
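The play_duration trim in the listing above treats the value as a percentage of the decoded clip length and never produces an empty slice. A minimal sketch of that arithmetic in isolation follows; the helper name trimmed_length_ms is hypothetical and not part of the module.

```python
from __future__ import annotations


def trimmed_length_ms(
    clip_len_ms: int,
    play_duration: float | None,
    pre_trimmed: bool = False,
) -> int:
    """Return how many milliseconds of a clip survive the play_duration trim.

    Hypothetical helper for illustration only. It mirrors the expression
    max(1, int(len(clip) * play_duration / 100.0)) from the listing.
    """
    if play_duration is None or pre_trimmed:
        # No trim requested, or the stem was already trimmed upstream.
        return clip_len_ms
    # play_duration is a percentage; clamp to at least 1 ms.
    return max(1, int(clip_len_ms * play_duration / 100.0))


print(trimmed_length_ms(4000, 50))    # half of a 4-second clip
print(trimmed_length_ms(3, 10))       # tiny clip, clamped to 1 ms
print(trimmed_length_ms(4000, None))  # untouched
```

The clamp matters because slicing a pydub segment to zero milliseconds would yield an empty clip, which would then produce a zero-length label.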

build_dialogue_layer

build_dialogue_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, cast_config: dict, vintage_scenes: list[str] | None = None) -> tuple

Build an isolated dialogue layer for DAW export.

Places only dialogue stems (entry_type == "dialogue") at their foreground timeline positions in a full-length silent segment. Filter and pan effects are applied per speaker as configured.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from :func:collect_stem_plans.

  • timeline (dict[int, int]) –

    Cue-point timestamps from :func:build_foreground.

  • total_ms (int) –

    Total track length in milliseconds.

  • cast_config (dict) –

    Per-speaker audio settings.

  • vintage_scenes (list[str] | None, default: None ) –

    Optional list of scene labels whose dialogue stems receive an additional vintage filter pass (same as :func:build_foreground).

Returns:

  • tuple –

    Tuple of (layer, labels) where layer is a full-length pydub AudioSegment with dialogue stems at their timeline positions, and labels is a list of tuples for Audacity label export. Each tuple begins with (start_sec, end_sec, speaker); positions [3]–[7] are None and position [8] carries the stem's seq.

Source code in src/xil_pipeline/mix_common.py
def build_dialogue_layer(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
    cast_config: dict,
    vintage_scenes: list[str] | None = None,
) -> tuple:
    """Build an isolated dialogue layer for DAW export.

    Places only dialogue stems (``entry_type == "dialogue"``) at their
    foreground timeline positions in a full-length silent segment.
    Filter and pan effects are applied per speaker as configured.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        timeline: Cue-point timestamps from :func:`build_foreground`.
        total_ms: Total track length in milliseconds.
        cast_config: Per-speaker audio settings.
        vintage_scenes: Optional list of scene labels whose dialogue stems
            receive an additional vintage filter pass (same as
            :func:`build_foreground`).

    Returns:
        Tuple of ``(layer, labels)`` where *layer* is a full-length
        :class:`~pydub.AudioSegment` with dialogue stems at their
        timeline positions, and *labels* is a list of
        ``(start_sec, end_sec, speaker)`` tuples for Audacity label export.
    """
    layer = AudioSegment.silent(duration=total_ms)
    labels: list[tuple[float, float, str]] = []
    vf_engaged = _vf_engaged_seqs(stem_plans)
    for plan in sorted(stem_plans, key=lambda p: p.seq):
        if plan.entry_type != "dialogue":
            continue
        start_ms = timeline.get(plan.seq, 0)
        segment = AudioSegment.from_file(plan.filepath)
        basename = os.path.splitext(os.path.basename(plan.filepath))[0]
        speaker = basename.rsplit("_", 1)[-1]
        if speaker in cast_config:
            segment = _apply_speaker_filters(segment, cast_config[speaker].get("filter"))
            segment = segment.pan(cast_config[speaker].get("pan", 0.0))
        if plan.seq in vf_engaged:
            segment = apply_vintage_filter(segment)
        elif vintage_scenes and plan.scene in vintage_scenes:
            segment = apply_vintage_filter(segment)
        end_ms = start_ms + len(segment)
        labels.append((start_ms / 1000.0, end_ms / 1000.0, speaker, None, None, None, None, None, plan.seq))
        layer = layer.overlay(segment, position=start_ms)
    return layer, labels
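The speaker key used to look up cast_config is taken from the stem filename: everything after the last underscore of the basename, with the extension removed. A small sketch of that rule, assuming a hypothetical helper name and made-up filenames:

```python
import os


def speaker_from_stem_path(filepath: str) -> str:
    """Derive the speaker key the same way the listing does.

    Hypothetical helper; the example filenames below are invented.
    """
    basename = os.path.splitext(os.path.basename(filepath))[0]
    # Only the text after the LAST underscore is kept.
    return basename.rsplit("_", 1)[-1]


print(speaker_from_stem_path("stems/012_cold-open_adam.mp3"))
print(speaker_from_stem_path("stems/013_cold-open_mr_patel.mp3"))
```

One consequence of splitting on the last underscore is that a speaker key containing an underscore is truncated to its final part, so under this rule cast_config keys need to be underscore-free to match.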

build_foreground_timeline_only

build_foreground_timeline_only(stem_plans: list[StemPlan], gap_ms: int = 600) -> tuple[int, dict[int, int]]

Build a foreground timeline without decoding audio.

Lightweight variant of :func:build_foreground that reads MP3 durations via mutagen header inspection instead of loading full audio via pydub. Enables --dry-run --timeline without expensive audio decoding.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from :func:collect_stem_plans.

  • gap_ms (int, default: 600 ) –

    Silence gap between foreground stems in ms.

Returns:

  • tuple[int, dict[int, int]] –

    Tuple of (total_ms, timeline) where timeline maps sequence numbers to millisecond offsets.

Source code in src/xil_pipeline/mix_common.py
def build_foreground_timeline_only(
    stem_plans: list[StemPlan],
    gap_ms: int = 600,
) -> tuple[int, dict[int, int]]:
    """Build a foreground timeline without decoding audio.

    Lightweight variant of :func:`build_foreground` that reads MP3
    durations via mutagen header inspection instead of loading full
    audio via pydub.  Enables ``--dry-run --timeline`` without
    expensive audio decoding.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        gap_ms: Silence gap between foreground stems in ms.

    Returns:
        Tuple of ``(total_ms, timeline)`` where ``timeline`` maps
        sequence numbers to millisecond offsets.
    """
    timeline: dict[int, int] = {}
    current_ms = 0

    for plan in sorted(stem_plans, key=lambda p: p.seq):
        timeline[plan.seq] = current_ms
        if plan.is_background:
            continue
        duration = _mp3_duration_ms(plan.filepath)
        current_ms += duration + gap_ms

    return current_ms, timeline
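To see how the timeline advances, the loop can be run against stub durations instead of real MP3 files. The sketch below is illustrative only: FakePlan and timeline_from_durations are hypothetical stand-ins for StemPlan and the _mp3_duration_ms lookup.

```python
from dataclasses import dataclass


@dataclass
class FakePlan:
    """Hypothetical stand-in for StemPlan with a known duration."""
    seq: int
    duration_ms: int
    is_background: bool = False


def timeline_from_durations(plans, gap_ms=600):
    """Same loop as the listing, with durations supplied directly."""
    timeline = {}
    current_ms = 0
    for plan in sorted(plans, key=lambda p: p.seq):
        # Every stem, background or not, gets a cue point.
        timeline[plan.seq] = current_ms
        if plan.is_background:
            continue  # background stems do not advance the clock
        current_ms += plan.duration_ms + gap_ms
    return current_ms, timeline


plans = [
    FakePlan(1, 1000),
    FakePlan(2, 30000, is_background=True),
    FakePlan(3, 500),
]
total_ms, timeline = timeline_from_durations(plans)
print(total_ms, timeline)
```

Note that a background stem shares its cue point with the next foreground stem, and that the returned total includes one trailing gap after the last foreground stem.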

compute_dialogue_labels

compute_dialogue_labels(stem_plans: list[StemPlan], timeline: dict[int, int]) -> list[tuple]

Compute dialogue label tuples without loading audio.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list.

  • timeline (dict[int, int]) –

    Cue-point timestamps from a foreground build.

Returns:

  • list[tuple] –

    List of 10-element tuples (start_s, end_s, speaker, None, None, None, snippet, None, seq, tts_model) where snippet is the first 5 words of the dialogue text (or None if no text is available). Positions [3]–[5] are None (dialogue has no ramp or play_duration); position [6] carries the snippet for the HTML tooltip; position [7] is None; positions [8] and [9] carry the stem's seq and tts_model.

Source code in src/xil_pipeline/mix_common.py
def compute_dialogue_labels(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
) -> list[tuple]:
    """Compute dialogue label tuples without loading audio.

    Args:
        stem_plans: Classified stem list.
        timeline: Cue-point timestamps from a foreground build.

    Returns:
        List of 10-element tuples ``(start_s, end_s, speaker, None, None, None,
        snippet, None, seq, tts_model)`` where *snippet* is the first 5 words of
        the dialogue text (or ``None`` if no text is available).  Positions
        [3]–[5] are ``None`` (dialogue has no ramp or play_duration); position [6]
        carries the snippet for the HTML tooltip; position [7] is ``None``;
        positions [8] and [9] carry the stem's ``seq`` and ``tts_model``.
    """
    labels = []
    for plan in sorted(stem_plans, key=lambda p: p.seq):
        if plan.entry_type != "dialogue":
            continue
        start_ms = timeline.get(plan.seq, 0)
        duration = _mp3_duration_ms(plan.filepath)
        end_ms = start_ms + duration
        basename = os.path.splitext(os.path.basename(plan.filepath))[0]
        speaker = basename.rsplit("_", 1)[-1]
        words = (plan.text or "").split()
        snippet = " ".join(words[:5]) if words else None
        labels.append((start_ms / 1000.0, end_ms / 1000.0, speaker, None, None, None, snippet, None, plan.seq, plan.tts_model))
    return labels
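The tooltip snippet is built by whitespace-splitting the dialogue text and keeping the first five words. A minimal sketch of that step on its own, using a hypothetical helper name:

```python
def dialogue_snippet(text, max_words=5):
    """Return the first max_words words of text, or None when there are none.

    Hypothetical helper mirroring the two snippet lines in the listing.
    """
    words = (text or "").split()
    return " ".join(words[:max_words]) if words else None


print(dialogue_snippet("The quick brown fox jumps over the lazy dog"))
print(dialogue_snippet(None))
```

Because str.split() with no argument collapses runs of whitespace, text that is empty or whitespace-only yields None rather than an empty string.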

compute_ambience_labels

compute_ambience_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> list[tuple[float, float, str]]

Compute ambience label tuples without loading audio.

Uses the same boundary logic as :func:build_ambience_layer.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list.

  • timeline (dict[int, int]) –

    Cue-point timestamps.

  • total_ms (int) –

    Total episode duration in ms.

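Returns:

  • list[tuple[float, float, str]] –

    List of label tuples. Each tuple begins with (start_s, end_s, text); the code appends six further fields (ramp_in_seconds, ramp_out_seconds, None, None, volume_percentage, seq), so the tuples are longer than the annotated three elements.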

Source code in src/xil_pipeline/mix_common.py
def compute_ambience_labels(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
) -> list[tuple[float, float, str]]:
    """Compute ambience label tuples without loading audio.

    Uses the same boundary logic as :func:`build_ambience_layer`.

    Args:
        stem_plans: Classified stem list.
        timeline: Cue-point timestamps.
        total_ms: Total episode duration in ms.

    Returns:
        List of ``(start_s, end_s, text)`` tuples.
    """
    labels: list[tuple[float, float, str]] = []
    ambience_plans = sorted(
        (p for p in stem_plans if p.direction_type == "AMBIENCE"),
        key=lambda p: p.seq,
    )
    if not ambience_plans:
        return labels

    bg_cues: list[tuple[int, int]] = sorted(
        (
            (timeline.get(p.seq, 0), p.seq)
            for p in stem_plans
            if p.is_background
        ),
        key=lambda t: t[0],
    )

    for plan in ambience_plans:
        if not plan.filepath:  # AMBIENCE: STOP marker — boundary only, no label
            continue
        start_ms = timeline.get(plan.seq, 0)
        if start_ms >= total_ms:
            continue
        end_ms = total_ms
        for cue_ms, cue_seq in bg_cues:
            if cue_seq > plan.seq and cue_ms > start_ms:
                end_ms = min(cue_ms, total_ms)
                break
        if end_ms - start_ms <= 0:
            continue
        label_text = plan.text or plan.direction_type or "AMBIENCE"
        labels.append((
            start_ms / 1000.0, end_ms / 1000.0, label_text,
            plan.ramp_in_seconds, plan.ramp_out_seconds,
            None, None, plan.volume_percentage, plan.seq,
        ))

    return labels
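The boundary rule above ends an ambience region at the first background cue that has both a higher sequence number and a later timestamp, falling back to the end of the episode. A sketch of that rule as a standalone function, with a hypothetical name and made-up cue values:

```python
def region_end_ms(start_ms, seq, bg_cues, total_ms):
    """Return where a region starting at (start_ms, seq) should end.

    bg_cues is a list of (cue_ms, cue_seq) pairs. Hypothetical helper
    that mirrors the inner loop of the listing.
    """
    for cue_ms, cue_seq in sorted(bg_cues, key=lambda t: t[0]):
        if cue_seq > seq and cue_ms > start_ms:
            # Never run past the end of the episode.
            return min(cue_ms, total_ms)
    return total_ms


cues = [(0, 1), (5000, 4), (12000, 9)]  # invented example cues
print(region_end_ms(0, 1, cues, 20000))
print(region_end_ms(12000, 9, cues, 20000))
```

Requiring both conditions means a later cue that lands on the same timestamp does not cut the region to zero length; the search simply continues to the next cue.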

compute_vintage_filter_labels

compute_vintage_filter_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> list[tuple[float, float, str]]

Compute vintage filter label tuples without loading audio.

Uses the same boundary logic as :func:build_vintage_filter_layer.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list.

  • timeline (dict[int, int]) –

    Cue-point timestamps.

  • total_ms (int) –

    Total episode duration in ms.

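Returns:

  • list[tuple[float, float, str]] –

    List of label tuples spanning each active crackle region. Each tuple begins with (start_s, end_s, text); the code appends six further fields (ramp_in_seconds, ramp_out_seconds, None, None, volume_percentage, seq), so the tuples are longer than the annotated three elements.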

Source code in src/xil_pipeline/mix_common.py
def compute_vintage_filter_labels(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
) -> list[tuple[float, float, str]]:
    """Compute vintage filter label tuples without loading audio.

    Uses the same boundary logic as :func:`build_vintage_filter_layer`.

    Args:
        stem_plans: Classified stem list.
        timeline: Cue-point timestamps.
        total_ms: Total episode duration in ms.

    Returns:
        List of label tuples spanning each active crackle region.
    """
    labels: list[tuple[float, float, str]] = []
    vf_plans = sorted(
        (p for p in stem_plans if p.direction_type == "VINTAGE FILTER"),
        key=lambda p: p.seq,
    )
    if not vf_plans:
        return labels

    vf_cues: list[tuple[int, int]] = sorted(
        ((timeline.get(p.seq, 0), p.seq) for p in vf_plans),
        key=lambda t: t[0],
    )

    for plan in vf_plans:
        if not plan.filepath or "DISENGAGES" in (plan.text or ""):
            continue
        start_ms = timeline.get(plan.seq, 0)
        if start_ms >= total_ms:
            continue
        end_ms = total_ms
        for cue_ms, cue_seq in vf_cues:
            if cue_seq > plan.seq and cue_ms > start_ms:
                end_ms = min(cue_ms, total_ms)
                break
        if end_ms - start_ms <= 0:
            continue
        label_text = plan.text or "VINTAGE FILTER"
        labels.append((
            start_ms / 1000.0, end_ms / 1000.0, label_text,
            plan.ramp_in_seconds, plan.ramp_out_seconds,
            None, None, plan.volume_percentage, plan.seq,
        ))

    return labels
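In the listing above, a cue with no file, or whose text contains "DISENGAGES", produces no label of its own but still closes the region opened by an earlier cue. The sketch below illustrates that pairing with plain tuples; the function name, the tuple layout, and the "ENGAGES" cue texts are assumptions made for the example.

```python
def vintage_regions(cues, total_ms):
    """Return (start_s, end_s) for each active vintage filter region.

    cues is a list of (seq, start_ms, text, has_file) tuples. This is a
    hypothetical, simplified restatement of the loop in the listing.
    """
    by_time = sorted(cues, key=lambda c: c[1])
    regions = []
    for seq, start_ms, text, has_file in sorted(cues, key=lambda c: c[0]):
        if not has_file or "DISENGAGES" in (text or ""):
            continue  # marker only: acts as a boundary, emits no region
        if start_ms >= total_ms:
            continue
        end_ms = total_ms
        for other_seq, other_ms, _text, _has_file in by_time:
            if other_seq > seq and other_ms > start_ms:
                end_ms = min(other_ms, total_ms)
                break
        if end_ms > start_ms:
            regions.append((start_ms / 1000.0, end_ms / 1000.0))
    return regions


cues = [
    (3, 2000, "VINTAGE FILTER ENGAGES", True),
    (7, 9500, "VINTAGE FILTER DISENGAGES", False),
    (12, 15000, "VINTAGE FILTER ENGAGES", True),
]
print(vintage_regions(cues, 20000))
```

The last region has no closing cue, so it runs to the end of the episode.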

compute_music_labels

compute_music_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int, include_foreground_override: bool = False) -> list[tuple[float, float, str]]

Compute music label tuples without loading audio.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list.

  • timeline (dict[int, int]) –

    Cue-point timestamps.

  • total_ms (int) –

    Total episode duration in ms.

  • include_foreground_override (bool, default: False ) –

    When True, include preamble/postamble MUSIC stems. Mirror of the same flag on :func:build_music_layer.

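Returns:

  • list[tuple[float, float, str]] –

    List of label tuples. Each tuple begins with (start_s, end_s, text); the code appends six further fields (ramp_in_seconds, ramp_out_seconds, play_duration, None, volume_percentage, seq), so the tuples are longer than the annotated three elements.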

Source code in src/xil_pipeline/mix_common.py
def compute_music_labels(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
    include_foreground_override: bool = False,
) -> list[tuple[float, float, str]]:
    """Compute music label tuples without loading audio.

    Args:
        stem_plans: Classified stem list.
        timeline: Cue-point timestamps.
        total_ms: Total episode duration in ms.
        include_foreground_override: When ``True``, include preamble/
            postamble MUSIC stems.  Mirror of the same flag on
            :func:`build_music_layer`.

    Returns:
        List of ``(start_s, end_s, text)`` tuples.
    """
    labels: list[tuple[float, float, str]] = []
    for plan in sorted(stem_plans, key=lambda p: p.seq):
        if plan.direction_type != "MUSIC":
            continue
        if plan.foreground_override and not include_foreground_override:
            continue  # preamble/postamble music plays in foreground, not here
        start_ms = timeline.get(plan.seq, 0)
        if start_ms >= total_ms:
            continue
        duration = _mp3_duration_ms(plan.filepath)
        if plan.play_duration is not None and not plan.pre_trimmed:
            duration = max(1, int(duration * plan.play_duration / 100.0))
        label_text = plan.text or plan.direction_type or "MUSIC"
        labels.append((
            start_ms / 1000.0, (start_ms + duration) / 1000.0, label_text,
            plan.ramp_in_seconds, plan.ramp_out_seconds, plan.play_duration,
            None, plan.volume_percentage, plan.seq,
        ))
    return labels

compute_sfx_labels

compute_sfx_labels(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> list[tuple[float, float, str]]

Compute SFX/BEAT label tuples without loading audio.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list.

  • timeline (dict[int, int]) –

    Cue-point timestamps.

  • total_ms (int) –

    Total episode duration in ms.

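Returns:

  • list[tuple[float, float, str]] –

    List of label tuples. Each tuple begins with (start_s, end_s, text); the code appends six further fields (None, None, play_duration, None, volume_percentage, seq), so the tuples are longer than the annotated three elements.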

Source code in src/xil_pipeline/mix_common.py
def compute_sfx_labels(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
) -> list[tuple[float, float, str]]:
    """Compute SFX/BEAT label tuples without loading audio.

    Args:
        stem_plans: Classified stem list.
        timeline: Cue-point timestamps.
        total_ms: Total episode duration in ms.

    Returns:
        List of ``(start_s, end_s, text)`` tuples.
    """
    labels: list[tuple[float, float, str]] = []
    for plan in sorted(stem_plans, key=lambda p: p.seq):
        if plan.direction_type not in ("SFX", "BEAT"):
            continue
        start_ms = timeline.get(plan.seq, 0)
        duration = _mp3_duration_ms(plan.filepath)
        if plan.play_duration is not None and not plan.pre_trimmed:
            duration = max(1, int(duration * plan.play_duration / 100.0))
        label_text = plan.text or plan.direction_type or "SFX"
        labels.append((
            start_ms / 1000.0, (start_ms + duration) / 1000.0, label_text,
            None, None, plan.play_duration, None, plan.volume_percentage, plan.seq,
        ))
    return labels
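Every label tuple produced by the compute_* and build_* helpers starts with a start time, an end time, and a text field, which is the shape an Audacity label track expects (tab-separated start, end, and label, with times in seconds). The module's own exporter is not shown here, so the following is only a sketch of how the first three fields could be formatted; to_audacity_label_lines is a hypothetical name and the sample labels are invented.

```python
def to_audacity_label_lines(labels):
    """Format label tuples as tab-separated Audacity label lines.

    Only the first three fields of each tuple are used; any trailing
    fields (ramps, play_duration, seq, ...) are ignored. Hypothetical
    helper, not the module's exporter.
    """
    return [f"{lab[0]:.6f}\t{lab[1]:.6f}\t{lab[2]}" for lab in labels]


sample = [
    (0.0, 1.25, "DOOR SLAM", None, None, None, None, None, 4),
    (3.5, 4.0, "BEAT", None, None, 50, None, None, 6),
]
for line in to_audacity_label_lines(sample):
    print(line)
```

Indexing by position rather than unpacking keeps the sketch tolerant of the different tuple lengths the helpers return.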

build_sfx_layer

build_sfx_layer(stem_plans: list[StemPlan], timeline: dict[int, int], total_ms: int) -> tuple[AudioSegment, list[tuple]]

Build an isolated SFX layer for DAW export.

Places only one-shot SFX and BEAT stems (direction_type in ("SFX", "BEAT")) at their foreground timeline positions.

Parameters:

  • stem_plans (list[StemPlan]) –

    Classified stem list from :func:collect_stem_plans.

  • timeline (dict[int, int]) –

    Cue-point timestamps from :func:build_foreground.

  • total_ms (int) –

    Total track length in milliseconds.

Returns:

  • tuple –

    Tuple of (layer, labels) where layer is a full-length pydub AudioSegment with SFX stems at their timeline positions, and labels is a list of tuples, one per one-shot effect. Each tuple begins with (start_sec, end_sec, text), followed by None, None, play_duration, None, volume_percentage, and seq.

Source code in src/xil_pipeline/mix_common.py
def build_sfx_layer(
    stem_plans: list[StemPlan],
    timeline: dict[int, int],
    total_ms: int,
) -> tuple[AudioSegment, list[tuple]]:
    """Build an isolated SFX layer for DAW export.

    Places only one-shot SFX and BEAT stems (``direction_type in
    ("SFX", "BEAT")``) at their foreground timeline positions.

    Args:
        stem_plans: Classified stem list from :func:`collect_stem_plans`.
        timeline: Cue-point timestamps from :func:`build_foreground`.
        total_ms: Total track length in milliseconds.

    Returns:
        Tuple of ``(layer, labels)`` where *layer* is a full-length
        :class:`~pydub.AudioSegment` with SFX stems at their timeline
        positions, and *labels* is a list of ``(start_sec, end_sec, text)``
        tuples for each one-shot effect.
    """
    layer = AudioSegment.silent(duration=total_ms)
    labels: list[tuple[float, float, str]] = []
    for plan in sorted(stem_plans, key=lambda p: p.seq):
        if plan.direction_type not in ("SFX", "BEAT"):
            continue
        start_ms = timeline.get(plan.seq, 0)
        segment = AudioSegment.from_file(plan.filepath)
        if plan.play_duration is not None and not plan.pre_trimmed:
            segment = segment[:max(1, int(len(segment) * plan.play_duration / 100.0))]
        if plan.volume_percentage is not None:
            segment = segment + _volume_pct_to_db(plan.volume_percentage)
        layer = layer.overlay(segment, position=start_ms)
        label_text = plan.text or plan.direction_type or "SFX"
        labels.append((
            start_ms / 1000.0, (start_ms + len(segment)) / 1000.0, label_text,
            None, None, plan.play_duration, None, plan.volume_percentage, plan.seq,
        ))
    return layer, labels