Post

BirdCLEF 2026: EoS9 + OOF-Gated PCEN

BirdCLEF 2026: EoS9 + OOF-Gated PCEN

BirdCLEF 2026: EoS9 + OOF-Gated PCEN

Competition link:
BirdCLEF+ 2026

Kaggle notebook link:
BirdCLEF 2026: EoS9 + OOF-Gated PCEN

BirdCLEF 2026 EoS9 and OOF-gated PCEN cover

The clearest way to read this notebook is as a controlled submission architecture, not as a single-model inference script. Its purpose is not to stack as many audio classifiers as possible. The central idea is deliberately conservative:

Keep a strong public EoS9-style anchor fixed, then allow optional sidecars to move only the parts of the prediction matrix where offline evidence says they are allowed to help.

That distinction matters in BirdCLEF. The target is a large multilabel soundscape table: every row_id corresponds to a short time slice, and every class column is a species or taxon label. Small probability perturbations can change rank-sensitive leaderboard behavior, but an ungated side model can also damage classes where it has no reliable signal.

The notebook therefore separates the system into three layers:

LayerRole
AnchorA public EoS9-style three-branch ensemble that produces the canonical submission.
Post-anchor smoothingA light taxonomy prior that shares evidence among related labels.
Sidecar correctionsOptional BirdNET and PCEN/ConvNeXt rank corrections gated by masks, OOF class weights, and movement budgets.

The active notebook configuration is:

ComponentActive Setting
RUN_MODEeos9_tax
Anchor weights0.012 yukiZ, 0.021 PowerOptimization PSSM, 0.967 PowerOptimization EoS9
Main rank blend0.600 ProtoSSM rank + 0.400 SED rank
Taxonomy smoothinggenus 0.15, class 0.05
BirdNET sidecarenabled, but dry-run public rows keep anchor when row IDs cannot match
PCEN sidecarenabled, exp002b_5s, fold [0], OOF gate enabled
Final dry-run result3 rows, 235 columns, finite probabilities in [0.461531, 0.529922]

The key engineering rule is:

1
anchor first, correction second, validation always

1. Submission Pipeline

The anchor is a compact three-branch ensemble:

\[p_{\text{anchor raw}} = w_y p_{\text{yukiZ}} + w_p p_{\text{PSSM}} + w_s p_{\text{EoS9}}\]

with:

\[(w_y, w_p, w_s) = (0.012,\ 0.021,\ 0.967)\]

The dominant branch is the EoS9 / PowerOptimization-style branch. Inside that branch, the notebook combines ProtoSSM and distilled SED predictions in rank space:

\[z = 0.60 \cdot R(p_{\text{ProtoSSM}}) + 0.40 \cdot R(p_{\text{SED}})\]

where (R(\cdot)) denotes class-wise percentile rank. The reason to blend ranks rather than raw probabilities is that the branch outputs are not necessarily calibrated on the same scale. Rank blending asks a simpler question: which classes does each branch place near the top for a given row?

BirdCLEF EoS9 anchor and gated sidecar pipeline

After the direct blend, the notebook applies taxonomy smoothing and writes submission.csv. Only then do the optional sidecars get a chance to make local rank-space corrections. That ordering is important:

1
2
branch CSVs -> direct anchor -> taxonomy smoothing -> submission_before_all_sidecars.csv
            -> optional sidecars -> final submission.csv

The sidecars are not global ensemble members. They are post-processing modules whose movement is explicitly budgeted.

2. Why The Anchor Is The Control Path

The notebook is built around the EoS9 anchor because it is already a strong, structured baseline. It contains several pieces that are hard to replace with a small side model:

Anchor PartContribution
yukiZ Perch / ProtoSSM / ResidualSSMLow-weight diversity from a pretrained Perch-based path.
PowerOptimization PSSM branchA small intermediate branch saved from the Karnakbayev PowerOptimization pipeline.
PowerOptimization EoS9 branchThe dominant ProtoSSM + distilled SED rank branch with prior and rank-aware scaling.
Ecological priorsSite/hour and test-prior effects that improve plausible taxon ranking.
Temporal smoothing and continuity gatesSuppress single-window spikes and encourage soundscape consistency.
Rank-aware scalingGives extra structure to file-level high-confidence classes.

The branch registry is deliberately explicit:

BranchWeightOutput FileRole
yukiZ_Perch_ProtoSSM_ResSSM0.012subm_yukiz_perch_proto_res.csvsmall pretrained diversity
Karnakbayev_PowerOptimization_PSSM0.021subm_karnakbayev_poweropt_pssm.csvintermediate PSSM side branch
Karnakbayev_PowerOptimization_EoS90.967subm_karnakbayev_poweropt_sz2.csvdominant EoS9 rank branch
Show notebook snippet: front-of-notebook control weights
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
RUN_MODE = "eos9_tax"

if RUN_MODE == "eos9_tax":
    YUKIZ_BLEND_WEIGHT = 0.012
    POWEROPT_PSSM_BLEND_WEIGHT = 0.021
    POWEROPT_SZ2_BLEND_WEIGHT = 0.967
elif RUN_MODE == "eos9_no_pssm":
    YUKIZ_BLEND_WEIGHT = 0.012
    POWEROPT_PSSM_BLEND_WEIGHT = 0.0
    POWEROPT_SZ2_BLEND_WEIGHT = 0.988
elif RUN_MODE == "poweropt_eos9_only":
    YUKIZ_BLEND_WEIGHT = 0.0
    POWEROPT_PSSM_BLEND_WEIGHT = 0.0
    POWEROPT_SZ2_BLEND_WEIGHT = 1.0

PROTO_RANK_WEIGHT = 0.600
SED_RANK_WEIGHT = 1.0 - PROTO_RANK_WEIGHT

assert abs(
    YUKIZ_BLEND_WEIGHT
    + POWEROPT_PSSM_BLEND_WEIGHT
    + POWEROPT_SZ2_BLEND_WEIGHT
    - 1.0
) < 1e-9
assert abs((PROTO_RANK_WEIGHT + SED_RANK_WEIGHT) - 1.0) < 1e-9

This pattern matters for public competition notebooks. Every score-changing knob is visible at the top, and the notebook asserts that the weight contracts still hold. That prevents silent drift across experiment runs.

3. Taxonomy Smoothing

BirdCLEF labels are not independent in the real world. Species share genus, class, habitat, and acoustic confusions. The anchor already produces a strong per-class ordering, but a small biological prior can help when two related labels are close.

The taxonomy smoothing step implements that prior directly. For a genus group (g):

\[p_c \leftarrow (1 - \alpha_g)p_c + \alpha_g \operatorname{mean}_{j \in g}(p_j)\]

Then for a broader class group (k):

\[p_c \leftarrow (1 - \alpha_k)p_c + \alpha_k \operatorname{mean}_{j \in k}(p_j)\]

The active values are:

\[\alpha_g = 0.15, \qquad \alpha_k = 0.05\]

In the notebook dry run, taxonomy.csv produced 29 multi-species genus groups and 4 multi-label class groups.

Taxonomy smoothing flow for BirdCLEF predictions

The step is not a replacement for the acoustic model. It is a weak sharing prior:

1
2
3
preserve most of the anchor score,
borrow a little evidence from related labels,
clip and validate the same output schema.
Show notebook snippet: taxonomy smoothing
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
def f_TAX_SMOOTHING_POSTPROC(f_direct=direct, genus_alpha=None, class_alpha=None):
    submission = f_direct()
    genus_alpha = TAX_GENUS_ALPHA if genus_alpha is None else float(genus_alpha)
    class_alpha = TAX_CLASS_ALPHA if class_alpha is None else float(class_alpha)

    taxonomy_path = _find_taxonomy_path_for_smoothing()
    if taxonomy_path is None:
        msg = "taxonomy.csv not found; TAX_SMOOTHING cannot be applied"
        if globals().get("TAX_SMOOTHING_REQUIRE", True):
            raise FileNotFoundError(msg)
        submission.index.name = "row_id"
        return submission

    tax = pd.read_csv(taxonomy_path)
    species_to_genus = {}
    species_to_class = {}
    for _, row in tax.iterrows():
        label = str(row.get("primary_label", ""))
        sci = str(row.get("scientific_name", ""))
        class_name = str(row.get("class_name", ""))
        genus = sci.split(" ")[0] if " " in sci else sci
        if label:
            species_to_genus[label] = genus
            species_to_class[label] = class_name

    probs = submission.to_numpy(dtype=np.float32, copy=True)

    for members in multi_genus.values():
        idx = [col_to_idx[m] for m in members]
        group_mean = probs[:, idx].mean(axis=1, keepdims=True)
        probs[:, idx] = (1.0 - genus_alpha) * probs[:, idx] + genus_alpha * group_mean

    for members in multi_class.values():
        idx = [col_to_idx[m] for m in members]
        group_mean = probs[:, idx].mean(axis=1, keepdims=True)
        probs[:, idx] = (1.0 - class_alpha) * probs[:, idx] + class_alpha * group_mean

    probs = np.clip(probs, 0.0, 1.0)
    return pd.DataFrame(probs, index=submission.index, columns=submission.columns)

4. Sidecars Are Local Corrections, Not New Anchors

The notebook has two optional sidecar families:

SidecarActive StateIntended Use
BirdNET v2.4enabledconservative correction for covered Aves labels
PCEN/ConvNeXt exp002benabledcustom weak-audio correction gated by class-wise OOF evidence

The shared formula is:

\[A = R(p_{\text{anchor}}), \qquad S = R(p_{\text{sidecar}})\] \[B = A + W_{\text{class}} \cdot M \cdot (S - A)\]

where:

SymbolMeaning
(A)anchor ranks after taxonomy smoothing
(S)sidecar ranks
(M)boolean cell mask where movement is allowed
(W_{\text{class}})scalar or class-wise correction weight
(B)corrected rank table

The philosophy is deliberately cautious. A sidecar is allowed to nudge the anchor, not replace it.

OOF-gated rank correction flow

The correction must pass several guards:

GuardPurpose
Top-k maskMove only classes already plausible under the anchor or strongly proposed by the sidecar.
Anchor threshold tauPrevent sidecar-only noise from moving classes the anchor considers implausible.
OOF class gateOpen only classes where offline sidecar replay helped the anchor.
Movement budget DLimit mean absolute rank movement from the anchor.
Top-3 / top-10 overlapReject corrections that reorder too much of the head of the prediction.
Schema validationEnsure row order, class columns, finite values, and [0, 1] bounds.

The perturbation budget is:

\[D = \operatorname{mean}\left(\left|B - A\right|\right)\]

For the PCEN sidecar in this run:

KnobValue
SIDECAR_EXP002_VARIANTexp002b_5s
SIDECAR_EXP002_FOLDS[0]
SIDECAR_EXP002_WEIGHT_CAP0.030
SIDECAR_EXP002_D_BUDGET0.004
SIDECAR_EXP002_ANCHOR_TOPK48
SIDECAR_EXP002_SIDE_TOPK32
SIDECAR_EXP002_TAU0.55
SIDECAR_EXP002_GATE_CSVexp002b_oof_gate.csv
SIDECAR_EXP002_USE_OOF_GATETrue

The gate CSV is produced outside the notebook from replay/OOF artifacts. Conceptually, for each class (c), it asks:

\[\Delta_c(w) = \operatorname{AUC}_c\left(A + wM(S-A)\right) - \operatorname{AUC}_c(A)\]

Then the sidecar opens class (c) only if the best masked blend gives reliable positive delta after shrinkage checks. Classes that do not pass remain exactly anchored because their gate_weight is zero.

Show notebook snippet: loading PCEN OOF gate weights
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
def _load_exp002_gate_weights(asset_root, class_cols):
    if not SIDECAR_EXP002_USE_OOF_GATE:
        return None
    gate_path = Path(asset_root) / str(SIDECAR_EXP002_GATE_CSV)
    if not gate_path.exists():
        if SIDECAR_EXP002_GATE_REQUIRE_FILE:
            raise FileNotFoundError(f"{SIDECAR_EXP002_LABEL} OOF gate CSV not found")
        return None

    gate = pd.read_csv(gate_path)
    assert "primary_label" in gate.columns
    assert "gate_weight" in gate.columns
    assert gate["primary_label"].is_unique

    weight_map = dict(zip(gate["primary_label"], gate["gate_weight"]))
    weights = np.array(
        [float(weight_map.get(str(col), 0.0)) for col in class_cols],
        dtype=np.float32,
    )
    weights = np.clip(weights, 0.0, float(SIDECAR_EXP002_GATE_MAX_WEIGHT))
    assert np.isfinite(weights).all()

    enabled_count = int((weights > 0).sum())
    if SIDECAR_EXP002_REQUIRE and enabled_count == 0:
        raise ValueError(f"{SIDECAR_EXP002_LABEL} OOF gate has zero enabled classes")
    return weights

5. PCEN/ConvNeXt Correction Logic

The PCEN sidecar is a custom log-mel / PCEN ConvNeXt path. The active variant is exp002b_5s, which expects:

Asset FieldValue
context seconds5.0
target seconds5.0
image time256
timeout600 seconds
requested folds[0]

PCEN is useful because it emphasizes different signal properties than the anchor branches. Where the anchor leans heavily on Perch, ProtoSSM, distilled SED, prior correction, and rank scaling, the PCEN sidecar can contribute local evidence for acoustic patterns that the anchor under-ranks.

But it is also riskier. For that reason, it is not added as:

1
p_final = 0.97 * p_anchor + 0.03 * p_pcen

Instead, it is applied as a masked rank correction:

Show notebook snippet: PCEN masked rank blend
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
def _exp002_masked_rank_blend(anchor_df, side_df, class_weights=None):
    assert anchor_df.index.equals(side_df.index)
    assert anchor_df.columns.equals(side_df.columns)

    A = _exp002_rank_cols(anchor_df).to_numpy(np.float32)
    S = _exp002_rank_cols(side_df).to_numpy(np.float32)
    mask = _exp002_topk_mask(A, SIDECAR_EXP002_ANCHOR_TOPK) | (
        _exp002_topk_mask(S, SIDECAR_EXP002_SIDE_TOPK)
        & (A >= float(SIDECAR_EXP002_TAU))
    )

    if class_weights is None:
        E = float(np.mean(mask * np.abs(S - A)))
        w_budget = 0.0 if E <= 1e-12 else float(SIDECAR_EXP002_D_BUDGET) / E
        w = min(float(SIDECAR_EXP002_WEIGHT_CAP), w_budget)
        B = A.copy()
        if mask.any() and w > 0:
            B[mask] = A[mask] + w * (S[mask] - A[mask])
    else:
        W = np.clip(class_weights, 0.0, float(SIDECAR_EXP002_GATE_MAX_WEIGHT))
        WW = np.broadcast_to(W[None, :], A.shape)
        B0 = A.copy()
        if mask.any() and W.max() > 0:
            B0[mask] = A[mask] + WW[mask] * (S[mask] - A[mask])
        D0 = float(np.mean(np.abs(B0 - A)))
        if D0 > float(SIDECAR_EXP002_D_BUDGET) and D0 > 1e-12:
            W = W * (float(SIDECAR_EXP002_D_BUDGET) / D0)

        WW = np.broadcast_to(W[None, :], A.shape)
        B = A.copy()
        if mask.any() and W.max() > 0:
            B[mask] = A[mask] + WW[mask] * (S[mask] - A[mask])

    D = float(np.mean(np.abs(B - A)))
    top3 = _exp002_topk_overlap(A, B, 3)
    top10 = _exp002_topk_overlap(A, B, 10)
    ok = (
        D > 0.0
        and active <= float(SIDECAR_EXP002_MAX_ACTIVE_FRACTION)
        and top3 >= float(SIDECAR_EXP002_MIN_TOP3_OVERLAP)
        and top10 >= float(SIDECAR_EXP002_MIN_TOP10_OVERLAP)
    )

This code encodes a useful competition principle:

A specialized side model should earn the right to move the anchor class by class.

Global blending assumes the sidecar is broadly calibrated. OOF-gated masked rank correction assumes something weaker and often more realistic: the sidecar may be good for some classes and harmful for others.

6. BirdNET Sidecar

BirdNET is handled with the same caution. It is not allowed to move every label. The notebook distinguishes:

GroupWeight Cap
covered Aves classes already mapped by Perch0.015
covered Aves classes where Perch is weaker or unmapped0.060
non-bird classes0.000

The intended effect is:

1
2
3
use BirdNET where it has direct biological/audio coverage,
give it a larger voice where the anchor has weaker direct mapping,
keep non-bird taxa anchored.

The dry-run output explains why this branch did not change the public notebook preview:

1
2
BirdNET row_id mismatch; keeping anchor submission. missing=3, extra=12
BirdNET sidecar elapsed: 2.1s

That fallback is intentional. If the sidecar cannot align to the canonical submission rows, it must not write a partially mismatched correction.

7. Public Dry-Run Behavior

The notebook output shown in the public environment is a dry run because the hidden test soundscapes are not mounted. That creates two important effects:

StageDry-Run Result
yukiZ branchruns on dry-run rows and aligns to sample_submission.csv
PowerOptimization PSSM branchwrites diagnostic and intermediate CSVs
PowerOptimization EoS9 branchwrites the dominant anchor CSV
taxonomy smoothingapplies successfully using taxonomy.csv
BirdNET sidecarskipped because row IDs do not match anchor rows
PCEN sidecarskipped because test_soundscapes/*.ogg files are not visible
final outputremains the taxonomy-smoothed anchor

The notebook reports:

1
2
3
4
Wrote submission.csv: rows=3, cols=235, min=0.461531, max=0.529922
final_D_vs_base_anchor: 0.0
final_top3_overlap_vs_base: 1.0
final_top10_overlap_vs_base: 1.0

Zero movement in this preview should not be interpreted as evidence that the sidecars have no hidden-test effect. It means the public dry run did not have matching soundscape inputs for those sidecar corrections. In a real test run, the sidecar path is allowed to execute only if its assets, row IDs, class columns, OOF gates, and movement diagnostics all pass.

8. Final Writer And Submission Safety

The final writer is one of the most important parts of the notebook. High-scoring Kaggle notebooks often fail not because the model is weak, but because the final submission.csv has a silent schema problem:

FailureExample
row mismatchsidecar emits different row_id set
column mismatchmodel class order differs from sample_submission.csv
duplicate rowsrepeated row_id after merge
invalid valuesNaN, inf, or values outside [0, 1]
index leakageaccidental Unnamed: 0 column

This notebook reloads and checks the file after writing.

Show notebook snippet: final submission writer
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
def _align_to_sample_submission_if_possible(df):
    sample_path = _find_sample_submission_path()
    if sample_path is None:
        return df
    sample = pd.read_csv(sample_path)
    assert "row_id" in sample.columns
    assert sample["row_id"].is_unique
    sample_cols = sample.columns.tolist()
    missing_cols = [c for c in sample_cols if c not in df.columns]
    assert not missing_cols

    final_ids = set(df["row_id"])
    sample_ids = set(sample["row_id"])
    if final_ids == sample_ids:
        aligned = (
            df.set_index("row_id")
              .loc[sample["row_id"], sample_cols[1:]]
              .reset_index()
        )
        aligned.columns = sample_cols
        return aligned
    raise AssertionError("final row_id set differs from sample_submission")


def write_final_submission(pred, path="submission.csv"):
    df = _as_explicit_submission_table(pred)
    df = _align_to_sample_submission_if_possible(df)
    prob_cols = [c for c in df.columns if c != "row_id"]
    values = df[prob_cols].to_numpy(dtype=np.float32)
    assert np.isfinite(values).all()
    assert values.min() >= 0.0 and values.max() <= 1.0
    df.to_csv(path, index=False)

    check = pd.read_csv(path)
    assert check.columns.tolist() == df.columns.tolist()
    assert len(check) == len(df)
    assert check["row_id"].is_unique
    return df

This is more than defensive programming. It is what makes sidecar experimentation safe. If BirdNET, PCEN, taxonomy smoothing, or any branch alignment creates a schema mismatch, the notebook either keeps the anchor or fails loudly.

9. How To Read The Notebook Outputs

The most useful output blocks are not the giant inference logs. They are the compact summaries:

OutputInterpretation
Active model listConfirms which anchor branches are contributing.
Weight tableConfirms the outer blend sums to one.
Sidecar overviewConfirms whether BirdNET and PCEN are active and which gates/budgets they use.
Taxonomy smoothing printoutConfirms taxonomy.csv, genus groups, class groups, and alpha values.
Branch diagnosticsConfirms row count, class count, min/max probability, and duplicate rows.
Final previewShows the actual submission.csv after all enabled corrections.
Final delta summaryConfirms how far final ranks moved from the pre-sidecar anchor.

For this public dry run, the main diagnostic story is:

1
Anchor branches run -> taxonomy smoothing applies -> sidecars cannot match dry-run inputs -> final remains anchor.

That fallback is the desired behavior. The public notebook remains reproducible, while the hidden-test path is ready to apply sidecars when real test soundscapes are present.

10. Why This Structure Is Useful

The strongest idea in the notebook is not any single threshold. It is the separation of responsibilities:

ResponsibilityImplementation Pattern
preserve known public strengthkeep EoS9 as the anchor
add biological priortaxonomy smoothing after direct blend
test new models safelysidecars as masked corrections, not global blends
avoid over-moving ranksD budgets and top-k overlap checks
use offline evidenceclass-wise OOF gate weights
protect submission formatfinal writer with schema and range checks

This structure also makes experiments interpretable. If the PCEN sidecar improves score, the improvement is not mixed with a changed anchor. If it hurts, the diagnostics can show whether the harm came from too many active cells, too much rank movement, weak OOF gates, or bad row alignment.

The notebook therefore functions less as a single model recipe and more as a small experiment framework:

1
2
3
4
fixed anchor
    + controlled taxonomy prior
    + optional class-gated sidecar movement
    + strict final validation

11. References

ReferenceUsed In This Notebook As
F.A.Nina, birdclef-2026-eos-9primary EoS9 anchor structure
Karnakbayev Artur, power-optimizationPowerOptimization branch and validation style
yukiZ, Perch + ProtoSSM + ResSSMlow-weight pretrained diversity branch
Tucker Arrants, BC2026 Distilled SEDdistilled SED ONNX branch
custom exp002 / exp002b assetsoptional PCEN/log-mel ConvNeXt sidecars

12. What This Notebook Establishes

This notebook establishes a disciplined way to extend a strong public BirdCLEF anchor:

  1. Build the submission around a known strong EoS9-style control path.
  2. Apply taxonomy smoothing as a small, interpretable biological prior.
  3. Treat side models as local rank corrections, not unconditional ensemble members.
  4. Use OOF gates so sidecars move only classes where offline evidence supports them.
  5. Enforce row, column, finite-value, range, and duplicate checks at the final writer.

The result is a submission architecture that is flexible but controlled. It can attach PCEN/ConvNeXt and BirdNET side evidence, while still requiring every sidecar movement to pass a mask, a class gate, and a perturbation budget.

That is the central lesson:

In a high-dimensional multilabel audio competition, the safest way to add a new model is not to blend it everywhere. It is to let it move only where it has earned trust.

This post is licensed under CC BY 4.0 by the author.