March Machine Learning Mania 2026: Baseline Pipeline, Step by Step

Competition link: March Machine Learning Mania 2026

Kaggle notebook link: MMLM2026_Baseline Pipeline (Step-by-step)

Related EDA notebook: Structured EDA: Dataset Statistical Analysis

Competition result for this broader MMLM2026 project track: Silver Medal, 92nd place

This notebook builds a readable end-to-end baseline for forecasting the 2026 NCAA basketball tournaments. The emphasis is not on a complicated ensemble. It is on three practical goals:

  • keep the 2026 feature build leakage-safe
  • turn raw season data into a unified team feature store
  • convert team features into matchup probabilities and write a valid submission.csv

TL;DR

  • Forecast every possible 2026 tournament matchup for both men and women
  • Use an aligned 2026 cutoff day of 93 so current-season features do not peek too far ahead
  • Build one merged team feature store with 16,635 rows and 71 columns
  • Train on historical NCAA tournament pairs: 2,410 rows total
  • Compare a compact core feature set (13 columns) with an expanded set (74 columns)
  • Validate with rolling season-forward OOF Brier on 2021-2025
  • Best internal blend: 57% core + 43% expanded
  • Best blended OOF Brier: 0.169731
  • Final notebook run wrote 519,144 predictions and passed submission-format checks
  • The broader competition run ultimately finished 92nd, earning a Silver Medal

Why This Notebook Was Useful During The Competition

The most important point is that this notebook was not especially valuable because it was novel. It was valuable because it reduced the amount of work another competitor had to do before making a correct submission.

One reasonable mental model for live-competition notebook recommendations is:

\[\text{Recs} \propto U \times C \times T \times S\]

where:

  • U = utility
  • C = clarity
  • T = timing
  • S = score credibility

For a baseline notebook, that product often matters more than novelty. In other words, the question is usually not “Is this the smartest model here?” It is “Can I fork this, trust it, and move forward without stepping on a rake?”

That framing fit March Machine Learning Mania 2026 unusually well. The competition itself had several operational details that made a clean baseline more useful than a flashy one:

  • the prediction target is matchup probability, not simple team ranking
  • the pair orientation is fixed by TeamLow / TeamHigh, so label construction must be done carefully
  • current-season files were updated during the live competition, so a leakage-safe 2026 cutoff mattered
  • submission handling was stage-aware, and the practical workflow around templates and preferred submissions made format mistakes costly

In that setting, a notebook that does the boring parts correctly has real value. This notebook does exactly that:

  • it starts from a prior EDA instead of pretending feature engineering came from nowhere
  • it makes the 2026 cutoff explicit and logs cutoff_day_2026 = 93
  • it builds a 16,635 x 71 unified team feature store rather than scattered ad hoc tables
  • it turns that store into 74 expanded pairwise features
  • it uses season-forward OOF Brier instead of optimistic random splitting
  • it writes a stage-compatible submission file
  • it ends with explicit submission sanity checks instead of assuming the CSV is fine

That combination makes it read less like a “look what I tried” notebook and more like a utility notebook. For another competitor, that means lower startup cost, lower debugging cost, and lower risk of silent leakage or malformed submission files.

The idea can be summarized as a user-cost problem:

\[L_{\text{user}} = t_{\text{understand}} + t_{\text{debug}} + \lambda \,\mathbf{1}_{\text{submission error}} + \mu \,\mathbf{1}_{\text{leakage}}\]

What this notebook did well was not minimizing every leaderboard decimal. It was minimizing that user-side cost.

This is also why the notebook’s honesty helped. The model is still just a logistic-regression baseline,

\[p(y=1 \mid x) = \sigma(\beta^\top x),\]

and the diagnostics are not dressed up to look magical:

  • core OOF Brier: 0.171637
  • expanded OOF Brier: 0.172966
  • blended OOF Brier: 0.169731

So the notebook does not claim that more features automatically won. Instead, it shows where the signal came from and where it did not. That kind of transparent reporting increases trust, which is part of S, score credibility. In the published run, the public leaderboard score was 0.1172485, which was good enough to make the notebook read as usable rather than purely illustrative.

There was also a clear demand-side match. Other public notebooks in the same competition were explicitly positioned as starting points, for example March Machine Learning Mania 2026 Starter and March ML Mania 2026 | HistGB + XGB + CatBoost. That is a useful clue that participants were actively looking for clean, reproducible baselines they could extend, not only for exotic modeling ideas. In practice, notebooks like this are often consumed as quiet utility assets: forked, adapted, and upvoted because they remove friction, not because they become long discussion threads.

Step 1. Runtime Configuration and Input Availability

Purpose

The first section makes the notebook portable across Kaggle and local environments. It also fixes the training window and validation years before any feature engineering begins.

Parameter choices and why

  • INPUT_SEARCH_DIRS: search in an explicit env path first, then Kaggle input, then local fallback.
  • TRAIN_START_BY_PREFIX = {"M": 2003, "W": 2010}: men and women have different practical history windows in the competition files.
  • TRAIN_END_SEASON = 2025: do not train on 2026 tournament outcomes.
  • VALID_SEASONS = [2021, 2022, 2023, 2024, 2025]: use recent seasons for season-forward OOF diagnostics.
  • PRED_CLIP = (0.02, 0.98): avoid extreme submission probabilities.

Outcome in this notebook

  • Primary input directory resolved to /kaggle/input/march-machine-learning-mania-2026
  • All key files existed in the notebook run, including compact results, detailed results, Massey ordinals, and stage submission template
  • Output directory resolved to /kaggle/working

Quick notes for beginners:

  • Environment variables make the same notebook reusable without hardcoding paths.
  • A fixed validation window makes model comparisons fairer.

Reduced from original notebook: long feature lists and helper internals were omitted.

import os
from pathlib import Path

DEFAULT_KAGGLE_INPUT_DIR = Path("/kaggle/input/march-machine-learning-mania-2026")
INPUT_SEARCH_DIRS = _build_input_search_dirs()
INPUT_DIR = next((d for d in INPUT_SEARCH_DIRS if d.exists()), Path("."))

OUTPUT_DIR = Path(os.getenv("MMLM2026_OUTPUT_DIR", "."))
SUBMISSION_NROWS = int(os.getenv("MMLM2026_SUB_NROWS", "0")) or None
SUBMISSION_TEMPLATE = os.getenv("MMLM2026_SUBMISSION_TEMPLATE", "auto")

TRAIN_START_BY_PREFIX = {"M": 2003, "W": 2010}
TRAIN_END_SEASON = 2025
VALID_SEASONS = [2021, 2022, 2023, 2024, 2025]
PRED_CLIP = (0.02, 0.98)

Step 2. Leakage-Safe 2026 Day Cutoff

Purpose

This is one of the most important design choices in the whole notebook. If we let 2026 features use data from a later date than the current tournament prediction point, we leak future information into the feature store.

Parameter choices and why

  • Read MRegularSeasonCompactResults.csv and WRegularSeasonCompactResults.csv.
  • Find the maximum available DayNum for each league in 2026.
  • Use the minimum of those two values as the aligned cutoff.
  • Apply that cutoff to every 2026 table that depends on daily game results.

Outcome in this notebook

The aligned cutoff day was:

93

That means all 2026 regular-season and related feature blocks were built using only information up to day 93.

Quick notes for beginners:

  • Leakage means using information that would not have been available at prediction time.
  • An aligned cutoff is especially useful here because the men and women files may not be updated to the same day.
def get_aligned_cutoff_day_2026() -> int:
    m = pd.read_csv(input_path("MRegularSeasonCompactResults.csv"),
                    usecols=["Season", "DayNum"])
    w = pd.read_csv(input_path("WRegularSeasonCompactResults.csv"),
                    usecols=["Season", "DayNum"])

    m_max = m.loc[m["Season"] == 2026, "DayNum"].max()
    w_max = w.loc[w["Season"] == 2026, "DayNum"].max()
    return int(min(m_max, w_max))


def apply_2026_cutoff(df: pd.DataFrame, cutoff_day_2026: int) -> pd.DataFrame:
    return df[
        (df["Season"] < 2026)
        | ((df["Season"] == 2026) & (df["DayNum"] <= cutoff_day_2026))
    ].copy()
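To see the filter behave, here is a tiny self-contained check on toy data (not the competition files) that restates `apply_2026_cutoff` and confirms a post-cutoff 2026 game is dropped while historical rows pass through:

```python
import pandas as pd

def apply_2026_cutoff(df: pd.DataFrame, cutoff_day_2026: int) -> pd.DataFrame:
    # Keep every pre-2026 row; keep 2026 rows only through the cutoff day.
    return df[(df["Season"] < 2026)
              | ((df["Season"] == 2026) & (df["DayNum"] <= cutoff_day_2026))].copy()

toy = pd.DataFrame({"Season": [2025, 2026, 2026],
                    "DayNum": [120, 93, 94]})
kept = apply_2026_cutoff(toy, cutoff_day_2026=93)
assert kept["DayNum"].tolist() == [120, 93]   # the day-94 game is excluded
```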

Step 3A. Core Team Base + Seed Features

Purpose

This block creates the first layer of team-level features from compact results: games played, wins, average scoring, average margin, and last observed day. It also joins official NCAA seeds when available.

Parameter choices and why

  • Build one row per (Season, TeamID) from compact results.
  • Convert each game into team-centric rows for both winner and loser.
  • Use official seeds from NCAATourneySeeds.csv.
  • Keep seed availability explicit with seed_known.

Outcome in this notebook

  • Base feature coverage:
    • Men: 13,753 rows across 42 seasons and 381 teams
    • Women: 9,851 rows across 29 seasons and 370 teams
  • Seed feature coverage:
    • Men: 2,626 rows
    • Women: 1,744 rows
  • Seed rows have seed_known_ratio = 1.0 because the seed table only contains teams with known seeds

Quick notes for beginners:

  • Compact results only need final scores and winner/loser IDs, so they are useful for broad coverage.
  • Seeds are sparse by design because not every team is seeded.
def build_compact_team_base(prefix: str, cutoff_day_2026: int) -> pd.DataFrame:
    reg = pd.read_csv(
        input_path(f"{prefix}RegularSeasonCompactResults.csv"),
        usecols=["Season", "DayNum", "WTeamID", "LTeamID", "WScore", "LScore"],
    )
    reg = apply_2026_cutoff(reg, cutoff_day_2026)

    w = reg.rename(columns={"WTeamID": "TeamID", "LTeamID": "OppTeamID",
                            "WScore": "ScoreFor", "LScore": "ScoreAgainst"})
    l = reg.rename(columns={"LTeamID": "TeamID", "WTeamID": "OppTeamID",
                            "LScore": "ScoreFor", "WScore": "ScoreAgainst"})
    w["Win"] = 1
    l["Win"] = 0

    g = pd.concat([w, l], ignore_index=True)
    g["Margin"] = g["ScoreFor"] - g["ScoreAgainst"]

    out = g.groupby(["Season", "TeamID"], as_index=False).agg(
        games=("Win", "size"),
        wins=("Win", "sum"),
        avg_score_for=("ScoreFor", "mean"),
        avg_score_against=("ScoreAgainst", "mean"),
        avg_margin=("Margin", "mean"),
        last_day=("DayNum", "max"),
    )
    out["win_pct"] = out["wins"] / out["games"]
    return out
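The seed join itself was elided. One common way to derive `seed_num` and `seed_known` from the competition's seed strings (region letter plus number, with a trailing letter for play-in teams) is a regex extract; this toy frame is illustrative, not the notebook's exact code:

```python
import pandas as pd

# Toy seed rows; real ones come from {M,W}NCAATourneySeeds.csv.
seeds = pd.DataFrame({"Season": [2025, 2025],
                      "TeamID": [1101, 1102],
                      "Seed": ["W01", "X16a"]})

# Seed strings embed region + numeric seed (+ optional play-in suffix);
# keep only the digits as the numeric seed.
seeds["seed_num"] = seeds["Seed"].str.extract(r"(\d+)", expand=False).astype(int)
seeds["seed_known"] = 1  # every row in the seed table is a seeded team
```

Teams absent from the seed table get `seed_known = 0` after the left join, which is why the notebook keeps seed availability explicit.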

Step 3B. Detailed Box-Based Features (Recency + Opponent Adjustment)

Purpose

Compact results are broad, but they do not describe how a team wins. This block brings in box-score detail and turns it into efficiency-style features.

Parameter choices and why

  • Estimate possessions with:
\[\mathrm{poss} = \mathrm{FGA} - \mathrm{OR} + \mathrm{TO} + 0.475 \times \mathrm{FTA}\]
  • Build off_rtg, def_rtg, and net_rtg.
  • Use exponential recency weighting with tau_days = 15.0:
\[w_t = \exp\left(-\frac{d_{\max} - d_t}{15}\right)\]
  • Add opponent-adjusted ratings by comparing each team’s game-level ratings to the opponents’ season baselines.

Outcome in this notebook

  • Recency-weighted coverage:
    • Men: 8,346 rows
    • Women: 5,965 rows
  • Opponent-adjusted coverage matched the same counts

The lower row counts relative to compact features reflect the shorter historical span of detailed box-score files.

Quick notes for beginners:

  • off_rtg is points scored per 100 possessions.
  • Recency weighting gives more influence to a team’s latest form without discarding earlier games.
g["poss"] = g["FGA"] - g["OR"] + g["TO"] + 0.475 * g["FTA"]
g["opp_poss"] = g["OppFGA"] - g["OppOR"] + g["OppTO"] + 0.475 * g["OppFTA"]

g["off_rtg"] = 100.0 * g["ScoreFor"] / g["poss"].clip(lower=1)
g["def_rtg"] = 100.0 * g["ScoreAgainst"] / g["opp_poss"].clip(lower=1)
g["net_rtg"] = g["off_rtg"] - g["def_rtg"]

max_day = g.groupby(["Season", "TeamID"])["DayNum"].transform("max")
g["rw_weight"] = np.exp(-(max_day - g["DayNum"]) / 15.0)

base = g.groupby(["Season", "TeamID"], as_index=False).agg(
    raw_off_rtg=("off_rtg", "mean"),
    raw_def_rtg=("def_rtg", "mean"),
)
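The aggregation that actually consumes `rw_weight` was omitted from the snippet. A minimal sketch of a recency-weighted mean on a toy (Season, TeamID) group, using the same tau_days = 15 as the formula above:

```python
import numpy as np
import pandas as pd

# Toy game-level net ratings for one (Season, TeamID) group.
g = pd.DataFrame({"DayNum": [80, 90, 93], "net_rtg": [-5.0, 5.0, 10.0]})

# Exponential recency weights with tau_days = 15.
w = np.exp(-(g["DayNum"].max() - g["DayNum"]) / 15.0)
rw_net_rtg = float(np.average(g["net_rtg"], weights=w))

# Late-season form (day 93) pulls the weighted mean above the plain mean.
assert rw_net_rtg > g["net_rtg"].mean()
```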

Step 3C. Volatility, Conference Strength, Venue/Travel Proxy

Purpose

This block adds context features that are not just about raw scoring efficiency. It asks questions like:

  • Is the team stable or volatile?
  • How strong is its conference?
  • What kind of home/away/neutral schedule did it play?
  • How geographically varied was its season?

Parameter choices and why

  • close_margin = 5: define close games as games decided by 5 points or fewer.
  • prior_games = 20: shrink conference-strength estimates toward neutral when inter-conference sample size is small.
  • Use GameCities.csv and Cities.csv to derive city/state diversity and entropy proxies.

Outcome in this notebook

  • Volatility coverage:
    • Men: 13,753 rows
    • Women: 9,851 rows
  • Conference-strength coverage:
    • Men: 13,753 rows
    • Women: 9,853 rows
  • Venue coverage:
    • Men: 13,753 rows
    • Women: 9,851 rows

These blocks give broad season coverage and help the baseline go beyond simple win percentage.

Quick notes for beginners:

  • Entropy is a summary of how spread out a distribution is.
  • Bayesian-style shrinkage is a practical way to avoid overreacting to tiny samples.
g["close_game"] = (g["margin"].abs() <= close_margin).astype(int)
g["close_win"] = g["close_game"] * g["Win"]
g["ot_game"] = (g["NumOT"] > 0).astype(int)

conf_stats["conf_strength_win"] = (conf_stats["conf_inter_wins"] + prior_games * 0.5) / (
    conf_stats["conf_inter_games"] + prior_games
)
conf_stats["conf_strength_margin"] = conf_stats["conf_margin_sum"] / (
    conf_stats["conf_inter_games"] + prior_games
)

out["city_entropy"] = out["city_entropy"].fillna(0.0)
out["city_diversity"] = out["unique_cities"] / out["city_games"].replace(0, np.nan)
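The entropy proxy mentioned above can be sketched as the Shannon entropy of a team's city-visit distribution. This is toy data and an assumed formulation; the notebook derives the real counts from GameCities.csv joined with Cities.csv:

```python
import numpy as np
import pandas as pd

# Toy list of game cities for one team-season.
cities = pd.Series(["Dayton", "Dayton", "Columbus", "Chicago"])

# Shannon entropy of the visit frequencies.
p = cities.value_counts(normalize=True).to_numpy()
city_entropy = float(-(p * np.log(p)).sum())

# A team that played every game in one city would score 0.0 here.
assert city_entropy > 0.0
```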

Step 3D. Men-Only Features (Massey + Coach)

Purpose

The competition files include some men-only sources that are still useful enough to keep: Massey ordinal rankings and coaching continuity signals.

Parameter choices and why

  • Use MMasseyOrdinals.csv up to RankingDayNum <= 133.
  • Build both consensus rank summaries and short-term trend features with trend_window = 21.
  • Build coach features at the season anchor day, including coach change, tenure, and days under the current coach.
  • Leave women-only counterparts as missing and let downstream imputation handle them.

Outcome in this notebook

  • Massey coverage: 8,356 rows across 24 seasons and 372 teams
  • Coach coverage: 13,763 rows across 42 seasons and 381 teams

This is a clean example of a pragmatic baseline choice: use extra information when it exists, but do not break the unified pipeline when it does not.

Quick notes for beginners:

  • Massey ordinals are consensus-like ranking signals from multiple systems.
  • Missing values are acceptable if the downstream model can handle them consistently.
consensus = final.groupby(["Season", "TeamID"], as_index=False).agg(
    massey_rank_mean=("OrdinalRank", "mean"),
    massey_rank_median=("OrdinalRank", "median"),
    massey_rank_std=("OrdinalRank", "std"),
    massey_n_systems=("SystemName", "nunique"),
)

out["coach_days_at_anchor"] = (out["anchor_day"] - out["FirstDayNum"] + 1).clip(lower=1)
out["coach_tenure_seasons"] = out.index.map(tenure)
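The short-term trend construction with trend_window = 21 was not shown. One plausible reading is the rank change over the last 21 ranking days, sketched here on toy ordinals (the notebook's exact definition may differ):

```python
import pandas as pd

# Toy ordinal history for one team in one ranking system.
hist = pd.DataFrame({"RankingDayNum": [100, 112, 126, 133],
                     "OrdinalRank": [40, 35, 28, 25]})

trend_window = 21                      # the notebook's trend_window
latest_day = hist["RankingDayNum"].max()
recent = hist[hist["RankingDayNum"] >= latest_day - trend_window]

# Rank change inside the window; negative = the team is climbing.
massey_rank_trend = int(recent["OrdinalRank"].iloc[-1]
                        - recent["OrdinalRank"].iloc[0])
assert massey_rank_trend == -10
```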

Step 4. Build Unified Team Feature Store (All Blocks Merged)

Purpose

All earlier feature blocks are useful only if they can be merged into a single season-team table. This section builds that table and adds one of the most important engineered concepts in the notebook: surrogate_strength.

Parameter choices and why

  • Merge compact, seed, metadata, season context, detailed, volatility, conference, venue, Massey, and coach blocks.
  • Compute program_age and d1_active_flag.
  • Build surrogate_strength from standardized feature z-scores such as win_pct, avg_margin, rw_net_rtg, adj_net_rtg, close_win_rate, conf_strength_win, neutral_rate, and negative margin_std.
  • Add inverse Massey rank when available.
  • Rank teams within (Season, League) to create:
    • surrogate_rank
    • surrogate_pct
    • pseudo_seed
  • Define seed_or_pseudo as official seed when known, otherwise pseudo-seed.

Outcome in this notebook

  • team_store rows: 16,635
  • team_store cols: 71
  • By league:
    • Men: 8,346 rows across 24 seasons and 372 teams
    • Women: 8,289 rows across 24 seasons and 369 teams

Coverage diagnostics were also informative:

  • many core season features had 100% non-null coverage
  • seed_num had low coverage (0.174091) because seeds only exist for tournament teams
  • Massey and coach columns had roughly 50% non-null coverage because they are men-only or history-limited

Quick notes for beginners:

  • surrogate_strength is a synthetic rating that lets the notebook score every team, not just seeded teams.
  • seed_or_pseudo is a simple but clever way to fill the seed gap for non-seeded teams.
weights = {
    "win_pct": 2.0,
    "avg_margin": 1.5,
    "rw_net_rtg": 1.5,
    "adj_net_rtg": 1.2,
    "close_win_rate": 0.8,
    "conf_strength_win": 1.0,
    "neutral_rate": 0.3,
    "margin_std": -0.5,
}

out["surrogate_strength"] = 0.0
for col, w in weights.items():
    _, z = _group_fill_and_zscore(out, col)
    out["surrogate_strength"] += w * z

out["seed_or_pseudo"] = np.where(out["seed_num"].notna(),
                                 out["seed_num"],
                                 out["pseudo_seed"])
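`_group_fill_and_zscore` is another elided helper. A plausible sketch, assuming it median-fills within a group and then z-scores within the same group (the notebook ranks within season and league, so those keys are assumed here):

```python
import numpy as np
import pandas as pd

def _group_fill_and_zscore(df, col, keys=("Season", "League")):
    """Median-fill `col` within each group, then z-score within the group.
    Sketch only; the notebook's real helper may differ in detail."""
    keys = list(keys)
    filled = df.groupby(keys)[col].transform(lambda s: s.fillna(s.median()))
    grouper = [df[k] for k in keys]
    mu = filled.groupby(grouper).transform("mean")
    sd = filled.groupby(grouper).transform("std").replace(0.0, 1.0)
    return filled, ((filled - mu) / sd).fillna(0.0)

df = pd.DataFrame({"Season": [2025] * 3, "League": ["Men"] * 3,
                   "win_pct": [0.8, np.nan, 0.4]})
filled, z = _group_fill_and_zscore(df, "win_pct")
assert np.allclose(filled, [0.8, 0.6, 0.4])   # NaN filled with group median
```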

Step 5. Build Pairwise Tournament Training Frame

Purpose

Tournament prediction is not a team-level task. It is a matchup-level task. This section converts the team feature store into one row per (Season, TeamLow, TeamHigh) tournament pair.

Parameter choices and why

  • Order teams as TeamLow and TeamHigh by ID for a fixed pair orientation.
  • Define target as whether TeamLow actually won.
  • Use feature differences such as low_feature - high_feature.
  • Build two model families:
    • core_features: a compact, hand-picked subset
    • expanded_features: all pairwise differences plus context flags

Outcome in this notebook

  • train_pairs: 2,410
  • train_df: 2,410
  • core_features: 13
  • expanded_features: 74

Target balance was nearly even:

  • Men: 0.500345
  • Women: 0.504683

That is exactly what we want in this formulation, because team ID order is not supposed to encode team strength.

Quick notes for beginners:

  • Pairwise differencing is a common sports-modeling trick because it makes the model compare teams directly.
  • A nearly balanced target often makes a simple logistic model easier to train and interpret.
for c in team_diff_cols:
    feats[f"{c}_diff"] = (x[f"low_{c}"] - x[f"high_{c}"]).astype(np.float32)

feats["seed_known_both"] = (low_seed_known * high_seed_known).astype(np.float32)
feats["seed_missing_any"] = (1.0 - feats["seed_known_both"]).astype(np.float32)
feats["surrogate_gap_abs"] = feats["surrogate_strength_diff"].abs().astype(np.float32)
feats["is_men"] = (x["League"] == "Men").astype(np.float32)
feats["season"] = feats["Season"].astype(np.float32)
feats["is_2026"] = (feats["Season"] == 2026).astype(np.float32)
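Label construction under this fixed orientation can be sketched on toy tournament results (winner/loser rows, as in the compact tourney files):

```python
import pandas as pd

# Toy tournament results in winner/loser form.
res = pd.DataFrame({"Season": [2025, 2025],
                    "WTeamID": [1103, 1101],
                    "LTeamID": [1101, 1208]})

pairs = pd.DataFrame({
    "Season": res["Season"],
    "TeamLow": res[["WTeamID", "LTeamID"]].min(axis=1),
    "TeamHigh": res[["WTeamID", "LTeamID"]].max(axis=1),
})
# Target = 1 exactly when the lower-ID team won the game.
pairs["Target"] = (res["WTeamID"] == pairs["TeamLow"]).astype(int)
assert pairs["Target"].tolist() == [0, 1]
```

Because team IDs are arbitrary, this target lands near 50/50 by construction, matching the balance reported above.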

Step 6. Rolling OOF Diagnostics (Season-Forward, 2021-2025)

Purpose

This section answers the most practical question in the notebook: does the feature set generalize when we move forward by season?

Parameter choices and why

  • For each validation season in 2021-2025, train only on earlier seasons.
  • Evaluate with Brier score:
\[\mathrm{Brier} = \frac{1}{n}\sum_{i=1}^{n}(p_i - y_i)^2\]
  • Compare core vs expanded.
  • Search blend weights on a grid from 0.00 to 1.00 in 41 steps.

Results and interpretation

| Model    | All      | Men      | Women    |
|----------|----------|----------|----------|
| Core     | 0.171637 | 0.199955 | 0.143063 |
| Expanded | 0.172966 | 0.199440 | 0.146252 |

Best blend:

  • expanded weight: 0.43
  • blended OOF Brier: 0.169731

Interpretation:

  • the expanded feature set was not best on its own overall
  • but it added complementary information, because the blend beat both standalone models
  • in this notebook run, the women’s side was easier for the baseline than the men’s side

Quick notes for beginners:

  • Lower Brier is better.
  • Season-forward validation is stricter than random CV because it respects time order.
def rolling_oof_predictions(df: pd.DataFrame,
                            feature_cols: list[str],
                            valid_seasons: list[int]) -> np.ndarray:
    pred = np.full(len(df), np.nan, dtype=float)

    for season in valid_seasons:
        val_mask = df["Season"] == season
        tr_mask = df["Season"] < season

        model = make_model()
        model.fit(df.loc[tr_mask, feature_cols], df.loc[tr_mask, "Target"])
        pred[val_mask] = model.predict_proba(df.loc[val_mask, feature_cols])[:, 1]

    return pred
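The 41-step blend search can be sketched end to end on toy OOF vectors; `p_core` and `p_exp` stand in for the two models' out-of-fold predictions:

```python
import numpy as np

# Toy OOF labels and predictions; in the notebook these come from
# rolling_oof_predictions for the core and expanded feature sets.
y = np.array([1, 0, 1, 0, 1], dtype=float)
p_core = np.array([0.7, 0.4, 0.6, 0.3, 0.5])
p_exp = np.array([0.6, 0.2, 0.7, 0.4, 0.6])

best_w, best_brier = 0.0, np.inf
for w in np.linspace(0.0, 1.0, 41):          # 41-point grid, as described
    blend = (1.0 - w) * p_core + w * p_exp
    brier = float(np.mean((blend - y) ** 2))
    if brier < best_brier:
        best_w, best_brier = float(w), brier

# The grid contains w = 0 and w = 1, so on the same OOF set the blend
# can never score worse than either standalone model.
assert best_brier <= float(np.mean((p_core - y) ** 2))
assert best_brier <= float(np.mean((p_exp - y) ** 2))
```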

Step 7. Fit Final Models on Full Training Window

Purpose

Once the notebook decides how to use the core and expanded views, it fits final models on the full historical training window.

Parameter choices and why

  • Use a simple linear baseline:
    • SimpleImputer(strategy="median")
    • StandardScaler()
    • LogisticRegression(C=0.9, max_iter=1500, solver="lbfgs")
  • Fit one model for core_features and one for expanded_features

Outcome in this notebook

Both final models were fitted successfully on all available tournament training rows.

Why this is a good baseline:

  • it handles missing values cleanly
  • it keeps coefficients interpretable
  • it is stable enough to reveal whether the feature engineering is helping
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def make_model() -> Pipeline:
    return Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
            ("clf", LogisticRegression(C=0.9, max_iter=1500, solver="lbfgs")),
        ]
    )

Step 8. Predict Submission IDs and Save submission.csv

Purpose

The final prediction phase converts Kaggle submission IDs into matchup rows, scores them, blends the two models, clips the probabilities, and writes the output file.

Parameter choices and why

  • Automatically select a submission template:
    • prefer sample_submission.csv if present
    • otherwise choose between stage files
  • Parse ID strings into Season, TeamLow, and TeamHigh
  • Blend predictions using the best OOF blend weight
  • Clip final probabilities to [0.02, 0.98]

Outcome in this notebook

  • Selected template: SampleSubmissionStage1.csv
  • Template rows: 519,144
  • File written: submission.csv

The first few predictions looked like this:

2022_1101_1102  0.565414
2022_1101_1103  0.445941
2022_1101_1104  0.234558
2022_1101_1105  0.705890
2022_1101_1106  0.599468

Quick notes for beginners:

  • Kaggle submission IDs often encode the inference keys directly.
  • Clipping can reduce pathological overconfidence from a simple baseline.
sub = pd.read_csv(template_path, nrows=SUBMISSION_NROWS)
pred_pairs = parse_submission_ids(sub)
pred_df = build_pair_feature_frame(pred_pairs, team_store)

p_core = core_model.predict_proba(pred_df[core_features])[:, 1]
p_exp = exp_model.predict_proba(pred_df[expanded_features])[:, 1]
pred = (1.0 - blend_w) * p_core + blend_w * p_exp
pred = np.clip(pred, *PRED_CLIP)

out = pd.DataFrame({"ID": sub["ID"], "Pred": pred})
out.to_csv(OUTPUT_DIR / "submission.csv", index=False)
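`parse_submission_ids` was elided from the snippet. A minimal sketch, assuming the `Season_TeamLow_TeamHigh` ID format visible in the sample predictions above:

```python
import pandas as pd

def parse_submission_ids(sub: pd.DataFrame) -> pd.DataFrame:
    """Split 'Season_TeamLow_TeamHigh' IDs into integer key columns."""
    parts = sub["ID"].str.split("_", expand=True).astype(int)
    parts.columns = ["Season", "TeamLow", "TeamHigh"]
    return parts

toy = pd.DataFrame({"ID": ["2022_1101_1102", "2022_1101_1103"]})
pairs = parse_submission_ids(toy)
assert pairs["TeamHigh"].tolist() == [1102, 1103]
```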

Step 9. Submission Sanity Checks

Purpose

A strong model still fails if the submission file is malformed. This last step verifies format, row count, ordering, missing values, and numeric range.

Parameter choices and why

  • Check exact column names: ["ID", "Pred"]
  • Check row count against the selected template
  • Check ID equality and ordering
  • Check prediction range inside [0, 1]
  • Assert no missing predictions

Outcome in this notebook

  • Column check passed
  • Row count matched: 519144 vs 519144
  • IDs were unique
  • Pred missing count was 0
  • Prediction range was exactly 0.02 to 0.98
  • All submission format checks passed
assert list(sub_check.columns) == ["ID", "Pred"]
assert len(sub_check) == len(base_check)
assert sub_check["ID"].equals(base_check["ID"])
assert sub_check["Pred"].between(0, 1).all()

Supplementary

A. Why TeamLow vs TeamHigh?

Using sorted team IDs gives every matchup a fixed orientation. That avoids duplicate representations like (A, B) and (B, A), and it makes the target definition stable: Target = 1 means the lower-ID team won.

B. Why surrogate_strength matters

Official seeds are powerful, but they are sparse. The notebook therefore builds a season-and-league-relative strength score for every team, then converts it into pseudo_seed. This lets the model keep a seed-like feature even when an official seed does not exist.

C. Why a simple logistic baseline is still useful

This notebook is a baseline, not a final ensemble. Because the model is simple, improvements in score mostly come from better feature design, better leakage control, and better validation logic. That makes iteration much easier.

D. The Practical Utility of This Notebook

The notebook’s strongest contribution was not modeling novelty. It was practical utility. During a live Kaggle competition, that matters a lot.

The code gives another participant a workflow that is easy to inherit:

  • read the competition files from a sane path-resolution layer
  • align current-season data with a leakage-safe cutoff
  • build one reusable team feature store
  • transform it into pairwise training data
  • validate season-forward
  • generate a submission file that is checked before export

That is exactly the kind of notebook people often fork quietly, adapt locally, and upvote because it saved them time.

E. Repro Checklist

  • make sure the competition data files are available in Kaggle or via MMLM2026_DATA_DIR
  • run the cells in order
  • confirm the aligned 2026 cutoff day
  • inspect OOF Brier for core, expanded, and blended predictions
  • verify that submission.csv passes the final sanity checks

Final Summary

This baseline notebook is best understood as a clean modeling template for March Machine Learning Mania 2026. It starts from leakage-safe season truncation, builds a multi-block team feature store, transforms that store into pairwise tournament examples, validates with season-forward OOF Brier, and finishes with a checked submission file.

The more important lesson, though, is about notebook utility under competition pressure. For this kind of live Kaggle setting, a notebook can be genuinely useful even when the model itself is simple. If it reduces ambiguity, prevents leakage, prevents submission mistakes, and gives others a trustworthy starting point, it has already done something valuable.

That is the lens through which I would read this notebook. Its edge was not “model genius.” Its edge was that it lowered the practical cost of participation for other competitors while remaining reproducible, explicit, and honest about what was actually working. And in the broader competition context, this line of work still connected to a real result: the March Machine Learning Mania 2026 run finished in 92nd place and earned a Silver Medal.

This post is licensed under CC BY 4.0 by the author.