March Machine Learning Mania 2026: Baseline Pipeline, Step by Step
Competition link: March Machine Learning Mania 2026
Kaggle notebook link: MMLM2026_Baseline Pipeline (Step-by-step)
Related EDA notebook: Structured EDA: Dataset Statistical Analysis
Competition result for this broader MMLM2026 project track: Silver Medal, 92nd place
This notebook builds a readable end-to-end baseline for forecasting the 2026 NCAA basketball tournaments. The emphasis is not on a complicated ensemble. It is on three practical goals:
- keep the 2026 feature build leakage-safe
- turn raw season data into a unified team feature store
- convert team features into matchup probabilities and write a valid `submission.csv`
TL;DR
- Forecast every possible 2026 tournament matchup for both men and women
- Use an aligned 2026 cutoff day of `93` so current-season features do not peek too far ahead
- Build one merged team feature store with `16,635` rows and `71` columns
- Train on historical NCAA tournament pairs: `2,410` rows total
- Compare a compact `core` feature set (`13` columns) with an `expanded` set (`74` columns)
- Validate with rolling season-forward OOF Brier on 2021-2025
- Best internal blend: `57%` core + `43%` expanded
- Best blended OOF Brier: `0.169731`
- Final notebook run wrote `519,144` predictions and passed submission-format checks
- The broader competition run ultimately finished 92nd, earning a Silver Medal
Why This Notebook Was Useful During The Competition
The most important point is that this notebook was not especially valuable because it was novel. It was valuable because it reduced the amount of work another competitor had to do before making a correct submission.
One reasonable mental model for live-competition notebook recommendations is:
\[\text{Recs} \propto U \times C \times T \times S\]
where:
- U = utility
- C = clarity
- T = timing
- S = score credibility
For a baseline notebook, that product often matters more than novelty. In other words, the question is usually not “Is this the smartest model here?” It is “Can I fork this, trust it, and move forward without stepping on a rake?”
That framing fit March Machine Learning Mania 2026 unusually well. The competition itself had several operational details that made a clean baseline more useful than a flashy one:
- the prediction target is matchup probability, not simple team ranking
- the pair orientation is fixed by `TeamLow`/`TeamHigh`, so label construction must be done carefully
- current-season files were updated during the live competition, so a leakage-safe 2026 cutoff mattered
- submission handling was stage-aware, and the practical workflow around templates and preferred submissions made format mistakes costly
In that setting, a notebook that does the boring parts correctly has real value. This notebook does exactly that:
- it starts from a prior EDA instead of pretending feature engineering came from nowhere
- it makes the 2026 cutoff explicit and logs `cutoff_day_2026 = 93`
- it builds a `16,635 x 71` unified team feature store rather than scattered ad hoc tables
- it turns that store into `74` expanded pairwise features
- it uses season-forward OOF Brier instead of optimistic random splitting
- it writes a stage-compatible submission file
- it ends with explicit submission sanity checks instead of assuming the CSV is fine
That combination makes it read less like a “look what I tried” notebook and more like a utility notebook. For another competitor, that means lower startup cost, lower debugging cost, and lower risk of silent leakage or malformed submission files.
The idea can be summarized as a user-cost problem:
\[L_{\text{user}} = t_{\text{understand}} + t_{\text{debug}} + \lambda \,\mathbf{1}_{\text{submission error}} + \mu \,\mathbf{1}_{\text{leakage}}\]
What this notebook did well was not minimizing every leaderboard decimal. It was minimizing that user-side cost.
This is also why the notebook’s honesty helped. The model is still just a logistic-regression baseline,
\[p(y=1 \mid x) = \sigma(\beta^\top x),\]
and the diagnostics are not dressed up to look magical:
- `core` OOF Brier: `0.171637`
- `expanded` OOF Brier: `0.172966`
- blended OOF Brier: `0.169731`
So the notebook does not claim that more features automatically won. Instead, it shows where the signal came from and where it did not. That kind of transparent reporting increases trust, which is part of S, score credibility. In the published run, the public leaderboard score was 0.1172485, which was good enough to make the notebook read as usable rather than purely illustrative.
There was also a clear demand-side match. Other public notebooks in the same competition were explicitly positioned as starting points, for example March Machine Learning Mania 2026 Starter and March ML Mania 2026 | HistGB + XGB + CatBoost. That is a useful clue that participants were actively looking for clean, reproducible baselines they could extend, not only for exotic modeling ideas. In practice, notebooks like this are often consumed as quiet utility assets: forked, adapted, and upvoted because they remove friction, not because they become long discussion threads.
Step 1. Runtime Configuration and Input Availability
Purpose
The first section makes the notebook portable across Kaggle and local environments. It also fixes the training window and validation years before any feature engineering begins.
Parameter choices and why
- `INPUT_SEARCH_DIRS`: search an explicit env path first, then Kaggle input, then a local fallback.
- `TRAIN_START_BY_PREFIX = {"M": 2003, "W": 2010}`: men and women have different practical history windows in the competition files.
- `TRAIN_END_SEASON = 2025`: do not train on 2026 tournament outcomes.
- `VALID_SEASONS = [2021, 2022, 2023, 2024, 2025]`: use recent seasons for season-forward OOF diagnostics.
- `PRED_CLIP = (0.02, 0.98)`: avoid extreme submission probabilities.
Outcome in this notebook
- Primary input directory resolved to `/kaggle/input/march-machine-learning-mania-2026`
- All key files existed in the notebook run, including compact results, detailed results, Massey ordinals, and the stage submission template
- Output directory resolved to `/kaggle/working`
Quick notes for beginners:
- Environment variables make the same notebook reusable without hardcoding paths.
- A fixed validation window makes model comparisons fairer.
Key code snippet (reduced from the original notebook: long feature lists and helper internals were omitted):

```python
DEFAULT_KAGGLE_INPUT_DIR = Path("/kaggle/input/march-machine-learning-mania-2026")
INPUT_SEARCH_DIRS = _build_input_search_dirs()
INPUT_DIR = next((d for d in INPUT_SEARCH_DIRS if d.exists()), Path("."))
OUTPUT_DIR = Path(os.getenv("MMLM2026_OUTPUT_DIR", "."))
SUBMISSION_NROWS = int(os.getenv("MMLM2026_SUB_NROWS", "0")) or None
SUBMISSION_TEMPLATE = os.getenv("MMLM2026_SUBMISSION_TEMPLATE", "auto")

TRAIN_START_BY_PREFIX = {"M": 2003, "W": 2010}
TRAIN_END_SEASON = 2025
VALID_SEASONS = [2021, 2022, 2023, 2024, 2025]
PRED_CLIP = (0.02, 0.98)
```
Step 2. Leakage-Safe 2026 Day Cutoff
Purpose
This is one of the most important design choices in the whole notebook. If we let 2026 features use data from a later date than the current tournament prediction point, we leak future information into the feature store.
Parameter choices and why
- Read
MRegularSeasonCompactResults.csvandWRegularSeasonCompactResults.csv. - Find the maximum available
DayNumfor each league in 2026. - Use the minimum of those two values as the aligned cutoff.
- Apply that cutoff to every 2026 table that depends on daily game results.
Outcome in this notebook
The aligned cutoff day was `93`.
That means all 2026 regular-season and related feature blocks were built using only information up to day 93.
Quick notes for beginners:
- Leakage means using information that would not have been available at prediction time.
- An aligned cutoff is especially useful here because the men and women files may not be updated to the same day.
Key code snippet:

```python
def get_aligned_cutoff_day_2026() -> int:
    m = pd.read_csv(input_path("MRegularSeasonCompactResults.csv"),
                    usecols=["Season", "DayNum"])
    w = pd.read_csv(input_path("WRegularSeasonCompactResults.csv"),
                    usecols=["Season", "DayNum"])
    m_max = m.loc[m["Season"] == 2026, "DayNum"].max()
    w_max = w.loc[w["Season"] == 2026, "DayNum"].max()
    return int(min(m_max, w_max))


def apply_2026_cutoff(df: pd.DataFrame, cutoff_day_2026: int) -> pd.DataFrame:
    return df[
        (df["Season"] < 2026)
        | ((df["Season"] == 2026) & (df["DayNum"] <= cutoff_day_2026))
    ].copy()
```
Step 3A. Core Team Base + Seed Features
Purpose
This block creates the first layer of team-level features from compact results: games played, wins, average scoring, average margin, and last observed day. It also joins official NCAA seeds when available.
Parameter choices and why
- Build one row per `(Season, TeamID)` from compact results.
- Convert each game into team-centric rows for both winner and loser.
- Use official seeds from `NCAATourneySeeds.csv`.
- Keep seed availability explicit with `seed_known`.
Outcome in this notebook
- Base feature coverage:
  - Men: `13,753` rows across `42` seasons and `381` teams
  - Women: `9,851` rows across `29` seasons and `370` teams
- Seed feature coverage:
  - Men: `2,626` rows
  - Women: `1,744` rows
- Seed rows have `seed_known_ratio = 1.0` because the seed table only contains teams with known seeds
Quick notes for beginners:
- Compact results only need final scores and winner/loser IDs, so they are useful for broad coverage.
- Seeds are sparse by design because not every team is seeded.
Key code snippet:

```python
def build_compact_team_base(prefix: str, cutoff_day_2026: int) -> pd.DataFrame:
    reg = pd.read_csv(
        input_path(f"{prefix}RegularSeasonCompactResults.csv"),
        usecols=["Season", "DayNum", "WTeamID", "LTeamID", "WScore", "LScore"],
    )
    reg = apply_2026_cutoff(reg, cutoff_day_2026)
    w = reg.rename(columns={"WTeamID": "TeamID", "LTeamID": "OppTeamID",
                            "WScore": "ScoreFor", "LScore": "ScoreAgainst"})
    l = reg.rename(columns={"LTeamID": "TeamID", "WTeamID": "OppTeamID",
                            "LScore": "ScoreFor", "WScore": "ScoreAgainst"})
    w["Win"] = 1
    l["Win"] = 0
    g = pd.concat([w, l], ignore_index=True)
    g["Margin"] = g["ScoreFor"] - g["ScoreAgainst"]
    out = g.groupby(["Season", "TeamID"], as_index=False).agg(
        games=("Win", "size"),
        wins=("Win", "sum"),
        avg_score_for=("ScoreFor", "mean"),
        avg_score_against=("ScoreAgainst", "mean"),
        avg_margin=("Margin", "mean"),
        last_day=("DayNum", "max"),
    )
    out["win_pct"] = out["wins"] / out["games"]
    return out
```
Step 3B. Detailed Box-Based Features (Recency + Opponent Adjustment)
Purpose
Compact results are broad, but they do not describe how a team wins. This block brings in box-score detail and turns it into efficiency-style features.
Parameter choices and why
- Estimate possessions with:
  \[\text{poss} = \text{FGA} - \text{OR} + \text{TO} + 0.475 \times \text{FTA}\]
- Build `off_rtg`, `def_rtg`, and `net_rtg` as points per 100 possessions.
- Use exponential recency weighting with `tau_days = 15.0`:
  \[w_g = \exp\!\left(-\frac{d_{\max} - d_g}{\tau}\right)\]
- Add opponent-adjusted ratings by comparing each team's game-level ratings to the opponents' season baselines.
Outcome in this notebook
- Recency-weighted coverage:
  - Men: `8,346` rows
  - Women: `5,965` rows
- Opponent-adjusted coverage matched the same counts
The lower row counts relative to compact features reflect the shorter historical span of detailed box-score files.
Quick notes for beginners:
- `off_rtg` is points scored per 100 possessions.
- Recency weighting gives more influence to a team's latest form without discarding earlier games.
Key code snippet:

```python
g["poss"] = g["FGA"] - g["OR"] + g["TO"] + 0.475 * g["FTA"]
g["opp_poss"] = g["OppFGA"] - g["OppOR"] + g["OppTO"] + 0.475 * g["OppFTA"]
g["off_rtg"] = 100.0 * g["ScoreFor"] / g["poss"].clip(lower=1)
g["def_rtg"] = 100.0 * g["ScoreAgainst"] / g["opp_poss"].clip(lower=1)
g["net_rtg"] = g["off_rtg"] - g["def_rtg"]

max_day = g.groupby(["Season", "TeamID"])["DayNum"].transform("max")
g["rw_weight"] = np.exp(-(max_day - g["DayNum"]) / 15.0)

base = g.groupby(["Season", "TeamID"], as_index=False).agg(
    raw_off_rtg=("off_rtg", "mean"),
    raw_def_rtg=("def_rtg", "mean"),
)
```
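The snippet above stops at raw per-game means; the recency-weighted aggregation itself is omitted. A minimal sketch of that step, assuming the same `g` frame with the `rw_weight` column built above (the helper name `recency_weighted_ratings` is hypothetical, not the notebook's own):

```python
import pandas as pd

def recency_weighted_ratings(g: pd.DataFrame) -> pd.DataFrame:
    # Weight each game's rating by rw_weight, then normalize by the weight sum
    # so recent games count more without discarding early-season games.
    g = g.assign(
        w_off=g["off_rtg"] * g["rw_weight"],
        w_def=g["def_rtg"] * g["rw_weight"],
    )
    agg = g.groupby(["Season", "TeamID"], as_index=False).agg(
        w_off_sum=("w_off", "sum"),
        w_def_sum=("w_def", "sum"),
        w_sum=("rw_weight", "sum"),
    )
    agg["rw_off_rtg"] = agg["w_off_sum"] / agg["w_sum"]
    agg["rw_def_rtg"] = agg["w_def_sum"] / agg["w_sum"]
    agg["rw_net_rtg"] = agg["rw_off_rtg"] - agg["rw_def_rtg"]
    return agg[["Season", "TeamID", "rw_off_rtg", "rw_def_rtg", "rw_net_rtg"]]
```

With weights 3.0 and 1.0 on two games rated 100 and 110, the weighted offensive rating lands at 102.5 rather than the unweighted 105, pulling the estimate toward the more recent game.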
Step 3C. Volatility, Conference Strength, Venue/Travel Proxy
Purpose
This block adds context features that are not just about raw scoring efficiency. It asks questions like:
- Is the team stable or volatile?
- How strong is its conference?
- What kind of home/away/neutral schedule did it play?
- How geographically varied was its season?
Parameter choices and why
- `close_margin = 5`: define close games as games decided by `5` points or fewer.
- `prior_games = 20`: shrink conference-strength estimates toward neutral when the inter-conference sample size is small.
- Use `GameCities.csv` and `Cities.csv` to derive city/state diversity and entropy proxies.
Outcome in this notebook
- Volatility coverage:
  - Men: `13,753` rows
  - Women: `9,851` rows
- Conference-strength coverage:
  - Men: `13,753` rows
  - Women: `9,853` rows
- Venue coverage:
  - Men: `13,753` rows
  - Women: `9,851` rows
These blocks give broad season coverage and help the baseline go beyond simple win percentage.
Quick notes for beginners:
- Entropy is a summary of how spread out a distribution is.
- Bayesian-style shrinkage is a practical way to avoid overreacting to tiny samples.
Key code snippet:

```python
g["close_game"] = (g["margin"].abs() <= close_margin).astype(int)
g["close_win"] = g["close_game"] * g["Win"]
g["ot_game"] = (g["NumOT"] > 0).astype(int)

conf_stats["conf_strength_win"] = (conf_stats["conf_inter_wins"] + prior_games * 0.5) / (
    conf_stats["conf_inter_games"] + prior_games
)
conf_stats["conf_strength_margin"] = conf_stats["conf_margin_sum"] / (
    conf_stats["conf_inter_games"] + prior_games
)

out["city_entropy"] = out["city_entropy"].fillna(0.0)
out["city_diversity"] = out["unique_cities"] / out["city_games"].replace(0, np.nan)
```
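The entropy proxy mentioned above can be made concrete with a small helper. This is a hedged sketch, not the notebook's exact implementation, and `city_entropy` as a standalone function is a hypothetical name:

```python
import numpy as np
import pandas as pd

def city_entropy(counts: pd.Series) -> float:
    """Shannon entropy of a team's game-city counts (higher = more spread out)."""
    p = counts / counts.sum()
    p = p[p > 0]  # drop zero-count cities to avoid log(0)
    return float(-(p * np.log(p)).sum())
```

A team that played every game in one city gets entropy 0; a season spread evenly over four cities gives log(4), about 1.386.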
Step 3D. Men-Only Features (Massey + Coach)
Purpose
The competition files include some men-only sources that are still useful enough to keep: Massey ordinal rankings and coaching continuity signals.
Parameter choices and why
- Use `MMasseyOrdinals.csv` up to `RankingDayNum <= 133`.
- Build both consensus rank summaries and short-term trend features with `trend_window = 21`.
- Build coach features at the season anchor day, including coach change, tenure, and days under the current coach.
- Leave the corresponding women's columns missing and let downstream imputation handle them.
Outcome in this notebook
- Massey coverage: `8,356` rows across `24` seasons and `372` teams
- Coach coverage: `13,763` rows across `42` seasons and `381` teams
This is a clean example of a pragmatic baseline choice: use extra information when it exists, but do not break the unified pipeline when it does not.
Quick notes for beginners:
- Massey ordinals are consensus-like ranking signals from multiple systems.
- Missing values are acceptable if the downstream model can handle them consistently.
Key code snippet:

```python
consensus = final.groupby(["Season", "TeamID"], as_index=False).agg(
    massey_rank_mean=("OrdinalRank", "mean"),
    massey_rank_median=("OrdinalRank", "median"),
    massey_rank_std=("OrdinalRank", "std"),
    massey_n_systems=("SystemName", "nunique"),
)

out["coach_days_at_anchor"] = (out["anchor_day"] - out["FirstDayNum"] + 1).clip(lower=1)
out["coach_tenure_seasons"] = out.index.map(tenure)
```
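The short-term trend feature built with `trend_window = 21` is described but not shown. One plausible sketch (the helper name and exact windowing are assumptions) fits a slope of a team's consensus rank against ranking day over the final window:

```python
import numpy as np
import pandas as pd

def massey_trend(ranks: pd.DataFrame, trend_window: int = 21) -> float:
    """Slope of OrdinalRank over the last trend_window days; negative = improving."""
    last_day = ranks["RankingDayNum"].max()
    recent = ranks[ranks["RankingDayNum"] >= last_day - trend_window]
    if recent["RankingDayNum"].nunique() < 2:
        return 0.0  # not enough points to fit a trend
    slope, _ = np.polyfit(recent["RankingDayNum"], recent["OrdinalRank"], 1)
    return float(slope)
```

Because ordinal ranks get smaller as a team improves, a negative slope here means the consensus view of the team is trending up.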
Step 4. Build Unified Team Feature Store (All Blocks Merged)
Purpose
All earlier feature blocks are useful only if they can be merged into a single season-team table. This section builds that table and adds one of the most important engineered concepts in the notebook: surrogate_strength.
Parameter choices and why
- Merge compact, seed, metadata, season context, detailed, volatility, conference, venue, Massey, and coach blocks.
- Compute `program_age` and `d1_active_flag`.
- Build `surrogate_strength` from standardized feature z-scores such as `win_pct`, `avg_margin`, `rw_net_rtg`, `adj_net_rtg`, `close_win_rate`, `conf_strength_win`, `neutral_rate`, and negative `margin_std`.
- Add inverse Massey rank when available.
- Rank teams within `(Season, League)` to create `surrogate_rank`, `surrogate_pct`, and `pseudo_seed`.
- Define `seed_or_pseudo` as the official seed when known, otherwise the pseudo-seed.
Outcome in this notebook
- `team_store` rows: `16,635`
- `team_store` cols: `71`
- By league:
  - Men: `8,346` rows across `24` seasons and `372` teams
  - Women: `8,289` rows across `24` seasons and `369` teams

Coverage diagnostics were also informative:
- many core season features had `100%` non-null coverage
- `seed_num` had low coverage (`0.174091`) because seeds only exist for tournament teams
- Massey and coach columns had roughly `50%` non-null coverage because they are men-only or history-limited
Quick notes for beginners:
- `surrogate_strength` is a synthetic rating that lets the notebook score every team, not just seeded teams.
- `seed_or_pseudo` is a simple but clever way to fill the seed gap for non-seeded teams.
Key code snippet:

```python
weights = {
    "win_pct": 2.0,
    "avg_margin": 1.5,
    "rw_net_rtg": 1.5,
    "adj_net_rtg": 1.2,
    "close_win_rate": 0.8,
    "conf_strength_win": 1.0,
    "neutral_rate": 0.3,
    "margin_std": -0.5,
}

out["surrogate_strength"] = 0.0
for col, w in weights.items():
    _, z = _group_fill_and_zscore(out, col)
    out["surrogate_strength"] += w * z

out["seed_or_pseudo"] = np.where(out["seed_num"].notna(),
                                 out["seed_num"],
                                 out["pseudo_seed"])
```
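The `surrogate_rank`, `surrogate_pct`, and `pseudo_seed` columns are summarized above but their construction is not shown. A minimal sketch, under the assumption that `pseudo_seed` maps the within-league strength rank onto a 1-16 scale (the notebook's exact mapping may differ, and `add_pseudo_seed` is a hypothetical name):

```python
import numpy as np
import pandas as pd

def add_pseudo_seed(team_store: pd.DataFrame) -> pd.DataFrame:
    out = team_store.copy()
    grp = out.groupby(["Season", "League"])["surrogate_strength"]
    # Rank 1 = strongest team in its (Season, League) group.
    out["surrogate_rank"] = grp.rank(ascending=False, method="first")
    out["surrogate_pct"] = grp.rank(pct=True)  # 1.0 = strongest
    # Map rank onto a seed-like 1..16 scale (assumed mapping, not confirmed).
    n = out.groupby(["Season", "League"])["surrogate_rank"].transform("max")
    out["pseudo_seed"] = np.ceil(16 * out["surrogate_rank"] / n).clip(1, 16).astype(int)
    return out
```

The key design point survives regardless of the exact mapping: every team gets a seed-like number, so `seed_or_pseudo` never has to fall back to a missing value.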
Step 5. Build Pairwise Tournament Training Frame
Purpose
Tournament prediction is not a team-level task. It is a matchup-level task. This section converts the team feature store into one row per (Season, TeamLow, TeamHigh) tournament pair.
Parameter choices and why
- Order teams as `TeamLow` and `TeamHigh` by ID for a fixed pair orientation.
- Define the target as whether `TeamLow` actually won.
- Use feature differences such as `low_feature - high_feature`.
- Build two model families:
  - `core_features`: a compact, hand-picked subset
  - `expanded_features`: all pairwise differences plus context flags
Outcome in this notebook
- `train_pairs`: `2,410`
- `train_df`: `2,410`
- `core_features`: `13`
- `expanded_features`: `74`

Target balance was nearly even:
- Men: `0.500345`
- Women: `0.504683`
That is exactly what we want in this formulation, because team ID order is not supposed to encode team strength.
Quick notes for beginners:
- Pairwise differencing is a common sports-modeling trick because it makes the model compare teams directly.
- A nearly balanced target often makes a simple logistic model easier to train and interpret.
Key code snippet:

```python
for c in team_diff_cols:
    feats[f"{c}_diff"] = (x[f"low_{c}"] - x[f"high_{c}"]).astype(np.float32)

feats["seed_known_both"] = (low_seed_known * high_seed_known).astype(np.float32)
feats["seed_missing_any"] = (1.0 - feats["seed_known_both"]).astype(np.float32)
feats["surrogate_gap_abs"] = feats["surrogate_strength_diff"].abs().astype(np.float32)
feats["is_men"] = (x["League"] == "Men").astype(np.float32)
feats["season"] = feats["Season"].astype(np.float32)
feats["is_2026"] = (feats["Season"] == 2026).astype(np.float32)
```
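The fixed pair orientation and target definition can be sketched as follows, assuming the Kaggle tourney compact-results columns (`build_tourney_pairs` is a hypothetical name; the notebook's own pair-building helper is not shown):

```python
import pandas as pd

def build_tourney_pairs(tourney: pd.DataFrame) -> pd.DataFrame:
    # One row per tournament game, oriented so TeamLow < TeamHigh,
    # with Target = 1 exactly when the lower-ID team won.
    pairs = pd.DataFrame({
        "Season": tourney["Season"],
        "TeamLow": tourney[["WTeamID", "LTeamID"]].min(axis=1),
        "TeamHigh": tourney[["WTeamID", "LTeamID"]].max(axis=1),
    })
    pairs["Target"] = (tourney["WTeamID"] == pairs["TeamLow"]).astype(int)
    return pairs
```

Because the winner is equally likely to hold the lower or higher ID, this construction is what produces the near-0.5 target balance reported above.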
Step 6. Rolling OOF Diagnostics (Season-Forward, 2021-2025)
Purpose
This section answers the most practical question in the notebook: does the feature set generalize when we move forward by season?
Parameter choices and why
- For each validation season in 2021-2025, train only on earlier seasons.
- Evaluate with the Brier score:
  \[\text{Brier} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - y_i\right)^2\]
- Compare `core` vs `expanded`.
- Search blend weights on a grid from `0.00` to `1.00` in `41` steps.
Results and interpretation
| Model | All | Men | Women |
|---|---|---|---|
| Core | 0.171637 | 0.199955 | 0.143063 |
| Expanded | 0.172966 | 0.199440 | 0.146252 |
Best blend:
- expanded weight: `0.43`
- blended OOF Brier: `0.169731`
Interpretation:
- the expanded feature set was not best on its own overall
- but it added complementary information, because the blend beat both standalone models
- in this notebook run, the women’s side was easier for the baseline than the men’s side
Quick notes for beginners:
- Lower Brier is better.
- Season-forward validation is stricter than random CV because it respects time order.
Key code snippet:

```python
def rolling_oof_predictions(df: pd.DataFrame,
                            feature_cols: list[str],
                            valid_seasons: list[int]) -> np.ndarray:
    pred = np.full(len(df), np.nan, dtype=float)
    for season in valid_seasons:
        val_mask = df["Season"] == season
        tr_mask = df["Season"] < season
        model = make_model()
        model.fit(df.loc[tr_mask, feature_cols], df.loc[tr_mask, "Target"])
        pred[val_mask] = model.predict_proba(df.loc[val_mask, feature_cols])[:, 1]
    return pred
```
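The blend-weight search itself is not shown in the snippet. A minimal sketch of a 41-point grid scan over the expanded-model weight, computing the Brier score directly with NumPy (`best_blend_weight` is an assumed name, not the notebook's):

```python
import numpy as np

def best_blend_weight(y: np.ndarray, p_core: np.ndarray, p_exp: np.ndarray):
    """Return (weight on expanded model, blended Brier) minimizing OOF Brier."""
    grid = np.linspace(0.0, 1.0, 41)  # 41 evenly spaced weights, as described
    scores = [np.mean(((1 - w) * p_core + w * p_exp - y) ** 2) for w in grid]
    best = int(np.argmin(scores))
    return float(grid[best]), float(scores[best])
```

Because the Brier score is a smooth quadratic in the blend weight, a coarse grid like this is usually enough; the interesting output is whether the optimum lands strictly inside (0, 1), which is what signals that the two models carry complementary information.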
Step 7. Fit Final Models on Full Training Window
Purpose
Once the notebook decides how to use the core and expanded views, it fits final models on the full historical training window.
Parameter choices and why
- Use a simple linear baseline:
  - `SimpleImputer(strategy="median")`
  - `StandardScaler()`
  - `LogisticRegression(C=0.9, max_iter=1500, solver="lbfgs")`
- Fit one model for `core_features` and one for `expanded_features`
Outcome in this notebook
Both final models were fitted successfully on all available tournament training rows.
Why this is a good baseline:
- it handles missing values cleanly
- it keeps coefficients interpretable
- it is stable enough to reveal whether the feature engineering is helping
Key code snippet:

```python
def make_model() -> Pipeline:
    return Pipeline(
        steps=[
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
            ("clf", LogisticRegression(C=0.9, max_iter=1500, solver="lbfgs")),
        ]
    )
```
Step 8. Predict Submission IDs and Save submission.csv
Purpose
The final prediction phase converts Kaggle submission IDs into matchup rows, scores them, blends the two models, clips the probabilities, and writes the output file.
Parameter choices and why
- Automatically select a submission template:
  - prefer `sample_submission.csv` if present
  - otherwise choose between stage files
- Parse ID strings into `Season`, `TeamLow`, and `TeamHigh`
- Blend predictions using the best OOF blend weight
- Clip final probabilities to `[0.02, 0.98]`
Outcome in this notebook
- Selected template: `SampleSubmissionStage1.csv`
- Template rows: `519,144`
- File written: `submission.csv`
The first few predictions looked like this:
```
2022_1101_1102  0.565414
2022_1101_1103  0.445941
2022_1101_1104  0.234558
2022_1101_1105  0.705890
2022_1101_1106  0.599468
```
Quick notes for beginners:
- Kaggle submission IDs often encode the inference keys directly.
- Clipping can reduce pathological overconfidence from a simple baseline.
Key code snippet:

```python
sub = pd.read_csv(template_path, nrows=SUBMISSION_NROWS)
pred_pairs = parse_submission_ids(sub)
pred_df = build_pair_feature_frame(pred_pairs, team_store)

p_core = core_model.predict_proba(pred_df[core_features])[:, 1]
p_exp = exp_model.predict_proba(pred_df[expanded_features])[:, 1]

pred = (1.0 - blend_w) * p_core + blend_w * p_exp
pred = np.clip(pred, *PRED_CLIP)

out = pd.DataFrame({"ID": sub["ID"], "Pred": pred})
out.to_csv(OUTPUT_DIR / "submission.csv", index=False)
```
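`parse_submission_ids` is used in the snippet above but not shown in full. A plausible sketch, assuming the standard `Season_TeamLow_TeamHigh` ID format visible in the prediction preview (the notebook's own implementation may differ):

```python
import pandas as pd

def parse_submission_ids(sub: pd.DataFrame) -> pd.DataFrame:
    # IDs look like "2022_1101_1102": season, lower team ID, higher team ID.
    parts = sub["ID"].str.split("_", expand=True).astype(int)
    parts.columns = ["Season", "TeamLow", "TeamHigh"]
    # The template guarantees the low/high orientation; fail fast if it does not.
    assert (parts["TeamLow"] < parts["TeamHigh"]).all()
    return parts
```

Keeping the orientation assertion here is cheap insurance: if a template ever violated the `TeamLow < TeamHigh` convention, the target definition used in training would no longer match inference.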
Step 9. Submission Sanity Checks
Purpose
A strong model still fails if the submission file is malformed. This last step verifies format, row count, ordering, missing values, and numeric range.
Parameter choices and why
- Check exact column names: `["ID", "Pred"]`
- Check row count against the selected template
- Check ID equality and ordering
- Check that predictions fall inside `[0, 1]`
- Assert no missing predictions
Outcome in this notebook
- Column check passed
- Row count matched: `519,144` vs `519,144`
- IDs were unique
- `Pred` missing count was `0`
- Prediction range was exactly `0.02` to `0.98`
- All submission format checks passed
Key code snippet:

```python
assert list(sub_check.columns) == ["ID", "Pred"]
assert len(sub_check) == len(base_check)
assert sub_check["ID"].equals(base_check["ID"])
assert sub_check["Pred"].between(0, 1).all()
```
Supplementary
A. Why TeamLow vs TeamHigh?
Using sorted team IDs gives every matchup a fixed orientation. That avoids duplicate representations like (A, B) and (B, A), and it makes the target definition stable: Target = 1 means the lower-ID team won.
B. Why surrogate_strength matters
Official seeds are powerful, but they are sparse. The notebook therefore builds a season-and-league-relative strength score for every team, then converts it into pseudo_seed. This lets the model keep a seed-like feature even when an official seed does not exist.
C. Why a simple logistic baseline is still useful
This notebook is a baseline, not a final ensemble. Because the model is simple, improvements in score mostly come from better feature design, better leakage control, and better validation logic. That makes iteration much easier.
D. The Practical Utility of This Notebook
The notebook’s strongest contribution was not modeling novelty. It was practical utility. During a live Kaggle competition, that matters a lot.
The code gives another participant a workflow that is easy to inherit:
- read the competition files from a sane path-resolution layer
- align current-season data with a leakage-safe cutoff
- build one reusable team feature store
- transform it into pairwise training data
- validate season-forward
- generate a submission file that is checked before export
That is exactly the kind of notebook people often fork quietly, adapt locally, and upvote because it saved them time.
E. Repro Checklist
- make sure the competition data files are available in Kaggle or via `MMLM2026_DATA_DIR`
- run the cells in order
- confirm the aligned 2026 cutoff day
- inspect OOF Brier for `core`, `expanded`, and blended predictions
- verify that `submission.csv` passes the final sanity checks
Final Summary
This baseline notebook is best understood as a clean modeling template for March Machine Learning Mania 2026. It starts from leakage-safe season truncation, builds a multi-block team feature store, transforms that store into pairwise tournament examples, validates with season-forward OOF Brier, and finishes with a checked submission file.
The more important lesson, though, is about notebook utility under competition pressure. For this kind of live Kaggle setting, a notebook can be genuinely useful even when the model itself is simple. If it reduces ambiguity, prevents leakage, prevents submission mistakes, and gives others a trustworthy starting point, it has already done something valuable.
That is the lens through which I would read this notebook. Its edge was not “model genius.” Its edge was that it lowered the practical cost of participation for other competitors while remaining reproducible, explicit, and honest about what was actually working. And in the broader competition context, this line of work still connected to a real result: the March Machine Learning Mania 2026 run finished in 92nd place and earned a Silver Medal.