Post

Maze Crawler: Structure Baseline

Maze Crawler: Structure Baseline

Maze Crawler: Structure Baseline

Competition link:
Maze Crawler

Kaggle notebook link:
Maze Crawler: Structure Baseline

Maze Crawler structure baseline cover

Maze Crawler looks like a grid game, but the useful mental model is closer to a small control system under partial observation. The agent is not trying to predict a label. It is trying to keep a convoy alive while a scrolling loss boundary rises from the south, resources appear under limited vision, and every robot action can interfere with every other robot.

The baseline in this notebook is built around one rule:

\[\textbf{survive first, convert energy second, avoid self-inflicted losses always.}\]

That rule explains the structure of the agent. The implementation is not organized around isolated heuristics such as “move north” or “collect crystal.” It is organized as a layered planner:

LayerResponsibility
Mechanics / physicsEncode movement, walls, cooldowns, upkeep, scroll timing, and collision risk.
World modelRemember walls, mines, nodes, robot history, visible resources, and fogged facts.
PathingConvert targets into reachable first steps using bounded BFS over trusted map facts.
StrategyChoose factory tempo, worker corridor support, scout ferry behavior, and optional mine hooks.
Safety plannerNormalize actions, reserve cells, handle forced vacates, and block illegal or self-destructive actions.
Entry pointReturn the Kaggle action dictionary through agent(obs, config).

The important design choice is dependency direction:

1
rules -> memory -> route candidates -> strategic intent -> safety shield -> action dict

The strategy can be tuned. The mechanics should stay boring, deterministic, and rule-correct.

1. The Game In One Page

Maze Crawler is easiest to read as a scrolling survival race with an economy tiebreaker. The factory must keep moving north while the southern boundary rises. Energy matters, but only if it is converted back into usable robot energy before the units carrying or storing it are lost.

Maze Crawler unit roles: factory, scout, worker, and miner

UnitPlain RoleWhat The Baseline Must Respect
FactoryKingLosing it ends the episode, and every build competes with northward tempo.
WorkerEngineerOpens walls and keeps the factory corridor alive.
ScoutEyes and courierFinds crystals quickly, but carried energy must be banked safely.
MinerInvestmentCan create high value, but mine energy must be harvested before it counts.

The two main loops are:

LoopMain QuestionTypical Failure
Survival loopCan the factory keep enough margin above the scroll boundary?Too many build, side-step, or idle turns.
Energy loopCan collected energy become robot energy before it is lost?Scout dies full, mine stores stranded energy, transfer creates blockers.

The factory margin is the first state variable I want to see in a replay:

\[m_t = \operatorname{row}(\text{factory}_t) - \operatorname{southBound}_t\]

When (m_t) becomes small, almost every other ambition should be demoted. The agent can still collect value, but only through actions that do not weaken the route floor.

2. Why A Greedy Agent Fails

A greedy baseline is tempting:

1
move north, collect nearby crystals, build whenever affordable

That is enough to produce legal-looking actions, but it breaks as soon as rule side effects accumulate. The issue is not that greedy choices are always bad. The issue is that greedy choices rarely price the tempo, liquidity, and collision consequences of the action.

Greedy play versus survival-first play in Maze Crawler

Greedy HabitHidden CostBaseline Response
Build whenever affordableFactory loses a north move.Use move-first gates and margin bands.
Chase every crystalWorkers and scouts abandon survival jobs.Score targets by role and pressure state.
Transfer whenever adjacentSource can become a zero-energy blocker.Use transfer ledgers and corridor guards.
Transform every nodeMine energy may never return to robots.Price lifetime, carrier ETA, and recovery.
Trust only current visionFog hides remembered walls, nodes, and mines.Keep persistent world memory.
Side-step every wallFactory survives locally but loses tempo.Prefer route search and careful jump use.

The operational principle is:

\[\textbf{take useful actions only after pricing their rule side effects.}\]

That is why the baseline contains more validation than a starter bot. Most game losses in this kind of environment are not caused by failing to see a fancy tactic. They are caused by illegal moves, stale memory, friendly blockers, or spending the factory turn on something that should have waited.

3. Module Map

The notebook writes a standalone main.py agent. The code is long, but the architectural map is compact:

ModuleCore QuestionWhy It Exists
Shared setupWhat vocabulary does every layer use?Unit types, wall bits, directions, action names, profile knobs, and dataclasses.
Mechanics / physicsWhat does the game engine allow?Movement, walls, cooldowns, scroll timing, combat danger, and upkeep affordability.
World modelWhat do we know after fog hides cells?Durable map memory, resources, mines, visible robots, and assignments.
Pathing and targetsWhich useful goals are reachable?Bounded BFS, factory route probes, frontier discovery, crystals, unload cells, and wall jobs.
Strategy and safetyWhich legal action should each robot take?Factory floor selection, worker vanguard, scout ferry, mine hooks, reservations, and normalization.
Entry pointHow does Kaggle call the agent?compute_actions(...), agent(obs, config), and per-player state storage.

The code intentionally separates what is true about the board from what the current policy prefers. This makes tuning less dangerous. Changing scout thresholds should not accidentally change wall legality. Changing worker behavior should not rewrite scroll inference.

Show notebook snippet: shared dataclasses
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
@dataclass(frozen=True)
class Robot:
    uid: str
    rtype: int
    col: int
    row: int
    energy: int
    owner: int
    move_cd: int = 0
    jump_cd: int = 0
    build_cd: int = 0

    @property
    def pos(self):
        return (self.col, self.row)


@dataclass
class Target:
    """High-level intent candidate before primitive action commitment."""

    kind: str
    pos: tuple[int, int]
    score: float


@dataclass(frozen=True)
class RouteQuality:
    """Factory route summary used to price tempo before building units."""

    action: str | None
    next_pos: tuple[int, int] | None
    exists: bool
    uses_jump: bool
    north_gain: int
    margin_surplus: int
    blocked: bool


@dataclass(frozen=True)
class SafetyDecision:
    """Pure factory-destination evaluation result."""

    ok: bool
    forced_actions: dict
    forced_reserved_next: dict
    reserved_next: set
    sacrifice_uids: set
    reason: str = ""

4. Mechanics / Physics

In this notebook, “physics” means the deterministic rules that decide whether an action is possible or dangerous. For Maze Crawler, the crucial mechanics are:

MechanicConsequence
Scrolling boundaryA locally useful action can still be fatal if it spends factory tempo.
Walls as edge factsMovement legality depends on the wall bit of the current cell and sometimes the opposite bit of the neighbor.
CooldownsA good route action is useless if the robot cannot move yet.
Upkeep before actionA robot that cannot pay upkeep should not propose a non-idle action.
Combat rankEnding on an enemy cell can destroy the robot.
Jump cooldownJump can save tempo, but spending it casually can remove the escape option later.

The danger band is dynamic. As the scroll speed ramps, a fixed “safe margin” becomes too optimistic. The baseline therefore expands the caution gap with step progress:

\[d_t = d_0 + \left\lfloor b \cdot \min\left(1,\frac{t}{T_{\text{ramp}}}\right) \right\rfloor\]

where (d_0) is the base danger gap, (b) is the ramp bonus, and (T_{\text{ramp}}) is the scroll ramp duration.

Show notebook snippet: scroll and wall mechanics
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
def action_ready(cd):
    # Cooldowns tick before action validation.
    return int(cd or 0) <= 1


def dynamic_danger_gap_for_step(step, config):
    ramp = int(cfg(config, "scrollRampSteps", 400))
    progress = min(1.0, max(0.0, float(step) / max(1, ramp)))
    return SCROLL_DANGER_GAP + int(SCROLL_DANGER_RAMP_BONUS * progress)


def step_pos(pos, direction, distance=1):
    dc, dr, _ = DIRS[direction]
    return pos[0] + dc * distance, pos[1] + dr * distance


def known_edge_blocked(world, pos, direction):
    """Return True if either known side of an edge has the wall bit set."""
    if direction not in DIRS:
        return False
    _, _, bit = DIRS[direction]
    wall = world.wall_at(pos)
    if wall is not None and (wall & bit):
        return True
    nxt = step_pos(pos, direction)
    if not world.in_bounds(nxt):
        return False
    opposite_wall = world.wall_at(nxt)
    if opposite_wall is None:
        return False
    opposite_bit = DIRS[OPPOSITE[direction]][2]
    return bool(opposite_wall & opposite_bit)

The key is that mechanics do not rank strategic options. They simply answer questions such as:

1
2
3
4
5
Can this robot move?
Can it pay after upkeep?
Is the edge blocked?
Is the destination occupied?
Would the robot be crushed?

That boundary keeps the strategy layer honest.

5. World Model

The raw observation is only the current visible slice of the world. The agent needs a memory layer because pathing and economy decisions depend on facts that may disappear into fog.

The world model stores:

MemoryWhat It Does
Wall memoryKeeps seen wall bits and synchronizes reciprocal edge facts.
Node memoryRemembers discovered mineable nodes.
Mine memoryKeeps remembered mines and estimates hidden generation.
Visible stateTracks current crystals, enemies, friendly robots, and visible cells.
Assignment memoryReduces repeated target switching.
Robot historyHelps identify movement patterns and stale units.

This is the central split:

1
2
WorldModel asks: what do we know?
Strategy asks: what should we do with it?

The world model also uses symmetry. When a wall is observed on one side of the map, the mirrored wall can be inserted as a default memory fact. A direct observation later still wins. That is a useful compromise: the agent benefits from map structure without pretending that inferred facts are as strong as fresh vision.

Show notebook snippet: memory update and reciprocal walls
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
class WorldModel:
    """Observation normalizer and durable memory."""

    def __init__(self, obs, config, state_store=None):
        self.obs = to_plain_dict(obs)
        self.config = to_plain_dict(config)
        self.state_store = STATE_BY_PLAYER if state_store is None else state_store
        self.player = int(self.obs.get("player", 0))
        self.width = int(cfg(self.config, "width", 20))
        self.height = int(cfg(self.config, "height", 20))
        self.step = self._infer_step()
        self.south = self._infer_south_bound()
        self.north = self._infer_north_bound()
        self.state = self._state_for_player()

        self.own = {}
        self.enemies = {}
        self.own_positions = {}
        self.enemy_positions = {}
        self.visible_crystals = {}
        self.visible_nodes = set()
        self.visible_cells = set()
        self.factory = None

        self.update_memory()

    def update_memory(self):
        self._read_robots()
        self._read_visible_cells()
        self._read_walls()
        self._read_resources()
        self._prune_memory()

    def _read_walls(self):
        walls = self.obs.get("walls") or []
        observed = {}
        for idx, wall in enumerate(walls):
            wall = int(wall)
            if wall < 0:
                continue
            pos = (idx % self.width, self.south + idx // self.width)
            if self.in_bounds(pos):
                observed[pos] = wall
                self.state["known_walls"][pos] = wall

        # A wall is an edge fact. If one side is observed and the opposite
        # cell is remembered, synchronize the reciprocal bit.
        for pos, wall in observed.items():
            for direction, (_, _, bit) in DIRS.items():
                nxt = step_pos(pos, direction)
                if not self.in_bounds(nxt) or nxt in observed:
                    continue
                if nxt not in self.state["known_walls"]:
                    continue
                opposite_bit = DIRS[OPPOSITE[direction]][2]
                neighbor_wall = int(self.state["known_walls"][nxt])
                if wall & bit:
                    neighbor_wall |= opposite_bit
                else:
                    neighbor_wall &= ~opposite_bit
                self.state["known_walls"][nxt] = neighbor_wall

The mine logic is particularly important. A mine outside vision may still be generating energy, but its last_seen should not be refreshed unless the cell is actually visible. Otherwise the planner starts treating old mine estimates as fresh evidence.

6. Pathing And Candidate Targets

Pathing converts high-level intent into a first primitive step. The baseline uses bounded BFS because the action timeout matters:

1
2
if a target is too expensive to search,
degrade to local safe motion instead of missing the turn deadline

The pathing layer produces:

Pathing BlockPurpose
BFS first stepFind a legal first direction toward a target.
Factory route probeTest whether the factory has a north-gaining route before reserving cells.
Jump-aware routeAsk whether a jump can preserve tempo when walls block ordinary movement.
Frontier detectionFind remembered cells adjacent to unknown space.
Target enumerationGenerate crystals, unload cells, wall jobs, node jobs, mine harvests, and vanguard positions.
Sticky orderingAvoid target churn when multiple options are close.

The pathing function avoids friendly-occupied cells by default. There is one important exception: a miner that will TRANSFORM leaves its current cell before movement, so that cell can be a valid same-turn destination for a carrier.

Show notebook snippet: bounded BFS
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
def bfs_first_step_and_distance(
    world,
    robot,
    goals,
    reserved_next,
    max_nodes=MAX_BFS_NODES,
    allow_occupied_goal=False,
):
    goals = set(goals)
    if not goals:
        return None, None, None
    start = robot.pos
    if start in goals:
        return "IDLE", start, 0

    queue = deque([start])
    first_dir = {start: None}
    distance = {start: 0}
    seen = {start}
    nodes = 0

    while queue and nodes < max_nodes:
        cur = queue.popleft()
        nodes += 1
        for direction, nxt in world.neighbors(cur, allow_unknown_target=False):
            if nxt in seen:
                continue
            if nxt != start and nxt in reserved_next:
                continue
            if allow_occupied_goal and nxt in goals:
                if crush_danger(robot, world.enemy_at(nxt)):
                    continue
                return direction if cur == start else first_dir[cur], nxt, distance[cur] + 1
            if nxt != start and nxt in world.own_positions:
                continue
            if crush_danger(robot, world.enemy_at(nxt)):
                continue
            seen.add(nxt)
            first_dir[nxt] = direction if cur == start else first_dir[cur]
            distance[nxt] = distance[cur] + 1
            if nxt in goals:
                return first_dir[nxt], nxt, distance[nxt]
            queue.append(nxt)
    return None, None, None

This is a good example of the baseline’s style. Pathing does not decide that a crystal is worth chasing. It only reports whether a route exists and what the first step would be. Target scoring lives one layer above.

7. Resource Conversion: Scout, Transfer, Miner

Energy is not a score until it becomes useful. This is why the baseline treats crystal collection, transfer, and mine transform as separate contracts.

Scout Ferry

Scouts are useful because they expand vision and collect crystals quickly. But a scout full of energy far from the factory is a liability. The policy therefore asks whether the scout still has a good forward job. If not, high carried energy triggers unload behavior.

1
2
if scout energy is high and forward value is weak:
    return toward factory unload cells

The threshold is deliberately explicit:

KnobMeaning
SCOUT_RETURN_ENERGYSoft return threshold.
SCOUT_FORCE_RETURN_ENERGYHard return threshold.
UNLOAD_FRACTIONFraction of capacity that indicates banking pressure.
ENABLE_MARGIN_SURPLUS_SCOUTAllows a scout only when route margin is healthy.

Transfer

Transfer looks harmless because no unit moves. But transfer changes two things at once:

  1. where the energy is,
  2. whether the source robot becomes an empty blocker.

Unsafe transfer versus safer transfer in Maze Crawler

A safe transfer must therefore consider liquidity and geometry. Funding the next factory build can be worth it. Creating a zero-energy robot directly in the factory corridor is usually not.

The baseline also avoids over-liquidation. If the factory already committed a build this turn, a later transfer cannot fund that build, and the next build will be delayed by cooldown anyway. So the transfer check looks at planned actions, not only current energy.

Miner Transform

A miner is an investment. Transforming a node into a mine is not the reward. The reward appears only if mine energy returns to the robot economy before the scroll, distance, or carrier constraints make it useless.

Good transform versus risky transform in Maze Crawler

The active profile keeps MINER_TARGET_COUNT = 0, so mine hooks are present but conservative. That is intentional. Before adding a stronger mining economy, the survival and route-floor behavior should be measured cleanly.

8. Factory Tempo

The factory is the main budget. Every non-movement factory action has an opportunity cost because the southern boundary keeps advancing.

This baseline builds two factory candidates:

  1. a floor candidate that preserves survival,
  2. a complete candidate that may spend surplus on support or scouting.

The selector is:

\[a_F = \begin{cases} a_{\text{floor}}, & \text{route blocked or margin pressure is high}\\ a_{\text{complete}}, & \text{route healthy and margin surplus available} \end{cases}\]

Scout-first opening versus worker-first opening in Maze Crawler

There is no universal first build. An open corridor can support scout-first search. A blocked or low-branching start may need worker-first wall control. The baseline tries to make that decision from route quality rather than from a fixed opening script.

Side-step versus jump north tradeoff in Maze Crawler

Jump is the same kind of tradeoff. It can recover north tempo, but it spends cooldown. The baseline therefore uses jump more freely under emergency pressure and more conservatively when a side route or worker wall job can preserve the long game.

Show notebook snippet: factory floor versus complete candidate
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def factory_policy(self, robot):
    wall = self.world.current_wall(robot)
    gap = robot.row - self.world.south
    danger = self.dynamic_danger_gap()
    emergency = gap <= danger

    route_quality = self.factory_route_quality(
        robot,
        emergency=emergency,
        apply=False,
    )
    floor = self.factory_floor_candidate(robot, wall, route_quality, emergency)
    if self.factory_floor_pressure(robot, route_quality):
        return floor

    complete = self.factory_complete_candidate(robot, wall, route_quality, emergency)
    if complete is not None:
        return complete
    return floor


def factory_floor_pressure(self, robot, route_quality):
    return (
        route_quality.blocked
        or route_quality.margin_surplus <= FACTORY_FLOOR_PRESSURE_SURPLUS
        or self.front_worker_is_doing_wall_job(robot)
        or (
            route_quality.uses_jump
            and route_quality.margin_surplus <= FACTORY_FLOOR_PRESSURE_SURPLUS + 3
        )
    )

This is the core of the profile named:

1
PROFILE = "baseline_route_floor_hybrid"

The profile does not try to maximize short-term energy. It tries to keep a reliable route floor, then uses surplus margin to add economy.

9. Strategy And Safety Planner

The strategy layer decides intent. The safety planner decides whether the intent can actually be committed.

This distinction matters because many failures occur after a good high-level idea has been translated into a bad primitive action. For example:

Good IntentBad Primitive Failure
Move factory northFriendly worker blocks the destination.
Bank scout energyScout transfer leaves an empty body in the corridor.
Open a wallWorker cannot pay upkeep plus wall cost.
Harvest mineCarrier path collides with another planned unit.
Jump out of dangerJump cooldown is unavailable or destination is unsafe.

The planner orders robots by urgency. Factory and robots near the scroll line are planned first because their reservations define constraints for less urgent units.

Show notebook snippet: planning order, reservations, and normalization
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
def plan(self):
    danger = self.dynamic_danger_gap()
    self.planned_special_actions = self.preplan_special_actions()

    robots = sorted(
        self.world.own.values(),
        key=lambda robot: (
            0 if robot.rtype == FACTORY else 1 if robot.row - self.world.south <= danger else 2,
            0 if robot.rtype == WORKER else 1 if robot.rtype == MINER else 2 if robot.rtype == SCOUT else 3,
            robot.row,
            robot.uid,
        ),
    )

    for robot in robots:
        if robot.uid in self.forced_actions:
            action, next_pos = self.forced_actions[robot.uid]
            self.commit(robot, action, next_pos)
            continue
        if robot.uid in self.planned_special_actions:
            action, next_pos = self.planned_special_actions[robot.uid]
            self.commit(robot, action, next_pos)
            continue
        if self.expired():
            self.commit(robot, "IDLE", robot.pos)
            continue
        action, next_pos = self.policy_action(robot)
        self.commit(robot, action, next_pos)

    self.world.state["last_actions"] = dict(self.actions)
    return self.actions


def commit(self, robot, action, next_pos):
    action, next_pos = self.normalize_action(robot, action)
    self.actions[robot.uid] = action

    if action in DIRS or action.startswith("JUMP_"):
        self.reserved_next.add(next_pos)
    elif action in {"BUILD_SCOUT", "BUILD_WORKER", "BUILD_MINER"}:
        self.reserved_next.add(robot.pos)
        self.reserved_next.add(step_pos(robot.pos, "NORTH"))
    elif action == "TRANSFORM":
        # The miner disappears before movement, so the cell is not reserved.
        pass
    else:
        self.reserved_next.add(robot.pos)

The safety layer also normalizes invalid actions to IDLE. That may sound passive, but it is much better than returning an illegal action dictionary and letting the engine decide the failure mode.

10. Action Emitter

The Kaggle-facing entry point is deliberately small. The main planner can be tested through compute_actions(...), while agent(obs, config) simply supplies the shared state store.

Show notebook snippet: competition entry point
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
def compute_actions(obs, config, started, state_store=None, strategy_cls=None):
    deadline = started + SOFT_ACT_DEADLINE
    try:
        world = WorldModel(obs, config, state_store=state_store)
        strategy_type = Strategy if strategy_cls is None else strategy_cls
        strategy = strategy_type(world, started, deadline)
        actions = strategy.plan()
        return {uid: action for uid, action in actions.items() if uid in world.own}
    except Exception:
        try:
            player = int(to_plain_dict(obs).get("player", 0))
            store = STATE_BY_PLAYER if state_store is None else state_store
            store[player] = fresh_state(player)
        except Exception:
            pass
        return {}


def agent(obs, config):
    return compute_actions(obs, config, time.perf_counter())

Two details are worth keeping:

DetailReason
state_store is injectableTests can isolate memory without changing production behavior.
strategy_cls is injectableExperiments can swap policy behavior while keeping mechanics and safety fixed.

That makes the notebook a baseline platform rather than only a single bot.

11. Rule-Level Checks

Single-game score is noisy, so the notebook includes rule-level checks before score interpretation. These checks are not meant to prove that the strategy is strong. They protect invariants that are easy to break while tuning.

Check FamilyWhat It Guards
Factory survivalRoute-first cooldown behavior, floor-candidate pressure behavior, safe jump handling.
Worker engineeringCorrect wall direction, corridor opening, vanguard movement, guarded refuel.
Scout economyOne-scout target count, active scout replacement, high-energy return behavior.
Mine economyTransform timing, hidden mine energy, carrier reachability, same-turn harvest hooks.
Transfer policyNo careless zero-energy blockers, planned-build ledger respected, route-critical support.
Standalone fileCompile success and synchronized main.py / submission.py source.

A failed rule check should stop score interpretation. Otherwise it is too easy to mistake a broken mechanic for a weak strategy.

The notebook also copies the generated agent into a standalone submission file:

1
Path("submission.py").write_text(Path("main.py").read_text())

Then it verifies that the standalone file remains importable and synchronized.

12. Evaluation Metrics

For actual strategy comparison, the right unit is a paired seed comparison. For each seed:

\[d_i = R_i^{\text{candidate}} - R_i^{\text{baseline}}\]

Then summarize:

\[CI_{95} \approx \bar d \pm 1.96 \frac{s_d}{\sqrt{N}}\]

But the reward delta is only the first line of evidence. The replay metrics should explain the failure shape.

Failure ShapeFirst Metrics To CheckLikely Adjustment
Factory dies earlyfactory_min_margin, factory_death_step, factory_build_when_route_existsLet the floor candidate dominate more often.
Route exists but progress is weakfactory_idle_when_route_exists, side-step streaks, route cooldown turnsImprove route scoring or worker vanguard support.
Scout underperformsscout_build_margin_surplus, scout_energy_ge_90_turns, scout_transfer_countRaise scout surplus threshold or disable scout in floor runs.
Worker appears too earlyfactory_build_count, first worker timingBuild worker only when long route is blocked or near a dead end.
Worker cannot sustain wallsworker_energy_below_threshold, failed wall-removal turns, refuel countCompare route-critical refuel budgets.
Jump is missing laternoncritical_north_wall_jump_count, jump_cd_unavailable_in_danger_countKeep noncritical jumps conservative.
Transfer hurts scorefactory_worker_refuel_count, estimated overflow, worker wall action after refuelTighten refuel purpose gates or overflow budget.
Many actions normalize to idlenormalized_to_IDLE_countInspect legality, cooldowns, reservations, and target conflicts.

The diagnosis tree is intentionally simple:

1
2
3
4
5
6
7
Did factory margin collapse?
    yes -> inspect whether floor candidate was overridden
    no  -> did the worker fail to keep the route open?
        yes -> tune route-critical refuel and worker wall targeting
        no  -> did scout energy return safely?
            no  -> tighten scout surplus / return thresholds
            yes -> consider later economy hooks

That is the reason this baseline is structured rather than only heuristic. If the route floor is weak, tune survival. If the factory survives but the score is low, inspect resource conversion. If both are fine but the score plateaus, then it is time to add a stronger economy layer.

13. What This Baseline Establishes

The notebook establishes a practical rule-based foundation for Maze Crawler:

FoundationWhat It Enables
Rule-correct mechanicsStrategy changes do not need to rediscover cooldown, wall, upkeep, and collision rules.
Durable world memoryThe agent can plan from fogged facts without treating stale estimates as fresh vision.
Bounded pathingThe planner can search useful routes without risking the action timeout.
Factory floor policySurvival has a stable fallback before economy is considered.
Safety shieldGood high-level intents are prevented from becoming illegal primitive actions.
Rule checksExperiments can be filtered before noisy score comparisons.

The result is not a final grandmaster agent. It is a structured baseline that makes later improvements measurable.

The next natural experiments are:

  1. tune the factory floor versus complete candidate selector,
  2. compare scout surplus thresholds on paired seeds,
  3. activate miner hooks only after route survival is stable,
  4. add replay metrics that separate wall-block losses from resource-conversion losses,
  5. test alternative worker vanguard policies without changing mechanics.

The central lesson is the same as the opening rule:

In Maze Crawler, economy is real only after survival, pathing, and safety have already done their jobs.

This post is licensed under CC BY 4.0 by the author.