Chinese Checkers AlphaZero

A reinforcement learning implementation of AlphaZero for Chinese Checkers with an interactive web-based game interface

Python · PyTorch · Reinforcement Learning · MCTS · React · WebSocket

A from-scratch implementation of DeepMind’s AlphaZero algorithm applied to Chinese Checkers, featuring self-play training, Monte Carlo Tree Search, and a polished web interface for playing against trained AI agents or friends.


Overview

This project explores the application of modern reinforcement learning techniques to Chinese Checkers—a classic strategy game with a branching factor that makes traditional minimax approaches computationally intractable.

The implementation follows the AlphaZero paradigm: a neural network learns to evaluate board positions and suggest moves entirely through self-play, without any human game knowledge beyond the rules. The result is a family of AI agents at various skill levels, from beginner-friendly to highly competitive.

Why Chinese Checkers?

Chinese Checkers presents unique challenges compared to games like Chess or Go:

  • High branching factor: A single marble can chain multiple jumps, creating hundreds of possible moves per turn
  • Multi-agent dynamics: Traditional Chinese Checkers supports 2-6 players, requiring different strategic considerations
  • Asymmetric objectives: Players race to opposite corners, creating interesting tension between offensive progress and defensive blocking
  • Sparse rewards: Games can last 50+ moves with the only reward signal at game end

Technical Architecture

Neural Network Design

The policy-value network uses a ResNet-inspired architecture optimized for the hexagonal board structure:

Input Layer (Board State Encoding)
    ├── 121 hexagonal positions
    ├── Current player encoding
    ├── Move history (last 8 states)
    └── Valid move mask
                 │
                 ▼
Residual Tower (10 blocks)
    ├── 128 filters per conv layer
    ├── Batch normalization
    └── ReLU activation
                 │
        ┌────────┴────────┐
        ▼                 ▼
   Policy Head        Value Head
        │                 │
        ▼                 ▼
   Move Probs      Win Probability
   (softmax)       (tanh: -1 to 1)

Key Design Decisions:

  • Hexagonal convolutions: Custom convolutional kernels that respect the 6-neighbor topology of the Chinese Checkers board
  • Jump sequence encoding: Moves are encoded as (start, end) pairs, with the network learning to evaluate multi-jump sequences
  • Opponent modeling: For 2-player mode, the value head predicts win probability; for multiplayer, it outputs a ranking distribution
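
To make the shape of the network concrete, here is a minimal PyTorch sketch of a policy-value net with this layout. It is illustrative only: the real project uses custom hexagonal convolutions and its own move encoding, while this sketch substitutes ordinary 3×3 convolutions on a square embedding of the board, and the names and layer sizes (PolicyValueNet, in_planes, num_moves) are assumptions rather than the project's code.

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                      # residual skip connection

class PolicyValueNet(nn.Module):
    def __init__(self, in_planes=18, channels=128, blocks=10, board=17, num_moves=121 * 121):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        # Policy head: logits over (start, end) move pairs
        self.policy = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(2 * board * board, num_moves))
        # Value head: scalar win estimate squashed to [-1, 1]
        self.value = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(board * board, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return F.log_softmax(self.policy(h), dim=1), self.value(h)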

Monte Carlo Tree Search (MCTS)

The MCTS implementation includes several enhancements over vanilla UCT:

class MCTS:
    def search(self, root_state, num_simulations=800):
        """
        Perform MCTS with neural network guidance.

        Selection:  UCB1 with neural network prior
        Expansion:  Expand all children, evaluate with NN
        Backup:     Update Q-values along path
        """
        for _ in range(num_simulations):
            node = self.select(root_state)
            value = self.expand_and_evaluate(node)
            self.backup(node, value)

        return self.get_policy(root_state)

MCTS Enhancements:

Feature               Purpose
Dirichlet noise       Adds exploration at root node during training
Progressive widening  Limits branching in high-action states
Virtual losses        Enables parallel tree search
Transposition table   Caches evaluations for repeated positions
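
To show how the network prior and the root noise enter the search, here is a hedged sketch of the PUCT selection score and the Dirichlet mixing step; the node attributes (visits, value_sum, prior) and the constants are illustrative, not the project's exact values.

import math
import numpy as np

def puct_score(parent_visits, child, c_puct=1.5):
    """AlphaZero-style UCB: mean value plus a prior-weighted exploration bonus."""
    q = child.value_sum / child.visits if child.visits > 0 else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u

def add_root_dirichlet_noise(priors, epsilon=0.25, alpha=0.3):
    """Mix Dirichlet noise into the root priors so self-play keeps exploring."""
    noise = np.random.dirichlet([alpha] * len(priors))
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]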

Training Pipeline

The self-play training loop follows the AlphaZero methodology:

┌───────────────────────────────────────────────────┐
│                   Training Loop                   │
├───────────────────────────────────────────────────┤
│                                                   │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │ Self-Play │───▶│  Replay   │───▶│ Training  │  │
│  │  Workers  │    │  Buffer   │    │  Worker   │  │
│  └───────────┘    └───────────┘    └───────────┘  │
│        ▲                                 │        │
│        │         ┌────────────┐          │        │
│        └─────────│   Model    │◀─────────┘        │
│                  │ Checkpoint │                   │
│                  └────────────┘                   │
│                        │                          │
│                        ▼                          │
│                ┌──────────────┐                   │
│                │  Evaluation  │                   │
│                │ (Elo Rating) │                   │
│                └──────────────┘                   │
│                                                   │
└───────────────────────────────────────────────────┘
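
The training worker's inner loop reduces to the standard AlphaZero objective: cross-entropy of the network policy against the MCTS visit distribution, plus mean-squared error of the value head against the final game outcome. A hedged sketch of one update (the batch layout and names are assumptions):

import torch.nn.functional as F

def train_step(net, optimizer, batch):
    """One gradient step on a replay-buffer batch of (state, mcts_policy, outcome)."""
    states, target_pi, target_z = batch            # tensors sampled from the replay buffer
    log_pi, value = net(states)                    # policy as log-probs, value in [-1, 1]
    policy_loss = -(target_pi * log_pi).sum(dim=1).mean()   # cross-entropy vs. visit counts
    value_loss = F.mse_loss(value.squeeze(1), target_z)     # predicted vs. actual outcome
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()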

Training Statistics:

  • Self-play games: ~500,000 games over training
  • Training iterations: 200 iterations, ~2,500 games per iteration
  • Hardware: 4x RTX 3090 for self-play, 1x A100 for training
  • Total training time: ~72 hours
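
Checkpoint strength is tracked with Elo, presumably from head-to-head evaluation games; for reference, the standard expected-score formula and rating update look like this (the K-factor is illustrative):

def elo_update(rating_a, rating_b, score_a, k=32):
    """score_a is 1 for a win, 0.5 for a draw, 0 for a loss (from player A's side)."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)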

Game Features

AI Difficulty Levels

The trained models are checkpointed at various stages to provide a range of opponents:

Level  Name          Elo    MCTS Sims  Description
1      Novice        800    50         Early training checkpoint. Makes obvious mistakes, good for learning.
2      Casual        1200   100        Plays reasonable moves but misses complex tactics.
3      Intermediate  1600   200        Solid positional play, occasional brilliant moves.
4      Advanced      2000   400        Strong tactical awareness, efficient pathing.
5      Expert        2400   800        Full-strength model. Will exploit any mistake.
6      Master        2600+  1600       Extended search. Nearly optimal play.
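
Under the hood, a difficulty level is essentially a choice of checkpoint plus an MCTS simulation budget. A hedged sketch of how the table above could be expressed as configuration (the checkpoint file names are made up for illustration):

# Each level pairs a training checkpoint with an MCTS simulation budget.
DIFFICULTY_PRESETS = {
    "novice":       {"checkpoint": "early.pt",  "simulations": 50},
    "casual":       {"checkpoint": "mid_1.pt",  "simulations": 100},
    "intermediate": {"checkpoint": "mid_2.pt",  "simulations": 200},
    "advanced":     {"checkpoint": "late.pt",   "simulations": 400},
    "expert":       {"checkpoint": "final.pt",  "simulations": 800},
    "master":       {"checkpoint": "final.pt",  "simulations": 1600},  # same net, deeper search
}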

Game Modes

vs AI (Single Player)

  • Select difficulty from Novice to Master
  • Undo/redo moves for learning
  • Move hints showing AI’s top 3 suggestions
  • Post-game analysis with AI evaluation graph

Local Multiplayer (Pass & Play)

  • 2-player mode on a single device
  • Automatic board rotation on turn change
  • Optional move timer (30s, 1min, 2min, unlimited)
  • Win animations and statistics

Online Multiplayer (planned)

  • Real-time matches via WebSocket
  • Ranked matchmaking with Elo system
  • Spectator mode for ongoing games

User Interface Design

The UI draws inspiration from modern game platforms like chess.com and from classic Flash/IO games, prioritizing clear visuals, responsive interaction, and satisfying feedback.

Visual Design Principles

┌─────────────────────────────────────────────────────────────┐
│  ⚙️ Settings    Chinese Checkers AlphaZero    🏆 Stats      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                    ┌─────────────┐                         │
│                    │   Board     │                         │
│                    │   Area      │                         │
│                    │             │                         │
│                    │    ⬡ ⬡ ⬡    │                         │
│                    │   ⬡ ⬡ ⬡ ⬡   │                         │
│                    │  ⬡ ⬡ ⬡ ⬡ ⬡  │                         │
│                    │             │                         │
│                    └─────────────┘                         │
│                                                             │
│  ┌──────────┐                          ┌──────────┐        │
│  │ Player 1 │                          │ Player 2 │        │
│  │ ●●●●●●   │                          │ ○○○○○○   │        │
│  │ Your Turn│                          │ AI: Exp  │        │
│  └──────────┘                          └──────────┘        │
│                                                             │
│         [↩️ Undo]  [💡 Hint]  [🔄 New Game]                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Core UI Components

Game Board

  • Smooth, responsive hexagonal grid
  • Marble pieces with satisfying physics-based animations
  • Valid move highlighting on piece selection
  • Jump path visualization for multi-hop moves
  • Subtle board texture with depth shadows

Piece Interactions

  • Click-to-select, click-to-move (beginner-friendly)
  • Drag-and-drop with snap-to-position
  • Hover states showing piece ownership
  • Selection glow effect with team color

Animations & Feedback

  • Marble sliding animations with easing curves
  • Jump chain animations (piece arcs over jumped positions)
  • Capture/arrival celebrations (subtle particle effects)
  • Turn transition with smooth board rotation (2P mode)
  • Win sequence with fireworks and stats summary

Sound Design

  • Satisfying “click” on piece selection
  • Marble sliding sounds with pitch variation
  • Distinct “hop” sound for jumps
  • Ambient background music (toggleable)
  • Victory fanfare

Settings Panel

┌─────────────────────────────────────┐
│         ⚙️ Settings                 │
├─────────────────────────────────────┤
│                                     │
│  🎮 Game                            │
│  ├─ AI Difficulty    [Advanced ▼]  │
│  ├─ Move Timer       [Off      ▼]  │
│  ├─ Show Hints       [●]           │
│  └─ Confirm Moves    [ ]           │
│                                     │
│  🎨 Appearance                      │
│  ├─ Board Theme      [Classic  ▼]  │
│  ├─ Piece Style      [Glass    ▼]  │
│  ├─ Animations       [●]           │
│  └─ Dark Mode        [●]           │
│                                     │
│  🔊 Audio                           │
│  ├─ Sound Effects    [●]  🔊━━━━○  │
│  ├─ Music            [ ]  🔊━━○━━  │
│  └─ Voice Lines      [ ]           │
│                                     │
│  📊 Stats                           │
│  ├─ Games Played: 47               │
│  ├─ Win Rate: 62%                  │
│  └─ Best Win: vs Expert            │
│                                     │
│           [Reset Stats]             │
│                                     │
└─────────────────────────────────────┘

Theme Options

Theme    Description
Classic  Traditional wooden board with glass marbles
Neon     Dark background with glowing pieces (IO game style)
Minimal  Clean whites and blacks, subtle shadows
Nature   Earthy tones, stone-like pieces
Retro    Pixel art style, 8-bit sound effects

Implementation Details

Tech Stack

Frontend

  • React 18 with TypeScript
  • Canvas/WebGL for board rendering (via PixiJS)
  • Framer Motion for UI animations
  • Zustand for state management
  • Howler.js for audio

Backend

  • Python FastAPI for game server
  • PyTorch for model inference
  • Redis for session management
  • WebSocket for real-time communication
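
For the real-time layer, a minimal sketch of what a FastAPI WebSocket game endpoint could look like; the route, message shape, and the GameSession helper are assumptions for illustration, not the project's actual API:

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/game/{game_id}")
async def game_socket(websocket: WebSocket, game_id: str):
    await websocket.accept()
    session = GameSession.get_or_create(game_id)      # hypothetical helper, e.g. backed by Redis
    while True:
        msg = await websocket.receive_json()          # e.g. {"type": "move", "from": 42, "to": 57}
        if msg["type"] == "move":
            new_state = session.apply_move(msg["from"], msg["to"])
            await websocket.send_json({"type": "state", "board": new_state})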

AI Engine

  • ONNX Runtime for optimized inference
  • Web Worker isolation for non-blocking AI computation
  • Configurable search depth per difficulty level
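
To serve the model through ONNX Runtime, the trained PyTorch network has to be exported first; a hedged sketch of that export step, assuming the illustrative PolicyValueNet from earlier (file name, input shape, and tensor names are likewise illustrative):

import torch

net = PolicyValueNet()                                # in practice, load a trained checkpoint
net.eval()
dummy = torch.zeros(1, 18, 17, 17)                    # one batch of encoded board planes (illustrative shape)
torch.onnx.export(
    net, dummy, "checkers_policy_value.onnx",
    input_names=["board"], output_names=["policy", "value"],
    dynamic_axes={"board": {0: "batch"}},             # allow batched inference at serve time
)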

Performance Optimizations

// Move generation with bitboard optimization
class MoveGenerator {
  // Precomputed step and jump tables for instant lookup
  private singleSteps: Map<number, Move[]>;
  private jumpTable: Map<number, number[]>;

  generateMoves(state: GameState): Move[] {
    const moves: Move[] = [];

    // Single steps: O(1) lookup per piece
    for (const piece of state.currentPlayerPieces) {
      moves.push(...(this.singleSteps.get(piece) ?? []));
    }

    // Jump chains: BFS over the jump table with visited tracking
    for (const piece of state.currentPlayerPieces) {
      this.generateJumpChains(piece, moves, new Set());
    }

    return moves;
  }
}

Optimization Results:

Metric            Before  After
Move generation   12ms    0.3ms
Board evaluation  8ms     0.1ms
AI move (Expert)  5s      1.2s
Memory usage      180MB   45MB

Development Roadmap

Completed

  • Core AlphaZero implementation
  • Self-play training pipeline
  • 6 difficulty levels with distinct personalities
  • Web-based game interface
  • Local 2-player mode
  • Move hints and undo system
  • Settings persistence

In Progress

  • Post-game analysis with move-by-move AI evaluation
  • Board themes and piece customization
  • Sound effects and music

Planned

  • Online multiplayer with matchmaking
  • Tournament mode
  • Mobile-responsive design
  • Opening book integration
  • Endgame tablebase for perfect play

Research Insights

What the AI Learned

Through self-play, the AI discovered several strategic principles:

  1. Center control: Strong models prioritize occupying central hexes early, maximizing jump opportunities
  2. Chain building: Creating “highways” of pieces that enable long jump sequences
  3. Blocking tactics: Strategically leaving pieces to obstruct opponent paths
  4. Tempo management: Understanding when to make small advances vs. waiting for big jumps

Training Curves

The model’s strength (measured in Elo) evolved through distinct phases:

Elo
2600 │                                    ══════════
     │                              ═════╝
2200 │                        ═════╝
     │                  ═════╝
1800 │            ═════╝
     │      ═════╝
1400 │ ════╝
1000 └──────────────────────────────────────────────
     0        50       100      150      200
                   Training Iterations
  • Iterations 0-30: Learning basic piece movement
  • Iterations 30-80: Discovering jump chains
  • Iterations 80-150: Positional understanding
  • Iterations 150+: Tactical refinement, diminishing returns

Try It Yourself

Play Online: checkers.shawnzhang.dev

Run Locally:

# Clone the repository
git clone https://github.com/shawnzhang12/chinese-checkers-alphazero
cd chinese-checkers-alphazero

# Install dependencies
pip install -r requirements.txt
npm install --prefix frontend

# Start the game
python server.py &
npm run dev --prefix frontend

# Open http://localhost:5173

Train Your Own Model:

# Configure training in config.yaml
python train.py --config config.yaml

# Monitor training
tensorboard --logdir runs/

Acknowledgments


Built with curiosity and too many late nights. If you enjoy the game, consider starring the repo!