Chinese Checkers AlphaZero

A reinforcement learning implementation of AlphaZero for Chinese Checkers with an interactive web-based game interface

Python · PyTorch · Reinforcement Learning · MCTS · React · WebSocket

A from-scratch implementation of DeepMind’s AlphaZero algorithm applied to Chinese Checkers, featuring self-play training, Monte Carlo Tree Search, and a polished web interface for playing against trained AI agents or friends.


Overview

This project explores the application of modern reinforcement learning techniques to Chinese Checkers—a classic strategy game with a branching factor that makes traditional minimax approaches computationally intractable.

The implementation follows the AlphaZero paradigm: a neural network learns to evaluate board positions and suggest moves entirely through self-play, without any human game knowledge beyond the rules. The result is a family of AI agents at various skill levels, from beginner-friendly to highly competitive.

Why Chinese Checkers?

Chinese Checkers presents unique challenges compared to games like Chess or Go:

  • High branching factor: A single marble can chain multiple jumps, creating hundreds of possible moves per turn
  • Multi-agent dynamics: Traditional Chinese Checkers supports 2-6 players, requiring different strategic considerations
  • Asymmetric objectives: Players race to opposite corners, creating interesting tension between offensive progress and defensive blocking
  • Sparse rewards: Games can last 50+ moves with the only reward signal at game end

Technical Architecture

Neural Network Design

The policy-value network uses a ResNet-inspired architecture optimized for the hexagonal board structure:

Input Layer (Board State Encoding)
    ├── 121 hexagonal positions
    ├── Current player encoding
    ├── Move history (last 8 states)
    └── Valid move mask
                 │
                 ▼
Residual Tower (10 blocks)
    ├── 128 filters per conv layer
    ├── Batch normalization
    └── ReLU activation
                 │
        ┌────────┴────────┐
        ▼                 ▼
   Policy Head        Value Head
        │                 │
        ▼                 ▼
   Move Probs      Win Probability
   (softmax)       (tanh: -1 to 1)

Key Design Decisions:

  • Hexagonal convolutions: Custom convolutional kernels that respect the 6-neighbor topology of the Chinese Checkers board
  • Jump sequence encoding: Moves are encoded as (start, end) pairs, with the network learning to evaluate multi-jump sequences
  • Opponent modeling: For 2-player mode, the value head predicts win probability; for multiplayer, it outputs a ranking distribution
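
To make the shape of the network concrete, here is a minimal PyTorch sketch of a policy-value net with this layout. It is illustrative only: the real project uses custom hexagonal convolutions and its own move encoding, while this sketch substitutes ordinary 3×3 convolutions on a square embedding of the board, and the names and layer sizes (PolicyValueNet, in_planes, num_moves) are assumptions rather than the project's code.

import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels=128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                      # residual skip connection

class PolicyValueNet(nn.Module):
    def __init__(self, in_planes=18, channels=128, blocks=10, board=17, num_moves=121 * 121):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU())
        self.tower = nn.Sequential(*[ResidualBlock(channels) for _ in range(blocks)])
        # Policy head: logits over (start, end) move pairs
        self.policy = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.Flatten(),
            nn.Linear(2 * board * board, num_moves))
        # Value head: scalar win estimate squashed to [-1, 1]
        self.value = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.Flatten(),
            nn.Linear(board * board, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return F.log_softmax(self.policy(h), dim=1), self.value(h)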

Monte Carlo Tree Search (MCTS)

The MCTS implementation includes several enhancements over vanilla UCT:

class MCTS:
    def search(self, root_state, num_simulations=800):
        """
        Perform MCTS with neural network guidance.

        Selection:  UCB1 with neural network prior
        Expansion:  Expand all children, evaluate with NN
        Backup:     Update Q-values along path
        """
        for _ in range(num_simulations):
            node = self.select(root_state)
            value = self.expand_and_evaluate(node)
            self.backup(node, value)

        return self.get_policy(root_state)

MCTS Enhancements:

Feature               Purpose
Dirichlet noise       Adds exploration at root node during training
Progressive widening  Limits branching in high-action states
Virtual losses        Enables parallel tree search
Transposition table   Caches evaluations for repeated positions
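
To show how the network prior and the root noise enter the search, here is a hedged sketch of the PUCT selection score and the Dirichlet mixing step; the node attributes (visits, value_sum, prior) and the constants are illustrative, not the project's exact values.

import math
import numpy as np

def puct_score(parent_visits, child, c_puct=1.5):
    """AlphaZero-style UCB: mean value plus a prior-weighted exploration bonus."""
    q = child.value_sum / child.visits if child.visits > 0 else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u

def add_root_dirichlet_noise(priors, epsilon=0.25, alpha=0.3):
    """Mix Dirichlet noise into the root priors so self-play keeps exploring."""
    noise = np.random.dirichlet([alpha] * len(priors))
    return [(1 - epsilon) * p + epsilon * n for p, n in zip(priors, noise)]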

Training Pipeline

The self-play training loop follows the AlphaZero methodology:

┌───────────────────────────────────────────────────┐
│                   Training Loop                   │
├───────────────────────────────────────────────────┤
│                                                   │
│  ┌───────────┐    ┌───────────┐    ┌───────────┐  │
│  │ Self-Play │───▶│  Replay   │───▶│ Training  │  │
│  │  Workers  │    │  Buffer   │    │  Worker   │  │
│  └───────────┘    └───────────┘    └───────────┘  │
│        ▲                                 │        │
│        │         ┌────────────┐          │        │
│        └─────────│   Model    │◀─────────┘        │
│                  │ Checkpoint │                   │
│                  └────────────┘                   │
│                        │                          │
│                        ▼                          │
│                ┌──────────────┐                   │
│                │  Evaluation  │                   │
│                │ (Elo Rating) │                   │
│                └──────────────┘                   │
│                                                   │
└───────────────────────────────────────────────────┘
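
The training worker's inner loop reduces to the standard AlphaZero objective: cross-entropy of the network policy against the MCTS visit distribution, plus mean-squared error of the value head against the final game outcome. A hedged sketch of one update (the batch layout and names are assumptions):

import torch.nn.functional as F

def train_step(net, optimizer, batch):
    """One gradient step on a replay-buffer batch of (state, mcts_policy, outcome)."""
    states, target_pi, target_z = batch            # tensors sampled from the replay buffer
    log_pi, value = net(states)                    # policy as log-probs, value in [-1, 1]
    policy_loss = -(target_pi * log_pi).sum(dim=1).mean()   # cross-entropy vs. visit counts
    value_loss = F.mse_loss(value.squeeze(1), target_z)     # predicted vs. actual outcome
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()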

Training Statistics:

  • Self-play games: ~500,000 games over training
  • Training iterations: 200 iterations, ~2,500 games per iteration
  • Hardware: 4x RTX 3090 for self-play, 1x A100 for training
  • Total training time: ~72 hours
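
Checkpoint strength is tracked with Elo, presumably from head-to-head evaluation games; for reference, the standard expected-score formula and rating update look like this (the K-factor is illustrative):

def elo_update(rating_a, rating_b, score_a, k=32):
    """score_a is 1 for a win, 0.5 for a draw, 0 for a loss (from player A's side)."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    return rating_a + k * (score_a - expected_a)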

Game Features

AI Difficulty Levels

The trained models are checkpointed at various stages to provide a range of opponents:

Level  Name          Elo    MCTS Sims  Description
1      Novice        800    50         Early training checkpoint. Makes obvious mistakes, good for learning.
2      Casual        1200   100        Plays reasonable moves but misses complex tactics.
3      Intermediate  1600   200        Solid positional play, occasional brilliant moves.
4      Advanced      2000   400        Strong tactical awareness, efficient pathing.
5      Expert        2400   800        Full-strength model. Will exploit any mistake.
6      Master        2600+  1600       Extended search. Nearly optimal play.
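
Under the hood, a difficulty level is essentially a choice of checkpoint plus an MCTS simulation budget. A hedged sketch of how the table above could be expressed as configuration (the checkpoint file names are made up for illustration):

# Each level pairs a training checkpoint with an MCTS simulation budget.
DIFFICULTY_PRESETS = {
    "novice":       {"checkpoint": "early.pt",  "simulations": 50},
    "casual":       {"checkpoint": "mid_1.pt",  "simulations": 100},
    "intermediate": {"checkpoint": "mid_2.pt",  "simulations": 200},
    "advanced":     {"checkpoint": "late.pt",   "simulations": 400},
    "expert":       {"checkpoint": "final.pt",  "simulations": 800},
    "master":       {"checkpoint": "final.pt",  "simulations": 1600},  # same net, deeper search
}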

Game Modes

vs AI (Single Player)

  • Select difficulty from Novice to Master
  • Undo/redo moves for learning
  • Move hints showing AI’s top 3 suggestions
  • Post-game analysis with AI evaluation graph

Local Multiplayer (Pass & Play)

  • 2-player mode on a single device
  • Automatic board rotation on turn change
  • Optional move timer (30s, 1min, 2min, unlimited)
  • Win animations and statistics

Online Multiplayer (planned)

  • Real-time matches via WebSocket
  • Ranked matchmaking with Elo system
  • Spectator mode for ongoing games

User Interface Design

The UI draws inspiration from modern game platforms like chess.com and from classic Flash/IO games, prioritizing clear visuals, responsive interaction, and satisfying feedback.

Visual Design Principles

┌─────────────────────────────────────────────────────────────┐
│  ⚙️ Settings    Chinese Checkers AlphaZero    🏆 Stats      │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│                    ┌─────────────┐                         │
│                    │   Board     │                         │
│                    │   Area      │                         │
│                    │             │                         │
│                    │    ⬡ ⬡ ⬡    │                         │
│                    │   ⬡ ⬡ ⬡ ⬡   │                         │
│                    │  ⬡ ⬡ ⬡ ⬡ ⬡  │                         │
│                    │             │                         │
│                    └─────────────┘                         │
│                                                             │
│  ┌──────────┐                          ┌──────────┐        │
│  │ Player 1 │                          │ Player 2 │        │
│  │ ●●●●●●   │                          │ ○○○○○○   │        │
│  │ Your Turn│                          │ AI: Exp  │        │
│  └──────────┘                          └──────────┘        │
│                                                             │
│         [↩️ Undo]  [💡 Hint]  [🔄 New Game]                  │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Core UI Components

Game Board

  • Smooth, responsive hexagonal grid
  • Marble pieces with satisfying physics-based animations
  • Valid move highlighting on piece selection
  • Jump path visualization for multi-hop moves
  • Subtle board texture with depth shadows

Piece Interactions

  • Click-to-select, click-to-move (beginner-friendly)
  • Drag-and-drop with snap-to-position
  • Hover states showing piece ownership
  • Selection glow effect with team color

Animations & Feedback

  • Marble sliding animations with easing curves
  • Jump chain animations (piece arcs over jumped positions)
  • Capture/arrival celebrations (subtle particle effects)
  • Turn transition with smooth board rotation (2P mode)
  • Win sequence with fireworks and stats summary

Sound Design

  • Satisfying “click” on piece selection
  • Marble sliding sounds with pitch variation
  • Distinct “hop” sound for jumps
  • Ambient background music (toggleable)
  • Victory fanfare

Settings Panel

┌─────────────────────────────────────┐
│         ⚙️ Settings                 │
├─────────────────────────────────────┤
│                                     │
│  🎮 Game                            │
│  ├─ AI Difficulty    [Advanced ▼]  │
│  ├─ Move Timer       [Off      ▼]  │
│  ├─ Show Hints       [●]           │
│  └─ Confirm Moves    [ ]           │
│                                     │
│  🎨 Appearance                      │
│  ├─ Board Theme      [Classic  ▼]  │
│  ├─ Piece Style      [Glass    ▼]  │
│  ├─ Animations       [●]           │
│  └─ Dark Mode        [●]           │
│                                     │
│  🔊 Audio                           │
│  ├─ Sound Effects    [●]  🔊━━━━○  │
│  ├─ Music            [ ]  🔊━━○━━  │
│  └─ Voice Lines      [ ]           │
│                                     │
│  📊 Stats                           │
│  ├─ Games Played: 47               │
│  ├─ Win Rate: 62%                  │
│  └─ Best Win: vs Expert            │
│                                     │
│           [Reset Stats]             │
│                                     │
└─────────────────────────────────────┘

Theme Options

Theme    Description
Classic  Traditional wooden board with glass marbles
Neon     Dark background with glowing pieces (IO game style)
Minimal  Clean whites and blacks, subtle shadows
Nature   Earthy tones, stone-like pieces
Retro    Pixel art style, 8-bit sound effects

Implementation Details

Tech Stack

Frontend

  • React 18 with TypeScript
  • Canvas/WebGL for board rendering (via PixiJS)
  • Framer Motion for UI animations
  • Zustand for state management
  • Howler.js for audio

Backend

  • Python FastAPI for game server
  • PyTorch for model inference
  • Redis for session management
  • WebSocket for real-time communication
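
For the real-time layer, a minimal sketch of what a FastAPI WebSocket game endpoint could look like; the route, message shape, and the GameSession helper are assumptions for illustration, not the project's actual API:

from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/ws/game/{game_id}")
async def game_socket(websocket: WebSocket, game_id: str):
    await websocket.accept()
    session = GameSession.get_or_create(game_id)      # hypothetical helper, e.g. backed by Redis
    while True:
        msg = await websocket.receive_json()          # e.g. {"type": "move", "from": 42, "to": 57}
        if msg["type"] == "move":
            new_state = session.apply_move(msg["from"], msg["to"])
            await websocket.send_json({"type": "state", "board": new_state})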

AI Engine

  • ONNX Runtime for optimized inference
  • Web Worker isolation for non-blocking AI computation
  • Configurable search depth per difficulty level
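
To serve the model through ONNX Runtime, the trained PyTorch network has to be exported first; a hedged sketch of that export step, assuming the illustrative PolicyValueNet from earlier (file name, input shape, and tensor names are likewise illustrative):

import torch

net = PolicyValueNet()                                # in practice, load a trained checkpoint
net.eval()
dummy = torch.zeros(1, 18, 17, 17)                    # one batch of encoded board planes (illustrative shape)
torch.onnx.export(
    net, dummy, "checkers_policy_value.onnx",
    input_names=["board"], output_names=["policy", "value"],
    dynamic_axes={"board": {0: "batch"}},             # allow batched inference at serve time
)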

Performance Optimizations

// Move generation with bitboard optimization
class MoveGenerator {
  // Precomputed step and jump tables for instant lookup
  private singleSteps: Map<number, Move[]>;
  private jumpTable: Map<number, number[]>;

  generateMoves(state: GameState): Move[] {
    const moves: Move[] = [];

    // Single steps: O(1) lookup per piece
    for (const piece of state.currentPlayerPieces) {
      moves.push(...(this.singleSteps.get(piece) ?? []));
    }

    // Jump chains: BFS over the jump table with visited tracking
    for (const piece of state.currentPlayerPieces) {
      this.generateJumpChains(piece, moves, new Set());
    }

    return moves;
  }
}

Optimization Results:

Metric            Before  After
Move generation   12ms    0.3ms
Board evaluation  8ms     0.1ms
AI move (Expert)  5s      1.2s
Memory usage      180MB   45MB

Development Roadmap

Completed

  • Core AlphaZero implementation
  • Self-play training pipeline
  • 6 difficulty levels with distinct personalities
  • Web-based game interface
  • Local 2-player mode
  • Move hints and undo system
  • Settings persistence

In Progress

  • Post-game analysis with move-by-move AI evaluation
  • Board themes and piece customization
  • Sound effects and music

Planned

  • Online multiplayer with matchmaking
  • Tournament mode
  • Mobile-responsive design
  • Opening book integration
  • Endgame tablebase for perfect play

Research Insights

What the AI Learned

Through self-play, the AI discovered several strategic principles:

  1. Center control: Strong models prioritize occupying central hexes early, maximizing jump opportunities
  2. Chain building: Creating “highways” of pieces that enable long jump sequences
  3. Blocking tactics: Strategically leaving pieces to obstruct opponent paths
  4. Tempo management: Understanding when to make small advances vs. waiting for big jumps

Training Curves

The model’s strength (measured in Elo) evolved through distinct phases:

Elo
2600 │                                    ══════════
     │                              ═════╝
2200 │                        ═════╝
     │                  ═════╝
1800 │            ═════╝
     │      ═════╝
1400 │ ════╝
1000 └──────────────────────────────────────────────
     0        50       100      150      200
                   Training Iterations
  • Iterations 0-30: Learning basic piece movement
  • Iterations 30-80: Discovering jump chains
  • Iterations 80-150: Positional understanding
  • Iterations 150+: Tactical refinement, diminishing returns

Try It Yourself

Play Online: checkers.shawnzhang.dev

Run Locally:

# Clone the repository
git clone https://github.com/shawnzhang12/chinese-checkers-alphazero
cd chinese-checkers-alphazero

# Install dependencies
pip install -r requirements.txt
npm install --prefix frontend

# Start the game
python server.py &
npm run dev --prefix frontend

# Open http://localhost:5173

Train Your Own Model:

# Configure training in config.yaml
python train.py --config config.yaml

# Monitor training
tensorboard --logdir runs/

Acknowledgments


Built with curiosity and too many late nights. If you enjoy the game, consider starring the repo!