Chinese Checkers AlphaZero
A from-scratch implementation of DeepMind’s AlphaZero algorithm applied to Chinese Checkers, featuring self-play training, Monte Carlo Tree Search, and a polished web interface for playing against trained AI agents or friends.
Overview
This project explores the application of modern reinforcement learning techniques to Chinese Checkers—a classic strategy game with a branching factor that makes traditional minimax approaches computationally intractable.
The implementation follows the AlphaZero paradigm: a neural network learns to evaluate board positions and suggest moves entirely through self-play, without any human game knowledge beyond the rules. The result is a family of AI agents at various skill levels, from beginner-friendly to highly competitive.
Why Chinese Checkers?
Chinese Checkers presents unique challenges compared to games like Chess or Go:
- High branching factor: A single marble can chain multiple jumps, creating hundreds of possible moves per turn
- Multi-agent dynamics: Traditional Chinese Checkers supports 2-6 players, which pushes strategy beyond standard two-player zero-sum reasoning
- Asymmetric objectives: Players race to opposite corners, creating interesting tension between offensive progress and defensive blocking
- Sparse rewards: Games can last 50+ moves with the only reward signal at game end
Technical Architecture
Neural Network Design
The policy-value network uses a ResNet-inspired architecture optimized for the hexagonal board structure:
Input Layer (Board State Encoding)
│
├── 121 hexagonal positions
├── Current player encoding
├── Move history (last 8 states)
└── Valid move mask
│
▼
Residual Tower (10 blocks)
│
├── 128 filters per conv layer
├── Batch normalization
└── ReLU activation
│
├────────────┬────────────┤
▼ ▼
Policy Head Value Head
│ │
▼ ▼
Move Probs Win Probability
(softmax) (tanh: -1 to 1)
Key Design Decisions:
- Hexagonal convolutions: Custom convolutional kernels that respect the 6-neighbor topology of the Chinese Checkers board
- Jump sequence encoding: Moves are encoded as (start, end) pairs, with the network learning to evaluate multi-jump sequences
- Opponent modeling: For 2-player mode, the value head predicts win probability; for multiplayer, it outputs a ranking distribution
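Below is a minimal PyTorch sketch of this policy-value architecture. It stands in plain 1-D convolutions over the flattened 121-cell board for the custom hexagonal kernels, and the plane counts, layer names, and move-index space are illustrative assumptions rather than the project's actual code.

```python
# Hedged sketch: a simplified policy-value network in the spirit of the
# architecture above. Plain Conv1d layers stand in for the custom hex kernels;
# plane counts and layer names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD_CELLS = 121           # hexagonal positions, flattened to one axis
IN_PLANES = 2 * 8 + 2       # 8 history states x 2 players + side-to-move + move mask (assumed)
FILTERS = 128

class ResBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(channels)

    def forward(self, x):
        y = F.relu(self.bn1(self.conv1(x)))
        y = self.bn2(self.conv2(y))
        return F.relu(x + y)                      # residual connection

class PolicyValueNet(nn.Module):
    def __init__(self, num_moves: int, blocks: int = 10):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv1d(IN_PLANES, FILTERS, kernel_size=3, padding=1),
            nn.BatchNorm1d(FILTERS), nn.ReLU())
        self.tower = nn.Sequential(*[ResBlock(FILTERS) for _ in range(blocks)])
        # Policy head: one logit per (start, end) move index
        self.policy = nn.Sequential(
            nn.Conv1d(FILTERS, 2, kernel_size=1), nn.Flatten(),
            nn.Linear(2 * BOARD_CELLS, num_moves))
        # Value head: scalar win estimate squashed to [-1, 1]
        self.value = nn.Sequential(
            nn.Conv1d(FILTERS, 1, kernel_size=1), nn.Flatten(),
            nn.Linear(BOARD_CELLS, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return self.policy(h), self.value(h)

# Example: one logit per (start, end) pair over the 121-cell board
# net = PolicyValueNet(num_moves=121 * 121)
```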
Monte Carlo Tree Search (MCTS)
The MCTS implementation includes several enhancements over vanilla UCT:
```python
class MCTS:
    def search(self, root_state, num_simulations=800):
        """
        Perform MCTS with neural network guidance.

        Selection: PUCT (UCB-style score weighted by the network prior)
        Expansion: expand all children, evaluate the leaf with the network
        Backup:    update Q-values along the traversed path
        """
        for _ in range(num_simulations):
            node = self.select(root_state)
            value = self.expand_and_evaluate(node)
            self.backup(node, value)
        return self.get_policy(root_state)
```
MCTS Enhancements:
| Feature | Purpose |
|---|---|
| Dirichlet noise | Adds exploration at root node during training |
| Progressive widening | Limits branching in high-action states |
| Virtual losses | Enables parallel tree search |
| Transposition table | Caches evaluations for repeated positions |
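To make two of these concrete, the sketch below shows root Dirichlet noise and virtual losses. The node fields (children, prior, visit_count, value_sum) and the constants are assumptions made for the example, not the project's actual values.

```python
# Hedged sketch: root Dirichlet noise and virtual losses.
# Node layout and constants are illustrative assumptions.
import numpy as np

DIRICHLET_ALPHA = 0.3    # assumed; AlphaZero tuned this per game
EXPLORATION_EPS = 0.25
VIRTUAL_LOSS = 1.0

def add_root_noise(root):
    """Mix Dirichlet noise into the root priors so self-play keeps exploring."""
    actions = list(root.children)
    noise = np.random.dirichlet([DIRICHLET_ALPHA] * len(actions))
    for action, n in zip(actions, noise):
        child = root.children[action]
        child.prior = (1 - EXPLORATION_EPS) * child.prior + EXPLORATION_EPS * n

def apply_virtual_loss(path):
    """Temporarily pessimize nodes on a selected path so parallel searchers
    spread out instead of all descending the same branch."""
    for node in path:
        node.visit_count += 1
        node.value_sum -= VIRTUAL_LOSS

def revert_virtual_loss(path, value):
    """Replace the virtual loss with the real backed-up value."""
    for node in path:
        node.value_sum += VIRTUAL_LOSS + value
```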
Training Pipeline
The self-play training loop follows the AlphaZero methodology:
┌─────────────────────────────────────────────────────────────┐
│ Training Loop │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Self-Play │───▶│ Replay │───▶│ Training │ │
│ │ Workers │ │ Buffer │ │ Worker │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ │ ┌──────────┐ │ │
│ └────────▶│ Model │◀────────┘ │
│ │ Checkpoint│ │
│ └──────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ Evaluation │ │
│ │ (Elo Rating)│ │
│ └──────────────┘ │
│ │
└─────────────────────────────────────────────────────────────┘
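In code, a condensed version of this loop might look like the sketch below, using the iteration counts from the statistics that follow. The helpers (play_one_game, policy_value_loss, evaluate_vs_best, save_checkpoint) are hypothetical placeholders for the project's actual components.

```python
# Hedged sketch of the self-play -> replay buffer -> training -> evaluation loop.
# All helper functions are assumed placeholders, not the project's real API.
import random
from collections import deque

def training_loop(net, optimizer, iterations=200, games_per_iter=2500,
                  buffer_size=500_000, batch_size=1024, steps_per_iter=1000):
    replay_buffer = deque(maxlen=buffer_size)
    for iteration in range(iterations):
        # 1. Self-play workers generate (state, MCTS policy, outcome) examples
        for _ in range(games_per_iter):
            replay_buffer.extend(play_one_game(net))          # assumed helper
        # 2. Training worker samples minibatches and minimizes policy + value loss
        for _ in range(steps_per_iter):
            batch = random.sample(replay_buffer, batch_size)
            loss = policy_value_loss(net, batch)              # assumed helper
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # 3. Evaluation gates the new checkpoint on an Elo match vs. the current best
        if evaluate_vs_best(net):                             # assumed helper
            save_checkpoint(net, iteration)                   # assumed helper
```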
Training Statistics:
- Self-play games: ~500,000 games over training
- Training iterations: 200 iterations, ~2,500 games per iteration
- Hardware: 4x RTX 3090 for self-play, 1x A100 for training
- Total training time: ~72 hours
Game Features
AI Difficulty Levels
The trained models are checkpointed at various stages to provide a range of opponents:
| Level | Name | Elo | MCTS Sims | Description |
|---|---|---|---|---|
| 1 | Novice | 800 | 50 | Early training checkpoint. Makes obvious mistakes, good for learning. |
| 2 | Casual | 1200 | 100 | Plays reasonable moves but misses complex tactics. |
| 3 | Intermediate | 1600 | 200 | Solid positional play, occasional brilliant moves. |
| 4 | Advanced | 2000 | 400 | Strong tactical awareness, efficient pathing. |
| 5 | Expert | 2400 | 800 | Full-strength model. Will exploit any mistake. |
| 6 | Master | 2600+ | 1600 | Extended search. Nearly optimal play. |
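One plausible way to wire these levels into the engine is a preset table mapping each difficulty to a checkpoint and a simulation budget, as sketched below; the checkpoint filenames and dataclass are illustrative assumptions.

```python
# Hedged sketch: difficulty presets as configuration. Checkpoint filenames
# are made up for illustration; simulation counts mirror the table above.
from dataclasses import dataclass

@dataclass(frozen=True)
class DifficultyPreset:
    checkpoint: str   # which training checkpoint to load
    mcts_sims: int    # MCTS simulations per move

DIFFICULTY_PRESETS = {
    "Novice":       DifficultyPreset("ckpt_early.pt",  50),
    "Casual":       DifficultyPreset("ckpt_mid_1.pt", 100),
    "Intermediate": DifficultyPreset("ckpt_mid_2.pt", 200),
    "Advanced":     DifficultyPreset("ckpt_late.pt",  400),
    "Expert":       DifficultyPreset("ckpt_final.pt", 800),
    "Master":       DifficultyPreset("ckpt_final.pt", 1600),  # same net, deeper search
}
```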
Game Modes
vs AI (Single Player)
- Select difficulty from Novice to Master
- Undo/redo moves for learning
- Move hints showing AI’s top 3 suggestions
- Post-game analysis with AI evaluation graph
Local Multiplayer (Pass & Play)
- 2-player mode on a single device
- Automatic board rotation on turn change
- Optional move timer (30s, 1min, 2min, unlimited)
- Win animations and statistics
Online Multiplayer (planned)
- Real-time matches via WebSocket
- Ranked matchmaking with Elo system
- Spectator mode for ongoing games
User Interface Design
The UI draws inspiration from modern game platforms like chess.com and classic Flash/IO games.
Visual Design Principles
┌─────────────────────────────────────────────────────────────┐
│ ⚙️ Settings Chinese Checkers AlphaZero 🏆 Stats │
├─────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ │
│ │ Board │ │
│ │ Area │ │
│ │ │ │
│ │ ⬡ ⬡ ⬡ │ │
│ │ ⬡ ⬡ ⬡ ⬡ │ │
│ │ ⬡ ⬡ ⬡ ⬡ ⬡ │ │
│ │ │ │
│ └─────────────┘ │
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Player 1 │ │ Player 2 │ │
│ │ ●●●●●● │ │ ○○○○○○ │ │
│ │ Your Turn│ │ AI: Exp │ │
│ └──────────┘ └──────────┘ │
│ │
│ [↩️ Undo] [💡 Hint] [🔄 New Game] │
│ │
└─────────────────────────────────────────────────────────────┘
Core UI Components
Game Board
- Smooth, responsive hexagonal grid
- Marble pieces with satisfying physics-based animations
- Valid move highlighting on piece selection
- Jump path visualization for multi-hop moves
- Subtle board texture with depth shadows
Piece Interactions
- Click-to-select, click-to-move (beginner-friendly)
- Drag-and-drop with snap-to-position
- Hover states showing piece ownership
- Selection glow effect with team color
Animations & Feedback
- Marble sliding animations with easing curves
- Jump chain animations (piece arcs over jumped positions)
- Capture/arrival celebrations (subtle particle effects)
- Turn transition with smooth board rotation (2P mode)
- Win sequence with fireworks and stats summary
Sound Design
- Satisfying “click” on piece selection
- Marble sliding sounds with pitch variation
- Distinct “hop” sound for jumps
- Ambient background music (toggleable)
- Victory fanfare
Settings Panel
┌─────────────────────────────────────┐
│ ⚙️ Settings │
├─────────────────────────────────────┤
│ │
│ 🎮 Game │
│ ├─ AI Difficulty [Advanced ▼] │
│ ├─ Move Timer [Off ▼] │
│ ├─ Show Hints [●] │
│ └─ Confirm Moves [ ] │
│ │
│ 🎨 Appearance │
│ ├─ Board Theme [Classic ▼] │
│ ├─ Piece Style [Glass ▼] │
│ ├─ Animations [●] │
│ └─ Dark Mode [●] │
│ │
│ 🔊 Audio │
│ ├─ Sound Effects [●] 🔊━━━━○ │
│ ├─ Music [ ] 🔊━━○━━ │
│ └─ Voice Lines [ ] │
│ │
│ 📊 Stats │
│ ├─ Games Played: 47 │
│ ├─ Win Rate: 62% │
│ └─ Best Win: vs Expert │
│ │
│ [Reset Stats] │
│ │
└─────────────────────────────────────┘
Theme Options
| Theme | Description |
|---|---|
| Classic | Traditional wooden board with glass marbles |
| Neon | Dark background with glowing pieces (IO game style) |
| Minimal | Clean whites and blacks, subtle shadows |
| Nature | Earthy tones, stone-like pieces |
| Retro | Pixel art style, 8-bit sound effects |
Implementation Details
Tech Stack
Frontend
- React 18 with TypeScript
- Canvas/WebGL for board rendering (via PixiJS)
- Framer Motion for UI animations
- Zustand for state management
- Howler.js for audio
Backend
- Python FastAPI for game server
- PyTorch for model inference
- Redis for session management
- WebSocket for real-time communication
AI Engine
- ONNX Runtime for optimized inference
- Web Worker isolation for non-blocking AI computation
- Configurable search depth per difficulty level
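For context, serving the network through ONNX Runtime usually starts with an export step like the one sketched below; the model, the 18×121 input shape, and the output names are assumptions carried over from the earlier network sketch rather than the project's actual export script.

```python
# Hedged sketch: exporting the PyTorch policy-value net to ONNX for runtime inference.
# Input shape and output names are assumptions from the earlier network sketch.
import torch

def export_to_onnx(net, path="policy_value.onnx"):
    net.eval()
    dummy = torch.zeros(1, 18, 121)   # (batch, input planes, board cells) -- assumed shape
    torch.onnx.export(
        net, dummy, path,
        input_names=["board"],
        output_names=["policy_logits", "value"],
        dynamic_axes={"board": {0: "batch"}},
        opset_version=17,
    )
```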
Performance Optimizations
```typescript
// Move generation with bitboard optimization
class MoveGenerator {
  // Precomputed step and jump tables for instant lookup
  private singleSteps: Move[][];               // indexed by board cell
  private jumpTable: Map<number, number[]>;

  generateMoves(state: GameState): Move[] {
    const moves: Move[] = [];

    // Single steps: O(1) lookup per piece
    for (const piece of state.currentPlayerPieces) {
      moves.push(...this.singleSteps[piece]);
    }

    // Jump chains: BFS with visited tracking
    for (const piece of state.currentPlayerPieces) {
      this.generateJumpChains(piece, moves, new Set());
    }
    return moves;
  }
}
```
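The generateJumpChains call above does the interesting work. A sketch of that search, written here in Python for brevity, is a BFS over successive jumps with visited tracking; the board helpers neighbor, is_occupied, and is_empty are assumed.

```python
# Hedged sketch: breadth-first search over chained jumps.
# board.neighbor / is_occupied / is_empty are assumed helpers.
from collections import deque

def jump_destinations(board, start):
    """Return every cell reachable from `start` by one or more chained jumps."""
    reachable = set()
    visited = {start}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        for direction in range(6):                    # hex board: six directions
            over = board.neighbor(cell, direction)
            land = board.neighbor(over, direction) if over is not None else None
            # A jump needs an occupied adjacent cell and an empty landing cell
            if (over is not None and land is not None
                    and board.is_occupied(over) and board.is_empty(land)
                    and land not in visited):
                visited.add(land)
                reachable.add(land)
                queue.append(land)
    return reachable
```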
Optimization Results:
| Metric | Before | After |
|---|---|---|
| Move generation | 12ms | 0.3ms |
| Board evaluation | 8ms | 0.1ms |
| AI move (Expert) | 5s | 1.2s |
| Memory usage | 180MB | 45MB |
Development Roadmap
Completed
- Core AlphaZero implementation
- Self-play training pipeline
- 6 difficulty levels with distinct personalities
- Web-based game interface
- Local 2-player mode
- Move hints and undo system
- Settings persistence
In Progress
- Post-game analysis with move-by-move AI evaluation
- Board themes and piece customization
- Sound effects and music
Planned
- Online multiplayer with matchmaking
- Tournament mode
- Mobile-responsive design
- Opening book integration
- Endgame tablebase for perfect play
Research Insights
What the AI Learned
Through self-play, the AI discovered several strategic principles:
- Center control: Strong models prioritize occupying central hexes early, maximizing jump opportunities
- Chain building: Creating “highways” of pieces that enable long jump sequences
- Blocking tactics: Strategically leaving pieces to obstruct opponent paths
- Tempo management: Understanding when to make small advances vs. waiting for big jumps
Training Curves
The model’s strength (measured in Elo) evolved through distinct phases:
Elo
2600 │ ══════════
│ ═════╝
2200 │ ═════╝
│ ═════╝
1800 │ ═════╝
│ ═════╝
1400 │ ════╝
│
1000 └──────────────────────────────────────────────
0 50 100 150 200
Training Iterations
- Iterations 0-30: Learning basic piece movement
- Iterations 30-80: Discovering jump chains
- Iterations 80-150: Positional understanding
- Iterations 150+: Tactical refinement, diminishing returns
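For reference, checkpoint strength is tracked with the standard Elo expected-score update after each evaluation match; a minimal sketch, with an assumed K-factor:

```python
# Standard Elo update; the K-factor of 32 is an assumption for illustration.
def elo_update(rating_a, rating_b, score_a, k=32):
    """score_a is 1.0 for a win, 0.5 for a draw, 0.0 for a loss (player A's view)."""
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```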
Try It Yourself
Play Online: checkers.shawnzhang.dev
Run Locally:
```bash
# Clone the repository
git clone https://github.com/shawnzhang12/chinese-checkers-alphazero
cd chinese-checkers-alphazero

# Install dependencies
pip install -r requirements.txt
npm install --prefix frontend

# Start the game
python server.py &
npm run dev --prefix frontend

# Open http://localhost:5173
```
Train Your Own Model:
```bash
# Configure training in config.yaml
python train.py --config config.yaml

# Monitor training
tensorboard --logdir runs/
```
Acknowledgments
- DeepMind’s AlphaZero paper for the foundational algorithm
- Leela Chess Zero community for distributed training insights
- chess.com for UI/UX inspiration
Built with curiosity and too many late nights. If you enjoy the game, consider starring the repo!
