Welcome to The Gygax Test
The Gygax Test is a research blog exploring whether Large Language Models (LLMs) can effectively serve as Dungeon Masters in tabletop roleplaying games like Dungeons & Dragons.
The Gygax Test evaluates whether an AI can:
- Create and maintain consistent characters across a long campaign
- Develop emergent narrative arcs that respond to player choices
- Run tactically interesting combat using complex rule systems
- Deliver a satisfying collaborative storytelling experience
Research Focus
This blog documents ongoing experiments, observations, and technical challenges in developing LLMs capable of running engaging D&D campaigns. It also serves as a sandbox for me to screw around in all areas of model training. Topics include:
- Context length limitations and their impact on campaign-scale memory
- Evaluation metrics for narrative quality and combat fairness
- Training approaches for specialized DM capabilities
- Evals for testing difficult-to-verify skills, such as narrative coherence
Contact Me
You can find me on Twitter or email me at pact@gygaxtest.com.