Welcome to The Gygax Test

The Gygax Test is a research blog exploring whether large language models (LLMs) can effectively serve as Dungeon Masters in tabletop roleplaying games like Dungeons & Dragons.

The Gygax Test evaluates whether an AI can:

  • Create and maintain consistent characters across a long campaign
  • Develop emergent narrative arcs that respond to player choices
  • Run tactically interesting combat using complex rule systems
  • Deliver a satisfying collaborative storytelling experience

Research Focus

This blog documents ongoing experiments, observations, and technical challenges in developing LLMs capable of running engaging D&D campaigns. It also serves as a sandbox for me to screw around in all areas of model training. Topics include:

  • Context length limitations and their impact on campaign-scale memory
  • Evaluation metrics for narrative quality and combat fairness
  • Training approaches for specialized DM capabilities
  • Evals for difficult-to-verify skills, such as narrative coherence

Contact Me

You can find me on Twitter or email me at pact@gygaxtest.com.