About the talk
Game developers, especially those working on single-player or mobile titles, are often reluctant to integrate online APIs due to latency, cost, instability, or platform constraints. Yet the promise of generative AI remains strong, provided it can be fast, private, cheap, and seamlessly embedded into game experiences.
This talk explores a production-ready approach to deploying small, fine-tuned, dynamic language models inside games — without needing any internet connectivity. While many LLM-based use cases in games focus on low-hanging fruit like quest dialogue, we want users to take control of any text. We strive to bridge the gap between loose ideas or raw knowledge bases and structured data pipelines — the foundation for generating what we call Diamonds. Ultimately, you’ll understand how to generate reliable results with dramatically reduced memory usage and power consumption, even on the smallest of devices.

Takeaway
– The operational and UX challenges of API-bound systems in videogames, and how offline language models unlock new possibilities
– How strategic templating of input/output behaviours can be paired with strong domain alignment for reliable, adaptable outputs in a compact package (see the sketch after this list)
– Performance wins of embedded language models: tiny, fast, green, and offline, with side-by-side results comparing models across loading time, RAM use, etc.
– Lessons from building a toolchain for developers: what has and hasn’t worked
– What comes next: a peek into our ongoing R&D and Dynamic AI Platform features
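
As a concrete illustration of the templating point above, here is a minimal Python sketch. It assumes a hypothetical run_local_model() callable standing in for whatever on-device inference backend a game ships with (it is not a real library API): the template constrains what the model is asked, and a simple validator rejects any output that drifts from the expected structure.

    # Minimal sketch of input/output templating around an on-device model.
    # run_local_model() is a hypothetical stand-in for an embedded inference
    # backend; it is NOT a call from any specific library.
    import json

    ITEM_TEMPLATE = (
        "You are the in-game item writer.\n"
        "Write a description for a {rarity} {item_type} named '{name}'.\n"
        "Respond with JSON: {{\"name\": str, \"flavour_text\": str}}.\n"
    )

    REQUIRED_KEYS = {"name", "flavour_text"}

    def build_prompt(name: str, item_type: str, rarity: str) -> str:
        """Fill the template so the model only ever sees well-formed requests."""
        return ITEM_TEMPLATE.format(name=name, item_type=item_type, rarity=rarity)

    def parse_output(raw: str) -> dict | None:
        """Accept the model's answer only if it matches the expected structure."""
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            return None
        if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
            return None
        return data

    def generate_item_text(run_local_model, name, item_type, rarity, retries=2):
        """Template in, validated structure out; retry, then fall back."""
        prompt = build_prompt(name, item_type, rarity)
        for _ in range(retries + 1):
            result = parse_output(run_local_model(prompt))
            if result is not None:
                return result
        return {"name": name, "flavour_text": "An unremarkable " + item_type + "."}

Keeping validation outside the model means failures are cheap and deterministic on-device: a malformed response costs one retry or a canned fallback, never a broken game screen.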

Experience level needed: Intermediate, Advanced