About the talk
Developers are starting to embrace generative AI to elevate NPC believability, but current cloud‑based agent frameworks are far too costly and slow for real‑time integration. In this session, you’ll learn how Atelico’s Generative Agents Realtime Playground (GARP) leverages proprietary small LLMs, a cognitive memory architecture, and hot‑swappable adapters to run lifelike, emergent NPC behavior directly on consumer hardware with zero inference cost and no rate limits. We’ll walk through our end‑to‑end pipeline (a brief illustrative sketch follows the list):
– Game‑object serialization & prompt templating
– Memory DB design (working vs. long‑term memory, retrieval, reflection)
– Adapter‑based fine‑tuning for character, planning & chat
– Performance optimizations (quantized models, parallel LM calls, caching, guardrails)
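Below is a minimal sketch of what the game‑object serialization and prompt‑templating step can look like. The class and function names (NPCState, build_prompt) and the prompt wording are illustrative assumptions for this abstract, not GARP’s actual API.

```python
# Illustrative sketch: serialize the visible slice of a game object to JSON
# and drop it into a prompt template for a small local LLM.
import json
from dataclasses import dataclass, asdict
from string import Template

@dataclass
class NPCState:
    """Snapshot of the game-object fields the agent is allowed to perceive."""
    name: str
    location: str
    mood: str
    nearby_objects: list[str]
    recent_events: list[str]

PROMPT = Template(
    "You are $name, currently at $location and feeling $mood.\n"
    "World state (JSON): $state\n"
    "Decide your next action and reply with a single JSON object "
    '{"action": ..., "dialogue": ...}.'
)

def build_prompt(state: NPCState) -> str:
    # Compact JSON keeps the world representation stable and token-cheap,
    # which matters when inference runs on consumer hardware.
    state_json = json.dumps(asdict(state), separators=(",", ":"))
    return PROMPT.substitute(
        name=state.name, location=state.location,
        mood=state.mood, state=state_json,
    )

if __name__ == "__main__":
    npc = NPCState(
        name="Mira", location="the tavern", mood="curious",
        nearby_objects=["lute", "half-empty mug"],
        recent_events=["a stranger asked about the old mine"],
    )
    print(build_prompt(npc))
```

Serializing only the fields the agent may perceive keeps prompts short and deterministic, which is one way to stay within the latency budget of real‑time, on‑device inference.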
You’ll come away with actionable guidelines for implementing LLMs in your game, architecting cognitive agents, and designing player interactions that nudge emergent narratives, all without breaking the bank.
Takeaway
– AI Engine Architecture: How to orchestrate LLMs, memory and adapters for real‑time agents
– Implementation Patterns: Prompt templating, JSON serialization, parallel calls & safety guardrails
– Performance Strategies: Quantized small models, LoRA swapping, prefix caching, batching
– Design Best Practices: Balancing autonomy vs. authorial control, chat‑driven nudging, emergent behavior design
– Integration Guidelines: Godot plugin workflows, data pipelines & dev tooling
All of these are lessons learned implementing GARP and our first, as‑yet‑unannounced game (see the memory‑retrieval sketch below).
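For a concrete picture of the working vs. long‑term memory split and retrieval mentioned above, here is a minimal, self‑contained sketch. The scoring (semantic relevance plus a recency decay) and the class names are illustrative assumptions, not GARP’s actual memory DB schema.

```python
# Illustrative sketch: a tiny agent memory with a bounded working set that
# spills into a long-term store, and relevance + recency-weighted retrieval.
import math
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    embedding: list[float]              # from any sentence embedder; stubbed here
    created_at: float = field(default_factory=time.time)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

class AgentMemory:
    """Working memory rides along in every prompt; long-term memory is queried."""

    def __init__(self, working_size: int = 8):
        self.working_size = working_size
        self.working: list[Memory] = []
        self.long_term: list[Memory] = []

    def remember(self, mem: Memory) -> None:
        self.working.append(mem)
        if len(self.working) > self.working_size:
            # Spill the oldest working memory into the long-term store.
            self.long_term.append(self.working.pop(0))

    def retrieve(self, query_emb: list[float], k: int = 3,
                 half_life_s: float = 600.0) -> list[Memory]:
        now = time.time()

        def score(m: Memory) -> float:
            # Blend semantic relevance with an exponential recency decay.
            recency = 0.5 ** ((now - m.created_at) / half_life_s)
            return cosine(query_emb, m.embedding) + recency

        return sorted(self.long_term, key=score, reverse=True)[:k]

if __name__ == "__main__":
    mem = AgentMemory(working_size=2)
    # Toy 3-dimensional "embeddings" stand in for a real embedding model.
    mem.remember(Memory("The stranger asked about the old mine.", [1.0, 0.0, 0.0]))
    mem.remember(Memory("The tavern ran out of ale.", [0.0, 1.0, 0.0]))
    mem.remember(Memory("Mira hummed a tune on her lute.", [0.0, 0.0, 1.0]))
    for m in mem.retrieve([0.9, 0.1, 0.0], k=1):
        print(m.text)
```

In practice retrieval would run against a real embedding model and a persistent store, and a reflection pass would periodically summarize long‑term entries; this sketch only shows the spill‑and‑retrieve mechanics.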
Experience level needed: Intermediate