About the talk
Working with LLM agents sometimes feels like herding cats—from small prompt tweaks causing outsized effects, to unpredictable player behaviors surfacing troublesome outcomes. In this talk, Batu Aytemiz will share practical insights drawn from developing a Roblox game with over 20,000 daily active users. He will discuss tools and techniques designed to make AI agents more reliable, safer, and easier to iterate on, ranging from lightweight, custom-built internal tools anyone can implement, to simple yet powerful prompting practices you can start using right away.
Takeaways
– Why we should test LLMs incredibly rigorously, and the potential ethical concerns of not doing so.
– An iterative workflow that helps you evaluate and improve your LLM pipeline in a systematic manner, grounded in user data.
– Techniques for increasing iteration speed when making prompt changes.
– Techniques for verifying that changed prompts aren’t causing unexpected behaviors.
– A simple tool for analyzing bulk user data while respecting privacy.
Experience level needed: Intermediate