Daytona AI Researchers - Berkeley, May 2026

📅 May 07, 2026
🕐 5:00 PM PT
👥 43 attending

About This Event

Welcome to our first meetup dedicated to AI researchers at Berkeley!

Agenda

🕒 5:00 pm – 5:05 pm Welcome and Opening Remarks
🎤 Marijan Cipcic, Principal Events Manager

🕒 5:05 pm – 5:20 pm Talk: "Today's Agents Don't Live In Episodes"
🎤 Muhammad Annas Hashmi, DevRel at Daytona

Outline: The 'episode' (short, stateless, resettable) has been RL's foundational abstraction since Atari. It underpins the Gym API, GRPO, PPO, and the conventional sandbox lifecycle. Today's agents no longer fit it. Tasks span days, and the environment state at hour 18 of an agent session (warm caches, installed dependencies, live processes, open sockets, a dirty git tree) is worth hours of wall clock to reproduce. Three things are scaling simultaneously: rollout horizon (seconds -> days), environment state (disposable between episodes -> first-class learning substrate), and branching (absent in modern LLM-RL -> speculative fork trees). Each stresses the inherited toolkit in a different way, and all three have been gated on the same missing primitives: VMs you can fork cheaply, pause without killing processes, snapshot mid-run, and resume hours later. This talk walks through what opens up when those primitives become available, including a live demo of long-horizon sessionful rollouts, mid-trajectory forking, and cross-calendar-time training. The research questions that follow (long-horizon benchmarks, speculative RL algorithms, event-driven training, to name a few) are where the next wave of agent RL gets built.

🕒 5:20 pm – 5:35 pm Talk: "Closing the Visibility Gap: Lessons from Safety-Critical Agentic Systems"
🎤 Vivek Pandit, Frontier AI Lead at Turing

Outline: AI agents are moving from demos to production, but their success depends on how well we can evaluate, benchmark, and trust them in high-stakes workflows. This talk explores why traditional software metrics and static benchmarks fall short for agentic systems, especially when agents must reason, plan, call tools, recover from failure, and operate over long horizons. I'll argue for evaluation frameworks that treat execution traces, reasoning trajectories, and tool interactions as first-class signals, alongside outcome-based metrics such as task success, pass rates, coverage, and behavioral robustness. To ground these ideas, the talk draws from chip design verification, where over 60% of development time is spent validating design intent against complex specifications. Verification is not just a tooling problem but a reasoning problem, making it a strong testbed for agent evaluation. I'll share lessons from building agents that interoperate with EDA toolchains, coordinate across stages like mental-model formation, test planning, testbench generation, and run-and-debug, and use auto-correction loops to safely adapt from tool feedback. The broader lesson is that better observability and domain-aware benchmarking are essential for deploying reliable agents in production.

🕒 5:35 pm – 5:50 pm Talk: "TBA"
🎤 Speaker TBA

Outline: TBA

🕒 5:50 pm – 8:00 pm Networking
With food and beverages

________________________

About event

An engaging meetup designed for AI researchers to connect, share ideas, and explore the latest advancements in artificial intelligence. The event features informal networking, short talks, and discussions on current research trends, fostering collaboration and knowledge exchange within the AI community.
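The fork/snapshot/resume primitives the first talk describes can be sketched with a toy interface. This is illustrative Python only, with an entirely hypothetical `Sandbox` class (not Daytona's actual API); environment state is stood in by a plain dict, where a real system would checkpoint memory, disk, and processes:

```python
import copy

class Sandbox:
    """Hypothetical stand-in for a forkable, snapshottable VM."""

    def __init__(self, state=None):
        # Long-lived env state that accumulates across "episodes".
        self.state = state or {"deps": []}

    def run(self, cmd):
        # Mutate persistent state instead of resetting per episode.
        self.state["deps"].append(cmd)

    def snapshot(self):
        # Capture full env state mid-run.
        return copy.deepcopy(self.state)

    @classmethod
    def resume(cls, snap):
        # Restore hours (or days) later without replaying the work.
        return cls(copy.deepcopy(snap))

    def fork(self, n):
        # Cheap speculative branches from the current state.
        return [Sandbox(copy.deepcopy(self.state)) for _ in range(n)]

# Long-horizon session: state survives between rollout segments.
sb = Sandbox()
sb.run("pip install torch")
snap = sb.snapshot()

# Mid-trajectory fork tree: explore candidate actions in parallel.
branches = sb.fork(3)
for i, b in enumerate(branches):
    b.run(f"candidate-action-{i}")

# Cross-calendar-time resume from the saved snapshot.
later = Sandbox.resume(snap)
```

The point of the sketch is the lifecycle shape, not the implementation: once `fork`, `snapshot`, and `resume` are cheap, the episode stops being the unit of training and the session becomes one.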

Location

📍 Berkeley, California

Tags: AI Meetup · Networking · Talk · GenAI · Agents · Research
Register for This Event
