In our previous blog post, “Agentic Conversational AI for Consuming Procedural Knowledge”, we introduced how the PERKS Conductor Agent enables users to interact dynamically with structured procedures. In that example, procedural knowledge was accessed through a chat widget, allowing users to follow workflows step by step while asking contextual questions along the way. However, chat is only one possible interaction channel. The consumption of procedural knowledge in PERKS is not tied to written interfaces: the same agentic orchestration can operate across different channels. One particularly powerful extension is Agentic Voice, enabled through a Realtime Agent.
1. From Chat to Voice: Same Intelligence, New Interface
The intelligence behind PERKS remains unchanged. The Conductor Agent manages context, reasons across procedural steps, selects the appropriate tools, and ensures that workflows remain structured and traceable. What changes is the interaction layer. Instead of typing a question such as “What is the next step?”, a user can simply ask, “What do I do next?” The Realtime Agent processes spoken input, maintains context, and responds immediately, while the Conductor Agent continues to orchestrate the procedural logic in the background. Voice does not replace the agentic reasoning layer, but rather makes it more accessible.
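To make the split between interaction layer and reasoning layer concrete, here is a minimal sketch in Python. It is not the PERKS implementation: the class `ConductorAgent`, the functions `chat_channel` and `voice_channel`, the placeholder `fake_transcribe`, and the two example steps are all hypothetical names and data chosen for illustration. The point it shows is that both channels hand plain text to the same reasoning object, so the procedural state is shared regardless of how the question arrived.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ConductorAgent:
    """Hypothetical stand-in for the reasoning layer (not the real PERKS agent)."""
    steps: list[str]
    position: int = -1  # -1 means the procedure has not started yet
    trace: list[tuple[str, str]] = field(default_factory=list)

    def handle(self, utterance: str, channel: str) -> str:
        # Record every request together with its channel for traceability.
        self.trace.append((channel, utterance))
        if "next" in utterance.lower():
            if self.position < len(self.steps) - 1:
                self.position += 1
                return f"Step {self.position + 1}: {self.steps[self.position]}"
            return "The procedure is complete."
        return "This sketch only understands requests for the next step."


def chat_channel(conductor: ConductorAgent, typed_text: str) -> str:
    # Chat passes the typed text straight through.
    return conductor.handle(typed_text, channel="chat")


def voice_channel(
    conductor: ConductorAgent,
    audio: bytes,
    transcribe: Callable[[bytes], str],
) -> str:
    # Voice adds one step: turn audio into text, then reuse the same agent.
    return conductor.handle(transcribe(audio), channel="voice")


def fake_transcribe(audio: bytes) -> str:
    # Assumption: placeholder for a real speech-to-text component.
    return audio.decode("utf-8")


conductor = ConductorAgent(
    steps=["Switch off the power supply", "Open the service panel"]
)
first = chat_channel(conductor, "What is the next step?")
second = voice_channel(conductor, b"What do I do next?", fake_transcribe)
print(first)
print(second)
```

In this sketch the spoken question continues where the typed one left off, because the position in the procedure lives in the conductor rather than in either channel.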

2. Why Voice Matters for Procedural Knowledge
In many real-world environments, writing is not the most natural or practical form of interaction. Technicians, operators, and field workers may not have the time or ability to type detailed queries. Even in office settings, spoken language often feels more intuitive than carefully phrased text. Voice lowers friction and reduces cognitive load. It allows users to ask follow-up questions, request clarification, or correct themselves in a more natural flow. This becomes particularly valuable when dealing with complex procedures that are rarely followed in a perfectly linear way. With Agentic Voice, users can move through a process step by step, pause to ask “why is this necessary?”, explore alternative branches, and continue seamlessly, all through natural conversation. The procedural structure remains intact because the Conductor Agent maintains the workflow state and ensures traceability in the background.
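The idea that a side question does not disturb the workflow state can be sketched in a few lines of Python. This is an illustrative assumption about how such state could be kept, not a description of the Conductor Agent's internals: `Step`, `ProcedureSession`, the keyword matching, and the example steps are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    instruction: str
    rationale: str


@dataclass
class ProcedureSession:
    """Hypothetical workflow state: a step pointer plus a log for traceability."""
    steps: list[Step]
    current: int = 0
    log: list[str] = field(default_factory=list)

    def ask(self, utterance: str) -> str:
        text = utterance.lower()
        # Log the request against the step that was active when it was asked.
        self.log.append(f"step {self.current + 1}: {utterance}")
        if "why" in text:
            # A clarifying question is answered without moving the pointer.
            return self.steps[self.current].rationale
        if "next" in text or "continue" in text:
            if self.current < len(self.steps) - 1:
                self.current += 1
                return self.steps[self.current].instruction
            return "That was the last step."
        # Anything else repeats the current instruction.
        return self.steps[self.current].instruction


session = ProcedureSession(
    steps=[
        Step("Close the inlet valve.", "Closing it prevents pressure build-up."),
        Step("Release the clamp.", "The clamp holds the cover in place."),
    ]
)
answer = session.ask("Why is this necessary?")
position_after_question = session.current
following = session.ask("OK, continue")
print(answer)
print(following)
```

Here the "why" question returns the rationale for the first step while the pointer stays where it was, and the later "continue" advances to the second step. A real system would use the agent's language understanding rather than keyword matching; the keywords only keep the sketch self-contained.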
Voice is not only a channel for consuming procedural knowledge. It can also support its creation. Domain experts often explain processes more clearly when speaking than when writing formal documentation. An agentic voice system can capture these explanations, ask clarifying follow-up questions, and help structure the spoken input into formal procedural representations. In this way, the same infrastructure supports both knowledge use and knowledge acquisition.
3. Multichannel Procedural Intelligence
The key insight is that PERKS separates the agentic reasoning layer from the user interface. In the previous article, the chat widget illustrated how procedural knowledge can become interactive and adaptive. Agentic Voice demonstrates that the same capabilities can be delivered through spoken interaction, powered by a Realtime Agent. Procedural intelligence is no longer locked inside static documents, nor confined to a single channel. Whether accessed through chat, voice, or future multimodal interfaces, the underlying agentic system remains consistent. Chat was the first step in demonstrating interactive procedural guidance. Voice is the natural continuation, making procedural knowledge even more accessible, intuitive, and embedded in real operational environments.