When was the last time you barked orders at Alexa?
The way we interact with machines is about to change dramatically.
While most people are obsessing over ChatGPT and image generators, they’re underestimating the biggest paradigm shift: Voice AI.
Today, I'll show you why. We’re going to cover:
🗣️ Advanced Voice Mode: What is it
🗣️ Why Voice is the (obvious) last frontier of AI
🗣️ Key Applications in the Workplace
🗣️ Voice and Workplace Simulations
🗣️ Implementation Challenges and Solutions
🗣️ Top AI Voice Tools
Let’s dive in!
Death of the Keyboard
It's 2024, and our main work tool is still a typewriter-like machine: the typewriter was first patented in 1714, and the PC keyboard as we know it arrived with the IBM PC in 1981.
Think about it: we spend our days translating our thoughts into finger movements, typing away on a device that was designed for a completely different era. While writing has served us well for millennia, and will continue to be crucial for many tasks, the emergence of advanced voice AI presents an opportunity to rethink how we interact with technology at work.
Sure, we've had Siri and Alexa for over a decade now, but their capabilities have largely been confined to simple commands like setting timers or playing music – barely scratching the surface of voice technology's potential. This shift in how we interact with technology mirrors a broader trend: just as TikTok has transformed how we consume content – making it more immediate, conversational, and natural – voice AI is poised to do the same for our work interactions.
We're moving away from formal, structured inputs (like typing emails or commands) toward more natural, fluid ways of getting things done. The 'TikTokification' of work (a theme we've written about before) isn't just about short-form videos; it's about making technology adapt to our natural behaviors rather than the other way around.
Speaking is our most natural form of communication – we learn it from birth, use it effortlessly, and can convey nuance and emotion in ways that writing often struggles to capture. Until now, though, voice interfaces have been limited by technology's inability to truly understand and respond to natural speech. That's about to change.
Advanced Voice Mode: What is it
OpenAI rolled out Advanced Voice Mode to a broader group of users on September 24, 2024, specifically subscribers to its ChatGPT Plus and Team plans. This followed an initial alpha phase, begun in July 2024, that was limited to a select group of users because of safety concerns and the need for further testing. The launch also came after a controversy between OpenAI and Scarlett Johansson: Sam Altman had asked the actress to be the official voice of the feature, a nod to the acclaimed (and must-see) movie ‘Her’, in which she voices a ‘sentient’ AI operating system.
Advanced Voice Mode lets ChatGPT generate human-like speech with unprecedented realism and naturalness. It captures nuances of tone, inflection, and emotion, making it nearly indistinguishable from a human voice. Rather than simply reading text aloud, it can hold dynamic, context-aware conversations.
Why Voice is the (obvious) last frontier of AI
While this feature currently lives mainly in the ChatGPT app (OpenAI's Realtime API, launched in October 2024, already lets developers build similar speech-to-speech experiences), it's only a matter of time before it shows up in third-party apps and becomes far more ubiquitous.
Why? It’s simple: it removes the last barrier of ‘friction’ between the user and the technology. Think about it: AI and LLMs were already available before 2022 (by the way, ChatGPT blew out its second birthday candles just a few days ago; doesn’t it seem like much longer?). The craze started when the usability barriers were removed by giving everyone a simple text box to use it, much as happened with Google in 1998.
Voice removes this last barrier by allowing users to interact with AI through natural speech, making the technology even more accessible and intuitive. This advancement opens up a whole new realm of possibilities for AI applications, particularly in the workplace, where voice-based interactions can significantly enhance productivity and collaboration.
Why, you ask?
Because this is not just a technological advancement; it's a paradigm shift in how we interact with machines. By mimicking natural human conversation, voice AI can create more engaging and efficient workplace environments. For instance, it can facilitate hands-free operations, enable multitasking, and provide accessibility options for individuals with visual or motor impairments. Moreover, voice AI can potentially revolutionize customer service, virtual meetings, and even creative processes in the workplace.
Key Applications in the Workplace
Voice AI is revolutionizing workplace training across three critical domains: customer service, leadership development, and technical operations.
Each area leverages this technology in unique ways, creating immersive learning experiences that were previously impossible to scale. The results have been remarkable – from reducing training time by up to 50% to significantly improving employee confidence and performance.
Customer Service - In customer service, voice AI creates a safe space for representatives to master challenging scenarios before facing real customers. Imagine a new hire practicing with an AI that can seamlessly switch between an angry customer demanding a refund and a confused user needing technical support. The AI provides real-time feedback on tone, word choice, and emotional intelligence, while also measuring key metrics like clarity and pace. What's particularly powerful is the ability to simulate various accents and communication styles, preparing representatives for the true diversity of their customer base. One major retail company reported that after implementing voice AI training, their customer satisfaction scores improved by 23% within the first three months.
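What might measuring "clarity and pace" look like in practice? Here is a minimal Python sketch, assuming a speech-to-text step has already produced the transcript and timed the answer. The filler-word list, the 120-160 words-per-minute target band and the 5-per-100-words filler threshold are illustrative assumptions, not numbers from any vendor.

```python
import re
from dataclasses import dataclass, field

# Illustrative filler words; a real coach would tune this list per language.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


@dataclass
class DeliveryFeedback:
    words_per_minute: float
    filler_rate: float  # filler words per 100 words
    notes: list = field(default_factory=list)


def analyze_delivery(transcript: str, duration_seconds: float,
                     target_wpm=(120.0, 160.0),
                     max_filler_rate=5.0) -> DeliveryFeedback:
    """Score pace and filler usage for one practice answer.

    Assumes speech-to-text has already produced the transcript and measured
    how long the representative spoke.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words or duration_seconds <= 0:
        return DeliveryFeedback(0.0, 0.0, ["No speech detected."])

    wpm = len(words) / (duration_seconds / 60)
    filler_rate = 100 * sum(w in FILLERS for w in words) / len(words)

    notes = []
    low, high = target_wpm
    if wpm < low:
        notes.append("Pace is slow; keep the conversation moving.")
    elif wpm > high:
        notes.append("Pace is fast; slow down so the customer can follow.")
    if filler_rate > max_filler_rate:
        notes.append("Frequent filler words; try a short pause instead.")
    return DeliveryFeedback(round(wpm, 1), round(filler_rate, 1), notes)


fb = analyze_delivery("um I understand uh let me check that refund for you", 6)
print(fb.words_per_minute, fb.filler_rate)  # 110.0 18.2
```

Tone and emotional-intelligence scoring would need an audio or language model on top; simple counts like these are just the easiest part of the feedback loop.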
Leadership Development and Communication Training - Leadership development has seen equally impressive applications. Voice AI serves as a virtual coach, allowing emerging leaders to practice crucial conversations in a risk-free environment. From delivering tough performance reviews to managing team conflicts, the technology provides immediate feedback on leadership presence and communication effectiveness. The AI can simulate complex scenarios like budget negotiations or crisis management situations, allowing leaders to develop muscle memory for high-stakes conversations. More importantly, it helps them recognize patterns in their communication style and develop more effective approaches over time.
One example:
Hi, I’m Alec Beckett, creative partner at Nale Communications, a creative agency in Providence. Now, we have a client who’s been really hesitant to make subjective decisions until his CEO has weighed in. But of course, the CEO was always busy, and it’s very hard finding time on our calendar. So we were seeing timelines starting to slip when Stephen, our director of strategy, came to me with an idea that I really loved. Let’s make a synthetic version of the CEO. So we trained a custom GPT with the CEO’s latest strategic plan and as many of her speeches and blog posts and podcast transcripts as we can find. Then before we presented the next round of work to our client, we uploaded that presentation to our synthetic CEO and asked for an opinion. The feedback was actually kind of amazing. It wasn’t all positive. But frankly, it was exactly the kind of feedback we’d love to get more often from our clients. It’s very cogent and strategic and clear. And it was really interesting to see how that sort of loosened up our client who felt like he was getting a version of his CEO’s perspective. And it seemed to make him more willing to make some of these decisions and keep the process moving forward. It did get us wondering, like, are synthetic CEOs the future? I mean, they certainly are cheaper.
Nothing will substitute for the real thing, or for learning from real failures. But in so many environments, including sports, you get to train and practice before the match. This is the same idea, and it becomes possible, and easier to implement, by feeding the technology the right context. It makes me think that companies could, or even should, provide such datasets from the start, e.g. a ‘Mike Smith’ persona, so you can learn how he interacts and practice with him. This should also be backed by science, by combining it with psychological and personality profiling (the Big Five, etc.).
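Here is one way to "feed the technology with context": a minimal Python sketch that turns a colleague's Big Five scores and a few of their documents into a role-play system prompt for an LLM. The PersonaProfile and build_persona_prompt names, the 0-1 trait scale and the 0.33/0.66 cut-offs are all hypothetical, not a validated psychometric method.

```python
from dataclasses import dataclass, field

# Big Five (OCEAN) traits, scored here on a hypothetical 0.0-1.0 scale.
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")


@dataclass
class PersonaProfile:
    name: str
    role: str
    traits: dict  # trait name -> score in [0, 1]
    context_docs: list = field(default_factory=list)  # plans, speeches, posts


def describe_trait(trait: str, score: float) -> str:
    # Illustrative cut-offs, not a validated mapping.
    if score >= 0.66:
        level = "high"
    elif score <= 0.33:
        level = "low"
    else:
        level = "moderate"
    return f"{level} {trait}"


def build_persona_prompt(profile: PersonaProfile,
                         max_context_chars: int = 4000) -> str:
    """Assemble a role-play system prompt for an LLM-backed practice partner."""
    missing = [t for t in BIG_FIVE if t not in profile.traits]
    if missing:
        raise ValueError(f"missing Big Five scores: {missing}")

    traits = ", ".join(describe_trait(t, profile.traits[t]) for t in BIG_FIVE)
    # Join the source material and cap its length so the prompt stays bounded.
    context = "\n---\n".join(profile.context_docs)[:max_context_chars]
    return (
        f"You are role-playing {profile.name}, {profile.role}.\n"
        f"Personality (Big Five): {traits}.\n"
        "Stay in character, give candid feedback, and say so when the "
        "material below does not cover a question.\n"
        f"Reference material:\n{context}"
    )
```

The resulting string would be sent as the system message to whichever chat model you use, with a voice layer handling speech in and out.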
Technical Training - This domain showcases perhaps the most practical applications of voice AI. Engineers and technicians can now receive hands-free, step-by-step guidance while working on equipment, significantly reducing the risk of errors in critical procedures. The AI adapts to the user's pace, provides real-time troubleshooting support, and can even detect potential safety issues before they become problems. This has been particularly valuable in industries like manufacturing and healthcare, where precision and safety are paramount. For instance, a leading manufacturing plant reported a 60% reduction in training-related incidents after implementing voice AI guidance for their maintenance procedures.
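To make "adapts to the user's pace" concrete, here is a minimal Python sketch of a hands-free procedure guide in which the technician advances only when they say so. It assumes a speech-to-text layer already turns speech into short commands such as "next", "back" or "repeat"; the command words and the ProcedureGuide class are hypothetical.

```python
class ProcedureGuide:
    """Steps through a procedure one spoken command at a time."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("a procedure needs at least one step")
        self.steps = list(steps)
        self.index = 0

    def current(self) -> str:
        return f"Step {self.index + 1} of {len(self.steps)}: {self.steps[self.index]}"

    def handle(self, utterance: str) -> str:
        """Map a transcribed command to the text the assistant should speak."""
        command = utterance.strip().lower()
        if command in ("next", "done"):
            if self.index == len(self.steps) - 1:
                return "Procedure complete."
            self.index += 1
        elif command in ("back", "previous"):
            self.index = max(0, self.index - 1)
        elif command not in ("repeat", "again"):
            return "Sorry, please say 'next', 'back' or 'repeat'."
        return self.current()


guide = ProcedureGuide(["Lock out the power supply", "Remove the guard", "Replace the belt"])
print(guide.current())         # Step 1 of 3: Lock out the power supply
print(guide.handle("Next"))    # Step 2 of 3: Remove the guard
print(guide.handle("repeat"))  # Step 2 of 3: Remove the guard
```

Keeping the step logic separate from speech recognition means the same guide can be tested with typed commands before it is wired to a microphone.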
Meeting Mediator - Voice AI can serve as an impartial third party in heated discussions, analyzing conversation patterns and emotional undertones in real-time. For example, it could notify participants when someone is being repeatedly interrupted or when the discussion is becoming unnecessarily confrontational. One tech company reported that implementing this reduced meeting tensions and improved participation from quieter team members by 40%.
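Detecting repeated interruptions is one of the more tractable parts of a meeting mediator. A minimal Python sketch, assuming a diarization step has already produced speaker-labelled segments with start and end times in seconds: a speaker counts as interrupted when someone else starts talking before their segment ends. The Segment type and the alert threshold are hypothetical, and this simple overlap rule will also count short backchannels like "mm-hmm" as interruptions.

```python
from collections import Counter
from typing import NamedTuple


class Segment(NamedTuple):
    speaker: str
    start: float  # seconds from meeting start
    end: float


def count_interruptions(segments) -> Counter:
    """Count how often each speaker is cut off by someone else."""
    ordered = sorted(segments, key=lambda s: s.start)
    interrupted = Counter()
    for i, seg in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later.start >= seg.end:
                break  # sorted by start, so nothing after this overlaps seg
            if later.speaker != seg.speaker:
                interrupted[seg.speaker] += 1
    return interrupted


def interruption_alerts(segments, threshold: int = 3):
    """Messages a mediator could surface once someone is cut off repeatedly."""
    counts = count_interruptions(segments)
    return [f"{speaker} has been interrupted {n} times."
            for speaker, n in counts.items() if n >= threshold]
```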
Cultural Intelligence Coach - For global companies, voice AI can help employees navigate cultural nuances in communication. Beyond just language translation, it can provide real-time guidance on cultural context, helping professionals adapt their communication style to the people they're speaking with. A sales team using this technology reported a 35% improvement in closing deals with international clients.
Wellness Monitor - Voice AI can analyze subtle changes in speech patterns to detect signs of stress, burnout, or declining engagement. This early warning system helps managers proactively address well-being concerns before they become serious issues. One organization using this approach saw a 45% reduction in stress-related leave.
Documentation Companion - Instead of traditional note-taking, imagine having a voice AI that not only transcribes but actively participates in the documentation process. While working on complex tasks, engineers or doctors could verbally explain what they're doing, with the AI creating structured documentation, highlighting potential issues, and suggesting relevant references or protocols.
Here's a real example that illustrates an innovative application:
At a mid-sized software company, they created what they call a ‘Voice Time Machine’ - developers can have conversations with previous versions of their codebase. By training the AI on historical documentation, commit messages, and team discussions, developers can ask questions like "Why did we make this architectural decision last year?" or "What were the main concerns when we designed this feature?" This has dramatically reduced the time spent digging through old documentation and helped new team members get up to speed faster.
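The company's actual "Voice Time Machine" isn't public, so this is not its implementation. As a rough sketch of the underlying idea, here is plain TF-IDF retrieval over commit messages and design notes in Python, assuming the developer's spoken question has already been transcribed. Note the swap: this ranks existing history rather than "training" a model on it; a production system would more likely use embeddings and pass the top matches to an LLM to phrase a spoken answer.

```python
import math
import re
from collections import Counter


def tokenize(text: str):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(counts: Counter, idf: dict) -> dict:
    # Terms unseen in the history get zero weight.
    return {t: n * idf.get(t, 0.0) for t, n in counts.items()}


def cosine(a: dict, b: dict) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ask(question: str, history: list, top_k: int = 1) -> list:
    """Return the commit messages or notes that best match a transcribed question."""
    doc_counts = [Counter(tokenize(doc)) for doc in history]
    df = Counter(term for counts in doc_counts for term in counts)
    n = len(history)
    # Smoothed inverse document frequency: rarer terms weigh more.
    idf = {t: math.log((1 + n) / (1 + k)) + 1 for t, k in df.items()}

    query = tfidf(Counter(tokenize(question)), idf)
    scored = [(cosine(query, tfidf(c, idf)), doc)
              for c, doc in zip(doc_counts, history)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]
```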
These applications show how voice AI can go beyond simple automation to create new ways of working and thinking. The key is to look for areas where natural conversation can remove friction or create value in unexpected ways.
Voice and Workplace Simulations
Many of the areas above concern training and development, but now I’d like to talk about ‘simulations’. There are two main types:
Job simulations - pre-employment assessments designed to evaluate candidates by having them perform tasks and scenarios that reflect the actual responsibilities of the job they are applying for.
Project simulations - run during employment, these can be any simulation that anticipates how a new product could be built, assesses risks, and so on. The concept is similar but even more powerful, because it isn’t ‘just’ used to decide whether a candidate is suitable for hire; it is also used for:
Skill Development: Employees can practice and refine their skills in a safe setting, allowing for experimentation without the risk of real-world consequences.
Team Collaboration: These simulations often involve group work, helping to build teamwork and communication skills among employees.
Realistic Experience: By mimicking actual project scenarios, employees gain valuable insights into project management, decision-making processes, and resource allocation.
Feedback Mechanism: Participants receive immediate feedback on their performance, enabling them to identify areas for improvement and adjust their approaches accordingly.
Some examples include:
Case Studies: Employees analyze real or hypothetical situations, developing solutions and strategies as they would in an actual project.
Role-Playing Scenarios: Participants assume different roles within a project team to understand various perspectives and responsibilities.
Cross-Functional Projects: Employees from different departments collaborate on a simulated project to enhance interdepartmental communication and cooperation.
Virtual Simulations: Utilizing software tools, employees engage in digital environments that replicate complex project scenarios, allowing for exploration of various outcomes based on their decisions.
While traditional simulations often felt mechanical and scripted, voice AI brings a level of natural interaction that makes these experiences more immersive and effective.
What makes this combination particularly powerful is its ability to create what I call "dynamic learning environments." Unlike traditional e-learning or role-playing exercises, voice-enabled simulations can adapt in real-time to the learner's responses, creating truly personalized training experiences. For example, if a sales representative struggles with handling price objections, the simulation can automatically increase the frequency of such scenarios while adjusting the difficulty based on their progress.
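The adaptive behavior described above can be sketched in a few lines of Python: scenario types the learner keeps failing are drawn more often, and difficulty steps up after a success and down after a failure. The scenario names, the 1-5 difficulty scale and the smoothed failure-rate weighting are illustrative assumptions rather than how any particular product works.

```python
import random
from dataclasses import dataclass


@dataclass
class SkillState:
    attempts: int = 0
    successes: int = 0
    difficulty: int = 1  # hypothetical scale: 1 (easy) to 5 (hard)


class AdaptiveSimulator:
    def __init__(self, scenario_types, seed=None):
        self.state = {s: SkillState() for s in scenario_types}
        self.rng = random.Random(seed)

    def weakness(self, scenario: str) -> float:
        st = self.state[scenario]
        # Smoothed failure rate: (failures + 1) / (attempts + 2), so untried
        # scenarios start at 0.5 instead of never being picked.
        return (st.attempts - st.successes + 1) / (st.attempts + 2)

    def next_scenario(self):
        """Pick the next scenario, favoring the learner's weak spots."""
        types = list(self.state)
        weights = [self.weakness(t) for t in types]
        choice = self.rng.choices(types, weights=weights, k=1)[0]
        return choice, self.state[choice].difficulty

    def record(self, scenario: str, success: bool) -> None:
        st = self.state[scenario]
        st.attempts += 1
        st.successes += int(success)
        # Step difficulty up after a success and down after a failure.
        if success:
            st.difficulty = min(5, st.difficulty + 1)
        else:
            st.difficulty = max(1, st.difficulty - 1)
```

With a seed the draws are reproducible, which helps when testing; in a live simulation the chosen scenario and difficulty would shape the prompt that drives the voice role-play.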
The impact of this technology becomes even more significant when combined with other emerging technologies. Some forward-thinking companies are already experimenting with voice AI-powered simulations in virtual reality environments, creating what some call "the closest thing to real-world experience you can get without real-world consequences." Imagine practicing a high-stakes board presentation where you can not only see your virtual audience but also interact with them through natural conversation, receiving immediate feedback on everything from your tone of voice to your body language.
Looking ahead, we're seeing the emergence of what I call "hybrid simulation environments." These combine voice AI with other technologies like AR/VR and motion tracking to create comprehensive training experiences.
Implementation Challenges and Solutions
Implementing these types of simulations is not easy: there is no one-stop shop yet, nor proper integration into the various Learning Management Systems. We’re at an early stage where experimentation is key, so my suggestion would be to build an internal program in which AI power users map some of the examples above to your company’s and employees’ specific use cases, set goals and milestones, and see what sticks. I think the beauty of these tools is that they’re highly customizable and could become part of the ‘agent’ workforce that every organization will eventually build for itself.
That said, I’m leaving aside some key considerations around data privacy, ethics, and so on. In my view, they fall under AI policy and regulation in general and belong in that basket.
TL;DR
Voice AI represents more than just a new way to interact with machines – it's a fundamental shift in how we might approach work itself. From training and simulations to real-time assistance and cultural navigation, the applications we've discussed are just the beginning.
The key insight is this: every major leap in technology adoption has been marked by reduced friction between humans and machines. The mouse made computers accessible to everyone. The touchscreen made mobile computing intuitive. Now, voice AI promises to remove perhaps the last major barrier – the need to translate our thoughts into typed text.
However, this isn't about replacing writing entirely. Instead, it's about having the right tool for the right job. Writing will remain crucial for tasks requiring precision, permanence, and careful consideration. Voice AI will excel in scenarios requiring natural interaction, real-time feedback, and hands-free operation. It's also an opportunity for employers to bridge the gap with younger workers in particular, who expect to use the same kind of technology at work that they already use at home and at play.
The organizations that will thrive in this new era will be those that thoughtfully integrate voice AI into their workflows, understanding both its potential and limitations. They'll create environments where employees can seamlessly switch between speaking and typing, choosing the most effective medium for each task.
As we look ahead, the question isn't whether voice AI will transform the workplace – it's how quickly and thoroughly this transformation will occur. The technology is here. The use cases are clear. The only remaining question is: are you ready to find your voice in this new world of work?
Ciao,
Matteo