AI systems are becoming more human-like than ever. Recently, in the domain of audio processing, they have sometimes surpassed humans in mastering emotional expressions, non-verbal cues and vocal intonations.
For example, Google’s AI note-taking and research app, NotebookLM, introduced an “Audio Overview” feature in mid-September. This function can transform a written paper into a podcast, well-suited for those who prefer listening over reading. The output not only presents content that flows coherently but also delivers vocal expressions that sound just right.
Another application, called Hume, acts like a psychological counselor. Users can chat with it through their phone’s microphone, and it can respond to their positive and negative emotions. It then offers guidance to help them manage their thoughts and feelings.
In other words, AI has progressed to the point where it not only exhibits subtle vocal expressions but can also detect a person’s emotional state as they speak.
What can AI do with this capability? The answer is: assist humans in listening.
This is particularly useful in public affairs, with elections being a prime example. When Boston Mayor Michelle Wu ran for office in 2021, she employed the “Real Talk For Change” platform to respond to voters’ concerns. The tool was a digital platform developed by Cortico, a nonprofit AI startup team.
Last year, Cortico developed a mobile app that is integrated with its platform. The app records the speech of all participants during small-group discussions. AI then analyzes the words and expressions used by the speakers, categorizes and summarizes key points, and ultimately helps identify resonant insights expressed by the participants, publish them online, and present them to decision makers.
In the past, decisions were typically made through voting, which might be limited to a handful of options that fail to convince everyone. This method is different: by listening broadly, it discovers the rough consensus and shared values among participants. This makes it easier for people to listen to each other across groups, bridging the gap between opposing sides.
AI’s ability to “understand” human speech compensates for humans’ innate limitations. When one person tries to listen to another, many variables come into play, such as the listener’s personality, character, rational and emotional tendencies, and cognitive abilities. Even with professional training, it is challenging to maintain a constant state of active listening. AI, in contrast, is free of these interferences; devoid of an ego, it can serve as a tireless listener.
However, even as AI’s capabilities become increasingly sophisticated, that does not mean it can completely replace humans. As Cortico’s example shows, conversations still need to take place between people, rather than each person interacting individually with machines.
In this context, AI serves to build bridges between people, helping them find genuine consensus and filling in a missing piece of the communication puzzle. Even if there is a vast chasm of differences between individuals, as AI’s functions grow ever stronger, there is a chance for that gap to be bridged.