Sound and Technology – Hearing the Digital World

This is Part 3 in my series exploring Human–Computer Interaction through the senses. After touch, we now turn to hearing: how sound helps us orient, communicate, and make sense of digital experiences. From voice assistants to spatial audio, the way we design with sound is shaping how technology feels.


TL;DR

  • Hearing in HCI isn’t just output; it’s how people perceive, orient, and emotionally connect with technology.

  • Sound design shapes trust, attention, and experience, whether through alerts, voice, or immersive spatial audio.

  • Inclusive audio requires awareness of hearing differences, cultural context, and ethical boundaries.


Sound – Hearing the Digital World

Maybe you’ve instinctively known which direction a notification came from in your headphones, or found yourself leaning away from a grating echo on a video call. These aren’t just sounds. They’re how the digital world positions itself in your auditory space. What does it mean to hear in a digital environment? And how far can technology go in creating immersive, accessible, human-centered audio?


What Is Hearing?

In Human-Computer Interaction (HCI), hearing is a core sensory channel that allows people to receive feedback, interpret context, and engage emotionally with technology. Whether it’s a spoken command, a spatial audio cue, or the subtle tone of a notification, sound shapes how we navigate and trust digital systems.

In HCI, this area is known as Auditory Interaction, and it includes:

  • Auditory Interfaces: systems using sound to support interaction

  • Auditory Displays: including earcons (brief musical cues), spearcons (sped-up speech sounds), and sonification (data converted into sound; a minimal sketch follows this list)

  • Voice User Interfaces (VUIs): voice-based input and output systems

  • Spatial Audio: sound designed to reflect position and environment

  • Sound Design for UX: crafting emotional tone and feedback cues
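
To make sonification concrete, here is a minimal sketch in Python: a list of data points becomes a short melody, written out as a WAV file. NumPy, SciPy, and the linear value-to-pitch mapping are all my assumptions here; this shows the idea of an auditory display, not how any particular product implements one.

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100  # samples per second, CD-quality mono

def sonify(values, note_s=0.25, f_min=220.0, f_max=880.0):
    """Map each data point to a pitch between f_min and f_max (Hz)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid divide-by-zero on flat data
    t = np.linspace(0, note_s, int(SAMPLE_RATE * note_s), endpoint=False)
    notes = []
    for v in values:
        # Linear map: low values become low pitches, high values high pitches
        freq = f_min + (v - lo) / span * (f_max - f_min)
        tone = 0.5 * np.sin(2 * np.pi * freq * t)
        # 10 ms fade in/out so adjacent notes don't click at their edges
        fade = np.minimum(1.0, np.minimum(t, note_s - t) / 0.01)
        notes.append(tone * fade)
    return np.concatenate(notes)

# Rising-then-falling "temperature" readings become a rising-then-falling melody
signal = sonify([12, 15, 19, 24, 22, 17, 13])
wavfile.write("sonification.wav", SAMPLE_RATE, signal.astype(np.float32))
```

Played back, rising values literally sound like a rising melody, which is the whole premise of sonification.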

Hearing is a complex sensory system that processes multiple layers of information beyond simple sound detection:

  • Frequency Detection: pitch and tone, from the deep bass of a subway to the high whistle of a kettle

  • Amplitude Processing: volume and intensity, like distinguishing a whisper from a shout

  • Temporal Analysis: timing and rhythm, such as recognizing speech patterns or musical beats

  • Spatial Localization: direction and distance, like hearing a car approach from behind

  • Prosody Recognition: emotional tone carried by pitch, pace, and inflection; how we hear mood or intent


Each element plays a distinct role in how people experience and interpret the digital world through sound. From alerts and instructions to emotion and atmosphere, these layers shape how we respond and engage.
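
Several of these layers map directly onto signal processing. As one illustration, here is a toy pitch estimator: frequency detection, and over time the raw material of prosody, reduced to an autocorrelation over a short audio frame. A sketch assuming NumPy, not a production pitch tracker.

```python
import numpy as np

def estimate_pitch(frame, sample_rate, f_lo=80.0, f_hi=400.0):
    """Crude fundamental-frequency estimate via autocorrelation."""
    frame = frame - frame.mean()                  # remove any DC offset
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sample_rate / f_hi)             # shortest period accepted
    lag_max = int(sample_rate / f_lo)             # longest period accepted
    best = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / best

# A 220 Hz test tone should come back as roughly 220
sr = 16000
t = np.arange(sr // 10) / sr                      # one 100 ms frame
print(estimate_pitch(np.sin(2 * np.pi * 220 * t), sr))
```

Track that number across a sentence and you have a pitch contour, one ingredient of the prosody recognition described above.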



Hearing Beyond the Surface

I recently went to an immersive listening experience at a small venue called Envelop SF. The space was carpeted, quiet, and intimate. About 25 people were surrounded by 32 carefully positioned speakers. We listened to two Frank Ocean albums, both favorites of mine.


The first, Channel Orange, sounded brilliant. But the second, Blonde, was something else. It felt completely new. Richer, more layered, more alive. I could hear details I’d never noticed before. The emotional build, the shifts in texture, the space between the sounds. It wasn’t just music. It was a reminder that hearing isn’t passive. It’s how we experience depth.


For many people, experiences like these aren’t luxuries. They’re daily encounters with apps, alerts, and interfaces that either invite them in or leave them out. Just as spatial audio can transform a favorite album, thoughtful sound design can make digital tools feel deeply personal or frustratingly alien. Designers have a responsibility to ensure those auditory interactions are welcoming, respectful, and inclusive.


That’s the power of thoughtful sound design. When we get it right, when we shape how people hear and not just what, they don’t just hear it. They feel it.



How Technology Listens (and Talks Back)



Modern interfaces increasingly depend on sound to guide, inform, and interact:

  • Spatial audio simulates directional hearing to enhance immersion in gaming, VR, and conferencing (a minimal sketch of the underlying cues follows this list)

  • Voice interfaces allow spoken interaction, but still struggle with accents, dialects, and noisy environments

  • Auditory cues (like alerts and earcons) guide behavior, sometimes subtly, sometimes too subtly
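
Production spatial audio depends on head-related transfer functions and head tracking, but the two coarsest directional cues, interaural time and level differences, fit in a few lines. A hedged sketch, again assuming NumPy and SciPy; real systems do far more:

```python
import numpy as np
from scipy.io import wavfile

SAMPLE_RATE = 44100
SPEED_OF_SOUND = 343.0  # metres per second
EAR_SPACING = 0.18      # approximate distance between the ears, in metres

def place_sound(mono, azimuth_deg):
    """Pan a mono signal using interaural time and level differences."""
    az = np.radians(azimuth_deg)   # -90 = hard left, +90 = hard right
    # Time cue: sound reaches the far ear a fraction of a millisecond later
    delay = int(abs(EAR_SPACING / SPEED_OF_SOUND * np.sin(az)) * SAMPLE_RATE)
    # Level cue: constant-power pan keeps overall loudness steady
    left_gain = np.cos((az + np.pi / 2) / 2)
    right_gain = np.sin((az + np.pi / 2) / 2)
    pad = np.zeros(delay)
    if azimuth_deg >= 0:  # source on the right: the left ear hears it late
        left = np.concatenate([pad, mono]) * left_gain
        right = np.concatenate([mono, pad]) * right_gain
    else:                 # source on the left: the right ear hears it late
        left = np.concatenate([mono, pad]) * left_gain
        right = np.concatenate([pad, mono]) * right_gain
    return np.stack([left, right], axis=1)

t = np.linspace(0, 1.0, SAMPLE_RATE, endpoint=False)
ping = 0.4 * np.sin(2 * np.pi * 660 * t) * np.exp(-4 * t)   # decaying ping
wavfile.write("ping_right.wav", SAMPLE_RATE, place_sound(ping, 60).astype(np.float32))
```

On headphones the ping lands convincingly to the right; a sub-millisecond inter-ear delay carries a surprising share of what “direction” means to the brain.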



When sound works well, it vanishes into the background. When sound design fails, it doesn’t just disappoint. It excludes. And even immersive, responsive sound isn’t enough. Real inclusivity means designing for how differently people hear, and that’s where many systems fall short.






Designing for Hearing Differences

Not everyone hears the same way, but not all systems are designed with that in mind.

  • Neurodivergent experiences – alert fatigue, unpredictable volume spikes, and overlapping sounds can cause real discomfort or distress. Minecraft’s granular sound settings, for example, let players reduce high-frequency noise while preserving gameplay

  • Assistive technologies – tools like Google’s Live Transcribe, Apple’s Sound Recognition, and Microsoft’s Narrator offer powerful adaptations; recent improvements include Apple’s custom sound recognition and Microsoft’s more natural-sounding Narrator voices (a toy version of the transcription idea is sketched after this list)

  • Hearing aid innovation – wireless charging, Bluetooth connectivity, and transcription tools like Nagish have reshaped the accessibility landscape

  • Cultural norms – what reads as urgent in one culture might feel invasive in another. Compare Seoul’s melodic subway chimes with the harsh screech of NYC alarms
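
None of this approaches the engineering inside Live Transcribe, but the basic shape of live transcription is easy to sketch with the open-source speech_recognition package for Python (my choice here, along with PyAudio for microphone access; none of the products above work this way):

```python
import speech_recognition as sr  # pip install SpeechRecognition pyaudio

recognizer = sr.Recognizer()
with sr.Microphone() as source:
    recognizer.adjust_for_ambient_noise(source)  # calibrate to background noise
    print("Listening... (Ctrl+C to stop)")
    while True:
        # Capture up to ~5 seconds of speech, then transcribe that chunk
        audio = recognizer.listen(source, phrase_time_limit=5)
        try:
            print(recognizer.recognize_google(audio))
        except sr.UnknownValueError:
            # Exactly the failure mode that accents and noisy rooms trigger
            print("[unintelligible]")
```

Even this toy loop makes the design questions concrete: how long to listen, what to do with silence, and how to fail gracefully when recognition misses.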


Designing inclusive audio means more than turning up the volume.



The Ethics of Listening

As sound becomes more embedded in our tech, we also need to ask: what are the boundaries?

Sound is intimate, but it can also be intrusive. It spills across walls, devices, and expectations.

Whether it’s an always-on mic, an app that plays emotionally manipulative soundtracks, or a smart speaker capturing unintended snippets, the ethics of listening deserve more attention. Who’s doing the listening? Who gave consent? And what’s being done with what they hear?

Consider a meditation app that uses soothing background tracks to encourage relaxation, but also embeds subtle cues that guide users toward in-app purchases. Or a voice assistant that records wake-word errors, capturing conversations never meant to be heard. These blurred boundaries between helpful design and intrusive surveillance are increasingly common.

At the same time, many AI systems show a troubling modality bias, over-prioritizing visual input while mishandling audio. In sound localization tasks, for instance, AI often fails when visual cues are missing, an imbalance that would be unacceptable for humans in similar contexts. This raises fairness concerns and highlights the risks of designing systems that hear poorly, or not at all.


2025 Developments

Several breakthroughs have shaped the auditory HCI landscape this year alone. With AI accelerating both research and capability, we are likely to see much more.


  • Apple Spatial Audio Format (ASAF): announced at WWDC 2025, it goes beyond Dolby Atmos by integrating head tracking, object positioning, and environmental responsiveness

  • Eclipsa Audio: an open-source alternative to Atmos, launched by Samsung and Google

  • Google Meet Real-Time Translation: launched in May 2025, enabling fluent, expressive speech across languages

  • Personalized Voice AI: systems that adapt in real time to individual preferences, memory, and context

  • Expanded accessibility: from Apple’s custom sound alerts to Microsoft’s upgraded Narrator voices and new assistive devices

  • Bias mitigation: new research from the University of Colorado Boulder shows persistent gaps in speech recognition, particularly for children and underrepresented dialects

  • Modality bias in AI: emerging evidence shows that sound localization models overly rely on visual cues, creating ethical and practical challenges in multimodal systems


These trends reinforce the need to treat auditory experience not as a secondary layer, but as a primary design consideration for inclusive, intelligent, and trustworthy systems.


Final Thoughts

Sound is more than an interface; it’s a presence. It cues our attention, stirs our emotions, and shapes our sense of connection in digital spaces.


Designing for hearing isn’t just a technical task. It’s a responsibility. It asks us to understand the emotional, cultural, and cognitive diversity of those on the other end.

If we can design technologies that not only speak but truly listen, attuned to diversity, emotion, and context, we’ll move closer to digital spaces that feel more human. Sound isn’t just something we hear. It’s something we live inside.

Next, we’ll explore how vision shapes our digital experiences, from the subtle glow of a notification to the immersive worlds of augmented reality. If sound is how we feel space, sight might just be how we believe in it.


Try This (60 Seconds)

Say the phrase “I don’t know” out loud. Once neutral. Once frustrated. Once curious. Notice how your voice shifts. Pitch, pace, volume. That’s prosody: emotion carried in sound.

Now sit quietly for 20 seconds. Count how many sounds you hear around you. How many would a device notice? How many would it understand?

