Phase 1 Summary
Screen-to-Soundscape (STS) is a speculative design prototype that re-imagines traditional screen readers by transforming browser content into immersive soundscapes. In Phase 1, our primary objective was to build a prototype that demonstrates how spatial audio can provide a richer, more navigable web experience. The prototype uses layered voices and spatial audio to help users discern a text's location and context within the browser, significantly enhancing digital accessibility. The project's goal is to explore how spatial audio, AI-generated voices, and AI-generated alt-text can create more accessible digital environments by converting traditional screen interfaces into immersive, personalized soundscapes. This approach aims to make digital content more intuitive and engaging, particularly for blind and visually impaired users.
Below is a NotebookLM-generated podcast summary:
Phase 1 Methodology
The STS project employed a collaborative, user-centered approach to design and development, emphasizing co-creation as a core methodology. The process involved engaging with blind and visually impaired co-creators through a series of guided sessions to ensure the tool was tailored to their unique needs and preferences.
Phase 1 Co-Creation
A group of six people in a well-lit meeting room, gathered around a large rectangular table. Laptops, microphones, and other devices are scattered on the table, suggesting a collaborative work session. A guide dog is resting on the floor next to one participant. The group appears to be engaged in a discussion, with one person standing while others are seated, listening attentively.
The first phase involved building relationships with potential co-creators and establishing an open dialogue about the project's aims and research questions. We partnered with Constant vzw, a Brussels-based arts organization, and reached out to artists, disability organizations, and assistive technology experts to create a supportive ecosystem. We conducted initial co-creation sessions with three screen reader users (Chris, Bruno, and Raphael), where we introduced our prototype and engaged them in experimental design exercises. Through deep listening practices, technology demonstrations, and guided discussions, we explored how screen readers currently function and how users interact with digital content. This phase emphasized understanding how different individuals use screen readers, allowing us to identify the variations and challenges they face, particularly when accessing complex visual information.
Prototype Development and Testing
Our team developed an initial prototype using A-Frame, a web framework for creating virtual reality (VR) experiences, and hosted it on Glitch. This prototype converted a basic webpage into a virtual soundscape, where users could navigate using keyboard commands, triggering audio elements as they approached specific points. The first co-creation session revealed several insights: while participants appreciated the spatial layout and freedom of exploration, they also expressed concerns about the lack of auditory cues to indicate their location and boundaries within the virtual space.
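To make the navigation mechanism described above concrete, here is a minimal sketch of how a proximity-triggered sound can be expressed as an A-Frame component. It is a simplified stand-in for the prototype code; the component name, trigger radius, and comments are our own placeholders.

```js
// Illustrative sketch, not the actual prototype code: an A-Frame component that
// plays an entity's positional sound when the listener moves within a trigger radius.
AFRAME.registerComponent('proximity-audio', {
  schema: {
    radius: { type: 'number', default: 2 } // trigger distance in metres (placeholder value)
  },
  init: function () {
    this.cameraEl = this.el.sceneEl.querySelector('[camera]'); // entity carrying the user's camera/listener
    this.playing = false;
  },
  tick: function () {
    const sound = this.el.components.sound;  // A-Frame's built-in sound component
    if (!sound) { return; }
    const camPos = this.cameraEl.object3D.position;
    const distance = this.el.object3D.position.distanceTo(camPos);
    if (distance < this.data.radius && !this.playing) {
      sound.playSound();                     // speak/play this element's audio
      this.playing = true;
    } else if (distance >= this.data.radius && this.playing) {
      sound.stopSound();                     // stop when the user moves away
      this.playing = false;
    }
  }
});
```

An entity representing a paragraph of text would then combine something like `sound="src: url(paragraph1.mp3); positional: true"` with `proximity-audio`, while the camera's standard keyboard controls handle movement.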
Based on this feedback, we developed a second prototype with enhanced features such as audio boundaries, additional keyboard controls, and refined sound parameters to provide clearer auditory cues. We adjusted the distance model to ensure users could differentiate between nearby and distant sounds, making navigation more intuitive. This iterative process, informed by co-creator feedback, allowed us to refine the tool in a way that balanced functionality, aesthetics, and usability.
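To give a concrete sense of that distance-model adjustment, the snippet below tightens attenuation on every sounding entity; the specific values are illustrative rather than the exact parameters used in the second prototype.

```js
// Illustrative values only: steepen attenuation so nearby and distant sounds
// are easier to tell apart. Applied to every entity carrying a sound component.
document.querySelectorAll('[sound]').forEach((el) => {
  el.setAttribute('sound', {
    distanceModel: 'inverse', // volume falls off roughly with 1/distance
    refDistance: 1,           // full volume within one metre of the source
    rolloffFactor: 3,         // steeper drop-off than the default of 1
    maxDistance: 20           // distance is clamped here, so far sounds stay faintly audible
  });
});
```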
Phase 1 Results
The co-creation sessions yielded valuable insights into the unique ways that blind and visually impaired users navigate digital content and the limitations they encounter with current screen readers. Key findings include:
Spatial Awareness:
Participants expressed that traditional screen readers often "flatten" web experiences by reducing content to linear lists, eliminating the spatial context essential for understanding complex information, such as maps or images. The STS prototype addressed this by providing an environment where audio cues conveyed spatial relationships, enabling users to explore content more naturally.
Customized Interactions:
Through our co-creation process, we discovered the importance of customizable features, such as naturalistic voice options and personalized alt-text generation. Users preferred having more control over how they experienced digital content, with options to adjust voice characteristics, sound localization, and movement within the soundscape. For the demo, however, we used ElevenLabs and ChatGPT for voice and alt-text generation. Given the positive feedback, we aim to create a free and open-source alternative to these API services.
Usability and Navigation:
While participants appreciated the freedom to explore content in the soundscape, they also highlighted challenges, such as difficulty navigating without clear auditory cues and the overwhelming complexity of multiple layered voices. These insights guided the refinement of our prototype, leading to the introduction of more precise audio boundaries, improved keyboard controls, and options to manage auditory complexity.
Sound Methods:
The sound cues developed for the STS project were designed to enhance information flow by creating auditory layers that coexisted without conflict, each conveying specific types of information. These sound cues function in both the foreground and background, complementing vocalized text to signal transitions and the introduction of new information. Drawing inspiration from Daniel R. Montello’s theories of space, we aligned sound cues with different spatial scales—immediate, vista, environmental, and geographical spaces—to reflect how users naturally navigate physical environments.
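The mapping below sketches how those scales relate to the cue layers described above; the layer names, example cues, and distance thresholds are illustrative placeholders rather than a finalized specification.

```js
// Illustrative mapping from spatial scales to sound-cue layers; names and
// thresholds are placeholders, not a finalized specification.
const CUE_LAYERS = {
  immediate:     { layer: 'foreground', example: 'short click as a word or link gains focus' },
  vista:         { layer: 'midground',  example: 'tone marking the edge of the current section' },
  environmental: { layer: 'background', example: 'ambient bed identifying the page region' },
  geographical:  { layer: 'background', example: 'low drone signalling a move to another page or site' }
};

// Choose a cue layer for a source at a given distance (in metres) from the listener.
function cueLayerFor(distance) {
  if (distance < 1.5) { return CUE_LAYERS.immediate; }
  if (distance < 6)   { return CUE_LAYERS.vista; }
  if (distance < 20)  { return CUE_LAYERS.environmental; }
  return CUE_LAYERS.geographical;
}
```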
Overall, the co-creation sessions demonstrated the potential of spatial audio to enhance web accessibility for visually impaired users, providing a more engaging and intuitive experience than conventional screen readers.
Phase 1 User Interaction Demo Video
You can try our first prototype below. Use the arrow keys to navigate. For the best experience, please use Google Chrome, view it in full screen, and wear headphones. If you would like to interact with the French version of the demo, please go to https://frscreen2soundscape-cocreation2-soundcues.glitch.me/
You can try our second prototype below. Use the arrow keys to navigate. For the best experience, please use Google Chrome, view it in full screen, and wear headphones. If you would like to interact with the French version of the demo, please go to https://frscreen2soundscape-cocreation2-soundcues.glitch.me/
Phase 1 Customized Alt-Text Generation and Image-to-Soundscape Test
Below is an example of our prototype's alt-text and image-to-soundscape generation for The Garden of Earthly Delights by Hieronymus Bosch, visualized on the left. On the right are four audio files, each an example of the custom alt-text generation we created for the painting. The first audio presents alt-text crafted for an art curator, while the following three were designed for a child, with the second and third taking a more upbeat and lively tone. The final audio incorporates a soundtrack generated using Imaginary Soundscapes, seamlessly blended with the alt-text.
Hieronymus Bosch, The Garden of Earthly Delights, oil on oak panels, 205.5 cm × 384.9 cm (81 in × 152 in), Museo del Prado, Madrid
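As a sketch of how such audience-specific descriptions can be requested, the snippet below calls an image-capable chat model with the target audience folded into the prompt. The model name, prompt wording, and surrounding plumbing are assumptions, not the exact configuration used for the demo audio files.

```js
// Sketch only: the prompt, model name, and error handling are placeholders,
// not the exact configuration used for the demo audio files.
async function generateAltText(imageUrl, audience) {
  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: 'gpt-4o', // assumption: any vision-capable chat model works here
      messages: [{
        role: 'user',
        content: [
          {
            type: 'text',
            text: `Write alt-text for this painting for ${audience}. ` +
                  'Keep it under 120 words and describe the spatial layout explicitly.'
          },
          { type: 'image_url', image_url: { url: imageUrl } }
        ]
      }]
    })
  });
  const data = await response.json();
  return data.choices[0].message.content;
}

// e.g. generateAltText(boschImageUrl, 'an art curator')
//  vs. generateAltText(boschImageUrl, 'a young child, in an upbeat tone')
```

The returned text would then be voiced (in the demo, via ElevenLabs) and placed in the soundscape like any other audio source.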
Phase 2 Future Directions
Our future directions are divided into two main areas: co-creation goals and technical development goals. These complementary paths will guide the next phase of the STS project, ensuring that we continue to build an accessible and innovative tool that meets the needs of blind and visually impaired users while also advancing the technical capabilities of our application.
Co-Creation Goals
Expanded Co-Creation:
In the next phase, we will continue to engage deeply with our blind and visually impaired co-creators, facilitating more comprehensive sessions that will help us better understand their needs and preferences. These sessions will involve ideation, advanced prototyping, and testing with a wider range of visual content, such as charts, infographics, and other complex non-textual materials. By exploring a diverse set of digital elements, we aim to ensure that the tool remains responsive and adaptable to various real-world browsing scenarios. This iterative process will allow us to refine the tool's features and functionalities, ensuring it is truly aligned with the experiences of our target users.
Open-Source Development and Community Engagement:
We intend to actively promote STS within the open-source community, encouraging developers, sound designers, accessibility advocates, and other contributors to participate in its growth and evolution. By making the project's code, documentation, and development resources openly available, we hope to create a collaborative ecosystem where STS can adapt and improve through collective innovation. This approach will ensure that the tool remains inclusive, adaptable, and responsive to the changing needs of its users, while also fostering a sense of community ownership and involvement.
Establishing Co-Creation Guidelines:
As a final deliverable, we plan to create a comprehensive set of guidelines for co-creating with blind and visually impaired individuals. This documentation will provide insights, best practices, and methodologies for involving visually impaired communities in design and development processes, helping other projects adopt more inclusive and participatory approaches. By sharing this co-creation methodology, we aim to contribute to broader accessibility practices and advocate for user-centered design principles across various fields.
Technical Goals
Sonification of Images, Maps, and Other Web Elements:
Our primary technical goal is to develop an application capable of converting images, maps, and various web elements into layered soundscapes that reflect the spatial relationships and structures of the content. By sonifying these visual elements, we aim to create an immersive and dynamic experience that allows users to explore and interact with digital content in a multi-sensory way. This process will involve advanced spatial audio techniques to ensure that the soundscapes are intuitive and accurately represent the visual context of the web content.
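One possible way to approach this, sketched here under the assumption that image regions and their spoken descriptions are already available, is to give each described region its own positional source whose placement mirrors the image layout:

```js
// Sketch of a possible approach, not an implemented feature: each described image
// region becomes a positional audio source placed according to the image layout.
// `regions` is assumed to be [{ x, y, audio }] with x/y in 0–1 image coordinates
// and `audio` an AudioBuffer holding the spoken description of that region.
function regionsToSources(ctx, regions) {
  return regions.map((region) => {
    const panner = new PannerNode(ctx, {
      panningModel: 'HRTF',             // binaural rendering for headphone listening
      positionX: (region.x - 0.5) * 10, // image left/right → metres left/right
      positionY: (0.5 - region.y) * 4,  // image top/bottom → above/below the listener
      positionZ: -3                     // place the whole image in front of the user
    });
    const source = new AudioBufferSourceNode(ctx, { buffer: region.audio });
    source.connect(panner).connect(ctx.destination);
    return source;                      // caller decides when to start() each source
  });
}
```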
Implementing Open-Source Naturalistic Voice Customization:
To provide a more engaging auditory experience, we will integrate advanced text-to-speech (TTS) engines that offer naturalistic, expressive voice options. Users will have the ability to customize voice settings, including pitch, speed, and tone, enabling them to tailor the experience to their preferences. Achieving real-time processing and maintaining clarity will be critical in ensuring that users can personalize their auditory experience without compromising system performance. This level of customization will make the tool more interactive, responsive, and user-centered.
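As a minimal stand-in for that pipeline, the browser's built-in Web Speech API already exposes the kind of controls we have in mind; an open-source naturalistic engine would replace `speechSynthesis` here while keeping a similar preference interface. The preference names are placeholders.

```js
// Minimal sketch using the Web Speech API as a stand-in for the planned
// open-source naturalistic TTS engine; the preference names are placeholders.
function speak(text, prefs = {}) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = prefs.rate ?? 1.0;     // speaking speed (0.1–10)
  utterance.pitch = prefs.pitch ?? 1.0;   // voice pitch (0–2)
  utterance.volume = prefs.volume ?? 1.0; // loudness (0–1)
  const match = window.speechSynthesis
    .getVoices()
    .find((voice) => voice.name === prefs.voiceName);
  if (match) { utterance.voice = match; }
  window.speechSynthesis.speak(utterance);
}

// e.g. speak('Main navigation, four links.', { rate: 1.2, pitch: 0.9 });
```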
Creating an Open-Source Highly Adaptable Alt-Text Generation Tool:
Another key goal is to develop a customizable alt-text system that enables users to generate or modify descriptive text for images, maps, and other visual elements. This will involve building a user-friendly interface that allows individuals to easily input and edit text descriptions. The system must be adaptable, supporting various media types and integrating seamlessly with the overall soundscape. This feature will give users more control over how visual content is described and interpreted, ensuring that the auditory experience is both accurate and meaningful.
Achieving Real-Time Responsiveness and Interaction:
We aim to design an application capable of processing text, images, maps, and videos in real-time, transforming them into an interactive spatial audio environment. The application should be responsive to navigational inputs, such as mouse movements or keyboard commands, and dynamically adjust sound localization based on user interactions. This responsiveness is crucial to providing an intuitive and immersive experience that adapts to the user's movements and preferences, making digital navigation more fluid and engaging.
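The sketch below illustrates the interaction loop we are aiming for: arrow keys move the Web Audio listener so that every positional source is re-localized as the user navigates. The step size and key bindings are placeholders.

```js
// Illustrative only: arrow keys move the listener through the soundscape and the
// Web Audio API re-localizes all positional sources relative to the new position.
const ctx = new AudioContext();
const listener = ctx.listener;
let x = 0, z = 0;

document.addEventListener('keydown', (event) => {
  const step = 0.5; // metres per key press (placeholder)
  if (event.key === 'ArrowUp')    { z -= step; }
  if (event.key === 'ArrowDown')  { z += step; }
  if (event.key === 'ArrowLeft')  { x -= step; }
  if (event.key === 'ArrowRight') { x += step; }
  if (listener.positionX) {
    // Modern browsers expose the listener position as AudioParams.
    listener.positionX.setValueAtTime(x, ctx.currentTime);
    listener.positionZ.setValueAtTime(z, ctx.currentTime);
  } else {
    // Fallback for browsers that only implement the older setPosition() call.
    listener.setPosition(x, 0, z);
  }
});
```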
Generalizability Across Different Platforms and Websites:
While full compatibility across all websites may not be feasible in this phase, we will begin by focusing on image- or map-based platforms, such as Google Maps or OpenStreetMap, to refine the tool's functionality. As we address initial challenges, we will gradually expand to tackle broader web-based complexities, such as handling dynamic JavaScript-rendered content, managing intrusive ads, and ensuring that AI models and voice generators remain responsive. A key aspect will be prioritizing relevant information, enabling the tool to identify and emphasize important content for an enhanced user experience.
Creating Tutorials and Documentation for User Learning:
To support the adoption and use of our tool, we will develop comprehensive tutorials that guide users through the navigational techniques and features of the application. These tutorials will be designed to be fully accessible, enabling users—especially those who are blind or visually impaired—to learn, adapt, and use the application independently. Clear documentation will ensure that users can maximize the tool's potential and personalize their experience, further enhancing its value as an accessibility solution.
By pursuing these co-creation and technical goals, Screen-to-Soundscape aims to evolve into a powerful, user-centered tool that redefines how blind and visually impaired individuals navigate digital content. This holistic approach ensures that our project not only addresses technical challenges but also remains grounded in the real-world experiences and insights of our co-creators, ultimately fostering a more inclusive and innovative digital landscape.
Thank you
A special shoutout to the organizations that believed in and funded our project: Constant, the Processing Foundation, and the Stimuleringsfonds.
Thank you to Luis Morales-Navarro for mentoring the team throughout this project.
A heartfelt thank you to our co-creators, Bruno, Chris, Raphael, and Joris, whose insights were fundamental to this design!