Prototype
The GitHub link to the Screen-to-Soundscape prototype is coming soon!
Prototype Development
The goal of this prototype is to showcase current spatial audio technology and spark discussion about its potential uses and interactions during the co-creation session, rather than to propose a concrete solution. To achieve this, we developed a technical prototype that transforms a basic webpage into an immersive soundscape. The prototype was built using A-Frame (https://aframe.io/), a web framework for creating VR experiences, and is hosted on Glitch (https://glitch.com/).
The sound component in A-Frame allows us to position audio sources within a virtual space and simulates the effect of hearing them from different directions. When browsing the space, users navigate with the arrow keys to move forward, backward, left, and right, much like in a conventional VR experience. In most VR applications, users can also use the mouse to look around the scene; however, we found this interaction difficult to operate without visual feedback. That said, it would be interesting to support it with a VR headset, where the view orientation corresponds directly to the user's head orientation, resulting in a more natural interaction.
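To give a sense of how this works, below is a minimal sketch of an A-Frame scene with a single positional audio source and an arrow-key-controlled camera. The asset path, positions, and colour are illustrative placeholders, not the actual prototype code.

<html>
  <head>
    <!-- A-Frame provides the scene, entity, and sound primitives used below. -->
    <script src="https://aframe.io/releases/1.5.0/aframe.min.js"></script>
  </head>
  <body>
    <a-scene>
      <!-- A dot that plays a spatialized recording; the audio file is a placeholder. -->
      <a-sphere position="0 1.5 -3" radius="0.2" color="red"
                sound="src: url(assets/title.mp3); autoplay: true; loop: true">
      </a-sphere>
      <!-- wasd-controls also responds to the arrow keys. Mouse look (look-controls)
           is available but, as noted above, hard to use without visual feedback. -->
      <a-entity camera look-controls wasd-controls position="0 1.6 0"></a-entity>
    </a-scene>
  </body>
</html>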
Design of the first prototype
For the first prototype, we chose to convert a Wikipedia page about screen readers into a virtual soundscape. The layout of this virtual space is shown in Figure 1. Different coloured dots represent various audio elements from the Wikipedia page, such as the title, headers, and content paragraphs. In the experience, users begin near the first red dot, which corresponds to the page’s title, and can explore the space freely using the arrow keys. Audio is triggered only when users approach the relevant dot. It’s important to note that the visual representation is simply a development aid and is not intended to assist with using the prototype.
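One way to trigger playback only when the user approaches a dot is a small custom component like the sketch below; the component name, distance threshold, and asset path are our own illustrative assumptions, not necessarily how the prototype implements it.

<script>
  // Hypothetical proximity trigger: play this entity's sound only while the
  // camera is within `radius` metres of it.
  AFRAME.registerComponent('proximity-sound', {
    schema: { radius: { type: 'number', default: 2 } },
    init: function () {
      this.camPos = new THREE.Vector3();
      this.playing = false;
    },
    tick: function () {
      var cam = this.el.sceneEl.camera;               // active three.js camera
      if (!cam || !this.el.components.sound) { return; }
      cam.getWorldPosition(this.camPos);
      var dist = this.el.object3D.position.distanceTo(this.camPos);
      if (dist < this.data.radius && !this.playing) {
        this.el.components.sound.playSound();
        this.playing = true;
      } else if (dist >= this.data.radius && this.playing) {
        this.el.components.sound.stopSound();
        this.playing = false;
      }
    }
  });
</script>

<!-- Attached to a dot alongside its sound component: -->
<a-sphere position="2 1.5 -4" radius="0.2" color="blue"
          sound="src: url(assets/header1.mp3); loop: true"
          proximity-sound="radius: 2"></a-sphere>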
You can try our first prototype below. Use the arrow keys to navigate. For the best experience, please use Google Chrome, switch to full screen, and wear headphones.
You can experience the prototype via this link (https://screen2soundscape-cocreation1.glitch.me/) and access the code here (https://glitch.com/edit/#!/screen2soundscape-cocreation1?path=index.html%3A1%3A0).
Feedback from first co-creation
Most participants found the keyboard navigation easy and intuitive. However, a key challenge that emerged during our discussion was navigating the virtual space without visual feedback. As one participant explained, "It may be easy for able-bodied people to know where they are, but for those without vision, they have to rely on constant cues to navigate physical spaces, like bumps on the street." Since audio sources only play when users move close to them, it is difficult for users to determine their location, or where to move next, when no sound is playing. Another issue relates to the boundaries of the space: we did not implement any limits, so users could easily wander beyond the designated audio area and become disoriented.
Despite the challenges, participants praised the spatial layout and freedom of exploration supported by the prototype. These features break away from the limitations of traditional screen readers, which typically “reduce everything to a list,” stripping away the spatial context of a website. Participants also noted that the prototype could be especially useful for exploring visual materials, such as maps and graphs, which is impossible using standard screen readers.
Reflection on the first co-creation
The discussion on navigation during the first co-creation session led us to reflect on how both able-bodied and visually impaired individuals perceive and navigate physical spaces, and how this understanding can be adapted to the digital realm. In our research, we were particularly interested in Daniel Montello’s theory of psychological space, which categorizes spaces based on their scale. Montello identifies four types of psychological spaces:
Figural space: Smaller than the body, where properties can be perceived directly from one place without significant movement.
Vista space: Similar in scale to the body, visible from a single location without substantial movement.
Environmental space: Larger than the body, requiring movement and exploration to comprehend.
Geographical space: Vastly larger than the body, not navigable through movement, and understood through symbolic representations.
We found this classification useful as a framework for conceptualizing navigation, which can be applied to both digital spaces and aural environments. In the context of Screen2Soundscape, we believe that an accessible online sonic environment should provide direct access to immediate information (figural and vista spaces) while also allowing users to explore broader contexts through interaction (environmental space). Additionally, it could offer extra contextual information through techniques like sonification (geographical space).
Design of the second prototype
In the second prototype, the first change we implemented was to restrict the user's movement to the area around the audio sources, with an audio cue that notifies users when they collide with the boundary.
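A sketch of one way such a boundary could be implemented, as a custom component on the camera entity that clamps its position and plays a cue sound on contact; the bounds, component name, and cue asset are placeholders rather than the prototype's actual code.

<script>
  // Hypothetical boundary component: keep the camera inside a rectangular area
  // around the audio sources and play a short cue when the user hits an edge.
  AFRAME.registerComponent('bounded-movement', {
    schema: {
      min: { type: 'vec3', default: { x: -5, y: 0, z: -10 } },
      max: { type: 'vec3', default: { x: 5, y: 3, z: 2 } }
    },
    init: function () { this.wasHitting = false; },
    tick: function () {
      var p = this.el.object3D.position;
      var cx = Math.min(Math.max(p.x, this.data.min.x), this.data.max.x);
      var cz = Math.min(Math.max(p.z, this.data.min.z), this.data.max.z);
      var hit = (cx !== p.x || cz !== p.z);
      p.x = cx;
      p.z = cz;
      if (hit && !this.wasHitting) {
        var cue = document.querySelector('#boundary-cue');
        if (cue && cue.components.sound) { cue.components.sound.playSound(); }
      }
      this.wasHitting = hit;
    }
  });
</script>

<!-- The camera entity is clamped to the audio area; the cue is a non-positional sound. -->
<a-entity camera wasd-controls position="0 1.6 0"
          bounded-movement="min: -5 0 -10; max: 5 3 2"></a-entity>
<a-entity id="boundary-cue"
          sound="src: url(assets/bump.mp3); positional: false"></a-entity>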
In addition to triggering audio playback based on proximity, we introduced two new keyboard controls to enable more flexible exploration. Users can now press the spacebar to toggle playback of all audio sources on or off. After testing, we also adjusted the sound component's parameters: we changed the distance model from the default inverse model to an exponential one and set the rolloffFactor and refDistance to 3. This adjustment makes the volume fall off quickly with distance, so users mainly hear nearby sources rather than a confusing mix of every audio source playing at once. The goal is to provide location cues without overwhelming users with too much information at once.
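A sketch of the adjusted sound settings, together with one possible spacebar handler; the asset path and the way audio entities are selected are illustrative assumptions.

<!-- Exponential falloff so that only nearby sources are clearly audible. -->
<a-sphere position="2 1.5 -4" radius="0.2" color="blue"
          sound="src: url(assets/header1.mp3); loop: true;
                 distanceModel: exponential; rolloffFactor: 3; refDistance: 3"></a-sphere>

<script>
  // Spacebar toggles playback of every entity that has a sound component.
  var allPlaying = false;
  window.addEventListener('keydown', function (e) {
    if (e.code !== 'Space') { return; }
    document.querySelectorAll('[sound]').forEach(function (el) {
      if (!el.components.sound) { return; }
      if (allPlaying) {
        el.components.sound.stopSound();
      } else {
        el.components.sound.playSound();
      }
    });
    allPlaying = !allPlaying;
  });
</script>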
Another control allows users to press the left shift key to trigger the playback of the nearest audio source. When users do not wish to hear multiple voices at the same time, they can use this feature to navigate to the closest audio source.
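A possible implementation of the left-shift control, assuming audio sources can be found by querying every entity that carries a sound component:

<script>
  // Left Shift plays only the audio source closest to the camera.
  window.addEventListener('keydown', function (e) {
    if (e.code !== 'ShiftLeft') { return; }
    var scene = document.querySelector('a-scene');
    if (!scene || !scene.camera) { return; }
    var camPos = new THREE.Vector3();
    scene.camera.getWorldPosition(camPos);

    var nearest = null;
    var nearestDist = Infinity;
    document.querySelectorAll('[sound]').forEach(function (el) {
      var d = el.object3D.position.distanceTo(camPos);
      if (d < nearestDist) { nearestDist = d; nearest = el; }
    });
    if (nearest && nearest.components.sound) {
      nearest.components.sound.playSound();
    }
  });
</script>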
Ideally, we also wanted to introduce a keyboard control that would let users navigate the nested structure of audio sources. For example, when a user reaches a paragraph, they could press a key to browse the headers within which the paragraph is embedded. This idea came from our observation that current screen readers group elements by type—such as links or headers—without considering the contextual information around them. By implementing this feature, we aimed to support more meaningful, semantic exploration of web content. However, due to time and technical constraints, we were unable to include this feature in the second prototype.
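For illustration only, a hypothetical version of this unimplemented feature could record each paragraph's parent header in the markup; all ids, positions, and asset paths below are placeholders.

<!-- Hypothetical markup: each paragraph dot records the id of the header it sits under,
     so a key handler (similar to the left-shift one above) could look up the nearest
     paragraph's data-parent attribute and play that header's audio instead. -->
<a-sphere id="header-history" position="0 1.5 -6" radius="0.2" color="green"
          sound="src: url(assets/header-history.mp3)"></a-sphere>
<a-sphere position="1 1.5 -7" radius="0.2" color="yellow" data-parent="header-history"
          sound="src: url(assets/para1.mp3)"></a-sphere>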
You can try our second prototype below. Use the arrow keys to navigate. For the best audio experience, please wear headphones.
The second prototype can be experienced here: https://screen2soundscape-cocreation2-soundcues.glitch.me/ and the code can be accessed here: https://glitch.com/edit/#!/screen2soundscape-cocreation2-soundcues?path=README.md%3A1%3A0
Customized Alt-Text Generation and Image-to-Soundscape Prototype
Below is an example of our alt-text generation and image-to-soundscape prototype, applied to The Garden of Earthly Delights by Hieronymus Bosch, visualized on the left. On the right are four audio files, each an example of the customized alt-text we generated for the painting. The first audio presents an alt-text crafted for an art curator, while the following three were designed for a child, with the second and third showcasing a more upbeat and lively tone. The final audio incorporates a soundtrack generated using Imaginary Soundscapes, seamlessly blending with the alt-text.
Hieronymus Bosch, The Garden of Earthly Delights, oil on oak panels, 205.5 cm × 384.9 cm (81 in × 152 in), Museo del Prado, Madrid
Feedback from the second co-creation
Participants appreciated the addition of boundaries, which offered both practical and psychological safety, as well as the keyboard controls, which gave them greater freedom and command over their experience. However, some raised important questions about the content of the prototype. Since they were already familiar with browsing Wikipedia pages using screen readers, they wondered whether it was worthwhile to learn a new method for interacting with similar content. At the same time, they again suggested that spatial exploration could be particularly useful for navigating visual information that relies on spatial layout.
Reflection on the second co-creation
At this stage of the project, despite challenges with language and time constraints, the co-creation sessions have provided a valuable opportunity to connect with participants and sensitize ourselves to their experiences, aspirations, and challenges, both online and in everyday life. The prototype, acting as both a technical probe and a conversational tool, sparked meaningful discussions around specific issues and ideas. While the initial goal of the project was to "reimagine" the web through sound, after learning more about the participants' needs and wishes, we recognized the importance of emphasizing accessibility. This led us to shift the project's technical development towards creating a functional tool that can truly benefit visually impaired communities. Moving forward, we aim to engage participants more deeply in the design and development process, exploring how we can genuinely "co-create" solutions together.
Plan for future co-creation sessions
In the next phase of the project, we plan to focus the co-creation process on three main activities. Each activity will likely take one to two co-creation sessions. For each session, we will establish clear communication guidelines and ensure that all materials are fully accessible. Beyond developing the technical tool, a key outcome of these sessions will be to create a methodology or set of guidelines for co-creating with visually impaired individuals. This can benefit similar initiatives and be applied to future co-creation activities, enabling participants to truly collaborate as partners in the project.
The first activity will center on ideation and concept development. The goal is to engage participants in brainstorming sessions to generate ideas for the tool and better understand their needs. In these sessions, participants will identify the types of information they want to access when exploring visual materials. We will use collaborative methods, such as tactile sketching and verbal descriptions, to encourage creative input on the tool's features. Afterward, we will organize and categorize the ideas from the workshops and collectively create low-fidelity prototypes based on the participants' suggestions.
The second activity focuses on prototyping and testing. Here, we aim to refine the tool's design based on feedback from participants. We will present the low-fidelity prototypes from the first activity using tactile representations, audio descriptions, and screen-reader-accessible versions. Participants will then take part in hands-on testing sessions, navigating and exploring the prototypes. Feedback will be collected on usability, accessibility, and the overall user experience, with a focus on practical concerns such as ease of use and time efficiency. Together, we will finalize the key features of the tool. Once the design is agreed upon, we will begin developing a functional, high-fidelity prototype.
The third activity is advanced testing and refinement. In this phase, we will test the high-fidelity prototype and fine-tune the tool based on user feedback. Usability testing will involve exploring more complex visual content, such as charts and infographics, to evaluate the tool’s effectiveness in real-world browsing scenarios. We will gather insights on functionality, accessibility, and ease of use, iteratively refining and revising the design and features based on the results of these sessions.
Throughout the co-creation process, we will maintain an open dialogue with participants to continuously identify areas for improvement and additional features. After the tool's development, we will continue to evolve it to meet users' changing needs. We will also conduct periodic follow-up sessions to monitor how the tool is being used and determine where further adjustments may be necessary.