An abstract arrangement of 13 colored dots on a green background, forming a loose arrow or tree-like shape pointing upwards. The dots vary in color, including shades of green, blue, yellow, orange, and red, and are spread out with some closer together, giving a sense of layered depth and subtle gradient.

Screen-to-Soundscape takes an experimental approach to reimagining screen readers in order to address their current limitations for blind and visually impaired users. Our goal is to develop a free and open-source exploratory tool that transforms a screen into an immersive soundscape, with a strong focus on providing rich, descriptive alt-text for images and maps. Using open-source computer vision algorithms, our system will analyze visual elements to generate detailed alt-text that users can customize to their preferences, offering a more complete understanding of visual content. The prototype will also feature spatial audio, using multiple layered voices to read out the content, which we expect will make it easier for users to navigate and interact with digital content.
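The page does not specify how the layered voices are positioned in space. As a minimal sketch, assuming a simple stereo renderer (rather than full 3D/HRTF spatialization) and assuming that each element's horizontal position on screen decides where its voice is heard, each voice could be panned with a constant-power pan law and the results mixed together. `ScreenElement`, `pan_position`, `constant_power_gains` and `mix_voices` are hypothetical names for illustration, not part of the Screen-to-Soundscape code.

```python
import math
from dataclasses import dataclass


@dataclass
class ScreenElement:
    """Hypothetical on-screen element to be voiced (illustration only)."""
    text: str
    x: float             # horizontal centre, in pixels from the left edge
    screen_width: float  # total screen width, in pixels


def pan_position(element: ScreenElement) -> float:
    """Map horizontal position to a pan value: -1.0 = far left, 1.0 = far right."""
    ratio = element.x / element.screen_width
    ratio = min(max(ratio, 0.0), 1.0)  # clamp elements that extend off-screen
    return 2.0 * ratio - 1.0


def constant_power_gains(pan: float) -> tuple[float, float]:
    """Constant-power pan law: left**2 + right**2 == 1 for every pan value,
    so a voice keeps roughly the same loudness as it moves across the field."""
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def mix_voices(voices: list[tuple[ScreenElement, list[float]]]) -> list[tuple[float, float]]:
    """Mix several mono voice signals (e.g. TTS output) into one stereo signal,
    panning each voice according to where its element sits on screen."""
    length = max((len(samples) for _, samples in voices), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for element, samples in voices:
        gain_l, gain_r = constant_power_gains(pan_position(element))
        for i, sample in enumerate(samples):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return list(zip(left, right))
```

For example, a heading at the left of the screen and an image caption at the right would be heard from the left and right respectively, so a listener could tell them apart even while both voices speak at once. A real implementation would more likely rely on an audio library's own panner or an HRTF-based renderer; this sketch only shows the underlying idea.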

Our motivation is to provide a more intuitive and engaging navigation experience. Traditional screen readers often skip images, videos, and maps, and offer limited customization, especially in the range of available voices. By combining spatial audio, novel computer vision algorithms, diverse voice options, and a customizable alt-text tool, our tool aims to make all content accessible and lets users personalize their auditory experience, making digital navigation more natural and comprehensive.

Check out our phase 1 prototype demo below.

You can listen to the Screen-to-Soundscape podcast generated with NotebookLM here:

Screen-to-Soundscape was supported by: