Accessible Safari

A Sonified Safari Experience for the Vision-Impaired Through Real-Time Object Detection and Scene Description


Vision-impaired individuals can face barriers that prevent them from participating in the same experiences as non-vision-impaired individuals. One area where this is relevant, and sometimes overlooked, is in activities related to travel and tourism.

As an investigation into mobile and ubiquitous computational systems for the vision-impaired, our team developed an accessible system that lets vision-impaired individuals take part in a safari and learn about their surroundings through sonification and scene description.

Real-Time Demo Running a YouTube Safari Video as Input

Our system uses computer vision (YOLOv8) to identify and track animals along the x, y, and z axes. Additionally, scene recognition is performed at specified intervals using the multi-modal LLaVA model, and the resulting descriptions are played back using text-to-speech. The audio software, MaxMSP, receives data from YOLO and produces a sonification that conveys real-time information about the animals. The full system runs in a Streamlit web application with a simple UI. While this project focused specifically on a safari experience, the technology can be adapted and developed further for other use cases for the vision-impaired.
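To illustrate what tracking along three axes can mean for a single detection, here is a minimal Python sketch that turns a pixel bounding box into normalized coordinates. The function name `bbox_to_position` is hypothetical, and the depth estimate is an assumption: this page does not say how the z value is derived, so the sketch uses the box's share of the frame as a rough stand-in for distance.

```python
import math


def bbox_to_position(x1, y1, x2, y2, frame_w, frame_h):
    """Convert a pixel bounding box to normalized (x, y, z).

    x, y: box center as a fraction of the frame width/height (0..1).
    z:    ASSUMED depth proxy, 0.0 when the box fills the frame and
          approaching 1.0 as the box shrinks. Not the project's method.
    """
    if frame_w <= 0 or frame_h <= 0:
        raise ValueError("frame dimensions must be positive")

    # Horizontal and vertical center of the box, scaled to 0..1.
    x = ((x1 + x2) / 2) / frame_w
    y = ((y1 + y2) / 2) / frame_h

    # Share of the frame covered by the box, clamped to 1.0.
    area_fraction = (abs(x2 - x1) * abs(y2 - y1)) / (frame_w * frame_h)
    area_fraction = min(area_fraction, 1.0)

    # Square root makes the proxy scale with box side length, not area.
    z = 1.0 - math.sqrt(area_fraction)
    return x, y, z
```

A box proxy like this confuses a small nearby animal with a large distant one, so it should be read only as a placeholder for whatever depth estimate the real pipeline uses.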

System Architecture

Below is a simplified diagram of the system. The main application consists of four parts: a web server (input), MaxMSP (output), an object detection pipeline (processing), and a scene description pipeline (processing). The web server hosts the site that the user interacts with. The application handles three types of input: live webcam footage (real-time sensing), recorded video footage (testing and debugging), and direct YouTube footage (testing and debugging).
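The three input types could be told apart with a small dispatcher like the one below. This is a sketch under stated assumptions, not the project's code: `InputKind`, `YOUTUBE_HOSTS`, and `classify_source` are hypothetical names, and treating an all-digit string as a webcam index simply follows the common convention of addressing cameras by device number.

```python
from enum import Enum
from urllib.parse import urlparse


class InputKind(Enum):
    WEBCAM = "webcam"          # live footage, real-time sensing
    VIDEO_FILE = "video_file"  # recorded footage, testing and debugging
    YOUTUBE = "youtube"        # YouTube footage, testing and debugging


# Hostnames treated as YouTube links (assumed list, extend as needed).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def classify_source(source: str) -> InputKind:
    """Guess which of the three input types a source string refers to."""
    source = source.strip()

    # A bare number such as "0" is assumed to be a camera device index.
    if source.isdigit():
        return InputKind.WEBCAM

    # URLs on a known YouTube host are treated as YouTube footage.
    host = (urlparse(source).hostname or "").lower()
    if host in YOUTUBE_HOSTS:
        return InputKind.YOUTUBE

    # Everything else is assumed to be a path to a recorded video.
    return InputKind.VIDEO_FILE
```

Keeping this decision in one place would let the detection and scene description pipelines consume frames the same way regardless of where they come from.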

System Architecture

Sonification

MaxMSP runs a Node.js server that listens for incoming object detection (YOLOv8) results. For each detected object, a new synth instance is created with a unique, harmonic root note. Values from YOLO are mapped to synth parameters and audio effects. Our approach prioritizes information perception and understanding over musicality and aesthetics.
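The mapping described above might look roughly like the following on the sending side. Everything specific here is an assumption made for illustration: the 110 Hz base frequency, reading "harmonic root note" as successive harmonics of that base, the parameter names (`pan`, `gain`, `brightness`), and the JSON message shape. The transport to the Node.js server is not specified on this page, so the sketch stops at building the message.

```python
import json
from dataclasses import dataclass

BASE_HZ = 110.0  # ASSUMED fundamental; not taken from the project


@dataclass
class Detection:
    track_id: int  # stable ID for one tracked animal
    label: str     # class name, e.g. "zebra"
    x: float       # 0.0 = left edge, 1.0 = right edge
    y: float       # 0.0 = top edge, 1.0 = bottom edge
    z: float       # 0.0 = near, 1.0 = far (depth estimate)


class SynthAllocator:
    """Gives each tracked object its own harmonic of BASE_HZ."""

    def __init__(self):
        self._slots = {}

    def root_hz(self, track_id: int) -> float:
        # First object gets harmonic 1, the next harmonic 2, and so on.
        if track_id not in self._slots:
            self._slots[track_id] = len(self._slots) + 1
        return BASE_HZ * self._slots[track_id]


def to_synth_message(det: Detection, allocator: SynthAllocator) -> str:
    """Build a JSON message with hypothetical synth parameters."""
    params = {
        "id": det.track_id,
        "label": det.label,
        "freq": allocator.root_hz(det.track_id),
        "pan": 2 * det.x - 1,     # -1 = hard left, +1 = hard right
        "gain": 1 - det.z,        # nearer objects are louder
        "brightness": 1 - det.y,  # higher in frame = brighter timbre
    }
    return json.dumps(params)
```

For example, `to_synth_message(Detection(7, "zebra", 0.5, 0.25, 0.5), SynthAllocator())` yields a message with `freq` 110.0, `pan` 0.0, and `gain` 0.5. Because the allocator remembers track IDs, an animal keeps the same root note for as long as the tracker keeps its ID.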

Sonification Built in MaxMSP with Node for Max

More details on the research, design, and technology are described in the research paper, which can be accessed here.