Delivering Sensory and Semantic Visual Information via Auditory Feedback on Mobile Technology

Abstract

This project seeks to create and assess new visual-assistive smartphone apps for fully blind end users. These apps convey information gathered by sensors (color, distance, heat) and artificial intelligence (object recognition) through both spoken verbal feedback and 'musical' audio that intuitively conveys the locations, visual properties, and identities of objects in the environment. Our research purpose is to produce new apps that are tailored to blind end users to increase visual information accessibility, enhance daily functionality, and facilitate new interactions of interest to them. In terms of scope, this two-year project focuses on the development of novel technologies in the first year, with at-home beta testing by fully blind subjects in the second year. The technology development focuses on new iPhone sensors (LiDAR range-finding, plug-in thermal cameras) and their support for state-of-the-art object recognition techniques that run in real time, locally on the iPhone. During year 1, we found that DeepLabV3, which accurately segments object shapes from live visual images, provides new interaction possibilities. Here, the location, shape, size, and identity of recognized objects in a scene can be rapidly presented to users through musical feedback. This represents scenes at the semantic level (objects), going beyond prior technologies that operate at the sensory level (e.g., brightness, distance, heat), and provides a more intuitive understanding of the environment that remains stable across variable conditions. Furthermore, since object identity is known, we provide optional verbal feedback that tells users each object's name and describes its location in the image. This provides live user support and training within the app. Building on this, we are preparing for user testing in year 2, in which blind end users will beta-test our apps that convey information at various levels (e.g., sensory, semantic).
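
As an illustration of the semantic-level feedback described in the abstract, the sketch below takes a small segmentation mask and derives, for each recognized class, audio parameters and a spoken phrase. It is a minimal sketch under stated assumptions, not the project's implementation: the abstract does not specify how position and size map to sound, so the mapping here (horizontal position to stereo pan, vertical position to pitch) is only one common sonification convention. The function `describe_objects`, the `CLASS_NAMES` table, and the 220 to 880 Hz pitch range are hypothetical. All pixels of a class are treated as one object, as in semantic (not instance) segmentation.

```python
from collections import defaultdict

# Hypothetical label map; a real segmentation model ships its own class list.
CLASS_NAMES = {1: "chair", 2: "person"}


def describe_objects(mask, class_names=CLASS_NAMES):
    """Summarize each labeled class in a segmentation mask.

    mask: list of equal-length rows of ints; 0 is background.
    Returns one dict per class with illustrative audio parameters.
    """
    height, width = len(mask), len(mask[0])
    pixels = defaultdict(list)
    for row_index, row in enumerate(mask):
        for col_index, class_id in enumerate(row):
            if class_id != 0:
                pixels[class_id].append((row_index, col_index))

    results = []
    for class_id, coords in sorted(pixels.items()):
        count = len(coords)
        # Centroid normalized to [0, 1]: x runs left to right, y top to bottom.
        cx = sum(c for _, c in coords) / count / max(width - 1, 1)
        cy = sum(r for r, _ in coords) / count / max(height - 1, 1)
        horizontal = "left" if cx < 1 / 3 else "center" if cx < 2 / 3 else "right"
        vertical = "top" if cy < 1 / 3 else "middle" if cy < 2 / 3 else "bottom"
        name = class_names.get(class_id, f"object {class_id}")
        results.append({
            "name": name,
            "pan": 2 * cx - 1,                      # -1 = hard left, +1 = hard right
            "pitch_hz": 220 * 2 ** (2 * (1 - cy)),  # assumed range: 220 Hz (bottom) to 880 Hz (top)
            "size": count / (height * width),       # fraction of the image covered
            "phrase": f"{name}, {vertical} {horizontal}",
        })
    return results


# Example: a 4x6 mask with a "chair" region in the top-right corner.
example_mask = [[0] * 6 for _ in range(4)]
for r in (0, 1):
    for c in (4, 5):
        example_mask[r][c] = 1

for item in describe_objects(example_mask):
    print(item["phrase"], round(item["pan"], 2), round(item["size"], 3))
```

In this sketch the `phrase` field corresponds to the optional verbal feedback (object name plus location), while `pan`, `pitch_hz`, and `size` stand in for parameters that a musical rendering could use.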

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2022
Accession Number
AD1191072

Entities

People

  • Giles Hamilton-Fletcher
  • Kevin C Chan

Organizations

  • New York University

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Assistive Technologies
  • Audio Files
  • Beta Testing
  • Biomedical Research
  • Computer Vision
  • Computers
  • Data Displays
  • Data Science
  • Engineering
  • Governments
  • Institutional Review Board
  • Language
  • Medical Personnel
  • Mobile Phones
  • New York
  • Object Recognition
  • Patent Applications
  • Professional Development
  • Recognition
  • Rehabilitation
  • Students
  • Training
  • User Interface

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Computer Vision.

Technology Areas

  • AI & ML