URVOICE: AN AKL-TOUSSAINT/GRAHAM-SKLANSKY APPROACH TOWARDS CONVEX HULL COMPUTATION FOR SIGN LANGUAGE INTERPRETATION

Abstract

We present URVoice, a vocalizer for the communication-impaired based on the Indian Sign Language notations. Contemporary psychological theories treat language and speech as devices for understanding complex psychological processes and delivering them as cultural products of ideas and communication. Sign and gesture languages, built on an intelligent coordination of eye-and-hand rather than ear-and-mouth, have evolved as an intelligent manifestation of speech for the impaired. However, they have very limited modality and iconicity for accommodating a greater range of linguistically relevant meanings. URVoice is an Augmentative and Alternative Communication (AAC) device that currently features a pipeline for forward communication from signer to collocutor, using a novel vision-based approach built on the convex hull. The solution achieves real-time translation of gesture to text/voice using the convex hull as its computational geometry, computed with the Akl-Toussaint heuristic and the Graham-Sklansky scan algorithm. The results are weighed against our other solutions based on conventional Machine Learning and Deep Learning approaches. A future version of URVoice, with voice translated back into sign-language gestures, will be a complete solution for effectively bridging the cognitive and communication gap between the impaired and the abled.

1. INTRODUCTION

The truism of language and scientific linguistics emerged from man's reinforcing vocal behaviour in response to evolving social and natural circumstances. The human species evolved as the singular evolutionary group with a neurological organization unique in supporting language. Speech and language, which began as a mapping of meanings to sounds, have grown into a mapping of complex representational intelligence onto complicated cognitive communication systems. Humans have further succeeded in understanding the structural composition of language in terms of underlying mental expressions. A communication disorder in the human information-processing system adversely affects a person's ability to talk, understand, read, and write. Individuals with speech and language impairments lack sufficient representational and communication intelligence, leaving them with very little means to express the forbiddingly abstract subtleties of human communication. Speech and language impairments are considered a high-incidence disability (BrainFacts, 2012; Francisco, 2017; Yukiko & Kiyoshi, 2004). Sign and gesture languages, attributed to an intelligent coordination of eye-and-hand rather than ear-and-mouth, evolved as an intelligent manifestation of speech for the impaired. There is no universal sign language used around the world; about 138 to 300 different sign languages are in use around the globe today (Richard, 2018). A few of the most widely used sign languages are discussed in detail in Table A1. However, they have very limited modality and iconicity in accommodating a greater range of linguistically relevant meanings. They also fail to cover the verbal spectrum of temporal and spatial characteristics of communication. Bridging this gap would strengthen signers' expression of their thoughts, reduce reliance on interpreters, and provide access to new technologies.

1.1. TECHNOLOGICAL/MEDICAL SOLUTIONS FOR THE IMPAIRED: STATE OF THE ART

Existing solutions include text-to-speech and sign-to-speech software enabling the speech-impaired and the deaf and mute to "speak". These Augmentative and Alternative Communication (AAC) devices (listed in Table A2) range from simple picture boards to computer programs that synthesize speech from text.

1.2. PRESENTING URVOICE, OUR SOLUTION

Considering the limitations of existing AACs, our study focused on designing a cheap, compact and portable vocalizer built on vision-based approaches. We present URVoice, a vocalizer based on the Indian Sign Language notations. It provides a simpler and more intuitive way of communicating and also enables remote communication. URVoice achieves real-time translation of gesture to text/voice using the convex hull as its computational geometry, computed with the Akl-Toussaint heuristic and the Graham-Sklansky scan algorithm. The results are weighed against our other solutions based on conventional Machine Learning and Deep Learning approaches. A future version of URVoice, with voice translated back into sign-language gestures, will be a complete solution for effectively bridging the cognitive and communication gap between the impaired and the abled.

2.1. URVOICE: ARCHITECTURE

URVoice converts gestures, received as visual input, into audio/text output for a collocutor, or relays them as a text message to a computer. Conversely, it takes audio input from the collocutor/computer and converts it into gesture/text output for the signer. This duplex communication model runs on accelerator hardware for optimal performance in real-time communication. A block diagram of the vocalizer is presented in Fig. 1. This paper features the pipeline for one-way communication in URVoice, i.e., visual input to audio/text output. Gestures are captured in real time and processed into audio/text output.
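To make the data flow of this one-way pipeline concrete, the following is a toy sketch operating on a synthetic grayscale frame. All stage names and the placeholder recognizer are illustrative assumptions, not the actual URVoice implementation: the real system segments camera frames and recognizes gestures from convex-hull features, as described in Section 2.2.

```python
# Illustrative sketch of a one-way (signer -> collocutor) pipeline.
# Stage names and the toy recognizer are assumptions for exposition only.

def segment(frame, thresh=128):
    """Binarize a grayscale frame: bright (hand-like) pixels -> 1, rest -> 0."""
    return [[1 if px >= thresh else 0 for px in row] for row in frame]

def extract_points(mask):
    """Collect (x, y) coordinates of foreground pixels for later hull analysis."""
    return [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]

def recognise(points):
    """Placeholder recognizer: maps a crude feature (horizontal extent) to a
    symbol; the real system extracts features from the convex hull instead."""
    if not points:
        return ""
    xs = [p[0] for p in points]
    return "A" if max(xs) - min(xs) < 3 else "B"

def pipeline(frame):
    """Compose the stages: segment -> extract points -> recognize."""
    return recognise(extract_points(segment(frame)))
```

In the full system, the recognized symbol would then be rendered as text or passed to a speech synthesizer.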

2.2. CONVEX HULL OPTIMIZATION: A NOVEL APPROACH IN URVOICE

The novelty of URVoice is the use of the convex hull as the computational-geometry method for gesture recognition, involving a threshold technique for image segmentation and the extraction of various mathematical features from the convex hull. The recognition is accomplished by a simple



Figure 1: Functional overview of the Vocalizer
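Since the paper does not reproduce its implementation, the following is a minimal pure-Python sketch of the two named algorithms: the Akl-Toussaint heuristic prunes points that can never lie on the hull, a Graham-style scan then computes the hull from the survivors, and two example hull features (area via the shoelace formula, and perimeter) illustrate the kind of mathematical features a recognizer could use. Function names are illustrative; a production system would typically run this on contour points of the segmented hand mask.

```python
from math import atan2

def cross(o, a, b):
    """2-D cross product of vectors OA and OB; > 0 means a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def akl_toussaint_filter(points):
    """Akl-Toussaint heuristic: points strictly inside the convex polygon
    spanned by the x/y extremes can never be hull vertices, so discard them."""
    if len(points) < 5:
        return list(points)
    extremes = {min(points), max(points),
                min(points, key=lambda p: p[1]),
                max(points, key=lambda p: p[1])}
    if len(extremes) < 3:                      # degenerate: nothing to prune
        return list(points)
    cx = sum(p[0] for p in extremes) / len(extremes)
    cy = sum(p[1] for p in extremes) / len(extremes)
    poly = sorted(extremes, key=lambda p: atan2(p[1] - cy, p[0] - cx))  # CCW
    n = len(poly)
    def strictly_inside(p):
        return all(cross(poly[i], poly[(i + 1) % n], p) > 0 for i in range(n))
    return [p for p in points if not strictly_inside(p)]

def graham_scan(points):
    """Graham scan: sort by polar angle around the lowest point, then keep
    only points that make counter-clockwise turns."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    start = min(pts, key=lambda p: (p[1], p[0]))
    rest = sorted((p for p in pts if p != start),
                  key=lambda p: (atan2(p[1] - start[1], p[0] - start[0]),
                                 (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2))
    hull = [start]
    for p in rest:
        while len(hull) > 1 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull

def hull_features(hull):
    """Example hull features: area (shoelace formula) and perimeter."""
    n = len(hull)
    area = abs(sum(hull[i][0] * hull[(i + 1) % n][1] -
                   hull[(i + 1) % n][0] * hull[i][1] for i in range(n))) / 2
    perim = sum(((hull[i][0] - hull[(i + 1) % n][0]) ** 2 +
                 (hull[i][1] - hull[(i + 1) % n][1]) ** 2) ** 0.5
                for i in range(n))
    return area, perim
```

The pruning step matters for real-time use: the scan's sort dominates the cost, so discarding the (typically many) interior pixels of the hand region before sorting reduces the work per frame.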

