Over a billion people in the world belong to the Deaf community. Over 70 million people in this community use sign language as a medium of communication.
A sign language is a language that uses manual communication and body language to convey meaning, as opposed to acoustically conveyed sound patterns. This can involve simultaneously combining hand shapes, orientation and movement of the hands, arms or body, and facial expressions to fluidly express a speaker's thoughts.
The development of any language, including sign language, requires ongoing interaction between its speakers. Knowledge of sign language is limited among the general public, which is a serious problem for the people who depend on it.
Deaf and speech-impaired people find it very difficult to communicate with people who have little or no knowledge of sign language. This widens the gap between these specially gifted people and others. They are unable to express their ideas, feelings, and emotions to the world, and it is disheartening for them not to be understood. They may have brilliant ideas or talents that we cannot see simply because of this communication barrier, and those ideas may never reach us.
A solution to this problem is not merely a community service but a necessity for the upliftment of mankind.
Our project addresses this predicament.
We propose to build a web application that enables two-way communication between deaf or speech-impaired people and hearing people. We use artificial intelligence, machine learning, and a range of other technologies to develop a full-fledged application that addresses this challenge.
We also address the problem of localised sign languages by making the dictionary of signs flexible.
We capture gestures through a camera and translate them to text, and then to speech, using cognitive solutions. In the other direction, we record the voice of a hearing person and translate it to text, which lets the deaf person read what was said. This establishes two-way communication between the deaf person and the hearing person.
The application runs in a web browser.
Being a web application, it would be freely accessible to everyone in this community, including the underprivileged.
Frontend: Provides real-time communication between the client and the server. It records video from the webcam and posts it as form data to the middleware server, which extracts frames and passes them to the CNN model for prediction. It also provides text-to-speech and speech-to-text for the convenience of deaf and speech-impaired users, with a user-friendly design and a rich user interface.
Technologies used:
- React.js: Helps build rich user interfaces and allows writing custom components. It renders quickly and can be made SEO-friendly, which helps the application rank in search results.
- RecordRTC: Used to record WebRTC video media streams. It is very useful because it supports cross-browser video recording.
- speak-tts: Used for the text-to-speech implementation.
Middleware Server: The middleware server (i.e. server.js) is a relay between the frontend and the machine learning model hosted on Flask. When it receives a POST request, it takes the video from the frontend and extracts frames from it so that they can be fed into the machine learning model.
The server is implemented in Node.js, whose non-blocking, event-driven I/O keeps it lightweight and efficient for data-intensive real-time applications that run across distributed devices.
Libraries used include Express.js for the server and FFmpeg for frame extraction.
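The frame-extraction step can be illustrated with a small sketch. The project's server does this in Node.js; the Python below is only an illustration of how an FFmpeg call for this purpose might be assembled. The function names, the `frame_%04d.png` naming pattern, and the default of 5 frames per second are assumptions, not values taken from server.js.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, out_dir, fps=5):
    """Build an FFmpeg command that samples `fps` frames per second as PNGs.

    The frame rate and output naming pattern are illustrative assumptions.
    """
    pattern = str(Path(out_dir) / "frame_%04d.png")
    # -i selects the input file; the fps video filter resamples the stream.
    return ["ffmpeg", "-i", str(video_path), "-vf", f"fps={fps}", pattern]


def extract_frames(video_path, out_dir, fps=5):
    """Run FFmpeg (must be installed and on PATH) and return the frame paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, out_dir, fps), check=True)
    return sorted(Path(out_dir).glob("frame_*.png"))
```

Keeping the command construction separate from the call to `subprocess.run` makes the arguments easy to inspect or log before any video is processed.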
Machine Learning Module: Images are accepted, preprocessed, and passed to a neural network for classification. Because the training set is large, a very deep neural network was trained to reduce both the bias and the variance of the classifier.
The training data was self-generated and consists of 55,500 images across 37 labels. Because the data is self-generated, the same process can be used to extend the dictionary of gestures to other sign languages.
The neural network architecture is similar to VGG-16. It was chosen after a lot of trial and error. A deep network was preferred because of the wide variation in the image data.
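For readers unfamiliar with VGG-16, the sketch below traces its standard convolutional layout: five blocks of 3x3 convolutions, each followed by 2x2 max-pooling. It describes the reference VGG-16 configuration only. The project's network is merely described as similar to it, and its exact layers and input size are not stated here, so the 224-pixel input used in the example is an assumption borrowed from the original architecture.

```python
# (number of 3x3 conv layers, output channels) for each standard VGG-16 block.
# Every block ends with a 2x2 max-pool of stride 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]


def trace_blocks(input_size, in_channels=3, blocks=VGG16_BLOCKS):
    """Return (spatial_size, channels, conv_params) after each block."""
    rows = []
    size, channels = input_size, in_channels
    for n_convs, out_channels in blocks:
        params = 0
        for _ in range(n_convs):
            # 3x3 kernel weights per input channel, plus one bias per filter.
            params += (3 * 3 * channels + 1) * out_channels
            channels = out_channels
        # 'same'-padded convolutions keep the size; pooling halves it.
        size //= 2
        rows.append((size, channels, params))
    return rows


if __name__ == "__main__":
    for size, channels, params in trace_blocks(224):
        print(f"{size}x{size}x{channels}: {params:,} conv parameters")
```

The trace shows why such a network suits highly varied image data: resolution shrinks block by block while the channel count grows, so later layers see larger, more abstract regions of the input.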
The neural network is implemented in Keras with TensorFlow as the backend. Training took about 2 hours on Google Colab with a GPU as the hardware accelerator.
To deploy it in a web application, the prediction module is hosted on a Flask server so that it can communicate with the frontend through the middleware server. Flask, being a lightweight web framework, was easy to deploy.
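Before a prediction can be returned to the frontend, the classifier's 37 output scores have to be mapped to a gesture label. The sketch below shows one plausible way to do that. The label names are placeholders (the actual 37 labels are not listed in this README), and the confidence threshold is a hypothetical parameter rather than something the project is known to use.

```python
# Placeholder dictionary: the real 37 gesture labels are not listed in the
# README, so these names are purely illustrative.
LABELS = [f"gesture_{i}" for i in range(37)]


def decode_prediction(scores, labels=LABELS, min_confidence=0.5):
    """Return the highest-scoring label, or None if the model is unsure.

    `scores` is one score per label, e.g. the softmax output for one frame.
    `min_confidence` is a hypothetical cut-off, not a documented setting.
    """
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best] if scores[best] >= min_confidence else None
```

Because the mapping lives in a plain list, extending the dictionary to a localised sign language would mean adding labels here and retraining the network with a matching number of outputs.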
https://drive.google.com/open?id=1PF9OzFuqZidcSdCnwkSAxLJyUIZ-WYMG
https://drive.google.com/open?id=1LIiyiRpSM-Tnj8Q4gXMhrIHkHypUfjKq
- Generate images and their corresponding labels using genererating_symbols.py
- Generate rotated images to train on selected images using Rotate_images.py
- Generate a pickle file to feed into cnn_model using generate_images_labels.py
- Go to the Backend folder and run cnn_model.py to train the model
- Once the h5 file has been generated, start the Flask server by running final_server1.py
- Start server.js in the Backend folder using Node.js. This server extracts frames from the video sent by the frontend and passes them to the neural network for prediction
- Start the React development server for the frontend using npm
- With communication now established, open the frontend in a browser
- Click Start Record to capture frames and Stop Record to stop capturing
- Click Text to Speech to convert text to speech
- Use Start and Stop as needed to convert speech to text
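The pickle step in the list above can be sketched as follows. This is a minimal illustration of bundling images and labels into one pickle file and reading them back. It is not the contents of generate_images_labels.py, and the function names and dictionary keys are assumptions.

```python
import pickle


def save_dataset(images, labels, path):
    """Write images and their labels to a single pickle file."""
    if len(images) != len(labels):
        raise ValueError("every image needs exactly one label")
    with open(path, "wb") as f:
        pickle.dump({"images": images, "labels": labels}, f,
                    protocol=pickle.HIGHEST_PROTOCOL)


def load_dataset(path):
    """Read back the (images, labels) pair written by save_dataset."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["images"], data["labels"]
```

Pickle files should only be loaded from a trusted source, since unpickling can execute arbitrary code.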
This application can easily be deployed to cloud servers so that it can be used from anywhere in the world.
Cloud servers with powerful GPUs could further speed up the forward propagation of image frames, which would greatly reduce the delay in response.
With the use of a few other APIs such as Bare Bone Api and droplets, we could further scale this product to serve a larger user base.