Authors: Kyuhee Jo, Steven Gunarso, Jacky Wang, Raghav Sharma
GuideDog is an AI/ML-based mobile app designed to assist visually impaired users, and it is 100% voice-controlled. As the name suggests, you can think of it as a "speaking guide dog." It has three key features, all based on the scene captured by your mobile phone:
- Reads text upon command
- Describes the scene around you upon command
- Warns you if there is an obstacle in front of you
Check out this demo video to learn more about our app!
UI/UX
- Simple and responsive interface
- Voice-assistant architecture tailored to the target audience
Libraries / APIs
- Google Cloud Speech-to-Text and Text-to-Speech
- Android SDK, AndroidX
- ML Kit Object Detection and Tracking API
- TensorFlow Lite MobileNet image classification model (see the sketch after this list)
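As a rough illustration of the last item, here is a minimal sketch of running a MobileNet classifier through the TensorFlow Lite interpreter in Python. The production app runs the model through the Android TFLite runtime instead; the model file name and preprocessing here are assumptions for the sketch.

```python
import numpy as np
import tensorflow as tf

# Load a MobileNet TFLite model; the file name is an assumption.
interpreter = tf.lite.Interpreter(model_path="mobilenet_v1_1.0_224.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# MobileNet V1 expects a 224x224 RGB image scaled to [-1, 1];
# a random tensor stands in for a camera frame here.
image = np.random.rand(1, 224, 224, 3).astype(np.float32) * 2.0 - 1.0

interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()

scores = interpreter.get_tensor(output_details[0]["index"])
print("Top class id:", int(np.argmax(scores)))
```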
Flask API
- Image Captioning
- Optical Character Recognition (both endpoints sketched below)
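A minimal sketch of how the Flask service could expose these two features; the route names and the `caption_image`/`run_ocr` helpers are illustrative stand-ins, not the app's confirmed API.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def caption_image(image_bytes):
    # Stand-in for the Show, Attend and Tell captioning model described below.
    raise NotImplementedError

def run_ocr(image_bytes):
    # Stand-in for the OCR pipeline.
    raise NotImplementedError

@app.route("/caption", methods=["POST"])
def caption():
    image_bytes = request.files["image"].read()
    return jsonify({"caption": caption_image(image_bytes)})

@app.route("/ocr", methods=["POST"])
def ocr():
    image_bytes = request.files["image"].read()
    return jsonify({"text": run_ocr(image_bytes)})

if __name__ == "__main__":
    app.run()
```

Keeping both features behind one service lets the Android client send the same captured frame to either endpoint.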
Deployment
- Google App Engine
- A single fast central API exposing the different endpoints
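An App Engine service like this is typically described by an `app.yaml`; a minimal sketch, assuming the Python standard runtime and a `main:app` Flask entry point (both assumptions, not confirmed by the project):

```yaml
runtime: python39
entrypoint: gunicorn -b :$PORT main:app
```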
We used TensorFlow to build and train a model for image captioning on MS-COCO 2014, based on the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. The model uses a standard convolutional network as an encoder to extract features from images (we use Inception V3) and feeds the extracted features into an attention-based decoder that generates sentences. While the paper used an LSTM as the decoder, we use a simpler RNN instead.
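For illustration, here is a condensed sketch of that encoder/attention/decoder wiring in TensorFlow. Layer sizes are illustrative, and the GRU cell stands in for the "simpler RNN" (the exact cell used is an assumption, not stated above).

```python
import tensorflow as tf

# Encoder: InceptionV3 without its classification head yields an 8x8x2048
# feature map per image, i.e. 64 spatial feature vectors for attention.
image_model = tf.keras.applications.InceptionV3(include_top=False, weights="imagenet")
feature_extractor = tf.keras.Model(image_model.input, image_model.output)

class BahdanauAttention(tf.keras.layers.Layer):
    """Soft attention over the 64 spatial features, as in Show, Attend and Tell."""
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, 64, depth); hidden: (batch, units)
        hidden_t = tf.expand_dims(hidden, 1)
        scores = self.V(tf.nn.tanh(self.W1(features) + self.W2(hidden_t)))
        weights = tf.nn.softmax(scores, axis=1)           # where to look
        context = tf.reduce_sum(weights * features, axis=1)
        return context, weights

class Decoder(tf.keras.Model):
    """One decoding step: attend over image features, then predict the next word."""
    def __init__(self, embedding_dim, units, vocab_size):
        super().__init__()
        self.attention = BahdanauAttention(units)
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(units, return_state=True)
        self.fc = tf.keras.layers.Dense(vocab_size)

    def call(self, word, features, hidden):
        context, weights = self.attention(features, hidden)
        x = self.embedding(word)                          # (batch, 1, embedding_dim)
        x = tf.concat([tf.expand_dims(context, 1), x], axis=-1)
        output, state = self.gru(x)
        return self.fc(output), state, weights            # logits over the vocabulary
```

At inference time, decoding starts from a start token, and the predicted word is fed back in at each step until an end token or a maximum caption length is reached.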