This project was created during the GenAI Hack AI Campus, with the goal of helping visually impaired people navigate their environment.
The user verbally asks a question about their surroundings, a photo is taken, and the AI companion answers the question via text-to-speech, leveraging VQA (visual question answering) and object detection.
The prototype is built on the Google Cloud Platform and uses PaLM 2, ViLT VQA, BLIP-2, YOLOv8, text-to-speech, and speech-to-text.
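The end-to-end flow described above (voice question → photo → answer → spoken reply) can be sketched as follows. This is a minimal illustration only: every helper function here is a hypothetical placeholder standing in for the real speech-to-text, camera, VQA/object-detection, and text-to-speech components, not the project's actual code.

```python
def speech_to_text(audio: bytes) -> str:
    # Placeholder: would call a speech-to-text service (e.g. on Google Cloud).
    return "What is in front of me?"

def capture_photo() -> bytes:
    # Placeholder: would grab a frame from the device camera.
    return b"<jpeg bytes>"

def answer_question(question: str, image: bytes) -> str:
    # Placeholder: would combine VQA (e.g. ViLT / BLIP-2) with object
    # detection (e.g. YOLOv8), then phrase the answer with an LLM.
    return "There is a door about two meters ahead."

def text_to_speech(answer: str) -> bytes:
    # Placeholder: would synthesize spoken audio from the answer text.
    return answer.encode()

def assist(audio_query: bytes) -> bytes:
    """End-to-end flow: voice question -> photo -> answer -> spoken reply."""
    question = speech_to_text(audio_query)
    image = capture_photo()
    answer = answer_question(question, image)
    return text_to_speech(answer)
```

Each placeholder would be swapped for the corresponding cloud service or model call in the real prototype.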
Team:

- Aditi Bhalla
- Mert Keser
- Théo Gieruc
- Wencan Huang