This Python script demonstrates object detection using OpenCV's DNN module and provides spoken feedback using text-to-speech capabilities.
NOTE: you need to download the yolov3.weights file (not included) to make the following code work.
Introducing the revolutionary video-to-speech algorithm, designed to empower the visually impaired community! Using neural networks and video recognition libraries, we’ve created Guide Dog to describe visual content from video in real time. By converting visual information into clear spoken descriptions, our algorithm provides a seamless and immersive experience for users, enabling them to access a wide range of video content independently.
How does it work? Guide Dog is written in Python and uses the YOLOv3 neural network, which splits each frame into a grid of blocks and analyzes every block. The predictions from all the blocks are then combined to locate objects in the full image, and each object is given the most likely label from a list of known classes. Guide Dog then uses the object's position and estimated distance to announce it through a speaker, guiding the user.
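The last step, turning an object's position and estimated distance into speech, can be sketched as below. This is an illustration under stated assumptions, not Guide Dog's actual logic: boxes are (x, y, w, h) in pixels with the top-left corner first (the format OpenCV's drawing functions use), the frame is split into thirds for left/ahead/right, and distance is guessed from how tall the box is relative to the frame. The function names and thresholds are hypothetical.

```python
# Hypothetical helpers: turn one detection into a short phrase for the speaker.
# The split of the frame into thirds and the size thresholds are illustrative
# assumptions, not values taken from Guide Dog's code.


def horizontal_position(x: int, w: int, frame_width: int) -> str:
    """Classify the box centre as left, ahead, or right of the frame."""
    center_x = x + w / 2
    if center_x < frame_width / 3:
        return "on your left"
    if center_x > 2 * frame_width / 3:
        return "on your right"
    return "ahead"


def rough_distance(h: int, frame_height: int) -> str:
    """Guess distance from how much of the frame height the box fills.

    A taller box usually means a closer object, but this ignores the real
    size of the object and the camera optics, so it is only a relative cue.
    """
    fraction = h / frame_height
    if fraction > 0.5:
        return "very close"
    if fraction > 0.2:
        return "nearby"
    return "far away"


def describe_detection(label: str, box: tuple, frame_width: int, frame_height: int) -> str:
    """Build a sentence such as 'person ahead, nearby' for the text-to-speech engine."""
    x, _, w, h = box  # the vertical position is not used in this sketch
    position = horizontal_position(x, w, frame_width)
    distance = rough_distance(h, frame_height)
    return f"{label} {position}, {distance}"


if __name__ == "__main__":
    # A 640x480 frame with a person box centred horizontally, 200 px tall.
    print(describe_detection("person", (270, 140, 100, 200), 640, 480))
    # prints: person ahead, nearby
```

Box height is only a relative cue: a small object close by and a large object far away can produce the same box, so a real distance estimate would need camera calibration or a depth sensor.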
Imagine effortlessly exploring educational videos, news clips, entertainment content, and more, with rich and descriptive audio narrations guiding your experience. Guide Dog speaks to the user, guiding them through their everyday life, enhancing the depth and richness of the auditory experience.
- Install the required libraries:
  - OpenCV (cv2)
  - NumPy (numpy)
  - Google's text-to-speech (gtts)
- Download the YOLOv3 model files (yolov3.weights, yolov3.cfg) and the class names file (name.names). Several YOLOv3 repositories host these files for download, and the files in this repo can also be used. Just make sure all the files are in the same folder.
- Update the paths to the model weights, configuration file, and class names file in the code (ObjectDetector initialization).
- Run the script object_detection.py:
  python object_detection.py
- The script will initialize the object detector and text-to-speech engine, then start capturing frames from the default camera.
- Detected objects will be spoken out loud, and bounding boxes with labels and confidence levels will be displayed on the video feed.
- Press 'q' to quit the application.
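The download and path-update steps come down to pointing the detector at three files in one folder. A minimal sketch of that setup follows, assuming the files sit in a single folder and keep the names given above. MODEL_FILES and resolve_model_paths are hypothetical names, not part of the repo; the helper only checks that the files exist, so a missing yolov3.weights (which is not included in the repo) is reported before OpenCV tries to load it.

```python
from pathlib import Path

# Hypothetical layout: the three model files share one folder.
# The file names match the ones listed in the setup steps; adjust them if yours differ.
MODEL_FILES = {
    "weights": "yolov3.weights",
    "config": "yolov3.cfg",
    "class_names": "name.names",
}


def resolve_model_paths(folder) -> dict:
    """Return absolute paths to the model files, failing early if any is missing."""
    folder = Path(folder).resolve()
    paths = {key: folder / name for key, name in MODEL_FILES.items()}
    missing = [str(p) for p in paths.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError("Missing model file(s): " + ", ".join(missing))
    # Return plain strings, which is what OpenCV's loaders accept.
    return {key: str(p) for key, p in paths.items()}
```

Inside object_detection.py, calling resolve_model_paths(Path(__file__).parent) would look next to the script; the resulting paths would then be passed to the detector in the order of the ObjectDetector __init__ signature: weights, config, class names file.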
ObjectDetector:
- __init__(self, model_weights, model_config, class_names_file): Initializes the object detector using the YOLOv3 model files and loads the class names from a file.
- detect_objects(self, frame): Performs object detection on a frame, returning the detected objects with their labels, confidences, and bounding box coordinates.
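The detect_objects description leaves the decoding of the network's raw output implicit. The sketch below shows one common way to do it, and how the class names file supplies the labels. It assumes the standard YOLOv3 row layout returned by OpenCV's DNN module (outputs of net.forward(net.getUnconnectedOutLayersNames()) after cv2.dnn.blobFromImage); load_class_names, parse_yolo_outputs, and the 0.5 threshold are illustrative choices, not the repo's code.

```python
# Sketch of the post-processing behind detect_objects(), assuming the standard
# YOLOv3 output layout from OpenCV's DNN module: each row is
# [center_x, center_y, width, height, objectness, score_class_0, score_class_1, ...]
# with coordinates normalized to the 0..1 range.


def load_class_names(path: str) -> list:
    """Read one class name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def parse_yolo_outputs(outputs, class_names, frame_width, frame_height,
                       conf_threshold=0.5):
    """Convert raw YOLO rows into (label, confidence, (x, y, w, h)) tuples.

    `outputs` holds one 2-D array per YOLO output layer. Plain nested lists
    work too, which keeps this sketch free of NumPy. Boxes are returned in
    pixels as the top-left corner plus width and height.
    """
    detections = []
    for layer_output in outputs:
        for row in layer_output:
            scores = row[5:]
            # Pick the class with the highest score (an argmax over the scores).
            class_id = max(range(len(scores)), key=lambda i: scores[i])
            # As in common OpenCV YOLOv3 examples, the class score is the confidence.
            confidence = float(scores[class_id])
            if confidence < conf_threshold:
                continue
            # Scale the normalized centre and size back to pixel units.
            cx = float(row[0]) * frame_width
            cy = float(row[1]) * frame_height
            w = float(row[2]) * frame_width
            h = float(row[3]) * frame_height
            # Move from the box centre to its top-left corner.
            x, y = int(cx - w / 2), int(cy - h / 2)
            detections.append(
                (class_names[class_id], confidence, (x, y, int(w), int(h)))
            )
    return detections
```

Overlapping boxes for the same object are usually merged afterwards with cv2.dnn.NMSBoxes(boxes, confidences, score_threshold, nms_threshold); that step is left out so the sketch runs without OpenCV.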
Text-to-speech engine:
- __init__(self): Initializes the text-to-speech engine.
- speak(self, text): Converts text to speech and speaks it out loud.
Requirements:
- Python 3.x
- OpenCV (cv2)
- NumPy (numpy)
- gTTS (gtts)
Note that Guide Dog is still in its early stages. The next steps are to make the video recognition more immersive and to have the audio assistant give a full description of what is in front of the user, rather than simply naming the object.
Maninder Kaur