Speech-to-Text Insights: Transcription and Analysis

This project explores speech recognition: converting spoken language into written text. Using several libraries and cloud services, it transcribes audio from a range of sources, with a particular focus on children's voices, and then analyzes the transcribed data.

Data Source

The two notebooks draw on different audio data:

The comparing-speech-to-text-systems notebook uses a mix of live recordings and pre-recorded audio files, including recordings of children's voices, to evaluate and compare online transcription services.
The transcript-conversion-analysis notebook draws mainly on the HomeBank component of the TalkBank project, a collection of shared recordings documenting children's everyday experiences. The notebook explores, converts, and analyzes transcripts derived from this data.
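
TalkBank transcripts are commonly distributed in the CHAT format, where header lines start with `@`, speaker lines start with `*` followed by a speaker code, and annotation tiers start with `%`. As a minimal sketch, assuming the HomeBank transcripts used here follow that layout, the conversion step could begin by pulling out (speaker, utterance) pairs. The function name `parse_chat` and the sample text are illustrative and are not taken from the notebook.

```python
import re
from typing import List, Tuple

# Main-tier lines look like "*CHI:\tmore juice ." (speaker code, colon, tab, utterance).
MAIN_TIER = re.compile(r"^\*([A-Z0-9]+):\s*(.*)$")


def parse_chat(text: str) -> List[Tuple[str, str]]:
    """Return (speaker, utterance) pairs from CHAT-style transcript text.

    Header lines (@...) and dependent tiers (%...) are skipped. Continuation
    lines, which begin with a tab, are appended to the preceding utterance.
    """
    utterances: List[Tuple[str, str]] = []
    in_main_tier = False
    for line in text.splitlines():
        match = MAIN_TIER.match(line)
        if match:
            utterances.append((match.group(1), match.group(2).strip()))
            in_main_tier = True
        elif line.startswith("\t") and in_main_tier:
            speaker, utterance = utterances[-1]
            utterances[-1] = (speaker, f"{utterance} {line.strip()}")
        else:
            # Header or dependent tier: its continuation lines are ignored too.
            in_main_tier = False
    return utterances


# Hypothetical sample, written inline so the sketch is self-contained.
sample = (
    "@Begin\n"
    "*CHI:\tmore juice .\n"
    "%mor:\tqn|more n|juice .\n"
    "*MOT:\tyou want more\n"
    "\tjuice ?\n"
    "@End\n"
)

pairs = parse_chat(sample)
print(pairs)
```

The resulting list of pairs is easy to load into a table for the later analysis steps.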

Libraries Used

The project uses the following libraries and services for speech recognition and data analysis:

SpeechRecognition: A Python library for performing speech-to-text conversion.
Google Cloud Speech-to-Text: Google's cloud-based engine for speech recognition.
AWS SDK for Python (Boto3): Python SDK for interacting with AWS services, including Amazon Transcribe.
IBM Watson Developer Cloud Python SDK: SDK enabling access to IBM Watson's speech-to-text services.
Azure SDK for Python: Python SDK for interfacing with Microsoft Azure services, including its speech service.
Rev.ai Python SDK: SDK for integrating with the Rev.ai transcription service.
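
To give a sense of how the first library in this list is typically used, here is a minimal sketch of transcribing a single audio file with SpeechRecognition. It is not the notebook's code: the function name `transcribe_file` and the path in the usage comment are hypothetical, and `recognize_google` calls the free Google Web Speech API rather than the Google Cloud service listed above.

```python
def transcribe_file(path: str, language: str = "en-US") -> str:
    """Transcribe a WAV, AIFF, or FLAC file and return the recognized text.

    Returns an empty string when the engine cannot make out any speech.
    """
    # Imported lazily so this sketch can be loaded without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file into memory

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The engine responded but could not understand the audio.
        return ""
    except sr.RequestError as exc:
        # Network problem or the API rejected the request.
        raise RuntimeError(f"recognition request failed: {exc}") from exc


# Usage (hypothetical path; requires network access):
# text = transcribe_file("original_audio_files/sample.wav")
```

The cloud SDKs in the list each have their own client setup and credentials, so a comparison would usually wrap every service behind a function with this same shape.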

Analysis

The project's primary objective is to transcribe spoken language with high accuracy. The analysis is split across two Jupyter notebooks.

comparing-speech-to-text-systems notebook:

Transcription Engines: Runs and compares several online speech-to-text systems: GCP, AWS, IBM, Azure, and Rev.ai.
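
One common way to put a number on such a comparison is word error rate (WER): the word-level edit distance between a reference transcript and an engine's output, divided by the number of reference words. The sketch below assumes WER as the metric, which this README does not state, and the engine names and transcripts are invented for illustration.

```python
from typing import Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical reference and engine outputs, for illustration only.
reference = "the dog is in the garden"
outputs: Dict[str, str] = {
    "engine_a": "the dog is in the garden",
    "engine_b": "the dog in a garden",
}

scores = {name: word_error_rate(reference, text) for name, text in outputs.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: WER = {score:.2f}")
```

Lower is better; a score of 0.0 means the output matches the reference word for word after lowercasing.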

transcript-conversion-analysis notebook:

Transcript Conversion: Details methods for converting and structuring the transcripts for subsequent analysis.
Frequency Analysis: Provides methods for evaluating word and phrase frequencies.
Sentiment Analysis: Notes the potential for sentiment analysis of the transcripts.
Length Analysis: Investigates statement lengths and verbosity patterns.
Topic Modeling: Discusses potential topic identification from the transcribed data.
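
The frequency and length analyses listed above can be sketched with the standard library alone. This is an illustrative sketch rather than the notebook's implementation: the tokenizer, the function names, and the sample utterances are all assumptions.

```python
import re
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

TOKEN = re.compile(r"[a-z']+")


def tokenize(utterance: str) -> List[str]:
    """Lowercase the utterance and keep runs of letters and apostrophes."""
    return TOKEN.findall(utterance.lower())


def word_frequencies(utterances: Iterable[str]) -> Counter:
    """Count how often each word occurs across all utterances."""
    counts: Counter = Counter()
    for utterance in utterances:
        counts.update(tokenize(utterance))
    return counts


def bigram_frequencies(utterances: Iterable[str]) -> Counter:
    """Count adjacent word pairs, never crossing utterance boundaries."""
    counts: Counter = Counter()
    for utterance in utterances:
        tokens = tokenize(utterance)
        counts.update(zip(tokens, tokens[1:]))
    return counts


def mean_length_by_speaker(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Average utterance length in words for each speaker."""
    lengths = defaultdict(list)
    for speaker, utterance in pairs:
        lengths[speaker].append(len(tokenize(utterance)))
    return {speaker: mean(values) for speaker, values in lengths.items()}


# Hypothetical (speaker, utterance) pairs, for illustration only.
pairs = [("CHI", "more juice"), ("MOT", "you want more juice"), ("CHI", "more")]
texts = [utterance for _, utterance in pairs]

print(word_frequencies(texts).most_common(2))
print(bigram_frequencies(texts).most_common(1))
print(mean_length_by_speaker(pairs))
```

Counting bigrams per utterance keeps phrases from spanning two different speakers' turns.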

Key Achievements

Versatile Transcription: The comparing-speech-to-text-systems notebook showcases successful transcription of children's voices using multiple services.
Detailed Analysis: The second notebook, transcript-conversion-analysis, covers transcript conversion along with frequency, sentiment, length, and topic analysis.
Diarization: Both notebooks emphasize speaker separation, which allows spoken content to be broken down by individual participant.
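
To illustrate what a per-participant breakdown can look like once a service has returned diarized output, the sketch below merges consecutive segments from the same speaker into turns and totals each speaker's talk time. The `Segment` structure, the speaker labels, and the timings are hypothetical; each service reports diarization in its own format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Segment:
    speaker: str   # label assigned by the diarizer, e.g. "spk_0"
    start: float   # seconds from the start of the recording
    end: float
    text: str


def merge_turns(segments: Iterable[Segment]) -> List[Segment]:
    """Join consecutive segments by the same speaker into single turns."""
    turns: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            turns[-1] = Segment(
                last.speaker, last.start, max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def speaking_time(segments: Iterable[Segment]) -> Dict[str, float]:
    """Total seconds of speech attributed to each speaker."""
    totals: Dict[str, float] = {}
    for seg in segments:
        totals[seg.speaker] = totals.get(seg.speaker, 0.0) + (seg.end - seg.start)
    return totals


# Hypothetical diarized output, for illustration only.
segments = [
    Segment("spk_0", 0.0, 1.5, "do you want"),
    Segment("spk_0", 1.5, 2.5, "more juice"),
    Segment("spk_1", 3.0, 4.0, "more juice"),
]

turns = merge_turns(segments)
for turn in turns:
    print(f"[{turn.start:.1f}-{turn.end:.1f}] {turn.speaker}: {turn.text}")
print(speaking_time(segments))
```

Talk time is computed from the original segments rather than the merged turns so that pauses inside a turn are not counted as speech.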

Conclusion

The Speech-to-Text-Transcription-and-Analysis project demonstrates the capabilities and adaptability of modern speech recognition techniques. Taken together, the two notebooks form a workflow for transcribing and analyzing spoken language, with a particular focus on children's voices.

Future Work

Potential advancements and extensions for this project include:

Advanced Recognition Engines: Integration with more state-of-the-art recognition engines.
Neural Network Models: Exploration of deep learning-based models for enhanced transcription accuracy.
Complex Diarization: Handling more intricate audio samples with multiple speakers and background noise.
Natural Language Processing: Incorporation of advanced NLP techniques to further refine and structure transcribed content.

Note

For a full understanding of the methods and conclusions in this project, read through both notebooks in detail, including the code and its outputs. HTML versions of the comparing-speech-to-text-systems and transcript-conversion-analysis notebooks are also available.

Author

Jesus Cantu Jr.

Last Updated

October 10, 2023

