This project brings to life the imaginative stories of a 3-year-old by using the OpenAI API to create videos with AI-generated images and subtitles. Recordings of a child's voice are transcribed, subtitled, and then transformed into captivating videos with visually appealing background images.
Initial results here
export OPENAI_API_KEY=...
mkdir sources
# Place the mp3 files in the sources folder
Split long audio files into smaller segments (up to 20MB each) for efficient processing.
segment.py sources/eric1.mp3
# Results: sources/eric1_0.mp3, sources/eric1_1.mp3, ...
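As a rough illustration of how the 20MB cap can translate into cut points, the sketch below plans segment boundaries from a file's size and duration. It is not the code in `segment.py`; the function name `plan_segments` and the constant-bitrate assumption are hypothetical, chosen only for this example.

```python
import math

MAX_SEGMENT_BYTES = 20 * 1024 * 1024  # the 20MB cap mentioned above


def plan_segments(total_bytes, duration_s, max_bytes=MAX_SEGMENT_BYTES):
    """Return (start_s, end_s) pairs so that each segment stays under max_bytes.

    Assumption: the bitrate is roughly constant, so size grows linearly with time.
    """
    if total_bytes <= 0 or duration_s <= 0:
        return []
    count = math.ceil(total_bytes / max_bytes)  # number of segments needed
    step = duration_s / count                   # equal-length segments
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(count)]


if __name__ == "__main__":
    # A 50MB recording lasting 3000 seconds needs three segments.
    for start, end in plan_segments(50 * 1024 * 1024, 3000):
        print(f"{start:.0f}s -> {end:.0f}s")
```

Equal-length segments keep every part comfortably under the cap; a variable-bitrate file would need the actual byte offsets instead.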
Convert audio segments to text and subtitles.
transcribe.py sources/eric1
concatenate.py sources/eric1_
transcribe_srt.py sources/eric1
concatenate_srt.py sources/eric1_
# Results: eric1.txt, eric1.srt
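When per-segment SRT files are joined, each segment's timestamps usually start again at zero, so they have to be shifted by the duration of the audio that came before. The sketch below shows one way to do that shift; whether `concatenate_srt.py` works this way is an assumption, and the helper names (`to_ms`, `from_ms`, `shift_srt`) are hypothetical. Cue numbers are not renumbered here.

```python
import re

# Matches SRT timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def to_ms(match):
    """Convert a matched HH:MM:SS,mmm timestamp to milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def from_ms(total_ms):
    """Format milliseconds as an SRT timestamp."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def shift_srt(text, offset_ms):
    """Shift every timestamp in an SRT document by offset_ms."""
    return TIMESTAMP.sub(lambda m: from_ms(to_ms(m) + offset_ms), text)


if __name__ == "__main__":
    segment = "1\n00:00:01,500 --> 00:00:04,000\nHello\n"
    # Pretend the previous segment lasted exactly one minute.
    print(shift_srt(segment, 60_000))
```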
Refine the transcribed text using GPT-4 for more coherent storytelling.
prompt.py sources/eric1
# Results: eric1_0_corrected.txt, eric1_1_corrected.txt, ...
Generate a basic video with audio and subtitles.
video.py sources/eric1
Create a video with DALL-E generated images for every minute of audio.
# Creates a new SRT in which the text is accumulated into 1-minute blocks
python3 video_dallee_accumulated.py
# Creates a new SRT with image prompts instead of the accumulated text
python3 video_dallee_gpt4.py
# Generates the images from the prompts and saves them in a folder
python3 video_dallee_dalle.py
# Creates the video from the mp3, the subtitles, and the images (read from the folder and placed using the timestamps in the SRT)
python3 video_dalle.py
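The one-minute accumulation step can be pictured as bucketing subtitle cues by their start time. This is a minimal sketch, not the code in `video_dallee_accumulated.py`; the `(start_s, end_s, text)` cue tuples and the function name `accumulate` are assumptions made for illustration.

```python
def accumulate(cues, window_s=60.0):
    """Merge cue texts into fixed windows of window_s seconds.

    cues: iterable of (start_s, end_s, text) tuples (a hypothetical layout).
    Returns a list of (window_start_s, window_end_s, joined_text).
    Windows that contain no cues are skipped.
    """
    buckets = {}
    for start, _end, text in cues:
        index = int(start // window_s)  # which window the cue starts in
        buckets.setdefault(index, []).append(text.strip())
    return [
        (i * window_s, (i + 1) * window_s, " ".join(buckets[i]))
        for i in sorted(buckets)
    ]


if __name__ == "__main__":
    cues = [
        (1.0, 3.0, "Once"),
        (30.0, 33.0, "upon a time"),
        (61.5, 64.0, "a dragon"),
    ]
    for window in accumulate(cues):
        print(window)
```

Each window's text could then serve as the input for one image prompt, matching the one-image-per-minute idea above.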
- Audio Segmentation: Split long audio files into smaller segments (up to 20MB each) for efficient processing.
- Transcription: Convert audio segments to text and subtitles.
- Text Improvement with GPT-4: Refine the transcribed text using GPT-4 for more coherent storytelling.
- DALL-E Image Generation: Use DALL-E to generate images for every minute of audio.
- first run, multiple scripts
- concatenate all scripts into one
- Dynamic length: Algorithm for a dynamic length of video segments, based on the content of the audio
- when there is a lot of content, the video segments are shorter.
- try to figure out if there's a story, if there are characters, if there's a plot, etc.
- use the above to determine the length of the video segments
- use the above to determine the content of the video segments
- Automate the Workflow: Develop a script to automate the entire process from audio segmentation to video creation.
- Use SQLite to store data and deal with SRT timestamps
- Interactive Web Interface: Create a web application allowing users to upload audio and customize the video generation process.
- Narrative Enhancement: Implement more advanced NLP techniques to enrich the storytelling aspect.
- Custom Image Styles: Integrate options for different illustration styles in DALL-E image generation.
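
For the dynamic length item in the list above, one possible starting point is to close a video segment once it holds a set number of words or spans a maximum duration, whichever comes first, so that dense speech produces shorter segments. This is only a sketch of that idea; the function name `dynamic_segments`, the cue layout, and the default thresholds are hypothetical and not part of the project.

```python
def dynamic_segments(cues, max_words=40, max_seconds=60.0):
    """Group (start_s, end_s, text) cues into variable-length segments.

    A segment is closed before a cue that would push it past max_words
    words or past max_seconds of duration. Returns (start_s, end_s) pairs.
    """
    segments = []
    current, words = [], 0
    for cue in cues:
        _start, end, text = cue
        cue_words = len(text.split())
        too_many_words = words + cue_words > max_words
        too_long = bool(current) and end - current[0][0] > max_seconds
        if current and (too_many_words or too_long):
            segments.append((current[0][0], current[-1][1]))
            current, words = [], 0
        current.append(cue)
        words += cue_words
    if current:
        segments.append((current[0][0], current[-1][1]))
    return segments


if __name__ == "__main__":
    dense = [(i * 5, i * 5 + 5, "a b c d e") for i in range(4)]
    print(dynamic_segments(dense, max_words=10))
```

Story, character, or plot detection would replace the simple word count here; the word budget only stands in for "amount of content".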