[Enhancement]: Add Whisper support #1723
Comments
Maybe we first add support for an The opposite of this was also requested for ebooks #601 |
I guess a natural extension (and this WOULD make audiobookshelf an Audible killer) would be Whispersync-like syncing with ebooks. I guess that would just mean shifting the bookmark position of the ebook or audiobook whenever the other one is progressed. Not a lot of work in itself. A quick Google search turned up https://github.com/readbeyond/aeneas/ and https://github.com/r4victor/syncabook, though any library that can align audio with a text file would do. Obviously there's a bunch of work in finding the start of the chapter and the start of the audiobook and matching them up, so that part is more of a pipe dream. aeneas probably shows the most promise, but the obvious differences between the formats (any preamble by narrators, or a table of contents in ebooks) would all be factors. Without digging into the libraries: as long as there was enough error handling to wait until both files had matches (and skip over extras in one or the other), it may go quite smoothly. |
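To make the bookmark-shifting idea above a bit more concrete, here is a minimal sketch. It assumes a forced aligner has already produced a sorted list of anchor pairs (audio seconds, ebook character offset); that anchor format and both function names are hypothetical and are not part of aeneas, syncabook, or audiobookshelf. Positions between anchors are linearly interpolated, and positions before the first or after the last anchor (narrator preamble, table of contents) are clamped.

```python
from bisect import bisect_right

# Hypothetical anchor format: (audio_seconds, ebook_char_offset), sorted ascending.
Anchor = tuple[float, int]


def audio_to_text(anchors: list[Anchor], seconds: float) -> int:
    """Map an audio position to an ebook offset by linear interpolation."""
    if not anchors:
        raise ValueError("need at least one anchor")
    times = [t for t, _ in anchors]
    i = bisect_right(times, seconds)
    if i == 0:                       # before the first aligned point
        return anchors[0][1]
    if i == len(anchors):            # after the last aligned point
        return anchors[-1][1]
    (t0, c0), (t1, c1) = anchors[i - 1], anchors[i]
    frac = (seconds - t0) / (t1 - t0)
    return round(c0 + frac * (c1 - c0))


def text_to_audio(anchors: list[Anchor], offset: int) -> float:
    """Map an ebook offset back to an audio position (the inverse direction)."""
    if not anchors:
        raise ValueError("need at least one anchor")
    offsets = [c for _, c in anchors]
    i = bisect_right(offsets, offset)
    if i == 0:
        return anchors[0][0]
    if i == len(anchors):
        return anchors[-1][0]
    (t0, c0), (t1, c1) = anchors[i - 1], anchors[i]
    frac = (offset - c0) / (c1 - c0)
    return t0 + frac * (t1 - t0)
```

With anchors at chapter or paragraph granularity this would only be approximate, which is probably fine for moving a bookmark.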
There is also this issue #189 |
I tested Whisper on my setup and the results are fairly good but far from perfect. Using the base model, it took around 4 minutes to transcribe a 1-hour audiobook. These tests were run on a 5950X with 12 parallel threads (no GPU involved). |
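For a rough sense of scale, the numbers above work out to about 15x faster than real time. The sketch below just does that arithmetic; it assumes transcription time scales linearly with audio length, which is an assumption and not something measured here, and the function names are made up for illustration.

```python
def speedup(audio_minutes: float, transcribe_minutes: float) -> float:
    """How many minutes of audio are transcribed per minute of compute."""
    return audio_minutes / transcribe_minutes


def estimate_minutes(audio_hours: float, factor: float) -> float:
    """Estimated transcription time in minutes, assuming linear scaling."""
    return audio_hours * 60 / factor


factor = speedup(60, 4)                   # 1-hour book transcribed in 4 minutes
ten_hour_book = estimate_minutes(10, factor)
```

Under that assumption a 10-hour audiobook would take roughly 40 minutes on the same hardware and model.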
I think Whisper (or some kind of speech-to-text) integration could be really nice for transcribing audiobooks if people wanted subtitles. It would be a nice accessibility feature for someone listening in a second language, for example. |
👍 Thumbs up. Would be very nice for self-recorded audio files. |
Should we consider working on this? I can volunteer to start contributing. |
Yeah, I'm interested in this but I can't put much attention towards it now. We were talking about it in Discord the other day. If anyone wants to start putting something together or set up a proof of concept, that would be great. We can chat about it in Discord. |
@advplyr I agree with what you said about adding support for SRT subtitle files first. I have now used Whisper to generate corresponding subtitles for my local podcasts. On my computer, I can search and view them. Displaying subtitles while playing on the ABS mobile app is the final piece of the jigsaw. I think these two things can share the same UI: LRC files #817 (external LRC files with the same name as the audio file or ID3 information embedded in the audio file) and SRT files #2257 (external SRT files with the same name as the audio file). |
This is exactly what would be great with ABS. I have the paid feature for Snipd, and now it's hard to take notes on audiobooks without it. |
I agree, and it isn't hard to generate the .srt files from audio now. Maybe there should be a branch to work on this. I'd propose doing it in this order:
Honestly as a further extension I would LOVE if you could do audio-clips like Snipd that could export to Obsidian or something, but I think having the ability and UI set up for transcriptions would be the first hurdle for that. We could add audio clips on back button like Snipd does after that. |
Just FYI, there are several implementations of Whisper specifically tailored to subtitle generation. This one, for example, https://github.com/jianfch/stable-ts, can generate not only SRT but also SSA/ASS karaoke-style subs (meaning that the currently spoken word is highlighted), bringing us even closer to Snipd. In my experience, the base and small models are enough with it. |
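For anyone curious what "generate srt" boils down to, here is a minimal sketch that turns timed segments into SRT text. It assumes each segment is a dict with `start` and `end` in seconds plus `text`, which is how I understand the segments returned by openai-whisper's `transcribe()` to look; treat that shape as an assumption and check it against whichever tool you use. The function names are made up for this example, and tools like stable-ts already ship their own writers.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments (assumed keys: start, end, text) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Writing the result next to the audio file under the same base name would match the sidecar convention discussed earlier in the thread.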
Thanks for the heads up, hopefully I can get some time to work on a PR for this type of thing. I haven't contributed yet though so I imagine it will take me a bit to get familiar with the code base and what needs to be updated for this kind of feature. |
@turnercore that would be awesome! |
Describe the feature/enhancement
Hi there!
I've been using AudioBookShelf for a while now, and I love the platform. I was thinking about how it could be improved, and I had an idea that I wanted to share with you all.
I think it would be great if AudioBookShelf could integrate with the Whisper speech-to-text model to automatically generate subtitles for audiobooks. This could be an external tool, like Tone and ffmpeg, that the user could enable or disable as needed.
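As a rough illustration of the "optional external tool" idea, the sketch below only runs transcription when the user has enabled it and a `whisper` executable is found on the PATH. The helper names and the `enabled` setting are hypothetical, and the flags (`--model`, `--output_format`, `--output_dir`) are the ones I understand the openai-whisper command line to accept; please verify them with `whisper --help`. It is written in Python for brevity and is not meant to reflect how the server itself is implemented.

```python
import shutil
from pathlib import Path


def transcription_available(enabled: bool, executable: str = "whisper") -> bool:
    """True only if the user opted in and the external tool is installed."""
    return enabled and shutil.which(executable) is not None


def build_whisper_command(
    audio: Path,
    output_dir: Path,
    model: str = "base",
    executable: str = "whisper",
) -> list[str]:
    """Build the argument list for an SRT transcription run (flags are assumptions)."""
    return [
        executable,
        str(audio),
        "--model", model,
        "--output_format", "srt",
        "--output_dir", str(output_dir),
    ]


# Usage sketch: the caller decides whether and when to actually run the tool, e.g.
#   if transcription_available(enabled=True):
#       subprocess.run(build_whisper_command(audio, out_dir), check=True)
```

Keeping the command construction separate from execution makes it easy to swap in another Whisper implementation later.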
With Whisper, it would be possible to transcribe speech across dozens of languages and even handle poor audio quality or excessive background noise. It would make it easier for people who are hard of hearing or have difficulty understanding accents to enjoy audiobooks.
Here are some tips on how to integrate this feature into the AudioBookShelf flow:
I hope you will consider this suggestion for future updates to AudioBookShelf. Let me know if you have any questions or concerns.
Thank you for all your hard work making AudioBookShelf a great platform!