Unfortunately, stable-ts and Whisper don't clearly report which file they're working on, so you're stuck trying to decipher it from the logs. I tried adding prints to show which files have been queued and started, but with threading, stdout sometimes gets lost or buffered in strange ways.
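If anyone wants to tinker, the usual fix is a shared lock plus unbuffered prints. A minimal sketch (this is not subgen's actual code, just the shape of the workaround):

```python
import threading

# Shared lock so log lines from worker threads don't interleave mid-line
print_lock = threading.Lock()

def log(msg: str) -> None:
    with print_lock:
        # flush=True pushes the line out immediately instead of letting
        # it sit in the stdout buffer until the thread happens to flush
        print(msg, flush=True)

def transcribe_worker(path: str) -> None:
    log(f"started: {path}")
    # ... run stable-ts / faster-whisper on the file here ...
```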
Completed the Bazarr integration. See: https://github.com/McCloudS/subgen/blob/main/README.md#bazarr
I'm not sure yet. faster-whisper publishes benchmarks where the large-v2 model takes about 1 minute for 13 minutes of audio. Smaller models ought to be quicker. I'm unsure whether the specs of the GPU will make much difference.
It can only translate into English, but the source audio can be a foreign language.
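For reference, Whisper exposes this as its translate task. A rough sketch with stable-ts (the model name and file path are just examples):

```python
import stable_whisper

model = stable_whisper.load_model("medium")
# task="translate" makes Whisper output English regardless of the audio language
result = model.transcribe("foreign_film.mkv", task="translate")
result.to_srt_vtt("foreign_film.srt")
```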
I just tried, and Emby won't actually send out the webhook on an action. I can use the test webhook, but it won't trigger off media actions. The documentation half-implies that webhooks are an Emby Premiere option?
If I knew what the endpoints were, nothing would prohibit it. I can add it to my short list.
It should detect the foreign language and generate English subtitles, but I haven't personally tried it.
I'm not using whisper.cpp anymore. I did some short comparisons between WhisperX and stable-ts and ultimately decided to go with stable-ts. Functionally, I'm sure they're very similar.
Hey all,
Some might remember this from about 9 months ago. I've been running it with zero maintenance since then, but saw there were some new updates that could be leveraged.
What has changed?
- Jellyfin is supported (in addition to Plex and Tautulli)
- Moved away from whisper.cpp to stable-ts and faster-whisper (faster-whisper can support Nvidia GPUs)
- Significant refactoring of the code to make it easier to read and for others to add 'integrations' or webhooks
- Renamed the webhook endpoint from webhook to plex/tautulli/jellyfin
- New environment variables for additional control
What is this?
This transcribes your personal media on a Plex or Jellyfin server to create subtitles (.srt). It currently relies on webhooks from Jellyfin, Plex, or Tautulli to know what to transcribe. Under the hood it uses stable-ts and faster-whisper, which can run on either Nvidia GPUs or CPUs.
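To give a feel for the webhook side, here's a toy handler showing the shape of it (illustrative only, not subgen's actual code; the helper at the end is hypothetical):

```python
import json

from flask import Flask, request

app = Flask(__name__)

@app.route("/plex", methods=["POST"])
def plex_webhook():
    # Plex posts multipart form data with a JSON "payload" field
    payload = json.loads(request.form["payload"])
    # On a play or library-add event, hand the item off for transcription
    if payload.get("event") in ("media.play", "library.new"):
        queue_transcription(payload)  # hypothetical helper
    return "ok"

def queue_transcription(payload: dict) -> None:
    ...  # resolve the media file's path and run stable-ts on it
```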
How do I run it?
I recommend reading through the documentation at https://github.com/McCloudS/subgen, but quick and dirty: pull mccloud/subgen from Docker Hub, configure your Tautulli/Plex/Jellyfin webhooks, and map your media volumes to match Plex/Jellyfin identically.
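As a starting point, a minimal docker-compose sketch. The port and environment variable names below are placeholders from memory, so check the README for the real ones:

```yaml
services:
  subgen:
    image: mccloud/subgen
    ports:
      - "8090:8090"              # webhook listener (assumed default port)
    environment:
      - WHISPER_MODEL=medium     # variable names are illustrative
      - TRANSCRIBE_DEVICE=cpu    # or gpu, if you have an Nvidia card
    volumes:
      # Container paths must match what Plex/Jellyfin sees, e.g. /tv -> /tv
      - /path/to/tv:/tv
      - /path/to/movies:/movies
```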
What can I do?
I'd love any feedback or PRs to update the code or the instructions. I'm also interested to hear if anyone can get GPU transcription working. I have a Tesla T4 in the mail to try it out soon.
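If you want to sanity-check your GPU before pointing the container at it, faster-whisper can be exercised directly (the model name and file are just examples; CUDA needs the cuBLAS/cuDNN libraries installed):

```python
from faster_whisper import WhisperModel

# float16 on CUDA is the usual fast path for Nvidia cards
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("episode.mkv")
print(f"Detected language: {info.language}")
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```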