Dating back to the del.icio.us days, I've squirreled away links to articles that I will surely read "as soon as I have time". Unfortunately, in reality, I soon forget about them.
A couple of years ago, I realized I was listening to a lot of not-so-great podcasts, and I'd rather spend some of that ear-time catching up on my never-ending backlog of articles.
I hacked up a quick-and-dirty set of Python scripts that run my saved articles through a TTS engine and produce a personal podcast feed of my backlog.
This has been a useful hack for me, so I thought I'd share the pieces of it with the world. (My actual scripts are specific to my needs and pretty dirty, so I'm just giving you the recipe to make your own.)
The article-to-podcast pipeline consists of a few steps:
- Saving the article:
I've used various link aggregators over the years. I'm currently using Omnivore and its browser extension / Android app to quickly save articles I come across.
At some point, I may switch to a self-hosted solution like wallabag. In the past, I never really wanted to self-host my link aggregator: I wanted access to it from outside my network, and I didn't think that was worth opening ports for. But now that Tailscale has solved the remote access problem, I should probably reevaluate.
- Extracting the title and text from the HTML:
If you're using a link aggregator, its API will probably do this for you. There is also a nice Python project called Newspaper3k that does the extraction.
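A minimal sketch of the extraction step, assuming you go the Newspaper3k route rather than an aggregator API. `extract_article` uses the library's `Article` class; `tidy_for_speech` is a hypothetical helper (not part of Newspaper3k) that collapses stray whitespace and puts the title first so the TTS engine reads the headline before the body.

```python
import re


def extract_article(url):
    """Download a page and return (title, text) using Newspaper3k."""
    # Imported lazily so tidy_for_speech works without the package installed.
    from newspaper import Article  # pip install newspaper3k

    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # extract the title and main body text
    return article.title, article.text


def tidy_for_speech(title, text):
    """Hypothetical helper: normalize whitespace and lead with the title."""
    # Collapse runs of spaces/tabs inside each line and drop empty lines.
    lines = [re.sub(r"\s+", " ", line).strip() for line in text.split("\n")]
    body = "\n".join(line for line in lines if line)
    return f"{title.strip()}\n\n{body}"
```

Something like `tidy_for_speech(*extract_article(url))` would then produce the string to hand to the TTS engine.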
- Running the text through a text-to-speech engine:
For many years, I used Mycroft's mimic3 engine. This produced understandable but nowhere near great TTS.
Recently, I've switched to StyleTTS2. I've only been using it for a week, but its results are a definite step up from mimic3. I made a repo that has StyleTTS2 packaged up into a Docker container here.
- Encoding the TTS audio into an MP3:
FFmpeg to the rescue! I'm using:
ffmpeg -i pipe:0 -ac 1 -ar 22050 output_file.mp3
to read the audio from stdin (pipe:0), downmix it to mono (-ac 1), and resample it to 22,050 Hz (-ar 22050).
- Generating a podcast feed:
The feedgen Python library generates podcast feeds nicely.
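A rough sketch of what the feed step could look like with feedgen, assuming all the MP3s sit in one folder that is served from a single base URL. The feed title, description, file names, and both function names are placeholders, not anything from my actual scripts.

```python
from pathlib import Path
from urllib.parse import quote


def list_episodes(mp3_dir, base_url):
    """Describe every MP3 in mp3_dir as a dict the feed builder can use."""
    episodes = []
    for mp3 in sorted(Path(mp3_dir).glob("*.mp3")):
        episodes.append({
            "title": mp3.stem.replace("_", " "),
            "url": f"{base_url.rstrip('/')}/{quote(mp3.name)}",
            "size": mp3.stat().st_size,  # enclosure length in bytes
        })
    return episodes


def build_feed(episodes, site_url, out_path="podcast.xml"):
    """Write an RSS podcast feed with feedgen."""
    # Imported lazily so list_episodes works without the package installed.
    from feedgen.feed import FeedGenerator  # pip install feedgen

    fg = FeedGenerator()
    fg.load_extension("podcast")  # adds the iTunes podcast namespace
    fg.title("My article backlog")  # placeholder title
    fg.link(href=site_url, rel="alternate")
    fg.description("Saved articles, read aloud by a TTS engine")
    for ep in episodes:
        fe = fg.add_entry()
        fe.id(ep["url"])
        fe.title(ep["title"])
        fe.enclosure(ep["url"], str(ep["size"]), "audio/mpeg")
    fg.rss_file(out_path)
```

Regenerating the whole feed from the folder contents on every run keeps the script stateless: drop a new MP3 in, rerun, and the feed picks it up.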
- Hosting the feed RSS and MP3s:
Any HTTP server will do. But I just have a Docker container that mounts the folder containing the feed RSS and MP3s and runs:
python3 -m http.server 9000
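If you would rather start the server from inside a Python script than from the command line, the standard library exposes the same handler. This is only a sketch of an equivalent, with `make_server` being a made-up name; the one-liner above does the job just as well.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=9000):
    """Create a static file server for `directory`, similar to
    running `python3 -m http.server 9000` inside that folder."""
    # partial() pins the folder to serve; the handler otherwise uses the cwd.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("", port), handler)
```

Calling `make_server("podcast").serve_forever()` would then serve a (hypothetical) `podcast` folder on port 9000 until the process is stopped.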
If there is enough interest, I could be convinced to generalize this enough for others to use.