Job Description
We would like to build an automated process/workflow for summarizing podcast data.
See the details below, taken from the inspiration video of someone else who built this kind of system.
Details
Here’s a summary of the YouTube podcast “How to digest 36 weekly podcasts without spending 36 hours listening | Tomasz Tunguz”:
Overview:
Tomasz Tunguz, founder of Theory Ventures, describes his custom-built “Parakeet Podcast Processor,” a terminal-based workflow that lets him extract key insights from 36 weekly podcasts without having to listen to them all in real time.
How the System Works:
Tomasz built a pipeline that:
- Downloads daily podcast files.
- Transcribes audio to text using models like OpenAI’s Whisper and Nvidia’s Parakeet.
- Cleans up transcripts (removing filler words and ensuring clarity) using models like Gemma 3 (a sketch of these two steps follows this list).
- Stores transcripts in a local DuckDB database.
- Processes transcripts daily to:
  - Create topic and theme summaries
  - Extract notable quotes and actionable investment theses (e.g., spotting trends or potential AI tools to invest in for venture capital work)
  - Identify company mentions for CRM enrichment and produce tweet/blog post suggestions using AI prompts
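A minimal sketch of the transcribe-and-clean steps, assuming the open-source openai-whisper package for transcription and a Gemma 3 model served locally through Ollama for cleanup; the model tags, chunk size, and helper functions are illustrative assumptions, not Tomasz's actual code.

```python
import whisper  # openai-whisper package
import ollama   # client for a locally running Ollama server

CLEANUP_PROMPT = (
    "Clean up this podcast transcript chunk: remove filler words (um, uh, you know), "
    "fix obvious transcription errors, and keep the speaker's meaning intact.\n\n{chunk}"
)

def transcribe(audio_path: str) -> str:
    """Transcribe one podcast audio file to raw text with Whisper."""
    model = whisper.load_model("base")  # larger checkpoints trade speed for accuracy
    result = model.transcribe(audio_path)
    return result["text"]

def clean_transcript(raw_text: str, chunk_size: int = 4000) -> str:
    """Pass transcript chunks through a local LLM to strip filler and improve clarity."""
    chunks = [raw_text[i:i + chunk_size] for i in range(0, len(raw_text), chunk_size)]
    cleaned = []
    for chunk in chunks:
        resp = ollama.chat(
            model="gemma3",  # assumed local model tag
            messages=[{"role": "user", "content": CLEANUP_PROMPT.format(chunk=chunk)}],
        )
        cleaned.append(resp["message"]["content"])
    return "\n".join(cleaned)
```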
Benefits and Customization:
The system is hyper-personalized and built in the terminal for:
- Maximum speed and minimal UI friction (Tomasz notes lower latency and ability to script/automate tasks easily)
- End-to-end control by the user, allowing rapid adjustments for new needs
Writing Workflow:
He uses his transcript digest to:
- Extract blog post ideas/topics
- Generate blog post drafts through an AI pipeline, referencing his own previous 2,000 blog posts for style
- Employs an “AP English teacher” grading prompt: blog posts are iteratively revised until the AI gives them an A- or higher, with particular focus on the hook and the conclusion (a sketch of this loop follows below)
Tomasz notes that AI struggles to fully match his personal writing style and does poorly on very short forms (like tweets), but it is helpful for overcoming writer’s block and providing structure.
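A hedged sketch of that grading loop, assuming the openai Python SDK; the grader prompt, the "Grade: <letter>" reply format, the model name, and the five-round cap are illustrative assumptions rather than the exact prompts from the video.

```python
import re
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

GRADER_PROMPT = (
    "You are an AP English teacher. Grade this blog post draft from A+ to F, paying "
    "special attention to the hook and the conclusion. Start your reply with "
    "'Grade: <letter>' and then list concrete improvements.\n\n{draft}"
)
PASSING = {"A+", "A", "A-"}

def grade(draft: str) -> tuple[str, str]:
    """Ask the model for a letter grade and written feedback on a draft."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": GRADER_PROMPT.format(draft=draft)}],
    )
    feedback = resp.choices[0].message.content
    match = re.search(r"Grade:\s*([A-D][+-]?|F)", feedback)
    return (match.group(1) if match else "F"), feedback

def revise_until_a_minus(draft: str, max_rounds: int = 5) -> str:
    """Revise the draft with the grader's feedback until it earns an A- or better."""
    for _ in range(max_rounds):
        letter, feedback = grade(draft)
        if letter in PASSING:
            break
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": f"Revise this blog post draft using the feedback.\n\n"
                           f"Feedback:\n{feedback}\n\nDraft:\n{draft}",
            }],
        )
        draft = resp.choices[0].message.content
    return draft
```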
AI Tools & Techniques:
- Uses a variety of AI models (OpenAI, Claude, Gemini, Parakeet) and has them “duel”—meaning he’ll have AIs critique each other’s work (“mean girls” technique) to improve quality
- Finds that more powerful LLMs outperform classic NLP tools (e.g., Stanford NER) for tasks like extracting company mentions from transcripts (see the sketch below)
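A rough sketch of LLM-based company extraction from a transcript chunk (the task where the video says LLMs beat classic NER tools), again assuming the openai SDK; the model name and the JSON reply contract are assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

def extract_companies(transcript_chunk: str) -> list[str]:
    """Return a deduplicated, sorted list of company names mentioned in the chunk."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed; any capable LLM could be swapped in
        response_format={"type": "json_object"},  # ask for strict JSON output
        messages=[{
            "role": "user",
            "content": (
                "List every company mentioned in this podcast transcript chunk. "
                'Reply with JSON only, e.g. {"companies": ["Acme", "Globex"]}.\n\n'
                + transcript_chunk
            ),
        }],
    )
    data = json.loads(resp.choices[0].message.content)
    return sorted(set(data.get("companies", [])))
```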
Takeaways:
- Building your own AI-powered pipeline allows tailored insight extraction at scale—especially powerful for knowledge workers or investors monitoring vast content streams
- Off-the-shelf products can’t match the flexibility, integration, or speed of highly-custom terminal solutions, especially when automating multi-step tasks
- AI feedback loops for writing (grading drafts, keeping essential human quirks, iterative improvement) can help professionals scale quality content creation
Final Notes:
Tomasz Tunguz downloads daily podcast files as part of his custom terminal-based workflow, which he calls the “Parakeet Podcast Processor.” Here’s how he does it:
- He maintains a list of 36 podcast RSS feeds that he wants to keep up with.
- Each day, his system goes through each feed, downloads the new podcast episode files, and saves them locally.
- The system consists mostly of custom scripts that run on his local computer (often via the terminal for speed and automation).
- The process uses tools like FFmpeg for audio file handling and conversion (see the download sketch after this list).
- After downloading, the workflow sends the audio files through transcription models (initially OpenAI’s Whisper, later Nvidia’s Parakeet), and the transcripts are further processed for summaries and insights.
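A rough sketch of the daily RSS check-and-download step, assuming the feedparser and requests packages plus an ffmpeg binary on the PATH; the feed URLs, file layout, and seen-set de-duplication are placeholders, not Tomasz's actual scripts.

```python
import subprocess
from pathlib import Path

import feedparser
import requests

FEEDS = [
    "https://example.com/podcast-a/rss",  # placeholders; Tomasz tracks ~36 feeds
    "https://example.com/podcast-b/rss",
]
AUDIO_DIR = Path("audio")

def download_new_episodes(feeds: list[str], seen: set[str]) -> list[Path]:
    """Check each feed, download unseen episode audio, and convert it to 16 kHz mono WAV."""
    AUDIO_DIR.mkdir(exist_ok=True)
    downloaded = []
    for url in feeds:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            guid = entry.get("id") or entry.get("link", "")
            audio_urls = [enc["href"] for enc in entry.get("enclosures", [])
                          if enc.get("href") and "audio" in enc.get("type", "")]
            if not guid or guid in seen or not audio_urls:
                continue
            mp3_path = AUDIO_DIR / (guid.replace("/", "_").replace(":", "_") + ".mp3")
            mp3_path.write_bytes(requests.get(audio_urls[0], timeout=120).content)
            wav_path = mp3_path.with_suffix(".wav")
            # Resample and downmix for the transcription model with FFmpeg.
            subprocess.run(
                ["ffmpeg", "-y", "-i", str(mp3_path), "-ar", "16000", "-ac", "1", str(wav_path)],
                check=True,
            )
            seen.add(guid)
            downloaded.append(wav_path)
    return downloaded
```

Run once a day (for example from cron), this fits the terminal-first, fully scriptable design described above.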
In summary:
- Feeds are checked daily by his script.
- Audio files are downloaded and saved locally, and the resulting transcripts are stored in a DuckDB database (a schema sketch follows this list).
- The rest of his workflow kicks in to handle transcription, cleaning, and analysis.
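A minimal sketch of a local DuckDB transcript store, assuming a single transcripts table; the schema, database file name, and upsert helper are invented for illustration.

```python
import duckdb

con = duckdb.connect("podcasts.duckdb")
con.execute("""
    CREATE TABLE IF NOT EXISTS transcripts (
        episode_id   TEXT PRIMARY KEY,
        podcast      TEXT,
        published_at DATE,
        raw_text     TEXT,
        clean_text   TEXT
    )
""")

def save_transcript(episode_id: str, podcast: str, published_at: str,
                    raw_text: str, clean_text: str) -> None:
    """Upsert one episode's transcript (published_at as 'YYYY-MM-DD')."""
    con.execute(
        "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?, ?, ?)",
        [episode_id, podcast, published_at, raw_text, clean_text],
    )

# Example: pull today's transcripts for the daily summarization pass.
todays = con.execute(
    "SELECT episode_id, clean_text FROM transcripts WHERE published_at = current_date"
).fetchall()
```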
This approach lets Tomasz automatically get the latest content from multiple podcasts each day without manual downloads, using RSS automation and scripting.