Debrief is a personalized podcast. It combines saved sites, top articles from your favorite online publications, and scraped articles based on your areas of interest into a curated feed of summaries. You just tell Debrief a little about what you’re interested in, and it gives you a feed to listen to on the way to work, while you’re working out, or as you go to bed or brush your teeth.
Debrief was powered by a React/Next.js frontend talking to a Python FastAPI backend with a MongoDB database. Alongside the backend server, a few scripts ran a pipeline that generated relevant articles for each user.
Users would write down a phrase describing an area of interest. GPT-3.5 would then be prompted to generate Google search queries for news on that topic, and those queries would be passed through Google’s search API to collect a set of URLs.
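Here’s a minimal sketch of what that query-generation step could look like, assuming the OpenAI Python client and Google’s Custom Search JSON API. The API key names, the prompt wording, and the helper functions are illustrative, not the exact production code:

```python
# Sketch of the interest -> search queries -> URLs step.
# GOOGLE_API_KEY, SEARCH_ENGINE_ID, and the prompt are placeholders.
import os
import requests
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
GOOGLE_API_KEY = os.environ["GOOGLE_API_KEY"]
SEARCH_ENGINE_ID = os.environ["SEARCH_ENGINE_ID"]  # Programmable Search Engine ID

def generate_queries(interest: str, n: int = 3) -> list[str]:
    """Ask GPT-3.5 for Google search queries that surface recent news on an interest."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write {n} Google search queries for recent news about: {interest}. "
                       "Return one query per line.",
        }],
    )
    return [q.strip() for q in resp.choices[0].message.content.splitlines() if q.strip()]

def search_urls(query: str, num: int = 5) -> list[str]:
    """Run a query through Google's Custom Search JSON API and collect result URLs."""
    resp = requests.get(
        "https://www.googleapis.com/customsearch/v1",
        params={"key": GOOGLE_API_KEY, "cx": SEARCH_ENGINE_ID, "q": query, "num": num},
        timeout=10,
    )
    resp.raise_for_status()
    return [item["link"] for item in resp.json().get("items", [])]
```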
Each URL would be scraped and the HTML fed to GPT to summarize. Finally, the summary would be turned into audio via Google’s TTS API. The audio would be stored in S3 to serve on the frontend, and the text summary, along with the S3 URL of the audio, would be committed to MongoDB.
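A hedged sketch of that scrape → summarize → TTS → store step, using requests/BeautifulSoup, Google Cloud Text-to-Speech, boto3, and pymongo. The bucket name, collection name, prompt, and helper function are assumptions for illustration, not the real pipeline code:

```python
# Sketch of the per-URL pipeline: scrape, summarize, synthesize audio, store.
import boto3
import requests
from bs4 import BeautifulSoup
from google.cloud import texttospeech
from openai import OpenAI
from pymongo import MongoClient
from uuid import uuid4

openai_client = OpenAI()
tts_client = texttospeech.TextToSpeechClient()
s3 = boto3.client("s3")
articles = MongoClient("mongodb://localhost:27017")["debrief"]["articles"]
BUCKET = "debrief-audio"  # placeholder bucket name

def process_url(url: str) -> None:
    # Scrape the page and strip it down to visible text.
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    # Summarize with GPT (truncated so the page fits in the context window).
    resp = openai_client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": "Summarize this article as a short spoken-style blurb:\n" + text[:12000]}],
    )
    summary = resp.choices[0].message.content

    # Turn the summary into MP3 audio with Google's TTS API.
    audio = tts_client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=summary),
        voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
        audio_config=texttospeech.AudioConfig(audio_encoding=texttospeech.AudioEncoding.MP3),
    )

    # Store the audio in S3 and commit the summary plus audio URL to MongoDB.
    key = f"audio/{uuid4()}.mp3"
    s3.put_object(Bucket=BUCKET, Key=key, Body=audio.audio_content, ContentType="audio/mpeg")
    articles.insert_one({
        "source_url": url,
        "summary": summary,
        "audio_url": f"https://{BUCKET}.s3.amazonaws.com/{key}",
    })
```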
I tried to apply the lessons from my previous projects in two key ways:
In working on Debrief, my biggest priority was building rich user telemetry and then optimizing the experience against those metrics. I built in a range of tracking features to monitor how people interacted with the articles they were listening to, and I continuously refined my “interests” recommendation algorithm to improve the initial user experience.
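To give a concrete sense of the kind of telemetry I mean, here is a minimal sketch of a listen-event endpoint on a FastAPI backend. The event schema and field names are illustrative assumptions, not the actual tracking code:

```python
# Sketch of listen-event telemetry: every play/skip/finish gets logged with its feed position.
from datetime import datetime, timezone
from fastapi import FastAPI
from pydantic import BaseModel
from pymongo import MongoClient

app = FastAPI()
events = MongoClient("mongodb://localhost:27017")["debrief"]["listen_events"]

class ListenEvent(BaseModel):
    user_id: str
    article_id: str
    position: int          # which slot in the feed the article occupied (1-indexed)
    action: str            # e.g. "play", "skip", "finish"
    seconds_listened: float

@app.post("/events")
def log_event(event: ListenEvent):
    # Timestamp and store every interaction so feed relevance can be measured per position.
    doc = event.model_dump()
    doc["timestamp"] = datetime.now(timezone.utc)
    events.insert_one(doc)
    return {"ok": True}
```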
What I found was intuitive, yet critical. Whether a user would come back to the site was determined almost entirely by how relevant the first, second, and third articles were; after those initial articles, the importance of relevance decayed exponentially. It didn’t matter whether the 10th article was relevant or completely tangential: if articles 1, 2, and 3 were relevant, you were willing to trust the system and skip ahead to article 11. But if article 1 or 2 was off, you never stuck around long enough to try more.
Ultimately, more so than the interface or the summarization, I found that article sourcing and web crawling were the most important steps to flesh out. Summarization, TTS, and the frontend were well-defined tasks, but the quality of the user experience hinged entirely on the quality of the sites gathered. My GPT search-query approach worked well for mainstream interests, but it struggled to surface the most up-to-date information for more niche phrases. To improve it further, one would have to dig deep into the metadata of all of these sites and design a search engine custom-tailored to each user’s needs...
There are multi-billion-dollar companies that do exactly that, so I decided to leave that part to them. For this project, I was happy to have a great audio player that could give me descriptive blurbs on relevant tech news. Listening to the summaries in the car on the way to work was a really fulfilling solution to my own user needs.