Let an AI generate your next lecture or speech for you
Publishing a fine-tuned GPT-2 model with keyword-based conditional generation capabilities

It’s a crowd-sourced software project that tries to bring together ideas from around society with better technology for delivering them in their proper context rather like an orchestra playing Mozart on its cello instead of PowerPoint moving through time using augmented reality as part’toward being able interact fully within this stage where we can ask questions directly without having our eyes be trained off other speakers’ screens.(Applause)I want you all here tonight — (Laughter)(Distorted guitar music ends)[Music]So I’m going back over my slides so far; they’re pretty good but not quite up there yet! (Cheers).
Did you notice that I started this post with the help of my new AI model? Probably, in some way. But still, look at that flow of ideas in the introduction!
This project started because I wanted to learn more about creative uses of OpenAI’s GPT-2 model (GPT-3 is certainly more capable and better documented, especially for the kind of conditional sentence generation I wanted). While researching, I stumbled upon this awesome tutorial (his blog), which made the often very frustrating process of implementing a custom approach for GPT a lot easier.
Data
The first and main goal was to give you the voice of the best speakers, and what could be better for that than the fascinating, transcribed talks of the TED Talks community? My data for this project was collected from multiple sources, mainly this GitHub repository and this Kaggle dataset (recently a few similar datasets were also added to the new Hugging Face Datasets platform, which might be worth checking out). The first source already had the speeches aligned with some keywords, and the second had the transcripts already in place. I didn’t have to web scrape anything, yay! :)
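If you want to explore those newer options, pulling one down with the datasets library is a one-liner. The dataset id below is a placeholder, not a real repository name; search the hub for “ted” to find the actual datasets:

```python
from datasets import load_dataset

# "<some-ted-dataset>" is a placeholder id -- browse the hub for real ones.
ds = load_dataset("<some-ted-dataset>", split="train")
print(ds[0])
```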
Next, I split the data into groups of three sentences and removed noise with a relevance-checking algorithm against the provided keywords. Then I added a few dozen keywords by hand before generating the rest with the NLP library (I happily gave up on the manual approach). I also used the first three sentences of every talk as the introduction that every lecture needs. The overall result was a dataset in which each group of sentences is labeled with one general topic and up to three more specific keywords (the hard facts) taken directly from the sentences.
tl;dr Created a dataset in which each group of sentences is labeled with one general topic and up to three more specific keywords (the hard facts) taken directly from the sentences.
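To make the preprocessing more concrete, here is a minimal sketch of the chunking step. This is not my original pipeline: `extract_keywords` is a stand-in for the NLP library mentioned above, NLTK is just one convenient choice for sentence splitting, and the relevance-based noise filtering is left out.

```python
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)

def make_samples(transcript, topic, extract_keywords):
    """Split one transcript into three-sentence chunks, each labeled with
    a general topic and up to three specific keywords."""
    sentences = sent_tokenize(transcript)
    samples = []
    # The first three sentences become the dedicated introduction sample.
    samples.append({"topic": "Introduction",
                    "keywords": [],
                    "text": " ".join(sentences[:3])})
    for i in range(3, len(sentences) - 2, 3):
        chunk = " ".join(sentences[i:i + 3])
        samples.append({"topic": topic,
                        # Stand-in for the NLP library's keyword extraction;
                        # the relevance-based noise filtering is omitted here.
                        "keywords": extract_keywords(chunk)[:3],
                        "text": chunk})
    return samples
```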
How can I use it?
I’m a big fan of the Hugging Face infrastructure, so I uploaded the full model directory there. If you’re unfamiliar with it, here’s my direct way of using it:
Here is the external gist with the requirements and imports.
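In case the embedded gist doesn’t render for you, the setup boils down to roughly this (assuming plain torch and transformers; the exact version pins are in the gist):

```python
# Assumed requirements -- check the gist for the exact version pins:
# pip install torch transformers

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
```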
And, more importantly, here is the model inference:
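Again, as a fallback if the gist doesn’t render: a minimal inference sketch. The model id is a placeholder for the actual repository on the Hugging Face hub, and the way `title` and `keywords` are combined into the prompt is my reading of the schema described below, so treat it as an approximation of the gist, not a copy:

```python
# Placeholder id -- replace with the actual model repository on the hub.
MODEL_ID = "<model-id-on-the-hub>"

tokenizer = GPT2Tokenizer.from_pretrained(MODEL_ID)
model = GPT2LMHeadModel.from_pretrained(MODEL_ID)
model.eval()

title = "Introduction"                                     # the main topic
keywords = "artificial intelligence, creativity, future"  # up to three

# Assumed prompt schema (see the description below): main topic,
# punctuation, then the comma-separated specific keywords.
prompt = f"{title}. {keywords}"

input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_length=200,
        do_sample=True,                       # sample instead of greedy decoding
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
    )
print(tokenizer.decode(output[0], skip_special_tokens=True))
```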
You can replace the variables, namely title and keywords, with anything you like. Just make sure to use the same schema for the keywords: one main topic (this can be “Introduction” or anything else), terminated by punctuation, followed by a comma-separated collection of more specific keywords (choose up to three).
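For example, any of these (made-up) pairs would fit that schema:

```python
title, keywords = "Climate", "oceans, renewable energy, policy"
title, keywords = "Education", "online learning, motivation"
title, keywords = "Introduction", "storytelling"
```

Each of them composes into a prompt like “Climate. oceans, renewable energy, policy”.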
And there you go! That's it. Now you should see a transcript for your next presentation.