Lingthusiasm Episode 40: Making machines learn language - Interview with Janelle Shane

If you feed a computer enough ice cream flavours or pictures annotated with whether they contain giraffes, the hope is that the computer may eventually learn how to do these things for itself: to generate new potential ice cream flavours or identify the giraffehood status of new photographs. But it’s not necessarily that easy, and the mistakes that machines make when doing relatively silly tasks like ice cream naming or giraffe identification can illuminate how artificial intelligence works when doing more serious tasks as well.
In this episode, your hosts Gretchen McCulloch and Lauren Gawne interview Dr Janelle Shane, author of You Look Like A Thing And I Love You and person who makes AI do delightfully weird experiments on her blog and twitter feed. We talk about how AI “sees” language, what the process of creating AI humour is like (hint: it needs a lot of human help to curate the best examples), and ethical issues around trusting algorithms.  
Finally, Janelle helped us turn one of the big neural nets on our own 70+ transcripts of Lingthusiasm episodes, to find out what Lingthusiasm would sound like if Lauren and Gretchen were replaced by robots! This part got so long and funny that we made it into a whole episode on its own, which is technically the February bonus episode, but we didn’t want to make you wait to hear it, so we’ve made it available right now! This bonus episode includes a more detailed walkthrough with Janelle of how she generated the Robo-Lingthusiasm transcripts, and live-action reading of some of our favourite Robo-Lauren and Robo-Gretchen moments.
Support Lingthusiasm on Patreon to gain access to the Robo-Lingthusiasm episode and 35 previous bonus episodes.
Also for our patrons, we’ve made a Lingthusiasm Discord server – a private chatroom for Lingthusiasm patrons! Chat about the latest Lingthusiasm episode, share other interesting linguistics links, and geek out with other linguistics fans. (We even made a channel where you can practice typing in the International Phonetic Alphabet, if that appeals to you!)
Here are the links mentioned in this episode:
You can listen to this episode via Soundcloud, RSS, Apple Podcasts/iTunes, Spotify, YouTube, or wherever you get your podcasts. You can also download an mp3 via the Soundcloud page for offline listening, and stay tuned for a transcript of this episode on the Lingthusiasm website. To receive an email whenever a new episode drops, sign up for the Lingthusiasm mailing list.
You can help keep Lingthusiasm advertising-free by supporting our Patreon. Being a patron gives you access to bonus content and lets you help decide on Lingthusiasm topics.
Lingthusiasm is on Facebook, Tumblr, Instagram, Pinterest, and Twitter.
Email us at contact [at] lingthusiasm [dot] com
Gretchen is on Twitter as @GretchenAMcC and blogs at All Things Linguistic.
Lauren is on Twitter as @superlinguo and blogs at Superlinguo.
Lingthusiasm is created by Gretchen McCulloch and Lauren Gawne. Our senior producer is Claire Gawne, our editorial producer is Sarah Dopierala, and our music is ‘Ancient City’ by The Triangles.

I had SO MUCH FUN on this episode. Seriously I don’t think I’ve ever laughed this hard on a podcast.

The icing on the cake was when I got to train the GPT-2 neural net on their past show transcripts, and when Lauren and Gretchen acted the hell out of the resulting script.

And Robo-Lingthusiasm was trying SO HARD to be enthusiastic about linguistics. And in classic neural net fashion it did produce transcripts that read a lot like real Lingthusiasm episodes. It was just also very wrong about linguistics. And about everything else.

Gretchen: I’m Gretchen Gawne, and I’m here today in Gretchen’s hometown of Edinburgh, Scotland as Gretchen McCulloch, and today we’re getting enthusiastic about our first anniversary.
Lauren: We’re very excited to be standing next to you in a very celebratory and beautiful, pink, velour, mini-van-sized van.

(Gawne is Lauren’s last name, not Gretchen’s; Gretchen McCulloch’s hometown is not Edinburgh; and they do not, to my knowledge, own a pink velour van, nor are they standing next to me in it.)

In order to get a transcript that sounded like their show (rather than the generic internet text that GPT-2 was originally trained on) I had to turn the model’s creativity way down, which meant we got things like this soup loop:

Gretchen: How do we spell ‘soup?’?
Lauren: I had to say ‘soup.’ in the first ten or so minutes of this episode because I was like, ‘Oh, my gosh, I just can’t spell ‘soup.’”
Gretchen: ‘Oh, my gosh, I just can’t spell ‘soup.’”
Lauren: ‘Oh, my gosh, I just cannot spell ‘soup.’”
Gretchen: ‘Oh, my gosh, I just cannot spell ‘soup.’”
Lauren: ‘Oh, my gosh, I do not know how to spell ‘soup.”
Gretchen: I know how to spell ‘soup.’.
Lauren: ‘Oh, my gosh, I have never been able to spell ‘soup.’”

Gretchen: Yeah, and I looked at my watch, and I could see the words on it glinting in the light, and it was “soup.”
Lauren: “Soup.”
Gretchen: And I was like, ‘Oh, my god, I can’t even spell ‘soup.’”
Lauren: I was like, ‘Oh, I can sort of spell it, but I’m not able to spell ‘soup.’”

Robo-Lingthusiasm tended to stick with topics like spelling and nouns vs verbs, which are not things the actual Lingthusiasm podcast spends much time on, but maybe it was easier for the neural net because it had seen more of that kind of stuff online. It’s a really vivid illustration of how AI can be overcautious and ignore a lot of the richness and complexity of its training data in favor of being as correct as possible.
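
For anyone curious what turning the creativity down can look like in code, here is a minimal sketch. It assumes the "creativity" setting is the sampling temperature, which the post doesn't actually name, and the word list and scores are made up for illustration rather than taken from GPT-2. The idea is that dividing the model's scores by a small temperature before converting them to probabilities piles almost all of the probability onto the single likeliest word, which is one plausible way to end up in a loop.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word candidates and scores (invented for this example).
vocab = ["soup", "spell", "vowel", "giraffe"]
logits = [2.0, 1.5, 1.0, 0.2]

for temperature in (1.0, 0.2):
    probs = softmax_with_temperature(logits, temperature)
    summary = {word: round(p, 3) for word, p in zip(vocab, probs)}
    print(f"temperature={temperature}: {summary}")
```

At the higher temperature the less likely words keep a real chance of being picked; at the lower one the top word dominates, so the text gets safer and more repetitive.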

I’m posting a couple more transcripts as bonus material. You should definitely listen to my interview on Lingthusiasm, where I get into lots of detail about training AI on language. If you want to hear almost an hour’s worth of Gretchen and Lauren reading the soup loop and other Robo-Lingthusiasm excerpts, become a patron of Lingthusiasm (it’s worth it, it’s such an interesting show).
