Podcast: Download (Duration: 34:35 — 28.5MB)
In mid-2019, I shared 9 Ways That Artificial Intelligence Might Disrupt Authors and Publishing, and one of those possible disruptions concerned voice technologies, which I also wrote about in Audio for Authors.
In 2020, we have seen an acceleration of AI with the release of GPT-3 for natural language processing and generation, as well as the development of ever more sophisticated voice recognition and creation. In this episode, Mark Leslie Lefebvre and I share a conversation between our Voice Doubles and our thoughts on the ramifications.
- Our Voice Double conversation
- Thoughts from some of my Patrons
- How we created the Voice Doubles using Descript with just 30 mins of training (reading a story)
- What we had to change and edit to get around a few of the pronunciation issues e.g. I could not get it to say ‘creatives' correctly so changed the word to ‘authors.' I also couldn't get it to say ‘mass-market' in my voice, so I changed that text as well.
- Our thoughts on the development of the tech and why we think it's important for authors and creatives to engage with the technology rather than avoid it
- A warning of deep fakes and why Kevin Kelly's 1000 True Fans is more important than ever
You can get your own Voice Double at Descript.com. You can find Mark at MarkLeslie.ca. Mark also recorded a special episode with more of his thoughts in episode 148 of Stark Reflections.
We'd love to know what you think, so please leave a comment or tweet me @thecreativepenn and Mark @markleslie.
Our Voice Double Original (Edited) Script
Jo: Hi, Mark.
Mark: Hey, Jo. How are you doing?
Jo: I’m good. So, how's lockdown where you are? How are things in Canada?
Mark: Lockdown has actually allowed me to discover new types of creativity in myself, where I seem to have prevented myself from writing prose. But I rediscovered the joy of doing parody videos and experimenting with different forms of creativity. So I have the energy inside me to tell story, to want to share and amuse and entertain, and I redirected it into a different output that satisfied that part of my soul that needs to write, and now I'm back writing again. But while I was struggling, it was really good to have that outlet.
Jo: I also struggled at the beginning, and I did a flurry of business activities. I made a couple of online courses, including one on turning what you know into an online course, which was, you know, useful and also…
Mark: Very meta?
Jo: Yes, very meta. And also made some money. So, that was really good because the first instinct is the survival instinct, which is, ‘I just need to make some more money in order to survive this, whatever this is.' And then there's that period of, ‘Okay. This isn't just over quickly, and the zombie apocalypse has not arrived, and now I need to finish that novel.' I did manage to change up my creative routine and to finish Map of the Impossible. But we are talking about artificial intelligence today because we are both enthusiasts but there are clearly positives and challenges. What do you think are some of the positive aspects that A.I. could bring to creatives?
Mark: I think it could help us with some of the processes that may be redundant, or take too much time. If we can leverage those tools for the things that will help us and free us up to do the more human creative stuff, that can be a really good thing. We can get more done. We can create more, we can produce more.
Jo: Yes, I agree that it's going to help us and I think A.I. as a tool is what we need to focus on. In the same way that we do research with the internet, we use Scrivener to write, we use Vellum for formatting, we use the internet and all the wonderful tools like Draft Two Digital and lots of other wonderful companies who help us publish. Without these tools, we would not be able to reach people with our books.
Think about it as similar to the internet. If we wind the clock back to the internet of 1986, or even 1995, we didn't know then what it was going to turn into. Now the internet is this wonderful, amazing, incredible place that we all spend a lot of time on and use to create, and learn, and entertain ourselves. And it's also the cesspool of humanity. So, we can use it either way! And that is the important thing.
Mark: Also, when the ebook came out, there was a lot of fear in the book industry about the ebook killing the print book. But it didn't. It only added to the possibilities of what a book could be. It expanded it and, like vinyl, the print book is still doing well, and also expanding in new formats. So I think that if we approach artificial intelligence with the same optimism, and yet with a bit of caution, we can use it as a tool that benefits us in the long run. I want to be part of the disruption, not be the disrupted.
Jo: Good slogan. You should put that on a t-shirt!
I think being part of the disruption is a really good way to look at it, and also, putting a positive spin on creativity. Creativity often involves some form of destruction, and we may have to destroy some of our own practices in order to move into the new way of doing things.
I think one of the biggest shakeups we need is a copyright law so that machine learning can incorporate modern bodies of creative work and still reward the creator. I think that's probably one of the biggest issues right now. In order to incorporate all the different voices in the world, we're going to need to train the big data algorithms on much more varied data, but again, we need creatives to be rewarded.
I would like to see an overhaul of copyright law for an era of artificial intelligence, so that we get some kind of micropayment for the use of training data, because GPT-3 is not plagiarizing. This is the huge shift, I think, because if GPT-3 is not plagiarizing — and we've seen a lot of examples of plagiarism in the author community — we're going to see a lot more books being produced by these A.I. content farms, where our work is used for training and is not plagiarized. That's why copyright law needs to be overhauled. Do you think the publishing industry is ready for this?
Mark: No. The publishing industry is not. They're playing ostrich, and are continuing to, even during this pandemic. They haven’t even fully embraced print on demand properly for distribution. I think blockchain and those technologies are going to be a critical component of ensuring that copyright can be protected in the future.
Jo: One of the other positives is that many authors don't enjoy the marketing side, and I have found, in particular with Amazon ads, where the author brand is very clear, the auto ads work and make money once they've optimized themselves. In that way, we are already using artificial intelligence and machine learning in order to advertise. That is hopefully a sign of what is to come around discoverability, which is one of the perennial issues for authors.
I would also like to see a more intelligent artificial intelligence that can discern the emotional resonance of a book and recommend other books that offer a similar emotional experience. We are missing that right now.
Mark: I agree and I am really disappointed because it was almost 10 years ago when I saw the first iteration of companies like Booklamp and books like The Bestseller Code. I'm disappointed that we haven't yet been able to leverage those tools, because rather than an Amazon marketing ad, where you're manually contriving stuff, manually manipulating things that may not be reality, this should go to the heart of how that book makes you feel. That could potentially reduce the gamification and the BS that we have to deal with now, where some books are well marketed but are not that good and books that are fantastic may be undiscovered.
Jo: To wrap this up, if people want to surf this change, rather than get drowned in it, what is your number one recommendation for authors?
Mark: Step back and look at artificial intelligence with your business hat on, rather than with your emotional writer hat. Our emotional writer hats can really mess us up and prevent us from understanding and embracing the technology. What do you think?
Jo: I come back to the importance of Kevin Kelly's 1000 True Fans. I don't believe there is a mass-market anymore, and it's going to get more and more and more fragmented, and that's okay. We need to be okay with living in the long tail, and writing what we love and selling books and exciting readers in the little area that we write in. Double down on being human. We can't beat the machine, but we can work with the machine to create something more exciting for our collective future.
Mark: That's beautiful.
Thoughts from some of my Patrons
I released the Voice Double portion early to my Patrons and here is a selection of responses:
Thea: It is obvious that it isn’t a human, but is definitely very recognizable as you – a bit monotone, and with less emotion and laughing. Can you teach it to laugh like you?
Anita: This is still a bit robotic due to the pauses, but I think it is remarkably good. I could see applications of this in the AI reading articles or books written that don't have an audio version readily available. To hear it in the author's own voice would be cool.
Gerry: It was the uniform pace that made it sound mechanical. [Note from Joanna: I re-edited it to vary the pacing so hopefully this is now improved.] I'd probably be OK with this for a business book, but no way would I want it to narrate fiction when the emotional nuance is important. Still, it is astonishing what this enables. I feel as though it's like the early scanned copies that came out as ebooks. The words are there but the reading (listening) experience is diminished.
Kim: That was very impressive. Yes it sounded robotic, but it didn't sound like a robotic robot, it sounded more like when we say to a real person, perhaps someone doing a stage read who is feeling very self-conscious – try and relax, you are sounding robotic. Do you know what I mean? It was very believable as being you and Mark, it was definitely your voices, and there was more inflection than I expected. What it lacked was warmth and the humanity that lurks in a giggle.
Jeff from the Big Gay Fiction Podcast: It's amazing that we only found Descript and saw Overdub a year ago at Podcast Movement. Will and I got our voice AIs done as well. I'm super happy with our voices and the flexibility it gives us. I also used my AI to do voiceover for some training videos I needed to do for my day job. I didn't tell my colleagues I'd done this until after they'd watched and approved the voiceover. They were stunned it was not the real me. It makes recording the VO for these videos much easier in terms of maintaining recording levels and ease of editing script.
Diana: There is still a long way I think until voice doubles can get all the nuances right to trick us, but I think for some areas they don’t even have to. We will know that it’s AI and we will accept that anyway.
[Note from Joanna: This is key. We will know that it's A.I. and we will accept it anyway.]
Transcript of the real Mark and Joanna discussing the process and ramifications
Joanna Penn: Mark. We did our recording.
Mark Leslie Lefebvre: Yeah, that was interesting to see.
Joanna Penn: Honestly, what did you think of your voice?
Mark Leslie Lefebvre: So I think my voice still sounds a little stilted and awkward. And yours sounds really well done. It’s almost as if you did way better training and I’m not sure what that was because the training was the same. Did you do the full 40 minutes?
Joanna Penn: 30 minutes. Let’s explain to everyone listening. So basically we both have Descript.com, which is basically you can download the software, and then they have an Overdub tutorial, which is, you essentially have to read the Wizard of Oz. It tells you that you’ve recorded five minutes, 10 minutes etc.
Mark Leslie Lefebvre: Wait. You got The Wizard of Oz?
Joanna Penn: Yes. What did you get?
Mark Leslie Lefebvre: I got this classic, like, 1800s snowman story about this little snow girl that these kids built and it’s from Quebec. Like it was like an old classic Canadian story. I had to read the whole story and it was fascinating because I’d never heard it before. I have no idea who wrote it. It’s obviously public domain.
Joanna Penn: That’s interesting. So we both got different stories and I actually found the Wizard of Oz slightly difficult, because as you say, it’s definitely old language and it’s not a very good story. There wasn’t actually much dialogue when I was reading it. It’s funny that you say that my voice sounds better. I thought my voice was very good. I thought the most stilted bit was our greetings, but the longer sections were actually pretty good in some parts. That actually sounds just like me because it’s my words as well.
But in doing the training, I probably overacted. So maybe that’s what you did wrong?
Mark Leslie Lefebvre: I did, too. Cause there were voices and I did the little boy’s voice and the little girl’s voice and the mom’s voice and the dad’s voice.
Joanna Penn: Okay. Okay, you did too much then. Cause it’s still meant to be your voice? I didn’t do special voices. I didn’t put my voice up a pitch or down a pitch. I just read it, but with feeling. So I think maybe you should redo your data and don’t do little girls’ and guys’ voices?
Mark Leslie Lefebvre: Just because you don’t like my little boy voice!
Joanna Penn: That’s the thing, it’s training your voice as Mark, not your ability to do different voices.
Mark Leslie Lefebvre: I just changed the inflection a little bit and raised it a tiny bit. But yeah, you’re right. That probably messed it up.
Joanna Penn: Well, it might’ve done, but equally, I mean, just to be clear to people, this was the third iteration for both of us, right? But the difference was that the last time we were in the beta section, and they kind of did it especially for us and used our voices in training sessions. And now it’s open to everyone. Get Descript, I think you can even get it free for 15 days, 14 days or whatever it is, and you can try it out.
And then if you want to keep using the voice and it can only be your voice, just to be clear, we both created in sections and then you edited it together. It’s not like I got your voice and made up your words and put it with mine. That’s not possible yet.
Did you get Liz to listen to it or has anyone else heard yours yet?
Mark Leslie Lefebvre: I was telling her about it last night at dinner, but I haven’t shared it yet. I think some of the things that I thought were weird were when you were saying something and I jokingly said ‘very meta.' I couldn’t figure out, because again, sometimes you have to spell it out so that the computer would read it properly like LaFave.
I didn’t use it there, but in previous iterations, I’ve had to spell it so that it sounds different, which is what it would come out. Like most people. That was interesting. But the previous times I was actually having a conversation with Liz when she was in the other room and I was typing it in and Overdub was responding to her and she thought she was talking to me.
Joanna Penn: Wow. Okay. I got Jonathan to listen to our conversation and he was like, yes, that was really weird. He’s met you and spoken to you and heard you on podcasts and things. So he knows your voice and I think it’s quite stunning. Like I think it’s probably, I would say with my voice, it’s probably 80% now. What do you think you’re working with?
Mark Leslie Lefebvre: Yeah. You’re at 80%. Maybe I’m at 70. Hmm. It’s interesting though.
Joanna Penn: I think if you listen a bit longer, like if you listen to the whole thing and you settle into it, it’s like Jonathan said: ‘Oh, it’s like there was a bad connection on Skype.’
Mark Leslie Lefebvre: Exactly. And we’re all used to that right now.
Joanna Penn: I guess the point is that we did it, and you mentioned a few difficult words there. The trick on the AI thing is, cause I normally just write it capital A, capital I, that there were meant to be dots between: A dot, I dot, or period, or full stop, whatever you’d say in your country. I still can’t get it to say creative. So I normally say ‘hello, creatives.’
Mark Leslie Lefebvre: I was wondering why you altered that because — so we, we recorded our own conversation.
Joanna Penn: Right.
Mark Leslie Lefebvre: And then you transcribed it. Or had a system transcribe it, and then we tweaked it a little bit because there were some awkward things that we had said. Then you fed your lines into Descript and I did the same thing and then I patched them together like that. That was the overall process.
Joanna Penn: Exactly. And as you say, it says creativity, but it doesn’t say creatives. This is a big thing that I say and I can’t use that. So I changed it. This is a bug I picked up before: it can say creativity but not creative. So really interesting. And then also it wouldn’t say ‘mass markets,’ it kept going maass.
Mark Leslie Lefebvre: I was following the script while I was listening to the voices to make sure I could patch it together. Then as I was following, I went, ‘Oh, she tweaked that. Obviously there was a problem with that.’
Joanna Penn: Isn’t that interesting because if we think about moving it forward into the future, both of us have talked about writing for audio and writing for a narrator. What we were doing there is tweaking our language for an A.I. narrator. I also put more punctuation in, so I might have done a dash for a longer breath or a space to separate the text more. So there was a longer pause. Did you do that?
Mark Leslie Lefebvre: I did a little bit of that. You copy and paste or type it in, and then it tells you how many minutes it’s gonna take to do it. And it doesn’t really take that long. And then I listened to it and then I adjusted, right? You’re going to tweak this.
I’m going to put a pause here or I’m going to change the spelling. Even with the ‘very meta,' I tried about six different ways. I think I spelled Metta differently. And then I put an exclamation point. Then I tried it with a question mark. The intonation of my voice just didn’t work. I was almost tempted to do my real voice in there. Sneak it in.
Joanna Penn: No, that’s cheating!
Mark Leslie Lefebvre: I think I changed it to met—a.
Joanna Penn: This is what’s interesting, right? Because I have rewritten my own writing for my own narration. And this felt like, Oh, I’m adjusting the way I see things in order to fit the voice double and the little quirks of it, which is again, really, really interesting.
In fact, when you work with a professional narrator, and I’ve worked with American narrators, I’ve had to ‘adjust’ their pronunciations of British things. So that is not exactly an unusual thing. It’s just that we did it with this Voice Double. It’s funny when I think about voice and I think about altering the way...
Mark Leslie Lefebvre: You ask for things, right. When you’re communicating with Alexa or Google, you can’t just say it in normal human talk, you have to begin with, ‘Hey X,’ (I don’t want to trigger my system) play this or whatever. So we’re already adjusting in other ways with how we interact with AI.
Joanna Penn: This is also interesting because both you and I have talked, even in our AI discussion previously, about how we’re excited about maybe doing audio dramas, and we’ve just given an example of putting two voices together in a conversation.
We’ve created something that’s definitely not audiobook ready or audio drama ready, but it’s still interesting. So what do you think in terms of a voice market and this kind of thing being much more mainstream? What date should we say this will happen?
Mark Leslie Lefebvre: I’m thinking by 2021, only because you and I have seen these technologies skyrocket in the past couple of years, right? From the very first iteration to where we are now is phenomenal.
Joanna Penn: Just to be clear. We’re recording this in the middle of August 2020. So you reckon within six to 12 months. I think in 18 months we’ll probably be there.
Mark Leslie Lefebvre: I’m optimistic about that, but I’m also concerned because you and I obviously want to be in on the ground floor and checking it out in the early days. I’ve been fascinated for years with this, but we want to make sure that there’s some control, right?
Joanna Penn: Yes, so it’s not pure chaos. Absolutely. Both of us have a chance to put these voices out in the wider sense, and neither of us are going to at this point in time. I definitely feel like I want to control how my voice will be used, given how good it is already.
I think it does sound like me. I’m actually going to record something and send it to my mum and say, ‘Mum, this is not me speaking.’ I want her to be aware of the potential. I think this is the reason you and I also get into this: if we don’t engage with the technology, and deny its existence and just say, ‘Oh, that won’t affect us, it’s fine,’ then there’s this rise of deep fakes, which is incredible at this point already. People have to be aware that this stuff can be faked. And I feel like, you know, older people particularly, but let’s face it. We’re older people!
Mark Leslie Lefebvre: Me more than you, of course.
Joanna Penn: I think younger people are aware. Your son probably totally understands about deep fakes and understands that these things can happen, but it may be, you know, the teachers at Liz’s school or some of the authors we meet who are like, ‘no way, AI will never be good enough.’ But as you say, I’m probably more going to say 18 months, and I don’t normally say a longer period, but I’m picking 2022 for when this becomes more mainstream and maybe we have voice markets.
Maybe we have more actual AI narration of more stuff. I mean, obviously that’s available now but it will become much more common.
Mark Leslie Lefebvre: I plan on experimenting with this. I mean, I self-narrated The 7 P’s of Publishing Success, because it’s relatively short. It’s only about 14,000 words. And then I had Jim Kukral’s company do ‘Brian British’ male voice for 99 cents. And I’m thinking, well, why don’t I do a fake Mark for 99 cents as well? So you can have the cheap, fake voices for a good price or you pay the full 6.99 if you want my book.
Here’s something that fascinates me, cause I’m sure you’ve thought of this as well. Okay, using my fake voice, how could someone really get me in trouble? By having a recording of me saying something that’s completely not my character or incriminates me about something, right? Like it could be a personal matter where I leave a message. It’s my voice, leaving a message on Liz’s cell phone and saying, ‘Hey, I just slept with a hooker,’ or whatever the thing is, or ‘I’m leaving you,’ or whatever. It’s likely to really mess up someone’s life.
Joanna Penn: Or someone leaves my voice on your phone saying ‘thanks for last night, Mark!'
Mark Leslie Lefebvre: Exactly. Both Jonathan and Liz get a hold of this and go, wait, what the heck?
Joanna Penn: We live on the opposite sides of the ocean.
Mark Leslie Lefebvre: Yeah. Fortunately, we’re now all separated!
Joanna Penn: This is what I mean by the deep fake and the need to be aware of it. Because obviously this has already happened in the porn industry, lots of actresses, and I’m sure actors have also been put in these situations where they weren’t doing those things. They’ve been faked on video. We’ve seen other fake videos (like Trump, Obama, Boris Johnson etc) and now we can do these fake voices.
But it’s not just famous people anymore. It’s people like us. So. We want people to be aware.
I think that’s the thing with AI. Isn’t it? It’s this double-edged sword, like the internet. It can be amazing or it can be terrible and we need to be aware of both.
Mark Leslie Lefebvre: For sure. So again, I think we’re both cautiously optimistic.
Joanna Penn: Yes. What it comes down to, as my AI voice said, is a trusted brand. Hopefully, you and I have a clear enough brand amongst the people who know us online, that if I heard a recording of you saying something, I’d think, ‘that’s not Mark, that’s not what Mark would say.’
I hope that as a friend, I would contact you and say, ‘Hey, just to let you know,’ just check in a bit. Like, people email me all the time and say, ‘Oh, hey, I’ve seen your book on one of these pirate sites.’ And I’m like, okay, great. Sometimes I do a takedown and sometimes I just ignore it.
But in the same way, I hope that people would email us and say, Oh, hey, I heard your voice on this advert saying this particular software is the best thing ever and I should buy it. Just checking.
Mark Leslie Lefebvre: Do you mean, like, ‘Is that you?’ Or fake Facebook profiles, for example, where I have somebody reach out who’s already a friend, and I like to play with them a bit and ask them something that’s wrong so that they respond in the positive. And I know for sure it’s not them. It’s that kind of thing.
Joanna Penn: Right.
Mark Leslie Lefebvre: Then you check with your friend and say, Hey. Check your account, something might’ve been hacked because they’re using your picture and your likeness to pretend to be you, right. It’s that courtesy we have with our friends and the people that we know and trust.
Joanna Penn: Yes, exactly. So I think, again, the message for everyone listening is to be aware of what’s happening and use these things in a positive way. And also, as we’ve talked about, it’s some kind of licensing and copyright, and let's hope the blockchain speeds up a bit. And that we get things in place that can at least protect this new Wild West.
Because it is like the Wild West right now, and there’s not enough protection around this stuff. We live in interesting times!
Mark Leslie Lefebvre: Here’s what I’m thinking next level. Right? So the AI of the voice, but then the AI of the voice of a writer, which I know you’ve talked about a lot on your podcast is, imagine that there’s enough of our voices as personalities and the things we’ve said on our podcast and in public speaking. Then they use our fake voices and a completely fake conversation between us. It’s like, well, that kind of sounds like what Joanna would say. And it sounds like Joanna, do you know what I mean? Like the double layer? Well, that’s basically what we’ve done.
Joanna Penn: We recorded ourselves chatting and then we got the AI to say that. So the words resonate with our brand.
Mark Leslie Lefebvre: But what I’m saying is imagine that the whole conversation was as if Mark and Jo talked about this. Just like it generates speech based on the JF Penn for fiction, right. In the style of that. That’s what I was wondering. That kind of thing.
Joanna Penn: I absolutely think that with GPT-3, for example, which has come out in recent weeks, that is definitely going to be possible, which is why this all certainly seems quite real and far more real than it has done, even for me and I’ve been thinking about this for years.
I do think that the pandemic is accelerating these technologies at incredible rates, and that’s why we have to be aware. So everyone, balance the good and the bad and let’s try and make the good, the bigger, bigger pot.
Mark Leslie Lefebvre: That’s right because with great power comes great responsibility. I can’t remember who said that but …
Joanna Penn: Haven’t you got a t-shirt with that on it or something?
Mark Leslie Lefebvre: I’m wearing a Spider-Man t-shirt as we’re recording this!
Joanna Penn: You love Spider-Man.
Mark Leslie Lefebvre: I do.
Joanna Penn: Okay, Mark. Well, thank you so much for this. This has been fun.
Mark Leslie Lefebvre: It has been. Thank you, Jo.
We'd love to know your thoughts on this topic, so please leave a comment!
Mark Leslie Lefebvre says
So cool. And so glad we did this. (I loved seeing those pics of us from three different countries. That was neat. Did we get more than that silly on-stage selfie I took during a panel when we saw one another in Vegas?) 😉
Calvin Jim says
Hi Joanna,
This is the first time I’ve commented or even interacted with you. I really liked listening to your show on AI voices. The weird thing is, it was extremely easy to tell both your voices were computerized. Your voice had a computerized echo to it. And Mark is someone I know from Canadian writing conventions and his podcast. His voice immediately sounded artificial because it lacked his normal cadence. Plus, there was some phraseology that they used that didn’t quite fit. Ah well. It is not precisely there yet. It made me wonder whether, if you know someone and how they talk, you could tell the difference almost immediately.
Keep up the great show.
Joanna Penn says
Thanks, Calvin! I’m glad you found it interesting.
Gary Townsend says
Meh. Did not care at all for the AI versions. Both sounded … too bland, too … hehehe … *artificial.* Very monotone.
Alicia says
I really liked that you (and the AI voices) considered both the positive and negative implications of AI voices instead of only mentioning the positive ones. I have watched a few videos discussing exciting new technological and medical advances and possibilities, and the people in the videos, as far as I remember, did not seem to have ever asked themselves what could go wrong with the new technology.
Good job thinking about everything.
Joanna Penn says
Thanks! Like the internet, AI is a true double-edged sword. I want us to be on the side of good!