Author’s note: Although it wasn’t a requirement of LRNT524 to publish this discussion as a blog post, I rather enjoyed researching and writing about this topic. I look forward to your feedback if you happen upon this post! AES
I first encountered AI-generated voice-over technology in early 2020, when a new leader joined my learning team with a mandate to modernize our corporate learning and development team and make our processes as efficient and results-driven as possible. I was already creating digital content for our Learning Management System using eLearning authoring tools such as Articulate Storyline and screen-capture technology like TechSmith’s Snagit, but the Trainer-turned-Instructional-Designer in me had always self-recorded voice-over audio to accompany this learning content. My years of experience facilitating classroom discussions had proven to me that tone of voice, intonation, and expression went a long way in engaging learners, so recording voice-over audio for these modules seemed like a small investment of time.
When my new leader joined our team, she challenged this time investment, asking us instead to use the AI-driven text-to-voice generators built into some of our tools (namely Articulate 360 when developing eLearning modules and Vyond for animations). I complied with the request, but found that I would often spend the same amount of time adjusting the voice-over text to ensure certain words (such as branded company or product names) were pronounced correctly. Furthermore, learner surveys always included comments on the robotic nature of the voices, asking if we could revert to recorded voice-over. Until recently I’ve been steadfast in my opinion that AI-generated text-to-voice was far inferior to recorded voice-over audio, and when the previously mentioned leader left the organization I was quick to switch back to recorded scripts.
Like other instructional designers, I’ve noticed the quality of AI-generated text-to-voice tools improving over the last few years, but I know my own prejudices have made me cautious about using them except in the most time-crunched projects, where the speed of these tools makes them a valuable option. I was still concerned that learners would be more focused on determining whether the voice-over was computer-generated or created by a person than they would be on the actual content of the learning. This was even reinforced in one of our previous courses of this program (E. Childs, LRNT523 discussion board post, September 20, 2023) when our instructor drew our attention to Mayer’s 12 Principles of Multimedia Learning, namely principle ten, which proposes “people learn better when real presenters rather than machines make voice overs” (Mayer, as cited in Mayer’s 12 Principles of Multimedia Learning, 2023). While the quality of some early (and even recent!) text-to-voice generators is acceptable, one can often tell they are computer-generated, which can leave one feeling unsettled, so this seemed to settle in my mind that human-created voice-over was far superior.
Researching this topic further for this blog post has, however, caused me to question my bias towards human-generated voice-over audio in learning. A study performed by Kit et al. (2022) gauged students’ perceptions of AI-generated voice-over audio in explainer videos used at Taylor’s University. This study found that students did not have a negative impression of AI voices in educational videos when compared to those with human voices, as long as the AI-generated voice was “sufficiently human-like” (p. 89), meaning there is a lot of promise for AI-generated voice-over as the technology continues to be refined. Specifically, this study referred to the “Uncanny Valley (UV) effect”, or the extent to which an AI-generated voice leaves the audience unsettled, searching for signs of whether or not it is human-generated (p. 80). Another recent study, performed at a university in China with students learning English language terminology, demonstrated equal levels of learning retention between learning videos with human voices and those with AI-generated voice-over, but only when the voice-over (and, in the case of this study, AI-generated images of the instructor) were sophisticated enough to be a reasonable likeness of a human (Pi et al., 2022, p. 9). While both these studies did note there was an opportunity for further investigation due to their small sample sizes, I think this is evidence enough for me to continue pondering my own bias and maybe give AI-generated voice-over audio another chance.
While we don’t have enough evidence to determine whether or not AI-generated content (such as voice-over) is a net benefit to learning outcomes, I do think we can say it has the potential to save educators and instructional designers a lot of development time, which can mean they have more time to include even more media-rich resources in their courses. Anecdotally, a script the size of this discussion post might take me two or three takes to record; with time to edit, it would be about a 20-minute task. With a text-to-voice generator, the audio would be created in less than 30 seconds. With advancements in technology meaning the audio quality of these tools is only going to increase in the near future, AI-generated voice-over technology could provide a meaningful solution for creating media-rich video learning content quickly.
References
Mayer’s 12 Principles of Multimedia Learning. (2023, July 18). Digital Learning Institute. https://www.digitallearninginstitute.com/blog/mayers-principles-multimedia-learning
Kit, L. W., Yuin-Y, C., Zulkifli, B., & Nie, K. S. (2022). Perception of university students towards the use of artificial intelligence-generated voice in explainer videos. 77–89. https://fslmjournals.taylors.edu.my/wp-content/uploads/SEARCH/SEARCH-2023-Special-Issue-SEARCH-Conf2022/SEARCH-2023-P6-15-SEARCHConf2022.pdf
Pi, Z., Deng, L., Wang, X., Guo, P., Xu, T., & Zhou, Y. (2022). The influences of a virtual instructor’s voice and appearance on learning from video lectures. Journal of Computer Assisted Learning. https://onlinelibrary.wiley.com/doi/abs/10.1111/jcal.12704