The Effects of Audio on Cognitive Load in Immersive Virtual Reality: Selecting a Theoretical Framework

Updated – November 28, 2021 at 8:09 pm

My research topic has changed quite a bit since the start of my applied research project. What began as an inquiry into interactivity in multimedia learning quickly shifted into a full-scale investigation into the effects of audio on learning in immersive virtual reality (IVR). Since this article is about selecting a theoretical framework for my research, I will spare you the details of why I changed direction. That said, a brief overview of my research topic will give some context for my choice of Cognitive Load Theory (CLT) as the theoretical framework underpinning my research.

Research Topic Overview 

IVR simulates “real and imagined worlds” (p. 938) by creating multisensory experiences where learners feel as if they are “present” in their virtual surroundings (Makransky & Petersen, 2021). The more realism the learner perceives in the IVR world, the more absorbed they become in the learning experience, a state commonly referred to as “immersion” (Ghinea et al., 2012, p. xv). High-level immersion, such as that produced by head-mounted IVR technologies like the Oculus Quest 2, is a necessary condition of presence, which is itself a crucial influencer of learning in IVR (Brinkman et al., 2015; Huang et al., 2020). Accordingly, researchers devote considerable resources to understanding the connections between instructional design, immersion, and learning in IVR. One major consensus is that high-level immersion comes at a price: increased extraneous load (Albus et al., 2021; Makransky et al., 2019; Mayer, 2021; Richards & Taylor, 2015). Research suggests that the very elements used to create awe-inspiring IVR worlds distract learners from their primary learning tasks (Kern & Ellermeier, 2020; Mayer, 2021; Rogers et al., 2018). Because IVR experiences comprise visual, auditory, and haptic stimuli, investigating which multisensory design strategies best foster learning is a worthwhile pursuit. Since I have a background in sound design, I decided to investigate the impact of audio on cognitive load in an attempt to shed light on the extraneous load issue in IVR. Thus, I have selected Cognitive Load Theory (CLT) to underpin my research.

Cognitive Load Theory

CLT (Sweller et al., 2011, 2019) is an instructional design theory that focuses on human cognition to explain the learning process and is considered an appropriate framework for evaluating which educational technologies are effective and how they should be used (Sweller, 2020, p. 1). CLT embodies a “human cognitive architecture” (p. 2) that addresses how humans acquire, process, and use primary and secondary information (Sweller, 2020, p. 2). The theory suggests that human working memory is limited in capacity and that there are “cognitive effects” (p. 9) that can enhance or hinder the transfer of newly acquired information from working memory into usable long-term memory stores (Sweller, 2020). Most importantly for my research, CLT identifies instructional design considerations that influence a learner’s cognitive capacity; specifically, it provides a map for reducing extraneous load. For instance, CLT can be used to understand the effects of learning element interaction, such as how multisensory elements (e.g., audiovisual stimuli) in an IVR world affect the learner’s cognitive capacity to receive, process, and activate learning material. However, as fitting as this may seem for my research, applying the CLT cognitive architecture to the complexities of IVR, such as the effects of immersion and presence on learning, may prove difficult. After reviewing the literature, I now realize that applying the CLT framework to my research context will require an additional model to make substantial connections between audio and IVR learning. Cue Makransky and Petersen’s (2021) Cognitive Affective Model of Immersive Learning (CAMIL).

The Cognitive Affective Model of Immersive Learning

In short, CAMIL “synthesizes existing immersive educational research to describe the process of learning in IVR” (Makransky & Petersen, 2021, p. 1). The model argues that methods interact with media: instructional design (ID) methods that take advantage of the unique affordances of head-mounted display IVR (HMD-IVR) technology, such as presence and agency, produce positive learning outcomes. For example, ID methods that foster presence in IVR generate better learning outcomes than ID methods that do not (Makransky & Petersen, 2021). Further, the model describes how IVR affordances influence several affective and cognitive factors that affect immersive learning, such as intrinsic motivation, embodiment, and, most importantly, cognitive load. All of the factors described in CAMIL are associated with immersion and presence, which are influenced by multisensory stimuli, including audio (Brinkman et al., 2015; Kern & Ellermeier, 2020; Rogers et al., 2018), so CAMIL will likely prove a valuable tool for connecting sound design and cognitive load in IVR. Many of the conclusions I have drawn so far from the literature on game audio in IVR are also supported by CAMIL, which provides much-needed validation that my research is on the right track.

In conclusion, I invite readers to weigh in on my research direction and choice of theoretical framework. For example, do you feel CLT is a good choice for investigating the effects of sound design on cognitive load in IVR, or is there a more fundamental theory I missed that might be more beneficial to my investigation? Also, since there are so many great learning models, do you know of other models, like CAMIL, that may assist my research? I thank you in advance for your thoughts and considerations.



Albus, P., Vogt, A., & Seufert, T. (2021). Signaling in virtual reality influences learning outcome and cognitive load. Computers and Education, 166.

Brinkman, W. P., Hoekstra, A. R. D., & van Egmond, R. (2015). The Effect of 3D Audio and Other Audio Techniques on Virtual Reality Experience. Studies in Health Technology and Informatics, 219, 44–48.

Ghinea, G., Andres, F., & Gulliver, S. (2012). Multiple sensorial media advances and applications: New developments in MulSeMedia. Reference and Research Book News, 26(5).

Huang, C. L., Luo, Y. F., Yang, S. C., Lu, C. M., & Chen, A. S. (2020). Influence of Students’ Learning Style, Sense of Presence, and Cognitive Load on Learning Outcomes in an Immersive Virtual Reality Learning Environment. Journal of Educational Computing Research, 58(3), 596–615.

Kern, A. C., & Ellermeier, W. (2020). Audio in VR: Effects of a Soundscape and Movement-Triggered Step Sounds on Presence. Frontiers in Robotics and AI, 7.

Makransky, G., Terkildsen, T. S., & Mayer, R. E. (2019). Adding immersive virtual reality to a science lab simulation causes more presence but less learning. Learning and Instruction, 60, 225–236.

Makransky, G., & Petersen, G. B. (2021). The Cognitive Affective Model of Immersive Learning (CAMIL): A theoretical research-based model of learning in immersive virtual reality. Educational Psychology Review, 33(3).

Mayer, R. (2021). Multimedia learning (3rd ed.). Cambridge University Press.

Richards, D., & Taylor, M. (2015). A Comparison of learning gains when using a 2D simulation tool versus a 3D virtual world: An experiment to find the right representation involving the Marginal Value Theorem. Computers and Education, 86, 157–171.

Rogers, K., Ribeiro, G., Wehbe, R. R., Weber, M., & Nacke, L. E. (2018). Vanishing importance: Studying immersive effects of game audio perception on player experiences in virtual reality. Conference on Human Factors in Computing Systems.

Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. New York: Springer.

Sweller, J. (2020). Cognitive load theory and educational technology. Educational Technology Research and Development, 68(1), 1–16.

Sweller, J., van Merriënboer, J., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31, 261–292.

4 thoughts on “The Effects of Audio on Cognitive Load in Immersive Virtual Reality: Selecting a Theoretical Framework”

  1. Interesting, Jonathan. Thanks for the good read. I think you’ve made a good case for the adoption of Cognitive Load Theory. To me, it sounds like the obvious choice for your research topic.

    Something that jumped out at me while I was reading your post is that there appears to be a contradiction between what your initial research has shown and the findings associated with CAMIL. You mention in your research topic overview that deep immersion comes at the cost of extraneous cognitive load, which can distract from learning. You also point out that presence is influential in a user’s immersion level… and finally… that the authors of CAMIL argued that instructional design which includes presence results in greater learning outcomes. But if immersion, which is influenced by presence, increases cognitive load, would we not expect to see a decrease in learning outcomes as a result of its inclusion? Have I misinterpreted this, or are the results simply counterintuitive? This is my first exposure to CAMIL, so please forgive my ignorant perspective.

    1. Hi Christopher,

      Thanks for your thoughts on this. You make an excellent observation, and no, you most certainly have not misinterpreted CAMIL or my post. There is a contradiction between reports of high extraneous load in high-level IVR and the learning benefits suggested in CAMIL.

      Based on a strong meta-synthesis, CAMIL suggests that when methods complement the affordances of IVR (specifically agency and presence), better learning is achieved than when methods do not complement IVR affordances. CAMIL-based learning benefits include increased intrinsic motivation, enjoyment, and self-efficacy, to name a few. Although all of these things can indirectly lead to improved learning (e.g., improved transfer, recall, etc.), the extraneous elements of IVR, when present in the design, can still distract learners from their primary task and limit their cognitive capacity to process learning material. For example, HMD-IVR consists of multi-directional stimuli that the learner can observe and usually interact with. Along with this interaction typically comes a high level of agency (e.g., the learner has to first search for interactive objects and then figure out their function). If the process of searching is deemed non-essential to the learning, then it is viewed as an excessive extraneous load on the learner; CAMIL’s authors refer to such elements as “seductive details” (p. 949). When IVR applications include a high volume of these seductive details, which could take the form of audiovisual, cognitive, or haptic stimuli, for example, the learner’s cognitive capacity is more likely to be negatively affected.

      So back to your question… even if the IVR design features great methods that complement the media affordances of HMD-IVR technology, other design malpractices can increase the cognitive load for the learner, and hence the contradiction.

  2. Excellent choice for your TF, Jonathan, and you’ve done a great job describing cognitive load theory and its connection to your work; and Christopher’s comments are extremely well thought through and helpful. As per our chat today, there may be aspects of cognitive load theory that make sense to explore further in the lit review, but there may not be as well. You’ve got a good handle on the TF, and if you keep in mind the idea of ….. given that cognitive load theory says X, and we know that’s true, my research steps forward from this with the focus on IVR, and in particular, a focus on audio in IVR and learning. I really enjoyed reading your piece regarding CAMIL, and it sounds like this may be worth having in the lit review. Great work on this.
