Self-View As An Attention Drain

Chapter 2

Why the Brain Cannot Ignore Its Own Face

After three hours of video conferencing, Maria shuts her laptop and feels as though she has spent the entire day on her feet. She is a university lecturer, well accustomed to six-hour days in physical classrooms. In-person teaching is tiring too, but in a predictable way: by evening, your feet ache and your voice is hoarse. Online, Maria feels completely different. After just two classes, she feels as if she has been drained by Dementors from Harry Potter.

"And then I sat in silence for forty minutes during a department meeting, just listening—and afterward, I was as exhausted as if I had just taken a final exam," she says. "It makes no sense. I wasn't even doing anything."

In reality, it makes perfect sense. Maria simply doesn't suspect that the cause is hiding in plain sight: in the small window in the corner of her screen. The entire time Maria was "just listening," her brain was simultaneously processing another visual stream—her own face. This happened entirely automatically, without any conscious desire or intention on her part. Had Maria attended the same meeting in a physical room, she would have been engaged in one task: listening. On Zoom, she was engaged in two: listening and observing herself. The second task is invisible, but highly energy-intensive.

In the previous chapter, we saw that mirrors alter behavior reliably, reproducibly, and most importantly, in a matter of minutes. The question now is: what happens when the mirror operates for hours? What exactly is the brain doing when your face appears on the screen? And why can't you "just not look"?

Neurophysiology, eye-tracking, and electroencephalography offer detailed answers.

Three Channels, One Superfluous

Let us return to the central idea of this book, outlined in the introduction.

For 300,000 years, human communication relied on two channels. The first is content: words, arguments, the meaning of what is said. The second is the interlocutor's nonverbal behavior: facial expressions, gestures, intonation, posture, and distance. All the systems of social cognition our brains possess, from mirror neurons to theory of mind, evolved to process exactly these two streams of information simultaneously and in parallel. And this capacity was perfectly sufficient for Homo sapiens.

Video conferencing added a third communication stream: your own face. As science writers like to say, Mother Nature never intended this.

The scale of the problem becomes clear when we look at how attention works. Back in the mid-20th century, Alexander Luria, one of the founders of modern neuropsychology, demonstrated that voluntary attention is not a boundless resource, but a highly energy-intensive function dependent on the prefrontal cortex. When performing complex tasks—especially when a person must simultaneously absorb information, suppress distracting stimuli, and regulate their own behavior—these resources deplete rapidly.

In 2006, Annie Lang proposed the Limited Capacity Model, positing that attention is not an infinitely elastic resource, but rather a strictly limited budget ^[1]. Imagine it as one hundred arbitrary units. If the content of the conversation requires, say, forty units, and processing the interlocutor's nonverbals takes another thirty, thirty are left for everything else. This remainder is enough for many things: keeping track of the time, adjusting your posture, taking notes, or pouring a glass of water. The system works—it has been fine-tuned by natural selection over hundreds of thousands of years. Those who lacked the attention to pour a glass of water during a tribal meeting likely didn't survive to reproductive age to pass on their genes.

Now add the third channel—your own face on the screen. It demands resources. Exactly how much depends on the individual (this is the focus of Part II of the book), but it is never zero. And it's not just the amount that matters, but the priority. The third channel doesn't just subtract units from the first two; it commandeers them with absolute priority, because for the human brain, there is no visual stimulus more important than one's own face.
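The budget arithmetic can be made concrete with a short sketch. It is only an illustration of the thought experiment, not a model taken from Lang's paper: the function name `allocate` is invented here, the figures of forty and thirty units come from the example above, and the twenty units for the self-view and the sixty units for a "demanding moment" are hypothetical numbers chosen for readability.

```python
TOTAL_BUDGET = 100  # arbitrary units, as in the chapter's thought experiment


def allocate(demands, budget=TOTAL_BUDGET):
    """Serve channels in priority order; return (granted units, spare units)."""
    granted = {}
    for channel, demand in demands:
        # A channel gets what it asks for, or whatever is still left.
        granted[channel] = min(demand, budget)
        budget -= granted[channel]
    return granted, budget


# In person: two channels, thirty units to spare.
in_person, spare_in_person = allocate([("content", 40), ("nonverbal", 30)])

# On video: the self-view is listed first because the text gives it top priority.
# Its demand of 20 units is a hypothetical figure.
on_video, spare_on_video = allocate(
    [("self_view", 20), ("content", 40), ("nonverbal", 30)]
)

# A demanding moment (content needs 60 units, also hypothetical):
# the lowest-priority channel is the one that gets shortchanged.
hard_moment, spare_hard = allocate(
    [("self_view", 20), ("content", 60), ("nonverbal", 30)]
)

print(spare_in_person)  # 30
print(spare_on_video)   # 10
print(hard_moment)      # {'self_view': 20, 'content': 60, 'nonverbal': 20}
```

The ordering of the list is the whole point of the sketch: because the self-view is served first, any shortfall lands on the channels that carry the actual conversation.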

The issue goes beyond the mere quantity of hijacked resources. This "parasitic" third channel fundamentally changes the nature of the task. When processing the content and the interlocutor's nonverbals, the brain is solving a normal, external problem: understanding the other person. But when processing its own face, it switches to an internal problem: evaluating oneself. These two modes—external and internal, "I am listening to you" and "I am looking at myself"—compete for the exact same neural resources and are highly incompatible. As a result, the person attempts to do both, but does neither fully. They listen to their colleague with half an ear while fleetingly (or sometimes intently) evaluating themselves. Caught between these two tasks, both the conversation's content and the person's peace of mind slip through the cracks.

Faces: A Primate’s Priority

Among all the visual objects the human brain can recognize, faces occupy a special tier. A brain region known as the fusiform face area (FFA) specializes in processing them. It responds to faces faster and more intensely than to any other category of objects, whether houses, cars, letters, or landscapes ^[2]. This is not a learned cultural skill, but the direct result of millions of years of evolutionary pressure. For social primates (and we are no exception), the ability to instantly recognize faces, read their emotions, and, above all, distinguish "friend" from "foe" has always been a matter of survival. Those who recognized faces and emotions too slowly lost the evolutionary race. This superpower is a tool for survival and adaptation, honed to automaticity by natural selection.

But even within the hierarchy of faces, there is an absolute peak. At that peak is our own face.

In 2010, Polish neurophysiologists Paweł Tacikowski and Anna Nowicka used EEG to record event-related potentials, the electrical responses of the cerebral cortex to presented stimuli. Participants were shown three categories of faces: unfamiliar, familiar, and their own. Their own face triggered the most pronounced and rapid electrical response ^[3]. This reaction is automatic; it occurs before a person even has time to consciously register what they are looking at. The brain tags its own face as a self-relevant stimulus and assigns it the highest priority. This is the same reason you can hear your own name across a noisy room: the well-documented "cocktail party effect," only translated into the visual modality.

Your own face on a video conference screen is exactly this kind of stimulus. It does not require a conscious decision to "look at myself." It hijacks attention automatically, at a level preceding conscious thought. This mechanism is so fundamental that it traces back to Ivan Pavlov. Over a century ago, he described the "orienting reflex" (or the "What is it?" reflex)—the body's automatic response to any new or biologically significant stimulus. This reflex instantly heightens sensory sensitivity and directs attention exactly where crucial information might be hiding. For humans engaged in communication, their own face is one of the most powerful triggers for this reflex. The brain simply cannot ignore it. And no amount of willpower can completely override this mechanism; it can only brake it for a short time, expending a portion of your already limited cognitive budget with every attempt.

Where They Actually Look

People tend to overestimate their control over their own attention. If you ask a video call participant what they were looking at, most will say, "At the person speaking. Well, sometimes at the presentation." Eye-tracking technology—which captures gaze direction with pinpoint accuracy—paints a different picture.

In 2024, Stephanie Ariss and Christopher Fairbairn at the University of Illinois asked: "Everyone says they look at their conversation partner. But do they?" They equipped participants with eye-trackers and recorded their gaze trajectories during real video calls. The results were unequivocal: participants systematically returned to their own self-view window, making serial visual fixations far more frequently than they later reported ^[4]. The gap between where people think they are looking and where they are actually looking proved robust and reproducible. Even while preparing a presentation on this book for a scientific conference, we easily replicated this experiment using open-source eye-tracking software.

Even more striking data came from researchers at Dartmouth College that same year. They discovered a paradox that, at first glance, defies common sense: participants who experienced the most discomfort from the self-view looked at it more often, not less. Those who reported unpleasant feelings when seeing their own face fixated on it longer than anyone else ^[5].

This is certainly not done for masochistic pleasure. It is a manifestation of a well-known clinical psychology mechanism: anxiety often triggers compensatory hyper-control. A person begins to obsessively focus on the perceived source of a threat to ensure that "everything is okay" and nothing has spiraled out of control. In the moment, this brings fleeting relief, but in the long run, it only amplifies the anxiety and sustains a vicious cycle. The daily practice of any psychotherapist provides ample examples of this. A person terrified of spiders cannot stop looking at a spider they spotted in the room. A person embarrassed by a stain on their shirt will constantly dart their eyes down toward it. A person experiencing a panic attack cannot stop scanning their bodily sensations, constantly checking their pulse or breathing to ensure nothing catastrophic is happening. A person anxious about their appearance cannot stop looking at their own face.

The self-view acts as a spiral of discomfort: the more discomfort you feel, the more you look; the more you look, the more discomfort you feel. This is a textbook example of a positive feedback loop. In clinical psychology, such self-sustaining mechanisms are called vicious cycles, and we will dissect three of them in the next chapter. But even at the level of simple eye-tracking, the reality is clear: the self-view is not a neutral UI element that you can simply choose to ignore. For a significant portion of users, this stimulus becomes a black hole that swallows attention whole.
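For readers who like to see a loop run, here is a toy simulation of the spiral. It is a sketch under loud assumptions, not a clinical model: the function `simulate_spiral`, the 0-to-1 scales, and both gain coefficients are invented for illustration, and none of the numbers come from the studies cited in this chapter.

```python
def simulate_spiral(discomfort=0.1, look_gain=0.5, discomfort_gain=0.4, steps=10):
    """Toy positive feedback loop on a 0..1 scale (all parameters hypothetical)."""
    history = []
    for _ in range(steps):
        # More discomfort leads to more checking of the self-view.
        looking = min(1.0, look_gain * discomfort)
        # More checking feeds back into more discomfort.
        discomfort = min(1.0, discomfort + discomfort_gain * looking)
        history.append(round(discomfort, 3))
    return history


spiral = simulate_spiral()
# With look_gain=0 the person never checks, so the loop is broken.
no_checking = simulate_spiral(look_gain=0.0)

print(spiral[:3])       # [0.12, 0.144, 0.173]
print(no_checking[:3])  # [0.1, 0.1, 0.1]
```

The second run is the instructive one: when the link from discomfort to looking is cut, discomfort stays where it started. In this toy picture, hiding the self-view corresponds to setting that link to zero.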

Instrumental Evidence

Subjective complaints about Zoom fatigue have echoed since the first months of the pandemic. But subjective data is just that—subjective. A person, especially one locked at home with relatives during a pandemic, might label themselves "exhausted" for dozens of reasons: boredom, irritation, poor sleep, lack of fresh air. Gernot Müller-Putz, head of the Institute of Neural Engineering at Graz University of Technology, approached the question differently: can we see Zoom fatigue on an EEG? Can we record it not from participants' words, but objectively, through the electrical activity of the cortex?

Müller-Putz and his colleagues invited thirty-five students to attend the exact same seminar in two formats: in-person and online. During both, participants wore EEG caps, and their electrocardiograms (ECG) were simultaneously recorded. The content, the instructor, and the duration were identical. The only variable was the format.

The results were stark and became apparent much sooner than the researchers expected. After just fifteen minutes of the online meeting, the EEG registered markers of cognitive fatigue that were nowhere to be found in the in-person format. Simultaneously, heart rate variability (HRV) dropped—a metric reflecting the tone of the parasympathetic nervous system, which is responsible for rest and recovery. When the parasympathetic system is suppressed, the body shifts into a state of mobilization—the classic "fight or flight" response, but in a chronic, smoldering form. The video format didn't just "seem" more tiring; it measurably depleted the cerebral cortex and shifted the autonomic balance toward stress ^[6].

Fifteen minutes is merely a quarter of a typical meeting, class, or consultation. The brain entered a trajectory of exhaustion before participants even had time to realize they were tired. The internal monologue of an online participant, as quoted by Müller-Putz, sounds like this: "Is my shirt okay? Does my background look normal? How is my face?" None of these questions arise when those same people sit around a physical conference table. But on a video call, they arrive in full force.

The Graz experiment captured the general cognitive cost of video conferencing. But what specific role does the self-view—that small window with your own face—play in this cost? Another study, conducted on the other side of Europe, addressed this exact question.

Habituation Does Not Occur

In 2024, Jin Xu, Eoin Whelan, and colleagues from the University of Galway (Ireland) conducted a compelling experiment ^[7]. Thirty-two volunteers participated in a series of live video conferences—not artificial lab tasks, but real conversations with other people. The only variable manipulated was the self-view, which was alternately toggled on and off for the participants. Everything else remained constant: the interlocutors, the topic, the duration. Participants wore EEG electrodes tracking brain activity across five frequency bands: delta, theta, lower alpha, upper alpha, and beta.

Whelan and Xu were primarily interested in the alpha rhythm (oscillations in the 8–13 Hz range). The alpha rhythm is one of the most reliable markers in neurophysiology. It is linked to two simultaneous processes: cortical inhibition (the brain "braking" its processing) and mental fatigue. When the cortex is overloaded, alpha activity spikes, and the nervous system shifts into an energy-saving mode.

The result: when the self-view was enabled, alpha activity was statistically significantly higher than when it was hidden. The difference was consistent and predictable.

But the most crucial discovery wasn't the spike in alpha activity itself—it was its trajectory. The alpha rhythm did not decline over time when the self-view was on. For the entire twenty minutes of observation, it remained stably elevated. Habituation or adaptation (which typically occurs with many other continuous stimuli) never took place. The brain did not "learn" to ignore its own face, nor did it reallocate resources. The twentieth minute of self-view burdened the cortex exactly as much as the first minute.

This is a detail that deserves special attention. Many everyday irritants act as acute stressors: they provoke a reaction, the nervous system adapts, and the load decreases. You stop noticing the hum of an air conditioner after a minute. The smell of a new room "disappears" for us within five minutes. The self-view is not that kind of stressor. It operates as a permanent background load and lasts as long as the video call lasts—one, two, or three hours, if not turned off.

In practical terms, this means that if your workday consists of four one-hour video meetings with the self-view on, your brain remains in a state of elevated cognitive load for all four hours. The available data suggest that you cannot simply "train yourself" to get used to this stimulus, at least over the timeframes researchers have been able to measure.

One more significant detail: Xu and Whelan's EEG data found no gender differences in neurophysiological load. Male and female brains reacted to the self-view identically. The differences consistently captured in self-reported surveys—where women report higher Zoom fatigue and greater dissatisfaction with their appearance—are differences not in the neural load, but in its interpretation. The brains of both sexes are equally overloaded, but it seems social norms steer women toward explaining this exhaustion through their appearance, while men simply attribute it to being "tired." We will return to this strange effect in the next chapter.

Why You Can't "Just Not Look"

The most common advice given to people who notice themselves fixating on the self-view is: "Well, just don't look at it." The advice seems entirely reasonable from a common-sense perspective, but it fundamentally doesn't work (much like other hollow psychological advice, such as "Just don't be sad" or "Stop overthinking").

There are three reasons why, and each is sufficient on its own.

First is the priority of the self-relevant stimulus, which we discussed earlier. As Tacikowski and Nowicka’s data showed, your own face is processed automatically and with the highest priority. Suppressing this automatic reaction requires the active intervention of the prefrontal cortex—the exact same resource needed to do your job, listen, and make decisions. Every act of suppression withdraws cognitive units from the same budget needed to understand your colleague. You aren't "saving" energy by trying not to look at yourself through sheer willpower; you are spending energy on the effort of suppression itself.

Second is peripheral vision. Even when you consciously direct your gaze at the speaker, the self-view window remains in your peripheral vision. Evolutionarily, peripheral vision is highly attuned to detecting movement; it is what saved our ancestors' lives by spotting a predator at the edge of their visual field. (And it still saves our lives today, for example, while driving.) Your own moving face is a powerful stimulus constantly competing for the resources of central attention. Every nod, every turn of your head is registered by your periphery, and every single time, your brain must make a micro-decision: switch focus or suppress. This process is unconscious, which is exactly why it is so taxing.

Third is the self-referential network. The brain houses a specific network of regions that activates whenever processing information related to the self: the medial prefrontal cortex (mPFC), the posterior cingulate cortex (PCC), and the insula ^[8]. This network forms the foundation of self-awareness; it lights up when you see your face, hear your name, or think about yourself. The self-view on the screen acts as its constant activator. "Not looking" means suppressing not just the direction of your gaze, but the spontaneous activation of an entire neural network. This is possible—briefly. But the longer the call, the higher the likelihood that the suppression will fail, and your eyes will dart right back to your reflection.

The Hidden Cost of Switching

Let’s assume a person uses heroic effort to keep their eyes on the speaker. Let’s even assume they succeed most of the time. There is another hidden cost that is easily overlooked: the switch cost.

Cognitive science established long ago that every time attention shifts from one object to another, it takes time and consumes resources ^[9]. A single switch takes anywhere from a few dozen to a few hundred milliseconds. By itself, this amount is tiny—we don't even feel it. But on a video call with self-view enabled, these switches occur dozens, sometimes hundreds of times an hour. Speaker → self-view → speaker → presentation slide → self-view → another participant → self-view → chat window. Every cycle incurs a micro-expense. But micro-expenses compound. Crucially, each switch isn't just a tax on transit; it involves a micro-loss of context. You return your gaze to the speaker, but for a fraction of a second, you’ve lost the thread of what they were saying. These micro-losses are imperceptible individually, but together they create a highly recognizable sensation: "I felt like I was listening, but for some reason, I don't remember anything."

There is a useful everyday analogy. If an app on your phone briefly activates the screen once a minute, each individual episode uses a negligible fraction of the battery. But by the end of the day, the battery is dead—not because of one massive drain, but because of a thousand tiny ones. The switch cost of the self-view operates on the exact same principle: by the end of a one-hour meeting, the total cognitive deficit accumulated strictly from visual context switching can be equivalent to several minutes of focused mental work, entirely wasted.
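A back-of-the-envelope calculation shows how these micro-expenses add up. The ranges for switches per hour and milliseconds per switch follow the rough figures given above ("dozens to hundreds" of switches, "a few dozen to a few hundred" milliseconds each). The extra half second of lost context per switch is a hypothetical figure added here for illustration, and the function name `switch_cost_seconds` is likewise invented.

```python
def switch_cost_seconds(switches_per_hour, ms_per_switch, context_loss_ms=0, hours=1.0):
    """Total time lost to attention switches, in seconds.

    context_loss_ms is a hypothetical allowance for picking the thread
    back up after each switch; it is not a measured value.
    """
    per_switch_ms = ms_per_switch + context_loss_ms
    return switches_per_hour * hours * per_switch_ms / 1000.0


# Low end of the ranges: 50 switches an hour at 50 ms each.
low = switch_cost_seconds(50, 50)

# High end: 300 switches an hour at 300 ms each.
high = switch_cost_seconds(300, 300)

# High end plus an assumed 500 ms of lost context per switch.
high_with_context = switch_cost_seconds(300, 300, context_loss_ms=500)

print(low)                # 2.5 seconds
print(high)               # 90.0 seconds
print(high_with_context)  # 240.0 seconds, i.e. four minutes
```

Under these assumptions, the transit time alone ranges from a couple of seconds to a minute and a half per hour. It is the lost context on top of each switch that pushes the total into the range of several minutes.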

The Gallery of Mirrors

There is one more aspect of video conferencing that demands separate attention. On most platforms, in addition to the "active speaker" layout, there is a "gallery view"—a grid displaying all participants simultaneously. In a corporate environment, this could mean five, ten, twenty-five, or more windows. Your own face is one of them, embedded in the mosaic right alongside the others.

A scenario where a person sees themselves in a lineup with dozens of other faces of roughly the same size, on a single flat plane, all at once, has never existed in the natural world. No social context in 300,000 years has presented a stimulus like this. In real life, you do not see yourself sitting next to your colleagues; you see them, and you experience yourself from the inside, through interoception and proprioception. In the "wild," you do not see how you look. Gallery view shatters this asymmetry: you become just one of many rectangles, each of which can be directly compared to your own.

This is the ultimate breeding ground for upward social comparison—the automatic, poorly controlled tendency to measure oneself against those who subjectively appear to look better ^[10]. Leon Festinger outlined this mechanism back in 1954, long before screens existed: people continuously evaluate themselves through comparison with others. This isn't a conscious choice, but a foundational property of social cognition. Gallery view feeds this mechanism an unprecedented volume of material. Dozens of faces at once, each a potential object of comparison, situated literally adjacent to yours. Whose skin looks better? Who has better lighting? Who looks more put-together? These comparisons happen automatically in the background, further draining the cognitive budget we discussed at the beginning of the chapter.

An Evolutionary Vulnerability

Everything described in this chapter—the absolute priority of the self-relevant stimulus, the automatic hijacking of attention, the absence of habituation, the switch cost, and the gallery view effect—is the result of a normal brain functioning perfectly normally in abnormal conditions. A person staring at themselves on a video call is not demonstrating a bad habit, narcissism, vanity, or a lack of self-discipline—none of the things they are so often accused of. They are demonstrating the entirely expected reaction of an evolutionarily shaped nervous system to a stimulus it was never designed to process in the background.

The problem lies in the environment, not the user. The self-view is enabled by default across all major platforms and messengers: Zoom, Microsoft Teams, Google Meet, Telegram, WhatsApp, FaceTime. It is turned on, and it stays on unless the user takes deliberate action to hide it. Most users likely don't even know that disabling it is an option. Many others, even if they do know, hesitate to turn it off out of a fear of losing control over how they appear. (We will explore exactly why this happens in Part II, in the chapters on "The Controller" and "The Performer.")

Ultimately, a design decision implemented purely for technical convenience is inflicting a cognitive load on users that is objectively measurable by neurophysiological instruments. It would be one thing to make such claims based solely on subjective surveys (though many major psychological findings rely exactly on those); it is quite another when we look at an instrumentally recorded spike in alpha rhythms that refuses to drop for the entirety of a call.

Let us return to Maria. She now has a far better explanation for her exhaustion than "you're just not used to it yet." For three straight hours, her brain was processing a high-priority stimulus that it can neither adapt to nor ignore. The cognitive resources meant for decoding the conversation and reading her colleagues' nonverbal cues were bleeding into a third channel—one that simply does not exist in the evolutionary blueprint of human communication.

Attention hijacking is only the first layer of the problem. In the next chapter, we will see how the hijacking described here morphs into something even deeper: a shift in the very way a person is present in a conversation. From the subject of communication to its object. From the one who speaks to the one who observes themselves speaking. And we will examine how this shift triggers vicious cycles that become self-sustaining.

References

[1] Lang, A. (2006). Using the Limited Capacity Model of Motivated Mediated Message Processing to Design Effective Cancer Communication Messages. Journal of Communication, 56(s1), S7–S24.

[2] Kanwisher, N., McDermott, J., & Chun, M. M. (1997). The fusiform face area: a module in human extrastriate cortex specialized for face perception. Journal of Neuroscience, 17(11), 4302–4311.

[3] Tacikowski, P., & Nowicka, A. (2010). Allocation of attention to self-name and self-face: An ERP study. Biological Psychology, 84(2), 318–324.

[4] Ariss, S., & Fairbairn, C. (2024). Eye-tracking during videoconference interactions: Self-view fixation and gaze patterns. University of Illinois.

[5] Ratan, R. et al. (2022). Self-view and public self-consciousness in video meetings. Wayne State University. (Data on the paradox "discomfort → higher fixation" is also corroborated by Dartmouth College research, 2024).

[6] Müller-Putz, G. R. et al. (2025). Neurophysiological markers of cognitive fatigue in videoconferencing vs. face-to-face meetings: An EEG and ECG study. Graz University of Technology.

[7] Whelan, E. et al. (2024). Self-view in video-conferencing and its role in Zoom fatigue: An EEG study. Behaviour & Information Technology. PubMed: 38574294.

[8] Northoff, G. et al. (2006). Self-referential processing in our brain – A meta-analysis of imaging studies on the self. NeuroImage, 31(1), 440–457.

[9] Monsell, S. (2003). Task switching. Trends in Cognitive Sciences, 7(3), 134–140.

[10] Festinger, L. (1954). A theory of social comparison processes. Human Relations, 7(2), 117–140.