Imagine entering a party. You grab a drink and join your friends talking in a circle. Around you, similar circles have formed, and the air is filled with chatter and music blaring from the stereos. In this seeming maelstrom of sound, you quite easily and naturally attend to Mary as she describes the fascinating intricacies of her latest paper detailing the new amendment to the Finnish tax law. Even though you are enraptured by the impromptu dissertation, your attention begins to wander and you start sampling the auditory scene, all the while continuing to nod and make occasional eye contact with Mary. Suddenly, to your relief, you hear your old friend’s signature laugh across the room, and as Mary pauses for a sip, you smoothly fade out of the circle with a smile and a nod, leaving those poor fellows simmering as she takes them on a detour through the differences among the Baltic countries in equity tax law.
The Cocktail Party Problem refers to people’s capacity to “tune in” to particular aspects of the auditory scene. Theoretically speaking, the difficulty of Auditory Scene Analysis (ASA) lies in the fact that we have access to only one “jumbled up” pressure wave that reaches the ear at any given moment. The question is how the auditory system (and the brain more broadly) separates this single event into the different auditory streams that we perceive. According to Al Bregman, a pioneer in the field, there are two aspects to our capacity to distinguish and attend to separate sound stimuli: “bottom-up” and “top-down” processes. As I understand it, bottom-up processes describe the “primitive”, basic aspects of the auditory system and of sound that make the separation possible at all. For example, the pressure wave that reaches the ear is composed of various frequencies, on which the cochlea seems to perform an analog spectrum analysis. Furthermore, frequency components are grouped together if they begin at the same time. It seems that in natural settings such sound groupings have an internal coherence to which our brains can attune. Top-down processes, on the other hand, relate to conscious attention, which relies on these basic mechanisms but can’t really be reduced to them. In this blog post, I wish to consider attention in the context of an auditory scene through a fascinating account of the brain provided by Iain McGilchrist in his book “The Master and His Emissary”.
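As a brief aside on the two bottom-up cues mentioned above, here is a toy Python sketch of my own. It is not a model of the cochlea or of Bregman's theory: it only illustrates that a single summed waveform can be decomposed into its frequency components, and that components can then be grouped by a shared onset time. The function names, the test signal, and the 20 ms tolerance are all assumptions made up for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; returns normalized magnitudes for bins 0..N/2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc) / n)
    return mags


def prominent_bins(mags, threshold=0.1):
    """Indices of frequency bins whose magnitude exceeds an (arbitrary) threshold."""
    return [k for k, m in enumerate(mags) if m > threshold]


def group_by_onset(components, tolerance=0.02):
    """Group (frequency_hz, onset_s) pairs whose onsets fall within
    `tolerance` seconds of the first onset in the current group."""
    groups = []
    for freq, onset in sorted(components, key=lambda c: c[1]):
        if groups and onset - groups[-1][0][1] <= tolerance:
            groups[-1].append((freq, onset))
        else:
            groups.append([(freq, onset)])
    return groups


# One "jumbled up" waveform: two sinusoids summed sample by sample.
N = 64
mixture = [
    math.sin(2 * math.pi * 5 * t / N) + 0.5 * math.sin(2 * math.pi * 12 * t / N)
    for t in range(N)
]
peaks = prominent_bins(dft_magnitudes(mixture))

# Hypothetical components from two sources that start half a second apart.
components = [(220, 0.0), (300, 0.5), (440, 0.005), (600, 0.505), (660, 0.01)]
groups = group_by_onset(components)
```

With this made-up input, `peaks` recovers the two frequency bins that went into the mixture, and `groups` separates the components that start together near 0 s from those that start together near 0.5 s. Real auditory grouping relies on many more cues (harmonicity, spatial location, continuity), so this is only the simplest possible picture.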
The book is about how the two hemispheres of the brain enable different kinds of attention. In his view, the world demands of the animal (lateralization of the brain is found throughout the animal kingdom; indeed, it predates us by half a billion years) two fundamentally different modes of attention: one attending to the whole scene vigilantly, and the other detail-oriented and focussed. For McGilchrist, the left hemisphere’s job is to focus on the details and on what you already know, while the right is concerned with the whole and with novelty. Both attentional modes are required: they work in tandem, and each is occasionally brought to bear more exclusively, depending on the circumstances. Indeed, the corpus callosum, the thick nerve bundle connecting the two hemispheres, not only communicates information between them but also lets each actively inhibit the other, almost as if saying “I got this”. But due to their differences (you could even say “personalities”), the hemispheres have rather different views of the world. McGilchrist goes on to describe the uneasy balance between the hemispheres and to consider its effects on culture and society, but such speculations need not concern us here.
Most mental functions are served by both hemispheres of the brain. Thus we can assume that during an ASA situation the required functions are being served by both hemispheres, such that there is constant switching between modes of attention while at the same time both hemispheres are processing information in their typical way. But notice that we experience an integrated sensory scene: sounds, smells and sights are all blended into a single but heterogeneous whole. Thus there must exist, in addition to specific processing modules, a global function of attention that is, as it were, casting a spotlight on some aspects but not others. This shifting of attention is precisely what seems to be happening during the party scene described above: we are narrowly focussing on Mary, but sometimes shifting over to a global form of attention, searching and sampling the whole auditory scene. For McGilchrist, the right hemisphere grounds the entire experiential field as a whole and directs the narrow attentional beam of the left hemisphere towards whatever is deemed in need of a higher-resolution view. One level down, certain aspects of the scene are being processed by both hemispheres simultaneously. For example, language is mainly understood by the left hemisphere, but intonation and emotion in prosody are processed by the right. Aspects of music such as melody, timbre, tone and pitch are, in most people, preferentially processed by the right hemisphere, but quite interestingly this applies mostly to non-professional musicians. Professionals process music more evenly in both hemispheres, possibly because learning music theory builds a language-like framework for the sounds, which is preferentially processed by the left hemisphere.
There are many more fascinating details and aspects we could consider from the hemispheric perspective, but space is running out. If anyone knows of studies that address ASA and hemispheric activation, I’d love to hear about them.
A final thought comes to mind. It may be that phenomena such as ASA are difficult mostly because we apply a left-hemispheric mode of thinking to frame the questions. If we truly understood what enables the right hemisphere to process wholes and recognize gestalts, for example, we might have far more powerful techniques for solving problems in fields such as AI and machine learning.
For a nice animated overview of McGilchrist’s idea see below.