Opinion
Object-based auditory and visual attention

https://doi.org/10.1016/j.tics.2008.02.003

Theories of visual attention argue that attention operates on perceptual objects, and thus that interactions between object formation and selective attention determine how competing sources interfere with perception. In auditory perception, theories of attention are less mature and no comprehensive framework exists to explain how attention influences perceptual abilities. However, the same principles that govern visual perception can explain many seemingly disparate auditory phenomena. In particular, many recent studies of ‘informational masking’ can be explained by failures of either auditory object formation or auditory object selection. This similarity suggests that the same neural mechanisms control attention and influence perception across different sensory modalities.

Introduction

At a cocktail party, the sounds of clinking glasses and exuberant voices add acoustically before entering your ears. To appreciate your companion's anecdote, you must filter out extraneous sources (see Glossary) and focus attention on her voice. At the same time, the sounds that you tune out are crucial for maintaining awareness of your environment. Indeed, a source of interference (the pompous man on your right) might become the very source you want to understand in the next moment (e.g. when you realize he is relaying a juicy story about your boss). To maneuver successfully in everyday settings, you need to be able to both focus and shift attention as the need arises.

Theories of visual attention explain many striking perceptual phenomena that arise when viewing complex scenes, from change blindness (failure to notice a change in a visual scene because attention is directed to another part of the image) to performance on visual search tasks [1,2]. Although there is much current interest in how central limitations interfere with auditory perception, there is no comprehensive framework to explain our ability to understand sound sources in complex acoustic scenes. Here, I argue that many auditory phenomena, including how we manage to converse at a cocktail party, can be understood by properly extending theories of visual attention. This commonality supports the idea that the same neural processes control visual and auditory attention [3].

Auditory objects

Theories of visual attention argue that observers focus attention on an object in a complex scene [2]. Unfortunately, just as in vision [4], it is difficult to define what constitutes an object in audition. This difficulty arises, in part, because there are few absolute rules governing auditory object formation. Audible sound in a mixture is not always allocated between the objects perceived in a scene, and can contribute either to multiple objects [5,6] or to no object [7]. The state of the

Object formation

In a visual scene, objects form locally based on contiguous geometric structure, such as edges, boundaries and contours [4]. Discrete local patches can be perceptually linked, based on similarity of texture, color and other features, to form whole objects [4].

In a similar way, auditory objects form across different analysis scales. For sound elements with contiguous spectro-temporal structure, formation relies primarily on this local structure [12,13], including common onsets and offsets,

Object-based attention

Object formation directly influences how we perceive and process complex scenes. In all sensory modalities, the normal mode of analyzing a complex scene is to focus on one object while other objects are in the perceptual background [17,18]. In vision, this mode of perceiving is described as a biased competition between perceptual objects [2]. Biased competition takes place automatically and ubiquitously when there are multiple objects in a scene. Which object wins the competition depends both on

Understanding perception of complex scenes

Because attention is object based, competing sources in a complex scene can cause many different forms of perceptual interference, some of which are considered below. An overview of the interactions affecting auditory perception is shown in Figure 1.

Summary

In both vision and audition, we direct top-down attention to select desired objects from a complex scene. Because perceptual objects are the basic units of attention, proper object formation is crucial to this ability. Stimulus structure determines how objects form locally, either in space-time (for visual objects) or time-frequency (for auditory objects). Higher-order perceptual attributes enable both object formation across larger scales and selection of a desired object from a complex scene.

Acknowledgements

Grants from NIDCD, AFOSR, ONR and NSF supported this work. These ideas were developed through discussions with Gin Best, Antje Ihlefeld, Erick Gallun, Chris Mason, Gerald Kidd, Steve Colburn and Nat Durlach.

Glossary

Energetic masking: perceptual interference present in the sensory epithelium.
Informational masking: perceptual interference that cannot be explained by energetic masking.
Object: a perceptual estimate of the content of a discrete physical source.
Salience: the perceptual strength of an input based purely on stimulus attributes.
Similarity: a putative explanation for auditory informational masking when a target and competing sources have similar perceptual features.
Source: a discrete physical entity in the

References (49)

  • C.J. Darwin. Perceiving vowels in the presence of another sound: a quantitative test of the “old-plus-new” heuristic.
  • B.G. Shinn-Cunningham. A sound element gets lost in perceptual competition. Proc. Natl. Acad. Sci. U. S. A. (2007).
  • R. Cusack. Effects of location, frequency region, and time course of selective attention on auditory scene analysis. J. Exp. Psychol. Hum. Percept. Perform. (2004).
  • E.S. Sussman. The role of attention in the formation of auditory streams. Percept. Psychophys. (2007).
  • R.P. Carlyon. Effects of attention and unilateral neglect on auditory stream segregation. J. Exp. Psychol. Hum. Percept. Perform. (2001).
  • A.S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. (1990).
  • A.J. Sach et al. Some characteristics of auditory spatial attention revealed using rhythmic masking release. Percept. Psychophys. (2004).
  • C.J. Darwin et al. Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. J. Acoust. Soc. Am. (2000).
  • J. Duncan. EPS Mid-Career Award 2004: brain mechanisms of attention. Q. J. Exp. Psychol. (Colchester) (2006).
  • S. Shomstein et al. Configural and contextual prioritization in object-based attention. Psychon. Bull. Rev. (2004).
  • S. Yantis. How visual salience wins the battle for awareness. Nat. Neurosci. (2005).
  • E.I. Knudsen. Fundamental components of attention. Annu. Rev. Neurosci. (2007).
  • L. Busse. The spread of attention across modalities and space in a multisensory object. Proc. Natl. Acad. Sci. U. S. A. (2005).
  • S. Shomstein et al. Parietal cortex mediates voluntary control of spatial and nonspatial auditory attention. J. Neurosci. (2006).