Effects of Spectral Degradation on Attentional Modulation of Cortical Auditory Responses to Continuous Speech

  • Research Article
Journal of the Association for Research in Otolaryngology

Abstract

This study investigates the effect of spectral degradation on cortical speech encoding in complex auditory scenes. Young normal-hearing listeners were simultaneously presented with two speech streams and were instructed to attend to only one of them. The speech mixtures were subjected to noise-channel vocoding to preserve the temporal envelope and degrade the spectral information of speech. Each subject was tested with five spectral resolution conditions (unprocessed speech, 64-, 32-, 16-, and 8-channel vocoder conditions) and two target-to-masker ratio (TMR) conditions (3 and 0 dB). Ongoing electroencephalographic (EEG) responses and speech comprehension were measured in each spectral and TMR condition for each subject. Neural tracking of each speech stream was characterized by cross-correlating the EEG responses with the envelope of each of the simultaneous speech streams at different time lags. Results showed that spectral degradation and TMR both significantly influenced how top-down attention modulated the EEG responses to the attended and unattended speech. That is, the EEG responses to the attended and unattended speech streams differed more for the higher (unprocessed, 64 ch, and 32 ch) than the lower (16 and 8 ch) spectral resolution conditions, as well as for the higher (3 dB) than the lower TMR (0 dB) condition. The magnitude of differential neural modulation responses to the attended and unattended speech streams significantly correlated with speech comprehension scores. These results suggest that severe spectral degradation and low TMR hinder speech stream segregation, making it difficult to employ top-down attention to differentially process different speech streams.
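The lagged cross-correlation analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the signals are synthetic white noise rather than real speech envelopes or EEG, a single channel is assumed, lags are counted in samples, and the names `zscore`, `lagged_xcorr`, and `true_delay` are hypothetical.

```python
import math
import random


def zscore(x):
    """Return x scaled to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]


def lagged_xcorr(envelope, eeg, max_lag):
    """Correlate eeg[t + lag] with envelope[t] for lag = 0..max_lag samples.

    Both signals are z-scored once over their full length, so each value is
    an approximate correlation coefficient at that lag.
    """
    env = zscore(envelope)
    resp = zscore(eeg)
    n = len(env)
    out = []
    for lag in range(max_lag + 1):
        m = n - lag  # number of overlapping samples at this lag
        out.append(sum(env[t] * resp[t + lag] for t in range(m)) / m)
    return out


# Synthetic demonstration (assumption, not real data): the "EEG" is a delayed,
# noisy copy of the attended envelope; the unattended envelope is unrelated.
random.seed(0)
n_samples, true_delay = 2000, 12
attended = [random.gauss(0, 1) for _ in range(n_samples)]
unattended = [random.gauss(0, 1) for _ in range(n_samples)]
eeg = [
    (attended[t - true_delay] if t >= true_delay else 0.0) + random.gauss(0, 1)
    for t in range(n_samples)
]

xc_att = lagged_xcorr(attended, eeg, max_lag=40)
xc_unatt = lagged_xcorr(unattended, eeg, max_lag=40)

# Lag (in samples) at which the attended-stream correlation is largest.
peak_lag = max(range(len(xc_att)), key=lambda k: abs(xc_att[k]))
```

In this toy setup the attended-stream function peaks at the simulated delay while the unattended-stream function stays near zero; a difference of this kind between the two functions is what the study quantifies as attentional modulation.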

Notes

  1. The amount of training subjects received for each spectral condition should be sufficient. Our previous work on vocoder speech perception established that acclimatization to vocoder speech in quiet was similar across young normal-hearing listeners (e.g., Kong et al. 2015), with performance reaching a plateau after 30 short sentences. Once trained, these listeners retained their ability to understand vocoder speech in quiet beyond the training session and into later days. For vocoder speech perception in a competing-talker condition, we provided 4× the amount of training used in the quiet condition, balancing sufficient training time against the length of the test session (i.e., about 2 h per session). Training totaled 12 min per spectral condition per TMR (4 min during the training session, 4 min during the first test session of the test condition, and 4 min during the second test session of the same test condition).

  2. As discussed by Horton et al. (2013), the positive and negative peaks in the cross-correlation functions are related to the P1-N1-P2 complex in the traditional EEG response to short discrete stimuli. Here, we use XR (for cross-correlation) to label the cross-correlation peaks, distinguishing them from the traditional EEG P1-N1-P2 components.

  3. This indicates that the neural data are highly reproducible. Although different EEG recording equipment (G.tec in Kong et al. 2014; BrainVision in the current study) and different groups of subjects were used, the pattern of results for the unprocessed 0 dB TMR condition in the current study is very similar to that for the same test condition reported in Kong et al. (2014). The two sets of data are highly correlated for both the attended (r₃₀₁ = 0.9461, p < 0.001) and unattended (r₃₀₁ = 0.9176, p < 0.001) speech streams.
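The between-study comparison in Note 3 rests on a Pearson correlation between two response functions. A minimal sketch of that computation follows; the two curves are invented for illustration and are not the published data, and the names `pearson_r`, `study_a`, and `study_b` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented curves (assumption): a damped oscillation standing in for one
# study's cross-correlation function, and a rescaled, slightly perturbed
# copy standing in for the other study's function.
study_a = [math.sin(2 * math.pi * k / 50) * math.exp(-k / 100) for k in range(100)]
study_b = [0.8 * v + 0.05 * math.cos(k) for k, v in enumerate(study_a)]

r = pearson_r(study_a, study_b)
```

Because the coefficient is insensitive to overall scale, two functions with the same shape but different amplitudes (for example, from different recording equipment) can still correlate highly.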

References

  • Best V, Gallun FJ, Carlile S, Shinn-Cunningham BG (2007) Binaural interference and auditory grouping. J Acoust Soc Am 121:420–432

  • Bregman AS (1990) Auditory scene analysis: the perceptual organization of sound. MIT Press, Cambridge, MA

  • Buschman TJ, Miller EK (2007) Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science 315:1860–1862

  • Culling JF, Darwin CJ (1993) The role of timbre in the segregation of simultaneous voices with intersecting F0 contours. Percept Psychophys 54:303–309

  • Culling JF, Summerfield Q (1995) Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J Acoust Soc Am 98:785–797

  • Darwin CJ, Carlyon RP (1995) Auditory grouping. In: Moore BCJ (ed) Hearing. Academic Press, Orlando, FL, pp 387–424

  • Darwin CJ, Hukin RW (2000a) Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. J Acoust Soc Am 107:970–977

  • Darwin CJ, Hukin RW (2000b) Effects of reverberation on spatial, prosodic, and vocal-tract size cues to selective attention. J Acoust Soc Am 108:335–342

  • de Cheveigne A, Simon JZ (2008) Denoising based on spatial filtering. J Neurosci Methods 171:331–339

  • Ding N, Simon JZ (2012a) Emergence of neural encoding auditory objects while listening to competing speakers. Proc Natl Acad Sci U S A 109:11854–11859

  • Ding N, Simon JZ (2012b) Neural coding of continuous speech in auditory cortex during monaural and dichotic listening. J Neurophysiol 107:78–89

  • Ding N, Simon JZ (2014) Cortical entrainment to continuous speech: functional roles and interpretations. Front Hum Neurosci 8:311. doi:10.3389/fnhum.2014.00311

  • Ding N, Chatterjee M, Simon JZ (2014) Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. Neuroimage 88C:41–46

  • Elhilali M, Fritz JB, Chi TS, Shamma SA (2007) Auditory cortical receptive fields: stable entities with plastic abilities. J Neurosci 27:10372–10382

  • Fischer R, Milfont TL (2010) Standardization in psychological research. Int J Psychol Res 3:88–96

  • Fritz JB, Elhilali M, David SV, Shamma SA (2007) Auditory attention – focusing the searchlight on sound. Curr Opin Neurobiol 17:437–455

  • Greenwood D (1990) A cochlear frequency-position function for several species – 29 years later. J Acoust Soc Am 87:2592–2605

  • Jasper HH (1958) Report of the committee on methods of clinical examination in electroencephalography. Electroencephalogr Clin Neurophysiol 10:370–375

  • Horton C, Srinivasan R, D'Zmura M (2014) Envelope responses in single-trial EEG indicate attended speaker in a ‘cocktail party’. J Neural Eng 11:046015. doi:10.1088/1741-2560/11/4/046015

  • Horton C, D'Zmura M, Srinivasan R (2013) Suppression of competing speech through entrainment of cortical oscillations. J Neurophysiol 109:3082–3093

  • Kerlin JR, Shahin AJ, Miller LM (2010) Attentional gain control of ongoing cortical speech representation in a “cocktail party”. J Neurosci 30:620–628

  • Kidd G Jr, Arbogast TL, Mason CR, Gallun FJ (2005) The advantage of knowing where to listen. J Acoust Soc Am 118:3804–3815

  • Kong Y-Y, Zeng F-G (2006) Temporal and spectral cues in Mandarin tone recognition. J Acoust Soc Am 120:2830–2840

  • Kong Y-Y, Mullangi A, Ding N (2014) Differential modulation of auditory responses to attended and unattended speech in different listening conditions. Hear Res 316:73–81

  • Kong Y-Y, Donaldson G, Somarowthu A (2015) Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation. J Acoust Soc Am 137:2846–2857

  • Knol MJ, Pestman WR, Grobbee DE (2011) The (mis)use of overlap of confidence intervals to assess effect modification. Eur J Epidemiol 26:253–254

  • Lalor EC, Power AJ, Reilly RB, Foxe JJ (2009) Resolving precise temporal processing properties of the auditory system using continuous stimuli. J Neurophysiol 102:349–359

  • Massida Z, Belin P, James C, Rouger J, Fraysse B, Barone P, Deguine O (2011) Voice discrimination in cochlear-implanted deaf subjects. Hear Res 275:120–129

  • Mesgarani N, Chang EF (2012) Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485:233–236

  • Moore BCJ (2007) Cochlear hearing loss: physiological, psychological and technical issues. John Wiley & Sons Ltd., West Sussex, UK

  • Oldfield RC (1971) The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9:97–113

  • O’Sullivan JA, Power AJ, Mesgarani N, Rajaram S, Foxe JJ, Shinn-Cunningham BG, Slaney M, Shamma SA, Lalor EC (2014) Attentional selection in a cocktail party environment can be decoded from single-trial EEG. Cereb Cortex. doi:10.1093/cercor/bht355

  • Oxenham AJ (2008) Pitch perception and auditory stream segregation: implications for hearing loss and cochlear implants. Trends Amplif 12:316–331

  • Peelle JE, Gross J, Davis MH (2013) Phase-locking responses to speech in human auditory cortex are enhanced during comprehension. Cereb Cortex 23:1378–1387

  • Power AJ, Foxe JJ, Forde EJ, Reilly RB, Lalor EC (2012) At what time is the cocktail party? A late locus of selective attention to natural speech. Eur J Neurosci 35:1497–1503

  • Qin MK, Oxenham AJ (2005) Effects of envelope-vocoder processing on F0 discrimination and concurrent-vowel identification. Ear Hear 26:451–460

  • Rimmele JM, Zion Golumbic E, Schroger E, Poeppel D (2015) The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene. Cortex 68:144–154

  • Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci 32:9–18

  • Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn Sci 12:106–113

  • Shamma SA, Elhilali M, Micheyl C (2011) Temporal coherence and attention in auditory scene analysis. Trends Neurosci 34:114–123

  • Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270:303–304

  • Shinn-Cunningham BG (2008) Object-based auditory and visual attention. Trends Cogn Sci 12:182–186

  • Shinn-Cunningham BG, Best V (2008) Selective attention in normal and impaired hearing. Trends Amplif 12:283–299

  • Stickney GS, Zeng FG, Litovsky R, Assmann P (2004) Cochlear implant speech recognition with speech maskers. J Acoust Soc Am 116:1081–1091

  • Zion-Golumbic EM, Ding N, Bickel S, Lakatos P, Schevon CA, McKhann GM, Goodman RR, Emerson R, Mehta AD, Simon JZ, Poeppel D, Schroeder CE (2013) Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron 77:980–991

Acknowledgments

We thank the reviewers and the associate editor for their helpful comments. This work is supported by NIH-NIDCD R01-DC012300 to Y.-Y.K.

Conflict of Interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Ying-Yee Kong.

Cite this article

Kong, YY., Somarowthu, A. & Ding, N. Effects of Spectral Degradation on Attentional Modulation of Cortical Auditory Responses to Continuous Speech. JARO 16, 783–796 (2015). https://doi.org/10.1007/s10162-015-0540-x

  • DOI: https://doi.org/10.1007/s10162-015-0540-x
