Abstract
This study investigates the effect of spectral degradation on cortical speech encoding in complex auditory scenes. Young normal-hearing listeners were simultaneously presented with two speech streams and were instructed to attend to only one of them. The speech mixtures were subjected to noise-channel vocoding to preserve the temporal envelope and degrade the spectral information of speech. Each subject was tested with five spectral resolution conditions (unprocessed speech, 64-, 32-, 16-, and 8-channel vocoder conditions) and two target-to-masker ratio (TMR) conditions (3 and 0 dB). Ongoing electroencephalographic (EEG) responses and speech comprehension were measured in each spectral and TMR condition for each subject. Neural tracking of each speech stream was characterized by cross-correlating the EEG responses with the envelope of each of the simultaneous speech streams at different time lags. Results showed that spectral degradation and TMR both significantly influenced how top-down attention modulated the EEG responses to the attended and unattended speech. That is, the EEG responses to the attended and unattended speech streams differed more for the higher (unprocessed, 64 ch, and 32 ch) than the lower (16 and 8 ch) spectral resolution conditions, as well as for the higher (3 dB) than the lower TMR (0 dB) condition. The magnitude of differential neural modulation responses to the attended and unattended speech streams significantly correlated with speech comprehension scores. These results suggest that severe spectral degradation and low TMR hinder speech stream segregation, making it difficult to employ top-down attention to differentially process different speech streams.
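The envelope-tracking analysis described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the sampling rate, lag range, Pearson normalization, synthetic signals, and the function name `lagged_xcorr` are all assumptions made for illustration. The sketch correlates a speech envelope with a response at a series of non-negative lags and shows that a more strongly represented (attended) stream yields a larger correlation peak than a weakly represented (unattended) one.

```python
import numpy as np

def lagged_xcorr(eeg, envelope, fs, max_lag_s=0.5):
    """Pearson correlation between envelope(t) and eeg(t + lag), lag >= 0.

    Hypothetical helper: returns the lags in seconds and one correlation
    coefficient per lag.
    """
    eeg = np.asarray(eeg, dtype=float)
    envelope = np.asarray(envelope, dtype=float)
    max_lag = int(round(max_lag_s * fs))
    n = len(envelope) - max_lag          # samples available at every lag
    env = envelope[:n]
    lags = np.arange(max_lag + 1)
    r = np.empty(len(lags))
    for i, lag in enumerate(lags):
        # Shift the response earlier by `lag` samples and correlate.
        r[i] = np.corrcoef(env, eeg[lag:lag + n])[0, 1]
    return lags / fs, r

# Synthetic demonstration (assumed values, not data from the study).
rng = np.random.default_rng(0)
fs = 100                                  # Hz, assumed sampling rate
n_samples = 60 * fs                       # one minute of signal
attended = rng.standard_normal(n_samples)    # stand-in for an envelope
unattended = rng.standard_normal(n_samples)  # stand-in for an envelope
delay = int(0.15 * fs)                    # assumed 150 ms response latency

# Simulated response: delayed copies of both envelopes plus noise,
# with the attended stream weighted more heavily.
eeg = 0.1 * rng.standard_normal(n_samples)
eeg[delay:] += 1.0 * attended[:-delay] + 0.3 * unattended[:-delay]

lag_s, r_att = lagged_xcorr(eeg, attended, fs)
_, r_unatt = lagged_xcorr(eeg, unattended, fs)
peak_lag_att = lag_s[np.argmax(r_att)]
print(f"attended peak lag: {peak_lag_att:.2f} s")
print(f"peak r, attended vs unattended: {r_att.max():.2f} vs {r_unatt.max():.2f}")
```

In this toy setting the difference between the attended and unattended peak correlations plays the role of the differential neural modulation discussed in the abstract. Real EEG would also need filtering, artifact rejection, and channel selection, none of which is shown here.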
Notes
The amount of training that subjects received for each spectral condition should have been sufficient. Our previous work on vocoder speech perception established that acclimatization to vocoder speech in quiet was similar across young normal-hearing listeners (e.g., Kong et al. 2015), with performance reaching a plateau after 30 short sentences. Once trained, these listeners retained their ability to understand vocoder speech in quiet beyond the training session and into later days. For vocoder speech perception in a competing-talker condition, we provided four times the amount of training used for the quiet condition, balancing sufficient training time against the length of the test session (about 2 h per session). Training totaled 12 min per spectral condition per TMR: 4 min during the training session, 4 min during the first test session of the test condition, and 4 min during the second test session of the same test condition.
As discussed by Horton et al. (2013), the positive and negative peaks in the cross-correlation functions are related to the P1-N1-P2 complex in the traditional EEG response to short discrete stimuli. Here, we use the label XR (for cross-correlation) for the cross-correlation peaks to distinguish them from the traditional EEG P1-N1-P2 components.
This indicates that the neural data are highly reproducible. Although the two studies used different EEG recording equipment (G.tec in Kong et al. 2014; BrainVision in the current study) and different groups of subjects, the pattern of results for the unprocessed 0 dB TMR condition in the current study is very similar to that for the same test condition in Kong et al. (2014). The two data sets are highly correlated for both the attended (r₃₀₁ = 0.9461, p < 0.001) and the unattended (r₃₀₁ = 0.9176, p < 0.001) speech streams.
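As a hedged illustration of the reproducibility check in this note, the sketch below computes a Pearson correlation between two response functions sampled at the same time lags. The two curves are synthetic stand-ins, not data from either study, and the helper name `pearson_r`, the lag grid, and the noise levels are assumptions.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# Synthetic stand-ins for two cross-correlation functions measured with
# different equipment: same underlying shape, different gain and noise.
lags = np.linspace(-0.1, 0.5, 61)         # seconds, assumed lag grid
rng = np.random.default_rng(1)
shape = np.sin(2 * np.pi * 4 * lags) * np.exp(-lags**2 / 0.05)
study_a = shape + 0.05 * rng.standard_normal(lags.size)
study_b = 0.8 * shape + 0.05 * rng.standard_normal(lags.size)

r = pearson_r(study_a, study_b)
print(f"r = {r:.3f}")
```

Because the Pearson coefficient is insensitive to overall gain, a high value here indicates that the two curves share the same shape across lags, even if the absolute response amplitudes differ between recording systems.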
References
Best V, Gallun FJ, Carlile S, Shinn-Cunningham BG (2007) Binaural interference and auditory grouping. J Acoust Soc Am 121:420–432
Bregman AS (1990) Auditory scene analysis: the perceptual organization of sound. MIT Press, Cambridge, MA
Buschman TJ, Miller EK (2007) Top-down versus bottom-up control of attention in the prefrontal and posterior parietal cortices. Science 315:1860–1862
Culling JF, Darwin CJ (1993) The role of timbre in the segregation of simultaneous voices with intersecting F0 contours. Percept Psychophys 54:303–309
Culling JF, Summerfield Q (1995) Perceptual separation of concurrent speech sounds: absence of across-frequency grouping by common interaural delay. J Acoust Soc Am 98:785–797
Darwin CJ, Carlyon RP (1995) Auditory grouping. In: Moore BCJ (ed) Hearing. Academic Press, Orlando, FL, pp 387–424
Darwin CJ, Hukin RW (2000a) Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. J Acoust Soc Am 107:970–977
Darwin CJ, Hukin RW (2000b) Effects of reverberation on spatial, prosodic, and vocal-tract size cues to selective attention. J Acoust Soc Am 108:335–342
de Cheveigne A, Simon JZ (2008) Denoising based on spatial filtering. J Neurosci Methods 171:331–339
Ding N, Simon JZ (2012a) Emergence of neural encoding auditory objects while listening to competing speakers. Proc Natl Acad Sci U S A 109:11854–11859
Ding N, Simon JZ (2012b) Neural coding of continuous speech in auditory cortex during monaural and dichotic listening. J Neurophysiol 107:78–89
Ding N, Simon JZ (2014) Cortical entrainment to continuous speech: functional roles and interpretations. Front Hum Neurosci 8:311. doi:10.3389/fnhum.2014.00311
Ding N, Chatterjee M, Simon JZ (2014) Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure. Neuroimage 88C:41–46
Elhilali M, Fritz JB, Chi TS, Shamma SA (2007) Auditory cortical receptive fields: stable entities with plastic abilities. J Neurosci 27:10372–10382
Fischer R, Milfont TL (2010) Standardization in psychological research. Int J Psychol Res 3:88–96
Fritz JB, Elhilali M, David SV, Shamma SA (2007) Auditory attention – focusing the searchlight on sound. Curr Opin Neurobiol 17:437–455
Greenwood D (1990) A cochlear frequency-position function for several species – 29 years later. J Acoust Soc Am 87:2592–2605
Jasper HH (1958) Report of the committee on methods of clinical examination in electroencephalography. Electroencephalogr Clin Neurophysiol 10:370–375
Horton C, Srinivasan R, D'Zmura M (2014) Envelope responses in single-trial EEG indicate attended speaker in a ‘cocktail party’. J Neural Eng 11:046015. doi:10.1088/1741-2560/11/4/046015
Horton C, D'Zmura M, Srinivasan R (2013) Suppression of competing speech through entrainment of cortical oscillations. J Neurophysiol 109:3082–3093
Kerlin JR, Shahin AJ, Miller LM (2010) Attentional gain control of ongoing cortical speech representation in a “cocktail party”. J Neurosci 30:620–628
Kidd G Jr, Arbogast TL, Mason CR, Gallun FJ (2005) The advantage of knowing where to listen. J Acoust Soc Am 118:3804–3815
Kong Y-Y, Zeng F-G (2006) Temporal and spectral cues in Mandarin tone recognition. J Acoust Soc Am 120:2830–2840
Kong Y-Y, Mullangi A, Ding N (2014) Differential modulation of auditory responses to attended and unattended speech in different listening conditions. Hear Res 316:73–81
Kong Y-Y, Donaldson G, Somarowthu A (2015) Effects of contextual cues on speech recognition in simulated electric-acoustic stimulation. J Acoust Soc Am 137:2846–2857
Knol MJ, Pestman WR, Grobbee DE (2011) The (mis)use of overlap of confidence intervals to assess effect modification. Eur J Epidemiol 26:253–254
Lalor EC, Power AJ, Reilly RB, Foxe JJ (2009) Resolving precise temporal processing properties of the auditory system using continuous stimuli. J Neurophysiol 102:349–359
Massida Z, Belin P, James C, Rouger J, Fraysse B, Barone P, Deguine O (2011) Voice discrimination in cochlear-implanted deaf subjects. Hear Res 275:120–129
Mesgarani N, Chang EF (2012) Selective cortical representation of attended speaker in multi-talker speech perception. Nature 485:233–236
Moore BCJ (2007) Cochlear hearing loss: physiological, psychological and technical issues. John Wiley & Sons Ltd., West Sussex, UK
Oldfield RC (1971) The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9:97–113
O’Sullivan JA, Power AJ, Mesgarani N, Rajaram S, Foxe JJ, Shinn-Cunningham BG, Slaney M, Shamma SA, Lalor EC (2014) Attentional selection in a cocktail party environment can be decoded from single-trial EEG. Cereb Cortex. doi:10.1093/cercor/bht355
Oxenham AJ (2008) Pitch perception and auditory stream segregation: implications for hearing loss and cochlear implants. Trends Amplif 12:316–331
Peelle JE, Gross J, Davis MH (2013) Phase-locking responses to speech in human auditory cortex are enhanced during comprehension. Cereb Cortex 23:1378–1387
Power AJ, Foxe JJ, Forde EJ, Reilly RB, Lalor EC (2012) At what time is the cocktail party? A late locus of selective attention to natural speech. Eur J Neurosci 35:1497–1503
Qin MK, Oxenham AJ (2005) Effects of envelope-vocoder processing on F0 discrimination and concurrent-vowel identification. Ear Hear 26:451–460
Rimmele JM, Zion Golumbic E, Schroger E, Poeppel D (2015) The effects of selective attention and speech acoustics on neural speech-tracking in a multi-talker scene. Cortex 68:144–154
Schroeder CE, Lakatos P (2009) Low-frequency neuronal oscillations as instruments of sensory selection. Trends Neurosci 32:9–18
Schroeder CE, Lakatos P, Kajikawa Y, Partan S, Puce A (2008) Neuronal oscillations and visual amplification of speech. Trends Cogn Sci 12:106–113
Shamma SA, Elhilali M, Micheyl C (2011) Temporal coherence and attention in auditory scene analysis. Trends Neurosci 34:114–123
Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M (1995) Speech recognition with primarily temporal cues. Science 270:303–304
Shinn-Cunningham BG (2008) Object-based auditory and visual attention. Trends Cogn Sci 12:182–186
Shinn-Cunningham BG, Best V (2008) Selective attention in normal and impaired hearing. Trends Amplif 12:283–299
Stickney GS, Zeng FG, Litovsky R, Assmann P (2004) Cochlear implant speech recognition with speech maskers. J Acoust Soc Am 116:1081–1091
Zion-Golumbic EM, Ding N, Bickel S, Lakatos P, Schevon CA, Mckhann GM, Goodman RR, Emerson R, Mehta AD, Simon JZ, Poeppel D, Schroeder CE (2013) Mechanisms underlying selective neuronal tracking of attended speech at a “cocktail party”. Neuron 77:980–991
Acknowledgments
We thank the reviewers and the associate editor for their helpful comments. This work is supported by NIH-NIDCD R01-DC012300 to Y.-Y.K.
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Kong, YY., Somarowthu, A. & Ding, N. Effects of Spectral Degradation on Attentional Modulation of Cortical Auditory Responses to Continuous Speech. JARO 16, 783–796 (2015). https://doi.org/10.1007/s10162-015-0540-x
DOI: https://doi.org/10.1007/s10162-015-0540-x