The study of intelligible speech has a long history in neuroimaging.
The following originally appeared on Talking Brains.
Guest post from Jonathan Peelle:
A lot of interesting topics came up at the SfN nanosymposium, which I think goes to show that we should do this sort of thing more often.
The study of intelligible speech has a long history in neuroimaging. On the one hand, as Greg and others have emphasized, it is a tricky thing to study, because a number of linguistic (and often acoustic) factors are confounded when looking at intelligible > unintelligible contrasts. So once we identify intelligibility-responsive areas, we still have a lot of work to do in order to relate anatomy to cognitive operations involved in speech comprehension. That being said, it does seem like a good place to start, and a reasonable way to try to dissociate language-related processing from auditory/acoustic processing. Depending on the approach used, intelligibility studies can also tell us a great deal about speech comprehension under challenging conditions (e.g. background noise, cochlear implants, hearing loss) that have both theoretical and practical relevance.
One thing I suspect everyone agrees on is that, at the end of the day, we should be able to account for multiple sources of evidence: lesion, PET, fMRI, EEG/MEG, as well as various effects of stimuli and analysis approach. With that in mind, I have a few comments to add to this discussion.
Regarding Okada et al. (2010), I won’t repeat all the points we have made previously (Peelle et al., 2010a), but the influence of background noise (continuous scanning) shouldn’t be underestimated. If background noise simply increases global brain signal (i.e. an increase in gain), it shouldn’t have impacted the results. But background noise can interact with behavioral factors and result in spatially constrained patterns of univariate signal increase (including left temporal cortex, e.g. Peelle et al., 2010b):
So, in the absence of data I am reluctant to assume that background noise and listening effort wouldn’t affect multivariate results. This goes along with the point that even if two types of stimuli are intelligible, they can differ in listening effort, which is going to impact the neural systems engaged in comprehension. In Okada et al. (2010), this means that a region that distinguishes between the clear and vocoded conditions might be showing acoustic sensitivity (the argument made by Okada et al.), or it may instead be indexing listening effort.
Another point worth emphasizing is that although the materials introduced by Scott et al. (2000) have many advantages and have been used in a number of papers, there are several ways to investigate intelligibility responses, and we should be careful not to conclude too much from a single approach. As we have pointed out, Davis and Johnsrude (2003) parametrically varied intelligibility within three types of acoustic degradation, and found regions of acoustic insensitivity both posterior and anterior to primary auditory areas in the left hemisphere, and anterior to primary auditory cortex in the right hemisphere.
One advantage of this approach is that parametrically varying speech clarity may give a more sensitive way to assess intelligibility responses than a dichotomous “intelligible > unintelligible” contrast. The larger point is that multivariate analyses, although extremely useful, are not a magic bullet; we also need to carefully consider the particular stimuli and task used (which I would argue also includes background noise).
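To make the difference between a graded and a dichotomous coding concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the intelligibility scores and response values are made up and noise-free, not data from Davis and Johnsrude (2003) or any other study, and `fit_line` is a hypothetical helper rather than part of any analysis package. It fits the same toy response twice, once with the graded intelligibility score as the predictor and once with a binary intelligible/unintelligible label.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical proportion of words reported at five levels of degradation.
intelligibility = [0.0, 0.25, 0.5, 0.75, 1.0]

# Made-up, noise-free response of one region that scales with intelligibility.
response = [1.0 + 2.0 * p for p in intelligibility]

# Parametric predictor: the graded intelligibility score itself.
_, _, r2_parametric = fit_line(intelligibility, response)

# Dichotomous predictor: 1 for "intelligible" (score >= 0.5), 0 otherwise.
dichotomous = [1.0 if p >= 0.5 else 0.0 for p in intelligibility]
_, _, r2_dichotomous = fit_line(dichotomous, response)

print(f"parametric R^2:  {r2_parametric:.2f}")
print(f"dichotomous R^2: {r2_dichotomous:.2f}")
```

In this toy case the graded predictor accounts for all of the variance in the response, while the binary label accounts for only part of it, because the binary coding treats every condition within a category as equivalent. The example only shows what each coding can represent; it says nothing about statistical power in real, noisy fMRI data, which also depends on the design and on the true shape of the response.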
Incidentally, in Davis and Johnsrude (2003), responses that are increased when speech is distorted (aka listening effort) look like this (i.e. including regions of temporal cortex):
The role of inferotemporal cortex in speech comprehension
One side point which came up in discussion at the symposium was the role of posterior inferior temporal gyrus / fusiform, which appears in the Hickok & Poeppel model; I think the initial point was that this is not consistently seen in functional imaging studies, to which Greg replied that the primary support for that region was lesion data. It’s true that this region of inferotemporal cortex isn’t always discussed in functional imaging studies, but it actually occurs quite often—often enough that I would say the functional imaging evidence for its importance is rather strong. We review some of this evidence briefly in Peelle et al. (2010b; p. 1416, bottom), but it includes the following studies:
Speaking of inferotemporal cortex, there is a nice peak here in the Okada et al. results (Figure 2, Table 1):
Once you start looking for it, it crops up rather often. (Although it’s also worth noting that when fMRI studies fail to find results in this region, this may be due to susceptibility artifacts there rather than a lack of neural engagement.)
Anterior vs. Posterior: Words vs. Sentences?
With respect to the discussion about posterior vs. anterior temporal regions being critical for speech comprehension, it strikes me that we all need to be careful about terminology: does “speech” refer to connected speech (sentences) or single words? One explanation for the lesion data mentioned in that discussion, in which a patient with severe left anterior temporal damage performed well on “speech perception,” is that the task was auditory word comprehension. How did this patient do on sentence comprehension measures? I think a compelling case could be made that auditory word comprehension is largely bilateral and more posterior, but that in connected speech more anterior (and perhaps left-lateralized) regions become more critical (e.g., Humphries et al., 2006):
As far as I know, no one has done functional imaging of intelligibility of single words in the way that many have done with sentences; nor have there been sentence comprehension measures on patients with left anterior temporal lobe damage. So, at this point I think more work needs to be done before we can directly compare these sources of evidence.
Broadly though, I don’t know how productive it will be to specify which area responds “most” to intelligible speech. Given the variety of challenges which our auditory and language systems need to deal with, surely it comes down to a network of regions that are dynamically called into action depending on (acoustic and cognitive) task demands. This is why I think that we need to include regions of prefrontal, premotor, and inferotemporal cortex in these discussions, even if they don’t appear in every imaging contrast.
Awad M, Warren JE, Scott SK, Turkheimer FE, Wise RJS (2007) A common system for the comprehension and production of narrative speech. Journal of Neuroscience 27:11455-11464. http://dx.doi.org/10.1523/JNEUROSCI.5257-06.2007
Davis MH, Johnsrude IS (2003) Hierarchical processing in spoken language comprehension. Journal of Neuroscience 23:3423-3431. http://www.jneurosci.org/cgi/content/abstract/23/8/3423
Humphries C, Binder JR, Medler DA, Liebenthal E (2006) Syntactic and semantic modulation of neural activity during auditory sentence comprehension. Journal of Cognitive Neuroscience 18:665-679. http://dx.doi.org/10.1162/jocn.2006.18.4.665
Okada K, Rong F, Venezia J, Matchin W, Hsieh I-H, Saberi K, Serences JT, Hickok G (2010) Hierarchical organization of human auditory cortex: Evidence from acoustic invariance in the response to intelligible speech. Cerebral Cortex 20:2486-2495. http://dx.doi.org/10.1093/cercor/bhp318
Orfanidou E, Marslen-Wilson WD, Davis MH (2006) Neural response suppression predicts repetition priming of spoken words and pseudowords. Journal of Cognitive Neuroscience 18:1237-1252. http://dx.doi.org/10.1162/jocn.2006.18.8.1237
Peelle JE, Johnsrude IS, Davis MH (2010a) Hierarchical processing for speech in human auditory cortex and beyond [Commentary on Okada et al. (2010)]. Frontiers in Human Neuroscience 4:51. http://frontiersin.org/Human_Neuroscience/10.3389/fnhum.2010.00051/full
Peelle JE, Eason RJ, Schmitter S, Schwarzbauer C, Davis MH (2010b) Evaluating an acoustically quiet EPI sequence for use in fMRI studies of speech and auditory processing. NeuroImage 52:1410-1419. http://dx.doi.org/10.1016/j.neuroimage.2010.05.015
Rodd JM, Davis MH, Johnsrude IS (2005) The neural mechanisms of speech comprehension: fMRI studies of semantic ambiguity. Cerebral Cortex 15:1261-1269. http://dx.doi.org/doi:10.1093/cercor/bhi009
Rodd JM, Longe OA, Randall B, Tyler LK (2010) The functional organisation of the fronto-temporal language system: Evidence from syntactic and semantic ambiguity. Neuropsychologia 48:1324-1335. http://dx.doi.org/10.1016/j.neuropsychologia.2009.12.035
Scott SK, Blank CC, Rosen S, Wise RJS (2000) Identification of a pathway for intelligible speech in the left temporal lobe. Brain 123:2400-2406. http://dx.doi.org/10.1093/brain/123.12.2400