Articles | February 23, 2011

The Syllable as the Perceptual Unit in Speech Perception

Dom Massaro comments on the debate regarding the basic unit of speech perception.

The following was originally posted to Talking Brains.

Given that there have been some interesting debates here on Talking Brains regarding the basic unit of speech perception, I asked Dom Massaro, a prominent and long-time player in this debate, to put together a comment on the topic for publication here. He graciously agreed to do this for us and here it is. Thanks Dom!

-greg

*************

Some reminiscences on how I was led to propose the syllable as the perceptual unit in speech perception. I relied mostly on my writings in the literature rather than undocumented memory.

Dom Massaro

During my graduate studies in mathematical and experimental psychology and also during my postdoctoral position, I developed an information-processing approach to the study of behavior (see Massaro & Cowan, 1993, for this brand of information processing). Two important implications arose from this approach: 1) the proximal influences on behavior and 2) the time course of processing are central to a complete description of behavior (as opposed to simple environment-behavior relationships). My early studies involved a delineation of perception and memory processes in the processing of speech and music. The research led to a theory of perception and memory processes that revealed the properties of pre-perceptual and perceptual memory stores, rules for interference of information in these stores, and theories of forgetting (Massaro, 1970).

Initiating my career as a faculty member, I looked to apply this information-processing approach to a more substantive domain of behavior. I held a graduate seminar for three years with the purpose of applying the approach to language processing. We learned that previous work in this area had failed to address the issues described above, and our theoretical framework and empirical reviews anticipated much of the research in psycholinguistics since that time, in which the focus is on real-time, on-line processing (see our book, Understanding Language: An Information Processing Analysis of Speech Perception, Reading and Psycholinguistics, 1975).

My own research interests also expanded to include the study of reading and speech perception. Previous research had manipulated only a single variable in these fields, and our empirical work manipulated multiple sources of both bottom-up and top-down information. Gregg Oden and I collaborated to formulate a fuzzy logical model of perception (Oden & Massaro, 1978; Movellan & McClelland, 2001), which has served as a framework for my research to this day. Inherent to the model were prototypes in memory and, therefore, it was important to take a stance on perceptual units in speech and print. By this time, my research and research by others indicated the syllable and the letter as units in speech and print, respectively. Here is the logic I used.

Speech perception can be described as a pattern-recognition problem. Given some speech input, the perceiver must determine which message best describes the input. An auditory stimulus is transformed by the auditory receptor system and sets up a neurological code in a pre-perceptual auditory storage. Based on my backward masking experiments and other experimental paradigms, this storage holds the information in a pre-perceptual form for roughly 250 ms, during which time the recognition process must take place. The recognition process transforms the pre-perceptual image into a synthesized percept. One issue given this framework is, what are the patterns that are functional in the recognition of speech? These sound patterns are referred to as perceptual units.

One reasonable assumption is that every perceptual unit in speech has a representation in long-term memory, which is called a prototype. The prototype contains a list of acoustic features that define the properties of the sound pattern as they would be represented in pre-perceptual auditory storage. As each sound pattern is presented, its corresponding acoustic features are held in pre-perceptual auditory storage. The recognition process operates to find the prototype in long-term memory which best describes the acoustic features in pre-perceptual auditory storage. The outcome of the recognition process is the transformation of the pre-perceptual auditory image of the sound stimulus into a synthesized percept held in synthesized auditory memory.
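The matching step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the feature names, the numeric values, and the use of squared distance as the measure of "best describes" are assumptions made for the example, not the feature lists or decision rule used in the experiments.

```python
# Hypothetical prototypes: each perceptual unit maps to a list of acoustic
# feature values. Names and numbers are invented for illustration.
PROTOTYPES = {
    "ba": {"f2_onset": 0.2, "f2_slope": 0.8, "burst": 0.3},
    "da": {"f2_onset": 0.6, "f2_slope": 0.4, "burst": 0.6},
    "ga": {"f2_onset": 0.9, "f2_slope": 0.1, "burst": 0.8},
}


def recognize(features, prototypes=PROTOTYPES):
    """Return the label of the prototype closest to the stored features.

    Squared Euclidean distance stands in for "best describes"; the text
    does not commit to a particular measure.
    """
    def distance(proto):
        return sum((features[name] - value) ** 2 for name, value in proto.items())

    return min(prototypes, key=lambda label: distance(prototypes[label]))


# Features held in pre-perceptual auditory storage for one sound (made up).
stored = {"f2_onset": 0.55, "f2_slope": 0.45, "burst": 0.5}
percept = recognize(stored)  # label of the synthesized percept
```

With these made-up values the stored pattern lies closest to the "da" prototype, so that label is returned as the percept.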

According to this model, pre-perceptual auditory storage can hold only one sound pattern at a time for a short temporal period. Backward recognition masking studies have shown that a second sound pattern can interfere with the recognition of an earlier pattern if the second is presented before the first is recognized. Each perceptual unit in speech must occur within the temporal span of pre-perceptual auditory storage and must be recognized before the following one occurs for accurate speech processing to take place. Therefore, the sequence of perceptual units in speech must be recognized one after the other in a successive and linear fashion. Finally, each perceptual unit must have a relatively invariant acoustic signal so that it can be recognized reliably. If the sound pattern corresponding to a perceptual unit changes significantly within different speech contexts, recognition could not be reliable, since one set of acoustic features would not be sufficient to characterize that perceptual unit. Perceptual units in speech as small as the phoneme or as large as the phrase have been proposed.

The phoneme was certainly a favorite to win the pageant for speech’s perceptual unit. Linguists had devoted their lives to phonemes, and phonemes gained particular prominence when they could be distinguished from one another by distinctive features. Trubetzkoy, Jakobson, and other members of the "Prague school" proposed that phonemes in a language could be distinguished by distinctive features. For example, Jakobson, Fant, and Halle (1961) proposed that a small set of orthogonal, binary properties or features were sufficient to distinguish among the larger set of phonemes of a language. Jakobson et al. were able to classify 28 English phonemes on the basis of only nine distinctive features. While originally intended only to capture linguistic generalities, distinctive feature analysis had been widely adopted as a framework for human speech perception. The attraction of this framework is that since these features are sufficient to distinguish among the different phonemes, it is possible that phoneme identification could be reduced to the problem of determining which features are present in any given phoneme. This approach gained credibility with the finding, originally by Miller and Nicely (1955) and since by many others, that the more distinctive features two sounds share, the more likely they are to be perceptually confused for one another. Thus, the first candidate we considered for the perceptual unit was the phoneme.
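The idea that sounds sharing more features are more confusable can be made concrete with a small sketch. The table below is an assumption for illustration: it uses only three features (voicing, nasality, and place of articulation) for six consonants, and is not the nine-feature system of Jakobson et al. The function names are likewise hypothetical.

```python
# Simplified, illustrative feature table for six English consonants.
FEATURES = {
    "p": {"voiced": False, "nasal": False, "place": "bilabial"},
    "b": {"voiced": True, "nasal": False, "place": "bilabial"},
    "t": {"voiced": False, "nasal": False, "place": "alveolar"},
    "d": {"voiced": True, "nasal": False, "place": "alveolar"},
    "m": {"voiced": True, "nasal": True, "place": "bilabial"},
    "n": {"voiced": True, "nasal": True, "place": "alveolar"},
}


def shared_features(a, b, table=FEATURES):
    """Count the features on which two phonemes have the same value."""
    return sum(table[a][name] == table[b][name] for name in table[a])


def rank_by_similarity(target, table=FEATURES):
    """Order the other phonemes from most to fewest shared features."""
    others = [p for p in table if p != target]
    return sorted(others, key=lambda p: shared_features(target, p), reverse=True)
```

Under this toy table, /b/ shares two of three features with /p/, /d/, and /m/, so a feature-based account would predict that those are its most likely confusions, while /p/ and /n/ share none.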

Consider the acoustic properties of vowel phonemes. Unlike some consonant phonemes, whose acoustic properties change over time, the wave shape of the vowel is considered to be steady-state or tone-like. The wave shape of the vowel repeats itself anywhere from 75 to 200 times per second. In normal speech, vowels last between 100 and 300 ms, and during this time the vowels maintain a fairly regular and unique pattern. It follows that, by our criteria, vowels could function as perceptual units in speech.
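The figures above imply that even a short vowel contains several repetitions of its wave shape. A quick calculation, using only the ranges quoted in the paragraph (the function name is just a label for the example):

```python
def cycles(repetitions_per_second, duration_ms):
    """Number of times the wave shape repeats during the vowel."""
    return repetitions_per_second * duration_ms / 1000


fewest = cycles(75, 100)   # slowest repetition rate, shortest vowel
most = cycles(200, 300)    # fastest repetition rate, longest vowel
print(fewest, most)        # 7.5 60.0
```

So a vowel in normal speech carries somewhere between about 7 and 60 repetitions of the same wave shape.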

Next let us consider consonant phonemes. Consonant sounds are more complicated than vowels and some of them do not seem to qualify as perceptual units. We have noted that a perceptual unit must have a relatively invariant sound pattern in different contexts. However, some consonant phonemes appear to have different sound patterns in different speech contexts. For example, the stop consonant phoneme /d/ has different acoustic representations in different vowel contexts. Since the steady-state portion corresponds to the vowel sounds, the first part, called the transition, must be responsible for the perception of the consonant /d/. The acoustic pattern corresponding to the /d/ sound differs significantly in the syllables /di/ and /du/. Hence, one set of acoustic features would not be sufficient to recognize the consonant /d/ in the different vowel contexts. Therefore, we must either modify our definition of a perceptual unit or eliminate the stop consonant phoneme as a candidate.

There is another reason why the consonant phoneme /d/ cannot qualify as a perceptual unit. In the model, perceptual units are recognized in a successive and linear fashion. Research has shown, however, that the consonant /d/ cannot be recognized before the vowel is also recognized. If the consonant were recognized before the vowel, then we should be able to decrease the duration of the vowel portion of the syllable so that only the consonant would be recognized. Experimentally, the duration of the vowel in the consonant-vowel (CV) syllable is gradually decreased and the subject is asked when she hears the stop consonant sound alone. The CV syllable is perceived as a complete syllable until the vowel is eliminated almost entirely (Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967). At that point, however, instead of the perception changing to the consonant /d/, a nonspeech whistle is heard. Liberman et al. show that the stop consonant /d/ cannot be perceived independently of perceiving a CV syllable. Therefore, it seems unlikely that the /d/ sound would be perceived before the vowel sound; it appears, rather, that the CV syllable is perceived as an indivisible whole or gestalt.

These arguments led to the idea that syllables function as perceptual units rather than containing two perceptual units each. One way to test this hypothesis is to employ CV syllables in a recognition-masking task. Liberman et al. found that subjects could identify shortened versions of the CV syllables when most of the vowel portion was eliminated. Analogous to our interpretation of vowel perception, recognition of these shortened CV syllables should also take time. Therefore, a second syllable, if it follows the first soon enough, should interfere with perception of the first. Consider the three CV syllables /ba/, /da/, and /ga/ (/a/ pronounced as in father), which differ from each other only with respect to the consonant phoneme. Backward recognition masking, if found with these sounds, would demonstrate that the consonant sound is not recognized before the vowel occurs and also that the CV syllable requires time to be perceived.

There have been several experiments on the backward recognition masking of CV syllables (Massaro, 1974, 1975; Pisoni, 1972). Newman and Spitzer (1987) employed the three CV syllables /ba/, /da/, and /ga/ as test items in the backward recognition masking task. These items were synthetic speech stimuli that lasted 40 ms; the first 20 ms of the item consisted of the CV transition and the last 20 ms corresponded to the steady-state vowel. The masking stimulus was the steady-state vowel /a/ presented for 40 ms. In one condition, the test and masking stimuli were presented to opposite ears, that is, dichotically. All other procedural details followed the prototypical recognition-masking experiment.

The percentage of correct recognitions for 8 observers improved dramatically with increases in the silent interval between the test and masking CVs. These results show that recognition of the consonant is not complete at the end of the CV transition, nor even at the end of the short vowel presentation. Rather, correct identification of the CV syllable requires perceptual processing after the stimulus presentation. These results support our hypothesis that the CV syllable must have functioned as a perceptual unit, because the syllable must have been stored in pre-perceptual auditory storage, and recognition involved a transformation of this pre-perceptual storage into a synthesized percept of a CV unit. The acoustic features necessary for recognition must, therefore, define the complete CV unit. An analogous argument can be made for VC syllables also functioning as perceptual units (Massaro, 1974).
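One simple way to picture the improvement with longer silent intervals is a negatively accelerated growth curve, in which processing of the test sound accumulates until the mask arrives. The exponential form and the parameter values below are assumptions chosen for illustration, not values fitted to the experiments described here.

```python
import math


def resolved(interval_ms, asymptote=1.0, rate_per_ms=0.02):
    """Fraction of the test sound resolved before the mask arrives.

    Assumed form: asymptote * (1 - exp(-rate * interval)). Both parameters
    are hypothetical.
    """
    return asymptote * (1.0 - math.exp(-rate_per_ms * interval_ms))


# Silent intervals (ms) between test and masking sounds, chosen arbitrarily.
intervals = [0, 20, 40, 80, 160, 250]
curve = [round(resolved(t), 2) for t in intervals]
```

Under these assumptions resolution rises steeply over the first several tens of milliseconds and is close to its asymptote by roughly 250 ms, the same order as the storage duration estimated from the masking studies.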

We must also ask whether perceptual units could be larger than vowels, CV, or VC syllables. Miller (1962) argued that the phrase of two or three words might function as a perceptual unit. According to our criteria for a perceptual unit, it must correspond to a prototype in long-term memory which has a list of features describing the acoustic features in the pre-perceptual auditory image of that perceptual unit. Accordingly, pre-perceptual auditory storage must last on the order of one or two seconds to hold perceptual units of the size of a phrase. But the recognition-masking studies usually estimate the effective duration of pre-perceptual storage to be about 250 ms. Therefore, perceptual units must occur within this period, eliminating the phrase as the perceptual unit.
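The duration argument amounts to a simple filter on candidate units. In the sketch below, the 250 ms storage limit comes from the text; the typical durations assigned to each candidate are rough assumptions for illustration only.

```python
STORAGE_MS = 250  # approximate duration of pre-perceptual auditory storage

# Assumed typical durations in milliseconds (illustrative, not measured).
CANDIDATES_MS = {
    "vowel (V)": 150,
    "CV syllable": 200,
    "VC syllable": 200,
    "phrase of two or three words": 1500,
}


def fits_in_storage(duration_ms, storage_ms=STORAGE_MS):
    """A unit qualifies only if it occurs within the storage span."""
    return duration_ms <= storage_ms


viable = [unit for unit, ms in CANDIDATES_MS.items() if fits_in_storage(ms)]
```

With these durations the phrase is the only candidate excluded, which mirrors the conclusion drawn above.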

The recognition-masking paradigm developed to study the recognition of auditory sounds has provided a useful tool for determining the perceptual units in speech. If pre-perceptual auditory storage is limited to 250 ms, the perceptual units must occur within this short period. This time period agrees nicely with the durations of syllables in normal speech.

The results of the present experiments demonstrate backward masking in a two-interval forced-choice task, a same-different task, and an absolute identification task. The backward masking of one sound by a second sound is interpreted in terms of auditory perception continuing after a short sound is complete. A representation of the short sound is held in a pre-perceptual auditory storage so that resolution of the sound can continue to occur after the stimulus is complete. A second sound interferes with the storage of the earlier sound, thereby interfering with its further resolution. The current research contributes to the development of a general information processing model (Massaro, 1972, 1975).

To solve the invariance problem between acoustic signal and phoneme, while simultaneously adhering to a pre-perceptual auditory memory constraint of roughly 250 ms, Massaro (1972) proposed the syllables V, CV, or VC as the perceptual unit, where V is a vowel and C is a consonant or consonant cluster. This assumption was built into the foundation of the FLMP (Oden & Massaro, 1978). It should be noted that CVC syllables would actually be two perceptual units, the CV and VC portions, rather than just one. Assuming that this larger segment is the perceptual unit reinstates a significant amount of invariance between signal and percept. Massaro and Oden (1980, pp. 133-135) reviewed evidence that the major coarticulatory influences on perception occur within these syllables, rather than between syllables. Any remaining lack of invariance across these syllables could conceivably be disambiguated by additional sources of information in the speech stream.
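The decomposition of a syllable into V, CV, and VC units can be sketched as follows. This is a toy illustration under stated assumptions: ordinary letters stand in for phonemes, the vowel set is a crude orthographic one, the input is taken to be a single syllable with one vowel nucleus, and the function name is hypothetical.

```python
def perceptual_units(syllable, vowels="aeiou"):
    """Split one syllable into V, CV, and VC units.

    A consonant cluster counts as a single C, so "stra" is one CV unit.
    A CVC syllable yields two units that share the vowel.
    """
    idx = [i for i, ch in enumerate(syllable) if ch in vowels]
    if not idx:
        raise ValueError("syllable has no vowel")
    first, last = idx[0], idx[-1]
    if idx != list(range(first, last + 1)):
        raise ValueError("expected a single syllable with one vowel nucleus")

    onset = syllable[:first]
    nucleus = syllable[first:last + 1]
    coda = syllable[last + 1:]

    units = []
    if onset:
        units.append(onset + nucleus)   # CV portion
    if coda:
        units.append(nucleus + coda)    # VC portion
    if not units:
        units.append(nucleus)           # bare V
    return units
```

For example, `perceptual_units("bad")` returns `["ba", "ad"]`, whereas `perceptual_units("da")` returns the single unit `["da"]`.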

References

Massaro, D.W. (1970). Perceptual Processes and Forgetting in Memory Tasks. Psychological Review, 77(6), 557-567.

Massaro, D.W. (1972). Preperceptual Images, Processing Time, and Perceptual Units in Auditory Perception. Psychological Review, 79(2), 124-145.

Massaro, D.W. (1974). Perceptual Units in Speech Recognition. Journal of Experimental Psychology, 102(2), 349-353.

Massaro, D.W. (1975). Understanding Language: An Information Processing Analysis of Speech Perception, Reading and Psycholinguistics. New York: Academic Press.

Massaro, D.W. & Cowan, N. (1993). Information Processing Models: Microscopes of the Mind. Annual Review of Psychology, 44, 383-425.

http://mambo.ucsc.edu/papers/1993.html

Massaro, D.W. & Oden, G.C. (1980). Speech Perception: A Framework for Research and Theory. In N.J. Lass (Ed.), Speech and Language: Advances in Basic Research and Practice, Vol. 3. New York: Academic Press, 129-165.

Movellan, J., and McClelland, J. L. (2001). The Morton-Massaro Law of Information Integration: Implications for Models of Perception. Psychological Review,
