
The Effect of Talker- and Listener-Related Factors on Intelligibility for a Real-Word, Open-Set Perception Test

Markham & Hazan: Effect of Talker and Listener Factors on Intelligibility
Journal of Speech, Language, and Hearing Research • Vol. 47 • 725–737 • August 2004 • ©American Speech-Language-Hearing Association • 1092-4388/04/4704-0725

Duncan Markham
Valerie Hazan
University College London, London, U.K.

The aims of this study were to evaluate whether talker intelligibility is consistent across listeners differing in age and gender and to investigate the process of attunement to talker characteristics in children and adults. Word intelligibility rates were obtained from 135 listeners (adults, 11–12-year-olds, and 7–8-year-olds) for 45 talkers from a homogeneous accent group. There were 2 test conditions, each containing multiple talkers. In the single-word condition, key words were presented in isolation, whereas in the triplet condition, triplets of key words were preceded by a precursor sentence by the same talker. For identical word materials, word intelligibility at a signal-to-noise ratio of +6 dB varied significantly across talkers from 81.2% to 96.4%. Overall, younger listeners made significantly more errors than older children or adults, and women talkers were more intelligible than other classes of talkers. The relative intelligibility of the 45 talkers was highly consistent across listener groups, suggesting that talker intelligibility is primarily determined by talker-related factors rather than by the interrelation of talker- and listener-related factors. The presence of a precursor sentence providing indexical information did improve word intelligibility for the bottom quartile of listeners in each of the listener groups.
KEY WORDS: intelligibility, talker variability, perception, development

The success of any speech communication event is dependent on a combination of individual talker and listener characteristics, communicative environment, and message structure. We have a relatively poor understanding of what makes a talker of any age more or less intelligible under any circumstance, and of the extent to which the age and language exposure of the listener may interact with talker-related characteristics during this process. The aim of this study, therefore, was to assess a range of talkers for relative intelligibility, to ascertain whether the perception of different talkers varies between adults and children, and to evaluate the effect of talker attunement on speech perception in different age groups.

The effect of talker characteristics on speech perception can be seen in terms of their impact on intelligibility and on the cognitive load involved in the perceptual process. This issue has attracted much attention in recent years due to a debate regarding the nature of the processes involved in talker normalization during speech perception. Traditionally, it was thought that the indexical information (e.g., about the identity of the talker, dialect, gender) contained in the speech signal was discarded once the signal was decoded into abstract linguistic units (Studdert-Kennedy, 1974). However, this assumption is challenged by the fact that word identification is enhanced for talkers with “known” voices (Nygaard, Sommers, & Pisoni, 1994, 1995) and by the finding that there is mutual interference between phoneme and voice judgments in speeded classification tasks (Mullennix & Pisoni, 1990).
These observations have led to the view that indexical information is encoded and retained along with linguistic information (Pisoni, 1997). Exemplar models of spoken word perception suggest that detailed perceptual traces of previously heard instances of tokens are stored in long-term memory for each word in the lexicon (Goldinger, 1998; Palmeri, Goldinger, & Pisoni, 1993). In word recognition, tokens are classified according to their similarity to stored exemplars. That is, word recall will be enhanced for known voices because there will be a close match for tokens produced by previously heard voices. Furthermore, it appears that the episodic memory of a particular voice may facilitate the perception of other voices with similar acoustic–phonetic characteristics. For example, one study showed that words produced by voices that were perceptually similar to previously heard voices were better identified than words produced by dissimilar voices (Goldinger, 1996). Therefore, it might be expected from exemplar models of spoken word recognition that a listener’s exposure to different types of voices would at least partially determine the relative intelligibility of newly heard voices.

There is ample evidence that increased variability in the speech signal leads to an increased cognitive load. Perceptual tasks involving multiple talkers impose a greater processing load than the same tasks presented in a single-talker condition (e.g., Goldinger, 1996; Mullennix, Pisoni, & Martin, 1989; Ryalls & Pisoni, 1997; Takayanagi, Dirks, & Moshfegh, 2002).
Greater processing loads are also obtained when changes in speech rate (Nygaard et al., 1995; Sommers, Nygaard, & Pisoni, 1994), in the position of target consonants (Mullennix, Farnsworth, Harshaw, Bittle, & Evanitz, 2000), and in the emotional characteristics of the voice (Mullennix, Bihon, Bricklemyer, Gaston, & Keener, 2002) are introduced, and for talkers who are less clear and consistent in their consonant production (Newman, Clouse, & Burnham, 2001). However, the fact that certain types of variation do not affect processing load (e.g., vocal amplitude, as shown in Sommers et al., 1994) suggests that variability with phonetic relevance may be treated differently than other types of variability in the acoustic signal. The perceptual weight of different sources of talker variation is also difficult to ascertain because the effect of an individual source of variation also has been shown to vary with the presence of another source of variation (Spisak, Mullennix, Moro, & Will, 2002).

The effect of talker characteristics on intelligibility also has attracted some attention. However, most findings have been based on relatively small numbers of adult talkers (typically from 4 to 20), and this may limit the degree to which findings can be generalized to a “normal” talker population. For identical test materials, talkers from a homogeneous accent group have been found to vary significantly in intelligibility (Bradlow, Torretta, & Pisoni, 1996). Degree of intelligibility has been found to be related to a number of acoustic–phonetic factors, including word and vowel duration, size of vowel space and cues to consonantal contrasts (Bond & Moore, 1994), fundamental frequency range, and precision of articulation (Bradlow et al., 1996). There also is some evidence that female talkers may be more intelligible on average than male talkers (Bradlow et al., 1996).

The impact of listener characteristics on perceptual processing also needs to be considered.
Listeners weight various speech-like dimensions differently (Christensen & Humes, 1997), vary in their weighting of cues to phonemic contrasts (Hazan & Rosen, 1991), and do not respond uniformly well to dimension-directed training. Listeners also show varying degrees of susceptibility to distortion of the speech signal (Eisenberg, Shannon, Shaefer Martinez, Wygonski, & Boothroyd, 2000) and of ability to process anomalous linguistic material (Nittrouer & Boothroyd, 1990). Finally, listeners differ in their ability to recognize voices, as shown by Nygaard and Pisoni (1998), whose listeners did not perform uniformly on a voice identification training task.

Listener age is a key factor and affects the use of both linguistic and sensory information contained in the speech signal, at least for the young end of the age continuum (e.g., Eisenberg et al., 2000; Nittrouer & Boothroyd, 1990). Children differ in their use of sensory information, and the development of the ability to deal with talker variability during the course of first-language acquisition is still relatively unexplored. Infants show early abilities both to discriminate between talkers (e.g., Jusczyk, Pisoni, & Mullennix, 1992) and to normalize across talkers (e.g., Kuhl, 1979). However, the paradigms used in these infant perception studies are not comparable to those used in adult studies. For a forced-choice word identification task with single- or multiple-talker conditions, Ryalls and Pisoni (1997) found evidence of a developmental effect in the way children deal with talker normalization. Specifically, 3-year-old children were more affected by talker variability than were 5-year-olds. Whether the process is “adult-like” by the age of 5 years is difficult to ascertain on the basis of the evidence available.
There is clear evidence that certain aspects of perceptual development are far from complete at this age. Indeed, lower intelligibility rates have been obtained for younger children (typically between 6 and 10 years of age) than for older children and adults for stimuli that were degraded spectrally (Eisenberg et al., 2000), by noise (e.g., Elliott, 1979; Fallon, Trehub, & Schneider, 2000), or by reverberation (Johnson, 2000). Fourteen-year-old children did not achieve adult-like performance when stimuli were degraded both by noise and reverberation (Johnson, 2000).

Although many studies acknowledge that poorer intelligibility in children in degraded conditions is partly due to their poorer use of linguistic/contextual information (e.g., Eisenberg et al., 2000), there also is strong evidence of poorer use of sensory information (e.g., Nittrouer & Boothroyd, 1990). For example, Fallon et al. (2000), using age-appropriate test materials, provided evidence that children require a greater signal-to-noise ratio (SNR) to achieve a level of performance equivalent to that of adults and that a further comparable decrease in SNR had the same effect across age groups. This suggests that children’s poorer performance was due to sensory limitations rather than to differences in attention or cognitive ability. At the level of phonemic categorization using synthesized stimuli, 12-year-old children had not reached adult levels of performance (Hazan & Barrett, 2000).

The issue of whether children are similar to adults in terms of the relative intelligibility of different talkers is an interesting one in terms of models of talker normalization, because adults and children differ in their exposures to different types of talkers.
Children aged 6–12 years are likely to have greater recent exposure to children’s voices than do most adults, and they are more likely to have greater exposure to women’s voices (as caregivers and teachers) than to men’s voices. Because exemplar theories see word recognition as depending on the degree of similarity of the incoming sound to stored episodic traces, it would be expected that the types of voices to which a listener is primarily exposed will affect word perception. In turn, relative talker intelligibility should be affected by listener-related factors. If the acoustic–phonetic characteristics of talkers’ voices, and not listener-related factors, are the primary factor in determining intelligibility, then a similar talker ranking would be expected across listener groups.

Therefore, the aim of our study was twofold. The first objective was to evaluate the relation between listener-related and talker-related effects on word intelligibility. To this end, word intelligibility for 45 talkers differing in age (13 years, adults) and gender was evaluated for groups of listeners differing in age (7–8 years, 11–12 years, adults) and gender.¹ The second objective was to investigate the process of talker attunement in children and adults by comparing word intelligibility in conditions where a sentence precursor produced by the same talker was present or absent.

Method

Materials

The test materials required for this study had to (a) be appropriate for children ages 7 years and older who were talkers of British English, (b) highlight likely errors in consonant perception, (c) enable unconstrained responses by listeners, and (d) be appropriate for an experimental procedure requiring many responses for many talkers from each listener. Because no published materials fulfilled these requisites, a new test was developed.
The starting point in the development of the test material was a database that covered all monosyllabic words of Standard British English except rare, technical, or obsolescent items. The database was obtained by computing all legal CVC combinations (with C being either a singleton or cluster). A subset of approximately 700 unique monosyllabic words likely to be familiar to 7-year-old children was subjected to independent verification by a group of seven British primary school teachers and literacy educators. The set of 380 words judged to be definitely familiar to 7-year-old children was examined for words that would maximize potential consonant and vowel confusions (e.g., Miller & Nicely, 1955; Redford & Diehl, 1999). A final set of 124 key words was selected.² Further details of test design are given in Markham and Hazan (2002).

Recordings

Fifty-five talkers of British English with a regionally neutral or mild southeastern accent, including men, women, and 13-year-old children, were recorded in an anechoic chamber at the Department of Phonetics and Linguistics at University College London. Speech and a laryngographic signal, which provides direct information about vocal-fold activity (Fourcin, 1974), were recorded with a digital audio recorder at 44.1 kHz. The talkers recorded a wide variety of speech materials, including read texts, word lists, nonsense syllables, and semi-spontaneous speech (Markham & Hazan, 2002).

¹ Child talkers were slightly older (mean age = 13;2 [years; months]) than child listeners (ages 7–8 years and 11–12 years). This was because pilot recordings showed that children younger than 12 to 13 years did not have the necessary attention span and voice control to record large amounts of materials in an anechoic chamber. Care was taken to select boys whose voice had not yet broken.

² The key words included in the test material were evaluated in terms of their familiarity ranking (on a scale of 1 to 7) as published in Luce and Pisoni (1998).
A total of 111 out of 124 words had a familiarity score between 6 and 7. Five words were unlisted in the Luce and Pisoni scales, and 8 words had familiarity scores lower than 6. Mean intelligibility rates obtained by the 7–8-year-old children for high-familiarity words (89.4%) did not differ from those for lower familiarity words/unlisted words (91.7%).

The voices of 45 of the 55 recorded talkers were used for stimuli (see Table 1). The remaining talkers were rejected because of unsuitable intonational behavior, clear regional accent markers, or technical problems during the recording procedure.

Recordings for each talker were transferred digitally to a PC and were segmented automatically. Two recordings of each key word then were identified for each talker and a single token was selected. Two tokens of each of two carrier phrases also were identified. Goodness of token was assessed on the basis of normal prosody, stable production rate, and recording quality, but not with reference to articulatory clarity.

Preparation of Materials

Two presentation conditions were devised to evaluate the effect of indexical information on talker intelligibility. In the triplet condition, a precursor phrase (“and the next three words are” or “and now please say”) was followed by three key words from the same talker. The intention was that the precursor would provide an adequate speech sample for perceptual attunement to occur. In the single-word condition, key words were presented singly, without precursor. Both conditions were multitalker in that either triplets or single words from 15 talkers, differing in gender and age, were fully randomized.

The stimuli were prepared as follows. All speech materials were processed and presented at the original sampling frequency of 44.1 kHz. All words were equated to a fixed root mean square level.
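The level equalization mentioned in the last sentence can be illustrated with a short sketch. This is not the authors' processing chain, which the text does not describe; the function names `rms` and `equate_rms` and the target level of 0.1 are assumptions made for illustration, and samples are taken to be floating-point values in the range -1 to 1.

```python
import math


def rms(samples):
    """Return the root mean square level of a sequence of audio samples."""
    if not samples:
        raise ValueError("cannot compute the RMS of an empty signal")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def equate_rms(samples, target_rms):
    """Scale a signal by a single gain so that its RMS equals target_rms.

    target_rms is a hypothetical fixed level chosen by the experimenter;
    the source text does not state the value that was used.
    """
    current = rms(samples)
    if current == 0.0:
        raise ValueError("cannot scale a silent signal")
    gain = target_rms / current
    return [s * gain for s in samples]


# Two made-up "words" recorded at different levels end up at the same RMS.
quiet_word = [0.05, -0.05, 0.05, -0.05]
loud_word = [0.5, -0.5, 0.5, -0.5]
equalized = [equate_rms(word, 0.1) for word in (quiet_word, loud_word)]
```

Applying one gain per word keeps the waveform shape of each token intact and changes only its overall level, which is what equating words to a fixed RMS level implies.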
For the single-word condition, the key words were stored as individual files. For the triplet condition, automatic procedures were used to combine precursor carrier phrases and three individual files, each containing a key word. The key words were separated by 200 ms, with a 300-ms gap following the last key word. The triplets were constructed so that identical consonants could not abut each other and, wherever possible, so that all three words started and ended with different consonants and contained different vowels. All triplets thus generated were assessed by the experimenter for prosodic acceptability and matched loudness. The 124 key words for each talker were combined into 135 different triplets so that each key word was heard in a variety of contexts and positions.

A pilot test of word identification revealed that child and adult listeners understood all words when presented in quiet conditions (i.e., without noise). Twenty-talker babble, produced at the Medical Research Council’s Institute for Hearing Research, was then mixed with the stimuli at a number of different SNRs. Ten 7-year-old children were tested pseudo-adaptively on a subset of the word list until an error rate of approximately 20% was obtained (at SNR +6 dB). All triplets and single words therefore were mixed with babble to obtain the SNR of +6 dB. Differences in auditory sensitivity described by Fallon et al. (2000) indicate that 11-year-old children and adults may be expected to differ by approximately 2 dB SNR (relative to a –33 dB SNR level for adults), and 7–8-year-old children may differ by at most another 2 dB on intelligibility tests for levels of performance around 85%. However, as we were concerned here with differences in relative rather than absolute talker intelligibility across listener groups, we did not vary the SNR for different age groups.

Listeners

All listeners were talkers of British English with a nonregional or southeastern accent.
Forty-five child listeners ages 7–8 years and 45 child listeners ages 11–12 years were recruited from a number of private schools in the London area (see Table 2). The age groups for child listeners were chosen to represent both less and more mature points on the developmental scale, but all were capable of participating in tests administered to adults. Information about each listener’s health, speech and hearing status, accent background, and environment was elicited from parents via a questionnaire. Children were included only if they reported no hearing pathologies (other than temporary afflictions in early childhood), had pure-tone thresholds of 25 dB HL or better between 0.5 and 8 kHz, and passed a language screening test, the Clinical Evaluation of Language Fundamentals Recalling Sentences test (Semel, Wiig, & Secord, 1995).³ Any child judged to have abnormal pronunciation also was excluded. Forty-five adult listeners were recruited from the university community; they were required to pass the same screening tests as the children. All children were given certificates for participating, and all adults received a small remuneration for their participation.

³ Because of attention or memory problems, a further 12 children were excluded from testing after the first 10 min of the first session.

Table 1. Talker age characteristics.

Talker group    N    Age range    M        SD
Women           18   22–58        33;11    10;9
Men             15   20–51        30;7     10;5
Girls           6    13–14        13;2     0;5
Boys            6    12–14        13;2     0;9

Test Design

The 45 talkers were divided into three subgroups of 15 (each containing 5 men, 6 women, 2 girls, and 2 boys), and the 135 listeners were divided into three groups of 45 listeners (each containing 15 younger children, 15 older children, and 15 adults). Each listener subgroup heard talkers from one talker subgroup for a given test condition (triplet or single word).
Each listener group of 45 participants was divided into five balanced subgroups of 9 listeners, who heard a different set of 27 key words spoken by each of the 15 talkers in the talker subgroup. Thus, each token by each talker was heard by the same 9 listeners, and the totality of tokens from each talker was heard by the same 45 listeners. Each key word was heard approximately three times by each listener. In the triplet condition, the carrier phrase heard alternated for each triplet presented. In both conditions, consecutive stimuli alternated by talker gender.

Procedure

Children were tested individually in a quiet room in their school. After completing the screening tests, listeners were familiarized with the task by oral instructions and then by listening to and repeating three practice triplets. The experimenter ensured that the presentation level was comfortable and that the listener had understood the task. The following additional information was provided:

    You’re going to hear lots of different words. You’ll know most of the words, but there will be some you don’t know, or which sound silly. What I want you to do is say exactly what you hear, so if you hear someone say “cat,” “horse,” “pog,” then I want you to repeat it exactly as you heard it.

The mention of possible nonwords proved necessary because children in the pilot studies, although told that they would hear “words,” often produced nonwords for the stimuli heard. Therefore, it was necessary to try to ensure that there was no ambiguity in the meaning of the term “word.” Adult listeners were told that they would hear real and made-up words, with the same instruction as above. Instructions for the single-word presentation condition were given before that part of each session commenced.

Stimuli were presented via headphones (Sennheiser, Model HD433) at a comfortable listening level that was constant for all listeners. The presentation of stimuli was controlled from a laptop computer.
The listener’s chair was rotated to an angle of approximately 90° from the experimenter, a trained phonetician, who was then able to watch the oral articulation of the listener when necessary and record nontarget responses.

Testing took place individually over two 30-min sessions separated by at least 24 hr. Average session separation was approximately 7 days, with the actual range being 24 hr to 10 days. In the first session, each listener heard 72 triplets and 150 single words. In the second session, 63 triplets and 225 single words were presented.

Results

Learning Effects

The mean word intelligibility rate per listener was calculated for each test session; a summary of word intelligibility rates obtained in Sessions 1 and 2 for each listener group is given in Table 3. It can be seen that mean intelligibility rates were around the 90% level for all listener groups and varied by less than 2% across the two testing sessions. First, the data were analyzed for any effect relating to the test procedure used. A repeated-measures analysis of variance (ANOVA) carried out on the arcsine-transformed data showed that the mean intelligibility rate in Session 2 (91.26%, SD = 2.05) was significantly higher than that in Session 1 (90.37%, SD = 2.4), F(1, 132) = 25.04, p < .0001. When data were analyzed separately for each listener group, it was found that errors decreased slightly in the second session for older child (OC) listeners, F(1, 44) = 20.43, p < .0001, and younger child (YC) listeners, F(1, 44) = 11.84, p < .001, but not for adult listeners.⁴ This decrease in error rate across sessions could

Table 2. Listener age characteristics.

Listener group           Females   Males   Age range     M        SD
Adults (AD)              23        22      19;4–55;0     29;9     7;10
Older children (OC)      23        22      11;0–12;11    11;11    0;6
Younger children (YC)    23ᵃ       22      7;7–8;5       7;11     0;3

ᵃ Summary data for this group are not available, because the age data were inadvertently deleted.
However, girls were drawn from the same school years as the boys, and with the same age restrictions.

Table 3. Word intelligibility rates (± 1 SD) obtained at the first and second testing sessions.

Listener group      Session 1   SD      Session 2   SD
Adults (N = 45)     91.55       2.18    91.85       1.80
OC (N = 45)         90.50       2.16    91.79       1.87
YC (N = 45)         89.05       2.35    90.13       2.04
All (N = 135)       90.37       2.44    91.26       2.05

Note. Intelligibility rates were averaged over talkers and over test conditions.

⁴ Mean intelligibility rates were higher than those obtained at the same SNR in the pilot study. Only a subset of test words had been used in the pilot test and this subset might have included more highly confusable words.
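The Learning Effects analysis reports an ANOVA on arcsine-transformed intelligibility rates. The text does not say which variant of the transform was applied, so the sketch below assumes one common form, the arcsine of the square root of the proportion correct, in radians. The function name `arcsine_transform` is hypothetical, and the two input values are the overall session means from Table 3, used only as sample inputs; in an actual analysis the transform would be applied to each listener's rate before the ANOVA.

```python
import math


def arcsine_transform(percent_correct):
    """Map a percentage (0 to 100) to asin(sqrt(p)) in radians.

    This is one common variance-stabilizing transform for proportions;
    it is an assumption here, since the source does not specify the form.
    """
    p = percent_correct / 100.0
    if not 0.0 <= p <= 1.0:
        raise ValueError("percent_correct must lie between 0 and 100")
    return math.asin(math.sqrt(p))


# Overall session means from Table 3, used purely as illustrative inputs.
session_means = {"Session 1": 90.37, "Session 2": 91.26}
transformed = {name: arcsine_transform(rate) for name, rate in session_means.items()}
```

The transform stretches the scale near 0% and 100%, where the variance of a proportion shrinks, which is why it is often applied to scores clustered around 90% correct such as these.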