Chinese Prosodic Transcription (CHIPROT) and the Prediction of Prosodic Structure
① In this article, I use the term Chinese for Standard Chinese of putonghua type, or Mandarin.
160 韵 律 语 法 研 究 第 九 辑
1. Rationale
When it comes to the sound structure of a language, most literate laypeople
probably know that vowels and consonants exist. However, they may have
a rather vague idea about features that stretch over vowels and consonants. Usually,
they are not acquainted with the term “prosody”, let alone the term “suprasegmental
features”. Learning that prosody comprises properties such as stress, pauses, intonation,
rhythm, tempo or loudness, they still tend to think that prosodic features are less
important than vowels and consonants, which make up the words and are reflected in the
script. Unfortunately, the same mostly holds for L2 teachers. Chinese language teachers
are no exception: their main suprasegmental concerns are the four lexical tones, tone
sandhi, the neutral tone ( 轻声 ), and disyllabic tone combinations.① Explanations
of the features of connected speech are limited. The same holds for textbooks.
① Important topics for teaching Chinese pronunciation are proposed in Třísková
(2017a) in English, and Třísková (2017b) in Chinese.
Chinese Prosodic Transcription (CHIPROT) and the Prediction of Prosodic Structure 161
Although audio files are included as an integral part of most current textbooks,
clarification of various prosodic phenomena students may hear there (e.g. undershooting
of tonal targets) is mostly missing. In addition, except in early lessons (the number
depends on the particular textbook), sentences are usually presented in Chinese
characters, while Hanyu Pinyin is placed elsewhere. In classroom teaching, teachers are
usually satisfied if students read the sentences character by character, with full tones
(“scripted speech”). Their main pedagogic goal is proper character recognition, good
initials, finals, and tones. Thus, the resulting utterance sounds like
a sequence of isolated words mechanically aligned side by side like beads on a string,
deprived of utterance-level prosody.
However, utterances in natural L1 speech are more than strings of fully pronounced
words. It is not advisable to neglect prosody in L2 (Chinese) teaching, as prosodic
features have vital functions in oral communication. One result of neglecting prosodic
features may be that students speak like robots. Their speech often has no rhythm,
wrongly placed breaks, no changes of syllable prominence, erroneous intonation
patterns, no reflection of information structure, etc. To help students speak more
naturally from the early stages of learning, we can:
1. Give them basic instruction on the prosodic features of spoken Chinese.
2. Provide the characters hand in hand with Hanyu Pinyin notation of sentences
wherever needed and for as long as needed.
3. Furnish Hanyu Pinyin with certain graphic marks rendering major prosodic
features: stress, and grouping (phrasing). This may be called prosodic transcription.
In this paper, I will introduce my proposal for prosodic transcription, called
CHIPROT (Chinese Prosodic Transcription), which was primarily designed for
language teaching purposes. Its tentative versions were presented in Hong Kong in June
2018 (my talk at the Chinese University of Hong Kong) and in Beijing in May 2019
[my talks at the Institute of Linguistics of the Chinese Academy of Social Sciences,
at Beijing Language and Culture University (hereafter referred to as “BLCU”), and
at Capital Normal University]. The final version of CHIPROT was introduced at two
were applied in two large non-electronic corpora (subsequently digitized). Both corpora
were recorded by a native Beijing speaker and prosodically annotated later on. They
represent a rather unique source of material to date.
The larger corpus is related to a voluminous dictionary of Chinese morphemes,
titled A Learning Dictionary of Modern Chinese (in Czech; Švarný, 1998–2000). Next
to grammatical analysis of individual morphemes, it comprises about 16,000 example
sentences illustrating the usage of particular morphemes in context. The sentences were
recorded over the course of several months in 1969 (Švarný employed a single speaker:
Beijing-born Mrs. Tang Yunling Rusková 唐云凌). The recordings were prosodically
annotated by Švarný over the following six years (the work was finished in 1976).
Figure 1 Sample of Švarný’s transcription: seven example sentences illustrating the use of the
verb yǒu 有 (A Learning Dictionary of Modern Chinese, 1998–2000: 71). Note that this
version of Švarný’s transcription is slightly different from the version shown in Figure 2.
Figure 2 Sample of Švarný’s transcription: example sentences illustrating the use of the verb
yǒu 有 “to have” (Grammar of Spoken Chinese in Examples, Švarný et al., 1991–1993: 46)
Example of CHIPROT:
这辆汽车是我们的第一辆汽车。“This car is our first car.”
plain Hanyu Pinyin: Zhè liàng qìchē shì wǒmende dì yī liàng qìchē.
CHIPROT transcription: Zhè-liàng qìchē // shì-wǒmende dì-YÍ-liàng qìchē.
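The layout of such a transcribed line can be read mechanically. The following minimal Python sketch assumes, judging only from the examples in this paper, that `//` separates prosodic phrases, that spaces separate prosodic words, and that hyphens join the items inside one prosodic word. The function name `parse_chiprot` is hypothetical; it is an illustration, not part of CHIPROT itself.

```python
def parse_chiprot(line):
    """Split a CHIPROT line into phrases -> prosodic words -> items.

    Assumed conventions (inferred from the examples, not an official spec):
    '//' = prosodic phrase boundary, space = prosodic word boundary,
    '-'  = junction between items inside one prosodic word.
    """
    # Drop surrounding whitespace and sentence-final punctuation.
    text = line.strip().rstrip(".?!")
    phrases = []
    for phrase in text.split("//"):
        # str.split() without arguments also discards extra whitespace.
        words = [word.split("-") for word in phrase.split()]
        phrases.append(words)
    return phrases


if __name__ == "__main__":
    example = "Zhè-liàng qìchē // shì-wǒmende dì-YÍ-liàng qìchē."
    for number, phrase in enumerate(parse_chiprot(example), start=1):
        print(number, phrase)
```

Under these assumptions the example sentence yields two prosodic phrases, the first with two prosodic words and the second with three.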
① Of course, solid knowledge of the basics, such as proper pronunciation of disyllabic tone
combinations, or proper pronunciation of T0, is tacitly expected.
on). Transcribed utterances, in combination with audio recordings, can serve as a good
basis for practice and for subsequent work on removing errors with the help of a teacher.
Thus, natural speech production can be learned. More advanced students can use the
same resources to practice speech perception ( 听力 ): they may listen to the audio
recordings and attempt to transcribe the utterances themselves, using CHIPROT. Such
practice may help them become aware of various prosodic features of connected speech
they had not previously noticed.
My major ambition is to offer CHIPROT to teachers and compilers of pedagogic
materials. They can use it to transcribe common colloquial sentences/dialogues
according to the audio recordings. Of course, mastering the transcription procedure
inevitably comprises getting acquainted with the prosodic features of connected Chinese,
and with the essential principles of CHIPROT. Attaining a certain degree of practical
experience with transcription is yet another necessary condition for a satisfactory
outcome.
The CHIPROT system has been tested in teaching practice for several years. Since
2017, I have been using it in my courses on Chinese prosody (Charles University in
Prague) for second-semester students. Introducing the system at this level seems to be
most appropriate: students already know the basics (the initials, the finals, the tones,
basic vocabulary and grammar, simple sentences and phrases). At the same time, they
have not developed fossilized errors. After finishing the course I always distribute a
questionnaire asking students to verbalize their impressions. Students’ feedback on the
course in general and CHIPROT in particular has been very positive①.
Further, CHIPROT has already been systematically applied in structured teaching
material. I used it to transcribe about 80 example sentences and short dialogues in the
textbook Speak Chinese with Ease: Prosody of Colloquial Chinese (Třísková, 2021,
① The feedback from the year 2023: “CHIPROT is intuitive. It makes sense and gives me an
insight into the principles.” (P.M.) “For me CHIPROT is an unbeatable transcription. It is easy,
logical and intelligible.” (P.T.) “I am fully satisfied with CHIPROT. It is better than Pinyin.” (L.J.)
“CHIPROT is great, it definitely eases reading.” (A.M.K.)
in print in Czech; the English translation is in progress; the forthcoming textbook was
introduced at the CASLAR-6 conference in 2021). These sentences and dialogues
with audio recordings, illustrating various prosodic phenomena, were selected from
the above-mentioned textbook Grammar of Spoken Chinese in Examples (Švarný et
al., 1991–1993). Švarný’s prosodically annotated corpus is a rich resource offering
examples of prosodic phenomena in many different contexts. However, it is structured
according to grammatical topics. Thus, the examples of particular prosodic phenomena
had to be laboriously retrieved from the corpus.
In what follows I will describe the CHIPROT annotation conventions used to mark
syllable prominence and prosodic units.
3. Syllable Prominence
CHIPROT assumes four degrees of syllable prominence. The theoretical basis
of my prominence concept can be found in Třísková (2020). The crucial notions are
a normal syllable, and a weakened syllable. That is, I view the weakening of normal
syllables as a major issue in examining Chinese stress. Note that except for the highest
degree (emphasis/contrastive stress), I do without the term “stress” in my prominence
scale. However, for convenience, I use the common term “stressed syllable” when
speaking of a phonetically salient (relatively prominent) syllable. Similarly, I use the
term “unstressed syllable” for non-prominent, phonetically weak syllables. After all,
stress is a relative matter. As Feng Shengli points out in his ICPG-7 paper:
Stress is not a sound, it is a relationship ( 重音不是“音”,重音是关系 )…
Stress (“重”) only exists in relation to non-stress (“轻”)… If we are looking for
stress, we should not look for a stressed item, but for a relative prominence ( 如果
找重音,不是看哪儿重,而是看哪儿有“相对凸显”的关系 ). As for whether
this relationship is reached by enhancing [a particular syllable], or by other means,
it is secondary ( 至于该关系是用“加重”或其他手段来表现,则是第二位的 )
(Feng, 2021: 7).
Feng is speaking of “other means” of expressing relative prominence. We may infer
that syllable weakening may be the most important of these other means. For instance,
an iambic pattern may be attained either by keeping the first syllable fully pronounced
and enhancing the second syllable, or by weakening the first syllable and keeping the
second syllable fully pronounced (without enhancing its prominence). We may add that
the second solution is “cheaper” in terms of articulatory effort and thus may be preferred
by speakers. Feng’s approach to stress is perfectly in line with my own.
What is a meaningful number of stress degrees? Švarný assumed six categories,
which might be too many. His categories were in a way abstract constructs arising from
the combination of two features: 1. Degree of tone fullness; 2. Presence/absence of
stress; see Třísková (2011).
C-ToBI and Mandarin ToBI suggest four degrees of stress. Mandarin ToBI accepts
the following “stress levels” in the stress tier (Peng et al., 2005: 255):
S3: syllable with fully realized lexical tone
S2: syllable with substantial tone reduction, e.g. undershooting of tonal target with
duration reduction
S1: syllable that has lost its lexical tonal specification, e.g. in a weakly-stressed
position
S0: syllable with lexical neutral tone, i.e., inherently unstressed syllable
Like C-ToBI and Mandarin ToBI, CHIPROT also proposes four degrees of
prominence. However, they are conceived differently (see section 3.1). The additional
“top-prominence” syllable is more prominent than S3. On the other hand, S2 and S1 are
collapsed into the “weakened syllable” degree. The reasons for collapsing weak-tone
syllables and neutralized syllables are explained in section 3.5.
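The relation between the two scales can be written down as a small lookup table. This is only a sketch of my reading of the comparison above: the text states explicitly that S2 and S1 collapse into the weakened degree and that the top-prominence degree lies above S3; pairing S3 with the normal syllable and S0 with the toneless syllable is an assumption. The names `TOBI_TO_CHIPROT` and `to_chiprot_degree` are hypothetical.

```python
# CHIPROT degrees, from most to least prominent.
CHIPROT_DEGREES = ["top-prominence", "normal", "weakened", "toneless"]

# Mandarin ToBI stress levels (Peng et al., 2005) mapped to CHIPROT degrees.
TOBI_TO_CHIPROT = {
    "S3": "normal",    # fully realized lexical tone (assumed pairing)
    "S2": "weakened",  # substantial tone reduction (stated in the text)
    "S1": "weakened",  # lost tonal specification (stated in the text)
    "S0": "toneless",  # lexical neutral tone (assumed pairing)
}


def to_chiprot_degree(tobi_level):
    """Return the CHIPROT degree corresponding to a ToBI stress level."""
    return TOBI_TO_CHIPROT[tobi_level]


if __name__ == "__main__":
    for level in ("S3", "S2", "S1", "S0"):
        print(level, "->", to_chiprot_degree(level))
    # "top-prominence" has no ToBI counterpart in this table.
    unmatched = set(CHIPROT_DEGREES) - set(TOBI_TO_CHIPROT.values())
    print("no ToBI counterpart:", sorted(unmatched))
```

The table makes the asymmetry visible: four ToBI levels map onto three CHIPROT degrees, and the fourth CHIPROT degree is added on top.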
first sight and distinguish them from the surrounding text. However, Chinese textbooks
and pedagogic materials do not follow this custom. They are accustomed to using some
sort of sans-serif typeface.
The four degrees of syllable prominence are marked as follows:①
MĀ (capitals): top-prominence syllable (the most prominent syllable of a prosodic phrase/utterance; see section 3.6)
tentative Chinese term: 强音节
mā (bold type, with tone mark): normal syllable (ordinary syllable with full tone; see section 3.3)
tentative Chinese term: 常音节
mā (non-bold type, with tone mark): weakened syllable (the tone is either weakened, or even completely neutralized; see section 3.5)
tentative Chinese term: 弱音节
ma (non-bold type, no tone mark): toneless syllable (morpheme without a lexical tone; see section 3.4)
tentative Chinese term: 无调音节 / 无调语素
• A minority of Chinese morphemes are toneless, i.e., they do not have a lexical
tone (de 的 , le 了 , ba 吧 , etc.). I call them 无调语素 . They are “unstressed” by default,
as they have no lexical tone which would give them the potential to become “stressed”.
• An absolute majority of Chinese morphemes are tonal, i.e., they have a lexical
tone. I call them 有调语素 . Lexical tone gives them the potential to become prominent,
“stressed”. This potential may or may not be exploited in connected speech. Tonal
morphemes may be realized as normal syllables, as weakened syllables, or as
enhanced syllables.
• Chinese tonal morphemes generally strive to be realized with full, perceptible
tones in connected speech, because tones distinguish lexical meanings of morphemes.
• Quite a few tonal morphemes/syllables may become weakened in connected
speech ( 弱音节 ). Their tone is either weak yet still perceivable ( 弱调音节 ) or completely
① Most of the Chinese terms appearing below emerged from extensive discussions with
Professor Cao Wen at BLCU in October 2011.
I call syllables realized with ordinary full tone normal syllables 常音节 . Cf. Chao
Yuan Ren’s “normal stress” 正常重音 (Chao, 1968: 35). See also Třísková (2020: 82-83).
Normal syllables are not overly prominent (“stressed”). They just carry full distinguishable
tone. In CHIPROT, normal syllables ( 常音节 ) are printed in bold, carrying a tone mark
(mā). They may be viewed as a default form of tonal morphemes. As a starting point,
we may assume that all tonal morphemes would be realized as fully pronounced normal
syllables in connected speech (pedagogic practice often stops at this point). In turn, the
first step of the transcription procedure is the representation of all tonal syllables in bold
type, and of course with a tone mark. For instance, in this sentence the only syllable
which must not be in bold is the toneless lexical suffix zi 子 :
Zhuōzi shàng yǒu sān běn shū. 桌子上有三本书。
“There are three books on the table.”
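This first step can be sketched in a few lines of Python. The sketch assumes that the sentence has already been split into syllables and that a syllable is tonal exactly when its Hanyu Pinyin spelling carries one of the four tone diacritics. The names `has_tone_mark` and `first_pass` are hypothetical, and the labels "bold" and "non-bold" merely stand in for the typographic distinction.

```python
import unicodedata

# Combining macron, acute, caron, and grave: the four Pinyin tone marks.
# The diaeresis of "ü" (U+0308) is deliberately not included.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def has_tone_mark(syllable):
    """True if the Pinyin syllable carries a tone diacritic."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return any(ch in TONE_MARKS for ch in decomposed)


def first_pass(syllables):
    """Step one: every tonal syllable starts out as a normal (bold) syllable."""
    return [(s, "bold" if has_tone_mark(s) else "non-bold") for s in syllables]


if __name__ == "__main__":
    sentence = ["Zhuō", "zi", "shàng", "yǒu", "sān", "běn", "shū"]
    for syllable, style in first_pass(sentence):
        print(syllable, style)
```

For the sentence above, only the suffix zi comes out as non-bold. Later steps of the procedure (weakening, top prominence, grouping) are not covered by this sketch.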
Sometimes it is hard to find any difference in prominence in neighbouring tonal
syllables, as may be the case for the word huāpíngr 花瓶儿 “vase” in the following
utterance:
Bǎ-huāpíngr // fàng-zài ZHUŌzi-shàng. 把花瓶儿放在桌子上。
“Put the vase on the table.”
Yet this does not pose a problem, because the adjacency of normal syllables is
regarded as acceptable/common/natural. It is not viewed as some sort of stress clash.
Thus, CHIPROT liberates the transcriber from the enforced pursuit of stressed syllables
in cases where phonetic material does not offer clear support for such an evaluation.
Chinese has a number of toneless morphemes, i.e., morphemes without a lexical tone
(无调语素). This group is fully predictable from the lexicon – the syllable carries no tone mark
in dictionaries. In CHIPROT, toneless syllables are printed in non-bold type, carrying
no tone mark (ma). Toneless morphemes are “unstressed” by default – their weak
realization is basically predictable. Sometimes they may be prolonged in final position,
i.e., at the end of a prosodic phrase or utterance (the well-known phenomenon of phrase-
final lengthening is more or less universal in languages). Their loudness may also be
non-negligible. They may even display pitch movement. Thus, such syllables may
sometimes sound rather conspicuous. However, the roots of this sort of conspicuousness
do not lie in prominence (“stress”) structure. Rather, such syllables serve as carriers of
emotional or pragmatic meanings.
We shall distinguish two major groups of toneless items: monosyllabic toneless
function words, and second syllables in some types of disyllabic words.
Examples are:
• reduplicated monosyllabic nouns: tiāntiān 天天 “every day”
• reduplicated monosyllabic verbs: kànkàn 看看 “take a look” (see section 3.5.4)
• verbs with direction complements: zuòxià 坐下 “sit down” (see section 3.5.5)
• verbs with some resultative complements: kànjiàn 看见 “spot” (see section 3.5.6)
• verbs with complements expressing a short action: zuòyíhuìr 坐一会儿 “sit for a
while”
① I admit that the presence of a tone mark on fully neutralized syllables may sometimes be
confusing. Yet I decided that the above arguments for the use of a tone mark on such syllables are
sufficiently compelling.
Wǒ bù rènshi tā. 我不认识他。“I do not know him.” There may be contexts where tā
他 restores its full tone or even becomes emphasized: Bù shì wǒ, shì tā! 不是我,是他!
“It is not me, it is him!”
• At the other extreme of the weakening continuum there are content words. They
weaken their meaning/tone(s) only occasionally. This happens, for instance, when
the word is repeated and has no substantial semantic importance in the given context.
Yet such a word tends to keep remnants of tone(s), resisting complete neutralization
(though that may certainly happen in fast, sloppy speech). Content words are least prone
to prosodic weakening, representing the least conspicuous and least frequent/stable/
predictable cases.
In some Chinese monosyllabic words/morphemes with lexical tone, the inclination
to become weakened in connected speech is higher than in other words/morphemes.
They may be weakened rather frequently (in some cases even obligatorily), yet most of
them may occasionally gain prominence and even become strikingly prominent. I will
tentatively call these items commonly weakened morphemes (CWMs).①
Speaking of “commonly weakened morphemes”, the meaning of the word
“commonly” needs to be explained. Importantly, the sources/motivations for this
“common” weakening are not accidental or arbitrary. Weakening is mostly rule-
governed (Třísková, 2020). Thus, many cases of weakening can be predicted – from
grammar, phonology, lexicon, information structure, or pragmatics.
The members of the CWM group share one important feature: a general inclination
to become weakened. This entitles us to establish them as a specific group worth
investigating. Nevertheless, the CWM group is rather heterogeneous. The members
of the group display different grammatical properties, including different degrees of
freeness-boundness. Clearly, the same phonetic surface form (weak/neutralized tone)
may have different sources, rooted in different linguistic levels. We may also observe a
different degree of inclination to become weakened (this may even hold for members of
be questioned. It may be emphasized only rarely. On the other hand, modal verbs,
adverbs such as dōu 都, etc., may retain some degree of prominence more often. Their
affiliation with the cliticoid group may perhaps raise some questions. For the present
classification, the major criterion is consistent fading of the semantic content of the item
(and subsequent phonetic weakening) in many/most contexts.
3.5.2 Two Neighboring Monosyllabic Tonal Function Words
Sometimes two cliticoids (monosyllabic tonal function words) occur together. This
often happens at the beginning of a sentence. The examples are:
bǎ-tā 把他 “him”
gěi-tā 给他 “to him”
tā-zài 他在 “he at”
nǐ-jiù 你就 “you then”
jiù-shì 就是 “then is”
tā-hěn 他很 “he very”
Usually, both items form a disyllabic prosodic word. The first FW (function word)
receives weak prominence, while the second FW is completely atonic. The result is an
inconspicuous trochee, where the first item is just slightly prominent. I neglect this in
transcription in order not to overburden the CHIPROT graphics, writing both items as non-
bold (and of course with a tone mark). For instance, bǎ-tā 把他 in the following utterance:
Bǎ-tā jiào-dào WǑ-zhèr-lái. 把他叫到我这儿来。
“Call him to me.”
Only if the first item sounds clearly prominent do I put it in bold (here Bǎ):
Bǎ-tā jiào-dào WǑ-zhèr-lái.
Note that in some prosodic words of this type the first FW has no grammatical
relationship with the second FW, e.g. tā-hěn 他很 “he very”. How can such a prosodic
word be formed? The requirements of rhythm may sometimes override the grammar,
causing a word to break away from its grammatical mate and “desert” to the preceding
monosyllabic word, saving it from standing alone. Monosyllabic prosodic words are
generally undesirable. Further, prosodic words of similar length are more welcome than
extremes, i.e., very short or very long prosodic words. Thus, 1+3 is conveniently turned
There are many other words which may actually belong to this group, although
XHC does not recognize them as such. That is, they are printed without a dot between
both syllables, e.g. cuòwù 错误 “mistake”, sùdù 速度 “speed”, yuànwàng 愿望 “wish”
(Wang, 2016: 32, Třísková, 2020: 93).
3.5.4 Second Syllable in Many 3–4 Syllabic Words
Accentuation of 3–4 syllabic words (which represent a rather small proportion
of the Chinese lexicon) is relatively stable. In most of them, the second syllable is
pronounced in the neutral tone. The last syllable tends to be the most prominent.
• huǒchēzhàn 火车站 “train station”
• shuǐmòhuà 水墨画 “ink painting”
• búxiùgāng 不锈钢 “stainless steel”
• qiǎokèlì 巧克力 “chocolate”
• Xīshuāngbǎnnà 西双版纳 “the region Xishuangbanna”
• zībénzhǔyì 资本主义 “capitalism”
3.5.5 Second Syllable in Reduplicated Monosyllabic Verbs
Both monosyllabic and disyllabic Chinese verbs may be reduplicated to express a
short, finished action. If a monosyllabic verb is reduplicated, the second syllable should
be pronounced in the neutral tone:
• kànkàn 看看 “take a look”
• shuōshuō 说说 “talk about”
• tīngtīng 听听 “listen”
• chángcháng 尝尝 “taste”
The numeral yī 一 “one” may be inserted between the two components: kànyīkàn
看一看. The pronunciation of yī is also atonic.
3.5.6 Directional Complements
Directional complements attached to a Chinese verb describe the direction of an
action (up, down, away from the speaker, towards the speaker, etc.). The complement
may either be monosyllabic (e.g. lái 来 in huílái 回来 ) or disyllabic (e.g. chūlái 出来 in
kànchūlái 看出来 ). Directional complements should be pronounced in the neutral tone.
This holds both for monosyllabic and disyllabic directional complements:
reached any considerable consensus so far. I assume two underlying patterns (Třísková,
2020: 92):
1. the spondee pattern (“equal-stress pattern”, 重重, 等重, 轻重不分), with
the iamb pattern ( 中重 , 右重 ) as a variant. Note that I do not recognize any need to
establish the iamb as an independent pattern. I view the difference between the spondee
and the iamb as a phonetic detail. The iamb pattern is mostly induced by the prepausal
position (i.e., a post-lexical factor).
2. the trochee pattern ( 重轻 , 左重 ), regardless of the degree of second syllable
weakening (it may be either atonic or weakened, yet it is atonic in most cases).
This solution is quite similar to the analysis of disyllabic stress patterns in Beijing
Mandarin presented in Wang & Feng (2006). They describe two patterns: 左重 (trochee)
and 右重 (iamb). While the 左重 (trochee) pattern always has a weaker second syllable,
in the 右重 (iamb) pattern the weaker first syllable is not a rule (that is, both syllables
may sometimes have equal prominence). Regarding lexical stress, the authors recognize
only one underlying pattern: trochee, or 左重, which includes 轻声词 and 带调左重词.
All other disyllabic words are argued not to have lexical stress at all
(不是左重的双音节词没有词重音). They may either be realized as 右重 (iamb) or have
both syllables of equal prominence (其左右音节可以看作轻重不分或差不多). The
authors finally conclude: “[Only] the trochee pattern can be viewed as lexical stress; all
other stress patterns are induced by factors coming from elsewhere than the lexicon.”
( 左重为词汇重音,非左重形式由词汇以外因素决定 ) Most recently, Feng Shengli
(Feng, 2021: 7) has proposed “a new definition of lexical stress in colloquial speech
style”, taking into account speech style and word frequency, which influence the actual
surface stress pattern of a word. He claims that it is impossible to find a solution for
Chinese lexical stress without taking these factors into account. Feng challenges current
theories of lexical stress, seeing problems in the very understanding of what “stress” as
such is. Feng points out again that stress is a relationship ( 重音是关系而不是音体 ).
He wonders whether the Chinese colloquial speech style has word stress at all.
Regarding the distribution of the realization/surface patterns of disyllabic tonal
words, I generally distinguish two major situations:
– Some words favor the trochee pattern. These were treated in section 3.5.3.
– The majority of Chinese disyllabic tonal words may have more or less variable
accentuation. Disyllabic words with two tonal syllables often keep full tones on both
syllables in connected speech. Sometimes it is hard to tell which syllable is more
prominent. To put it another way, the word assumes a spondee pattern ( 等重 , 重重 ).
Occasionally, the second syllable may be more prominent, the word thus assuming the
iambic pattern ( 中重 , 右重 ). This may happen especially before a pause, as a result of
final lengthening. The spondee (/iamb) pattern may be viewed as a default pattern
of disyllabic tonal words (except for those treated in section 3.5.3, such as 做法 zuò·fǎ
“method”).
Yet in some situations this pattern may be modified. Under the pressure of rhythm
and syntactic/information structure, the second syllable may be pronounced with a weak
or even neutral tone, the whole word assuming the trochee pattern ( 重轻 ).
yīgòng 一共 “altogether”
Yígòng DUŌshao-qián? 一共多少钱?
“How much is the total?”
① Note that if the verb zhīdào 知道 is preceded by the negative bù 不 , the pattern is changed:
zhīdào, but bú-zhīdào.
The context, as said above, is an important factor determining the actual word
accentuation. If the word is in prepausal position and/or focused, it tends to keep the
original full prominence of the second syllable. On the other hand, we can observe that
when the word in question is not followed by a break – that is, if “something follows”
– weakening of the second syllable often occurs, though definitely not always (Wang &
Chu, 2008: 143). Let us give a few more examples of non-final disyllabic tonal words
realized as trochees:
hángkōng 航空 “aviation”
Shì-HÁNGkōng-xìn-ma? 是航空信吗?
“Is it an airmail letter?”
jīdàn 鸡蛋 “eggs”
Jīdàn HÉN-hǎochī. 鸡蛋很好吃。
“The eggs are (very) tasty.”
fàngxīn 放心 “relieved”
Wǒ-fàngxīn-DUŌ-le. 我放心多了。
“I was greatly relieved.”
This phenomenon is, among other things, undoubtedly related to syntax, as prosodic
structure and syntactic structure are to a large extent interrelated. As Feng (2019a,
2019b) points out, the interaction between syntax and prosody is bidirectional: prosody
not only constrains syntactic structures, but also activates syntactic operations. These
phenomena are also treated in the textbook (Feng & Wang, 2018). Concerning the
relationship between grammar and prosody, see also Lin (1962) and Švarný & Uher
(2014).
It must be pointed out that Chinese disyllabic tonal words are not equally ready to
surrender to such pressures and shift to a trochee pattern. The phenomenon of variable
stress patterns in disyllabic words was analysed by Oldřich Švarný back in the 1970s
(Švarný, 1974). Švarný’s description of such variability was based on a large corpus of
utterances (Švarný, 1998–2000; see section 2.1). He collected numerous tokens of the
same word type, examining their prominence patterns in various contexts.① By means
of descriptive statistics he established seven “accentuation types” of disyllabic words,
based on their degree of willingness to weaken the second syllable, i.e., their inclination
for the trochee pattern.② For instance, the word dòufu 豆腐 “bean curd” belongs to the
extreme type (1), with a 100% inclination for the trochee. The word zuòfǎ 做法 “method”
belongs to the next type (2), with a strong inclination for the trochee. The following four types (3,
4, 5, 6) display different degrees of variability. The last type (7) includes words such as
lǎoshī 老师 “teacher”, whose willingness to be realized as a trochee is very low. Note
that Švarný was concerned with “non-stress” in disyllabic words, instead of “stress”; cf.
the notion of “ 以轻显重 ”, mentioned in Feng (2021: 14).
Švarný did not explore the conditions and contexts of second syllable weakening in
detail. He only observed the tendency for iambs to occur at the end of a rhythmic unit,
and for trochees to occur at the beginning or inside a rhythmic unit (Švarný, 1998–2000:
xxxviii). This topic is examined, for example, in Wang & Chu (2008). Elucidating
the conditions for weakening of the second syllable in disyllabic words in connected
speech is a task which remains for future research. In any case, Švarný’s early analysis
of accentuation patterns in disyllabic words was fairly ahead of its time. The current
analysis of Feng seems to have much in common with Švarný: “The typical form of
lexical stress in colloquial speech style is atonic or weak pronunciation [of the second
syllable]; these two have nothing to do with ‘stress’ or ‘enhancement’” ( 口语词重音的
shall treat the following situations: default nucleus, emphasis, particle ma 吗 questions,
A-not-A questions, question-word questions, and alternative questions.
3.6.1 Default Nucleus
In so-called “neutral” speech① without any special emphasis (broad focus), the greatest
prominence seems to rest on the last full word of the utterance:
Huǒchē-shàng rén-hěn DUŌ. 火车上人很多。
“There are a lot of people on the train.”
① In fact, no such thing as “neutral” speech exists in real life. I use this common term only for
the sake of convenience.
personal pronoun tā 他 :
– SHÉI bù-xǐhuan chī-ròu? 谁不喜欢吃肉? “Who does not like to eat meat?”
– TĀ bù-xǐhuan chī-ròu. 他不喜欢吃肉。“He does not like to eat meat.”
Note that in emotionally charged, expressive speech there may be more than one emphasized
item in one prosodic phrase. Yet this situation is not so common.
3.6.3 Particle ma 吗 Questions (Polarity Questions)
Polarity questions are those which offer a choice between two possibilities,
expecting either a positive or a negative answer. Because the answer is typically (though
not always) either YES or NO, they are often called yes/no questions. In Chinese,
polarity questions are those comprising the particle ma 吗 . Grammatically unmarked
questions also belong here. In such questions, the most salient item is quite naturally the
item the speaker is asking about. This item will probably carry the nucleus. For instance:
Nǐ-shēnti HǍO-ma? 你身体好吗?
“How are you?” (Are you in good health?)
Shì-HÁNGkōng-xìn-ma? 是航空信吗?
“Is it an airmail letter?”
3.6.4 A-not-A Questions (Affirmative-negative Questions)
A-not-A questions use the affirmative and negative forms of the predicate.
If the verb/adjective is monosyllabic, the first item is the most prominent, while
the pronunciation of the negative bù 不 occurring between both items is atonic. The
three items form a prosodic word:
Tāng RÈ-bú-rè? 汤热不热?
“Is the soup hot or not?”
If the verb/adjective is disyllabic, the rhythmic pattern changes: the negative bù
不 assumes a certain prominence, standing as the first item of a new prosodic word.
The repeated verb is rather weak:
Ní-XǏhuan bù-xǐhuan? 你喜欢不喜欢?
“Do you like it or not?”
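The two A-not-A patterns can be summarized in a short Python sketch. It assumes that the predicate is supplied as a list of Pinyin syllables and that the caller passes the surface form of the negative (bù or bú), since tone sandhi is not modeled here. Placing the capitals on the first syllable of a disyllabic predicate follows the single example above and is an assumption, not a general rule. The function name `a_not_a` is hypothetical.

```python
def a_not_a(syllables, neg="bù"):
    """Build the predicate part of an A-not-A question in CHIPROT style.

    syllables: Pinyin syllables of the verb/adjective, e.g. ["rè"].
    neg: surface form of the negative (pass "bú" before a fourth tone).
    """
    if len(syllables) == 1:
        # Monosyllabic: one prosodic word, the first item most prominent.
        item = syllables[0]
        return f"{item.upper()}-{neg}-{item}"
    if len(syllables) == 2:
        # Disyllabic: the negative opens a new prosodic word.
        first, second = syllables
        return f"{first.upper()}{second} {neg}-{first}{second}"
    raise ValueError("only mono- and disyllabic predicates are covered")


if __name__ == "__main__":
    print(a_not_a(["rè"], neg="bú"))
    print(a_not_a(["xǐ", "huan"]))
```

The sketch reproduces only the capitals and the grouping; the bold versus non-bold distinction of the printed transcription is not represented.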
A prosodic word (PW) is usually, though not always, composed of several lexical
items. Most often they are grammatically related to each other. However, in rapid speech
a function word may “desert” to the preceding prosodic word (see section 3.5.2). Below
I will review the major structural types of prosodic words.
4.1.1 Single Word
Prosodic words composed of a single word are rather rare. Especially if the word is
monosyllabic (and/or a function word), it seldom stands as a prosodic word. Disyllabic
content words are better able to stand as prosodic words:
Zhè-shì YǏzi. 这是椅子。
“This is a chair.”
4.1.2 Content Word with Attached Function Word(s)
Most commonly, a prosodic word is formed by a content word with function
word(s) attached to it (before, after, or both). In the utterance below we can find two
FWs attached to some content words: the preposition bǎ 把 (a proclitic), and the
post-verbally placed preposition zài 在 (an enclitic):
Bǎ-xíngli cún-zài huǒchēZHÀN. 把行李存在火车站。
“Store your luggage at the train station.”
A function word contained in a prosodic word does not always have a grammatical
relationship with its neighbor (see section 3.5.2). For instance, the adverb jiù 就 in the
example below belongs grammatically to the verb that follows it, yet prosodically it
attaches to the preceding word:
Wó-ZǍO-jiù kànguo Hóng-Lóu-Mèng. 我早就看过《红楼梦》。
“I read (the novel) Dream of the Red Chamber a long time ago.”
Note that “unstressed” function words cannot stand alone. In connected speech
they have to join some other word to form a prosodic word together. This phenomenon
has important pedagogical consequences.
4.1.3 Two Content Words
Many prosodic words comprise two content words, such as xué 学 “learn” and
zhōngwén 中文 “Chinese” in the following example:
NĚIxiē xuésheng xué-zhōngwén? 哪些学生学中文?
“Which students learn Chinese?”
Sometimes function word(s) may be added, e.g. the personal pronouns tā 他 “he”,
wǒ 我 “I” in the following example (the second prosodic phrase):
Wǒ-ZHĪdào // tā-bù-xǐhuan-wǒ. 我知道他不喜欢我。
“I know that he does not like me.”
Sometimes function word(s) may be inserted between two content words. This is
the case of the unstressed word jǐ 几 meaning “a couple of” in the following example:
Zhè-běnr cídián-lǐ // SHÁO-jǐ-yè. 这本儿词典里少几页。
“There are a few pages missing in this dictionary.”
4.1.4 Two Function Words
Some prosodic words are composed of only two function words (usually standing
at the beginning of an utterance or prosodic phrase). These cases have already been
treated in section 3.5.2.
Prosodic words join to form larger units: prosodic phrases (PPh). A prosodic phrase
may sometimes contain just one prosodic word. More often there are two or three (rarely
more) prosodic words in one prosodic phrase. In this section, we shall be concerned
with non-final prosodic phrases (such a phrase is not the last one in the utterance).
A PPh boundary may occur after a non-final clause (4.2.1), after a prepositional phrase
(4.2.2), after a longer noun phrase standing utterance-initially (4.2.3), after particular
items in enumerations, and (less frequently) after a predicate followed by a longer
noun phrase. A hearer can detect the boundary using several signals, usually occurring
in combination: a non-falling intonation pattern, slight final lengthening, and less often
a silent pause. Note that there may or may not be a comma in the orthography (e.g. a
longer noun phrase standing as a subject is not followed by a comma).
4.2.1 Non-final Clause
Zhèr-yǒu YǏzi //, nàr-yǒu ZHUŌzi. 这儿有椅子,那儿有桌子。
“Here is a chair, and there is a table.”
4.2.2 Prepositional Phrase
Bǎ-huāpíngr // fàng-zài ZHUŌzi-shàng. 把花瓶儿放在桌子上。
“Put the vase on the table.”
4.2.3 Preverbal Noun Phrase
A subject, a time/place expression, or an utterance-initial object may be
followed by a noticeable prosodic boundary if it is relatively long.
Nèi-jí-běnr SHŪ // wǒ-dōu kànWÁN-le. 那几本书我都看完了。
“I have already read all of those books.”
5. CHIPROT Cookbook
In the previous sections, I have tried to demonstrate that many features of prosodic
structure can be predicted. In this section, I will attempt to describe the CHIPROT
transcription procedure involving certain predictions. I have chosen four sentences, (A),
(B), (C), and (D), as examples to clarify the procedure. It has five steps (or six if we
include step /0/). Steps /2/, /3/, and /4/ are predictions.
Step /0/
The sentence is jotted down or already available in plain Hanyu Pinyin (in italics).
Tonal syllables carry tone marks, toneless syllables carry no tone mark: mā, ma.
(A) Zhuōzi shàng yǒu sān běn shū. 桌子上有三本书。
“There are three books on the table.”
Step /1/
All tonal syllables will be put in bold type (they of course carry a tone mark): mā.
This can be easily done by putting the whole sentence in bold type and then unbolding
toneless syllables (see section 3.4). Usually there are very few or even no toneless
syllables in a sentence.
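Steps /0/ and /1/ rely only on the presence or absence of a tone mark, so they lend themselves to a simple sketch. The code below is an illustration, not part of CHIPROT itself: the names `tone_of` and `mark_tonal` are hypothetical, the input is assumed to be already split into syllables, and asterisks stand in for bold type because plain text cannot show it.

```python
import unicodedata

# Combining diacritics used as Hanyu Pinyin tone marks:
# macron (tone 1), acute (tone 2), caron (tone 3), grave (tone 4).
TONE_MARKS = {"\u0304": 1, "\u0301": 2, "\u030c": 3, "\u0300": 4}

def tone_of(syllable):
    """Return 1-4 for a tonal syllable and 0 for a toneless one."""
    # Decompose letters so that the tone mark becomes a separate character.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 0

def mark_tonal(syllables):
    """Step /1/ stand-in: wrap tonal syllables in asterisks instead of bold type."""
    return [f"*{s}*" if tone_of(s) else s for s in syllables]

# Sentence (A), split into syllables by hand for this sketch.
sentence_a = ["zhuō", "zi", "shàng", "yǒu", "sān", "běn", "shū"]
print(" ".join(mark_tonal(sentence_a)))
```

For sentence (A), only the syllable zi is left unmarked, which matches the observation that toneless syllables are few.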
Step /2/
Tonal words/morphemes that are predicted to be weakened will be unbolded:
mā. Many of them will be cliticoids (see section 3.5.1). Note that normal syllables may
neighbor each other (see section 3.3).
belong to the cliticoids. The word zhèr 这儿 is a place word, thus we keep it as a normal
syllable at this point. The verb lái 来 functions as a directional complement here, being
just a formal indicator of the direction towards the speaker. Thus, it will be predicted as
weak.
Step /3/
Mark phrasing (prosodic words/phrases): -, //
The words which would presumably be tightly bound in speech, forming a
prosodic word, will be connected by a dash, e.g. sān-běn-shū 三本书 . Remember that
toneless and weakened items cannot stand alone. The most frequent weak, unstressed
items are the clitics (3.4.1) and cliticoids (3.5.1).
Short utterances usually stand as a single prosodic phrase. Its boundary is already
marked by a sentence-final punctuation mark. Longer utterances may be composed of
two or, less commonly, three or more prosodic phrases. The boundary of the non-final
prosodic phrase will be marked by a double slash (//). With respect to decisions about
prosodic boundaries of non-final prosodic phrases, the relevant factors to consider were
outlined in section 4.2.
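Because phrasing is marked with only two symbols, the dash and the double slash, a phrased transcript can be read back into a nested structure quite easily. The following is a minimal sketch under the assumption that prosodic words are separated by spaces, as in the examples of this article; the function name `parse_phrasing` is hypothetical.

```python
import re

def parse_phrasing(utterance):
    """Split a transcript into prosodic phrases, prosodic words, and items."""
    # Punctuation is not part of the transcription items, so replace it by spaces.
    cleaned = re.sub(r"[.?!,]", " ", utterance)
    phrases = []
    for phrase in cleaned.split("//"):          # "//" closes a non-final prosodic phrase
        words = [pw.split("-") for pw in phrase.split()]  # "-" joins items in one word
        if words:
            phrases.append(words)
    return phrases

for phrase in parse_phrasing("Wǒ-ZHĪdào // tā-bù-xǐhuan-wǒ."):
    print(phrase)
```

The result is a list of phrases, each a list of prosodic words, each a list of items. Such a structure would make it possible to count prosodic words per phrase or to look for items left standing alone.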
proclitic. The toneless question particle ma 吗 has no other choice but to be attached
to the preceding word. The resulting prosodic word, forming a prosodic phrase and a
finished utterance at the same time, is rather long (five syllables). This long prosodic
word could fall apart into two prosodic words if the speaker hesitates and inserts a break
after the verb shì 是 . This may manifest itself in perceptible lengthening and (in the case
of a strong hesitation) in a silent pause. If shì is emphasized, it could also possibly stand
This utterance is composed of two clauses, and thus most probably of two prosodic
lengthening of the syllable me 么, and possibly by a silent pause. The personal pronouns
form a disyllabic prosodic word. The preposition dào 到 is placed after the verb in this
sentence. Its pronunciation is typically atonic in such a position, tightly joining the
preceding verb as an enclitic. The expression wǒ zhèr 我这儿 is a set phrase “here
where I am”, thus both items must be tightly joined. The last item lái 来 is formal and
Step /4/
We look for the items which are presumably the most prominent in the utterance:
the words carrying emphasis, contrastive stress, default nucleus, etc. (3.6). Pertinent
some particular word (e.g. zhuōzi 桌子 , shàng 上 , sān 三 ), this utterance will have a
Step /5/
Listen to the audio and make corrections. I have tried to show that many
prosodic features can be predicted without hearing the audio recordings because they are
rule-governed to a large extent. However, our predictions may well be imperfect.
Speech tempo, speech style, the individual habits of the speaker, a specific information
structure or pragmatic context, etc. may influence the surface prosodic form and make
some of our predictions wrong. Careful listening to the audio is thus the last step, which
gives the transcript its final touch. Even after careful listening, the prominence of
particular syllables or the phrasing may remain questionable at some points. In such
cases, speech analysis software (such as Praat) would be needed to support our final
assessments, but such unclear cases should not be frequent.
6. Minimodules
I have shown how the CHIPROT transcription can be used to transcribe whole
utterances. It may also be used to indicate the prominence structure of commonly used
short phrases such as:
shù-shàng      树上      trochee         ●•
ní-hǎo         你好      iamb            •●
zhè-běn-shū    这本书    cretic          ●•●
gěi-bàba       给爸爸    amphibrach      •●•
wūzi-lǐ        屋子里    dactyl          ●••
zài-Běijīng    在北京    bacchius        •●●
xuéxiào-lǐ     学校里    antibacchius    ●●•
I call these brief, two- or three-syllable sequences minimodules, or phonetic
chunks. They draw on the notion of formulaic language (Třísková, 2017c) and can be
efficiently used in pedagogic practice. The labels for the prominence patterns are borrowed
from the verse meters of Ancient Greek poetry. Note that minimodules do not need to employ
the highest degree of prominence (MĀ), since most of them are not finished utterances.
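The seven patterns listed above can be stored as a simple lookup table, for instance when preparing exercises sorted by rhythmic type. This is only a sketch: the names `FOOT_NAMES` and `foot_name` are hypothetical, and the table contains nothing beyond the patterns given in this section.

```python
# Prominence patterns from the list above: ● = prominent syllable, • = weak syllable.
FOOT_NAMES = {
    "●•": "trochee",
    "•●": "iamb",
    "●•●": "cretic",
    "•●•": "amphibrach",
    "●••": "dactyl",
    "•●●": "bacchius",
    "●●•": "antibacchius",
}

def foot_name(pattern):
    """Return the metrical label of a prominence pattern, ignoring any spaces."""
    key = "".join(pattern.split())
    return FOOT_NAMES.get(key, "unlisted")

print(foot_name("●• ●"))  # cretic
print(foot_name("•●●"))   # bacchius
```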
7. Conclusion
The CHIPROT transcription was, above all, designed as a pedagogic tool. It may
aid those who are studying Chinese as a second/foreign language and struggling with
the prosodic form of the utterances. The aim is to help learners speak with more ease,
fluency, and naturalness. Language teachers may test CHIPROT here and there. They
may find it useful to exploit some of its features while preparing teaching materials and
handouts. Students can experiment with CHIPROT. They may, for example, find it
useful to draw up some transcripts related to particular lessons (the annotation procedure
is relatively easy, user-friendly, and computer-friendly; the system does not contain
any unusual graphic marks, complicated conventions, etc.). However, my long-term
objective is to encourage writers of pedagogic materials to incorporate CHIPROT into
their texts. This would, of course, take a good deal of prosodic knowledge and practical
transcription skill. Indeed, this is the process I have been through myself, discovering
various flaws, drawbacks, and traps in the system and improving it step by step.
Linguists engaged in research on connected speech may also find CHIPROT
useful. It may help them discover major prosodic rules and tendencies while analysing
the prosodic form of Chinese utterances anchored in real communication contexts –
instead of artificial, fabricated sentences pronounced in isolation. As the transcription
procedure can be executed in several clear steps, it may perhaps even be automated to
some extent. The necessary software, if designed, could be used to process larger sets of
speech data such as spoken language corpora.
CHIPROT certainly may have its shortcomings or points which escaped my notice. Yet
I trust that its final version represents a rather consistent, theory-based, and robust system.
Its occasional blind spots or lurking problems may be successfully solved in the course of
time. Feedback from future users of the CHIPROT system may greatly help to polish it. Any
comments or criticisms would certainly be welcome.
References
Beckman M E, Ayers G M. 1994. Guidelines for ToBI Labeling. Ohio State University. [Link]
[Link]/tobi/[Link].
Chao Y R ( 赵元任 ). 1968. A Grammar of Spoken Chinese. Berkeley and Los Angeles: University of California
Press.
Feng S L ( 冯胜利 ). 2019a. Prosodic Syntax in Chinese: History and Changes. New York: Routledge.
Feng S L ( 冯胜利 ). 2019b. Prosodic Syntax in Chinese: Theory and Facts. New York: Routledge.
Feng S L ( 冯胜利 ). 2021. 韵律语体语法与汉语的词重音 (Prosody of stylistic-register grammar and lexical
stress in Chinese). Paper presented at the 7th International Conference on Prosodic Grammar (ICPG-7),
Tianjin.
Feng S L ( 冯胜利 ), Wang L J ( 王丽娟 ). 2018. 汉语韵律语法教程 (A Course of Prosodic Grammar).
Beijing: Peking University Press.
Jiang L P ( 姜丽萍 ). 2014. HSK 标准教程 1 (HSK Standard Course 1). Beijing: Beijing Language and Culture
University Press.
Kratochvil P. 1974. Stress shift mechanism and its role in Peking dialect. Modern Asian Studies, 8.4: 433-458.
Lee W-S, Zee E. 2014. Chinese phonetics. In: Huang C-T J, Li Y-H A, Simpson A. The Handbook of Chinese
Linguistics. Oxford: Wiley Blackwell, 369-399.
Li A J ( 李爱军 ). 2002. Chinese prosody and prosodic labeling of spontaneous speech. Proceedings of Speech
Prosody, Aix-en-Provence, 39-46.
Li A J, Zu Y Q. 2007. Corpus design and annotation for speech synthesis and recognition. In: Lee C H, et al.
Advances in Chinese Spoken Language Processing. Hong Kong: World Scientific, 263-268.
Li W M ( 厉为民 ). 1981. 试论轻声和重音 (Discussion on the neutral tone and stress). 中国语文 (Studies of
the Chinese Language), 1: 35-40.
Li Z Q ( 李智强 ). 2018. 汉语语音习得与教学研究 (Studies in Acquisition and Teaching of Mandarin
Chinese Phonetics). Beijing: Beijing Language and Culture University Press.
Liang L ( 梁磊 ). 2003. 声调与重音——汉语轻声的再认识 (Tone and stress: the Chinese neutral tone
revisited). 第六届全国现代语音学学术会议论文集 ( 上 ) (Proceedings of the 6th National Conference
on Modern Phonetics, 1). Tianjin: Nankai University: 192-197.
Lin T ( 林焘 ). 1957. 现代汉语补足语里的轻音现象所反映出来的语法和语义问题 (Grammatical and
semantic problems related to the non-stress phenomenon in modern Chinese complements). 北京大学学
报 ( 人文科学 ) (Journal of Peking University; Philosophy and Social Sciences), 9: 61-74.
Lin T ( 林焘 ). 1962. 现代汉语轻音和句法结构的关系 (The relationship between non-stress and grammatical
structure in Modern Chinese). 中国语文 (Studies of the Chinese Language), 7: 301-311.
Liu Y H, et al. 2017. Integrated Chinese (4th ed.). Boston: Cheng & Tsui Company.
Peng S-H, Chan M K M, Tseng C-Y, et al. 2005. Towards a Pan-Mandarin system for prosodic transcription.
In: Jun S-A. Prosodic Typology: The Phonology of Intonation and Phrasing. Oxford: Oxford University
Press, 230-270.
Silverman K, Beckman M, Pitrelli J, et al. 1992. ToBI: a standard for labeling English prosody. Proceedings of
the 1992 International Conference on Spoken Language Processing (ICSLP 92), 867-870.
Švarny O. 1974. Variability of tone prominence in Chinese (Pekinese). Asian and African Languages in Social
Context. Dissertationes Orientales (34). Praha: Academia, 127-186.
Švarny O. 1991a. The functioning of the prosodic features in Chinese (Pekinese). Archiv Orientální, 59.2:
208-216.
Švarny O. 1991b. Prosodic features in Chinese (Pekinese): prosodic transcription and statistical tables. Archiv
Orientální, 59.3: 234-254.
Švarny O. 1998-2000. Učební Slovník Jazyka Čínského, I-IV (A Learning Dictionary of Modern Chinese, I-IV).
Olomouc: Palacky University.
Švarny O, et al. 1991-1993. Gramatika Hovorové Čínštiny v Příkladech, I-IV (A Grammar of Spoken Chinese
in Examples, I-IV). Bratislava: Komensky University.
Švarny O, Uher D. 2014. Prozodická Gramatika Čínštiny (A Prosodic Grammar of Chinese). Olomouc:
Palacky University.
Třísková H. 2011. Prozodická transkripce čínštiny O. Švarného: čtyři historické verze (O. Švarny's prosodic
transcription of Chinese: four subsequent versions). Nový Orient, 66.4: 45-50.
Třísková H. 2016. De-stressed words in Mandarin: drawing parallel with English. In: Tao H Y. Integrating
Chinese Linguistic Research and Language Teaching and Learning. Amsterdam/Philadelphia: John
Benjamins Publishing Company, 121-144.
Třísková H. 2017a. Acquiring and teaching Chinese pronunciation. In: Kecskes I. Explorations into Chinese
汉语韵律标注(CHIPROT)与韵律
结构的预测
廖 敏
捷克科学院东方研究所
Hana Třísková
Oriental Institute, the Czech Academy of Sciences, Prague
triskova@[Link]