The Fifteenth Annual: North American Computational Linguistics Open Competition 2021
www.nacloweb.org
Invitational Round
March 11, 2021
Rules
1. The contest is four hours long and includes ten problems, labeled J to S.
2. Follow the facilitators’ instructions carefully.
3. If you want clarification on any of the problems, talk to your facilitator. The facilitator will
consult with the jury and convey their answer.
4. You may not discuss the problems with anyone during or after the contest except as
described in item #3.
5. Each problem is worth a specified number of points, with a total of 100 points. In the
Invitational Round, some questions require explanations.
6. All your answers should be written clearly in the Answer Sheets in blue or black ink.
7. Write your name and registration number on each page of the Answer Sheets.
8. You can use the last page of the Answer Sheets if you need extra space to answer a question.
Clearly indicate which problem this additional answer applies to.
9. The top students from each country (USA and Canada) will be invited to the next stage.
10. Each problem has been thoroughly checked by linguists and computer scientists as well as
students like you for clarity, accuracy, and solvability. Some problems are more difficult than
others, but all can be solved using ordinary reasoning and some basic analytic skills. You
don’t need to know anything about linguistics or about these languages in order to solve
them.
11. If we have done our job well, very few contestants will solve all these problems completely
in the time allotted. So, don’t be discouraged if you don’t finish everything.
12. DO NOT DISCUSS THE PROBLEMS UNTIL THEY HAVE BEEN POSTED ONLINE!
THIS MAY BE A COUPLE OF MONTHS AFTER THE END OF THE CONTEST.
Program Committee:
Adam Hesterberg — Massachusetts Institute of Technology
Alan Chang — University of Chicago
Aleka Blackwell — Middle Tennessee State University
Ali Sharman — University of Michigan
Andrés Salanova — University of Ottawa
Andrew Lamont — University of Massachusetts
Annie Zhu — Harvard University
Babette Verhoeven — Aquinas College
Daniel Harbour — Queen Mary University of London
Daniel Lovsted — McGill University
David Mortensen — Carnegie Mellon University
Dick Hudson — University College London
Dragomir Radev — Yale University
Elisabeth Mayer — Australian National University
Erik Andersen — Brandeis University
Ethan Chi — Stanford University
Gordon Chi — Stanford University
Harold Somers — All Ireland Linguistics Olympiad
Heather Newell — Université du Québec à Montréal
James Hyett — University of Toronto
Jill Vaughan — University of Melbourne
Kevin Liang — University of Pennsylvania
Lori Levin — Carnegie Mellon University
Lynn Clark — University of Canterbury
Margarita Misirpashayeva — Massachusetts Institute of Technology
Mary Laughren — University of Queensland
Oliver Sayeed — University of Pennsylvania
Patrick Littell — University of British Columbia
Simi Hellsten — University of Oxford
Sonia Reilly — Massachusetts Institute of Technology
Sophie Ishiwari — University of Pennsylvania
Tom McCoy — Johns Hopkins University
Vlado Keselj — Dalhousie University
Problem Credits:
Problem J: Ethan Chi
Problem K: Harold Somers
Problem L: Evan Hochstein
Problem M: Simi Hellsten
Problem N: Ethan Chi
Problem O: Aleka Blackwell
Problem P: Gordon Chi
Problem Q: Tom McCoy and Ryan Chi
Problem R: Pranav Krishna
Problem S: Ethan Chi and Daniel Lovsted
Booklet Editors:
Pranav Krishna — Massachusetts Institute of Technology
Daniel Lovsted — McGill University
US Team Coaches:
Aleka Blackwell — Middle Tennessee State University
Dragomir Radev — Yale University
Lori Levin — Carnegie Mellon University
We are grateful for the support of many institutional and individual donors who make this contest possible.
All material in this booklet © 2021, North American Computational Linguistics Open Competition and the
authors of the individual problems. Please do not copy or distribute without permission.
(J) A Vintage Sound System (1/2) [10 points]
The Chinese language was first spoken in a small area in Henan, China, around 1000 BCE, during the Zhou
Dynasty. However, by the time of the Tang Dynasty (700 CE), many words had changed significantly. For
example, the sound -aj lost its final -j. Below is a table of some words in Chinese, according to a recent
historical reconstruction of the language. Each word has two pronunciations: one from the Zhou era (Old Chinese)
and one from the Tang era (Middle Chinese). (A pronunciation guide is provided on the next page.) However,
some pronunciations in the table are missing.
Old Chinese (Zhou) Middle Chinese (Tang) English Translation
(1) dó ‘to come to’
mˤǝ mō ‘soot’
rajs ljè ‘to revile’
pˤǝk (2) ‘north’
pˤat (3) ‘to stop in the open’
lˤep dẽp ‘butterfly’
(4) bãk ‘calm, still’
dzak dzjẽk ‘stone’
braj bjē ‘to exhaust’
ŋˤajs ŋà ‘hungry’
pˤeks pè ‘favorite’
pˤaj pā ‘wave’
dzˤǝ (5) ‘wealth’
tˤep tẽp ‘paralyzed’
kˤe kē ‘chicken’
nǝʔ ní ‘ear’
gres gjè ‘water-chestnut’
prǝʔ pí ‘border town’
gǝ gī ‘his’
lˤaj (6) ‘to flow’
grajʔ gjé ‘to stand’
tǝk tĩk ‘to go to’
sˤǝks sò ‘frontier’
mrajʔ mjé ‘to share with’
beʔ bjé ‘female servant’
raj ljē ‘to drag into’
(7) sī ‘silk’
lˤek dẽk (name of an ancient tribe, the Beidi, to the north of China)
(J) A Vintage Sound System (2/2)
J1. Fill in the missing pronunciations from the following choices: dā, dzō, sǝ, põk, lˤǝʔ, bˤak, pãt.
J2. Match the Old Chinese words on the left to the Middle Chinese words on the right.
Old Chinese Middle Chinese
1. pˤajʔs A. pjē ‘humble’
2. pˤajʔ B. pà ‘to winnow’
3. pajʔ C. mjē ‘rice gruel’
4. pe D. pẽk ‘wall (of a house)’
5. pˤek E. pjé ‘that’
6. mraj F. pá ‘to limp’
J3. Give the Middle Chinese pronunciations of the following Old Chinese words.
Old Chinese Middle Chinese English Translation
kraj (1) ‘bridle’
nˤǝ (2) ‘violent’
rak (3) ‘female servant’
pre (4) ‘pole’
bˤǝʔ (5) ‘double’
mˤajs (6) ‘dust’
J4. Give the Old Chinese pronunciations of the following Middle Chinese words. If there are multiple
possibilities, write all of them, separated by commas.
Old Chinese Middle Chinese English Translation
(7) ŋjē ‘to make a sacrifice to the deity of the soil’
(8) tõk ‘to obtain’
J5. Explain the sound changes that occurred between Old and Middle Chinese.
Pronunciation Notes:
• The letter ˤ after a consonant marks pharyngealization of the preceding consonant (constriction of the
throat).
• j is pronounced like y in English yes.
• ʔ is the so-called glottal stop (the sound between the two syllables of uh-oh).
• ǝ is the vowel of English cut.
• ŋ is the final sound of English hang.
• The symbols ¯, ´, ` and ˜ mark tones.
Make sure you record your answers in your Answer Sheets!
(K) Putting a Place to a Name (1/2) [5 points]
Tamazight is a family of closely related languages spoken by tens of millions of people across North Africa.
Tamazight languages are official languages in Morocco (which is also called Imeghrib in Tamazight) and
Algeria (in Tamazight, Dzhayr or Lezzayer).
Tamazight can be written using the Latin alphabet, but it also uses the Tifinagh script, which dates back more
than 2000 years, although it has been adapted for modern use. Tifinagh can be written left-to-right,
right-to-left, or bottom-to-top, with the orientation of some of the symbols altered accordingly. In this problem, all
words are written left-to-right.
On the next page is a list of place names in Tamazight, written in the Tifinagh script, on the left, and the same
places named in English on the right, in a scrambled order. Note that the Tamazight names and the English
names are not always exactly the same as each other (even after converting from one alphabet to the other).
For two of the places, the names are really quite different.
5. ⴰⵙⴼⵉ E. Timbuktu
6. ⴱⴻⵛⵛⴰⵔ F. Béchar
7. ⵙⵉⵡⴰ G. Safi
8. ⴰⵏⴼⴰ H. Tangiers
9. ⵇⴰⵏⴰⵔⵉⴰ I. Oujda
Below on the left are twelve sentences, six in Hawu and six in Dhao. The Hawu and Dhao sentences are
mixed together. On the right are their English translations in an arbitrary order. Each English translation
corresponds to one sentence in Hawu and one sentence in Dhao.
1. Èi suti. a. She is walking along the edge of the sea.
2. Pehewina noo ri roo. b. They keep walking to Seba.
3. Ra kako taruu asa Sèba. c. They see her head.
4. Ladhe ina na sanède, baku pakèdi. d. They reminded her.
5. Ta nèru ke noo oro ngidi dahi. e. If her mother remembers, don’t leave.
6. Huti ne èi. f. The water spilled.
7. Ra pasanède na.
8. Na kako madhutu sebhe dhasi.
9. Ki ta hewina ke ne ina noo, b’ole pekèd’i.
10. Ta ngède ke ri roo ne kètu noo.
11. Ta nèru ke roo teruu la Hèb’a.
12. Ra ladhe kètu na.
_____________
1 Until conversions in the 1970s, most Hawu people maintained their traditional religion and ways of life. The Hawu people
remember genealogies spanning hundreds of years that preserve Hawu history and structure Hawu society.
2 The Dhao people recount that their island was first settled by people from the island of Hawu. The Hawu also tell a version of this
history. Traditionally, Dhao women are weavers, while Dhao men are gold- and silver-smiths.
(L) Is This Problem Intelligible? (2/2)
L1. For each sentence, indicate whether it is in Hawu or in Dhao. Then match it to the corresponding English
translation.
a. Ra pasanède ina.
b. Ki ta pedutu ke roo ri ina noo, ta ngède ke noo ri roo.
c. Pehewina roo ri noo.
d. Ladhe na puru, na ladhe sebhe.
e. B’ole bèj’i.
L4. In the table of Hawu and Dhao words at the beginning of the problem, the Dhao word related to the
Hawu word pedutu was left blank. Fill in the blank with the related Dhao word. (Hint: it appears somewhere
in this problem!)
L5. Dialects of a language tend to be largely mutually intelligible: that is, speakers of two dialects can
understand each other without much effort. Based on the features of Hawu and Dhao that you have observed,
are Hawu and Dhao mutually intelligible? Answer in three sentences or less.
Below and on the next page are some Tawala sentences and their translations. Note that you(sg) means “you
(referring to one person),” and you(pl) means “you (referring to more than one person).” Also note that some
of the Tawala words feature reduplication, which is a linguistic process in which all or a part of a word is
repeated. For this problem, you will not need to figure out how a reduplicated form is produced from a
non-reduplicated form, but you should pay attention to when a reduplicated form is used and when a
non-reduplicated form is used.
M3. Describe how to translate the English word “child” into Tawala.
M4. As noted in the introduction to this problem, reduplication is a linguistic process in which all or part of a
word is repeated. Describe when reduplication is used in Tawala. You do not need to describe how the
reduplicated form of a word is created; you only need to describe when reduplicated forms are used.
The words “high,” “mid,” “low,” “front,” and “back” in these notes refer to the position of the tongue in the
mouth, and the words “rounded” and “unrounded” refer to the shape of the lips. “Velar” refers to a specific
part of the roof of the mouth.
Although Dagaare is a tonal language, for simplicity all tones (as well as vowel length marks) have been
omitted.
Note: The Dagaare language is spoken by around 1.1 million Dagaaba people in Ghana and Burkina Faso. The
Dagaaba are a farming people noted for their sophisticated music, usually performed in the form of
xylophone (gyil) duets accompanied by drums (iil); another common form is solo melodies performed on bamboo
flute (wul). The duiker is a small antelope native to Sub-Saharan Africa famous for its antisocial nature.
Maggots are the larvae of flies, typically found in large groups on rotting organic material. The intestines (in
humans, comprising the small intestine and the large intestine) are part of the digestive tract.
Compounding is a common word formation process in Vengo. Similarly to English, compounds are variably
written as one word, two words, or two hyphenated words in Vengo. Each Vengo word in the column on the
left has its English translation (which is explicitly marked as a noun (n.) or a verb (v.) for clarity) somewhere in
the column on the right, but the translations are in a scrambled order.
___________
1 Speakers of the language call their village Vengo and their language Ghang Vengo [ɣáŋ vəŋóo]; however, the village is officially
called Babungo and appears with this name on maps of Cameroon, and the language is, therefore, often referred to as Babungo.
(O) Cameroonian Compounds (2/2)
Below are some additional Vengo words, each one written next to its English translation. Note that (adj.)
indicates that the word is an adjective.
O2. From the English options Q. through W., choose the most likely meaning of each of the Vengo compound
words a. through d., in light of the additional Vengo words and meanings given above.
O3. What is the likely English word equivalent of the Vengo word fɨ ?
A visitor has asked five of the seven children to briefly introduce their brothers, sisters, and cousins. The
children’s responses are below. Based on these responses, you will need to give the name of the child who
corresponds to each numbered position in the family tree; note that we have already provided one match for you
(Krihisiwa is 7 in the tree). You may assume that the children labeled 1 through 7 are the only ones in their
generation in this family, and you may also assume that any numbers in the children’s responses are exact
(e.g., “I have two soriwa” means “I have exactly two soriwa”).
P1. Identify which children occupy the numbered positions of the family tree.
P2. Describe the meaning of the following words: suaboya, soriwa, amiwa, eiwa.
Note: For simplicity, we have used the singular form of all Yanomamö terms, even when those terms are
referring to more than one person.
_________________________________________________________________________________________
Example family tree: In case you are unfamiliar with family trees, below is an example. B, C, & D are siblings;
so are E & F, and I & J. A & B are married, as are F & G and J & K. The parents of E & F are A & B. Similarly, the
parents of I & J are F & G, and E is the father of H. To give a few examples of more distant relationships: A is
the grandmother of H, I, & J; F is the aunt of H; and C is the sister-in-law of A.
Let's consider English for a moment. The word scofflaw means "a person who openly disregards the law." Try
saying this word out loud. Even if you've never seen scofflaw before, you probably pronounced it correctly,
with the emphasis on the first syllable (SCOFF.law), instead of on the second syllable (scoff.LAW).
Now take a deep breath and try saying these next two words: galligaskins ("loose-fitting breeches") and
ultracrepidarian ("a person who expresses opinions on matters outside their expertise"). Once again, even if
you've never seen these words before, you probably intuitively knew which syllables to stress. The correct
pronunciations of these words are GALL.i.*GAS.kins and UL.tra.CRE.pi.*DA.ri.an.
How should you read this notation? There are 3 things to remember:
How is it that you intuitively know which syllables to stress, even for unfamiliar words? The answer is that
English speakers must have some systematic way of assigning stress to novel words. In task Q1, we present a
simplified version of one theory of how English stress assignment works.
Q1. Based on the data on the next page, fill in the blanks for the following stress assignment algorithm. Each
blank corresponds to exactly one word. After filling in the blanks, your algorithm should correctly predict the
stress for each of the 9 English words in the table on the next page. (Some blanks can be filled equally well
by multiple answers. You only need to provide one correct answer).
___________________
1 In case you're wondering how to tell which syllable in a word has primary stress, one technique is called the "Lassie test." To use
this technique, pretend that the word is the name of a dog and that you want to call the dog inside. Whichever syllable you
elongate when you call out the dog's name is the syllable with primary stress. For example, if your dog were named Ultracrepidarian,
you would call out something like "Ultracrepi-DAAAA-rian!"
2 Note that we use the term odd-numbered syllables to refer to the first syllable, third syllable, fifth syllable, etc. We use the term
even-numbered syllables to refer to the second syllable, fourth syllable, sixth syllable, etc.
(Q) A Stress Test (2/4)
Here is the relevant data for Q1.
Word Stress
elephant *E.le.phant
crush *CRUSH
vitamin *VI.ta.min
illustration IL.lu.*STRA.tion
dime *DIME
scofflaw *SCOFF.law
galligaskins GALL.i.*GAS.kins
ultracrepidarian UL.tra.CRE.pi.*DA.ri.an
supercalifragilisticexpialidocious SU.per.CA.li.FRA.gi.LI.stic.EX.pi.A.li.*DO.cious
Q2. Stress assignment in English is a complex topic; the algorithm in Q1 only covers some of the factors that
affect English stress. Based on the conversation below (which was annotated for stress by a human), what
are some further properties that might need to be added to make the algorithm properly handle English?
Notes: You should only mention factors that are illustrated in the conversation below. If a word
has no capital letters in it, that means it has no stress.
Person A: i’m *HOP.ing to ex.*PORT my *PAINT.ings. *EACH *ONE *SHOWS a *COM.mon *OB.ject in a
*STRANGE *SET.ting.
Person B: i ob.*JECT to *THAT. we should *IM.port *ART, not *EX.port it!
Person A: well, i just *GOT a *PER.mit from the *CUS.toms *OFF.i.cer. she *SAYS that *ART can be an
*EX.cell.ent *EX.port.
Person B: if *SHE per.*MITS it, then i *GUESS *I must per.*MIT it too.
(Q) A Stress Test (3/4)
Not all languages stress their words in the same way that English does. However, it turns out that we can still
use the same basic algorithm for many other languages; we just need to introduce a few options in the
statement of this algorithm. Here is the more general algorithm:
1. Start at the [left / right] edge of the word. [Skip / don’t skip] the syllable at that edge and
then assign stress to [only the first / every alternating] syllable that you encounter.
2. If the word is longer than one syllable and if step (1) made the word’s final syllable stressed,
[leave it that way / un-stress it].
3. Assign primary stress to the [leftmost / rightmost] stressed syllable.
We refer to these five bolded options as parameters. By choosing the right set of parameters each time, we
can determine how to stress words in a wide variety of languages!
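The general algorithm above can be sketched in code. The following is a minimal illustration, not an official implementation: the function name `assign_stress`, its argument names, and the nonsense syllables are all invented for this example, and it assumes that "every alternating" means the first syllable encountered plus every second syllable after it. The parameter values in the usage line are arbitrary and are not claimed to describe any language in this problem.

```python
def assign_stress(syllables, edge="left", skip=False, alternating=True,
                  unstress_final=False, primary="rightmost"):
    """Return the word in the booklet's notation: stressed syllables in
    capitals, with * before the syllable that carries primary stress."""
    n = len(syllables)

    # Step 1: walk through the word starting from the chosen edge.
    order = list(range(n)) if edge == "left" else list(range(n - 1, -1, -1))
    if skip:
        order = order[1:]
    # Assumption: "every alternating" = the first syllable encountered,
    # then every second syllable after it.
    stressed = set(order[::2]) if alternating else set(order[:1])

    # Step 2: in words longer than one syllable, optionally un-stress
    # the final syllable.
    if n > 1 and unstress_final:
        stressed.discard(n - 1)

    # Step 3: choose which stressed syllable carries primary stress.
    main = None
    if stressed:
        main = min(stressed) if primary == "leftmost" else max(stressed)

    parts = []
    for i, syllable in enumerate(syllables):
        text = syllable.upper() if i in stressed else syllable.lower()
        parts.append("*" + text if i == main else text)
    return ".".join(parts)


# A made-up five-syllable word with an arbitrary parameter setting.
print(assign_stress(["ta", "ka", "pa", "na", "ma"], edge="right", skip=True,
                    alternating=True, unstress_final=False,
                    primary="leftmost"))  # ta.*KA.pa.NA.ma
```

Changing one argument at a time on the same made-up word is a quick way to see what each parameter contributes.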
Q3. For the six languages presented below, examine the examples given to determine which stress
assignment parameters the language obeys.³ Select the correct values for each of the parameters mentioned
above. (For some languages, there may be multiple correct answers. You only need to provide one correct
answer. We have simplified the spellings of some of the example words.)
___________
3 In practice, it is possible to do this automatically. In fact, one of the desirable properties of parameter-based linguistic theories is
that they allow a learner (such as a baby acquiring the language, or a computer model being trained on sentences) to learn
properties of the language based on just a few examples. This is because the set of parameters greatly constrains the set of possible
systems that the learner has to distinguish between.
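To make the footnote's point concrete, the sketch below enumerates every combination of the five two-valued parameters. The dictionary keys are labels invented for this illustration; only the option values come from the algorithm statement. With five binary choices, a learner has just 2^5 = 32 candidate systems to tell apart.

```python
from itertools import product

# Keys are hypothetical labels; values are the options from the algorithm.
PARAMETERS = {
    "edge": ["left", "right"],
    "edge syllable": ["skip", "don't skip"],
    "stress": ["only the first", "every alternating"],
    "stressed final syllable": ["leave it that way", "un-stress it"],
    "primary stress": ["leftmost", "rightmost"],
}

# One dictionary per complete parameter setting.
settings = [dict(zip(PARAMETERS, values))
            for values in product(*PARAMETERS.values())]

print(len(settings))  # 32
```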
(Q) A Stress Test (4/4)
Q4. Unfortunately, this type of algorithm does not work for all languages. In Q2, we already saw some
examples of how it fails to capture certain nuances of English. Below is another language (Selkup) where the
algorithm fails. Describe how stress is assigned in this language.
Here are a few more word representations, along with their English equivalents, in no particular order. Note
that a barometer is a tool for measuring air pressure, while a millibar is a unit of air pressure.
12. [0.3, 0, -0.6, -0.1] L. clock
13. [0.2, -0.2, -0.3, -0.2] M. first
14. [0.4, 0, -0.4, -0.4] N. second
15. [-0.6, 0.6, 0.2, -0.8] O. one
16. [-0.6, -0.2, -0.4, -0.4] P. three
17. [0.4, 0.8, -0.4, -0.4] Q. third
18. [1.6, 0, 1.8, 0.6] R. two
19. [0, 0, 0, -0.4] S. barometer
20. [-0.6, -0.4, -0.2, -0.4] T. half
21. [1.8, 0, 1.6, 0.4] U. millibar
R2. Match the vectors 12 through 21 to their English equivalents. There are two possible answers; either one
will receive full credit.
R3. The word third actually has two meanings that are relevant to the problem. The vector that is given for
third above is the average of the vectors that would represent these two meanings. Suppose English used
two different words for these two meanings, rather than a single word. What would you expect the vector to
be for each meaning?
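As a reminder of what "the average of the vectors" means, here is a small sketch of a component-wise average. The function name `average` and the two input vectors are made up for illustration; they are not vectors from this problem and do not hint at the answer.

```python
def average(u, v):
    """Component-wise mean of two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [(a + b) / 2 for a, b in zip(u, v)]


# Two invented four-dimensional vectors.
sense_a = [1.0, 0.0, -1.0, 0.5]
sense_b = [0.0, 1.0, 0.0, -0.5]
print(average(sense_a, sense_b))  # [0.5, 0.5, -0.5, 0.0]
```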
R4. Below are the two vectors found for the words doctor and nurse. Even though these words are
gender-neutral, the method of defining a word based on the words that occur near it also captures general trends
and biases that are in the texts which were used to determine what words occur near each other. Identify
which of these vectors goes with which word, and explain how the vectors encode gender-related properties
of the corresponding words.
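The idea of describing a word by the words that occur near it can be sketched as a simple neighbor count. This is only an illustration under stated assumptions: the function name `cooccurrence_counts`, the window size, and the toy sentence are invented here, and the vectors in this problem are not claimed to have been computed this way.

```python
from collections import Counter, defaultdict


def cooccurrence_counts(tokens, window=1):
    """For each word, count the words appearing within `window` positions."""
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                counts[word][tokens[j]] += 1
    return counts


# A made-up toy text, not data from the problem.
tokens = "the clock struck one the clock struck two".split()
counts = cooccurrence_counts(tokens)
print(dict(counts["clock"]))  # {'the': 2, 'struck': 2}
```

Because the counts come directly from the text, any regularity in that text, including a biased one, shows up in the resulting numbers.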
Here are some sentences in Jamsay, along with their English translations.
Note that the diacritics ´, `, ˇ, ˆ represent high, low, rising, and falling tones respectively. The symbol ∴ after a
word means that the word is pronounced with “dying-quail intonation,” an exaggerated prolongation of the
tone accompanied by an exaggerated drop in pitch.¹ The symbol : after a vowel signifies length, and n
signifies nasalization of the previous sound.
__________
1 Jeffrey Heath, who studied Jamsay, writes: “The dying-quail intonation contour reminds me of the prosodic pattern of American
high-school cheerleaders calling out the letters of their school at sporting events, through their bullhorns (“give me an A…., give me
a B…”).”
(S) Peace Only (2/2)
S1. Translate the following Jamsay sentences into English:
Answer Sheets
REGISTRATION NUMBER
Name: ___________________________________________
Contest Site: ________________________________________
Site ID: ____________________________________________
City, State: _________________________________________
Grade: ______
Please also make sure to write your registration number and your name on each page of the Answer
Sheets, and turn in all pages of the Answer Sheets even if you have left some blank.
SIGN YOUR NAME BELOW TO CONFIRM THAT YOU WILL NOT DISCUSS THESE PROBLEMS WITH ANYONE
UNTIL THEY HAVE BEEN OFFICIALLY POSTED ON THE NACLO WEBSITE IN APRIL.
Signature: __________________________________________________
YOUR NAME: REGISTRATION #
J2. Fill in the letter of the correct option in the boxes below:
1. 2. 3. 4. 5. 6.
J5. Explain the sound changes that occurred between Old and Middle Chinese:
K1. 1. 2. 3. 4. 5. 6. 7. 8. 9.
L1. For each of the sentences, circle the language that it is in. Then, write the letter of its English translation.
L1. (continued)
9. Hawu Dhao 10. Hawu Dhao
(b)
(c)
(d)
(e)
Dhao:
(b) Hawu:
Dhao:
(c) Hawu:
Dhao:
a.
b.
c.
d.
a.
b.
c.
d.
(p) (q)
O1. Write the letter of the corresponding English translation for each number:
1) 2) 3) 4) 5) 6) 7) 8)
O2. Write the letter, from Q to W, of the English word that is the most likely meaning for each Vengo word:
a. b. c. d.
1) 2) 3)
4) 5) 6)
suaboya:
soriwa:
amiwa:
eiwa:
(d) (e)
Q2. What are some further properties that should be added to the algorithm in order to handle English?
Q3. For each language, specify the value of the parameter in the correct cell of the table:
Language | left / right | skip / don’t skip | only the first / every alternating | leave it that way / un-stress it | leftmost / rightmost
Mapudungun
Maranungku
Weri
Mansi
Warao
Comalapa Kaqchikel
R1. Write the letter of the English word corresponding to the vector with each number:
1) 2) 3) 4) 5) 6)
7) 8) 9) 10) 11)
R2. Write the letter of the English word corresponding to the vector with each number:
R3. What are the two vectors for each of the meanings of the word “third”?
Vector: Definition:
Vector: Definition:
R4. Identify which vector corresponds to which word, and explain how the vectors encode gender-related
notions of these words:
c. You have already untied all the melons because you ate the mango.