Mathematical Linguistics 數理語言學
Spring 2023            Thursday 14:10-17:00            文學院413

Course code: 1306564

 

UPDATED 2023/04/27

Me

James Myers (麥傑)
Office: 文學院247
Tel: 31506
Email: Lngmyers at the university address (ccu...)

Office hours: Wednesday 10-12, or by appointment (made at least 24 hours ahead)

 

Goals

“Mathematical linguistics” means different things to different people. This version of the course has a particular focus on models of learning, as we compare two main approaches: rationalist models (proofs and theorems) like formal learning theory and empiricist models (just try whatever works) like neural network modeling. By the end of the semester, you should feel much more comfortable thinking in a formally precise way about grammar, corpus analysis, language acquisition, and psycholinguistics.

 

Grading

10% Class participation
30% Leading discussion
40% Exercises
20% Research presentation

 

What the class is like

        Rather than passively listening to lectures from a textbook (which doesn’t exist for this version of the course anyway), we will read and discuss classic and recent papers together. I purposely chose readings with lots of math in them (of course!), so read them in a “top-down” way: focus on each paper’s main claims instead of getting stuck on tiny details, though you should still feel free to ask about anything in class. Since these are math papers, the main claims will involve technical concepts and formulas: don’t skip them! Hopefully the authors were nice enough to make the math figure-out-able just by carefully studying the paper itself, but if you think it’s still unclear, it might be their fault, not yours!

        Class participation means that you discuss: you read, think, talk, and respond to others’ ideas. Don’t be afraid to ask for clarification - that’s also part of the discussion.

        Every week somebody will lead the discussion on the week’s readings, using a handout as a guide. The discussion leader should NOT lecture us or search the internet for related information, but instead help us understand the reading and its real-world relevance by asking open-ended questions that inspire people to get involved and express what they think. Please be sure to post a PDF file of your questions to the E-Course “discussion” section by 12 noon on the day of class, so everybody has time to download (and maybe print) it before class.

        In order to get hands-on experience with some of the technical methods that we will read about, there will be two take-home exercises (due on 4/13 and 5/18). Each exercise will be distributed two weeks before it is due.

        On 5/11, about a month before the end of the semester, you will propose an original research project of your own, applying the mathematical models discussed in class. This may involve theoretical analyses, using an existing computer program, and/or writing your own new program. On the last day of class (6/8), you’ll give a presentation about your research findings, which I’ll grade for style, logic, and theory. (If you can’t attend that day, send me your presentation file and I’ll present for you.) There is NO TERM PAPER (yay).

            WARNING #1: Plagiarism (pretending that other people’s words and ideas are your own) is a serious crime and will not be tolerated. Homework or other graded things containing plagiarism will receive a score of zero, and you will be reported to the department chair.

            WARNING #2: Submit your homework and other graded things on time! Unless you have a really good excuse, you will lose 5 points for each day you are late. So don't make yourself sick working overnight: start early enough that you can finish on time.

 

Schedule

An asterisk (*) marks when something is due.

2/16    What should linguists know about math?

2/23    Information theory basics
        Readings: Shannon (1948) [§0-10], Rioul (2018) [§1-14]. Leader: Myers

3/2     Information theory and psycholinguistics
        Readings: Hale (2016), Mollica & Piantadosi (2019). Leaders: Sylvia, Elaine

3/9     Formal language theory
        Reading: Fitch & Friederici (2012). Leader: Sabrina

3/16    Formal learning theory
        Reading: Heinz (2016). Leader: 又睿

3/23    Bayesian learning models
        Reading: Perfors et al. (2011). Leader: JR

3/30    Bayesian data analysis [Distribute Exercise 1]
        Reading: Vasishth et al. (2018). Leader: Myers

4/6     NO CLASS [inter-school activities]

*4/13   Neural network basics [Exercise 1 due]
        Reading: Abdi (1994). Leader: Sam

4/20    Neural networks vs. real brains
        Readings: Lillicrap et al. (2020), Schaeffer et al. (2022). Leaders: Elaine, Sabrina

4/27    Neural network models of language in time
        Readings: Elman (1990), Linzen et al. (2016). Leaders: Sylvia, JR

5/4     Neural network models of written language [Distribute Exercise 2]
        Readings: Lane et al. (2019), Hannagan et al. (2021). Leaders: JR, Sylvia

*5/11   Discuss your research progress

*5/18   Maximum entropy models [Exercise 2 due]
        Reading: Hayes (2022). Leader: Elaine

5/25    Linear discriminative learning
        Reading: Baayen et al. (2018). Leader: Sabrina

6/1     Modeling language evolution
        Readings: Kirby & Tamariz (2022), Lazaridou & Baroni (2020). Leader: 又睿

*6/8    Presentations [last class]
 

Readings

 

Abdi, H. (1994). A neural network primer. Journal of Biological Systems, 2(3), 247-281. [Sections 0-4]

Baayen, R. H., Chuang, Y. Y., & Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13(2), 230-268.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.

Fitch, W. T., & Friederici, A. D. (2012). Artificial grammar learning meets formal language theory: an overview. Philosophical Transactions of the Royal Society B: Biological Sciences, 367(1598), 1933-1955.

Hale, J. (2016). Information‐theoretical complexity metrics. Language and Linguistics Compass, 10(9), 397-412.

Hannagan, T., Agrawal, A., Cohen, L., & Dehaene, S. (2021). Emergence of a compositional neural code for written words: Recycling of a convolutional neural network for reading. Proceedings of the National Academy of Sciences, 118(46), e2104779118.

Hayes, B. (2022). Deriving the wug-shaped curve: A criterion for assessing formal theories of linguistic variation. Annual Review of Linguistics, 8, 473-494.

Heinz, J. (2016). Computational theories of learning and developmental psycholinguistics. In J. Lidz, W. Snyder, & J. Pater (Eds.), The Cambridge handbook of developmental linguistics (pp. 633-663). Cambridge, UK: Cambridge University Press.

Kirby, S., & Tamariz, M. (2022). Cumulative cultural evolution, population structure and the origin of combinatoriality in human language. Philosophical Transactions of the Royal Society B, 377(1843), 20200319.

Lane, H., Howard, C., & Hapke, H. (2019). Chapter 7: Getting words in order with convolutional neural networks (CNNs). In Natural language processing in action: Understanding, analyzing, and generating text with Python (pp. 218-246). Shelter Island, NY: Manning.

Lazaridou, A., & Baroni, M. (2020). Emergent multi-agent communication in the deep learning era. arXiv preprint arXiv:2006.02419.

Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J., & Hinton, G. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21(6), 335-346.

Linzen, T., Dupoux, E., & Goldberg, Y. (2016). Assessing the ability of LSTMs to learn syntax-sensitive dependencies. Transactions of the Association for Computational Linguistics, 4, 521-535.

Mollica, F., & Piantadosi, S. T. (2019). Humans store about 1.5 megabytes of information during language acquisition. Royal Society Open Science, 6(3), 181393.

Perfors, A., Tenenbaum, J. B., Griffiths, T. L., & Xu, F. (2011). A tutorial introduction to Bayesian models of cognitive development. Cognition, 120(3), 302-321.

Rioul, O. (2018). This is it: A primer on Shannon’s entropy and information. In B. Duplantier & V. Rivasseau (Eds.), Information theory (pp. 49-86). Birkhäuser. [Sections 1-14]

Schaeffer, R., Khona, M., & Fiete, I. (2022). No free lunch from deep learning in neuroscience: A case study through models of the entorhinal-hippocampal circuit. 2nd AI4Science Workshop at the 39th International Conference on Machine Learning.

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379-423. [Sections 0-10]

Vasishth, S., Nicenboim, B., Beckman, M. E., Li, F., & Kong, E. J. (2018). Bayesian data analysis in the phonetic sciences: A tutorial introduction. Journal of Phonetics, 71, 147-161.

 

Interesting links

 

Programming languages

 

* Excel: easy to use for many types of calculations (tutorials on 1000s of websites)

* R: powerful statistics programming language <https://cran.r-project.org/>

* Python: most widely used general programming language <https://www.python.org/>

 

Information theory

 

* Shannon entropy calculator <https://www.shannonentropy.netmark.pl/>

* Maxent Grammar Tool <https://linguistics.ucla.edu/people/hayes/MaxentGrammarTool/>
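
(In case it helps to see what the entropy calculator above is doing, here is a minimal Python sketch; the four-symbol distribution is made up purely for illustration.)

    import math

    def shannon_entropy(probs):
        """Shannon entropy in bits: H = -sum of p * log2(p) over nonzero probabilities."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # A made-up four-symbol distribution (probabilities sum to 1)
    print(shannon_entropy([0.5, 0.25, 0.125, 0.125]))  # 1.75 bits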

 

Formal language theory

 

* Automaton Simulator <https://automatonsimulator.com/>

* JFLAP (for downloading) <https://www.jflap.org/>
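
(If you prefer to see an automaton as code rather than as a diagram, here is a minimal Python sketch of a deterministic finite-state automaton; the two-state machine for the regular language (ab)* is just an illustrative example, not part of any assigned reading.)

    def accepts_ab_star(s):
        """Two-state DFA for the regular language (ab)*: state 0 accepts, state 1 has just seen 'a'."""
        state = 0
        for ch in s:
            if state == 0 and ch == "a":
                state = 1
            elif state == 1 and ch == "b":
                state = 0
            else:
                return False  # no legal transition: reject
        return state == 0

    print([accepts_ab_star(s) for s in ["", "ab", "abab", "aab", "ba"]])
    # [True, True, True, False, False]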

 

Bayesian models

 

* Simple Bayes calculator <https://psych.fullerton.edu/mbirnbaum/bayes/bayescalc.htm>

* Stan <https://mc-stan.org/>
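
(The Simple Bayes calculator above just applies Bayes' rule to a single binary hypothesis; here is a minimal Python sketch of the same calculation, with made-up numbers.)

    def bayes_posterior(prior, likelihood, likelihood_if_false):
        """P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|~H)P(~H)]"""
        evidence = likelihood * prior + likelihood_if_false * (1 - prior)
        return likelihood * prior / evidence

    # Made-up numbers: P(H) = 0.1, P(D|H) = 0.8, P(D|~H) = 0.2
    print(bayes_posterior(0.1, 0.8, 0.2))  # about 0.31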

 

Neural networks

 

* Neural network videos (among many, many others)

  - Overview: <https://www.youtube.com/watch?v=pdNYw6qwuNc>

  - LSTM vs. transformers: <https://www.youtube.com/watch?v=S27pHKBEp30>

  - Non-video, picture-based explanations of the same things:
    <https://colah.github.io/posts/2015-08-Understanding-LSTMs/>
    <https://jalammar.github.io/illustrated-transformer/>

  - Convolution: <https://www.youtube.com/watch?v=-QQML5kf26Q>

* Neural network simulator <https://www.mladdict.com/neural-network-simulator>

* Online demo of convolutional network learning to read
<https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html>

* TensorFlow <https://www.tensorflow.org/>

  - Online interface for playing <https://playground.tensorflow.org/>

  - Keras: user-friendly interface for programming <https://keras.io/>

  - R interface for programming <https://tensorflow.rstudio.com/>

* Software for linear discriminative learning
<https://sfs.uni-tuebingen.de/~hbaayen/software.html>
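
(Under the hood, the simulators and playgrounds above are built out of units like the one in this minimal Python sketch: a weighted sum of the inputs passed through a sigmoid activation. The weights and bias are made up so that the unit behaves roughly like a soft AND gate.)

    import math

    def unit(inputs, weights, bias):
        """One artificial neuron: weighted sum of inputs passed through a sigmoid."""
        total = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 / (1 + math.exp(-total))

    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, round(unit(x, weights=[4.0, 4.0], bias=-6.0), 2))
    # [0, 0] 0.0; [0, 1] 0.12; [1, 0] 0.12; [1, 1] 0.88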

 

Language evolution

 

* Language Evolution Simulation (simulates word coinage)
<https://rmeertens.github.io/language-evolution-simulation/>

* Onset (simulates historical sound change) <https://onset.cadel.me/>

* Color Game: mobile app that was used to study human creation of new languages
<https://colorgame.net/>