Introduction to Contemporary Linguistics
December 16, 1998
Language processing by you, and by computer

OVERVIEW:
1. Language processing
2. Human language processing: comprehension
3. Human language processing: production
4. Computers as linguistic tools
5. Computers as language users
6. Computers as linguistic modellers

=============================================================

1. Language processing

>So far we have studied people's knowledge of language (competence). But how do people actively use language in real time (performance)? And how can computers use language?
>These questions involve processing: manipulating, changing, analyzing, and/or creating linguistic structures in real time.
>First, we will examine psycholinguistics: the processing of language in the human mind.
>Psycholinguistics is part of cognitive psychology: the study of the mind, which is the activity of the brain.
>The four dimensions of psycholinguistics:
>Level of processing: discourse, syntax, morphology, phonology, etc.
>Development of processing: language acquisition
>Depth of processing:
  Descriptive linguistics: description of language
  Psycholinguistics: study of language behavior
  Neurolinguistics: study of the relationship between language behavior and the physical brain
>Direction of processing: comprehension vs. production

2. Human language processing: comprehension (perception and understanding)

>Overview of language comprehension [OVERHEAD]
>We'll look at just two parts: the lexicon and syntax.
>Lexical access: how people access (reach) the lexicon when listening to speech or when reading
>The speed of lexical access depends on at least two factors:
>Frequency: how common the word is
>Priming: whether the word has already been accessed recently
>The effect of frequency can be seen in experiments using the lexical decision task:
>People are given a list containing both real words and fake words, and they have to decide which is which.
>People are faster if the real words have high frequency. [OVERHEAD]
>Priming: the presentation of one word (the prime) speeds up the response to another word (the target) if they are related in some way.
>Priming in a lexical decision task:
  Prime                       Target      Response to target
  Unrelated prime: 飯 'rice'  狗 'dog'    normal speed
  Related prime:   貓 'cat'   狗 'dog'    faster
>Priming probably works because related words are stored near each other in the brain. [OVERHEAD]
>Priming can be used to study semantic ambiguity, when words have more than one meaning: 中餐 = "lunch" or "Chinese food"
>Question: Does the brain still access both meanings even when the context tells us which one is right?
>Experiment (recently done at Chung Cheng University):
>People heard sentences like this:
  如果有外國客人來訪，我通常不招待他們吃西式飲食，而會請他們吃【中餐】，這樣他們才會有新鮮的感覺，並且留下比較深刻難忘的經驗。
  ("If foreign guests come to visit, I usually don't serve them Western-style food, but instead invite them to eat [Chinese food], so that they will find it novel and take away a deeper, more memorable experience.")
>This context makes it clear that the meaning is "Chinese food", not "lunch".
>As soon as they heard the word 中餐, they would see another word written on a computer screen:
  >Related to this meaning: 東方 'oriental'
  >Related to the other meaning: 午飯 'lunch'
  >Not related to either meaning: 文法 'grammar'
  >Fake word: 美們
>People had to do a lexical decision task with the written words.
>Results: response was faster for BOTH related words!
>Conclusion: People access ALL meanings of a word when they hear a sentence, and they ignore the context!
>This may imply that something about language is innate: word processing and sentence processing are independent, as if they were handled by different "modules" of the brain.
>Now to sentence parsing: deciding on the syntactic structure of a sentence while listening to speech or reading.
>Imagine you are hearing an English sentence. [OVERHEAD]
>First you hear "the". What do you know about the structure of the sentence?
>Then you hear "actor". Now what do you know?
>Then you hear "thanked". What comes next...?
>Question: which way do people parse?
>(A) They wait until they know for sure before deciding on the syntactic structure of the whole sentence.
>(B) They make an immediate guess about where the current word goes, even if this turns out to be wrong.
>Answer: (B). Evidence comes from garden path sentences: grammatical sentences that are difficult to parse because at first they seem to have one structure, but later turn out to have another (so they make you wander down a "garden path" and get lost!).
(1) a. The glass dropped from my hand. [easy]
    b. The glass dropped by the boy broke. [harder]
    c. The glass dropped from my hand broke. [hardest]
>In each sentence, you expect "dropped" to be the main verb of the sentence, not part of a relative clause. So when you get to the word "broke" in (1b,c), you are confused: at first, you parsed the sentence incorrectly! [OVERHEAD]
(2) 在首都機場已經關閉了。("In the capital, the airport has already closed" — at first 首都機場 seems to be one phrase, "the capital airport".)
(3) 我們在別墅下面向海洋。("At the villa we face the ocean" — at first 下面 seems to be the word "underneath", rather than 下 plus the verb 面向 "face".)
(4) 王經理喜歡喝法國葡萄酒的僱員。("Manager Wang likes the employees who drink French wine" — at first it seems to say that Manager Wang likes to drink French wine.)
(5) 我們多買一些蔬菜吃腸胃才會覺得舒服。("If we buy more vegetables to eat, our stomachs will feel comfortable" — at first 吃 seems to take 腸胃 as its object.)
>Hypothesis (B) must be right, since you get "lost". If you waited until the end of the sentence, you wouldn't get lost in the middle.
>But how does it work? How can people make an immediate guess when they don't yet have enough information? Do they follow innate processing principles?

3. Human language processing: production

>Overview of language production [OVERHEAD]
>The easiest way to study language production is to study what happens when it goes wrong: speech errors (slips of the tongue).
>Speech errors come in many kinds (some examples from my own small collection of English spoken by a Taiwanese speaker): [OVERHEAD]
>Speech errors show that linguistic units are really used when we speak: (examples from 陳振宇)
>Morphemes: 我們要玩的遊戲... --> 我們要遊的遊戲... (遊, anticipated from 遊戲 "game", replaces 玩 "play")
>Segments: 大同商專 --> 大同 [ʂan55][tʂuaŋ55] (the final nasals of 商 [ʂaŋ55] and 專 [tʂuan55] are exchanged)
>Features: 怎麼碰到了一位慢郎中 ("How did I run into such a slowpoke?") --> 怎麼碰到了一位[laŋ51][man35]中
>They also show that rules are really used when we speak: (example from 萬依萍)
  Intended utterance                 Wrong tone        Tone rule used
  他很傲。("He is very proud.")  --> 他很[au214]  -->  他[xən35][au214]
  (The tone error on 傲 creates two low tones in a row, so the third-tone sandhi rule changes 很 [xən214] to [xən35].)
>In summary, psycholinguistics helps us understand how the different parts of grammar actually work when people are comprehending and producing language.

4. Computers as linguistic tools

>Now we turn to computers. How can computers help linguists with the study of language?
>(1) Computers can be used simply as tools: basically fancier versions of old tools like paper, pens, and books.
>(2) Computers themselves can use language: they can "talk" to us, "listen" to us, and "understand" what we are saying (hopefully).
>(3) Computers can be used to model language processing, so we can learn more about human language processing.
>One important use of computers as tools is in corpus linguistics: linguistics that uses a large collection of written or spoken language (corpus = "body", as in "body of data"; pl. corpora).
>What can corpus linguistics tell us?
>A corpus can tell us how people REALLY use language, since our intuitions are often inaccurate or incomplete.
>For example, is "這個好" a grammatical NP?
>Here is a sentence from a Chinese corpus using it:
  你對我好，我也對你好，這個“好”就變得具有生命力。
  ("You are good to me, and I am good to you; this '好' then takes on a life of its own.")
>A corpus can tell us how common or rare something is, such as a word (lexical frequency), a combination of words, or a syntactic structure.
>An example: a computerized corpus of Chinese can help us figure out what classifiers really "mean": [OVERHEADS]
  http://www.sinica.edu.tw/ftms-bin/kiwi.sh
See my WWW page for more links about computers and many other topics!
  http://www.ccunix.ccu.edu.tw/~lngmyers/introresources.html

5. Computers as language users

>Computer programs can be written to produce and comprehend language.
>It turns out that production is easier, at least if you know what you want the computer to say.
>Synthesized speech: pronunciation by computers (they don't necessarily understand what they're saying!)
>Synthesized speech relies both on a lexicon (to memorize word pronunciations) and a "grammar" (to change the pronunciation of words in different contexts, and to figure out how to pronounce words that are not in the lexicon).
>An English example: [OVERHEAD/TAPE]
>A Chinese example: [TALKING DICTIONARY]
>Speech recognition: speech perception by computers (they don't necessarily know what the meaning is!)
>Computerized speech recognition is very, very hard.
>The program can't just depend on a dictionary, since the same word will be pronounced differently in different contexts.
>Phonological rules would probably be helpful.
>Another problem is that people's voices are different, which will also confuse the computer.
>There is thus often a trade-off between the number of words and the number of voices that the computer can recognize.
>Parsing: figuring out the syntactic structure of sentences.
>An example of a syntax parser for English:
  http://bobo.link.cs.cmu.edu/cgi-bin/grammar/build-intro-page.cgi
>Computers are actually still pretty bad at speech synthesis, speech recognition, and parsing. This shows just how complex human language really is!
>Many "ordinary" things that people do are still too hard for machines. Sure, computers can play chess, but robots still don't know how to walk without falling over all the time! We still have a lot to learn about ourselves and our minds.

6. Computers as linguistic modellers

>Computers can be used to model theories of language processing to see if they really work.
>For example, we have assumed that people need grammatical rules in order to say things they have never said before:
  "John likes to plip" --> "Yesterday John plipped."
  English past tense rule: Verb + ed
>But are rules really necessary?
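The rule-plus-lexicon view just described can be made concrete as a short program. This is only a minimal sketch: the irregular-verb list and the English spelling adjustments are illustrative assumptions, not part of the lecture.

```python
# A minimal sketch of the rule-plus-lexicon view of the English past tense.
# The irregular list and spelling details are illustrative assumptions.

IRREGULARS = {"go": "went", "sing": "sang", "hit": "hit"}  # memorized exceptions
VOWELS = "aeiou"

def past_tense(verb: str) -> str:
    """Check the lexicon first; otherwise apply the rule Verb + ed."""
    if verb in IRREGULARS:
        return IRREGULARS[verb]
    if verb.endswith("e"):                       # like --> liked
        return verb + "d"
    # spelling adjustment: double a final consonant after a short vowel
    if (len(verb) >= 3 and verb[-1] not in VOWELS + "wxy"
            and verb[-2] in VOWELS and verb[-3] not in VOWELS):
        return verb + verb[-1] + "ed"            # trip --> tripped
    return verb + "ed"                           # the default rule: Verb + ed

print(past_tense("plip"))   # plipped -- even a novel verb gets a past tense
print(past_tense("go"))     # went -- the lexicon overrides the rule
```

The key property of the rule view is visible here: "plip" is not stored anywhere, yet the program still produces a past tense for it.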
>Maybe people just guess based on examples they have heard before:
  If:   trip --> tripped
        slip --> slipped
        clip --> clipped
  Then: plip --> plipped
>Maybe so, but how can we describe this idea precisely? With a kind of computer model called connectionism.
>Connectionist model: a computer program that stores information in the connections of a network, similar to the way the brain stores information in a network of brain cells.
>A connectionist model of the English past tense: [OVERHEAD]
>This model can make guesses about past tense forms even for words it doesn't know, since all the examples it has ever seen are encoded in the network of connections: [OVERHEAD]
>However, the guesses are often wrong, so maybe we do need rules after all: [OVERHEAD]
>Another example: a program that figures out where the syllables are in words without using any rules, just "constraints":
  http://www.u.arizona.edu/ic/hammond/mhlwfront.html
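The example-based idea can also be made precise without any network machinery. Here is a minimal sketch that guesses a past tense purely by analogy to the most similar stored verb; the example list and the ending-overlap similarity measure are illustrative assumptions, not the connectionist model shown on the overhead.

```python
# A minimal sketch of guessing by analogy: no rule, just stored examples.
# The example list and the similarity measure are illustrative assumptions.

EXAMPLES = {"trip": "tripped", "slip": "slipped", "clip": "clipped",
            "sing": "sang", "ring": "rang", "walk": "walked"}

def shared_ending(a: str, b: str) -> int:
    """How many final letters the two words share."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def guess_past(verb: str) -> str:
    # Find the stored verb whose ending overlaps most with the new verb...
    best = max(EXAMPLES, key=lambda w: shared_ending(w, verb))
    past = EXAMPLES[best]
    k = shared_ending(best, verb)
    # ...and copy that verb's base-to-past change onto the new verb.
    return verb[:len(verb) - k] + past[len(best) - k:]

print(guess_past("plip"))   # plipped, by analogy with slip/clip
```

Like the connectionist network, this guesser generalizes from stored examples with no explicit rule, and like the network it can guess wrong, since it has no way of knowing which verbs are irregular.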