International Workshop on
April 13-15, 2007

Tutorial on experimental syntax

Wayne Cowart
2007.4.13
The tutorial will cover practical aspects of designing, executing, and interpreting experiments addressing syntactic issues. All aspects of the process of developing and executing an experiment will be covered at least briefly. However, the emphasis will be on various manual, semi-automated, and automated techniques that can facilitate the task of constructing materials and questionnaires that correctly and consistently implement the linguist's intended design. The tools considered are those available within commonplace productivity applications such as word processors and spreadsheets. Prior experience with macros and spreadsheet functions is helpful but not necessary. Attendees are invited to bring specific problems relevant to their own research; to the extent possible, these will be discussed as part of the session.
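As an illustration of the kind of semi-automated materials construction the tutorial addresses, a counterbalanced design can be distributed over questionnaire lists with a short script. The Latin-square scheme below is a generic sketch with invented example sentences, not the tutorial's own materials.

```python
def latin_square_lists(items, n_conditions):
    """Distribute items over questionnaire lists so that each list contains
    every item exactly once and each condition equally often (Latin square).

    `items` is a list of item sets; each item set holds the n_conditions
    versions of one experimental sentence."""
    lists = [[] for _ in range(n_conditions)]
    for i, versions in enumerate(items):
        assert len(versions) == n_conditions, "every item needs all conditions"
        for j in range(n_conditions):
            # list j receives condition (i + j) mod n of item i
            lists[j].append(versions[(i + j) % n_conditions])
    return lists

# Hypothetical two-condition design (grammatical vs. ungrammatical versions):
items = [
    ("Who did Mary see?", "Who did Mary see Bill?"),
    ("John thinks Sue left.", "John thinks that left Sue."),
]
for k, questionnaire in enumerate(latin_square_lists(items, 2), start=1):
    print(f"List {k}: {questionnaire}")
```

The same rotation can be implemented with spreadsheet formulas; the point is that the assignment of conditions to lists is computed once, mechanically, rather than by hand.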

Tutorial on empirical methods in phonological research

Mike Hammond
2007.4.13
In this tutorial, we discuss the general theoretical issues and very practical challenges involved in getting started with corpus work or experimental work in phonology. We'll discuss general questions involving what kinds of hypotheses can be tested with these methodologies and we'll consider very practical issues involving how to select a corpus, what kind of experimental software to use, search tools, statistics, etc. We'll take a very simple hypothesis and work through it using both methodologies.
Links to free corpora and software will be available before the tutorial. Attendees are encouraged to download this material and bring their laptops. The tutorial will be scheduled in a room with computers and internet access so that we can work through some examples together.

Grammar & evidence: What are they anyway?

James Myers
2007.4.14
This presentation briefly overviews the notions of grammar and evidence, and explains why we should care about them.

Corpus data vs. experiments in English phonotactics

Mike Hammond
2007.4.14
Three factors have been argued to play a role in phonological systems: phonological patterns, statistical regularities, and typological generalizations. In this paper, we discuss new data that show how these factors interact and propose a model to describe their interaction.
Traditional generative phonology makes the assumption that phonological generalizations are made with minimal regard to the frequency of the patterns in question. Thus, for example, Vowel Shift is a central regularity of English phonology, though some of the data that motivate it are not at all frequent (Chomsky & Halle, 1968).
On the other hand, psycholinguistic research has shown that wellformedness judgments very clearly reflect statistical regularities like neighborhood density and phonotactic probability (Coleman & Pierrehumbert, 1997). For example, subjects' judgments of wellformedness for a nonsense form such as "flink" would reflect i) how many real words it sounds like, and ii) how frequent the components of the form are in real words.
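Both measures are straightforward to operationalize. The sketch below computes a simple version of each over a toy lexicon; the lexicon and the exact formulations are illustrative assumptions, not the measures used in the cited work.

```python
def neighborhood_density(form, lexicon):
    """Count lexicon words differing from `form` by exactly one segment
    (substitution, or a single deletion/addition) -- a standard neighbor
    definition."""
    def is_neighbor(a, b):
        if len(a) == len(b):
            return sum(x != y for x, y in zip(a, b)) == 1
        if abs(len(a) - len(b)) == 1:
            longer, shorter = (a, b) if len(a) > len(b) else (b, a)
            return any(longer[:i] + longer[i + 1:] == shorter
                       for i in range(len(longer)))
        return False
    return sum(is_neighbor(form, w) for w in lexicon)

def phonotactic_probability(form, lexicon):
    """Average positional segment probability: how often each segment of
    `form` occurs in the same position across the lexicon."""
    probs = []
    for i, seg in enumerate(form):
        pool = [w for w in lexicon if len(w) > i]
        probs.append(sum(w[i] == seg for w in pool) / len(pool) if pool else 0.0)
    return sum(probs) / len(probs)

# Toy lexicon in rough orthographic transcription (hypothetical data):
lexicon = ["blink", "flint", "drink", "fling", "think", "frank"]
print(neighborhood_density("flink", lexicon))
print(phonotactic_probability("flink", lexicon))
```

Real studies compute these over phonemic transcriptions of a full lexicon, but the logic is the same: one measure counts similar words, the other scores the familiarity of the form's parts.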
Finally, what is the role of typological frequency? If a pattern occurs frequently in the languages of the world, what role does it play in any specific language? Zamuner, Gerken & Hammond (2004) show that language-specific statistical regularities overwhelm typological regularities in acquisition, but what role do they play in wellformedness?
In this talk, we report on a series of experiments designed to investigate the role of typological generalizations with respect to frequency in wellformedness judgments. We find that typological generalizations play a role, but one that is reversed from what we would expect. Specifically, in "ill-formed" nonsense words, typological regularity plays the role we would expect, with typologically rare forms judged less well-formed. However, in "well-formed" nonsense words, typological regularity plays a reverse role: typologically rare forms are judged more well-formed.
These results demonstrate i) that the distinction between "ill-formed" and "well-formed" is a critical one in the nonsense word task, and ii) that typological generalizations must be incorporated in any model of phonological wellformedness.

Phonologization and the liaison consonants in Taiwan Min and Hakka

H. Samuel Wang
2007.4.14
Phonologization is a process in which a sound that was originally not distinctive becomes distinctive, so that a phonetic property becomes phonological. A case of phonologization in progress can be observed in Taiwan Min and Hakka, where a suffixed particle added to the end of a noun causes a liaison consonant to appear in the second syllable. This liaison consonant is becoming recognized by speakers and is predicted to become phonologized as the change progresses.

The components of phonological data

James Myers
2007.4.14
Like any other linguistic data source, the dictionary attestations commonly analyzed by phonologists reflect both grammar and extra-grammatical factors. The most important of the latter are memory traces of actual words and the ad hoc analogies derived from them. In this talk I first demonstrate how such lexical influences can undermine a phonological analysis unless they are carefully understood in their own right, so that they can be factored out. Since doing this requires analyzing as much of an entire dictionary as possible, I will also introduce a software tool designed to automate most of the process: MiniCorp. MiniCorp permits tagging of a dictionary corpus in phonologically relevant ways so that analogical generalizations can be automatically extracted. The user can then test whether a hypothesized grammatical model contributes anything to the description of the data set beyond analogy alone. From the phonologist's perspective, the formalism used is entirely familiar: ranked Optimality Theoretic constraints (including output-output correspondence constraints to model analogy). Hidden inside MiniCorp, however, are algorithms that take advantage of the deep connection between Optimality Theory and the statistics of categorical data analysis (loglinear modeling). The result is that a phonological data set can be automatically decomposed into its essential components:
Phonological data = Analogy + [Constraint_1 + ... + Constraint_n]_Grammar + Random noise
The hope is that MiniCorp will help bridge the gap between standard phonological practice and the larger world of corpus analysis, merging theoretical depth, statistical power, and ease of use.
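The decomposition above can be illustrated numerically: in a loglinear (MaxEnt-style) setting, the analogy term and weighted constraint violations combine additively in log-odds space, with random noise as the unexplained remainder. The weights and constraint names below are invented for illustration; this is a sketch of the general statistical idea, not MiniCorp's actual machinery.

```python
import math

# Hypothetical fitted weights; in a real analysis these would come from
# loglinear/logistic regression over a tagged dictionary corpus.
weights = {"Analogy": 1.2, "NoCoda": -0.8, "Onset": -1.5}

def predicted_rate(analogy_score, violations):
    """Log-odds of a form being attested = analogy term + weighted
    constraint violations; random noise is whatever the model leaves over."""
    logit = weights["Analogy"] * analogy_score
    logit += sum(weights[name] * n for name, n in violations.items())
    return 1 / (1 + math.exp(-logit))  # logistic link

# A form with strong analogical support but one NoCoda violation:
print(round(predicted_rate(2.0, {"NoCoda": 1, "Onset": 0}), 3))
```

Comparing the fit of a model with constraint terms against one with the analogy term alone is then a standard nested-model test, which is how a grammar's contribution "beyond analogy" can be quantified.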

A statistical analysis of Chinese compounds:
Power-law distribution and morphological productivity

Chao-Jan Chen
2007.4.14
Compounding is a highly productive mechanism of word formation in Chinese. Under the conventional algebraic view of language, the formation of Chinese compounds is generally regarded as a purely rule-governed matter involving semantic or syntactic restrictions imposed on the components. However, against what a pure rule-governed production mechanism would lead us to expect, a statistical analysis of the component characters of V-V compounds in the ASBC corpus by Chen (2005) shows a striking statistical regularity: character connectivity (with compounding regarded as a link connecting two characters) follows a power-law distribution, not a normal distribution. Such a distribution pattern is well known to appear in various kinds of scale-free network structures in real and virtual worlds; it is also viewed as the signature of a growing network with preferential attachment (Barabási & Albert 1999). In this paper the phenomenon will be further explored and argued to suggest a stochastic mechanism of example-based (template-based) analogical creation that plays an important role in forming new compounds. A preliminary stochastic growth model is accordingly proposed to fit the power-law connectivity distribution of the character network. Such a model clearly shows an effect of positive feedback in the dynamic evolution of the compound lexicon.
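A preferential-attachment growth process of the kind Barabási & Albert describe is simple to simulate. The sketch below, with invented parameters, shows how reusing characters in proportion to their current connectivity yields a heavy-tailed, power-law-like degree distribution rather than a normal one; it is a generic illustration, not the paper's proposed model.

```python
import random
from collections import Counter

def grow_network(n_links, p_new=0.2, seed=0):
    """Grow a compound network link by link: with probability p_new a
    brand-new node (character) enters; otherwise an endpoint is reused with
    probability proportional to its current degree (preferential attachment)."""
    rng = random.Random(seed)
    ends = [0, 1]        # flat list of link endpoints; degree = count in list
    next_node = 2
    for _ in range(n_links):
        if rng.random() < p_new:
            a, next_node = next_node, next_node + 1
        else:
            a = rng.choice(ends)   # chosen proportionally to degree
        b = rng.choice(ends)       # second endpoint, also preferential
        ends += [a, b]
    return Counter(ends)           # node -> degree

degrees = grow_network(5000)
# A few hub characters accumulate very many links; most have only a few.
print("nodes:", len(degrees), "max degree:", max(degrees.values()))
```

The positive feedback is visible directly in the code: every link a node gains increases its chance of gaining the next one, which is what produces the hubs characteristic of a power-law distribution.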

Straddling the interface: Investigating the locus of coordination

Wayne Cowart
2007.4.14
Much recent work on the structure of coordinates aims to avoid or minimize features of grammar that are motivated solely by the properties of coordinates. Investigators have attempted to show how, for example, the asymmetric functor-argument relations characteristic of Merge might yield structures that at least seem symmetrical and unheaded, etc.
This talk will attempt to advance this project in an unusual way. It will adopt a skeptical view of the assumption that coordinates are entirely syntactically integrated. Contemplating a cognitive system that clearly deploys a variety of subsystems with diverse competencies, it will ask how investigators might go about ascribing the properties of a given structure type to one or another subsystem, or some interaction or collaboration among them. In particular, it will attempt to identify a wide range of evidence types that should be expected to reflect various possible accounts of the origin and nature of coordinates. These will range over acceptability judgments, acquisition data, evidence from comprehension and production, and studies of aphasic patients, among others.
With this survey in mind, the talk will review a number of specific findings and observations that are relevant to the assumption that all the constituents within a coordinate structure are fully and explicitly integrated within a single phrase marker.
Though no confident conclusion is motivated, the skeptical stance finds significant support. The talk will conclude with some consideration of the kinds of evidence that might help to clarify the status of coordinates.

Numerical methods for probing phonological representations
and their phonetic interpretations in young children

Mary Beckman
2007.4.15
In the first few years of life, most children learn to talk. They progress from the simple squeals, coos, and rudimentary syllables of early vocal play to being able to pronounce longer words and phrases that contain fluent recognizable renditions of all of the consonants and vowels that conventionally distinguish meanings in the ambient speech community. They also learn to perceive spoken language and recognize the forms of words and sentences that they have never heard before. In order to be able to describe this process of becoming a talker/listener of some speech community, phonologists need to count and to measure various things. In this talk, I will review some of the methods that have been used to assess and model the acquisition process over the last thirty years. These methods and models have let us go well beyond the limitations of Jakobson's (1941) proposals, based on a handful of diary studies. However, there is still a great deal that we do not yet understand about phonological acquisition, and I will describe the kinds of measures we will need in order to make similar progress over the next thirty years.

Grammaticality and parsability in sentence processing

Chien-Jer Charles Lin
2007.4.15
The distinction between what is ungrammatical and what is merely difficult may seem self-evident; in language processing, however, such distinctions can be murky. In this talk, we discuss factors that lead to judgments of ungrammaticality: grammatical knowledge, syntactic complexity, processing resources (e.g. working memory), linguistic experience, etc. We draw evidence from experiments of sentence processing to illustrate the effect of these factors and discuss their implications for understanding the architecture of grammar and its interfaces.

Corpus-based research on child phonology

Jane Tsay
2007.4.15
Regarding phonological acquisition, much attention has been paid to universal innate patterns such as markedness constraints. For example, Optimality-theoretic (OT) models of child language acquisition make specific predictions about markedness. However, learning phonology also requires learning the particular sound patterns found in the adult language's specific lexicon. In particular, the lexicon contains crucial information about frequency. We therefore expect both universal markedness and language-specific lexical properties to be available to the child, and possibly in competition. However, frequency information about sound patterns in child language has been difficult to obtain due to methodological limitations: it requires child language corpora that are both very large and rich in phonological detail. In this study, we use data from the Taiwan Child Language Corpus (Tsay, in preparation) to address the interaction between markedness and frequency. The acquisition of syllable types and the acquisition of lexical tones in Taiwan Southern Min are the two major testing domains. Some technical issues about phonological coding in the child corpus will also be illustrated.
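Extracting frequency information of the kind described requires little more than counting over a phonologically coded corpus. The sketch below tallies token frequencies of syllable types, coded as CV skeletons, over a toy transcription; the coding scheme and data are invented for illustration and do not reflect the actual format of the Taiwan Child Language Corpus.

```python
from collections import Counter

def syllable_type_frequencies(utterances, vowels="aeiou"):
    """Token frequencies of syllable types, coded as CV skeletons, from a
    corpus of space-delimited syllable transcriptions."""
    def skeleton(syllable):
        return "".join("V" if ch in vowels else "C" for ch in syllable)
    counts = Counter()
    for utterance in utterances:
        for syllable in utterance.split():
            counts[skeleton(syllable)] += 1
    return counts

# Hypothetical fragment of a transcribed child corpus:
corpus = ["ba ba", "man ba", "ta kau", "ban ta"]
print(syllable_type_frequencies(corpus).most_common())
```

With a sufficiently large and detailed corpus, the same counting logic yields the frequency distributions against which markedness-based predictions can be tested.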

The acquisition of syntactic categories in Chinese:
Issues of bootstrapping and productivity

Thomas Lee
2007.4.15
Generative linguists have long argued for the early onset of syntactic categories in child language based on learnability considerations and empirical evidence for rule-governed combinations in the two-word stage. On the other hand, usage-based linguists have argued for the absence of abstract categories in the early stages of syntactic development, using detailed measures of productivity and versatility, and identifying the lexical contexts in which early word combinations occur. With regard to developmental data, longitudinal case studies from different languages reveal divergences in productivity of early word combinations, as well as the extent to which novel sentences of children can be analyzed as simple expansions or reorganizations of their earlier utterances. Examining data from child Mandarin and child Cantonese from a comparative perspective, I would like to explore (a) whether common methodological criteria can be adopted for ascertaining the emergence of syntactic categories; and (b) how divergent cross-linguistic findings on early syntax can be evaluated with respect to the acquisition of categories.

Panel discussion

Mary Beckman
2007.4.15
The panelists will review and synthesize what we have learned in the Workshop, with focus on two key questions. First, should the mostly quantitative methodologies discussed in this Workshop (and elsewhere: see links page in English or Chinese) become part of the basic training of theoretical linguists? If so, how can they best be taught to students who traditionally have backgrounds in the humanities (e.g., modern languages or philosophy) rather than in the sciences? Second, should the notion of "grammar" remain at the core of theoretical linguistics? If so, should linguists take the trouble to explain it in terms that psychologists and neuroscientists understand and respect? If not, what should be at the core of theoretical linguistics?
There will also be an opportunity for participants to pose their own questions to the panelists, and to discuss any other issues raised during the course of the Workshop.
Last updated on May 10, 2007