MiniJudge FAQ

Last updated on June 16, 2006


1. "Minimalist" experimental syntax

1.1 Are informal methods really worthless?
1.2 What about a "not sure" category for judgments?
1.3 What about directly comparing contrasting forms?

2. Using MiniJudge

2.1 What to do about informed consent forms?
2.2 Why can't MiniJudge send out email surveys automatically?


1.1 Experimental syntax is supposed to "improve" the way native-speaker judgments are collected. Does this mean that linguists have been collecting judgments all wrong for decades, and their work should just be thrown out?

Of course not. There are very good reasons to keep informal methods in the toolkit, rather than replacing them wholesale by more sophisticated methods. Informal methods have always played an important role in science. Since the goal of a scientific argument is to convince one's peers, the degree of formal rigor deemed necessary depends a great deal on the standards conventional in the community. Many firmly established findings were first published in papers that would never pass peer review by today's standards. Even thought experiments, the most "informal" type of experiment imaginable, can be highly convincing, merely by highlighting the theoretical implications of everyday observations. Much linguistic argumentation has the character of thought experiments, since native speakers are invited to test claimed judgments against their own intuitions. This ease of testability means that the results of a typical linguistic "experiment" are replicated far more often than results in fields with more demanding methodological protocols. Finally, there is the issue of limited resources: carrying out rigorous tests of every intuitively obvious empirical claim would be a serious waste of time and energy.

Nevertheless, linguists should understand that the complex methodological protocols of the other experimental cognitive sciences are not caprice; they developed over almost two centuries in the face of very difficult empirical challenges: the mind is a black box, and behavior is messy. Testing hypotheses about the mind using only physical (mostly behavioral) evidence is so difficult to do right that the behaviorists gave up entirely, and even today's cognitive psychologists and neuroscientists retain a high degree of skepticism about mentalist claims that are not backed up by models that explicitly link behavior to the mind. Linguists have highly developed models of mental entities (elements of grammar), and they recognize that competence (mind) and performance (behavior) are not the same thing, but they generally have no interest at all in developing explicit competence-performance linking models. The result is that they have no way to tell when evidence that seems intuitively obvious is actually anything but: tainted by bias, or simply wrong. A psychologist might say that this slippery-slope problem calls all linguistic judgment data into question. However, such a response not only dismisses the arguments in the previous paragraph without any real counterargument, but it is also rather hypocritical, given that "all experiments leak": the discussion section of a typical psychology paper has just about as much speculation and special pleading as that of a typical linguistics paper.

In short, the degree of methodological rigor should be proportional to the subtlety of the effects one is trying to detect. For even more on this issue, see here and here.

1.2 Linguistic acceptability is not an all-or-nothing phenomenon, so why does "minimalist" experimental syntax only permit binary YES/NO judgments? Won't this lose a lot of information, and maybe even force speakers to give weird, biased judgments?

This question is thoroughly addressed here. Briefly, the answer is that binary judgment experiments can provide a lot more information than you might expect, and they probably lead to less bias than judgment experiments permitting a third "not sure" response category.
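To see why pooled binary responses are more informative than they look, consider the following minimal sketch (plain Python, independent of MiniJudge; the counts are invented purely for illustration). Each individual judgment is a bare YES or NO, but the acceptance rates that emerge across judges are gradient, and can be compared with an ordinary statistical test:

    # A minimal sketch, independent of MiniJudge; counts are invented.
    from scipy.stats import fisher_exact

    # Hypothetical pooled YES/NO counts for two sentence types.
    yes_a, no_a = 37, 13   # type A: 37 YES, 13 NO judgments
    yes_b, no_b = 22, 28   # type B: 22 YES, 28 NO judgments

    # Each individual response is binary, but the pooled acceptance
    # rates are gradient:
    rate_a = yes_a / (yes_a + no_a)   # 0.74
    rate_b = yes_b / (yes_b + no_b)   # 0.44

    # A standard test of whether the two rates genuinely differ:
    _, p_value = fisher_exact([[yes_a, no_a], [yes_b, no_b]])
    print(rate_a, rate_b, p_value)

The gradience that a "not sure" category is meant to capture thus reappears anyway, in the response proportions, without giving judges an escape hatch that invites response bias.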

1.3 It is well known that judgments become sharper when contrasting forms are directly pitted against each other. That's why syntax papers tend to put sentences in pairs, so readers can see more easily that one is better than the other. So why do experimental syntacticians insist on testing sentences one by one, in random order?

The most fundamental reason why experimental syntacticians follow the psycholinguistic practice of testing sentences individually is that this best reflects the nature of the hypotheses they are testing. That is, syntax is about individual sentences, not sentence pairs, so the testing unit should be the sentence, just as discourse hypotheses should be tested with strings of sentences. Even Optimality Theory, which relies on a comparison mechanism to make predictions about the grammaticality of forms, is not, strictly speaking, about comparison as an empirical subject matter.

From this it follows that showing judges paired sentences is not only unnecessary, but a downright bad idea. For each sentence of the pair, the other sentence serves as a context. If the experiment consists of nothing but pairs of a certain type, the factor defining the contrast is confounded with the presence of the same type of context. Thus there is no way to know whether any judgment differences really relate to the individual sentences (which is what a syntactic theory cares about) rather than to the sentence pairs as pairs. It is quite conceivable that even when there is in fact no acceptability difference at all, the pairing by itself will lead many judges to assume that there must be one; otherwise, why would the sentences be paired? Concerns about context effects also explain another basic convention of experimental syntax: the randomization of sentence order. For more on randomization in "minimalist" experimental syntax, see here.
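To make the randomization convention concrete, here is a minimal sketch of one simple way to do it (plain Python, not MiniJudge's own code; the sentence labels and judge IDs are hypothetical). Each judge receives an independently shuffled order, reproducible from that judge's ID:

    import random

    # Hypothetical test sentences; in a real design, both members of
    # each contrasting pair are in the list, but nothing marks them
    # as pairs, and independent shuffling keeps any pairing from
    # appearing systematically.
    sentences = ["S1", "S2", "S3", "S4", "S5", "S6"]

    def survey_order(judge_id):
        """Return an independently randomized order for one judge."""
        rng = random.Random(judge_id)   # seed with the judge's ID
        order = sentences[:]            # copy; leave the master list alone
        rng.shuffle(order)
        return order

    for judge in ("judge01", "judge02"):
        print(judge, survey_order(judge))

Seeding the shuffle with the judge's ID is just a convenience: it makes each judge's order recoverable later without storing it.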

Does this mean it's bad practice for syntax papers to show pairs of contrasting sentences? Not at all. A syntactic judgment experiment must have pairs (or sets) of contrasting sentences, or it's not an experiment: an experiment is something that varies one or more theoretically interesting factors while controlling everything else. A contrasting sentence pair in a paper thus simply demonstrates good experimental design. (For more on factorial design, see here.) Moreover, pairing sentences has a valid role to play in informal methodology: if you can detect no acceptability contrast even in this maximally difference-biasing context, then there probably isn't any.
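As a concrete illustration of factorial design (the factor names here are invented; any two binary factors would do), the sketch below mechanically crosses two factors to yield the four conditions of a 2 x 2 design; in a real experiment, each condition would then be filled with a test sentence built on the same lexical frame:

    from itertools import product

    # Two hypothetical binary factors; everything else is held
    # constant across the four resulting conditions.
    factors = {
        "extraction": ["subject", "object"],
        "clause_type": ["declarative", "interrogative"],
    }

    # Crossing the factors gives the 2 x 2 = 4 conditions:
    for extraction, clause_type in product(factors["extraction"],
                                           factors["clause_type"]):
        print("condition:", extraction, "+", clause_type)

The contrasting pairs (or quadruples) thus exist in the design and in the analysis; the judges simply never see them presented as pairs.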

2.1 Testing sentences on myself, or on a colleague or two, may be a kind of "experiment," but fortunately my university doesn't think so, since if it did, I'd have to get my research approved by the Human Subjects Review Board. But if I do "real" experimental syntax, won't this convenient loophole disappear?

Probably. The line between "informal" and "formal" experiments is not very sharp; linguists are constantly confirming judgments with email acquaintances without having to go through all the paperwork. Still, once you start reporting p values and the like, it seems reasonable to expect that your university will require your judges to sign informed consent forms. This is particularly annoying if your participants are indeed just email acquaintances, not formally recruited subjects. Fortunately, syntactic judgment experiments (especially those run with a standardized tool like MiniJudge) are all much the same: the task is well established as harmless, it doesn't invade anyone's privacy, and it doesn't even take very long to do. So you should be able to come up with a standard form to use for years. It should also be acceptable to use a "signable" electronic version (e.g. an editable text file) with emailed surveys. Perhaps you could include a brief version of the consent form in the instructions themselves, heading it with something like "By returning this survey, you understand that..." This paperwork is indeed annoying and pointless, but it would be pretty pathetic to use it as an excuse not to make "minimalist" experimental syntax part of your research toolkit.

2.2 Sending surveys by email makes it easier to collect and analyze judgments, but MiniJudge forces you to send out each survey one by one. Why can't there be a function where MiniJudge automatically sends out individual surveys to individual email addresses?

Automatically sent email is called "spam".


Contact James Myers with your questions and comments.