KCL • CCH • Minor programme • AV1000 • Text-analysis
The use of a program like MonoConc looks to some like a replacement of critical thought by technology. But as we've seen, it is a tool for use by a skilled and informed scholar, not a machine that generates knowledge on its own. These notes review some of the chief problems that need to be kept in mind.
Johanna Rubba, “An overview of the English morphological system”, has a useful brief summary that gives examples of the many ways that English words change in form.
From James W. Parins and Todd K. Bender, “Preface”, A Concordance to Conrad's The Nigger of the Narcissus (New York: Garland, 1981), vii–ix:
Even a brief examination of Conrad's lexicon yields some surprising results. For example, Ian Watt's influential study follows suggestions in Conrad's “Preface” and argues that Conrad's purpose in The Nigger of the Narcissus is to “enact the complexities and contradictions of solidarity.” He points out Conrad's own comparison of this work to Stephen Crane's The Red Badge of Courage. Conrad asserts that both works deal with the psychology of the mass. Crane's treats the large mass of the army, Conrad's the smaller grouping of the ship's crew. Watt sees these novels as expressing a very simple idea which lies at the core of all Conrad's thought, a commitment to “solidarity.” The old sailor Singleton is the true emblem of the work, steering steadfastly amidst the turmoil of nature and man. If, as Watt maintains, “solidarity” is the theme of The Nigger of the Narcissus, a glance at the Word Frequency Table shows that it must be a theme undenominated in the text. To be sure, a novel may have for its main theme “war” and “peace” without actually using the words “war” and “peace” excessively. Nevertheless, even the related vocabulary indicates a surprising lack of explicit reference to the general concept of “solidarity” which we might expect to be more usually formulated perhaps in Conrad's own term “fidelity.” “Fidelity” occurs only once in the text, and that reference at 162.14 seems almost purposely contrived to confound the thematic critic: “We commenced to believe Singleton, but with unshaken fidelity dissembled to Jimmy.” Here is a strange, dissembling, slippery “fidelity” indeed. If Watt had approached this text with an accurate tabulation of the words actually used by Conrad, would he have read it the same way that he does under the influence of Conrad's “Preface” to the work in which Conrad speaks of “the solidarity that knits together the loneliness of innumerable hearts”? 
Watt seems to demonstrate an interesting case of a reader whose understanding of the text is dictated by the framing devices of the author. If Gulliver presents himself as a straightforward, unimaginative fellow, we nevertheless hesitate to accept his Lilliputians as true—unless we are very unsophisticated as readers. If Marlow tells us he hates a lie, we nevertheless can spot that he sometimes is inaccurate. If Conrad writes a “Preface” telling us that this story is very simple and straightforward, we have a similar problem as the reader: a turbulence created between what we are told to expect and what the work actually does. The most interesting question defined when we bring this concordance to bear on Watt's assertions is, “Why is Watt gratified to translate the vocabulary actually stated in the text into an unstated category, solidarity?” One of the main functions of our work with verbal indexes and other tabulations of data is to turn critics back to a self-scrutiny, asking how their intuitions match, or fail to match, the evidence of the text.
Jorge Luis Borges, “The Garden of Forking Paths”, translated by Donald A. Yates, in Labyrinths (New York: New Directions, 1964), 27:
“In a riddle whose answer is chess, what is the only prohibited word?”
I thought a moment and replied, “The word chess.”
Is there really no language in Conrad's story to suggest ideas of solidarity or fidelity? It's easy to check: online texts of the story are readily available, and an online version of Roget's thesaurus (such as this one) makes it easy to generate lists of other terms relating to solidarity and fidelity to look for. As we've already seen, such a theme might also be expressed in ways other than this sort of direct mention.
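A check of this kind can be sketched in a few lines of Python. This is only an illustration, not a feature of MonoConc: the function names and the term list are hypothetical, and the sample is the single sentence quoted from the concordance preface above rather than the whole story.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def count_terms(text, terms):
    """Count how often each term in `terms` occurs as a whole word in `text`."""
    counts = Counter(tokenize(text))
    return {term: counts[term] for term in terms}

# Hypothetical term list, of the kind one might gather from a thesaurus entry.
related_terms = ["solidarity", "fidelity", "loyalty", "faithful", "comrade", "together"]

# The one sentence quoted above; a real check would load the full text instead.
sample = ("We commenced to believe Singleton, but with unshaken fidelity "
          "dissembled to Jimmy.")

print(count_terms(sample, related_terms))
```

The sketch matches exact word forms only, so "faithfully" or "comrades" would go uncounted. That is the morphology problem noted earlier, and a zero in such a table does not show that the idea is absent.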
Samuel Johnson, opening of the “Preface” to his edition of Shakespeare's works (1765):
That praises are without reason lavished on the dead, and that the honours due only to excellence are paid to antiquity, is a complaint likely to be always continued by those, who, being able to add nothing to truth, hope for eminence from the heresies of paradox; or those, who, being forced by disappointment upon consolatory expedients, are willing to hope from posterity what the present age refuses, and flatter themselves that the regard which is yet denied by envy, will be at last bestowed by time.
That sentence is 87 words long. How wide a span would we need in a concordance program to work with such a sentence properly?
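To make the question concrete, here is a minimal keyword-in-context sketch in Python with an adjustable span, measured in words on each side of the keyword. The function name and the choice to measure the span in words (not characters) are assumptions for illustration; the programs used in class may define span differently.

```python
import re

def kwic(text, keyword, span):
    """Return (left, keyword, right) contexts with `span` words on each side."""
    words = re.findall(r"\S+", text)
    hits = []
    for i, word in enumerate(words):
        # Strip surrounding punctuation before comparing, ignoring case.
        if word.strip(".,;:!?\"'“”‘’()").lower() == keyword.lower():
            left = " ".join(words[max(0, i - span):i])
            right = " ".join(words[i + 1:i + 1 + span])
            hits.append((left, word, right))
    return hits

sentence = (
    "That praises are without reason lavished on the dead, and that the "
    "honours due only to excellence are paid to antiquity, is a complaint "
    "likely to be always continued by those, who, being able to add nothing "
    "to truth, hope for eminence from the heresies of paradox; or those, "
    "who, being forced by disappointment upon consolatory expedients, are "
    "willing to hope from posterity what the present age refuses, and "
    "flatter themselves that the regard which is yet denied by envy, will "
    "be at last bestowed by time."
)

print(len(sentence.split()), "words in the sentence")
for left, word, right in kwic(sentence, "paradox", span=5):
    print(f"{left} [{word}] {right}")
```

With a span of 5 the line shows 11 of the sentence's words. Try larger values of `span` to see how much context is needed before the grammar of the sentence becomes recoverable.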
How effective are the concordance programs we've looked at for getting an idea of the distribution of a word in the text rather than its frequency alone?
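One simple way to look at distribution rather than raw frequency is to cut the text into equal slices and count the word in each slice. The Python sketch below assumes this slicing approach, with a hypothetical function name and a toy text; it is not a description of how any particular concordance program reports distribution.

```python
import re

def distribution(text, keyword, segments=10):
    """Count occurrences of `keyword` in each of `segments` equal slices of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    for position, word in enumerate(words):
        if word == keyword.lower():
            # Map the word's position to a segment index from 0 to segments - 1.
            counts[position * segments // len(words)] += 1
    return counts

# Toy text invented for illustration: "sea" clusters at the start.
toy_text = "sea sea sea ship ship ship ship ship sea ship ship ship"
print(distribution(toy_text, "sea", segments=4))  # [3, 0, 1, 0]
```

Two words with the same total frequency can give very different lists: one spread evenly through the text, the other concentrated in a single episode.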
Does the sequence of words in a text matter; that is, is a text different when you read the words in their original order? Does the context give some words more weight than others? Could a hapax legomenon have more weight than a frequently used word?
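Hapax legomena are easy to list mechanically, which makes it possible to test the last question against an actual text. The Python sketch below is a minimal illustration with a hypothetical function name, run on the single Conrad sentence quoted earlier instead of a full text.

```python
import re
from collections import Counter

def hapax_legomena(text):
    """Return, in alphabetical order, the words that occur exactly once in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sorted(word for word, n in counts.items() if n == 1)

sample = ("We commenced to believe Singleton, but with unshaken fidelity "
          "dissembled to Jimmy.")
print(hapax_legomena(sample))
```

The list says nothing about weight: whether a word that occurs once matters more than one that occurs a hundred times is a judgement the reader still has to make.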
Text analysis typically looks closely at word frequency and collocation, but what other literary aspects of texts might also matter? Rhythm, assonance, allusion, …
H-Bot is a web site that answers historical questions using very simple text-analytic techniques. Daniel J. Cohen and Roy Rosenzweig, “Web of lies? Historical knowledge on the Internet”, First Monday 10:12, December 2005, explains how this resource works (see especially this section): it automates the process of looking around the web, comparing answers, and choosing what comes out most often, possibly weighted to favour more reliable sites. Note also the discussion of how H-Bot works differently for different forms of a question, because it doesn't actually know English.
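The general idea of comparing answers and choosing the one that comes out most often, with optional weighting, can be sketched as a weighted vote. The Python below is not H-Bot's actual code: the function name, the candidate answers, and the reliability weights are all invented for illustration.

```python
from collections import defaultdict

def most_supported_answer(candidates):
    """Pick the answer with the highest total weight.

    `candidates` is a list of (answer, weight) pairs, one per page consulted;
    a weight above 1.0 stands for a source assumed to be more reliable.
    """
    totals = defaultdict(float)
    for answer, weight in candidates:
        totals[answer] += weight
    return max(totals, key=totals.get)

# Invented example: five pages give three different answers to one question.
found = [("1809", 1.0), ("1809", 1.0), ("1808", 1.0), ("1809", 2.0), ("1908", 0.5)]
print(most_supported_answer(found))  # 1809
```

Nothing in this procedure checks whether the winning answer is true; it only reports what the consulted pages most often say, which is why the choice and weighting of sources matters so much.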
But what about very recent findings, or little-known findings? More complex questions, beyond the merely factual?
revised December 2007