

AV1000
Fundamentals of the digital humanities
Explorations in large corpora: the (translated) speeches of Fidel Castro

Overview

This exercise in text-analytic exploration centres on the collected speeches of Dr Fidel Castro Ruz (1926– ), a lawyer, Marxist revolutionary and the Cuban head of state from 1959 to the present.

From 1959, when he ousted the former ruler Fulgencio Batista, until the present day, Castro has spoken prolifically in person at all kinds of public events, major and minor, as well as on television and radio. For the first decade of his rule, the Castro Speech Database maintained by the Latin American Network Information Center (University of Texas at Austin) records 411 speeches, interviews and press conferences, significantly in excess of 10 megabytes of text. As becomes clear from reading through the text, for more than 40 years Castro, an intelligent and articulate man, has been passionately and relentlessly telling the Cuban people and others what the Cuban revolution is about. Indeed, it is hard not to conclude that in the eyes of his people he must very early on have himself become the revolution he is constantly defining.

As the number of speeches suggests, Castro could hardly keep from repeating himself to a significant degree. A high degree of repetition would suggest in turn that simple text-analytic techniques should easily reveal his main lines of thought. The point of this exercise is to explore what these might be under circumstances that prevent you from reading enough of the text to find out in the ordinary way.
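One simple technique of the kind meant here is a word-frequency count. The following is a minimal sketch in Python, not a prescribed method for the exercise: the function name top_words, the very short stopword list and the sample sentence are all assumptions made for illustration, and the sentence is invented, not a quotation from the corpus.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; a fuller list would be
# needed for serious work on the corpus).
STOPWORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "we", "it", "for", "our"}

def top_words(text, n=10, stopwords=STOPWORDS):
    """Return the n most frequent words in text, ignoring case and stopwords."""
    # Lower-case the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Invented sample sentence, used only to show the shape of the result.
sample = "The revolution is the people, and the people are the revolution."
print(top_words(sample, n=2))
```

Applied to one of the corpus files instead of the sample, the same call would list the words Castro uses most often once common function words are set aside, which is a rough first indication of recurring themes.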

The data

The data are provided in two ways:

  1. 3 files of uncompressed, plain text: castro1.txt (3.04 MB), castro2.txt (3.46 MB), castro3.txt (3.32 MB)
  2. 3 zipped files: castro1.zip (1.03 MB), castro2.zip (1.2 MB), castro3.zip (1.1 MB)

If you download the latter, you will need WinZip or the equivalent. Once you have the three text files, you are ready to explore the corpus.
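If you prefer to unzip and read the files programmatically, the steps above might be sketched in Python as follows. This is an illustration under stated assumptions: that each zip archive holds the matching .txt file at its top level, that the files sit in the current directory, and that they can be read as UTF-8 (undecodable bytes are replaced). The function name load_corpus is hypothetical.

```python
import zipfile
from pathlib import Path

# File names taken from the list of downloads above.
FILE_STEMS = ["castro1", "castro2", "castro3"]

def load_corpus(directory="."):
    """Return {filename: text} for the corpus files found in directory."""
    directory = Path(directory)
    corpus = {}
    for stem in FILE_STEMS:
        txt_path = directory / f"{stem}.txt"
        zip_path = directory / f"{stem}.zip"
        # Unzip only if the plain-text file is not already present.
        # Assumption: the archive contains <stem>.txt at its top level.
        if not txt_path.exists() and zip_path.exists():
            with zipfile.ZipFile(zip_path) as archive:
                archive.extractall(directory)
        if txt_path.exists():
            # Assumption: UTF-8 with replacement is an acceptable way to read the files.
            corpus[txt_path.name] = txt_path.read_text(encoding="utf-8", errors="replace")
    return corpus

# Report a rough word count for each file that was found.
for name, text in load_corpus().items():
    print(name, len(text.split()), "words")
```

Files that are missing are simply skipped, so the loop prints nothing until at least one of the downloads is in place.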

revised November 2007