


Integrated Tools for Exploitation of a Spontaneous Speech Corpus of Spanish.
(Antonio Moreno-Sandoval, José M. Guirao)

Antonio Moreno-Sandoval
antonio.msandoval@uam.es
Universidad Autónoma de Madrid
Madrid
Spain
José M. Guirao
jmguirao@ugr.es
Universidad de Granada
Granada
Spain

This paper will present a query system for a corpus of spontaneous speech, specifically the Spanish C-ORAL-ROM corpus (Cresti & Moneglia eds. 2005; Moreno et al. 2005). The corpus consists of 181 transcribed sessions covering different registers and communicative situations. With over 42 hours of recorded data and almost 500 speakers, it contains 312,000 tokens (words) representing 21,000 different types.

The system can currently be accessed through a web page, and it will also be available as a standalone application. It consists of three main components:

  1. A concordancer of text and sound: the system searches all the texts for words or multi-word expressions and retrieves every “utterance” in which the search string appears, along with the original sound fragment (in mp3). This way the user can hear the original source, not only read its transcription. (Figure 1)
  2. A morphological analyser of Spanish, based on a broad-coverage lexicon, which provides all possible analyses for a given word form. (Figure 2)
  3. A Part-of-Speech tagger for sentences in Spanish, which provides the surface syntactic analysis for the sequence. (Figure 3)
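
The concordancer described in the first component can be illustrated with a minimal sketch. It assumes that each utterance is stored as a record pairing its transcription with the name of an aligned mp3 fragment. The `Utterance` record, the `concordance` function, the file names and the sample sentences are all hypothetical and are not taken from the actual system or from the corpus.

```python
import re
from dataclasses import dataclass


@dataclass
class Utterance:
    """One transcribed utterance and its audio clip (hypothetical record layout)."""
    session: str
    speaker: str
    text: str
    audio_file: str  # name of the mp3 fragment aligned with this utterance


def concordance(utterances, query):
    """Return every utterance that contains `query` as a whole word or word sequence."""
    words = [re.escape(w) for w in query.split()]
    if not words:
        return []
    # Allow any whitespace between the query words and require word
    # boundaries on both sides, so "sobr" does not match "sobre".
    pattern = re.compile(
        r"(?<!\w)" + r"\s+".join(words) + r"(?!\w)", re.IGNORECASE
    )
    return [u for u in utterances if pattern.search(u.text)]


# Invented sample data, used only to exercise the function.
sample = [
    Utterance("s01", "ANA", "al fin y al cabo es lo mismo", "s01_0001.mp3"),
    Utterance("s01", "LUI", "puso el sobre en la mesa", "s01_0002.mp3"),
    Utterance("s02", "MAR", "pues al fin llegaron", "s02_0001.mp3"),
]

for hit in concordance(sample, "al fin y al cabo"):
    # Each hit carries the audio file name, so an interface could play the clip.
    print(hit.session, hit.speaker, hit.text, hit.audio_file)
```

Returning the whole utterance record, not just the matched string, is what lets an interface show the transcription and offer the corresponding sound fragment side by side.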

This tool extends what is possible with the original version published by John Benjamins on DVD. In particular, we will give some examples of its application to teaching and learning Spanish as a second language, and to describing properties of spoken Spanish.


Figure 1 shows the concordances for the multi-word expression al fin y al cabo (after all).
Figure 2 displays all possible analyses for the word sobre (about, envelope, to be left over).
Figure 3 shows the PoS analysis for the sentence "John put an envelope on the table".

References

  • Cresti, Emanuela, and Massimo Moneglia, eds. 2005. C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages. Amsterdam: John Benjamins.
  • Moreno, Antonio, Guillermo de la Madrid, Manuel Alcántara, et al. 2005. 'The Spanish corpus'. In C-ORAL-ROM: Integrated Reference Corpora for Spoken Romance Languages, 135-161. Amsterdam: John Benjamins.