Interspeech Special Session

Call for Participation: Speech recognition in the cockpit: robustness to noise and non-native speech

Novel techniques for the NATO non-native Air Traffic Control and HIWIRE cockpit databases

The aim of this special session is for researchers to apply state-of-the-art feature extraction, acoustic modeling and adaptation algorithms to the problem of (hands-free) speech recognition in the cockpit. Two main conditions have to be addressed: (i) speech signals corrupted by additive noise, and (ii) non-native speaker input.

For this special session, the HIWIRE database, collected and packaged under the auspices of the IST-EU STREP project HIWIRE (Human Input that Works in Real Environments) [1], is made freely available to participants. The database contains 8100 English utterances spoken by 81 non-native speakers (31 French, 20 Greek, 20 Italian and 10 Spanish); the utterances correspond to human input in a command-and-control aeronautics application. The speech was recorded in a studio, and noise recorded in the cockpit was then artificially added to it. A description of the database is also available.
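Artificially adding a noise recording to clean speech is commonly done by scaling the noise so that the mixture reaches a target signal-to-noise ratio (SNR). The sketch below assumes that approach; the actual mixing procedure and the SNR values behind the HIWIRE low-, mid- and high-noise conditions are not stated here, so `add_noise_at_snr` and its parameters are hypothetical.

```python
from typing import Optional

import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float,
                     rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Add a noise recording to clean speech so the mixture has the given SNR (dB).

    Hypothetical helper: not the procedure used to build the HIWIRE noisy sets.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if len(noise) < len(speech):
        # Loop the noise recording if it is shorter than the utterance.
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    if rng is None:
        rng = np.random.default_rng()
    # Pick a random noise segment with the same length as the utterance.
    start = rng.integers(0, len(noise) - len(speech) + 1)
    segment = noise[start:start + len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(segment ** 2)
    if p_noise == 0:
        raise ValueError("noise segment is silent")
    # Choose gain g so that 10*log10(p_speech / (g^2 * p_noise)) == snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * segment
```

Here the SNR is measured over the whole utterance, silences included; an SNR computed only over active speech would lead to different noise gains for the same nominal level.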

The signals are provided in four conditions: clean (studio recordings with a close-talking microphone) and low-, mid- and high-noise (cockpit noise artificially added to the data). Two research tracks are proposed: the "robust non-native" (RNN) and "non-native adaptation" (NNA) tasks. You are free to participate in one or both tracks. Baseline HTK setup scripts for both tracks are provided along with the HIWIRE database, together with baseline word error rates and sentence error rates for comparing results. The training scripts use the TIMIT database, which is not included in the distribution and must be acquired separately.
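The baseline figures are word error rate (WER) and sentence error rate (SER). A minimal sketch of the usual definitions: WER is the total number of word substitutions, deletions and insertions from a minimum-edit-distance alignment, divided by the number of reference words; SER is the percentage of sentences containing at least one error. It assumes whitespace-tokenized transcripts; the function names and example phrases are hypothetical, and official comparisons should use the numbers produced by the provided HTK scripts.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,               # deletion of a reference word
                         cur[j - 1] + 1,            # insertion of a hypothesis word
                         prev[j - 1] + (r != h))    # substitution, or match at no cost
        prev = cur
    return prev[-1]


def error_rates(pairs: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (WER %, SER %) over (reference, hypothesis) sentence pairs."""
    errors = words = wrong_sentences = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        e = edit_distance(r, h)
        errors += e
        words += len(r)
        wrong_sentences += e > 0  # a sentence is wrong if it has any word error
    return 100.0 * errors / words, 100.0 * wrong_sentences / len(pairs)
```

For instance, with one correctly recognized sentence and one sentence containing a substitution and an insertion, over seven reference words in total, `error_rates` returns a WER of about 28.6% and an SER of 50%.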

Track 1: Robust non-native (RNN) task
The clean and/or noisy non-native HIWIRE data will be used for this task. All of the data provided should be used for testing (no model adaptation is allowed). The purpose of this track is to investigate novel algorithms for feature extraction and (unsupervised) feature transformation.
Track 2: Non-native adaptation (NNA) task
The data is split into adaptation and test sets: up to 50% of each speaker's utterances can be used for adaptation, and 50% of each speaker's utterances are used for testing. Supervised model adaptation in this task can be performed using 10%, 20% or 50% of the utterances of each speaker. The purpose of this track is to investigate acoustic modeling and adaptation algorithms for dealing with non-native speech.
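The per-speaker split of the NNA track can be sketched as follows. This is a hypothetical protocol, assuming that half of each speaker's utterances are always held out for testing and that the 10%, 20% or 50% adaptation subset is drawn from the other half; if the HIWIRE distribution ships its own adaptation and test lists, those should be used instead. The function name and its `seed` parameter are illustrative only.

```python
import random
from collections import defaultdict


def split_per_speaker(utterances, adapt_fraction, seed=0):
    """Split (speaker, utterance_id) pairs into adaptation and test lists.

    Hypothetical protocol: per speaker, the first half of a shuffled list is the
    test set, and the adaptation subset is taken from the remaining half.
    """
    if adapt_fraction not in (0.1, 0.2, 0.5):
        raise ValueError("adapt_fraction must be 0.1, 0.2 or 0.5")
    by_speaker = defaultdict(list)
    for speaker, utt in utterances:
        by_speaker[speaker].append(utt)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    adapt, test = [], []
    for speaker in sorted(by_speaker):
        utts = sorted(by_speaker[speaker])
        rng.shuffle(utts)
        half = len(utts) // 2
        # Never take more adaptation utterances than the non-test half holds.
        n_adapt = min(round(adapt_fraction * len(utts)), len(utts) - half)
        test += [(speaker, u) for u in utts[:half]]
        adapt += [(speaker, u) for u in utts[half:half + n_adapt]]
    return adapt, test
```

Drawing every adaptation subset from the same non-test half keeps the test set identical across the 10%, 20% and 50% conditions, so their error rates remain directly comparable.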

Research Areas

We encourage submission of papers that deal with one or more of the following research areas (the list is not exhaustive) and present results on the HIWIRE database:

Authors may choose to present results on a subset of the data, e.g., clean data only for Track 1. Papers will be treated as regular papers in the Interspeech paper submission procedure.

Obtaining the data

To acquire a copy of the data please use contact or

References

  1. EU-IST HIWIRE project web site: http://www.hiwire.org/
  2. A. Potamianos, G. Bouselmi, D. Dimitriadis, D. Fohr, R. Gemello, I. Illina, F. Mana, P. Maragos, M. Matassoni, V. Pitsikalis, J. Ramirez, E. Sanchez-Soto, J. Segura, and P. Svaizer, "Towards Speaker and Environmental Robustness in ASR: the HIWIRE project," in Proc. Workshop on Speech Recognition and Intrinsic Variation, Toulouse, France, May 2006.

Contact

For questions, clarifications or bug reporting on task definitions and HTK scripts please contact or

Organizing committee: Thibaut Ehrette (Thales Research), Dominique Fohr (LORIA), Petros Maragos (National Technical University of Athens), Marco Matassoni (ITC-IRST), Alexandros Potamianos (Technical University of Crete), Jose C. Segura (Universidad de Granada).

Co-organizer: David van Leeuwen (TNO Human Factors)