Abstract
Using voice characteristics to verify identity is an emerging science, with ever-lower error rates being reported. This project investigates the application of this technology to unattended secure banking, such as automated teller machines. The aim was not to produce a final system but to create a verifier whose parameters could be varied to investigate their effects. Following Furui (1981) and Bernasconi (1990), the text-dependent, dynamic time-warping (DTW) platform was chosen.
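For readers unfamiliar with DTW, the following is a minimal sketch of a textbook DTW dissimilarity score between two utterances. It is not the verifier built in this project: the function name `dtw_distance`, the Euclidean local distance, the unconstrained step pattern and the length normalisation are all assumptions made for illustration.

```python
import math

def dtw_distance(a, b):
    """Length-normalised DTW dissimilarity between two sequences of
    feature vectors (hypothetical helper; textbook recursion)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative distance aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    # Normalise by total length so long and short utterances are comparable
    return cost[n][m] / (n + m)

template = [(0.0,), (1.0,), (2.0,)]
slower = [(0.0,), (0.0,), (1.0,), (2.0,)]   # same shape, spoken more slowly
different = [(5.0,), (5.0,), (5.0,)]
print(dtw_distance(template, slower), dtw_distance(template, different))
```

A low score means the test utterance can be warped in time to match the template closely; a high score suggests a different speaker or a different text.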
To help evaluate changes to the verifier configuration, methods of assessment based on the separation between genuine and impostor DTW (dissimilarity) score distributions are proposed. These offer alternatives to the often-quoted but sometimes uninformative equal-error rate. A technique of generating speaker-specific but globally adjustable thresholds is presented for occasions when error rates are of interest. Also, two databases, one with over 200 speakers, have been collected.
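As an illustration of what a separation-based assessment and an adjustable speaker-specific threshold might look like, here is a minimal sketch. The abstract does not define either, so both formulas are assumptions: `separation` is a d-prime-style statistic, and `speaker_threshold` places each speaker's threshold `k` standard deviations above that speaker's mean genuine score, with `k` as the single global control.

```python
from statistics import mean, stdev

def separation(genuine, impostor):
    """d-prime-style gap between two score lists (assumed measure):
    difference of means over the pooled standard deviation."""
    pooled = ((stdev(genuine) ** 2 + stdev(impostor) ** 2) / 2) ** 0.5
    return (mean(impostor) - mean(genuine)) / pooled

def speaker_threshold(genuine, k):
    """Speaker-specific threshold driven by one global parameter k
    (assumed form: mean + k * standard deviation of genuine scores)."""
    return mean(genuine) + k * stdev(genuine)

def accept(score, genuine, k):
    """Accept a claim when the dissimilarity score is at or below threshold."""
    return score <= speaker_threshold(genuine, k)

genuine_scores = [1.0, 2.0, 3.0]    # made-up DTW scores for the true speaker
impostor_scores = [7.0, 8.0, 9.0]   # made-up DTW scores for impostors
print(separation(genuine_scores, impostor_scores))
print(speaker_threshold(genuine_scores, k=2.0))
```

Raising `k` for every speaker at once trades false rejections for false acceptances, while the threshold itself still reflects each speaker's own score statistics.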
Enrolling customers record several versions of the same word or phrase, and these are used to make a characteristic template. Four different approaches are examined. Two keep all of these initial tokens separate, but their performance gain over the other two (combination) methods, those of Bernasconi and Furui (whose approach suffers from arbitrary treatment of initial data), does not merit the extra computation when verifying.
As expected, long utterances are found to work better than short ones, up to a point, probably about 2 seconds. However, combining DTW scores for sequences of individual short words may be as effective.
Traditional techniques for weighting the distance measure treat each vector dimension in isolation. Great gains may be made by instead considering the combined effect of the weights on the separation between score distributions, with the weights generated either by genetic algorithm or purely randomly. In practice, a central database of speech would be necessary to calibrate such a system for each enrolling user. Also, to provide high security, the text to be spoken should consist of a sequence of randomly-chosen short words, such as the digits. Personalised modification of the input speech through the use of weighting functions is found to stabilise the verifier performance and reduce error rates to a level likely to be acceptable to financial institutions.
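The idea of choosing weights by their combined effect on score separation can be sketched with a purely random search. This is a simplified illustration, not the project's procedure: each utterance is reduced to a single feature vector (no DTW), the separation statistic is an assumed d-prime-style measure, and every name here (`weighted_distance`, `random_weight_search`, `trials`, `seed`) is hypothetical.

```python
import random
from statistics import mean, stdev

def weighted_distance(x, y, w):
    """Weighted Euclidean distance between two feature vectors."""
    return sum(wi * (xi - yi) ** 2 for wi, xi, yi in zip(w, x, y)) ** 0.5

def separation(genuine, impostor):
    """Assumed d-prime-style gap between two score lists."""
    pooled = ((stdev(genuine) ** 2 + stdev(impostor) ** 2) / 2) ** 0.5
    return (mean(impostor) - mean(genuine)) / pooled

def score_separation(template, genuine_tokens, impostor_tokens, w):
    """Separation of genuine and impostor scores under weight vector w."""
    g = [weighted_distance(template, t, w) for t in genuine_tokens]
    i = [weighted_distance(template, t, w) for t in impostor_tokens]
    return separation(g, i)

def random_weight_search(template, genuine_tokens, impostor_tokens,
                         trials=200, seed=0):
    """Try random weight vectors and keep the one with the best separation.
    Uniform weights are the starting candidate, so the result is never worse."""
    rng = random.Random(seed)
    best_w = [1.0] * len(template)
    best_s = score_separation(template, genuine_tokens, impostor_tokens, best_w)
    for _ in range(trials):
        w = [rng.random() for _ in range(len(template))]
        s = score_separation(template, genuine_tokens, impostor_tokens, w)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s

# Made-up data: dimension 0 discriminates speakers, dimension 1 is mostly noise
template = (0.0, 0.0)
genuine_tokens = [(0.1, 2.0), (0.2, -1.5), (0.0, 3.0)]
impostor_tokens = [(1.0, 1.0), (1.2, -2.0), (0.9, 2.5)]
weights, best = random_weight_search(template, genuine_tokens, impostor_tokens)
print(weights, best)
```

Because each candidate is judged on the resulting score distributions as a whole, weights for different dimensions are evaluated jointly rather than one at a time. A genetic algorithm would replace the independent random draws with selection and recombination of good candidates.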
Date of Award: Nov 1996