Record Details


Scientific journal "Radio Electronics, Computer Science, Control"

Field Value
042 (Authentication code) dc
720 (Added entry, uncontrolled name) Bisikalo, O. V.; Vinnytsia National Technical University, Vinnytsia, Ukraine
Grischuk, T. V.; Vinnytsia National Technical University, Vinnytsia, Ukraine
Kovtun, V. V.; Vinnytsia National Technical University, Vinnytsia, Ukraine
653 (Index term, uncontrolled) automated speaker recognition system of critical use; signal processing; neural network; feature analysis
520 (Summary) Context. The paper considers how to adapt a convolutional neural network classifier for use in an automated speaker recognition system of critical use (ASRSCU). The object of research is the individual features of the human speech process.
Objective. To develop means for extracting individual features from the speaker's speech signal, increasing their informativeness through factor analysis, representing them visually for use by a convolutional neural network classifier, and optimizing the classifier's architecture for the needs of an ASRSCU.
Method. Measures are proposed to optimize the speaker recognition procedure of the ASRSCU: the optimal way of representing informative features and the method of increasing their informativeness are theoretically justified, as are the network topology and the measures for increasing the efficiency of the speaker recognition process. In particular, the use of power-normalized cepstral coefficients (PNCC) is justified for describing phonograms recorded in noisy environments. We propose to use Gabor filters to represent the information that will be analyzed by the convolutional neural network; an optimal method of factor analysis (sparse principal component analysis) to reduce the length of the feature vector while preserving its informativeness; and an improved topology of the convolutional neural network. In this topology the Gabor filters are integrated into the convolution layer, which allows their parameters to be optimized during network training, and the fully connected layer is a deep neural network with a bottleneck layer whose weights after training are used as inputs for the GMM/HMM control classifier.
Results. Methods for representing and optimizing the speaker's individual features, methods for their visual presentation, and an improved convolutional neural network topology for speaker recognition based on them have been developed.
Conclusions. The obtained theoretical results have been confirmed empirically. In particular, the robustness of the improved convolutional neural network to noisy input phonograms proved to be higher than that of an ordinary convolutional neural network and a deep neural network. With an SNR increase up to 10 dB, the GMM/HMM classifier is more efficient than the neural network, which can be explained by the efficiency of the UBM models used, but it is much more resource-intensive. In addition, the parameters of the Gabor filter bank frames that provide the most variable individual features from the speech signal for speaker recognition were determined empirically.
260 (Publication, distribution) Zaporizhzhya National Technical University
2018-10-04 12:10:39
856 (Electronic location and access) application/pdf
786 (Data source entry) Radio Electronics, Computer Science, Control; No 2 (2018): Radio Electronics, Computer Science, Control
546 (Language note) uk
540 (Terms governing use and reproduction) Copyright (c) 2018 O. V. Bisikalo, T. V. Grischuk, V. V. Kovtun
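
The summary in this record mentions a bank of Gabor filters applied to a visual representation of speech features. As a rough illustration only, the sketch below builds a small bank of fixed 2-D Gabor kernels with NumPy and applies one to a feature map. The record gives no filter parameters, so the kernel size, wavelength, sigma, and number of orientations are assumptions chosen for the example, and the function names `gabor_kernel`, `gabor_bank`, and `filter_valid` are hypothetical. The paper describes filters whose parameters are tuned during network training; this sketch does not attempt that.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter on a size x size grid (size should be odd)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the orientation angle theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    return envelope * carrier


def gabor_bank(size=9, wavelength=4.0, sigma=2.0, n_orientations=4):
    """Stack of kernels at evenly spaced orientations (illustrative values)."""
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    return np.stack([gabor_kernel(size, wavelength, t, sigma) for t in thetas])


def filter_valid(image, kernel):
    """Valid-mode 2-D cross-correlation, as a convolution layer computes it."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)


# Stand-in for a time-frequency feature map (e.g. coefficients x frames).
features = np.random.default_rng(0).normal(size=(40, 100))
bank = gabor_bank()
responses = np.stack([filter_valid(features, k) for k in bank])
print(responses.shape)  # (4, 32, 92)
```

Each response map could serve as one input channel to a convolutional classifier. A trainable variant would treat `wavelength`, `theta`, and `sigma` as learnable parameters instead of constants.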