Feature Extraction Techniques for Deep Learning based Speech Classification
Author
Abstract

Audio classification is the task of assigning audio data to one of several classes or categories. Speaker recognition, one particular application of audio classification, aims to identify a person from the characteristics of their speech. The phrase "voice recognition" covers both speaker and speech recognition tasks. Speaker verification systems have recently grown in popularity across a variety of uses, such as security measures and personalized assistance. Computers trained to recognize individual voices can swiftly transcribe speech or confirm a speaker's identity as part of a security procedure. Speaker recognition builds on four decades of research into the acoustic characteristics of speech that differ from person to person. Some systems use auditory input from those seeking entry, much as fingerprint sensors match input fingerprints against a database or photographic attendance systems map inputs to a database. Personal assistants such as Google Home, for example, are designed to limit access to authorized users. Even under difficult conditions, these systems must correctly identify or verify the speaker. This research proposes a robust deep learning-based speaker recognition solution for audio classification. We propose self-augmenting the data using four key noise-aberration strategies to improve the system's performance. Additionally, we conduct a comparative study examining the efficacy of several audio feature extractors. The objective is to create a highly accurate speaker identification system that can be applied in practical situations.
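The abstract does not name the four noise-aberration strategies, so as an illustration only, a minimal NumPy sketch of waveform-level self-augmentation is shown below, assuming four common perturbations (additive Gaussian noise at a target SNR, random time shift, random gain scaling, and naive time stretching); the actual strategies used in the paper may differ.

```python
import numpy as np

def add_noise(wave, snr_db=20.0, rng=None):
    """Additive Gaussian noise at a target signal-to-noise ratio (dB)."""
    rng = rng or np.random.default_rng()
    signal_power = np.mean(wave ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return wave + rng.normal(0.0, np.sqrt(noise_power), wave.shape)

def time_shift(wave, max_frac=0.1, rng=None):
    """Circularly shift the waveform by up to max_frac of its length."""
    rng = rng or np.random.default_rng()
    limit = int(len(wave) * max_frac)
    return np.roll(wave, rng.integers(-limit, limit + 1))

def gain_scale(wave, low=0.5, high=1.5, rng=None):
    """Scale the amplitude by a random factor in [low, high]."""
    rng = rng or np.random.default_rng()
    return wave * rng.uniform(low, high)

def time_stretch(wave, rate=1.1):
    """Naive stretch via linear resampling (note: also shifts pitch)."""
    idx = np.arange(0, len(wave), rate)
    return np.interp(idx, np.arange(len(wave)), wave)

def augment(wave, rng=None):
    """Return the original clip plus four perturbed copies."""
    return [wave,
            add_noise(wave, rng=rng),
            time_shift(wave, rng=rng),
            gain_scale(wave, rng=rng),
            time_stretch(wave)]
```

Each augmented copy would then be passed through the same feature extractor (e.g. MFCCs or a spectrogram) as the clean clip, effectively multiplying the training data for the classifier.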

Year of Publication
2023
Date Published
July
URL
https://ieeexplore.ieee.org/document/10307237
DOI
10.1109/ICCCNT56998.2023.10307237