
Speech Separation in the Frequency Domain with Autoencoder

Hao D. Do 1,2,3, Son T. Tran 1,2, and Duc T. Chau 1,2
1. University of Science, Ho Chi Minh City, Vietnam
2. Vietnam National University, Ho Chi Minh City, Vietnam
3. OLLI Technology JSC, Ho Chi Minh City, Vietnam

Abstract—Speech separation plays an important role in speech-related systems because it can denoise, extract, and enhance the speech signal, and thereby improve the accuracy and performance of the system. Many recent approaches separate speech only from typical high-frequency noise or from one particular background sound. We propose a more powerful approach that combines an autoencoder with a bandpass filter to separate speech signals. This combination can extract speech from mixtures containing not only high-frequency noise but also many different kinds of background sound, and it can be applied flexibly to new background sounds. Experimental results show that our model extracts the speech signal quickly and effectively, achieving 9.01 dB in SIR and 11.26 dB in SDR. In addition, the passband can be adjusted to set the frequency range of the output signal for particular applications.
Index Terms—Speech separation, autoencoder, bandpass filter, frequency domain
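
To make the adjustable-passband idea in the abstract concrete, the following is a minimal sketch of a bandpass filter applied in the frequency domain. It is not the authors' implementation: the function name `bandpass_fft`, the ideal brick-wall mask, and the default 300-3400 Hz passband (a conventional telephone speech band) are all assumptions chosen for illustration, and the autoencoder stage is not shown.

```python
import numpy as np


def bandpass_fft(signal, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Zero every frequency bin outside [low_hz, high_hz].

    Hypothetical helper: an ideal (brick-wall) filter with an assumed
    default passband, not the filter design used in the paper.
    """
    spectrum = np.fft.rfft(signal)                       # one-sided spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs <= high_hz)        # True inside the passband
    return np.fft.irfft(spectrum * mask, n=len(signal))  # back to the time domain


# Usage: a 1 kHz tone (inside the passband) mixed with a 6 kHz tone (outside).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate                 # one second of samples
in_band = np.sin(2 * np.pi * 1000 * t)
out_band = 0.5 * np.sin(2 * np.pi * 6000 * t)
filtered = bandpass_fft(in_band + out_band, sample_rate)
```

Changing `low_hz` and `high_hz` changes which frequency range survives in the output, which is the kind of adjustment the abstract describes. A practical system would more likely use a smooth filter response than a hard mask, since abrupt cutoffs can introduce ringing.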

Cite: Hao D. Do, Son T. Tran, and Duc T. Chau, "Speech Separation in the Frequency Domain with Autoencoder," Journal of Communications, vol. 15, no. 11, pp. 841-848, November 2020. doi: 10.12720/jcm.15.11.841-848

Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.