Welcome to Zhihao Du(杜志浩)’s homepage.

Highlights

  1. CosyVoice has been open-sourced: [Code][Paper][Demos]
  2. FunAudioLLM has been released: [Code]

About(关于)

I'm a senior researcher at the Speech Lab, DAMO Academy, Alibaba Group. I received my Ph.D. degree from the School of Computer Science and Technology at Harbin Institute of Technology in 2021, under the supervision of Jiqing Han. I received my B.E. degree in software engineering from the College of Software of Inner Mongolia University in 2015, under the supervision of Xueliang Zhang. My research interests include multi-talker speech processing, speech separation, speech synthesis, and deep learning. Last, but certainly not least, I'd like to thank my wonderful wife for her understanding and support.

Publications(出版物)

(Note: Most of my papers can be found on arXiv.)

Journal Papers(期刊论文)

  1. Zhihao Du, Xueliang Zhang, Jiqing Han. A joint framework of denoising autoencoder and generative vocoder for monaural speech enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020. View Demos

Conference Papers(会议论文)

  1. Zhihao Du, Shiliang Zhang, Kai Hu, Siqi Zheng, FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec. ICASSP 2024
  2. Mohan Shi, Zhihao Du, et al., A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-Party Meetings. APSIPA 2023
  3. Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie, SA-Paraformer: Non-Autoregressive End-To-End Speaker-Attributed ASR. ASRU 2023
  4. Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu, The Second Multi-Channel Multi-Party Meeting Transcription Challenge (M2MeT 2.0): A Benchmark for Speaker-Attributed ASR. ASRU 2023
  5. Yue Gu, Zhihao Du, Shiliang Zhang, Qian Chen, Jiqing Han, Personality-aware Training based Speaker Adaptation for End-to-end Speech Recognition. INTERSPEECH 2023 Paper
  6. Mohan Shi, Zhihao Du, Qian Chen, Fan Yu, Yangze Li, Shiliang Zhang, Jie Zhang, Lirong Dai, CASA-ASR: Context-Aware Speaker-Attributed ASR. INTERSPEECH 2023
  7. Zhifu Gao, Zerui Li, Jiaming Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang, FunASR: A Fundamental End-to-End Speech Recognition Toolkit. INTERSPEECH 2023
  8. Jiaming Wang*, Zhihao Du*, Shiliang Zhang. TOLD: A Novel Two-stage Overlap-aware Framework for Speaker Diarization. ICASSP 2023 (equal contribution)
  9. Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan. Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis. EMNLP 2022 (long paper)
  10. Yuxiao Lin, Zhihao Du, Shiliang Zhang, Fan Yu, Zhou Zhao, Fei Wu, Separate-to-Recognize: Joint Multi-target Speech Separation and Speech Recognition for Speaker-attributed ASR. ISCSLP 2022
  11. Fan Yu, Shiliang Zhang, Pengcheng Guo, Yuhao Liang, Zhihao Du, et al. MFCCA: Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario. SLT 2022
  12. Fan Yu, Zhihao Du, Shiliang Zhang, Yuxiao Lin, Lei Xie. A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings. ICASSP 2022
  13. Fan Yu, Shiliang Zhang, Pengcheng Guo, Yihui Fu, Zhihao Du, et al. Summary on the ICASSP 2022 multi-channel multi-party meeting transcription grand challenge. ICASSP 2022
  14. Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, et al. M2MeT: The ICASSP 2022 multi-channel multi-party meeting transcription challenge. ICASSP 2022
  15. Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du. Capturing Temporal Dependencies Through Future Prediction for CNN-Based Audio Classifiers. ICASSP 2021
  16. Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang. PAN: Phoneme-aware network for monaural speech enhancement. ICASSP 2020
  17. Zhihao Du, Ming Lei, Jiqing Han, Shiliang Zhang. Self-Supervised Adversarial Multi-Task Learning for Vocoder-Based Monaural Speech Enhancement. INTERSPEECH 2020
  18. Zhihao Du, Jiqing Han, Xueliang Zhang. Double Adversarial Network Based Monaural Speech Enhancement for Robust Speech Recognition. INTERSPEECH 2020, https://github.com/ZhihaoDU/du2020dan
  19. Yue Gu, Zhihao Du, Hui Zhang, Xueliang Zhang. An Efficient Joint Training Framework for Robust Small-Footprint Keyword Spotting. ICONIP 2020
  20. Hongwei Song, Jiqing Han, Shiwen Deng, Zhihao Du. Acoustic scene classification by implicitly identifying distinct sound events. INTERSPEECH 2019
  21. Zhihao Du, Xueliang Zhang, Jiqing Han. Investigation of Monaural Front-End Processing for Robust Speech Recognition Without Retraining or Joint-Training. APSIPA 2019.

Preprints(预印本)

  1. Zhihao Du, Shiliang Zhang, Siqi Zheng, Zhijie Yan. Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information. https://arxiv.org/abs/2111.13694.

PhD Thesis(博士论文)

Research on Monaural Speech Enhancement Based on Prior Information in Different Semantic Levels(基于不同语义层级先验信息的单通道语音增强方法研究).

Reviewer(审稿)

  1. International Conference on Asian Language Processing (IALP) 2023
  2. Conference of the International Speech Communication Association (INTERSPEECH) 2023
  3. International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
  4. International Symposium on Chinese Spoken Language Processing (ISCSLP) 2022

Open Source(开源代码)

  1. Widely used speech features: https://github.com/ZhihaoDU/speech_feature_extractor (100+ stars)

Honors(荣誉)

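  1. Nominee, Outstanding Doctoral Dissertation, Harbin Institute of Technology (2021)
  2. Outstanding Graduate of Inner Mongolia Autonomous Region (2015)
  3. MCM Meritorious Winner
  4. ACM/ICPC Second Prize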

Organization(组织)

  1. IEEE Member
  2. SIGDAT Member

Contact me(联系我)

TEL: +86-15600609952

E-mails: duzhihao.china@gmail.com and neo.dzh@alibaba-inc.com.