Publications & Patents | Dr. Simon Lui 雷兆恒

I. Publications

2025

Tsoi, T., Deng, J., Ju, Y., Weck, B., Kirchhoff, H., & Lui, S. (2025). CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining. arXiv preprint arXiv:2503.23128. (to appear in ICME 2025)

2024

Deng, J., Ju, Y., Yang, J., Lui, S., & Liu, X. (2024). Efficient Adapter Tuning for Joint Singing Voice Beat and Downbeat Tracking with Self-supervised Learning Features. In The 25th International Society for Music Information Retrieval Conference (ISMIR).

Ju, Y., Wu, C. Y., Lorenzo, B. C., Yang, J., Deng, J., Fan, F., & Lui, S. (2024). End-to-end automatic singing skill evaluation using cross-attention and data augmentation for solo singing and singing with accompaniment. In The 25th International Society for Music Information Retrieval Conference (ISMIR).

Wu, Y., Ju, Y., Lui, S., Yang, J., Fan, F., & Du, X. (2024, July). Cycle Frequency-Harmonic-Time Transformer for Note-Level Singing Voice Transcription. In 2024 IEEE International Conference on Multimedia and Expo (ICME) (pp. 1-6). IEEE.

2023

Ju, Y., Xu, C., Guo, Y., Li, J., & Lui, S. (2023). Improving Automatic Singing Skill Evaluation with Timbral Features, Attention, and Singing Voice Separation. ICME 2023.

2022

Xu, L., Wang, Z., Wu, B., & Lui, S. (2022). MDAN: Multi-level Dependent Attention Network for Visual Emotion Analysis. Accepted in CVPR 2022

Zhang, B., Wang, W., Zhao, E., & Lui, S. (2022, July). Lyrics-to-audio alignment for dynamic lyric generation. In Music Inf. Retrieval Eval. eXchange Audio-Lyrics Alignment Challenge (MIREX).

2021

Zhuang, X., Yu, H., Zhao, W., Jiang, T., Hu, P., Lui, S., & Zhou, W. (2021). KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke. arXiv preprint arXiv:2110.09121, Accepted in Interspeech 2022.

Hu, S., Liang, B., Chen, Z., Lu, X., Zhao, E., & Lui, S. (2021). Large-scale singer recognition using deep metric learning: An experimental study. In 2021 International Joint Conference on Neural Networks (IJCNN) (pp. 1-6). IEEE. doi: 10.1109/IJCNN52387.2021.9533911.

Zhuang, X., Jiang, T., Chou, S. Y., Wu, B., Hu, P., & Lui, S. (2021, June). Litesing: Towards fast, lightweight and expressive singing voice synthesis. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 7078-7082). IEEE.

Zeng, Y., Xiao, Z., Hung, K. W., & Lui, S. (2021). Real-time video super resolution network using recurrent multi-branch dilated convolutions. Signal Processing: Image Communication, 93, 116167.

Xiao, Z., Zhang, Z., Hung, K. W., & Lui, S. (2021). Real-time video super-resolution using lightweight depthwise separable group convolutions with channel shuffling. Journal of Visual Communication and Image Representation, 75, 103038.

2020

Hu, S., Zhang, B., Liang, B., Zhao, E., & Lui, S. (2020). Phase-aware music super-resolution using generative adversarial networks. arXiv preprint arXiv:2010.04506. Interspeech 2020.

Jin, C., Wang, T., Liu, S., Tie, Y., Li, J., Li, X., & Lui, S. (2020). A transformer-based model for multi-track music generation. International Journal of Multimedia Data Engineering and Management (IJMDEM), 11(3), 36-54.

Lin, K. W. E., Balamurali, B. T., Koh, E., Lui, S., & Herremans, D. (2020). Singing voice separation using a deep convolutional neural network trained by ideal binary mask and cross entropy. Neural Computing and Applications, 32(4), 1037-1050.

2019

Agres, K., Lui, S., & Herremans, D. (2019, August). A novel music-based game with motion capture to support cognitive and motor function in the elderly. In 2019 IEEE Conference on Games (CoG) (pp. 1-4). IEEE.

Balamurali, B. T., Lin, K. E., Lui, S., Chen, J. M., & Herremans, D. (2019). Toward robust audio spoofing detection: A detailed comparison of traditional and learned features. IEEE Access, 7, 84229-84241.

Zhao, D., Lee, J. S. A., Tan, C. T., Dancu, A., Lui, S., Shen, S., & Mueller, F. F. (2019, June). GameLight-Gamification of the Outdoor Cycling Experience. In Companion Publication of the 2019 on Designing Interactive Systems Conference 2019 Companion (pp. 73-76).

Hee, H. I., Balamurali, B. T., Karunakaran, A., Herremans, D., Teoh, O. H., Lee, K. P., … & Chen, J. M. (2019). Development of machine learning for asthmatic and healthy voluntary cough sounds: A proof of concept study. Applied Sciences, 9(14), 2833.

2018

Agus, N., Anderson, H., Chen, J. M., Lui, S., & Herremans, D. (2018). Minimally simple binaural room modeling using a single feedback delay network. Journal of the Audio Engineering Society, 66(10), 791-807.

Agus, N., Anderson, H., Chen, J. M., Lui, S., & Herremans, D. (2018). Perceptual evaluation of measures of spectral variance. The Journal of the Acoustical Society of America, 143(6), 3300-3311.

Upadhyay, R., & Lui, S. (2018, January). Foreign English accent classification using deep belief networks. In 2018 IEEE 12th International Conference on Semantic Computing (ICSC) (pp. 290-293). IEEE.

2017

Anderson, H., Agus, N., Chen, J. M., & Lui, S. (2017). Modeling the Proportion of Early and Late Energy in Two-Stage Reverberators. Journal of the Audio Engineering Society, 65(12), 1017-1031.

Lui, S., & Grunberg, D. (2017, December). Using skin conductance to evaluate the effect of music silence to relieve and intensify arousal. In 2017 International Conference on Orange Technologies (ICOT) (pp. 91-94). IEEE.

Fang, J., Grunberg, D., Lui, S., & Wang, Y. (2017, December). Development of a music recommendation system for motivating exercise. In 2017 International Conference on Orange Technologies (ICOT) (pp. 83-86). IEEE.

Hee, H. I., Chen, J., & Lui, S. (2017). Intuitive Interactive Platform for Preoperative Communication Between Hospital and Patients/Caregivers: Towards Community Partnership for Peri-Operative Person-Based Healthcare Model. Iproceedings, 3(1), e8425.

Agus, N., Anderson, H., Chen, J. M., & Lui, S. (2017). Energy-Based Binaural Acoustic Modeling. Technical Report 1, Singapore University of Technology and Design (April 2017). https://istd.sutd.edu.sg/research/technicalreports/energy-based-binaural-acoustic-modeling

Lin, K. W. E., Anderson, H., So, C., & Lui, S. (2017). Sinusoidal Partials Tracking for Singing Analysis Using the Heuristic of the Minimal Frequency and Magnitude Difference. In INTERSPEECH (pp. 3038-3042).

2016

Khwaja, M. K., Vikash, P., Arulmozhivarman, P., & Lui, S. (2016). Robust phoneme classification for automatic speech recognition using hybrid features and an amalgamated learning model. International Journal of Speech Technology, 19(4), 895-905.

Lee, H., Yoong, A. C. H., Lui, S., Vaniyar, A., & Balasubramanian, G. (2016, November). Design exploration for the "squeezable" interaction. In Proceedings of the 28th Australian Conference on Computer-Human Interaction (pp. 586-594).

2015

Tan, C. T., Byrne, R., Lui, S., Liu, W., & Mueller, F. (2015). JoggAR: a mixed-modality AR approach for technology-augmented jogging. In SIGGRAPH Asia 2015 Mobile Graphics and Interactive Applications (pp. 1-1).

Anderson, H., Lin, K. W. E., So, C., & Lui, S. (2015, October). Flatter frequency response from feedback delay network reverbs. In ICMC.

Trochidis, K., & Lui, S. (2015, June). Modeling affective responses to music using audio signal analysis and physiology. In International Symposium on Computer Music Multidisciplinary Research (pp. 346-357). Springer, Cham.

Anderson, H., Lin, K. W. E., Agus, N., & Lui, S. (2015, May). Major thirds: a better way to tune your iPad. In NIME (pp. 365-368).

Leslie, G., Picard, R., & Lui, S. (2015). An EEG and Motion Capture Based Expressive Music Interface for Affective Neurofeedback. In Proc. 1st Int. BCMI Workshop.

Lui, S. (2015, May). Generate expressive music from picture with a handmade multi-touch music table. In NIME (pp. 374-377).

Hoon, L. T., Vuyyuru, M. R., Kumar, T. A., & Lui, S. (2015). Binaural Navigation for the Visually Impaired with a Smartphone. In ICMC.

II. Patents

2022

ID	Patent Title	Authors	Link
CN-114945892-A	Method, apparatus and system for playing audio, and device and storage medium	曹翔, 汤戈, 徐豪杰, 王征韬, 雷兆恒	https://patents.google.com/patent/CN114945892A/zh
CN-114936996-A	Image detection method, apparatus, smart device and storage medium	洪国伟, 曹成志, 董治, 雷兆恒	https://patents.google.com/patent/CN114936996A/zh
CN-114329043-A	Audio essence fragment determination method, electronic equipment and computer-readable storage medium	毛绮雯, 陈肇康, 吴斌, 雷兆恒	https://patents.google.com/patent/CN114329043A/en
CN-114067840-A	Method for generating music video, storage medium and electronic device	梅立锋, 杨跃, 董治, 雷兆恒	https://patents.google.com/patent/CN114067840A/zh
CN-113963397-A	Image processing method, server and storage medium	杨跃, 董治, 雷兆恒	https://patents.google.com/patent/CN113963397A/zh
CN-113902989-A	Live-streaming scene detection method, storage medium and electronic device	洪国伟, 曹成志, 曾裕斌, 董治, 雷兆恒	https://patents.google.com/patent/CN113902989A/zh
CN-113377331-B	Audio data processing method, device, equipment and storage medium	余菲, 孔令城, 赵伟峰, 雷兆恒, 周文江	https://patents.google.com/patent/CN113377331B/en
CN-113393830-B	Hybrid acoustic model training and lyric timestamp generation method, device and medium	张斌, 赵伟峰, 雷兆恒, 周文江, 张柏生, 李幸烨, 苑文波, 杨小康, 李童, 林艳秋, 曹利, 代玥, 胡鹏	https://patents.google.com/patent/CN113393830B/zh
CN-113901894-A	Video generation method, device, server and storage medium	杨跃, 董治, 雷兆恒, 梅立锋	https://patents.google.com/patent/CN113901894A/en
CN-113888534-A	Image processing method, electronic device and readable storage medium	曾梓华, 董治, 雷兆恒	https://patents.google.com/patent/CN113888534A/zh

2021

ID	Patent Title	Authors	Link
CN-113724136-A	Video restoration method, device and medium	曾裕斌, 洪国伟, 董治, 雷兆恒	https://patents.google.com/patent/CN113724136A/en
CN-113689440-A	Video processing method and device, computer equipment and storage medium	黄均昕, 杨跃, 董治, 雷兆恒	https://patents.google.com/patent/CN113689440A/en
CN-113610012-A	Video detection method, electronic device and computer-readable storage medium	洪国伟, 曹成志, 曾裕斌, 董治, 雷兆恒	https://patents.google.com/patent/CN113610012A/en
CN-113569809-A	Image processing method, device and computer readable storage medium	魏旭东, 杨跃, 董治, 雷兆恒	https://patents.google.com/patent/CN113569809A/en
CN-113516762-A	Image processing method and device	杨跃, 董治, 雷兆恒	https://patents.google.com/patent/CN113516762A/en
CN-113505707-A	Smoking behavior detection method, electronic device and readable storage medium	洪国伟, 曹成志, 雷兆恒	https://patents.google.com/patent/CN113505707A/zh
CN-113868463-A	Recommendation model training method and device	龚韬, 赵伟峰, 胡诗超, 陈洲旋, 顾旻玮, 马小栓, 蔡宗颔, 雷兆恒, 周文江	https://patents.google.com/patent/CN113868463A/en
CN-113486672-A	Method for disambiguating polyphone, electronic device and computer readable storage medium	杨宜涛, 徐东, 陈洲旋, 赵伟峰, 雷兆恒, 周文江	https://patents.google.com/patent/CN113486672A/en
CN-113473201-A	Audio and video alignment method, device, equipment and storage medium	杨跃, 董治, 雷兆恒	https://patents.google.com/patent/CN113473201A/en
CN-113393830-A	Hybrid acoustic model training and lyric timestamp generation method, device and medium	张斌, 赵伟峰, 雷兆恒, 周文江, 张柏生, 李幸烨, 苑文波, 杨小康, 李童, 林艳秋, 曹利, 代玥, 胡鹏	https://patents.google.com/patent/CN113393830A/zh
CN-113377331-A	Audio data processing method, device, equipment and storage medium	余菲, 孔令城, 赵伟峰, 雷兆恒, 周文江	https://patents.google.com/patent/CN113377331A/zh
CN-113257222-A	Method, terminal and storage medium for synthesizing song audio	周思瑜, 庄晓滨, 徐东, 赵伟峰, 吴斌, 雷兆恒, 胡鹏	https://patents.google.com/patent/CN113257222A/en
CN-113192484-A	Method, device and storage medium for generating audio from text	徐东, 邓一平, 陈洲旋, 鲁霄, 余洋洋, 陈苑苑, 邢佳佳, 陈纳珩, 周思瑜, 赵伟峰, 周蓝珺, 易越, 许瑶, 唐志彬, 曹利, 雷兆恒, 潘树燊, 周文江	https://patents.google.com/patent/CN113192484A/zh
WO-2021139535-A1	Method, apparatus and system for playing audio, and device and storage medium	曹翔, 汤戈, 徐豪杰, 王征韬, 雷兆恒	https://patents.google.com/patent/WO2021139535A1/en
CN-113077815-A	Audio evaluation method and component	夏志强, 吴斌, 雷兆恒, 王征韬	https://patents.google.com/patent/CN113077815A/zh
CN-109903784-B	Method and device for fitting distorted audio data	陈颖, 赵伟峰, 张庆, 雷兆恒, 王征韬, 孔令城, 徐东, 杨伟明, 陈洲旋, 鲁霄	https://patents.google.com/patent/CN109903784B/zh
CN-112445933-A	Model training method, device, equipment and storage medium	陈肇康, 林梅露, 吴斌, 雷兆恒	https://patents.google.com/patent/CN112445933A/en
CN-112257781-A	Model training method and device	林梅露, 陈肇康, 夏志强, 吴斌, 雷兆恒	https://patents.google.com/patent/CN112257781A/en
CN-112231511-A	Neural network model training method and song mining method and device	夏志强, 吴斌, 雷兆恒	https://patents.google.com/patent/CN112231511A/en
CN-112183946-A	Multimedia content evaluation method, device and training method thereof	关文婕, 吴斌, 雷兆恒	https://patents.google.com/patent/CN112183946A/en

2020

ID	Patent Title	Authors	Link
CN-111414513-A	Music genre classification method and device and storage medium	林梅露, 吴康健, 吴斌, 王征韬, 夏志强, 雷兆恒	https://patents.google.com/patent/CN111414513A/en
CN-111261185-A	Method, apparatus and system for playing audio, and device and storage medium	曹翔, 汤戈, 徐豪杰, 王征韬, 雷兆恒	https://patents.google.com/patent/CN111261185A/zh

2019

ID	Patent Title	Authors	Link
CN-110516104-A	Song recommendation method, apparatus and computer storage medium	张斌, 王征韬, 吴斌, 雷兆恒	https://patents.google.com/patent/CN110516104A/en
CN-110472096-A	Music library management method, device, equipment and storage medium	杨伟明, 赵伟峰, 雷兆恒	https://patents.google.com/patent/CN110472096A/en
CN-109903784-A	Method and device for fitting distorted audio data	陈颖, 赵伟峰, 张庆, 雷兆恒, 王征韬, 孔令城, 徐东, 杨伟明, 陈洲旋, 鲁霄	https://patents.google.com/patent/CN109903784A/en