To efficiently select optimal dataset combinations for enhancing multi-task learning (MTL) performance in large language models, we propose a novel framework that leverages a neural network to predict the best dataset combinations. The framework iteratively refines the selection, greatly improving efficiency, while remaining model-, dataset-, and domain-independent. Through experiments on 12 biomedical datasets across four tasks (named entity recognition, relation extraction, event extraction, and text classification), we demonstrate that our approach effectively identifies better combinations, even for tasks that may seem unpromising from a human perspective. This verifies that our framework provides a promising solution for maximizing the potential of MTL.
@inproceedings{zhan-zhang-2025-towards,
  title     = {Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models},
  author    = {Zhan, Zaifu and Zhang, Rui},
  editor    = {Chiruzzo, Luis and Ritter, Alan and Wang, Lu},
  booktitle = {Findings of the Association for Computational Linguistics: NAACL 2025},
  month     = apr,
  year      = {2025},
  address   = {Albuquerque, New Mexico},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2025.findings-naacl.297/},
  pages     = {5373--5386},
  isbn      = {979-8-89176-195-7},
}
arXiv
MMRAG: Multi-Mode Retrieval-Augmented Generation with Large Language Models for Biomedical In-Context Learning
Zaifu Zhan, Jun Wang, Shuang Zhou, and 2 more authors
@misc{zhan2025mmragmultimoderetrievalaugmentedgeneration,
  title         = {MMRAG: Multi-Mode Retrieval-Augmented Generation with Large Language Models for Biomedical In-Context Learning},
  author        = {Zhan, Zaifu and Wang, Jun and Zhou, Shuang and Deng, Jiawen and Zhang, Rui},
  year          = {2025},
  eprint        = {2502.15954},
  archiveprefix = {arXiv},
  primaryclass  = {cs.CL},
}
SPJ-HDS
Benchmarking of Large Language Models for the Dental Admission Test
@article{BenchmarkingDental,
  author  = {Hou, Yu and Patel, Jay and Dai, Liya and Zhang, Emily and Liu, Yang and Zhan, Zaifu and Gangwani, Pooja and Zhang, Rui},
  title   = {Benchmarking of Large Language Models for the Dental Admission Test},
  journal = {Health Data Science},
  volume  = {0},
  number  = {ja},
  year    = {2025},
  month   = feb,
  doi     = {10.34133/hds.0250},
  eprint  = {https://spj.science.org/doi/pdf/10.34133/hds.0250},
}
JAMIA
RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements
Zaifu Zhan, Shuang Zhou, Mingchen Li, and 1 more author
Journal of the American Medical Informatics Association, Jan 2025
To develop an advanced multi-task large language model (LLM) framework for extracting diverse types of information about dietary supplements (DSs) from clinical records.

We focused on 4 core DS information extraction tasks: named entity recognition (2949 clinical sentences), relation extraction (4892 sentences), triple extraction (2949 sentences), and usage classification (2460 sentences). To address these tasks, we introduced the retrieval-augmented multi-task information extraction (RAMIE) framework, which incorporates: (1) instruction fine-tuning with task-specific prompts; (2) multi-task training of LLMs to enhance storage efficiency and reduce training costs; and (3) retrieval-augmented generation, which retrieves similar examples from the training set to improve task performance. We compared the performance of RAMIE to LLMs with instruction fine-tuning alone and conducted an ablation study to evaluate the individual contributions of multi-task learning and retrieval-augmented generation to overall performance improvements.

Using the RAMIE framework, Llama2-13B achieved an F1 score of 87.39 on the named entity recognition task, reflecting a 3.51% improvement. It also excelled in the relation extraction task with an F1 score of 93.74, a 1.15% improvement. For the triple extraction task, Llama2-7B achieved an F1 score of 79.45, representing a significant 14.26% improvement. MedAlpaca-7B delivered the highest F1 score of 93.45 on the usage classification task, with a 0.94% improvement. The ablation study highlighted that while multi-task learning improved efficiency with a minor trade-off in performance, the inclusion of retrieval-augmented generation significantly enhanced overall accuracy across tasks.

The RAMIE framework demonstrates substantial improvements in multi-task information extraction for DS-related data from clinical records.
@article{10.1093/jamia/ocaf002,
  author  = {Zhan, Zaifu and Zhou, Shuang and Li, Mingchen and Zhang, Rui},
  title   = {RAMIE: retrieval-augmented multi-task information extraction with large language models on dietary supplements},
  journal = {Journal of the American Medical Informatics Association},
  pages   = {ocaf002},
  year    = {2025},
  month   = jan,
  issn    = {1527-974X},
  doi     = {10.1093/jamia/ocaf002},
  url     = {https://doi.org/10.1093/jamia/ocaf002},
  eprint  = {https://academic.oup.com/jamia/advance-article-pdf/doi/10.1093/jamia/ocaf002/61415205/ocaf002.pdf},
}
2024
arXiv
Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models
@article{zhan2024bettermultitasklearningframework,
  title   = {Towards Better Multi-task Learning: A Framework for Optimizing Dataset Combinations in Large Language Models},
  author  = {Zhan, Zaifu and Zhang, Rui},
  journal = {arXiv preprint arXiv:2412.11455},
  year    = {2024},
}
arXiv
RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements
Zaifu Zhan, Shuang Zhou, Mingchen Li, and 1 more author
@article{zhan2024ramie,
  title   = {RAMIE: Retrieval-Augmented Multi-task Information Extraction with Large Language Models on Dietary Supplements},
  author  = {Zhan, Zaifu and Zhou, Shuang and Li, Mingchen and Zhang, Rui},
  journal = {arXiv preprint arXiv:2411.15700},
  year    = {2024},
}
arXiv
Large language models for disease diagnosis: A scoping review
Shuang Zhou, Zidu Xu, Mian Zhang, and 8 more authors
@article{zhou2024large,
  title   = {Large language models for disease diagnosis: A scoping review},
  author  = {Zhou, Shuang and Xu, Zidu and Zhang, Mian and Xu, Chunpu and Guo, Yawen and Zhan, Zaifu and Ding, Sirui and Wang, Jiashuo and Xu, Kaishuai and Fang, Yi and others},
  journal = {arXiv preprint arXiv:2409.00097},
  year    = {2024},
}
arXiv
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness
Mingchen Li, Zaifu Zhan, Han Yang, and 3 more authors
@article{li2024benchmarking,
  title   = {Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness},
  author  = {Li, Mingchen and Zhan, Zaifu and Yang, Han and Xiao, Yongkang and Huang, Jiatan and Zhang, Rui},
  journal = {arXiv preprint arXiv:2405.08151},
  year    = {2024},
}