Research Interests 💡
- Agentic AI: Data Science Agents, Multi-agent Systems, Retrieval-Augmented Generation, Benchmarks.
- Large Language Models: Supervised Fine-tuning and Reinforcement Learning.
- AI4Science: Agent4Science, Health Informatics, Medical Imaging, Medical Language Models.
Papers & Manuscripts 📰
DARE: Aligning LLM Agents with the R Statistical Ecosystem via Distribution-Aware Retrieval
Maojun Sun†, Yue Wu†, Yifei Xie†, Ruijian Han*, Binyan Jiang, Defeng Sun, Yancheng Yuan*, Jian Huang*.
Under Review, 2026
DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems
Maojun Sun†, Yifei Xie†, Yue Wu†, Ruijian Han*, Binyan Jiang, Defeng Sun, Yancheng Yuan*, Jian Huang*.
Under Review, arXiv preprint arXiv:2601.13591, 2026
JASA
LAMBDA: A Large Model Based Data Agent
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan*, Jian Huang*.
Accepted. Journal of the American Statistical Association, 2025. (Top Journal)
🏅 Selected with discussion
JASA
Rejoinder to the Discussions on "LAMBDA: A Large Model Based Data Agent"
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan*, Jian Huang*.
Accepted. Journal of the American Statistical Association, 2026. (Top Journal)
TAS
A Survey on Large Language Model-based Agents for Statistics and Data Science
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan*, Jian Huang*.
Accepted. The American Statistician, 2025. (JCR Q1)
🏅 Selected with discussion
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
Maojun Sun.
Technical Report, arXiv preprint arXiv:2406.02350, 2024.
Experiences 🚀
Research Assistant & Project Assistant
Research and develop LLMs for data analysis. Develop systems for IOR, CMFAI, RCNA, and RCQF.
Student Researcher
Research and develop LLMs for diagnostic systems (fine-tuning, evaluation, prompt engineering).
Image Algorithm Intern
Research and develop intelligent inspection solutions for wind power drones.
AI Engineer
Develop the web robot "Xiao D". Research algorithms for resume content classification and extraction.
Awards 🏅
National Scholarship of China 12/2020
Outstanding Graduate of Zhejiang Province 06/2022
Government Scholarship of Zhejiang Province 12/2021
PolyU Research Postgraduate Scholarship 09/2024
Elite Scholarship × 2 2020, 2021
First Class Scholarship × 5 2018-2021
Winning Prize, DJI RoboMaster Intelligent Perception 12/2022
Second Prize of National Artificial Intelligence & Innovation Competition 05/2021
Merit Student Award × 7 2018-2021
Teaching & Talks 👨‍🏫
Teaching Assistant, DSAI1102 Data Analytics Fundamentals, PolyU
25/26 S2
Teaching Assistant, DSAI5101 Statistical Data Mining, PolyU
25/26 S1
Teaching Assistant, Mathematics Learning Support Centre, PolyU
24/25 S1
Talk: LAMBDA: A Large Model Based Data Agent @ Seminar of Mathematical Foundations of AI, Tianyuan Mathematics Research Center, Kunming, Yunnan
Sep 2024
Talk: Understanding Large Language Models: Principles, Evolution, and Applications @ PolyU Summer School
Jun 2024
Professional Skills 🪀
AI & ML: LLM Fine-tuning, Image Classification, Data Mining, Object Detection, Image Segmentation, etc.
Programming: Python, Java, SQL, HTML/JS/CSS, C, etc.
Development: FastAPI, Flask, SpringBoot, SpringCloud, Vue, Nginx, Git, Docker, AWS, Aliyun, etc.
Big Data: MySQL, Redis, Hadoop, Spark, etc.