Introduction🙋♂️
- I am a second-year Ph.D. student in the Department of Data Science & Artificial Intelligence (DSAI) / Department of Applied Mathematics (AMA) at the Hong Kong Polytechnic University (PolyU). I am fortunate to be supervised by Prof. Han Ruijian, Prof. Huang Jian, and Prof. Yuan Yancheng.
- Before that, I worked as a research assistant at the Research Center for the Mathematical Foundations of Generative AI (CMFAI), supervised by Prof. Huang Jian. I obtained my Master’s degree with Distinction in Data Science and Analytics in 2024, under the supervision of Prof. Jiang Binyan. I received my Bachelor’s degree in Computer Science and Technology in 2022 and was awarded the National Scholarship.
- I am interested in artificial intelligence, software development, and big data. I have more than two years of industry experience.
Research Interests💡
- Large Language Model: Supervised Fine-tuning (SFT) and In-context Learning (ICL).
- AI Agent: Multi-agent System, Retrieval Augmented Generation (RAG).
- AI4Science: Agent4Science, Health Informatics, Medical Image, Medical Language Model.
News📢
- Our survey paper A Survey on Large Language Model-based Agents for Statistics and Data Science has been accepted by The American Statistician (TAS)🎉. Aug 22, 2025
- Our paper LAMBDA: A Large Model Based Data Agent has been accepted by the top journal JASA (Journal of the American Statistical Association)🎉. It is our great honor that the paper has been selected with discussion (only 2) and will be presented at JSM 2025. We are especially privileged to learn that Prof. David Donoho will serve as one of the discussants for our work. May 16, 2025
- Registered as a Ph.D. student at the Hong Kong Polytechnic University. August 30, 2024
- Graduated with Distinction 🥇 from the MSc in Data Science & Analytics, PolyU. July 15, 2024
- My followers on CSDN exceeded 1,000 🔥. December, 2023
Papers & Manuscripts📰
A Survey on Large Language Model-based Agents for Statistics and Data Science. Accepted. The American Statistician, 2025. (JCR Q1)
Paper Repository
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing. Technical Report. arXiv preprint arXiv:2406.02350, 2024.
Paper Code HuggingFace
Research & Industry Experiences🚀
- Large Language Model in diagnostic systems: Responsible for research and development with the latest LLMs, including fine-tuning, evaluation, and prompt engineering.
- Intelligent inspection solutions for wind power drones: Responsible for target detection and image segmentation, including algorithm design and implementation, fine-tuning, and model deployment (Product Details).
- Intelligent web robot “Xiao D” in a low-code platform: Responsible for back-end development and optimization of the speech recognition solution.
- Recruitment module of the internship management system: Responsible for algorithm design and implementation of resume content classification and key information extraction.
- Choco BOX applet (microservice e-commerce system): Responsible for back-end development. Participated in performance tuning, service splitting, and high-concurrency design.
Awards🏅
- National Scholarship of China (Highest scholarship honor in China | 0.2%) 12/2020
- Outstanding Graduate of Zhejiang Province (4%) 06/2022
- Government Scholarship of Zhejiang Province (5%) 12/2021
- PolyU Research Postgraduate Scholarship 09/2024
- Elite Scholarship × 2 (Highest honor in the university, 1%) 07/2021 & 07/2020
- First Class Scholarship for Academic Excellence × 5 (3%) 2018 & 2019 & 2020 & 2021
- Winning Prize of the DJI RoboMaster Intelligent Perception Technology Competition (Ranking 28th Nationally) 12/2022
- Second Prize of National Artificial Intelligence & Innovation Competition 05/2021
- Merit Student Award × 7 2018 & 2019 & 2020 & 2021
- Outstanding Chief Award of Computer Hospital Association 06/2020
Teaching Services👨🏫
- DSAI5101 Statistical Data Mining, Teaching Assistant. 25/26 Semester 1
- Mathematics Learning Support Centre, Regular Help Session, Teaching Assistant. 24/25 Semester 1
Talks📚
- LAMBDA: A Large Model Based Data Agent @ Seminar of Mathematical Foundations of AI, Tianyuan Mathematics Research Center, Kunming, Yunnan. Sep 27, 2024
- Understanding Large Language Models: Principles, Evolution, and Applications @ PolyU Summer School of Beihang University × Northwestern Polytechnical University. Jun 23, 2024
Professional Skills🪀
- Familiar with machine learning and deep learning theory; familiar with AI tasks such as data mining, image classification, target detection, image segmentation, text categorization, LLMs, etc.; proficient in PyTorch.
- Familiar with common data structures and algorithms (table, stack, queue, search, sort, etc.), computer network protocols (TCP, UDP, HTTP, WebSocket, etc.) and operating systems (scheduling, management, etc.).
- Familiar with programming languages such as Python and Java; knowledge of C, JavaScript, PHP, and R; familiar with the markup languages HTML and CSS; familiar with common Linux shell commands.
- Familiar with development frameworks such as SSM, SpringBoot, SpringCloud, Flask, and Vue; familiar with development tools and cloud ecosystems such as Git, Swagger, Postman, Docker, AWS, Aliyun, etc.
- Familiar with relational databases such as MySQL (indexing, transactions, SQL tuning) and SQL Server; understanding of non-relational databases such as Redis and MongoDB.
- Understanding of distributed systems, microservice architecture, and message middleware such as RabbitMQ; understanding of high-concurrency design, load balancing, multi-threading, and locking mechanisms.
- Understanding of big data framework components such as Hadoop, Hive, Spark, and Flink; understanding of the ETL process, data warehousing, and common data analysis tools such as Tableau.
Others💌
- 🎓 For research inquiries, please contact me at mj.sun@connect.polyu.hk