Maojun SUN (Stephen)

I'm still a beginner, so please go easy on me.


Introduction🙋‍♂️

Research Interests💡

  • Large Language Model: Supervised Fine-tuning (SFT) and In-context Learning (ICL).
  • AI Agent: Multi-agent System, Retrieval Augmented Generation (RAG).
  • AI4Science: Agent4Science, Health Informatics, Medical Image, Medical Language Model.

News📢

  • Our survey paper, A Survey on Large Language Model-based Agents for Statistics and Data Science, has been accepted by The American Statistician (TAS)🎉. Aug 22, 2025
  • Our paper, LAMBDA: A Large Model Based Data Agent, has been accepted by the Journal of the American Statistical Association (JASA), a top journal🎉. We are honored that it was selected as a paper with discussion (only 2 selected) and will be presented at JSM 2025. We are especially privileged that Prof. David Donoho will serve as one of the discussants for our work. May 16, 2025
  • Registered as a Ph.D. student at the Hong Kong Polytechnic University. August 30, 2024
  • Graduated with Distinction 🥇 from the MSc in Data Science & Analytics, PolyU. July 15, 2024
  • Surpassed 1,000 followers on CSDN 🔥. December 2023

Papers & Manuscripts📰

JASA
LAMBDA: A Large Model Based Data Agent
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan*, and Jian Huang*.
Accepted. Journal of the American Statistical Association, 2025. (Top Journal)
🏅Selected with discussion

Paper Page Code Docs
TAS
A Survey on Large Language Model-based Agents for Statistics and Data Science
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan*, and Jian Huang*.
Accepted. The American Statistician, 2025. (JCR Q1)
Paper Repository
LlamaCare: A Large Medical Language Model for Enhancing Healthcare Knowledge Sharing
Maojun Sun.
Technical Report. arXiv preprint arXiv:2406.02350, 2024.
Paper Code HuggingFace
Data Enhancement for Melanoma Classification
Maojun Sun, Anxing Jiang, and Zixiong Li.
2021 2nd International Conference on Artificial Intelligence and Computer Engineering.
Paper Code

Research & Industry Experiences🚀

  • Hong Kong Polytechnic University Research Assistant Feb 2024 - Aug 2024
    LAMBDA: Large Model Based Data Agent:
    Researched and designed LLM agents for data science and analysis (LAMBDA).
    Systems Development for Research Centres (Part-time Project Assistant) May 2023 - May 2024
    Designed and developed research centre systems, including [IOR], [CMFAI], [RCNA], and [RCQF].
  • AI Lab, Hong Kong Hospital Authority Student Researcher July 2023 - December 2023
    Large Language Models in diagnostic systems:
    Responsible for research and development with the latest LLMs, including fine-tuning, evaluation, and prompt engineering.
  • Bacara Energy Technology Co., Ltd. Image Algorithm Intern June 2022 - August 2022
    Intelligent inspection solutions for wind power drones:
    Responsible for object detection and image segmentation, including algorithm design and implementation, fine-tuning, and model deployment (Product Details).
  • DXC Technology Artificial Intelligence Engineer Nov 2021 - June 2022
    Intelligent web robot “Xiao D” on a low-code platform:
    Responsible for back-end development and optimization of the speech recognition solution.
    Recruitment module of the internship management system:
    Responsible for algorithm design and implementation for resume content classification and key information extraction.
  • Chinasoft International Co., Ltd. Software Development Intern June 2020 - August 2020
    Choco BOX applet (microservice e-commerce system):
    Responsible for back-end development; participated in performance tuning, service splitting, and high-concurrency design.

Awards🏅

  • National Scholarship of China (Highest scholarship honor in China | 0.2%) 12/2020
  • Outstanding Graduate of Zhejiang Province (4%) 06/2022
  • Government Scholarship of Zhejiang Province (5%) 12/2021
  • PolyU Research Postgraduate Scholarship 09/2024
  • Elite Scholarship × 2 (Highest honor in the university, 1%) 07/2021 & 07/2020
  • First Class Scholarship for Academic Excellence × 5 (3%) 2018 & 2019 & 2020 & 2021
  • Winning Prize of the DJI RoboMaster Intelligent Perception Technology Competition (Ranked 28th Nationally) 12/2022
  • Second Prize of National Artificial Intelligence & Innovation Competition 05/2021
  • Merit Student Award × 7 2018 & 2019 & 2020 & 2021
  • Outstanding Chief Award of Computer Hospital Association 06/2020

Teaching Services👨‍🏫

Talks📚

Professional Skills🪀

  • Familiar with machine learning and deep learning theory, and with AI tasks such as data mining, image classification, object detection, image segmentation, text classification, and LLMs; proficient in PyTorch.
  • Familiar with common data structures and algorithms (table, stack, queue, search, sort, etc.), computer network protocols (TCP, UDP, HTTP, WebSocket, etc.), and operating systems (scheduling, management, etc.).
  • Familiar with programming languages such as Python and Java; working knowledge of C, JavaScript, PHP, and R; familiar with HTML and CSS and with common Linux shell commands.
  • Familiar with development frameworks such as SSM, Spring Boot, Spring Cloud, Flask, and Vue; familiar with development tools and cloud ecosystems such as Git, Swagger, Postman, Docker, AWS, and Aliyun.
  • Familiar with relational databases such as MySQL (indexing, transactions, SQL tuning) and SQL Server; basic understanding of non-relational databases such as Redis and MongoDB.
  • Understanding of distributed systems, microservice architecture, and message middleware such as RabbitMQ; understanding of high-concurrency design, load balancing, multithreading, and locking mechanisms.
  • Understanding of big data framework components such as Hadoop, Hive, Spark, and Flink; understanding of ETL processes, data warehousing, and common data analysis tools such as Tableau.

Others💌