Exploring the World and Life.
My name is Qiaosheng Chen (陈乔晟). I am currently a third-year Ph.D. candidate in the Websoft Research Group at the School of Computer Science and Technology, Nanjing University. I received my B.Eng. degree from Harbin Institute of Technology at Weihai in 2021. In the same year, I was admitted to the M.Sc. program at Nanjing University with an exemption from the entrance examination. In 2023, I began my Ph.D. under the supervision of Prof. Gong Cheng.
My research interests include Web Code Generation, Big Data Search, Knowledge Graph, and Retrieval-Augmented Generation (RAG). I have published 5 CCF-A papers and 3 CCF-B papers as the first/co-first author. I received the ISWC 2023 Best Research Paper Nomination and the CCF BigData 2025 Best Application Paper award. I was selected for the inaugural CAST Young Talent Support Program (Ph.D. Special Plan) sponsored by the Chinese Information Processing Society of China (CIPS).
Feel free to reach out if you are interested in collaboration or potential opportunities.
News
- 2026.04 One paper accepted by ICML 2026 (first author).
- 2026.03 Joined the Alibaba Qwen Team to work on WebDev pre-training.
- 2025.12 One paper accepted by ICLR 2026 (co-first author).
- 2025.11 Joined the Tencent HY AI Data Team for the Deep Research Agent project.
- 2025.07 Two papers accepted by SIGIR 2025.
- 2025.04 Joined Shanghai AI Lab as a research intern.
- 2024.12 Received the National Scholarship (Ph.D.) and the Outstanding Graduate Student Pioneer award at NJU.
- 2024.10 Selected for the inaugural CAST Young Talent Support Program (Ph.D. Special Plan, CIPS).
- 2024.07 One paper accepted by ISWC 2024 (first author).
- 2024.03 Two papers accepted by SIGIR 2024 (both first author).
Education

2021.09 - Present
Ph.D. in Computer Science, Nanjing University, advised by Prof. Gong Cheng
National Scholarship (Ph.D.) | Outstanding Graduate Student Pioneer

2017.09 - 2021.06
B.E. in Computer Science, Harbin Institute of Technology at Weihai (GPA: 90.11/100, Rank: 5/137)
National Scholarship (Undergraduate) | Outstanding Graduate of Shandong Province
Experience

2026.03 - 2026.06
Research Intern, Alibaba Qwen Team · WebDev Pre-training
Cleaned, filtered, and synthesized pre-training data for Qwen's code capabilities; designed file-level and repo-level code data pipelines.

2025.11 - 2026.02
Research Intern, Tencent HY AI Data Team · Deep Research Agent
Designed DAG-based planning and summary strategies; applied SFT and RL to improve the agent's planning capabilities.

2025.04 - 2025.09
Research Intern, Shanghai AI Lab · Interactive Scientific Demo Code Generation & Multimodal Code LLM
Led research on interactive science demo generation (ICML 2026); contributed HTML code data for JanusCoder (ICLR 2026) and Intern-S1-Pro.
Publications
(* equal contribution · † corresponding author)
InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
Qiaosheng Chen, Yang Liu, Lei Li, Kai Chen, Qipeng Guo, Gong Cheng, Fei Yuan.
Studied LLMs' ability to generate interactive scientific demonstration website code. Proposed hard tests for interactivity and soft tests based on multi-screenshot comparison.
ICML 2026

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
Qiushi Sun*, Jingyang Gong*, Yang Liu*, Qiaosheng Chen*, Lei Li, Kai Chen, Qipeng Guo, Ben Kao, Fei Yuan.
Trained language and multimodal models for code plotting, frontend web generation, multimodal algorithm problems, and niche visualization languages.
ICLR 2026

Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph
Qiaosheng Chen, Kaijia Huang, Xiao Zhou, Weiqing Luo, Yuanning Cui, Gong Cheng.
Built HuggingKG, the first AI resource knowledge graph based on Hugging Face (2.6M nodes, 6.2M edges), and designed HuggingBench covering recommendation, classification, and tracing tasks.
SIGIR 2025

μDS: Multi-Objective Data Snippet Extraction for Dataset Search
Xiao Zhou, Qiaosheng Chen, Jiageng Chen, Gong Cheng.
Proposed μDS, which jointly optimizes compactness, relevance, representativeness, and cohesiveness for data snippet extraction, modeled as a novel combinatorial optimization problem with worst-case approximation guarantees.
SIGIR 2025

Enhancing Dataset Search with Compact Data Snippets
Qiaosheng Chen, Jiageng Chen, Xiao Zhou, Gong Cheng.
Proposed CDS, a subgraph-extraction-based method for generating compact, query-relevant data snippets that improve retrieval accuracy and result interpretability for dataset search.
SIGIR 2024

ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries
Qiaosheng Chen, Weiqing Luo, Zixian Huang, Tengteng Lin, Xiaxia Wang, Ahmet Soylu, Basil Ell, Baifan Zhou, Evgeny Kharlamov, Gong Cheng.
Built ACORDAR 2.0, a content-based dataset retrieval test collection that uses dense retrieval to expand candidate datasets and LLM-based query rewriting to improve evaluation diversity.
SIGIR 2024

Dense Re-Ranking with Weak Supervision for RDF Dataset Search
Qiaosheng Chen, Zixian Huang, Zhiyang Zhang, Weiqing Luo, Tengteng Lin, Qing Shi, Gong Cheng.
Proposed DR2, which uses distant supervision and self-training to generate pseudo-labeled data, with coarse-to-fine training to improve dense retrieval models for dataset search.
ISWC 2023 🏆 Best Research Paper Nomination
- ICML 2026 InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation. First author.
- ICLR 2026 JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence. Co-first author.
- Tech Report Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale.
- SIGIR 2025 Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph. First author.
- SIGIR 2025 μDS: Multi-Objective Data Snippet Extraction for Dataset Search. Second author.
- ISWC 2025 mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs. Second author.
- SIGIR 2024 Enhancing Dataset Search with Compact Data Snippets. First author.
- SIGIR 2024 ACORDAR 2.0: A Test Collection for Ad Hoc Dataset Retrieval with Densely Pooled Datasets and Question-Style Queries. First author.
- ISWC 2024 DUNKS: Chunking and Summarizing Large and Heterogeneous Data for Dataset Search. First author.
- ISWC 2023 Dense Re-Ranking with Weak Supervision for RDF Dataset Search. First author. 🏆 Best Research Paper Nomination.
- EMNLP 2023 An Empirical Investigation of Implicit and Explicit Knowledge-Enhanced Methods for Ad Hoc Dataset Retrieval. Second author.
- SIGIR 2022 ACORDAR: A Test Collection for Ad Hoc Content-Based (RDF) Dataset Retrieval. Second author.
Projects
Qiaosheng Chen (Project Leader)
Led the design and development of a national-scale public data search system. Collected, integrated, and indexed datasets from 148 open data portals across 25 provinces in China. Implemented keyword search, faceted search, and result presentation. Used LLMs for automated metadata integration, high-precision dataset ranking, and relevance explanation.
CCF BigData 2023 · 🏆 CCF BigData 2025 Best Application Paper · DSE 2026
Awards
- 2024, Inaugural CAST Young Talent Support Program (Ph.D. Special Plan, sponsored by CIPS; 3226 awardees nationwide)
- 2024, National Scholarship (Ph.D.), Nanjing University (14 awardees in CS School)
- 2024, Outstanding Graduate Student Pioneer, Nanjing University (15 awardees in CS School)
- 2021, Outstanding Graduate of Shandong Province
- 2019, CCPC National Finals Bronze Medal
- 2019, ACM-ICPC Asia Regional Contest (Shanghai) Silver Medal
- 2019, CCPC Xiamen Site Silver Medal
- 2018, ACM-ICPC Asia Regional Contest (Xuzhou) Silver Medal
- 2018, National Scholarship (Undergraduate), HIT Weihai (5 awardees in CS School)
Services
- Reviewer for ICML 2026 (Gold Reviewer), SIGIR 2025-2026, WWW 2026, KDD 2026, CIKM 2024-2026, WSDM 2026.








