Chanjun Park is an Assistant Professor in the School of Software at Soongsil University, where he serves as the Principal Investigator of the Natural Language Processing Lab. Prior to joining Soongsil University, he was a Research Professor at Korea University. Before that, he served as a Principal Research Engineer and Technical Leader of the Large Language Models (LLMs) team at Upstage, where he contributed to building an ecosystem for LLMs. He also worked as a Research Engineer at SYSTRAN, contributing to advancements in machine translation (MT) and automatic speech recognition (ASR) systems. He earned his Ph.D. from the Department of Computer Science and Engineering at Korea University under the supervision of Professor Heuiseok Lim. He received the Naver Ph.D. Fellowship and was selected for the Forbes 30 Under 30 Korea list. For more details, please see his CV.
2025: Two papers (including one in the Industry track) have been accepted at COLING 2025.
Please see my CV or Google Scholar profile for the full list.
Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning
Sugyeong Eo, Jung Jun Lee, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2025 (Oral)
Benchmark Profiling: Mechanistic Diagnosis of LLM Benchmarks
Dongjun Kim, Gyuho Shim, Yongchan Chun, Minhyuk Kim, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2025 (Oral)
MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents
Joong Min Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo, Heuiseok Lim
EMNLP 2025
HAWK: Highlighting Entity-aware Knowledge for Alleviating Information Sparsity in Long Contexts
Seonmin Koo, Jinsung Kim, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2025-Findings
ZEBRA: Leveraging Model-Behavioral Knowledge for Zero-Annotation Preference Dataset Construction
Jeesu Jung, Jinsung Kim, Chanjun Park (✝), Sangkeun Jung (✝)
EMNLP 2025-Findings
Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching
Seoyeon Kim, Huiseo Kim, Chanjun Park, Jinyoung Yeo, Dongha Lee
EMNLP 2025-Findings
LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models
Yungi Kim, Hyunsoo Ha, Seonghoon Yang, Sukyung Lee, Jihoo Kim, Chanjun Park (✝)
EMNLP 2025-Industry
AGENTiGraph: A Multi-Agent Knowledge Graph Framework for Interactive, Domain-Specific LLM Chatbots
Xinjie Zhao, Moritz Blum, Fan Gao, Yingjian Chen, Boming Yang, Luis Marquez-Carpintero, Mónica Pina-Navarro, Yanran Fu, So Morikawa, Yusuke Iwasawa, Yutaka Matsuo, Chanjun Park, Irene Li
CIKM 2025-Demo
HealthGenie: An Interactive Knowledge-Driven LLM Framework for Tailored Dietary Guidance
Fan Gao, Xinjie Zhao, Ding Xia, Zhongyi Zhou, Rui Yang, Jinghui Lu, Hang Jiang, Chanjun Park, Irene Li
CIKM 2025-Demo
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
Yungi Kim, Hyunsoo Ha, Sukyung Lee, Jihoo Kim, Seonghoon Yang, Chanjun Park (✝)
ACL 2025
Enhancing Automatic Term Extraction in Large Language Models via Syntactic Retrieval
Yongchan Chun, Minhyuk Kim, Dongjun Kim, Chanjun Park (✝), Heuiseok Lim (✝)
ACL 2025-Findings
From Ambiguity to Accuracy: The Transformative Effect of Coreference Resolution on RAG systems
Youngjoon Jang, Seongtae Hong, Junyoung Son, Sungjin Park, Chanjun Park (✝), Heuiseok Lim (✝)
ACL 2025 - Student Research Workshop
LCIRC: A Recurrent Compression Approach for Efficient Long-form Context and Query Dependent Modeling in LLMs
Sumin An, Junyoung Sung, Wonpyo Park, Chanjun Park (✝), Paul Hongsuck Seo (✝)
NAACL 2025 (Oral)
CoME: An Unlearning-based Approach to Conflict-free Model Editing
Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2025
MIRAGE: A Metric-Intensive Benchmark for Retrieval-Augmented Generation Evaluation
Chanhee Park, Hyeonseok Moon, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2025-Findings
FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2025-Findings
Find the Intention of Instruction: Comprehensive Evaluation of Instruction Understanding for Large Language Models
Hyeonseok Moon, Jaehyung Seo, Seungyoon Lee, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2025-Findings
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs
Hyeonwoo Kim, Dahyun Kim, Jihoo Kim, Sukyung Lee, Yungi Kim, Chanjun Park (✝)
NAACL 2025 - Industry
Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard
Chanjun Park (✝), Hyeonwoo Kim
NAACL 2025 - Industry
CharacterGPT: A Persona Reconstruction Framework for Role-Playing Agents
Jeiyoon Park, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2025 - Industry
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
Hyunbyung Park, Sukyung Lee, Gyoungjin Gim, Yungi Kim, Dahyun Kim, Chanjun Park (✝)
NAACL 2025 - Demo
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models
Dahyun Kim, Sukyung Lee, Yungi Kim, Attapol Rutherford, Chanjun Park (✝)
COLING 2025
sDPO: Don’t Use Your Data All at Once
Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park (✝)
COLING 2025 - Industry
An analysis on language transfer of pre-trained language model with cross-lingual post-training
Suhyune Son (*), Chanjun Park (*), Jungseob Lee (*), Midan Shim (*), Chanhee Lee, Yoonna Jang, Jaehyung Seo, Jungwoo Lim, Heuiseok Lim
Expert Systems with Applications, 2025
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
Seonmin Koo, Jinsung Kim, YoungJoon Jang, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2024
Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
Seonmin Koo, Jinsung Kim, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2024-Findings
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Hyeonseok Moon, Seungyoon Lee, Seongtae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim
EMNLP 2024-Findings
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park (✝)
EMNLP 2024 - Industry
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim, Wonho Song, Dahyun Kim, Yunsu Kim, Yungi Kim, Chanjun Park (✝)
EMNLP 2024 - Demo
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Chanjun Park, Hyeonwoo Kim, Dahyun Kim, SeongHwan Cho, Sanghoon Kim, Sukyung Lee, Yungi Kim, Hwalsuk Lee
ACL 2024
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
Jaehyung Seo, Jaewook Lee, Chanjun Park, SeongTae Hong, Seungjun Lee, Heuiseok Lim
ACL 2024 - Findings
Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation
Jungseob Lee, Hyeonseok Moon, Seungjun Lee, Chanjun Park (✝), Sugyeong Eo, Hyunwoong Ko, Jaehyung Seo, Seungyoon Lee, Heuiseok Lim (✝)
ACL 2024 - Findings
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Sanghoon Kim (*, ✝), Dahyun Kim (*), Chanjun Park (*, ✝), Wonsung Lee (*, ✝), Wonho Song (*), Yunsu Kim (*), Hyeonwoo Kim (*), Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee (✝), Sunghun Kim (✝)
NAACL 2024 - Industry
Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4
Seungyoon Lee, Dongjun Kim, Dahyun Jung, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2024 - Student Research Workshop
Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation
Dahyun Jung, Sugyeong Eo, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2024 - Student Research Workshop
Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean
Seungyoon Lee, Chanjun Park (✝), DaHyun Jung, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim (✝)
LREC-COLING 2024 (Oral)
Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation
Sugyeong Eo, Jungwoo Lim, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
LREC-COLING 2024 (Oral)
Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism
Chanjun Park (*, ✝), Minsoo Khang (*), Dahyun Kim (*)
ICLR 2024 - Data-centric Machine Learning Research (DMLR) Workshop
Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim
EACL 2024 - Findings
Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim
EACL 2024 - Findings
Exploiting Hanja-based Resources in Processing Korean Historic Documents written by Common Literati
Hyeonseok Moon, Myunghoon Kang, Jaehyung Seo, Sugyeong Eo, Chanjun Park, Yeongwook Yang, Heuiseok Lim
IEEE Access, 2024
| Date | Award |
|---|---|
| 2024.02 | Forbes 30 Under 30 Korea |
| 2023.02 | Best Paper Award at Korea University |
| 2022.12 | 1st Place at WMT Quality Estimation Shared Task 2022 - Sentence-level Critical Error Detection |
| 2021.12 | Naver Ph.D. Fellowship |
| 2019.10 | 1st Place at Microsoft AI Accessibility Hackathon in Korea, Microsoft |