Chanjun Park is a Research Professor at Korea University. Before joining Korea University, he served as a Principal Research Engineer and Technical Leader for the Large Language Models (LLMs) team at Upstage, where he contributed to building an ecosystem for LLMs. He also worked as a Research Engineer at SYSTRAN, contributing to the development of machine translation (MT) and automatic speech recognition (ASR) systems. He earned his Ph.D. at Korea University under the supervision of Professor Heuiseok Lim. He has authored over 100 publications in leading NLP conferences and journals, such as ACL, EMNLP, NAACL, EACL, and COLING. He has delivered over 70 invited talks and has significant teaching experience. Additionally, he holds more than 10 patents in the field of natural language processing (NLP). His achievements include recognition in Forbes 30 Under 30 Korea in the SCIENCE / SW field and the Naver Ph.D. Fellowship. He has been actively involved in the academic community, holding roles such as Virtual Social Chair at COLING 2022, Publication Chair for DMLR at ICLR 2024, and Program Chair for the WiNLP Workshop. For more details, please see his CV.
My research philosophy centers on service-driven research, which aims to bridge the gap between foundational theories in natural language processing (NLP) and their practical, real-world applications. My primary interests include the development of efficient, purpose-trained Large Language Models (LLMs), with a particular focus on their fundamental capabilities, rigorous evaluation, and the issues of recency and truthfulness. Additionally, I am deeply engaged in cross-lingual NLP and multidisciplinary research that integrates diverse fields to enhance NLP applications.
Upstage: Solar, Open Ko-LLM Leaderboard, Open Ko-LLM Leaderboard2, sDPO, SAAS, Dataverse, Evalverse, 1 Trillion Token Club, LP Data Pipeline, Thai LLM, DeepLearning.AI
Korea University: KoCommonGEN v2, LeVoc, Hyper-BTS, KEBAP, Critical Error Detection, Quality Estimation, Automatic Post Editing, Parallel Corpus Filtering, Korean CommonGen, PEEP-Talk, FreeTalky, PicTalky, ONE-Piece
BUFS: Han-Tong-E, Exobrain
LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models
Yungi Kim, Hyunsoo Ha, Seonghoon Yang, Sukyung Lee, Jihoo Kim, Chanjun Park (✝)
arxiv, 2024
Can Code-Switched Texts Activate a Knowledge Switch in LLMs? A Case Study on English-Korean Code-Switching
Seoyeon Kim, Huiseo Kim, Chanjun Park, Jinyoung Yeo, Dongha Lee
arxiv, 2024
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs
Hyeonwoo Kim, Dahyun Kim, Jihoo Kim, Sukyung Lee, Yungi Kim, Chanjun Park (✝)
arxiv, 2024
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models
Dahyun Kim, Sukyung Lee, Yungi Kim, Attapol Rutherford, Chanjun Park (✝)
arxiv, 2024
InstaTrans: An Instruction-Aware Translation Framework for Non-English Instruction Datasets
Yungi Kim, Chanjun Park (✝)
arxiv, 2024
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models
Chanjun Park (✝), Hyunsoo Ha, Jihoo Kim, Yungi Kim, Dahyun Kim, Sukyung Lee, Seonghoon Yang
arxiv, 2024
Rethinking KenLM: Good and Bad Model Ensembles for Efficient Text Quality Filtering in Large Web Corpora
Yungi Kim, Hyunsoo Ha, Sukyung Lee, Jihoo Kim, Seonghoon Yang, Chanjun Park (✝)
arxiv, 2024
Understanding LLM Development Through Longitudinal Study: Insights from the Open Ko-LLM Leaderboard
Chanjun Park (✝), Hyeonwoo Kim
arxiv, 2024
ChatLang-8: An LLM-Based Synthetic Data Generation Framework for Grammatical Error Correction
Jeiyoon Park, Chanjun Park (✝), Heuiseok Lim (✝)
arxiv, 2024
Dataverse: Open-Source ETL (Extract, Transform, Load) Pipeline for Large Language Models
Hyunbyung Park, Sukyung Lee, Gyoungjin Gim, Yungi Kim, Dahyun Kim, Chanjun Park (✝)
arxiv, 2024
Self-Improving-Leaderboard (SIL): A Call for Real-World Centric Natural Language Processing Leaderboards
Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
arxiv, 2023
sDPO: Don’t Use Your Data All at Once
Dahyun Kim, Yungi Kim, Wonho Song, Hyeonwoo Kim, Yunsu Kim, Sanghoon Kim, Chanjun Park (✝)
COLING 2025 - Industry
Where am I? Large Language Models Wandering between Semantics and Structures in Long Contexts
Seonmin Koo, Jinsung Kim, YoungJoon Jang, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2024
Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models
Seonmin Koo, Jinsung Kim, Chanjun Park (✝), Heuiseok Lim (✝)
EMNLP 2024 - Findings
Translation of Multifaceted Data without Re-Training of Machine Translation Systems
Hyeonseok Moon, Seungyoon Lee, Seongtae Hong, Seungjun Lee, Chanjun Park, Heuiseok Lim
EMNLP 2024 - Findings
SAAS: Solving Ability Amplification Strategy for Enhanced Mathematical Reasoning in Large Language Models
Hyeonwoo Kim, Gyoungjin Gim, Yungi Kim, Jihoo Kim, Byungju Kim, Wonseok Lee, Chanjun Park (✝)
EMNLP 2024 - Industry
Evalverse: Unified and Accessible Library for Large Language Model Evaluation
Jihoo Kim, Wonho Song, Dahyun Kim, Yunsu Kim, Yungi Kim, Chanjun Park (✝)
EMNLP 2024 - Demo
Open Ko-LLM Leaderboard: Evaluating Large Language Models in Korean with Ko-H5 Benchmark
Chanjun Park, Hyeonwoo Kim, Dahyun Kim, SeongHwan Cho, Sanghoon Kim, Sukyung Lee, Yungi Kim, Hwalsuk Lee
ACL 2024
KoCommonGEN v2: A Benchmark for Navigating Korean Commonsense Reasoning Challenges in Large Language Models
Jaehyung Seo, Jaewook Lee, Chanjun Park, SeongTae Hong, Seungjun Lee, Heuiseok Lim
ACL 2024 - Findings
Length-aware Byte Pair Encoding for Mitigating Over-segmentation in Korean Machine Translation
Jungseob Lee, Hyeonseok Moon, Seungjun Lee, Chanjun Park (✝), Sugyeong Eo, Hyunwoong Ko, Jaehyung Seo, Seungyoon Lee, Heuiseok Lim (✝)
ACL 2024 - Findings
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Sanghoon Kim (*, ✝), Dahyun Kim (*), Chanjun Park (*, ✝), Wonsung Lee (*, ✝), Wonho Song (*), Yunsu Kim (*), Hyeonwoo Kim (*), Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee (✝), Sunghun Kim (✝)
NAACL 2024 - Industry
Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean
Seungyoon Lee, Chanjun Park (✝), DaHyun Jung, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim (✝)
LREC-COLING 2024, Oral
Detecting Critical Errors Considering Cross-Cultural Factors in English-Korean Translation
Sugyeong Eo, Jungwoo Lim, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
LREC-COLING 2024, Oral
Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim
EACL 2024 - Findings
Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim
EACL 2024 - Findings
KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
Seonmin Koo (*), Chanjun Park (*), Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
EMNLP 2023
CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients
Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim
EMNLP 2023
Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection
DaHyun Jung, Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
IJCNLP-AACL 2023
Improving Formality-Sensitive Machine Translation using Data-Centric Approaches and Prompt Engineering
Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
IWSLT 2023
PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
Seungjun Lee, Yoonna Jang, Chanjun Park, Jungseob Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Seounghoon Lee, Bernardo Nugroho Yahya, Heuiseok Lim
ACL 2023 - Demo
PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities
Chanjun Park (*), Yoonna Jang (*), Seolhwa Lee (*), Jaehyung Seo (*), Kisu Yang, Heuiseok Lim
AACL-IJCNLP 2022 - Demo
KU X Upstage’s submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task
Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
WMT 2022
QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim
COLING 2022
A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
NAACL 2022 - Findings
Priming Ancient Korean Neural Machine Translation
Chanjun Park (*), Seolhwa Lee (*), Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
LREC 2022, Oral
FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
Chanjun Park (*), Yoonna Jang (*), Seolhwa Lee (*), Sungjin Park (*), Heuiseok Lim
LREC 2022
Empirical Analysis of Synthetic Data Generation Using Noising Strategies for Automatic Post-editing
Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jeongsub Lee, Sugyeong Eo, Heuiseok Lim
LREC 2022
Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification
Chanjun Park (*), Sugyeong Eo (*), Hyeonseok Moon (*), Heuiseok Lim
NAACL-HLT 2021 - Industry
Exploring Inherent Biases in LLMs within Korean Social Context: A Comparative Analysis of ChatGPT and GPT-4
Seungyoon Lee, Dongjun Kim, Dahyun Jung, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2024 - Student Research Workshop (SRW)
Explainable CED: A Dataset for Explainable Critical Error Detection in Machine Translation
Dahyun Jung, Sugyeong Eo, Chanjun Park (✝), Heuiseok Lim (✝)
NAACL 2024 - Student Research Workshop (SRW)
Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism
Chanjun Park (*, ✝), Minsoo Khang (*), Dahyun Kim (*)
ICLR 2024 - Data-centric Machine Learning Research (DMLR) Workshop
Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse
Seungyoon Lee (*), DaHyun Jung (*), Chanjun Park (*), Seolhwa Lee, Heuiseok Lim
ICDM 2023 - The First Workshop on Data-Centric AI
DMOps: Data Management Operation and Recipes
Eujeong Choi, Chanjun Park (*, ✝)
ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop
Two Heads are Better than One? Verification of Ensemble Effect in Neural Machine Translation
Chanjun Park, Sungjin Park, Seolhwa Lee, Taesun Whang, Heuiseok Lim
EMNLP 2021 - The Second Workshop on Insights from Negative Results in NLP
BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text
Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
ACL 2021 - Workshop on Asian Translation (WAT)
Dealing with the Paradox of Quality Estimation
Sugyeong Eo (*), Chanjun Park (*), Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
MT Summit 2021 - Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT)
In addition, I have published 23 papers in international workshops, 31 papers in international journals (SCI/SCIE), 21 papers in domestic journals (KCI), 65 papers in domestic conferences, and 34 papers in other international conferences.
Year | Award |
---|---|
2024.02 | Forbes 30 Under 30 Korea |
2023.02 | Best Paper Award at Korea University |
2022.12 | 1st Place at WMT Quality Estimation Shared Task 2022 - Sentence-level Critical Error Detection |
2021.12 | Naver Ph.D. Fellowship |
2019.10 | 1st Place at Microsoft AI Accessibility Hackathon in Korea, Microsoft |
Year | Place | Contents |
---|---|---|
2024.11 | LG Electronics | Large Language Model Data and Evaluation in the Wild |
2024.11 | U.S. Department of State-2024 Tech Camp Korea | Large Language Model in the Wild |
2024.10 | Republic of Korea Army Headquarters | Large Language Model in the Wild |
2024.09 | Seoul National University | Large Language Model in the Wild |
2024.09 | Embassy of the United States in Seoul | The entire journey to becoming a natural language processing researcher |
2024.07 | Sungkyunkwan University | SOLAR: The Next Frontier in Large Language Models by Upstage and its Ecosystem |
2024.06 | KCC 2024 | SOLAR: The Next Frontier in Large Language Models by Upstage and its Ecosystem |
2024.05 | Applied ML: LLMs and Knowledge Graphs - Tokyo | The Ecosystem of LLMs from a Real-World Perspective |
2024.05 | AI Safety Compass Conference 2024 | Data and Evaluation Methods for Trustworthy AI |
2024.05 | ETRI | SOLAR: The Next Frontier in Large Language Models by Upstage and its Ecosystem |
2024.05 | AI Tech 2024 - AI Frontier for AI Era | SOLAR: The Next Frontier in Large Language Models by Upstage and its Ecosystem |
2024.05 | No code·Low code Hyper-automation Conference 2024 | The Ecosystem of LLMs from a Real-World Perspective |
2024.01 | The University of Tokyo | SOLAR: The Next Frontier in Large Language Models by Upstage |
2023.12 | Seoul Metropolitan Office of Education-AI & Digital Education Conference | Reset Moment by Large Language Model |
2023.12 | Korean-English Joint Seminar on Artificial Intelligence (AI) Safety and Reliability | Upstage Vision for Ethical and Trustworthy Large Language Models |
2023.09 | Korea University | From Language Model to Large Language Model |
2023.09 | HanYang University | From Language Model to Large Language Model |
2023.09 | AGI Town in Seoul | The use case of solving customer problems using LLM |
2023.08 | Google I/O Extended 2023 Incheon | From Language Model to Large Language Model |
2023.07 | Google I/O Extended 2023 Seoul | From Language Model to Large Language Model |
2023.07 | TensorFlow Korea LLM Day | Language Model to Large Language Model |
2023.06 | AI EDucation Alliance Policy lab (AIEDAP) | Deep Learning Understanding and Practice |
2023.02 | AI·DATA SUMMIT 2023 | Real-World Centric AI (Video) |
2019.08 | NAVER | Machine Translation for everyone |