About Me

Chanjun Park is a researcher in Natural Language Processing (NLP), focusing on Data-Centric AI, Machine Translation, and Large Language Models (LLMs). He currently works as a Principal Research Engineer and Technical Leader on the Upstage LLM Team. His current interests include building an LLM-based ecosystem, and he has initiated projects such as SOLAR, the Open Ko-LLM Leaderboard, Dataverse, and the Up 1 Trillion Token Club. In 2023, he received his Ph.D. from Korea University, supervised by Professor Heuiseok Lim, for his work on “Data-Centric Neural Machine Translation”. From 2018 to 2019, he worked at SYSTRAN as a Research Engineer. Chanjun is the founder and chief scientist of the KU-NMT Group and received the Naver Ph.D. Fellowship in 2021. He served as the Virtual Social Chair at COLING 2022 and is currently serving as the Program Chair for the WiNLP Workshop and the Publication Chair for the DMLR Workshop. He has published more than 170 papers in the field of NLP. He was also selected for Forbes 30 Under 30 Korea in the SCIENCE/SW category. See his CV for more information.

Research Interest

Large Language Model (LLM), Data-Centric AI, Machine Translation

Education

Professional Experience

Academic Services

External Activities

Publications

Preprints

  1. Self-Improving-Leaderboard (SIL): A Call for Real-World Centric Natural Language Processing Leaderboards
    Chanjun Park, Hyeonseok Moon, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
    arXiv, 2023

  2. Language Chameleon: Transformation analysis between languages using Cross-lingual Post-training based on Pre-trained language models
    Suhyune Son, Chanjun Park, Jungseob Lee, Midan Shim, Chanhee Lee, Yoonna Jang, Jaehyung Seo, Heuiseok Lim (Equal Contribution(First Co-Author))
    arXiv, 2022

  3. There is no rose without a thorn: Finding weaknesses on BlenderBot 2.0 in terms of Model, Data and User-Centric Approach
    Jungseob Lee, Suhyune Son, Midan Shim, Yujin Kim, Chanjun Park, Heuiseok Lim (Equal Contribution(First Co-Author))
    arXiv, 2022

International Conference (Main / Workshop)

  1. SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
    Sanghoon Kim, Dahyun Kim, Chanjun Park, Wonsung Lee, Wonho Song, Yunsu Kim, Hyeonwoo Kim, Yungi Kim, Hyeonju Lee, Jihoo Kim, Changbae Ahn, Seonghoon Yang, Sukyung Lee, Hyunbyung Park, Gyoungjin Gim, Mikyoung Cha, Hwalsuk Lee, Sunghun Kim (Equal Contribution(First Co-Author), Corresponding Author)
    NAACL 2024 Industry Track, 2024

  2. Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism
    Chanjun Park, Minsoo Khang, Dahyun Kim
    ICLR 2024 - Data-centric Machine Learning Research (DMLR) Workshop, 2024

  3. Leveraging Pre-existing Resources for Data-Efficient Counter-Narrative Generation in Korean
    Seungyoon Lee, Chanjun Park (Corresponding Author), DaHyun Jung, Hyeonseok Moon, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim (Corresponding Author)
    LREC-COLING 2024, 2024

  4. KNOTICED: A Dataset for Critical Error Detection in English-Korean Machine Translation
    Sugyeong Eo, Jungwoo Lim, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    LREC-COLING 2024, 2024

  5. Hyper-BTS Dataset: Scalability and Enhanced Analysis of Back TranScription (BTS) for ASR Post-Processing
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Junyoung Son, Hyeonseok Moon, Sugyeong Eo, Chanhee Lee, Heuiseok Lim
    EACL 2024 (Findings of EACL 2024), 2024

  6. Generative Interpretation: Toward Human-Like Evaluation for Educational Question-Answer Pair Generation
    Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Jaehyung Seo, Heuiseok Lim
    EACL 2024 (Findings of EACL 2024), 2024

  7. KEBAP: Korean Error Explainable Benchmark Dataset for ASR and Post-processing
    Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim (Equal Contribution(First Co-Author))
    EMNLP 2023, 2023

  8. CHEF in the Language Kitchen: A Generative Data Augmentation Leveraging Korean Morpheme Ingredients
    Jaehyung Seo, Hyeonseok Moon, Jaewook Lee, Sugyeong Eo, Chanjun Park, Heuiseok Lim
    EMNLP 2023, 2023

  9. Proceedings of the Seventh Widening NLP Workshop (WiNLP 2023)
    Bonaventure F. P. Dossou, Isidora Tourni, Hatem Haddad, Shaily Bhatt, Fatemehsadat Mireshghallah, Sunipa Dev, Tanvi Anand, Weijia Xu, Atnafu Lambebo Tonja, Alfredo Gomez, Chanjun Park
    EMNLP 2023, 2023

  10. Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse
    Seungyoon Lee, DaHyun Jung, Chanjun Park, Seolhwa Lee, Heuiseok Lim (Equal Contribution(First Co-Author))
    ICDM 2023 - The First Workshop on Data-Centric AI, 2023

  11. Informative Evidence-guided Prompt-based Fine-tuning for English-Korean Critical Error Detection
    DaHyun Jung, Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    IJCNLP-AACL 2023, 2023

  12. Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction
    Chanjun Park, Seonmin Koo, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  13. DMOps: Data Management Operation and Recipes
    Eujeong Choi, Chanjun Park (Corresponding Author)
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  14. Inter-Annotator Agreement in the Wild: Uncovering Its Emerging Roles and Considerations in Real-World Scenarios
    NamHyeok Kim, Chanjun Park (Corresponding Author)
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  15. Transcending Traditional Boundaries: Leveraging Inter-Annotator Agreement (IAA) for Enhancing Data Management Operations (DMOps)
    Damrin Kim, NamHyeok Kim, Chanjun Park (Corresponding Author), Harksoo Kim (Corresponding Author)
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  16. Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
    Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  17. Toward Practical Automatic Speech Recognition and Post-Processing: a Call for Explainable Error Benchmark Guideline
    Seonmin Koo, Chanjun Park, Jinsung Kim, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim (Equal Contribution(First Co-Author))
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  18. Knowledge Graph-Augmented Korean Generative Commonsense Reasoning
    Dahyun Jung, Jaehyung Seo, Jaewook Lee, Chanjun Park, Heuiseok Lim
    ICML 2023 - Data-centric Machine Learning Research (DMLR) Workshop, 2023

  19. Improving Formality-Sensitive Machine Translation using Data-Centric Approaches and Prompt Engineering
    Seungjun Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
    IWSLT 2023 - ACL 2023, 2023

  20. PEEP-Talk: A Situational Dialogue-based Chatbot for English Education
    Seungjun Lee, Yoonna Jang, Chanjun Park, Jungseob Lee, Jaehyung Seo, Hyeonseok Moon, Sugyeong Eo, Seounghoon Lee, Bernardo Nugroho Yahya, Heuiseok Lim
    ACL 2023 - Demo Track, 2023

  21. PicTalky: Augmentative and Alternative Communication for Language Developmental Disabilities
    Chanjun Park, Yoonna Jang, Seolhwa Lee, Jaehyung Seo, Kisu Yang, Heuiseok Lim
    AACL-IJCNLP 2022 - Demo Track, 2022

  22. KU X Upstage’s submission for the WMT22 Quality Estimation: Critical Error Detection Shared Task
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim
    WMT 2022 - EMNLP 2022, 2022

  23. QUAK: A Synthetic Quality Estimation Dataset for Korean-English Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Gyeongmin Kim, Jungseob Lee, Heuiseok Lim
    COLING 2022, 2022

  24. Focus on FoCus: Is FoCus focused on Context, Knowledge and Persona?
    SeungYoon Lee, Jungseob Lee, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Jaehyung Seo, Jeongbae Park, Heuiseok Lim
    COLING 2022 - The 1st Workshop on Customized Chat Grounding Persona and Knowledge, 2022

  25. A Self-Supervised Automatic Post-Editing Data Generation Tool
    Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Seungjun Lee, Heuiseok Lim
    ICML 2022 - DataPerf workshop, 2022

  26. A Dog Is Passing Over The Jet? A Text-Generation Dataset for Korean Commonsense Reasoning and Evaluation
    Jaehyung Seo, Seounghoon Lee, Chanjun Park, Yoonna Jang, Hyeonseok Moon, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
    NAACL 2022 - Findings, 2022

  27. Priming Ancient Korean Neural Machine Translation
    Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    LREC 2022, 2022 - (Oral presentation)

  28. FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
    Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim
    LREC 2022, 2022 - (Poster)

  29. Empirical Analysis of Synthetic Data Generation Using Noising Strategies for Automatic Post-editing
    Hyeonseok Moon, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Jeongsub Lee, Sugyeong Eo, Heuiseok Lim
    LREC 2022, 2022 - (Poster)

  30. FreeTalky: Don’t Be Afraid! Conversations Made Easier by a Humanoid Robot using Persona-based Dialogue
    Chanjun Park, Yoonna Jang, Seolhwa Lee, Sungjin Park, Heuiseok Lim
    AAAI 2022 - Artificial Intelligence for Education (AI4EDU), 2022

  31. How should human translation coexist with NMT? Efficient tool for building high quality parallel corpus
    Chanjun Park, Seolhwa Lee, Hyeonseok Moon, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim
    NeurIPS 2021 - Data-centric AI (DCAI) workshop, 2021

  32. A New Tool for Efficiently Generating Quality Estimation Datasets
    Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
    NeurIPS 2021 - Data-centric AI (DCAI) workshop, 2021

  33. Automatic Knowledge Augmentation for Generative Commonsense Reasoning
    Jaehyung Seo, Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    NeurIPS 2021 - Data-centric AI (DCAI) workshop, 2021

  34. Syntax-enhanced Dialogue Summarization using Syntax-aware information
    Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
    NeurIPS 2021 - Women in Machine Learning (WiML 2021) workshop, 2021 - (Contributed Talk / Oral presentation)

  35. Towards Syntax-Aware Dialogue Summarization using Multi-task Learning
    Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
    EMNLP 2021 - Widening NLP (WiNLP 2021) workshop, 2021 - (Poster)

  36. Two Heads are Better than One? Verification of Ensemble Effect in Neural Machine Translation
    Chanjun Park, Sungjin Park, Seolhwa Lee, Taesun Whang, Heuiseok Lim
    EMNLP 2021 - The Second Workshop on Insights from Negative Results in NLP, 2021 - (Oral presentation)

  37. BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Hyeonseok Moon, Sugyeong Eo, Heuiseok Lim
    ACL 2021 - Workshop on Asian Translation (WAT 2021), 2021 - (Oral presentation)

  38. Dealing with the Paradox of Quality Estimation
    Sugyeong Eo, Chanjun Park, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim (Equal Contribution(First Co-Author))
    MT Summit 2021 - LoResMT, 2021 - (Oral presentation)

  39. Should we find another model?: Improving Neural Machine Translation Performance with ONE-Piece Tokenization Method without Model Modification
    Chanjun Park, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim
    NAACL-HLT 2021 Industry Track, 2021 - (Poster/Oral presentation)

International Journal (SCI/SCIE)

  1. Enhancing Machine Translation Quality Estimation via Fine-grained Error Analysis and Large Language Model
    Dahyun Jung, Chanjun Park, Sugyeong Eo, Heuiseok Lim
    Mathematics, 2023

  2. Uncovering the Risks and Drawbacks Associated with the Use of Synthetic Data for Grammatical Error Correction
    Seonmin Koo, Chanjun Park, Seolhwa Lee, Jaehyung Seo, Sugyeong Eo, Hyeonseok Moon, Heuiseok Lim (Equal Contribution(First Co-Author))
    IEEE Access, 2023

  3. Doubts on the Reliability of Parallel Corpus Filtering
    Hyeonseok Moon, Chanjun Park, Seonmin Koo, Jungseob Lee, Seungjun Lee, Jaehyung Seo, Sugyeong Eo, Yoonna Jang, Hyunjoong Kim, Hyoung-gyu Lee, Heuiseok Lim
    Expert Systems With Applications, 2023

  4. A Survey on Evaluation Metrics for Machine Translation
    Seungjun Lee, Jungseob Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Seonmin Koo, Heuiseok Lim
    Mathematics, 2023

  5. K-NCT: Korean Neural Grammatical Error Correction Gold-Standard Test Set Using Novel Error Type Classification Criteria
    Seonmin Koo, Chanjun Park, Jaehyung Seo, Seungjun Lee, Hyeonseok Moon, Jungseob Lee, Heuiseok Lim (Equal Contribution(First Co-Author))
    IEEE Access, 2022

  6. Plain Template Insertion: Korean-Prompt-based Engineering for Few-shot Learners
    Jaehyung Seo, Hyeonseok Moon, Chanhee Lee, Sugyeong Eo, Chanjun Park, Jihoon Kim, Changwoo Chun, Heuiseok Lim
    IEEE Access, 2022

  7. The ASR post-processor performance challenges of BackTranScription (BTS): Data-Centric and Model-Centric Approaches
    Chanjun Park, Jaehyung Seo, Seolhwa Lee, Chanhee Lee, Heuiseok Lim
    Mathematics, 2022

  8. PU-GEN: Enhancing Generative Commonsense Reasoning for Language Models with Human-Centered Knowledge
    Jaehyung Seo, Dongsuk Oh, Sugyeong Eo, Chanjun Park, Kisu Yang, Hyeonseok Moon, Kinam Park, Heuiseok Lim
    Knowledge-Based Systems, 2022

  9. Utilization Strategy of User Engagements in Korean Fake News Detection
    Myunghoon Kang, Jaehyung Seo, Chanjun Park, Heuiseok Lim
    IEEE Access, 2022

  10. BERTOEIC: Solving TOEIC Problems Using Simple and Efficient Data Augmentation Techniques with Pretrained Transformer Encoders
    Jeongwoo Lee, Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim
    Applied Sciences, 2022

  11. Empirical Analysis of Parallel Corpora and in-depth Analysis using LIWC
    Chanjun Park, Midan Shim, Sugyeong Eo, Seolhwa Lee, Jaehyung Seo, Hyeonseok Moon, Heuiseok Lim
    Applied Sciences, 2022

  12. AI for Patents: A Novel yet Effective and Efficient Framework for Patent Analysis
    Junyoung Son, Hyeonseok Moon, Jeongwoo Lee, Seolhwa Lee, Chanjun Park, Wonkyung Jung, Heuiseok Lim
    IEEE Access, 2022

  13. AI student: A Machine Reading Comprehension System for the Korean College Scholastic Ability Test
    Gyeongmin Kim, Soomin Lee, Chanjun Park, Jaechoon Jo
    Mathematics, 2022

  14. Return on Advertising Spend Prediction with Task Decomposition based LSTM Model
    Hyeonseok Moon, Taemin Lee, Jaehyung Seo, Chanjun Park, Sugyeong Eo, Imatitikua D. Aiyanyo, Jeongbae Park, Aram So, Kyoungwha Ok, Kinam Park
    Mathematics, 2022

  15. Word-level Quality Estimation for Korean-English Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim (Equal Contribution(First Co-Author))
    IEEE Access, 2022

  16. Dense-to-Question and Sparse-to-Answer: Hybrid Retriever System for Industrial Frequently Asked Questions
    Jaehyung Seo, Taemin Lee, Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Imatitikua D. Aiyanyo, Kinam Park, Aram So, Sungmin Ahn, Jeongbae Park
    Mathematics, 2022

  17. Mimicking Infants’ Bilingual Language Acquisition for Domain Specialized Neural Machine Translation
    Chanjun Park, Woo-Young Go, Sugyeong Eo, Hyeonseok Moon, Seolhwa Lee, Heuiseok Lim
    IEEE Access, 2022

  18. An Automatic Post Editing with Efficient and Simple Data Generation Method
    Hyeonseok Moon, Chanjun Park, Jaehyung Seo, Sugyeong Eo, Heuiseok Lim (Equal Contribution(First Co-Author))
    IEEE Access, 2022

  19. Who speaks like a style of Vitamin: Towards Syntax-Aware Dialogue Summarization using Multi-task Learning
    Seolhwa Lee, Kisu Yang, Chanjun Park, João Sedoc, Heuiseok Lim
    IEEE Access, 2021

  20. Grounded Vocabulary for Image Retrieval Using a Modified Multi-Generator Generative Adversarial Network
    Kuekyeng Kim, Chanjun Park, Jaehyung Seo, Heuiseok Lim
    IEEE Access, 2021

  21. An Empirical Study on Automatic Post Editing for Neural Machine Translation
    Hyeonseok Moon, Chanjun Park, Sugyeong Eo, Jaehyung Seo, Heuiseok Lim (Equal Contribution(First Co-Author))
    IEEE Access, 2021

  22. Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multi-Domain Task-Oriented Dialogue
    Jeiyoon Park, Chanhee Lee, Chanjun Park, Kuekyeng Kim, Heuiseok Lim
    Applied Sciences, 2021

  23. Comparative Analysis of Current Approaches to Quality Estimation for Neural Machine Translation
    Sugyeong Eo, Chanjun Park, Hyeonseok Moon, Jaehyung Seo, Heuiseok Lim (Equal Contribution(First Co-Author))
    Applied Sciences, 2021

  24. Exploring the Data Efficiency of Cross-Lingual Post-Training in Pretrained Language Models
    Chanhee Lee, Kisu Yang, Taesun Whang, Chanjun Park, Andrew Matteson, Heuiseok Lim
    Applied Sciences, 2021

  25. Decoding Strategies for Improving Low-Resource Machine Translation
    Chanjun Park, YeongWook Yang, Kinam Park, Heuiseok Lim
    Electronics, 2020

  26. Ancient Korean Neural Machine Translation
    Chanjun Park, Chanhee Lee, YeongWook Yang, Heuiseok Lim
    IEEE Access, 2020

  27. Comparison of the evaluation metrics for Neural Grammatical Error Correction with Overcorrection
    Chanjun Park, YeongWook Yang, Chanhee Lee, Heuiseok Lim
    IEEE Access, 2020

  28. Neural Spelling Correction: Translating Incorrect sentences to Correct sentences for Multimedia
    Chanjun Park, Kuekyeng Kim, YeongWook Yang, Minho Kang, Heuiseok Lim
    Multimedia Tools and Applications, 2020

Domestic Journal (KCI) / Domestic Conference

Book Chapters

  1. Data-Centric Neural Machine Translation - A Real-World Approaches
    Chanjun Park
    Ph.D. Thesis

  2. Natural Language Processing Bible
    HeuiSeok Lim, Korea University NLP&AI Lab
    Human Science

International Patents

  1. METHOD FOR GENERATING TRAINING DATA AND METHOD FOR POST-PROCESSING OF SPEECH RECOGNITION USING THE SAME
    HeuiSeok Lim, Chanjun Park
    Patent Application (17/739,383)

  2. METHOD OF BUILDING TRAINING DATA OF MACHINE TRANSLATION
    HeuiSeok Lim, Chanjun Park
    Patent Application (PCT/KR2021/012195)

Domestic Patents

  1. DEVICE AND METHOD FOR GENERATING OF TRAINING DATA FOR QUALITY ESTIMATION IN MACHINE TRANSLATION
    HeuiSeok Lim, Sugyeong Eo, Chanjun Park, Hyeonseok Moon
    Granted Patent (10-2593447)

  2. APPARATUS FOR CORPUS PROCESSING, APPARATUS AND METHOD AND MACHINE TRANSLATION
    Chanjun Park, HeuiSeok Lim
    Granted Patent (10-2574167)

  3. DEVICE AND METHOD FOR GENERATING TRAINING DATA FOR AUTOMATIC POST EDITING
    HeuiSeok Lim, Hyeonseok Moon, Chanjun Park, Sugyeong Eo
    Patent Application (10-2021-0118924)

  4. DEVICE AND METHOD FOR GENERATING OPTIMAL TRANSLATION SUBTITLE USING QUALITY ESTIMATION
    HeuiSeok Lim, Chanjun Park
    Patent Application (10-2021-0117011)

  5. Improving speech recognition performance using TTS in domain-specific environment
    HeuiSeok Lim, Chanjun Park
    Patent Application (10-2021-0028816)

  6. Method For Generating Training Data And Method For Post-Processing Of Speech Recognition Using The Same
    HeuiSeok Lim, Chanjun Park
    Granted Patent (10-2557810)

  7. METHOD OF BUILDING TRAINING DATA OF MACHINE TRANSLATION
    HeuiSeok Lim, Chanjun Park
    Granted Patent (10-2409667)

  8. Correction performance evaluation metrics of neural network machine translation and method of constructing the same
    HeuiSeok Lim, Chanjun Park
    Granted Patent (10-2390154)

  9. APPARATUS AND METHOD FOR OUTPUTTING IMAGE CORRESPONDING TO LANGUAGE
    HeuiSeok Lim, Chanjun Park, Yanghee Kim
    Granted Patent (10-2476497)

  10. METHOD OF TRANSLATING ANCIENT KOREAN USING MACHINE TRANSLATION
    HeuiSeok Lim, Chanjun Park
    Granted Patent (10-2425922)

  11. Device and method for correcting Korean spelling
    HeuiSeok Lim, Chanjun Park
    Granted Patent (10-2430918)

Teaching

  1. Language Model to Large Language Model (LM to LLM), Instructor, Fast Campus. (2023)
  2. Natural Language Processing (NLP) Basic, Instructor, Fast Campus. (2023)
  3. Learning ChatGPT Utilization and Service Construction with AskUP, Instructor, Fast Campus. (2023)
  4. Finance Specialized Large Language Model for Everyone, Instructor, Upstage Online Course. (2023)
  5. Data-Centric NLP, Master (Instructor), BoostCamp - NAVER Connect Foundation. (2023)
  6. Introduction to Natural Language Processing in Big Data (BDC101), Teaching Assistant, Korea Univ. (Autumn 2021)
  7. Introduction to Natural Language Processing in Big Data (BDC101), Head Teaching Assistant, Korea Univ. (Autumn 2020)
  8. Natural Language Processing for Digital Finance Engineering (DFE610), Head Teaching Assistant, Korea Univ. (Autumn 2020)
  9. Natural Language Processing (COSE461), Teaching Assistant, Korea Univ. (Spring 2020)
  10. Artificial Intelligence and Natural Language Processing (DFC615), Teaching Assistant, Korea Univ. (Spring 2020)

Honors & Awards

Year | Award
2024.02 | Forbes 30 Under 30 Korea
2023.10 | Best Paper Award, The 35th Annual Conference on Human & Cognitive Language Technology (HCLT2023) - Language Model 2 Section
2023.10 | Best Paper Award, The 35th Annual Conference on Human & Cognitive Language Technology (HCLT2023) - Data Bias & Ethics Section
2023.02 | Best Paper Award, Korea University
2023.02 | Research Encouragement Scholarship, Korea University
2022.12 | 1st Place in Quality Estimation Shared Task 2022 - Sentence-level “Critical Error Detection”, WMT 2022 (EMNLP 2022)
2022.10 | Best Paper Award, The 34th Annual Conference on Human & Cognitive Language Technology (HCLT2022)
2021.12 | Naver Ph.D. Fellowship 2021
2021.10 | Best Paper Award, The 33rd Annual Conference on Human & Cognitive Language Technology (HCLT2021) - NLP Application 2 Section
2021.10 | Best Paper Award, The 33rd Annual Conference on Human & Cognitive Language Technology (HCLT2021) - Language Resource Section
2021.10 | Best Paper Award, The 33rd Annual Conference on Human & Cognitive Language Technology (HCLT2021) - QA and Speech Section
2021.07 | Ranked 4th on the CommonGen 1.1 Leaderboard (ranked 7th as of Nov. 2022)
2020.11 | 1st Place, Flitto Hackathon (Team Lead)
2020.10 | Best Paper Award, The 32nd Annual Conference on Human & Cognitive Language Technology (HCLT2020)
2020.05 | Best practices for using NIA AI training data (Korean-English Neural Machine Translation model), NIA
2019.10 | Best Paper Award, The 31st Annual Conference on Human & Cognitive Language Technology (HCLT2019)
2019.10 | 1st Place, Microsoft AI Accessibility Hackathon in Korea (Team Lead), Microsoft
2019.03 | Graduate School Associate Scholarship, Sungkyunkwan University
2018.10 | Next Generation Information Processing NLP Competition 2018: Participation Award, Next-Generation Information Computing Technology Development Program
2017.12 | Scholarship for academic excellence, Sooyoungro Church
2017.06 | Bit Computer Excellence Award (President Award), Bit Computer
2016.12 | Scholarship for academic excellence, Sooyoungro Church
2015.03 | Full Scholarship, BUFS

Invited Talk

Year | Place | Contents
2024.02 | Korea University College of Medicine - Intelligent Medical Data Lab | SOLAR: The Next Frontier in Large Language Models by Upstage
2024.02 | Fast Campus | Job Employment Special Lecture
2024.02 | FuriosaAI | SOLAR: The Next Frontier in Large Language Models by Upstage
2024.01 | The University of Tokyo - Center for Data-Driven Discovery | SOLAR: The Next Frontier in Large Language Models by Upstage
2024.01 | Defense Agency for Technology and Quality (DTaQ) | Current State of Artificial Intelligence (AI) and Its Application Strategies in the Defense Sector
2024.01 | National Library of Korea | Upstage LLM One Pager
2023.12 | Ministry of Science and ICT - Data Utilization Council | Upstage LLM One Pager
2023.12 | Seoul Metropolitan Office of Education - AI & Digital Education Conference | Reset Moment by Large Language Model
2023.12 | Korean-English Joint Seminar on Artificial Intelligence (AI) Safety and Reliability | Upstage Vision for Ethical and Trustworthy Large Language Models
2023.11 | Digital Innovation Forum (Ministry of Culture, Sports and Tourism) | Generative AI era: Copyright issues and countermeasures
2023.11 | Kyungpook National University | Upstage LLM One Pager
2023.11 | Future Technology Exchange Conference in Southeast Region | Upstage LLM One Pager
2023.11 | Jeju National University - Guest lecture | Upstage LLM One Pager
2023.11 | Busan University of Foreign Studies | From Language Model to Large Language Model
2023.11 | KB Kookmin Bank | Areas of Generative AI Application in Business
2023.11 | Human-Inspired AI Research | Upstage LLM One Pager
2023.10 | NIA | Upstage LLM and latest B2B LLM trends
2023.09 | Woongjin Thinkbig | Upstage LLM One Pager
2023.09 | Kyobo Life Insurance | Upstage and Private LLM
2023.09 | HD Korea Shipbuilding & Offshore Engineering | Upstage and Private LLM
2023.09 | Korea University - Guest lecture | From Language Model to Large Language Model
2023.09 | HanYang University - Guest lecture | From Language Model to Large Language Model
2023.09 | Korea Institute of Patent Information | Upstage LLM and latest B2B LLM trends
2023.09 | AGI Town in Seoul | The use case of solving customer problems using LLM
2023.09 | Fast Campus | Job Employment Special Lecture
2023.08 | HD Korea Shipbuilding & Offshore Engineering | Upstage LLM and latest B2B LLM trends
2023.08 | Upstage Webinar | The Future of Finance/Insurance Transformed by Generative AI
2023.08 | Google I/O Extended 2023 Incheon | From Language Model to Large Language Model
2023.07 | POSCO RIST | Data-Centric AI in Real-World
2023.07 | Google I/O Extended 2023 Seoul | From Language Model to Large Language Model
2023.07 | TensorFlow Korea LLM Day | Language Model to Large Language Model
2023.07 | Jeju National University | Real-World Artificial Intelligence and Large Language Model for Everyone
2023.06 | AI EDucation Alliance Policy lab (AIEDAP) | Deep Learning Understanding and Practice
2023.05 | KIDA (Korea Institute for Defense Analysis) | Data-Centric AI in the Large Language Model Era
2023.04 | Upstage | NLP based Large Language Model for All
2023.02 | AI·DATA SUMMIT 2023 | Real-World Centric AI (Video)
2022.12 | Sunmoon University | Real-World Centric AI
2022.08 | Kyungsung University | Language and Information Studies and the Future of Artificial Intelligence
2022.07 | Hankuk University of Foreign Studies | Basic practice of natural language processing for everyone
2022.01 | Dongguk University | Artificial intelligence and Machine Translation
2021.07 | Busan Social Welfare Development Group | Attending advisory meetings and Focus Group Interview
2020.03 | LLsoLLu | Latest natural language processing Research
2020.02 | NC SOFT | Technology Transfer Seminar
2020.01 | Dongguk University | A.I - NLP - MT for Liberal Arts
2019.10-2019.11 | SKC | Text Preprocessing, Machine Translation, Language Embedding
2019.08 | SK T Academy | Machine Translation for everyone
2019.08 | NAVER | Machine Translation for everyone

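Media Coverage (Press, Blog)

Year | Headline | Press
2024.03 | Story Pack - Chanjun Park | Digital Daily Content Lab
2024.03 | Everything About Data Processing: Dataverse Goes Open Source | Upstage Tech Blog
2024.03 | Three Ingredients for ‘AGI’, the Ultimate AI... What the Experts Say | Tech World News
2024.03 | Upstage Open-Sources ‘Dataverse’, a Data Preprocessing Mechanism | AI TIMES
2024.03 | Five AI Experts, Including Wrtn CEO Lee Se-young, Named to Forbes Korea ‘30 Under 30’ | AI TIMES
2024.03 | From Pressure to Resign to Lawsuits... The Scramble over ‘AI Supremacy’ | SBS
2024.02 | Forbes Korea 30 Under 30 2024 (5) SCIENCE/SW | Forbes Korea
2024.02 | Forbes Korea 30 Under 30 2024 | Forbes Korea
2024.02 | Introducing the Open Ko-LLM Leaderboard: Leading the Korean LLM Evaluation Ecosystem | Hugging Face
2024.02 | Ko-LLM Leaderboard, a Big Success over Five Months... Expansion and Transition Will Make It Useful in Practice | AI TIMES
2024.01 | All Eyes on Vertical Markets... An AI-Driven M&A Supercycle Begins | Etoday
2023.12 | Ministry of Science and ICT Holds the 4th AI Data Utilization Council | Asia Today and many other outlets
2023.12 | Seoul Metropolitan Office of Education to Hold the ‘AI & Digital Education Conference’ on the 16th | Edaily and many other outlets
2023.12 | Upstage Researchers Release a Korean LLM Survey Paper, ‘Research Trends in Hyperscale Language Models’ | AI Newspaper (인공지능신문)
2023.12 | Google Shows It Is Still Strong with AI ‘Gemini’... Experts: “It Does Not Surpass GPT-4” | Munhwa Ilbo
2023.12 | Google Goes on the AI Offensive with ‘Gemini’... Reviews: “Not So Sure” | Electronic Times
2023.12 | Rising ‘Mistral 7B’ Leads a Generational Shift in Korean Models After ‘Llama 2’ | AI TIMES
2023.12 | An LLM Leaderboard? Korea Has the ‘Open Ko-LLM’! | Upstage YouTube
2023.12 | The ‘Open Ko-LLM Leaderboard’, Strengthening Korean AI Competitiveness | Upstage Tech Blog
2023.11 | Companies That Write Papers... AI Tech Companies Publish in Leading Journals | Seoul Shinmun
2023.11 | Reinforcement Learning Method ‘DPO’ Gains Popularity as an Alternative to ‘RLHF’... Markr AI Retakes 1st Place | AI TIMES
2023.10 | ‘Global AI Norms, Public and Private Sectors Together!’... PIPC Launches the ‘Public-Private AI Privacy Policy Council’ | AI TIMES and many other outlets
2023.10 | Toward a World-Leading AI NLP Technology Company... Upstage Has Two Papers Accepted at EMNLP 2023, the Most Prestigious NLP Conference | AI TIMES and many other outlets
2023.10 | Markr AI Takes Over the Chart... The First No. 1 Was Omnious.AI | AI TIMES
2023.09 | NIA Lays the Groundwork with Global LLM Platform ‘Upstage’ to Revitalize Korea’s ‘Hyperscale Language Model’ Ecosystem! | AI Newspaper (인공지능신문)
2023.09 | Let’s Collect 1 Trillion Korean Data Tokens Together | The Hankyoreh
2023.08 | “Use ChatGPT Customized for Your Business”: The Number That Made OpenAI Rush | JoongAng Ilbo
2023.08 | Proving World No. 1 Technology... For Upstage, ‘Global’ Is a Reality, Not a Challenge | AI TIMES
2023.08 | Chanjun Park, Leader at Upstage, Which Beat ChatGPT: “Collaboration Will Decide Success or Failure in the AI Market” | Etoday
2023.08 | Upstage to Share Its Know-How on Applying Finance-Specialized Generative AI | Newsis, The Financial News, Asia Today, and many other outlets
2023.07 | The AI Era Is the Era of Data / Chanjun Park, AI Research Engineer at Upstage | Seoul Shinmun
2023.07 | Seven Data AI Papers Accepted by a World-Class Venue | Maeil Business Newspaper
2023.06 | Upstage Has Seven Papers Accepted at a Global Machine Learning AI Conference | Maeil Business Newspaper, Seoul Economic Daily, Digital Today, The Economic Review, and more than 30 outlets in total
2023.04 | Reinterpreting the NLP-Based History of AI from a Data-Centric AI Perspective | Upstage Tech Blog
2023.04 | DMOps (Data Management Operation and Recipes): Building Data in Practice | Upstage Tech Blog
2023.04 | Data-Centric AI and the Real World | Upstage Tech Blog
2023.04 | AI·DATA SUMMIT 2023 - Real-World Centric AI | allshow TV YouTube
2022.05 | I Fell for Its Growth Potential and Flexible Organizational Culture | Seoul Economic Daily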