I'm a postdoc in Natural Language Processing at Aalborg University (Copenhagen) advised by Prof. Johannes Bjerva and Prof. Euan Lindsay. My research is currently at the intersection of NLP and Education. Additionally, I'm affiliated with the Pioneer Centre for Artificial Intelligence.

Previously, I was a PhD Student in NLP at the IT University of Copenhagen (ITU) advised by Prof. Barbara Plank and Prof. Rob van der Goot. I was part of NLPnorth at ITU and MaiNLP at the Ludwig Maximilian University of Munich (LMU). I worked on Computational Job Market Analysis (also known as NLP for HR), where we investigated how to extract information (e.g., skills) from job ads and match it to existing taxonomies.

Feel free to reach out if any of my work interests you and you have ideas or would like to collaborate!

News

21 February 2025

Area Chair for ACL 2025

I'll be serving as an Area Chair for ACL 2025.

20 February 2025

Sailor2 Technical Report Released

The Sailor2 Technical Report is out!

17 February 2025

LLM Agents for Educational Feedback Pre-print Released

A pre-print on leveraging LLM agents for educational feedback is out!

Education

2020-2024

IT University of Copenhagen

Ph.D. in Natural Language Processing

Advisors: Barbara Plank & Rob van der Goot

2018-2020

University of Groningen

M.Sc. Information Science

2015-2018

University of Groningen

B.Sc. Information Science

Publications

Technical Report 2025

Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs

Longxu Dou, Qian Liu, Fan Zhou, Changyu Chen, ... (30 authors), Mike Zhang, Shiqi Chen, Tianyu Pang, Chao Du, Xinyi Wan, Wei Lu, Min Lin

Under Review 2025

SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems

Mike Zhang, Amalie Pernille Dilling, Léon Gondelman, Niels Erik Ruan Lyngdorf, Euan D. Lindsay, Johannes Bjerva

Under Review 2025

HIFI-KPI: A Dataset for Hierarchical KPI Extraction from Earnings Filings

Rasmus T. Aavang, Giovanni Rizzi, Rasmus Tjalk-Bøggild, Alexandre Iolov, Mike Zhang, Johannes Bjerva

CVPR'25

All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages

Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana, ... (Other Authors), Mike Zhang, Mahardika Krisna Ihsani, ... (Other Authors), Fahad Khan

CHI'25

How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale

Jeanette Falk, Yiyi Chen, Janet Rafner, Mike Zhang, Johannes Bjerva, Alexander Nolte

EDUCON'25

The Responsible Development of Automated Student Feedback with Generative AI

Euan Lindsay, Mike Zhang, Aditya Johri, Johannes Bjerva

C3NLP'25 // NB-REAL'25

DaKultur: Evaluating the Cultural Awareness of Language Models for Danish with Native Speakers

Max Müller-Eberstein*, Mike Zhang*, Elisa Bassignana, Peter Brunsgaard Trolle, Rob van der Goot

NAACL'25

SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models

Margaret Mitchell, Giuseppe Attanasio, ... (Other Authors), Mike Zhang, Sydney Zink, Zeerak Talat

ICLR'25

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Angelika Romanou, Negar Foroutan, Anna Sotnikova, ... (Other Authors), Mike Zhang, Imanol Schlag, Marzieh Fadaee, Sara Hooker, Antoine Bosselut

NoDaLiDa'25

SnakModel: Lessons Learned from Training an Open Danish Large Language Model

Mike Zhang*, Max Müller-Eberstein*, Elisa Bassignana, Rob van der Goot

NoDaLiDa'25

MorSeD: Morphological Segmentation of Danish and its Effect on Language Modeling

Rob van der Goot, Anette Jensen, Emil Allerslev Schledermann, Mikkel Wildner Kildeberg, Nicolaj Larsen, Mike Zhang, Elisa Bassignana

SEFI'24

Leveraging Large Language Models for Actionable Course Evaluation Student Feedback to Lecturers

Mike Zhang, Euan Lindsay, Frederik Bode Thorbensen, Danny Bøgsted Poulsen, Johannes Bjerva

LREC-COLING'24

Can Humans Identify Domains?

Maria Barrett*, Max Müller-Eberstein*, Elisa Bassignana*, Amalie Brogaard Pauli*, Mike Zhang*, Rob van der Goot*

ACL'24

Aya Dataset: An Open-access Collection for Multilingual Instruction Tuning

Shivalika Singh, Freddie Vargus, Daniel D’souza, ... (Other Authors), Mike Zhang, Ramith Hettiarachchi, Joseph Wilson, ... (Other Authors), Sara Hooker

NLP4HR'24

Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings

Elena Senger, Mike Zhang, Rob van der Goot, Barbara Plank

NLP4HR'24

Rethinking Skill Extraction in the Job Market Domain using Large Language Models

Khanh Cao Nguyen, Mike Zhang, Syrielle Montariol, Antoine Bosselut

NLP4HR'24

JOBSKAPE: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, Antoine Bosselut

EACL'24

Entity Linking in the Job Market Domain

Mike Zhang, Rob van der Goot, Barbara Plank

EACL'24

NNOSE: Nearest Neighbor Occupational Skill Extraction

Mike Zhang, Rob van der Goot, Min-Yen Kan, Barbara Plank

ACL'23

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

Mike Zhang, Rob van der Goot, Barbara Plank

EMNLP'22

Evidence > Intuition: Transferability Estimation for Encoder Selection

Elisa Bassignana*, Max Müller-Eberstein*, Mike Zhang, Barbara Plank

RecSysHR'22

Skill Extraction from Job Postings using Weak Supervision

Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, Barbara Plank

LREC'22

Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning

Mike Zhang, Kristian Nørgaard Jensen, Barbara Plank

NAACL'22

SkillSpan: Hard and Soft Skill Extraction from English Job Postings

Mike Zhang*, Kristian Nørgaard Jensen*, Sif Dam Sonniks, Barbara Plank

EMNLP'22

Experimental Standards for Deep Learning in Natural Language Processing Research

Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Christian Hardmeier, Barbara Plank

NoDaLiDa'21

De-identification of Privacy-related Entities in Job Postings

Kristian Nørgaard Jensen, Mike Zhang, Barbara Plank

EMNLP'21

Cartography Active Learning

Mike Zhang, Barbara Plank

WMT'19

The Effect of Translationese in Machine Translation Test Sets

Mike Zhang, Antonio Toral

SemEval'19

Grunn2019 at SemEval-2019 Task 5: Shared Task on Multilingual Detection of Hate

Mike Zhang, Roy David, Leon Graumans, Gerben Timmerman

Experience

02/25 - Present

Postdoctoral Researcher Aalborg University

Advisors: Johannes Bjerva & Euan D. Lindsay

Investigating Educational Feedback tools for improving student learning.

10/23 - 11/23

Ph.D. Research Visitor EPFL

Advisors: Syrielle Montariol & Antoine Bosselut

Worked on synthetic data and Large Language Models for Skill Extraction.

02/23 - 07/23

Ph.D. Research Visitor National University of Singapore

Advisor: Min-Yen Kan

Worked on retrieval-augmented methods for Skill Extraction.

07/22 - 12/22

Ph.D. Research Intern NEC Laboratories Europe GmbH

Advisor: Kiril Gashteovski

Investigated data-centric methods to improve Open Information Extraction.