Mike Zhang


Hello there! My name is Mike Zhang. I’m a Postdoc in NLP at Aalborg University (AAU, Copenhagen campus) advised by Prof. Johannes Bjerva. My research is currently focused on NLP for Education.

Previously, I was a PhD student in NLP at the IT University of Copenhagen (ITU), advised by Prof. Barbara Plank and Prof. Rob van der Goot. I was part of NLPnorth at ITU and MaiNLP at the Ludwig Maximilian University of Munich (LMU). I worked on Computational Job Market Analysis (or NLP for HR), where we investigated how to extract information from job advertisement data and match it to existing resources (e.g., taxonomies).

I am interested in:

  • NLP for Education: Can we improve students’ learning by giving them automatic feedback from NLP tools (e.g., language models)? How can we do this over time?
  • NLP for HR: How can we extract relevant skills from job ads, and how can we match them with existing taxonomies to help job centers match candidates to jobs more effectively?
  • My expertise is mostly in resource creation, developing annotation guidelines, creating (multilingual) datasets for general and specific domains, and training language models at small and large scale.


Feb 20, 2024 1 paper accepted at LREC-COLING!
Jan 22, 2024 5 papers accepted at EACL 2024 (1 main, 1 findings, 3 workshop)! See you there!
Jan 14, 2024 I submitted my PhD thesis!

Selected Publications

  1. ArXiv
    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
    Shivalika Singh, Freddie Vargus, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura OMahony, Mike Zhang, and 22 more authors
    In arXiv, 2024
  2. EACL 2024
    NNOSE: Nearest Neighbor Occupational Skill Extraction
    Mike Zhang, Rob van der Goot, Min-Yen Kan, and Barbara Plank
    In The 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
  3. ACL 2023
    ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain
    Mike Zhang, Rob van der Goot, and Barbara Plank
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023
  4. NAACL 2022
    SkillSpan: Hard and Soft Skill Extraction from English Job Postings
    Mike Zhang*, Kristian Jensen*, Sif Sonniks, and Barbara Plank
    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022