Publications

Peer-reviewed publications. (* = equal contribution)

2024

  1. ArXiv
    Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
    Shivalika Singh, Freddie Vargus, Börje F. Karlsson, Abinaya Mahendiran, Wei-Yin Ko, Herumb Shandilya, Jay Patel, Deividas Mataciunas, Laura O'Mahony, Mike Zhang, and 22 more authors
    In arXiv, 2024
  2. EACL 2024
    NNOSE: Nearest Neighbor Occupational Skill Extraction
    Mike Zhang, Rob van der Goot, Min-Yen Kan, and Barbara Plank
    In The 18th Conference of the European Chapter of the Association for Computational Linguistics, 2024
  3. EACL 2024
    Entity Linking in the Job Market Domain
    Mike Zhang, Rob van der Goot, and Barbara Plank
    In Findings of the Association for Computational Linguistics: EACL 2024, 2024
  4. NLP4HR 2024
    Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings
    Elena Senger, Mike Zhang, Rob van der Goot, and Barbara Plank
    In The 1st Workshop on Natural Language Processing for Human Resources (EACL Workshop), 2024
  5. NLP4HR 2024
    JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching
    Antoine Magron, Anna Dai, Mike Zhang, Syrielle Montariol, and Antoine Bosselut
    In The 1st Workshop on Natural Language Processing for Human Resources (EACL Workshop), 2024
  6. NLP4HR 2024
    Rethinking Skill Extraction in the Job Market Domain using Large Language Models
    Khanh Cao Nguyen, Mike Zhang, Syrielle Montariol, and Antoine Bosselut
    In The 1st Workshop on Natural Language Processing for Human Resources (EACL Workshop), 2024

2023

  1. ACL 2023
    ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain
    Mike Zhang, Rob van der Goot, and Barbara Plank
    In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023

2022

  1. EMNLP 2022
    Evidence > Intuition: Transferability Estimation for Encoder Selection
    Elisa Bassignana*, Max Müller-Eberstein*, Mike Zhang*, and Barbara Plank
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
  2. EMNLP 2022
    Experimental Standards for Deep Learning in Natural Language Processing Research
    Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, and Barbara Plank
    In Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
  3. RecSys HR’22
    Skill Extraction from Job Postings using Weak Supervision
    Mike Zhang, Kristian Nørgaard Jensen, Rob van der Goot, and Barbara Plank
    In RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems, September 18–23, 2022, Seattle, USA, 2022
  4. NAACL 2022
    SkillSpan: Hard and Soft Skill Extraction from English Job Postings
    Mike Zhang*, Kristian Jensen*, Sif Sonniks, and Barbara Plank
    In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
  5. LREC 2022
    Kompetencer: Fine-grained Skill Classification in Danish Job Postings via Distant Supervision and Transfer Learning
    Mike Zhang*, Kristian Nørgaard Jensen*, and Barbara Plank
    In Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
  6. SMILES 2022
    Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective
    Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Christian Hardmeier, and Barbara Plank
    arXiv preprint arXiv:2204.06251, 2022

2021

  1. EMNLP 2021
    Cartography Active Learning
    Mike Zhang and Barbara Plank
    In Findings of the Association for Computational Linguistics: EMNLP 2021, 2021
  2. NoDaLiDa 2021
    De-identification of Privacy-related Entities in Job Postings
    Kristian Nørgaard Jensen*, Mike Zhang*, and Barbara Plank
    In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), 2021

2019

  1. WMT 2019
    The Effect of Translationese in Machine Translation Test Sets
    Mike Zhang and Antonio Toral
    In Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers), 2019
  2. SemEval 2019
    Grunn2019 at SemEval-2019 Task 5: Shared Task on Multilingual Detection of Hate
    Mike Zhang, Roy David, Leon Graumans, and Gerben Timmerman
    In Proceedings of the 13th International Workshop on Semantic Evaluation, 2019