The project Inducing Syntactic Structure, funded by the DFG, explores novel approaches to inducing syntactic structures from language models. It is led by Laura Kallmeyer at Heinrich Heine Universität Düsseldorf and is conducted in collaboration with Prof. Hassan Sajjad, Director of the HyperMatrix Research Lab at Dalhousie University, Halifax.

Funding phase: 10/2024 – 09/2027.

Project description

This project starts from two observations: (i) across syntactic theories, treebank formats, and languages, a large variety of syntactic structures has been proposed; and (ii) self-supervised contextual language models (LMs) have been shown to capture syntactic information to a certain extent, though it is not clear how these models generalize. In this project, we want to remain neutral with respect to the underlying theory and to induce syntactic constituency structure from LMs in an unsupervised way. We will experiment with different types of neural network architectures that make different assumptions about the overall hierarchical structures that we extract. Our central research questions are:

  • How can we automatically learn syntactic structure from processing raw text?
  • How do the emerging structures relate to established notions of constituency in linguistic theory?
  • How useful are the emerging structures for NLP applications?