
Translating between text levels: unsupervised translation for text simplification

Description

In automatic text simplification the aim is to translate between sentences of different difficulty levels. Common neural machine translation methods rely on large parallel corpora for training (Stajner et al 2017), which limits the generalizability of these methods to other languages, use cases and domains. This project instead aims to explore the task of unsupervised translation for simplification, where only monolingual corpora of different difficulty levels are given. Recent attempts at unsupervised sentence simplification exist (Surya et al 2019; Zhao et al 2020) and can serve as a solid basis for this project. Possible directions for this project include: [i] learning jointly on multiple objectives, as text simplification is related to several tasks such as sentence paraphrasing and summarization, [ii] unsupervised multilingual simplification for languages such as Spanish or Italian (Aprosio et al 2019; Martin et al 2020), [iii] cross-domain performance when simplifying sentences in a different domain than the one trained on (e.g. Wikipedia vs. Newsela).
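One common ingredient of unsupervised translation that carries over to this setting is iterative back-translation: each direction of the model generates synthetic parallel pairs from its monolingual corpus to train the reverse direction. The sketch below illustrates only that data flow, under loud assumptions: the corpora are tiny stand-ins, and the "models" are trivial lookup tables rather than the seq2seq networks a real system (e.g. Surya et al 2019) would use.

```python
# Minimal sketch of iterative back-translation for unsupervised
# simplification. The corpora and "models" are hypothetical
# placeholders; only the training loop's data flow is the point.

complex_corpus = ["the committee adjourned the hearing indefinitely"]
simple_corpus = ["the meeting was stopped"]

def train(pairs):
    """Placeholder supervised step: memorise (source, target) pairs."""
    return dict(pairs)

def translate(model, sentence):
    """Placeholder inference: look up the sentence, else copy it."""
    return model.get(sentence, sentence)

simplifier, complexifier = {}, {}
for _ in range(3):  # a few rounds of iterative back-translation
    # 1. Back-translate the monolingual complex corpus with the current
    #    complex->simple model to create synthetic (simple, complex)
    #    pairs, and train the reverse (simple->complex) model on them.
    synthetic = [(translate(simplifier, c), c) for c in complex_corpus]
    complexifier = train(synthetic)
    # 2. Symmetrically, back-translate the simple corpus to retrain
    #    the complex->simple simplifier.
    synthetic = [(translate(complexifier, s), s) for s in simple_corpus]
    simplifier = train(synthetic)
```

With real neural models the two directions improve each other across rounds; with these lookup-table placeholders the loop merely demonstrates how each monolingual corpus supplies training data for the opposite direction.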

Resources

References

Linking language comprehension and production embeddings in vector space

Description

Integrated teaching and learning platforms are becoming increasingly sophisticated. Recent work has employed neural models to create vectors representing a learner's skill set and the learning tasks available to them. When these embeddings occupy the same vector space, they can be used to recommend tasks appropriate to the learner. Previous work by Moore et al (2019) has modelled latent user proficiencies and tasks as skill embeddings in the STEM domain. Building on work by Chen & Meurers (2019), the aim of this project is to apply a similar modelling approach to the domain of language learning. In particular, this project would aim to model both the learner's written language proficiency and their reading proficiency. It is anticipated that the vectors representing the two proficiencies will not occupy the same space; part of this project will therefore involve learning a mapping between the spaces, so that reading competence may be predicted from writing competence and vice versa.
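One simple baseline for mapping between two embedding spaces is a linear transformation fitted by least squares (a Procrustes-style approach familiar from cross-lingual embedding alignment). The sketch below is an assumption-laden illustration, not the method of the cited work: the writing and reading vectors are synthetic, and the learners, dimensions, and noise level are made up for the example.

```python
import numpy as np

# Hypothetical setup: paired writing- and reading-proficiency vectors
# for the same learners, living in separate embedding spaces. We fit a
# linear map W by least squares so reading competence can be predicted
# from writing competence (a baseline sketch, not the cited method).

rng = np.random.default_rng(0)
n_learners, dim = 100, 8

# Synthetic writing-skill embeddings, one row per learner.
writing = rng.normal(size=(n_learners, dim))
# Synthetic reading embeddings: a linear image of writing, plus noise.
true_map = rng.normal(size=(dim, dim))
reading = writing @ true_map + 0.01 * rng.normal(size=(n_learners, dim))

# Least-squares fit: W minimises || writing @ W - reading ||.
W, *_ = np.linalg.lstsq(writing, reading, rcond=None)

# Predict reading competence for an unseen learner's writing vector.
new_writing = rng.normal(size=(1, dim))
predicted_reading = new_writing @ W
```

Fitting the reverse map from reading to writing is symmetric; more expressive (e.g. non-linear) mappings could be explored if a linear map proves too restrictive.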

Resources

References