Laurie Burchell

🦄

About

I’m a PhD student in Natural Language Processing (NLP) at the University of Edinburgh, supervised by Kenneth Heafield and Lexi Birch. Right now I’m mostly working on improving language identification systems, but I’m also interested in low-resource machine translation and multilinguality in general.

I’m happy to be part of the Centre for Doctoral Training in NLP, the Institute for Language, Cognition and Computation, and the OSCAR project.

The best way to contact me is by email: laurie[dot]burchell@ed.ac.uk

Resources

OpenLID

A model for fast natural language identification for 200+ languages, plus all the data used for training.

Multi-sentence questions

A dataset of over 162,000 multi-sentence questions (MSQs), which are sequences of questions intended to be answered as a unit.

Selected publications

An Open Dataset and Model for Language Identification

Laurie Burchell, Alexandra Birch, Nikolay Bogoychev, Kenneth Heafield (2023)

TLDR: We curate a high-quality dataset for training language identification models which includes 201 language varieties. The fastText model we train on this data outperforms previous high-coverage language identification models.

Exploring diversity in back translation for low-resource machine translation

Laurie Burchell, Alexandra Birch, Kenneth Heafield (2022)

TLDR: We move towards a better definition of what “diversity” means in the context of training data for low-resource machine translation, and we use novel metrics to investigate the effect of different kinds of diversity on downstream performance.

Querent Intent in Multi-Sentence Questions

Laurie Burchell, Jie Chi, Tom Hosking, Nina Markl, Bonnie Webber (2020)

TLDR: Multi-sentence questions (MSQs) are sequences of questions which need to be answered as a unit. We identify five types of MSQs based on speaker intent and answering strategy, and we create a new dataset of MSQs to enable future research.