Luccas Mateus de Medeiros Gomes 7c845fe0e3 [alan-turing][m] - individual pages
2023-05-01 21:06:58 -03:00

1.2 KiB
Raw Blame History

title
title
Hate Speech Dataset Catalogue

This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful for e.g. training a natural language processing system to detect this language.

The list is maintained by Leon Derczynski, Bertie Vidgen, Hannah Rose Kirk, Pica Johansson, Yi-Ling Chung, Mads Guldborg Kjeldgaard Kongsbak, Laila Sprejer, and Philine Zeinert.

We provide a list of datasets and keywords. If you would like to contribute to our catalogue or add your dataset, please see the instructions for contributing.

If you use these resources, please cite (and read!) our paper: Directions in Abusive Language Training Data: Garbage In, Garbage Out. And if you would like to find other resources for researching online hate, visit The Alan Turing Institutes Online Hate Research Hub or read The Alan Turing Institutes Reading List on Online Hate and Abuse Research.

If youre looking for a good paper on online hate training datasets (beyond our paper, of course!) then have a look at Resources and benchmark corpora for hate speech detection: a systematic review by Poletto et al. in Language Resources and Evaluation.