---
title: Hate Speech Dataset Catalogue
---
This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful, for example, for training natural language processing systems to detect such language.
The list is maintained by [Leon Derczynski](https://www.derczynski.com/), [Bertie Vidgen](https://www.turing.ac.uk/people/researchers/bertie-vidgen), [Hannah Rose Kirk](https://www.hannahrosekirk.com/), Pica Johansson, [Yi-Ling Chung](https://yilingchung.github.io/), Mads Guldborg Kjeldgaard Kongsbak, [Laila Sprejer](https://www.turing.ac.uk/people/researchers/laila-sprejer), and Philine Zeinert.
We provide a list of [datasets](#Datasets-header) and [keywords](#Keywords-header). If you would like to contribute to our catalogue or add your dataset, please see the [instructions for contributing](#Contributing-header).
If you use these resources, please cite (and read!) our paper: [Directions in Abusive Language Training Data: Garbage In, Garbage Out](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243300). And if you would like to find other resources for researching online hate, visit The Alan Turing Institute's [Online Hate Research Hub](https://www.turing.ac.uk/research/research-programmes/public-policy/online-hate-research-hub) or read The Alan Turing Institute's [Reading List on Online Hate and Abuse Research](https://docs.google.com/document/d/1WVkVGp29Jt6d-4fBnZ5OWVYuFn_03rzz-KBqPsu6gTM/edit?usp=sharing).
If you're looking for a good paper on online hate training datasets (beyond our paper, of course!) then have a look at ['Resources and benchmark corpora for hate speech detection: a systematic review'](https://link.springer.com/article/10.1007/s10579-020-09502-8) by Poletto et al. in *Language Resources and Evaluation*.
We prefer all corpora to be accompanied by [data statements](https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00041).
<a href="#Datasets-header" className="w-fit mx-auto no-underline rounded-md py-3 px-6 outline-offset-2 transition !active:transition-none bg-zinc-800 !font-semibold !text-zinc-100 hover:bg-zinc-700 active:bg-zinc-800 active:text-zinc-100/70 dark:bg-zinc-700 dark:hover:bg-zinc-600 !dark:active:bg-zinc-700 dark:active:text-zinc-100/70">See datasets</a>
<h2 id="Contributing-header">How to contribute</h2>
We accept entries to our catalogue via pull requests to the `content` folder. A dataset must be available for download to be included in the list. If you want to add an entry, follow these steps!
Please submit just one dataset addition or edit at a time: add it, then save. This will make everyone's life easier (including yours!).
### Create file
Go to the repository page, click the "Add file" dropdown, and then click "Create new file".
![](https://i.imgur.com/2PR0ZgL.png)
### Choose location
On the next page, type `content/datasets/<name-of-the-file>.md` as the file name if you want to add an entry to the datasets catalogue, or `content/keywords/<name-of-the-file>.md` if you want to add an entry to the lists of abusive keywords. If you just want to add a static page, you can leave it in the root of `content`, and it will automatically be assigned a URL. For example, `/content/about.md` becomes the `/about` page.
![](https://i.imgur.com/rr3uSYu.png)
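The naming rules in this step can be summarised in a short sketch. This may help if you script file names or work from a local clone. `entry_path` and `static_page_route` are hypothetical helpers and are not part of the catalogue's code. The folder names and the root-of-`content` route mapping (`/content/about.md` becomes `/about`) come from this guide.

```python
from pathlib import PurePosixPath

# Hypothetical helpers: folder names are taken from this guide,
# function names are illustrative only.
ENTRY_FOLDERS = {
    "dataset": "content/datasets",
    "keywords": "content/keywords",
    "page": "content",  # static pages live in the root of `content`
}


def entry_path(kind: str, name: str) -> str:
    """Return the file path to type on the new-file page for an entry."""
    if kind not in ENTRY_FOLDERS:
        raise ValueError(f"unknown entry kind: {kind!r}")
    # Strip a trailing ".md" so the extension is not doubled.
    stem = name[:-3] if name.endswith(".md") else name
    return f"{ENTRY_FOLDERS[kind]}/{stem}.md"


def static_page_route(path: str) -> str:
    """Map a static page in the root of `content` to its URL path."""
    p = PurePosixPath(path.lstrip("/"))
    # Only files directly inside `content` are static pages.
    if p.parent != PurePosixPath("content") or p.suffix != ".md":
        raise ValueError(f"not a static page in the root of content: {path!r}")
    return "/" + p.stem
```

For example, `entry_path("dataset", "my-corpus")` gives `content/datasets/my-corpus.md`. The sketch deliberately does not derive URLs for dataset or keyword entries, because this guide does not say how those are routed.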
### Fill in content
Copy the contents of `templates/dataset.md` or `templates/keywords.md` (for a dataset or keywords entry, respectively) into the editor field below. Then fill out the fields using the correct data format.
![](https://i.imgur.com/x6JIjhz.png)
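If you prefer to work in a local clone rather than the web editor, this step amounts to copying the right template into place. The sketch below assumes the repository layout described in this guide. `new_entry_from_template` is a hypothetical helper, and it does not know which fields the templates contain. It refuses to overwrite an existing file, which keeps each change to a single new entry.

```python
import shutil
from pathlib import Path

# Hypothetical helper for local clones; the template and target folders
# are the ones named in this guide.
TEMPLATES = {
    "dataset": ("templates/dataset.md", "content/datasets"),
    "keywords": ("templates/keywords.md", "content/keywords"),
}


def new_entry_from_template(repo_root: str, kind: str, name: str) -> Path:
    """Copy the matching template to a new entry file and return its path."""
    template_rel, folder_rel = TEMPLATES[kind]
    root = Path(repo_root)
    target = root / folder_rel / f"{name}.md"
    # Never overwrite an existing entry.
    if target.exists():
        raise FileExistsError(f"entry already exists: {target}")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(root / template_rel, target)
    return target  # open this file and fill in the template's fields
```

After running it, open the returned file, fill in the fields, and commit it on a branch as you would any other change.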
### Commit changes
Click "Commit changes". In the popup, briefly describe the proposed change, and then click "Propose changes".
<img src='https://i.imgur.com/BxuxKEJ.png' style={{ maxWidth: '50%', margin: '0 auto' }}/>
### Submit PR
Submit the pull request on the next page when prompted.