[examples/turing] - rename it to turing

This commit is contained in:
Luccas Mateus de Medeiros Gomes
2023-05-11 16:13:09 -03:00
parent 82773b5e8a
commit 7822440f0d
43 changed files with 0 additions and 0 deletions

View File

@@ -0,0 +1,5 @@
---
title: About
---
This is an about page, left here as an example

View File

@@ -0,0 +1,14 @@
---
title: AbuseEval v1.0
link-to-publication: http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.760.pdf
link-to-data: https://github.com/tommasoc80/AbuseEval
task-description: Explicitness annotation of offensive and abusive content
details-of-task: "Enriched versions of the OffensEval/OLID dataset with the distinction of explicit/implicit offensive messages and the new dimension for abusive messages. Labels for offensive language: EXPLICIT, IMPLICT, NOT; Labels for abusive language: EXPLICIT, IMPLICT, NOTABU"
size-of-dataset: 14100
percentage-abusive: 20.75
language: English
level-of-annotation: ["Tweets"]
platform: ["Twitter"]
medium: ["Text"]
reference: "Caselli, T., Basile, V., Jelena, M., Inga, K., and Michael, G. 2020. \"I feel offended, dont be abusive! implicit/explicit messages in offensive and abusive language\". The 12th Language Resources and Evaluation Conference (pp. 6193-6202). European Language Resources Association."
---

View File

@@ -0,0 +1,16 @@
---
title: "Abusive Language Detection on Arabic Social Media (Al Jazeera)"
link-to-publication: https://www.aclweb.org/anthology/W17-3008
link-to-data: http://alt.qcri.org/~hmubarak/offensive/AJCommentsClassification-CF.xlsx
task-description: Ternary (Obscene, Offensive but not obscene, Clean)
details-of-task: Incivility
size-of-dataset: 32000
percentage-abusive: 0.81
language: Arabic
level-of-annotation: ["Posts"]
platform: ["AlJazeera"]
medium: ["Text"]
reference: "Mubarak, H., Darwish, K. and Magdy, W., 2017. Abusive Language Detection on Arabic Social Media. In: Proceedings of the First Workshop on Abusive Language Online. Vancouver, Canada: Association for Computational Linguistics, pp.52-56."
---
SOMETHING TEST

View File

@@ -0,0 +1,14 @@
---
title: "CoRAL: a Context-aware Croatian Abusive Language Dataset"
link-to-publication: https://aclanthology.org/2022.findings-aacl.21/
link-to-data: https://github.com/shekharRavi/CoRAL-dataset-Findings-of-the-ACL-AACL-IJCNLP-2022
task-description: Multi-class based on context dependency categories (CDC)
details-of-task: Detectioning CDC from abusive comments
size-of-dataset: 2240
percentage-abusive: 100
language: "Croatian"
level-of-annotation: ["Posts"]
platform: ["Posts"]
medium: ["Newspaper Comments"]
reference: "Ravi Shekhar, Mladen Karan and Matthew Purver (2022). CoRAL: a Context-aware Croatian Abusive Language Dataset. Findings of the ACL: AACL-IJCNLP."
---

View File

@@ -0,0 +1,14 @@
---
title: Detecting Abusive Albanian
link-to-publication: https://arxiv.org/abs/2107.13592
link-to-data: https://doi.org/10.6084/m9.figshare.19333298.v1
task-description: Hierarchical (offensive/not; untargeted/targeted; person/group/other)
details-of-task: Detect and categorise abusive language in social media data
size-of-dataset: 11874
percentage-abusive: 13.2
language: Albanian
level-of-annotation: ["Posts"]
platform: ["Instagram", "Youtube"]
medium: ["Text"]
reference: "Nurce, E., Keci, J., Derczynski, L., 2021. Detecting Abusive Albanian. arXiv:2107.13592"
---

View File

@@ -0,0 +1,15 @@
---
title: "Hate Speech Detection in the Bengali language: A Dataset and its Baseline Evaluation"
link-to-publication: https://arxiv.org/pdf/2012.09686.pdf
link-to-data: https://www.kaggle.com/naurosromim/bengali-hate-speech-dataset
task-description: Binary (hateful, not)
details-of-task: "Several categories: sports, entertainment, crime, religion, politics, celebrity and meme"
size-of-dataset: 30000
percentage-abusive: 0.33
language: Bengali
level-of-annotation: ["Posts"]
platform: ["Youtube", "Facebook"]
medium: ["Text"]
reference: "Romim, N., Ahmed, M., Talukder, H., & Islam, M. S. (2021). Hate speech detection in the bengali language: A dataset and its baseline evaluation. In Proceedings of International Joint Conference on Advances in Computational Intelligence (pp. 457-468). Springer, Singapore."
---

View File

@@ -0,0 +1,14 @@
---
title: Large-Scale Hate Speech Detection with Cross-Domain Transfer
link-to-publication: https://aclanthology.org/2022.lrec-1.238/
link-to-data: https://github.com/avaapm/hatespeech
task-description: Three-class (Hate speech, Offensive language, None)
details-of-task: Hate speech detection on social media (Twitter) including 5 target groups (gender, race, religion, politics, sports)
size-of-dataset: "100k English (27593 hate, 30747 offensive, 41660 none)"
percentage-abusive: 58.3
language: English
level-of-annotation: ["Posts"]
platform: ["Twitter"]
medium: ["Text", "Image"]
reference: "Cagri Toraman, Furkan Şahinuç, Eyup Yilmaz. 2022. Large-Scale Hate Speech Detection with Cross-Domain Transfer. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 22152225, Marseille, France. European Language Resources Association."
---

View File

@@ -0,0 +1,14 @@
---
title: "Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language"
link-to-publication: https://arxiv.org/abs/2103.10195
link-to-data: https://drive.google.com/file/d/1mM2vnjsy7QfUmdVUpKqHRJjZyQobhTrW/view
task-description: Binary (misogyny/none) and Multi-class (none, discredit, derailing, dominance, stereotyping & objectification, threat of violence, sexual harassment, damning)
details-of-task: Introducing an Arabic Levantine Twitter dataset for Misogynistic language
size-of-dataset: 6603
percentage-abusive: 48.76
language: Arabic
level-of-annotation: ["Posts"]
platform: ["Twitter"]
medium: ["Text", "Images"]
reference: "Hala Mulki and Bilal Ghanem. 2021. Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, pages 154163, Kyiv, Ukraine (Virtual). Association for Computational Linguistics"
---

View File

@@ -0,0 +1,14 @@
---
title: Measuring Hate Speech
link-to-publication: https://arxiv.org/abs/2009.10277
link-to-data: https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech
task-description: 10 ordinal labels (sentiment, (dis)respect, insult, humiliation, inferior status, violence, dehumanization, genocide, attack/defense, hate speech), which are debiased and aggregated into a continuous hate speech severity score (hate_speech_score) that includes a region for counterspeech & supportive speeech. Includes 8 target identity groups (race/ethnicity, religion, national origin/citizenship, gender, sexual orientation, age, disability, political ideology) and 42 identity subgroups.
details-of-task: Hate speech measurement on social media in English
size-of-dataset: "39,565 comments annotated by 7,912 annotators on 10 ordinal labels, for 1,355,560 total labels."
percentage-abusive: 25
language: English
level-of-annotation: ["Social media comment"]
platform: ["Twitter", "Reddit", "Youtube"]
medium: ["Text"]
reference: "Kennedy, C. J., Bacon, G., Sahn, A., & von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277."
---

View File

@@ -0,0 +1,14 @@
---
title: Offensive Language and Hate Speech Detection for Danish
link-to-publication: http://www.derczynski.com/papers/danish_hsd.pdf
link-to-data: https://figshare.com/articles/Danish_Hate_Speech_Abusive_Language_data/12220805
task-description: "Branching structure of tasks: Binary (Offensive, Not), Within Offensive (Target, Not), Within Target (Individual, Group, Other)"
details-of-task: Group-directed + Person-directed
size-of-dataset: 3600
percentage-abusive: 0.12
language: Danish
level-of-annotation: ["Posts"]
platform: ["Twitter", "Reddit", "Newspaper comments"]
medium: ["Text"]
reference: "Sigurbergsson, G. and Derczynski, L., 2019. Offensive Language and Hate Speech Detection for Danish. ArXiv."
---

View File

@@ -0,0 +1,52 @@
---
title: Hate Speech Dataset Catalogue
---
This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful for e.g. training a natural language processing system to detect this language.
The list is maintained by [Leon Derczynski](https://www.derczynski.com/), [Bertie Vidgen](https://www.turing.ac.uk/people/researchers/bertie-vidgen), [Hannah Rose Kirk](https://www.hannahrosekirk.com/), Pica Johansson, [Yi-Ling Chung](https://yilingchung.github.io/), Mads Guldborg Kjeldgaard Kongsbak, [Laila Sprejer](https://www.turing.ac.uk/people/researchers/laila-sprejer), and Philine Zeinert.
We provide a list of [datasets](#Datasets-header) and [keywords](#Keywords-header). If you would like to contribute to our catalogue or add your dataset, please see the [instructions for contributing](#Contributing-header).
If you use these resources, please cite (and read!) our paper: [Directions in Abusive Language Training Data: Garbage In, Garbage Out](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243300). And if you would like to find other resources for researching online hate, visit The Alan Turing Institute's [Online Hate Research Hub](https://www.turing.ac.uk/research/research-programmes/public-policy/online-hate-research-hub) or read The Alan Turing Institute's [Reading List on Online Hate and Abuse Research](https://docs.google.com/document/d/1WVkVGp29Jt6d-4fBnZ5OWVYuFn_03rzz-KBqPsu6gTM/edit?usp=sharing).
If you're looking for a good paper on online hate training datasets (beyond our paper, of course!) then have a look at ['Resources and benchmark corpora for hate speech detection: a systematic review'](https://link.springer.com/article/10.1007/s10579-020-09502-8) by Poletto et al. in *Language Resources and Evaluation*.
Accompanying [data statements](https://www.mitpressjournals.org/doi/abs/10.1162/tacl_a_00041) preferred for all corpora.
<a href="#Datasets-header" className="w-fit mx-auto no-underline rounded-md py-3 px-6 outline-offset-2 transition !active:transition-none bg-zinc-800 !font-semibold !text-zinc-100 hover:bg-zinc-700 active:bg-zinc-800 active:text-zinc-100/70 dark:bg-zinc-700 dark:hover:bg-zinc-600 !dark:active:bg-zinc-700 dark:active:text-zinc-100/70">See datasets</a>
<h2 id="Contributing-header">How to contribute</h2>
We accept entries to our catalogue based on pull requests to the content folder. The dataset must be avaliable for download to be included in the list. If you want to add an entry, follow these steps!
Please send just one dataset addition/edit at a time - edit it in, then save. This will make everyones life easier (including yours!)
### Create file
Go to the repo url file and click the "Add file" dropdown and then click on "Create new file".
![](https://i.imgur.com/2PR0ZgL.png)
### Choose location
In the following page type `content/datasets/<name-of-the-file>.md`. if you want to add an entry to the datasets catalog or `content/keywords/<name-of-the-file>.md` if you want to add an entry to the lists of abusive keywords, if you want to just add an static page you can leave in the root of `content` it will automatically get assigned an url eg: `/content/about.md` becomes the `/about` page
![](https://i.imgur.com/rr3uSYu.png)
### Fill in content
Copy the contents of `templates/dataset.md` or `templates/keywords.md` respectively to the camp below, filling out the fields with the correct data format
![](https://i.imgur.com/x6JIjhz.png)
### Commit changes
Click on "Commit changes", on the popup make sure you give some brief detail on the proposed change. and then click on Propose changes
<img src='https://i.imgur.com/BxuxKEJ.png' style={{ maxWidth: '50%', margin: '0 auto' }}/>
### Submit PR
Submit the pull request on the next page when prompted.

View File

@@ -0,0 +1,10 @@
---
title: Hurtlex
description: HurtLex is a lexicon of offensive, aggressive, and hateful words in over 50 languages. The words are divided into 17 categories, plus a macro-category indicating whether there is stereotype involved.
data-link: https://github.com/valeriobasile/hurtlex
reference: http://ceur-ws.org/Vol-2253/paper49.pdf, Proc. CLiC-it 2018
---
## Markdown TEST
Some text

View File

@@ -0,0 +1,5 @@
---
title: SexHateLex is a Chinese lexicon of hateful and sexist words.
data-link: https://doi.org/10.5281/zenodo.4773875
reference: http://ceur-ws.org/Vol-2253/paper49.pdf, Journal of OSNEM, Vol.27, 2022, 100182, ISSN 2468-6964.
---