Compare commits

..

10 Commits

Author SHA1 Message Date
Luccas Mateus de Medeiros Gomes
5a29959d14 [alan-turing][m] - small tweaks 2023-05-02 12:53:44 -03:00
Luccas Mateus de Medeiros Gomes
ed3a26cd6d [alan-turing][sm] - fix build 2023-05-01 21:40:46 -03:00
Luccas Mateus
026059184a [alan-turing][m] - individual pages (#828) 2023-05-01 21:06:52 -03:00
João Demenech
a041d69282 merge: tutorial VI
**Issue:** https://github.com/datopian/portaljs/issues/821

## Changes

- Added `npm run export` command to `learn-example`
- Added "Deploying your PortalJS app" section to `/docs`
  - Deploy to Vercel
  - One-Click Deploy (to Vercel)
  - Deploy to  Cloudflare
2023-05-01 19:09:18 -03:00
deme
14abd5b768 [tutorial][xs]: fix typo 2023-05-01 15:02:59 -03:00
deme
4aaabba229 [#821,tutorial][m]: add export npm command to example-learn, add tutorial VI to /docs 2023-05-01 14:51:02 -03:00
Luccas Mateus de Medeiros Gomes
cc43597130 [learn-example][sm] - fix dark mode styling 2023-05-01 11:16:52 -03:00
João Demenech
d9a6ea4ef1 [docs][xs]: fix install command after rename 2023-05-01 08:30:17 -03:00
Luccas Mateus
f6b94ee254 Alan turing portal (#815)
* [alan-turing-portal][m] - initial commit

* [alan-turing][m] - first page with search

* [alan-turing][m] - cleanup
2023-04-29 23:37:30 -03:00
Luccas Mateus de Medeiros Gomes
04b05c0896 [learn-example][sm] - use getstaticprops 2023-04-29 13:54:51 -03:00
30 changed files with 6179 additions and 417 deletions

View File

@@ -1,6 +1,18 @@
## Intro
This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful for e.g. training a natural language processing system to detect this language.
Its built on top of [PortalJS](https://portaljs.org/), it allows you to publish datasets, lists of offensive keywords and static pages, all of those are stored as markdown files inside the `content` folder.
- .md files inside `content/datasets/` will appear on the dataset list section of the homepage and be searchable as well as having a individual page in `datasets/<file name>`
- .md files inside `content/keywords/` will appear on the list of offensive keywords section of the homepage as well as having a individual page in `keywords/<file name>`
- .md files inside `content/` will be converted to static pages in the url `/<file name>` eg: `content/about.md` becomes `/about`
This is also a Next.JS project so you can use the following steps to run the website locally.
## Getting started
To get started with this template, first install the npm dependencies:
To get started first install the npm dependencies:
```bash
npm install
@@ -13,7 +25,3 @@ npm run dev
```
Finally, open [http://localhost:3000](http://localhost:3000) in your browser to view the website.
## License
This site template is a commercial product and is licensed under the [Tailwind UI license](https://tailwindui.com/license).

View File

@@ -21,7 +21,7 @@ export function Footer() {
<Container.Inner>
<div className="flex flex-col items-center justify-between gap-6 sm:flex-row">
<p className="text-sm font-medium text-zinc-800 dark:text-zinc-200">
hatespeechdata maintained by <a href='https://github.com/leondz'>leondz</a>
Built with <a href='https://portaljs.org'>PortalJS 🌀</a>
</p>
<p className="text-sm text-zinc-400 dark:text-zinc-500">
&copy; {new Date().getFullYear()} Leon Derczynski. All rights

View File

@@ -0,0 +1,5 @@
---
title: About
---
This is an about page, left here as an example

View File

@@ -0,0 +1,14 @@
---
title: AbuseEval v1.0
link-to-publication: http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.760.pdf
link-to-data: https://github.com/tommasoc80/AbuseEval
task-description: Explicitness annotation of offensive and abusive content
details-of-task: "Enriched versions of the OffensEval/OLID dataset with the distinction of explicit/implicit offensive messages and the new dimension for abusive messages. Labels for offensive language: EXPLICIT, IMPLICT, NOT; Labels for abusive language: EXPLICIT, IMPLICT, NOTABU"
size-of-dataset: 14100
percentage-abusive: 20.75
language: English
level-of-annotation: ["Tweets"]
platform: ["Twitter"]
medium: ["Text"]
reference: "Caselli, T., Basile, V., Jelena, M., Inga, K., and Michael, G. 2020. \"I feel offended, dont be abusive! implicit/explicit messages in offensive and abusive language\". The 12th Language Resources and Evaluation Conference (pp. 6193-6202). European Language Resources Association."
---

View File

@@ -12,3 +12,5 @@ platform: ["AlJazeera"]
medium: ["Text"]
reference: "Mubarak, H., Darwish, K. and Magdy, W., 2017. Abusive Language Detection on Arabic Social Media. In: Proceedings of the First Workshop on Abusive Language Online. Vancouver, Canada: Association for Computational Linguistics, pp.52-56."
---
SOMETHING TEST

View File

@@ -0,0 +1,14 @@
---
title: "CoRAL: a Context-aware Croatian Abusive Language Dataset"
link-to-publication: https://aclanthology.org/2022.findings-aacl.21/
link-to-data: https://github.com/shekharRavi/CoRAL-dataset-Findings-of-the-ACL-AACL-IJCNLP-2022
task-description: Multi-class based on context dependency categories (CDC)
details-of-task: Detectioning CDC from abusive comments
size-of-dataset: 2240
percentage-abusive: 100
language: "Croatian"
level-of-annotation: ["Posts"]
platform: ["Posts"]
medium: ["Newspaper Comments"]
reference: "Ravi Shekhar, Mladen Karan and Matthew Purver (2022). CoRAL: a Context-aware Croatian Abusive Language Dataset. Findings of the ACL: AACL-IJCNLP."
---

View File

@@ -12,3 +12,4 @@ platform: ["Youtube", "Facebook"]
medium: ["Text"]
reference: "Romim, N., Ahmed, M., Talukder, H., & Islam, M. S. (2021). Hate speech detection in the bengali language: A dataset and its baseline evaluation. In Proceedings of International Joint Conference on Advances in Computational Intelligence (pp. 457-468). Springer, Singapore."
---

View File

@@ -0,0 +1,14 @@
---
title: Large-Scale Hate Speech Detection with Cross-Domain Transfer
link-to-publication: https://aclanthology.org/2022.lrec-1.238/
link-to-data: https://github.com/avaapm/hatespeech
task-description: Three-class (Hate speech, Offensive language, None)
details-of-task: Hate speech detection on social media (Twitter) including 5 target groups (gender, race, religion, politics, sports)
size-of-dataset: "100k English (27593 hate, 30747 offensive, 41660 none)"
percentage-abusive: 58.3
language: English
level-of-annotation: ["Posts"]
platform: ["Twitter"]
medium: ["Text", "Image"]
reference: "Cagri Toraman, Furkan Şahinuç, Eyup Yilmaz. 2022. Large-Scale Hate Speech Detection with Cross-Domain Transfer. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 22152225, Marseille, France. European Language Resources Association."
---

View File

@@ -0,0 +1,14 @@
---
title: Measuring Hate Speech
link-to-publication: https://arxiv.org/abs/2009.10277
link-to-data: https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech
task-description: 10 ordinal labels (sentiment, (dis)respect, insult, humiliation, inferior status, violence, dehumanization, genocide, attack/defense, hate speech), which are debiased and aggregated into a continuous hate speech severity score (hate_speech_score) that includes a region for counterspeech & supportive speeech. Includes 8 target identity groups (race/ethnicity, religion, national origin/citizenship, gender, sexual orientation, age, disability, political ideology) and 42 identity subgroups.
details-of-task: Hate speech measurement on social media in English
size-of-dataset: "39,565 comments annotated by 7,912 annotators on 10 ordinal labels, for 1,355,560 total labels."
percentage-abusive: 25
language: English
level-of-annotation: ["Social media comment"]
platform: ["Twitter", "Reddit", "Youtube"]
medium: ["Text"]
reference: "Kennedy, C. J., Bacon, G., Sahn, A., & von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277."
---

View File

@@ -0,0 +1,14 @@
---
title: Offensive Language and Hate Speech Detection for Danish
link-to-publication: http://www.derczynski.com/papers/danish_hsd.pdf
link-to-data: https://figshare.com/articles/Danish_Hate_Speech_Abusive_Language_data/12220805
task-description: "Branching structure of tasks: Binary (Offensive, Not), Within Offensive (Target, Not), Within Target (Individual, Group, Other)"
details-of-task: Group-directed + Person-directed
size-of-dataset: 3600
percentage-abusive: 0.12
language: Danish
level-of-annotation: ["Posts"]
platform: ["Twitter", "Reddit", "Newspaper comments"]
medium: ["Text"]
reference: "Sigurbergsson, G. and Derczynski, L., 2019. Offensive Language and Hate Speech Detection for Danish. ArXiv."
---

View File

@@ -1,3 +1,7 @@
---
title: Hate Speech Dataset Catalogue
---
This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful for e.g. training a natural language processing system to detect this language.
The list is maintained by Leon Derczynski, Bertie Vidgen, Hannah Rose Kirk, Pica Johansson, Yi-Ling Chung, Mads Guldborg Kjeldgaard Kongsbak, Laila Sprejer, and Philine Zeinert.
@@ -7,3 +11,24 @@ We provide a list of datasets and keywords. If you would like to contribute to o
If you use these resources, please cite (and read!) our paper: Directions in Abusive Language Training Data: Garbage In, Garbage Out. And if you would like to find other resources for researching online hate, visit The Alan Turing Institutes Online Hate Research Hub or read The Alan Turing Institutes Reading List on Online Hate and Abuse Research.
If youre looking for a good paper on online hate training datasets (beyond our paper, of course!) then have a look at Resources and benchmark corpora for hate speech detection: a systematic review by Poletto et al. in Language Resources and Evaluation.
## How to contribute
We accept entries to our catalogue based on pull requests to the content folder. The dataset must be avaliable for download to be included in the list. If you want to add an entry, follow these steps!
Please send just one dataset addition/edit at a time - edit it in, then save. This will make everyones life easier (including yours!)
- Go to the repo url file and click the "Add file" dropdown and then click on "Create new file".
![](https://i.imgur.com/2PR0ZgL.png)
- In the following page type `content/datasets/<name-of-the-file>.md`. if you want to add an entry to the datasets catalog or `content/keywords/<name-of-the-file>.md` if you want to add an entry to the lists of abusive keywords, if you want to just add an static page you can leave in the root of `content` it will automatically get assigned an url eg: `/content/about.md` becomes the `/about` page
![](https://i.imgur.com/rr3uSYu.png)
- Copy the contents of `templates/dataset.md` or `templates/keywords.md` respectively to the camp below, filling out the fields with the correct data format
![](https://i.imgur.com/x6JIjhz.png)
- Click on "Commit changes", on the popup make sure you give some brief detail on the proposed change. and then click on Propose changes
<img src='https://i.imgur.com/BxuxKEJ.png' style={{ maxWidth: '50%', margin: '0 auto' }}/>
- Submit the pull request on the next page when prompted.

View File

@@ -0,0 +1,10 @@
---
title: Hurtlex
description: HurtLex is a lexicon of offensive, aggressive, and hateful words in over 50 languages. The words are divided into 17 categories, plus a macro-category indicating whether there is stereotype involved.
data-link: https://github.com/valeriobasile/hurtlex
reference: http://ceur-ws.org/Vol-2253/paper49.pdf, Proc. CLiC-it 2018
---
## Markdown TEST
Some text

View File

@@ -0,0 +1,5 @@
---
title: SexHateLex is a Chinese lexicon of hateful and sexist words.
data-link: https://doi.org/10.5281/zenodo.4773875
reference: http://ceur-ws.org/Vol-2253/paper49.pdf, Journal of OSNEM, Vol.27, 2022, 100182, ISSN 2468-6964.
---

View File

@@ -0,0 +1,105 @@
import matter from "gray-matter";
import mdxmermaid from "mdx-mermaid";
import { h } from "hastscript";
import remarkCallouts from "@flowershow/remark-callouts";
import remarkEmbed from "@flowershow/remark-embed";
import remarkGfm from "remark-gfm";
import remarkMath from "remark-math";
import remarkSmartypants from "remark-smartypants";
import remarkToc from "remark-toc";
import remarkWikiLink from "@flowershow/remark-wiki-link";
import rehypeAutolinkHeadings from "rehype-autolink-headings";
import rehypeKatex from "rehype-katex";
import rehypeSlug from "rehype-slug";
import rehypePrismPlus from "rehype-prism-plus";
import { serialize } from "next-mdx-remote/serialize";
/**
* Parse a markdown or MDX file to an MDX source form + front matter data
*
* @source: the contents of a markdown or mdx file
* @format: used to indicate to next-mdx-remote which format to use (md or mdx)
* @returns: { mdxSource: mdxSource, frontMatter: ...}
*/
const parse = async function (source, format) {
const { content, data, excerpt } = matter(source, {
excerpt: (file, options) => {
// Generate an excerpt for the file
file.excerpt = file.content.split("\n\n")[0];
},
});
const mdxSource = await serialize(
{ value: content, path: format },
{
// Optionally pass remark/rehype plugins
mdxOptions: {
remarkPlugins: [
remarkEmbed,
remarkGfm,
[remarkSmartypants, { quotes: false, dashes: "oldschool" }],
remarkMath,
remarkCallouts,
remarkWikiLink,
[
remarkToc,
{
heading: "Table of contents",
tight: true,
},
],
[mdxmermaid, {}],
],
rehypePlugins: [
rehypeSlug,
[
rehypeAutolinkHeadings,
{
properties: { className: 'heading-link' },
test(element) {
return (
["h2", "h3", "h4", "h5", "h6"].includes(element.tagName) &&
element.properties?.id !== "table-of-contents" &&
element.properties?.className !== "blockquote-heading"
);
},
content() {
return [
h(
"svg",
{
xmlns: "http:www.w3.org/2000/svg",
fill: "#ab2b65",
viewBox: "0 0 20 20",
className: "w-5 h-5",
},
[
h("path", {
fillRule: "evenodd",
clipRule: "evenodd",
d: "M9.493 2.853a.75.75 0 00-1.486-.205L7.545 6H4.198a.75.75 0 000 1.5h3.14l-.69 5H3.302a.75.75 0 000 1.5h3.14l-.435 3.148a.75.75 0 001.486.205L7.955 14h2.986l-.434 3.148a.75.75 0 001.486.205L12.456 14h3.346a.75.75 0 000-1.5h-3.14l.69-5h3.346a.75.75 0 000-1.5h-3.14l.435-3.147a.75.75 0 00-1.486-.205L12.045 6H9.059l.434-3.147zM8.852 7.5l-.69 5h2.986l.69-5H8.852z",
}),
]
),
];
},
},
],
[rehypeKatex, { output: "mathml" }],
[rehypePrismPlus, { ignoreMissing: true }],
],
format,
},
scope: data,
}
);
return {
mdxSource: mdxSource,
frontMatter: data,
excerpt,
};
};
export default parse;

View File

@@ -0,0 +1,5 @@
/// <reference types="next" />
/// <reference types="next/image-types/global" />
// NOTE: This file should not be edited
// see https://nextjs.org/docs/basic-features/typescript for more information.

File diff suppressed because it is too large Load Diff

View File

@@ -26,17 +26,42 @@
"feed": "^4.2.2",
"flexsearch": "^0.7.31",
"focus-visible": "^5.2.0",
"next": "13.3.0",
"next-router-mock": "^0.9.3",
"next-superjson-plugin": "^0.5.7",
"postcss-focus-visible": "^6.0.4",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-hook-form": "^7.43.9",
"react-markdown": "^8.0.7",
"remark-gfm": "^3.0.1",
"superjson": "^1.12.3",
"tailwindcss": "^3.3.0"
"tailwindcss": "^3.3.0",
"@flowershow/core": "^0.4.10",
"@flowershow/remark-callouts": "^1.0.0",
"@flowershow/remark-embed": "^1.0.0",
"@flowershow/remark-wiki-link": "^1.1.2",
"@heroicons/react": "^2.0.17",
"@opentelemetry/api": "^1.4.0",
"@tanstack/react-table": "^8.8.5",
"@types/node": "18.16.0",
"@types/react": "18.2.0",
"@types/react-dom": "18.2.0",
"eslint": "8.39.0",
"eslint-config-next": "13.3.1",
"gray-matter": "^4.0.3",
"hastscript": "^7.2.0",
"mdx-mermaid": "2.0.0-rc7",
"next": "13.2.1",
"next-mdx-remote": "^4.4.1",
"papaparse": "^5.4.1",
"react": "18.2.0",
"react-dom": "18.2.0",
"react-vega": "^7.6.0",
"rehype-autolink-headings": "^6.1.1",
"rehype-katex": "^6.0.3",
"rehype-prism-plus": "^1.5.1",
"rehype-slug": "^5.1.0",
"remark-gfm": "^3.0.1",
"remark-math": "^5.1.1",
"remark-smartypants": "^2.0.0",
"remark-toc": "^8.0.1"
},
"devDependencies": {
"eslint": "8.26.0",

View File

@@ -0,0 +1,103 @@
import { Container } from '../components/Container'
import clientPromise from '../lib/mddb'
import fs from 'fs'
import { MDXRemote } from 'next-mdx-remote'
import { serialize } from 'next-mdx-remote/serialize'
import { Card } from '../components/Card'
import Head from 'next/head'
export const getStaticProps = async ({ params }) => {
const urlPath = params.slug ? params.slug.join('/') : ''
const mddb = await clientPromise
const dbFile = await mddb.getFileByUrl(urlPath)
const source = fs.readFileSync(dbFile.file_path, { encoding: 'utf-8' })
const mdxSource = await serialize(source, { parseFrontmatter: true })
return {
props: {
mdxSource,
},
}
}
export async function getStaticPaths() {
const mddb = await clientPromise
const allDocuments = await mddb.getFiles({ extensions: ['md', 'mdx'] })
const paths = allDocuments.filter(document => document.url_path !== '/').map((page) => {
const parts = page.url_path.split('/')
return { params: { slug: parts } }
})
return {
paths,
fallback: false,
}
}
const isValidUrl = (urlString) => {
try {
return Boolean(new URL(urlString))
} catch (e) {
return false
}
}
const Meta = ({keyValuePairs}) => {
const prettifyMetaValue = (value) => value.replaceAll('-',' ').charAt(0).toUpperCase() + value.replaceAll('-',' ').slice(1);
return (
<>
{keyValuePairs.map((entry) => {
return isValidUrl(entry[1]) ? (
<Card.Description>
<span className="font-semibold">
{prettifyMetaValue(entry[0])}: {' '}
</span>
<a
className="text-ellipsis underline transition hover:text-teal-400 dark:hover:text-teal-900"
href={entry[1]}
>
{entry[1]}
</a>
</Card.Description>
) : (
<Card.Description>
<span className="font-semibold">{prettifyMetaValue(entry[0])}: </span>
{Array.isArray(entry[1]) ? entry[1].join(', ') : entry[1]}
</Card.Description>
)
})}
</>
)
}
export default function DRDPage({ mdxSource }) {
const meta = mdxSource.frontmatter
const keyValuePairs = Object.entries(meta).filter(
(entry) => entry[0] !== 'title'
)
return (
<>
<Head>
<title>{meta.title}</title>
</Head>
<Container className="mt-16 lg:mt-32">
<article>
<header className="flex flex-col">
<h1 className="mt-6 text-4xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
{meta.title}
</h1>
<Card as="article">
<Meta keyValuePairs={keyValuePairs} />
</Card>
</header>
<div className="prose dark:prose-invert">
<MDXRemote {...mdxSource} />
</div>
</article>
</Container>
</>
)
}

View File

@@ -3,19 +3,21 @@ import fs from 'fs'
import { Card } from '../components/Card'
import { Container } from '../components/Container'
import clientPromise from '@/lib/mddb'
import ReactMarkdown from 'react-markdown'
import clientPromise from '../lib/mddb'
import { Index } from 'flexsearch'
import { useForm } from 'react-hook-form'
import Link from 'next/link'
import { serialize } from 'next-mdx-remote/serialize'
import { MDXRemote } from 'next-mdx-remote'
function DatasetCard({ dataset }) {
return (
<Card as="article">
<Card.Title>{dataset.title}</Card.Title>
<Card.Title><Link href={dataset.url}>{dataset.title}</Link></Card.Title>
<Card.Description>
<span className="font-semibold">Link to publication: </span>{' '}
<a
className="underline transition hover:text-teal-400 dark:hover:text-teal-900 text-ellipsis"
className="text-ellipsis underline transition hover:text-teal-400 dark:hover:text-teal-900"
href={dataset['link-to-publication']}
>
{dataset['link-to-publication']}
@@ -24,7 +26,7 @@ function DatasetCard({ dataset }) {
<Card.Description>
<span className="font-semibold">Link to data: </span>
<a
className="underline transition hover:text-teal-600 dark:hover:text-teal-900 text-ellipsis"
className="text-ellipsis underline transition hover:text-teal-600 dark:hover:text-teal-900"
href={dataset['link-to-data']}
>
{dataset['link-to-data']}
@@ -69,14 +71,60 @@ function DatasetCard({ dataset }) {
</Card>
)
}
export default function Home({ datasets, indexText, availableLanguages, availablePlatforms }) {
function ListOfAbusiveKeywordsCard({ list }) {
return (
<Card as="article">
<Card.Title><Link href={list.url}>{list.title}</Link></Card.Title>
{list.description && (
<Card.Description>
<span className="font-semibold">List Description: </span>{' '}
{list.description}
</Card.Description>
)}
<Card.Description>
<span className="font-semibold">Data Link: </span>
<a
className="text-ellipsis underline transition hover:text-teal-600 dark:hover:text-teal-900"
href={list['data-link']}
>
{list['data-link']}
</a>
</Card.Description>
<Card.Description>
<span className="font-semibold">Reference: </span>
<a
className="text-ellipsis underline transition hover:text-teal-600 dark:hover:text-teal-900"
href={list.reference}
>
{list.reference}
</a>
</Card.Description>
</Card>
)
}
export default function Home({
datasets,
indexText,
listsOfKeywords,
availableLanguages,
availablePlatforms,
}) {
const index = new Index()
datasets.forEach((dataset) => index.add(dataset.id, `${dataset.title} ${dataset['task-description']} ${dataset['details-of-task']} ${dataset['reference']}`))
const { register, watch } = useForm({ defaultValues: {
searchTerm: '',
lang: '',
platform: ''
}})
datasets.forEach((dataset) =>
index.add(
dataset.id,
`${dataset.title} ${dataset['task-description']} ${dataset['details-of-task']} ${dataset['reference']}`
)
)
const { register, watch, handleSubmit, reset } = useForm({
defaultValues: {
searchTerm: '',
lang: '',
platform: '',
},
})
return (
<>
<Head>
@@ -89,49 +137,68 @@ export default function Home({ datasets, indexText, availableLanguages, availabl
<Container className="mt-9">
<div className="max-w-2xl">
<h1 className="text-4xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
Hate Speech Dataset Catalogue
{indexText.frontmatter.title}
</h1>
<article className="mt-6 flex flex-col gap-y-2 text-base text-zinc-600 dark:text-zinc-400">
<ReactMarkdown>{indexText}</ReactMarkdown>
<article className="mt-6 index-text flex flex-col gap-y-2 text-base text-zinc-600 dark:text-zinc-400 prose dark:prose-invert">
<MDXRemote {...indexText} />
</article>
</div>
</Container>
<Container className="mt-24 md:mt-28">
<div className="mx-auto grid max-w-xl grid-cols-1 gap-y-8 lg:max-w-none">
<form className="rounded-2xl border border-zinc-100 px-4 py-6 sm:p-6 dark:border-zinc-700/40">
<div className="mx-auto grid max-w-7xl grid-cols-1 gap-y-8 lg:max-w-none">
<h2 className="text-xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
Datasets
</h2>
<form onSubmit={handleSubmit(() => reset())} className="rounded-2xl border border-zinc-100 px-4 py-6 dark:border-zinc-700/40 sm:p-6">
<p className="mt-2 text-lg font-semibold text-zinc-600 dark:text-zinc-100">
Search for datasets
</p>
<div className="mt-6 flex flex-col sm:flex-row gap-3">
<div className="mt-6 flex flex-col gap-3 sm:flex-row">
<input
placeholder="Search here"
aria-label="Hate speech on Twitter"
required
{...register('searchTerm')}
className="min-w-0 flex-auto appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] shadow-md shadow-zinc-800/5 placeholder:text-zinc-600 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-200 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
/>
<select
placeholder="Language"
defaultValue=""
className="min-w-0 flex-auto text-zinc-600 appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
className="min-w-0 flex-auto appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] text-zinc-600 shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
{...register('lang')}
>
<option value="" disabled hidden>Filter by language</option>
<option value="" disabled hidden>
Filter by language
</option>
{availableLanguages.map((lang) => (
<option key={lang} className='dark:bg-white dark:text-black' value={lang}>{lang}</option>
<option
key={lang}
className="dark:bg-white dark:text-black"
value={lang}
>
{lang}
</option>
))}
</select>
<select
placeholder="Platform"
defaultValue=""
className="min-w-0 flex-auto text-zinc-600 appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
className="min-w-0 flex-auto appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] text-zinc-600 shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
{...register('platform')}
>
<option value="" disabled hidden>Filter by platform</option>
<option value="" disabled hidden>
Filter by platform
</option>
{availablePlatforms.map((platform) => (
<option key={platform} className='dark:bg-white dark:text-black' value={platform}>{platform}</option>
<option
key={platform}
className="dark:bg-white dark:text-black"
value={platform}
>
{platform}
</option>
))}
</select>
<button type='submit' className='inline-flex items-center gap-2 justify-center rounded-md py-2 px-3 text-sm outline-offset-2 transition active:transition-none bg-zinc-800 font-semibold text-zinc-100 hover:bg-zinc-700 active:bg-zinc-800 active:text-zinc-100/70 dark:bg-zinc-700 dark:hover:bg-zinc-600 dark:active:bg-zinc-700 dark:active:text-zinc-100/70 flex-none'>Clear filters</button>
</div>
</form>
<div className="flex flex-col gap-16">
@@ -157,24 +224,56 @@ export default function Home({ datasets, indexText, availableLanguages, availabl
</div>
</div>
</Container>
<Container className="mt-16">
<h2 className="text-xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
Lists of Abusive Keywords
</h2>
<div className="mt-3 flex flex-col gap-16">
{listsOfKeywords.map((list) => (
<ListOfAbusiveKeywordsCard key={list.title} list={list} />
))}
</div>
</Container>
</>
)
}
export async function getStaticProps() {
const mddb = await clientPromise
const allPages = await mddb.getFiles({ extensions: ['md', 'mdx'] })
const datasets = allPages
.filter((page) => page.url_path !== '/')
.map((page) => ({ ...page.metadata, id: page._id }))
const index = allPages.filter((page) => page.url_path === '/')[0]
const source = fs.readFileSync(index.file_path, { encoding: 'utf-8' })
const availableLanguages = [... new Set(datasets.map((dataset) => dataset.language))]
const availablePlatforms = [... new Set(datasets.map((dataset) => dataset.platform).flat())]
const datasetPages = await mddb.getFiles({
folder: 'datasets',
extensions: ['md', 'mdx'],
})
const datasets = datasetPages.map((page) => ({
...page.metadata,
id: page._id,
url: page.url_path,
}))
const listsOfKeywordsPages = await mddb.getFiles({
folder: 'keywords',
extensions: ['md', 'mdx'],
})
const listsOfKeywords = listsOfKeywordsPages.map((page) => ({
...page.metadata,
id: page._id,
url: page.url_path,
}))
const index = await mddb.getFileByUrl('/')
let indexSource = fs.readFileSync(index.file_path, { encoding: 'utf-8' })
indexSource = await serialize(indexSource, { parseFrontmatter: true })
const availableLanguages = [
...new Set(datasets.map((dataset) => dataset.language)),
]
const availablePlatforms = [
...new Set(datasets.map((dataset) => dataset.platform).flat()),
]
return {
props: {
indexText: source,
datasets,
listsOfKeywords,
indexText: indexSource,
availableLanguages,
availablePlatforms,
},

View File

@@ -2,3 +2,12 @@
@import 'tailwindcss/components';
@import './prism.css';
@import 'tailwindcss/utilities';
.index-text ul,
.index-text p {
margin: 0;
}
.index-text h2 {
margin-top: 1rem;
}

View File

@@ -0,0 +1,14 @@
---
title: string
link-to-publication: url
link-to-data: url
task-description: string
details-of-task: string
size-of-dataset: number
percentage-abusive: number
language: string
level-of-annotation: list eg: ["Posts", "Comments", ...]
platform: list eg: ["Youtube", "Facebook", ...]
medium: list eg: ["Text", "Emojis", "Images", ...]
reference: string
---

View File

@@ -0,0 +1,5 @@
---
title: string
data-link: url
reference: string
---

View File

@@ -0,0 +1,28 @@
{
"compilerOptions": {
"lib": [
"dom",
"dom.iterable",
"esnext"
],
"allowJs": true,
"skipLibCheck": true,
"strict": false,
"forceConsistentCasingInFileNames": true,
"noEmit": true,
"incremental": true,
"esModuleInterop": true,
"moduleResolution": "node",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "preserve"
},
"include": [
"next-env.d.ts",
"**/*.ts",
"**/*.tsx"
],
"exclude": [
"node_modules"
]
}

View File

@@ -6,7 +6,8 @@
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "next lint"
"lint": "next lint",
"export": "npm run build && next export -o out"
},
"dependencies": {
"@flowershow/core": "^0.4.10",

View File

@@ -3,7 +3,21 @@ import path from 'path';
import parse from '../lib/markdown';
import DRD from '../components/DRD';
export const getServerSideProps = async (context) => {
export const getStaticPaths = async () => {
const contentDir = path.join(process.cwd(), '/content/');
const contentFolders = await fs.readdir(contentDir, 'utf8');
const paths = contentFolders.map((folder: string) =>
folder === 'index.md'
? { params: { path: [] } }
: { params: { path: [folder] } }
);
return {
paths,
fallback: false,
};
};
export const getStaticProps = async (context) => {
let pathToFile = 'index.md';
if (context.params.path) {
pathToFile = context.params.path.join('/') + '/index.md';
@@ -22,7 +36,7 @@ export const getServerSideProps = async (context) => {
export default function DatasetPage({ mdxSource, frontMatter }) {
return (
<div className="prose mx-auto">
<div className="prose dark:prose-invert mx-auto">
<header>
<div className="mb-6">
<>

View File

@@ -7,7 +7,7 @@ class MyDocument extends Document {
<Head>
<link rel="icon" href="/favicon.png" />
</Head>
<body>
<body className='bg-white dark:bg-gray-900'>
<Main />
<NextScript />
</body>

View File

@@ -16,7 +16,7 @@ If you have questions about anything related to PortalJS, you're always welcome
To create a PortalJS app, open your terminal, cd into the directory youd like to create the app in, and run the following command:
```bash
npx create-next-app my-data-portal --example https://github.com/datopian/portaljs/tree/main/examples/basic-example
npx create-next-app my-data-portal --example https://github.com/datopian/portaljs/tree/main/examples/learn-example
```
> [!tip]
@@ -116,3 +116,57 @@ From the browser, access http://localhost:3000. You should see the following:
<img src="/assets/docs/datasets-index-page.png" />
## Deploying your PortalJS app
Finally, let's learn how to deploy PortalJS apps to Vercel or Cloudflare Pages.
> [!tip]
> Although we are using Vercel and Cloudflare Pages in this tutorial, you can deploy apps in any hosting solution you want as a static website by running `npm run export` and distributing the contents of the `out/` folder.
### Push to a GitHub repo
The PortalJS app we built up to this point is stored locally. To allow Vercel or Cloudflare Pages to deploy it, we have to push it to GitHub (or another SCM supported by these hosting solutions).
- Create a new repository under your GitHub account
- Add the new remote origin to your PortalJS app
- Push the app to the repository
If you are not sure about how to do it, follow this guide: https://nextjs.org/learn/basics/deploying-nextjs-app/github
> [!tip]
> You can also deploy using our Vercel deploy button. In this case, a new repository will be created under your GitHub account automatically.
> [Click here](#one-click-deploy) to scroll to the deploy button.
### Deploy to Vercel
The easiest way to deploy a PortalJS app is to use Vercel, a serverless platform for static and hybrid applications developed by the creators of Next.js.
To deploy your PortalJS app:
- Create a Vercel account by going to https://vercel.com/signup and choosing "Continue with GitHub"
- Import the repository you created for the PortalJS app at https://vercel.com/new
- During the setup process you can use the default settings - no need to change anything.
When you deploy, your PortalJS app will start building. It should finish in under a minute.
When its done, youll get deployment URLs. Click on one of the URLs and you should see your PortaJS app live.
>[!tip]
> You can find a more in-depth explanation about this process at https://nextjs.org/learn/basics/deploying-nextjs-app/deploy
#### One-Click Deploy
You can instantly deploy our example app to your Vercel account by clicking the button below:
[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdatopian%2Fportaljs%2Ftree%2Fmain%2Fexamples%2Flearn-example&project-name=my-data-portal&repository-name=my-data-portal&demo-title=PortalJS%20Learn%20Example&demo-description=PortalJS%20Learn%20Example%20-%20https%3A%2F%2Fportaljs.org%2Fdocs&demo-url=learn-example.portaljs.org&demo-image=https%3A%2F%2Fportaljs.org%2Fassets%2Fexamples%2Fbasic-example.png)
This will create a new repository on your GitHub account and deploy it to Vercel. If you are following the tutorial, you can replicate the changes done on your local app to this new repository.
### Deploy to Cloudflare Pages
To deploy your PortalJS app to Cloudflare Pages, follow this guide:
https://developers.cloudflare.com/pages/framework-guides/deploy-a-nextjs-site/#deploy-with-cloudflare-pages-1
Note that you don't have to change anything - just follow the steps, choosing the repository you created.