[alan-turing][m] - small tweaks

[alan-turing][sm] - fix build
[alan-turing][m] - individual pages (#828 )
2023-05-02 12:53:44 -03:00 · 2023-05-01 21:40:46 -03:00 · 2023-05-01 21:06:52 -03:00 · 2023-05-01 19:09:18 -03:00 · 2023-05-01 15:02:59 -03:00 · 2023-05-01 14:51:02 -03:00
30 changed files with 6179 additions and 417 deletions
--- a/examples/alan-turing-portal/README.md
+++ b/examples/alan-turing-portal/README.md
@@ -1,6 +1,18 @@
+## Intro
+
+This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful for e.g. training a natural language processing system to detect this language.
+
+Its built on top of [PortalJS](https://portaljs.org/), it allows you to publish datasets, lists of offensive keywords and static pages, all of those are stored as markdown files inside the `content` folder.
+
+- .md files inside `content/datasets/` will appear on the dataset list section of the homepage and be searchable as well as having a individual page in `datasets/<file name>`
+- .md files inside `content/keywords/` will appear on the list of offensive keywords section of the homepage as well as having a individual page in `keywords/<file name>`
+- .md files inside `content/` will be converted to static pages in the url `/<file name>` eg: `content/about.md` becomes `/about`
+
+This is also a Next.JS project so you can use the following steps to run the website locally.
+
 ## Getting started

-To get started with this template, first install the npm dependencies:
+To get started first install the npm dependencies:

 ```bash
 npm install
@@ -13,7 +25,3 @@ npm run dev
 ```

 Finally, open [http://localhost:3000](http://localhost:3000) in your browser to view the website.
-
-## License
-
-This site template is a commercial product and is licensed under the [Tailwind UI license](https://tailwindui.com/license).
--- a/examples/alan-turing-portal/components/Footer.jsx
+++ b/examples/alan-turing-portal/components/Footer.jsx
@@ -21,7 +21,7 @@ export function Footer() {
          <Container.Inner>
            <div className="flex flex-col items-center justify-between gap-6 sm:flex-row">
              <p className="text-sm font-medium text-zinc-800 dark:text-zinc-200">
-                hatespeechdata maintained by <a href='https://github.com/leondz'>leondz</a>
+                Built with <a href='https://portaljs.org'>PortalJS 🌀</a>
              </p>
              <p className="text-sm text-zinc-400 dark:text-zinc-500">
                &copy; {new Date().getFullYear()} Leon Derczynski. All rights
--- a/examples/alan-turing-portal/content/about.md
+++ b/examples/alan-turing-portal/content/about.md
@@ -0,0 +1,5 @@
+---
+title: About
+---
+
+This is an about page, left here as an example
--- a/examples/alan-turing-portal/content/datasets/abusive-eval-v1-0.md
+++ b/examples/alan-turing-portal/content/datasets/abusive-eval-v1-0.md
@@ -0,0 +1,14 @@
+---
+title: AbuseEval v1.0
+link-to-publication: http://www.lrec-conf.org/proceedings/lrec2020/pdf/2020.lrec-1.760.pdf
+link-to-data: https://github.com/tommasoc80/AbuseEval
+task-description: Explicitness annotation of offensive and abusive content
+details-of-task: "Enriched versions of the OffensEval/OLID dataset with the distinction of explicit/implicit offensive messages and the new dimension for abusive messages. Labels for offensive language: EXPLICIT, IMPLICT, NOT; Labels for abusive language: EXPLICIT, IMPLICT, NOTABU"
+size-of-dataset: 14100
+percentage-abusive: 20.75
+language: English
+level-of-annotation: ["Tweets"]
+platform: ["Twitter"]
+medium: ["Text"]
+reference: "Caselli, T., Basile, V., Jelena, M., Inga, K., and Michael, G. 2020. \"I feel offended, don’t be abusive! implicit/explicit messages in offensive and abusive language\". The 12th Language Resources and Evaluation Conference (pp. 6193-6202). European Language Resources Association."
+---
--- a/examples/alan-turing-portal/content/datasets/abusive-language-detection-on-arabic-social-media-al-jazeera.md
+++ b/examples/alan-turing-portal/content/datasets/abusive-language-detection-on-arabic-social-media-al-jazeera.md
@@ -12,3 +12,5 @@ platform: ["AlJazeera"]
 medium: ["Text"]
 reference: "Mubarak, H., Darwish, K. and Magdy, W., 2017. Abusive Language Detection on Arabic Social Media. In: Proceedings of the First Workshop on Abusive Language Online. Vancouver, Canada: Association for Computational Linguistics, pp.52-56."
 ---
+
+SOMETHING TEST
--- a/examples/alan-turing-portal/content/datasets/coral-a-context-aware-croatian-abusive-language-dataset.md
+++ b/examples/alan-turing-portal/content/datasets/coral-a-context-aware-croatian-abusive-language-dataset.md
@@ -0,0 +1,14 @@
+---
+title: "CoRAL: a Context-aware Croatian Abusive Language Dataset"
+link-to-publication: https://aclanthology.org/2022.findings-aacl.21/
+link-to-data: https://github.com/shekharRavi/CoRAL-dataset-Findings-of-the-ACL-AACL-IJCNLP-2022
+task-description: Multi-class based on context dependency categories (CDC)
+details-of-task: Detectioning CDC from abusive comments
+size-of-dataset: 2240
+percentage-abusive: 100
+language: "Croatian"
+level-of-annotation: ["Posts"]
+platform: ["Posts"]
+medium: ["Newspaper Comments"]
+reference: "Ravi Shekhar, Mladen Karan and Matthew Purver (2022). CoRAL: a Context-aware Croatian Abusive Language Dataset. Findings of the ACL: AACL-IJCNLP."
+---
--- a/examples/alan-turing-portal/content/datasets/detecting-abusive-albanian.md
+++ b/examples/alan-turing-portal/content/datasets/detecting-abusive-albanian.md
--- a/examples/alan-turing-portal/content/datasets/hate-speech-detection-in-the-bengali-language-a-dataset-and-its-baseline-evaluation.md
+++ b/examples/alan-turing-portal/content/datasets/hate-speech-detection-in-the-bengali-language-a-dataset-and-its-baseline-evaluation.md
@@ -12,3 +12,4 @@ platform: ["Youtube", "Facebook"]
 medium: ["Text"]
 reference: "Romim, N., Ahmed, M., Talukder, H., & Islam, M. S. (2021). Hate speech detection in the bengali language: A dataset and its baseline evaluation. In Proceedings of International Joint Conference on Advances in Computational Intelligence (pp. 457-468). Springer, Singapore."
 ---
+
--- a/examples/alan-turing-portal/content/datasets/large-scale-hate-speech-detection-with-cross-domain-tranfer.md
+++ b/examples/alan-turing-portal/content/datasets/large-scale-hate-speech-detection-with-cross-domain-tranfer.md
@@ -0,0 +1,14 @@
+---
+title: Large-Scale Hate Speech Detection with Cross-Domain Transfer
+link-to-publication: https://aclanthology.org/2022.lrec-1.238/
+link-to-data: https://github.com/avaapm/hatespeech
+task-description: Three-class (Hate speech, Offensive language, None)
+details-of-task: Hate speech detection on social media (Twitter) including 5 target groups (gender, race, religion, politics, sports)
+size-of-dataset: "100k English (27593 hate, 30747 offensive, 41660 none)"
+percentage-abusive: 58.3
+language: English
+level-of-annotation: ["Posts"]
+platform: ["Twitter"]
+medium: ["Text", "Image"]
+reference: "Cagri Toraman, Furkan Şahinuç, Eyup Yilmaz. 2022. Large-Scale Hate Speech Detection with Cross-Domain Transfer. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2215–2225, Marseille, France. European Language Resources Association."
+---
--- a/examples/alan-turing-portal/content/datasets/let-mi-an-arabic-levantine-twitter-dataset-for-mysogynistic-language.md
+++ b/examples/alan-turing-portal/content/datasets/let-mi-an-arabic-levantine-twitter-dataset-for-mysogynistic-language.md
--- a/examples/alan-turing-portal/content/datasets/measuring-hate-speech.md
+++ b/examples/alan-turing-portal/content/datasets/measuring-hate-speech.md
@@ -0,0 +1,14 @@
+---
+title: Measuring Hate Speech
+link-to-publication: https://arxiv.org/abs/2009.10277
+link-to-data: https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech
+task-description: 10 ordinal labels (sentiment, (dis)respect, insult, humiliation, inferior status, violence, dehumanization, genocide, attack/defense, hate speech), which are debiased and aggregated into a continuous hate speech severity score (hate_speech_score) that includes a region for counterspeech & supportive speeech. Includes 8 target identity groups (race/ethnicity, religion, national origin/citizenship, gender, sexual orientation, age, disability, political ideology) and 42 identity subgroups.
+details-of-task: Hate speech measurement on social media in English
+size-of-dataset: "39,565 comments annotated by 7,912 annotators on 10 ordinal labels, for 1,355,560 total labels."
+percentage-abusive: 25
+language: English
+level-of-annotation: ["Social media comment"]
+platform: ["Twitter", "Reddit", "Youtube"]
+medium: ["Text"]
+reference: "Kennedy, C. J., Bacon, G., Sahn, A., & von Vacano, C. (2020). Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application. arXiv preprint arXiv:2009.10277."
+---
--- a/examples/alan-turing-portal/content/datasets/offensive-language-and-hate-speech-detection-for-danish.md
+++ b/examples/alan-turing-portal/content/datasets/offensive-language-and-hate-speech-detection-for-danish.md
@@ -0,0 +1,14 @@
+---
+title: Offensive Language and Hate Speech Detection for Danish
+link-to-publication: http://www.derczynski.com/papers/danish_hsd.pdf
+link-to-data: https://figshare.com/articles/Danish_Hate_Speech_Abusive_Language_data/12220805
+task-description: "Branching structure of tasks: Binary (Offensive, Not), Within Offensive (Target, Not), Within Target (Individual, Group, Other)"
+details-of-task: Group-directed + Person-directed
+size-of-dataset: 3600
+percentage-abusive: 0.12
+language: Danish
+level-of-annotation: ["Posts"]
+platform: ["Twitter", "Reddit", "Newspaper comments"]
+medium: ["Text"]
+reference: "Sigurbergsson, G. and Derczynski, L., 2019. Offensive Language and Hate Speech Detection for Danish. ArXiv."
+---
--- a/examples/alan-turing-portal/content/index.md
+++ b/examples/alan-turing-portal/content/index.md
@@ -1,3 +1,7 @@
+---
+title: Hate Speech Dataset Catalogue
+---
+
 This page catalogues datasets annotated for hate speech, online abuse, and offensive language. They may be useful for e.g. training a natural language processing system to detect this language.

 The list is maintained by Leon Derczynski, Bertie Vidgen, Hannah Rose Kirk, Pica Johansson, Yi-Ling Chung, Mads Guldborg Kjeldgaard Kongsbak, Laila Sprejer, and Philine Zeinert.
@@ -7,3 +11,24 @@ We provide a list of datasets and keywords. If you would like to contribute to o
 If you use these resources, please cite (and read!) our paper: Directions in Abusive Language Training Data: Garbage In, Garbage Out. And if you would like to find other resources for researching online hate, visit The Alan Turing Institute’s Online Hate Research Hub or read The Alan Turing Institute’s Reading List on Online Hate and Abuse Research.

 If you’re looking for a good paper on online hate training datasets (beyond our paper, of course!) then have a look at ‘Resources and benchmark corpora for hate speech detection: a systematic review’ by Poletto et al. in Language Resources and Evaluation.
+
+## How to contribute
+
+We accept entries to our catalogue based on pull requests to the content folder. The dataset must be avaliable for download to be included in the list. If you want to add an entry, follow these steps!
+
+Please send just one dataset addition/edit at a time - edit it in, then save. This will make everyone’s life easier (including yours!)
+
+- Go to the repo url file and click the "Add file" dropdown and then click on "Create new file".
+![](https://i.imgur.com/2PR0ZgL.png)
+   
+- In the following page type `content/datasets/<name-of-the-file>.md`. if you want to add an entry to the datasets catalog or `content/keywords/<name-of-the-file>.md` if you want to add an entry to the lists of abusive keywords, if you want to just add an static page you can leave in the root of `content` it will automatically get assigned an url eg: `/content/about.md` becomes the `/about` page
+![](https://i.imgur.com/rr3uSYu.png)
+   
+- Copy the contents of `templates/dataset.md` or `templates/keywords.md` respectively to the camp below, filling out the fields with the correct data format
+![](https://i.imgur.com/x6JIjhz.png)
+   
+- Click on "Commit changes", on the popup make sure you give some brief detail on the proposed change.  and then click on Propose changes
+<img src='https://i.imgur.com/BxuxKEJ.png' style={{ maxWidth: '50%', margin: '0 auto' }}/>
+   
+- Submit the pull request on the next page when prompted.
+
--- a/examples/alan-turing-portal/content/keywords/hurtlex.md
+++ b/examples/alan-turing-portal/content/keywords/hurtlex.md
@@ -0,0 +1,10 @@
+---
+title: Hurtlex
+description: HurtLex is a lexicon of offensive, aggressive, and hateful words in over 50 languages. The words are divided into 17 categories, plus a macro-category indicating whether there is stereotype involved.
+data-link: https://github.com/valeriobasile/hurtlex
+reference: http://ceur-ws.org/Vol-2253/paper49.pdf, Proc. CLiC-it 2018
+---
+
+## Markdown TEST
+
+Some text
--- a/examples/alan-turing-portal/content/keywords/jiang-et-al.md
+++ b/examples/alan-turing-portal/content/keywords/jiang-et-al.md
@@ -0,0 +1,5 @@
+---
+title: SexHateLex is a Chinese lexicon of hateful and sexist words.
+data-link: https://doi.org/10.5281/zenodo.4773875
+reference: http://ceur-ws.org/Vol-2253/paper49.pdf, Journal of OSNEM, Vol.27, 2022, 100182, ISSN 2468-6964.
+---
--- a/examples/alan-turing-portal/lib/markdown.js
+++ b/examples/alan-turing-portal/lib/markdown.js
@@ -0,0 +1,105 @@
+import matter from "gray-matter";
+import mdxmermaid from "mdx-mermaid";
+import { h } from "hastscript";
+import remarkCallouts from "@flowershow/remark-callouts";
+import remarkEmbed from "@flowershow/remark-embed";
+import remarkGfm from "remark-gfm";
+import remarkMath from "remark-math";
+import remarkSmartypants from "remark-smartypants";
+import remarkToc from "remark-toc";
+import remarkWikiLink from "@flowershow/remark-wiki-link";
+import rehypeAutolinkHeadings from "rehype-autolink-headings";
+import rehypeKatex from "rehype-katex";
+import rehypeSlug from "rehype-slug";
+import rehypePrismPlus from "rehype-prism-plus";
+
+import { serialize } from "next-mdx-remote/serialize";
+
+/**
+ * Parse a markdown or MDX file to an MDX source form + front matter data
+ *
+ * @source: the contents of a markdown or mdx file
+ * @format: used to indicate to next-mdx-remote which format to use (md or mdx)
+ * @returns: { mdxSource: mdxSource, frontMatter: ...}
+ */
+const parse = async function (source, format) {
+  const { content, data, excerpt } = matter(source, {
+    excerpt: (file, options) => {
+      // Generate an excerpt for the file
+      file.excerpt = file.content.split("\n\n")[0];
+    },
+  });
+
+  const mdxSource = await serialize(
+    { value: content, path: format },
+    {
+      // Optionally pass remark/rehype plugins
+      mdxOptions: {
+        remarkPlugins: [
+          remarkEmbed,
+          remarkGfm,
+          [remarkSmartypants, { quotes: false, dashes: "oldschool" }],
+          remarkMath,
+          remarkCallouts,
+          remarkWikiLink,
+          [
+            remarkToc,
+            {
+              heading: "Table of contents",
+              tight: true,
+            },
+          ],
+          [mdxmermaid, {}],
+        ],
+        rehypePlugins: [
+          rehypeSlug,
+          [
+            rehypeAutolinkHeadings,
+            {
+              properties: { className: 'heading-link' },
+              test(element) {
+                return (
+                  ["h2", "h3", "h4", "h5", "h6"].includes(element.tagName) &&
+                  element.properties?.id !== "table-of-contents" &&
+                  element.properties?.className !== "blockquote-heading"
+                );
+              },
+              content() {
+                return [
+                  h(
+                    "svg",
+                    {
+                      xmlns: "http:www.w3.org/2000/svg",
+                      fill: "#ab2b65",
+                      viewBox: "0 0 20 20",
+                      className: "w-5 h-5",
+                    },
+                    [
+                      h("path", {
+                        fillRule: "evenodd",
+                        clipRule: "evenodd",
+                        d: "M9.493 2.853a.75.75 0 00-1.486-.205L7.545 6H4.198a.75.75 0 000 1.5h3.14l-.69 5H3.302a.75.75 0 000 1.5h3.14l-.435 3.148a.75.75 0 001.486.205L7.955 14h2.986l-.434 3.148a.75.75 0 001.486.205L12.456 14h3.346a.75.75 0 000-1.5h-3.14l.69-5h3.346a.75.75 0 000-1.5h-3.14l.435-3.147a.75.75 0 00-1.486-.205L12.045 6H9.059l.434-3.147zM8.852 7.5l-.69 5h2.986l.69-5H8.852z",
+                      }),
+                    ]
+                  ),
+                ];
+              },
+            },
+          ],
+          [rehypeKatex, { output: "mathml" }],
+          [rehypePrismPlus, { ignoreMissing: true }],
+        ],
+        format,
+      },
+      scope: data,
+    }
+  );
+
+  return {
+    mdxSource: mdxSource,
+    frontMatter: data,
+    excerpt,
+  };
+};
+
+export default parse;
--- a/examples/alan-turing-portal/markdown.db
+++ b/examples/alan-turing-portal/markdown.db
--- a/examples/alan-turing-portal/next-env.d.ts
+++ b/examples/alan-turing-portal/next-env.d.ts
@@ -0,0 +1,5 @@
+/// <reference types="next" />
+/// <reference types="next/image-types/global" />
+
+// NOTE: This file should not be edited
+// see https://nextjs.org/docs/basic-features/typescript for more information.
--- a/examples/alan-turing-portal/package-lock.json
+++ b/examples/alan-turing-portal/package-lock.json
--- a/examples/alan-turing-portal/package.json
+++ b/examples/alan-turing-portal/package.json
@@ -26,17 +26,42 @@
    "feed": "^4.2.2",
    "flexsearch": "^0.7.31",
    "focus-visible": "^5.2.0",
-    "next": "13.3.0",
    "next-router-mock": "^0.9.3",
    "next-superjson-plugin": "^0.5.7",
    "postcss-focus-visible": "^6.0.4",
-    "react": "18.2.0",
-    "react-dom": "18.2.0",
    "react-hook-form": "^7.43.9",
    "react-markdown": "^8.0.7",
-    "remark-gfm": "^3.0.1",
    "superjson": "^1.12.3",
-    "tailwindcss": "^3.3.0"
+    "tailwindcss": "^3.3.0",
+    "@flowershow/core": "^0.4.10",
+    "@flowershow/remark-callouts": "^1.0.0",
+    "@flowershow/remark-embed": "^1.0.0",
+    "@flowershow/remark-wiki-link": "^1.1.2",
+    "@heroicons/react": "^2.0.17",
+    "@opentelemetry/api": "^1.4.0",
+    "@tanstack/react-table": "^8.8.5",
+    "@types/node": "18.16.0",
+    "@types/react": "18.2.0",
+    "@types/react-dom": "18.2.0",
+    "eslint": "8.39.0",
+    "eslint-config-next": "13.3.1",
+    "gray-matter": "^4.0.3",
+    "hastscript": "^7.2.0",
+    "mdx-mermaid": "2.0.0-rc7",
+    "next": "13.2.1",
+    "next-mdx-remote": "^4.4.1",
+    "papaparse": "^5.4.1",
+    "react": "18.2.0",
+    "react-dom": "18.2.0",
+    "react-vega": "^7.6.0",
+    "rehype-autolink-headings": "^6.1.1",
+    "rehype-katex": "^6.0.3",
+    "rehype-prism-plus": "^1.5.1",
+    "rehype-slug": "^5.1.0",
+    "remark-gfm": "^3.0.1",
+    "remark-math": "^5.1.1",
+    "remark-smartypants": "^2.0.0",
+    "remark-toc": "^8.0.1"
  },
  "devDependencies": {
    "eslint": "8.26.0",
--- a/examples/alan-turing-portal/pages/[...slug].jsx
+++ b/examples/alan-turing-portal/pages/[...slug].jsx
@@ -0,0 +1,103 @@
+import { Container } from '../components/Container'
+import clientPromise from '../lib/mddb'
+import fs from 'fs'
+import { MDXRemote } from 'next-mdx-remote'
+import { serialize } from 'next-mdx-remote/serialize'
+import { Card } from '../components/Card'
+import Head from 'next/head'
+
+export const getStaticProps = async ({ params }) => {
+  const urlPath = params.slug ? params.slug.join('/') : ''
+
+  const mddb = await clientPromise
+  const dbFile = await mddb.getFileByUrl(urlPath)
+
+  const source = fs.readFileSync(dbFile.file_path, { encoding: 'utf-8' })
+  const mdxSource = await serialize(source, { parseFrontmatter: true })
+
+  return {
+    props: {
+      mdxSource,
+    },
+  }
+}
+
+export async function getStaticPaths() {
+  const mddb = await clientPromise
+  const allDocuments = await mddb.getFiles({ extensions: ['md', 'mdx'] })
+
+  const paths = allDocuments.filter(document => document.url_path !== '/').map((page) => {
+    const parts = page.url_path.split('/')
+    return { params: { slug: parts } }
+  })
+
+  return {
+    paths,
+    fallback: false,
+  }
+}
+
+const isValidUrl = (urlString) => {
+  try {
+    return Boolean(new URL(urlString))
+  } catch (e) {
+    return false
+  }
+}
+
+const Meta = ({keyValuePairs}) => {
+  const prettifyMetaValue = (value) => value.replaceAll('-',' ').charAt(0).toUpperCase() + value.replaceAll('-',' ').slice(1);
+  return (
+    <>
+      {keyValuePairs.map((entry) => {
+        return isValidUrl(entry[1]) ? (
+          <Card.Description>
+            <span className="font-semibold">
+              {prettifyMetaValue(entry[0])}: {' '}
+            </span>
+              <a
+                className="text-ellipsis underline transition hover:text-teal-400 dark:hover:text-teal-900"
+                href={entry[1]}
+              >
+                {entry[1]}
+              </a>
+          </Card.Description>
+        ) : (
+          <Card.Description>
+            <span className="font-semibold">{prettifyMetaValue(entry[0])}: </span>
+            {Array.isArray(entry[1]) ? entry[1].join(', ') : entry[1]}
+          </Card.Description>
+        )
+      })}
+    </>
+  )
+}
+
+export default function DRDPage({ mdxSource }) {
+  const meta = mdxSource.frontmatter
+  const keyValuePairs = Object.entries(meta).filter(
+    (entry) => entry[0] !== 'title'
+  )
+  return (
+    <>
+      <Head>
+        <title>{meta.title}</title>
+      </Head>
+      <Container className="mt-16 lg:mt-32">
+        <article>
+          <header className="flex flex-col">
+            <h1 className="mt-6 text-4xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
+              {meta.title}
+            </h1>
+            <Card as="article">
+              <Meta keyValuePairs={keyValuePairs} />
+            </Card>
+          </header>
+          <div className="prose dark:prose-invert">
+            <MDXRemote {...mdxSource} />
+          </div>
+        </article>
+      </Container>
+    </>
+  )
+}
--- a/examples/alan-turing-portal/pages/index.jsx
+++ b/examples/alan-turing-portal/pages/index.jsx
@@ -3,19 +3,21 @@ import fs from 'fs'

 import { Card } from '../components/Card'
 import { Container } from '../components/Container'
-import clientPromise from '@/lib/mddb'
-import ReactMarkdown from 'react-markdown'
+import clientPromise from '../lib/mddb'
 import { Index } from 'flexsearch'
 import { useForm } from 'react-hook-form'
+import Link from 'next/link'
+import { serialize } from 'next-mdx-remote/serialize'
+import { MDXRemote } from 'next-mdx-remote'

 function DatasetCard({ dataset }) {
  return (
    <Card as="article">
-      <Card.Title>{dataset.title}</Card.Title>
+      <Card.Title><Link href={dataset.url}>{dataset.title}</Link></Card.Title>
      <Card.Description>
        <span className="font-semibold">Link to publication: </span>{' '}
        <a
-          className="underline transition hover:text-teal-400 dark:hover:text-teal-900 text-ellipsis"
+          className="text-ellipsis underline transition hover:text-teal-400 dark:hover:text-teal-900"
          href={dataset['link-to-publication']}
        >
          {dataset['link-to-publication']}
@@ -24,7 +26,7 @@ function DatasetCard({ dataset }) {
      <Card.Description>
        <span className="font-semibold">Link to data: </span>
        <a
-          className="underline transition hover:text-teal-600 dark:hover:text-teal-900 text-ellipsis"
+          className="text-ellipsis underline transition hover:text-teal-600 dark:hover:text-teal-900"
          href={dataset['link-to-data']}
        >
          {dataset['link-to-data']}
@@ -69,14 +71,60 @@ function DatasetCard({ dataset }) {
    </Card>
  )
 }
-export default function Home({ datasets, indexText, availableLanguages, availablePlatforms }) {
+
+function ListOfAbusiveKeywordsCard({ list }) {
+  return (
+    <Card as="article">
+      <Card.Title><Link href={list.url}>{list.title}</Link></Card.Title>
+      {list.description && (
+        <Card.Description>
+          <span className="font-semibold">List Description: </span>{' '}
+          {list.description}
+        </Card.Description>
+      )}
+      <Card.Description>
+        <span className="font-semibold">Data Link: </span>
+        <a
+          className="text-ellipsis underline transition hover:text-teal-600 dark:hover:text-teal-900"
+          href={list['data-link']}
+        >
+          {list['data-link']}
+        </a>
+      </Card.Description>
+      <Card.Description>
+        <span className="font-semibold">Reference: </span>
+        <a
+          className="text-ellipsis underline transition hover:text-teal-600 dark:hover:text-teal-900"
+          href={list.reference}
+        >
+          {list.reference}
+        </a>
+      </Card.Description>
+    </Card>
+  )
+}
+
+export default function Home({
+  datasets,
+  indexText,
+  listsOfKeywords,
+  availableLanguages,
+  availablePlatforms,
+}) {
  const index = new Index()
-  datasets.forEach((dataset) => index.add(dataset.id, `${dataset.title} ${dataset['task-description']} ${dataset['details-of-task']} ${dataset['reference']}`))
-  const { register, watch } = useForm({ defaultValues: {
-    searchTerm: '',
-    lang: '',
-    platform: ''
-  }})
+  datasets.forEach((dataset) =>
+    index.add(
+      dataset.id,
+      `${dataset.title} ${dataset['task-description']} ${dataset['details-of-task']} ${dataset['reference']}`
+    )
+  )
+  const { register, watch, handleSubmit, reset } = useForm({
+    defaultValues: {
+      searchTerm: '',
+      lang: '',
+      platform: '',
+    },
+  })
  return (
    <>
      <Head>
@@ -89,49 +137,68 @@ export default function Home({ datasets, indexText, availableLanguages, availabl
      <Container className="mt-9">
        <div className="max-w-2xl">
          <h1 className="text-4xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
-            Hate Speech Dataset Catalogue
+            {indexText.frontmatter.title}
          </h1>
-          <article className="mt-6 flex flex-col gap-y-2 text-base text-zinc-600 dark:text-zinc-400">
-            <ReactMarkdown>{indexText}</ReactMarkdown>
+          <article className="mt-6 index-text flex flex-col gap-y-2 text-base text-zinc-600 dark:text-zinc-400 prose dark:prose-invert">
+            <MDXRemote {...indexText} />
          </article>
        </div>
      </Container>
      <Container className="mt-24 md:mt-28">
-        <div className="mx-auto grid max-w-xl grid-cols-1 gap-y-8 lg:max-w-none">
-          <form className="rounded-2xl border border-zinc-100 px-4 py-6 sm:p-6 dark:border-zinc-700/40">
+        <div className="mx-auto grid max-w-7xl grid-cols-1 gap-y-8 lg:max-w-none">
+          <h2 className="text-xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
+            Datasets
+          </h2>
+          <form onSubmit={handleSubmit(() => reset())} className="rounded-2xl border border-zinc-100 px-4 py-6 dark:border-zinc-700/40 sm:p-6">
            <p className="mt-2 text-lg font-semibold text-zinc-600 dark:text-zinc-100">
              Search for datasets
            </p>
-            <div className="mt-6 flex flex-col sm:flex-row gap-3">
+            <div className="mt-6 flex flex-col gap-3 sm:flex-row">
              <input
                placeholder="Search here"
                aria-label="Hate speech on Twitter"
-                required
                {...register('searchTerm')}
                className="min-w-0 flex-auto appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] shadow-md shadow-zinc-800/5 placeholder:text-zinc-600 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-200 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
              />
              <select
                placeholder="Language"
                defaultValue=""
-                className="min-w-0 flex-auto text-zinc-600 appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
+                className="min-w-0 flex-auto appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] text-zinc-600 shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
                {...register('lang')}
              >
-                <option value="" disabled hidden>Filter by language</option>
+                <option value="" disabled hidden>
+                  Filter by language
+                </option>
                {availableLanguages.map((lang) => (
-                  <option key={lang} className='dark:bg-white dark:text-black' value={lang}>{lang}</option>
+                  <option
+                    key={lang}
+                    className="dark:bg-white dark:text-black"
+                    value={lang}
+                  >
+                    {lang}
+                  </option>
                ))}
              </select>
              <select
                placeholder="Platform"
                defaultValue=""
-                className="min-w-0 flex-auto text-zinc-600 appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
+                className="min-w-0 flex-auto appearance-none rounded-md border border-zinc-900/10 bg-white px-3 py-[calc(theme(spacing.2)-1px)] text-zinc-600 shadow-md shadow-zinc-800/5 placeholder:text-zinc-400 focus:border-teal-500 focus:outline-none focus:ring-4 focus:ring-teal-500/10 dark:border-zinc-700 dark:bg-zinc-700/[0.15] dark:text-zinc-200 dark:placeholder:text-zinc-500 dark:focus:border-teal-400 dark:focus:ring-teal-400/10 sm:text-sm"
                {...register('platform')}
              >
-                <option value="" disabled hidden>Filter by platform</option>
+                <option value="" disabled hidden>
+                  Filter by platform
+                </option>
                {availablePlatforms.map((platform) => (
-                  <option key={platform} className='dark:bg-white dark:text-black' value={platform}>{platform}</option>
+                  <option
+                    key={platform}
+                    className="dark:bg-white dark:text-black"
+                    value={platform}
+                  >
+                    {platform}
+                  </option>
                ))}
              </select>
+              <button type='submit' className='inline-flex items-center gap-2 justify-center rounded-md py-2 px-3 text-sm outline-offset-2 transition active:transition-none bg-zinc-800 font-semibold text-zinc-100 hover:bg-zinc-700 active:bg-zinc-800 active:text-zinc-100/70 dark:bg-zinc-700 dark:hover:bg-zinc-600 dark:active:bg-zinc-700 dark:active:text-zinc-100/70 flex-none'>Clear filters</button>
            </div>
          </form>
          <div className="flex flex-col gap-16">
@@ -157,24 +224,56 @@ export default function Home({ datasets, indexText, availableLanguages, availabl
          </div>
        </div>
      </Container>
+      <Container className="mt-16">
+        <h2 className="text-xl font-bold tracking-tight text-zinc-800 dark:text-zinc-100 sm:text-5xl">
+          Lists of Abusive Keywords
+        </h2>
+        <div className="mt-3 flex flex-col gap-16">
+          {listsOfKeywords.map((list) => (
+            <ListOfAbusiveKeywordsCard key={list.title} list={list} />
+          ))}
+        </div>
+      </Container>
    </>
  )
 }

 export async function getStaticProps() {
  const mddb = await clientPromise
-  const allPages = await mddb.getFiles({ extensions: ['md', 'mdx'] })
-  const datasets = allPages
-    .filter((page) => page.url_path !== '/')
-    .map((page) => ({ ...page.metadata, id: page._id }))
-  const index = allPages.filter((page) => page.url_path === '/')[0]
-  const source = fs.readFileSync(index.file_path, { encoding: 'utf-8' })
-  const availableLanguages = [... new Set(datasets.map((dataset) => dataset.language))]
-  const availablePlatforms = [... new Set(datasets.map((dataset) => dataset.platform).flat())]
+  const datasetPages = await mddb.getFiles({
+    folder: 'datasets',
+    extensions: ['md', 'mdx'],
+  })
+  const datasets = datasetPages.map((page) => ({
+    ...page.metadata,
+    id: page._id,
+    url: page.url_path,
+  }))
+  const listsOfKeywordsPages = await mddb.getFiles({
+    folder: 'keywords',
+    extensions: ['md', 'mdx'],
+  })
+  const listsOfKeywords = listsOfKeywordsPages.map((page) => ({
+    ...page.metadata,
+    id: page._id,
+    url: page.url_path,
+  }))
+
+  const index = await mddb.getFileByUrl('/')
+  let indexSource = fs.readFileSync(index.file_path, { encoding: 'utf-8' })
+  indexSource = await serialize(indexSource, { parseFrontmatter: true })
+
+  const availableLanguages = [
+    ...new Set(datasets.map((dataset) => dataset.language)),
+  ]
+  const availablePlatforms = [
+    ...new Set(datasets.map((dataset) => dataset.platform).flat()),
+  ]
  return {
    props: {
-      indexText: source,
      datasets,
+      listsOfKeywords,
+      indexText: indexSource,
      availableLanguages,
      availablePlatforms,
    },
--- a/examples/alan-turing-portal/styles/tailwind.css
+++ b/examples/alan-turing-portal/styles/tailwind.css
@@ -2,3 +2,12 @@
@import 'tailwindcss/components';
@import './prism.css';
@import 'tailwindcss/utilities';
+
+.index-text ul,
+.index-text p {
+  margin: 0;
+}
+
+.index-text h2 {
+  margin-top: 1rem;
+}
--- a/examples/alan-turing-portal/templates/dataset.md
+++ b/examples/alan-turing-portal/templates/dataset.md
@@ -0,0 +1,14 @@
+---
+title: string
+link-to-publication: url
+link-to-data: url
+task-description: string
+details-of-task: string
+size-of-dataset: number
+percentage-abusive: number
+language: string
+level-of-annotation: list eg: ["Posts", "Comments", ...]
+platform: list eg: ["Youtube", "Facebook", ...]
+medium: list eg: ["Text", "Emojis", "Images", ...]
+reference: string
+---
--- a/examples/alan-turing-portal/templates/list-of-keywords.md
+++ b/examples/alan-turing-portal/templates/list-of-keywords.md
@@ -0,0 +1,5 @@
+---
+title: string
+data-link: url
+reference: string
+---
--- a/examples/alan-turing-portal/tsconfig.json
+++ b/examples/alan-turing-portal/tsconfig.json
@@ -0,0 +1,28 @@
+{
+  "compilerOptions": {
+    "lib": [
+      "dom",
+      "dom.iterable",
+      "esnext"
+    ],
+    "allowJs": true,
+    "skipLibCheck": true,
+    "strict": false,
+    "forceConsistentCasingInFileNames": true,
+    "noEmit": true,
+    "incremental": true,
+    "esModuleInterop": true,
+    "moduleResolution": "node",
+    "resolveJsonModule": true,
+    "isolatedModules": true,
+    "jsx": "preserve"
+  },
+  "include": [
+    "next-env.d.ts",
+    "**/*.ts",
+    "**/*.tsx"
+  ],
+  "exclude": [
+    "node_modules"
+  ]
+}
--- a/examples/learn-example/package.json
+++ b/examples/learn-example/package.json
@@ -6,7 +6,8 @@
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
-    "lint": "next lint"
+    "lint": "next lint",
+    "export": "npm run build && next export -o out"
  },
  "dependencies": {
    "@flowershow/core": "^0.4.10",
--- a/examples/learn-example/pages/[[...path]].tsx
+++ b/examples/learn-example/pages/[[...path]].tsx
@@ -3,7 +3,21 @@ import path from 'path';
 import parse from '../lib/markdown';
 import DRD from '../components/DRD';

-export const getServerSideProps = async (context) => {
+export const getStaticPaths = async () => {
+  const contentDir = path.join(process.cwd(), '/content/');
+  const contentFolders = await fs.readdir(contentDir, 'utf8');
+  const paths = contentFolders.map((folder: string) =>
+    folder === 'index.md'
+      ? { params: { path: [] } }
+      : { params: { path: [folder] } }
+  );
+  return {
+    paths,
+    fallback: false,
+  };
+};
+
+export const getStaticProps = async (context) => {
  let pathToFile = 'index.md';
  if (context.params.path) {
    pathToFile = context.params.path.join('/') + '/index.md';
@@ -22,7 +36,7 @@ export const getServerSideProps = async (context) => {

 export default function DatasetPage({ mdxSource, frontMatter }) {
  return (
-    <div className="prose mx-auto">
+    <div className="prose dark:prose-invert mx-auto">
      <header>
        <div className="mb-6">
          <>
--- a/examples/learn-example/pages/_document.tsx
+++ b/examples/learn-example/pages/_document.tsx
@@ -7,7 +7,7 @@ class MyDocument extends Document {
        <Head>
          <link rel="icon" href="/favicon.png" />
        </Head>
-        <body>
+        <body className='bg-white dark:bg-gray-900'>
          <Main />
          <NextScript />
        </body>
--- a/site/content/docs/index.md
+++ b/site/content/docs/index.md
@@ -16,7 +16,7 @@ If you have questions about anything related to PortalJS, you're always welcome
 To create a PortalJS app, open your terminal, cd into the directory you’d like to create the app in, and run the following command:

 ```bash
-npx create-next-app my-data-portal --example https://github.com/datopian/portaljs/tree/main/examples/basic-example
+npx create-next-app my-data-portal --example https://github.com/datopian/portaljs/tree/main/examples/learn-example
 ```

 > [!tip]
@@ -116,3 +116,57 @@ From the browser, access http://localhost:3000. You should see the following:

 <img src="/assets/docs/datasets-index-page.png" />

+## Deploying your PortalJS app
+
+Finally, let's learn how to deploy PortalJS apps to Vercel or Cloudflare Pages.
+
+> [!tip]
+> Although we are using Vercel and Cloudflare Pages in this tutorial, you can deploy apps in any hosting solution you want as a static website by running `npm run export` and distributing the contents of the `out/` folder.
+
+### Push to a GitHub repo
+
+The PortalJS app we built up to this point is stored locally. To allow Vercel or Cloudflare Pages to deploy it, we have to push it to GitHub (or another SCM supported by these hosting solutions).
+
+- Create a new repository under your GitHub account
+- Add the new remote origin to your PortalJS app
+- Push the app to the repository
+
+If you are not sure about how to do it, follow this guide: https://nextjs.org/learn/basics/deploying-nextjs-app/github
+
+> [!tip]
+> You can also deploy using our Vercel deploy button. In this case, a new repository will be created under your GitHub account automatically.
+> [Click here](#one-click-deploy) to scroll to the deploy button.
+
+### Deploy to Vercel
+
+The easiest way to deploy a PortalJS app is to use Vercel, a serverless platform for static and hybrid applications developed by the creators of Next.js.
+
+To deploy your PortalJS app:
+
+- Create a Vercel account by going to https://vercel.com/signup and choosing "Continue with GitHub"
+- Import the repository you created for the PortalJS app at https://vercel.com/new
+- During the setup process you can use the default settings - no need to change anything.
+
+When you deploy, your PortalJS app will start building. It should finish in under a minute.
+
+When it’s done, you’ll get deployment URLs. Click on one of the URLs and you should see your PortaJS app live.
+
+>[!tip]
+> You can find a more in-depth explanation about this process at https://nextjs.org/learn/basics/deploying-nextjs-app/deploy
+
+#### One-Click Deploy
+
+You can instantly deploy our example app to your Vercel account by clicking the button below:
+
+[![Deploy with Vercel](https://vercel.com/button)](https://vercel.com/new/clone?repository-url=https%3A%2F%2Fgithub.com%2Fdatopian%2Fportaljs%2Ftree%2Fmain%2Fexamples%2Flearn-example&project-name=my-data-portal&repository-name=my-data-portal&demo-title=PortalJS%20Learn%20Example&demo-description=PortalJS%20Learn%20Example%20-%20https%3A%2F%2Fportaljs.org%2Fdocs&demo-url=learn-example.portaljs.org&demo-image=https%3A%2F%2Fportaljs.org%2Fassets%2Fexamples%2Fbasic-example.png)
+
+This will create a new repository on your GitHub account and deploy it to Vercel. If you are following the tutorial, you can replicate the changes done on your local app to this new repository.
+
+### Deploy to Cloudflare Pages
+
+To deploy your PortalJS app to Cloudflare Pages, follow this guide:
+
+https://developers.cloudflare.com/pages/framework-guides/deploy-a-nextjs-site/#deploy-with-cloudflare-pages-1
+
+Note that you don't have to change anything - just follow the steps, choosing the repository you created.
+
Author	SHA1	Message	Date
Luccas Mateus de Medeiros Gomes	5a29959d14	[alan-turing][m] - small tweaks	2023-05-02 12:53:44 -03:00
Luccas Mateus de Medeiros Gomes	ed3a26cd6d	[alan-turing][sm] - fix build	2023-05-01 21:40:46 -03:00
Luccas Mateus	026059184a	[alan-turing][m] - individual pages (#828 )	2023-05-01 21:06:52 -03:00
João Demenech	a041d69282	merge: tutorial VI Issue: https://github.com/datopian/portaljs/issues/821 ## Changes - Added `npm run export` command to `learn-example` - Added "Deploying your PortalJS app" section to `/docs` - Deploy to Vercel - One-Click Deploy (to Vercel) - Deploy to Cloudflare	2023-05-01 19:09:18 -03:00
deme	14abd5b768	[tutorial][xs]: fix typo	2023-05-01 15:02:59 -03:00
deme	4aaabba229	[#821,tutorial][m]: add export npm command to example-learn, add tutorial VI to /docs	2023-05-01 14:51:02 -03:00
Luccas Mateus de Medeiros Gomes	cc43597130	[learn-example][sm] - fix dark mode styling	2023-05-01 11:16:52 -03:00
João Demenech	d9a6ea4ef1	[docs][xs]: fix install command after rename	2023-05-01 08:30:17 -03:00
Luccas Mateus	f6b94ee254	Alan turing portal (#815 ) * [alan-turing-portal][m] - initial commit * [alan-turing][m] - first page with search * [alan-turing][m] - cleanup	2023-04-29 23:37:30 -03:00
Luccas Mateus de Medeiros Gomes	04b05c0896	[learn-example][sm] - use getstaticprops	2023-04-29 13:54:51 -03:00