Resources
The following datasets and tools have been published.
iSarcasmEval Dataset
- The dataset from SemEval-2022 Task 6: iSarcasmEval, the first shared task on intended sarcasm detection in English and Arabic, co-located with NAACL 2022. The task attracted 60 participating teams.
- Unlike most sarcasm datasets, the data is author-annotated (first-party): authors label their own texts as sarcastic or not, which avoids the noise of third-party annotation.
- Available here
- Abu Farha I., S. V. Oprea, S. Wilson and W. Magdy. “SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic”. SemEval 2022 link
ArSarcasm Dataset (v2)
- An extension of the original ArSarcasm dataset. It contains around 15K tweets labelled for sarcasm, sentiment and dialect.
- The standard dataset for the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. We recommend using ArSarcasm-v2 over ArSarcasm-v1.
- Available here
- Abu Farha I., W. Zaghouani and W. Magdy. “Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic”. WANLP - EACL 2021 link
ArSarcasm Dataset (v1)
- A set of around 10K tweets labelled for sarcasm, sentiment and dialect.
- Available here
- Abu Farha I. and W. Magdy. “From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset”. OSACT4 - LREC 2020 link
Mazajak Arabic Sentiment Analyser
- A free online Arabic sentiment analysis tool and API.
- Available here
- Related publication: Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link