Ibrahim Abu Farha | Resources

The dataset from SemEval-2022 Task 6: iSarcasmEval, the first shared task on intended sarcasm detection in English and Arabic, co-located with NAACL 2022. The task attracted 60 participating teams.
Unlike most sarcasm datasets, the data is author-annotated (first-party): the authors labelled their own texts as sarcastic or not, which avoids the noise of third-party annotation.
Available here
Abu Farha I., S. V. Oprea, S. Wilson and W. Magdy. “SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic”. SemEval 2022 link
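Binary sarcasm detection is often scored with the F1 score of the sarcastic class alone, which is useful when sarcastic texts are the minority class. The sketch below computes that metric in plain Python, assuming labels are encoded as 1 (sarcastic) and 0 (not sarcastic). The function name `f1_sarcastic` and the toy labels are hypothetical; this is an illustrative scorer, not the official iSarcasmEval evaluation script.

```python
from typing import Sequence


def f1_sarcastic(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1 score of the positive (sarcastic) class for binary labels.

    Hypothetical helper: assumes 1 = sarcastic, 0 = not sarcastic.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives and false negatives for the sarcastic class.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        # No correctly predicted sarcastic texts: precision or recall is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only.
gold = [1, 0, 1, 1, 0]
pred = [1, 0, 0, 1, 1]
print(round(f1_sarcastic(gold, pred), 3))  # 0.667
```

Scoring only the sarcastic class keeps a system from looking good simply by predicting "not sarcastic" for everything, which a plain accuracy score would reward on an imbalanced test set.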

ArSarcasm-v2: an extension of the original ArSarcasm dataset, containing around 15K tweets labelled for sarcasm, sentiment and dialect.
It is the dataset used in the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. We recommend using ArSarcasm-v2 over ArSarcasm-v1.
Available here
Abu Farha I., W. Zaghouani and W. Magdy. “Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic”. WANLP - EACL 2021 link
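Since each tweet carries three labels (sarcasm, sentiment and dialect), a first step after downloading is usually to check how the labels are distributed. The sketch below does this with Python's standard `csv` module, assuming the data is a CSV file with columns named `tweet`, `sarcasm`, `sentiment` and `dialect`; those column names and the label values in the toy sample are assumptions, so check them against the actual file header before use.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable

# Assumed label column names; adjust to match the real CSV header.
LABEL_COLUMNS = ("sarcasm", "sentiment", "dialect")


def label_distribution(rows: Iterable[Dict[str, str]]) -> Dict[str, Counter]:
    """Count how often each value appears in each label column."""
    counts = {col: Counter() for col in LABEL_COLUMNS}
    for row in rows:
        for col in LABEL_COLUMNS:
            counts[col][row[col]] += 1
    return counts


# Toy stand-in for the real file; the rows are placeholders, not real tweets.
sample = io.StringIO(
    "tweet,sarcasm,sentiment,dialect\n"
    "placeholder 1,True,NEG,egypt\n"
    "placeholder 2,False,POS,msa\n"
    "placeholder 3,False,NEU,msa\n"
)
dist = label_distribution(csv.DictReader(sample))
print(dist["sarcasm"])  # Counter({'False': 2, 'True': 1})
```

For the real data, replace the `io.StringIO` sample with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV is stored; UTF-8 is assumed so that Arabic text is read correctly.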

ArSarcasm (ArSarcasm-v1): a set of around 10K tweets labelled for sarcasm, sentiment and dialect.
Available here
Abu Farha I. and W. Magdy. “From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset”. OSACT4 - LREC 2020 link

Mazajak: a free online Arabic sentiment analysis tool and API.
Available here
Related publication:
Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link

A set of Arabic word embeddings for social media: word2vec vectors trained on 250M tweets using both the CBOW and Skip-gram architectures.
Available for free here
Related publication:
Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link
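The two word2vec architectures differ in the direction of prediction: CBOW predicts a centre word from its surrounding context words, while Skip-gram predicts each context word from the centre word. The sketch below shows only how training examples are formed from a token sequence with a fixed window; the function names and placeholder tokens are hypothetical, and nothing here reflects the exact settings used to build these vectors.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one (centre, context) pair per context word in the window."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, centre) example per position."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, centre))
    return examples


tokens = ["w1", "w2", "w3"]  # placeholder tokens
print(skipgram_pairs(tokens, 1))
# [('w1', 'w2'), ('w2', 'w1'), ('w2', 'w3'), ('w3', 'w2')]
print(cbow_examples(tokens, 1))
# [(['w2'], 'w1'), (['w1', 'w3'], 'w2'), (['w2'], 'w3')]
```

This is a simplification: full word2vec training (for example gensim's `Word2Vec`, where `sg=0` selects CBOW and `sg=1` selects Skip-gram) also samples window sizes, subsamples frequent words and uses negative sampling or hierarchical softmax.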