Resources

The following datasets and resources have been published.

ArSarcasm Dataset (v2)

An extension of the original ArSarcasm. It contains around 15K tweets labelled for sarcasm, sentiment and dialect.
The standard dataset for the shared task on sarcasm and sentiment detection in Arabic. We recommend using ArSarcasm-v2 over ArSarcasm-v1.
Available here
Related publication:
Abu Farha I., W. Zaghouani and W. Magdy. “Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic”. WANLP - EACL 2021 link

ArSarcasm Dataset (v1)

A set of around 10K tweets labelled for sarcasm, sentiment and dialect.
Available here
Related publication:
Abu Farha I. and W. Magdy. “From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset”. OSACT4 - LREC 2020 link

Mazajak Arabic Sentiment Analyser

A free online Arabic sentiment analysis tool and API.
Available here
Related publication:
Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link

Mazajak Arabic Word Embeddings

A set of Arabic word embeddings for social media.
Word2vec vectors built using the CBOW and skip-gram architectures.
Built using 250M tweets.
Available for free here
Related publication:
Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link