Resources

The following datasets and resources have been published.

ArSarcasm Dataset (v2)

  • An extension of the original ArSarcasm dataset, containing around 15K tweets labelled for sarcasm, sentiment and dialect.

  • The standard dataset for the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. We recommend using ArSarcasm-v2 over ArSarcasm-v1.

  • Available here

  • Abu Farha I., W. Zaghouani and W. Magdy. “Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic”. WANLP - EACL 2021 link
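Because each tweet carries three labels (sarcasm, sentiment and dialect), a common first step after downloading is to check how the labels are distributed. The sketch below does this with the Python standard library only. It assumes the data is a UTF-8 CSV file with a header row and columns named `sarcasm`, `sentiment` and `dialect`; these column names, the function `label_distribution` and the toy rows are hypothetical and should be checked against the actual release.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO

# ASSUMPTION: hypothetical column names; inspect the downloaded CSV header
# and adjust this tuple if the release uses different names.
LABEL_COLUMNS = ("sarcasm", "sentiment", "dialect")


def label_distribution(f: TextIO) -> Dict[str, Counter]:
    """Count how often each value occurs in each label column of a CSV file."""
    reader = csv.DictReader(f)
    counts = {col: Counter() for col in LABEL_COLUMNS}
    for row in reader:
        for col in LABEL_COLUMNS:
            counts[col][row[col]] += 1
    return counts


if __name__ == "__main__":
    # Toy rows for illustration only; they are not taken from the dataset.
    sample = io.StringIO(
        "tweet,sarcasm,sentiment,dialect\n"
        "t1,True,NEG,msa\n"
        "t2,False,POS,egypt\n"
        "t3,True,NEG,msa\n"
    )
    for column, counter in label_distribution(sample).items():
        print(column, dict(counter))
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `label_distribution`.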

ArSarcasm Dataset (v1)

  • A set of around 10K tweets labelled for sarcasm, sentiment and dialect.

  • Available here

  • Abu Farha I. and W. Magdy. “From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset”. OSACT4 - LREC 2020 link

Mazajak Arabic Sentiment Analyser

  • A free online Arabic sentiment analysis tool and API.

  • Available here

  • Related publication:
    Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link

Mazajak Arabic Word Embeddings

  • A set of Arabic word embeddings for social media text.

  • Word2vec vectors trained with both the CBOW and Skip-gram architectures.

  • Built using 250M tweets.

  • Available for free here

  • Related publication:
    Abu Farha I. and W. Magdy. “Mazajak: An Online Arabic Sentiment Analyser”. WANLP - ACL 2019 link
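Word2vec vectors are typically used by looking up a word's vector and comparing it with others by cosine similarity. The sketch below shows this with the standard library only, assuming the vectors are in the word2vec *text* format (a header line `<vocab_size> <dim>`, then one word per line followed by its values). Whether the Mazajak files use this format is an assumption; if they are in the binary word2vec format, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` can read them instead. The helper names below are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple


def load_word2vec_text(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse the word2vec text format: a '<vocab_size> <dim>' header line,
    then one line per word containing the word and <dim> floats."""
    it = iter(lines)
    _vocab_size, dim = map(int, next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(vectors: Dict[str, List[float]], word: str,
                 topn: int = 5) -> List[Tuple[str, float]]:
    """Rank all other words by cosine similarity to `word` (brute force)."""
    query = vectors[word]
    scores = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]


if __name__ == "__main__":
    # Toy 2-dimensional vectors for illustration; real embeddings are larger.
    toy = ["3 2", "a 1.0 0.0", "b 0.9 0.1", "c 0.0 1.0"]
    print(most_similar(load_word2vec_text(toy), "a", topn=2))
```

The brute-force search scans the whole vocabulary for every query, which is fine for a sketch; for embeddings trained on 250M tweets, a library such as gensim with vectorised similarity search is the more practical choice.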