Data annotation is the foundation of most natural language processing (NLP) tasks. However, data annotation is complex and there is often no specific correct label, especially in subjective tasks. Data annotation is affected by the annotators’ ability to understand the provided data. In the case of Arabic, this is important due to the large dialectal variety. In this paper, we analyse how Arabic speakers understand other dialects in written text. Also, we analyse the effect of dialect familiarity on the quality of data annotation, focusing on Arabic sarcasm detection. This is done by collecting third-party labels and comparing them to high-quality first-party labels. Our analysis shows that annotators tend to better identify their own dialect and they are prone to confuse dialects they are unfamiliar with. For task labels, annotators tend to perform better on their dialect or dialects they are familiar with. Finally, females tend to perform better than males on the sarcasm detection task. We suggest that to guarantee high-quality labels, researchers should recruit native dialect speakers for annotation.
EMNLP-Findings
Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection
Abu Farha, Ibrahim, Wilson, Steven, Oprea, Silviu, and Magdy, Walid
In Findings of the Association for Computational Linguistics: EMNLP 2022 Dec 2022
Recently, author-annotated sarcasm datasets, which focus on intended, rather than perceived sarcasm, have been introduced. Although datasets collected using first-party annotation have important benefits, there is no comparison of human and machine performance on these new datasets. In this paper, we collect new annotations to provide human-level benchmarks for these first-party annotated sarcasm tasks in both English and Arabic, and compare the performance of human annotators to that of state-of-the-art sarcasm detection systems. Our analysis confirms that sarcasm detection is extremely challenging, with individual humans performing close to or slightly worse than the best trained models. With majority voting, however, humans are able to achieve the best results on all tasks. We also perform error analysis, finding that some of the most challenging examples are those that require additional context. We also highlight common features and patterns used to express sarcasm in English and Arabic such as idioms and proverbs. We suggest that to better capture sarcasm, future sarcasm detection datasets and models should focus on representing conversational and cultural context while leveraging world knowledge and common sense.
SemEval
SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic
Abu Farha, Ibrahim, Oprea, Silviu Vlad, Wilson, Steven, and Magdy, Walid
In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) Jul 2022
iSarcasmEval is the first shared task to target intended sarcasm detection: the data for this task was provided and labelled by the authors of the texts themselves. Such an approach minimises the downfalls of other methods to collect sarcasm data, which rely on distant supervision or third-party annotations. The shared task contains two languages, English and Arabic, and three subtasks: sarcasm detection, sarcasm category classification, and pairwise sarcasm identification given a sarcastic sentence and its non-sarcastic rephrase. The task received submissions from 60 different teams, with the sarcasm detection task being the most popular. Most of the participating teams utilised pre-trained language models. In this paper, we provide an overview of the task, data, and participating teams.
2021
WANLP
Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic
Abu Farha, Ibrahim, Zaghouani, Wajdi, and Magdy, Walid
In Proceedings of the Sixth Arabic Natural Language Processing Workshop Apr 2021
This paper provides an overview of the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. The shared task has two subtasks: sarcasm detection (subtask 1) and sentiment analysis (subtask 2). This shared task aims to promote and bring attention to Arabic sarcasm detection, which is crucial to improve the performance in other tasks such as sentiment analysis. The dataset used in this shared task, namely ArSarcasm-v2, consists of 15,548 tweets labelled for sarcasm, sentiment and dialect. We received 27 and 22 submissions for subtasks 1 and 2 respectively. Most of the approaches relied on using and fine-tuning pre-trained language models such as AraBERT and MARBERT. The top achieved results for the sarcasm detection and sentiment analysis tasks were 0.6225 F1-score and 0.748 F1-PN respectively.
WANLP
Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection
Abu Farha, Ibrahim, and Magdy, Walid
In Proceedings of the Sixth Arabic Natural Language Processing Workshop Apr 2021
The introduction of transformer-based language models has been a revolutionary step for natural language processing (NLP) research. These models, such as BERT, GPT and ELECTRA, led to state-of-the-art performance in many NLP tasks. Most of these models were initially developed for English and other languages followed later. Recently, several Arabic-specific models started emerging. However, there are limited direct comparisons between these models. In this paper, we evaluate the performance of 24 of these models on Arabic sentiment and sarcasm detection. Our results show that the models achieving the best performance are those that are trained on only Arabic data, including dialectal Arabic, and use a larger number of parameters, such as the recently released MARBERT. However, we noticed that AraELECTRA is one of the top performing models while being much more efficient in its computational cost. Finally, the experiments on AraGPT2 variants showed low performance compared to BERT models, which indicates that it might not be suitable for classification tasks.
IP&M
A comparative study of effective approaches for Arabic sentiment analysis
Sentiment analysis (SA) is a natural language processing (NLP) application that aims to analyse and identify sentiment within a piece of text. Arabic SA started to receive more attention in the last decade with many approaches showing some effectiveness for detecting sentiment on multiple datasets. While there have been some surveys summarising some of the approaches for Arabic SA in literature, most of these approaches are reported on different datasets, which makes it difficult to identify the most effective approaches among those. In addition, those approaches do not cover the recent advances in NLP that use transformers. This paper presents a comprehensive comparative study on the most effective approaches used for Arabic sentiment analysis. We re-implement most of the existing approaches for Arabic SA and test their effectiveness on three of the most popular benchmark datasets for Arabic SA. Further, we examine the use of transformer-based language models for Arabic SA and show their superior performance compared to the existing approaches, where the best model achieves F-score scores of 0.69, 0.76, and 0.92 on the SemEval, ASTD, and ArSAS benchmark datasets. We also apply an extensive analysis of the possible reasons for failures, which show the limitations of the existing annotated Arabic SA datasets, and the challenge of sarcasm that is prominent in Arabic dialects. Finally, we highlight the main gaps in Arabic sentiment analysis research and suggest the most in-need future research directions in this area.
JKSUCI
An efficient single document Arabic text summarization using a combination of statistical and semantic features
Qaroush, Aziz,
Abu Farha, Ibrahim, Ghanem, Wasel, Washaha, Mahdi, and Maali, Eman
Journal of King Saud University - Computer and Information Sciences Apr 2021
The exponential growth of online textual data triggered the crucial need for an effective and powerful tool that automatically provides the desired content in a summarized form while preserving core information. In this paper, we propose an automatic, generic, and extractive Arabic single document summarizing method aiming at producing a sufficiently informative summary. The proposed extractive method evaluates each sentence based on a combination of statistical and semantic features in which a novel formulation is used taking into account sentence importance, coverage and diversity. Further, two summarizing techniques including score-based and supervised machine learning were employed to produce the summary and then assist leveraging the designed features. We demonstrate the effectiveness of the proposed method through a set of experiments under EASC corpus using ROUGE measure. Compared to some existing related work, the experimental evaluation shows the strength of the proposed method in terms of precision, recall, and F-score performance metrics.
2020
OSACT
Multitask Learning for Arabic Offensive Language and Hate-Speech Detection
Abu Farha, Ibrahim, and Magdy, Walid
In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection May 2020
Offensive language and hate-speech are phenomena that spread with the rising popularity of social media. Detecting such content is crucial for understanding and predicting conflicts, understanding polarisation among communities and providing means and tools to filter or block inappropriate content. This paper describes the SMASH team submission to OSACT4’s shared task on hate-speech and offensive language detection, where we explore different approaches to perform these tasks. The experiments cover a variety of approaches that include deep learning, transfer learning and multitask learning. We also explore the utilisation of sentiment information to perform the previous task. Our best model is a multitask learning architecture, based on CNN-BiLSTM, that was trained to detect hate-speech and offensive language and predict sentiment.
OSACT
From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset
Abu Farha, Ibrahim, and Magdy, Walid
In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection May 2020
Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16% of which are sarcastic. In addition to sarcasm the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators’ biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1 score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset.
2019
WANLP
Mazajak: An Online Arabic Sentiment Analyser
Abu Farha, Ibrahim, and Magdy, Walid
In Proceedings of the Fourth Arabic Natural Language Processing Workshop Aug 2019
Sentiment analysis (SA) is one of the most useful natural language processing applications. Literature is flooding with many papers and systems addressing this task, but most of the work is focused on English. In this paper, we present “Mazajak”, an online system for Arabic SA. The system is based on a deep learning model, which achieves state-of-the-art results on many Arabic dialect datasets including SemEval 2017 and ASTD. The availability of such system should assist various applications and research that rely on sentiment analysis as a tool.