Ibrahim Abu Farha | Publications

2025

ICWSM

UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections

Haouari, Fatima, Scarton, Carolina, Faggiani, Nicolò, Nikolaidis, Nikolaos, Kotseva, Bonka, Abu Farha, Ibrahim, Linge, Jens, and Bontcheva, Kalina

Proceedings of the International AAAI Conference on Web and Social Media Jun 2025

@article{Haouari_Scarton_Faggiani_Nikolaidis_Kotseva_Abu_Farha_Linge_Bontcheva_2025,
  abbr = {ICWSM},
  bibtex_show = {true},
  title = {UKElectionNarratives: A Dataset of Misleading Narratives Surrounding Recent UK General Elections},
  volume = {19},
  url = {https://ojs.aaai.org/index.php/ICWSM/article/view/35950},
  doi = {10.1609/icwsm.v19i1.35950},
  abstractnote = {Misleading narratives play a crucial role in shaping public opinion during elections, as they can influence how voters perceive candidates and political parties. This entails the need to detect these narratives accurately. To address this, we introduce the first taxonomy of common misleading narratives that circulated during recent elections in Europe. Based on this taxonomy, we construct and analyse UKElectionNarratives: the first dataset of human-annotated misleading narratives which circulated during the UK General Elections in 2019 and 2024. We also benchmark Pre-trained and Large Language Models (focusing on GPT-4o), studying their effectiveness in detecting election-related misleading narratives. Finally, we discuss potential use cases and make recommendations for future research directions using the proposed codebook and dataset.},
  number = {1},
  journal = {Proceedings of the International AAAI Conference on Web and Social Media},
  author = {Haouari, Fatima and Scarton, Carolina and Faggiani, Nicolò and Nikolaidis, Nikolaos and Kotseva, Bonka and Abu Farha, Ibrahim and Linge, Jens and Bontcheva, Kalina},
  year = {2025},
  month = jun,
  pages = {2477--2495}
}

2024

ArabicNLP

SMASH at StanceEval 2024: Prompt Engineering LLMs for Arabic Stance Detection

Al Hariri, Youssef, and Abu Farha, Ibrahim

In Proceedings of the Second Arabic Natural Language Processing Conference Aug 2024

This paper presents our submission for the Stance Detection in Arabic Language (StanceEval) 2024 shared task conducted by Team SMASH of the University of Edinburgh. We evaluated the performance of various BERT-based and large language models (LLMs). MARBERT demonstrates superior performance among the BERT-based models, achieving F1 and macro-F1 scores of 0.570 and 0.770, respectively. In contrast, Command R model outperforms all models with the highest overall F1 score of 0.661 and macro F1 score of 0.820.

@inproceedings{hariri-abu-farha-2024-smash-stanceeval,
  abbr = {ArabicNLP},
  bibtex_show = {true},
  title = {{SMASH} at {S}tance{E}val 2024: Prompt Engineering {LLM}s for {A}rabic Stance Detection},
  author = {Al Hariri, Youssef and Abu Farha, Ibrahim},
  editor = {Habash, Nizar and Bouamor, Houda and Eskander, Ramy and Tomeh, Nadi and Abu Farha, Ibrahim and Abdelali, Ahmed and Touileb, Samia and Hamed, Injy and Onaizan, Yaser and Alhafni, Bashar and Antoun, Wissam and Khalifa, Salam and Haddad, Hatem and Zitouni, Imed and AlKhamissi, Badr and Almatham, Rawan and Mrini, Khalil},
  booktitle = {Proceedings of the Second Arabic Natural Language Processing Conference},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.arabicnlp-1.92/},
  doi = {10.18653/v1/2024.arabicnlp-1.92},
  pages = {800--806}
}

ArabicNLP
SMASH at AraFinNLP2024: Benchmarking Arabic BERT Models on the Intent Detection

Al Hariri, Youssef, and Abu Farha, Ibrahim

In Proceedings of the Second Arabic Natural Language Processing Conference Aug 2024

The recent growth in Middle Eastern stock markets has intensified the demand for specialized financial Arabic NLP models to serve this sector. This article presents the participation of Team SMASH of The University of Edinburgh in the Multi-dialect Intent Detection task (Subtask 1) of the Arabic Financial NLP (AraFinNLP) Shared Task 2024. The dataset used in the shared task is the ArBanking77 (Jarrar et al., 2023). We tackled this task as a classification problem and utilized several BERT and BART-based models to classify the queries efficiently. Our solution is based on implementing a two-step hierarchical classification model based on MARBERTv2. We fine-tuned the model by using the original queries. Our team, SMASH, was ranked 9th with a macro F1 score of 0.7866, indicating areas for further refinement and potential enhancement of the model’s performance.
@inproceedings{hariri-abu-farha-2024-smash,
  abbr = {ArabicNLP},
  bibtex_show = {true},
  title = {{SMASH} at {A}ra{F}in{NLP}2024: Benchmarking {A}rabic {BERT} Models on the Intent Detection},
  author = {Al Hariri, Youssef and Abu Farha, Ibrahim},
  editor = {Habash, Nizar and Bouamor, Houda and Eskander, Ramy and Tomeh, Nadi and Abu Farha, Ibrahim and Abdelali, Ahmed and Touileb, Samia and Hamed, Injy and Onaizan, Yaser and Alhafni, Bashar and Antoun, Wissam and Khalifa, Salam and Haddad, Hatem and Zitouni, Imed and AlKhamissi, Badr and Almatham, Rawan and Mrini, Khalil},
  booktitle = {Proceedings of the Second Arabic Natural Language Processing Conference},
  month = aug,
  year = {2024},
  address = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.arabicnlp-1.35/},
  doi = {10.18653/v1/2024.arabicnlp-1.35},
  pages = {403--409}
}
EAMT
Multilinguality in the VIGILANT project

Spillane, Brendan, Scarton, Carolina, Moro, Robert, Ivanov, Petar, Tagarev, Andrey, Simko, Jakub, Abu Farha, Ibrahim, Munnelly, Gary, Uhlárik, Filip, and Heppell, Freddy

In Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2) Jun 2024

VIGILANT (Vital IntelliGence to Investigate ILlegAl DisiNformaTion) is a three-year Horizon Europe project that will equip European Law Enforcement Agencies (LEAs) with advanced disinformation detection and analysis tools to investigate and prevent criminal activities linked to disinformation. These include disinformation instigating violence towards minorities, promoting false medical cures, and increasing tensions between groups causing civil unrest and violent acts. VIGILANT’s four LEAs require support for English, Spanish, Catalan, Greek, Estonian, Romanian and Russian. Therefore, multilinguality is a major challenge and we present the current status of our tools and our plans to improve their performance.
@inproceedings{spillane-etal-2024-multilinguality,
  abbr = {EAMT},
  bibtex_show = {true},
  title = {Multilinguality in the {VIGILANT} project},
  author = {Spillane, Brendan and Scarton, Carolina and Moro, Robert and Ivanov, Petar and Tagarev, Andrey and Simko, Jakub and Abu Farha, Ibrahim and Munnelly, Gary and Uhl{\'a}rik, Filip and Heppell, Freddy},
  editor = {Scarton, Carolina and Prescott, Charlotte and Bayliss, Chris and Oakley, Chris and Wright, Joanna and Wrigley, Stuart and Song, Xingyi and Gow-Smith, Edward and Forcada, Mikel and Moniz, Helena},
  booktitle = {Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 2)},
  month = jun,
  year = {2024},
  address = {Sheffield, UK},
  publisher = {European Association for Machine Translation (EAMT)},
  url = {https://aclanthology.org/2024.eamt-2.21/},
  pages = {41--42}
}

2022

WANLP
Best Paper Award
The Effect of Arabic Dialect Familiarity on Data Annotation

Abu Farha, Ibrahim, and Magdy, Walid

In Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP) Dec 2022

Data annotation is the foundation of most natural language processing (NLP) tasks. However, data annotation is complex and there is often no specific correct label, especially in subjective tasks. Data annotation is affected by the annotators’ ability to understand the provided data. In the case of Arabic, this is important due to the large dialectal variety. In this paper, we analyse how Arabic speakers understand other dialects in written text. Also, we analyse the effect of dialect familiarity on the quality of data annotation, focusing on Arabic sarcasm detection. This is done by collecting third-party labels and comparing them to high-quality first-party labels. Our analysis shows that annotators tend to better identify their own dialect and they are prone to confuse dialects they are unfamiliar with. For task labels, annotators tend to perform better on their dialect or dialects they are familiar with. Finally, females tend to perform better than males on the sarcasm detection task. We suggest that to guarantee high-quality labels, researchers should recruit native dialect speakers for annotation.
@inproceedings{abu-farha-magdy-2022-effect,
  abbr = {WANLP},
  bibtex_show = {true},
  code = {https://github.com/iabufarha/arabic-dialect-familiarity},
  comment = {Best Paper Award},
  title = {The Effect of {A}rabic Dialect Familiarity on Data Annotation},
  author = {Abu Farha, Ibrahim and Magdy, Walid},
  booktitle = {Proceedings of the The Seventh Arabic Natural Language Processing Workshop (WANLP)},
  month = dec,
  year = {2022},
  address = {Abu Dhabi, United Arab Emirates (Hybrid)},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2022.wanlp-1.39},
  pages = {399--408}
}
EMNLP-Findings
Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection

Abu Farha, Ibrahim, Wilson, Steven, Oprea, Silviu, and Magdy, Walid

In Findings of the Association for Computational Linguistics: EMNLP 2022 Dec 2022

Recently, author-annotated sarcasm datasets, which focus on intended, rather than perceived sarcasm, have been introduced. Although datasets collected using first-party annotation have important benefits, there is no comparison of human and machine performance on these new datasets. In this paper, we collect new annotations to provide human-level benchmarks for these first-party annotated sarcasm tasks in both English and Arabic, and compare the performance of human annotators to that of state-of-the-art sarcasm detection systems. Our analysis confirms that sarcasm detection is extremely challenging, with individual humans performing close to or slightly worse than the best trained models. With majority voting, however, humans are able to achieve the best results on all tasks. We also perform error analysis, finding that some of the most challenging examples are those that require additional context. We also highlight common features and patterns used to express sarcasm in English and Arabic such as idioms and proverbs. We suggest that to better capture sarcasm, future sarcasm detection datasets and models should focus on representing conversational and cultural context while leveraging world knowledge and common sense.
@inproceedings{abu-farha-etal-2022-sarcasm,
  abbr = {EMNLP-Findings},
  bibtex_show = {true},
  code = {https://github.com/iabufarha/iSarcasmEval/tree/main/third-party%20annotations},
  title = {Sarcasm Detection is Way Too Easy! An Empirical Comparison of Human and Machine Sarcasm Detection},
  author = {Abu Farha, Ibrahim and Wilson, Steven and Oprea, Silviu and Magdy, Walid},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2022},
  month = dec,
  year = {2022},
  address = {Abu Dhabi, United Arab Emirates},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2022.findings-emnlp.387},
  pages = {5284--5295}
}
SemEval
SemEval-2022 Task 6: iSarcasmEval, Intended Sarcasm Detection in English and Arabic

Abu Farha, Ibrahim, Oprea, Silviu Vlad, Wilson, Steven, and Magdy, Walid

In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022) Jul 2022

iSarcasmEval is the first shared task to target intended sarcasm detection: the data for this task was provided and labelled by the authors of the texts themselves. Such an approach minimises the downfalls of other methods to collect sarcasm data, which rely on distant supervision or third-party annotations. The shared task contains two languages, English and Arabic, and three subtasks: sarcasm detection, sarcasm category classification, and pairwise sarcasm identification given a sarcastic sentence and its non-sarcastic rephrase. The task received submissions from 60 different teams, with the sarcasm detection task being the most popular. Most of the participating teams utilised pre-trained language models. In this paper, we provide an overview of the task, data, and participating teams.
@inproceedings{abu-farha-etal-2022-semeval,
  abbr = {SemEval},
  bibtex_show = {true},
  title = {{S}em{E}val-2022 Task 6: i{S}arcasm{E}val, Intended Sarcasm Detection in {E}nglish and {A}rabic},
  author = {Abu Farha, Ibrahim and Oprea, Silviu Vlad and Wilson, Steven and Magdy, Walid},
  booktitle = {Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)},
  month = jul,
  year = {2022},
  address = {Seattle, United States},
  publisher = {Association for Computational Linguistics},
  pdf = {https://aclanthology.org/2022.semeval-1.111},
  data = {https://github.com/iabufarha/iSarcasmEval},
  pages = {802--814}
}

2021

WANLP
Overview of the WANLP 2021 Shared Task on Sarcasm and Sentiment Detection in Arabic

Abu Farha, Ibrahim, Zaghouani, Wajdi, and Magdy, Walid

In Proceedings of the Sixth Arabic Natural Language Processing Workshop Apr 2021

This paper provides an overview of the WANLP 2021 shared task on sarcasm and sentiment detection in Arabic. The shared task has two subtasks: sarcasm detection (subtask 1) and sentiment analysis (subtask 2). This shared task aims to promote and bring attention to Arabic sarcasm detection, which is crucial to improve the performance in other tasks such as sentiment analysis. The dataset used in this shared task, namely ArSarcasm-v2, consists of 15,548 tweets labelled for sarcasm, sentiment and dialect. We received 27 and 22 submissions for subtasks 1 and 2 respectively. Most of the approaches relied on using and fine-tuning pre-trained language models such as AraBERT and MARBERT. The top achieved results for the sarcasm detection and sentiment analysis tasks were 0.6225 F1-score and 0.748 F1-PN respectively.
@inproceedings{abu-farha-etal-2021-overview,
  abbr = {WANLP},
  bibtex_show = {true},
  title = {Overview of the {WANLP} 2021 Shared Task on Sarcasm and Sentiment Detection in {A}rabic},
  author = {Abu Farha, Ibrahim and Zaghouani, Wajdi and Magdy, Walid},
  booktitle = {Proceedings of the Sixth Arabic Natural Language Processing Workshop},
  month = apr,
  year = {2021},
  address = {Kyiv, Ukraine (Virtual)},
  publisher = {Association for Computational Linguistics},
  pdf = {https://aclanthology.org/2021.wanlp-1.36},
  pages = {296--305}
}
WANLP
Benchmarking Transformer-based Language Models for Arabic Sentiment and Sarcasm Detection

Abu Farha, Ibrahim, and Magdy, Walid

In Proceedings of the Sixth Arabic Natural Language Processing Workshop Apr 2021

The introduction of transformer-based language models has been a revolutionary step for natural language processing (NLP) research. These models, such as BERT, GPT and ELECTRA, led to state-of-the-art performance in many NLP tasks. Most of these models were initially developed for English and other languages followed later. Recently, several Arabic-specific models started emerging. However, there are limited direct comparisons between these models. In this paper, we evaluate the performance of 24 of these models on Arabic sentiment and sarcasm detection. Our results show that the models achieving the best performance are those that are trained on only Arabic data, including dialectal Arabic, and use a larger number of parameters, such as the recently released MARBERT. However, we noticed that AraELECTRA is one of the top performing models while being much more efficient in its computational cost. Finally, the experiments on AraGPT2 variants showed low performance compared to BERT models, which indicates that it might not be suitable for classification tasks.
@inproceedings{abu-farha-magdy-2021-benchmarking,
  abbr = {WANLP},
  bibtex_show = {true},
  title = {Benchmarking Transformer-based Language Models for {A}rabic Sentiment and Sarcasm Detection},
  author = {Abu Farha, Ibrahim and Magdy, Walid},
  booktitle = {Proceedings of the Sixth Arabic Natural Language Processing Workshop},
  month = apr,
  year = {2021},
  address = {Kyiv, Ukraine (Virtual)},
  publisher = {Association for Computational Linguistics},
  pdf = {https://aclanthology.org/2021.wanlp-1.3},
  pages = {21--31}
}
IP&M
A comparative study of effective approaches for Arabic sentiment analysis

Abu Farha, Ibrahim, and Magdy, Walid

Information Processing & Management Apr 2021

Sentiment analysis (SA) is a natural language processing (NLP) application that aims to analyse and identify sentiment within a piece of text. Arabic SA started to receive more attention in the last decade with many approaches showing some effectiveness for detecting sentiment on multiple datasets. While there have been some surveys summarising some of the approaches for Arabic SA in literature, most of these approaches are reported on different datasets, which makes it difficult to identify the most effective approaches among those. In addition, those approaches do not cover the recent advances in NLP that use transformers. This paper presents a comprehensive comparative study on the most effective approaches used for Arabic sentiment analysis. We re-implement most of the existing approaches for Arabic SA and test their effectiveness on three of the most popular benchmark datasets for Arabic SA. Further, we examine the use of transformer-based language models for Arabic SA and show their superior performance compared to the existing approaches, where the best model achieves F-score scores of 0.69, 0.76, and 0.92 on the SemEval, ASTD, and ArSAS benchmark datasets. We also apply an extensive analysis of the possible reasons for failures, which show the limitations of the existing annotated Arabic SA datasets, and the challenge of sarcasm that is prominent in Arabic dialects. Finally, we highlight the main gaps in Arabic sentiment analysis research and suggest the most in-need future research directions in this area.
@article{ABUFARHA2021102438,
  abbr = {IP&M},
  bibtex_show = {true},
  title = {A comparative study of effective approaches for Arabic sentiment analysis},
  journal = {Information Processing \& Management},
  volume = {58},
  number = {2},
  pages = {102438},
  year = {2021},
  issn = {0306-4573},
  doi = {10.1016/j.ipm.2020.102438},
  pdf = {https://www.sciencedirect.com/science/article/pii/S0306457320309316},
  author = {{Abu Farha}, Ibrahim and Magdy, Walid},
  keywords = {Arabic, Sentiment Analysis, Sarcasm}
}
JKSUCI
An efficient single document Arabic text summarization using a combination of statistical and semantic features

Qaroush, Aziz, Abu Farha, Ibrahim, Ghanem, Wasel, Washaha, Mahdi, and Maali, Eman

Journal of King Saud University - Computer and Information Sciences Apr 2021

The exponential growth of online textual data triggered the crucial need for an effective and powerful tool that automatically provides the desired content in a summarized form while preserving core information. In this paper, we propose an automatic, generic, and extractive Arabic single document summarizing method aiming at producing a sufficiently informative summary. The proposed extractive method evaluates each sentence based on a combination of statistical and semantic features in which a novel formulation is used taking into account sentence importance, coverage and diversity. Further, two summarizing techniques including score-based and supervised machine learning were employed to produce the summary and then assist leveraging the designed features. We demonstrate the effectiveness of the proposed method through a set of experiments under EASC corpus using ROUGE measure. Compared to some existing related work, the experimental evaluation shows the strength of the proposed method in terms of precision, recall, and F-score performance metrics.
@article{QAROUSH2021677,
  abbr = {JKSUCI},
  bibtex_show = {true},
  title = {An efficient single document Arabic text summarization using a combination of statistical and semantic features},
  journal = {Journal of King Saud University - Computer and Information Sciences},
  volume = {33},
  number = {6},
  pages = {677--692},
  year = {2021},
  issn = {1319-1578},
  doi = {10.1016/j.jksuci.2019.03.010},
  pdf = {https://www.sciencedirect.com/science/article/pii/S1319157818310498},
  author = {Qaroush, Aziz and {Abu Farha}, Ibrahim and Ghanem, Wasel and Washaha, Mahdi and Maali, Eman},
  keywords = {Arabic language, Single document summarization, Machine learning, Score-based, Statistical, Semantic, NLP}
}

2020

OSACT
Multitask Learning for Arabic Offensive Language and Hate-Speech Detection

Abu Farha, Ibrahim, and Magdy, Walid

In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection May 2020

Offensive language and hate-speech are phenomena that spread with the rising popularity of social media. Detecting such content is crucial for understanding and predicting conflicts, understanding polarisation among communities and providing means and tools to filter or block inappropriate content. This paper describes the SMASH team submission to OSACT4’s shared task on hate-speech and offensive language detection, where we explore different approaches to perform these tasks. The experiments cover a variety of approaches that include deep learning, transfer learning and multitask learning. We also explore the utilisation of sentiment information to perform the previous task. Our best model is a multitask learning architecture, based on CNN-BiLSTM, that was trained to detect hate-speech and offensive language and predict sentiment.
@inproceedings{abu-farha-magdy-2020-multitask,
  abbr = {OSACT},
  bibtex_show = {true},
  title = {Multitask Learning for {A}rabic Offensive Language and Hate-Speech Detection},
  author = {Abu Farha, Ibrahim and Magdy, Walid},
  booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection},
  month = may,
  year = {2020},
  address = {Marseille, France},
  publisher = {European Language Resource Association},
  pdf = {https://aclanthology.org/2020.osact-1.14},
  pages = {86--90},
  language = {English},
  isbn = {979-10-95546-51-1}
}
OSACT
From Arabic Sentiment Analysis to Sarcasm Detection: The ArSarcasm Dataset

Abu Farha, Ibrahim, and Magdy, Walid

In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection May 2020

Sarcasm is one of the main challenges for sentiment analysis systems. Its complexity comes from the expression of opinion using implicit indirect phrasing. In this paper, we present ArSarcasm, an Arabic sarcasm detection dataset, which was created through the reannotation of available Arabic sentiment analysis datasets. The dataset contains 10,547 tweets, 16% of which are sarcastic. In addition to sarcasm the data was annotated for sentiment and dialects. Our analysis shows the highly subjective nature of these tasks, which is demonstrated by the shift in sentiment labels based on annotators’ biases. Experiments show the degradation of state-of-the-art sentiment analysers when faced with sarcastic content. Finally, we train a deep learning model for sarcasm detection using BiLSTM. The model achieves an F1 score of 0.46, which shows the challenging nature of the task, and should act as a basic baseline for future research on our dataset.
@inproceedings{abu-farha-magdy-2020-arabic,
  abbr = {OSACT},
  bibtex_show = {true},
  title = {From {A}rabic Sentiment Analysis to Sarcasm Detection: The {A}r{S}arcasm Dataset},
  author = {Abu Farha, Ibrahim and Magdy, Walid},
  booktitle = {Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection},
  month = may,
  year = {2020},
  address = {Marseille, France},
  publisher = {European Language Resource Association},
  pdf = {https://aclanthology.org/2020.osact-1.5},
  pages = {32--39},
  language = {English},
  isbn = {979-10-95546-51-1}
}

2019

WANLP
Mazajak: An Online Arabic Sentiment Analyser

Abu Farha, Ibrahim, and Magdy, Walid

In Proceedings of the Fourth Arabic Natural Language Processing Workshop Aug 2019

Sentiment analysis (SA) is one of the most useful natural language processing applications. Literature is flooding with many papers and systems addressing this task, but most of the work is focused on English. In this paper, we present “Mazajak”, an online system for Arabic SA. The system is based on a deep learning model, which achieves state-of-the-art results on many Arabic dialect datasets including SemEval 2017 and ASTD. The availability of such system should assist various applications and research that rely on sentiment analysis as a tool.
@inproceedings{abu-farha-magdy-2019-mazajak,
  abbr = {WANLP},
  bibtex_show = {true},
  title = {{M}azajak: An Online {A}rabic Sentiment Analyser},
  author = {Abu Farha, Ibrahim and Magdy, Walid},
  booktitle = {Proceedings of the Fourth Arabic Natural Language Processing Workshop},
  month = aug,
  year = {2019},
  address = {Florence, Italy},
  publisher = {Association for Computational Linguistics},
  pdf = {https://aclanthology.org/W19-4621},
  doi = {10.18653/v1/W19-4621},
  pages = {192--198}
}