ABSTRACT:
An essential component in combating hate speech is the development of effective computational algorithms. While prior research has proposed a range of methods for hate speech detection, these methods often fall short in addressing the complexity of hate speech, which is nuanced in expression, diverse in form, and driven by heterogeneous motivations. To address these limitations, we introduce a novel prompt-learning framework for hate speech detection. Our approach offers several key innovations: (i) prompt generation is delegated to multiple language model agents, drawing upon the theory of questioning as a guiding principle; (ii) we employ an information-theoretic selection mechanism to identify the most effective prompts from a pool of candidates; and (iii) we incorporate motivation-aware instruction tuning to improve the model's capacity to capture the diverse motivational drivers of hate speech. Our empirical evaluation, which includes comparisons with state-of-the-art benchmarks and multiple robustness checks, demonstrates significant performance gains achieved by our framework. These findings highlight the promise of prompt-learning methods for hate speech detection, particularly when they are designed with attention to the social and psychological complexities that characterize online hate speech.
Key words and phrases: Hate speech, hate speech detection, prompt learning, multi-agent systems, information theory, language models