Development of an Extended Natural Language Processing (NLP)-based Framework For Knowledge Discovery In Terrorism-based Communication
Bolanle F. Oladejo, Noble I. Onyemenam

Keywords

Ontology-based Information Extraction
Phrase Structure Grammar
Context-Free Grammar
Linguistic rules based on regular expressions

Department of Computer Science, University of Ibadan, Ibadan, Nigeria.

Abstract

There is global agreement that communication is crucial to terrorism. Short Message Service (SMS) messages, emails, instant messages and voice messages, all of which are unstructured, are the four most frequently used means of terrorist communication, which is largely based on natural language. To detect evidence or threats conveyed in terrorist communication, such communication must be analysed. Traditional text mining systems employ shallow parsing techniques and rely mainly on taxonomic relations or concept extraction. These approaches are not fully reliable for detecting semantic relationships and suspicious patterns in communication, a task that requires more complex analysis and processing of text. We propose an integrated framework that performs syntactic and semantic analysis of natural language. The framework combines four components: Natural Language Processing (NLP) techniques to process text; Ontology-Based Information Extraction (OBIE) to understand, represent and express the problem domain; computational linguistics techniques, namely Phrase Structure Grammar (PSG) and Context-Free Grammar (CFG); and linguistic rules based on regular expressions. Together these components provide rule-based modelling of natural language from a computational perspective. Datasets were obtained from the Message Understanding Conference (MUC) and the Global Terrorism Database (GTD), both of which contain actual communications about terrorist activity. On these datasets the system achieved an average precision, recall and F-score of 90.5, 86.1 and 88.43 for MUC, and 95.5, 93.3 and 92.63 for GTD. The experimental results show that the system identifies the major conditions that satisfy the definition of a terrorist attack. It also expresses the relationship, intent, recipient and location of an attack.
This in turn helps security analysts make prompt decisions about communication that contains malicious content.
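The abstract relies on a domain ontology for OBIE but does not reproduce it. The sketch below is minimal and assumes a hypothetical ontology fragment built from is-a links, plus a word-to-concept lexicon. None of these names come from the paper. Under those assumptions, an annotator maps each token to its concept and that concept's ancestors. A query can then ask about any superclass, for example whether a message mentions some kind of Weapon.

```python
import re

# Hypothetical ontology fragment (child concept -> parent concept, is-a links).
IS_A = {
    "Bomb": "Weapon",
    "Firearm": "Weapon",
    "Weapon": "Thing",
    "Embassy": "Facility",
    "Facility": "Target",
    "Target": "Thing",
    "City": "Location",
    "Location": "Thing",
}

# Hypothetical lexicon linking lower-cased surface words to ontology concepts.
LEXICON = {
    "bomb": "Bomb",
    "explosives": "Bomb",
    "gun": "Firearm",
    "embassy": "Embassy",
    "lagos": "City",
}


def ancestors(concept):
    """Return the concept followed by its is-a ancestors up to the root."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def annotate(text):
    """Pair every token found in the lexicon with its concept chain."""
    tokens = re.findall(r"\w+", text.lower())
    return [(tok, ancestors(LEXICON[tok])) for tok in tokens if tok in LEXICON]


def mentions(text, concept):
    """True if any annotated token is an instance of `concept` or a subclass."""
    return any(concept in chain for _, chain in annotate(text))


message = "Plant the explosives at the embassy in Lagos"
print(annotate(message))
print(mentions(message, "Weapon"), mentions(message, "Firearm"))
```

Walking the is-a chain is what distinguishes this from plain keyword matching. "explosives" is never listed as a Weapon, yet the query succeeds through the Bomb concept.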
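The abstract lists Phrase Structure Grammar and Context-Free Grammar but does not name the parser. As an illustrative sketch only, the code below checks whether a message is generated by a toy grammar in Chomsky Normal Form, using the CYK algorithm. CYK is a standard recognizer for CNF grammars. The rules, the lexicon and the function name `cyk_recognize` are hypothetical.

```python
from itertools import product

# Hypothetical lexicon: word -> non-terminals that can produce it (A -> 'word').
LEXICON = {
    "we": {"NP"},
    "lagos": {"NP"},
    "will": {"Aux"},
    "bomb": {"V"},
    "attack": {"V"},
    "the": {"Det"},
    "embassy": {"N"},
    "market": {"N"},
    "in": {"P"},
}

# Hypothetical binary rules A -> B C, indexed by their right-hand side (B, C).
BINARY = {
    ("NP", "VP"): {"S"},
    ("Aux", "VP"): {"VP"},
    ("V", "NP"): {"VP"},
    ("VP", "PP"): {"VP"},
    ("Det", "N"): {"NP"},
    ("P", "NP"): {"PP"},
}


def cyk_recognize(tokens, start="S"):
    """Return True if the CNF grammar derives `tokens` from `start`."""
    n = len(tokens)
    if n == 0:
        return False
    # table[i][j] holds every non-terminal that derives tokens[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][i] = set(LEXICON.get(tok.lower(), ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two children
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]


print(cyk_recognize("we will bomb the embassy in lagos".split()))
```

CNF keeps the chart algorithm simple because every rule either rewrites to one word or to exactly two non-terminals. A real system would also need to recover the parse tree to read off who does what to whom.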
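The abstract says the framework uses linguistic rules based on regular expressions but does not list them. The sketch below assumes a single hypothetical rule built from hand-picked word lists, none of which come from the paper. The rule is a regular expression whose named groups fill the slots the abstract mentions: who is acting, the intended action, the target, the location and the time.

```python
import re

# Hypothetical rule: agent + modal + action verb + target, optionally
# followed by "in <location>" and a time expression. Word lists are illustrative.
THREAT_RULE = re.compile(
    r"\b(?P<agent>we|they|i)\s+(?:will|shall|are going to|am going to)\s+"
    r"(?P<action>attack|bomb|kill|kidnap|blow up)\s+"
    r"(?P<target>(?:the\s+)?\w+)"
    r"(?:\s+in\s+(?P<location>\w+))?"
    r"(?:\s+(?P<time>today|tomorrow|tonight))?",
    re.IGNORECASE,
)

RULES = [THREAT_RULE]


def apply_rules(message):
    """Return the slots filled by the first matching rule, or {} if none match."""
    for rule in RULES:
        match = rule.search(message)
        if match:
            # Drop optional groups that did not participate in the match.
            return {slot: value for slot, value in match.groupdict().items()
                    if value is not None}
    return {}


print(apply_rules("We will bomb the embassy in Lagos tomorrow."))
```

Named groups keep each rule's output in a uniform slot dictionary, so results from many rules can be merged or passed to an ontology lookup without extra bookkeeping.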
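The reported figures are precision, recall and F-score, with the F-score conventionally taken as the harmonic mean of precision and recall (F1). The sketch below computes all three for one message. It assumes that gold-standard and extracted facts are represented as hypothetical (slot, value) pairs. The abstract does not state how its averages were aggregated. A mean of per-message F-scores generally differs from the harmonic mean of mean precision and mean recall.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over two collections of extracted facts."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {("action", "bomb"), ("target", "embassy"),
        ("location", "lagos"), ("time", "tomorrow")}
predicted = {("action", "bomb"), ("target", "embassy"), ("location", "abuja")}
print(precision_recall_f1(gold, predicted))
```

Here two of three predictions are correct and two of four gold facts are found, so precision is 2/3, recall is 1/2 and F1 is 4/7.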
