Development of an Extended Natural Language Processing (NLP)-based Framework For Knowledge Discovery In Terrorism-based Communication
Keywords:
Ontology based Information Extraction, Phrase Structure Grammar, Context Free Grammar, Linguistic rules based on regular expressions
Bolanle F. Oladejo
Noble I. Onyemenam
Department of Computer Science, University of Ibadan, Ibadan, Nigeria.
Abstract
It is widely accepted that communication is crucial to terrorism. Short Message Service (SMS) messages, emails, instant messages and voice messages, all of which are unstructured, are the four most used means of terrorist communication, which is largely based on natural language. In order to detect evidence or threats conveyed in terrorist communication, such forms of communication need to be analysed. Traditional text mining systems employ shallow parsing techniques and rely mainly on taxonomic relations or concept extraction, which are not fully reliable for detecting semantic relationships and suspicious patterns in communication. Such a task requires more complex analysis and processing of text. We propose an integrated framework that performs syntactic and semantic analysis of natural language. The framework is developed using Natural Language Processing (NLP) techniques to process text; Ontology Based Information Extraction (OBIE) to understand, represent and express the problem domain; computational linguistics techniques, namely Phrase Structure Grammar (PSG) and Context Free Grammar (CFG); and, lastly, linguistic rules based on regular expressions to create a rule-based model of natural language from a computational perspective. The datasets were obtained from the Message Understanding Conference (MUC) and the Global Terrorism Database (GTD), which consist of actual communication of terrorist activity. Analysing these datasets with the system yielded an average precision, recall and F-score of 90.5, 86.1 and 88.43 for the MUC dataset and 95.5, 93.3 and 92.63 for the GTD dataset. The experimental results show that our system not only identifies the major conditions that satisfy the definition of a terrorist attack but also expresses the relationship, intent, recipient and location of an attack.
This in turn enables the security analyst to take prompt decisions regarding communication that includes malicious content.
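As an illustration of what a linguistic rule based on regular expressions might look like, the following minimal Python sketch extracts an intent, a recipient and a location from a single sentence. The pattern, the verb list, the slot names, the function `extract_event` and the sample sentence are all hypothetical and chosen only for illustration; they are not the rules or data used in the framework described above.

```python
import re

# Hypothetical rule: <attack verb> [the] <recipient> (at|in|near) <Location>.
# The verb list and slot names are illustrative assumptions, not the paper's rules.
ATTACK_RULE = re.compile(
    r"\b(?P<intent>(?i:attack|bomb|ambush|kidnap))\b\s+"   # attack verb, any case
    r"(?:the\s+)?"                                         # optional determiner
    r"(?P<recipient>[a-z]+(?:\s+[a-z]+)?)\s+"              # one or two lowercase words
    r"(?:at|in|near)\s+"                                   # locative preposition
    r"(?P<location>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"        # capitalised place name
)


def extract_event(message):
    """Return the intent, recipient and location slots, or None if the rule does not fire."""
    match = ATTACK_RULE.search(message)
    if match is None:
        return None
    return {
        "intent": match.group("intent").lower(),
        "recipient": match.group("recipient"),
        "location": match.group("location"),
    }


if __name__ == "__main__":
    # Invented sentence, used only to exercise the rule.
    print(extract_event("We will bomb the police station in Kano tomorrow."))
```

A single surface pattern like this is brittle on its own, which is presumably why the framework combines such rules with grammar-based parsing and an ontology of the problem domain.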
