Project Title

Identifying Terrorist Affiliations through Social Network Analysis Using Data Mining Techniques

Date of Award

4-20-2016

Degree Type

Thesis

Program

Information Technology

First Advisor

Sonja H. Streuber

Second Advisor

Nicholas S. Rosasco

Abstract

In a technologically enabled world, local ideologically inspired warfare becomes global all too quickly. Specifically, terrorist groups like Al Qaeda and ISIS (Daesh) have successfully used modern computing technology and social networking environments to broadcast their message, recruit new members, and plot attacks. This is especially true for such platforms as Twitter and encrypted mobile apps like Telegram or the clandestine Alrawi. As early detection of such activity is crucial to attack prevention, data mining techniques have become increasingly important in the fight against the spread of global terrorist activity. This study employs data mining tools to mine Twitter for terrorist ‘organizing’ vocabulary and to pinpoint, through the analysis of (admittedly sometimes sparse) tweet metadata, the most likely geographical location and connected identities behind the user accounts used to transmit such organizing or post-event information. To accomplish this goal, R code and the twitteR package were used to connect through the existing Twitter API in order to validate a list of relevant words and search terms. I then determine, with “most likely” frequency counts and word clouds, the number of K-means clusters into which to separate the linguistic uses of these words and, by virtue of association, their user accounts. These user accounts are then investigated with network graphs built using R, NodeXL, and Gephi, which plot the user network as the final step. For the sake of user-friendly visualization, these networks are shown using three verified ISIS-sympathizing accounts that contain activist language and have emerged through analysis as holding leadership positions, either in terms of communication or in terms of internet activism.
Within the limits of this thesis and available computing resources, an analysis of these three accounts will have to suffice; however, this technique could be used in a larger framework to produce more analytical layers and identify high-ranking leaders. One challenge to this approach has been the meaningful extraction of Arabic terms in R, which has required UTF-8 workarounds to overcome character-set problems; another is the transient nature of social network activity, in which user accounts change frequently, a single user can own several accounts, and tweets can be deleted at any time. As is customary in Natural Language Processing, a third challenge emerges through variations in spelling and orthography and the use of abbreviations and special characters (especially the underscore character), which must be accounted for: they affect the composition of stop lists and edge lists and likely introduce false positives into the overall analysis. This is why, at present, visual verification of the analysis results is requisite; with greater refinement of the analysis, which exceeds the scope of this study, this need can be greatly reduced.
