Journal of Management Information Systems

Volume 37, Number 3, 2020, pp. 694-722

Semi-Supervised Cyber Threat Identification in Dark Net Markets: A Transductive and Deep Learning Approach

Ebrahimi, Mohammadreza; Nunamaker, Jay F.; and Chen, Hsinchun


Dark Net Marketplaces (DNMs), online selling platforms on the dark web, constitute a major component of the underground economy. Because these platforms are anonymous and increasingly accessible, they are rich sources of cyber threats such as hacking tools, data breaches, and personal account information. As the number of products offered on DNMs increases, researchers have begun to develop automated, machine learning-based threat identification approaches. A major challenge in adopting such an approach is that the task typically requires manually labeled training data, which is expensive and impractical to obtain. We propose a novel semi-supervised labeling technique that uses transductive learning to leverage unlabeled data based on the lexical and structural characteristics of DNMs. Empirical results show that the proposed approach yields an approximately 3-5% increase in classification performance as measured by F1-score, while increasing both precision and recall. To further improve identification performance, we adopt Long Short-Term Memory (LSTM) as the deep learning architecture on top of the proposed labeling method. We evaluate the approach on a large collection of 79K product listings obtained from the most popular DNMs. Our method outperforms state-of-the-art methods in threat identification and is an important step toward lowering the cost of human supervision in realizing automated threat detection within cyber threat intelligence organizations.
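
The idea of labeling a fixed pool of unlabeled listings from their lexical similarity to a small labeled set can be illustrated with a toy sketch. This is not the technique proposed in the paper, whose details are not given in the abstract; it is a minimal nearest-neighbor stand-in. The function names, the Jaccard similarity measure, the 0.5 threshold, and the example listings are all assumptions made for illustration.

```python
# Toy illustration only: NOT the paper's labeling method.
# Assumptions: whitespace tokens, Jaccard similarity, a 0.5 threshold,
# and invented example listings.

def tokens(text):
    """Lowercase whitespace tokenization; a deliberately crude lexical view."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def transductive_label(labeled, unlabeled, threshold=0.5):
    """Give each unlabeled listing the label of its most similar labeled
    listing, but only when that similarity reaches the threshold.
    Listings below the threshold are left unlabeled."""
    pseudo = {}
    for text in unlabeled:
        best_label, best_sim = None, 0.0
        for ref_text, label in labeled:
            sim = jaccard(tokens(text), tokens(ref_text))
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim >= threshold:
            pseudo[text] = best_label
    return pseudo


# Hypothetical labeled seed set and unlabeled pool.
labeled = [
    ("keylogger tool for windows", "threat"),
    ("organic green tea sampler", "benign"),
]
unlabeled = [
    "keylogger tool for android",
    "vintage postcard collection",
]

pseudo_labels = transductive_label(labeled, unlabeled)
print(pseudo_labels)
```

In this sketch only the first unlabeled listing receives a pseudo-label, because the second shares no tokens with any labeled listing. In a semi-supervised pipeline, pseudo-labeled items of this kind would be added to the training set of a downstream classifier such as an LSTM.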

Key words and phrases: Dark net marketplaces, cyber threats, semi-supervised labeling, transductive learning, deep learning, long short-term memory, threat detection