
MalPaCA Seq+ Logo

MalPaCA Seq+ (repository)

Summary
🔍 An updated version of the MalPaCA algorithm that builds a behavioral profile of a piece of software from its network flows, representing the software's actual capabilities.

The MalPaCA algorithm is a novel, unsupervised clustering algorithm that builds a behavioral profile of a piece of software from its network flows, representing the software's actual capabilities. It takes one or more pcap files as input and then:

  1. splits them into uni-directional connections
  2. extracts 4 sequential features from each connection, namely the packet sizes (bytes), inter-arrival times (ms), source ports and destination ports
  3. computes, for each feature, the pairwise distances between all connections and stores them in that feature's distance matrix
  4. combines the distance matrices using a simple weighted average in which all features have equal weights
  5. feeds the final distance matrix into the HDBSCAN clustering algorithm
  6. post-processes the final clusters and exports them as .csv files and as temporal heatmaps
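Steps 1 to 4 above can be sketched in plain Python to show how the per-feature distance matrices and their equal-weight average fit together. This is a minimal illustration under several assumptions, not the MalPaCA Seq+ implementation: packets arrive as already-parsed tuples instead of being read from pcap files, a uni-directional connection is keyed by its (source IP, destination IP) pair (so that the port sequences can vary), packet sizes and inter-arrival times are compared with dynamic time warping, port sequences with a cosine distance over 3-grams, and each matrix is scaled by its maximum before averaging. The names split_connections, extract_features, combined_distance and the seq_len cut-off are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical packet record: (timestamp_s, src_ip, dst_ip, src_port, dst_port, size_bytes)
Packet = Tuple[float, str, str, int, int, int]


def split_connections(packets: List[Packet]) -> Dict[Tuple[str, str], List[Packet]]:
    """Step 1. Assumption: a uni-directional connection is keyed by (src_ip, dst_ip)."""
    conns: Dict[Tuple[str, str], List[Packet]] = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p[0]):  # keep packets in time order
        conns[(pkt[1], pkt[2])].append(pkt)
    return dict(conns)


def extract_features(conn: List[Packet], seq_len: int) -> Dict[str, list]:
    """Step 2: the four sequential features over the first seq_len packets."""
    conn = conn[:seq_len]
    times = [p[0] for p in conn]
    return {
        "sizes": [p[5] for p in conn],
        # inter-arrival times in ms; the first packet has no predecessor, so it gets 0
        "iats": [0.0] + [(b - a) * 1000.0 for a, b in zip(times, times[1:])],
        "sports": [p[3] for p in conn],
        "dports": [p[4] for p in conn],
    }


def dtw(a: List[float], b: List[float]) -> float:
    """Assumed measure for numeric sequences: dynamic time warping, absolute-difference cost."""
    d = [[math.inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def ngram_cosine(a: list, b: list, n: int = 3) -> float:
    """Assumed measure for ports: cosine distance between n-gram count vectors."""
    if a == b:
        return 0.0
    ca = Counter(tuple(a[i:i + n]) for i in range(len(a) - n + 1))
    cb = Counter(tuple(b[i:i + n]) for i in range(len(b) - n + 1))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    if na == 0 or nb == 0:
        return 1.0  # a sequence shorter than n has no n-grams: treat as maximally distant
    dot = sum(ca[g] * cb[g] for g in ca)
    return max(0.0, 1.0 - dot / (na * nb))


def distance_matrix(seqs: List[list], dist: Callable[[list, list], float]) -> List[List[float]]:
    """Step 3: symmetric pairwise matrix, scaled to [0, 1] by its maximum (assumption)."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = dist(seqs[i], seqs[j])
    top = max((v for row in m for v in row), default=0.0)
    return [[v / top for v in row] for row in m] if top > 0 else m


def combined_distance(packets: List[Packet], seq_len: int = 20):
    """Steps 1-4: per-feature matrices combined by an equal-weight average."""
    conns = split_connections(packets)
    keys = sorted(conns)
    feats = [extract_features(conns[k], seq_len) for k in keys]
    measures = {"sizes": dtw, "iats": dtw, "sports": ngram_cosine, "dports": ngram_cosine}
    mats = [distance_matrix([f[name] for f in feats], fn) for name, fn in measures.items()]
    w = 1.0 / len(mats)  # equal weights for all features
    n = len(keys)
    combined = [[sum(w * m[i][j] for m in mats) for j in range(n)] for i in range(n)]
    return keys, combined


# Toy input with three connections (values made up for illustration)
packets = [
    (0.00, "10.0.0.1", "10.0.0.2", 5000, 80, 60),
    (0.01, "10.0.0.1", "10.0.0.2", 5001, 80, 1500),
    (0.05, "10.0.0.1", "10.0.0.2", 5002, 80, 60),
    (0.00, "10.0.0.1", "10.0.0.3", 6000, 443, 60),
    (0.02, "10.0.0.1", "10.0.0.3", 6001, 443, 1500),
    (0.04, "10.0.0.1", "10.0.0.3", 6002, 443, 60),
    (0.10, "10.0.0.4", "10.0.0.2", 7000, 53, 90),
]
keys, dist = combined_distance(packets)
for key, row in zip(keys, dist):
    print(key, [round(v, 2) for v in row])
```

The combined matrix is the kind of input step 5 expects: the hdbscan package's HDBSCAN class accepts a precomputed distance matrix when constructed with metric='precomputed' (after converting the nested lists to a NumPy array). Re-running combined_distance with different seq_len values is one simple way to probe how sequence length changes the resulting clusters.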

Compared with the original version, “MalPaCA Seq+” contains a number of improvements that either facilitate research into the impact of different sequence lengths on clustering performance or make “MalPaCA” a more viable tool for cybersecurity research in general. In particular:

Example Transition Graph

Example Detailed Labels Overview Graph

Features

With “MalPaCA Seq+”, the user can:

Tools

Purpose                    Name
Programming language       Python
Dependency manager         Anaconda
Version control system     Git
Clustering algorithm       HDBSCAN
Graph library              Matplotlib

Installation Process

To import this project and resolve its dependencies, these instructions assume that you have already installed Anaconda, Python and an IDE such as PyCharm, and that your operating system is Windows. Re-create the original MalPaCA environment from the environment.yml file with this command:

conda env create -f environment.yml

Activate the new environment:

conda activate MalPaCA

Lastly, check that the new environment was installed correctly:

conda env list
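conda env list only confirms that the environment exists (the active one is marked with an asterisk); it does not show whether the packages were installed. A short Python check, run from inside the activated environment, can confirm that the key libraries import. This is a sketch that assumes the environment provides the hdbscan and matplotlib packages, as the Tools table suggests; the actual contents of environment.yml are not listed here, and the missing_packages helper is hypothetical.

```python
import importlib.util
import os

# Assumption: the packages named in the Tools table; environment.yml may list more.
REQUIRED = ["hdbscan", "matplotlib"]


def missing_packages(names):
    """Return the top-level module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


active_env = os.environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
missing = missing_packages(REQUIRED)
print("active conda environment:", active_env)
print("missing packages:", missing or "none")
```

If the active environment is not MalPaCA, or the missing list is not empty, re-run the activation or environment creation commands above.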

Contributors

The original author of “MalPaCA” is Azqa Nadeem, and the original source code can be found here.

Licence

The original “MalPaCA” framework was published under the MIT license, which can be found in the LICENSE file.

If you use MalPaCA in a scientific work, consider citing the following paper:

@article{nadeembeyond,
  title={Beyond Labeling: Using Clustering to Build Network Behavioral Profiles of Malware Families},
  author={Nadeem, Azqa and Hammerschmidt, Christian and Ga{\~n}{\'a}n, Carlos H and Verwer, Sicco},
  journal={Malware Analysis Using Artificial Intelligence and Deep Learning},
  pages={381},
  publisher={Springer}
}

References

The clustering result image in the logo was taken from the HDBSCAN website.