Transforming Digital Forensics with Large Language Models

Overview

In the pursuit of justice and accountability in the digital age, the integration of Large Language Models (LLMs) with digital forensics holds immense promise. This three-hour tutorial provides a comprehensive exploration of the transformative potential of LLMs in automating digital investigations and uncovering hidden insights.

Through a combination of real-world case studies, interactive exercises, and hands-on labs, participants will gain a deep understanding of how to harness LLMs for evidence analysis, entity identification, and knowledge graph reconstruction.

By fostering a collaborative learning environment, this tutorial aims to equip professionals, researchers, and students with the skills and knowledge needed to drive innovation in digital forensics. As LLMs see wider use in the field, the techniques covered here are intended to support better justice outcomes, stronger accountability, and more effective digital investigations.

Date and Location

Monday, October 21, 2024
ACM International Conference on Information and Knowledge Management (CIKM), Boise, USA, 2024

Speakers

Eric Xu, University of Maryland, College Park, Email: exu17288 at terpmail.umd dot edu
Wenbin Zhang, Ph.D., Assistant Professor in AI, Florida International University, Email: wenbin.zhang at fiu dot edu
Weifeng Xu, Ph.D., Professor in Cyber Forensics, University of Baltimore, Email: wxu at ubalt dot edu

Table of Contents PPT

Introduction
Forensic evidence entity recognition (hands-on lab)
- Evidence entity recognition
- Visualize evidence and their relations
Evidence knowledge graphs reconstruction (hands-on lab)
Profiling suspect based on browser history (hands-on lab)
- Intro
- Profiling lab and video
- Data Preprocessing
- Student Practice
Political insight analysis based on Hillary Clinton's leaked emails (hands-on lab)
Challenges and Limitations of Leveraging LLM in Digital Forensics
Conclusion

Forensic Evidence Analysis

The cyber incident report documents a conversation between an IT Security Specialist and an Employee about an email phishing attack. We use LLMs to identify evidence entities and the relationships between them, and then to construct digital forensic knowledge graphs from those entities and relationships.
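As a rough illustration of the graph-construction step, the sketch below turns a list of entity-relationship triples into a simple adjacency map. It assumes the LLM has already been prompted to return triples as JSON; the field names (`subject`, `relation`, `object`), the sample triples, and the `build_knowledge_graph` helper are hypothetical and are not taken from the tutorial's labs.

```python
import json
from collections import defaultdict

# Hypothetical LLM response: a JSON list of (subject, relation, object) triples.
# Field names and values are assumptions made for illustration only.
llm_response = """
[
  {"subject": "Employee", "relation": "received", "object": "Phishing Email"},
  {"subject": "Phishing Email", "relation": "contains", "object": "Malicious Link"},
  {"subject": "Employee", "relation": "reported_to", "object": "IT Security Specialist"}
]
"""

def build_knowledge_graph(response_text):
    """Parse triples and return an adjacency map: node -> list of (relation, node)."""
    graph = defaultdict(list)
    for triple in json.loads(response_text):
        graph[triple["subject"]].append((triple["relation"], triple["object"]))
    return dict(graph)

graph = build_knowledge_graph(llm_response)

# Print each edge in a readable "source -[relation]-> target" form.
for subject, edges in graph.items():
    for relation, obj in edges:
        print(f"{subject} -[{relation}]-> {obj}")
```

An adjacency map like this can be passed to a graph library or visualization tool of your choice; the plain-dictionary form is used here only to keep the sketch dependency-free.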

Here is an example of a reconstructed digital forensics knowledge graph using an LLM only:

Political Insight Analysis Leveraging LLMs

This case study demonstrates how to leverage Large Language Models to gain political insight from an email dataset. The dataset used in the case study is a set of leaked emails obtained from Hillary Clinton's private email server.

The background of the leaked emails is a significant chapter in recent U.S. political history, involving questions of transparency, security, and the handling of sensitive information. During her tenure as U.S. Secretary of State from 2009 to 2013, Hillary Clinton used a private email server for her official communications instead of the official State Department email system. She stated that this was done for convenience, allowing her to use a single device for both personal and official emails.

The leaked email dataset from Hillary Clinton's private email server is a collection of communications covering her entire tenure as Secretary of State from 2009 to 2013. It includes approximately 30,000 emails on a wide range of topics, from official diplomatic communications to personal correspondence. The release and subsequent analysis of these emails have played a crucial role in political debates, legal inquiries, and public discussions about transparency and security in government communications.

Our dataset: a set of email summaries. Each summary was generated by Gemini from one email in the original leaked email dataset. We are only interested in emails containing the keyword "Israel".
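The keyword selection described above can be sketched as a case-insensitive filter over the summaries. The record layout (`id` and `summary` fields), the placeholder texts, and the `filter_by_keyword` helper are assumptions for illustration; the actual notebook may store and filter the summaries differently.

```python
def filter_by_keyword(summaries, keyword):
    """Return the summaries whose text contains the keyword, ignoring case."""
    needle = keyword.lower()
    return [s for s in summaries if needle in s["summary"].lower()]

# Placeholder records standing in for Gemini-generated email summaries.
summaries = [
    {"id": 1, "summary": "Placeholder summary that mentions Israel."},
    {"id": 2, "summary": "Placeholder summary about an unrelated scheduling matter."},
    {"id": 3, "summary": "Another placeholder referring to ISRAEL in capitals."},
]

selected = filter_by_keyword(summaries, "Israel")
print([s["id"] for s in selected])
```

Note that a plain substring match also selects words such as "Israeli"; whether that is desirable depends on the analysis.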

Our results: Code in Jupyter Notebook.

Here are some political insights based on the leaked email summaries obtained from Hillary Clinton's private email server that are related to Israel:

Please cite our papers:

Eric Xu, Wenbin Zhang, and Weifeng Xu, "Transforming Digital Forensics with Large Language Models: Unlocking Automation, Insights, and Justice," in Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), Boise, USA, October 21-25, 2024.

Honghe Zhou, Weifeng Xu, Josh Dehlinger, Suranjan Chakraborty, and Lin Deng, "An LLM approach to gain cybercrime insights with evidence networks," in Proceedings of the 20th Symposium on Usable Privacy and Security (SOUPS 2024), Philadelphia, PA, August 11-13, 2024.

Daniel Addai, Sarfraz Shaikh, Eric Xu, Wenbin Zhang, and Weifeng Xu, "A graph-based approach for discovering evidence relationships across multiple devices in group crimes," in Proceedings of the 24th IEEE International Conference on Software Quality, Reliability, and Security (QRS 2024, Fast Abstract), Cambridge, United Kingdom, July 1-5, 2024.
