String searching within a large corpus of data is a critical component of digital forensic (DF) analysis techniques such as file carving. The continuing increase in capacity of consumer storage devices requires similar improvements to the performance of string searching techniques employed by DF tools used to analyse forensic data. As string searching is a trivially-parallelisable problem, general purpose graphic processing unit (GPGPU) approaches are a natural fit. Currently, only some of the research in employing GPGPU programming has been transferred to the field of DF, of which, a closed-source GPGPU framework was used— Complete Unified Device Architecture (CUDA). Findings from these earlier studies have found that local storage devices from which forensic data are read present an insurmountable performance bottleneck.
This research hypothesises that modern storage devices no longer present a performance bottleneck to the currently used processing techniques of the field, and proposes that an open-standards GPGPU framework solution – Open Computing Language (OpenCL) – would be better suited to accelerate file carving with wider compatibility across an array of modern GPGPU hardware. This research further hypothesises that a modern multi-string searching algorithm may be better adapted to fulfil the requirements of DF investigation.
This research presents a review of existing research and tools used to perform file carving and acknowledges related work within the field. To test the hypothesis, parallel file carving software was created using C# and OpenCL, employing both a traditional string searching algorithm and a modern multi-string searching algorithm to conduct an analysis of forensic data. A set of case studies that demonstrate and evaluate potential benefits of adopting various methods in conducting string searching on forensic data are given. This research concludes with a final case study which evaluates the performance to perform file carving with the best-proposed string searching solution and compares the result with an existing file carving tool— Foremost.
The results demonstrated from the research establish that utilising the parallelised OpenCL and Parallel Failureless Aho-Corasick (PFAC) algorithm solution demonstrates significantly greater processing improvements from the use of a single, and multiple, GPUs on modern hardware. In comparison to CPU approaches, GPGPU processing models were observed to minimised the amount of time required to search for greater amounts of patterns. Results also showed that employing PFAC also delivers significant performance increases over the BM algorithm. The method employed to read data from storage devices was also seen to have a significant effect on the time required to perform string searching and file carving.
Empirical testing shows that the proposed string searching method is believed to be more efficient than the widely-adopted Boyer-Moore algorithms when applied to string searching and performing file carving. The developed OpenCL GPGPU processing framework was found to be more efficient than CPU counterparts when searching for greater amounts of patterns within data. This research also refutes claims that file carving is solely limited by the performance of the storage device, and presents compelling evidence that performance is bound by the combination of the performance of the storage device and processing technique employed.
|Date of Award||Feb 2017|
|Supervisor||Ian Ferguson (Supervisor) & Adam T. Sampson (Supervisor)|
- Digital forensics
- String searching
- Pattern matching
- Parallel programming