![Learning Python for Forensics](https://wfqqreader-1252317822.image.myqcloud.com/cover/209/36702209/b_36702209.jpg)
Chapter 3. Parsing Text Files
Text files, usually sourced from application or service logs, are a common source of artifacts in digital investigations. Log files can be quite large or contain data that makes human review difficult. Manual examination can devolve into a series of grep searches, which may or may not be fruitful. Some text files are supported by prebuilt software. For those that are not, we need to develop our own solution to parse them and extract the relevant information. In this chapter, we will analyze the setupapi.dev.log file, which records device installation information on Windows machines. This log file is commonly analyzed in forensics to extract the first connection time of USB devices on the system. Although our focus is a single log file, the same basic design can be replicated and improved upon to handle similarly structured files.
We will step through several iterations of the same code throughout this chapter. Though this is redundant, we encourage you to write out each iteration yourself. By rewriting the code, we will progress through the material together, work toward a proper solution, learn about handling bugs, and implement efficiency measures. Test each iteration to see how the output and the code's behavior change.
In this chapter, we will be covering the following topics:
- Identifying repetitive patterns in the log file for USB device entries
- Extracting and processing artifacts from text files
- Enhancing presentation of data in a deduplicated and readable manner
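
To give a rough sense of where these topics lead, the following is a minimal sketch and not the implementation developed in this chapter. It assumes the commonly seen layout in which a device section header line is followed by a "Section start" timestamp line; the sample lines, the regular expressions, and the function name `first_usb_connections` are all hypothetical, and real logs vary between Windows versions.

```python
import re
from datetime import datetime

# Hypothetical sample lines modeled on the layout often seen in
# setupapi.dev.log; they are illustrative, not taken from a real system.
SAMPLE_LINES = [
    r">>>  [Device Install (Hardware initiated) - USB\VID_0781&PID_5571\4C530001]",
    ">>>  Section start 2016/01/02 09:10:11.000",
    "     dvi: an unrelated detail line",
    "<<<  Section end 2016/01/02 09:10:12.500",
    r">>>  [Device Install (Hardware initiated) - USB\VID_0781&PID_5571\4C530001]",
    ">>>  Section start 2015/11/29 15:47:19.123",
]

# Assumed patterns: a section header naming a USB device, then a start time.
DEVICE_RE = re.compile(r">>>\s+\[Device Install.*? - (USB\\[^\]]+)\]")
START_RE = re.compile(
    r">>>\s+Section start (\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d+)"
)


def first_usb_connections(lines):
    """Map each USB device ID to its earliest section-start timestamp."""
    first_seen = {}
    device = None  # USB device named by the most recent section header
    for raw in lines:
        line = raw.strip()
        if line.startswith(">>>") and "[" in line:
            # A new section header: remember it only if it names a USB device.
            header = DEVICE_RE.match(line)
            device = header.group(1) if header else None
            continue
        start = START_RE.match(line)
        if start and device is not None:
            stamp = datetime.strptime(start.group(1), "%Y/%m/%d %H:%M:%S.%f")
            # Deduplicate by keeping only the earliest time per device.
            if device not in first_seen or stamp < first_seen[device]:
                first_seen[device] = stamp
            device = None
    return first_seen


if __name__ == "__main__":
    for device_id, stamp in first_usb_connections(SAMPLE_LINES).items():
        print(device_id, stamp.isoformat())
```

Keeping one timestamp per device in a dictionary is one simple way to get deduplicated output; the iterations in this chapter may structure the parsing differently.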