Learning Python for Forensics

User input

Allowing user input makes a program more dynamic. It is good practice to query the user for file paths or values rather than hard-coding this information in the script. That way, if the user wants to run the same program on a different file, they can simply provide a different path rather than editing the source code. In most programs, users supply input and output locations or identify which optional features or modules should be used at runtime.

User input can be supplied as an argument when the program is first called or interactively during runtime. For most projects, command-line arguments are recommended because asking the user for input during runtime halts program execution while waiting for the input.

Using the raw input method and the system module – user_input.py

Both raw_input() and sys.argv represent basic methods of obtaining input from users. Be aware that both of these methods return string objects. We can convert the string to the required data type using the appropriate class constructor, such as int() or float().

The raw_input() function is similar to asking someone a question and waiting for their reply. During this time, the program's execution thread halts until a reply is received. The function defined below queries the user for a number and returns the squared value. The integer constructor around the raw_input() function ensures that we are working with an integer and not a string. If the integer conversion is removed, a TypeError will be generated when attempting to square a string. We have the following code:

>>> def square():
...     value = int(raw_input('Provide number: '))
...     return value**2
...
>>> square()
Provide number: 3
9
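The int() constructor raises a ValueError when the reply is not a whole number, so a script may want to handle that case. The following is a minimal sketch of one way to do so, not part of user_input.py. The helper name to_int() is hypothetical, and the functions take the reply as a parameter instead of calling raw_input() so that the example runs unchanged under Python 2 and 3:

```python
def to_int(text):
    """Convert user-supplied text to an integer, or return None if it is not one."""
    try:
        return int(text)
    except ValueError:
        # Raised for replies such as 'three' or '3.5'
        return None


def square(text):
    """Return the square of the integer in text, or None if text is not an integer."""
    value = to_int(text)
    if value is None:
        return None
    return value ** 2


print(square('3'))      # 9
print(square('three'))  # None
```

In an interactive script, the string passed to square() would be the value returned by raw_input().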

Arguments supplied at the command line are stored in the sys.argv list. As with any list, these arguments can be accessed with an index, which starts at zero. The first element is the name of the script while any element after that represents a space-separated user-supplied input. We need to import the sys module to access this list.

On line 9, we place the arguments from the sys.argv list in a temporary list variable named args. On line 10, we print the first element, which is the name of the script, and remove it from the list with pop(0). For the remaining items in the list, we use a for loop and wrap our list with the built-in enumerate() function. This gives us a counter for our loop, i, to count the number of arguments. On lines 13 and 14, we print out each argument, its number, and type. We have the following code:

001 import sys
004 def main():
005     """
006     The main function uses the sys.argv list to print any user-supplied input.
007     :return: Nothing.
008     """
009     args = sys.argv[:]  # Copy the list so that sys.argv itself is not modified
010     print 'Script:', args.pop(0)
012     for i, argument in enumerate(args):
013         print 'Argument {}: {}'.format(i, argument)
014         print 'Type: {}'.format(type(argument))
016 if __name__ == '__main__':
017     main()

After saving this file as user_input.py, we can call it at the command line and pass in our arguments. Note in the following screenshot that both arguments are treated as strings. Keep this behavior of raw_input() and the sys.argv list in mind, as it will save you some headaches later when troubleshooting a script.

[Screenshot: output of user_input.py]

For smaller programs that do not have many command-line options, the sys.argv list is a quick and easy way to obtain user input.
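Because every element of sys.argv is a string, numeric arguments must be converted before they are used in calculations. A minimal sketch follows. The argv list is a hypothetical stand-in for the real sys.argv, so the example can be run without a command line:

```python
# Hypothetical stand-in for sys.argv after running: python user_input.py 5 2.5
argv = ['user_input.py', '5', '2.5']

script_name = argv[0]
count = int(argv[1])    # '5' becomes the integer 5
ratio = float(argv[2])  # '2.5' becomes the float 2.5

print(type(argv[1]) is str)  # True: the raw argument is a string
print(count * ratio)         # 12.5
```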


File paths that contain a space should be double-quoted. For example, without quotes, the path C:/Users/LPF/misc/my books would arrive in sys.argv as two separate arguments, C:/Users/LPF/misc/my and books. This would typically result in an IOError exception when trying to interact with this directory in a script.
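The effect of quoting can be previewed with the standard library's shlex module, which splits a string using POSIX shell-like rules. This only approximates what a real shell does (the Windows command prompt in particular follows its own rules), and the command lines here are made up for illustration:

```python
import shlex

# Without quotes, the path is split at the space
unquoted = shlex.split('user_input.py C:/Users/LPF/misc/my books')

# With double quotes, the path stays together as one argument
quoted = shlex.split('user_input.py "C:/Users/LPF/misc/my books"')

print(unquoted)
print(quoted)
```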

Understanding Argparse – argument_parser.py

Argparse is a module in the standard library and will be used throughout the book as a means of obtaining user input. It helps us develop more complicated command-line interfaces. By default, argparse creates a help switch (-h or --help) that displays help and usage information for the script's arguments. Below, we build a sample argparse implementation that has required, optional, and default arguments.

We import the argparse module on line 1. The main function defined on line 4 will print out any supplied arguments it receives. Starting with line 14, we begin to define components of our argument parser. The description and epilog defined on lines 14 and 15 are optional and are generally displayed only when a user runs our script with the help switch (-h or --help). See the following code:

001 import argparse
004 def main(args):
005     """
006     The main function prints the args input to the console.
007     :param args: The parsed arguments namespace created by the argparse module.
008     :return: Nothing.
009     """
010     print args
013 if __name__ == '__main__':
014     description = 'Argparse: Command-Line Parser Sample'  # Description of the Program to display with help
015     epilog = 'Built by Preston Miller & Chapin Bryce'  # Displayed after help, usually Authorship and License Information

The first real step to develop our command-line interface is creating the ArgumentParser object on line 18. We will add our arguments and ultimately parse supplied arguments using this object:

017     # Define initial information for argument parser
018     parser = argparse.ArgumentParser(description=description, epilog=epilog)
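To see where the description and epilog end up, the help text can be generated as a string with the parser's format_help() method, which returns the same text that the -h switch prints. This is a small sketch rather than part of argument_parser.py. The prog keyword is set explicitly here, as an assumption, so that the usage line does not depend on how the example is run:

```python
import argparse

description = 'Argparse: Command-Line Parser Sample'
epilog = 'Built by Preston Miller & Chapin Bryce'

# prog fixes the program name shown in the usage line
parser = argparse.ArgumentParser(prog='argument_parser.py',
                                 description=description, epilog=epilog)

# format_help() returns the help message instead of printing it and exiting
help_text = parser.format_help()
print(help_text)
```

The description appears after the usage line and the epilog appears at the very end of the message.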

Arguments are added to our parser via the add_argument() method. This method must be supplied with a string representing the name of the argument, followed by any options. These options include help text describing the argument, a list of valid choices, whether to store True or False when the argument is supplied, and so on.

There are two types of arguments: positional and optional. By default, optional arguments have one or two dashes as the first characters of their name. For example, the timezone argument on line 21 is positional and required, whereas the c argument on line 23 is optional. Using the required keyword on line 22 allows us to create a required non-positional argument as follows:

020     # Add arguments
021     parser.add_argument('timezone', help='timezone to apply') # Required variable (no `-` character)
022     parser.add_argument('--source', help='source information', required=True) # Optional argument, forced to be required
023     parser.add_argument('-c', '--csv', help='Output to csv') # Optional argument using -c or --csv 
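These three arguments can be tried without a command line by passing a list of strings to parse_args(), which then parses that list instead of sys.argv. The values UTC and evidence.E01 are hypothetical inputs chosen for illustration:

```python
import argparse

parser = argparse.ArgumentParser(prog='argument_parser.py')
parser.add_argument('timezone', help='timezone to apply')
parser.add_argument('--source', help='source information', required=True)
parser.add_argument('-c', '--csv', help='Output to csv')

# Equivalent to running: argument_parser.py UTC --source evidence.E01
args = parser.parse_args(['UTC', '--source', 'evidence.E01'])

print(args.timezone)  # UTC
print(args.source)    # evidence.E01
print(args.csv)       # None, because -c/--csv was not supplied
```

Omitting either timezone or --source from the list would cause the parser to print a usage message and exit.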

The action keyword determines how the argument is processed when supplied at the command line. On line 26, the no-email argument stores the Boolean value False when supplied and True when it is not. Conversely, the send-email argument on line 27 stores True when supplied by the user and False otherwise. On line 28, every time the emails argument is used, its value is appended to the emails list. The count action on line 29 stores an integer representing the number of times the argument was supplied. For example, supplying -vvv at the command line stores the value 3 in the v argument. Take a look at the following code:

025     # Using actions
026     parser.add_argument('--no-email', help='disable emails', action="store_false") # Assign `False` to value if present.
027     parser.add_argument('--send-email', help='enable emails', action="store_true") # Assign `True` to value if present.
028     parser.add_argument('--emails', help='email addresses to notify', action="append") # Append values for each call, e.g. --emails a@example.com --emails b@example.com
029     parser.add_argument('-v', help='add verbosity', action='count') # Count the number of instances, e.g. -vvv
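The behavior of each action is easiest to see by parsing the same parser twice, once with the switches supplied and once without. The following sketch does this with explicit argument lists in place of a real command line. Note that argparse stores each value under the argument name with dashes replaced by underscores, so --no-email becomes no_email:

```python
import argparse

parser = argparse.ArgumentParser(prog='argument_parser.py')
parser.add_argument('--no-email', action='store_false')
parser.add_argument('--send-email', action='store_true')
parser.add_argument('--emails', action='append')
parser.add_argument('-v', action='count')

supplied = parser.parse_args(['--no-email', '--emails', 'a@example.com',
                              '--emails', 'b@example.com', '-vvv'])
print(supplied.no_email)    # False: the switch was present
print(supplied.send_email)  # False: the switch was absent
print(supplied.emails)      # ['a@example.com', 'b@example.com']
print(supplied.v)           # 3

absent = parser.parse_args([])
print(absent.no_email)      # True: default for store_false
print(absent.emails)        # None: nothing was appended
print(absent.v)             # None: the switch was never counted
```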

The default keyword dictates the default value of an argument. We can also use the type keyword to convert our argument to a specific type. Instead of being stuck with strings as our only input, we can now store the input directly as the desired object and remove user input conversions from our scripts:

031     # Defaults
032     parser.add_argument('--length', default=55, type=int)
033     parser.add_argument('--name', default='Alfred', type=str)
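A short sketch of how these two keywords interact, again using explicit argument lists rather than a real command line. The values 10 and Bruce are arbitrary examples:

```python
import argparse

parser = argparse.ArgumentParser(prog='argument_parser.py')
parser.add_argument('--length', default=55, type=int)
parser.add_argument('--name', default='Alfred', type=str)

# No switches supplied: the defaults are used
defaults = parser.parse_args([])
print(defaults.length)  # 55
print(defaults.name)    # Alfred

# The string '10' is converted to an integer by type=int
custom = parser.parse_args(['--length', '10', '--name', 'Bruce'])
print(custom.length + 1)  # 11
print(custom.name)        # Bruce
```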

Argparse can be used to open a file for reading or writing. On line 36, we open the required argument input_file in read mode, and on line 37, we open output_file in write mode. By passing these file objects into our main script, we can immediately begin to process our data of interest:

035     # Handling Files
036     parser.add_argument('input_file', type=argparse.FileType('r')) # Open specified file for reading
037     parser.add_argument('output_file', type=argparse.FileType('w')) # Open specified file for writing

The last keyword we will discuss is choices, which takes a list of options the user can select from. When the user calls this switch, they must provide one of the valid options; any other value causes the parser to exit with an error. For example, --file-type DD/001 would set the file-type argument (stored as file_type) to the DD/001 choice as follows:

039     # Choices
040     parser.add_argument('--file-type', choices=['E01', 'DD/001', 'Ex01'])  # Allow only specified choices
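The sketch below shows both outcomes: a valid choice is stored, while an invalid one makes argparse print an error to stderr and raise SystemExit. Catching SystemExit is done here only to demonstrate the behavior in a single run, and ZIP is a made-up invalid value:

```python
import argparse

parser = argparse.ArgumentParser(prog='argument_parser.py')
parser.add_argument('--file-type', choices=['E01', 'DD/001', 'Ex01'])

valid = parser.parse_args(['--file-type', 'DD/001'])
print(valid.file_type)  # DD/001

try:
    parser.parse_args(['--file-type', 'ZIP'])
    rejected = False
except SystemExit:
    # argparse exits after reporting the invalid choice
    rejected = True
print(rejected)  # True
```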

Finally, once we have added all of our desired arguments to our parser, we can parse the arguments. On line 43, we call the parse_args() method, which returns a Namespace object. To access, for example, the length argument that we created on line 32, we read the corresponding attribute of the Namespace object, such as arguments.length. On line 44, we pass our arguments into our main() function, which prints out all the arguments in the Namespace object on line 10. We have the following code:

042     # Parsing arguments into objects
043     arguments = parser.parse_args() 
044     main(arguments)
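As a brief illustration of working with the Namespace object, the sketch below reads one attribute and then uses the built-in vars() function to view every parsed value as a dictionary. Only two of the arguments from the listing are defined here to keep the example short:

```python
import argparse

parser = argparse.ArgumentParser(prog='argument_parser.py')
parser.add_argument('--length', default=55, type=int)
parser.add_argument('--name', default='Alfred', type=str)

arguments = parser.parse_args(['--length', '20'])

# Each argument is an attribute of the Namespace object
length = arguments.length
print(length)  # 20

# vars() exposes the same values as a dictionary
as_dict = vars(arguments)
print(sorted(as_dict.keys()))  # ['length', 'name']
```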

These values may be reassigned to variables for easier recall. With the basics of the argparse module understood, we can now build both simple and more advanced command-line interfaces for our scripts. This module is used extensively to provide command-line arguments for most of the code we will build. When running the preceding code with the help switch, we should see our series of required and optional arguments for the script:

[Screenshot: help output of argument_parser.py]