Writing efficient image-scanning loops
In the previous recipes of this chapter, we presented different ways of scanning an image in order to process its pixels. In this recipe, we will compare the efficiency of these different approaches.
When you write an image-processing function, efficiency is often a concern. When you design your function, you will frequently need to check the computational efficiency of your code in order to detect any bottleneck in your processing that might slow down your program.
However, it is important to note that unless necessary, optimization should not be done at the price of reducing the program clarity. Simple code is indeed always easier to debug and maintain. Only code portions that are critical to a program's efficiency should be heavily optimized.
How to do it...
In order to measure the execution time of a function or a portion of code, there exists a very convenient OpenCV function called cv::getTickCount()
. This function gives you the number of clock cycles that have occurred since the last time you started your computer. Since we want the execution time of a code portion given in seconds, we use another method, cv::getTickFrequency()
. This gives us the number of cycles per second. The usual pattern to be used in order to obtain the computational time of a given function (or portion of code) would then be as follows:
const int64 start = cv::getTickCount(); colorReduce(image); // a function call // elapsed time in seconds double duration = (cv::getTickCount()-start)/ cv::getTickFrequency();
How it works...
The execution times of the different implementations of the colorReduce
function from this chapter are reported here. The absolute runtime numbers would differ from one machine to another (here, we used a 2.40 GHz machine equipped with a 64-bit Intel Core i7). It is rather interesting to look at their relative difference. These results are also dependent on the specific compiler that is used to produce the executable file. Our tests report the average time to reduce the colors of an image that has a resolution of 4288 x 2848 pixels.
First, we compare the three ways of computing the color reduction as presented in the There's more... section of the Scanning an image with pointers recipe. It is interesting to observe that the formula that uses the bitwise operator is much faster than the others at 9.5ms. The one using the integer division is at 26ms. The version based on the modulo operator is, however, at 33 ms. This represents a factor of more than 3 between the fastest and the slowest! It is therefore important to take the time to identify the most efficient way of computing a result in an image loop, as the net impact can be very significant.
When an output image that needs to be reallocated is specified instead of in-place processing, the execution time becomes 29 ms. The extra duration represents the overhead for memory allocation.
In a loop, you should avoid repetitive computations of values that could be precomputed instead. This consumes time, obviously. For example, you take the following inner loop of the color reduction function:
int nc= image.cols * image.channels(); uchar div2= div>>1; for (int i=0; i<nc; i++) {
Then, you replace it with the following one:
for (int i=0; i<image.cols * image.channels(); i++) { // . . . *data++ += div>>1;
The preceding code is a loop where you need to compute the total number of elements in a line and the div>>1
result again and again; you will obtain a runtime of 52 ms, which is significantly slower than the original version at 26 ms. Note, however, that some compilers might be able to optimize these kinds of loops and still obtain efficient code.
The version of the color reduction function that uses iterators, as shown in the Scanning an image with iterators recipe, gives slower results at 52 ms. The main objective of iterators is to simplify the image-scanning process and make it less prone to errors.
For completeness, we also implemented a version of the function that uses the at
method for pixel access. The main loop of this implementation would then simply read as follows:
for (int j=0; j<nl; j++) { for (int i=0; i<nc; i++) { // process each pixel --------------------- image.at<cv::Vec3b>(j,i)[0]= image.at<cv::Vec3b>(j,i)[0]/div*div + div/2; image.at<cv::Vec3b>(j,i)[1]= image.at<cv::Vec3b>(j,i)[1]/div*div + div/2; image.at<cv::Vec3b>(j,i)[2]= image.at<cv::Vec3b>(j,i)[2]/div*div + div/2; // end of pixel processing ---------------- } // end of line }
This implementation is much slower when a runtime of 53 ms is obtained. This method should then be used only for the random access of image pixels but never when scanning an image.
A shorter loop with few statements is generally more efficiently executed than a longer loop over a single statement even if the total number of elements processed is the same. Similarly, if you have N
different computations to apply to a pixel, apply all of them in one loop rather than writing N
successive loops, one for each computation.
We also performed the continuity test that produces one loop in the case of continuous images instead of the regular double loop over lines and columns. For a very large image, like the one we used in our tests, this optimization is not significant (25 ms instead of 26 ms), but in general, it is always a good practice to use this strategy, since it can lead to a significant gain in speed.
There's more…
Multithreading is another way to increase the efficiency of your algorithms, especially since the advent of multicore processors. OpenMP and the Intel Threading Building Blocks (TBB) are two popular APIs that are used in concurrent programming to create and manage your threads. In addition, C++11 now offers built-in support for threads.
See also
- The Performing simple image arithmetic recipe presents an implementation of the color-reduction function (described in the There's more... section) that uses the OpenCV 2 arithmetic image operators and has a runtime of 25 ms.
- The Applying look-up tables to modify image appearance recipe of Chapter 4, Counting the Pixels with Histograms describes an implementation of the color-reduction function based on a look-up table. The idea is to precompute all intensity reduction values that lead to a runtime of 22 ms.