The book Optimized C++ by Kurt Guntheroth outlines many software optimization patterns, often used by seasoned programmers. Some of them are -
Precomputation - In this method, we pre-compute some or all the intermediate results and store it in a lookup table. Whenever a new problem comes, we check if it's already computed, or if it's reducible to a intermediate result which we have in the lookup table.
A example of this is in a game I had worked on. The user's points were calculated based on the selection (s)he has made in a pick mystery prize game. We ran a simulation and found that there were nearly 200 unique prize amounts, and no matter how the user picked, the final points would always be among these 200. Hence we hardcoded the different user picks to these prize amounts. During the game, instead of calculating the prize amount each time, we simply looked up this table.
Lazy computation - is almost the exact opposite of pre-computation. In some cases, the result of an heavy computation or a object of a big class, might not be required or used in every runpath. In such cases the heavy computation, or the instantiation of a big class is delayed as much as possible, and only done when it's required.
Batching - Batching is a method of collecting all common tasks and executing them together.
An example of this is commonly done in pricing of financial products. In this scenario, pricing requests are created for all the securities. Once all the requests are ready, they are sent to a grid (with load balancer). This ensures maximum utilization of grid, while the terminal waits for results from the grid.
Caching - Caching is similar to pre-computation is some aspects. But the difference is that, in pre-computation that most or all possible results are calculated and stored in a lookup table, which is available to the program even before it starts doing any real work. In contrast, in the caching technique, the program does some amount of computation and it's result is stored in a table. In the course of the next task/computation, the program looks up this table to see if it can re-use the previously calculated result.
This is often done in dynamic programming algorithms.
Specialization - Some algorithms, as intelligent and efficient as they may be, might not be suitable for out requirement or data-set. In such cases, it makes sense to specialize the algorithm to our needs.
The example of this is the C++ sort algorithm. The de-facto algorithm used to sort, is the STL std::sort, and works well in conjunction with many STL containers. However std::sort is quite in-efficient when used with std::list. Hence std::list has a specialized version of sort.
Generalization - Opposite of specialization. In some cases, the logic is exactly the same for many data types. Hence it makes sense to write a single template function or class, which reduces the code maintenance effort.
Optimization of hot paths - During the execution of a program, the pre-fetcher of a CPU core fetches as many instructions it can from memory, and feeds them to the execution units ( Ex. ALU) continuously to maximize throughput and reduce latency. The instructions also include branching statements like "if statements" and the body of "if " block. But after the execution of the "if statement", if the outcome of the statement is false, all the instructions in the condition block needs to be flushed, and the instructions of the next "else if" or "else" branch need to be loaded. When this happens, the execution units are idle, waiting for the pre-fetcher to finish it's work. This in-turn reduces the throughput of the CPU and efficiency of the program.
To avoid this, in case of branching instructions, we should execute the most likely check and instructions first,
Precomputation - In this method, we pre-compute some or all the intermediate results and store it in a lookup table. Whenever a new problem comes, we check if it's already computed, or if it's reducible to a intermediate result which we have in the lookup table.
A example of this is in a game I had worked on. The user's points were calculated based on the selection (s)he has made in a pick mystery prize game. We ran a simulation and found that there were nearly 200 unique prize amounts, and no matter how the user picked, the final points would always be among these 200. Hence we hardcoded the different user picks to these prize amounts. During the game, instead of calculating the prize amount each time, we simply looked up this table.
Lazy computation - is almost the exact opposite of pre-computation. In some cases, the result of an heavy computation or a object of a big class, might not be required or used in every runpath. In such cases the heavy computation, or the instantiation of a big class is delayed as much as possible, and only done when it's required.
Batching - Batching is a method of collecting all common tasks and executing them together.
An example of this is commonly done in pricing of financial products. In this scenario, pricing requests are created for all the securities. Once all the requests are ready, they are sent to a grid (with load balancer). This ensures maximum utilization of grid, while the terminal waits for results from the grid.
Caching - Caching is similar to pre-computation is some aspects. But the difference is that, in pre-computation that most or all possible results are calculated and stored in a lookup table, which is available to the program even before it starts doing any real work. In contrast, in the caching technique, the program does some amount of computation and it's result is stored in a table. In the course of the next task/computation, the program looks up this table to see if it can re-use the previously calculated result.
This is often done in dynamic programming algorithms.
Specialization - Some algorithms, as intelligent and efficient as they may be, might not be suitable for out requirement or data-set. In such cases, it makes sense to specialize the algorithm to our needs.
The example of this is the C++ sort algorithm. The de-facto algorithm used to sort, is the STL std::sort, and works well in conjunction with many STL containers. However std::sort is quite in-efficient when used with std::list. Hence std::list has a specialized version of sort.
Generalization - Opposite of specialization. In some cases, the logic is exactly the same for many data types. Hence it makes sense to write a single template function or class, which reduces the code maintenance effort.
Optimization of hot paths - During the execution of a program, the pre-fetcher of a CPU core fetches as many instructions it can from memory, and feeds them to the execution units ( Ex. ALU) continuously to maximize throughput and reduce latency. The instructions also include branching statements like "if statements" and the body of "if " block. But after the execution of the "if statement", if the outcome of the statement is false, all the instructions in the condition block needs to be flushed, and the instructions of the next "else if" or "else" branch need to be loaded. When this happens, the execution units are idle, waiting for the pre-fetcher to finish it's work. This in-turn reduces the throughput of the CPU and efficiency of the program.
To avoid this, in case of branching instructions, we should execute the most likely check and instructions first,
C++ optimization - Software Optimization patterns
Reviewed by zeroingTheDot
on
July 16, 2018
Rating:
No comments: