Excessive / incorrect usage of dynamic variables is the second biggest performance killer in C++ programs. Dynamic variables created when new is called, either directly or indirectly when a STL container, smart pointer is created, entails a huge cost. A single call to the memory manager will translate to thousands of low level instructions.
Hence we should take care to not create dynamically allocated variables excessively in big loops or frequently called functions.
Why is dynamic memory allocation such a costly affair ?
When a call to new is made, the request for a new memory of the requested size is forwarded to the memory manager. The process of finding a free block will require the memory manager to traverse the free memory blocks in heap to find the size you are looking for. If the block of the required size is already available with it, the block is simply returned. If memory manager has a bigger block, the memory is split. The memory manager tries to merge the remaining of the split block with adjacent free blocks before returning with the requested block.
But when the memory manager does not have a free block of requested or bigger size, the call to new becomes even more expensive because the memory manager then requests for more free memory from the system / OS. This call to the kernel is very very expensive, and moreover the memory returned by the kernel might not be in physical RAM, let alone the cache. So accessing dyanamically allocated memory variables for the first time can be slow. The first access brings the requested free memory from RAM to cache. But subsequent accesses should be relatively fast.
In a multi-threaded environment, where multiple threads are requesting for free memory, the call to memory manager becomes even more slower because there is only a single memory manager. So calls to memory manager is serialized to make it thread safe, and hence becomes a point of contention.
Now for the second part i.e freeing the dynamic variables. When a call to free is made, the memory manager tries to free the memory. It throws an error, when attempt is made to free memory which is not actually present ( happens in scenarios like double free etc). But when the free request is valid, the block is freed and is attempted to be merged with adjacent free blocks.
Calling free in a multi-threaded environment runs into the same issues as that of calling new in multi-threaded environment
Steps to optimize the dynamic memory usage
Here are a few steps how we can better use our heap memory -
1. Reduce usage of dynamically allocated variables - Not all objects need to be created on the heap. There are many instances where we can instead create the objects on stack or static memory area.
2. Use better data structures - Many STL containers like std::vector, std::deque use free memory. We can use statically allocated structures like std::array, an array implementation of linked list and trees, wherever they are sufficient for the required use case.
3. shared_ptr usage - For the creation of shared_ptr, prefer using the convenience function std::make_shared instead of using new. When make_shared is called, a single allocation call is made with the request size being equal to the size of object plus the size of control block. When new is called, there are two separate calls to the memory manager, one for the object itself and the second call for control block.
4. Do not use shared_ptr when not required - Prefer using unique_ptr when ownership is not shared. This is because unique_ptr is faster and lighter.
5. Prevent multiple allocations when using STL - Illustrating with an example, when a vector is first created without initializing it, like this -
std::vector<int> vc;
vc is created with a capacity of 0. When we make the first push, a new memory location of a certain size, let's say 2, is allocated( actually the initial size depends on compiler and machine). As we progress in the program and we keep pushing elements into the vector, there comes a point where we reach our capacity i.e vector's size becomes equal to vector's capacity. At the next push operation on this vector, a new call is made to the memory manager with the request size of current_capacity * 2 (this factor of 2 is actually not fixed but differs from implementation to implementation), thus increasing it's capacity. In the next step, the elements are copied from the old heap location to the new heap location. And the new element is then appended. In the final step, the old heap space is de-allocated i.e freed. As more elements are pushed, there might be more allocations and de-allocations. Hence we now end up with additional calls to memory manager which we intend to avoid, wherever we can.
If the size of the vector can be ascertained in the program, we can write something like this
In this example, since we are able to pre-determine the size of the vector, memory allocation (in line 4), and deallocation(when vc is destroyed) happens only once.
All STL containers like std::map, std::deque etc has the reserve API, which should be used whenever we are able to predict the size of the container in advance.
6. Avoid creating dynamically stored variables in loops.
7. If an object's lifetime is bound to container, use emplace instead of push/insert. Click here to read more.
8. Prefer move instead of copy - Implement move operations in your classes and use it. It's significantly faster than copy.
If you want to learn more about subtleties of dynamic memory and it's right usage, I highly recommend checking out Optimized C++ by Kurt Guntheroth.
Hence we should take care to not create dynamically allocated variables excessively in big loops or frequently called functions.
Why is dynamic memory allocation such a costly affair ?
When a call to new is made, the request for a new memory of the requested size is forwarded to the memory manager. The process of finding a free block will require the memory manager to traverse the free memory blocks in heap to find the size you are looking for. If the block of the required size is already available with it, the block is simply returned. If memory manager has a bigger block, the memory is split. The memory manager tries to merge the remaining of the split block with adjacent free blocks before returning with the requested block.
But when the memory manager does not have a free block of requested or bigger size, the call to new becomes even more expensive because the memory manager then requests for more free memory from the system / OS. This call to the kernel is very very expensive, and moreover the memory returned by the kernel might not be in physical RAM, let alone the cache. So accessing dyanamically allocated memory variables for the first time can be slow. The first access brings the requested free memory from RAM to cache. But subsequent accesses should be relatively fast.
In a multi-threaded environment, where multiple threads are requesting for free memory, the call to memory manager becomes even more slower because there is only a single memory manager. So calls to memory manager is serialized to make it thread safe, and hence becomes a point of contention.
Now for the second part i.e freeing the dynamic variables. When a call to free is made, the memory manager tries to free the memory. It throws an error, when attempt is made to free memory which is not actually present ( happens in scenarios like double free etc). But when the free request is valid, the block is freed and is attempted to be merged with adjacent free blocks.
Calling free in a multi-threaded environment runs into the same issues as that of calling new in multi-threaded environment
Steps to optimize the dynamic memory usage
Here are a few steps how we can better use our heap memory -
1. Reduce usage of dynamically allocated variables - Not all objects need to be created on the heap. There are many instances where we can instead create the objects on stack or static memory area.
2. Use better data structures - Many STL containers like std::vector, std::deque use free memory. We can use statically allocated structures like std::array, an array implementation of linked list and trees, wherever they are sufficient for the required use case.
3. shared_ptr usage - For the creation of shared_ptr, prefer using the convenience function std::make_shared instead of using new. When make_shared is called, a single allocation call is made with the request size being equal to the size of object plus the size of control block. When new is called, there are two separate calls to the memory manager, one for the object itself and the second call for control block.
4. Do not use shared_ptr when not required - Prefer using unique_ptr when ownership is not shared. This is because unique_ptr is faster and lighter.
5. Prevent multiple allocations when using STL - Illustrating with an example, when a vector is first created without initializing it, like this -
std::vector<int> vc;
vc is created with a capacity of 0. When we make the first push, a new memory location of a certain size, let's say 2, is allocated( actually the initial size depends on compiler and machine). As we progress in the program and we keep pushing elements into the vector, there comes a point where we reach our capacity i.e vector's size becomes equal to vector's capacity. At the next push operation on this vector, a new call is made to the memory manager with the request size of current_capacity * 2 (this factor of 2 is actually not fixed but differs from implementation to implementation), thus increasing it's capacity. In the next step, the elements are copied from the old heap location to the new heap location. And the new element is then appended. In the final step, the old heap space is de-allocated i.e freed. As more elements are pushed, there might be more allocations and de-allocations. Hence we now end up with additional calls to memory manager which we intend to avoid, wherever we can.
If the size of the vector can be ascertained in the program, we can write something like this
In this example, since we are able to pre-determine the size of the vector, memory allocation (in line 4), and deallocation(when vc is destroyed) happens only once.
All STL containers like std::map, std::deque etc has the reserve API, which should be used whenever we are able to predict the size of the container in advance.
6. Avoid creating dynamically stored variables in loops.
7. If an object's lifetime is bound to container, use emplace instead of push/insert. Click here to read more.
8. Prefer move instead of copy - Implement move operations in your classes and use it. It's significantly faster than copy.
If you want to learn more about subtleties of dynamic memory and it's right usage, I highly recommend checking out Optimized C++ by Kurt Guntheroth.
C++ Optimization - Optimizing dynamically allocated variables
Reviewed by zeroingTheDot
on
August 24, 2018
Rating:
No comments: