

14 Improving performance

Because of their need to cover every eventuality, malloc library implementations are very general and most do their job well when you consider what is thrown at them. However, your program may not be performing as well as it should simply because there may be a more efficient way of dealing with dynamic memory allocations. Indeed, there may even be a more efficient malloc library available for you to use.

If you need to allocate lots of blocks of the same size [1] but won't know how many you'll require until run time, the easy approach is simply to allocate a new block of memory for each occurrence. However, this creates a lot of (typically small) memory blocks that the underlying malloc library has to keep track of, and even with many good malloc libraries it is likely to cause memory fragmentation and may leave the blocks scattered throughout the address space rather than all in one place, which is not necessarily a good thing on systems with virtual memory.

An alternative approach is to allocate memory in multiples of the block size, so that several blocks are allocated at once. This requires slightly more work on your part, since you need to write interface code that returns a single block, possibly allocating space for more blocks if no free blocks are available. However, this approach has several advantages. The first is that the malloc library only needs to keep track of a few large allocations rather than lots of small ones, so splitting and merging of free blocks is less likely to occur. Secondly, your blocks will be scattered about less in the address space of the process, which means that on systems with virtual memory there are likely to be fewer page faults if you need to access or traverse all of the blocks you have created.
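The chunked approach above can be sketched in a few lines of C. This is only an illustration, not code from mpatrol; the names block_alloc, BLOCK_SIZE and CHUNK_SIZE are invented for the example, and it assumes the blocks are never freed individually.

```c
#include <stdlib.h>

#define BLOCK_SIZE 32   /* size of each fixed-size block in bytes */
#define CHUNK_SIZE 64   /* blocks obtained per call to malloc() */

static unsigned char *chunk;    /* current chunk of blocks */
static size_t used;             /* blocks handed out from the current chunk */

/* Return a pointer to a single block, calling malloc() for a whole new
   chunk only when the current chunk is exhausted.  Returns NULL if
   malloc() fails. */
void *block_alloc(void)
{
    if (chunk == NULL || used == CHUNK_SIZE) {
        chunk = malloc((size_t) BLOCK_SIZE * CHUNK_SIZE);
        if (chunk == NULL)
            return NULL;
        used = 0;
    }
    return chunk + (used++ * BLOCK_SIZE);
}
```

With this scheme the malloc library sees one allocation per CHUNK_SIZE blocks, and consecutive blocks are contiguous in memory.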

A similar memory allocation concept is called an arena. This datatype requires functions, built on top of the existing malloc library functions, which associate each memory allocation with a particular arena. An arena can have as many allocations added to it as required, but allocations cannot usually be freed until the whole arena is freed. Note that there are few generic arena implementations available, as everyone tends to write their own version when they require it, although SGI IRIX and Compaq Tru64 systems do come with an arena library called amalloc.
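A minimal arena might look like the following sketch. The names arena, arena_alloc and arena_free are illustrative; this is not the amalloc API, and for brevity it ignores alignment of the returned pointers.

```c
#include <stdlib.h>

/* Each arena owns a linked list of chunks obtained from malloc(). */
typedef struct arena_chunk {
    struct arena_chunk *next;
    size_t used, size;
    unsigned char data[];       /* C99 flexible array member */
} arena_chunk;

typedef struct { arena_chunk *chunks; } arena;

/* Carve an allocation out of the newest chunk, obtaining a fresh chunk
   from malloc() when the current one is too full.  Returns NULL on
   failure. */
void *arena_alloc(arena *a, size_t size)
{
    arena_chunk *c = a->chunks;
    if (c == NULL || c->size - c->used < size) {
        size_t cap = size > 4096 ? size : 4096;
        c = malloc(sizeof *c + cap);
        if (c == NULL)
            return NULL;
        c->next = a->chunks;
        c->used = 0;
        c->size = cap;
        a->chunks = c;
    }
    c->used += size;
    return c->data + c->used - size;
}

/* Free every allocation in the arena at once, the only way to free. */
void arena_free(arena *a)
{
    while (a->chunks != NULL) {
        arena_chunk *next = a->chunks->next;
        free(a->chunks);
        a->chunks = next;
    }
}
```

Note how the all-at-once freeing is what makes the allocator so simple: individual allocations carry no headers and need no bookkeeping.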

However, what if you don't plan to free all of the blocks at the same time? A slight modification of the above design is a slot table. This involves allocating chunks of blocks as they are required and adding each individual block within a chunk to a singly-linked list of free blocks. As new blocks are required, the allocator simply takes the first block on the free list; if the list is empty, it allocates memory for a new chunk of blocks and adds them to the free list. Freeing an individual block simply returns it to the free list. If this description isn't clear enough, have a look in src/slots.h and src/slots.c. This is how the mpatrol library allocates memory from the system for all of its internal structures. Variable-sized structures need a slightly different approach; for an example of this using strings, see src/strtab.h and src/strtab.c.
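The slot-table idea can be sketched as follows. For the real implementation see src/slots.h and src/slots.c; the names slot_alloc, slot_free and SLOTS_PER_CHUNK here are invented for the example. The trick is that a free slot stores the free-list link inside itself, so the bookkeeping costs no extra memory.

```c
#include <stddef.h>
#include <stdlib.h>

#define SLOTS_PER_CHUNK 64

union slot {
    union slot *next;           /* valid only while the slot is free */
    unsigned char data[32];     /* payload while the slot is in use */
};

static union slot *free_list;

/* Pop a slot from the free list, refilling the list with a whole new
   chunk of slots when it runs empty.  Returns NULL if malloc() fails. */
void *slot_alloc(void)
{
    union slot *s;
    if (free_list == NULL) {
        union slot *chunk = malloc(SLOTS_PER_CHUNK * sizeof(union slot));
        size_t i;
        if (chunk == NULL)
            return NULL;
        /* Thread every slot in the new chunk onto the free list. */
        for (i = 0; i < SLOTS_PER_CHUNK; i++) {
            chunk[i].next = free_list;
            free_list = &chunk[i];
        }
    }
    s = free_list;
    free_list = s->next;
    return s;
}

/* Freeing a slot just pushes it back onto the free list. */
void slot_free(void *p)
{
    union slot *s = p;
    s->next = free_list;
    free_list = s;
}
```

Unlike the arena, individual slots can be freed and reused in any order, while malloc() is still only called once per chunk.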

Another optimisation that is possible on UNIX and Windows platforms is making use of memory-mapped files. These allow you to map a filesystem object into the address space of your process, letting you treat a file as an array of bytes. Because the virtual memory system is used to map the file, any changes you make to the mapped memory are applied to the file. The virtual memory system treats the file as a pseudo swap file and therefore only uses up physical memory when pages are accessed. It also means that file operations can be replaced by memory reads and writes, leading to a very fast and efficient way of performing I/O. A further bonus is that entire blocks of process memory can be written to a file for later re-use, as long as the file can later be mapped at the same address. This can be a lot faster than writing to and reading from a specific file format.
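On UNIX this is done with the POSIX mmap() call; Windows uses CreateFileMapping() and MapViewOfFile() instead. The sketch below (the function name bump_first_byte is invented for the example) maps a file read-write and modifies it purely through memory writes, with no read() or write() calls.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map the named file into memory and increment its first byte.
   Returns 0 on success, -1 on any failure. */
int bump_first_byte(const char *path)
{
    struct stat st;
    unsigned char *p;
    int fd = open(path, O_RDWR);

    if (fd == -1)
        return -1;
    if (fstat(fd, &st) == -1 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t) st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping outlives the descriptor */
    if (p == MAP_FAILED)
        return -1;
    p[0]++;                     /* a plain memory write that reaches the file */
    munmap(p, (size_t) st.st_size);  /* MAP_SHARED changes are carried to the file */
    return 0;
}
```

Because the mapping is MAP_SHARED, the store to p[0] is reflected in the underlying file; with MAP_PRIVATE the change would stay local to the process.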

If you really don't want to keep track of dynamic memory allocations at all then perhaps you should consider garbage collection. This allows you to make dynamic memory allocations that need not be matched by corresponding calls to free them. A garbage collector will (at certain points during program execution) look for memory allocations that are no longer referenced by the program and free them for later re-use, removing all possibility of memory leaks. However, the garbage collection process can take a sizable chunk of processor time, depending on how large the program is, so it is not really an option for real-time programming. It is also very platform-dependent, as it examines very low-level structures within a process in order to determine which pointers point to which memory allocations. But there is at least one garbage collector [2] that works well with C and C++ and acts as a replacement for malloc() and free(), so it may be the ideal solution for you.

If you do choose to use an alternative malloc library make sure that you have a license to do so and that you follow any distribution requirements. On systems that support dynamic linking you may want to link the library statically rather than dynamically so that you don't have to worry about an additional file that would need to be installed. However, whether you have that choice depends on the license for the specific library, and some licenses also require that the source code for the library be made readily available. Shared libraries have the advantage that they can be updated with bug fixes so that all programs that require these libraries will automatically receive these fixes without needing to be relinked.

If all of the above suggestions do not seem to help and you still feel that you have a performance bottleneck in the part of your code that deals with dynamically allocated memory then you should try using the memory allocation profiling feature of mpatrol. This can be used at run-time to analyse the dynamic memory allocation calls that your program makes during its execution, and builds statistics for later viewing with the mprof command. It is then possible for you to see exactly how many calls were made to each function and where they came from. Such information can then be put to good use in order to optimise the relevant parts of your code. The tracing output files that can be produced by the mpatrol library may also be useful in order to view patterns in memory allocation behaviour and gather information about lifetimes of memory allocations.

And finally, some tips on how to use dynamic memory allocations correctly. The first, most basic rule is to always check the return values from malloc() and related functions. Never assume that a call to malloc() will succeed, because you're unlikely to be able to read the future [3]. Alternatively, use (or write) an xmalloc() or similar function [4], which calls malloc() but never returns `NULL' since it aborts instead. With the C++ operators it is slightly different, because some versions use exceptions to indicate failure, so you should always provide a handler to deal with this eventuality.
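A typical xmalloc() wrapper looks like the sketch below. mpatrol ships its own xmalloc() and MP_MALLOC() families; this is just the common shape of such a function, not mpatrol's implementation.

```c
#include <stdio.h>
#include <stdlib.h>

/* Like malloc(), but never returns NULL: on failure it reports the
   error and aborts, so callers need no error-handling code. */
void *xmalloc(size_t size)
{
    void *p = malloc(size ? size : 1);  /* malloc(0) may legally return NULL */
    if (p == NULL) {
        fputs("xmalloc: out of memory\n", stderr);
        abort();
    }
    return p;
}
```

The trade-off is that the program can no longer recover gracefully from allocation failure, which is acceptable for many tools but not for long-running servers.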

Never use features [5] of specific malloc libraries if you want your code to be portable. Always follow the ANSI C or C++ calling conventions and never make assumptions about the function or operator you are about to call; the standards committees went to great lengths to specify its behaviour explicitly. For example, don't assume that the contents of a freed memory allocation will remain valid until the next call to malloc(), and don't assume that the contents of a newly allocated memory block will be zeroed unless you created it with calloc().
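The zeroing point is worth a tiny example: calloc() is the only standard allocation call that guarantees zero-filled memory. The helper name zeroed_ints below is invented for illustration.

```c
#include <stdlib.h>

/* Portably allocate n ints that are guaranteed to start at zero.
   malloc() would leave the contents indeterminate; calloc() zero-fills
   and also guards against n * sizeof(int) overflowing. */
int *zeroed_ints(size_t n)
{
    return calloc(n, sizeof(int));
}
```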

Try to avoid allocating arrays on the stack if they are to hold data that may overflow. In most cases this is common sense, but sometimes you may allocate an array that should suffice 99% of the time. If there is a 1% chance that it may overflow, then on systems where the stack is executable an attacker can exploit the overflow to overwrite the current function's return address and break into an otherwise secure program. Use statically-allocated or dynamically-allocated arrays in these situations, or better still, check for overflow.
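For example, rather than copying a string of unknown length into a fixed stack buffer, size a heap buffer from the input itself. The function name copy_string_safely is invented for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of src sized exactly to fit, so no input length
   can overflow the buffer.  Returns NULL if malloc() fails; the caller
   must free() the result. */
char *copy_string_safely(const char *src)
{
    size_t len = strlen(src) + 1;       /* include the terminating '\0' */
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```

Contrast this with `char buf[64]; strcpy(buf, src);`, which silently corrupts the stack whenever src is 64 bytes or longer.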

Finally, try stress-testing your program in low memory conditions. The mpatrol library contains the LIMIT option, which can place an upper bound on the size of the heap, and also the FAILFREQ and FAILSEED options, which can cause random memory allocation failures. Doing this will test parts of your code that you would probably never expect to be called, but perhaps they will be one day! Who would you rather have debugging your program: yourself or the user?


Footnotes

[1] Such as for use in a linked list.

[2] A freely distributable library called GC (see Related software).

[3] If you can, why are you reading this — you've already read it!

[4] The mpatrol library comes with the xmalloc() and MP_MALLOC() families of functions.

[5] Whether they are documented or not.