Previous: Call stacks and symbol tables, Up: Operating system support


6.3 Threads

On systems with virtual memory, such as UNIX and Windows, user programs are run as processes which have their own address space and resources. If a process needs to create sub-processes to perform other tasks it must call fork() or spawn() to create new processes, but these new processes do not share the same address space or resources as the parent process. If processes need to share memory they must either use a message passing interface or explicitly mark a range of memory as shareable.

Traditionally, this was not too much of a handicap as parallel processing was an expensive luxury and could only be made use of by the kernel of such systems. However, with the birth of fast processors and parallel programming, programs could be made to run more efficiently and faster on multi-processor systems by having more than one thread of control. This was achieved by allowing processes to have more than one program counter through which the processor could execute instructions, and if one thread of control stalled for a particular reason then another could continue without stalling the entire process.

Such multithreaded programs allow parallel programming and implicit shared memory between threads since all threads in a process share the same address space and resources. This is similar to operating systems that have no virtual memory, such as AmigaOS and Netware1, except that once a process terminates, all threads terminate as well and all of its resources are still reclaimed.

Multithreaded programming generally needs no compiler support, but does require some primitive operations to be supported by the operating system for a threads library to call. The functions that are available in the threads library provide the means for a process to create and destroy threads. There are currently several popular threads libraries available, although the POSIX threads standard remains the definitive implementation.

It is always important to remember when programming a multithreaded application that because all threads in a process share the same address space, measures must be taken to prevent threads reading and writing global data in a haphazard fashion. This can either be done by locking with semaphores and mutexes, or can be performed by using stack variables instead of global variables since every thread has its own local stack. Care must be taken to write re-entrant functions — i.e. a function will give exactly the same result with one thread as it will with multiple threads running it at the same time.

The mpatrol library can be built as a thread-safe library with support for multi-threaded programs. When this library is linked with your program, only one thread at a time can allocate, reallocate or deallocate dynamic memory, or perform a memory operation via memcpy(), memset(), etc. This does not take full advantage of the potential concurrency in the library, but at least it will allow the debugging of multi-threaded programs.

The process of making the mpatrol library thread-safe was made more complicated by the fact that the mutexes protecting the library's data structures had to be recursive, since some of the functions that the library will call may call malloc() and free() or any other functions redefined by the library. If this was to happen with non-recursive mutexes then the recursive call would result in the thread attempting to lock a mutex that it already owned. However, implementing recursive mutexes was only half the problem.

The other problem with writing a thread-safe malloc library is that it must be initialised before the program becomes multi-threaded. If the library is initialised when there are multiple threads running then one thread may be attempting to initialise the mutexes whilst another thread may be attempting to lock an uninitialised mutex. Ideally, the best place to initialise the library would be at the start of main() but there is currently no way to do this other than getting users to explicitly plant calls to initialise the library in their code. This is not a very satisfactory solution if all we want to do is link in the replacement malloc library without any need for recompilation.

Fortunately, there are some ways to plant initialisation calls before main() is called, but they all have some drawbacks. The first way is to use a static file-scope constructor in C++, which will then initialise the mutexes and the library data structures before the code in main() is executed. However, on many systems this will require the final link to be performed by the C++ compiler that built the library. That may not be desirable or even possible in many cases. Unfortunately, this drawback appears in the second method, which involves using the GNU C compiler to compile the library. This compiler has an extension which allows functions to be specified as constructors which will be called before main(), but means that any program which is linked with the resulting library must be linked with the GNU C compiler driver. However, many systems are now GNU-based which would mean that this would happen anyway.

The final way of initialising the mutexes and the library data structures is to plant a call to the initialisation routines from a special section which the system will call before main() is called. This section is called the `.init' section on ELF-based platforms, but may exist in another form on other platforms too. This has the advantage that the system linker can be used to link the final program, but a possible disadvantage is that the library may be initialised too early, possibly before the environment or file streams have been set up. You may find that if one of the above methods does not work for you then perhaps another one will.


Footnotes

[1] Where the kernel is effectively a single process running all user programs as threads.