Threading API Design

Thread: Definition and Explanation

Basic Threading and Rationale

A thread is a path of execution through a program. A single-threaded application has only one path of execution. It starts at main() and ends when main() ends (yes, these aren't fully true, but the concept doesn't change).

In a multithreaded system, a thread can spawn other threads. Effectively, it calls a particular function, but calls this function in a separate path of execution. The two paths then execute concurrently. Effectively concurrently. If multiple processors, or hyperthreading, is not available, then the OS will interrupt the execution of one thread to allow another to run. The time during which a thread is run between the execution of other threads is known as its timeslice. A multithreading OS is responsible for deciding how long a timeslice is and when it ends. Ending a timeslice and beginning another one for a different thread is called task-switching. This operation can take some time.

Simply looking at this paradigm initially, one might think that threads aren't very useful. After all, unless the machine has multiple CPU's or hyperthread-enabled CPU's, the OS has to take time to perform task switching. In a single-threaded app, the OS wouldn't have to do this; it could let your thread run as normal. However, there are some uses for multithreading.

It can make input more responsive in a GUI application. If the user performs some operation that will take a non-trivial amount of time, the program can simply spawn a thread to do the task, and then return to processing the GUI. By doing so, the GUI is more responsive. It gets back to processing user input faster. Now, the entire process of performing the operation gets slower, because each thread increases the time that an operation takes slightly. But, the application seems faster to the user because it returns control much faster than if the process had to cause the user to wait.

Asynchronous file access is another place where threading can be useful. Reading even a byte from a harddrive takes a massive quantity of time. It takes so much time that most OS's halt the thread's execution until the operation completes. One can take advantage of this by doing all disk access in a special thread. This thread waits for the information to arrive while the application goes on and does something useful.

Cross-Thread Communication

Because threads are a different execution path through the same code, it is possible for two threads to touch the same memory or resource at the same time. As long as all threads are doing read operations, this is fine. However, as soon as one thread may be writing to a resource that another thread is reading from, a potential problem occurs. This is called a "race condition".

The reason the problem comes about is because the OS is free to interrupt the execution of a thread, to end its timeslice, at any time. Between any 2 machine opcodes, the OS can task-switch to another thread. If one thread happens to be in the middle of changing a structure, and it gets switched out, another thread may start reading the half-finished structure. This problem is typically referred to as "thread synchronization"

There are two ways to get around this synchronization problem. One way is to only use "atomic" operations on shared resources. An operation is atomic if it is guaranteed to take only 1 opcode, or is guaranteed by the operating system to be atomic (no task switching while it is going on). Unfortunately, something as simple as "++number" can take a number of cycles, so it is technically not atomic.

There is a theoretical third mechanism that can be used to solve this problem. The OS could be told to prevent task-switching until the OS is told to re-enable it. However, this is a very poor method for preventing race conditions.

This brings us to a second method: semaphores. A semaphore is an object, created by the OS, that can be locked and unlocked. The user can use them to manage access to objects. When a thread calls the semaphore lock command, if the semaphore is unlocked, then it is locked, and the thread continues on. If the semaphore is already locked, then the thread will suspend execution until the thread that has the lock unlocks the semaphore. This suspension is typically known as "blocking" on the semaphore.

Technically, this is not quite enough to solve the problem; it merely moves the race condition from some arbitrary object to the semaphore lock command. However, this is sufficient to solve the problem if the semaphore lock command is atomic. Fortunately, multithreading OS's guarantee that locks are atomic operations, so this all works out.

The penalty for that guarantee is that locking a semaphore is not free. Locking a semaphore requires a call to a not-entirely-fast OS function, as does unlocking it. As such, there is a performance warning for using too many semaphore objects.

Also, note that it is upon the user's shoulders to manage semaphores. A semaphore is not directly bound to a resource; it is a freestanding object. It is only the conventions that the user puts into the program that binds it to certain resources.

AllegroPro Thread Design

Threading is a feature of the OS. This system is initialized when AllegroPro is initialized. This is due to the fact that the threading system can be used by any number of systems in AllegroPro. Because this is part of the AllegroPro core, there is no such thing as a thread "driver".

Working with Threads

Spawning Threads

A thread is created by calling an API function and passing it a function pointer. This points to a method that functions as the main() procedure for that thread.

A thread can be assigned a priority. This is a hint to the OS as to how often to run the thread and how big a timeslice to give it. While the actual meaning of these values is dependent on the implementation, the relative meanings are as follows. There are 5 priorities:

Very Low: This thread will rarely get a timeslice.
Low: This thread will get less time than normal threads.
Normal: The best priority for most applications.
High: This thread will get a large portion of CPU time. This may impact the performance of other threads.
Extreme: This thread will get the vast majority of the CPU time. Using this could impact the performance of other applications. It is usually quite rude to use this priority.

A thread can be created in a suspended state, as defined below.

The API that created the thread will return a thread object. This object is used to refer to the thread in various API contexts.

Thread Management

Sometimes, it is the case that a thread would like to spend less time processing itself. A thread is therefore capable of yielding the rest of its timeslice back to the OS. This, effectively, shortens the timeslice of the current thread. There will be an API for the current thread to yield the timeslice back to the OS. Blocking on a semaphore can accomplish the same task, and, since the thread controls this behavior, it can prevent itself from tying up valuable semaphore objects.

A thread can, also, choose to suspend itself. While suspended, it does not get a timeslice. Another thread, the one that spawned it, must unsuspend it. In general, it is preferable to use a semaphore for this kind of communication. It is much easier to simply let a thread block on some semaphore than to suspend it.

In a thread, the user can query the thread object for that thread. With a thread object, the user can change the priority of that thread.

Blocking on Threads

Occasionally, the need arises for a thread to block until another thread completes. There will be an API for this.

Thread Death

A thread dies when it returns from its main procedure. Thread objects that refer to this thread are still valid objects, though most API functions no longer work on them, and will cause errors if they are called. These objects can be explicitly destroyed via an API function; this cleans up AllegroPro memory associated with the thread.

When the main procedure of a thread returns, it returns an error code. Once the thread has died, the user can query the returned error code by using the thread object, which is why they stick around. The user may delete a thread object without terminating the thread. However, without the thread object, the user may not use API functions that require the use of a thread object, such as collecting return values and so forth.

Forcing Thread Death

It is possible to kill a thread outright by calling an API function on the thread object. However, doing this can have very unexpected consequences. For example, if a thread was holding onto a semaphore lock when it was killed, it will never release it. And since the lock can only be released by the thread that locked it, the semaphore will never become unlocked. Thus any threads blocked on the lock will never execute.

This API should be used for killing threads when the program itself is terminating.

Primary Thread

The primary thread is the thread that initialized AllegroPro itself. This thread cannot be terminated manually, though it can be suspended. Some components of the base AllegroPro system may have specific requirements about which thread they are called from.

Thread Synchronization

AllegroPro's thread API will provide for a few synchronization objects.

Mutual Exclusive (mutex)

A mutex is the must fundamental of all semaphore types. It is the basic lock/unlock semaphore. It specifies a "mutually exclusive" resource; one that can only be accessed exclusively from one thread at a time.

A thread may lock a semaphore. If the semaphore is unlocked, then the thread now possesses the lock until it unlocks it. If the semaphore is locked, then the thread will not process further until the semaphore becomes unlocked.

If multiple threads are blocked on a semaphore that is unlocked, they will get the lock in an unspecified, though "reasonable" fashion.

Locking and unlocking are guaranteed to be atomic operations. Locking operations are allowed an approximate timeout value, so that if this value is exceeded, the thread will resume.

Signal

A signal is a semaphore that is used to communicate messages between threads. A thread can lock a signal. This expresses the intension of the thread to block until a signal is raised. Any number of threads can block on the same signal.

There are 2 types of singals: single release, or multiple release. When a signal is created, one of these must be specified. If single release is specified, then calling the function to raise a signal only releases one of the blocked threads. If multiple release is specified, then calling the function will release all blocked threads.

Multiple locking

There is a potential problem when multiple mutex's are used to lock resources. If thread A needs to lock X and Y, and so does thread B, a problem can occur. Let us suppose we have the following:

Thread A:

lock(MutexX)
lock(MutexY)
//Process resources for X and Y
unlock(MutexX)
unlock(MutexY)

Thread B:

lock(MutexY)
lock(MutexX)
//Process resources for X and Y
unlock(MutexX)
unlock(MutexY)

This can cause a serious problem, but one that will rarely occur. If thread A is task-switched out just after locking X, and thread B locks Y, then both threads will block on each other. In order for thread B to unlock Y, and allow A to run, it must be able to lock X. Unfortunately, in order for that to happen thread B must unlock Y so that thread A can proceed.

The correct way to deal with this problem is to make sure that all multiple locking operations happen in a specific order. Had Thread B done instead:

lock(MutexX)
lock(MutexY)
//Process resources for X and Y
unlock(MutexX)
unlock(MutexY)

The order itself doesn't matter, as long as there is an order.

As such, to provide for this AllegroPro will allow for the creation of multi-lock objects. These objects are created with a set number of mutex objects in mind. The user adds the mutex objects that he or she wants. These objects can be locked and unlocked. These objects guarantee that this condition will not happen even if different objects are locked/unlocked.

The manor in which they do this is simple. Every set of pointers has an implicit ordering; simply order the pointers as though they were integers. Since the absolute ordering isn't really important, only that the set of all mutex's has a single ordering, this works just fine.

This API is for mutex semaphore objects only. Using this API for signal semaphores is not possible, because it is impossible to fire two events simultaneously. When an event is fired, it is instantaneous and temporary.