Although the OmniThreadLibrary treats communication as a superior approach to locking, there are still times when using “standard” synchronization primitives such as critical sections is unavoidable. As the standard Delphi/Windows approach to locking is very low-level, OmniThreadLibrary builds on it and improves it in some significant ways. All these improvements are collected in the OtlSync unit and are described in the following sections. The only exception is the waitable value class/interface, which is declared in the OtlCommon unit.
This part of the book assumes that you have a basic understanding of locking. If you are new to the topic, you should first read the appropriate chapters from one of the books mentioned in the introduction.
The most useful synchronisation primitive for multithreaded programming is indubitably the critical section.
OmniThreadLibrary simplifies sharing critical sections between a task owner and a task with the use of the
WithLock method. High-level tasks can access this method through the task configuration block.
I have always held the opinion that locks should be as granular as possible. Putting many small locks around many unrelated pieces of code is better than using one giant lock for everything, but programmers frequently use one or a few locks anyway, because managing many critical sections can be a bother.
Delphi implements critical section support with the TCriticalSection class, which must be created and destroyed explicitly in the code. (There is also a TRTLCriticalSection record, but it is only supported on Windows.) OmniThreadLibrary extends this implementation with an IOmniCriticalSection interface, which you only have to create; the compiler will make sure that it is destroyed automatically at the appropriate place.
IOmniCriticalSection uses a TCriticalSection internally and acts just as a proxy that calls the TCriticalSection functions. Besides that, it provides additional functionality by counting the number of times the critical section has been acquired, which can help a lot while debugging. This counter can be read through the LockCount property.
A critical section can be acquired multiple times from one thread. For example, the following code is perfectly valid:
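A minimal sketch of such nested locking, using the CreateOmniCriticalSection factory from OtlSync (the elided original example may differ):

```pascal
var
  cs: IOmniCriticalSection;
begin
  cs := CreateOmniCriticalSection;
  cs.Acquire;
  cs.Acquire; // acquiring again from the same thread does not deadlock
  // ... work with the shared data ...
  cs.Release;
  cs.Release; // every Acquire must be paired with a Release
end;
```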
IOmniCriticalSection doesn’t use
TCriticalSection directly, but wraps it into a larger object as suggested by Eric Grange.
Another TCriticalSection extension found in the OmniThreadLibrary is the
TOmniCS record. It allows you to use a critical section by simply declaring a record in an appropriate place.
With TOmniCS, locking can be as simple as this:
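A minimal sketch of the pattern (the elided original example may differ):

```pascal
var
  lock: TOmniCS; // no explicit initialization is needed

procedure UpdateSharedData;
begin
  lock.Acquire;
  try
    // ... access the shared data ...
  finally
    lock.Release;
  end;
end;
```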
TOmniCS is implemented as a record with one private field, ocsSync, holding the internal IOmniCriticalSection interface. The Release method merely calls the Release method on the internal interface, while the Acquire method is more tricky, as it has to initialize the ocsSync field first.
The initialization, hidden inside the Initialize method (which you can also call from the code to initialize the critical section explicitly), is quite tricky, because it has to initialize ocsSync only once and must work properly when called from two places (two threads) at the same time. This is achieved by using the optimistic initialization approach, described later in this chapter.
TOmniCS is a great simplification of the critical section concept, but it still requires you to declare a separate locking entity. If this locking entity is only used to synchronize access to a specific instance (be that an object, record, interface or even a simple type), it is often better to declare a variable/field of type Locked<T>, which combines any type with a critical section.
Using Locked<T>, the example from the TOmniCS section can be rewritten as follows.
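A sketch of how such a rewrite might look, assuming TGpIntegerList implements IGpIntegerList (the elided original example may differ):

```pascal
var
  lockedIntf: Locked<IGpIntegerList>;
begin
  // the Implicit operator wraps the interface into the Locked<T> record
  lockedIntf := TGpIntegerList.Create;
  lockedIntf.Acquire;
  try
    lockedIntf.Value.Add(42); // Value exposes the wrapped interface
  finally
    lockedIntf.Release;
  end;
end;
```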
The interesting fact to notice is that although lockedIntf is declared as a variable of type Locked<IGpIntegerList>, it can be initialized and used as if it were of type IGpIntegerList. This is accomplished by providing Implicit operators for conversion from T to Locked<T> and back. The Delphi compiler is (sadly) not smart enough to use this conversion operator in some cases, so you will still sometimes have to use the provided Value property. For example, you’d have to do so to release a wrapped object. (In the example above we have wrapped an interface, and the compiler itself handled the destruction.)
Besides the standard locking methods, Locked<T> also implements methods used for pessimistic locking, which is described later in this chapter, and two almost identical methods called Locked, which allow you to execute a code segment (a procedure, method or anonymous method) while the critical section is acquired. (In other words, you can be assured that the code passed to the Locked method is executed from only one thread at a time, provided that all code in the program properly locks access to the shared variable.)
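For illustration, a hedged sketch of calling such an overload, assuming the parameterless anonymous-method variant:

```pascal
// execute an anonymous method while the critical section is acquired
lockedIntf.Locked(
  procedure
  begin
    lockedIntf.Value.Add(42);
  end);
```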
There is an alternative, built into Delphi since Delphi 2009, which provides functionality similar to Locked<T> – TMonitor. In modern Delphis, every object can be locked by using the System.TMonitor.Enter function and unlocked by using System.TMonitor.Exit. The example above could be rewritten to use TMonitor without much work.
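The TMonitor-based equivalent would look roughly like this (a sketch; sharedList stands for any object instance guarding its own data):

```pascal
System.TMonitor.Enter(sharedList);
try
  sharedList.Add(42); // work with the object while it is locked
finally
  System.TMonitor.Exit(sharedList);
end;
```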
A reasonable question to ask is, therefore, why implement Locked<T> at all? Why is TMonitor not good enough? There are plenty of reasons for that.
TMonitor was buggy since its inception (although I believe that it may by now be stable enough for release code) and I don’t like to use it.
TMonitor doesn’t convey your intentions. Just by looking at the variable/field declaration you wouldn’t know that the entity is supposed to be used in a thread-safe manner. Using Locked<T>, however, explicitly declares your intent.
TMonitor.Enter/Exit doesn’t work with interfaces, records and primitive types.
A typical situation in a multithreaded program is the multiple readers/exclusive writer scenario. It occurs when there are multiple reader threads which can operate on the same object simultaneously, but must be locked out when an exclusive writer thread wants to make changes to this object. Delphi already implements a synchronizer for this scenario (TMultiReadExclusiveWriteSynchronizer from the SysUtils unit), but it is quite a heavyweight object which you can use in many different ways. For situations when the probability of collision is low and, especially, when the object is not locked for long periods of time, the TOmniMREW synchronizer may give you better performance.
To use the TOmniMREW synchronizer, readers must call EnterReadLock before reading the object and ExitReadLock when they don’t need the object anymore. Similarly, writers must call EnterWriteLock and ExitWriteLock.
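A minimal sketch of both sides of the protocol:

```pascal
var
  mrew: TOmniMREW; // a record; no explicit initialization is needed

// reader side
mrew.EnterReadLock;
try
  // ... read the shared object ...
finally
  mrew.ExitReadLock;
end;

// writer side
mrew.EnterWriteLock;
try
  // ... modify the shared object ...
finally
  mrew.ExitWriteLock;
end;
```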
I’d like to stress again the importance of not locking an object for a long time when using TOmniMREW. The Enter methods wait in a tight loop while waiting for the object to become available, which can quickly use lots of CPU time if the probability of collisions is high. (Collisions typically occur more often if an object is locked for extensive periods of time.)
Due to an optimized implementation that favours speed over safety, you’ll get a cryptic access violation error if the
TOmniMREW instance is destroyed while a read or write lock is taken.
To be clear, this is a programming error; you should never destroy a synchronization object while it holds a lock. It’s just that the error displayed will not make it very clear what you are doing wrong.
For example, the following test code fragment will cause an access violation.
Sometimes you want to instruct background tasks to stop whatever they are doing and quit. Typically, this happens when the program is shutting down. Programs using the “standard” multithreaded programming approach (i.e. TThread) solve this problem each in their own way, typically by using boolean flags or Windows events.
To make the task cancellation simpler and more standardized, OmniThreadLibrary introduces a cancellation token. A cancellation token is an instance of the
IOmniCancellationToken interface and implements functionality very similar to the Windows event synchronization primitive.
By default, a cancellation token is in a cleared (inactive) state. To signal it, the code calls the Signal method. A signalled token can be cleared by calling the Clear method.
The task can check the cancellation token’s state by calling the IsSignalled method or by waiting (using WaitForSingleObject or any of its variants) on the Handle property. The wait will succeed when the cancellation token is signalled.
An important part of the cancellation token implementation is that the same token can be shared between multiple tasks. To cancel all tasks, the code must only call
Signal once (provided that other parts of the program don’t call
Clear, of course).
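A minimal sketch of this pattern, using the CreateOmniCancellationToken factory from OtlSync (DoSomeWork is a placeholder; message-loop details are omitted):

```pascal
var
  token: IOmniCancellationToken;
begin
  token := CreateOmniCancellationToken;
  // ... pass the same token to all tasks ...
  token.Signal; // all tasks sharing the token observe the cancellation
end;

// inside a task:
while not token.IsSignalled do
  DoSomeWork;
```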
Cancellation tokens are used in both low-level and high-level multithreading. Low-level multithreading uses the CancelWith method to pass a cancellation token around, while high-level multithreading uses the task configuration block.
Cancellation is demonstrated in examples
The communication framework in the OmniThreadLibrary works asynchronously (you cannot know when a task or its owner will receive and process a message). Most of the time this works great, but sometimes you have to process messages synchronously (that is, you want to wait until the task processes the message), because otherwise the code gets too complicated. For those situations, OmniThreadLibrary offers a waitable value class, TOmniWaitableValue, which is also exposed as an interface, IOmniWaitableValue.
The usage pattern is quite simple. The caller creates an object or interface of that type, sends it to another thread (typically via Task.Comm.Send) and calls the WaitFor method. The task receives the message, does the processing and calls Signal to signal completion, or Signal(some_data) to signal completion and return some data. At that point, WaitFor returns and the caller can read the returned data from the Value property.
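A hedged sketch of this pattern (the message ID and the ProcessResult helper are illustrative):

```pascal
// caller side
var
  waitable: TOmniWaitableValue;
begin
  waitable := TOmniWaitableValue.Create;
  try
    task.Comm.Send(MSG_DO_WORK, waitable);
    waitable.WaitFor(INFINITE);       // blocks until the task signals
    ProcessResult(waitable.Value);    // data returned via Signal(some_data)
  finally
    waitable.Free;
  end;
end;

// task side, when MSG_DO_WORK is received:
//   waitable.Signal(42);
```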
A practical example should clarify this explanation. The two methods below are taken from the OtlThreadPool unit.
When some code wants to cancel a threadpooled task, it calls the Cancel function. This function sends a Cancel message to the worker task, passing along the ID of the task to be cancelled and a TOmniWaitableValue object. Then it waits for the object to become signalled.
The Cancel message handler in the worker task processes the message, does lots of complicated work (removed for clarity) and at the end calls the Signal method on the TOmniWaitableValue object to signal completion and return a boolean value.
Very soon after Signal is called, the WaitFor in the caller code exits and TOmniThreadPool.Cancel retrieves the result from the Value property.
A semaphore is a counting synchronisation object that starts at some value (typically greater than 0), which usually represents the number of available resources of some kind. To allocate a semaphore, one waits on it. If the semaphore count is greater than zero, the semaphore is signalled, the wait will succeed and the semaphore count is decremented by one. [Of course, all of this occurs atomically.] If the semaphore count is zero, the semaphore is not signalled and the wait will block until the timeout elapses or until another thread releases the semaphore, which increments the semaphore’s count and puts it into the signalled state.
While semaphores are implemented in the Windows kernel and Delphi wraps them in a pretty object (TSemaphore), Windows doesn’t support a useful variation on the theme – an inverse semaphore, also known as a countdown event.
An inverse semaphore differs from a normal semaphore by getting signalled when the count drops to zero. This allows another thread to execute a blocking wait that will succeed only when the semaphore’s count is zero. Why is that good, you may ask? Because it simplifies resource exhaustion detection. If you wait on an inverse semaphore and this semaphore becomes signalled, then you know that the resource is fully used.
The inverse semaphore is implemented by the TOmniResourceCount class, which implements the IOmniResourceCount interface. The initial resource count is passed to the constructor.
Allocate will block if this count is zero (and will unblock automatically when the count becomes greater than zero); otherwise it will decrement the count. The new value of the resource count is returned as the function result. [You should keep in mind that this number may no longer be valid by the time it is processed in the calling code, if other threads are using the same inverse semaphore.]
TryAllocate is a safer version of Allocate, taking a timeout parameter (which may be set to INFINITE) and returning a success/fail status as the function result.
Release increments the count and unblocks waiting Allocates. The new resource count (potentially invalid by the time the caller sees it) is returned as the result.
Finally, there is a Handle property exposing a handle which is signalled when the resource count is zero and unsignalled otherwise.
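A sketch of typical usage, assuming a pool of 4 resources (the names are illustrative):

```pascal
var
  resCount: TOmniResourceCount;
begin
  resCount := TOmniResourceCount.Create(4);
  try
    // each worker takes one resource ...
    resCount.Allocate;
    // ... and returns it when done
    resCount.Release;

    // another thread can block until the count drops to zero,
    // i.e. until all resources are in use:
    WaitForSingleObject(resCount.Handle, INFINITE);
  finally
    resCount.Free;
  end;
end;
```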
Initializing an object in a multithreaded world is not a problem – if the object is initialized in the context of a single thread. To put this into simple language – everything is fine if we can initialize the object first and then pass it to multiple tasks. In most cases, this is not a problem, but sometimes we want to pass an object to multiple tasks and then create it in one of the tasks (typically, in the first task that wants to use the object). While this may look like a weird approach to programming, it is a legitimate programming pattern, called lazy initialization.
The reason behind this weirdness is that sometimes we don’t know in advance whether an object (or some part of a composite object) will be used at all. If the probability that the object will be used is low enough, it may be a good idea not to initialize it in advance, as that would take some time and use some memory (or maybe even lots of memory).
Additionally, there may not be a good place to call the initialization from. A good example is the TOmniCS record, where we want to do implicit initialization the first time the Acquire method is called. As this record is usually just declared as a variable/field and not explicitly initialized, there is no better place to call the initialization code than from Acquire itself.
This part of the book will explain two well-known approaches to shared initialization – a pessimistic initialization and an optimistic initialization. There’s also a third approach – busy-wait – which you can read more about on my blog.
The difference between the two approaches is visible from the following pseudocode.
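The two patterns can be sketched like this (pseudocode; AtomicallyTestAndStore is a nonexistent helper, used only for illustration):

```pascal
// optimistic initialization
if not assigned(Shared) then begin
  tmp := TSharedObject.Create;
  if not AtomicallyTestAndStore(Shared, nil, tmp) then
    tmp.Free; // another thread has already initialized Shared
end;

// pessimistic initialization
if not assigned(Shared) then begin
  lock.Acquire;
  try
    if not assigned(Shared) then // retest under the lock
      Shared := TSharedObject.Create;
  finally
    lock.Release;
  end;
end;
```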
An optimistic initializer assumes that there’s hardly a chance of initialization being called from two tasks at the same time. Under this assumption, it is fastest to initialize the object (in the code above, the initialization is represented by creation of the shared object) and then atomically copy this object into the shared field/variable. The (nonexistent) AtomicallyTestAndStore method compares the current value of Shared with nil and stores the new object only if Shared is nil. It does all this in a way that prevents the code from being executed from two threads at the same time. If AtomicallyTestAndStore fails (returns False), another task has already modified the Shared variable and we must destroy the temporary resource.
The advantage of this approach is that there is no locking, so we don’t have to create an additional critical section. Only CPU-level bus locking is used to implement AtomicallyTestAndStore. The disadvantage is that duplicate objects may be created at some point.
A pessimistic initializer assumes that there’s a high probability of initialization being called from two tasks at the same time, and uses an additional critical section to lock access to the initialization part. A test, lock, retest pattern is used for performance reasons – the code first checks whether the shared object is initialized, then (if it is not) locks the critical section and retests the shared object, as another task could have initialized it in the meantime.
The advantage of this approach is that only a single object is created. The disadvantage is that we must manage an additional critical section that will be used for locking.
It is unclear which approach is better. Although locking slows the application down more than microlocking, creating duplicate resources may slow it down even more. On the other hand, the pessimistic initializer requires an additional lock, but that won’t make much difference if you don’t create millions of shared objects. In most cases the initialization code will be called rarely, and the complexity of the initializer will not change the program’s performance in any meaningful way, so the choice of initializer will mainly be a matter of personal taste.
While pessimistic initialization doesn’t present any problem for a skilled programmer, it is bothersome, as we must manage an additional locking object. (Typically that will be a critical section.) To simplify the code and to make it more intentional, OmniThreadLibrary introduces the Locked<T> type, which wraps any type (the type of your shared object) together with a critical section.
An instance of the Locked<T> type contains two fields – one holding your data (FValue) and another containing a critical section (FLock). (The latter is, as a matter of fact, initialized with an optimistic approach.)
Locked<T> provides two overloaded helper functions (both named Initialize) which implement the pessimistic initialization pattern.
The first version accepts a factory function which creates the object. The code implements the test, lock, retest pattern explained previously in this section.
Another version, implemented only in Delphi 2010 and newer, doesn’t require a factory function but calls the default (parameterless) constructor. This is, of course, only possible if the T type represents a class. Actually, this method simply calls the other version and provides a special factory method which traverses the extended RTTI information, selects an appropriate constructor and executes it to create the shared object.
Locked<T> implements a few other helpers. Acquire and Release allow you to manage locking manually; they simply call the appropriate TOmniCS function. There are also two variations of the Locked function which lock the critical section, call your code and unlock the critical section.
Optimistic initialization is supported with the Atomic<T> class, which is much simpler than the pessimistic Locked<T>. As with Locked<T>, there are two Initialize functions, one creating the object using a user-provided factory function and another using RTTI to call the default parameterless constructor. We’ll only examine the former.
The code first checks if the storage is already initialized, by using a weird cast which assumes that T is pointer-sized. This is a safe assumption, because Atomic<T> only supports T being a class or an interface.
Next, the code checks whether the shared object and the temporary variable are properly aligned. In most cases this should not present a problem, as all ‘normal’ fields (ones not stored in packed record types) should always be appropriately aligned.
After that, the factory function is called to create an object.
Next, InterlockedCompareExchangePointer is called. It takes three parameters – a destination address, exchange data and a comparand. The functionality of the call can be represented by the following pseudocode:
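A reconstruction of that pseudocode (the real API executes this atomically in hardware):

```pascal
// pseudocode – NOT the actual implementation
function InterlockedCompareExchangePointer(
  var destination: pointer; exchange, comparand: pointer): pointer;
begin
  Result := destination;           // always return the old value
  if destination = comparand then
    destination := exchange;       // swap only if the comparison succeeds
end;
```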
The trick here is that this code is all executed inside the CPU, atomically. The CPU ensures that the destination value is not modified (by another CPU) during the execution of the operation. It is hard to understand (interlocked functions always make my mind twirl in circles), but basically it reduces to two scenarios: if storage contains nil (the old, uninitialized value), storage is set to the new object; if storage already contains a value, storage is not modified.
In yet other words – InterlockedCompareExchangePointer either stores the new value in storage and returns nil, or does nothing, leaves the already initialized storage intact and returns something other than nil.
At the end, the code handles two specific cases. If T is an interface type and the initialization was successful, the temporary value in tmpT must be replaced with nil. Otherwise two variables (storage and tmpT) would own an interface with a reference count of only 1, which would cause big problems. If T is a class type and the initialization was not successful, the temporary object stored in tmpT must be destroyed.
Initialize returns the same shared object twice – once in the storage parameter and once as the function result. This allows us to write very space-efficient initializers, like in the example below, taken from the OtlParallel unit.
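The OtlParallel code is not reproduced verbatim here; the following hedged sketch merely illustrates the shape of such a space-efficient initializer (the class, field and collection names are illustrative):

```pascal
function TExampleWorker.GetOutput: IOmniBlockingCollection;
begin
  // creates FOutput on first use, returns the shared instance thereafter
  Result := Atomic<IOmniBlockingCollection>.Initialize(FOutput,
    function: IOmniBlockingCollection
    begin
      Result := TOmniBlockingCollection.Create;
    end);
end;
```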
When a new instance of the shared object is created by calling the default Create constructor, you can use the two-parameter version of Atomic to simplify the code. [3.06]
This is only supported in Delphi XE and newer.
For example, if the shared object is stored in shared: IMyInterface and is created by calling TMyInterface.Create, you can initialize it with a single statement.
A common scenario in parallel programming is that the program has to wait for something to happen. The occurrence of that something is usually signalled with an event.
On Windows, this is usually accomplished by calling one of the functions from the WaitForMultipleObjects family. While they are pretty powerful and quite simple to use, they also have a big limitation – one can only wait for up to 64 events at the same time.
Windows also offers a RegisterWaitForSingleObject API call which
can be used to circumvent this limitation. Its use is, however, quite
complicated. To simplify the programmer’s life, OmniThreadLibrary introduces the
TWaitFor class, which allows the code to wait on any number of events.
To use TWaitFor, you have to create an instance of this class and pass it
an array of handles, either as a constructor parameter or by calling the
SetHandles method. All handles must be created with the
You can then wait for any (WaitAny) or all (WaitAll) events to become
signalled. In both cases the Signalled array is filled with information
about the signalled (set) events. The Signalled property is an array of
THandleInfo records, each of which (currently)
contains only one field – an index (into the handles array) of the signalled
event.
For example, if you want to wait on two events and then react to them, you should use the following approach:
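A hedged sketch of that approach (the exact return type of WaitAny and the declaration site of THandleInfo are assumptions; handlers are placeholders):

```pascal
var
  waiter: TWaitFor;
  info: TWaitFor.THandleInfo;
begin
  waiter := TWaitFor.Create([event1, event2]);
  try
    waiter.WaitAny(INFINITE);
    // react to every event that was signalled
    for info in waiter.Signalled do
      case info.Index of
        0: HandleFirstEvent;
        1: HandleSecondEvent;
      end;
  finally
    waiter.Free;
  end;
end;
```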
You don’t have to recreate
TWaitFor for each wait operation; it is perfectly
OK to call the
WaitXXX functions repeatedly on the same object. It is also fine
to change the array of handles between two
WaitXXX calls by calling the SetHandles method.
The WaitAny method also comes in a variant which processes Windows messages,
I/O completion routines and APC calls. (Its flags parameters are the same as the
corresponding parameters to the MsgWaitForMultipleObjectsEx API.)
The use of the
TWaitFor is shown in demo
TOmniLockManager solves a very specific problem – how to synchronize
access to entities of any type. In a way, it is similar to
TMonitor.Enter/TMonitor.Exit, except that it
works on all types, not just on objects.
The following requirements are implemented in the lock manager: a thread calls
Lock to get exclusive access to a key and calls
Unlock to release the key back to public use.
When the number of Unlock calls in one thread matches the number of
Lock calls, the key is unlocked. (In other words, if you call
Lock twice with the same key, you also have to call
Unlock twice to release that key.)
The public TOmniLockManager<K> class implements the
IOmniLockManager<K> interface. The Lock function returns
False if it fails to lock the key in the specified
timeout; both finite timeouts and INFINITE are supported.
There’s also a
LockUnlock function which returns an interface that
automatically unlocks the key when the interface is released. This interface also
implements an Unlock function which unlocks the key.
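A hedged sketch of typical usage (the string key type, the key value and the 1000 ms timeout are illustrative):

```pascal
var
  lockMgr: TOmniLockManager<string>;
begin
  lockMgr := TOmniLockManager<string>.Create;
  try
    if lockMgr.Lock('customer-42', 1000) then
    try
      // exclusive access to everything associated with this key
    finally
      lockMgr.Unlock('customer-42');
    end;
  finally
    lockMgr.Free;
  end;
end;
```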
A practical example of using the lock manager is shown in demo
For debugging purposes, OmniThreadLibrary implements the
TOmniSingleThreadUseChecker record. It gives the programmer a simple way
to check that some code is always executed from the same thread.
Using it is simple – first declare a variable/field of type
TOmniSingleThreadUseChecker in the context that has to be checked, and then call the
Check (or DebugCheck) method of that variable whenever you want to verify that some part of the code is
not used from more than one thread.
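A minimal sketch (the class and field names are illustrative):

```pascal
type
  TDataProcessor = class
  private
    FThreadCheck: TOmniSingleThreadUseChecker;
  public
    procedure ProcessItem;
  end;

procedure TDataProcessor.ProcessItem;
begin
  // raises an exception if called from a different
  // thread than the previous calls
  FThreadCheck.Check;
  // ... process the item ...
end;
```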
The difference between Check and
DebugCheck is that the latter can be
disabled during compilation. It implements the check only if the conditional
symbol OTL_CheckThreadSafety is defined. Otherwise, it compiles to
no code and does not affect the execution speed.
In cases where you do want to use such an object from more than one thread (for example,
if you use it from a task and then from the task controller after the task terminates),
you can call
AttachToCurrentThread to associate the checker with the current thread.