Many people choose to connect directly from a kdb+ instance to a data source such as Activ Financial or Bloomberg via the kdb+ C API. In this discussion we will create a dynamically loaded shared object within kdb+ and implement the following three design patterns:
- A single-threaded application that reads data directly from the source onto the kdb+ main thread
- A dual-threaded application using a socket pair to transfer information safely between a dedicated exchange thread and the kdb+ main thread
- An extension of the socket pair method that circumvents certain issues associated with transferring data through a socket and allows for conflation of updates
By the end of this blog the reader should have a basic understanding of some of the options available for building a market data interface that ingests data into a kdb+ tickerplant or related application.
Outline of Setup
This blog assumes the use of a recent version of kdb+ on a standard Linux server. Relevant code can be downloaded from the AquaQ GitHub repository. In each example, a simple “exchange” is mimicked by calling the routine ‘startserver’ with the appropriate arguments. The implementation of ‘startclient’ varies between examples, and a utility function ‘getstats’ prints metrics of the client’s behaviour with respect to messages read and processed. Each of these functions is exposed as part of a shared library. The library can be built by following the instructions in the README located within the repository. The README also describes the functions’ arguments and contains instructions to recreate the output associated with the three setups below.
The single-threaded setup is the most straightforward method and the easiest to code. The server simply writes data to a socket once it receives a connection, while the client reads and processes the data as it is sent. This design pattern performs satisfactorily when the time taken to process an individual message is small.
However, as the message rate increases or the processing time of a single message grows, the design will suffer issues, namely the connection socket filling up. In the real world this would result in the server disconnecting the client. The steps to reproduce these examples are shown in the “Design A – Simple Callback” section of the README.
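The socket filling up can be illustrated without kdb+ at all. The following is a minimal sketch, assuming a Unix socket pair stands in for the exchange connection and that the client never reads; the helper names `fill_until_full` and `stalled_client_capacity` are hypothetical and are not taken from the repository.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send fixed-size messages on a non-blocking socket
   until the kernel buffer refuses more. Returns the number of whole
   messages accepted, or -1 on an unexpected error. */
long fill_until_full(int fd, size_t msg_size)
{
    char msg[256] = {0};
    long sent = 0;
    assert(msg_size > 0 && msg_size <= sizeof msg);
    for (;;) {
        ssize_t n = send(fd, msg, msg_size, 0);
        if (n < 0) {
            if (errno == EINTR) continue;              /* interrupted: retry */
            if (errno == EAGAIN || errno == EWOULDBLOCK)
                return sent;                           /* buffer is full */
            return -1;
        }
        if ((size_t)n < msg_size) return sent;         /* partial write: full */
        sent++;
    }
}

/* Stand-in for an exchange whose client is too busy to read: returns how
   many messages fit in the socket before the sender would block. */
long stalled_client_capacity(size_t msg_size)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;

    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    long sent = fill_until_full(sv[0], msg_size);      /* sv[1] never reads */
    close(sv[0]);
    close(sv[1]);
    return sent;
}
```

Calling `stalled_client_capacity(64)` returns a finite, system-dependent count. Once that many messages are outstanding, a real server would have to block, drop data or disconnect the client.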
A further design limitation of the previous approach is that all the work required to deal with a message packet from the exchange must be completed by one thread. If the message structure is complex this can be problematic and lead to latency. Furthermore, micro-bursts of messages can cause server-side message overflow or loss, depending on the connection protocol.
A dual-threaded design pattern alleviates these issues by dedicating a thread to read messages from the exchange or source, leaving the main kdb+ thread to process the message once read from the socket pair. This has the advantage of spreading the workload of the application over multiple cores and minimising the data build-up on the exchange connection.
The benefits of this design can be seen in the “Design B – Dual-thread consumer” section of the README. When a slow data processing callback is defined, ‘getstats’ shows that messages are drained from the exchange connection as soon as they arrive and held in an internal buffer, awaiting processing by the kdb+ main thread.
This setup means that micro-bursts of messages can be buffered. One drawback is that the design incurs additional overhead: because K objects cannot be shared directly between threads without potential issues, each message must be serialised, written to the socket pair, read back and deserialised.
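The hand-off between the two threads can be sketched in plain POSIX C. This is an illustration under stated assumptions rather than the repository's code: `msg_t` is a hypothetical fixed-size message standing in for a serialised update, a second socket pair plays the exchange, and the demo publishes only a small number of messages so that everything fits in the socket buffers. In kdb+ the read end of the socket pair would typically be registered with the main loop (for example via `sd1`) instead of being read in a loop.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical fixed-size message standing in for a serialised update. */
typedef struct { uint64_t seq; double price; } msg_t;

typedef struct { int exchange_fd; int pair_wr_fd; } reader_args_t;

/* Read exactly len bytes; returns 0 on success, -1 on EOF or error. */
int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes; returns 0 on success, -1 on error. */
int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Exchange thread: drain the exchange connection as fast as possible and
   forward every message to the socket pair. */
void *exchange_reader(void *arg)
{
    reader_args_t *a = arg;
    msg_t m;
    while (read_full(a->exchange_fd, &m, sizeof m) == 0)
        if (write_full(a->pair_wr_fd, &m, sizeof m) != 0) break;
    close(a->pair_wr_fd);            /* EOF tells the main thread we are done */
    return NULL;
}

/* Demo: returns how many messages the main thread received in order.
   count must be small enough to fit in the socket buffers. */
int run_dual_thread_demo(uint32_t count)
{
    int exch[2], pair[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, exch) != 0) return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair) != 0) return -1;

    reader_args_t args = { exch[1], pair[0] };
    pthread_t tid;
    if (pthread_create(&tid, NULL, exchange_reader, &args) != 0) return -1;

    for (uint32_t i = 0; i < count; i++) {       /* mock exchange publishes */
        msg_t m = { i, 100.0 + i };
        if (write_full(exch[0], &m, sizeof m) != 0) break;
    }
    close(exch[0]);                              /* EOF stops the reader */

    int received = 0;
    msg_t m;
    while (read_full(pair[1], &m, sizeof m) == 0) {   /* "main thread" side */
        if (m.seq != (uint64_t)received) break;
        received++;
    }
    pthread_join(tid, NULL);
    close(exch[1]);
    close(pair[1]);
    return received;
}
```

Note that every message crosses two sockets and is copied at each step, which is the overhead described above.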
Dual-Threaded Setup with Circular Memory Buffer
The previous design pattern represents a good outline for a basic feed handler that can deal with the fluctuating message rates experienced in real-life market data feeds and shield the external connection from any processing lag. However, the socket pair design has some disadvantages: the exact settings of the socket pair may not be easily configurable at the application level, and reading and writing relatively large payloads to and from the intermediate socket adds overhead.
In this final design we still use a socket pair, but on receipt of each raw message we write only a single byte to trigger the main callback, and push the main data payload to an area of memory that has been pre-allocated during the initialisation of the application. Furthermore, in this example we have extended the callback to drain all “pending messages” from the buffer, enabling the kdb+ processing to conflate or look ahead in the message queue. This could be invaluable if the application is used for real-time trading, as messages could be processed in bulk at exactly the time when message load is largest, or certain message types could be ignored altogether.
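The mechanism described above can be sketched as a single-producer, single-consumer ring buffer paired with a one-byte wake-up. This is a hedged illustration, not the repository's implementation: `ring_t`, `update_t`, `RING_SLOTS` and the function names are hypothetical, the capacity is an arbitrary power of two, and the non-blocking `MSG_DONTWAIT` flag assumes Linux.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

#define RING_SLOTS 1024              /* hypothetical; must be a power of two */

typedef struct { long seq; double price; } update_t;

typedef struct {
    update_t slots[RING_SLOTS];      /* pre-allocated payload area */
    atomic_size_t head;              /* next slot to write (exchange thread) */
    atomic_size_t tail;              /* next slot to read (main thread) */
    int notify_wr, notify_rd;        /* socket pair used only for wake-ups */
} ring_t;

int ring_init(ring_t *r)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
    r->notify_wr = sv[0];
    r->notify_rd = sv[1];
    return 0;
}

/* Exchange thread: store the payload in the ring, then send one byte to
   wake the main thread. Returns 0 on success, -1 if the ring is full. */
int ring_publish(ring_t *r, const update_t *u)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS) return -1;
    r->slots[head & (RING_SLOTS - 1)] = *u;
    atomic_store_explicit(&r->head, head + 1, memory_order_release);

    char byte = 1;
    /* If the socket is already full a wake-up is pending anyway, so a
       failed non-blocking send is deliberately ignored. */
    (void)send(r->notify_wr, &byte, 1, MSG_DONTWAIT);
    return 0;
}

/* Main-thread callback: swallow pending wake-up bytes, then copy out every
   queued update (up to max). Returns the number of updates copied. If it
   returns max, the caller should call again, as more may be queued. */
size_t ring_drain(ring_t *r, update_t *out, size_t max)
{
    char wake[256];
    (void)recv(r->notify_rd, wake, sizeof wake, MSG_DONTWAIT);

    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;
    while (tail != head && n < max) {
        out[n++] = r->slots[tail & (RING_SLOTS - 1)];
        tail++;
    }
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

Only one byte per message crosses the socket, so the socket pair's buffer settings matter far less, and a single callback can see the whole backlog at once.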
The benefits of this design can be seen in the final section of the README. The callbacks to the main thread can contain multiple updates from the exchange. In practice the feed handler callback could be coded to be intelligent enough to process multiple updates in a batch, or simply to ignore unimportant updates. A real-world example of this would be updating a last value cache of quote prices from multiple providers. This design has been implemented with a common data source (www.solace.com) and is publicly available.
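As an illustration of the last value cache example, the sketch below conflates a drained batch of quotes so that only the newest quote per symbol and provider is applied. All names here (`quote_t`, `lvc_t`, `lvc_apply_conflated`, `MAX_KEYS`) are hypothetical, and a linear search is used purely to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_KEYS 64                  /* hypothetical cache capacity */

typedef struct { char sym[8]; char provider[8]; double bid; double ask; } quote_t;
typedef struct { quote_t rows[MAX_KEYS]; size_t count; } lvc_t;

static int same_key(const quote_t *a, const quote_t *b)
{
    return strncmp(a->sym, b->sym, sizeof a->sym) == 0 &&
           strncmp(a->provider, b->provider, sizeof a->provider) == 0;
}

/* Find the row for q's key, creating it if needed.
   Returns MAX_KEYS if the cache is full. */
static size_t lvc_slot(lvc_t *c, const quote_t *q)
{
    for (size_t k = 0; k < c->count; k++)
        if (same_key(&c->rows[k], q)) return k;
    if (c->count == MAX_KEYS) return MAX_KEYS;
    c->rows[c->count] = *q;
    return c->count++;
}

/* Apply a batch given in arrival order. The batch is walked newest-first,
   so older quotes for a key already updated in this batch are skipped.
   Returns the number of quotes actually applied. */
size_t lvc_apply_conflated(lvc_t *c, const quote_t *batch, size_t n)
{
    unsigned char touched[MAX_KEYS] = {0};
    size_t applied = 0;
    for (size_t i = n; i-- > 0; ) {
        size_t k = lvc_slot(c, &batch[i]);
        if (k == MAX_KEYS) continue;     /* cache full: drop this quote */
        if (touched[k]) continue;        /* superseded by a newer quote */
        c->rows[k] = batch[i];
        touched[k] = 1;
        applied++;
    }
    return applied;
}

/* Look up the cached quote for a symbol and provider, or NULL if absent. */
const quote_t *lvc_find(const lvc_t *c, const char *sym, const char *provider)
{
    for (size_t k = 0; k < c->count; k++)
        if (strncmp(c->rows[k].sym, sym, sizeof c->rows[k].sym) == 0 &&
            strncmp(c->rows[k].provider, provider,
                    sizeof c->rows[k].provider) == 0)
            return &c->rows[k];
    return NULL;
}
```

With a batch of four quotes in which one symbol and provider pair appears twice, only three updates are applied and the cache holds the later of the two prices.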