kdb+ v3.4 introduces a new broadcast feature that reduces the work done when publishing messages to multiple subscribers. Whenever a message is sent from one process to another via IPC (Inter Process Communication), the message is serialised before being sent and then deserialised when received. Under normal IPC, if the same message is sent to multiple processes, it is serialised separately each time.
The broadcast function (-25!) allows the message to be serialised once and sent to multiple subscribers. When passed a two-item list of (list of handles; message), it serialises the message once and sends it asynchronously to each of the handles provided. For the simplest case of one publisher and one subscriber, there shouldn’t be any performance improvement, as the message is serialised once either way. As the number of subscribers increases, the savings should become more apparent.
-25!(HANDLES;MESSAGE)
-25!(5 6 7i;(`upd;`trade;tradedata))
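To make the contrast concrete, here is a minimal sketch of the two publishing styles side by side. The function names pubNormal and pubBroadcast are hypothetical (they are not part of kdb+tick), and both assume that hs is a list of already-open int handles.

```q
/ Hypothetical helpers, not part of kdb+tick; hs is assumed to be a list of open int handles.

/ Normal publish: one async send per handle, so msg is serialised once per subscriber
pubNormal:{[hs;msg] {[m;h] neg[h] m}[msg] each hs;}

/ Broadcast publish: msg is serialised once and sent asynchronously to every handle
pubBroadcast:{[hs;msg] -25!(hs;msg);}

/ Example calls (need live handles, so left commented out):
/ pubNormal[5 6 7i;(`upd;`trade;tradedata)]
/ pubBroadcast[5 6 7i;(`upd;`trade;tradedata)]
```

Both functions deliver the same message to the same subscribers; only the amount of serialisation work on the publisher differs.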
The schema we will be using here will be the same schema used throughout the analysis. Each row will be approximately 28 bytes.
q)meta trade
c    | t f a
-----| -----
time | p
sym  | s
price| f
size | i
stop | b
cond | c
ex   | c
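A schema with this meta could be declared as below. The sketch also includes a hypothetical data generator, gen, whose symbols and value ranges are made up purely for illustration, along with one way to estimate the serialised size per row using -22! (which returns the length of the serialised form of its argument).

```q
/ Empty table matching the meta output above
trade:([]time:`timestamp$();sym:`symbol$();price:`float$();size:`int$();stop:`boolean$();cond:`char$();ex:`char$())

/ Hypothetical generator: n rows of random data (symbols and ranges are illustrative only)
gen:{[n]([]time:.z.p+til n;sym:n?`AAPL`MSFT`GOOG`IBM;price:n?100f;size:n?1000i;stop:n?0b;cond:n?" ABC";ex:n?"NOL")}

/ Rough serialised bytes per row for a bulk update (includes a small fixed header)
tradedata:gen 10000
bytesPerRow:(-22!tradedata)%count tradedata
```

The per-row figure depends on the length of the symbols in the data, since symbols are serialised as null-terminated strings.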
We are going to investigate the difference in CPU time and memory usage between broadcast and normal publishing. We will increase the subscriber count whilst keeping the data size and publish interval constant. The two cases we are going to test are:
- large bulk updates, sent infrequently. 10,000 rows published at 200ms intervals.
- smaller updates, sent frequently. 50 rows published at 50ms intervals.
The two cases were chosen to highlight how the difference between broadcast and normal publishing depends on the size of the update. The publish interval won’t have much of an effect on the test, as there will be no slow subscribers and the publishing duration should not exceed the interval. The range of subscribers these cases will be tested against is: 1 2 4 8 15 30 50 75 100. It is weighted towards the lower end as this will be the more common number of subscribers in typical pub/sub setups.
We launch one feed, one tickerplant and N subscribers. The default kdb+tick library is used with modifications made to the publishing function, .u.pub (this is now available in TorQ). The modifications override .u.pub with the broadcast function and add immediate flushes, which are only required for this test in order to accurately measure time and memory per publication. Every subscriber is subscribed to all of the symbols. For the time being we only care about the performance of the publish function, not the latency between publisher and subscriber.
The data we are measuring is the elapsed time to publish and the amount of memory required for publishing. Whilst the \ts system command provides time in milliseconds, we record timestamps immediately before and after publishing and take the difference, which allows sub-ms precision. The timestamp difference and memory are recorded and upserted to a log on disk. Each test lasts 5 minutes and is repeated 3 times across the range of subscribers. To make sure that the measured times are consistent, the processes are restricted to different cores: the tickerplant has its own core, whilst the feed and subscriber processes are split amongst three cores.
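The exact modifications used in the test are not reproduced here, but a rough sketch of the idea might look like the following. It assumes the stock kdb+tick layout in which .u.w maps each table name to a list of (handle;syms) pairs, and it assumes every subscriber takes all symbols, so no per-subscriber filtering is needed. The helper timeIt is a hypothetical name, and memory measurement is left out of the sketch.

```q
/ Sketch of a broadcast override of .u.pub (an assumption, not the exact code used in the test).
/ .u.w t is assumed to be a list of (handle;syms) pairs, as in the stock u.q.
.u.pub:{[t;x]
  hs:`int$first each .u.w t;       / subscriber handles for table t
  if[count hs;
    -25!(hs;(`upd;t;x));           / serialise once, send async to all handles
    {neg[x][]} each hs];           / flush each handle immediately (test-only requirement)
  }

/ Hypothetical timing helper: apply f to the argument list, return elapsed time as a timespan
timeIt:{[f;args] s:.z.p; f . args; .z.p-s}

/ Example call on a running tickerplant (commented out as it needs live subscribers):
/ elapsed:timeIt[.u.pub;(`trade;tradedata)]
```

Taking the difference of two .z.p values gives a timespan, which is what allows finer resolution than the millisecond figure reported by \ts.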
All tests were run on the incredibly fast AMD FX-7500 Radeon R7 2.1GHz processor which comes with the general issue AquaQ laptop, running Linux, kdb+ l32 3.4 2016.08.04. The results are likely quite hardware dependent and could be improved with tuning, so please test independently on your setup!
The results were parsed from the log files created during testing. Once they were parsed, the average time and memory usage were calculated and then exported to csv. From these csv files plots were created using gnuplot.
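As an illustration of this step, the averaging and csv export could be done along the lines below. The column names (method, nsubs, elapsed, mem) and the function names are assumptions made for the sketch, not the layout of the actual log files.

```q
/ Hypothetical results layout: one row per publication with columns
/ method (publish type), nsubs (subscriber count), elapsed and mem.
summarise:{[results] 0!select avgElapsed:avg elapsed, avgMem:avg mem by method, nsubs from results}

/ Write a table to a csv file; file is a file symbol such as `:results.csv
writeCsv:{[file;t] file 0: csv 0: t}

/ Example usage (commented out as it needs a parsed results table):
/ writeCsv[`:results.csv;summarise results]
```

The resulting csv has one row per publish method and subscriber count, which is a convenient shape for plotting with gnuplot.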
Plotting the publishing time out to a large number of subscribers exaggerates the difference between the two methods and shows a clear trend. In both cases standard publishing was marginally faster than broadcast for one subscriber, but slower for all other subscriber counts. Even with broadcast publishing, though, adding subscribers is not “free”: some additional time is still required per subscriber. This is likely the time taken to write the data to the socket to be sent to the client. When larger datasets are published, the improvement achieved by broadcast increases.
The memory results showed that both broadcast and standard publish used constant memory irrespective of the number of subscribers (the graphs were pretty boring and are therefore not shown!). This may not be the case when using a standard, unmodified u.q from kdb+tick, as the data for each subscriber may be serialised and queued before it is all flushed.
Broadcast would be recommended when there is more than one subscriber receiving the same dataset, and especially if the datasets being transmitted are large. A common example of this would be a standard kdb+ data capture set up (such as TorQ) where there are multiple subscribers to the tickerplant which receive full table subscriptions. Examples of these processes in a standard set up are the RDB, the WDB (writing database), Chained Tickerplants and Real Time Engines for streaming calculations.
Broadcast publish is a very useful addition to the kdb+ arsenal!