Parallel kdb+ Database Access with QPad and TorQ

Blog kdb+ 8 Nov 2016

Data Intellect

We’ve been working with Oleg Zakharov, the creator of QPad, to implement asynchronous querying so that users can run multiple concurrent queries more efficiently. This post explains how it works, using a new version of QPad which Oleg will make available soon (but which can be downloaded from here for now).

The Problem

A common usage pattern in kdb+ environments is to have multiple users accessing the same dataset simultaneously, for example quants working in a research environment. Users can either access processes directly (a dedicated process per user) or via a gateway. Using a gateway has the advantage that processes can be shared and failures can be handled transparently to the users. A typical setup looks like this:

[Figure: AquaQ TorQ synchronous query diagram]

All the common IDEs send queries synchronously. Sending a synchronous query to a kdb+ process makes that process unresponsive until the query is complete and the response has been sent to the client. If one of these users runs a query that takes a long time, the rest of the users will not be able to run anything against the gateway for the duration of that query.

The TorQ gateway load balances by selecting an RDB or HDB that isn’t busy. However, a synchronous query sent to the gateway makes the gateway itself unresponsive until it has responded to the client. So while the gateway is executing a synchronous query, it doesn’t matter how many background processes are available to serve other users’ requests: those users cannot use the gateway to pass their own requests on to the background processes.

All kdb+ processes will stop and wait for synchronous queries to complete.  So how do we avoid this?

[Figure: AquaQ TorQ asynchronous query diagram]

One way to avoid this problem is to send asynchronous queries to the gateway instead. When a kdb+ process receives an asynchronous query, it has more control over what it does with the query. When the TorQ gateway receives an asynchronous query, it passes it on to a backend process that currently has nothing running, and then moves on to the next query. When the backend process sends its response back to the gateway, the gateway carries out any further manipulation required and propagates the response to the client. This means that, as long as there are enough available background processes to serve the users’ requests, any number of users can send queries to the same gateway at the same time without having to wait for other users’ queries to finish before they get a response.
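To make the routing idea concrete, here is a toy, in-memory model of the gateway’s bookkeeping in q. It is a sketch under stated assumptions, not TorQ’s implementation: there is no IPC, the backend names and the functions submit and release are hypothetical, and the queue for queries that arrive while every backend is busy is an assumption of the sketch. What it illustrates is that submit returns immediately whether or not a backend is free, whereas a synchronous gateway would hold until the backend replied.

```q
/ Toy model of async gateway routing: no IPC, hypothetical names throughout.
/ busy: which backend processes are currently running a query
busy:`rdb1`rdb2`hdb1!000b
/ queue: queries waiting for a free backend (an assumption of this sketch)
queue:()

/ A query arrives: give it to the first idle backend and return that backend's
/ name, or queue it and return `queued. Either way it returns immediately.
submit:{[qry]
  b:first where not busy;
  $[null b; [queue::queue,enlist qry; `queued]; [@[`busy;b;:;1b]; b]]}

/ Backend b has replied (forwarding the result to the client is omitted):
/ mark it idle, then dispatch the oldest queued query, if there is one.
release:{[b]
  @[`busy;b;:;0b];
  if[count queue; nxt:first queue; queue::1_queue; submit nxt];}

submit each ("q1";"q2";"q3";"q4")   / `rdb1`rdb2`hdb1`queued
release`rdb2                        / rdb2 is freed and picks up "q4"
```

Three users get a backend straight away; the fourth query waits in the gateway’s own queue rather than blocking the gateway, and is dispatched as soon as a backend reports back.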

By default QPad (like the other IDEs) sends synchronous queries, but QPad also has an asynchronous query mode, which this guide shows you how to enable.

Starting QPad in Async Mode

QPad’s asynchronous query mode is activated with a flag when starting the application. To do this, open a Windows console in the same directory as the QPad executable and type:

qpad64.exe --async

Or

qpad32.exe --async

(depending on whether you have the 64-bit or 32-bit version of QPad).

It can be useful to keep a .bat file containing this command in the QPad application directory, so that you do not need to retype the command every time you start QPad in async mode.

[Figure: QPad shortcut properties, showing the Target field]

You could also modify a shortcut to QPad so that it appends the --async flag when launching QPad. To do this, right-click the QPad executable and select “Create Shortcut”, then right-click the new shortcut, select Properties, and edit the Target field shown in the image above.

When QPad is started in Async mode you will have a new option on the menu bar, Q -> Sync.

[Figure: the Q -> Sync menu option in QPad]

When you select it, it changes to “Async” with a tick next to it. You are now in async mode: QPad will send queries asynchronously and wait for a response. However, this does not guarantee that the process being queried will send a response back.

Defining a Callback Function

Now that you are sending asynchronous queries, you need to define a callback function to ensure that the results of normal queries are sent back to QPad. A callback function essentially tells the process to evaluate the query and send the result back to the process that sent the query.
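The smallest callback that does this fits on one line. It is a minimal sketch rather than the callback recommended in this post: cb is a hypothetical name (used as a query prefix, the lambda would be unnamed, like the full example in this post), and it does no error trapping at all.

```q
/ Minimal callback sketch: evaluate the query string x, then push the result
/ back asynchronously on the handle the query arrived on (.z.w).
/ No error trapping: if evaluation fails, nothing is sent and the client keeps waiting.
cb:{neg[.z.w] value x}
```

Because the query itself arrives as an asynchronous message, the return value of the lambda is discarded; only what is explicitly sent with neg[.z.w] reaches QPad.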

A query such as .gw.asyncexec["1+1";`rdb`hdb] run against a TorQ gateway will work without a callback function, because .gw.asyncexec has a callback built in, but other queries might not work without a user-defined callback function.

Here is an example of a callback function that allows QPad to run queries against standard kdb+ processes as well as TorQ Gateway processes:

{$[".gw.asyncexec[" ~ 14#x;
  value x;
x like "*.gw.asyncexec*";
  @[neg[.z.w];"Async error: can only invoke .gw.asyncexec as standalone query. e.g. .gw.asyncexec[\"1+1\";`rdb`hdb]";()];
  {@[neg[.z.w];{$[(::)~x;"Async success";x]} @[value;x;{"Async error: '",x}]; {@[neg .z.w; "Async error: failed to send result back";()]}]}x]}

Let’s go through this line by line as there’s a lot going on!

{$[".gw.asyncexec[" ~ 14#x;

The first line checks whether the first 14 characters of the query are “.gw.asyncexec[”. If they are, the second line simply evaluates the query: it is most likely a call to the .gw.asyncexec[] function, which already has a callback built in.

 x like "*.gw.asyncexec*";

The third line checks whether the query contains “.gw.asyncexec” anywhere other than at the beginning (a query that begins with it has already been caught by the first line). This prevents users from embedding .gw.asyncexec in other operations and perhaps getting unexpected results. If the third line evaluates to true, the fourth line instructs the kdb+ process to send an error back to the client, informing them that they may only use the .gw.asyncexec[] function as a standalone query.

@[neg[.z.w];"Async error: can only invoke .gw.asyncexec as standalone query. e.g. .gw.asyncexec[\"1+1\";`rdb`hdb]";()];
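The two tests can be tried on their own in a q session. The following hypothetical helper, branch (not part of TorQ or QPad), repeats the first two conditions of the callback unchanged and reports which branch a given query string would take:

```q
/ Hypothetical helper: which branch of the callback's $[...] would qry take?
/ `gateway  -> evaluate it and let .gw.asyncexec send the reply
/ `rejected -> send the "standalone query" error back
/ `evaluate -> evaluate it and send the result back (final branch)
branch:{[qry]
  $[".gw.asyncexec[" ~ 14#qry; `gateway;
    qry like "*.gw.asyncexec*"; `rejected;
    `evaluate]}

branch ".gw.asyncexec[\"1+1\";`rdb`hdb]"          / `gateway
branch "count .gw.asyncexec[\"1+1\";`rdb`hdb]"    / `rejected
branch "1+1"                                      / `evaluate
```

Because the prefix test is literal, a query with a leading space, such as " .gw.asyncexec[...]", fails the first test but still matches the pattern, so it is rejected.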

Finally, if neither of the previous conditions is true, the last line instructs the kdb+ process to run the query and send the result back to the client, or the string “Async success” if the query did not return a value (that is, it returned the generic null ::) but otherwise completed successfully. If evaluating the query signals an error, the process constructs an error message and sends that to the client instead. The outer protected apply also handles failures when sending the result back, for example if the result is too large to serialise or contains something that cannot be sent over IPC, such as a dynamically loaded function.

 {@[neg[.z.w];{$[(::)~x;"Async success";x]} @[value;x;{"Async error: '",x}]; {@[neg .z.w; "Async error: failed to send result back";()]}]}x]} 
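Similarly, the value that the final branch sends back can be separated from the act of sending it. The helper reply below is hypothetical; it reuses the two inner expressions of the last line unchanged, so it can be run in an ordinary q session without an IPC connection:

```q
/ Hypothetical helper: what the callback's final branch sends back for query qry.
/ @[value;qry;...] traps evaluation errors and turns them into an error string;
/ a generic null result (::) is replaced by "Async success".
reply:{[qry] {$[(::)~x;"Async success";x]} @[value;qry;{"Async error: '",x}]}

reply "1+1"     / 2
reply "1+`a"    / "Async error: 'type"
reply "{}[]"    / "Async success"
```

In the callback itself, this value is passed to neg[.z.w] inside a second protected apply, whose handler tries to send "Async error: failed to send result back" if the first send fails.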

Prepending a callback function to every query

You can have QPad add this callback function to the beginning of every query by editing a registry value.

  1. Open regedit (you can find it using the Windows Start menu search).
  2. Navigate to “HKEY_CURRENT_USER/Software/ozConcept/ozQInsightPad/Settings” using the left pane, and find the value “QueryPrefix” on the right.
  3. Change its value to the callback function above (or to one of your choice).

[Figure: AquaQ TorQ regedit, QPad Settings key]

Make sure to change QueryPostfix as well: this is what is appended to the end of your query, and in this case it should be blank. You will need to restart QPad for these changes to take effect (remember to use the async mode flag if you want to send asynchronous queries). You should now be able to send asynchronous queries from QPad and get back the correct response: queries execute just as they did before, with the added advantage that they go through the TorQ gateway without blocking other users.

Summary

Sending synchronous queries through a kdb+ process causes the process to become unresponsive until the query is complete. This prevents, for example, a gateway process from serving multiple queries at once and load balancing them across multiple background processes. It is better to send asynchronous queries through the gateway, as this makes the system capable of serving multiple queries from multiple users at the same time through one gateway process. This guide explains how to set up QPad to send asynchronous queries instead of synchronous queries.
