When setting up a kdb+ production system, you may not always have access to the hardware you need. In such instances, it can be helpful to turn to “the cloud”. There are a number of potential benefits to using a cloud system instead of a local server. For example, important server-related duties such as security, failover and data redundancy are handled by the company providing the server. Several companies offer services in this area, including Amazon (Amazon Web Services – AWS), Google (Google Cloud Platform – GCP) and Microsoft (Azure). But which is best for your needs, and how much can you expect to pay?
First, what do you need from one of these companies? Essentially, you will need a virtual machine (VM) to run your kdb+ processes. You will also need storage for your data. Finally, you will want backups of that data.
Beginning with the VM, the most important specification is likely to be RAM, as kdb+ systems will typically deal with very large amounts of data in-memory. The number of CPU cores is less likely to be an issue as the limiting factor will typically be the number of kdb+ licensed cores, rather than the number of available cores.
In terms of RAM, the highest offered in a single VM instance by each provider is as follows:
Another important consideration is the amount and type of storage. With the large amounts of data stored on disk in most kdb+ systems, the type of storage chosen can have a dramatic impact on both performance and price. Each of the providers has a number of different options, ranging in price and speed.
AWS has three main solutions for storage – S3, EBS and EFS.
- S3 (Simple Storage Service) is the cheapest, but it is essentially “offline” storage as far as kdb+ is concerned. Files stored here must be copied onto the VM instance before being used.
- EBS (Elastic Block Store) attaches to the VM instance, and appears as local storage. Within EBS, there are different options for the type of storage (SSD or HDD). Details of the options are available on the AWS website.
- EFS (Elastic File System) is a distributed file system, allowing data to be shared across multiple hosts and giving a higher throughput than EBS. It is considerably more expensive than EBS.
One point to note with EBS is that all types of storage offered have a maximum bandwidth limit of 1250 MB/s per instance. An instance may have several storage volumes attached, each with its own lower bandwidth limit. This can place a limit on the speed of queries running across very large datasets. In cases where this will be a problem, EFS may be the better option.
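To get a feel for what this limit means in practice, the sketch below estimates the minimum time needed to read a dataset from EBS at the 1250 MB/s per-instance ceiling. The function name and the decimal unit convention (1 TB = 1,000,000 MB) are assumptions made for illustration. Real queries will be slower once per-volume limits and other overheads are taken into account.

```python
# Rough lower bound on the time to scan data from EBS, assuming the
# 1250 MB/s per-instance bandwidth limit quoted above is the only bottleneck.

EBS_INSTANCE_LIMIT_MB_S = 1250  # per-instance limit from the text

MB_PER_TB = 1_000_000  # assumption: decimal units (1 TB = 10^6 MB)


def min_scan_seconds(dataset_tb, bandwidth_mb_s=EBS_INSTANCE_LIMIT_MB_S):
    """Return the minimum number of seconds needed to read dataset_tb terabytes."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return dataset_tb * MB_PER_TB / bandwidth_mb_s


if __name__ == "__main__":
    for size_tb in (1, 10, 200):
        seconds = min_scan_seconds(size_tb)
        print(f"{size_tb:>4} TB: at least {seconds / 3600:.1f} hours")
```

Under these assumptions, a query that has to touch 1 TB of on-disk data cannot complete in less than 800 seconds, however much CPU or RAM the instance has.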
EBS currently ranges in price from $0.025/GB-month (for cold HDD, i.e. infrequent access) to $0.125/GB-month (for the highest performance SSD), while EFS is around $0.30/GB-month. Note that the prices vary slightly depending on the location of the server.
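As a rough illustration of how these per-GB-month prices scale, the sketch below estimates the monthly bill for a given volume of storage. The dictionary keys, the function name and the decimal unit convention (1 TB = 1000 GB) are assumptions for this example. The prices are the ones quoted above and will differ by region and over time.

```python
# Monthly storage cost estimate from the per-GB-month prices quoted above.
# Prices vary by region and change over time, so treat these as placeholders.

PRICE_PER_GB_MONTH = {
    "ebs_cold_hdd": 0.025,  # cheapest EBS option in the text
    "ebs_top_ssd": 0.125,   # highest-performance EBS SSD in the text
    "efs": 0.30,            # approximate EFS price in the text
}

GB_PER_TB = 1000  # assumption: decimal units


def monthly_storage_cost(size_tb, storage_type):
    """Return the estimated monthly cost in USD for size_tb of the given storage."""
    return size_tb * GB_PER_TB * PRICE_PER_GB_MONTH[storage_type]


if __name__ == "__main__":
    for name in PRICE_PER_GB_MONTH:
        print(f"{name:>12}: ${monthly_storage_cost(200, name):,.0f} per month")
```

For 200 TB, this gives roughly $5,000 per month on cold HDD, $25,000 on the fastest SSD and $60,000 on EFS, which shows how strongly the choice of storage type drives the overall cost.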
GCP and Azure both offer similar options for storage. For example, GCP storage options are summarised here. They offer several types of persistent disk, analogous to AWS’s EBS storage, and Cloud Storage buckets, analogous to AWS’s S3 storage. They also offer local SSDs and RAM disks, for higher performance storage at a higher price and lower maximum size. GCP’s persistent disks also have a lower bandwidth limit than AWS’s EBS storage.
Azure also offers a selection of storage options. Here, Blob Storage is analogous to S3 (object storage), and Disk Storage is the equivalent of EBS. Once again, different types of disk are offered for Disk Storage – Premium Disks (i.e. SSD) and Standard Disks (i.e. HDD). The prices of these storage options per GB-month vary depending on the total size, and the redundancy options desired.
Each of the services also offers options for snapshot backups. Prices depend on the amount of storage needed for the backups.
Another important consideration is bandwidth cost – how much will it cost to get data into and out of your cloud server? None of the three providers charges for inbound bandwidth, so you can have as much data flowing into your server as you need without paying anything for the bandwidth. Each provider charges for bandwidth out of the server, with pricing available for AWS, GCP and Azure. The pricing depends on the volume of data, as well as (in the case of GCP and Azure) the geographical location being transferred to. Note that AWS and Azure have free allowances for the first 1 GB/month and 5 GB/month respectively.
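The effect of the free allowances can be sketched as follows. The per-GB rate used here is a hypothetical placeholder rather than a quoted price, and a single flat rate ignores the volume tiers and destination-based pricing mentioned above. The function and dictionary names are also assumptions for this example.

```python
# Sketch of a monthly outbound-bandwidth estimate with a free allowance.
# The per-GB rate is a HYPOTHETICAL placeholder, not a quoted price, and a
# flat rate ignores the volume tiers and destination-based pricing.

FREE_ALLOWANCE_GB = {
    "aws": 1,    # free GB/month, from the text
    "gcp": 0,    # no free allowance mentioned in the text
    "azure": 5,  # free GB/month, from the text
}


def monthly_egress_cost(provider, gb_out, price_per_gb):
    """Return the estimated USD cost of sending gb_out gigabytes out in one month."""
    billable_gb = max(0, gb_out - FREE_ALLOWANCE_GB[provider])
    return billable_gb * price_per_gb


if __name__ == "__main__":
    hypothetical_rate = 0.10  # USD per GB, illustrative only
    for provider in FREE_ALLOWANCE_GB:
        cost = monthly_egress_cost(provider, 500, hypothetical_rate)
        print(f"{provider}: ${cost:.2f}")
```

At any realistic data volume the free allowances are negligible, so the outbound rate itself is what matters when comparing providers.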
A useful tool each of the providers offers is a pricing calculator to estimate the cost of the required services. These can be accessed at the following links:
As a comparison, we priced a similar setup with each provider (different offerings from the providers mean we were unable to price exactly the same setup from each). We priced a setup containing 200 TB of HDD storage, with as much RAM as possible (typically desirable in a kdb+ system). We also included snapshot backups of the storage. Prices are correct as of February 2017.
| |AWS|GCP|Azure|
|---|---|---|---|
|VM Spec|64 CPU cores, 488 GB RAM, no disk|32 CPU cores, 208 GB RAM, no disk|32 CPU cores, 448 GB RAM, 6 TB disk|
|Storage Spec|208 TB, Throughput Optimised HDD EBS|200 TB, Persistent Disk Storage|200 TB, Basic Disk Storage|
|Backup Spec|3% change for monthly snapshots|200 TB backup|200 TB backup, LRS redundancy|
Note that in the above comparison the VM from GCP has considerably less RAM. As indicated earlier, the maximum amount of RAM GCP offers in a single VM is considerably lower than that of the other two providers.
We are currently building and maintaining systems on both Azure and AWS, and we will be publishing future blog posts on our experience with both. The next subject to investigate is modifying the kdb+ system architecture to better fit the cloud paradigm: using a larger number of small hosts accessing a distributed file system such as Amazon EFS.
If you would like more information on our experience to date with kdb+ on the cloud, please email us directly at email@example.com