Garbage Collection in kdb+

AquaQ Admin kdb, kdb+ 6 Comments

The aim of this article is to give an understanding of how kdb+ uses and releases memory, and of the options available to modify this behaviour.

kdb+ allocates memory in powers of 2. A vector of data will always be placed into a memory block whose size is the next power of two up from the raw data size (allowing for some header information). For example, a vector of 8000 8-byte long integers has a raw size of 64000 bytes, but it requires a memory block of 2^16 = 65536 bytes. We can demonstrate this with the \ts system command, which shows the time in milliseconds (first result) and the space in bytes (second result) required for an operation.

q)\ts til 8000
0 65712

If we instead create a vector of 9000 8-byte long integers (72000 bytes of raw data), kdb+ uses the next power of two up, 2^17 = 131072 bytes, to store the data:

q)\ts til 9000
0 131248

The actual boundary doesn’t lie at 8192 (2^13) elements but at 8190, because part of each block is taken up by the vector’s header:

q)\ts til 8190
0 65712
q)\ts til 8191
0 131248
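The 8190 boundary is consistent with each vector carrying a 16-byte header: 8190 * 8 + 16 = 65536 bytes fits exactly into a 2^16 block, while 8191 longs need 65544 bytes and spill into a 2^17 block. The Python sketch below models that rule. The 16-byte header is an assumption inferred from the boundary above, and the model ignores the small constant excess (176 bytes in these examples) that \ts reports on top of the block size.

```python
# Minimal model of kdb+'s power-of-two vector allocation.
# ASSUMPTION: a 16-byte header per vector, inferred from the 8190-element boundary.
HEADER_BYTES = 16


def block_size(n_items: int, item_bytes: int = 8) -> int:
    """Smallest power-of-two block holding the header plus n_items of raw data."""
    needed = HEADER_BYTES + n_items * item_bytes
    size = 1
    while size < needed:
        size *= 2
    return size


for n in (8000, 8190, 8191, 9000):
    print(n, block_size(n))
```

Under this model, 8000 and 8190 longs both land in a 65536-byte block, while 8191 and 9000 longs need 131072 bytes, matching the \ts results above.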

It is common in kdb+ systems for vectors to grow (e.g. as rows are inserted into a table). If a vector of 8188 longs is grown by one element at a time, the concatenations are cheap until the boundary point is hit, at which point a new memory block must be allocated:

q)a:til 8188
q)\ts a,:1
0 400
q)\ts a,:1
0 400
q)\ts a,:1
0 131232
q)\ts a,:1
0 368

In the example above, when the 8190-element boundary is exceeded, a memory block of size 2^17 is allocated and the value of a is copied into it. The old block of size 2^16 is returned to the heap to be recycled internally.
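The same size rule explains the timings above: an append is cheap while the enlarged vector still fits into its current block, and a new block (plus a copy) is needed only when the required size crosses a power of two. The sketch below, again assuming a 16-byte header, reports which of four single-element appends to an 8188-element vector would need a new block. It models only the sizes involved, not kdb+'s actual append code.

```python
# ASSUMPTION: 16-byte vector header, inferred from the 8190-element boundary.
HEADER_BYTES = 16


def block_size(n_items: int, item_bytes: int = 8) -> int:
    """Smallest power of two >= header + raw data size."""
    needed = HEADER_BYTES + n_items * item_bytes
    return 1 << (needed - 1).bit_length()


def appends_needing_new_block(start_len: int, appends: int) -> list[bool]:
    """For each one-element append, True if the vector outgrows its current block."""
    result = []
    n = start_len
    for _ in range(appends):
        old_block = block_size(n)
        n += 1
        result.append(block_size(n) != old_block)
    return result


print(appends_needing_new_block(8188, 4))  # only the third append crosses 2^16
```

The third append is the one for which \ts reports an allocation of 131232 bytes.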

The power-of-2 allocation approach gives excellent performance, but it can make a process “memory hungry”: in the worst case, the database will require twice as much memory as the raw data. However, these disadvantages can be managed using the garbage collection flag -g and the built-in kdb+ function .Q.gc[].
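To put a number on “memory hungry”, the ratio of block size to raw data size can be computed with the same power-of-two rule (again assuming a 16-byte header). It is close to 1 just below a boundary and just over 2 just above one:

```python
# ASSUMPTION: 16-byte vector header, inferred from the 8190-element boundary.
HEADER_BYTES = 16


def block_size(n_items: int, item_bytes: int = 8) -> int:
    needed = HEADER_BYTES + n_items * item_bytes
    return 1 << (needed - 1).bit_length()


def overhead_ratio(n_items: int, item_bytes: int = 8) -> float:
    """Memory block size divided by the raw size of the data it holds."""
    return block_size(n_items, item_bytes) / (n_items * item_bytes)


print(f"8190 longs: {overhead_ratio(8190):.4f}")  # best case: block almost full
print(f"8191 longs: {overhead_ratio(8191):.4f}")  # worst case: block just over half full
```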

By default (-g 0), unused memory blocks are not released back to the operating system but are retained and recycled internally. Switching garbage collection to immediate mode (-g 1) means that any large memory blocks (>32 MB) freed by the process are returned immediately to the operating system. Invoking .Q.gc[] (irrespective of the -g setting) returns any large free memory blocks to the operating system and also attempts to coalesce smaller free blocks into larger blocks that can be returned.

A small-scale example of these operations is outlined below. The built-in .Q.w[] function returns the memory statistics of a q process in a readable form:

q).Q.w[]
used| 118384
heap| 67108864
peak| 67108864
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754
  • used: the portion of heap currently in use.
  • heap: the memory allocated by the OS to the q process.
  • peak: the largest heap size the q process has reached.
  • wmax: the memory limit set by the -w command-line flag (0 means no limit).
  • mmap: the mapped memory in use.
  • mphy: the size of the physical memory on the host.
  • syms: the number of symbols that have been created (internalized) in the process.
  • symw: the amount of memory used by those symbols.

We can monitor how these values change as memory is used:

q)a:til 10000000
q).Q.w[]
used| 134336160
heap| 201326592
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754

Running a:til 10000000 has required the heap to grow to approximately 200 MB.
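These figures line up with the power-of-two rule. 10000000 longs are 80000000 bytes of raw data, which (with an assumed 16-byte header) round up to a 2^27 = 134217728-byte block, while used grew by 134217776 bytes. The new heap of 201326592 bytes is exactly three times the initial 67108864-byte (64 MB) heap. The check below uses only the numbers printed above plus the header assumption.

```python
# ASSUMPTION: 16-byte vector header, inferred from the 8190-element boundary.
HEADER_BYTES = 16
INITIAL_HEAP = 67108864  # heap from the first .Q.w[] output (64 MB)


def block_size(n_items: int, item_bytes: int = 8) -> int:
    needed = HEADER_BYTES + n_items * item_bytes
    return 1 << (needed - 1).bit_length()


# Values copied from the .Q.w[] outputs before and after a:til 10000000
used_before, used_after, heap_after = 118384, 134336160, 201326592

blk = block_size(10_000_000)
print(blk, used_after - used_before)  # block size vs observed growth in used
print(heap_after // INITIAL_HEAP, heap_after % INITIAL_HEAP)  # 3 whole 64 MB units
```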

q)delete a from `.
`.
q).Q.w[]
used| 118400
heap| 201326592
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754

Here, deleting a has reduced the memory currently used by the q process, but not the memory allocated to it (heap). This may not be ideal, as the process is holding on to resources it does not immediately need. Fortunately, kdb+ provides ways of managing this through its garbage collection options.
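The -g rules described earlier can be summarised in a small model of what happens to the heap when a single block is freed. It is a simplification for illustration only (it ignores .Q.gc[] and the way kdb+ obtains memory from the OS); the 32 MB cut-off is the figure quoted in this article, and the 2^27-byte block size for a comes from the power-of-two rule rather than from kdb+ itself. With those assumptions it reproduces the heap above and the heap in the -g 1 session that follows.

```python
THRESHOLD = 32 * 2**20  # -g 1 cut-off quoted in the article (32 MB)


def heap_after_free(heap: int, block: int, g_mode: int) -> int:
    """Heap size after freeing one block under the simplified -g rules."""
    if g_mode == 1 and block > THRESHOLD:
        return heap - block  # immediate mode: large block goes back to the OS
    return heap  # deferred mode, or a small block: kept for internal reuse


heap = 201326592  # heap after a:til 10000000
block = 2**27     # block holding a (10000000 longs)
print(heap_after_free(heap, block, 0))  # -g 0: heap unchanged
print(heap_after_free(heap, block, 1))  # -g 1: heap falls to 67108864
print(heap_after_free(heap, 32768, 1))  # -g 1, small block: heap unchanged
```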

Let’s look at the same values when garbage collection is set to immediate mode by starting q with -g 1:

C:\q>q -g 1
KDB+ 3.2 2014.11.01 Copyright (C) 1993-2014 Kx Systems
q)a:til 10000000
q).Q.w[]
used| 134336160
heap| 201326592
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754
q)delete a from `.
`.
q).Q.w[]
used| 118400
heap| 67108864
peak| 201326592
wmax| 0
mmap| 0
mphy| 2036150272
syms| 567
symw| 20754

In the example above, the heap is reduced as soon as a is deleted. It’s as simple as that! Well, almost: there is a caveat. Setting -g to 1 doesn’t automatically release everything. For example, if we start out as before:

C:\q>q -g 1
KDB+ 3.2 2014.11.01 Copyright (C) 1993-2014 Kx Systems

and then create 10000 global variables, each named after a random upper-case four-character symbol and holding til 3000, before checking memory usage again:

q)a:upper -10000?`4
q){@[`.;x;:;til 3000]} each a;
q).Q.w[]
used| 327995136
heap| 335544320
peak| 335544320
wmax| 0
mmap| 0
mphy| 2036150272
syms| 20564
symw| 858712

We can see that the used and heap memory have increased (along with the number of symbols and the memory they occupy). Now, if we delete what we have just created, we get the following results:

q){value"delete ",(string x)," from `."}each a;
q).Q.w[]
used| 184128
heap| 335544320
peak| 335544320
wmax| 0
mmap| 0
mphy| 2036150272
syms| 20564
symw| 858712

You can see that the used memory has decreased, but the heap remains high. The problem is that -g 1 only returns large blocks (over 32 MB) to the operating system, and the assignments created lots of objects well below this size: each til 3000 vector occupies a 32 KB block. In cases like these, garbage collection has to be run manually:

q).Q.gc[]
268435456
q).Q.w[]
used| 184128
heap| 67108864
peak| 335544320
wmax| 0
mmap| 0
mphy| 2036150272
syms| 20565
symw| 858742

.Q.gc[] reports that it released 268435456 bytes (256 MB), which has reduced the heap back down to its initial level.
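The arithmetic behind this example: each til 3000 vector is 24000 bytes of raw data, which (with the assumed 16-byte header) lands in a 32768-byte block, far below the 32 MB cut-off for -g 1. Ten thousand such blocks come to 327680000 bytes, close to the used figure of 327995136 reported above, and the 268435456 bytes released by .Q.gc[] are exactly four 64 MB units. The sketch only re-derives these numbers; it does not model .Q.gc[] itself.

```python
# ASSUMPTION: 16-byte vector header, inferred from the 8190-element boundary.
HEADER_BYTES = 16
THRESHOLD = 32 * 2**20   # -g 1 cut-off quoted in the article
UNIT = 64 * 2**20        # 67108864, the initial heap size


def block_size(n_items: int, item_bytes: int = 8) -> int:
    needed = HEADER_BYTES + n_items * item_bytes
    return 1 << (needed - 1).bit_length()


blk = block_size(3000)           # one til 3000 vector
total = 10_000 * blk             # all 10000 global variables
released = 335544320 - 67108864  # heap before and after .Q.gc[]

print(blk, blk > THRESHOLD)        # too small for -g 1 to return
print(total)                       # compare with used = 327995136
print(released, released // UNIT)  # bytes released and 64 MB units
```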

Does your project have a garbage collection problem that needs solved? Let AquaQ consultants take out the trash…


Comments 6

  1. Pingback: Garbage collection in kdb+

  2. Pingback: Supporting kdb+ Tick - AquaQ Analytics

  3. sam

    What is the alternative to .Q.gc[] in kdb/q version 2.6? We are using 2.6 and my processes are leaking a lot of memory due to this issue. Is there a workaround, or a recommended solution or practice to handle this issue? Shouldn’t I be using mmap tables on disk rather than loading everything into memory?

    1. Jonny Press

      Hi Sam

      I’m not sure if there is an alternative. Ideally, if you can work off disk you probably should (allowing kdb+ to mmap files as required) rather than reading them into memory, although the lower memory usage is a trade-off against query speed.

      Thanks

      Jonny

  4. sam

    On the other hand, after running .Q.gc[] on 2.7, there is still a significant difference between the used and heap numbers shown by .Q.w[]. Where is the residual going? As per your explanation, the .Q.gc[] call should defragment and free up unused space. The difference is significant: used (11G), heap (17G).

    1. Jonny Press

      Hi Sam

      I think the way it works is that kdb+ requests memory blocks of 64MB from the OS and then splits them / joins them as required to get the block sizes that it needs. I believe it can only return memory to the OS in the same blocks that it got, so to return a 64MB block, all of it has to be free. If a small part of a 64MB block is still in use, the block will be retained, hence you might see a high heap value compared to the used value. I think .Q.gc will check which blocks can be put back together (as all their constituents are free) and return them, but it will not attempt to shuffle objects between blocks.

      The below is on 3.3 (the default integer type is 8 bytes; you’ll need to use til 8000000 on 2.7). I imagine the behaviour will change a bit between versions.

      Thanks

      Jonny

      // start
      q)`used`heap#.Q.w[]
      used| 142320
      heap| 67108864

      // create objects of approx 32MB
      q)a:til 4000000
      q)`used`heap#.Q.w[]
      used| 33696800
      heap| 67108864

      // the heap grows in 64MB steps:
      // b and c are in the same 64MB block,
      // as the heap value does not grow between the two allocations
      q)b:til 4000000
      q)`used`heap#.Q.w[]
      used| 67251232
      heap| 134217728

      q)c:til 4000000
      q)`used`heap#.Q.w[]
      used| 100805728
      heap| 134217728

      // d and e are in the same block
      q)d:til 4000000
      q)`used`heap#.Q.w[]
      used| 134360160
      heap| 201326592

      q)e:til 4000000
      q)`used`heap#.Q.w[]
      used| 167914592
      heap| 201326592

      // delete c and e: .Q.gc returns 0
      // as b and d respectively reside in the same blocks
      q)delete c from `.
      q).Q.gc[]
      0
      q)delete e from `.
      q).Q.gc[]
      0

      // note the difference between used and heap
      q)`used`heap#.Q.w[]
      used| 100805728
      heap| 201326592

      // delete d
      // .Q.gc can now return memory
      // (as d and e are both free in that 64MB block)
      q)delete d from `.
      q).Q.gc[]
      67108864
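Jonny’s explanation can be expressed as a toy model in which the heap is a set of 64 MB chunks and .Q.gc[] returns only chunks that hold no live objects. The ToyHeap class is hypothetical and a simplification for illustration (it ignores how kdb+ splits chunks into power-of-two blocks); the chunk each object lives in is taken from the comments in the session above, and a plus the start-up objects are assumed to keep the first chunk in use.

```python
CHUNK = 64 * 2**20  # 67108864 bytes, the unit the heap grows by in the session


class ToyHeap:
    """Hypothetical model: .Q.gc[] can only return 64 MB chunks with no live objects."""

    def __init__(self) -> None:
        self.chunks: dict[int, set[str]] = {}  # chunk id -> names of live objects

    def place(self, name: str, chunk_id: int) -> None:
        self.chunks.setdefault(chunk_id, set()).add(name)

    def delete(self, name: str) -> None:
        for live in self.chunks.values():
            live.discard(name)

    def gc(self) -> int:
        """Release every chunk with no live objects; return the bytes released."""
        empty = [cid for cid, live in self.chunks.items() if not live]
        for cid in empty:
            del self.chunks[cid]
        return len(empty) * CHUNK

    def heap(self) -> int:
        return len(self.chunks) * CHUNK


h = ToyHeap()
for name, chunk_id in [("startup", 0), ("a", 0), ("b", 1), ("c", 1), ("d", 2), ("e", 2)]:
    h.place(name, chunk_id)

print(h.heap())  # 201326592, matching the session
h.delete("c")
print(h.gc())    # 0: b still lives in chunk 1
h.delete("e")
print(h.gc())    # 0: d still lives in chunk 2
h.delete("d")
print(h.gc())    # 67108864: chunk 2 is now empty
```

The model deliberately never moves objects between chunks, which is why a single live object is enough to pin a whole 64 MB chunk and keep heap well above used.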
