caneka

A programming langauge for streams and lifecycles

=

Table KeyValue Store

Table is a convention on top of the Span data structure which provides key/value lookup resolution. In short, this is a hash table.

The underlying mechanism is to resolve a portion of bits from the hash value of the key (such as 0001 from 11100111101001011001101010110001). In this case we have the hash of the string "Delta" (generated by our hashing algorithm which is a modification of dbj2), into a value within the range of the slots in the Span, 16 to start, then later 256.

For example, a Table would resolve the above hash into slot 1 (0001 in binary) for a 16 slot span, or 177 (10110001 in binary) for a 256 slot Span.

This is done by masking the hash positionally, against the max value of the slots available.

The placement and retrieval lookup resolution is re-query based. This means that if there is already a non-matching item in the slot, it will try the neighboring bits to find a home for the item. In the above example, if slot 1 is taken an attempt will be made to place the key/value in slot 11 (1011 in binary) because that is the second 4 bits of the hash value.

Each req-query examines the bits 4 positions higher and looks in that location for an opening to place the item. When it is fetched it will pursue the same lookup pattern, and is thereby gauranteed to find the item by comparing the full hash.

Below is a table with four items in it. Two of the items find an open slot on the first try, with one needing two tries, and one needing three tries, but in practice the second try for "Carrots" would be skipped because it contains identicle bits.

This is how our debugger presents the table on the console using our Debug_Print functionality

Span<4 items in 1 dims of 16 slots each
  Slab<icr16[0] 
    3=H<itm="Bananas" v="Don't slip">, 
    7=H<itm="Apples" v="Mmmm pie in the making">, 
    9=H<itm="Carrots" v="And peas and potatoes and corn.">, 
    11=H<itm="Dandilions" v="Summer-summer-summer-Time!">>
>

Table Resize

The table will resize if it has to query too many times to find a place for items, or if it begins to fill the slots. The below table is how the first 10 items would be inserted into a Table.

Debug_Print output:

Span<10 items in 1 dims of 16 slots each
  Slab<icr16[0]
    0=H<itm="Foxtrot" v="Five Minutes">,
    1=H<itm="Alpha" v="Apples">
    5=H<itm="India" v="Idio-syncratic">
    6=H<itm="Golf" v="Gophers Are Cool">
    7=H<itm="Bravo" v="Bandits">
    8=H<itm="Juliet" v="Jockey Rider">
    9=H<itm="Charlie" v="Carrots">
    10=H<itm="Hotel" v="Happy Go Lucky">
    11=H<itm="Delta" v="DeadMan">
    15=H<itm="Echo" v="Elevator">>
>

Inserting one more value puts it over the slots-filled limit, because 11 values in 16 spaces will increase the collisions significantly, so it resizes after the below insertion.

After "Kilo" is inserted into the table, it resizes to a dimension 2 span with 256 slots instead of the basic 16. Below is the re-insertion pattern into the new table using 8 bits instead of 4. As you can see, the index values are now between 0 and 255.

Here is the Debug_Print output for the final 256 slot Table. It's complex to read becuase it splits over several slabs now that the Span has two levels of slab dimensions.

Span<12 items in 2 dims of 16 slots each
  Slab<incr256[0] 0=0, 1=16, 3=48, 4=64, 6=96, 7=112, 9=144, 10=160, 11=176, 14=224>
    0=Slab<icr16[0]
        1=H<1 itm="Alpha" v="Apples">>
    3=Slab<icr16[48]
        0=H<48 itm="Foxtrot" v="Five Minutes">
        8=H<56 itm="Juliet" v="Jockey Rider">>
    4=Slab<icr16[64]
        6=H<70 itm="Golf" v="Gophers Are Cool">>
    6=Slab<icr16[96]
        9=H<105 itm="Charlie" v="Carrots">>
    7=Slab<icr16[112]
        15=H<127 itm="Echo" v="Elevator">>
    9=Slab<icr16[144]
        7=H<151 itm="Bravo" v="Bandits">>
    10=Slab<icr16[160]
        0=H<160 itm="Hotel" v="Happy Go Lucky">>
    11=Slab<icr16[176]
        1=H<177 itm="Delta" v="DeadMan">>
    14=Slab<icr16[224]
        5=H<229 itm="India" v="Idio-syncratic">
        15=H<239 itm="Kilo" v="Kangaroo">>
>

If the query placement encounters more than a threshold of insert conflicts it will resort to a linked list of items as is standard in a normal hash table. This is possible but has arisen very rarely in our testing so far.

Summary

The system is in it's infancy and is ripe for optimization. One challenge is that just after the table expands, it is likely not very memory efficient due to several 16 slot spans allocated with only a single item in them. The system is functional now but may be optimized more if we were to move into IoT or embedded spaces where memory usage is more sensitive.

For the sake of clarity we can emphasis that the simplicity of using several sections of bits from the hash makes for a clean memory footprint while providing consistent lookups. See the page dedicated to the underlying architecture for an overview of how the Span allocation works.