Implementation
Hashing
Want stable hashing. Do not hash on pointer values.
Hashconsing
Weak references
Intrusive counters with weak count
Reference counts in same allocation as data. When strong ref count hits zero, destroy the data. Until the weak ref count hits zero, the object remains to indicate that there is zero strong refs so weak refs cannot access data.
Consider adding indirection to the data if the data is large and weak pointers may survive a long time.
Good catch coherency, fewer allocations.
Separate counters with weak count
Reference counts in a separate allocation to the data. When strong ref count hits zero the allocation for the data can be freed. The allocation for the counts remains to indicate the pointer is now invalid until the weak ref count also hits zero.
Reference objects have two pointers, one to the counters and one to the object. Strong refs can access the data pointer directly. Weak refs must access the counters first and check strong is nonzero to upgrade.
Less cache coherency, more allocations.
Weak reference with intrusive linked list
All weak refs to an object form an intrusive linked list. The head of the linked list is stored with the reference counts. As part of deleting an object, traverse all the weak references to that object via the intrusive linked list and set them to null.
Reference Counting Optimizations
Cache Invalidation and Immutability
Updating a reference count will modify a cache line containing an object. One of the strengths of immutable data, such as objects in Hirpdag, is that it can be shared between threads. Modifying reference counts adjacent to objects will dirty unmodified shared cache lines.
Unmodified cache lines can remain in the Shared state
Coalescing and Elision
Coalescing reference count modifications together can reduce update operations by 50-90%. Only the first increment and last decrement need to remain in the same place. This kind of statement reodering and combining would need compiler support to be applied comprehensively.
Comparisons
Pointer Equality
Compare object pointers for equality or inequality.
Ordering
Cannot use the object pointers for ordering in many contexts, they are unstable and meaningless.
A deterministic ordering is desirable, which reflects a partial order of the global Hirpdag object DAG.
Normalization
On construction.
Rewriting
Apply rewrite rule to self. Rewrite all children. Construct a replacement for self, if anything changed.
Memoization
Enabled by immutability and reference counting.
Hash map of reference to reference. Key is input, value is output.
Serialization
Leaf objects should appear before other objects which use them. The serialization ordering should be a valid bottom up partial order. A post-order traversal will produce one possible linear ordering.
Good to mark reference handle roots in serialization format.
Cache coherent datastructures
Immutable data will typically make non-contiguous data structures (such as tree-sets, tree-maps, and linked-lists) less appealing.
Cache line size is typically 64 bytes on most modern x86 systems.
$ getconf LEVEL1_DCACHE_LINESIZE
64
Object Metadata
Each Hirpdag Object has a small amount of metadata attached to it. This includes:
- DAG Height
- DAG Count
- Content flags
The content flags are intended to provide a hint to avoid unnecessary traversals.
Reference Count Update Elision
Incrementing and decrementing can be expensive and they may occur frequently. Because the count is shared so these updates must be atomic.