This task is related to work done on our GHash in [temp-ghash-experiments](https://developer.blender.org/diffusion/B/browse/temp-ghash-experiments/) branch.
Current Branch vs Master
===================
This is a simple comparison between the new branch code and master, using tests as close to identical as possible. Both use modulo and no pre-reservation of buckets. Master uses a load threshold of 3.0, the new branch 0.75. The new branch also stores hashes in entries.
| | Master Insert ms | Master Lookup ms | Branch Insert ms | Branch Lookup ms
| -------------------------------- | --------- | --------- | --------- | ---------
| Strings (1M words, 122M file) | 8.287163 | 5.990606 | 2.646282 | 2.613921
| 100M first integers | 18.988747 | 23.433619 | 18.983423 | 12.533957
| 50M random integers | 8.690880 | 10.420476 | 8.778986 | 5.496768
| 50M random integers (no hash) | 8.495241 | 9.679239 | 8.603362 | 5.333819
| 20M random 4D integers vectors | 6.970162 | 5.530896 | 3.979335 | 2.019020
Notes:
* Switching from a load threshold of 3.0 to 0.75 implies one or two additional resizes in the branch code, which means pre-reservation gives an even better speedup.
* Integer hashing uses 'uint in pointer' (i.e. the value of the void pointer itself). This is so efficient (no dereference at all) that storing hashes in entries gives nearly no gain during insertion in this case - even the super-quick 'modulo only' option (`no hash` in the table above) gives nearly no speedup.
* Storing hashes helps a lot with expensive hashing (strings…), and in all cases during lookup (because it avoids calling cmpfp on every entry sharing a bucket until we find the right one).
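The mechanism is easy to sketch (hypothetical struct and names below, not the actual BLI_ghash internals): with the hash stored in the entry, lookup can reject most entries in a bucket with a cheap integer compare, so `cmpfp` only runs on real candidates.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Entry {
  struct Entry *next;  /* chaining within a bucket */
  unsigned int hash;   /* hash computed once, at insertion time */
  void *key;
  void *val;
} Entry;

/* Compare callback in BLI_ghash style: returns false when keys match. */
typedef bool (*GHashCmpFP)(const void *a, const void *b);

/* Example callback for this sketch: plain pointer equality. */
static bool ptr_cmp(const void *a, const void *b)
{
  return a != b;
}

static Entry *lookup_entry(Entry **buckets, unsigned int bucket_mask,
                           unsigned int hash, const void *key, GHashCmpFP cmpfp)
{
  for (Entry *e = buckets[hash & bucket_mask]; e; e = e->next) {
    /* Cheap integer comparison first: cmpfp only runs on entries whose
     * stored hash already matches. */
    if (e->hash == hash && !cmpfp(e->key, key)) {
      return e;
    }
  }
  return NULL;
}
```

For expensive compare callbacks (string comparison), the stored-hash check filters out almost every colliding entry before `cmpfp` is reached.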
Reducing Load
===========
Here are the results of some tests, run using [this corpus](http://corpora.uni-leipzig.de/downloads/eng_wikipedia_2010_1M-text.tar.gz) (1 million words, about 122MB of text), from [corpora.informatik.uni-leipzig.de](http://corpora.informatik.uni-leipzig.de/download.html).
Inserted keys were: the whole text, all its lines, and all its words, leading to 2 917 605 entries.
| Pre-reservation of buckets | Hash | LoadT | Type | Insert ms | Lookup ms | Quality | Variance | Load % | Empty % | Overloaded %
| -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------
| Strings (1M words, 122M file) | GH | 0.75 | Mask | 2.556978 | 3.036947 | 1.002262 | 0.349653 | 34.78 | 70.68 | 4.84 (7)
| Strings (1M words, 122M file) | GH | 0.75 | Mod | 2.741920 | 3.156516 | 1.002311 | 0.349693 | 34.78 | 70.68 | 4.85 (7)
| Strings (1M words, 122M file) | GH | 3 | Mask | 3.567847 | 4.707220 | 1.002759 | 1.404238 | 139.12 | 25.03 | 5.36 (10)
| Strings (1M words, 122M file) | GH | 3 | Mod | 4.116788 | 5.141498 | 1.001088 | 1.396343 | 139.12 | 24.94 | 5.29 (11)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 2.836161 | 3.226247 | 1.000052 | 0.347848 | 34.78 | 70.63 | 4.82 (6)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 3.083521 | 3.520704 | 0.999960 | 0.347772 | 34.78 | 70.62 | 4.81 (7)
| Strings (1M words, 122M file) | MM2 | 3 | Mask | 3.807093 | 4.904332 | 0.999995 | 1.391200 | 139.12 | 24.89 | 5.27 (11)
| Strings (1M words, 122M file) | MM2 | 3 | Mod | 4.198665 | 5.395439 | 0.999922 | 1.390842 | 139.12 | 24.89 | 2.28 (9)
| No pre-reservation of buckets | Hash | LoadT | Type | Insert ms | Lookup ms | Quality | Variance | Load % | Empty % | Overloaded %
| -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------
| Strings (1M words, 122M file) | GH | 0.75 | Mask | 6.177619 | 3.579811 | 1.001687 | 0.698775 | 69.56 | 49.95 | 15.45 (8)
| Strings (1M words, 122M file) | GH | 0.75 | Mod | 6.670569 | 4.010712 | 1.002150 | 0.699641 | 69.56 | 49.96 | 15.43 (8)
| Strings (1M words, 122M file) | GH | 3 | Mask | 7.459112 | 5.181792 | 1.002515 | 2.815917 | 278.24 | 6.29 | 30.41 (13)
| Strings (1M words, 122M file) | GH | 3 | Mod | 7.634238 | 5.366527 | 1.000709 | 2.791862 | 278.24 | 6.23 | 30.45 (14)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 5.133478 | 3.935534 | 1.000095 | 0.695790 | 69.56 | 49.88 | 15.43 (9)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 5.434215 | 4.090286 | 0.999919 | 0.695456 | 69.56 | 49.88 | 15.44 (8)
| Strings (1M words, 122M file) | MM2 | 3 | Mask | 6.287116 | 5.402290 | 1.000289 | 2.786288 | 278.24 | 6.20 | 30.43 (15)
| Strings (1M words, 122M file) | MM2 | 3 | Mod | 6.577213 | 5.520325 | 0.999844 | 2.780355 | 278.24 | 6.17 | 30.47 (16)
Notes:
* //GH// stands for the standard GHash hashing, //MM2// for the Murmur2 method.
* //LoadT// is the 'load threshold', i.e. the maximum average number of entries per bucket.
* //Overloaded %// is the percentage of buckets holding more entries than max(1, LoadT) (i.e. two or more for 0.75, four or more for 3).
* The same tests were run using the first 100M integers, and 10M random vectors of four integers, as keys. Results are broadly similar to those with strings.
From that we can conclude (at least for now) that:
* Our current load threshold (3) is a real issue! We are talking about several tens of percent slower than with a threshold of 0.75, in insertion **and** lookup - the memory saving is really not worth it here!
* Murmur2 hashing is //slightly// better distributed than GHash's, but the performance tradeoff is usually not worth it (the only exception being big-enough strings, where it becomes faster than our current char-looping hashing).
* Modulo gives a slightly better distribution than masking, but again is noticeably slower.
Also, we can see how important it is to pre-allocate (reserve) GHash buckets whenever possible: resizing buckets is a very expensive operation. We can also see how worthwhile it can be to over-reserve buckets for better lookup times (see the differences in lookup times between the different load percentages).
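The cost of not reserving can be sketched with a back-of-envelope model (illustrative constants, not the exact GHash ones): a table that doubles its bucket array whenever the load threshold is exceeded performs a logarithmic number of resize-and-rehash passes when grown incrementally, and none at all when sized up front.

```c
/* Count bucket-array resizes when inserting `n` entries one by one into
 * a table that doubles whenever the average load exceeds `load_t`.
 * `nbuckets` is the initial (or pre-reserved) bucket count. */
static int count_resizes(unsigned int n, unsigned int nbuckets, double load_t)
{
  int resizes = 0;
  for (unsigned int i = 1; i <= n; i++) {
    while ((double)i / (double)nbuckets > load_t) {
      nbuckets *= 2;
      resizes++;
    }
  }
  return resizes;
}
```

With the 2 917 605 string keys above and a (hypothetical) initial size of 8 buckets, a 0.75 threshold triggers 19 doublings versus 17 for a threshold of 3, matching the 'one or two additional resizes' noted earlier; pre-reserving enough buckets brings the count to zero.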
----------
Storing Hashes
===========
And now, here are some results with hashes stored in entries:
| Pre-reservation of buckets | Hash | LoadT | Type | Insert ms | Lookup ms | Quality | Variance | Load % | Empty % | Overloaded %
| -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------
| Strings (1M words, 122M file) | GH | 0.75 | Mask | 2.182862 | 2.383782 | 1.002262 | 0.349653 | 34.78 | 70.68 | 4.84 (7)
| Strings (1M words, 122M file) | GH | 0.75 | Mod | 2.438983 | 2.601138 | 1.002311 | 0.349693 | 34.78 | 70.68 | 4.85 (7)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 2.439041 | 2.674447 | 1.000052 | 0.347848 | 34.78 | 70.63 | 4.82 (6)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 2.624448 | 2.830084 | 0.999960 | 0.347772 | 34.78 | 70.62 | 4.81 (7)
| No pre-reservation of buckets | Hash | LoadT | Type | Insert ms | Lookup ms | Quality | Variance | Load % | Empty % | Overloaded %
| -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------
| Strings (1M words, 122M file) | GH | 0.75 | Mask | 2.475735 | 2.459534 | 1.001687 | 0.698775 | 69.56 | 49.95 | 15.45 (8)
| Strings (1M words, 122M file) | GH | 0.75 | Mod | 2.743342 | 2.621367 | 1.002150 | 0.699641 | 69.56 | 49.96 | 15.43 (8)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mask | 2.825784 | 2.770305 | 1.000095 | 0.695790 | 69.56 | 49.88 | 15.43 (9)
| Strings (1M words, 122M file) | MM2 | 0.75 | Mod | 3.047072 | 2.975727 | 0.999919 | 0.695456 | 69.56 | 49.88 | 15.44 (8)
The benefits are obvious: non-reserved times are now quite close to pre-reserved ones!
----------
When Using Non-hashing Hash Functions
==============================
Yes, we do that in Blender - quite a bit, actually!
| Pre-reservation of buckets | Hash | LoadT | Type | Insert ms| Lookup ms | Quality | Variance | Load % | Empty % | Overloaded %
| -------------------------------- | ---- | ----- | ---- | ------ | ------ | ------- | -------- | ------ | ------- | ------------
| 20M pointers from an array | None | 0.75 | Mask | 0.870727 | 0.500824 | 4.068809 | 5.344611 | 59.60 | 93.75 | 6.25 (10)
| 20M pointers from an array | None | 0.75 | Mod | 0.840899 | 0.320517 | 0.770402 | 0.240775 | 59.60 | 40.40 | 0.00 (1)
Here we store 20M entries using raw pointer addresses as keys (taken from an array, so the keys are consecutive addresses). No surprise here: a single modulo division shines, because (with load < 1) it can spread all entries perfectly, while masking (since addresses are not small values) fails quite badly. Note that even in this hyper-worst-case scenario, masking's performance is not that catastrophic.
In any other tested cases (consecutive ints starting from zero, and random ints), masking and modulo get very similar results.
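Why consecutive addresses hurt masking is easy to show with a small sketch (illustrative, not GHash code): masking keeps only the low bits of the key, and array elements of a fixed size share their low bits, while modulo by a non-power-of-two bucket count depends on all bits of the key.

```c
#include <stdint.h>

/* Power-of-two table: bucket index is just the low bits of the key. */
static uint32_t bucket_mask(uintptr_t key, uint32_t nbuckets_pow2)
{
  return (uint32_t)(key & (uintptr_t)(nbuckets_pow2 - 1));
}

/* Non-power-of-two table: the index depends on all bits of the key. */
static uint32_t bucket_mod(uintptr_t key, uint32_t nbuckets)
{
  return (uint32_t)(key % nbuckets);
}
```

For addresses of consecutive 16-byte array elements, masking into 2048 buckets can only ever hit every 16th bucket (the low 4 bits of every key are identical), while a modulo by e.g. the prime 2053 spreads the same keys with no collision at all; this is consistent with the table above.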
So I think we either stick with modulo, accepting the (more or less) 10% loss of speed, or switch to masking, which involves a careful check of all places where we currently use raw integers as hashes (e.g. `ghashutil_bmelem_indexhash()`, `imagecache_hashhash()`, …). The first solution looks simpler for now.
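Should we go the masking route, those raw-integer call sites would need a cheap mixing step so that key bits also reach the low bits that masking keeps. A hypothetical helper (the name is mine, not existing Blender code), using the well-known Murmur3 32-bit finalizer:

```c
#include <stdint.h>

/* Hypothetical helper, not existing Blender code: the Murmur3 32-bit
 * finalizer. Every step is invertible, so no information is lost, and
 * the key bits end up spread across the whole 32-bit result. */
static uint32_t ghashutil_uint_mix(uint32_t h)
{
  h ^= h >> 16;
  h *= 0x85EBCA6Bu;
  h ^= h >> 13;
  h *= 0xC2B2AE35u;
  h ^= h >> 16;
  return h;
}
```

A masking-friendly index hash could then return `ghashutil_uint_mix(index)` rather than the raw index, at the cost of a few cycles per key.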
----------
About Memory
===========
Here are some (rough) estimations for 100M entries:
| Memory usage for 100M entries (MB) | GHash Buckets | GHash Entries | GHash Total | GSet Buckets | GSet Entries | GSet Total
| ------------------------------- | ------- | ------- | ----- | ------- | ------- | -----
| Current Master (load: 3) | 268 | 2400 | 2668 | 268 | 1600 | 1868
| Load: 0.75 | 1074 | 2400 | 3474 | 1074 | 1600 | 2674
| Load: 0.75, storing hashes | 1074 | 3200 | 4274 | 1074 | 2400 | 3474
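A rough sketch of the arithmetic behind these figures, assuming 64-bit pointers, power-of-two bucket counts, and 8-byte struct alignment (so a stored uint hash costs a full 8 bytes of entry size after padding); MB here means 10^6 bytes:

```c
/* Bucket array: one pointer per bucket, sized to the next power of two
 * that keeps the average load under `load_t`. */
static double buckets_mb(double nentries, double load_t)
{
  double nbuckets = 1.0;
  while (nbuckets < nentries / load_t) {
    nbuckets *= 2.0;
  }
  return nbuckets * 8.0 / 1e6;
}

/* Entries: GHash has 3 pointer-sized fields (next, key, val), GSet has
 * 2 (next, key); storing the hash adds one more 8-byte slot. */
static double entries_mb(double nentries, int nptr_fields, int store_hash)
{
  return nentries * (nptr_fields + (store_hash ? 1 : 0)) * 8.0 / 1e6;
}
```

E.g. for GHash at load 0.75 with stored hashes: about 1074 MB of buckets plus 3200 MB of entries, i.e. the 4274 MB in the table.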
In other words, the proposed changes increase the memory footprint of GHash by roughly 60%, and of GSet by 86%.
But our typical usage in Blender almost never involves more than a few million entries (usually much less, actually), so I would not consider that an issue?