| NOTES ON DICTIONARIES |
| ================================ |
| |
| Principal Use Cases for Dictionaries |
| ------------------------------------ |
| |
| Passing keyword arguments |
| Typically, one read and one write for 1 to 3 elements. |
| Occurs frequently in normal python code. |
| |
| Class method lookup |
| Dictionaries vary in size with 8 to 16 elements being common. |
| Usually written once with many lookups. |
| When base classes are used, there are many failed lookups |
| followed by a lookup in a base class. |
| |
| Instance attribute lookup and Global variables |
| Dictionaries vary in size. 4 to 10 elements are common. |
| Both reads and writes are common. |
| |
| Builtins |
| Frequent reads. Almost never written. |
| About 150 interned strings (as of Py3.3). |
| A few keys are accessed much more frequently than others. |
| |
| Uniquification |
| Dictionaries of any size. Bulk of work is in creation. |
| Repeated writes to a smaller set of keys. |
| Single read of each key. |
| Some use cases have two consecutive accesses to the same key. |
| |
| * Removing duplicates from a sequence. |
| dict.fromkeys(seqn).keys() |
| |
| * Counting elements in a sequence. |
| for e in seqn: |
| d[e] = d.get(e,0) + 1 |
| |
| * Accumulating references in a dictionary of lists: |
| |
| for pagenumber, page in enumerate(pages): |
| for word in page: |
| d.setdefault(word, []).append(pagenumber) |
| |
| Note, the second example is a use case characterized by a get and set |
| to the same key. There are similar use cases with a __contains__ |
| followed by a get, set, or del to the same key. Part of the |
| justification for d.setdefault is combining the two lookups into one. |
| |
| Membership Testing |
| Dictionaries of any size. Created once and then rarely changes. |
| Single write to each key. |
| Many calls to __contains__() or has_key(). |
| Similar access patterns occur with replacement dictionaries |
| such as with the % formatting operator. |
| |
| Dynamic Mappings |
| Characterized by deletions interspersed with adds and replacements. |
| Performance benefits greatly from the re-use of dummy entries. |
| |
| Data Layout |
| ----------- |
| |
| Dictionaries are composed of 3 components: |
| The dictobject struct itself |
| A dict-keys object (keys & hashes) |
| A values array |
| |
| |
| Tunable Dictionary Parameters |
| ----------------------------- |
| |
| * PyDict_STARTSIZE. Starting size of dict (unless an instance dict). |
| Currently set to 8. Must be a power of two. |
| New dicts have to zero-out every cell. |
| Increasing improves the sparseness of small dictionaries but costs |
| time to read in the additional cache lines if they are not already |
| in cache. That case is common when keyword arguments are passed. |
| Prior to version 3.3, PyDict_MINSIZE was used as the starting size |
| of a new dict. |
| |
| * PyDict_MINSIZE. Minimum size of a dict. |
| Currently set to 4 (to keep instance dicts small). |
| Must be a power of two. Prior to version 3.3, PyDict_MINSIZE was |
| set to 8. |
| |
| * USABLE_FRACTION. Maximum dictionary load in PyDict_SetItem. |
| Currently set to 2/3. Increasing this ratio makes dictionaries more |
| dense resulting in more collisions. Decreasing it improves sparseness |
| at the expense of spreading entries over more cache lines and at the |
| cost of total memory consumed. |
| |
| * Growth rate upon hitting maximum load. Currently set to *2. |
| Raising this to *4 results in half the number of resizes, less |
| effort to resize, better sparseness for some (but not all dict sizes), |
| and potentially doubles memory consumption depending on the size of |
| the dictionary. Setting to *4 eliminates every other resize step. |
| |
| * Maximum sparseness (minimum dictionary load). What percentage |
| of entries can be unused before the dictionary shrinks to |
| free up memory and speed up iteration? (The current CPython |
| code does not represent this parameter directly.) |
| |
| * Shrinkage rate upon exceeding maximum sparseness. The current |
| CPython code never even checks sparseness when deleting a |
| key. When a new key is added, it resizes based on the number |
| of active keys, so that the addition may trigger shrinkage |
| rather than growth. |
| |
| Tune-ups should be measured across a broad range of applications and |
| use cases. A change to any parameter will help in some situations and |
| hurt in others. The key is to find settings that help the most common |
| cases and do the least damage to the less common cases. Results will |
| vary dramatically depending on the exact number of keys, whether the |
| keys are all strings, whether reads or writes dominate, the exact |
| hash values of the keys (some sets of values have fewer collisions than |
| others). Any one test or benchmark is likely to prove misleading. |
| |
| While making a dictionary more sparse reduces collisions, it impairs |
| iteration and key listing. Those methods loop over every potential |
| entry. Doubling the size of dictionary results in twice as many |
| non-overlapping memory accesses for keys(), items(), values(), |
| __iter__(), iterkeys(), iteritems(), itervalues(), and update(). |
| Also, every dictionary iterates at least twice, once for the memset() |
| when it is created and once by dealloc(). |
| |
| Dictionary operations involving only a single key can be O(1) unless |
| resizing is possible. By checking for a resize only when the |
| dictionary can grow (and may *require* resizing), other operations |
| remain O(1), and the odds of resize thrashing or memory fragmentation |
| are reduced. In particular, an algorithm that empties a dictionary |
| by repeatedly invoking .pop will see no resizing, which might |
| not be necessary at all because the dictionary is eventually |
| discarded entirely. |
| |
| The key differences between this implementation and earlier versions are: |
| 1. The table can be split into two parts, the keys and the values. |
| |
| 2. There is an additional key-value combination: (key, NULL). |
| Unlike (<dummy>, NULL) which represents a deleted value, (key, NULL) |
| represented a yet to be inserted value. This combination can only occur |
| when the table is split. |
| |
| 3. No small table embedded in the dict, |
| as this would make sharing of key-tables impossible. |
| |
| |
| These changes have the following consequences. |
| 1. General dictionaries are slightly larger. |
| |
| 2. All object dictionaries of a single class can share a single key-table, |
| saving about 60% memory for such cases. |
| |
| Results of Cache Locality Experiments |
| -------------------------------------- |
| |
| Experiments on an earlier design of dictionary, in which all tables were |
| combined, showed the following: |
| |
| When an entry is retrieved from memory, several adjacent entries are also |
| retrieved into a cache line. Since accessing items in cache is *much* |
| cheaper than a cache miss, an enticing idea is to probe the adjacent |
| entries as a first step in collision resolution. Unfortunately, the |
| introduction of any regularity into collision searches results in more |
| collisions than the current random chaining approach. |
| |
| Exploiting cache locality at the expense of additional collisions fails |
| to payoff when the entries are already loaded in cache (the expense |
| is paid with no compensating benefit). This occurs in small dictionaries |
| where the whole dictionary fits into a pair of cache lines. It also |
| occurs frequently in large dictionaries which have a common access pattern |
| where some keys are accessed much more frequently than others. The |
| more popular entries *and* their collision chains tend to remain in cache. |
| |
| To exploit cache locality, change the collision resolution section |
| in lookdict() and lookdict_string(). Set i^=1 at the top of the |
| loop and move the i = (i << 2) + i + perturb + 1 to an unrolled |
| version of the loop. |
| |
| For split tables, the above will apply to the keys, but the value will |
| always be in a different cache line from the key. |
| |
| |