| NOTES ON DICTIONARIES | 
 | ================================ | 
 |  | 
 | Principal Use Cases for Dictionaries | 
 | ------------------------------------ | 
 |  | 
 | Passing keyword arguments | 
 |     Typically, one read and one write for 1 to 3 elements. | 
 |     Occurs frequently in normal python code. | 
 |  | 
 | Class method lookup | 
 |     Dictionaries vary in size with 8 to 16 elements being common. | 
 |     Usually written once with many lookups. | 
 |     When base classes are used, there are many failed lookups | 
 |         followed by a lookup in a base class. | 
 |  | 
 | Instance attribute lookup and Global variables | 
 |     Dictionaries vary in size.  4 to 10 elements are common. | 
 |     Both reads and writes are common. | 
 |  | 
 | Builtins | 
 |     Frequent reads.  Almost never written. | 
 |     About 150 interned strings (as of Py3.3). | 
 |     A few keys are accessed much more frequently than others. | 
 |  | 
 | Uniquification | 
 |     Dictionaries of any size.  Bulk of work is in creation. | 
 |     Repeated writes to a smaller set of keys. | 
 |     Single read of each key. | 
 |     Some use cases have two consecutive accesses to the same key. | 
 |  | 
 |     * Removing duplicates from a sequence. | 
 |         dict.fromkeys(seqn).keys() | 
 |  | 
 |     * Counting elements in a sequence. | 
 |         for e in seqn: | 
 |           d[e] = d.get(e,0) + 1 | 
 |  | 
 |     * Accumulating references in a dictionary of lists: | 
 |  | 
 |         for pagenumber, page in enumerate(pages): | 
 |           for word in page: | 
 |             d.setdefault(word, []).append(pagenumber) | 
 |  | 
 |     Note, the second example is a use case characterized by a get and set | 
 |     to the same key.  There are similar use cases with a __contains__ | 
 |     followed by a get, set, or del to the same key.  Part of the | 
 |     justification for d.setdefault is combining the two lookups into one. | 
 |  | 
 | Membership Testing | 
 |     Dictionaries of any size.  Created once and then rarely changes. | 
 |     Single write to each key. | 
 |     Many calls to __contains__() or has_key(). | 
 |     Similar access patterns occur with replacement dictionaries | 
 |         such as with the % formatting operator. | 
 |  | 
 | Dynamic Mappings | 
 |     Characterized by deletions interspersed with adds and replacements. | 
 |     Performance benefits greatly from the re-use of dummy entries. | 
 |  | 
 | Data Layout | 
 | ----------- | 
 |  | 
 | Dictionaries are composed of 3 components: | 
 | The dictobject struct itself | 
 | A dict-keys object (keys & hashes) | 
 | A values array | 
 |  | 
 |  | 
 | Tunable Dictionary Parameters | 
 | ----------------------------- | 
 |  | 
 | See comments for PyDict_MINSIZE_SPLIT, PyDict_MINSIZE_COMBINED, | 
 | USABLE_FRACTION and GROWTH_RATE in dictobject.c | 
 |  | 
 | Tune-ups should be measured across a broad range of applications and | 
 | use cases.  A change to any parameter will help in some situations and | 
 | hurt in others.  The key is to find settings that help the most common | 
 | cases and do the least damage to the less common cases.  Results will | 
 | vary dramatically depending on the exact number of keys, whether the | 
 | keys are all strings, whether reads or writes dominate, the exact | 
 | hash values of the keys (some sets of values have fewer collisions than | 
 | others).  Any one test or benchmark is likely to prove misleading. | 
 |  | 
 | While making a dictionary more sparse reduces collisions, it impairs | 
 | iteration and key listing.  Those methods loop over every potential | 
 | entry.  Doubling the size of dictionary results in twice as many | 
 | non-overlapping memory accesses for keys(), items(), values(), | 
 | __iter__(), iterkeys(), iteritems(), itervalues(), and update(). | 
 | Also, every dictionary iterates at least twice, once for the memset() | 
 | when it is created and once by dealloc(). | 
 |  | 
 | Dictionary operations involving only a single key can be O(1) unless | 
 | resizing is possible.  By checking for a resize only when the | 
 | dictionary can grow (and may *require* resizing), other operations | 
 | remain O(1), and the odds of resize thrashing or memory fragmentation | 
 | are reduced. In particular, an algorithm that empties a dictionary | 
 | by repeatedly invoking .pop will see no resizing, which might | 
 | not be necessary at all because the dictionary is eventually | 
 | discarded entirely. | 
 |  | 
 | The key differences between this implementation and earlier versions are: | 
 |     1. The table can be split into two parts, the keys and the values. | 
 |  | 
 |     2. There is an additional key-value combination: (key, NULL). | 
 |        Unlike (<dummy>, NULL) which represents a deleted value, (key, NULL) | 
 |        represented a yet to be inserted value. This combination can only occur | 
 |        when the table is split. | 
 |  | 
 |     3. No small table embedded in the dict, | 
 |        as this would make sharing of key-tables impossible. | 
 |  | 
 |  | 
 | These changes have the following consequences. | 
 |    1. General dictionaries are slightly larger. | 
 |  | 
 |    2. All object dictionaries of a single class can share a single key-table, | 
 |       saving about 60% memory for such cases. | 
 |  | 
 | Results of Cache Locality Experiments | 
 | -------------------------------------- | 
 |  | 
 | Experiments on an earlier design of dictionary, in which all tables were | 
 | combined, showed the following: | 
 |  | 
 |   When an entry is retrieved from memory, several adjacent entries are also | 
 |   retrieved into a cache line.  Since accessing items in cache is *much* | 
 |   cheaper than a cache miss, an enticing idea is to probe the adjacent | 
 |   entries as a first step in collision resolution.  Unfortunately, the | 
 |   introduction of any regularity into collision searches results in more | 
 |   collisions than the current random chaining approach. | 
 |  | 
 |   Exploiting cache locality at the expense of additional collisions fails | 
 |   to payoff when the entries are already loaded in cache (the expense | 
 |   is paid with no compensating benefit).  This occurs in small dictionaries | 
 |   where the whole dictionary fits into a pair of cache lines.  It also | 
 |   occurs frequently in large dictionaries which have a common access pattern | 
 |   where some keys are accessed much more frequently than others.  The | 
 |   more popular entries *and* their collision chains tend to remain in cache. | 
 |  | 
 |   To exploit cache locality, change the collision resolution section | 
 |   in lookdict() and lookdict_string().  Set i^=1 at the top of the | 
 |   loop and move the  i = (i << 2) + i + perturb + 1 to an unrolled | 
 |   version of the loop. | 
 |  | 
 | For split tables, the above will apply to the keys, but the value will | 
 | always be in a different cache line from the key. | 
 |  | 
 |  |