Narayan Kamath | f163f69 | 2012-09-14 15:02:00 +0100 | [diff] [blame] | 1 | - Project name |
| 2 | |
| 3 | marisa-trie |
| 4 | http://code.google.com/p/marisa-trie/ |
| 5 | |
| 6 | - Project summary |
| 7 | |
| 8 | Matching Algorithms with Recursively Implemented StorAge |
| 9 | |
| 10 | - Version |
| 11 | |
| 12 | 0.1.4 |
| 13 | |
| 14 | - Description |
| 15 | |
| 16 | This project *marisa-trie* provides a C++ library *libmarisa* and command line tools *`marisa-*`* for building and operating nesting patricia tries. The brand-new dictionary structure is designed to be static and space efficient. Also, *marisa-trie* enables not only simple lookups but also prefix searches and predictive searches. |
| 17 | |
| 18 | * Prefix searches are to find keys from prefixes of a given query. |
| 19 | * Predictive searches are to find keys starting with a given query. |
| 20 | |
| 21 | The biggest advantage of *marisa-trie* is that it can build a considerably compact dictionary. See below for the size of dictionaries built with various trie implementations. |
| 22 | |
| 23 | * Input |
| 24 | * Source: enwiki-20110115-all-titles-in-ns0.gz |
| 25 | * Contents: all page titles of English Wikipedia (Jan. 2011) |
| 26 | * Number of keys: 8,279,325 |
| 27 | * Total size: 167,223,035 bytes (plain) / 46,344,646 bytes (gzipped) |
| 28 | |
| 29 | || *Implementation* || *Size (bytes)* || *Remarks* || |
| 30 | || darts-clone || 316,065,792 || Compacted double-array trie || |
| 31 | || tx-trie || 107,119,864 || LOUDS-based trie || |
| 32 | || *marisa-trie* || *42,688,271* || Nesting patricia trie || |
| 33 | |
| 34 | * Documentation (in Japanese) |
| 35 | * HowTo |
| 36 | * ListOfTools |
| 37 | * LibraryInterface |
| 38 | * BenchmarkResults |
| 39 | |
| 40 | - Version control system |
| 41 | |
| 42 | Subversion |
| 43 | |
| 44 | - Source code license |
| 45 | |
| 46 | New BSD License |
| 47 | |
| 48 | - Project labels |
| 49 | |
| 50 | Nesting |
| 51 | Patricia |
| 52 | Trie |
| 53 | Static |
| 54 | Dictionary |
| 55 | CPlusPlus |