This is a README for the font compression reference code. There are several
compression-related modules in this repository.

brotli/ contains reference code for the Brotli byte-level compression
algorithm. Note that it is licensed under an Apache 2 license.

src/ contains prototype Java code for compressing fonts.

cpp/ contains prototype C++ code for decompressing fonts.

docs/ contains documents describing the proposed compression format.

= How to run the compression test tool =

This section describes how to run the compression reference code. At this
writing, the code is intended to produce a bytestream that can be
reconstructed into a working font, but the reference decompression code is not
done, and the exact format of that bytestream is subject to change.

== Building the tool ==

On a standard Unix-style environment, it should be as simple as running “ant”.

The tool depends on sfntly for much of the font work. The lib/ directory
contains a snapshot jar. If you want to use the latest sfntly sources, then cd
to the java subdirectory of the sfntly source tree, run “ant”, then copy these
files to $(thisproject)/lib:

dist/lib/sfntly.jar
dist/tools/conversion/eot/eotconverter.jar
dist/tools/conversion/woff/woffconverter.jar

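The copy step can be scripted. The following is only a sketch: the function
name copy_sfntly_jars and the example paths are hypothetical, and it assumes
all three jars end up under dist/ after running “ant” in the sfntly java
directory.

```shell
# Hypothetical helper: copy freshly built sfntly jars into this project's lib/.
#   $1 = sfntly java directory (where "ant" was run)
#   $2 = root directory of this project
copy_sfntly_jars() {
    sfntly_java_dir=$1
    project_dir=$2
    mkdir -p "$project_dir/lib" || return 1
    for jar in \
        dist/lib/sfntly.jar \
        dist/tools/conversion/eot/eotconverter.jar \
        dist/tools/conversion/woff/woffconverter.jar
    do
        # Stop at the first jar that is missing or cannot be copied.
        cp "$sfntly_java_dir/$jar" "$project_dir/lib/" || return 1
    done
}

# Example usage (both paths are placeholders):
#   (cd ~/src/sfntly/java && ant)
#   copy_sfntly_jars ~/src/sfntly/java ~/src/font-compression-reference
```

The function returns a non-zero status if any jar is missing, so a failed
sfntly build is noticed before this project is rebuilt.
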
There’s also a dependency on Guava (see references below).

The dependencies are subject to their own licenses.

== Setting up the test ==

A run of the tool evaluates a “base” configuration plus one or more test
configurations, for each font. It measures the file size of the test as a ratio
over the base file size, then graphs the value of that ratio sorted across all
files given on the command line.

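To make the measurement concrete, the ratio computation can be sketched
outside the tool. This is an illustration only: the function name size_ratios
and the three-column input format (name, base size, test size) are
hypothetical and are not something the tool reads or writes.

```shell
# Print "<ratio> <name>" for each font, sorted by ascending ratio.
# Input on stdin, one font per line: <name> <base_bytes> <test_bytes>
# (hypothetical format; names must not contain whitespace).
size_ratios() {
    # LC_ALL=C keeps the decimal point consistent between awk and sort.
    LC_ALL=C awk '$2 > 0 { printf "%.4f %s\n", $3 / $2, $1 }' |
        LC_ALL=C sort -n
}

# Example with made-up sizes.
printf '%s\n' 'a.ttf 1000 600' 'b.ttf 2000 900' 'c.ttf 500 400' | size_ratios
```

A ratio below 1 means the test configuration produced a smaller file than the
base configuration for that font.
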
The test parameters are set by command line options (an improvement from the
last snapshot). The base is set by the -b command line option, and the
additional tests are specified by repeated -x command line options (see below).

Each test is specified by a string description. It is a colon-separated list of
stages. The final stage is entropy compression and can be one of “gzip”,
“lzma”, “bzip2”, “woff”, “eot” (with actual wire-format MTX compression), or
“uncomp” (for raw, uncompressed TTFs). Also, the new wire-format draft
WOFF2 spec is available as “woff2”, and takes an entropy coding as an
optional argument, as in “woff2/gzip” or “woff2/lzma”.

Other stages may optionally include subparameters (following a slash, and
comma-separated). The stages are:

glyf: performs glyf-table preprocessing based on MTX. There are subparameters:

1. cbbox (composite bounding box). When specified, the bounding box for
   composite glyphs is included, otherwise stripped.
2. sbbox (simple bounding box). When specified, the bounding box for simple
   glyphs is included.
3. code: the bytecode is separated out into a separate stream.
4. triplet: triplet coding (as in MTX) is used.
5. push: push sequences are separated; if unset, pushes are kept inline in
   the bytecode.
6. reslice: components of the glyf table are separated into individual
   streams, taking the MTX idea of separating the bytecodes further.

hmtx: strips lsbs (left side bearings) from the hmtx table. Based on the idea
that lsbs can be reconstructed from bbox.

hdmx: performs the delta coding on hdmx, essentially the same as MTX.

cmap: compresses cmap table: wire format representation is inverse of cmap
table plus exceptions (one glyph encoded by multiple character codes).

kern: compresses kern table (not robust, intended just for rough testing).

strip: the subparameters are a list of tables to be stripped entirely
(comma-separated).

The string roughly corresponding to MTX is:

glyf/cbbox,code,triplet,push,hop:hdmx:gzip

Meaning: glyph encoding is used, with simple glyph bboxes stripped (but
composite glyph bboxes included), triplet coding, push sequences, and hop
codes. The hdmx table is compressed. And finally, gzip is used as the entropy
coder.

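The structure of a description string (stages separated by colons, with
optional comma-separated subparameters after a slash) can be illustrated with
a small POSIX sh sketch. The function name describe_stages is hypothetical,
and this is not how the Java tool itself parses the string.

```shell
# Split a test description into stages, and each stage into its name plus
# the optional subparameter list that follows the first "/".
describe_stages() {
    desc=$1
    old_ifs=$IFS
    IFS=:
    set -f                      # no filename globbing while splitting
    for stage in $desc; do      # unquoted on purpose: split on ":"
        name=${stage%%/*}
        case $stage in
            */*) params=${stage#*/} ;;
            *)   params= ;;
        esac
        printf 'stage=%s params=%s\n' "$name" "$params"
    done
    set +f
    IFS=$old_ifs
}

describe_stages 'glyf/cbbox,code,triplet,push,hop:hdmx:gzip'
# stage=glyf params=cbbox,code,triplet,push,hop
# stage=hdmx params=
# stage=gzip params=
```

The last line printed is always the entropy coding stage, for example gzip
here or woff2 with lzma as its argument for “woff2/lzma”.
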
This differs from MTX in a number of small ways: LZCOMP is not exactly the same
as gzip. MTX uses three separate compression streams (the base font including
triplet-coded glyph data, the bytecodes, and the push sequences), while this
test uses a single stream. MTX also compresses the CVT table (an upper bound on
the impact of this can be estimated by testing strip/cvt).

Lastly, as a point of methodology, the code by default strips the “dsig” table,
which would be invalidated by any non-bit-identical change to the font data. If
it is desired to keep this table, add the “keepdsig” stage.

The string representing the currently most aggressive optimization level is:

glyf/triplet,code,push,reslice:hdmx:hmtx:cmap:kern:lzma

In addition to the MTX one above, it strips the bboxes from composite glyphs,
reslices the glyf table, compresses the hmtx, cmap, and kern tables, and uses
lzma as the entropy coding.

The string corresponding to the current WOFF Ultra Condensed draft spec
document is:

glyf/cbbox,triplet,code,reslice:woff2/lzma

The current C++ codebase can roundtrip compressed files as long as no per-table
entropy coding is specified, as below (this will be fixed soon):

glyf/cbbox,triplet,code,reslice:woff2

== Running the tool ==

java -jar build/jar/compression.jar *.ttf > chart.html

The tool takes a list of OpenType fonts on the command line and generates an
HTML chart, which it writes to stdout. This chart uses the Google Chart API
for plotting.

Options:

-b <desc>

    Sets the baseline experiment description.

[ -x <desc> ]...

    Sets an experiment description. Can be used multiple times.

-o

    Outputs the actual compressed file, substituting ".wof2" for ".ttf" in
    the input file name. Only useful when a single -x parameter is specified.

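Putting the options together, a comparison run might look like the sketch
below. The wrapper names run_comparison and compressed_name are hypothetical.
The sketch also assumes that options may be given before the font list and
that a bare “gzip” is accepted as a baseline description; neither is
confirmed here, so check the tool's behavior before relying on it.

```shell
# Hypothetical wrapper around the documented invocation.
#   $1 = output HTML chart; remaining arguments = font files.
# Compares the MTX-like and most aggressive descriptions against a gzip base.
run_comparison() {
    chart=$1
    shift
    java -jar build/jar/compression.jar \
        -b gzip \
        -x 'glyf/cbbox,code,triplet,push,hop:hdmx:gzip' \
        -x 'glyf/triplet,code,push,reslice:hdmx:hmtx:cmap:kern:lzma' \
        "$@" > "$chart"
}

# File name that -o is described as producing: ".wof2" replaces ".ttf".
compressed_name() {
    printf '%s\n' "${1%.ttf}.wof2"
}

# Example usage:
#   run_comparison chart.html *.ttf
#   compressed_name MyFont.ttf
```

The descriptions are single-quoted so the shell passes each one to the tool as
a single unmodified argument.
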
= Decompressing the fonts =

See the cpp/ directory (including cpp/README) for the C++ implementation of
decompression. This code is based on OTS, and successfully roundtrips the
basic compression as described in the draft spec.

= References =

sfntly: http://code.google.com/p/sfntly/
Guava: http://code.google.com/p/guava-libraries/
MTX: http://www.w3.org/Submission/MTX/

Also please refer to documents (currently Google Docs):

WOFF Ultra Condensed file format: proposals and discussion of wire format
issues (PDF is in docs/ directory)

WOFF Ultra Condensed: more discussion of results and compression techniques.
This tool was used to prepare the data in that document.