Raph Levien | dcecdd8 | 2012-03-23 11:21:16 -0700

This is a README for the font compression reference code. It’s very rough in
this snapshot, but will be cleaned up some for public release.

= How to run the compression test tool =

This document describes how to run the compression reference code. At this
writing, the code is intended to produce a bytestream that can be
reconstructed into a working font, but the reference decompression code is
not done, and the exact format of that bytestream is subject to change.

== Building the tool ==

On a standard Unix-style environment, it should be as simple as running “ant”.
A couple of paths to compression subprocesses are hardcoded in
CompressionRunner.java, namely “/usr/bin/lzma” and “/bin/bzip2”. These are the
default locations in Ubuntu, but if they’re elsewhere on your system, you’ll
need to change them.

The tool depends on sfntly for much of the font work. The lib/ directory
contains a snapshot jar. If you want to use the latest sfntly sources, cd to
the java subdirectory, run “ant”, then copy the following files to
$(thisproject)/lib:

dist/lib/sfntly.jar
dist/tools/conversion/eot/eotconverter.jar
dist/tools/conversion/woff/woffconverter.jar

There’s also a dependency on guava (see references below).

The dependencies are subject to their own licenses.

== Setting up the test ==

A run of the tool evaluates a “base” configuration plus one or more test
configurations, for each font. It measures the file size of each test as a
ratio over the base file size, then graphs the value of that ratio sorted
across all files given on the command line.

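As a sketch of the measurement (illustrative only, not the tool’s own code;
the font names and sizes here are made up), the ratio computation amounts to:

```python
def size_ratios(base_sizes, test_sizes):
    """base_sizes/test_sizes map font name -> compressed file size in bytes.

    Returns the per-font test/base ratios, sorted, as plotted in the chart.
    """
    ratios = [test_sizes[f] / base_sizes[f] for f in base_sizes]
    return sorted(ratios)

print(size_ratios({"a.ttf": 100, "b.ttf": 200},
                  {"a.ttf": 80, "b.ttf": 150}))
# -> [0.75, 0.8]
```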
The test parameters are set by command line options (an improvement from the
last snapshot). The base is set by the -b command line option, and the
additional tests are specified by repeated -x command line options (see below).

Each test is specified by a string description. It is a colon-separated list of
stages. The final stage is entropy compression and can be one of “gzip”,
“lzma”, “bzip2”, “woff”, “eot” (with actual wire-format MTX compression), or
“uncomp” (for raw, uncompressed TTFs). Also, the new wire-format draft
WOFF2 spec is available as "woff2", and takes an entropy coding as an
optional argument, as in "woff2/gzip" or "woff2/lzma".

Other stages may optionally include subparameters (following a slash, and
comma-separated). The stages are:

glyf: performs glyf-table preprocessing based on MTX. There are subparameters:

  1. cbbox (composite bounding box): when specified, the bounding box for
     composite glyphs is included, otherwise stripped
  2. sbbox (simple bounding box): when specified, the bounding box for
     simple glyphs is included
  3. code: the bytecode is separated out into a separate stream
  4. triplet: triplet coding (as in MTX) is used
  5. push: push sequences are separated; if unset, pushes are kept inline
     in the bytecode
  6. reslice: components of the glyf table are separated into individual
     streams, taking the MTX idea of separating the bytecodes further

hmtx: strips lsb’s from the hmtx table. Based on the idea that lsb’s can be
reconstructed from bbox.

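The idea can be sketched as follows (an illustrative sketch, assuming the
common TrueType convention that a glyph’s lsb equals its bounding-box xMin;
the function names are hypothetical, not the tool’s API):

```python
def strip_lsbs(hmtx):
    """hmtx: list of (advance_width, lsb) pairs -> advance widths only."""
    return [aw for aw, _lsb in hmtx]

def restore_lsbs(advances, xmins):
    """Rebuild (advance_width, lsb) pairs, taking lsb from each glyph's
    bounding-box xMin as stored in the glyf table."""
    return list(zip(advances, xmins))

# Round trip: stripping then restoring recovers the original hmtx entries.
hmtx = [(500, 20), (610, 35)]
xmins = [20, 35]
assert restore_lsbs(strip_lsbs(hmtx), xmins) == hmtx
```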
hdmx: performs the delta coding on hdmx, essentially the same as MTX.

cmap: compresses the cmap table: the wire-format representation is the
inverse of the cmap table, plus exceptions (one glyph encoded by multiple
character codes).

kern: compresses kern table (not robust, intended just for rough testing).

strip: the subparameters are a list of tables to be stripped entirely
(comma-separated).

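The descriptor syntax above (colon-separated stages, each with optional
comma-separated subparameters after a slash) can be parsed along these lines;
this is an illustrative sketch, not the tool’s own parser:

```python
def parse_descriptor(desc):
    """Parse e.g. 'glyf/cbbox,triplet:hdmx:gzip' into (stage, subparams) pairs."""
    stages = []
    for part in desc.split(":"):
        # Everything after the first '/' is the comma-separated subparam list.
        name, _, params = part.partition("/")
        stages.append((name, params.split(",") if params else []))
    return stages

print(parse_descriptor("glyf/cbbox,code,triplet,push:hdmx:gzip"))
# -> [('glyf', ['cbbox', 'code', 'triplet', 'push']), ('hdmx', []), ('gzip', [])]
```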
The string roughly corresponding to MTX is:

glyf/cbbox,code,triplet,push,hop:hdmx:gzip

Meaning: glyph encoding is used, with simple glyph bboxes stripped (but
composite glyph bboxes included), triplet coding, push sequences, and hop
codes. The hdmx table is compressed. And finally, gzip is used as the entropy
coder.

This differs from MTX in a number of small ways: LZCOMP is not exactly the same
as gzip. MTX uses three separate compression streams (the base font including
triplet-coded glyph data, the bytecodes, and the push sequences), while this
test uses a single stream. MTX also compresses the CVT table (an upper bound on
the impact of this can be estimated by testing strip/cvt).

Lastly, as a point of methodology, the code by default strips the “dsig” table,
which would be invalidated by any non-bit-identical change to the font data. If
it is desired to keep this table, add the “keepdsig” stage.

The string representing the currently most aggressive optimization level is:

glyf/triplet,code,push,reslice:hdmx:hmtx:cmap:kern:lzma

Relative to the MTX string above, it also strips the bboxes from composite
glyphs, reslices the glyf table, compresses the hmtx, cmap, and kern tables,
and uses lzma as the entropy coding.

The string corresponding to the current WOFF Ultra Condensed draft spec
document is:

glyf/cbbox,triplet,code,reslice:woff2/lzma

The current C++ codebase can roundtrip compressed files as long as no per-table
entropy coding is specified, as below (this will be fixed soon).

glyf/cbbox,triplet,code,reslice:woff2

== Running the tool ==

java -jar build/jar/compression.jar *.ttf > chart.html

The tool takes a list of OpenType fonts on the command line, and generates an
HTML chart, which it simply outputs to stdout. This chart uses the Google Chart
API for plotting.

Options:

-b <desc>

Sets the baseline experiment description.

[ -x <desc> ]...

Sets an experiment description. Can be used multiple times.

-o

Outputs the actual compressed file, substituting ".wof2" for ".ttf" in
the input file name. Only useful when a single -x parameter is specified.

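The -o name substitution can be sketched as follows (illustrative only, not
the tool’s own code):

```python
def output_name(input_name):
    """Mimic -o's naming: replace a trailing '.ttf' with '.wof2'."""
    if input_name.endswith(".ttf"):
        return input_name[: -len(".ttf")] + ".wof2"
    return input_name

print(output_name("MyFont.ttf"))
# -> MyFont.wof2
```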
= Decompressing the fonts =

See the cpp/ directory (including cpp/README) for the C++ implementation of
decompression. This code is based on OTS, and successfully roundtrips the
basic compression as described in the draft spec.

= References =

sfntly: http://code.google.com/p/sfntly/
Guava: http://code.google.com/p/guava-libraries/
MTX: http://www.w3.org/Submission/MTX/

Also please refer to these documents (currently Google Docs):

WOFF Ultra Condensed file format: proposals and discussion of wire format
issues.

WOFF Ultra Condensed: more discussion of results and compression techniques.
This tool was used to prepare the data in that document.