---
title: 'Correctness Testing'
linkTitle: 'Correctness Testing'
---

Skia correctness testing is primarily served by a tool named DM. This is a
quickstart to building and running DM.

<!--?prettify lang=sh?-->

    python2 tools/git-sync-deps
    bin/gn gen out/Debug
    ninja -C out/Debug dm
    out/Debug/dm -v -w dm_output

When you run this, you may notice your CPU peg to 100% for a while, then taper
off to 1 or 2 active cores as the run finishes. This is intentional. DM is very
multithreaded, but some of the work, particularly GPU-backed work, is still
forced to run on a single thread. You can use `--threads N` to limit DM to N
threads if you like. This can sometimes be helpful on machines with many cores
but relatively little RAM.
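
For example, to cap DM at four threads (the count here is arbitrary; pick
whatever suits your machine):

<!--?prettify lang=sh?-->

    out/Debug/dm -v -w dm_output --threads 4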

As DM runs, you ought to see a giant spew of output that looks something like
this.

```
Skipping nonrendering: Don't understand 'nonrendering'.
Skipping angle: Don't understand 'angle'.
Skipping nvprmsaa4: Could not create a surface.
492 srcs * 3 sinks + 382 tests == 1858 tasks

(  25MB 1857) 1.36ms 8888 image mandrill_132x132_12x12.astc-5-subsets
(  25MB 1856) 1.41ms 8888 image mandrill_132x132_6x6.astc-5-subsets
(  25MB 1855) 1.35ms 8888 image mandrill_132x130_6x5.astc-5-subsets
(  25MB 1854) 1.41ms 8888 image mandrill_132x130_12x10.astc-5-subsets
(  25MB 1853)  151µs 8888 image mandrill_130x132_10x6.astc-5-subsets
(  25MB 1852)  154µs 8888 image mandrill_130x130_5x5.astc-5-subsets
...
( 748MB    5) 9.43ms unit test GLInterfaceValidation
( 748MB    4) 30.3ms unit test HalfFloatTextureTest
( 748MB    3) 31.2ms unit test FloatingPointTextureTest
( 748MB    2) 32.9ms unit test DeferredCanvas_GPU
( 748MB    1) 49.4ms unit test ClipCache
( 748MB    0) 37.2ms unit test Blur
```

Do not panic.

As you become more familiar with DM, this spew may be a bit annoying. If you
remove `-v` from the command line, DM will spin its progress on a single line
rather than print a new line for each status update.
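
For example, the same quickstart run with the single-line progress display:

<!--?prettify lang=sh?-->

    out/Debug/dm -w dm_output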

Don't worry about the "Skipping something: Here's why." lines at startup. DM
supports many test configurations, not all of which are appropriate for every
machine. These lines are a sort of FYI, mostly in case DM can't run some
configuration you might be expecting it to run.

Don't worry about the "skps: Couldn't read skps." messages either; you won't
have them by default and can do without them. If you wish to test with them
too, you can download them separately.
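
If you do download them, point DM at the directory of .skp files with `--skps`
(described below); the path here is only a placeholder:

<!--?prettify lang=sh?-->

    out/Debug/dm -v -w dm_output --skps /path/to/skps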

The next line is an overview of the work DM is about to do.

```
492 srcs * 3 sinks + 382 tests == 1858 tasks
```

DM has found 382 unit tests (code linked in from tests/) and 492 other drawing
sources. These drawing sources may be GM integration tests (code linked in from
gm/), image files (from `--images`, which defaults to "resources"), or .skp
files (from `--skps`, which defaults to "skps"). You can control the types of
sources DM will use with `--src` (default: "tests gm image skp").
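
For example, mirroring the `--src tests` example at the end of this page, you
can restrict a run to just the GM sources:

<!--?prettify lang=sh?-->

    out/Debug/dm --src gm -w dm_output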

DM has found 3 usable ways to draw those 492 sources. This is controlled by
`--config`. The defaults are operating system dependent; on Linux they are
"8888 gl nonrendering". DM has skipped nonrendering, leaving two usable
configs: 8888 and gl. These two name different ways to draw using Skia:

- 8888: draw using the software backend into a 32-bit RGBA bitmap
- gl: draw using the OpenGL backend (Ganesh) into a 32-bit RGBA bitmap

Sometimes DM calls these configs, sometimes sinks. Sorry. There are many
possible configs, but generally we pay the most attention to 8888 and gl.
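
For example, to draw everything with just the software backend, restrict the
run to the 8888 config named above:

<!--?prettify lang=sh?-->

    out/Debug/dm --config 8888 -v -w dm_output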

DM always tries to draw all sources into all sinks, which is why we multiply
492 by 3. The unit tests don't really fit into this source-sink model, so they
stand alone. A couple thousand tasks is pretty normal. Let's look at the status
line for one of those tasks.

```
(  25MB 1857) 1.36ms 8888 image mandrill_132x132_12x12.astc-5-subsets
   [1]  [2]   [3]    [4]
```

This status line tells us several things.

1. The maximum amount of memory DM had ever used was 25MB. Note this is a high
   water mark, not the current memory usage. This is mostly useful for us to
   track on our buildbots, some of which run perilously close to the system
   memory limit.

2. The number of unfinished tasks, either currently running or waiting to run;
   in this example there are 1857. We generally run one task per hardware
   thread available, so on a typical laptop there are probably 4 or 8 running
   at once. Sometimes the counts appear to show up out of order, particularly
   at DM startup; this is harmless and doesn't affect the correctness of the
   run.

3. Next, we see this task took 1.36 milliseconds to run. Generally, the
   precision of this timer is around 1 microsecond. The time is purely there
   for informational purposes, to make it easier for us to find slow tests (if
   one looks slow, you can rerun it by itself with `--match`, as shown below).

4. The configuration and name of the test we ran. We drew the test
   "mandrill_132x132_12x12.astc-5-subsets", which is an "image" source, into
   an "8888" sink.
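
For example, to rerun only the task shown above, match on a substring of its
name, the same way `--match blur` works in the examples at the end of this
page:

<!--?prettify lang=sh?-->

    out/Debug/dm --match mandrill_132x132_12x12 -v -w dm_output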

When DM finishes running, you should find a directory with a file named
`dm.json` and some nested directories filled with lots of images.

```
$ ls dm_output
8888 dm.json gl

$ find dm_output -name '*.png'
dm_output/8888/gm/3x3bitmaprect.png
dm_output/8888/gm/aaclip.png
dm_output/8888/gm/aarectmodes.png
dm_output/8888/gm/alphagradients.png
dm_output/8888/gm/arcofzorro.png
dm_output/8888/gm/arithmode.png
dm_output/8888/gm/astcbitmap.png
dm_output/8888/gm/bezier_conic_effects.png
dm_output/8888/gm/bezier_cubic_effects.png
dm_output/8888/gm/bezier_quad_effects.png
...
```

The directories are nested first by sink type (`--config`), then by source type
(`--src`). The image from the task we just looked at, "8888 image
mandrill_132x132_12x12.astc-5-subsets", can be found at
`dm_output/8888/image/mandrill_132x132_12x12.astc-5-subsets.png`.

`dm.json` is used by our automated testing system, so you can ignore it if you
like. It contains a listing of each test run and a checksum of the image
generated for that run.
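
`dm.json` is plain JSON, so if you are curious you can skim it with any JSON
tool; for example:

<!--?prettify lang=sh?-->

    python -m json.tool dm_output/dm.json | head -n 40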

### Detail <a name="digests"></a>

Boring technical detail: The checksum is not a checksum of the .png file, but
rather a checksum of the raw pixels used to create that .png. That means it is
possible for two different configurations to produce the same exact .png, but
have their checksums differ.

Unit tests don't generally output anything but a status update when they pass.
If a test fails, DM will print out its assertion failures, both at the time
they happen and then again all together after everything is done running. These
failures are also included in the `dm.json` file.

DM has a simple facility to compare against the results of a previous run:

<!--?prettify lang=sh?-->

    ninja -C out/Debug dm
    out/Debug/dm -w good

    # do some work

    ninja -C out/Debug dm
    out/Debug/dm -r good -w bad

When using `-r`, DM will display a failure for any test that didn't produce the
same image as the `good` run.

For anything fancier, I suggest using skdiff:

<!--?prettify lang=sh?-->

    ninja -C out/Debug dm
    out/Debug/dm -w good

    # do some work

    ninja -C out/Debug dm
    out/Debug/dm -w bad

    ninja -C out/Debug skdiff
    mkdir diff
    out/Debug/skdiff good bad diff

    # open diff/index.html in your web browser

That's the basics of DM. DM supports many other modes and flags. Here are a few
examples you might find handy.

<!--?prettify lang=sh?-->

    out/Debug/dm --help       # Print all flags, their defaults, and a brief explanation of each.
    out/Debug/dm --src tests  # Run only unit tests.
    out/Debug/dm --nocpu      # Test only GPU-backed work.
    out/Debug/dm --nogpu      # Test only CPU-backed work.
    out/Debug/dm --match blur # Run only work with "blur" in its name.
    out/Debug/dm --dryRun     # Don't really do anything, just print out what we'd do.
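
These flags can be combined. For example, a sketch of a focused run that draws
only GM sources with the 8888 config and only work with "blur" in its name
(the output directory name is arbitrary):

<!--?prettify lang=sh?-->

    out/Debug/dm --src gm --config 8888 --match blur -w blur_run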