(How to avoid) Botching up ioctls
=================================

From: http://blog.ffwll.ch/2013/11/botching-up-ioctls.html

By: Daniel Vetter, Copyright © 2013 Intel Corporation

One clear insight kernel graphics hackers gained in the past few years is that
trying to come up with a unified interface to manage the execution units and
memory on completely different GPUs is a futile effort. So nowadays every
driver has its own set of ioctls to allocate memory and submit work to the GPU.
Which is nice, since there's no more insanity in the form of fake-generic, but
actually only-used-once interfaces. But the clear downside is that there's much
more potential to screw things up.

To avoid repeating all the same mistakes again I've written up some of the
lessons learned while botching the job for the drm/i915 driver. Most of these
only cover technicalities and not the big-picture issues like what the command
submission ioctl exactly should look like. Learning these lessons is probably
something every GPU driver has to do on its own.


Prerequisites
-------------

First the prerequisites. Without these you have already failed, because you
will need to add a 32-bit compat layer:

 * Only use fixed-size integers. To avoid conflicts with typedefs in userspace
   the kernel has special types like __u32, __s64. Use them.

 * Align everything to the natural size and use explicit padding. 32-bit
   platforms don't necessarily align 64-bit values to 64-bit boundaries, but
   64-bit platforms do. So we always need padding to the natural size to get
   this right.

 * Pad the entire struct to a multiple of 64 bits if the structure contains
   64-bit types - the structure size will otherwise differ on 32-bit versus
   64-bit. Having a different structure size hurts when passing arrays of
   structures to the kernel, or if the kernel checks the structure size, which
   e.g. the drm core does.

 * Pointers are __u64, cast from/to a uintptr_t on the userspace side and
   from/to a void __user * in the kernel. Try really hard not to delay this
   conversion or worse, fiddle the raw __u64 through your code since that
   diminishes the checking tools like sparse can provide. The macro
   u64_to_user_ptr can be used in the kernel to avoid warnings about integers
   and pointers of different sizes.
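
As a rough illustration of the layout rules above, here is a minimal sketch of
an ioctl argument struct as seen from userspace. The struct, its fields and the
two helper functions are hypothetical names invented for this example; only the
__u32/__u64 types (from <linux/types.h>) and uintptr_t are real.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/types.h>

/* Hypothetical ioctl argument struct; every name here is illustrative. */
struct example_upload {
	__u64 data_ptr;	/* userspace pointer carried as a 64-bit integer */
	__u32 size;	/* length of the buffer in bytes */
	__u32 flags;	/* two 32-bit fields keep the next __u64 aligned */
	__u64 offset;	/* naturally aligned at byte 16 */
	__u32 handle;	/* object handle */
	__u32 pad;	/* explicit tail padding to a multiple of 64 bits */
};

/* The layout must be the same for 32-bit and 64-bit userspace. */
_Static_assert(sizeof(struct example_upload) == 32, "size must not vary");
_Static_assert(offsetof(struct example_upload, offset) == 16,
	       "64-bit field must be naturally aligned");

/* Userspace side: convert a pointer exactly once, via uintptr_t. */
static inline __u64 example_ptr_to_u64(const void *ptr)
{
	return (__u64)(uintptr_t)ptr;
}

/* The reverse conversion, again going through uintptr_t. */
static inline void *example_u64_to_ptr(__u64 value)
{
	return (void *)(uintptr_t)value;
}
```

On the kernel side the matching conversion would go through u64_to_user_ptr,
as mentioned in the last bullet, right where the ioctl arguments are first
looked at.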


Basics
------

With the joys of writing a compat layer avoided we can take a look at the basic
fumbles. Neglecting these will make backward and forward compatibility a real
pain. And since getting things wrong on the first attempt is guaranteed you
will have a second iteration or at least an extension for any given interface.

 * Have a clear way for userspace to figure out whether your new ioctl or ioctl
   extension is supported on a given kernel. If you can't rely on old kernels
   rejecting the new flags/modes or ioctls (since doing that was botched in the
   past) then you need a driver feature flag or revision number somewhere.

 * Have a plan for extending ioctls with new flags or new fields at the end of
   the structure. The drm core checks the passed-in size for each ioctl call
   and zero-extends any mismatches between kernel and userspace. That helps,
   but isn't a complete solution since newer userspace on older kernels won't
   notice that the newly added fields at the end get ignored. So this still
   needs a new driver feature flag.

 * Check all unused fields and flags and all the padding for whether it's 0,
   and reject the ioctl if that's not the case. Otherwise your nice plan for
   future extensions is going right down the gutters since someone will submit
   an ioctl struct with random stack garbage in the yet unused parts. Which
   then bakes in the ABI that those fields can never be used for anything else
   but garbage.

 * Have simple testcases for all of the above.
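
The "reject anything that isn't 0" rule can be sketched as follows. This is
only an illustration written as plain C that also compiles in userspace; the
struct, the flag names and example_validate are all made up for this example,
and a real driver would run the equivalent check at the top of its ioctl
handler.

```c
#include <errno.h>
#include <linux/types.h>

/* Hypothetical flags; anything outside this mask is not defined yet. */
#define EXAMPLE_FLAG_SYNC	(1u << 0)
#define EXAMPLE_FLAG_CACHED	(1u << 1)
#define EXAMPLE_KNOWN_FLAGS	(EXAMPLE_FLAG_SYNC | EXAMPLE_FLAG_CACHED)

/* Hypothetical ioctl argument struct with one explicit padding field. */
struct example_args {
	__u32 handle;
	__u32 flags;
	__u32 value;
	__u32 pad;	/* must be zero */
};

/* Return 0 if the arguments are acceptable, -EINVAL otherwise. */
static int example_validate(const struct example_args *args)
{
	/* Unknown flag bits are reserved for future extensions. */
	if (args->flags & ~EXAMPLE_KNOWN_FLAGS)
		return -EINVAL;

	/* Padding must be zero so it can be given a meaning later. */
	if (args->pad != 0)
		return -EINVAL;

	return 0;
}
```

Because garbage is rejected from day one, a later kernel can assign a meaning
to a new flag bit or to the pad field without breaking existing userspace, and
userspace can probe for the extension by checking whether the new bit is
accepted.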


Fun with Error Paths
--------------------

Nowadays we don't have any excuse left any more for drm drivers being neat
little root exploits. This means we both need full input validation and solid
error handling paths - GPUs will die eventually in the oddest corner cases
anyway:

 * The ioctl must check for array overflows. Also it needs to check for
   over/underflows and clamping issues of integer values in general. The usual
   example is sprite positioning values fed directly into the hardware with the
   hardware just having 12 bits or so. Works nicely until some odd display
   server doesn't bother with clamping itself and the cursor wraps around the
   screen.

 * Have simple testcases for every input validation failure case in your ioctl.
   Check that the error code matches your expectations. And finally make sure
   that you only test for one single error path in each subtest by submitting
   otherwise perfectly valid data. Without this an earlier check might reject
   the ioctl already and shadow the codepath you actually want to test, hiding
   bugs and regressions.

 * Make all your ioctls restartable. First, X really loves signals and second,
   this will allow you to test 90% of all error handling paths by just
   interrupting your main test suite constantly with signals. Thanks to X's
   love for signals you'll get an excellent base coverage of all your error
   paths pretty much for free for graphics drivers. Also, be consistent with
   how you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in its
   userspace library. The i915 driver botched this with the set_tiling ioctl,
   now we're stuck forever with some arcane semantics in both the kernel and
   userspace.

 * If you can't make a given codepath restartable make a stuck task at least
   killable. GPUs just die and your users won't like you more if you hang their
   entire box (by means of an unkillable X process). If the state recovery is
   still too tricky have a timeout or hangcheck safety net as a last-ditch
   effort in case the hardware has gone bananas.

 * Have testcases for the really tricky corner cases in your error recovery code
   - it's way too easy to create a deadlock between your hangcheck code and
   waiters.
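
Three of the points above lend themselves to a short sketch: range-checking a
value before it reaches a narrow hardware field, computing an array size
without overflowing, and restarting an interrupted ioctl from userspace. All
names below are hypothetical, the 12-bit limit is just the example from the
text, and the retry loop is written in the spirit of a helper like drmIoctl
rather than copied from it; retrying on EAGAIN as well as EINTR is an
assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>

#define EXAMPLE_POS_MAX 4095	/* assumed 12-bit hardware position field */

/* Reject out-of-range positions instead of letting the hardware wrap them. */
static int example_check_pos(int32_t x, int32_t y)
{
	if (x < 0 || x > EXAMPLE_POS_MAX || y < 0 || y > EXAMPLE_POS_MAX)
		return -EINVAL;
	return 0;
}

/* Compute count * elem_size, failing instead of silently overflowing. */
static int example_array_bytes(size_t count, size_t elem_size, size_t *bytes)
{
	if (elem_size != 0 && count > SIZE_MAX / elem_size)
		return -EOVERFLOW;
	*bytes = count * elem_size;
	return 0;
}

/* Userspace wrapper: reissue the ioctl while it is merely interrupted. */
static int example_ioctl(int fd, unsigned long request, void *arg)
{
	int ret;

	do {
		ret = ioctl(fd, request, arg);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}
```

A wrapper like example_ioctl only works if the kernel side really is
restartable, i.e. calling the ioctl again with the same arguments after an
interruption has the same effect as one uninterrupted call.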


Time, Waiting and Missing it
----------------------------

GPUs do most everything asynchronously, so we have a need to time operations and
wait for outstanding ones. This is really tricky business; at the moment none of
the ioctls supported by the drm/i915 driver get this fully right, which means
there's still tons more lessons to learn here.

 * Use CLOCK_MONOTONIC as your reference time, always. It's what alsa, drm and
   v4l use by default nowadays. But let userspace know which timestamps are
   derived from different clock domains like your main system clock (provided
   by the kernel) or some independent hardware counter somewhere else. Clocks
   will mismatch if you look close enough, but if performance measuring tools
   have this information they can at least compensate. If your userspace can
   get at the raw values of some clocks (e.g. through in-command-stream
   performance counter sampling instructions) consider exposing those also.

 * Use __s64 seconds plus __u64 nanoseconds to specify time. It's not the most
   convenient time specification, but it's mostly the standard.

 * Check that input time values are normalized and reject them if not. Note
   that the kernel native struct ktime has a signed integer for both seconds
   and nanoseconds, so beware here.

 * For timeouts, use absolute times. If you're a good fellow and made your
   ioctl restartable, relative timeouts tend to be too coarse and can
   indefinitely extend your wait time due to rounding on each restart.
   Especially if your reference clock is something really slow like the display
   frame counter. With a spec lawyer hat on this isn't a bug since timeouts can
   always be extended - but users will surely hate you if their neat animations
   start to stutter due to this.

 * Consider ditching any synchronous wait ioctls with timeouts and just deliver
   an asynchronous event on a pollable file descriptor. It fits much better
   into event driven applications' main loop.

 * Have testcases for corner-cases, especially whether the return values for
   already-completed events, successful waits and timed-out waits are all sane
   and suit your needs.
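
To make the time rules a bit more concrete, here is a minimal userspace-style
sketch of a normalized seconds/nanoseconds value and of turning a relative
timeout into an absolute CLOCK_MONOTONIC deadline once, up front. The struct
and function names are hypothetical; clock_gettime and CLOCK_MONOTONIC are the
standard POSIX interfaces. Overflow of the seconds field is deliberately not
handled here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>
#include <linux/types.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ull

/* Hypothetical time value as it would appear in an ioctl struct. */
struct example_time {
	__s64 sec;
	__u64 nsec;
};

/* A value is normalized when nsec is strictly below one second. */
static int example_time_is_normalized(const struct example_time *t)
{
	return t->nsec < EXAMPLE_NSEC_PER_SEC;
}

/* out = base + rel, rejecting bad input and keeping the result normalized. */
static int example_time_add(const struct example_time *base,
			    const struct example_time *rel,
			    struct example_time *out)
{
	if (!example_time_is_normalized(base) ||
	    !example_time_is_normalized(rel) || rel->sec < 0)
		return -EINVAL;

	out->sec = base->sec + rel->sec;
	out->nsec = base->nsec + rel->nsec;
	if (out->nsec >= EXAMPLE_NSEC_PER_SEC) {
		out->nsec -= EXAMPLE_NSEC_PER_SEC;
		out->sec++;
	}
	return 0;
}

/* Turn a relative timeout into an absolute CLOCK_MONOTONIC deadline. */
static int example_deadline(const struct example_time *rel,
			    struct example_time *deadline)
{
	struct timespec now;
	struct example_time base;

	if (clock_gettime(CLOCK_MONOTONIC, &now))
		return -errno;

	base.sec = now.tv_sec;
	base.nsec = (__u64)now.tv_nsec;
	return example_time_add(&base, rel, deadline);
}
```

Since the deadline is computed only once, restarting the wait after a signal
passes the same absolute value again and the total wait time cannot creep
upwards with each restart.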


Leaking Resources, Not
----------------------

A full-blown drm driver essentially implements a little OS, but specialized to
the given GPU platforms. This means a driver needs to expose tons of handles
for different objects and other resources to userspace. Doing that right
entails its own little set of pitfalls:

 * Always attach the lifetime of your dynamically created resources to the
   lifetime of a file descriptor. Consider using a 1:1 mapping if your resource
   needs to be shared across processes - fd-passing over unix domain sockets
   also simplifies lifetime management for userspace.

 * Always have O_CLOEXEC support.

 * Ensure that you have sufficient insulation between different clients. By
   default pick a private per-fd namespace which forces any sharing to be done
   explicitly. Only go with a more global per-device namespace if the objects
   are truly device-unique. One counterexample in the drm modeset interfaces is
   that the per-device modeset objects like connectors share a namespace with
   framebuffer objects, which mostly are not shared at all. A separate
   namespace, private by default, for framebuffers would have been more
   suitable.

 * Think about uniqueness requirements for userspace handles. E.g. for most drm
   drivers it's a userspace bug to submit the same object twice in the same
   command submission ioctl. But then if objects are shareable userspace needs
   to know whether it has seen an imported object from a different process
   already or not. I haven't tried this myself yet due to lack of a new class
   of objects, but consider using inode numbers on your shared file descriptors
   as unique identifiers - it's how real files are told apart, too.
   Unfortunately this requires a full-blown virtual filesystem in the kernel.


Last, but not Least
-------------------

Not every problem needs a new ioctl:

 * Think hard whether you really want a driver-private interface. Of course
   it's much quicker to push a driver-private interface than engaging in
   lengthy discussions for a more generic solution. And occasionally doing a
   private interface to spearhead a new concept is what's required. But in the
   end, once the generic interface comes around you'll end up maintaining two
   interfaces. Indefinitely.

 * Consider other interfaces than ioctls. A sysfs attribute is much better for
   per-device settings, or for child objects with fairly static lifetimes (like
   output connectors in drm with all the detection override attributes). Or
   maybe only your testsuite needs this interface, and then debugfs with its
   disclaimer of not having a stable ABI would be better.

Finally, the name of the game is to get it right on the first attempt, since if
your driver proves popular and your hardware platforms long-lived then you'll
be stuck with a given ioctl essentially forever. You can try to deprecate
horrible ioctls on newer iterations of your hardware, but generally it takes
years to accomplish this. And then again years until the last user able to
complain about regressions disappears, too.
222complain about regressions disappears, too.