The Linux Watchdog driver API.

Copyright 2002 Christer Weingel <wingel@nano-system.com>

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard <jakob@ostenfeld.dk>

This document describes the state of the Linux 2.4.18 kernel.

Introduction:

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault. You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via the
/dev/watchdog special device file that userspace is still alive, at
regular intervals. When such a notification occurs, the driver will
usually tell the hardware watchdog that everything is in order, and
that the watchdog should wait for yet another little while to reset
the system. If userspace fails (RAM error, kernel bug, whatever), the
notifications cease to occur, and the hardware watchdog will reset the
system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API:

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time; this time is called the
timeout or margin. The simplest way to ping the watchdog is to write
some data to the device. So a very simple watchdog daemon would look
like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, const char *argv[]) {
    /* Opening the device starts the watchdog. */
    int fd = open("/dev/watchdog", O_WRONLY);
    if (fd == -1) {
        perror("watchdog");
        exit(1);
    }
    while (1) {
        /* Any write counts as a ping. */
        write(fd, "\0", 1);
        sleep(10);
    }
}

A more advanced daemon could, for example, check that an HTTP server is
still responding before doing the write call to ping the watchdog.

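As a rough sketch of that idea, the ping can be made conditional on the
result of a health check. The function name ping_if_healthy() and its
"healthy" argument are invented for this illustration; how the daemon
decides that the service is healthy (for instance by sending an HTTP
request to a local server) is left to the caller.

```c
#include <unistd.h>

/*
 * Ping the watchdog only if the caller considers the system healthy.
 * Returns 1 if a ping was written, 0 if the ping was skipped (so the
 * timeout keeps running), and -1 if the write failed.
 * Hypothetical helper, not part of any kernel or libc API.
 */
int ping_if_healthy(int fd, int healthy)
{
    if (!healthy)
        return 0;
    if (write(fd, "\0", 1) != 1)
        return -1;
    return 1;
}
```

If the check keeps failing for longer than the timeout, no ping reaches
the driver and the hardware resets the system, which is the intended
behaviour.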
When the device is closed, the watchdog is disabled. This is not
always such a good idea, since if there is a bug in the watchdog
daemon and it crashes, the system will not reboot. Because of this,
some of the drivers support the configuration option "Disable watchdog
shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
compiling the kernel, there is no way of disabling the watchdog once
it has been started. So, if the watchdog daemon crashes, the system
will reboot after the timeout has passed.

Some other drivers will not disable the watchdog unless a specific
magic character 'V' has been sent to /dev/watchdog just before closing
the file. If the userspace daemon closes the file without sending
this special character, the driver will assume that the daemon (and
userspace in general) died, and will stop pinging the watchdog without
disabling it first. This will then cause a reboot.

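For drivers with the magic character handling, an orderly shutdown of
the daemon could look roughly like the sketch below. The helper name
watchdog_magic_close() is made up for this example; only the 'V' write
followed by close() comes from the description above.

```c
#include <unistd.h>

/*
 * Orderly shutdown for drivers with magic character handling: write
 * 'V' immediately before close() so that the driver disables the
 * watchdog instead of letting it fire.
 * Returns 0 on success, -1 if the write or the close failed.
 * Hypothetical helper name, used only in this sketch.
 */
int watchdog_magic_close(int fd)
{
    int ret = 0;

    if (write(fd, "V", 1) != 1)
        ret = -1;
    if (close(fd) == -1)
        ret = -1;
    return ret;
}
```

On a kernel built with CONFIG_WATCHDOG_NOWAYOUT the watchdog cannot be
disabled once started, so the system will still reboot after the
timeout.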
The ioctl API:

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE. This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the above program could be
replaced with:

    while (1) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);
        sleep(10);
    }

The argument to the ioctl is ignored.

Setting and getting the timeout:

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
flag set in their option field. The argument is an integer
representing the timeout in seconds. The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware.

    int timeout = 45;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    printf("The timeout was set to %d seconds\n", timeout);

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl.

    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
    printf("The timeout is %d seconds\n", timeout);

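Since not every driver supports SETTIMEOUT, a daemon may want to check
the result of the ioctl before trusting the value. The following is
only a sketch; the wrapper name watchdog_set_timeout() is an assumption
of this example, and it relies on the <linux/watchdog.h> header being
installed.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Request a timeout and report the value the driver actually chose.
 * Returns the effective timeout in seconds, or -1 if the ioctl failed
 * (for example because the driver does not support SETTIMEOUT).
 * Hypothetical wrapper, used only in this sketch.
 */
int watchdog_set_timeout(int fd, int seconds)
{
    int timeout = seconds;

    if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == -1)
        return -1;
    return timeout;
}
```

A caller that asked for 45 seconds should compare the return value
with its request and adjust its ping interval if the driver rounded
the timeout.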
Pretimeouts:

Some watchdog timers can be set to have a trigger go off before the
actual time they will reset the system. This can be done with an NMI,
interrupt, or other mechanism. This allows Linux to record useful
information (like panic information and kernel coredumps) before it
resets.

    int pretimeout = 10;
    ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout);

Note that the pretimeout is the number of seconds before the time
when the timeout will go off. It is not the number of seconds until
the pretimeout. So, for instance, if you set the timeout to 60 seconds
and the pretimeout to 10 seconds, the pretimeout will go off in 50
seconds. Setting a pretimeout to zero disables it.

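The arithmetic above can be written down as a small helper. The name
pretimeout_fires_after() is invented here, and rejecting a pretimeout
that does not fit inside the timeout is an assumption of this sketch;
actual drivers may treat such values differently.

```c
/*
 * Seconds after the last ping at which the pretimeout fires.
 * Returns -1 if the pretimeout is disabled (zero) or does not fit
 * inside the timeout. The second check is an assumption of this
 * sketch, not documented driver behaviour.
 */
int pretimeout_fires_after(int timeout, int pretimeout)
{
    if (pretimeout <= 0 || pretimeout >= timeout)
        return -1;
    return timeout - pretimeout;
}
```

With a timeout of 60 and a pretimeout of 10 the helper returns 50,
matching the example in the text.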
There is also a get function for getting the pretimeout:

    ioctl(fd, WDIOC_GETPRETIMEOUT, &pretimeout);
    printf("The pretimeout is %d seconds\n", pretimeout);

Not all watchdog drivers will support a pretimeout.

Environmental monitoring:

All watchdog drivers are required to return more information about the
system; some do temperature, fan and power level monitoring, some can
tell you the reason for the last reboot of the system. The GETSUPPORT
ioctl is available to ask what the device can do:

    struct watchdog_info ident;
    ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

    identity            a string identifying the watchdog driver
    firmware_version    the firmware version of the card if available
    options             flags describing what the device supports

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return. [FIXME -- Is this correct?]

    WDIOF_OVERHEAT          Reset due to CPU overheat

The machine was last rebooted by the watchdog because the thermal limit was
exceeded.

    WDIOF_FANFAULT          Fan failed

A system fan monitored by the watchdog card has failed.

    WDIOF_EXTERN1           External relay 1

External monitoring relay/source 1 was triggered. Controllers intended for
real world applications include external monitoring pins that will trigger
a reset.

    WDIOF_EXTERN2           External relay 2

External monitoring relay/source 2 was triggered.

    WDIOF_POWERUNDER        Power bad/power fault

The machine is showing an undervoltage status.

    WDIOF_CARDRESET         Card previously reset the CPU

The last reboot was caused by the watchdog card.

    WDIOF_POWEROVER         Power over voltage

The machine is showing an overvoltage status. Note that if one level is
under and one over, both bits will be set - this may seem odd but makes
sense.

    WDIOF_KEEPALIVEPING     Keep alive ping reply

The watchdog saw a keepalive ping since it was last queried.

    WDIOF_SETTIMEOUT        Can set/get the timeout

    WDIOF_PRETIMEOUT        Pretimeout (in seconds), get/set

The watchdog can do pretimeouts.

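To show how the options bits might be inspected after a GETSUPPORT
call, here is a small table-driven sketch. The function name
watchdog_print_options() and the table are made up for this example;
the WDIOF_* constants are taken from <linux/watchdog.h>, which is
assumed to be installed.

```c
#include <stdio.h>
#include <linux/watchdog.h>

/* Names for the option bits listed above. */
static const struct {
    unsigned int bit;
    const char *name;
} wd_flag_names[] = {
    { WDIOF_OVERHEAT,      "WDIOF_OVERHEAT" },
    { WDIOF_FANFAULT,      "WDIOF_FANFAULT" },
    { WDIOF_EXTERN1,       "WDIOF_EXTERN1" },
    { WDIOF_EXTERN2,       "WDIOF_EXTERN2" },
    { WDIOF_POWERUNDER,    "WDIOF_POWERUNDER" },
    { WDIOF_CARDRESET,     "WDIOF_CARDRESET" },
    { WDIOF_POWEROVER,     "WDIOF_POWEROVER" },
    { WDIOF_KEEPALIVEPING, "WDIOF_KEEPALIVEPING" },
    { WDIOF_SETTIMEOUT,    "WDIOF_SETTIMEOUT" },
    { WDIOF_PRETIMEOUT,    "WDIOF_PRETIMEOUT" },
};

/*
 * Print the name of every listed bit that is set in 'options'.
 * Returns the number of names printed.
 * Hypothetical helper, used only in this sketch.
 */
int watchdog_print_options(FILE *out, unsigned int options)
{
    size_t i;
    int count = 0;

    for (i = 0; i < sizeof(wd_flag_names) / sizeof(wd_flag_names[0]); i++) {
        if (options & wd_flag_names[i].bit) {
            fprintf(out, "%s\n", wd_flag_names[i].name);
            count++;
        }
    }
    return count;
}
```

A caller would typically pass ident.options from the GETSUPPORT
example, e.g. watchdog_print_options(stdout, ident.options).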
For those drivers that return any bits set in the option field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively.

    int flags;
    ioctl(fd, WDIOC_GETSTATUS, &flags);

    or

    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.

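Because support for these calls varies, the return value of the ioctl
should be checked before the flags are used. A minimal sketch, with an
invented helper name and assuming <linux/watchdog.h> is available:

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Ask whether the watchdog caused the last reboot.
 * Returns 1 if WDIOF_CARDRESET is set in the boot status, 0 if it is
 * not, and -1 if the ioctl failed (for example because the driver
 * does not support GETBOOTSTATUS).
 * Hypothetical helper, used only in this sketch.
 */
int watchdog_caused_last_reboot(int fd)
{
    int flags = 0;

    if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) == -1)
        return -1;
    return (flags & WDIOF_CARDRESET) ? 1 : 0;
}
```

Note that some drivers listed further down return values from these
calls that are not WDIOF_* bits at all, so the result is only
meaningful for drivers that follow the flag convention.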
Some drivers can measure the temperature using the GETTEMP ioctl. The
returned value is the temperature in degrees Fahrenheit.

    int temperature;
    ioctl(fd, WDIOC_GETTEMP, &temperature);

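If the daemon prefers degrees Celsius, the reading can be converted
with the usual formula. The helper name below is only for
illustration.

```c
/* Convert a GETTEMP reading (degrees Fahrenheit) to degrees Celsius. */
double fahrenheit_to_celsius(int fahrenheit)
{
    return (fahrenheit - 32) * 5.0 / 9.0;
}
```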
Finally the SETOPTIONS ioctl can be used to control some aspects of
the card's operation; right now the pcwd driver is the only one
supporting this ioctl.

    int options = 0;
    ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

    WDIOS_DISABLECARD       Turn off the watchdog timer
    WDIOS_ENABLECARD        Turn on the watchdog timer
    WDIOS_TEMPPANIC         Kernel panic on temperature trip

[FIXME -- better explanations]

Implementations in the current drivers in the kernel tree:

Here I have tried to summarize what the different drivers support and
where they do strange things compared to the other drivers.

acquirewdt.c -- Acquire Single Board Computer

    This driver has a hardcoded timeout of 1 minute.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if
    the device is open, 0 if not. [FIXME -- isn't this rather
    silly? To be able to use the ioctl, the device must be open
    and so GETSTATUS will always return 1].

advantechwdt.c -- Advantech Single Board Computer

    Timeout that defaults to 60 seconds, supports SETTIMEOUT.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
    The GETSTATUS call returns whether the device is open or not.
    [FIXME -- silliness again?]

booke_wdt.c -- PowerPC BookE Watchdog Timer

    Timeout default varies according to frequency, supports
    SETTIMEOUT

    Watchdog cannot be turned off, so CONFIG_WATCHDOG_NOWAYOUT
    does not make sense

    GETSUPPORT returns the watchdog_info struct, and
    GETSTATUS returns the supported options. GETBOOTSTATUS
    returns a 1 if the last reset was caused by the
    watchdog and a 0 otherwise. This watchdog cannot be
    disabled once it has been started. The wdt_period kernel
    parameter selects which bit of the time base changing
    from 0->1 will trigger the watchdog exception. Changing
    the timeout from the ioctl calls will change the
    wdt_period as defined above. Finally if you would like to
    replace the default Watchdog Handler you can implement the
    WatchdogHandler() function in your own code.

eurotechwdt.c -- Eurotech CPU-1220/1410

    The timeout can be set using the SETTIMEOUT ioctl and defaults
    to 60 seconds.

    Also has a module parameter "ev" (event type) which controls
    what should happen on a timeout: the string "int", or anything
    else, which causes a reboot. [FIXME -- better description]

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but
    GETSTATUS is not supported and GETBOOTSTATUS just returns 0.

i810-tco.c -- Intel 810 chipset

    Also has support for a lot of other i8x0 stuff, but the
    watchdog is one of the things.

    The timeout is set using the module parameter "i810_margin",
    which is in steps of 0.6 seconds where 2<i810_margin<64. The
    driver supports the SETTIMEOUT ioctl.

    Supports CONFIG_WATCHDOG_NOWAYOUT.

    GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
    returns some kind of timer value which is not compatible with
    the other drivers. GETBOOTSTATUS returns some kind of
    hardware specific boot status. [FIXME -- describe this]

ib700wdt.c -- IB700 Single Board Computer

    Default timeout of 30 seconds and the timeout is settable
    using the SETTIMEOUT ioctl. Note that only a few timeout
    values are supported.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
    The GETSTATUS call returns whether the device is open or not.
    [FIXME -- silliness again?]

machzwd.c -- MachZ ZF-Logic

    Hardcoded timeout of 10 seconds

    Has a module parameter "action" that controls what happens
    when the timeout runs out which can be 0 = RESET (default),
    1 = SMI, 2 = NMI, 3 = SCI.

    Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
    'V' close handling.

    GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
    returns whether the device is open or not. [FIXME -- silliness
    again?]

mixcomwd.c -- MixCom Watchdog

    [FIXME -- I'm unable to tell what the timeout is]

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns whether
    the device is opened or not [FIXME -- I'm not really sure how
    this works, there seems to be some magic connected to
    CONFIG_WATCHDOG_NOWAYOUT]

pcwd.c -- Berkshire PC Watchdog

    Hardcoded timeout of 1.5 seconds

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
    GETSTATUS and GETBOOTSTATUS return something useful.

    The SETOPTIONS call can be used to enable and disable the card
    and to ask the driver to call panic if the system overheats.

sbc60xxwdt.c -- 60xx Single Board Computer

    Hardcoded timeout of 10 seconds

    Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
    character 'V' close handling.

    No bits set in GETSUPPORT

scx200.c -- National SCx200 CPUs

    Not in the kernel yet.

    The timeout is set using a module parameter "margin" which
    defaults to 60 seconds. The timeout can also be set using
    SETTIMEOUT and read using GETTIMEOUT.

    Supports a module parameter "nowayout" that is initialized
    with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
    magic character 'V' handling.

shwdt.c -- SuperH 3/4 processors

    [FIXME -- I'm unable to tell what the timeout is]

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
    returns whether the device is open or not. [FIXME -- silliness
    again?]

softdog.c -- Software watchdog

    The timeout is set with the module parameter "soft_margin"
    which defaults to 60 seconds; the timeout is also settable
    using the SETTIMEOUT ioctl.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    WDIOF_SETTIMEOUT bit set in GETSUPPORT

w83877f_wdt.c -- W83877F Computer

    Hardcoded timeout of 30 seconds

    Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
    character 'V' close handling.

    No bits set in GETSUPPORT

w83627hf_wdt.c -- w83627hf watchdog

    Timeout that defaults to 60 seconds, supports SETTIMEOUT.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
    The GETSTATUS call returns whether the device is open or not.

wdt.c -- ICS WDT500/501 ISA and
wdt_pci.c -- ICS WDT500/501 PCI

    Default timeout of 60 seconds. The timeout is also settable
    using the SETTIMEOUT ioctl.

    Supports CONFIG_WATCHDOG_NOWAYOUT

    GETSUPPORT returns with bits set depending on the actual
    card. The WDT501 supports a lot of external monitoring, the
    WDT500 much less.

wdt285.c -- Footbridge watchdog

    The timeout is set with the module parameter "soft_margin"
    which defaults to 60 seconds. The timeout is also settable
    using the SETTIMEOUT ioctl.

    Does not support CONFIG_WATCHDOG_NOWAYOUT

    WDIOF_SETTIMEOUT bit set in GETSUPPORT

wdt977.c -- Netwinder W83977AF chip

    Hardcoded timeout of 3 minutes

    Supports CONFIG_WATCHDOG_NOWAYOUT

    Does not support any ioctls at all.