Demonstrations of oomkill, the Linux eBPF/bcc version.


oomkill is a simple program that traces the Linux out-of-memory (OOM) killer
and shows basic details on one line per OOM kill:

# ./oomkill
Tracing oom_kill_process()... Ctrl-C to end.
21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
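
The first line shows that PID 22516, with process name "perl", was OOM killed.
This OOM kill happened to be triggered by PID 3297, process name "ntpd", doing
some memory allocation. The "3850642 pages" field (usually 4 Kbytes per page,
so about 14.7 Gbytes) is the total memory the OOM killer was accounting against
(RAM plus swap, or a cgroup limit) rather than the size of the victim, which is
why it is identical on both lines.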
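
As a sketch of reading this output programmatically, the following Python
parses one oomkill line in the format shown above and converts the page count
to Mbytes. The regular expression is written only against the two sample lines
here, the 4 Kbyte page size is an assumption (check "getconf PAGESIZE" on the
target system), and parse_oomkill_line is a hypothetical helper, not part of
bcc.

```python
import re

# Hypothetical helper, not part of bcc: matches the sample oomkill lines above.
OOMKILL_RE = re.compile(
    r'^(?P<time>\d{2}:\d{2}:\d{2}) '
    r'Triggered by PID (?P<trigger_pid>\d+) \("(?P<trigger_comm>[^"]*)"\), '
    r'OOM kill of PID (?P<killed_pid>\d+) \("(?P<killed_comm>[^"]*)"\), '
    r'(?P<pages>\d+) pages, loadavg: (?P<loadavg>.+)$'
)

PAGE_SIZE = 4096  # assumption: 4 Kbyte pages


def parse_oomkill_line(line, page_size=PAGE_SIZE):
    """Split one oomkill output line into fields and add an Mbyte figure."""
    m = OOMKILL_RE.match(line.strip())
    if m is None:
        raise ValueError("not an oomkill line: %r" % line)
    fields = m.groupdict()
    # PIDs and the page count are numeric; names and loadavg stay as strings.
    for key in ("trigger_pid", "killed_pid", "pages"):
        fields[key] = int(fields[key])
    fields["mbytes"] = fields["pages"] * page_size / (1024 * 1024)
    return fields


if __name__ == "__main__":
    sample = ('21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 '
              '("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724')
    f = parse_oomkill_line(sample)
    print(f["killed_pid"], f["killed_comm"], round(f["mbytes"], 1))
```

Running it prints "22516 perl 15041.6". Keeping the trigger and victim PIDs as
separate fields makes cases like the second sample line, where perl (PID 22517)
triggered its own OOM kill, easy to filter for.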

The system log (dmesg) shows pages of details and system context about an OOM
kill. What it currently lacks, however, is context on how the system had been
changing over time. I've seen OOM kills where I wanted to know whether the
system was at steady state at the time, or whether a recent increase in
workload had triggered the OOM event. oomkill provides some context: the end of
each line shows the contents of /proc/loadavg, namely the 1-, 5-, and 15-minute
load averages, the number of runnable and total scheduling entities, and the
most recently created PID. For both of the OOM kills here, we can see that the
system was getting busier at the time (a 1-minute "average" of 0.99, compared
to a 15-minute "average" of 0.30).
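
The /proc/loadavg suffix can be split the same way. This Python sketch follows
the field layout documented in proc(5); parse_loadavg and getting_busier are
hypothetical helpers, and the 1.5x ratio in getting_busier is an illustrative
threshold, not something oomkill computes.

```python
# Field layout follows proc(5) for /proc/loadavg:
#   load1 load5 load15 runnable/total last_pid
def parse_loadavg(text):
    load1, load5, load15, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return {
        "load1": float(load1),
        "load5": float(load5),
        "load15": float(load15),
        "runnable": int(runnable),
        "total": int(total),
        "last_pid": int(last_pid),
    }


def getting_busier(loadavg, ratio=1.5):
    # Hypothetical heuristic: the 1-minute figure is well above the
    # 15-minute figure, suggesting load increased recently.
    return loadavg["load1"] > ratio * loadavg["load15"]


if __name__ == "__main__":
    for text in ("0.99 0.39 0.30 3/282 22724", "0.99 0.41 0.30 2/282 22932"):
        la = parse_loadavg(text)
        print(text, "->", "busier" if getting_busier(la) else "steady")
```

Both sample lines are classified as "busier". Load averages are exponentially
damped moving averages, so a comparison like this only hints at a trend; it
does not show what triggered the OOM kill.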

oomkill can also be the basis of other tools and customizations. For example,
you can edit it to include other task_struct details from the target PID at
the time of the OOM kill.


The following commands can be used to test this program by invoking a
memory-consuming process that exhausts system memory and is OOM killed:

sysctl -w vm.overcommit_memory=1 # always overcommit
perl -e 'while (1) { $a .= "A" x 1024; }' # eat all memory

WARNING: This exhausts system memory after disabling some overcommit checks.
Only test in a lab environment, and restore the previous vm.overcommit_memory
value afterwards (the kernel default is 0).
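
Since the test changes vm.overcommit_memory with sysctl, it helps to note the
current mode before starting. A minimal Python sketch, assuming the standard
/proc/sys/vm/overcommit_memory file and the mode meanings from the kernel's
overcommit-accounting documentation; describe_overcommit and read_overcommit
are hypothetical helper names.

```python
import os

OVERCOMMIT_PATH = "/proc/sys/vm/overcommit_memory"

# Mode meanings from the kernel's overcommit-accounting documentation.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "don't overcommit (strict accounting)",
}


def describe_overcommit(value):
    # Accepts an int or the raw file contents, e.g. "0\n".
    mode = int(str(value).strip())
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)


def read_overcommit(path=OVERCOMMIT_PATH):
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    if os.path.exists(OVERCOMMIT_PATH):
        saved = read_overcommit()
        print("vm.overcommit_memory = %d (%s)" % (saved, describe_overcommit(saved)))
        # Print the command that puts the saved mode back after testing.
        print("restore with: sysctl -w vm.overcommit_memory=%d" % saved)
    else:
        print("no %s; not a Linux system?" % OVERCOMMIT_PATH)
```

Reading the file needs no special privileges; the sysctl -w command that
restores the mode must be run as root.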