Add lockstat tool

Adding lockstat tool to trace kernel mutex lock events and
display locks statistics and displays following data:

                                  Caller   Avg Spin  Count   Max spin Total spin
                      psi_avgs_work+0x2e       3675      5       5468      18379
                     flush_to_ldisc+0x22       2833      2       4210       5667
                       n_tty_write+0x30c       3914      1       3914       3914
                               isig+0x5d       2390      1       2390       2390
                   tty_buffer_flush+0x2a       1604      1       1604       1604
                      commit_echoes+0x22       1400      1       1400       1400
          n_tty_receive_buf_common+0x3b9       1399      1       1399       1399

                                  Caller   Avg Hold  Count   Max hold Total hold
                     flush_to_ldisc+0x22      42558      2      76135      85116
                      psi_avgs_work+0x2e      14821      5      20446      74106
          n_tty_receive_buf_common+0x3b9      12300      1      12300      12300
                       n_tty_write+0x30c      10712      1      10712      10712
                               isig+0x5d       3362      1       3362       3362
                   tty_buffer_flush+0x2a       3078      1       3078       3078
                      commit_echoes+0x22       3017      1       3017       3017

Every caller to using kernel's mutex is displayed on every line.

First portion of lines show the lock acquiring data, showing the amount
of time it took to acquired given lock.

  'Caller'       - symbol acquiring the mutex
  'Average Spin' - average time to acquire the mutex
  'Count'        - number of times mutex was acquired
  'Max spin'     - maximum time to acquire the mutex
  'Total spin'   - total time spent in acquiring the mutex

Second portion of lines show the lock holding data, showing the amount
of time it took to hold given lock.

  'Caller'       - symbol holding the mutex
  'Average Hold' - average time mutex was held
  'Count'        - number of times mutex was held
  'Max hold'     - maximum time mutex was held
  'Total hold'   - total time spent in holding the mutex

This works by tracing mutex_lock/unlock kprobes, udating the lock stats
in maps and processing them in the python part.

Examples:
    lockstats                           # trace system wide
    lockstats -d 5                      # trace for 5 seconds only
    lockstats -i 5                      # display stats every 5 seconds
    lockstats -p 123                    # trace locks for PID 123
    lockstats -t 321                    # trace locks for PID 321
    lockstats -c pipe_                  # display stats only for lock callers with 'pipe_' substring
    lockstats -S acq_count              # sort lock acquired results on acquired count
    lockstats -S hld_total              # sort lock held results on total held time
    lockstats -S acq_count,hld_total    # combination of above
    lockstats -n 3                      # display 3 locks
    lockstats -s 3                      # display 3 levels of stack

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
diff --git a/tools/offwaketime.py b/tools/offwaketime.py
index b46e9e1..4809a3b 100755
--- a/tools/offwaketime.py
+++ b/tools/offwaketime.py
@@ -140,7 +140,7 @@
 // of the Process who wakes it
 BPF_HASH(wokeby, u32, struct wokeby_t);
 
-BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE);
+BPF_STACK_TRACE(stack_traces, 2);
 
 int waker(struct pt_regs *ctx, struct task_struct *p) {
     // PID and TGID of the target Process to be waken
@@ -321,7 +321,7 @@
         line = [k.target.decode('utf-8', 'replace')]
         if not args.kernel_stacks_only:
             if stack_id_err(k.t_u_stack_id):
-                line.append("[Missed User Stack]")
+                line.append("[Missed User Stack] %d" % k.t_u_stack_id)
             else:
                 line.extend([b.sym(addr, k.t_tgid).decode('utf-8', 'replace')
                     for addr in reversed(list(target_user_stack)[1:])])
@@ -353,7 +353,7 @@
         print("    %-16s %s %s" % ("waker:", k.waker.decode('utf-8', 'replace'), k.w_pid))
         if not args.kernel_stacks_only:
             if stack_id_err(k.w_u_stack_id):
-                print("    [Missed User Stack]")
+                print("    [Missed User Stack] %d" % k.w_u_stack_id)
             else:
                 for addr in waker_user_stack:
                     print("    %s" % b.sym(addr, k.w_tgid))