| Demonstrations of nfsdist, the Linux eBPF/bcc version. |
| |
| nfsdist traces NFS reads, writes, opens, and getattr, and summarizes their |
| latency as a power-of-2 histogram. For example: |
| |
| |
| ./nfsdist.py |
| |
| Tracing NFS operation latency... Hit Ctrl-C to end. |
| |
| operation = read |
| usecs : count distribution |
| 0 -> 1 : 4 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 7107 |************** | |
| 16 -> 31 : 19864 |****************************************| |
| 32 -> 63 : 1494 |*** | |
| 64 -> 127 : 491 | | |
| 128 -> 255 : 1810 |*** | |
| 256 -> 511 : 6356 |************ | |
| 512 -> 1023 : 4860 |********* | |
| 1024 -> 2047 : 3070 |****** | |
| 2048 -> 4095 : 1853 |*** | |
| 4096 -> 8191 : 921 |* | |
| 8192 -> 16383 : 122 | | |
| 16384 -> 32767 : 15 | | |
| 32768 -> 65535 : 5 | | |
| 65536 -> 131071 : 2 | | |
| 131072 -> 262143 : 1 | | |
| |
| operation = write |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 1 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 9 | | |
| 64 -> 127 : 19491 |****************************************| |
| 128 -> 255 : 3064 |****** | |
| 256 -> 511 : 940 |* | |
| 512 -> 1023 : 365 | | |
| 1024 -> 2047 : 312 | | |
| 2048 -> 4095 : 119 | | |
| 4096 -> 8191 : 31 | | |
| 8192 -> 16383 : 84 | | |
| 16384 -> 32767 : 31 | | |
| 32768 -> 65535 : 5 | | |
| 65536 -> 131071 : 3 | | |
| 131072 -> 262143 : 0 | | |
| 262144 -> 524287 : 1 | | |
| |
| operation = getattr |
| usecs : count distribution |
| 0 -> 1 : 27 |****************************************| |
| 2 -> 3 : 2 |** | |
| 4 -> 7 : 3 |**** | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 0 | | |
| 64 -> 127 : 0 | | |
| 128 -> 255 : 0 | | |
| 256 -> 511 : 2 |** | |
| 512 -> 1023 : 2 |** | |
| |
| operation = open |
| usecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 0 | | |
| 4 -> 7 : 0 | | |
| 8 -> 15 : 0 | | |
| 16 -> 31 : 0 | | |
| 32 -> 63 : 0 | | |
| 64 -> 127 : 0 | | |
| 128 -> 255 : 0 | | |
| 256 -> 511 : 2 |****************************************| |
| |
| |
| In this example you can see that the read traffic is rather bi-modal, with about |
| 26K reads falling within 8 - 30 usecs and about 18K reads spread between 128 - |
| 8191 usecs. Write traffic is largely clustered in the 64 - 127 usecs bracket. |
| The faster read traffic is probably coming from a filesystem cache and the slower |
| traffic from disk. The reason why the writes are so consistently fast is because |
| this example test was run on a couple of VM's and I believe the hypervisor was |
| caching all the write traffic to memory. |
| |
| This "latency" is measured from when the operation was issued from the VFS |
| interface to the file system, to when it completed. This spans everything: |
| RPC latency, network latency, file system CPU cycles, file system locks, run |
| queue latency, etc. This is a better measure of the latency suffered by |
| applications reading from a NFS share and can better expose problems |
| experienced by NFS clients. |
| |
| Note that this only traces the common NFS operations (read, write, open and |
| getattr). I chose to include getattr as a significant percentage of NFS |
| traffic end up being getattr calls and are a good indicator of problems |
| with an NFS server. |
| |
| An optional interval and a count can be provided, as well as -m to show the |
| distributions in milliseconds. For example: |
| |
| ./nfsdist -m 1 5 |
| Tracing NFS operation latency... Hit Ctrl-C to end. |
| |
| 11:02:39: |
| |
| operation = write |
| msecs : count distribution |
| 0 -> 1 : 1 | | |
| 2 -> 3 : 24 |******** | |
| 4 -> 7 : 114 |****************************************| |
| 8 -> 15 : 9 |*** | |
| 16 -> 31 : 1 | | |
| 32 -> 63 : 1 | | |
| |
| 11:02:40: |
| |
| operation = write |
| msecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 11 |*** | |
| 4 -> 7 : 111 |****************************************| |
| 8 -> 15 : 13 |**** | |
| 16 -> 31 : 1 | | |
| |
| 11:02:41: |
| |
| operation = write |
| msecs : count distribution |
| 0 -> 1 : 0 | | |
| 2 -> 3 : 21 |****** | |
| 4 -> 7 : 137 |****************************************| |
| 8 -> 15 : 3 | | |
| |
| This shows a write workload, with writes hovering primarily in the 4-7ms range. |
| |
| USAGE message: |
| |
| |
| ./nfsdist -h |
| usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count] |
| |
| Summarize NFS operation latency |
| |
| positional arguments: |
| interval output interval, in seconds |
| count number of outputs |
| |
| optional arguments: |
| -h, --help show this help message and exit |
| -T, --notimestamp don't include timestamp on interval output |
| -m, --milliseconds output in milliseconds |
| -p PID, --pid PID trace this PID only |
| |
| examples: |
| ./nfsdist # show operation latency as a histogram |
| ./nfsdist -p 181 # trace PID 181 only |
| ./nfsdist 1 10 # print 1 second summaries, 10 times |
| ./nfsdist -m 5 # 5s summaries, milliseconds |