HOWTO - using the library with perf {#howto_perf}

@brief This HOWTO explains how to use a perf build that has openCSD integrated.

March 3rd 2016

This HOWTO explains how to use the perf command-line tools and the openCSD library to collect and extract program flow traces generated by the CoreSight IP block on a Linux system. The examples have been generated using an aarch64 Juno-r1 platform. All information is considered accurate and tested using library branch opencsd-bkk16 (decode library only) and perf update branch perf-opencsd-4.5-rc6-bkk16 (decode library + perf tools) on the OpenCSD GitHub repository.

On Target Trace Acquisition

The enhancements to the perf tools that support the new cs_etm PMU have not been upstreamed yet. To get the required functionality, branch perf-opencsd-4.5-rc6-bkk16 needs to be downloaded to the target system where traces are to be collected. This branch is an upstream v4.5-rc6 kernel supplemented with modifications to the CoreSight framework and drivers that make them usable by the perf core. Some of those patches have been queued for merging in the 4.6 cycle, others have been submitted, and some have yet to be posted for review. The process is being done incrementally.

From there compiling the perf tools with make -C tools/perf will yield a perf executable that will support CoreSight trace collection. Note that if traces are to be decompressed off target, there is no need to download and compile the openCSD library (on the target).

Before launching a trace run, a sink that will collect trace data needs to be identified. All CoreSight blocks identified by the framework are registered in sysfs:

linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/
20010000.etf   20040000.main_funnel  22040000.etm 22140000.etm  
230c0000.A53_funnel  23240000.etm  replicator@20020000 20030000.tpiu
20070000.etr 220c0000.A57_funnel  23040000.etm  23140000.etm 23340000.etm

CoreSight blocks are listed in the device tree for a specific system and discovered at boot time. Since tracers can be linked to more than one sink, the sink that will receive trace data needs to be identified manually. In this example the ETR block is selected:

root@linaro-nano:~# echo 1 > /sys/bus/coresight/devices/20070000.etr/enable_sink
  • Note that selecting a trace sink manually prior to launching a trace run is only a temporary measure. Work is currently underway to specify sink selection in the "perf record" command.
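
The manual sink selection above can be wrapped in a pair of small helpers. This is a minimal sketch, assuming that every sink-capable device exposes an enable_sink attribute in its sysfs directory (as the ETR does in the example) and that writing 0 to that attribute deselects the sink. The function names list_sinks and select_sink and the overridable CS_DEVICES variable are illustrative only and are not part of perf or openCSD.

```shell
#!/bin/sh
# Directory where the CoreSight framework registers its devices.
# Overridable so the helpers can be tried against a mock directory tree.
CS_DEVICES="${CS_DEVICES:-/sys/bus/coresight/devices}"

# Print the name of every device that exposes an enable_sink attribute.
list_sinks() {
    for dev in "$CS_DEVICES"/*; do
        [ -e "$dev/enable_sink" ] && basename "$dev"
    done
    return 0
}

# Deselect every sink, then select the one named in $1.
select_sink() {
    [ -e "$CS_DEVICES/$1/enable_sink" ] || {
        echo "no such sink: $1" >&2
        return 1
    }
    for name in $(list_sinks); do
        echo 0 > "$CS_DEVICES/$name/enable_sink"
    done
    echo 1 > "$CS_DEVICES/$1/enable_sink"
}
```

On the target, and as root, `select_sink 20070000.etr` would then have the same effect as the echo command shown above, while also making sure no other sink is left selected.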

Once a sink has been identified, trace collection can start. An easy and yet interesting example is the uname command:

linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm// --per-thread uname

This will generate a perf.data file where execution has been traced for both user and kernel space. To narrow the field to either user or kernel space the u and k options can be specified. For example the following will limit traces to user space:

linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm//u --per-thread uname
Problems setting modules path maps, continuing anyway...
-----------------------------------------------------------
perf_event_attr:
  type                             8
  size                             112
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|IDENTIFIER
  read_format                      ID
  disabled                         1
  exclude_kernel                   1
  exclude_hv                       1
  enable_on_exec                   1
  sample_id_all                    1
------------------------------------------------------------
sys_perf_event_open: pid 11375  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  config                           0x9
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|IDENTIFIER
  read_format                      ID
  disabled                         1
  exclude_kernel                   1
  exclude_hv                       1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
------------------------------------------------------------
sys_perf_event_open: pid 11375  cpu -1  group_fd -1  flags 0x8
mmap size 266240B
AUX area mmap length 131072
perf event ring buffer mmapped per thread
Synthesizing auxtrace information
Linux
auxtrace idx 0 old 0 head 0x11ea0 diff 0x11ea0
[ perf record: Woken up 1 times to write data ]
overlapping maps:
 7f99daf000-7f99db0000 0 [vdso]
 7f99d84000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
 7f99d84000-7f99daf000 0 /lib/aarch64-linux-gnu/ld-2.21.so
 7f99db0000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
failed to write feature 8
failed to write feature 9
failed to write feature 14
[ perf record: Captured and wrote 0.072 MB perf.data ]

linaro@linaro-nano:~/kernel$ ls -l ~/.debug/ perf.data
-rw------- 1 linaro linaro 77888 Mar  2 20:41 perf.data

/home/linaro/.debug/:
total 16
drwxr-xr-x 2 linaro linaro 4096 Mar  2 20:40 [kernel.kallsyms]
drwxr-xr-x 2 linaro linaro 4096 Mar  2 20:40 [vdso]
drwxr-xr-x 3 linaro linaro 4096 Mar  2 20:40 bin
drwxr-xr-x 3 linaro linaro 4096 Mar  2 20:40 lib

On Target Trace Collection

The entire program flow will have been recorded in the perf.data file. Information about libraries and executables is stored under $HOME/.debug. All this information needs to be collected in order to successfully decode traces off target:

linaro@linaro-nano:~/kernel$ tar czf uname.trace.tgz perf.data ~/.debug

Note that the vmlinux file should also be added to the bundle if kernel traces have been collected.
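
The bundling step, including the optional vmlinux, could be scripted along the following lines. This is only a sketch: the function name make_trace_bundle and its argument order are hypothetical. Unlike the tar command above, it stores perf.data, .debug and vmlinux at the top level of the archive rather than under their full paths.

```shell
#!/bin/sh
# Bundle perf.data and the .debug cache, plus vmlinux when a path is given.
# $1: output archive            $2: directory holding perf.data
# $3: directory holding .debug  $4: optional path to vmlinux
make_trace_bundle() {
    out=$1
    data_dir=$(cd "$2" && pwd) || return 1
    home_dir=$(cd "$3" && pwd) || return 1
    vmlinux=${4:-}

    [ -f "$data_dir/perf.data" ] || { echo "perf.data not found" >&2; return 1; }
    [ -d "$home_dir/.debug" ] || { echo ".debug not found" >&2; return 1; }

    # Build the tar argument list; each -C switches directory so that
    # archive members are stored without their leading path.
    set -- -C "$data_dir" perf.data -C "$home_dir" .debug
    if [ -n "$vmlinux" ]; then
        vm_dir=$(cd "$(dirname "$vmlinux")" && pwd) || return 1
        set -- "$@" -C "$vm_dir" "$(basename "$vmlinux")"
    fi
    tar czf "$out" "$@"
}
```

For a user space only capture this would be called as `make_trace_bundle uname.trace.tgz . "$HOME"`, with the path to vmlinux appended as a fourth argument when kernel traces were collected.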

Off Target OpenCSD Compilation

As of this writing the openCSD library is not part of the perf tools source. It is available on github and needs to be compiled before perf.

linaro@t430:~/linaro/coresight/bkk16/$ git clone -b opencsd-bkk16 https://github.com/Linaro/OpenCSD.git opencsd-bkk16
Cloning into 'OpenCSD'...
remote: Counting objects: 2063, done.
remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063
Receiving objects: 100% (2063/2063), 2.51 MiB | 1.24 MiB/s, done.
Resolving deltas: 100% (1399/1399), done.
Checking connectivity... done.
linaro@t430:~/linaro/coresight/bkk16/$ ls opencsd-bkk16 
decoder LICENSE  README.md

Once the source code has been acquired, compilation of the openCSD library can take place. For Linux two options are available, LINUX and LINUX64, selected according to the architecture of the host, which has nothing to do with the architecture of the target:

linaro@t430:~/linaro/coresight/bkk16/$ cd opencsd-bkk16/decoder/build/linux/
linaro@t430:~/linaro/coresight/bkk16/opencsd-bkk16/decoder/build/linux/$ ls
makefile  rctdl_c_api_lib  ref_trace_decode_lib

linaro@t430:~/linaro/coresight/bkk16/opencsd-bkk16/decoder/build/linux/$ make LINUX64=1 DEBUG=1 
...
...

linaro@t430:~/linaro/coresight/bkk16/opencsd-bkk16/decoder/build/linux/$ ls ../../lib/linux64/dbg/
libcstraced.a  libcstraced_c_api.a  libcstraced_c_api.so  libcstraced.so 

Off Target Perf Tools Compilation

As stated above not all the pieces of the solution have been upstreamed. To get all the components branch perf-opencsd-4.5-rc6-bkk16 needs to be obtained:

linaro@t430:~/linaro/coresight/bkk16/$ git clone -b perf-opencsd-4.5-rc6-bkk16 https://github.com/Linaro/OpenCSD.git perf-opencsd-4.5-rc6-bkk16
...
...

linaro@t430:~/linaro/coresight/bkk16/$ ls perf-opencsd-4.5-rc6-bkk16/ 
arch   certs    CREDITS  Documentation  firmware  include  ipc     Kconfig  lib          Makefile  net     REPORTING-BUGS  scripts   sound  usr
block  COPYING  crypto   drivers        fs        init     Kbuild  kernel   MAINTAINERS  mm        README  samples         security  tools  virt       

At this point the openCSD library files need to be copied into the cs-etm-decoder directory. After that a new perf tool binary can be compiled:

linaro@t430:~/linaro/coresight/bkk16/$ mkdir perf-opencsd-4.5-rc6-bkk16/tools/perf/util/cs-etm-decoder/lib 

linaro@t430:~/linaro/coresight/bkk16/$ cp opencsd-bkk16/decoder/lib/linux64/dbg/* perf-opencsd-4.5-rc6-bkk16/tools/perf/util/cs-etm-decoder/lib/

linaro@t430:~/linaro/coresight/bkk16/$ cd perf-opencsd-4.5-rc6-bkk16
linaro@t430:~/linaro/coresight/bkk16/perf-opencsd-4.5-rc6-bkk16/$ export CSTRACE_PATH=~/linaro/coresight/bkk16/opencsd-bkk16/decoder
linaro@t430:~/linaro/coresight/bkk16/perf-opencsd-4.5-rc6-bkk16/$ make -C tools/perf ARCH=arm DEBUG=1 NO_LIBPERL=1 
...
...
linaro@t430:~/linaro/coresight/bkk16/perf-opencsd-4.5-rc6-bkk16/$ ls -l tools/perf/perf
-rwxrwxr-x 1 linaro linaro 6276360 Mar  3 10:05 tools/perf/perf

Since the openCSD library is not part of the perf tools, an environment variable telling the build scripts where to find the library is needed. If the CSTRACE_PATH variable is not defined the compilation will still be successful, but handling of CoreSight trace data won't be supported.

At the end of the compilation a new perf binary is available in tools/perf/
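
Because a missing CSTRACE_PATH does not cause the build to fail, it can be worth checking the variable before starting the compilation. The following is a sketch of such a check. The function name check_cstrace_path is hypothetical, and the default lib/linux64/dbg sub-directory simply mirrors the LINUX64=1 DEBUG=1 build shown earlier.

```shell
#!/bin/sh
# Succeed and print the library directory when CSTRACE_PATH points at a
# decoder tree containing a built libcstraced library; fail otherwise.
# $1 optionally overrides the library sub-directory (default lib/linux64/dbg).
check_cstrace_path() {
    if [ -z "${CSTRACE_PATH:-}" ]; then
        echo "CSTRACE_PATH is not set: perf would build without CoreSight decoding" >&2
        return 1
    fi
    libdir="$CSTRACE_PATH/${1:-lib/linux64/dbg}"
    if [ ! -e "$libdir/libcstraced.so" ] && [ ! -e "$libdir/libcstraced.a" ]; then
        echo "no libcstraced library under $libdir" >&2
        return 1
    fi
    echo "$libdir"
}
```

A guarded build would then read `check_cstrace_path && make -C tools/perf ARCH=arm DEBUG=1 NO_LIBPERL=1`.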

Trace Decoding with Perf Report

Before working with custom traces it is suggested to use a trace bundle that is known to be working properly. A sample bundle has been made available here [2]. Trace bundles can be extracted anywhere and have no dependencies on where the perf tools and openCSD library have been compiled.

linaro@t430:~/linaro/coresight/bkk16/$ mkdir feb24
linaro@t430:~/linaro/coresight/bkk16/$ cd feb24
linaro@t430:~/linaro/coresight/bkk16/feb24/$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.feb24.tgz
linaro@t430:~/linaro/coresight/bkk16/feb24/$ md5sum uname.v4.user.feb24.tgz 
f53f11d687ce72bdbe9de2e67e960ec6  uname.v4.user.feb24.tgz
linaro@t430:~/linaro/coresight/bkk16/feb24/$ tar xf uname.v4.user.feb24.tgz
linaro@t430:~/linaro/coresight/bkk16/feb24/$ ls -la
total 1312
drwxrwxr-x 3 linaro linaro    4096 Mar  3 10:26 .
drwxrwxr-x 5 linaro linaro    4096 Mar  3 10:13 ..
drwxr-xr-x 7 linaro linaro    4096 Feb 24 12:21 .debug
-rw------- 1 linaro linaro   78016 Feb 24 12:21 perf.data
-rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.feb24.tgz 
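
The checksum comparison in the transcript above is done by eye. It could be automated with a helper like the one below, a sketch that assumes the md5sum utility from GNU coreutils is available on the host. The function name verify_md5 is illustrative.

```shell
#!/bin/sh
# Compare the MD5 digest of file $1 with the expected hex string $2.
verify_md5() {
    actual=$(md5sum "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ] || {
        echo "checksum mismatch for $1: got '$actual'" >&2
        return 1
    }
}
```

With the sample bundle this would be used as `verify_md5 uname.v4.user.feb24.tgz f53f11d687ce72bdbe9de2e67e960ec6 && tar xf uname.v4.user.feb24.tgz`, so that extraction only happens when the download is intact.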

Perf is expecting files related to the trace capture (perf.data) to be located under ~/.debug [3]. This example will remove the current ~/.debug directory to be sure everything is clean.

linaro@t430:~/linaro/coresight/bkk16/feb24/$ rm -rf ~/.debug 
linaro@t430:~/linaro/coresight/bkk16/feb24/$ cp -dpR .debug ~/
linaro@t430:~/linaro/coresight/bkk16/feb24/$ export LD_LIBRARY_PATH=~/linaro/coresight/bkk16/perf-opencsd-4.5-rc6-bkk16/tools/perf/util/cs-etm-decoder/lib
linaro@t430:~/linaro/coresight/bkk16/feb24/$ ../perf-opencsd-4.5-rc6-bkk16/tools/perf/perf report --stdio 

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 0  of event 'cs_etm//u'
# Event count (approx.): 0
#
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ......
#


# Samples: 0  of event 'dummy:u'
# Event count (approx.): 0
#
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ......
#


# Samples: 115K of event 'instructions:u'
# Event count (approx.): 522009
#
# Children      Self  Command  Shared Object     Symbol                
# ........  ........  .......  ................  ......................
#
     4.13%     4.13%  uname    libc-2.21.so      [.] 0x0000000000078758
     3.81%     3.81%  uname    libc-2.21.so      [.] 0x0000000000078e50
     2.06%     2.06%  uname    libc-2.21.so      [.] 0x00000000000fcaf4
     1.65%     1.65%  uname    libc-2.21.so      [.] 0x00000000000fcae4
     1.59%     1.59%  uname    ld-2.21.so        [.] 0x000000000000a7f4
     1.50%     1.50%  uname    libc-2.21.so      [.] 0x0000000000078e40
     1.43%     1.43%  uname    libc-2.21.so      [.] 0x00000000000fcac4
     1.31%     1.31%  uname    libc-2.21.so      [.] 0x000000000002f0c0
     1.26%     1.26%  uname    ld-2.21.so        [.] 0x0000000000016888
     1.24%     1.24%  uname    libc-2.21.so      [.] 0x0000000000078e7c 
     1.24%     1.24%  uname    libc-2.21.so      [.] 0x00000000000fcab8
...

Trace Decoding with Perf Script

Working with perf script requires more command-line options, but it yields an instruction-by-instruction disassembly of the traced program flow.

linaro@t430:~/linaro/coresight/bkk16/feb24/$ export EXEC_PATH=~/linaro/coresight/bkk16/perf-opencsd-4.5-rc6-bkk16/tools/perf/
linaro@t430:~/linaro/coresight/bkk16/feb24/$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/ 
linaro@t430:~/linaro/coresight/bkk16/feb24/$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
linaro@t430:~/linaro/coresight/bkk16/feb24/$ ../perf-opencsd-4.5-rc6-bkk16/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump

          7f89f24d80:   910003e0        mov     x0, sp
          7f89f24d84:   94000d53        bl      7f89f282d0 <free@plt+0x3790>
          7f89f282d0:   d11203ff        sub     sp, sp, #0x480
          7f89f282d4:   a9ba7bfd        stp     x29, x30, [sp,#-96]!
          7f89f282d8:   910003fd        mov     x29, sp
          7f89f282dc:   a90363f7        stp     x23, x24, [sp,#48]
          7f89f282e0:   9101e3b7        add     x23, x29, #0x78
          7f89f282e4:   a90573fb        stp     x27, x28, [sp,#80]
          7f89f282e8:   a90153f3        stp     x19, x20, [sp,#16]
          7f89f282ec:   aa0003fb        mov     x27, x0
          7f89f282f0:   910a82e1        add     x1, x23, #0x2a0
          7f89f282f4:   a9025bf5        stp     x21, x22, [sp,#32]
          7f89f282f8:   a9046bf9        stp     x25, x26, [sp,#64]
          7f89f282fc:   910102e0        add     x0, x23, #0x40
          7f89f28300:   f800841f        str     xzr, [x0],#8
          7f89f28304:   eb01001f        cmp     x0, x1
          7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
          7f89f28300:   f800841f        str     xzr, [x0],#8
          7f89f28304:   eb01001f        cmp     x0, x1
          7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
          7f89f28300:   f800841f        str     xzr, [x0],#8
          7f89f28304:   eb01001f        cmp     x0, x1
          7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
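
The disassembly can be long, and loops such as the one ending the excerpt above repeat the same addresses many times. As a sketch of one possible post-processing step, the helper below counts how often each address appears in output of the form shown above. The function name count_addresses is illustrative, and the filter assumes each instruction line starts with a hexadecimal address followed by a colon.

```shell
#!/bin/sh
# Read disassembly lines on stdin and print "<count> <address>" pairs,
# most frequently executed address first. Lines whose first field is not
# "<hex address>:" are ignored.
count_addresses() {
    awk '$1 ~ /^[0-9a-f]+:$/ { sub(/:$/, "", $1); hits[$1]++ }
         END { for (a in hits) print hits[a], a }' | sort -k1,1nr -k2,2
}
```

The perf script command line above can be piped straight into it, for example `... -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump | count_addresses | head`.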

The Linaro CoreSight Team

  • Mike Leach
  • Tor Jeremiassen
  • Chunyan Zhang
  • Mathieu Poirier

One Last Thing

We welcome help on this project. If you would like to add features or help improve the way things work, we want to hear from you.

Best regards,

The Linaro CoreSight Team


[2] http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.feb24.tgz

[3] Get in touch with us if you know a way to change this.