HOWTO - using the library with perf {#howto_perf}

@brief Using command line perf and OpenCSD to collect and decode trace.

This HOWTO explains how to use the perf cmd line tools and the openCSD library to collect and extract program flow traces generated by the CoreSight IP blocks on a Linux system. The examples have been generated using an aarch64 Juno-r0 platform.

On Target Trace Acquisition - Perf Record

Compile the perf tool from the same kernel source version as the kernel running on the target:

make -C tools/perf

This will yield a perf executable that will support CoreSight trace collection.

Note: If traces are to be decoded off target, there is no need to download and compile the openCSD library on the target.

If you are instead planning to use perf to record and decode the trace on the target, compile the perf tool linking against the openCSD library, in the following way:

make -C tools/perf VF=1 CORESIGHT=1

Further information on the required build environment and options is given later in the section Off Target Perf Tools Compilation.

Before launching a trace run, a sink to collect the trace data needs to be identified. All CoreSight blocks identified by the framework are registered in sysfs:

linaro@linaro-nano:~$ ls /sys/bus/coresight/devices/
etm0  etm2  etm4  etm6  funnel0  funnel2  funnel4      stm0      tmc_etr0
etm1  etm3  etm5  etm7  funnel1  funnel3  replicator0  tmc_etf0

CoreSight blocks are listed in the device tree for a specific system and discovered at boot time. Since tracers can be linked to more than one sink, the sink that will receive trace data needs to be identified and given as an option on the perf command line. Once a sink has been identified, trace collection can start. A simple yet interesting example is the uname command:

linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -e cs_etm/@tmc_etr0/ --per-thread uname

This generates a perf.data file in which execution has been traced in both user and kernel space. To narrow the scope to either user or kernel space, the u or k modifier can be specified. For example, the following limits tracing to user space:

linaro@linaro-nano:~/kernel$ ./tools/perf/perf record -vvv -e cs_etm/@tmc_etr0/u --per-thread uname
Problems setting modules path maps, continuing anyway...
-----------------------------------------------------------
perf_event_attr:
  type                             8
  size                             112
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|IDENTIFIER
  read_format                      ID
  disabled                         1
  exclude_kernel                   1
  exclude_hv                       1
  enable_on_exec                   1
  sample_id_all                    1
------------------------------------------------------------
sys_perf_event_open: pid 11375  cpu -1  group_fd -1  flags 0x8
------------------------------------------------------------
perf_event_attr:
  type                             1
  size                             112
  config                           0x9
  { sample_period, sample_freq }   1
  sample_type                      IP|TID|IDENTIFIER
  read_format                      ID
  disabled                         1
  exclude_kernel                   1
  exclude_hv                       1
  mmap                             1
  comm                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  mmap2                            1
  comm_exec                        1
------------------------------------------------------------
sys_perf_event_open: pid 11375  cpu -1  group_fd -1  flags 0x8
mmap size 266240B
AUX area mmap length 131072
perf event ring buffer mmapped per thread
Synthesizing auxtrace information
Linux
auxtrace idx 0 old 0 head 0x11ea0 diff 0x11ea0
[ perf record: Woken up 1 times to write data ]
overlapping maps:
 7f99daf000-7f99db0000 0 [vdso]
 7f99d84000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
 7f99d84000-7f99daf000 0 /lib/aarch64-linux-gnu/ld-2.21.so
 7f99db0000-7f99db3000 0 /lib/aarch64-linux-gnu/ld-2.21.so
failed to write feature 8
failed to write feature 9
failed to write feature 14
[ perf record: Captured and wrote 0.072 MB perf.data ]

linaro@linaro-nano:~/kernel$ ls -l ~/.debug/ perf.data
-rw------- 1 linaro linaro 77888 Mar  2 20:41 perf.data

/home/linaro/.debug/:
total 16
drwxr-xr-x 2 linaro linaro 4096 Mar  2 20:40 [kernel.kallsyms]
drwxr-xr-x 2 linaro linaro 4096 Mar  2 20:40 [vdso]
drwxr-xr-x 3 linaro linaro 4096 Mar  2 20:40 bin
drwxr-xr-x 3 linaro linaro 4096 Mar  2 20:40 lib

Trace data filtering

The amount of trace generated by CoreSight tracers is staggering, even for the simplest trace scenario. Restricting trace generation to specific areas of interest saves trace buffer space and avoids getting lost in trace data that isn't relevant. In addition to the 'k' and 'u' options described above, address filters can be used.

Two types of address filter have been implemented for CoreSight: address range filters and start/stop filters.

Address range filters: With address range filters, trace is generated while the instruction pointer falls within the specified range. Any work done by the CPU outside that range is not traced. Address range filters can be specified for both user and kernel space sessions:

perf record -e cs_etm/@tmc_etr0/k --filter 'filter 0xffffff8008562d0c/0x48' --per-thread uname

perf record -e cs_etm/@tmc_etr0/u --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main

For kernel space traces, addresses are typically taken from the 'System.map' file. In user space, addresses are relocatable and can be extracted from objdump output:

$ aarch64-linux-gnu-objdump  -d libcstest.so.1.0
...
...
000000000000072c <coresight_test1>:		<------------ Beginning of traces
 72c:	d10083ff 	sub	sp, sp, #0x20
 730:	b9000fe0 	str	w0, [sp,#12]
 734:	b9001fff 	str	wzr, [sp,#28]
 738:	14000007 	b	754 <coresight_test1+0x28>
 73c:	b9400fe0 	ldr	w0, [sp,#12]
 740:	11000800 	add	w0, w0, #0x2
 744:	b9000fe0 	str	w0, [sp,#12]
 748:	b9401fe0 	ldr	w0, [sp,#28]
 74c:	11000400 	add	w0, w0, #0x1
 750:	b9001fe0 	str	w0, [sp,#28]
 754:	b9401fe0 	ldr	w0, [sp,#28]
 758:	7100101f 	cmp	w0, #0x4
 75c:	54ffff0d 	b.le	73c <coresight_test1+0x10>
 760:	b9400fe0 	ldr	w0, [sp,#12]
 764:	910083ff 	add	sp, sp, #0x20
 768:	d65f03c0 	ret
...
...

The address is followed by the size of the range in bytes and, when tracing in user space, by the full path to the binary (or library) being traced.
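Rather than reading the size off the disassembly by hand (here 0x768 + 4 - 0x72c = 0x40), the address and size of a symbol can be taken from `nm -S` output. The following is a minimal bash sketch, not part of perf or OpenCSD; the function name `range_filter_from_nm` is hypothetical, and it assumes GNU binutils-style `nm -S` lines of the form `address size type name`:

```bash
# Hypothetical helper: build a perf address range filter for one symbol.
# Reads `nm -S` output on stdin (assumed lines: "address size type name") and
# prints "filter 0x<addr>/0x<size>[@<path>]". Returns 1 if the symbol is absent.
range_filter_from_nm() {
    local sym=$1 path=${2:-}
    awk -v sym="$sym" -v path="$path" '
        NF == 4 && $4 == sym {
            addr = $1; size = $2
            # Drop the zero padding nm adds, keeping at least one digit.
            sub(/^0+/, "", addr); if (addr == "") addr = "0"
            sub(/^0+/, "", size); if (size == "") size = "0"
            printf "filter 0x%s/0x%s", addr, size
            if (path != "") printf "@%s", path   # user space: path to binary
            printf "\n"
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    '
}
```

For the library above, `aarch64-linux-gnu-nm -S libcstest.so.1.0 | range_filter_from_nm coresight_test1 /opt/lib/libcstest.so.1.0` should print `filter 0x72c/0x40@/opt/lib/libcstest.so.1.0`, provided nm reports the same 0x40 size as the disassembly. Omitting the path gives the kernel form.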

Start/Stop filters: With start/stop filters, trace generation starts when the instruction pointer is equal to the start address and stops when the instruction pointer is equal to the stop address. Everything that happens between these two events is traced:

perf record -e cs_etm/@tmc_etr0/k --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0' --per-thread  uname

perf record -vvv -e cs_etm/@tmc_etr0/u                                                   \
        --filter 'start 0x72c@/opt/lib/libcstest.so.1.0,stop 0x40082c@/home/linaro/main' \
        --per-thread ./main
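Each start or stop entry takes an address and, in user space, an `@<path>` suffix, with entries separated by commas. A small bash sketch that assembles such a string follows; `cs_start_stop_filter` is a hypothetical name, and an empty object argument is assumed to mean a kernel address:

```bash
# Hypothetical helper: print "start ADDR[@OBJ],stop ADDR[@OBJ]".
# Arguments: START_ADDR START_OBJ STOP_ADDR STOP_OBJ; pass "" as the object
# for kernel addresses.
cs_start_stop_filter() {
    local start=$1 start_obj=${2:-} stop=$3 stop_obj=${4:-}
    # Append "@<path>" only when an object is given (user space tracing).
    if [ -n "$start_obj" ]; then start="$start@$start_obj"; fi
    if [ -n "$stop_obj" ]; then stop="$stop@$stop_obj"; fi
    printf 'start %s,stop %s\n' "$start" "$stop"
}

# Example (not run here):
#   perf record -e cs_etm/@tmc_etr0/u \
#       --filter "$(cs_start_stop_filter 0x72c /opt/lib/libcstest.so.1.0 0x40082c /home/linaro/main)" \
#       --per-thread ./main
```

Building the string in one place also keeps it on a single line, which avoids breaking it across a quoted shell continuation.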

Limitation on address filters: The only limitations on address filters are the number of address comparators available on an implementation and the mutual exclusion between range and start/stop filters. As such, the following example, which combines a start/stop pair with an address range, would not work:

perf record -e cs_etm/@tmc_etr0/k                                                        \
        --filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0,filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' \
        --per-thread uname
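The mutual exclusion rule can be checked before starting a run. This is a minimal bash sketch under the assumption that entries are comma-separated and start with `filter`, `start` or `stop`; `check_cs_filter` is a hypothetical helper, and it does not know how many address comparators the hardware provides:

```bash
# Hypothetical pre-flight check for a perf --filter string. Rejects strings
# that mix address range ("filter ...") and start/stop entries.
# Returns 0 if acceptable, 1 if mixed, 2 on an unrecognized entry.
check_cs_filter() {
    local spec=$1 entry kind has_range=0 has_startstop=0
    local -a entries
    IFS=',' read -r -a entries <<< "$spec"
    for entry in "${entries[@]}"; do
        entry=${entry#"${entry%%[![:space:]]*}"}   # trim leading whitespace
        kind=${entry%%[[:space:]]*}                # first word of the entry
        case $kind in
            filter)     has_range=1 ;;
            start|stop) has_startstop=1 ;;
            *) echo "unknown filter entry: '$entry'" >&2; return 2 ;;
        esac
    done
    if (( has_range && has_startstop )); then
        echo "range and start/stop filters cannot be combined" >&2
        return 1
    fi
}
```

With the example above, `check_cs_filter 'start 0xffffff800856bc50,stop 0xffffff800856bcb0,filter 0x72c/0x40@/opt/lib/libcstest.so.1.0'` returns 1.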

Additional Trace Options

Additional options can be used during trace collection that add information to the captured trace.

  • Timestamps: These packets are added to the trace streams to allow correlation of different sources where tools support this.
  • Cycle Counts: These packets are added to get a count of cycles for blocks of executed instructions. Adding cycle counts will considerably increase the amount of generated trace. The relationship between cycle counts and executed instructions differs according to the trace protocol. For example, the ETMv4 protocol will emit counts for groups of instructions according to a minimum count threshold. Presently this threshold is fixed at 256 cycles for perf record.

Command line options in perf record to use these features are part of the options for the cs_etm event:

perf record -e cs_etm/timestamp,cycacc,@tmc_etr0/ --per-thread uname

In the current version, perf record and perf script do not use this additional information.
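The event strings used in this section all follow the pattern `cs_etm/[term,...,]@<sink>/[u|k]`. A small bash sketch that assembles them is shown below; `cs_etm_event` is a hypothetical helper, not part of perf:

```bash
# Hypothetical helper: assemble a cs_etm event string.
# Arguments: SINK [MODIFIERS [TERM...]]
#   SINK       sink name, e.g. tmc_etr0
#   MODIFIERS  "", "u" or "k"
#   TERM       cs_etm format terms such as timestamp or cycacc
cs_etm_event() {
    local sink=$1 mods=${2:-} terms="" opt
    shift $(( $# < 2 ? $# : 2 ))   # remaining arguments are format terms
    for opt in "$@"; do
        terms+="$opt,"
    done
    printf 'cs_etm/%s@%s/%s\n' "$terms" "$sink" "$mods"
}
```

For instance, `perf record -e "$(cs_etm_event tmc_etr0 "" timestamp cycacc)" --per-thread uname` reproduces the command above, and `cs_etm_event tmc_etr0 u` gives the user-space-only event used earlier.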

The cs_etm perf event

System information for this perf pmu event can be found at:

/sys/devices/cs_etm

This directory contains the internal format of the parameters described above:

root@linaro-developer:~# ls /sys/devices/cs_etm/format
contextid  cycacc  retstack  sinkid  timestamp

and names of registered sinks:

root@linaro-developer:~# ls /sys/devices/cs_etm/sinks
tmc_etf0  tmc_etr0  tpiu0
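Since perf needs a registered sink name, it can help to check the sink before recording. A bash sketch follows; the helper names are hypothetical, and the directory defaults to the /sys/devices/cs_etm/sinks path shown above, which may differ between kernel versions:

```bash
# Hypothetical helper: succeed if SINK is listed in the cs_etm sinks directory.
cs_sink_available() {
    local sink=$1 dir=${2:-/sys/devices/cs_etm/sinks}
    [ -n "$sink" ] && [ -e "$dir/$sink" ]
}

# Hypothetical helper: print the registered sinks, one per line.
cs_list_sinks() {
    local dir=${1:-/sys/devices/cs_etm/sinks} entry
    for entry in "$dir"/*; do
        # An unmatched glob stays literal, so test that the entry exists.
        if [ -e "$entry" ]; then printf '%s\n' "${entry##*/}"; fi
    done
    return 0
}
```

A typical guard would be `cs_sink_available tmc_etr0 || cs_list_sinks` before calling perf record.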

Note: The sinkid parameter documents the 32-bit internal parameter used to pass the sink name given in the cs_etm/@sink/ syntax to the kernel drivers. It can be used directly as cs_etm/sinkid=<hash_value>/, but this is not recommended as the values are considered opaque and subject to change.

On Target Trace Collection

The entire program flow has now been recorded in the perf.data file. Information about libraries and executables is stored under $HOME/.debug:

linaro@linaro-nano:~/kernel$ tree ~/.debug
.debug
├── [kernel.kallsyms]
│   └── 0542921808098d591a7acba5a1163e8991897669
│       └── kallsyms
├── [vdso]
│   └── 551fbbe29579eb63be3178a04c16830b8d449769
│       └── vdso
├── bin
│   └── uname
│       └── ed95e81f97c4471fb2ccc21e356b780eb0c92676
│           └── elf
└── lib
    └── aarch64-linux-gnu
        ├── ld-2.21.so
        │   └── 94912dc5a1dc8c7ef2c4e4649d4b1639b6ebc8b7
        │       └── elf
        └── libc-2.21.so
            └── 169a143e9c40cfd9d09695333e45fd67743cd2d6
                └── elf

13 directories, 5 files
linaro@linaro-nano:~/kernel$

All this information needs to be collected in order to successfully decode traces off target:

linaro@linaro-nano:~/kernel$ tar czf uname.trace.tgz perf.data -C ~ .debug

Note that the vmlinux file should also be added to the bundle if kernel traces have been collected.
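Storing the buildid directory as `.debug` (rather than under its absolute path) lets the bundle extract the same way as the sample bundle used later. The following bash sketch assumes GNU tar; `make_trace_bundle` is a hypothetical helper that optionally adds vmlinux:

```bash
# Hypothetical helper: bundle a capture for off-target decoding.
# Arguments: OUT.tgz [PERF_DATA [BUILDID_DIR [VMLINUX]]]
make_trace_bundle() {
    local out=$1 data=${2:-perf.data} debug_dir=${3:-$HOME/.debug} vmlinux=${4:-}
    local -a args
    # Each "-C DIR NAME" pair stores NAME relative to DIR. Absolute directories
    # are used because GNU tar applies successive relative -C options cumulatively.
    args=(-C "$(cd "$(dirname "$data")" && pwd)" "$(basename "$data")")
    args+=(-C "$(cd "$(dirname "$debug_dir")" && pwd)" "$(basename "$debug_dir")")
    if [ -n "$vmlinux" ]; then
        args+=(-C "$(cd "$(dirname "$vmlinux")" && pwd)" "$(basename "$vmlinux")")
    fi
    tar czf "$out" "${args[@]}"
}
```

Run from the capture directory, `make_trace_bundle uname.trace.tgz` stores perf.data and .debug at the top level of the archive.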

Off Target OpenCSD Compilation

The openCSD library is not part of the perf tools. It is available on GitHub and needs to be compiled before the perf tools. Check out the required branch/tag version into a local directory:

linaro@t430:~/linaro/coresight$ git clone https://github.com/Linaro/OpenCSD.git my-opencsd
Cloning into 'OpenCSD'...
remote: Counting objects: 2063, done.
remote: Total 2063 (delta 0), reused 0 (delta 0), pack-reused 2063
Receiving objects: 100% (2063/2063), 2.51 MiB | 1.24 MiB/s, done.
Resolving deltas: 100% (1399/1399), done.
Checking connectivity... done.
linaro@t430:~/linaro/coresight$ ls my-opencsd
decoder LICENSE  README.md HOWTO.md TODO

Once the source code has been acquired, the openCSD library can be compiled. Two options are available for Linux, LINUX and LINUX64, selected according to the architecture of the host (which is unrelated to the target):

linaro@t430:~/linaro/coresight/$ cd my-opencsd/decoder/build/linux/
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls
makefile  rctdl_c_api_lib  ref_trace_decode_lib

linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ make LINUX64=1 DEBUG=1
...
...

linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls ../../lib/linux64/dbg/
libopencsd.a  libopencsd_c_api.a  libopencsd_c_api.so  libopencsd.so

From there, the header files and libraries need to be installed on the system, which requires root privileges. The default installation path is /usr/include/opencsd for the header files and /usr/lib/ for the libraries:

linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ sudo make install
linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/include/opencsd
total 60
drwxr-xr-x 2 root root  4096 Dec 12 10:19 c_api
drwxr-xr-x 2 root root  4096 Dec 12 10:19 etmv3
drwxr-xr-x 2 root root  4096 Dec 12 10:19 etmv4
-rw-r--r-- 1 root root 28049 Dec 12 10:19 ocsd_if_types.h
drwxr-xr-x 2 root root  4096 Dec 12 10:19 ptm
drwxr-xr-x 2 root root  4096 Dec 12 10:19 stm
-rw-r--r-- 1 root root  7264 Dec 12 10:19 trc_gen_elem_types.h
-rw-r--r-- 1 root root  3972 Dec 12 10:19 trc_pkt_types.h

linaro@t430:~/linaro/coresight/my-opencsd/decoder/build/linux$ ls -l /usr/lib/libopencsd*
-rw-r--r-- 1 root root  598720 Dec 12 10:19 /usr/lib/libopencsd_c_api.so
-rw-r--r-- 1 root root 4692200 Dec 12 10:19 /usr/lib/libopencsd.so
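Before building perf, the installation can be checked against the default paths listed above. This bash sketch is hypothetical (`opencsd_installed` is not part of the OpenCSD install target) and only tests for the files shown in the listings:

```bash
# Hypothetical check that openCSD headers and shared libraries are installed.
# Arguments: [INCLUDE_DIR [LIB_DIR]], defaulting to the install paths above.
opencsd_installed() {
    local inc=${1:-/usr/include/opencsd} lib=${2:-/usr/lib} f missing=0
    for f in "$inc/ocsd_if_types.h" "$inc/c_api" \
             "$lib/libopencsd.so" "$lib/libopencsd_c_api.so"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```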

A "clean_install" target is also available so that openCSD installed files can be removed from a system. Going forward the goal is to have the openCSD library packaged as a Debian or RPM archive so that it can be installed from a distribution without having to be compiled.

Off Target Perf Tools Compilation

As mentioned above, the openCSD library is not part of the perf tools' code base and needs to be installed on the system prior to compilation. The perf tools build script reports the status of the openCSD library on the system at compile time:

linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf
Auto-detecting system features:
...                         dwarf: [ on  ]
...            dwarf_getlocations: [ on  ]
...                         glibc: [ on  ]
...                          gtk2: [ on  ]
...                      libaudit: [ on  ]
...                        libbfd: [ OFF ]
...                        libelf: [ on  ]
...                       libnuma: [ OFF ]
...        numa_num_possible_cpus: [ OFF ]
...                       libperl: [ on  ]
...                     libpython: [ on  ]
...                      libslang: [ on  ]
...                     libcrypto: [ on  ]
...                     libunwind: [ OFF ]
...            libdw-dwarf-unwind: [ on  ]
...                          zlib: [ on  ]
...                          lzma: [ OFF ]
...                     get_cpuid: [ on  ]
...                           bpf: [ on  ]
...                    libopencsd: [ on  ]  <-------
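When the build output is long, the libopencsd line is easy to miss. A bash sketch that checks a saved build log follows; `perf_has_opencsd` is a hypothetical name, and it assumes the `libopencsd: [ on  ]` formatting shown above:

```bash
# Hypothetical helper: read perf build output on stdin and succeed only if
# the feature check reports libopencsd as "on".
perf_has_opencsd() {
    grep -Eq 'libopencsd:[[:space:]]*\[[[:space:]]*on[[:space:]]*\]'
}
```

Saving the log first, e.g. `make CORESIGHT=1 VF=1 -C tools/perf 2>&1 | tee build.log` followed by `perf_has_opencsd < build.log`, avoids grep closing the pipe early on a running build.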

At the end of the compilation a new perf binary is available in tools/perf/:

linaro@t430:~/linaro/linux-kernel$ ldd tools/perf/perf
linux-vdso.so.1 =>  (0x00007fff135db000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f15f9176000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f15f8f6e000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f15f8c64000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f15f8a60000)
libopencsd_c_api.so => /usr/lib/libopencsd_c_api.so (0x00007f15f884e000)   <-------
libelf.so.1 => /usr/lib/x86_64-linux-gnu/libelf.so.1 (0x00007f15f8635000)
libdw.so.1 => /usr/lib/x86_64-linux-gnu/libdw.so.1 (0x00007f15f83ec000)
libaudit.so.1 => /lib/x86_64-linux-gnu/libaudit.so.1 (0x00007f15f81c5000)
libslang.so.2 => /lib/x86_64-linux-gnu/libslang.so.2 (0x00007f15f7e38000)
libperl.so.5.22 => /usr/lib/x86_64-linux-gnu/libperl.so.5.22 (0x00007f15f7a5d000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f15f7693000)
libpython2.7.so.1.0 => /usr/lib/x86_64-linux-gnu/libpython2.7.so.1.0 (0x00007f15f7104000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f15f6eea000)
/lib64/ld-linux-x86-64.so.2 (0x0000559b88038000)
libopencsd.so => /usr/lib/libopencsd.so (0x00007f15f6c62000)    <-------
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f15f68df000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f15f66c9000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f15f64a6000)
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f15f6296000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f15f605e000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f15f5e5a000)
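The two marked lines are the ones that matter. A bash sketch that checks them in `ldd` output follows; `perf_links_opencsd` is hypothetical, and it accepts versioned sonames (for example `libopencsd.so.1`) as an assumption about other library releases:

```bash
# Hypothetical helper: read `ldd` output on stdin and succeed only if both
# openCSD libraries resolve to a file path (not "not found").
perf_links_opencsd() {
    awk '
        $1 ~ /^libopencsd\.so/       && $3 ~ /^\// { lib = 1 }
        $1 ~ /^libopencsd_c_api\.so/ && $3 ~ /^\// { capi = 1 }
        END { exit !(lib && capi) }
    '
}
```

Usage: `ldd tools/perf/perf | perf_links_opencsd && echo "perf linked against openCSD"`.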

Additional debug output from the decoder can be compiled in by setting the CSTRACE_RAW environment variable when building perf. Setting it to packed produces trace frame output such as:

Frame Data; Index    576;    RAW_PACKED; d6 d6 d6 d6 d6 d6 d6 d6 fc fb d6 d6 d6 d6 e0 7f 
Frame Data; Index    576;   ID_DATA[0x14]; d7 d6 d7 d6 d7 d6 d7 d6 fd fb d7 d6 d7 d6 e0

Setting it to any other value removes the RAW_PACKED lines.

Working with an alternate version of the openCSD library

When compiling the perf tools, it is possible to reference a version of the openCSD library other than the one installed on the system. This is useful when working with multiple development trees or when system libraries should be kept intact. Two environment variables, CSINCLUDES and CSLIBS, tell the perf tools build script where to find the header files and libraries:

linaro@t430:~/linaro/linux-kernel$ export CSINCLUDES=~/linaro/coresight/my-opencsd/decoder/include/
linaro@t430:~/linaro/linux-kernel$ export CSLIBS=~/linaro/coresight/my-opencsd/decoder/lib/builddir/
linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf

This compiles and links against the provided library. Since the system's openCSD library is in the loader's search path, the LD_LIBRARY_PATH environment variable needs to be set so that the alternate library is used at run time:

linaro@t430:~/linaro/linux-kernel$ export LD_LIBRARY_PATH=$CSLIBS
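The three exports can be grouped in one function to source into a shell. In this bash sketch `use_opencsd` is hypothetical; unlike the export above, it prepends to LD_LIBRARY_PATH instead of overwriting it:

```bash
# Hypothetical helper (meant to be sourced): point the perf build and the
# dynamic loader at an alternate openCSD tree.
# Arguments: INCLUDE_DIR LIB_DIR
use_opencsd() {
    export CSINCLUDES=$1
    export CSLIBS=$2
    # Prepend, keeping any existing loader search path.
    export LD_LIBRARY_PATH=$CSLIBS${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
}
```

For the tree above: `use_opencsd ~/linaro/coresight/my-opencsd/decoder/include/ ~/linaro/coresight/my-opencsd/decoder/lib/builddir/` followed by `make CORESIGHT=1 VF=1 -C tools/perf`.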

Trace Decoding with Perf Report

Before working with custom traces, it is suggested to use a trace bundle that is known to work properly. A sample bundle is available for download, as shown below. Trace bundles can be extracted anywhere and have no dependencies on where the perf tools and openCSD library have been compiled.

linaro@t430:~/linaro/coresight$ mkdir sept20
linaro@t430:~/linaro/coresight$ cd sept20
linaro@t430:~/linaro/coresight/sept20$ wget http://people.linaro.org/~mathieu.poirier/openCSD/uname.v4.user.sept20.tgz
linaro@t430:~/linaro/coresight/sept20$ md5sum uname.v4.user.sept20.tgz
f53f11d687ce72bdbe9de2e67e960ec6  uname.v4.user.sept20.tgz
linaro@t430:~/linaro/coresight/sept20$ tar xf uname.v4.user.sept20.tgz
linaro@t430:~/linaro/coresight/sept20$ ls -la
total 1312
drwxrwxr-x 3 linaro linaro    4096 Mar  3 10:26 .
drwxrwxr-x 5 linaro linaro    4096 Mar  3 10:13 ..
drwxr-xr-x 7 linaro linaro    4096 Feb 24 12:21 .debug
-rw------- 1 linaro linaro   78016 Feb 24 12:21 perf.data
-rw-rw-r-- 1 linaro linaro 1245881 Feb 24 12:25 uname.v4.user.sept20.tgz
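The md5sum comparison above can be made to stop the procedure on a mismatch. A bash sketch follows, assuming GNU coreutils `md5sum`; `verify_md5` is a hypothetical helper:

```bash
# Hypothetical helper: compare a file's MD5 sum with an expected value
# (lower-case hex, as printed by md5sum). Returns 1 on mismatch.
verify_md5() {
    local file=$1 expected=$2 actual
    actual=$(md5sum -- "$file") || return 2
    actual=${actual%% *}          # keep only the hash field
    if [ "$actual" = "$expected" ]; then
        return 0
    fi
    echo "$file: MD5 mismatch (got $actual)" >&2
    return 1
}
```

Usage: `verify_md5 uname.v4.user.sept20.tgz f53f11d687ce72bdbe9de2e67e960ec6 && tar xf uname.v4.user.sept20.tgz`.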

Perf expects the files related to the trace capture (perf.data) to be located in the buildid directory, which is ~/.debug by default. The default buildid directory can be changed using the command:

 perf config --system buildid.dir=/my/own/buildid/dir

This example removes the current ~/.debug directory to make sure everything is clean.

linaro@t430:~/linaro/coresight/sept20$ rm -rf ~/.debug
linaro@t430:~/linaro/coresight/sept20$ cp -dpR .debug ~/
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio

# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 0  of event 'cs_etm//u'
# Event count (approx.): 0
#
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ......
#


# Samples: 0  of event 'dummy:u'
# Event count (approx.): 0
#
# Children      Self  Command  Shared Object  Symbol
# ........  ........  .......  .............  ......
#


# Samples: 115K of event 'instructions:u'
# Event count (approx.): 522009
#
# Children      Self  Command  Shared Object     Symbol                
# ........  ........  .......  ................  ......................
#
     4.13%     4.13%  uname    libc-2.21.so      [.] 0x0000000000078758
     3.81%     3.81%  uname    libc-2.21.so      [.] 0x0000000000078e50
     2.06%     2.06%  uname    libc-2.21.so      [.] 0x00000000000fcaf4
     1.65%     1.65%  uname    libc-2.21.so      [.] 0x00000000000fcae4
     1.59%     1.59%  uname    ld-2.21.so        [.] 0x000000000000a7f4
     1.50%     1.50%  uname    libc-2.21.so      [.] 0x0000000000078e40
     1.43%     1.43%  uname    libc-2.21.so      [.] 0x00000000000fcac4
     1.31%     1.31%  uname    libc-2.21.so      [.] 0x000000000002f0c0
     1.26%     1.26%  uname    ld-2.21.so        [.] 0x0000000000016888
     1.24%     1.24%  uname    libc-2.21.so      [.] 0x0000000000078e7c 
     1.24%     1.24%  uname    libc-2.21.so      [.] 0x00000000000fcab8
...

A dump of the received trace packets can be obtained with the command:

mjl@ubuntu-vbox:./perf-opencsd-master/coresight/tools/perf/perf report --stdio --dump

This produces a large amount of output, with the trace looking like:

0x618 [0x30]: PERF_RECORD_AUXTRACE size: 0x11ef0  offset: 0  ref: 0x4d881c1f13216016  idx: 0  tid: 15244  cpu: -1

. ... CoreSight ETM Trace data: size 73456 bytes

  0: I_ASYNC : Alignment Synchronisation.
  12: I_TRACE_INFO : Trace Info.
  17: I_TRACE_ON : Trace On.
  18: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F24D80; Ctxt: AArch64,EL0, NS; 
  28: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
  29: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
  30: I_ATOM_F6 : Atom format 6.; EEEEEEEEEEEEEEEEEEEEEEEE
  32: I_ATOM_F6 : Atom format 6.; EEEEN
  33: I_ATOM_F1 : Atom format 1.; E
  34: I_EXCEPT : Exception.;  Data Fault; Ret Addr Follows;
  36: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; 
  45: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; 
  56: I_TRACE_ON : Trace On.
  57: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F2832C; Ctxt: AArch64,EL0, NS; 
  68: I_ATOM_F3 : Atom format 3.; NEE
  69: I_ATOM_F3 : Atom format 3.; NEN
  70: I_ATOM_F3 : Atom format 3.; NNE
  71: I_ATOM_F5 : Atom format 5.; ENENE
  72: I_ATOM_F5 : Atom format 5.; NENEN
  73: I_ATOM_F5 : Atom format 5.; ENENE
  74: I_ATOM_F5 : Atom format 5.; NENEN
  75: I_ATOM_F5 : Atom format 5.; ENENE
  76: I_ATOM_F3 : Atom format 3.; NNE
  77: I_ATOM_F3 : Atom format 3.; NNE
  78: I_ATOM_F3 : Atom format 3.; NNE
  80: I_ATOM_F3 : Atom format 3.; NNE
  81: I_ATOM_F3 : Atom format 3.; ENN
  82: I_EXCEPT : Exception.;  Data Fault; Ret Addr Follows;
  84: I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; 
  93: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0xFFFFFFC000083400; Ctxt: AArch64,EL1, NS; 
  104: I_TRACE_ON : Trace On.
  105: I_ADDR_CTXT_L_64IS0 : Address & Context, Long, 64 bit, IS0.; Addr=0x0000007F89F283F0; Ctxt: AArch64,EL0, NS; 
  116: I_ATOM_F5 : Atom format 5.; NNNNN
  117: I_ATOM_F5 : Atom format 5.; NNNNN
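To get an overview of such a dump, the packet types can be counted. This bash sketch assumes the `<index>: <PACKET> : ...` line layout shown above; `count_etm_packets` is a hypothetical helper:

```bash
# Hypothetical helper: count trace packet types in `perf report --dump` output
# read on stdin. Prints "count packet" lines, most frequent first.
count_etm_packets() {
    awk '$1 ~ /^[0-9]+:$/ && $2 ~ /^I_/ { count[$2]++ }
         END { for (p in count) printf "%d %s\n", count[p], p }' |
        LC_ALL=C sort -k1,1nr -k2,2
}
```

Usage: `perf report --stdio --dump | count_etm_packets | head`.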

Trace Decoding with Perf Script

Working with perf scripts needs more command line options but yields interesting results.

linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script --script=python:${SCRIPT_PATH}/cs-trace-disasm.py -- -d ${XTOOL_PATH}/aarch64-linux-gnu-objdump

          7f89f24d80:   910003e0        mov     x0, sp
          7f89f24d84:   94000d53        bl      7f89f282d0 <free@plt+0x3790>
          7f89f282d0:   d11203ff        sub     sp, sp, #0x480
          7f89f282d4:   a9ba7bfd        stp     x29, x30, [sp,#-96]!
          7f89f282d8:   910003fd        mov     x29, sp
          7f89f282dc:   a90363f7        stp     x23, x24, [sp,#48]
          7f89f282e0:   9101e3b7        add     x23, x29, #0x78
          7f89f282e4:   a90573fb        stp     x27, x28, [sp,#80]
          7f89f282e8:   a90153f3        stp     x19, x20, [sp,#16]
          7f89f282ec:   aa0003fb        mov     x27, x0
          7f89f282f0:   910a82e1        add     x1, x23, #0x2a0
          7f89f282f4:   a9025bf5        stp     x21, x22, [sp,#32]
          7f89f282f8:   a9046bf9        stp     x25, x26, [sp,#64]
          7f89f282fc:   910102e0        add     x0, x23, #0x40
          7f89f28300:   f800841f        str     xzr, [x0],#8
          7f89f28304:   eb01001f        cmp     x0, x1
          7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
          7f89f28300:   f800841f        str     xzr, [x0],#8
          7f89f28304:   eb01001f        cmp     x0, x1
          7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>
          7f89f28300:   f800841f        str     xzr, [x0],#8
          7f89f28304:   eb01001f        cmp     x0, x1
          7f89f28308:   54ffffc1        b.ne    7f89f28300 <free@plt+0x37c0>

Kernel Trace Decoding

When dealing with kernel space traces, the vmlinux file has to be passed explicitly to perf using the "--vmlinux" command line option:

linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf report --stdio --vmlinux=./vmlinux
...
...
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf script --vmlinux=./vmlinux

When using scripts, things get a little more convoluted. Using the same example as above, but for kernel traces, the command line becomes:

linaro@t430:~/linaro/coresight/sept20$ export EXEC_PATH=/home/linaro/coresight/perf-opencsd-master/tools/perf/
linaro@t430:~/linaro/coresight/sept20$ export SCRIPT_PATH=$EXEC_PATH/scripts/python/
linaro@t430:~/linaro/coresight/sept20$ export XTOOL_PATH=/your/aarch64/toolchain/path/bin/
linaro@t430:~/linaro/coresight/sept20$ ../perf-opencsd-master/tools/perf/perf --exec-path=${EXEC_PATH} script	\
						--vmlinux=./vmlinux					\
						--script=python:${SCRIPT_PATH}/cs-trace-disasm.py --	\
						-d ${XTOOL_PATH}/aarch64-linux-gnu-objdump		\
						-k ./vmlinux
...
...
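The user and kernel invocations differ only in the vmlinux options, so they can share one wrapper. In this bash sketch `cs_trace_disasm` is hypothetical; it passes a given vmlinux both to perf script (--vmlinux) and to the script (-k) so the two stay synchronized, and the perf binary defaults to `$EXEC_PATH/perf` (overridable through a `PERF` variable):

```bash
# Hypothetical wrapper around the cs-trace-disasm.py invocations above.
# Arguments: EXEC_PATH OBJDUMP [VMLINUX]
cs_trace_disasm() {
    local exec_path=$1 objdump=$2 vmlinux=${3:-}
    local -a args=("--exec-path=$exec_path" script)
    if [ -n "$vmlinux" ]; then
        args+=("--vmlinux=$vmlinux")          # option for perf script
    fi
    args+=("--script=python:$exec_path/scripts/python/cs-trace-disasm.py" --
           -d "$objdump")
    if [ -n "$vmlinux" ]; then
        args+=(-k "$vmlinux")                 # option for the python script
    fi
    "${PERF:-$exec_path/perf}" "${args[@]}"
}
```

From the bundle directory: `cs_trace_disasm "$EXEC_PATH" "$XTOOL_PATH/aarch64-linux-gnu-objdump" ./vmlinux`; setting `PERF=echo` prints the command instead of running it.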

The option "--vmlinux=./vmlinux" is interpreted by the "perf script" command the same way as it is for "perf report". The option "-k ./vmlinux" is specific to the script being executed and is not related to "--vmlinux", though it is highly advised to keep them synchronized.

Perf Test Environment Scripts

The decoder library comes with a number of bash scripts that ease setting up the offline build and test environment for perf, and running the tests.

These scripts can be found in

decoder/tests/perf-test-scripts

There are three scripts provided:

  • perf-setup-env.bash : this sets up all the environment variables mentioned above.
  • perf-test-report.bash : this runs perf report - using the environment setup by perf-setup-env.bash
  • perf-test-script.bash : this runs perf script - using the environment setup by perf-setup-env.bash

Use as follows:

  1. Prior to building perf, edit perf-setup-env.bash to conform to your environment. There are four lines at the top of the file that will require editing.

  2. Execute the script using the command:

     source perf-setup-env.bash
    

    This will set up a perf execute environment for using the perf report and script commands.

    Alternatively use the command:

     source perf-setup-env.bash buildenv
    

    This adds the build environment variables described in the sections on building above, alongside the environment used by the perf-test... scripts to run the tests.

  3. Build perf as described above.

  4. Follow the instructions for downloading the test capture, or create a capture from your target.

  5. Copy the perf-test... scripts into the capture data directory, i.e. the one that contains perf.data.

  6. The scripts can now be run. No options are required for the default operation, but any command line options will be added to the perf report / perf script command line.

e.g.

    ./perf-test-report.bash --dump 

will add the --dump option to the end of the command line and run

    ${PERF_EXEC_PATH}/perf report --stdio --dump

Generating coverage files for Feedback Directed Optimization: AutoFDO

See autofdo.md (@ref AutoFDO) for details and scripts.

The Linaro CoreSight Team

  • Mike Leach
  • Mathieu Poirier

One Last Thing

We welcome help on this project. If you would like to add features or help improve the way things work, we want to hear from you.

Best regards, The Linaro CoreSight Team