Updated: July 4, 2022
The title of this tutorial is a bit vague, but I couldn't help myself. Also, I don't care if the "AI" search engines somehow miss it in the sea of dross, it doesn't matter much to me, strawberry Tux forever. So, we're here to talk about Linux troubleshooting and debugging, and how to identify problems with application execution and detect possible bottlenecks.
This isn't the first such tutorial I've written - there's an almost endless series, which you ought to discover by perusing my Linux section. But specifically, you want strace, which is an awesome tool, and you want perf, which is also extremely useful in figuring out where and why your system might be misbehaving. Then, as it turns out, you can combine the power of these two tools by using the somewhat less known perf trace option. To wit, this guide. Begin.
Setup, tracing userspace applications
In order for you to enjoy this tutorial fully, I must tell you, there are no shortcuts. You ought to spend some time reading the basics, to get the hang of Linux system internals. Without those, you won't fully appreciate the topic here. As it stands, this article aims mostly at professionals who have already exhausted their existing arsenal, and need additional ways to figure out what may be wrong with their software.
Perf trace is simply an option of the perf toolbox, and it is executed thusly:
perf trace "options" "application binary path"
There are some small caveats to take into consideration here. Perf works by accessing (reading) the various system counters, exposed under the /sys filesystem. In most cases, this means you need sudo or root privileges to run perf. However, this may not be ideal, as it may mask or change the application execution sequence. Now, perf does include the option to run a program under a specific user (with the -u flag), but again, this may not be perfect.
If you run perf trace under different conditions, you may see errors like the following:
Error: Permission denied.
Hint: Check /proc/sys/kernel/perf_event_paranoid setting.
Hint: For your workloads it needs to be <= 1
Hint: For system wide tracing it needs to be set to -1.
Hint: Try: 'sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"'
Hint: The current value is 2.
Error: No permissions to read
/sys/kernel/tracing/events/raw_syscalls/sys_(enter|exit)
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/tracing/'
The tool even offers suggestions to get around the issue without running your application under sudo or root. Indeed, to be able to perform system-wide tracing as a regular user, you need to run the following command (as root) - a one-time change that will be reset after a reboot:
echo -1 > /proc/sys/kernel/perf_event_paranoid
Similarly, to trace raw system calls:
mount -o remount,mode=755 /sys/kernel/tracing/
If this option does not work, then you can simply chmod the path above to 755:
chmod 755 -R /sys/kernel/tracing/*
Please note that this will "somewhat" affect the security baseline of your system during the troubleshooting, so you should reboot after you are done. The system will launch with the /sys permissions reset to defaults, as we have not made any permanent changes.
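Before (and after) relaxing anything, it helps to confirm what the current paranoid level actually is. Reading the value requires no privileges, so a quick sanity check looks like this:

```shell
# Read the current perf_event_paranoid level; no root needed for reading.
# 2 (a typical default) restricts most tracing, -1 allows everything.
cat /proc/sys/kernel/perf_event_paranoid
```

If the printed value is already low enough for your workload, you can skip the changes above entirely.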
Perf trace options
There are a lot of useful options. You may trace major and minor pagefaults, add timing information, write the results to a logfile, and more. As a rule of thumb, the simplest run will probably look like this:
perf trace -T -o "logfile" -s -v "application path"
What do we have here?
We run with the following options in place:
- -T prints the full timestamp of the execution sequence.
- -o specifies the output file, which can be extremely useful to parse information later on.
- -s also provides a summary (for each executed thread) at the end of the logfile, similar to strace -c.
- -v increases the verbosity level of the information printed to standard output and/or the logfile.
You can alternatively narrow down the logged info by printing only system calls that failed (the --failure flag), tracing only specific system calls (-e), attaching to a running process (-p), and more.
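For instance, a couple of hypothetical invocations (the binary path and PID are placeholders, not from any real run):

```shell
# Show only failed system calls, with timestamps, for a given binary:
sudo perf trace --failure -T /usr/bin/myapp

# Trace only openat calls made by an already running process:
sudo perf trace -e openat -p 12345
```

These commands assume root (or a relaxed perf_event_paranoid setting, as discussed above).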
Example run
So, when you run the command with the above options, the logfile will show something like:
992927.525 ( 1.037 ms): 16524 ... [continued]: clone()) = 16525 (matchpathcon)
992928.257 ( ): matchpathcon/16525 execve(filename: 0x2469f0, argv: 0xc0004f7c00, envp: 0xc000184b60) ...
992928.566 ( 0.002 ms): 16524 rt_sigprocmask(how: SETMASK, nset: 0xc00059ef40, sigsetsize: 8) = 0
992928.572 ( 0.003 ms): 16524 close(fd: 13) = 0
992928.579 ( 0.004 ms): 16524 read(fd: 12, buf: 0xc00059f0a8, count: 8) = 0
992928.586 ( 0.002 ms): 16524 close(fd: 12) = 0
992928.601 ( 0.002 ms): 16524 close(fd: 4</dev/null>) = 0
With the -T option in place, the first column will be the absolute timestamp of the execution. If you only care about the relative sequence of events, you can omit it. The second column shows the time it took to complete the system call, in milliseconds. This can also help you identify possible performance issues.
Then, you get the process name or process ID, and the actual system call that was executed, including any flags, options, and the exit status (very similar to strace). You can use this information to identify possible problems. Examples can include: files that do not exist, wrong permissions, or maybe memory locking contention. You do need to be familiar with the actual application code, or at least be able to consult someone who understands the expected functionality behind these system calls, so you can see what gives.
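Since the log is plain text, you can slice it with standard tools. A small sketch, using a made-up excerpt in the same format as above (the file name trace.log and the simplified call arguments are assumptions for illustration):

```shell
# Create a tiny sample in perf trace's output format (normally you would
# already have this file from a run with perf trace -o trace.log):
cat > trace.log << 'EOF'
992887.070 ( 0.496 ms): 16509 newfstatat(dfd: CWD) = -1 ENOENT (No such file or directory)
992892.003 ( 0.003 ms): 16513 futex(uaddr: 0xc000100148) = -1 EAGAIN (Resource temporarily unavailable)
992928.572 ( 0.003 ms): 16524 close(fd: 13) = 0
EOF

# Keep only the failed calls (negative return values):
grep ' = -1 ' trace.log

# Count failures per error code:
grep -o 'E[A-Z]*' trace.log | sort | uniq -c | sort -rn
```

This quickly surfaces patterns such as repeated ENOENT lookups for missing files.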
992887.070 ( 0.496 ms): 16509 ... [continued]: newfstatat()) = -1 ENOENT (No such file or directory)
992892.003 ( 0.003 ms): 16513 futex(uaddr: 0xc000100148, op: WAIT|PRIVATE_FLAG) = -1 EAGAIN (Resource temporarily unavailable)
992892.009 ( 0.056 ms): 16513 nanosleep(rqtp: 0x7fde8acb4b70) = 0
992892.068 ( 0.006 ms): 16513 futex(uaddr: 0xc00006e948, op: WAKE|PRIVATE_FLAG, val: 1) = 1
992892.078 ( 0.006 ms): 16513 futex(uaddr: 0x55cbd282e388, op: WAKE|PRIVATE_FLAG, val: 1) = 1
992892.086 ( 0.025 ms): 16513 futex(uaddr: 0xc000100148, op: WAIT|PRIVATE_FLAG) = 0
992892.132 ( 0.004 ms): 16513 clock_gettime(which_clock: MONOTONIC, tp: 0x7fde8acb4cd0) = 0
992892.139 ( 0.005 ms): 16513 futex(uaddr: 0xc00006e948, op: WAKE|PRIVATE_FLAG, val: 1) = 1
992892.147 ( 0.005 ms): 16513 futex(uaddr: 0x55cbd282e388, op: WAKE|PRIVATE_FLAG, val: 1) = 1
992922.412 ( 0.178 ms): 16511 futex(uaddr: 0x55cbd282f9b0, op: WAIT|PRIVATE_FLAG, utime: 0x7fde8bcf6cc0) = -1 ETIMEDOUT (Connection timed out)
With the summary information included, you will also see:
   syscall            calls  errors    total      min      avg      max   stddev
                                      (msec)   (msec)   (msec)   (msec)      (%)
   --------------- -------- ------ -------- -------- -------- -------- --------
   futex                319     54 2756.595    0.000    8.641  789.740   42.78%
   read                 124      0  292.409    0.000    2.358  286.699   98.03%
   fsync                 10      0   15.245    1.395    1.525    1.914    3.34%
   clock_gettime       1078      0    4.836    0.002    0.004    0.037    1.62%
   newfstatat           477     58    2.640    0.002    0.006    0.037    2.78%
   openat                68      0    1.537    0.005    0.023    0.126   12.48%
   access                81     27    0.945    0.004    0.012    0.038    6.89%
   unlink                11      0    0.617    0.027    0.056    0.090   11.17%
   write                 76      0    0.479    0.002    0.006    0.065   14.92%
   lseek                165      0    0.401    0.001    0.002    0.008    2.95%
   madvise               48      0    0.315    0.004    0.007    0.026    8.48%
   getdents64            36      0    0.234    0.002    0.007    0.027   16.35%
   close                 68      0    0.222    0.002    0.003    0.009    6.09%
   ...
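The summary table is also easy to post-process with standard tools. A hedged sketch: assuming you have saved a few data rows of the summary into a file (summary.txt is a made-up name, and the rows follow the column order syscall, calls, errors, total, min, avg, max, stddev), you can rank and aggregate them:

```shell
# A few sample rows in the summary's column order:
cat > summary.txt << 'EOF'
futex            319     54 2756.595  0.000  8.641  789.740  42.78%
read             124      0  292.409  0.000  2.358  286.699  98.03%
close             68      0    0.222  0.002  0.003    0.009   6.09%
EOF

# Sort by total time (4th field), largest first, to find the heaviest calls:
sort -k4 -rn summary.txt

# Sum up all the errors (3rd field) with awk:
awk '{ errs += $3 } END { print errs }' summary.txt
```

Here, futex dominates the total time, which would point you toward lock or scheduling contention.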
Finally, if you use the -F option, you can check pagefaults:
0.361 ( 0.000 ms): 15925 minfault [_dl_cache_libcmp+0xe] => /etc/ld.so.cache@0x11869 (d.)
0.458 ( 0.000 ms): 15925 minfault [memset+0x3a] => /usr/lib64/libc.so.6@0x1f7888 (d.)
0.475 ( 0.000 ms): 15925 minfault [_dl_map_object_from_fd+0x9d6] => /usr/lib64/libc.so.6@0x1f5ba0 (d.)
0.484 ( 0.000 ms): 15925 minfault [_dl_map_object_from_fd+0xab4] => /usr/lib64/libc.so.6@0x1f5be8 (d.)
0.496 ( 0.000 ms): 15925 minfault [_dl_map_object_from_fd+0xca2] => /usr/lib64/libc.so.6@0x318 (d.)
0.506 ( 0.000 ms): 15925 minfault [strcmp+0x1a] => /usr/lib64/libc.so.6@0x1e393 (d.)
0.531 ( 0.000 ms): 15925 minfault [_dl_allocate_tls_storage+0x4a] => //anon@0x7f660f658740 (d.)
0.537 ( 0.000 ms): 15925 minfault [_dl_allocate_tls_storage+0x51] => //anon@0x7f660f6590f8 (d.)
0.550 ( 0.000 ms): 15925 minfault [_dl_relocate_object+0x63d] => /usr/lib64/libc.so.6@0x1f28f8 (d.)
Conclusion
Perf trace is a versatile, flexible, powerful option in the perf toolbox, allowing you to combine the power of strace with additional capabilities, which give you a multi-dimensional view of your system's behavior. For instance, strace output is by default a bit more difficult to follow and analyze, whereas perf trace makes it easier to bucket results per thread, filter out unnecessary information, and conversely, zero in on the specific data you need.
Hopefully, this tutorial is a good reference point for getting started with perf trace. You do need some background knowledge, you need to know how your software behaves, but then you can use it to figure out possible performance bottlenecks, and improve and optimize its execution. Perf may also need some tweaking to run well in userspace, and you have tons of flags and options to narrow down the troubleshooting to what you need. And that would be all for today.
Cheers.