Extracting system metrics from kernel trace
Francis Giraldeau
Important Linux kernel subsystems are statically instrumented with tracepoints, which makes it possible to gather detailed information about a running system, such as process scheduling, system calls and memory management. Each time a tracepoint is hit, an event is generated and can be recorded to disk for offline analysis. Kernel tracing provides system-wide instrumentation with a low performance impact on the running system. This makes it suitable for tracing live systems in order to debug hard-to-reproduce errors or analyze performance.
Despite these benefits, analyzing a kernel trace can be difficult because of the large number of events. Moreover, trace events expose low-level kernel behavior, and analyzing them requires a deep understanding of kernel internals. In many cases, the meaning of an event may depend on previous events. Fast and reliable analysis tools are therefore required to extract valuable information from a kernel trace.
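As an illustration of how an event's meaning can depend on earlier events, consider system calls. In this minimal sketch, a system call exit event carries only a thread ID and a timestamp, so the analyzer must remember the matching entry event of the same thread to know which call finished and how long it took. The `Event` record, its field names and the event names `syscall_entry`/`syscall_exit` are hypothetical simplifications, not the format of any particular tracer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Event:
    ts: int                        # timestamp in nanoseconds
    name: str                      # hypothetical names: "syscall_entry" / "syscall_exit"
    tid: int                       # thread that generated the event
    syscall: Optional[str] = None  # only set on entry events in this sketch


def syscall_latencies(events: List[Event]) -> List[Tuple[int, str, int]]:
    """Return (tid, syscall, duration_ns) for each completed system call.

    An exit event alone does not say which system call finished; that
    information comes from the earlier entry event of the same thread,
    so per-thread state must be kept while scanning the trace.
    """
    pending: Dict[int, Event] = {}  # tid -> entry event not yet matched
    result: List[Tuple[int, str, int]] = []
    for ev in events:  # events are assumed to be sorted by timestamp
        if ev.name == "syscall_entry":
            pending[ev.tid] = ev
        elif ev.name == "syscall_exit":
            entry = pending.pop(ev.tid, None)
            if entry is None:
                continue  # entry occurred before tracing started: cannot interpret
            result.append((ev.tid, entry.syscall, ev.ts - entry.ts))
    return result


if __name__ == "__main__":
    trace = [
        Event(100, "syscall_entry", 42, "read"),
        Event(150, "syscall_entry", 7, "write"),
        Event(210, "syscall_exit", 7),
        Event(400, "syscall_exit", 42),
        Event(500, "syscall_exit", 9),  # no matching entry in the trace
    ]
    print(syscall_latencies(trace))
```

The exit event of thread 9 is dropped because its entry was never observed, which is a common situation at the start of a trace.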
In this paper, we present an open source kernel trace analyzer that provides familiar and meaningful metrics to users, including CPU, disk, file and network usage on a per-process basis. It leverages kernel traces for performance optimization and debugging. The tool can display these metrics live and is easily extensible. Special emphasis was put on scalability, given the large number of events in a detailed system trace.
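To show how a familiar metric such as CPU usage can be derived from raw scheduling events, here is a minimal sketch that accumulates CPU time per thread from context-switch events. It assumes a simplified, hypothetical `SchedSwitch` record (timestamp, CPU, outgoing and incoming thread IDs) sorted by timestamp. It is not the analyzer described in this paper. Grouping threads into processes would additionally require a thread-to-process mapping, which this sketch leaves out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SchedSwitch:
    ts: int        # timestamp in nanoseconds
    cpu: int       # CPU on which the context switch occurred
    prev_tid: int  # thread leaving the CPU
    next_tid: int  # thread entering the CPU


def cpu_time_per_tid(events: List[SchedSwitch], end_ts: int) -> Dict[int, int]:
    """Return nanoseconds of CPU time per thread ID (idle task excluded)."""
    running: Dict[int, Tuple[int, int]] = {}  # cpu -> (tid, start timestamp)
    usage: Dict[int, int] = defaultdict(int)
    for ev in events:  # events are assumed to be sorted by timestamp
        current = running.get(ev.cpu)
        if current is not None:
            _, start = current
            # The outgoing thread ran on this CPU since the previous switch.
            usage[ev.prev_tid] += ev.ts - start
        # Before the first switch on a CPU, the start time is unknown: skip it.
        running[ev.cpu] = (ev.next_tid, ev.ts)
    # Close the intervals that are still open at the end of the trace.
    for tid, start in running.values():
        usage[tid] += end_ts - start
    usage.pop(0, None)  # TID 0 is the per-CPU idle task on Linux
    return dict(usage)


if __name__ == "__main__":
    trace = [
        SchedSwitch(0, 0, 0, 10),
        SchedSwitch(50, 1, 0, 10),
        SchedSwitch(100, 0, 10, 20),
        SchedSwitch(250, 0, 20, 0),
    ]
    print(cpu_time_per_tid(trace, end_ts=300))
```

The state is indexed by CPU rather than by thread because each CPU runs exactly one thread at a time. A single pass over the trace is therefore enough, which matters when traces contain millions of events.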