kw kernel debug tools

Introduction

Linux Kernel provides multiple debug methods that can be handy to identify problems and learn how some specific feature works. With this idea in mind, kw tries to hide and standardize the complexity of dealing with the following debug features:

  • Events: Event tracepoints provide real-time information associated with some specific parts of the code. Developers can enable this trace when the target OS is running to collect data. If you want to know more about it, take a look at: https://www.kernel.org/doc/html/latest/trace/events.html

  • Ftrace: Function callback traces can help identify some specific behavior when a target function is invoked. In particular, the function graph filter provides valuable information about the code sequence. If you want to learn more about this feature, take a look at: https://www.kernel.org/doc/html/v5.3/trace/ftrace-uses.html

  • Dmesg: In a few words, it shows the message buffer from the kernel.

In particular, when we dive into events and ftrace we can abstract some common steps required to use these debug tools, which can be summarized as follows:

  1. Clean/Disable debugger:

    echo 0 > /sys/kernel/debug/tracing/tracing_on
    
  2. Setup the target debug:

    echo 'TARGET' > /sys/kernel/debug/tracing/[current_tracer|events...]
    
  3. Fine-tune the debug options:

    echo 'FINE-TUNE' >> /sys/kernel/debug/tracing/[set_ftrace_filter|events/TARGET/filter]
    
  4. Enable the debug:

    echo 1 > /sys/kernel/debug/tracing/tracing_on
    
  5. Collect the data:

    cat /sys/kernel/debug/tracing/trace_pipe
    

As you can see, we have multiple steps, and this situation can be worse when we consider that sometimes we want to collect the debug data from a machine connected to a local network or an external machine. Kw debug tries to hide this complexity under the “debug” option. For detailed information about this feature, check the man page, here we will provide a simple tutorial on how to use it.

Common Options

The debug option provides a standard set of features that can be used with --dmesg, --events, and --ftrace. To avoid explaining the same ideas repeatedly, this section introduces the following shared behavior: --follow, --history, and --cmd.

Follow log (–follow | -f)

If you want to follow the log in real-time, you probably want to use the --follow | -f option, which will keep polling the data and provide you with a live feed of it.

Save (–history | -k)

This option will create a folder named kw_debug in the folder that you run this command and a sub-folder that follows this pattern:

<ID>_<YEAR-MONTH-DAY>

If you use this feature, kw will automatically save your debug log in the folder as described above. This is useful if you want to compare debug when changing some configuration.

Command (–cmd | -c)

This option expects a parameter which is a command to be executed in the target machine. The idea is something like this:

  1. Clean the target log

  2. Start logging

  3. Run the command

  4. Stop logging

Dmesg

Imagine a scenario where you have a bug when you try to load a module, or you have a custom kernel with pr_info() spread around and you want to see that information in the kernel log; you can use the following command:

kw debug --dmesg

However, a better solution could be:

kw debug --dmesg --follow

Or:

kw debug --dmesg --cmd "modprobe amdgpu" --history

Events

Let’s say that you have a complex scenario to debug where you want to follow a specific trace in real-time; in this case, you can use event trace. You can start by listing all available traces:

kw debug --list="events"

You will see something like this:

alarmtimer      bpf_trace       dev             exceptions      fs_dax          hda_intel       initcall        irq_matrix      libata          mmc             netlink         page_pool       random          rpm             skb             syscalls        udp             x86_fpu
amdgpu          cgroup          devfreq         ext4            ftrace          header_event    intel_iommu     irq_vectors     mce             module          nmi             percpu          ras             rseq            smbus           task            vmscan          xdp
amdgpu_dm       clk             devlink         fib             gpio            header_page     iomap           jbd2            mdio            msr             nvme            power           raw_syscalls    rtc             sock            tcp             vsyscall        xen
avc             cma             dma_fence       fib6            gpu_scheduler   huge_memory     iommu           kmem            migrate         napi            oom             printk          rcu             sched           spi             thermal         wbt             xhci-hcd
block           compaction      drm             filelock        hda             hwmon           io_uring        kvm             mmap            neigh           page_isolation  pwm             regmap          scsi            swiotlb         timer           workqueue
bpf_test_run    cpuhp           error_report    filemap         hda_controller  i2c             irq             kvmmmu          mmap_lock       net             pagemap         qdisc           regulator       signal          sync_trace      tlb             writeback

It is up to you to pick up one of the specific events to trace. For this tutorial, let’s say you want to debug something related to amdgpu driver; from the above list, you will notice that we have two options: amdgpu and amdgpu_dm. If you want to know more details about amdgpu_dm, you can use:

kw debug --list="events:amdgpu_dm"

You should see something like this:

Available events for the requested device:
     1  amdgpu_dc_performance
     2  amdgpu_dc_rreg
     3  amdgpu_dc_wreg
     4  amdgpu_dm_atomic_check_begin
     5  amdgpu_dm_atomic_check_finish
     6  amdgpu_dm_atomic_commit_tail_begin
     7  amdgpu_dm_atomic_commit_tail_finish
     8  amdgpu_dm_atomic_state_template
     9  amdgpu_dm_atomic_update_cursor
    10  amdgpu_dm_connector_atomic_check
    11  amdgpu_dm_crtc_atomic_check
    12  amdgpu_dm_dc_clocks_state
    13  amdgpu_dm_dce_clocks_state
    14  amdgpu_dm_dc_pipe_state
    15  amdgpu_dm_plane_atomic_check
    16  amdgpu_dmub_trace_high_irq
    17  amdgpu_refresh_rate_track
    18  dcn_fpu

Notice that you can put any specific event after events:. Finally, let’s say that you want to debug the cursor parameters; you can use:

kw debug --event "amdgpu_dm:amdgpu_dm_atomic_update_cursor" --follow

Note that the basic syntax to use event is something like “TARGET_EVENT:SPECIFIC_EVENT”. However, events also provide a filter mechanism which you can specify to kw by using [], see the below example:

kw debug --event "amdgpu_dm:amdgpu_dm_atomic_update_cursor[crtc_x < 1000]" --follow

For more details, check the man page.

Ftrace

Another powerful and valuable trace is the ftrace option. We have multiple algorithms to be used with this trace, which you can list by using:

kw debug --list="ftrace"

This will show something like this:

1. hwlat
2. blk
3. mmiotrace
4. function_graph
5. wakeup_dl
6. wakeup_rt
7. wakeup
8. function
9. nop

Those are the ftrace algorithms that you can set; function_graph is probably the most common one, and you can use it like this:

kw debug --ftrace="function_graph:amdgpu_dm*" --follow