LinuxLists.cc - help needed in debugging missed wakeup(?)

2020-03-20 17:04:01

Subject: help needed in debugging missed wakeup(?)

Hi sched people,

I'm currently debugging something which looks like a missed wakeup and
involves a rt_mutex (no RT kernel though, this is mainline i2c bus
locking). The issue seems to be a race condition, as we are seeing it
on around 5% population of a 300 devices test field all running the
same load. I was only able to reproduce the issue twice in pretty long
testing sessions on a single device and was able to gather some debug
info, but running out of ideas where to look next. Hopefully someone
around here is able to provide me some fruther clues.

The obvious thing which the kernel lockup detector is able to find is
the sogov (schedutil governor frequency switching task) being blocked:

INFO: task sugov:0:299 blocked for more than 30 seconds.
Not tainted 5.4.22 #3
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sugov:0 D 0 299 2 0x00000000
Backtrace:
[<80b306f8>] (__schedule) from [<80b30df8>] (schedule+0xa4/0xdc)
[<80b30d54>] (schedule) from [<80b34084>] (__rt_mutex_slowlock+0x104/0x17c)
[<80b33f80>] (__rt_mutex_slowlock) from [<80b34254>] (rt_mutex_slowlock+0x158/0x224)
[<80b340fc>] (rt_mutex_slowlock) from [<80b34390>] (rt_mutex_lock_nested+0x70/0x7c)
[<80b34320>] (rt_mutex_lock_nested) from [<807abf20>] (i2c_adapter_lock_bus+0x28/0x2c)
[<807abef8>] (i2c_adapter_lock_bus) from [<807ae924>] (i2c_transfer+0x124/0x13c)
[<807ae800>] (i2c_transfer) from [<807ae994>] (i2c_transfer_buffer_flags+0x58/0x80)
[<807ae93c>] (i2c_transfer_buffer_flags) from [<80681a0c>] (regmap_i2c_write+0x24/0x40)
[<806819e8>] (regmap_i2c_write) from [<8067cccc>] (_regmap_raw_write_impl+0x5d8/0x714)
[<8067c6f4>] (_regmap_raw_write_impl) from [<8067ce98>] (_regmap_bus_raw_write+0x90/0x98)
[<8067ce08>] (_regmap_bus_raw_write) from [<8067c4a0>] (_regmap_write+0x184/0x1cc)
[<8067c31c>] (_regmap_write) from [<8067c5d4>] (_regmap_update_bits+0xec/0xfc)
[<8067c4e8>] (_regmap_update_bits) from [<8067d884>] (regmap_update_bits_base+0x60/0x84)
[<8067d824>] (regmap_update_bits_base) from [<80575468>] (regulator_set_voltage_sel_regmap+0x54/0x90)
[<80575414>] (regulator_set_voltage_sel_regmap) from [<80570130>] (_regulator_call_set_voltage_sel+0x78/0xb4)
[<805700b8>] (_regulator_call_set_voltage_sel) from [<80570784>] (_regulator_do_set_voltage+0x460/0x658)
[<80570324>] (_regulator_do_set_voltage) from [<80573fa4>] (regulator_set_voltage_rdev+0x12c/0x1fc)
[<80573e78>] (regulator_set_voltage_rdev) from [<80571dec>] (regulator_balance_voltage+0xf8/0x498)
[<80571cf4>] (regulator_balance_voltage) from [<80573e64>] (regulator_set_voltage_unlocked+0xf4/0x108)
[<80573d70>] (regulator_set_voltage_unlocked) from [<80573fc8>] (regulator_set_voltage_rdev+0x150/0x1fc)
[<80573e78>] (regulator_set_voltage_rdev) from [<80571dec>] (regulator_balance_voltage+0xf8/0x498)
[<80571cf4>] (regulator_balance_voltage) from [<80573e64>] (regulator_set_voltage_unlocked+0xf4/0x108)
[<80573d70>] (regulator_set_voltage_unlocked) from [<805740c4>] (regulator_set_voltage+0x50/0x84)
[<80574074>] (regulator_set_voltage) from [<807fe00c>] (regulator_set_voltage_tol.constprop.0+0x1c/0x38)
[<807fdff0>] (regulator_set_voltage_tol.constprop.0) from [<807fe37c>] (imx6q_set_target+0x354/0x404)
[<807fe028>] (imx6q_set_target) from [<807f96d8>] (__cpufreq_driver_target+0x254/0x300)
[<807f9484>] (__cpufreq_driver_target) from [<8017ec98>] (sugov_work+0x5c/0x68)
[<8017ec3c>] (sugov_work) from [<80154df4>] (kthread_worker_fn+0x178/0x214)
[<80154c7c>] (kthread_worker_fn) from [<801553d0>] (kthread+0x120/0x130)
[<801552b0>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)

The sugov thread blocks waiting for the i2c_bus rt_mutex, as it needs
to adjust the voltage via a i2c attached PMIC.
The lock holder of the i2c_bus rt_mutex at this time is a user task
reading input data from a temperature sendor on the same i2c bus. This
task does not make any progress, but it also doesn't show up in the
lockup detector output, as it's in RUNNING state. A sysrq-t at some
time after the lockup shows this:

QSGRenderThread R running task 0 559 547 0x00000000
Backtrace:
[<80b306f8>] (__schedule) from [<80b30df8>] (schedule+0xa4/0xdc)
[<80b30d54>] (schedule) from [<80b347f8>] (schedule_timeout+0xd0/0x108)
[<80b34728>] (schedule_timeout) from [<807b43dc>] (i2c_imx_trx_complete+0x90/0x13c)
[<807b434c>] (i2c_imx_trx_complete) from [<807b5ab0>] (i2c_imx_xfer+0x938/0xac4)
[<807b5178>] (i2c_imx_xfer) from [<807ae5f0>] (__i2c_transfer+0x5a8/0x7b8)
[<807ae048>] (__i2c_transfer) from [<807ae8ec>] (i2c_transfer+0xec/0x13c)
[<807ae800>] (i2c_transfer) from [<806819b4>] (regmap_i2c_read+0x64/0x98)
[<8067cf80>] (_regmap_raw_read) from [<8067d338>] (_regmap_bus_read+0x4c/0x6c)
[<8067d2ec>] (_regmap_bus_read) from [<8067be70>] (_regmap_read+0x94/0x1e8)
[<8067bddc>] (_regmap_read) from [<8067c014>] (regmap_read+0x50/0x68)
[<8067bfc4>] (regmap_read) from [<807e67bc>] (lm75_read+0x90/0xe0)
[<807e672c>] (lm75_read) from [<807e5908>] (hwmon_attr_show+0x48/0x1cc)
[<807e58c0>] (hwmon_attr_show) from [<80658570>] (dev_attr_show+0x2c/0x50)
[<80658544>] (dev_attr_show) from [<80348414>] (sysfs_kf_seq_show+0x98/0xec)
[<8034837c>] (sysfs_kf_seq_show) from [<80346a2c>] (kernfs_seq_show+0x34/0x38)
[<803469f8>] (kernfs_seq_show) from [<802f9e48>] (seq_read+0x21c/0x454)
[<802f9c2c>] (seq_read) from [<80346e50>] (kernfs_fop_read+0x40/0x1d0)
[<80346e10>] (kernfs_fop_read) from [<802d1f58>] (__vfs_read+0x48/0xf0)
[<802d1f10>] (__vfs_read) from [<802d209c>] (vfs_read+0x9c/0xb8)
[<802d2000>] (vfs_read) from [<802d22c4>] (ksys_read+0x78/0xc4)
[<802d224c>] (ksys_read) from [<802d2328>] (sys_read+0x18/0x1c)
[<802d2310>] (sys_read) from [<80101000>] (ret_fast_syscall+0x0/0x28)

The task isn't making any progress, even though the
wait_event_timeout() in i2c_imx_trx_complete() has a 100ms timeout to
guard against bus lockups, the task is fully stuck there, so it's
unlikely to be a peripheral HW issue. As the task is in RUNNING state I
believe that the event that is waited for has happened, as expected.

The sched state at this time is "interesting" and I'm having a hard
time making any sense out of this. The user task apparently inherited
the DEADLINE priority from the sugov kthread. But while the user task
state shows up as RUNNING the task doesn't show up in the dl_rq
nr_running. Also while cpu#0 shows nr_running as 1, its curr->pid hints
at it executing the idle task, which is at least consitent with the
thread still being stuck in __schedule.

Shortened sched debug below. The machine has 4 cores, I only pasted the
output of the CPUs containing the relevant user task and the sugov
kthread.

Sched Debug Version: v0.11, 5.4.22 #3
ktime : 8167595.022639
sched_clk : 8167600.901887
cpu_clk : 8167600.902554
jiffies : 786762
sysctl_sched
.sysctl_sched_latency : 18.000000
.sysctl_sched_min_granularity : 2.250000
.sysctl_sched_wakeup_granularity : 3.000000
.sysctl_sched_child_runs_first : 0
.sysctl_sched_features : 2059067
.sysctl_sched_tunable_scaling : 1 (logarithmic)
cpu#0
.nr_running : 1
.nr_switches : 2616061
.nr_load_updates : 0
.nr_uninterruptible : 3
.next_balance : 0.786770
.curr->pid : 0
.clock : 8167706.573554
.clock_task : 8167709.714554
.avg_idle : 1000000
.max_idle_balance_cost : 500000
cfs_rq[0]:/
.exec_clock : 0.000000
.MIN_vruntime : 0.000001
.min_vruntime : 170655.365388
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : 0.000000
.nr_spread_over : 0
.nr_running : 0
.load : 0
.runnable_weight : 0
.load_avg : 137
.runnable_load_avg : 0
.util_avg : 111
.util_est_enqueued : 0
.removed.load_avg : 0
.removed.util_avg : 0
.removed.runnable_sum : 0
.tg_load_avg_contrib : 0
.tg_load_avg : 0
rt_rq[0]:
.rt_nr_running : 0
.rt_nr_migratory : 0
.rt_throttled : 0
.rt_time : 15.497666
.rt_runtime : 950.000000
dl_rq[0]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
runnable tasks:
S task PID tree-key switches prio wait-time sum-exec sum-sleep
-----------------------------------------------------------------------------------------------------------
I rcu_gp 3 7.244826 3 100 0.000000 0.202666 0.000000 /
I rcu_par_gp 4 8.838316 3 100 0.000000 0.172000 0.000000 /
I kworker/0:0H 6 6281.741275 12 100 0.000000 1.777333 0.000000 /
I mm_percpu_wq 8 14.640920 3 100 0.000000 0.143000 0.000000 /
S ksoftirqd/0 9 170272.531056 11048 120 0.000000 8117.785661 0.000000 /
I rcu_preempt 10 170647.484054 85503 120 0.000000 10540.749756 0.000000 /
S migration/0 11 0.000000 2052 0 0.000000 169.551366 0.000000 /
I kworker/0:1 12 170492.300722 34025 120 0.000000 5087.014467 0.000000 /
S cpuhp/0 13 3130.355881 9 120 0.000000 0.537667 0.000000 /
S khungtaskd 34 170186.022389 1187 120 0.000000 7649.525998 0.000000 /
I cfg80211 110 2586.210829 4 100 0.000000 0.221000 0.000000 /
S jfsCommit 139 2682.261200 2 120 0.000000 0.127334 0.000000 /
I kworker/0:2 143 13553.725798 3017 120 0.000000 500.353366 0.000000 /
I usbip_event 163 3472.106794 2 100 0.000000 0.216334 0.000000 /
I sdhci 168 3481.262131 2 100 0.000000 0.213334 0.000000 /
S irq/60-mmc0 169 0.000000 7 49 0.000000 1.281667 0.000000 /
S irq/61-mmc1 173 0.000000 7 49 0.000000 0.842335 0.000000 /
S irq/62-mmc2 177 0.000000 4 49 0.000000 0.398334 0.000000 /
I mmc_complete 178 3491.982425 2 100 0.000000 0.192668 0.000000 /
I mmc_complete 182 3507.515908 2 100 0.000000 0.209334 0.000000 /
S irq/97-mma8451 296 0.000000 2 49 0.000000 0.209334 0.000000 /
S 134000.gpu 320 0.000000 2 98 0.000000 0.207334 0.000000 /
S 2204000.gpu 321 0.000000 2 98 0.000000 0.192000 0.000000 /
Sirq/330-!soc!ai 334 0.000000 2 49 0.000000 0.175333 0.000000 /
Sirq/331-!soc!ai 335 0.000000 2 49 0.000000 0.142667 0.000000 /
Sirq/18-imx_ther 338 0.000000 3 49 0.000000 0.300000 0.000000 /
I kworker/u8:4 339 170173.058056 47403 120 0.000000 14864.604657 0.000000 /
I kworker/u9:2 349 170272.470384 49098 100 0.000000 22626.085716 0.000000 /
Ssystemd-journal 371 170655.365388 5176 120 0.000000 8538.471989 0.000000 /
S irq/66-vdoa 448 0.075666 56853 49 0.000000 7462.914997 0.000000 /
Sirq/30-2040000. 449 0.000000 22 49 0.000000 3.228000 0.000000 /
Sirq/31-coda jpe 450 0.081000 3 49 0.000000 0.362001 0.000000 /
I coda 451 6026.818189 2 100 0.000000 0.221000 0.000000 /
I kworker/0:2H 464 13131.123871 178 100 0.000000 74.261336 0.000000 /
Salsa-sink-20280 506 0.000000 328034 94 0.000000 137857.664438 0.000000 /
S ptp4l 512 170588.727054 14587 120 0.000000 12961.910636 0.000000 /
S lldpd 518 161330.930102 413 120 0.000000 235.196326 0.000000 /
S lldpd 523 167124.946732 522 120 0.000000 164.690670 0.000000 /
S QQmlThread 555 13035.420131 633 120 0.000000 693.330648 0.000000 /
Sgldisplay-event 558 13572.855460 1 120 0.000000 0.680333 0.000000 /
RQSGRenderThread 559 0.000000 21984 -1 0.000000 9455.140648 0.000000 /
Sgldisplay-event 560 13625.136454 1 120 0.000000 0.607000 0.000000 /
Sgldisplay-event 561 13634.828782 1 120 0.000000 0.692333 0.000000 /
S typefind:sink 572 14346.726433 205 120 0.000000 84.886339 0.000000 /
Smatroskademux0: 575 27839.889727 8421 120 0.000000 3155.833732 0.000000 /
Smultiqueue1:src 580 27939.450393 7412 120 0.000000 3322.682665 0.000000 /
Smultiqueue2:src 582 64702.132151 36004 120 0.000000 14180.398062 0.000000 /
S threaded-ml 587 170653.935720 806958 120 0.000000 191834.284940 0.000000 /
cpu#1
.nr_running : 0
.nr_switches : 1951962
.nr_load_updates : 0
.nr_uninterruptible : -17
.next_balance : 0.786848
.curr->pid : 0
.clock : 8168488.074888
.clock_task : 8168488.074888
.avg_idle : 1000000
.max_idle_balance_cost : 500000
cfs_rq[1]:/
.exec_clock : 0.000000
.MIN_vruntime : 0.000001
.min_vruntime : 79675.927756
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : -91036.566966
.nr_spread_over : 0
.nr_running : 0
.load : 11916
.runnable_weight : 0
.load_avg : 1200
.runnable_load_avg : 0
.util_avg : 51
.util_est_enqueued : 0
.removed.load_avg : 0
.removed.util_avg : 0
.removed.runnable_sum : 0
.tg_load_avg_contrib : 0
.tg_load_avg : 0
rt_rq[1]:
.rt_nr_running : 0
.rt_nr_migratory : 0
.rt_throttled : 0
.rt_time : 36.943999
.rt_runtime : 950.000000
dl_rq[1]:
.dl_nr_running : 0
.dl_nr_migratory : 0
.dl_bw->bw : 996147
.dl_bw->total_bw : 0
runnable tasks:
S task PID tree-key switches prio wait-time sum-exec sum-sleep
-----------------------------------------------------------------------------------------------------------
S systemd 1 78427.950089 7820 120 0.000000 16317.471696 0.000000 /
S cpuhp/1 14 144.194268 8 120 0.000000 1.279665 0.000000 /
S migration/1 15 0.000000 2095 0 0.000000 173.322016 0.000000 /
S ksoftirqd/1 16 78430.110423 827 120 0.000000 220.178683 0.000000 /
I netns 30 5.169486 2 100 0.000000 0.273000 0.000000 /
I writeback 36 14.311131 2 100 0.000000 0.153000 0.000000 /
I kblockd 104 6063.650697 5 100 0.000000 1.485334 0.000000 /
I xprtiod 109 32.881708 2 100 0.000000 0.170001 0.000000 /
S kswapd0 133 144.235935 3 120 0.000000 0.300668 0.000000 /
I nfsiod 134 153.138601 2 100 0.000000 0.157667 0.000000 /
S irq/308-aerdrv 146 0.000000 2 49 0.000000 0.144999 0.000000 /
S scsi_eh_0 147 6187.181146 42 120 0.000000 46.664002 0.000000 /
I rmi4-poller 164 242.126243 2 100 0.000000 0.177001 0.000000 /
I sdhci 172 251.241547 2 100 0.000000 0.170333 0.000000 /
S hwrng 295 1981.228871 19 120 0.000000 6.145667 0.000000 /
D sugov:0 299 0.000000 20836 -1 0.000000 6428.315062 0.000000 /
I kworker/u8:9 355 78399.595765 44983 120 0.000000 13516.948580 0.000000 /
I kworker/1:2H 463 12217.874057 118 100 0.000000 32.701656 0.000000 /
I kworker/1:3H 466 6630.229350 3 100 0.000000 0.192334 0.000000 /
Salsa-sink-202c0 505 0.000000 467719 94 0.000000 501687.028908 0.000000 /
S bluetoothd 483 12196.448363 219 120 0.000000 207.692334 0.000000 /
S NetworkManager 514 78428.249422 1426 120 0.000000 958.963685 0.000000 /
S gmain 516 79661.895205 2911 120 0.000000 1462.902691 0.000000 /
S telnetd 546 13136.934667 201 120 0.000000 128.003662 0.000000 /
SQEvdevTouchScre 553 10397.580038 13 120 0.000000 5.707002 0.000000 /
S typefind:sink 573 13074.246005 159 120 0.000000 99.711000 0.000000 /
Smultiqueue2:src 585 56018.750354 55803 120 0.000000 32116.048930 0.000000 /
S threaded-ml 587 79674.485504 806994 120 0.000000 191843.905943 0.000000 /
S aqueue:src 590 61290.642959 103825 120 0.000000 38918.500285 0.000000 /
S aqueue:src 593 25819.269867 22273 120 0.000000 6675.391368 0.000000 /
I kworker/u9:0 601 61161.845023 23285 100 0.000000 11131.996351 0.000000 /
D kworker/1:2 617 75716.639814 116 120 0.000000 13.313333 0.000000 /
I kworker/1:3 618 79667.860300 9536 120 0.000000 1262.709303 0.000000 /
I kworker/1:1 619 77765.434809 2 120 0.000000 0.189000 0.000000 /

Any ideas on what it happening here or how I could gather better debug
data?

Regards,
Lucas

2020-03-20 18:09:20

by Steven Rostedt

[permalink] [raw]

Subject: Re: help needed in debugging missed wakeup(?)

On Fri, 20 Mar 2020 18:02:32 +0100
Lucas Stach <[email protected]> wrote:

> Hi sched people,
>
> I'm currently debugging something which looks like a missed wakeup and
> involves a rt_mutex (no RT kernel though, this is mainline i2c bus
> locking). The issue seems to be a race condition, as we are seeing it
> on around 5% population of a 300 devices test field all running the
> same load. I was only able to reproduce the issue twice in pretty long
> testing sessions on a single device and was able to gather some debug
> info, but running out of ideas where to look next. Hopefully someone
> around here is able to provide me some fruther clues.
>
> The obvious thing which the kernel lockup detector is able to find is
> the sogov (schedutil governor frequency switching task) being blocked:
>
> INFO: task sugov:0:299 blocked for more than 30 seconds.
> Not tainted 5.4.22 #3
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> sugov:0 D 0 299 2 0x00000000
> Backtrace:
> [<80b306f8>] (__schedule) from [<80b30df8>] (schedule+0xa4/0xdc)
> [<80b30d54>] (schedule) from [<80b34084>] (__rt_mutex_slowlock+0x104/0x17c)
> [<80b33f80>] (__rt_mutex_slowlock) from [<80b34254>] (rt_mutex_slowlock+0x158/0x224)
> [<80b340fc>] (rt_mutex_slowlock) from [<80b34390>] (rt_mutex_lock_nested+0x70/0x7c)
> [<80b34320>] (rt_mutex_lock_nested) from [<807abf20>] (i2c_adapter_lock_bus+0x28/0x2c)
> [<807abef8>] (i2c_adapter_lock_bus) from [<807ae924>] (i2c_transfer+0x124/0x13c)
> [<807ae800>] (i2c_transfer) from [<807ae994>] (i2c_transfer_buffer_flags+0x58/0x80)
> [<807ae93c>] (i2c_transfer_buffer_flags) from [<80681a0c>] (regmap_i2c_write+0x24/0x40)
> [<806819e8>] (regmap_i2c_write) from [<8067cccc>] (_regmap_raw_write_impl+0x5d8/0x714)
> [<8067c6f4>] (_regmap_raw_write_impl) from [<8067ce98>] (_regmap_bus_raw_write+0x90/0x98)
> [<8067ce08>] (_regmap_bus_raw_write) from [<8067c4a0>] (_regmap_write+0x184/0x1cc)
> [<8067c31c>] (_regmap_write) from [<8067c5d4>] (_regmap_update_bits+0xec/0xfc)
> [<8067c4e8>] (_regmap_update_bits) from [<8067d884>] (regmap_update_bits_base+0x60/0x84)
> [<8067d824>] (regmap_update_bits_base) from [<80575468>] (regulator_set_voltage_sel_regmap+0x54/0x90)
> [<80575414>] (regulator_set_voltage_sel_regmap) from [<80570130>] (_regulator_call_set_voltage_sel+0x78/0xb4)
> [<805700b8>] (_regulator_call_set_voltage_sel) from [<80570784>] (_regulator_do_set_voltage+0x460/0x658)
> [<80570324>] (_regulator_do_set_voltage) from [<80573fa4>] (regulator_set_voltage_rdev+0x12c/0x1fc)
> [<80573e78>] (regulator_set_voltage_rdev) from [<80571dec>] (regulator_balance_voltage+0xf8/0x498)
> [<80571cf4>] (regulator_balance_voltage) from [<80573e64>] (regulator_set_voltage_unlocked+0xf4/0x108)
> [<80573d70>] (regulator_set_voltage_unlocked) from [<80573fc8>] (regulator_set_voltage_rdev+0x150/0x1fc)
> [<80573e78>] (regulator_set_voltage_rdev) from [<80571dec>] (regulator_balance_voltage+0xf8/0x498)
> [<80571cf4>] (regulator_balance_voltage) from [<80573e64>] (regulator_set_voltage_unlocked+0xf4/0x108)
> [<80573d70>] (regulator_set_voltage_unlocked) from [<805740c4>] (regulator_set_voltage+0x50/0x84)
> [<80574074>] (regulator_set_voltage) from [<807fe00c>] (regulator_set_voltage_tol.constprop.0+0x1c/0x38)
> [<807fdff0>] (regulator_set_voltage_tol.constprop.0) from [<807fe37c>] (imx6q_set_target+0x354/0x404)
> [<807fe028>] (imx6q_set_target) from [<807f96d8>] (__cpufreq_driver_target+0x254/0x300)
> [<807f9484>] (__cpufreq_driver_target) from [<8017ec98>] (sugov_work+0x5c/0x68)
> [<8017ec3c>] (sugov_work) from [<80154df4>] (kthread_worker_fn+0x178/0x214)
> [<80154c7c>] (kthread_worker_fn) from [<801553d0>] (kthread+0x120/0x130)
> [<801552b0>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
>
> The sugov thread blocks waiting for the i2c_bus rt_mutex, as it needs
> to adjust the voltage via a i2c attached PMIC.
> The lock holder of the i2c_bus rt_mutex at this time is a user task
> reading input data from a temperature sendor on the same i2c bus. This
> task does not make any progress, but it also doesn't show up in the
> lockup detector output, as it's in RUNNING state. A sysrq-t at some
> time after the lockup shows this:
>
> QSGRenderThread R running task 0 559 547 0x00000000
> Backtrace:
> [<80b306f8>] (__schedule) from [<80b30df8>] (schedule+0xa4/0xdc)
> [<80b30d54>] (schedule) from [<80b347f8>] (schedule_timeout+0xd0/0x108)
> [<80b34728>] (schedule_timeout) from [<807b43dc>] (i2c_imx_trx_complete+0x90/0x13c)
> [<807b434c>] (i2c_imx_trx_complete) from [<807b5ab0>] (i2c_imx_xfer+0x938/0xac4)
> [<807b5178>] (i2c_imx_xfer) from [<807ae5f0>] (__i2c_transfer+0x5a8/0x7b8)
> [<807ae048>] (__i2c_transfer) from [<807ae8ec>] (i2c_transfer+0xec/0x13c)
> [<807ae800>] (i2c_transfer) from [<806819b4>] (regmap_i2c_read+0x64/0x98)
> [<8067cf80>] (_regmap_raw_read) from [<8067d338>] (_regmap_bus_read+0x4c/0x6c)
> [<8067d2ec>] (_regmap_bus_read) from [<8067be70>] (_regmap_read+0x94/0x1e8)
> [<8067bddc>] (_regmap_read) from [<8067c014>] (regmap_read+0x50/0x68)
> [<8067bfc4>] (regmap_read) from [<807e67bc>] (lm75_read+0x90/0xe0)
> [<807e672c>] (lm75_read) from [<807e5908>] (hwmon_attr_show+0x48/0x1cc)
> [<807e58c0>] (hwmon_attr_show) from [<80658570>] (dev_attr_show+0x2c/0x50)
> [<80658544>] (dev_attr_show) from [<80348414>] (sysfs_kf_seq_show+0x98/0xec)
> [<8034837c>] (sysfs_kf_seq_show) from [<80346a2c>] (kernfs_seq_show+0x34/0x38)
> [<803469f8>] (kernfs_seq_show) from [<802f9e48>] (seq_read+0x21c/0x454)
> [<802f9c2c>] (seq_read) from [<80346e50>] (kernfs_fop_read+0x40/0x1d0)
> [<80346e10>] (kernfs_fop_read) from [<802d1f58>] (__vfs_read+0x48/0xf0)
> [<802d1f10>] (__vfs_read) from [<802d209c>] (vfs_read+0x9c/0xb8)
> [<802d2000>] (vfs_read) from [<802d22c4>] (ksys_read+0x78/0xc4)
> [<802d224c>] (ksys_read) from [<802d2328>] (sys_read+0x18/0x1c)
> [<802d2310>] (sys_read) from [<80101000>] (ret_fast_syscall+0x0/0x28)
>
> The task isn't making any progress, even though the
> wait_event_timeout() in i2c_imx_trx_complete() has a 100ms timeout to
> guard against bus lockups, the task is fully stuck there, so it's
> unlikely to be a peripheral HW issue. As the task is in RUNNING state I
> believe that the event that is waited for has happened, as expected.

I'm not so sure about that, especially if it is holding the lock.

>
> The sched state at this time is "interesting" and I'm having a hard
> time making any sense out of this. The user task apparently inherited
> the DEADLINE priority from the sugov kthread. But while the user task
> state shows up as RUNNING the task doesn't show up in the dl_rq
> nr_running. Also while cpu#0 shows nr_running as 1, its curr->pid hints
> at it executing the idle task, which is at least consitent with the
> thread still being stuck in __schedule.
>
> Shortened sched debug below. The machine has 4 cores, I only pasted the
> output of the CPUs containing the relevant user task and the sugov
> kthread.
>

You have it running under SCHED_DEADLINE? That will cause it to run for
only the run time per period, and can cause even more delays.

Perhaps put in some trace_printk()s and see what exactly is happening.

-- Steve

2020-03-20 19:05:56

by Lucas Stach

[permalink] [raw]

Subject: Re: help needed in debugging missed wakeup(?)

Am Freitag, den 20.03.2020, 14:06 -0400 schrieb Steven Rostedt:
> On Fri, 20 Mar 2020 18:02:32 +0100
> Lucas Stach <[email protected]> wrote:
>
> > Hi sched people,
> >
> > I'm currently debugging something which looks like a missed wakeup and
> > involves a rt_mutex (no RT kernel though, this is mainline i2c bus
> > locking). The issue seems to be a race condition, as we are seeing it
> > on around 5% population of a 300 devices test field all running the
> > same load. I was only able to reproduce the issue twice in pretty long
> > testing sessions on a single device and was able to gather some debug
> > info, but running out of ideas where to look next. Hopefully someone
> > around here is able to provide me some fruther clues.
> >
> > The obvious thing which the kernel lockup detector is able to find is
> > the sogov (schedutil governor frequency switching task) being blocked:
> >
> > INFO: task sugov:0:299 blocked for more than 30 seconds.
> > Not tainted 5.4.22 #3
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > sugov:0 D 0 299 2 0x00000000
> > Backtrace:
> > [<80b306f8>] (__schedule) from [<80b30df8>] (schedule+0xa4/0xdc)
> > [<80b30d54>] (schedule) from [<80b34084>] (__rt_mutex_slowlock+0x104/0x17c)
> > [<80b33f80>] (__rt_mutex_slowlock) from [<80b34254>] (rt_mutex_slowlock+0x158/0x224)
> > [<80b340fc>] (rt_mutex_slowlock) from [<80b34390>] (rt_mutex_lock_nested+0x70/0x7c)
> > [<80b34320>] (rt_mutex_lock_nested) from [<807abf20>] (i2c_adapter_lock_bus+0x28/0x2c)
> > [<807abef8>] (i2c_adapter_lock_bus) from [<807ae924>] (i2c_transfer+0x124/0x13c)
> > [<807ae800>] (i2c_transfer) from [<807ae994>] (i2c_transfer_buffer_flags+0x58/0x80)
> > [<807ae93c>] (i2c_transfer_buffer_flags) from [<80681a0c>] (regmap_i2c_write+0x24/0x40)
> > [<806819e8>] (regmap_i2c_write) from [<8067cccc>] (_regmap_raw_write_impl+0x5d8/0x714)
> > [<8067c6f4>] (_regmap_raw_write_impl) from [<8067ce98>] (_regmap_bus_raw_write+0x90/0x98)
> > [<8067ce08>] (_regmap_bus_raw_write) from [<8067c4a0>] (_regmap_write+0x184/0x1cc)
> > [<8067c31c>] (_regmap_write) from [<8067c5d4>] (_regmap_update_bits+0xec/0xfc)
> > [<8067c4e8>] (_regmap_update_bits) from [<8067d884>] (regmap_update_bits_base+0x60/0x84)
> > [<8067d824>] (regmap_update_bits_base) from [<80575468>] (regulator_set_voltage_sel_regmap+0x54/0x90)
> > [<80575414>] (regulator_set_voltage_sel_regmap) from [<80570130>] (_regulator_call_set_voltage_sel+0x78/0xb4)
> > [<805700b8>] (_regulator_call_set_voltage_sel) from [<80570784>] (_regulator_do_set_voltage+0x460/0x658)
> > [<80570324>] (_regulator_do_set_voltage) from [<80573fa4>] (regulator_set_voltage_rdev+0x12c/0x1fc)
> > [<80573e78>] (regulator_set_voltage_rdev) from [<80571dec>] (regulator_balance_voltage+0xf8/0x498)
> > [<80571cf4>] (regulator_balance_voltage) from [<80573e64>] (regulator_set_voltage_unlocked+0xf4/0x108)
> > [<80573d70>] (regulator_set_voltage_unlocked) from [<80573fc8>] (regulator_set_voltage_rdev+0x150/0x1fc)
> > [<80573e78>] (regulator_set_voltage_rdev) from [<80571dec>] (regulator_balance_voltage+0xf8/0x498)
> > [<80571cf4>] (regulator_balance_voltage) from [<80573e64>] (regulator_set_voltage_unlocked+0xf4/0x108)
> > [<80573d70>] (regulator_set_voltage_unlocked) from [<805740c4>] (regulator_set_voltage+0x50/0x84)
> > [<80574074>] (regulator_set_voltage) from [<807fe00c>] (regulator_set_voltage_tol.constprop.0+0x1c/0x38)
> > [<807fdff0>] (regulator_set_voltage_tol.constprop.0) from [<807fe37c>] (imx6q_set_target+0x354/0x404)
> > [<807fe028>] (imx6q_set_target) from [<807f96d8>] (__cpufreq_driver_target+0x254/0x300)
> > [<807f9484>] (__cpufreq_driver_target) from [<8017ec98>] (sugov_work+0x5c/0x68)
> > [<8017ec3c>] (sugov_work) from [<80154df4>] (kthread_worker_fn+0x178/0x214)
> > [<80154c7c>] (kthread_worker_fn) from [<801553d0>] (kthread+0x120/0x130)
> > [<801552b0>] (kthread) from [<801010b4>] (ret_from_fork+0x14/0x20)
> >
> > The sugov thread blocks waiting for the i2c_bus rt_mutex, as it needs
> > to adjust the voltage via a i2c attached PMIC.
> > The lock holder of the i2c_bus rt_mutex at this time is a user task
> > reading input data from a temperature sendor on the same i2c bus. This
> > task does not make any progress, but it also doesn't show up in the
> > lockup detector output, as it's in RUNNING state. A sysrq-t at some
> > time after the lockup shows this:
> >
> > QSGRenderThread R running task 0 559 547 0x00000000
> > Backtrace:
> > [<80b306f8>] (__schedule) from [<80b30df8>] (schedule+0xa4/0xdc)
> > [<80b30d54>] (schedule) from [<80b347f8>] (schedule_timeout+0xd0/0x108)
> > [<80b34728>] (schedule_timeout) from [<807b43dc>] (i2c_imx_trx_complete+0x90/0x13c)
> > [<807b434c>] (i2c_imx_trx_complete) from [<807b5ab0>] (i2c_imx_xfer+0x938/0xac4)
> > [<807b5178>] (i2c_imx_xfer) from [<807ae5f0>] (__i2c_transfer+0x5a8/0x7b8)
> > [<807ae048>] (__i2c_transfer) from [<807ae8ec>] (i2c_transfer+0xec/0x13c)
> > [<807ae800>] (i2c_transfer) from [<806819b4>] (regmap_i2c_read+0x64/0x98)
> > [<8067cf80>] (_regmap_raw_read) from [<8067d338>] (_regmap_bus_read+0x4c/0x6c)
> > [<8067d2ec>] (_regmap_bus_read) from [<8067be70>] (_regmap_read+0x94/0x1e8)
> > [<8067bddc>] (_regmap_read) from [<8067c014>] (regmap_read+0x50/0x68)
> > [<8067bfc4>] (regmap_read) from [<807e67bc>] (lm75_read+0x90/0xe0)
> > [<807e672c>] (lm75_read) from [<807e5908>] (hwmon_attr_show+0x48/0x1cc)
> > [<807e58c0>] (hwmon_attr_show) from [<80658570>] (dev_attr_show+0x2c/0x50)
> > [<80658544>] (dev_attr_show) from [<80348414>] (sysfs_kf_seq_show+0x98/0xec)
> > [<8034837c>] (sysfs_kf_seq_show) from [<80346a2c>] (kernfs_seq_show+0x34/0x38)
> > [<803469f8>] (kernfs_seq_show) from [<802f9e48>] (seq_read+0x21c/0x454)
> > [<802f9c2c>] (seq_read) from [<80346e50>] (kernfs_fop_read+0x40/0x1d0)
> > [<80346e10>] (kernfs_fop_read) from [<802d1f58>] (__vfs_read+0x48/0xf0)
> > [<802d1f10>] (__vfs_read) from [<802d209c>] (vfs_read+0x9c/0xb8)
> > [<802d2000>] (vfs_read) from [<802d22c4>] (ksys_read+0x78/0xc4)
> > [<802d224c>] (ksys_read) from [<802d2328>] (sys_read+0x18/0x1c)
> > [<802d2310>] (sys_read) from [<80101000>] (ret_fast_syscall+0x0/0x28)
> >
> > The task isn't making any progress, even though the
> > wait_event_timeout() in i2c_imx_trx_complete() has a 100ms timeout to
> > guard against bus lockups, the task is fully stuck there, so it's
> > unlikely to be a peripheral HW issue. As the task is in RUNNING state I
> > believe that the event that is waited for has happened, as expected.
>
> I'm not so sure about that, especially if it is holding the lock.

I'll take this as an opportunity to learn something: what would cause
the task state to be set to running other than a the waited for event
or the timeout?

> > The sched state at this time is "interesting" and I'm having a hard
> > time making any sense out of this. The user task apparently inherited
> > the DEADLINE priority from the sugov kthread. But while the user task
> > state shows up as RUNNING the task doesn't show up in the dl_rq
> > nr_running. Also while cpu#0 shows nr_running as 1, its curr->pid hints
> > at it executing the idle task, which is at least consitent with the
> > thread still being stuck in __schedule.
> >
> > Shortened sched debug below. The machine has 4 cores, I only pasted the
> > output of the CPUs containing the relevant user task and the sugov
> > kthread.
> >
>
> You have it running under SCHED_DEADLINE? That will cause it to run for
> only the run time per period, and can cause even more delays.

That's the bog standard mainline kernel behavior for the sugov kthread.
The comments in the code say this is done to be able to preempt all
other tasks as to not delay the frequency switching. There's quite a
bit of special casing of the sugov task with SCHED_FLAG_SUGOV in the
deadline scheduler code. This flag isn't carried over to the boosted
thread, maybe that's an issue?

But even if we hit throttling I would have expected the situation to
clear after some time. That's not what I'm seeing: once the user task
is stuck, it's stuck for hours. I would say indefinitely, but it's hard
to empirically prove that. ;)

> Perhaps put in some trace_printk()s and see what exactly is happening.

My hope was that someone would could guide me a little with an educated
guess, to make those trace_printks more helpful. The lockup is hard to
trigger and there's other unrelated system activity going on, even when
the relevant user task is stuck, so randomly sprinkling debug
statements will generate lots of output to filter down.

Regards,
Lucas

2020-03-23 09:30:16

by Juri Lelli

[permalink] [raw]

Subject: Re: help needed in debugging missed wakeup(?)

Hi,

On 20/03/20 20:04, Lucas Stach wrote:

[...]

> My hope was that someone would could guide me a little with an educated
> guess, to make those trace_printks more helpful. The lockup is hard to
> trigger and there's other unrelated system activity going on, even when
> the relevant user task is stuck, so randomly sprinkling debug
> statements will generate lots of output to filter down.

I'd start by enabling sched:sched_pi_setprio (or adding trace_printks
around there to confirm your suspicion). Once that is confirmed, I'd try
to look at update_curr_dl::throttle and verify if the misbehaving task
was indeed throttled at some point; it shouldn't be, because dl_boosted
flag should be set for it at this point (maybe check this condition as
well).

Best,

Juri