If a broadcast timer is registered after the system switched to oneshot
mode, a hang_task err could occur like that:
INFO: task kworker/u15:0:7 blocked for more than 120 seconds.
Tainted: G E 5.15.41-android13-8-00002-xxx #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u16:0 state:D stack: 9808 pid: 7 ppid: 2 flags:0x00000008
Workqueue: events_unbound deferred_probe_work_func.cfi_jt
Call trace:
__switch_to+0y240/0x490
__schedule+0x620/0xafc
schedule+0x110/0x204
schedule_hrtimeout_range_clock+0x9c/0x118
usleep_range_state+0x150/0x1ac
_regulator_do_enable+0x528/0x878
set_machine_constraints+0x6a0/0xf2c
regulator_register+0x3ac/0x7ac
devm_regulator_register+0xbc/0x120
pmu_ext_regulator_probe+0xb0/0x1b4 [pmu_ext_regulator]
platform_probe+0x70/0x194
really_proe+0x320/0x68c
__driver_probe_device+0x204/0x260
driver_probe_device+0x48/0x1e0
When the new broadcast timer was registered after the system switched
to oneshot mode, the broadcast timer was not used as periodic. If the
oneshot mask was set incorrectly, all cores which did not enter cpu_idle
state can't enter cpu_idle normally, causing the hrtimer mechanism to
break.
This patch fixes the issue by moving the update action about oneshot
mask to a more strict conditions. The tick_broadcast_setup_oneshot would
be called in two typical condition, and they all will work.
1. tick_handle_periodic -> tick_broadcast_setup_oneshot
The origin broadcast was periodic, so it can set the oneshot_mask bits
for those waiting for periodic broadcast and program the broadcast timer
to fire.
2. tick_install_broadcast_device -> tick_broadcast_setup_oneshot
The origin broadcast was oneshot, so the cores which enter the cpu_idle
already used the oneshot_mask bits. It is unnecessary to update the
oneshot_mask.
Fixes: 9c336c9935cf ("tick/broadcast: Allow late registered device to enter oneshot mode")
Signed-off-by: Victor Hassan <[email protected]>
---
kernel/time/tick-broadcast.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 93bf2b4e47e5..fdbbba487978 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1041,12 +1041,13 @@ static void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
*/
cpumask_copy(tmpmask, tick_broadcast_mask);
cpumask_clear_cpu(cpu, tmpmask);
- cpumask_or(tick_broadcast_oneshot_mask,
- tick_broadcast_oneshot_mask, tmpmask);
if (was_periodic && !cpumask_empty(tmpmask)) {
ktime_t nextevt = tick_get_next_period();
+ cpumask_or(tick_broadcast_oneshot_mask,
+ tick_broadcast_oneshot_mask, tmpmask);
+
clockevents_switch_state(bc, CLOCK_EVT_STATE_ONESHOT);
tick_broadcast_init_next_event(tmpmask, nextevt);
tick_broadcast_set_event(bc, cpu, nextevt);
--
2.29.0
On 3/28/2023 2:36 PM, Victor Hassan wrote:
> If a broadcast timer is registered after the system switched to oneshot
> mode, a hang_task err could occur like that:
>
> INFO: task kworker/u15:0:7 blocked for more than 120 seconds.
> Tainted: G E 5.15.41-android13-8-00002-xxx #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u16:0 state:D stack: 9808 pid: 7 ppid: 2 flags:0x00000008
> Workqueue: events_unbound deferred_probe_work_func.cfi_jt
> Call trace:
> __switch_to+0y240/0x490
> __schedule+0x620/0xafc
> schedule+0x110/0x204
> schedule_hrtimeout_range_clock+0x9c/0x118
> usleep_range_state+0x150/0x1ac
> _regulator_do_enable+0x528/0x878
> set_machine_constraints+0x6a0/0xf2c
> regulator_register+0x3ac/0x7ac
> devm_regulator_register+0xbc/0x120
> pmu_ext_regulator_probe+0xb0/0x1b4 [pmu_ext_regulator]
> platform_probe+0x70/0x194
> really_proe+0x320/0x68c
> __driver_probe_device+0x204/0x260
> driver_probe_device+0x48/0x1e0
>
> When the new broadcast timer was registered after the system switched
> to oneshot mode, the broadcast timer was not used as periodic. If the
> oneshot mask was set incorrectly, all cores which did not enter cpu_idle
> state can't enter cpu_idle normally, causing the hrtimer mechanism to
> break.
>
> This patch fixes the issue by moving the update action about oneshot
> mask to a more strict conditions. The tick_broadcast_setup_oneshot would
> be called in two typical condition, and they all will work.
>
> 1. tick_handle_periodic -> tick_broadcast_setup_oneshot
>
> The origin broadcast was periodic, so it can set the oneshot_mask bits
> for those waiting for periodic broadcast and program the broadcast timer
> to fire.
>
> 2. tick_install_broadcast_device -> tick_broadcast_setup_oneshot
>
> The origin broadcast was oneshot, so the cores which enter the cpu_idle
> already used the oneshot_mask bits. It is unnecessary to update the
> oneshot_mask.
>
> Fixes: 9c336c9935cf ("tick/broadcast: Allow late registered device to enter oneshot mode")
>
> Signed-off-by: Victor Hassan <[email protected]>
> ---
> kernel/time/tick-broadcast.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 93bf2b4e47e5..fdbbba487978 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1041,12 +1041,13 @@ static void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
> */
> cpumask_copy(tmpmask, tick_broadcast_mask);
> cpumask_clear_cpu(cpu, tmpmask);
> - cpumask_or(tick_broadcast_oneshot_mask,
> - tick_broadcast_oneshot_mask, tmpmask);
>
> if (was_periodic && !cpumask_empty(tmpmask)) {
> ktime_t nextevt = tick_get_next_period();
>
> + cpumask_or(tick_broadcast_oneshot_mask,
> + tick_broadcast_oneshot_mask, tmpmask);
> +
> clockevents_switch_state(bc, CLOCK_EVT_STATE_ONESHOT);
> tick_broadcast_init_next_event(tmpmask, nextevt);
> tick_broadcast_set_event(bc, cpu, nextevt);
Friendly ping.
On Tue, Mar 28, 2023 at 02:36:29PM +0800, Victor Hassan wrote:
> If a broadcast timer is registered after the system switched to oneshot
> mode, a hang_task err could occur like that:
>
> INFO: task kworker/u15:0:7 blocked for more than 120 seconds.
> Tainted: G E 5.15.41-android13-8-00002-xxx #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/u16:0 state:D stack: 9808 pid: 7 ppid: 2 flags:0x00000008
> Workqueue: events_unbound deferred_probe_work_func.cfi_jt
> Call trace:
> __switch_to+0y240/0x490
> __schedule+0x620/0xafc
> schedule+0x110/0x204
> schedule_hrtimeout_range_clock+0x9c/0x118
> usleep_range_state+0x150/0x1ac
> _regulator_do_enable+0x528/0x878
> set_machine_constraints+0x6a0/0xf2c
> regulator_register+0x3ac/0x7ac
> devm_regulator_register+0xbc/0x120
> pmu_ext_regulator_probe+0xb0/0x1b4 [pmu_ext_regulator]
> platform_probe+0x70/0x194
> really_proe+0x320/0x68c
> __driver_probe_device+0x204/0x260
> driver_probe_device+0x48/0x1e0
>
> When the new broadcast timer was registered after the system switched
> to oneshot mode, the broadcast timer was not used as periodic. If the
> oneshot mask was set incorrectly, all cores which did not enter cpu_idle
> state can't enter cpu_idle normally, causing the hrtimer mechanism to
> break.
>
> This patch fixes the issue by moving the update action about oneshot
> mask to a more strict conditions. The tick_broadcast_setup_oneshot would
> be called in two typical condition, and they all will work.
>
> 1. tick_handle_periodic -> tick_broadcast_setup_oneshot
>
> The origin broadcast was periodic, so it can set the oneshot_mask bits
> for those waiting for periodic broadcast and program the broadcast timer
> to fire.
>
> 2. tick_install_broadcast_device -> tick_broadcast_setup_oneshot
>
> The origin broadcast was oneshot, so the cores which enter the cpu_idle
> already used the oneshot_mask bits. It is unnecessary to update the
> oneshot_mask.
>
> Fixes: 9c336c9935cf ("tick/broadcast: Allow late registered device to enter oneshot mode")
>
> Signed-off-by: Victor Hassan <[email protected]>
> ---
> kernel/time/tick-broadcast.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
> index 93bf2b4e47e5..fdbbba487978 100644
> --- a/kernel/time/tick-broadcast.c
> +++ b/kernel/time/tick-broadcast.c
> @@ -1041,12 +1041,13 @@ static void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
> */
> cpumask_copy(tmpmask, tick_broadcast_mask);
> cpumask_clear_cpu(cpu, tmpmask);
> - cpumask_or(tick_broadcast_oneshot_mask,
> - tick_broadcast_oneshot_mask, tmpmask);
>
> if (was_periodic && !cpumask_empty(tmpmask)) {
> ktime_t nextevt = tick_get_next_period();
>
> + cpumask_or(tick_broadcast_oneshot_mask,
> + tick_broadcast_oneshot_mask, tmpmask);
> +
Good catch, it looks like one issue that can trigger is due to the resulting
ignored calls to tick_broadcast_exit(). Indeed if the cpu is already in
tick_broadcast_oneshot_mask then cpuidle won't call the exit.
Leading to such race:
* CPU 1 stop its tick, next event is in one hour
* CPU 0 registers new broadcast and sets CPU 1 in tick_broadcast_oneshot_mask
* CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is ignored because
the CPU is already in tick_broadcast_oneshot_mask
* CPU 1 goes to sleep
* CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
is in one hour, program the broadcast to that deadline
* CPU 1 gets an interrupt that enqueues a new timer expiring in the next jiffy
* CPU 1 don't call tick_broadcast_exit and thus don't remove itself from
tick_broadcast_oneshot_mask
* CPU 1 re-enters in cpuidle_enter_state(), tick_broadcast_enter() is again
ignored so the new timer isn't propagated to the broadcast.
* CPU 1 goes to sleep and won't be woken before one hour.
Reviewed-by: Frederic Weisbecker <[email protected]>
Thanks.
> clockevents_switch_state(bc, CLOCK_EVT_STATE_ONESHOT);
> tick_broadcast_init_next_event(tmpmask, nextevt);
> tick_broadcast_set_event(bc, cpu, nextevt);
> --
> 2.29.0
>
On 4/3/2023 6:26 PM, Frederic Weisbecker wrote:
> On Tue, Mar 28, 2023 at 02:36:29PM +0800, Victor Hassan wrote:
>> If a broadcast timer is registered after the system switched to oneshot
>> mode, a hang_task err could occur like that:
>>
>> INFO: task kworker/u15:0:7 blocked for more than 120 seconds.
>> Tainted: G E 5.15.41-android13-8-00002-xxx #1
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> task:kworker/u16:0 state:D stack: 9808 pid: 7 ppid: 2 flags:0x00000008
>> Workqueue: events_unbound deferred_probe_work_func.cfi_jt
>> Call trace:
>> __switch_to+0y240/0x490
>> __schedule+0x620/0xafc
>> schedule+0x110/0x204
>> schedule_hrtimeout_range_clock+0x9c/0x118
>> usleep_range_state+0x150/0x1ac
>> _regulator_do_enable+0x528/0x878
>> set_machine_constraints+0x6a0/0xf2c
>> regulator_register+0x3ac/0x7ac
>> devm_regulator_register+0xbc/0x120
>> pmu_ext_regulator_probe+0xb0/0x1b4 [pmu_ext_regulator]
>> platform_probe+0x70/0x194
>> really_proe+0x320/0x68c
>> __driver_probe_device+0x204/0x260
>> driver_probe_device+0x48/0x1e0
>>
>> When the new broadcast timer was registered after the system switched
>> to oneshot mode, the broadcast timer was not used as periodic. If the
>> oneshot mask was set incorrectly, all cores which did not enter cpu_idle
>> state can't enter cpu_idle normally, causing the hrtimer mechanism to
>> break.
>>
>> This patch fixes the issue by moving the update action about oneshot
>> mask to a more strict conditions. The tick_broadcast_setup_oneshot would
>> be called in two typical condition, and they all will work.
>>
>> 1. tick_handle_periodic -> tick_broadcast_setup_oneshot
>>
>> The origin broadcast was periodic, so it can set the oneshot_mask bits
>> for those waiting for periodic broadcast and program the broadcast timer
>> to fire.
>>
>> 2. tick_install_broadcast_device -> tick_broadcast_setup_oneshot
>>
>> The origin broadcast was oneshot, so the cores which enter the cpu_idle
>> already used the oneshot_mask bits. It is unnecessary to update the
>> oneshot_mask.
>>
>> Fixes: 9c336c9935cf ("tick/broadcast: Allow late registered device to enter oneshot mode")
>>
>> Signed-off-by: Victor Hassan <[email protected]>
>> ---
>> kernel/time/tick-broadcast.c | 5 +++--
>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
>> index 93bf2b4e47e5..fdbbba487978 100644
>> --- a/kernel/time/tick-broadcast.c
>> +++ b/kernel/time/tick-broadcast.c
>> @@ -1041,12 +1041,13 @@ static void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
>> */
>> cpumask_copy(tmpmask, tick_broadcast_mask);
>> cpumask_clear_cpu(cpu, tmpmask);
>> - cpumask_or(tick_broadcast_oneshot_mask,
>> - tick_broadcast_oneshot_mask, tmpmask);
>>
>> if (was_periodic && !cpumask_empty(tmpmask)) {
>> ktime_t nextevt = tick_get_next_period();
>>
>> + cpumask_or(tick_broadcast_oneshot_mask,
>> + tick_broadcast_oneshot_mask, tmpmask);
>> +
>
> Good catch, it looks like one issue that can trigger is due to the resulting
> ignored calls to tick_broadcast_exit(). Indeed if the cpu is already in
> tick_broadcast_oneshot_mask then cpuidle won't call the exit
Hi Frederic,
Thanks for the review. Still I have a few questions and would like to
discuss them further.
>
> Leading to such race:
>
> * CPU 1 stop its tick, next event is in one hour
> * CPU 0 registers new broadcast and sets CPU 1 in tick_broadcast_oneshot_mask
> * CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is ignored because
> the CPU is already in tick_broadcast_oneshot_mask
Yes.
> * CPU 1 goes to sleep
> * CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
> is in one hour, program the broadcast to that deadline
> * CPU 1 gets an interrupt that enqueues a new timer expiring in the next jiffy
> * CPU 1 don't call tick_broadcast_exit and thus don't remove itself from
> tick_broadcast_oneshot_mask
I'm not sure about this... Actually, I believe CPU 1 *will* call
tick_broadcast_exit in this condition because I cannot find a limitation
on this execution path.
> * CPU 1 re-enters in cpuidle_enter_state(), tick_broadcast_enter() is again
> ignored so the new timer isn't propagated to the broadcast.
> * CPU 1 goes to sleep and won't be woken before one hour.
>
>
> Reviewed-by: Frederic Weisbecker <[email protected]>
>
>
> Thanks.
>
>
>
>
>> clockevents_switch_state(bc, CLOCK_EVT_STATE_ONESHOT);
>> tick_broadcast_init_next_event(tmpmask, nextevt);
>> tick_broadcast_set_event(bc, cpu, nextevt);
>> --
>> 2.29.0
>>
On Tue, Apr 04, 2023 at 07:37:06PM +0800, Victor Hassan wrote:
> >
> > Leading to such race:
> >
> > * CPU 1 stop its tick, next event is in one hour
> > * CPU 0 registers new broadcast and sets CPU 1 in tick_broadcast_oneshot_mask
> > * CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is ignored because
> > the CPU is already in tick_broadcast_oneshot_mask
>
> Yes.
>
> > * CPU 1 goes to sleep
> > * CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
> > is in one hour, program the broadcast to that deadline
> > * CPU 1 gets an interrupt that enqueues a new timer expiring in the next jiffy
> > * CPU 1 don't call tick_broadcast_exit and thus don't remove itself from
> > tick_broadcast_oneshot_mask
>
> I'm not sure about this... Actually, I believe CPU 1 *will* call
> tick_broadcast_exit in this condition because I cannot find a limitation on
> this execution path.
You're right, what I wrote doesn't make sense. Let me try again:
* CPU 1 stop its tick, next event is in one hour. It calls
tick_broadcast_enter() and goes to sleep.
* CPU 1 gets an interrupt that enqueues a new timer expiring in the next jiffy
(note it's not yet actually programmed in the tick device)
* CPU 1 call tick_broadcast_exit().
* CPU 0 registers new broadcast device and sets CPU 1 in tick_broadcast_oneshot_mask
* CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
is in one hour (because the recently enqueued timer for CPU 1 hasn't been programmed
yet), so it programs the broadcast to that 1 hour deadline.
* CPU 1 runs tick_nohz_idle_stop_tick() which eventually writes and program
dev->next_event to next jiffy
* CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is ignored because
the CPU is already in tick_broadcast_oneshot_mask, so the dev->next_event
change isn't propagated to broadcast.
* CPU 1 goes to sleep for 1 hour.
Does it make more sense? There might be more simple scenario of course.
Thanks.
On 4/4/2023 8:21 PM, Frederic Weisbecker wrote:
> On Tue, Apr 04, 2023 at 07:37:06PM +0800, Victor Hassan wrote:
>>>
>>> Leading to such race:
>>>
>>> * CPU 1 stop its tick, next event is in one hour
>>> * CPU 0 registers new broadcast and sets CPU 1 in tick_broadcast_oneshot_mask
>>> * CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is ignored because
>>> the CPU is already in tick_broadcast_oneshot_mask
>>
>> Yes.
>>
>>> * CPU 1 goes to sleep
>>> * CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
>>> is in one hour, program the broadcast to that deadline
>>> * CPU 1 gets an interrupt that enqueues a new timer expiring in the next jiffy
>>> * CPU 1 don't call tick_broadcast_exit and thus don't remove itself from
>>> tick_broadcast_oneshot_mask
>>
>> I'm not sure about this... Actually, I believe CPU 1 *will* call
>> tick_broadcast_exit in this condition because I cannot find a limitation on
>> this execution path.
>
> You're right, what I wrote doesn't make sense. Let me try again:
>
> * CPU 1 stop its tick, next event is in one hour. It calls
> tick_broadcast_enter() and goes to sleep.
>
> * CPU 1 gets an interrupt that enqueues a new timer expiring in the next jiffy
> (note it's not yet actually programmed in the tick device)
>
> * CPU 1 call tick_broadcast_exit().
>
> * CPU 0 registers new broadcast device and sets CPU 1 in tick_broadcast_oneshot_mask
>
> * CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
> is in one hour (because the recently enqueued timer for CPU 1 hasn't been programmed
> yet), so it programs the broadcast to that 1 hour deadline.
>
> * CPU 1 runs tick_nohz_idle_stop_tick() which eventually writes and program
> dev->next_event to next jiffy
>
> * CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is ignored because
> the CPU is already in tick_broadcast_oneshot_mask, so the dev->next_event
> change isn't propagated to broadcast.
>
> * CPU 1 goes to sleep for 1 hour.
Hi Frederic,
Yes, I think that make sense :)
>
> Does it make more sense? There might be more simple scenario of course.
>
> Thanks.
On 4/7/2023 2:51 PM, Victor Hassan wrote:
>
>
> On 4/4/2023 8:21 PM, Frederic Weisbecker wrote:
>> On Tue, Apr 04, 2023 at 07:37:06PM +0800, Victor Hassan wrote:
>>>>
>>>> Leading to such race:
>>>>
>>>> * CPU 1 stop its tick, next event is in one hour
>>>> * CPU 0 registers new broadcast and sets CPU 1 in
>>>> tick_broadcast_oneshot_mask
>>>> * CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter()
>>>> is ignored because
>>>> the CPU is already in tick_broadcast_oneshot_mask
>>>
>>> Yes.
>>>
>>>> * CPU 1 goes to sleep
>>>> * CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
>>>> is in one hour, program the broadcast to that deadline
>>>> * CPU 1 gets an interrupt that enqueues a new timer expiring in the
>>>> next jiffy
>>>> * CPU 1 don't call tick_broadcast_exit and thus don't remove itself
>>>> from
>>>> tick_broadcast_oneshot_mask
>>>
>>> I'm not sure about this... Actually, I believe CPU 1 *will* call
>>> tick_broadcast_exit in this condition because I cannot find a
>>> limitation on
>>> this execution path.
>>
>> You're right, what I wrote doesn't make sense. Let me try again:
>>
>> * CPU 1 stop its tick, next event is in one hour. It calls
>> tick_broadcast_enter() and goes to sleep.
>> * CPU 1 gets an interrupt that enqueues a new timer expiring in the
>> next jiffy
>> (note it's not yet actually programmed in the tick device)
>> * CPU 1 call tick_broadcast_exit().
>>
>> * CPU 0 registers new broadcast device and sets CPU 1 in
>> tick_broadcast_oneshot_mask
>>
>> * CPU 0 runs the broadcast callback, sees that the next timer for CPU 1
>> is in one hour (because the recently enqueued timer for CPU 1
>> hasn't been programmed
>> yet), so it programs the broadcast to that 1 hour deadline.
>>
>> * CPU 1 runs tick_nohz_idle_stop_tick() which eventually writes and
>> program
>> dev->next_event to next jiffy
>> * CPU 1 runs into cpuidle_enter_state(), and tick_broadcast_enter() is
>> ignored because
>> the CPU is already in tick_broadcast_oneshot_mask, so the
>> dev->next_event
>> change isn't propagated to broadcast.
>>
>> * CPU 1 goes to sleep for 1 hour.
>
> Hi Frederic,
> Yes, I think that make sense :)
Hi Frederic,
If we have reached a consensus, may I add "Reviewed-by: Frederic"
in the next patch?
>
>
>>
>> Does it make more sense? There might be more simple scenario of course.
>>
>> Thanks.
>
On Mon, Apr 10, 2023 at 03:06:15PM +0800, Victor Hassan wrote:
> Hi Frederic,
> If we have reached a consensus, may I add "Reviewed-by: Frederic" in the
> next patch?
Sure and perhaps add the racing scenario I proposed to clarify the changelog.
Thanks.