2021-07-28 13:12:53

by Bruno Goncalves

Subject: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

Hello,

Since this commit (Commit: 45312bd762d3 - Merge tag 'zonefs-5.14-rc2')
we have been seeing the following call trace. It seems to be
reproducible only on aarch64.

[ 384.485614] ------------[ cut here ]------------
[ 384.490227] rq->clock_update_flags < RQCF_ACT_SKIP
[ 384.490232] WARNING: CPU: 112 PID: 2041 at
kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0
[ 384.504312] Modules linked in: mlx5_ib ib_uverbs ib_core rfkill
sunrpc acpi_ipmi ipmi_ssif mlx5_core mlxfw psample ipmi_devintf
arm_cmn ipmi_msghandler arm_dsu_pmu cppc_cpufreq acpi_tad vfat fat
fuse zram ip_tables x_tables xfs crct10dif_ce ghash_ce ast
i2c_algo_bit drm_vram_helper sbsa_gwdt drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm nvme
nvme_core drm xgene_hwmon aes_neon_bs
[ 384.541165] CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W
5.14.0-rc1 #1
[ 384.549244] Hardware name: WIWYNN Mt.Jade Server System
B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP:
1.06.20210526) 2021/05/26
[ 384.561922] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
[ 384.567918] pc : sub_running_bw.isra.0+0x190/0x1a0
[ 384.572698] lr : sub_running_bw.isra.0+0x190/0x1a0
[ 384.577477] sp : ffff800024c4bb20
[ 384.580779] x29: ffff800024c4bb20 x28: 0000000000000000 x27: ffffb9a9bbe1d200
[ 384.587904] x26: 0000000000000074 x25: 0000000000000011 x24: ffffb9a9bdff9000
[ 384.595029] x23: ffff07ffb36fcaa0 x22: ffff401ee09b65c0 x21: ffffb9a9bbe1de00
[ 384.602153] x20: ffff401ee09a3360 x19: ffff401ee09b6f58 x18: 0000000000000000
[ 384.609277] x17: ffff867522f0c000 x16: ffff800010384000 x15: 0000000000000030
[ 384.616401] x14: 0000000000000000 x13: 50494b535f544341 x12: 5f46435152203c20
[ 384.623526] x11: ffff401ee04b0ea8 x10: ffff401ee021e068 x9 : ffffb9a9bbe4214c
[ 384.630650] x8 : 0000000000010ea8 x7 : ffff401ee01e0000 x6 : 0000000000017ffd
[ 384.637774] x5 : ffff401ee09a3490 x4 : 0000000000000001 x3 : ffff867522f0c000
[ 384.644898] x2 : ffff401ee09a3498 x1 : ffff07ffb53cc000 x0 : 0000000000000026
[ 384.652022] Call trace:
[ 384.654457] sub_running_bw.isra.0+0x190/0x1a0
[ 384.658890] migrate_task_rq_dl+0xf8/0x1e0
[ 384.662975] set_task_cpu+0xa8/0x1f0
[ 384.666540] try_to_wake_up+0x150/0x3d4
[ 384.670365] wake_up_q+0x64/0xc0
[ 384.673582] __up_write+0xd0/0x1c0
[ 384.676974] up_write+0x4c/0x2b0
[ 384.680191] cppc_set_perf+0x120/0x2d0
[ 384.683931] cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq]
[ 384.689756] __cpufreq_driver_target+0x74/0x140
[ 384.694277] sugov_work+0x64/0x80
[ 384.697580] kthread_worker_fn+0xe0/0x230
[ 384.701580] kthread+0x138/0x140
[ 384.704797] ret_from_fork+0x10/0x18


More logs can be found in the dmesg logs at:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/07/16/338525814/build_aarch64_redhat%3A1431434591/tests/storage_software_RAID_testing/
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/07/16/338525814/build_aarch64_redhat%3A1431434591/tests/xfstests_btrfs/
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/07/16/338525814/build_aarch64_redhat%3A1431434591/tests/xfstests_ext4/
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefix=datawarehouse-public/2021/07/16/338525814/build_aarch64_redhat%3A1431434591/tests/xfstests_xfs/

Thank you,
Bruno Goncalves



2021-07-28 15:56:30

by Dietmar Eggemann

Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

On 28/07/2021 15:11, Bruno Goncalves wrote:
> Hello,
>
> Since this commit (Commit: 45312bd762d3 - Merge tag 'zonefs-5.14-rc2')
> we started to see the following call trace, it seems to be
> reproducible only on aarch64.

It should happen on platforms using a slow-switching cpufreq driver.

Only in this case do you have n (depending on the number of frequency
domains) special-purpose DL threads when using the schedutil CPUFreq
governor:

root@juno: ps -eTo comm,pid,pri,class | grep sugov

sugov:0 132 140 DLN
sugov:1 134 140 DLN

>
> [ 384.485614] ------------[ cut here ]------------
> [ 384.490227] rq->clock_update_flags < RQCF_ACT_SKIP
> [ 384.490232] WARNING: CPU: 112 PID: 2041 at
> kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0
> [ 384.504312] Modules linked in: mlx5_ib ib_uverbs ib_core rfkill
> sunrpc acpi_ipmi ipmi_ssif mlx5_core mlxfw psample ipmi_devintf
> arm_cmn ipmi_msghandler arm_dsu_pmu cppc_cpufreq acpi_tad vfat fat
> fuse zram ip_tables x_tables xfs crct10dif_ce ghash_ce ast
> i2c_algo_bit drm_vram_helper sbsa_gwdt drm_kms_helper syscopyarea
> sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm nvme
> nvme_core drm xgene_hwmon aes_neon_bs
> [ 384.541165] CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W
> 5.14.0-rc1 #1
> [ 384.549244] Hardware name: WIWYNN Mt.Jade Server System
> B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP:
> 1.06.20210526) 2021/05/26
> [ 384.561922] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> [ 384.567918] pc : sub_running_bw.isra.0+0x190/0x1a0
> [ 384.572698] lr : sub_running_bw.isra.0+0x190/0x1a0
> [ 384.577477] sp : ffff800024c4bb20
> [ 384.580779] x29: ffff800024c4bb20 x28: 0000000000000000 x27: ffffb9a9bbe1d200
> [ 384.587904] x26: 0000000000000074 x25: 0000000000000011 x24: ffffb9a9bdff9000
> [ 384.595029] x23: ffff07ffb36fcaa0 x22: ffff401ee09b65c0 x21: ffffb9a9bbe1de00
> [ 384.602153] x20: ffff401ee09a3360 x19: ffff401ee09b6f58 x18: 0000000000000000
> [ 384.609277] x17: ffff867522f0c000 x16: ffff800010384000 x15: 0000000000000030
> [ 384.616401] x14: 0000000000000000 x13: 50494b535f544341 x12: 5f46435152203c20
> [ 384.623526] x11: ffff401ee04b0ea8 x10: ffff401ee021e068 x9 : ffffb9a9bbe4214c
> [ 384.630650] x8 : 0000000000010ea8 x7 : ffff401ee01e0000 x6 : 0000000000017ffd
> [ 384.637774] x5 : ffff401ee09a3490 x4 : 0000000000000001 x3 : ffff867522f0c000
> [ 384.644898] x2 : ffff401ee09a3498 x1 : ffff07ffb53cc000 x0 : 0000000000000026
> [ 384.652022] Call trace:
> [ 384.654457] sub_running_bw.isra.0+0x190/0x1a0
> [ 384.658890] migrate_task_rq_dl+0xf8/0x1e0
> [ 384.662975] set_task_cpu+0xa8/0x1f0
> [ 384.666540] try_to_wake_up+0x150/0x3d4
> [ 384.670365] wake_up_q+0x64/0xc0
> [ 384.673582] __up_write+0xd0/0x1c0
> [ 384.676974] up_write+0x4c/0x2b0
> [ 384.680191] cppc_set_perf+0x120/0x2d0
> [ 384.683931] cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq]
> [ 384.689756] __cpufreq_driver_target+0x74/0x140
> [ 384.694277] sugov_work+0x64/0x80
> [ 384.697580] kthread_worker_fn+0xe0/0x230
> [ 384.701580] kthread+0x138/0x140
> [ 384.704797] ret_from_fork+0x10/0x18

Don't quite get this.
`sugov:112` should be a special DL entity (dl_se->flags &
SCHED_FLAG_SUGOV), so sub_running_bw() should not call __sub_running_bw(),
and hence there shouldn't be a call to cpufreq_update_util(), which calls
rq_clock(rq) -> assert_clock_updated()?

Can't reproduce it on my Juno (arm64) (slow-switching (scpi-cpufreq
driver)).
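
The reasoning above can be illustrated with a small userspace model. This is only a sketch, not the kernel code: the `model_*` names, the struct layouts, the `cpufreq_updates` counter, and the flag value are assumptions made for illustration. It shows the expected behaviour that an entity carrying the SUGOV flag skips the bandwidth accounting, so the path that ends in the clock assertion is never entered for it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value; stands in for the kernel's SCHED_FLAG_SUGOV. */
#define MODEL_SCHED_FLAG_SUGOV 0x10000000ULL

/* Hypothetical, heavily reduced stand-ins for the scheduler structures. */
struct model_dl_entity {
	uint64_t flags;
	uint64_t dl_bw;
};

struct model_dl_rq {
	uint64_t running_bw;
	int cpufreq_updates;	/* counts modelled cpufreq_update_util() calls */
};

/* A "special" entity is one that carries the SUGOV flag. */
static bool model_dl_entity_is_special(const struct model_dl_entity *se)
{
	return (se->flags & MODEL_SCHED_FLAG_SUGOV) != 0;
}

/*
 * Models the guard discussed above: special entities return early,
 * everything else is subtracted from running_bw and triggers a
 * (modelled) cpufreq update.
 */
static void model_sub_running_bw(const struct model_dl_entity *se,
				 struct model_dl_rq *dl_rq)
{
	if (model_dl_entity_is_special(se))
		return;

	dl_rq->running_bw -= se->dl_bw;
	dl_rq->cpufreq_updates++;
}
```

Under this model, a warning attributed to a `sugov:N` context would have to come from a different, non-special DL task being accounted for while `sugov:N` is the current task.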

2021-07-29 12:39:22

by Bruno Goncalves

Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

On Wed, Jul 28, 2021 at 5:55 PM Dietmar Eggemann
<[email protected]> wrote:
>
> On 28/07/2021 15:11, Bruno Goncalves wrote:
> > Hello,
> >
> > Since this commit (Commit: 45312bd762d3 - Merge tag 'zonefs-5.14-rc2')
> > we started to see the following call trace, it seems to be
> > reproducible only on aarch64.
>
> It should happen on platforms using a slow-switching cpufreq driver.
>
> Only in this case you have n (depends on nbr of frequency domains)
> special-purpose DL threads when using schedutil CPUFreq governor:
>
> root@juno: ps -eTo comm,pid,pri,class | grep sugov
>
> sugov:0 132 140 DLN
> sugov:1 134 140 DLN
>
> >
> > [ 384.485614] ------------[ cut here ]------------
> > [ 384.490227] rq->clock_update_flags < RQCF_ACT_SKIP
> > [ 384.490232] WARNING: CPU: 112 PID: 2041 at
> > kernel/sched/sched.h:1453 sub_running_bw.isra.0+0x190/0x1a0
> > [ 384.504312] Modules linked in: mlx5_ib ib_uverbs ib_core rfkill
> > sunrpc acpi_ipmi ipmi_ssif mlx5_core mlxfw psample ipmi_devintf
> > arm_cmn ipmi_msghandler arm_dsu_pmu cppc_cpufreq acpi_tad vfat fat
> > fuse zram ip_tables x_tables xfs crct10dif_ce ghash_ce ast
> > i2c_algo_bit drm_vram_helper sbsa_gwdt drm_kms_helper syscopyarea
> > sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm nvme
> > nvme_core drm xgene_hwmon aes_neon_bs
> > [ 384.541165] CPU: 112 PID: 2041 Comm: sugov:112 Tainted: G W
> > 5.14.0-rc1 #1
> > [ 384.549244] Hardware name: WIWYNN Mt.Jade Server System
> > B81.030Z1.0007/Mt.Jade Motherboard, BIOS 1.6.20210526 (SCP:
> > 1.06.20210526) 2021/05/26
> > [ 384.561922] pstate: 404000c9 (nZcv daIF +PAN -UAO -TCO BTYPE=--)
> > [ 384.567918] pc : sub_running_bw.isra.0+0x190/0x1a0
> > [ 384.572698] lr : sub_running_bw.isra.0+0x190/0x1a0
> > [ 384.577477] sp : ffff800024c4bb20
> > [ 384.580779] x29: ffff800024c4bb20 x28: 0000000000000000 x27: ffffb9a9bbe1d200
> > [ 384.587904] x26: 0000000000000074 x25: 0000000000000011 x24: ffffb9a9bdff9000
> > [ 384.595029] x23: ffff07ffb36fcaa0 x22: ffff401ee09b65c0 x21: ffffb9a9bbe1de00
> > [ 384.602153] x20: ffff401ee09a3360 x19: ffff401ee09b6f58 x18: 0000000000000000
> > [ 384.609277] x17: ffff867522f0c000 x16: ffff800010384000 x15: 0000000000000030
> > [ 384.616401] x14: 0000000000000000 x13: 50494b535f544341 x12: 5f46435152203c20
> > [ 384.623526] x11: ffff401ee04b0ea8 x10: ffff401ee021e068 x9 : ffffb9a9bbe4214c
> > [ 384.630650] x8 : 0000000000010ea8 x7 : ffff401ee01e0000 x6 : 0000000000017ffd
> > [ 384.637774] x5 : ffff401ee09a3490 x4 : 0000000000000001 x3 : ffff867522f0c000
> > [ 384.644898] x2 : ffff401ee09a3498 x1 : ffff07ffb53cc000 x0 : 0000000000000026
> > [ 384.652022] Call trace:
> > [ 384.654457] sub_running_bw.isra.0+0x190/0x1a0
> > [ 384.658890] migrate_task_rq_dl+0xf8/0x1e0
> > [ 384.662975] set_task_cpu+0xa8/0x1f0
> > [ 384.666540] try_to_wake_up+0x150/0x3d4
> > [ 384.670365] wake_up_q+0x64/0xc0
> > [ 384.673582] __up_write+0xd0/0x1c0
> > [ 384.676974] up_write+0x4c/0x2b0
> > [ 384.680191] cppc_set_perf+0x120/0x2d0
> > [ 384.683931] cppc_cpufreq_set_target+0xe0/0x1a4 [cppc_cpufreq]
> > [ 384.689756] __cpufreq_driver_target+0x74/0x140
> > [ 384.694277] sugov_work+0x64/0x80
> > [ 384.697580] kthread_worker_fn+0xe0/0x230
> > [ 384.701580] kthread+0x138/0x140
> > [ 384.704797] ret_from_fork+0x10/0x18
>
> Don't quite get this.
> `sugov:112` should be a special DL entity (dl_se->flags &
> SCHED_FLAG_SUGOV) so sub_running_bw() should not call __sub_running_bw()
> and hence there won't be a call to cpufreq_update_util() which calls
> rq_clock(rq) -> assert_clock_updated()?
>
> Can't reproduce it on my Juno (arm64) (slow-switching (scpi-cpufreq
> driver)).

We seem to be able to reproduce this only on Ampere Altra machines,
specifically on mtjade and mtsnow cpus.

# cpupower frequency-info
analyzing CPU 0:
driver: cppc_cpufreq
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: Cannot determine or is not supported.
hardware limits: 1000 MHz - 2.80 GHz
available cpufreq governors: conservative ondemand userspace
powersave performance schedutil
current policy: frequency should be within 2.00 GHz and 2.80 GHz.
The governor "schedutil" may decide which speed to use
within this range.
current CPU frequency: 1.55 GHz (asserted by call to hardware)

# ps -eTo comm,pid,pri,class | grep sugov
sugov:0 1082 140 DLN
sugov:1 1085 140 DLN
...
sugov:78 1319 140 DLN
sugov:79 1320 140 DLN


Bruno


>


2021-07-29 14:39:36

by Dietmar Eggemann

Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

On 29/07/2021 14:36, Bruno Goncalves wrote:
> On Wed, Jul 28, 2021 at 5:55 PM Dietmar Eggemann
> <[email protected]> wrote:
>>
>> On 28/07/2021 15:11, Bruno Goncalves wrote:

[...]

>> Can't reproduce it on my Juno (arm64) (slow-switching (scpi-cpufreq
>> driver)).
>
> We seem to be able to reproduce this only on Ampere Altra machines,
> specifically on mtjade and mtsnow cpus.
>
> # cpupower frequency-info
> analyzing CPU 0:
> driver: cppc_cpufreq
> CPUs which run at the same hardware frequency: 0
> CPUs which need to have their frequency coordinated by software: 0
> maximum transition latency: Cannot determine or is not supported.
> hardware limits: 1000 MHz - 2.80 GHz
> available cpufreq governors: conservative ondemand userspace
> powersave performance schedutil
> current policy: frequency should be within 2.00 GHz and 2.80 GHz.
> The governor "schedutil" may decide which speed to use
> within this range.
> current CPU frequency: 1.55 GHz (asserted by call to hardware)
>
> # ps -eTo comm,pid,pri,class | grep sugov
> sugov:0 1082 140 DLN
> sugov:1 1085 140 DLN
> ...
> sugov:78 1319 140 DLN
> sugov:79 1320 140 DLN

Thanks! In the meantime I got access to an Ampere Altra so I can try
5.14.0-rc1 later today.

2021-07-30 12:24:01

by Dietmar Eggemann

Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

On 29/07/2021 16:38, Dietmar Eggemann wrote:
> On 29/07/2021 14:36, Bruno Goncalves wrote:
>> On Wed, Jul 28, 2021 at 5:55 PM Dietmar Eggemann
>> <[email protected]> wrote:
>>>
>>> On 28/07/2021 15:11, Bruno Goncalves wrote:
>
> [...]
>
>>> Can't reproduce it on my Juno (arm64) (slow-switching (scpi-cpufreq
>>> driver)).
>>
>> We seem to be able to reproduce this only on Ampere Altra machines,
>> specifically on mtjade and mtsnow cpus.
>>
>> # cpupower frequency-info
>> analyzing CPU 0:
>> driver: cppc_cpufreq
>> CPUs which run at the same hardware frequency: 0
>> CPUs which need to have their frequency coordinated by software: 0
>> maximum transition latency: Cannot determine or is not supported.
>> hardware limits: 1000 MHz - 2.80 GHz
>> available cpufreq governors: conservative ondemand userspace
>> powersave performance schedutil
>> current policy: frequency should be within 2.00 GHz and 2.80 GHz.
>> The governor "schedutil" may decide which speed to use
>> within this range.
>> current CPU frequency: 1.55 GHz (asserted by call to hardware)
>>
>> # ps -eTo comm,pid,pri,class | grep sugov
>> sugov:0 1082 140 DLN
>> sugov:1 1085 140 DLN
>> ...
>> sugov:78 1319 140 DLN
>> sugov:79 1320 140 DLN
>
> Thanks! In the meantime I got access to an Ampere Altra so I can try
> 5.14.0-rc1 later today.

The task causing this seems to be the new `cppc_fie` DL task introduced
by commit 1eb5dde674f5 "cpufreq: CPPC: Add support for frequency
invariance" in v5.14-rc1.

With `CONFIG_ACPI_CPPC_CPUFREQ_FIE=y` and the schedutil cpufreq governor
on a slow-switching system:

DL task curr=`sugov:X` makes p=`cppc_fie` migrate and since it is in
`non_contending` state, migrate_task_rq_dl() calls

sub_running_bw()->__sub_running_bw()->cpufreq_update_util()->
rq_clock()->assert_clock_updated()

on p.

Can you try this snippet? It should fix it.

--8<--

From: Dietmar Eggemann <[email protected]>
Date: Fri, 30 Jul 2021 14:03:40 +0200
Subject: [PATCH] sched/deadline: Fix missing clock update in
migrate_task_rq_dl()

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index aaacd6cfd42f..4920f498492f 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1733,6 +1733,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
*/
raw_spin_rq_lock(rq);
if (p->dl.dl_non_contending) {
+ update_rq_clock(rq);
sub_running_bw(&p->dl, &rq->dl);
p->dl.dl_non_contending = 0;
/*
--
2.25.1
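
Why the one-line change silences the warning can be sketched with a userspace model of the clock-update flags. This is a simplification under stated assumptions: the `model_*` names, the struct, and the `warnings` counter are hypothetical, and the run queue is assumed to start with no flags set. Only the three `RQCF_*` values and the `rq->clock_update_flags < RQCF_ACT_SKIP` condition are taken from the kernel and the warning text.

```c
#include <stdbool.h>

/* Flag values as used by the scheduler's clock_update_flags. */
#define RQCF_REQ_SKIP	0x01
#define RQCF_ACT_SKIP	0x02
#define RQCF_UPDATED	0x04

/* Hypothetical, reduced run queue: just the clock and its flags. */
struct model_rq {
	unsigned int clock_update_flags;
	unsigned long long clock;
	int warnings;		/* counts modelled WARN hits */
};

/* Models update_rq_clock(): refresh the clock and mark it updated. */
static void model_update_rq_clock(struct model_rq *rq, unsigned long long now)
{
	rq->clock = now;
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Models rq_clock(): same condition as the reported warning. */
static unsigned long long model_rq_clock(struct model_rq *rq)
{
	if (rq->clock_update_flags < RQCF_ACT_SKIP)
		rq->warnings++;
	return rq->clock;
}

/*
 * Models the non_contending branch of the migration path: the clock is
 * read (via the cpufreq update) with or without a preceding update.
 */
static void model_migrate_non_contending(struct model_rq *rq,
					 unsigned long long now,
					 bool with_fix)
{
	if (with_fix)
		model_update_rq_clock(rq, now);
	(void)model_rq_clock(rq);
}
```

Calling `model_migrate_non_contending()` on a zero-initialised `struct model_rq` with `with_fix` set to `false` increments `warnings`; with `with_fix` set to `true` it does not, because `RQCF_UPDATED` is set before the clock is read.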

2021-07-30 15:27:44

by Bruno Goncalves

Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

On Fri, Jul 30, 2021 at 2:22 PM Dietmar Eggemann
<[email protected]> wrote:
>
> On 29/07/2021 16:38, Dietmar Eggemann wrote:
> > On 29/07/2021 14:36, Bruno Goncalves wrote:
> >> On Wed, Jul 28, 2021 at 5:55 PM Dietmar Eggemann
> >> <[email protected]> wrote:
> >>>
> >>> On 28/07/2021 15:11, Bruno Goncalves wrote:
> >
> > [...]
> >
> >>> Can't reproduce it on my Juno (arm64) (slow-switching (scpi-cpufreq
> >>> driver)).
> >>
> >> We seem to be able to reproduce this only on Ampere Altra machines,
> >> specifically on mtjade and mtsnow cpus.
> >>
> >> # cpupower frequency-info
> >> analyzing CPU 0:
> >> driver: cppc_cpufreq
> >> CPUs which run at the same hardware frequency: 0
> >> CPUs which need to have their frequency coordinated by software: 0
> >> maximum transition latency: Cannot determine or is not supported.
> >> hardware limits: 1000 MHz - 2.80 GHz
> >> available cpufreq governors: conservative ondemand userspace
> >> powersave performance schedutil
> >> current policy: frequency should be within 2.00 GHz and 2.80 GHz.
> >> The governor "schedutil" may decide which speed to use
> >> within this range.
> >> current CPU frequency: 1.55 GHz (asserted by call to hardware)
> >>
> >> # ps -eTo comm,pid,pri,class | grep sugov
> >> sugov:0 1082 140 DLN
> >> sugov:1 1085 140 DLN
> >> ...
> >> sugov:78 1319 140 DLN
> >> sugov:79 1320 140 DLN
> >
> > Thanks! In the meantime I got access to an Ampere Altra so I can try
> > 5.14.0-rc1 later today.
>
> The task causing this seems to be the new `cppc_fie` DL task introduced
> by commit 1eb5dde674f5 "cpufreq: CPPC: Add support for frequency
> invariance" in v5.14-rc1.
>
> With `CONFIG_ACPI_CPPC_CPUFREQ_FIE=y` and schedutil cpufreq governor on
> slow-switching system:
>
> DL task curr=`sugov:X` makes p=`cppc_fie` migrate and since it is in
> `non_contending` state, migrate_task_rq_dl() calls
>
> sub_running_bw()->__sub_running_bw()->cpufreq_update_util()->
> rq_clock()->assert_clock_updated()
>
> on p.
>
> Can you try this snippet? It should fix it.

Thank you, I've tried the patch and it fixes the issue.

Bruno

>
> --8<--
>
> From: Dietmar Eggemann <[email protected]>
> Date: Fri, 30 Jul 2021 14:03:40 +0200
> Subject: [PATCH] sched/deadline: Fix missing clock update in
> migrate_task_rq_dl()
>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> ---
> kernel/sched/deadline.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index aaacd6cfd42f..4920f498492f 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1733,6 +1733,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
> */
> raw_spin_rq_lock(rq);
> if (p->dl.dl_non_contending) {
> + update_rq_clock(rq);
> sub_running_bw(&p->dl, &rq->dl);
> p->dl.dl_non_contending = 0;
> /*
> --
> 2.25.1
>


2021-08-02 08:45:56

by Dietmar Eggemann

Subject: Re: WARNING: CPU: 112 PID: 2041 at kernel/sched/sched.h:1453

On 30/07/2021 17:23, Bruno Goncalves wrote:
> On Fri, Jul 30, 2021 at 2:22 PM Dietmar Eggemann
> <[email protected]> wrote:
>>
>> On 29/07/2021 16:38, Dietmar Eggemann wrote:
>>> On 29/07/2021 14:36, Bruno Goncalves wrote:
>>>> On Wed, Jul 28, 2021 at 5:55 PM Dietmar Eggemann
>>>> <[email protected]> wrote:
>>>>>
>>>>> On 28/07/2021 15:11, Bruno Goncalves wrote:

[...]

>> The task causing this seems to be the new `cppc_fie` DL task introduced
>> by commit 1eb5dde674f5 "cpufreq: CPPC: Add support for frequency
>> invariance" in v5.14-rc1.
>>
>> With `CONFIG_ACPI_CPPC_CPUFREQ_FIE=y` and schedutil cpufreq governor on
>> slow-switching system:
>>
>> DL task curr=`sugov:X` makes p=`cppc_fie` migrate and since it is in
>> `non_contending` state, migrate_task_rq_dl() calls
>>
>> sub_running_bw()->__sub_running_bw()->cpufreq_update_util()->
>> rq_clock()->assert_clock_updated()
>>
>> on p.
>>
>> Can you try this snippet? It should fix it.
>
> Thank you, I've tried the patch and it fixes the issue.

Thanks for testing! Let me send out a proper patch then.