With the new design in place, show/store callbacks check if the policy
is active or not before proceeding further and cpufreq_policy_free()
must be called after emptying policy->cpus mask, i.e. inactive policy.
Lets make sure we don't get a bug around this later and catch this early
by putting a BUG_ON() within cpufreq_policy_free().
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e24aa5d4bca5..53d163a84e06 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
unsigned long flags;
int cpu;
+ /*
+ * The callers must ensure the policy is inactive by now, to avoid any
+ * races with show()/store() callbacks.
+ */
+ BUG_ON(!policy_is_inactive(policy));
+
/* Remove policy from list */
write_lock_irqsave(&cpufreq_driver_lock, flags);
list_del(&policy->policy_list);
--
2.31.1.272.g89b43f80a514
With the new design in place, to avoid potential races show() and
store() callbacks check if the policy is active or not before proceeding
any further. And in order to guarantee that cpufreq_policy_free() must
be called after clearing the policy->cpus mask, i.e. by marking it
inactive.
Lets make sure we don't get a bug around this later and catch this early
by putting a BUG_ON() within cpufreq_policy_free().
Also update cpufreq_online() a bit to make sure we clear the cpus mask
for each error case before calling cpufreq_policy_free().
Signed-off-by: Viresh Kumar <[email protected]>
---
V2: Update cpufreq_online() and changelog.
drivers/cpufreq/cpufreq.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e24aa5d4bca5..0f8245731783 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
unsigned long flags;
int cpu;
+ /*
+ * The callers must ensure the policy is inactive by now, to avoid any
+ * races with show()/store() callbacks.
+ */
+ BUG_ON(!policy_is_inactive(policy));
+
/* Remove policy from list */
write_lock_irqsave(&cpufreq_driver_lock, flags);
list_del(&policy->policy_list);
@@ -1538,8 +1544,6 @@ static int cpufreq_online(unsigned int cpu)
for_each_cpu(j, policy->real_cpus)
remove_cpu_dev_symlink(policy, j, get_cpu_device(j));
- cpumask_clear(policy->cpus);
-
out_offline_policy:
if (cpufreq_driver->offline)
cpufreq_driver->offline(policy);
@@ -1549,6 +1553,7 @@ static int cpufreq_online(unsigned int cpu)
cpufreq_driver->exit(policy);
out_free_policy:
+ cpumask_clear(policy->cpus);
up_write(&policy->rwsem);
cpufreq_policy_free(policy);
--
2.31.1.272.g89b43f80a514
Greeting,
FYI, we noticed the following commit (built with gcc-11):
commit: a6cb305191dd85350290fd66aeea8e62e33562f9 ("[PATCH 2/3] cpufreq: Panic if policy is active in cpufreq_policy_free()")
url: https://github.com/intel-lab-lkp/linux/commits/Viresh-Kumar/cpufreq-Minor-cleanups/20220526-195434
base: https://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git linux-next
patch link: https://lore.kernel.org/linux-pm/8c3d50faf8811e86136fb3f9c459e43fc3c50bc0.1653565641.git.viresh.kumar@linaro.org
in testcase: boot
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
+------------------------------------------+------------+------------+
| | a1f8da428b | a6cb305191 |
+------------------------------------------+------------+------------+
| boot_successes | 15 | 0 |
| boot_failures | 0 | 13 |
| kernel_BUG_at_drivers/cpufreq/cpufreq.c | 0 | 13 |
| invalid_opcode:#[##] | 0 | 13 |
| RIP:cpufreq_policy_free | 0 | 13 |
| Kernel_panic-not_syncing:Fatal_exception | 0 | 13 |
+------------------------------------------+------------+------------+
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>
[ 10.343913][ T200] kernel BUG at drivers/cpufreq/cpufreq.c:1291!
[ 10.345025][ T200] invalid opcode: 0000 [#1] SMP PTI
[ 10.345958][ T200] CPU: 1 PID: 200 Comm: systemd-udevd Not tainted 5.18.0-02127-ga6cb305191dd #1
[ 10.347496][ T200] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 10.349232][ T200] RIP: 0010:cpufreq_policy_free (drivers/cpufreq/cpufreq.c:1291 (discriminator 1))
[ 10.349244][ T200] Code: e6 ff ff 48 8b 7d 10 e8 bc 47 c5 ff 48 8b 7d 08 e8 b3 47 c5 ff 48 8b 7d 00 e8 aa 47 c5 ff 48 89 ef 5b 5d 41 5c e9 de 09 97 ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 2e 0f 1f 84 00 00 00
All code
========
0: e6 ff out %al,$0xff
2: ff 48 8b decl -0x75(%rax)
5: 7d 10 jge 0x17
7: e8 bc 47 c5 ff callq 0xffffffffffc547c8
c: 48 8b 7d 08 mov 0x8(%rbp),%rdi
10: e8 b3 47 c5 ff callq 0xffffffffffc547c8
15: 48 8b 7d 00 mov 0x0(%rbp),%rdi
19: e8 aa 47 c5 ff callq 0xffffffffffc547c8
1e: 48 89 ef mov %rbp,%rdi
21: 5b pop %rbx
22: 5d pop %rbp
23: 41 5c pop %r12
25: e9 de 09 97 ff jmpq 0xffffffffff970a08
2a:* 0f 0b ud2 <-- trapping instruction
2c: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
33: 00 00 00 00
37: 66 data16
38: 66 data16
39: 2e cs
3a: 0f .byte 0xf
3b: 1f (bad)
3c: 84 00 test %al,(%rax)
...
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 66 66 2e 0f 1f 84 00 data16 nopw %cs:0x0(%rax,%rax,1)
9: 00 00 00 00
d: 66 data16
e: 66 data16
f: 2e cs
10: 0f .byte 0xf
11: 1f (bad)
12: 84 00 test %al,(%rax)
...
[ 10.349247][ T200] RSP: 0018:ffffbdf4405abca0 EFLAGS: 00010293
[ 10.349251][ T200] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000
[ 10.349253][ T200] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff96e640375618
[ 10.349255][ T200] RBP: ffff96e6c5775c00 R08: ffffbdf4405aba24 R09: 0000000000000002
[ 10.349257][ T200] R10: 00000000fffffffb R11: ffffddf43fc029d0 R12: 00000000fffffffb
[ 10.349259][ T200] R13: 0000000000000000 R14: ffff96e6c5775dd0 R15: 0000000000000001
[ 10.349261][ T200] FS: 0000000000000000(0000) GS:ffff96e96fd00000(0063) knlGS:00000000f7bca800
[ 10.349267][ T200] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 10.349269][ T200] CR2: 0000000056e2c164 CR3: 000000042fcd2000 CR4: 00000000000406e0
[ 10.349271][ T200] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 10.349273][ T200] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 10.349275][ T200] Call Trace:
[ 10.349304][ T200] <TASK>
[ 10.349308][ T200] cpufreq_online (drivers/cpufreq/cpufreq.c:1562)
[ 10.349314][ T200] cpufreq_add_dev (drivers/cpufreq/cpufreq.c:1578)
[ 10.349318][ T200] subsys_interface_register (drivers/base/bus.c:1036)
[ 10.349328][ T200] ? 0xffffffffc0071000
[ 10.349330][ T200] cpufreq_register_driver (drivers/cpufreq/cpufreq.c:2872)
[ 10.349333][ T200] acpi_cpufreq_init (drivers/cpufreq/acpi-cpufreq.c:286) acpi_cpufreq
[ 10.349342][ T200] do_one_initcall (init/main.c:1295)
[ 10.349348][ T200] ? __cond_resched (kernel/sched/core.c:8181)
[ 10.349354][ T200] ? kmem_cache_alloc_trace (mm/slub.c:3219 mm/slub.c:3225 mm/slub.c:3256)
[ 10.349362][ T200] do_init_module (kernel/module.c:3731)
[ 10.349366][ T200] __do_sys_finit_module (kernel/module.c:4222)
[ 10.349369][ T200] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178)
[ 10.349376][ T200] do_fast_syscall_32 (arch/x86/entry/common.c:203)
[ 10.349379][ T200] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:117)
[ 10.349385][ T200] RIP: 0023:0xf7f01549
[ 10.349388][ T200] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d b4 26 00 00 00 00 8d b4 26 00 00 00 00
All code
========
0: 03 74 c0 01 add 0x1(%rax,%rax,8),%esi
4: 10 05 03 74 b8 01 adc %al,0x1b87403(%rip) # 0x1b8740d
a: 10 06 adc %al,(%rsi)
c: 03 74 b4 01 add 0x1(%rsp,%rsi,4),%esi
10: 10 07 adc %al,(%rdi)
12: 03 74 b0 01 add 0x1(%rax,%rsi,4),%esi
16: 10 08 adc %cl,(%rax)
18: 03 74 d8 01 add 0x1(%rax,%rbx,8),%esi
1c: 00 00 add %al,(%rax)
1e: 00 00 add %al,(%rax)
20: 00 51 52 add %dl,0x52(%rcx)
23: 55 push %rbp
24: 89 e5 mov %esp,%ebp
26: 0f 34 sysenter
28: cd 80 int $0x80
2a:* 5d pop %rbp <-- trapping instruction
2b: 5a pop %rdx
2c: 59 pop %rcx
2d: c3 retq
2e: 90 nop
2f: 90 nop
30: 90 nop
31: 90 nop
32: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
39: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
Code starting with the faulting instruction
===========================================
0: 5d pop %rbp
1: 5a pop %rdx
2: 59 pop %rcx
3: c3 retq
4: 90 nop
5: 90 nop
6: 90 nop
7: 90 nop
8: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
f: 8d b4 26 00 00 00 00 lea 0x0(%rsi,%riz,1),%esi
To reproduce:
# build kernel
cd linux
cp config-5.18.0-02127-ga6cb305191dd .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <[email protected]> wrote:
>
> With the new design in place, to avoid potential races show() and
> store() callbacks check if the policy is active or not before proceeding
> any further. And in order to guarantee that cpufreq_policy_free() must
> be called after clearing the policy->cpus mask, i.e. by marking it
> inactive.
>
> Lets make sure we don't get a bug around this later and catch this early
> by putting a BUG_ON() within cpufreq_policy_free().
>
> Also update cpufreq_online() a bit to make sure we clear the cpus mask
> for each error case before calling cpufreq_policy_free().
>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> V2: Update cpufreq_online() and changelog.
>
> drivers/cpufreq/cpufreq.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index e24aa5d4bca5..0f8245731783 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> unsigned long flags;
> int cpu;
>
> + /*
> + * The callers must ensure the policy is inactive by now, to avoid any
> + * races with show()/store() callbacks.
> + */
> + BUG_ON(!policy_is_inactive(policy));
I'm not a super-big fan of this change.
First off, crashing the kernel outright here because of possible races
appears a bit excessive to me.
Second, it looks like we are worrying about the code running before
the wait_for_completion() call in cpufreq_policy_put_kobj(), because
after that call no one can be running show() or store(). So why don't
we reorder the wait_for_completion() call with respect to the code in
question instead?
> +
> /* Remove policy from list */
> write_lock_irqsave(&cpufreq_driver_lock, flags);
> list_del(&policy->policy_list);
> @@ -1538,8 +1544,6 @@ static int cpufreq_online(unsigned int cpu)
> for_each_cpu(j, policy->real_cpus)
> remove_cpu_dev_symlink(policy, j, get_cpu_device(j));
>
> - cpumask_clear(policy->cpus);
> -
> out_offline_policy:
> if (cpufreq_driver->offline)
> cpufreq_driver->offline(policy);
> @@ -1549,6 +1553,7 @@ static int cpufreq_online(unsigned int cpu)
> cpufreq_driver->exit(policy);
>
> out_free_policy:
> + cpumask_clear(policy->cpus);
> up_write(&policy->rwsem);
>
> cpufreq_policy_free(policy);
> --
> 2.31.1.272.g89b43f80a514
>
On 14-06-22, 15:59, Rafael J. Wysocki wrote:
> On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <[email protected]> wrote:
> >
> > With the new design in place, to avoid potential races show() and
> > store() callbacks check if the policy is active or not before proceeding
> > any further. And in order to guarantee that cpufreq_policy_free() must
> > be called after clearing the policy->cpus mask, i.e. by marking it
> > inactive.
> >
> > Lets make sure we don't get a bug around this later and catch this early
> > by putting a BUG_ON() within cpufreq_policy_free().
> >
> > Also update cpufreq_online() a bit to make sure we clear the cpus mask
> > for each error case before calling cpufreq_policy_free().
> >
> > Signed-off-by: Viresh Kumar <[email protected]>
> > ---
> > V2: Update cpufreq_online() and changelog.
> >
> > drivers/cpufreq/cpufreq.c | 9 +++++++--
> > 1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index e24aa5d4bca5..0f8245731783 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> > unsigned long flags;
> > int cpu;
> >
> > + /*
> > + * The callers must ensure the policy is inactive by now, to avoid any
> > + * races with show()/store() callbacks.
> > + */
> > + BUG_ON(!policy_is_inactive(policy));
>
> I'm not a super-big fan of this change.
>
> First off, crashing the kernel outright here because of possible races
> appears a bit excessive to me.
>
> Second, it looks like we are worrying about the code running before
> the wait_for_completion() call in cpufreq_policy_put_kobj(), because
> after that call no one can be running show() or store(). So why don't
> we reorder the wait_for_completion() call with respect to the code in
> question instead?
No, I am not worrying about that race. I am just trying to make sure some change
in future doesn't break this assumption (that policy should be inactive by this
point). That's all. It all looks good for now.
May be a WARN instead of BUG if we don't want to crash.
--
viresh
On Wed, Jun 15, 2022 at 7:00 AM Viresh Kumar <[email protected]> wrote:
>
> On 14-06-22, 15:59, Rafael J. Wysocki wrote:
> > On Fri, May 27, 2022 at 5:53 AM Viresh Kumar <[email protected]> wrote:
> > >
> > > With the new design in place, to avoid potential races show() and
> > > store() callbacks check if the policy is active or not before proceeding
> > > any further. And in order to guarantee that cpufreq_policy_free() must
> > > be called after clearing the policy->cpus mask, i.e. by marking it
> > > inactive.
> > >
> > > Lets make sure we don't get a bug around this later and catch this early
> > > by putting a BUG_ON() within cpufreq_policy_free().
> > >
> > > Also update cpufreq_online() a bit to make sure we clear the cpus mask
> > > for each error case before calling cpufreq_policy_free().
> > >
> > > Signed-off-by: Viresh Kumar <[email protected]>
> > > ---
> > > V2: Update cpufreq_online() and changelog.
> > >
> > > drivers/cpufreq/cpufreq.c | 9 +++++++--
> > > 1 file changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > > index e24aa5d4bca5..0f8245731783 100644
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -1284,6 +1284,12 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy)
> > > unsigned long flags;
> > > int cpu;
> > >
> > > + /*
> > > + * The callers must ensure the policy is inactive by now, to avoid any
> > > + * races with show()/store() callbacks.
> > > + */
> > > + BUG_ON(!policy_is_inactive(policy));
> >
> > I'm not a super-big fan of this change.
> >
> > First off, crashing the kernel outright here because of possible races
> > appears a bit excessive to me.
> >
> > Second, it looks like we are worrying about the code running before
> > the wait_for_completion() call in cpufreq_policy_put_kobj(), because
> > after that call no one can be running show() or store(). So why don't
> > we reorder the wait_for_completion() call with respect to the code in
> > question instead?
>
> No, I am not worrying about that race. I am just trying to make sure some change
> in future doesn't break this assumption (that policy should be inactive by this
> point). That's all. It all looks good for now.
>
> May be a WARN instead of BUG if we don't want to crash.
WARN_ON() would be somewhat better, but then I'm not sure if having a
full call trace in this case is really useful, because we know when
cpufreq_policy_free() can be called anyway.
Maybe just print a warning message.
On 06-07-22, 15:49, Rafael J. Wysocki wrote:
> WARN_ON() would be somewhat better, but then I'm not sure if having a
> full call trace in this case is really useful, because we know when
> cpufreq_policy_free() can be called anyway.
>
> Maybe just print a warning message.
The warning will get printed, yes, but I am sure everyone will end up
ignoring it, once it happens.
One of the benefits of printing the call-stack is people will take it
seriously and report it, and we won't miss a bug, if one gets in
somehow.
--
viresh
On Wed, Jul 6, 2022 at 5:23 PM Viresh Kumar <[email protected]> wrote:
>
> On 06-07-22, 15:49, Rafael J. Wysocki wrote:
> > WARN_ON() would be somewhat better, but then I'm not sure if having a
> > full call trace in this case is really useful, because we know when
> > cpufreq_policy_free() can be called anyway.
> >
> > Maybe just print a warning message.
>
> The warning will get printed, yes, but I am sure everyone will end up
> ignoring it, once it happens.
>
> One of the benefits of printing the call-stack is people will take it
> seriously and report it, and we won't miss a bug, if one gets in
> somehow.
I'd rather not go into discussing things that people may or may not do and why.
My point is that if WARN_ON() gets converted to panic(), they will not
see the message at all and if the message gets printed, they will have
a chance to see it even in that case. Whether or not they use that
chance as desirable is beyond the scope of engineering IMV.