2021-03-17 02:19:18

by Josef Bacik

Subject: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.

This patch causes a panic when rebooting my Dell Poweredge r440. I do
not have the full panic log as it's lost at that stage of the reboot and
I do not have a serial console. Reverting this patch makes my system
able to reboot again.

Signed-off-by: Josef Bacik <[email protected]>
---
- Apologies, I mistyped the LKML list address.

kernel/reboot.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index eb1b15850761..a6ad5eb2fa73 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -244,8 +244,6 @@ void migrate_to_reboot_cpu(void)
void kernel_restart(char *cmd)
{
kernel_restart_prepare(cmd);
- if (pm_power_off_prepare)
- pm_power_off_prepare();
migrate_to_reboot_cpu();
syscore_shutdown();
if (!cmd)
--
2.26.2


2021-03-17 02:54:35

by Kai-Heng Feng

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

Hi,

On Wed, Mar 17, 2021 at 10:17 AM Josef Bacik <[email protected]> wrote:
>
> This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
>
> This patch causes a panic when rebooting my Dell Poweredge r440. I do
> not have the full panic log as it's lost at that stage of the reboot and
> I do not have a serial console. Reverting this patch makes my system
> able to reboot again.

But this patch also helps many HP laptops, so maybe we should figure
out what's going on on Poweredge r440.
Does it also panic on shutdown?

Kai-Heng

2021-03-17 16:28:29

by Josef Bacik

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On 3/16/21 10:50 PM, Kai-Heng Feng wrote:
> Hi,
>
> On Wed, Mar 17, 2021 at 10:17 AM Josef Bacik <[email protected]> wrote:
>>
>> This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
>>
>> This patch causes a panic when rebooting my Dell Poweredge r440. I do
>> not have the full panic log as it's lost at that stage of the reboot and
>> I do not have a serial console. Reverting this patch makes my system
>> able to reboot again.
>
> But this patch also helps many HP laptops, so maybe we should figure
> out what's going on on Poweredge r440.
> Does it also panic on shutdown?
>

Sure I'll test whatever to get it fixed, but I just wasted 3 days bisecting and
lost a weekend of performance testing on btrfs because of this regression, so
until you figure out how it broke it needs to be reverted so people don't have
to figure out why reboot suddenly isn't working.

Running "halt" has the same effect with and without your patch, it gets to
"system halted" and just sits there without powering off. Not entirely sure why
that is, but there's no panic.

The panic itself is lost, but I can see there's an NMI, and I have the RIP:

(gdb) list *('mwait_idle_with_hints.constprop.0'+0x4b)
0xffffffff816dabdb is in mwait_idle_with_hints (./arch/x86/include/asm/current.h:15).
10
11 DECLARE_PER_CPU(struct task_struct *, current_task);
12
13 static __always_inline struct task_struct *get_current(void)
14 {
15 return this_cpu_read_stable(current_task);
16 }
17
18 #define current get_current()
19

<mwait_idle_with_hints.constprop.0>: jmp 0xffffffff936dac02 <mwait_idle_with_hints.constprop.0+0x72>
<mwait_idle_with_hints.constprop.0+0x2>: nopl (%rax)
<mwait_idle_with_hints.constprop.0+0x5>: jmp 0xffffffff936dabac <mwait_idle_with_hints.constprop.0+0x1c>
<mwait_idle_with_hints.constprop.0+0x7>: nopl (%rax)
<mwait_idle_with_hints.constprop.0+0xa>: mfence
<mwait_idle_with_hints.constprop.0+0xd>: mov %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x16>: clflush (%rax)
<mwait_idle_with_hints.constprop.0+0x19>: mfence
<mwait_idle_with_hints.constprop.0+0x1c>: xor %edx,%edx
<mwait_idle_with_hints.constprop.0+0x1e>: mov %rdx,%rcx
<mwait_idle_with_hints.constprop.0+0x21>: mov %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x2a>: monitor %rax,%rcx,%rdx
<mwait_idle_with_hints.constprop.0+0x2d>: mov (%rax),%rax
<mwait_idle_with_hints.constprop.0+0x30>: test $0x8,%al
<mwait_idle_with_hints.constprop.0+0x32>: jne 0xffffffff936dabdb <mwait_idle_with_hints.constprop.0+0x4b>
<mwait_idle_with_hints.constprop.0+0x34>: jmpq 0xffffffff936dabd0 <mwait_idle_with_hints.constprop.0+0x40>
<mwait_idle_with_hints.constprop.0+0x39>: verw 0x9f9fec(%rip) # 0xffffffff940d4bbc
<mwait_idle_with_hints.constprop.0+0x40>: mov $0x1,%ecx
<mwait_idle_with_hints.constprop.0+0x45>: mov %rdi,%rax
<mwait_idle_with_hints.constprop.0+0x48>: mwait %rax,%rcx
<mwait_idle_with_hints.constprop.0+0x4b>: mov %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x54>: lock andb $0xdf,0x2(%rax)
<mwait_idle_with_hints.constprop.0+0x59>: lock addl $0x0,-0x4(%rsp)
<mwait_idle_with_hints.constprop.0+0x5f>: mov (%rax),%rax
<mwait_idle_with_hints.constprop.0+0x62>: test $0x8,%al
<mwait_idle_with_hints.constprop.0+0x64>: je 0xffffffff936dac01 <mwait_idle_with_hints.constprop.0+0x71>
<mwait_idle_with_hints.constprop.0+0x66>: andl $0x7fffffff,%gs:0x6c93cf7f(%rip) # 0x17b80
<mwait_idle_with_hints.constprop.0+0x71>: retq
<mwait_idle_with_hints.constprop.0+0x72>: mov %gs:0x17bc0,%rax
<mwait_idle_with_hints.constprop.0+0x7b>: lock orb $0x20,0x2(%rax)
<mwait_idle_with_hints.constprop.0+0x80>: mov (%rax),%rax
<mwait_idle_with_hints.constprop.0+0x83>: test $0x8,%al
<mwait_idle_with_hints.constprop.0+0x85>: jne 0xffffffff936dabdb <mwait_idle_with_hints.constprop.0+0x4b>
<mwait_idle_with_hints.constprop.0+0x87>: jmpq 0xffffffff936dab95 <mwait_idle_with_hints.constprop.0+0x5>
<mwait_idle_with_hints.constprop.0+0x8c>: nopl 0x0(%rax)

0x4b is right after the mwait, which means we're panicking in
current_clr_polling(), where we do clear_thread_flag(TIF_POLLING_NRFLAG). Thanks,

Josef

2021-03-17 17:23:22

by Kai-Heng Feng

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On Wed, Mar 17, 2021 at 11:19 PM Josef Bacik <[email protected]> wrote:
>
> On 3/16/21 10:50 PM, Kai-Heng Feng wrote:
> > Hi,
> >
> > On Wed, Mar 17, 2021 at 10:17 AM Josef Bacik <[email protected]> wrote:
> >>
> >> This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
> >>
> >> This patch causes a panic when rebooting my Dell Poweredge r440. I do
> >> not have the full panic log as it's lost at that stage of the reboot and
> >> I do not have a serial console. Reverting this patch makes my system
> >> able to reboot again.
> >
> > But this patch also helps many HP laptops, so maybe we should figure
> > out what's going on on Poweredge r440.
> > Does it also panic on shutdown?
> >
>
> Sure I'll test whatever to get it fixed, but I just wasted 3 days bisecting and
> lost a weekend of performance testing on btrfs because of this regression, so
> until you figure out how it broke it needs to be reverted so people don't have
> to figure out why reboot suddenly isn't working.

That's unfortunate to hear. However, I've been spending tons of time
on bisecting kernels. To me it's just a normal part of kernel
development so I won't call it "wasted".

Feel free to revert the patch though.

>
> Running "halt" has the same effect with and without your patch, it gets to
> "system halted" and just sits there without powering off. Not entirely sure why
> that is, but there's no panic.

What about shutdown? pm_power_off_prepare() is used by shutdown but
it's not used by halt.

Kai-Heng

2021-03-17 21:30:24

by Josef Bacik

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On 3/17/21 12:14 PM, Kai-Heng Feng wrote:
> On Wed, Mar 17, 2021 at 11:19 PM Josef Bacik <[email protected]> wrote:
>>
>> On 3/16/21 10:50 PM, Kai-Heng Feng wrote:
>>> Hi,
>>>
>>> On Wed, Mar 17, 2021 at 10:17 AM Josef Bacik <[email protected]> wrote:
>>>>
>>>> This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
>>>>
>>>> This patch causes a panic when rebooting my Dell Poweredge r440. I do
>>>> not have the full panic log as it's lost at that stage of the reboot and
>>>> I do not have a serial console. Reverting this patch makes my system
>>>> able to reboot again.
>>>
>>> But this patch also helps many HP laptops, so maybe we should figure
>>> out what's going on on Poweredge r440.
>>> Does it also panic on shutdown?
>>>
>>
>> Sure I'll test whatever to get it fixed, but I just wasted 3 days bisecting and
>> lost a weekend of performance testing on btrfs because of this regression, so
>> until you figure out how it broke it needs to be reverted so people don't have
>> to figure out why reboot suddenly isn't working.
>
> That's unfortunate to hear. However, I've been spending tons of time
> on bisecting kernels. To me it's just a normal part of kernel
> development so I won't call it "wasted".
>
> Feel free to revert the patch though.
>
>>
>> Running "halt" has the same effect with and without your patch, it gets to
>> "system halted" and just sits there without powering off. Not entirely sure why
>> that is, but there's no panic.
>
> What about shutdown? pm_power_off_prepare() is used by shutdown but
> it's not used by halt.

"shutdown now" works fine with and without your patch. Thanks,

Josef

2021-03-18 05:47:29

by Kai-Heng Feng

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On Thu, Mar 18, 2021 at 1:25 AM Josef Bacik <[email protected]> wrote:
[snipped]
> "shutdown now" works fine with and without your patch. Thanks,

Rafael,
Please revert the patch while we are working on it.

Josef,
Can you please test the following patch:

diff --git a/kernel/reboot.c b/kernel/reboot.c
index eb1b15850761..263444a3fb38 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -233,6 +233,15 @@ void migrate_to_reboot_cpu(void)
set_cpus_allowed_ptr(current, cpumask_of(cpu));
}

+static void kernel_shutdown_prepare(enum system_states state)
+{
+ blocking_notifier_call_chain(&reboot_notifier_list,
+ (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
+ system_state = state;
+ usermodehelper_disable();
+ device_shutdown();
+}
+
/**
* kernel_restart - reboot the system
* @cmd: pointer to buffer containing command to execute for restart
@@ -243,7 +252,7 @@ void migrate_to_reboot_cpu(void)
*/
void kernel_restart(char *cmd)
{
- kernel_restart_prepare(cmd);
+ kernel_shutdown_prepare(SYSTEM_POWER_OFF);
if (pm_power_off_prepare)
pm_power_off_prepare();
migrate_to_reboot_cpu();
@@ -257,14 +266,6 @@ void kernel_restart(char *cmd)
}
EXPORT_SYMBOL_GPL(kernel_restart);

-static void kernel_shutdown_prepare(enum system_states state)
-{
- blocking_notifier_call_chain(&reboot_notifier_list,
- (state == SYSTEM_HALT) ? SYS_HALT : SYS_POWER_OFF, NULL);
- system_state = state;
- usermodehelper_disable();
- device_shutdown();
-}
/**
* kernel_halt - halt the system
*

>
> Josef

2021-03-18 16:01:57

by Wysocki, Rafael J

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On 3/18/2021 6:42 AM, Kai-Heng Feng wrote:
> On Thu, Mar 18, 2021 at 1:25 AM Josef Bacik <[email protected]> wrote:
> [snipped]
>> "shutdown now" works fine with and without your patch. Thanks,
> Rafael,
> Please revert the patch while we are working on it.

Done, thanks!


2021-03-18 19:05:26

by Josef Bacik

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On 3/18/21 1:42 AM, Kai-Heng Feng wrote:
> On Thu, Mar 18, 2021 at 1:25 AM Josef Bacik <[email protected]> wrote:
> [snipped]
>> "shutdown now" works fine with and without your patch. Thanks,
>
> Rafael,
> Please revert the patch while we are working on it.
>
> Josef,
> Can you please test the following patch:
>

That made it work fine. Thanks,

Josef

2021-03-19 16:25:21

by Kai-Heng Feng

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On Fri, Mar 19, 2021 at 3:02 AM Josef Bacik <[email protected]> wrote:
>
> On 3/18/21 1:42 AM, Kai-Heng Feng wrote:
> > On Thu, Mar 18, 2021 at 1:25 AM Josef Bacik <[email protected]> wrote:
> > [snipped]
> >> "shutdown now" works fine with and without your patch. Thanks,
> >
> > Rafael,
> > Please revert the patch while we are working on it.
> >
> > Josef,
> > Can you please test the following patch:
> >
>
> That made it work fine. Thanks,

So some code behaves differently depending on whether the system is
rebooting or shutting down.
Can you please find out whether any code path checks for SYSTEM_RESTART
or SYSTEM_POWER_OFF and behaves differently as a result? Most likely it
happens in a driver's .shutdown callback.

Kai-Heng

>
> Josef

2021-07-12 17:07:59

by Kai-Heng Feng

Subject: Re: [PATCH][RESEND] Revert "PM: ACPI: reboot: Use S5 for reboot"

On Sat, Mar 20, 2021 at 12:23 AM Kai-Heng Feng
<[email protected]> wrote:
>
> On Fri, Mar 19, 2021 at 3:02 AM Josef Bacik <[email protected]> wrote:
> >
> > On 3/18/21 1:42 AM, Kai-Heng Feng wrote:
> > > On Thu, Mar 18, 2021 at 1:25 AM Josef Bacik <[email protected]> wrote:
> > > [snipped]
> > >> "shutdown now" works fine with and without your patch. Thanks,
> > >
> > > Rafael,
> > > Please revert the patch while we are working on it.
> > >
> > > Josef,
> > > Can you please test the following patch:
> > >
> >
> > That made it work fine. Thanks,
>
> So some code behaves differently depending on whether the system is
> rebooting or shutting down.
> Can you please find out whether any code path checks for SYSTEM_RESTART
> or SYSTEM_POWER_OFF and behaves differently as a result? Most likely it
> happens in a driver's .shutdown callback.

A gentle ping...

>
> Kai-Heng
>
> >
> > Josef