On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
> The smp_processor_id() shouldn't be called from preemptible code.
> Instead use get_cpu() and put_cpu() which disables preemption in
> addition to getting the processor id. This fixes the following bug:
>
> [ 119.143590] sysrq: Show backtrace of all active CPUs
> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30
> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
> [ 119.145053] Call trace:
> [ 119.145093] dump_backtrace+0x0/0x1a0
> [ 119.145122] show_stack+0x18/0x70
> [ 119.145141] dump_stack+0xc4/0x11c
> [ 119.145159] check_preemption_disabled+0x100/0x110
> [ 119.145175] debug_smp_processor_id+0x20/0x30
> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0
> [ 119.145211] __handle_sysrq+0x8c/0x1a0
> [ 119.145227] write_sysrq_trigger+0x94/0x12c
> [ 119.145247] proc_reg_write+0xa8/0xe4
> [ 119.145266] vfs_write+0xec/0x280
> [ 119.145282] ksys_write+0x6c/0x100
> [ 119.145298] __arm64_sys_write+0x20/0x30
> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4
> [ 119.145332] do_el0_svc+0x24/0x8c
> [ 119.145348] el0_svc+0x10/0x20
> [ 119.145364] el0_sync_handler+0x134/0x140
> [ 119.145381] el0_sync+0x180/0x1c0
>
> Cc: [email protected]
> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
> Signed-off-by: Muhammad Usama Anjum <[email protected]>
> ---
> Changes since v2:
> - Add changelog and resend
>
> Changes since v1:
> - Add "Cc: [email protected]" tag
> ---
> drivers/tty/sysrq.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> index 23198e3f1461a..6b4a28bcf2f5f 100644
> --- a/drivers/tty/sysrq.c
> +++ b/drivers/tty/sysrq.c
> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
> if (in_hardirq())
> regs = get_irq_regs();
>
> - pr_info("CPU%d:\n", smp_processor_id());
> + pr_info("CPU%d:\n", get_cpu());
Why not call put_cpu() right here?
> if (regs)
> show_regs(regs);
> else
> show_stack(NULL, NULL, KERN_INFO);
>
> schedule_work(&sysrq_showallcpus);
> + put_cpu();
Why wait so long here after you have scheduled work? Please drop the
cpu reference right away, you don't need to hold it for this length of
time, right?
thanks,
greg k-h
On 18. 09. 23, 10:10, Greg Kroah-Hartman wrote:
> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
>> The smp_processor_id() shouldn't be called from preemptible code.
>> Instead use get_cpu() and put_cpu() which disables preemption in
>> addition to getting the processor id. This fixes the following bug:
>>
>> [ 119.143590] sysrq: Show backtrace of all active CPUs
>> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
>> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30
>> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
>> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
>> [ 119.145053] Call trace:
>> [ 119.145093] dump_backtrace+0x0/0x1a0
>> [ 119.145122] show_stack+0x18/0x70
>> [ 119.145141] dump_stack+0xc4/0x11c
>> [ 119.145159] check_preemption_disabled+0x100/0x110
>> [ 119.145175] debug_smp_processor_id+0x20/0x30
>> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0
>> [ 119.145211] __handle_sysrq+0x8c/0x1a0
>> [ 119.145227] write_sysrq_trigger+0x94/0x12c
>> [ 119.145247] proc_reg_write+0xa8/0xe4
>> [ 119.145266] vfs_write+0xec/0x280
>> [ 119.145282] ksys_write+0x6c/0x100
>> [ 119.145298] __arm64_sys_write+0x20/0x30
>> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4
>> [ 119.145332] do_el0_svc+0x24/0x8c
>> [ 119.145348] el0_svc+0x10/0x20
>> [ 119.145364] el0_sync_handler+0x134/0x140
>> [ 119.145381] el0_sync+0x180/0x1c0
>>
>> Cc: [email protected]
>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
>> Signed-off-by: Muhammad Usama Anjum <[email protected]>
>> ---
>> Changes since v2:
>> - Add changelog and resend
>>
>> Changes since v1:
>> - Add "Cc: [email protected]" tag
>> ---
>> drivers/tty/sysrq.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>> index 23198e3f1461a..6b4a28bcf2f5f 100644
>> --- a/drivers/tty/sysrq.c
>> +++ b/drivers/tty/sysrq.c
>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>> if (in_hardirq())
>> regs = get_irq_regs();
>>
>> - pr_info("CPU%d:\n", smp_processor_id());
>> + pr_info("CPU%d:\n", get_cpu());
>
> Why not call put_cpu() right here?
>
>> if (regs)
>> show_regs(regs);
>> else
>> show_stack(NULL, NULL, KERN_INFO);
>>
>> schedule_work(&sysrq_showallcpus);
>> + put_cpu();
>
> Why wait so long here after you have scheduled work? Please drop the
> cpu reference right away, you don't need to hold it for this length of
> time, right?
As I understand it, this way, schedule_work() will queue the work on the
"gotten" (current) CPU. So sysrq_showregs_othercpus() will really dump
other than the "gotten" cpu.
If that is the case, it indeed should have been described in the commit log.
regards,
--
js
suse labs
On Tue, Sep 19, 2023 at 07:52:42AM +0200, Jiri Slaby wrote:
> On 18. 09. 23, 10:10, Greg Kroah-Hartman wrote:
> > On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
> > > The smp_processor_id() shouldn't be called from preemptible code.
> > > Instead use get_cpu() and put_cpu() which disables preemption in
> > > addition to getting the processor id. This fixes the following bug:
> > >
> > > [ 119.143590] sysrq: Show backtrace of all active CPUs
> > > [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
> > > [ 119.144586] caller is debug_smp_processor_id+0x20/0x30
> > > [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
> > > [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
> > > [ 119.145053] Call trace:
> > > [ 119.145093] dump_backtrace+0x0/0x1a0
> > > [ 119.145122] show_stack+0x18/0x70
> > > [ 119.145141] dump_stack+0xc4/0x11c
> > > [ 119.145159] check_preemption_disabled+0x100/0x110
> > > [ 119.145175] debug_smp_processor_id+0x20/0x30
> > > [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0
> > > [ 119.145211] __handle_sysrq+0x8c/0x1a0
> > > [ 119.145227] write_sysrq_trigger+0x94/0x12c
> > > [ 119.145247] proc_reg_write+0xa8/0xe4
> > > [ 119.145266] vfs_write+0xec/0x280
> > > [ 119.145282] ksys_write+0x6c/0x100
> > > [ 119.145298] __arm64_sys_write+0x20/0x30
> > > [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4
> > > [ 119.145332] do_el0_svc+0x24/0x8c
> > > [ 119.145348] el0_svc+0x10/0x20
> > > [ 119.145364] el0_sync_handler+0x134/0x140
> > > [ 119.145381] el0_sync+0x180/0x1c0
> > >
> > > Cc: [email protected]
> > > Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
> > > Signed-off-by: Muhammad Usama Anjum <[email protected]>
> > > ---
> > > Changes since v2:
> > > - Add changelog and resend
> > >
> > > Changes since v1:
> > > - Add "Cc: [email protected]" tag
> > > ---
> > > drivers/tty/sysrq.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
> > > index 23198e3f1461a..6b4a28bcf2f5f 100644
> > > --- a/drivers/tty/sysrq.c
> > > +++ b/drivers/tty/sysrq.c
> > > @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
> > > if (in_hardirq())
> > > regs = get_irq_regs();
> > > - pr_info("CPU%d:\n", smp_processor_id());
> > > + pr_info("CPU%d:\n", get_cpu());
> >
> > Why not call put_cpu() right here?
> >
> > > if (regs)
> > > show_regs(regs);
> > > else
> > > show_stack(NULL, NULL, KERN_INFO);
> > > schedule_work(&sysrq_showallcpus);
> > > + put_cpu();
> >
> > Why wait so long here after you have scheduled work? Please drop the
> > cpu reference right away, you don't need to hold it for this length of
> > time, right?
>
> As I understand it, this way, schedule_work() will queue the work on the
> "gotten" (current) CPU. So sysrq_showregs_othercpus() will really dump other
> than the "gotten" cpu.
Ok, that makes a bit more sense, but that's not what the code does
today, have people seen the regs dumped from the wrong cpu in the past?
> If that is the case, it indeed should have been described in the commit log.
Agreed.
thanks for the review,
greg k-h
On 9/19/23 12:13 PM, Greg Kroah-Hartman wrote:
> On Tue, Sep 19, 2023 at 07:52:42AM +0200, Jiri Slaby wrote:
>> On 18. 09. 23, 10:10, Greg Kroah-Hartman wrote:
>>> On Tue, Aug 22, 2023 at 03:26:06PM +0500, Muhammad Usama Anjum wrote:
>>>> The smp_processor_id() shouldn't be called from preemptible code.
>>>> Instead use get_cpu() and put_cpu() which disables preemption in
>>>> addition to getting the processor id. This fixes the following bug:
>>>>
>>>> [ 119.143590] sysrq: Show backtrace of all active CPUs
>>>> [ 119.143902] BUG: using smp_processor_id() in preemptible [00000000] code: bash/873
>>>> [ 119.144586] caller is debug_smp_processor_id+0x20/0x30
>>>> [ 119.144827] CPU: 6 PID: 873 Comm: bash Not tainted 5.10.124-dirty #3
>>>> [ 119.144861] Hardware name: QEMU QEMU Virtual Machine, BIOS 2023.05-1 07/22/2023
>>>> [ 119.145053] Call trace:
>>>> [ 119.145093] dump_backtrace+0x0/0x1a0
>>>> [ 119.145122] show_stack+0x18/0x70
>>>> [ 119.145141] dump_stack+0xc4/0x11c
>>>> [ 119.145159] check_preemption_disabled+0x100/0x110
>>>> [ 119.145175] debug_smp_processor_id+0x20/0x30
>>>> [ 119.145195] sysrq_handle_showallcpus+0x20/0xc0
>>>> [ 119.145211] __handle_sysrq+0x8c/0x1a0
>>>> [ 119.145227] write_sysrq_trigger+0x94/0x12c
>>>> [ 119.145247] proc_reg_write+0xa8/0xe4
>>>> [ 119.145266] vfs_write+0xec/0x280
>>>> [ 119.145282] ksys_write+0x6c/0x100
>>>> [ 119.145298] __arm64_sys_write+0x20/0x30
>>>> [ 119.145315] el0_svc_common.constprop.0+0x78/0x1e4
>>>> [ 119.145332] do_el0_svc+0x24/0x8c
>>>> [ 119.145348] el0_svc+0x10/0x20
>>>> [ 119.145364] el0_sync_handler+0x134/0x140
>>>> [ 119.145381] el0_sync+0x180/0x1c0
>>>>
>>>> Cc: [email protected]
>>>> Fixes: 47cab6a722d4 ("debug lockups: Improve lockup detection, fix generic arch fallback")
>>>> Signed-off-by: Muhammad Usama Anjum <[email protected]>
>>>> ---
>>>> Changes since v2:
>>>> - Add changelog and resend
>>>>
>>>> Changes since v1:
>>>> - Add "Cc: [email protected]" tag
>>>> ---
>>>> drivers/tty/sysrq.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c
>>>> index 23198e3f1461a..6b4a28bcf2f5f 100644
>>>> --- a/drivers/tty/sysrq.c
>>>> +++ b/drivers/tty/sysrq.c
>>>> @@ -262,13 +262,14 @@ static void sysrq_handle_showallcpus(u8 key)
>>>> if (in_hardirq())
>>>> regs = get_irq_regs();
>>>> - pr_info("CPU%d:\n", smp_processor_id());
>>>> + pr_info("CPU%d:\n", get_cpu());
>>>
>>> Why not call put_cpu() right here?
>>>
>>>> if (regs)
>>>> show_regs(regs);
>>>> else
>>>> show_stack(NULL, NULL, KERN_INFO);
>>>> schedule_work(&sysrq_showallcpus);
>>>> + put_cpu();
>>>
>>> Why wait so long here after you have scheduled work? Please drop the
>>> cpu reference right away, you don't need to hold it for this length of
>>> time, right?
>>
>> As I understand it, this way, schedule_work() will queue the work on the
>> "gotten" (current) CPU. So sysrq_showregs_othercpus() will really dump other
>> than the "gotten" cpu.
>
> Ok, that makes a bit more sense, but that's not what the code does
> today, have people seen the regs dumped from the wrong cpu in the past?
>
>> If that is the case, it indeed should have been described in the commit log.
Thanks for review. I'll add the explanation in the commit log and send again.
>
> Agreed.
>
> thanks for the review,
>
> greg k-h
--
BR,
Muhammad Usama Anjum