2023-07-26 19:26:50

by Dragos Tatulea

Subject: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

The ndev was accessed on shutdown without a check if it actually exists.
This triggered the crash pasted below. This patch simply adds a check
before using ndev.

BUG: kernel NULL pointer dereference, address: 0000000000000300
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x20/0x60
? page_fault_oops+0x14c/0x3c0
? exc_page_fault+0x75/0x140
? asm_exc_page_fault+0x22/0x30
? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
device_shutdown+0x13e/0x1e0
kernel_restart+0x36/0x90
__do_sys_reboot+0x141/0x210
? vfs_writev+0xcd/0x140
? handle_mm_fault+0x161/0x260
? do_writev+0x6b/0x110
do_syscall_64+0x3d/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f496990fb56
RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
</TASK>
CR2: 0000000000000300
---[ end trace 0000000000000000 ]---

Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 9138ef2fb2c8..e2e7ebd71798 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device *auxdev)
        mgtdev = auxiliary_get_drvdata(auxdev);
        ndev = mgtdev->ndev;

-       free_irqs(ndev);
+       if (ndev)
+               free_irqs(ndev);
 }

static const struct auxiliary_device_id mlx5v_id_table[] = {
--
2.41.0
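
The guard added by the patch can be illustrated outside the kernel. The following is only a user-space sketch: the `mock_*` names and the `irqs_held` field are hypothetical stand-ins, not the driver's real structures, and the counter exists purely to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structs live in the mlx5_vdpa driver. */
struct mock_ndev {
	int irqs_held;		/* IRQ vectors currently allocated */
};

struct mock_mgmtdev {
	struct mock_ndev *ndev;	/* NULL while no vdpa device exists */
};

static int free_irqs_calls;	/* counts calls, for illustration only */

/* Dereferences ndev, so it must never be called with NULL. */
static void mock_free_irqs(struct mock_ndev *ndev)
{
	assert(ndev != NULL);
	ndev->irqs_held = 0;
	free_irqs_calls++;
}

/* Same shape as the patched shutdown path: skip when there is no ndev. */
static void mock_shutdown(struct mock_mgmtdev *mgtdev)
{
	struct mock_ndev *ndev = mgtdev->ndev;

	if (ndev)
		mock_free_irqs(ndev);
}
```

Calling `mock_shutdown()` on a management device whose `ndev` is NULL does nothing; once a device is attached, the same call releases its IRQs.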



2023-07-26 19:58:32

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> The ndev was accessed on shutdown without a check if it actually exists.
> This triggered the crash pasted below. This patch simply adds a check
> before using ndev.
>
> Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> Signed-off-by: Dragos Tatulea <[email protected]>
> ---
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> index 9138ef2fb2c8..e2e7ebd71798 100644
> --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device *auxdev)
>         mgtdev = auxiliary_get_drvdata(auxdev);
>         ndev = mgtdev->ndev;
>
> -       free_irqs(ndev);
> +       if (ndev)
> +               free_irqs(ndev);
>  }
>

something I don't get:
irqs are allocated in mlx5_vdpa_dev_add
why are they not freed in mlx5_vdpa_dev_del?

this is what's creating all this mess.



> static const struct auxiliary_device_id mlx5v_id_table[] = {
> --
> 2.41.0


2023-07-27 16:26:15

by Dragos Tatulea

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > The ndev was accessed on shutdown without a check if it actually exists.
> > This triggered the crash pasted below. This patch simply adds a check
> > before using ndev.
> >
> > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > Signed-off-by: Dragos Tatulea <[email protected]>
> > ---
> >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > index 9138ef2fb2c8..e2e7ebd71798 100644
> > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > *auxdev)
> >         mgtdev = auxiliary_get_drvdata(auxdev);
> >         ndev = mgtdev->ndev;
> >  
> > -       free_irqs(ndev);
> > +       if (ndev)
> > +               free_irqs(ndev);
> >  }
> >  
>
> something I don't get:
> irqs are allocated in mlx5_vdpa_dev_add
> why are they not freed in mlx5_vdpa_dev_del?
>
That is a good point. I will try to find out. I also don't get why free_irqs is
called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can change
that in a separate refactoring.

> this is what's creating all this mess.
>
>
Not quite: mlx5_vdpa_dev_del (which is the .dev_del op of struct
vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I see. Or
am I missing something?

> >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> > --
> > 2.41.0
>
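
The lifecycle under discussion can be sketched in user space. This is a simplified model under stated assumptions, not the driver's code: all `lc_*` names are hypothetical, the add step takes the IRQs (as the thread says mlx5_vdpa_dev_add does), and the release is folded into the del step, which is the arrangement asked about above rather than what the driver is reported to do. Clearing the pointer in the del step is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct lc_ndev {
	int irqs_held;		/* pretend IRQ vectors owned by the device */
};

struct lc_mgmtdev {
	struct lc_ndev *ndev;	/* NULL until a device is added */
};

/* dev_add analogue: creates the device and takes its IRQs. */
static int lc_dev_add(struct lc_mgmtdev *m, int nirqs)
{
	if (m->ndev)
		return -1;	/* this model allows one device at a time */
	m->ndev = calloc(1, sizeof(*m->ndev));
	if (!m->ndev)
		return -1;
	m->ndev->irqs_held = nirqs;
	return 0;
}

/* dev_del analogue: drops the IRQs with the device, clears the pointer. */
static void lc_dev_del(struct lc_mgmtdev *m)
{
	if (!m->ndev)
		return;
	m->ndev->irqs_held = 0;
	free(m->ndev);
	m->ndev = NULL;
}

/*
 * shutdown analogue: runs whether or not a device was ever added and
 * without lc_dev_del() having been called first. Returns how many
 * IRQs it released.
 */
static int lc_shutdown(struct lc_mgmtdev *m)
{
	int released;

	assert(m != NULL);	/* the management device itself always exists */
	if (!m->ndev)
		return 0;
	released = m->ndev->irqs_held;
	m->ndev->irqs_held = 0;
	return released;
}
```

In this model `lc_shutdown()` is safe before any add, after an add, when called twice, and after a del, which are the states the real callback would also have to tolerate.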

2023-07-27 16:49:08

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > The ndev was accessed on shutdown without a check if it actually exists.
> > > This triggered the crash pasted below. This patch simply adds a check
> > > before using ndev.
> > >
> > > ?BUG: kernel NULL pointer dereference, address: 0000000000000300
> > > ?#PF: supervisor read access in kernel mode
> > > ?#PF: error_code(0x0000) - not-present page
> > > ?PGD 0 P4D 0
> > > ?Oops: 0000 [#1] SMP
> > > ?CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > ?Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > ?RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > ?RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > ?RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
> > > ?RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
> > > ?RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
> > > ?R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
> > > ?R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
> > > ?FS:? 00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > knlGS:0000000000000000
> > > ?CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > ?CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
> > > ?DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > ?DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > ?Call Trace:
> > > ? <TASK>
> > > ? ? __die+0x20/0x60
> > > ? ? page_fault_oops+0x14c/0x3c0
> > > ? ? exc_page_fault+0x75/0x140
> > > ? ? asm_exc_page_fault+0x22/0x30
> > > ? ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > ? device_shutdown+0x13e/0x1e0
> > > ? kernel_restart+0x36/0x90
> > > ? __do_sys_reboot+0x141/0x210
> > > ? ? vfs_writev+0xcd/0x140
> > > ? ? handle_mm_fault+0x161/0x260
> > > ? ? do_writev+0x6b/0x110
> > > ? do_syscall_64+0x3d/0x90
> > > ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > ?RIP: 0033:0x7f496990fb56
> > > ?RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
> > > ?RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
> > > ?RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
> > > ?RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
> > > ?R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > ?R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
> > > ? </TASK>
> > > ?CR2: 0000000000000300
> > > ?---[ end trace 0000000000000000 ]---
> > >
> > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > > *auxdev)
> > > ????????mgtdev = auxiliary_get_drvdata(auxdev);
> > > ????????ndev = mgtdev->ndev;
> > > ?
> > > -???????free_irqs(ndev);
> > > +???????if (ndev)
> > > +???????????????free_irqs(ndev);
> > > ?}
> > > ?
> >
> > something I don't get:
> > irqs are allocated in mlx5_vdpa_dev_add
> > why are they not freed in mlx5_vdpa_dev_del?
> >
> That is a good point. I will try to find out. I also don't get why free_irq is
> called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can change
> that in a different refactoring.

as it is I have no idea whether e.g. ndev can change
between these two call sites. that would make the check
pointless.

> > this is what's creating all this mess.
> >
> >
> Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I see. Or
> am I missing something?

and why do we care whether irqs are freed on shutdown?

> > > >  static const struct auxiliary_device_id mlx5v_id_table[] = {
> > > --
> > > 2.41.0
> >
>


2023-07-31 07:51:17

by Dragos Tatulea

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > The ndev was accessed on shutdown without a check if it actually exists.
> > > > This triggered the crash pasted below. This patch simply adds a check
> > > > before using ndev.
> > > >
> > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > ---
> > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > > > *auxdev)
> > > >         mgtdev = auxiliary_get_drvdata(auxdev);
> > > >         ndev = mgtdev->ndev;
> > > >  
> > > > -       free_irqs(ndev);
> > > > +       if (ndev)
> > > > +               free_irqs(ndev);
> > > >  }
> > > >  
> > >
> > > something I don't get:
> > > irqs are allocated in mlx5_vdpa_dev_add
> > > why are they not freed in mlx5_vdpa_dev_del?
> > >
> > That is a good point. I will try to find out. I also don't get why free_irq
> > is
> > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can
> > change
> > that in a different refactoring.
>
> as it is I have no idea whether e.g. ndev can change
> between these two call sites. that would make the check
> pointless.
>
> > > this is what's creating all this mess.
> > >
> > >
> > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > see. Or
> > am I missing something?
>
> and why do we care whether irqs are freed on shutdown?
>
Had to ask around a bit to find out the answer: there can be issues with kexec
IRQ allocation on some platforms. It is documented here [0] for mlx5_core.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129

Thanks,
Dragos

2023-07-31 09:20:21

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > The ndev was accessed on shutdown without a check if it actually exists.
> > > > > This triggered the crash pasted below. This patch simply adds a check
> > > > > before using ndev.
> > > > >
> > > > > ?BUG: kernel NULL pointer dereference, address: 0000000000000300
> > > > > ?#PF: supervisor read access in kernel mode
> > > > > ?#PF: error_code(0x0000) - not-present page
> > > > > ?PGD 0 P4D 0
> > > > > ?Oops: 0000 [#1] SMP
> > > > > ?CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > ?Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > ?RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > ?RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > ?RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
> > > > > ?RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
> > > > > ?RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
> > > > > ?R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
> > > > > ?R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
> > > > > ?FS:? 00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > > > knlGS:0000000000000000
> > > > > ?CS:? 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > ?CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
> > > > > ?DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > ?DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > ?Call Trace:
> > > > > ? <TASK>
> > > > > ? ? __die+0x20/0x60
> > > > > ? ? page_fault_oops+0x14c/0x3c0
> > > > > ? ? exc_page_fault+0x75/0x140
> > > > > ? ? asm_exc_page_fault+0x22/0x30
> > > > > ? ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > ? device_shutdown+0x13e/0x1e0
> > > > > ? kernel_restart+0x36/0x90
> > > > > ? __do_sys_reboot+0x141/0x210
> > > > > ? ? vfs_writev+0xcd/0x140
> > > > > ? ? handle_mm_fault+0x161/0x260
> > > > > ? ? do_writev+0x6b/0x110
> > > > > ? do_syscall_64+0x3d/0x90
> > > > > ? entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > ?RIP: 0033:0x7f496990fb56
> > > > > ?RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
> > > > > ?RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
> > > > > ?RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
> > > > > ?RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
> > > > > ?R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > > ?R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
> > > > > ? </TASK>
> > > > > ?CR2: 0000000000000300
> > > > > ?---[ end trace 0000000000000000 ]---
> > > > >
> > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > ---
> > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > > > > *auxdev)
> > > > > ????????mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > ????????ndev = mgtdev->ndev;
> > > > > ?
> > > > > -???????free_irqs(ndev);
> > > > > +???????if (ndev)
> > > > > +???????????????free_irqs(ndev);
> > > > > ?}
> > > > > ?
> > > >
> > > > something I don't get:
> > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > why are they not freed in mlx5_vdpa_dev_del?
> > > >
> > > That is a good point. I will try to find out. I also don't get why free_irq
> > > is
> > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can
> > > change
> > > that in a different refactoring.
> >
> > as it is I have no idea whether e.g. ndev can change
> > between these two call sites. that would make the check
> > pointless.
> >
> > > > this is what's creating all this mess.
> > > >
> > > >
> > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > > see. Or
> > > am I missing something?
> >
> > and why do we care whether irqs are freed on shutdown?
> >
> Had to ask around a bit to find out the answer: there can be issues with kexec
> IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
>
> Thanks,
> Dragos

It's quite weird.
* Some platforms requiring freeing the IRQ's in the shutdown
* flow. If they aren't freed they can't be allocated after
* kexec. There is no need to cleanup the mlx5_core software
* contexts.

but most drivers don't have a shutdown callback, so how do they work then?
do you know which platforms these are?

I don't really know much about why shutdown callback is even necessary.
I guess this is to detect shutdown and do a faster cleanup than
the slow, graceful removal, just cleaning hardware resources?


--
MST
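
As background to the question about drivers without a shutdown callback, here is a simplified user-space model of an optional callback. It is a sketch under assumptions: the `core_*` and `demo_*` names are hypothetical, the walk order is plain forward iteration, and the only behaviour it models is that a driver which supplies no shutdown hook is skipped. It does not explain which platforms need IRQs freed before kexec.

```c
#include <assert.h>
#include <stddef.h>

struct core_driver {
	const char *name;
	void (*remove)(void *priv);	/* full teardown on unbind */
	void (*shutdown)(void *priv);	/* optional: quiesce on reboot/kexec */
};

struct core_device {
	const struct core_driver *drv;
	void *priv;
};

/*
 * Calls shutdown only where the bound driver provides one.
 * Returns the number of callbacks invoked.
 */
static int core_device_shutdown(struct core_device *devs, size_t n)
{
	int called = 0;

	for (size_t i = 0; i < n; i++) {
		const struct core_driver *drv = devs[i].drv;

		if (drv && drv->shutdown) {
			drv->shutdown(devs[i].priv);
			called++;
		}
	}
	return called;
}

/* Example shutdown hook: releases a pretend IRQ count and nothing else. */
static void demo_release_irqs(void *priv)
{
	int *irqs_held = priv;

	assert(irqs_held != NULL);
	*irqs_held = 0;
}
```

A device bound to a driver with `shutdown == NULL` is simply left alone by `core_device_shutdown()`, while one that installs `demo_release_irqs` has only its hardware-facing resource dropped, without running the full `remove` path.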


2023-08-01 04:29:00

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin <[email protected]> wrote:
>
> On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > The ndev was accessed on shutdown without a check if it actually exists.
> > > > > > This triggered the crash pasted below. This patch simply adds a check
> > > > > > before using ndev.
> > > > > >
> > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000300
> > > > > > #PF: supervisor read access in kernel mode
> > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > PGD 0 P4D 0
> > > > > > Oops: 0000 [#1] SMP
> > > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-
> > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > > RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
> > > > > > RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
> > > > > > RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
> > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
> > > > > > R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
> > > > > > FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > > > > knlGS:0000000000000000
> > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > Call Trace:
> > > > > > <TASK>
> > > > > > ? __die+0x20/0x60
> > > > > > ? page_fault_oops+0x14c/0x3c0
> > > > > > ? exc_page_fault+0x75/0x140
> > > > > > ? asm_exc_page_fault+0x22/0x30
> > > > > > ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > device_shutdown+0x13e/0x1e0
> > > > > > kernel_restart+0x36/0x90
> > > > > > __do_sys_reboot+0x141/0x210
> > > > > > ? vfs_writev+0xcd/0x140
> > > > > > ? handle_mm_fault+0x161/0x260
> > > > > > ? do_writev+0x6b/0x110
> > > > > > do_syscall_64+0x3d/0x90
> > > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > > RIP: 0033:0x7f496990fb56
> > > > > > RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
> > > > > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
> > > > > > RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
> > > > > > RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
> > > > > > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > > > R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
> > > > > > </TASK>
> > > > > > CR2: 0000000000000300
> > > > > > ---[ end trace 0000000000000000 ]---
> > > > > >
> > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > > ---
> > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct auxiliary_device
> > > > > > *auxdev)
> > > > > > >         mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > >         ndev = mgtdev->ndev;
> > > > > > >
> > > > > > > -       free_irqs(ndev);
> > > > > > > +       if (ndev)
> > > > > > > +               free_irqs(ndev);
> > > > > > >  }
> > > > > >
> > > > >
> > > > > something I don't get:
> > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > >
> > > > That is a good point. I will try to find out. I also don't get why free_irq
> > > > is
> > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I can
> > > > change
> > > > that in a different refactoring.
> > >
> > > as it is I have no idea whether e.g. ndev can change
> > > between these two call sites. that would make the check
> > > pointless.
> > >
> > > > > this is what's creating all this mess.
> > > > >
> > > > >
> > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what I
> > > > see. Or
> > > > am I missing something?
> > >
> > > and why do we care whether irqs are freed on shutdown?
> > >
> > Had to ask around a bit to find out the answer: there can be issues with kexec
> > IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
> >
> > Thanks,
> > Dragos
>
> It's quite weird.
> * Some platforms requiring freeing the IRQ's in the shutdown
> * flow. If they aren't freed they can't be allocated after
> * kexec. There is no need to cleanup the mlx5_core software
> * contexts.
>
> but most drivers don't have a shutdown callback how do they work then?
> do you know which platforms these are?

There used to be BZs that required virtio drivers to add a shutdown
callback to fix kexec:

https://bugzilla.redhat.com/show_bug.cgi?id=2108406

Thanks

>
> I don't really know much about why shutdown callback is even necessary.
> I guess this is to detect shutdown and do a faster cleanup than
> the slow, graceful removal, just cleaning hardware resources?
>
>
> --
> MST
>


2023-08-01 10:17:07

by Dragos Tatulea

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Tue, 2023-08-01 at 11:59 +0800, Jason Wang wrote:
> On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin <[email protected]> wrote:
> >
> > On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > > The ndev was accessed on shutdown without a check if it actually
> > > > > > > exists.
> > > > > > > This triggered the crash pasted below. This patch simply adds a
> > > > > > > check
> > > > > > > before using ndev.
> > > > > > >
> > > > > > >  BUG: kernel NULL pointer dereference, address: 0000000000000300
> > > > > > >  #PF: supervisor read access in kernel mode
> > > > > > >  #PF: error_code(0x0000) - not-present page
> > > > > > >  PGD 0 P4D 0
> > > > > > >  Oops: 0000 [#1] SMP
> > > > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-
> > > > > > > 1.13.0-0-
> > > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > >  RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > > >  RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
> > > > > > >  RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
> > > > > > >  RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
> > > > > > >  R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
> > > > > > >  R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
> > > > > > >  FS:  00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > > > > > knlGS:0000000000000000
> > > > > > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > >  CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
> > > > > > >  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > >  Call Trace:
> > > > > > >   <TASK>
> > > > > > >   ? __die+0x20/0x60
> > > > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > > > >   ? exc_page_fault+0x75/0x140
> > > > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > >   device_shutdown+0x13e/0x1e0
> > > > > > >   kernel_restart+0x36/0x90
> > > > > > >   __do_sys_reboot+0x141/0x210
> > > > > > >   ? vfs_writev+0xcd/0x140
> > > > > > >   ? handle_mm_fault+0x161/0x260
> > > > > > >   ? do_writev+0x6b/0x110
> > > > > > >   do_syscall_64+0x3d/0x90
> > > > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > > >  RIP: 0033:0x7f496990fb56
> > > > > > >  RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX:
> > > > > > > 00000000000000a9
> > > > > > >  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
> > > > > > >  RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
> > > > > > >  RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
> > > > > > >  R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > > > >  R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
> > > > > > >   </TASK>
> > > > > > >  CR2: 0000000000000300
> > > > > > >  ---[ end trace 0000000000000000 ]---
> > > > > > >
> > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > > > ---
> > > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > >
> > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct
> > > > > > > auxiliary_device
> > > > > > > *auxdev)
> > > > > > >         mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > >         ndev = mgtdev->ndev;
> > > > > > >
> > > > > > > -       free_irqs(ndev);
> > > > > > > +       if (ndev)
> > > > > > > +               free_irqs(ndev);
> > > > > > >  }
> > > > > > >
> > > > > >
> > > > > > something I don't get:
> > > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > > >
> > > > > That is a good point. I will try to find out. I also don't get why
> > > > > free_irq
> > > > > is
> > > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I
> > > > > can
> > > > > change
> > > > > that in a different refactoring.
> > > >
> > > > as it is I have no idea whether e.g. ndev can change
> > > > between these two call sites. that would make the check
> > > > pointless.
> > > >
> > > > > > this is what's creating all this mess.
> > > > > >
> > > > > >
> > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what
> > > > > I
> > > > > see. Or
> > > > > am I missing something?
> > > >
> > > > and why do we care whether irqs are freed on shutdown?
> > > >
> > > Had to ask around a bit to find out the answer: there can be issues with
> > > kexec
> > > IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
> > >
> > > Thanks,
> > > Dragos
> >
> > It's quite weird.
> >          * Some platforms requiring freeing the IRQ's in the shutdown
> >          * flow. If they aren't freed they can't be allocated after
> >          * kexec. There is no need to cleanup the mlx5_core software
> >          * contexts.
> >
> > but most drivers don't have a shutdown callback how do they work then?
> > do you know which platforms these are?
>
I don't. x86_64 is not one of them though. I will do some more digging ...

> There used to be bzs that requires virtio drivers to add a shutdown to
> fix kexec:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=2108406
>
I don't have access to this. What is it about?

Thanks,
Dragos
> Thanks
>
> >
> > I don't really know much about why shutdown callback is even necessary.
> > I guess this is to detect shutdown and do a faster cleanup than
> > the slow, graceful removal, just cleaning hardware resources?
> >
> >
> > --
> > MST
> >
>

2023-08-02 03:20:46

by Jason Wang

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Tue, Aug 1, 2023 at 4:17 PM Dragos Tatulea <[email protected]> wrote:
>
> On Tue, 2023-08-01 at 11:59 +0800, Jason Wang wrote:
> > On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin <[email protected]> wrote:
> > >
> > > On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> > > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > > > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > > > The ndev was accessed on shutdown without a check if it actually
> > > > > > > > exists.
> > > > > > > > This triggered the crash pasted below. This patch simply adds a
> > > > > > > > check
> > > > > > > > before using ndev.
> > > > > > > >
> > > > > > > > BUG: kernel NULL pointer dereference, address: 0000000000000300
> > > > > > > > #PF: supervisor read access in kernel mode
> > > > > > > > #PF: error_code(0x0000) - not-present page
> > > > > > > > PGD 0 P4D 0
> > > > > > > > Oops: 0000 [#1] SMP
> > > > > > > > CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > > > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-
> > > > > > > > 1.13.0-0-
> > > > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > > > > RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
> > > > > > > > RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
> > > > > > > > RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
> > > > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
> > > > > > > > R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
> > > > > > > > FS: 00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > > > > > > knlGS:0000000000000000
> > > > > > > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
> > > > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > Call Trace:
> > > > > > > > <TASK>
> > > > > > > > ? __die+0x20/0x60
> > > > > > > > ? page_fault_oops+0x14c/0x3c0
> > > > > > > > ? exc_page_fault+0x75/0x140
> > > > > > > > ? asm_exc_page_fault+0x22/0x30
> > > > > > > > ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > device_shutdown+0x13e/0x1e0
> > > > > > > > kernel_restart+0x36/0x90
> > > > > > > > __do_sys_reboot+0x141/0x210
> > > > > > > > ? vfs_writev+0xcd/0x140
> > > > > > > > ? handle_mm_fault+0x161/0x260
> > > > > > > > ? do_writev+0x6b/0x110
> > > > > > > > do_syscall_64+0x3d/0x90
> > > > > > > > entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > > > > RIP: 0033:0x7f496990fb56
> > > > > > > > RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX:
> > > > > > > > 00000000000000a9
> > > > > > > > RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
> > > > > > > > RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
> > > > > > > > RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > > R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > > > > > R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
> > > > > > > > </TASK>
> > > > > > > > CR2: 0000000000000300
> > > > > > > > ---[ end trace 0000000000000000 ]---
> > > > > > > >
> > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > > > > ---
> > > > > > > > drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct
> > > > > > > > auxiliary_device
> > > > > > > > *auxdev)
> > > > > > > > mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > > > ndev = mgtdev->ndev;
> > > > > > > >
> > > > > > > > - free_irqs(ndev);
> > > > > > > > + if (ndev)
> > > > > > > > + free_irqs(ndev);
> > > > > > > > }
> > > > > > > >
> > > > > > >
> > > > > > > something I don't get:
> > > > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > > > >
> > > > > > That is a good point. I will try to find out. I also don't get why
> > > > > > free_irq
> > > > > > is
> > > > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del. Maybe I
> > > > > > can
> > > > > > change
> > > > > > that in a different refactoring.
> > > > >
> > > > > as it is I have no idea whether e.g. ndev can change
> > > > > between these two call sites. that would make the check
> > > > > pointless.
> > > > >
> > > > > > > this is what's creating all this mess.
> > > > > > >
> > > > > > >
> > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's what
> > > > > > I
> > > > > > see. Or
> > > > > > am I missing something?
> > > > >
> > > > > and why do we care whether irqs are freed on shutdown?
> > > > >
> > > > Had to ask around a bit to find out the answer: there can be issues with
> > > > kexec
> > > > IRQ allocation on some platforms. It is documented here [0] for mlx5_core.
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
> > > >
> > > > Thanks,
> > > > Dragos
> > >
> > > It's quite weird.
> > > * Some platforms requiring freeing the IRQ's in the shutdown
> > > * flow. If they aren't freed they can't be allocated after
> > > * kexec. There is no need to cleanup the mlx5_core software
> > > * contexts.
> > >
> > > but most drivers don't have a shutdown callback how do they work then?
> > > do you know which platforms these are?
> >
> I don't. x86_64 is not one of them though. I will do some more digging ...
>
> > There used to be bzs that requires virtio drivers to add a shutdown to
> > fix kexec:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=2108406
> >
> I don't have access to this. What is it about?

This bug might be more accurate:

https://bugzilla.redhat.com/show_bug.cgi?id=1820521

It's about the kexec guys (CCed the relevant people) wanting to add a
shutdown method for virtio to fix potential kexec issues.

Thanks

>
> Thanks,
> Dragos
> > Thanks
> >
> > >
> > > I don't really know much about why shutdown callback is even necessary.
> > > I guess this is to detect shutdown and do a faster cleanup than
> > > the slow, graceful removal, just cleaning hardware resources?
> > >
> > >
> > > --
> > > MST
> > >
> >
>


2023-08-02 08:53:56

by Dragos Tatulea

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Wed, 2023-08-02 at 10:51 +0800, Jason Wang wrote:
> On Tue, Aug 1, 2023 at 4:17 PM Dragos Tatulea <[email protected]> wrote:
> >
> > On Tue, 2023-08-01 at 11:59 +0800, Jason Wang wrote:
> > > On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin <[email protected]> wrote:
> > > >
> > > > On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> > > > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > > > > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > > > > The ndev was accessed on shutdown without a check if it
> > > > > > > > > actually
> > > > > > > > > exists.
> > > > > > > > > This triggered the crash pasted below. This patch simply adds
> > > > > > > > > a
> > > > > > > > > check
> > > > > > > > > before using ndev.
> > > > > > > > >
> > > > > > > > >  BUG: kernel NULL pointer dereference, address:
> > > > > > > > > 0000000000000300
> > > > > > > > >  #PF: supervisor read access in kernel mode
> > > > > > > > >  #PF: error_code(0x0000) - not-present page
> > > > > > > > >  PGD 0 P4D 0
> > > > > > > > >  Oops: 0000 [#1] SMP
> > > > > > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-
> > > > > > > > > 1.13.0-0-
> > > > > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > >  RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > > > > >  RAX: ffff888103befba0 RBX: ffff888109d28008 RCX:
> > > > > > > > > 0000000000000017
> > > > > > > > >  RDX: 0000000000000001 RSI: 0000000000000212 RDI:
> > > > > > > > > ffff888109d28000
> > > > > > > > >  RBP: 0000000000000000 R08: 0000000d3a3a3882 R09:
> > > > > > > > > 0000000000000001
> > > > > > > > >  R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > > > > > > ffff888109d28000
> > > > > > > > >  R13: ffff888109d28080 R14: 00000000fee1dead R15:
> > > > > > > > > 0000000000000000
> > > > > > > > >  FS:  00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > > > > > > > knlGS:0000000000000000
> > > > > > > > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > >  CR2: 0000000000000300 CR3: 00000001051cd006 CR4:
> > > > > > > > > 0000000000370eb0
> > > > > > > > >  DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > > > > > > 0000000000000000
> > > > > > > > >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > > > > > > 0000000000000400
> > > > > > > > >  Call Trace:
> > > > > > > > >   <TASK>
> > > > > > > > >   ? __die+0x20/0x60
> > > > > > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > > > > > >   ? exc_page_fault+0x75/0x140
> > > > > > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > > > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > >   device_shutdown+0x13e/0x1e0
> > > > > > > > >   kernel_restart+0x36/0x90
> > > > > > > > >   __do_sys_reboot+0x141/0x210
> > > > > > > > >   ? vfs_writev+0xcd/0x140
> > > > > > > > >   ? handle_mm_fault+0x161/0x260
> > > > > > > > >   ? do_writev+0x6b/0x110
> > > > > > > > >   do_syscall_64+0x3d/0x90
> > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > > > > >  RIP: 0033:0x7f496990fb56
> > > > > > > > >  RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX:
> > > > > > > > > 00000000000000a9
> > > > > > > > >  RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> > > > > > > > > 00007f496990fb56
> > > > > > > > >  RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> > > > > > > > > fffffffffee1dead
> > > > > > > > >  RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09:
> > > > > > > > > 0000000000000000
> > > > > > > > >  R10: 0000000000000000 R11: 0000000000000206 R12:
> > > > > > > > > 0000000000000000
> > > > > > > > >  R13: 00007fffc7bddf10 R14: 0000000000000000 R15:
> > > > > > > > > 00007fffc7bde2b8
> > > > > > > > >   </TASK>
> > > > > > > > >  CR2: 0000000000000300
> > > > > > > > >  ---[ end trace 0000000000000000 ]---
> > > > > > > > >
> > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > > > > > ---
> > > > > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct
> > > > > > > > > auxiliary_device
> > > > > > > > > *auxdev)
> > > > > > > > >         mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > > > >         ndev = mgtdev->ndev;
> > > > > > > > >
> > > > > > > > > -       free_irqs(ndev);
> > > > > > > > > +       if (ndev)
> > > > > > > > > +               free_irqs(ndev);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > >
> > > > > > > > something I don't get:
> > > > > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > > > > >
> > > > > > > That is a good point. I will try to find out. I also don't get why
> > > > > > > free_irq
> > > > > > > is
> > > > > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del.
> > > > > > > Maybe I
> > > > > > > can
> > > > > > > change
> > > > > > > that in a different refactoring.
> > > > > >
> > > > > > as it is I have no idea whether e.g. ndev can change
> > > > > > between these two call sites. that would make the check
> > > > > > pointless.
> > > > > >
> > > > > > > > this is what's creating all this mess.
> > > > > > > >
> > > > > > > >
> > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > > > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's
> > > > > > > what
> > > > > > > I
> > > > > > > see. Or
> > > > > > > am I missing something?
> > > > > >
> > > > > > and why do we care whether irqs are freed on shutdown?
> > > > > >
> > > > > Had to ask around a bit to find out the answer: there can be issues
> > > > > with
> > > > > kexec
> > > > > IRQ allocation on some platforms. It is documented here [0] for
> > > > > mlx5_core.
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
> > > > >
> > > > > Thanks,
> > > > > Dragos
> > > >
> > > > It's quite weird.
> > > >          * Some platforms requiring freeing the IRQ's in the shutdown
> > > >          * flow. If they aren't freed they can't be allocated after
> > > >          * kexec. There is no need to cleanup the mlx5_core software
> > > >          * contexts.
> > > >
> > > > but most drivers don't have a shutdown callback how do they work then?
> > > > do you know which platforms these are?
> > >
> > I don't. x86_64 is not one of them though. I will do some more digging ...
> >
It turns out that this fix (releasing the IRQs in mlx5_core's .shutdown) was
required on the PPC architecture, but only for certain mainframe systems. That's
all the info I could find.

> > > There used to be bzs that requires virtio drivers to add a shutdown to
> > > fix kexec:
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=2108406
> > >
> > I don't have access to this. What is it about?
>
> This bug might be more accurate:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1820521
>
> It's about the kexec guys (cced relevant people) wanting to add a
> shutdown method for virtio to fix potential kexec issues.
>
> Thanks
>
> >
> > Thanks,
> > Dragos
> > > Thanks
> > >
> > > >
> > > > I don't really know much about why shutdown callback is even necessary.
> > > > I guess this is to detect shutdown and do a faster cleanup than
> > > > the slow, graceful removal, just cleaning hardware resources?
> > > >
The .shutdown op could be removed from mlx5_vdpa. But I noticed that mlx5_core's
.shutdown is invoked from pci_device_shutdown and cleans up the IRQs. So the IRQs
would still be freed, but only as a side effect, which is not good.

Thanks,
Dragos
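
For anyone following the thread, the guard from the quoted patch can be modeled
in plain userspace C. This is only a sketch: the model_* names and struct
layouts below are hypothetical stand-ins, not the real mlx5_vdpa definitions.
It shows the shutdown path returning early when no ndev has been created yet,
which is the case that crashed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; only the shape matters. */
struct model_ndev {
	int irqs_held;			/* number of IRQs currently allocated */
};

struct model_mgtdev {
	struct model_ndev *ndev;	/* NULL until a vdpa device is added */
};

/* Release every IRQ. Dereferences ndev, so the caller must pass non-NULL. */
static void model_free_irqs(struct model_ndev *ndev)
{
	assert(ndev != NULL);
	ndev->irqs_held = 0;
}

/*
 * Shutdown path with the same guard as the patch: do nothing when no ndev
 * exists. Returns true if IRQs were released.
 */
static bool model_shutdown(struct model_mgtdev *mgtdev)
{
	struct model_ndev *ndev = mgtdev->ndev;

	if (!ndev)
		return false;

	model_free_irqs(ndev);
	return true;
}
```

Calling model_shutdown() on a management device whose ndev is NULL returns
false without touching any memory; without the check, the same call would
dereference a NULL pointer.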

2023-08-03 15:27:39

by Dragos Tatulea

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Wed, 2023-08-02 at 09:56 +0200, Dragos Tatulea wrote:
> On Wed, 2023-08-02 at 10:51 +0800, Jason Wang wrote:
> > On Tue, Aug 1, 2023 at 4:17 PM Dragos Tatulea <[email protected]> wrote:
> > >
> > > On Tue, 2023-08-01 at 11:59 +0800, Jason Wang wrote:
> > > > On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin <[email protected]>
> > > > wrote:
> > > > >
> > > > > On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> > > > > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > > > > > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > > > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea
> > > > > > > > > wrote:
> > > > > > > > > > The ndev was accessed on shutdown without a check if it
> > > > > > > > > > actually
> > > > > > > > > > exists.
> > > > > > > > > > This triggered the crash pasted below. This patch simply
> > > > > > > > > > adds
> > > > > > > > > > a
> > > > > > > > > > check
> > > > > > > > > > before using ndev.
> > > > > > > > > >
> > > > > > > > > >  BUG: kernel NULL pointer dereference, address:
> > > > > > > > > > 0000000000000300
> > > > > > > > > >  #PF: supervisor read access in kernel mode
> > > > > > > > > >  #PF: error_code(0x0000) - not-present page
> > > > > > > > > >  PGD 0 P4D 0
> > > > > > > > > >  Oops: 0000 [#1] SMP
> > > > > > > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-
> > > > > > > > > > rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > > > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> > > > > > > > > > rel-
> > > > > > > > > > 1.13.0-0-
> > > > > > > > > > gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > > >  RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > > > > > >  RAX: ffff888103befba0 RBX: ffff888109d28008 RCX:
> > > > > > > > > > 0000000000000017
> > > > > > > > > >  RDX: 0000000000000001 RSI: 0000000000000212 RDI:
> > > > > > > > > > ffff888109d28000
> > > > > > > > > >  RBP: 0000000000000000 R08: 0000000d3a3a3882 R09:
> > > > > > > > > > 0000000000000001
> > > > > > > > > >  R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > > > > > > > ffff888109d28000
> > > > > > > > > >  R13: ffff888109d28080 R14: 00000000fee1dead R15:
> > > > > > > > > > 0000000000000000
> > > > > > > > > >  FS:  00007f4969e0be40(0000) GS:ffff88852c800000(0000)
> > > > > > > > > > knlGS:0000000000000000
> > > > > > > > > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > >  CR2: 0000000000000300 CR3: 00000001051cd006 CR4:
> > > > > > > > > > 0000000000370eb0
> > > > > > > > > >  DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > > > > > > > 0000000000000000
> > > > > > > > > >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > > > > > > > 0000000000000400
> > > > > > > > > >  Call Trace:
> > > > > > > > > >   <TASK>
> > > > > > > > > >   ? __die+0x20/0x60
> > > > > > > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > > > > > > >   ? exc_page_fault+0x75/0x140
> > > > > > > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > > > > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > > >   device_shutdown+0x13e/0x1e0
> > > > > > > > > >   kernel_restart+0x36/0x90
> > > > > > > > > >   __do_sys_reboot+0x141/0x210
> > > > > > > > > >   ? vfs_writev+0xcd/0x140
> > > > > > > > > >   ? handle_mm_fault+0x161/0x260
> > > > > > > > > >   ? do_writev+0x6b/0x110
> > > > > > > > > >   do_syscall_64+0x3d/0x90
> > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > > > > > >  RIP: 0033:0x7f496990fb56
> > > > > > > > > >  RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX:
> > > > > > > > > > 00000000000000a9
> > > > > > > > > >  RAX: ffffffffffffffda RBX: 0000000000000000 RCX:
> > > > > > > > > > 00007f496990fb56
> > > > > > > > > >  RDX: 0000000001234567 RSI: 0000000028121969 RDI:
> > > > > > > > > > fffffffffee1dead
> > > > > > > > > >  RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09:
> > > > > > > > > > 0000000000000000
> > > > > > > > > >  R10: 0000000000000000 R11: 0000000000000206 R12:
> > > > > > > > > > 0000000000000000
> > > > > > > > > >  R13: 00007fffc7bddf10 R14: 0000000000000000 R15:
> > > > > > > > > > 00007fffc7bde2b8
> > > > > > > > > >   </TASK>
> > > > > > > > > >  CR2: 0000000000000300
> > > > > > > > > >  ---[ end trace 0000000000000000 ]---
> > > > > > > > > >
> > > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt
> > > > > > > > > > bypassing")
> > > > > > > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct
> > > > > > > > > > auxiliary_device
> > > > > > > > > > *auxdev)
> > > > > > > > > >         mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > > > > >         ndev = mgtdev->ndev;
> > > > > > > > > >
> > > > > > > > > > -       free_irqs(ndev);
> > > > > > > > > > +       if (ndev)
> > > > > > > > > > +               free_irqs(ndev);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > something I don't get:
> > > > > > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > > > > > >
> > > > > > > > That is a good point. I will try to find out. I also don't get
> > > > > > > > why
> > > > > > > > free_irq
> > > > > > > > is
> > > > > > > > called in the vdpa dev .free op instead of mlx5_vdpa_dev_del.
> > > > > > > > Maybe I
> > > > > > > > can
> > > > > > > > change
> > > > > > > > that in a different refactoring.
> > > > > > >
> > > > > > > as it is I have no idea whether e.g. ndev can change
> > > > > > > between these two call sites. that would make the check
> > > > > > > pointless.
> > > > > > >
> > > > > > > > > this is what's creating all this mess.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del of for struct
> > > > > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least
> > > > > > > > that's
> > > > > > > > what
> > > > > > > > I
> > > > > > > > see. Or
> > > > > > > > am I missing something?
> > > > > > >
> > > > > > > and why do we care whether irqs are freed on shutdown?
> > > > > > >
> > > > > > Had to ask around a bit to find out the answer: there can be issues
> > > > > > with
> > > > > > kexec
> > > > > > IRQ allocation on some platforms. It is documented here [0] for
> > > > > > mlx5_core.
> > > > > >
> > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
> > > > > >
> > > > > > Thanks,
> > > > > > Dragos
> > > > >
> > > > > It's quite weird.
> > > > >          * Some platforms requiring freeing the IRQ's in the shutdown
> > > > >          * flow. If they aren't freed they can't be allocated after
> > > > >          * kexec. There is no need to cleanup the mlx5_core software
> > > > >          * contexts.
> > > > >
> > > > > but most drivers don't have a shutdown callback how do they work then?
> > > > > do you know which platforms these are?
> > > >
> > > I don't. x86_64 is not one of them though. I will do some more digging ...
> > >
> Turns out that this fix (releasing the irqs on .shutdown on mlx5_core) was
> required for PPC arch but only for certain mainframe systems. That's all the
> info I could find.
>
I will send a v2 for this patch that removes the shutdown op. The IRQs will be
released by the mlx5_core shutdown handler, which is responsible for the VF.

Thanks,
Dragos
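
A rough userspace model of the v2 plan, for illustration only. The model_*
names are hypothetical and nothing here is taken from the real driver core or
from mlx5_core. It assumes that devices are visited in reverse order of
registration at shutdown and that a device without a shutdown op is simply
skipped, so the parent's handler is the one that releases the IRQs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device record: a name, an optional shutdown op, shared IRQs. */
struct model_dev {
	const char *name;
	void (*shutdown)(struct model_dev *dev);	/* may be NULL */
	int *irq_pool;					/* IRQ count owned by the parent */
};

/* Parent (PCI function) shutdown: releases all IRQs, including the child's. */
static void model_core_shutdown(struct model_dev *dev)
{
	assert(dev->irq_pool != NULL);
	*dev->irq_pool = 0;
}

/*
 * Visit devices newest first and call shutdown where one is registered.
 * Returns how many shutdown callbacks were invoked.
 */
static int model_device_shutdown(struct model_dev *devs, size_t n)
{
	int called = 0;

	for (size_t i = n; i-- > 0; ) {
		if (devs[i].shutdown) {
			devs[i].shutdown(&devs[i]);
			called++;
		}
	}
	return called;
}
```

With a parent entry that has model_core_shutdown and an auxiliary entry whose
shutdown is NULL, only one callback runs and the shared IRQ count still ends
up at zero.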

> > > > There used to be bzs that requires virtio drivers to add a shutdown to
> > > > fix kexec:
> > > >
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=2108406
> > > >
> > > I don't have access to this. What is it about?
> >
> > This bug might be more accurate:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1820521
> >
> > It's about the kexec guys (cced relevant people) wanting to add a
> > shutdown method for virtio to fix potential kexec issues.
> >
> > Thanks
> >
> > >
> > > Thanks,
> > > Dragos
> > > > Thanks
> > > >
> > > > >
> > > > > I don't really know much about why shutdown callback is even
> > > > > necessary.
> > > > > I guess this is to detect shutdown and do a faster cleanup than
> > > > > the slow, graceful removal, just cleaning hardware resources?
> > > > >
> .shutdown could be removed in mlx5_vdpa. But I notice that mlx5_core's
> .shutdown
> kicks in from pci_device_shutdown to clean the irqs. So the irqs will still be
> freed but as a side effect. Which is not good.
>
> Thanks,
> Dragos

2023-08-03 17:11:18

by Michael S. Tsirkin

Subject: Re: [PATCH] vdpa/mlx5: Fix crash on shutdown for when no ndev exists

On Thu, Aug 03, 2023 at 03:02:59PM +0000, Dragos Tatulea wrote:
> On Wed, 2023-08-02 at 09:56 +0200, Dragos Tatulea wrote:
> > On Wed, 2023-08-02 at 10:51 +0800, Jason Wang wrote:
> > > On Tue, Aug 1, 2023 at 4:17 PM Dragos Tatulea <[email protected]> wrote:
> > > >
> > > > On Tue, 2023-08-01 at 11:59 +0800, Jason Wang wrote:
> > > > > > On Mon, Jul 31, 2023 at 5:08 PM Michael S. Tsirkin <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Jul 31, 2023 at 07:15:31AM +0000, Dragos Tatulea wrote:
> > > > > > > On Thu, 2023-07-27 at 12:28 -0400, Michael S. Tsirkin wrote:
> > > > > > > > On Thu, Jul 27, 2023 at 04:02:16PM +0000, Dragos Tatulea wrote:
> > > > > > > > > On Wed, 2023-07-26 at 15:26 -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Wed, Jul 26, 2023 at 10:07:38PM +0300, Dragos Tatulea wrote:
> > > > > > > > > > > > The ndev was accessed on shutdown without a check if it actually exists.
> > > > > > > > > > > > This triggered the crash pasted below. This patch simply adds a check
> > > > > > > > > > > > before using ndev.
> > > > > > > > > > >
> > > > > > > > > > > >  BUG: kernel NULL pointer dereference, address: 0000000000000300
> > > > > > > > > > > >  #PF: supervisor read access in kernel mode
> > > > > > > > > > > >  #PF: error_code(0x0000) - not-present page
> > > > > > > > > > > >  PGD 0 P4D 0
> > > > > > > > > > > >  Oops: 0000 [#1] SMP
> > > > > > > > > > > >  CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 6.5.0-rc2_for_upstream_min_debug_2023_07_17_15_05 #1
> > > > > > > > > > > >  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> > > > > > > > > > > >  RIP: 0010:mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > > > > >  RSP: 0018:ffff8881003bfdc0 EFLAGS: 00010286
> > > > > > > > > > > >  RAX: ffff888103befba0 RBX: ffff888109d28008 RCX: 0000000000000017
> > > > > > > > > > > >  RDX: 0000000000000001 RSI: 0000000000000212 RDI: ffff888109d28000
> > > > > > > > > > > >  RBP: 0000000000000000 R08: 0000000d3a3a3882 R09: 0000000000000001
> > > > > > > > > > > >  R10: 0000000000000000 R11: 0000000000000000 R12: ffff888109d28000
> > > > > > > > > > > >  R13: ffff888109d28080 R14: 00000000fee1dead R15: 0000000000000000
> > > > > > > > > > > >  FS:  00007f4969e0be40(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
> > > > > > > > > > > >  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > > > > > > >  CR2: 0000000000000300 CR3: 00000001051cd006 CR4: 0000000000370eb0
> > > > > > > > > > > >  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > > > > > > >  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > > > > > >  Call Trace:
> > > > > > > > > > >   <TASK>
> > > > > > > > > > >   ? __die+0x20/0x60
> > > > > > > > > > >   ? page_fault_oops+0x14c/0x3c0
> > > > > > > > > > >   ? exc_page_fault+0x75/0x140
> > > > > > > > > > >   ? asm_exc_page_fault+0x22/0x30
> > > > > > > > > > >   ? mlx5v_shutdown+0xe/0x50 [mlx5_vdpa]
> > > > > > > > > > >   device_shutdown+0x13e/0x1e0
> > > > > > > > > > >   kernel_restart+0x36/0x90
> > > > > > > > > > >   __do_sys_reboot+0x141/0x210
> > > > > > > > > > >   ? vfs_writev+0xcd/0x140
> > > > > > > > > > >   ? handle_mm_fault+0x161/0x260
> > > > > > > > > > >   ? do_writev+0x6b/0x110
> > > > > > > > > > >   do_syscall_64+0x3d/0x90
> > > > > > > > > > >   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> > > > > > > > > > >  RIP: 0033:0x7f496990fb56
> > > > > > > > > > > >  RSP: 002b:00007fffc7bdde88 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
> > > > > > > > > > > >  RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f496990fb56
> > > > > > > > > > > >  RDX: 0000000001234567 RSI: 0000000028121969 RDI: fffffffffee1dead
> > > > > > > > > > > >  RBP: 00007fffc7bde1d0 R08: 0000000000000000 R09: 0000000000000000
> > > > > > > > > > > >  R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> > > > > > > > > > > >  R13: 00007fffc7bddf10 R14: 0000000000000000 R15: 00007fffc7bde2b8
> > > > > > > > > > >   </TASK>
> > > > > > > > > > >  CR2: 0000000000000300
> > > > > > > > > > >  ---[ end trace 0000000000000000 ]---
> > > > > > > > > > >
> > > > > > > > > > > > Fixes: bc9a2b3e686e ("vdpa/mlx5: Support interrupt bypassing")
> > > > > > > > > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/vdpa/mlx5/net/mlx5_vnet.c | 3 ++-
> > > > > > > > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > index 9138ef2fb2c8..e2e7ebd71798 100644
> > > > > > > > > > > --- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> > > > > > > > > > > @@ -3556,7 +3556,8 @@ static void mlx5v_shutdown(struct
> > > > > > > > > > > auxiliary_device
> > > > > > > > > > > *auxdev)
> > > > > > > > > > >         mgtdev = auxiliary_get_drvdata(auxdev);
> > > > > > > > > > >         ndev = mgtdev->ndev;
> > > > > > > > > > >
> > > > > > > > > > > -       free_irqs(ndev);
> > > > > > > > > > > +       if (ndev)
> > > > > > > > > > > +               free_irqs(ndev);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > something I don't get:
> > > > > > > > > > irqs are allocated in mlx5_vdpa_dev_add
> > > > > > > > > > why are they not freed in mlx5_vdpa_dev_del?
> > > > > > > > > >
> > > > > > > > > > That is a good point. I will try to find out. I also don't get why
> > > > > > > > > > free_irq is called in the vdpa dev .free op instead of
> > > > > > > > > > mlx5_vdpa_dev_del. Maybe I can change that in a different refactoring.
> > > > > > > >
> > > > > > > > as it is I have no idea whether e.g. ndev can change
> > > > > > > > between these two call sites. that would make the check
> > > > > > > > pointless.
> > > > > > > >
> > > > > > > > > > this is what's creating all this mess.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Not quite: mlx5_vdpa_dev_del (which is a .dev_del for struct
> > > > > > > > > > vdpa_mgmtdev_ops) doesn't get called on shutdown. At least that's
> > > > > > > > > > what I see. Or am I missing something?
> > > > > > > >
> > > > > > > > and why do we care whether irqs are freed on shutdown?
> > > > > > > >
> > > > > > > > Had to ask around a bit to find out the answer: there can be issues
> > > > > > > > with kexec IRQ allocation on some platforms. It is documented here [0]
> > > > > > > > for mlx5_core.
> > > > > > >
> > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/mellanox/mlx5/core/main.c#n2129
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dragos
> > > > > >
> > > > > > It's quite weird.
> > > > > >          * Some platforms requiring freeing the IRQ's in the shutdown
> > > > > >          * flow. If they aren't freed they can't be allocated after
> > > > > >          * kexec. There is no need to cleanup the mlx5_core software
> > > > > >          * contexts.
> > > > > >
> > > > > > but most drivers don't have a shutdown callback how do they work then?
> > > > > > do you know which platforms these are?
> > > > >
> > > > I don't. x86_64 is not one of them though. I will do some more digging ...
> > > >
> > Turns out that this fix (releasing the irqs on .shutdown on mlx5_core) was
> > required for PPC arch but only for certain mainframe systems. That's all the
> > info I could find.
> >
> I will send a v2 for this patch that removes the shutdown op. The irqs will be
> released by the mlx5_core shutdown handler which is responsible for the VF.
>
> Thanks,
> Dragos

Certainly seems cleaner. Thanks!

> > > > > > There used to be bzs that required virtio drivers to add a shutdown to
> > > > > fix kexec:
> > > > >
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=2108406
> > > > >
> > > > I don't have access to this. What is it about?
> > >
> > > This bug might be more accurate:
> > >
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1820521
> > >
> > > It's about the kexec guys (cced relevant people) wanting to add a
> > > > shutdown method for virtio to fix potential kexec issues.
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks,
> > > > Dragos
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > I don't really know much about why shutdown callback is even
> > > > > > necessary.
> > > > > > I guess this is to detect shutdown and do a faster cleanup than
> > > > > > the slow, graceful removal, just cleaning hardware resources?
> > > > > >
> > .shutdown could be removed in mlx5_vdpa. But I notice that mlx5_core's .shutdown
> > kicks in from pci_device_shutdown to clean the irqs. So the irqs will still be
> > freed but as a side effect. Which is not good.
> >
> > Thanks,
> > Dragos
>