2023-02-08 11:35:35

by Mohammed Gamal

[permalink] [raw]
Subject: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

relid2channel() assumes vmbus channel array to be allocated when called.
However, if the guest receives a vmbus interrupt during driver initialization
before vmbus_connect() is called or if vmbus_connect() fails, the vmbus
interrupt service routine is called which in turn calls relid2channel()
and can cause a null pointer dereference.

So Make relid2channel() check if vmbus channels is allocated first and return
NULL to the caller if not allocated.

Fixes: 8b6a877c060e ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")

Signed-off-by: Mohammed Gamal <[email protected]>
---
drivers/hv/connection.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 9dc27e5d367a..5c603c4f75a2 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -409,6 +409,8 @@ void vmbus_disconnect(void)
*/
struct vmbus_channel *relid2channel(u32 relid)
{
+ if (WARN_ON(vmbus_connection.channels == NULL))
+ return NULL;
if (WARN_ON(relid >= MAX_CHANNEL_RELIDS))
return NULL;
return READ_ONCE(vmbus_connection.channels[relid]);
--
2.38.1



2023-02-08 19:04:44

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

> From: Mohammed Gamal <[email protected]>
> Sent: Wednesday, February 8, 2023 3:34 AM
>
> relid2channel() assumes vmbus channel array to be allocated when called.
> However, if the guest receives a vmbus interrupt during driver initialization
> before vmbus_connect() is called or if vmbus_connect() fails, the vmbus
> interrupt service routine is called which in turn calls relid2channel()
> and can cause a null pointer dereference.

Before vmbus_connect() is called or if vmbus_connect() fails, there should
be no VMBus channel related interrupts at all, so relid2channel() can't be
called.

Can you please share the log or at least the crash call-stack?
I'm curious how the crash can happen.

2023-02-09 09:48:56

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

On Thu, Feb 9, 2023 at 11:25 AM Mohammed Gamal <[email protected]> wrote:
>
>
> On Wed, Feb 8, 2023 at 9:03 PM Dexuan Cui <[email protected]> wrote:
> >
> > > From: Mohammed Gamal <[email protected]>
> > > Sent: Wednesday, February 8, 2023 3:34 AM
> > >
> > > relid2channel() assumes vmbus channel array to be allocated when called.
> > > However, if the guest receives a vmbus interrupt during driver initialization
> > > before vmbus_connect() is called or if vmbus_connect() fails, the vmbus
> > > interrupt service routine is called which in turn calls relid2channel()
> > > and can cause a null pointer dereference.
> >
> > Before vmbus_connect() is called or if vmbus_connect() fails, there should
> > be no VMBus channel related interrupts at all, so relid2channel() can't be
> > called.
> >
> > Can you please share the log or at least the crash call-stack?
> > I'm curious how the crash can happen.
> >
>
> Hi Dexuan,
> We saw this when triggering a crash with kdump enabled with
> echo 'c' > /proc/sysrq-trigger
>
> When the new kernel boots, we see this stack trace:
>
> [ 21.790653] BUG: kernel NULL pointer dereference, address: 0000000000000070
> [ 21.816550] #PF: supervisor read access in kernel mode
> [ 21.835697] #PF: error_code(0x0000) - not-present page
> [ 21.855499] PGD 0 P4D 0
> [ 21.865471] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 21.881150] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.14.0-247.el9.x86_64 # 1
> [ 21.906679] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 05/18/2018
> [ 21.939413] RIP: 0010:relid2channel+0x1a/0x30 [hv_vmbus]
> [ 21.958240] Code: 00 00 00 e8 a8 01 db c0 e9 78 ff ff ff 0f 1f 00 0f 1f 44 00 00 81 ff ff 07 00 00 77 15 48 8b 05 ac 31 01 00 89 ff 48 8d 04 f8 <48> 8b 00 e9 de ef b2 c1 0f 0b 31 c0 e9 d5 ef b2 c1 0f 1f 44 00 00
> [ 22.022266] RSP: 0018:ffffc90000003f90 EFLAGS: 00010097
> [ 22.040588] RAX: 0000000000000070 RBX: ffff88807a4ef200 RCX: 0000000000000000
> [ 22.065670] RDX: ffffffff82a1a940 RSI: 0000000000000000 RDI: 000000000000000e
> [ 22.090778] RBP: 000000000000000e R08: 0000000000000000 R09: 0000000000000000
> [ 22.115947] R10: 000000000000000e R11: ffffc90000003ff8 R12: 0000000000000001
> [ 22.140901] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 22.165344] FS: 0000000000000000(0000) GS:ffff88807e600000(0000) knlGS:00000 00000000000
> [ 22.192958] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 22.213376] CR2: 0000000000000070 CR3: 000000007ae40000 CR4: 00000000003506b0
> [ 22.238792] Call Trace:
> [ 22.248268] <IRQ>
> [ 22.256236] vmbus_chan_sched.isra.0+0x67/0x190 [hv_vmbus]
> [ 22.275822] vmbus_isr+0x21/0xd0 [hv_vmbus]
> [ 22.290906] __sysvec_hyperv_callback+0x2e/0x60
> [ 22.307021] sysvec_hyperv_callback+0x6d/0x90
> [ 22.322530] </IRQ>
> [ 22.330559] <TASK>
> [ 22.338573] asm_sysvec_hyperv_callback+0x16/0x20
> [ 22.355090] RIP: 0010:default_idle+0x10/0x20
> [ 22.370572] Code: 00 0f ae f0 0f ae 38 0f ae f0 eb b5 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 eb 07 0f 00 2d 7e 6e 4d 00 fb f4 <e9> 4b dc 2c 00 cc cc cc cc cc cc cc cc cc cc cc 0f 1f 44 00 00 65
> [ 22.435289] RSP: 0018:ffffffff82a03ea8 EFLAGS: 00000202
> [ 22.453695] RAX: ffffffff81b33ea0 RBX: ffffffff82a1a940 RCX: 0000000000000000
> [ 22.478759] RDX: 00000000000000cd RSI: 0000000000000087 RDI: 00000000000000ce
> [ 22.503863] RBP: 0000000000000000 R08: 0138a8c77d17acc3 R09: 0000000000000001
> [ 22.528774] R10: 0000000000000400 R11: 00000000005d0eea R12: 0000000000000000
> [ 22.553857] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 22.578981] ? mwait_idle+0x80/0x80
> [ 22.591660] ? mwait_idle+0x80/0x80
> [ 22.604332] default_idle_call+0x33/0xe0
> [ 22.618343] cpuidle_idle_call+0x15d/0x1c0
> [ 22.632969] ? read_hv_sched_clock_tsc+0x5/0x20
> [ 22.649218] do_idle+0x7b/0xe0
> [ 22.660741] cpu_startup_entry+0x19/0x20
> [ 22.674964] rest_init+0xca/0xd0
> [ 22.686907] arch_call_rest_init+0xa/0x14
> [ 22.701272] start_kernel+0x49e/0x4c0
> [ 22.714418] secondary_startup_64_no_verify+0xe5/0xeb
> [ 22.732484] </TASK>
> [ 22.740973] Modules linked in: hv_vmbus(+) serio_raw dm_mirror dm_region_hash dm_log dm_mod fuse overlay squashfs loop
> [ 22.779694] CR2: 0000000000000070
> [ 22.792006] ---[ end trace 56dd24038e89124f ]---
> [ 22.808686] RIP: 0010:relid2channel+0x1a/0x30 [hv_vmbus]
> [ 22.827388] Code: 00 00 00 e8 a8 01 db c0 e9 78 ff ff ff 0f 1f 00 0f 1f 44 00 00 81 ff ff 07 00 00 77 15 48 8b 05 ac 31 01 00 89 ff 48 8d 04 f8 <48> 8b 00 e9 de ef b2 c1 0f 0b 31 c0 e9 d5 ef b2 c1 0f 1f 44 00 00
> [ 22.891669] RSP: 0018:ffffc90000003f90 EFLAGS: 00010097
> [ 22.910168] RAX: 0000000000000070 RBX: ffff88807a4ef200 RCX: 0000000000000000
> [ 22.935445] RDX: ffffffff82a1a940 RSI: 0000000000000000 RDI: 000000000000000e
> [ 22.958556] RBP: 000000000000000e R08: 0000000000000000 R09: 0000000000000000
> [ 22.979539] R10: 000000000000000e R11: ffffc90000003ff8 R12: 0000000000000001
> [ 23.004127] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ 23.027088] FS: 0000000000000000(0000) GS:ffff88807e600000(0000) knlGS:00000 00000000000
> [ 23.053341] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 23.074013] CR2: 0000000000000070 CR3: 000000007ae40000 CR4: 00000000003506b0
> [ 23.099322] Kernel panic - not syncing: Fatal exception in interrupt
> [ 23.121729] Kernel Offset: disabled

Ugh, my mail client messed up the reply and it was marked as spam.
Resending to the lists in plain text. See the reply above.


2023-02-10 02:58:06

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

> From: Mohammed Gamal <[email protected]>
> Sent: Thursday, February 9, 2023 1:48 AM
> ...
> > We saw this when triggering a crash with kdump enabled with
> > echo 'c' > /proc/sysrq-trigger
> >
> > When the new kernel boots, we see this stack trace:

Thanks for the details. Kdump is special in that the 'old' VMBus
channels might still be active (from the host's perspective),
when the new kernel starts to run.

Upon crash, Linux sends a CHANNELMSG_UNLOAD messge to the host,
and the host is supposed to quiesce/reset the VMBus devices, so
normally we should not see a crash in relid2channel().

> > [ 21.906679] Hardware name: Microsoft Corporation Virtual
> > Machine/Virtual Machine, BIOS 090007 05/18/2018

I guess you see the crash because you're running an old Hyper-V,
probably Windows Server 2016 or 2019, which may be unable to
reliably handle the guest's CHANNELMSG_UNLOAD messge.

Can you please mention kdump in the commit message?

BTW, regarding "before vmbus_connect() is called ", IMO it
should be "before vmbus_connect() is called or before it finishes".

2023-02-10 09:13:24

by Mohammed Gamal

[permalink] [raw]
Subject: Re: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

(Re-CC'ing people from the old thread)

On Fri, Feb 10, 2023 at 4:57 AM Dexuan Cui <[email protected]> wrote:
>
> > From: Mohammed Gamal <[email protected]>
> > Sent: Thursday, February 9, 2023 1:48 AM
> > ...
> > > We saw this when triggering a crash with kdump enabled with
> > > echo 'c' > /proc/sysrq-trigger
> > >
> > > When the new kernel boots, we see this stack trace:
>
> Thanks for the details. Kdump is special in that the 'old' VMBus
> channels might still be active (from the host's perspective),
> when the new kernel starts to run.
>
> Upon crash, Linux sends a CHANNELMSG_UNLOAD messge to the host,
> and the host is supposed to quiesce/reset the VMBus devices, so
> normally we should not see a crash in relid2channel().

Does this not happen in the case of kdump? Shouldn't a CHANNELMSG_UNLOAD
message be sent to the host in that case as well?

>
> > > [ 21.906679] Hardware name: Microsoft Corporation Virtual
> > > Machine/Virtual Machine, BIOS 090007 05/18/2018
>
> I guess you see the crash because you're running an old Hyper-V,
> probably Windows Server 2016 or 2019, which may be unable to
> reliably handle the guest's CHANNELMSG_UNLOAD messge.

We've actually seen this on Windows Server 2016, 2019, and 2022.

>
> Can you please mention kdump in the commit message?
>

Will do.

> BTW, regarding "before vmbus_connect() is called ", IMO it
> should be "before vmbus_connect() is called or before it finishes".


2023-02-10 19:18:34

by Dexuan Cui

[permalink] [raw]
Subject: RE: [PATCH] Drivers: vmbus: Check for channel allocation before looking up relids

> From: Mohammed Gamal <[email protected]>
> Sent: Friday, February 10, 2023 1:12 AM
> > ...
> > Upon crash, Linux sends a CHANNELMSG_UNLOAD messge to the host,
> > and the host is supposed to quiesce/reset the VMBus devices, so
> > normally we should not see a crash in relid2channel().
>
> Does this not happen in the case of kdump? Shouldn't a
> CHANNELMSG_UNLOAD
> message be sent to the host in that case as well?

The message is sent to the host in the case of kdump.

> > > > [ 21.906679] Hardware name: Microsoft Corporation Virtual
> > > > Machine/Virtual Machine, BIOS 090007 05/18/2018
> >
> > I guess you see the crash because you're running an old Hyper-V,
> > probably Windows Server 2016 or 2019, which may be unable to
> > reliably handle the guest's CHANNELMSG_UNLOAD messge.
>
> We've actually seen this on Windows Server 2016, 2019, and 2022.

I didn't expect this to happen to WS 2022. It looks like some of the
VMBus devices are not reset by the host upon the message
CHANNELMSG_UNLOAD. If you can check all the 'relids' in the first
kernel beforehand, and print the 'relid' in relid2channel, we'll be
able to tell which device is not reset. Maybe it's a good idea to print
the 'relid' in the newly-added warning for debug purposes.