2023-10-20 16:16:16

by David Woodhouse

[permalink] [raw]
Subject: [PATCH v2 0/3] hvc/xen: Xen console fixes.

It started out relatively simple, fixing the fact that the secondary
consoles didn't work at *all* due to bugs in the IRQ setup. There were
one or two other simple fixes that I snuck into the same patch in v1.

But it's much more broken than that, so split the fixes out, especially
the last one for hot-unplug.

Preserving the Reviewed-by: from Jürgen on the first two seems fair;
the third patch is new and exciting but *does* fix the case of removing
the console while userspace is spamming it.

And all I really wanted to do this week was spin up a PV guest under KVM
so I could play with its SMEP behaviour...

David Woodhouse (3):
hvc/xen: fix event channel handling for secondary consoles
hvc/xen: fix error path in xen_hvc_init() to always register frontend driver
hvc/xen: fix console unplug

drivers/tty/hvc/hvc_xen.c | 39 ++++++++++++++++++++++++++++-----------
1 file changed, 28 insertions(+), 11 deletions(-)



2023-10-20 16:16:17

by David Woodhouse

[permalink] [raw]
Subject: [PATCH v2 3/3] hvc/xen: fix console unplug

From: David Woodhouse <[email protected]>

On unplug of a Xen console, xencons_disconnect_backend() unconditionally
calls free_irq() via unbind_from_irqhandler(), causing a warning of
freeing an already-free IRQ:

(qemu) device_del con1
[ 32.050919] ------------[ cut here ]------------
[ 32.050942] Trying to free already-free IRQ 33
[ 32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330

It should be using evtchn_put() to tear down the event channel binding,
and let the Linux IRQ side of it be handled by notifier_del_irq() through
the HVC code.

On which topic... xencons_disconnect_backend() should call hvc_remove()
*first*, rather than tearing down the event channel and grant mapping
while they are in use. And then the IRQ is guaranteed to be freed by
the time it's torn down by evtchn_put().

Since evtchn_put() also closes the actual event channel, avoid calling
xenbus_free_evtchn() except in the failure path where the IRQ was not
successfully set up.

However, calling hvc_remove() at the start of xencons_disconnect_backend()
still isn't early enough. An unplug request is indicated by the backend
setting its state to XenbusStateClosing, which triggers a notification
to xencons_backend_changed(), which... does nothing except set its own
frontend state directly to XenbusStateClosed without *actually* tearing
down the HVC device or, you know, making sure it isn't actively in use.

So the backend sees the guest frontend set its state to XenbusStateClosed
and stops servicing the interrupt... and the guest spins for ever in the
domU_write_console() function waiting for the ring to drain.

Fix that one by calling hvc_remove() from xencons_backend_changed() before
signalling to the backend that it's OK to proceed with the removal.

Tested with 'dd if=/dev/zero of=/dev/hvc1' while telling Qemu to remove
the console device.

Signed-off-by: David Woodhouse <[email protected]>
Cc: [email protected]
---
drivers/tty/hvc/hvc_xen.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 4a768b504263..34c01874f45b 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -377,18 +377,21 @@ void xen_console_resume(void)
#ifdef CONFIG_HVC_XEN_FRONTEND
static void xencons_disconnect_backend(struct xencons_info *info)
{
- if (info->irq > 0)
- unbind_from_irqhandler(info->irq, NULL);
- info->irq = 0;
+ if (info->hvc != NULL)
+ hvc_remove(info->hvc);
+ info->hvc = NULL;
+ if (info->irq > 0) {
+ evtchn_put(info->evtchn);
+ info->irq = 0;
+ info->evtchn = 0;
+ }
+ /* evtchn_put() will also close it so this is only an error path */
if (info->evtchn > 0)
xenbus_free_evtchn(info->xbdev, info->evtchn);
info->evtchn = 0;
if (info->gntref > 0)
gnttab_free_grant_references(info->gntref);
info->gntref = 0;
- if (info->hvc != NULL)
- hvc_remove(info->hvc);
- info->hvc = NULL;
}

static void xencons_free(struct xencons_info *info)
@@ -553,10 +556,23 @@ static void xencons_backend_changed(struct xenbus_device *dev,
if (dev->state == XenbusStateClosed)
break;
fallthrough; /* Missed the backend's CLOSING state */
- case XenbusStateClosing:
+ case XenbusStateClosing: {
+ struct xencons_info *info = dev_get_drvdata(&dev->dev);;
+
+ /*
+ * Don't tear down the evtchn and grant ref before the other
+ * end has disconnected, but do stop userspace from trying
+ * to use the device before we allow the backend to close.
+ */
+ if (info->hvc) {
+ hvc_remove(info->hvc);
+ info->hvc = NULL;
+ }
+
xenbus_frontend_closed(dev);
break;
}
+ }
}

static const struct xenbus_device_id xencons_ids[] = {
@@ -616,7 +632,7 @@ static int __init xen_hvc_init(void)
list_del(&info->list);
spin_unlock_irqrestore(&xencons_lock, flags);
if (info->irq)
- unbind_from_irqhandler(info->irq, NULL);
+ evtchn_put(info->evtchn);
kfree(info);
return r;
}
--
2.40.1

2023-10-20 16:16:30

by David Woodhouse

[permalink] [raw]
Subject: [PATCH v2 1/3] hvc/xen: fix event channel handling for secondary consoles

From: David Woodhouse <[email protected]>

The xencons_connect_backend() function allocates a local interdomain
event channel with xenbus_alloc_evtchn(), then calls
bind_interdomain_evtchn_to_irq_lateeoi() to bind to that port# on the
*remote* domain.

That doesn't work very well:

(qemu) device_add xen-console,id=con1,chardev=pty0
[ 44.323872] xenconsole console-1: 2 xenbus_dev_probe on device/console/1
[ 44.323995] xenconsole: probe of console-1 failed with error -2

Fix it to use bind_evtchn_to_irq_lateeoi(), which does the right thing
by just binding that *local* event channel to an irq. The backend will
do the interdomain binding.

This didn't affect the primary console because the setup for that is
special — the toolstack allocates the guest event channel and the guest
discovers it with HVMOP_get_param.

Fixes: fe415186b4 ("xen/console: harden hvc_xen against event channel storms")
Signed-off-by: David Woodhouse <[email protected]>
Reviewed-by: Juergen Gross <[email protected]>
Cc: [email protected]
---
drivers/tty/hvc/hvc_xen.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/tty/hvc/hvc_xen.c b/drivers/tty/hvc/hvc_xen.c
index 98764e740c07..f24e285b6441 100644
--- a/drivers/tty/hvc/hvc_xen.c
+++ b/drivers/tty/hvc/hvc_xen.c
@@ -433,7 +433,7 @@ static int xencons_connect_backend(struct xenbus_device *dev,
if (ret)
return ret;
info->evtchn = evtchn;
- irq = bind_interdomain_evtchn_to_irq_lateeoi(dev, evtchn);
+ irq = bind_evtchn_to_irq_lateeoi(evtchn);
if (irq < 0)
return irq;
info->irq = irq;
--
2.40.1

2023-10-21 16:33:22

by Greg Kroah-Hartman

[permalink] [raw]
Subject: Re: [PATCH v2 1/3] hvc/xen: fix event channel handling for secondary consoles

On Fri, Oct 20, 2023 at 05:15:27PM +0100, David Woodhouse wrote:
> From: David Woodhouse <[email protected]>
>
> The xencons_connect_backend() function allocates a local interdomain
> event channel with xenbus_alloc_evtchn(), then calls
> bind_interdomain_evtchn_to_irq_lateeoi() to bind to that port# on the
> *remote* domain.
>
> That doesn't work very well:
>
> (qemu) device_add xen-console,id=con1,chardev=pty0
> [ 44.323872] xenconsole console-1: 2 xenbus_dev_probe on device/console/1
> [ 44.323995] xenconsole: probe of console-1 failed with error -2
>
> Fix it to use bind_evtchn_to_irq_lateeoi(), which does the right thing
> by just binding that *local* event channel to an irq. The backend will
> do the interdomain binding.
>
> This didn't affect the primary console because the setup for that is
> special — the toolstack allocates the guest event channel and the guest
> discovers it with HVMOP_get_param.
>
> Fixes: fe415186b4 ("xen/console: harden hvc_xen against event channel storms")

Nit, our tools complain that the sha1 isn't big enough, "fe415186b43d"
I'll go fix it up...

2023-10-23 08:32:25

by Juergen Gross

[permalink] [raw]
Subject: Re: [PATCH v2 3/3] hvc/xen: fix console unplug

On 20.10.23 18:15, David Woodhouse wrote:
> From: David Woodhouse <[email protected]>
>
> On unplug of a Xen console, xencons_disconnect_backend() unconditionally
> calls free_irq() via unbind_from_irqhandler(), causing a warning of
> freeing an already-free IRQ:
>
> (qemu) device_del con1
> [ 32.050919] ------------[ cut here ]------------
> [ 32.050942] Trying to free already-free IRQ 33
> [ 32.050990] WARNING: CPU: 0 PID: 51 at kernel/irq/manage.c:1895 __free_irq+0x1d4/0x330
>
> It should be using evtchn_put() to tear down the event channel binding,
> and let the Linux IRQ side of it be handled by notifier_del_irq() through
> the HVC code.
>
> On which topic... xencons_disconnect_backend() should call hvc_remove()
> *first*, rather than tearing down the event channel and grant mapping
> while they are in use. And then the IRQ is guaranteed to be freed by
> the time it's torn down by evtchn_put().
>
> Since evtchn_put() also closes the actual event channel, avoid calling
> xenbus_free_evtchn() except in the failure path where the IRQ was not
> successfully set up.
>
> However, calling hvc_remove() at the start of xencons_disconnect_backend()
> still isn't early enough. An unplug request is indicated by the backend
> setting its state to XenbusStateClosing, which triggers a notification
> to xencons_backend_changed(), which... does nothing except set its own
> frontend state directly to XenbusStateClosed without *actually* tearing
> down the HVC device or, you know, making sure it isn't actively in use.
>
> So the backend sees the guest frontend set its state to XenbusStateClosed
> and stops servicing the interrupt... and the guest spins for ever in the
> domU_write_console() function waiting for the ring to drain.
>
> Fix that one by calling hvc_remove() from xencons_backend_changed() before
> signalling to the backend that it's OK to proceed with the removal.
>
> Tested with 'dd if=/dev/zero of=/dev/hvc1' while telling Qemu to remove
> the console device.
>
> Signed-off-by: David Woodhouse <[email protected]>
> Cc: [email protected]

Reviewed-by: Juergen Gross <[email protected]>


Juergen


Attachments:
OpenPGP_0xB0DE9DD628BF132F.asc (3.08 kB)
OpenPGP public key
OpenPGP_signature.asc (505.00 B)
OpenPGP digital signature
Download all attachments