Set "opened" to "0" before the hotplug script is called. Once the
device node has been opened, set "opened" to "1".

"opened" is used exclusively by userspace. It serves two purposes:

1. It tells userspace that the diskseq Xenstore entry is supported.

2. It tells userspace that it can wait for "opened" to be set to 1.
   Once "opened" is 1, blkback has a reference to the device, so
   userspace doesn't need to keep one.

Together, these changes allow userspace to use block devices with
delete-on-close behavior, such as loop devices with the autoclear flag
set or device-mapper devices with the deferred-remove flag set.

Signed-off-by: Demi Marie Obenour <[email protected]>
---
drivers/block/xen-blkback/xenbus.c | 35 ++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 9c3eb148fbd802c74e626c3d7bcd69dcb09bd921..519a78aa9073d1faa1dce5c1b36e95ae58da534b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -3,6 +3,20 @@
Copyright (C) 2005 Rusty Russell <[email protected]>
Copyright (C) 2005 XenSource Ltd
+In addition to the Xenstore nodes required by the Xen block device
+specification, this implementation of blkback uses a new Xenstore
+node: "opened". blkback sets "opened" to "0" before the hotplug script
+is called. Once the device node has been opened, blkback sets "opened"
+to "1".
+
+"opened" is read exclusively by userspace. It serves two purposes:
+
+1. It tells userspace that diskseq@major:minor syntax for "physical-device" is
+ supported.
+
+2. It tells userspace that it can wait for "opened" to be set to 1 after writing
+ "physical-device". Once "opened" is 1, blkback has a reference to the
+ device, so userspace doesn't need to keep one.
*/
@@ -699,6 +713,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
if (err)
pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
+ /*
+ * This informs userspace that the "opened" node will be set to "1" when
+ * the device has been opened successfully.
+ */
+ err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
+ if (err)
+ goto fail;
+
err = xenbus_switch_state(dev, XenbusStateInitWait);
if (err)
goto fail;
@@ -826,6 +848,19 @@ static void backend_changed(struct xenbus_watch *watch,
goto fail;
}
+ /*
+ * Tell userspace that the device has been opened and that blkback has a
+ * reference to it. Userspace can then close the device or mark it as
+ * delete-on-close, knowing that blkback will keep the device open as
+ * long as necessary.
+ */
+ err = xenbus_write(XBT_NIL, dev->nodename, "opened", "1");
+ if (err) {
+ xenbus_dev_fatal(dev, err, "%s: notifying userspace device has been opened",
+ dev->nodename);
+ goto free_vbd;
+ }
+
err = xenvbd_sysfs_addif(dev);
if (err) {
xenbus_dev_fatal(dev, err, "creating sysfs entries");
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > Set "opened" to "0" before the hotplug script is called. Once the
> > device node has been opened, set "opened" to "1".
> >
> > "opened" is used exclusively by userspace. It serves two purposes:
> >
> > 1. It tells userspace that the diskseq Xenstore entry is supported.
> >
> > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > Once "opened" is 1, blkback has a reference to the device, so
> > userspace doesn't need to keep one.
> >
> > Together, these changes allow userspace to use block devices with
> > delete-on-close behavior, such as loop devices with the autoclear flag
> > set or device-mapper devices with the deferred-remove flag set.
>
> There was some work in the past to allow reloading blkback as a
> module, it's clear that using delete-on-close won't work if attempting
> to reload blkback.

Should blkback stop itself from being unloaded if delete-on-close is in
use?

> Isn't there some existing way to check whether a device is opened?
> (stat syscall maybe?).

Knowing that the device has been opened isn’t enough. The block script
needs to be able to wait for blkback (and not something else) to open
the device. Otherwise it will be confused if the device is opened by
e.g. udev.

> I would like to avoid adding more xenstore blkback state if such
> information can be fetched from other methods.

I don’t think it can be, unless the information is passed via a
completely different method. Maybe netlink(7) or ioctl(2)? Arguably
this information should not be stored in Xenstore at all, as it exposes
backend implementation details to the frontend.

> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > index 9c3eb148fbd802c74e626c3d7bcd69dcb09bd921..519a78aa9073d1faa1dce5c1b36e95ae58da534b 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -699,6 +713,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
> > if (err)
> > pr_warn("%s write out 'max-ring-page-order' failed\n", __func__);
> >
> > + /*
> > + * This informs userspace that the "opened" node will be set to "1" when
> > + * the device has been opened successfully.
> > + */
> > + err = xenbus_write(XBT_NIL, dev->nodename, "opened", "0");
> > + if (err)
> > + goto fail;
> > +
>
> You would need to set "opened" before registering the xenstore backend
> watch AFAICT, or else it could be racy.

Will fix in the next version.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Wed, Jun 07, 2023 at 10:44:48AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> > On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > > Set "opened" to "0" before the hotplug script is called. Once the
> > > > device node has been opened, set "opened" to "1".
> > > >
> > > > "opened" is used exclusively by userspace. It serves two purposes:
> > > >
> > > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > > >
> > > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > > > Once "opened" is 1, blkback has a reference to the device, so
> > > > userspace doesn't need to keep one.
> > > >
> > > > Together, these changes allow userspace to use block devices with
> > > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > > set or device-mapper devices with the deferred-remove flag set.
> > >
> > > There was some work in the past to allow reloading blkback as a
> > > module, it's clear that using delete-on-close won't work if attempting
> > > to reload blkback.
> >
> > Should blkback stop itself from being unloaded if delete-on-close is in
> > use?
>
> Hm, maybe. I guess that's the best we can do right now.

I’ll implement this.

> > > Isn't there some existing way to check whether a device is opened?
> > > (stat syscall maybe?).
> >
> > Knowing that the device has been opened isn’t enough. The block script
> > needs to be able to wait for blkback (and not something else) to open
> > the device. Otherwise it will be confused if the device is opened by
> > e.g. udev.
>
> Urg, no, the block script cannot wait indefinitely for blkback to open
> the device, as it has an execution timeout. blkback is free to only
> open the device upon guest frontend connection, and that (when using
> libxl) requires the hotplug scripts execution to be finished so the
> guest can be started.

I’m a bit confused here. My understanding is that blkdev_get_by_dev()
already opens the device, and that happens in the xenstore watch
handler. I have tested this with delete-on-close device-mapper devices,
and it does work.

> > > I would like to avoid adding more xenstore blkback state if such
> > > information can be fetched from other methods.
> >
> > I don’t think it can be, unless the information is passed via a
> > completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> > this information should not be stored in Xenstore at all, as it exposes
> > backend implementation details to the frontend.
>
> Could you maybe use sysfs for this information?

Probably? This would involve adding a new file in sysfs.

> We have all sorts of crap in xenstore, but it would be best if we can
> see about placing stuff like this in another interface.

Fair.

> Thanks, Roger.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Thu, Jun 08, 2023 at 11:11:44AM +0200, Roger Pau Monné wrote:
> On Wed, Jun 07, 2023 at 12:29:26PM -0400, Demi Marie Obenour wrote:
> > On Wed, Jun 07, 2023 at 10:44:48AM +0200, Roger Pau Monné wrote:
> > > On Tue, Jun 06, 2023 at 01:31:25PM -0400, Demi Marie Obenour wrote:
> > > > On Tue, Jun 06, 2023 at 11:15:37AM +0200, Roger Pau Monné wrote:
> > > > > On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > > > > > Set "opened" to "0" before the hotplug script is called. Once the
> > > > > > device node has been opened, set "opened" to "1".
> > > > > >
> > > > > > "opened" is used exclusively by userspace. It serves two purposes:
> > > > > >
> > > > > > 1. It tells userspace that the diskseq Xenstore entry is supported.
> > > > > >
> > > > > > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > > > > > Once "opened" is 1, blkback has a reference to the device, so
> > > > > > userspace doesn't need to keep one.
> > > > > >
> > > > > > Together, these changes allow userspace to use block devices with
> > > > > > delete-on-close behavior, such as loop devices with the autoclear flag
> > > > > > set or device-mapper devices with the deferred-remove flag set.
> > > > >
> > > > > There was some work in the past to allow reloading blkback as a
> > > > > module, it's clear that using delete-on-close won't work if attempting
> > > > > to reload blkback.
> > > >
> > > > Should blkback stop itself from being unloaded if delete-on-close is in
> > > > use?
> > >
> > > Hm, maybe. I guess that's the best we can do right now.
> >
> > I’ll implement this.
>
> Let's make this a separate patch.

Good idea.

> > > > > Isn't there some existing way to check whether a device is opened?
> > > > > (stat syscall maybe?).
> > > >
> > > > Knowing that the device has been opened isn’t enough. The block script
> > > > needs to be able to wait for blkback (and not something else) to open
> > > > the device. Otherwise it will be confused if the device is opened by
> > > > e.g. udev.
> > >
> > > Urg, no, the block script cannot wait indefinitely for blkback to open
> > > the device, as it has an execution timeout. blkback is free to only
> > > open the device upon guest frontend connection, and that (when using
> > > libxl) requires the hotplug scripts execution to be finished so the
> > > guest can be started.
> >
> > I’m a bit confused here. My understanding is that blkdev_get_by_dev()
> > already opens the device, and that happens in the xenstore watch
> > handler. I have tested this with delete-on-close device-mapper devices,
> > and it does work.
>
> Right, but on a very contended system there's no guarantee of when
> blkback will pick up the update to "physical-device" and open the
> device, so far the block script only writes the physical-device node
> and exits. With the proposed change the block script will also wait
> for blkback to react to the physical-device write, hence making VM
> creation slower.

Only block scripts that choose to wait for device open suffer
this performance penalty. My current plan is to only do so for
delete-on-close devices which are managed by the block script
itself. Other devices will not suffer a performance hit.
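The opt-in wait described above could be kept bounded so that it respects the hotplug script's execution timeout. What follows is only a minimal sketch under stated assumptions, not code from the patch: `wait_for_opened`, `read_opened_fn`, and the `fake_backend` stand-in are hypothetical names, and the Xenstore read is abstracted behind a callback so the sketch does not depend on any particular Xenstore client API. A real script would more likely use a Xenstore watch than polling.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <time.h>

/* Hypothetical callback: copies the backend's "opened" value into buf.
 * Returns 0 on success, nonzero if the node is absent or unreadable. */
typedef int (*read_opened_fn)(void *ctx, char *buf, size_t len);

enum wait_result { WAIT_OPENED, WAIT_UNSUPPORTED, WAIT_TIMEOUT };

/* Poll "opened" at most max_polls times, sleeping poll_ns between reads.
 * A missing node is treated as "this blkback lacks the feature". */
static enum wait_result wait_for_opened(read_opened_fn read_opened, void *ctx,
                                        unsigned max_polls, long poll_ns)
{
    char buf[8];

    for (unsigned i = 0; i < max_polls; i++) {
        buf[0] = '\0';
        if (read_opened(ctx, buf, sizeof(buf)) != 0)
            return WAIT_UNSUPPORTED;
        if (strcmp(buf, "1") == 0)
            return WAIT_OPENED;   /* blkback now holds its own reference */

        struct timespec ts = {
            .tv_sec = poll_ns / 1000000000L,
            .tv_nsec = poll_ns % 1000000000L,
        };
        nanosleep(&ts, NULL);
    }
    return WAIT_TIMEOUT;          /* give up before the script timeout hits */
}

/* Illustration-only stand-in for Xenstore: reports "0" until it has been
 * read opens_after times, then reports "1". */
struct fake_backend {
    unsigned reads;
    unsigned opens_after;
    bool present;
};

static int fake_read_opened(void *ctx, char *buf, size_t len)
{
    struct fake_backend *b = ctx;

    if (!b->present || len < 2)
        return -1;
    strcpy(buf, b->reads++ >= b->opens_after ? "1" : "0");
    return 0;
}
```

Only a caller that manages a delete-on-close device itself would call `wait_for_opened`; on `WAIT_UNSUPPORTED` or `WAIT_TIMEOUT` it would have to keep its own reference to the device or fail the attach.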
In the long term, I would like to solve this problem entirely by using
an ioctl to configure blkback. The ioctl would take a file descriptor
argument, avoiding the need for a round-trip through xenstore. This
also solves a security annoyance with the current design, which is that
the device is opened by a kernel thread and so the security context of
whoever requested the device to be opened is lost.

> > > > > I would like to avoid adding more xenstore blkback state if such
> > > > > information can be fetched from other methods.
> > > >
> > > > I don’t think it can be, unless the information is passed via a
> > > > completely different method. Maybe netlink(7) or ioctl(2)? Arguably
> > > > this information should not be stored in Xenstore at all, as it exposes
> > > > backend implementation details to the frontend.
> > >
> > > Could you maybe use sysfs for this information?
> >
> > Probably? This would involve adding a new file in sysfs.
> >
> > > We have all sorts of crap in xenstore, but it would be best if we can
> > > see about placing stuff like this in another interface.
> >
> > Fair.
>
> Let's see if that's a suitable approach, and we can avoid having to
> add an extra node to xenstore.

I thought about this some more and realized that in Qubes OS, we might
want to include the diskseq in the information dom0 gets about each
exported block device. This would allow dom0 to write the xenstore node
itself, but it would require some way for dom0 to be informed about
blkback having this feature.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
On Thu, Jun 08, 2023 at 12:08:55PM +0200, Roger Pau Monné wrote:
> On Tue, May 30, 2023 at 04:31:16PM -0400, Demi Marie Obenour wrote:
> > Set "opened" to "0" before the hotplug script is called. Once the
> > device node has been opened, set "opened" to "1".
> >
> > "opened" is used exclusively by userspace. It serves two purposes:
> >
> > 1. It tells userspace that the diskseq Xenstore entry is supported.
> >
> > 2. It tells userspace that it can wait for "opened" to be set to 1.
> > Once "opened" is 1, blkback has a reference to the device, so
> > userspace doesn't need to keep one.
> >
> > Together, these changes allow userspace to use block devices with
> > delete-on-close behavior, such as loop devices with the autoclear flag
> > set or device-mapper devices with the deferred-remove flag set.
>
> Now that I think a bit more about this, how are you planning to handle
> reboot with such devices? It's fine for loop (because those get
> instantiated by the block script), but likely not with other block
> devices, as on reboot the toolstack will find the block device is
> gone.
>
> I guess the delete-on-close is only intended to be used for loop
> devices? (or in general block devices that are instantiated by the
> block script itself)

You understand correctly.

--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab