Switching to closed state earlier can cause the block-drbd
script to fail with 'Device is held open by someone':
root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
Signed-off-by: Valentin Vidic <[email protected]>
Cc: [email protected]
---
drivers/block/xen-blkback/xenbus.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4bc74e72c39..43bddc996709 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
{
WARN_ON(xen_blkif_disconnect(blkif));
xen_vbd_free(&blkif->vbd);
+ xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
kfree(blkif->be->mode);
kfree(blkif->be);
@@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
case XenbusStateClosed:
xen_blkif_disconnect(be->blkif);
- xenbus_switch_state(dev, XenbusStateClosed);
if (xenbus_dev_is_online(dev))
break;
/* fall through */
--
2.18.0
On 29/08/18 08:52, Valentin Vidic wrote:
> Switching to closed state earlier can cause the block-drbd
> script to fail with 'Device is held open by someone':
>
> root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
>
> Signed-off-by: Valentin Vidic <[email protected]>
> Cc: [email protected]
> ---
> drivers/block/xen-blkback/xenbus.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index a4bc74e72c39..43bddc996709 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> {
> WARN_ON(xen_blkif_disconnect(blkif));
> xen_vbd_free(&blkif->vbd);
> + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> kfree(blkif->be->mode);
> kfree(blkif->be);
>
> @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
>
> case XenbusStateClosed:
> xen_blkif_disconnect(be->blkif);
> - xenbus_switch_state(dev, XenbusStateClosed);
> if (xenbus_dev_is_online(dev))
> break;
> /* fall through */
Did you test whether it is okay to not change state in case the device
is still online?
Juergen
On Wed, Aug 29, 2018 at 10:16:09AM +0200, Juergen Gross wrote:
> Did you test whether it is okay to not change state in case the device
> is still online?
Not sure how to simulate that. Maybe using xl block-detach or something else?
--
Valentin
On 29/08/18 10:27, Valentin Vidic wrote:
> On Wed, Aug 29, 2018 at 10:16:09AM +0200, Juergen Gross wrote:
>> Did you test whether it is okay to not change state in case the device
>> is still online?
>
> Not sure how to simulate that. Maybe using xl block-detach or something else?
>
I think this should be an action triggered by the frontend, not by dom0
(xen tools will always set "online" to 0 when removing a device, AFAIK).
I'm not sure this is relevant, but I realized this behavior is changed
by your patch.
Juergen
On Wed, Aug 29, 2018 at 10:43:48AM +0200, Juergen Gross wrote:
> I think this should be an action triggered by the frontend, not by dom0
> (xen tools will always set "online" to 0 when removing a device, AFAIK).
>
> I'm not sure this is relevant, but I realized this behavior is changed
> by your patch.
Can't find any place in xen-blkfront.c (or the rest of the kernel) where
the "online" value gets set, so it seems xen tools in dom0 is the only
thing that modifies this Xenbus node?
--
Valentin
On 29/08/18 11:23, Valentin Vidic wrote:
> On Wed, Aug 29, 2018 at 10:43:48AM +0200, Juergen Gross wrote:
>> I think this should be an action triggered by the frontend, not by dom0
>> (xen tools will always set "online" to 0 when removing a device, AFAIK).
>>
>> I'm not sure this is relevant, but I realized this behavior is changed
>> by your patch.
>
> Can't find any place in xen-blkfront.c (or the rest of the kernel) where
> the "online" value gets set, so it seems xen tools in dom0 is the only
> thing that modifies this Xenbus node?
>
I think so, yes.
It will be set to "1" at device creation and to "0" when removing the
device, both times by Xen tools.
Juergen
On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> Switching to closed state earlier can cause the block-drbd
> script to fail with 'Device is held open by someone':
>
> root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
>
> Signed-off-by: Valentin Vidic <[email protected]>
> Cc: [email protected]
> ---
> drivers/block/xen-blkback/xenbus.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index a4bc74e72c39..43bddc996709 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> {
> WARN_ON(xen_blkif_disconnect(blkif));
> xen_vbd_free(&blkif->vbd);
> + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> kfree(blkif->be->mode);
> kfree(blkif->be);
>
> @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
>
> case XenbusStateClosed:
> xen_blkif_disconnect(be->blkif);
> - xenbus_switch_state(dev, XenbusStateClosed);
> if (xenbus_dev_is_online(dev))
> break;
AFAICT, this will cause the backend to never switch to 'Closed' state
until the toolstack sets online to 0, which is not good IMO.
If for example a frontend decides to close a device, the backend will
stay in state 'Closing' until the toolstack actually removes the disk
by setting online to 0.
This will prevent resetting blk connections, as blkback will refuse to
switch to state XenbusStateInitWait unless it's at XenbusStateClosed
(see the XenbusStateInitialising case in frontend_changed), which will
never be reached with your patch.
Maybe the easiest solution would be to wait in the block-drbd script
until the device is released? Maybe using fstat in a loop or one of
the drbd tools?
Thanks, Roger.
On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monn? wrote:
> On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > Switching to closed state earlier can cause the block-drbd
> > script to fail with 'Device is held open by someone':
> >
> > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> >
> > Signed-off-by: Valentin Vidic <[email protected]>
> > Cc: [email protected]
> > ---
> > drivers/block/xen-blkback/xenbus.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > index a4bc74e72c39..43bddc996709 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> > {
> > WARN_ON(xen_blkif_disconnect(blkif));
> > xen_vbd_free(&blkif->vbd);
> > + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> > kfree(blkif->be->mode);
> > kfree(blkif->be);
> >
> > @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
> >
> > case XenbusStateClosed:
> > xen_blkif_disconnect(be->blkif);
> > - xenbus_switch_state(dev, XenbusStateClosed);
> > if (xenbus_dev_is_online(dev))
> > break;
>
> AFAICT, this will cause the backend to never switch to 'Closed' state
> until the toolstack sets online to 0, which is not good IMO.
>
> If for example a frontend decides to close a device, the backend will
> stay in state 'Closing' until the toolstack actually removes the disk
> by setting online to 0.
>
> This will prevent resetting blk connections, as blkback will refuse to
> switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> (see the XenbusStateInitialising case in frontend_changed), which will
> never be reached with your patch.
Ok, is it possible to test this somehow?
> Maybe the easiest solution would be to wait in the block-drbd script
> until the device is released? Maybe using fstat in a loop or one of
> the drbd tools?
That is an option, but I don't know if it is possible to check device
release like that. Including drbd-user in CC, maybe they have some
idea how to solve this.
--
Valentin
On Wed, Sep 05, 2018 at 01:35:15PM +0200, Valentin Vidic wrote:
> > AFAICT, this will cause the backend to never switch to 'Closed' state
> > until the toolstack sets online to 0, which is not good IMO.
> >
> > If for example a frontend decides to close a device, the backend will
> > stay in state 'Closing' until the toolstack actually removes the disk
> > by setting online to 0.
> >
> > This will prevent resetting blk connections, as blkback will refuse to
> > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > (see the XenbusStateInitialising case in frontend_changed), which will
> > never be reached with your patch.
Would it be possible to call xen_vbd_free before the state change?
case XenbusStateClosed:
xen_blkif_disconnect(be->blkif);
xen_vbd_free(&be->blkif->vbd);
xenbus_switch_state(dev, XenbusStateClosed);
--
Valentin
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monn? wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device is held open by someone':
> > >
> > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> > >
> > > Signed-off-by: Valentin Vidic <[email protected]>
> > > Cc: [email protected]
> > > ---
> > > drivers/block/xen-blkback/xenbus.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > index a4bc74e72c39..43bddc996709 100644
> > > --- a/drivers/block/xen-blkback/xenbus.c
> > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> > > {
> > > WARN_ON(xen_blkif_disconnect(blkif));
> > > xen_vbd_free(&blkif->vbd);
> > > + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> > > kfree(blkif->be->mode);
> > > kfree(blkif->be);
> > >
> > > @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
> > >
> > > case XenbusStateClosed:
> > > xen_blkif_disconnect(be->blkif);
> > > - xenbus_switch_state(dev, XenbusStateClosed);
> > > if (xenbus_dev_is_online(dev))
> > > break;
> >
> > AFAICT, this will cause the backend to never switch to 'Closed' state
> > until the toolstack sets online to 0, which is not good IMO.
> >
> > If for example a frontend decides to close a device, the backend will
> > stay in state 'Closing' until the toolstack actually removes the disk
> > by setting online to 0.
> >
> > This will prevent resetting blk connections, as blkback will refuse to
> > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > (see the XenbusStateInitialising case in frontend_changed), which will
> > never be reached with your patch.
>
> Ok, is it possible to test this somehow?
Yes, you can try booting a PV guest with pvgrub, that will cause
pvgrub to open a blk connection, then close it and afterwards the OS
kernel will start a new connection.
Thanks, Roger.
On Wed, Sep 05, 2018 at 06:28:01PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 01:35:15PM +0200, Valentin Vidic wrote:
> > > AFAICT, this will cause the backend to never switch to 'Closed' state
> > > until the toolstack sets online to 0, which is not good IMO.
> > >
> > > If for example a frontend decides to close a device, the backend will
> > > stay in state 'Closing' until the toolstack actually removes the disk
> > > by setting online to 0.
> > >
> > > This will prevent resetting blk connections, as blkback will refuse to
> > > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > > (see the XenbusStateInitialising case in frontend_changed), which will
> > > never be reached with your patch.
>
> Would it be possible to call xen_vbd_free before the state change?
>
> case XenbusStateClosed:
> xen_blkif_disconnect(be->blkif);
> xen_vbd_free(&be->blkif->vbd);
> xenbus_switch_state(dev, XenbusStateClosed);
I think that will break reconnection, since xen_vbd_create is only
called after hotplug script execution is performed (which happens only
once at device attachment), but not when DomU changes frontend
state.
If you want to perform this xen_vbd_free you will also have to move
the xen_vbd_create call AFAICT, to a place that's also called when
reconnecting a device. Note that I could be wrong, so it might be
worth a shot to try different approaches since the blkback code is
quite tangled and I might miss something.
Thanks, Roger.
On Thu, Sep 06, 2018 at 06:29:32PM +0200, Roger Pau Monn? wrote:
> On Wed, Sep 05, 2018 at 06:28:01PM +0200, Valentin Vidic wrote:
> > On Wed, Sep 05, 2018 at 01:35:15PM +0200, Valentin Vidic wrote:
> > > > AFAICT, this will cause the backend to never switch to 'Closed' state
> > > > until the toolstack sets online to 0, which is not good IMO.
> > > >
> > > > If for example a frontend decides to close a device, the backend will
> > > > stay in state 'Closing' until the toolstack actually removes the disk
> > > > by setting online to 0.
> > > >
> > > > This will prevent resetting blk connections, as blkback will refuse to
> > > > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > > > (see the XenbusStateInitialising case in frontend_changed), which will
> > > > never be reached with your patch.
> >
> > Would it be possible to call xen_vbd_free before the state change?
> >
> > case XenbusStateClosed:
> > xen_blkif_disconnect(be->blkif);
> > xen_vbd_free(&be->blkif->vbd);
> > xenbus_switch_state(dev, XenbusStateClosed);
>
> I think that will break reconnection, since xen_vbd_create is only
> called after hotplug script execution is performed (which happens only
> once at device attachment), but not when DomU changes frontend
> state.
>
> If you want to perform this xen_vbd_free you will also have to move
> the xen_vbd_create call AFAICT, to a place that's also called when
> reconnecting a device. Note that I could be wrong, so it might be
> worth a shot to try different approaches since the blkback code is
> quite tangled and I might miss something.
It seems like the Closed state is not a good point to call the remove
script since the device could go back from Closed to Connected.
Maybe it would help to introduce a new final state (7 = XenbusStateFree
or XenbusStateRemove) that would be set after xen_vbd_free to let the
userspace know it is safe to run the remove script?
static void xen_blkif_free(struct xen_blkif *blkif)
{
WARN_ON(xen_blkif_disconnect(blkif));
xen_vbd_free(&blkif->vbd);
xenbus_switch_state(blkif->be->dev, XenbusStateFree);
kfree(blkif->be->mode);
kfree(blkif->be);
/* Make sure everything is drained before shutting down */
kmem_cache_free(xen_blkif_cachep, blkif);
}
--
Valentin
On Thu, Sep 06, 2018 at 06:14:21PM +0200, Roger Pau Monn? wrote:
> On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> > On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monn? wrote:
> > > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > > Switching to closed state earlier can cause the block-drbd
> > > > script to fail with 'Device is held open by someone':
> > > >
> > > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
> > > >
> > > > Signed-off-by: Valentin Vidic <[email protected]>
> > > > Cc: [email protected]
> > > > ---
> > > > drivers/block/xen-blkback/xenbus.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > > index a4bc74e72c39..43bddc996709 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -323,6 +323,7 @@ static void xen_blkif_free(struct xen_blkif *blkif)
> > > > {
> > > > WARN_ON(xen_blkif_disconnect(blkif));
> > > > xen_vbd_free(&blkif->vbd);
> > > > + xenbus_switch_state(blkif->be->dev, XenbusStateClosed);
> > > > kfree(blkif->be->mode);
> > > > kfree(blkif->be);
> > > >
> > > > @@ -814,7 +815,6 @@ static void frontend_changed(struct xenbus_device *dev,
> > > >
> > > > case XenbusStateClosed:
> > > > xen_blkif_disconnect(be->blkif);
> > > > - xenbus_switch_state(dev, XenbusStateClosed);
> > > > if (xenbus_dev_is_online(dev))
> > > > break;
> > >
> > > AFAICT, this will cause the backend to never switch to 'Closed' state
> > > until the toolstack sets online to 0, which is not good IMO.
> > >
> > > If for example a frontend decides to close a device, the backend will
> > > stay in state 'Closing' until the toolstack actually removes the disk
> > > by setting online to 0.
> > >
> > > This will prevent resetting blk connections, as blkback will refuse to
> > > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > > (see the XenbusStateInitialising case in frontend_changed), which will
> > > never be reached with your patch.
> >
> > Ok, is it possible to test this somehow?
>
> Yes, you can try booting a PV guest with pvgrub, that will cause
> pvgrub to open a blk connection, then close it and afterwards the OS
> kernel will start a new connection.
Indeed the boot hangs after pvgrub loads the kernel and initrd, so it
seems this does not work as expected.
--
Valentin
On Fri, Sep 07, 2018 at 12:19:29AM +0200, Valentin Vidic wrote:
> On Thu, Sep 06, 2018 at 06:29:32PM +0200, Roger Pau Monn? wrote:
> > On Wed, Sep 05, 2018 at 06:28:01PM +0200, Valentin Vidic wrote:
> > > On Wed, Sep 05, 2018 at 01:35:15PM +0200, Valentin Vidic wrote:
> > > > > AFAICT, this will cause the backend to never switch to 'Closed' state
> > > > > until the toolstack sets online to 0, which is not good IMO.
> > > > >
> > > > > If for example a frontend decides to close a device, the backend will
> > > > > stay in state 'Closing' until the toolstack actually removes the disk
> > > > > by setting online to 0.
> > > > >
> > > > > This will prevent resetting blk connections, as blkback will refuse to
> > > > > switch to state XenbusStateInitWait unless it's at XenbusStateClosed
> > > > > (see the XenbusStateInitialising case in frontend_changed), which will
> > > > > never be reached with your patch.
> > >
> > > Would it be possible to call xen_vbd_free before the state change?
> > >
> > > case XenbusStateClosed:
> > > xen_blkif_disconnect(be->blkif);
> > > xen_vbd_free(&be->blkif->vbd);
> > > xenbus_switch_state(dev, XenbusStateClosed);
> >
> > I think that will break reconnection, since xen_vbd_create is only
> > called after hotplug script execution is performed (which happens only
> > once at device attachment), but not when DomU changes frontend
> > state.
> >
> > If you want to perform this xen_vbd_free you will also have to move
> > the xen_vbd_create call AFAICT, to a place that's also called when
> > reconnecting a device. Note that I could be wrong, so it might be
> > worth a shot to try different approaches since the blkback code is
> > quite tangled and I might miss something.
>
> It seems like the Closed state is not a good point to call the remove
> script since the device could go back from Closed to Connected.
>
> Maybe it would help to introduce a new final state (7 = XenbusStateFree
> or XenbusStateRemove) that would be set after xen_vbd_free to let the
> userspace know it is safe to run the remove script?
I'm not sure that's a good idea, there are a lot of backends (apart
from blkback), and the tools won't know whether a specific backend
supports such state or not. Also the current protocol and states are
shared between all the Xen PV devices, so new additions should be
considered very carefully.
IMO the best options are either calling vbd_free/vbd_create at proper
stages in blkback or changing the hotplug script so it waits for the
device to have no open clients.
Thanks, Roger.
On Fri, Sep 07, 2018 at 09:15:30AM +0200, Roger Pau Monn? wrote:
> I'm not sure that's a good idea, there are a lot of backends (apart
> from blkback), and the tools won't know whether a specific backend
> supports such state or not. Also the current protocol and states are
> shared between all the Xen PV devices, so new additions should be
> considered very carefully.
Sure, I understand.
> IMO the best options are either calling vbd_free/vbd_create at proper
> stages in blkback or changing the hotplug script so it waits for the
> device to have no open clients.
Changing the block-drbd script would be ideal for me too, but I don't
think that piece of DRBD state is exposed at the moment.
--
Valentin
On Fri, Sep 07, 2018 at 09:23:19AM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 09:15:30AM +0200, Roger Pau Monn? wrote:
> > I'm not sure that's a good idea, there are a lot of backends (apart
> > from blkback), and the tools won't know whether a specific backend
> > supports such state or not. Also the current protocol and states are
> > shared between all the Xen PV devices, so new additions should be
> > considered very carefully.
>
> Sure, I understand.
>
> > IMO the best options are either calling vbd_free/vbd_create at proper
> > stages in blkback or changing the hotplug script so it waits for the
> > device to have no open clients.
>
> Changing the block-drbd script would be ideal for me too, but I don't
> think that piece of DRBD state is exposed at the moment.
Then I'm afraid you will have to look into the vbd_free/create fix.
Thanks, Roger.
On Fri, Sep 07, 2018 at 09:54:55AM +0200, Roger Pau Monn? wrote:
> Then I'm afraid you will have to look into the vbd_free/create fix.
Yes, here is a first draft for that idea, let me know if you see some
problems there:
--- xenbus.c.orig 2018-09-07 12:11:57.798071485 +0200
+++ xenbus.c 2018-09-07 12:14:23.536077992 +0200
@@ -758,6 +759,7 @@
enum xenbus_state frontend_state)
{
struct backend_info *be = dev_get_drvdata(&dev->dev);
+ struct block_device *bdev;
int err;
pr_debug("%s %p %s\n", __func__, dev, xenbus_strstate(frontend_state));
@@ -772,6 +774,22 @@
case XenbusStateInitialised:
case XenbusStateConnected:
+ if (!be->blkif->vbd.bdev) {
+ printk("blkdev_get");
+ bdev = blkdev_get_by_dev(be->blkif->vbd.pdevice,
+ be->blkif->vbd.readonly ?
+ FMODE_READ : FMODE_WRITE, NULL);
+
+ if (IS_ERR(bdev)) {
+ pr_warn("frontend_changed: device %08x could not be opened\n",
+ be->blkif->vbd.pdevice);
+ break;
+ }
+
+ printk("blkdev_get good");
+ be->blkif->vbd.bdev = bdev;
+ }
+
/*
* Ensure we connect even when two watches fire in
* close succession and we miss the intermediate value
@@ -808,6 +826,7 @@
case XenbusStateClosed:
xen_blkif_disconnect(be->blkif);
+ xen_vbd_free(&be->blkif->vbd);
xenbus_switch_state(dev, XenbusStateClosed);
if (xenbus_dev_is_online(dev))
break;
--
Valentin
On Fri, Sep 07, 2018 at 02:03:37PM +0200, Lars Ellenberg wrote:
> Very frequently it is *NOT* the "original user", that "still" holds it
> open, but udev, or something triggered-by-udev.
>
> So double-checking the udev rules,
> or the "lvm global_filter" settings may help.
> You could instrument DRBD to log current->{pid,comm} on open and close,
> so you can better detect who the "someone" is in the message above.
Don't think there is anything else holding the device open, because it
is possible to change state to Secondary a few seconds later. But I will
try to print those values in case anything interesting comes up.
> Adding a small retry loop in the script may help as well.
Yes, that is an option, but it would still leave those nasty "State
change failed" messages in the log. I guess there is no way to check
the value of DRBD device->open_cnt from userspace?
--
Valentin
On Fri, Sep 07, 2018 at 12:20:26PM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 09:54:55AM +0200, Roger Pau Monn? wrote:
> > Then I'm afraid you will have to look into the vbd_free/create fix.
>
> Yes, here is a first draft for that idea, let me know if you see some
> problems there:
Thanks!
> --- xenbus.c.orig 2018-09-07 12:11:57.798071485 +0200
> +++ xenbus.c 2018-09-07 12:14:23.536077992 +0200
> @@ -758,6 +759,7 @@
> enum xenbus_state frontend_state)
> {
> struct backend_info *be = dev_get_drvdata(&dev->dev);
> + struct block_device *bdev;
> int err;
>
> pr_debug("%s %p %s\n", __func__, dev, xenbus_strstate(frontend_state));
> @@ -772,6 +774,22 @@
>
> case XenbusStateInitialised:
> case XenbusStateConnected:
> + if (!be->blkif->vbd.bdev) {
> + printk("blkdev_get");
> + bdev = blkdev_get_by_dev(be->blkif->vbd.pdevice,
> + be->blkif->vbd.readonly ?
> + FMODE_READ : FMODE_WRITE, NULL);
> +
> + if (IS_ERR(bdev)) {
> + pr_warn("frontend_changed: device %08x could not be opened\n",
> + be->blkif->vbd.pdevice);
> + break;
> + }
> +
> + printk("blkdev_get good");
> + be->blkif->vbd.bdev = bdev;
> + }
> +
I would prefer if you could avoid open-coding this here, and instead
use xen_vbd_create or similar. I would also prefer that the call to
xen_vbd_create in backend_changed was removed and we had a single call
to xen_vbd_create that's used for both initial device connection and
reconnection.
Also, I think this could cause issues if for some reason the frontend
switches to state 'Connected' before hotplug scripts have run, in
which case you would try to open an unexpected device because pdevice
won't be correctly set.
Roger.
On Fri, Sep 07, 2018 at 12:43:09PM +0200, Roger Pau Monn? wrote:
> I would prefer if you could avoid open-coding this here, and instead
> use xen_vbd_create or similar. I would also prefer that the call to
> xen_vbd_create in backend_changed was removed and we had a single call
> to xen_vbd_create that's used for both initial device connection and
> reconnection.
>
> Also, I think this could cause issues if for some reason the frontend
> switches to state 'Connected' before hotplug scripts have run, in
> which case you would try to open an unexpected device because pdevice
> won't be correctly set.
Sure, this is just to test if the idea would work and needs a lot of
cleanup. Unfortunately it does not seem to help with the original
problem because this case is not executed on VM shutdown:
case XenbusStateClosed:
xen_blkif_disconnect(be->blkif);
xen_vbd_free(&be->blkif->vbd);
xenbus_switch_state(dev, XenbusStateClosed);
Instead xen_vbd_free gets run from a different code path after the
remove script has already failed:
[ 337.407634] block drbd0: State change failed: Device is held open by someone
[ 337.407673] block drbd0: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
[ 337.407713] block drbd0: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
...
[ 340.109459] Workqueue: events xen_blkif_deferred_free [xen_blkback]
[ 340.109461] 0000000000000000 ffffffff81331e54 ffff883f84d19d38 ffff883f84d19d32
[ 340.109463] ffffffffc058169e ffff883f84d19d88 ffff883f84d19d20 ffffffffc05816f7
[ 340.109465] ffff883f84d19d88 ffff883f87b5a900 ffffffff81092fea 0000000088ec3080
[ 340.109467] Call Trace:
[ 340.109471] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
[ 340.109473] [<ffffffffc058169e>] ? xen_vbd_free.isra.9+0x2e/0x60 [xen_blkback]
[ 340.109475] [<ffffffffc05816f7>] ? xen_blkif_deferred_free+0x27/0x70 [xen_blkback]
[ 340.109477] [<ffffffff81092fea>] ? process_one_work+0x18a/0x420
[ 340.109479] [<ffffffff810932cd>] ? worker_thread+0x4d/0x490
[ 340.109480] [<ffffffff81093280>] ? process_one_work+0x420/0x420
[ 340.109482] [<ffffffff81099329>] ? kthread+0xd9/0xf0
[ 340.109484] [<ffffffff81099250>] ? kthread_park+0x60/0x60
[ 340.109486] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
--
Valentin
On Wed, Sep 05, 2018 at 06:27:56PM +0200, Valentin Vidic wrote:
> On Wed, Sep 05, 2018 at 12:36:49PM +0200, Roger Pau Monn? wrote:
> > On Wed, Aug 29, 2018 at 08:52:14AM +0200, Valentin Vidic wrote:
> > > Switching to closed state earlier can cause the block-drbd
> > > script to fail with 'Device is held open by someone':
> > >
> > > root: /etc/xen/scripts/block-drbd: remove XENBUS_PATH=backend/vbd/6/51712
> > > kernel: [ 2222.278235] block drbd6: State change failed: Device is held open by someone
> > > kernel: [ 2222.278304] block drbd6: state = { cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate r----- }
> > > kernel: [ 2222.278340] block drbd6: wanted = { cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate r----- }
> > > root: /etc/xen/scripts/block-drbd: Writing backend/vbd/6/51712/hotplug-error /etc/xen/scripts/block-drbd failed; error detected. backend/vbd/6/51712/hotplug-status error to xenstore.
> > > root: /etc/xen/scripts/block-drbd: /etc/xen/scripts/block-drbd failed; error detected.
Very frequently it is *NOT* the "original user", that "still" holds it
open, but udev, or something triggered-by-udev.
So double-checking the udev rules,
or the "lvm global_filter" settings may help.
You could instrument DRBD to log current->{pid,comm} on open and close,
so you can better detect who the "someone" is in the message above.
Adding a small retry loop in the script may help as well.
As a side note,
with DRBD 9 (which still moves too fast to be "in tree")
we have the "auto-promote" feature, where no explicit promotion/demotion
is necessary anymore, but DRBD (tries to) promote itself with the
first "RW" opener, and demote itself once the last opener is gone.
Hth,
Lars
On Fri, Sep 07, 2018 at 02:13:48PM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 02:03:37PM +0200, Lars Ellenberg wrote:
> > Very frequently it is *NOT* the "original user", that "still" holds it
> > open, but udev, or something triggered-by-udev.
> >
> > So double-checking the udev rules,
> > or the "lvm global_filter" settings may help.
> > You could instrument DRBD to log current->{pid,comm} on open and close,
> > so you can better detect who the "someone" is in the message above.
>
> Don't think there is anything else holding the device open, because it
> is possible to change state to Secondary a few seconds later. But I will
> try to print those values in case anything interesting comes up.
>
> > Adding a small retry loop in the script may help as well.
>
> Yes, that is an option, but it would still leave those nasty "State
> change failed" messages in the log. I guess there is no way to check
> the value of DRBD device->open_cnt from userspace?
We don't expose that, no.
But even if we did, that would not be racefree :-)
The last (or even: any?) "close" of a block device that used to be open
for WRITE triggeres a udev "change" event, thus a udev run,
and the minimal action will be some read-only open and ioctl from
(systemd-)udev itself, more likely there also will be blkid and possibly
pvscan and similar actions. All of them should be read-only openers,
and all of them should be "short".
But they will race with the drbd demotion.
You can watch that happen with
# udevadm monitor &
then open and close for write:
# : > /dev/sda
(or any other block device ...)
To excercise the drbd vs udev race,
(you can leave the udevadm monitor running if you want):
## clear dmesg
# dmesg -c > /dev/null
## promote
# drbdadm primary s0 ;
## open/close for write
# : > /dev/drbd0 ;
## the close has triggered a udev run,
## which will now race with demotion,
## which will "sometimes" fail:
# drbdadm secondary s0 ;
## wait a bit, and retry
# sleep 1; drbdadm secondary s0 ; dmesg
### --- example output on some test box right now: ----------------
@root@ava:~# udevadm monitor &
...
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent
...
@root@ava:~# dmesg -c > /dev/null; drbdadm primary s0 ; : > /dev/drbd0 ; drbdadm secondary s0 ; sleep 1; drbdadm secondary s0 ; dmesg
KERNEL[609638.990320] change /devices/virtual/block/drbd0 (block)
KERNEL[609638.991364] change /devices/virtual/block/drbd0 (block)
UDEV [609639.008879] change /devices/virtual/block/drbd0 (block)
0: State change failed: (-12) Device is held open by someone
Command 'drbdsetup-84 secondary 0' terminated with exit code 11
UDEV [609639.011652] change /devices/virtual/block/drbd0 (block)
KERNEL[609640.017356] change /devices/virtual/block/drbd0 (block)
KERNEL[609640.018074] change /devices/virtual/block/drbd0 (block)
[609613.882751] block drbd0: role( Secondary -> Primary )
[609613.889998] block drbd0: State change failed: Device is held open by someone
[609613.894280] block drbd0: state = { cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated r----- }
[609613.897609] block drbd0: wanted = { cs:WFConnection ro:Secondary/Unknown ds:UpToDate/Outdated r----- }
[609614.909537] block drbd0: role( Primary -> Secondary )
[609614.909662] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
UDEV [609640.024588] change /devices/virtual/block/drbd0 (block)
UDEV [609640.028731] change /devices/virtual/block/drbd0 (block)
@root@ava:~#
### --------------------------------------------------------------
Obviously change s0 and drbd0 appropriately; to exactly hit the time
window where udev has it open may be tricky, and on a busy system even
the sleep 1 can be not enough (so even the second invocation may fail
still), on an idle system udev may already be done, or may not even have
started yet when the first "secondary" runs.
Your best bet is to review your udev rules,
and make sure "drbd*" will not be looked at
by blkid or pvscan or similar.
See also: https://github.com/systemd/systemd/commit/fee854ee8ccde
respectively https://github.com/systemd/systemd/issues/9371
to get rid even of the unconditional open by systemd-udevd itself.
Note,
I'm not saying blkback has no issue; I don't know.
I'm just pointing out that there are other things
that may cause the same effects.
Lars
On Fri, Sep 07, 2018 at 03:28:28PM +0200, Lars Ellenberg wrote:
> We don't expose that, no.
> But even if we did, that would not be racefree :-)
>
> The last (or even: any?) "close" of a block device that used to be open
> for WRITE triggeres a udev "change" event, thus a udev run,
> and the minimal action will be some read-only open and ioctl from
> (systemd-)udev itself, more likely there also will be blkid and possibly
> pvscan and similar actions. All of them should be read-only openers,
> and all of them should be "short".
> But they will race with the drbd demotion.
True, but did not find any strange interaction with udev during VM shutdown.
This is what udevadm monitor reports:
KERNEL[1174.220256] remove /devices/vbd-10-51712 (xen-backend)
UDEV [1174.222484] remove /devices/vbd-10-51712 (xen-backend)
KERNEL[1174.224405] remove /devices/console-10-0 (xen-backend)
UDEV [1174.226964] remove /devices/console-10-0 (xen-backend)
KERNEL[1174.287215] change /devices/virtual/block/drbd0 (block)
KERNEL[1174.287267] change /devices/virtual/block/drbd0 (block)
UDEV [1174.295811] change /devices/virtual/block/drbd0 (block)
UDEV [1174.301983] change /devices/virtual/block/drbd0 (block)
Strace on the udev daemon gives only these:
[pid 7416] execve("/sbin/drbdadm", ["/sbin/drbdadm", "sh-udev", "minor-0"], [/* 10 vars */]) = 0
[pid 7416] execve("/lib/drbd/drbdadm-84", ["drbdadm", "sh-udev", "minor-0"], [/* 12 vars */]) = 0
[pid 7418] execve("/sbin/drbdadm", ["/sbin/drbdadm", "sh-udev", "minor-0"], [/* 9 vars */]) = 0
[pid 7418] execve("/lib/drbd/drbdadm-84", ["drbdadm", "sh-udev", "minor-0"], [/* 11 vars */]) = 0
and this should be 65-drbd.rules to add/remove symlinks in /dev.
Adding a dump_stack in drbd_release gives two possible code paths,
both from xen_blkback and the first one from workqueue being the
problematic one:
[ 530.698782] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
[ 530.698783] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
[ 530.698784] Workqueue: events xen_blkif_deferred_free [xen_blkback]
[ 530.698785] 0000000000000000 ffffffff81331e54 ffff883f82143800 ffff883f741b5660
[ 530.698787] ffffffffc03a309e ffff883f741b5580 ffffffff81245ca8 0000000281a186b5
[ 530.698789] 000000008935f100 8412ffa31a64cc4c 000000000000020a ffff883fa6218280
[ 530.698791] Call Trace:
[ 530.698792] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
[ 530.698805] [<ffffffffc03a309e>] ? drbd_release+0x1e/0x40 [drbd]
[ 530.698810] [<ffffffff81245ca8>] ? __blkdev_put+0x1e8/0x2a0
[ 530.698813] [<ffffffffc055d6b8>] ? xen_vbd_free.isra.9+0x48/0x60 [xen_blkback]
[ 530.698814] [<ffffffffc055d6f7>] ? xen_blkif_deferred_free+0x27/0x70 [xen_blkback]
[ 530.698816] [<ffffffff81092fea>] ? process_one_work+0x18a/0x420
[ 530.698817] [<ffffffff810932cd>] ? worker_thread+0x4d/0x490
[ 530.698818] [<ffffffff81093280>] ? process_one_work+0x420/0x420
[ 530.698820] [<ffffffff81099329>] ? kthread+0xd9/0xf0
[ 530.698822] [<ffffffff81099250>] ? kthread_park+0x60/0x60
[ 530.698823] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
[ 1216.251924] CPU: 14 PID: 297 Comm: xenwatch Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
[ 1216.251925] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
[ 1216.251926] 0000000000000000 ffffffff81331e54 ffff883f82143800 ffff883f741b5660
[ 1216.251928] ffffffffc03a309e ffff883f741b5580 ffffffff81245ca8 0000000281a186b5
[ 1216.251930] ffffffffc0562360 97e48f39d448f082 000000000000020a 0000000000000006
[ 1216.251933] Call Trace:
[ 1216.251935] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
[ 1216.251947] [<ffffffffc03a309e>] ? drbd_release+0x1e/0x40 [drbd]
[ 1216.251951] [<ffffffff81245ca8>] ? __blkdev_put+0x1e8/0x2a0
[ 1216.251954] [<ffffffff81414910>] ? register_xenbus_watch+0xe0/0xe0
[ 1216.251956] [<ffffffffc055d6b8>] ? xen_vbd_free.isra.9+0x48/0x60 [xen_blkback]
[ 1216.251959] [<ffffffffc055e9fe>] ? frontend_changed+0x9e/0x660 [xen_blkback]
[ 1216.251961] [<ffffffff814120d9>] ? xenbus_read_driver_state+0x39/0x60
[ 1216.251962] [<ffffffff8141542c>] ? xenbus_otherend_changed+0x8c/0x120
[ 1216.251964] [<ffffffff81414910>] ? register_xenbus_watch+0xe0/0xe0
[ 1216.251967] [<ffffffff81414995>] ? xenwatch_thread+0x85/0x120
[ 1216.251969] [<ffffffff810bbfb0>] ? prepare_to_wait_event+0xf0/0xf0
[ 1216.251971] [<ffffffff81099329>] ? kthread+0xd9/0xf0
[ 1216.251973] [<ffffffff81099250>] ? kthread_park+0x60/0x60
[ 1216.251975] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
--
Valentin
On Fri, Sep 07, 2018 at 06:45:00PM +0200, Valentin Vidic wrote:
> Adding a dump_stack in drbd_release gives two possible code paths,
> both from xen_blkback and the first one from workqueue being the
> problematic one:
In fact the first one is the original code path before I modified
blkback. The problem is it gets executed async from workqueue so
it might not always run before the call to drbdadm secondary.
> [ 530.698782] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G O 4.9.0-8-amd64 #1 Debian 4.9.110-3+deb9u4
> [ 530.698783] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 09/12/2016
> [ 530.698784] Workqueue: events xen_blkif_deferred_free [xen_blkback]
> [ 530.698785] 0000000000000000 ffffffff81331e54 ffff883f82143800 ffff883f741b5660
> [ 530.698787] ffffffffc03a309e ffff883f741b5580 ffffffff81245ca8 0000000281a186b5
> [ 530.698789] 000000008935f100 8412ffa31a64cc4c 000000000000020a ffff883fa6218280
> [ 530.698791] Call Trace:
> [ 530.698792] [<ffffffff81331e54>] ? dump_stack+0x5c/0x78
> [ 530.698805] [<ffffffffc03a309e>] ? drbd_release+0x1e/0x40 [drbd]
> [ 530.698810] [<ffffffff81245ca8>] ? __blkdev_put+0x1e8/0x2a0
> [ 530.698813] [<ffffffffc055d6b8>] ? xen_vbd_free.isra.9+0x48/0x60 [xen_blkback]
> [ 530.698814] [<ffffffffc055d6f7>] ? xen_blkif_deferred_free+0x27/0x70 [xen_blkback]
> [ 530.698816] [<ffffffff81092fea>] ? process_one_work+0x18a/0x420
> [ 530.698817] [<ffffffff810932cd>] ? worker_thread+0x4d/0x490
> [ 530.698818] [<ffffffff81093280>] ? process_one_work+0x420/0x420
> [ 530.698820] [<ffffffff81099329>] ? kthread+0xd9/0xf0
> [ 530.698822] [<ffffffff81099250>] ? kthread_park+0x60/0x60
> [ 530.698823] [<ffffffff81615df7>] ? ret_from_fork+0x57/0x70
--
Valentin
On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> In fact the first one is the original code path before I modified
> blkback. The problem is it gets executed async from workqueue so
> it might not always run before the call to drbdadm secondary.
As the DRBD device gets released only when the last IO request
has finished, I found a way to check and wait for this in the
block-drbd script:
--- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
+++ block-drbd 2018-09-08 09:28:12.892193649 +0200
@@ -230,6 +230,24 @@
and so cannot be mounted ${m2}${when}."
}
+wait_for_inflight()
+{
+ local dev="$1"
+ local inflight="/sys/block/${dev#/dev/}/inflight"
+ local rd wr
+
+ if ! [ -f "$inflight" ]; then
+ return
+ fi
+
+ while true; do
+ read rd wr < $inflight
+ if [ "$rd" = "0" -a "$wr" = "0" ]; then
+ return
+ fi
+ sleep 1
+ done
+}
t=$(xenstore_read_default "$XENBUS_PATH/type" 'MISSING')
@@ -285,6 +303,8 @@
drbd_lrole="${drbd_role%%/*}"
drbd_dev="$(drbdadm sh-dev $drbd_resource)"
+ wait_for_inflight $drbd_dev
+
if [ "$drbd_lrole" != 'Secondary' ]; then
drbdadm secondary $drbd_resource
fi
--
Valentin
On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > In fact the first one is the original code path before I modified
> > blkback. The problem is it gets executed async from workqueue so
> > it might not always run before the call to drbdadm secondary.
>
> As the DRBD device gets released only when the last IO request
> has finished, I found a way to check and wait for this in the
> block-drbd script:
> --- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
> +++ block-drbd 2018-09-08 09:28:12.892193649 +0200
> @@ -230,6 +230,24 @@
> and so cannot be mounted ${m2}${when}."
> }
>
> +wait_for_inflight()
> +{
> + local dev="$1"
> + local inflight="/sys/block/${dev#/dev/}/inflight"
> + local rd wr
> +
> + if ! [ -f "$inflight" ]; then
> + return
> + fi
> +
> + while true; do
> + read rd wr < $inflight
> + if [ "$rd" = "0" -a "$wr" = "0" ]; then
If it is "idle" now, but still "open",
this will not sleep, and still fail the demotion below.
> + return
> + fi
> + sleep 1
> + done
> +}
>
> t=$(xenstore_read_default "$XENBUS_PATH/type" 'MISSING')
>
> @@ -285,6 +303,8 @@
> drbd_lrole="${drbd_role%%/*}"
> drbd_dev="$(drbdadm sh-dev $drbd_resource)"
>
> + wait_for_inflight $drbd_dev
> +
> if [ "$drbd_lrole" != 'Secondary' ]; then
> drbdadm secondary $drbd_resource
You try to help it by "waiting forever until it appears to be idle".
I suggest to at least limit the retries by iteration or time.
And also (or, instead; but you'd potentially get a number of
"scary messages" in the logs) add something like:
for i in 1 2 3 5 7 x; do
drbdadm secondary $drbd_resource && exit 0
if [ $i = x ]; then
# ... "appears to still be in use, maybe by" ...
fuser -v $drbd_dev
exit 1
# else ... "will retry in $i seconds" ...
fi
sleep $i
done
...
Or, well, yes, fix blkback to not "defer" the final close "too long",
if at all possible.
Lars
On Mon, Sep 10, 2018 at 02:45:31PM +0200, Lars Ellenberg wrote:
> On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> > On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > > In fact the first one is the original code path before I modified
> > > blkback. The problem is it gets executed async from workqueue so
> > > it might not always run before the call to drbdadm secondary.
> >
> > As the DRBD device gets released only when the last IO request
> > has finished, I found a way to check and wait for this in the
> > block-drbd script:
>
> > --- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
> > +++ block-drbd 2018-09-08 09:28:12.892193649 +0200
> > @@ -230,6 +230,24 @@
> > and so cannot be mounted ${m2}${when}."
> > }
> >
> > +wait_for_inflight()
> > +{
> > + local dev="$1"
> > + local inflight="/sys/block/${dev#/dev/}/inflight"
> > + local rd wr
> > +
> > + if ! [ -f "$inflight" ]; then
> > + return
> > + fi
> > +
> > + while true; do
> > + read rd wr < $inflight
> > + if [ "$rd" = "0" -a "$wr" = "0" ]; then
>
> If it is "idle" now, but still "open",
> this will not sleep, and still fail the demotion below.
True, but in this case blkback is holding it open until all
the writes have finished and the last write closes the device.
Since fuser can't check blkback this is an approximation that
seems to work because I don't get any failed drbdadm calls now.
> You try to help it by "waiting forever until it appears to be idle".
> I suggest to at least limit the retries by iteration or time.
> And also (or, instead; but you'd potentially get a number of
> "scary messages" in the logs) add something like:
Ok, should I open a PR to discuss this change further?
> Or, well, yes, fix blkback to not "defer" the final close "too long",
> if at all possible.
blkback needs to finish the writes on shutdown or I get a fsck errors
on next boot. Ideally XenbusStateClosed should be delayed until the
device release but currently it does not seem possible without breaking
other things.
--
Valentin
On Mon, Sep 10, 2018 at 03:22:52PM +0200, Valentin Vidic wrote:
> On Mon, Sep 10, 2018 at 02:45:31PM +0200, Lars Ellenberg wrote:
> > On Sat, Sep 08, 2018 at 09:34:32AM +0200, Valentin Vidic wrote:
> > > On Fri, Sep 07, 2018 at 07:14:59PM +0200, Valentin Vidic wrote:
> > > > In fact the first one is the original code path before I modified
> > > > blkback. The problem is it gets executed async from workqueue so
> > > > it might not always run before the call to drbdadm secondary.
> > >
> > > As the DRBD device gets released only when the last IO request
> > > has finished, I found a way to check and wait for this in the
> > > block-drbd script:
> >
> > > --- block-drbd.orig 2018-09-08 09:07:23.499648515 +0200
> > > +++ block-drbd 2018-09-08 09:28:12.892193649 +0200
> > > @@ -230,6 +230,24 @@
> > > and so cannot be mounted ${m2}${when}."
> > > }
> > >
> > > +wait_for_inflight()
> > > +{
> > > + local dev="$1"
> > > + local inflight="/sys/block/${dev#/dev/}/inflight"
> > > + local rd wr
> > > +
> > > + if ! [ -f "$inflight" ]; then
> > > + return
> > > + fi
> > > +
> > > + while true; do
> > > + read rd wr < $inflight
> > > + if [ "$rd" = "0" -a "$wr" = "0" ]; then
> >
> > If it is "idle" now, but still "open",
> > this will not sleep, and still fail the demotion below.
>
> True, but in this case blkback is holding it open until all
> the writes have finished and the last write closes the device.
> Since fuser can't check blkback this is an approximation that
> seems to work because I don't get any failed drbdadm calls now.
>
> > You try to help it by "waiting forever until it appears to be idle".
> > I suggest to at least limit the retries by iteration or time.
> > And also (or, instead; but you'd potentially get a number of
> > "scary messages" in the logs) add something like:
>
> Ok, should I open a PR to discuss this change further?
>
> > Or, well, yes, fix blkback to not "defer" the final close "too long",
> > if at all possible.
>
> blkback needs to finish the writes on shutdown or I get a fsck errors
> on next boot. Ideally XenbusStateClosed should be delayed until the
> device release but currently it does not seem possible without breaking
> other things.
I can try to take a look at this and attempt to make sure the state is
only changed to closed in blkback _after_ the device has been
released, but it might take me a couple of days to get you a patch.
I'm afraid that other hotplug scripts will also have issues with such
behavior, and we shouldn't force all users of hotplug scripts to add
such workarounds.
Roger.
On Mon, Sep 10, 2018 at 05:00:58PM +0200, Roger Pau Monn? wrote:
> I can try to take a look at this and attempt to make sure the state is
> only changed to closed in blkback _after_ the device has been
> released, but it might take me a couple of days to get you a patch.
Thanks, I have two test nodes now where I can try different approaches.
> I'm afraid that other hotplug scripts will also have issues with such
> behavior, and we shouldn't force all users of hotplug scripts to add
> such workarounds.
True, iSCSI and other "network" disks might have similar problems.
--
Valentin
On Mon, Sep 10, 2018 at 06:18:31PM +0200, Valentin Vidic wrote:
> On Mon, Sep 10, 2018 at 05:00:58PM +0200, Roger Pau Monn? wrote:
> > I can try to take a look at this and attempt to make sure the state is
> > only changed to closed in blkback _after_ the device has been
> > released, but it might take me a couple of days to get you a patch.
>
> Thanks, I have two test nodes now where I can try different approaches.
>
> > I'm afraid that other hotplug scripts will also have issues with such
> > behavior, and we shouldn't force all users of hotplug scripts to add
> > such workarounds.
>
> True, iSCSI and other "network" disks might have similar problems.
So I have the following patch which I think might solve your issues
while keeping the reset logic working. Would you mind giving it a try
with your use case and pvgrub?
Thanks, Roger.
---8<---
From c6a76a6265f29440c886b422c9d1b8549ffd6ed8 Mon Sep 17 00:00:00 2001
From: Roger Pau Monne <[email protected]>
Date: Thu, 13 Sep 2018 11:27:57 +0200
Subject: [PATCH] xen-blkback: only set Closed state when resources have been
released
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Don't set the backend to Closed state until all the resources have
been released and the blkback instance is gone. Setting the backend
state to Closed will be done by xenbus_dev_remove after removing the
device instance.
Note that this requires disconnecting and allowing reconnection with
state Closing, since the Closed state will only be reached when
detaching the device.
Reported-by: Valentin Vidic <[email protected]>
Signed-off-by: Roger Pau Monn? <[email protected]>
---
drivers/block/xen-blkback/xenbus.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index a4bc74e72c39..b0629f1b0511 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -770,7 +770,8 @@ static void frontend_changed(struct xenbus_device *dev,
switch (frontend_state) {
case XenbusStateInitialising:
- if (dev->state == XenbusStateClosed) {
+ if (dev->state == XenbusStateClosed ||
+ dev->state == XenbusStateClosing) {
pr_info("%s: prepare for reconnect\n", dev->nodename);
xenbus_switch_state(dev, XenbusStateInitWait);
}
@@ -809,12 +810,12 @@ static void frontend_changed(struct xenbus_device *dev,
break;
case XenbusStateClosing:
+ xen_blkif_disconnect(be->blkif);
xenbus_switch_state(dev, XenbusStateClosing);
break;
case XenbusStateClosed:
xen_blkif_disconnect(be->blkif);
- xenbus_switch_state(dev, XenbusStateClosed);
if (xenbus_dev_is_online(dev))
break;
/* fall through */
--
2.18.0
On Thu, Sep 13, 2018 at 05:08:00PM +0200, Roger Pau Monn? wrote:
> So I have the following patch which I think might solve your issues
> while keeping the reset logic working. Would you mind giving it a try
> with your use case and pvgrub?
Thanks for the patch. It seems to be having some problems with pvgrub:
machines don't boot past pvgrub and they are using 100% CPU. Also in
dom0 xenstored is usind 100% CPU - strace reports it is looping in this:
read(26, "\\\240\0\0", 4) = 4
read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\\232`\231\31\1&", 24) = 24
read(26, "/local/domain/0/backend/vbd/1/51712/state", 41) = 41
read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0004", 29) = 29
read(26, "\0\330\0\0", 4) = 4
read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\JG\231\31\1&", 24) = 24
read(26, "/local/domain/0/backend/vbd/2/51712/state", 41) = 41
read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0004", 29) = 29
read(26, "\0\200\1\0", 4) = 4
read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\Z\373\231\31\1&", 24) = 24
read(26, "/local/domain/0/backend/vbd/5/51712/state", 41) = 41
read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\1\0\0\0004", 29) = 29
read(12, "\321\1\0\0", 4) = 4
write(12, "\321\1\0\0", 4) = 4
--
Valentin
On Fri, Sep 14, 2018 at 01:49:51PM +0200, Valentin Vidic wrote:
> On Thu, Sep 13, 2018 at 05:08:00PM +0200, Roger Pau Monn? wrote:
> > So I have the following patch which I think might solve your issues
> > while keeping the reset logic working. Would you mind giving it a try
> > with your use case and pvgrub?
>
> Thanks for the patch. It seems to be having some problems with pvgrub:
> machines don't boot past pvgrub and they are using 100% CPU. Also in
> dom0 xenstored is usind 100% CPU - strace reports it is looping in this:
>
> read(26, "\\\240\0\0", 4) = 4
> read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\\232`\231\31\1&", 24) = 24
> read(26, "/local/domain/0/backend/vbd/1/51712/state", 41) = 41
> read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0004", 29) = 29
> read(26, "\0\330\0\0", 4) = 4
> read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\JG\231\31\1&", 24) = 24
> read(26, "/local/domain/0/backend/vbd/2/51712/state", 41) = 41
> read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0004", 29) = 29
> read(26, "\0\200\1\0", 4) = 4
> read(26, "\0\0\0\0L\0\0\0)\0\0\0\35\0\0\0+\\Z\373\231\31\1&", 24) = 24
> read(26, "/local/domain/0/backend/vbd/5/51712/state", 41) = 41
> read(26, "\2\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\1\0\0\0004", 29) = 29
> read(12, "\321\1\0\0", 4) = 4
> write(12, "\321\1\0\0", 4) = 4
I see the issue, blkfront in pvgrub is waiting for the backend to
switch to Closed state before booting the OS kernel, so my fix is not
going to work. I guess I will have to find a proper way to switch to
Closed state.
Thanks, Roger.