Hi, Eduardo!

I am working on a frontend driver (PV DRM) and am also seeing some strange
things on driver unloading:

xt# rmmod -f drm_xen_front.ko
[ 3236.462497] [drm] Unregistering XEN PV vdispl
[ 3236.485745] [drm:xen_drv_remove [drm_xen_front]] *ERROR* Backend state is InitWait while removing driver
[ 3236.486950] vdispl vdispl-0: 22 freeing event channel 11
[ 3236.496123] vdispl vdispl-0: failed to write error node for device/vdispl/0 (22 freeing event channel 11)
[ 3236.496271] vdispl vdispl-0: 22 freeing event channel 12
[ 3236.501633] vdispl vdispl-0: failed to write error node for device/vdispl/0 (22 freeing event channel 12)

These are somewhat different from your use case with grant references,
but I have a question: do you really see the XenbusStateClosed and
XenbusStateClosing cases being hit? In my driver I can't see them, and
when I dug deeper into the problem I saw that on driver removal it is
disconnected from XenBus, so no backend state change events come in via
the .otherend_changed callback.

The only difference I see here is that the backend is a user-space
application.

Thank you,
Oleksandr
On 11/23/2017 04:18 PM, Eduardo Otubo wrote:
> v2:
> * Replace busy wait with wait_event()/wake_up_all()
> * Cannot guarantee that at the time xennet_remove is called, the
> xen_netback state will not be XenbusStateClosed, so added a
> condition for that
> * There's a small chance that the xen_netback state is
> XenbusStateUnknown by the time the xen_netfront switches to Closed,
> so added a condition for that.
>
> When unloading module xen_netfront from guest, dmesg would output
> warning messages like below:
>
> [ 105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
> [ 105.236839] deferring g.e. 0x903 (pfn 0x35805)
>
> This problem stems from netfront and netback being out of sync. By the time
> netfront revokes the g.e.'s, netback has not had enough time to free all of
> them, hence the warnings in dmesg.
>
> The trick here is to make netfront wait until netback frees all the g.e.'s
> and only then continue the cleanup for the module removal; this is done by
> manipulating both device states.
>
> Signed-off-by: Eduardo Otubo <[email protected]>
> ---
> drivers/net/xen-netfront.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 8b8689c6d887..391432e2725d 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -87,6 +87,8 @@ struct netfront_cb {
> /* IRQ name is queue name with "-tx" or "-rx" appended */
> #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
>
> +static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
> +
> struct netfront_stats {
> u64 packets;
> u64 bytes;
> @@ -2021,10 +2023,12 @@ static void netback_changed(struct xenbus_device *dev,
> break;
>
> case XenbusStateClosed:
> + wake_up_all(&module_unload_q);
> if (dev->state == XenbusStateClosed)
> break;
> /* Missed the backend's CLOSING state -- fallthrough */
> case XenbusStateClosing:
> + wake_up_all(&module_unload_q);
> xenbus_frontend_closed(dev);
> break;
> }
> @@ -2130,6 +2134,20 @@ static int xennet_remove(struct xenbus_device *dev)
>
> dev_dbg(&dev->dev, "%s\n", dev->nodename);
>
> + if (xenbus_read_driver_state(dev->otherend) != XenbusStateClosed) {
> + xenbus_switch_state(dev, XenbusStateClosing);
> + wait_event(module_unload_q,
> + xenbus_read_driver_state(dev->otherend) ==
> + XenbusStateClosing);
> +
> + xenbus_switch_state(dev, XenbusStateClosed);
> + wait_event(module_unload_q,
> + xenbus_read_driver_state(dev->otherend) ==
> + XenbusStateClosed ||
> + xenbus_read_driver_state(dev->otherend) ==
> + XenbusStateUnknown);
> + }
> +
> xennet_disconnect_backend(info);
>
> unregister_netdev(info->netdev);
On Wed, Jan 31, 2018 at 05:00:23PM +0200, Oleksandr Andrushchenko wrote:
> Hi, Eduardo!
>
> I am working on a frontend driver (PV DRM) and also seeing some strange
> things on driver unloading:
>
> xt# rmmod -f drm_xen_front.ko
> [ 3236.462497] [drm] Unregistering XEN PV vdispl
> [ 3236.485745] [drm:xen_drv_remove [drm_xen_front]] *ERROR* Backend state is InitWait while removing driver
> [ 3236.486950] vdispl vdispl-0: 22 freeing event channel 11
> [ 3236.496123] vdispl vdispl-0: failed to write error node for device/vdispl/0 (22 freeing event channel 11)
> [ 3236.496271] vdispl vdispl-0: 22 freeing event channel 12
> [ 3236.501633] vdispl vdispl-0: failed to write error node for device/vdispl/0 (22 freeing event channel 12)
>
> These are somewhat different from your use-case with grant references, but I
> have a question:
> do you really see that XenbusStateClosed and XenbusStateClosing are
> called? In my driver I can't see those and once I tried to dig deeper into
> the problem I saw that on driver removal it is disconnected from XenBus, so
> no backend state change events come in via .otherend_changed callback.
>
> The only difference I see here is that the backend is a user-space
> application
>
> Thank you,
> Oleksandr

To be honest, most of this rests on assumptions I made after some
discussions on IRC with the maintainers. I wrote the code on that basis, and
all the behavior I observed afterwards was consistent with those assumptions
(and discussions).

But if you find something else weird, please let me know and we can fix it.
--
Eduardo Otubo
On 02/02/2018 10:54 AM, Eduardo Otubo wrote:
> To be honest, most of the things I assumed were true, according to some talks on
> IRC with maintainers. Since I assumed it was true I started to write code based
> on that and all the behaviors that followed were correct according to my
> assumptions (and discussions).
>
> But if you find something else weird, please let me know and we can fix it.

There is nothing wrong with the patch. The one thing I cannot figure out
in my driver is that the .otherend_changed callback is not called on
.remove. Please see [1].

[1] https://patchwork.kernel.org/patch/10195163/
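
For reference, the two wait conditions from the quoted xennet_remove() hunk
can be modelled in plain user-space C to see when the removal path would
make progress. This is only a sketch under assumptions: phase1_done(),
phase2_done(), model_backend_react() and simulate_remove() are hypothetical
names, the "cooperative backend" that simply mirrors the frontend's closing
states is a simplification, and nothing here touches the real xenbus API.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric values follow enum xenbus_state in the Xen public headers. */
enum xenbus_state {
	XenbusStateUnknown      = 0,
	XenbusStateInitialising = 1,
	XenbusStateInitWait     = 2,
	XenbusStateInitialised  = 3,
	XenbusStateConnected    = 4,
	XenbusStateClosing      = 5,
	XenbusStateClosed       = 6,
};

/* First wait_event() condition in the patch: backend has reached Closing. */
static bool phase1_done(enum xenbus_state backend)
{
	return backend == XenbusStateClosing;
}

/* Second wait_event() condition: backend is Closed or its state is Unknown. */
static bool phase2_done(enum xenbus_state backend)
{
	return backend == XenbusStateClosed || backend == XenbusStateUnknown;
}

/* Hypothetical cooperative backend: it mirrors the frontend's closing states. */
static enum xenbus_state model_backend_react(enum xenbus_state frontend)
{
	assert(frontend == XenbusStateClosing || frontend == XenbusStateClosed);
	return frontend;
}

/*
 * Walk the sequence from the xennet_remove() hunk. Returns true if both
 * waits would be satisfied, false if a wait_event() would be left blocked
 * because the backend state never reaches the expected value.
 */
static bool simulate_remove(enum xenbus_state backend, bool backend_reacts)
{
	if (backend == XenbusStateClosed)
		return true;	/* the patch skips the handshake in this case */

	/* Frontend switches to Closing, then waits for the backend. */
	if (backend_reacts)
		backend = model_backend_react(XenbusStateClosing);
	if (!phase1_done(backend))
		return false;

	/* Frontend switches to Closed, then waits again. */
	if (backend_reacts)
		backend = model_backend_react(XenbusStateClosed);
	return phase2_done(backend);
}
```

In this model, simulate_remove(XenbusStateConnected, true) completes, while
simulate_remove(XenbusStateInitWait, false) does not: a backend that stays
in InitWait and never reacts leaves the first wait unsatisfied. Whether a
real backend behaves this way is exactly the open question in this thread.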