When loading the module manually, after xenbus_switch_state is called to
initialise the state of the netfront device, the driver state may not change
fast enough, which can leave no netdev created on recent kernels. This patch
adds a wait to make sure the backend driver is no longer in the closed/unknown state.
Current state:
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
[vm]# modprobe -r xen_netfront
[vm]# modprobe xen_netfront
[vm]# ethtool eth0
Settings for eth0:
Cannot get device settings: No such device
Cannot get wake-on-lan settings: No such device
Cannot get message level: No such device
Cannot get link status: No such device
No data available
With the patch installed:
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
[vm]# modprobe -r xen_netfront
[vm]# modprobe xen_netfront
[vm]# ethtool eth0
Settings for eth0:
Link detected: yes
Signed-off-by: Xiao Liang <[email protected]>
---
drivers/net/xen-netfront.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index a57daecf1d57..2d8812dd1534 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -87,6 +87,7 @@ struct netfront_cb {
/* IRQ name is queue name with "-tx" or "-rx" appended */
#define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
+static DECLARE_WAIT_QUEUE_HEAD(module_load_q);
static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
struct netfront_stats {
@@ -1330,6 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
netif_carrier_off(netdev);
xenbus_switch_state(dev, XenbusStateInitialising);
+ wait_event(module_load_q,
+ xenbus_read_driver_state(dev->otherend) !=
+ XenbusStateClosed &&
+ xenbus_read_driver_state(dev->otherend) !=
+ XenbusStateUnknown);
return netdev;
exit:
--
2.17.1
On 07/27/2018 05:56 AM, Xiao Liang wrote:
> When loading module manually, after call xenbus_switch_state to initializes
> the state of the netfront device, the driver state did not change so fast
> that may lead no dev created in latest kernel. This patch adds wait to make
> sure xenbus knows the driver is not in closed/unknown state.
>
> Current state:
> [vm]# ethtool eth0
> Settings for eth0:
> Link detected: yes
> [vm]# modprobe -r xen_netfront
> [vm]# modprobe xen_netfront
> [vm]# ethtool eth0
> Settings for eth0:
> Cannot get device settings: No such device
> Cannot get wake-on-lan settings: No such device
> Cannot get message level: No such device
> Cannot get link status: No such device
> No data available
>
> With the patch installed.
> [vm]# ethtool eth0
> Settings for eth0:
> Link detected: yes
> [vm]# modprobe -r xen_netfront
> [vm]# modprobe xen_netfront
> [vm]# ethtool eth0
> Settings for eth0:
> Link detected: yes
>
> Signed-off-by: Xiao Liang <[email protected]>
> ---
> drivers/net/xen-netfront.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index a57daecf1d57..2d8812dd1534 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -87,6 +87,7 @@ struct netfront_cb {
> /* IRQ name is queue name with "-tx" or "-rx" appended */
> #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
>
> +static DECLARE_WAIT_QUEUE_HEAD(module_load_q);
> static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
>
> struct netfront_stats {
> @@ -1330,6 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
> netif_carrier_off(netdev);
>
> xenbus_switch_state(dev, XenbusStateInitialising);
> + wait_event(module_load_q,
> + xenbus_read_driver_state(dev->otherend) !=
> + XenbusStateClosed &&
> + xenbus_read_driver_state(dev->otherend) !=
> + XenbusStateUnknown);
> return netdev;
>
> exit:
Should we have a wake_up somewhere? And what about other states --- is,
for example, XenbusStateClosing a valid reason to continue?
-boris
From: Xiao Liang <[email protected]>
Date: Fri, 27 Jul 2018 17:56:08 +0800
> @@ -1330,6 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
> netif_carrier_off(netdev);
>
> xenbus_switch_state(dev, XenbusStateInitialising);
> + wait_event(module_load_q,
> + xenbus_read_driver_state(dev->otherend) !=
> + XenbusStateClosed &&
> + xenbus_read_driver_state(dev->otherend) !=
> + XenbusStateUnknown);
> return netdev;
>
> exit:
What performs the wakeups that will trigger for this sleep site?
Thank you.
Thanks, Boris
Please see my reply inline.
On 07/28/2018 02:40 AM, Boris Ostrovsky wrote:
> On 07/27/2018 05:56 AM, Xiao Liang wrote:
>> When loading module manually, after call xenbus_switch_state to initializes
>> the state of the netfront device, the driver state did not change so fast
>> that may lead no dev created in latest kernel. This patch adds wait to make
>> sure xenbus knows the driver is not in closed/unknown state.
>>
>> Current state:
>> [vm]# ethtool eth0
>> Settings for eth0:
>> Link detected: yes
>> [vm]# modprobe -r xen_netfront
>> [vm]# modprobe xen_netfront
>> [vm]# ethtool eth0
>> Settings for eth0:
>> Cannot get device settings: No such device
>> Cannot get wake-on-lan settings: No such device
>> Cannot get message level: No such device
>> Cannot get link status: No such device
>> No data available
>>
>> With the patch installed.
>> [vm]# ethtool eth0
>> Settings for eth0:
>> Link detected: yes
>> [vm]# modprobe -r xen_netfront
>> [vm]# modprobe xen_netfront
>> [vm]# ethtool eth0
>> Settings for eth0:
>> Link detected: yes
>>
>> Signed-off-by: Xiao Liang <[email protected]>
>> ---
>> drivers/net/xen-netfront.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index a57daecf1d57..2d8812dd1534 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -87,6 +87,7 @@ struct netfront_cb {
>> /* IRQ name is queue name with "-tx" or "-rx" appended */
>> #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
>>
>> +static DECLARE_WAIT_QUEUE_HEAD(module_load_q);
>> static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
>>
>> struct netfront_stats {
>> @@ -1330,6 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>> netif_carrier_off(netdev);
>>
>> xenbus_switch_state(dev, XenbusStateInitialising);
>> + wait_event(module_load_q,
>> + xenbus_read_driver_state(dev->otherend) !=
>> + XenbusStateClosed &&
>> + xenbus_read_driver_state(dev->otherend) !=
>> + XenbusStateUnknown);
>> return netdev;
>>
>> exit:
>
> Should we have a wake_up somewhere?
In my understanding, netback_changed handles it: if the device state is
XenbusStateInitialising and the other end is in XenbusStateInitWait, it
creates the connection to the backend.
But in most cases it breaks out because dev->state is not
XenbusStateInitialising. So I added a wait here.
> And what about other states --- is,
> for example, XenbusStateClosing a valid reason to continue?
I think XenbusStateClosing should not be a valid reason to continue.
My purpose is to wait until the other end's state is XenbusStateInitWait
(after the new dev is created). To avoid unnecessary impact, in this patch
I only check that it has left the closed and unknown states.
In my testing (hot-plugging vifs into the guest from the host, or
loading/unloading the module in the guest, over 100 times), both waiting for
XenbusStateInitWait and what this patch does work:
the vifs are created successfully each time.
Thanks,
Xiao Liang
>
>
> -boris
>
>
> _______________________________________________
> Xen-devel mailing list
> [email protected]
> https://lists.xenproject.org/mailman/listinfo/xen-devel
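For context, the handling Xiao describes above lives in netback_changed() in
drivers/net/xen-netfront.c. A simplified, paraphrased sketch (not the verbatim
upstream function, which handles more states) of the paths relevant to this
discussion:

```c
/* Paraphrased sketch of netback_changed(); simplified, not verbatim. */
static void netback_changed(struct xenbus_device *dev,
			    enum xenbus_state backend_state)
{
	struct netfront_info *np = dev_get_drvdata(&dev->dev);
	struct net_device *netdev = np->netdev;

	switch (backend_state) {
	case XenbusStateInitWait:
		/* The backend has left Closed/Unknown at this point, so
		 * the wait_event() condition in xennet_create_dev() is
		 * now true -- but note that no explicit
		 * wake_up_all(&module_load_q) is issued here, which is
		 * exactly what Boris and David are asking about. */
		if (dev->state != XenbusStateInitialising)
			break;
		if (xennet_connect(netdev) != 0)
			break;
		xenbus_switch_state(dev, XenbusStateConnected);
		break;

	case XenbusStateClosed:
	case XenbusStateClosing:
		/* Only the unload queue is woken on these transitions. */
		wake_up_all(&module_unload_q);
		break;

	default:
		break;
	}
}
```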
Thanks, David
On 07/29/2018 11:30 PM, David Miller wrote:
> From: Xiao Liang <[email protected]>
> Date: Fri, 27 Jul 2018 17:56:08 +0800
>
>> @@ -1330,6 +1331,11 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>> netif_carrier_off(netdev);
>>
>> xenbus_switch_state(dev, XenbusStateInitialising);
>> + wait_event(module_load_q,
>> + xenbus_read_driver_state(dev->otherend) !=
>> + XenbusStateClosed &&
>> + xenbus_read_driver_state(dev->otherend) !=
>> + XenbusStateUnknown);
>> return netdev;
>>
>> exit:
> What performs the wakeups that will trigger for this sleep site?
In my understanding, the backend leaving the closed/unknown state triggers the
wakeups. The intent is to make sure both sides are ready to create the connection.
Thanks,
Liang
>
> Thank you.
>
From: Xiao Liang <[email protected]>
Date: Fri, 27 Jul 2018 17:56:08 +0800
> When loading module manually, after call xenbus_switch_state to initializes
> the state of the netfront device, the driver state did not change so fast
> that may lead no dev created in latest kernel. This patch adds wait to make
> sure xenbus knows the driver is not in closed/unknown state.
...
> Signed-off-by: Xiao Liang <[email protected]>
Applied and queued up for -stable, thanks.
On 07/30/2018, 10:18 AM, Xiao Liang wrote:
> On 07/29/2018 11:30 PM, David Miller wrote:
>> From: Xiao Liang <[email protected]>
>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>
>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>> *xennet_create_dev(struct xenbus_device *dev)
>>> netif_carrier_off(netdev);
>>> xenbus_switch_state(dev, XenbusStateInitialising);
>>> + wait_event(module_load_q,
>>> + xenbus_read_driver_state(dev->otherend) !=
>>> + XenbusStateClosed &&
>>> + xenbus_read_driver_state(dev->otherend) !=
>>> + XenbusStateUnknown);
>>> return netdev;
>>> exit:
>> What performs the wakeups that will trigger for this sleep site?
> In my understanding, backend leaving closed/unknow state can trigger the
> wakeups. I mean to make sure both sides are ready for creating connection.
While backporting this to 4.12, I was puzzled by this commit in the same way
as Boris and David.
So I assume the explanation is that the wake_up_all of module_unload_q in
netback_changed also wakes all the processes waiting on module_load_q?
If so, what makes sure that module_unload_q is queued and that the process is
the same as for module_load_q?
To me, it looks rather error-prone; if it is not erroneous now, it is at least
fragile for future changes. Wouldn't it make sense to wake up module_load_q
along with module_unload_q in netback_changed? Or drop module_load_q
completely and use only module_unload_q (i.e. in xennet_create_dev too)?
thanks,
--
js
suse labs
On 24/08/18 13:12, Jiri Slaby wrote:
> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>> On 07/29/2018 11:30 PM, David Miller wrote:
>>> From: Xiao Liang <[email protected]>
>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>
>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>> netif_carrier_off(netdev);
>>>> xenbus_switch_state(dev, XenbusStateInitialising);
>>>> + wait_event(module_load_q,
>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>> + XenbusStateClosed &&
>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>> + XenbusStateUnknown);
>>>> return netdev;
>>>> exit:
>>> What performs the wakeups that will trigger for this sleep site?
>> In my understanding, backend leaving closed/unknow state can trigger the
>> wakeups. I mean to make sure both sides are ready for creating connection.
>
> While backporting this to 4.12, I was surprised by the commit the same
> as Boris and David.
>
> So I assume the explanation is that wake_up_all of module_unload_q in
> netback_changed wakes also all the processes waiting on module_load_q?
> If so, what makes sure that module_unload_q is queued and the process is
> the same as for module_load_q?
How could it? Either the thread is waiting on module_unload_q _or_ on
module_load_q. It can't wait on two queues at the same time.
> To me, it looks rather error-prone. Unless it is erroneous now, at least
> for future changes. Wouldn't it make sense to wake up module_load_q
> along with module_unload_q in netback_changed? Or drop module_load_q
> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
To me this looks just wrong. A thread waiting on module_load_q won't be
woken up again.
I'd drop module_load_q in favor of module_unload_q.
Juergen
On 08/24/2018 07:26 AM, Juergen Gross wrote:
> On 24/08/18 13:12, Jiri Slaby wrote:
>> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>>> On 07/29/2018 11:30 PM, David Miller wrote:
>>>> From: Xiao Liang <[email protected]>
>>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>>
>>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>>> netif_carrier_off(netdev);
>>>>> xenbus_switch_state(dev, XenbusStateInitialising);
>>>>> + wait_event(module_load_q,
>>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>>> + XenbusStateClosed &&
>>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>>> + XenbusStateUnknown);
>>>>> return netdev;
>>>>> exit:
>>>> What performs the wakeups that will trigger for this sleep site?
>>> In my understanding, backend leaving closed/unknow state can trigger the
>>> wakeups. I mean to make sure both sides are ready for creating connection.
>> While backporting this to 4.12, I was surprised by the commit the same
>> as Boris and David.
>>
>> So I assume the explanation is that wake_up_all of module_unload_q in
>> netback_changed wakes also all the processes waiting on module_load_q?
>> If so, what makes sure that module_unload_q is queued and the process is
>> the same as for module_load_q?
> How could it? Either the thread is waiting on module_unload_q _or_ on
> module_load_q. It can't wait on two queues at the same time.
>
>> To me, it looks rather error-prone. Unless it is erroneous now, at least
>> for future changes. Wouldn't it make sense to wake up module_load_q
>> along with module_unload_q in netback_changed? Or drop module_load_q
>> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
> To me this looks just wrong. A thread waiting on module_load_q won't be
> woken up again.
>
> I'd drop module_load_q in favor of module_unload_q.
Yes, use a single queue, but rename it to something more neutral. module_wq?
-boris
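A hedged sketch of what that consolidation could look like (a hypothetical
diff on top of the applied patch, untested, with abbreviated hunk context):

```diff
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@
-static DECLARE_WAIT_QUEUE_HEAD(module_load_q);
-static DECLARE_WAIT_QUEUE_HEAD(module_unload_q);
+static DECLARE_WAIT_QUEUE_HEAD(module_wq);
@@
 	xenbus_switch_state(dev, XenbusStateInitialising);
-	wait_event(module_load_q,
+	wait_event(module_wq,
 		   xenbus_read_driver_state(dev->otherend) !=
 		   XenbusStateClosed &&
 		   xenbus_read_driver_state(dev->otherend) !=
 		   XenbusStateUnknown);
```

Every existing wake_up_all(&module_unload_q) (in netback_changed() and on the
unload path) would then become wake_up_all(&module_wq), and netback_changed()
would additionally need to wake module_wq on the transitions that take the
backend out of the Closed/Unknown states, so the wait above can make progress.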
On 08/24/2018, 04:26 PM, Boris Ostrovsky wrote:
> On 08/24/2018 07:26 AM, Juergen Gross wrote:
>> On 24/08/18 13:12, Jiri Slaby wrote:
>>> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>>>> On 07/29/2018 11:30 PM, David Miller wrote:
>>>>> From: Xiao Liang <[email protected]>
>>>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>>>
>>>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>>>> netif_carrier_off(netdev);
>>>>>> xenbus_switch_state(dev, XenbusStateInitialising);
>>>>>> + wait_event(module_load_q,
>>>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>>>> + XenbusStateClosed &&
>>>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>>>> + XenbusStateUnknown);
>>>>>> return netdev;
>>>>>> exit:
>>>>> What performs the wakeups that will trigger for this sleep site?
>>>> In my understanding, backend leaving closed/unknow state can trigger the
>>>> wakeups. I mean to make sure both sides are ready for creating connection.
>>> While backporting this to 4.12, I was surprised by the commit the same
>>> as Boris and David.
>>>
>>> So I assume the explanation is that wake_up_all of module_unload_q in
>>> netback_changed wakes also all the processes waiting on module_load_q?
>>> If so, what makes sure that module_unload_q is queued and the process is
>>> the same as for module_load_q?
>> How could it? Either the thread is waiting on module_unload_q _or_ on
>> module_load_q. It can't wait on two queues at the same time.
>>
>>> To me, it looks rather error-prone. Unless it is erroneous now, at least
>>> for future changes. Wouldn't it make sense to wake up module_load_q
>>> along with module_unload_q in netback_changed? Or drop module_load_q
>>> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
>> To me this looks just wrong. A thread waiting on module_load_q won't be
>> woken up again.
>>
>> I'd drop module_load_q in favor of module_unload_q.
>
>
> Yes, use single queue, but rename it to something more neutral. module_wq?
Can somebody who is actually using the module fix this, please?
I could fix it, but untested changes are "a bit" worse than tested changes.
thanks,
--
js
suse labs
On 07/09/18 13:06, Jiri Slaby wrote:
> On 08/24/2018, 04:26 PM, Boris Ostrovsky wrote:
>> On 08/24/2018 07:26 AM, Juergen Gross wrote:
>>> On 24/08/18 13:12, Jiri Slaby wrote:
>>>> On 07/30/2018, 10:18 AM, Xiao Liang wrote:
>>>>> On 07/29/2018 11:30 PM, David Miller wrote:
>>>>>> From: Xiao Liang <[email protected]>
>>>>>> Date: Fri, 27 Jul 2018 17:56:08 +0800
>>>>>>
>>>>>>> @@ -1330,6 +1331,11 @@ static struct net_device
>>>>>>> *xennet_create_dev(struct xenbus_device *dev)
>>>>>>> netif_carrier_off(netdev);
>>>>>>> xenbus_switch_state(dev, XenbusStateInitialising);
>>>>>>> + wait_event(module_load_q,
>>>>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>>>>> + XenbusStateClosed &&
>>>>>>> + xenbus_read_driver_state(dev->otherend) !=
>>>>>>> + XenbusStateUnknown);
>>>>>>> return netdev;
>>>>>>> exit:
>>>>>> What performs the wakeups that will trigger for this sleep site?
>>>>> In my understanding, backend leaving closed/unknow state can trigger the
>>>>> wakeups. I mean to make sure both sides are ready for creating connection.
>>>> While backporting this to 4.12, I was surprised by the commit the same
>>>> as Boris and David.
>>>>
>>>> So I assume the explanation is that wake_up_all of module_unload_q in
>>>> netback_changed wakes also all the processes waiting on module_load_q?
>>>> If so, what makes sure that module_unload_q is queued and the process is
>>>> the same as for module_load_q?
>>> How could it? Either the thread is waiting on module_unload_q _or_ on
>>> module_load_q. It can't wait on two queues at the same time.
>>>
>>>> To me, it looks rather error-prone. Unless it is erroneous now, at least
>>>> for future changes. Wouldn't it make sense to wake up module_load_q
>>>> along with module_unload_q in netback_changed? Or drop module_load_q
>>>> completely and use only module_unload_q (i.e. in xennet_create_dev too)?
>>> To me this looks just wrong. A thread waiting on module_load_q won't be
>>> woken up again.
>>>
>>> I'd drop module_load_q in favor of module_unload_q.
>>
>>
>> Yes, use single queue, but rename it to something more neutral. module_wq?
>
> Can somebody who is actually using the module fix this, please?
Already at it.
Juergen