The linkwatch_event work queue runs up to one second later.
When the MicroVM starts, it takes 300+ms for the ethX flag
to change from '+UP +LOWER_UP' to '+RUNNING', as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP +LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING
Let's manually trigger it here to make the network service start faster.
After applying this patch, the time consumption of
systemd-networkd.service was reduced from 366ms to 50ms.
Signed-off-by: Wen Yang <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jiri Pirko <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Julian Wiedmann <[email protected]>
Cc: [email protected]
Cc: [email protected]
---
net/core/link_watch.c | 3 +++
net/core/rtnetlink.c | 1 +
2 files changed, 4 insertions(+)
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 75431ca..6b9d44b 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
return true;
+ if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+ return true;
+
return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 58c484a..fd0b3b6 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2604,6 +2604,7 @@ static int do_setlink(const struct sk_buff *skb,
extack);
if (err < 0)
goto errout;
+ linkwatch_fire_event(dev);
}
if (tb[IFLA_MASTER]) {
--
1.8.3.1
From: Wen Yang <[email protected]>
Date: Sat, 1 Aug 2020 16:58:45 +0800
> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
> index 75431ca..6b9d44b 100644
> --- a/net/core/link_watch.c
> +++ b/net/core/link_watch.c
> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
> if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
> return true;
>
> + if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
> + return true;
> +
> return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
> }
>
You're bypassing explicitly the logic here:
/*
* Limit the number of linkwatch events to one
* per second so that a runaway driver does not
* cause a storm of messages on the netlink
* socket. This limit does not apply to up events
* while the device qdisc is down.
*/
if (!urgent_only)
linkwatch_nextevent = jiffies + HZ;
/* Limit wrap-around effect on delay. */
else if (time_after(linkwatch_nextevent, jiffies + HZ))
linkwatch_nextevent = jiffies;
Something about this isn't right. We need to analyze what you are seeing,
what device you are using, and what systemd is doing to figure out what
the right place for the fix.
Thank you.
?? 2020/8/5 ????6:58, David Miller д??:
> From: Wen Yang <[email protected]>
> Date: Sat, 1 Aug 2020 16:58:45 +0800
>
>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>> index 75431ca..6b9d44b 100644
>> --- a/net/core/link_watch.c
>> +++ b/net/core/link_watch.c
>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
>> if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>> return true;
>>
>> + if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>> + return true;
>> +
>> return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
>> }
>>
>
> You're bypassing explicitly the logic here:
>
> /*
> * Limit the number of linkwatch events to one
> * per second so that a runaway driver does not
> * cause a storm of messages on the netlink
> * socket. This limit does not apply to up events
> * while the device qdisc is down.
> */
> if (!urgent_only)
> linkwatch_nextevent = jiffies + HZ;
> /* Limit wrap-around effect on delay. */
> else if (time_after(linkwatch_nextevent, jiffies + HZ))
> linkwatch_nextevent = jiffies;
>
> Something about this isn't right. We need to analyze what you are seeing,
> what device you are using, and what systemd is doing to figure out what
> the right place for the fix.
>
> Thank you.
>
Thank you very much for your comments.
We are using virtio_net and the environment is a microvm similar to
firecracker.
Let's briefly explain.
net_device->operstate is assigned through linkwatch_event, and the call
stack is as follows:
process_one_work
-> linkwatch_event
-> __linkwatch_run_queue
-> linkwatch_do_dev
-> rfc2863_policy
-> default_operstate
During the machine startup process, net_device->operstate has the
following two-step state changes:
STEP A: virtnet_probe detects the network card and triggers the
execution of linkwatch_fire_event.
Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
And since net_device->state is 6 (__LINK_STATE_PRESENT |
__LINK_STATE_NOCARRIER), net_device->operstate will be changed from
IF_OPER_UNKNOWN to IF_OPER_DOWN:
eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)
virtnet_probe then executes netif_carrier_on to update
net_device->state, it will be changed from ??__LINK_STATE_PRESENT |
__LINK_STATE_NOCARRIER?? to __LINK_STATE_PRESENT:
eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2
(__LINK_STATE_PRESENT)
STEP B: One second later (because linkwatch_nextevent = jiffies + HZ),
linkwatch_work is executed again.
At this time, since net_device->state is __LINK_STATE_PRESENT, so the
net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)
The above state change can be completed within 2 seconds.
Generally, the machine will load the initramfs first, and do some
initialization in the initramfs, which takes some time; then switch_root
to the system disk and continue the initialization, which will also take
some time, and finally start the systemd-networkd service, bringing
link, etc.,
In this way, the linkwatch_work work queue has enough time to run twice,
and the state of net_device->operstate is already IF_OPER_UP,
So bringing link up quickly returns the following information:
Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
eth0: bringing link up
...
Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
eth0: flags change: +UP +LOWER_UP +RUNNING
But we are now using MicroVM, which requires extreme speed to start,
bypassing the initramfs and directly booting the trimmed system on the disk.
systemd-networkd starts in less than 1 second after booting. the STEP B
has not been run yet, so it will wait for several hundred milliseconds
here, as follows:
Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
...
Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP
+LOWER_UP
...
Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change: +RUNNING
Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
https://www.kernel.org/doc/Documentation/networking/operstates.txt
A routing daemon or dhcp client just needs to care for IFF_RUNNING or
waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
considering the interface / querying a DHCP address.
Finally, the STEP B above only updates the value of operstate based on
the known state (operstate/state) on the net_device, without any
hardware interaction involved, so it is not very reasonable to wait for
1 second there.
By adding:
+ if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
+ return true;
+
We hope to improve the linkwatch_urgent_event function a bit.
Hope to get more of your advice and guidance.
Best wishes,
Wen
on 2020/8/6 PM5:09, Wen Yang wrote:
>
>
> 在 2020/8/5 上午6:58, David Miller 写道:
>> From: Wen Yang <[email protected]>
>> Date: Sat, 1 Aug 2020 16:58:45 +0800
>>
>>> diff --git a/net/core/link_watch.c b/net/core/link_watch.c
>>> index 75431ca..6b9d44b 100644
>>> --- a/net/core/link_watch.c
>>> +++ b/net/core/link_watch.c
>>> @@ -98,6 +98,9 @@ static bool linkwatch_urgent_event(struct
>>> net_device *dev)
>>> if (netif_is_lag_port(dev) || netif_is_lag_master(dev))
>>> return true;
>>> + if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
>>> + return true;
>>> +
>>> return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
>>> }
>>
>> You're bypassing explicitly the logic here:
>>
>> /*
>> * Limit the number of linkwatch events to one
>> * per second so that a runaway driver does not
>> * cause a storm of messages on the netlink
>> * socket. This limit does not apply to up events
>> * while the device qdisc is down.
>> */
>> if (!urgent_only)
>> linkwatch_nextevent = jiffies + HZ;
>> /* Limit wrap-around effect on delay. */
>> else if (time_after(linkwatch_nextevent, jiffies + HZ))
>> linkwatch_nextevent = jiffies;
>>
>> Something about this isn't right. We need to analyze what you are
>> seeing,
>> what device you are using, and what systemd is doing to figure out what
>> the right place for the fix.
>>
>> Thank you.
>>
>
> Thank you very much for your comments.
> We are using virtio_net and the environment is a microvm similar to
> firecracker.
>
> Let's briefly explain.
> net_device->operstate is assigned through linkwatch_event, and the
> call stack is as follows:
> process_one_work
> -> linkwatch_event
> -> __linkwatch_run_queue
> -> linkwatch_do_dev
> -> rfc2863_policy
> -> default_operstate
>
> During the machine startup process, net_device->operstate has the
> following two-step state changes:
>
> STEP A: virtnet_probe detects the network card and triggers the
> execution of linkwatch_fire_event.
> Since linkwatch_nextevent is initialized to 0, linkwatch_work will run.
> And since net_device->state is 6 (__LINK_STATE_PRESENT |
> __LINK_STATE_NOCARRIER), net_device->operstate will be changed from
> IF_OPER_UNKNOWN to IF_OPER_DOWN:
> eth0 operstate:0 (IF_OPER_UNKNOWN) -> operstate:2 (IF_OPER_DOWN)
>
> virtnet_probe then executes netif_carrier_on to update
> net_device->state, it will be changed from ‘__LINK_STATE_PRESENT |
> __LINK_STATE_NOCARRIER’ to __LINK_STATE_PRESENT:
> eth0 state: 6 (__LINK_STATE_PRESENT | __LINK_STATE_NOCARRIER) -> 2
> (__LINK_STATE_PRESENT)
>
> STEP B: One second later (because linkwatch_nextevent = jiffies + HZ),
> linkwatch_work is executed again.
> At this time, since net_device->state is __LINK_STATE_PRESENT, so the
> net_device->operstate will be changed from IF_OPER_DOWN to IF_OPER_UP:
> eth0 operstate:2 (IF_OPER_DOWN) -> operstate:6 (IF_OPER_UP)
>
>
> The above state change can be completed within 2 seconds.
> Generally, the machine will load the initramfs first, and do some
> initialization in the initramfs, which takes some time; then
> switch_root to the system disk and continue the initialization, which
> will also take some time, and finally start the systemd-networkd
> service, bringing link, etc.,
> In this way, the linkwatch_work work queue has enough time to run
> twice, and the state of net_device->operstate is already IF_OPER_UP,
> So bringing link up quickly returns the following information:
> Aug 06 16:35:55.966121 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
> eth0: bringing link up
> ...
> Aug 06 16:35:55.990461 iZuf6h1kfgutxc3el68z2lZ systemd-networkd[580]:
> eth0: flags change: +UP +LOWER_UP +RUNNING
>
> But we are now using MicroVM, which requires extreme speed to start,
> bypassing the initramfs and directly booting the trimmed system on the
> disk.
> systemd-networkd starts in less than 1 second after booting. the STEP
> B has not been run yet, so it will wait for several hundred
> milliseconds here, as follows:
> Jul 20 22:00:47.432552 systemd-networkd[210]: eth0: bringing link up
> ...
> Jul 20 22:00:47.446108 systemd-networkd[210]: eth0: flags change: +UP
> +LOWER_UP
> ...
> Jul 20 22:00:47.781463 systemd-networkd[210]: eth0: flags change:
> +RUNNING
>
>
> Note: dhcp pays attention to IFF_RUNNING status, we may refer to:
> https://www.kernel.org/doc/Documentation/networking/operstates.txt
>
> A routing daemon or dhcp client just needs to care for IFF_RUNNING or
> waiting for operstate to go IF_OPER_UP/IF_OPER_UNKNOWN before
> considering the interface / querying a DHCP address.
>
> Finally, the STEP B above only updates the value of operstate based on
> the known state (operstate/state) on the net_device, without any
> hardware interaction involved, so it is not very reasonable to wait
> for 1 second there.
>
> By adding:
> + if ((dev->flags & IFF_UP) && dev->operstate == IF_OPER_DOWN)
> + return true;
> +
> We hope to improve the linkwatch_urgent_event function a bit.
>
> Hope to get more of your advice and guidance.
>
> Best wishes,
> Wen
hi, this issue is worth continuing discussion.
In the microVM scenario, it is valuable to increase the startup speed by
a few hundred milliseconds.
Best wishes,
Wen