Date: Wed, 5 Aug 2015 12:23:59 -0300
From: Thadeu Lima de Souza Cascardo
To: Shaun Crampton
Cc: Cong Wang, "linux-kernel@vger.kernel.org", Linux Kernel Network Developers
Subject: Re: veths often slow to come up
Message-ID: <20150805152358.GE10686@indiana.gru.redhat.com>

On Tue, Aug 04, 2015 at 08:26:28PM -0700, Cong Wang wrote:
> (Cc'ing netdev for network issues)
>
> On Tue, Aug 4, 2015 at 6:42 AM, Shaun Crampton wrote:
> > Please CC me on any responses, thanks.
> >
> > Setting both ends of a veth to be oper UP completes very quickly but I
> > find that pings only start flowing over the veth after about a second.
> > This seems to correlate with the NO-CARRIER flag being set or the
> > interface being in "state UNKNOWN" or "state DOWN" for about a second
> > (demo script below).
> >
> > If I run the script repeatedly then sometimes it completes very quickly on
> > subsequent runs as if there's a hot cache somewhere.
> >
> > Could this be a bug or is there a configuration to speed this up? Seems
> > odd that it's almost exactly 1s on the first run.
> >
> > Seen on these kernels:
> > * 3.13.0-57-generic #95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015 x86_64
> >   x86_64 x86_64 GNU/Linux
> > * 4.0.9-coreos #2 SMP Thu Jul 30 01:07:55 UTC 2015 x86_64 Intel(R) Xeon(R)
> >   CPU @ 2.50GHz GenuineIntel GNU/Linux
> >
> > Regards,
> >
> > -Shaun
> >

Take a look at linkwatch_urgent_event() in net/core/link_watch.c, and at
link_watch.c in general. That's where the 1s delay comes from. It's
designed to prevent link message storms: non-urgent link state changes are
processed at most once per second.

In particular, look at commit 294cc44b7e48a6e7732499eebcf409b231460d8e,
which added the urgent event. I suspect linkwatch was designed to work
around buggy drivers/hardware, not to help userspace handle thousands of
virtual devices being created and destroyed all the time.

Maybe virtual devices should be whitelisted here? The patch below may be a
bad idea, though: drivers could abuse the flag, and drivers are buggy,
otherwise linkwatch would not be needed in the first place.

Regards.
Cascardo.

> >
> > Running my test script below (assumes veth0/1 do not already exist):
> >
> > $ sudo ./veth-test.sh
> > Time to create veth:
> >
> > real    0m0.019s
> > user    0m0.002s
> > sys     0m0.010s
> >
> > Time to wait for carrier:
> >
> > real    0m1.005s
> > user    0m0.007s
> > sys     0m0.123s
> >
> >
> > # veth-test.sh
> >
> > #!/bin/bash
> > function create_veth {
> >   ip link add type veth
> >   ip link set veth0 up
> >   ip link set veth1 up
> > }
> > function wait_for_carrier {
> >   while ! ip link show | grep -qE 'veth[01]';
> >   do
> >     sleep 0.05
> >   done
> >   while ip link show | grep -E 'veth[01]' | \
> >       grep -Eq 'NO-CARRIER|state DOWN|state UNKNOWN';
> >   do
> >     sleep 0.05
> >   done
> > }
> > echo "Time to create veth:"
> > time create_veth
> > echo
> > echo "Time to wait for carrier:"
> > time wait_for_carrier
> > ip link del veth0

---
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 343592c..91123a8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -306,6 +306,7 @@ static void veth_setup(struct net_device *dev)
 
 	dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 	dev->priv_flags |= IFF_LIVE_ADDR_CHANGE;
+	dev->priv_flags |= IFF_LINKWATCH_URGENT;
 
 	dev->netdev_ops = &veth_netdev_ops;
 	dev->ethtool_ops = &veth_ethtool_ops;
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 607b5f4..138f5e9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1262,6 +1262,7 @@ struct net_device_ops {
  * @IFF_LIVE_ADDR_CHANGE: device supports hardware address
  *	change when it's running
  * @IFF_MACVLAN: Macvlan device
+ * @IFF_LINKWATCH_URGENT: device does not flood with link updates
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
@@ -1289,6 +1290,7 @@ enum netdev_priv_flags {
 	IFF_XMIT_DST_RELEASE_PERM	= 1<<22,
 	IFF_IPVLAN_MASTER		= 1<<23,
 	IFF_IPVLAN_SLAVE		= 1<<24,
+	IFF_LINKWATCH_URGENT		= 1<<25,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index 9828616..e2957a0 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -95,6 +95,9 @@ static bool linkwatch_urgent_event(struct net_device *dev)
 	if (dev->priv_flags & IFF_TEAM_PORT)
 		return true;
 
+	if (dev->priv_flags & IFF_LINKWATCH_URGENT)
+		return true;
+
 	return netif_carrier_ok(dev) && qdisc_tx_changing(dev);
 }
 
---