From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Sabrina Dubroca,
 Stefano Brivio, David Ahern, "David S. Miller"
Subject: [PATCH 4.4 22/48] net: ipv4: update fnhe_pmtu when first hop's MTU changes
Date: Thu, 18 Oct 2018 19:54:57 +0200
Message-Id: <20181018175429.208281484@linuxfoundation.org>
X-Mailer: git-send-email 2.19.1
In-Reply-To: <20181018175427.133690306@linuxfoundation.org>
References: <20181018175427.133690306@linuxfoundation.org>
User-Agent: quilt/0.65
X-stable: review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

4.4-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Sabrina Dubroca

[ Upstream commit af7d6cce53694a88d6a1bb60c9a239a6a5144459 ]

Since commit 5aad1de5ea2c ("ipv4: use separate genid for next hop
exceptions"), exceptions get deprecated separately from cached
routes. In particular, administrative changes don't clear PMTU anymore.

As Stefano described in commit e9fa1495d738 ("ipv6: Reflect MTU changes
on PMTU of exceptions for MTU-less routes"), the PMTU discovered before
the local MTU change can become stale:
 - if the local MTU is now lower than the PMTU, that PMTU is now
   incorrect
 - if the local MTU was the lowest value in the path, and is increased,
   we might discover a higher PMTU

Similarly to what commit e9fa1495d738 did for IPv6, update PMTU in those
cases.

If the exception was locked, the discovered PMTU was smaller than the
minimal accepted PMTU. In that case, if the new local MTU is smaller
than the current PMTU, let PMTU discovery figure out if locking of the
exception is still needed.

To do this, we need to know the old link MTU in the NETDEV_CHANGEMTU
notifier. By the time the notifier is called, dev->mtu has been
changed. This patch adds the old MTU as additional information in the
notifier structure, and a new call_netdevice_notifiers_mtu() function.
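
The update rules described above can be sketched in isolation as a
small userspace function. This is only an illustration, not kernel
code: struct pmtu_exception and exception_update_mtu() are hypothetical
names, and the sketch assumes an exception can be reduced to a PMTU
value plus a "locked" flag.

```c
/* Userspace sketch only: the struct and function names here are
 * hypothetical and merely model the rules from the description.
 */
#include <stdbool.h>
#include <stdint.h>

struct pmtu_exception {
	uint32_t pmtu;		/* PMTU recorded for this exception */
	bool mtu_locked;	/* discovered PMTU was below the minimum */
};

/* Apply a first-hop MTU change (orig_mtu -> new_mtu) to one exception.
 * Returns true if the exception was modified.
 */
bool exception_update_mtu(struct pmtu_exception *e,
			  uint32_t new_mtu, uint32_t orig_mtu)
{
	if (e->mtu_locked) {
		/* Locked: only act if the new MTU does not exceed the
		 * PMTU; unlock and let PMTU discovery decide again.
		 */
		if (new_mtu <= e->pmtu) {
			e->pmtu = new_mtu;
			e->mtu_locked = false;
			return true;
		}
		return false;
	}

	/* Unlocked: the PMTU is stale if the link MTU dropped below it,
	 * or if the old link MTU was the limiting value on the path.
	 */
	if (new_mtu < e->pmtu || orig_mtu == e->pmtu) {
		e->pmtu = new_mtu;
		return true;
	}
	return false;
}
```

For example, with an old link MTU of 1500 and an unlocked PMTU of 1500,
raising the link MTU to 9000 sets the PMTU to 9000 so that discovery can
probe the path again, while an unlocked PMTU of 1400 is left untouched.
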

Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions")
Signed-off-by: Sabrina Dubroca
Reviewed-by: Stefano Brivio
Reviewed-by: David Ahern
Signed-off-by: David S. Miller
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/netdevice.h |    7 ++++++
 include/net/ip_fib.h      |    1
 net/core/dev.c            |   28 +++++++++++++++++++++++--
 net/ipv4/fib_frontend.c   |   12 +++++++----
 net/ipv4/fib_semantics.c  |   50 ++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 92 insertions(+), 6 deletions(-)

--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2168,6 +2168,13 @@ struct netdev_notifier_info {
 	struct net_device *dev;
 };

+struct netdev_notifier_info_ext {
+	struct netdev_notifier_info info; /* must be first */
+	union {
+		u32 mtu;
+	} ext;
+};
+
 struct netdev_notifier_change_info {
 	struct netdev_notifier_info info; /* must be first */
 	unsigned int flags_changed;
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -322,6 +322,7 @@ int ip_fib_check_default(__be32 gw, stru
 int fib_sync_down_dev(struct net_device *dev, unsigned long event, bool force);
 int fib_sync_down_addr(struct net *net, __be32 local);
 int fib_sync_up(struct net_device *dev, unsigned int nh_flags);
+void fib_sync_mtu(struct net_device *dev, u32 orig_mtu);

 extern u32 fib_multipath_secret __read_mostly;

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1660,6 +1660,28 @@ int call_netdevice_notifiers(unsigned lo
 }
 EXPORT_SYMBOL(call_netdevice_notifiers);

+/**
+ *	call_netdevice_notifiers_mtu - call all network notifier blocks
+ *	@val: value passed unmodified to notifier function
+ *	@dev: net_device pointer passed unmodified to notifier function
+ *	@arg: additional u32 argument passed to the notifier function
+ *
+ *	Call all network notifier blocks.  Parameters and return value
+ *	are as for raw_notifier_call_chain().
+ */
+static int call_netdevice_notifiers_mtu(unsigned long val,
+					struct net_device *dev, u32 arg)
+{
+	struct netdev_notifier_info_ext info = {
+		.info.dev = dev,
+		.ext.mtu = arg,
+	};
+
+	BUILD_BUG_ON(offsetof(struct netdev_notifier_info_ext, info) != 0);
+
+	return call_netdevice_notifiers_info(val, dev, &info.info);
+}
+
 #ifdef CONFIG_NET_INGRESS
 static struct static_key ingress_needed __read_mostly;

@@ -6134,14 +6156,16 @@ int dev_set_mtu(struct net_device *dev,
 	err = __dev_set_mtu(dev, new_mtu);

 	if (!err) {
-		err = call_netdevice_notifiers(NETDEV_CHANGEMTU, dev);
+		err = call_netdevice_notifiers_mtu(NETDEV_CHANGEMTU, dev,
+						   orig_mtu);
 		err = notifier_to_errno(err);
 		if (err) {
 			/* setting mtu back and notifying everyone again,
 			 * so that they have a chance to revert changes.
 			 */
 			__dev_set_mtu(dev, orig_mtu);
-			call_netdevice_notifiers(NETDEV_CHANGEMTU, dev);
+			call_netdevice_notifiers_mtu(NETDEV_CHANGEMTU, dev,
+						     new_mtu);
 		}
 	}
 	return err;
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -1170,7 +1170,8 @@ static int fib_inetaddr_event(struct not
 static int fib_netdev_event(struct notifier_block *this, unsigned long event, void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
-	struct netdev_notifier_changeupper_info *info;
+	struct netdev_notifier_changeupper_info *upper_info = ptr;
+	struct netdev_notifier_info_ext *info_ext = ptr;
 	struct in_device *in_dev;
 	struct net *net = dev_net(dev);
 	unsigned int flags;
@@ -1205,16 +1206,19 @@ static int fib_netdev_event(struct notif
 			fib_sync_up(dev, RTNH_F_LINKDOWN);
 		else
 			fib_sync_down_dev(dev, event, false);
-		/* fall through */
+		rt_cache_flush(net);
+		break;
 	case NETDEV_CHANGEMTU:
+		fib_sync_mtu(dev, info_ext->ext.mtu);
 		rt_cache_flush(net);
 		break;
 	case NETDEV_CHANGEUPPER:
-		info = ptr;
+		upper_info = ptr;
 		/* flush all routes if dev is linked to or unlinked from
 		 * an L3 master device (e.g., VRF)
 		 */
-		if (info->upper_dev && netif_is_l3_master(info->upper_dev))
+		if (upper_info->upper_dev &&
+		    netif_is_l3_master(upper_info->upper_dev))
 			fib_disable_ip(dev, NETDEV_DOWN, true);
 		break;
 	}
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1373,6 +1373,56 @@ int fib_sync_down_addr(struct net *net,
 	return ret;
 }

+/* Update the PMTU of exceptions when:
+ * - the new MTU of the first hop becomes smaller than the PMTU
+ * - the old MTU was the same as the PMTU, and it limited discovery of
+ *   larger MTUs on the path. With that limit raised, we can now
+ *   discover larger MTUs
+ * A special case is locked exceptions, for which the PMTU is smaller
+ * than the minimal accepted PMTU:
+ * - if the new MTU is greater than the PMTU, don't make any change
+ * - otherwise, unlock and set PMTU
+ */
+static void nh_update_mtu(struct fib_nh *nh, u32 new, u32 orig)
+{
+	struct fnhe_hash_bucket *bucket;
+	int i;
+
+	bucket = rcu_dereference_protected(nh->nh_exceptions, 1);
+	if (!bucket)
+		return;
+
+	for (i = 0; i < FNHE_HASH_SIZE; i++) {
+		struct fib_nh_exception *fnhe;
+
+		for (fnhe = rcu_dereference_protected(bucket[i].chain, 1);
+		     fnhe;
+		     fnhe = rcu_dereference_protected(fnhe->fnhe_next, 1)) {
+			if (fnhe->fnhe_mtu_locked) {
+				if (new <= fnhe->fnhe_pmtu) {
+					fnhe->fnhe_pmtu = new;
+					fnhe->fnhe_mtu_locked = false;
+				}
+			} else if (new < fnhe->fnhe_pmtu ||
+				   orig == fnhe->fnhe_pmtu) {
+				fnhe->fnhe_pmtu = new;
+			}
+		}
+	}
+}
+
+void fib_sync_mtu(struct net_device *dev, u32 orig_mtu)
+{
+	unsigned int hash = fib_devindex_hashfn(dev->ifindex);
+	struct hlist_head *head = &fib_info_devhash[hash];
+	struct fib_nh *nh;
+
+	hlist_for_each_entry(nh, head, nh_hash) {
+		if (nh->nh_dev == dev)
+			nh_update_mtu(nh, dev->mtu, orig_mtu);
+	}
+}
+
 /* Event              force Flags           Description
  * NETDEV_CHANGE      0     LINKDOWN        Carrier OFF, not for scope host
  * NETDEV_DOWN        0     LINKDOWN|DEAD   Link down, not for scope host