Received: by 2002:a05:6a10:f347:0:0:0:0 with SMTP id d7csp3474768pxu; Tue, 8 Dec 2020 13:04:58 -0800 (PST) X-Google-Smtp-Source: ABdhPJzldEovzCa/LuX9L+an6VPNhJ/RUTwAUGYkE0rbxm49UegPpJW5F1mgVww1kOc+x75llrA3 X-Received: by 2002:aa7:c7d8:: with SMTP id o24mr10086728eds.328.1607461498287; Tue, 08 Dec 2020 13:04:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1607461498; cv=none; d=google.com; s=arc-20160816; b=ZrzYoWj1qQUxyxl5mxTHkOrJly1GA06GmbfAOyTg3lPACjxZUCYc5TI20Te77chhcj 2kRdCe/M8Q1AuuM1dFxI1lVBXpS82X8zuRS83n519mDkkW6Gj/PC7s0R1+go9I4xi27u WCfKVQRZJZl3WufQfoc4sWIGcXij1D8zyYbV1fA7SequNZATxgxy/JmUSCAC7ZzBi7vo bqA4WuTY97hMDAx3NFThsg2G17MlHJG4lj4/AmxlfKeHHk1QDSuMw747xz9RZNmyEtsa jvRArZB5zkoTCp1NZVtx8eJ9LUvYCwYl05a25FZ9MAwz6nxD4hhfou6wytezoT5w/nBK gRdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:message-id:date:content-transfer-encoding :content-id:mime-version:comments:references:in-reply-to:subject:cc :to:from; bh=juE8gtkAxNij8Wj8NzW93aGEd2CYN0Ogiiqzi2hk/uI=; b=E1K8A3leQGRwsRlPR43wJPLwCfTZBQzqZR0N7kBp6yq8WPxA/cHEW6Shj2QWI/1VhZ T6IRZsw6uo6P6FZZiN4qQlgJPmrBJF2uGVX4X8l1crwYo5ZyDIMchav4Zc1Rke2hZK8c 2b57l2vWlpEmiD+rdJH/JoYVmOTmwvpUIevvRFnD2qgnBPioPy38aTq4Gw2K6pqxBKnT XxMKiKXUwER2tKv075T2mHWAjVeUAPDqhldlWOc3hqtLda/bRj5QN9EnRDDea+hGwqRw NDivSW4rIFtrWB50jVoH9DsTYMVkxNfOrEOWE3ccKSbk7a4qKaHiUSXnjbqfjdCb5CoU cu1g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id v5si10998292edj.313.2020.12.08.13.04.35; Tue, 08 Dec 2020 13:04:58 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=canonical.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728592AbgLHUhf convert rfc822-to-8bit (ORCPT + 99 others); Tue, 8 Dec 2020 15:37:35 -0500 Received: from youngberry.canonical.com ([91.189.89.112]:57202 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730934AbgLHUfl (ORCPT ); Tue, 8 Dec 2020 15:35:41 -0500 Received: from 1.general.jvosburgh.us.vpn ([10.172.68.206] helo=famine.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1kmjhE-0001pY-Lu; Tue, 08 Dec 2020 20:34:53 +0000 Received: by famine.localdomain (Postfix, from userid 1000) id DAC085FEE7; Tue, 8 Dec 2020 12:34:50 -0800 (PST) Received: from famine (localhost [127.0.0.1]) by famine.localdomain (Postfix) with ESMTP id D2F0F9FAB0; Tue, 8 Dec 2020 12:34:50 -0800 (PST) From: Jay Vosburgh To: Jarod Wilson cc: linux-kernel@vger.kernel.org, Mahesh Bandewar , Veaceslav Falico , Andy Gospodarek , "David S. Miller" , Jakub Kicinski , Thomas Davis , netdev@vger.kernel.org Subject: Re: [PATCH net] bonding: reduce rtnl lock contention in mii monitor thread In-reply-to: <20201205234354.1710-1-jarod@redhat.com> References: <20201205234354.1710-1-jarod@redhat.com> Comments: In-reply-to Jarod Wilson message dated "Sat, 05 Dec 2020 18:43:54 -0500." X-Mailer: MH-E 8.6+git; nmh 1.6; GNU Emacs 27.0.50 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <11897.1607459690.1@famine> Content-Transfer-Encoding: 8BIT Date: Tue, 08 Dec 2020 12:34:50 -0800 Message-ID: <11900.1607459690@famine> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Jarod Wilson wrote: >I'm seeing a system get stuck unable to bring a downed interface back up >when it's got an updelay value set, behavior which ceased when logging >spew was removed from bond_miimon_inspect(). I'm monitoring logs on this >system over another network connection, and it seems that the act of >spewing logs at all there increases rtnl lock contention, because >instrumented code showed bond_mii_monitor() never able to succeed in it's >attempts to call rtnl_trylock() to actually commit link state changes, >leaving the downed link stuck in BOND_LINK_DOWN. The system in question >appears to be fine with the log spew being moved to >bond_commit_link_state(), which is called after the successful >rtnl_trylock(). I'm actually wondering if perhaps we ultimately need/want >some bond-specific lock here to prevent racing with bond_close() instead >of using rtnl, but this shift of the output appears to work. I believe >this started happening when de77ecd4ef02 ("bonding: improve link-status >update in mii-monitoring") went in, but I'm not 100% on that. We use RTNL not to avoid deadlock with bonding itself, but because the "commit" side undertakes actions which require RTNL, e.g., various events will eventually call netdev_lower_state_changed. However, the RTNL acquisition is a trylock to avoid the deadlock with bond_close. Moving that out of line here (e.g., putting the commit into another work queue event or the like) has the same problem, in that bond_close needs to wait for all of the work queue events to finish, and it holds RTNL. Also, a dim memory says that the various notification messages were mostly placed in the "inspect" phase and not the "commit" phase to avoid doing printk-like activities with RTNL held. As a general principle, I don't think we want to add more verbiage under RTNL. >The addition of a case BOND_LINK_BACK in bond_miimon_inspect() is somewhat >separate from the fix for the actual hang, but it eliminates a constant >"invalid new link 3 on slave" message seen related to this issue, and it's >not actually an invalid state here, so we shouldn't be reporting it as an >error. Do you mean bond_miimon_commit here and not bond_miimon_inspect (which already has a case for BOND_LINK_BACK)? In principle, bond_miimon_commit should not see _BACK or _FAIL state as a new link state, because those states should be managed at the bond_miimon_inspect level (as they are the result of updelay and downdelay). These states should not be "committed" in the sense of causing notifications or doing actions that require RTNL. My recollection is that the "invalid new link" messages were the result of a bug in de77ecd4ef02, which was fixed in 1899bb325149 ("bonding: fix state transition issue in link monitoring"), but maybe the RTNL problem here induces that in some other fashion. Either way, I believe this message is correct as-is. -J >CC: Mahesh Bandewar >CC: Jay Vosburgh >CC: Veaceslav Falico >CC: Andy Gospodarek >CC: "David S. Miller" >CC: Jakub Kicinski >CC: Thomas Davis >CC: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson >--- > drivers/net/bonding/bond_main.c | 26 ++++++---------------- > include/net/bonding.h | 38 +++++++++++++++++++++++++++++++++ > 2 files changed, 44 insertions(+), 20 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index 47afc5938c26..cdb6c64f16b6 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -2292,23 +2292,13 @@ static int bond_miimon_inspect(struct bonding *bond) > bond_propose_link_state(slave, BOND_LINK_FAIL); > commit++; > slave->delay = bond->params.downdelay; >- if (slave->delay) { >- slave_info(bond->dev, slave->dev, "link status down for %sinterface, disabling it in %d ms\n", >- (BOND_MODE(bond) == >- BOND_MODE_ACTIVEBACKUP) ? >- (bond_is_active_slave(slave) ? >- "active " : "backup ") : "", >- bond->params.downdelay * bond->params.miimon); >- } >+ > fallthrough; > case BOND_LINK_FAIL: > if (link_state) { > /* recovered before downdelay expired */ > bond_propose_link_state(slave, BOND_LINK_UP); > slave->last_link_up = jiffies; >- slave_info(bond->dev, slave->dev, "link status up again after %d ms\n", >- (bond->params.downdelay - slave->delay) * >- bond->params.miimon); > commit++; > continue; > } >@@ -2330,19 +2320,10 @@ static int bond_miimon_inspect(struct bonding *bond) > commit++; > slave->delay = bond->params.updelay; > >- if (slave->delay) { >- slave_info(bond->dev, slave->dev, "link status up, enabling it in %d ms\n", >- ignore_updelay ? 0 : >- bond->params.updelay * >- bond->params.miimon); >- } > fallthrough; > case BOND_LINK_BACK: > if (!link_state) { > bond_propose_link_state(slave, BOND_LINK_DOWN); >- slave_info(bond->dev, slave->dev, "link status down again after %d ms\n", >- (bond->params.updelay - slave->delay) * >- bond->params.miimon); > commit++; > continue; > } >@@ -2456,6 +2437,11 @@ static void bond_miimon_commit(struct bonding *bond) > > continue; > >+ case BOND_LINK_BACK: >+ bond_propose_link_state(slave, BOND_LINK_NOCHANGE); >+ >+ continue; >+ > default: > slave_err(bond->dev, slave->dev, "invalid new link %d on slave\n", > slave->link_new_state); >diff --git a/include/net/bonding.h b/include/net/bonding.h >index adc3da776970..6a09de9a3f03 100644 >--- a/include/net/bonding.h >+++ b/include/net/bonding.h >@@ -558,10 +558,48 @@ static inline void bond_propose_link_state(struct slave *slave, int state) > > static inline void bond_commit_link_state(struct slave *slave, bool notify) > { >+ struct bonding *bond = slave->bond; >+ > if (slave->link_new_state == BOND_LINK_NOCHANGE) > return; > >+ if (slave->link == slave->link_new_state) >+ return; >+ > slave->link = slave->link_new_state; >+ >+ switch(slave->link) { >+ case BOND_LINK_UP: >+ slave_info(bond->dev, slave->dev, "link status up again after %d ms\n", >+ (bond->params.downdelay - slave->delay) * >+ bond->params.miimon); >+ break; >+ >+ case BOND_LINK_FAIL: >+ if (slave->delay) { >+ slave_info(bond->dev, slave->dev, "link status down for %sinterface, disabling it in %d ms\n", >+ (BOND_MODE(bond) == >+ BOND_MODE_ACTIVEBACKUP) ? >+ (bond_is_active_slave(slave) ? >+ "active " : "backup ") : "", >+ bond->params.downdelay * bond->params.miimon); >+ } >+ break; >+ >+ case BOND_LINK_DOWN: >+ slave_info(bond->dev, slave->dev, "link status down again after %d ms\n", >+ (bond->params.updelay - slave->delay) * >+ bond->params.miimon); >+ break; >+ >+ case BOND_LINK_BACK: >+ if (slave->delay) { >+ slave_info(bond->dev, slave->dev, "link status up, enabling it in %d ms\n", >+ bond->params.updelay * bond->params.miimon); >+ } >+ break; >+ } >+ > if (notify) { > bond_queue_slave_event(slave); > bond_lower_state_changed(slave); >-- >2.28.0 > --- -Jay Vosburgh, jay.vosburgh@canonical.com