Date: Sat, 5 Dec 2020 11:45:13 -0800
From: Jakub Kicinski
To: Lars Everbrand
Cc: linux-kernel@vger.kernel.org, Jay Vosburgh, Veaceslav Falico,
    Andy Gospodarek, "David S. Miller", netdev@vger.kernel.org
Subject: Re: [PATCH net-next] bonding: correct rr balancing during link failure
Message-ID: <20201205114513.4886d15e@kicinski-fedora-pc1c0hjn.DHCP.thefacebook.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 02 Dec 2020 20:55:57 +0000 Lars Everbrand wrote:
> This patch updates the sending algorithm for roundrobin to avoid
> over-subscribing interface(s) when one or more interfaces in the bond is
> not able to send packets. This happened when order was not random and
> more than 2 interfaces were used.
>
> Previously the algorithm would find the next available interface
> when an interface failed to send by, this means that most often it is
> current_interface + 1. The problem is that when the next packet is to be
> sent and the "normal" algorithm then continues with interface++ which
> then hits that same interface again.
>
> This patch updates the resending algorithm to update the global counter
> of the next interface to use.
>
> Example (prior to patch):
>
> Consider 6 x 100 Mbit/s interfaces in a rr bond. The normal order of links
> being used to send would look like:
> 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 ...
>
> If, for instance, interface 2 where unable to send the order would have been:
> 1 3 3 4 5 6 1 3 3 4 5 6 1 3 3 4 5 6 ...
>
> The resulting speed (for TCP) would then become:
> 50 + 0 + 100 + 50 + 50 + 50 = 300 Mbit/s
> instead of the expected 500 Mbit/s.
>
> If interface 3 also would fail the resulting speed would be half of the
> expected 400 Mbit/s (33 + 0 + 0 + 100 + 33 + 33).
>
> Signed-off-by: Lars Everbrand

Thanks for the patch!

Looking at the code in question, it feels a little like we're breaking
abstractions if we bump the counter directly in bond_get_slave_by_id().

For one thing, when the function is called for IGMP packets the counter
should not be incremented at all.
But also if packets_per_slave is not 1 we'd still be hitting the same
leg multiple times (packets_per_slave / 2). So it seems like we should
round the counter up somehow?

For IGMP maybe we don't have to call bond_get_slave_by_id() at all,
IMHO, just find first leg that can TX. Then we can restructure
bond_get_slave_by_id() appropriately for the non-IGMP case.

> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index e0880a3840d7..e02d9c6d40ee 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4107,6 +4107,7 @@ static struct slave *bond_get_slave_by_id(struct bonding *bond,
>  		if (--i < 0) {
>  			if (bond_slave_can_tx(slave))
>  				return slave;
> +			bond->rr_tx_counter++;
>  		}
>  	}
>
> @@ -4117,6 +4118,7 @@ static struct slave *bond_get_slave_by_id(struct bonding *bond,
>  			break;
>  		if (bond_slave_can_tx(slave))
>  			return slave;
> +		bond->rr_tx_counter++;
>  	}
>  	/* no slave that can tx has been found */
>  	return NULL;
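
The leg ordering described in the quoted example can be reproduced with
a small user-space model. This is only a sketch and not the driver code:
struct rr_model, rr_model_pick() and the bump_on_skip flag are
hypothetical names made up for illustration, the model assumes
packets_per_slave == 1, and it ignores both the IGMP path and RCU. With
bump_on_skip false it mimics the pre-patch selection; with it true it
mimics the counter bump the patch proposes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LEGS 8

/* Hypothetical user-space stand-in for the bond state, not the kernel struct. */
struct rr_model {
	size_t n_legs;          /* number of legs in the bond, at most MAX_LEGS */
	bool can_tx[MAX_LEGS];  /* true if leg i can currently transmit */
	unsigned int counter;   /* plays the role of rr_tx_counter */
};

/*
 * Pick the leg for the next packet, assuming packets_per_slave == 1.
 * Returns the 0-based leg index, or -1 if no leg can transmit.
 * If bump_on_skip is true, the counter is advanced once more for every
 * leg that is skipped because it cannot transmit.
 */
int rr_model_pick(struct rr_model *m, bool bump_on_skip)
{
	size_t start, off;

	assert(m->n_legs <= MAX_LEGS);
	if (m->n_legs == 0)
		return -1;

	/* Normal round-robin step: use the counter, then advance it. */
	start = m->counter++ % m->n_legs;

	/* Walk from the chosen leg, wrapping around, until one can send. */
	for (off = 0; off < m->n_legs; off++) {
		size_t leg = (start + off) % m->n_legs;

		if (m->can_tx[leg])
			return (int)leg;
		if (bump_on_skip)
			m->counter++;
	}
	return -1;
}
```

With six legs and the second one down, six calls with bump_on_skip
false give legs 0 2 2 3 4 5 (interfaces 1 3 3 4 5 6 in the numbering of
the example), while bump_on_skip true gives 0 2 3 4 5 0. The model does
not address the packets_per_slave != 1 or IGMP cases raised in the
reply.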