Subject: Re: [PATCH net-next v2 4/4] bonding: balance ICMP echoes in layer3+4 mode
From: Nikolay Aleksandrov
Date: Tue, 29 Oct 2019 20:35:55 +0200
To: Matteo Croce, netdev@vger.kernel.org
Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, "David S . Miller", Stanislav Fomichev, Daniel Borkmann, Song Liu, Alexei Starovoitov, Paul Blakey, linux-kernel@vger.kernel.org
Message-ID: <5be14e4e-807f-486d-d11a-3113901e72fe@cumulusnetworks.com>
In-Reply-To: <20191029135053.10055-5-mcroce@redhat.com>

On 29/10/2019 15:50, Matteo Croce wrote:
> The bonding uses the L4 ports to balance flows between slaves.
> As the ICMP protocol has no ports, those packets are sent all to the same device:
>
> # tcpdump -qltnni veth0 ip |sed 's/^/0: /' &
> # tcpdump -qltnni veth1 ip |sed 's/^/1: /' &
> # ping -qc1 192.168.0.2
> 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 315, seq 1, length 64
> 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 315, seq 1, length 64
> # ping -qc1 192.168.0.2
> 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 316, seq 1, length 64
> 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 316, seq 1, length 64
> # ping -qc1 192.168.0.2
> 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 317, seq 1, length 64
> 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 317, seq 1, length 64
>
> But some ICMP packets have an Identifier field which is
> used to match packets within sessions, let's use this value in the hash
> function to balance these packets between bond slaves:
>
> # ping -qc1 192.168.0.2
> 0: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 303, seq 1, length 64
> 0: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 303, seq 1, length 64
> # ping -qc1 192.168.0.2
> 1: IP 192.168.0.1 > 192.168.0.2: ICMP echo request, id 304, seq 1, length 64
> 1: IP 192.168.0.2 > 192.168.0.1: ICMP echo reply, id 304, seq 1, length 64
>
> Aso, let's use a flow_dissector_key which defines FLOW_DISSECTOR_KEY_ICMP,

Also ?

> so we can balance pings encapsulated in a tunnel when using mode encap3+4:
>
> # ping -q 192.168.1.2 -c1
> 0: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 585, seq 1, length 64
> 0: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 585, seq 1, length 64
> # ping -q 192.168.1.2 -c1
> 1: IP 192.168.0.1 > 192.168.0.2: GREv0, length 102: IP 192.168.1.1 > 192.168.1.2: ICMP echo request, id 586, seq 1, length 64
> 1: IP 192.168.0.2 > 192.168.0.1: GREv0, length 102: IP 192.168.1.2 > 192.168.1.1: ICMP echo reply, id 586, seq 1, length 64
>
> Signed-off-by: Matteo Croce
> ---
>  drivers/net/bonding/bond_main.c | 77 ++++++++++++++++++++++++++++++---
>  1 file changed, 70 insertions(+), 7 deletions(-)
>

Hi Matteo,
Wouldn't it be more useful, and simpler, to use some packet field to choose
the slave deterministically from user space, overriding the hash completely?
For example, the packet mark could be interpreted as a slave id by the bonding
(this should be optional, to avoid breaking existing setups). ping already
supports setting it with -m, and any other application can set it too (e.g. via
the SO_MARK socket option). That way it can be used to monitor a specific slave
with any protocol, and it would be a much simpler change. User space can then
implement any logic it wants for the monitoring case and, as a minor bonus, can
monitor the slaves in parallel.
The opposite case is covered as well: people who don't want these packets
balanced for some reason simply wouldn't enable it.

Or maybe I've misunderstood why this change is needed. :)
It would actually be nice to include the use case that prompted this change in
the commit message.

Cheers,
 Nik
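The idea in the quoted patch description (use the ICMP echo Identifier where a layer3+4 hash would normally use ports) can be illustrated with a small userspace C sketch. It is not the bonding driver's code: the `pkt_summary` struct, the `SK_*` constants and the mixing steps are hypothetical simplifications, loosely modeled on a layer3+4 style hash (XOR the L4 key with both addresses, fold the high bits down, take the result modulo the slave count).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet summary holding only the fields a layer3+4
 * style hash looks at. Addresses are in host byte order for
 * readability; this is NOT a kernel structure. */
struct pkt_summary {
	uint32_t saddr;
	uint32_t daddr;
	uint8_t  proto;      /* IP protocol number */
	uint16_t sport;      /* TCP/UDP source port (0 if none) */
	uint16_t dport;      /* TCP/UDP destination port (0 if none) */
	uint8_t  icmp_type;  /* meaningful only when proto == SK_IPPROTO_ICMP */
	uint16_t icmp_id;    /* ICMP echo Identifier */
};

#define SK_IPPROTO_ICMP       1
#define SK_ICMP_ECHO_REPLY    0
#define SK_ICMP_ECHO_REQUEST  8

/* L4 part of the key: ports for port-based protocols, the Identifier
 * for ICMP echo request/reply, and nothing for other ICMP messages. */
static uint32_t l4_key(const struct pkt_summary *p)
{
	if (p->proto == SK_IPPROTO_ICMP) {
		if (p->icmp_type == SK_ICMP_ECHO_REQUEST ||
		    p->icmp_type == SK_ICMP_ECHO_REPLY)
			return p->icmp_id;
		return 0;
	}
	return ((uint32_t)p->sport << 16) | p->dport;
}

/* Simplified mixing: XOR in both addresses, then fold high bits down. */
static uint32_t xmit_hash(const struct pkt_summary *p)
{
	uint32_t hash = l4_key(p);

	hash ^= p->saddr ^ p->daddr;
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash;
}

/* Map the hash onto one of n_slaves links. */
static unsigned int pick_slave(const struct pkt_summary *p, unsigned int n_slaves)
{
	assert(n_slaves > 0);
	return xmit_hash(p) % n_slaves;
}
```

With two slaves and the addresses 192.168.0.1/192.168.0.2, this particular mixing sends echo ids 303 and 304 to different slaves, while a request and its reply hash identically, because XOR of the two addresses is symmetric and both packets carry the same Identifier. The concrete slave numbers need not match the tcpdump output quoted above, since the real driver's hash differs.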
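The alternative suggested in the reply, letting the packet mark pick the slave when an opt-in setting is enabled, can be sketched the same way. This is a hypothetical illustration, not an existing bonding option: the `bond_cfg` struct, the `mark_selects_slave` flag and the 1-based mark numbering (so that the default mark 0 means "no override") are assumptions made here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bond settings. mark_selects_slave is an invented,
 * opt-in knob: off by default so existing setups keep hashing. */
struct bond_cfg {
	bool mark_selects_slave;
	unsigned int n_slaves;
};

/* Pick a slave index for a packet carrying 'mark', given the flow
 * hash the bond would otherwise use. Marks are 1-based so that the
 * default mark 0 means "no override". */
static unsigned int select_slave(const struct bond_cfg *cfg, uint32_t mark,
				 uint32_t flow_hash)
{
	assert(cfg->n_slaves > 0);
	if (cfg->mark_selects_slave && mark >= 1 && mark <= cfg->n_slaves)
		return mark - 1;                /* deterministic: mark N -> slave N-1 */
	return flow_hash % cfg->n_slaves;       /* normal hash-based balancing */
}
```

Under this scheme, `ping -m 2 192.168.0.2` would pin the echo to the second slave (index 1), while unmarked traffic keeps using the normal hash. Falling back to the hash for out-of-range marks is one possible design choice; rejecting such marks would be another.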