Date: Tue, 13 Apr 2021 00:55:18 +0200
From: Marek Behun
To: Tobias Waldekranz
Cc: Vladimir Oltean, Ansuel Smith, netdev@vger.kernel.org,
    "David S. Miller", Jakub Kicinski, Andrew Lunn, Vivien Didelot,
    Florian Fainelli, Alexei Starovoitov, Daniel Borkmann,
    Andrii Nakryiko, Eric Dumazet, Wei Wang, Cong Wang, Taehee Yoo,
    Björn Töpel, zhang kai, Weilong Chen, Roopa Prabhu, Di Zhu,
    Francis Laniel, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC net-next 0/3] Multi-CPU DSA support
Message-ID: <20210413005518.2f9b9cef@thinkpad>
In-Reply-To: <87wnt7jgzk.fsf@waldekranz.com>
References: <20210410133454.4768-1-ansuelsmth@gmail.com>
    <20210411200135.35fb5985@thinkpad>
    <20210411185017.3xf7kxzzq2vefpwu@skbuf>
    <878s5nllgs.fsf@waldekranz.com>
    <20210412213045.4277a598@thinkpad>
    <8735vvkxju.fsf@waldekranz.com>
    <20210412235054.73754df9@thinkpad>
    <87wnt7jgzk.fsf@waldekranz.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Apr 2021 00:05:51 +0200
Tobias Waldekranz wrote:

> On Mon, Apr 12, 2021 at 23:50, Marek Behun wrote:
> > On Mon, 12 Apr 2021 23:22:45 +0200
> > Tobias Waldekranz wrote:
> >
> >> On Mon, Apr 12, 2021 at 21:30, Marek Behun wrote:
> >> > On Mon, 12 Apr 2021 14:46:11 +0200
> >> > Tobias Waldekranz wrote:
> >> >
> >> >> I agree. Unless you only have a few really wideband flows, a LAG will
> >> >> typically do a great job with balancing. This will happen without the
> >> >> user having to do any configuration at all. It would also perform well
> >> >> in "router-on-a-stick"-setups where the incoming and outgoing port is
> >> >> the same.
> >> >
> >> > TLDR: The problem with LAGs as they are currently implemented is that
> >> > for Turris Omnia, basically in 1/16 of configurations the traffic would
> >> > go via one CPU port anyway.
> >> >
> >> > One potential problem that I see with using LAGs for aggregating CPU
> >> > ports on mv88e6xxx is how these switches determine the port for a
> >> > packet: only the src and dst MAC addresses are used for the hash that
> >> > chooses the port.
> >> >
> >> > The most common scenario for Turris Omnia, for example, where we have 2
> >> > CPU ports and 5 user ports, is that into these 5 user ports the user
> >> > plugs 5 simple devices (no switches, so only one peer MAC address per
> >> > port). So we have only 5 pairs of src + dst MAC addresses. If we simply
> >> > fill the LAG table as it is done now, then there is a 2 * 0.5^5 = 1/16
> >> > chance that all packets would go through one CPU port.
> >> >
> >> > In order to have real load balancing in this scenario, we would either
> >> > have to recompute the LAG mask table depending on the MAC addresses, or
> >> > periodically rewrite the LAG mask table somewhat randomly. (This could
> >> > in theory be offloaded onto the Z80 internal CPU for some of the
> >> > switches of the mv88e6xxx family, but not for Omnia.)
> >>
> >> I thought that the option to associate each port netdev with a DSA
> >> master would only be used on transmit. Are you saying that there is a
> >> way to configure an mv88e6xxx chip to steer packets to different CPU
> >> ports depending on the incoming port?
> >>
> >> The reason that the traffic is directed towards the CPU is that some
> >> kind of entry in the ATU says so, and the destination of that entry will
> >> either be a port vector or a LAG. Of those two, only the LAG will offer
> >> any kind of balancing. What am I missing?
> >
> > Via port vectors you can "load balance" by ports only, i.e. input port X
> > -> transmit via CPU port Y.
>
> How is this done?
> In a case where there is no bridging between the ports, then I
> understand. Each port could have its own FID. But if you have this
> setup...
>
>      br0    wan
>      / \
>   lan0 lan1
>
> lan0 and lan1 would use the same FID. So how could you say that frames
> from lan0 should go to cpu0 and frames from lan1 should go to cpu1 if
> the DA is the same? What would be the content of the ATU in a setup
> like that?
>
> > When using LAGs, you are load balancing via hash(src MAC | dst MAC)
> > only. This is better in some ways. But what I am saying is that if the
> > LAG mask table is static, as it is now implemented in mv88e6xxx code,
> > then for many scenarios there is a high probability of no load balancing
> > at all. For Turris Omnia for example there is a 6.25% probability that
> > the switch chip will send all traffic to the CPU via one CPU port.
> > This is because the switch chooses the LAG port only from the hash of
> > dst+src MAC address. (By the 1/16 = 6.25% probability I mean that for
> > roughly 1 in 16 customers, the switch would only use one port when
> > sending data to the CPU).
> >
> > The round robin solution here is therefore better in this case.
>
> I agree that it would be better in that case. I just do not get how you
> get the switch to do it for you.

I thought that this was configured in the mv88e6xxx_port_vlan() function.
For each port, you specify via which ports data can egress.
So for ports 0, 2, 4 you can enable CPU port 0, and for ports 1 and 3
CPU port 1.

Am I wrong? I confess that I have not understood this down to the finest
details, so it is entirely possible that I am missing something
important and am completely wrong. Maybe this cannot be done.

Marek
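
As a rough illustration of the two schemes discussed above, here is a small
Python sketch (not kernel code). It assumes that the LAG hash sends each of
the five src/dst MAC pairs to one of the two CPU ports uniformly and
independently, which is a simplification of the real mv88e6xxx hash, and it
models the port-based scheme as a plain alternating map. The function names
and the port numbering are hypothetical and chosen only for this example.

```python
from fractions import Fraction
from itertools import product

USER_PORTS = [0, 1, 2, 3, 4]  # five user ports, as in the Omnia example
CPU_PORTS = [0, 1]            # two CPU ports, numbered as in the mail


def single_cpu_port_probability(n_flows, n_cpu_ports):
    """Chance that every flow ends up on the same CPU port.

    Assumption: each flow is mapped to a CPU port uniformly and
    independently of the others (a simplified model of the LAG hash).
    """
    outcomes = list(product(range(n_cpu_ports), repeat=n_flows))
    collapsed = [o for o in outcomes if len(set(o)) == 1]
    return Fraction(len(collapsed), len(outcomes))


def static_port_assignment(user_ports, cpu_ports):
    """Input port X -> CPU port Y, alternating between the CPU ports."""
    return {port: cpu_ports[port % len(cpu_ports)] for port in user_ports}


if __name__ == "__main__":
    # 2 of the 32 possible outcomes put all five flows on one CPU port.
    print(single_cpu_port_probability(len(USER_PORTS), len(CPU_PORTS)))  # 1/16
    # Ports 0, 2, 4 map to CPU port 0; ports 1 and 3 map to CPU port 1.
    print(static_port_assignment(USER_PORTS, CPU_PORTS))
```

Under these assumptions the enumeration reproduces the 2 * 0.5^5 = 1/16
figure, while the alternating map always uses both CPU ports. Whether the
hardware can actually be configured this way is exactly the open question
in this thread.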