Subject: Re: [RFC PATCH net-next V2 0/6] XDP rx handler
From: Jason Wang
To: Alexei Starovoitov
Cc: David Ahern, Jesper Dangaard Brouer, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
    mst@redhat.com, Toshiaki Makita
Date: Wed, 15 Aug 2018 15:04:35 +0800
Message-ID: <0809cbab-9c91-52f1-2abe-124a255d9304@redhat.com>
In-Reply-To: <20180815053550.5g4f5qeb7r4wtgm5@ast-mbp>

On 2018/08/15 13:35, Alexei Starovoitov wrote:
> On Wed, Aug 15, 2018 at 08:29:45AM +0800, Jason Wang wrote:
>> Looks less flexible, since the topology is hard-coded in the XDP program
>> itself and this requires all the logic to be implemented in the program
>> on the root netdev.
>>
>>> I have L3 forwarding working for vlan devices and bonds. I had not
>>> considered macvlans specifically yet, but it should be straightforward
>>> to add.
>>>
>> Yes, and all of this could be done through the XDP rx handler as well,
>> and it can do even more with rather simple logic:
>>
>> 1. Each macvlan has its own namespace and wants its own bpf logic.
>> 2. Reuse the existing topology information for dealing with more complex
>>    setups like macvlan on top of bond or team. There's no need for the
>>    bpf program to care about topology. If you look at the code, there is
>>    not even a need to attach XDP to each stacked device: the call to
>>    xdp_do_pass() can try to pass the XDP buff to the upper device even
>>    when no XDP program is attached to the current layer.
>> 3. Deliver the XDP buff to userspace through macvtap.
> I think I'm getting what you're trying to achieve.
> You actually don't want any bpf programs in there at all.
> You want macvlan builtin logic to act on raw packet frames.

The built-in logic is only used to find the destination macvlan device; that step could also be done by another bpf program, as sketched below. But instead of inventing lots of generic infrastructure in the kernel with a specific userspace API, the built-in logic has its own advantages:

- it supports hundreds or even thousands of macvlans
- it uses existing tools to configure the network
- it is immune to topology changes
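For illustration, a minimal sketch of what that "another bpf program" could look like (this is not code from this series; the map name and size are made up). Note that the dmac table has to be maintained from userspace and rebuilt on every address or topology change, which is exactly the burden the built-in logic avoids:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include "bpf_helpers.h"

/* dmac -> macvlan ifindex, populated from userspace. */
struct bpf_map_def SEC("maps") mac_table = {
	.type        = BPF_MAP_TYPE_HASH,
	.key_size    = ETH_ALEN,
	.value_size  = sizeof(__u32),
	.max_entries = 4096,
};

SEC("xdp")
int xdp_macvlan_lookup(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	__u8 dmac[ETH_ALEN];
	__u32 *ifindex;

	if ((void *)(eth + 1) > data_end)
		return XDP_DROP;

	/* Copy the dmac to the stack so it can be used as a map key. */
	__builtin_memcpy(dmac, eth->h_dest, ETH_ALEN);

	ifindex = bpf_map_lookup_elem(&mac_table, dmac);
	if (!ifindex)
		return XDP_PASS;	/* not one of our macvlans */

	/* The destination macvlan is *ifindex, but plain XDP has no way
	 * to inject the buff into that device's receive path
	 * (XDP_REDIRECT transmits *out of* the target device instead);
	 * handing the buff up the stack is what the rx handler is for. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";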
> It would have been less confusing if you said so from the beginning.

The name "XDP rx handler" is probably not good. Something like "stacked
device XDP" might be better.

> I think there is little value in such work, since something still
> needs to process these raw frames eventually. If it's XDP with BPF progs
> then they can maintain the speed, but in such a case there is no need
> for macvlan. The first layer can be normal xdp+bpf+xdp_redirect just fine.

I'm a little bit confused. We allow a per-veth XDP program, so I believe a
per-macvlan XDP program makes sense as well. It allows great flexibility,
the bpf program does not need to care about topology, and the configuration
is greatly simplified. The only difference is that xdp_redirect works for
veth because veth is a paired device: we can transmit XDP frames to one
veth and run XDP on its peer. That does not work for macvlan, which is
based on an rx handler.

Actually, for the case of veth, if we implement an XDP rx handler for the
bridge, it works seamlessly with veth, e.g.:

eth0 (XDP_PASS) -> [bridge XDP rx handler and ndo_xdp_xmit()] -> veth --- veth (XDP)

Besides the usage for containers, we could also implement a macvtap rx
handler, which would allow fast packet forwarding to userspace.

> In the case where there is no xdp+bpf in the final processing, the frames
> are converted to skbs and performance is lost, so in such cases there is
> no need for builtin macvlan acting on raw xdp frames either. Just keep
> the existing macvlan acting on skbs.

Yes, this is how veth works as well. Actually, the idea is not limited to
macvlan but applies to any device based on an rx handler. Consider the case
of bonding: it allows attaching a very simple XDP program to each slave
while keeping a single copy of the main logic in one XDP program on the
bond, instead of duplicating it across all slaves. A minimal sketch follows.
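For illustration only; the program and section names are made up, and the split assumes the XDP rx handler proposed in this series passes the buff from slave to bond:

#include <linux/bpf.h>
#include "bpf_helpers.h"

/* Attached to every slave: nothing device-specific, just let the frame
 * travel up to the bond through the proposed XDP rx handler. */
SEC("xdp_slave")
int xdp_slave_pass(struct xdp_md *ctx)
{
	return XDP_PASS;
}

/* Attached once to the bond: the single copy of the main logic,
 * instead of one copy per slave. */
SEC("xdp_bond")
int xdp_bond_main(struct xdp_md *ctx)
{
	/* parse headers, filter, rewrite, redirect ... */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

Thanks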