Received: by 2002:a4a:311b:0:0:0:0:0 with SMTP id k27-v6csp4250667ooa; Tue, 14 Aug 2018 03:29:17 -0700 (PDT) X-Google-Smtp-Source: AA+uWPylsmdU9QXGTv5pr/3NkHOsWJifANy8TszcdUCl+4+F/Q5oDCn+a81vSJtOhr7rwLtvN/DX X-Received: by 2002:a17:902:b709:: with SMTP id d9-v6mr4316005pls.138.1534242557182; Tue, 14 Aug 2018 03:29:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1534242557; cv=none; d=google.com; s=arc-20160816; b=Wqw4gDHPIbs8bykbHBntuUDeYIiKZZxTd76c2xWI4cbbF5wKqwgJxyI6wJx0j/qtUO unkj2xv0E8qt3nxr6fqqdnYIIhdsXBSiyF8s1hXmn12J0UnNdwemipC5lwQWoc3JGHGG d7QyPIDcUVWCLS28qn1lyzbYeAcJDbSU7OlVgQdlJSZFl49FSPu93o8HEOSifLNp77dK gEPTAjv1vfEgZgJkTCSDOgm3PIsBBa1n2rIoejmElreZVolbqB1XL5GYzJFmOjY9bQdC zXCZ6y84EL3Rou0PragWQ5teGMEIhAPuMXW2j5E/aLICSNttzCYp4o3Vlakx2s//OAro Lhqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :organization:references:in-reply-to:message-id:subject:cc:to:from :date:arc-authentication-results; bh=xy69SF+LUVNfJPn/bGShmN/SP+Yi34ZIiXkbDVGNG80=; b=FKy4YY7+vkIG/Dm3oh3zDkhu7XzwsbXMspE2OhcbdGv99M/lwFgwIglNtjeJZtjqCX yezW4tyAakQpbZLLn8nchJYHqI/aMRNRmRn9+1CGlRVRbweUXlL1rky6jrwekYcHPhEn cUCeYBldhSsTz5VoveTXgawOnLVhLLevfUUx8dys1d2EHyQX21/KgzPyog3Cf+a107UN ZVX1gGO+ibf9HpOwwt8+/4brNqPKCONXD2Qr5bKNAjCSc39q02QK1yqVZrPlIY9TIKqx WnBWTzimDYftuiZiBgXZxpJD1AfFrMyDh1J0ZMmN/f6tVyrc46f9jGvgS4buqhohxXvj Nr3g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r17-v6si17462641pgd.682.2018.08.14.03.29.02; Tue, 14 Aug 2018 03:29:17 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732174AbeHNNEO convert rfc822-to-8bit (ORCPT + 99 others); Tue, 14 Aug 2018 09:04:14 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:34084 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727858AbeHNNEO (ORCPT ); Tue, 14 Aug 2018 09:04:14 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 999A6B5C6; Tue, 14 Aug 2018 10:17:42 +0000 (UTC) Received: from localhost (ovpn-200-47.brq.redhat.com [10.40.200.47]) by smtp.corp.redhat.com (Postfix) with ESMTP id C8A3E2156712; Tue, 14 Aug 2018 10:17:39 +0000 (UTC) Date: Tue, 14 Aug 2018 12:17:34 +0200 From: Jesper Dangaard Brouer To: Jason Wang Cc: Alexei Starovoitov , netdev@vger.kernel.org, linux-kernel@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net, mst@redhat.com Subject: Re: [RFC PATCH net-next V2 0/6] XDP rx handler Message-ID: <20180814121734.105769fa@redhat.com> In-Reply-To: <5de3d14f-f21a-c806-51f4-b5efd7d809b7@redhat.com> References: <1534130250-5302-1-git-send-email-jasowang@redhat.com> <20180814003253.fkgl6lyklc7fclvq@ast-mbp> <5de3d14f-f21a-c806-51f4-b5efd7d809b7@redhat.com> Organization: Red Hat Inc. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8BIT X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Tue, 14 Aug 2018 10:17:42 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.1]); Tue, 14 Aug 2018 10:17:42 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jbrouer@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, 14 Aug 2018 15:59:01 +0800 Jason Wang wrote: > On 2018年08月14日 08:32, Alexei Starovoitov wrote: > > On Mon, Aug 13, 2018 at 11:17:24AM +0800, Jason Wang wrote: > >> Hi: > >> > >> This series tries to implement XDP support for rx hanlder. This would > >> be useful for doing native XDP on stacked device like macvlan, bridge > >> or even bond. > >> > >> The idea is simple, let stacked device register a XDP rx handler. And > >> when driver return XDP_PASS, it will call a new helper xdp_do_pass() > >> which will try to pass XDP buff to XDP rx handler directly. XDP rx > >> handler may then decide how to proceed, it could consume the buff, ask > >> driver to drop the packet or ask the driver to fallback to normal skb > >> path. > >> > >> A sample XDP rx handler was implemented for macvlan. And virtio-net > >> (mergeable buffer case) was converted to call xdp_do_pass() as an > >> example. For ease comparision, generic XDP support for rx handler was > >> also implemented. > >> > >> Compared to skb mode XDP on macvlan, native XDP on macvlan (XDP_DROP) > >> shows about 83% improvement. > > I'm missing the motiviation for this. > > It seems performance of such solution is ~1M packet per second. > > Notice it was measured by virtio-net which is kind of slow. > > > What would be a real life use case for such feature ? > > I had another run on top of 10G mlx4 and macvlan: > > XDP_DROP on mlx4: 14.0Mpps > XDP_DROP on macvlan: 10.05Mpps > > Perf shows macvlan_hash_lookup() and indirect call to > macvlan_handle_xdp() are the reasons for the number drop. I think the > numbers are acceptable. And we could try more optimizations on top. > > So here's real life use case is trying to have an fast XDP path for rx > handler based device: > > - For containers, we can run XDP for macvlan (~70% of wire speed). This > allows a container specific policy. > - For VM, we can implement macvtap XDP rx handler on top. This allow us > to forward packet to VM without building skb in the setup of macvtap. > - The idea could be used by other rx handler based device like bridge, > we may have a XDP fast forwarding path for bridge. > > > > > Another concern is that XDP users expect to get line rate performance > > and native XDP delivers it. 'generic XDP' is a fallback only > > mechanism to operate on NICs that don't have native XDP yet. > > So I can replace generic XDP TX routine with a native one for macvlan. If you simply implement ndo_xdp_xmit() for macvlan, and instead use XDP_REDIRECT, then we are basically done. > > Toshiaki's veth XDP work fits XDP philosophy and allows > > high speed networking to be done inside containers after veth. > > It's trying to get to line rate inside container. > > This is one of the goal of this series as well. I agree veth XDP work > looks pretty fine, but it only work for a specific setup I believe since > it depends on XDP_REDIRECT which is supported by few drivers (and > there's no VF driver support). The XDP_REDIRECT (RX-side) is trivial to add to drivers. It is a bad argument that only a few drivers implement this. Especially since all drivers also need to be extended with your proposed xdp_do_pass() call. (rant) The thing that is delaying XDP_REDIRECT adaption in drivers, is that it is harder to implement the TX-side, as the ndo_xdp_xmit() call have to allocate HW TX-queue resources. If we disconnect RX and TX side of redirect, then we can implement RX-side in an afternoon. > And in order to make it work for a end > user, the XDP program still need logic like hash(map) lookup to > determine the destination veth. That _is_ the general idea behind XDP and eBPF, that we need to add logic that determine the destination. The kernel provides the basic mechanisms for moving/redirecting packets fast, and someone else builds an orchestration tool like Cilium, that adds the needed logic. Did you notice that we (Ahern) added bpf_fib_lookup a FIB route lookup accessible from XDP. For macvlan, I imagine that we could add a BPF helper that allows you to lookup/call macvlan_hash_lookup(). > > This XDP rx handler stuff is destined to stay at 1Mpps speeds forever > > and the users will get confused with forever slow modes of XDP. > > > > Please explain the problem you're trying to solve. > > "look, here I can to XDP on top of macvlan" is not an explanation of the problem. > > -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer