Date: Tue, 4 Dec 2018 18:49:30 -0800
From: Alexei Starovoitov
To: Aaron Conole
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 netfilter-devel@vger.kernel.org, coreteam@netfilter.org,
 Alexei Starovoitov, Daniel Borkmann, Pablo Neira Ayuso,
 Jozsef Kadlecsik, Florian Westphal, John Fastabend, Jesper Brouer,
 "David S. Miller", Andy Gospodarek, Rony Efraim, Simon Horman,
 Marcelo Leitner
Subject: Re: [RFC -next v0 1/3] bpf: modular maps
Message-ID: <20181205024928.57xcrgspllcr7umo@ast-mbp.dhcp.thefacebook.com>
References: <20181125180919.13996-1-aconole@bytheb.org>
 <20181125180919.13996-2-aconole@bytheb.org>
 <20181127020608.4vucwmhrtu2cxrwu@ast-mbp.dhcp.thefacebook.com>
 <20181128051001.wcsgqx3d6c2aszp6@ast-mbp.dhcp.thefacebook.com>
 <20181129041948.pepdcksplt6xppk3@ast-mbp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 30, 2018 at 08:49:17AM -0500, Aaron Conole wrote:
>
> While this is one reason to use hash map, I don't think we should use
> this as a reason to exclude development of a data type that may work
> better. After all, if we can do better then we should.

I'm all for improving the existing hash map or implementing new data types.
Like a classifier map == same as wild-card match map == ACL map.
The one that OVS folks could use and other folks have wanted for a long time.

But I don't want bpf to become a collection of single-purpose solutions.
Like a mega-flow style OVS map.
That one does a linear number of lookups, applying one mask at a time.

It sounds to me that you're proposing a "NAT-as-bpf-helper"
or "NAT-as-bpf-map" type of solution.
That falls into the single-purpose solution category.
I'd rather see a generic connection tracking building block.
One that works both at the skb layer and at the XDP layer.
The existing stack-queue-map can already be used to allocate integers
out of a specified range. It can be used to implement port allocation for NAT.
If the generic stack-queue-map is not enough, let's improve it.

> >> forward direction addresses could be different from reverse direction so
> >> just swapping addresses / ports will not match).
> >
> > That makes no sense to me.
> > What would be an example of such flow?
> > Certainly not a tcp flow.
>
> Maybe it's poorly worded on my part. Think about this scenario (ipv4, tcp):
>
> Interfaces A(internet), B(lan)
>
> When XDP program receives a packet from B, it will have a tuple like:
>
> source=B-subnet:B-port dest=inet-addr:inet-port
>
> When XDP program receives a packet from A, it will have a tuple like:
>
> source=inet-addr:inet-port dest=gw-addr:gw-port

First of all, there are two netdevs.
One XDP program can attach to multiple netdevs, but in this
case we're dealing with two independent tcp flows.

> The only data in common there is inet-addr:inet-port, and that will
> likely be shared among too many connections to be a valid key.

Two independent tcp flows don't make a 'connection'.
That definition of connection is only meaningful in the context
of the particular problem you're trying to solve, and it
confuses me quite a bit.

> I don't know how to figure out from A the same connection that
> corresponds to B. A really simple static map works, *except*, when
> something causes either side of the connection to become invalid, I
> can't mark the other side. For instance, even if I have some static
> mapping, I might not be able to infer the correct B-side tuple from the
> A-side tuple to do the teardown.

I don't think I got enough information from the above description to
understand why two tcp flows (same as two tcp connections) will
form a single 'connection' in your definition of connection.

> 1. Port / address reservation. If I want to do NAT, I need to reserve
>    ports and addresses correctly. That requires knowing the interface
>    addresses, and which addresses are currently allocated. The stack
>    knows this already, let it do these allocations then. Then when
>    packets arrive for the connection that the stack set up, just forward
>    via XDP.

I beg to disagree. For the NAT use case the stack has nothing to do with
port allocation for NATing.
It's all within the NAT framework (whichever way it's implemented).
The stack cares about sockets and ports that are open on the host
to be consumed by the host.
The NAT function is independent of that.

> 2. Helpers. Parsing an in-flight stream is always going to be slow.
>    Let the stack do that. But when it sets up an expectation, then use
>    that information to forward that via XDP.

XDP parses packets way faster than the stack, since XDP deals with linear
buffers whereas the stack has to do pskb_may_pull at every step.
The stack can be optimized further, but assuming that packet parsing by the
stack is faster than XDP and making technical decisions based on that
just doesn't seem like the right approach to take.
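
To make the stack-queue-map remark above concrete, here is a minimal
userspace sketch of the idea: pre-fill a FIFO with the ports of a range,
pop one to allocate, push it back on teardown. It is only a model under
stated assumptions, not kernel or BPF code. The names port_pool,
port_pool_init, port_pool_alloc, port_pool_free and POOL_CAPACITY are
hypothetical. In a real program a queue map (BPF_MAP_TYPE_QUEUE) with the
bpf_map_pop_elem() and bpf_map_push_elem() helpers would presumably play
this role.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a FIFO of free NAT ports.
 * All names are illustrative; nothing here is a kernel API. */
#define POOL_CAPACITY 1024

struct port_pool {
	uint16_t slots[POOL_CAPACITY];
	size_t head;   /* index of the next port to hand out */
	size_t count;  /* number of free ports currently queued */
};

/* Fill the pool with every port in [first, last].
 * Returns 0 on success, -1 if the range is empty or does not fit. */
int port_pool_init(struct port_pool *p, uint16_t first, uint16_t last)
{
	if (first > last || (size_t)(last - first) + 1 > POOL_CAPACITY)
		return -1;
	p->head = 0;
	p->count = 0;
	/* 32-bit counter so the loop also terminates when last == 65535 */
	for (uint32_t port = first; port <= last; port++)
		p->slots[p->count++] = (uint16_t)port;
	return 0;
}

/* "Pop": take a free port for a new NAT binding.
 * Returns 0 and stores the port, or -1 when the pool is exhausted. */
int port_pool_alloc(struct port_pool *p, uint16_t *port)
{
	if (p->count == 0)
		return -1;
	*port = p->slots[p->head];
	p->head = (p->head + 1) % POOL_CAPACITY;
	p->count--;
	return 0;
}

/* "Push": hand a port back once its binding is torn down.
 * Returns 0 on success, -1 if the pool is already full. */
int port_pool_free(struct port_pool *p, uint16_t port)
{
	if (p->count == POOL_CAPACITY)
		return -1;
	p->slots[(p->head + p->count) % POOL_CAPACITY] = port;
	p->count++;
	return 0;
}
```

FIFO order means a released port goes to the back of the queue, so it is
reused as late as possible, which is usually what a NAT wants. Note that
this model does not check that a freed port was actually allocated; a real
implementation would need to guard against double frees.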