From: Chema Gonzalez
To: Daniel Borkmann
Cc: Alexei Starovoitov, Ingo Molnar, Steven Rostedt, Peter Zijlstra, Arnaldo Carvalho de Melo, Jiri Olsa, Thomas Gleixner, "H. Peter Anvin", Andrew Morton, Kees Cook, David Miller, Eric Dumazet, Network Development, LKML
Subject: Re: [PATCH v6 net-next 1/4] net: flow_dissector: avoid multiple calls in eBPF
Date: Wed, 25 Jun 2014 15:00:38 -0700
In-Reply-To: <53A9337F.50707@redhat.com>

On Tue, Jun 24, 2014 at 1:14 AM, Daniel Borkmann wrote:
>> This is a high-level decision, more than a technical one. Do we want
>> to freeze classic BPF development in linux, even before we have a
>> complete eBPF replacement, and zero eBPF tool (libpcap) support?
>
> In my opinion, I don't think we strictly have to hard-freeze it. The
> only concern I see is that conceptually hooking into the flow_dissector
> to read out all keys for further processing on top of them 1) sort
> of breaks/bypasses the concept of BPF (as it's actually the task of
> BPF itself for doing this),

I don't think we want to do flow dissection using BPF insns. It's not
easy to write BPF insns, and we already have kernel code that does that.
IMO that's what eBPF calls/BPF ancillary loads are for (e.g. vlan access).

> 2) effectively freezes any changes to the
> flow_dissector as BPF applications making use of it now depend on the
> provided offsets for doing further processing on top of them, 3) it

Remember that my approach does not have (user-visible) offsets. It
uses the eBPF stack to dump the output (struct flow_keys) of the flow
dissector (skb_flow_dissect()). The only dependency we're adding is
that, once we provide a BPF ancillary load to access e.g. thoff, we
have to keep providing it.

> can already be resolved by (re-)writing the kernel's flow dissector
> in C-like syntax in user space iff eBPF can be loaded from there with
> similar performance. So shouldn't we rather work towards that as a
> more generic approach/goal in the mid term and w/o having to maintain
> a very short term intermediate solution that we need to special case
> along the code and have to carry around forever ...

Once (if) we reach the point where we can do eBPF filters in "C-like
syntax," I'd agree with you that it would be nice to be able to reuse
the same function inside the kernel and as an eBPF library. The same
probably applies to other network functions.

Now, I'm not sure what the model for reuse is: Are we planning to
re-write (maybe "re-write" is too strong, as we will probably only
need some minor changes) some of the kernel functions into this "C--"
language so that eBPF can use them? Do other people agree with this
vision?

There's still the problem of whether we want to obsolete classic BPF
in the kernel before the tools (libpcap mainly) accept eBPF. This can
take a long time.

Finally, what user-facing CLI do you have in mind? Right now, tcpdump
expressions are very handy: I know I can pass "ip[2:2] == 1500" or
"(tcp[13] & 0x03)" to any libpcap-based application. This makes it easy
to log into a machine and quickly run tcpdump to get the packets I'm
interested in.
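To make the comparison concrete, here is a rough C sketch of what those two expressions test. It is an illustration only: the function names are made up, the buffer is assumed to start at the first byte of an IPv4 header, and the TCP check approximates libpcap's behaviour (TCP over IPv4, non-fragmented or first fragment) rather than reproducing the filter code libpcap actually generates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, roughly "ip[2:2] == 1500":
 * the 16-bit big-endian IPv4 total-length field at offset 2 equals 1500. */
static int ip_total_len_is_1500(const uint8_t *pkt, size_t len)
{
	if (len < 4)
		return 0;
	return ((pkt[2] << 8) | pkt[3]) == 1500;
}

/* Hypothetical helper, roughly "tcp[13] & 0x03":
 * returns the FIN (0x01) and SYN (0x02) bits of the TCP flags byte,
 * or 0 when the packet is not an unfragmented/first-fragment IPv4 TCP packet. */
static int tcp_syn_or_fin(const uint8_t *pkt, size_t len)
{
	size_t ihl;

	if (len < 20)
		return 0;
	if ((pkt[0] >> 4) != 4)		/* IP version must be 4 */
		return 0;
	if (pkt[9] != 6)		/* protocol must be TCP */
		return 0;
	/* Non-zero fragment offset: this fragment carries no TCP header. */
	if ((((pkt[6] & 0x1f) << 8) | pkt[7]) != 0)
		return 0;

	ihl = (size_t)(pkt[0] & 0x0f) * 4;	/* IP header length in bytes */
	if (ihl < 20 || len < ihl + 14)
		return 0;

	return pkt[ihl + 13] & 0x03;	/* byte 13 of the TCP header holds the flags */
}
```

Each one-line tcpdump expression stands in for a dozen lines of bounds checks and offset arithmetic, which is the convenience a C-like filter syntax would have to match.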
What would be the model for using C-- eBPF filters in the same manner?

Thanks again,
-Chema