Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965373AbaFCVko (ORCPT ); Tue, 3 Jun 2014 17:40:44 -0400 Received: from mail-ie0-f177.google.com ([209.85.223.177]:53302 "EHLO mail-ie0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S965335AbaFCVkk (ORCPT ); Tue, 3 Jun 2014 17:40:40 -0400 MIME-Version: 1.0 In-Reply-To: References: <1401692506-7796-1-git-send-email-ast@plumgrid.com> <538C3C94.3080206@redhat.com> <538CAEA6.4060307@redhat.com> <538D8DAA.7090105@redhat.com> <538E319B.3000606@redhat.com> Date: Tue, 3 Jun 2014 14:40:39 -0700 Message-ID: Subject: Re: [PATCH v2 net-next 0/2] split BPF out of core networking From: Chema Gonzalez To: Alexei Starovoitov Cc: Daniel Borkmann , "David S. Miller" , Ingo Molnar , Steven Rostedt , Eric Dumazet , Peter Zijlstra , Arnaldo Carvalho de Melo , Jiri Olsa , Thomas Gleixner , "H. Peter Anvin" , Andrew Morton , Kees Cook , Network Development , LKML Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org First of all, and just to join the crowd, kernel/bpf/ FTW. Now, I have some suggestions about eBPF. IMO classic BPF is an ISA oriented to filter (meaning returning a single integer that states how many bytes of the packet must be captured) packets (e.g. consider the 6 load modes, where 3 provide access the packet -- abs, ind, msh --, one to an skb field -- len--, the 5th one to the memory itself -- mem --, and the 6th is an immediate set mode --imm-- ) that has been used in other environments (seccomp, tracing, etc.) by (a) extending the idea of a "packet" into a "buffer", and (b) adding ancillary loads. eBPF should be a generic ISA that can be used by many environments, including those served today by classic BPF. IMO, we should get a nicely-defined ISA (MIPS anyone?) and check what should go into eBPF and what should not. - 1. we should considering separating the eBPF ISA farther from classic BPF - eBPF still uses a_reg and x_reg as the names of the 2 op registers. This is very confusing, especially when dealing with translated filters that do move data between A and X. I've had a_reg being X, and x_reg being A. We should rename them d_reg and s_reg. - BPF_LD vs. BPF_LDX: this made sense in classic BPF, as there was only one register, and d_reg was implicit in the name of the insn code. Now, why are we keeping both in eBPF, when the register we're writing to is made explicit in d_reg (I already forgot if d_reg was a_reg or x_reg ;) ? Removing one of them will save us 1/8th of the insns. - BPF_ST vs. BPF_STX: same here. Note that the current sk_convert_filter() just converts all stores to BPF_STX. - 2. there are other insn that we should consider adding: - lui: AFAICT, there is no clean way to build a 64-bit number (you can LD_IMM the upper part, lsh 32, and then add the lower part). - nop: I'd like to have a nop. Do I know why? Nope. On Tue, Jun 3, 2014 at 1:58 PM, Alexei Starovoitov wrote: > On Tue, Jun 3, 2014 at 1:35 PM, Daniel Borkmann wrote: >> On 06/03/2014 05:44 PM, Alexei Starovoitov wrote: >> ... >>> >>> All of your points are valid. They are right questions to ask. I just >>> >>> don't see why you're still arguing about first step of filter.c split, >>> whereas your concerns are about steps 2, 3, 4. >> >> >> Fair enough, lets keep them in mind though for future work. Btw, > > Ok :) > >> are other files planned for kernel/bpf/ or should it instead just >> simply be kernel/bpf.c? > > The most obvious one is eBPF verifier in separate file (kernel/bpf/verifier.c) > bpf maps is yet another thing, but that's different topic. > Probably a set of bpf-callable functions in another file. Like right now > for sockets these helpers are __skb_get_pay_offset(), __skb_get_nlattr() > For tracing there will be a different set of helper functions and eventually > some will be common. Like __get_raw_cpu_id() from filter.c could > eventually move to kernel/bpf/helpers.c LGTM. I like the idea of every user (packet filter, seccomp, etc.) providing a map of the bpf calls that are ok, as in the packet filter stating that {1->__skb_get_pay_offset(), 2->__skb_get_nlattr(), ...}, but seccomp providing a completely different (or even empty) map. -Chema > I'm not a fan of squeezing different logic into one file. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/