Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S966846AbaFTV4Z (ORCPT ); Fri, 20 Jun 2014 17:56:25 -0400 Received: from mail-ie0-f182.google.com ([209.85.223.182]:41655 "EHLO mail-ie0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754155AbaFTV4W (ORCPT ); Fri, 20 Jun 2014 17:56:22 -0400 MIME-Version: 1.0 In-Reply-To: <538EDE1A.8060305@redhat.com> References: <1398882591-30422-1-git-send-email-chema@google.com> <1401389758-13252-1-git-send-email-chema@google.com> <5387C8AD.6000909@redhat.com> <538C6FD8.9040305@redhat.com> <538D884E.5030007@redhat.com> <538EDE1A.8060305@redhat.com> Date: Fri, 20 Jun 2014 14:56:21 -0700 Message-ID: Subject: Re: [PATCH v6 net-next 1/4] net: flow_dissector: avoid multiple calls in eBPF From: Chema Gonzalez To: Daniel Borkmann Cc: Alexei Starovoitov , Ingo Molnar , Steven Rostedt , Peter Zijlstra , Arnaldo Carvalho de Melo , Jiri Olsa , Thomas Gleixner , "H. Peter Anvin" , Andrew Morton , Kees Cook , David Miller , Eric Dumazet , Network Development , LKML Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I'll try to revive the discussion for this patch, in case I can convince you about its implementation. I rebased it to the latest HEAD, and I'm ready to re-submit. On Wed, Jun 4, 2014 at 1:51 AM, Daniel Borkmann wrote: > On 06/03/2014 11:12 PM, Chema Gonzalez wrote: > ... > >> Your approach needs it too. Citing from your pseudo-code: >> >>> ld #5 <-- indicates to fill the first 5 slots of M[], so M[0] to M[4] >>> ld #keys <-- triggers the extension to fill the M[] slots >>> ld M[0] <-- loads nhoff from M[0] into accu >> >> >> How does the "ld M[0]" know that the actual flow dissector has already >> been called? What if the insn just before the "ld #5" was "jmp +2" ? >> In that case, the "ld #keys" would have never been called. > > > But that case would be no different from doing something like ... > > [...] > jmp foo > ldi #42 > st M[0] > foo: > ld M[0] > [...] > > ... and would then not pass the checker in check_load_and_stores(), > which, as others have already stated, would need to be extended, > of course. It's one possible approach. As Alexei/you have noted, this is moving the complexity to check_load_and_stores(). It's more complicated than the current ensure-there's-a-store-preceding-each-load-for-each-M[]-position check. In particular, your approach requires 2 insns before a memory position can be considered filled, namely the "ld #5" and the "ld #keys". That means adding intra-instruction state to the FSM in check_load_and_stores(). A second issue is that you need to ensure that M[0:5] does not get polluted between the moment you call "ld #5; ld #keys" and the moment the code uses the flow-dissector values. This is more complex than just ensuring the code doesn't access uninit'ed M[]. There's also some concerns about effectively adding a prologue. We already have that: $ cat net/core/filter.c ... int sk_convert_filter(struct sock_filter *prog, int len, struct sock_filter_int *new_prog, int *new_len) { ... if (new_insn) *new_insn = BPF_MOV64_REG(BPF_REG_CTX, BPF_REG_ARG1); new_insn++; for (i = 0; i < len; fp++, i++) { ... The current prologue is embedded in sk_convert_filter(). The patch just moves it to its own function, and (potentially) adds another instruction (which zeroes flow_inited iff there's any call to the flow dissector). >>> Anyway as I said before I'm not excited about either. >>> I don't think we should be adding classic BPF extensions any more. >>> The long term headache of supporting classic BPF extensions >>> outweighs the short term benefits. >> I see a couple of issues with (effectively) freezing classic BPF >> development while waiting for direct eBPF access to happen. The first >> one is that the kernel has to accept it. I can see many questions >> about this, especially security and usability (I'll send an email >> about the "split BPF out of core later"). Now, the main issue is >> whether/when the tools will support it. IMO, this is useful iff I can >> quickly write/reuse filters and run tcpdump filters based on them. I'm >> trying to get upstream libpcap to accept support for raw (classic) BPF >> filters, and it's taking a long time. I can imagine how they may be >> less receptive about supporting a Linux-only eBPF mechanism. Tools do >> matter. This is a high-level decision, more than a technical one. Do we want to freeze classic BPF development in linux, even before we have a complete eBPF replacement, and zero eBPF tool (libpcap) support? > Grepping through libpcap code, which tries to be platform independent, > it seems after all the years, the only thing where you can see support > for in their code is SKF_AD_PKTTYPE and SKF_AD_PROTOCOL. Perhaps they Actually they recently added MOD/XOR support. Woo-hoo! > just don't care, perhaps they do, who knows, but it looks to me a bit > that they are reluctant to these improvements, maybe for one reason > that other OSes don't support it. >From the comments in the MOD/XOR patch, the latter seem to be the issue. > That was also one of the reasons that > led me to start writing bpf_asm (net/tools/) for having a small DSL > for more easily trying out BPF code while having _full_ control over it. > > Maybe someone should start a binary-compatible Linux-only version of > libpcap, where tcpdump will transparently make use of these low level > improvements eventually. ;) There's too much code dependent on libpcap to make a replacement possible. Thanks, -Chema -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/