MIME-Version: 1.0
In-Reply-To: <538CAEA6.4060307@redhat.com>
References: <1401692506-7796-1-git-send-email-ast@plumgrid.com>
	<538C3C94.3080206@redhat.com>
	<CAMEtUuzkjZCsReWH9cZs8AU0mJjZH9YOdCBTWusxe6-NZ9mQ=g@mail.gmail.com>
	<538CAEA6.4060307@redhat.com>
Date: Mon, 2 Jun 2014 12:02:03 -0700
Message-ID: <CAMEtUuwxJOgYgFjg9deQZOffeg6-JUGZXZ1Rj344eQmriQ0F0Q@mail.gmail.com>
Subject: Re: [PATCH v2 net-next 0/2] split BPF out of core networking
From: Alexei Starovoitov <ast@plumgrid.com>
To: Daniel Borkmann <dborkman@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>, Ingo Molnar <mingo@kernel.org>,
        Steven Rostedt <rostedt@goodmis.org>,
        Chema Gonzalez <chema@google.com>, Eric Dumazet <edumazet@google.com>,
        Peter Zijlstra <a.p.zijlstra@chello.nl>,
        Arnaldo Carvalho de Melo <acme@infradead.org>,
        Jiri Olsa <jolsa@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Andrew Morton <akpm@linux-foundation.org>,
        Kees Cook <keescook@chromium.org>,
        Network Development <netdev@vger.kernel.org>,
        LKML <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org

On Mon, Jun 2, 2014 at 10:04 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 06/02/2014 05:41 PM, Alexei Starovoitov wrote:
> ...
>
>> Glad you brought up this point :)
>> 100% agree that current double verification done by seccomp is far from
>> being generic and quite hard to maintain, since any change done to
>> classic BPF verifier needs to be thought through from
>> seccomp_check_filter()
>> perspective as well.
>
>
> Glad we're on the same page.
>
>
>> BPF's input context, set of allowed calls need to be expressed in a
>> generic way.
>> Obviously this split by itself won't make classic BPF all of a sudden
>> generic.
>> It rather defines a boundary of eBPF core.
>
>
> Note, I'm not at all against using it in tracing, I think it's probably
> a good idea, but shouldn't we _first_ think about how to overcome such
> deficits as above by improving upon its in-kernel API design, thus to
> better prepare it to be generic? I feel this step is otherwise just
> skipped and quickly 'hacked' around ... ;)

Are you talking about classic 'deficit' or eBPF 'deficit' ?
Classic has all sorts of hard coded assumptions. The whole
concept of 'load from magic constant' to mean different things
is flawed. We all got used to it and now think that it's normal
for "ld_abs -4056" to mean "a ^= x"
This split is not trying to make classic easier to hack.
With eBPF underneath classic, it got a lot easier to add extensions
to classic, but we shouldn't be doing it.
Classic BPF is not generic and cannot become one. It's eBPF's job.

The split is mainly helping to clearly see the boundary of eBPF core
vs its socket use case. It doesn't change or add any API.
We need to carefully design eBPF APIs when we expose it
to user space. I have a proposal for that too, but that's separate
discussion.
In terms of in-kernel eBPF API there is nothing to be done.
eBPF program 'prog' is generated by whatever means and then:
      struct sk_filter *fp;

      fp = kzalloc(sk_filter_size(prog_len), GFP_KERNEL);
      memcpy(fp->insni, prog, prog_len * sizeof(fp->insni[0]));
      fp->len = prog_len;

      sk_filter_select_runtime(fp); // select interpreter or JIT
      SK_RUN_FILTER(fp, ctx);  // run the program
      sk_filter_free(fp); // free program

that's how sockets, testsuite, seccomp, tracing are doing it.
All have different ways of producing 'prog' and 'prog_len'.
This in-kernel API cleanup was done in commit 5fe821a9dee2
You even acked it back then :)

If you're referring to eBPF verifier in-kernel API then yeah, it's
missing, just like the whole eBPF verifier :)
Ideally any kernel component that generates eBPF on the fly
sends eBPF program to verifier first just to double check
that generated program is valid.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/