In-Reply-To: <1329422549-16407-1-git-send-email-wad@chromium.org>
References: <1329422549-16407-1-git-send-email-wad@chromium.org>
Date: Fri, 17 Feb 2012 04:04:35 +0100
Subject: Re: [PATCH v8 1/8] sk_run_filter: add support for custom load_pointer
From: "Indan Zupancic"
To: "Will Drewry"
Cc: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
    linux-doc@vger.kernel.org, kernel-hardening@lists.openwall.org,
    netdev@vger.kernel.org, x86@kernel.org, arnd@arndb.de,
    davem@davemloft.net, hpa@zytor.com, mingo@redhat.com, oleg@redhat.com,
    peterz@infradead.org, rdunlap@xenotime.net, mcgrathr@chromium.org,
    tglx@linutronix.de, luto@mit.edu, eparis@redhat.com,
    serge.hallyn@canonical.com, djm@mindrot.org, scarybeasts@gmail.com,
    pmoore@redhat.com, akpm@linux-foundation.org, corbet@lwn.net,
    eric.dumazet@gmail.com, markus@chromium.org, keescook@chromium.org,
    "Will Drewry"

Hello,

On Thu, February 16, 2012 21:02, Will Drewry wrote:
> This change allows CONFIG_SECCOMP to make use of BPF programs for
> user-controlled system call filtering (as shown in this patch series).
>
> To minimize the impact on existing BPF evaluation, function pointer
> use must be declared at sk_chk_filter-time.  This allows ancillary
> load instructions to be generated that use the function pointer rather
> than adding _any_ code to the existing LD_* instruction paths.
>
> Crude performance numbers using udpflood -l 10000000 against dummy0.
> 3 trials for baseline, 3 for with tcpdump. Averaged then differenced.
> Hard to believe trials were repeated at least a couple more times.
>
> * x86 32-bit (Atom N570 @ 1.66 GHz 2 core HT) [stackprot]:
> - Without:  94.05s - 76.36s = 17.68s
> - With:     86.22s - 73.30s = 12.92s
> - Slowdown per call: -476 nanoseconds
>
> * x86 32-bit (Atom N570 @ 1.66 GHz 2 core HT) [no stackprot]:
> - Without:  92.06s - 77.81s = 14.25s
> - With:     91.77s - 76.91s = 14.86s
> - Slowdown per call: +61 nanoseconds
>
> * x86 64-bit (Atom N570 @ 1.66 GHz 2 core HT) [stackprot]:
> - Without:  122.58s - 99.54s = 23.04s
> - With:     115.52s - 98.99s = 16.53s
> - Slowdown per call: -651 nanoseconds
>
> * x86 64-bit (Atom N570 @ 1.66 GHz 2 core HT) [no stackprot]:
> - Without:  114.95s - 91.92s = 23.03s
> - With:     110.47s - 90.79s = 19.68s
> - Slowdown per call: -335 nanoseconds
>
> This makes the x86-32-nossp make sense.  Added register pressure always
> makes x86-32 sad.

Your 32-bit numbers are better than your 64-bit numbers, so I don't get
this comment.

> If this is a concern, I could change the call
> approach to bpf_run_filter to see if I can alleviate it a bit.
>
> That said, the x86-*-ssp numbers show a marked increase in performance.
> I've tested and retested and I keep getting these results. I'm also
> suprised by the nossp speed up on 64-bit, but I dunno. I haven't looked
> at the full disassembly of the call path. If that is required for the
> performance differences I'm seeing, please let me know. Or if I there is
> a preferred cpu to run this against - atoms can be a little weird.

Yeah, testing on Atom is a bit silly.
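
For reference, the "Slowdown per call" figures quoted above appear to be
the difference between the two tcpdump-minus-baseline deltas divided by
the iteration count. A minimal sketch of that arithmetic in C, assuming
one filter call per udpflood iteration (10000000, from the -l argument);
the function name is made up for illustration and is not part of the
patch:

```c
#include <assert.h>

/*
 * Hypothetical helper, not from the patch: per-call slowdown in
 * nanoseconds, given the "without" and "with" deltas in seconds
 * (tcpdump run minus baseline run) and the number of calls.
 * A negative result means the patched kernel was faster.
 */
static double slowdown_ns_per_call(double without_delta_s,
				   double with_delta_s, long calls)
{
	assert(calls > 0);	/* assumption: at least one iteration */
	return (with_delta_s - without_delta_s) * 1e9 / (double)calls;
}
```

With the x86 32-bit stackprot deltas, slowdown_ns_per_call(17.68, 12.92,
10000000L) comes out at roughly -476, which matches the quoted number.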
> v8: - fixed variable positioning and bad cast (eric.dumazet@gmail.com)
>     - no longer passes A as a pointer (inspection of x86 asm shows A is
>       %ebx again; thanks eric.dumazet@gmail.com)
>     - cleaned up switch macros and expanded use
>       (joe@perches.com, indan@nul.nu)
>     - added length fn pointer and handled LD_W_LEN/LDX_W_LEN
>     - moved from a wrapping struct to a typedef for the function
>       pointer. (matches existing function pointer style)
>     - added comprehensive comment above the typedef.
>     - benchmarks
> v7: - first cut
>
> Signed-off-by: Will Drewry
> ---
>  include/linux/filter.h |   69 +++++++++++++++++++++-
>  net/core/filter.c      |  152 +++++++++++++++++++++++++++++++++++++----------
>  2 files changed, 185 insertions(+), 36 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 8eeb205..d22ad46 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -110,6 +110,9 @@ struct sock_fprog {	/* Required for SO_ATTACH_FILTER. */
>   */
>  #define BPF_MEMWORDS 16
>
> +/* BPF program (checking) flags */
> +#define BPF_CHK_FLAGS_NO_SKB	1
> +
>  /* RATIONALE. Negative offsets are invalid in BPF.
>     We use them to reference ancillary data.
>     Unlike introduction new instructions, it does not break
> @@ -145,17 +148,67 @@ struct sk_filter
>  	struct sock_filter	insns[0];
>  };
>
> +/**
> + * struct bpf_load_fns - callbacks for bpf_run_filter
> + * These functions are called by bpf_run_filter if bpf_chk_filter
> + * was invoked with BPF_CHK_FLAGS_NO_SKB.
> + *
> + * pointer:
> + * @data: const pointer to the data passed into bpf_run_filter
> + * @k: offset into @skb's data
> + * @size: the size of the requested data in bytes: 1, 2, or 4.
> + * @buffer: If non-NULL, a 32-bit buffer for staging data.
> + *
> + * Returns a pointer to the requested data.
> + *
> + * This function operates similarly to load_pointer in net/core/filter.c
> + * except that the pointer to the returned data must already be
> + * byteswapped as appropriate to the source data and endianness.
> + * @buffer may be used if the data needs to be staged.
> + *
> + * length:
> + * @data: const pointer to the data passed into bpf_fun_filter
> + *
> + * Returns the length of the data.
> + */
> +struct bpf_load_fns {
> +	void *(*pointer)(const void *data, int k, unsigned int size,
> +			 void *buffer);
> +	u32 (*length)(const void *data);
> +};

Like I said in the other email, length is useless for the non-skb case.
If you really want to add it, just make it a constant. And 'pointer' isn't
the best name.

> +
>  static inline unsigned int sk_filter_len(const struct sk_filter *fp)
>  {
>  	return fp->len * sizeof(struct sock_filter) + sizeof(*fp);
>  }
>
> +extern unsigned int bpf_run_filter(const void *data,
> +				   const struct sock_filter *filter,
> +				   const struct bpf_load_fns *load_fn);
> +
> +/**
> + * sk_run_filter - run a filter on a socket
> + * @skb: buffer to run the filter on
> + * @fentry: filter to apply
> + *
> + * Runs bpf_run_filter with the struct sk_buff-specific data
> + * accessor behavior.
> + */
> +static inline unsigned int sk_run_filter(const struct sk_buff *skb,
> +					 const struct sock_filter *filter)
> +{
> +	return bpf_run_filter(skb, filter, NULL);
> +}
> +
>  extern int sk_filter(struct sock *sk, struct sk_buff *skb);
> -extern unsigned int sk_run_filter(const struct sk_buff *skb,
> -				  const struct sock_filter *filter);
>  extern int sk_attach_filter(struct sock_fprog *fprog, struct sock *sk);
>  extern int sk_detach_filter(struct sock *sk);
> -extern int sk_chk_filter(struct sock_filter *filter, unsigned int flen);
> +extern int bpf_chk_filter(struct sock_filter *filter, unsigned int flen, u32 flags);
> +
> +static inline int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
> +{
> +	return bpf_chk_filter(filter, flen, 0);
> +}
>
>  #ifdef CONFIG_BPF_JIT
>  extern void bpf_jit_compile(struct sk_filter *fp);
> @@ -228,6 +281,16 @@ enum {
>  	BPF_S_ANC_HATYPE,
>  	BPF_S_ANC_RXHASH,
>  	BPF_S_ANC_CPU,
> +	/* Used to differentiate SKB data and generic data */
> +	BPF_S_ANC_LD_W_ABS,
> +	BPF_S_ANC_LD_H_ABS,
> +	BPF_S_ANC_LD_B_ABS,
> +	BPF_S_ANC_LD_W_LEN,
> +	BPF_S_ANC_LD_W_IND,
> +	BPF_S_ANC_LD_H_IND,
> +	BPF_S_ANC_LD_B_IND,
> +	BPF_S_ANC_LDX_W_LEN,
> +	BPF_S_ANC_LDX_B_MSH,
>  };
>
>  #endif /* __KERNEL__ */
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5dea452..a5c98a9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -98,9 +98,10 @@ int sk_filter(struct sock *sk, struct sk_buff *skb)
>  EXPORT_SYMBOL(sk_filter);
>
>  /**
> - * sk_run_filter - run a filter on a socket
> - * @skb: buffer to run the filter on
> + * bpf_run_filter - run a filter on a BPF program

The filter is the BPF program, so this comment is weird.

> + * @data: buffer to run the filter on
>   * @fentry: filter to apply
> + * @load_fns: custom data accessor functions
>   *
>   * Decode and apply filter instructions to the skb->data.
>   * Return length to keep, 0 for none. @skb is the data we are
> @@ -108,9 +109,13 @@ EXPORT_SYMBOL(sk_filter);
>   * Because all jumps are guaranteed to be before last instruction,
>   * and last instruction guaranteed to be a RET, we dont need to check
>   * flen. (We used to pass to this function the length of filter)
> + *
> + * load_fn is only used if SKF_FLAGS_USE_LOAD_FNS was specified
> + * to sk_chk_generic_filter.

Stale comment: it still names SKF_FLAGS_USE_LOAD_FNS and
sk_chk_generic_filter, which this version calls BPF_CHK_FLAGS_NO_SKB and
bpf_chk_filter.

>   */
> -unsigned int sk_run_filter(const struct sk_buff *skb,
> -			   const struct sock_filter *fentry)
> +unsigned int bpf_run_filter(const void *data,
> +			    const struct sock_filter *fentry,
> +			    const struct bpf_load_fns *load_fns)
>  {
>  	void *ptr;
>  	u32 A = 0;			/* Accumulator */
> @@ -128,6 +133,7 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
>  #else
>  		const u32 K = fentry->k;
>  #endif
> +#define SKB(_data) ((const struct sk_buff *)(_data))

Urgh!

If you had done:

	const struct sk_buff *skb = data;

at the top, all those changes wouldn't be needed and it would look better
too.

>
>  		switch (fentry->code) {
>  		case BPF_S_ALU_ADD_X:
> @@ -213,7 +219,7 @@ unsigned int sk_run_filter(const struct sk_buff *skb,
>  		case BPF_S_LD_W_ABS:
>  			k = K;
>  load_w:
> -			ptr = load_pointer(skb, k, 4, &tmp);
> +			ptr = load_pointer(data, k, 4, &tmp);
>  			if (ptr != NULL) {
>  				A = get_unaligned_be32(ptr);
>  				continue;
> @@ -222,7 +228,7 @@ load_w:
>  		case BPF_S_LD_H_ABS:
>  			k = K;
>  load_h:
> -			ptr = load_pointer(skb, k, 2, &tmp);
> +			ptr = load_pointer(data, k, 2, &tmp);
>  			if (ptr != NULL) {
>  				A = get_unaligned_be16(ptr);
>  				continue;
> @@ -231,17 +237,17 @@ load_h:
>  		case BPF_S_LD_B_ABS:
>  			k = K;
>  load_b:
> -			ptr = load_pointer(skb, k, 1, &tmp);
> +			ptr = load_pointer(data, k, 1, &tmp);
>  			if (ptr != NULL) {
>  				A = *(u8 *)ptr;
>  				continue;
>  			}
>  			return 0;
>  		case BPF_S_LD_W_LEN:
> -			A = skb->len;
> +			A = SKB(data)->len;
>  			continue;
>  		case BPF_S_LDX_W_LEN:
> -			X = skb->len;
> +			X = SKB(data)->len;
>  			continue;
>  		case BPF_S_LD_W_IND:
>  			k = X + K;
> @@ -253,7 +259,7 @@ load_b:
>  			k = X + K;
>  			goto load_b;
>  		case BPF_S_LDX_B_MSH:
> -			ptr = load_pointer(skb, K, 1, &tmp);
> +			ptr = load_pointer(data, K, 1, &tmp);
>  			if (ptr != NULL) {
>  				X = (*(u8 *)ptr & 0xf) << 2;
>  				continue;
> @@ -288,29 +294,29 @@ load_b:
>  			mem[K] = X;
>  			continue;
>  		case BPF_S_ANC_PROTOCOL:
> -			A = ntohs(skb->protocol);
> +			A = ntohs(SKB(data)->protocol);
>  			continue;
>  		case BPF_S_ANC_PKTTYPE:
> -			A = skb->pkt_type;
> +			A = SKB(data)->pkt_type;
>  			continue;
>  		case BPF_S_ANC_IFINDEX:
> -			if (!skb->dev)
> +			if (!SKB(data)->dev)
>  				return 0;
> -			A = skb->dev->ifindex;
> +			A = SKB(data)->dev->ifindex;
>  			continue;
>  		case BPF_S_ANC_MARK:
> -			A = skb->mark;
> +			A = SKB(data)->mark;
>  			continue;
>  		case BPF_S_ANC_QUEUE:
> -			A = skb->queue_mapping;
> +			A = SKB(data)->queue_mapping;
>  			continue;
>  		case BPF_S_ANC_HATYPE:
> -			if (!skb->dev)
> +			if (!SKB(data)->dev)
>  				return 0;
> -			A = skb->dev->type;
> +			A = SKB(data)->dev->type;
>  			continue;
>  		case BPF_S_ANC_RXHASH:
> -			A = skb->rxhash;
> +			A = SKB(data)->rxhash;
>  			continue;
>  		case BPF_S_ANC_CPU:
>  			A = raw_smp_processor_id();
> @@ -318,15 +324,15 @@ load_b:
>  		case BPF_S_ANC_NLATTR: {
>  			struct nlattr *nla;
>
> -			if (skb_is_nonlinear(skb))
> +			if (skb_is_nonlinear(SKB(data)))
>  				return 0;
> -			if (A > skb->len - sizeof(struct nlattr))
> +			if (A > SKB(data)->len - sizeof(struct nlattr))
>  				return 0;
>
> -			nla = nla_find((struct nlattr *)&skb->data[A],
> -				       skb->len - A, X);
> +			nla = nla_find((struct nlattr *)&SKB(data)->data[A],
> +				       SKB(data)->len - A, X);
>  			if (nla)
> -				A = (void *)nla - (void *)skb->data;
> +				A = (void *)nla - (void *)SKB(data)->data;
>  			else
>  				A = 0;
>  			continue;
> @@ -334,22 +340,71 @@ load_b:
>  		case BPF_S_ANC_NLATTR_NEST: {
>  			struct nlattr *nla;
>
> -			if (skb_is_nonlinear(skb))
> +			if (skb_is_nonlinear(SKB(data)))
>  				return 0;
> -			if (A > skb->len - sizeof(struct nlattr))
> +			if (A > SKB(data)->len - sizeof(struct nlattr))
>  				return 0;
>
> -			nla = (struct nlattr *)&skb->data[A];
> -			if (nla->nla_len > A - skb->len)
> +			nla = (struct nlattr *)&SKB(data)->data[A];
> +			if (nla->nla_len > A - SKB(data)->len)
>  				return 0;
>
>  			nla = nla_find_nested(nla, X);
>  			if (nla)
> -				A = (void *)nla - (void *)skb->data;
> +				A = (void *)nla - (void *)SKB(data)->data;
>  			else
>  				A = 0;
>  			continue;
>  		}

All changes up to here are unnecessary.

> +		case BPF_S_ANC_LD_W_ABS:
> +			k = K;
> +load_fn_w:
> +			ptr = load_fns->pointer(data, k, 4, &tmp);
> +			if (ptr) {
> +				A = *(u32 *)ptr;
> +				continue;
> +			}
> +			return 0;
> +		case BPF_S_ANC_LD_H_ABS:
> +			k = K;
> +load_fn_h:
> +			ptr = load_fns->pointer(data, k, 2, &tmp);
> +			if (ptr) {
> +				A = *(u16 *)ptr;
> +				continue;
> +			}
> +			return 0;
> +		case BPF_S_ANC_LD_B_ABS:
> +			k = K;
> +load_fn_b:
> +			ptr = load_fns->pointer(data, k, 1, &tmp);
> +			if (ptr) {
> +				A = *(u8 *)ptr;
> +				continue;
> +			}
> +			return 0;
> +		case BPF_S_ANC_LDX_B_MSH:
> +			ptr = load_fns->pointer(data, K, 1, &tmp);
> +			if (ptr) {
> +				X = (*(u8 *)ptr & 0xf) << 2;
> +				continue;
> +			}
> +			return 0;
> +		case BPF_S_ANC_LD_W_IND:
> +			k = X + K;
> +			goto load_fn_w;
> +		case BPF_S_ANC_LD_H_IND:
> +			k = X + K;
> +			goto load_fn_h;
> +		case BPF_S_ANC_LD_B_IND:
> +			k = X + K;
> +			goto load_fn_b;
> +		case BPF_S_ANC_LD_W_LEN:
> +			A = load_fns->length(data);
> +			continue;
> +		case BPF_S_ANC_LDX_W_LEN:
> +			X = load_fns->length(data);

These two should either return 0, be networking-only, just return 0/-1 or
use a constant length.

> +			continue;
>  		default:
>  			WARN_RATELIMIT(1, "Unknown code:%u jt:%u tf:%u k:%u\n",
>  				       fentry->code, fentry->jt,
> @@ -360,7 +415,7 @@ load_b:
>
>  	return 0;
>  }
> -EXPORT_SYMBOL(sk_run_filter);
> +EXPORT_SYMBOL(bpf_run_filter);
>
>  /*
>   * Security :
> @@ -423,9 +478,10 @@ error:
>  }
>
>  /**
> - * sk_chk_filter - verify socket filter code
> + * bpf_chk_filter - verify socket filter BPF code
>   * @filter: filter to verify
>   * @flen: length of filter
> + * @flags: May be BPF_CHK_FLAGS_NO_SKB or 0
>   *
>   * Check the user's filter code. If we let some ugly
>   * filter code slip through kaboom! The filter must contain
> @@ -434,9 +490,13 @@ error:
>   *
>   * All jumps are forward as they are not signed.
>   *
> + * If BPF_CHK_FLAGS_NO_SKB is set in flags, any SKB-specific
> + * rules become illegal and a custom set of bpf_load_fns will
> + * be expected by bpf_run_filter.
> + *
>   * Returns 0 if the rule set is legal or -EINVAL if not.
>   */
> -int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
> +int bpf_chk_filter(struct sock_filter *filter, unsigned int flen, u32 flags)
>  {
>  	/*
>  	 * Valid instructions are initialized to non-0.
> @@ -542,9 +602,35 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>  			    pc + ftest->jf + 1 >= flen)
>  				return -EINVAL;
>  			break;
> +#define MAYBE_USE_LOAD_FN(CODE) \
> +			if (flags & BPF_CHK_FLAGS_NO_SKB) { \
> +				code = BPF_S_ANC_##CODE; \
> +				break; \
> +			}

You can as well hide everything in the macro then, including the case,
like the ANCILLARY() macro does.

> +		case BPF_S_LD_W_LEN:
> +			MAYBE_USE_LOAD_FN(LD_W_LEN);
> +			break;
> +		case BPF_S_LDX_W_LEN:
> +			MAYBE_USE_LOAD_FN(LDX_W_LEN);
> +			break;
> +		case BPF_S_LD_W_IND:
> +			MAYBE_USE_LOAD_FN(LD_W_IND);
> +			break;
> +		case BPF_S_LD_H_IND:
> +			MAYBE_USE_LOAD_FN(LD_H_IND);
> +			break;
> +		case BPF_S_LD_B_IND:
> +			MAYBE_USE_LOAD_FN(LD_B_IND);
> +			break;
> +		case BPF_S_LDX_B_MSH:
> +			MAYBE_USE_LOAD_FN(LDX_B_MSH);
> +			break;
>  		case BPF_S_LD_W_ABS:
> +			MAYBE_USE_LOAD_FN(LD_W_ABS);
>  		case BPF_S_LD_H_ABS:
> +			MAYBE_USE_LOAD_FN(LD_H_ABS);
>  		case BPF_S_LD_B_ABS:
> +			MAYBE_USE_LOAD_FN(LD_B_ABS);
>  #define ANCILLARY(CODE) case SKF_AD_OFF + SKF_AD_##CODE:	\
>  				code = BPF_S_ANC_##CODE;	\
>  				break
> @@ -572,7 +658,7 @@ int sk_chk_filter(struct sock_filter *filter, unsigned int flen)
>  	}
>  	return -EINVAL;
>  }
> -EXPORT_SYMBOL(sk_chk_filter);
> +EXPORT_SYMBOL(bpf_chk_filter);
>
>  /**
>   * sk_filter_release_rcu - Release a socket filter by rcu_head
> --
> 1.7.5.4
>

Greetings,

Indan
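
The macro remark above (put the case label inside MAYBE_USE_LOAD_FN, the
way ANCILLARY() does) could look roughly like the sketch below. It is a
standalone illustration, not kernel code: the DEMO_* constants and
demo_rewrite() are hypothetical stand-ins for the BPF_S_* opcodes,
BPF_CHK_FLAGS_NO_SKB and the switch in bpf_chk_filter. It only covers the
cases that end in a plain break; the LD_*_ABS cases in the patch fall
through to the ancillary handling and would need something different.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's internal opcode values. */
enum {
	DEMO_LD_W_LEN = 1,
	DEMO_LD_W_IND,
	DEMO_ANC_LD_W_LEN = 100,
	DEMO_ANC_LD_W_IND,
};

/* Hypothetical stand-in for BPF_CHK_FLAGS_NO_SKB. */
#define DEMO_FLAG_NO_SKB 1u

/* The case label lives inside the macro, so each use is one line. */
#define MAYBE_USE_LOAD_FN(CODE)				\
	case DEMO_##CODE:				\
		if (flags & DEMO_FLAG_NO_SKB)		\
			code = DEMO_ANC_##CODE;		\
		break

/* Rewrites a load opcode to its load-function variant when asked to. */
static int demo_rewrite(int code, uint32_t flags)
{
	switch (code) {
	MAYBE_USE_LOAD_FN(LD_W_LEN);
	MAYBE_USE_LOAD_FN(LD_W_IND);
	default:
		break;
	}
	return code;
}
```

With the flag set, demo_rewrite(DEMO_LD_W_LEN, DEMO_FLAG_NO_SKB) yields
DEMO_ANC_LD_W_LEN; without it, the opcode is returned unchanged.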