Date: Wed, 2 Jul 2014 14:28:44 -0700
Subject: Re: [PATCH RFC] arm64: eBPF JIT compiler
From: Alexei Starovoitov
To: Zi Shen Lim
Cc: Catalin Marinas, Will Deacon, "David S. Miller", Daniel Borkmann,
    Chema Gonzalez, LKML, linux-arm-kernel@lists.infradead.org,
    Network Development

On Tue, Jul 1, 2014 at 10:20 PM, Zi Shen Lim wrote:
> The JIT compiler emits A64 instructions. It supports eBPF only.
> Legacy BPF is supported thanks to conversion by BPF core.
>
> JIT is enabled in the same way as for other architectures:
>
>   echo 1 > /proc/sys/net/core/bpf_jit_enable
>
> Or for additional compiler output:
>
>   echo 2 > /proc/sys/net/core/bpf_jit_enable
>
> See Documentation/networking/filter.txt for more information.
>
> The implementation passes all 57 tests in lib/test_bpf.c
> on ARMv8 Foundation Model :)

Looks great. Comments below:

> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -43,6 +43,7 @@ TEXT_OFFSET := 0x00080000
>  export TEXT_OFFSET GZFLAGS
>
>  core-y += arch/arm64/kernel/ arch/arm64/mm/
> +core-y += arch/arm64/net/

please use instead:
core-$(CONFIG_NET) += arch/arm64/net/

> +
> +#define BITSMASK(bits) ((1 << (bits)) - 1)

there is a GENMASK macro already.

> +/* Compare & branch (immediate) */
> +static inline u32 A64_COMP_BRANCH_IMM(int sf, int op, int imm19, int Rt)

odd function name. lower-case it?

> +/* Conditional branch (immediate) */
> +static inline u32 A64_COND_BRANCH_IMM(int o1, int imm19, int o0, int cond)

same here and in several other places. I guess you're trying to make the
usage look similar for macros and function calls. I don't think the
look-alike is worth it. Functions should be lower case.

> +#define EMIT(insn) emit(insn, ctx)

extra macro just to save one argument? I would add ..., ctx) explicitly.

> +#define EMIT_A64_MOV_I64(reg, val) emit_A64_MOV_I64(reg, val, ctx)

same here and in other cases.

> +static void build_prologue(struct jit_ctx *ctx)
> +{
> +	const u8 r6 = bpf2a64[BPF_REG_6];
> +	const u8 r7 = bpf2a64[BPF_REG_7];
> +	const u8 r8 = bpf2a64[BPF_REG_8];
> +	const u8 r9 = bpf2a64[BPF_REG_9];
> +	const u8 fp = bpf2a64[BPF_REG_FP];
> +	const u8 ra = bpf2a64[BPF_REG_A];
> +	const u8 rx = bpf2a64[BPF_REG_X];
> +	const u8 tmp1 = bpf2a64[TMP_REG_1];
> +	const u8 tmp2 = bpf2a64[TMP_REG_2];
> +	int stack_size = MAX_BPF_STACK;
> +
> +	stack_size += 16; /* extra for skb_copy_bits buffer */

why the extra 16? skb_copy_bits() is called with max len 4.

> +	/* Save callee-saved registers */
> +	EMIT(A64_PUSH(r6, r7, A64_SP));

does the simd-style double push require consecutive registers or not?
Just curious.

> +/* From load_pointer in net/core/filter.c.
> + * XXX: should we just export it?
> + */
> +extern void *bpf_internal_load_pointer_neg_helper(const struct sk_buff *skb,
> +						  int k, unsigned int size);
> +static void *load_pointer_helper(const struct sk_buff *skb, int k,
> +				 unsigned int size, void *buffer)

That's an interesting way of supporting negative offsets!
It probably makes sense to move load_pointer() from net/core/filter.c
into filter.h and export bpf_internal_load_pointer_neg_helper() in
filter.h as well, but I'm not sure which tree you want this stuff to go
through. If it's the arm tree, then it's better to keep things as you did
and do a cleanup patch later. If net-next, then it's better to do it
cleanly right away.

> +	case BPF_ALU | BPF_MOD | BPF_X:
> +	case BPF_ALU64 | BPF_MOD | BPF_X:
> +		ctx->tmp_used = 1;
> +		EMIT(A64_UDIV(is64, tmp, dst, src));
> +		EMIT(A64_MUL(is64, tmp, tmp, src));
> +		EMIT(A64_SUB(is64, dst, dst, tmp));
> +		break;

there needs to be a run-time check for src == 0, so the JITed code keeps
the interpreter's divide-by-zero behavior. See the sketch below.
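Something along these lines would do it -- a minimal, untested sketch in
your EMIT() style. A64_CBNZ, A64_MOVZ and A64_B are assumed single-insn
encoder helpers (add them if the patch doesn't have them yet), and
epilogue_offset() is an assumed helper returning the instruction-count
distance to the epilogue:

	case BPF_ALU | BPF_MOD | BPF_X:
	case BPF_ALU64 | BPF_MOD | BPF_X:
	{
		const u8 r0 = bpf2a64[BPF_REG_0];

		/* if (src == 0) return 0, like the interpreter does */
		EMIT(A64_CBNZ(is64, src, 3));		/* skip next 2 insns */
		EMIT(A64_MOVZ(1, r0, 0, 0));		/* r0 = 0 */
		EMIT(A64_B(epilogue_offset(ctx)));	/* goto epilogue */

		/* else: dst = dst - (dst / src) * src */
		ctx->tmp_used = 1;
		EMIT(A64_UDIV(is64, tmp, dst, src));
		EMIT(A64_MUL(is64, tmp, tmp, src));
		EMIT(A64_SUB(is64, dst, dst, tmp));
		break;
	}

Note the branch offsets are in units of instructions, so both would need
an imm19 range check before emitting.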
> +	/* dst = dst OP imm */
> +	case BPF_ALU | BPF_ADD | BPF_K:
> +	case BPF_ALU64 | BPF_ADD | BPF_K:
> +		ctx->tmp_used = 1;
> +		EMIT_A64_MOV_I(is64, tmp, imm);
> +		EMIT(A64_ADD(is64, dst, dst, tmp));
> +		break;

potential for a future optimization on small immediates?

> +	/* function call */
> +	case BPF_JMP | BPF_CALL:
> +	{
> +		const u8 r0 = bpf2a64[BPF_REG_0];
> +		const u64 func = (u64)__bpf_call_base + imm;
> +
> +		ctx->tmp_used = 1;
> +		EMIT_A64_MOV_I64(tmp, func);
> +		EMIT(A64_PUSH(A64_FP, A64_LR, A64_SP));
> +		EMIT(A64_MOV(1, A64_FP, A64_SP));
> +		EMIT(A64_BLR(tmp));

Aren't kernel and module_alloc() addresses on arm64 within the same
32-bit range? Do you really need 'jump by register' then? A regular 'bl'
would be much faster. Roughly what I have in mind:
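This is only a hedged sketch: A64_BL is a hypothetical encoder for the
26-bit-immediate bl, SZ_128M comes from linux/sizes.h, and I'm assuming
the jit_ctx exposes the current emit position as ctx->image + ctx->idx
(adjust to your struct). The offset is only known once the image address
is fixed, so this belongs in the second pass:

		long offset;

		EMIT(A64_PUSH(A64_FP, A64_LR, A64_SP));
		EMIT(A64_MOV(1, A64_FP, A64_SP));

		/* bl is PC-relative to the bl insn itself */
		offset = (long)func - (long)&ctx->image[ctx->idx];
		if (offset >= -(long)SZ_128M && offset < (long)SZ_128M) {
			/* bl: 26-bit signed immediate in units of
			 * instructions, i.e. +/-128MB of reach
			 */
			EMIT(A64_BL(offset >> 2));
		} else {
			/* out of range: fall back to load address + blr */
			ctx->tmp_used = 1;
			EMIT_A64_MOV_I64(tmp, func);
			EMIT(A64_BLR(tmp));
		}

Keeping the blr fallback means it stays correct even if module_alloc()
ever hands back something out of bl range.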
> +	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
> +	case BPF_LD | BPF_ABS | BPF_W:
> +	case BPF_LD | BPF_ABS | BPF_H:
> +	case BPF_LD | BPF_ABS | BPF_B:
> +	case BPF_LD | BPF_ABS | BPF_DW:

there is no such LD_ABS + DW instruction yet. It would be trivial to add,
but let's not rush it in just because it's so easy.

> +	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + src + imm)) */
> +	case BPF_LD | BPF_IND | BPF_W:
> +	case BPF_LD | BPF_IND | BPF_H:
> +	case BPF_LD | BPF_IND | BPF_B:
> +	case BPF_LD | BPF_IND | BPF_DW:
> +	{
> +		const u8 r0 = bpf2a64[BPF_REG_0]; /* r0 = return value */
> +		const u8 r6 = bpf2a64[BPF_REG_6]; /* r6 = pointer to sk_buff */
> +		const u8 fp = bpf2a64[BPF_REG_FP];
> +		const u8 r1 = bpf2a64[BPF_REG_1]; /* r1: struct sk_buff *skb */
> +		const u8 r2 = bpf2a64[BPF_REG_2]; /* r2: int k */
> +		const u8 r3 = bpf2a64[BPF_REG_3]; /* r3: unsigned int size */
> +		const u8 r4 = bpf2a64[BPF_REG_4]; /* r4: void *buffer */
> +		const u8 r5 = bpf2a64[BPF_REG_5]; /* r5: void *(*func)(...) */
> +		int size;
> +
> +		EMIT(A64_MOV(1, r1, r6));
> +		EMIT_A64_MOV_I(0, r2, imm);
> +		if (BPF_MODE(code) == BPF_IND)
> +			EMIT(A64_ADD(0, r2, r2, src));
> +		switch (BPF_SIZE(code)) {
> +		case BPF_W:
> +			size = 4;
> +			break;
> +		case BPF_H:
> +			size = 2;
> +			break;
> +		case BPF_B:
> +			size = 1;
> +			break;
> +		case BPF_DW:
> +			size = 8;

same here: there is no DW in ld_abs/ld_ind. Let's not rush it in.

> +notyet:
> +	pr_info("*** NOT YET: opcode %02x ***\n", code);
> +	return -EFAULT;

It's ok to implement JIT support step by step. Just change pr_info() to
pr_info_once() so it doesn't spam the logs.

> +	default:
> +		pr_err("unknown opcode %02x\n", code);

same here.

Overall looks great. Thank you for doing all this work!

Alexei