Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752114AbdHST7F (ORCPT ); Sat, 19 Aug 2017 15:59:05 -0400
Received: from mail-vk0-f68.google.com ([209.85.213.68]:33384 "EHLO
        mail-vk0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751267AbdHST7D (ORCPT );
        Sat, 19 Aug 2017 15:59:03 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1503134429-29063-1-git-send-email-illusionist.neo@gmail.com>
From: Shubham Bansal
Date: Sun, 20 Aug 2017 01:29:02 +0530
Message-ID:
Subject: Re: [PATCH net-next v3] arm: eBPF JIT compiler
To: Alexei Starovoitov
Cc: Russell King - ARM Linux, David Miller, Network Development,
        Daniel Borkmann, linux-arm-kernel@lists.infradead.org, LKML,
        Kees Cook, Andrew Lunn
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 2881
Lines: 73

> impressive work.
> Acked-by: Alexei Starovoitov

Thanks :) I can't take all the credit. Daniel and Kees helped me a lot.
I would have given up a long time ago without them.

>
> Any performance numbers with vs without JIT ?

Here is the mail from Kees on v1 of the patch:

For what it's worth, I did a comparison of the numbers Shubham posted
in another thread for the JIT, comparing the eBPF interpreter with his
new JIT. The post is here:
https://www.spinics.net/lists/netdev/msg436402.html

Other than that, I can send the test runs, which include timings, but I
will not be able to compare them the way Kees did this week. Does that
sound good?

>
>> +static const u8 bpf2a32[][2] = {
>> +	/* return value from in-kernel function, and exit value from eBPF */
>> +	[BPF_REG_0] = {ARM_R1, ARM_R0},
>> +	/* arguments from eBPF program to in-kernel function */
>> +	[BPF_REG_1] = {ARM_R3, ARM_R2},
>
>
> as far as i understand arm32 calling convention the mapping makes sense
> to me. Hard to come up with anything better than the above.

I tried different versions of it, driven by the needs of different eBPF
instructions. As you can see, we are short on registers. This is the
best I could come up with. I would love to hear of any improvement over
this.

>
>> +	/* function call */
>> +	case BPF_JMP | BPF_CALL:
>> +	{
>> +		const u8 *r0 = bpf2a32[BPF_REG_0];
>> +		const u8 *r1 = bpf2a32[BPF_REG_1];
>> +		const u8 *r2 = bpf2a32[BPF_REG_2];
>> +		const u8 *r3 = bpf2a32[BPF_REG_3];
>> +		const u8 *r4 = bpf2a32[BPF_REG_4];
>> +		const u8 *r5 = bpf2a32[BPF_REG_5];
>> +		const u32 func = (u32)__bpf_call_base + (u32)imm;
>> +
>> +		emit_a32_mov_r64(true, r0, r1, false, false, ctx);
>> +		emit_a32_mov_r64(true, r1, r2, false, true, ctx);
>> +		emit_push_r64(r5, 0, ctx);
>> +		emit_push_r64(r4, 8, ctx);
>> +		emit_push_r64(r3, 16, ctx);
>> +
>> +		emit_a32_mov_i(tmp[1], func, false, ctx);
>> +		emit_blx_r(tmp[1], ctx);
>
>
> to improve the cost of call we can teach verifier to mark the registers
> actually used to pass arguments, so not all pushes would be needed.
> But it may be drop in the bucket comparing to the cost of compound
> 64-bit alu ops.

That's right. But it is still an improvement, I guess. I think I
discussed it with Daniel, and I decided I should get this patch merged
into mainline first and then improve on it.

> There was some work on llvm side to use 32-bit subregisters which
> should help 32-bit architectures and JITs, but it didn't go far.
> So if you're interested further improving bpf program speeds on arm32
> you may take a look at llvm side. I can certainly provide the tips.

Sure. Sounds good.

Best,
Shubham