Subject: Re: [PATCH v2 2/2] arm64/bpf: don't allocate BPF JIT programs in module memory
From: Daniel Borkmann <daniel@iogearbox.net>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: linux-arm-kernel, Alexei Starovoitov, Rick Edgecombe, Eric Dumazet,
    Jann Horn, Kees Cook, Jessica Yu, Arnd Bergmann, Catalin Marinas,
    Will Deacon, Mark Rutland, "David S. Miller",
    Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Date: Thu, 22 Nov 2018 23:49:39 +0100
Message-ID: <37f48c6a-e760-87eb-6559-afdd2f116ed6@iogearbox.net>
References: <20181121131733.14910-1-ard.biesheuvel@linaro.org>
 <20181121131733.14910-3-ard.biesheuvel@linaro.org>
 <945415e1-0ff8-65ce-15fa-33cea0a7d1c9@iogearbox.net>

On 11/22/2018 09:02 AM, Ard Biesheuvel wrote:
> On Thu, 22 Nov 2018 at 00:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/21/2018 02:17 PM, Ard Biesheuvel wrote:
>>> The arm64 module region is a 128 MB region that is kept close to
>>> the core kernel, in order to ensure that relative branches are
>>> always in range. So using the same region for programs that do
>>> not have this restriction is wasteful, and preferably avoided.
>>>
>>> Now that the core BPF JIT code permits the alloc/free routines to
>>> be overridden, implement them by simple vmalloc_exec()/vfree()
>>> calls, which can be served from anywhere. This also solves an
>>> issue under KASAN, where shadow memory is needlessly allocated for
>>> all BPF programs (which don't require KASAN shadow pages, since
>>> they are not KASAN instrumented).
>>>
>>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> ---
>>>  arch/arm64/net/bpf_jit_comp.c | 10 ++++++++++
>>>  1 file changed, 10 insertions(+)
>>>
>>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>>> index a6fdaea07c63..f91b7c157841 100644
>>> --- a/arch/arm64/net/bpf_jit_comp.c
>>> +++ b/arch/arm64/net/bpf_jit_comp.c
>>> @@ -940,3 +940,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
>>>  				   tmp : orig_prog);
>>>  	return prog;
>>>  }
>>> +
>>> +void *bpf_jit_alloc_exec(unsigned long size)
>>> +{
>>> +	return vmalloc_exec(size);
>>> +}
>>> +
>>> +void bpf_jit_free_exec(const void *addr)
>>> +{
>>> +	return vfree(addr);
>>> +}
>>
>> Hmm, could you elaborate in the commit log on the potential performance
>> regression for JITed progs on arm64 after this change?
>
> This does not affect the generated code, so I don't anticipate a
> performance hit. Did you have anything in particular in mind?

We do optimize immediate emission in the JIT; I was mostly wondering how
many more insns we might need to emit in some worst case for each BPF
helper call once the code is much further away from the core kernel. But
then, unlike some other archs, we always use absolute addresses here, so
nothing would change; never mind. (And BPF-to-BPF calls emit unoptimized
64-bit immediates anyway: since we pass through the JIT several times,
we need them as fixed-size placeholders until the address is actually
known.)
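To make the "unoptimized 64-bit immediate" point concrete, here is a
minimal self-contained sketch of such a movz/movk sequence (hypothetical
user-space C, not the kernel's actual emitter; encodings follow the
ARMv8-A 64-bit MOVZ/MOVK instruction format):

#include <stdint.h>
#include <stdio.h>

/*
 * Sketch: materialize a full 64-bit call target as MOVZ + 3x MOVK, the
 * way a JIT must when it cannot rely on the target being within the
 * +/-128 MB range of a direct branch.
 */
static uint32_t movz64(unsigned int rd, uint16_t imm, unsigned int shift)
{
	return 0xd2800000u | ((shift / 16) << 21) | ((uint32_t)imm << 5) | rd;
}

static uint32_t movk64(unsigned int rd, uint16_t imm, unsigned int shift)
{
	return 0xf2800000u | ((shift / 16) << 21) | ((uint32_t)imm << 5) | rd;
}

int main(void)
{
	uint64_t target = 0xffff000010abcdefULL;	/* example address */
	unsigned int shift;

	/* Emit all four insns unconditionally so the sequence size stays
	 * fixed and the real address can be patched in on a later pass. */
	printf("%08x\n", movz64(9, (uint16_t)(target & 0xffff), 0));
	for (shift = 16; shift < 64; shift += 16)
		printf("%08x\n", movk64(9, (uint16_t)((target >> shift) & 0xffff), shift));
	return 0;
}

The fixed four-insn sequence is what makes it usable as a placeholder:
it can be emitted before the final address is known and patched later
without changing the program's size.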
>> I think this change would also break JITing of BPF to BPF calls. You
>> might have the same issue as the ppc64 folks, where the offset might
>> not fit into the imm anymore and you would have to transfer it via
>> fp->aux->func[off]->bpf_func instead.
>
> If we are relying on BPF programs to remain within 128 MB of each
> other, then we already have a potential problem, given that
> module_alloc() spills over into a 4 GB window if the 128 MB window is
> exhausted. Perhaps we should do something like

Hmm, good point, presumably you mean this one here: fd045f6cd98e
("arm64: add support for module PLTs"). Agree that this needs fixing.

> void *bpf_jit_alloc_exec(unsigned long size)
> {
> 	return __vmalloc_node_range(size, MODULE_ALIGN,
> 				    BPF_REGION_START, BPF_REGION_END,
> 				    GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> 				    NUMA_NO_NODE,
> 				    __builtin_return_address(0));
> }
>
> and make [BPF_REGION_START, BPF_REGION_END) a separate 128 MB window
> at the top of the vmalloc space. That way, it is guaranteed that BPF
> programs are within branching range of each other, and we still solve
> the original problem. I also like that it becomes impossible to infer
> anything about the state of the vmalloc space, or the placement of the
> kernel, modules etc., from the placement of the BPF programs (in case
> something leaks this information in one way or another).
>
> That would only give you space for 128M/4K == 32768 programs (or
> 128M/64K == 2048 on 64k pages kernels). So I guess we'd still need a

Note that programs can be up to 4k BPF insns, and those do not map 1:1
to JITed arm64 insns; if possible, I'd actually prefer if we could
enlarge this space a bit.
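To make the proposed layout concrete, a rough sketch, assuming the
window is carved from the top of the vmalloc space (BPF_REGION_START /
BPF_REGION_END are the hypothetical names from the snippet above, not
merged code):

/*
 * Hypothetical arm64 layout, e.g. in arch/arm64/include/asm/memory.h:
 * a dedicated 128 MB window at the top of the vmalloc space, so that
 * all JITed BPF images stay within +/-128 MB (direct branch range) of
 * each other. Names and placement are illustrative only.
 */
#include <linux/sizes.h>		/* SZ_128M */

#define BPF_REGION_SIZE		SZ_128M
#define BPF_REGION_END		VMALLOC_END
#define BPF_REGION_START	(BPF_REGION_END - BPF_REGION_SIZE)

With one page per program, that gives SZ_128M / SZ_4K == 32768 programs
(SZ_128M / SZ_64K == 2048 with 64k pages), which is why the spillover
question below still matters.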
> spillover window as well, in which case we'd need a fix for the
> BPF-to-BPF branching issue (but we need that at the moment anyway)

Yeah, or a spillover window instead. I think that, as a fix, starting
out with its own region without the spillover window would be less
complex and better suited for stable?

Thanks,
Daniel