Subject: Re: [PATCH v2 2/2] arm64/bpf: don't allocate BPF JIT programs in module memory
From: Ard Biesheuvel
Date: Thu, 22 Nov 2018 09:02:39 +0100
To: Daniel Borkmann
Cc: linux-arm-kernel, Alexei Starovoitov, Rick Edgecombe, Eric Dumazet,
 Jann Horn, Kees Cook, Jessica Yu, Arnd Bergmann, Catalin Marinas,
 Will Deacon, Mark Rutland, "David S. Miller", Linux Kernel Mailing List
References: <20181121131733.14910-1-ard.biesheuvel@linaro.org>
 <20181121131733.14910-3-ard.biesheuvel@linaro.org>
 <945415e1-0ff8-65ce-15fa-33cea0a7d1c9@iogearbox.net>
In-Reply-To: <945415e1-0ff8-65ce-15fa-33cea0a7d1c9@iogearbox.net>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 22 Nov 2018 at 00:20, Daniel Borkmann wrote:
>
> On 11/21/2018 02:17 PM, Ard Biesheuvel wrote:
> > The arm64 module region is a 128 MB region that is kept close to
> > the core kernel, in order to ensure that relative branches are
> > always in range. So using the same region for programs that do
> > not have this restriction is wasteful, and preferably avoided.
> >
> > Now that the core BPF JIT code permits the alloc/free routines to
> > be overridden, implement them by simple vmalloc_exec()/vfree()
> > calls, which can be served from anywhere.
> > This also solves an
> > issue under KASAN, where shadow memory is needlessly allocated for
> > all BPF programs (which don't require KASAN shadow pages, since
> > they are not KASAN instrumented).
> >
> > Signed-off-by: Ard Biesheuvel
> > ---
> >  arch/arm64/net/bpf_jit_comp.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> > index a6fdaea07c63..f91b7c157841 100644
> > --- a/arch/arm64/net/bpf_jit_comp.c
> > +++ b/arch/arm64/net/bpf_jit_comp.c
> > @@ -940,3 +940,13 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
> >                               tmp : orig_prog);
> >       return prog;
> >  }
> > +
> > +void *bpf_jit_alloc_exec(unsigned long size)
> > +{
> > +     return vmalloc_exec(size);
> > +}
> > +
> > +void bpf_jit_free_exec(const void *addr)
> > +{
> > +     return vfree(addr);
> > +}
>
> Hmm, could you elaborate in the commit log on the potential performance
> regression for JITed progs on arm64 after this change?

This does not affect the generated code, so I don't anticipate a
performance hit. Did you have anything in particular in mind?

> I think this change would also break JITing of BPF to BPF calls. You might
> have the same issue as ppc64 folks where the offset might not fit into imm
> anymore and would have to transfer it via fp->aux->func[off]->bpf_func
> instead.

If we are relying on BPF programs to remain within 128 MB of each
other, then we already have a potential problem, given that
module_alloc() spills over into a 4 GB window if the 128 MB window is
exhausted.

Perhaps we should do something like

void *bpf_jit_alloc_exec(unsigned long size)
{
        return __vmalloc_node_range(size, MODULE_ALIGN,
                                    BPF_REGION_START, BPF_REGION_END,
                                    GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
                                    NUMA_NO_NODE,
                                    __builtin_return_address(0));
}

and make [BPF_REGION_START, BPF_REGION_END) a separate 128 MB window
at the top of the vmalloc space.
That way, it is guaranteed that BPF programs are within branching
range of each other, and we still solve the original problem. I also
like that it becomes impossible to infer anything about the state of
the vmalloc space, placement of the kernel and modules etc. from the
placement of the BPF programs (in case it leaks this information in
one way or the other).

That would only give you space for 128M/4K == 32768 programs (or
128M/64K == 2048 on 64k-page kernels). So I guess we'd still need a
spillover window as well, in which case we'd need a fix for the
BPF-to-BPF branching issue (but we need that at the moment anyway).