On arm64, modules are allocated from a 128 MB window which is close to
the core kernel, so that relative direct branches are guaranteed to be
in range (except in some KASLR configurations). Also, module_alloc()
is in charge of allocating KASAN shadow memory when running with KASAN
enabled.
This means that the way BPF reuses module_alloc()/module_memfree() is
undesirable on arm64 (and potentially other architectures as well),
and so this series refactors BPF's use of those functions to permit
architectures to change this behavior.
Patch #1 fixes a bug introduced during the merge window, where the new
alloc/free tracking does not account for memory that is freed by some
arch code.
Patch #2 refactors the freeing path so that architectures can switch to
something other than module_memfree().
Patch #3 does the same for module_alloc().
Patch #4 implements the new alloc/free overrides for arm64
Cc: Daniel Borkmann <[email protected]>
Cc: Alexei Starovoitov <[email protected]>
Cc: Rick Edgecombe <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Ard Biesheuvel (4):
bpf: account for freed JIT allocations in arch code
net/bpf: refactor freeing of executable allocations
bpf: add __weak hook for allocating executable memory
arm64/bpf: don't allocate BPF JIT programs in module memory
arch/arm64/net/bpf_jit_comp.c | 11 ++++++++++
arch/mips/net/bpf_jit.c | 7 ++-----
arch/powerpc/net/bpf_jit_comp.c | 7 ++-----
arch/powerpc/net/bpf_jit_comp64.c | 12 +++--------
arch/sparc/net/bpf_jit_comp_32.c | 7 ++-----
kernel/bpf/core.c | 22 ++++++++++----------
6 files changed, 31 insertions(+), 35 deletions(-)
--
2.17.1
The arm64 module region is a 128 MB region that is kept close to
the core kernel, in order to ensure that relative branches are
always in range. So using the same region for programs that do
not have this restriction is wasteful, and preferably avoided.
Now that the core BPF JIT code permits the alloc/free routines to
be overridden, implement them by simple vmalloc_exec()/vfree()
calls, which can be served from anywere. This also solves an
issue under KASAN, where shadow memory is needlessly allocated for
all BPF programs (which don't require KASAN shadow pages since
they are not KASAN instrumented)
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm64/net/bpf_jit_comp.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index a6fdaea07c63..e0c702c2f682 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -940,3 +940,14 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
tmp : orig_prog);
return prog;
}
+
+void *bpf_jit_alloc_exec(unsigned long size)
+{
+ return vmalloc_exec(size);
+}
+
+void bpf_jit_binary_free(struct bpf_binary_header *hdr)
+{
+ bpf_jit_binary_unlock_ro(hdr);
+ vfree(hdr);
+}
--
2.17.1
By default, BPF uses module_alloc() to allocate executable memory,
but this is not necessary on all arches and potentially undesirable
on some of them.
So break out the module_alloc() call into a __weak function to allow
it to be overridden in arch code.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
kernel/bpf/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29f766dac203..156d6b96ac6c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -609,6 +609,11 @@ static void bpf_jit_uncharge_modmem(u32 pages)
atomic_long_sub(pages, &bpf_jit_current);
}
+void *__weak bpf_jit_alloc_exec(unsigned long size)
+{
+ return module_alloc(size);
+}
+
struct bpf_binary_header *
bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
unsigned int alignment,
@@ -626,7 +631,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
if (bpf_jit_charge_modmem(pages))
return NULL;
- hdr = module_alloc(size);
+ hdr = bpf_jit_alloc_exec(size);
if (!hdr) {
bpf_jit_uncharge_modmem(pages);
return NULL;
--
2.17.1
All arch overrides of the __weak bpf_jit_free() amount to the same
thing: the allocated memory was never mapped read-only, and so
it does not have to be remapped to read-write before being freed.
So in preparation of permitting arches to serve allocations for BPF
JIT programs from other regions than the module region, refactor
the existing bpf_jit_free() implementations to use the shared code
where possible, and only specialize the remap and free operations.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/mips/net/bpf_jit.c | 7 ++-----
arch/powerpc/net/bpf_jit_comp.c | 7 ++-----
arch/powerpc/net/bpf_jit_comp64.c | 9 +++------
arch/sparc/net/bpf_jit_comp_32.c | 7 ++-----
kernel/bpf/core.c | 15 +++++----------
5 files changed, 14 insertions(+), 31 deletions(-)
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 1b69897274a1..5696bd7dccc7 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -1261,10 +1261,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
kfree(ctx.offsets);
}
-void bpf_jit_free(struct bpf_prog *fp)
+void bpf_jit_binary_free(struct bpf_binary_header *hdr)
{
- if (fp->jited)
- bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
-
- bpf_prog_unlock_free(fp);
+ module_memfree(hdr);
}
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index a1ea1ea6b40d..5b5ce4a1b44b 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -680,10 +680,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
return;
}
-void bpf_jit_free(struct bpf_prog *fp)
+void bpf_jit_binary_free(struct bpf_binary_header *hdr)
{
- if (fp->jited)
- bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
-
- bpf_prog_unlock_free(fp);
+ module_memfree(hdr);
}
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 84c8f013a6c6..f64f1294bd62 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1021,11 +1021,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
return fp;
}
-/* Overriding bpf_jit_free() as we don't set images read-only. */
-void bpf_jit_free(struct bpf_prog *fp)
+/* Overriding bpf_jit_binary_free() as we don't set images read-only. */
+void bpf_jit_binary_free(struct bpf_binary_header *hdr)
{
- if (fp->jited)
- bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
-
- bpf_prog_unlock_free(fp);
+ module_memfree(hdr);
}
diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
index 01bda6bc9e7f..589950d152cc 100644
--- a/arch/sparc/net/bpf_jit_comp_32.c
+++ b/arch/sparc/net/bpf_jit_comp_32.c
@@ -756,10 +756,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
return;
}
-void bpf_jit_free(struct bpf_prog *fp)
+void bpf_jit_binary_free(struct bpf_binary_header *hdr)
{
- if (fp->jited)
- bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
-
- bpf_prog_unlock_free(fp);
+ module_memfree(hdr);
}
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1a796e0799ec..29f766dac203 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -646,25 +646,20 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
return hdr;
}
-void bpf_jit_binary_free(struct bpf_binary_header *hdr)
+void __weak bpf_jit_binary_free(struct bpf_binary_header *hdr)
{
- u32 pages = hdr->pages;
-
+ bpf_jit_binary_unlock_ro(hdr);
module_memfree(hdr);
- bpf_jit_uncharge_modmem(pages);
}
-/* This symbol is only overridden by archs that have different
- * requirements than the usual eBPF JITs, f.e. when they only
- * implement cBPF JIT, do not set images read-only, etc.
- */
-void __weak bpf_jit_free(struct bpf_prog *fp)
+void bpf_jit_free(struct bpf_prog *fp)
{
if (fp->jited) {
struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
+ u32 pages = hdr->pages;
- bpf_jit_binary_unlock_ro(hdr);
bpf_jit_binary_free(hdr);
+ bpf_jit_uncharge_modmem(pages);
WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
}
--
2.17.1
Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
allocations") added a call to bpf_jit_uncharge_modmem() to the routine
bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
This function is overridden by arches, some of which do not call
bpf_jit_binary_free() to release the memory, and so the released
memory is not accounted for, potentially leading to spurious allocation
failures.
So replace the direct calls to module_memfree() in the arch code with
calls to bpf_jit_binary_free().
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/mips/net/bpf_jit.c | 2 +-
arch/powerpc/net/bpf_jit_comp.c | 2 +-
arch/powerpc/net/bpf_jit_comp64.c | 5 +----
arch/sparc/net/bpf_jit_comp_32.c | 2 +-
4 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 4d8cb9bb8365..1b69897274a1 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
void bpf_jit_free(struct bpf_prog *fp)
{
if (fp->jited)
- module_memfree(fp->bpf_func);
+ bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
bpf_prog_unlock_free(fp);
}
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index d5bfe24bb3b5..a1ea1ea6b40d 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
void bpf_jit_free(struct bpf_prog *fp)
{
if (fp->jited)
- module_memfree(fp->bpf_func);
+ bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
bpf_prog_unlock_free(fp);
}
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 50b129785aee..84c8f013a6c6 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
/* Overriding bpf_jit_free() as we don't set images read-only. */
void bpf_jit_free(struct bpf_prog *fp)
{
- unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
- struct bpf_binary_header *bpf_hdr = (void *)addr;
-
if (fp->jited)
- bpf_jit_binary_free(bpf_hdr);
+ bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
bpf_prog_unlock_free(fp);
}
diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
index a5ff88643d5c..01bda6bc9e7f 100644
--- a/arch/sparc/net/bpf_jit_comp_32.c
+++ b/arch/sparc/net/bpf_jit_comp_32.c
@@ -759,7 +759,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
void bpf_jit_free(struct bpf_prog *fp)
{
if (fp->jited)
- module_memfree(fp->bpf_func);
+ bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
bpf_prog_unlock_free(fp);
}
--
2.17.1
On Sat, Nov 17, 2018 at 6:58 PM Ard Biesheuvel
<[email protected]> wrote:
>
> All arch overrides of the __weak bpf_jit_free() amount to the same
> thing: the allocated memory was never mapped read-only, and so
> it does not have to be remapped to read-write before being freed.
>
> So in preparation of permitting arches to serve allocations for BPF
> JIT programs from other regions than the module region, refactor
> the existing bpf_jit_free() implementations to use the shared code
> where possible, and only specialize the remap and free operations.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/mips/net/bpf_jit.c | 7 ++-----
> arch/powerpc/net/bpf_jit_comp.c | 7 ++-----
> arch/powerpc/net/bpf_jit_comp64.c | 9 +++------
> arch/sparc/net/bpf_jit_comp_32.c | 7 ++-----
> kernel/bpf/core.c | 15 +++++----------
> 5 files changed, 14 insertions(+), 31 deletions(-)
>
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 1b69897274a1..5696bd7dccc7 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1261,10 +1261,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> kfree(ctx.offsets);
> }
>
> -void bpf_jit_free(struct bpf_prog *fp)
> +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> {
> - if (fp->jited)
> - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> -
> - bpf_prog_unlock_free(fp);
> + module_memfree(hdr);
> }
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index a1ea1ea6b40d..5b5ce4a1b44b 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -680,10 +680,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> return;
> }
>
> -void bpf_jit_free(struct bpf_prog *fp)
> +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> {
> - if (fp->jited)
> - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> -
> - bpf_prog_unlock_free(fp);
> + module_memfree(hdr);
> }
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 84c8f013a6c6..f64f1294bd62 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1021,11 +1021,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
> return fp;
> }
>
> -/* Overriding bpf_jit_free() as we don't set images read-only. */
> -void bpf_jit_free(struct bpf_prog *fp)
> +/* Overriding bpf_jit_binary_free() as we don't set images read-only. */
> +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> {
> - if (fp->jited)
> - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> -
> - bpf_prog_unlock_free(fp);
> + module_memfree(hdr);
> }
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
> index 01bda6bc9e7f..589950d152cc 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -756,10 +756,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> return;
> }
>
> -void bpf_jit_free(struct bpf_prog *fp)
> +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> {
> - if (fp->jited)
> - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> -
> - bpf_prog_unlock_free(fp);
> + module_memfree(hdr);
> }
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 1a796e0799ec..29f766dac203 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -646,25 +646,20 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> return hdr;
> }
>
> -void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> +void __weak bpf_jit_binary_free(struct bpf_binary_header *hdr)
> {
> - u32 pages = hdr->pages;
> -
> + bpf_jit_binary_unlock_ro(hdr);
> module_memfree(hdr);
> - bpf_jit_uncharge_modmem(pages);
> }
>
> -/* This symbol is only overridden by archs that have different
> - * requirements than the usual eBPF JITs, f.e. when they only
> - * implement cBPF JIT, do not set images read-only, etc.
> - */
Do you want to move the above comments to
new weak function bpf_jit_binary_free?
> -void __weak bpf_jit_free(struct bpf_prog *fp)
> +void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited) {
> struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
> + u32 pages = hdr->pages;
>
> - bpf_jit_binary_unlock_ro(hdr);
> bpf_jit_binary_free(hdr);
> + bpf_jit_uncharge_modmem(pages);
>
> WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));
> }
> --
> 2.17.1
>
On Sat, Nov 17, 2018 at 6:58 PM Ard Biesheuvel
<[email protected]> wrote:
>
> On arm64, modules are allocated from a 128 MB window which is close to
> the core kernel, so that relative direct branches are guaranteed to be
> in range (except in some KASLR configurations). Also, module_alloc()
> is in charge of allocating KASAN shadow memory when running with KASAN
> enabled.
>
> This means that the way BPF reuses module_alloc()/module_memfree() is
> undesirable on arm64 (and potentially other architectures as well),
> and so this series refactors BPF's use of those functions to permit
> architectures to change this behavior.
>
> Patch #1 fixes a bug introduced during the merge window, where the new
> alloc/free tracking does not account for memory that is freed by some
> arch code.
>
> Patch #2 refactors the freeing path so that architectures can switch to
> something other than module_memfree().
>
> Patch #3 does the same for module_alloc().
>
> Patch #4 implements the new alloc/free overrides for arm64
Except a minor comment, the whole patch set looks good to me.
Acked-by: Yonghong Song <[email protected]>
>
> Cc: Daniel Borkmann <[email protected]>
> Cc: Alexei Starovoitov <[email protected]>
> Cc: Rick Edgecombe <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Jann Horn <[email protected]>
> Cc: Kees Cook <[email protected]>
>
> Cc: Jessica Yu <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Catalin Marinas <[email protected]>
> Cc: Will Deacon <[email protected]>
> Cc: Mark Rutland <[email protected]>
> Cc: Ralf Baechle <[email protected]>
> Cc: Paul Burton <[email protected]>
> Cc: James Hogan <[email protected]>
> Cc: Benjamin Herrenschmidt <[email protected]>
> Cc: Paul Mackerras <[email protected]>
> Cc: Michael Ellerman <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
>
> Ard Biesheuvel (4):
> bpf: account for freed JIT allocations in arch code
> net/bpf: refactor freeing of executable allocations
> bpf: add __weak hook for allocating executable memory
> arm64/bpf: don't allocate BPF JIT programs in module memory
>
> arch/arm64/net/bpf_jit_comp.c | 11 ++++++++++
> arch/mips/net/bpf_jit.c | 7 ++-----
> arch/powerpc/net/bpf_jit_comp.c | 7 ++-----
> arch/powerpc/net/bpf_jit_comp64.c | 12 +++--------
> arch/sparc/net/bpf_jit_comp_32.c | 7 ++-----
> kernel/bpf/core.c | 22 ++++++++++----------
> 6 files changed, 31 insertions(+), 35 deletions(-)
>
> --
> 2.17.1
>
On Sat, 17 Nov 2018 at 23:47, Y Song <[email protected]> wrote:
>
> On Sat, Nov 17, 2018 at 6:58 PM Ard Biesheuvel
> <[email protected]> wrote:
> >
> > All arch overrides of the __weak bpf_jit_free() amount to the same
> > thing: the allocated memory was never mapped read-only, and so
> > it does not have to be remapped to read-write before being freed.
> >
> > So in preparation of permitting arches to serve allocations for BPF
> > JIT programs from other regions than the module region, refactor
> > the existing bpf_jit_free() implementations to use the shared code
> > where possible, and only specialize the remap and free operations.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/mips/net/bpf_jit.c | 7 ++-----
> > arch/powerpc/net/bpf_jit_comp.c | 7 ++-----
> > arch/powerpc/net/bpf_jit_comp64.c | 9 +++------
> > arch/sparc/net/bpf_jit_comp_32.c | 7 ++-----
> > kernel/bpf/core.c | 15 +++++----------
> > 5 files changed, 14 insertions(+), 31 deletions(-)
> >
> > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> > index 1b69897274a1..5696bd7dccc7 100644
> > --- a/arch/mips/net/bpf_jit.c
> > +++ b/arch/mips/net/bpf_jit.c
> > @@ -1261,10 +1261,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> > kfree(ctx.offsets);
> > }
> >
> > -void bpf_jit_free(struct bpf_prog *fp)
> > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > {
> > - if (fp->jited)
> > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > -
> > - bpf_prog_unlock_free(fp);
> > + module_memfree(hdr);
> > }
> > diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> > index a1ea1ea6b40d..5b5ce4a1b44b 100644
> > --- a/arch/powerpc/net/bpf_jit_comp.c
> > +++ b/arch/powerpc/net/bpf_jit_comp.c
> > @@ -680,10 +680,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> > return;
> > }
> >
> > -void bpf_jit_free(struct bpf_prog *fp)
> > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > {
> > - if (fp->jited)
> > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > -
> > - bpf_prog_unlock_free(fp);
> > + module_memfree(hdr);
> > }
> > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > index 84c8f013a6c6..f64f1294bd62 100644
> > --- a/arch/powerpc/net/bpf_jit_comp64.c
> > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > @@ -1021,11 +1021,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
> > return fp;
> > }
> >
> > -/* Overriding bpf_jit_free() as we don't set images read-only. */
> > -void bpf_jit_free(struct bpf_prog *fp)
> > +/* Overriding bpf_jit_binary_free() as we don't set images read-only. */
> > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > {
> > - if (fp->jited)
> > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > -
> > - bpf_prog_unlock_free(fp);
> > + module_memfree(hdr);
> > }
> > diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
> > index 01bda6bc9e7f..589950d152cc 100644
> > --- a/arch/sparc/net/bpf_jit_comp_32.c
> > +++ b/arch/sparc/net/bpf_jit_comp_32.c
> > @@ -756,10 +756,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> > return;
> > }
> >
> > -void bpf_jit_free(struct bpf_prog *fp)
> > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > {
> > - if (fp->jited)
> > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > -
> > - bpf_prog_unlock_free(fp);
> > + module_memfree(hdr);
> > }
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 1a796e0799ec..29f766dac203 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -646,25 +646,20 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> > return hdr;
> > }
> >
> > -void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > +void __weak bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > {
> > - u32 pages = hdr->pages;
> > -
> > + bpf_jit_binary_unlock_ro(hdr);
> > module_memfree(hdr);
> > - bpf_jit_uncharge_modmem(pages);
> > }
> >
> > -/* This symbol is only overridden by archs that have different
> > - * requirements than the usual eBPF JITs, f.e. when they only
> > - * implement cBPF JIT, do not set images read-only, etc.
> > - */
>
> Do you want to move the above comments to
> new weak function bpf_jit_binary_free?
>
Perhaps. But one thing I don't understand, looking at this again, is
why we have these overrides in the first place. module_memfree() just
calls vfree(), which takes down the mapping entirely (along with any
updated permissions), and so remapping it back to r/w right before
that seems rather pointless imo.
Can we get rid of bpf_jit_binary_unlock_ro() entirely, and along with
it, all these overrides for the free() path?
On Sun, Nov 18, 2018 at 3:55 PM Ard Biesheuvel
<[email protected]> wrote:
>
> On Sat, 17 Nov 2018 at 23:47, Y Song <[email protected]> wrote:
> >
> > On Sat, Nov 17, 2018 at 6:58 PM Ard Biesheuvel
> > <[email protected]> wrote:
> > >
> > > All arch overrides of the __weak bpf_jit_free() amount to the same
> > > thing: the allocated memory was never mapped read-only, and so
> > > it does not have to be remapped to read-write before being freed.
> > >
> > > So in preparation of permitting arches to serve allocations for BPF
> > > JIT programs from other regions than the module region, refactor
> > > the existing bpf_jit_free() implementations to use the shared code
> > > where possible, and only specialize the remap and free operations.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > arch/mips/net/bpf_jit.c | 7 ++-----
> > > arch/powerpc/net/bpf_jit_comp.c | 7 ++-----
> > > arch/powerpc/net/bpf_jit_comp64.c | 9 +++------
> > > arch/sparc/net/bpf_jit_comp_32.c | 7 ++-----
> > > kernel/bpf/core.c | 15 +++++----------
> > > 5 files changed, 14 insertions(+), 31 deletions(-)
> > >
> > > diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> > > index 1b69897274a1..5696bd7dccc7 100644
> > > --- a/arch/mips/net/bpf_jit.c
> > > +++ b/arch/mips/net/bpf_jit.c
> > > @@ -1261,10 +1261,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> > > kfree(ctx.offsets);
> > > }
> > >
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > > {
> > > - if (fp->jited)
> > > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > - bpf_prog_unlock_free(fp);
> > > + module_memfree(hdr);
> > > }
> > > diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> > > index a1ea1ea6b40d..5b5ce4a1b44b 100644
> > > --- a/arch/powerpc/net/bpf_jit_comp.c
> > > +++ b/arch/powerpc/net/bpf_jit_comp.c
> > > @@ -680,10 +680,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> > > return;
> > > }
> > >
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > > {
> > > - if (fp->jited)
> > > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > - bpf_prog_unlock_free(fp);
> > > + module_memfree(hdr);
> > > }
> > > diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> > > index 84c8f013a6c6..f64f1294bd62 100644
> > > --- a/arch/powerpc/net/bpf_jit_comp64.c
> > > +++ b/arch/powerpc/net/bpf_jit_comp64.c
> > > @@ -1021,11 +1021,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
> > > return fp;
> > > }
> > >
> > > -/* Overriding bpf_jit_free() as we don't set images read-only. */
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +/* Overriding bpf_jit_binary_free() as we don't set images read-only. */
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > > {
> > > - if (fp->jited)
> > > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > - bpf_prog_unlock_free(fp);
> > > + module_memfree(hdr);
> > > }
> > > diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
> > > index 01bda6bc9e7f..589950d152cc 100644
> > > --- a/arch/sparc/net/bpf_jit_comp_32.c
> > > +++ b/arch/sparc/net/bpf_jit_comp_32.c
> > > @@ -756,10 +756,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> > > return;
> > > }
> > >
> > > -void bpf_jit_free(struct bpf_prog *fp)
> > > +void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > > {
> > > - if (fp->jited)
> > > - bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
> > > -
> > > - bpf_prog_unlock_free(fp);
> > > + module_memfree(hdr);
> > > }
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 1a796e0799ec..29f766dac203 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -646,25 +646,20 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> > > return hdr;
> > > }
> > >
> > > -void bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > > +void __weak bpf_jit_binary_free(struct bpf_binary_header *hdr)
> > > {
> > > - u32 pages = hdr->pages;
> > > -
> > > + bpf_jit_binary_unlock_ro(hdr);
> > > module_memfree(hdr);
> > > - bpf_jit_uncharge_modmem(pages);
> > > }
> > >
> > > -/* This symbol is only overridden by archs that have different
> > > - * requirements than the usual eBPF JITs, f.e. when they only
> > > - * implement cBPF JIT, do not set images read-only, etc.
> > > - */
> >
> > Do you want to move the above comments to
> > new weak function bpf_jit_binary_free?
> >
>
> Perhaps. But one thing I don't understand, looking at this again, is
> why we have these overrides in the first place. module_memfree() just
> calls vfree(), which takes down the mapping entirely (along with any
> updated permissions), and so remapping it back to r/w right before
> that seems rather pointless imo.
>
> Can we get rid of bpf_jit_binary_unlock_ro() entirely, and along with
> it, all these overrides for the free() path?
Maybe based on current implementation. Just a pure speculation.
module_memfree() can be overwritten by arch specific implementation.
The intention could be restoring the allocated page to its original permission
just in case arch specific implementation of module_memfree()
does different thing than default vfee().
On 11/17/2018 07:57 PM, Ard Biesheuvel wrote:
> Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
> allocations") added a call to bpf_jit_uncharge_modmem() to the routine
> bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
> This function is overridden by arches, some of which do not call
> bpf_jit_binary_free() to release the memory, and so the released
> memory is not accounted for, potentially leading to spurious allocation
> failures.
>
> So replace the direct calls to module_memfree() in the arch code with
> calls to bpf_jit_binary_free().
Sorry but this patch is completely buggy, and above description on the
accounting incorrect as well. Looks like this patch was not tested at all.
The below cBPF JITs that use module_memfree() which you replace with
bpf_jit_binary_free() are using module_alloc() internally to get the JIT
image buffer ...
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/mips/net/bpf_jit.c | 2 +-
> arch/powerpc/net/bpf_jit_comp.c | 2 +-
> arch/powerpc/net/bpf_jit_comp64.c | 5 +----
> arch/sparc/net/bpf_jit_comp_32.c | 2 +-
> 4 files changed, 4 insertions(+), 7 deletions(-)
>
> diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
> index 4d8cb9bb8365..1b69897274a1 100644
> --- a/arch/mips/net/bpf_jit.c
> +++ b/arch/mips/net/bpf_jit.c
> @@ -1264,7 +1264,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>
> bpf_prog_unlock_free(fp);
> }
> diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
> index d5bfe24bb3b5..a1ea1ea6b40d 100644
> --- a/arch/powerpc/net/bpf_jit_comp.c
> +++ b/arch/powerpc/net/bpf_jit_comp.c
> @@ -683,7 +683,7 @@ void bpf_jit_compile(struct bpf_prog *fp)
> void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>
> bpf_prog_unlock_free(fp);
> }
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 50b129785aee..84c8f013a6c6 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -1024,11 +1024,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
> /* Overriding bpf_jit_free() as we don't set images read-only. */
> void bpf_jit_free(struct bpf_prog *fp)
> {
> - unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
> - struct bpf_binary_header *bpf_hdr = (void *)addr;
> -
> if (fp->jited)
> - bpf_jit_binary_free(bpf_hdr);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>
> bpf_prog_unlock_free(fp);
> }
> diff --git a/arch/sparc/net/bpf_jit_comp_32.c b/arch/sparc/net/bpf_jit_comp_32.c
> index a5ff88643d5c..01bda6bc9e7f 100644
> --- a/arch/sparc/net/bpf_jit_comp_32.c
> +++ b/arch/sparc/net/bpf_jit_comp_32.c
> @@ -759,7 +759,7 @@ cond_branch: f_offset = addrs[i + filter[i].jf];
> void bpf_jit_free(struct bpf_prog *fp)
> {
> if (fp->jited)
> - module_memfree(fp->bpf_func);
> + bpf_jit_binary_free(bpf_jit_binary_hdr(fp));
>
> bpf_prog_unlock_free(fp);
> }
>
On Mon, 19 Nov 2018 at 02:37, Daniel Borkmann <[email protected]> wrote:
>
> On 11/17/2018 07:57 PM, Ard Biesheuvel wrote:
> > Commit ede95a63b5e84 ("bpf: add bpf_jit_limit knob to restrict unpriv
> > allocations") added a call to bpf_jit_uncharge_modmem() to the routine
> > bpf_jit_binary_free() which is called from the __weak bpf_jit_free().
> > This function is overridden by arches, some of which do not call
> > bpf_jit_binary_free() to release the memory, and so the released
> > memory is not accounted for, potentially leading to spurious allocation
> > failures.
> >
> > So replace the direct calls to module_memfree() in the arch code with
> > calls to bpf_jit_binary_free().
>
> Sorry but this patch is completely buggy, and above description on the
> accounting incorrect as well. Looks like this patch was not tested at all.
>
My apologies. I went off into the weeds a bit looking at different
versions for 32-bit and 64-bit on different architectures. So indeed,
this patch should be dropped.
> The below cBPF JITs that use module_memfree() which you replace with
> bpf_jit_binary_free() are using module_alloc() internally to get the JIT
> image buffer ...
>
Indeed. So would you prefer for arm64 to override bpf_jit_free() in
its entirety, and not call bpf_jit_binary_free() but simply call
bpf_jit_uncharge_modmem() and vfree() directly? It's either that, or
we'd have to untangle this a bit, to avoid having one __weak function
on top of the other just so other arches can replace the
module_memfree() call in bpf_jit_binary_free() with vfree() (which
amount to the same thing on arm64 anyway)