2021-09-24 12:13:34

by Lorenz Bauer

[permalink] [raw]
Subject: [PATCH bpf-next 4/4] bpf: export bpf_jit_current

Expose bpf_jit_current as a read only value via sysctl.

Signed-off-by: Lorenz Bauer <[email protected]>
---
include/linux/filter.h | 1 +
kernel/bpf/core.c | 3 +--
net/core/sysctl_net_core.c | 7 +++++++
3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ef03ff34234d..b2143ad5ce00 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1052,6 +1052,7 @@ extern int bpf_jit_harden;
extern int bpf_jit_kallsyms;
extern long bpf_jit_limit;
extern long bpf_jit_limit_max;
+extern atomic_long_t bpf_jit_current;

typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e844a2a4c99a..93f95e9ee8be 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -525,6 +525,7 @@ int bpf_jit_kallsyms __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
int bpf_jit_harden __read_mostly;
long bpf_jit_limit __read_mostly;
long bpf_jit_limit_max __read_mostly;
+atomic_long_t bpf_jit_current __read_mostly;

static void
bpf_prog_ksym_set_addr(struct bpf_prog *prog)
@@ -800,8 +801,6 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
return slot;
}

-static atomic_long_t bpf_jit_current;
-
/* Can be overridden by an arch's JIT compiler if it has a custom,
* dedicated BPF backend memory area, or if neither of the two
* below apply.
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 5f88526ad61c..674aac163b84 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -421,6 +421,13 @@ static struct ctl_table net_core_table[] = {
.extra1 = &long_one,
.extra2 = &bpf_jit_limit_max,
},
+ {
+ .procname = "bpf_jit_current",
+ .data = &bpf_jit_current,
+ .maxlen = sizeof(long),
+ .mode = 0400,
+ .proc_handler = proc_dolongvec_minmax_bpf_restricted,
+ },
#endif
{
.procname = "netdev_tstamp_prequeue",
--
2.30.2


2021-09-27 13:36:05

by Daniel Borkmann

[permalink] [raw]
Subject: Re: [PATCH bpf-next 4/4] bpf: export bpf_jit_current

On 9/24/21 11:55 AM, Lorenz Bauer wrote:
> Expose bpf_jit_current as a read only value via sysctl.
>
> Signed-off-by: Lorenz Bauer <[email protected]>
> ---
> include/linux/filter.h | 1 +
> kernel/bpf/core.c | 3 +--
> net/core/sysctl_net_core.c | 7 +++++++
> 3 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index ef03ff34234d..b2143ad5ce00 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1052,6 +1052,7 @@ extern int bpf_jit_harden;
> extern int bpf_jit_kallsyms;
> extern long bpf_jit_limit;
> extern long bpf_jit_limit_max;
> +extern atomic_long_t bpf_jit_current;
>
> typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
>
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index e844a2a4c99a..93f95e9ee8be 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -525,6 +525,7 @@ int bpf_jit_kallsyms __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
> int bpf_jit_harden __read_mostly;
> long bpf_jit_limit __read_mostly;
> long bpf_jit_limit_max __read_mostly;
> +atomic_long_t bpf_jit_current __read_mostly;
>
> static void
> bpf_prog_ksym_set_addr(struct bpf_prog *prog)
> @@ -800,8 +801,6 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
> return slot;
> }
>
> -static atomic_long_t bpf_jit_current;
> -
> /* Can be overridden by an arch's JIT compiler if it has a custom,
> * dedicated BPF backend memory area, or if neither of the two
> * below apply.
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index 5f88526ad61c..674aac163b84 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -421,6 +421,13 @@ static struct ctl_table net_core_table[] = {
> .extra1 = &long_one,
> .extra2 = &bpf_jit_limit_max,
> },
> + {
> + .procname = "bpf_jit_current",
> + .data = &bpf_jit_current,
> + .maxlen = sizeof(long),
> + .mode = 0400,
> + .proc_handler = proc_dolongvec_minmax_bpf_restricted,

Overall series looks good to me. The only nit I would have is that the above could (in theory)
be subject to atomic_long_t vs long type confusion. I would rather prefer to have a small handler
which properly reads out the atomic_long_t and then passes it onwards as a temporary/plain long
to user space.

Thanks,
Daniel

> + },
> #endif
> {
> .procname = "netdev_tstamp_prequeue",
>

2021-09-27 14:04:32

by Jakub Sitnicki

[permalink] [raw]
Subject: Re: [PATCH bpf-next 4/4] bpf: export bpf_jit_current

On Mon, Sep 27, 2021 at 03:34 PM CEST, Daniel Borkmann wrote:
> On 9/24/21 11:55 AM, Lorenz Bauer wrote:
>> Expose bpf_jit_current as a read only value via sysctl.
>> Signed-off-by: Lorenz Bauer <[email protected]>
>> ---

I find exposing stats via system configuration variables a bit
unexpected. Not sure if there is any example today that we're following.

Maybe an entry under /sys/kernel/debug would be a better fit?

That way we don't have to commit to a sysctl that might go away if we
start charging JIT allocs against memory cgroup quota.

Although that brings up question against which cgroup iptables xt_bpf
allocations should be charged? Root cgroup?

2021-09-28 09:07:18

by Lorenz Bauer

[permalink] [raw]
Subject: Re: [PATCH bpf-next 4/4] bpf: export bpf_jit_current

On Mon, 27 Sept 2021 at 15:01, Jakub Sitnicki <[email protected]> wrote:
>
> I find exposing stats via system configuration variables a bit
> unexpected. Not sure if there is any example today that we're following.
>
> Maybe an entry under /sys/kernel/debug would be a better fit?
>
> That way we don't have to commit to a sysctl that might go away if we
> start charging JIT allocs against memory cgroup quota.

I had a look around, there are no other obvious places in debugfs or
proc where we already have bpf info exposed. It currently all goes via
sysctl.

There are examples of readonly sysctls:
$ sudo find /proc/sys -perm 0444 | wc -l
90

There are no examples of sysctls with mode 0400 however:
$ sudo find /proc/sys -perm 0400 | wc -l
0

I find it kind of weird that the bpf sysctls are so tightly locked
down (CAP_SYS_ADMIN && root) even for reading. Maybe something I can
change?

--
Lorenz Bauer | Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

http://www.cloudflare.com

2021-09-29 15:00:58

by Nicolas Dichtel

[permalink] [raw]
Subject: Re: [PATCH bpf-next 4/4] bpf: export bpf_jit_current

Le 24/09/2021 à 11:55, Lorenz Bauer a écrit :
> Expose bpf_jit_current as a read only value via sysctl.
bpf_jit_current unit is 'pages' and bpf_jit_limit unit is bytes.
Maybe exposing those values with the same unit will ease debugging.


Regards,
Nicolas