From: David Miller <[email protected]>
All of these cases are strictly of the form:

	preempt_disable();
	BPF_PROG_RUN(...);
	preempt_enable();

Replace this with BPF_PROG_RUN_PIN_ON_CPU(), which wraps BPF_PROG_RUN()
with:

	migrate_disable();
	BPF_PROG_RUN(...);
	migrate_enable();

On non-RT enabled kernels this maps to preempt_disable()/preempt_enable(),
and on RT enabled kernels it solely prevents migration, which is
sufficient as there is no requirement to prevent reentrancy into any BPF
program from a preempting task. The only requirement is that the program
stays on the same CPU.

Therefore, this is a trivially correct transformation.
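For reference, a minimal sketch of what such a wrapper can look like
(simplified; the in-tree definition in include/linux/filter.h may differ
in detail):

	/* Pin the task to the current CPU while the program runs.
	 * On !PREEMPT_RT, migrate_disable() maps to preempt_disable(),
	 * so behavior is unchanged there; on PREEMPT_RT only migration
	 * is prevented and the section stays preemptible.
	 */
	#define BPF_PROG_RUN_PIN_ON_CPU(prog, ctx)	({		\
		u32 __ret;						\
		migrate_disable();					\
		__ret = BPF_PROG_RUN(prog, ctx);			\
		migrate_enable();					\
		__ret;							\
	})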
[ tglx: Converted to BPF_PROG_RUN_PIN_ON_CPU() ]
Signed-off-by: David S. Miller <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
---
include/linux/filter.h | 4 +---
kernel/seccomp.c | 4 +---
net/core/flow_dissector.c | 4 +---
net/core/skmsg.c | 8 ++------
net/kcm/kcmsock.c | 4 +---
5 files changed, 6 insertions(+), 18 deletions(-)
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -713,9 +713,7 @@ static inline u32 bpf_prog_run_clear_cb(
 	if (unlikely(prog->cb_access))
 		memset(cb_data, 0, BPF_SKB_CB_LEN);
 
-	preempt_disable();
-	res = BPF_PROG_RUN(prog, skb);
-	preempt_enable();
+	res = BPF_PROG_RUN_PIN_ON_CPU(prog, skb);
 	return res;
 }
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -268,16 +268,14 @@ static u32 seccomp_run_filters(const str
 	 * All filters in the list are evaluated and the lowest BPF return
 	 * value always takes priority (ignoring the DATA).
 	 */
-	preempt_disable();
 	for (; f; f = f->prev) {
-		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
+		u32 cur_ret = BPF_PROG_RUN_PIN_ON_CPU(f->prog, sd);
 
 		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret)) {
 			ret = cur_ret;
 			*match = f;
 		}
 	}
-	preempt_enable();
 	return ret;
 }
 #endif /* CONFIG_SECCOMP_FILTER */
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -920,9 +920,7 @@ bool bpf_flow_dissect(struct bpf_prog *p
 		      (int)FLOW_DISSECTOR_F_STOP_AT_ENCAP);
 	flow_keys->flags = flags;
 
-	preempt_disable();
-	result = BPF_PROG_RUN(prog, ctx);
-	preempt_enable();
+	result = BPF_PROG_RUN_PIN_ON_CPU(prog, ctx);
 
 	flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, nhoff, hlen);
 	flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -628,7 +628,6 @@ int sk_psock_msg_verdict(struct sock *sk
 	struct bpf_prog *prog;
 	int ret;
 
-	preempt_disable();
 	rcu_read_lock();
 	prog = READ_ONCE(psock->progs.msg_parser);
 	if (unlikely(!prog)) {
@@ -638,7 +637,7 @@ int sk_psock_msg_verdict(struct sock *sk
 
 	sk_msg_compute_data_pointers(msg);
 	msg->sk = sk;
-	ret = BPF_PROG_RUN(prog, msg);
+	ret = BPF_PROG_RUN_PIN_ON_CPU(prog, msg);
 	ret = sk_psock_map_verd(ret, msg->sk_redir);
 	psock->apply_bytes = msg->apply_bytes;
 	if (ret == __SK_REDIRECT) {
@@ -653,7 +652,6 @@ int sk_psock_msg_verdict(struct sock *sk
 	}
 out:
 	rcu_read_unlock();
-	preempt_enable();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(sk_psock_msg_verdict);
@@ -665,9 +663,7 @@ static int sk_psock_bpf_run(struct sk_ps
 	skb->sk = psock->sk;
 	bpf_compute_data_end_sk_skb(skb);
 
-	preempt_disable();
-	ret = BPF_PROG_RUN(prog, skb);
-	preempt_enable();
+	ret = BPF_PROG_RUN_PIN_ON_CPU(prog, skb);
 
 	/* strparser clones the skb before handing it to a upper layer,
 	 * meaning skb_orphan has been called. We NULL sk on the way out
 	 * to ensure we don't trigger a BUG_ON() in skb/sk operations
--- a/net/kcm/kcmsock.c
+++ b/net/kcm/kcmsock.c
@@ -380,9 +380,7 @@ static int kcm_parse_func_strparser(stru
 	struct bpf_prog *prog = psock->bpf_prog;
 	int res;
 
-	preempt_disable();
-	res = BPF_PROG_RUN(prog, skb);
-	preempt_enable();
+	res = BPF_PROG_RUN_PIN_ON_CPU(prog, skb);
 
 	return res;
 }
Hi,
Thomas Gleixner <[email protected]> writes:
> [...]
>
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -268,16 +268,14 @@ static u32 seccomp_run_filters(const str
>  	 * All filters in the list are evaluated and the lowest BPF return
>  	 * value always takes priority (ignoring the DATA).
>  	 */
> -	preempt_disable();
>  	for (; f; f = f->prev) {
> -		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
> +		u32 cur_ret = BPF_PROG_RUN_PIN_ON_CPU(f->prog, sd);
>
More a question really, but isn't the behavior changing here? i.e. shouldn't
migrate_disable()/migrate_enable() be moved outside the loop? Or is
running seccomp filters on different CPUs not a problem?
--
Vinicius
Vinicius Costa Gomes <[email protected]> writes:
Cc+: seccomp folks
> Thomas Gleixner <[email protected]> writes:
>
>> From: David Miller <[email protected]>

Leaving content for reference

>> [...]
>>
>> --- a/kernel/seccomp.c
>> +++ b/kernel/seccomp.c
>> @@ -268,16 +268,14 @@ static u32 seccomp_run_filters(const str
>>  	 * All filters in the list are evaluated and the lowest BPF return
>>  	 * value always takes priority (ignoring the DATA).
>>  	 */
>> -	preempt_disable();
>>  	for (; f; f = f->prev) {
>> -		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
>> +		u32 cur_ret = BPF_PROG_RUN_PIN_ON_CPU(f->prog, sd);
>>
> More a question really, but isn't the behavior changing here? i.e. shouldn't
> migrate_disable()/migrate_enable() be moved outside the loop? Or is
> running seccomp filters on different CPUs not a problem?
In my understanding this is a list of filters and they are independent
of each other.
Kees, Will, Andy?
Thanks,
tglx
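[ For illustration only, not part of the patch: a sketch of the two
  placements under discussion. The loop shape is simplified from
  seccomp_run_filters(); the "pin across the whole walk" variant is
  hypothetical. ]

	/* As merged: each program is individually pinned, so the task
	 * may migrate between two filters, but never while one runs.
	 */
	for (; f; f = f->prev) {
		u32 cur_ret = BPF_PROG_RUN_PIN_ON_CPU(f->prog, sd);

		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret))
			ret = cur_ret;
	}

	/* Hypothetical alternative: pin once around the whole walk.
	 * Stricter, but buys nothing: each filter only evaluates data
	 * for "current", and the lowest-action-wins reduction depends
	 * only on the order of results, not on which CPU produced them.
	 */
	migrate_disable();
	for (; f; f = f->prev) {
		u32 cur_ret = BPF_PROG_RUN(f->prog, sd);

		if (ACTION_ONLY(cur_ret) < ACTION_ONLY(ret))
			ret = cur_ret;
	}
	migrate_enable();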
On Wed, Feb 19, 2020 at 1:01 AM Thomas Gleixner <[email protected]> wrote:
>
> Vinicius Costa Gomes <[email protected]> writes:
>
> Cc+: seccomp folks
>
> > [...]
> >
> > More a question really, but isn't the behavior changing here? i.e. shouldn't
> > migrate_disable()/migrate_enable() be moved outside the loop? Or is
> > running seccomp filters on different CPUs not a problem?
>
> In my understanding this is a list of filters and they are independent
> of each other.
Yes. It's fine to be preempted between filters.
On Wed, Feb 19, 2020 at 10:00:56AM +0100, Thomas Gleixner wrote:
> Vinicius Costa Gomes <[email protected]> writes:
>
> Cc+: seccomp folks
>
> > [...]
> >
> > More a question really, but isn't the behavior changing here? i.e. shouldn't
> > migrate_disable()/migrate_enable() be moved outside the loop? Or is
> > running seccomp filters on different CPUs not a problem?
>
> In my understanding this is a list of filters and they are independent
> of each other.
>
> Kees, Will, Andy?
They're technically independent, but they are related to each other
(i.e. order matters, process hierarchy matters, etc.). There's no
reason I can see why we can't switch CPUs between running them, though.
(AIUI, nothing here would suddenly make these run in parallel, right?)
As long as "current" is still "current", and they run in the same order,
we'll get the same final result as far as seccomp is concerned.
--
Kees Cook
Kees Cook <[email protected]> writes:
> On Wed, Feb 19, 2020 at 10:00:56AM +0100, Thomas Gleixner wrote:
>> Vinicius Costa Gomes <[email protected]> writes:
>> > More a question really, but isn't the behavior changing here? i.e. shouldn't
>> > migrate_disable()/migrate_enable() be moved outside the loop? Or is
>> > running seccomp filters on different CPUs not a problem?
>>
>> In my understanding this is a list of filters and they are independent
>> of each other.
>>
>> Kees, Will, Andy?
>
> They're technically independent, but they are related to each other
> (i.e. order matters, process hierarchy matters, etc.). There's no
> reason I can see why we can't switch CPUs between running them, though.
> (AIUI, nothing here would suddenly make these run in parallel, right?)
Of course not. If we'd run the same thread on multiple CPUs in parallel,
the ordering of your BPF programs would be the least of your worries.
> As long as "current" is still "current", and they run in the same order,
> we'll get the same final result as far as seccomp is concerned.
Right.
Thanks,
tglx
On Fri, Feb 21, 2020 at 03:00:54PM +0100, Thomas Gleixner wrote:
> Of course not. If we'd run the same thread on multiple CPUs in parallel,
> the ordering of your BPF programs would be the least of your worries.
Been there, done that. It goes sideways *REALLY* fast :-)
On Fri, Feb 21, 2020 at 03:00:54PM +0100, Thomas Gleixner wrote:
> Kees Cook <[email protected]> writes:
> > They're technically independent, but they are related to each other
> > (i.e. order matters, process hierarchy matters, etc.). There's no
> > reason I can see why we can't switch CPUs between running them, though.
> > (AIUI, nothing here would suddenly make these run in parallel, right?)
>
> Of course not. If we'd run the same thread on multiple CPUs in parallel,
> the ordering of your BPF programs would be the least of your worries.
Right, okay, good. I just wanted to be extra sure. :)
--
Kees Cook