The BPF_MOD ALU instructions could be utilized by seccomp classic BPF filters,
but were missing from the explicit list of allowed calls since its introduction
in the original e2cfabdfd075 ("seccomp: add system call filtering using BPF")
commit. Add support for these instructions by adding them to the allowed list
in the seccomp_check_filter function.
Signed-off-by: Anton Protopopov <[email protected]>
---
kernel/seccomp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dcb57bf..cae7561b44d4 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -206,6 +206,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
case BPF_ALU | BPF_MUL | BPF_X:
case BPF_ALU | BPF_DIV | BPF_K:
case BPF_ALU | BPF_DIV | BPF_X:
+ case BPF_ALU | BPF_MOD | BPF_K:
+ case BPF_ALU | BPF_MOD | BPF_X:
case BPF_ALU | BPF_AND | BPF_K:
case BPF_ALU | BPF_AND | BPF_X:
case BPF_ALU | BPF_OR | BPF_K:
--
2.19.1
On Mon, Mar 16, 2020 at 04:36:46PM +0000, Anton Protopopov wrote:
> The BPF_MOD ALU instructions could be utilized by seccomp classic BPF filters,
> but were missing from the explicit list of allowed calls since its introduction
> in the original e2cfabdfd075 ("seccomp: add system call filtering using BPF")
> commit. Add support for these instructions by adding them to the allowed list
> in the seccomp_check_filter function.
>
> Signed-off-by: Anton Protopopov <[email protected]>
This has been suggested in the past, but was deemed ultimately redundant:
https://lore.kernel.org/lkml/201908121035.06695C79F@keescook/
Is there a strong reason it's needed?
Thanks!
-Kees
> ---
> kernel/seccomp.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index b6ea3dcb57bf..cae7561b44d4 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -206,6 +206,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
> case BPF_ALU | BPF_MUL | BPF_X:
> case BPF_ALU | BPF_DIV | BPF_K:
> case BPF_ALU | BPF_DIV | BPF_X:
> + case BPF_ALU | BPF_MOD | BPF_K:
> + case BPF_ALU | BPF_MOD | BPF_X:
> case BPF_ALU | BPF_AND | BPF_K:
> case BPF_ALU | BPF_AND | BPF_X:
> case BPF_ALU | BPF_OR | BPF_K:
> --
> 2.19.1
--
Kees Cook
пн, 16 мар. 2020 г. в 17:24, Kees Cook <[email protected]>:
>
> On Mon, Mar 16, 2020 at 04:36:46PM +0000, Anton Protopopov wrote:
> > The BPF_MOD ALU instructions could be utilized by seccomp classic BPF filters,
> > but were missing from the explicit list of allowed calls since its introduction
> > in the original e2cfabdfd075 ("seccomp: add system call filtering using BPF")
> > commit. Add support for these instructions by adding them to the allowed list
> > in the seccomp_check_filter function.
> >
> > Signed-off-by: Anton Protopopov <[email protected]>
>
> This has been suggested in the past, but was deemed ultimately redundant:
> https://lore.kernel.org/lkml/201908121035.06695C79F@keescook/
Yeah, Paul told me this right after I submitted the patch.
> Is there a strong reason it's needed?
I really don't have such a strong need in BPF_MOD, but let me tell why
I wanted to use it in the first place.
I've used this operation to speedup processing linear blacklist
filters. Namely, if you have a list of syscall numbers to blacklist,
you can do, say,
ldw [0]
mod #4
jeq #1, case1
jeq #1, case2
jeq #1, case3
case0:
...
and in every case to walk only a corresponding factor-list. In my case
I had a list of ~40 syscall numbers and after this change filter
executed in 17.25 instructions on average per syscall vs. 45
instructions for the linear filter (so this removes about 30
instructions penalty per every syscall). To replace "mod #4" I
actually used "and #3", but this obviously doesn't work for
non-power-of-two divisors. If I would use "mod 5", then it would give
me about 15.5 instructions on average.
Thanks,
Anton
>
> Thanks!
>
> -Kees
>
> > ---
> > kernel/seccomp.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index b6ea3dcb57bf..cae7561b44d4 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -206,6 +206,8 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
> > case BPF_ALU | BPF_MUL | BPF_X:
> > case BPF_ALU | BPF_DIV | BPF_K:
> > case BPF_ALU | BPF_DIV | BPF_X:
> > + case BPF_ALU | BPF_MOD | BPF_K:
> > + case BPF_ALU | BPF_MOD | BPF_X:
> > case BPF_ALU | BPF_AND | BPF_K:
> > case BPF_ALU | BPF_AND | BPF_X:
> > case BPF_ALU | BPF_OR | BPF_K:
> > --
> > 2.19.1
>
> --
> Kees Cook
On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> and in every case to walk only a corresponding factor-list. In my case
> I had a list of ~40 syscall numbers and after this change filter
> executed in 17.25 instructions on average per syscall vs. 45
> instructions for the linear filter (so this removes about 30
> instructions penalty per every syscall). To replace "mod #4" I
> actually used "and #3", but this obviously doesn't work for
> non-power-of-two divisors. If I would use "mod 5", then it would give
> me about 15.5 instructions on average.
Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
would mean a process couldn't run on older kernels without some tricks
on the seccomp side.
Since the syscall list is static for a given filter, why not arrange it
as a binary search? That should get even better average instructions
as O(log n) instead of O(n).
Though frankly I've also been considering an ABI version bump for adding
a syscall bitmap feature: the vast majority of seccomp filters are just
binary yes/no across a list of syscalls. Only the special cases need
special handling (arg inspection, fd notification, etc). Then these
kinds of filters could run as O(1).
--
Kees Cook
вт, 17 мар. 2020 г. в 16:21, Kees Cook <[email protected]>:
>
> On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > and in every case to walk only a corresponding factor-list. In my case
> > I had a list of ~40 syscall numbers and after this change filter
> > executed in 17.25 instructions on average per syscall vs. 45
> > instructions for the linear filter (so this removes about 30
> > instructions penalty per every syscall). To replace "mod #4" I
> > actually used "and #3", but this obviously doesn't work for
> > non-power-of-two divisors. If I would use "mod 5", then it would give
> > me about 15.5 instructions on average.
>
> Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> would mean a process couldn't run on older kernels without some tricks
> on the seccomp side.
Yes, I understood. Could you tell what would you do exactly if there
was a real need in a new instruction?
> Since the syscall list is static for a given filter, why not arrange it
> as a binary search? That should get even better average instructions
> as O(log n) instead of O(n).
Right, thanks! This saves about 4 more instructions for my case and
works 1-2 ns faster.
> Though frankly I've also been considering an ABI version bump for adding
> a syscall bitmap feature: the vast majority of seccomp filters are just
> binary yes/no across a list of syscalls. Only the special cases need
> special handling (arg inspection, fd notification, etc). Then these
> kinds of filters could run as O(1).
>
> --
> Kees Cook
Thanks,
Anton
On Tue, Mar 17, 2020 at 09:11:57PM -0400, Anton Protopopov wrote:
> вт, 17 мар. 2020 г. в 16:21, Kees Cook <[email protected]>:
> >
> > On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > > and in every case to walk only a corresponding factor-list. In my case
> > > I had a list of ~40 syscall numbers and after this change filter
> > > executed in 17.25 instructions on average per syscall vs. 45
> > > instructions for the linear filter (so this removes about 30
> > > instructions penalty per every syscall). To replace "mod #4" I
> > > actually used "and #3", but this obviously doesn't work for
> > > non-power-of-two divisors. If I would use "mod 5", then it would give
> > > me about 15.5 instructions on average.
> >
> > Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> > would mean a process couldn't run on older kernels without some tricks
> > on the seccomp side.
>
> Yes, I understood. Could you tell what would you do exactly if there
> was a real need in a new instruction?
I'd likely need to introduce some kind of way to query (and declare) the
"language version" of seccomp filters. New programs would need to
declare the language level (EINVAL would mean the program must support
the original "v1", ENOTSUPP would mean "kernel doesn't support that
level"), and the program would have to build a filter based on the
supported language features. The kernel would assume all undeclared
seccomp users were "v1" and would need to reject BPF_MOD. All programs
declaring "v2" would be allowed to use BPF_MOD.
It's really a lot for something that isn't really needed. :)
> > Since the syscall list is static for a given filter, why not arrange it
> > as a binary search? That should get even better average instructions
> > as O(log n) instead of O(n).
>
> Right, thanks! This saves about 4 more instructions for my case and
> works 1-2 ns faster.
Excellent!
> > Though frankly I've also been considering an ABI version bump for adding
> > a syscall bitmap feature: the vast majority of seccomp filters are just
> > binary yes/no across a list of syscalls. Only the special cases need
> > special handling (arg inspection, fd notification, etc). Then these
> > kinds of filters could run as O(1).
*This* feature wouldn't need my crazy language version idea, but it
_would_ still need to be detectable, much like how RET_USER_NOTIF was
added.
--
Kees Cook
ср, 18 мар. 2020 г. в 00:06, Kees Cook <[email protected]>:
>
> On Tue, Mar 17, 2020 at 09:11:57PM -0400, Anton Protopopov wrote:
> > вт, 17 мар. 2020 г. в 16:21, Kees Cook <[email protected]>:
> > >
> > > On Mon, Mar 16, 2020 at 06:17:34PM -0400, Anton Protopopov wrote:
> > > > and in every case to walk only a corresponding factor-list. In my case
> > > > I had a list of ~40 syscall numbers and after this change filter
> > > > executed in 17.25 instructions on average per syscall vs. 45
> > > > instructions for the linear filter (so this removes about 30
> > > > instructions penalty per every syscall). To replace "mod #4" I
> > > > actually used "and #3", but this obviously doesn't work for
> > > > non-power-of-two divisors. If I would use "mod 5", then it would give
> > > > me about 15.5 instructions on average.
> > >
> > > Gotcha. My real concern is with breaking the ABI here -- using BPF_MOD
> > > would mean a process couldn't run on older kernels without some tricks
> > > on the seccomp side.
> >
> > Yes, I understood. Could you tell what would you do exactly if there
> > was a real need in a new instruction?
>
> I'd likely need to introduce some kind of way to query (and declare) the
> "language version" of seccomp filters. New programs would need to
> declare the language level (EINVAL would mean the program must support
> the original "v1", ENOTSUPP would mean "kernel doesn't support that
> level"), and the program would have to build a filter based on the
> supported language features. The kernel would assume all undeclared
> seccomp users were "v1" and would need to reject BPF_MOD. All programs
> declaring "v2" would be allowed to use BPF_MOD.
>
> It's really a lot for something that isn't really needed. :)
Right :) Thanks for the explanations!
> > > Since the syscall list is static for a given filter, why not arrange it
> > > as a binary search? That should get even better average instructions
> > > as O(log n) instead of O(n).
> >
> > Right, thanks! This saves about 4 more instructions for my case and
> > works 1-2 ns faster.
>
> Excellent!
>
> > > Though frankly I've also been considering an ABI version bump for adding
> > > a syscall bitmap feature: the vast majority of seccomp filters are just
> > > binary yes/no across a list of syscalls. Only the special cases need
> > > special handling (arg inspection, fd notification, etc). Then these
> > > kinds of filters could run as O(1).
>
> *This* feature wouldn't need my crazy language version idea, but it
> _would_ still need to be detectable, much like how RET_USER_NOTIF was
> added.
>
> --
> Kees Cook