2020-10-02 17:21:19

by Topi Miettinen

Subject: [PATCH] mm: optionally disable brk()

The brk() system call allows changing the data segment size (heap).
This is mainly used by glibc for memory allocation, but glibc can also
use mmap(), which results in more randomized memory mappings, since the
heap is always located at a fixed offset from the program while mmap()ed
memory is randomized.

Signed-off-by: Topi Miettinen <[email protected]>
---
init/Kconfig | 15 +++++++++++++++
kernel/sys_ni.c | 2 ++
mm/mmap.c | 2 ++
3 files changed, 19 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index c5ea2e694f6a..53735ac305d8 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1851,6 +1851,20 @@ config SLUB_MEMCG_SYSFS_ON
controlled by slub_memcg_sysfs boot parameter and this
config option determines the parameter's default value.

+config BRK_SYSCALL
+ bool "Enable brk() system call" if EXPERT
+ default y
+ help
+ Enable the brk() system call, which allows changing the data
+ segment size (heap). This is mainly used by glibc for memory
+ allocation, but glibc can also use mmap(), which results in
+ more randomized memory mappings since the heap is always
+ located at a fixed offset from the program while mmap()ed
+ memory is randomized.
+
+ If unsure, say Y for maximum compatibility.
+
+if BRK_SYSCALL
config COMPAT_BRK
bool "Disable heap randomization"
default y
@@ -1862,6 +1876,7 @@ config COMPAT_BRK
/proc/sys/kernel/randomize_va_space to 2 or 3.

On non-ancient distros (post-2000 ones) N is usually a safe choice.
+endif # BRK_SYSCALL

choice
prompt "Choose SLAB allocator"
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d59775ea79c..3ffa5c4002e1 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -299,6 +299,8 @@ COND_SYSCALL(recvmmsg_time32);
COND_SYSCALL_COMPAT(recvmmsg_time32);
COND_SYSCALL_COMPAT(recvmmsg_time64);

+COND_SYSCALL(brk);
+
/*
* Architecture specific syscalls: see further below
*/
diff --git a/mm/mmap.c b/mm/mmap.c
index 489368f43af1..653be2c8982a 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -188,6 +188,7 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma)

static int do_brk_flags(unsigned long addr, unsigned long request, unsigned long flags,
struct list_head *uf);
+#ifdef CONFIG_BRK_SYSCALL
SYSCALL_DEFINE1(brk, unsigned long, brk)
{
unsigned long retval;
@@ -286,6 +287,7 @@ SYSCALL_DEFINE1(brk, unsigned long, brk)
mmap_write_unlock(mm);
return retval;
}
+#endif

static inline unsigned long vma_compute_gap(struct vm_area_struct *vma)
{
--
2.28.0
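With CONFIG_BRK_SYSCALL=n, the COND_SYSCALL(brk) line added to kernel/sys_ni.c leaves brk() wired to sys_ni_syscall(), so the call fails with ENOSYS instead of returning a program break. A minimal user-space probe for that could look like the sketch below. It assumes Linux with glibc and uses the raw syscall() wrapper rather than brk()/sbrk(), whose error reporting differs; the helper name brk_syscall_available() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns true if the kernel implements brk().
 * brk(0) is a harmless query: the kernel's brk() never reports an
 * error code, it just returns the current program break. If the
 * syscall is compiled out, sys_ni_syscall() returns -ENOSYS, which
 * glibc's syscall() wrapper turns into -1 with errno == ENOSYS. */
static bool brk_syscall_available(void)
{
    errno = 0;
    long ret = syscall(SYS_brk, 0);
    return !(ret == -1 && errno == ENOSYS);
}
```

On a kernel built with the default CONFIG_BRK_SYSCALL=y, brk_syscall_available() should return true; with the option disabled it should return false.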


2020-10-02 17:56:05

by David Hildenbrand

Subject: Re: [PATCH] mm: optionally disable brk()

On 02.10.20 19:19, Topi Miettinen wrote:
> The brk() system call allows changing the data segment size (heap).
> This is mainly used by glibc for memory allocation, but glibc can
> also use mmap(), which results in more randomized memory mappings,
> since the heap is always located at a fixed offset from the program
> while mmap()ed memory is randomized.

Want to take more Unix out of Linux?

Honestly, why care about disabling? User space can happily use mmap() if
it prefers.


--
Thanks,

David / dhildenb

2020-10-02 21:23:02

by David Laight

Subject: RE: [PATCH] mm: optionally disable brk()

From: David Hildenbrand
> Sent: 02 October 2020 18:52
>
> On 02.10.20 19:19, Topi Miettinen wrote:
> > The brk() system call allows changing the data segment size (heap).
> > This is mainly used by glibc for memory allocation, but glibc can
> > also use mmap(), which results in more randomized memory mappings,
> > since the heap is always located at a fixed offset from the program
> > while mmap()ed memory is randomized.
>
> Want to take more Unix out of Linux?
>
> Honestly, why care about disabling? User space can happily use mmap() if
> it prefers.

I bet some obscure applications rely on it.

Although hopefully nothing still does heap allocation
by just increasing the VA and calling brk() in response
to SIGSEGV.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-10-02 21:45:42

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 2.10.2020 20.52, David Hildenbrand wrote:
> On 02.10.20 19:19, Topi Miettinen wrote:
>> The brk() system call allows changing the data segment size (heap).
>> This is mainly used by glibc for memory allocation, but glibc can
>> also use mmap(), which results in more randomized memory mappings,
>> since the heap is always located at a fixed offset from the program
>> while mmap()ed memory is randomized.
>
> Want to take more Unix out of Linux?
>
> Honestly, why care about disabling? User space can happily use mmap() if
> it prefers.

The brk() interface doesn't seem to be used much, and glibc happily
switches to mmap() if brk() fails, so why not allow disabling it
optionally? If you don't care to disable it, just don't; keeping it
enabled is even the default.

-Topi


2020-10-05 06:14:51

by Michal Hocko

Subject: Re: [PATCH] mm: optionally disable brk()

On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
> On 2.10.2020 20.52, David Hildenbrand wrote:
> > On 02.10.20 19:19, Topi Miettinen wrote:
> > > The brk() system call allows changing the data segment size (heap).
> > > This is mainly used by glibc for memory allocation, but glibc can
> > > also use mmap(), which results in more randomized memory mappings,
> > > since the heap is always located at a fixed offset from the program
> > > while mmap()ed memory is randomized.
> >
> > Want to take more Unix out of Linux?
> >
> > Honestly, why care about disabling? User space can happily use mmap() if
> > it prefers.
>
> The brk() interface doesn't seem to be used much, and glibc happily
> switches to mmap() if brk() fails, so why not allow disabling it
> optionally? If you don't care to disable it, just don't; keeping it
> enabled is even the default.

I do not think we want to have config per syscall, do we? There are many
other syscalls which are rarely used. Your changelog is actually missing
the most important part. Why should we increase the config space and
make the kernel even trickier for users to configure? How do I know
that something won't break? brk() is one of those syscalls that has
been here forever, and a lot of userspace might depend on it. I haven't
checked, but the code size is very unlikely to shrink much, as this is
mostly a tiny wrapper around mmap code. We are not going to get rid of
any complexity.

So what is the point?
--
Michal Hocko
SUSE Labs

2020-10-05 08:15:13

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 5.10.2020 9.12, Michal Hocko wrote:
> On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
>> On 2.10.2020 20.52, David Hildenbrand wrote:
>>> On 02.10.20 19:19, Topi Miettinen wrote:
>>>> The brk() system call allows changing the data segment size (heap).
>>>> This is mainly used by glibc for memory allocation, but glibc can
>>>> also use mmap(), which results in more randomized memory mappings,
>>>> since the heap is always located at a fixed offset from the program
>>>> while mmap()ed memory is randomized.
>>>
>>> Want to take more Unix out of Linux?
>>>
>>> Honestly, why care about disabling? User space can happily use mmap() if
>>> it prefers.
>>
>> The brk() interface doesn't seem to be used much, and glibc happily
>> switches to mmap() if brk() fails, so why not allow disabling it
>> optionally? If you don't care to disable it, just don't; keeping it
>> enabled is even the default.
>
> I do not think we want to have config per syscall, do we? There are many
> other syscalls which are rarely used. Your changelog is actually missing
> the most important part. Why should we increase the config space and
> make the kernel even trickier for users to configure?

Maybe; I didn't know this was a priority, since there are other similar
config options. Can you suggest some other config option which could
control this? This option is already buried under CONFIG_EXPERT.

> How
> do I know that something won't break? brk() is one of those syscalls
> that has been here forever, and a lot of userspace might depend on it.

1. brk() is glibc malloc()'s primary choice, with mmap(NULL, ...) as the
secondary one. But malloc() switches to using only mmap() as soon as
brk() fails the first time, without breakage.

2. brk() is also used for initializing glibc's internal thread structures.
The only program I saw having problems was ldconfig, which indeed
segfaults due to an unsafe assumption that sbrk() will never fail. This
is easily fixable by switching to an internal version of mmap().

3. The dynamic loader uses brk() but this is only done to help malloc()
and nothing breaks there if brk() returns ENOSYS.
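The fallback behaviour in points 1 and 3 can be illustrated with a deliberately simplified bump allocator. This is not glibc's malloc() code; the function name fallback_alloc() and the one-way brk_failed flag are assumptions chosen to mirror the description above: try sbrk() first, and after its first failure use only anonymous mmap(). Checking sbrk()'s result against (void *)-1 is the check that point 2 says ldconfig omitted.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical one-way switch: once sbrk() has failed, never retry it. */
static bool brk_failed = false;

/* Simplified sketch, NOT glibc's allocator: return `size` bytes from the
 * brk heap if possible, otherwise from an anonymous mmap() region.
 * No alignment handling and no free(); it only shows the fallback order. */
void *fallback_alloc(size_t size)
{
    if (size == 0 || size > (size_t)INTPTR_MAX)
        return NULL;

    if (!brk_failed) {
        void *p = sbrk((intptr_t)size);
        /* sbrk() signals failure with (void *)-1, not NULL. */
        if (p != (void *)-1)
            return p;
        brk_failed = true; /* e.g. brk() compiled out of the kernel */
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

The flag makes the switch permanent, matching the "as soon as brk() fails the first time" behaviour: later calls skip sbrk() entirely instead of paying for a failing syscall on every allocation.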

I've sent RFC patches to the glibc list which switch to mmap() completely.
This improves the randomization of malloc()ated memory and of the
location of the thread structures.

> I haven't checked, but the code size is very unlikely to shrink much,
> as this is mostly a tiny wrapper around mmap code. We are not going to
> get rid of any complexity.
>
> So what is the point?

The point is not to shrink the kernel (it will shrink by one small
function) or get rid of complexity. The point is to disable an inferior
interface. Memory returned by mmap() is at a random location but with
brk() it is located near the data segment, so the address is more easily
predictable.
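To see the difference described here, one can sample where the program break sits relative to the end of the program's data/BSS segment and where a fresh anonymous mmap() lands. This sketch assumes Linux with glibc, where the linker-provided symbol `end` marks the end of the BSS (see end(3)); the struct layout and sample_layout() names are hypothetical. Depending on /proc/sys/kernel/randomize_va_space, the kernel may insert a randomized gap between the BSS and the initial break, so the sketch only reports addresses rather than assuming a particular offset.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Linker-provided symbol marking the end of the BSS segment (see end(3)). */
extern char end;

/* Hypothetical container for the addresses being compared. */
struct layout {
    uintptr_t data_end;  /* end of the program's data/BSS */
    uintptr_t brk_now;   /* current program break (top of the brk heap) */
    uintptr_t mmap_addr; /* where a fresh anonymous mapping was placed */
};

/* Returns 0 on success, -1 if sbrk() or mmap() failed. */
int sample_layout(struct layout *out)
{
    void *brk_now = sbrk(0);
    if (brk_now == (void *)-1)
        return -1;

    void *m = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    out->data_end = (uintptr_t)&end;
    out->brk_now = (uintptr_t)brk_now;
    out->mmap_addr = (uintptr_t)m;
    munmap(m, 4096);
    return 0;
}
```

Printing brk_now - data_end and mmap_addr over several runs (for example with printf and PRIxPTR from <inttypes.h>) shows the brk heap staying in the region directly above the data segment, while the mmap() placement is chosen independently of it.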

I think hardened, security-oriented systems should disable brk()
completely because it will increase the randomization of the process
address space (ASLR). This wouldn't be a good option to enable for
systems where maximum compatibility with legacy software is more
important than any hardening.

-Topi

2020-10-05 08:24:54

by Michal Hocko

Subject: Re: [PATCH] mm: optionally disable brk()

On Mon 05-10-20 11:11:35, Topi Miettinen wrote:
[...]
> I think hardened, security-oriented systems should disable brk() completely
> because it will increase the randomization of the process address space
> (ASLR). This wouldn't be a good option to enable for systems where maximum
> compatibility with legacy software is more important than any hardening.

I believe we already do have means to filter syscalls from userspace for
security-hardened environments. Or is there any reason to duplicate
that and control it at configuration time?
--
Michal Hocko
SUSE Labs

2020-10-05 09:05:13

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 5.10.2020 11.22, Michal Hocko wrote:
> On Mon 05-10-20 11:11:35, Topi Miettinen wrote:
> [...]
>> I think hardened, security-oriented systems should disable brk() completely
>> because it will increase the randomization of the process address space
>> (ASLR). This wouldn't be a good option to enable for systems where maximum
>> compatibility with legacy software is more important than any hardening.
>
> I believe we already do have means to filter syscalls from userspace for
> security-hardened environments. Or is there any reason to duplicate
> that and control it at configuration time?

This is true, but seccomp can't be used in cases where NoNewPrivileges
can't be enabled (setuid/setgid binaries are present, which sadly is
still often the case even in otherwise hardened systems), so it's
typically not possible to install a filter for the whole system.
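For reference, a per-process seccomp filter for brk() written without libseccomp might look roughly like this. Per seccomp(2), a task may only install a filter if it has CAP_SYS_ADMIN in its user namespace or has already set no_new_privs, and once no_new_privs is set, setuid/setgid binaries it executes no longer gain privileges, which is the limitation described above. The sketch assumes x86-64 or AArch64 for the audit architecture check and makes brk() fail with ENOSYS; the function names deny_brk_with_seccomp() and brk_denied_in_child() are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define FILTER_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

/* Make brk() fail with ENOSYS (as a kernel without CONFIG_BRK_SYSCALL
 * would) for the calling thread, its children and anything it execs.
 * Returns 0 on success, -1 with errno set on failure.
 * Note: on x86-64 the x32 ABI passes the arch check and is not handled. */
int deny_brk_with_seccomp(void)
{
    struct sock_filter filter[] = {
        /* Kill the task if the syscall comes from an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* brk() -> ENOSYS, everything else allowed. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_brk, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Without CAP_SYS_ADMIN, no_new_privs must be set first. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Demonstration: apply the filter in a forked child only, so the caller
 * keeps brk(). Returns 1 if brk() failed with ENOSYS in the child,
 * 0 if it did not, -1 on fork/wait errors. */
int brk_denied_in_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (deny_brk_with_seccomp() != 0)
            _exit(2);
        errno = 0;
        long ret = syscall(SYS_brk, 0);
        _exit(ret == -1 && errno == ENOSYS ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Running the check in a child keeps the experiment from affecting the calling process, since a seccomp filter cannot be removed once installed.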

-Topi

2020-10-05 09:18:17

by David Hildenbrand

Subject: Re: [PATCH] mm: optionally disable brk()

On 05.10.20 08:12, Michal Hocko wrote:
> On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
>> On 2.10.2020 20.52, David Hildenbrand wrote:
>>> On 02.10.20 19:19, Topi Miettinen wrote:
>>>> The brk() system call allows changing the data segment size (heap).
>>>> This is mainly used by glibc for memory allocation, but glibc can
>>>> also use mmap(), which results in more randomized memory mappings,
>>>> since the heap is always located at a fixed offset from the program
>>>> while mmap()ed memory is randomized.
>>>
>>> Want to take more Unix out of Linux?
>>>
>>> Honestly, why care about disabling? User space can happily use mmap() if
>>> it prefers.
>>
>> The brk() interface doesn't seem to be used much, and glibc happily
>> switches to mmap() if brk() fails, so why not allow disabling it
>> optionally? If you don't care to disable it, just don't; keeping it
>> enabled is even the default.
>
> I do not think we want to have config per syscall, do we?

I do wonder if grouping would be a better option then (finding a proper
level of abstraction ...).

--
Thanks,

David / dhildenb

2020-10-05 09:22:25

by Michal Hocko

Subject: Re: [PATCH] mm: optionally disable brk()

On Mon 05-10-20 11:13:48, David Hildenbrand wrote:
> On 05.10.20 08:12, Michal Hocko wrote:
> > On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
> >> On 2.10.2020 20.52, David Hildenbrand wrote:
> >>> On 02.10.20 19:19, Topi Miettinen wrote:
> >>>> The brk() system call allows changing the data segment size (heap).
> >>>> This is mainly used by glibc for memory allocation, but glibc can
> >>>> also use mmap(), which results in more randomized memory mappings,
> >>>> since the heap is always located at a fixed offset from the program
> >>>> while mmap()ed memory is randomized.
> >>>
> >>> Want to take more Unix out of Linux?
> >>>
> >>> Honestly, why care about disabling? User space can happily use mmap() if
> >>> it prefers.
> >>
> >> The brk() interface doesn't seem to be used much, and glibc happily
> >> switches to mmap() if brk() fails, so why not allow disabling it
> >> optionally? If you don't care to disable it, just don't; keeping it
> >> enabled is even the default.
> >
> > I do not think we want to have config per syscall, do we?
>
> I do wonder if grouping would be a better option then (finding a proper
> level of abstraction ...).

I have a vague recollection that the kernel tinification project was
aiming in that direction. No idea what its current state is or whether
somebody is pursuing it.

--
Michal Hocko
SUSE Labs

2020-10-05 09:50:32

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 5.10.2020 12.13, David Hildenbrand wrote:
> On 05.10.20 08:12, Michal Hocko wrote:
>> On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
>>> On 2.10.2020 20.52, David Hildenbrand wrote:
>>>> On 02.10.20 19:19, Topi Miettinen wrote:
>>>>> The brk() system call allows changing the data segment size (heap).
>>>>> This is mainly used by glibc for memory allocation, but glibc can
>>>>> also use mmap(), which results in more randomized memory mappings,
>>>>> since the heap is always located at a fixed offset from the program
>>>>> while mmap()ed memory is randomized.
>>>>
>>>> Want to take more Unix out of Linux?
>>>>
>>>> Honestly, why care about disabling? User space can happily use mmap() if
>>>> it prefers.
>>>
>>> The brk() interface doesn't seem to be used much, and glibc happily
>>> switches to mmap() if brk() fails, so why not allow disabling it
>>> optionally? If you don't care to disable it, just don't; keeping it
>>> enabled is even the default.
>>
>> I do not think we want to have config per syscall, do we?
>
> I do wonder if grouping would be a better option then (finding a proper
> level of abstraction ...).

If hardening and compatibility are seen as tradeoffs, perhaps there
could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
It would have options
- "compatibility" (default) to gear questions for maximum compatibility,
deselecting any hardening options which reduce compatibility
- "hardening" to gear questions for maximum hardening, deselecting any
compatibility options which reduce hardening
- "none/manual": ask all questions like before

-Topi

2020-10-05 09:59:03

by David Hildenbrand

Subject: Re: [PATCH] mm: optionally disable brk()

On 05.10.20 11:47, Topi Miettinen wrote:
> On 5.10.2020 12.13, David Hildenbrand wrote:
>> On 05.10.20 08:12, Michal Hocko wrote:
>>> On Sat 03-10-20 00:44:09, Topi Miettinen wrote:
>>>> On 2.10.2020 20.52, David Hildenbrand wrote:
>>>>> On 02.10.20 19:19, Topi Miettinen wrote:
>>>>>> The brk() system call allows changing the data segment size (heap).
>>>>>> This is mainly used by glibc for memory allocation, but glibc can
>>>>>> also use mmap(), which results in more randomized memory mappings,
>>>>>> since the heap is always located at a fixed offset from the program
>>>>>> while mmap()ed memory is randomized.
>>>>>
>>>>> Want to take more Unix out of Linux?
>>>>>
>>>>> Honestly, why care about disabling? User space can happily use mmap() if
>>>>> it prefers.
>>>>
>>>> The brk() interface doesn't seem to be used much, and glibc happily
>>>> switches to mmap() if brk() fails, so why not allow disabling it
>>>> optionally? If you don't care to disable it, just don't; keeping it
>>>> enabled is even the default.
>>>
>>> I do not think we want to have config per syscall, do we?
>>
>> I do wonder if grouping would be a better option then (finding a proper
>> level of abstraction ...).
>
> If hardening and compatibility are seen as tradeoffs, perhaps there
> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
> It would have options
> - "compatibility" (default) to gear questions for maximum compatibility,
> deselecting any hardening options which reduce compatibility
> - "hardening" to gear questions for maximum hardening, deselecting any
> compatibility options which reduce hardening
> - "none/manual": ask all questions like before

I think the general direction is to avoid an exploding set of config
options. So if there isn't a *real* demand, I guess gluing this to a
single option ("CONFIG_SECURITY_HARDENING") might be good enough.

--
Thanks,

David / dhildenb

2020-10-05 11:25:20

by David Laight

Subject: RE: [PATCH] mm: optionally disable brk()

From: David Hildenbrand
> Sent: 05 October 2020 10:55
...
> > If hardening and compatibility are seen as tradeoffs, perhaps there
> > could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
> > It would have options
> > - "compatibility" (default) to gear questions for maximum compatibility,
> > deselecting any hardening options which reduce compatibility
> > - "hardening" to gear questions for maximum hardening, deselecting any
> > compatibility options which reduce hardening
> > - "none/manual": ask all questions like before
>
> I think the general direction is to avoid an exploding set of config
> options. So if there isn't a *real* demand, I guess gluing this to a
> single option ("CONFIG_SECURITY_HARDENING") might be good enough.

Wouldn't that be better achieved by run-time clobbering
of the syscall vectors?

David

2020-10-05 12:20:26

by David Hildenbrand

Subject: Re: [PATCH] mm: optionally disable brk()

On 05.10.20 13:21, David Laight wrote:
> From: David Hildenbrand
>> Sent: 05 October 2020 10:55
> ...
>>> If hardening and compatibility are seen as tradeoffs, perhaps there
>>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
>>> It would have options
>>> - "compatibility" (default) to gear questions for maximum compatibility,
>>> deselecting any hardening options which reduce compatibility
>>> - "hardening" to gear questions for maximum hardening, deselecting any
>>> compatibility options which reduce hardening
>>> - "none/manual": ask all questions like before
>>
>> I think the general direction is to avoid an exploding set of config
>> options. So if there isn't a *real* demand, I guess gluing this to a
>> single option ("CONFIG_SECURITY_HARDENING") might be good enough.
>
> Wouldn't that be better achieved by run-time clobbering
> of the syscall vectors?

You mean via something like a boot parameter? Possibly yes.

--
Thanks,

David / dhildenb

2020-10-05 12:29:10

by David Laight

Subject: RE: [PATCH] mm: optionally disable brk()

From: David Hildenbrand
> Sent: 05 October 2020 13:19
>
> On 05.10.20 13:21, David Laight wrote:
> > From: David Hildenbrand
> >> Sent: 05 October 2020 10:55
> > ...
> >>> If hardening and compatibility are seen as tradeoffs, perhaps there
> >>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
> >>> It would have options
> >>> - "compatibility" (default) to gear questions for maximum compatibility,
> >>> deselecting any hardening options which reduce compatibility
> >>> - "hardening" to gear questions for maximum hardening, deselecting any
> >>> compatibility options which reduce hardening
> >>> - "none/manual": ask all questions like before
> >>
> >> I think the general direction is to avoid an exploding set of config
> >> options. So if there isn't a *real* demand, I guess gluing this to a
> >> single option ("CONFIG_SECURITY_HARDENING") might be good enough.
> >
> > Wouldn't that be better achieved by run-time clobbering
> > of the syscall vectors?
>
> You mean via something like a boot parameter? Possibly yes.

I was thinking of later.
Some kind of restricted system might want to 'clobber'
mount() after everything is running.

David

2020-10-05 14:13:59

by Jonathan Corbet

Subject: Re: [PATCH] mm: optionally disable brk()

On Mon, 5 Oct 2020 11:11:35 +0300
Topi Miettinen <[email protected]> wrote:

> The point is not to shrink the kernel (it will shrink by one small
> function) or get rid of complexity. The point is to disable an inferior
> interface. Memory returned by mmap() is at a random location but with
> brk() it is located near the data segment, so the address is more easily
> predictable.

So if your true objective is to get glibc to allocate memory differently,
perhaps the right thing to do is to patch glibc?

Thanks,

jon

2020-10-05 17:55:47

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 5.10.2020 17.12, Jonathan Corbet wrote:
> On Mon, 5 Oct 2020 11:11:35 +0300
> Topi Miettinen <[email protected]> wrote:
>
>> The point is not to shrink the kernel (it will shrink by one small
>> function) or get rid of complexity. The point is to disable an inferior
>> interface. Memory returned by mmap() is at a random location but with
>> brk() it is located near the data segment, so the address is more easily
>> predictable.
>
> So if your true objective is to get glibc to allocate memory differently,
> perhaps the right thing to do is to patch glibc?

Of course:
https://sourceware.org/pipermail/libc-alpha/2020-October/118319.html

But since glibc is pretty much the only user of brk(), it also makes
sense to disable it in the kernel if nothing uses it anymore.

-Topi

2020-10-07 09:45:42

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 5.10.2020 15.25, David Laight wrote:
> From: David Hildenbrand
>> Sent: 05 October 2020 13:19
>>
>> On 05.10.20 13:21, David Laight wrote:
>>> From: David Hildenbrand
>>>> Sent: 05 October 2020 10:55
>>> ...
>>>>> If hardening and compatibility are seen as tradeoffs, perhaps there
>>>>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
>>>>> It would have options
>>>>> - "compatibility" (default) to gear questions for maximum compatibility,
>>>>> deselecting any hardening options which reduce compatibility
>>>>> - "hardening" to gear questions for maximum hardening, deselecting any
>>>>> compatibility options which reduce hardening
>>>>> - "none/manual": ask all questions like before
>>>>
>>>> I think the general direction is to avoid an exploding set of config
>>>> options. So if there isn't a *real* demand, I guess gluing this to a
>>>> single option ("CONFIG_SECURITY_HARDENING") might be good enough.
>>>
>>> Wouldn't that be better achieved by run-time clobbering
>>> of the syscall vectors?
>>
>> You mean via something like a boot parameter? Possibly yes.
>
> I was thinking of later.
> Some kind of restricted system might want to 'clobber'
> mount() after everything is running.

Perhaps suitably privileged tasks should be able to install global
seccomp filters which would disregard any NoNewPrivileges requirements
and would apply immediately to all tasks. A boot parameter would also
be nice, so that the initrd and PID 1 would be restricted too. Seccomp
would also allow more specific filtering than messing with the syscall
tables.

-Topi

2020-11-01 11:43:27

by Topi Miettinen

Subject: Re: [PATCH] mm: optionally disable brk()

On 5.10.2020 15.18, David Hildenbrand wrote:
> On 05.10.20 13:21, David Laight wrote:
>> From: David Hildenbrand
>>> Sent: 05 October 2020 10:55
>> ...
>>>> If hardening and compatibility are seen as tradeoffs, perhaps there
>>>> could be a top level config choice (CONFIG_HARDENING_TRADEOFF) for this.
>>>> It would have options
>>>> - "compatibility" (default) to gear questions for maximum compatibility,
>>>> deselecting any hardening options which reduce compatibility
>>>> - "hardening" to gear questions for maximum hardening, deselecting any
>>>> compatibility options which reduce hardening
>>>> - "none/manual": ask all questions like before
>>>
>>> I think the general direction is to avoid an exploding set of config
>>> options. So if there isn't a *real* demand, I guess gluing this to a
>>> single option ("CONFIG_SECURITY_HARDENING") might be good enough.
>>
>> Wouldn't that be better achieved by run-time clobbering
>> of the syscall vectors?
>
> You mean via something like a boot parameter? Possibly yes.
>

This may be obvious, but a global seccomp filter which doesn't set NNP
can be installed from the initrd with a simple program, with no changes
to the kernel:

/* seccomp-exec: make the listed syscalls fail with ENOSYS, then exec
 * the program given as the last argument. Link with -lseccomp. */
#include <errno.h>
#include <seccomp.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 3) {
        fprintf(stderr, "Usage: %s syscall [syscall]... program\n", argv[0]);
        return EXIT_FAILURE;
    }

    /* Default action: allow everything that is not explicitly filtered. */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);

    if (ctx == NULL) {
        fprintf(stderr, "failed to init filter\n");
        return EXIT_FAILURE;
    }

    /* Don't set NoNewPrivileges, so setuid/setgid binaries keep working;
     * loading the filter then requires CAP_SYS_ADMIN. */
    int r;
    r = seccomp_attr_set(ctx, SCMP_FLTATR_CTL_NNP, 0);
    if (r != 0) {
        fprintf(stderr, "failed to disable NNP\n");
        return EXIT_FAILURE;
    }

    fprintf(stderr, "filtering");
    /* Every argument except the last one is a syscall name. */
    for (int i = 1; i < argc - 1; i++) {
        const char *syscall = argv[i];

        int syscall_nr = seccomp_syscall_resolve_name(syscall);

        if (syscall_nr == __NR_SCMP_ERROR) {
            //fprintf(stderr, "unknown syscall %s, ignoring\n", syscall);
            continue;
        }
        r = seccomp_rule_add_exact(ctx, SCMP_ACT_ERRNO(ENOSYS),
                                   syscall_nr, 0);
        if (r != 0) {
            //fprintf(stderr, "failed to filter syscall %s, ignoring\n", syscall);
            continue;
        }
        fprintf(stderr, " %s", syscall);
    }
    fprintf(stderr, "\n");
    r = seccomp_load(ctx);
    if (r != 0) {
        fprintf(stderr, "failed to apply filter\n");
        return EXIT_FAILURE;
    }

    seccomp_release(ctx);

    /* The filter stays in force across execve() and is inherited by all
     * descendants of the executed program. */
    char *program = argv[argc - 1];
    char *new_argv[] = { program, NULL };

    execv(program, new_argv);

    fprintf(stderr, "failed to exec %s\n", program);
    return EXIT_FAILURE;
}

This can be inserted in the initrd to disable some obsolete and old
system calls, like this:

#!/bin/sh

exec /usr/local/sbin/seccomp-exec _sysctl afs_syscall bdflush break \
    create_module ftime get_kernel_syms getpmsg gtty idle lock mpx prof \
    profil putpmsg query_module security sgetmask ssetmask stty sysfs \
    tuxcall ulimit uselib ustat vserver epoll_ctl_old epoll_wait_old \
    old_adjtimex old_getpagesize oldfstat oldlstat oldolduname oldstat \
    oldumount olduname osf_old_creat osf_old_fstat osf_old_getpgrp \
    osf_old_killpg osf_old_lstat osf_old_open osf_old_sigaction \
    osf_old_sigblock osf_old_sigreturn osf_old_sigsetmask osf_old_sigvec \
    osf_old_stat osf_old_vadvise osf_old_vtrace osf_old_wait osf_oldquota \
    vm86old brk /init

-Topi