2020-11-03 10:28:05

by Szabolcs Nagy

[permalink] [raw]
Subject: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

Re-mmap executable segments instead of mprotecting them in
case mprotect is seccomp filtered.

For the kernel mapped main executable we don't have the fd
for re-mmap so linux needs to be updated to add BTI. (In the
presence of seccomp filters for mprotect(PROT_EXEC) the libc
cannot change BTI protection at runtime based on user space
policy so it is better if the kernel maps BTI compatible
binaries with PROT_BTI by default.)

Szabolcs Nagy (4):
elf: Pass the fd to note processing [BZ #26831]
elf: Move note processing after l_phdr is updated [BZ #26831]
aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]
aarch64: Remove the bti link_map field [BZ #26831]

elf/dl-load.c | 38 ++++++++++++++++---------------
elf/rtld.c | 4 ++--
sysdeps/aarch64/dl-bti.c | 46 ++++++++++++++++++++------------------
sysdeps/aarch64/dl-prop.h | 17 +++++++-------
sysdeps/aarch64/linkmap.h | 1 -
sysdeps/generic/dl-prop.h | 6 ++---
sysdeps/generic/ldsodefs.h | 5 +++--
sysdeps/x86/dl-prop.h | 6 ++---
8 files changed, 64 insertions(+), 59 deletions(-)

--
2.17.1


2020-11-03 10:28:58

by Szabolcs Nagy

[permalink] [raw]
Subject: [PATCH 1/4] elf: Pass the fd to note processing [BZ #26831]

To handle GNU property notes on aarch64 some segments need to
be mmaped again, so the fd of the loaded ELF module is needed.

When the fd is not available (kernel loaded modules), then -1
is passed.

The fd is passed to both _dl_process_pt_gnu_property and
_dl_process_pt_note for consistency. Target specific note
processing functions are updated accordingly.
---
elf/dl-load.c | 12 +++++++-----
elf/rtld.c | 4 ++--
sysdeps/aarch64/dl-prop.h | 6 +++---
sysdeps/generic/dl-prop.h | 6 +++---
sysdeps/generic/ldsodefs.h | 5 +++--
sysdeps/x86/dl-prop.h | 6 +++---
6 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index f3201e7c14..ceaab7f18e 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -861,10 +861,12 @@ lose (int code, int fd, const char *name, char *realname, struct link_map *l,

/* Process PT_GNU_PROPERTY program header PH in module L after
PT_LOAD segments are mapped. Only one NT_GNU_PROPERTY_TYPE_0
- note is handled which contains processor specific properties. */
+ note is handled which contains processor specific properties.
+ FD is -1 for the kernel mapped main executable otherwise it is
+ the fd used for loading module L. */

void
-_dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_gnu_property (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
const ElfW(Nhdr) *note = (const void *) (ph->p_vaddr + l->l_addr);
const ElfW(Addr) size = ph->p_memsz;
@@ -911,7 +913,7 @@ _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
last_type = type;

/* Target specific property processing. */
- if (_dl_process_gnu_property (l, type, datasz, ptr) == 0)
+ if (_dl_process_gnu_property (l, fd, type, datasz, ptr) == 0)
return;

/* Check the next property item. */
@@ -1266,10 +1268,10 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
switch (ph[-1].p_type)
{
case PT_NOTE:
- _dl_process_pt_note (l, &ph[-1]);
+ _dl_process_pt_note (l, fd, &ph[-1]);
break;
case PT_GNU_PROPERTY:
- _dl_process_pt_gnu_property (l, &ph[-1]);
+ _dl_process_pt_gnu_property (l, fd, &ph[-1]);
break;
}
}
diff --git a/elf/rtld.c b/elf/rtld.c
index 5d117d0d2c..6ba918338b 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -1531,10 +1531,10 @@ dl_main (const ElfW(Phdr) *phdr,
switch (ph[-1].p_type)
{
case PT_NOTE:
- _dl_process_pt_note (main_map, &ph[-1]);
+ _dl_process_pt_note (main_map, -1, &ph[-1]);
break;
case PT_GNU_PROPERTY:
- _dl_process_pt_gnu_property (main_map, &ph[-1]);
+ _dl_process_pt_gnu_property (main_map, -1, &ph[-1]);
break;
}

diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
index b0785bda83..2016d1472e 100644
--- a/sysdeps/aarch64/dl-prop.h
+++ b/sysdeps/aarch64/dl-prop.h
@@ -35,13 +35,13 @@ _dl_open_check (struct link_map *m)
}

static inline void __attribute__ ((always_inline))
-_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
}

static inline int
-_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
- void *data)
+_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
+ uint32_t datasz, void *data)
{
if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
{
diff --git a/sysdeps/generic/dl-prop.h b/sysdeps/generic/dl-prop.h
index f1cf576fe3..df27ff8e6a 100644
--- a/sysdeps/generic/dl-prop.h
+++ b/sysdeps/generic/dl-prop.h
@@ -37,15 +37,15 @@ _dl_open_check (struct link_map *m)
}

static inline void __attribute__ ((always_inline))
-_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
}

/* Called for each property in the NT_GNU_PROPERTY_TYPE_0 note of L,
processing of the properties continues until this returns 0. */
static inline int __attribute__ ((always_inline))
-_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
- void *data)
+_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
+ uint32_t datasz, void *data)
{
return 0;
}
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 382eeb9be0..702cb0f488 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -925,8 +925,9 @@ extern void _dl_rtld_di_serinfo (struct link_map *loader,
Dl_serinfo *si, bool counting);

/* Process PT_GNU_PROPERTY program header PH in module L after
- PT_LOAD segments are mapped. */
-void _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph);
+ PT_LOAD segments are mapped from file FD. */
+void _dl_process_pt_gnu_property (struct link_map *l, int fd,
+ const ElfW(Phdr) *ph);


/* Search loaded objects' symbol tables for a definition of the symbol
diff --git a/sysdeps/x86/dl-prop.h b/sysdeps/x86/dl-prop.h
index 89911e19e2..4eb3b85a7b 100644
--- a/sysdeps/x86/dl-prop.h
+++ b/sysdeps/x86/dl-prop.h
@@ -145,15 +145,15 @@ _dl_process_cet_property_note (struct link_map *l,
}

static inline void __attribute__ ((unused))
-_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
const ElfW(Nhdr) *note = (const void *) (ph->p_vaddr + l->l_addr);
_dl_process_cet_property_note (l, note, ph->p_memsz, ph->p_align);
}

static inline int __attribute__ ((always_inline))
-_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
- void *data)
+_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
+ uint32_t datasz, void *data)
{
return 0;
}
--
2.17.1

2020-11-03 10:29:24

by Szabolcs Nagy

[permalink] [raw]
Subject: [PATCH 4/4] aarch64: Remove the bti link_map field [BZ #26831]

The bti link_map field is no longer necessary because PROT_BTI
is applied at note processing time immediately instead of in
_dl_open_check based on the bti field.

This is a separate patch that is not expected to be backported
to avoid changing the link_map layout that is libc internal ABI.
---
sysdeps/aarch64/dl-prop.h | 5 +----
sysdeps/aarch64/linkmap.h | 1 -
2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
index 762bc93733..cf14381e4a 100644
--- a/sysdeps/aarch64/dl-prop.h
+++ b/sysdeps/aarch64/dl-prop.h
@@ -52,10 +52,7 @@ _dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,

unsigned int feature_1 = *(unsigned int *) data;
if (feature_1 & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
- {
- l->l_mach.bti = true; /* No longer needed. */
- _dl_bti_protect (l, fd);
- }
+ _dl_bti_protect (l, fd);

/* Stop if we processed the property note. */
return 0;
diff --git a/sysdeps/aarch64/linkmap.h b/sysdeps/aarch64/linkmap.h
index 847a03ace2..e921e77495 100644
--- a/sysdeps/aarch64/linkmap.h
+++ b/sysdeps/aarch64/linkmap.h
@@ -22,5 +22,4 @@ struct link_map_machine
{
ElfW(Addr) plt; /* Address of .plt */
void *tlsdesc_table; /* Address of TLS descriptor hash table. */
- bool bti; /* Branch Target Identification is enabled. */
};
--
2.17.1

2020-11-03 10:29:29

by Szabolcs Nagy

[permalink] [raw]
Subject: [PATCH 2/4] elf: Move note processing after l_phdr is updated [BZ #26831]

Program headers are processed in two pass: after the first pass
load segments are mmapped so in the second pass target specific
note processing logic can access the notes.

The second pass is moved later so various link_map fields are
set up that may be useful for note processing such as l_phdr.
---
elf/dl-load.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index ceaab7f18e..673cf960a0 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -1259,21 +1259,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
maplength, has_holes, loader);
if (__glibc_unlikely (errstring != NULL))
goto call_lose;
-
- /* Process program headers again after load segments are mapped in
- case processing requires accessing those segments. Scan program
- headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
- exits. */
- for (ph = &phdr[l->l_phnum]; ph != phdr; --ph)
- switch (ph[-1].p_type)
- {
- case PT_NOTE:
- _dl_process_pt_note (l, fd, &ph[-1]);
- break;
- case PT_GNU_PROPERTY:
- _dl_process_pt_gnu_property (l, fd, &ph[-1]);
- break;
- }
}

if (l->l_ld == 0)
@@ -1481,6 +1466,21 @@ cannot enable executable stack as shared object requires");
/* Assign the next available module ID. */
l->l_tls_modid = _dl_next_tls_modid ();

+ /* Process program headers again after load segments are mapped in
+ case processing requires accessing those segments. Scan program
+ headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
+ exits. */
+ for (ph = &l->l_phdr[l->l_phnum]; ph != l->l_phdr; --ph)
+ switch (ph[-1].p_type)
+ {
+ case PT_NOTE:
+ _dl_process_pt_note (l, fd, &ph[-1]);
+ break;
+ case PT_GNU_PROPERTY:
+ _dl_process_pt_gnu_property (l, fd, &ph[-1]);
+ break;
+ }
+
#ifdef DL_AFTER_LOAD
DL_AFTER_LOAD (l);
#endif
--
2.17.1

2020-11-03 10:29:34

by Szabolcs Nagy

[permalink] [raw]
Subject: [PATCH 3/4] aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

Re-mmap executable segments if possible instead of using mprotect
to add PROT_BTI. This allows using BTI protection with security
policies that prevent mprotect with PROT_EXEC.

If the fd of the ELF module is not available because it was kernel
mapped then mprotect is used and failures are ignored. It is
expected that linux kernel will add PROT_BTI when mapping a module
(current linux as of version 5.9 does not do this).

Computing the mapping parameters follows the logic of
_dl_map_object_from_fd more closely now.

Fixes bug 26831.
---
sysdeps/aarch64/dl-bti.c | 46 ++++++++++++++++++++-------------------
sysdeps/aarch64/dl-prop.h | 14 +++++++-----
2 files changed, 33 insertions(+), 27 deletions(-)

diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
index 196e462520..385f1731ca 100644
--- a/sysdeps/aarch64/dl-bti.c
+++ b/sysdeps/aarch64/dl-bti.c
@@ -19,43 +19,45 @@
#include <errno.h>
#include <libintl.h>
#include <ldsodefs.h>
+#include <dl-load.h> /* For MAP_COPY. */

-static int
-enable_bti (struct link_map *map, const char *program)
+/* Enable BTI protection for MAP. */
+
+void
+_dl_bti_protect (struct link_map *map, int fd)
{
+ const size_t pagesz = GLRO(dl_pagesize);
const ElfW(Phdr) *phdr;
- unsigned prot;

for (phdr = map->l_phdr; phdr < &map->l_phdr[map->l_phnum]; ++phdr)
if (phdr->p_type == PT_LOAD && (phdr->p_flags & PF_X))
{
- void *start = (void *) (phdr->p_vaddr + map->l_addr);
- size_t len = phdr->p_memsz;
-
- prot = PROT_EXEC | PROT_BTI;
+ size_t vstart = ALIGN_DOWN (phdr->p_vaddr, pagesz);
+ size_t vend = ALIGN_UP (phdr->p_vaddr + phdr->p_filesz, pagesz);
+ off_t off = ALIGN_DOWN (phdr->p_offset, pagesz);
+ void *start = (void *) (vstart + map->l_addr);
+ size_t len = vend - vstart;
+
+ /* Add PROT_BTI. */
+ unsigned prot = PROT_EXEC | PROT_BTI;
if (phdr->p_flags & PF_R)
prot |= PROT_READ;
if (phdr->p_flags & PF_W)
prot |= PROT_WRITE;

- if (__mprotect (start, len, prot) < 0)
+ if (fd == -1)
+ {
+ /* Ignore failures: rely on the kernel adding PROT_BTI then. */
+ __mprotect (start, len, prot);
+ }
+ else
{
- if (program)
- _dl_fatal_printf ("%s: mprotect failed to turn on BTI\n",
- map->l_name);
- else
+ void *p = __mmap (start, len, prot, MAP_FIXED|MAP_COPY|MAP_FILE,
+ fd, off);
+ if (p == MAP_FAILED)
_dl_signal_error (errno, map->l_name, "dlopen",
- N_("mprotect failed to turn on BTI"));
+ N_("failed to turn on BTI protection"));
}
}
return 0;
}
-
-/* Enable BTI for L if required. */
-
-void
-_dl_bti_check (struct link_map *l, const char *program)
-{
- if (GLRO(dl_aarch64_cpu_features).bti && l->l_mach.bti)
- enable_bti (l, program);
-}
diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
index 2016d1472e..762bc93733 100644
--- a/sysdeps/aarch64/dl-prop.h
+++ b/sysdeps/aarch64/dl-prop.h
@@ -19,19 +19,16 @@
#ifndef _DL_PROP_H
#define _DL_PROP_H

-extern void _dl_bti_check (struct link_map *, const char *)
- attribute_hidden;
+extern void _dl_bti_protect (struct link_map *, int) attribute_hidden;

static inline void __attribute__ ((always_inline))
_rtld_main_check (struct link_map *m, const char *program)
{
- _dl_bti_check (m, program);
}

static inline void __attribute__ ((always_inline))
_dl_open_check (struct link_map *m)
{
- _dl_bti_check (m, NULL);
}

static inline void __attribute__ ((always_inline))
@@ -43,6 +40,10 @@ static inline int
_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
uint32_t datasz, void *data)
{
+ if (!GLRO(dl_aarch64_cpu_features).bti)
+ /* Skip note processing. */
+ return 0;
+
if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
{
/* Stop if the property note is ill-formed. */
@@ -51,7 +52,10 @@ _dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,

unsigned int feature_1 = *(unsigned int *) data;
if (feature_1 & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
- l->l_mach.bti = true;
+ {
+ l->l_mach.bti = true; /* No longer needed. */
+ _dl_bti_protect (l, fd);
+ }

/* Stop if we processed the property note. */
return 0;
--
2.17.1

2020-11-03 10:38:45

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH 3/4] aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

* Szabolcs Nagy:

> Re-mmap executable segments if possible instead of using mprotect
> to add PROT_BTI. This allows using BTI protection with security
> policies that prevent mprotect with PROT_EXEC.
>
> If the fd of the ELF module is not available because it was kernel
> mapped then mprotect is used and failures are ignored. It is
> expected that linux kernel will add PROT_BTI when mapping a module
> (current linux as of version 5.9 does not do this).
>
> Computing the mapping parameters follows the logic of
> _dl_map_object_from_fd more closely now.

What's the performance of this on execve-heavy workloads, such as kernel
or glibc builds? Hopefully it's cheap because these mappings have not
been faulted in yet.

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

2020-11-03 10:42:09

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH 2/4] elf: Move note processing after l_phdr is updated [BZ #26831]

* Szabolcs Nagy:

> Program headers are processed in two pass: after the first pass
> load segments are mmapped so in the second pass target specific
> note processing logic can access the notes.
>
> The second pass is moved later so various link_map fields are
> set up that may be useful for note processing such as l_phdr.
> ---
> elf/dl-load.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index ceaab7f18e..673cf960a0 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -1259,21 +1259,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> maplength, has_holes, loader);
> if (__glibc_unlikely (errstring != NULL))
> goto call_lose;
> -
> - /* Process program headers again after load segments are mapped in
> - case processing requires accessing those segments. Scan program
> - headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> - exits. */
> - for (ph = &phdr[l->l_phnum]; ph != phdr; --ph)
> - switch (ph[-1].p_type)
> - {
> - case PT_NOTE:
> - _dl_process_pt_note (l, fd, &ph[-1]);
> - break;
> - case PT_GNU_PROPERTY:
> - _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> - break;
> - }
> }
>
> if (l->l_ld == 0)
> @@ -1481,6 +1466,21 @@ cannot enable executable stack as shared object requires");
> /* Assign the next available module ID. */
> l->l_tls_modid = _dl_next_tls_modid ();
>
> + /* Process program headers again after load segments are mapped in
> + case processing requires accessing those segments. Scan program
> + headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> + exits. */
> + for (ph = &l->l_phdr[l->l_phnum]; ph != l->l_phdr; --ph)
> + switch (ph[-1].p_type)
> + {
> + case PT_NOTE:
> + _dl_process_pt_note (l, fd, &ph[-1]);
> + break;
> + case PT_GNU_PROPERTY:
> + _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> + break;
> + }
> +
> #ifdef DL_AFTER_LOAD
> DL_AFTER_LOAD (l);
> #endif

Is this still compatible with the CET requirements?

I hope it is because the CET magic happens in _dl_open_check, so after
the the code in elf/dl-load.c has run.

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

2020-11-03 12:39:11

by H.J. Lu

[permalink] [raw]
Subject: Re: [PATCH 2/4] elf: Move note processing after l_phdr is updated [BZ #26831]

On Tue, Nov 3, 2020 at 2:38 AM Florian Weimer <[email protected]> wrote:
>
> * Szabolcs Nagy:
>
> > Program headers are processed in two pass: after the first pass
> > load segments are mmapped so in the second pass target specific
> > note processing logic can access the notes.
> >
> > The second pass is moved later so various link_map fields are
> > set up that may be useful for note processing such as l_phdr.
> > ---
> > elf/dl-load.c | 30 +++++++++++++++---------------
> > 1 file changed, 15 insertions(+), 15 deletions(-)
> >
> > diff --git a/elf/dl-load.c b/elf/dl-load.c
> > index ceaab7f18e..673cf960a0 100644
> > --- a/elf/dl-load.c
> > +++ b/elf/dl-load.c
> > @@ -1259,21 +1259,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> > maplength, has_holes, loader);
> > if (__glibc_unlikely (errstring != NULL))
> > goto call_lose;
> > -
> > - /* Process program headers again after load segments are mapped in
> > - case processing requires accessing those segments. Scan program
> > - headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> > - exits. */
> > - for (ph = &phdr[l->l_phnum]; ph != phdr; --ph)
> > - switch (ph[-1].p_type)
> > - {
> > - case PT_NOTE:
> > - _dl_process_pt_note (l, fd, &ph[-1]);
> > - break;
> > - case PT_GNU_PROPERTY:
> > - _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> > - break;
> > - }
> > }
> >
> > if (l->l_ld == 0)
> > @@ -1481,6 +1466,21 @@ cannot enable executable stack as shared object requires");
> > /* Assign the next available module ID. */
> > l->l_tls_modid = _dl_next_tls_modid ();
> >
> > + /* Process program headers again after load segments are mapped in
> > + case processing requires accessing those segments. Scan program
> > + headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> > + exits. */
> > + for (ph = &l->l_phdr[l->l_phnum]; ph != l->l_phdr; --ph)
> > + switch (ph[-1].p_type)
> > + {
> > + case PT_NOTE:
> > + _dl_process_pt_note (l, fd, &ph[-1]);
> > + break;
> > + case PT_GNU_PROPERTY:
> > + _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> > + break;
> > + }
> > +
> > #ifdef DL_AFTER_LOAD
> > DL_AFTER_LOAD (l);
> > #endif
>
> Is this still compatible with the CET requirements?
>
> I hope it is because the CET magic happens in _dl_open_check, so after
> the the code in elf/dl-load.c has run.
>
>

_dl_process_pt_note and _dl_process_pt_gnu_property may call
_dl_signal_error. Are we prepared to clean more things up when it
happens? I am investigating:

https://sourceware.org/bugzilla/show_bug.cgi?id=26825

I don't think cleanup of _dl_process_pt_gnu_property failure is done
properly.

--
H.J.

2020-11-03 15:09:51

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH 2/4] elf: Move note processing after l_phdr is updated [BZ #26831]

The 11/03/2020 04:36, H.J. Lu wrote:
> On Tue, Nov 3, 2020 at 2:38 AM Florian Weimer <[email protected]> wrote:
> > * Szabolcs Nagy:
> >
> > > Program headers are processed in two pass: after the first pass
> > > load segments are mmapped so in the second pass target specific
> > > note processing logic can access the notes.
> > >
> > > The second pass is moved later so various link_map fields are
> > > set up that may be useful for note processing such as l_phdr.
> > > ---
> > > elf/dl-load.c | 30 +++++++++++++++---------------
> > > 1 file changed, 15 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/elf/dl-load.c b/elf/dl-load.c
> > > index ceaab7f18e..673cf960a0 100644
> > > --- a/elf/dl-load.c
> > > +++ b/elf/dl-load.c
> > > @@ -1259,21 +1259,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> > > maplength, has_holes, loader);
> > > if (__glibc_unlikely (errstring != NULL))
> > > goto call_lose;
> > > -
> > > - /* Process program headers again after load segments are mapped in
> > > - case processing requires accessing those segments. Scan program
> > > - headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> > > - exits. */
> > > - for (ph = &phdr[l->l_phnum]; ph != phdr; --ph)
> > > - switch (ph[-1].p_type)
> > > - {
> > > - case PT_NOTE:
> > > - _dl_process_pt_note (l, fd, &ph[-1]);
> > > - break;
> > > - case PT_GNU_PROPERTY:
> > > - _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> > > - break;
> > > - }
> > > }
> > >
> > > if (l->l_ld == 0)
> > > @@ -1481,6 +1466,21 @@ cannot enable executable stack as shared object requires");
> > > /* Assign the next available module ID. */
> > > l->l_tls_modid = _dl_next_tls_modid ();
> > >
> > > + /* Process program headers again after load segments are mapped in
> > > + case processing requires accessing those segments. Scan program
> > > + headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> > > + exits. */
> > > + for (ph = &l->l_phdr[l->l_phnum]; ph != l->l_phdr; --ph)
> > > + switch (ph[-1].p_type)
> > > + {
> > > + case PT_NOTE:
> > > + _dl_process_pt_note (l, fd, &ph[-1]);
> > > + break;
> > > + case PT_GNU_PROPERTY:
> > > + _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> > > + break;
> > > + }
> > > +
> > > #ifdef DL_AFTER_LOAD
> > > DL_AFTER_LOAD (l);
> > > #endif
> >
> > Is this still compatible with the CET requirements?
> >
> > I hope it is because the CET magic happens in _dl_open_check, so after
> > the the code in elf/dl-load.c has run.

i believe the note processing and later cet magic
are not affected by this code move.

but i did not test this with cet.

>
> _dl_process_pt_note and _dl_process_pt_gnu_property may call
> _dl_signal_error. Are we prepared to clean more things up when it
> happens? I am investigating:

yeah, this is difficult to reason about.

it seems to me that after _dl_map_object returns there
may be _dl_map_object_deps which can fail in a way that
all of dlopen has to be rolled back, so if i move things
around in _dl_map_object that should not introduce new
issues.

but it is not clear to me how robust the dlopen code is
against arbitrary failure in dl_open_worker.

>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26825
>
> I don't think cleanup of _dl_process_pt_gnu_property failure is done
> properly.
>
> --
> H.J.

--

2020-11-03 15:30:51

by H.J. Lu

[permalink] [raw]
Subject: Re: [PATCH 2/4] elf: Move note processing after l_phdr is updated [BZ #26831]

On Tue, Nov 3, 2020 at 7:04 AM Szabolcs Nagy <[email protected]> wrote:
>
> The 11/03/2020 04:36, H.J. Lu wrote:
> > On Tue, Nov 3, 2020 at 2:38 AM Florian Weimer <[email protected]> wrote:
> > > * Szabolcs Nagy:
> > >
> > > > Program headers are processed in two pass: after the first pass
> > > > load segments are mmapped so in the second pass target specific
> > > > note processing logic can access the notes.
> > > >
> > > > The second pass is moved later so various link_map fields are
> > > > set up that may be useful for note processing such as l_phdr.
> > > > ---
> > > > elf/dl-load.c | 30 +++++++++++++++---------------
> > > > 1 file changed, 15 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/elf/dl-load.c b/elf/dl-load.c
> > > > index ceaab7f18e..673cf960a0 100644
> > > > --- a/elf/dl-load.c
> > > > +++ b/elf/dl-load.c
> > > > @@ -1259,21 +1259,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> > > > maplength, has_holes, loader);
> > > > if (__glibc_unlikely (errstring != NULL))
> > > > goto call_lose;
> > > > -
> > > > - /* Process program headers again after load segments are mapped in
> > > > - case processing requires accessing those segments. Scan program
> > > > - headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> > > > - exits. */
> > > > - for (ph = &phdr[l->l_phnum]; ph != phdr; --ph)
> > > > - switch (ph[-1].p_type)
> > > > - {
> > > > - case PT_NOTE:
> > > > - _dl_process_pt_note (l, fd, &ph[-1]);
> > > > - break;
> > > > - case PT_GNU_PROPERTY:
> > > > - _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> > > > - break;
> > > > - }
> > > > }
> > > >
> > > > if (l->l_ld == 0)
> > > > @@ -1481,6 +1466,21 @@ cannot enable executable stack as shared object requires");
> > > > /* Assign the next available module ID. */
> > > > l->l_tls_modid = _dl_next_tls_modid ();
> > > >
> > > > + /* Process program headers again after load segments are mapped in
> > > > + case processing requires accessing those segments. Scan program
> > > > + headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
> > > > + exits. */
> > > > + for (ph = &l->l_phdr[l->l_phnum]; ph != l->l_phdr; --ph)
> > > > + switch (ph[-1].p_type)
> > > > + {
> > > > + case PT_NOTE:
> > > > + _dl_process_pt_note (l, fd, &ph[-1]);
> > > > + break;
> > > > + case PT_GNU_PROPERTY:
> > > > + _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> > > > + break;
> > > > + }
> > > > +
> > > > #ifdef DL_AFTER_LOAD
> > > > DL_AFTER_LOAD (l);
> > > > #endif
> > >
> > > Is this still compatible with the CET requirements?
> > >
> > > I hope it is because the CET magic happens in _dl_open_check, so after
> > > the the code in elf/dl-load.c has run.
>
> i believe the note processing and later cet magic
> are not affected by this code move.
>
> but i did not test this with cet.
>
> >
> > _dl_process_pt_note and _dl_process_pt_gnu_property may call
> > _dl_signal_error. Are we prepared to clean more things up when it
> > happens? I am investigating:
>
> yeah, this is difficult to reason about.
>
> it seems to me that after _dl_map_object returns there
> may be _dl_map_object_deps which can fail in a way that
> all of dlopen has to be rolled back, so if i move things
> around in _dl_map_object that should not introduce new
> issues.

I haven't investigated it in detail. But there are

1314 if (l->l_phdr == NULL)
1315 {
1316 /* The program header is not contained in any of the segments.
1317 We have to allocate memory ourself and copy it over from out
1318 temporary place. */
1319 ElfW(Phdr) *newp = (ElfW(Phdr) *) malloc (header->e_phnum
1320 * sizeof (ElfW(Phdr)));
1321 if (newp == NULL)
1322 {
1323 errstring = N_("cannot allocate memory for program header");
1324 goto call_lose_errno;
1325 }
1326
1327 l->l_phdr = memcpy (newp, phdr,
1328 (header->e_phnum * sizeof (ElfW(Phdr))));
1329 l->l_phdr_allocated = 1;
1330 }

When _dl_process_pt_gnu_property is moved after it, will l_phdr be
free on _dl_signal_error?

> but it is not clear to me how robust the dlopen code is
> against arbitrary failure in dl_open_worker.

I think we are mostly OK, except for some corner cases. Delay
_dl_process_pt_gnu_property may introduce more corner cases.

> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26825
> >
> > I don't think cleanup of _dl_process_pt_gnu_property failure is done
> > properly.
> >
> > --
> > H.J.
>
> --



--
H.J.

2020-11-03 17:36:31

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:

> Re-mmap executable segments instead of mprotecting them in
> case mprotect is seccomp filtered.

> For the kernel mapped main executable we don't have the fd
> for re-mmap so linux needs to be updated to add BTI. (In the
> presence of seccomp filters for mprotect(PROT_EXEC) the libc
> cannot change BTI protection at runtime based on user space
> policy so it is better if the kernel maps BTI compatible
> binaries with PROT_BTI by default.)

Given that there were still some ongoing discussions on a more robust
kernel interface here and there seem to be a few concerns with this
series should we perhaps just take a step back and disable this seccomp
filter in systemd on arm64, at least for the time being? That seems
safer than rolling out things that set ABI quickly, a big part of the
reason we went with having the dynamic linker enable PROT_BTI in the
first place was to give us more flexibility to handle any unforseen
consequences of enabling BTI that we run into. We are going to have
similar issues with other features like MTE so we need to make sure that
whatever we're doing works with them too.

Also updated to Will's current e-mail address - Will, do you have
thoughts on what we should do here?


Attachments:
(No filename) (1.28 kB)
signature.asc (499.00 B)
Download all attachments

2020-11-04 05:43:59

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

Hi,

On 11/3/20 11:34 AM, Mark Brown wrote:
> On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:
>
>> Re-mmap executable segments instead of mprotecting them in
>> case mprotect is seccomp filtered.
>
>> For the kernel mapped main executable we don't have the fd
>> for re-mmap so linux needs to be updated to add BTI. (In the
>> presence of seccomp filters for mprotect(PROT_EXEC) the libc
>> cannot change BTI protection at runtime based on user space
>> policy so it is better if the kernel maps BTI compatible
>> binaries with PROT_BTI by default.)
>
> Given that there were still some ongoing discussions on a more robust
> kernel interface here and there seem to be a few concerns with this
> series should we perhaps just take a step back and disable this seccomp
> filter in systemd on arm64, at least for the time being? That seems
> safer than rolling out things that set ABI quickly, a big part of the

So, that's a bigger hammer than I think is needed and punishes !BTI
machines. I'm going to suggest that if we need to carry a temp patch its
more like the glibc patch I mentioned in the Fedora defect. That patch
simply logs a message, on the mprotect failures rather than aborting.
Its fairly non-intrusive.

That leaves seccomp functional, and BTI generally functional except when
seccomp is restricting it. I've also been asked that if a patch like
that is needed, its (temporary?) merged to the glibc trunk, rather than
just being carried by the distro's.

Thanks,

> reason we went with having the dynamic linker enable PROT_BTI in the
> first place was to give us more flexibility to handle any unforseen
> consequences of enabling BTI that we run into. We are going to have
> similar issues with other features like MTE so we need to make sure that
> whatever we're doing works with them too.
>
> Also updated to Will's current e-mail address - Will, do you have
> thoughts on what we should do here?
>

2020-11-04 08:59:08

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

The 11/03/2020 23:41, Jeremy Linton wrote:
> On 11/3/20 11:34 AM, Mark Brown wrote:
> > On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:
> >
> > > Re-mmap executable segments instead of mprotecting them in
> > > case mprotect is seccomp filtered.
> >
> > > For the kernel mapped main executable we don't have the fd
> > > for re-mmap so linux needs to be updated to add BTI. (In the
> > > presence of seccomp filters for mprotect(PROT_EXEC) the libc
> > > cannot change BTI protection at runtime based on user space
> > > policy so it is better if the kernel maps BTI compatible
> > > binaries with PROT_BTI by default.)
> >
> > Given that there were still some ongoing discussions on a more robust
> > kernel interface here and there seem to be a few concerns with this
> > series should we perhaps just take a step back and disable this seccomp
> > filter in systemd on arm64, at least for the time being? That seems
> > safer than rolling out things that set ABI quickly, a big part of the
>
> So, that's a bigger hammer than I think is needed and punishes !BTI
> machines. I'm going to suggest that if we need to carry a temp patch its
> more like the glibc patch I mentioned in the Fedora defect. That patch
> simply logs a message, on the mprotect failures rather than aborting. Its
> fairly non-intrusive.
>
> That leaves seccomp functional, and BTI generally functional except when
> seccomp is restricting it. I've also been asked that if a patch like that is
> needed, its (temporary?) merged to the glibc trunk, rather than just being
> carried by the distro's.

note that changing mprotect into mmap in glibc works
even if the kernel or systemd decides to do things
differently: currently the only wart is that on the
main exe we have to use mprotect and silently ignore
the failures. (but e.g. some return to libc attacks
are reliably prevented since libc.so guaranteed to
have PROT_BTI this way.)

the patchset is a bit more intrusive than i hoped
for, so if we expect to get a new syscall then
simply ignoring mprotect failures may be a better
temporary solution. (and i still need to do
benchmarks to see if the cost of re-mmap is not
much more than the cost of mprotect.)

changing the seccomp filter in systemd does not
help if there are other seccomp filters elsewhere
doing the same. i think we will have to avoid using
mprotect(PROT_EXEC) or at least ignore the failure.

> > reason we went with having the dynamic linker enable PROT_BTI in the
> > first place was to give us more flexibility to handle any unforseen
> > consequences of enabling BTI that we run into. We are going to have
> > similar issues with other features like MTE so we need to make sure that
> > whatever we're doing works with them too.
> >
> > Also updated to Will's current e-mail address - Will, do you have
> > thoughts on what we should do here?
> >

2020-11-04 09:05:14

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On 3.11.2020 19.34, Mark Brown wrote:
> On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:
>
>> Re-mmap executable segments instead of mprotecting them in
>> case mprotect is seccomp filtered.
>
>> For the kernel mapped main executable we don't have the fd
>> for re-mmap so linux needs to be updated to add BTI. (In the
>> presence of seccomp filters for mprotect(PROT_EXEC) the libc
>> cannot change BTI protection at runtime based on user space
>> policy so it is better if the kernel maps BTI compatible
>> binaries with PROT_BTI by default.)
>
> Given that there were still some ongoing discussions on a more robust
> kernel interface here and there seem to be a few concerns with this
> series should we perhaps just take a step back and disable this seccomp
> filter in systemd on arm64, at least for the time being?

Filtering mprotect() and mmap() with seccomp also protects BTI, since
without it the attacker could remove PROT_BTI from existing pages, or
map new pages without BTI. This would be possible even with SARA or
SELinux execmem protections enabled, since they don't care about PROT_BTI.

-Topi

2020-11-04 09:22:52

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Tue, Nov 03, 2020 at 05:34:38PM +0000, Mark Brown wrote:
> On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:
>
> > Re-mmap executable segments instead of mprotecting them in
> > case mprotect is seccomp filtered.
>
> > For the kernel mapped main executable we don't have the fd
> > for re-mmap so linux needs to be updated to add BTI. (In the
> > presence of seccomp filters for mprotect(PROT_EXEC) the libc
> > cannot change BTI protection at runtime based on user space
> > policy so it is better if the kernel maps BTI compatible
> > binaries with PROT_BTI by default.)
>
> Given that there were still some ongoing discussions on a more robust
> kernel interface here and there seem to be a few concerns with this
> series should we perhaps just take a step back and disable this seccomp
> filter in systemd on arm64, at least for the time being? That seems
> safer than rolling out things that set ABI quickly, a big part of the
> reason we went with having the dynamic linker enable PROT_BTI in the
> first place was to give us more flexibility to handle any unforseen
> consequences of enabling BTI that we run into. We are going to have
> similar issues with other features like MTE so we need to make sure that
> whatever we're doing works with them too.
>
> Also updated to Will's current e-mail address - Will, do you have
> thoughts on what we should do here?

Changing the kernel to map the main executable with PROT_BTI by default is a
user-visible change in behaviour and not without risk, so if we're going to
do that then it needs to be opt-in because the current behaviour has been
there since 5.8. I suppose we could shoe-horn in a cmdline option for 5.10
(which will be the first LTS with BTI) but it would be better to put up with
the current ABI if possible.

Is there real value in this seccomp filter if it only looks at mprotect(),
or was it just implemented because it's easy to do and sounds like a good
idea?

Will

2020-11-04 09:31:37

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

* Will Deacon:

> Is there real value in this seccomp filter if it only looks at mprotect(),
> or was it just implemented because it's easy to do and sounds like a good
> idea?

It seems bogus to me. Everyone will just create alias mappings instead,
just like they did for the similar SELinux feature. See “Example code
to avoid execmem violations” in:

<https://www.akkadia.org/drepper/selinux-mem.html>

As you can see, this reference implementation creates a PROT_WRITE
mapping aliased to a PROT_EXEC mapping, so it actually reduces security
compared to something that generates the code in an anonymous mapping
and calls mprotect to make it executable.

Furthermore, it requires unusual cache flushing code on some AArch64
implementations (a requirement that is not shared by any Linux other
architecture to which libffi has been ported), resulting in
hard-to-track-down real-world bugs.

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

2020-11-04 09:58:04

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On 4.11.2020 11.29, Florian Weimer wrote:
> * Will Deacon:
>
>> Is there real value in this seccomp filter if it only looks at mprotect(),
>> or was it just implemented because it's easy to do and sounds like a good
>> idea?
>
> It seems bogus to me. Everyone will just create alias mappings instead,
> just like they did for the similar SELinux feature. See “Example code
> to avoid execmem violations” in:
>
> <https://www.akkadia.org/drepper/selinux-mem.html>

Also note "But this is very dangerous: programs should never use memory
regions which are writable and executable at the same time. Assuming
that it is really necessary to generate executable code while the
program runs the method employed should be reconsidered."

> As you can see, this reference implementation creates a PROT_WRITE
> mapping aliased to a PROT_EXEC mapping, so it actually reduces security
> compared to something that generates the code in an anonymous mapping
> and calls mprotect to make it executable.

Drepper's methods to avoid SELinux protections are indeed the two ways
(which I'm aware) to also avoid the seccomp filter: by using
memfd_create() and using a file system which writable and executable to
the process to create a new executable file. Both methods can be
eliminated for many system services, memfd_create() with seccomp and
filesystem W&X with mount namespaces.

If a service legitimately needs executable and writable mappings (due to
JIT, trampolines etc), it's easy to disable the filter whenever really
needed with "MemoryDenyWriteExecute=no" (which is the default) in case
of systemd or a TE rule like "allow type_t self:process { execmem };"
for SELinux. But this shouldn't be the default case, since there are
many services which don't need W&X.

I'd also question what is the value of BTI if it can be easily
circumvented by removing PROT_BTI with mprotect()?

-Topi

2020-11-04 10:53:54

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Tue, Nov 03, 2020 at 11:41:42PM -0600, Jeremy Linton wrote:
> On 11/3/20 11:34 AM, Mark Brown wrote:

> > Given that there were still some ongoing discussions on a more robust
> > kernel interface here and there seem to be a few concerns with this
> > series should we perhaps just take a step back and disable this seccomp
> > filter in systemd on arm64, at least for the time being? That seems
> > safer than rolling out things that set ABI quickly, a big part of the

> So, that's a bigger hammer than I think is needed and punishes !BTI
> machines. I'm going to suggest that if we need to carry a temp patch its
> more like the glibc patch I mentioned in the Fedora defect. That patch
> simply logs a message, on the mprotect failures rather than aborting. Its
> fairly non-intrusive.

> That leaves seccomp functional, and BTI generally functional except when
> seccomp is restricting it. I've also been asked that if a patch like that is
> needed, its (temporary?) merged to the glibc trunk, rather than just being
> carried by the distro's.

The effect on pre-BTI hardware is an issue, another option would be for
systemd to disable this seccomp usage but only after checking for BTI
support in the system rather than just doing so purely based on the
architecture.


Attachments:
(No filename) (1.27 kB)
signature.asc (499.00 B)
Download all attachments

2020-11-04 14:37:31

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Wed, Nov 04, 2020 at 11:55:57AM +0200, Topi Miettinen wrote:
> On 4.11.2020 11.29, Florian Weimer wrote:
> > * Will Deacon:
> >
> > > Is there real value in this seccomp filter if it only looks at mprotect(),
> > > or was it just implemented because it's easy to do and sounds like a good
> > > idea?
> >
> > It seems bogus to me. Everyone will just create alias mappings instead,
> > just like they did for the similar SELinux feature. See “Example code
> > to avoid execmem violations” in:
> >
> > <https://www.akkadia.org/drepper/selinux-mem.html>
[...]
> > As you can see, this reference implementation creates a PROT_WRITE
> > mapping aliased to a PROT_EXEC mapping, so it actually reduces security
> > compared to something that generates the code in an anonymous mapping
> > and calls mprotect to make it executable.
[...]
> If a service legitimately needs executable and writable mappings (due to
> JIT, trampolines etc), it's easy to disable the filter whenever really
> needed with "MemoryDenyWriteExecute=no" (which is the default) in case of
> systemd or a TE rule like "allow type_t self:process { execmem };" for
> SELinux. But this shouldn't be the default case, since there are many
> services which don't need W&X.

I think Drepper's point is that separate X and W mappings, with enough
randomisation, would be more secure than allowing W&X at the same
address (but, of course, less secure than not having W at all, though
that's not always possible).

> I'd also question what is the value of BTI if it can be easily circumvented
> by removing PROT_BTI with mprotect()?

Well, BTI is a protection against JOP attacks. The assumption here is
that an attacker cannot invoke mprotect() to disable PROT_BTI. If it
can, it's probably not worth bothering with a subsequent JOP attack, it
can already call functions directly.

I see MDWX not as a way of detecting attacks but rather plugging
inadvertent security holes in certain programs. On arm64, such hardening
currently gets in the way of another hardening feature, BTI.

--
Catalin

2020-11-04 14:46:29

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Wed, Nov 04, 2020 at 08:57:05AM +0000, Szabolcs Nagy wrote:
> The 11/03/2020 23:41, Jeremy Linton wrote:
> > On 11/3/20 11:34 AM, Mark Brown wrote:
> > > On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:
> > >
> > > > Re-mmap executable segments instead of mprotecting them in
> > > > case mprotect is seccomp filtered.
> > >
> > > > For the kernel mapped main executable we don't have the fd
> > > > for re-mmap so linux needs to be updated to add BTI. (In the
> > > > presence of seccomp filters for mprotect(PROT_EXEC) the libc
> > > > cannot change BTI protection at runtime based on user space
> > > > policy so it is better if the kernel maps BTI compatible
> > > > binaries with PROT_BTI by default.)
> > >
> > > Given that there were still some ongoing discussions on a more robust
> > > kernel interface here and there seem to be a few concerns with this
> > > series should we perhaps just take a step back and disable this seccomp
> > > filter in systemd on arm64, at least for the time being? That seems
> > > safer than rolling out things that set ABI quickly, a big part of the
> >
> > So, that's a bigger hammer than I think is needed and punishes !BTI
> > machines. I'm going to suggest that if we need to carry a temp patch its
> > more like the glibc patch I mentioned in the Fedora defect. That patch
> > simply logs a message, on the mprotect failures rather than aborting. Its
> > fairly non-intrusive.
> >
> > That leaves seccomp functional, and BTI generally functional except when
> > seccomp is restricting it. I've also been asked that if a patch like that is
> > needed, its (temporary?) merged to the glibc trunk, rather than just being
> > carried by the distro's.
>
> note that changing mprotect into mmap in glibc works
> even if the kernel or systemd decides to do things
> differently: currently the only wart is that on the
> main exe we have to use mprotect and silently ignore
> the failures.

Can the dynamic loader mmap() the main exe again while munmap'ing the
original one? (sorry if it was already discussed)

--
Catalin

2020-11-04 14:50:16

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

* Catalin Marinas:

> Can the dynamic loader mmap() the main exe again while munmap'ing the
> original one? (sorry if it was already discussed)

No, we don't have a descriptor for that. /proc may not be mounted, and
using the path stored there has a race condition anyway.

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

2020-11-04 16:11:49

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

The 11/04/2020 17:19, Topi Miettinen wrote:
> On 4.11.2020 16.35, Catalin Marinas wrote:
> > On Wed, Nov 04, 2020 at 11:55:57AM +0200, Topi Miettinen wrote:
> > > On 4.11.2020 11.29, Florian Weimer wrote:
> > > > * Will Deacon:
> > > > > Is there real value in this seccomp filter if it only looks at mprotect(),
> > > > > or was it just implemented because it's easy to do and sounds like a good
> > > > > idea?
> > > >
> > > > It seems bogus to me. Everyone will just create alias mappings instead,
> > > > just like they did for the similar SELinux feature. See “Example code
> > > > to avoid execmem violations” in:
> > > >
> > > > <https://www.akkadia.org/drepper/selinux-mem.html>
> > [...]
> > > > As you can see, this reference implementation creates a PROT_WRITE
> > > > mapping aliased to a PROT_EXEC mapping, so it actually reduces security
> > > > compared to something that generates the code in an anonymous mapping
> > > > and calls mprotect to make it executable.
> > [...]
> > > If a service legitimately needs executable and writable mappings (due to
> > > JIT, trampolines etc), it's easy to disable the filter whenever really
> > > needed with "MemoryDenyWriteExecute=no" (which is the default) in case of
> > > systemd or a TE rule like "allow type_t self:process { execmem };" for
> > > SELinux. But this shouldn't be the default case, since there are many
> > > services which don't need W&X.
> >
> > I think Drepper's point is that separate X and W mappings, with enough
> > randomisation, would be more secure than allowing W&X at the same
> > address (but, of course, less secure than not having W at all, though
> > that's not always possible).
> >
> > > I'd also question what is the value of BTI if it can be easily circumvented
> > > by removing PROT_BTI with mprotect()?
> >
> > Well, BTI is a protection against JOP attacks. The assumption here is
> > that an attacker cannot invoke mprotect() to disable PROT_BTI. If it
> > can, it's probably not worth bothering with a subsequent JOP attack, it
> > can already call functions directly.
>
> I suppose that the target for the attacker is to eventually perform system
> calls rather than looping forever in JOP/ROP gadgets.
>
> > I see MDWX not as a way of detecting attacks but rather plugging
> > inadvertent security holes in certain programs. On arm64, such hardening
> > currently gets in the way of another hardening feature, BTI.
>
> I don't think it has to get in the way at all. Why wouldn't something simple
> like this work:

PROT_BTI is only valid on binaries that are BTI compatible.
to detect that, the load segments must be already mapped.

AT_BTI does not solve this: we want to be able to load legacy
elf modules. (a BTI enforcement setting may be useful where
incompatible modules are rejected, but that cannot be the
default for backward compatibility reasons.)

>
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index 646c5dca40..12a74d15e8 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -1170,8 +1170,13 @@ _dl_map_object_from_fd (const char *name, const char
> *origname, int fd,
> c->prot |= PROT_READ;
> if (ph->p_flags & PF_W)
> c->prot |= PROT_WRITE;
> - if (ph->p_flags & PF_X)
> + if (ph->p_flags & PF_X) {
> c->prot |= PROT_EXEC;
> +#ifdef PROT_BTI
> + if (GLRO(dl_bti) & 1)
> + c->prot |= PROT_BTI;
> +#endif
> + }
> #endif
> break;
>
> diff --git a/elf/dl-support.c b/elf/dl-support.c
> index 7704c101c5..22c7cc7b81 100644
> --- a/elf/dl-support.c
> +++ b/elf/dl-support.c
> @@ -222,7 +222,7 @@ __rtld_lock_define_initialized_recursive (,
> _dl_load_write_lock)
>
>
> #ifdef HAVE_AUX_VECTOR
> -int _dl_clktck;
> +int _dl_clktck, _dl_bti;
>
> void
> _dl_aux_init (ElfW(auxv_t) *av)
> @@ -294,6 +294,11 @@ _dl_aux_init (ElfW(auxv_t) *av)
> case AT_RANDOM:
> _dl_random = (void *) av->a_un.a_val;
> break;
> +#ifdef PROT_BTI
> + case AT_BTI:
> + _dl_bti = (void *) av->a_un.a_val;
> + break;
> +#endif
> DL_PLATFORM_AUXV
> }
> if (seen == 0xf)
>
> Kernel sets the aux vector to indicate that BTI should be enabled for all
> segments and main exe is already protected.
>
> -Topi

2020-11-04 17:40:12

by Mark Rutland

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Wed, Nov 04, 2020 at 11:55:57AM +0200, Topi Miettinen wrote:
> On 4.11.2020 11.29, Florian Weimer wrote:
> > * Will Deacon:
> >
> > > Is there real value in this seccomp filter if it only looks at mprotect(),
> > > or was it just implemented because it's easy to do and sounds like a good
> > > idea?
> >
> > It seems bogus to me. Everyone will just create alias mappings instead,
> > just like they did for the similar SELinux feature. See “Example code
> > to avoid execmem violations” in:
> >
> > <https://www.akkadia.org/drepper/selinux-mem.html>
>
> Also note "But this is very dangerous: programs should never use memory
> regions which are writable and executable at the same time. Assuming that it
> is really necessary to generate executable code while the program runs the
> method employed should be reconsidered."

Sure, and to be clear we're not trying to violate the "at the same time"
property. We do not want to permit simultaneous PROT_WRITE and PROT_EXEC
at any instant in time. What we're asking is to not block changing
permissions to PROT_EXEC in the absence of PROT_WRITE.

I think that the goal of preventing WRITE -> EXEC transitions for some
memory is sane, but I think the existing kernel primitives available to
systemd don't allow us to do that in a robust way because we don't have
all the relevant state tracked and accessible, and the existing approach
gets in the way of doing the right thing for other mitigations.

Consequently I think it would be better going forward to add a more
robust (kernel) mechanism for enforcement that can distinguish
WRITE->EXEC from EXEC->EXEC+BTI, and e.g. can be used to forbid aliasing
mappings with differing W/X permissions. Then userspace could eventually
transition over to that and get /stronger/ protection while permitting
the BTI case we'd like to work now.

> If a service legitimately needs executable and writable mappings (due to
> JIT, trampolines etc), it's easy to disable the filter whenever really
> needed with "MemoryDenyWriteExecute=no" (which is the default) in case of
> systemd or a TE rule like "allow type_t self:process { execmem };" for
> SELinux. But this shouldn't be the default case, since there are many
> services which don't need W&X.
>
> I'd also question what is the value of BTI if it can be easily circumvented
> by removing PROT_BTI with mprotect()?

I agree that turning BTI off is a concern, and to that end I'd like to
add an enforcement mechanism whereby we could prevent that (ideally the
same mechanism by which we could prevent WRITE -> EXEC transitions).

But, as with all things it's a matter of degree. MDWE and BTI are both
hurdles to an adversary, but neither are absolutes and there are
approaches to bypass either. By the time someone's issuing mprotect()
with an arbitrary VA and/or prot, they are liable to have been able to
do the same with mmap() and circumvent MDWE.

I'd really like to not have BTI silently disabled in order to work with
MDWE, because the risk is that it gets silently disabled elsewhere. The
risk of the changing the kernel to enable BTI for a binary is not well
known since we don't control other peoples libraries that might end up
not being compatible somehow with that. The risk of disabling a portion
of the MDWE protections seems to be the least out of the options we have
available, as unfortunate as it seems, and I think we can come up with a
better MDWE approach going forward.

Thanks,
Mark.

2020-11-04 18:09:11

by Topi Miettinen

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On 4.11.2020 16.35, Catalin Marinas wrote:
> On Wed, Nov 04, 2020 at 11:55:57AM +0200, Topi Miettinen wrote:
>> On 4.11.2020 11.29, Florian Weimer wrote:
>>> * Will Deacon:
>>>
>>>> Is there real value in this seccomp filter if it only looks at mprotect(),
>>>> or was it just implemented because it's easy to do and sounds like a good
>>>> idea?
>>>
>>> It seems bogus to me. Everyone will just create alias mappings instead,
>>> just like they did for the similar SELinux feature. See “Example code
>>> to avoid execmem violations” in:
>>>
>>> <https://www.akkadia.org/drepper/selinux-mem.html>
> [...]
>>> As you can see, this reference implementation creates a PROT_WRITE
>>> mapping aliased to a PROT_EXEC mapping, so it actually reduces security
>>> compared to something that generates the code in an anonymous mapping
>>> and calls mprotect to make it executable.
> [...]
>> If a service legitimately needs executable and writable mappings (due to
>> JIT, trampolines etc), it's easy to disable the filter whenever really
>> needed with "MemoryDenyWriteExecute=no" (which is the default) in case of
>> systemd or a TE rule like "allow type_t self:process { execmem };" for
>> SELinux. But this shouldn't be the default case, since there are many
>> services which don't need W&X.
>
> I think Drepper's point is that separate X and W mappings, with enough
> randomisation, would be more secure than allowing W&X at the same
> address (but, of course, less secure than not having W at all, though
> that's not always possible).
>
>> I'd also question what is the value of BTI if it can be easily circumvented
>> by removing PROT_BTI with mprotect()?
>
> Well, BTI is a protection against JOP attacks. The assumption here is
> that an attacker cannot invoke mprotect() to disable PROT_BTI. If it
> can, it's probably not worth bothering with a subsequent JOP attack, it
> can already call functions directly.

I suppose that the target for the attacker is to eventually perform
system calls rather than looping forever in JOP/ROP gadgets.

> I see MDWX not as a way of detecting attacks but rather plugging
> inadvertent security holes in certain programs. On arm64, such hardening
> currently gets in the way of another hardening feature, BTI.

I don't think it has to get in the way at all. Why wouldn't something
simple like this work:

diff --git a/elf/dl-load.c b/elf/dl-load.c
index 646c5dca40..12a74d15e8 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -1170,8 +1170,13 @@ _dl_map_object_from_fd (const char *name, const
char *origname, int fd,
c->prot |= PROT_READ;
if (ph->p_flags & PF_W)
c->prot |= PROT_WRITE;
- if (ph->p_flags & PF_X)
+ if (ph->p_flags & PF_X) {
c->prot |= PROT_EXEC;
+#ifdef PROT_BTI
+ if (GLRO(dl_bti) & 1)
+ c->prot |= PROT_BTI;
+#endif
+ }
#endif
break;

diff --git a/elf/dl-support.c b/elf/dl-support.c
index 7704c101c5..22c7cc7b81 100644
--- a/elf/dl-support.c
+++ b/elf/dl-support.c
@@ -222,7 +222,7 @@ __rtld_lock_define_initialized_recursive (,
_dl_load_write_lock)


#ifdef HAVE_AUX_VECTOR
-int _dl_clktck;
+int _dl_clktck, _dl_bti;

void
_dl_aux_init (ElfW(auxv_t) *av)
@@ -294,6 +294,11 @@ _dl_aux_init (ElfW(auxv_t) *av)
case AT_RANDOM:
_dl_random = (void *) av->a_un.a_val;
break;
+#ifdef PROT_BTI
+ case AT_BTI:
+ _dl_bti = (void *) av->a_un.a_val;
+ break;
+#endif
DL_PLATFORM_AUXV
}
if (seen == 0xf)

Kernel sets the aux vector to indicate that BTI should be enabled for
all segments and main exe is already protected.

-Topi

2020-11-04 19:21:56

by Mark Brown

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Wed, Nov 04, 2020 at 12:47:09PM -0600, Jeremy Linton wrote:
> On 11/4/20 4:50 AM, Mark Brown wrote:

> > The effect on pre-BTI hardware is an issue, another option would be for
> > systemd to disable this seccomp usage but only after checking for BTI
> > support in the system rather than just doing so purely based on the
> > architecture.

> That works, but your also losing seccomp in the case where the machine is
> BTI capable, but the service isn't. So it should really be checking the elf
> notes, but at that point you might just as well patch glibc.

True, I guess I was assuming that a BTI rebuild is done at the distro
level but of course even if that's the case a system could have third
party binaries so you can't just assume that the world is BTI.


Attachments:
(No filename) (781.00 B)
signature.asc (499.00 B)
Download all attachments

2020-11-04 19:26:46

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

Hi,

On 11/4/20 9:20 AM, Mark Rutland wrote:
> On Wed, Nov 04, 2020 at 11:55:57AM +0200, Topi Miettinen wrote:
>> On 4.11.2020 11.29, Florian Weimer wrote:
>>> * Will Deacon:
>>>
>>>> Is there real value in this seccomp filter if it only looks at mprotect(),
>>>> or was it just implemented because it's easy to do and sounds like a good
>>>> idea?
>>>
>>> It seems bogus to me. Everyone will just create alias mappings instead,
>>> just like they did for the similar SELinux feature. See “Example code
>>> to avoid execmem violations” in:
>>>
>>> <https://www.akkadia.org/drepper/selinux-mem.html>
>>
>> Also note "But this is very dangerous: programs should never use memory
>> regions which are writable and executable at the same time. Assuming that it
>> is really necessary to generate executable code while the program runs the
>> method employed should be reconsidered."
>
> Sure, and to be clear we're not trying to violate the "at the same time"
> property. We do not want to permit simultaneous PROT_WRITE and PROT_EXEC
> at any instant in time. What we're asking is to not block changing
> permissions to PROT_EXEC in the absence of PROT_WRITE.
>
> I think that the goal of preventing WRITE -> EXEC transitions for some
> memory is sane, but I think the existing kernel primitives available to
> systemd don't allow us to do that in a robust way because we don't have
> all the relevant state tracked and accessible, and the existing approach
> gets in the way of doing the right thing for other mitigations.
>
> Consequently I think it would be better going forward to add a more
> robust (kernel) mechanism for enforcement that can distinguish
> WRITE->EXEC from EXEC->EXEC+BTI, and e.g. can be used to forbid aliasing
> mappings with differing W/X permissions. Then userspace could eventually
> transition over to that and get /stronger/ protection while permitting
> the BTI case we'd like to work now.
>
>> If a service legitimately needs executable and writable mappings (due to
>> JIT, trampolines etc), it's easy to disable the filter whenever really
>> needed with "MemoryDenyWriteExecute=no" (which is the default) in case of
>> systemd or a TE rule like "allow type_t self:process { execmem };" for
>> SELinux. But this shouldn't be the default case, since there are many
>> services which don't need W&X.
>>
>> I'd also question what is the value of BTI if it can be easily circumvented
>> by removing PROT_BTI with mprotect()?
>
> I agree that turning BTI off is a concern, and to that end I'd like to
> add an enforcement mechanism whereby we could prevent that (ideally the
> same mechanism by which we could prevent WRITE -> EXEC transitions).
>
> But, as with all things it's a matter of degree. MDWE and BTI are both
> hurdles to an adversary, but neither are absolutes and there are
> approaches to bypass either. By the time someone's issuing mprotect()
> with an arbitrary VA and/or prot, they are liable to have been able to
> do the same with mmap() and circumvent MDWE.
>
> I'd really like to not have BTI silently disabled in order to work with
> MDWE, because the risk is that it gets silently disabled elsewhere. The
> risk of the changing the kernel to enable BTI for a binary is not well
> known since we don't control other peoples libraries that might end up
> not being compatible somehow with that. The risk of disabling a portion
> of the MDWE protections seems to be the least out of the options we have
> available, as unfortunate as it seems, and I think we can come up with a
> better MDWE approach going forward.

OTOH, You don't really want to blanket disable either protection, and
unfortunately you can't really tell until its too late if the service
is fully BTI enabled. So you either end up disabling MDWE unnecessarily,
or you delay until the only choice is not enabling BTI.

I guess there is another option too, which is some kind of delayed MDWE
policy that only turns on once the service has started, but that isn't
ideal either.

.

>
> Thanks,
> Mark.
>

2020-11-04 20:37:27

by Jeremy Linton

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

Hi,

On 11/4/20 4:50 AM, Mark Brown wrote:
> On Tue, Nov 03, 2020 at 11:41:42PM -0600, Jeremy Linton wrote:
>> On 11/3/20 11:34 AM, Mark Brown wrote:
>
>>> Given that there were still some ongoing discussions on a more robust
>>> kernel interface here and there seem to be a few concerns with this
>>> series should we perhaps just take a step back and disable this seccomp
>>> filter in systemd on arm64, at least for the time being? That seems
>>> safer than rolling out things that set ABI quickly, a big part of the
>
>> So, that's a bigger hammer than I think is needed and punishes !BTI
>> machines. I'm going to suggest that if we need to carry a temp patch its
>> more like the glibc patch I mentioned in the Fedora defect. That patch
>> simply logs a message, on the mprotect failures rather than aborting. Its
>> fairly non-intrusive.
>
>> That leaves seccomp functional, and BTI generally functional except when
>> seccomp is restricting it. I've also been asked that if a patch like that is
>> needed, its (temporary?) merged to the glibc trunk, rather than just being
>> carried by the distro's.
>
> The effect on pre-BTI hardware is an issue, another option would be for
> systemd to disable this seccomp usage but only after checking for BTI
> support in the system rather than just doing so purely based on the
> architecture.
>

That works, but your also losing seccomp in the case where the machine
is BTI capable, but the service isn't. So it should really be checking
the elf notes, but at that point you might just as well patch glibc.

2020-11-05 11:35:29

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH 0/4] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

The 11/04/2020 09:20, Will Deacon wrote:
> On Tue, Nov 03, 2020 at 05:34:38PM +0000, Mark Brown wrote:
> > On Tue, Nov 03, 2020 at 10:25:37AM +0000, Szabolcs Nagy wrote:
> >
> > > Re-mmap executable segments instead of mprotecting them in
> > > case mprotect is seccomp filtered.
> >
> > > For the kernel mapped main executable we don't have the fd
> > > for re-mmap so linux needs to be updated to add BTI. (In the
> > > presence of seccomp filters for mprotect(PROT_EXEC) the libc
> > > cannot change BTI protection at runtime based on user space
> > > policy so it is better if the kernel maps BTI compatible
> > > binaries with PROT_BTI by default.)
> >
> > Given that there were still some ongoing discussions on a more robust
> > kernel interface here and there seem to be a few concerns with this
> > series should we perhaps just take a step back and disable this seccomp
> > filter in systemd on arm64, at least for the time being? That seems
> > safer than rolling out things that set ABI quickly, a big part of the
> > reason we went with having the dynamic linker enable PROT_BTI in the
> > first place was to give us more flexibility to handle any unforseen
> > consequences of enabling BTI that we run into. We are going to have
> > similar issues with other features like MTE so we need to make sure that
> > whatever we're doing works with them too.
> >
> > Also updated to Will's current e-mail address - Will, do you have
> > thoughts on what we should do here?
>
> Changing the kernel to map the main executable with PROT_BTI by default is a
> user-visible change in behaviour and not without risk, so if we're going to
> do that then it needs to be opt-in because the current behaviour has been
> there since 5.8. I suppose we could shoe-horn in a cmdline option for 5.10
> (which will be the first LTS with BTI) but it would be better to put up with
> the current ABI if possible.

it's not clear to me how adding PROT_BTI in
the kernel would be observable in practice.

adding PROT_BTI to marked elf modules should
only have effect in cases when the process does
invalid operations and then there is no compat
requirement. if this is not the case then adding
PROT_BTI on static exe is already problematic.

if there is some issue with bti that makes
users want to turn it off, then they should do
it system wide or may be we can have a no-bti
option in userspace which uses mprotect to turn
it off (but since that has to be an explicit
opt-out i don't mind if the user also has to
disable the seccomp sandbox).

> Is there real value in this seccomp filter if it only looks at mprotect(),
> or was it just implemented because it's easy to do and sounds like a good
> idea?

i'm fine with just using mprotect and telling
users to remove the seccomp filter. but that
makes bti less attractive for deployment.