2020-11-27 13:22:27

by Szabolcs Nagy

Subject: [PATCH v2 0/6] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

This is v2 of
https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html

To enable BTI support, re-mmap executable segments instead of
mprotecting them in case mprotect is seccomp filtered.
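
As an illustration of the idea (this is a hedged, standalone sketch,
not the glibc code; the PROT_BTI value and the helper name are made up
for the example):

  /* Add PROT_BTI to an already mapped executable segment.  When the
     file descriptor of the ELF module is available, re-mmap the
     segment with MAP_FIXED so the kernel sees a plain file-backed
     mmap rather than mprotect(PROT_EXEC), which seccomp policies may
     reject; otherwise fall back to mprotect.  */
  #include <sys/mman.h>
  #include <sys/types.h>

  #ifndef PROT_BTI
  # define PROT_BTI 0x10  /* aarch64 value, assumed for illustration.  */
  #endif

  int
  add_prot_bti (void *start, size_t len, int prot, int fd, off_t off)
  {
    if (fd == -1)
      return mprotect (start, len, prot | PROT_BTI);
    void *p = mmap (start, len, prot | PROT_BTI,
                    MAP_PRIVATE | MAP_FIXED, fd, off);
    return p == MAP_FAILED ? -1 : 0;
  }

Note that MAP_FIXED atomically replaces the existing mapping, so the
segment contents are re-read from the file rather than preserved,
which is why this only works while the fd is still open.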

I would like Linux to change to map the main exe with PROT_BTI when
it is marked as BTI compatible. From the Linux side I heard the
following concerns about this:
- It's an ABI change, so it requires some ABI bump. (This is fine with
  me; I think glibc does not care about backward compatibility here as
  nothing can reasonably rely on the current behaviour, but if we get a
  new bit in auxv or similar then we can save one mprotect call.)
- In case we discover compatibility issues with user binaries, it is
  better if userspace can easily disable BTI (e.g. by removing the
  mprotect based on some env var). But if the kernel adds PROT_BTI and
  mprotect is filtered, then we have no reliable way to remove it from
  executables. (This problem already exists for statically linked
  exes, although admittedly those are less of a compat concern.)
- Ideally PROT_BTI would be added via a new syscall that does not
  interfere with PROT_EXEC filtering. (This does not conflict with
  the current patches: even with a new syscall we need a fallback.)
- Solve it in systemd (e.g. turn off the filter, use a better filter):
  I would prefer not to have aarch64 (or BTI) specific policy in
  user code, and there was no satisfying way to do this portably.

Other concerns about the approach:
- mmap is more expensive than mprotect: in my measurements using
  mmap instead of mprotect is 3-8x slower (and after mmap the pages
  have to be faulted in again), but e.g. the exec time of a program
  with 4 deps only increases by < 8% due to the 4 new mmaps. (The
  kernel side resource usage may increase too; I did not look at that.)
- _dl_signal_error is not valid from the _dl_process_gnu_property
  hook. The v2 set addresses this problem: I could either propagate
  the errors up until they can be handled, or solve it in the aarch64
  backend by first recording failures and then dealing with them in
  _dl_open_check. I chose the latter, but did some refactoring in
  _dl_map_object_from_fd that makes the former possible too.

v2:
- [1/6]: New patch that fixes a missed BTI bug found during v2.
- [2-3/6]: New, _dl_map_object_from_fd failure handling improvements,
these are independent of the rest of the series.
- [4/6]: Move the note handling to a different place (after l_phdr
setup, but before fd is closed).
- [5/6]: Rebased.
- [6/6]: First record errors and only report them later. (this fixes
various failure handling issues.)

Szabolcs Nagy (6):
aarch64: Fix missing BTI protection from dependencies [BZ #26926]
elf: lose is closely tied to _dl_map_object_from_fd
elf: Fix failure handling in _dl_map_object_from_fd
elf: Move note processing after l_phdr is updated
elf: Pass the fd to note processing
aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

elf/dl-load.c | 110 ++++++++++++++++++++-----------------
elf/rtld.c | 4 +-
sysdeps/aarch64/dl-bti.c | 74 ++++++++++++++++++-------
sysdeps/aarch64/dl-prop.h | 14 +++--
sysdeps/aarch64/linkmap.h | 2 +-
sysdeps/generic/dl-prop.h | 6 +-
sysdeps/generic/ldsodefs.h | 5 +-
sysdeps/x86/dl-prop.h | 6 +-
8 files changed, 135 insertions(+), 86 deletions(-)

--
2.17.1


2020-11-27 13:23:41

by Szabolcs Nagy

Subject: [PATCH v2 1/6] aarch64: Fix missing BTI protection from dependencies [BZ #26926]

The _dl_open_check and _rtld_main_check hooks are not called on the
dependencies of a loaded module, so BTI protection was missed on
every module other than the main executable and directly dlopened
libraries.

The fix just iterates over dependencies to enable BTI.

Fixes bug 26926.
---
sysdeps/aarch64/dl-bti.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
index 196e462520..8f4728adce 100644
--- a/sysdeps/aarch64/dl-bti.c
+++ b/sysdeps/aarch64/dl-bti.c
@@ -51,11 +51,24 @@ enable_bti (struct link_map *map, const char *program)
return 0;
}

-/* Enable BTI for L if required. */
+/* Enable BTI for MAP and its dependencies. */

void
-_dl_bti_check (struct link_map *l, const char *program)
+_dl_bti_check (struct link_map *map, const char *program)
{
- if (GLRO(dl_aarch64_cpu_features).bti && l->l_mach.bti)
- enable_bti (l, program);
+ if (!GLRO(dl_aarch64_cpu_features).bti)
+ return;
+
+ if (map->l_mach.bti)
+ enable_bti (map, program);
+
+ unsigned int i = map->l_searchlist.r_nlist;
+ while (i-- > 0)
+ {
+ struct link_map *l = map->l_initfini[i];
+ if (l->l_init_called)
+ continue;
+ if (l->l_mach.bti)
+ enable_bti (l, program);
+ }
}
--
2.17.1

2020-11-27 13:23:54

by Szabolcs Nagy

Subject: [PATCH v2 2/6] elf: lose is closely tied to _dl_map_object_from_fd

Simple refactoring to keep failure handling next to
_dl_map_object_from_fd.
---
elf/dl-load.c | 48 ++++++++++++++++++++++++------------------------
1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index f3201e7c14..21e55deb19 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -835,30 +835,6 @@ _dl_init_paths (const char *llp, const char *source)
}


-static void
-__attribute__ ((noreturn, noinline))
-lose (int code, int fd, const char *name, char *realname, struct link_map *l,
- const char *msg, struct r_debug *r, Lmid_t nsid)
-{
- /* The file might already be closed. */
- if (fd != -1)
- (void) __close_nocancel (fd);
- if (l != NULL && l->l_origin != (char *) -1l)
- free ((char *) l->l_origin);
- free (l);
- free (realname);
-
- if (r != NULL)
- {
- r->r_state = RT_CONSISTENT;
- _dl_debug_state ();
- LIBC_PROBE (map_failed, 2, nsid, r);
- }
-
- _dl_signal_error (code, name, NULL, msg);
-}
-
-
/* Process PT_GNU_PROPERTY program header PH in module L after
PT_LOAD segments are mapped. Only one NT_GNU_PROPERTY_TYPE_0
note is handled which contains processor specific properties. */
@@ -930,6 +906,30 @@ _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
}


+static void
+__attribute__ ((noreturn, noinline))
+lose (int code, int fd, const char *name, char *realname, struct link_map *l,
+ const char *msg, struct r_debug *r, Lmid_t nsid)
+{
+ /* The file might already be closed. */
+ if (fd != -1)
+ (void) __close_nocancel (fd);
+ if (l != NULL && l->l_origin != (char *) -1l)
+ free ((char *) l->l_origin);
+ free (l);
+ free (realname);
+
+ if (r != NULL)
+ {
+ r->r_state = RT_CONSISTENT;
+ _dl_debug_state ();
+ LIBC_PROBE (map_failed, 2, nsid, r);
+ }
+
+ _dl_signal_error (code, name, NULL, msg);
+}
+
+
/* Map in the shared object NAME, actually located in REALNAME, and already
opened on FD. */

--
2.17.1

2020-11-27 13:25:02

by Szabolcs Nagy

Subject: [PATCH v2 3/6] elf: Fix failure handling in _dl_map_object_from_fd

There are many failure paths that call lose to do local cleanup
in _dl_map_object_from_fd, but lose did not clean up everything.

Handle l_phdr, l_libname and mapped segments in the common failure
handling code.

There are various bits that may not be cleaned properly on failure
(e.g. executable stack, tlsid, incomplete dl_map_segments).
---
elf/dl-load.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index 21e55deb19..9c71b7562c 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -914,8 +914,15 @@ lose (int code, int fd, const char *name, char *realname, struct link_map *l,
/* The file might already be closed. */
if (fd != -1)
(void) __close_nocancel (fd);
+ if (l != NULL && l->l_map_start != 0)
+ _dl_unmap_segments (l);
if (l != NULL && l->l_origin != (char *) -1l)
free ((char *) l->l_origin);
+ if (l != NULL && !l->l_libname->dont_free)
+ free (l->l_libname);
+ if (l != NULL && l->l_phdr_allocated)
+ free ((void *) l->l_phdr);
+
free (l);
free (realname);

@@ -1256,7 +1263,11 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
errstring = _dl_map_segments (l, fd, header, type, loadcmds, nloadcmds,
maplength, has_holes, loader);
if (__glibc_unlikely (errstring != NULL))
- goto call_lose;
+ {
+ /* Mappings can be in an inconsistent state: avoid unmap. */
+ l->l_map_start = l->l_map_end = 0;
+ goto call_lose;
+ }

/* Process program headers again after load segments are mapped in
case processing requires accessing those segments. Scan program
@@ -1294,14 +1305,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
|| (__glibc_unlikely (l->l_flags_1 & DF_1_PIE)
&& __glibc_unlikely ((mode & __RTLD_OPENEXEC) == 0)))
{
- /* We are not supposed to load this object. Free all resources. */
- _dl_unmap_segments (l);
-
- if (!l->l_libname->dont_free)
- free (l->l_libname);
-
- if (l->l_phdr_allocated)
- free ((void *) l->l_phdr);

if (l->l_flags_1 & DF_1_PIE)
errstring
@@ -1392,6 +1395,9 @@ cannot enable executable stack as shared object requires");
/* Signal that we closed the file. */
fd = -1;

+ /* Failures before this point are handled locally via lose.
+ No more failures are allowed in this function until return. */
+
/* If this is ET_EXEC, we should have loaded it as lt_executable. */
assert (type != ET_EXEC || l->l_type == lt_executable);

--
2.17.1

2020-11-27 13:25:22

by Szabolcs Nagy

Subject: [PATCH v2 4/6] elf: Move note processing after l_phdr is updated

Program headers are processed in two passes: after the first pass,
load segments are mmapped, so in the second pass target specific
note processing logic can access the notes.

The second pass is moved later so that various link_map fields that
may be useful for note processing, such as l_phdr, are set up.
The second pass should happen before the fd is closed so that it
is still available.
---
elf/dl-load.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index 9c71b7562c..b0d65f32cc 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -1268,21 +1268,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
l->l_map_start = l->l_map_end = 0;
goto call_lose;
}
-
- /* Process program headers again after load segments are mapped in
- case processing requires accessing those segments. Scan program
- headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
- exits. */
- for (ph = &phdr[l->l_phnum]; ph != phdr; --ph)
- switch (ph[-1].p_type)
- {
- case PT_NOTE:
- _dl_process_pt_note (l, &ph[-1]);
- break;
- case PT_GNU_PROPERTY:
- _dl_process_pt_gnu_property (l, &ph[-1]);
- break;
- }
}

if (l->l_ld == 0)
@@ -1386,6 +1371,21 @@ cannot enable executable stack as shared object requires");
if (l->l_tls_initimage != NULL)
l->l_tls_initimage = (char *) l->l_tls_initimage + l->l_addr;

+ /* Process program headers again after load segments are mapped in
+ case processing requires accessing those segments. Scan program
+ headers backward so that PT_NOTE can be skipped if PT_GNU_PROPERTY
+ exits. */
+ for (ph = &l->l_phdr[l->l_phnum]; ph != l->l_phdr; --ph)
+ switch (ph[-1].p_type)
+ {
+ case PT_NOTE:
+ _dl_process_pt_note (l, &ph[-1]);
+ break;
+ case PT_GNU_PROPERTY:
+ _dl_process_pt_gnu_property (l, &ph[-1]);
+ break;
+ }
+
/* We are done mapping in the file. We no longer need the descriptor. */
if (__glibc_unlikely (__close_nocancel (fd) != 0))
{
--
2.17.1

2020-11-27 13:26:58

by Szabolcs Nagy

Subject: [PATCH v2 5/6] elf: Pass the fd to note processing

To handle GNU property notes on aarch64 some segments need to
be mmapped again, so the fd of the loaded ELF module is needed.

When the fd is not available (kernel loaded modules), -1
is passed.

The fd is passed to both _dl_process_pt_gnu_property and
_dl_process_pt_note for consistency. Target specific note
processing functions are updated accordingly.
---
elf/dl-load.c | 12 +++++++-----
elf/rtld.c | 4 ++--
sysdeps/aarch64/dl-prop.h | 6 +++---
sysdeps/generic/dl-prop.h | 6 +++---
sysdeps/generic/ldsodefs.h | 5 +++--
sysdeps/x86/dl-prop.h | 6 +++---
6 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/elf/dl-load.c b/elf/dl-load.c
index b0d65f32cc..74039f22a6 100644
--- a/elf/dl-load.c
+++ b/elf/dl-load.c
@@ -837,10 +837,12 @@ _dl_init_paths (const char *llp, const char *source)

/* Process PT_GNU_PROPERTY program header PH in module L after
PT_LOAD segments are mapped. Only one NT_GNU_PROPERTY_TYPE_0
- note is handled which contains processor specific properties. */
+ note is handled which contains processor specific properties.
+ FD is -1 for the kernel mapped main executable otherwise it is
+ the fd used for loading module L. */

void
-_dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_gnu_property (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
const ElfW(Nhdr) *note = (const void *) (ph->p_vaddr + l->l_addr);
const ElfW(Addr) size = ph->p_memsz;
@@ -887,7 +889,7 @@ _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
last_type = type;

/* Target specific property processing. */
- if (_dl_process_gnu_property (l, type, datasz, ptr) == 0)
+ if (_dl_process_gnu_property (l, fd, type, datasz, ptr) == 0)
return;

/* Check the next property item. */
@@ -1379,10 +1381,10 @@ cannot enable executable stack as shared object requires");
switch (ph[-1].p_type)
{
case PT_NOTE:
- _dl_process_pt_note (l, &ph[-1]);
+ _dl_process_pt_note (l, fd, &ph[-1]);
break;
case PT_GNU_PROPERTY:
- _dl_process_pt_gnu_property (l, &ph[-1]);
+ _dl_process_pt_gnu_property (l, fd, &ph[-1]);
break;
}

diff --git a/elf/rtld.c b/elf/rtld.c
index c4ffc8d4b7..ec62567580 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -1540,10 +1540,10 @@ dl_main (const ElfW(Phdr) *phdr,
switch (ph[-1].p_type)
{
case PT_NOTE:
- _dl_process_pt_note (main_map, &ph[-1]);
+ _dl_process_pt_note (main_map, -1, &ph[-1]);
break;
case PT_GNU_PROPERTY:
- _dl_process_pt_gnu_property (main_map, &ph[-1]);
+ _dl_process_pt_gnu_property (main_map, -1, &ph[-1]);
break;
}

diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
index b0785bda83..2016d1472e 100644
--- a/sysdeps/aarch64/dl-prop.h
+++ b/sysdeps/aarch64/dl-prop.h
@@ -35,13 +35,13 @@ _dl_open_check (struct link_map *m)
}

static inline void __attribute__ ((always_inline))
-_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
}

static inline int
-_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
- void *data)
+_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
+ uint32_t datasz, void *data)
{
if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
{
diff --git a/sysdeps/generic/dl-prop.h b/sysdeps/generic/dl-prop.h
index f1cf576fe3..df27ff8e6a 100644
--- a/sysdeps/generic/dl-prop.h
+++ b/sysdeps/generic/dl-prop.h
@@ -37,15 +37,15 @@ _dl_open_check (struct link_map *m)
}

static inline void __attribute__ ((always_inline))
-_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
}

/* Called for each property in the NT_GNU_PROPERTY_TYPE_0 note of L,
processing of the properties continues until this returns 0. */
static inline int __attribute__ ((always_inline))
-_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
- void *data)
+_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
+ uint32_t datasz, void *data)
{
return 0;
}
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index b1da03cafe..89eab4719d 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -933,8 +933,9 @@ extern void _dl_rtld_di_serinfo (struct link_map *loader,
Dl_serinfo *si, bool counting);

/* Process PT_GNU_PROPERTY program header PH in module L after
- PT_LOAD segments are mapped. */
-void _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph);
+ PT_LOAD segments are mapped from file FD. */
+void _dl_process_pt_gnu_property (struct link_map *l, int fd,
+ const ElfW(Phdr) *ph);


/* Search loaded objects' symbol tables for a definition of the symbol
diff --git a/sysdeps/x86/dl-prop.h b/sysdeps/x86/dl-prop.h
index 89911e19e2..4eb3b85a7b 100644
--- a/sysdeps/x86/dl-prop.h
+++ b/sysdeps/x86/dl-prop.h
@@ -145,15 +145,15 @@ _dl_process_cet_property_note (struct link_map *l,
}

static inline void __attribute__ ((unused))
-_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
+_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
{
const ElfW(Nhdr) *note = (const void *) (ph->p_vaddr + l->l_addr);
_dl_process_cet_property_note (l, note, ph->p_memsz, ph->p_align);
}

static inline int __attribute__ ((always_inline))
-_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
- void *data)
+_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
+ uint32_t datasz, void *data)
{
return 0;
}
--
2.17.1

2020-11-27 18:31:37

by Szabolcs Nagy

Subject: [PATCH v2 6/6] aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

Re-mmap executable segments if possible instead of using mprotect
to add PROT_BTI. This allows using BTI protection with security
policies that prevent mprotect with PROT_EXEC.

If the fd of the ELF module is not available because it was kernel
mapped, then mprotect is used and failures are ignored. To protect
the main executable even when mprotect is filtered, the Linux kernel
will have to be changed to add PROT_BTI to it.

Computing the mapping bounds follows _dl_map_object_from_fd more
closely now.

The delayed failure reporting is mainly needed because currently
_dl_process_gnu_properties does not propagate failures such that
the required cleanups happen. Using the link_map_machine struct for
error propagation is not ideal, but this seemed to be the least
intrusive solution.

Fixes bug 26831.
---
sysdeps/aarch64/dl-bti.c | 67 +++++++++++++++++++++++++--------------
sysdeps/aarch64/dl-prop.h | 8 ++++-
sysdeps/aarch64/linkmap.h | 2 +-
3 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
index 8f4728adce..34b5294f92 100644
--- a/sysdeps/aarch64/dl-bti.c
+++ b/sysdeps/aarch64/dl-bti.c
@@ -19,39 +19,62 @@
#include <errno.h>
#include <libintl.h>
#include <ldsodefs.h>
+#include <sys/mman.h>

-static int
-enable_bti (struct link_map *map, const char *program)
+/* See elf/dl-load.h. */
+#ifndef MAP_COPY
+# define MAP_COPY (MAP_PRIVATE | MAP_DENYWRITE)
+#endif
+
+/* Enable BTI protection for MAP. */
+
+void
+_dl_bti_protect (struct link_map *map, int fd)
{
+ const size_t pagesz = GLRO(dl_pagesize);
const ElfW(Phdr) *phdr;
- unsigned prot;

for (phdr = map->l_phdr; phdr < &map->l_phdr[map->l_phnum]; ++phdr)
if (phdr->p_type == PT_LOAD && (phdr->p_flags & PF_X))
{
- void *start = (void *) (phdr->p_vaddr + map->l_addr);
- size_t len = phdr->p_memsz;
+ size_t vstart = ALIGN_DOWN (phdr->p_vaddr, pagesz);
+ size_t vend = ALIGN_UP (phdr->p_vaddr + phdr->p_filesz, pagesz);
+ off_t off = ALIGN_DOWN (phdr->p_offset, pagesz);
+ void *start = (void *) (vstart + map->l_addr);
+ size_t len = vend - vstart;

- prot = PROT_EXEC | PROT_BTI;
+ /* Add PROT_BTI. */
+ unsigned prot = PROT_EXEC | PROT_BTI;
if (phdr->p_flags & PF_R)
prot |= PROT_READ;
if (phdr->p_flags & PF_W)
prot |= PROT_WRITE;

- if (__mprotect (start, len, prot) < 0)
- {
- if (program)
- _dl_fatal_printf ("%s: mprotect failed to turn on BTI\n",
- map->l_name);
- else
- _dl_signal_error (errno, map->l_name, "dlopen",
- N_("mprotect failed to turn on BTI"));
- }
+ if (fd == -1)
+ /* Ignore failures for kernel mapped binaries. */
+ __mprotect (start, len, prot);
+ else
+ map->l_mach.bti_fail = __mmap (start, len, prot,
+ MAP_FIXED|MAP_COPY|MAP_FILE,
+ fd, off) == MAP_FAILED;
}
- return 0;
}

-/* Enable BTI for MAP and its dependencies. */
+
+static void
+bti_failed (struct link_map *l, const char *program)
+{
+ if (program)
+ _dl_fatal_printf ("%s: %s: failed to turn on BTI protection\n",
+ program, l->l_name);
+ else
+ /* Note: the errno value is not available any more. */
+ _dl_signal_error (0, l->l_name, "dlopen",
+ N_("failed to turn on BTI protection"));
+}
+
+
+/* Report BTI protection failures for MAP and its dependencies. */

void
_dl_bti_check (struct link_map *map, const char *program)
@@ -59,16 +82,14 @@ _dl_bti_check (struct link_map *map, const char *program)
if (!GLRO(dl_aarch64_cpu_features).bti)
return;

- if (map->l_mach.bti)
- enable_bti (map, program);
+ if (map->l_mach.bti_fail)
+ bti_failed (map, program);

unsigned int i = map->l_searchlist.r_nlist;
while (i-- > 0)
{
struct link_map *l = map->l_initfini[i];
- if (l->l_init_called)
- continue;
- if (l->l_mach.bti)
- enable_bti (l, program);
+ if (l->l_mach.bti_fail)
+ bti_failed (l, program);
}
}
diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
index 2016d1472e..e926e54984 100644
--- a/sysdeps/aarch64/dl-prop.h
+++ b/sysdeps/aarch64/dl-prop.h
@@ -19,6 +19,8 @@
#ifndef _DL_PROP_H
#define _DL_PROP_H

+extern void _dl_bti_protect (struct link_map *, int) attribute_hidden;
+
extern void _dl_bti_check (struct link_map *, const char *)
attribute_hidden;

@@ -43,6 +45,10 @@ static inline int
_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
uint32_t datasz, void *data)
{
+ if (!GLRO(dl_aarch64_cpu_features).bti)
+ /* Skip note processing. */
+ return 0;
+
if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
{
/* Stop if the property note is ill-formed. */
@@ -51,7 +57,7 @@ _dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,

unsigned int feature_1 = *(unsigned int *) data;
if (feature_1 & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
- l->l_mach.bti = true;
+ _dl_bti_protect (l, fd);

/* Stop if we processed the property note. */
return 0;
diff --git a/sysdeps/aarch64/linkmap.h b/sysdeps/aarch64/linkmap.h
index 847a03ace2..b3f7663b07 100644
--- a/sysdeps/aarch64/linkmap.h
+++ b/sysdeps/aarch64/linkmap.h
@@ -22,5 +22,5 @@ struct link_map_machine
{
ElfW(Addr) plt; /* Address of .plt */
void *tlsdesc_table; /* Address of TLS descriptor hash table. */
- bool bti; /* Branch Target Identification is enabled. */
+ bool bti_fail; /* Failed to enable Branch Target Identification. */
};
--
2.17.1

2020-11-30 16:00:08

by Szabolcs Nagy

Subject: Re: [PATCH v2 0/6] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

The 11/27/2020 13:19, Szabolcs Nagy via Libc-alpha wrote:
> This is v2 of
> https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
>
> To enable BTI support, re-mmap executable segments instead of
> mprotecting them in case mprotect is seccomp filtered.
>
> I would like linux to change to map the main exe with PROT_BTI when
> that is marked as BTI compatible. From the linux side i heard the
> following concerns about this:
> - it's an ABI change so requires some ABI bump. (this is fine with
> me, i think glibc does not care about backward compat as nothing
> can reasonably rely on the current behaviour, but if we have a
> new bit in auxv or similar then we can save one mprotect call.)
> - in case we discover compatibility issues with user binaries it's
> better if userspace can easily disable BTI (e.g. removing the
> mprotect based on some env var, but if kernel adds PROT_BTI and
> mprotect is filtered then we have no reliable way to remove that
> from executables. this problem already exists for static linked
> exes, although admittedly those are less of a compat concern.)
> - ideally PROT_BTI would be added via a new syscall that does not
> interfere with PROT_EXEC filtering. (this does not conflict with
> the current patches: even with a new syscall we need a fallback.)
> - solve it in systemd (e.g. turn off the filter, use better filter):
> i would prefer not to have aarch64 (or BTI) specific policy in
> user code. and there was no satisfying way to do this portably.
>
> Other concerns about the approach:
> - mmap is more expensive than mprotect: in my measurements using
> mmap instead of mprotect is 3-8x slower (and after mmap pages
> have to be faulted in again), but e.g. the exec time of a program
> with 4 deps only increases by < 8% due to the 4 new mmaps. (the
> kernel side resource usage may increase too, i didnt look at that.)

I tested glibc build time with mprotect vs mmap,
which should be exec heavy.

The real time overhead was < 0.2% on a particular
4 core system with a Linux 5.3 Ubuntu kernel, which
I consider to be small.

(I used PROT_EXEC without PROT_BTI for the measurement.)


> - _dl_signal_error is not valid from the _dl_process_gnu_property
> hook. The v2 set addresses this problem: i could either propagate
> the errors up until they can be handled or solve it in the aarch64
> backend by first recording failures and then dealing with them in
> _dl_open_check. I choose the latter, but did some refactorings in
> _dl_map_object_from_fd that makes the former possible too.
>
> v2:
> - [1/6]: New patch that fixes a missed BTI bug found during v2.
> - [2-3/6]: New, _dl_map_object_from_fd failure handling improvements,
> these are independent of the rest of the series.
> - [4/6]: Move the note handling to a different place (after l_phdr
> setup, but before fd is closed).
> - [5/6]: Rebased.
> - [6/6]: First record errors and only report them later. (this fixes
> various failure handling issues.)
>
> Szabolcs Nagy (6):
> aarch64: Fix missing BTI protection from dependencies [BZ #26926]
> elf: lose is closely tied to _dl_map_object_from_fd
> elf: Fix failure handling in _dl_map_object_from_fd
> elf: Move note processing after l_phdr is updated
> elf: Pass the fd to note processing
> aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]
>
> elf/dl-load.c | 110 ++++++++++++++++++++-----------------
> elf/rtld.c | 4 +-
> sysdeps/aarch64/dl-bti.c | 74 ++++++++++++++++++-------
> sysdeps/aarch64/dl-prop.h | 14 +++--
> sysdeps/aarch64/linkmap.h | 2 +-
> sysdeps/generic/dl-prop.h | 6 +-
> sysdeps/generic/ldsodefs.h | 5 +-
> sysdeps/x86/dl-prop.h | 6 +-
> 8 files changed, 135 insertions(+), 86 deletions(-)
>
> --
> 2.17.1
>

2020-12-02 08:59:55

by Szabolcs Nagy

Subject: [PATCH v3 2/2] aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]

Re-mmap executable segments if possible instead of using mprotect
to add PROT_BTI. This allows using BTI protection with security
policies that prevent mprotect with PROT_EXEC.

If the fd of the ELF module is not available because it was kernel
mapped, then mprotect is used and failures are ignored. To protect
the main executable even when mprotect is filtered, the Linux kernel
will have to be changed to add PROT_BTI to it.

The delayed failure reporting is mainly needed because currently
_dl_process_gnu_properties does not propagate failures such that
the required cleanups happen. Using the link_map_machine struct for
error propagation is not ideal, but this seemed to be the least
intrusive solution.

Fixes bug 26831.
---
v3:
- split the last patch in two.
- pushed to nsz/btifix-v3 branch.

sysdeps/aarch64/dl-bti.c | 54 ++++++++++++++++++++++++++-------------
sysdeps/aarch64/dl-prop.h | 8 +++++-
sysdeps/aarch64/linkmap.h | 2 +-
3 files changed, 44 insertions(+), 20 deletions(-)

diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
index 67d63c8a73..ff26c98ccf 100644
--- a/sysdeps/aarch64/dl-bti.c
+++ b/sysdeps/aarch64/dl-bti.c
@@ -19,9 +19,17 @@
#include <errno.h>
#include <libintl.h>
#include <ldsodefs.h>
+#include <sys/mman.h>

-static void
-enable_bti (struct link_map *map, const char *program)
+/* See elf/dl-load.h. */
+#ifndef MAP_COPY
+# define MAP_COPY (MAP_PRIVATE | MAP_DENYWRITE)
+#endif
+
+/* Enable BTI protection for MAP. */
+
+void
+_dl_bti_protect (struct link_map *map, int fd)
{
const size_t pagesz = GLRO(dl_pagesize);
const ElfW(Phdr) *phdr;
@@ -41,19 +49,31 @@ enable_bti (struct link_map *map, const char *program)
if (phdr->p_flags & PF_W)
prot |= PROT_WRITE;

- if (__mprotect (start, len, prot) < 0)
- {
- if (program)
- _dl_fatal_printf ("%s: mprotect failed to turn on BTI\n",
- map->l_name);
- else
- _dl_signal_error (errno, map->l_name, "dlopen",
- N_("mprotect failed to turn on BTI"));
- }
+ if (fd == -1)
+ /* Ignore failures for kernel mapped binaries. */
+ __mprotect (start, len, prot);
+ else
+ map->l_mach.bti_fail = __mmap (start, len, prot,
+ MAP_FIXED|MAP_COPY|MAP_FILE,
+ fd, off) == MAP_FAILED;
}
}

-/* Enable BTI for MAP and its dependencies. */
+
+static void
+bti_failed (struct link_map *l, const char *program)
+{
+ if (program)
+ _dl_fatal_printf ("%s: %s: failed to turn on BTI protection\n",
+ program, l->l_name);
+ else
+ /* Note: the errno value is not available any more. */
+ _dl_signal_error (0, l->l_name, "dlopen",
+ N_("failed to turn on BTI protection"));
+}
+
+
+/* Report BTI protection failures for MAP and its dependencies. */

void
_dl_bti_check (struct link_map *map, const char *program)
@@ -61,16 +81,14 @@ _dl_bti_check (struct link_map *map, const char *program)
if (!GLRO(dl_aarch64_cpu_features).bti)
return;

- if (map->l_mach.bti)
- enable_bti (map, program);
+ if (map->l_mach.bti_fail)
+ bti_failed (map, program);

unsigned int i = map->l_searchlist.r_nlist;
while (i-- > 0)
{
struct link_map *l = map->l_initfini[i];
- if (l->l_init_called)
- continue;
- if (l->l_mach.bti)
- enable_bti (l, program);
+ if (l->l_mach.bti_fail)
+ bti_failed (l, program);
}
}
diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
index 2016d1472e..e926e54984 100644
--- a/sysdeps/aarch64/dl-prop.h
+++ b/sysdeps/aarch64/dl-prop.h
@@ -19,6 +19,8 @@
#ifndef _DL_PROP_H
#define _DL_PROP_H

+extern void _dl_bti_protect (struct link_map *, int) attribute_hidden;
+
extern void _dl_bti_check (struct link_map *, const char *)
attribute_hidden;

@@ -43,6 +45,10 @@ static inline int
_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
uint32_t datasz, void *data)
{
+ if (!GLRO(dl_aarch64_cpu_features).bti)
+ /* Skip note processing. */
+ return 0;
+
if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
{
/* Stop if the property note is ill-formed. */
@@ -51,7 +57,7 @@ _dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,

unsigned int feature_1 = *(unsigned int *) data;
if (feature_1 & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
- l->l_mach.bti = true;
+ _dl_bti_protect (l, fd);

/* Stop if we processed the property note. */
return 0;
diff --git a/sysdeps/aarch64/linkmap.h b/sysdeps/aarch64/linkmap.h
index 847a03ace2..b3f7663b07 100644
--- a/sysdeps/aarch64/linkmap.h
+++ b/sysdeps/aarch64/linkmap.h
@@ -22,5 +22,5 @@ struct link_map_machine
{
ElfW(Addr) plt; /* Address of .plt */
void *tlsdesc_table; /* Address of TLS descriptor hash table. */
- bool bti; /* Branch Target Identification is enabled. */
+ bool bti_fail; /* Failed to enable Branch Target Identification. */
};
--
2.17.1

2020-12-02 09:00:16

by Szabolcs Nagy

Subject: [PATCH v3 1/2] aarch64: align address for BTI protection [BZ #26988]

Handle unaligned executable load segments (the bfd linker is not
expected to produce such binaries, but other linkers may).

Computing the mapping bounds follows _dl_map_object_from_fd more
closely now.

Fixes bug 26988.
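
For illustration only (editorial example, not part of the patch; the
segment values are made up): with a 4k page size an unaligned
executable LOAD segment is rounded out to whole pages, matching how
_dl_map_object_from_fd mapped it in the first place:

  #include <stdio.h>

  #define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
  #define ALIGN_UP(x, a)   ALIGN_DOWN ((x) + (a) - 1, a)

  int
  main (void)
  {
    const unsigned long pagesz = 0x1000;
    /* Hypothetical segment that does not start on a page boundary.  */
    unsigned long p_vaddr = 0x10450, p_filesz = 0x2c0, p_offset = 0x450;

    unsigned long vstart = ALIGN_DOWN (p_vaddr, pagesz);        /* 0x10000 */
    unsigned long vend = ALIGN_UP (p_vaddr + p_filesz, pagesz); /* 0x11000 */
    unsigned long off = ALIGN_DOWN (p_offset, pagesz);          /* 0x0 */

    printf ("start=l_addr+%#lx len=%#lx off=%#lx\n",
            vstart, vend - vstart, off);
    return 0;
  }

This prints start=l_addr+0x10000 len=0x1000 off=0, i.e. the whole page
containing the segment is re-mapped with PROT_BTI.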
---
v3:
- split the last patch in two so this bug is fixed separately.
- pushed to nsz/btifix-v3 branch.

sysdeps/aarch64/dl-bti.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
index 8f4728adce..67d63c8a73 100644
--- a/sysdeps/aarch64/dl-bti.c
+++ b/sysdeps/aarch64/dl-bti.c
@@ -20,19 +20,22 @@
#include <libintl.h>
#include <ldsodefs.h>

-static int
+static void
enable_bti (struct link_map *map, const char *program)
{
+ const size_t pagesz = GLRO(dl_pagesize);
const ElfW(Phdr) *phdr;
- unsigned prot;

for (phdr = map->l_phdr; phdr < &map->l_phdr[map->l_phnum]; ++phdr)
if (phdr->p_type == PT_LOAD && (phdr->p_flags & PF_X))
{
- void *start = (void *) (phdr->p_vaddr + map->l_addr);
- size_t len = phdr->p_memsz;
+ size_t vstart = ALIGN_DOWN (phdr->p_vaddr, pagesz);
+ size_t vend = ALIGN_UP (phdr->p_vaddr + phdr->p_filesz, pagesz);
+ off_t off = ALIGN_DOWN (phdr->p_offset, pagesz);
+ void *start = (void *) (vstart + map->l_addr);
+ size_t len = vend - vstart;

- prot = PROT_EXEC | PROT_BTI;
+ unsigned prot = PROT_EXEC | PROT_BTI;
if (phdr->p_flags & PF_R)
prot |= PROT_READ;
if (phdr->p_flags & PF_W)
@@ -48,7 +51,6 @@ enable_bti (struct link_map *map, const char *program)
N_("mprotect failed to turn on BTI"));
}
}
- return 0;
}

/* Enable BTI for MAP and its dependencies. */
--
2.17.1

2020-12-03 17:33:27

by Catalin Marinas

Subject: Re: [PATCH v2 0/6] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

Hi Szabolcs,

On Fri, Nov 27, 2020 at 01:19:16PM +0000, Szabolcs Nagy wrote:
> This is v2 of
> https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
>
> To enable BTI support, re-mmap executable segments instead of
> mprotecting them in case mprotect is seccomp filtered.
>
> I would like linux to change to map the main exe with PROT_BTI when
> that is marked as BTI compatible. From the linux side i heard the
> following concerns about this:
> - it's an ABI change so requires some ABI bump. (this is fine with
> me, i think glibc does not care about backward compat as nothing
> can reasonably rely on the current behaviour, but if we have a
> new bit in auxv or similar then we can save one mprotect call.)

I'm not concerned about the ABI change but there are workarounds like a
new auxv bit.

> - in case we discover compatibility issues with user binaries it's
> better if userspace can easily disable BTI (e.g. removing the
> mprotect based on some env var, but if kernel adds PROT_BTI and
> mprotect is filtered then we have no reliable way to remove that
> from executables. this problem already exists for static linked
> exes, although admittedly those are less of a compat concern.)

This is our main concern. For static binaries, the linker could detect,
in theory, potential issues when linking and not set the corresponding
ELF information.

At runtime, a dynamic linker could detect issues and avoid enabling BTI.
In both cases, it's a (static or dynamic) linker decision that belongs
in user-space.

> - ideally PROT_BTI would be added via a new syscall that does not
> interfere with PROT_EXEC filtering. (this does not conflict with
> the current patches: even with a new syscall we need a fallback.)

This can be discussed as a long term solution.

> - solve it in systemd (e.g. turn off the filter, use better filter):
> i would prefer not to have aarch64 (or BTI) specific policy in
> user code. and there was no satisfying way to do this portably.

I agree. I think the best for now (as a back-portable glibc fix) is to
ignore the mprotect(PROT_EXEC|PROT_BTI) error that the dynamic loader
gets. BTI will be disabled if MDWX is enabled.
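
(For context, a simplified and hypothetical sketch, using libseccomp, of
the kind of filter an MDWX-style policy installs; the real systemd
MemoryDenyWriteExecute filter also covers mmap(PROT_EXEC|PROT_WRITE),
shmat and pkey_mprotect.  Under such a filter the loader's
mprotect(PROT_EXEC|PROT_BTI) fails, while a fresh file-backed
mmap of the same range without PROT_WRITE is still allowed:)

  #include <errno.h>
  #include <seccomp.h>      /* link with -lseccomp */
  #include <sys/mman.h>

  /* Return EPERM from any mprotect call that requests PROT_EXEC.  */
  int
  install_wx_filter (void)
  {
    scmp_filter_ctx ctx = seccomp_init (SCMP_ACT_ALLOW);
    if (ctx == NULL)
      return -1;
    int r = seccomp_rule_add (ctx, SCMP_ACT_ERRNO (EPERM),
                              SCMP_SYS (mprotect), 1,
                              SCMP_A2 (SCMP_CMP_MASKED_EQ,
                                       PROT_EXEC, PROT_EXEC));
    if (r == 0)
      r = seccomp_load (ctx);
    seccomp_release (ctx);
    return r;
  }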

In the meantime, we should start (continue) looking at a solution that
works for both systemd and the kernel and is generic enough for other
architectures. The stateless nature of the current SECCOMP approach is
not suitable for this W^X policy. Kees had some suggestions here but the
thread seems to have died:

https://lore.kernel.org/kernel-hardening/202010221256.A4F95FD11@keescook/

--
Catalin

2020-12-07 20:08:40

by Szabolcs Nagy

Subject: Re: [PATCH v2 0/6] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

The 12/03/2020 17:30, Catalin Marinas wrote:
> On Fri, Nov 27, 2020 at 01:19:16PM +0000, Szabolcs Nagy wrote:
> > This is v2 of
> > https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
> >
> > To enable BTI support, re-mmap executable segments instead of
> > mprotecting them in case mprotect is seccomp filtered.
> >
> > I would like linux to change to map the main exe with PROT_BTI when
> > that is marked as BTI compatible. From the linux side i heard the
> > following concerns about this:
> > - it's an ABI change so requires some ABI bump. (this is fine with
> > me, i think glibc does not care about backward compat as nothing
> > can reasonably rely on the current behaviour, but if we have a
> > new bit in auxv or similar then we can save one mprotect call.)
>
> I'm not concerned about the ABI change but there are workarounds like a
> new auxv bit.
>
> > - in case we discover compatibility issues with user binaries it's
> > better if userspace can easily disable BTI (e.g. removing the
> > mprotect based on some env var, but if kernel adds PROT_BTI and
> > mprotect is filtered then we have no reliable way to remove that
> > from executables. this problem already exists for static linked
> > exes, although admittedly those are less of a compat concern.)
>
> This is our main concern. For static binaries, the linker could detect,
> in theory, potential issues when linking and not set the corresponding
> ELF information.
>
> At runtime, a dynamic linker could detect issues and avoid enabling BTI.
> In both cases, it's a (static or dynamic) linker decision that belongs
> in user-space.

Note that the marking is tied to an ELF module: if the static
linker can be trusted to produce correct markings then both the
static and dynamic linking cases work, otherwise neither does.
(The dynamic linker cannot detect BTI issues, it can only apply
user supplied policy.)

1) If we consider BTI part of the semantics of a marked module,
then it should always be on if the system supports it, and
ideally the loader of the module should deal with PROT_BTI.
(And if the marking is wrong then the binary is wrong.)

2) If we consider the marking to be a compatibility indicator
and let userspace policy decide what to do with it, then the
static exe and vdso cases should be handled by that policy too.
(This makes sense if we expect that there are reasons to turn
BTI off for a process independently of markings. It requires
the static linking startup code to make the policy decision and
self-apply PROT_BTI early, roughly as sketched below.)
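
(A rough, hypothetical sketch of what (2) would ask of the static
startup code; it is not part of these patches, the policy check and
the check of the exe's own GNU property marking are omitted, a
non-PIE static exe is assumed, and the PROT_BTI value is assumed:)

  #include <elf.h>
  #include <link.h>
  #include <stdint.h>
  #include <sys/auxv.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #ifndef PROT_BTI
  # define PROT_BTI 0x10   /* aarch64 value, assumed.  */
  #endif

  void
  self_apply_bti (void)
  {
    /* Program headers of the running exe from the auxiliary vector.  */
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) getauxval (AT_PHDR);
    size_t phnum = getauxval (AT_PHNUM);
    uintptr_t mask = ~((uintptr_t) sysconf (_SC_PAGESIZE) - 1);

    for (size_t i = 0; i < phnum; i++)
      if (phdr[i].p_type == PT_LOAD && (phdr[i].p_flags & PF_X))
        {
          /* For a non-PIE static exe p_vaddr is the load address.  */
          uintptr_t start = phdr[i].p_vaddr & mask;
          uintptr_t end = (phdr[i].p_vaddr + phdr[i].p_memsz + ~mask) & mask;
          mprotect ((void *) start, end - start,
                    PROT_READ | PROT_EXEC | PROT_BTI);
        }
  }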

The current code does not fit either case well, but I was
planning to do (1). Ideally PROT_BTI would be added reliably;
best-effort-only PROT_BTI works too, however it limits our
ability to report real mprotect failures.

> > - ideally PROT_BTI would be added via a new syscall that does not
> > interfere with PROT_EXEC filtering. (this does not conflict with
> > the current patches: even with a new syscall we need a fallback.)
>
> This can be discussed as a long term solution.
>
> > - solve it in systemd (e.g. turn off the filter, use better filter):
> > i would prefer not to have aarch64 (or BTI) specific policy in
> > user code. and there was no satisfying way to do this portably.
>
> I agree. I think the best for now (as a back-portable glibc fix) is to
> ignore the mprotect(PROT_EXEC|PROT_BTI) error that the dynamic loader
> gets. BTI will be disabled if MDWX is enabled.

OK.

So we are back to the original proposal: silently ignore mprotect
failures. I'm still considering the mmap solution for libraries
only: at least then libraries are handled reliably on current
setups, but I will have to think about whether attack targets
are mainly in libraries like libc or in executables.

>
> In the meantime, we should start (continue) looking at a solution that
> works for both systemd and the kernel and be generic enough for other
> architectures. The stateless nature of the current SECCOMP approach is
> not suitable for this W^X policy. Kees had some suggestions here but the
> thread seems to have died:
>
> https://lore.kernel.org/kernel-hardening/202010221256.A4F95FD11@keescook/

It sounded like better W^X enforcement won't happen any time soon.

2020-12-10 17:55:31

by Adhemerval Zanella

Subject: Re: [PATCH v2 1/6] aarch64: Fix missing BTI protection from dependencies [BZ #26926]



On 27/11/2020 10:19, Szabolcs Nagy via Libc-alpha wrote:
> The _dl_open_check and _rtld_main_check hooks are not called on the
> dependencies of a loaded module, so BTI protection was missed on
> every module other than the main executable and directly dlopened
> libraries.
>
> The fix just iterates over dependencies to enable BTI.
>
> Fixes bug 26926.

LGTM, modulo the argument name change.

I also think it would be better to add a testcase, for both the
DT_NEEDED and dlopen cases.

> ---
> sysdeps/aarch64/dl-bti.c | 21 +++++++++++++++++----
> 1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
> index 196e462520..8f4728adce 100644
> --- a/sysdeps/aarch64/dl-bti.c
> +++ b/sysdeps/aarch64/dl-bti.c
> @@ -51,11 +51,24 @@ enable_bti (struct link_map *map, const char *program)
> return 0;
> }
>
> -/* Enable BTI for L if required. */
> +/* Enable BTI for MAP and its dependencies. */
>
> void
> -_dl_bti_check (struct link_map *l, const char *program)
> +_dl_bti_check (struct link_map *map, const char *program)

I don't see much gain in changing the argument name.

> {
> - if (GLRO(dl_aarch64_cpu_features).bti && l->l_mach.bti)
> - enable_bti (l, program);
> + if (!GLRO(dl_aarch64_cpu_features).bti)
> + return;
> +
> + if (map->l_mach.bti)
> + enable_bti (map, program);
> +
> + unsigned int i = map->l_searchlist.r_nlist;
> + while (i-- > 0)
> + {
> + struct link_map *l = map->l_initfini[i];
> + if (l->l_init_called)
> + continue;
> + if (l->l_mach.bti)
> + enable_bti (l, program);
> + }
> }
>

Ok.

2020-12-10 18:30:34

by Adhemerval Zanella

Subject: Re: [PATCH v2 3/6] elf: Fix failure handling in _dl_map_object_from_fd



On 27/11/2020 10:20, Szabolcs Nagy via Libc-alpha wrote:
> There are many failure paths that call lose to do local cleanups
> in _dl_map_object_from_fd, but it did not clean everything.
>
> Handle l_phdr, l_libname and mapped segments in the common failure
> handling code.
>
> There are various bits that may not be cleaned properly on failure
> (e.g. executable stack, tlsid, incomplete dl_map_segments).
> ---
> elf/dl-load.c | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index 21e55deb19..9c71b7562c 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -914,8 +914,15 @@ lose (int code, int fd, const char *name, char *realname, struct link_map *l,
> /* The file might already be closed. */
> if (fd != -1)
> (void) __close_nocancel (fd);
> + if (l != NULL && l->l_map_start != 0)
> + _dl_unmap_segments (l);
> if (l != NULL && l->l_origin != (char *) -1l)
> free ((char *) l->l_origin);
> + if (l != NULL && !l->l_libname->dont_free)
> + free (l->l_libname);
> + if (l != NULL && l->l_phdr_allocated)
> + free ((void *) l->l_phdr);
> +
> free (l);
> free (realname);
>
> @@ -1256,7 +1263,11 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> errstring = _dl_map_segments (l, fd, header, type, loadcmds, nloadcmds,
> maplength, has_holes, loader);
> if (__glibc_unlikely (errstring != NULL))
> - goto call_lose;
> + {
> + /* Mappings can be in an inconsistent state: avoid unmap. */
> + l->l_map_start = l->l_map_end = 0;
> + goto call_lose;
> + }
>
> /* Process program headers again after load segments are mapped in
> case processing requires accessing those segments. Scan program

In this case I am failing to see who would be responsible for
unmapping l_map_start in the type == ET_DYN case where the first
mmap succeeds but a later mmap for a load command fails.

> @@ -1294,14 +1305,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> || (__glibc_unlikely (l->l_flags_1 & DF_1_PIE)
> && __glibc_unlikely ((mode & __RTLD_OPENEXEC) == 0)))
> {
> - /* We are not supposed to load this object. Free all resources. */
> - _dl_unmap_segments (l);
> -
> - if (!l->l_libname->dont_free)
> - free (l->l_libname);
> -
> - if (l->l_phdr_allocated)
> - free ((void *) l->l_phdr);
>
> if (l->l_flags_1 & DF_1_PIE)
> errstring
> @@ -1392,6 +1395,9 @@ cannot enable executable stack as shared object requires");
> /* Signal that we closed the file. */
> fd = -1;
>
> + /* Failures before this point are handled locally via lose.
> + No more failures are allowed in this function until return. */
> +
> /* If this is ET_EXEC, we should have loaded it as lt_executable. */
> assert (type != ET_EXEC || l->l_type == lt_executable);
>
>

Ok.

2020-12-10 18:39:21

by Adhemerval Zanella

Subject: Re: [PATCH v2 5/6] elf: Pass the fd to note processing



On 27/11/2020 10:21, Szabolcs Nagy via Libc-alpha wrote:
> To handle GNU property notes on aarch64 some segments need to
> be mmaped again, so the fd of the loaded ELF module is needed.
>
> When the fd is not available (kernel loaded modules), then -1
> is passed.
>
> The fd is passed to both _dl_process_pt_gnu_property and
> _dl_process_pt_note for consistency. Target specific note
> processing functions are updated accordingly.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella <[email protected]>

> ---
> elf/dl-load.c | 12 +++++++-----
> elf/rtld.c | 4 ++--
> sysdeps/aarch64/dl-prop.h | 6 +++---
> sysdeps/generic/dl-prop.h | 6 +++---
> sysdeps/generic/ldsodefs.h | 5 +++--
> sysdeps/x86/dl-prop.h | 6 +++---
> 6 files changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/elf/dl-load.c b/elf/dl-load.c
> index b0d65f32cc..74039f22a6 100644
> --- a/elf/dl-load.c
> +++ b/elf/dl-load.c
> @@ -837,10 +837,12 @@ _dl_init_paths (const char *llp, const char *source)
>
> /* Process PT_GNU_PROPERTY program header PH in module L after
> PT_LOAD segments are mapped. Only one NT_GNU_PROPERTY_TYPE_0
> - note is handled which contains processor specific properties. */
> + note is handled which contains processor specific properties.
> + FD is -1 for the kernel mapped main executable otherwise it is
> + the fd used for loading module L. */
>
> void
> -_dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
> +_dl_process_pt_gnu_property (struct link_map *l, int fd, const ElfW(Phdr) *ph)
> {
> const ElfW(Nhdr) *note = (const void *) (ph->p_vaddr + l->l_addr);
> const ElfW(Addr) size = ph->p_memsz;

Ok.

> @@ -887,7 +889,7 @@ _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph)
> last_type = type;
>
> /* Target specific property processing. */
> - if (_dl_process_gnu_property (l, type, datasz, ptr) == 0)
> + if (_dl_process_gnu_property (l, fd, type, datasz, ptr) == 0)
> return;
>
> /* Check the next property item. */

Ok.

> @@ -1379,10 +1381,10 @@ cannot enable executable stack as shared object requires");
> switch (ph[-1].p_type)
> {
> case PT_NOTE:
> - _dl_process_pt_note (l, &ph[-1]);
> + _dl_process_pt_note (l, fd, &ph[-1]);
> break;
> case PT_GNU_PROPERTY:
> - _dl_process_pt_gnu_property (l, &ph[-1]);
> + _dl_process_pt_gnu_property (l, fd, &ph[-1]);
> break;
> }
>

Ok.

> diff --git a/elf/rtld.c b/elf/rtld.c
> index c4ffc8d4b7..ec62567580 100644
> --- a/elf/rtld.c
> +++ b/elf/rtld.c
> @@ -1540,10 +1540,10 @@ dl_main (const ElfW(Phdr) *phdr,
> switch (ph[-1].p_type)
> {
> case PT_NOTE:
> - _dl_process_pt_note (main_map, &ph[-1]);
> + _dl_process_pt_note (main_map, -1, &ph[-1]);
> break;
> case PT_GNU_PROPERTY:
> - _dl_process_pt_gnu_property (main_map, &ph[-1]);
> + _dl_process_pt_gnu_property (main_map, -1, &ph[-1]);
> break;
> }
>

Ok.

> diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
> index b0785bda83..2016d1472e 100644
> --- a/sysdeps/aarch64/dl-prop.h
> +++ b/sysdeps/aarch64/dl-prop.h
> @@ -35,13 +35,13 @@ _dl_open_check (struct link_map *m)
> }
>
> static inline void __attribute__ ((always_inline))
> -_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
> +_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
> {
> }
>
> static inline int
> -_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
> - void *data)
> +_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
> + uint32_t datasz, void *data)
> {
> if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
> {

Ok.

> diff --git a/sysdeps/generic/dl-prop.h b/sysdeps/generic/dl-prop.h
> index f1cf576fe3..df27ff8e6a 100644
> --- a/sysdeps/generic/dl-prop.h
> +++ b/sysdeps/generic/dl-prop.h
> @@ -37,15 +37,15 @@ _dl_open_check (struct link_map *m)
> }
>
> static inline void __attribute__ ((always_inline))
> -_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
> +_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
> {
> }
>
> /* Called for each property in the NT_GNU_PROPERTY_TYPE_0 note of L,
> processing of the properties continues until this returns 0. */
> static inline int __attribute__ ((always_inline))
> -_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
> - void *data)
> +_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
> + uint32_t datasz, void *data)
> {
> return 0;
> }

Ok.

> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index b1da03cafe..89eab4719d 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -933,8 +933,9 @@ extern void _dl_rtld_di_serinfo (struct link_map *loader,
> Dl_serinfo *si, bool counting);
>
> /* Process PT_GNU_PROPERTY program header PH in module L after
> - PT_LOAD segments are mapped. */
> -void _dl_process_pt_gnu_property (struct link_map *l, const ElfW(Phdr) *ph);
> + PT_LOAD segments are mapped from file FD. */
> +void _dl_process_pt_gnu_property (struct link_map *l, int fd,
> + const ElfW(Phdr) *ph);
>
>
> /* Search loaded objects' symbol tables for a definition of the symbol

Ok.

> diff --git a/sysdeps/x86/dl-prop.h b/sysdeps/x86/dl-prop.h
> index 89911e19e2..4eb3b85a7b 100644
> --- a/sysdeps/x86/dl-prop.h
> +++ b/sysdeps/x86/dl-prop.h
> @@ -145,15 +145,15 @@ _dl_process_cet_property_note (struct link_map *l,
> }
>
> static inline void __attribute__ ((unused))
> -_dl_process_pt_note (struct link_map *l, const ElfW(Phdr) *ph)
> +_dl_process_pt_note (struct link_map *l, int fd, const ElfW(Phdr) *ph)
> {
> const ElfW(Nhdr) *note = (const void *) (ph->p_vaddr + l->l_addr);
> _dl_process_cet_property_note (l, note, ph->p_memsz, ph->p_align);
> }
>
> static inline int __attribute__ ((always_inline))
> -_dl_process_gnu_property (struct link_map *l, uint32_t type, uint32_t datasz,
> - void *data)
> +_dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
> + uint32_t datasz, void *data)
> {
> return 0;
> }
>

Ok.

2020-12-10 19:17:09

by Adhemerval Zanella

Subject: Re: [PATCH v3 2/2] aarch64: Use mmap to add PROT_BTI instead of mprotect [BZ #26831]



On 02/12/2020 05:55, Szabolcs Nagy via Libc-alpha wrote:
> Re-mmap executable segments if possible instead of using mprotect
> to add PROT_BTI. This allows using BTI protection with security
> policies that prevent mprotect with PROT_EXEC.
>
> If the fd of the ELF module is not available because it was kernel
> mapped then mprotect is used and failures are ignored. To protect
> the main executable even when mprotect is filtered the linux kernel
> will have to be changed to add PROT_BTI to it.
>
> The delayed failure reporting is mainly needed because currently
> _dl_process_gnu_properties does not propagate failures such that
> the required cleanups happen. Using the link_map_machine struct for
> error propagation is not ideal, but this seemed to be the least
> intrusive solution.
>
> Fixes bug 26831.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella <[email protected]>

> ---
> v3:
> - split the last patch in two.
> - pushed to nsz/btifix-v3 branch.
>
> sysdeps/aarch64/dl-bti.c | 54 ++++++++++++++++++++++++++-------------
> sysdeps/aarch64/dl-prop.h | 8 +++++-
> sysdeps/aarch64/linkmap.h | 2 +-
> 3 files changed, 44 insertions(+), 20 deletions(-)
>
> diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
> index 67d63c8a73..ff26c98ccf 100644
> --- a/sysdeps/aarch64/dl-bti.c
> +++ b/sysdeps/aarch64/dl-bti.c
> @@ -19,9 +19,17 @@
> #include <errno.h>
> #include <libintl.h>
> #include <ldsodefs.h>
> +#include <sys/mman.h>
>
> -static void
> -enable_bti (struct link_map *map, const char *program)
> +/* See elf/dl-load.h. */
> +#ifndef MAP_COPY
> +# define MAP_COPY (MAP_PRIVATE | MAP_DENYWRITE)
> +#endif
> +
> +/* Enable BTI protection for MAP. */
> +
> +void
> +_dl_bti_protect (struct link_map *map, int fd)
> {
> const size_t pagesz = GLRO(dl_pagesize);
> const ElfW(Phdr) *phdr;
> @@ -41,19 +49,31 @@ enable_bti (struct link_map *map, const char *program)
> if (phdr->p_flags & PF_W)
> prot |= PROT_WRITE;
>
> - if (__mprotect (start, len, prot) < 0)
> - {
> - if (program)
> - _dl_fatal_printf ("%s: mprotect failed to turn on BTI\n",
> - map->l_name);
> - else
> - _dl_signal_error (errno, map->l_name, "dlopen",
> - N_("mprotect failed to turn on BTI"));
> - }
> + if (fd == -1)
> + /* Ignore failures for kernel mapped binaries. */
> + __mprotect (start, len, prot);
> + else
> + map->l_mach.bti_fail = __mmap (start, len, prot,
> + MAP_FIXED|MAP_COPY|MAP_FILE,
> + fd, off) == MAP_FAILED;
> }
> }
>

OK. I am not very fond of ignoring the mprotect failure here, but it
has been discussed in multiple threads that we need kernel support
to handle it properly.

> -/* Enable BTI for MAP and its dependencies. */
> +
> +static void
> +bti_failed (struct link_map *l, const char *program)
> +{
> + if (program)

No implicit checks, please; compare program against NULL explicitly.

> + _dl_fatal_printf ("%s: %s: failed to turn on BTI protection\n",
> + program, l->l_name);
> + else
> + /* Note: the errno value is not available any more. */
> + _dl_signal_error (0, l->l_name, "dlopen",
> + N_("failed to turn on BTI protection"));
> +}
> +
> +
> +/* Report BTI protection failures for MAP and its dependencies. */
>

Ok.

> void
> _dl_bti_check (struct link_map *map, const char *program)
> @@ -61,16 +81,14 @@ _dl_bti_check (struct link_map *map, const char *program)
> if (!GLRO(dl_aarch64_cpu_features).bti)
> return;
>
> - if (map->l_mach.bti)
> - enable_bti (map, program);
> + if (map->l_mach.bti_fail)
> + bti_failed (map, program);
>
> unsigned int i = map->l_searchlist.r_nlist;
> while (i-- > 0)
> {
> struct link_map *l = map->l_initfini[i];
> - if (l->l_init_called)
> - continue;
> - if (l->l_mach.bti)
> - enable_bti (l, program);
> + if (l->l_mach.bti_fail)
> + bti_failed (l, program);
> }
> }

Ok.

> diff --git a/sysdeps/aarch64/dl-prop.h b/sysdeps/aarch64/dl-prop.h
> index 2016d1472e..e926e54984 100644
> --- a/sysdeps/aarch64/dl-prop.h
> +++ b/sysdeps/aarch64/dl-prop.h
> @@ -19,6 +19,8 @@
> #ifndef _DL_PROP_H
> #define _DL_PROP_H
>
> +extern void _dl_bti_protect (struct link_map *, int) attribute_hidden;
> +
> extern void _dl_bti_check (struct link_map *, const char *)
> attribute_hidden;
>
> @@ -43,6 +45,10 @@ static inline int
> _dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
> uint32_t datasz, void *data)
> {
> + if (!GLRO(dl_aarch64_cpu_features).bti)
> + /* Skip note processing. */
> + return 0;
> +
> if (type == GNU_PROPERTY_AARCH64_FEATURE_1_AND)
> {
> /* Stop if the property note is ill-formed. */
> @@ -51,7 +57,7 @@ _dl_process_gnu_property (struct link_map *l, int fd, uint32_t type,
>
> unsigned int feature_1 = *(unsigned int *) data;
> if (feature_1 & GNU_PROPERTY_AARCH64_FEATURE_1_BTI)
> - l->l_mach.bti = true;
> + _dl_bti_protect (l, fd);
>
> /* Stop if we processed the property note. */
> return 0;

Ok.

> diff --git a/sysdeps/aarch64/linkmap.h b/sysdeps/aarch64/linkmap.h
> index 847a03ace2..b3f7663b07 100644
> --- a/sysdeps/aarch64/linkmap.h
> +++ b/sysdeps/aarch64/linkmap.h
> @@ -22,5 +22,5 @@ struct link_map_machine
> {
> ElfW(Addr) plt; /* Address of .plt */
> void *tlsdesc_table; /* Address of TLS descriptor hash table. */
> - bool bti; /* Branch Target Identification is enabled. */
> + bool bti_fail; /* Failed to enable Branch Target Identification. */
> };
>

Ok.

2020-12-11 13:17:03

by Szabolcs Nagy

Subject: Re: [PATCH v2 3/6] elf: Fix failure handling in _dl_map_object_from_fd

The 12/10/2020 15:25, Adhemerval Zanella wrote:
> On 27/11/2020 10:20, Szabolcs Nagy via Libc-alpha wrote:
> > There are many failure paths that call lose to do local cleanups
> > in _dl_map_object_from_fd, but it did not clean everything.
> >
> > Handle l_phdr, l_libname and mapped segments in the common failure
> > handling code.
> >
> > There are various bits that may not be cleaned properly on failure
> > (e.g. executable stack, tlsid, incomplete dl_map_segments).
> > ---
> > elf/dl-load.c | 24 +++++++++++++++---------
> > 1 file changed, 15 insertions(+), 9 deletions(-)
> >
> > diff --git a/elf/dl-load.c b/elf/dl-load.c
> > index 21e55deb19..9c71b7562c 100644
> > --- a/elf/dl-load.c
> > +++ b/elf/dl-load.c
> > @@ -914,8 +914,15 @@ lose (int code, int fd, const char *name, char *realname, struct link_map *l,
> > /* The file might already be closed. */
> > if (fd != -1)
> > (void) __close_nocancel (fd);
> > + if (l != NULL && l->l_map_start != 0)
> > + _dl_unmap_segments (l);
> > if (l != NULL && l->l_origin != (char *) -1l)
> > free ((char *) l->l_origin);
> > + if (l != NULL && !l->l_libname->dont_free)
> > + free (l->l_libname);
> > + if (l != NULL && l->l_phdr_allocated)
> > + free ((void *) l->l_phdr);
> > +
> > free (l);
> > free (realname);
> >
> > @@ -1256,7 +1263,11 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> > errstring = _dl_map_segments (l, fd, header, type, loadcmds, nloadcmds,
> > maplength, has_holes, loader);
> > if (__glibc_unlikely (errstring != NULL))
> > - goto call_lose;
> > + {
> > + /* Mappings can be in an inconsistent state: avoid unmap. */
> > + l->l_map_start = l->l_map_end = 0;
> > + goto call_lose;
> > + }
> >
> > /* Process program headers again after load segments are mapped in
> > case processing requires accessing those segments. Scan program
>
> In this case I am failing to see who would be responsible for unmapping
> l_map_start in the type == ET_DYN case where the first mmap succeeds but
> a later mmap in one of the load commands fails.

failures are cleaned up either locally in this function
via lose, or after a clean return via dlclose.

failures that are not cleaned up will leak resources.

_dl_map_segments failure is not cleaned up (the mappings
are left in an unknown state). however, after a successful
_dl_map_segments, later failures can clean up the mappings,
and that's what i fixed here.

i did not try to fix transitive design bugs (such as
leaks in _dl_map_segments) that would require interface
change or local cleanups in those other functions.
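
(A toy sketch of the single cleanup path described above; the names and
steps are invented and do not correspond to the real dl-load.c code.)

#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

struct toy_map
{
  void *map_start;   /* NULL until segments are mapped.  */
  size_t map_len;
  char *name_copy;   /* NULL until the name is duplicated.  */
};

struct toy_map *
toy_load (const char *name, size_t len)
{
  struct toy_map *l = calloc (1, sizeof *l);
  if (l == NULL)
    return NULL;

  l->name_copy = strdup (name);
  if (l->name_copy == NULL)
    goto lose;

  l->map_len = len;
  l->map_start = mmap (NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (l->map_start == MAP_FAILED)
    {
      /* Keep the state consistent before taking the common error
         path, like the l_map_start = 0 assignment in the patch.  */
      l->map_start = NULL;
      goto lose;
    }

  /* ... further steps, each of which may also goto lose ... */
  return l;

 lose:
  /* The common failure path frees exactly what exists so far.  */
  if (l->map_start != NULL)
    munmap (l->map_start, l->map_len);
  free (l->name_copy);
  free (l);
  return NULL;
}

The point is that every failure path funnels into one place that looks
at per-resource state, which is what the lose changes above do for
l_phdr, l_libname and the mapped segments.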

> > @@ -1294,14 +1305,6 @@ _dl_map_object_from_fd (const char *name, const char *origname, int fd,
> > || (__glibc_unlikely (l->l_flags_1 & DF_1_PIE)
> > && __glibc_unlikely ((mode & __RTLD_OPENEXEC) == 0)))
> > {
> > - /* We are not supposed to load this object. Free all resources. */
> > - _dl_unmap_segments (l);
> > -
> > - if (!l->l_libname->dont_free)
> > - free (l->l_libname);
> > -
> > - if (l->l_phdr_allocated)
> > - free ((void *) l->l_phdr);
> >
> > if (l->l_flags_1 & DF_1_PIE)
> > errstring
> > @@ -1392,6 +1395,9 @@ cannot enable executable stack as shared object requires");
> > /* Signal that we closed the file. */
> > fd = -1;
> >
> > + /* Failures before this point are handled locally via lose.
> > + No more failures are allowed in this function until return. */
> > +
> > /* If this is ET_EXEC, we should have loaded it as lt_executable. */
> > assert (type != ET_EXEC || l->l_type == lt_executable);
> >
> >
>
> Ok.

2020-12-11 21:14:39

by Adhemerval Zanella

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] aarch64: align address for BTI protection [BZ #26988]



On 02/12/2020 05:55, Szabolcs Nagy via Libc-alpha wrote:
> Handle unaligned executable load segments (the bfd linker is not
> expected to produce such binaries, but other linkers may).
>
> Computing the mapping bounds follows _dl_map_object_from_fd more
> closely now.
>
> Fixes bug 26988.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella <[email protected]>

> ---
> v3:
> - split the last patch in two so this bug is fixed separately.
> - pushed to nsz/btifix-v3 branch.
>
> sysdeps/aarch64/dl-bti.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/sysdeps/aarch64/dl-bti.c b/sysdeps/aarch64/dl-bti.c
> index 8f4728adce..67d63c8a73 100644
> --- a/sysdeps/aarch64/dl-bti.c
> +++ b/sysdeps/aarch64/dl-bti.c
> @@ -20,19 +20,22 @@
> #include <libintl.h>
> #include <ldsodefs.h>
>
> -static int
> +static void
> enable_bti (struct link_map *map, const char *program)
> {

Ok.

> + const size_t pagesz = GLRO(dl_pagesize);
> const ElfW(Phdr) *phdr;
> - unsigned prot;
>
> for (phdr = map->l_phdr; phdr < &map->l_phdr[map->l_phnum]; ++phdr)
> if (phdr->p_type == PT_LOAD && (phdr->p_flags & PF_X))
> {
> - void *start = (void *) (phdr->p_vaddr + map->l_addr);
> - size_t len = phdr->p_memsz;
> + size_t vstart = ALIGN_DOWN (phdr->p_vaddr, pagesz);
> + size_t vend = ALIGN_UP (phdr->p_vaddr + phdr->p_filesz, pagesz);
> + off_t off = ALIGN_DOWN (phdr->p_offset, pagesz);
> + void *start = (void *) (vstart + map->l_addr);
> + size_t len = vend - vstart;
>
> - prot = PROT_EXEC | PROT_BTI;
> + unsigned prot = PROT_EXEC | PROT_BTI;
> if (phdr->p_flags & PF_R)
> prot |= PROT_READ;
> if (phdr->p_flags & PF_W)
> @@ -48,7 +51,6 @@ enable_bti (struct link_map *map, const char *program)
> N_("mprotect failed to turn on BTI"));
> }
> }
> - return 0;
> }
>
> /* Enable BTI for MAP and its dependencies. */
>

Ok.
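
For reference, a rough standalone sketch of the page rounding above
combined with an mmap-based PROT_BTI remapping and an mprotect
fallback, which is the direction the series takes. It is illustrative
only: the function name, the PROT_BTI fallback define and the error
handling are simplified assumptions, not the actual glibc code.

#include <sys/mman.h>
#include <unistd.h>
#include <stdint.h>

#ifndef PROT_BTI
# define PROT_BTI 0x10          /* aarch64 Linux value, assumed here.  */
#endif

#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
#define ALIGN_UP(x, a)   ALIGN_DOWN ((x) + (a) - 1, (a))

/* Re-apply one executable LOAD segment with PROT_BTI.  vaddr, filesz
   and offset come from the program header, load_addr is the load
   bias, fd is the still-open object file (or -1), prot holds the
   PROT_* flags derived from p_flags.  */
static int
remap_with_bti (uintptr_t vaddr, size_t filesz, off_t offset,
                uintptr_t load_addr, int fd, int prot)
{
  size_t pagesz = (size_t) sysconf (_SC_PAGESIZE);
  uintptr_t vstart = ALIGN_DOWN (vaddr, pagesz);
  uintptr_t vend = ALIGN_UP (vaddr + filesz, pagesz);
  void *start = (void *) (vstart + load_addr);
  size_t len = vend - vstart;
  off_t off = ALIGN_DOWN (offset, pagesz);

  prot |= PROT_BTI;

  /* Preferred path: a fresh file-backed MAP_FIXED mapping avoids
     calling mprotect (PROT_EXEC|...), which a seccomp filter may
     reject.  */
  if (fd != -1
      && mmap (start, len, prot, MAP_FIXED | MAP_PRIVATE, fd, off)
         != MAP_FAILED)
    return 0;

  /* Fallback: mprotect the page-aligned range in place.  */
  return mprotect (start, len, prot);
}

In the real series the file descriptor is passed down to the
note-processing hook precisely so that this re-mapping can be done
while the fd is still open.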

2020-12-12 18:06:13

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [PATCH v2 1/6] aarch64: Fix missing BTI protection from dependencies [BZ #26926]

The 12/10/2020 14:51, Adhemerval Zanella wrote:
> On 27/11/2020 10:19, Szabolcs Nagy via Libc-alpha wrote:
> > The _dl_open_check and _rtld_main_check hooks are not called on the
> > dependencies of a loaded module, so BTI protection was missed on
> > every module other than the main executable and directly dlopened
> > libraries.
> >
> > The fix just iterates over dependencies to enable BTI.
> >
> > Fixes bug 26926.
>
> LGTM, modulus the argument name change.
>
> I also think it would be better to add a testcase, for both DT_NEEDED
> and dlopen case.

thanks, i committed this with fixed argument name as attached.

i plan to do tests later once i understand the BTI semantics
(right now it's not clear whether BTI may be silently ignored
or not in the presence of some W^X policy).


Attachments:
0001-aarch64-Fix-missing-BTI-protection-from-dependencies.patch (1.46 kB)

2020-12-12 21:44:09

by Catalin Marinas

[permalink] [raw]
Subject: Re: [PATCH v2 0/6] aarch64: avoid mprotect(PROT_BTI|PROT_EXEC) [BZ #26831]

On Mon, Dec 07, 2020 at 08:03:38PM +0000, Szabolcs Nagy wrote:
> The 12/03/2020 17:30, Catalin Marinas wrote:
> > On Fri, Nov 27, 2020 at 01:19:16PM +0000, Szabolcs Nagy wrote:
> > > This is v2 of
> > > https://sourceware.org/pipermail/libc-alpha/2020-November/119305.html
> > >
> > > To enable BTI support, re-mmap executable segments instead of
> > > mprotecting them in case mprotect is seccomp filtered.
> > >
> > > I would like linux to change to map the main exe with PROT_BTI when
> > > that is marked as BTI compatible. From the linux side i heard the
> > > following concerns about this:
> > > - it's an ABI change so requires some ABI bump. (this is fine with
> > > me, i think glibc does not care about backward compat as nothing
> > > can reasonably rely on the current behaviour, but if we have a
> > > new bit in auxv or similar then we can save one mprotect call.)
> >
> > I'm not concerned about the ABI change but there are workarounds like a
> > new auxv bit.
> >
> > > - in case we discover compatibility issues with user binaries it's
> > > better if userspace can easily disable BTI (e.g. removing the
> > > mprotect based on some env var, but if kernel adds PROT_BTI and
> > > mprotect is filtered then we have no reliable way to remove that
> > > from executables. this problem already exists for static linked
> > > exes, although admittedly those are less of a compat concern.)
> >
> > This is our main concern. For static binaries, the linker could detect,
> > in theory, potential issues when linking and not set the corresponding
> > ELF information.
> >
> > At runtime, a dynamic linker could detect issues and avoid enabling BTI.
> > In both cases, it's a (static or dynamic) linker decision that belongs
> > in user-space.
>
> note that the marking is tied to an elf module: if the static
> linker can be trusted to produce correct marking then both the
> static and dynamic linking cases work, otherwise neither works.
> (the dynamic linker cannot detect bti issues, just apply user
> supplied policy.)

My assumption is that the dynamic linker may become smarter and detect
BTI issues, if necessary.

Let's say we link together multiple objects, some of them with BTI
instructions, others without. Does the static linker generate a
.note.gnu.property section with GNU_PROPERTY_AARCH64_FEATURE_1_BTI? I
guess not, otherwise the .text section would have a mixture of functions
with and without landing pads.

In the dynamic linker case, if there are multiple shared objects where
some are missing BTI, I guess the dynamic linker currently invokes
mprotect(PROT_BTI) (or mmap()) on all objects with the corresponding
GNU_PROPERTY.

While I don't immediately see an issue with the dynamic loader always
turning on PROT_BTI based solely on the shared object it is linking in,
the static linker takes a more conservative approach. The dynamic linker
may not have a similar choice in the future if the kernel forced
PROT_BTI on the main executable. In both cases it was a user choice.

The dynamic loader itself is statically linked, so any potential
mismatch would have been detected at build time and the corresponding
GNU_PROPERTY unset.

> 1) if we consider bti part of the semantics of a marked module
> then it should be always on if the system supports it and
> ideally the loader of the module should deal with PROT_BTI.
> (and if the marking is wrong then the binary is wrong.)
>
> 2) if we consider the marking to be a compatibility indicator
> and let userspace policy to decide what to do with it then the
> static exe and vdso cases should be handled by that policy too.

For static exe, we assume that the compatibility was checked at link
time. However, you are right on the vdso, we always turn BTI on. So it
can indeed be argued that the kernel already made the decision for (some
of) the user modules.

> (this makes sense if we expect that there are reasons to turn
> bti off for a process independently of markings. this requires
> the static linking startup code to do the policy decision and
> self-apply PROT_BTI early.)

We currently leave this policy decision to the dynamic loader (mostly,
apart from the vdso).

> the current code does not fit either case well, but i was
> planning to do (1). and ideally PROT_BTI would be added
> reliably, but a best effort only PROT_BTI works too, however
> it limits our ability to report real mprotect failures.

If we (kernel people) agree to set PROT_BTI on for the main executable,
we can expose a bit (in AT_FLAGS or somewhere) to tell the dynamic
loader that PROT_BTI is already on. I presume subsequent objects will be
mapped with mmap().

> > > - ideally PROT_BTI would be added via a new syscall that does not
> > > interfere with PROT_EXEC filtering. (this does not conflict with
> > > the current patches: even with a new syscall we need a fallback.)
> >
> > This can be discussed as a long term solution.
> >
> > > - solve it in systemd (e.g. turn off the filter, use better filter):
> > > i would prefer not to have aarch64 (or BTI) specific policy in
> > > user code. and there was no satisfying way to do this portably.
> >
> > I agree. I think the best for now (as a back-portable glibc fix) is to
> > ignore the mprotect(PROT_EXEC|PROT_BTI) error that the dynamic loader
> > gets. BTI will be disabled if MDWX is enabled.
>
> ok.
>
> we got back to the original proposal: silently ignore mprotect
> failures. i'm still considering the mmap solution for libraries
> only: at least then libraries are handled reliably on current
> setups, but i will have to think about whether attack targets
> are mainly in libraries like libc or in executables.

I think ignoring the mprotect() error is the best we can do now. If we
add a kernel patch to turn PROT_BTI on together with an AT_FLAGS bit,
the user mprotect() would no longer be necessary.

In the absence of an AT_FLAGS bit, we could add PROT_BTI on the main exe
and backport the fix to when we first added BTI support. This way the
dynamic loader may just ignore the mprotect() altogether on the main
exe, assuming that people run latest stable kernels.
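
As a sketch of what such a check could look like on the loader side:
only HWCAP2_BTI, AT_HWCAP2, AT_FLAGS and getauxval are existing
interfaces here; the AT_FLAGS_BTI_MAIN_EXE bit is purely hypothetical
and only illustrates the shape of the interface being discussed.

#include <sys/auxv.h>

/* Hypothetical bit: nothing like this exists in the kernel yet.  */
#define AT_FLAGS_BTI_MAIN_EXE (1UL << 0)

/* What the loader-side check could look like if the kernel advertised
   "main executable already mapped with PROT_BTI" via AT_FLAGS.
   HWCAP2_BTI is the existing aarch64 hwcap bit for BTI support.  */
static int
main_exe_needs_bti_fixup (void)
{
#ifdef HWCAP2_BTI
  if ((getauxval (AT_HWCAP2) & HWCAP2_BTI) == 0)
    return 0;    /* BTI not supported, nothing to do.  */
#endif
  if (getauxval (AT_FLAGS) & AT_FLAGS_BTI_MAIN_EXE)
    return 0;    /* Kernel already applied PROT_BTI.  */
  return 1;      /* Loader still has to add PROT_BTI itself.  */
}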

> > In the meantime, we should start (continue) looking at a solution that
> > works for both systemd and the kernel and be generic enough for other
> > architectures. The stateless nature of the current SECCOMP approach is
> > not suitable for this W^X policy. Kees had some suggestions here but the
> > thread seems to have died:
> >
> > https://lore.kernel.org/kernel-hardening/202010221256.A4F95FD11@keescook/
>
> it sounded like better W^X enforcement won't happen any time soon.

Unfortunately, I think you are right here.

Anyway, looking for any other input from the kernel and systemd people.
If not, I'll post a patch at 5.11-rc1 turning PROT_BTI on for the main
exe and take it from there. I think such discussion shouldn't disrupt
the glibc fixes/improvements.

--
Catalin