2020-10-22 09:02:25

by Szabolcs Nagy

[permalink] [raw]
Subject: Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

The 10/22/2020 09:18, Lennart Poettering wrote:
> On Mi, 21.10.20 22:44, Jeremy Linton ([email protected]) wrote:
>
> > Hi,
> >
> > There is a problem with glibc+systemd on BTI enabled systems. Systemd
> > has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
> > PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
> > being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
> > caught by the seccomp filter, resulting in service failures.
> >
> > So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
> > This is obviously not desirable.
> >
> > Various changes have been suggested, replacing the mprotect with mmap calls
> > having PROT_BTI set on the original mapping, re-mmapping the segments,
> > implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
> > and various modification to seccomp to allow particular mprotect cases to
> > bypass the filters. In each case there seems to be an undesirable attribute
> > to the solution.
> >
> > So, whats the best solution?
>
> Did you see Topi's comments on the systemd issue?
>
> https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
>
> I think I agree with this: it's a bit weird to alter the bits after
> the fact. Can't glibc set up everything right from the begining? That
> would keep both concepts working.

that's hard to do and does not work for the main exe currently
(which is mmaped by the kernel).

(it's hard to do because to know that the elf module requires
bti the PT_GNU_PROPERTY notes have to be accessed that are
often in the executable load segment, so either you mmap that
or have to read that, but the latter has a lot more failure
modes, so if i have to get the mmap flags right i'd do a mmap
and then re-mmap if the flags were not right)


2020-10-22 16:00:11

by Lennart Poettering

[permalink] [raw]
Subject: Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

On Do, 22.10.20 09:05, Szabolcs Nagy ([email protected]) wrote:

> > > Various changes have been suggested, replacing the mprotect with mmap calls
> > > having PROT_BTI set on the original mapping, re-mmapping the segments,
> > > implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
> > > and various modification to seccomp to allow particular mprotect cases to
> > > bypass the filters. In each case there seems to be an undesirable attribute
> > > to the solution.
> > >
> > > So, whats the best solution?
> >
> > Did you see Topi's comments on the systemd issue?
> >
> > https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
> >
> > I think I agree with this: it's a bit weird to alter the bits after
> > the fact. Can't glibc set up everything right from the begining? That
> > would keep both concepts working.
>
> that's hard to do and does not work for the main exe currently
> (which is mmaped by the kernel).
>
> (it's hard to do because to know that the elf module requires
> bti the PT_GNU_PROPERTY notes have to be accessed that are
> often in the executable load segment, so either you mmap that
> or have to read that, but the latter has a lot more failure
> modes, so if i have to get the mmap flags right i'd do a mmap
> and then re-mmap if the flags were not right)

Only other option I then see is to neuter one of the two
mechanisms. We could certainly turn off MDWE on arm in systemd, if
people want that. Or make it a build-time choice, so that distros make
the choice: build everything with BTI xor suppport MDWE.

(Might make sense for glibc to gracefully fallback to non-BTI mode if
the mprotect() fails though, to make sure BTI-built binaries work
everywhere.)

I figure your interest in ARM system security is bigger than mine. I
am totally fine to turn off MDWE on ARM if that's what the Linux ARM
folks want. I ave no horse in the race. Just let me know.

[An acceptable compromise might be to allow
mprotect(PROT_EXEC|PROT_BTI) if MDWE is on, but prohibit
mprotect(PROT_EXEC) without PROT_BTI. Then at least you get one of the
two protections, but not both. I mean, MDWE is not perfect anyway on
non-x86-64 already: on 32bit i386 MDWE protection is not complete, due
to ipc() syscall multiplexing being unmatchable with seccomp. I
personally am happy as long as it works fully on x86-64]

Lennart

--
Lennart Poettering, Berlin