2020-09-29 21:05:01

by Chang S. Bae

[permalink] [raw]
Subject: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

During signal entry, the kernel pushes data onto the normal userspace
stack. On x86, the data pushed onto the user stack includes XSAVE state,
which has grown over time as new features and larger registers have been
added to the architecture.

MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
constant indicates to userspace how much data the kernel expects to push on
the user stack, [2][3].

However, this constant is much too small and does not reflect recent
additions to the architecture. For instance, when AVX-512 states are in
use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.

The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
cause user stack overflow when delivering a signal.

In this series, we suggest a couple of things:
1. Provide a variable minimum stack size to userspace, as a similar
approach to [5]
2. Avoid using a too-small alternate stack

[1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/bits/sigstack.h;h=b9dca794da093dc4d41d39db9851d444e1b54d9b;hb=HEAD
[2]: https://www.gnu.org/software/libc/manual/html_node/Signal-Stack.html
[3]: https://man7.org/linux/man-pages/man2/sigaltstack.2.html
[4]: https://bugzilla.kernel.org/show_bug.cgi?id=153531
[5]: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4671/original/plumbers-dm-2017.pdf

Chang S. Bae (4):
x86/signal: Introduce helpers to get the maximum signal frame size
x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
x86/signal: Prevent an alternate stack overflow before a signal
delivery
selftest/x86/signal: Include test cases for validating sigaltstack

arch/x86/ia32/ia32_signal.c | 11 +-
arch/x86/include/asm/elf.h | 4 +
arch/x86/include/asm/fpu/signal.h | 2 +
arch/x86/include/asm/sigframe.h | 25 +++++
arch/x86/include/uapi/asm/auxvec.h | 6 +-
arch/x86/kernel/cpu/common.c | 3 +
arch/x86/kernel/fpu/signal.c | 20 ++++
arch/x86/kernel/signal.c | 66 +++++++++++-
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/sigaltstack.c | 126 ++++++++++++++++++++++
10 files changed, 258 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/x86/sigaltstack.c

--
2.17.1


2020-10-05 13:48:02

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> During signal entry, the kernel pushes data onto the normal userspace
> stack. On x86, the data pushed onto the user stack includes XSAVE state,
> which has grown over time as new features and larger registers have been
> added to the architecture.
>
> MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> constant indicates to userspace how much data the kernel expects to push on
> the user stack, [2][3].
>
> However, this constant is much too small and does not reflect recent
> additions to the architecture. For instance, when AVX-512 states are in
> use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
>
> The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> cause user stack overflow when delivering a signal.
>
> In this series, we suggest a couple of things:
> 1. Provide a variable minimum stack size to userspace, as a similar
> approach to [5]
> 2. Avoid using a too-small alternate stack

I can't comment on the x86 specifics, but the approach followed in this
series does seem consistent with the way arm64 populates
AT_MINSIGSTKSZ.

I need to dig up my glibc hacks for providing a sysconf interface to
this...

Cheers
---Dave

>
> [1]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/bits/sigstack.h;h=b9dca794da093dc4d41d39db9851d444e1b54d9b;hb=HEAD
> [2]: https://www.gnu.org/software/libc/manual/html_node/Signal-Stack.html
> [3]: https://man7.org/linux/man-pages/man2/sigaltstack.2.html
> [4]: https://bugzilla.kernel.org/show_bug.cgi?id=153531
> [5]: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4671/original/plumbers-dm-2017.pdf
>
> Chang S. Bae (4):
> x86/signal: Introduce helpers to get the maximum signal frame size
> x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ
> x86/signal: Prevent an alternate stack overflow before a signal
> delivery
> selftest/x86/signal: Include test cases for validating sigaltstack
>
> arch/x86/ia32/ia32_signal.c | 11 +-
> arch/x86/include/asm/elf.h | 4 +
> arch/x86/include/asm/fpu/signal.h | 2 +
> arch/x86/include/asm/sigframe.h | 25 +++++
> arch/x86/include/uapi/asm/auxvec.h | 6 +-
> arch/x86/kernel/cpu/common.c | 3 +
> arch/x86/kernel/fpu/signal.c | 20 ++++
> arch/x86/kernel/signal.c | 66 +++++++++++-
> tools/testing/selftests/x86/Makefile | 2 +-
> tools/testing/selftests/x86/sigaltstack.c | 126 ++++++++++++++++++++++
> 10 files changed, 258 insertions(+), 7 deletions(-)
> create mode 100644 tools/testing/selftests/x86/sigaltstack.c
>
> --
> 2.17.1
>

2020-10-05 23:17:57

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
>
> On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > During signal entry, the kernel pushes data onto the normal userspace
> > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > which has grown over time as new features and larger registers have been
> > added to the architecture.
> >
> > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > constant indicates to userspace how much data the kernel expects to push on
> > the user stack, [2][3].
> >
> > However, this constant is much too small and does not reflect recent
> > additions to the architecture. For instance, when AVX-512 states are in
> > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> >
> > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > cause user stack overflow when delivering a signal.
> >
> > In this series, we suggest a couple of things:
> > 1. Provide a variable minimum stack size to userspace, as a similar
> > approach to [5]
> > 2. Avoid using a too-small alternate stack
>
> I can't comment on the x86 specifics, but the approach followed in this
> series does seem consistent with the way arm64 populates
> AT_MINSIGSTKSZ.
>
> I need to dig up my glibc hacks for providing a sysconf interface to
> this...

Here is my proposal for glibc:

https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html

1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
is in use.


--
H.J.

2020-10-06 09:29:36

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> >
> > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > During signal entry, the kernel pushes data onto the normal userspace
> > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > which has grown over time as new features and larger registers have been
> > > added to the architecture.
> > >
> > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > constant indicates to userspace how much data the kernel expects to push on
> > > the user stack, [2][3].
> > >
> > > However, this constant is much too small and does not reflect recent
> > > additions to the architecture. For instance, when AVX-512 states are in
> > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > >
> > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > cause user stack overflow when delivering a signal.
> > >
> > > In this series, we suggest a couple of things:
> > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > approach to [5]
> > > 2. Avoid using a too-small alternate stack
> >
> > I can't comment on the x86 specifics, but the approach followed in this
> > series does seem consistent with the way arm64 populates
> > AT_MINSIGSTKSZ.
> >
> > I need to dig up my glibc hacks for providing a sysconf interface to
> > this...
>
> Here is my proposal for glibc:
>
> https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html

Thanks for the link.

Are there patches yet? I already had some hacks in the works, but I can
drop them if there's something already out there.


> 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.

Can we do this? IIUC, this is an ABI break and carries the risk of
buffer overruns.

The reason for not simply increasing the kernel's MINSIGSTKSZ #define
(apart from the fact that it is rarely used, due to glibc's shadowing
definitions) was that userspace binaries will have baked in the old
value of the constant and may be making assumptions about it.

For example, the type (char [MINSIGSTKSZ]) changes if this #define
changes. This could be a problem if an newly built library tries to
memcpy() or dump such an object defined by and old binary.
Bounds-checking and the stack sizes passed to things like sigaltstack()
and makecontext() could similarly go wrong.


> 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.

How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
discovery method is changing. The meaning of the value is exactly the
same as before.

If we are going to rename it though, it could make sense to go for
something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".

The trouble with including "STKSZ" is that is sounds like a
recommendation for your stack size. While the signal frame size is
relevant to picking a stack size, it's not the only thing to
consider.


Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
of a "recommended stack size" be abandoned? glibc can at least make a
slightly more informed guess about suitable stack sizes than the kernel
(and glibc already has to guess anyway, in order to determine the
default thread stack size).


> 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> is in use.

Great if we can do it. I was concerned that this might be
controversial.

Would this just be a recommendation, or can we enforce it somehow?

Cheers
---Dave

2020-10-06 12:15:01

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
>
> On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > >
> > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > which has grown over time as new features and larger registers have been
> > > > added to the architecture.
> > > >
> > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > constant indicates to userspace how much data the kernel expects to push on
> > > > the user stack, [2][3].
> > > >
> > > > However, this constant is much too small and does not reflect recent
> > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > >
> > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > cause user stack overflow when delivering a signal.
> > > >
> > > > In this series, we suggest a couple of things:
> > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > approach to [5]
> > > > 2. Avoid using a too-small alternate stack
> > >
> > > I can't comment on the x86 specifics, but the approach followed in this
> > > series does seem consistent with the way arm64 populates
> > > AT_MINSIGSTKSZ.
> > >
> > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > this...
> >
> > Here is my proposal for glibc:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
>
> Thanks for the link.
>
> Are there patches yet? I already had some hacks in the works, but I can
> drop them if there's something already out there.

I am working on it.

>
> > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
>
> Can we do this? IIUC, this is an ABI break and carries the risk of
> buffer overruns.
>
> The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> (apart from the fact that it is rarely used, due to glibc's shadowing
> definitions) was that userspace binaries will have baked in the old
> value of the constant and may be making assumptions about it.
>
> For example, the type (char [MINSIGSTKSZ]) changes if this #define
> changes. This could be a problem if an newly built library tries to
> memcpy() or dump such an object defined by and old binary.
> Bounds-checking and the stack sizes passed to things like sigaltstack()
> and makecontext() could similarly go wrong.

With my original proposal:

https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html

char [MINSIGSTKSZ] won't compile. The feedback is to increase the
constants:

https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html

>
> > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
>
> How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> discovery method is changing. The meaning of the value is exactly the
> same as before.
>
> If we are going to rename it though, it could make sense to go for
> something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
>
> The trouble with including "STKSZ" is that is sounds like a
> recommendation for your stack size. While the signal frame size is
> relevant to picking a stack size, it's not the only thing to
> consider.

The problem is that AT_MINSIGSTKSZ is the signal frame size used by
kernel. The minimum stack size for a signal handler is more likely
AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
frame size used by kernel + 6KB for user application.

>
> Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> of a "recommended stack size" be abandoned? glibc can at least make a
> slightly more informed guess about suitable stack sizes than the kernel
> (and glibc already has to guess anyway, in order to determine the
> default thread stack size).

Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
available.

>
> > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > is in use.
>
> Great if we can do it. I was concerned that this might be
> controversial.
>
> Would this just be a recommendation, or can we enforce it somehow?

It is just an idea. We need to move away from constant SIGSTKSZ and
MINSIGSTKSZ.

--
H.J.

2020-10-06 15:20:12

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 6, 2020 at 5:12 AM H.J. Lu <[email protected]> wrote:
>
> On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> >
> > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > >
> > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > which has grown over time as new features and larger registers have been
> > > > > added to the architecture.
> > > > >
> > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > the user stack, [2][3].
> > > > >
> > > > > However, this constant is much too small and does not reflect recent
> > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > >
> > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > cause user stack overflow when delivering a signal.
> > > > >
> > > > > In this series, we suggest a couple of things:
> > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > approach to [5]
> > > > > 2. Avoid using a too-small alternate stack
> > > >
> > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > series does seem consistent with the way arm64 populates
> > > > AT_MINSIGSTKSZ.
> > > >
> > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > this...
> > >
> > > Here is my proposal for glibc:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> >
> > Thanks for the link.
> >
> > Are there patches yet? I already had some hacks in the works, but I can
> > drop them if there's something already out there.
>
> I am working on it.
>
> >
> > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> >
> > Can we do this? IIUC, this is an ABI break and carries the risk of
> > buffer overruns.
> >
> > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > (apart from the fact that it is rarely used, due to glibc's shadowing
> > definitions) was that userspace binaries will have baked in the old
> > value of the constant and may be making assumptions about it.
> >
> > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > changes. This could be a problem if an newly built library tries to
> > memcpy() or dump such an object defined by and old binary.
> > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > and makecontext() could similarly go wrong.
>
> With my original proposal:
>
> https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
>
> char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> constants:
>
> https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
>
> >
> > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> >
> > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > discovery method is changing. The meaning of the value is exactly the
> > same as before.
> >
> > If we are going to rename it though, it could make sense to go for
> > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> >
> > The trouble with including "STKSZ" is that is sounds like a
> > recommendation for your stack size. While the signal frame size is
> > relevant to picking a stack size, it's not the only thing to
> > consider.
>
> The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> kernel. The minimum stack size for a signal handler is more likely
> AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> frame size used by kernel + 6KB for user application.
>
> >
> > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > of a "recommended stack size" be abandoned? glibc can at least make a
> > slightly more informed guess about suitable stack sizes than the kernel
> > (and glibc already has to guess anyway, in order to determine the
> > default thread stack size).
>
> Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> available.
>
> >
> > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > is in use.
> >
> > Great if we can do it. I was concerned that this might be
> > controversial.
> >
> > Would this just be a recommendation, or can we enforce it somehow?
>
> It is just an idea. We need to move away from constant SIGSTKSZ and
> MINSIGSTKSZ.
>

Here is the glibc patch:

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ

AT_MINSIGSTKSZ should return the signal frame size used by kernel + 6KB
for user application.

--
H.J.

2020-10-06 15:28:44

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> >
> > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > >
> > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > which has grown over time as new features and larger registers have been
> > > > > added to the architecture.
> > > > >
> > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > the user stack, [2][3].
> > > > >
> > > > > However, this constant is much too small and does not reflect recent
> > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > >
> > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > cause user stack overflow when delivering a signal.
> > > > >
> > > > > In this series, we suggest a couple of things:
> > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > approach to [5]
> > > > > 2. Avoid using a too-small alternate stack
> > > >
> > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > series does seem consistent with the way arm64 populates
> > > > AT_MINSIGSTKSZ.
> > > >
> > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > this...
> > >
> > > Here is my proposal for glibc:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> >
> > Thanks for the link.
> >
> > Are there patches yet? I already had some hacks in the works, but I can
> > drop them if there's something already out there.
>
> I am working on it.

OK. I may post something for discussion, but I'm happy for it to be
superseded by someone (i.e., other than me) who actually knows what
they're doing...

> >
> > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> >
> > Can we do this? IIUC, this is an ABI break and carries the risk of
> > buffer overruns.
> >
> > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > (apart from the fact that it is rarely used, due to glibc's shadowing
> > definitions) was that userspace binaries will have baked in the old
> > value of the constant and may be making assumptions about it.
> >
> > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > changes. This could be a problem if an newly built library tries to
> > memcpy() or dump such an object defined by and old binary.
> > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > and makecontext() could similarly go wrong.
>
> With my original proposal:
>
> https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
>
> char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> constants:
>
> https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html

Ah, I see. But both still API and ABI breaks; moreover, declaraing an
array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
obvious thing to do with this constant in many simple cases. Such usage
is widespread, see:

* https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1


Your two approaches seem to trade off two different sources of buffer
overruns: undersized stacks versus ABI breaks across library boundaries.

Since undersized stack is by far the more familiar problem and we at
least have guard regions to help detect overruns, I'd vote to keep
MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.

Or are people reporting real stack overruns on x86 today?


For arm64, we made large vectors on SVE opt-in, so that oversized signal
frames are not seen by default. Would somethine similar be feasible on
x86?


> > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> >
> > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > discovery method is changing. The meaning of the value is exactly the
> > same as before.
> >
> > If we are going to rename it though, it could make sense to go for
> > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> >
> > The trouble with including "STKSZ" is that is sounds like a
> > recommendation for your stack size. While the signal frame size is
> > relevant to picking a stack size, it's not the only thing to
> > consider.
>
> The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> kernel. The minimum stack size for a signal handler is more likely
> AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> frame size used by kernel + 6KB for user application.

Ack; to be correct, you also need to take into account which signals may
be unmasked while running on this stack, and the stack requirements of
all their handlers. Unfortunately, that's hard :(

What's your view on my naming suggesions?


> > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > of a "recommended stack size" be abandoned? glibc can at least make a
> > slightly more informed guess about suitable stack sizes than the kernel
> > (and glibc already has to guess anyway, in order to determine the
> > default thread stack size).
>
> Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> available.

In my code, I generate _SC_SIGSTKSZ as the equivalent of

max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)

which is >= the legacy value, and broadly reperesentative of the
relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.


What do you think?


> > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > is in use.
> >
> > Great if we can do it. I was concerned that this might be
> > controversial.
> >
> > Would this just be a recommendation, or can we enforce it somehow?
>
> It is just an idea. We need to move away from constant SIGSTKSZ and
> MINSIGSTKSZ.

Totally agree with that.

Cheers
---Dave

2020-10-06 15:38:38

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 6, 2020 at 8:25 AM Dave Martin <[email protected]> wrote:
>
> On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > >
> > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > >
> > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > which has grown over time as new features and larger registers have been
> > > > > > added to the architecture.
> > > > > >
> > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > the user stack, [2][3].
> > > > > >
> > > > > > However, this constant is much too small and does not reflect recent
> > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > >
> > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > cause user stack overflow when delivering a signal.
> > > > > >
> > > > > > In this series, we suggest a couple of things:
> > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > approach to [5]
> > > > > > 2. Avoid using a too-small alternate stack
> > > > >
> > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > series does seem consistent with the way arm64 populates
> > > > > AT_MINSIGSTKSZ.
> > > > >
> > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > this...
> > > >
> > > > Here is my proposal for glibc:
> > > >
> > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > >
> > > Thanks for the link.
> > >
> > > Are there patches yet? I already had some hacks in the works, but I can
> > > drop them if there's something already out there.
> >
> > I am working on it.
>
> OK. I may post something for discussion, but I'm happy for it to be
> superseded by someone (i.e., other than me) who actually knows what
> they're doing...

Please see my previous email for my glibc patch:

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ

> > >
> > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > >
> > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > buffer overruns.
> > >
> > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > definitions) was that userspace binaries will have baked in the old
> > > value of the constant and may be making assumptions about it.
> > >
> > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > changes. This could be a problem if an newly built library tries to
> > > memcpy() or dump such an object defined by and old binary.
> > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > and makecontext() could similarly go wrong.
> >
> > With my original proposal:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> >
> > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > constants:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
>
> Ah, I see. But both still API and ABI breaks; moreover, declaraing an
> array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
> obvious thing to do with this constant in many simple cases. Such usage
> is widespread, see:
>
> * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1
>
>
> Your two approaches seem to trade off two different sources of buffer
> overruns: undersized stacks versus ABI breaks across library boundaries.

We can't get everything we want.

> Since undersized stack is by far the more familiar problem and we at
> least have guard regions to help detect overruns, I'd vote to keep
> MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.

Agree.

> Or are people reporting real stack overruns on x86 today?

I hope so.

>
> For arm64, we made large vectors on SVE opt-in, so that oversized signal
> frames are not seen by default. Would somethine similar be feasible on
> x86?
>
>
> > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > >
> > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > discovery method is changing. The meaning of the value is exactly the
> > > same as before.
> > >
> > > If we are going to rename it though, it could make sense to go for
> > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > >
> > > The trouble with including "STKSZ" is that is sounds like a
> > > recommendation for your stack size. While the signal frame size is
> > > relevant to picking a stack size, it's not the only thing to
> > > consider.
> >
> > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > kernel. The minimum stack size for a signal handler is more likely
> > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > frame size used by kernel + 6KB for user application.
>
> Ack; to be correct, you also need to take into account which signals may
> be unmasked while running on this stack, and the stack requirements of
> all their handlers. Unfortunately, that's hard :(
>
> What's your view on my naming suggesions?

I used _SC_MINSIGSTKSZ:

https://gitlab.com/x86-glibc/glibc/-/commit/73ca53bfbc1c105bc579f55f15af011a07fcded9

>
> > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > slightly more informed guess about suitable stack sizes than the kernel
> > > (and glibc already has to guess anyway, in order to determine the
> > > default thread stack size).
> >
> > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > available.
>
> In my code, I generate _SC_SIGSTKSZ as the equivalent of
>
> max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)
>
> which is >= the legacy value, and broadly reperesentative of the
> relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.
>
>
> What do you think?

sysconf(_SC_MINSIGSTKSZ) should be usable ASIS for most cases.

>
> > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > is in use.
> > >
> > > Great if we can do it. I was concerned that this might be
> > > controversial.
> > >
> > > Would this just be a recommendation, or can we enforce it somehow?
> >
> > It is just an idea. We need to move away from constant SIGSTKSZ and
> > MINSIGSTKSZ.
>
> Totally agree with that.
>

With my glibc patch, -D_SC_MINSIGSTKSZ_SOURCE will fail to compile
if the source assumes constant SIGSTKSZ or MINSIGSTKSZ.

--
H.J.

2020-10-06 15:45:34

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 08:18:03AM -0700, H.J. Lu wrote:
> On Tue, Oct 6, 2020 at 5:12 AM H.J. Lu <[email protected]> wrote:
> >
> > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > >
> > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > >
> > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > which has grown over time as new features and larger registers have been
> > > > > > added to the architecture.
> > > > > >
> > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > the user stack, [2][3].
> > > > > >
> > > > > > However, this constant is much too small and does not reflect recent
> > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > >
> > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > cause user stack overflow when delivering a signal.
> > > > > >
> > > > > > In this series, we suggest a couple of things:
> > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > approach to [5]
> > > > > > 2. Avoid using a too-small alternate stack
> > > > >
> > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > series does seem consistent with the way arm64 populates
> > > > > AT_MINSIGSTKSZ.
> > > > >
> > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > this...
> > > >
> > > > Here is my proposal for glibc:
> > > >
> > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > >
> > > Thanks for the link.
> > >
> > > Are there patches yet? I already had some hacks in the works, but I can
> > > drop them if there's something already out there.
> >
> > I am working on it.
> >
> > >
> > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > >
> > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > buffer overruns.
> > >
> > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > definitions) was that userspace binaries will have baked in the old
> > > value of the constant and may be making assumptions about it.
> > >
> > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > changes. This could be a problem if an newly built library tries to
> > > memcpy() or dump such an object defined by and old binary.
> > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > and makecontext() could similarly go wrong.
> >
> > With my original proposal:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> >
> > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > constants:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> >
> > >
> > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > >
> > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > discovery method is changing. The meaning of the value is exactly the
> > > same as before.
> > >
> > > If we are going to rename it though, it could make sense to go for
> > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > >
> > > The trouble with including "STKSZ" is that is sounds like a
> > > recommendation for your stack size. While the signal frame size is
> > > relevant to picking a stack size, it's not the only thing to
> > > consider.
> >
> > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > kernel. The minimum stack size for a signal handler is more likely
> > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > frame size used by kernel + 6KB for user application.
> >
> > >
> > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > slightly more informed guess about suitable stack sizes than the kernel
> > > (and glibc already has to guess anyway, in order to determine the
> > > default thread stack size).
> >
> > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > available.
> >
> > >
> > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > is in use.
> > >
> > > Great if we can do it. I was concerned that this might be
> > > controversial.
> > >
> > > Would this just be a recommendation, or can we enforce it somehow?
> >
> > It is just an idea. We need to move away from constant SIGSTKSZ and
> > MINSIGSTKSZ.
> >
>
> Here is the glibc patch:
>
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
>
> AT_MINSIGSTKSZ should return the signal frame size used by kernel + 6KB
> for user application.

I'm not sure about the 6K here.

We a few fundamental parameters:

* the actual maximum size of the kernel-allocated signal frame (which
we'll report via AT_MINSIGSTKSZ);

* the size of additional userspace stack frame required to execute the
minimal (i.e., empty) signal handler. (On AArch64, this is 0. In
environments where the C lirbrary calls signal handlers through some
sort of wrapper, this would need to include the wrapper's stack
needs also);

* additional userspace stack needs for the actual signal handler code.
This is completely unknown.


_SC_MINSIGSTKSZ (however named) should certainly include the first two,
but I'm not sure about the third. It will at least be architecture-
dependent.


This is one reason why I still favor having more than one constant here:
the fundamental system properties should be discoverable for software
that knows how to calculate its own stack needs accurately.

Since calculating stack needs is hard and most software doesn't bother
to do it, we could also give a "recommended" stack size which
incorporates a guess of typical handler stack needs (similarly to the
legacy SIGSTKSZ constant), but I think that should be a separate
parameter.

Cheers
---Dave

2020-10-06 15:48:04

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On 10/6/20 8:25 AM, Dave Martin wrote:
> Or are people reporting real stack overruns on x86 today?

We have real overruns. We have ~2800 bytes of XSAVE (regisiter) state
mostly from AVX-512, and a 2048 byte MINSIGSTKSZ.

2020-10-06 16:58:01

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 6, 2020 at 8:43 AM Dave Martin <[email protected]> wrote:
>
> On Tue, Oct 06, 2020 at 08:18:03AM -0700, H.J. Lu wrote:
> > On Tue, Oct 6, 2020 at 5:12 AM H.J. Lu <[email protected]> wrote:
> > >
> > > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > > >
> > > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > > which has grown over time as new features and larger registers have been
> > > > > > > added to the architecture.
> > > > > > >
> > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > > the user stack, [2][3].
> > > > > > >
> > > > > > > However, this constant is much too small and does not reflect recent
> > > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > > >
> > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > > cause user stack overflow when delivering a signal.
> > > > > > >
> > > > > > > In this series, we suggest a couple of things:
> > > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > > approach to [5]
> > > > > > > 2. Avoid using a too-small alternate stack
> > > > > >
> > > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > > series does seem consistent with the way arm64 populates
> > > > > > AT_MINSIGSTKSZ.
> > > > > >
> > > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > > this...
> > > > >
> > > > > Here is my proposal for glibc:
> > > > >
> > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > > >
> > > > Thanks for the link.
> > > >
> > > > Are there patches yet? I already had some hacks in the works, but I can
> > > > drop them if there's something already out there.
> > >
> > > I am working on it.
> > >
> > > >
> > > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > > >
> > > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > > buffer overruns.
> > > >
> > > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > > definitions) was that userspace binaries will have baked in the old
> > > > value of the constant and may be making assumptions about it.
> > > >
> > > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > > changes. This could be a problem if an newly built library tries to
> > > > memcpy() or dump such an object defined by and old binary.
> > > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > > and makecontext() could similarly go wrong.
> > >
> > > With my original proposal:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> > >
> > > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > > constants:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> > >
> > > >
> > > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > > >
> > > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > > discovery method is changing. The meaning of the value is exactly the
> > > > same as before.
> > > >
> > > > If we are going to rename it though, it could make sense to go for
> > > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > > >
> > > > The trouble with including "STKSZ" is that is sounds like a
> > > > recommendation for your stack size. While the signal frame size is
> > > > relevant to picking a stack size, it's not the only thing to
> > > > consider.
> > >
> > > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > > kernel. The minimum stack size for a signal handler is more likely
> > > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > > frame size used by kernel + 6KB for user application.
> > >
> > > >
> > > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > > slightly more informed guess about suitable stack sizes than the kernel
> > > > (and glibc already has to guess anyway, in order to determine the
> > > > default thread stack size).
> > >
> > > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > > available.
> > >
> > > >
> > > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > > is in use.
> > > >
> > > > Great if we can do it. I was concerned that this might be
> > > > controversial.
> > > >
> > > > Would this just be a recommendation, or can we enforce it somehow?
> > >
> > > It is just an idea. We need to move away from constant SIGSTKSZ and
> > > MINSIGSTKSZ.
> > >
> >
> > Here is the glibc patch:
> >
> > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
> >
> > AT_MINSIGSTKSZ should return the signal frame size used by kernel + 6KB
> > for user application.
>
> I'm not sure about the 6K here.

6KB is something I made up.

> We a few fundamental parameters:
>
> * the actual maximum size of the kernel-allocated signal frame (which
> we'll report via AT_MINSIGSTKSZ);

Agree.

> * the size of additional userspace stack frame required to execute the
> minimal (i.e., empty) signal handler. (On AArch64, this is 0. In

It is also 0 for x86.

> environments where the C lirbrary calls signal handlers through some
> sort of wrapper, this would need to include the wrapper's stack
> needs also);

> * additional userspace stack needs for the actual signal handler code.
> This is completely unknown.

That is 6KB I made up.

>
> _SC_MINSIGSTKSZ (however named) should certainly include the first two,
> but I'm not sure about the third. It will at least be architecture-
> dependent.
>
>
> This is one reason why I still favor having more than one constant here:
> the fundamental system properties should be discoverable for software
> that knows how to calculate its own stack needs accurately.
>
> Since calculating stack needs is hard and most software doesn't bother
> to do it, we could also give a "recommended" stack size which
> incorporates a guess of typical handler stack needs (similarly to the
> legacy SIGSTKSZ constant), but I think that should be a separate
> parameter.

Sounds reasonable. We can have _SC_MINSIGSTKSZ and
_SC_SIGSTKSZ which is _SC_MINSIGSTKSZ + 6KB (or some
other value).

--
H.J.

2020-10-06 16:59:07

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 08:34:06AM -0700, H.J. Lu wrote:
> On Tue, Oct 6, 2020 at 8:25 AM Dave Martin <[email protected]> wrote:
> >
> > On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> > > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > > >
> > > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > > which has grown over time as new features and larger registers have been
> > > > > > > added to the architecture.
> > > > > > >
> > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > > the user stack, [2][3].
> > > > > > >
> > > > > > > However, this constant is much too small and does not reflect recent
> > > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > > >
> > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > > cause user stack overflow when delivering a signal.
> > > > > > >
> > > > > > > In this series, we suggest a couple of things:
> > > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > > approach to [5]
> > > > > > > 2. Avoid using a too-small alternate stack
> > > > > >
> > > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > > series does seem consistent with the way arm64 populates
> > > > > > AT_MINSIGSTKSZ.
> > > > > >
> > > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > > this...
> > > > >
> > > > > Here is my proposal for glibc:
> > > > >
> > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > > >
> > > > Thanks for the link.
> > > >
> > > > Are there patches yet? I already had some hacks in the works, but I can
> > > > drop them if there's something already out there.
> > >
> > > I am working on it.
> >
> > OK. I may post something for discussion, but I'm happy for it to be
> > superseded by someone (i.e., other than me) who actually knows what
> > they're doing...
>
> Please see my previous email for my glibc patch:
>
> https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
>
> > > >
> > > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > > >
> > > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > > buffer overruns.
> > > >
> > > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > > definitions) was that userspace binaries will have baked in the old
> > > > value of the constant and may be making assumptions about it.
> > > >
> > > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > > changes. This could be a problem if an newly built library tries to
> > > > memcpy() or dump such an object defined by and old binary.
> > > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > > and makecontext() could similarly go wrong.
> > >
> > > With my original proposal:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> > >
> > > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > > constants:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> >
> > Ah, I see. But both still API and ABI breaks; moreover, declaraing an
> > array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
> > obvious thing to do with this constant in many simple cases. Such usage
> > is widespread, see:
> >
> > * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1
> >
> >
> > Your two approaches seem to trade off two different sources of buffer
> > overruns: undersized stacks versus ABI breaks across library boundaries.
>
> We can't get everything we want.
>
> > Since undersized stack is by far the more familiar problem and we at
> > least have guard regions to help detect overruns, I'd vote to keep
> > MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.
>
> Agree.
>
> > Or are people reporting real stack overruns on x86 today?
>
> I hope so.
>
> >
> > For arm64, we made large vectors on SVE opt-in, so that oversized signal
> > frames are not seen by default. Would somethine similar be feasible on
> > x86?
> >
> >
> > > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > > >
> > > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > > discovery method is changing. The meaning of the value is exactly the
> > > > same as before.
> > > >
> > > > If we are going to rename it though, it could make sense to go for
> > > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > > >
> > > > The trouble with including "STKSZ" is that is sounds like a
> > > > recommendation for your stack size. While the signal frame size is
> > > > relevant to picking a stack size, it's not the only thing to
> > > > consider.
> > >
> > > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > > kernel. The minimum stack size for a signal handler is more likely
> > > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > > frame size used by kernel + 6KB for user application.
> >
> > Ack; to be correct, you also need to take into account which signals may
> > be unmasked while running on this stack, and the stack requirements of
> > all their handlers. Unfortunately, that's hard :(
> >
> > What's your view on my naming suggesions?
>
> I used _SC_MINSIGSTKSZ:
>
> https://gitlab.com/x86-glibc/glibc/-/commit/73ca53bfbc1c105bc579f55f15af011a07fcded9

Apologies, I missed that.

Otherwise, the changes look much as I would expect, except for the
"6K for user program" thing. This is strictly not included in the
legacy MINSIGSTKSZ.

>
> >
> > > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > > slightly more informed guess about suitable stack sizes than the kernel
> > > > (and glibc already has to guess anyway, in order to determine the
> > > > default thread stack size).
> > >
> > > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > > available.
> >
> > In my code, I generate _SC_SIGSTKSZ as the equivalent of
> >
> > max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)
> >
> > which is >= the legacy value, and broadly reperesentative of the
> > relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.
> >
> >
> > What do you think?
>
> sysconf(_SC_MINSIGSTKSZ) should be usable ASIS for most cases.

Why, though?

MINSIGSTKSZ is not specified to be usable as-is for any case whatsoever.


Software that calculates its own needs to know the actual system values,
not estimates based on guesses about how much stack a typical program
might need if it were recompiled for x86.

This doesn't mean we can't have a generic suggested value that's suitable
for common scenarios (like SIGSTKSZ), but if we do then I think it
should be a separate constant.


> > > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > > is in use.
> > > >
> > > > Great if we can do it. I was concerned that this might be
> > > > controversial.
> > > >
> > > > Would this just be a recommendation, or can we enforce it somehow?
> > >
> > > It is just an idea. We need to move away from constant SIGSTKSZ and
> > > MINSIGSTKSZ.
> >
> > Totally agree with that.
> >
>
> With my glibc patch, -D_SC_MINSIGSTKSZ_SOURCE will fail to compile
> if the source assumes constant SIGSTKSZ or MINSIGSTKSZ.

Ah yes, I see. That's a sensible precaution.

Is it accepted in general that defining different feature test macros
can lead to ABI incompatibilities?

I have thought that building a shared library with _GNU_SOURCE (say)
doesn't mean that a program that loads that library must also be built
with _GNU_SOURCE. For one thing, that's hard to police.


However, there are already combinations that could break, e.g., mixing
-D_FILE_OFFSET_BITS=64 with -D_FILE_OFFSET_BITS=32 would be broken if
this define changes off_t.


So, maybe having _SC_MINSIGSTKSZ_SOURCE break things in this way is an
acceptable compromise. Interfaces that depend on the value of
MINSIGSTKSZ or SIGSTKSZ are possible, but probably rare in practice --
I don't know of a specific example.

Cheers
---Dave

2020-10-06 17:01:54

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 08:33:47AM -0700, Dave Hansen wrote:
> On 10/6/20 8:25 AM, Dave Martin wrote:
> > Or are people reporting real stack overruns on x86 today?
>
> We have real overruns. We have ~2800 bytes of XSAVE (regisiter) state
> mostly from AVX-512, and a 2048 byte MINSIGSTKSZ.

Right. Out of interest, do you believe that's a direct consequence of
the larger kernel-generated signal frame, or does the expansion of
userspace stack frames play a role too?

In practice software just assumes SIGSTKSZ and then ignores the problem
until / unless an actual stack overflow is seen.

There's probably a lot of software out there whose stack is
theoretically too small even without AVX-512 etc. in the mix, especially
when considering the possibility of nested signals...

Cheers
---Dave

2020-10-06 17:47:29

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 6, 2020 at 9:55 AM Dave Martin <[email protected]> wrote:
>
> On Tue, Oct 06, 2020 at 08:34:06AM -0700, H.J. Lu wrote:
> > On Tue, Oct 6, 2020 at 8:25 AM Dave Martin <[email protected]> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> > > > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > > > >
> > > > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > > > >
> > > > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > > > which has grown over time as new features and larger registers have been
> > > > > > > > added to the architecture.
> > > > > > > >
> > > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > > > the user stack, [2][3].
> > > > > > > >
> > > > > > > > However, this constant is much too small and does not reflect recent
> > > > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > > > >
> > > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > > > cause user stack overflow when delivering a signal.
> > > > > > > >
> > > > > > > > In this series, we suggest a couple of things:
> > > > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > > > approach to [5]
> > > > > > > > 2. Avoid using a too-small alternate stack
> > > > > > >
> > > > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > > > series does seem consistent with the way arm64 populates
> > > > > > > AT_MINSIGSTKSZ.
> > > > > > >
> > > > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > > > this...
> > > > > >
> > > > > > Here is my proposal for glibc:
> > > > > >
> > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > > > >
> > > > > Thanks for the link.
> > > > >
> > > > > Are there patches yet? I already had some hacks in the works, but I can
> > > > > drop them if there's something already out there.
> > > >
> > > > I am working on it.
> > >
> > > OK. I may post something for discussion, but I'm happy for it to be
> > > superseded by someone (i.e., other than me) who actually knows what
> > > they're doing...
> >
> > Please see my previous email for my glibc patch:
> >
> > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
> >
> > > > >
> > > > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > > > >
> > > > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > > > buffer overruns.
> > > > >
> > > > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > > > definitions) was that userspace binaries will have baked in the old
> > > > > value of the constant and may be making assumptions about it.
> > > > >
> > > > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > > > changes. This could be a problem if an newly built library tries to
> > > > > memcpy() or dump such an object defined by and old binary.
> > > > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > > > and makecontext() could similarly go wrong.
> > > >
> > > > With my original proposal:
> > > >
> > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> > > >
> > > > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > > > constants:
> > > >
> > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> > >
> > > Ah, I see. But both still API and ABI breaks; moreover, declaraing an
> > > array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
> > > obvious thing to do with this constant in many simple cases. Such usage
> > > is widespread, see:
> > >
> > > * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1
> > >
> > >
> > > Your two approaches seem to trade off two different sources of buffer
> > > overruns: undersized stacks versus ABI breaks across library boundaries.
> >
> > We can't get everything we want.
> >
> > > Since undersized stack is by far the more familiar problem and we at
> > > least have guard regions to help detect overruns, I'd vote to keep
> > > MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.
> >
> > Agree.
> >
> > > Or are people reporting real stack overruns on x86 today?
> >
> > I hope so.
> >
> > >
> > > For arm64, we made large vectors on SVE opt-in, so that oversized signal
> > > frames are not seen by default. Would somethine similar be feasible on
> > > x86?
> > >
> > >
> > > > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > > > >
> > > > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > > > discovery method is changing. The meaning of the value is exactly the
> > > > > same as before.
> > > > >
> > > > > If we are going to rename it though, it could make sense to go for
> > > > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > > > >
> > > > > The trouble with including "STKSZ" is that is sounds like a
> > > > > recommendation for your stack size. While the signal frame size is
> > > > > relevant to picking a stack size, it's not the only thing to
> > > > > consider.
> > > >
> > > > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > > > kernel. The minimum stack size for a signal handler is more likely
> > > > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > > > frame size used by kernel + 6KB for user application.
> > >
> > > Ack; to be correct, you also need to take into account which signals may
> > > be unmasked while running on this stack, and the stack requirements of
> > > all their handlers. Unfortunately, that's hard :(
> > >
> > > What's your view on my naming suggesions?
> >
> > I used _SC_MINSIGSTKSZ:
> >
> > https://gitlab.com/x86-glibc/glibc/-/commit/73ca53bfbc1c105bc579f55f15af011a07fcded9
>
> Apologies, I missed that.
>
> Otherwise, the changes look much as I would expect, except for the
> "6K for user program" thing. This is strictly not included in the
> legacy MINSIGSTKSZ.
>
> >
> > >
> > > > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > > > slightly more informed guess about suitable stack sizes than the kernel
> > > > > (and glibc already has to guess anyway, in order to determine the
> > > > > default thread stack size).
> > > >
> > > > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > > > available.
> > >
> > > In my code, I generate _SC_SIGSTKSZ as the equivalent of
> > >
> > > max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)
> > >
> > > which is >= the legacy value, and broadly reperesentative of the
> > > relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.
> > >
> > >
> > > What do you think?
> >
> > sysconf(_SC_MINSIGSTKSZ) should be usable ASIS for most cases.
>
> Why, though?
>
> MINSIGSTKSZ is not specified to be usable as-is for any case whatsoever.
>
>
> Software that calculates its own needs to know the actual system values,
> not estimates based on guesses about how much stack a typical program
> might need if it were recompiled for x86.
>
> This doesn't mean we can't have a generic suggested value that's suitable
> for common scenarios (like SIGSTKSZ), but if we do then I think it
> should be a separate constant.

I updated my glibc patch to add both _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
_SC_MINSIGSTKSZ is the minimum signal stack size from AT_MINSIGSTKSZ,
which is the signal frame size used by kernel, and _SC_SIGSTKSZ is the value
of sysconf (_SC_MINSIGSTKSZ) + 6KB for user application.

>
> > > > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > > > is in use.
> > > > >
> > > > > Great if we can do it. I was concerned that this might be
> > > > > controversial.
> > > > >
> > > > > Would this just be a recommendation, or can we enforce it somehow?
> > > >
> > > > It is just an idea. We need to move away from constant SIGSTKSZ and
> > > > MINSIGSTKSZ.
> > >
> > > Totally agree with that.
> > >
> >
> > With my glibc patch, -D_SC_MINSIGSTKSZ_SOURCE will fail to compile
> > if the source assumes constant SIGSTKSZ or MINSIGSTKSZ.
>
> Ah yes, I see. That's a sensible precaution.
>
> Is it accepted in general that defining different feature test macros
> can lead to ABI incompatibilities?
>
> I have thought that building a shared library with _GNU_SOURCE (say)
> doesn't mean that a program that loads that library must also be built
> with _GNU_SOURCE. For one thing, that's hard to police.
>
>
> However, there are already combinations that could break, e.g., mixing
> -D_FILE_OFFSET_BITS=64 with -D_FILE_OFFSET_BITS=32 would be broken if
> this define changes off_t.
>
>
> So, maybe having _SC_MINSIGSTKSZ_SOURCE break things in this way is an
> acceptable compromise. Interfaces that depend on the value of
> MINSIGSTKSZ or SIGSTKSZ are possible, but probably rare in practice --
> I don't know of a specific example.
>

I changed it to _SC_SIGSTKSZ_SOURCE:

https://gitlab.com/x86-glibc/glibc/-/commit/41d5e6b31025721590f12d5aa543eb0bc53ce263

#ifdef __USE_SC_SIGSTKSZ
# include <unistd.h>
/* Minimum stack size for a signal handler: sysconf (SC_SIGSTKSZ). */
# undef MINSIGSTKSZ
# define MINSIGSTKSZ sysconf (_SC_SIGSTKSZ)
/* System default stack size for a signal handler: MINSIGSTKSZ. */
# undef SIGSTKSZ
# define SIGSTKSZ MINSIGSTKSZ
#endif

Compilation will fail if the source assumes constant MINSIGSTKSZ or
SIGSTKSZ.

--
H.J.

2020-10-06 18:22:52

by Florian Weimer

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

* Dave Martin via Libc-alpha:

> On Tue, Oct 06, 2020 at 08:33:47AM -0700, Dave Hansen wrote:
>> On 10/6/20 8:25 AM, Dave Martin wrote:
>> > Or are people reporting real stack overruns on x86 today?
>>
>> We have real overruns. We have ~2800 bytes of XSAVE (regisiter) state
>> mostly from AVX-512, and a 2048 byte MINSIGSTKSZ.
>
> Right. Out of interest, do you believe that's a direct consequence of
> the larger kernel-generated signal frame, or does the expansion of
> userspace stack frames play a role too?

I must say that I do not quite understand this question.

32 64-*byte* registers simply need 2048 bytes of storage space worst
case, there is really no way around that.

> In practice software just assumes SIGSTKSZ and then ignores the problem
> until / unless an actual stack overflow is seen.
>
> There's probably a lot of software out there whose stack is
> theoretically too small even without AVX-512 etc. in the mix, especially
> when considering the possibility of nested signals...

That is certainly true. We have seen problems with ntpd, which
requested a 16 KiB stack, at a time when there were various deductions
from the stack size, and since the glibc dynamic loader also uses XSAVE,
ntpd exceeded the remaining stack space. But in this case, we just
fudged the stack size computation in pthread_create and made it less
likely that the dynamic loader was activated, which largely worked
around this particular problem. For MINSIGSTKSZ, we just don't have
this option because it's simply too small in the first place.

I don't immediately recall a bug due to SIGSTKSZ being too small. The
test cases I wrote for this were all artificial, to raise awareness of
this issue (applications treating these as recommended values, rather
than minimum value to avoid immediately sigaltstack/phtread_create
failures, same issue with PTHREAD_STACK_MIN).

Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

2020-10-06 18:33:13

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On 10/6/20 10:00 AM, Dave Martin wrote:
> On Tue, Oct 06, 2020 at 08:33:47AM -0700, Dave Hansen wrote:
>> On 10/6/20 8:25 AM, Dave Martin wrote:
>>> Or are people reporting real stack overruns on x86 today?
>> We have real overruns. We have ~2800 bytes of XSAVE (regisiter) state
>> mostly from AVX-512, and a 2048 byte MINSIGSTKSZ.
> Right. Out of interest, do you believe that's a direct consequence of
> the larger kernel-generated signal frame, or does the expansion of
> userspace stack frames play a role too?

The kernel-generated signal frame is entirely responsible for the ~2800
bytes that I'm talking about.

I'm sure there are some systems where userspace plays a role, but those
are much less of a worry at the moment, since the kernel-induced
overflows mean an instant crash that userspace has no recourse for.

2020-10-07 10:21:32

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 08:21:00PM +0200, Florian Weimer wrote:
> * Dave Martin via Libc-alpha:
>
> > On Tue, Oct 06, 2020 at 08:33:47AM -0700, Dave Hansen wrote:
> >> On 10/6/20 8:25 AM, Dave Martin wrote:
> >> > Or are people reporting real stack overruns on x86 today?
> >>
> >> We have real overruns. We have ~2800 bytes of XSAVE (regisiter) state
> >> mostly from AVX-512, and a 2048 byte MINSIGSTKSZ.
> >
> > Right. Out of interest, do you believe that's a direct consequence of
> > the larger kernel-generated signal frame, or does the expansion of
> > userspace stack frames play a role too?
>
> I must say that I do not quite understand this question.
>
> 32 64-*byte* registers simply need 2048 bytes of storage space worst
> case, there is really no way around that.

If the architecture grows more or bigger registers, and if those
registers are used in general-purpose code, then all stack frames will
tend to grow, not just the signal frame.

So a stack overflow might be caused by the larger signal frame by
itself; or it might be caused by the growth of the stack of 20 function
frames created by someone's signal handler.

In the latter case, this is just a "normal" stack overflow, and nothing
really to do with signals or SIGSTKSZ. Rebuilding with different
compiler flags could also grow the stack usage and cause just the same
problem.

I also strongly suspect that people often don't think about signal
nesting when allocating signal stacks. So, there might be a pre-
existing potential overflow that just becomes more likely when the
signal frame grows. That's not really SIGSTKSZ's fault.


Of course, AVX-512 might never be used in general-purpose code. On
AArch64, SVE can be used in general-purpose code, but it's too early to
say what its prevalence will be in signal handlers. Probably low.


> > In practice software just assumes SIGSTKSZ and then ignores the problem
> > until / unless an actual stack overflow is seen.
> >
> > There's probably a lot of software out there whose stack is
> > theoretically too small even without AVX-512 etc. in the mix, especially
> > when considering the possibility of nested signals...
>
> That is certainly true. We have seen problems with ntpd, which
> requested a 16 KiB stack, at a time when there were various deductions
> from the stack size, and since the glibc dynamic loader also uses XSAVE,
> ntpd exceeded the remaining stack space. But in this case, we just
> fudged the stack size computation in pthread_create and made it less
> likely that the dynamic loader was activated, which largely worked
> around this particular problem. For MINSIGSTKSZ, we just don't have
> this option because it's simply too small in the first place.
>
> I don't immediately recall a bug due to SIGSTKSZ being too small. The
> test cases I wrote for this were all artificial, to raise awareness of
> this issue (applications treating these as recommended values, rather
> than minimum value to avoid immediately sigaltstack/phtread_create
> failures, same issue with PTHREAD_STACK_MIN).

Ack, I think if SIGSTKSZ was too small significantly often, there would
be more awareness of the issue.

Cheers
---Dave

2020-10-07 10:24:46

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 11:30:42AM -0700, Dave Hansen wrote:
> On 10/6/20 10:00 AM, Dave Martin wrote:
> > On Tue, Oct 06, 2020 at 08:33:47AM -0700, Dave Hansen wrote:
> >> On 10/6/20 8:25 AM, Dave Martin wrote:
> >>> Or are people reporting real stack overruns on x86 today?
> >> We have real overruns. We have ~2800 bytes of XSAVE (regisiter) state
> >> mostly from AVX-512, and a 2048 byte MINSIGSTKSZ.
> > Right. Out of interest, do you believe that's a direct consequence of
> > the larger kernel-generated signal frame, or does the expansion of
> > userspace stack frames play a role too?
>
> The kernel-generated signal frame is entirely responsible for the ~2800
> bytes that I'm talking about.
>
> I'm sure there are some systems where userspace plays a role, but those
> are much less of a worry at the moment, since the kernel-induced
> overflows mean an instant crash that userspace has no recourse for.

Ack, that sounds pretty convincing.

Cheers
---Dave

2020-10-07 11:08:04

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Tue, Oct 06, 2020 at 10:44:14AM -0700, H.J. Lu wrote:
> On Tue, Oct 6, 2020 at 9:55 AM Dave Martin <[email protected]> wrote:
> >
> > On Tue, Oct 06, 2020 at 08:34:06AM -0700, H.J. Lu wrote:
> > > On Tue, Oct 6, 2020 at 8:25 AM Dave Martin <[email protected]> wrote:
> > > >
> > > > On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> > > > > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > > > > which has grown over time as new features and larger registers have been
> > > > > > > > > added to the architecture.
> > > > > > > > >
> > > > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > > > > the user stack, [2][3].
> > > > > > > > >
> > > > > > > > > However, this constant is much too small and does not reflect recent
> > > > > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > > > > >
> > > > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > > > > cause user stack overflow when delivering a signal.
> > > > > > > > >
> > > > > > > > > In this series, we suggest a couple of things:
> > > > > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > > > > approach to [5]
> > > > > > > > > 2. Avoid using a too-small alternate stack
> > > > > > > >
> > > > > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > > > > series does seem consistent with the way arm64 populates
> > > > > > > > AT_MINSIGSTKSZ.
> > > > > > > >
> > > > > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > > > > this...
> > > > > > >
> > > > > > > Here is my proposal for glibc:
> > > > > > >
> > > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > > > > >
> > > > > > Thanks for the link.
> > > > > >
> > > > > > Are there patches yet? I already had some hacks in the works, but I can
> > > > > > drop them if there's something already out there.
> > > > >
> > > > > I am working on it.
> > > >
> > > > OK. I may post something for discussion, but I'm happy for it to be
> > > > superseded by someone (i.e., other than me) who actually knows what
> > > > they're doing...
> > >
> > > Please see my previous email for my glibc patch:
> > >
> > > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
> > >
> > > > > >
> > > > > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > > > > >
> > > > > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > > > > buffer overruns.
> > > > > >
> > > > > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > > > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > > > > definitions) was that userspace binaries will have baked in the old
> > > > > > value of the constant and may be making assumptions about it.
> > > > > >
> > > > > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > > > > changes. This could be a problem if an newly built library tries to
> > > > > > memcpy() or dump such an object defined by and old binary.
> > > > > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > > > > and makecontext() could similarly go wrong.
> > > > >
> > > > > With my original proposal:
> > > > >
> > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> > > > >
> > > > > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > > > > constants:
> > > > >
> > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> > > >
> > > > Ah, I see. But both still API and ABI breaks; moreover, declaraing an
> > > > array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
> > > > obvious thing to do with this constant in many simple cases. Such usage
> > > > is widespread, see:
> > > >
> > > > * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1
> > > >
> > > >
> > > > Your two approaches seem to trade off two different sources of buffer
> > > > overruns: undersized stacks versus ABI breaks across library boundaries.
> > >
> > > We can't get everything we want.
> > >
> > > > Since undersized stack is by far the more familiar problem and we at
> > > > least have guard regions to help detect overruns, I'd vote to keep
> > > > MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.
> > >
> > > Agree.
> > >
> > > > Or are people reporting real stack overruns on x86 today?
> > >
> > > I hope so.
> > >
> > > >
> > > > For arm64, we made large vectors on SVE opt-in, so that oversized signal
> > > > frames are not seen by default. Would somethine similar be feasible on
> > > > x86?
> > > >
> > > >
> > > > > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > > > > >
> > > > > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > > > > discovery method is changing. The meaning of the value is exactly the
> > > > > > same as before.
> > > > > >
> > > > > > If we are going to rename it though, it could make sense to go for
> > > > > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > > > > >
> > > > > > The trouble with including "STKSZ" is that is sounds like a
> > > > > > recommendation for your stack size. While the signal frame size is
> > > > > > relevant to picking a stack size, it's not the only thing to
> > > > > > consider.
> > > > >
> > > > > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > > > > kernel. The minimum stack size for a signal handler is more likely
> > > > > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > > > > frame size used by kernel + 6KB for user application.
> > > >
> > > > Ack; to be correct, you also need to take into account which signals may
> > > > be unmasked while running on this stack, and the stack requirements of
> > > > all their handlers. Unfortunately, that's hard :(
> > > >
> > > > What's your view on my naming suggesions?
> > >
> > > I used _SC_MINSIGSTKSZ:
> > >
> > > https://gitlab.com/x86-glibc/glibc/-/commit/73ca53bfbc1c105bc579f55f15af011a07fcded9
> >
> > Apologies, I missed that.
> >
> > Otherwise, the changes look much as I would expect, except for the
> > "6K for user program" thing. This is strictly not included in the
> > legacy MINSIGSTKSZ.
> >
> > >
> > > >
> > > > > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > > > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > > > > slightly more informed guess about suitable stack sizes than the kernel
> > > > > > (and glibc already has to guess anyway, in order to determine the
> > > > > > default thread stack size).
> > > > >
> > > > > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > > > > available.
> > > >
> > > > In my code, I generate _SC_SIGSTKSZ as the equivalent of
> > > >
> > > > max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)
> > > >
> > > > which is >= the legacy value, and broadly reperesentative of the
> > > > relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.
> > > >
> > > >
> > > > What do you think?
> > >
> > > sysconf(_SC_MINSIGSTKSZ) should be usable ASIS for most cases.
> >
> > Why, though?
> >
> > MINSIGSTKSZ is not specified to be usable as-is for any case whatsoever.
> >
> >
> > Software that calculates its own needs to know the actual system values,
> > not estimates based on guesses about how much stack a typical program
> > might need if it were recompiled for x86.
> >
> > This doesn't mean we can't have a generic suggested value that's suitable
> > for common scenarios (like SIGSTKSZ), but if we do then I think it
> > should be a separate constant.
>
> I updated my glibc patch to add both _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
> _SC_MINSIGSTKSZ is the minimum signal stack size from AT_MINSIGSTKSZ,
> which is the signal frame size used by kernel, and _SC_SIGSTKSZ is the value
> of sysconf (_SC_MINSIGSTKSZ) + 6KB for user application.

Can I suggest sysconf (_SC_MINSIGSTKSZ) * 4 instead?

If the arch has more or bigger registers to save in the signal frame,
the chances are that they're going to get saved in some userspace stack
frames too. So I suspect that the user signal handler stack usage may
scale up to some extent rather than being a constant.


To help people migrate without unpleasant surprises, I also figured it
would be a good idea to make sure that sysconf (_SC_MINSIGSTKSZ) >=
legacy MINSIGSTKSZ, and sysconf (_SC_SIGSTKSZ) >= legacy SIGSTKSZ.
This should makes it safer to use sysconf (_SC_MINSIGSTKSZ) as a
drop-in replacement for MINSIGSTKSZ, etc.

(To explain: AT_MINSIGSTKSZ may actually be < MINSIGSTKSZ on AArch64.
My idea was that sysconf () should hide this surprise, but people who
really want to know the kernel value would tolerate some
nonportability and read AT_MINSIGSTKSZ directly.)


So then:

kernel_minsigstksz = getauxval(AT_MINSIGSTKSZ);
minsigstksz = LEGACY_MINSIGSTKSZ;
if (kernel_minsigstksz > minsigstksz)
minsistksz = kernel_minsigstksz;

sigstksz = LEGACY_SIGSTKSZ;
if (minsigstksz * 4 > sigstksz)
sigstksz = minsigstksz * 4;


(Or something like that, unless the architecture provides its own
definitions. ia64's MINSIGSTKSZ is enormous, so it probably doesn't
want this.)


Also: should all these values be rounded up to a multiple of the
architecture's preferred stack alignment?

Should the preferred stack alignment also be exposed through sysconf?
Portable code otherwise has no way to know this, though if the
preferred alignment is <= the minimum malloc()/alloca() alignment then
this is generally not an issue.)


> >
> > > > > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > > > > is in use.
> > > > > >
> > > > > > Great if we can do it. I was concerned that this might be
> > > > > > controversial.
> > > > > >
> > > > > > Would this just be a recommendation, or can we enforce it somehow?
> > > > >
> > > > > It is just an idea. We need to move away from constant SIGSTKSZ and
> > > > > MINSIGSTKSZ.
> > > >
> > > > Totally agree with that.
> > > >
> > >
> > > With my glibc patch, -D_SC_MINSIGSTKSZ_SOURCE will fail to compile
> > > if the source assumes constant SIGSTKSZ or MINSIGSTKSZ.
> >
> > Ah yes, I see. That's a sensible precaution.
> >
> > Is it accepted in general that defining different feature test macros
> > can lead to ABI incompatibilities?
> >
> > I have thought that building a shared library with _GNU_SOURCE (say)
> > doesn't mean that a program that loads that library must also be built
> > with _GNU_SOURCE. For one thing, that's hard to police.
> >
> >
> > However, there are already combinations that could break, e.g., mixing
> > -D_FILE_OFFSET_BITS=64 with -D_FILE_OFFSET_BITS=32 would be broken if
> > this define changes off_t.
> >
> >
> > So, maybe having _SC_MINSIGSTKSZ_SOURCE break things in this way is an
> > acceptable compromise. Interfaces that depend on the value of
> > MINSIGSTKSZ or SIGSTKSZ are possible, but probably rare in practice --
> > I don't know of a specific example.
> >
>
> I changed it to _SC_SIGSTKSZ_SOURCE:
>
> https://gitlab.com/x86-glibc/glibc/-/commit/41d5e6b31025721590f12d5aa543eb0bc53ce263
>
> #ifdef __USE_SC_SIGSTKSZ
> # include <unistd.h>
> /* Minimum stack size for a signal handler: sysconf (SC_SIGSTKSZ). */
> # undef MINSIGSTKSZ
> # define MINSIGSTKSZ sysconf (_SC_SIGSTKSZ)
> /* System default stack size for a signal handler: MINSIGSTKSZ. */
> # undef SIGSTKSZ
> # define SIGSTKSZ MINSIGSTKSZ
> #endif
>
> Compilation will fail if the source assumes constant MINSIGSTKSZ or
> SIGSTKSZ.

I don't understand all the glibc-fu, bit it looks reasonable overall
(notwithstanding my comments above).

Cheers
---Dave

2020-10-07 13:34:57

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Wed, Oct 7, 2020 at 3:47 AM Dave Martin <[email protected]> wrote:
>
> On Tue, Oct 06, 2020 at 10:44:14AM -0700, H.J. Lu wrote:
> > On Tue, Oct 6, 2020 at 9:55 AM Dave Martin <[email protected]> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 08:34:06AM -0700, H.J. Lu wrote:
> > > > On Tue, Oct 6, 2020 at 8:25 AM Dave Martin <[email protected]> wrote:
> > > > >
> > > > > On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> > > > > > On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > > > > > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > > > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > > > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > > > > > > which has grown over time as new features and larger registers have been
> > > > > > > > > > added to the architecture.
> > > > > > > > > >
> > > > > > > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > > > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > > > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > > > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > > > > > > the user stack, [2][3].
> > > > > > > > > >
> > > > > > > > > > However, this constant is much too small and does not reflect recent
> > > > > > > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > > > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > > > > > > >
> > > > > > > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > > > > > > cause user stack overflow when delivering a signal.
> > > > > > > > > >
> > > > > > > > > > In this series, we suggest a couple of things:
> > > > > > > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > > > > > > > approach to [5]
> > > > > > > > > > 2. Avoid using a too-small alternate stack
> > > > > > > > >
> > > > > > > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > > > > > > series does seem consistent with the way arm64 populates
> > > > > > > > > AT_MINSIGSTKSZ.
> > > > > > > > >
> > > > > > > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > > > > > > this...
> > > > > > > >
> > > > > > > > Here is my proposal for glibc:
> > > > > > > >
> > > > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> > > > > > >
> > > > > > > Thanks for the link.
> > > > > > >
> > > > > > > Are there patches yet? I already had some hacks in the works, but I can
> > > > > > > drop them if there's something already out there.
> > > > > >
> > > > > > I am working on it.
> > > > >
> > > > > OK. I may post something for discussion, but I'm happy for it to be
> > > > > superseded by someone (i.e., other than me) who actually knows what
> > > > > they're doing...
> > > >
> > > > Please see my previous email for my glibc patch:
> > > >
> > > > https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/AT_MINSIGSTKSZ
> > > >
> > > > > > >
> > > > > > > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> > > > > > >
> > > > > > > Can we do this? IIUC, this is an ABI break and carries the risk of
> > > > > > > buffer overruns.
> > > > > > >
> > > > > > > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > > > > > > (apart from the fact that it is rarely used, due to glibc's shadowing
> > > > > > > definitions) was that userspace binaries will have baked in the old
> > > > > > > value of the constant and may be making assumptions about it.
> > > > > > >
> > > > > > > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > > > > > > changes. This could be a problem if an newly built library tries to
> > > > > > > memcpy() or dump such an object defined by and old binary.
> > > > > > > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > > > > > > and makecontext() could similarly go wrong.
> > > > > >
> > > > > > With my original proposal:
> > > > > >
> > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> > > > > >
> > > > > > char [MINSIGSTKSZ] won't compile. The feedback is to increase the
> > > > > > constants:
> > > > > >
> > > > > > https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html
> > > > >
> > > > > Ah, I see. But both still API and ABI breaks; moreover, declaraing an
> > > > > array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
> > > > > obvious thing to do with this constant in many simple cases. Such usage
> > > > > is widespread, see:
> > > > >
> > > > > * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1
> > > > >
> > > > >
> > > > > Your two approaches seem to trade off two different sources of buffer
> > > > > overruns: undersized stacks versus ABI breaks across library boundaries.
> > > >
> > > > We can't get everything we want.
> > > >
> > > > > Since undersized stack is by far the more familiar problem and we at
> > > > > least have guard regions to help detect overruns, I'd vote to keep
> > > > > MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.
> > > >
> > > > Agree.
> > > >
> > > > > Or are people reporting real stack overruns on x86 today?
> > > >
> > > > I hope so.
> > > >
> > > > >
> > > > > For arm64, we made large vectors on SVE opt-in, so that oversized signal
> > > > > frames are not seen by default. Would somethine similar be feasible on
> > > > > x86?
> > > > >
> > > > >
> > > > > > > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> > > > > > >
> > > > > > > How about "_SC_MINSIGSTKSZ"? This was my initial choice since only the
> > > > > > > discovery method is changing. The meaning of the value is exactly the
> > > > > > > same as before.
> > > > > > >
> > > > > > > If we are going to rename it though, it could make sense to go for
> > > > > > > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> > > > > > >
> > > > > > > The trouble with including "STKSZ" is that is sounds like a
> > > > > > > recommendation for your stack size. While the signal frame size is
> > > > > > > relevant to picking a stack size, it's not the only thing to
> > > > > > > consider.
> > > > > >
> > > > > > The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> > > > > > kernel. The minimum stack size for a signal handler is more likely
> > > > > > AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> > > > > > frame size used by kernel + 6KB for user application.
> > > > >
> > > > > Ack; to be correct, you also need to take into account which signals may
> > > > > be unmasked while running on this stack, and the stack requirements of
> > > > > all their handlers. Unfortunately, that's hard :(
> > > > >
> > > > > What's your view on my naming suggesions?
> > > >
> > > > I used _SC_MINSIGSTKSZ:
> > > >
> > > > https://gitlab.com/x86-glibc/glibc/-/commit/73ca53bfbc1c105bc579f55f15af011a07fcded9
> > >
> > > Apologies, I missed that.
> > >
> > > Otherwise, the changes look much as I would expect, except for the
> > > "6K for user program" thing. This is strictly not included in the
> > > legacy MINSIGSTKSZ.
> > >
> > > >
> > > > >
> > > > > > > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > > > > > > of a "recommended stack size" be abandoned? glibc can at least make a
> > > > > > > slightly more informed guess about suitable stack sizes than the kernel
> > > > > > > (and glibc already has to guess anyway, in order to determine the
> > > > > > > default thread stack size).
> > > > > >
> > > > > > Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> > > > > > available.
> > > > >
> > > > > In my code, I generate _SC_SIGSTKSZ as the equivalent of
> > > > >
> > > > > max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)
> > > > >
> > > > > which is >= the legacy value, and broadly reperesentative of the
> > > > > relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.
> > > > >
> > > > >
> > > > > What do you think?
> > > >
> > > > sysconf(_SC_MINSIGSTKSZ) should be usable ASIS for most cases.
> > >
> > > Why, though?
> > >
> > > MINSIGSTKSZ is not specified to be usable as-is for any case whatsoever.
> > >
> > >
> > > Software that calculates its own needs to know the actual system values,
> > > not estimates based on guesses about how much stack a typical program
> > > might need if it were recompiled for x86.
> > >
> > > This doesn't mean we can't have a generic suggested value that's suitable
> > > for common scenarios (like SIGSTKSZ), but if we do then I think it
> > > should be a separate constant.
> >
> > I updated my glibc patch to add both _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
> > _SC_MINSIGSTKSZ is the minimum signal stack size from AT_MINSIGSTKSZ,
> > which is the signal frame size used by kernel, and _SC_SIGSTKSZ is the value
> > of sysconf (_SC_MINSIGSTKSZ) + 6KB for user application.
>
> Can I suggest sysconf (_SC_MINSIGSTKSZ) * 4 instead?

Done.

> If the arch has more or bigger registers to save in the signal frame,
> the chances are that they're going to get saved in some userspace stack
> frames too. So I suspect that the user signal handler stack usage may
> scale up to some extent rather than being a constant.
>
>
> To help people migrate without unpleasant surprises, I also figured it
> would be a good idea to make sure that sysconf (_SC_MINSIGSTKSZ) >=
> legacy MINSIGSTKSZ, and sysconf (_SC_SIGSTKSZ) >= legacy SIGSTKSZ.
> This should makes it safer to use sysconf (_SC_MINSIGSTKSZ) as a
> drop-in replacement for MINSIGSTKSZ, etc.
>
> (To explain: AT_MINSIGSTKSZ may actually be < MINSIGSTKSZ on AArch64.
> My idea was that sysconf () should hide this surprise, but people who
> really want to know the kernel value would tolerate some
> nonportability and read AT_MINSIGSTKSZ directly.)
>
>
> So then:
>
> kernel_minsigstksz = getauxval(AT_MINSIGSTKSZ);
> minsigstksz = LEGACY_MINSIGSTKSZ;
> if (kernel_minsigstksz > minsigstksz)
> minsistksz = kernel_minsigstksz;
>
> sigstksz = LEGACY_SIGSTKSZ;
> if (minsigstksz * 4 > sigstksz)
> sigstksz = minsigstksz * 4;

I updated users/hjl/AT_MINSIGSTKSZ branch with

+@item _SC_MINSIGSTKSZ
+@standards{GNU, unistd.h}
+Inquire about the signal stack size used by the kernel.
+
+@item _SC_SIGSTKSZ
+@standards{GNU, unistd.h}
+Inquire about the default signal stack size for a signal handler.

case _SC_MINSIGSTKSZ:
assert (GLRO(dl_minsigstacksize) != 0);
return GLRO(dl_minsigstacksize);

case _SC_SIGSTKSZ:
{
/* Return MAX (MINSIGSTKSZ, sysconf (_SC_MINSIGSTKSZ)) * 4. */
long int minsigstacksize = GLRO(dl_minsigstacksize);
_Static_assert (__builtin_constant_p (MINSIGSTKSZ),
"MINSIGSTKSZ is constant");
if (minsigstacksize < MINSIGSTKSZ)
minsigstacksize = MINSIGSTKSZ;
return minsigstacksize * 4;
}

>
> (Or something like that, unless the architecture provides its own
> definitions. ia64's MINSIGSTKSZ is enormous, so it probably doesn't
> want this.)
>
>
> Also: should all these values be rounded up to a multiple of the
> architecture's preferred stack alignment?

Kernel should provide a properly aligned AT_MINSIGSTKSZ.

> Should the preferred stack alignment also be exposed through sysconf?
> Portable code otherwise has no way to know this, though if the
> preferred alignment is <= the minimum malloc()/alloca() alignment then
> this is generally not an issue.)

Good question. But it is orthogonal to the signal stack size issue.

>
> > >
> > > > > > > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > > > > > > is in use.
> > > > > > >
> > > > > > > Great if we can do it. I was concerned that this might be
> > > > > > > controversial.
> > > > > > >
> > > > > > > Would this just be a recommendation, or can we enforce it somehow?
> > > > > >
> > > > > > It is just an idea. We need to move away from constant SIGSTKSZ and
> > > > > > MINSIGSTKSZ.
> > > > >
> > > > > Totally agree with that.
> > > > >
> > > >
> > > > With my glibc patch, -D_SC_MINSIGSTKSZ_SOURCE will fail to compile
> > > > if the source assumes constant SIGSTKSZ or MINSIGSTKSZ.
> > >
> > > Ah yes, I see. That's a sensible precaution.
> > >
> > > Is it accepted in general that defining different feature test macros
> > > can lead to ABI incompatibilities?
> > >
> > > I have thought that building a shared library with _GNU_SOURCE (say)
> > > doesn't mean that a program that loads that library must also be built
> > > with _GNU_SOURCE. For one thing, that's hard to police.
> > >
> > >
> > > However, there are already combinations that could break, e.g., mixing
> > > -D_FILE_OFFSET_BITS=64 with -D_FILE_OFFSET_BITS=32 would be broken if
> > > this define changes off_t.
> > >
> > >
> > > So, maybe having _SC_MINSIGSTKSZ_SOURCE break things in this way is an
> > > acceptable compromise. Interfaces that depend on the value of
> > > MINSIGSTKSZ or SIGSTKSZ are possible, but probably rare in practice --
> > > I don't know of a specific example.
> > >
> >
> > I changed it to _SC_SIGSTKSZ_SOURCE:
> >
> > https://gitlab.com/x86-glibc/glibc/-/commit/41d5e6b31025721590f12d5aa543eb0bc53ce263
> >
> > #ifdef __USE_SC_SIGSTKSZ
> > # include <unistd.h>
> > /* Minimum stack size for a signal handler: sysconf (SC_SIGSTKSZ). */
> > # undef MINSIGSTKSZ
> > # define MINSIGSTKSZ sysconf (_SC_SIGSTKSZ)
> > /* System default stack size for a signal handler: MINSIGSTKSZ. */
> > # undef SIGSTKSZ
> > # define SIGSTKSZ MINSIGSTKSZ
> > #endif
> >
> > Compilation will fail if the source assumes constant MINSIGSTKSZ or
> > SIGSTKSZ.
>
> I don't understand all the glibc-fu, bit it looks reasonable overall
> (notwithstanding my comments above).
>
> Cheers
> ---Dave



--
H.J.

2020-10-07 16:19:34

by Dave Martin

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Wed, Oct 07, 2020 at 06:30:03AM -0700, H.J. Lu wrote:
> On Wed, Oct 7, 2020 at 3:47 AM Dave Martin <[email protected]> wrote:
> >
> > On Tue, Oct 06, 2020 at 10:44:14AM -0700, H.J. Lu wrote:

[...]

> > > I updated my glibc patch to add both _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
> > > _SC_MINSIGSTKSZ is the minimum signal stack size from AT_MINSIGSTKSZ,
> > > which is the signal frame size used by kernel, and _SC_SIGSTKSZ is the value
> > > of sysconf (_SC_MINSIGSTKSZ) + 6KB for user application.
> >
> > Can I suggest sysconf (_SC_MINSIGSTKSZ) * 4 instead?
>
> Done.

OK. I was prepared to have to argue my case a bit more, but if you
think this is OK then I will stop arguing ;)


> > If the arch has more or bigger registers to save in the signal frame,
> > the chances are that they're going to get saved in some userspace stack
> > frames too. So I suspect that the user signal handler stack usage may
> > scale up to some extent rather than being a constant.
> >
> >
> > To help people migrate without unpleasant surprises, I also figured it
> > would be a good idea to make sure that sysconf (_SC_MINSIGSTKSZ) >=
> > legacy MINSIGSTKSZ, and sysconf (_SC_SIGSTKSZ) >= legacy SIGSTKSZ.
> > This should makes it safer to use sysconf (_SC_MINSIGSTKSZ) as a
> > drop-in replacement for MINSIGSTKSZ, etc.
> >
> > (To explain: AT_MINSIGSTKSZ may actually be < MINSIGSTKSZ on AArch64.
> > My idea was that sysconf () should hide this surprise, but people who
> > really want to know the kernel value would tolerate some
> > nonportability and read AT_MINSIGSTKSZ directly.)
> >
> >
> > So then:
> >
> > kernel_minsigstksz = getauxval(AT_MINSIGSTKSZ);
> > minsigstksz = LEGACY_MINSIGSTKSZ;
> > if (kernel_minsigstksz > minsigstksz)
> > minsistksz = kernel_minsigstksz;
> >
> > sigstksz = LEGACY_SIGSTKSZ;
> > if (minsigstksz * 4 > sigstksz)
> > sigstksz = minsigstksz * 4;
>
> I updated users/hjl/AT_MINSIGSTKSZ branch with
>
> +@item _SC_MINSIGSTKSZ
> +@standards{GNU, unistd.h}

Can we specify these more agressively now?

> +Inquire about the signal stack size used by the kernel.

I think we've already concluded that this should included all mandatory
overheads, including those imposed by the compiler and glibc?

e.g.:

--8<--

The returned value is the minimum number of bytes of free stack space
required in order to gurantee successful, non-nested handling of a
single signal whose handler is an empty function.

-->8--

> +
> +@item _SC_SIGSTKSZ
> +@standards{GNU, unistd.h}
> +Inquire about the default signal stack size for a signal handler.

Similarly:

--8<--

The returned value is the suggested minimum number of bytes of stack
space required for a signal stack.

This is not guaranteed to be enough for any specific purpose other than
the invocation of a single, non-nested, empty handler, but nonetheless
should be enough for basic scenarios involving simple signal handlers
and very low levels of signal nesting (say, 2 or 3 levels at the very
most).

This value is provided for developer convenience and to ease migration
from the legacy SIGSTKSZ constant. Programs requiring stronger
guarantees should avoid using it if at all possible.

-->8--


If these descriptions are too wordy, we might want to move some of it
out to signal.texi, though.

>
> case _SC_MINSIGSTKSZ:
> assert (GLRO(dl_minsigstacksize) != 0);
> return GLRO(dl_minsigstacksize);
>
> case _SC_SIGSTKSZ:
> {
> /* Return MAX (MINSIGSTKSZ, sysconf (_SC_MINSIGSTKSZ)) * 4. */
> long int minsigstacksize = GLRO(dl_minsigstacksize);
> _Static_assert (__builtin_constant_p (MINSIGSTKSZ),
> "MINSIGSTKSZ is constant");
> if (minsigstacksize < MINSIGSTKSZ)
> minsigstacksize = MINSIGSTKSZ;
> return minsigstacksize * 4;
> }
>
> >
> > (Or something like that, unless the architecture provides its own
> > definitions. ia64's MINSIGSTKSZ is enormous, so it probably doesn't
> > want this.)
> >
> >
> > Also: should all these values be rounded up to a multiple of the
> > architecture's preferred stack alignment?
>
> Kernel should provide a properly aligned AT_MINSIGSTKSZ.

OK. Can you comment on Chang S. Bae's series? I wasn't sure that the
proposal produces an aligned value for AT_MINSIGSTKSZ on x86, but maybe
I just worked it out wrong.


> > Should the preferred stack alignment also be exposed through sysconf?
> > Portable code otherwise has no way to know this, though if the
> > preferred alignment is <= the minimum malloc()/alloca() alignment then
> > this is generally not an issue.)
>
> Good question. But it is orthogonal to the signal stack size issue.

Ack.

There are various other brokennesses around this area, such as the fact
that the register data may now run off the end of the mcontext_t object
that is supposed to contain it. Ideally we should fork ucontext_t or
mcontext_t into two types, one for the ucontext API and one for the
signal API (which are anyway highly incompatible with each other).

If this stuff is addressed, I guess we could bundle it under a more
general feature test macro. But it's probably best to nail down this
series in something like its current form first.


I'll follow up on libc-alpha with a wishlist, but that will be a
separate conversation...

[...]

Cheers
---Dave

2020-10-07 18:18:33

by H.J. Lu

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size

On Wed, Oct 7, 2020 at 8:46 AM Dave Martin <[email protected]> wrote:
>
> On Wed, Oct 07, 2020 at 06:30:03AM -0700, H.J. Lu wrote:
> > On Wed, Oct 7, 2020 at 3:47 AM Dave Martin <[email protected]> wrote:
> > >
> > > On Tue, Oct 06, 2020 at 10:44:14AM -0700, H.J. Lu wrote:
>
> [...]
>
> > > > I updated my glibc patch to add both _SC_MINSIGSTKSZ and _SC_SIGSTKSZ.
> > > > _SC_MINSIGSTKSZ is the minimum signal stack size from AT_MINSIGSTKSZ,
> > > > which is the signal frame size used by kernel, and _SC_SIGSTKSZ is the value
> > > > of sysconf (_SC_MINSIGSTKSZ) + 6KB for user application.
> > >
> > > Can I suggest sysconf (_SC_MINSIGSTKSZ) * 4 instead?
> >
> > Done.
>
> OK. I was prepared to have to argue my case a bit more, but if you
> think this is OK then I will stop arguing ;)
>
>
> > > If the arch has more or bigger registers to save in the signal frame,
> > > the chances are that they're going to get saved in some userspace stack
> > > frames too. So I suspect that the user signal handler stack usage may
> > > scale up to some extent rather than being a constant.
> > >
> > >
> > > To help people migrate without unpleasant surprises, I also figured it
> > > would be a good idea to make sure that sysconf (_SC_MINSIGSTKSZ) >=
> > > legacy MINSIGSTKSZ, and sysconf (_SC_SIGSTKSZ) >= legacy SIGSTKSZ.
> > > This should makes it safer to use sysconf (_SC_MINSIGSTKSZ) as a
> > > drop-in replacement for MINSIGSTKSZ, etc.
> > >
> > > (To explain: AT_MINSIGSTKSZ may actually be < MINSIGSTKSZ on AArch64.
> > > My idea was that sysconf () should hide this surprise, but people who
> > > really want to know the kernel value would tolerate some
> > > nonportability and read AT_MINSIGSTKSZ directly.)
> > >
> > >
> > > So then:
> > >
> > > kernel_minsigstksz = getauxval(AT_MINSIGSTKSZ);
> > > minsigstksz = LEGACY_MINSIGSTKSZ;
> > > if (kernel_minsigstksz > minsigstksz)
> > > minsistksz = kernel_minsigstksz;
> > >
> > > sigstksz = LEGACY_SIGSTKSZ;
> > > if (minsigstksz * 4 > sigstksz)
> > > sigstksz = minsigstksz * 4;
> >
> > I updated users/hjl/AT_MINSIGSTKSZ branch with
> >
> > +@item _SC_MINSIGSTKSZ
> > +@standards{GNU, unistd.h}
>
> Can we specify these more agressively now?
>
> > +Inquire about the signal stack size used by the kernel.
>
> I think we've already concluded that this should included all mandatory
> overheads, including those imposed by the compiler and glibc?
>
> e.g.:
>
> --8<--
>
> The returned value is the minimum number of bytes of free stack space
> required in order to gurantee successful, non-nested handling of a
> single signal whose handler is an empty function.
>
> -->8--
>
> > +
> > +@item _SC_SIGSTKSZ
> > +@standards{GNU, unistd.h}
> > +Inquire about the default signal stack size for a signal handler.
>
> Similarly:
>
> --8<--
>
> The returned value is the suggested minimum number of bytes of stack
> space required for a signal stack.
>
> This is not guaranteed to be enough for any specific purpose other than
> the invocation of a single, non-nested, empty handler, but nonetheless
> should be enough for basic scenarios involving simple signal handlers
> and very low levels of signal nesting (say, 2 or 3 levels at the very
> most).
>
> This value is provided for developer convenience and to ease migration
> from the legacy SIGSTKSZ constant. Programs requiring stronger
> guarantees should avoid using it if at all possible.
>
> -->8--

Done.

>
> If these descriptions are too wordy, we might want to move some of it
> out to signal.texi, though.
>
> >
> > case _SC_MINSIGSTKSZ:
> > assert (GLRO(dl_minsigstacksize) != 0);
> > return GLRO(dl_minsigstacksize);
> >
> > case _SC_SIGSTKSZ:
> > {
> > /* Return MAX (MINSIGSTKSZ, sysconf (_SC_MINSIGSTKSZ)) * 4. */
> > long int minsigstacksize = GLRO(dl_minsigstacksize);
> > _Static_assert (__builtin_constant_p (MINSIGSTKSZ),
> > "MINSIGSTKSZ is constant");
> > if (minsigstacksize < MINSIGSTKSZ)
> > minsigstacksize = MINSIGSTKSZ;
> > return minsigstacksize * 4;
> > }
> >
> > >
> > > (Or something like that, unless the architecture provides its own
> > > definitions. ia64's MINSIGSTKSZ is enormous, so it probably doesn't
> > > want this.)
> > >
> > >
> > > Also: should all these values be rounded up to a multiple of the
> > > architecture's preferred stack alignment?
> >
> > Kernel should provide a properly aligned AT_MINSIGSTKSZ.
>
> OK. Can you comment on Chang S. Bae's series? I wasn't sure that the
> proposal produces an aligned value for AT_MINSIGSTKSZ on x86, but maybe
> I just worked it out wrong.

In Linux kernel, the signal frame data is composed of the following areas
and laid out as:
------------------------------
| alignment padding |
------------------------------
| xsave buffer | <<<<<<< This must be 64 byte aligned
------------------------------
| fsave header (32-bit only) |
------------------------------
| siginfo + ucontext |
------------------------------

In order to get the proper alignment for xsave buffer, the signal stake
size should be rounded up to 64 byte aligned. In glibc, I have

/* Size of xsave buffer. */
unsigned int sigframe_size = ebx;
if (64 bit)
/* NB: sizeof(struct rt_sigframe) in Linux kernel. */
sigframe_size += 440;
else
/* NB: sizeof(struct sigframe_ia32) + sizeof(struct fregs_state)) in
Linux kernel. */
sigframe_size += 736 + 112;
endif

/* Round up to 64-byte aligned. */
sigframe_size = ALIGN_UP (sigframe_size, 64);

GLRO(dl_minsigstacksize) = sigframe_size;

Kernel should do something similar.

>
> > > Should the preferred stack alignment also be exposed through sysconf?
> > > Portable code otherwise has no way to know this, though if the
> > > preferred alignment is <= the minimum malloc()/alloca() alignment then
> > > this is generally not an issue.)
> >
> > Good question. But it is orthogonal to the signal stack size issue.
>
> Ack.
>
> There are various other brokennesses around this area, such as the fact
> that the register data may now run off the end of the mcontext_t object
> that is supposed to contain it. Ideally we should fork ucontext_t or
> mcontext_t into two types, one for the ucontext API and one for the
> signal API (which are anyway highly incompatible with each other).
>
> If this stuff is addressed, I guess we could bundle it under a more
> general feature test macro. But it's probably best to nail down this
> series in something like its current form first.

Agreed.

>
> I'll follow up on libc-alpha with a wishlist, but that will be a
> separate conversation...
>
> [...]
>
> Cheers
> ---Dave



--
H.J.