2022-05-11 09:42:38

by Kirill A. Shutemov

[permalink] [raw]
Subject: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

Add three new arch_prctl() handles:

- ARCH_THREAD_FEATURE_ENABLE/DISABLE enables or disables the specified
features. Returns what features are enabled after the operation.

- ARCH_THREAD_FEATURE_LOCK prevents future disabling or enabling of the
specified features. Returns the new set of locked features.

The features handled per-thread and inherited over fork(2)/clone(2), but
reset on exec().

This is preparation patch. It does not impelement any features.

Signed-off-by: Kirill A. Shutemov <[email protected]>
---
arch/x86/include/asm/processor.h | 3 +++
arch/x86/include/uapi/asm/prctl.h | 5 +++++
arch/x86/kernel/process.c | 37 +++++++++++++++++++++++++++++++
3 files changed, 45 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 91d0f93a00c7..ff0c34e18cc6 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -530,6 +530,9 @@ struct thread_struct {
*/
u32 pkru;

+ unsigned long features;
+ unsigned long features_locked;
+
/* Floating point and extended processor state */
struct fpu fpu;
/*
diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h
index 500b96e71f18..67fc30d36c73 100644
--- a/arch/x86/include/uapi/asm/prctl.h
+++ b/arch/x86/include/uapi/asm/prctl.h
@@ -20,4 +20,9 @@
#define ARCH_MAP_VDSO_32 0x2002
#define ARCH_MAP_VDSO_64 0x2003

+/* Never implement 0x3001, it will confuse old glibc's */
+#define ARCH_THREAD_FEATURE_ENABLE 0x3002
+#define ARCH_THREAD_FEATURE_DISABLE 0x3003
+#define ARCH_THREAD_FEATURE_LOCK 0x3004
+
#endif /* _ASM_X86_PRCTL_H */
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index b370767f5b19..cb8fc28f2eae 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -367,6 +367,10 @@ void arch_setup_new_exec(void)
task_clear_spec_ssb_noexec(current);
speculation_ctrl_update(read_thread_flags());
}
+
+ /* Reset thread features on exec */
+ current->thread.features = 0;
+ current->thread.features_locked = 0;
}

#ifdef CONFIG_X86_IOPL_IOPERM
@@ -985,6 +989,35 @@ unsigned long __get_wchan(struct task_struct *p)
return addr;
}

+static long thread_feature_prctl(struct task_struct *task, int option,
+ unsigned long features)
+{
+ const unsigned long known_features = 0;
+
+ if (features & ~known_features)
+ return -EINVAL;
+
+ if (option == ARCH_THREAD_FEATURE_LOCK) {
+ task->thread.features_locked |= features;
+ return task->thread.features_locked;
+ }
+
+ /* Do not allow to change locked features */
+ if (features & task->thread.features_locked)
+ return -EPERM;
+
+ if (option == ARCH_THREAD_FEATURE_DISABLE) {
+ task->thread.features &= ~features;
+ goto out;
+ }
+
+ /* Handle ARCH_THREAD_FEATURE_ENABLE */
+
+ task->thread.features |= features;
+out:
+ return task->thread.features;
+}
+
long do_arch_prctl_common(struct task_struct *task, int option,
unsigned long arg2)
{
@@ -999,6 +1032,10 @@ long do_arch_prctl_common(struct task_struct *task, int option,
case ARCH_GET_XCOMP_GUEST_PERM:
case ARCH_REQ_XCOMP_GUEST_PERM:
return fpu_xstate_prctl(task, option, arg2);
+ case ARCH_THREAD_FEATURE_ENABLE:
+ case ARCH_THREAD_FEATURE_DISABLE:
+ case ARCH_THREAD_FEATURE_LOCK:
+ return thread_feature_prctl(task, option, arg2);
}

return -EINVAL;
--
2.35.1



2022-05-14 01:34:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Wed, May 11 2022 at 05:27, Kirill A. Shutemov wrote:
> Add three new arch_prctl() handles:
>
> +static long thread_feature_prctl(struct task_struct *task, int option,
> + unsigned long features)

Bah. I really hate the task pointer on all these x86 prctls. @task must
always be current, so this @task argument is just confusion.

> +{
> + const unsigned long known_features = 0;
> +
> + if (features & ~known_features)
> + return -EINVAL;

This implementation allows to task->read features_[locked] with
@features == 0. That should be documented somewhere.

> + if (option == ARCH_THREAD_FEATURE_LOCK) {
> + task->thread.features_locked |= features;
> + return task->thread.features_locked;
> + }


> + /* Do not allow to change locked features */
> + if (features & task->thread.features_locked)
> + return -EPERM;
> +
> + if (option == ARCH_THREAD_FEATURE_DISABLE) {
> + task->thread.features &= ~features;
> + goto out;
> + }
> +
> + /* Handle ARCH_THREAD_FEATURE_ENABLE */
> +
> + task->thread.features |= features;
> +out:
> + return task->thread.features;

Eyes bleed.

if (option == ARCH_THREAD_FEATURE_ENABLE)
task->thread.features |= features;
else
task->thread.features &= ~features;

return task->thread.features;

It's bloody obvious from the code that the counterpart of enable is
disable, no? So neither the goto nor the 'comment the obvious' is useful
in any way.

Thanks,

tglx




2022-05-14 02:23:57

by Alexander Potapenko

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

> +
> + /* Handle ARCH_THREAD_FEATURE_ENABLE */
> +
> + task->thread.features |= features;
> +out:
> + return task->thread.features;
Isn't arch_prctl() supposed to return 0 on success?

2022-05-14 02:34:52

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote:
> > +
> > + /* Handle ARCH_THREAD_FEATURE_ENABLE */
> > +
> > + task->thread.features |= features;
> > +out:
> > + return task->thread.features;
>
> Isn't arch_prctl() supposed to return 0 on success?

Hmm, good point. Maybe we'll need a struct to pass info in and out.

2022-05-14 04:23:17

by Kirill A. Shutemov

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P wrote:
> On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote:
> > > +
> > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */
> > > +
> > > + task->thread.features |= features;
> > > +out:
> > > + return task->thread.features;
> >
> > Isn't arch_prctl() supposed to return 0 on success?
>
> Hmm, good point. Maybe we'll need a struct to pass info in and out.

But values >0 are unused. I don't see why can't we use them.

--
Kirill A. Shutemov

2022-05-14 04:35:14

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Sat, 2022-05-14 at 02:09 +0300, Kirill A. Shutemov wrote:
> On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P wrote:
> > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote:
> > > > +
> > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */
> > > > +
> > > > + task->thread.features |= features;
> > > > +out:
> > > > + return task->thread.features;
> > >
> > > Isn't arch_prctl() supposed to return 0 on success?
> >
> > Hmm, good point. Maybe we'll need a struct to pass info in and out.
>
> But values >0 are unused. I don't see why can't we use them.

Hmm, I don't know what it would break since it is a new "code"
argument. But the man page says:
"On success, arch_prctl() returns 0; on error, -1 is returned, and
errno is set to indicate the error."

So just change the man pages?
"On success, arch_prctl() returns positive values; on error, -1 is
returned, and errno is set to indicate the error."


2022-05-14 16:01:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Fri, May 13 2022 at 23:50, Edgecombe, Rick P wrote:
> On Sat, 2022-05-14 at 02:09 +0300, Kirill A. Shutemov wrote:
>> On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P wrote:
>> > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote:
>> > > > +
>> > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */
>> > > > +
>> > > > + task->thread.features |= features;
>> > > > +out:
>> > > > + return task->thread.features;
>> > >
>> > > Isn't arch_prctl() supposed to return 0 on success?
>> >
>> > Hmm, good point. Maybe we'll need a struct to pass info in and out.
>>
>> But values >0 are unused. I don't see why can't we use them.
>
> Hmm, I don't know what it would break since it is a new "code"
> argument. But the man page says:
> "On success, arch_prctl() returns 0; on error, -1 is returned, and
> errno is set to indicate the error."
>
> So just change the man pages?
> "On success, arch_prctl() returns positive values; on error, -1 is
> returned, and errno is set to indicate the error."

Why?

prctl(GET, &out)

is the pattern used all over the place.

Thanks,

tglx

2022-05-15 10:35:04

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Sat, May 14 2022 at 23:06, Edgecombe, Rick P wrote:
> On Sat, 2022-05-14 at 10:37 +0200, Thomas Gleixner wrote:
>> > "On success, arch_prctl() returns positive values; on error, -1 is
>> > returned, and errno is set to indicate the error."
>>
>> Why?
>>
>> prctl(GET, &out)
>>
>> is the pattern used all over the place.
>
> It seems better to me, but we also need to pass something in.
>
> The idea of the "enable" operation is that userspace would pass in all
> the features that it wants in one call, and then find out back what was
> successfully enabled. So unlike the other arch_prctl()s, it wants to
> pass something in AND get a result in one arch_prctl() call. It doesn't
> need to check what is supported ahead of time. Since these enabling
> operations can fail (OOM, etc), userspace has to handle unexpected per-
> feature failure anyway. So it just blindly asks for what it wants.

I'm not convinced at all, that this wholesale enabling of independent
features makes any sense. That would require:

- that all features are strict binary of/off which is alredy not the
case with LAM due to the different mask sizes.

- that user space knows at some potentially early point of process
startup which features it needs. Some of them might be requested
later when libraries are loaded, but that would be too late for
others.

Aside of that, if you lump all these things together, what's the
semantics of this feature lump in case of failure:

Try to enable the rest when one fails, skip the rest, roll back?

Also such a feature lump results in a demultiplexing prctl() which is
code wise always inferior to dedicated controls.

> Any objections to having it write back the result in the same
> structure?

Why not.

> Otherwise, the option that used to be used here was a "status"
> arch_prctl(), which was called separately to find out what actually got
> enabled after an "enable" call. That fit with the GET/SET semantics
> already in place.
>
> I guess we could also get rid of the batch enabling idea, and just have
> one "enable" call per feature too. But then it is syscall heavy.

This is not a runtime hotpath problem. Those prctls() happen once when
the process starts, so having three which are designed for the
individual purpose instead of one ill defined is definitely the better
choice.

Premature optimization is never a good idea. Keep it simple is the right
starting point.

If it really turns out to be something which matters, then you can
provide a batch interface later on if it makes sense to do so, but see
above.

Thanks,

tglx

2022-05-15 21:18:27

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Sat, 2022-05-14 at 10:37 +0200, Thomas Gleixner wrote:
> On Fri, May 13 2022 at 23:50, Edgecombe, Rick P wrote:
> > On Sat, 2022-05-14 at 02:09 +0300, Kirill A. Shutemov wrote:
> > > On Fri, May 13, 2022 at 05:34:12PM +0000, Edgecombe, Rick P
> > > wrote:
> > > > On Fri, 2022-05-13 at 16:09 +0200, Alexander Potapenko wrote:
> > > > > > +
> > > > > > + /* Handle ARCH_THREAD_FEATURE_ENABLE */
> > > > > > +
> > > > > > + task->thread.features |= features;
> > > > > > +out:
> > > > > > + return task->thread.features;
> > > > >
> > > > > Isn't arch_prctl() supposed to return 0 on success?
> > > >
> > > > Hmm, good point. Maybe we'll need a struct to pass info in and
> > > > out.
> > >
> > > But values >0 are unused. I don't see why can't we use them.
> >
> > Hmm, I don't know what it would break since it is a new "code"
> > argument. But the man page says:
> > "On success, arch_prctl() returns 0; on error, -1 is returned, and
> > errno is set to indicate the error."
> >
> > So just change the man pages?
> > "On success, arch_prctl() returns positive values; on error, -1 is
> > returned, and errno is set to indicate the error."
>
> Why?
>
> prctl(GET, &out)
>
> is the pattern used all over the place.

It seems better to me, but we also need to pass something in.

The idea of the "enable" operation is that userspace would pass in all
the features that it wants in one call, and then find out back what was
successfully enabled. So unlike the other arch_prctl()s, it wants to
pass something in AND get a result in one arch_prctl() call. It doesn't
need to check what is supported ahead of time. Since these enabling
operations can fail (OOM, etc), userspace has to handle unexpected per-
feature failure anyway. So it just blindly asks for what it wants.

Any objections to having it write back the result in the same
structure?

Otherwise, the option that used to be used here was a "status"
arch_prctl(), which was called separately to find out what actually got
enabled after an "enable" call. That fit with the GET/SET semantics
already in place.

I guess we could also get rid of the batch enabling idea, and just have
one "enable" call per feature too. But then it is syscall heavy.

Any chance you could weigh in on this?

2022-05-16 10:23:58

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Sun, 2022-05-15 at 21:38 +0200, Thomas Gleixner wrote:
> On Sun, May 15 2022 at 18:24, Edgecombe, Rick P wrote:
> > On Sun, 2022-05-15 at 11:02 +0200, Thomas Gleixner wrote:
> > > If it really turns out to be something which matters, then you
> > > can
> > > provide a batch interface later on if it makes sense to do so,
> > > but
> > > see
> > > above.
> >
> > Thanks, sounds good to me.
> >
> > Kirill, so I guess we can just change ARCH_THREAD_FEATURE_ENABLE/
> > ARCH_THREAD_FEATURE_DISABLE to return EINVAL if more than one bit
> > is
> > set. It returns 0 on success and whatever error code on failure.
> > Userspace can do whatever rollback logic it wants. What do you
> > think?
>
> Why having this feature bit interface in the first place?

The idea was that we should not have duplicate interfaces if we can
avoid it. It of course grew out of the "elf feature bit" stuff, but we
considered splitting them after moving away from that. LAM and CET's
enabling needs seemed close enough to avoid having two interfaces.

>
> It's going to be a demultiplex mechanism with incompatible
> arguments. Just look at LAM. What's really architecture specific
> about
> it?
>
> The mechanism per se is architecture independent: pointer tagging.
>
> What's architecture specific is whether it's supported, the address
> mask
> and the enable/disable mechanism.
>
> So having e.g.
>
> prctl(POINTER_TAGGING_GET_MASK, &mask);
>
> works on all architectures which support this. Ditto
>
> prctl(POINTER_TAGGING_ENABLE, &mask);
>
> is architecture agnostic. Both need to be backed by an architecture
> specific implementation of course.
>
> This makes it future proof because new CPUs could define the mask to
> be
> bit 57-61 and use bit 62 for something else. So from a user space
> perspective the mask retrival is useful because it's obvious and
> trivial
> to use and does not need code changes when the hardware
> implementation
> provides a different mask.

The lack of ability to pass extra arguments is a good point.

>
> See?

Regarding making it arch specific or not, if the LAM interface can be
arch agnostic, then that makes sense to me. I guess some CPU features
(virtual memory, etc) are similar enough that the kernel can hide them
beyond common interfaces. Some aren't (cpuid, gs register, etc). If LAM
can be one of the former, then sharing an interface with other
architectures does seem much better.

I'm thinking CET is different enough from other similar features that
leaving it as an arch thing is probably appropriate. BTI is probably
the closest (to IBT). It uses it's own BTI specific elf header bit, and
requires special PROT on memory, unlike IBT.

>
> The thread.features bitmap could still be used as an internal storage
> for enabled features, but having this as the primary programming
> interface is cumbersome and unflexible for anything which is not
> binary
> on/off.
>
> Thanks,
>
> tglx
>
>

2022-05-16 19:43:37

by Edgecombe, Rick P

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Sun, 2022-05-15 at 11:02 +0200, Thomas Gleixner wrote:
> > Otherwise, the option that used to be used here was a "status"
> > arch_prctl(), which was called separately to find out what actually
> > got
> > enabled after an "enable" call. That fit with the GET/SET semantics
> > already in place.
> >
> > I guess we could also get rid of the batch enabling idea, and just
> > have
> > one "enable" call per feature too. But then it is syscall heavy.
>
> This is not a runtime hotpath problem. Those prctls() happen once
> when
> the process starts, so having three which are designed for the
> individual purpose instead of one ill defined is definitely the
> better
> choice.
>
> Premature optimization is never a good idea. Keep it simple is the
> right
> starting point.
>
> If it really turns out to be something which matters, then you can
> provide a batch interface later on if it makes sense to do so, but
> see
> above.

Thanks, sounds good to me.

Kirill, so I guess we can just change ARCH_THREAD_FEATURE_ENABLE/
ARCH_THREAD_FEATURE_DISABLE to return EINVAL if more than one bit is
set. It returns 0 on success and whatever error code on failure.
Userspace can do whatever rollback logic it wants. What do you think?

2022-05-17 00:07:58

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFCv2 03/10] x86: Introduce userspace API to handle per-thread features

On Sun, May 15 2022 at 18:24, Edgecombe, Rick P wrote:
> On Sun, 2022-05-15 at 11:02 +0200, Thomas Gleixner wrote:
>> If it really turns out to be something which matters, then you can
>> provide a batch interface later on if it makes sense to do so, but
>> see
>> above.
>
> Thanks, sounds good to me.
>
> Kirill, so I guess we can just change ARCH_THREAD_FEATURE_ENABLE/
> ARCH_THREAD_FEATURE_DISABLE to return EINVAL if more than one bit is
> set. It returns 0 on success and whatever error code on failure.
> Userspace can do whatever rollback logic it wants. What do you think?

Why having this feature bit interface in the first place?

It's going to be a demultiplex mechanism with incompatible
arguments. Just look at LAM. What's really architecture specific about
it?

The mechanism per se is architecture independent: pointer tagging.

What's architecture specific is whether it's supported, the address mask
and the enable/disable mechanism.

So having e.g.

prctl(POINTER_TAGGING_GET_MASK, &mask);

works on all architectures which support this. Ditto

prctl(POINTER_TAGGING_ENABLE, &mask);

is architecture agnostic. Both need to be backed by an architecture
specific implementation of course.

This makes it future proof because new CPUs could define the mask to be
bit 57-61 and use bit 62 for something else. So from a user space
perspective the mask retrival is useful because it's obvious and trivial
to use and does not need code changes when the hardware implementation
provides a different mask.

See?

The thread.features bitmap could still be used as an internal storage
for enabled features, but having this as the primary programming
interface is cumbersome and unflexible for anything which is not binary
on/off.

Thanks,

tglx