2018-07-02 22:32:51

by Mathieu Desnoyers

[permalink] [raw]
Subject: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

Change the rseq ABI so rseq_cs start_ip, post_commit_offset and abort_ip
fields are seen as 64-bit fields by both 32-bit and 64-bit kernels rather
that ignoring the 32 upper bits on 32-bit kernels. This ensures we have a
consistent behavior for a 32-bit binary executed on 32-bit kernels and in
compat mode on 64-bit kernels.

Validating the value of abort_ip field to be below TASK_SIZE ensures the
kernel don't return to an invalid address when returning to userspace
after an abort. I don't fully trust each architecture code to consistently
deal with invalid return addresses.

Validating the value of the start_ip and post_commit_offset fields
prevents overflow on arithmetic performed on those values, used to
check whether abort_ip is within the rseq critical section.

On 32-bit kernels, the rseq->rseq_cs_padding field is never read by the
kernel. However, 64-bit kernels dealing with 32-bit compat tasks read the
full 64-bit in its entirety, and terminates the offending process with
a segmentation fault if the upper 32 bits are set due to failure of
copy_from_user().

Ensure that both 32-bit and 64-bit kernels dealing with 32-bit tasks end
up terminating offending tasks with a segmentation fault if the upper
32-bit padding bits (rseq->rseq_cs_padding) are set by explicitly ensuring
that padding is zero on 32-bit kernels.

If validation fails, the process is killed with a segmentation fault.

When the signature encountered before abort_ip does not match the expected
signature, return -EINVAL rather than -EPERM to be consistent with other
input validation return codes from rseq_get_rseq_cs().

Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: "Paul E. McKenney" <[email protected]>
CC: Peter Zijlstra <[email protected]>
CC: Paul Turner <[email protected]>
CC: Thomas Gleixner <[email protected]>
CC: Andy Lutomirski <[email protected]>
CC: Andi Kleen <[email protected]>
CC: Dave Watson <[email protected]>
CC: Chris Lameter <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: "H. Peter Anvin" <[email protected]>
CC: Ben Maurer <[email protected]>
CC: Steven Rostedt <[email protected]>
CC: Josh Triplett <[email protected]>
CC: Linus Torvalds <[email protected]>
CC: Andrew Morton <[email protected]>
CC: Russell King <[email protected]>
CC: Catalin Marinas <[email protected]>
CC: Will Deacon <[email protected]>
CC: Michael Kerrisk <[email protected]>
CC: Boqun Feng <[email protected]>
CC: [email protected]
---
include/uapi/linux/rseq.h | 6 +++---
kernel/rseq.c | 39 +++++++++++++++++++++++++++++++++++----
2 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/rseq.h b/include/uapi/linux/rseq.h
index d620fa43756c..519ad6e176d1 100644
--- a/include/uapi/linux/rseq.h
+++ b/include/uapi/linux/rseq.h
@@ -52,10 +52,10 @@ struct rseq_cs {
__u32 version;
/* enum rseq_cs_flags */
__u32 flags;
- LINUX_FIELD_u32_u64(start_ip);
+ __u64 start_ip;
/* Offset from start_ip. */
- LINUX_FIELD_u32_u64(post_commit_offset);
- LINUX_FIELD_u32_u64(abort_ip);
+ __u64 post_commit_offset;
+ __u64 abort_ip;
} __attribute__((aligned(4 * sizeof(__u64))));

/*
diff --git a/kernel/rseq.c b/kernel/rseq.c
index 22b6acf1ad63..1d1dd6aa43f8 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -112,6 +112,29 @@ static int rseq_reset_rseq_cpu_id(struct task_struct *t)
return 0;
}

+#ifndef __LP64__
+/*
+ * Check that padding is zero.
+ */
+static int check_rseq_cs_padding(struct task_struct *t)
+{
+ u32 pad;
+ int ret;
+
+ ret = __get_user(pad, &t->rseq->rseq_cs_padding);
+ if (ret)
+ return ret;
+ if (pad)
+ return -EINVAL;
+ return 0;
+}
+#else
+static int check_rseq_cs_padding(struct task_struct *t)
+{
+ return 0;
+}
+#endif
+
static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
{
struct rseq_cs __user *urseq_cs;
@@ -123,6 +146,8 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
ret = __get_user(ptr, &t->rseq->rseq_cs);
if (ret)
return ret;
+ if (check_rseq_cs_padding(t))
+ return -EINVAL;
if (!ptr) {
memset(rseq_cs, 0, sizeof(*rseq_cs));
return 0;
@@ -130,14 +155,20 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
urseq_cs = (struct rseq_cs __user *)ptr;
if (copy_from_user(rseq_cs, urseq_cs, sizeof(*rseq_cs)))
return -EFAULT;
- if (rseq_cs->version > 0)
- return -EINVAL;

+ if (rseq_cs->start_ip >= TASK_SIZE ||
+ rseq_cs->start_ip + rseq_cs->post_commit_offset >= TASK_SIZE ||
+ rseq_cs->abort_ip >= TASK_SIZE ||
+ rseq_cs->version > 0)
+ return -EINVAL;
+ /* Check for overflow. */
+ if (rseq_cs->start_ip + rseq_cs->post_commit_offset < rseq_cs->start_ip)
+ return -EINVAL;
/* Ensure that abort_ip is not in the critical section. */
if (rseq_cs->abort_ip - rseq_cs->start_ip < rseq_cs->post_commit_offset)
return -EINVAL;

- usig = (u32 __user *)(rseq_cs->abort_ip - sizeof(u32));
+ usig = (u32 __user *)(unsigned long)(rseq_cs->abort_ip - sizeof(u32));
ret = get_user(sig, usig);
if (ret)
return ret;
@@ -146,7 +177,7 @@ static int rseq_get_rseq_cs(struct task_struct *t, struct rseq_cs *rseq_cs)
printk_ratelimited(KERN_WARNING
"Possible attack attempt. Unexpected rseq signature 0x%x, expecting 0x%x (pid=%d, addr=%p).\n",
sig, current->rseq_sig, current->pid, usig);
- return -EPERM;
+ return -EINVAL;
}
return 0;
}
--
2.11.0



2018-07-02 22:46:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 2, 2018 at 3:31 PM Mathieu Desnoyers
<[email protected]> wrote:
>
> Change the rseq ABI so rseq_cs start_ip, post_commit_offset and abort_ip
> fields are seen as 64-bit fields by both 32-bit and 64-bit kernels rather
> that ignoring the 32 upper bits on 32-bit kernels. This ensures we have a
> consistent behavior for a 32-bit binary executed on 32-bit kernels and in
> compat mode on 64-bit kernels.

Actually, now that I see this again, I react to:


> +static int check_rseq_cs_padding(struct task_struct *t)
> +{
> + u32 pad;
> + int ret;
> +
> + ret = __get_user(pad, &t->rseq->rseq_cs_padding);
> + if (ret)
> + return ret;
> + if (pad)
> + return -EINVAL;
> + return 0;
> +}

This is all wrong.

Just make "rseq_cs" be an __u64" too. That will clean up everything,
and user space will have a much easier time filling it in too, since
it's just one field. Instead of having to remember about the "let's
fill in padding for 32-bit cases".

Then the rseq_get_rseq_cs() will be

__u64 rseq_cs;

ret = get_user(rseq_cs, &t->rseq->rseq_cs);
if (ret)
return ret;
ptr = (void *)rseq_cs;
if (rseq_cs != (unsigned long)ptr)
return -EINVAL;

and it's all good, no #ifdef's etc needed.

Hmm?

Sorry for the bike-shedding, but this is now the last remaining user
of that LINUX_FIELD_u32_u64, so let's just get rid of it entirely, ok?

Then we can also get rid of that silly uapi/linux/types_32_64.h header
file entirely.

That would be *lovely*. Simpler code, simpler and less error-prone
interfaces, and one less specialized header file.

Linus

2018-07-02 23:01:40

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 6:45 PM, Linus Torvalds [email protected] wrote:

> On Mon, Jul 2, 2018 at 3:31 PM Mathieu Desnoyers
> <[email protected]> wrote:
>>
>> Change the rseq ABI so rseq_cs start_ip, post_commit_offset and abort_ip
>> fields are seen as 64-bit fields by both 32-bit and 64-bit kernels rather
>> that ignoring the 32 upper bits on 32-bit kernels. This ensures we have a
>> consistent behavior for a 32-bit binary executed on 32-bit kernels and in
>> compat mode on 64-bit kernels.
>
> Actually, now that I see this again, I react to:
>
>
>> +static int check_rseq_cs_padding(struct task_struct *t)
>> +{
>> + u32 pad;
>> + int ret;
>> +
>> + ret = __get_user(pad, &t->rseq->rseq_cs_padding);
>> + if (ret)
>> + return ret;
>> + if (pad)
>> + return -EINVAL;
>> + return 0;
>> +}
>
> This is all wrong.
>
> Just make "rseq_cs" be an __u64" too. That will clean up everything,
> and user space will have a much easier time filling it in too, since
> it's just one field. Instead of having to remember about the "let's
> fill in padding for 32-bit cases".
>
> Then the rseq_get_rseq_cs() will be
>
> __u64 rseq_cs;
>
> ret = get_user(rseq_cs, &t->rseq->rseq_cs);
> if (ret)
> return ret;
> ptr = (void *)rseq_cs;
> if (rseq_cs != (unsigned long)ptr)
> return -EINVAL;
>
> and it's all good, no #ifdef's etc needed.
>
> Hmm?

Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
the padding with 0, and only update the low bits with single-copy atomicity.

>
> Sorry for the bike-shedding, but this is now the last remaining user
> of that LINUX_FIELD_u32_u64, so let's just get rid of it entirely, ok?
>
> Then we can also get rid of that silly uapi/linux/types_32_64.h header
> file entirely.
>
> That would be *lovely*. Simpler code, simpler and less error-prone
> interfaces, and one less specialized header file.

We can easily switch from LINUX_FIELD_u32_u64 to __u64 for fields within
struct rseq_cs because we have no requirement on update single-copy
atomicity. However, this is not true for the rseq->rseq_cs pointer.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-02 23:07:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
<[email protected]> wrote:
>
> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
> the padding with 0, and only update the low bits with single-copy atomicity.

Well... It's actually still single-copy atomicity as a 64-bit value.

Why? Because it doesn't matter how you write the upper bits. You'll be
writing the same value to them (zero) anyway.

So who cares if the write ends up being two instructions, because the
write to the upper bits doesn't actually *do* anything.

Hmm?

Linus

2018-07-02 23:18:25

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds [email protected] wrote:

> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
> <[email protected]> wrote:
>>
>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>> the padding with 0, and only update the low bits with single-copy atomicity.
>
> Well... It's actually still single-copy atomicity as a 64-bit value.
>
> Why? Because it doesn't matter how you write the upper bits. You'll be
> writing the same value to them (zero) anyway.
>
> So who cares if the write ends up being two instructions, because the
> write to the upper bits doesn't actually *do* anything.
>
> Hmm?

Are there any kind of guarantees that a __u64 update on a 32-bit architecture
won't be torn into something daft like byte-per-byte stores when performed
from C code ?

I don't worry whether the upper bits get updated or how, but I really care
about not having store tearing of the low bits update.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-02 23:23:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 2, 2018 at 4:17 PM Mathieu Desnoyers
<[email protected]> wrote:
>
> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
> won't be torn into something daft like byte-per-byte stores when performed
> from C code ?

Guarantees? No. Not that there are any guarantees that the same won't
happen for a plain 32-bit value either.

Will compilers generate that kind of code? I guess some crazy compiler
could simply be really bad at handling 64-bit values, and just happen
to handle 32-bit values better. So in that sense a 64-bit entity is
certainly a bit riskier. But that would be a really bad compiler, I
have to say.

Linus

2018-07-02 23:24:06

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers [email protected] wrote:

> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds [email protected]
> wrote:
>
>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>> <[email protected]> wrote:
>>>
>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>>> the padding with 0, and only update the low bits with single-copy atomicity.
>>
>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>
>> Why? Because it doesn't matter how you write the upper bits. You'll be
>> writing the same value to them (zero) anyway.
>>
>> So who cares if the write ends up being two instructions, because the
>> write to the upper bits doesn't actually *do* anything.
>>
>> Hmm?
>
> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
> won't be torn into something daft like byte-per-byte stores when performed
> from C code ?
>
> I don't worry whether the upper bits get updated or how, but I really care
> about not having store tearing of the low bits update.

For the records, most updates of those low bits are done in assembly
from critical sections, for which we control exactly how the update is
performed.

However, there is one helper function in user-space that updates that value
from C through a volatile store, e.g.:

static inline void rseq_prepare_unload(void)
{
__rseq_abi.rseq_cs = 0;
}

Thanks,

Mathieu

>
> Thanks,
>
> Mathieu
>
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-02 23:27:35

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 7:22 PM, Linus Torvalds [email protected] wrote:

> On Mon, Jul 2, 2018 at 4:17 PM Mathieu Desnoyers
> <[email protected]> wrote:
>>
>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>> won't be torn into something daft like byte-per-byte stores when performed
>> from C code ?
>
> Guarantees? No. Not that there are any guarantees that the same won't
> happen for a plain 32-bit value either.
>
> Will compilers generate that kind of code? I guess some crazy compiler
> could simply be really bad at handling 64-bit values, and just happen
> to handle 32-bit values better. So in that sense a 64-bit entity is
> certainly a bit riskier. But that would be a really bad compiler, I
> have to say.

Given that the only C code updating that field is rseq_prepare_unload()
(the rest is only ever updated from assembly), we could perhaps mandate
that user-space always update it from assembly, and therefore implement
rseq_prepare_unload as an inline asm which clears rseq->rseq_cs.

Does it sound better than the LINUX_FIELD_u32_u64 macro ?

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-02 23:43:42

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs



> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <[email protected]> wrote:
>
> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers [email protected] wrote:
>
>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds [email protected]
>> wrote:
>>
>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>>> <[email protected]> wrote:
>>>>
>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>>>> the padding with 0, and only update the low bits with single-copy atomicity.
>>>
>>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>>
>>> Why? Because it doesn't matter how you write the upper bits. You'll be
>>> writing the same value to them (zero) anyway.
>>>
>>> So who cares if the write ends up being two instructions, because the
>>> write to the upper bits doesn't actually *do* anything.
>>>
>>> Hmm?
>>
>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>> won't be torn into something daft like byte-per-byte stores when performed
>> from C code ?
>>
>> I don't worry whether the upper bits get updated or how, but I really care
>> about not having store tearing of the low bits update.
>
> For the records, most updates of those low bits are done in assembly
> from critical sections, for which we control exactly how the update is
> performed.
>
> However, there is one helper function in user-space that updates that value
> from C through a volatile store, e.g.:
>
> static inline void rseq_prepare_unload(void)
> {
> __rseq_abi.rseq_cs = 0;
> }

How about making the field be:

union {
__u64 rseq_cs;
struct {
__u32 rseq_cs_low;
__u32 rseq_cs_high;
};
};

32-bit user code that cares about performance can just write to rseq_cs_low because it already knows that rseq_cs_high == 0.

The header could even supply a static inline helper write_rseq_cs() that atomically writes a pointer and just does the right thing for 64-bit, for 32-bit BE, and for 32-bit LE.

I think the union really is needed because we can’t rely on user code being built with -fno-strict-aliasing. Or the helper could use inline asm.

Anyway, the point is that we get optimal code generation (a single instruction write of the correct number of bits) without any compat magic in the kernel.

>
> Thanks,
>
> Mathieu
>
>>
>> Thanks,
>>
>> Mathieu
>>
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
>> http://www.efficios.com
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:

> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
> won't be torn into something daft like byte-per-byte stores when performed
> from C code ?
>
> I don't worry whether the upper bits get updated or how, but I really care
> about not having store tearing of the low bits update.

Platforms with 32 bit word size only guarantee atomicity of a 32 bit
write or RMV instruction.

Special instructions may exist on a platform to perform 64 bit atomic
updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
atomicity8.

So use the macros that we have to guarantee 64 bit ops and you should be
fine. See linux/arch/x86/include/asm/atomic64_32.h

2018-07-03 00:24:11

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 8:19 PM, Chris Lameter [email protected] wrote:

> On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:
>
>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>> won't be torn into something daft like byte-per-byte stores when performed
>> from C code ?
>>
>> I don't worry whether the upper bits get updated or how, but I really care
>> about not having store tearing of the low bits update.
>
> Platforms with 32 bit word size only guarantee atomicity of a 32 bit
> write or RMV instruction.
>
> Special instructions may exist on a platform to perform 64 bit atomic
> updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
> atomicity8.
>
> So use the macros that we have to guarantee 64 bit ops and you should be
> fine. See linux/arch/x86/include/asm/atomic64_32.h

We are talking about user-space here. What we need is a single instruction
atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion
is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64
on a 32-bit architecture should be considered to provide single-copy atomicity
on the low 32 bits.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:

> >
> > Platforms with 32 bit word size only guarantee atomicity of a 32 bit
> > write or RMV instruction.
> >
> > Special instructions may exist on a platform to perform 64 bit atomic
> > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
> > atomicity8.
> >
> > So use the macros that we have to guarantee 64 bit ops and you should be
> > fine. See linux/arch/x86/include/asm/atomic64_32.h
>
> We are talking about user-space here. What we need is a single instruction
> atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion
> is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64
> on a 32-bit architecture should be considered to provide single-copy atomicity
> on the low 32 bits.

Right. You would need to make this work for userspace. atomic64_32.h is a
good reference as to which instructions provide 64 bit atomicity on 32
bit platforms.


2018-07-03 01:18:35

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 8:35 PM, Chris Lameter [email protected] wrote:

> On Mon, 2 Jul 2018, Mathieu Desnoyers wrote:
>
>> >
>> > Platforms with 32 bit word size only guarantee atomicity of a 32 bit
>> > write or RMV instruction.
>> >
>> > Special instructions may exist on a platform to perform 64 bit atomic
>> > updates. We use cmpxchg64 f.e. on Intel 32 bit platforms to guarantee
>> > atomicity8.
>> >
>> > So use the macros that we have to guarantee 64 bit ops and you should be
>> > fine. See linux/arch/x86/include/asm/atomic64_32.h
>>
>> We are talking about user-space here. What we need is a single instruction
>> atomic store, similar to what WRITE_ONCE() does in the kernel. The discussion
>> is about whether doing the user-space equivalent of a WRITE_ONCE() to a u64
>> on a 32-bit architecture should be considered to provide single-copy atomicity
>> on the low 32 bits.
>
> Right. You would need to make this work for userspace. atomic64_32.h is a
> good reference as to which instructions provide 64 bit atomicity on 32
> bit platforms.

We only need to update a pointer, so we don't need 64-bit atomicity on
32-bit processes.

What we need is to ensure single-copy atomicity of the 32-bit pointer update
on the 32-bit process in a field read from the kernel as a __u64.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 01:20:42

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski [email protected] wrote:

>> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <[email protected]>
>> wrote:
>>
>> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers
>> [email protected] wrote:
>>
>>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds [email protected]
>>> wrote:
>>>
>>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>>>> <[email protected]> wrote:
>>>>>
>>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>>>>> the padding with 0, and only update the low bits with single-copy atomicity.
>>>>
>>>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>>>
>>>> Why? Because it doesn't matter how you write the upper bits. You'll be
>>>> writing the same value to them (zero) anyway.
>>>>
>>>> So who cares if the write ends up being two instructions, because the
>>>> write to the upper bits doesn't actually *do* anything.
>>>>
>>>> Hmm?
>>>
>>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>>> won't be torn into something daft like byte-per-byte stores when performed
>>> from C code ?
>>>
>>> I don't worry whether the upper bits get updated or how, but I really care
>>> about not having store tearing of the low bits update.
>>
>> For the records, most updates of those low bits are done in assembly
>> from critical sections, for which we control exactly how the update is
>> performed.
>>
>> However, there is one helper function in user-space that updates that value
>> from C through a volatile store, e.g.:
>>
>> static inline void rseq_prepare_unload(void)
>> {
>> __rseq_abi.rseq_cs = 0;
>> }
>
> How about making the field be:
>
> union {
> __u64 rseq_cs;
> struct {
> __u32 rseq_cs_low;
> __u32 rseq_cs_high;
> };
> };
>
> 32-bit user code that cares about performance can just write to rseq_cs_low
> because it already knows that rseq_cs_high == 0.
>
> The header could even supply a static inline helper write_rseq_cs() that
> atomically writes a pointer and just does the right thing for 64-bit, for
> 32-bit BE, and for 32-bit LE.
>
> I think the union really is needed because we can’t rely on user code being
> built with -fno-strict-aliasing. Or the helper could use inline asm.
>
> Anyway, the point is that we get optimal code generation (a single instruction
> write of the correct number of bits) without any compat magic in the kernel.

That works for me! Any objection from anyone else for this approach ?

Thanks,

Mathieu

>
>>
>> Thanks,
>>
>> Mathieu
>>
>>>
>>> Thanks,
>>>
>>> Mathieu
>>>
>>>
>>> --
>>> Mathieu Desnoyers
>>> EfficiOS Inc.
>>> http://www.efficios.com
>>
>> --
>> Mathieu Desnoyers
>> EfficiOS Inc.
> > http://www.efficios.com

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 02:03:11

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 9:19 PM, Mathieu Desnoyers [email protected] wrote:

> ----- On Jul 2, 2018, at 7:37 PM, Andy Lutomirski [email protected] wrote:
>
>>> On Jul 2, 2018, at 4:22 PM, Mathieu Desnoyers <[email protected]>
>>> wrote:
>>>
>>> ----- On Jul 2, 2018, at 7:16 PM, Mathieu Desnoyers
>>> [email protected] wrote:
>>>
>>>> ----- On Jul 2, 2018, at 7:06 PM, Linus Torvalds [email protected]
>>>> wrote:
>>>>
>>>>> On Mon, Jul 2, 2018 at 4:00 PM Mathieu Desnoyers
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Unfortunately, that rseq->rseq_cs field needs to be updated by user-space
>>>>>> with single-copy atomicity. Therefore, we want 32-bit user-space to initialize
>>>>>> the padding with 0, and only update the low bits with single-copy atomicity.
>>>>>
>>>>> Well... It's actually still single-copy atomicity as a 64-bit value.
>>>>>
>>>>> Why? Because it doesn't matter how you write the upper bits. You'll be
>>>>> writing the same value to them (zero) anyway.
>>>>>
>>>>> So who cares if the write ends up being two instructions, because the
>>>>> write to the upper bits doesn't actually *do* anything.
>>>>>
>>>>> Hmm?
>>>>
>>>> Are there any kind of guarantees that a __u64 update on a 32-bit architecture
>>>> won't be torn into something daft like byte-per-byte stores when performed
>>>> from C code ?
>>>>
>>>> I don't worry whether the upper bits get updated or how, but I really care
>>>> about not having store tearing of the low bits update.
>>>
>>> For the records, most updates of those low bits are done in assembly
>>> from critical sections, for which we control exactly how the update is
>>> performed.
>>>
>>> However, there is one helper function in user-space that updates that value
>>> from C through a volatile store, e.g.:
>>>
>>> static inline void rseq_prepare_unload(void)
>>> {
>>> __rseq_abi.rseq_cs = 0;
>>> }
>>
>> How about making the field be:
>>
>> union {
>> __u64 rseq_cs;
>> struct {
>> __u32 rseq_cs_low;
>> __u32 rseq_cs_high;
>> };
>> };
>>
>> 32-bit user code that cares about performance can just write to rseq_cs_low
>> because it already knows that rseq_cs_high == 0.
>>
>> The header could even supply a static inline helper write_rseq_cs() that
>> atomically writes a pointer and just does the right thing for 64-bit, for
>> 32-bit BE, and for 32-bit LE.
>>
>> I think the union really is needed because we can’t rely on user code being
>> built with -fno-strict-aliasing. Or the helper could use inline asm.
>>
>> Anyway, the point is that we get optimal code generation (a single instruction
>> write of the correct number of bits) without any compat magic in the kernel.
>
> That works for me! Any objection from anyone else for this approach ?

One thing to consider is how we will implement the load of that pointer
on the kernel side. Strictly-speaking, the rseq uapi talks about single-copy
atomicity, and does not specify _which_ thread is expected to update that
pointer. So arguably, the common case is that the current thread is updating
it, which would allow the kernel to read it piece-wise. However, nothing
prevents user-space from updating it from another thread with single-copy
atomicity.

So in order to be on the safe side, I prefer to guarantee single-copy
atomicity of the get_user() load from the kernel that reads this pointer.
This means a 32-bit kernel would have to perform two independent loads:
one for low bits, one for high bits.

So it does look like we need some __LP64__ ifdefery even with the union
trick. Therefore, I'm not convinced the union is useful at all.

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 02:20:33

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 2, 2018 at 7:01 PM Mathieu Desnoyers
<[email protected]> wrote:
>
> One thing to consider is how we will implement the load of that pointer
> on the kernel side.

Use "get_user()". It works for 64-bit objects too, and it will be
atomic in the 32-bit sub-parts on a 32-bit architecture.

Again: there is no point in trying to be atomic in the full 64 bits
(when you're running on 32-bit). The upper bits don't have to "match"
the lower bits. They just have to be zero. So doing it as two loads is
fine - the same way it's perfectly fine to do it as two stores (since
the store to the upper bits will always be zero).

Linus

2018-07-03 02:32:33

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 2, 2018, at 10:18 PM, Linus Torvalds [email protected] wrote:

> On Mon, Jul 2, 2018 at 7:01 PM Mathieu Desnoyers
> <[email protected]> wrote:
>>
>> One thing to consider is how we will implement the load of that pointer
>> on the kernel side.
>
> Use "get_user()". It works for 64-bit objects too, and it will be
> atomic in the 32-bit sub-parts on a 32-bit architecture.

Is it really ? Last time we had this discussion, not all architectures
guaranteed that reading a 64-bit integer would happen in two atomic
32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64()
macro as it stands today (rather than using a union).

>
> Again: there is no point in trying to be atomic in the full 64 bits
> (when you're running on 32-bit). The upper bits don't have to "match"
> the lower bits. They just have to be zero. So doing it as two loads is
> fine - the same way it's perfectly fine to do it as two stores (since
> the store to the upper bits will always be zero).

I'd be fine with two atomic loads, but I'd rather have a strong
confirmation about this, because last time around there were
architectures where it was not true as far as I recall.

Thanks,

Mathieu


>
> Linus

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 02:36:38

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 2, 2018 at 7:30 PM, Mathieu Desnoyers
<[email protected]> wrote:
> ----- On Jul 2, 2018, at 10:18 PM, Linus Torvalds [email protected] wrote:
>
>> On Mon, Jul 2, 2018 at 7:01 PM Mathieu Desnoyers
>> <[email protected]> wrote:
>>>
>>> One thing to consider is how we will implement the load of that pointer
>>> on the kernel side.
>>
>> Use "get_user()". It works for 64-bit objects too, and it will be
>> atomic in the 32-bit sub-parts on a 32-bit architecture.
>
> Is it really ? Last time we had this discussion, not all architectures
> guaranteed that reading a 64-bit integer would happen in two atomic
> 32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64()
> macro as it stands today (rather than using a union).
>

If you're nervous, you could do this by open-coding:

#if BITS_PER_LONG == 64
get_user(...)
#else
get_user(...);
get_user(...);
#endif

No need to make the header more complicated just for this.

2018-07-03 02:45:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 2, 2018 at 7:30 PM Mathieu Desnoyers
<[email protected]> wrote:
>
>
> Is it really ? Last time we had this discussion, not all architectures
> guaranteed that reading a 64-bit integer would happen in two atomic
> 32-bit sub-parts.

All architectures that matter do.

Please don't overdesign this, or try to make a problem out of
something that isn't a problem.

Sure, maybe some toy architecture does a 8-byte "get_user()" as a
"copy_from_user()" one byte at a time, because that's the best way to
do unaligned accesses.

But nobody will ever care about rseq on such a thing anyway. Let it go.

Linus

2018-07-03 08:16:42

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Mon, Jul 02, 2018 at 10:30:09PM -0400, Mathieu Desnoyers wrote:
> > Use "get_user()". It works for 64-bit objects too, and it will be
> > atomic in the 32-bit sub-parts on a 32-bit architecture.
>
> Is it really ? Last time we had this discussion, not all architectures
> guaranteed that reading a 64-bit integer would happen in two atomic
> 32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64()
> macro as it stands today (rather than using a union).

Just state, as a requirement for supporting rseq, that the arch
{get,put}_user(u64) on 32bit targets must be exactly 2 u32 loads/stores.

We're piece-wise enabling rseq across architectures anyway, and when the
relevant maintains do this, they can have a look at their
{get,put}_user() implementations and fix them.

If you rely on get_user(u64) working, that means microblaze is already
broken, but I suppose it already was, since their rseq enablement patch
is extremely dodgy. Michal?

2018-07-03 08:31:29

by Heiko Carstens

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 10:14:49AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 02, 2018 at 10:30:09PM -0400, Mathieu Desnoyers wrote:
> > > Use "get_user()". It works for 64-bit objects too, and it will be
> > > atomic in the 32-bit sub-parts on a 32-bit architecture.
> >
> > Is it really ? Last time we had this discussion, not all architectures
> > guaranteed that reading a 64-bit integer would happen in two atomic
> > 32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64()
> > macro as it stands today (rather than using a union).
>
> Just state, as a requirement for supporting rseq, that the arch
> {get,put}_user(u64) on 32bit targets must be exactly 2 u32 loads/stores.
>
> We're piece-wise enabling rseq across architectures anyway, and when the
> relevant maintains do this, they can have a look at their
> {get,put}_user() implementations and fix them.
>
> If you rely on get_user(u64) working, that means microblaze is already
> broken, but I suppose it already was, since their rseq enablement patch
> is extremely dodgy. Michal?

s390 uses the mvcos instruction to implement get_user(). That instruction
is not defined to be atomic, but may copy bytes piecemeal.. I had the
impression that the rseq fields are supposed to be updated within the
context of a single thread (user + kernel space).

However if another user space thread is allowed to do this as well, then
the get_user() approach won't fly on s390.

That leaves the question: does it even make sense for a thread to update
the rseq structure of a different thread?


2018-07-03 08:44:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 10:29:55AM +0200, Heiko Carstens wrote:
> On Tue, Jul 03, 2018 at 10:14:49AM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 02, 2018 at 10:30:09PM -0400, Mathieu Desnoyers wrote:
> > > > Use "get_user()". It works for 64-bit objects too, and it will be
> > > > atomic in the 32-bit sub-parts on a 32-bit architecture.
> > >
> > > Is it really ? Last time we had this discussion, not all architectures
> > > guaranteed that reading a 64-bit integer would happen in two atomic
> > > 32-bit sub-parts. This was the main motivation for the LINUX_FIELD_u32_u64()
> > > macro as it stands today (rather than using a union).
> >
> > Just state, as a requirement for supporting rseq, that the arch
> > {get,put}_user(u64) on 32bit targets must be exactly 2 u32 loads/stores.
> >
> > We're piece-wise enabling rseq across architectures anyway, and when the
> > relevant maintains do this, they can have a look at their
> > {get,put}_user() implementations and fix them.
> >
> > If you rely on get_user(u64) working, that means microblaze is already
> > broken, but I suppose it already was, since their rseq enablement patch
> > is extremely dodgy. Michal?
>
> s390 uses the mvcos instruction to implement get_user(). That instruction
> is not defined to be atomic, but may copy bytes piecemeal.. I had the
> impression that the rseq fields are supposed to be updated within the
> context of a single thread (user + kernel space).
>
> However if another user space thread is allowed to do this as well, then
> the get_user() approach won't fly on s390.
>
> That leaves the question: does it even make sense for a thread to update
> the rseq structure of a different thread?

The problem is interrupts; we need interrupts on the CPU doing the store
to observe either the old or the new value, not a mix.

If mvcos does not guarantee that, we're having problems. Is there a
reason get_user() cannot use a 'regular' load?

2018-07-03 08:57:00

by Heiko Carstens

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

> > > We're piece-wise enabling rseq across architectures anyway, and when the
> > > relevant maintains do this, they can have a look at their
> > > {get,put}_user() implementations and fix them.
> > >
> > > If you rely on get_user(u64) working, that means microblaze is already
> > > broken, but I suppose it already was, since their rseq enablement patch
> > > is extremely dodgy. Michal?
> >
> > s390 uses the mvcos instruction to implement get_user(). That instruction
> > is not defined to be atomic, but may copy bytes piecemeal.. I had the
> > impression that the rseq fields are supposed to be updated within the
> > context of a single thread (user + kernel space).
> >
> > However if another user space thread is allowed to do this as well, then
> > the get_user() approach won't fly on s390.
> >
> > That leaves the question: does it even make sense for a thread to update
> > the rseq structure of a different thread?
>
> The problem is interrupts; we need interrupts on the CPU doing the store
> to observe either the old or the new value, not a mix.
>
> If mvcos does not guarantee that, we're having problems. Is there a
> reason get_user() cannot use a 'regular' load?

Well, that's single instruction semantics. This is something we actually
can guarantee, since the mvcos instruction itself won't be interrupted and
copies all 1/2/4/8 bytes in a row.

So we are talking about that single instructions are required and not
atomic accesses?


2018-07-03 09:19:22

by Heiko Carstens

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 10:55:46AM +0200, Heiko Carstens wrote:
> > > > We're piece-wise enabling rseq across architectures anyway, and when the
> > > > relevant maintains do this, they can have a look at their
> > > > {get,put}_user() implementations and fix them.
> > > >
> > > > If you rely on get_user(u64) working, that means microblaze is already
> > > > broken, but I suppose it already was, since their rseq enablement patch
> > > > is extremely dodgy. Michal?
> > >
> > > s390 uses the mvcos instruction to implement get_user(). That instruction
> > > is not defined to be atomic, but may copy bytes piecemeal.. I had the
> > > impression that the rseq fields are supposed to be updated within the
> > > context of a single thread (user + kernel space).
> > >
> > > However if another user space thread is allowed to do this as well, then
> > > the get_user() approach won't fly on s390.
> > >
> > > That leaves the question: does it even make sense for a thread to update
> > > the rseq structure of a different thread?
> >
> > The problem is interrupts; we need interrupts on the CPU doing the store
> > to observe either the old or the new value, not a mix.
> >
> > If mvcos does not guarantee that, we're having problems. Is there a
> > reason get_user() cannot use a 'regular' load?
>
> Well, that's single instruction semantics. This is something we actually
> can guarantee, since the mvcos instruction itself won't be interrupted and
> copies all 1/2/4/8 bytes in a row.
>
> So we are talking about that single instructions are required and not
> atomic accesses?

And to answer also your question: we don't use a regular load, since we
would have to use 'sacf' construct surrounding the load instruction which
would be much slower.
We have something like that implemented for the futex atomic ops, and we
could also implement something like that for this use case
(e.g. get_user_atomic()), if really needed.


2018-07-03 09:24:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 10:55:46AM +0200, Heiko Carstens wrote:
> >
> > The problem is interrupts; we need interrupts on the CPU doing the store
> > to observe either the old or the new value, not a mix.
> >
> > If mvcos does not guarantee that, we're having problems. Is there a
> > reason get_user() cannot use a 'regular' load?
>
> Well, that's single instruction semantics. This is something we actually
> can guarantee, since the mvcos instruction itself won't be interrupted and
> copies all 1/2/4/8 bytes in a row.
>
> So we are talking about that single instructions are required and not
> atomic accesses?

rseq is strictly task local. So from that pov single-copy atomic and
single instruction semantics end up being very similar.

The most complicated scenario would be where we interrupt the task,
schedule it out, migrate it and resume execution on another CPU. In that
case the second CPU also needs to observe a 'whole' value.

But note that in that example there's a fair bit of ordering provided by
the scheduler to ensure all the state from the old CPU is observed by
the new CPU (on s390 just the rq->lock fiddling would imply a bunch of
general memory barriers).

So I think you're good... But yes, you raise an interresting point.

2018-07-03 09:26:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 11:17:17AM +0200, Heiko Carstens wrote:

> And to answer also your question: we don't use a regular load, since we
> would have to use 'sacf' construct surrounding the load instruction which
> would be much slower.
> We have something like that implemented for the futex atomic ops, and we
> could also implement something like that for this use case
> (e.g. get_user_atomic()), if really needed.

/me digs out the s390-PoO PDF and understands.. this is because of the
split user/kernel address space stuff.

Fair enough.

2018-07-03 16:42:14

by Andi Kleen

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

>
> So I think you're good... But yes, you raise an interresting point.

So it sounds like architectures that don't have an instruction atomic u64
*_user need to disable interrupts during the access, and somehow handle that
case when a page fault happens?

-Andi

2018-07-03 17:04:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 09:40:48AM -0700, Andi Kleen wrote:
> >
> > So I think you're good... But yes, you raise an interresting point.
>
> So it sounds like architectures that don't have an instruction atomic u64
> *_user need to disable interrupts during the access, and somehow handle that
> case when a page fault happens?

So for 32bit, as Linus already said, a split store is _fine_, because
the top word is always going to be 0 anyway.

So all we really need is native word sized loads / stores. s390 is just
a little weird here (it wouldn't be s390 if it wasn't I suppose) for not
actually using regular loads / stores because of the split address space
stuff.


2018-07-03 17:08:12

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs


On Jul 3, 2018, at 9:40 AM, Andi Kleen <[email protected]> wrote:

>>
>> So I think you're good... But yes, you raise an interresting point.
>
> So it sounds like architectures that don't have an instruction atomic u64
> *_user need to disable interrupts during the access, and somehow handle that
> case when a page fault happens?

I think all this discussion of “atomic” is a huge distraction. The properties we need are:

- User code can change rseq_cs from one valid user pointer to another with a single instruction (or equivalent) such that we can’t end up in the kernel with the write only partially done as seen in that thread.

- The kernel needs to be able to read the value consistently with the above requirement.

I don’t think it’s possible to have a valid implementation of get_user() on any architecture that’s so weak that this doesn’t work.

If user code writes rseq_cs from the wrong thread, I think the user code is buggy and we simply don’t care what happens. The kernel should be allowed to use an arbitrarily weak read with respect to other threads.

2018-07-03 17:12:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 3, 2018 at 9:40 AM Andi Kleen <[email protected]> wrote:
>
> So it sounds like architectures that don't have an instruction atomic u64
> *_user need to disable interrupts during the access, and somehow handle that
> case when a page fault happens?

No. It's actually the store by *user* space that is the critical one.
Not the whole 64-bit value, just the low pointer part.

The kernel could do it as a byte-by-byte load, really. It's
per-thread, and once the kernel is running, it's not going to change.
The kernel never changes the value, it just loads it from user space.

So all the atomicity worries for the kernel are a red herring. They'd
arguably be nice to have - but only for an insane case that makes
absolutely no sense (a different thread trying to change the value).

Can we please stop the idiocy already? The kernel could read the rseq
pointer one bit at a time, and do a little dance with "yield()" in
between, and take interrupts and page faults, and it wouldn't matter
AT ALL.

It's not even that we read the value from an interrupt context, it's
that as we return to user space (which can be the result of an
interrupt) we can read the value.

This whole thread has been filled with crazy "what if" things that don't matter.

Linus

2018-07-03 17:28:06

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 3, 2018, at 1:10 PM, Linus Torvalds [email protected] wrote:

> On Tue, Jul 3, 2018 at 9:40 AM Andi Kleen <[email protected]> wrote:
>>
>> So it sounds like architectures that don't have an instruction atomic u64
>> *_user need to disable interrupts during the access, and somehow handle that
>> case when a page fault happens?
>
> No. It's actually the store by *user* space that is the critical one.
> Not the whole 64-bit value, just the low pointer part.
>
> The kernel could do it as a byte-by-byte load, really. It's
> per-thread, and once the kernel is running, it's not going to change.
> The kernel never changes the value, it just loads it from user space.
>
> So all the atomicity worries for the kernel are a red herring. They'd
> arguably be nice to have - but only for an insane case that makes
> absolutely no sense (a different thread trying to change the value).
>
> Can we please stop the idiocy already? The kernel could read the rseq
> pointer one bit at a time, and do a little dance with "yield()" in
> between, and take interrupts and page faults, and it wouldn't matter
> AT ALL.
>
> It's not even that we read the value from an interrupt context, it's
> that as we return to user space (which can be the result of an
> interrupt) we can read the value.
>
> This whole thread has been filled with crazy "what if" things that don't matter.

Sorry to come back in the thread late, looks like I've missed all the
fun.

I agree with Linus: we can simply document that updates to rseq->rseq_cs
should be thread-local in the rseq uapi and be done with it. This would
allow using get_user(u64) even on 32-bit architectures, because we cannot
care less if an architecture chooses to read the u64 byte-wise while
standing on its feet.

With this added requirement, Andy's idea of using a union between __u64
and upper/lower __u32 would fit very nicely.

If everyone is OK with that approach, I can prepare an updated patch.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 17:36:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 10:10:37AM -0700, Linus Torvalds wrote:
> On Tue, Jul 3, 2018 at 9:40 AM Andi Kleen <[email protected]> wrote:
> >
> > So it sounds like architectures that don't have an instruction atomic u64
> > *_user need to disable interrupts during the access, and somehow handle that
> > case when a page fault happens?
>
> No. It's actually the store by *user* space that is the critical one.
> Not the whole 64-bit value, just the low pointer part.
>
> The kernel could do it as a byte-by-byte load, really. It's
> per-thread, and once the kernel is running, it's not going to change.
> The kernel never changes the value, it just loads it from user space.

The kernel doesn't change _this_ value, but the kernel does change other
values, like for instance rseq->cpu_id. But even there, it could use
byte stores and it is again the userspace load of that field that is
critical again and needs to be a single op.


2018-07-03 17:40:29

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 3, 2018, at 1:34 PM, Peter Zijlstra [email protected] wrote:

> On Tue, Jul 03, 2018 at 10:10:37AM -0700, Linus Torvalds wrote:
>> On Tue, Jul 3, 2018 at 9:40 AM Andi Kleen <[email protected]> wrote:
>> >
>> > So it sounds like architectures that don't have an instruction atomic u64
>> > *_user need to disable interrupts during the access, and somehow handle that
>> > case when a page fault happens?
>>
>> No. It's actually the store by *user* space that is the critical one.
>> Not the whole 64-bit value, just the low pointer part.
>>
>> The kernel could do it as a byte-by-byte load, really. It's
>> per-thread, and once the kernel is running, it's not going to change.
>> The kernel never changes the value, it just loads it from user space.
>
> The kernel doesn't change _this_ value, but the kernel does change other
> values, like for instance rseq->cpu_id. But even there, it could use
> byte stores and it is again the userspace load of that field that is
> critical again and needs to be a single op.

I can simply document that loads/stores from/to all struct rseq fields
should be thread-local then ?

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 17:50:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 01:38:59PM -0400, Mathieu Desnoyers wrote:
> ----- On Jul 3, 2018, at 1:34 PM, Peter Zijlstra [email protected] wrote:
>
> > On Tue, Jul 03, 2018 at 10:10:37AM -0700, Linus Torvalds wrote:
> >> On Tue, Jul 3, 2018 at 9:40 AM Andi Kleen <[email protected]> wrote:
> >> >
> >> > So it sounds like architectures that don't have an instruction atomic u64
> >> > *_user need to disable interrupts during the access, and somehow handle that
> >> > case when a page fault happens?
> >>
> >> No. It's actually the store by *user* space that is the critical one.
> >> Not the whole 64-bit value, just the low pointer part.
> >>
> >> The kernel could do it as a byte-by-byte load, really. It's
> >> per-thread, and once the kernel is running, it's not going to change.
> >> The kernel never changes the value, it just loads it from user space.
> >
> > The kernel doesn't change _this_ value, but the kernel does change other
> > values, like for instance rseq->cpu_id. But even there, it could use
> > byte stores and it is again the userspace load of that field that is
> > critical again and needs to be a single op.
>
> I can simply document that loads/stores from/to all struct rseq fields
> should be thread-local then ?

I'm not sure that covers things sufficiently. You really want the
userspace load/stores to be single instructions.

Also, I think it was rseq_update_cpu_id() where we wanted to use a
single u64 store if possible but you worried about the stores.

2018-07-03 18:00:07

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 3, 2018, at 1:48 PM, Peter Zijlstra [email protected] wrote:

> On Tue, Jul 03, 2018 at 01:38:59PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 3, 2018, at 1:34 PM, Peter Zijlstra [email protected] wrote:
>>
>> > On Tue, Jul 03, 2018 at 10:10:37AM -0700, Linus Torvalds wrote:
>> >> On Tue, Jul 3, 2018 at 9:40 AM Andi Kleen <[email protected]> wrote:
>> >> >
>> >> > So it sounds like architectures that don't have an instruction atomic u64
>> >> > *_user need to disable interrupts during the access, and somehow handle that
>> >> > case when a page fault happens?
>> >>
>> >> No. It's actually the store by *user* space that is the critical one.
>> >> Not the whole 64-bit value, just the low pointer part.
>> >>
>> >> The kernel could do it as a byte-by-byte load, really. It's
>> >> per-thread, and once the kernel is running, it's not going to change.
>> >> The kernel never changes the value, it just loads it from user space.
>> >
>> > The kernel doesn't change _this_ value, but the kernel does change other
>> > values, like for instance rseq->cpu_id. But even there, it could use
>> > byte stores and it is again the userspace load of that field that is
>> > critical again and needs to be a single op.
>>
>> I can simply document that loads/stores from/to all struct rseq fields
>> should be thread-local then ?
>
> I'm not sure that covers things sufficiently. You really want the
> userspace load/stores to be single instructions.

Yes, of course. More specifically, I would document that those need to
be single-copy atomicity load/store performed by the local thread.

> Also, I think it was rseq_update_cpu_id() where we wanted to use a
> single u64 store if possible but you worried about the stores.

With this added bit of restriction on thread-local loads, indeed we
can then update them without caring about atomicity at kernel level.

I can modify the ABI to put the cpu_id_start and cpu_id fields inside
a union, and update it with a single store.

Thoughts ?

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 18:02:09

by Linus Torvalds

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 3, 2018 at 10:49 AM Peter Zijlstra <[email protected]> wrote:
>
> > I can simply document that loads/stores from/to all struct rseq fields
> > should be thread-local then ?
>
> I'm not sure that covers things sufficiently. You really want the
> userspace load/stores to be single instructions.

Actually, I think we should try very hard to limit even that to _just_
the rseq pointer itself.

Everything else can be filled in ahead of time with non-atomic stores,
and then the last thing that happens - and the only thing that wants
that final "one last atomic write" is the rseq pointer write.

No?

So I'd suggest that the only part we aim to have any "atomic" behavior
at all is for the individual fields in "struct rseq" itself. So the
cpu id and the base pointer and the flags. And even they are
thread-local, so the atomicity is not about the kernel, but about user
space needing to read and update them in word-sized chunks.

End result: absolutely nothing is atomic for the kernel.

Linus

2018-07-03 18:11:10

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 3, 2018, at 1:59 PM, Linus Torvalds [email protected] wrote:

> On Tue, Jul 3, 2018 at 10:49 AM Peter Zijlstra <[email protected]> wrote:
>>
>> > I can simply document that loads/stores from/to all struct rseq fields
>> > should be thread-local then ?
>>
>> I'm not sure that covers things sufficiently. You really want the
>> userspace load/stores to be single instructions.
>
> Actually, I think we should try very hard to limit even that to _just_
> the rseq pointer itself.
>
> Everything else can be filled in ahead of time with non-atomic stores,
> and then the last thing that happens - and the only thing that wants
> that final "one last atomic write" is the rseq pointer write.
>
> No?

Well a small nit here: the rseq->rseq_cs pointer store is performed
at the _very beginning_ of the rseq critical section. We indeed want
that store to be performed as a single instruction by user-space.

What I think you have in mind as "one last atomic write" is the commit
instruction at the end of the critical section, which does not touch
any field in struct rseq.

>
> So I'd suggest that the only part we aim to have any "atomic" behavior
> at all is for the individual fields in "struct rseq" itself. So the
> cpu id and the base pointer and the flags. And even they are
> thread-local, so the atomicity is not about the kernel, but about user
> space needing to read and update them in word-sized chunks.
>
> End result: absolutely nothing is atomic for the kernel.

Yes, +1. If everyone is OK with that I'll go and implement the changes
within the coming day.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 18:12:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 10:59:45AM -0700, Linus Torvalds wrote:
> On Tue, Jul 3, 2018 at 10:49 AM Peter Zijlstra <[email protected]> wrote:
> >
> > > I can simply document that loads/stores from/to all struct rseq fields
> > > should be thread-local then ?
> >
> > I'm not sure that covers things sufficiently. You really want the
> > userspace load/stores to be single instructions.
>
> Actually, I think we should try very hard to limit even that to _just_
> the rseq pointer itself.

> So I'd suggest that the only part we aim to have any "atomic" behavior
> at all is for the individual fields in "struct rseq" itself. So the
> cpu id and the base pointer and the flags. And even they are
> thread-local, so the atomicity is not about the kernel, but about user
> space needing to read and update them in word-sized chunks.
>
> End result: absolutely nothing is atomic for the kernel.

Yes, agreed, that is what I meant but very poorly expressed. Only the
rseq bits themselves need this single-copy atomic stuff -- for
userspace.


2018-07-03 18:13:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 01:58:37PM -0400, Mathieu Desnoyers wrote:
> I can modify the ABI to put the cpu_id_start and cpu_id fields inside
> a union, and update it with a single store.
>
> Thoughts ?

Let's keep them for now, we can always frob this later, they are aligned
and proper, no need to expose that union to userspace.

2018-07-03 18:17:42

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 3, 2018, at 2:11 PM, Peter Zijlstra [email protected] wrote:

> On Tue, Jul 03, 2018 at 01:58:37PM -0400, Mathieu Desnoyers wrote:
>> I can modify the ABI to put the cpu_id_start and cpu_id fields inside
>> a union, and update it with a single store.
>>
>> Thoughts ?
>
> Let's keep them for now, we can always frob this later, they are aligned
> and proper, no need to expose that union to userspace.

Isn't it weird to change the API of an exposed public uapi header ? What
if userspace chooses to do sizeof(__rseq_abi.cpu_id) ? We would break
this unless we use a transparent union, which puts constraints I would
hope not to have on compilers supporting transparent unions (I recall
C++ had issues with this).

I'd prefer to expose the union right away if it's fine with you.

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 18:30:00

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 02:15:34PM -0400, Mathieu Desnoyers wrote:
> ----- On Jul 3, 2018, at 2:11 PM, Peter Zijlstra [email protected] wrote:
>
> > On Tue, Jul 03, 2018 at 01:58:37PM -0400, Mathieu Desnoyers wrote:
> >> I can modify the ABI to put the cpu_id_start and cpu_id fields inside
> >> a union, and update it with a single store.
> >>
> >> Thoughts ?
> >
> > Let's keep them for now, we can always frob this later, they are aligned
> > and proper, no need to expose that union to userspace.
>
> Isn't it weird to change the API of an exposed public uapi header ?

Sure, just keep it as is. We don't need an exposed union to do a single
store there.

Something like the ugly below preserves API but still does a single
store.

But sure, if you want to expose that union for some reason, then now is
the time.

diff --git a/kernel/rseq.c b/kernel/rseq.c
index 22b6acf1ad63..e956c48b5f83 100644
--- a/kernel/rseq.c
+++ b/kernel/rseq.c
@@ -85,10 +85,17 @@ static int rseq_update_cpu_id(struct task_struct *t)
{
u32 cpu_id = raw_smp_processor_id();

- if (__put_user(cpu_id, &t->rseq->cpu_id_start))
- return -EFAULT;
- if (__put_user(cpu_id, &t->rseq->cpu_id))
+ union {
+ struct {
+ u32 cpu_id_start;
+ u32 cpu_id;
+ };
+ u64 val;
+ } x = { { .cpu_id_start = cpu_id, .cpu_id = cpu_id, } };
+
+ if (__put_user(x.val, (u64 *)&t->rseq->cpu_id_start))
return -EFAULT;
+
trace_rseq_update(t);
return 0;
}

2018-07-03 18:42:05

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

----- On Jul 3, 2018, at 2:28 PM, Peter Zijlstra [email protected] wrote:

> On Tue, Jul 03, 2018 at 02:15:34PM -0400, Mathieu Desnoyers wrote:
>> ----- On Jul 3, 2018, at 2:11 PM, Peter Zijlstra [email protected] wrote:
>>
>> > On Tue, Jul 03, 2018 at 01:58:37PM -0400, Mathieu Desnoyers wrote:
>> >> I can modify the ABI to put the cpu_id_start and cpu_id fields inside
>> >> a union, and update it with a single store.
>> >>
>> >> Thoughts ?
>> >
>> > Let's keep them for now, we can always frob this later, they are aligned
>> > and proper, no need to expose that union to userspace.
>>
>> Isn't it weird to change the API of an exposed public uapi header ?
>
> Sure, just keep it as is. We don't need an exposed union to do a single
> store there.
>
> Something like the ugly below preserves API but still does a single
> store.
>
> But sure, if you want to expose that union for some reason, then now is
> the time.

User-space won't ever want to read cpu_id_start and cpu_id from a single
u64 load, it serves no purpose to do so. So I'm OK with keeping those as
is and defining a local union for the __put_user() update.

Thanks!

Mathieu

>
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 22b6acf1ad63..e956c48b5f83 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -85,10 +85,17 @@ static int rseq_update_cpu_id(struct task_struct *t)
> {
> u32 cpu_id = raw_smp_processor_id();
>
> - if (__put_user(cpu_id, &t->rseq->cpu_id_start))
> - return -EFAULT;
> - if (__put_user(cpu_id, &t->rseq->cpu_id))
> + union {
> + struct {
> + u32 cpu_id_start;
> + u32 cpu_id;
> + };
> + u64 val;
> + } x = { { .cpu_id_start = cpu_id, .cpu_id = cpu_id, } };
> +
> + if (__put_user(x.val, (u64 *)&t->rseq->cpu_id_start))
> return -EFAULT;
> +
> trace_rseq_update(t);
> return 0;
> }

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

2018-07-03 19:10:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC PATCH for 4.18] rseq: use __u64 for rseq_cs fields, validate user inputs

On Tue, Jul 03, 2018 at 02:41:01PM -0400, Mathieu Desnoyers wrote:

> User-space won't ever want to read cpu_id_start and cpu_id from a single
> u64 load, it serves no purpose to do so. So I'm OK with keeping those as
> is and defining a local union for the __put_user() update.

So I think previously we had the sequence number and cpuid in there
together, and in that case it did want to load them both. But since you
made that sequence number dissapear....