2020-09-24 08:28:25

by David Laight

Subject: RE: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

From: Dave Jiang
> Sent: 24 September 2020 00:11
>
> The MOVDIR64B instruction can be used by other wrapper instructions. Move
> the asm code to special_insns.h and have iosubmit_cmds512() call the
> asm function.
>
> Signed-off-by: Dave Jiang <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> arch/x86/include/asm/io.h | 17 +++--------------
> arch/x86/include/asm/special_insns.h | 19 +++++++++++++++++++
> 2 files changed, 22 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
> index e1aa17a468a8..d726459d08e5 100644
> --- a/arch/x86/include/asm/io.h
> +++ b/arch/x86/include/asm/io.h
...
> diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> index 59a3e13204c3..2a5abd27bb86 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -234,6 +234,25 @@ static inline void clwb(volatile void *__p)
>
> #define nop() asm volatile ("nop")
>
> +/* The dst parameter must be 64-bytes aligned */
> +static inline void movdir64b(void *dst, const void *src)
> +{
> + /*
> + * Note that this isn't an "on-stack copy", just definition of "dst"
> + * as a pointer to 64-bytes of stuff that is going to be overwritten.
> + * In the MOVDIR64B case that may be needed as you can use the
> + * MOVDIR64B instruction to copy arbitrary memory around. This trick
> + * lets the compiler know how much gets clobbered.
> + */
> + volatile struct { char _[64]; } *__dst = dst;
> +
> + /* MOVDIR64B [rdx], rax */
> + asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> + :
> + : "m" (*(struct { char _[64];} **)src), "a" (__dst)
> + : "memory");
> +}
> +
> #endif /* __KERNEL__ */

You've lost the "d" (src).
You don't need the 'memory' clobber, just:

static inline void movdir64b(void *dst, const void *src)
{
/*
* 64 bytes from dst are marked as modified for completeness.
* Since the writes bypass the cache later reads may return
* old data anyway.
*/
/* MOVDIR64B [rdx], rax */
asm volatile (".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
: "=m" ((struct { char _[64];} *)dst),
: "m" ((struct { char _[64];} *)src), "d" (src), "a" (dst));
}

I've checked that the "m" constraint on src does force (at least one
version of) gcc to actually write to the supplied buffer.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2020-09-24 10:16:38

by Borislav Petkov

Subject: Re: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

On Thu, Sep 24, 2020 at 08:24:46AM +0000, David Laight wrote:
> static inline void movdir64b(void *dst, const void *src)
> {
> /*
> * 64 bytes from dst are marked as modified for completeness.
> * Since the writes bypass the cache later reads may return
> * old data anyway.
> */
> /* MOVDIR64B [rdx], rax */
> asm volatile (".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> : "=m" ((struct { char _[64];} *)dst),
> : "m" ((struct { char _[64];} *)src), "d" (src), "a" (dst));

Now since you're so generous with your advice on random threads, please
explain what you're advising here?

The destination operand - in this case in %rax - is "destination memory
address specified as offset to ES segment in the register operand."

So what is the difference between:

...(void *dst, ... )

volatile struct { char _[64]; } *__dst = dst;

...

: "=m" (__dst)
: "a" (__dst)

and

...(void *dst, ... )

...

: "=m" ((struct { char _[64];} *)dst)
: "a" (__dst)

and why?

Point me to the gcc documentation where this is explained.

To cut to the chase, I don't think you need to do that, otherwise clwb()
would be broken too but perhaps you know something I don't.

Looking at clwb(), I believe the proper specification should be:

volatile struct { char _[64]; } *__dst = dst;

...

: "+m" (__dst)
: "a" (__dst)

And if anything, the source specification should be something like that:

volatile struct { char x[64]; } *__src = src;

...


"d" (__src)

because this tells gcc that the source operand would read 64 bytes
through the pointer in the %rdx reg.

So this ends up close to what you're saying but it is using local
variables to make the asm actually readable.

Lemme add Micha to Cc for sanity-checking:

Micha, the instruction is:

MOVDIR64B %(rdx), rax

"Move 64-bytes as direct-store with guaranteed 64-byte write atomicity
from the source memory operand address to destination memory address
specified as offset to ES segment in the register operand."

Do I need to tell gcc that both operands are referencing 64 bytes,
source operand is a memory reference, destination operand is an address
specified in a register?

What we have currently is:

volatile struct { char _[64]; } *dst = __dst;

/* MOVDIR64B [rdx], rax */
asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
: "=m" (dst)
: "d" (from), "a" (dst));


Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-09-24 10:44:12

by David Laight

Subject: RE: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

From: Borislav Petkov
> Sent: 24 September 2020 11:15
> On Thu, Sep 24, 2020 at 08:24:46AM +0000, David Laight wrote:
> > static inline void movdir64b(void *dst, const void *src)
> > {
> > /*
> > * 64 bytes from dst are marked as modified for completeness.
> > * Since the writes bypass the cache later reads may return
> > * old data anyway.
> > */
> > /* MOVDIR64B [rdx], rax */
> > asm volatile (".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> > : "=m" ((struct { char _[64];} *)dst),
> > : "m" ((struct { char _[64];} *)src), "d" (src), "a" (dst));
>
> Now since you're so generous with your advice on random threads, please
> explain what you're advising here?
>
> The destination operand - in this case in %rax - is "destination memory
> address specified as offset to ES segment in the register operand."

The movdir64b instruction does a 'normal' read of 64 bytes (can be misaligned),
then a cache-bypassing (probably) write-combining single 64-byte write to
an address that must be aligned.
Any reference to segment registers is largely irrelevant since we are
not in real mode.


> So what is the difference between:
>
> ...(void *dst, ... )
>
> volatile struct { char _[64]; } *__dst = dst;
> ...
> : "=m" (__dst)
> : "a" (__dst)
>
> and
>
> ...(void *dst, ... )
> ...
> : "=m" ((struct { char _[64];} *)dst)
> : "a" (__dst)
>
> and why?
>
> Point me to the gcc documentation where this is explained.

Mainly less lines of code to look at.

> To cut to the chase, I don't think you need to do that, otherwise clwb()
> would be broken too but perhaps you know something I don't.
>
> Looking at clwb(), I believe the proper specification should be:
>
> volatile struct { char _[64]; } *__dst = dst;
>
> ...
>
> : "+m" (__dst)
> : "a" (__dst)

No idea what clwb() is doing.
But the "+m" (dst) tells gcc it depends on, and modifies the 64 bytes
at *dst.

I believe the 'volatile' is pointless.

> And if anything, the source specification should be something like that:
>
> volatile struct { char x[64]; } *__src = src;
>
> ...
>
>
> "d" (__src)
>
> because this tells gcc that the source operand would read 64 bytes
> through the pointer in the %rdx reg.

No, that just says the asm uses the value of the pointer.
Not what it points to.

> So this ends up close to what you're saying but it is using local
> variables to make the asm actually readable.
>
> Lemme add Micha to Cc for sanity-checking:
>
> Micha, the instruction is:
>
> MOVDIR64B %(rdx), rax
>
> "Move 64-bytes as direct-store with guaranteed 64-byte write atomicity
> from the source memory operand address to destination memory address
> specified as offset to ES segment in the register operand."
>
> Do I need to tell gcc that both operands are referencing 64 bytes,
> source operand is a memory reference, destination operand is an address
> specified in a register?
>
> What we have currently is:
>
> volatile struct { char _[64]; } *dst = __dst;
>
> /* MOVDIR64B [rdx], rax */
> asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
> : "=m" (dst)
> : "d" (from), "a" (dst));

That is wrong.
Feed this into cc -S -O2 and look at the .s file

static inline void movdir64b(void *dst, const void *src)
{
asm volatile(".byte 0x66, 0x0f, 0x38, 0xf8, 0x02"
:
: /*"m" ((struct { char _[64];} *)src),*/ "d" (src), "a" (dst)
);
}

void foo(void *dst, int val)
{
long b64[8] = { 0 };

b64[0] = val;
movdir64b(dst, b64);
}

Note that all the code that writes into b64[] is optimised away.
Repeat after uncommenting the "m" constraint and spot the difference.

The "=m" (dst) constraint is much less important here.
The write itself will always happen.
So do we need to tell gcc we did it?
Doing so just ensures gcc doesn't move any instructions that it knows
access the same memory above the movdir64b instruction.
But, because this is a cache bypassing write they are going
to be invalid anyway - without extra strong barriers.
So it is fairly safe to miss it out.
OTOH putting it in does no harm and helps annotate what the
instruction is doing.

I just failed to spot an example of a 'memory size' cast in the
kernel source tree - I'm sure there is an example somewhere.

David


2020-09-24 11:03:53

by Borislav Petkov

Subject: Re: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

On Thu, Sep 24, 2020 at 10:42:16AM +0000, David Laight wrote:
> The movdir64b instruction does a 'normal' read of 64 bytes (can be
> misaligned) Then a cache-bypassing (probably) write-combining single
> 64byte write to an address that must be aligned. Any reference to
> segment registers is largely irrelevant since we are not in real mode.

Sounds like you know better than the SDM.

> Mainly less lines of code to look at.

Yeah, no. Readability is what I would prefer any day of the week.

> No idea what clwb() is doing.

Sounds like you need to read another part of the SDM.

> But the "+m" (dst) tells gcc it depends on, and modifies the 64 bytes
> at *dst.
>
> I believe the 'volatile' is pointless.

I discussed this at the time with a gcc person. And nope, it ain't
pointless.

> No, that just says the asm uses the value of the pointer.
> Not what it points to.

Err, no, it is *exactly* what it points to that is important here and
you're telling the compiler that the instruction will read that much
memory through the pointer.

Ok, I've read enough babble. I'll discuss it with a gcc person before I
take anything.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-09-24 11:27:27

by David Laight

Subject: RE: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

> > No, that just says the asm uses the value of the pointer.
> > Not what it points to.
>
> Err, no, it is *exactly* what it points to that is important here and
> you're telling the compiler that the instruction will read that much
> memory through the pointer.

You need to use an "m" constraint for that.
A 'register' constraint just requires the value of the address
to be valid.

Look at the asm output from the example code I posted.

David


2020-09-24 14:10:39

by Michael Matz

Subject: Re: [PATCH v5 1/5] x86/asm: Carve out a generic movdir64b() helper for general usage

Hello,

even though we hashed it out downthread, let me make some additional
remarks:

On Thu, 24 Sep 2020, Borislav Petkov wrote:

> > /* MOVDIR64B [rdx], rax */

This comment is confusing as it uses Intel syntax for the operand forms,
but AT&T order (dest last).

> volatile struct { char _[64]; } *__dst = dst;
>
> ...
>
> : "=m" (__dst)

This and the other occurrences in this thread up to now always miss that
the 'm' constraints want the object itself, not the address of the object.
So you want '"m" (*__src)', same for dst, and so on.

> Micha, the instruction is:
>
> MOVDIR64B %(rdx), rax
>
> "Move 64-bytes as direct-store with guaranteed 64-byte write atomicity
> from the source memory operand address to destination memory address
> specified as offset to ES segment in the register operand."

It's unfortunate that the introduction of this mnemonic into binutils
did it wrong already, but what the instruction should really read like in
AT&T mode is:

movdir64b (%rdx), (%rax)
or even
movdir64b (%rdx), es:(%rax)

because both are memory operands really (even though the destination can
only be encoded with a direct register, as these are the constraints of
x86 insn encodings). It's comparable to movs, which, also having two
memory operands, is written:

movsb %ds:(%rsi),%es:(%rdi)


Ciao,
Michael.