2020-10-12 11:09:10

by Borislav Petkov

[permalink] [raw]
Subject: [GIT PULL] x86/asm updates for v5.10

Hi Linus,

please pull two asm wrapper fixes.

Thx.

---

The following changes since commit f75aef392f869018f78cfedf3c320a6b3fcfda6b:

Linux 5.9-rc3 (2020-08-30 16:01:54 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tags/x86_asm_for_v5.10

for you to fetch changes up to aa5cacdc29d76a005cbbee018a47faa6e724dd2d:

x86/asm: Replace __force_order with a memory clobber (2020-10-01 10:31:48 +0200)

----------------------------------------------------------------
* Use XORL instead of XORQ to avoid a REX prefix and save some bytes in
the .fixup section, by Uros Bizjak.

* Replace __force_order dummy variable with a memory clobber to fix LLVM
requiring a definition for former and to prevent memory accesses from
still being cached/reordered, by Arvind Sankar.

----------------------------------------------------------------
Arvind Sankar (1):
x86/asm: Replace __force_order with a memory clobber

Uros Bizjak (1):
x86/uaccess: Use XORL %0,%0 in __get_user_asm()

arch/x86/boot/compressed/pgtable_64.c | 9 ---------
arch/x86/include/asm/special_insns.h | 28 +++++++++++++++-------------
arch/x86/include/asm/uaccess.h | 2 +-
arch/x86/kernel/cpu/common.c | 4 ++--
4 files changed, 18 insertions(+), 25 deletions(-)

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg


2020-10-12 18:14:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 4:06 AM Borislav Petkov <[email protected]> wrote:
>
> * Use XORL instead of XORQ to avoid a REX prefix and save some bytes in
> the .fixup section, by Uros Bizjak.

I think this one is actually buggy.

For the 1-byte case, it does this:

__get_user_asm(x_u8__, ptr, retval, "b", "=q");

and ends up doing "xorl" on a register that we told the compiler is a
byte register (with that "=q")

Yes, it uses "%k[output]" to turn that byte register into the word
version of the register, but there's no fundamental reason why the
register might not be something like "%ah".

Does the "xorl" work? Does it build? Yes, and yes.

But maybe %al contains SOMETHING ELSE, and it now clears that too,
because the asm is basically doing something completely different than
what we told the compiler it would do.

Now, afaik, gcc (and presumably clang) basically almost never use the
high byte registers. But I still think this patch is fundamentally
wrong and conceptually completely buggy, even if it might work in
practice.

Also, I'm going to uninline this nasty __get_user() function anyway
for 5.10, so the patch ends up being not just wrong, but pointless.
This is not some kind of hot code that should be optimized, and the
extra byte is not a lot to worry about.

Annoying. Because the other patch in this pull request is fine, and
people want it.

But I'm going to skip this pull request, because I really think it's
dangerously and subtly buggy even if there might not be any case that
matters in reality.

Linus

2020-10-12 18:58:23

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 08:41:32PM +0200, Uros Bizjak wrote:
> On Mon, Oct 12, 2020 at 8:11 PM Linus Torvalds <
> [email protected]> wrote:
>
> > On Mon, Oct 12, 2020 at 4:06 AM Borislav Petkov <[email protected]> wrote:
> > >
> > > * Use XORL instead of XORQ to avoid a REX prefix and save some bytes in
> > > the .fixup section, by Uros Bizjak.
> >
> > I think this one is actually buggy.
> >
> > For the 1-byte case, it does this:
> >
> > __get_user_asm(x_u8__, ptr, retval, "b", "=q");
> >
> > and ends up doing "xorl" on a register that we told the compiler is a
> > byte register (with that "=q")
> >
> > Yes, it uses "%k[output]" to turn that byte register into the word
> > version of the register, but there's no fundamental reason why the
> > register might not be something like "%ah".
> >
>
> GCC does not distinguish between %ah and %al and it is not possible to pass
> "%ah" to the assembly. To access the high part of the %ax register, %h
> modifier has to be used in the assembly template.

Btw, did those get documented in the meantime? I can find them only in
gcc sources:

k -- likewise, print the SImode name of the register.
h -- print the QImode name for a "high" register, either ah, bh, ch or dh.

and SImode you guys call

@findex SImode
@item SImode
``Single Integer'' mode represents a four-byte integer.


and QImode:

@findex QImode
@item QImode
``Quarter-Integer'' mode represents a single byte treated as an integer.

so the above %k would turn that into %eax, IINM, since it is a SImode,
i.e., 4 bytes.

> The compiler uses high registers only as a kind of bit insert / bit extract
> operation of 8 bits at the position of 8. The compiler is free to
> substitute "movb %al, %bl" with "movl %eax, %ebx", and there are many
> instruction patterns that exercise this to implement "impossible" reg-reg
> moves involving %sil and %dil registers in 32bit mode.
>
> Based on the above facts, the value in %ah can only live as a part of a
> wider register, %ax (and wider) in this case.

Is this going to be the case for future gccs too or is that subject to
change at some point?

> > Annoying. Because the other patch in this pull request is fine, and
> > people want it.

@Linus: I'll send you a new one with only that one tomorrow.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-10-12 19:01:32

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

+ [email protected] for clang folks...

On Mon, Oct 12, 2020 at 11:56:45AM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 11:41 AM Uros Bizjak <[email protected]> wrote:
> >
> > GCC does not distinguish between %ah and %al and it is not possible to pass "%ah" to the assembly. To access the high part of the %ax register, %h modifier has to be used in the assembly template.
>
> Do you know whether that's true for clang too, for example?
>
> Also note that even if the _asm_ might get "%al", maybe the compiler
> decided to use "%ah" for something else?
>
> I have memories of gcc using the high registers at some point, but it
> might have been some special case code - and it might also be very
> historical.
>
> [ Goes off and checks ]
>
> In fact, I can still find gcc generating high register code, although
> it's quite possible that yes, it's only peephole bit extract
> instruction kind of use..
>
> I also find that clang generates code that uses the high byte
> registers, although again, that's not from any knowledge of clang
> internals, and just by looking at my kernel image disassembly.
>
> So yes, it _may_ all be just peepholes, but it's not obvious that this
> is all safe.
>
> Linus

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-10-12 19:09:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 11:56 AM Linus Torvalds
<[email protected]> wrote:
>
> I also find that clang generates code that uses the high byte
> registers, although again, that's not from any knowledge of clang
> internals, and just by looking at my kernel image disassembly.
>
> So yes, it _may_ all be just peepholes, but it's not obvious that this
> is all safe.

The clang use I find seems to be _purely_ for variations of "mov", and
only ever with the high register as a source.

So yes, that one looks very much like a peephole optimization where
clang just recognizes patterns line

X = (y >> 8) & 0xff;

and uses a "movzbl %*h,xyz" for it.

Gcc actually seems to use high registers more, but the extended use
seems to be bit test (and set) operations that also may be simply
peepholes.

So yes, from code generation patterns it does look likely that neither
compiler actually considers the high registers to be truely
independent entities, and thus quite likely that you'd never find
concurrent mixed use.

But that really seems to be an implementation issue rather than
something we should necessarily rely on, unless we have a stronger
statement from both compiler camps..

Linus

2020-10-12 19:37:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 12:24 PM Uros Bizjak <[email protected]> wrote:
>
> I don't think it is even possible to write to a part of a register in the asm. An example:

But this example is the *reverse* of what I worry about.

I worry about the asm writing not to a "part" of a register, but to
*more* than we told the compiler we'd write to.

If we told the compiler we're only writing to %al, then I could see
the compiler using %ah for something, and scheduling that "somethihng"
to after the inline asm that said it was only modifying the low bits.

Now, I do believe you're right that gcc (and probably clang) simply
doesn't track %ah liveness and clobbering separately from %al.

But it still stinks as a concept when this isn't actually documented
anywhere I can tell.

See my worry?

Linus

2020-10-13 08:53:27

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 11:41 AM Uros Bizjak <[email protected]> wrote:
>
> GCC does not distinguish between %ah and %al and it is not possible to pass "%ah" to the assembly. To access the high part of the %ax register, %h modifier has to be used in the assembly template.

Do you know whether that's true for clang too, for example?

Also note that even if the _asm_ might get "%al", maybe the compiler
decided to use "%ah" for something else?

I have memories of gcc using the high registers at some point, but it
might have been some special case code - and it might also be very
historical.

[ Goes off and checks ]

In fact, I can still find gcc generating high register code, although
it's quite possible that yes, it's only peephole bit extract
instruction kind of use..

I also find that clang generates code that uses the high byte
registers, although again, that's not from any knowledge of clang
internals, and just by looking at my kernel image disassembly.

So yes, it _may_ all be just peepholes, but it's not obvious that this
is all safe.

Linus

2020-10-13 10:00:08

by Arvind Sankar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 11:11:35AM -0700, Linus Torvalds wrote:
> On Mon, Oct 12, 2020 at 4:06 AM Borislav Petkov <[email protected]> wrote:
> >
> > * Use XORL instead of XORQ to avoid a REX prefix and save some bytes in
> > the .fixup section, by Uros Bizjak.
>
> I think this one is actually buggy.
>
> For the 1-byte case, it does this:
>
> __get_user_asm(x_u8__, ptr, retval, "b", "=q");
>
> and ends up doing "xorl" on a register that we told the compiler is a
> byte register (with that "=q")

It's not the "q", but the size of the l-value specified that tells the
compiler what to use. So x_u8__ does make it use a byte register, and it
would even with an "r" constraint. I think "q" is there only in case you
want to access the low byte of a bigger operand, to force the compiler
to use only a,b,c,d in 32-bit mode.

>
> Yes, it uses "%k[output]" to turn that byte register into the word
> version of the register, but there's no fundamental reason why the
> register might not be something like "%ah".
>
> Does the "xorl" work? Does it build? Yes, and yes.
>
> But maybe %al contains SOMETHING ELSE, and it now clears that too,
> because the asm is basically doing something completely different than
> what we told the compiler it would do.
>
> Now, afaik, gcc (and presumably clang) basically almost never use the
> high byte registers. But I still think this patch is fundamentally
> wrong and conceptually completely buggy, even if it might work in
> practice.
>
> Also, I'm going to uninline this nasty __get_user() function anyway
> for 5.10, so the patch ends up being not just wrong, but pointless.
> This is not some kind of hot code that should be optimized, and the
> extra byte is not a lot to worry about.
>
> Annoying. Because the other patch in this pull request is fine, and
> people want it.
>
> But I'm going to skip this pull request, because I really think it's
> dangerously and subtly buggy even if there might not be any case that
> matters in reality.
>
> Linus

2020-10-13 10:00:22

by Arvind Sankar

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 08:55:47PM +0200, Borislav Petkov wrote:
> On Mon, Oct 12, 2020 at 08:41:32PM +0200, Uros Bizjak wrote:
> > On Mon, Oct 12, 2020 at 8:11 PM Linus Torvalds <
> > [email protected]> wrote:
> >
> > > On Mon, Oct 12, 2020 at 4:06 AM Borislav Petkov <[email protected]> wrote:
> > > >
> > > > * Use XORL instead of XORQ to avoid a REX prefix and save some bytes in
> > > > the .fixup section, by Uros Bizjak.
> > >
> > > I think this one is actually buggy.
> > >
> > > For the 1-byte case, it does this:
> > >
> > > __get_user_asm(x_u8__, ptr, retval, "b", "=q");
> > >
> > > and ends up doing "xorl" on a register that we told the compiler is a
> > > byte register (with that "=q")
> > >
> > > Yes, it uses "%k[output]" to turn that byte register into the word
> > > version of the register, but there's no fundamental reason why the
> > > register might not be something like "%ah".
> > >
> >
> > GCC does not distinguish between %ah and %al and it is not possible to pass
> > "%ah" to the assembly. To access the high part of the %ax register, %h
> > modifier has to be used in the assembly template.
>
> Btw, did those get documented in the meantime? I can find them only in
> gcc sources:
>
> k -- likewise, print the SImode name of the register.
> h -- print the QImode name for a "high" register, either ah, bh, ch or dh.

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers

2020-10-13 10:33:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 1:22 PM Uros Bizjak <[email protected]> wrote:
>
> No, this fact is not documented, although there are close to zero
> chances it will ever change. High registers are independent from their
> 8bit lowparts, but they still clobber corresponding 16bit, 32bit and
> 64bit representations. I guess this limitation is so severe that the
> compiler writers will be more than happy to document that %ah and %al
> can't be independent.

Ok, if we can get that agreed upon (and the clang people too), then I
have no concerns about the patch.

Just so that we don't have any nasty surprises in the future where
some clever compiler change ends up breaking this (very rare)
exception case.

Linus

2020-10-13 11:58:38

by Uros Bizjak

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

On Mon, Oct 12, 2020 at 10:57 PM Linus Torvalds
<[email protected]> wrote:
>
> On Mon, Oct 12, 2020 at 1:22 PM Uros Bizjak <[email protected]> wrote:
> >
> > No, this fact is not documented, although there are close to zero
> > chances it will ever change. High registers are independent from their
> > 8bit lowparts, but they still clobber corresponding 16bit, 32bit and
> > 64bit representations. I guess this limitation is so severe that the
> > compiler writers will be more than happy to document that %ah and %al
> > can't be independent.
>
> Ok, if we can get that agreed upon (and the clang people too), then I
> have no concerns about the patch.

The GCC's development documentation says:

--cut here--
When storing to a normal 'subreg' that is smaller than a
block, the other bits of the referenced block are usually left
in an undefined state. This laxity makes it easier to
generate efficient code for such instructions. To represent
an instruction that preserves all the bits outside of those in
the 'subreg', use 'strict_low_part' or 'zero_extract' around
the 'subreg'.
--cut here--

REG is divided into individually-addressable blocks in which each block has:

REGMODE_NATURAL_SIZE (M2)

bytes. Usually the value is 'UNITS_PER_WORD'; that is, most targets
usually treat each word of a register as being independently
addressable.

The 'block' is 32bits for i386 or 64bits for x86_64. When asm is
writing to %al, this effectively means that other bits of a register
are left in an undefined state. Please note that 'strict_low_part' and
'zero_extract' are internal representations, not available in asm
statements.

> Just so that we don't have any nasty surprises in the future where
> some clever compiler change ends up breaking this (very rare)
> exception case.

IMO the above documents that write to a partial register clobbers the
entire register.

Uros.

2020-10-13 12:03:47

by Borislav Petkov

[permalink] [raw]
Subject: [GIT PULL -v2] x86/asm updates for v5.10

Hi Linus,

here's v2 of the x86/asm pull with only the __force_order patch so that
it can go in now. The other one will be sorted out when the matter has
been settled properly.

Please pull,
thx.

---
The following changes since commit f75aef392f869018f78cfedf3c320a6b3fcfda6b:

Linux 5.9-rc3 (2020-08-30 16:01:54 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tags/x86_asm_for_v5.10

for you to fetch changes up to 3e626682046e30282979f7d71e054cd4c00069a7:

x86/asm: Replace __force_order with a memory clobber (2020-10-13 11:23:15 +0200)

----------------------------------------------------------------
* Replace __force_order dummy variable with a memory clobber to fix LLVM
requiring a definition for former and to prevent memory accesses from
still being cached/reordered. (Arvind Sankar)

----------------------------------------------------------------
Arvind Sankar (1):
x86/asm: Replace __force_order with a memory clobber

arch/x86/boot/compressed/pgtable_64.c | 9 ---------
arch/x86/include/asm/special_insns.h | 28 +++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 4 ++--
3 files changed, 17 insertions(+), 24 deletions(-)

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-10-13 20:41:45

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL -v2] x86/asm updates for v5.10

On Tue, Oct 13, 2020 at 2:42 AM Borislav Petkov <[email protected]> wrote:
>
> here's v2 of the x86/asm pull with only the __force_order patch so that
> it can go in now. The other one will be sorted out when the matter has
> been settled properly.

Actually, I think you forgot to push out the updated thing, I still
see the same contents of the pull.

Which I guess is ok, since Uros has convinced me that the xorl
conversion is safe even for the byte cases.

So I've pulled that unmodified branch.

Linus

2020-10-13 20:47:36

by pr-tracker-bot

[permalink] [raw]
Subject: Re: [GIT PULL] x86/asm updates for v5.10

The pull request you sent on Mon, 12 Oct 2020 13:05:57 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git tags/x86_asm_for_v5.10

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/029f56db6ac248769f2c260bfaf3c3c0e23e904c

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

2020-10-13 20:49:35

by Borislav Petkov

[permalink] [raw]
Subject: Re: [GIT PULL -v2] x86/asm updates for v5.10

On Tue, Oct 13, 2020 at 01:39:01PM -0700, Linus Torvalds wrote:
> Actually, I think you forgot to push out the updated thing, I still
> see the same contents of the pull.

Blergh, that tag is still pointing to the old branch:

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/tag/?h=x86_asm_for_v5.10

even though I wrote a new one and pushed it. Otherwise I wouldnt've been
able to create the v2 pull request message.

However, I used the same tag name and force-pushed and perhaps there it
didn't do what I wanted it to do. Sorry about that - I'll recheck stuff
like that in the future.

> Which I guess is ok, since Uros has convinced me that the xorl
> conversion is safe even for the byte cases.
>
> So I've pulled that unmodified branch.

Aha, ok, sounds good too.

Thx.

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
--