2013-07-14 13:00:15

by Ramkumar Ramachandra

[permalink] [raw]
Subject: [PATCH] x86/asm: avoid mnemonics without type suffix

1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
changed a bunch of btrl/btsl instructions to btr/bts, with the following
justification:

The inline assembly for the bit operations has been changed to remove
explicit sizing hints on the instructions, so the assembler will pick
the appropriate instruction forms depending on the architecture and
the context.

Unfortunately, GNU as does no such thing, and the AT&T syntax manual
[1] contains no references to any such inference. As evidenced by the
following experiment, gas always disambiguates btr/bts to btrl/btsl.
Feed the following input to gas:

btrl $1, 0
btr $1, 0
btsl $1, 0
bts $1, 0

Check that btr matches btrl, and bts matches btsl in both cases:

$ as --32 -a in.s
$ as --64 -a in.s

To avoid giving readers the illusion of such an inference, and for
clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to
disambiguate btr/bts automatically.

[1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf

Cc: Jeremy Fitzhardinge <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Eli Friedman <[email protected]>
Cc: Jim Grosbach <[email protected]>
Cc: Stephen Checkoway <[email protected]>
Cc: LLVMdev <[email protected]>
Signed-off-by: Ramkumar Ramachandra <[email protected]>
---
We discussed this pretty extensively on LLVMDev, but I'm still not
sure that I haven't missed something.

arch/x86/include/asm/bitops.h | 16 ++++++++--------
arch/x86/include/asm/percpu.h | 2 +-
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index 6dfd019..6ed3d1e 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
: "iq" ((u8)CONST_MASK(nr))
: "memory");
} else {
- asm volatile(LOCK_PREFIX "bts %1,%0"
+ asm volatile(LOCK_PREFIX "btsl %1,%0"
: BITOP_ADDR(addr) : "Ir" (nr) : "memory");
}
}
@@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
*/
static inline void __set_bit(int nr, volatile unsigned long *addr)
{
- asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
+ asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory");
}

/**
@@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
: CONST_MASK_ADDR(nr, addr)
: "iq" ((u8)~CONST_MASK(nr)));
} else {
- asm volatile(LOCK_PREFIX "btr %1,%0"
+ asm volatile(LOCK_PREFIX "btrl %1,%0"
: BITOP_ADDR(addr)
: "Ir" (nr));
}
@@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr)

static inline void __clear_bit(int nr, volatile unsigned long *addr)
{
- asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
+ asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
}

/*
@@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
{
int oldbit;

- asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
+ asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
"sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");

return oldbit;
@@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
{
int oldbit;

- asm("bts %2,%1\n\t"
+ asm("btsl %2,%1\n\t"
"sbb %0,%0"
: "=r" (oldbit), ADDR
: "Ir" (nr));
@@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
{
int oldbit;

- asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
+ asm volatile(LOCK_PREFIX "btrl %2,%1\n\t"
"sbb %0,%0"
: "=r" (oldbit), ADDR : "Ir" (nr) : "memory");

@@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
{
int oldbit;

- asm volatile("btr %2,%1\n\t"
+ asm volatile("btrl %2,%1\n\t"
"sbb %0,%0"
: "=r" (oldbit), ADDR
: "Ir" (nr));
diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0da5200..fda54c9 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -490,7 +490,7 @@ do { \
#define x86_test_and_clear_bit_percpu(bit, var) \
({ \
int old__; \
- asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \
+ asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \
: "=r" (old__), "+m" (var) \
: "dIr" (bit)); \
old__; \
--
1.8.3.2.736.g869de25


2013-07-14 17:19:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On Sun, Jul 14, 2013 at 5:56 AM, Ramkumar Ramachandra
<[email protected]> wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
> The inline assembly for the bit operations has been changed to remove
> explicit sizing hints on the instructions, so the assembler will pick
> the appropriate instruction forms depending on the architecture and
> the context.
>
> Unfortunately, GNU as does no such thing

Yes it does.

> btrl $1, 0
> btr $1, 0
> btsl $1, 0
> bts $1, 0

What the heck is that supposed to show? It shows nothing at all. With
an argument of '1', *of*course* gas will use "btsl", since that's the
short form. Using the rex-predix and a btsq would be *stupid*.

So gas will pick the appropriate form, exactly as claimed.

Try some actual relevant test instead:

bt %eax,mem
bt %rax,mem

and notice how they are actually fundamentally different. Test-case:

int main(int argc, char **argv)
{
asm("bt %1,%0":"=m" (**argv): "a" (argc));
asm("bt %1,%0":"=m" (**argv): "a" ((unsigned long)(argc)));
}

and I get

0f a3 02 bt %eax,(%rdx)
48 0f a3 02 bt %rax,(%rdx)

exactly as expected and wanted.

Now, there are possible cases where you want to make the size explicit
because you are mixing memory operand sizes and there can be nasty
performance implications of doing a 32-bit write and then doing a
64-bit read of the result. I'm not actually aware of us having ever
worried/cared about it, but it's a possible source of trouble: mixing
bitop instructions with non-bitop instructions can have some subtle
interactions, and you need to be careful, since the size of the
operand affects both the offset *and* the memory access size. The
access size generally is meaningless from a semantic standpoint
(little-endian being the only sane model), but the access size *can*
have performance implications for the write queue forwarding.

Linus

2013-07-14 18:26:51

by Ramkumar Ramachandra

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

Linus Torvalds wrote:
>> btrl $1, 0
>> btr $1, 0
>> btsl $1, 0
>> bts $1, 0
>
> What the heck is that supposed to show?

I was trying to show a reduced case where gas doesn't complain, but
llvm-mc does. Try compiling this with llvm-mc, and you'll get:

.text
btrl $1, 0
in.s:2:1: error: ambiguous instructions require an explicit suffix
(could be 'btrw', 'btrl', or 'btrq')
btr $1, 0
^
btsl $1, 0
in.s:4:1: error: ambiguous instructions require an explicit suffix
(could be 'btsw', 'btsl', or 'btsq')
bts $1, 0
^

Obviously, I misunderstood something major and screwed up the commit message.

> int main(int argc, char **argv)
> {
> asm("bt %1,%0":"=m" (**argv): "a" (argc));
> asm("bt %1,%0":"=m" (**argv): "a" ((unsigned long)(argc)));
> }

Right, so in:

int main(int argc, char **argv)
{
asm("bts %1,%0":"=m" (**argv): "r" (argc));
asm("btsl %1,%0":"=m" (**argv): "r" (argc));
asm("btr %1,%0":"=m" (**argv): "r" ((unsigned long)(argc)));
asm("btrq %1,%0":"=m" (**argv): "r" ((unsigned long)(argc)));
}

bts disambiguates to btsl, and btr disambiguates to btrq, as
advertised. Is it dependent on whether I have a 32-bit machine or
64-bit machine, or just on the operand lengths?

Either way, this is not a very enlightening example, because clang
also compiles this fine, and doesn't complain about any ambiguity. To
see the ambiguity I'm talking about, try to compile linux.git with
clang; I'll paste one error:

arch/x86/include/asm/bitops.h:129:15: error: ambiguous instructions
require an explicit suffix (could be 'btrw', 'btrl', or 'btrq')
asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
^
<inline asm>:1:2: note: instantiated into assembly here
btr $0,(%rsi)
^

Since nr is an int, and ADDR is *(volatile long *), this should
disambiguate to btrl, right? Any clue why clang is complaining?

2013-07-14 18:34:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On Sun, Jul 14, 2013 at 11:26 AM, Ramkumar Ramachandra
<[email protected]> wrote:
>
> I was trying to show a reduced case where gas doesn't complain, but
> llvm-mc does. Try compiling this with llvm-mc, and you'll get:

Ok. So your commit message and explanation was pure and utter tripe,
and the real reason you want this is that llvm-mc is broken.

Please fix llvm-mc instead, ok? If the intent of llvm is to be
compatible with the gnu compiler tools, then it should do that. Plus
the gas behavior is clearly superior, so why not just improve the llvm
toolchain to match those improved semantics?

Linus

2013-07-14 18:35:23

by Tim Northover

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

Hi,

The issue perhaps wasn't explained ideally (and possibly shouldn't
have been CCed directly to you either, so apologies, but now that
there *is* a discussion...)

> Try some actual relevant test instead:
>
> bt %eax,mem
> bt %rax,mem
>
> and notice how they are actually fundamentally different. Test-case:

I'm coming at this from the compiler side, where the register form is
unambiguous and not questioned. The discussion we're having involves
only the immediate form of the instruction. GNU as interprets:

bt $63, mem

as
btl $63, mem

which may or may not be what the user intended, but is not the same as
"btq $63, mem".

I'm not an official LLVM spokesperson or anything, but our consensus
seems to be that "bt $imm, whatever" is ambiguous (the %eax and %rax
versions you quoted disambiguate the width) and should be disallowed
by the assembler.

The patch we're replying to implements that as a NOP fix to the kernel
(GNU as always treats "bt" with an immediate as "btl"). I don't
believe there's any situation in which it will produce different code,
but it will allow Clang to compile (this part of) the kernel.

There is, however, a potential optimisation here for someone who knows
their inline asm. Currently "set_bit(63, addr)" will use the "r"
version of the constraint even on amd64 targets, materialising 63 with
a "movl". With sufficiently clever faff, it could use "btsq" instead.

Cheers.

Tim.

2013-07-14 18:49:47

by Ramkumar Ramachandra

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

Linus Torvalds wrote:
> Ok. So your commit message and explanation was pure and utter tripe,
> and the real reason you want this is that llvm-mc is broken.
>
> Please fix llvm-mc instead, ok? If the intent of llvm is to be
> compatible with the gnu compiler tools, then it should do that. Plus
> the gas behavior is clearly superior, so why not just improve the llvm
> toolchain to match those improved semantics?

Yep. I started the discussion on LLVMDev, and posted patches [1].
>From the discussions on the list, many of the devs are claiming that
LLVM is "correct" and that linux.git needs to be patched. I'm not
taking sides; I just want a solution to the problem.

[1]: The archive is broken, but here are some pieces:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130708/180968.html

2013-07-14 19:09:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On Sun, Jul 14, 2013 at 11:35 AM, Tim Northover <[email protected]> wrote:
>
> I'm coming at this from the compiler side, where the register form is
> unambiguous and not questioned. The discussion we're having involves
> only the immediate form of the instruction. GNU as interprets:
>
> bt $63, mem
>
> as
> btl $63, mem
>
> which may or may not be what the user intended, but is not the same as
> "btq $63, mem".

Umm. The user doesn't care. The user wants the best code without
having to worry about it.

Think of it this way: the whole and ONLY point of an assembler is to
make machine code reasonably easy to write, by not having to worry
about the exact encoding details. We don't want the users specifying
the hex representation of the instructions, do we? Or even details
like "what is the most efficient form of this instruction". For
example, think about branch offsets and immediates. Most architectures
have some limits about how long branch offsets or immediates are, and
a short branch offset may use TOTALLY DIFFERENT instruction encoding
than a long branch offset.

Do you really expect that the user says "jnel" for the long form of
the "jne" instruction? And "jnes" if you want the
smaller/faster/simpler 8-bit version?

No sane person actually wants that, and no modern assembler does that
(although I can remember ones that did - ugh). You write "jne target"
and depend on the assembler doing the right thing. Or you write "add
$5,%eax", and depend on the fact that the assembler will use the much
shorter version of the "add" instruction that just takes a 8-bit
signed value instead of the full 32-bit immediate. Or any number of
details like this ("there are special versions that only work on %eax"
etc rules)

And that is why I think you should just consider "bt $x,y" to be
trivially the same thing and not at all ambiguous. Because there is
ABSOLUTELY ZERO ambiguity when people write

bt $63, mem

Zero. Nada. None. The semantics are *exactly* the same for btl and btq
in this case, so why would you want the user to specify one or the
other? The user may be knowledgeable about the architecture, and know
that "btl" is one byte shorter than "btq", and use "btl" for that
reason. You seem to argue that that is the "right thing"(tm) to do,
since that's what the instruction encoding will be. But if that's the
case, then you are arguing that "jne target" is "ambiguous" because
there are two different ways to encode that too? Do you seriously
argue that?

So I'm arguing that that is wrong for an assembler to not just do the
right thing, because the user isn't *supposed* to have to know about
things like "one byte shorter encoding format". And there really is no
semantic difference between the two forms. So making the user specify
the size is just going to cause problems (in particular, it might well
make users go "this is an array of 64-bit entities, so I should use
btq", even though that is actually incorrect).

Now, I obviously think that the user should have the choice to
*override* the default thing, so sometimes you might have

/* We use a 64-bit btsq to encourage the CPU to do it as a 64-bit
read-modify-write, since we will do a 64-bit read of the result later,
and otherwise we'll get a partial write buffer stall */

btsq $63, mem

and then the assembler had obviously better use the size information
the user gave it. But the thing is, this is basically never a concern
in practice, and when it is, the assembler really cannot know (it
could go either way: maybe the bts is following a 32-bit write, and
you want the 32-bit version - and I suspect that the likelihood of
most users getting this right by hand is quite low too).

(Side note: I'm not even going to guarantee that the actual CPU uses
the operand size for the memory access size. The manuals imply they
do, but since there are no real semantic reasons to enforce that, I
could imagine that some microarchitecture doesn't actually care).

Linus

2013-07-14 19:23:53

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

(resent without HTML)

On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
> changed a bunch of btrl/btsl instructions to btr/bts, with the following
> justification:
>
> The inline assembly for the bit operations has been changed to remove
> explicit sizing hints on the instructions, so the assembler will pick
> the appropriate instruction forms depending on the architecture and
> the context.
>
> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
> [1] contains no references to any such inference. As evidenced by the
> following experiment, gas always disambiguates btr/bts to btrl/btsl.
> Feed the following input to gas:
>
> btrl $1, 0
> btr $1, 0
> btsl $1, 0
> bts $1, 0

When I originally did those patches, I was careful make sure that we
didn't give implied sizes to operations with only immediate and/or
memory operands because - in general - gas can't infer the operation
size from such operands. However, in the case of the bit test/set
operations, the memory access size is not really derived from the
operation size (the SDM is a bit vague), and even if it were it would be
an operation rather than semantic difference. So there's no real
problem with gas choosing 'l' as a default size in the absence of any
explicit override or constraint.

> Check that btr matches btrl, and bts matches btsl in both cases:
>
> $ as --32 -a in.s
> $ as --64 -a in.s
>
> To avoid giving readers the illusion of such an inference, and for
> clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to
> disambiguate btr/bts automatically.

That sounds reasonable for all other operations because it makes a real
semantic difference, but overly strict for bit operations.

J


> [1]: http://docs.oracle.com/cd/E19253-01/817-5477/817-5477.pdf
>
> Cc: Jeremy Fitzhardinge <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Eli Friedman <[email protected]>
> Cc: Jim Grosbach <[email protected]>
> Cc: Stephen Checkoway <[email protected]>
> Cc: LLVMdev <[email protected]>
> Signed-off-by: Ramkumar Ramachandra <[email protected]>
> ---
> We discussed this pretty extensively on LLVMDev, but I'm still not
> sure that I haven't missed something.
>
> arch/x86/include/asm/bitops.h | 16 ++++++++--------
> arch/x86/include/asm/percpu.h | 2 +-
> 2 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
> index 6dfd019..6ed3d1e 100644
> --- a/arch/x86/include/asm/bitops.h
> +++ b/arch/x86/include/asm/bitops.h
> @@ -67,7 +67,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
> : "iq" ((u8)CONST_MASK(nr))
> : "memory");
> } else {
> - asm volatile(LOCK_PREFIX "bts %1,%0"
> + asm volatile(LOCK_PREFIX "btsl %1,%0"
> : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
> }
> }
> @@ -83,7 +83,7 @@ set_bit(unsigned int nr, volatile unsigned long *addr)
> */
> static inline void __set_bit(int nr, volatile unsigned long *addr)
> {
> - asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
> + asm volatile("btsl %1,%0" : ADDR : "Ir" (nr) : "memory");
> }
>
> /**
> @@ -104,7 +104,7 @@ clear_bit(int nr, volatile unsigned long *addr)
> : CONST_MASK_ADDR(nr, addr)
> : "iq" ((u8)~CONST_MASK(nr)));
> } else {
> - asm volatile(LOCK_PREFIX "btr %1,%0"
> + asm volatile(LOCK_PREFIX "btrl %1,%0"
> : BITOP_ADDR(addr)
> : "Ir" (nr));
> }
> @@ -126,7 +126,7 @@ static inline void clear_bit_unlock(unsigned nr, volatile unsigned long *addr)
>
> static inline void __clear_bit(int nr, volatile unsigned long *addr)
> {
> - asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
> + asm volatile("btrl %1,%0" : ADDR : "Ir" (nr));
> }
>
> /*
> @@ -198,7 +198,7 @@ static inline int test_and_set_bit(int nr, volatile unsigned long *addr)
> {
> int oldbit;
>
> - asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
> + asm volatile(LOCK_PREFIX "btsl %2,%1\n\t"
> "sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>
> return oldbit;
> @@ -230,7 +230,7 @@ static inline int __test_and_set_bit(int nr, volatile unsigned long *addr)
> {
> int oldbit;
>
> - asm("bts %2,%1\n\t"
> + asm("btsl %2,%1\n\t"
> "sbb %0,%0"
> : "=r" (oldbit), ADDR
> : "Ir" (nr));
> @@ -249,7 +249,7 @@ static inline int test_and_clear_bit(int nr, volatile unsigned long *addr)
> {
> int oldbit;
>
> - asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
> + asm volatile(LOCK_PREFIX "btrl %2,%1\n\t"
> "sbb %0,%0"
> : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
>
> @@ -276,7 +276,7 @@ static inline int __test_and_clear_bit(int nr, volatile unsigned long *addr)
> {
> int oldbit;
>
> - asm volatile("btr %2,%1\n\t"
> + asm volatile("btrl %2,%1\n\t"
> "sbb %0,%0"
> : "=r" (oldbit), ADDR
> : "Ir" (nr));
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 0da5200..fda54c9 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -490,7 +490,7 @@ do { \
> #define x86_test_and_clear_bit_percpu(bit, var) \
> ({ \
> int old__; \
> - asm volatile("btr %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \
> + asm volatile("btrl %2,"__percpu_arg(1)"\n\tsbbl %0,%0" \
> : "=r" (old__), "+m" (var) \
> : "dIr" (bit)); \
> old__; \

2013-07-14 19:23:58

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

(Resent without HTML)

On 07/14/2013 10:19 AM, Linus Torvalds wrote:
> Now, there are possible cases where you want to make the size explicit
> because you are mixing memory operand sizes and there can be nasty
> performance implications of doing a 32-bit write and then doing a
> 64-bit read of the result. I'm not actually aware of us having ever
> worried/cared about it, but it's a possible source of trouble: mixing
> bitop instructions with non-bitop instructions can have some subtle
> interactions, and you need to be careful, since the size of the
> operand affects both the offset *and* the memory access size.
The SDM entry for BT mentions that the instruction may touch 2 or 4
bytes depending on the operand size, but doesn't specifically mention
that a 64 bit operation size touches 8 bytes - and it doesn't mention
anything at all about operand size and access size in BTR/BTS/BTC
(unless it's implied as part of the discussion about encoding the MSBs
of a constant bit offset in the offset of the addressing mode). Is that
an oversight?

> The
> access size generally is meaningless from a semantic standpoint
> (little-endian being the only sane model), but the access size *can*
> have performance implications for the write queue forwarding.

It looks like that if the base address isn't aligned then neither is the
generated access, so you could get a protection fault if it overlaps a
page boundary, which is a semantic rather than purely operational
difference.

J

2013-07-14 19:29:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On Sun, Jul 14, 2013 at 12:23 PM, Jeremy Fitzhardinge <[email protected]> wrote:
>
> It looks like that if the base address isn't aligned then neither is the
> generated access, so you could get a protection fault if it overlaps a
> page boundary, which is a semantic rather than purely operational
> difference.

You could also get AC fault for the btq if the thing is only long-aligned.

But yes, I checked the Intel manuals too, and the access size is
actually not well-specified (even the 16-bit case says "may", I
think), so both the page-fault and the alignment fault are purely
theoretical. And i'm too lazy to bother trying the (easily testable)
alignment fault case in practice, since (a) nobody cares and (b)
nobody cares.

In the (unlikely) situation that somebody actually cares, that
somebody should obviously then have to specify "btl" vs "btq".
Assuming the hardware cares, which is testable but might be
micro-architecture dependent.

Linus

2013-07-14 19:30:04

by Tim Northover

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

> And that is why I think you should just consider "bt $x,y" to be
> trivially the same thing and not at all ambiguous. Because there is
> ABSOLUTELY ZERO ambiguity when people write
>
> bt $63, mem
>
> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
> in this case, so why would you want the user to specify one or the
> other?

I don't think you've actually tested that, have you? (x86-64)

int main() {
long val = 0xffffffff;
char res;

asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
printf("%d\n", res);

asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
printf("%d\n", res);
}

Tim.

2013-07-14 19:41:10

by Jeremy Fitzhardinge

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 12:30 PM, Tim Northover wrote:
>> And that is why I think you should just consider "bt $x,y" to be
>> trivially the same thing and not at all ambiguous. Because there is
>> ABSOLUTELY ZERO ambiguity when people write
>>
>> bt $63, mem
>>
>> Zero. Nada. None. The semantics are *exactly* the same for btl and btq
>> in this case, so why would you want the user to specify one or the
>> other?
> I don't think you've actually tested that, have you? (x86-64)
>
> int main() {
> long val = 0xffffffff;
> char res;
>
> asm("btl $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
> printf("%d\n", res);
>
> asm("btq $63, %1\n\tsetc %0" : "=r"(res) : "m"(val));
> printf("%d\n", res);
> }

Blerk. It doesn't undermine the original point - that gas can
unambiguously choose the right operation size for a constant bit offset
- but yes, the operation size is meaningful in the case of a immediate
bit offset. Its pretty nasty of Intel to hide that detail in Table 3-2,
far from the instructions which use it...

J

>
> Tim.
>

2013-07-14 19:49:46

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On Sun, Jul 14, 2013 at 12:30 PM, Tim Northover <[email protected]> wrote:
>
> I don't think you've actually tested that, have you? (x86-64)

Oh, you're right, for constants > 5 bits you have that other thing
going on. I didn't think about the fact that the constant changed in
the middle of the thread (it started out as 1).

We use the gcc constraint "I" (0-31) in the kernel for this reason.

Linus

2013-07-14 21:14:54

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix


I think best would be to just find some way to implement LOCK prefix
patching using atomic compiler intrinsics and then switch to those

Then all this inline assembler horror could be ifdef'ed away
for old compilers only, and likely the generated code would
be better as the compiler could optimize more.

Or just give up on LOCK patching, as single CPU systems
and VMs are less and less interesting?

-Andi
--
[email protected] -- Speaking for myself only

2013-07-15 18:44:52

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 12:49 PM, Linus Torvalds wrote:
> On Sun, Jul 14, 2013 at 12:30 PM, Tim Northover <[email protected]> wrote:
>>
>> I don't think you've actually tested that, have you? (x86-64)
>
> Oh, you're right, for constants > 5 bits you have that other thing
> going on. I didn't think about the fact that the constant changed in
> the middle of the thread (it started out as 1).
>
> We use the gcc constraint "I" (0-31) in the kernel for this reason.
>
> Linus

This is also why the Intel manuals point out that "some assemblers" can
take things like:

bt[l] $63,(%rsi)

... and turn it into:

btl $31,4(%rsi)

This is definitely the friendly thing to do toward the human programmer.
Unfortunately gas doesn't, nor does e.g. NASM.

-hpa

2013-07-15 18:45:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 12:23 PM, Jeremy Fitzhardinge wrote:
> The SDM entry for BT mentions that the instruction may touch 2 or 4
> bytes depending on the operand size, but doesn't specifically mention
> that a 64 bit operation size touches 8 bytes - and it doesn't mention
> anything at all about operand size and access size in BTR/BTS/BTC
> (unless it's implied as part of the discussion about encoding the MSBs
> of a constant bit offset in the offset of the addressing mode). Is that
> an oversight?

Most likely. I'll check with the people responsible for the SDM here at
Intel.

-hpa

2013-07-15 18:51:29

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 02:14 PM, Andi Kleen wrote:
>
> I think best would be to just find some way to implement LOCK prefix
> patching using atomic compiler intrinsics and then switch to those
>

That will take a long time to make happen now, and then when we change
things we have to wait for gcc to catch up. At some point this always
degenerates into the "do we need a kernel-specific compiler" morass.

-hpa

2013-07-15 18:53:55

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/14/2013 12:23 PM, Jeremy Fitzhardinge wrote:
> (resent without HTML)
>
> On 07/14/2013 05:56 AM, Ramkumar Ramachandra wrote:
>> 1c54d77 (x86: partial unification of asm-x86/bitops.h, 2008-01-30)
>> changed a bunch of btrl/btsl instructions to btr/bts, with the following
>> justification:
>>
>> The inline assembly for the bit operations has been changed to remove
>> explicit sizing hints on the instructions, so the assembler will pick
>> the appropriate instruction forms depending on the architecture and
>> the context.
>>
>> Unfortunately, GNU as does no such thing, and the AT&T syntax manual
>> [1] contains no references to any such inference. As evidenced by the
>> following experiment, gas always disambiguates btr/bts to btrl/btsl.
>> Feed the following input to gas:
>>
>> btrl $1, 0
>> btr $1, 0
>> btsl $1, 0
>> bts $1, 0
>
> When I originally did those patches, I was careful make sure that we
> didn't give implied sizes to operations with only immediate and/or
> memory operands because - in general - gas can't infer the operation
> size from such operands. However, in the case of the bit test/set
> operations, the memory access size is not really derived from the
> operation size (the SDM is a bit vague), and even if it were it would be
> an operation rather than semantic difference. So there's no real
> problem with gas choosing 'l' as a default size in the absence of any
> explicit override or constraint.
>
>> Check that btr matches btrl, and bts matches btsl in both cases:
>>
>> $ as --32 -a in.s
>> $ as --64 -a in.s
>>
>> To avoid giving readers the illusion of such an inference, and for
>> clarity, change btr/bts back to btrl/btsl. Also, llvm-mc refuses to
>> disambiguate btr/bts automatically.
>
> That sounds reasonable for all other operations because it makes a real
> semantic difference, but overly strict for bit operations.
>

To be fair, we *ought to* be able to do something like:

asm volatile(LOCK_PREFIX "bts%z0 %1,%0"
: BITOP_ADDR(addr) : "Ir" (nr) : "memory");

... but some older version of gcc are broken and emit "ll" rather than
"q". Furthermore, since that would actually result in *worse* code
emitted overall (unnecessary REX prefixes), I'm not exactly happy on the
idea.

-hpa

2013-07-15 18:56:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On Mon, Jul 15, 2013 at 11:40 AM, H. Peter Anvin <[email protected]> wrote:
> On 07/14/2013 12:49 PM, Linus Torvalds wrote:
>
> This is also why the Intel manuals point out that "some assemblers" can
> take things like:
>
> bt[l] $63,(%rsi)
>
> ... and turn it into:
>
> btl $31,4(%rsi)
>
> This is definitely the friendly thing to do toward the human programmer.
> Unfortunately gas doesn't, nor does e.g. NASM.

Yeah, that's definitely a "quality of implementation" issue. Clearly
"bt $63,mem" is talking about bit 63, and a quality assembler would
either warn about it or just do the RightThing(tm) like the intel
manual says.

I'd actually like to say "think you" to the gas people, because gas
today may not do the above, but gas today is still *lightyears* ahead
of where it used to be two decades ago. Back in those dark ages, GNU
as was even documented to be *only* about turning compiler output into
object code, and gas was the ghastliest assembler on the planet - it
silently did horrible horrible things, and didn't do *anything*
user-friendly or clever. It would entirely ignore things like implied
sizes from register names etc, and generate code that was obviously
not at all what the user expected, but because cc1 always used
explicit sizes etc and only generated very specific syntax, it "wasn't
an issue".

gas has improved immensely in this regard, and the fact that it
silently takes a $63 and effectively turns it into $31 is something I
think is not nice and not a good QoI, but considering where gas came
from, I'm not going to complain about it too much. It's
"understandable", even if it isn't great.

But quite frankly, partly because of just how bad gas used to be wrt
issues like this, I think that any other assembler should aim to be at
_least_ as good as gas, if not better. Because I definitely don't want
to go back to the bad old days. I've been there, done that. Assemblers
that are worse than gas are not worth working with. gas should be
considered the minimal implementation quality, and llvm-as should
strive to do better rather than worse..

(NASM used to be *much* more pleasant to work with than gas. Maybe you
should strive to make nasm DTRT wrt bt and constants, and maintain
that lead?)

Linus

2013-07-15 18:58:59

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] x86/asm: avoid mnemonics without type suffix

On Mon, Jul 15, 2013 at 11:47 AM, H. Peter Anvin <[email protected]> wrote:
>
> To be fair, we *ought to* be able to do something like:
>
> asm volatile(LOCK_PREFIX "bts%z0 %1,%0"
> : BITOP_ADDR(addr) : "Ir" (nr) : "memory");
>
> ... but some older version of gcc are broken and emit "ll" rather than
> "q". Furthermore, since that would actually result in *worse* code
> emitted overall (unnecessary REX prefixes), I'm not exactly happy on the
> idea.

I really think the "worse code" argument is the one that matters.

Specifying the size of the operation is *overspecifying* things,
exactly because the 32-bit encoding is actually the *better* one when
possible.

So it's much better to underspecify and let the assembler pick the
best encoding, than it is to use an explicit size and get worse code.

Which is why I brought up the issue of small constants and short
jumps. I really believe this is exactly the same issue.

Linus

2013-07-15 19:00:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On Mon, Jul 15, 2013 at 11:56 AM, Linus Torvalds
<[email protected]> wrote:
>
> I'd actually like to say "think you" to the gas people

My fingers are all over the place today. "thank you".

It's not even like "i" and "a" are next to each other - they're at
different ends of the keyboard. Clearly I'm more used to writing
"think" than "thank".

Linus

2013-07-15 19:01:12

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [LLVMdev] [PATCH] x86/asm: avoid mnemonics without type suffix

On 07/15/2013 11:56 AM, Linus Torvalds wrote:
>
> (NASM used to be *much* more pleasant to work with than gas. Maybe you
> should strive to make nasm DTRT wrt bt and constants, and maintain
> that lead?)
>

Yes, that is in fact why I took over NASM development when Simon Tatham
gave up on it. I'll look at adding this in the next version, it should
be easy enough.

-hpa