2015-02-12 19:07:06

by Denys Vlasenko

Subject: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD

In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
insns differently. For near branches, it affects decode too since
immediate offset's width is different.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Masami Hiramatsu <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Oleg Nesterov <[email protected]>
CC: [email protected]
---
arch/x86/lib/x86-opcode-map.txt | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1a2be7c..816488c 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -273,6 +273,9 @@ dd: ESC
de: ESC
df: ESC
# 0xe0 - 0xef
+# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
+# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
+# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
e0: LOOPNE/LOOPNZ Jb (f64)
e1: LOOPE/LOOPZ Jb (f64)
e2: LOOP Jb (f64)
@@ -281,6 +284,10 @@ e4: IN AL,Ib
e5: IN eAX,Ib
e6: OUT Ib,AL
e7: OUT Ib,eAX
+# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
+# in "near" jumps and calls is 16-bit. For CALL,
+# push of return address is 16-bit wide, RSP is decremented by 2
+# but is not truncated to 16 bits, unlike RIP.
e8: CALL Jz (f64)
e9: JMP-near Jz (f64)
ea: JMP-far Ap (i64)
@@ -456,6 +463,7 @@ AVXcode: 1
7e: movd/q Ey,Pd | vmovd/q Ey,Vy (66),(v1) | vmovq Vq,Wq (F3),(v1)
7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqu Wx,Vx (F3)
# 0x0f 0x80-0x8f
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
80: JO Jz (f64)
81: JNO Jz (f64)
82: JB/JC/JNAE Jz (f64)
@@ -842,6 +850,7 @@ EndTable
GrpTable: Grp5
0: INC Ev
1: DEC Ev
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
2: CALLN Ev (f64)
3: CALLF Ep
4: JMPN Ev (f64)
--
1.8.1.4


2015-02-13 12:02:06

by Borislav Petkov

Subject: Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD

On Thu, Feb 12, 2015 at 08:06:57PM +0100, Denys Vlasenko wrote:
> In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
> insns differently. For near branches, it affects decode too since
> immediate offset's width is different.
>
> # 0xe0 - 0xef
> +# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
> +# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
> +# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.

Well, according to the SDM, Intel truncates too, see the LOOP/LOOPcc
Operation section:

...
IF BranchCond = 1
THEN
IF OperandSize = 32
THEN EIP ← EIP + SignExtend(DEST);
ELSE IF OperandSize = 64
THEN RIP ← RIP + SignExtend(DEST);
FI;
ELSE IF OperandSize = 16
THEN EIP ← EIP AND 0000FFFFH; <---

and the text talks about 0x67, but that's the address-size prefix and it
is used to size the rCX register.

So something must be setting OperandSize, and the text doesn't mention
0x66 being ignored anywhere.

Or have you been doing some empirical experiments? :-)

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

Subject: Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD

(2015/02/13 4:06), Denys Vlasenko wrote:
> In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
> insns differently. For near branches, it affects decode too since
> immediate offset's width is different.

You'd better add a link to your investigation report :)

http://marc.info/?l=linux-kernel&m=139714939728946&w=2

so that anyone can see what actually happens.

Thank you,



--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2015-02-13 13:25:43

by Denys Vlasenko

Subject: Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD

On Fri, Feb 13, 2015 at 1:01 PM, Borislav Petkov <[email protected]> wrote:
> On Thu, Feb 12, 2015 at 08:06:57PM +0100, Denys Vlasenko wrote:
>> In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before branch
>> insns differently. For near branches, it affects decode too since
>> immediate offset's width is different.
>>
>> # 0xe0 - 0xef
>> +# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
>> +# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
>> +# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
>
> Well, according to the SDM, Intel truncates too, see the LOOP/LOOPcc
> Operation section:
>
> ...
> IF BranchCond = 1
> THEN
> IF OperandSize = 32
> THEN EIP ← EIP + SignExtend(DEST);
> ELSE IF OperandSize = 64
> THEN RIP ← RIP + SignExtend(DEST);
> FI;
> ELSE IF OperandSize = 16
> THEN EIP ← EIP AND 0000FFFFH; <---
>
> and the text talks about 0x67, but that's the address-size prefix and it
> is used to size the rCX register.
>
> So something must be setting OperandSize, and the text doesn't mention
> 0x66 being ignored anywhere.
>
> Or have you been doing some empirical experiments? :-)

Yes, I did.

32-bit case: Intel CPU truncates EIP to 16 bits:

$ cat t.S
_start: .globl _start
1: .byte 0x66
loop 1b

$ gcc -nostartfiles -nostdlib -m32 t.S

$ objdump -dr a.out
a.out: file format elf32-i386
Disassembly of section .text:
08048098 <_start>:
8048098: 66 data16
8048099: e2 fd loop 8048098 <_start>

$ gdb ./a.out
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x00008098 in ?? ()


Now let's try the 64-bit version, compiling without -m32:

$ gcc -nostartfiles -nostdlib t.S
$ ./a.out
(runs without SEGV)

2015-02-14 00:28:59

by Borislav Petkov

Subject: Re: [PATCH] x86: x86-opcode-map.txt: explain CALLW discrepancy between Intel and AMD

On Fri, Feb 13, 2015 at 02:25:20PM +0100, Denys Vlasenko wrote:
> Yes, I did.
>
> 32-bit case: Intel CPU truncates EIP to 16 bits:
>
> $ cat t.S
> _start: .globl _start
> 1: .byte 0x66
> loop 1b
>
> $ gcc -nostartfiles -nostdlib -m32 t.S
>
> $ objdump -dr a.out
> a.out: file format elf32-i386
> Disassembly of section .text:
> 08048098 <_start>:
> 8048098: 66 data16
> 8048099: e2 fd loop 8048098 <_start>
>
> $ gdb ./a.out
> (gdb) run
> Program received signal SIGSEGV, Segmentation fault.
> 0x00008098 in ?? ()
>
>
> Now let's try the 64-bit version, compiling without -m32:
>
> $ gcc -nostartfiles -nostdlib t.S
> $ ./a.out
> (runs without SEGV)
>

AMD CPU always truncates:

32-bit: a.out[13626]: segfault at 8098 ip 0000000000008098 sp 00000000ffa0ea20 error 14 in a.out[8048000+1000]

64-bit: a.out[13706]: segfault at d6 ip 00000000000000d6 sp 00007fffec14e870 error 14 in a.out[400000+1000]


Intel CPU:

32-bit: a.out[3478]: segfault at 8098 ip 0000000000008098 sp 00000000ff959da0 error 14 in a.out[8048000+1000]

64-bit:

Make the loop terminate:

_start: .globl _start
mov $1, %rcx
1: .byte 0x66
loop 1b


a.out[3523]: segfault at 0 ip 00000000004000de sp 00007ffff31674e0 error 6 in a.out[400000+1000]

It segfaults only because there is no libc glue around it to exit the
process; rIP is intact.

So it looks like the SDM is wrong.

--
Regards/Gruss,
Boris.

--

Subject: [tip:x86/asm] x86/asm/decoder: Explain CALLW discrepancy between Intel and AMD

Commit-ID: cbb53b9623a70f012e1fdfb6fc0af6878df4762b
Gitweb: http://git.kernel.org/tip/cbb53b9623a70f012e1fdfb6fc0af6878df4762b
Author: Denys Vlasenko <[email protected]>
AuthorDate: Thu, 12 Feb 2015 20:06:57 +0100
Committer: Ingo Molnar <[email protected]>
CommitDate: Wed, 18 Feb 2015 21:01:59 +0100

x86/asm/decoder: Explain CALLW discrepancy between Intel and AMD

In 64-bit mode, AMD and Intel CPUs treat 0x66 prefix before
branch insns differently. For near branches, it affects decode
too since immediate offset's width is different.

See these empirical tests:

http://marc.info/?l=linux-kernel&m=139714939728946&w=2

Signed-off-by: Denys Vlasenko <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1a2be7c..816488c 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -273,6 +273,9 @@ dd: ESC
de: ESC
df: ESC
# 0xe0 - 0xef
+# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
+# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
+# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
e0: LOOPNE/LOOPNZ Jb (f64)
e1: LOOPE/LOOPZ Jb (f64)
e2: LOOP Jb (f64)
@@ -281,6 +284,10 @@ e4: IN AL,Ib
e5: IN eAX,Ib
e6: OUT Ib,AL
e7: OUT Ib,eAX
+# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
+# in "near" jumps and calls is 16-bit. For CALL,
+# push of return address is 16-bit wide, RSP is decremented by 2
+# but is not truncated to 16 bits, unlike RIP.
e8: CALL Jz (f64)
e9: JMP-near Jz (f64)
ea: JMP-far Ap (i64)
@@ -456,6 +463,7 @@ AVXcode: 1
7e: movd/q Ey,Pd | vmovd/q Ey,Vy (66),(v1) | vmovq Vq,Wq (F3),(v1)
7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqu Wx,Vx (F3)
# 0x0f 0x80-0x8f
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
80: JO Jz (f64)
81: JNO Jz (f64)
82: JB/JC/JNAE Jz (f64)
@@ -842,6 +850,7 @@ EndTable
GrpTable: Grp5
0: INC Ev
1: DEC Ev
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
2: CALLN Ev (f64)
3: CALLF Ep
4: JMPN Ev (f64)