2024-05-02 10:59:25

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 00/10] perf intel pt: Update instruction decoder for APX and other new instructions

Hi

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

It should be noted that there are 2 copies of the instruction decoder.
One for the kernel and one for tools, which is a policy to prevent
changes from breaking builds.

Changes are based upon documents:

Intel® Advanced Performance Extensions (Intel® APX)
Architecture Specification
April, 2024 Revision 4.0 355828-004

Intel® Architecture Instruction Set Extensions
and Future Features Programming Reference
March 2024 319433-052

The patches depend on a patch by Chang S. Bae that they have already
submitted in their Key Locker patch set. Nevertheless, the patch is
included in this patch set as well.

The final patch is an update to the perf tools' "x86 instruction decoder
test - new instructions" test, which provides testing.


Adrian Hunter (9):
x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map
x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS
x86/insn: Add misc new Intel instructions
x86/insn: Add support for REX2 prefix to the instruction decoder logic
x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map
x86/insn: Add support for APX EVEX to the instruction decoder logic
x86/insn: Add support for APX EVEX instructions to the opcode map
perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder
perf tests: Add APX and other new instructions to x86 instruction decoder test

Chang S. Bae (1):
x86/insn: Add Key Locker instructions to the opcode map

arch/x86/include/asm/inat.h | 17 +-
arch/x86/include/asm/insn.h | 32 +-
arch/x86/lib/insn.c | 29 +
arch/x86/lib/x86-opcode-map.txt | 315 ++++--
arch/x86/tools/gen-insn-attr-x86.awk | 15 +-
tools/arch/x86/include/asm/inat.h | 17 +-
tools/arch/x86/include/asm/insn.h | 32 +-
tools/arch/x86/lib/insn.c | 29 +
tools/arch/x86/lib/x86-opcode-map.txt | 315 ++++--
tools/arch/x86/tools/gen-insn-attr-x86.awk | 15 +-
tools/perf/arch/x86/tests/insn-x86-dat-32.c | 116 +++
tools/perf/arch/x86/tests/insn-x86-dat-64.c | 1026 ++++++++++++++++++++
tools/perf/arch/x86/tests/insn-x86-dat-src.c | 597 ++++++++++++
.../util/intel-pt-decoder/intel-pt-insn-decoder.c | 9 +
14 files changed, 2370 insertions(+), 194 deletions(-)


Regards
Adrian


2024-05-02 10:59:55

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
contradicted by the Instruction Set Reference section for PUSH in the
same manual.

Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.

Example:

$ cat pushw.s
.global _start
.text
_start:
pushw $0x1234
mov $0x1,%eax # system call number (sys_exit)
int $0x80
$ as -o pushw.o pushw.s
$ ld -s -o pushw pushw.o
$ objdump -d pushw | tail -4
0000000000401000 <.text>:
401000: 66 68 34 12 pushw $0x1234
401004: b8 01 00 00 00 mov $0x1,%eax
401009: cd 80 int $0x80
$ perf record -e intel_pt//u ./pushw
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]

Before:

$ perf script --insn-trace=disasm
Warning:
1 instruction trace errors
pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction

After:

$ perf script --insn-trace=disasm
pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax

Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index c94988d5130d..4ea2e6adb477 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -148,7 +148,7 @@ AVXcode:
65: SEG=GS (Prefix)
66: Operand-Size (Prefix)
67: Address-Size (Prefix)
-68: PUSH Iz (d64)
+68: PUSH Iz
69: IMUL Gv,Ev,Iz
6a: PUSH Ib (d64)
6b: IMUL Gv,Ev,Ib
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index c94988d5130d..4ea2e6adb477 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -148,7 +148,7 @@ AVXcode:
65: SEG=GS (Prefix)
66: Operand-Size (Prefix)
67: Address-Size (Prefix)
-68: PUSH Iz (d64)
+68: PUSH Iz
69: IMUL Gv,Ev,Iz
6a: PUSH Ib (d64)
6b: IMUL Gv,Ev,Ib
--
2.34.1


2024-05-02 11:00:28

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 01/10] x86/insn: Add Key Locker instructions to the opcode map

From: "Chang S. Bae" <[email protected]>

The x86 instruction decoder needs to know these new instructions that
are going to be used in the crypto library as well as the x86 core
code. Add the following:

LOADIWKEY:
Load a CPU-internal wrapping key.

ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.

AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

The detail can be found in Intel Software Developer Manual.

Signed-off-by: Chang S. Bae <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 12af572201a2..c94988d5130d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
--
2.34.1


2024-05-02 11:00:58

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 06/10] x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map

Support for REX2 has been added to the instruction decoder logic and the
awk script that generates the attribute tables from the opcode map.

Add REX2 prefix byte (0xD5) to the opcode map.

Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2.

Add JMPABS to the opcode map and add annotation (REX2) to identify that it
has a mandatory REX2 prefix. A separate opcode attribute table is not
needed at this time because JMPABS has the same attribute encoding as the
MOV instruction that it shares an opcode with i.e. INAT_MOFFSET.

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 148 +++++++++++++-------------
tools/arch/x86/lib/x86-opcode-map.txt | 148 +++++++++++++-------------
2 files changed, 152 insertions(+), 144 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 4e9f53581b58..240ef714b64f 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
# - (F2): the last prefix is 0xF2
# - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
# - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# REX2 Prefix
+# - (!REX2): REX2 is not allowed
+# - (REX2): REX2 variant e.g. JMPABS

Table: one byte opcode
Referrer:
@@ -157,22 +161,22 @@ AVXcode:
6e: OUTS/OUTSB DX,Xb
6f: OUTS/OUTSW/OUTSD DX,Xz
# 0x70 - 0x7f
-70: JO Jb
-71: JNO Jb
-72: JB/JNAE/JC Jb
-73: JNB/JAE/JNC Jb
-74: JZ/JE Jb
-75: JNZ/JNE Jb
-76: JBE/JNA Jb
-77: JNBE/JA Jb
-78: JS Jb
-79: JNS Jb
-7a: JP/JPE Jb
-7b: JNP/JPO Jb
-7c: JL/JNGE Jb
-7d: JNL/JGE Jb
-7e: JLE/JNG Jb
-7f: JNLE/JG Jb
+70: JO Jb (!REX2)
+71: JNO Jb (!REX2)
+72: JB/JNAE/JC Jb (!REX2)
+73: JNB/JAE/JNC Jb (!REX2)
+74: JZ/JE Jb (!REX2)
+75: JNZ/JNE Jb (!REX2)
+76: JBE/JNA Jb (!REX2)
+77: JNBE/JA Jb (!REX2)
+78: JS Jb (!REX2)
+79: JNS Jb (!REX2)
+7a: JP/JPE Jb (!REX2)
+7b: JNP/JPO Jb (!REX2)
+7c: JL/JNGE Jb (!REX2)
+7d: JNL/JGE Jb (!REX2)
+7e: JLE/JNG Jb (!REX2)
+7f: JNLE/JG Jb (!REX2)
# 0x80 - 0x8f
80: Grp1 Eb,Ib (1A)
81: Grp1 Ev,Iz (1A)
@@ -208,24 +212,24 @@ AVXcode:
9e: SAHF
9f: LAHF
# 0xa0 - 0xaf
-a0: MOV AL,Ob
-a1: MOV rAX,Ov
-a2: MOV Ob,AL
-a3: MOV Ov,rAX
-a4: MOVS/B Yb,Xb
-a5: MOVS/W/D/Q Yv,Xv
-a6: CMPS/B Xb,Yb
-a7: CMPS/W/D Xv,Yv
-a8: TEST AL,Ib
-a9: TEST rAX,Iz
-aa: STOS/B Yb,AL
-ab: STOS/W/D/Q Yv,rAX
-ac: LODS/B AL,Xb
-ad: LODS/W/D/Q rAX,Xv
-ae: SCAS/B AL,Yb
+a0: MOV AL,Ob (!REX2)
+a1: MOV rAX,Ov (!REX2) | JMPABS O (REX2),(o64)
+a2: MOV Ob,AL (!REX2)
+a3: MOV Ov,rAX (!REX2)
+a4: MOVS/B Yb,Xb (!REX2)
+a5: MOVS/W/D/Q Yv,Xv (!REX2)
+a6: CMPS/B Xb,Yb (!REX2)
+a7: CMPS/W/D Xv,Yv (!REX2)
+a8: TEST AL,Ib (!REX2)
+a9: TEST rAX,Iz (!REX2)
+aa: STOS/B Yb,AL (!REX2)
+ab: STOS/W/D/Q Yv,rAX (!REX2)
+ac: LODS/B AL,Xb (!REX2)
+ad: LODS/W/D/Q rAX,Xv (!REX2)
+ae: SCAS/B AL,Yb (!REX2)
# Note: The May 2011 Intel manual shows Xv for the second parameter of the
# next instruction but Yv is correct
-af: SCAS/W/D/Q rAX,Yv
+af: SCAS/W/D/Q rAX,Yv (!REX2)
# 0xb0 - 0xbf
b0: MOV AL/R8L,Ib
b1: MOV CL/R9L,Ib
@@ -266,7 +270,7 @@ d1: Grp2 Ev,1 (1A)
d2: Grp2 Eb,CL (1A)
d3: Grp2 Ev,CL (1A)
d4: AAM Ib (i64)
-d5: AAD Ib (i64)
+d5: AAD Ib (i64) | REX2 (Prefix),(o64)
d6:
d7: XLAT/XLATB
d8: ESC
@@ -281,26 +285,26 @@ df: ESC
# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
-e0: LOOPNE/LOOPNZ Jb (f64)
-e1: LOOPE/LOOPZ Jb (f64)
-e2: LOOP Jb (f64)
-e3: JrCXZ Jb (f64)
-e4: IN AL,Ib
-e5: IN eAX,Ib
-e6: OUT Ib,AL
-e7: OUT Ib,eAX
+e0: LOOPNE/LOOPNZ Jb (f64) (!REX2)
+e1: LOOPE/LOOPZ Jb (f64) (!REX2)
+e2: LOOP Jb (f64) (!REX2)
+e3: JrCXZ Jb (f64) (!REX2)
+e4: IN AL,Ib (!REX2)
+e5: IN eAX,Ib (!REX2)
+e6: OUT Ib,AL (!REX2)
+e7: OUT Ib,eAX (!REX2)
# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
# in "near" jumps and calls is 16-bit. For CALL,
# push of return address is 16-bit wide, RSP is decremented by 2
# but is not truncated to 16 bits, unlike RIP.
-e8: CALL Jz (f64)
-e9: JMP-near Jz (f64)
-ea: JMP-far Ap (i64)
-eb: JMP-short Jb (f64)
-ec: IN AL,DX
-ed: IN eAX,DX
-ee: OUT DX,AL
-ef: OUT DX,eAX
+e8: CALL Jz (f64) (!REX2)
+e9: JMP-near Jz (f64) (!REX2)
+ea: JMP-far Ap (i64) (!REX2)
+eb: JMP-short Jb (f64) (!REX2)
+ec: IN AL,DX (!REX2)
+ed: IN eAX,DX (!REX2)
+ee: OUT DX,AL (!REX2)
+ef: OUT DX,eAX (!REX2)
# 0xf0 - 0xff
f0: LOCK (Prefix)
f1:
@@ -386,14 +390,14 @@ AVXcode: 1
2e: vucomiss Vss,Wss (v1) | vucomisd Vsd,Wsd (66),(v1)
2f: vcomiss Vss,Wss (v1) | vcomisd Vsd,Wsd (66),(v1)
# 0x0f 0x30-0x3f
-30: WRMSR
-31: RDTSC
-32: RDMSR
-33: RDPMC
-34: SYSENTER
-35: SYSEXIT
+30: WRMSR (!REX2)
+31: RDTSC (!REX2)
+32: RDMSR (!REX2)
+33: RDPMC (!REX2)
+34: SYSENTER (!REX2)
+35: SYSEXIT (!REX2)
36:
-37: GETSEC
+37: GETSEC (!REX2)
38: escape # 3-byte escape 1
39:
3a: escape # 3-byte escape 2
@@ -473,22 +477,22 @@ AVXcode: 1
7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqa32/64 Wx,Vx (66),(evo) | vmovdqu Wx,Vx (F3) | vmovdqu32/64 Wx,Vx (F3),(evo) | vmovdqu8/16 Wx,Vx (F2),(ev)
# 0x0f 0x80-0x8f
# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
-80: JO Jz (f64)
-81: JNO Jz (f64)
-82: JB/JC/JNAE Jz (f64)
-83: JAE/JNB/JNC Jz (f64)
-84: JE/JZ Jz (f64)
-85: JNE/JNZ Jz (f64)
-86: JBE/JNA Jz (f64)
-87: JA/JNBE Jz (f64)
-88: JS Jz (f64)
-89: JNS Jz (f64)
-8a: JP/JPE Jz (f64)
-8b: JNP/JPO Jz (f64)
-8c: JL/JNGE Jz (f64)
-8d: JNL/JGE Jz (f64)
-8e: JLE/JNG Jz (f64)
-8f: JNLE/JG Jz (f64)
+80: JO Jz (f64) (!REX2)
+81: JNO Jz (f64) (!REX2)
+82: JB/JC/JNAE Jz (f64) (!REX2)
+83: JAE/JNB/JNC Jz (f64) (!REX2)
+84: JE/JZ Jz (f64) (!REX2)
+85: JNE/JNZ Jz (f64) (!REX2)
+86: JBE/JNA Jz (f64) (!REX2)
+87: JA/JNBE Jz (f64) (!REX2)
+88: JS Jz (f64) (!REX2)
+89: JNS Jz (f64) (!REX2)
+8a: JP/JPE Jz (f64) (!REX2)
+8b: JNP/JPO Jz (f64) (!REX2)
+8c: JL/JNGE Jz (f64) (!REX2)
+8d: JNL/JGE Jz (f64) (!REX2)
+8e: JLE/JNG Jz (f64) (!REX2)
+8f: JNLE/JG Jz (f64) (!REX2)
# 0x0f 0x90-0x9f
90: SETO Eb | kmovw/q Vk,Wk | kmovb/d Vk,Wk (66)
91: SETNO Eb | kmovw/q Mv,Vk | kmovb/d Mv,Vk (66)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 4e9f53581b58..240ef714b64f 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
# - (F2): the last prefix is 0xF2
# - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
# - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# REX2 Prefix
+# - (!REX2): REX2 is not allowed
+# - (REX2): REX2 variant e.g. JMPABS

Table: one byte opcode
Referrer:
@@ -157,22 +161,22 @@ AVXcode:
6e: OUTS/OUTSB DX,Xb
6f: OUTS/OUTSW/OUTSD DX,Xz
# 0x70 - 0x7f
-70: JO Jb
-71: JNO Jb
-72: JB/JNAE/JC Jb
-73: JNB/JAE/JNC Jb
-74: JZ/JE Jb
-75: JNZ/JNE Jb
-76: JBE/JNA Jb
-77: JNBE/JA Jb
-78: JS Jb
-79: JNS Jb
-7a: JP/JPE Jb
-7b: JNP/JPO Jb
-7c: JL/JNGE Jb
-7d: JNL/JGE Jb
-7e: JLE/JNG Jb
-7f: JNLE/JG Jb
+70: JO Jb (!REX2)
+71: JNO Jb (!REX2)
+72: JB/JNAE/JC Jb (!REX2)
+73: JNB/JAE/JNC Jb (!REX2)
+74: JZ/JE Jb (!REX2)
+75: JNZ/JNE Jb (!REX2)
+76: JBE/JNA Jb (!REX2)
+77: JNBE/JA Jb (!REX2)
+78: JS Jb (!REX2)
+79: JNS Jb (!REX2)
+7a: JP/JPE Jb (!REX2)
+7b: JNP/JPO Jb (!REX2)
+7c: JL/JNGE Jb (!REX2)
+7d: JNL/JGE Jb (!REX2)
+7e: JLE/JNG Jb (!REX2)
+7f: JNLE/JG Jb (!REX2)
# 0x80 - 0x8f
80: Grp1 Eb,Ib (1A)
81: Grp1 Ev,Iz (1A)
@@ -208,24 +212,24 @@ AVXcode:
9e: SAHF
9f: LAHF
# 0xa0 - 0xaf
-a0: MOV AL,Ob
-a1: MOV rAX,Ov
-a2: MOV Ob,AL
-a3: MOV Ov,rAX
-a4: MOVS/B Yb,Xb
-a5: MOVS/W/D/Q Yv,Xv
-a6: CMPS/B Xb,Yb
-a7: CMPS/W/D Xv,Yv
-a8: TEST AL,Ib
-a9: TEST rAX,Iz
-aa: STOS/B Yb,AL
-ab: STOS/W/D/Q Yv,rAX
-ac: LODS/B AL,Xb
-ad: LODS/W/D/Q rAX,Xv
-ae: SCAS/B AL,Yb
+a0: MOV AL,Ob (!REX2)
+a1: MOV rAX,Ov (!REX2) | JMPABS O (REX2),(o64)
+a2: MOV Ob,AL (!REX2)
+a3: MOV Ov,rAX (!REX2)
+a4: MOVS/B Yb,Xb (!REX2)
+a5: MOVS/W/D/Q Yv,Xv (!REX2)
+a6: CMPS/B Xb,Yb (!REX2)
+a7: CMPS/W/D Xv,Yv (!REX2)
+a8: TEST AL,Ib (!REX2)
+a9: TEST rAX,Iz (!REX2)
+aa: STOS/B Yb,AL (!REX2)
+ab: STOS/W/D/Q Yv,rAX (!REX2)
+ac: LODS/B AL,Xb (!REX2)
+ad: LODS/W/D/Q rAX,Xv (!REX2)
+ae: SCAS/B AL,Yb (!REX2)
# Note: The May 2011 Intel manual shows Xv for the second parameter of the
# next instruction but Yv is correct
-af: SCAS/W/D/Q rAX,Yv
+af: SCAS/W/D/Q rAX,Yv (!REX2)
# 0xb0 - 0xbf
b0: MOV AL/R8L,Ib
b1: MOV CL/R9L,Ib
@@ -266,7 +270,7 @@ d1: Grp2 Ev,1 (1A)
d2: Grp2 Eb,CL (1A)
d3: Grp2 Ev,CL (1A)
d4: AAM Ib (i64)
-d5: AAD Ib (i64)
+d5: AAD Ib (i64) | REX2 (Prefix),(o64)
d6:
d7: XLAT/XLATB
d8: ESC
@@ -281,26 +285,26 @@ df: ESC
# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
-e0: LOOPNE/LOOPNZ Jb (f64)
-e1: LOOPE/LOOPZ Jb (f64)
-e2: LOOP Jb (f64)
-e3: JrCXZ Jb (f64)
-e4: IN AL,Ib
-e5: IN eAX,Ib
-e6: OUT Ib,AL
-e7: OUT Ib,eAX
+e0: LOOPNE/LOOPNZ Jb (f64) (!REX2)
+e1: LOOPE/LOOPZ Jb (f64) (!REX2)
+e2: LOOP Jb (f64) (!REX2)
+e3: JrCXZ Jb (f64) (!REX2)
+e4: IN AL,Ib (!REX2)
+e5: IN eAX,Ib (!REX2)
+e6: OUT Ib,AL (!REX2)
+e7: OUT Ib,eAX (!REX2)
# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
# in "near" jumps and calls is 16-bit. For CALL,
# push of return address is 16-bit wide, RSP is decremented by 2
# but is not truncated to 16 bits, unlike RIP.
-e8: CALL Jz (f64)
-e9: JMP-near Jz (f64)
-ea: JMP-far Ap (i64)
-eb: JMP-short Jb (f64)
-ec: IN AL,DX
-ed: IN eAX,DX
-ee: OUT DX,AL
-ef: OUT DX,eAX
+e8: CALL Jz (f64) (!REX2)
+e9: JMP-near Jz (f64) (!REX2)
+ea: JMP-far Ap (i64) (!REX2)
+eb: JMP-short Jb (f64) (!REX2)
+ec: IN AL,DX (!REX2)
+ed: IN eAX,DX (!REX2)
+ee: OUT DX,AL (!REX2)
+ef: OUT DX,eAX (!REX2)
# 0xf0 - 0xff
f0: LOCK (Prefix)
f1:
@@ -386,14 +390,14 @@ AVXcode: 1
2e: vucomiss Vss,Wss (v1) | vucomisd Vsd,Wsd (66),(v1)
2f: vcomiss Vss,Wss (v1) | vcomisd Vsd,Wsd (66),(v1)
# 0x0f 0x30-0x3f
-30: WRMSR
-31: RDTSC
-32: RDMSR
-33: RDPMC
-34: SYSENTER
-35: SYSEXIT
+30: WRMSR (!REX2)
+31: RDTSC (!REX2)
+32: RDMSR (!REX2)
+33: RDPMC (!REX2)
+34: SYSENTER (!REX2)
+35: SYSEXIT (!REX2)
36:
-37: GETSEC
+37: GETSEC (!REX2)
38: escape # 3-byte escape 1
39:
3a: escape # 3-byte escape 2
@@ -473,22 +477,22 @@ AVXcode: 1
7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqa32/64 Wx,Vx (66),(evo) | vmovdqu Wx,Vx (F3) | vmovdqu32/64 Wx,Vx (F3),(evo) | vmovdqu8/16 Wx,Vx (F2),(ev)
# 0x0f 0x80-0x8f
# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
-80: JO Jz (f64)
-81: JNO Jz (f64)
-82: JB/JC/JNAE Jz (f64)
-83: JAE/JNB/JNC Jz (f64)
-84: JE/JZ Jz (f64)
-85: JNE/JNZ Jz (f64)
-86: JBE/JNA Jz (f64)
-87: JA/JNBE Jz (f64)
-88: JS Jz (f64)
-89: JNS Jz (f64)
-8a: JP/JPE Jz (f64)
-8b: JNP/JPO Jz (f64)
-8c: JL/JNGE Jz (f64)
-8d: JNL/JGE Jz (f64)
-8e: JLE/JNG Jz (f64)
-8f: JNLE/JG Jz (f64)
+80: JO Jz (f64) (!REX2)
+81: JNO Jz (f64) (!REX2)
+82: JB/JC/JNAE Jz (f64) (!REX2)
+83: JAE/JNB/JNC Jz (f64) (!REX2)
+84: JE/JZ Jz (f64) (!REX2)
+85: JNE/JNZ Jz (f64) (!REX2)
+86: JBE/JNA Jz (f64) (!REX2)
+87: JA/JNBE Jz (f64) (!REX2)
+88: JS Jz (f64) (!REX2)
+89: JNS Jz (f64) (!REX2)
+8a: JP/JPE Jz (f64) (!REX2)
+8b: JNP/JPO Jz (f64) (!REX2)
+8c: JL/JNGE Jz (f64) (!REX2)
+8d: JNL/JGE Jz (f64) (!REX2)
+8e: JLE/JNG Jz (f64) (!REX2)
+8f: JNLE/JG Jz (f64) (!REX2)
# 0x0f 0x90-0x9f
90: SETO Eb | kmovw/q Vk,Wk | kmovb/d Vk,Wk (66)
91: SETNO Eb | kmovw/q Mv,Vk | kmovb/d Mv,Vk (66)
--
2.34.1


2024-05-02 11:01:13

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 03/10] x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Intel Architecture Instruction Set Extensions and Future Features manual
number 319433-044 of May 2021, documented VEX versions of instructions
VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them
listed as EVEX only.

Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS,
VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX
or EVEX prefix.

Fixes: 0153d98f2dd6 ("x86/insn: Add misc instructions to x86 instruction decoder")
Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 8 ++++----
tools/arch/x86/lib/x86-opcode-map.txt | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 4ea2e6adb477..24941b9d4e06 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -698,10 +698,10 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66),(ev)
-51: vpdpbusds Vx,Hx,Wx (66),(ev)
-52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66),(ev) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
-53: vpdpwssds Vx,Hx,Wx (66),(ev) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
+50: vpdpbusd Vx,Hx,Wx (66)
+51: vpdpbusds Vx,Hx,Wx (66)
+52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
+53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
55: vpopcntd/q Vx,Wx (66),(ev)
58: vpbroadcastd Vx,Wx (66),(v)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 4ea2e6adb477..24941b9d4e06 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -698,10 +698,10 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66),(ev)
-51: vpdpbusds Vx,Hx,Wx (66),(ev)
-52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66),(ev) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
-53: vpdpwssds Vx,Hx,Wx (66),(ev) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
+50: vpdpbusd Vx,Hx,Wx (66)
+51: vpdpbusds Vx,Hx,Wx (66)
+52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
+53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
55: vpopcntd/q Vx,Wx (66),(ev)
58: vpbroadcastd Vx,Wx (66),(v)
--
2.34.1


2024-05-02 11:01:30

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 08/10] x86/insn: Add support for APX EVEX instructions to the opcode map

To support APX functionality, the EVEX prefix is used to:
- promote legacy instructions
- promote VEX instructions
- add new instructions

Promoted VEX instructions require no extra annotation because the opcodes
do not change and the permissive nature of the instruction decoder already
allows them to have an EVEX prefix.

Promoted legacy instructions and new instructions are placed in map 4 which
has not been used before.

Create a new table for map 4 and add APX instructions.

Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add
support for APX EVEX to the instruction decoder logic". SCALABLE
instructions must be represented in both no-prefix (NP) and 66 prefix
forms.

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 93 +++++++++++++++++++++++++++
tools/arch/x86/lib/x86-opcode-map.txt | 93 +++++++++++++++++++++++++++
2 files changed, 186 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 240ef714b64f..caedb3ef6688 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -23,6 +23,7 @@
#
# AVX Superscripts
# (ev): this opcode requires EVEX prefix.
+# (es): this opcode requires EVEX prefix and is SCALABALE.
# (evo): this opcode is changed by EVEX prefix (EVEX opcode)
# (v): this opcode requires VEX prefix.
# (v1): this opcode only supports 128bit VEX.
@@ -929,6 +930,98 @@ df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable

+Table: EVEX map 4
+Referrer:
+AVXcode: 4
+00: ADD Eb,Gb (ev)
+01: ADD Ev,Gv (es) | ADD Ev,Gv (66),(es)
+02: ADD Gb,Eb (ev)
+03: ADD Gv,Ev (es) | ADD Gv,Ev (66),(es)
+08: OR Eb,Gb (ev)
+09: OR Ev,Gv (es) | OR Ev,Gv (66),(es)
+0a: OR Gb,Eb (ev)
+0b: OR Gv,Ev (es) | OR Gv,Ev (66),(es)
+10: ADC Eb,Gb (ev)
+11: ADC Ev,Gv (es) | ADC Ev,Gv (66),(es)
+12: ADC Gb,Eb (ev)
+13: ADC Gv,Ev (es) | ADC Gv,Ev (66),(es)
+18: SBB Eb,Gb (ev)
+19: SBB Ev,Gv (es) | SBB Ev,Gv (66),(es)
+1a: SBB Gb,Eb (ev)
+1b: SBB Gv,Ev (es) | SBB Gv,Ev (66),(es)
+20: AND Eb,Gb (ev)
+21: AND Ev,Gv (es) | AND Ev,Gv (66),(es)
+22: AND Gb,Eb (ev)
+23: AND Gv,Ev (es) | AND Gv,Ev (66),(es)
+24: SHLD Ev,Gv,Ib (es) | SHLD Ev,Gv,Ib (66),(es)
+28: SUB Eb,Gb (ev)
+29: SUB Ev,Gv (es) | SUB Ev,Gv (66),(es)
+2a: SUB Gb,Eb (ev)
+2b: SUB Gv,Ev (es) | SUB Gv,Ev (66),(es)
+2c: SHRD Ev,Gv,Ib (es) | SHRD Ev,Gv,Ib (66),(es)
+30: XOR Eb,Gb (ev)
+31: XOR Ev,Gv (es) | XOR Ev,Gv (66),(es)
+32: XOR Gb,Eb (ev)
+33: XOR Gv,Ev (es) | XOR Gv,Ev (66),(es)
+# CCMPSCC instructions are: CCOMB, CCOMBE, CCOMF, CCOML, CCOMLE, CCOMNB, CCOMNBE, CCOMNL, CCOMNLE,
+# CCOMNO, CCOMNS, CCOMNZ, CCOMO, CCOMS, CCOMT, CCOMZ
+38: CCMPSCC Eb,Gb (ev)
+39: CCMPSCC Ev,Gv (es) | CCMPSCC Ev,Gv (66),(es)
+3a: CCMPSCC Gv,Ev (ev)
+3b: CCMPSCC Gv,Ev (es) | CCMPSCC Gv,Ev (66),(es)
+40: CMOVO Gv,Ev (es) | CMOVO Gv,Ev (66),(es) | CFCMOVO Ev,Ev (es) | CFCMOVO Ev,Ev (66),(es) | SETO Eb (F2),(ev)
+41: CMOVNO Gv,Ev (es) | CMOVNO Gv,Ev (66),(es) | CFCMOVNO Ev,Ev (es) | CFCMOVNO Ev,Ev (66),(es) | SETNO Eb (F2),(ev)
+42: CMOVB Gv,Ev (es) | CMOVB Gv,Ev (66),(es) | CFCMOVB Ev,Ev (es) | CFCMOVB Ev,Ev (66),(es) | SETB Eb (F2),(ev)
+43: CMOVNB Gv,Ev (es) | CMOVNB Gv,Ev (66),(es) | CFCMOVNB Ev,Ev (es) | CFCMOVNB Ev,Ev (66),(es) | SETNB Eb (F2),(ev)
+44: CMOVZ Gv,Ev (es) | CMOVZ Gv,Ev (66),(es) | CFCMOVZ Ev,Ev (es) | CFCMOVZ Ev,Ev (66),(es) | SETZ Eb (F2),(ev)
+45: CMOVNZ Gv,Ev (es) | CMOVNZ Gv,Ev (66),(es) | CFCMOVNZ Ev,Ev (es) | CFCMOVNZ Ev,Ev (66),(es) | SETNZ Eb (F2),(ev)
+46: CMOVBE Gv,Ev (es) | CMOVBE Gv,Ev (66),(es) | CFCMOVBE Ev,Ev (es) | CFCMOVBE Ev,Ev (66),(es) | SETBE Eb (F2),(ev)
+47: CMOVNBE Gv,Ev (es) | CMOVNBE Gv,Ev (66),(es) | CFCMOVNBE Ev,Ev (es) | CFCMOVNBE Ev,Ev (66),(es) | SETNBE Eb (F2),(ev)
+48: CMOVS Gv,Ev (es) | CMOVS Gv,Ev (66),(es) | CFCMOVS Ev,Ev (es) | CFCMOVS Ev,Ev (66),(es) | SETS Eb (F2),(ev)
+49: CMOVNS Gv,Ev (es) | CMOVNS Gv,Ev (66),(es) | CFCMOVNS Ev,Ev (es) | CFCMOVNS Ev,Ev (66),(es) | SETNS Eb (F2),(ev)
+4a: CMOVP Gv,Ev (es) | CMOVP Gv,Ev (66),(es) | CFCMOVP Ev,Ev (es) | CFCMOVP Ev,Ev (66),(es) | SETP Eb (F2),(ev)
+4b: CMOVNP Gv,Ev (es) | CMOVNP Gv,Ev (66),(es) | CFCMOVNP Ev,Ev (es) | CFCMOVNP Ev,Ev (66),(es) | SETNP Eb (F2),(ev)
+4c: CMOVL Gv,Ev (es) | CMOVL Gv,Ev (66),(es) | CFCMOVL Ev,Ev (es) | CFCMOVL Ev,Ev (66),(es) | SETL Eb (F2),(ev)
+4d: CMOVNL Gv,Ev (es) | CMOVNL Gv,Ev (66),(es) | CFCMOVNL Ev,Ev (es) | CFCMOVNL Ev,Ev (66),(es) | SETNL Eb (F2),(ev)
+4e: CMOVLE Gv,Ev (es) | CMOVLE Gv,Ev (66),(es) | CFCMOVLE Ev,Ev (es) | CFCMOVLE Ev,Ev (66),(es) | SETLE Eb (F2),(ev)
+4f: CMOVNLE Gv,Ev (es) | CMOVNLE Gv,Ev (66),(es) | CFCMOVNLE Ev,Ev (es) | CFCMOVNLE Ev,Ev (66),(es) | SETNLE Eb (F2),(ev)
+60: MOVBE Gv,Ev (es) | MOVBE Gv,Ev (66),(es)
+61: MOVBE Ev,Gv (es) | MOVBE Ev,Gv (66),(es)
+65: WRUSSD Md,Gd (66),(ev) | WRUSSQ Mq,Gq (66),(ev)
+66: ADCX Gy,Ey (66),(ev) | ADOX Gy,Ey (F3),(ev) | WRSSD Md,Gd (ev) | WRSSQ Mq,Gq (66),(ev)
+69: IMUL Gv,Ev,Iz (es) | IMUL Gv,Ev,Iz (66),(es)
+6b: IMUL Gv,Ev,Ib (es) | IMUL Gv,Ev,Ib (66),(es)
+80: Grp1 Eb,Ib (1A),(ev)
+81: Grp1 Ev,Iz (1A),(es)
+83: Grp1 Ev,Ib (1A),(es)
+# CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
+# CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
+84: CTESTSCC (ev)
+85: CTESTSCC (es) | CTESTSCC (66),(es)
+88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
+8f: POP2 Bq,Rq (000),(11B),(ev)
+a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
+ad: SHRD Ev,Gv,CL (es) | SHRD Ev,Gv,CL (66),(es)
+af: IMUL Gv,Ev (es) | IMUL Gv,Ev (66),(es)
+c0: Grp2 Eb,Ib (1A),(ev)
+c1: Grp2 Ev,Ib (1A),(es)
+d0: Grp2 Eb,1 (1A),(ev)
+d1: Grp2 Ev,1 (1A),(es)
+d2: Grp2 Eb,CL (1A),(ev)
+d3: Grp2 Ev,CL (1A),(es)
+f0: CRC32 Gy,Eb (es) | INVEPT Gq,Mdq (F3),(ev)
+f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
+f2: INVPCID Gy,Mdq (F3),(ev)
+f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
+f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
+f6: Grp3_1 Eb (1A),(ev)
+f7: Grp3_2 Ev (1A),(es)
+f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
+f9: MOVDIRI My,Gy (ev)
+fe: Grp4 (1A),(ev)
+ff: Grp5 (1A),(es) | PUSH2 Bq,Rq (110),(11B),(ev)
+EndTable
+
Table: EVEX map 5
Referrer:
AVXcode: 5
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 240ef714b64f..caedb3ef6688 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -23,6 +23,7 @@
#
# AVX Superscripts
# (ev): this opcode requires EVEX prefix.
+# (es): this opcode requires EVEX prefix and is SCALABALE.
# (evo): this opcode is changed by EVEX prefix (EVEX opcode)
# (v): this opcode requires VEX prefix.
# (v1): this opcode only supports 128bit VEX.
@@ -929,6 +930,98 @@ df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable

+Table: EVEX map 4
+Referrer:
+AVXcode: 4
+00: ADD Eb,Gb (ev)
+01: ADD Ev,Gv (es) | ADD Ev,Gv (66),(es)
+02: ADD Gb,Eb (ev)
+03: ADD Gv,Ev (es) | ADD Gv,Ev (66),(es)
+08: OR Eb,Gb (ev)
+09: OR Ev,Gv (es) | OR Ev,Gv (66),(es)
+0a: OR Gb,Eb (ev)
+0b: OR Gv,Ev (es) | OR Gv,Ev (66),(es)
+10: ADC Eb,Gb (ev)
+11: ADC Ev,Gv (es) | ADC Ev,Gv (66),(es)
+12: ADC Gb,Eb (ev)
+13: ADC Gv,Ev (es) | ADC Gv,Ev (66),(es)
+18: SBB Eb,Gb (ev)
+19: SBB Ev,Gv (es) | SBB Ev,Gv (66),(es)
+1a: SBB Gb,Eb (ev)
+1b: SBB Gv,Ev (es) | SBB Gv,Ev (66),(es)
+20: AND Eb,Gb (ev)
+21: AND Ev,Gv (es) | AND Ev,Gv (66),(es)
+22: AND Gb,Eb (ev)
+23: AND Gv,Ev (es) | AND Gv,Ev (66),(es)
+24: SHLD Ev,Gv,Ib (es) | SHLD Ev,Gv,Ib (66),(es)
+28: SUB Eb,Gb (ev)
+29: SUB Ev,Gv (es) | SUB Ev,Gv (66),(es)
+2a: SUB Gb,Eb (ev)
+2b: SUB Gv,Ev (es) | SUB Gv,Ev (66),(es)
+2c: SHRD Ev,Gv,Ib (es) | SHRD Ev,Gv,Ib (66),(es)
+30: XOR Eb,Gb (ev)
+31: XOR Ev,Gv (es) | XOR Ev,Gv (66),(es)
+32: XOR Gb,Eb (ev)
+33: XOR Gv,Ev (es) | XOR Gv,Ev (66),(es)
+# CCMPSCC instructions are: CCOMB, CCOMBE, CCOMF, CCOML, CCOMLE, CCOMNB, CCOMNBE, CCOMNL, CCOMNLE,
+# CCOMNO, CCOMNS, CCOMNZ, CCOMO, CCOMS, CCOMT, CCOMZ
+38: CCMPSCC Eb,Gb (ev)
+39: CCMPSCC Ev,Gv (es) | CCMPSCC Ev,Gv (66),(es)
+3a: CCMPSCC Gv,Ev (ev)
+3b: CCMPSCC Gv,Ev (es) | CCMPSCC Gv,Ev (66),(es)
+40: CMOVO Gv,Ev (es) | CMOVO Gv,Ev (66),(es) | CFCMOVO Ev,Ev (es) | CFCMOVO Ev,Ev (66),(es) | SETO Eb (F2),(ev)
+41: CMOVNO Gv,Ev (es) | CMOVNO Gv,Ev (66),(es) | CFCMOVNO Ev,Ev (es) | CFCMOVNO Ev,Ev (66),(es) | SETNO Eb (F2),(ev)
+42: CMOVB Gv,Ev (es) | CMOVB Gv,Ev (66),(es) | CFCMOVB Ev,Ev (es) | CFCMOVB Ev,Ev (66),(es) | SETB Eb (F2),(ev)
+43: CMOVNB Gv,Ev (es) | CMOVNB Gv,Ev (66),(es) | CFCMOVNB Ev,Ev (es) | CFCMOVNB Ev,Ev (66),(es) | SETNB Eb (F2),(ev)
+44: CMOVZ Gv,Ev (es) | CMOVZ Gv,Ev (66),(es) | CFCMOVZ Ev,Ev (es) | CFCMOVZ Ev,Ev (66),(es) | SETZ Eb (F2),(ev)
+45: CMOVNZ Gv,Ev (es) | CMOVNZ Gv,Ev (66),(es) | CFCMOVNZ Ev,Ev (es) | CFCMOVNZ Ev,Ev (66),(es) | SETNZ Eb (F2),(ev)
+46: CMOVBE Gv,Ev (es) | CMOVBE Gv,Ev (66),(es) | CFCMOVBE Ev,Ev (es) | CFCMOVBE Ev,Ev (66),(es) | SETBE Eb (F2),(ev)
+47: CMOVNBE Gv,Ev (es) | CMOVNBE Gv,Ev (66),(es) | CFCMOVNBE Ev,Ev (es) | CFCMOVNBE Ev,Ev (66),(es) | SETNBE Eb (F2),(ev)
+48: CMOVS Gv,Ev (es) | CMOVS Gv,Ev (66),(es) | CFCMOVS Ev,Ev (es) | CFCMOVS Ev,Ev (66),(es) | SETS Eb (F2),(ev)
+49: CMOVNS Gv,Ev (es) | CMOVNS Gv,Ev (66),(es) | CFCMOVNS Ev,Ev (es) | CFCMOVNS Ev,Ev (66),(es) | SETNS Eb (F2),(ev)
+4a: CMOVP Gv,Ev (es) | CMOVP Gv,Ev (66),(es) | CFCMOVP Ev,Ev (es) | CFCMOVP Ev,Ev (66),(es) | SETP Eb (F2),(ev)
+4b: CMOVNP Gv,Ev (es) | CMOVNP Gv,Ev (66),(es) | CFCMOVNP Ev,Ev (es) | CFCMOVNP Ev,Ev (66),(es) | SETNP Eb (F2),(ev)
+4c: CMOVL Gv,Ev (es) | CMOVL Gv,Ev (66),(es) | CFCMOVL Ev,Ev (es) | CFCMOVL Ev,Ev (66),(es) | SETL Eb (F2),(ev)
+4d: CMOVNL Gv,Ev (es) | CMOVNL Gv,Ev (66),(es) | CFCMOVNL Ev,Ev (es) | CFCMOVNL Ev,Ev (66),(es) | SETNL Eb (F2),(ev)
+4e: CMOVLE Gv,Ev (es) | CMOVLE Gv,Ev (66),(es) | CFCMOVLE Ev,Ev (es) | CFCMOVLE Ev,Ev (66),(es) | SETLE Eb (F2),(ev)
+4f: CMOVNLE Gv,Ev (es) | CMOVNLE Gv,Ev (66),(es) | CFCMOVNLE Ev,Ev (es) | CFCMOVNLE Ev,Ev (66),(es) | SETNLE Eb (F2),(ev)
+60: MOVBE Gv,Ev (es) | MOVBE Gv,Ev (66),(es)
+61: MOVBE Ev,Gv (es) | MOVBE Ev,Gv (66),(es)
+65: WRUSSD Md,Gd (66),(ev) | WRUSSQ Mq,Gq (66),(ev)
+66: ADCX Gy,Ey (66),(ev) | ADOX Gy,Ey (F3),(ev) | WRSSD Md,Gd (ev) | WRSSQ Mq,Gq (66),(ev)
+69: IMUL Gv,Ev,Iz (es) | IMUL Gv,Ev,Iz (66),(es)
+6b: IMUL Gv,Ev,Ib (es) | IMUL Gv,Ev,Ib (66),(es)
+80: Grp1 Eb,Ib (1A),(ev)
+81: Grp1 Ev,Iz (1A),(es)
+83: Grp1 Ev,Ib (1A),(es)
+# CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
+# CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
+84: CTESTSCC (ev)
+85: CTESTSCC (es) | CTESTSCC (66),(es)
+88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
+8f: POP2 Bq,Rq (000),(11B),(ev)
+a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
+ad: SHRD Ev,Gv,CL (es) | SHRD Ev,Gv,CL (66),(es)
+af: IMUL Gv,Ev (es) | IMUL Gv,Ev (66),(es)
+c0: Grp2 Eb,Ib (1A),(ev)
+c1: Grp2 Ev,Ib (1A),(es)
+d0: Grp2 Eb,1 (1A),(ev)
+d1: Grp2 Ev,1 (1A),(es)
+d2: Grp2 Eb,CL (1A),(ev)
+d3: Grp2 Ev,CL (1A),(es)
+f0: CRC32 Gy,Eb (es) | INVEPT Gq,Mdq (F3),(ev)
+f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
+f2: INVPCID Gy,Mdq (F3),(ev)
+f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
+f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
+f6: Grp3_1 Eb (1A),(ev)
+f7: Grp3_2 Ev (1A),(es)
+f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
+f9: MOVDIRI My,Gy (ev)
+fe: Grp4 (1A),(ev)
+ff: Grp5 (1A),(es) | PUSH2 Bq,Rq (110),(11B),(ev)
+EndTable
+
Table: EVEX map 5
Referrer:
AVXcode: 5
--
2.34.1


2024-05-02 11:01:46

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 09/10] perf intel pt: Add new JMPABS instruction to the Intel PT instruction decoder

JMPABS is 64-bit absolute direct jump instruction, encoded with a mandatory
REX2 prefix. JMPABS is designed to be used in the procedure linkage table
(PLT) to replace indirect jumps, because it has better performance. In that
case the jump target will be amended at run time. To enable Intel PT to
follow the code, a TIP packet is always emitted when JMPABS is traced under
Intel PT.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Decode JMPABS as an indirect jump, because it has an associated TIP packet
the same as an indirect jump and the control flow should follow the TIP
packet payload, and not assume it is the same as the on-file object code
JMPABS target address.

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
index c5d57027ec23..4407130d91f8 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -92,6 +92,15 @@ static void intel_pt_insn_decoder(struct insn *insn,
op = INTEL_PT_OP_JCC;
branch = INTEL_PT_BR_CONDITIONAL;
break;
+ case 0xa1:
+ if (insn_is_rex2(insn)) { /* jmpabs */
+ intel_pt_insn->op = INTEL_PT_OP_JMP;
+ /* jmpabs causes a TIP packet like an indirect branch */
+ intel_pt_insn->branch = INTEL_PT_BR_INDIRECT;
+ intel_pt_insn->length = insn->length;
+ return;
+ }
+ break;
case 0xc2: /* near ret */
case 0xc3: /* near ret */
case 0xca: /* far ret */
--
2.34.1


2024-05-02 11:01:48

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 04/10] x86/insn: Add misc new Intel instructions

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Add instructions documented in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference March 2024
319433-052, that have not been added yet:

AADD
AAND
AOR
AXOR
CMPccXADD
PBNDKB
RDMSRLIST
URDMSR
UWRMSR
VBCSTNEBF162PS
VBCSTNESH2PS
VCVTNEEBF162PS
VCVTNEEPH2PS
VCVTNEOBF162PS
VCVTNEOPH2PS
VCVTNEPS2BF16
VPDPB[SU,UU,SS]D[,S]
VPDPW[SU,US,UU]D[,S]
VPMADD52HUQ
VPMADD52LUQ
VSHA512MSG1
VSHA512MSG2
VSHA512RNDS2
VSM3MSG1
VSM3MSG2
VSM3RNDS2
VSM4KEY4
VSM4RNDS4
WRMSRLIST
TCMMIMFP16PS
TCMMRLFP16PS
TDPFP16PS
PREFETCHIT1
PREFETCHIT0

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 57 +++++++++++++++++++++------
tools/arch/x86/lib/x86-opcode-map.txt | 57 +++++++++++++++++++++------
2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 24941b9d4e06..4e9f53581b58 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -698,8 +698,8 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66)
-51: vpdpbusds Vx,Hx,Wx (66)
+50: vpdpbusd Vx,Hx,Wx (66) | vpdpbssd Vx,Hx,Wx (F2),(v) | vpdpbsud Vx,Hx,Wx (F3),(v) | vpdpbuud Vx,Hx,Wx (v)
+51: vpdpbusds Vx,Hx,Wx (66) | vpdpbssds Vx,Hx,Wx (F2),(v) | vpdpbsuds Vx,Hx,Wx (F3),(v) | vpdpbuuds Vx,Hx,Wx (v)
52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
@@ -708,7 +708,7 @@ AVXcode: 2
59: vpbroadcastq Vx,Wx (66),(v) | vbroadcasti32x2 Vx,Wx (66),(evo)
5a: vbroadcasti128 Vqq,Mdq (66),(v) | vbroadcasti32x4/64x2 Vx,Wx (66),(evo)
5b: vbroadcasti32x8/64x4 Vqq,Mdq (66),(ev)
-5c: TDPBF16PS Vt,Wt,Ht (F3),(v1)
+5c: TDPBF16PS Vt,Wt,Ht (F3),(v1) | TDPFP16PS Vt,Wt,Ht (F2),(v1),(o64)
# Skip 0x5d
5e: TDPBSSD Vt,Wt,Ht (F2),(v1) | TDPBSUD Vt,Wt,Ht (F3),(v1) | TDPBUSD Vt,Wt,Ht (66),(v1) | TDPBUUD Vt,Wt,Ht (v1)
# Skip 0x5f-0x61
@@ -718,10 +718,12 @@ AVXcode: 2
65: vblendmps/d Vx,Hx,Wx (66),(ev)
66: vpblendmb/w Vx,Hx,Wx (66),(ev)
68: vp2intersectd/q Kx,Hx,Wx (F2),(ev)
-# Skip 0x69-0x6f
+# Skip 0x69-0x6b
+6c: TCMMIMFP16PS Vt,Wt,Ht (66),(v1),(o64) | TCMMRLFP16PS Vt,Wt,Ht (v1),(o64)
+# Skip 0x6d-0x6f
70: vpshldvw Vx,Hx,Wx (66),(ev)
71: vpshldvd/q Vx,Hx,Wx (66),(ev)
-72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3),(ev) | vpshrdvw Vx,Hx,Wx (66),(ev)
+72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3) | vpshrdvw Vx,Hx,Wx (66),(ev)
73: vpshrdvd/q Vx,Hx,Wx (66),(ev)
75: vpermi2b/w Vx,Hx,Wx (66),(ev)
76: vpermi2d/q Vx,Hx,Wx (66),(ev)
@@ -777,8 +779,10 @@ ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
-b4: vpmadd52luq Vx,Hx,Wx (66),(ev)
-b5: vpmadd52huq Vx,Hx,Wx (66),(ev)
+b0: vcvtneebf162ps Vx,Mx (F3),(!11B),(v) | vcvtneeph2ps Vx,Mx (66),(!11B),(v) | vcvtneobf162ps Vx,Mx (F2),(!11B),(v) | vcvtneoph2ps Vx,Mx (!11B),(v)
+b1: vbcstnebf162ps Vx,Mw (F3),(!11B),(v) | vbcstnesh2ps Vx,Mw (66),(!11B),(v)
+b4: vpmadd52luq Vx,Hx,Wx (66)
+b5: vpmadd52huq Vx,Hx,Wx (66)
b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
@@ -796,16 +800,35 @@ c7: Grp19 (1A)
c8: sha1nexte Vdq,Wdq | vexp2ps/d Vx,Wx (66),(ev)
c9: sha1msg1 Vdq,Wdq
ca: sha1msg2 Vdq,Wdq | vrcp28ps/d Vx,Wx (66),(ev)
-cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
-cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
-cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
+cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev) | vsha512rnds2 Vqq,Hqq,Udq (F2),(11B),(v)
+cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev) | vsha512msg1 Vqq,Udq (F2),(11B),(v)
+cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev) | vsha512msg2 Vqq,Uqq (F2),(11B),(v)
cf: vgf2p8mulb Vx,Wx (66)
+d2: vpdpwsud Vx,Hx,Wx (F3),(v) | vpdpwusd Vx,Hx,Wx (66),(v) | vpdpwuud Vx,Hx,Wx (v)
+d3: vpdpwsuds Vx,Hx,Wx (F3),(v) | vpdpwusds Vx,Hx,Wx (66),(v) | vpdpwuuds Vx,Hx,Wx (v)
d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
+da: vsm3msg1 Vdq,Hdq,Udq (v1) | vsm3msg2 Vdq,Hdq,Udq (66),(v1) | vsm4key4 Vx,Hx,Wx (F3),(v) | vsm4rnds4 Vx,Hx,Wx (F2),(v)
db: VAESIMC Vdq,Wdq (66),(v1)
dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
+e0: CMPOXADD My,Gy,By (66),(v1),(o64)
+e1: CMPNOXADD My,Gy,By (66),(v1),(o64)
+e2: CMPBXADD My,Gy,By (66),(v1),(o64)
+e3: CMPNBXADD My,Gy,By (66),(v1),(o64)
+e4: CMPZXADD My,Gy,By (66),(v1),(o64)
+e5: CMPNZXADD My,Gy,By (66),(v1),(o64)
+e6: CMPBEXADD My,Gy,By (66),(v1),(o64)
+e7: CMPNBEXADD My,Gy,By (66),(v1),(o64)
+e8: CMPSXADD My,Gy,By (66),(v1),(o64)
+e9: CMPNSXADD My,Gy,By (66),(v1),(o64)
+ea: CMPPXADD My,Gy,By (66),(v1),(o64)
+eb: CMPNPXADD My,Gy,By (66),(v1),(o64)
+ec: CMPLXADD My,Gy,By (66),(v1),(o64)
+ed: CMPNLXADD My,Gy,By (66),(v1),(o64)
+ee: CMPLEXADD My,Gy,By (66),(v1),(o64)
+ef: CMPNLEXADD My,Gy,By (66),(v1),(o64)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -813,10 +836,11 @@ f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
-f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
+f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
fa: ENCODEKEY128 Ew,Ew (F3)
fb: ENCODEKEY256 Ew,Ew (F3)
+fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
@@ -896,6 +920,7 @@ c2: vcmpph Vx,Hx,Wx,Ib (ev) | vcmpsh Vx,Hx,Wx,Ib (F3),(ev)
cc: sha1rnds4 Vdq,Wdq,Ib
ce: vgf2p8affineqb Vx,Wx,Ib (66)
cf: vgf2p8affineinvqb Vx,Wx,Ib (66)
+de: vsm3rnds2 Vdq,Hdq,Wdq,Ib (66),(v1)
df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable
@@ -978,6 +1003,12 @@ d6: vfcmulcph Vx,Hx,Wx (F2),(ev) | vfmulcph Vx,Hx,Wx (F3),(ev)
d7: vfcmulcsh Vx,Hx,Wx (F2),(ev) | vfmulcsh Vx,Hx,Wx (F3),(ev)
EndTable

+Table: VEX map 7
+Referrer:
+AVXcode: 7
+f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
+EndTable
+
GrpTable: Grp1
0: ADD
1: OR
@@ -1054,7 +1085,7 @@ GrpTable: Grp6
EndTable

GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B) | RDMSRLIST (F2),(110),(11B) | WRMSRLIST (F3),(110),(11B) | PBNDKB (111),(11B)
1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
@@ -1140,6 +1171,8 @@ GrpTable: Grp16
1: prefetch T0
2: prefetch T1
3: prefetch T2
+6: prefetch IT1
+7: prefetch IT0
EndTable

GrpTable: Grp17
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 24941b9d4e06..4e9f53581b58 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -698,8 +698,8 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66)
-51: vpdpbusds Vx,Hx,Wx (66)
+50: vpdpbusd Vx,Hx,Wx (66) | vpdpbssd Vx,Hx,Wx (F2),(v) | vpdpbsud Vx,Hx,Wx (F3),(v) | vpdpbuud Vx,Hx,Wx (v)
+51: vpdpbusds Vx,Hx,Wx (66) | vpdpbssds Vx,Hx,Wx (F2),(v) | vpdpbsuds Vx,Hx,Wx (F3),(v) | vpdpbuuds Vx,Hx,Wx (v)
52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
@@ -708,7 +708,7 @@ AVXcode: 2
59: vpbroadcastq Vx,Wx (66),(v) | vbroadcasti32x2 Vx,Wx (66),(evo)
5a: vbroadcasti128 Vqq,Mdq (66),(v) | vbroadcasti32x4/64x2 Vx,Wx (66),(evo)
5b: vbroadcasti32x8/64x4 Vqq,Mdq (66),(ev)
-5c: TDPBF16PS Vt,Wt,Ht (F3),(v1)
+5c: TDPBF16PS Vt,Wt,Ht (F3),(v1) | TDPFP16PS Vt,Wt,Ht (F2),(v1),(o64)
# Skip 0x5d
5e: TDPBSSD Vt,Wt,Ht (F2),(v1) | TDPBSUD Vt,Wt,Ht (F3),(v1) | TDPBUSD Vt,Wt,Ht (66),(v1) | TDPBUUD Vt,Wt,Ht (v1)
# Skip 0x5f-0x61
@@ -718,10 +718,12 @@ AVXcode: 2
65: vblendmps/d Vx,Hx,Wx (66),(ev)
66: vpblendmb/w Vx,Hx,Wx (66),(ev)
68: vp2intersectd/q Kx,Hx,Wx (F2),(ev)
-# Skip 0x69-0x6f
+# Skip 0x69-0x6b
+6c: TCMMIMFP16PS Vt,Wt,Ht (66),(v1),(o64) | TCMMRLFP16PS Vt,Wt,Ht (v1),(o64)
+# Skip 0x6d-0x6f
70: vpshldvw Vx,Hx,Wx (66),(ev)
71: vpshldvd/q Vx,Hx,Wx (66),(ev)
-72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3),(ev) | vpshrdvw Vx,Hx,Wx (66),(ev)
+72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3) | vpshrdvw Vx,Hx,Wx (66),(ev)
73: vpshrdvd/q Vx,Hx,Wx (66),(ev)
75: vpermi2b/w Vx,Hx,Wx (66),(ev)
76: vpermi2d/q Vx,Hx,Wx (66),(ev)
@@ -777,8 +779,10 @@ ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
-b4: vpmadd52luq Vx,Hx,Wx (66),(ev)
-b5: vpmadd52huq Vx,Hx,Wx (66),(ev)
+b0: vcvtneebf162ps Vx,Mx (F3),(!11B),(v) | vcvtneeph2ps Vx,Mx (66),(!11B),(v) | vcvtneobf162ps Vx,Mx (F2),(!11B),(v) | vcvtneoph2ps Vx,Mx (!11B),(v)
+b1: vbcstnebf162ps Vx,Mw (F3),(!11B),(v) | vbcstnesh2ps Vx,Mw (66),(!11B),(v)
+b4: vpmadd52luq Vx,Hx,Wx (66)
+b5: vpmadd52huq Vx,Hx,Wx (66)
b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
@@ -796,16 +800,35 @@ c7: Grp19 (1A)
c8: sha1nexte Vdq,Wdq | vexp2ps/d Vx,Wx (66),(ev)
c9: sha1msg1 Vdq,Wdq
ca: sha1msg2 Vdq,Wdq | vrcp28ps/d Vx,Wx (66),(ev)
-cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
-cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
-cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
+cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev) | vsha512rnds2 Vqq,Hqq,Udq (F2),(11B),(v)
+cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev) | vsha512msg1 Vqq,Udq (F2),(11B),(v)
+cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev) | vsha512msg2 Vqq,Uqq (F2),(11B),(v)
cf: vgf2p8mulb Vx,Wx (66)
+d2: vpdpwsud Vx,Hx,Wx (F3),(v) | vpdpwusd Vx,Hx,Wx (66),(v) | vpdpwuud Vx,Hx,Wx (v)
+d3: vpdpwsuds Vx,Hx,Wx (F3),(v) | vpdpwusds Vx,Hx,Wx (66),(v) | vpdpwuuds Vx,Hx,Wx (v)
d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
+da: vsm3msg1 Vdq,Hdq,Udq (v1) | vsm3msg2 Vdq,Hdq,Udq (66),(v1) | vsm4key4 Vx,Hx,Wx (F3),(v) | vsm4rnds4 Vx,Hx,Wx (F2),(v)
db: VAESIMC Vdq,Wdq (66),(v1)
dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
+e0: CMPOXADD My,Gy,By (66),(v1),(o64)
+e1: CMPNOXADD My,Gy,By (66),(v1),(o64)
+e2: CMPBXADD My,Gy,By (66),(v1),(o64)
+e3: CMPNBXADD My,Gy,By (66),(v1),(o64)
+e4: CMPZXADD My,Gy,By (66),(v1),(o64)
+e5: CMPNZXADD My,Gy,By (66),(v1),(o64)
+e6: CMPBEXADD My,Gy,By (66),(v1),(o64)
+e7: CMPNBEXADD My,Gy,By (66),(v1),(o64)
+e8: CMPSXADD My,Gy,By (66),(v1),(o64)
+e9: CMPNSXADD My,Gy,By (66),(v1),(o64)
+ea: CMPPXADD My,Gy,By (66),(v1),(o64)
+eb: CMPNPXADD My,Gy,By (66),(v1),(o64)
+ec: CMPLXADD My,Gy,By (66),(v1),(o64)
+ed: CMPNLXADD My,Gy,By (66),(v1),(o64)
+ee: CMPLEXADD My,Gy,By (66),(v1),(o64)
+ef: CMPNLEXADD My,Gy,By (66),(v1),(o64)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -813,10 +836,11 @@ f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
-f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
+f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
fa: ENCODEKEY128 Ew,Ew (F3)
fb: ENCODEKEY256 Ew,Ew (F3)
+fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
@@ -896,6 +920,7 @@ c2: vcmpph Vx,Hx,Wx,Ib (ev) | vcmpsh Vx,Hx,Wx,Ib (F3),(ev)
cc: sha1rnds4 Vdq,Wdq,Ib
ce: vgf2p8affineqb Vx,Wx,Ib (66)
cf: vgf2p8affineinvqb Vx,Wx,Ib (66)
+de: vsm3rnds2 Vdq,Hdq,Wdq,Ib (66),(v1)
df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable
@@ -978,6 +1003,12 @@ d6: vfcmulcph Vx,Hx,Wx (F2),(ev) | vfmulcph Vx,Hx,Wx (F3),(ev)
d7: vfcmulcsh Vx,Hx,Wx (F2),(ev) | vfmulcsh Vx,Hx,Wx (F3),(ev)
EndTable

+Table: VEX map 7
+Referrer:
+AVXcode: 7
+f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
+EndTable
+
GrpTable: Grp1
0: ADD
1: OR
@@ -1054,7 +1085,7 @@ GrpTable: Grp6
EndTable

GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B) | RDMSRLIST (F2),(110),(11B) | WRMSRLIST (F3),(110),(11B) | PBNDKB (111),(11B)
1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
@@ -1140,6 +1171,8 @@ GrpTable: Grp16
1: prefetch T0
2: prefetch T1
3: prefetch T2
+6: prefetch IT1
+7: prefetch IT0
EndTable

GrpTable: Grp17
--
2.34.1


2024-05-02 11:02:02

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 10/10] perf tests: Add APX and other new instructions to x86 instruction decoder test

Add samples of APX and other new instructions to the 'x86 instruction
decoder - new instructions' test.

Note the test is only available if the perf tool has been built with
EXTRA_TESTS=1.

Example:

$ make EXTRA_TESTS=1 -C tools/perf
$ tools/perf/perf test -F -v 'new ins' |& grep -i 'jmpabs\|popp\|pushp'
Decoded ok: d5 00 a1 ef cd ab 90 78 56 34 12 jmpabs $0x1234567890abcdef
Decoded ok: d5 08 53 pushp %rbx
Decoded ok: d5 18 50 pushp %r16
Decoded ok: d5 19 57 pushp %r31
Decoded ok: d5 19 5f popp %r31
Decoded ok: d5 18 58 popp %r16
Decoded ok: d5 08 5b popp %rbx

Signed-off-by: Adrian Hunter <[email protected]>
---
tools/perf/arch/x86/tests/insn-x86-dat-32.c | 116 ++
tools/perf/arch/x86/tests/insn-x86-dat-64.c | 1026 ++++++++++++++++++
tools/perf/arch/x86/tests/insn-x86-dat-src.c | 597 ++++++++++
3 files changed, 1739 insertions(+)

diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-32.c b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
index ba429cadb18f..ce9645edaf68 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-32.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-32.c
@@ -3107,6 +3107,122 @@
"62 f5 7c 08 2e ca \tvucomish %xmm2,%xmm1",},
{{0x62, 0xf5, 0x7c, 0x08, 0x2e, 0x8c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
"62 f5 7c 08 2e 8c c8 78 56 34 12 \tvucomish 0x12345678(%eax,%ecx,8),%xmm1",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0xd1, }, 5, 0, "", "",
+"f3 0f 38 dc d1 \tloadiwkey %xmm1,%xmm2",},
+{{0xf3, 0x0f, 0x38, 0xfa, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fa d0 \tencodekey128 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xfb, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fb d0 \tencodekey256 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dc 5a 77 \taesenc128kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xde, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 de 5a 77 \taesenc256kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdd, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dd 5a 77 \taesdec128kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdf, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 df 5a 77 \taesdec256kl 0x77(%edx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x42, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 42 77 \taesencwide128kl 0x77(%edx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x52, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 52 77 \taesencwide256kl 0x77(%edx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x4a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 4a 77 \taesdecwide128kl 0x77(%edx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 5a 77 \taesdecwide256kl 0x77(%edx)",},
+{{0x0f, 0x38, 0xfc, 0x08, }, 4, 0, "", "",
+"0f 38 fc 08 \taadd %ecx,(%eax)",},
+{{0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 8, 0, "", "",
+"0f 38 fc 15 78 56 34 12 \taadd %edx,0x12345678",},
+{{0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"0f 38 fc 94 c8 78 56 34 12 \taadd %edx,0x12345678(%eax,%ecx,8)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"66 0f 38 fc 08 \taand %ecx,(%eax)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"66 0f 38 fc 15 78 56 34 12 \taand %edx,0x12345678",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"66 0f 38 fc 94 c8 78 56 34 12 \taand %edx,0x12345678(%eax,%ecx,8)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f2 0f 38 fc 08 \taor %ecx,(%eax)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f2 0f 38 fc 15 78 56 34 12 \taor %edx,0x12345678",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f2 0f 38 fc 94 c8 78 56 34 12 \taor %edx,0x12345678(%eax,%ecx,8)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f3 0f 38 fc 08 \taxor %ecx,(%eax)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x15, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"f3 0f 38 fc 15 78 56 34 12 \taxor %edx,0x12345678",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 0f 38 fc 94 c8 78 56 34 12 \taxor %edx,0x12345678(%eax,%ecx,8)",},
+{{0xc4, 0xe2, 0x7a, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b1 31 \tvbcstnebf162ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b1 31 \tvbcstnesh2ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x7a, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b0 31 \tvcvtneebf162ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b0 31 \tvcvtneeph2ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x7b, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7b b0 31 \tvcvtneobf162ps (%ecx),%xmm6",},
+{{0xc4, 0xe2, 0x78, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 78 b0 31 \tvcvtneoph2ps (%ecx),%xmm6",},
+{{0x62, 0xf2, 0x7e, 0x08, 0x72, 0xf1, }, 6, 0, "", "",
+"62 f2 7e 08 72 f1 \tvcvtneps2bf16 %xmm1,%xmm6",},
+{{0xc4, 0xe2, 0x6b, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 50 d9 \tvpdpbssd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 51 d9 \tvpdpbssds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 50 d9 \tvpdpbsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 51 d9 \tvpdpbsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 50 d9 \tvpdpbuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 51 d9 \tvpdpbuuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d2 d9 \tvpdpwsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d3 d9 \tvpdpwsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d2 d9 \tvpdpwusd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d3 d9 \tvpdpwusds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d2 d9 \tvpdpwuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d3 d9 \tvpdpwuuds %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb5, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b5 d9 \tvpmadd52huq %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb4, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b4 d9 \tvpmadd52luq %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x7f, 0xcc, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cc d1 \tvsha512msg1 %xmm1,%ymm2",},
+{{0xc4, 0xe2, 0x7f, 0xcd, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cd d1 \tvsha512msg2 %ymm1,%ymm2",},
+{{0xc4, 0xe2, 0x6f, 0xcb, 0xd9, }, 5, 0, "", "",
+"c4 e2 6f cb d9 \tvsha512rnds2 %xmm1,%ymm2,%ymm3",},
+{{0xc4, 0xe2, 0x68, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 da d9 \tvsm3msg1 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 da d9 \tvsm3msg2 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe3, 0x69, 0xde, 0xd9, 0xa1, }, 6, 0, "", "",
+"c4 e3 69 de d9 a1 \tvsm3rnds2 $0xa1,%xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a da d9 \tvsm4key4 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b da d9 \tvsm4rnds4 %xmm1,%xmm2,%xmm3",},
+{{0x0f, 0x0d, 0x00, }, 3, 0, "", "",
+"0f 0d 00 \tprefetch (%eax)",},
+{{0x0f, 0x18, 0x08, }, 3, 0, "", "",
+"0f 18 08 \tprefetcht0 (%eax)",},
+{{0x0f, 0x18, 0x10, }, 3, 0, "", "",
+"0f 18 10 \tprefetcht1 (%eax)",},
+{{0x0f, 0x18, 0x18, }, 3, 0, "", "",
+"0f 18 18 \tprefetcht2 (%eax)",},
+{{0x0f, 0x18, 0x00, }, 3, 0, "", "",
+"0f 18 00 \tprefetchnta (%eax)",},
+{{0x0f, 0x01, 0xc6, }, 3, 0, "", "",
+"0f 01 c6 \twrmsrns",},
{{0xf3, 0x0f, 0x3a, 0xf0, 0xc0, 0x00, }, 6, 0, "", "",
"f3 0f 3a f0 c0 00 \threset $0x0",},
{{0x0f, 0x01, 0xe8, }, 3, 0, "", "",
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-64.c b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
index 3a47e98fec33..3881fe89df8b 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-64.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-64.c
@@ -3877,6 +3877,1032 @@
"62 f5 7c 08 2e 8c c8 78 56 34 12 \tvucomish 0x12345678(%rax,%rcx,8),%xmm1",},
{{0x67, 0x62, 0xf5, 0x7c, 0x08, 0x2e, 0x8c, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 12, 0, "", "",
"67 62 f5 7c 08 2e 8c c8 78 56 34 12 \tvucomish 0x12345678(%eax,%ecx,8),%xmm1",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0xd1, }, 5, 0, "", "",
+"f3 0f 38 dc d1 \tloadiwkey %xmm1,%xmm2",},
+{{0xf3, 0x0f, 0x38, 0xfa, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fa d0 \tencodekey128 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xfb, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fb d0 \tencodekey256 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dc 5a 77 \taesenc128kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xde, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 de 5a 77 \taesenc256kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdd, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 dd 5a 77 \taesdec128kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xdf, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 df 5a 77 \taesdec256kl 0x77(%rdx),%xmm3",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x42, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 42 77 \taesencwide128kl 0x77(%rdx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x52, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 52 77 \taesencwide256kl 0x77(%rdx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x4a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 4a 77 \taesdecwide128kl 0x77(%rdx)",},
+{{0xf3, 0x0f, 0x38, 0xd8, 0x5a, 0x77, }, 6, 0, "", "",
+"f3 0f 38 d8 5a 77 \taesdecwide256kl 0x77(%rdx)",},
+{{0x0f, 0x38, 0xfc, 0x08, }, 4, 0, "", "",
+"0f 38 fc 08 \taadd %ecx,(%rax)",},
+{{0x41, 0x0f, 0x38, 0xfc, 0x10, }, 5, 0, "", "",
+"41 0f 38 fc 10 \taadd %edx,(%r8)",},
+{{0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"0f 38 fc 94 c8 78 56 34 12 \taadd %edx,0x12345678(%rax,%rcx,8)",},
+{{0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"41 0f 38 fc 94 c8 78 56 34 12 \taadd %edx,0x12345678(%r8,%rcx,8)",},
+{{0x48, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"48 0f 38 fc 08 \taadd %rcx,(%rax)",},
+{{0x49, 0x0f, 0x38, 0xfc, 0x10, }, 5, 0, "", "",
+"49 0f 38 fc 10 \taadd %rdx,(%r8)",},
+{{0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"48 0f 38 fc 14 25 78 56 34 12 \taadd %rdx,0x12345678",},
+{{0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"48 0f 38 fc 94 c8 78 56 34 12 \taadd %rdx,0x12345678(%rax,%rcx,8)",},
+{{0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"49 0f 38 fc 94 c8 78 56 34 12 \taadd %rdx,0x12345678(%r8,%rcx,8)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"66 0f 38 fc 08 \taand %ecx,(%rax)",},
+{{0x66, 0x41, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"66 41 0f 38 fc 10 \taand %edx,(%r8)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"66 0f 38 fc 94 c8 78 56 34 12 \taand %edx,0x12345678(%rax,%rcx,8)",},
+{{0x66, 0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 41 0f 38 fc 94 c8 78 56 34 12 \taand %edx,0x12345678(%r8,%rcx,8)",},
+{{0x66, 0x48, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"66 48 0f 38 fc 08 \taand %rcx,(%rax)",},
+{{0x66, 0x49, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"66 49 0f 38 fc 10 \taand %rdx,(%r8)",},
+{{0x66, 0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 48 0f 38 fc 14 25 78 56 34 12 \taand %rdx,0x12345678",},
+{{0x66, 0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 48 0f 38 fc 94 c8 78 56 34 12 \taand %rdx,0x12345678(%rax,%rcx,8)",},
+{{0x66, 0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"66 49 0f 38 fc 94 c8 78 56 34 12 \taand %rdx,0x12345678(%r8,%rcx,8)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f2 0f 38 fc 08 \taor %ecx,(%rax)",},
+{{0xf2, 0x41, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f2 41 0f 38 fc 10 \taor %edx,(%r8)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f2 0f 38 fc 94 c8 78 56 34 12 \taor %edx,0x12345678(%rax,%rcx,8)",},
+{{0xf2, 0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 41 0f 38 fc 94 c8 78 56 34 12 \taor %edx,0x12345678(%r8,%rcx,8)",},
+{{0xf2, 0x48, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"f2 48 0f 38 fc 08 \taor %rcx,(%rax)",},
+{{0xf2, 0x49, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f2 49 0f 38 fc 10 \taor %rdx,(%r8)",},
+{{0xf2, 0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 48 0f 38 fc 14 25 78 56 34 12 \taor %rdx,0x12345678",},
+{{0xf2, 0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 48 0f 38 fc 94 c8 78 56 34 12 \taor %rdx,0x12345678(%rax,%rcx,8)",},
+{{0xf2, 0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f2 49 0f 38 fc 94 c8 78 56 34 12 \taor %rdx,0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"f3 0f 38 fc 08 \taxor %ecx,(%rax)",},
+{{0xf3, 0x41, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f3 41 0f 38 fc 10 \taxor %edx,(%r8)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 0f 38 fc 94 c8 78 56 34 12 \taxor %edx,0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x41, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 41 0f 38 fc 94 c8 78 56 34 12 \taxor %edx,0x12345678(%r8,%rcx,8)",},
+{{0xf3, 0x48, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"f3 48 0f 38 fc 08 \taxor %rcx,(%rax)",},
+{{0xf3, 0x49, 0x0f, 0x38, 0xfc, 0x10, }, 6, 0, "", "",
+"f3 49 0f 38 fc 10 \taxor %rdx,(%r8)",},
+{{0xf3, 0x48, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 48 0f 38 fc 14 25 78 56 34 12 \taxor %rdx,0x12345678",},
+{{0xf3, 0x48, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 48 0f 38 fc 94 c8 78 56 34 12 \taxor %rdx,0x12345678(%rax,%rcx,8)",},
+{{0xf3, 0x49, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"f3 49 0f 38 fc 94 c8 78 56 34 12 \taxor %rdx,0x12345678(%r8,%rcx,8)",},
+{{0xc4, 0xc2, 0x61, 0xe6, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e6 09 \tcmpbexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe2, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e2 09 \tcmpbxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xee, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ee 09 \tcmplexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xec, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ec 09 \tcmplxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe7, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e7 09 \tcmpnbexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe3, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e3 09 \tcmpnbxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xef, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ef 09 \tcmpnlexadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xed, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ed 09 \tcmpnlxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe1, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e1 09 \tcmpnoxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xeb, 0x09, }, 5, 0, "", "",
+"c4 c2 61 eb 09 \tcmpnpxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe9, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e9 09 \tcmpnsxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe5, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e5 09 \tcmpnzxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe0, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e0 09 \tcmpoxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xea, 0x09, }, 5, 0, "", "",
+"c4 c2 61 ea 09 \tcmppxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe8, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e8 09 \tcmpsxadd %ebx,%ecx,(%r9)",},
+{{0xc4, 0xc2, 0x61, 0xe4, 0x09, }, 5, 0, "", "",
+"c4 c2 61 e4 09 \tcmpzxadd %ebx,%ecx,(%r9)",},
+{{0x0f, 0x0d, 0x00, }, 3, 0, "", "",
+"0f 0d 00 \tprefetch (%rax)",},
+{{0x0f, 0x18, 0x08, }, 3, 0, "", "",
+"0f 18 08 \tprefetcht0 (%rax)",},
+{{0x0f, 0x18, 0x10, }, 3, 0, "", "",
+"0f 18 10 \tprefetcht1 (%rax)",},
+{{0x0f, 0x18, 0x18, }, 3, 0, "", "",
+"0f 18 18 \tprefetcht2 (%rax)",},
+{{0x0f, 0x18, 0x00, }, 3, 0, "", "",
+"0f 18 00 \tprefetchnta (%rax)",},
+{{0x0f, 0x18, 0x3d, 0x78, 0x56, 0x34, 0x12, }, 7, 0, "", "",
+"0f 18 3d 78 56 34 12 \tprefetchit0 0x12345678(%rip) # 1234924e <main+0x1234924e>",},
+{{0x0f, 0x18, 0x35, 0x78, 0x56, 0x34, 0x12, }, 7, 0, "", "",
+"0f 18 35 78 56 34 12 \tprefetchit1 0x12345678(%rip) # 12349255 <main+0x12349255>",},
+{{0xf2, 0x0f, 0x01, 0xc6, }, 4, 0, "", "",
+"f2 0f 01 c6 \trdmsrlist",},
+{{0xf3, 0x0f, 0x01, 0xc6, }, 4, 0, "", "",
+"f3 0f 01 c6 \twrmsrlist",},
+{{0xf2, 0x0f, 0x38, 0xf8, 0xd0, }, 5, 0, "", "",
+"f2 0f 38 f8 d0 \turdmsr %rdx,%rax",},
+{{0x62, 0xfc, 0x7f, 0x08, 0xf8, 0xd6, }, 6, 0, "", "",
+"62 fc 7f 08 f8 d6 \turdmsr %rdx,%r22",},
+{{0xc4, 0xc7, 0x7b, 0xf8, 0xc4, 0x7f, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"c4 c7 7b f8 c4 7f 00 00 00 \turdmsr $0x7f,%r12",},
+{{0xf3, 0x0f, 0x38, 0xf8, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 f8 d0 \tuwrmsr %rax,%rdx",},
+{{0x62, 0xfc, 0x7e, 0x08, 0xf8, 0xd6, }, 6, 0, "", "",
+"62 fc 7e 08 f8 d6 \tuwrmsr %r22,%rdx",},
+{{0xc4, 0xc7, 0x7a, 0xf8, 0xc4, 0x7f, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"c4 c7 7a f8 c4 7f 00 00 00 \tuwrmsr %r12,$0x7f",},
+{{0xc4, 0xe2, 0x7a, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b1 31 \tvbcstnebf162ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb1, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b1 31 \tvbcstnesh2ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x7a, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7a b0 31 \tvcvtneebf162ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x79, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 79 b0 31 \tvcvtneeph2ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x7b, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 7b b0 31 \tvcvtneobf162ps (%rcx),%xmm6",},
+{{0xc4, 0xe2, 0x78, 0xb0, 0x31, }, 5, 0, "", "",
+"c4 e2 78 b0 31 \tvcvtneoph2ps (%rcx),%xmm6",},
+{{0x62, 0xf2, 0x7e, 0x08, 0x72, 0xf1, }, 6, 0, "", "",
+"62 f2 7e 08 72 f1 \tvcvtneps2bf16 %xmm1,%xmm6",},
+{{0xf2, 0x0f, 0x01, 0xca, }, 4, 0, "erets", "indirect",
+"f2 0f 01 ca \terets",},
+{{0xf3, 0x0f, 0x01, 0xca, }, 4, 0, "eretu", "indirect",
+"f3 0f 01 ca \teretu",},
+{{0xc4, 0xe2, 0x71, 0x6c, 0xda, }, 5, 0, "", "",
+"c4 e2 71 6c da \ttcmmimfp16ps %tmm1,%tmm2,%tmm3",},
+{{0xc4, 0xe2, 0x70, 0x6c, 0xda, }, 5, 0, "", "",
+"c4 e2 70 6c da \ttcmmrlfp16ps %tmm1,%tmm2,%tmm3",},
+{{0xc4, 0xe2, 0x73, 0x5c, 0xda, }, 5, 0, "", "",
+"c4 e2 73 5c da \ttdpfp16ps %tmm1,%tmm2,%tmm3",},
+{{0xd5, 0x10, 0xf6, 0xc2, 0x05, }, 5, 0, "", "",
+"d5 10 f6 c2 05 \ttest $0x5,%r18b",},
+{{0xd5, 0x10, 0xf7, 0xc2, 0x05, 0x00, 0x00, 0x00, }, 8, 0, "", "",
+"d5 10 f7 c2 05 00 00 00 \ttest $0x5,%r18d",},
+{{0xd5, 0x18, 0xf7, 0xc2, 0x05, 0x00, 0x00, 0x00, }, 8, 0, "", "",
+"d5 18 f7 c2 05 00 00 00 \ttest $0x5,%r18",},
+{{0x66, 0xd5, 0x10, 0xf7, 0xc2, 0x05, 0x00, }, 7, 0, "", "",
+"66 d5 10 f7 c2 05 00 \ttest $0x5,%r18w",},
+{{0x44, 0x0f, 0xaf, 0xf0, }, 4, 0, "", "",
+"44 0f af f0 \timul %eax,%r14d",},
+{{0xd5, 0xc0, 0xaf, 0xc8, }, 4, 0, "", "",
+"d5 c0 af c8 \timul %eax,%r17d",},
+{{0xd5, 0x90, 0x62, 0x12, }, 4, 0, "", "",
+"d5 90 62 12 \tpunpckldq %mm2,(%r18)",},
+{{0xd5, 0x40, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 40 8d 00 \tlea (%rax),%r16d",},
+{{0xd5, 0x44, 0x8d, 0x38, }, 4, 0, "", "",
+"d5 44 8d 38 \tlea (%rax),%r31d",},
+{{0xd5, 0x20, 0x8d, 0x04, 0x05, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 20 8d 04 05 00 00 00 00 \tlea 0x0(,%r16,1),%eax",},
+{{0xd5, 0x22, 0x8d, 0x04, 0x3d, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 22 8d 04 3d 00 00 00 00 \tlea 0x0(,%r31,1),%eax",},
+{{0xd5, 0x10, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 10 8d 00 \tlea (%r16),%eax",},
+{{0xd5, 0x11, 0x8d, 0x07, }, 4, 0, "", "",
+"d5 11 8d 07 \tlea (%r31),%eax",},
+{{0x4c, 0x8d, 0x38, }, 3, 0, "", "",
+"4c 8d 38 \tlea (%rax),%r15",},
+{{0xd5, 0x48, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 48 8d 00 \tlea (%rax),%r16",},
+{{0x49, 0x8d, 0x07, }, 3, 0, "", "",
+"49 8d 07 \tlea (%r15),%rax",},
+{{0xd5, 0x18, 0x8d, 0x00, }, 4, 0, "", "",
+"d5 18 8d 00 \tlea (%r16),%rax",},
+{{0x4a, 0x8d, 0x04, 0x3d, 0x00, 0x00, 0x00, 0x00, }, 8, 0, "", "",
+"4a 8d 04 3d 00 00 00 00 \tlea 0x0(,%r15,1),%rax",},
+{{0xd5, 0x28, 0x8d, 0x04, 0x05, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 28 8d 04 05 00 00 00 00 \tlea 0x0(,%r16,1),%rax",},
+{{0xd5, 0x1c, 0x03, 0x00, }, 4, 0, "", "",
+"d5 1c 03 00 \tadd (%r16),%r8",},
+{{0xd5, 0x1c, 0x03, 0x38, }, 4, 0, "", "",
+"d5 1c 03 38 \tadd (%r16),%r15",},
+{{0xd5, 0x4a, 0x8b, 0x04, 0x0d, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 4a 8b 04 0d 00 00 00 00 \tmov 0x0(,%r9,1),%r16",},
+{{0xd5, 0x4a, 0x8b, 0x04, 0x35, 0x00, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 4a 8b 04 35 00 00 00 00 \tmov 0x0(,%r14,1),%r16",},
+{{0xd5, 0x4d, 0x2b, 0x3a, }, 4, 0, "", "",
+"d5 4d 2b 3a \tsub (%r10),%r31",},
+{{0xd5, 0x4d, 0x2b, 0x7d, 0x00, }, 5, 0, "", "",
+"d5 4d 2b 7d 00 \tsub 0x0(%r13),%r31",},
+{{0xd5, 0x30, 0x8d, 0x44, 0x28, 0x01, }, 6, 0, "", "",
+"d5 30 8d 44 28 01 \tlea 0x1(%r16,%r21,1),%eax",},
+{{0xd5, 0x76, 0x8d, 0x7c, 0x10, 0x01, }, 6, 0, "", "",
+"d5 76 8d 7c 10 01 \tlea 0x1(%r16,%r26,1),%r31d",},
+{{0xd5, 0x12, 0x8d, 0x84, 0x0d, 0x81, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 12 8d 84 0d 81 00 00 00 \tlea 0x81(%r21,%r9,1),%eax",},
+{{0xd5, 0x57, 0x8d, 0xbc, 0x0a, 0x81, 0x00, 0x00, 0x00, }, 9, 0, "", "",
+"d5 57 8d bc 0a 81 00 00 00 \tlea 0x81(%r26,%r9,1),%r31d",},
+{{0xd5, 0x00, 0xa1, 0xef, 0xcd, 0xab, 0x90, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "jmp", "indirect",
+"d5 00 a1 ef cd ab 90 78 56 34 12 \tjmpabs $0x1234567890abcdef",},
+{{0xd5, 0x08, 0x53, }, 3, 0, "", "",
+"d5 08 53 \tpushp %rbx",},
+{{0xd5, 0x18, 0x50, }, 3, 0, "", "",
+"d5 18 50 \tpushp %r16",},
+{{0xd5, 0x19, 0x57, }, 3, 0, "", "",
+"d5 19 57 \tpushp %r31",},
+{{0xd5, 0x19, 0x5f, }, 3, 0, "", "",
+"d5 19 5f \tpopp %r31",},
+{{0xd5, 0x18, 0x58, }, 3, 0, "", "",
+"d5 18 58 \tpopp %r16",},
+{{0xd5, 0x08, 0x5b, }, 3, 0, "", "",
+"d5 08 5b \tpopp %rbx",},
+{{0x62, 0x72, 0x34, 0x00, 0xf7, 0xd2, }, 6, 0, "", "",
+"62 72 34 00 f7 d2 \tbextr %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x34, 0x00, 0xf7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f7 94 87 23 01 00 00 \tbextr %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x84, 0x00, 0xf7, 0xdf, }, 6, 0, "", "",
+"62 52 84 00 f7 df \tbextr %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x84, 0x00, 0xf7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 84 00 f7 bc 87 23 01 00 00 \tbextr %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0xda, 0x6c, 0x08, 0xf3, 0xd9, }, 6, 0, "", "",
+"62 da 6c 08 f3 d9 \tblsi %r25d,%edx",},
+{{0x62, 0xda, 0x84, 0x08, 0xf3, 0xdf, }, 6, 0, "", "",
+"62 da 84 08 f3 df \tblsi %r31,%r15",},
+{{0x62, 0xda, 0x34, 0x00, 0xf3, 0x9c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f3 9c 87 23 01 00 00 \tblsi 0x123(%r31,%rax,4),%r25d",},
+{{0x62, 0xda, 0x84, 0x00, 0xf3, 0x9c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 84 00 f3 9c 87 23 01 00 00 \tblsi 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0xda, 0x6c, 0x08, 0xf3, 0xd1, }, 6, 0, "", "",
+"62 da 6c 08 f3 d1 \tblsmsk %r25d,%edx",},
+{{0x62, 0xda, 0x84, 0x08, 0xf3, 0xd7, }, 6, 0, "", "",
+"62 da 84 08 f3 d7 \tblsmsk %r31,%r15",},
+{{0x62, 0xda, 0x34, 0x00, 0xf3, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f3 94 87 23 01 00 00 \tblsmsk 0x123(%r31,%rax,4),%r25d",},
+{{0x62, 0xda, 0x84, 0x00, 0xf3, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 84 00 f3 94 87 23 01 00 00 \tblsmsk 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0xda, 0x6c, 0x08, 0xf3, 0xc9, }, 6, 0, "", "",
+"62 da 6c 08 f3 c9 \tblsr %r25d,%edx",},
+{{0x62, 0xda, 0x84, 0x08, 0xf3, 0xcf, }, 6, 0, "", "",
+"62 da 84 08 f3 cf \tblsr %r31,%r15",},
+{{0x62, 0xda, 0x34, 0x00, 0xf3, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f3 8c 87 23 01 00 00 \tblsr 0x123(%r31,%rax,4),%r25d",},
+{{0x62, 0xda, 0x84, 0x00, 0xf3, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 84 00 f3 8c 87 23 01 00 00 \tblsr 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x72, 0x34, 0x00, 0xf5, 0xd2, }, 6, 0, "", "",
+"62 72 34 00 f5 d2 \tbzhi %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x34, 0x00, 0xf5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 34 00 f5 94 87 23 01 00 00 \tbzhi %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x84, 0x00, 0xf5, 0xdf, }, 6, 0, "", "",
+"62 52 84 00 f5 df \tbzhi %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x84, 0x00, 0xf5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 84 00 f5 bc 87 23 01 00 00 \tbzhi %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0xda, 0x35, 0x00, 0xe6, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e6 94 87 23 01 00 00 \tcmpbexadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe6, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e6 bc 87 23 01 00 00 \tcmpbexadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe2, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e2 94 87 23 01 00 00 \tcmpbxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe2, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e2 bc 87 23 01 00 00 \tcmpbxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xec, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ec 94 87 23 01 00 00 \tcmplxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xec, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ec bc 87 23 01 00 00 \tcmplxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e7 94 87 23 01 00 00 \tcmpnbexadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e7 bc 87 23 01 00 00 \tcmpnbexadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe3, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e3 94 87 23 01 00 00 \tcmpnbxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe3, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e3 bc 87 23 01 00 00 \tcmpnbxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xef, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ef 94 87 23 01 00 00 \tcmpnlexadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xef, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ef bc 87 23 01 00 00 \tcmpnlexadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xed, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ed 94 87 23 01 00 00 \tcmpnlxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xed, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ed bc 87 23 01 00 00 \tcmpnlxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe1, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e1 94 87 23 01 00 00 \tcmpnoxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe1, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e1 bc 87 23 01 00 00 \tcmpnoxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xeb, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 eb 94 87 23 01 00 00 \tcmpnpxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xeb, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 eb bc 87 23 01 00 00 \tcmpnpxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe9, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e9 94 87 23 01 00 00 \tcmpnsxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe9, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e9 bc 87 23 01 00 00 \tcmpnsxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e5 94 87 23 01 00 00 \tcmpnzxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e5 bc 87 23 01 00 00 \tcmpnzxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe0, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e0 94 87 23 01 00 00 \tcmpoxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe0, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e0 bc 87 23 01 00 00 \tcmpoxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xea, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 ea 94 87 23 01 00 00 \tcmppxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xea, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 ea bc 87 23 01 00 00 \tcmppxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe8, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e8 94 87 23 01 00 00 \tcmpsxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e8 bc 87 23 01 00 00 \tcmpsxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x35, 0x00, 0xe4, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 e4 94 87 23 01 00 00 \tcmpzxadd %r25d,%edx,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x85, 0x00, 0xe4, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 e4 bc 87 23 01 00 00 \tcmpzxadd %r31,%r15,0x123(%r31,%rax,4)",},
+{{0x62, 0xcc, 0xfc, 0x08, 0xf1, 0xf7, }, 6, 0, "", "",
+"62 cc fc 08 f1 f7 \tcrc32 %r31,%r22",},
+{{0x62, 0xcc, 0xfc, 0x08, 0xf1, 0x37, }, 6, 0, "", "",
+"62 cc fc 08 f1 37 \tcrc32q (%r31),%r22",},
+{{0x62, 0xec, 0xfc, 0x08, 0xf0, 0xcb, }, 6, 0, "", "",
+"62 ec fc 08 f0 cb \tcrc32 %r19b,%r17",},
+{{0x62, 0xec, 0x7c, 0x08, 0xf0, 0xeb, }, 6, 0, "", "",
+"62 ec 7c 08 f0 eb \tcrc32 %r19b,%r21d",},
+{{0x62, 0xfc, 0x7c, 0x08, 0xf0, 0x1b, }, 6, 0, "", "",
+"62 fc 7c 08 f0 1b \tcrc32b (%r19),%ebx",},
+{{0x62, 0xcc, 0x7c, 0x08, 0xf1, 0xff, }, 6, 0, "", "",
+"62 cc 7c 08 f1 ff \tcrc32 %r31d,%r23d",},
+{{0x62, 0xcc, 0x7c, 0x08, 0xf1, 0x3f, }, 6, 0, "", "",
+"62 cc 7c 08 f1 3f \tcrc32l (%r31),%r23d",},
+{{0x62, 0xcc, 0x7d, 0x08, 0xf1, 0xef, }, 6, 0, "", "",
+"62 cc 7d 08 f1 ef \tcrc32 %r31w,%r21d",},
+{{0x62, 0xcc, 0x7d, 0x08, 0xf1, 0x2f, }, 6, 0, "", "",
+"62 cc 7d 08 f1 2f \tcrc32w (%r31),%r21d",},
+{{0x62, 0xe4, 0xfc, 0x08, 0xf1, 0xd0, }, 6, 0, "", "",
+"62 e4 fc 08 f1 d0 \tcrc32 %rax,%r18",},
+{{0x67, 0x62, 0x4c, 0x7f, 0x08, 0xf8, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 12, 0, "", "",
+"67 62 4c 7f 08 f8 8c 87 23 01 00 00 \tenqcmd 0x123(%r31d,%eax,4),%r25d",},
+{{0x62, 0x4c, 0x7f, 0x08, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7f 08 f8 bc 87 23 01 00 00 \tenqcmd 0x123(%r31,%rax,4),%r31",},
+{{0x67, 0x62, 0x4c, 0x7e, 0x08, 0xf8, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 12, 0, "", "",
+"67 62 4c 7e 08 f8 8c 87 23 01 00 00 \tenqcmds 0x123(%r31d,%eax,4),%r25d",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f8 bc 87 23 01 00 00 \tenqcmds 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf0, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f0 bc 87 23 01 00 00 \tinvept 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf2, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f2 bc 87 23 01 00 00 \tinvpcid 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7e, 0x08, 0xf1, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7e 08 f1 bc 87 23 01 00 00 \tinvvpid 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x61, 0x7d, 0x08, 0x93, 0xcd, }, 6, 0, "", "",
+"62 61 7d 08 93 cd \tkmovb %k5,%r25d",},
+{{0x62, 0xd9, 0x7d, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7d 08 91 ac 87 23 01 00 00 \tkmovb %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0x7d, 0x08, 0x92, 0xe9, }, 6, 0, "", "",
+"62 d9 7d 08 92 e9 \tkmovb %r25d,%k5",},
+{{0x62, 0xd9, 0x7d, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7d 08 90 ac 87 23 01 00 00 \tkmovb 0x123(%r31,%rax,4),%k5",},
+{{0x62, 0x61, 0x7f, 0x08, 0x93, 0xcd, }, 6, 0, "", "",
+"62 61 7f 08 93 cd \tkmovd %k5,%r25d",},
+{{0x62, 0xd9, 0xfd, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fd 08 91 ac 87 23 01 00 00 \tkmovd %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0x7f, 0x08, 0x92, 0xe9, }, 6, 0, "", "",
+"62 d9 7f 08 92 e9 \tkmovd %r25d,%k5",},
+{{0x62, 0xd9, 0xfd, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fd 08 90 ac 87 23 01 00 00 \tkmovd 0x123(%r31,%rax,4),%k5",},
+{{0x62, 0x61, 0xff, 0x08, 0x93, 0xfd, }, 6, 0, "", "",
+"62 61 ff 08 93 fd \tkmovq %k5,%r31",},
+{{0x62, 0xd9, 0xfc, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fc 08 91 ac 87 23 01 00 00 \tkmovq %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0xff, 0x08, 0x92, 0xef, }, 6, 0, "", "",
+"62 d9 ff 08 92 ef \tkmovq %r31,%k5",},
+{{0x62, 0xd9, 0xfc, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 fc 08 90 ac 87 23 01 00 00 \tkmovq 0x123(%r31,%rax,4),%k5",},
+{{0x62, 0x61, 0x7c, 0x08, 0x93, 0xcd, }, 6, 0, "", "",
+"62 61 7c 08 93 cd \tkmovw %k5,%r25d",},
+{{0x62, 0xd9, 0x7c, 0x08, 0x91, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7c 08 91 ac 87 23 01 00 00 \tkmovw %k5,0x123(%r31,%rax,4)",},
+{{0x62, 0xd9, 0x7c, 0x08, 0x92, 0xe9, }, 6, 0, "", "",
+"62 d9 7c 08 92 e9 \tkmovw %r25d,%k5",},
+{{0x62, 0xd9, 0x7c, 0x08, 0x90, 0xac, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d9 7c 08 90 ac 87 23 01 00 00 \tkmovw 0x123(%r31,%rax,4),%k5",},
+{{0x62, 0xda, 0x7c, 0x08, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7c 08 49 84 87 23 01 00 00 \tldtilecfg 0x123(%r31,%rax,4)",},
+{{0x62, 0xfc, 0x7d, 0x08, 0x60, 0xc2, }, 6, 0, "", "",
+"62 fc 7d 08 60 c2 \tmovbe %r18w,%ax",},
+{{0x62, 0xd4, 0x7d, 0x08, 0x60, 0xc7, }, 6, 0, "", "",
+"62 d4 7d 08 60 c7 \tmovbe %r15w,%ax",},
+{{0x62, 0xec, 0x7d, 0x08, 0x61, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 ec 7d 08 61 94 80 23 01 00 00 \tmovbe %r18w,0x123(%r16,%rax,4)",},
+{{0x62, 0xcc, 0x7d, 0x08, 0x61, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 cc 7d 08 61 94 87 23 01 00 00 \tmovbe %r18w,0x123(%r31,%rax,4)",},
+{{0x62, 0xdc, 0x7c, 0x08, 0x60, 0xd1, }, 6, 0, "", "",
+"62 dc 7c 08 60 d1 \tmovbe %r25d,%edx",},
+{{0x62, 0xd4, 0x7c, 0x08, 0x60, 0xd7, }, 6, 0, "", "",
+"62 d4 7c 08 60 d7 \tmovbe %r15d,%edx",},
+{{0x62, 0x6c, 0x7c, 0x08, 0x61, 0x8c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 6c 7c 08 61 8c 80 23 01 00 00 \tmovbe %r25d,0x123(%r16,%rax,4)",},
+{{0x62, 0x5c, 0xfc, 0x08, 0x60, 0xff, }, 6, 0, "", "",
+"62 5c fc 08 60 ff \tmovbe %r31,%r15",},
+{{0x62, 0x54, 0xfc, 0x08, 0x60, 0xf8, }, 6, 0, "", "",
+"62 54 fc 08 60 f8 \tmovbe %r8,%r15",},
+{{0x62, 0x6c, 0xfc, 0x08, 0x61, 0xbc, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 6c fc 08 61 bc 80 23 01 00 00 \tmovbe %r31,0x123(%r16,%rax,4)",},
+{{0x62, 0x4c, 0xfc, 0x08, 0x61, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fc 08 61 bc 87 23 01 00 00 \tmovbe %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0x6c, 0xfc, 0x08, 0x60, 0xbc, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 6c fc 08 60 bc 80 23 01 00 00 \tmovbe 0x123(%r16,%rax,4),%r31",},
+{{0x62, 0xcc, 0x7d, 0x08, 0x60, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 cc 7d 08 60 94 87 23 01 00 00 \tmovbe 0x123(%r31,%rax,4),%r18w",},
+{{0x62, 0x4c, 0x7c, 0x08, 0x60, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7c 08 60 8c 87 23 01 00 00 \tmovbe 0x123(%r31,%rax,4),%r25d",},
+{{0x67, 0x62, 0x4c, 0x7d, 0x08, 0xf8, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 12, 0, "", "",
+"67 62 4c 7d 08 f8 8c 87 23 01 00 00 \tmovdir64b 0x123(%r31d,%eax,4),%r25d",},
+{{0x62, 0x4c, 0x7d, 0x08, 0xf8, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7d 08 f8 bc 87 23 01 00 00 \tmovdir64b 0x123(%r31,%rax,4),%r31",},
+{{0x62, 0x4c, 0x7c, 0x08, 0xf9, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7c 08 f9 8c 87 23 01 00 00 \tmovdiri %r25d,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0xfc, 0x08, 0xf9, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fc 08 f9 bc 87 23 01 00 00 \tmovdiri %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0x5a, 0x6f, 0x08, 0xf5, 0xd1, }, 6, 0, "", "",
+"62 5a 6f 08 f5 d1 \tpdep %r25d,%edx,%r10d",},
+{{0x62, 0x5a, 0x87, 0x08, 0xf5, 0xdf, }, 6, 0, "", "",
+"62 5a 87 08 f5 df \tpdep %r31,%r15,%r11",},
+{{0x62, 0xda, 0x37, 0x00, 0xf5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 37 00 f5 94 87 23 01 00 00 \tpdep 0x123(%r31,%rax,4),%r25d,%edx",},
+{{0x62, 0x5a, 0x87, 0x00, 0xf5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 87 00 f5 bc 87 23 01 00 00 \tpdep 0x123(%r31,%rax,4),%r31,%r15",},
+{{0x62, 0x5a, 0x6e, 0x08, 0xf5, 0xd1, }, 6, 0, "", "",
+"62 5a 6e 08 f5 d1 \tpext %r25d,%edx,%r10d",},
+{{0x62, 0x5a, 0x86, 0x08, 0xf5, 0xdf, }, 6, 0, "", "",
+"62 5a 86 08 f5 df \tpext %r31,%r15,%r11",},
+{{0x62, 0xda, 0x36, 0x00, 0xf5, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 36 00 f5 94 87 23 01 00 00 \tpext 0x123(%r31,%rax,4),%r25d,%edx",},
+{{0x62, 0x5a, 0x86, 0x00, 0xf5, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 86 00 f5 bc 87 23 01 00 00 \tpext 0x123(%r31,%rax,4),%r31,%r15",},
+{{0x62, 0x72, 0x35, 0x00, 0xf7, 0xd2, }, 6, 0, "", "",
+"62 72 35 00 f7 d2 \tshlx %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x35, 0x00, 0xf7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 35 00 f7 94 87 23 01 00 00 \tshlx %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x85, 0x00, 0xf7, 0xdf, }, 6, 0, "", "",
+"62 52 85 00 f7 df \tshlx %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x85, 0x00, 0xf7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 85 00 f7 bc 87 23 01 00 00 \tshlx %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0x72, 0x37, 0x00, 0xf7, 0xd2, }, 6, 0, "", "",
+"62 72 37 00 f7 d2 \tshrx %r25d,%edx,%r10d",},
+{{0x62, 0xda, 0x37, 0x00, 0xf7, 0x94, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 37 00 f7 94 87 23 01 00 00 \tshrx %r25d,0x123(%r31,%rax,4),%edx",},
+{{0x62, 0x52, 0x87, 0x00, 0xf7, 0xdf, }, 6, 0, "", "",
+"62 52 87 00 f7 df \tshrx %r31,%r15,%r11",},
+{{0x62, 0x5a, 0x87, 0x00, 0xf7, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 5a 87 00 f7 bc 87 23 01 00 00 \tshrx %r31,0x123(%r31,%rax,4),%r15",},
+{{0x62, 0xda, 0x7d, 0x08, 0x49, 0x84, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7d 08 49 84 87 23 01 00 00 \tsttilecfg 0x123(%r31,%rax,4)",},
+{{0x62, 0xda, 0x7f, 0x08, 0x4b, 0xb4, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7f 08 4b b4 87 23 01 00 00 \ttileloadd 0x123(%r31,%rax,4),%tmm6",},
+{{0x62, 0xda, 0x7d, 0x08, 0x4b, 0xb4, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7d 08 4b b4 87 23 01 00 00 \ttileloaddt1 0x123(%r31,%rax,4),%tmm6",},
+{{0x62, 0xda, 0x7e, 0x08, 0x4b, 0xb4, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 da 7e 08 4b b4 87 23 01 00 00 \ttilestored %tmm6,0x123(%r31,%rax,4)",},
+{{0x62, 0xfa, 0x7d, 0x28, 0x1a, 0x18, }, 6, 0, "", "",
+"62 fa 7d 28 1a 18 \tvbroadcastf32x4 (%r16),%ymm3",},
+{{0x62, 0xfa, 0x7d, 0x28, 0x5a, 0x18, }, 6, 0, "", "",
+"62 fa 7d 28 5a 18 \tvbroadcasti32x4 (%r16),%ymm3",},
+{{0x62, 0xfb, 0x7d, 0x28, 0x19, 0x18, 0x01, }, 7, 0, "", "",
+"62 fb 7d 28 19 18 01 \tvextractf32x4 $0x1,%ymm3,(%r16)",},
+{{0x62, 0xfb, 0x7d, 0x28, 0x39, 0x18, 0x01, }, 7, 0, "", "",
+"62 fb 7d 28 39 18 01 \tvextracti32x4 $0x1,%ymm3,(%r16)",},
+{{0x62, 0x7b, 0x65, 0x28, 0x18, 0x00, 0x01, }, 7, 0, "", "",
+"62 7b 65 28 18 00 01 \tvinsertf32x4 $0x1,(%r16),%ymm3,%ymm8",},
+{{0x62, 0x7b, 0x65, 0x28, 0x38, 0x00, 0x01, }, 7, 0, "", "",
+"62 7b 65 28 38 00 01 \tvinserti32x4 $0x1,(%r16),%ymm3,%ymm8",},
+{{0x62, 0xdb, 0xfd, 0x08, 0x09, 0x30, 0x01, }, 7, 0, "", "",
+"62 db fd 08 09 30 01 \tvrndscalepd $0x1,(%r24),%xmm6",},
+{{0x62, 0xdb, 0x7d, 0x08, 0x08, 0x30, 0x02, }, 7, 0, "", "",
+"62 db 7d 08 08 30 02 \tvrndscaleps $0x2,(%r24),%xmm6",},
+{{0x62, 0xdb, 0xcd, 0x08, 0x0b, 0x18, 0x03, }, 7, 0, "", "",
+"62 db cd 08 0b 18 03 \tvrndscalesd $0x3,(%r24),%xmm6,%xmm3",},
+{{0x62, 0xdb, 0x4d, 0x08, 0x0a, 0x18, 0x04, }, 7, 0, "", "",
+"62 db 4d 08 0a 18 04 \tvrndscaless $0x4,(%r24),%xmm6,%xmm3",},
+{{0x62, 0x4c, 0x7c, 0x08, 0x66, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7c 08 66 8c 87 23 01 00 00 \twrssd %r25d,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0xfc, 0x08, 0x66, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fc 08 66 bc 87 23 01 00 00 \twrssq %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0x7d, 0x08, 0x65, 0x8c, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c 7d 08 65 8c 87 23 01 00 00 \twrussd %r25d,0x123(%r31,%rax,4)",},
+{{0x62, 0x4c, 0xfd, 0x08, 0x65, 0xbc, 0x87, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 4c fd 08 65 bc 87 23 01 00 00 \twrussq %r31,0x123(%r31,%rax,4)",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xd0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 d0 34 12 \tadc $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x10, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 10 f9 \tadc %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x11, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 11 38 \tadc %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x12, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 12 04 07 \tadc (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x13, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 13 04 07 \tadc (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x14, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 14 83 11 \tadc $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0x54, 0x6d, 0x10, 0x66, 0xc7, }, 6, 0, "", "",
+"62 54 6d 10 66 c7 \tadcx %r15d,%r8d,%r18d",},
+{{0x62, 0x14, 0xf9, 0x08, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 f9 08 66 04 3f \tadcx (%r15,%r31,1),%r8",},
+{{0x62, 0x14, 0x69, 0x10, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 69 10 66 04 3f \tadcx (%r15,%r31,1),%r8d,%r18d",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xc0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 c0 34 12 \tadd $0x1234,%ax,%r30w",},
+{{0x62, 0xd4, 0xfc, 0x10, 0x81, 0xc7, 0x33, 0x44, 0x34, 0x12, }, 10, 0, "", "",
+"62 d4 fc 10 81 c7 33 44 34 12 \tadd $0x12344433,%r15,%r16",},
+{{0x62, 0xd4, 0x74, 0x10, 0x80, 0xc5, 0x34, }, 7, 0, "", "",
+"62 d4 74 10 80 c5 34 \tadd $0x34,%r13b,%r17b",},
+{{0x62, 0xf4, 0xbc, 0x18, 0x81, 0xc0, 0x11, 0x22, 0x33, 0xf4, }, 10, 0, "", "",
+"62 f4 bc 18 81 c0 11 22 33 f4 \tadd $0xfffffffff4332211,%rax,%r8",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 fc 10 01 f8 \tadd %r31,%r8,%r16",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0x38, }, 6, 0, "", "",
+"62 44 fc 10 01 38 \tadd %r31,(%r8),%r16",},
+{{0x62, 0x44, 0xf8, 0x10, 0x01, 0x3c, 0xc0, }, 7, 0, "", "",
+"62 44 f8 10 01 3c c0 \tadd %r31,(%r8,%r16,8),%r16",},
+{{0x62, 0x44, 0x7c, 0x10, 0x00, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 00 f8 \tadd %r31b,%r8b,%r16b",},
+{{0x62, 0x44, 0x7c, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 01 f8 \tadd %r31d,%r8d,%r16d",},
+{{0x62, 0x44, 0x7d, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7d 10 01 f8 \tadd %r31w,%r8w,%r16w",},
+{{0x62, 0x5c, 0xfc, 0x10, 0x03, 0x07, }, 6, 0, "", "",
+"62 5c fc 10 03 07 \tadd (%r31),%r8,%r16",},
+{{0x62, 0x5c, 0xf8, 0x10, 0x03, 0x84, 0x07, 0x90, 0x90, 0x00, 0x00, }, 11, 0, "", "",
+"62 5c f8 10 03 84 07 90 90 00 00 \tadd 0x9090(%r31,%r16,1),%r8,%r16",},
+{{0x62, 0x44, 0x7c, 0x10, 0x00, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 00 f8 \tadd %r31b,%r8b,%r16b",},
+{{0x62, 0x44, 0x7c, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7c 10 01 f8 \tadd %r31d,%r8d,%r16d",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x04, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 04 83 11 \tadd $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 fc 10 01 f8 \tadd %r31,%r8,%r16",},
+{{0x62, 0xd4, 0xfc, 0x10, 0x81, 0x04, 0x8f, 0x33, 0x44, 0x34, 0x12, }, 11, 0, "", "",
+"62 d4 fc 10 81 04 8f 33 44 34 12 \tadd $0x12344433,(%r15,%rcx,4),%r16",},
+{{0x62, 0x44, 0x7d, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 7d 10 01 f8 \tadd %r31w,%r8w,%r16w",},
+{{0x62, 0x54, 0x6e, 0x10, 0x66, 0xc7, }, 6, 0, "", "",
+"62 54 6e 10 66 c7 \tadox %r15d,%r8d,%r18d",},
+{{0x62, 0x5c, 0xfc, 0x10, 0x03, 0xc7, }, 6, 0, "", "",
+"62 5c fc 10 03 c7 \tadd %r31,%r8,%r16",},
+{{0x62, 0x44, 0xfc, 0x10, 0x01, 0xf8, }, 6, 0, "", "",
+"62 44 fc 10 01 f8 \tadd %r31,%r8,%r16",},
+{{0x62, 0x14, 0xfa, 0x08, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 fa 08 66 04 3f \tadox (%r15,%r31,1),%r8",},
+{{0x62, 0x14, 0x6a, 0x10, 0x66, 0x04, 0x3f, }, 7, 0, "", "",
+"62 14 6a 10 66 04 3f \tadox (%r15,%r31,1),%r8d,%r18d",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xe0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 e0 34 12 \tand $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x20, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 20 f9 \tand %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x21, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 21 38 \tand %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x22, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 22 04 07 \tand (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x23, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 23 04 07 \tand (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x24, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 24 83 11 \tand $0x11,(%r19,%rax,4),%r20d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x47, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 47 90 90 90 90 90 \tcmova -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x43, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 43 90 90 90 90 90 \tcmovae -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x42, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 42 90 90 90 90 90 \tcmovb -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x46, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 46 90 90 90 90 90 \tcmovbe -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x44, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 44 90 90 90 90 90 \tcmove -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4f, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4f 90 90 90 90 90 \tcmovg -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4d, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4d 90 90 90 90 90 \tcmovge -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4c, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4c 90 90 90 90 90 \tcmovl -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4e, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4e 90 90 90 90 90 \tcmovle -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x45, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 45 90 90 90 90 90 \tcmovne -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x41, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 41 90 90 90 90 90 \tcmovno -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4b, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4b 90 90 90 90 90 \tcmovnp -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x49, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 49 90 90 90 90 90 \tcmovns -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x40, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 40 90 90 90 90 90 \tcmovo -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x4a, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 4a 90 90 90 90 90 \tcmovp -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0x48, 0x90, 0x90, 0x90, 0x90, 0x90, }, 11, 0, "", "",
+"67 62 f4 3c 18 48 90 90 90 90 90 \tcmovs -0x6f6f6f70(%eax),%edx,%r8d",},
+{{0x62, 0xf4, 0xf4, 0x10, 0xff, 0xc8, }, 6, 0, "", "",
+"62 f4 f4 10 ff c8 \tdec %rax,%r17",},
+{{0x62, 0x9c, 0x3c, 0x18, 0xfe, 0x0c, 0x27, }, 7, 0, "", "",
+"62 9c 3c 18 fe 0c 27 \tdec (%r31,%r12,1),%r8b",},
+{{0x62, 0xb4, 0xb0, 0x10, 0xaf, 0x94, 0xf8, 0x09, 0x09, 0x00, 0x00, }, 11, 0, "", "",
+"62 b4 b0 10 af 94 f8 09 09 00 00 \timul 0x909(%rax,%r31,8),%rdx,%r25",},
+{{0x67, 0x62, 0xf4, 0x3c, 0x18, 0xaf, 0x90, 0x09, 0x09, 0x09, 0x00, }, 11, 0, "", "",
+"67 62 f4 3c 18 af 90 09 09 09 00 \timul 0x90909(%eax),%edx,%r8d",},
+{{0x62, 0xdc, 0xfc, 0x10, 0xff, 0xc7, }, 6, 0, "", "",
+"62 dc fc 10 ff c7 \tinc %r31,%r16",},
+{{0x62, 0xdc, 0xbc, 0x18, 0xff, 0xc7, }, 6, 0, "", "",
+"62 dc bc 18 ff c7 \tinc %r31,%r8",},
+{{0x62, 0xf4, 0xe4, 0x18, 0xff, 0xc0, }, 6, 0, "", "",
+"62 f4 e4 18 ff c0 \tinc %rax,%rbx",},
+{{0x62, 0xf4, 0xf4, 0x10, 0xf7, 0xd8, }, 6, 0, "", "",
+"62 f4 f4 10 f7 d8 \tneg %rax,%r17",},
+{{0x62, 0x9c, 0x3c, 0x18, 0xf6, 0x1c, 0x27, }, 7, 0, "", "",
+"62 9c 3c 18 f6 1c 27 \tneg (%r31,%r12,1),%r8b",},
+{{0x62, 0xf4, 0xf4, 0x10, 0xf7, 0xd0, }, 6, 0, "", "",
+"62 f4 f4 10 f7 d0 \tnot %rax,%r17",},
+{{0x62, 0x9c, 0x3c, 0x18, 0xf6, 0x14, 0x27, }, 7, 0, "", "",
+"62 9c 3c 18 f6 14 27 \tnot (%r31,%r12,1),%r8b",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xc8, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 c8 34 12 \tor $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x08, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 08 f9 \tor %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x09, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 09 38 \tor %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x0a, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 0a 04 07 \tor (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x0b, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 0b 04 07 \tor (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x0c, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 0c 83 11 \tor $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xd4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 d4 02 \trcl $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xd0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 d0 \trcl %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x10, }, 6, 0, "", "",
+"62 f4 04 10 d0 10 \trcl $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x10, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 10 02 \trcl $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x10, }, 6, 0, "", "",
+"62 f4 05 10 d1 10 \trcl $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x14, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 14 83 \trcl %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xdc, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 dc 02 \trcr $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xd8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 d8 \trcr %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x18, }, 6, 0, "", "",
+"62 f4 04 10 d0 18 \trcr $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x18, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 18 02 \trcr $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x18, }, 6, 0, "", "",
+"62 f4 05 10 d1 18 \trcr $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x1c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 1c 83 \trcr %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xc4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 c4 02 \trol $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xc0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 c0 \trol %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x00, }, 6, 0, "", "",
+"62 f4 04 10 d0 00 \trol $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x00, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 00 02 \trol $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x00, }, 6, 0, "", "",
+"62 f4 05 10 d1 00 \trol $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x04, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 04 83 \trol %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xcc, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 cc 02 \tror $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xc8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 c8 \tror %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x08, }, 6, 0, "", "",
+"62 f4 04 10 d0 08 \tror $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x08, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 08 02 \tror $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x08, }, 6, 0, "", "",
+"62 f4 05 10 d1 08 \tror $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x0c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 0c 83 \tror %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xfc, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 fc 02 \tsar $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xf8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 f8 \tsar %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x38, }, 6, 0, "", "",
+"62 f4 04 10 d0 38 \tsar $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x38, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 38 02 \tsar $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x38, }, 6, 0, "", "",
+"62 f4 05 10 d1 38 \tsar $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x3c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 3c 83 \tsar %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xd8, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 d8 34 12 \tsbb $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x18, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 18 f9 \tsbb %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x19, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 19 38 \tsbb %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x1a, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 1a 04 07 \tsbb (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x1b, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 1b 04 07 \tsbb (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x1c, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 1c 83 11 \tsbb $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xe4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 e4 02 \tshl $0x2,%r12b,%r31b",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xe4, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 e4 02 \tshl $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xe0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 e0 \tshl %cl,%r16b,%r8b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xe0, }, 6, 0, "", "",
+"62 fc 3c 18 d2 e0 \tshl %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x20, }, 6, 0, "", "",
+"62 f4 04 10 d0 20 \tshl $1,(%rax),%r31b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x20, }, 6, 0, "", "",
+"62 f4 04 10 d0 20 \tshl $1,(%rax),%r31b",},
+{{0x62, 0x74, 0x84, 0x10, 0x24, 0x20, 0x01, }, 7, 0, "", "",
+"62 74 84 10 24 20 01 \tshld $0x1,%r12,(%rax),%r31",},
+{{0x62, 0x74, 0x04, 0x10, 0x24, 0x38, 0x02, }, 7, 0, "", "",
+"62 74 04 10 24 38 02 \tshld $0x2,%r15d,(%rax),%r31d",},
+{{0x62, 0x54, 0x05, 0x10, 0x24, 0xc4, 0x02, }, 7, 0, "", "",
+"62 54 05 10 24 c4 02 \tshld $0x2,%r8w,%r12w,%r31w",},
+{{0x62, 0x7c, 0xbc, 0x18, 0xa5, 0xe0, }, 6, 0, "", "",
+"62 7c bc 18 a5 e0 \tshld %cl,%r12,%r16,%r8",},
+{{0x62, 0x7c, 0x05, 0x10, 0xa5, 0x2c, 0x83, }, 7, 0, "", "",
+"62 7c 05 10 a5 2c 83 \tshld %cl,%r13w,(%r19,%rax,4),%r31w",},
+{{0x62, 0x74, 0x05, 0x10, 0xa5, 0x08, }, 6, 0, "", "",
+"62 74 05 10 a5 08 \tshld %cl,%r9w,(%rax),%r31w",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x20, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 20 02 \tshl $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x20, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 20 02 \tshl $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x20, }, 6, 0, "", "",
+"62 f4 05 10 d1 20 \tshl $1,(%rax),%r31w",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x20, }, 6, 0, "", "",
+"62 f4 05 10 d1 20 \tshl $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x24, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 24 83 \tshl %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x24, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 24 83 \tshl %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xd4, 0x04, 0x10, 0xc0, 0xec, 0x02, }, 7, 0, "", "",
+"62 d4 04 10 c0 ec 02 \tshr $0x2,%r12b,%r31b",},
+{{0x62, 0xfc, 0x3c, 0x18, 0xd2, 0xe8, }, 6, 0, "", "",
+"62 fc 3c 18 d2 e8 \tshr %cl,%r16b,%r8b",},
+{{0x62, 0xf4, 0x04, 0x10, 0xd0, 0x28, }, 6, 0, "", "",
+"62 f4 04 10 d0 28 \tshr $1,(%rax),%r31b",},
+{{0x62, 0x74, 0x84, 0x10, 0x2c, 0x20, 0x01, }, 7, 0, "", "",
+"62 74 84 10 2c 20 01 \tshrd $0x1,%r12,(%rax),%r31",},
+{{0x62, 0x74, 0x04, 0x10, 0x2c, 0x38, 0x02, }, 7, 0, "", "",
+"62 74 04 10 2c 38 02 \tshrd $0x2,%r15d,(%rax),%r31d",},
+{{0x62, 0x54, 0x05, 0x10, 0x2c, 0xc4, 0x02, }, 7, 0, "", "",
+"62 54 05 10 2c c4 02 \tshrd $0x2,%r8w,%r12w,%r31w",},
+{{0x62, 0x7c, 0xbc, 0x18, 0xad, 0xe0, }, 6, 0, "", "",
+"62 7c bc 18 ad e0 \tshrd %cl,%r12,%r16,%r8",},
+{{0x62, 0x7c, 0x05, 0x10, 0xad, 0x2c, 0x83, }, 7, 0, "", "",
+"62 7c 05 10 ad 2c 83 \tshrd %cl,%r13w,(%r19,%rax,4),%r31w",},
+{{0x62, 0x74, 0x05, 0x10, 0xad, 0x08, }, 6, 0, "", "",
+"62 74 05 10 ad 08 \tshrd %cl,%r9w,(%rax),%r31w",},
+{{0x62, 0xf4, 0x04, 0x10, 0xc1, 0x28, 0x02, }, 7, 0, "", "",
+"62 f4 04 10 c1 28 02 \tshr $0x2,(%rax),%r31d",},
+{{0x62, 0xf4, 0x05, 0x10, 0xd1, 0x28, }, 6, 0, "", "",
+"62 f4 05 10 d1 28 \tshr $1,(%rax),%r31w",},
+{{0x62, 0xfc, 0x05, 0x10, 0xd3, 0x2c, 0x83, }, 7, 0, "", "",
+"62 fc 05 10 d3 2c 83 \tshr %cl,(%r19,%rax,4),%r31w",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xe8, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 e8 34 12 \tsub $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x28, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 28 f9 \tsub %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x29, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 29 38 \tsub %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x2a, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 2a 04 07 \tsub (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x2b, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 2b 04 07 \tsub (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x2c, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 2c 83 11 \tsub $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xf4, 0x0d, 0x10, 0x81, 0xf0, 0x34, 0x12, }, 8, 0, "", "",
+"62 f4 0d 10 81 f0 34 12 \txor $0x1234,%ax,%r30w",},
+{{0x62, 0x7c, 0x6c, 0x10, 0x30, 0xf9, }, 6, 0, "", "",
+"62 7c 6c 10 30 f9 \txor %r15b,%r17b,%r18b",},
+{{0x62, 0x54, 0x6c, 0x10, 0x31, 0x38, }, 6, 0, "", "",
+"62 54 6c 10 31 38 \txor %r15d,(%r8),%r18d",},
+{{0x62, 0xc4, 0x3c, 0x18, 0x32, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3c 18 32 04 07 \txor (%r15,%rax,1),%r16b,%r8b",},
+{{0x62, 0xc4, 0x3d, 0x18, 0x33, 0x04, 0x07, }, 7, 0, "", "",
+"62 c4 3d 18 33 04 07 \txor (%r15,%rax,1),%r16w,%r8w",},
+{{0x62, 0xfc, 0x5c, 0x10, 0x83, 0x34, 0x83, 0x11, }, 8, 0, "", "",
+"62 fc 5c 10 83 34 83 11 \txor $0x11,(%r19,%rax,4),%r20d",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x00, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 00 da \t{nf} add %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x01, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 01 d0 \t{nf} add %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x02, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 02 9c 80 23 01 00 00 \t{nf} add 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x03, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 03 94 80 23 01 00 00 \t{nf} add 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x08, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 08 da \t{nf} or %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x09, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 09 d0 \t{nf} or %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x0a, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 0a 9c 80 23 01 00 00 \t{nf} or 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x0b, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 0b 94 80 23 01 00 00 \t{nf} or 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x20, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 20 da \t{nf} and %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x21, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 21 d0 \t{nf} and %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x22, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 22 9c 80 23 01 00 00 \t{nf} and 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x23, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 23 94 80 23 01 00 00 \t{nf} and 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x24, 0xd0, 0x7b, }, 7, 0, "", "",
+"62 f4 35 1c 24 d0 7b \t{nf} shld $0x7b,%dx,%ax,%r9w",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x28, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 28 da \t{nf} sub %bl,%dl,%r8b",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x29, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c 29 d0 \t{nf} sub %dx,%ax,%r9w",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x2a, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 2a 9c 80 23 01 00 00 \t{nf} sub 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x2b, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 2b 94 80 23 01 00 00 \t{nf} sub 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0xf4, 0x35, 0x1c, 0x2c, 0xd0, 0x7b, }, 7, 0, "", "",
+"62 f4 35 1c 2c d0 7b \t{nf} shrd $0x7b,%dx,%ax,%r9w",},
+{{0x62, 0xf4, 0x3c, 0x1c, 0x30, 0xda, }, 6, 0, "", "",
+"62 f4 3c 1c 30 da \t{nf} xor %bl,%dl,%r8b",},
+{{0x62, 0x4c, 0xfc, 0x0c, 0x31, 0xff, }, 6, 0, "", "",
+"62 4c fc 0c 31 ff \t{nf} xor %r31,%r31",},
+{{0x62, 0xd4, 0x6c, 0x1c, 0x32, 0x9c, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 6c 1c 32 9c 80 23 01 00 00 \t{nf} xor 0x123(%r8,%rax,4),%bl,%dl",},
+{{0x62, 0xd4, 0x7d, 0x1c, 0x33, 0x94, 0x80, 0x23, 0x01, 0x00, 0x00, }, 11, 0, "", "",
+"62 d4 7d 1c 33 94 80 23 01 00 00 \t{nf} xor 0x123(%r8,%rax,4),%dx,%ax",},
+{{0x62, 0x54, 0xfc, 0x0c, 0x69, 0xf9, 0x90, 0xff, 0x00, 0x00, }, 10, 0, "", "",
+"62 54 fc 0c 69 f9 90 ff 00 00 \t{nf} imul $0xff90,%r9,%r15",},
+{{0x62, 0x54, 0xfc, 0x0c, 0x6b, 0xf9, 0x7b, }, 7, 0, "", "",
+"62 54 fc 0c 6b f9 7b \t{nf} imul $0x7b,%r9,%r15",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0x80, 0xf3, 0x7b, }, 7, 0, "", "",
+"62 f4 6c 1c 80 f3 7b \t{nf} xor $0x7b,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0x83, 0xf2, 0x7b, }, 7, 0, "", "",
+"62 f4 7d 1c 83 f2 7b \t{nf} xor $0x7b,%dx,%ax",},
+{{0x62, 0x44, 0xfc, 0x0c, 0x88, 0xf9, }, 6, 0, "", "",
+"62 44 fc 0c 88 f9 \t{nf} popcnt %r9,%r31",},
+{{0x62, 0xf4, 0x35, 0x1c, 0xa5, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c a5 d0 \t{nf} shld %cl,%dx,%ax,%r9w",},
+{{0x62, 0xf4, 0x35, 0x1c, 0xad, 0xd0, }, 6, 0, "", "",
+"62 f4 35 1c ad d0 \t{nf} shrd %cl,%dx,%ax,%r9w",},
+{{0x62, 0x44, 0xa4, 0x1c, 0xaf, 0xf9, }, 6, 0, "", "",
+"62 44 a4 1c af f9 \t{nf} imul %r9,%r31,%r11",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xc0, 0xfb, 0x7b, }, 7, 0, "", "",
+"62 f4 6c 1c c0 fb 7b \t{nf} sar $0x7b,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xc1, 0xfa, 0x7b, }, 7, 0, "", "",
+"62 f4 7d 1c c1 fa 7b \t{nf} sar $0x7b,%dx,%ax",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xd0, 0xfb, }, 6, 0, "", "",
+"62 f4 6c 1c d0 fb \t{nf} sar $1,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xd1, 0xfa, }, 6, 0, "", "",
+"62 f4 7d 1c d1 fa \t{nf} sar $1,%dx,%ax",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xd2, 0xfb, }, 6, 0, "", "",
+"62 f4 6c 1c d2 fb \t{nf} sar %cl,%bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xd3, 0xfa, }, 6, 0, "", "",
+"62 f4 7d 1c d3 fa \t{nf} sar %cl,%dx,%ax",},
+{{0x62, 0x52, 0x84, 0x04, 0xf2, 0xd9, }, 6, 0, "", "",
+"62 52 84 04 f2 d9 \t{nf} andn %r9,%r31,%r11",},
+{{0x62, 0xd2, 0x84, 0x04, 0xf3, 0xd9, }, 6, 0, "", "",
+"62 d2 84 04 f3 d9 \t{nf} blsi %r9,%r31",},
+{{0x62, 0x44, 0xfc, 0x0c, 0xf4, 0xf9, }, 6, 0, "", "",
+"62 44 fc 0c f4 f9 \t{nf} tzcnt %r9,%r31",},
+{{0x62, 0x44, 0xfc, 0x0c, 0xf5, 0xf9, }, 6, 0, "", "",
+"62 44 fc 0c f5 f9 \t{nf} lzcnt %r9,%r31",},
+{{0x62, 0xf4, 0x7c, 0x0c, 0xf6, 0xfb, }, 6, 0, "", "",
+"62 f4 7c 0c f6 fb \t{nf} idiv %bl",},
+{{0x62, 0xf4, 0x7d, 0x0c, 0xf7, 0xfa, }, 6, 0, "", "",
+"62 f4 7d 0c f7 fa \t{nf} idiv %dx",},
+{{0x62, 0xf4, 0x6c, 0x1c, 0xfe, 0xcb, }, 6, 0, "", "",
+"62 f4 6c 1c fe cb \t{nf} dec %bl,%dl",},
+{{0x62, 0xf4, 0x7d, 0x1c, 0xff, 0xca, }, 6, 0, "", "",
+"62 f4 7d 1c ff ca \t{nf} dec %dx,%ax",},
+{{0xf3, 0x0f, 0x38, 0xdc, 0xd1, }, 5, 0, "", "",
+"f3 0f 38 dc d1 \tloadiwkey %xmm1,%xmm2",},
+{{0xf3, 0x0f, 0x38, 0xfa, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fa d0 \tencodekey128 %eax,%edx",},
+{{0xf3, 0x0f, 0x38, 0xfb, 0xd0, }, 5, 0, "", "",
+"f3 0f 38 fb d0 \tencodekey256 %eax,%edx",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xdc, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 dc 5a 77 \taesenc128kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xde, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 de 5a 77 \taesenc256kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xdd, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 dd 5a 77 \taesdec128kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xdf, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 df 5a 77 \taesdec256kl 0x77(%edx),%xmm3",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x42, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 42 77 \taesencwide128kl 0x77(%edx)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x52, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 52 77 \taesencwide256kl 0x77(%edx)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x4a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 4a 77 \taesdecwide128kl 0x77(%edx)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xd8, 0x5a, 0x77, }, 7, 0, "", "",
+"67 f3 0f 38 d8 5a 77 \taesdecwide256kl 0x77(%edx)",},
+{{0x67, 0x0f, 0x38, 0xfc, 0x08, }, 5, 0, "", "",
+"67 0f 38 fc 08 \taadd %ecx,(%eax)",},
+{{0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 9, 0, "", "",
+"0f 38 fc 14 25 78 56 34 12 \taadd %edx,0x12345678",},
+{{0x67, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"67 0f 38 fc 94 c8 78 56 34 12 \taadd %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0x66, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"67 66 0f 38 fc 08 \taand %ecx,(%eax)",},
+{{0x66, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"66 0f 38 fc 14 25 78 56 34 12 \taand %edx,0x12345678",},
+{{0x67, 0x66, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"67 66 0f 38 fc 94 c8 78 56 34 12 \taand %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0xf2, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"67 f2 0f 38 fc 08 \taor %ecx,(%eax)",},
+{{0xf2, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f2 0f 38 fc 14 25 78 56 34 12 \taor %edx,0x12345678",},
+{{0x67, 0xf2, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"67 f2 0f 38 fc 94 c8 78 56 34 12 \taor %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xfc, 0x08, }, 6, 0, "", "",
+"67 f3 0f 38 fc 08 \taxor %ecx,(%eax)",},
+{{0xf3, 0x0f, 0x38, 0xfc, 0x14, 0x25, 0x78, 0x56, 0x34, 0x12, }, 10, 0, "", "",
+"f3 0f 38 fc 14 25 78 56 34 12 \taxor %edx,0x12345678",},
+{{0x67, 0xf3, 0x0f, 0x38, 0xfc, 0x94, 0xc8, 0x78, 0x56, 0x34, 0x12, }, 11, 0, "", "",
+"67 f3 0f 38 fc 94 c8 78 56 34 12 \taxor %edx,0x12345678(%eax,%ecx,8)",},
+{{0x67, 0xc4, 0xe2, 0x7a, 0xb1, 0x31, }, 6, 0, "", "",
+"67 c4 e2 7a b1 31 \tvbcstnebf162ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x79, 0xb1, 0x31, }, 6, 0, "", "",
+"67 c4 e2 79 b1 31 \tvbcstnesh2ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x7a, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 7a b0 31 \tvcvtneebf162ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x79, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 79 b0 31 \tvcvtneeph2ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x7b, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 7b b0 31 \tvcvtneobf162ps (%ecx),%xmm6",},
+{{0x67, 0xc4, 0xe2, 0x78, 0xb0, 0x31, }, 6, 0, "", "",
+"67 c4 e2 78 b0 31 \tvcvtneoph2ps (%ecx),%xmm6",},
+{{0x62, 0xf2, 0x7e, 0x08, 0x72, 0xf1, }, 6, 0, "", "",
+"62 f2 7e 08 72 f1 \tvcvtneps2bf16 %xmm1,%xmm6",},
+{{0xc4, 0xe2, 0x6b, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 50 d9 \tvpdpbssd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b 51 d9 \tvpdpbssds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 50 d9 \tvpdpbsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a 51 d9 \tvpdpbsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x50, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 50 d9 \tvpdpbuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0x51, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 51 d9 \tvpdpbuuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d2 d9 \tvpdpwsud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a d3 d9 \tvpdpwsuds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d2 d9 \tvpdpwusd %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 d3 d9 \tvpdpwusds %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd2, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d2 d9 \tvpdpwuud %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x68, 0xd3, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 d3 d9 \tvpdpwuuds %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb5, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b5 d9 \tvpmadd52huq %xmm1,%xmm2,%xmm3",},
+{{0x62, 0xf2, 0xed, 0x08, 0xb4, 0xd9, }, 6, 0, "", "",
+"62 f2 ed 08 b4 d9 \tvpmadd52luq %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x7f, 0xcc, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cc d1 \tvsha512msg1 %xmm1,%ymm2",},
+{{0xc4, 0xe2, 0x7f, 0xcd, 0xd1, }, 5, 0, "", "",
+"c4 e2 7f cd d1 \tvsha512msg2 %ymm1,%ymm2",},
+{{0xc4, 0xe2, 0x6f, 0xcb, 0xd9, }, 5, 0, "", "",
+"c4 e2 6f cb d9 \tvsha512rnds2 %xmm1,%ymm2,%ymm3",},
+{{0xc4, 0xe2, 0x68, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 68 da d9 \tvsm3msg1 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x69, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 69 da d9 \tvsm3msg2 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe3, 0x69, 0xde, 0xd9, 0xa1, }, 6, 0, "", "",
+"c4 e3 69 de d9 a1 \tvsm3rnds2 $0xa1,%xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6a, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6a da d9 \tvsm4key4 %xmm1,%xmm2,%xmm3",},
+{{0xc4, 0xe2, 0x6b, 0xda, 0xd9, }, 5, 0, "", "",
+"c4 e2 6b da d9 \tvsm4rnds4 %xmm1,%xmm2,%xmm3",},
+{{0x67, 0x0f, 0x0d, 0x00, }, 4, 0, "", "",
+"67 0f 0d 00 \tprefetch (%eax)",},
+{{0x67, 0x0f, 0x18, 0x08, }, 4, 0, "", "",
+"67 0f 18 08 \tprefetcht0 (%eax)",},
+{{0x67, 0x0f, 0x18, 0x10, }, 4, 0, "", "",
+"67 0f 18 10 \tprefetcht1 (%eax)",},
+{{0x67, 0x0f, 0x18, 0x18, }, 4, 0, "", "",
+"67 0f 18 18 \tprefetcht2 (%eax)",},
+{{0x67, 0x0f, 0x18, 0x00, }, 4, 0, "", "",
+"67 0f 18 00 \tprefetchnta (%eax)",},
+{{0x0f, 0x01, 0xc6, }, 3, 0, "", "",
+"0f 01 c6 \twrmsrns",},
{{0xf3, 0x0f, 0x3a, 0xf0, 0xc0, 0x00, }, 6, 0, "", "",
"f3 0f 3a f0 c0 00 \threset $0x0",},
{{0x0f, 0x01, 0xe8, }, 3, 0, "", "",
diff --git a/tools/perf/arch/x86/tests/insn-x86-dat-src.c b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
index a391464c8dee..f55505c75d51 100644
--- a/tools/perf/arch/x86/tests/insn-x86-dat-src.c
+++ b/tools/perf/arch/x86/tests/insn-x86-dat-src.c
@@ -2628,6 +2628,512 @@ int main(void)
asm volatile("vucomish 0x12345678(%rax,%rcx,8), %xmm1");
asm volatile("vucomish 0x12345678(%eax,%ecx,8), %xmm1");

+ /* Key Locker */
+
+ asm volatile("loadiwkey %xmm1, %xmm2");
+ asm volatile("encodekey128 %eax, %edx");
+ asm volatile("encodekey256 %eax, %edx");
+ asm volatile("aesenc128kl 0x77(%rdx), %xmm3");
+ asm volatile("aesenc256kl 0x77(%rdx), %xmm3");
+ asm volatile("aesdec128kl 0x77(%rdx), %xmm3");
+ asm volatile("aesdec256kl 0x77(%rdx), %xmm3");
+ asm volatile("aesencwide128kl 0x77(%rdx)");
+ asm volatile("aesencwide256kl 0x77(%rdx)");
+ asm volatile("aesdecwide128kl 0x77(%rdx)");
+ asm volatile("aesdecwide256kl 0x77(%rdx)");
+
+ /* Remote Atomic Operations */
+
+ asm volatile("aadd %ecx,(%rax)");
+ asm volatile("aadd %edx,(%r8)");
+ asm volatile("aadd %edx,0x12345678(%rax,%rcx,8)");
+ asm volatile("aadd %edx,0x12345678(%r8,%rcx,8)");
+ asm volatile("aadd %rcx,(%rax)");
+ asm volatile("aadd %rdx,(%r8)");
+ asm volatile("aadd %rdx,(0x12345678)");
+ asm volatile("aadd %rdx,0x12345678(%rax,%rcx,8)");
+ asm volatile("aadd %rdx,0x12345678(%r8,%rcx,8)");
+
+ asm volatile("aand %ecx,(%rax)");
+ asm volatile("aand %edx,(%r8)");
+ asm volatile("aand %edx,0x12345678(%rax,%rcx,8)");
+ asm volatile("aand %edx,0x12345678(%r8,%rcx,8)");
+ asm volatile("aand %rcx,(%rax)");
+ asm volatile("aand %rdx,(%r8)");
+ asm volatile("aand %rdx,(0x12345678)");
+ asm volatile("aand %rdx,0x12345678(%rax,%rcx,8)");
+ asm volatile("aand %rdx,0x12345678(%r8,%rcx,8)");
+
+ asm volatile("aor %ecx,(%rax)");
+ asm volatile("aor %edx,(%r8)");
+ asm volatile("aor %edx,0x12345678(%rax,%rcx,8)");
+ asm volatile("aor %edx,0x12345678(%r8,%rcx,8)");
+ asm volatile("aor %rcx,(%rax)");
+ asm volatile("aor %rdx,(%r8)");
+ asm volatile("aor %rdx,(0x12345678)");
+ asm volatile("aor %rdx,0x12345678(%rax,%rcx,8)");
+ asm volatile("aor %rdx,0x12345678(%r8,%rcx,8)");
+
+ asm volatile("axor %ecx,(%rax)");
+ asm volatile("axor %edx,(%r8)");
+ asm volatile("axor %edx,0x12345678(%rax,%rcx,8)");
+ asm volatile("axor %edx,0x12345678(%r8,%rcx,8)");
+ asm volatile("axor %rcx,(%rax)");
+ asm volatile("axor %rdx,(%r8)");
+ asm volatile("axor %rdx,(0x12345678)");
+ asm volatile("axor %rdx,0x12345678(%rax,%rcx,8)");
+ asm volatile("axor %rdx,0x12345678(%r8,%rcx,8)");
+
+ /* VEX CMPxxXADD */
+
+ asm volatile("cmpbexadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpbxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmplexadd %ebx,%ecx,(%r9)");
+ asm volatile("cmplxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnbexadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnbxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnlexadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnlxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnoxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnpxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnsxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpnzxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpoxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmppxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpsxadd %ebx,%ecx,(%r9)");
+ asm volatile("cmpzxadd %ebx,%ecx,(%r9)");
+
+ /* Pre-fetch */
+
+ asm volatile("prefetch (%rax)");
+ asm volatile("prefetcht0 (%rax)");
+ asm volatile("prefetcht1 (%rax)");
+ asm volatile("prefetcht2 (%rax)");
+ asm volatile("prefetchnta (%rax)");
+ asm volatile("prefetchit0 0x12345678(%rip)");
+ asm volatile("prefetchit1 0x12345678(%rip)");
+
+ /* MSR List */
+
+ asm volatile("rdmsrlist");
+ asm volatile("wrmsrlist");
+
+ /* User Read/Write MSR */
+
+ asm volatile("urdmsr %rdx,%rax");
+ asm volatile("urdmsr %rdx,%r22");
+ asm volatile("urdmsr $0x7f,%r12");
+ asm volatile("uwrmsr %rax,%rdx");
+ asm volatile("uwrmsr %r22,%rdx");
+ asm volatile("uwrmsr %r12,$0x7f");
+
+ /* AVX NE Convert */
+
+ asm volatile("vbcstnebf162ps (%rcx),%xmm6");
+ asm volatile("vbcstnesh2ps (%rcx),%xmm6");
+ asm volatile("vcvtneebf162ps (%rcx),%xmm6");
+ asm volatile("vcvtneeph2ps (%rcx),%xmm6");
+ asm volatile("vcvtneobf162ps (%rcx),%xmm6");
+ asm volatile("vcvtneoph2ps (%rcx),%xmm6");
+ asm volatile("vcvtneps2bf16 %xmm1,%xmm6");
+
+ /* FRED */
+
+ asm volatile("erets"); /* Expecting: erets indirect 0 */
+ asm volatile("eretu"); /* Expecting: eretu indirect 0 */
+
+ /* AMX Complex */
+
+ asm volatile("tcmmimfp16ps %tmm1,%tmm2,%tmm3");
+ asm volatile("tcmmrlfp16ps %tmm1,%tmm2,%tmm3");
+
+ /* AMX FP16 */
+
+ asm volatile("tdpfp16ps %tmm1,%tmm2,%tmm3");
+
+ /* REX2 */
+
+ asm volatile("test $0x5, %r18b");
+ asm volatile("test $0x5, %r18d");
+ asm volatile("test $0x5, %r18");
+ asm volatile("test $0x5, %r18w");
+ asm volatile("imull %eax, %r14d");
+ asm volatile("imull %eax, %r17d");
+ asm volatile("punpckldq (%r18), %mm2");
+ asm volatile("leal (%rax), %r16d");
+ asm volatile("leal (%rax), %r31d");
+ asm volatile("leal (,%r16), %eax");
+ asm volatile("leal (,%r31), %eax");
+ asm volatile("leal (%r16), %eax");
+ asm volatile("leal (%r31), %eax");
+ asm volatile("leaq (%rax), %r15");
+ asm volatile("leaq (%rax), %r16");
+ asm volatile("leaq (%r15), %rax");
+ asm volatile("leaq (%r16), %rax");
+ asm volatile("leaq (,%r15), %rax");
+ asm volatile("leaq (,%r16), %rax");
+ asm volatile("add (%r16), %r8");
+ asm volatile("add (%r16), %r15");
+ asm volatile("mov (,%r9), %r16");
+ asm volatile("mov (,%r14), %r16");
+ asm volatile("sub (%r10), %r31");
+ asm volatile("sub (%r13), %r31");
+ asm volatile("leal 1(%r16, %r21), %eax");
+ asm volatile("leal 1(%r16, %r26), %r31d");
+ asm volatile("leal 129(%r21, %r9), %eax");
+ asm volatile("leal 129(%r26, %r9), %r31d");
+ /*
+ * Have to use .byte for jmpabs because gas does not support the
+ * mnemonic for some reason, but then it also gets the source line wrong
+ * with .byte, so the following is a workaround.
+ */
+ asm volatile(""); /* Expecting: jmp indirect 0 */
+ asm volatile(".byte 0xd5, 0x00, 0xa1, 0xef, 0xcd, 0xab, 0x90, 0x78, 0x56, 0x34, 0x12");
+ asm volatile("pushp %rbx");
+ asm volatile("pushp %r16");
+ asm volatile("pushp %r31");
+ asm volatile("popp %r31");
+ asm volatile("popp %r16");
+ asm volatile("popp %rbx");
+
+ /* APX */
+
+ asm volatile("bextr %r25d,%edx,%r10d");
+ asm volatile("bextr %r25d,0x123(%r31,%rax,4),%edx");
+ asm volatile("bextr %r31,%r15,%r11");
+ asm volatile("bextr %r31,0x123(%r31,%rax,4),%r15");
+ asm volatile("blsi %r25d,%edx");
+ asm volatile("blsi %r31,%r15");
+ asm volatile("blsi 0x123(%r31,%rax,4),%r25d");
+ asm volatile("blsi 0x123(%r31,%rax,4),%r31");
+ asm volatile("blsmsk %r25d,%edx");
+ asm volatile("blsmsk %r31,%r15");
+ asm volatile("blsmsk 0x123(%r31,%rax,4),%r25d");
+ asm volatile("blsmsk 0x123(%r31,%rax,4),%r31");
+ asm volatile("blsr %r25d,%edx");
+ asm volatile("blsr %r31,%r15");
+ asm volatile("blsr 0x123(%r31,%rax,4),%r25d");
+ asm volatile("blsr 0x123(%r31,%rax,4),%r31");
+ asm volatile("bzhi %r25d,%edx,%r10d");
+ asm volatile("bzhi %r25d,0x123(%r31,%rax,4),%edx");
+ asm volatile("bzhi %r31,%r15,%r11");
+ asm volatile("bzhi %r31,0x123(%r31,%rax,4),%r15");
+ asm volatile("cmpbexadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpbexadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpbxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpbxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmplxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmplxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnbexadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnbexadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnbxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnbxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnlexadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnlexadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnlxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnlxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnoxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnoxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnpxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnpxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnsxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnsxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpnzxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpnzxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpoxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpoxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmppxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmppxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpsxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpsxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("cmpzxadd %r25d,%edx,0x123(%r31,%rax,4)");
+ asm volatile("cmpzxadd %r31,%r15,0x123(%r31,%rax,4)");
+ asm volatile("crc32q %r31, %r22");
+ asm volatile("crc32q (%r31), %r22");
+ asm volatile("crc32b %r19b, %r17");
+ asm volatile("crc32b %r19b, %r21d");
+ asm volatile("crc32b (%r19),%ebx");
+ asm volatile("crc32l %r31d, %r23d");
+ asm volatile("crc32l (%r31), %r23d");
+ asm volatile("crc32w %r31w, %r21d");
+ asm volatile("crc32w (%r31),%r21d");
+ asm volatile("crc32 %rax, %r18");
+ asm volatile("enqcmd 0x123(%r31d,%eax,4),%r25d");
+ asm volatile("enqcmd 0x123(%r31,%rax,4),%r31");
+ asm volatile("enqcmds 0x123(%r31d,%eax,4),%r25d");
+ asm volatile("enqcmds 0x123(%r31,%rax,4),%r31");
+ asm volatile("invept 0x123(%r31,%rax,4),%r31");
+ asm volatile("invpcid 0x123(%r31,%rax,4),%r31");
+ asm volatile("invvpid 0x123(%r31,%rax,4),%r31");
+ asm volatile("kmovb %k5,%r25d");
+ asm volatile("kmovb %k5,0x123(%r31,%rax,4)");
+ asm volatile("kmovb %r25d,%k5");
+ asm volatile("kmovb 0x123(%r31,%rax,4),%k5");
+ asm volatile("kmovd %k5,%r25d");
+ asm volatile("kmovd %k5,0x123(%r31,%rax,4)");
+ asm volatile("kmovd %r25d,%k5");
+ asm volatile("kmovd 0x123(%r31,%rax,4),%k5");
+ asm volatile("kmovq %k5,%r31");
+ asm volatile("kmovq %k5,0x123(%r31,%rax,4)");
+ asm volatile("kmovq %r31,%k5");
+ asm volatile("kmovq 0x123(%r31,%rax,4),%k5");
+ asm volatile("kmovw %k5,%r25d");
+ asm volatile("kmovw %k5,0x123(%r31,%rax,4)");
+ asm volatile("kmovw %r25d,%k5");
+ asm volatile("kmovw 0x123(%r31,%rax,4),%k5");
+ asm volatile("ldtilecfg 0x123(%r31,%rax,4)");
+ asm volatile("movbe %r18w,%ax");
+ asm volatile("movbe %r15w,%ax");
+ asm volatile("movbe %r18w,0x123(%r16,%rax,4)");
+ asm volatile("movbe %r18w,0x123(%r31,%rax,4)");
+ asm volatile("movbe %r25d,%edx");
+ asm volatile("movbe %r15d,%edx");
+ asm volatile("movbe %r25d,0x123(%r16,%rax,4)");
+ asm volatile("movbe %r31,%r15");
+ asm volatile("movbe %r8,%r15");
+ asm volatile("movbe %r31,0x123(%r16,%rax,4)");
+ asm volatile("movbe %r31,0x123(%r31,%rax,4)");
+ asm volatile("movbe 0x123(%r16,%rax,4),%r31");
+ asm volatile("movbe 0x123(%r31,%rax,4),%r18w");
+ asm volatile("movbe 0x123(%r31,%rax,4),%r25d");
+ asm volatile("movdir64b 0x123(%r31d,%eax,4),%r25d");
+ asm volatile("movdir64b 0x123(%r31,%rax,4),%r31");
+ asm volatile("movdiri %r25d,0x123(%r31,%rax,4)");
+ asm volatile("movdiri %r31,0x123(%r31,%rax,4)");
+ asm volatile("pdep %r25d,%edx,%r10d");
+ asm volatile("pdep %r31,%r15,%r11");
+ asm volatile("pdep 0x123(%r31,%rax,4),%r25d,%edx");
+ asm volatile("pdep 0x123(%r31,%rax,4),%r31,%r15");
+ asm volatile("pext %r25d,%edx,%r10d");
+ asm volatile("pext %r31,%r15,%r11");
+ asm volatile("pext 0x123(%r31,%rax,4),%r25d,%edx");
+ asm volatile("pext 0x123(%r31,%rax,4),%r31,%r15");
+ asm volatile("shlx %r25d,%edx,%r10d");
+ asm volatile("shlx %r25d,0x123(%r31,%rax,4),%edx");
+ asm volatile("shlx %r31,%r15,%r11");
+ asm volatile("shlx %r31,0x123(%r31,%rax,4),%r15");
+ asm volatile("shrx %r25d,%edx,%r10d");
+ asm volatile("shrx %r25d,0x123(%r31,%rax,4),%edx");
+ asm volatile("shrx %r31,%r15,%r11");
+ asm volatile("shrx %r31,0x123(%r31,%rax,4),%r15");
+ asm volatile("sttilecfg 0x123(%r31,%rax,4)");
+ asm volatile("tileloadd 0x123(%r31,%rax,4),%tmm6");
+ asm volatile("tileloaddt1 0x123(%r31,%rax,4),%tmm6");
+ asm volatile("tilestored %tmm6,0x123(%r31,%rax,4)");
+ asm volatile("vbroadcastf128 (%r16),%ymm3");
+ asm volatile("vbroadcasti128 (%r16),%ymm3");
+ asm volatile("vextractf128 $1,%ymm3,(%r16)");
+ asm volatile("vextracti128 $1,%ymm3,(%r16)");
+ asm volatile("vinsertf128 $1,(%r16),%ymm3,%ymm8");
+ asm volatile("vinserti128 $1,(%r16),%ymm3,%ymm8");
+ asm volatile("vroundpd $1,(%r24),%xmm6");
+ asm volatile("vroundps $2,(%r24),%xmm6");
+ asm volatile("vroundsd $3,(%r24),%xmm6,%xmm3");
+ asm volatile("vroundss $4,(%r24),%xmm6,%xmm3");
+ asm volatile("wrssd %r25d,0x123(%r31,%rax,4)");
+ asm volatile("wrssq %r31,0x123(%r31,%rax,4)");
+ asm volatile("wrussd %r25d,0x123(%r31,%rax,4)");
+ asm volatile("wrussq %r31,0x123(%r31,%rax,4)");
+
+ /* APX new data destination */
+
+ asm volatile("adc $0x1234,%ax,%r30w");
+ asm volatile("adc %r15b,%r17b,%r18b");
+ asm volatile("adc %r15d,(%r8),%r18d");
+ asm volatile("adc (%r15,%rax,1),%r16b,%r8b");
+ asm volatile("adc (%r15,%rax,1),%r16w,%r8w");
+ asm volatile("adcl $0x11,(%r19,%rax,4),%r20d");
+ asm volatile("adcx %r15d,%r8d,%r18d");
+ asm volatile("adcx (%r15,%r31,1),%r8");
+ asm volatile("adcx (%r15,%r31,1),%r8d,%r18d");
+ asm volatile("add $0x1234,%ax,%r30w");
+ asm volatile("add $0x12344433,%r15,%r16");
+ asm volatile("add $0x34,%r13b,%r17b");
+ asm volatile("add $0xfffffffff4332211,%rax,%r8");
+ asm volatile("add %r31,%r8,%r16");
+ asm volatile("add %r31,(%r8),%r16");
+ asm volatile("add %r31,(%r8,%r16,8),%r16");
+ asm volatile("add %r31b,%r8b,%r16b");
+ asm volatile("add %r31d,%r8d,%r16d");
+ asm volatile("add %r31w,%r8w,%r16w");
+ asm volatile("add (%r31),%r8,%r16");
+ asm volatile("add 0x9090(%r31,%r16,1),%r8,%r16");
+ asm volatile("addb %r31b,%r8b,%r16b");
+ asm volatile("addl %r31d,%r8d,%r16d");
+ asm volatile("addl $0x11,(%r19,%rax,4),%r20d");
+ asm volatile("addq %r31,%r8,%r16");
+ asm volatile("addq $0x12344433,(%r15,%rcx,4),%r16");
+ asm volatile("addw %r31w,%r8w,%r16w");
+ asm volatile("adox %r15d,%r8d,%r18d");
+ asm volatile("{load} add %r31,%r8,%r16");
+ asm volatile("{store} add %r31,%r8,%r16");
+ asm volatile("adox (%r15,%r31,1),%r8");
+ asm volatile("adox (%r15,%r31,1),%r8d,%r18d");
+ asm volatile("and $0x1234,%ax,%r30w");
+ asm volatile("and %r15b,%r17b,%r18b");
+ asm volatile("and %r15d,(%r8),%r18d");
+ asm volatile("and (%r15,%rax,1),%r16b,%r8b");
+ asm volatile("and (%r15,%rax,1),%r16w,%r8w");
+ asm volatile("andl $0x11,(%r19,%rax,4),%r20d");
+ asm volatile("cmova 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovae 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovb 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovbe 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmove 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovg 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovge 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovl 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovle 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovne 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovno 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovnp 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovns 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovo 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovp 0x90909090(%eax),%edx,%r8d");
+ asm volatile("cmovs 0x90909090(%eax),%edx,%r8d");
+ asm volatile("dec %rax,%r17");
+ asm volatile("decb (%r31,%r12,1),%r8b");
+ asm volatile("imul 0x909(%rax,%r31,8),%rdx,%r25");
+ asm volatile("imul 0x90909(%eax),%edx,%r8d");
+ asm volatile("inc %r31,%r16");
+ asm volatile("inc %r31,%r8");
+ asm volatile("inc %rax,%rbx");
+ asm volatile("neg %rax,%r17");
+ asm volatile("negb (%r31,%r12,1),%r8b");
+ asm volatile("not %rax,%r17");
+ asm volatile("notb (%r31,%r12,1),%r8b");
+ asm volatile("or $0x1234,%ax,%r30w");
+ asm volatile("or %r15b,%r17b,%r18b");
+ asm volatile("or %r15d,(%r8),%r18d");
+ asm volatile("or (%r15,%rax,1),%r16b,%r8b");
+ asm volatile("or (%r15,%rax,1),%r16w,%r8w");
+ asm volatile("orl $0x11,(%r19,%rax,4),%r20d");
+ asm volatile("rcl $0x2,%r12b,%r31b");
+ asm volatile("rcl %cl,%r16b,%r8b");
+ asm volatile("rclb $0x1,(%rax),%r31b");
+ asm volatile("rcll $0x2,(%rax),%r31d");
+ asm volatile("rclw $0x1,(%rax),%r31w");
+ asm volatile("rclw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("rcr $0x2,%r12b,%r31b");
+ asm volatile("rcr %cl,%r16b,%r8b");
+ asm volatile("rcrb $0x1,(%rax),%r31b");
+ asm volatile("rcrl $0x2,(%rax),%r31d");
+ asm volatile("rcrw $0x1,(%rax),%r31w");
+ asm volatile("rcrw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("rol $0x2,%r12b,%r31b");
+ asm volatile("rol %cl,%r16b,%r8b");
+ asm volatile("rolb $0x1,(%rax),%r31b");
+ asm volatile("roll $0x2,(%rax),%r31d");
+ asm volatile("rolw $0x1,(%rax),%r31w");
+ asm volatile("rolw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("ror $0x2,%r12b,%r31b");
+ asm volatile("ror %cl,%r16b,%r8b");
+ asm volatile("rorb $0x1,(%rax),%r31b");
+ asm volatile("rorl $0x2,(%rax),%r31d");
+ asm volatile("rorw $0x1,(%rax),%r31w");
+ asm volatile("rorw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("sar $0x2,%r12b,%r31b");
+ asm volatile("sar %cl,%r16b,%r8b");
+ asm volatile("sarb $0x1,(%rax),%r31b");
+ asm volatile("sarl $0x2,(%rax),%r31d");
+ asm volatile("sarw $0x1,(%rax),%r31w");
+ asm volatile("sarw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("sbb $0x1234,%ax,%r30w");
+ asm volatile("sbb %r15b,%r17b,%r18b");
+ asm volatile("sbb %r15d,(%r8),%r18d");
+ asm volatile("sbb (%r15,%rax,1),%r16b,%r8b");
+ asm volatile("sbb (%r15,%rax,1),%r16w,%r8w");
+ asm volatile("sbbl $0x11,(%r19,%rax,4),%r20d");
+ asm volatile("shl $0x2,%r12b,%r31b");
+ asm volatile("shl $0x2,%r12b,%r31b");
+ asm volatile("shl %cl,%r16b,%r8b");
+ asm volatile("shl %cl,%r16b,%r8b");
+ asm volatile("shlb $0x1,(%rax),%r31b");
+ asm volatile("shlb $0x1,(%rax),%r31b");
+ asm volatile("shld $0x1,%r12,(%rax),%r31");
+ asm volatile("shld $0x2,%r15d,(%rax),%r31d");
+ asm volatile("shld $0x2,%r8w,%r12w,%r31w");
+ asm volatile("shld %cl,%r12,%r16,%r8");
+ asm volatile("shld %cl,%r13w,(%r19,%rax,4),%r31w");
+ asm volatile("shld %cl,%r9w,(%rax),%r31w");
+ asm volatile("shll $0x2,(%rax),%r31d");
+ asm volatile("shll $0x2,(%rax),%r31d");
+ asm volatile("shlw $0x1,(%rax),%r31w");
+ asm volatile("shlw $0x1,(%rax),%r31w");
+ asm volatile("shlw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("shlw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("shr $0x2,%r12b,%r31b");
+ asm volatile("shr %cl,%r16b,%r8b");
+ asm volatile("shrb $0x1,(%rax),%r31b");
+ asm volatile("shrd $0x1,%r12,(%rax),%r31");
+ asm volatile("shrd $0x2,%r15d,(%rax),%r31d");
+ asm volatile("shrd $0x2,%r8w,%r12w,%r31w");
+ asm volatile("shrd %cl,%r12,%r16,%r8");
+ asm volatile("shrd %cl,%r13w,(%r19,%rax,4),%r31w");
+ asm volatile("shrd %cl,%r9w,(%rax),%r31w");
+ asm volatile("shrl $0x2,(%rax),%r31d");
+ asm volatile("shrw $0x1,(%rax),%r31w");
+ asm volatile("shrw %cl,(%r19,%rax,4),%r31w");
+ asm volatile("sub $0x1234,%ax,%r30w");
+ asm volatile("sub %r15b,%r17b,%r18b");
+ asm volatile("sub %r15d,(%r8),%r18d");
+ asm volatile("sub (%r15,%rax,1),%r16b,%r8b");
+ asm volatile("sub (%r15,%rax,1),%r16w,%r8w");
+ asm volatile("subl $0x11,(%r19,%rax,4),%r20d");
+ asm volatile("xor $0x1234,%ax,%r30w");
+ asm volatile("xor %r15b,%r17b,%r18b");
+ asm volatile("xor %r15d,(%r8),%r18d");
+ asm volatile("xor (%r15,%rax,1),%r16b,%r8b");
+ asm volatile("xor (%r15,%rax,1),%r16w,%r8w");
+ asm volatile("xorl $0x11,(%r19,%rax,4),%r20d");
+
+ /* APX suppress status flags */
+
+ asm volatile("{nf} add %bl,%dl,%r8b");
+ asm volatile("{nf} add %dx,%ax,%r9w");
+ asm volatile("{nf} add 0x123(%r8,%rax,4),%bl,%dl");
+ asm volatile("{nf} add 0x123(%r8,%rax,4),%dx,%ax");
+ asm volatile("{nf} or %bl,%dl,%r8b");
+ asm volatile("{nf} or %dx,%ax,%r9w");
+ asm volatile("{nf} or 0x123(%r8,%rax,4),%bl,%dl");
+ asm volatile("{nf} or 0x123(%r8,%rax,4),%dx,%ax");
+ asm volatile("{nf} and %bl,%dl,%r8b");
+ asm volatile("{nf} and %dx,%ax,%r9w");
+ asm volatile("{nf} and 0x123(%r8,%rax,4),%bl,%dl");
+ asm volatile("{nf} and 0x123(%r8,%rax,4),%dx,%ax");
+ asm volatile("{nf} shld $0x7b,%dx,%ax,%r9w");
+ asm volatile("{nf} sub %bl,%dl,%r8b");
+ asm volatile("{nf} sub %dx,%ax,%r9w");
+ asm volatile("{nf} sub 0x123(%r8,%rax,4),%bl,%dl");
+ asm volatile("{nf} sub 0x123(%r8,%rax,4),%dx,%ax");
+ asm volatile("{nf} shrd $0x7b,%dx,%ax,%r9w");
+ asm volatile("{nf} xor %bl,%dl,%r8b");
+ asm volatile("{nf} xor %r31,%r31");
+ asm volatile("{nf} xor 0x123(%r8,%rax,4),%bl,%dl");
+ asm volatile("{nf} xor 0x123(%r8,%rax,4),%dx,%ax");
+ asm volatile("{nf} imul $0xff90,%r9,%r15");
+ asm volatile("{nf} imul $0x7b,%r9,%r15");
+ asm volatile("{nf} xor $0x7b,%bl,%dl");
+ asm volatile("{nf} xor $0x7b,%dx,%ax");
+ asm volatile("{nf} popcnt %r9,%r31");
+ asm volatile("{nf} shld %cl,%dx,%ax,%r9w");
+ asm volatile("{nf} shrd %cl,%dx,%ax,%r9w");
+ asm volatile("{nf} imul %r9,%r31,%r11");
+ asm volatile("{nf} sar $0x7b,%bl,%dl");
+ asm volatile("{nf} sar $0x7b,%dx,%ax");
+ asm volatile("{nf} sar $1,%bl,%dl");
+ asm volatile("{nf} sar $1,%dx,%ax");
+ asm volatile("{nf} sar %cl,%bl,%dl");
+ asm volatile("{nf} sar %cl,%dx,%ax");
+ asm volatile("{nf} andn %r9,%r31,%r11");
+ asm volatile("{nf} blsi %r9,%r31");
+ asm volatile("{nf} tzcnt %r9,%r31");
+ asm volatile("{nf} lzcnt %r9,%r31");
+ asm volatile("{nf} idiv %bl");
+ asm volatile("{nf} idiv %dx");
+ asm volatile("{nf} dec %bl,%dl");
+ asm volatile("{nf} dec %dx,%ax");
+
#else /* #ifdef __x86_64__ */

/* bound r32, mem (same op code as EVEX prefix) */
@@ -4848,6 +5354,97 @@ int main(void)

#endif /* #ifndef __x86_64__ */

+ /* Key Locker */
+
+ asm volatile(" loadiwkey %xmm1, %xmm2");
+ asm volatile(" encodekey128 %eax, %edx");
+ asm volatile(" encodekey256 %eax, %edx");
+ asm volatile(" aesenc128kl 0x77(%edx), %xmm3");
+ asm volatile(" aesenc256kl 0x77(%edx), %xmm3");
+ asm volatile(" aesdec128kl 0x77(%edx), %xmm3");
+ asm volatile(" aesdec256kl 0x77(%edx), %xmm3");
+ asm volatile(" aesencwide128kl 0x77(%edx)");
+ asm volatile(" aesencwide256kl 0x77(%edx)");
+ asm volatile(" aesdecwide128kl 0x77(%edx)");
+ asm volatile(" aesdecwide256kl 0x77(%edx)");
+
+ /* Remote Atomic Operations */
+
+ asm volatile("aadd %ecx,(%eax)");
+ asm volatile("aadd %edx,(0x12345678)");
+ asm volatile("aadd %edx,0x12345678(%eax,%ecx,8)");
+
+ asm volatile("aand %ecx,(%eax)");
+ asm volatile("aand %edx,(0x12345678)");
+ asm volatile("aand %edx,0x12345678(%eax,%ecx,8)");
+
+ asm volatile("aor %ecx,(%eax)");
+ asm volatile("aor %edx,(0x12345678)");
+ asm volatile("aor %edx,0x12345678(%eax,%ecx,8)");
+
+ asm volatile("axor %ecx,(%eax)");
+ asm volatile("axor %edx,(0x12345678)");
+ asm volatile("axor %edx,0x12345678(%eax,%ecx,8)");
+
+ /* AVX NE Convert */
+
+ asm volatile("vbcstnebf162ps (%ecx),%xmm6");
+ asm volatile("vbcstnesh2ps (%ecx),%xmm6");
+ asm volatile("vcvtneebf162ps (%ecx),%xmm6");
+ asm volatile("vcvtneeph2ps (%ecx),%xmm6");
+ asm volatile("vcvtneobf162ps (%ecx),%xmm6");
+ asm volatile("vcvtneoph2ps (%ecx),%xmm6");
+ asm volatile("vcvtneps2bf16 %xmm1,%xmm6");
+
+ /* AVX VNNI INT16 */
+
+ asm volatile("vpdpbssd %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpbssds %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpbsud %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpbsuds %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpbuud %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpbuuds %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpwsud %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpwsuds %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpwusd %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpwusds %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpwuud %xmm1,%xmm2,%xmm3");
+ asm volatile("vpdpwuuds %xmm1,%xmm2,%xmm3");
+
+ /* AVX IFMA */
+
+ asm volatile("vpmadd52huq %xmm1,%xmm2,%xmm3");
+ asm volatile("vpmadd52luq %xmm1,%xmm2,%xmm3");
+
+ /* AVX SHA512 */
+
+ asm volatile("vsha512msg1 %xmm1,%ymm2");
+ asm volatile("vsha512msg2 %ymm1,%ymm2");
+ asm volatile("vsha512rnds2 %xmm1,%ymm2,%ymm3");
+
+ /* AVX SM3 */
+
+ asm volatile("vsm3msg1 %xmm1,%xmm2,%xmm3");
+ asm volatile("vsm3msg2 %xmm1,%xmm2,%xmm3");
+ asm volatile("vsm3rnds2 $0xa1,%xmm1,%xmm2,%xmm3");
+
+ /* AVX SM4 */
+
+ asm volatile("vsm4key4 %xmm1,%xmm2,%xmm3");
+ asm volatile("vsm4rnds4 %xmm1,%xmm2,%xmm3");
+
+ /* Pre-fetch */
+
+ asm volatile("prefetch (%eax)");
+ asm volatile("prefetcht0 (%eax)");
+ asm volatile("prefetcht1 (%eax)");
+ asm volatile("prefetcht2 (%eax)");
+ asm volatile("prefetchnta (%eax)");
+
+ /* Non-serializing write MSR */
+
+ asm volatile("wrmsrns");
+
/* Prediction history reset */

asm volatile("hreset $0");
--
2.34.1


2024-05-02 11:09:30

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic

Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.

The REX2 prefix is effectively an extended version of the REX prefix.

REX2 and EVEX are also used with PUSH/POP instructions to provide a
Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
fast-forward register data between matching PUSH and POP instructions.

REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
other maps is provided by the EVEX prefix, covered in a separate patch.

Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
for a new 64-bit absolute direct jump instruction JMPABS.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
opcodes (only JMPABS) that require a mandatory REX2 prefix
(INAT_REX2_VARIANT).

Amend logic to read the REX2 prefix and get the opcode attribute for the
map number (0 or 1) encoded in the REX2 prefix.

Amend the awk script that generates the attribute tables from the opcode
map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/include/asm/inat.h | 11 +++++++++-
arch/x86/include/asm/insn.h | 25 ++++++++++++++++++----
arch/x86/lib/insn.c | 25 ++++++++++++++++++++++
arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
tools/arch/x86/include/asm/inat.h | 11 +++++++++-
tools/arch/x86/include/asm/insn.h | 25 ++++++++++++++++++----
tools/arch/x86/lib/insn.c | 25 ++++++++++++++++++++++
tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
8 files changed, 132 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index b56c5741581a..1331bdd39a23 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -35,6 +35,8 @@
#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
#define INAT_PFX_EVEX 15 /* EVEX prefix */
+/* x86-64 REX2 prefix */
+#define INAT_PFX_REX2 16 /* 0xD5 */

#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
@@ -50,7 +52,7 @@

/* Legacy prefix */
#define INAT_PFX_OFFS 0
-#define INAT_PFX_BITS 4
+#define INAT_PFX_BITS 5
#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
/* Escape opcodes */
@@ -77,6 +79,8 @@
#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
+#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
+#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
}

+static inline int inat_is_rex2_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
+}
+
static inline int inat_last_prefix_id(insn_attr_t attr)
{
if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 1b29f58f730f..95249ec1f24e 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -112,10 +112,15 @@ struct insn {
#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
#define X86_SIB_BASE(sib) ((sib) & 0x07)

-#define X86_REX_W(rex) ((rex) & 8)
-#define X86_REX_R(rex) ((rex) & 4)
-#define X86_REX_X(rex) ((rex) & 2)
-#define X86_REX_B(rex) ((rex) & 1)
+#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
+#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
+#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
+#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
+
+#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
+#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
+#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
+#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */

/* VEX bit flags */
#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
@@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
/* Instruction uses RIP-relative addressing */
extern int insn_rip_relative(struct insn *insn);

+static inline int insn_is_rex2(struct insn *insn)
+{
+ if (!insn->prefixes.got)
+ insn_get_prefixes(insn);
+ return insn->rex_prefix.nbytes == 2;
+}
+
+static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
+{
+ return X86_REX2_M(insn->rex_prefix.bytes[1]);
+}
+
static inline int insn_is_avx(struct insn *insn)
{
if (!insn->prefixes.got)
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 1bb155a0955b..6126ddc6e5f5 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
if (X86_REX_W(b))
/* REX.W overrides opnd_size */
insn->opnd_bytes = 8;
+ } else if (inat_is_rex2_prefix(attr)) {
+ insn_set_byte(&insn->rex_prefix, 0, b);
+ b = peek_nbyte_next(insn_byte_t, insn, 1);
+ insn_set_byte(&insn->rex_prefix, 1, b);
+ insn->rex_prefix.nbytes = 2;
+ insn->next_byte += 2;
+ if (X86_REX_W(b))
+ /* REX.W overrides opnd_size */
+ insn->opnd_bytes = 8;
+ insn->rex_prefix.got = 1;
+ goto vex_end;
}
}
insn->rex_prefix.got = 1;
@@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
goto end;
}

+ /* Check if there is REX2 prefix or not */
+ if (insn_is_rex2(insn)) {
+ if (insn_rex2_m_bit(insn)) {
+ /* map 1 is escape 0x0f */
+ insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
+
+ pfx_id = insn_last_prefix_id(insn);
+ insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
+ } else {
+ insn->attr = inat_get_opcode_attribute(op);
+ }
+ goto end;
+ }
+
insn->attr = inat_get_opcode_attribute(op);
while (inat_is_escape(insn->attr)) {
/* Get escaped opcode */
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index af38469afd14..3f43aa7d8fef 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -64,7 +64,9 @@ BEGIN {

modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
- rex_expr = "^REX(\\.[XRWB]+)*"
+ rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
+ rex2_expr = "\\(REX2\\)"
+ no_rex2_expr = "\\(!REX2\\)"
fpu_expr = "^ESC" # TODO

lprefix1_expr = "\\((66|!F3)\\)"
@@ -99,6 +101,7 @@ BEGIN {
prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
prefix_num["EVEX"] = "INAT_PFX_EVEX"
+ prefix_num["REX2"] = "INAT_PFX_REX2"

clear_vars()
}
@@ -314,6 +317,10 @@ function convert_operands(count,opnd, i,j,imm,mod)
if (match(ext, force64_expr))
flags = add_flags(flags, "INAT_FORCE64")

+ # check REX2 not allowed
+ if (match(ext, no_rex2_expr))
+ flags = add_flags(flags, "INAT_NO_REX2")
+
# check REX prefix
if (match(opcode, rex_expr))
flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
@@ -351,6 +358,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
lptable3[idx] = add_flags(lptable3[idx],flags)
variant = "INAT_VARIANT"
}
+ if (match(ext, rex2_expr))
+ table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
if (!match(ext, lprefix_expr)){
table[idx] = add_flags(table[idx],flags)
}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index a61051400311..2e65312cae52 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -35,6 +35,8 @@
#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
#define INAT_PFX_EVEX 15 /* EVEX prefix */
+/* x86-64 REX2 prefix */
+#define INAT_PFX_REX2 16 /* 0xD5 */

#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
@@ -50,7 +52,7 @@

/* Legacy prefix */
#define INAT_PFX_OFFS 0
-#define INAT_PFX_BITS 4
+#define INAT_PFX_BITS 5
#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
/* Escape opcodes */
@@ -77,6 +79,8 @@
#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
+#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
+#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
}

+static inline int inat_is_rex2_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
+}
+
static inline int inat_last_prefix_id(insn_attr_t attr)
{
if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 65c0d9ce1e29..1a7e8fc4d75a 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -112,10 +112,15 @@ struct insn {
#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
#define X86_SIB_BASE(sib) ((sib) & 0x07)

-#define X86_REX_W(rex) ((rex) & 8)
-#define X86_REX_R(rex) ((rex) & 4)
-#define X86_REX_X(rex) ((rex) & 2)
-#define X86_REX_B(rex) ((rex) & 1)
+#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
+#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
+#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
+#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
+
+#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
+#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
+#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
+#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */

/* VEX bit flags */
#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
@@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
/* Instruction uses RIP-relative addressing */
extern int insn_rip_relative(struct insn *insn);

+static inline int insn_is_rex2(struct insn *insn)
+{
+ if (!insn->prefixes.got)
+ insn_get_prefixes(insn);
+ return insn->rex_prefix.nbytes == 2;
+}
+
+static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
+{
+ return X86_REX2_M(insn->rex_prefix.bytes[1]);
+}
+
static inline int insn_is_avx(struct insn *insn)
{
if (!insn->prefixes.got)
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index ada4b4a79dd4..f761adeb8e8c 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
if (X86_REX_W(b))
/* REX.W overrides opnd_size */
insn->opnd_bytes = 8;
+ } else if (inat_is_rex2_prefix(attr)) {
+ insn_set_byte(&insn->rex_prefix, 0, b);
+ b = peek_nbyte_next(insn_byte_t, insn, 1);
+ insn_set_byte(&insn->rex_prefix, 1, b);
+ insn->rex_prefix.nbytes = 2;
+ insn->next_byte += 2;
+ if (X86_REX_W(b))
+ /* REX.W overrides opnd_size */
+ insn->opnd_bytes = 8;
+ insn->rex_prefix.got = 1;
+ goto vex_end;
}
}
insn->rex_prefix.got = 1;
@@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
goto end;
}

+ /* Check if there is REX2 prefix or not */
+ if (insn_is_rex2(insn)) {
+ if (insn_rex2_m_bit(insn)) {
+ /* map 1 is escape 0x0f */
+ insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
+
+ pfx_id = insn_last_prefix_id(insn);
+ insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
+ } else {
+ insn->attr = inat_get_opcode_attribute(op);
+ }
+ goto end;
+ }
+
insn->attr = inat_get_opcode_attribute(op);
while (inat_is_escape(insn->attr)) {
/* Get escaped opcode */
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index af38469afd14..3f43aa7d8fef 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -64,7 +64,9 @@ BEGIN {

modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
- rex_expr = "^REX(\\.[XRWB]+)*"
+ rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
+ rex2_expr = "\\(REX2\\)"
+ no_rex2_expr = "\\(!REX2\\)"
fpu_expr = "^ESC" # TODO

lprefix1_expr = "\\((66|!F3)\\)"
@@ -99,6 +101,7 @@ BEGIN {
prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
prefix_num["EVEX"] = "INAT_PFX_EVEX"
+ prefix_num["REX2"] = "INAT_PFX_REX2"

clear_vars()
}
@@ -314,6 +317,10 @@ function convert_operands(count,opnd, i,j,imm,mod)
if (match(ext, force64_expr))
flags = add_flags(flags, "INAT_FORCE64")

+ # check REX2 not allowed
+ if (match(ext, no_rex2_expr))
+ flags = add_flags(flags, "INAT_NO_REX2")
+
# check REX prefix
if (match(opcode, rex_expr))
flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
@@ -351,6 +358,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
lptable3[idx] = add_flags(lptable3[idx],flags)
variant = "INAT_VARIANT"
}
+ if (match(ext, rex2_expr))
+ table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
if (!match(ext, lprefix_expr)){
table[idx] = add_flags(table[idx],flags)
}
--
2.34.1


2024-05-02 11:10:18

by Adrian Hunter

[permalink] [raw]
Subject: [PATCH 07/10] x86/insn: Add support for APX EVEX to the instruction decoder logic

Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
support:
- extended general purpose registers (EGPRs) i.e. r16 to r31
- Push-Pop Acceleration (PPX) hints
- new data destination (NDD) register
- suppress status flags writes (NF) of common instructions
- new instructions

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

The extended EVEX prefix does not need amended instruction decoder logic,
except in one area. Some instructions are defined as SCALABLE which means
the EVEX.W bit and EVEX.pp bits are used to determine operand size.
Specifically, if an instruction is SCALABLE and EVEX.W is zero, then
EVEX.pp value 0 (representing no prefix NP) means default operand size,
whereas EVEX.pp value 1 (representing 66 prefix) means operand size
override i.e. 16 bits

Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
amend the logic appropriately.

Amend the awk script that generates the attribute tables from the opcode
map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE.

Signed-off-by: Adrian Hunter <[email protected]>
---
arch/x86/include/asm/inat.h | 6 ++++++
arch/x86/include/asm/insn.h | 7 +++++++
arch/x86/lib/insn.c | 4 ++++
arch/x86/tools/gen-insn-attr-x86.awk | 4 ++++
tools/arch/x86/include/asm/inat.h | 6 ++++++
tools/arch/x86/include/asm/insn.h | 7 +++++++
tools/arch/x86/lib/insn.c | 4 ++++
tools/arch/x86/tools/gen-insn-attr-x86.awk | 4 ++++
8 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 1331bdd39a23..53e4015242b4 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -81,6 +81,7 @@
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
+#define INAT_EVEX_SCALABLE (1 << (INAT_FLAG_OFFS + 10))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -236,4 +237,9 @@ static inline int inat_must_evex(insn_attr_t attr)
{
return attr & INAT_EVEXONLY;
}
+
+static inline int inat_evex_scalable(insn_attr_t attr)
+{
+ return attr & INAT_EVEX_SCALABLE;
+}
#endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 95249ec1f24e..7152ea809e6a 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -215,6 +215,13 @@ static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
return X86_VEX_P(insn->vex_prefix.bytes[2]);
}

+static inline insn_byte_t insn_vex_w_bit(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes < 3)
+ return 0;
+ return X86_VEX_W(insn->vex_prefix.bytes[2]);
+}
+
/* Get the last prefix id from last prefix or VEX prefix */
static inline int insn_last_prefix_id(struct insn *insn)
{
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 6126ddc6e5f5..5952ab41c60f 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -294,6 +294,10 @@ int insn_get_opcode(struct insn *insn)
m = insn_vex_m_bits(insn);
p = insn_vex_p_bits(insn);
insn->attr = inat_get_avx_attribute(op, m, p);
+ /* SCALABLE EVEX uses p bits to encode operand size */
+ if (inat_evex_scalable(insn->attr) && !insn_vex_w_bit(insn) &&
+ p == INAT_PFX_OPNDSZ)
+ insn->opnd_bytes = 2;
if ((inat_must_evex(insn->attr) && !insn_is_evex(insn)) ||
(!inat_accept_vex(insn->attr) &&
!inat_is_group(insn->attr))) {
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 3f43aa7d8fef..5770c8097f32 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -83,6 +83,8 @@ BEGIN {
vexonly_expr = "\\(v\\)"
# All opcodes with (ev) superscript supports *only* EVEX prefix
evexonly_expr = "\\(ev\\)"
+ # (es) is the same as (ev) but also "SCALABLE" i.e. W and pp determine operand size
+ evex_scalable_expr = "\\(es\\)"

prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -332,6 +334,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
# check VEX codes
if (match(ext, evexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+ else if (match(ext, evex_scalable_expr))
+ flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY | INAT_EVEX_SCALABLE")
else if (match(ext, vexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 2e65312cae52..253690eb3c26 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -81,6 +81,7 @@
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
+#define INAT_EVEX_SCALABLE (1 << (INAT_FLAG_OFFS + 10))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -236,4 +237,9 @@ static inline int inat_must_evex(insn_attr_t attr)
{
return attr & INAT_EVEXONLY;
}
+
+static inline int inat_evex_scalable(insn_attr_t attr)
+{
+ return attr & INAT_EVEX_SCALABLE;
+}
#endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 1a7e8fc4d75a..0e5abd896ad4 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -215,6 +215,13 @@ static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
return X86_VEX_P(insn->vex_prefix.bytes[2]);
}

+static inline insn_byte_t insn_vex_w_bit(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes < 3)
+ return 0;
+ return X86_VEX_W(insn->vex_prefix.bytes[2]);
+}
+
/* Get the last prefix id from last prefix or VEX prefix */
static inline int insn_last_prefix_id(struct insn *insn)
{
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index f761adeb8e8c..a43b37346a22 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -294,6 +294,10 @@ int insn_get_opcode(struct insn *insn)
m = insn_vex_m_bits(insn);
p = insn_vex_p_bits(insn);
insn->attr = inat_get_avx_attribute(op, m, p);
+ /* SCALABLE EVEX uses p bits to encode operand size */
+ if (inat_evex_scalable(insn->attr) && !insn_vex_w_bit(insn) &&
+ p == INAT_PFX_OPNDSZ)
+ insn->opnd_bytes = 2;
if ((inat_must_evex(insn->attr) && !insn_is_evex(insn)) ||
(!inat_accept_vex(insn->attr) &&
!inat_is_group(insn->attr))) {
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index 3f43aa7d8fef..5770c8097f32 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -83,6 +83,8 @@ BEGIN {
vexonly_expr = "\\(v\\)"
# All opcodes with (ev) superscript supports *only* EVEX prefix
evexonly_expr = "\\(ev\\)"
+ # (es) is the same as (ev) but also "SCALABLE" i.e. W and pp determine operand size
+ evex_scalable_expr = "\\(es\\)"

prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -332,6 +334,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
# check VEX codes
if (match(ext, evexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+ else if (match(ext, evex_scalable_expr))
+ flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY | INAT_EVEX_SCALABLE")
else if (match(ext, vexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
--
2.34.1


2024-05-02 11:26:29

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Add support for APX EVEX to the instruction decoder logic

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 87bbaf1a4be4904fcf04a024e7c1d9f9d1fa945b
Gitweb: https://git.kernel.org/tip/87bbaf1a4be4904fcf04a024e7c1d9f9d1fa945b
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:50 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:45 +02:00

x86/insn: Add support for APX EVEX to the instruction decoder logic

Intel Advanced Performance Extensions (APX) extends the EVEX prefix to
support:

- extended general purpose registers (EGPRs) i.e. r16 to r31
- Push-Pop Acceleration (PPX) hints
- new data destination (NDD) register
- suppress status flags writes (NF) of common instructions
- new instructions

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

The extended EVEX prefix does not need amended instruction decoder logic,
except in one area. Some instructions are defined as SCALABLE which means
the EVEX.W bit and EVEX.pp bits are used to determine operand size.
Specifically, if an instruction is SCALABLE and EVEX.W is zero, then
EVEX.pp value 0 (representing no prefix NP) means default operand size,
whereas EVEX.pp value 1 (representing 66 prefix) means operand size
override i.e. 16 bits

Add an attribute (INAT_EVEX_SCALABLE) to identify such instructions, and
amend the logic appropriately.

Amend the awk script that generates the attribute tables from the opcode
map, to recognise "(es)" as attribute INAT_EVEX_SCALABLE.

Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/inat.h | 6 ++++++
arch/x86/include/asm/insn.h | 7 +++++++
arch/x86/lib/insn.c | 4 ++++
arch/x86/tools/gen-insn-attr-x86.awk | 4 ++++
tools/arch/x86/include/asm/inat.h | 6 ++++++
tools/arch/x86/include/asm/insn.h | 7 +++++++
tools/arch/x86/lib/insn.c | 4 ++++
tools/arch/x86/tools/gen-insn-attr-x86.awk | 4 ++++
8 files changed, 42 insertions(+)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 1331bdd..53e4015 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -81,6 +81,7 @@
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
+#define INAT_EVEX_SCALABLE (1 << (INAT_FLAG_OFFS + 10))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -236,4 +237,9 @@ static inline int inat_must_evex(insn_attr_t attr)
{
return attr & INAT_EVEXONLY;
}
+
+static inline int inat_evex_scalable(insn_attr_t attr)
+{
+ return attr & INAT_EVEX_SCALABLE;
+}
#endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 95249ec..7152ea8 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -215,6 +215,13 @@ static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
return X86_VEX_P(insn->vex_prefix.bytes[2]);
}

+static inline insn_byte_t insn_vex_w_bit(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes < 3)
+ return 0;
+ return X86_VEX_W(insn->vex_prefix.bytes[2]);
+}
+
/* Get the last prefix id from last prefix or VEX prefix */
static inline int insn_last_prefix_id(struct insn *insn)
{
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 6126ddc..5952ab4 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -294,6 +294,10 @@ int insn_get_opcode(struct insn *insn)
m = insn_vex_m_bits(insn);
p = insn_vex_p_bits(insn);
insn->attr = inat_get_avx_attribute(op, m, p);
+ /* SCALABLE EVEX uses p bits to encode operand size */
+ if (inat_evex_scalable(insn->attr) && !insn_vex_w_bit(insn) &&
+ p == INAT_PFX_OPNDSZ)
+ insn->opnd_bytes = 2;
if ((inat_must_evex(insn->attr) && !insn_is_evex(insn)) ||
(!inat_accept_vex(insn->attr) &&
!inat_is_group(insn->attr))) {
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 3f43aa7..5770c80 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -83,6 +83,8 @@ BEGIN {
vexonly_expr = "\\(v\\)"
# All opcodes with (ev) superscript supports *only* EVEX prefix
evexonly_expr = "\\(ev\\)"
+ # (es) is the same as (ev) but also "SCALABLE" i.e. W and pp determine operand size
+ evex_scalable_expr = "\\(es\\)"

prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -332,6 +334,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
# check VEX codes
if (match(ext, evexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+ else if (match(ext, evex_scalable_expr))
+ flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY | INAT_EVEX_SCALABLE")
else if (match(ext, vexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index 2e65312..253690e 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -81,6 +81,7 @@
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
+#define INAT_EVEX_SCALABLE (1 << (INAT_FLAG_OFFS + 10))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -236,4 +237,9 @@ static inline int inat_must_evex(insn_attr_t attr)
{
return attr & INAT_EVEXONLY;
}
+
+static inline int inat_evex_scalable(insn_attr_t attr)
+{
+ return attr & INAT_EVEX_SCALABLE;
+}
#endif
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 1a7e8fc..0e5abd8 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -215,6 +215,13 @@ static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
return X86_VEX_P(insn->vex_prefix.bytes[2]);
}

+static inline insn_byte_t insn_vex_w_bit(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes < 3)
+ return 0;
+ return X86_VEX_W(insn->vex_prefix.bytes[2]);
+}
+
/* Get the last prefix id from last prefix or VEX prefix */
static inline int insn_last_prefix_id(struct insn *insn)
{
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index f761ade..a43b373 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -294,6 +294,10 @@ int insn_get_opcode(struct insn *insn)
m = insn_vex_m_bits(insn);
p = insn_vex_p_bits(insn);
insn->attr = inat_get_avx_attribute(op, m, p);
+ /* SCALABLE EVEX uses p bits to encode operand size */
+ if (inat_evex_scalable(insn->attr) && !insn_vex_w_bit(insn) &&
+ p == INAT_PFX_OPNDSZ)
+ insn->opnd_bytes = 2;
if ((inat_must_evex(insn->attr) && !insn_is_evex(insn)) ||
(!inat_accept_vex(insn->attr) &&
!inat_is_group(insn->attr))) {
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index 3f43aa7..5770c80 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -83,6 +83,8 @@ BEGIN {
vexonly_expr = "\\(v\\)"
# All opcodes with (ev) superscript supports *only* EVEX prefix
evexonly_expr = "\\(ev\\)"
+ # (es) is the same as (ev) but also "SCALABLE" i.e. W and pp determine operand size
+ evex_scalable_expr = "\\(es\\)"

prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
@@ -332,6 +334,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
# check VEX codes
if (match(ext, evexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY")
+ else if (match(ext, evex_scalable_expr))
+ flags = add_flags(flags, "INAT_VEXOK | INAT_EVEXONLY | INAT_EVEX_SCALABLE")
else if (match(ext, vexonly_expr))
flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))

2024-05-02 11:26:35

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Add support for APX EVEX instructions to the opcode map

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 690ca3a3067f760bef92ca5db1c42490498ab5de
Gitweb: https://git.kernel.org/tip/690ca3a3067f760bef92ca5db1c42490498ab5de
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:51 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:46 +02:00

x86/insn: Add support for APX EVEX instructions to the opcode map

To support APX functionality, the EVEX prefix is used to:

- promote legacy instructions
- promote VEX instructions
- add new instructions

Promoted VEX instructions require no extra annotation because the opcodes
do not change and the permissive nature of the instruction decoder already
allows them to have an EVEX prefix.

Promoted legacy instructions and new instructions are placed in map 4 which
has not been used before.

Create a new table for map 4 and add APX instructions.

Annotate SCALABLE instructions with "(es)" - refer to patch "x86/insn: Add
support for APX EVEX to the instruction decoder logic". SCALABLE
instructions must be represented in both no-prefix (NP) and 66 prefix
forms.

Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/lib/x86-opcode-map.txt | 93 ++++++++++++++++++++++++++-
tools/arch/x86/lib/x86-opcode-map.txt | 93 ++++++++++++++++++++++++++-
2 files changed, 186 insertions(+)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 240ef71..caedb3e 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -23,6 +23,7 @@
#
# AVX Superscripts
# (ev): this opcode requires EVEX prefix.
+# (es): this opcode requires EVEX prefix and is SCALABALE.
# (evo): this opcode is changed by EVEX prefix (EVEX opcode)
# (v): this opcode requires VEX prefix.
# (v1): this opcode only supports 128bit VEX.
@@ -929,6 +930,98 @@ df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable

+Table: EVEX map 4
+Referrer:
+AVXcode: 4
+00: ADD Eb,Gb (ev)
+01: ADD Ev,Gv (es) | ADD Ev,Gv (66),(es)
+02: ADD Gb,Eb (ev)
+03: ADD Gv,Ev (es) | ADD Gv,Ev (66),(es)
+08: OR Eb,Gb (ev)
+09: OR Ev,Gv (es) | OR Ev,Gv (66),(es)
+0a: OR Gb,Eb (ev)
+0b: OR Gv,Ev (es) | OR Gv,Ev (66),(es)
+10: ADC Eb,Gb (ev)
+11: ADC Ev,Gv (es) | ADC Ev,Gv (66),(es)
+12: ADC Gb,Eb (ev)
+13: ADC Gv,Ev (es) | ADC Gv,Ev (66),(es)
+18: SBB Eb,Gb (ev)
+19: SBB Ev,Gv (es) | SBB Ev,Gv (66),(es)
+1a: SBB Gb,Eb (ev)
+1b: SBB Gv,Ev (es) | SBB Gv,Ev (66),(es)
+20: AND Eb,Gb (ev)
+21: AND Ev,Gv (es) | AND Ev,Gv (66),(es)
+22: AND Gb,Eb (ev)
+23: AND Gv,Ev (es) | AND Gv,Ev (66),(es)
+24: SHLD Ev,Gv,Ib (es) | SHLD Ev,Gv,Ib (66),(es)
+28: SUB Eb,Gb (ev)
+29: SUB Ev,Gv (es) | SUB Ev,Gv (66),(es)
+2a: SUB Gb,Eb (ev)
+2b: SUB Gv,Ev (es) | SUB Gv,Ev (66),(es)
+2c: SHRD Ev,Gv,Ib (es) | SHRD Ev,Gv,Ib (66),(es)
+30: XOR Eb,Gb (ev)
+31: XOR Ev,Gv (es) | XOR Ev,Gv (66),(es)
+32: XOR Gb,Eb (ev)
+33: XOR Gv,Ev (es) | XOR Gv,Ev (66),(es)
+# CCMPSCC instructions are: CCOMB, CCOMBE, CCOMF, CCOML, CCOMLE, CCOMNB, CCOMNBE, CCOMNL, CCOMNLE,
+# CCOMNO, CCOMNS, CCOMNZ, CCOMO, CCOMS, CCOMT, CCOMZ
+38: CCMPSCC Eb,Gb (ev)
+39: CCMPSCC Ev,Gv (es) | CCMPSCC Ev,Gv (66),(es)
+3a: CCMPSCC Gv,Ev (ev)
+3b: CCMPSCC Gv,Ev (es) | CCMPSCC Gv,Ev (66),(es)
+40: CMOVO Gv,Ev (es) | CMOVO Gv,Ev (66),(es) | CFCMOVO Ev,Ev (es) | CFCMOVO Ev,Ev (66),(es) | SETO Eb (F2),(ev)
+41: CMOVNO Gv,Ev (es) | CMOVNO Gv,Ev (66),(es) | CFCMOVNO Ev,Ev (es) | CFCMOVNO Ev,Ev (66),(es) | SETNO Eb (F2),(ev)
+42: CMOVB Gv,Ev (es) | CMOVB Gv,Ev (66),(es) | CFCMOVB Ev,Ev (es) | CFCMOVB Ev,Ev (66),(es) | SETB Eb (F2),(ev)
+43: CMOVNB Gv,Ev (es) | CMOVNB Gv,Ev (66),(es) | CFCMOVNB Ev,Ev (es) | CFCMOVNB Ev,Ev (66),(es) | SETNB Eb (F2),(ev)
+44: CMOVZ Gv,Ev (es) | CMOVZ Gv,Ev (66),(es) | CFCMOVZ Ev,Ev (es) | CFCMOVZ Ev,Ev (66),(es) | SETZ Eb (F2),(ev)
+45: CMOVNZ Gv,Ev (es) | CMOVNZ Gv,Ev (66),(es) | CFCMOVNZ Ev,Ev (es) | CFCMOVNZ Ev,Ev (66),(es) | SETNZ Eb (F2),(ev)
+46: CMOVBE Gv,Ev (es) | CMOVBE Gv,Ev (66),(es) | CFCMOVBE Ev,Ev (es) | CFCMOVBE Ev,Ev (66),(es) | SETBE Eb (F2),(ev)
+47: CMOVNBE Gv,Ev (es) | CMOVNBE Gv,Ev (66),(es) | CFCMOVNBE Ev,Ev (es) | CFCMOVNBE Ev,Ev (66),(es) | SETNBE Eb (F2),(ev)
+48: CMOVS Gv,Ev (es) | CMOVS Gv,Ev (66),(es) | CFCMOVS Ev,Ev (es) | CFCMOVS Ev,Ev (66),(es) | SETS Eb (F2),(ev)
+49: CMOVNS Gv,Ev (es) | CMOVNS Gv,Ev (66),(es) | CFCMOVNS Ev,Ev (es) | CFCMOVNS Ev,Ev (66),(es) | SETNS Eb (F2),(ev)
+4a: CMOVP Gv,Ev (es) | CMOVP Gv,Ev (66),(es) | CFCMOVP Ev,Ev (es) | CFCMOVP Ev,Ev (66),(es) | SETP Eb (F2),(ev)
+4b: CMOVNP Gv,Ev (es) | CMOVNP Gv,Ev (66),(es) | CFCMOVNP Ev,Ev (es) | CFCMOVNP Ev,Ev (66),(es) | SETNP Eb (F2),(ev)
+4c: CMOVL Gv,Ev (es) | CMOVL Gv,Ev (66),(es) | CFCMOVL Ev,Ev (es) | CFCMOVL Ev,Ev (66),(es) | SETL Eb (F2),(ev)
+4d: CMOVNL Gv,Ev (es) | CMOVNL Gv,Ev (66),(es) | CFCMOVNL Ev,Ev (es) | CFCMOVNL Ev,Ev (66),(es) | SETNL Eb (F2),(ev)
+4e: CMOVLE Gv,Ev (es) | CMOVLE Gv,Ev (66),(es) | CFCMOVLE Ev,Ev (es) | CFCMOVLE Ev,Ev (66),(es) | SETLE Eb (F2),(ev)
+4f: CMOVNLE Gv,Ev (es) | CMOVNLE Gv,Ev (66),(es) | CFCMOVNLE Ev,Ev (es) | CFCMOVNLE Ev,Ev (66),(es) | SETNLE Eb (F2),(ev)
+60: MOVBE Gv,Ev (es) | MOVBE Gv,Ev (66),(es)
+61: MOVBE Ev,Gv (es) | MOVBE Ev,Gv (66),(es)
+65: WRUSSD Md,Gd (66),(ev) | WRUSSQ Mq,Gq (66),(ev)
+66: ADCX Gy,Ey (66),(ev) | ADOX Gy,Ey (F3),(ev) | WRSSD Md,Gd (ev) | WRSSQ Mq,Gq (66),(ev)
+69: IMUL Gv,Ev,Iz (es) | IMUL Gv,Ev,Iz (66),(es)
+6b: IMUL Gv,Ev,Ib (es) | IMUL Gv,Ev,Ib (66),(es)
+80: Grp1 Eb,Ib (1A),(ev)
+81: Grp1 Ev,Iz (1A),(es)
+83: Grp1 Ev,Ib (1A),(es)
+# CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
+# CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
+84: CTESTSCC (ev)
+85: CTESTSCC (es) | CTESTSCC (66),(es)
+88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
+8f: POP2 Bq,Rq (000),(11B),(ev)
+a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
+ad: SHRD Ev,Gv,CL (es) | SHRD Ev,Gv,CL (66),(es)
+af: IMUL Gv,Ev (es) | IMUL Gv,Ev (66),(es)
+c0: Grp2 Eb,Ib (1A),(ev)
+c1: Grp2 Ev,Ib (1A),(es)
+d0: Grp2 Eb,1 (1A),(ev)
+d1: Grp2 Ev,1 (1A),(es)
+d2: Grp2 Eb,CL (1A),(ev)
+d3: Grp2 Ev,CL (1A),(es)
+f0: CRC32 Gy,Eb (es) | INVEPT Gq,Mdq (F3),(ev)
+f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
+f2: INVPCID Gy,Mdq (F3),(ev)
+f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
+f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
+f6: Grp3_1 Eb (1A),(ev)
+f7: Grp3_2 Ev (1A),(es)
+f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
+f9: MOVDIRI My,Gy (ev)
+fe: Grp4 (1A),(ev)
+ff: Grp5 (1A),(es) | PUSH2 Bq,Rq (110),(11B),(ev)
+EndTable
+
Table: EVEX map 5
Referrer:
AVXcode: 5
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 240ef71..caedb3e 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -23,6 +23,7 @@
#
# AVX Superscripts
# (ev): this opcode requires EVEX prefix.
+# (es): this opcode requires EVEX prefix and is SCALABALE.
# (evo): this opcode is changed by EVEX prefix (EVEX opcode)
# (v): this opcode requires VEX prefix.
# (v1): this opcode only supports 128bit VEX.
@@ -929,6 +930,98 @@ df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable

+Table: EVEX map 4
+Referrer:
+AVXcode: 4
+00: ADD Eb,Gb (ev)
+01: ADD Ev,Gv (es) | ADD Ev,Gv (66),(es)
+02: ADD Gb,Eb (ev)
+03: ADD Gv,Ev (es) | ADD Gv,Ev (66),(es)
+08: OR Eb,Gb (ev)
+09: OR Ev,Gv (es) | OR Ev,Gv (66),(es)
+0a: OR Gb,Eb (ev)
+0b: OR Gv,Ev (es) | OR Gv,Ev (66),(es)
+10: ADC Eb,Gb (ev)
+11: ADC Ev,Gv (es) | ADC Ev,Gv (66),(es)
+12: ADC Gb,Eb (ev)
+13: ADC Gv,Ev (es) | ADC Gv,Ev (66),(es)
+18: SBB Eb,Gb (ev)
+19: SBB Ev,Gv (es) | SBB Ev,Gv (66),(es)
+1a: SBB Gb,Eb (ev)
+1b: SBB Gv,Ev (es) | SBB Gv,Ev (66),(es)
+20: AND Eb,Gb (ev)
+21: AND Ev,Gv (es) | AND Ev,Gv (66),(es)
+22: AND Gb,Eb (ev)
+23: AND Gv,Ev (es) | AND Gv,Ev (66),(es)
+24: SHLD Ev,Gv,Ib (es) | SHLD Ev,Gv,Ib (66),(es)
+28: SUB Eb,Gb (ev)
+29: SUB Ev,Gv (es) | SUB Ev,Gv (66),(es)
+2a: SUB Gb,Eb (ev)
+2b: SUB Gv,Ev (es) | SUB Gv,Ev (66),(es)
+2c: SHRD Ev,Gv,Ib (es) | SHRD Ev,Gv,Ib (66),(es)
+30: XOR Eb,Gb (ev)
+31: XOR Ev,Gv (es) | XOR Ev,Gv (66),(es)
+32: XOR Gb,Eb (ev)
+33: XOR Gv,Ev (es) | XOR Gv,Ev (66),(es)
+# CCMPSCC instructions are: CCOMB, CCOMBE, CCOMF, CCOML, CCOMLE, CCOMNB, CCOMNBE, CCOMNL, CCOMNLE,
+# CCOMNO, CCOMNS, CCOMNZ, CCOMO, CCOMS, CCOMT, CCOMZ
+38: CCMPSCC Eb,Gb (ev)
+39: CCMPSCC Ev,Gv (es) | CCMPSCC Ev,Gv (66),(es)
+3a: CCMPSCC Gv,Ev (ev)
+3b: CCMPSCC Gv,Ev (es) | CCMPSCC Gv,Ev (66),(es)
+40: CMOVO Gv,Ev (es) | CMOVO Gv,Ev (66),(es) | CFCMOVO Ev,Ev (es) | CFCMOVO Ev,Ev (66),(es) | SETO Eb (F2),(ev)
+41: CMOVNO Gv,Ev (es) | CMOVNO Gv,Ev (66),(es) | CFCMOVNO Ev,Ev (es) | CFCMOVNO Ev,Ev (66),(es) | SETNO Eb (F2),(ev)
+42: CMOVB Gv,Ev (es) | CMOVB Gv,Ev (66),(es) | CFCMOVB Ev,Ev (es) | CFCMOVB Ev,Ev (66),(es) | SETB Eb (F2),(ev)
+43: CMOVNB Gv,Ev (es) | CMOVNB Gv,Ev (66),(es) | CFCMOVNB Ev,Ev (es) | CFCMOVNB Ev,Ev (66),(es) | SETNB Eb (F2),(ev)
+44: CMOVZ Gv,Ev (es) | CMOVZ Gv,Ev (66),(es) | CFCMOVZ Ev,Ev (es) | CFCMOVZ Ev,Ev (66),(es) | SETZ Eb (F2),(ev)
+45: CMOVNZ Gv,Ev (es) | CMOVNZ Gv,Ev (66),(es) | CFCMOVNZ Ev,Ev (es) | CFCMOVNZ Ev,Ev (66),(es) | SETNZ Eb (F2),(ev)
+46: CMOVBE Gv,Ev (es) | CMOVBE Gv,Ev (66),(es) | CFCMOVBE Ev,Ev (es) | CFCMOVBE Ev,Ev (66),(es) | SETBE Eb (F2),(ev)
+47: CMOVNBE Gv,Ev (es) | CMOVNBE Gv,Ev (66),(es) | CFCMOVNBE Ev,Ev (es) | CFCMOVNBE Ev,Ev (66),(es) | SETNBE Eb (F2),(ev)
+48: CMOVS Gv,Ev (es) | CMOVS Gv,Ev (66),(es) | CFCMOVS Ev,Ev (es) | CFCMOVS Ev,Ev (66),(es) | SETS Eb (F2),(ev)
+49: CMOVNS Gv,Ev (es) | CMOVNS Gv,Ev (66),(es) | CFCMOVNS Ev,Ev (es) | CFCMOVNS Ev,Ev (66),(es) | SETNS Eb (F2),(ev)
+4a: CMOVP Gv,Ev (es) | CMOVP Gv,Ev (66),(es) | CFCMOVP Ev,Ev (es) | CFCMOVP Ev,Ev (66),(es) | SETP Eb (F2),(ev)
+4b: CMOVNP Gv,Ev (es) | CMOVNP Gv,Ev (66),(es) | CFCMOVNP Ev,Ev (es) | CFCMOVNP Ev,Ev (66),(es) | SETNP Eb (F2),(ev)
+4c: CMOVL Gv,Ev (es) | CMOVL Gv,Ev (66),(es) | CFCMOVL Ev,Ev (es) | CFCMOVL Ev,Ev (66),(es) | SETL Eb (F2),(ev)
+4d: CMOVNL Gv,Ev (es) | CMOVNL Gv,Ev (66),(es) | CFCMOVNL Ev,Ev (es) | CFCMOVNL Ev,Ev (66),(es) | SETNL Eb (F2),(ev)
+4e: CMOVLE Gv,Ev (es) | CMOVLE Gv,Ev (66),(es) | CFCMOVLE Ev,Ev (es) | CFCMOVLE Ev,Ev (66),(es) | SETLE Eb (F2),(ev)
+4f: CMOVNLE Gv,Ev (es) | CMOVNLE Gv,Ev (66),(es) | CFCMOVNLE Ev,Ev (es) | CFCMOVNLE Ev,Ev (66),(es) | SETNLE Eb (F2),(ev)
+60: MOVBE Gv,Ev (es) | MOVBE Gv,Ev (66),(es)
+61: MOVBE Ev,Gv (es) | MOVBE Ev,Gv (66),(es)
+65: WRUSSD Md,Gd (66),(ev) | WRUSSQ Mq,Gq (66),(ev)
+66: ADCX Gy,Ey (66),(ev) | ADOX Gy,Ey (F3),(ev) | WRSSD Md,Gd (ev) | WRSSQ Mq,Gq (66),(ev)
+69: IMUL Gv,Ev,Iz (es) | IMUL Gv,Ev,Iz (66),(es)
+6b: IMUL Gv,Ev,Ib (es) | IMUL Gv,Ev,Ib (66),(es)
+80: Grp1 Eb,Ib (1A),(ev)
+81: Grp1 Ev,Iz (1A),(es)
+83: Grp1 Ev,Ib (1A),(es)
+# CTESTSCC instructions are: CTESTB, CTESTBE, CTESTF, CTESTL, CTESTLE, CTESTNB, CTESTNBE, CTESTNL,
+# CTESTNLE, CTESTNO, CTESTNS, CTESTNZ, CTESTO, CTESTS, CTESTT, CTESTZ
+84: CTESTSCC (ev)
+85: CTESTSCC (es) | CTESTSCC (66),(es)
+88: POPCNT Gv,Ev (es) | POPCNT Gv,Ev (66),(es)
+8f: POP2 Bq,Rq (000),(11B),(ev)
+a5: SHLD Ev,Gv,CL (es) | SHLD Ev,Gv,CL (66),(es)
+ad: SHRD Ev,Gv,CL (es) | SHRD Ev,Gv,CL (66),(es)
+af: IMUL Gv,Ev (es) | IMUL Gv,Ev (66),(es)
+c0: Grp2 Eb,Ib (1A),(ev)
+c1: Grp2 Ev,Ib (1A),(es)
+d0: Grp2 Eb,1 (1A),(ev)
+d1: Grp2 Ev,1 (1A),(es)
+d2: Grp2 Eb,CL (1A),(ev)
+d3: Grp2 Ev,CL (1A),(es)
+f0: CRC32 Gy,Eb (es) | INVEPT Gq,Mdq (F3),(ev)
+f1: CRC32 Gy,Ey (es) | CRC32 Gy,Ey (66),(es) | INVVPID Gy,Mdq (F3),(ev)
+f2: INVPCID Gy,Mdq (F3),(ev)
+f4: TZCNT Gv,Ev (es) | TZCNT Gv,Ev (66),(es)
+f5: LZCNT Gv,Ev (es) | LZCNT Gv,Ev (66),(es)
+f6: Grp3_1 Eb (1A),(ev)
+f7: Grp3_2 Ev (1A),(es)
+f8: MOVDIR64B Gv,Mdqq (66),(ev) | ENQCMD Gv,Mdqq (F2),(ev) | ENQCMDS Gv,Mdqq (F3),(ev) | URDMSR Rq,Gq (F2),(11B),(ev) | UWRMSR Gq,Rq (F3),(11B),(ev)
+f9: MOVDIRI My,Gy (ev)
+fe: Grp4 (1A),(ev)
+ff: Grp5 (1A),(es) | PUSH2 Bq,Rq (110),(11B),(ev)
+EndTable
+
Table: EVEX map 5
Referrer:
AVXcode: 5

2024-05-02 11:26:39

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Add misc new Intel instructions

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 9dd3612895de3d967ebd607423571bd6b0854ccc
Gitweb: https://git.kernel.org/tip/9dd3612895de3d967ebd607423571bd6b0854ccc
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:47 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:43 +02:00

x86/insn: Add misc new Intel instructions

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Add instructions documented in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference March 2024
319433-052, that have not been added yet:

AADD
AAND
AOR
AXOR
CMPccXADD
PBNDKB
RDMSRLIST
URDMSR
UWRMSR
VBCSTNEBF162PS
VBCSTNESH2PS
VCVTNEEBF162PS
VCVTNEEPH2PS
VCVTNEOBF162PS
VCVTNEOPH2PS
VCVTNEPS2BF16
VPDPB[SU,UU,SS]D[,S]
VPDPW[SU,US,UU]D[,S]
VPMADD52HUQ
VPMADD52LUQ
VSHA512MSG1
VSHA512MSG2
VSHA512RNDS2
VSM3MSG1
VSM3MSG2
VSM3RNDS2
VSM4KEY4
VSM4RNDS4
WRMSRLIST
TCMMIMFP16PS
TCMMRLFP16PS
TDPFP16PS
PREFETCHIT1
PREFETCHIT0

Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/lib/x86-opcode-map.txt | 57 ++++++++++++++++++++------
tools/arch/x86/lib/x86-opcode-map.txt | 57 ++++++++++++++++++++------
2 files changed, 90 insertions(+), 24 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 24941b9..4e9f535 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -698,8 +698,8 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66)
-51: vpdpbusds Vx,Hx,Wx (66)
+50: vpdpbusd Vx,Hx,Wx (66) | vpdpbssd Vx,Hx,Wx (F2),(v) | vpdpbsud Vx,Hx,Wx (F3),(v) | vpdpbuud Vx,Hx,Wx (v)
+51: vpdpbusds Vx,Hx,Wx (66) | vpdpbssds Vx,Hx,Wx (F2),(v) | vpdpbsuds Vx,Hx,Wx (F3),(v) | vpdpbuuds Vx,Hx,Wx (v)
52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
@@ -708,7 +708,7 @@ AVXcode: 2
59: vpbroadcastq Vx,Wx (66),(v) | vbroadcasti32x2 Vx,Wx (66),(evo)
5a: vbroadcasti128 Vqq,Mdq (66),(v) | vbroadcasti32x4/64x2 Vx,Wx (66),(evo)
5b: vbroadcasti32x8/64x4 Vqq,Mdq (66),(ev)
-5c: TDPBF16PS Vt,Wt,Ht (F3),(v1)
+5c: TDPBF16PS Vt,Wt,Ht (F3),(v1) | TDPFP16PS Vt,Wt,Ht (F2),(v1),(o64)
# Skip 0x5d
5e: TDPBSSD Vt,Wt,Ht (F2),(v1) | TDPBSUD Vt,Wt,Ht (F3),(v1) | TDPBUSD Vt,Wt,Ht (66),(v1) | TDPBUUD Vt,Wt,Ht (v1)
# Skip 0x5f-0x61
@@ -718,10 +718,12 @@ AVXcode: 2
65: vblendmps/d Vx,Hx,Wx (66),(ev)
66: vpblendmb/w Vx,Hx,Wx (66),(ev)
68: vp2intersectd/q Kx,Hx,Wx (F2),(ev)
-# Skip 0x69-0x6f
+# Skip 0x69-0x6b
+6c: TCMMIMFP16PS Vt,Wt,Ht (66),(v1),(o64) | TCMMRLFP16PS Vt,Wt,Ht (v1),(o64)
+# Skip 0x6d-0x6f
70: vpshldvw Vx,Hx,Wx (66),(ev)
71: vpshldvd/q Vx,Hx,Wx (66),(ev)
-72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3),(ev) | vpshrdvw Vx,Hx,Wx (66),(ev)
+72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3) | vpshrdvw Vx,Hx,Wx (66),(ev)
73: vpshrdvd/q Vx,Hx,Wx (66),(ev)
75: vpermi2b/w Vx,Hx,Wx (66),(ev)
76: vpermi2d/q Vx,Hx,Wx (66),(ev)
@@ -777,8 +779,10 @@ ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
-b4: vpmadd52luq Vx,Hx,Wx (66),(ev)
-b5: vpmadd52huq Vx,Hx,Wx (66),(ev)
+b0: vcvtneebf162ps Vx,Mx (F3),(!11B),(v) | vcvtneeph2ps Vx,Mx (66),(!11B),(v) | vcvtneobf162ps Vx,Mx (F2),(!11B),(v) | vcvtneoph2ps Vx,Mx (!11B),(v)
+b1: vbcstnebf162ps Vx,Mw (F3),(!11B),(v) | vbcstnesh2ps Vx,Mw (66),(!11B),(v)
+b4: vpmadd52luq Vx,Hx,Wx (66)
+b5: vpmadd52huq Vx,Hx,Wx (66)
b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
@@ -796,16 +800,35 @@ c7: Grp19 (1A)
c8: sha1nexte Vdq,Wdq | vexp2ps/d Vx,Wx (66),(ev)
c9: sha1msg1 Vdq,Wdq
ca: sha1msg2 Vdq,Wdq | vrcp28ps/d Vx,Wx (66),(ev)
-cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
-cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
-cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
+cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev) | vsha512rnds2 Vqq,Hqq,Udq (F2),(11B),(v)
+cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev) | vsha512msg1 Vqq,Udq (F2),(11B),(v)
+cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev) | vsha512msg2 Vqq,Uqq (F2),(11B),(v)
cf: vgf2p8mulb Vx,Wx (66)
+d2: vpdpwsud Vx,Hx,Wx (F3),(v) | vpdpwusd Vx,Hx,Wx (66),(v) | vpdpwuud Vx,Hx,Wx (v)
+d3: vpdpwsuds Vx,Hx,Wx (F3),(v) | vpdpwusds Vx,Hx,Wx (66),(v) | vpdpwuuds Vx,Hx,Wx (v)
d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
+da: vsm3msg1 Vdq,Hdq,Udq (v1) | vsm3msg2 Vdq,Hdq,Udq (66),(v1) | vsm4key4 Vx,Hx,Wx (F3),(v) | vsm4rnds4 Vx,Hx,Wx (F2),(v)
db: VAESIMC Vdq,Wdq (66),(v1)
dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
+e0: CMPOXADD My,Gy,By (66),(v1),(o64)
+e1: CMPNOXADD My,Gy,By (66),(v1),(o64)
+e2: CMPBXADD My,Gy,By (66),(v1),(o64)
+e3: CMPNBXADD My,Gy,By (66),(v1),(o64)
+e4: CMPZXADD My,Gy,By (66),(v1),(o64)
+e5: CMPNZXADD My,Gy,By (66),(v1),(o64)
+e6: CMPBEXADD My,Gy,By (66),(v1),(o64)
+e7: CMPNBEXADD My,Gy,By (66),(v1),(o64)
+e8: CMPSXADD My,Gy,By (66),(v1),(o64)
+e9: CMPNSXADD My,Gy,By (66),(v1),(o64)
+ea: CMPPXADD My,Gy,By (66),(v1),(o64)
+eb: CMPNPXADD My,Gy,By (66),(v1),(o64)
+ec: CMPLXADD My,Gy,By (66),(v1),(o64)
+ed: CMPNLXADD My,Gy,By (66),(v1),(o64)
+ee: CMPLEXADD My,Gy,By (66),(v1),(o64)
+ef: CMPNLEXADD My,Gy,By (66),(v1),(o64)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -813,10 +836,11 @@ f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
-f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
+f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
fa: ENCODEKEY128 Ew,Ew (F3)
fb: ENCODEKEY256 Ew,Ew (F3)
+fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
@@ -896,6 +920,7 @@ c2: vcmpph Vx,Hx,Wx,Ib (ev) | vcmpsh Vx,Hx,Wx,Ib (F3),(ev)
cc: sha1rnds4 Vdq,Wdq,Ib
ce: vgf2p8affineqb Vx,Wx,Ib (66)
cf: vgf2p8affineinvqb Vx,Wx,Ib (66)
+de: vsm3rnds2 Vdq,Hdq,Wdq,Ib (66),(v1)
df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable
@@ -978,6 +1003,12 @@ d6: vfcmulcph Vx,Hx,Wx (F2),(ev) | vfmulcph Vx,Hx,Wx (F3),(ev)
d7: vfcmulcsh Vx,Hx,Wx (F2),(ev) | vfmulcsh Vx,Hx,Wx (F3),(ev)
EndTable

+Table: VEX map 7
+Referrer:
+AVXcode: 7
+f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
+EndTable
+
GrpTable: Grp1
0: ADD
1: OR
@@ -1054,7 +1085,7 @@ GrpTable: Grp6
EndTable

GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B) | RDMSRLIST (F2),(110),(11B) | WRMSRLIST (F3),(110),(11B) | PBNDKB (111),(11B)
1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
@@ -1140,6 +1171,8 @@ GrpTable: Grp16
1: prefetch T0
2: prefetch T1
3: prefetch T2
+6: prefetch IT1
+7: prefetch IT0
EndTable

GrpTable: Grp17
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 24941b9..4e9f535 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -698,8 +698,8 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66)
-51: vpdpbusds Vx,Hx,Wx (66)
+50: vpdpbusd Vx,Hx,Wx (66) | vpdpbssd Vx,Hx,Wx (F2),(v) | vpdpbsud Vx,Hx,Wx (F3),(v) | vpdpbuud Vx,Hx,Wx (v)
+51: vpdpbusds Vx,Hx,Wx (66) | vpdpbssds Vx,Hx,Wx (F2),(v) | vpdpbsuds Vx,Hx,Wx (F3),(v) | vpdpbuuds Vx,Hx,Wx (v)
52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
@@ -708,7 +708,7 @@ AVXcode: 2
59: vpbroadcastq Vx,Wx (66),(v) | vbroadcasti32x2 Vx,Wx (66),(evo)
5a: vbroadcasti128 Vqq,Mdq (66),(v) | vbroadcasti32x4/64x2 Vx,Wx (66),(evo)
5b: vbroadcasti32x8/64x4 Vqq,Mdq (66),(ev)
-5c: TDPBF16PS Vt,Wt,Ht (F3),(v1)
+5c: TDPBF16PS Vt,Wt,Ht (F3),(v1) | TDPFP16PS Vt,Wt,Ht (F2),(v1),(o64)
# Skip 0x5d
5e: TDPBSSD Vt,Wt,Ht (F2),(v1) | TDPBSUD Vt,Wt,Ht (F3),(v1) | TDPBUSD Vt,Wt,Ht (66),(v1) | TDPBUUD Vt,Wt,Ht (v1)
# Skip 0x5f-0x61
@@ -718,10 +718,12 @@ AVXcode: 2
65: vblendmps/d Vx,Hx,Wx (66),(ev)
66: vpblendmb/w Vx,Hx,Wx (66),(ev)
68: vp2intersectd/q Kx,Hx,Wx (F2),(ev)
-# Skip 0x69-0x6f
+# Skip 0x69-0x6b
+6c: TCMMIMFP16PS Vt,Wt,Ht (66),(v1),(o64) | TCMMRLFP16PS Vt,Wt,Ht (v1),(o64)
+# Skip 0x6d-0x6f
70: vpshldvw Vx,Hx,Wx (66),(ev)
71: vpshldvd/q Vx,Hx,Wx (66),(ev)
-72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3),(ev) | vpshrdvw Vx,Hx,Wx (66),(ev)
+72: vcvtne2ps2bf16 Vx,Hx,Wx (F2),(ev) | vcvtneps2bf16 Vx,Wx (F3) | vpshrdvw Vx,Hx,Wx (66),(ev)
73: vpshrdvd/q Vx,Hx,Wx (66),(ev)
75: vpermi2b/w Vx,Hx,Wx (66),(ev)
76: vpermi2d/q Vx,Hx,Wx (66),(ev)
@@ -777,8 +779,10 @@ ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
-b4: vpmadd52luq Vx,Hx,Wx (66),(ev)
-b5: vpmadd52huq Vx,Hx,Wx (66),(ev)
+b0: vcvtneebf162ps Vx,Mx (F3),(!11B),(v) | vcvtneeph2ps Vx,Mx (66),(!11B),(v) | vcvtneobf162ps Vx,Mx (F2),(!11B),(v) | vcvtneoph2ps Vx,Mx (!11B),(v)
+b1: vbcstnebf162ps Vx,Mw (F3),(!11B),(v) | vbcstnesh2ps Vx,Mw (66),(!11B),(v)
+b4: vpmadd52luq Vx,Hx,Wx (66)
+b5: vpmadd52huq Vx,Hx,Wx (66)
b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
@@ -796,16 +800,35 @@ c7: Grp19 (1A)
c8: sha1nexte Vdq,Wdq | vexp2ps/d Vx,Wx (66),(ev)
c9: sha1msg1 Vdq,Wdq
ca: sha1msg2 Vdq,Wdq | vrcp28ps/d Vx,Wx (66),(ev)
-cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
-cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
-cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
+cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev) | vsha512rnds2 Vqq,Hqq,Udq (F2),(11B),(v)
+cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev) | vsha512msg1 Vqq,Udq (F2),(11B),(v)
+cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev) | vsha512msg2 Vqq,Uqq (F2),(11B),(v)
cf: vgf2p8mulb Vx,Wx (66)
+d2: vpdpwsud Vx,Hx,Wx (F3),(v) | vpdpwusd Vx,Hx,Wx (66),(v) | vpdpwuud Vx,Hx,Wx (v)
+d3: vpdpwsuds Vx,Hx,Wx (F3),(v) | vpdpwusds Vx,Hx,Wx (66),(v) | vpdpwuuds Vx,Hx,Wx (v)
d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
+da: vsm3msg1 Vdq,Hdq,Udq (v1) | vsm3msg2 Vdq,Hdq,Udq (66),(v1) | vsm4key4 Vx,Hx,Wx (F3),(v) | vsm4rnds4 Vx,Hx,Wx (F2),(v)
db: VAESIMC Vdq,Wdq (66),(v1)
dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
+e0: CMPOXADD My,Gy,By (66),(v1),(o64)
+e1: CMPNOXADD My,Gy,By (66),(v1),(o64)
+e2: CMPBXADD My,Gy,By (66),(v1),(o64)
+e3: CMPNBXADD My,Gy,By (66),(v1),(o64)
+e4: CMPZXADD My,Gy,By (66),(v1),(o64)
+e5: CMPNZXADD My,Gy,By (66),(v1),(o64)
+e6: CMPBEXADD My,Gy,By (66),(v1),(o64)
+e7: CMPNBEXADD My,Gy,By (66),(v1),(o64)
+e8: CMPSXADD My,Gy,By (66),(v1),(o64)
+e9: CMPNSXADD My,Gy,By (66),(v1),(o64)
+ea: CMPPXADD My,Gy,By (66),(v1),(o64)
+eb: CMPNPXADD My,Gy,By (66),(v1),(o64)
+ec: CMPLXADD My,Gy,By (66),(v1),(o64)
+ed: CMPNLXADD My,Gy,By (66),(v1),(o64)
+ee: CMPLEXADD My,Gy,By (66),(v1),(o64)
+ef: CMPNLEXADD My,Gy,By (66),(v1),(o64)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -813,10 +836,11 @@ f3: Grp17 (1A)
f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v) | WRUSSD/Q My,Gy (66)
f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,Gy
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
-f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
+f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3) | URDMSR Rq,Gq (F2),(11B) | UWRMSR Gq,Rq (F3),(11B)
f9: MOVDIRI My,Gy
fa: ENCODEKEY128 Ew,Ew (F3)
fb: ENCODEKEY256 Ew,Ew (F3)
+fc: AADD My,Gy | AAND My,Gy (66) | AOR My,Gy (F2) | AXOR My,Gy (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
@@ -896,6 +920,7 @@ c2: vcmpph Vx,Hx,Wx,Ib (ev) | vcmpsh Vx,Hx,Wx,Ib (F3),(ev)
cc: sha1rnds4 Vdq,Wdq,Ib
ce: vgf2p8affineqb Vx,Wx,Ib (66)
cf: vgf2p8affineinvqb Vx,Wx,Ib (66)
+de: vsm3rnds2 Vdq,Hdq,Wdq,Ib (66),(v1)
df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
f0: RORX Gy,Ey,Ib (F2),(v) | HRESET Gv,Ib (F3),(000),(11B)
EndTable
@@ -978,6 +1003,12 @@ d6: vfcmulcph Vx,Hx,Wx (F2),(ev) | vfmulcph Vx,Hx,Wx (F3),(ev)
d7: vfcmulcsh Vx,Hx,Wx (F2),(ev) | vfmulcsh Vx,Hx,Wx (F3),(ev)
EndTable

+Table: VEX map 7
+Referrer:
+AVXcode: 7
+f8: URDMSR Rq,Id (F2),(v1),(11B) | UWRMSR Id,Rq (F3),(v1),(11B)
+EndTable
+
GrpTable: Grp1
0: ADD
1: OR
@@ -1054,7 +1085,7 @@ GrpTable: Grp6
EndTable

GrpTable: Grp7
-0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B)
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B) | PCONFIG (101),(11B) | ENCLV (000),(11B) | WRMSRNS (110),(11B) | RDMSRLIST (F2),(110),(11B) | WRMSRLIST (F3),(110),(11B) | PBNDKB (111),(11B)
1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B) | ENCLS (111),(11B) | ERETU (F3),(010),(11B) | ERETS (F2),(010),(11B)
2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B) | ENCLU (111),(11B)
3: LIDT Ms
@@ -1140,6 +1171,8 @@ GrpTable: Grp16
1: prefetch T0
2: prefetch T1
3: prefetch T2
+6: prefetch IT1
+7: prefetch IT0
EndTable

GrpTable: Grp17

2024-05-02 11:26:44

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Add support for REX2 prefix to the instruction decoder logic

The following commit has been merged into the perf/core branch of tip:

Commit-ID: eada38d575a2b947b3ffefd570fea90a5a17feb3
Gitweb: https://git.kernel.org/tip/eada38d575a2b947b3ffefd570fea90a5a17feb3
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:48 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:44 +02:00

x86/insn: Add support for REX2 prefix to the instruction decoder logic

Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.

The REX2 prefix is effectively an extended version of the REX prefix.

REX2 and EVEX are also used with PUSH/POP instructions to provide a
Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
fast-forward register data between matching PUSH and POP instructions.

REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
other maps is provided by the EVEX prefix, covered in a separate patch.

Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
for a new 64-bit absolute direct jump instruction JMPABS.

Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
Specification for details.

Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
opcodes (only JMPABS) that require a mandatory REX2 prefix
(INAT_REX2_VARIANT).

Amend logic to read the REX2 prefix and get the opcode attribute for the
map number (0 or 1) encoded in the REX2 prefix.

Amend the awk script that generates the attribute tables from the opcode
map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.

Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/include/asm/inat.h | 11 ++++++++-
arch/x86/include/asm/insn.h | 25 +++++++++++++++++----
arch/x86/lib/insn.c | 25 +++++++++++++++++++++-
arch/x86/tools/gen-insn-attr-x86.awk | 11 ++++++++-
tools/arch/x86/include/asm/inat.h | 11 ++++++++-
tools/arch/x86/include/asm/insn.h | 25 +++++++++++++++++----
tools/arch/x86/lib/insn.c | 25 +++++++++++++++++++++-
tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 ++++++++-
8 files changed, 132 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index b56c574..1331bdd 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -35,6 +35,8 @@
#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
#define INAT_PFX_EVEX 15 /* EVEX prefix */
+/* x86-64 REX2 prefix */
+#define INAT_PFX_REX2 16 /* 0xD5 */

#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
@@ -50,7 +52,7 @@

/* Legacy prefix */
#define INAT_PFX_OFFS 0
-#define INAT_PFX_BITS 4
+#define INAT_PFX_BITS 5
#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
/* Escape opcodes */
@@ -77,6 +79,8 @@
#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
+#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
+#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
}

+static inline int inat_is_rex2_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
+}
+
static inline int inat_last_prefix_id(insn_attr_t attr)
{
if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 1b29f58..95249ec 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -112,10 +112,15 @@ struct insn {
#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
#define X86_SIB_BASE(sib) ((sib) & 0x07)

-#define X86_REX_W(rex) ((rex) & 8)
-#define X86_REX_R(rex) ((rex) & 4)
-#define X86_REX_X(rex) ((rex) & 2)
-#define X86_REX_B(rex) ((rex) & 1)
+#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
+#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
+#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
+#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
+
+#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
+#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
+#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
+#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */

/* VEX bit flags */
#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
@@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
/* Instruction uses RIP-relative addressing */
extern int insn_rip_relative(struct insn *insn);

+static inline int insn_is_rex2(struct insn *insn)
+{
+ if (!insn->prefixes.got)
+ insn_get_prefixes(insn);
+ return insn->rex_prefix.nbytes == 2;
+}
+
+static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
+{
+ return X86_REX2_M(insn->rex_prefix.bytes[1]);
+}
+
static inline int insn_is_avx(struct insn *insn)
{
if (!insn->prefixes.got)
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 1bb155a..6126ddc 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -185,6 +185,17 @@ found:
if (X86_REX_W(b))
/* REX.W overrides opnd_size */
insn->opnd_bytes = 8;
+ } else if (inat_is_rex2_prefix(attr)) {
+ insn_set_byte(&insn->rex_prefix, 0, b);
+ b = peek_nbyte_next(insn_byte_t, insn, 1);
+ insn_set_byte(&insn->rex_prefix, 1, b);
+ insn->rex_prefix.nbytes = 2;
+ insn->next_byte += 2;
+ if (X86_REX_W(b))
+ /* REX.W overrides opnd_size */
+ insn->opnd_bytes = 8;
+ insn->rex_prefix.got = 1;
+ goto vex_end;
}
}
insn->rex_prefix.got = 1;
@@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
goto end;
}

+ /* Check if there is REX2 prefix or not */
+ if (insn_is_rex2(insn)) {
+ if (insn_rex2_m_bit(insn)) {
+ /* map 1 is escape 0x0f */
+ insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
+
+ pfx_id = insn_last_prefix_id(insn);
+ insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
+ } else {
+ insn->attr = inat_get_opcode_attribute(op);
+ }
+ goto end;
+ }
+
insn->attr = inat_get_opcode_attribute(op);
while (inat_is_escape(insn->attr)) {
/* Get escaped opcode */
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index af38469..3f43aa7 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -64,7 +64,9 @@ BEGIN {

modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
- rex_expr = "^REX(\\.[XRWB]+)*"
+ rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
+ rex2_expr = "\\(REX2\\)"
+ no_rex2_expr = "\\(!REX2\\)"
fpu_expr = "^ESC" # TODO

lprefix1_expr = "\\((66|!F3)\\)"
@@ -99,6 +101,7 @@ BEGIN {
prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
prefix_num["EVEX"] = "INAT_PFX_EVEX"
+ prefix_num["REX2"] = "INAT_PFX_REX2"

clear_vars()
}
@@ -314,6 +317,10 @@ function convert_operands(count,opnd, i,j,imm,mod)
if (match(ext, force64_expr))
flags = add_flags(flags, "INAT_FORCE64")

+ # check REX2 not allowed
+ if (match(ext, no_rex2_expr))
+ flags = add_flags(flags, "INAT_NO_REX2")
+
# check REX prefix
if (match(opcode, rex_expr))
flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
@@ -351,6 +358,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
lptable3[idx] = add_flags(lptable3[idx],flags)
variant = "INAT_VARIANT"
}
+ if (match(ext, rex2_expr))
+ table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
if (!match(ext, lprefix_expr)){
table[idx] = add_flags(table[idx],flags)
}
diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
index a610514..2e65312 100644
--- a/tools/arch/x86/include/asm/inat.h
+++ b/tools/arch/x86/include/asm/inat.h
@@ -35,6 +35,8 @@
#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
#define INAT_PFX_EVEX 15 /* EVEX prefix */
+/* x86-64 REX2 prefix */
+#define INAT_PFX_REX2 16 /* 0xD5 */

#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
@@ -50,7 +52,7 @@

/* Legacy prefix */
#define INAT_PFX_OFFS 0
-#define INAT_PFX_BITS 4
+#define INAT_PFX_BITS 5
#define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
#define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
/* Escape opcodes */
@@ -77,6 +79,8 @@
#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
#define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
+#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
+#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
}

+static inline int inat_is_rex2_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
+}
+
static inline int inat_last_prefix_id(insn_attr_t attr)
{
if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
index 65c0d9c..1a7e8fc 100644
--- a/tools/arch/x86/include/asm/insn.h
+++ b/tools/arch/x86/include/asm/insn.h
@@ -112,10 +112,15 @@ struct insn {
#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
#define X86_SIB_BASE(sib) ((sib) & 0x07)

-#define X86_REX_W(rex) ((rex) & 8)
-#define X86_REX_R(rex) ((rex) & 4)
-#define X86_REX_X(rex) ((rex) & 2)
-#define X86_REX_B(rex) ((rex) & 1)
+#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
+#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
+#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
+#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
+
+#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
+#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
+#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
+#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */

/* VEX bit flags */
#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
@@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
/* Instruction uses RIP-relative addressing */
extern int insn_rip_relative(struct insn *insn);

+static inline int insn_is_rex2(struct insn *insn)
+{
+ if (!insn->prefixes.got)
+ insn_get_prefixes(insn);
+ return insn->rex_prefix.nbytes == 2;
+}
+
+static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
+{
+ return X86_REX2_M(insn->rex_prefix.bytes[1]);
+}
+
static inline int insn_is_avx(struct insn *insn)
{
if (!insn->prefixes.got)
diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
index ada4b4a..f761ade 100644
--- a/tools/arch/x86/lib/insn.c
+++ b/tools/arch/x86/lib/insn.c
@@ -185,6 +185,17 @@ found:
if (X86_REX_W(b))
/* REX.W overrides opnd_size */
insn->opnd_bytes = 8;
+ } else if (inat_is_rex2_prefix(attr)) {
+ insn_set_byte(&insn->rex_prefix, 0, b);
+ b = peek_nbyte_next(insn_byte_t, insn, 1);
+ insn_set_byte(&insn->rex_prefix, 1, b);
+ insn->rex_prefix.nbytes = 2;
+ insn->next_byte += 2;
+ if (X86_REX_W(b))
+ /* REX.W overrides opnd_size */
+ insn->opnd_bytes = 8;
+ insn->rex_prefix.got = 1;
+ goto vex_end;
}
}
insn->rex_prefix.got = 1;
@@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
goto end;
}

+ /* Check if there is REX2 prefix or not */
+ if (insn_is_rex2(insn)) {
+ if (insn_rex2_m_bit(insn)) {
+ /* map 1 is escape 0x0f */
+ insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
+
+ pfx_id = insn_last_prefix_id(insn);
+ insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
+ } else {
+ insn->attr = inat_get_opcode_attribute(op);
+ }
+ goto end;
+ }
+
insn->attr = inat_get_opcode_attribute(op);
while (inat_is_escape(insn->attr)) {
/* Get escaped opcode */
diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
index af38469..3f43aa7 100644
--- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
@@ -64,7 +64,9 @@ BEGIN {

modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
- rex_expr = "^REX(\\.[XRWB]+)*"
+ rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
+ rex2_expr = "\\(REX2\\)"
+ no_rex2_expr = "\\(!REX2\\)"
fpu_expr = "^ESC" # TODO

lprefix1_expr = "\\((66|!F3)\\)"
@@ -99,6 +101,7 @@ BEGIN {
prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
prefix_num["EVEX"] = "INAT_PFX_EVEX"
+ prefix_num["REX2"] = "INAT_PFX_REX2"

clear_vars()
}
@@ -314,6 +317,10 @@ function convert_operands(count,opnd, i,j,imm,mod)
if (match(ext, force64_expr))
flags = add_flags(flags, "INAT_FORCE64")

+ # check REX2 not allowed
+ if (match(ext, no_rex2_expr))
+ flags = add_flags(flags, "INAT_NO_REX2")
+
# check REX prefix
if (match(opcode, rex_expr))
flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
@@ -351,6 +358,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
lptable3[idx] = add_flags(lptable3[idx],flags)
variant = "INAT_VARIANT"
}
+ if (match(ext, rex2_expr))
+ table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
if (!match(ext, lprefix_expr)){
table[idx] = add_flags(table[idx],flags)
}

2024-05-02 11:27:20

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 159039af8c074a14545fd695c38d2772b3c98b12
Gitweb: https://git.kernel.org/tip/159039af8c074a14545fd695c38d2772b3c98b12
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:49 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:44 +02:00

x86/insn: x86/insn: Add support for REX2 prefix to the instruction decoder opcode map

Support for REX2 has been added to the instruction decoder logic and the
awk script that generates the attribute tables from the opcode map.

Add REX2 prefix byte (0xD5) to the opcode map.

Add annotation (!REX2) for map 0/1 opcodes that are reserved under REX2.

Add JMPABS to the opcode map and add annotation (REX2) to identify that it
has a mandatory REX2 prefix. A separate opcode attribute table is not
needed at this time because JMPABS has the same attribute encoding as the
MOV instruction that it shares an opcode with i.e. INAT_MOFFSET.

Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/lib/x86-opcode-map.txt | 148 ++++++++++++-------------
tools/arch/x86/lib/x86-opcode-map.txt | 148 ++++++++++++-------------
2 files changed, 152 insertions(+), 144 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 4e9f535..240ef71 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
# - (F2): the last prefix is 0xF2
# - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
# - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# REX2 Prefix
+# - (!REX2): REX2 is not allowed
+# - (REX2): REX2 variant e.g. JMPABS

Table: one byte opcode
Referrer:
@@ -157,22 +161,22 @@ AVXcode:
6e: OUTS/OUTSB DX,Xb
6f: OUTS/OUTSW/OUTSD DX,Xz
# 0x70 - 0x7f
-70: JO Jb
-71: JNO Jb
-72: JB/JNAE/JC Jb
-73: JNB/JAE/JNC Jb
-74: JZ/JE Jb
-75: JNZ/JNE Jb
-76: JBE/JNA Jb
-77: JNBE/JA Jb
-78: JS Jb
-79: JNS Jb
-7a: JP/JPE Jb
-7b: JNP/JPO Jb
-7c: JL/JNGE Jb
-7d: JNL/JGE Jb
-7e: JLE/JNG Jb
-7f: JNLE/JG Jb
+70: JO Jb (!REX2)
+71: JNO Jb (!REX2)
+72: JB/JNAE/JC Jb (!REX2)
+73: JNB/JAE/JNC Jb (!REX2)
+74: JZ/JE Jb (!REX2)
+75: JNZ/JNE Jb (!REX2)
+76: JBE/JNA Jb (!REX2)
+77: JNBE/JA Jb (!REX2)
+78: JS Jb (!REX2)
+79: JNS Jb (!REX2)
+7a: JP/JPE Jb (!REX2)
+7b: JNP/JPO Jb (!REX2)
+7c: JL/JNGE Jb (!REX2)
+7d: JNL/JGE Jb (!REX2)
+7e: JLE/JNG Jb (!REX2)
+7f: JNLE/JG Jb (!REX2)
# 0x80 - 0x8f
80: Grp1 Eb,Ib (1A)
81: Grp1 Ev,Iz (1A)
@@ -208,24 +212,24 @@ AVXcode:
9e: SAHF
9f: LAHF
# 0xa0 - 0xaf
-a0: MOV AL,Ob
-a1: MOV rAX,Ov
-a2: MOV Ob,AL
-a3: MOV Ov,rAX
-a4: MOVS/B Yb,Xb
-a5: MOVS/W/D/Q Yv,Xv
-a6: CMPS/B Xb,Yb
-a7: CMPS/W/D Xv,Yv
-a8: TEST AL,Ib
-a9: TEST rAX,Iz
-aa: STOS/B Yb,AL
-ab: STOS/W/D/Q Yv,rAX
-ac: LODS/B AL,Xb
-ad: LODS/W/D/Q rAX,Xv
-ae: SCAS/B AL,Yb
+a0: MOV AL,Ob (!REX2)
+a1: MOV rAX,Ov (!REX2) | JMPABS O (REX2),(o64)
+a2: MOV Ob,AL (!REX2)
+a3: MOV Ov,rAX (!REX2)
+a4: MOVS/B Yb,Xb (!REX2)
+a5: MOVS/W/D/Q Yv,Xv (!REX2)
+a6: CMPS/B Xb,Yb (!REX2)
+a7: CMPS/W/D Xv,Yv (!REX2)
+a8: TEST AL,Ib (!REX2)
+a9: TEST rAX,Iz (!REX2)
+aa: STOS/B Yb,AL (!REX2)
+ab: STOS/W/D/Q Yv,rAX (!REX2)
+ac: LODS/B AL,Xb (!REX2)
+ad: LODS/W/D/Q rAX,Xv (!REX2)
+ae: SCAS/B AL,Yb (!REX2)
# Note: The May 2011 Intel manual shows Xv for the second parameter of the
# next instruction but Yv is correct
-af: SCAS/W/D/Q rAX,Yv
+af: SCAS/W/D/Q rAX,Yv (!REX2)
# 0xb0 - 0xbf
b0: MOV AL/R8L,Ib
b1: MOV CL/R9L,Ib
@@ -266,7 +270,7 @@ d1: Grp2 Ev,1 (1A)
d2: Grp2 Eb,CL (1A)
d3: Grp2 Ev,CL (1A)
d4: AAM Ib (i64)
-d5: AAD Ib (i64)
+d5: AAD Ib (i64) | REX2 (Prefix),(o64)
d6:
d7: XLAT/XLATB
d8: ESC
@@ -281,26 +285,26 @@ df: ESC
# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
-e0: LOOPNE/LOOPNZ Jb (f64)
-e1: LOOPE/LOOPZ Jb (f64)
-e2: LOOP Jb (f64)
-e3: JrCXZ Jb (f64)
-e4: IN AL,Ib
-e5: IN eAX,Ib
-e6: OUT Ib,AL
-e7: OUT Ib,eAX
+e0: LOOPNE/LOOPNZ Jb (f64) (!REX2)
+e1: LOOPE/LOOPZ Jb (f64) (!REX2)
+e2: LOOP Jb (f64) (!REX2)
+e3: JrCXZ Jb (f64) (!REX2)
+e4: IN AL,Ib (!REX2)
+e5: IN eAX,Ib (!REX2)
+e6: OUT Ib,AL (!REX2)
+e7: OUT Ib,eAX (!REX2)
# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
# in "near" jumps and calls is 16-bit. For CALL,
# push of return address is 16-bit wide, RSP is decremented by 2
# but is not truncated to 16 bits, unlike RIP.
-e8: CALL Jz (f64)
-e9: JMP-near Jz (f64)
-ea: JMP-far Ap (i64)
-eb: JMP-short Jb (f64)
-ec: IN AL,DX
-ed: IN eAX,DX
-ee: OUT DX,AL
-ef: OUT DX,eAX
+e8: CALL Jz (f64) (!REX2)
+e9: JMP-near Jz (f64) (!REX2)
+ea: JMP-far Ap (i64) (!REX2)
+eb: JMP-short Jb (f64) (!REX2)
+ec: IN AL,DX (!REX2)
+ed: IN eAX,DX (!REX2)
+ee: OUT DX,AL (!REX2)
+ef: OUT DX,eAX (!REX2)
# 0xf0 - 0xff
f0: LOCK (Prefix)
f1:
@@ -386,14 +390,14 @@ AVXcode: 1
2e: vucomiss Vss,Wss (v1) | vucomisd Vsd,Wsd (66),(v1)
2f: vcomiss Vss,Wss (v1) | vcomisd Vsd,Wsd (66),(v1)
# 0x0f 0x30-0x3f
-30: WRMSR
-31: RDTSC
-32: RDMSR
-33: RDPMC
-34: SYSENTER
-35: SYSEXIT
+30: WRMSR (!REX2)
+31: RDTSC (!REX2)
+32: RDMSR (!REX2)
+33: RDPMC (!REX2)
+34: SYSENTER (!REX2)
+35: SYSEXIT (!REX2)
36:
-37: GETSEC
+37: GETSEC (!REX2)
38: escape # 3-byte escape 1
39:
3a: escape # 3-byte escape 2
@@ -473,22 +477,22 @@ AVXcode: 1
7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqa32/64 Wx,Vx (66),(evo) | vmovdqu Wx,Vx (F3) | vmovdqu32/64 Wx,Vx (F3),(evo) | vmovdqu8/16 Wx,Vx (F2),(ev)
# 0x0f 0x80-0x8f
# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
-80: JO Jz (f64)
-81: JNO Jz (f64)
-82: JB/JC/JNAE Jz (f64)
-83: JAE/JNB/JNC Jz (f64)
-84: JE/JZ Jz (f64)
-85: JNE/JNZ Jz (f64)
-86: JBE/JNA Jz (f64)
-87: JA/JNBE Jz (f64)
-88: JS Jz (f64)
-89: JNS Jz (f64)
-8a: JP/JPE Jz (f64)
-8b: JNP/JPO Jz (f64)
-8c: JL/JNGE Jz (f64)
-8d: JNL/JGE Jz (f64)
-8e: JLE/JNG Jz (f64)
-8f: JNLE/JG Jz (f64)
+80: JO Jz (f64) (!REX2)
+81: JNO Jz (f64) (!REX2)
+82: JB/JC/JNAE Jz (f64) (!REX2)
+83: JAE/JNB/JNC Jz (f64) (!REX2)
+84: JE/JZ Jz (f64) (!REX2)
+85: JNE/JNZ Jz (f64) (!REX2)
+86: JBE/JNA Jz (f64) (!REX2)
+87: JA/JNBE Jz (f64) (!REX2)
+88: JS Jz (f64) (!REX2)
+89: JNS Jz (f64) (!REX2)
+8a: JP/JPE Jz (f64) (!REX2)
+8b: JNP/JPO Jz (f64) (!REX2)
+8c: JL/JNGE Jz (f64) (!REX2)
+8d: JNL/JGE Jz (f64) (!REX2)
+8e: JLE/JNG Jz (f64) (!REX2)
+8f: JNLE/JG Jz (f64) (!REX2)
# 0x0f 0x90-0x9f
90: SETO Eb | kmovw/q Vk,Wk | kmovb/d Vk,Wk (66)
91: SETNO Eb | kmovw/q Mv,Vk | kmovb/d Mv,Vk (66)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 4e9f535..240ef71 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -33,6 +33,10 @@
# - (F2): the last prefix is 0xF2
# - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
# - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+#
+# REX2 Prefix
+# - (!REX2): REX2 is not allowed
+# - (REX2): REX2 variant e.g. JMPABS

Table: one byte opcode
Referrer:
@@ -157,22 +161,22 @@ AVXcode:
6e: OUTS/OUTSB DX,Xb
6f: OUTS/OUTSW/OUTSD DX,Xz
# 0x70 - 0x7f
-70: JO Jb
-71: JNO Jb
-72: JB/JNAE/JC Jb
-73: JNB/JAE/JNC Jb
-74: JZ/JE Jb
-75: JNZ/JNE Jb
-76: JBE/JNA Jb
-77: JNBE/JA Jb
-78: JS Jb
-79: JNS Jb
-7a: JP/JPE Jb
-7b: JNP/JPO Jb
-7c: JL/JNGE Jb
-7d: JNL/JGE Jb
-7e: JLE/JNG Jb
-7f: JNLE/JG Jb
+70: JO Jb (!REX2)
+71: JNO Jb (!REX2)
+72: JB/JNAE/JC Jb (!REX2)
+73: JNB/JAE/JNC Jb (!REX2)
+74: JZ/JE Jb (!REX2)
+75: JNZ/JNE Jb (!REX2)
+76: JBE/JNA Jb (!REX2)
+77: JNBE/JA Jb (!REX2)
+78: JS Jb (!REX2)
+79: JNS Jb (!REX2)
+7a: JP/JPE Jb (!REX2)
+7b: JNP/JPO Jb (!REX2)
+7c: JL/JNGE Jb (!REX2)
+7d: JNL/JGE Jb (!REX2)
+7e: JLE/JNG Jb (!REX2)
+7f: JNLE/JG Jb (!REX2)
# 0x80 - 0x8f
80: Grp1 Eb,Ib (1A)
81: Grp1 Ev,Iz (1A)
@@ -208,24 +212,24 @@ AVXcode:
9e: SAHF
9f: LAHF
# 0xa0 - 0xaf
-a0: MOV AL,Ob
-a1: MOV rAX,Ov
-a2: MOV Ob,AL
-a3: MOV Ov,rAX
-a4: MOVS/B Yb,Xb
-a5: MOVS/W/D/Q Yv,Xv
-a6: CMPS/B Xb,Yb
-a7: CMPS/W/D Xv,Yv
-a8: TEST AL,Ib
-a9: TEST rAX,Iz
-aa: STOS/B Yb,AL
-ab: STOS/W/D/Q Yv,rAX
-ac: LODS/B AL,Xb
-ad: LODS/W/D/Q rAX,Xv
-ae: SCAS/B AL,Yb
+a0: MOV AL,Ob (!REX2)
+a1: MOV rAX,Ov (!REX2) | JMPABS O (REX2),(o64)
+a2: MOV Ob,AL (!REX2)
+a3: MOV Ov,rAX (!REX2)
+a4: MOVS/B Yb,Xb (!REX2)
+a5: MOVS/W/D/Q Yv,Xv (!REX2)
+a6: CMPS/B Xb,Yb (!REX2)
+a7: CMPS/W/D Xv,Yv (!REX2)
+a8: TEST AL,Ib (!REX2)
+a9: TEST rAX,Iz (!REX2)
+aa: STOS/B Yb,AL (!REX2)
+ab: STOS/W/D/Q Yv,rAX (!REX2)
+ac: LODS/B AL,Xb (!REX2)
+ad: LODS/W/D/Q rAX,Xv (!REX2)
+ae: SCAS/B AL,Yb (!REX2)
# Note: The May 2011 Intel manual shows Xv for the second parameter of the
# next instruction but Yv is correct
-af: SCAS/W/D/Q rAX,Yv
+af: SCAS/W/D/Q rAX,Yv (!REX2)
# 0xb0 - 0xbf
b0: MOV AL/R8L,Ib
b1: MOV CL/R9L,Ib
@@ -266,7 +270,7 @@ d1: Grp2 Ev,1 (1A)
d2: Grp2 Eb,CL (1A)
d3: Grp2 Ev,CL (1A)
d4: AAM Ib (i64)
-d5: AAD Ib (i64)
+d5: AAD Ib (i64) | REX2 (Prefix),(o64)
d6:
d7: XLAT/XLATB
d8: ESC
@@ -281,26 +285,26 @@ df: ESC
# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
-e0: LOOPNE/LOOPNZ Jb (f64)
-e1: LOOPE/LOOPZ Jb (f64)
-e2: LOOP Jb (f64)
-e3: JrCXZ Jb (f64)
-e4: IN AL,Ib
-e5: IN eAX,Ib
-e6: OUT Ib,AL
-e7: OUT Ib,eAX
+e0: LOOPNE/LOOPNZ Jb (f64) (!REX2)
+e1: LOOPE/LOOPZ Jb (f64) (!REX2)
+e2: LOOP Jb (f64) (!REX2)
+e3: JrCXZ Jb (f64) (!REX2)
+e4: IN AL,Ib (!REX2)
+e5: IN eAX,Ib (!REX2)
+e6: OUT Ib,AL (!REX2)
+e7: OUT Ib,eAX (!REX2)
# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
# in "near" jumps and calls is 16-bit. For CALL,
# push of return address is 16-bit wide, RSP is decremented by 2
# but is not truncated to 16 bits, unlike RIP.
-e8: CALL Jz (f64)
-e9: JMP-near Jz (f64)
-ea: JMP-far Ap (i64)
-eb: JMP-short Jb (f64)
-ec: IN AL,DX
-ed: IN eAX,DX
-ee: OUT DX,AL
-ef: OUT DX,eAX
+e8: CALL Jz (f64) (!REX2)
+e9: JMP-near Jz (f64) (!REX2)
+ea: JMP-far Ap (i64) (!REX2)
+eb: JMP-short Jb (f64) (!REX2)
+ec: IN AL,DX (!REX2)
+ed: IN eAX,DX (!REX2)
+ee: OUT DX,AL (!REX2)
+ef: OUT DX,eAX (!REX2)
# 0xf0 - 0xff
f0: LOCK (Prefix)
f1:
@@ -386,14 +390,14 @@ AVXcode: 1
2e: vucomiss Vss,Wss (v1) | vucomisd Vsd,Wsd (66),(v1)
2f: vcomiss Vss,Wss (v1) | vcomisd Vsd,Wsd (66),(v1)
# 0x0f 0x30-0x3f
-30: WRMSR
-31: RDTSC
-32: RDMSR
-33: RDPMC
-34: SYSENTER
-35: SYSEXIT
+30: WRMSR (!REX2)
+31: RDTSC (!REX2)
+32: RDMSR (!REX2)
+33: RDPMC (!REX2)
+34: SYSENTER (!REX2)
+35: SYSEXIT (!REX2)
36:
-37: GETSEC
+37: GETSEC (!REX2)
38: escape # 3-byte escape 1
39:
3a: escape # 3-byte escape 2
@@ -473,22 +477,22 @@ AVXcode: 1
7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqa32/64 Wx,Vx (66),(evo) | vmovdqu Wx,Vx (F3) | vmovdqu32/64 Wx,Vx (F3),(evo) | vmovdqu8/16 Wx,Vx (F2),(ev)
# 0x0f 0x80-0x8f
# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
-80: JO Jz (f64)
-81: JNO Jz (f64)
-82: JB/JC/JNAE Jz (f64)
-83: JAE/JNB/JNC Jz (f64)
-84: JE/JZ Jz (f64)
-85: JNE/JNZ Jz (f64)
-86: JBE/JNA Jz (f64)
-87: JA/JNBE Jz (f64)
-88: JS Jz (f64)
-89: JNS Jz (f64)
-8a: JP/JPE Jz (f64)
-8b: JNP/JPO Jz (f64)
-8c: JL/JNGE Jz (f64)
-8d: JNL/JGE Jz (f64)
-8e: JLE/JNG Jz (f64)
-8f: JNLE/JG Jz (f64)
+80: JO Jz (f64) (!REX2)
+81: JNO Jz (f64) (!REX2)
+82: JB/JC/JNAE Jz (f64) (!REX2)
+83: JAE/JNB/JNC Jz (f64) (!REX2)
+84: JE/JZ Jz (f64) (!REX2)
+85: JNE/JNZ Jz (f64) (!REX2)
+86: JBE/JNA Jz (f64) (!REX2)
+87: JA/JNBE Jz (f64) (!REX2)
+88: JS Jz (f64) (!REX2)
+89: JNS Jz (f64) (!REX2)
+8a: JP/JPE Jz (f64) (!REX2)
+8b: JNP/JPO Jz (f64) (!REX2)
+8c: JL/JNGE Jz (f64) (!REX2)
+8d: JNL/JGE Jz (f64) (!REX2)
+8e: JLE/JNG Jz (f64) (!REX2)
+8f: JNLE/JG Jz (f64) (!REX2)
# 0x0f 0x90-0x9f
90: SETO Eb | kmovw/q Vk,Wk | kmovb/d Vk,Wk (66)
91: SETNO Eb | kmovw/q Mv,Vk | kmovb/d Mv,Vk (66)

2024-05-02 11:27:27

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS

The following commit has been merged into the perf/core branch of tip:

Commit-ID: b8000264348979b60dbe479255570a40e1b3a097
Gitweb: https://git.kernel.org/tip/b8000264348979b60dbe479255570a40e1b3a097
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:46 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:42 +02:00

x86/insn: Add VEX versions of VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Intel Architecture Instruction Set Extensions and Future Features manual
number 319433-044 of May 2021, documented VEX versions of instructions
VPDPBUSD, VPDPBUSDS, VPDPWSSD and VPDPWSSDS, but the opcode map has them
listed as EVEX only.

Remove EVEX-only (ev) annotation from instructions VPDPBUSD, VPDPBUSDS,
VPDPWSSD and VPDPWSSDS, which allows them to be decoded with either a VEX
or EVEX prefix.

Fixes: 0153d98f2dd6 ("x86/insn: Add misc instructions to x86 instruction decoder")
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/lib/x86-opcode-map.txt | 8 ++++----
tools/arch/x86/lib/x86-opcode-map.txt | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 4ea2e6a..24941b9 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -698,10 +698,10 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66),(ev)
-51: vpdpbusds Vx,Hx,Wx (66),(ev)
-52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66),(ev) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
-53: vpdpwssds Vx,Hx,Wx (66),(ev) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
+50: vpdpbusd Vx,Hx,Wx (66)
+51: vpdpbusds Vx,Hx,Wx (66)
+52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
+53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
55: vpopcntd/q Vx,Wx (66),(ev)
58: vpbroadcastd Vx,Wx (66),(v)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 4ea2e6a..24941b9 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -698,10 +698,10 @@ AVXcode: 2
4d: vrcp14ss/d Vsd,Hpd,Wsd (66),(ev)
4e: vrsqrt14ps/d Vpd,Wpd (66),(ev)
4f: vrsqrt14ss/d Vsd,Hsd,Wsd (66),(ev)
-50: vpdpbusd Vx,Hx,Wx (66),(ev)
-51: vpdpbusds Vx,Hx,Wx (66),(ev)
-52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66),(ev) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
-53: vpdpwssds Vx,Hx,Wx (66),(ev) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
+50: vpdpbusd Vx,Hx,Wx (66)
+51: vpdpbusds Vx,Hx,Wx (66)
+52: vdpbf16ps Vx,Hx,Wx (F3),(ev) | vpdpwssd Vx,Hx,Wx (66) | vp4dpwssd Vdqq,Hdqq,Wdq (F2),(ev)
+53: vpdpwssds Vx,Hx,Wx (66) | vp4dpwssds Vdqq,Hdqq,Wdq (F2),(ev)
54: vpopcntb/w Vx,Wx (66),(ev)
55: vpopcntd/q Vx,Wx (66),(ev)
58: vpbroadcastd Vx,Wx (66),(v)

2024-05-02 11:27:27

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map

The following commit has been merged into the perf/core branch of tip:

Commit-ID: 59162e0c11d7257cde15f907d19fefe26da66692
Gitweb: https://git.kernel.org/tip/59162e0c11d7257cde15f907d19fefe26da66692
Author: Adrian Hunter <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:45 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:41 +02:00

x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map

The x86 instruction decoder is used not only for decoding kernel
instructions. It is also used by perf uprobes (user space probes) and by
perf tools Intel Processor Trace decoding. Consequently, it needs to
support instructions executed by user space also.

Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
contradicted by the Instruction Set Reference section for PUSH in the
same manual.

Remove 64-bit operand size only annotation from opcode 0x68 PUSH
instruction.

Example:

$ cat pushw.s
.global _start
.text
_start:
pushw $0x1234
mov $0x1,%eax # system call number (sys_exit)
int $0x80
$ as -o pushw.o pushw.s
$ ld -s -o pushw pushw.o
$ objdump -d pushw | tail -4
0000000000401000 <.text>:
401000: 66 68 34 12 pushw $0x1234
401004: b8 01 00 00 00 mov $0x1,%eax
401009: cd 80 int $0x80
$ perf record -e intel_pt//u ./pushw
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data ]

Before:

$ perf script --insn-trace=disasm
Warning:
1 instruction trace errors
pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction

After:

$ perf script --insn-trace=disasm
pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax

Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/lib/x86-opcode-map.txt | 2 +-
tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index c94988d..4ea2e6a 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -148,7 +148,7 @@ AVXcode:
65: SEG=GS (Prefix)
66: Operand-Size (Prefix)
67: Address-Size (Prefix)
-68: PUSH Iz (d64)
+68: PUSH Iz
69: IMUL Gv,Ev,Iz
6a: PUSH Ib (d64)
6b: IMUL Gv,Ev,Ib
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index c94988d..4ea2e6a 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -148,7 +148,7 @@ AVXcode:
65: SEG=GS (Prefix)
66: Operand-Size (Prefix)
67: Address-Size (Prefix)
-68: PUSH Iz (d64)
+68: PUSH Iz
69: IMUL Gv,Ev,Iz
6a: PUSH Ib (d64)
6b: IMUL Gv,Ev,Ib

2024-05-02 11:27:28

by tip-bot2 for Jacob Pan

[permalink] [raw]
Subject: [tip: perf/core] x86/insn: Add Key Locker instructions to the opcode map

The following commit has been merged into the perf/core branch of tip:

Commit-ID: a5dd673ab7d269e4a5b6565fc2b5c6295a079605
Gitweb: https://git.kernel.org/tip/a5dd673ab7d269e4a5b6565fc2b5c6295a079605
Author: Chang S. Bae <[email protected]>
AuthorDate: Thu, 02 May 2024 13:58:44 +03:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Thu, 02 May 2024 13:13:41 +02:00

x86/insn: Add Key Locker instructions to the opcode map

The x86 instruction decoder needs to know these new instructions that
are going to be used in the crypto library as well as the x86 core
code. Add the following:

LOADIWKEY:
Load a CPU-internal wrapping key.

ENCODEKEY128:
Wrap a 128-bit AES key to a key handle.

ENCODEKEY256:
Wrap a 256-bit AES key to a key handle.

AESENC128KL:
Encrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESENC256KL:
Encrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESDEC128KL:
Decrypt a 128-bit block of data using a 128-bit AES key
indicated by a key handle.

AESDEC256KL:
Decrypt a 128-bit block of data using a 256-bit AES key
indicated by a key handle.

AESENCWIDE128KL:
Encrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESENCWIDE256KL:
Encrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

AESDECWIDE128KL:
Decrypt 8 128-bit blocks of data using a 128-bit AES key
indicated by a key handle.

AESDECWIDE256KL:
Decrypt 8 128-bit blocks of data using a 256-bit AES key
indicated by a key handle.

The detail can be found in Intel Software Developer Manual.

Signed-off-by: Chang S. Bae <[email protected]>
Signed-off-by: Adrian Hunter <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
tools/arch/x86/lib/x86-opcode-map.txt | 11 +++++++----
2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 12af572..c94988d 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
index 12af572..c94988d 100644
--- a/tools/arch/x86/lib/x86-opcode-map.txt
+++ b/tools/arch/x86/lib/x86-opcode-map.txt
@@ -800,11 +800,12 @@ cb: sha256rnds2 Vdq,Wdq | vrcp28ss/d Vx,Hx,Wx (66),(ev)
cc: sha256msg1 Vdq,Wdq | vrsqrt28ps/d Vx,Wx (66),(ev)
cd: sha256msg2 Vdq,Wdq | vrsqrt28ss/d Vx,Hx,Wx (66),(ev)
cf: vgf2p8mulb Vx,Wx (66)
+d8: AESENCWIDE128KL Qpi (F3),(000),(00B) | AESENCWIDE256KL Qpi (F3),(000),(10B) | AESDECWIDE128KL Qpi (F3),(000),(01B) | AESDECWIDE256KL Qpi (F3),(000),(11B)
db: VAESIMC Vdq,Wdq (66),(v1)
-dc: vaesenc Vx,Hx,Wx (66)
-dd: vaesenclast Vx,Hx,Wx (66)
-de: vaesdec Vx,Hx,Wx (66)
-df: vaesdeclast Vx,Hx,Wx (66)
+dc: vaesenc Vx,Hx,Wx (66) | LOADIWKEY Vx,Hx (F3) | AESENC128KL Vpd,Qpi (F3)
+dd: vaesenclast Vx,Hx,Wx (66) | AESDEC128KL Vpd,Qpi (F3)
+de: vaesdec Vx,Hx,Wx (66) | AESENC256KL Vpd,Qpi (F3)
+df: vaesdeclast Vx,Hx,Wx (66) | AESDEC256KL Vpd,Qpi (F3)
f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
f2: ANDN Gy,By,Ey (v)
@@ -814,6 +815,8 @@ f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v) | WRSSD/Q My,
f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
f8: MOVDIR64B Gv,Mdqq (66) | ENQCMD Gv,Mdqq (F2) | ENQCMDS Gv,Mdqq (F3)
f9: MOVDIRI My,Gy
+fa: ENCODEKEY128 Ew,Ew (F3)
+fb: ENCODEKEY256 Ew,Ew (F3)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)

2024-05-02 13:41:27

by Arnaldo Carvalho de Melo

[permalink] [raw]
Subject: Re: [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map

On Thu, May 02, 2024 at 01:58:45PM +0300, Adrian Hunter wrote:
> The x86 instruction decoder is used not only for decoding kernel
> instructions. It is also used by perf uprobes (user space probes) and by
> perf tools Intel Processor Trace decoding. Consequently, it needs to
> support instructions executed by user space also.

I wonder if we should do it this way, i.e. updating both tools/ and
kernel source code in the same patch.

I think the best is to update the kernel bits, then, after that is
merged, do the tooling part.

To avoid possible, yet unlikely, clashes in linux-next, for instance.

- Arnaldo

> Opcode 0x68 PUSH instruction is currently defined as 64-bit operand size
> only i.e. (d64). That was based on Intel SDM Opcode Map. However that is
> contradicted by the Instruction Set Reference section for PUSH in the
> same manual.
>
> Remove 64-bit operand size only annotation from opcode 0x68 PUSH
> instruction.
>
> Example:
>
> $ cat pushw.s
> .global _start
> .text
> _start:
> pushw $0x1234
> mov $0x1,%eax # system call number (sys_exit)
> int $0x80
> $ as -o pushw.o pushw.s
> $ ld -s -o pushw pushw.o
> $ objdump -d pushw | tail -4
> 0000000000401000 <.text>:
> 401000: 66 68 34 12 pushw $0x1234
> 401004: b8 01 00 00 00 mov $0x1,%eax
> 401009: cd 80 int $0x80
> $ perf record -e intel_pt//u ./pushw
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data ]
>
> Before:
>
> $ perf script --insn-trace=disasm
> Warning:
> 1 instruction trace errors
> pushw 10349 [000] 10586.869237014: 401000 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) pushw $0x1234
> pushw 10349 [000] 10586.869237014: 401006 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %al, (%rax)
> pushw 10349 [000] 10586.869237014: 401008 [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb %cl, %ch
> pushw 10349 [000] 10586.869237014: 40100a [unknown] (/home/ahunter/git/misc/rtit-tests/pushw) addb $0x2e, (%rax)
> instruction trace error type 1 time 10586.869237224 cpu 0 pid 10349 tid 10349 ip 0x40100d code 6: Trace doesn't match instruction
>
> After:
>
> $ perf script --insn-trace=disasm
> pushw 10349 [000] 10586.869237014: 401000 [unknown] (./pushw) pushw $0x1234
> pushw 10349 [000] 10586.869237014: 401004 [unknown] (./pushw) movl $1, %eax
>
> Fixes: eb13296cfaf6 ("x86: Instruction decoder API")
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> arch/x86/lib/x86-opcode-map.txt | 2 +-
> tools/arch/x86/lib/x86-opcode-map.txt | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
> index c94988d5130d..4ea2e6adb477 100644
> --- a/arch/x86/lib/x86-opcode-map.txt
> +++ b/arch/x86/lib/x86-opcode-map.txt
> @@ -148,7 +148,7 @@ AVXcode:
> 65: SEG=GS (Prefix)
> 66: Operand-Size (Prefix)
> 67: Address-Size (Prefix)
> -68: PUSH Iz (d64)
> +68: PUSH Iz
> 69: IMUL Gv,Ev,Iz
> 6a: PUSH Ib (d64)
> 6b: IMUL Gv,Ev,Ib
> diff --git a/tools/arch/x86/lib/x86-opcode-map.txt b/tools/arch/x86/lib/x86-opcode-map.txt
> index c94988d5130d..4ea2e6adb477 100644
> --- a/tools/arch/x86/lib/x86-opcode-map.txt
> +++ b/tools/arch/x86/lib/x86-opcode-map.txt
> @@ -148,7 +148,7 @@ AVXcode:
> 65: SEG=GS (Prefix)
> 66: Operand-Size (Prefix)
> 67: Address-Size (Prefix)
> -68: PUSH Iz (d64)
> +68: PUSH Iz
> 69: IMUL Gv,Ev,Iz
> 6a: PUSH Ib (d64)
> 6b: IMUL Gv,Ev,Ib
> --
> 2.34.1

2024-05-02 18:10:49

by Ian Rogers

[permalink] [raw]
Subject: Re: [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic

On Thu, May 2, 2024 at 3:59 AM Adrian Hunter <[email protected]> wrote:
>
> Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
> REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
>
> The REX2 prefix is effectively an extended version of the REX prefix.
>
> REX2 and EVEX are also used with PUSH/POP instructions to provide a
> Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
> fast-forward register data between matching PUSH and POP instructions.
>
> REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
> other maps is provided by the EVEX prefix, covered in a separate patch.
>
> Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
> for a new 64-bit absolute direct jump instruction JMPABS.
>
> Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
> Specification for details.
>
> Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
> flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
> opcodes (only JMPABS) that require a mandatory REX2 prefix
> (INAT_REX2_VARIANT).
>
> Amend logic to read the REX2 prefix and get the opcode attribute for the
> map number (0 or 1) encoded in the REX2 prefix.
>
> Amend the awk script that generates the attribute tables from the opcode
> map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
> as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
>
> Signed-off-by: Adrian Hunter <[email protected]>
> ---
> arch/x86/include/asm/inat.h | 11 +++++++++-
> arch/x86/include/asm/insn.h | 25 ++++++++++++++++++----
> arch/x86/lib/insn.c | 25 ++++++++++++++++++++++
> arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
> tools/arch/x86/include/asm/inat.h | 11 +++++++++-
> tools/arch/x86/include/asm/insn.h | 25 ++++++++++++++++++----
> tools/arch/x86/lib/insn.c | 25 ++++++++++++++++++++++
> tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
> 8 files changed, 132 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
> index b56c5741581a..1331bdd39a23 100644
> --- a/arch/x86/include/asm/inat.h
> +++ b/arch/x86/include/asm/inat.h
> @@ -35,6 +35,8 @@
> #define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
> #define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
> #define INAT_PFX_EVEX 15 /* EVEX prefix */
> +/* x86-64 REX2 prefix */
> +#define INAT_PFX_REX2 16 /* 0xD5 */
>
> #define INAT_LSTPFX_MAX 3
> #define INAT_LGCPFX_MAX 11
> @@ -50,7 +52,7 @@
>
> /* Legacy prefix */
> #define INAT_PFX_OFFS 0
> -#define INAT_PFX_BITS 4
> +#define INAT_PFX_BITS 5
> #define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
> #define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
> /* Escape opcodes */
> @@ -77,6 +79,8 @@
> #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
> #define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
> #define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
> +#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
> +#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
> /* Attribute making macros for attribute tables */
> #define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
> #define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
> @@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
> return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
> }
>
> +static inline int inat_is_rex2_prefix(insn_attr_t attr)
> +{
> + return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
> +}
> +
> static inline int inat_last_prefix_id(insn_attr_t attr)
> {
> if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
> diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
> index 1b29f58f730f..95249ec1f24e 100644
> --- a/arch/x86/include/asm/insn.h
> +++ b/arch/x86/include/asm/insn.h
> @@ -112,10 +112,15 @@ struct insn {
> #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
> #define X86_SIB_BASE(sib) ((sib) & 0x07)
>
> -#define X86_REX_W(rex) ((rex) & 8)
> -#define X86_REX_R(rex) ((rex) & 4)
> -#define X86_REX_X(rex) ((rex) & 2)
> -#define X86_REX_B(rex) ((rex) & 1)
> +#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
> +#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
> +#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
> +#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
> +
> +#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
> +#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
> +#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
> +#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */
>
> /* VEX bit flags */
> #define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
> @@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
> /* Instruction uses RIP-relative addressing */
> extern int insn_rip_relative(struct insn *insn);
>
> +static inline int insn_is_rex2(struct insn *insn)
> +{
> + if (!insn->prefixes.got)
> + insn_get_prefixes(insn);
> + return insn->rex_prefix.nbytes == 2;

It'd be nice to capture that a rex2 prefix is by definition 2 bytes.
Playing devil's advocate, if there were a REX and a REX2 prefix,
couldn't rex_prefix.nbytes be 3? I'm wondering about other prefix
combinations that may confuse this logic, maybe someone dreams up
doing this for say alignment reasons like "rep ret".

Thanks,
Ian

> +}
> +
> +static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
> +{
> + return X86_REX2_M(insn->rex_prefix.bytes[1]);
> +}
> +
> static inline int insn_is_avx(struct insn *insn)
> {
> if (!insn->prefixes.got)
> diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
> index 1bb155a0955b..6126ddc6e5f5 100644
> --- a/arch/x86/lib/insn.c
> +++ b/arch/x86/lib/insn.c
> @@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
> if (X86_REX_W(b))
> /* REX.W overrides opnd_size */
> insn->opnd_bytes = 8;
> + } else if (inat_is_rex2_prefix(attr)) {
> + insn_set_byte(&insn->rex_prefix, 0, b);
> + b = peek_nbyte_next(insn_byte_t, insn, 1);
> + insn_set_byte(&insn->rex_prefix, 1, b);
> + insn->rex_prefix.nbytes = 2;
> + insn->next_byte += 2;
> + if (X86_REX_W(b))
> + /* REX.W overrides opnd_size */
> + insn->opnd_bytes = 8;
> + insn->rex_prefix.got = 1;
> + goto vex_end;
> }
> }
> insn->rex_prefix.got = 1;
> @@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
> goto end;
> }
>
> + /* Check if there is REX2 prefix or not */
> + if (insn_is_rex2(insn)) {
> + if (insn_rex2_m_bit(insn)) {
> + /* map 1 is escape 0x0f */
> + insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
> +
> + pfx_id = insn_last_prefix_id(insn);
> + insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
> + } else {
> + insn->attr = inat_get_opcode_attribute(op);
> + }
> + goto end;
> + }
> +
> insn->attr = inat_get_opcode_attribute(op);
> while (inat_is_escape(insn->attr)) {
> /* Get escaped opcode */
> diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
> index af38469afd14..3f43aa7d8fef 100644
> --- a/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -64,7 +64,9 @@ BEGIN {
>
> modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
> force64_expr = "\\([df]64\\)"
> - rex_expr = "^REX(\\.[XRWB]+)*"
> + rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
> + rex2_expr = "\\(REX2\\)"
> + no_rex2_expr = "\\(!REX2\\)"
> fpu_expr = "^ESC" # TODO
>
> lprefix1_expr = "\\((66|!F3)\\)"
> @@ -99,6 +101,7 @@ BEGIN {
> prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
> prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
> prefix_num["EVEX"] = "INAT_PFX_EVEX"
> + prefix_num["REX2"] = "INAT_PFX_REX2"
>
> clear_vars()
> }
> @@ -314,6 +317,10 @@ function convert_operands(count,opnd, i,j,imm,mod)
> if (match(ext, force64_expr))
> flags = add_flags(flags, "INAT_FORCE64")
>
> + # check REX2 not allowed
> + if (match(ext, no_rex2_expr))
> + flags = add_flags(flags, "INAT_NO_REX2")
> +
> # check REX prefix
> if (match(opcode, rex_expr))
> flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
> @@ -351,6 +358,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
> lptable3[idx] = add_flags(lptable3[idx],flags)
> variant = "INAT_VARIANT"
> }
> + if (match(ext, rex2_expr))
> + table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
> if (!match(ext, lprefix_expr)){
> table[idx] = add_flags(table[idx],flags)
> }
> diff --git a/tools/arch/x86/include/asm/inat.h b/tools/arch/x86/include/asm/inat.h
> index a61051400311..2e65312cae52 100644
> --- a/tools/arch/x86/include/asm/inat.h
> +++ b/tools/arch/x86/include/asm/inat.h
> @@ -35,6 +35,8 @@
> #define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
> #define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
> #define INAT_PFX_EVEX 15 /* EVEX prefix */
> +/* x86-64 REX2 prefix */
> +#define INAT_PFX_REX2 16 /* 0xD5 */
>
> #define INAT_LSTPFX_MAX 3
> #define INAT_LGCPFX_MAX 11
> @@ -50,7 +52,7 @@
>
> /* Legacy prefix */
> #define INAT_PFX_OFFS 0
> -#define INAT_PFX_BITS 4
> +#define INAT_PFX_BITS 5
> #define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
> #define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
> /* Escape opcodes */
> @@ -77,6 +79,8 @@
> #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
> #define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
> #define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
> +#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
> +#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
> /* Attribute making macros for attribute tables */
> #define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
> #define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
> @@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
> return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
> }
>
> +static inline int inat_is_rex2_prefix(insn_attr_t attr)
> +{
> + return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
> +}
> +
> static inline int inat_last_prefix_id(insn_attr_t attr)
> {
> if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
> diff --git a/tools/arch/x86/include/asm/insn.h b/tools/arch/x86/include/asm/insn.h
> index 65c0d9ce1e29..1a7e8fc4d75a 100644
> --- a/tools/arch/x86/include/asm/insn.h
> +++ b/tools/arch/x86/include/asm/insn.h
> @@ -112,10 +112,15 @@ struct insn {
> #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
> #define X86_SIB_BASE(sib) ((sib) & 0x07)
>
> -#define X86_REX_W(rex) ((rex) & 8)
> -#define X86_REX_R(rex) ((rex) & 4)
> -#define X86_REX_X(rex) ((rex) & 2)
> -#define X86_REX_B(rex) ((rex) & 1)
> +#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
> +#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
> +#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
> +#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
> +
> +#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
> +#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
> +#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
> +#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */
>
> /* VEX bit flags */
> #define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
> @@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
> /* Instruction uses RIP-relative addressing */
> extern int insn_rip_relative(struct insn *insn);
>
> +static inline int insn_is_rex2(struct insn *insn)
> +{
> + if (!insn->prefixes.got)
> + insn_get_prefixes(insn);
> + return insn->rex_prefix.nbytes == 2;
> +}
> +
> +static inline insn_byte_t insn_rex2_m_bit(struct insn *insn)
> +{
> + return X86_REX2_M(insn->rex_prefix.bytes[1]);
> +}
> +
> static inline int insn_is_avx(struct insn *insn)
> {
> if (!insn->prefixes.got)
> diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
> index ada4b4a79dd4..f761adeb8e8c 100644
> --- a/tools/arch/x86/lib/insn.c
> +++ b/tools/arch/x86/lib/insn.c
> @@ -185,6 +185,17 @@ int insn_get_prefixes(struct insn *insn)
> if (X86_REX_W(b))
> /* REX.W overrides opnd_size */
> insn->opnd_bytes = 8;
> + } else if (inat_is_rex2_prefix(attr)) {
> + insn_set_byte(&insn->rex_prefix, 0, b);
> + b = peek_nbyte_next(insn_byte_t, insn, 1);
> + insn_set_byte(&insn->rex_prefix, 1, b);
> + insn->rex_prefix.nbytes = 2;
> + insn->next_byte += 2;
> + if (X86_REX_W(b))
> + /* REX.W overrides opnd_size */
> + insn->opnd_bytes = 8;
> + insn->rex_prefix.got = 1;
> + goto vex_end;
> }
> }
> insn->rex_prefix.got = 1;
> @@ -294,6 +305,20 @@ int insn_get_opcode(struct insn *insn)
> goto end;
> }
>
> + /* Check if there is REX2 prefix or not */
> + if (insn_is_rex2(insn)) {
> + if (insn_rex2_m_bit(insn)) {
> + /* map 1 is escape 0x0f */
> + insn_attr_t esc_attr = inat_get_opcode_attribute(0x0f);
> +
> + pfx_id = insn_last_prefix_id(insn);
> + insn->attr = inat_get_escape_attribute(op, pfx_id, esc_attr);
> + } else {
> + insn->attr = inat_get_opcode_attribute(op);
> + }
> + goto end;
> + }
> +
> insn->attr = inat_get_opcode_attribute(op);
> while (inat_is_escape(insn->attr)) {
> /* Get escaped opcode */
> diff --git a/tools/arch/x86/tools/gen-insn-attr-x86.awk b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> index af38469afd14..3f43aa7d8fef 100644
> --- a/tools/arch/x86/tools/gen-insn-attr-x86.awk
> +++ b/tools/arch/x86/tools/gen-insn-attr-x86.awk
> @@ -64,7 +64,9 @@ BEGIN {
>
> modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
> force64_expr = "\\([df]64\\)"
> - rex_expr = "^REX(\\.[XRWB]+)*"
> + rex_expr = "^((REX(\\.[XRWB]+)+)|(REX$))"
> + rex2_expr = "\\(REX2\\)"
> + no_rex2_expr = "\\(!REX2\\)"
> fpu_expr = "^ESC" # TODO
>
> lprefix1_expr = "\\((66|!F3)\\)"
> @@ -99,6 +101,7 @@ BEGIN {
> prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
> prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
> prefix_num["EVEX"] = "INAT_PFX_EVEX"
> + prefix_num["REX2"] = "INAT_PFX_REX2"
>
> clear_vars()
> }
> @@ -314,6 +317,10 @@ function convert_operands(count,opnd, i,j,imm,mod)
> if (match(ext, force64_expr))
> flags = add_flags(flags, "INAT_FORCE64")
>
> + # check REX2 not allowed
> + if (match(ext, no_rex2_expr))
> + flags = add_flags(flags, "INAT_NO_REX2")
> +
> # check REX prefix
> if (match(opcode, rex_expr))
> flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
> @@ -351,6 +358,8 @@ function convert_operands(count,opnd, i,j,imm,mod)
> lptable3[idx] = add_flags(lptable3[idx],flags)
> variant = "INAT_VARIANT"
> }
> + if (match(ext, rex2_expr))
> + table[idx] = add_flags(table[idx], "INAT_REX2_VARIANT")
> if (!match(ext, lprefix_expr)){
> table[idx] = add_flags(table[idx],flags)
> }
> --
> 2.34.1
>

2024-05-02 19:11:39

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 02/10] x86/insn: Fix PUSH instruction in x86 instruction decoder opcode map

On 2/05/24 16:41, Arnaldo Carvalho de Melo wrote:
> On Thu, May 02, 2024 at 01:58:45PM +0300, Adrian Hunter wrote:
>> The x86 instruction decoder is used not only for decoding kernel
>> instructions. It is also used by perf uprobes (user space probes) and by
>> perf tools Intel Processor Trace decoding. Consequently, it needs to
>> support instructions executed by user space also.
>
> I wonder if we should do it this way, i.e. updating both tools/ and
> kernel source code in the same patch.
>
> I think the best is to update the kernel bits, then, after that is
> merged, do the tooling part.

For objtool purposes, it might be better to do both at the same time.

>
> To avoid possible, yet unlikely, clashes in linux-next, for instance.

Always gonna find out about clashes at some point.


2024-05-03 05:10:17

by Adrian Hunter

[permalink] [raw]
Subject: Re: [PATCH 05/10] x86/insn: Add support for REX2 prefix to the instruction decoder logic

On 2/05/24 21:10, Ian Rogers wrote:
> On Thu, May 2, 2024 at 3:59 AM Adrian Hunter <[email protected]> wrote:
>>
>> Intel Advanced Performance Extensions (APX) uses a new 2-byte prefix named
>> REX2 to select extended general purpose registers (EGPRs) i.e. r16 to r31.
>>
>> The REX2 prefix is effectively an extended version of the REX prefix.
>>
>> REX2 and EVEX are also used with PUSH/POP instructions to provide a
>> Push-Pop Acceleration (PPX) hint. With PPX hints, a CPU will attempt to
>> fast-forward register data between matching PUSH and POP instructions.
>>
>> REX2 is valid only with opcodes in maps 0 and 1. Similar extension for
>> other maps is provided by the EVEX prefix, covered in a separate patch.
>>
>> Some opcodes in maps 0 and 1 are reserved under REX2. One of these is used
>> for a new 64-bit absolute direct jump instruction JMPABS.
>>
>> Refer to the Intel Advanced Performance Extensions (Intel APX) Architecture
>> Specification for details.
>>
>> Define a code value for the REX2 prefix (INAT_PFX_REX2), and add attribute
>> flags for opcodes reserved under REX2 (INAT_NO_REX2) and to identify
>> opcodes (only JMPABS) that require a mandatory REX2 prefix
>> (INAT_REX2_VARIANT).
>>
>> Amend logic to read the REX2 prefix and get the opcode attribute for the
>> map number (0 or 1) encoded in the REX2 prefix.
>>
>> Amend the awk script that generates the attribute tables from the opcode
>> map, to recognise "REX2" as attribute INAT_PFX_REX2, and "(!REX2)"
>> as attribute INAT_NO_REX2, and "(REX2)" as attribute INAT_REX2_VARIANT.
>>
>> Signed-off-by: Adrian Hunter <[email protected]>
>> ---
>> arch/x86/include/asm/inat.h | 11 +++++++++-
>> arch/x86/include/asm/insn.h | 25 ++++++++++++++++++----
>> arch/x86/lib/insn.c | 25 ++++++++++++++++++++++
>> arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
>> tools/arch/x86/include/asm/inat.h | 11 +++++++++-
>> tools/arch/x86/include/asm/insn.h | 25 ++++++++++++++++++----
>> tools/arch/x86/lib/insn.c | 25 ++++++++++++++++++++++
>> tools/arch/x86/tools/gen-insn-attr-x86.awk | 11 +++++++++-
>> 8 files changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
>> index b56c5741581a..1331bdd39a23 100644
>> --- a/arch/x86/include/asm/inat.h
>> +++ b/arch/x86/include/asm/inat.h
>> @@ -35,6 +35,8 @@
>> #define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
>> #define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */
>> #define INAT_PFX_EVEX 15 /* EVEX prefix */
>> +/* x86-64 REX2 prefix */
>> +#define INAT_PFX_REX2 16 /* 0xD5 */
>>
>> #define INAT_LSTPFX_MAX 3
>> #define INAT_LGCPFX_MAX 11
>> @@ -50,7 +52,7 @@
>>
>> /* Legacy prefix */
>> #define INAT_PFX_OFFS 0
>> -#define INAT_PFX_BITS 4
>> +#define INAT_PFX_BITS 5
>> #define INAT_PFX_MAX ((1 << INAT_PFX_BITS) - 1)
>> #define INAT_PFX_MASK (INAT_PFX_MAX << INAT_PFX_OFFS)
>> /* Escape opcodes */
>> @@ -77,6 +79,8 @@
>> #define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
>> #define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
>> #define INAT_EVEXONLY (1 << (INAT_FLAG_OFFS + 7))
>> +#define INAT_NO_REX2 (1 << (INAT_FLAG_OFFS + 8))
>> +#define INAT_REX2_VARIANT (1 << (INAT_FLAG_OFFS + 9))
>> /* Attribute making macros for attribute tables */
>> #define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
>> #define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
>> @@ -128,6 +132,11 @@ static inline int inat_is_rex_prefix(insn_attr_t attr)
>> return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
>> }
>>
>> +static inline int inat_is_rex2_prefix(insn_attr_t attr)
>> +{
>> + return (attr & INAT_PFX_MASK) == INAT_PFX_REX2;
>> +}
>> +
>> static inline int inat_last_prefix_id(insn_attr_t attr)
>> {
>> if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
>> diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
>> index 1b29f58f730f..95249ec1f24e 100644
>> --- a/arch/x86/include/asm/insn.h
>> +++ b/arch/x86/include/asm/insn.h
>> @@ -112,10 +112,15 @@ struct insn {
>> #define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
>> #define X86_SIB_BASE(sib) ((sib) & 0x07)
>>
>> -#define X86_REX_W(rex) ((rex) & 8)
>> -#define X86_REX_R(rex) ((rex) & 4)
>> -#define X86_REX_X(rex) ((rex) & 2)
>> -#define X86_REX_B(rex) ((rex) & 1)
>> +#define X86_REX2_M(rex) ((rex) & 0x80) /* REX2 M0 */
>> +#define X86_REX2_R(rex) ((rex) & 0x40) /* REX2 R4 */
>> +#define X86_REX2_X(rex) ((rex) & 0x20) /* REX2 X4 */
>> +#define X86_REX2_B(rex) ((rex) & 0x10) /* REX2 B4 */
>> +
>> +#define X86_REX_W(rex) ((rex) & 8) /* REX or REX2 W */
>> +#define X86_REX_R(rex) ((rex) & 4) /* REX or REX2 R3 */
>> +#define X86_REX_X(rex) ((rex) & 2) /* REX or REX2 X3 */
>> +#define X86_REX_B(rex) ((rex) & 1) /* REX or REX2 B3 */
>>
>> /* VEX bit flags */
>> #define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
>> @@ -161,6 +166,18 @@ static inline void insn_get_attribute(struct insn *insn)
>> /* Instruction uses RIP-relative addressing */
>> extern int insn_rip_relative(struct insn *insn);
>>
>> +static inline int insn_is_rex2(struct insn *insn)
>> +{
>> + if (!insn->prefixes.got)
>> + insn_get_prefixes(insn);
>> + return insn->rex_prefix.nbytes == 2;
>
> It'd be nice to capture that a rex2 prefix is by definition 2 bytes.
> Playing devil's advocate, if there were a REX and a REX2 prefix,
> couldn't rex_prefix.nbytes be 3? I'm wondering about other prefix
> combinations that may confuse this logic, maybe someone dreams up
> doing this for say alignment reasons like "rep ret".

REX with REX2 is not allowed.