2009-10-27 20:42:43

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Hi Ingo,

Here are bugfixes and some enhancements for the x86 insn decoder and perf-probe:
- The x86 insn decoder supports AVX and FMA.
- The perf-probe syntax has changed.
- perf-probe supports function-relative line numbers.
- Minor bugfixes.

The new perf-probe syntax is:

perf probe 'PROBE'

or

perf probe --add 'PROBE'

where PROBE is

<source>:<line-number>

or

<function>[:<rel-lineno>|+<byte-offset>|%return][@<source>]

e.g.

perf probe 'schedule:10@kernel/sched.c'

puts a probe at the 10th line from the entry line of the schedule()
function in kernel/sched.c, and

perf probe 'vmalloc%return'

puts a return probe on vmalloc().
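
To make the PROBE grammar above concrete, here is a minimal sketch of a parser for it in plain C. It is not the code in tools/perf/builtin-probe.c; struct probe_point, parse_probe_point() and the fixed buffer sizes are hypothetical names chosen for illustration. Two points are assumptions rather than something stated in this mail: a leading name containing '.' is treated as a source file (to tell `<source>:<line-number>` apart from `<function>:<rel-lineno>`), and the byte offset is accepted in decimal or hex.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for this sketch (not perf's own struct). */
struct probe_point {
	char name[128];	/* function name, or source file for SRC:LINE */
	char file[128];	/* source given after '@', empty if absent */
	long line;	/* absolute or function-relative line, -1 if absent */
	long offset;	/* byte offset from function entry, -1 if absent */
	int is_file;	/* 1 if 'name' was treated as a source file */
	int is_return;	/* 1 if "%return" was given */
};

/* Parse SRC:LINE or FUNC[:RLN|+OFF|%return][@SRC]; 0 on success, -1 on error. */
int parse_probe_point(const char *spec, struct probe_point *pp)
{
	size_t n = strcspn(spec, ":+@%");	/* length of the leading name */
	const char *p = spec + n;
	char *end;

	memset(pp, 0, sizeof(*pp));
	pp->line = -1;
	pp->offset = -1;
	if (n == 0 || n >= sizeof(pp->name))
		return -1;
	memcpy(pp->name, spec, n);
	/* Assumption: a '.' in the leading name means it is a source file. */
	pp->is_file = strchr(pp->name, '.') != NULL;

	switch (*p) {
	case ':':
	case '+':
		if (!isdigit((unsigned char)p[1]))
			return -1;
		if (*p == ':')
			pp->line = strtol(p + 1, &end, 10);
		else
			pp->offset = strtol(p + 1, &end, 0);
		p = end;
		break;
	case '%':
		if (strncmp(p, "%return", 7) != 0)
			return -1;
		pp->is_return = 1;
		p += 7;
		break;
	}

	if (*p == '@') {
		size_t flen = strlen(p + 1);

		/* '@SRC' only qualifies a function, never a file. */
		if (pp->is_file || flen == 0 || flen >= sizeof(pp->file))
			return -1;
		memcpy(pp->file, p + 1, flen);
		p += 1 + flen;
	}
	if (*p != '\0')
		return -1;
	/* SRC:LINE requires a line number and nothing else. */
	if (pp->is_file && pp->line < 0)
		return -1;
	return 0;
}
```

With this sketch, 'schedule:10@kernel/sched.c' yields name "schedule", line 10 and file "kernel/sched.c", while 'vmalloc%return' only sets is_return.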

TODO:
- Support --line option to show which lines user can probe.
- Support lazy string matching.

Thank you,

---

Masami Hiramatsu (10):
perf/probes: Support function entry relative line number
perf/probes: Change probepoint syntax of perf-probe
perf/probes: Change command-line option of perf-probe
perf/probes: Exit searching after finding target function
kprobe-tracer: Compare both of event-name and event-group to find probe
x86: Add Intel FMA instructions to x86 opcode map
x86: AVX instruction set decoder support
x86: Add pclmulq to x86 opcode map
x86: Merge INAT_REXPFX into INAT_PFX_*
x86: Fix SSE opcode map bug


arch/x86/include/asm/inat.h | 68 ++++-
arch/x86/include/asm/insn.h | 43 +++
arch/x86/lib/inat.c | 12 +
arch/x86/lib/insn.c | 54 ++++
arch/x86/lib/x86-opcode-map.txt | 464 +++++++++++++++++++---------------
arch/x86/tools/gen-insn-attr-x86.awk | 100 +++++--
kernel/trace/trace_kprobe.c | 8 -
tools/perf/builtin-probe.c | 201 +++++++++------
tools/perf/util/probe-finder.c | 93 +++++--
tools/perf/util/probe-finder.h | 4
10 files changed, 695 insertions(+), 352 deletions(-)

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: [email protected]


2009-10-27 20:42:51

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 01/10] x86: Fix SSE opcode map bug

Fix the position of superscripts in the SSE opcode map, because some
SSE opcode superscripts are not placed after the operands.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

arch/x86/lib/x86-opcode-map.txt | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 701c467..efef3ca 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -401,9 +401,9 @@ Referrer: 2-byte escape
62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66)
63: packsswb Pq,Qq | packsswb Vdq,Wdq (66)
64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66)
-65: pcmpgtw Pq,Qq | pcmpgtw(66) Vdq,Wdq
+65: pcmpgtw Pq,Qq | pcmpgtw Vdq,Wdq (66)
66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66)
-67: packuswb Pq,Qq | packuswb(66) Vdq,Wdq
+67: packuswb Pq,Qq | packuswb Vdq,Wdq (66)
68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66)
69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66)
6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66)
@@ -425,8 +425,8 @@ Referrer: 2-byte escape
79: VMWRITE Gd/q,Ed/q
7a:
7b:
-7c: haddps(F2) Vps,Wps | haddpd(66) Vpd,Wpd
-7d: hsubps(F2) Vps,Wps | hsubpd(66) Vpd,Wpd
+7c: haddps Vps,Wps (F2) | haddpd Vpd,Wpd (66)
+7d: hsubps Vps,Wps (F2) | hsubpd Vpd,Wpd (66)
7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66) | movq Vq,Wq (F3)
7f: movq Qq,Pq | movdqa Wdq,Vdq (66) | movdqu Wdq,Vdq (F3)
# 0x0f 0x80-0x8f
@@ -574,7 +574,7 @@ Referrer: 3-byte escape 1
01: phaddw Pq,Qq | phaddw Vdq,Wdq (66)
02: phaddd Pq,Qq | phaddd Vdq,Wdq (66)
03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66)
-04: pmaddubsw Pq,Qq | pmaddubsw (66)Vdq,Wdq
+04: pmaddubsw Pq,Qq | pmaddubsw Vdq,Wdq (66)
05: phsubw Pq,Qq | phsubw Vdq,Wdq (66)
06: phsubd Pq,Qq | phsubd Vdq,Wdq (66)
07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66)


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-27 20:43:04

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 02/10] x86: Merge INAT_REXPFX into INAT_PFX_*

Merge INAT_REXPFX into INAT_PFX_* macro and rename it to INAT_PFX_REX.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---
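
As a reading aid for the inat.h hunks in this patch, here is a standalone sketch of how the prefix-id field separates legacy prefixes from REX once REX is an id rather than a flag bit. The prefix ids and the three helpers are copied from the patch. The typedef and the INAT_PFX_OFFS / INAT_PFX_BITS values are assumptions made only so that the snippet compiles on its own, since they are not shown in this mail.

```c
typedef unsigned int insn_attr_t;	/* assumed width for this sketch */

/* Prefix ids as defined by this patch */
#define INAT_PFX_OPNDSZ	1	/* 0x66 */
#define INAT_PFX_REPNE	2	/* 0xF2 */
#define INAT_PFX_REPE	3	/* 0xF3 */
#define INAT_PFX_LOCK	4	/* 0xF0 */
#define INAT_PFX_ADDRSZ	11	/* 0x67 */
#define INAT_PFX_REX	12	/* 0x4X */

#define INAT_LSTPFX_MAX	3
#define INAT_LGCPFX_MAX	11

/* Assumed field layout: prefix id in the low 4 bits */
#define INAT_PFX_OFFS	0
#define INAT_PFX_BITS	4
#define INAT_PFX_MASK	(((1 << INAT_PFX_BITS) - 1) << INAT_PFX_OFFS)
#define INAT_MAKE_PREFIX(pfx)	((pfx) << INAT_PFX_OFFS)

/* Non-zero for ids 1..11 only, so the prefix loop stops at REX. */
int inat_is_legacy_prefix(insn_attr_t attr)
{
	attr &= INAT_PFX_MASK;
	return attr && attr <= INAT_LGCPFX_MAX;
}

int inat_is_rex_prefix(insn_attr_t attr)
{
	return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
}

/* Only 0x66/0xF2/0xF3 select an opcode table; others map to 0. */
int inat_last_prefix_id(insn_attr_t attr)
{
	if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
		return 0;
	else
		return attr & INAT_PFX_MASK;
}
```

The old inat_is_prefix() returned any non-zero prefix id, which would also have matched REX after the merge. That seems to be why insn_get_prefixes() switches to inat_is_legacy_prefix() in the same patch.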

arch/x86/include/asm/inat.h | 36 +++++++++++++++++++---------------
arch/x86/lib/insn.c | 2 +-
arch/x86/tools/gen-insn-attr-x86.awk | 6 +++---
3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 2866fdd..c2487d2 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -30,10 +30,11 @@
#define INAT_OPCODE_TABLE_SIZE 256
#define INAT_GROUP_TABLE_SIZE 8

-/* Legacy instruction prefixes */
+/* Legacy last prefixes */
#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */
#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */
#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */
+/* Other Legacy prefixes */
#define INAT_PFX_LOCK 4 /* 0xF0 */
#define INAT_PFX_CS 5 /* 0x2E */
#define INAT_PFX_DS 6 /* 0x3E */
@@ -42,8 +43,11 @@
#define INAT_PFX_GS 9 /* 0x65 */
#define INAT_PFX_SS 10 /* 0x36 */
#define INAT_PFX_ADDRSZ 11 /* 0x67 */
+/* x86-64 REX prefix */
+#define INAT_PFX_REX 12 /* 0x4X */

-#define INAT_LPREFIX_MAX 3
+#define INAT_LSTPFX_MAX 3
+#define INAT_LGCPFX_MAX 11

/* Immediate size */
#define INAT_IMM_BYTE 1
@@ -75,12 +79,11 @@
#define INAT_IMM_MASK (((1 << INAT_IMM_BITS) - 1) << INAT_IMM_OFFS)
/* Flags */
#define INAT_FLAG_OFFS (INAT_IMM_OFFS + INAT_IMM_BITS)
-#define INAT_REXPFX (1 << INAT_FLAG_OFFS)
-#define INAT_MODRM (1 << (INAT_FLAG_OFFS + 1))
-#define INAT_FORCE64 (1 << (INAT_FLAG_OFFS + 2))
-#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 3))
-#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 4))
-#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 5))
+#define INAT_MODRM (1 << (INAT_FLAG_OFFS))
+#define INAT_FORCE64 (1 << (INAT_FLAG_OFFS + 1))
+#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 2))
+#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 3))
+#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 4))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -97,9 +100,10 @@ extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
insn_attr_t esc_attr);

/* Attribute checking functions */
-static inline int inat_is_prefix(insn_attr_t attr)
+static inline int inat_is_legacy_prefix(insn_attr_t attr)
{
- return attr & INAT_PFX_MASK;
+ attr &= INAT_PFX_MASK;
+ return attr && attr <= INAT_LGCPFX_MAX;
}

static inline int inat_is_address_size_prefix(insn_attr_t attr)
@@ -112,9 +116,14 @@ static inline int inat_is_operand_size_prefix(insn_attr_t attr)
return (attr & INAT_PFX_MASK) == INAT_PFX_OPNDSZ;
}

+static inline int inat_is_rex_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
+}
+
static inline int inat_last_prefix_id(insn_attr_t attr)
{
- if ((attr & INAT_PFX_MASK) > INAT_LPREFIX_MAX)
+ if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
return 0;
else
return attr & INAT_PFX_MASK;
@@ -155,11 +164,6 @@ static inline int inat_immediate_size(insn_attr_t attr)
return (attr & INAT_IMM_MASK) >> INAT_IMM_OFFS;
}

-static inline int inat_is_rex_prefix(insn_attr_t attr)
-{
- return attr & INAT_REXPFX;
-}
-
static inline int inat_has_modrm(insn_attr_t attr)
{
return attr & INAT_MODRM;
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index dfd56a3..9f48317 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -69,7 +69,7 @@ void insn_get_prefixes(struct insn *insn)
lb = 0;
b = peek_next(insn_byte_t, insn);
attr = inat_get_opcode_attribute(b);
- while (inat_is_prefix(attr)) {
+ while (inat_is_legacy_prefix(attr)) {
/* Skip if same prefix */
for (i = 0; i < nb; i++)
if (prefixes->bytes[i] == b)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 19ba096..7d54929 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -278,7 +278,7 @@ function convert_operands(opnd, i,imm,mod)

# check REX prefix
if (match(opcode, rex_expr))
- flags = add_flags(flags, "INAT_REXPFX")
+ flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")

# check coprocessor escape : TODO
if (match(opcode, fpu_expr))
@@ -316,7 +316,7 @@ END {
# print escape opcode map's array
print "/* Escape opcode map array */"
print "const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1]" \
- "[INAT_LPREFIX_MAX + 1] = {"
+ "[INAT_LSTPFX_MAX + 1] = {"
for (i = 0; i < geid; i++)
for (j = 0; j < max_lprefix; j++)
if (etable[i,j])
@@ -325,7 +325,7 @@ END {
# print group opcode map's array
print "/* Group opcode map array */"
print "const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1]"\
- "[INAT_LPREFIX_MAX + 1] = {"
+ "[INAT_LSTPFX_MAX + 1] = {"
for (i = 0; i < ggid; i++)
for (j = 0; j < max_lprefix; j++)
if (gtable[i,j])


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-27 20:45:31

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 03/10] x86: Add pclmulq to x86 opcode map

Add pclmulq opcode to x86 opcode map.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

arch/x86/lib/x86-opcode-map.txt | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index efef3ca..1f41246 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -672,6 +672,7 @@ Referrer: 3-byte escape 2
40: dpps Vdq,Wdq,Ib (66)
41: dppd Vdq,Wdq,Ib (66)
42: mpsadbw Vdq,Wdq,Ib (66)
+44: pclmulq Vdq,Wdq,Ib (66)
60: pcmpestrm Vdq,Wdq,Ib (66)
61: pcmpestri Vdq,Wdq,Ib (66)
62: pcmpistrm Vdq,Wdq,Ib (66)


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-27 20:43:18

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 04/10] x86: AVX instruction set decoder support

Add Intel AVX (Advanced Vector Extensions) instruction set support to
the x86 instruction decoder. This adds an insn.vex_prefix field for
storing VEX prefixes, and introduces some decoder-specific tags in the
opcode map for expressing opcode attributes.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---
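
For readers following the insn.c and insn.h hunks, here is a userspace sketch of the VEX prefix handling, assuming a flat byte buffer instead of struct insn. The X86_VEX_* macros are copied from this patch, and X86_MODRM_MOD is written out the way insn.h defines it. struct vex_info and decode_vex() are hypothetical names, and the hard-coded 0xc4/0xc5 check stands in for the attribute-table lookup that the real decoder uses.

```c
#include <stddef.h>

typedef unsigned char insn_byte_t;

#define X86_MODRM_MOD(modrm)	(((modrm) & 0xc0) >> 6)
#define X86_VEX_W(vex)		((vex) & 0x80)	/* VEX3 Byte2 */
#define X86_VEX_L(vex)		((vex) & 0x04)	/* VEX3 Byte2, VEX2 Byte1 */
#define X86_VEX3_M(vex)		((vex) & 0x1f)	/* VEX3 Byte1 */
#define X86_VEX2_M		1		/* VEX2.M always 1 */
#define X86_VEX_P(vex)		((vex) & 0x03)	/* VEX3 Byte2, VEX2 Byte1 */

/* Hypothetical container for the decoded fields. */
struct vex_info {
	int nbytes;	/* 0 (no VEX), 2 or 3 */
	int m;		/* opcode map selector */
	int p;		/* implied last prefix: 0 none, 1 66, 2 F3, 3 F2 */
	int l;		/* 1 for 256-bit vector length */
	int w;		/* VEX.W, 3-byte form only */
};

/* Returns the number of VEX prefix bytes at buf, or 0 if there is none. */
int decode_vex(const insn_byte_t *buf, size_t len, int x86_64,
	       struct vex_info *v)
{
	*v = (struct vex_info){0};
	if (len < 2 || (buf[0] != 0xc4 && buf[0] != 0xc5))
		return 0;
	/* In 32-bit mode, mod != 11b means this is LES or LDS. */
	if (!x86_64 && X86_MODRM_MOD(buf[1]) != 3)
		return 0;
	if (buf[0] == 0xc5) {		/* 2-byte VEX */
		v->nbytes = 2;
		v->m = X86_VEX2_M;
		v->p = X86_VEX_P(buf[1]);
		v->l = !!X86_VEX_L(buf[1]);
	} else {			/* 3-byte VEX */
		if (len < 3)
			return 0;
		v->nbytes = 3;
		v->m = X86_VEX3_M(buf[1]);
		v->p = X86_VEX_P(buf[2]);
		v->l = !!X86_VEX_L(buf[2]);
		v->w = !!X86_VEX_W(buf[2]);
	}
	return v->nbytes;
}
```

For example, the bytes c5 f8 77 (vzeroupper) decode as a 2-byte VEX with m = 1 and p = 0. Because the hardware pp field encodes F3 as 2 and F2 as 3, the swap of INAT_PFX_REPE and INAT_PFX_REPNE in this patch appears to let the pp value be used directly as a last-prefix id.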

arch/x86/include/asm/inat.h | 32 ++-
arch/x86/include/asm/insn.h | 43 +++
arch/x86/lib/inat.c | 12 +
arch/x86/lib/insn.c | 52 ++++
arch/x86/lib/x86-opcode-map.txt | 431 ++++++++++++++++++----------------
arch/x86/tools/gen-insn-attr-x86.awk | 94 ++++++-
6 files changed, 431 insertions(+), 233 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index c2487d2..205b063 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -32,8 +32,8 @@

/* Legacy last prefixes */
#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */
-#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */
-#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_REPE 2 /* 0xF3 */ /* LPFX2 */
+#define INAT_PFX_REPNE 3 /* 0xF2 */ /* LPFX3 */
/* Other Legacy prefixes */
#define INAT_PFX_LOCK 4 /* 0xF0 */
#define INAT_PFX_CS 5 /* 0x2E */
@@ -45,6 +45,9 @@
#define INAT_PFX_ADDRSZ 11 /* 0x67 */
/* x86-64 REX prefix */
#define INAT_PFX_REX 12 /* 0x4X */
+/* AVX VEX prefixes */
+#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
+#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */

#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
@@ -84,6 +87,8 @@
#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 2))
#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 3))
#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 4))
+#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
+#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -98,6 +103,9 @@ extern insn_attr_t inat_get_escape_attribute(insn_byte_t opcode,
extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
insn_byte_t last_pfx,
insn_attr_t esc_attr);
+extern insn_attr_t inat_get_avx_attribute(insn_byte_t opcode,
+ insn_byte_t vex_m,
+ insn_byte_t vex_pp);

/* Attribute checking functions */
static inline int inat_is_legacy_prefix(insn_attr_t attr)
@@ -129,6 +137,17 @@ static inline int inat_last_prefix_id(insn_attr_t attr)
return attr & INAT_PFX_MASK;
}

+static inline int inat_is_vex_prefix(insn_attr_t attr)
+{
+ attr &= INAT_PFX_MASK;
+ return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
+}
+
+static inline int inat_is_vex3_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_VEX3;
+}
+
static inline int inat_is_escape(insn_attr_t attr)
{
return attr & INAT_ESC_MASK;
@@ -189,4 +208,13 @@ static inline int inat_has_variant(insn_attr_t attr)
return attr & INAT_VARIANT;
}

+static inline int inat_accept_vex(insn_attr_t attr)
+{
+ return attr & INAT_VEXOK;
+}
+
+static inline int inat_must_vex(insn_attr_t attr)
+{
+ return attr & INAT_VEXONLY;
+}
#endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 12b4e37..96c2e0a 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -39,6 +39,7 @@ struct insn {
* prefixes.bytes[3]: last prefix
*/
struct insn_field rex_prefix; /* REX prefix */
+ struct insn_field vex_prefix; /* VEX prefix */
struct insn_field opcode; /*
* opcode.bytes[0]: opcode1
* opcode.bytes[1]: opcode2
@@ -80,6 +81,19 @@ struct insn {
#define X86_REX_X(rex) ((rex) & 2)
#define X86_REX_B(rex) ((rex) & 1)

+/* VEX bit flags */
+#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
+#define X86_VEX_R(vex) ((vex) & 0x80) /* VEX2/3 Byte1 */
+#define X86_VEX_X(vex) ((vex) & 0x40) /* VEX3 Byte1 */
+#define X86_VEX_B(vex) ((vex) & 0x20) /* VEX3 Byte1 */
+#define X86_VEX_L(vex) ((vex) & 0x04) /* VEX3 Byte2, VEX2 Byte1 */
+/* VEX bit fields */
+#define X86_VEX3_M(vex) ((vex) & 0x1f) /* VEX3 Byte1 */
+#define X86_VEX2_M 1 /* VEX2.M always 1 */
+#define X86_VEX_V(vex) (((vex) & 0x78) >> 3) /* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_P(vex) ((vex) & 0x03) /* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_M_MAX 0x1f /* VEX3.M Maximum value */
+
/* The last prefix is needed for two-byte and three-byte opcodes */
static inline insn_byte_t insn_last_prefix(struct insn *insn)
{
@@ -114,15 +128,42 @@ static inline void kernel_insn_init(struct insn *insn, const void *kaddr)
#endif
}

+static inline int insn_is_avx(struct insn *insn)
+{
+ if (!insn->prefixes.got)
+ insn_get_prefixes(insn);
+ return (insn->vex_prefix.value != 0);
+}
+
+static inline insn_byte_t insn_vex_m_bits(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */
+ return X86_VEX2_M;
+ else
+ return X86_VEX3_M(insn->vex_prefix.bytes[1]);
+}
+
+static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */
+ return X86_VEX_P(insn->vex_prefix.bytes[1]);
+ else
+ return X86_VEX_P(insn->vex_prefix.bytes[2]);
+}
+
/* Offset of each field from kaddr */
static inline int insn_offset_rex_prefix(struct insn *insn)
{
return insn->prefixes.nbytes;
}
-static inline int insn_offset_opcode(struct insn *insn)
+static inline int insn_offset_vex_prefix(struct insn *insn)
{
return insn_offset_rex_prefix(insn) + insn->rex_prefix.nbytes;
}
+static inline int insn_offset_opcode(struct insn *insn)
+{
+ return insn_offset_vex_prefix(insn) + insn->vex_prefix.nbytes;
+}
static inline int insn_offset_modrm(struct insn *insn)
{
return insn_offset_opcode(insn) + insn->opcode.nbytes;
diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c
index 3fb5998..46fc4ee 100644
--- a/arch/x86/lib/inat.c
+++ b/arch/x86/lib/inat.c
@@ -76,3 +76,15 @@ insn_attr_t inat_get_group_attribute(insn_byte_t modrm, insn_byte_t last_pfx,
inat_group_common_attribute(grp_attr);
}

+insn_attr_t inat_get_avx_attribute(insn_byte_t opcode, insn_byte_t vex_m,
+ insn_byte_t vex_p)
+{
+ const insn_attr_t *table;
+ if (vex_m > X86_VEX_M_MAX || vex_p > INAT_LSTPFX_MAX)
+ return 0;
+ table = inat_avx_tables[vex_m][vex_p];
+ if (!table)
+ return 0;
+ return table[opcode];
+}
+
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 9f48317..9f33b98 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -28,6 +28,9 @@
#define peek_next(t, insn) \
({t r; r = *(t*)insn->next_byte; r; })

+#define peek_nbyte_next(t, insn, n) \
+ ({t r; r = *(t*)((insn)->next_byte + n); r; })
+
/**
* insn_init() - initialize struct insn
* @insn: &struct insn to be initialized
@@ -107,6 +110,7 @@ found:
insn->prefixes.bytes[3] = lb;
}

+ /* Decode REX prefix */
if (insn->x86_64) {
b = peek_next(insn_byte_t, insn);
attr = inat_get_opcode_attribute(b);
@@ -120,6 +124,39 @@ found:
}
}
insn->rex_prefix.got = 1;
+
+ /* Decode VEX prefix */
+ b = peek_next(insn_byte_t, insn);
+ attr = inat_get_opcode_attribute(b);
+ if (inat_is_vex_prefix(attr)) {
+ insn_byte_t b2 = peek_nbyte_next(insn_byte_t, insn, 1);
+ if (!insn->x86_64) {
+ /*
+ * In 32-bits mode, if the [7:6] bits (mod bits of
+ * ModRM) on the second byte are not 11b, it is
+ * LDS or LES.
+ */
+ if (X86_MODRM_MOD(b2) != 3)
+ goto vex_end;
+ }
+ insn->vex_prefix.bytes[0] = b;
+ insn->vex_prefix.bytes[1] = b2;
+ if (inat_is_vex3_prefix(attr)) {
+ b2 = peek_nbyte_next(insn_byte_t, insn, 2);
+ insn->vex_prefix.bytes[2] = b2;
+ insn->vex_prefix.nbytes = 3;
+ insn->next_byte += 3;
+ if (insn->x86_64 && X86_VEX_W(b2))
+ /* VEX.W overrides opnd_size */
+ insn->opnd_bytes = 8;
+ } else {
+ insn->vex_prefix.nbytes = 2;
+ insn->next_byte += 2;
+ }
+ }
+vex_end:
+ insn->vex_prefix.got = 1;
+
prefixes->got = 1;
return;
}
@@ -147,6 +184,18 @@ void insn_get_opcode(struct insn *insn)
op = get_next(insn_byte_t, insn);
opcode->bytes[0] = op;
opcode->nbytes = 1;
+
+ /* Check if there is VEX prefix or not */
+ if (insn_is_avx(insn)) {
+ insn_byte_t m, p;
+ m = insn_vex_m_bits(insn);
+ p = insn_vex_p_bits(insn);
+ insn->attr = inat_get_avx_attribute(op, m, p);
+ if (!inat_accept_vex(insn->attr))
+ insn->attr = 0; /* This instruction is bad */
+ goto end; /* VEX has only 1 byte for opcode */
+ }
+
insn->attr = inat_get_opcode_attribute(op);
while (inat_is_escape(insn->attr)) {
/* Get escaped opcode */
@@ -155,6 +204,9 @@ void insn_get_opcode(struct insn *insn)
pfx = insn_last_prefix(insn);
insn->attr = inat_get_escape_attribute(op, pfx, insn->attr);
}
+ if (inat_must_vex(insn->attr))
+ insn->attr = 0; /* This instruction is bad */
+end:
opcode->got = 1;
}

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1f41246..9887bfe 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -3,6 +3,7 @@
#<Opcode maps>
# Table: table-name
# Referrer: escaped-name
+# AVXcode: avx-code
# opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
# (or)
# opcode: escape # escaped-name
@@ -13,9 +14,16 @@
# reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
# EndTable
#
+# AVX Superscripts
+# (VEX): this opcode can accept VEX prefix.
+# (oVEX): this opcode requires VEX prefix.
+# (o128): this opcode only supports 128bit VEX.
+# (o256): this opcode only supports 256bit VEX.
+#

Table: one byte opcode
Referrer:
+AVXcode:
# 0x00 - 0x0f
00: ADD Eb,Gb
01: ADD Ev,Gv
@@ -225,8 +233,8 @@ c0: Grp2 Eb,Ib (1A)
c1: Grp2 Ev,Ib (1A)
c2: RETN Iw (f64)
c3: RETN
-c4: LES Gz,Mp (i64)
-c5: LDS Gz,Mp (i64)
+c4: LES Gz,Mp (i64) | 3bytes-VEX (Prefix)
+c5: LDS Gz,Mp (i64) | 2bytes-VEX (Prefix)
c6: Grp11 Eb,Ib (1A)
c7: Grp11 Ev,Iz (1A)
c8: ENTER Iw,Ib
@@ -290,8 +298,9 @@ fe: Grp4 (1A)
ff: Grp5 (1A)
EndTable

-Table: 2-byte opcode # First Byte is 0x0f
+Table: 2-byte opcode (0x0f)
Referrer: 2-byte escape
+AVXcode: 1
# 0x0f 0x00-0x0f
00: Grp6 (1A)
01: Grp7 (1A)
@@ -311,14 +320,14 @@ Referrer: 2-byte escape
# 3DNow! uses the last imm byte as opcode extension.
0f: 3DNow! Pq,Qq,Ib
# 0x0f 0x10-0x1f
-10: movups Vps,Wps | movss Vss,Wss (F3) | movupd Vpd,Wpd (66) | movsd Vsd,Wsd (F2)
-11: movups Wps,Vps | movss Wss,Vss (F3) | movupd Wpd,Vpd (66) | movsd Wsd,Vsd (F2)
-12: movlps Vq,Mq | movlpd Vq,Mq (66) | movhlps Vq,Uq | movddup Vq,Wq (F2) | movsldup Vq,Wq (F3)
-13: mpvlps Mq,Vq | movlpd Mq,Vq (66)
-14: unpcklps Vps,Wq | unpcklpd Vpd,Wq (66)
-15: unpckhps Vps,Wq | unpckhpd Vpd,Wq (66)
-16: movhps Vq,Mq | movhpd Vq,Mq (66) | movlsps Vq,Uq | movshdup Vq,Wq (F3)
-17: movhps Mq,Vq | movhpd Mq,Vq (66)
+10: movups Vps,Wps (VEX) | movss Vss,Wss (F3),(VEX),(o128) | movupd Vpd,Wpd (66),(VEX) | movsd Vsd,Wsd (F2),(VEX),(o128)
+11: movups Wps,Vps (VEX) | movss Wss,Vss (F3),(VEX),(o128) | movupd Wpd,Vpd (66),(VEX) | movsd Wsd,Vsd (F2),(VEX),(o128)
+12: movlps Vq,Mq (VEX),(o128) | movlpd Vq,Mq (66),(VEX),(o128) | movhlps Vq,Uq (VEX),(o128) | movddup Vq,Wq (F2),(VEX) | movsldup Vq,Wq (F3),(VEX)
+13: mpvlps Mq,Vq (VEX),(o128) | movlpd Mq,Vq (66),(VEX),(o128)
+14: unpcklps Vps,Wq (VEX) | unpcklpd Vpd,Wq (66),(VEX)
+15: unpckhps Vps,Wq (VEX) | unpckhpd Vpd,Wq (66),(VEX)
+16: movhps Vq,Mq (VEX),(o128) | movhpd Vq,Mq (66),(VEX),(o128) | movlsps Vq,Uq (VEX),(o128) | movshdup Vq,Wq (F3),(VEX)
+17: movhps Mq,Vq (VEX),(o128) | movhpd Mq,Vq (66),(VEX),(o128)
18: Grp16 (1A)
19:
1a:
@@ -336,14 +345,14 @@ Referrer: 2-byte escape
25:
26:
27:
-28: movaps Vps,Wps | movapd Vpd,Wpd (66)
-29: movaps Wps,Vps | movapd Wpd,Vpd (66)
-2a: cvtpi2ps Vps,Qpi | cvtsi2ss Vss,Ed/q (F3) | cvtpi2pd Vpd,Qpi (66) | cvtsi2sd Vsd,Ed/q (F2)
-2b: movntps Mps,Vps | movntpd Mpd,Vpd (66)
-2c: cvttps2pi Ppi,Wps | cvttss2si Gd/q,Wss (F3) | cvttpd2pi Ppi,Wpd (66) | cvttsd2si Gd/q,Wsd (F2)
-2d: cvtps2pi Ppi,Wps | cvtss2si Gd/q,Wss (F3) | cvtpd2pi Qpi,Wpd (66) | cvtsd2si Gd/q,Wsd (F2)
-2e: ucomiss Vss,Wss | ucomisd Vsd,Wsd (66)
-2f: comiss Vss,Wss | comisd Vsd,Wsd (66)
+28: movaps Vps,Wps (VEX) | movapd Vpd,Wpd (66),(VEX)
+29: movaps Wps,Vps (VEX) | movapd Wpd,Vpd (66),(VEX)
+2a: cvtpi2ps Vps,Qpi | cvtsi2ss Vss,Ed/q (F3),(VEX),(o128) | cvtpi2pd Vpd,Qpi (66) | cvtsi2sd Vsd,Ed/q (F2),(VEX),(o128)
+2b: movntps Mps,Vps (VEX) | movntpd Mpd,Vpd (66),(VEX)
+2c: cvttps2pi Ppi,Wps | cvttss2si Gd/q,Wss (F3),(VEX),(o128) | cvttpd2pi Ppi,Wpd (66) | cvttsd2si Gd/q,Wsd (F2),(VEX),(o128)
+2d: cvtps2pi Ppi,Wps | cvtss2si Gd/q,Wss (F3),(VEX),(o128) | cvtpd2pi Qpi,Wpd (66) | cvtsd2si Gd/q,Wsd (F2),(VEX),(o128)
+2e: ucomiss Vss,Wss (VEX),(o128) | ucomisd Vsd,Wsd (66),(VEX),(o128)
+2f: comiss Vss,Wss (VEX),(o128) | comisd Vsd,Wsd (66),(VEX),(o128)
# 0x0f 0x30-0x3f
30: WRMSR
31: RDTSC
@@ -379,56 +388,56 @@ Referrer: 2-byte escape
4e: CMOVLE/NG Gv,Ev
4f: CMOVNLE/G Gv,Ev
# 0x0f 0x50-0x5f
-50: movmskps Gd/q,Ups | movmskpd Gd/q,Upd (66)
-51: sqrtps Vps,Wps | sqrtss Vss,Wss (F3) | sqrtpd Vpd,Wpd (66) | sqrtsd Vsd,Wsd (F2)
-52: rsqrtps Vps,Wps | rsqrtss Vss,Wss (F3)
-53: rcpps Vps,Wps | rcpss Vss,Wss (F3)
-54: andps Vps,Wps | andpd Vpd,Wpd (66)
-55: andnps Vps,Wps | andnpd Vpd,Wpd (66)
-56: orps Vps,Wps | orpd Vpd,Wpd (66)
-57: xorps Vps,Wps | xorpd Vpd,Wpd (66)
-58: addps Vps,Wps | addss Vss,Wss (F3) | addpd Vpd,Wpd (66) | addsd Vsd,Wsd (F2)
-59: mulps Vps,Wps | mulss Vss,Wss (F3) | mulpd Vpd,Wpd (66) | mulsd Vsd,Wsd (F2)
-5a: cvtps2pd Vpd,Wps | cvtss2sd Vsd,Wss (F3) | cvtpd2ps Vps,Wpd (66) | cvtsd2ss Vsd,Wsd (F2)
-5b: cvtdq2ps Vps,Wdq | cvtps2dq Vdq,Wps (66) | cvttps2dq Vdq,Wps (F3)
-5c: subps Vps,Wps | subss Vss,Wss (F3) | subpd Vpd,Wpd (66) | subsd Vsd,Wsd (F2)
-5d: minps Vps,Wps | minss Vss,Wss (F3) | minpd Vpd,Wpd (66) | minsd Vsd,Wsd (F2)
-5e: divps Vps,Wps | divss Vss,Wss (F3) | divpd Vpd,Wpd (66) | divsd Vsd,Wsd (F2)
-5f: maxps Vps,Wps | maxss Vss,Wss (F3) | maxpd Vpd,Wpd (66) | maxsd Vsd,Wsd (F2)
+50: movmskps Gd/q,Ups (VEX) | movmskpd Gd/q,Upd (66),(VEX)
+51: sqrtps Vps,Wps (VEX) | sqrtss Vss,Wss (F3),(VEX),(o128) | sqrtpd Vpd,Wpd (66),(VEX) | sqrtsd Vsd,Wsd (F2),(VEX),(o128)
+52: rsqrtps Vps,Wps (VEX) | rsqrtss Vss,Wss (F3),(VEX),(o128)
+53: rcpps Vps,Wps (VEX) | rcpss Vss,Wss (F3),(VEX),(o128)
+54: andps Vps,Wps (VEX) | andpd Vpd,Wpd (66),(VEX)
+55: andnps Vps,Wps (VEX) | andnpd Vpd,Wpd (66),(VEX)
+56: orps Vps,Wps (VEX) | orpd Vpd,Wpd (66),(VEX)
+57: xorps Vps,Wps (VEX) | xorpd Vpd,Wpd (66),(VEX)
+58: addps Vps,Wps (VEX) | addss Vss,Wss (F3),(VEX),(o128) | addpd Vpd,Wpd (66),(VEX) | addsd Vsd,Wsd (F2),(VEX),(o128)
+59: mulps Vps,Wps (VEX) | mulss Vss,Wss (F3),(VEX),(o128) | mulpd Vpd,Wpd (66),(VEX) | mulsd Vsd,Wsd (F2),(VEX),(o128)
+5a: cvtps2pd Vpd,Wps (VEX) | cvtss2sd Vsd,Wss (F3),(VEX),(o128) | cvtpd2ps Vps,Wpd (66),(VEX) | cvtsd2ss Vsd,Wsd (F2),(VEX),(o128)
+5b: cvtdq2ps Vps,Wdq (VEX) | cvtps2dq Vdq,Wps (66),(VEX) | cvttps2dq Vdq,Wps (F3),(VEX)
+5c: subps Vps,Wps (VEX) | subss Vss,Wss (F3),(VEX),(o128) | subpd Vpd,Wpd (66),(VEX) | subsd Vsd,Wsd (F2),(VEX),(o128)
+5d: minps Vps,Wps (VEX) | minss Vss,Wss (F3),(VEX),(o128) | minpd Vpd,Wpd (66),(VEX) | minsd Vsd,Wsd (F2),(VEX),(o128)
+5e: divps Vps,Wps (VEX) | divss Vss,Wss (F3),(VEX),(o128) | divpd Vpd,Wpd (66),(VEX) | divsd Vsd,Wsd (F2),(VEX),(o128)
+5f: maxps Vps,Wps (VEX) | maxss Vss,Wss (F3),(VEX),(o128) | maxpd Vpd,Wpd (66),(VEX) | maxsd Vsd,Wsd (F2),(VEX),(o128)
# 0x0f 0x60-0x6f
-60: punpcklbw Pq,Qd | punpcklbw Vdq,Wdq (66)
-61: punpcklwd Pq,Qd | punpcklwd Vdq,Wdq (66)
-62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66)
-63: packsswb Pq,Qq | packsswb Vdq,Wdq (66)
-64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66)
-65: pcmpgtw Pq,Qq | pcmpgtw Vdq,Wdq (66)
-66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66)
-67: packuswb Pq,Qq | packuswb Vdq,Wdq (66)
-68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66)
-69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66)
-6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66)
-6b: packssdw Pq,Qd | packssdw Vdq,Wdq (66)
-6c: punpcklqdq Vdq,Wdq (66)
-6d: punpckhqdq Vdq,Wdq (66)
-6e: movd/q/ Pd,Ed/q | movd/q Vdq,Ed/q (66)
-6f: movq Pq,Qq | movdqa Vdq,Wdq (66) | movdqu Vdq,Wdq (F3)
+60: punpcklbw Pq,Qd | punpcklbw Vdq,Wdq (66),(VEX),(o128)
+61: punpcklwd Pq,Qd | punpcklwd Vdq,Wdq (66),(VEX),(o128)
+62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66),(VEX),(o128)
+63: packsswb Pq,Qq | packsswb Vdq,Wdq (66),(VEX),(o128)
+64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66),(VEX),(o128)
+65: pcmpgtw Pq,Qq | pcmpgtw Vdq,Wdq (66),(VEX),(o128)
+66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66),(VEX),(o128)
+67: packuswb Pq,Qq | packuswb Vdq,Wdq (66),(VEX),(o128)
+68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66),(VEX),(o128)
+69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66),(VEX),(o128)
+6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66),(VEX),(o128)
+6b: packssdw Pq,Qd | packssdw Vdq,Wdq (66),(VEX),(o128)
+6c: punpcklqdq Vdq,Wdq (66),(VEX),(o128)
+6d: punpckhqdq Vdq,Wdq (66),(VEX),(o128)
+6e: movd/q/ Pd,Ed/q | movd/q Vdq,Ed/q (66),(VEX),(o128)
+6f: movq Pq,Qq | movdqa Vdq,Wdq (66),(VEX) | movdqu Vdq,Wdq (F3),(VEX)
# 0x0f 0x70-0x7f
-70: pshufw Pq,Qq,Ib | pshufd Vdq,Wdq,Ib (66) | pshufhw Vdq,Wdq,Ib (F3) | pshuflw VdqWdq,Ib (F2)
+70: pshufw Pq,Qq,Ib | pshufd Vdq,Wdq,Ib (66),(VEX),(o128) | pshufhw Vdq,Wdq,Ib (F3),(VEX),(o128) | pshuflw VdqWdq,Ib (F2),(VEX),(o128)
71: Grp12 (1A)
72: Grp13 (1A)
73: Grp14 (1A)
-74: pcmpeqb Pq,Qq | pcmpeqb Vdq,Wdq (66)
-75: pcmpeqw Pq,Qq | pcmpeqw Vdq,Wdq (66)
-76: pcmpeqd Pq,Qq | pcmpeqd Vdq,Wdq (66)
-77: emms
+74: pcmpeqb Pq,Qq | pcmpeqb Vdq,Wdq (66),(VEX),(o128)
+75: pcmpeqw Pq,Qq | pcmpeqw Vdq,Wdq (66),(VEX),(o128)
+76: pcmpeqd Pq,Qq | pcmpeqd Vdq,Wdq (66),(VEX),(o128)
+77: emms/vzeroupper/vzeroall (VEX)
78: VMREAD Ed/q,Gd/q
79: VMWRITE Gd/q,Ed/q
7a:
7b:
-7c: haddps Vps,Wps (F2) | haddpd Vpd,Wpd (66)
-7d: hsubps Vps,Wps (F2) | hsubpd Vpd,Wpd (66)
-7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66) | movq Vq,Wq (F3)
-7f: movq Qq,Pq | movdqa Wdq,Vdq (66) | movdqu Wdq,Vdq (F3)
+7c: haddps Vps,Wps (F2),(VEX) | haddpd Vpd,Wpd (66),(VEX)
+7d: hsubps Vps,Wps (F2),(VEX) | hsubpd Vpd,Wpd (66),(VEX)
+7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66),(VEX),(o128) | movq Vq,Wq (F3),(VEX),(o128)
+7f: movq Qq,Pq | movdqa Wdq,Vdq (66),(VEX) | movdqu Wdq,Vdq (F3),(VEX)
# 0x0f 0x80-0x8f
80: JO Jz (f64)
81: JNO Jz (f64)
@@ -500,11 +509,11 @@ bf: MOVSX Gv,Ew
# 0x0f 0xc0-0xcf
c0: XADD Eb,Gb
c1: XADD Ev,Gv
-c2: cmpps Vps,Wps,Ib | cmpss Vss,Wss,Ib (F3) | cmppd Vpd,Wpd,Ib (66) | cmpsd Vsd,Wsd,Ib (F2)
+c2: cmpps Vps,Wps,Ib (VEX) | cmpss Vss,Wss,Ib (F3),(VEX),(o128) | cmppd Vpd,Wpd,Ib (66),(VEX) | cmpsd Vsd,Wsd,Ib (F2),(VEX)
c3: movnti Md/q,Gd/q
-c4: pinsrw Pq,Rd/q/Mw,Ib | pinsrw Vdq,Rd/q/Mw,Ib (66)
-c5: pextrw Gd,Nq,Ib | pextrw Gd,Udq,Ib (66)
-c6: shufps Vps,Wps,Ib | shufpd Vpd,Wpd,Ib (66)
+c4: pinsrw Pq,Rd/q/Mw,Ib | pinsrw Vdq,Rd/q/Mw,Ib (66),(VEX),(o128)
+c5: pextrw Gd,Nq,Ib | pextrw Gd,Udq,Ib (66),(VEX),(o128)
+c6: shufps Vps,Wps,Ib (VEX) | shufpd Vpd,Wpd,Ib (66),(VEX)
c7: Grp9 (1A)
c8: BSWAP RAX/EAX/R8/R8D
c9: BSWAP RCX/ECX/R9/R9D
@@ -515,77 +524,78 @@ cd: BSWAP RBP/EBP/R13/R13D
ce: BSWAP RSI/ESI/R14/R14D
cf: BSWAP RDI/EDI/R15/R15D
# 0x0f 0xd0-0xdf
-d0: addsubps Vps,Wps (F2) | addsubpd Vpd,Wpd (66)
-d1: psrlw Pq,Qq | psrlw Vdq,Wdq (66)
-d2: psrld Pq,Qq | psrld Vdq,Wdq (66)
-d3: psrlq Pq,Qq | psrlq Vdq,Wdq (66)
-d4: paddq Pq,Qq | paddq Vdq,Wdq (66)
-d5: pmullw Pq,Qq | pmullw Vdq,Wdq (66)
-d6: movq Wq,Vq (66) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
-d7: pmovmskb Gd,Nq | pmovmskb Gd,Udq (66)
-d8: psubusb Pq,Qq | psubusb Vdq,Wdq (66)
-d9: psubusw Pq,Qq | psubusw Vdq,Wdq (66)
-da: pminub Pq,Qq | pminub Vdq,Wdq (66)
-db: pand Pq,Qq | pand Vdq,Wdq (66)
-dc: paddusb Pq,Qq | paddusb Vdq,Wdq (66)
-dd: paddusw Pq,Qq | paddusw Vdq,Wdq (66)
-de: pmaxub Pq,Qq | pmaxub Vdq,Wdq (66)
-df: pandn Pq,Qq | pandn Vdq,Wdq (66)
+d0: addsubps Vps,Wps (F2),(VEX) | addsubpd Vpd,Wpd (66),(VEX)
+d1: psrlw Pq,Qq | psrlw Vdq,Wdq (66),(VEX),(o128)
+d2: psrld Pq,Qq | psrld Vdq,Wdq (66),(VEX),(o128)
+d3: psrlq Pq,Qq | psrlq Vdq,Wdq (66),(VEX),(o128)
+d4: paddq Pq,Qq | paddq Vdq,Wdq (66),(VEX),(o128)
+d5: pmullw Pq,Qq | pmullw Vdq,Wdq (66),(VEX),(o128)
+d6: movq Wq,Vq (66),(VEX),(o128) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
+d7: pmovmskb Gd,Nq | pmovmskb Gd,Udq (66),(VEX),(o128)
+d8: psubusb Pq,Qq | psubusb Vdq,Wdq (66),(VEX),(o128)
+d9: psubusw Pq,Qq | psubusw Vdq,Wdq (66),(VEX),(o128)
+da: pminub Pq,Qq | pminub Vdq,Wdq (66),(VEX),(o128)
+db: pand Pq,Qq | pand Vdq,Wdq (66),(VEX),(o128)
+dc: paddusb Pq,Qq | paddusb Vdq,Wdq (66),(VEX),(o128)
+dd: paddusw Pq,Qq | paddusw Vdq,Wdq (66),(VEX),(o128)
+de: pmaxub Pq,Qq | pmaxub Vdq,Wdq (66),(VEX),(o128)
+df: pandn Pq,Qq | pandn Vdq,Wdq (66),(VEX),(o128)
# 0x0f 0xe0-0xef
-e0: pavgb Pq,Qq | pavgb Vdq,Wdq (66)
-e1: psraw Pq,Qq | psraw Vdq,Wdq (66)
-e2: psrad Pq,Qq | psrad Vdq,Wdq (66)
-e3: pavgw Pq,Qq | pavgw Vdq,Wdq (66)
-e4: pmulhuw Pq,Qq | pmulhuw Vdq,Wdq (66)
-e5: pmulhw Pq,Qq | pmulhw Vdq,Wdq (66)
-e6: cvtpd2dq Vdq,Wpd (F2) | cvttpd2dq Vdq,Wpd (66) | cvtdq2pd Vpd,Wdq (F3)
-e7: movntq Mq,Pq | movntdq Mdq,Vdq (66)
-e8: psubsb Pq,Qq | psubsb Vdq,Wdq (66)
-e9: psubsw Pq,Qq | psubsw Vdq,Wdq (66)
-ea: pminsw Pq,Qq | pminsw Vdq,Wdq (66)
-eb: por Pq,Qq | por Vdq,Wdq (66)
-ec: paddsb Pq,Qq | paddsb Vdq,Wdq (66)
-ed: paddsw Pq,Qq | paddsw Vdq,Wdq (66)
-ee: pmaxsw Pq,Qq | pmaxsw Vdq,Wdq (66)
-ef: pxor Pq,Qq | pxor Vdq,Wdq (66)
+e0: pavgb Pq,Qq | pavgb Vdq,Wdq (66),(VEX),(o128)
+e1: psraw Pq,Qq | psraw Vdq,Wdq (66),(VEX),(o128)
+e2: psrad Pq,Qq | psrad Vdq,Wdq (66),(VEX),(o128)
+e3: pavgw Pq,Qq | pavgw Vdq,Wdq (66),(VEX),(o128)
+e4: pmulhuw Pq,Qq | pmulhuw Vdq,Wdq (66),(VEX),(o128)
+e5: pmulhw Pq,Qq | pmulhw Vdq,Wdq (66),(VEX),(o128)
+e6: cvtpd2dq Vdq,Wpd (F2),(VEX) | cvttpd2dq Vdq,Wpd (66),(VEX) | cvtdq2pd Vpd,Wdq (F3),(VEX)
+e7: movntq Mq,Pq | movntdq Mdq,Vdq (66),(VEX)
+e8: psubsb Pq,Qq | psubsb Vdq,Wdq (66),(VEX),(o128)
+e9: psubsw Pq,Qq | psubsw Vdq,Wdq (66),(VEX),(o128)
+ea: pminsw Pq,Qq | pminsw Vdq,Wdq (66),(VEX),(o128)
+eb: por Pq,Qq | por Vdq,Wdq (66),(VEX),(o128)
+ec: paddsb Pq,Qq | paddsb Vdq,Wdq (66),(VEX),(o128)
+ed: paddsw Pq,Qq | paddsw Vdq,Wdq (66),(VEX),(o128)
+ee: pmaxsw Pq,Qq | pmaxsw Vdq,Wdq (66),(VEX),(o128)
+ef: pxor Pq,Qq | pxor Vdq,Wdq (66),(VEX),(o128)
# 0x0f 0xf0-0xff
-f0: lddqu Vdq,Mdq (F2)
-f1: psllw Pq,Qq | psllw Vdq,Wdq (66)
-f2: pslld Pq,Qq | pslld Vdq,Wdq (66)
-f3: psllq Pq,Qq | psllq Vdq,Wdq (66)
-f4: pmuludq Pq,Qq | pmuludq Vdq,Wdq (66)
-f5: pmaddwd Pq,Qq | pmaddwd Vdq,Wdq (66)
-f6: psadbw Pq,Qq | psadbw Vdq,Wdq (66)
-f7: maskmovq Pq,Nq | maskmovdqu Vdq,Udq (66)
-f8: psubb Pq,Qq | psubb Vdq,Wdq (66)
-f9: psubw Pq,Qq | psubw Vdq,Wdq (66)
-fa: psubd Pq,Qq | psubd Vdq,Wdq (66)
-fb: psubq Pq,Qq | psubq Vdq,Wdq (66)
-fc: paddb Pq,Qq | paddb Vdq,Wdq (66)
-fd: paddw Pq,Qq | paddw Vdq,Wdq (66)
-fe: paddd Pq,Qq | paddd Vdq,Wdq (66)
+f0: lddqu Vdq,Mdq (F2),(VEX)
+f1: psllw Pq,Qq | psllw Vdq,Wdq (66),(VEX),(o128)
+f2: pslld Pq,Qq | pslld Vdq,Wdq (66),(VEX),(o128)
+f3: psllq Pq,Qq | psllq Vdq,Wdq (66),(VEX),(o128)
+f4: pmuludq Pq,Qq | pmuludq Vdq,Wdq (66),(VEX),(o128)
+f5: pmaddwd Pq,Qq | pmaddwd Vdq,Wdq (66),(VEX),(o128)
+f6: psadbw Pq,Qq | psadbw Vdq,Wdq (66),(VEX),(o128)
+f7: maskmovq Pq,Nq | maskmovdqu Vdq,Udq (66),(VEX),(o128)
+f8: psubb Pq,Qq | psubb Vdq,Wdq (66),(VEX),(o128)
+f9: psubw Pq,Qq | psubw Vdq,Wdq (66),(VEX),(o128)
+fa: psubd Pq,Qq | psubd Vdq,Wdq (66),(VEX),(o128)
+fb: psubq Pq,Qq | psubq Vdq,Wdq (66),(VEX),(o128)
+fc: paddb Pq,Qq | paddb Vdq,Wdq (66),(VEX),(o128)
+fd: paddw Pq,Qq | paddw Vdq,Wdq (66),(VEX),(o128)
+fe: paddd Pq,Qq | paddd Vdq,Wdq (66),(VEX),(o128)
ff:
EndTable

Table: 3-byte opcode 1 (0x0f 0x38)
Referrer: 3-byte escape 1
+AVXcode: 2
# 0x0f 0x38 0x00-0x0f
-00: pshufb Pq,Qq | pshufb Vdq,Wdq (66)
-01: phaddw Pq,Qq | phaddw Vdq,Wdq (66)
-02: phaddd Pq,Qq | phaddd Vdq,Wdq (66)
-03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66)
-04: pmaddubsw Pq,Qq | pmaddubsw Vdq,Wdq (66)
-05: phsubw Pq,Qq | phsubw Vdq,Wdq (66)
-06: phsubd Pq,Qq | phsubd Vdq,Wdq (66)
-07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66)
-08: psignb Pq,Qq | psignb Vdq,Wdq (66)
-09: psignw Pq,Qq | psignw Vdq,Wdq (66)
-0a: psignd Pq,Qq | psignd Vdq,Wdq (66)
-0b: pmulhrsw Pq,Qq | pmulhrsw Vdq,Wdq (66)
-0c:
-0d:
-0e:
-0f:
+00: pshufb Pq,Qq | pshufb Vdq,Wdq (66),(VEX),(o128)
+01: phaddw Pq,Qq | phaddw Vdq,Wdq (66),(VEX),(o128)
+02: phaddd Pq,Qq | phaddd Vdq,Wdq (66),(VEX),(o128)
+03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66),(VEX),(o128)
+04: pmaddubsw Pq,Qq | pmaddubsw Vdq,Wdq (66),(VEX),(o128)
+05: phsubw Pq,Qq | phsubw Vdq,Wdq (66),(VEX),(o128)
+06: phsubd Pq,Qq | phsubd Vdq,Wdq (66),(VEX),(o128)
+07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66),(VEX),(o128)
+08: psignb Pq,Qq | psignb Vdq,Wdq (66),(VEX),(o128)
+09: psignw Pq,Qq | psignw Vdq,Wdq (66),(VEX),(o128)
+0a: psignd Pq,Qq | psignd Vdq,Wdq (66),(VEX),(o128)
+0b: pmulhrsw Pq,Qq | pmulhrsw Vdq,Wdq (66),(VEX),(o128)
+0c: Vpermilps /r (66),(oVEX)
+0d: Vpermilpd /r (66),(oVEX)
+0e: vtestps /r (66),(oVEX)
+0f: vtestpd /r (66),(oVEX)
# 0x0f 0x38 0x10-0x1f
10: pblendvb Vdq,Wdq (66)
11:
@@ -594,90 +604,99 @@ Referrer: 3-byte escape 1
14: blendvps Vdq,Wdq (66)
15: blendvpd Vdq,Wdq (66)
16:
-17: ptest Vdq,Wdq (66)
-18:
-19:
-1a:
+17: ptest Vdq,Wdq (66),(VEX)
+18: vbroadcastss /r (66),(oVEX)
+19: vbroadcastsd /r (66),(oVEX),(o256)
+1a: vbroadcastf128 /r (66),(oVEX),(o256)
1b:
-1c: pabsb Pq,Qq | pabsb Vdq,Wdq (66)
-1d: pabsw Pq,Qq | pabsw Vdq,Wdq (66)
-1e: pabsd Pq,Qq | pabsd Vdq,Wdq (66)
+1c: pabsb Pq,Qq | pabsb Vdq,Wdq (66),(VEX),(o128)
+1d: pabsw Pq,Qq | pabsw Vdq,Wdq (66),(VEX),(o128)
+1e: pabsd Pq,Qq | pabsd Vdq,Wdq (66),(VEX),(o128)
1f:
# 0x0f 0x38 0x20-0x2f
-20: pmovsxbw Vdq,Udq/Mq (66)
-21: pmovsxbd Vdq,Udq/Md (66)
-22: pmovsxbq Vdq,Udq/Mw (66)
-23: pmovsxwd Vdq,Udq/Mq (66)
-24: pmovsxwq Vdq,Udq/Md (66)
-25: pmovsxdq Vdq,Udq/Mq (66)
+20: pmovsxbw Vdq,Udq/Mq (66),(VEX),(o128)
+21: pmovsxbd Vdq,Udq/Md (66),(VEX),(o128)
+22: pmovsxbq Vdq,Udq/Mw (66),(VEX),(o128)
+23: pmovsxwd Vdq,Udq/Mq (66),(VEX),(o128)
+24: pmovsxwq Vdq,Udq/Md (66),(VEX),(o128)
+25: pmovsxdq Vdq,Udq/Mq (66),(VEX),(o128)
26:
27:
-28: pmuldq Vdq,Wdq (66)
-29: pcmpeqq Vdq,Wdq (66)
-2a: movntdqa Vdq,Mdq (66)
-2b: packusdw Vdq,Wdq (66)
-2c:
-2d:
-2e:
-2f:
+28: pmuldq Vdq,Wdq (66),(VEX),(o128)
+29: pcmpeqq Vdq,Wdq (66),(VEX),(o128)
+2a: movntdqa Vdq,Mdq (66),(VEX),(o128)
+2b: packusdw Vdq,Wdq (66),(VEX),(o128)
+2c: vmaskmovps(ld) /r (66),(oVEX)
+2d: vmaskmovpd(ld) /r (66),(oVEX)
+2e: vmaskmovps(st) /r (66),(oVEX)
+2f: vmaskmovpd(st) /r (66),(oVEX)
# 0x0f 0x38 0x30-0x3f
-30: pmovzxbw Vdq,Udq/Mq (66)
-31: pmovzxbd Vdq,Udq/Md (66)
-32: pmovzxbq Vdq,Udq/Mw (66)
-33: pmovzxwd Vdq,Udq/Mq (66)
-34: pmovzxwq Vdq,Udq/Md (66)
-35: pmovzxdq Vdq,Udq/Mq (66)
+30: pmovzxbw Vdq,Udq/Mq (66),(VEX),(o128)
+31: pmovzxbd Vdq,Udq/Md (66),(VEX),(o128)
+32: pmovzxbq Vdq,Udq/Mw (66),(VEX),(o128)
+33: pmovzxwd Vdq,Udq/Mq (66),(VEX),(o128)
+34: pmovzxwq Vdq,Udq/Md (66),(VEX),(o128)
+35: pmovzxdq Vdq,Udq/Mq (66),(VEX),(o128)
36:
-37: pcmpgtq Vdq,Wdq (66)
-38: pminsb Vdq,Wdq (66)
-39: pminsd Vdq,Wdq (66)
-3a: pminuw Vdq,Wdq (66)
-3b: pminud Vdq,Wdq (66)
-3c: pmaxsb Vdq,Wdq (66)
-3d: pmaxsd Vdq,Wdq (66)
-3e: pmaxuw Vdq,Wdq (66)
-3f: pmaxud Vdq,Wdq (66)
+37: pcmpgtq Vdq,Wdq (66),(VEX),(o128)
+38: pminsb Vdq,Wdq (66),(VEX),(o128)
+39: pminsd Vdq,Wdq (66),(VEX),(o128)
+3a: pminuw Vdq,Wdq (66),(VEX),(o128)
+3b: pminud Vdq,Wdq (66),(VEX),(o128)
+3c: pmaxsb Vdq,Wdq (66),(VEX),(o128)
+3d: pmaxsd Vdq,Wdq (66),(VEX),(o128)
+3e: pmaxuw Vdq,Wdq (66),(VEX),(o128)
+3f: pmaxud Vdq,Wdq (66),(VEX),(o128)
# 0x0f 0x38 0x4f-0xff
-40: pmulld Vdq,Wdq (66)
-41: phminposuw Vdq,Wdq (66)
+40: pmulld Vdq,Wdq (66),(VEX),(o128)
+41: phminposuw Vdq,Wdq (66),(VEX),(o128)
80: INVEPT Gd/q,Mdq (66)
81: INVPID Gd/q,Mdq (66)
-db: aesimc Vdq,Wdq (66)
-dc: aesenc Vdq,Wdq (66)
-dd: aesenclast Vdq,Wdq (66)
-de: aesdec Vdq,Wdq (66)
-df: aesdeclast Vdq,Wdq (66)
+db: aesimc Vdq,Wdq (66),(VEX),(o128)
+dc: aesenc Vdq,Wdq (66),(VEX),(o128)
+dd: aesenclast Vdq,Wdq (66),(VEX),(o128)
+de: aesdec Vdq,Wdq (66),(VEX),(o128)
+df: aesdeclast Vdq,Wdq (66),(VEX),(o128)
f0: MOVBE Gv,Mv | CRC32 Gd,Eb (F2)
f1: MOVBE Mv,Gv | CRC32 Gd,Ev (F2)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
Referrer: 3-byte escape 2
+AVXcode: 3
# 0x0f 0x3a 0x00-0xff
-08: roundps Vdq,Wdq,Ib (66)
-09: roundpd Vdq,Wdq,Ib (66)
-0a: roundss Vss,Wss,Ib (66)
-0b: roundsd Vsd,Wsd,Ib (66)
-0c: blendps Vdq,Wdq,Ib (66)
-0d: blendpd Vdq,Wdq,Ib (66)
-0e: pblendw Vdq,Wdq,Ib (66)
-0f: palignr Pq,Qq,Ib | palignr Vdq,Wdq,Ib (66)
-14: pextrb Rd/Mb,Vdq,Ib (66)
-15: pextrw Rd/Mw,Vdq,Ib (66)
-16: pextrd/pextrq Ed/q,Vdq,Ib (66)
-17: extractps Ed,Vdq,Ib (66)
-20: pinsrb Vdq,Rd/q/Mb,Ib (66)
-21: insertps Vdq,Udq/Md,Ib (66)
-22: pinsrd/pinsrq Vdq,Ed/q,Ib (66)
-40: dpps Vdq,Wdq,Ib (66)
-41: dppd Vdq,Wdq,Ib (66)
-42: mpsadbw Vdq,Wdq,Ib (66)
-44: pclmulq Vdq,Wdq,Ib (66)
-60: pcmpestrm Vdq,Wdq,Ib (66)
-61: pcmpestri Vdq,Wdq,Ib (66)
-62: pcmpistrm Vdq,Wdq,Ib (66)
-63: pcmpistri Vdq,Wdq,Ib (66)
-df: aeskeygenassist Vdq,Wdq,Ib (66)
+04: vpermilps /r,Ib (66),(oVEX)
+05: vpermilpd /r,Ib (66),(oVEX)
+06: vperm2f128 /r,Ib (66),(oVEX),(o256)
+08: roundps Vdq,Wdq,Ib (66),(VEX)
+09: roundpd Vdq,Wdq,Ib (66),(VEX)
+0a: roundss Vss,Wss,Ib (66),(VEX),(o128)
+0b: roundsd Vsd,Wsd,Ib (66),(VEX),(o128)
+0c: blendps Vdq,Wdq,Ib (66),(VEX)
+0d: blendpd Vdq,Wdq,Ib (66),(VEX)
+0e: pblendw Vdq,Wdq,Ib (66),(VEX),(o128)
+0f: palignr Pq,Qq,Ib | palignr Vdq,Wdq,Ib (66),(VEX),(o128)
+14: pextrb Rd/Mb,Vdq,Ib (66),(VEX),(o128)
+15: pextrw Rd/Mw,Vdq,Ib (66),(VEX),(o128)
+16: pextrd/pextrq Ed/q,Vdq,Ib (66),(VEX),(o128)
+17: extractps Ed,Vdq,Ib (66),(VEX),(o128)
+18: vinsertf128 /r,Ib (66),(oVEX),(o256)
+19: vextractf128 /r,Ib (66),(oVEX),(o256)
+20: pinsrb Vdq,Rd/q/Mb,Ib (66),(VEX),(o128)
+21: insertps Vdq,Udq/Md,Ib (66),(VEX),(o128)
+22: pinsrd/pinsrq Vdq,Ed/q,Ib (66),(VEX),(o128)
+40: dpps Vdq,Wdq,Ib (66),(VEX)
+41: dppd Vdq,Wdq,Ib (66),(VEX),(o128)
+42: mpsadbw Vdq,Wdq,Ib (66),(VEX),(o128)
+44: pclmulq Vdq,Wdq,Ib (66),(VEX),(o128)
+4a: vblendvps /r,Ib (66),(oVEX)
+4b: vblendvpd /r,Ib (66),(oVEX)
+4c: vpblendvb /r,Ib (66),(oVEX),(o128)
+60: pcmpestrm Vdq,Wdq,Ib (66),(VEX),(o128)
+61: pcmpestri Vdq,Wdq,Ib (66),(VEX),(o128)
+62: pcmpistrm Vdq,Wdq,Ib (66),(VEX),(o128)
+63: pcmpistri Vdq,Wdq,Ib (66),(VEX),(o128)
+df: aeskeygenassist Vdq,Wdq,Ib (66),(VEX),(o128)
EndTable

GrpTable: Grp1
@@ -785,29 +804,29 @@ GrpTable: Grp11
EndTable

GrpTable: Grp12
-2: psrlw Nq,Ib (11B) | psrlw Udq,Ib (66),(11B)
-4: psraw Nq,Ib (11B) | psraw Udq,Ib (66),(11B)
-6: psllw Nq,Ib (11B) | psllw Udq,Ib (66),(11B)
+2: psrlw Nq,Ib (11B) | psrlw Udq,Ib (66),(11B),(VEX),(o128)
+4: psraw Nq,Ib (11B) | psraw Udq,Ib (66),(11B),(VEX),(o128)
+6: psllw Nq,Ib (11B) | psllw Udq,Ib (66),(11B),(VEX),(o128)
EndTable

GrpTable: Grp13
-2: psrld Nq,Ib (11B) | psrld Udq,Ib (66),(11B)
-4: psrad Nq,Ib (11B) | psrad Udq,Ib (66),(11B)
-6: pslld Nq,Ib (11B) | pslld Udq,Ib (66),(11B)
+2: psrld Nq,Ib (11B) | psrld Udq,Ib (66),(11B),(VEX),(o128)
+4: psrad Nq,Ib (11B) | psrad Udq,Ib (66),(11B),(VEX),(o128)
+6: pslld Nq,Ib (11B) | pslld Udq,Ib (66),(11B),(VEX),(o128)
EndTable

GrpTable: Grp14
-2: psrlq Nq,Ib (11B) | psrlq Udq,Ib (66),(11B)
-3: psrldq Udq,Ib (66),(11B)
-6: psllq Nq,Ib (11B) | psllq Udq,Ib (66),(11B)
-7: pslldq Udq,Ib (66),(11B)
+2: psrlq Nq,Ib (11B) | psrlq Udq,Ib (66),(11B),(VEX),(o128)
+3: psrldq Udq,Ib (66),(11B),(VEX),(o128)
+6: psllq Nq,Ib (11B) | psllq Udq,Ib (66),(11B),(VEX),(o128)
+7: pslldq Udq,Ib (66),(11B),(VEX),(o128)
EndTable

GrpTable: Grp15
0: fxsave
1: fxstor
-2: ldmxcsr
-3: stmxcsr
+2: ldmxcsr (VEX)
+3: stmxcsr (VEX)
4: XSAVE
5: XRSTOR | lfence (11B)
6: mfence (11B)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 7d54929..e34e92a 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -13,6 +13,18 @@ function check_awk_implement() {
return ""
}

+# Clear working vars
+function clear_vars() {
+ delete table
+ delete lptable2
+ delete lptable1
+ delete lptable3
+ eid = -1 # escape id
+ gid = -1 # group id
+ aid = -1 # AVX id
+ tname = ""
+}
+
BEGIN {
# Implementation error checking
awkchecked = check_awk_implement()
@@ -24,11 +36,15 @@ BEGIN {

# Setup generating tables
print "/* x86 opcode map generated from x86-opcode-map.txt */"
- print "/* Do not change this code. */"
+ print "/* Do not change this code. */\n"
ggid = 1
geid = 1
+ gaid = 0
+ delete etable
+ delete gtable
+ delete atable

- opnd_expr = "^[[:alpha:]]"
+ opnd_expr = "^[[:alpha:]/]"
ext_expr = "^\\("
sep_expr = "^\\|$"
group_expr = "^Grp[[:alnum:]]+"
@@ -46,19 +62,19 @@ BEGIN {
imm_flag["Ob"] = "INAT_MOFFSET"
imm_flag["Ov"] = "INAT_MOFFSET"

- modrm_expr = "^([CDEGMNPQRSUVW][[:lower:]]+|NTA|T[012])"
+ modrm_expr = "^([CDEGMNPQRSUVW/][[:lower:]]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
rex_expr = "^REX(\\.[XRWB]+)*"
fpu_expr = "^ESC" # TODO

lprefix1_expr = "\\(66\\)"
- delete lptable1
- lprefix2_expr = "\\(F2\\)"
- delete lptable2
- lprefix3_expr = "\\(F3\\)"
- delete lptable3
+ lprefix2_expr = "\\(F3\\)"
+ lprefix3_expr = "\\(F2\\)"
max_lprefix = 4

+ vexok_expr = "\\(VEX\\)"
+ vexonly_expr = "\\(oVEX\\)"
+
prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
prefix_num["REPNE"] = "INAT_PFX_REPNE"
@@ -71,12 +87,10 @@ BEGIN {
prefix_num["SEG=GS"] = "INAT_PFX_GS"
prefix_num["SEG=SS"] = "INAT_PFX_SS"
prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
+ prefix_num["2bytes-VEX"] = "INAT_PFX_VEX2"
+ prefix_num["3bytes-VEX"] = "INAT_PFX_VEX3"

- delete table
- delete etable
- delete gtable
- eid = -1
- gid = -1
+ clear_vars()
}

function semantic_error(msg) {
@@ -97,14 +111,12 @@ function array_size(arr, i,c) {

/^Table:/ {
print "/* " $0 " */"
+ if (tname != "")
+ semantic_error("Hit Table: before EndTable:.");
}

/^Referrer:/ {
- if (NF == 1) {
- # primary opcode table
- tname = "inat_primary_table"
- eid = -1
- } else {
+ if (NF != 1) {
# escape opcode table
ref = ""
for (i = 2; i <= NF; i++)
@@ -114,6 +126,19 @@ function array_size(arr, i,c) {
}
}

+/^AVXcode:/ {
+ if (NF != 1) {
+ # AVX/escape opcode table
+ aid = $2
+ if (gaid <= aid)
+ gaid = aid + 1
+ if (tname == "") # AVX only opcode table
+ tname = sprintf("inat_avx_table_%d", $2)
+ }
+ if (aid == -1 && eid == -1) # primary opcode table
+ tname = "inat_primary_table"
+}
+
/^GrpTable:/ {
print "/* " $0 " */"
if (!($2 in group))
@@ -162,30 +187,33 @@ function print_table(tbl,name,fmt,n)
print_table(table, tname "[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,0] = tname
+ if (aid >= 0)
+ atable[aid,0] = tname
}
if (array_size(lptable1) != 0) {
print_table(lptable1,tname "_1[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,1] = tname "_1"
+ if (aid >= 0)
+ atable[aid,1] = tname "_1"
}
if (array_size(lptable2) != 0) {
print_table(lptable2,tname "_2[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,2] = tname "_2"
+ if (aid >= 0)
+ atable[aid,2] = tname "_2"
}
if (array_size(lptable3) != 0) {
print_table(lptable3,tname "_3[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,3] = tname "_3"
+ if (aid >= 0)
+ atable[aid,3] = tname "_3"
}
}
print ""
- delete table
- delete lptable1
- delete lptable2
- delete lptable3
- gid = -1
- eid = -1
+ clear_vars()
}

function add_flags(old,new) {
@@ -284,6 +312,14 @@ function convert_operands(opnd, i,imm,mod)
if (match(opcode, fpu_expr))
flags = add_flags(flags, "INAT_MODRM")

+ # check VEX only code
+ if (match(ext, vexonly_expr))
+ flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
+
+ # check VEX-allowed code
+ if (match(ext, vexok_expr))
+ flags = add_flags(flags, "INAT_VEXOK")
+
# check prefixes
if (match(ext, prefix_expr)) {
if (!prefix_num[opcode])
@@ -330,5 +366,15 @@ END {
for (j = 0; j < max_lprefix; j++)
if (gtable[i,j])
print " ["i"]["j"] = "gtable[i,j]","
+ print "};\n"
+ # print AVX opcode map's array
+ print "/* AVX opcode map array */"
+ print "const insn_attr_t const *inat_avx_tables[X86_VEX_M_MAX + 1]"\
+ "[INAT_LSTPFX_MAX + 1] = {"
+ for (i = 0; i < gaid; i++)
+ for (j = 0; j < max_lprefix; j++)
+ if (atable[i,j])
+ print " ["i"]["j"] = "atable[i,j]","
print "};"
}
+


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]
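
As an illustration of how the generated inat_avx_tables[][] array in the patch above might be indexed, here is a minimal sketch that pulls the two selectors out of a VEX prefix. It assumes the standard VEX layout (0xc5 for the 2-byte form with an implied 0x0f map, 0xc4 for the 3-byte form with a 5-bit map field) and the prefix ordering this patch sets up in the awk script (1 = 66, 2 = F3, 3 = F2). The names vex_index and vex_table_index() are hypothetical and are not the decoder's actual helpers; the sketch also ignores the 32-bit case where 0xc4/0xc5 can still mean LES/LDS.

```c
#include <stdint.h>

/* Hypothetical holder for the two table selectors. */
struct vex_index {
	unsigned int m;	/* opcode map selector, matches "AVXcode: N" */
	unsigned int p;	/* last prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2 */
};

/*
 * Extract the selectors from a VEX prefix.
 * Returns the prefix length in bytes (2 or 3), or 0 if the first
 * byte is not a VEX escape byte.
 */
int vex_table_index(const uint8_t *insn, struct vex_index *idx)
{
	if (insn[0] == 0xc5) {
		/* 2-byte VEX: the map is implied to be 1 (0x0f) */
		idx->m = 1;
		idx->p = insn[1] & 0x03;
		return 2;
	}
	if (insn[0] == 0xc4) {
		/* 3-byte VEX: map in the low 5 bits of byte 1, pp in byte 2 */
		idx->m = insn[1] & 0x1f;
		idx->p = insn[2] & 0x03;
		return 3;
	}
	return 0;
}
```

With these selectors, a lookup would take the form tables[idx.m][idx.p][opcode], which is why the patch swaps the F2 and F3 table numbers: the order then follows the VEX pp field directly.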

2009-10-27 20:43:15

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 05/10] x86: Add Intel FMA instructions to x86 opcode map

Add Intel FMA (fused multiply-add) instructions to the x86 opcode map
used by the x86 instruction decoder.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

arch/x86/lib/x86-opcode-map.txt | 34 +++++++++++++++++++++++++++++++++-
1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 9887bfe..a793da5 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -647,11 +647,43 @@ AVXcode: 2
3d: pmaxsd Vdq,Wdq (66),(VEX),(o128)
3e: pmaxuw Vdq,Wdq (66),(VEX),(o128)
3f: pmaxud Vdq,Wdq (66),(VEX),(o128)
-# 0x0f 0x38 0x4f-0xff
+# 0x0f 0x38 0x40-0x8f
40: pmulld Vdq,Wdq (66),(VEX),(o128)
41: phminposuw Vdq,Wdq (66),(VEX),(o128)
80: INVEPT Gd/q,Mdq (66)
81: INVPID Gd/q,Mdq (66)
+# 0x0f 0x38 0x90-0xbf (FMA)
+96: vfmaddsub132pd/ps /r (66),(VEX)
+97: vfmsubadd132pd/ps /r (66),(VEX)
+98: vfmadd132pd/ps /r (66),(VEX)
+99: vfmadd132sd/ss /r (66),(VEX),(o128)
+9a: vfmsub132pd/ps /r (66),(VEX)
+9b: vfmsub132sd/ss /r (66),(VEX),(o128)
+9c: vfnmadd132pd/ps /r (66),(VEX)
+9d: vfnmadd132sd/ss /r (66),(VEX),(o128)
+9e: vfnmsub132pd/ps /r (66),(VEX)
+9f: vfnmsub132sd/ss /r (66),(VEX),(o128)
+a6: vfmaddsub213pd/ps /r (66),(VEX)
+a7: vfmsubadd213pd/ps /r (66),(VEX)
+a8: vfmadd213pd/ps /r (66),(VEX)
+a9: vfmadd213sd/ss /r (66),(VEX),(o128)
+aa: vfmsub213pd/ps /r (66),(VEX)
+ab: vfmsub213sd/ss /r (66),(VEX),(o128)
+ac: vfnmadd213pd/ps /r (66),(VEX)
+ad: vfnmadd213sd/ss /r (66),(VEX),(o128)
+ae: vfnmsub213pd/ps /r (66),(VEX)
+af: vfnmsub213sd/ss /r (66),(VEX),(o128)
+b6: vfmaddsub231pd/ps /r (66),(VEX)
+b7: vfmsubadd231pd/ps /r (66),(VEX)
+b8: vfmadd231pd/ps /r (66),(VEX)
+b9: vfmadd231sd/ss /r (66),(VEX),(o128)
+ba: vfmsub231pd/ps /r (66),(VEX)
+bb: vfmsub231sd/ss /r (66),(VEX),(o128)
+bc: vfnmadd231pd/ps /r (66),(VEX)
+bd: vfnmadd231sd/ss /r (66),(VEX),(o128)
+be: vfnmsub231pd/ps /r (66),(VEX)
+bf: vfnmsub231sd/ss /r (66),(VEX),(o128)
+# 0x0f 0x38 0xc0-0xff
db: aesimc Vdq,Wdq (66),(VEX),(o128)
dc: aesenc Vdq,Wdq (66),(VEX),(o128)
dd: aesenclast Vdq,Wdq (66),(VEX),(o128)


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]
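
The FMA rows added above follow a regular pattern: the high nibble of the opcode (9, a, b) picks the operand order (132, 213, 231) and the low nibble (6 to f) picks the operation and the packed or scalar form. The sketch below only restates that pattern from the table in this patch; fma_mnemonic() is a hypothetical helper for illustration and is not part of the decoder.

```c
#include <stdio.h>

/*
 * Build the mnemonic for an opcode in the 0x0f 0x38 FMA rows.
 * Returns 0 and fills buf on success, -1 if op is outside
 * 0x96-0x9f, 0xa6-0xaf and 0xb6-0xbf.
 */
int fma_mnemonic(unsigned char op, char *buf, size_t len)
{
	static const char *const order[] = { "132", "213", "231" };
	static const char *const base[] = {
		"vfmaddsub", "vfmsubadd",	/* low nibble 6, 7 */
		"vfmadd", "vfmadd",		/* 8 (packed), 9 (scalar) */
		"vfmsub", "vfmsub",		/* a, b */
		"vfnmadd", "vfnmadd",		/* c, d */
		"vfnmsub", "vfnmsub",		/* e, f */
	};
	unsigned int hi = op >> 4, lo = op & 0x0f;
	int scalar;

	if (hi < 0x9 || hi > 0xb || lo < 0x6)
		return -1;

	/* From 0x8 upward, odd low nibbles are the scalar (sd/ss) forms. */
	scalar = lo >= 0x8 && (lo & 1);
	snprintf(buf, len, "%s%s%s", base[lo - 6], order[hi - 9],
		 scalar ? "sd/ss" : "pd/ps");
	return 0;
}
```

The scalar forms are the ones the map marks with (o128), since they only operate on the low element of a 128-bit register.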

2009-10-27 20:43:32

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 06/10] kprobe-tracer: Compare both of event-name and event-group to find probe

Fix find_probe_event() to compare both the event name and the event group.
Without this fix, kprobe-tracer overwrites an existing probe that has the
same event name even if its group name is different.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

kernel/trace/trace_kprobe.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b8ef707..a86c3ac 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -353,12 +353,14 @@ static void free_trace_probe(struct trace_probe *tp)
kfree(tp);
}

-static struct trace_probe *find_probe_event(const char *event)
+static struct trace_probe *find_probe_event(const char *event,
+ const char *group)
{
struct trace_probe *tp;

list_for_each_entry(tp, &probe_list, list)
- if (!strcmp(tp->call.name, event))
+ if (strcmp(tp->call.name, event) == 0 &&
+ strcmp(tp->call.system, group) == 0)
return tp;
return NULL;
}
@@ -383,7 +385,7 @@ static int register_trace_probe(struct trace_probe *tp)
mutex_lock(&probe_lock);

/* register as an event */
- old_tp = find_probe_event(tp->call.name);
+ old_tp = find_probe_event(tp->call.name, tp->call.system);
if (old_tp) {
/* delete old event */
unregister_trace_probe(old_tp);


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]
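
The effect of the fix above can be reproduced outside the kernel with a plain singly linked list. This is a minimal sketch under that assumption; struct probe and find_probe() are hypothetical stand-ins for struct trace_probe and find_probe_event(), and no locking is shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct trace_probe. */
struct probe {
	const char *name;	/* event name */
	const char *group;	/* event group (call.system in the patch) */
	struct probe *next;
};

/* Return the probe whose name AND group both match, or NULL. */
struct probe *find_probe(struct probe *head, const char *name,
			 const char *group)
{
	struct probe *p;

	for (p = head; p; p = p->next)
		if (strcmp(p->name, name) == 0 &&
		    strcmp(p->group, group) == 0)
			return p;
	return NULL;
}
```

With a name-only comparison, registering "groupB/sched" would find and delete "groupA/sched"; with both keys compared, the two events can coexist.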

2009-10-27 20:43:51

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 07/10] perf/probes: Exit searching after finding target function

Stop searching once the real (non-inlined) function is found, because
the same symbol should not appear again in that CU.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

tools/perf/util/probe-finder.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 54e7071..b98d35e 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -585,14 +585,14 @@ static int probefunc_callback(struct die_link *dlink, void *data)
DIE_IF(ret != DW_DLV_OK);
pr_debug("inline definition offset %lld\n",
pf->inl_offs);
- return 0;
+ return 0; /* Continue to search */
}
/* Get probe address */
pf->addr = die_get_entrypc(dlink->die);
pf->addr += pp->offset;
/* TODO: Check the address in this function */
show_probepoint(dlink->die, pp->offset, pf);
- /* Continue to search */
+ return 1; /* Exit; no same symbol in this CU. */
}
} else if (tag == DW_TAG_inlined_subroutine && pf->inl_offs) {
if (die_get_abstract_origin(dlink->die) == pf->inl_offs) {


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]
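
The callback contract used by the patch above (return 0 to continue, nonzero to stop) can be sketched with a flat array instead of DWARF DIEs. Everything here is hypothetical illustration: struct sym, walk_syms() and match_cb() are not perf functions, and the real code walks a DIE tree through libdwarf.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record standing in for a DWARF DIE. */
struct sym {
	const char *name;
	int inlined;	/* 1 if this entry is only an inline definition */
};

/* Callback contract: return 0 to keep searching, nonzero to stop. */
typedef int (*sym_cb)(const struct sym *s, void *data);

/* Visit entries in order; returns how many entries were visited. */
size_t walk_syms(const struct sym *syms, size_t n, sym_cb cb, void *data)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (cb(&syms[i], data))
			return i + 1;
	return n;
}

struct search {
	const char *target;
	const struct sym *found;	/* real function, if any */
	int saw_inline;			/* an inline definition matched */
};

int match_cb(const struct sym *s, void *data)
{
	struct search *q = data;

	if (strcmp(s->name, q->target) != 0)
		return 0;
	if (s->inlined) {
		/* Inline definition: instances may follow, keep going. */
		q->saw_inline = 1;
		return 0;
	}
	q->found = s;
	return 1;	/* real function found, nothing more to look for */
}
```

The early return only saves work for the non-inlined case; an inline definition still needs the full walk so that every inlined instance is seen.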

2009-10-27 20:43:36

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 08/10] perf/probes: Change command-line option of perf-probe

Change the command-line option from -P to --add, and also accept
probe definitions given without --add:

perf probe --add "probe-define"

or just

perf probe "probe-define"

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

tools/perf/builtin-probe.c | 28 ++++++++++++++++++----------
1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index dcb406c..3370dab 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -65,8 +65,8 @@ static struct {

#define semantic_error(msg ...) die("Semantic error :" msg)

-static int parse_probepoint(const struct option *opt __used,
- const char *str, int unset __used)
+/* Parse a probe point. Note that any error must die. */
+static void parse_probepoint(const char *str)
{
char *argv[MAX_PROBE_ARGS + 2]; /* Event + probe + args */
int argc, i;
@@ -75,9 +75,6 @@ static int parse_probepoint(const struct option *opt __used,
char **event = &session.events[session.nr_probe];
int retp = 0;

- if (!str) /* The end of probe points */
- return 0;
-
pr_debug("probe-definition(%d): %s\n", session.nr_probe, str);
if (++session.nr_probe == MAX_PROBES)
semantic_error("Too many probes");
@@ -176,6 +173,13 @@ static int parse_probepoint(const struct option *opt __used,
}

pr_debug("%d arguments\n", pp->nr_args);
+}
+
+static int opt_add_probepoint(const struct option *opt __used,
+ const char *str, int unset __used)
+{
+ if (str)
+ parse_probepoint(str);
return 0;
}

@@ -211,7 +215,8 @@ static int open_default_vmlinux(void)
#endif

static const char * const probe_usage[] = {
- "perf probe [<options>] -P 'PROBEDEF' [-P 'PROBEDEF' ...]",
+ "perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]",
+ "perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]",
NULL
};

@@ -222,7 +227,7 @@ static const struct option options[] = {
OPT_STRING('k', "vmlinux", &session.vmlinux, "file",
"vmlinux/module pathname"),
#endif
- OPT_CALLBACK('P', "probe", NULL,
+ OPT_CALLBACK('a', "add", NULL,
#ifdef NO_LIBDWARF
"p|r:[GRP/]NAME FUNC[+OFFS] [ARG ...]",
#else
@@ -243,7 +248,7 @@ static const struct option options[] = {
"\t\tARG:\tProbe argument (local variable name or\n"
#endif
"\t\t\tkprobe-tracer argument format is supported.)\n",
- parse_probepoint),
+ opt_add_probepoint),
OPT_END()
};

@@ -296,8 +301,11 @@ int cmd_probe(int argc, const char **argv, const char *prefix __used)
char buf[MAX_CMDLEN];

argc = parse_options(argc, argv, options, probe_usage,
- PARSE_OPT_STOP_AT_NON_OPTION);
- if (argc || session.nr_probe == 0)
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ for (i = 0; i < argc; i++)
+ parse_probepoint(argv[i]);
+
+ if (session.nr_probe == 0)
usage_with_options(probe_usage, options);

#ifdef NO_LIBDWARF


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]
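
The command-line behaviour described above, where a definition may follow --add (or -a) or stand alone, can be sketched with a simple argv loop. This is an assumption-laden illustration: collect_probe_defs() and MAX_DEFS are hypothetical, and the loop does not reproduce perf's parse_options(), which stops at the first non-option argument and hands the rest back to cmd_probe().

```c
#include <string.h>

#define MAX_DEFS 16	/* hypothetical limit for this sketch */

/*
 * Collect probe definitions from argv[1..argc-1].
 * Accepts both "--add DEF" / "-a DEF" and a bare "DEF".
 * Returns the number of definitions stored in defs[], or -1 on error.
 */
int collect_probe_defs(int argc, const char **argv, const char **defs)
{
	int i, n = 0;

	for (i = 1; i < argc; i++) {
		const char *def = argv[i];

		if (strcmp(def, "--add") == 0 || strcmp(def, "-a") == 0) {
			if (++i == argc)
				return -1;	/* option without a value */
			def = argv[i];
		}
		if (n == MAX_DEFS)
			return -1;		/* too many definitions */
		defs[n++] = def;
	}
	return n;
}
```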

2009-10-27 20:44:12

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 09/10] perf/probes: Change probepoint syntax of perf-probe

Change the probe point syntax of perf-probe as below:

<SRC>:<ABS_LN> [ARGS]
or
<FUNC>[+OFFS|%return][@SRC] [ARGS]

The event name and event group name are now generated automatically
from the probed symbol and offset, as below:

perfprobe/SYMBOL_OFFSET[_NUM]

where SYMBOL is the probed symbol and OFFSET is
the byte offset from that symbol.
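
For readers who want to experiment with the syntax above outside perf, the following is a minimal, non-destructive sketch of a parser for "SRC:LN" and "FUNC[+OFFS|%return][@SRC]". It is not the code from this patch: struct probe_spec and parse_spec() are hypothetical, the buffers are fixed-size, and errors are returned as -1 instead of calling die(). As in the patch, a leading token containing '.' is assumed to be a file name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result structure for this sketch. */
struct probe_spec {
	char name[64];		/* leading FUNC or SRC token */
	char src[64];		/* SRC given after '@', if any */
	unsigned long line;	/* absolute line number (SRC:LN) */
	unsigned long offset;	/* byte offset from FUNC */
	int is_file;		/* leading token looked like a file */
	int retprobe;		/* %return was given */
};

/* Returns 0 on success, -1 on a syntax or exclusion error. */
int parse_spec(const char *arg, struct probe_spec *ps)
{
	size_t n = strcspn(arg, ":+@%");
	char *end;

	memset(ps, 0, sizeof(*ps));
	if (n == 0 || n >= sizeof(ps->name))
		return -1;
	memcpy(ps->name, arg, n);
	ps->is_file = strchr(ps->name, '.') != NULL;
	arg += n;

	while (*arg) {
		char c = *arg++;	/* delimiter that starts this option */

		n = strcspn(arg, ":+@%");
		switch (c) {
		case ':':	/* line number */
			ps->line = strtoul(arg, &end, 0);
			if (n == 0 || end != arg + n)
				return -1;
			break;
		case '+':	/* byte offset from the symbol */
			ps->offset = strtoul(arg, &end, 0);
			if (n == 0 || end != arg + n)
				return -1;
			break;
		case '%':	/* only %return is understood */
			if (n != 6 || strncmp(arg, "return", 6) != 0)
				return -1;
			ps->retprobe = 1;
			break;
		case '@':	/* source file, must be the last option */
			if (ps->is_file || n == 0 || n >= sizeof(ps->src) ||
			    arg[n] != '\0')
				return -1;
			memcpy(ps->src, arg, n);
			break;
		}
		arg += n;
	}

	/* Exclusion checks, mirroring the ones listed in the patch. */
	if (ps->is_file && !ps->line)
		return -1;	/* a file always requires a line number */
	if (!ps->is_file && ps->line)
		return -1;	/* function-relative lines not handled here */
	if (ps->is_file && (ps->offset || ps->retprobe))
		return -1;	/* offset and %return need a function */
	if (ps->offset && ps->retprobe)
		return -1;	/* offset cannot be combined with %return */
	return 0;
}
```

For example, "schedule+16@kernel/sched.c" yields name "schedule", offset 16 and src "kernel/sched.c", while "vmalloc+8%return" is rejected by the last exclusion check.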

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

tools/perf/builtin-probe.c | 181 +++++++++++++++++++++++++---------------
tools/perf/util/probe-finder.c | 10 ++
tools/perf/util/probe-finder.h | 2
3 files changed, 123 insertions(+), 70 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 3370dab..92b4c49 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -52,6 +52,7 @@ const char *default_search_path[NR_SEARCH_PATH] = {
#define MAX_PATH_LEN 256
#define MAX_PROBES 128
#define MAX_PROBE_ARGS 128
+#define PERFPROBE_GROUP "perfprobe"

/* Session management structure */
static struct {
@@ -60,20 +61,100 @@ static struct {
int need_dwarf;
int nr_probe;
struct probe_point probes[MAX_PROBES];
- char *events[MAX_PROBES];
} session;

#define semantic_error(msg ...) die("Semantic error :" msg)

-/* Parse a probe point. Note that any error must die. */
-static void parse_probepoint(const char *str)
+/* Parse probe point. Sets pp->retprobe if it is a return probe */
+static void parse_probe_point(char *arg, struct probe_point *pp)
+{
+ char *ptr, *tmp;
+ char c, nc;
+ /*
+ * <Syntax>
+ * perf probe SRC:LN
+ * perf probe FUNC[+OFFS|%return][@SRC]
+ */
+
+ ptr = strpbrk(arg, ":+@%");
+ if (ptr) {
+ nc = *ptr;
+ *ptr++ = '\0';
+ }
+
+ /* Check arg is function or file and copy it */
+ if (strchr(arg, '.')) /* File */
+ pp->file = strdup(arg);
+ else /* Function */
+ pp->function = strdup(arg);
+ DIE_IF(pp->file == NULL && pp->function == NULL);
+
+ /* Parse other options */
+ while (ptr) {
+ arg = ptr;
+ c = nc;
+ ptr = strpbrk(arg, ":+@%");
+ if (ptr) {
+ nc = *ptr;
+ *ptr++ = '\0';
+ }
+ switch (c) {
+ case ':': /* Line number */
+ pp->line = strtoul(arg, &tmp, 0);
+ if (*tmp != '\0')
+ semantic_error("There is non-digit character"
+ " in line number.");
+ break;
+ case '+': /* Byte offset from a symbol */
+ pp->offset = strtoul(arg, &tmp, 0);
+ if (*tmp != '\0')
+ semantic_error("There is non-digit character"
+ " in offset.");
+ break;
+ case '@': /* File name */
+ if (pp->file)
+ semantic_error("SRC@SRC is not allowed.");
+ pp->file = strdup(arg);
+ DIE_IF(pp->file == NULL);
+ if (ptr)
+ semantic_error("@SRC must be the last "
+ "option.");
+ break;
+ case '%': /* Probe places */
+ if (strcmp(arg, "return") == 0) {
+ pp->retprobe = 1;
+ } else /* Others not supported yet */
+ semantic_error("%%%s is not supported.", arg);
+ break;
+ default:
+ DIE_IF("Program has a bug.");
+ break;
+ }
+ }
+
+ /* Exclusion check */
+ if (pp->line && pp->function)
+ semantic_error("Function-relative line number is not"
+ " supported yet.");
+ if (!pp->line && pp->file && !pp->function)
+ semantic_error("File always requires line number.");
+ if (pp->offset && !pp->function)
+ semantic_error("Offset requires an entry function.");
+ if (pp->retprobe && !pp->function)
+ semantic_error("Return probe requires an entry function.");
+ if (pp->offset && pp->retprobe)
+ semantic_error("Offset can't be used with return probe.");
+
+ pr_debug("symbol:%s file:%s line:%d offset:%d, return:%d\n",
+ pp->function, pp->file, pp->line, pp->offset, pp->retprobe);
+}
+
+/* Parse an event definition. Note that any error must die. */
+static void parse_probe_event(const char *str)
{
char *argv[MAX_PROBE_ARGS + 2]; /* Event + probe + args */
int argc, i;
- char *arg, *ptr;
struct probe_point *pp = &session.probes[session.nr_probe];
- char **event = &session.events[session.nr_probe];
- int retp = 0;

pr_debug("probe-definition(%d): %s\n", session.nr_probe, str);
if (++session.nr_probe == MAX_PROBES)
@@ -103,70 +184,28 @@ static void parse_probepoint(const char *str)
pr_debug("argv[%d]=%s\n", argc, argv[argc - 1]);
}
} while (*str != '\0');
- if (argc < 2)
- semantic_error("Need event-name and probe-point at least.");
-
- /* Parse the event name */
- if (argv[0][0] == 'r')
- retp = 1;
- else if (argv[0][0] != 'p')
- semantic_error("You must specify 'p'(kprobe) or"
- " 'r'(kretprobe) first.");
- /* TODO: check event name */
- *event = argv[0];
+ if (!argc)
+ semantic_error("An empty argument.");

/* Parse probe point */
- arg = argv[1];
- if (arg[0] == '@') {
- /* Source Line */
- arg++;
- ptr = strchr(arg, ':');
- if (!ptr || !isdigit(ptr[1]))
- semantic_error("Line number is required.");
- *ptr++ = '\0';
- if (strlen(arg) == 0)
- semantic_error("No file name.");
- pp->file = strdup(arg);
- pp->line = atoi(ptr);
- if (!pp->file || !pp->line)
- semantic_error("Failed to parse line.");
- pr_debug("file:%s line:%d\n", pp->file, pp->line);
- } else {
- /* Function name */
- ptr = strchr(arg, '+');
- if (ptr) {
- if (!isdigit(ptr[1]))
- semantic_error("Offset is required.");
- *ptr++ = '\0';
- pp->offset = atoi(ptr);
- } else
- ptr = arg;
- ptr = strchr(ptr, '@');
- if (ptr) {
- *ptr++ = '\0';
- pp->file = strdup(ptr);
- }
- pp->function = strdup(arg);
- pr_debug("symbol:%s file:%s offset:%d\n",
- pp->function, pp->file, pp->offset);
- }
- free(argv[1]);
+ parse_probe_point(argv[0], pp);
+ free(argv[0]);
if (pp->file)
session.need_dwarf = 1;

/* Copy arguments */
- pp->nr_args = argc - 2;
+ pp->nr_args = argc - 1;
if (pp->nr_args > 0) {
pp->args = (char **)malloc(sizeof(char *) * pp->nr_args);
if (!pp->args)
die("malloc");
- memcpy(pp->args, &argv[2], sizeof(char *) * pp->nr_args);
+ memcpy(pp->args, &argv[1], sizeof(char *) * pp->nr_args);
}

/* Ensure return probe has no C argument */
for (i = 0; i < pp->nr_args; i++)
if (is_c_varname(pp->args[i])) {
- if (retp)
+ if (pp->retprobe)
semantic_error("You can't specify local"
" variable for kretprobe");
session.need_dwarf = 1;
@@ -175,11 +214,11 @@ static void parse_probepoint(const char *str)
pr_debug("%d arguments\n", pp->nr_args);
}

-static int opt_add_probepoint(const struct option *opt __used,
+static int opt_add_probe_event(const struct option *opt __used,
const char *str, int unset __used)
{
if (str)
- parse_probepoint(str);
+ parse_probe_event(str);
return 0;
}

@@ -229,17 +268,16 @@ static const struct option options[] = {
#endif
OPT_CALLBACK('a', "add", NULL,
#ifdef NO_LIBDWARF
- "p|r:[GRP/]NAME FUNC[+OFFS] [ARG ...]",
+ "FUNC[+OFFS|%return] [ARG ...]",
#else
- "p|r:[GRP/]NAME FUNC[+OFFS][@SRC]|@SRC:LINE [ARG ...]",
+ "FUNC[+OFFS|%return][@SRC]|SRC:LINE [ARG ...]",
#endif
"probe point definition, where\n"
- "\t\tp:\tkprobe probe\n"
- "\t\tr:\tkretprobe probe\n"
"\t\tGRP:\tGroup name (optional)\n"
"\t\tNAME:\tEvent name\n"
"\t\tFUNC:\tFunction name\n"
"\t\tOFFS:\tOffset from function entry (in byte)\n"
+ "\t\t%return:\tPut the probe at function return\n"
#ifdef NO_LIBDWARF
"\t\tARG:\tProbe argument (only \n"
#else
@@ -248,7 +286,7 @@ static const struct option options[] = {
"\t\tARG:\tProbe argument (local variable name or\n"
#endif
"\t\t\tkprobe-tracer argument format is supported.)\n",
- opt_add_probepoint),
+ opt_add_probe_event),
OPT_END()
};

@@ -266,7 +304,7 @@ static int write_new_event(int fd, const char *buf)

#define MAX_CMDLEN 256

-static int synthesize_probepoint(struct probe_point *pp)
+static int synthesize_probe_event(struct probe_point *pp)
{
char *buf;
int i, len, ret;
@@ -316,12 +354,12 @@ int cmd_probe(int argc, const char **argv, const char *prefix __used)
/* Synthesize probes without dwarf */
for (j = 0; j < session.nr_probe; j++) {
#ifndef NO_LIBDWARF
- if (session.events[j][0] != 'r') {
+ if (!session.probes[j].retprobe) {
session.need_dwarf = 1;
continue;
}
#endif
- ret = synthesize_probepoint(&session.probes[j]);
+ ret = synthesize_probe_event(&session.probes[j]);
if (ret == -E2BIG)
semantic_error("probe point is too long.");
else if (ret < 0)
@@ -349,7 +387,6 @@ int cmd_probe(int argc, const char **argv, const char *prefix __used)
ret = find_probepoint(fd, pp);
if (ret <= 0)
die("No probe point found.\n");
- pr_debug("probe event %s found\n", session.events[j]);
}
close(fd);

@@ -364,13 +401,17 @@ setup_probes:
for (j = 0; j < session.nr_probe; j++) {
pp = &session.probes[j];
if (pp->found == 1) {
- snprintf(buf, MAX_CMDLEN, "%s %s\n",
- session.events[j], pp->probes[0]);
+ snprintf(buf, MAX_CMDLEN, "%c:%s/%s_%x %s\n",
+ pp->retprobe ? 'r' : 'p', PERFPROBE_GROUP,
+ pp->function, pp->offset, pp->probes[0]);
write_new_event(fd, buf);
} else
for (i = 0; i < pp->found; i++) {
- snprintf(buf, MAX_CMDLEN, "%s%d %s\n",
- session.events[j], i, pp->probes[i]);
+ snprintf(buf, MAX_CMDLEN, "%c:%s/%s_%x_%d %s\n",
+ pp->retprobe ? 'r' : 'p',
+ PERFPROBE_GROUP,
+ pp->function, pp->offset, i,
+ pp->probes[i]);
write_new_event(fd, buf);
}
}
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index b98d35e..6d3bac9 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -483,10 +483,20 @@ static void show_probepoint(Dwarf_Die sp_die, Dwarf_Signed offs,
if (ret == DW_DLV_OK) {
ret = snprintf(tmp, MAX_PROBE_BUFFER, "%s+%u", name,
(unsigned int)offs);
+ /* Copy the function name if possible */
+ if (!pp->function) {
+ pp->function = strdup(name);
+ pp->offset = offs;
+ }
dwarf_dealloc(__dw_debug, name, DW_DLA_STRING);
} else {
/* This function has no name. */
ret = snprintf(tmp, MAX_PROBE_BUFFER, "0x%llx", pf->addr);
+ if (!pp->function) {
+ /* TODO: Use _stext */
+ pp->function = strdup("");
+ pp->offset = (int)pf->addr;
+ }
}
DIE_IF(ret < 0);
DIE_IF(ret >= MAX_PROBE_BUFFER);
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index d17fafc..240d6cb 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -22,6 +22,8 @@ struct probe_point {
int nr_args; /* Number of arguments */
char **args; /* Arguments */

+ int retprobe; /* Return probe */
+
/* Output */
int found; /* Number of found probe points */
char *probes[MAX_PROBES]; /* Output buffers (will be allocated)*/


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-27 20:43:58

by Masami Hiramatsu

Subject: [PATCH -tip perf/probes 10/10] perf/probes: Support function entry relative line number

Add support for function-entry-relative line numbers to perf-probe.
This allows users to define a probe by its line offset from the entry
line of a function.

e.g.

perf probe schedule:16
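
As a rough illustration of the rules this patch enforces, the sketch below mirrors the exclusion checks and the "declaration line + relative line" arithmetic. It is not the perf code itself: `struct probe_spec`, `check_probe_spec()` and `target_line()` are hypothetical names, and the declaration line is passed in by the caller instead of being read from DWARF.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for perf's struct probe_point. */
struct probe_spec {
	const char *function;	/* function name, or NULL */
	const char *file;	/* source path, or NULL */
	int line;		/* relative (with function) or absolute line */
	int offset;		/* byte offset from function entry */
	int retprobe;		/* non-zero for a %return probe */
};

/* Return NULL if the combination is allowed, else an error string. */
static const char *check_probe_spec(const struct probe_spec *ps)
{
	if (ps->line && ps->offset)
		return "Offset can't be used with line number.";
	if (!ps->line && ps->file && !ps->function)
		return "File always requires line number.";
	if (ps->offset && !ps->function)
		return "Offset requires an entry function.";
	if (ps->retprobe && !ps->function)
		return "Return probe requires an entry function.";
	if ((ps->offset || ps->line) && ps->retprobe)
		return "Offset/Line can't be used with return probe.";
	return NULL;
}

/*
 * Line to search for in the line table: a function-relative line is
 * added to the function's declaration line; without a function the
 * line is already absolute.
 */
static unsigned long target_line(const struct probe_spec *ps,
				 unsigned long decl_line)
{
	if (ps->function)
		return decl_line + (unsigned long)ps->line;
	return (unsigned long)ps->line;
}
```

For example, if schedule() were declared at line 100 (a made-up number), 'schedule:16' would resolve to line 116 of the file that declares it.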

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
---

tools/perf/builtin-probe.c | 14 ++++---
tools/perf/util/probe-finder.c | 79 +++++++++++++++++++++++++++++++---------
tools/perf/util/probe-finder.h | 2 +
3 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 92b4c49..a99a366 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -133,17 +133,16 @@ static void parse_probe_point(char *arg, struct probe_point *pp)
}

/* Exclusion check */
- if (pp->line && pp->function)
- semantic_error("Function-relative line number is not"
- " supported yet.");
+ if (pp->line && pp->offset)
+ semantic_error("Offset can't be used with line number.");
if (!pp->line && pp->file && !pp->function)
semantic_error("File always requires line number.");
if (pp->offset && !pp->function)
semantic_error("Offset requires an entry function.");
if (pp->retprobe && !pp->function)
semantic_error("Return probe requires an entry function.");
- if (pp->offset && pp->retprobe)
- semantic_error("Offset can't be used with return probe.");
+ if ((pp->offset || pp->line) && pp->retprobe)
+ semantic_error("Offset/Line can't be used with return probe.");

pr_debug("symbol:%s file:%s line:%d offset:%d, return:%d\n",
pp->function, pp->file, pp->line, pp->offset, pp->retprobe);
@@ -270,7 +269,7 @@ static const struct option options[] = {
#ifdef NO_LIBDWARF
"FUNC[+OFFS|%return] [ARG ...]",
#else
- "FUNC[+OFFS|%return][@SRC]|SRC:LINE [ARG ...]",
+ "FUNC[+OFFS|%return|:RLN][@SRC]|SRC:ALN [ARG ...]",
#endif
"probe point definition, where\n"
"\t\tGRP:\tGroup name (optional)\n"
@@ -282,7 +281,8 @@ static const struct option options[] = {
"\t\tARG:\tProbe argument (only \n"
#else
"\t\tSRC:\tSource code path\n"
- "\t\tLINE:\tLine number\n"
+ "\t\tRLN:\tRelative line number from function entry.\n"
+ "\t\tALN:\tAbsolute line number in file.\n"
"\t\tARG:\tProbe argument (local variable name or\n"
#endif
"\t\t\tkprobe-tracer argument format is supported.)\n",
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 6d3bac9..db96186 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -114,7 +114,7 @@ static int strtailcmp(const char *s1, const char *s2)
}

/* Find the fileno of the target file. */
-static Dwarf_Unsigned die_get_fileno(Dwarf_Die cu_die, const char *fname)
+static Dwarf_Unsigned cu_find_fileno(Dwarf_Die cu_die, const char *fname)
{
Dwarf_Signed cnt, i;
Dwarf_Unsigned found = 0;
@@ -335,6 +335,36 @@ static int attr_get_locdesc(Dwarf_Attribute attr, Dwarf_Locdesc *desc,
return ret;
}

+/* Get decl_file attribute value (file number) */
+static Dwarf_Unsigned die_get_decl_file(Dwarf_Die sp_die)
+{
+ Dwarf_Attribute attr;
+ Dwarf_Unsigned fno;
+ int ret;
+
+ ret = dwarf_attr(sp_die, DW_AT_decl_file, &attr, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ ret = dwarf_formudata(attr, &fno, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ dwarf_dealloc(__dw_debug, attr, DW_DLA_ATTR);
+ return fno;
+}
+
+/* Get decl_line attribute value (line number) */
+static Dwarf_Unsigned die_get_decl_line(Dwarf_Die sp_die)
+{
+ Dwarf_Attribute attr;
+ Dwarf_Unsigned lno;
+ int ret;
+
+ ret = dwarf_attr(sp_die, DW_AT_decl_line, &attr, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ ret = dwarf_formudata(attr, &lno, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ dwarf_dealloc(__dw_debug, attr, DW_DLA_ATTR);
+ return lno;
+}
+
/*
* Probe finder related functions
*/
@@ -501,6 +531,7 @@ static void show_probepoint(Dwarf_Die sp_die, Dwarf_Signed offs,
DIE_IF(ret < 0);
DIE_IF(ret >= MAX_PROBE_BUFFER);
len = ret;
+ pr_debug("Probe point found: %s\n", tmp);

/* Find each argument */
get_current_frame_base(sp_die, pf);
@@ -536,17 +567,16 @@ static int probeaddr_callback(struct die_link *dlink, void *data)
}

/* Find probe point from its line number */
-static void find_by_line(Dwarf_Die cu_die, struct probe_finder *pf)
+static void find_by_line(struct probe_finder *pf)
{
- struct probe_point *pp = pf->pp;
- Dwarf_Signed cnt, i;
+ Dwarf_Signed cnt, i, clm;
Dwarf_Line *lines;
Dwarf_Unsigned lineno = 0;
Dwarf_Addr addr;
Dwarf_Unsigned fno;
int ret;

- ret = dwarf_srclines(cu_die, &lines, &cnt, &__dw_error);
+ ret = dwarf_srclines(pf->cu_die, &lines, &cnt, &__dw_error);
DIE_IF(ret != DW_DLV_OK);

for (i = 0; i < cnt; i++) {
@@ -557,15 +587,20 @@ static void find_by_line(Dwarf_Die cu_die, struct probe_finder *pf)

ret = dwarf_lineno(lines[i], &lineno, &__dw_error);
DIE_IF(ret != DW_DLV_OK);
- if (lineno != (Dwarf_Unsigned)pp->line)
+ if (lineno != pf->lno)
continue;

+ ret = dwarf_lineoff(lines[i], &clm, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+
ret = dwarf_lineaddr(lines[i], &addr, &__dw_error);
DIE_IF(ret != DW_DLV_OK);
- pr_debug("Probe point found: 0x%llx\n", addr);
+ pr_debug("Probe line found: line[%d]:%u,%d addr:0x%llx\n",
+ (int)i, (unsigned)lineno, (int)clm, addr);
pf->addr = addr;
/* Search a real subprogram including this line, */
- ret = search_die_from_children(cu_die, probeaddr_callback, pf);
+ ret = search_die_from_children(pf->cu_die,
+ probeaddr_callback, pf);
if (ret == 0)
die("Probe point is not found in subprograms.\n");
/* Continuing, because target line might be inlined. */
@@ -587,6 +622,13 @@ static int probefunc_callback(struct die_link *dlink, void *data)
DIE_IF(ret == DW_DLV_ERROR);
if (tag == DW_TAG_subprogram) {
if (die_compare_name(dlink->die, pp->function) == 0) {
+ if (pp->line) { /* Function relative line */
+ pf->fno = die_get_decl_file(dlink->die);
+ pf->lno = die_get_decl_line(dlink->die)
+ + pp->line;
+ find_by_line(pf);
+ return 1;
+ }
if (die_inlined_subprogram(dlink->die)) {
/* Inlined function, save it. */
ret = dwarf_die_CU_offset(dlink->die,
@@ -631,9 +673,9 @@ found:
return 0;
}

-static void find_by_func(Dwarf_Die cu_die, struct probe_finder *pf)
+static void find_by_func(struct probe_finder *pf)
{
- search_die_from_children(cu_die, probefunc_callback, pf);
+ search_die_from_children(pf->cu_die, probefunc_callback, pf);
}

/* Find a probe point */
@@ -641,7 +683,6 @@ int find_probepoint(int fd, struct probe_point *pp)
{
Dwarf_Half addr_size = 0;
Dwarf_Unsigned next_cuh = 0;
- Dwarf_Die cu_die = 0;
int cu_number = 0, ret;
struct probe_finder pf = {.pp = pp};

@@ -659,25 +700,27 @@ int find_probepoint(int fd, struct probe_point *pp)
break;

/* Get the DIE(Debugging Information Entry) of this CU */
- ret = dwarf_siblingof(__dw_debug, 0, &cu_die, &__dw_error);
+ ret = dwarf_siblingof(__dw_debug, 0, &pf.cu_die, &__dw_error);
DIE_IF(ret != DW_DLV_OK);

/* Check if target file is included. */
if (pp->file)
- pf.fno = die_get_fileno(cu_die, pp->file);
+ pf.fno = cu_find_fileno(pf.cu_die, pp->file);

if (!pp->file || pf.fno) {
/* Save CU base address (for frame_base) */
- ret = dwarf_lowpc(cu_die, &pf.cu_base, &__dw_error);
+ ret = dwarf_lowpc(pf.cu_die, &pf.cu_base, &__dw_error);
DIE_IF(ret == DW_DLV_ERROR);
if (ret == DW_DLV_NO_ENTRY)
pf.cu_base = 0;
- if (pp->line)
- find_by_line(cu_die, &pf);
if (pp->function)
- find_by_func(cu_die, &pf);
+ find_by_func(&pf);
+ else {
+ pf.lno = pp->line;
+ find_by_line(&pf);
+ }
}
- dwarf_dealloc(__dw_debug, cu_die, DW_DLA_DIE);
+ dwarf_dealloc(__dw_debug, pf.cu_die, DW_DLA_DIE);
}
ret = dwarf_finish(__dw_debug, &__dw_error);
DIE_IF(ret != DW_DLV_OK);
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index 240d6cb..bdebca6 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -41,7 +41,9 @@ struct probe_finder {
/* For function searching */
Dwarf_Addr addr; /* Address */
Dwarf_Unsigned fno; /* File number */
+ Dwarf_Unsigned lno; /* Line number */
Dwarf_Off inl_offs; /* Inline offset */
+ Dwarf_Die cu_die; /* Current CU */

/* For variable searching */
Dwarf_Addr cu_base; /* Current CU base address */


--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-29 08:07:58

by Masami Hiramatsu

Subject: [tip:perf/probes] x86: Fix SSE opcode map bug

Commit-ID: 7f387d3f2421781610588faa2f49ae5f1737b137
Gitweb: http://git.kernel.org/tip/7f387d3f2421781610588faa2f49ae5f1737b137
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:04 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:45 +0100

x86: Fix SSE opcode map bug

Fix the position of superscripts in the SSE opcode map: some
of them were placed before the operands instead of after them.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204204.30545.97296.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 701c467..efef3ca 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -401,9 +401,9 @@ Referrer: 2-byte escape
62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66)
63: packsswb Pq,Qq | packsswb Vdq,Wdq (66)
64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66)
-65: pcmpgtw Pq,Qq | pcmpgtw(66) Vdq,Wdq
+65: pcmpgtw Pq,Qq | pcmpgtw Vdq,Wdq (66)
66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66)
-67: packuswb Pq,Qq | packuswb(66) Vdq,Wdq
+67: packuswb Pq,Qq | packuswb Vdq,Wdq (66)
68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66)
69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66)
6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66)
@@ -425,8 +425,8 @@ Referrer: 2-byte escape
79: VMWRITE Gd/q,Ed/q
7a:
7b:
-7c: haddps(F2) Vps,Wps | haddpd(66) Vpd,Wpd
-7d: hsubps(F2) Vps,Wps | hsubpd(66) Vpd,Wpd
+7c: haddps Vps,Wps (F2) | haddpd Vpd,Wpd (66)
+7d: hsubps Vps,Wps (F2) | hsubpd Vpd,Wpd (66)
7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66) | movq Vq,Wq (F3)
7f: movq Qq,Pq | movdqa Wdq,Vdq (66) | movdqu Wdq,Vdq (F3)
# 0x0f 0x80-0x8f
@@ -574,7 +574,7 @@ Referrer: 3-byte escape 1
01: phaddw Pq,Qq | phaddw Vdq,Wdq (66)
02: phaddd Pq,Qq | phaddd Vdq,Wdq (66)
03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66)
-04: pmaddubsw Pq,Qq | pmaddubsw (66)Vdq,Wdq
+04: pmaddubsw Pq,Qq | pmaddubsw Vdq,Wdq (66)
05: phsubw Pq,Qq | phsubw Vdq,Wdq (66)
06: phsubd Pq,Qq | phsubd Vdq,Wdq (66)
07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66)

2009-10-29 08:08:00

by Masami Hiramatsu

Subject: [tip:perf/probes] x86: Merge INAT_REXPFX into INAT_PFX_*

Commit-ID: 04d46c1b13b02e1e5c24eb270a01cf3f94ee4d04
Gitweb: http://git.kernel.org/tip/04d46c1b13b02e1e5c24eb270a01cf3f94ee4d04
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:11 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:45 +0100

x86: Merge INAT_REXPFX into INAT_PFX_*

Merge the INAT_REXPFX flag into the INAT_PFX_* prefix IDs and
rename it to INAT_PFX_REX.
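
A minimal sketch of what the merge means for the attribute checks: REX becomes one more prefix ID (12) instead of a separate flag bit, so "legacy prefix" has to be a range check. The ID values are taken from the patch; the mask layout (a 4-bit prefix field at bit 0) and the `sketch_` names are assumptions made for this example.

```c
#include <stdint.h>

typedef uint32_t sketch_attr_t;		/* stand-in for insn_attr_t */

/* Assumed layout: prefix ID stored in the low 4 bits of the attribute. */
#define SKETCH_PFX_MASK		0x0fu

/* Prefix IDs and limits as defined in the patch. */
#define INAT_PFX_OPNDSZ		1	/* 0x66 */
#define INAT_PFX_ADDRSZ		11	/* 0x67 */
#define INAT_PFX_REX		12	/* 0x4X */
#define INAT_LSTPFX_MAX		3
#define INAT_LGCPFX_MAX		11

/* Legacy prefixes are IDs 1..INAT_LGCPFX_MAX; 0 means "not a prefix". */
static int sketch_is_legacy_prefix(sketch_attr_t attr)
{
	attr &= SKETCH_PFX_MASK;
	return attr && attr <= INAT_LGCPFX_MAX;
}

/* REX is now identified by its prefix ID, not by a flag bit. */
static int sketch_is_rex_prefix(sketch_attr_t attr)
{
	return (attr & SKETCH_PFX_MASK) == INAT_PFX_REX;
}

/* Only 0x66/0xF2/0xF3 count as a "last prefix"; other IDs yield 0. */
static int sketch_last_prefix_id(sketch_attr_t attr)
{
	if ((attr & SKETCH_PFX_MASK) > INAT_LSTPFX_MAX)
		return 0;
	return (int)(attr & SKETCH_PFX_MASK);
}
```

With this layout the prefix loop in insn_get_prefixes() stops at a REX byte, because ID 12 is above INAT_LGCPFX_MAX.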

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204211.30545.58090.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/inat.h | 36 ++++++++++++++++++---------------
arch/x86/lib/insn.c | 2 +-
arch/x86/tools/gen-insn-attr-x86.awk | 6 ++--
3 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index 2866fdd..c2487d2 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -30,10 +30,11 @@
#define INAT_OPCODE_TABLE_SIZE 256
#define INAT_GROUP_TABLE_SIZE 8

-/* Legacy instruction prefixes */
+/* Legacy last prefixes */
#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */
#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */
#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */
+/* Other Legacy prefixes */
#define INAT_PFX_LOCK 4 /* 0xF0 */
#define INAT_PFX_CS 5 /* 0x2E */
#define INAT_PFX_DS 6 /* 0x3E */
@@ -42,8 +43,11 @@
#define INAT_PFX_GS 9 /* 0x65 */
#define INAT_PFX_SS 10 /* 0x36 */
#define INAT_PFX_ADDRSZ 11 /* 0x67 */
+/* x86-64 REX prefix */
+#define INAT_PFX_REX 12 /* 0x4X */

-#define INAT_LPREFIX_MAX 3
+#define INAT_LSTPFX_MAX 3
+#define INAT_LGCPFX_MAX 11

/* Immediate size */
#define INAT_IMM_BYTE 1
@@ -75,12 +79,11 @@
#define INAT_IMM_MASK (((1 << INAT_IMM_BITS) - 1) << INAT_IMM_OFFS)
/* Flags */
#define INAT_FLAG_OFFS (INAT_IMM_OFFS + INAT_IMM_BITS)
-#define INAT_REXPFX (1 << INAT_FLAG_OFFS)
-#define INAT_MODRM (1 << (INAT_FLAG_OFFS + 1))
-#define INAT_FORCE64 (1 << (INAT_FLAG_OFFS + 2))
-#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 3))
-#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 4))
-#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 5))
+#define INAT_MODRM (1 << (INAT_FLAG_OFFS))
+#define INAT_FORCE64 (1 << (INAT_FLAG_OFFS + 1))
+#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 2))
+#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 3))
+#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 4))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -97,9 +100,10 @@ extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
insn_attr_t esc_attr);

/* Attribute checking functions */
-static inline int inat_is_prefix(insn_attr_t attr)
+static inline int inat_is_legacy_prefix(insn_attr_t attr)
{
- return attr & INAT_PFX_MASK;
+ attr &= INAT_PFX_MASK;
+ return attr && attr <= INAT_LGCPFX_MAX;
}

static inline int inat_is_address_size_prefix(insn_attr_t attr)
@@ -112,9 +116,14 @@ static inline int inat_is_operand_size_prefix(insn_attr_t attr)
return (attr & INAT_PFX_MASK) == INAT_PFX_OPNDSZ;
}

+static inline int inat_is_rex_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
+}
+
static inline int inat_last_prefix_id(insn_attr_t attr)
{
- if ((attr & INAT_PFX_MASK) > INAT_LPREFIX_MAX)
+ if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
return 0;
else
return attr & INAT_PFX_MASK;
@@ -155,11 +164,6 @@ static inline int inat_immediate_size(insn_attr_t attr)
return (attr & INAT_IMM_MASK) >> INAT_IMM_OFFS;
}

-static inline int inat_is_rex_prefix(insn_attr_t attr)
-{
- return attr & INAT_REXPFX;
-}
-
static inline int inat_has_modrm(insn_attr_t attr)
{
return attr & INAT_MODRM;
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index dfd56a3..9f48317 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -69,7 +69,7 @@ void insn_get_prefixes(struct insn *insn)
lb = 0;
b = peek_next(insn_byte_t, insn);
attr = inat_get_opcode_attribute(b);
- while (inat_is_prefix(attr)) {
+ while (inat_is_legacy_prefix(attr)) {
/* Skip if same prefix */
for (i = 0; i < nb; i++)
if (prefixes->bytes[i] == b)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 19ba096..7d54929 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -278,7 +278,7 @@ function convert_operands(opnd, i,imm,mod)

# check REX prefix
if (match(opcode, rex_expr))
- flags = add_flags(flags, "INAT_REXPFX")
+ flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")

# check coprocessor escape : TODO
if (match(opcode, fpu_expr))
@@ -316,7 +316,7 @@ END {
# print escape opcode map's array
print "/* Escape opcode map array */"
print "const insn_attr_t const *inat_escape_tables[INAT_ESC_MAX + 1]" \
- "[INAT_LPREFIX_MAX + 1] = {"
+ "[INAT_LSTPFX_MAX + 1] = {"
for (i = 0; i < geid; i++)
for (j = 0; j < max_lprefix; j++)
if (etable[i,j])
@@ -325,7 +325,7 @@ END {
# print group opcode map's array
print "/* Group opcode map array */"
print "const insn_attr_t const *inat_group_tables[INAT_GRP_MAX + 1]"\
- "[INAT_LPREFIX_MAX + 1] = {"
+ "[INAT_LSTPFX_MAX + 1] = {"
for (i = 0; i < ggid; i++)
for (j = 0; j < max_lprefix; j++)
if (gtable[i,j])

2009-10-29 08:08:22

by Masami Hiramatsu

Subject: [tip:perf/probes] x86: Add pclmulq to x86 opcode map

Commit-ID: 82cb57028c864822c5a260f806d051e2ce28c86a
Gitweb: http://git.kernel.org/tip/82cb57028c864822c5a260f806d051e2ce28c86a
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:19 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:46 +0100

x86: Add pclmulq to x86 opcode map

Add pclmulq opcode to x86 opcode map.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204219.30545.82039.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index efef3ca..1f41246 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -672,6 +672,7 @@ Referrer: 3-byte escape 2
40: dpps Vdq,Wdq,Ib (66)
41: dppd Vdq,Wdq,Ib (66)
42: mpsadbw Vdq,Wdq,Ib (66)
+44: pclmulq Vdq,Wdq,Ib (66)
60: pcmpestrm Vdq,Wdq,Ib (66)
61: pcmpestri Vdq,Wdq,Ib (66)
62: pcmpistrm Vdq,Wdq,Ib (66)

2009-10-29 08:09:00

by Masami Hiramatsu

Subject: [tip:perf/probes] x86: AVX instruction set decoder support

Commit-ID: e0e492e99b372c6990a5daca9e4683c341f1330e
Gitweb: http://git.kernel.org/tip/e0e492e99b372c6990a5daca9e4683c341f1330e
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:27 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:46 +0100

x86: AVX instruction set decoder support

Add Intel AVX (Advanced Vector Extensions) instruction set
support to the x86 instruction decoder. This adds an
insn.vex_prefix field for storing VEX prefixes, and introduces
new opcode-map superscripts ((VEX), (oVEX), (o128), (o256)) for
expressing opcode attributes.
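
To make the VEX bit fields concrete, here is a small stand-alone sketch that extracts them from raw prefix bytes using the masks this patch adds to asm/insn.h. `struct vex_fields` and `decode_vex()` are hypothetical names used only for illustration; unlike insn_get_prefixes(), the sketch does not handle the 32-bit mode case where 0xC4/0xC5 may be LES/LDS.

```c
#include <stdint.h>

/* Bit masks as defined in the patch. */
#define X86_VEX_W(vex)	((vex) & 0x80)		/* VEX3 Byte2 */
#define X86_VEX_L(vex)	((vex) & 0x04)		/* VEX3 Byte2, VEX2 Byte1 */
#define X86_VEX3_M(vex)	((vex) & 0x1f)		/* VEX3 Byte1 */
#define X86_VEX2_M	1			/* VEX2.M always 1 */
#define X86_VEX_V(vex)	(((vex) & 0x78) >> 3)	/* VEX3 Byte2, VEX2 Byte1 */
#define X86_VEX_P(vex)	((vex) & 0x03)		/* VEX3 Byte2, VEX2 Byte1 */

/* Hypothetical container for the decoded fields. */
struct vex_fields {
	uint8_t m;	/* opcode map selector */
	uint8_t p;	/* implied last prefix (pp bits) */
	uint8_t v;	/* raw vvvv bits as stored in the prefix */
	int l;		/* vector length bit */
	int w;		/* W bit (3-byte form only) */
};

/* Return the prefix length (2 or 3), or 0 if b[0] is not a VEX byte. */
static int decode_vex(const uint8_t *b, struct vex_fields *out)
{
	if (b[0] == 0xC5) {		/* 2-byte VEX */
		out->m = X86_VEX2_M;
		out->p = X86_VEX_P(b[1]);
		out->v = X86_VEX_V(b[1]);
		out->l = X86_VEX_L(b[1]) != 0;
		out->w = 0;
		return 2;
	}
	if (b[0] == 0xC4) {		/* 3-byte VEX */
		out->m = X86_VEX3_M(b[1]);
		out->p = X86_VEX_P(b[2]);
		out->v = X86_VEX_V(b[2]);
		out->l = X86_VEX_L(b[2]) != 0;
		out->w = X86_VEX_W(b[2]) != 0;
		return 3;
	}
	return 0;
}
```

The m and p values correspond to what insn_vex_m_bits() and insn_vex_p_bits() hand to inat_get_avx_attribute() to select an opcode table.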

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204226.30545.23451.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/inat.h | 32 +++-
arch/x86/include/asm/insn.h | 43 ++++-
arch/x86/lib/inat.c | 12 +
arch/x86/lib/insn.c | 52 ++++
arch/x86/lib/x86-opcode-map.txt | 431 ++++++++++++++++++----------------
arch/x86/tools/gen-insn-attr-x86.awk | 94 ++++++--
6 files changed, 431 insertions(+), 233 deletions(-)

diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h
index c2487d2..205b063 100644
--- a/arch/x86/include/asm/inat.h
+++ b/arch/x86/include/asm/inat.h
@@ -32,8 +32,8 @@

/* Legacy last prefixes */
#define INAT_PFX_OPNDSZ 1 /* 0x66 */ /* LPFX1 */
-#define INAT_PFX_REPNE 2 /* 0xF2 */ /* LPFX2 */
-#define INAT_PFX_REPE 3 /* 0xF3 */ /* LPFX3 */
+#define INAT_PFX_REPE 2 /* 0xF3 */ /* LPFX2 */
+#define INAT_PFX_REPNE 3 /* 0xF2 */ /* LPFX3 */
/* Other Legacy prefixes */
#define INAT_PFX_LOCK 4 /* 0xF0 */
#define INAT_PFX_CS 5 /* 0x2E */
@@ -45,6 +45,9 @@
#define INAT_PFX_ADDRSZ 11 /* 0x67 */
/* x86-64 REX prefix */
#define INAT_PFX_REX 12 /* 0x4X */
+/* AVX VEX prefixes */
+#define INAT_PFX_VEX2 13 /* 2-bytes VEX prefix */
+#define INAT_PFX_VEX3 14 /* 3-bytes VEX prefix */

#define INAT_LSTPFX_MAX 3
#define INAT_LGCPFX_MAX 11
@@ -84,6 +87,8 @@
#define INAT_SCNDIMM (1 << (INAT_FLAG_OFFS + 2))
#define INAT_MOFFSET (1 << (INAT_FLAG_OFFS + 3))
#define INAT_VARIANT (1 << (INAT_FLAG_OFFS + 4))
+#define INAT_VEXOK (1 << (INAT_FLAG_OFFS + 5))
+#define INAT_VEXONLY (1 << (INAT_FLAG_OFFS + 6))
/* Attribute making macros for attribute tables */
#define INAT_MAKE_PREFIX(pfx) (pfx << INAT_PFX_OFFS)
#define INAT_MAKE_ESCAPE(esc) (esc << INAT_ESC_OFFS)
@@ -98,6 +103,9 @@ extern insn_attr_t inat_get_escape_attribute(insn_byte_t opcode,
extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
insn_byte_t last_pfx,
insn_attr_t esc_attr);
+extern insn_attr_t inat_get_avx_attribute(insn_byte_t opcode,
+ insn_byte_t vex_m,
+ insn_byte_t vex_pp);

/* Attribute checking functions */
static inline int inat_is_legacy_prefix(insn_attr_t attr)
@@ -129,6 +137,17 @@ static inline int inat_last_prefix_id(insn_attr_t attr)
return attr & INAT_PFX_MASK;
}

+static inline int inat_is_vex_prefix(insn_attr_t attr)
+{
+ attr &= INAT_PFX_MASK;
+ return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
+}
+
+static inline int inat_is_vex3_prefix(insn_attr_t attr)
+{
+ return (attr & INAT_PFX_MASK) == INAT_PFX_VEX3;
+}
+
static inline int inat_is_escape(insn_attr_t attr)
{
return attr & INAT_ESC_MASK;
@@ -189,4 +208,13 @@ static inline int inat_has_variant(insn_attr_t attr)
return attr & INAT_VARIANT;
}

+static inline int inat_accept_vex(insn_attr_t attr)
+{
+ return attr & INAT_VEXOK;
+}
+
+static inline int inat_must_vex(insn_attr_t attr)
+{
+ return attr & INAT_VEXONLY;
+}
#endif
diff --git a/arch/x86/include/asm/insn.h b/arch/x86/include/asm/insn.h
index 12b4e37..96c2e0a 100644
--- a/arch/x86/include/asm/insn.h
+++ b/arch/x86/include/asm/insn.h
@@ -39,6 +39,7 @@ struct insn {
* prefixes.bytes[3]: last prefix
*/
struct insn_field rex_prefix; /* REX prefix */
+ struct insn_field vex_prefix; /* VEX prefix */
struct insn_field opcode; /*
* opcode.bytes[0]: opcode1
* opcode.bytes[1]: opcode2
@@ -80,6 +81,19 @@ struct insn {
#define X86_REX_X(rex) ((rex) & 2)
#define X86_REX_B(rex) ((rex) & 1)

+/* VEX bit flags */
+#define X86_VEX_W(vex) ((vex) & 0x80) /* VEX3 Byte2 */
+#define X86_VEX_R(vex) ((vex) & 0x80) /* VEX2/3 Byte1 */
+#define X86_VEX_X(vex) ((vex) & 0x40) /* VEX3 Byte1 */
+#define X86_VEX_B(vex) ((vex) & 0x20) /* VEX3 Byte1 */
+#define X86_VEX_L(vex) ((vex) & 0x04) /* VEX3 Byte2, VEX2 Byte1 */
+/* VEX bit fields */
+#define X86_VEX3_M(vex) ((vex) & 0x1f) /* VEX3 Byte1 */
+#define X86_VEX2_M 1 /* VEX2.M always 1 */
+#define X86_VEX_V(vex) (((vex) & 0x78) >> 3) /* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_P(vex) ((vex) & 0x03) /* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_M_MAX 0x1f /* VEX3.M Maximum value */
+
/* The last prefix is needed for two-byte and three-byte opcodes */
static inline insn_byte_t insn_last_prefix(struct insn *insn)
{
@@ -114,15 +128,42 @@ static inline void kernel_insn_init(struct insn *insn, const void *kaddr)
#endif
}

+static inline int insn_is_avx(struct insn *insn)
+{
+ if (!insn->prefixes.got)
+ insn_get_prefixes(insn);
+ return (insn->vex_prefix.value != 0);
+}
+
+static inline insn_byte_t insn_vex_m_bits(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */
+ return X86_VEX2_M;
+ else
+ return X86_VEX3_M(insn->vex_prefix.bytes[1]);
+}
+
+static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
+{
+ if (insn->vex_prefix.nbytes == 2) /* 2 bytes VEX */
+ return X86_VEX_P(insn->vex_prefix.bytes[1]);
+ else
+ return X86_VEX_P(insn->vex_prefix.bytes[2]);
+}
+
/* Offset of each field from kaddr */
static inline int insn_offset_rex_prefix(struct insn *insn)
{
return insn->prefixes.nbytes;
}
-static inline int insn_offset_opcode(struct insn *insn)
+static inline int insn_offset_vex_prefix(struct insn *insn)
{
return insn_offset_rex_prefix(insn) + insn->rex_prefix.nbytes;
}
+static inline int insn_offset_opcode(struct insn *insn)
+{
+ return insn_offset_vex_prefix(insn) + insn->vex_prefix.nbytes;
+}
static inline int insn_offset_modrm(struct insn *insn)
{
return insn_offset_opcode(insn) + insn->opcode.nbytes;
diff --git a/arch/x86/lib/inat.c b/arch/x86/lib/inat.c
index 3fb5998..46fc4ee 100644
--- a/arch/x86/lib/inat.c
+++ b/arch/x86/lib/inat.c
@@ -76,3 +76,15 @@ insn_attr_t inat_get_group_attribute(insn_byte_t modrm, insn_byte_t last_pfx,
inat_group_common_attribute(grp_attr);
}

+insn_attr_t inat_get_avx_attribute(insn_byte_t opcode, insn_byte_t vex_m,
+ insn_byte_t vex_p)
+{
+ const insn_attr_t *table;
+ if (vex_m > X86_VEX_M_MAX || vex_p > INAT_LSTPFX_MAX)
+ return 0;
+ table = inat_avx_tables[vex_m][vex_p];
+ if (!table)
+ return 0;
+ return table[opcode];
+}
+
diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
index 9f48317..9f33b98 100644
--- a/arch/x86/lib/insn.c
+++ b/arch/x86/lib/insn.c
@@ -28,6 +28,9 @@
#define peek_next(t, insn) \
({t r; r = *(t*)insn->next_byte; r; })

+#define peek_nbyte_next(t, insn, n) \
+ ({t r; r = *(t*)((insn)->next_byte + n); r; })
+
/**
* insn_init() - initialize struct insn
* @insn: &struct insn to be initialized
@@ -107,6 +110,7 @@ found:
insn->prefixes.bytes[3] = lb;
}

+ /* Decode REX prefix */
if (insn->x86_64) {
b = peek_next(insn_byte_t, insn);
attr = inat_get_opcode_attribute(b);
@@ -120,6 +124,39 @@ found:
}
}
insn->rex_prefix.got = 1;
+
+ /* Decode VEX prefix */
+ b = peek_next(insn_byte_t, insn);
+ attr = inat_get_opcode_attribute(b);
+ if (inat_is_vex_prefix(attr)) {
+ insn_byte_t b2 = peek_nbyte_next(insn_byte_t, insn, 1);
+ if (!insn->x86_64) {
+ /*
+ * In 32-bits mode, if the [7:6] bits (mod bits of
+ * ModRM) on the second byte are not 11b, it is
+ * LDS or LES.
+ */
+ if (X86_MODRM_MOD(b2) != 3)
+ goto vex_end;
+ }
+ insn->vex_prefix.bytes[0] = b;
+ insn->vex_prefix.bytes[1] = b2;
+ if (inat_is_vex3_prefix(attr)) {
+ b2 = peek_nbyte_next(insn_byte_t, insn, 2);
+ insn->vex_prefix.bytes[2] = b2;
+ insn->vex_prefix.nbytes = 3;
+ insn->next_byte += 3;
+ if (insn->x86_64 && X86_VEX_W(b2))
+ /* VEX.W overrides opnd_size */
+ insn->opnd_bytes = 8;
+ } else {
+ insn->vex_prefix.nbytes = 2;
+ insn->next_byte += 2;
+ }
+ }
+vex_end:
+ insn->vex_prefix.got = 1;
+
prefixes->got = 1;
return;
}
@@ -147,6 +184,18 @@ void insn_get_opcode(struct insn *insn)
op = get_next(insn_byte_t, insn);
opcode->bytes[0] = op;
opcode->nbytes = 1;
+
+ /* Check if there is VEX prefix or not */
+ if (insn_is_avx(insn)) {
+ insn_byte_t m, p;
+ m = insn_vex_m_bits(insn);
+ p = insn_vex_p_bits(insn);
+ insn->attr = inat_get_avx_attribute(op, m, p);
+ if (!inat_accept_vex(insn->attr))
+ insn->attr = 0; /* This instruction is bad */
+ goto end; /* VEX has only 1 byte for opcode */
+ }
+
insn->attr = inat_get_opcode_attribute(op);
while (inat_is_escape(insn->attr)) {
/* Get escaped opcode */
@@ -155,6 +204,9 @@ void insn_get_opcode(struct insn *insn)
pfx = insn_last_prefix(insn);
insn->attr = inat_get_escape_attribute(op, pfx, insn->attr);
}
+ if (inat_must_vex(insn->attr))
+ insn->attr = 0; /* This instruction is bad */
+end:
opcode->got = 1;
}

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 1f41246..9887bfe 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -3,6 +3,7 @@
#<Opcode maps>
# Table: table-name
# Referrer: escaped-name
+# AVXcode: avx-code
# opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
# (or)
# opcode: escape # escaped-name
@@ -13,9 +14,16 @@
# reg: mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
# EndTable
#
+# AVX Superscripts
+# (VEX): this opcode can accept VEX prefix.
+# (oVEX): this opcode requires VEX prefix.
+# (o128): this opcode only supports 128bit VEX.
+# (o256): this opcode only supports 256bit VEX.
+#

Table: one byte opcode
Referrer:
+AVXcode:
# 0x00 - 0x0f
00: ADD Eb,Gb
01: ADD Ev,Gv
@@ -225,8 +233,8 @@ c0: Grp2 Eb,Ib (1A)
c1: Grp2 Ev,Ib (1A)
c2: RETN Iw (f64)
c3: RETN
-c4: LES Gz,Mp (i64)
-c5: LDS Gz,Mp (i64)
+c4: LES Gz,Mp (i64) | 3bytes-VEX (Prefix)
+c5: LDS Gz,Mp (i64) | 2bytes-VEX (Prefix)
c6: Grp11 Eb,Ib (1A)
c7: Grp11 Ev,Iz (1A)
c8: ENTER Iw,Ib
@@ -290,8 +298,9 @@ fe: Grp4 (1A)
ff: Grp5 (1A)
EndTable

-Table: 2-byte opcode # First Byte is 0x0f
+Table: 2-byte opcode (0x0f)
Referrer: 2-byte escape
+AVXcode: 1
# 0x0f 0x00-0x0f
00: Grp6 (1A)
01: Grp7 (1A)
@@ -311,14 +320,14 @@ Referrer: 2-byte escape
# 3DNow! uses the last imm byte as opcode extension.
0f: 3DNow! Pq,Qq,Ib
# 0x0f 0x10-0x1f
-10: movups Vps,Wps | movss Vss,Wss (F3) | movupd Vpd,Wpd (66) | movsd Vsd,Wsd (F2)
-11: movups Wps,Vps | movss Wss,Vss (F3) | movupd Wpd,Vpd (66) | movsd Wsd,Vsd (F2)
-12: movlps Vq,Mq | movlpd Vq,Mq (66) | movhlps Vq,Uq | movddup Vq,Wq (F2) | movsldup Vq,Wq (F3)
-13: mpvlps Mq,Vq | movlpd Mq,Vq (66)
-14: unpcklps Vps,Wq | unpcklpd Vpd,Wq (66)
-15: unpckhps Vps,Wq | unpckhpd Vpd,Wq (66)
-16: movhps Vq,Mq | movhpd Vq,Mq (66) | movlsps Vq,Uq | movshdup Vq,Wq (F3)
-17: movhps Mq,Vq | movhpd Mq,Vq (66)
+10: movups Vps,Wps (VEX) | movss Vss,Wss (F3),(VEX),(o128) | movupd Vpd,Wpd (66),(VEX) | movsd Vsd,Wsd (F2),(VEX),(o128)
+11: movups Wps,Vps (VEX) | movss Wss,Vss (F3),(VEX),(o128) | movupd Wpd,Vpd (66),(VEX) | movsd Wsd,Vsd (F2),(VEX),(o128)
+12: movlps Vq,Mq (VEX),(o128) | movlpd Vq,Mq (66),(VEX),(o128) | movhlps Vq,Uq (VEX),(o128) | movddup Vq,Wq (F2),(VEX) | movsldup Vq,Wq (F3),(VEX)
+13: movlps Mq,Vq (VEX),(o128) | movlpd Mq,Vq (66),(VEX),(o128)
+14: unpcklps Vps,Wq (VEX) | unpcklpd Vpd,Wq (66),(VEX)
+15: unpckhps Vps,Wq (VEX) | unpckhpd Vpd,Wq (66),(VEX)
+16: movhps Vq,Mq (VEX),(o128) | movhpd Vq,Mq (66),(VEX),(o128) | movlhps Vq,Uq (VEX),(o128) | movshdup Vq,Wq (F3),(VEX)
+17: movhps Mq,Vq (VEX),(o128) | movhpd Mq,Vq (66),(VEX),(o128)
18: Grp16 (1A)
19:
1a:
@@ -336,14 +345,14 @@ Referrer: 2-byte escape
25:
26:
27:
-28: movaps Vps,Wps | movapd Vpd,Wpd (66)
-29: movaps Wps,Vps | movapd Wpd,Vpd (66)
-2a: cvtpi2ps Vps,Qpi | cvtsi2ss Vss,Ed/q (F3) | cvtpi2pd Vpd,Qpi (66) | cvtsi2sd Vsd,Ed/q (F2)
-2b: movntps Mps,Vps | movntpd Mpd,Vpd (66)
-2c: cvttps2pi Ppi,Wps | cvttss2si Gd/q,Wss (F3) | cvttpd2pi Ppi,Wpd (66) | cvttsd2si Gd/q,Wsd (F2)
-2d: cvtps2pi Ppi,Wps | cvtss2si Gd/q,Wss (F3) | cvtpd2pi Qpi,Wpd (66) | cvtsd2si Gd/q,Wsd (F2)
-2e: ucomiss Vss,Wss | ucomisd Vsd,Wsd (66)
-2f: comiss Vss,Wss | comisd Vsd,Wsd (66)
+28: movaps Vps,Wps (VEX) | movapd Vpd,Wpd (66),(VEX)
+29: movaps Wps,Vps (VEX) | movapd Wpd,Vpd (66),(VEX)
+2a: cvtpi2ps Vps,Qpi | cvtsi2ss Vss,Ed/q (F3),(VEX),(o128) | cvtpi2pd Vpd,Qpi (66) | cvtsi2sd Vsd,Ed/q (F2),(VEX),(o128)
+2b: movntps Mps,Vps (VEX) | movntpd Mpd,Vpd (66),(VEX)
+2c: cvttps2pi Ppi,Wps | cvttss2si Gd/q,Wss (F3),(VEX),(o128) | cvttpd2pi Ppi,Wpd (66) | cvttsd2si Gd/q,Wsd (F2),(VEX),(o128)
+2d: cvtps2pi Ppi,Wps | cvtss2si Gd/q,Wss (F3),(VEX),(o128) | cvtpd2pi Qpi,Wpd (66) | cvtsd2si Gd/q,Wsd (F2),(VEX),(o128)
+2e: ucomiss Vss,Wss (VEX),(o128) | ucomisd Vsd,Wsd (66),(VEX),(o128)
+2f: comiss Vss,Wss (VEX),(o128) | comisd Vsd,Wsd (66),(VEX),(o128)
# 0x0f 0x30-0x3f
30: WRMSR
31: RDTSC
@@ -379,56 +388,56 @@ Referrer: 2-byte escape
4e: CMOVLE/NG Gv,Ev
4f: CMOVNLE/G Gv,Ev
# 0x0f 0x50-0x5f
-50: movmskps Gd/q,Ups | movmskpd Gd/q,Upd (66)
-51: sqrtps Vps,Wps | sqrtss Vss,Wss (F3) | sqrtpd Vpd,Wpd (66) | sqrtsd Vsd,Wsd (F2)
-52: rsqrtps Vps,Wps | rsqrtss Vss,Wss (F3)
-53: rcpps Vps,Wps | rcpss Vss,Wss (F3)
-54: andps Vps,Wps | andpd Vpd,Wpd (66)
-55: andnps Vps,Wps | andnpd Vpd,Wpd (66)
-56: orps Vps,Wps | orpd Vpd,Wpd (66)
-57: xorps Vps,Wps | xorpd Vpd,Wpd (66)
-58: addps Vps,Wps | addss Vss,Wss (F3) | addpd Vpd,Wpd (66) | addsd Vsd,Wsd (F2)
-59: mulps Vps,Wps | mulss Vss,Wss (F3) | mulpd Vpd,Wpd (66) | mulsd Vsd,Wsd (F2)
-5a: cvtps2pd Vpd,Wps | cvtss2sd Vsd,Wss (F3) | cvtpd2ps Vps,Wpd (66) | cvtsd2ss Vsd,Wsd (F2)
-5b: cvtdq2ps Vps,Wdq | cvtps2dq Vdq,Wps (66) | cvttps2dq Vdq,Wps (F3)
-5c: subps Vps,Wps | subss Vss,Wss (F3) | subpd Vpd,Wpd (66) | subsd Vsd,Wsd (F2)
-5d: minps Vps,Wps | minss Vss,Wss (F3) | minpd Vpd,Wpd (66) | minsd Vsd,Wsd (F2)
-5e: divps Vps,Wps | divss Vss,Wss (F3) | divpd Vpd,Wpd (66) | divsd Vsd,Wsd (F2)
-5f: maxps Vps,Wps | maxss Vss,Wss (F3) | maxpd Vpd,Wpd (66) | maxsd Vsd,Wsd (F2)
+50: movmskps Gd/q,Ups (VEX) | movmskpd Gd/q,Upd (66),(VEX)
+51: sqrtps Vps,Wps (VEX) | sqrtss Vss,Wss (F3),(VEX),(o128) | sqrtpd Vpd,Wpd (66),(VEX) | sqrtsd Vsd,Wsd (F2),(VEX),(o128)
+52: rsqrtps Vps,Wps (VEX) | rsqrtss Vss,Wss (F3),(VEX),(o128)
+53: rcpps Vps,Wps (VEX) | rcpss Vss,Wss (F3),(VEX),(o128)
+54: andps Vps,Wps (VEX) | andpd Vpd,Wpd (66),(VEX)
+55: andnps Vps,Wps (VEX) | andnpd Vpd,Wpd (66),(VEX)
+56: orps Vps,Wps (VEX) | orpd Vpd,Wpd (66),(VEX)
+57: xorps Vps,Wps (VEX) | xorpd Vpd,Wpd (66),(VEX)
+58: addps Vps,Wps (VEX) | addss Vss,Wss (F3),(VEX),(o128) | addpd Vpd,Wpd (66),(VEX) | addsd Vsd,Wsd (F2),(VEX),(o128)
+59: mulps Vps,Wps (VEX) | mulss Vss,Wss (F3),(VEX),(o128) | mulpd Vpd,Wpd (66),(VEX) | mulsd Vsd,Wsd (F2),(VEX),(o128)
+5a: cvtps2pd Vpd,Wps (VEX) | cvtss2sd Vsd,Wss (F3),(VEX),(o128) | cvtpd2ps Vps,Wpd (66),(VEX) | cvtsd2ss Vsd,Wsd (F2),(VEX),(o128)
+5b: cvtdq2ps Vps,Wdq (VEX) | cvtps2dq Vdq,Wps (66),(VEX) | cvttps2dq Vdq,Wps (F3),(VEX)
+5c: subps Vps,Wps (VEX) | subss Vss,Wss (F3),(VEX),(o128) | subpd Vpd,Wpd (66),(VEX) | subsd Vsd,Wsd (F2),(VEX),(o128)
+5d: minps Vps,Wps (VEX) | minss Vss,Wss (F3),(VEX),(o128) | minpd Vpd,Wpd (66),(VEX) | minsd Vsd,Wsd (F2),(VEX),(o128)
+5e: divps Vps,Wps (VEX) | divss Vss,Wss (F3),(VEX),(o128) | divpd Vpd,Wpd (66),(VEX) | divsd Vsd,Wsd (F2),(VEX),(o128)
+5f: maxps Vps,Wps (VEX) | maxss Vss,Wss (F3),(VEX),(o128) | maxpd Vpd,Wpd (66),(VEX) | maxsd Vsd,Wsd (F2),(VEX),(o128)
# 0x0f 0x60-0x6f
-60: punpcklbw Pq,Qd | punpcklbw Vdq,Wdq (66)
-61: punpcklwd Pq,Qd | punpcklwd Vdq,Wdq (66)
-62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66)
-63: packsswb Pq,Qq | packsswb Vdq,Wdq (66)
-64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66)
-65: pcmpgtw Pq,Qq | pcmpgtw Vdq,Wdq (66)
-66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66)
-67: packuswb Pq,Qq | packuswb Vdq,Wdq (66)
-68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66)
-69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66)
-6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66)
-6b: packssdw Pq,Qd | packssdw Vdq,Wdq (66)
-6c: punpcklqdq Vdq,Wdq (66)
-6d: punpckhqdq Vdq,Wdq (66)
-6e: movd/q/ Pd,Ed/q | movd/q Vdq,Ed/q (66)
-6f: movq Pq,Qq | movdqa Vdq,Wdq (66) | movdqu Vdq,Wdq (F3)
+60: punpcklbw Pq,Qd | punpcklbw Vdq,Wdq (66),(VEX),(o128)
+61: punpcklwd Pq,Qd | punpcklwd Vdq,Wdq (66),(VEX),(o128)
+62: punpckldq Pq,Qd | punpckldq Vdq,Wdq (66),(VEX),(o128)
+63: packsswb Pq,Qq | packsswb Vdq,Wdq (66),(VEX),(o128)
+64: pcmpgtb Pq,Qq | pcmpgtb Vdq,Wdq (66),(VEX),(o128)
+65: pcmpgtw Pq,Qq | pcmpgtw Vdq,Wdq (66),(VEX),(o128)
+66: pcmpgtd Pq,Qq | pcmpgtd Vdq,Wdq (66),(VEX),(o128)
+67: packuswb Pq,Qq | packuswb Vdq,Wdq (66),(VEX),(o128)
+68: punpckhbw Pq,Qd | punpckhbw Vdq,Wdq (66),(VEX),(o128)
+69: punpckhwd Pq,Qd | punpckhwd Vdq,Wdq (66),(VEX),(o128)
+6a: punpckhdq Pq,Qd | punpckhdq Vdq,Wdq (66),(VEX),(o128)
+6b: packssdw Pq,Qd | packssdw Vdq,Wdq (66),(VEX),(o128)
+6c: punpcklqdq Vdq,Wdq (66),(VEX),(o128)
+6d: punpckhqdq Vdq,Wdq (66),(VEX),(o128)
+6e: movd/q/ Pd,Ed/q | movd/q Vdq,Ed/q (66),(VEX),(o128)
+6f: movq Pq,Qq | movdqa Vdq,Wdq (66),(VEX) | movdqu Vdq,Wdq (F3),(VEX)
# 0x0f 0x70-0x7f
-70: pshufw Pq,Qq,Ib | pshufd Vdq,Wdq,Ib (66) | pshufhw Vdq,Wdq,Ib (F3) | pshuflw VdqWdq,Ib (F2)
+70: pshufw Pq,Qq,Ib | pshufd Vdq,Wdq,Ib (66),(VEX),(o128) | pshufhw Vdq,Wdq,Ib (F3),(VEX),(o128) | pshuflw Vdq,Wdq,Ib (F2),(VEX),(o128)
71: Grp12 (1A)
72: Grp13 (1A)
73: Grp14 (1A)
-74: pcmpeqb Pq,Qq | pcmpeqb Vdq,Wdq (66)
-75: pcmpeqw Pq,Qq | pcmpeqw Vdq,Wdq (66)
-76: pcmpeqd Pq,Qq | pcmpeqd Vdq,Wdq (66)
-77: emms
+74: pcmpeqb Pq,Qq | pcmpeqb Vdq,Wdq (66),(VEX),(o128)
+75: pcmpeqw Pq,Qq | pcmpeqw Vdq,Wdq (66),(VEX),(o128)
+76: pcmpeqd Pq,Qq | pcmpeqd Vdq,Wdq (66),(VEX),(o128)
+77: emms/vzeroupper/vzeroall (VEX)
78: VMREAD Ed/q,Gd/q
79: VMWRITE Gd/q,Ed/q
7a:
7b:
-7c: haddps Vps,Wps (F2) | haddpd Vpd,Wpd (66)
-7d: hsubps Vps,Wps (F2) | hsubpd Vpd,Wpd (66)
-7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66) | movq Vq,Wq (F3)
-7f: movq Qq,Pq | movdqa Wdq,Vdq (66) | movdqu Wdq,Vdq (F3)
+7c: haddps Vps,Wps (F2),(VEX) | haddpd Vpd,Wpd (66),(VEX)
+7d: hsubps Vps,Wps (F2),(VEX) | hsubpd Vpd,Wpd (66),(VEX)
+7e: movd/q Ed/q,Pd | movd/q Ed/q,Vdq (66),(VEX),(o128) | movq Vq,Wq (F3),(VEX),(o128)
+7f: movq Qq,Pq | movdqa Wdq,Vdq (66),(VEX) | movdqu Wdq,Vdq (F3),(VEX)
# 0x0f 0x80-0x8f
80: JO Jz (f64)
81: JNO Jz (f64)
@@ -500,11 +509,11 @@ bf: MOVSX Gv,Ew
# 0x0f 0xc0-0xcf
c0: XADD Eb,Gb
c1: XADD Ev,Gv
-c2: cmpps Vps,Wps,Ib | cmpss Vss,Wss,Ib (F3) | cmppd Vpd,Wpd,Ib (66) | cmpsd Vsd,Wsd,Ib (F2)
+c2: cmpps Vps,Wps,Ib (VEX) | cmpss Vss,Wss,Ib (F3),(VEX),(o128) | cmppd Vpd,Wpd,Ib (66),(VEX) | cmpsd Vsd,Wsd,Ib (F2),(VEX)
c3: movnti Md/q,Gd/q
-c4: pinsrw Pq,Rd/q/Mw,Ib | pinsrw Vdq,Rd/q/Mw,Ib (66)
-c5: pextrw Gd,Nq,Ib | pextrw Gd,Udq,Ib (66)
-c6: shufps Vps,Wps,Ib | shufpd Vpd,Wpd,Ib (66)
+c4: pinsrw Pq,Rd/q/Mw,Ib | pinsrw Vdq,Rd/q/Mw,Ib (66),(VEX),(o128)
+c5: pextrw Gd,Nq,Ib | pextrw Gd,Udq,Ib (66),(VEX),(o128)
+c6: shufps Vps,Wps,Ib (VEX) | shufpd Vpd,Wpd,Ib (66),(VEX)
c7: Grp9 (1A)
c8: BSWAP RAX/EAX/R8/R8D
c9: BSWAP RCX/ECX/R9/R9D
@@ -515,77 +524,78 @@ cd: BSWAP RBP/EBP/R13/R13D
ce: BSWAP RSI/ESI/R14/R14D
cf: BSWAP RDI/EDI/R15/R15D
# 0x0f 0xd0-0xdf
-d0: addsubps Vps,Wps (F2) | addsubpd Vpd,Wpd (66)
-d1: psrlw Pq,Qq | psrlw Vdq,Wdq (66)
-d2: psrld Pq,Qq | psrld Vdq,Wdq (66)
-d3: psrlq Pq,Qq | psrlq Vdq,Wdq (66)
-d4: paddq Pq,Qq | paddq Vdq,Wdq (66)
-d5: pmullw Pq,Qq | pmullw Vdq,Wdq (66)
-d6: movq Wq,Vq (66) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
-d7: pmovmskb Gd,Nq | pmovmskb Gd,Udq (66)
-d8: psubusb Pq,Qq | psubusb Vdq,Wdq (66)
-d9: psubusw Pq,Qq | psubusw Vdq,Wdq (66)
-da: pminub Pq,Qq | pminub Vdq,Wdq (66)
-db: pand Pq,Qq | pand Vdq,Wdq (66)
-dc: paddusb Pq,Qq | paddusb Vdq,Wdq (66)
-dd: paddusw Pq,Qq | paddusw Vdq,Wdq (66)
-de: pmaxub Pq,Qq | pmaxub Vdq,Wdq (66)
-df: pandn Pq,Qq | pandn Vdq,Wdq (66)
+d0: addsubps Vps,Wps (F2),(VEX) | addsubpd Vpd,Wpd (66),(VEX)
+d1: psrlw Pq,Qq | psrlw Vdq,Wdq (66),(VEX),(o128)
+d2: psrld Pq,Qq | psrld Vdq,Wdq (66),(VEX),(o128)
+d3: psrlq Pq,Qq | psrlq Vdq,Wdq (66),(VEX),(o128)
+d4: paddq Pq,Qq | paddq Vdq,Wdq (66),(VEX),(o128)
+d5: pmullw Pq,Qq | pmullw Vdq,Wdq (66),(VEX),(o128)
+d6: movq Wq,Vq (66),(VEX),(o128) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
+d7: pmovmskb Gd,Nq | pmovmskb Gd,Udq (66),(VEX),(o128)
+d8: psubusb Pq,Qq | psubusb Vdq,Wdq (66),(VEX),(o128)
+d9: psubusw Pq,Qq | psubusw Vdq,Wdq (66),(VEX),(o128)
+da: pminub Pq,Qq | pminub Vdq,Wdq (66),(VEX),(o128)
+db: pand Pq,Qq | pand Vdq,Wdq (66),(VEX),(o128)
+dc: paddusb Pq,Qq | paddusb Vdq,Wdq (66),(VEX),(o128)
+dd: paddusw Pq,Qq | paddusw Vdq,Wdq (66),(VEX),(o128)
+de: pmaxub Pq,Qq | pmaxub Vdq,Wdq (66),(VEX),(o128)
+df: pandn Pq,Qq | pandn Vdq,Wdq (66),(VEX),(o128)
# 0x0f 0xe0-0xef
-e0: pavgb Pq,Qq | pavgb Vdq,Wdq (66)
-e1: psraw Pq,Qq | psraw Vdq,Wdq (66)
-e2: psrad Pq,Qq | psrad Vdq,Wdq (66)
-e3: pavgw Pq,Qq | pavgw Vdq,Wdq (66)
-e4: pmulhuw Pq,Qq | pmulhuw Vdq,Wdq (66)
-e5: pmulhw Pq,Qq | pmulhw Vdq,Wdq (66)
-e6: cvtpd2dq Vdq,Wpd (F2) | cvttpd2dq Vdq,Wpd (66) | cvtdq2pd Vpd,Wdq (F3)
-e7: movntq Mq,Pq | movntdq Mdq,Vdq (66)
-e8: psubsb Pq,Qq | psubsb Vdq,Wdq (66)
-e9: psubsw Pq,Qq | psubsw Vdq,Wdq (66)
-ea: pminsw Pq,Qq | pminsw Vdq,Wdq (66)
-eb: por Pq,Qq | por Vdq,Wdq (66)
-ec: paddsb Pq,Qq | paddsb Vdq,Wdq (66)
-ed: paddsw Pq,Qq | paddsw Vdq,Wdq (66)
-ee: pmaxsw Pq,Qq | pmaxsw Vdq,Wdq (66)
-ef: pxor Pq,Qq | pxor Vdq,Wdq (66)
+e0: pavgb Pq,Qq | pavgb Vdq,Wdq (66),(VEX),(o128)
+e1: psraw Pq,Qq | psraw Vdq,Wdq (66),(VEX),(o128)
+e2: psrad Pq,Qq | psrad Vdq,Wdq (66),(VEX),(o128)
+e3: pavgw Pq,Qq | pavgw Vdq,Wdq (66),(VEX),(o128)
+e4: pmulhuw Pq,Qq | pmulhuw Vdq,Wdq (66),(VEX),(o128)
+e5: pmulhw Pq,Qq | pmulhw Vdq,Wdq (66),(VEX),(o128)
+e6: cvtpd2dq Vdq,Wpd (F2),(VEX) | cvttpd2dq Vdq,Wpd (66),(VEX) | cvtdq2pd Vpd,Wdq (F3),(VEX)
+e7: movntq Mq,Pq | movntdq Mdq,Vdq (66),(VEX)
+e8: psubsb Pq,Qq | psubsb Vdq,Wdq (66),(VEX),(o128)
+e9: psubsw Pq,Qq | psubsw Vdq,Wdq (66),(VEX),(o128)
+ea: pminsw Pq,Qq | pminsw Vdq,Wdq (66),(VEX),(o128)
+eb: por Pq,Qq | por Vdq,Wdq (66),(VEX),(o128)
+ec: paddsb Pq,Qq | paddsb Vdq,Wdq (66),(VEX),(o128)
+ed: paddsw Pq,Qq | paddsw Vdq,Wdq (66),(VEX),(o128)
+ee: pmaxsw Pq,Qq | pmaxsw Vdq,Wdq (66),(VEX),(o128)
+ef: pxor Pq,Qq | pxor Vdq,Wdq (66),(VEX),(o128)
# 0x0f 0xf0-0xff
-f0: lddqu Vdq,Mdq (F2)
-f1: psllw Pq,Qq | psllw Vdq,Wdq (66)
-f2: pslld Pq,Qq | pslld Vdq,Wdq (66)
-f3: psllq Pq,Qq | psllq Vdq,Wdq (66)
-f4: pmuludq Pq,Qq | pmuludq Vdq,Wdq (66)
-f5: pmaddwd Pq,Qq | pmaddwd Vdq,Wdq (66)
-f6: psadbw Pq,Qq | psadbw Vdq,Wdq (66)
-f7: maskmovq Pq,Nq | maskmovdqu Vdq,Udq (66)
-f8: psubb Pq,Qq | psubb Vdq,Wdq (66)
-f9: psubw Pq,Qq | psubw Vdq,Wdq (66)
-fa: psubd Pq,Qq | psubd Vdq,Wdq (66)
-fb: psubq Pq,Qq | psubq Vdq,Wdq (66)
-fc: paddb Pq,Qq | paddb Vdq,Wdq (66)
-fd: paddw Pq,Qq | paddw Vdq,Wdq (66)
-fe: paddd Pq,Qq | paddd Vdq,Wdq (66)
+f0: lddqu Vdq,Mdq (F2),(VEX)
+f1: psllw Pq,Qq | psllw Vdq,Wdq (66),(VEX),(o128)
+f2: pslld Pq,Qq | pslld Vdq,Wdq (66),(VEX),(o128)
+f3: psllq Pq,Qq | psllq Vdq,Wdq (66),(VEX),(o128)
+f4: pmuludq Pq,Qq | pmuludq Vdq,Wdq (66),(VEX),(o128)
+f5: pmaddwd Pq,Qq | pmaddwd Vdq,Wdq (66),(VEX),(o128)
+f6: psadbw Pq,Qq | psadbw Vdq,Wdq (66),(VEX),(o128)
+f7: maskmovq Pq,Nq | maskmovdqu Vdq,Udq (66),(VEX),(o128)
+f8: psubb Pq,Qq | psubb Vdq,Wdq (66),(VEX),(o128)
+f9: psubw Pq,Qq | psubw Vdq,Wdq (66),(VEX),(o128)
+fa: psubd Pq,Qq | psubd Vdq,Wdq (66),(VEX),(o128)
+fb: psubq Pq,Qq | psubq Vdq,Wdq (66),(VEX),(o128)
+fc: paddb Pq,Qq | paddb Vdq,Wdq (66),(VEX),(o128)
+fd: paddw Pq,Qq | paddw Vdq,Wdq (66),(VEX),(o128)
+fe: paddd Pq,Qq | paddd Vdq,Wdq (66),(VEX),(o128)
ff:
EndTable

Table: 3-byte opcode 1 (0x0f 0x38)
Referrer: 3-byte escape 1
+AVXcode: 2
# 0x0f 0x38 0x00-0x0f
-00: pshufb Pq,Qq | pshufb Vdq,Wdq (66)
-01: phaddw Pq,Qq | phaddw Vdq,Wdq (66)
-02: phaddd Pq,Qq | phaddd Vdq,Wdq (66)
-03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66)
-04: pmaddubsw Pq,Qq | pmaddubsw Vdq,Wdq (66)
-05: phsubw Pq,Qq | phsubw Vdq,Wdq (66)
-06: phsubd Pq,Qq | phsubd Vdq,Wdq (66)
-07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66)
-08: psignb Pq,Qq | psignb Vdq,Wdq (66)
-09: psignw Pq,Qq | psignw Vdq,Wdq (66)
-0a: psignd Pq,Qq | psignd Vdq,Wdq (66)
-0b: pmulhrsw Pq,Qq | pmulhrsw Vdq,Wdq (66)
-0c:
-0d:
-0e:
-0f:
+00: pshufb Pq,Qq | pshufb Vdq,Wdq (66),(VEX),(o128)
+01: phaddw Pq,Qq | phaddw Vdq,Wdq (66),(VEX),(o128)
+02: phaddd Pq,Qq | phaddd Vdq,Wdq (66),(VEX),(o128)
+03: phaddsw Pq,Qq | phaddsw Vdq,Wdq (66),(VEX),(o128)
+04: pmaddubsw Pq,Qq | pmaddubsw Vdq,Wdq (66),(VEX),(o128)
+05: phsubw Pq,Qq | phsubw Vdq,Wdq (66),(VEX),(o128)
+06: phsubd Pq,Qq | phsubd Vdq,Wdq (66),(VEX),(o128)
+07: phsubsw Pq,Qq | phsubsw Vdq,Wdq (66),(VEX),(o128)
+08: psignb Pq,Qq | psignb Vdq,Wdq (66),(VEX),(o128)
+09: psignw Pq,Qq | psignw Vdq,Wdq (66),(VEX),(o128)
+0a: psignd Pq,Qq | psignd Vdq,Wdq (66),(VEX),(o128)
+0b: pmulhrsw Pq,Qq | pmulhrsw Vdq,Wdq (66),(VEX),(o128)
+0c: vpermilps /r (66),(oVEX)
+0d: vpermilpd /r (66),(oVEX)
+0e: vtestps /r (66),(oVEX)
+0f: vtestpd /r (66),(oVEX)
# 0x0f 0x38 0x10-0x1f
10: pblendvb Vdq,Wdq (66)
11:
@@ -594,90 +604,99 @@ Referrer: 3-byte escape 1
14: blendvps Vdq,Wdq (66)
15: blendvpd Vdq,Wdq (66)
16:
-17: ptest Vdq,Wdq (66)
-18:
-19:
-1a:
+17: ptest Vdq,Wdq (66),(VEX)
+18: vbroadcastss /r (66),(oVEX)
+19: vbroadcastsd /r (66),(oVEX),(o256)
+1a: vbroadcastf128 /r (66),(oVEX),(o256)
1b:
-1c: pabsb Pq,Qq | pabsb Vdq,Wdq (66)
-1d: pabsw Pq,Qq | pabsw Vdq,Wdq (66)
-1e: pabsd Pq,Qq | pabsd Vdq,Wdq (66)
+1c: pabsb Pq,Qq | pabsb Vdq,Wdq (66),(VEX),(o128)
+1d: pabsw Pq,Qq | pabsw Vdq,Wdq (66),(VEX),(o128)
+1e: pabsd Pq,Qq | pabsd Vdq,Wdq (66),(VEX),(o128)
1f:
# 0x0f 0x38 0x20-0x2f
-20: pmovsxbw Vdq,Udq/Mq (66)
-21: pmovsxbd Vdq,Udq/Md (66)
-22: pmovsxbq Vdq,Udq/Mw (66)
-23: pmovsxwd Vdq,Udq/Mq (66)
-24: pmovsxwq Vdq,Udq/Md (66)
-25: pmovsxdq Vdq,Udq/Mq (66)
+20: pmovsxbw Vdq,Udq/Mq (66),(VEX),(o128)
+21: pmovsxbd Vdq,Udq/Md (66),(VEX),(o128)
+22: pmovsxbq Vdq,Udq/Mw (66),(VEX),(o128)
+23: pmovsxwd Vdq,Udq/Mq (66),(VEX),(o128)
+24: pmovsxwq Vdq,Udq/Md (66),(VEX),(o128)
+25: pmovsxdq Vdq,Udq/Mq (66),(VEX),(o128)
26:
27:
-28: pmuldq Vdq,Wdq (66)
-29: pcmpeqq Vdq,Wdq (66)
-2a: movntdqa Vdq,Mdq (66)
-2b: packusdw Vdq,Wdq (66)
-2c:
-2d:
-2e:
-2f:
+28: pmuldq Vdq,Wdq (66),(VEX),(o128)
+29: pcmpeqq Vdq,Wdq (66),(VEX),(o128)
+2a: movntdqa Vdq,Mdq (66),(VEX),(o128)
+2b: packusdw Vdq,Wdq (66),(VEX),(o128)
+2c: vmaskmovps(ld) /r (66),(oVEX)
+2d: vmaskmovpd(ld) /r (66),(oVEX)
+2e: vmaskmovps(st) /r (66),(oVEX)
+2f: vmaskmovpd(st) /r (66),(oVEX)
# 0x0f 0x38 0x30-0x3f
-30: pmovzxbw Vdq,Udq/Mq (66)
-31: pmovzxbd Vdq,Udq/Md (66)
-32: pmovzxbq Vdq,Udq/Mw (66)
-33: pmovzxwd Vdq,Udq/Mq (66)
-34: pmovzxwq Vdq,Udq/Md (66)
-35: pmovzxdq Vdq,Udq/Mq (66)
+30: pmovzxbw Vdq,Udq/Mq (66),(VEX),(o128)
+31: pmovzxbd Vdq,Udq/Md (66),(VEX),(o128)
+32: pmovzxbq Vdq,Udq/Mw (66),(VEX),(o128)
+33: pmovzxwd Vdq,Udq/Mq (66),(VEX),(o128)
+34: pmovzxwq Vdq,Udq/Md (66),(VEX),(o128)
+35: pmovzxdq Vdq,Udq/Mq (66),(VEX),(o128)
36:
-37: pcmpgtq Vdq,Wdq (66)
-38: pminsb Vdq,Wdq (66)
-39: pminsd Vdq,Wdq (66)
-3a: pminuw Vdq,Wdq (66)
-3b: pminud Vdq,Wdq (66)
-3c: pmaxsb Vdq,Wdq (66)
-3d: pmaxsd Vdq,Wdq (66)
-3e: pmaxuw Vdq,Wdq (66)
-3f: pmaxud Vdq,Wdq (66)
+37: pcmpgtq Vdq,Wdq (66),(VEX),(o128)
+38: pminsb Vdq,Wdq (66),(VEX),(o128)
+39: pminsd Vdq,Wdq (66),(VEX),(o128)
+3a: pminuw Vdq,Wdq (66),(VEX),(o128)
+3b: pminud Vdq,Wdq (66),(VEX),(o128)
+3c: pmaxsb Vdq,Wdq (66),(VEX),(o128)
+3d: pmaxsd Vdq,Wdq (66),(VEX),(o128)
+3e: pmaxuw Vdq,Wdq (66),(VEX),(o128)
+3f: pmaxud Vdq,Wdq (66),(VEX),(o128)
# 0x0f 0x38 0x4f-0xff
-40: pmulld Vdq,Wdq (66)
-41: phminposuw Vdq,Wdq (66)
+40: pmulld Vdq,Wdq (66),(VEX),(o128)
+41: phminposuw Vdq,Wdq (66),(VEX),(o128)
80: INVEPT Gd/q,Mdq (66)
81: INVPID Gd/q,Mdq (66)
-db: aesimc Vdq,Wdq (66)
-dc: aesenc Vdq,Wdq (66)
-dd: aesenclast Vdq,Wdq (66)
-de: aesdec Vdq,Wdq (66)
-df: aesdeclast Vdq,Wdq (66)
+db: aesimc Vdq,Wdq (66),(VEX),(o128)
+dc: aesenc Vdq,Wdq (66),(VEX),(o128)
+dd: aesenclast Vdq,Wdq (66),(VEX),(o128)
+de: aesdec Vdq,Wdq (66),(VEX),(o128)
+df: aesdeclast Vdq,Wdq (66),(VEX),(o128)
f0: MOVBE Gv,Mv | CRC32 Gd,Eb (F2)
f1: MOVBE Mv,Gv | CRC32 Gd,Ev (F2)
EndTable

Table: 3-byte opcode 2 (0x0f 0x3a)
Referrer: 3-byte escape 2
+AVXcode: 3
# 0x0f 0x3a 0x00-0xff
-08: roundps Vdq,Wdq,Ib (66)
-09: roundpd Vdq,Wdq,Ib (66)
-0a: roundss Vss,Wss,Ib (66)
-0b: roundsd Vsd,Wsd,Ib (66)
-0c: blendps Vdq,Wdq,Ib (66)
-0d: blendpd Vdq,Wdq,Ib (66)
-0e: pblendw Vdq,Wdq,Ib (66)
-0f: palignr Pq,Qq,Ib | palignr Vdq,Wdq,Ib (66)
-14: pextrb Rd/Mb,Vdq,Ib (66)
-15: pextrw Rd/Mw,Vdq,Ib (66)
-16: pextrd/pextrq Ed/q,Vdq,Ib (66)
-17: extractps Ed,Vdq,Ib (66)
-20: pinsrb Vdq,Rd/q/Mb,Ib (66)
-21: insertps Vdq,Udq/Md,Ib (66)
-22: pinsrd/pinsrq Vdq,Ed/q,Ib (66)
-40: dpps Vdq,Wdq,Ib (66)
-41: dppd Vdq,Wdq,Ib (66)
-42: mpsadbw Vdq,Wdq,Ib (66)
-44: pclmulq Vdq,Wdq,Ib (66)
-60: pcmpestrm Vdq,Wdq,Ib (66)
-61: pcmpestri Vdq,Wdq,Ib (66)
-62: pcmpistrm Vdq,Wdq,Ib (66)
-63: pcmpistri Vdq,Wdq,Ib (66)
-df: aeskeygenassist Vdq,Wdq,Ib (66)
+04: vpermilps /r,Ib (66),(oVEX)
+05: vpermilpd /r,Ib (66),(oVEX)
+06: vperm2f128 /r,Ib (66),(oVEX),(o256)
+08: roundps Vdq,Wdq,Ib (66),(VEX)
+09: roundpd Vdq,Wdq,Ib (66),(VEX)
+0a: roundss Vss,Wss,Ib (66),(VEX),(o128)
+0b: roundsd Vsd,Wsd,Ib (66),(VEX),(o128)
+0c: blendps Vdq,Wdq,Ib (66),(VEX)
+0d: blendpd Vdq,Wdq,Ib (66),(VEX)
+0e: pblendw Vdq,Wdq,Ib (66),(VEX),(o128)
+0f: palignr Pq,Qq,Ib | palignr Vdq,Wdq,Ib (66),(VEX),(o128)
+14: pextrb Rd/Mb,Vdq,Ib (66),(VEX),(o128)
+15: pextrw Rd/Mw,Vdq,Ib (66),(VEX),(o128)
+16: pextrd/pextrq Ed/q,Vdq,Ib (66),(VEX),(o128)
+17: extractps Ed,Vdq,Ib (66),(VEX),(o128)
+18: vinsertf128 /r,Ib (66),(oVEX),(o256)
+19: vextractf128 /r,Ib (66),(oVEX),(o256)
+20: pinsrb Vdq,Rd/q/Mb,Ib (66),(VEX),(o128)
+21: insertps Vdq,Udq/Md,Ib (66),(VEX),(o128)
+22: pinsrd/pinsrq Vdq,Ed/q,Ib (66),(VEX),(o128)
+40: dpps Vdq,Wdq,Ib (66),(VEX)
+41: dppd Vdq,Wdq,Ib (66),(VEX),(o128)
+42: mpsadbw Vdq,Wdq,Ib (66),(VEX),(o128)
+44: pclmulq Vdq,Wdq,Ib (66),(VEX),(o128)
+4a: vblendvps /r,Ib (66),(oVEX)
+4b: vblendvpd /r,Ib (66),(oVEX)
+4c: vpblendvb /r,Ib (66),(oVEX),(o128)
+60: pcmpestrm Vdq,Wdq,Ib (66),(VEX),(o128)
+61: pcmpestri Vdq,Wdq,Ib (66),(VEX),(o128)
+62: pcmpistrm Vdq,Wdq,Ib (66),(VEX),(o128)
+63: pcmpistri Vdq,Wdq,Ib (66),(VEX),(o128)
+df: aeskeygenassist Vdq,Wdq,Ib (66),(VEX),(o128)
EndTable

GrpTable: Grp1
@@ -785,29 +804,29 @@ GrpTable: Grp11
EndTable

GrpTable: Grp12
-2: psrlw Nq,Ib (11B) | psrlw Udq,Ib (66),(11B)
-4: psraw Nq,Ib (11B) | psraw Udq,Ib (66),(11B)
-6: psllw Nq,Ib (11B) | psllw Udq,Ib (66),(11B)
+2: psrlw Nq,Ib (11B) | psrlw Udq,Ib (66),(11B),(VEX),(o128)
+4: psraw Nq,Ib (11B) | psraw Udq,Ib (66),(11B),(VEX),(o128)
+6: psllw Nq,Ib (11B) | psllw Udq,Ib (66),(11B),(VEX),(o128)
EndTable

GrpTable: Grp13
-2: psrld Nq,Ib (11B) | psrld Udq,Ib (66),(11B)
-4: psrad Nq,Ib (11B) | psrad Udq,Ib (66),(11B)
-6: pslld Nq,Ib (11B) | pslld Udq,Ib (66),(11B)
+2: psrld Nq,Ib (11B) | psrld Udq,Ib (66),(11B),(VEX),(o128)
+4: psrad Nq,Ib (11B) | psrad Udq,Ib (66),(11B),(VEX),(o128)
+6: pslld Nq,Ib (11B) | pslld Udq,Ib (66),(11B),(VEX),(o128)
EndTable

GrpTable: Grp14
-2: psrlq Nq,Ib (11B) | psrlq Udq,Ib (66),(11B)
-3: psrldq Udq,Ib (66),(11B)
-6: psllq Nq,Ib (11B) | psllq Udq,Ib (66),(11B)
-7: pslldq Udq,Ib (66),(11B)
+2: psrlq Nq,Ib (11B) | psrlq Udq,Ib (66),(11B),(VEX),(o128)
+3: psrldq Udq,Ib (66),(11B),(VEX),(o128)
+6: psllq Nq,Ib (11B) | psllq Udq,Ib (66),(11B),(VEX),(o128)
+7: pslldq Udq,Ib (66),(11B),(VEX),(o128)
EndTable

GrpTable: Grp15
0: fxsave
1: fxstor
-2: ldmxcsr
-3: stmxcsr
+2: ldmxcsr (VEX)
+3: stmxcsr (VEX)
4: XSAVE
5: XRSTOR | lfence (11B)
6: mfence (11B)
diff --git a/arch/x86/tools/gen-insn-attr-x86.awk b/arch/x86/tools/gen-insn-attr-x86.awk
index 7d54929..e34e92a 100644
--- a/arch/x86/tools/gen-insn-attr-x86.awk
+++ b/arch/x86/tools/gen-insn-attr-x86.awk
@@ -13,6 +13,18 @@ function check_awk_implement() {
return ""
}

+# Clear working vars
+function clear_vars() {
+ delete table
+ delete lptable2
+ delete lptable1
+ delete lptable3
+ eid = -1 # escape id
+ gid = -1 # group id
+ aid = -1 # AVX id
+ tname = ""
+}
+
BEGIN {
# Implementation error checking
awkchecked = check_awk_implement()
@@ -24,11 +36,15 @@ BEGIN {

# Setup generating tables
print "/* x86 opcode map generated from x86-opcode-map.txt */"
- print "/* Do not change this code. */"
+ print "/* Do not change this code. */\n"
ggid = 1
geid = 1
+ gaid = 0
+ delete etable
+ delete gtable
+ delete atable

- opnd_expr = "^[[:alpha:]]"
+ opnd_expr = "^[[:alpha:]/]"
ext_expr = "^\\("
sep_expr = "^\\|$"
group_expr = "^Grp[[:alnum:]]+"
@@ -46,19 +62,19 @@ BEGIN {
imm_flag["Ob"] = "INAT_MOFFSET"
imm_flag["Ov"] = "INAT_MOFFSET"

- modrm_expr = "^([CDEGMNPQRSUVW][[:lower:]]+|NTA|T[012])"
+ modrm_expr = "^([CDEGMNPQRSUVW/][[:lower:]]+|NTA|T[012])"
force64_expr = "\\([df]64\\)"
rex_expr = "^REX(\\.[XRWB]+)*"
fpu_expr = "^ESC" # TODO

lprefix1_expr = "\\(66\\)"
- delete lptable1
- lprefix2_expr = "\\(F2\\)"
- delete lptable2
- lprefix3_expr = "\\(F3\\)"
- delete lptable3
+ lprefix2_expr = "\\(F3\\)"
+ lprefix3_expr = "\\(F2\\)"
max_lprefix = 4

+ vexok_expr = "\\(VEX\\)"
+ vexonly_expr = "\\(oVEX\\)"
+
prefix_expr = "\\(Prefix\\)"
prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
prefix_num["REPNE"] = "INAT_PFX_REPNE"
@@ -71,12 +87,10 @@ BEGIN {
prefix_num["SEG=GS"] = "INAT_PFX_GS"
prefix_num["SEG=SS"] = "INAT_PFX_SS"
prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
+ prefix_num["2bytes-VEX"] = "INAT_PFX_VEX2"
+ prefix_num["3bytes-VEX"] = "INAT_PFX_VEX3"

- delete table
- delete etable
- delete gtable
- eid = -1
- gid = -1
+ clear_vars()
}

function semantic_error(msg) {
@@ -97,14 +111,12 @@ function array_size(arr, i,c) {

/^Table:/ {
print "/* " $0 " */"
+ if (tname != "")
+ semantic_error("Hit Table: before EndTable:.");
}

/^Referrer:/ {
- if (NF == 1) {
- # primary opcode table
- tname = "inat_primary_table"
- eid = -1
- } else {
+ if (NF != 1) {
# escape opcode table
ref = ""
for (i = 2; i <= NF; i++)
@@ -114,6 +126,19 @@ function array_size(arr, i,c) {
}
}

+/^AVXcode:/ {
+ if (NF != 1) {
+ # AVX/escape opcode table
+ aid = $2
+ if (gaid <= aid)
+ gaid = aid + 1
+ if (tname == "") # AVX only opcode table
+ tname = sprintf("inat_avx_table_%d", $2)
+ }
+ if (aid == -1 && eid == -1) # primary opcode table
+ tname = "inat_primary_table"
+}
+
/^GrpTable:/ {
print "/* " $0 " */"
if (!($2 in group))
@@ -162,30 +187,33 @@ function print_table(tbl,name,fmt,n)
print_table(table, tname "[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,0] = tname
+ if (aid >= 0)
+ atable[aid,0] = tname
}
if (array_size(lptable1) != 0) {
print_table(lptable1,tname "_1[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,1] = tname "_1"
+ if (aid >= 0)
+ atable[aid,1] = tname "_1"
}
if (array_size(lptable2) != 0) {
print_table(lptable2,tname "_2[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,2] = tname "_2"
+ if (aid >= 0)
+ atable[aid,2] = tname "_2"
}
if (array_size(lptable3) != 0) {
print_table(lptable3,tname "_3[INAT_OPCODE_TABLE_SIZE]",
"0x%02x", 256)
etable[eid,3] = tname "_3"
+ if (aid >= 0)
+ atable[aid,3] = tname "_3"
}
}
print ""
- delete table
- delete lptable1
- delete lptable2
- delete lptable3
- gid = -1
- eid = -1
+ clear_vars()
}

function add_flags(old,new) {
@@ -284,6 +312,14 @@ function convert_operands(opnd, i,imm,mod)
if (match(opcode, fpu_expr))
flags = add_flags(flags, "INAT_MODRM")

+ # check VEX only code
+ if (match(ext, vexonly_expr))
+ flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
+
+ # check code which can accept VEX prefix
+ if (match(ext, vexok_expr))
+ flags = add_flags(flags, "INAT_VEXOK")
+
# check prefixes
if (match(ext, prefix_expr)) {
if (!prefix_num[opcode])
@@ -330,5 +366,15 @@ END {
for (j = 0; j < max_lprefix; j++)
if (gtable[i,j])
print " ["i"]["j"] = "gtable[i,j]","
+ print "};\n"
+ # print AVX opcode map's array
+ print "/* AVX opcode map array */"
+ print "const insn_attr_t const *inat_avx_tables[X86_VEX_M_MAX + 1]"\
+ "[INAT_LSTPFX_MAX + 1] = {"
+ for (i = 0; i < gaid; i++)
+ for (j = 0; j < max_lprefix; j++)
+ if (atable[i,j])
+ print " ["i"]["j"] = "atable[i,j]","
print "};"
}
+
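
For readers following the VEX prefix hunk in insn.c above, here is a minimal standalone sketch of the same classification. It is not the kernel code: `struct vex_info` and `decode_vex()` are hypothetical names, and the sketch assumes the caller guarantees at least three readable bytes. It applies the 32-bit LDS/LES check from the patch and extracts the map-select (m), implied-prefix (p) and W fields, which are the values insn_get_opcode() passes to inat_get_avx_attribute().

```c
#include <stdint.h>

/* Hypothetical result type; the patch stores these in struct insn instead. */
struct vex_info {
	int nbytes;	/* 0 = not a VEX prefix, otherwise 2 or 3 */
	uint8_t m;	/* opcode map select (1 = 0x0f, 2 = 0x0f 0x38, 3 = 0x0f 0x3a) */
	uint8_t p;	/* implied legacy prefix (0 = none, 1 = 66, 2 = F3, 3 = F2) */
	uint8_t w;	/* VEX.W, only present in the 3-byte form */
};

/* Assumption: buf points to at least 3 readable bytes. */
static struct vex_info decode_vex(const uint8_t *buf, int is_64bit)
{
	struct vex_info v = { 0, 0, 0, 0 };
	uint8_t b = buf[0], b2 = buf[1];

	if (b != 0xc4 && b != 0xc5)
		return v;
	/* In 32-bit mode, mod bits != 11b mean LES (c4) or LDS (c5). */
	if (!is_64bit && (b2 >> 6) != 3)
		return v;

	if (b == 0xc4) {
		uint8_t b3 = buf[2];

		v.nbytes = 3;
		v.m = b2 & 0x1f;	/* m-mmmm field */
		v.p = b3 & 0x03;	/* pp field */
		v.w = (b3 >> 7) & 1;	/* W bit */
	} else {
		v.nbytes = 2;
		v.m = 1;		/* 2-byte VEX implies the 0x0f map */
		v.p = b2 & 0x03;
	}
	return v;
}
```

With the bytes c5 f8 77 (vzeroupper) this reports a 2-byte prefix with m = 1 and p = 0. With c4 e2 79 18 (the start of a vbroadcastss encoding) it reports m = 2 and p = 1, which selects the 66-prefixed variant of the 0x0f 0x38 table. The p values follow the VEX.pp encoding, which appears to be why the awk script now maps lprefix2 to F3 and lprefix3 to F2.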

2009-10-29 08:09:04

by Masami Hiramatsu

Subject: [tip:perf/probes] x86: Add Intel FMA instructions to x86 opcode map

Commit-ID: 3f7e454af1dd8b9cea410d9380d3f71477e94f2b
Gitweb: http://git.kernel.org/tip/3f7e454af1dd8b9cea410d9380d3f71477e94f2b
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:35 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:47 +0100

x86: Add Intel FMA instructions to x86 opcode map

Add Intel FMA (fused multiply-add) instructions to the x86 opcode map
for the x86 instruction decoder.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204235.30545.33997.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/lib/x86-opcode-map.txt | 34 +++++++++++++++++++++++++++++++++-
1 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt
index 9887bfe..a793da5 100644
--- a/arch/x86/lib/x86-opcode-map.txt
+++ b/arch/x86/lib/x86-opcode-map.txt
@@ -647,11 +647,43 @@ AVXcode: 2
3d: pmaxsd Vdq,Wdq (66),(VEX),(o128)
3e: pmaxuw Vdq,Wdq (66),(VEX),(o128)
3f: pmaxud Vdq,Wdq (66),(VEX),(o128)
-# 0x0f 0x38 0x4f-0xff
+# 0x0f 0x38 0x40-0x8f
40: pmulld Vdq,Wdq (66),(VEX),(o128)
41: phminposuw Vdq,Wdq (66),(VEX),(o128)
80: INVEPT Gd/q,Mdq (66)
81: INVPID Gd/q,Mdq (66)
+# 0x0f 0x38 0x90-0xbf (FMA)
+96: vfmaddsub132pd/ps /r (66),(VEX)
+97: vfmsubadd132pd/ps /r (66),(VEX)
+98: vfmadd132pd/ps /r (66),(VEX)
+99: vfmadd132sd/ss /r (66),(VEX),(o128)
+9a: vfmsub132pd/ps /r (66),(VEX)
+9b: vfmsub132sd/ss /r (66),(VEX),(o128)
+9c: vfnmadd132pd/ps /r (66),(VEX)
+9d: vfnmadd132sd/ss /r (66),(VEX),(o128)
+9e: vfnmsub132pd/ps /r (66),(VEX)
+9f: vfnmsub132sd/ss /r (66),(VEX),(o128)
+a6: vfmaddsub213pd/ps /r (66),(VEX)
+a7: vfmsubadd213pd/ps /r (66),(VEX)
+a8: vfmadd213pd/ps /r (66),(VEX)
+a9: vfmadd213sd/ss /r (66),(VEX),(o128)
+aa: vfmsub213pd/ps /r (66),(VEX)
+ab: vfmsub213sd/ss /r (66),(VEX),(o128)
+ac: vfnmadd213pd/ps /r (66),(VEX)
+ad: vfnmadd213sd/ss /r (66),(VEX),(o128)
+ae: vfnmsub213pd/ps /r (66),(VEX)
+af: vfnmsub213sd/ss /r (66),(VEX),(o128)
+b6: vfmaddsub231pd/ps /r (66),(VEX)
+b7: vfmsubadd231pd/ps /r (66),(VEX)
+b8: vfmadd231pd/ps /r (66),(VEX)
+b9: vfmadd231sd/ss /r (66),(VEX),(o128)
+ba: vfmsub231pd/ps /r (66),(VEX)
+bb: vfmsub231sd/ss /r (66),(VEX),(o128)
+bc: vfnmadd231pd/ps /r (66),(VEX)
+bd: vfnmadd231sd/ss /r (66),(VEX),(o128)
+be: vfnmsub231pd/ps /r (66),(VEX)
+bf: vfnmsub231sd/ss /r (66),(VEX),(o128)
+# 0x0f 0x38 0xc0-0xff
db: aesimc Vdq,Wdq (66),(VEX),(o128)
dc: aesenc Vdq,Wdq (66),(VEX),(o128)
dd: aesenclast Vdq,Wdq (66),(VEX),(o128)
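
The FMA entries added above put two mnemonics on one opcode (for example vfmadd132pd/ps). As a hedged illustration, assuming the usual Intel encoding in which VEX.W = 1 selects the double-precision form and VEX.W = 0 the single-precision form, the helper below maps an opcode in the 0x96-0xbf range and a W bit to the operand-type suffix. The name `fma_suffix()` is hypothetical and not part of the patch, which does not decode mnemonics at all.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return "pd", "ps", "sd" or "ss" for an FMA opcode in the
 * 0x0f 0x38 map, or NULL if the opcode is not one of the FMA
 * entries listed in the opcode map above.
 */
static const char *fma_suffix(uint8_t opcode, int vex_w)
{
	uint8_t hi = opcode >> 4;	/* 9 = 132, a = 213, b = 231 forms */
	uint8_t lo = opcode & 0x0f;
	int scalar;

	if (hi < 0x9 || hi > 0xb || lo < 0x6)
		return NULL;

	/* x9, xb, xd and xf are the scalar (sd/ss) entries. */
	scalar = (lo >= 0x9) && (lo & 1);
	if (scalar)
		return vex_w ? "sd" : "ss";
	return vex_w ? "pd" : "ps";
}
```

For instance, opcode 0x98 with W = 1 gives "pd" (vfmadd132pd), and opcode 0x99 with W = 0 gives "ss" (vfmadd132ss).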

2009-10-29 08:09:30

by Masami Hiramatsu

Subject: [tip:perf/probes] kprobe-tracer: Compare both of event-name and event-group to find probe

Commit-ID: dd004c475cd15a5749b04b0283d41ffdfa57d658
Gitweb: http://git.kernel.org/tip/dd004c475cd15a5749b04b0283d41ffdfa57d658
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:44 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:47 +0100

kprobe-tracer: Compare both of event-name and event-group to find probe

Fix find_probe_event() to compare both the event name and the event
group. Without this fix, kprobe-tracer overwrites an existing probe
that has the same event name even if its group name is different.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204244.30545.27516.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/trace/trace_kprobe.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index b8ef707..a86c3ac 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -353,12 +353,14 @@ static void free_trace_probe(struct trace_probe *tp)
kfree(tp);
}

-static struct trace_probe *find_probe_event(const char *event)
+static struct trace_probe *find_probe_event(const char *event,
+ const char *group)
{
struct trace_probe *tp;

list_for_each_entry(tp, &probe_list, list)
- if (!strcmp(tp->call.name, event))
+ if (strcmp(tp->call.name, event) == 0 &&
+ strcmp(tp->call.system, group) == 0)
return tp;
return NULL;
}
@@ -383,7 +385,7 @@ static int register_trace_probe(struct trace_probe *tp)
mutex_lock(&probe_lock);

/* register as an event */
- old_tp = find_probe_event(tp->call.name);
+ old_tp = find_probe_event(tp->call.name, tp->call.system);
if (old_tp) {
/* delete old event */
unregister_trace_probe(old_tp);
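
The lookup change above can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a hypothetical singly linked `struct probe_entry` in place of `struct trace_probe` and the kernel list helpers; only the two-key comparison mirrors the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct trace_probe; only the two keys matter. */
struct probe_entry {
	const char *system;		/* group name */
	const char *name;		/* event name */
	struct probe_entry *next;
};

/* Return the first entry whose name AND group both match, or NULL. */
static struct probe_entry *find_entry(struct probe_entry *head,
				      const char *name, const char *group)
{
	struct probe_entry *e;

	for (e = head; e != NULL; e = e->next)
		if (strcmp(e->name, name) == 0 &&
		    strcmp(e->system, group) == 0)
			return e;
	return NULL;
}
```

With a name-only comparison, two probes called "ev" in different groups would be treated as the same event; with both keys they are found independently.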

2009-10-29 08:09:48

by Masami Hiramatsu

Subject: [tip:perf/probes] perf/probes: Exit searching after finding target function

Commit-ID: 8030c5f5a57e018fcdeb1f395d7adc123b48ced6
Gitweb: http://git.kernel.org/tip/8030c5f5a57e018fcdeb1f395d7adc123b48ced6
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:42:53 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:48 +0100

perf/probes: Exit searching after finding target function

Stop searching once the real (non-inlined) function is found,
because the same symbol should not appear again in that CU.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204252.30545.19251.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/util/probe-finder.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 54e7071..b98d35e 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -585,14 +585,14 @@ static int probefunc_callback(struct die_link *dlink, void *data)
DIE_IF(ret != DW_DLV_OK);
pr_debug("inline definition offset %lld\n",
pf->inl_offs);
- return 0;
+ return 0; /* Continue to search */
}
/* Get probe address */
pf->addr = die_get_entrypc(dlink->die);
pf->addr += pp->offset;
/* TODO: Check the address in this function */
show_probepoint(dlink->die, pp->offset, pf);
- /* Continue to search */
+ return 1; /* Exit; no same symbol in this CU. */
}
} else if (tag == DW_TAG_inlined_subroutine && pf->inl_offs) {
if (die_get_abstract_origin(dlink->die) == pf->inl_offs) {

2009-10-29 08:09:54

by Masami Hiramatsu

Subject: [tip:perf/probes] perf/probes: Improve command-line option of perf-probe

Commit-ID: 46ab49267d338eb5056d0077e16346509b9e9284
Gitweb: http://git.kernel.org/tip/46ab49267d338eb5056d0077e16346509b9e9284
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:43:02 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:48 +0100

perf/probes: Improve command-line option of perf-probe

Change the command-line option from -P to --add, and accept
probes without --add too.

perf probe --add "probe-define"

or, just:

perf probe "probe-define"

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204301.30545.48600.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/builtin-probe.c | 28 ++++++++++++++++++----------
1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index dcb406c..3370dab 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -65,8 +65,8 @@ static struct {

#define semantic_error(msg ...) die("Semantic error :" msg)

-static int parse_probepoint(const struct option *opt __used,
- const char *str, int unset __used)
+/* Parse a probe point. Note that any error must die. */
+static void parse_probepoint(const char *str)
{
char *argv[MAX_PROBE_ARGS + 2]; /* Event + probe + args */
int argc, i;
@@ -75,9 +75,6 @@ static int parse_probepoint(const struct option *opt __used,
char **event = &session.events[session.nr_probe];
int retp = 0;

- if (!str) /* The end of probe points */
- return 0;
-
pr_debug("probe-definition(%d): %s\n", session.nr_probe, str);
if (++session.nr_probe == MAX_PROBES)
semantic_error("Too many probes");
@@ -176,6 +173,13 @@ static int parse_probepoint(const struct option *opt __used,
}

pr_debug("%d arguments\n", pp->nr_args);
+}
+
+static int opt_add_probepoint(const struct option *opt __used,
+ const char *str, int unset __used)
+{
+ if (str)
+ parse_probepoint(str);
return 0;
}

@@ -211,7 +215,8 @@ static int open_default_vmlinux(void)
#endif

static const char * const probe_usage[] = {
- "perf probe [<options>] -P 'PROBEDEF' [-P 'PROBEDEF' ...]",
+ "perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]",
+ "perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]",
NULL
};

@@ -222,7 +227,7 @@ static const struct option options[] = {
OPT_STRING('k', "vmlinux", &session.vmlinux, "file",
"vmlinux/module pathname"),
#endif
- OPT_CALLBACK('P', "probe", NULL,
+ OPT_CALLBACK('a', "add", NULL,
#ifdef NO_LIBDWARF
"p|r:[GRP/]NAME FUNC[+OFFS] [ARG ...]",
#else
@@ -243,7 +248,7 @@ static const struct option options[] = {
"\t\tARG:\tProbe argument (local variable name or\n"
#endif
"\t\t\tkprobe-tracer argument format is supported.)\n",
- parse_probepoint),
+ opt_add_probepoint),
OPT_END()
};

@@ -296,8 +301,11 @@ int cmd_probe(int argc, const char **argv, const char *prefix __used)
char buf[MAX_CMDLEN];

argc = parse_options(argc, argv, options, probe_usage,
- PARSE_OPT_STOP_AT_NON_OPTION);
- if (argc || session.nr_probe == 0)
+ PARSE_OPT_STOP_AT_NON_OPTION);
+ for (i = 0; i < argc; i++)
+ parse_probe_event(argv[i]);
+
+ if (session.nr_probe == 0)
usage_with_options(probe_usage, options);

#ifdef NO_LIBDWARF

2009-10-29 08:10:32

by Masami Hiramatsu

Subject: [tip:perf/probes] perf/probes: Improve probe point syntax of perf-probe

Commit-ID: 253977b0d87fbb793f12b1661a763ae264028ccf
Gitweb: http://git.kernel.org/tip/253977b0d87fbb793f12b1661a763ae264028ccf
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:43:10 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:49 +0100

perf/probes: Improve probe point syntax of perf-probe

This changes the probe point syntax of perf-probe as below:

<SRC>[:ABS_LN] [ARGS]
or
<FUNC>[+OFFS|%return][@SRC] [ARGS]

The event name and event group name are now generated automatically
from the probe symbol and offset, as below:

perfprobe/SYMBOL_OFFSET[_NUM]

where SYMBOL is the probed symbol and OFFSET is
the byte offset from the symbol.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204310.30545.84984.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/builtin-probe.c | 181 ++++++++++++++++++++++++---------------
tools/perf/util/probe-finder.c | 10 ++
tools/perf/util/probe-finder.h | 2 +
3 files changed, 123 insertions(+), 70 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 3370dab..92b4c49 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -52,6 +52,7 @@ const char *default_search_path[NR_SEARCH_PATH] = {
#define MAX_PATH_LEN 256
#define MAX_PROBES 128
#define MAX_PROBE_ARGS 128
+#define PERFPROBE_GROUP "perfprobe"

/* Session management structure */
static struct {
@@ -60,20 +61,100 @@ static struct {
int need_dwarf;
int nr_probe;
struct probe_point probes[MAX_PROBES];
- char *events[MAX_PROBES];
} session;

#define semantic_error(msg ...) die("Semantic error :" msg)

-/* Parse a probe point. Note that any error must die. */
-static void parse_probepoint(const char *str)
+/* Parse a probe point; sets pp->retprobe for a return probe. */
+static void parse_probe_point(char *arg, struct probe_point *pp)
+{
+ char *ptr, *tmp;
+ char c, nc;
+ /*
+ * <Syntax>
+ * perf probe SRC:LN
+ * perf probe FUNC[+OFFS|%return][@SRC]
+ */
+
+ ptr = strpbrk(arg, ":+@%");
+ if (ptr) {
+ nc = *ptr;
+ *ptr++ = '\0';
+ }
+
+ /* Check arg is function or file and copy it */
+ if (strchr(arg, '.')) /* File */
+ pp->file = strdup(arg);
+ else /* Function */
+ pp->function = strdup(arg);
+ DIE_IF(pp->file == NULL && pp->function == NULL);
+
+ /* Parse other options */
+ while (ptr) {
+ arg = ptr;
+ c = nc;
+ ptr = strpbrk(arg, ":+@%");
+ if (ptr) {
+ nc = *ptr;
+ *ptr++ = '\0';
+ }
+ switch (c) {
+ case ':': /* Line number */
+ pp->line = strtoul(arg, &tmp, 0);
+ if (*tmp != '\0')
+ semantic_error("There is non-digit character"
+ " in line number.");
+ break;
+ case '+': /* Byte offset from a symbol */
+ pp->offset = strtoul(arg, &tmp, 0);
+ if (*tmp != '\0')
+ semantic_error("There is non-digit character"
+ " in offset.");
+ break;
+ case '@': /* File name */
+ if (pp->file)
+ semantic_error("SRC@SRC is not allowed.");
+ pp->file = strdup(arg);
+ DIE_IF(pp->file == NULL);
+ if (ptr)
+ semantic_error("@SRC must be the last "
+ "option.");
+ break;
+ case '%': /* Probe places */
+ if (strcmp(arg, "return") == 0) {
+ pp->retprobe = 1;
+ } else /* Others not supported yet */
+ semantic_error("%%%s is not supported.", arg);
+ break;
+ default:
+ DIE_IF("Program has a bug.");
+ break;
+ }
+ }
+
+ /* Exclusion check */
+ if (pp->line && pp->function)
+ semantic_error("Function-relative line number is not"
+ " supported yet.");
+ if (!pp->line && pp->file && !pp->function)
+ semantic_error("File always requires line number.");
+ if (pp->offset && !pp->function)
+ semantic_error("Offset requires an entry function.");
+ if (pp->retprobe && !pp->function)
+ semantic_error("Return probe requires an entry function.");
+ if (pp->offset && pp->retprobe)
+ semantic_error("Offset can't be used with return probe.");
+
+ pr_debug("symbol:%s file:%s line:%d offset:%d, return:%d\n",
+ pp->function, pp->file, pp->line, pp->offset, pp->retprobe);
+}
+
+/* Parse an event definition. Note that any error must die. */
+static void parse_probe_event(const char *str)
{
char *argv[MAX_PROBE_ARGS + 2]; /* Event + probe + args */
int argc, i;
- char *arg, *ptr;
struct probe_point *pp = &session.probes[session.nr_probe];
- char **event = &session.events[session.nr_probe];
- int retp = 0;

pr_debug("probe-definition(%d): %s\n", session.nr_probe, str);
if (++session.nr_probe == MAX_PROBES)
@@ -103,70 +184,28 @@ static void parse_probepoint(const char *str)
pr_debug("argv[%d]=%s\n", argc, argv[argc - 1]);
}
} while (*str != '\0');
- if (argc < 2)
- semantic_error("Need event-name and probe-point at least.");
-
- /* Parse the event name */
- if (argv[0][0] == 'r')
- retp = 1;
- else if (argv[0][0] != 'p')
- semantic_error("You must specify 'p'(kprobe) or"
- " 'r'(kretprobe) first.");
- /* TODO: check event name */
- *event = argv[0];
+ if (!argc)
+ semantic_error("An empty argument.");

/* Parse probe point */
- arg = argv[1];
- if (arg[0] == '@') {
- /* Source Line */
- arg++;
- ptr = strchr(arg, ':');
- if (!ptr || !isdigit(ptr[1]))
- semantic_error("Line number is required.");
- *ptr++ = '\0';
- if (strlen(arg) == 0)
- semantic_error("No file name.");
- pp->file = strdup(arg);
- pp->line = atoi(ptr);
- if (!pp->file || !pp->line)
- semantic_error("Failed to parse line.");
- pr_debug("file:%s line:%d\n", pp->file, pp->line);
- } else {
- /* Function name */
- ptr = strchr(arg, '+');
- if (ptr) {
- if (!isdigit(ptr[1]))
- semantic_error("Offset is required.");
- *ptr++ = '\0';
- pp->offset = atoi(ptr);
- } else
- ptr = arg;
- ptr = strchr(ptr, '@');
- if (ptr) {
- *ptr++ = '\0';
- pp->file = strdup(ptr);
- }
- pp->function = strdup(arg);
- pr_debug("symbol:%s file:%s offset:%d\n",
- pp->function, pp->file, pp->offset);
- }
- free(argv[1]);
+ parse_probe_point(argv[0], pp);
+ free(argv[0]);
if (pp->file)
session.need_dwarf = 1;

/* Copy arguments */
- pp->nr_args = argc - 2;
+ pp->nr_args = argc - 1;
if (pp->nr_args > 0) {
pp->args = (char **)malloc(sizeof(char *) * pp->nr_args);
if (!pp->args)
die("malloc");
- memcpy(pp->args, &argv[2], sizeof(char *) * pp->nr_args);
+ memcpy(pp->args, &argv[1], sizeof(char *) * pp->nr_args);
}

/* Ensure return probe has no C argument */
for (i = 0; i < pp->nr_args; i++)
if (is_c_varname(pp->args[i])) {
- if (retp)
+ if (pp->retprobe)
semantic_error("You can't specify local"
" variable for kretprobe");
session.need_dwarf = 1;
@@ -175,11 +214,11 @@ static void parse_probepoint(const char *str)
pr_debug("%d arguments\n", pp->nr_args);
}

-static int opt_add_probepoint(const struct option *opt __used,
+static int opt_add_probe_event(const struct option *opt __used,
const char *str, int unset __used)
{
if (str)
- parse_probepoint(str);
+ parse_probe_event(str);
return 0;
}

@@ -229,17 +268,16 @@ static const struct option options[] = {
#endif
OPT_CALLBACK('a', "add", NULL,
#ifdef NO_LIBDWARF
- "p|r:[GRP/]NAME FUNC[+OFFS] [ARG ...]",
+ "FUNC[+OFFS|%return] [ARG ...]",
#else
- "p|r:[GRP/]NAME FUNC[+OFFS][@SRC]|@SRC:LINE [ARG ...]",
+ "FUNC[+OFFS|%return][@SRC]|SRC:LINE [ARG ...]",
#endif
"probe point definition, where\n"
- "\t\tp:\tkprobe probe\n"
- "\t\tr:\tkretprobe probe\n"
"\t\tGRP:\tGroup name (optional)\n"
"\t\tNAME:\tEvent name\n"
"\t\tFUNC:\tFunction name\n"
"\t\tOFFS:\tOffset from function entry (in byte)\n"
+ "\t\t%return:\tPut the probe at function return\n"
#ifdef NO_LIBDWARF
"\t\tARG:\tProbe argument (only \n"
#else
@@ -248,7 +286,7 @@ static const struct option options[] = {
"\t\tARG:\tProbe argument (local variable name or\n"
#endif
"\t\t\tkprobe-tracer argument format is supported.)\n",
- opt_add_probepoint),
+ opt_add_probe_event),
OPT_END()
};

@@ -266,7 +304,7 @@ static int write_new_event(int fd, const char *buf)

#define MAX_CMDLEN 256

-static int synthesize_probepoint(struct probe_point *pp)
+static int synthesize_probe_event(struct probe_point *pp)
{
char *buf;
int i, len, ret;
@@ -316,12 +354,12 @@ int cmd_probe(int argc, const char **argv, const char *prefix __used)
/* Synthesize probes without dwarf */
for (j = 0; j < session.nr_probe; j++) {
#ifndef NO_LIBDWARF
- if (session.events[j][0] != 'r') {
+ if (!session.probes[j].retprobe) {
session.need_dwarf = 1;
continue;
}
#endif
- ret = synthesize_probepoint(&session.probes[j]);
+ ret = synthesize_probe_event(&session.probes[j]);
if (ret == -E2BIG)
semantic_error("probe point is too long.");
else if (ret < 0)
@@ -349,7 +387,6 @@ int cmd_probe(int argc, const char **argv, const char *prefix __used)
ret = find_probepoint(fd, pp);
if (ret <= 0)
die("No probe point found.\n");
- pr_debug("probe event %s found\n", session.events[j]);
}
close(fd);

@@ -364,13 +401,17 @@ setup_probes:
for (j = 0; j < session.nr_probe; j++) {
pp = &session.probes[j];
if (pp->found == 1) {
- snprintf(buf, MAX_CMDLEN, "%s %s\n",
- session.events[j], pp->probes[0]);
+ snprintf(buf, MAX_CMDLEN, "%c:%s/%s_%x %s\n",
+ pp->retprobe ? 'r' : 'p', PERFPROBE_GROUP,
+ pp->function, pp->offset, pp->probes[0]);
write_new_event(fd, buf);
} else
for (i = 0; i < pp->found; i++) {
- snprintf(buf, MAX_CMDLEN, "%s%d %s\n",
- session.events[j], i, pp->probes[i]);
+ snprintf(buf, MAX_CMDLEN, "%c:%s/%s_%x_%d %s\n",
+ pp->retprobe ? 'r' : 'p',
+ PERFPROBE_GROUP,
+ pp->function, pp->offset, i,
+ pp->probes[i]);
write_new_event(fd, buf);
}
}
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index b98d35e..6d3bac9 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -483,10 +483,20 @@ static void show_probepoint(Dwarf_Die sp_die, Dwarf_Signed offs,
if (ret == DW_DLV_OK) {
ret = snprintf(tmp, MAX_PROBE_BUFFER, "%s+%u", name,
(unsigned int)offs);
+ /* Copy the function name if possible */
+ if (!pp->function) {
+ pp->function = strdup(name);
+ pp->offset = offs;
+ }
dwarf_dealloc(__dw_debug, name, DW_DLA_STRING);
} else {
/* This function has no name. */
ret = snprintf(tmp, MAX_PROBE_BUFFER, "0x%llx", pf->addr);
+ if (!pp->function) {
+ /* TODO: Use _stext */
+ pp->function = strdup("");
+ pp->offset = (int)pf->addr;
+ }
}
DIE_IF(ret < 0);
DIE_IF(ret >= MAX_PROBE_BUFFER);
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index d17fafc..240d6cb 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -22,6 +22,8 @@ struct probe_point {
int nr_args; /* Number of arguments */
char **args; /* Arguments */

+ int retprobe; /* Return probe */
+
/* Output */
int found; /* Number of found probe points */
char *probes[MAX_PROBES]; /* Output buffers (will be allocated)*/
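
The delimiter-driven parsing in parse_probe_point() can be tried in isolation. This is a simplified sketch, not the patch's code: `struct pp_sketch` and `parse_point()` are hypothetical names, syntax errors return -1 instead of calling semantic_error(), fields point into the (modified) input string rather than being strdup()ed, and the exclusion checks are left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced version of struct probe_point. */
struct pp_sketch {
	char *function;
	char *file;
	unsigned long line;
	unsigned long offset;
	int retprobe;
};

/* Parse SRC:LN or FUNC[+OFFS|%return][@SRC]; arg is modified in place. */
static int parse_point(char *arg, struct pp_sketch *pp)
{
	char *ptr, *end;
	char c, nc = '\0';

	memset(pp, 0, sizeof(*pp));

	/* Cut the leading token at the first delimiter, remembering it. */
	ptr = strpbrk(arg, ":+@%");
	if (ptr) {
		nc = *ptr;
		*ptr++ = '\0';
	}
	if (strchr(arg, '.'))		/* a '.' marks a file name */
		pp->file = arg;
	else
		pp->function = arg;

	while (ptr) {
		arg = ptr;
		c = nc;			/* delimiter that introduced arg */
		ptr = strpbrk(arg, ":+@%");
		if (ptr) {
			nc = *ptr;
			*ptr++ = '\0';
		}
		switch (c) {
		case ':':		/* line number */
			pp->line = strtoul(arg, &end, 0);
			if (end == arg || *end != '\0')
				return -1;
			break;
		case '+':		/* byte offset from the symbol */
			pp->offset = strtoul(arg, &end, 0);
			if (end == arg || *end != '\0')
				return -1;
			break;
		case '@':		/* source file; must come last */
			if (pp->file || ptr)
				return -1;
			pp->file = arg;
			break;
		case '%':		/* only %return is recognised */
			if (strcmp(arg, "return") != 0)
				return -1;
			pp->retprobe = 1;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

For "schedule+4@kernel/sched.c" this yields function "schedule", offset 4 and file "kernel/sched.c". Like the patch, it cannot handle file names that themselves contain one of the delimiter characters.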

2009-10-29 08:12:11

by Masami Hiramatsu

Subject: [tip:perf/probes] perf/probes: Support function entry relative line number

Commit-ID: b0ef07324310d66f660a311d4a8d669eda74f801
Gitweb: http://git.kernel.org/tip/b0ef07324310d66f660a311d4a8d669eda74f801
Author: Masami Hiramatsu <[email protected]>
AuthorDate: Tue, 27 Oct 2009 16:43:19 -0400
Committer: Ingo Molnar <[email protected]>
CommitDate: Thu, 29 Oct 2009 08:47:49 +0100

perf/probes: Support function entry relative line number

Add support to perf-probe for specifying a line number relative to
the function entry. This allows users to define probes by line
number counted from the entry line of the function.

e.g.

perf probe schedule:16

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: Ananth N Mavinakayanahalli <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Frank Ch. Eigler <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Jason Baron <[email protected]>
Cc: K.Prasad <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
LKML-Reference: <20091027204319.30545.30678.stgit@harusame>
Signed-off-by: Ingo Molnar <[email protected]>
---
tools/perf/builtin-probe.c | 14 ++++----
tools/perf/util/probe-finder.c | 79 +++++++++++++++++++++++++++++++---------
tools/perf/util/probe-finder.h | 2 +
3 files changed, 70 insertions(+), 25 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 92b4c49..a99a366 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -133,17 +133,16 @@ static void parse_probe_point(char *arg, struct probe_point *pp)
}

/* Exclusion check */
- if (pp->line && pp->function)
- semantic_error("Function-relative line number is not"
- " supported yet.");
+ if (pp->line && pp->offset)
+ semantic_error("Offset can't be used with line number.");
if (!pp->line && pp->file && !pp->function)
semantic_error("File always requires line number.");
if (pp->offset && !pp->function)
semantic_error("Offset requires an entry function.");
if (pp->retprobe && !pp->function)
semantic_error("Return probe requires an entry function.");
- if (pp->offset && pp->retprobe)
- semantic_error("Offset can't be used with return probe.");
+ if ((pp->offset || pp->line) && pp->retprobe)
+ semantic_error("Offset/Line can't be used with return probe.");

pr_debug("symbol:%s file:%s line:%d offset:%d, return:%d\n",
pp->function, pp->file, pp->line, pp->offset, pp->retprobe);
@@ -270,7 +269,7 @@ static const struct option options[] = {
#ifdef NO_LIBDWARF
"FUNC[+OFFS|%return] [ARG ...]",
#else
- "FUNC[+OFFS|%return][@SRC]|SRC:LINE [ARG ...]",
+ "FUNC[+OFFS|%return|:RLN][@SRC]|SRC:ALN [ARG ...]",
#endif
"probe point definition, where\n"
"\t\tGRP:\tGroup name (optional)\n"
@@ -282,7 +281,8 @@ static const struct option options[] = {
"\t\tARG:\tProbe argument (only \n"
#else
"\t\tSRC:\tSource code path\n"
- "\t\tLINE:\tLine number\n"
+ "\t\tRLN:\tRelative line number from function entry.\n"
+ "\t\tALN:\tAbsolute line number in file.\n"
"\t\tARG:\tProbe argument (local variable name or\n"
#endif
"\t\t\tkprobe-tracer argument format is supported.)\n",
diff --git a/tools/perf/util/probe-finder.c b/tools/perf/util/probe-finder.c
index 6d3bac9..db96186 100644
--- a/tools/perf/util/probe-finder.c
+++ b/tools/perf/util/probe-finder.c
@@ -114,7 +114,7 @@ static int strtailcmp(const char *s1, const char *s2)
}

/* Find the fileno of the target file. */
-static Dwarf_Unsigned die_get_fileno(Dwarf_Die cu_die, const char *fname)
+static Dwarf_Unsigned cu_find_fileno(Dwarf_Die cu_die, const char *fname)
{
Dwarf_Signed cnt, i;
Dwarf_Unsigned found = 0;
@@ -335,6 +335,36 @@ static int attr_get_locdesc(Dwarf_Attribute attr, Dwarf_Locdesc *desc,
return ret;
}

+/* Get decl_file attribute value (file number) */
+static Dwarf_Unsigned die_get_decl_file(Dwarf_Die sp_die)
+{
+ Dwarf_Attribute attr;
+ Dwarf_Unsigned fno;
+ int ret;
+
+ ret = dwarf_attr(sp_die, DW_AT_decl_file, &attr, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ ret = dwarf_formudata(attr, &fno, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ dwarf_dealloc(__dw_debug, attr, DW_DLA_ATTR);
+ return fno;
+}
+
+/* Get decl_line attribute value (line number) */
+static Dwarf_Unsigned die_get_decl_line(Dwarf_Die sp_die)
+{
+ Dwarf_Attribute attr;
+ Dwarf_Unsigned lno;
+ int ret;
+
+ ret = dwarf_attr(sp_die, DW_AT_decl_line, &attr, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ ret = dwarf_formudata(attr, &lno, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+ dwarf_dealloc(__dw_debug, attr, DW_DLA_ATTR);
+ return lno;
+}
+
/*
* Probe finder related functions
*/
@@ -501,6 +531,7 @@ static void show_probepoint(Dwarf_Die sp_die, Dwarf_Signed offs,
DIE_IF(ret < 0);
DIE_IF(ret >= MAX_PROBE_BUFFER);
len = ret;
+ pr_debug("Probe point found: %s\n", tmp);

/* Find each argument */
get_current_frame_base(sp_die, pf);
@@ -536,17 +567,16 @@ static int probeaddr_callback(struct die_link *dlink, void *data)
}

/* Find probe point from its line number */
-static void find_by_line(Dwarf_Die cu_die, struct probe_finder *pf)
+static void find_by_line(struct probe_finder *pf)
{
- struct probe_point *pp = pf->pp;
- Dwarf_Signed cnt, i;
+ Dwarf_Signed cnt, i, clm;
Dwarf_Line *lines;
Dwarf_Unsigned lineno = 0;
Dwarf_Addr addr;
Dwarf_Unsigned fno;
int ret;

- ret = dwarf_srclines(cu_die, &lines, &cnt, &__dw_error);
+ ret = dwarf_srclines(pf->cu_die, &lines, &cnt, &__dw_error);
DIE_IF(ret != DW_DLV_OK);

for (i = 0; i < cnt; i++) {
@@ -557,15 +587,20 @@ static void find_by_line(Dwarf_Die cu_die, struct probe_finder *pf)

ret = dwarf_lineno(lines[i], &lineno, &__dw_error);
DIE_IF(ret != DW_DLV_OK);
- if (lineno != (Dwarf_Unsigned)pp->line)
+ if (lineno != pf->lno)
continue;

+ ret = dwarf_lineoff(lines[i], &clm, &__dw_error);
+ DIE_IF(ret != DW_DLV_OK);
+
ret = dwarf_lineaddr(lines[i], &addr, &__dw_error);
DIE_IF(ret != DW_DLV_OK);
- pr_debug("Probe point found: 0x%llx\n", addr);
+ pr_debug("Probe line found: line[%d]:%u,%d addr:0x%llx\n",
+ (int)i, (unsigned)lineno, (int)clm, addr);
pf->addr = addr;
/* Search a real subprogram including this line, */
- ret = search_die_from_children(cu_die, probeaddr_callback, pf);
+ ret = search_die_from_children(pf->cu_die,
+ probeaddr_callback, pf);
if (ret == 0)
die("Probe point is not found in subprograms.\n");
/* Continuing, because target line might be inlined. */
@@ -587,6 +622,13 @@ static int probefunc_callback(struct die_link *dlink, void *data)
DIE_IF(ret == DW_DLV_ERROR);
if (tag == DW_TAG_subprogram) {
if (die_compare_name(dlink->die, pp->function) == 0) {
+ if (pp->line) { /* Function relative line */
+ pf->fno = die_get_decl_file(dlink->die);
+ pf->lno = die_get_decl_line(dlink->die)
+ + pp->line;
+ find_by_line(pf);
+ return 1;
+ }
if (die_inlined_subprogram(dlink->die)) {
/* Inlined function, save it. */
ret = dwarf_die_CU_offset(dlink->die,
@@ -631,9 +673,9 @@ found:
return 0;
}

-static void find_by_func(Dwarf_Die cu_die, struct probe_finder *pf)
+static void find_by_func(struct probe_finder *pf)
{
- search_die_from_children(cu_die, probefunc_callback, pf);
+ search_die_from_children(pf->cu_die, probefunc_callback, pf);
}

/* Find a probe point */
@@ -641,7 +683,6 @@ int find_probepoint(int fd, struct probe_point *pp)
{
Dwarf_Half addr_size = 0;
Dwarf_Unsigned next_cuh = 0;
- Dwarf_Die cu_die = 0;
int cu_number = 0, ret;
struct probe_finder pf = {.pp = pp};

@@ -659,25 +700,27 @@ int find_probepoint(int fd, struct probe_point *pp)
break;

/* Get the DIE(Debugging Information Entry) of this CU */
- ret = dwarf_siblingof(__dw_debug, 0, &cu_die, &__dw_error);
+ ret = dwarf_siblingof(__dw_debug, 0, &pf.cu_die, &__dw_error);
DIE_IF(ret != DW_DLV_OK);

/* Check if target file is included. */
if (pp->file)
- pf.fno = die_get_fileno(cu_die, pp->file);
+ pf.fno = cu_find_fileno(pf.cu_die, pp->file);

if (!pp->file || pf.fno) {
/* Save CU base address (for frame_base) */
- ret = dwarf_lowpc(cu_die, &pf.cu_base, &__dw_error);
+ ret = dwarf_lowpc(pf.cu_die, &pf.cu_base, &__dw_error);
DIE_IF(ret == DW_DLV_ERROR);
if (ret == DW_DLV_NO_ENTRY)
pf.cu_base = 0;
- if (pp->line)
- find_by_line(cu_die, &pf);
if (pp->function)
- find_by_func(cu_die, &pf);
+ find_by_func(&pf);
+ else {
+ pf.lno = pp->line;
+ find_by_line(&pf);
+ }
}
- dwarf_dealloc(__dw_debug, cu_die, DW_DLA_DIE);
+ dwarf_dealloc(__dw_debug, pf.cu_die, DW_DLA_DIE);
}
ret = dwarf_finish(__dw_debug, &__dw_error);
DIE_IF(ret != DW_DLV_OK);
diff --git a/tools/perf/util/probe-finder.h b/tools/perf/util/probe-finder.h
index 240d6cb..bdebca6 100644
--- a/tools/perf/util/probe-finder.h
+++ b/tools/perf/util/probe-finder.h
@@ -41,7 +41,9 @@ struct probe_finder {
/* For function searching */
Dwarf_Addr addr; /* Address */
Dwarf_Unsigned fno; /* File number */
+ Dwarf_Unsigned lno; /* Line number */
Dwarf_Off inl_offs; /* Inline offset */
+ Dwarf_Die cu_die; /* Current CU */

/* For variable searching */
Dwarf_Addr cu_base; /* Current CU base address */
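
The function-relative lookup boils down to "declaration line plus relative line, then scan the line table". The sketch below shows that arithmetic on a plain array; `struct line_row` and `find_rel_line()` are hypothetical and stand in for the libdwarf line table and find_by_line(), and the file-number filter is an assumption about how rows are matched.

```c
#include <stddef.h>

/* Hypothetical, simplified line-table row (not a libdwarf type). */
struct line_row {
	unsigned int fno;	/* file number */
	unsigned int lineno;	/* absolute line number */
	unsigned long addr;	/* address of the row */
};

/*
 * Store the addresses of all rows on line decl_line + rel_line of
 * file fno. Returns the number of matches stored (at most max).
 */
static size_t find_rel_line(const struct line_row *rows, size_t n,
			    unsigned int fno, unsigned int decl_line,
			    unsigned int rel_line,
			    unsigned long *out, size_t max)
{
	unsigned int target = decl_line + rel_line;
	size_t i, found = 0;

	for (i = 0; i < n && found < max; i++) {
		if (rows[i].fno != fno || rows[i].lineno != target)
			continue;
		out[found++] = rows[i].addr;
	}
	return found;
}
```

The scan keeps going after the first hit because, as the patch notes, the target line may also appear in inlined copies and so map to several addresses.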

2009-10-29 08:54:26

by Ingo Molnar

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes


* Masami Hiramatsu <[email protected]> wrote:

> Hi Ingo,
>
> Here are bugfixes and some enhances of x86-insn decoder and perf-probe.
> - x86 insn decoder supports AVX and FMA.
> - perf-probe syntax change.
> - perf-probe supports function-relative line number.
> - minor bugfixes.
>
> New perf-probe syntax is below:
>
> perf probe 'PROBE'
>
> or
>
> perf probe --add 'PROBE'
>
> where, PROBE is
>
> <source>:<line-number>
>
> or
>
> <function>[:<rel-lineno>|+<byte-offset>|%return][@<source>]
>
> e.g.
>
> perf probe 'schedule:10@kernel/sched.c'
>
> puts a probe at 10th line from entry line of schedule() function
> in kernel/sched.c." and
>
> perf probe 'vmalloc%return'
>
> puts a return probe at the returning of vmalloc.
>
> TODO:
> - Support --line option to show which lines user can probe.
> - Support lazy string matching.

Ok, i like these fixes and improvements - thanks Masami!

One detail i noticed is that we are still not very smart about finding
our vmlinux file. I booted the kernel on a box and it gave:

aldebaran:~/linux/linux> perf probe schedule
Fatal: vmlinux/module file open
aldebaran:~/linux/linux> ls -l vmlinux
-rwxrwxr-x 1 mingo mingo 21589717 2009-10-29 09:04 vmlinux

Firstly, it should print something more meaningful, such as:

aldebaran:~/linux/linux> perf probe schedule
Fatal: Could not open vmlinux/module file

Secondly, we should look beyond these places:

25806 open("/lib/modules/2.6.32-rc5-tip-01760-g2c4b5e0-dirty/build/vmlinux", O_RDONLY) = -1 ENOENT (No such file or directory)
25806 open("/usr/lib/debug/lib/modules/2.6.32-rc5-tip-01760-g2c4b5e0-dirty/vmlinux", O_RDONLY) = -1 ENOENT (No such file or directory)
25806 open("/boot/vmlinux-debug-2.6.32-rc5-tip-01760-g2c4b5e0-dirty", O_RDONLY) = -1 ENOENT (No such file or directory)
25806 write(2, " Fatal: vmlinux/module file open"..., 34) = 34
25806 exit_group(128) = ?

and look into the current directory as well.
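
A rough sketch of such a search list follows. The first three patterns are taken from the strace output above; "./vmlinux" as a fourth entry, its position at the end, and the names `vmlinux_candidates`, `NR_CANDIDATES` and `CAND_LEN` are assumptions made for illustration only.

```c
#include <stdio.h>

#define NR_CANDIDATES 4
#define CAND_LEN 256

/*
 * Fill out[] with candidate vmlinux paths for a kernel release string.
 * Returns NR_CANDIDATES, or -1 if a path does not fit.
 */
static int vmlinux_candidates(const char *release,
			      char out[NR_CANDIDATES][CAND_LEN])
{
	int n[NR_CANDIDATES], i;

	n[0] = snprintf(out[0], CAND_LEN,
			"/lib/modules/%s/build/vmlinux", release);
	n[1] = snprintf(out[1], CAND_LEN,
			"/usr/lib/debug/lib/modules/%s/vmlinux", release);
	n[2] = snprintf(out[2], CAND_LEN,
			"/boot/vmlinux-debug-%s", release);
	/* Assumed addition: fall back to the current directory. */
	n[3] = snprintf(out[3], CAND_LEN, "./vmlinux");

	for (i = 0; i < NR_CANDIDATES; i++)
		if (n[i] < 0 || n[i] >= CAND_LEN)
			return -1;
	return NR_CANDIDATES;
}
```

A caller would try each path in order and stop at the first one that opens.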

Thirdly, i think we should expose the build-id of the kernel and the
path to the vmlinux (and modules) via /proc/build-id or so. That way
tooling can find the vmlinux file and can validate that it matches the
kernel's signature. (maybe include a file date as well - that's a faster
check than a full build-id checksum, especially for large kernels)

Another problem i noticed is that a vmlinux without DEBUG_INFO will fail
in this way:

aldebaran:~/linux/linux> perf probe schedule
Fatal: Failed to call dwarf_init(). Maybe, not a dwarf file.

this is not meaningful to a user. A more usable message would be:

aldebaran:~/linux/linux> perf probe schedule
Fatal: Have not found dwarf info in the vmlinux - please rebuild with CONFIG_DEBUG_INFO

but ... we should really not force DEBUG_INFO for a simple probe that is
based on an ELF symbol. We already parse ELF symbols so 'perf probe
schedule' should be able to figure out where to put the probe point.

Not forcing debuginfo for simple usecases is extremely important for
usability.

And once i had these fixed, i got:

aldebaran:~/linux/linux> perf probe schedule
Fatal: kprobe_events open

Which is not a meaningful error message either. This error occurred due
to CONFIG_KPROBE_TRACER not being enabled.

What we want here is twofold:

- enable kprobes event support when perf events is enabled and kprobes
is enabled. We don't want another config option for it.

- and we need to improve error messages so that users can figure out
what is the problem.
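
One possible shape for the second point is to turn the errno from the failed kprobe_events open into a hint. This is only a sketch: `kprobe_events_hint()` is a hypothetical helper, the message texts are made up, and mapping ENOENT to the missing config option and EACCES/EPERM to missing privilege is an assumption based on the two failures described in this mail.

```c
#include <errno.h>
#include <string.h>

/* Map an errno from opening kprobe_events to a user-facing hint. */
static const char *kprobe_events_hint(int err)
{
	switch (err) {
	case ENOENT:
		return "kprobe_events not found - "
		       "is CONFIG_KPROBE_TRACER enabled?";
	case EACCES:
	case EPERM:
		return "permission denied - "
		       "inserting a kprobe requires root";
	default:
		/* Fall back to the generic system message. */
		return strerror(err);
	}
}
```

The caller would pass the errno value saved right after the failed open() and print the returned string instead of the bare "kprobe_events open".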

Once i had this third roadblock out of the way i noticed that the _real_
reason for the error was that i was not root and had no privilege to
insert a kprobe.

Once root, the probe worked well:

# perf probe schedule
Adding new event: p:perfprobe/schedule_0 schedule+0

And it seems to be precise:

aldebaran:/home/mingo> perf stat -e perfprobe:__switch_to_0 -e cs -a ./hackbench 1
Time: 0.018

Performance counter stats for './hackbench 1':

7358 perfprobe:__switch_to_0 # 0.000 M/sec
7364 context-switches # 0.000 M/sec

0.119152919 seconds time elapsed

(The difference of 6 context-switches should be investigated i suspect.)

Btw., very small nit, it would be better to put that sentence into past
tense:

# perf probe schedule
Added new event: p:perfprobe/schedule_0 schedule+0

To make sure the user knows that the action has already been carried out.

I'd expect the typical user to give up much sooner than i did so we really
need to address these usability details - emit useful error messages and
be more successful in getting the user what he wants.

But the basic UI is already pretty promising!

A few further (and very small) UI tweaks i'd suggest:

Firstly, could we please name the first inserted probe plainly after
the symbol it specifies, with no _0 postfix? I.e. instead of:

7358 perfprobe:__switch_to_0 # 0.000 M/sec

we'd get:

7358 perfprobe:__switch_to # 0.000 M/sec

Subsequent probes for the same symbol can be named _1, _2 - but the
first symbol should not have this needless post-fix.

Secondly, i think we should remove the 'perf' bit from the probe name.
I.e. instead of:

7358 perfprobe:__switch_to # 0.000 M/sec

we should do:

7358 probe:__switch_to # 0.000 M/sec

as that's really what the user cares about. The user already knows that
we are in perf - no need to repeat that in every event specification.

Thanks,

Ingo

2009-10-29 15:10:32

by Arnaldo Carvalho de Melo

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Em Thu, Oct 29, 2009 at 09:53:48AM +0100, Ingo Molnar escreveu:
> Thirdly, i think we should expose the build-id of the kernel and the
> path to the vmlinux (and modules) via /proc/build-id or so. That way
> tooling can find the vmlinux file and can validate that it matches the
> kernel's signature. (maybe include a file date as well - that's a faster
> check than a full build-id checksum, especially for large kernels)

Yeah, build-id needs to be a first class citizen, being always available for
checking that the DSO on disk matches the one being currently used by some app.

Right now I got myself staring at these messages in a perf top session:

No symbols found in /usr/lib/libvte.so.9.6.0 (deleted), maybe install a debug package?
No symbols found in /lib/libgobject-2.0.so.0.2000.5.#prelink#.MDypV3 (deleted), maybe install a debug package?
No symbols found in /usr/lib/libpixman-1.so.0.14.0 (deleted), maybe install a debug package?
No symbols found in /usr/lib/libgdk-x11-2.0.so.0.1600.6.#prelink#.4CriMV (deleted), maybe install a debug package?
No symbols found in /lib/libgthread-2.0.so.0.2000.5.#prelink#.0ooYbx (deleted), maybe install a debug package?
No symbols found in /usr/lib/libX11.so.6.2.0 (deleted), maybe install a debug package?
No symbols found in /usr/lib/libxcb.so.1.1.0 (deleted), maybe install a debug package?

Well, I did a 'yum upgrade' and there are some binaries still running that hold
reference counts to those DSOs, but there is no longer any place where I can
look up their build-ids to get the right symtab.

Corner case, yeah, but it would be nice to be able to query for its buildid and
then get the right symtab.

But as we don't have that, I'll just look for (deleted) at the end of the DSO
name and emit a more friendly warning...
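
A minimal Python sketch of that check, for illustration only: the function names and the wording of the friendlier warning are assumptions, not what perf ended up printing.

```python
DELETED_SUFFIX = " (deleted)"


def split_deleted(dso_name):
    """Return (path, was_deleted) for a mapping name that may end in ' (deleted)'."""
    if dso_name.endswith(DELETED_SUFFIX):
        return dso_name[:-len(DELETED_SUFFIX)], True
    return dso_name, False


def symbol_warning(dso_name):
    """Pick a warning that matches why the symbols could not be loaded."""
    path, deleted = split_deleted(dso_name)
    if deleted:
        # Hypothetical wording: the file was unlinked while still mapped.
        return ("%s was deleted or replaced on disk while still mapped; "
                "its symbols cannot be looked up by path" % path)
    return "No symbols found in %s, maybe install a debug package?" % path
```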

- Arnaldo

2009-10-29 16:56:37

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Ingo Molnar wrote:
>
> * Masami Hiramatsu<[email protected]> wrote:
>
>> Hi Ingo,
>>
>> Here are bugfixes and some enhances of x86-insn decoder and perf-probe.
>> - x86 insn decoder supports AVX and FMA.
>> - perf-probe syntax change.
>> - perf-probe supports function-relative line number.
>> - minor bugfixes.
>>
>> New perf-probe syntax is below:
>>
>> perf probe 'PROBE'
>>
>> or
>>
>> perf probe --add 'PROBE'
>>
>> where, PROBE is
>>
>> <source>:<line-number>
>>
>> or
>>
>> <function>[:<rel-lineno>|+<byte-offset>|%return][@<source>]
>>
>> e.g.
>>
>> perf probe 'schedule:10@kernel/sched.c'
>>
>> puts a probe at 10th line from entry line of schedule() function
>> in kernel/sched.c." and
>>
>> perf probe 'vmalloc%return'
>>
>> puts a return probe at the returning of vmalloc.
>>
>> TODO:
>> - Support --line option to show which lines user can probe.
>> - Support lazy string matching.
>
> Ok, i like these fixes and improvements - thanks Masami!

Thanks!
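
For readers who want the PROBE grammar quoted above in executable form, here is a small Python sketch of a parser. It is not the parser in builtin-probe.c. Two points are assumptions made for this sketch: a leading token is treated as a source file only if it contains '.' or '/', and byte offsets are taken to be decimal.

```python
import re

# <function>[:<rel-lineno>|+<byte-offset>|%return][@<source>]
_FUNC_RE = re.compile(
    r"^(?P<function>[A-Za-z_][A-Za-z0-9_]*)"
    r"(?::(?P<rel_line>\d+)|\+(?P<offset>\d+)|(?P<retprobe>%return))?"
    r"(?:@(?P<source>.+))?$"
)

# <source>:<line-number>; assumption: a source name contains '.' or '/'.
_SRC_RE = re.compile(r"^(?P<source>[^:@]*[./][^:@]*):(?P<line>\d+)$")


def parse_probe(spec):
    """Parse one PROBE argument into a dict describing the probe point."""
    m = _SRC_RE.match(spec)
    if m:
        return {"source": m.group("source"), "line": int(m.group("line"))}
    m = _FUNC_RE.match(spec)
    if not m:
        raise ValueError("cannot parse probe point: %r" % spec)
    return {
        "function": m.group("function"),
        "rel_line": int(m.group("rel_line")) if m.group("rel_line") else None,
        "offset": int(m.group("offset")) if m.group("offset") else None,
        "retprobe": m.group("retprobe") is not None,
        "source": m.group("source"),
    }
```

Under these assumptions, 'schedule:10@kernel/sched.c' yields function "schedule" with rel_line 10 and source "kernel/sched.c", and 'vmalloc%return' yields a return probe on vmalloc.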

> One detail i noticed is that we are still not very smart about finding
> our vmlinux file. I booted the kernel on a box and it gave:
>
> aldebaran:~/linux/linux> perf probe schedule
> Fatal: vmlinux/module file open
> aldebaran:~/linux/linux> ls -l vmlinux
> -rwxrwxr-x 1 mingo mingo 21589717 2009-10-29 09:04 vmlinux
>
> Firstly, it should print something more meaningful, such as:
>
> aldebaran:~/linux/linux> perf probe schedule
> Fatal: Could not open vmlinux/module file

Oops, yes, it should. Since I previously used perror() for error messages,
all die() messages in perf probe should be changed to show the actual error.

> Secondly, we should look beyond these places:
>
> 25806 open("/lib/modules/2.6.32-rc5-tip-01760-g2c4b5e0-dirty/build/vmlinux", O_RDONLY) = -1 ENOENT (No such file or directory)
> 25806 open("/usr/lib/debug/lib/modules/2.6.32-rc5-tip-01760-g2c4b5e0-dirty/vmlinux", O_RDONLY) = -1 ENOENT (No such file or directory)
> 25806 open("/boot/vmlinux-debug-2.6.32-rc5-tip-01760-g2c4b5e0-dirty", O_RDONLY) = -1 ENOENT (No such file or directory)
> 25806 write(2, " Fatal: vmlinux/module file open"..., 34) = 34
> 25806 exit_group(128) = ?
>
> and look into the current directory as well.

I think that perf-probe has the -k option for that purpose, doesn't it? :-)
Did you move your vmlinux from the build dir to the current dir?
I assume people usually just install the kernel and run the perf tools,
and tend not to copy vmlinux to the current dir, just IMHO.
But anyway, if other perf commands already search for vmlinux in the current
dir, I'm OK with adding it to the list.
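
As a sketch of what such a search list could look like, here is a short Python example. The three system paths are the ones visible in the strace output quoted above. Putting the current directory at the end of the list is an assumption, and the function names are hypothetical.

```python
import os


def vmlinux_candidates(release, cwd="."):
    """Paths to try for a given kernel release, in order."""
    return [
        "/lib/modules/%s/build/vmlinux" % release,
        "/usr/lib/debug/lib/modules/%s/vmlinux" % release,
        "/boot/vmlinux-debug-%s" % release,
        # Assumed addition: fall back to the current directory last.
        os.path.join(cwd, "vmlinux"),
    ]


def find_vmlinux(release=None, cwd=".", exists=os.path.isfile):
    """Return the first candidate that exists, or None if none does."""
    if release is None:
        release = os.uname().release  # release of the running kernel
    for path in vmlinux_candidates(release, cwd):
        if exists(path):
            return path
    return None
```

The `exists` parameter is only there so the lookup order can be exercised without touching the real filesystem.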

> Thirdly, i think we should expose the build-id of the kernel and the
> path to the vmlinux (and modules) via /proc/build-id or so. That way
> tooling can find the vmlinux file and can validate that it matches the
> kernel's signature. (maybe include a file date as well - that's a faster
> check than a full build-id checksum, especially for large kernels)

That's a good idea!
I think we can expose build-id via /sys/module/*/build-id.


> Another problem i noticed is that a vmlinux without DEBUG_INFO will fail
> in this way:
>
> aldebaran:~/linux/linux> perf probe schedule
> Fatal: Failed to call dwarf_init(). Maybe, not a dwarf file.

Ah, really? I think I broke the need_dwarf logic somehow...

> this is not meaningful to a user. A more usable message would be:
>
> aldebaran:~/linux/linux> perf probe schedule
> Fatal: Have not found dwarf info in the vmlinux - please rebuild with CONFIG_DEBUG_INFO

Sure.

>
> but ... we should really not force DEBUG_INFO for a simple probe that is
> based on an ELF symbol. We already parse ELF symbols so 'perf probe
> schedule' should be able to figure out where to put the probe point.

Agreed, I'll figure out how perf-probe can use ELF symbols too.

> Not forcing debuginfo for simple usecases is extremely important for
> usability.

Yeah, it should not.

>
> And once i had these fixed, i got:
>
> aldebaran:~/linux/linux> perf probe schedule
> Fatal: kprobe_events open
>
> Which is not a meaningful error message either. This error occurred due
> to CONFIG_KPROBE_TRACER not being enabled.
>
> What we want here is two fold:
>
> - enable kprobes event support when perf events is enabled and kprobes
> is enabled. We dont want another config option for it.

Sure, at least that combination should enable kprobe-tracer forcibly.

> - and we need to improve error messages so that users can figure out
> what is the problem.

Sure.

> Once i had this third roadblock out of the way i noticed that the _real_
> reason for the error was that i was not root and had no privilege to
> insert a kprobe.
>
> Once root, the probe worked well:
>
> # perf probe schedule
> Adding new event: p:perfprobe/schedule_0 schedule+0
>
> And it seems to be precise:
>
> aldebaran:/home/mingo> perf stat -e perfprobe:__switch_to_0 -e cs -a ./hackbench 1
> Time: 0.018
>
> Performance counter stats for './hackbench 1':
>
> 7358 perfprobe:__switch_to_0 # 0.000 M/sec
> 7364 context-switches # 0.000 M/sec
>
> 0.119152919 seconds time elapsed
>
> (The difference of 6 context-switches should be investigated i suspect.)

Maybe that is because the two events are enabled and disabled at slightly
different times.

> Btw., very small nit, it would be better to put that sentence into past
> tense:
>
> # perf probe schedule
> Added new event: p:perfprobe/schedule_0 schedule+0
>
> To make sure the user knows that the action has been pursued already.

OK.

> I'd expect the typical user give up much sooner than i did so we really
> need to address these usability details - emit useful error messages and
> be more successful in getting the user what he wants.
>
> But the basic UI is already pretty promising!
>
> A few further (and very small) UI tweaks i'd suggest:
>
> Firstly, could we please make the first probe inserted named plain after
> the symbol it specifies, with no _0 postfix? I.e. instead of:
>
> 7358 perfprobe:__switch_to_0 # 0.000 M/sec
>
> we'd get:
>
> 7358 perfprobe:__switch_to # 0.000 M/sec
>
> Subsequent probes for the same symbol can be named _1, _2 - but the
> first symbol should not have this needless post-fix.

Ah, this postfix is the offset from the symbol. Of course we can
remove it if the offset == 0. Or would you rather make the postfix
a sequence number of the probes on the same symbol?


> Secondly, i think we should remove the 'perf' bit from the probe name.
> I.e. instead of:
>
> 7358 perfprobe:__switch_to # 0.000 M/sec
>
> we should do:
>
> 7358 probe:__switch_to # 0.000 M/sec
>
> as that's really what the user cares about. The user already knows that
> we are in perf - no need to repeat that in every event specification.

Yeah, that's fine with me.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-29 17:16:36

by Frank Ch. Eigler

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Hi -

On Thu, Oct 29, 2009 at 09:53:48AM +0100, Ingo Molnar wrote:
> [...]
> Thirdly, i think we should expose the build-id of the kernel and the
> path to the vmlinux (and modules) via /proc/build-id or so. [...]

See "/sys/kernel/notes" and "/sys/module/*/notes".

- FChE

2009-10-29 19:19:07

by Roland McGrath

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

> Thirdly, i think we should expose the build-id of the kernel and the
> path to the vmlinux (and modules) via /proc/build-id or so. That way
> tooling can find the vmlinux file and can validate that it matches the
> kernel's signature. (maybe include a file date as well - that's a faster
> check than a full build-id checksum, especially for large kernels)

You seem to be confused about what build IDs are. There is no "checksum
validation". Once the bits are stored there is no calculation ever done
again, only exact comparison with an expected build ID bitstring. The size
of an object file is immaterial.

As Frank mentioned, the kernel's and modules' allocated ELF notes (and thus
build IDs) are already exposed in /sys. Tools like "eu-unstrip -nk" use
this information today.
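
As an illustration of how a tool can read the build ID from those files, here is a minimal Python sketch that walks the ELF note records and returns the GNU build-ID note as hex. The note layout (namesz, descsz, type, then name and descriptor each padded to 4 bytes) and the note type 3 for NT_GNU_BUILD_ID are standard. The function names and the default of native byte order are choices made for this sketch.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type of the GNU build-ID note


def _align4(n):
    """Round n up to the next multiple of 4 (note fields are 4-byte padded)."""
    return (n + 3) & ~3


def iter_notes(data, byteorder="="):
    """Yield (name, type, desc) for each ELF note record in data."""
    off = 0
    while off + 12 <= len(data):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", data, off)
        off += 12
        name = data[off:off + namesz].rstrip(b"\0")
        off += _align4(namesz)
        desc = data[off:off + descsz]
        off += _align4(descsz)
        yield name, ntype, desc


def build_id_from_notes(data, byteorder="="):
    """Return the build ID as a hex string, or None if no such note exists."""
    for name, ntype, desc in iter_notes(data, byteorder):
        if name == b"GNU" and ntype == NT_GNU_BUILD_ID:
            return desc.hex()
    return None


def kernel_build_id(path="/sys/kernel/notes"):
    """Read the running kernel's build ID from sysfs."""
    with open(path, "rb") as f:
        return build_id_from_notes(f.read())
```

Matching a vmlinux against the running kernel is then a plain string comparison of two such IDs, with no checksum computed over the file.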


Thanks,
Roland

2009-10-29 20:11:24

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Masami Hiramatsu wrote:
> Ingo Molnar wrote:
>> Another problem i noticed is that a vmlinux without DEBUG_INFO will fail
>> in this way:
>>
>> aldebaran:~/linux/linux> perf probe schedule
>> Fatal: Failed to call dwarf_init(). Maybe, not a dwarf file.
>
> Ah, really? I think I broke need_dwarf logic somehow...

Hmm, I've found that this is done in order to search for (implicitly)
inlined symbols, which means "the behavior is by (bad) design" :-(

I think it should search for the symbol in ELF (or kallsyms) first,
and only if that fails, use Dwarf to search for the symbol again.

Or, it may be enough to just try to set up the probe and, if that fails,
use Dwarf. This way doesn't require any vmlinux access.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-10-29 20:26:20

by Josh Stone

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

On 10/29/2009 01:10 PM, Masami Hiramatsu wrote:
> Masami Hiramatsu wrote:
>> Ingo Molnar wrote:
>>> Another problem i noticed is that a vmlinux without DEBUG_INFO will fail
>>> in this way:
>>>
>>> aldebaran:~/linux/linux> perf probe schedule
>>> Fatal: Failed to call dwarf_init(). Maybe, not a dwarf file.
>>
>> Ah, really? I think I broke need_dwarf logic somehow...
>
> Hmm, I've found that is for searching (implicitly) inlined symbols,
> this means "the behavior is by (bad) design" :-(
>
> I think it should be search the symbol in Elf (or kallsyms) first,
> and only if it fails, use Dwarf for searching the symbol again.
>
> Or, it may be enough that just trying to setup probe and if it fails
> use Dwarf. This way doesn't require any vmlinux access.

Just beware that functions can exist in the symbol table and as inlines
at the same time. For example, we've seen compat_sys_recvmsg get
inlined into compat_sys_socketcall, while it's still compiled as a
standalone function too. So if you have the dwarf, you should still try
to see if inlined versions exist.

Josh

2009-10-29 20:42:39

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Josh Stone wrote:
> On 10/29/2009 01:10 PM, Masami Hiramatsu wrote:
>> Masami Hiramatsu wrote:
>>> Ingo Molnar wrote:
>>>> Another problem i noticed is that a vmlinux without DEBUG_INFO will fail
>>>> in this way:
>>>>
>>>> aldebaran:~/linux/linux> perf probe schedule
>>>> Fatal: Failed to call dwarf_init(). Maybe, not a dwarf file.
>>>
>>> Ah, really? I think I broke need_dwarf logic somehow...
>>
>> Hmm, I've found that is for searching (implicitly) inlined symbols,
>> this means "the behavior is by (bad) design" :-(
>>
>> I think it should be search the symbol in Elf (or kallsyms) first,
>> and only if it fails, use Dwarf for searching the symbol again.
>>
>> Or, it may be enough that just trying to setup probe and if it fails
>> use Dwarf. This way doesn't require any vmlinux access.
>
> Just beware that functions can exist in the symbol table and as inlines
> at the same time. For example, we've seen compat_sys_recvmsg get
> inlined into compat_sys_socketcall, while it's still compiled as a
> standalone function too. So if you have the dwarf, you should still try
> to see if inlined versions exist.

Right, by default, perf-probe should look at Dwarf, since some static
functions may be implicitly compiled as inline, and it is hard to ask the
user to check whether a symbol is inlined or not.

So, this means the current design is not so bad, but in practice we'd better
provide an option which ignores inline functions, e.g. exported functions
are never inlined.
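
The lookup order discussed here can be sketched in a few lines of Python. This is only an illustration of the policy: `dwarf_lookup` stands in for a debuginfo search that is not implemented here, and `ignore_inlines` is a hypothetical name for the proposed option.

```python
def parse_kallsyms(text):
    """Parse /proc/kallsyms-style lines ('addr type name [module]') into a dict."""
    syms = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            # Keep the first address seen for a given name.
            syms.setdefault(parts[2], int(parts[0], 16))
    return syms


def find_probe_addresses(symbol, kallsyms, dwarf_lookup=None, ignore_inlines=False):
    """Return the addresses to probe for symbol.

    Dwarf is consulted first when available, so that inlined copies are
    found too; the symbol table is the fallback, or the only source when
    ignore_inlines is set or no debuginfo lookup is given.
    """
    if dwarf_lookup is not None and not ignore_inlines:
        addrs = dwarf_lookup(symbol)
        if addrs:
            return sorted(addrs)
    addr = kallsyms.get(symbol)
    return [addr] if addr is not None else []
```

With no `dwarf_lookup` at all, the function still resolves plain symbols, which covers the case of a vmlinux built without DEBUG_INFO.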

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-11-02 21:17:19

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes and perf-probe syntax changes

Masami Hiramatsu wrote:
> Ingo Molnar wrote:
>> What we want here is two fold:
>>
>> - enable kprobes event support when perf events is enabled and kprobes
>> is enabled. We dont want another config option for it.
>
> Sure, at least that combination should enable kprobe-tracer forcibly.

Hmm, someone may not want to enable kprobe-tracer. Perhaps,
"default y if (EVENT_PROFILE)" is enough, isn't it?


>> A few further (and very small) UI tweaks i'd suggest:
>>
>> Firstly, could we please make the first probe inserted named plain after
>> the symbol it specifies, with no _0 postfix? I.e. instead of:
>>
>> 7358 perfprobe:__switch_to_0 # 0.000 M/sec
>>
>> we'd get:
>>
>> 7358 perfprobe:__switch_to # 0.000 M/sec
>>
>> Subsequent probes for the same symbol can be named _1, _2 - but the
>> first symbol should not have this needless post-fix.
>
> Ah, this prefix means the offset from the symbol. Of course we can
> remove it if the offset == 0. Or, would you think make the postfix
> sequence number of probes on the same symbol?

If so, we'd better add a --list option first and check whether the
postfix is already in use, since we do not want to overwrite
existing probes, right?
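
A minimal Python sketch of that naming rule, assuming the set of existing event names has already been obtained somehow (for example from the proposed --list). The function name is hypothetical.

```python
def pick_event_name(symbol, existing):
    """Choose an event name for a new probe on symbol.

    The first probe gets the plain symbol name; later ones get _1, _2, ...
    skipping any name that is already taken so nothing is overwritten.
    """
    if symbol not in existing:
        return symbol
    n = 1
    while "%s_%d" % (symbol, n) in existing:
        n += 1
    return "%s_%d" % (symbol, n)
```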

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-11-02 21:26:32

by Frederic Weisbecker

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes and perf-probe syntax changes

On Mon, Nov 02, 2009 at 04:16:25PM -0500, Masami Hiramatsu wrote:
> Masami Hiramatsu wrote:
>> Ingo Molnar wrote:
>>> What we want here is two fold:
>>>
>>> - enable kprobes event support when perf events is enabled and kprobes
>>> is enabled. We dont want another config option for it.
>>
>> Sure, at least that combination should enable kprobe-tracer forcibly.
>
> Hmm, someone may not want to enables kprobe-tracer. Perhaps,
> "default y if (EVENT_PROFILE)" is enough, isn't it?
>


I guess it should be sufficient yeah. We want to strongly recommend
the kprobe events if we have enabled perf, but we don't want to force
it.

Also, in this case perf probe needs to report clearly at runtime when
this tracer is missing from debugfs.

2009-11-02 21:58:07

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes and perf-probe syntax changes

Frederic Weisbecker wrote:
> On Mon, Nov 02, 2009 at 04:16:25PM -0500, Masami Hiramatsu wrote:
>> Masami Hiramatsu wrote:
>>> Ingo Molnar wrote:
>>>> What we want here is two fold:
>>>>
>>>> - enable kprobes event support when perf events is enabled and kprobes
>>>> is enabled. We dont want another config option for it.
>>>
>>> Sure, at least that combination should enable kprobe-tracer forcibly.
>>
>> Hmm, someone may not want to enables kprobe-tracer. Perhaps,
>> "default y if (EVENT_PROFILE)" is enough, isn't it?
>>
>
>
> I guess it should be sufficient yeah. We want to strongly recommend
> the kprobe events if we have enabled perf, but we don't want to force
> it.
>
> Also in this case we need a verbose runtime report of the lack of this
> tracer in debugfs from perf probe if needed.

Sure, the error message should be changed to tell the user to enable
CONFIG_KPROBE_TRACER :-)

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-11-03 00:37:38

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes and perf-probe syntax changes

Masami Hiramatsu wrote:
> Masami Hiramatsu wrote:
>> Ingo Molnar wrote:
>>> What we want here is two fold:
>>>
>>> - enable kprobes event support when perf events is enabled and kprobes
>>> is enabled. We dont want another config option for it.
>>
>> Sure, at least that combination should enable kprobe-tracer forcibly.
>
> Hmm, someone may not want to enables kprobe-tracer. Perhaps,
> "default y if (EVENT_PROFILE)" is enough, isn't it?

Oops, this causes a recursive dependency error :-(

kernel/trace/Kconfig:90:error: found recursive dependency: TRACING ->
EVENT_TRACING -> EVENT_PROFILE -> KPROBE_TRACER -> GENERIC_TRACER -> TRACING

At this time, this kind of weak dependency may not be supported by Kbuild yet.

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]

2009-11-03 07:25:18

by Ingo Molnar

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes


* Roland McGrath <[email protected]> wrote:

> > Thirdly, i think we should expose the build-id of the kernel and the
> > path to the vmlinux (and modules) via /proc/build-id or so. That way
> > tooling can find the vmlinux file and can validate that it matches
> > the kernel's signature. (maybe include a file date as well - that's
> > a faster check than a full build-id checksum, especially for large
> > kernels)
>
> You seem to be confused about what build IDs are. There is no
> "checksum validation". Once the bits are stored there is no
> calculation ever done again, only exact comparison with an expected
> build ID bitstring. The size of an object file is immaterial.
>
> As Frank mentioned, the kernel's and modules' allocated ELF notes (and
> thus build IDs) are already exposed in /sys. Tools like "eu-unstrip
> -nk" use this information today.

Ah, i didnt realize we link with --build-id already, unconditionally,
since v2.6.23 (if ld supports it):

|
| From 18991197b4b588255ccabf472ebc84db7b66a19c Mon Sep 17 00:00:00 2001
| From: Roland McGrath <[email protected]>
| Date: Thu, 19 Jul 2007 01:48:40 -0700
| Subject: [PATCH] Use --build-id ld option
|
| This change passes the --build-id when linking the kernel and when
| linking modules, if ld supports it. This is a new GNU ld option that
| synthesizes an ELF note section inside the read-only data. The note in
| this section contains unique identifying bits called the "build ID",
| which are generated so as to be different for any two linked ELF files
| that aren't identical.
|

So we have an SHA1 build-id already on the vmlinux and on modules, and
it's exposed in /sys/*/*/notes. Just have to make use of it in
tools/perf too.

The other useful addition i mentioned isnt implemented yet: to emit an
ELF note of the absolute path of the output directory the kernel was
built in as well. This information is not available right now, and it
would be a place to look in to search for the vmlinux and the modules.

What do you think? Also, if we do this, is there a standard way to name
it, or should i just pick a suitably new, Linux-specific name for that?

Thanks,

Ingo

2009-11-03 07:33:09

by Ingo Molnar

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes and perf-probe syntax changes


* Masami Hiramatsu <[email protected]> wrote:

> Masami Hiramatsu wrote:
>> Masami Hiramatsu wrote:
>>> Ingo Molnar wrote:
>>>> What we want here is two fold:
>>>>
>>>> - enable kprobes event support when perf events is enabled and kprobes
>>>> is enabled. We dont want another config option for it.
>>>
>>> Sure, at least that combination should enable kprobe-tracer forcibly.
>>
>> Hmm, someone may not want to enables kprobe-tracer. Perhaps,
>> "default y if (EVENT_PROFILE)" is enough, isn't it?
>
> Oops, this causes recursive dependency error :-(
>
> kernel/trace/Kconfig:90:error: found recursive dependency: TRACING ->
> EVENT_TRACING -> EVENT_PROFILE -> KPROBE_TRACER -> GENERIC_TRACER -> TRACING

This dependency problem can be resolved by simply making it 'default y'
- the option itself depends on KPROBES already, which is default-off -
so no need to also make it depend on EVENT_PROFILE.

btw., it would be nice to re-name it to 'KPROBE_EVENTS'. If the probe
point is used as a count - like in the __switch_to example i cited -
there's no tracing going on at all.

Ingo

2009-11-03 12:35:41

by Arnaldo Carvalho de Melo

Subject: Using build-ids in perf tools was Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes

Em Tue, Nov 03, 2009 at 08:24:39AM +0100, Ingo Molnar escreveu:
> * Roland McGrath <[email protected]> wrote:
> > As Frank mentioned, the kernel's and modules' allocated ELF notes (and
> > thus build IDs) are already exposed in /sys. Tools like "eu-unstrip
> > -nk" use this information today.
>
> Ah, i didnt realize we link with --build-id already, unconditionally,
> since v2.6.23 (if ld supports it):
>
> | From 18991197b4b588255ccabf472ebc84db7b66a19c Mon Sep 17 00:00:00 2001
> | Subject: [PATCH] Use --build-id ld option
>
> So we have an SHA1 build-id already on the vmlinux and on modules, and
> it's exposed in /sys/*/*/notes. Just have to make use of it in
> tools/perf too.

I wasn't aware this was done upstream, ass-umed that it was only on
kernel specfiles, will cook up a patch for consideration.

- Arnaldo

2009-11-03 15:07:43

by Masami Hiramatsu

Subject: Re: [PATCH -tip perf/probes 00/10] x86 insn decoder bugfixes and perf-probe syntax changes

Ingo Molnar wrote:
> * Masami Hiramatsu<[email protected]> wrote:
>> Masami Hiramatsu wrote:
>>> Masami Hiramatsu wrote:
>>>> Ingo Molnar wrote:
>>>>> What we want here is two fold:
>>>>>
>>>>> - enable kprobes event support when perf events is enabled and kprobes
>>>>> is enabled. We dont want another config option for it.
>>>>
>>>> Sure, at least that combination should enable kprobe-tracer forcibly.
>>>
>>> Hmm, someone may not want to enables kprobe-tracer. Perhaps,
>>> "default y if (EVENT_PROFILE)" is enough, isn't it?
>>
>> Oops, this causes recursive dependency error :-(
>>
>> kernel/trace/Kconfig:90:error: found recursive dependency: TRACING ->
>> EVENT_TRACING -> EVENT_PROFILE -> KPROBE_TRACER -> GENERIC_TRACER -> TRACING
>
> This dependency problem can be resolved by simply making it 'default y'
> - the option itself depends on KPROBES already, which is default-off -
> so no need to also make it depend on EVENT_PROFILE.

OK,

> btw., it would be nice to re-name it to 'KPROBE_EVENTS'. If the probe
> point is used as a count - like in the __switch_to example i cited -
> there's no tracing going on at all.

Sure, it's not a tracer anyway :-)

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division

e-mail: [email protected]