LinuxLists.cc - [PATCH 1/3] uprobes: add comment with insn opcodes, mnemonics and why we dont support them

2014-07-04 13:04:04

Subject: [PATCH 1/3] uprobes: add comment with insn opcodes, mnemonics and why we dont support them

After adding these, it's clear we have some awkward choices there.
Some valid instructions are prohibited from uprobing while
several invalid ones are allowed.

Hopefully future edits to the good-opcode tables will fix wrong bits
or explain why those bits are not wrong.

No actual code changes.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Jim Keniston <[email protected]>
CC: Masami Hiramatsu <[email protected]>
CC: Srikar Dronamraju <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 153 ++++++++++++++++++++++++++++++++++++++++------
1 file changed, 134 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 5d1cbfe..9ba2fb2 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -66,6 +66,49 @@
* Good-instruction tables for 32-bit apps. This is non-const and volatile
* to keep gcc from statically optimizing it out, as variable_test_bit makes
* some versions of gcc to think only *(unsigned long*) is used.
+ *
+ * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
+ * won't report *prefixes* as OPCODE1(insn).
+ * 0f - 2-byte opcode prefix
+ * 26,2e,36,3e - es:/cs:/ss:/ds:
+ * 64 - fs: (marked as "good", why?)
+ * 65 - gs: (marked as "good", why?)
+ * 66 - operand-size prefix
+ * 67 - address-size prefix
+ * f0 - lock prefix
+ * f2 - repnz (marked as "good", why?)
+ * f3 - rep/repz (marked as "good", why?)
+ *
+ * Opcodes we'll probably never support:
+ * 6c-6f - ins,outs. SEGVs if used in userspace
+ * e4-e7 - in,out imm. SEGVs if used in userspace
+ * ec-ef - in,out acc. SEGVs if used in userspace
+ * cc - int3. SIGTRAP if used in userspace
+ * ce - into. Not used in userspace - no kernel support to make it useful. SEGVs
+ * (why we support bound (62) then? it's similar, and similarly unused...)
+ * f1 - int1. SIGTRAP if used in userspace
+ * f4 - hlt. SEGVs if used in userspace
+ * fa - cli. SEGVs if used in userspace
+ * fb - sti. SEGVs if used in userspace
+ *
+ * Opcodes which need some work to be supported:
+ * 07,17,1f - pop es/ss/ds
+ * Normally not used in userspace, but would execute if used.
+ * Can cause GP or stack exception if tries to load wrong segment descriptor.
+ * We hesitate to run them under single step since kernel's handling
+ * of userspace single-stepping (TF flag) is fragile.
+ * We can easily refuse to support push es/cs/ss/ds (06/0e/16/1e)
+ * on the same grounds that they are never used.
+ * cd - int N.
+ * Used by userspace for "int 80" syscall entry. (Other "int N"
+ * cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
+ * Not supported since kernel's handling of userspace single-stepping
+ * (TF flag) is fragile.
+ * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
+ *
+ * Opcodes which can be enabled right away:
+ * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
+ * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
*/
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
static volatile u32 good_insns_32[256 / 32] = {
@@ -94,7 +137,55 @@ static volatile u32 good_insns_32[256 / 32] = {
#define good_insns_32 NULL
#endif

-/* Good-instruction tables for 64-bit apps */
+/* Good-instruction tables for 64-bit apps.
+ *
+ * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
+ * won't report *prefixes* as OPCODE1(insn).
+ * 0f - 2-byte opcode prefix
+ * 26,2e,36,3e - es:/cs:/ss:/ds:
+ * 40-4f - rex prefixes
+ * 64 - fs: (marked as "good", why?)
+ * 65 - gs: (marked as "good", why?)
+ * 66 - operand-size prefix
+ * 67 - address-size prefix
+ * f0 - lock prefix
+ * f2 - repnz (marked as "good", why?)
+ * f3 - rep/repz (marked as "good", why?)
+ *
+ * Genuinely invalid opcodes:
+ * 06,07 - formerly push/pop es
+ * 0e - formerly push cs
+ * 16,17 - formerly push/pop ss
+ * 1e,1f - formerly push/pop ds
+ * 27,2f,37,3f - formerly daa/das/aaa/aas
+ * 60,61 - formerly pusha/popa
+ * 62 - formerly bound. EVEX prefix for AVX512
+ * 82 - formerly redundant encoding of Group1
+ * 9a - formerly call seg:ofs (marked as "supported"???)
+ * c4,c5 - formerly les/lds. VEX prefixes for AVX
+ * ce - formerly into
+ * d4,d5 - formerly aam/aad
+ * d6 - formerly undocumented salc
+ * ea - formerly jmp seg:ofs (marked as "supported"???)
+ *
+ * Opcodes we'll probably never support:
+ * 6c-6f - ins,outs. SEGVs if used in userspace
+ * e4-e7 - in,out imm. SEGVs if used in userspace
+ * ec-ef - in,out acc. SEGVs if used in userspace
+ * cc - int3. SIGTRAP if used in userspace
+ * f1 - int1. SIGTRAP if used in userspace
+ * f4 - hlt. SEGVs if used in userspace
+ * fa - cli. SEGVs if used in userspace
+ * fb - sti. SEGVs if used in userspace
+ *
+ * Opcodes which need some work to be supported:
+ * cd - int N.
+ * Used by userspace for "int 80" syscall entry. (Other "int N"
+ * cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
+ * Not supported since kernel's handling of userspace single-stepping
+ * (TF flag) is fragile.
+ * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
+ */
#if defined(CONFIG_X86_64)
static volatile u32 good_insns_64[256 / 32] = {
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
@@ -122,7 +213,48 @@ static volatile u32 good_insns_64[256 / 32] = {
#define good_insns_64 NULL
#endif

-/* Using this for both 64-bit and 32-bit apps */
+/* Using this for both 64-bit and 32-bit apps.
+ * Opcodes we don't support:
+ * 0f 00 - SLDT/STR/LLDT/LTR/VERR/VERW/-/- group. System insns
+ * 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group.
+ * Also encodes tons of other system insns if mod=11.
+ * Some are in fact non-system: xend, xtest, rdtscp, maybe more
+ * 0f 02 - lar (why? should be safe, it throws no exceptipons)
+ * 0f 03 - lsl (why? should be safe, it throws no exceptipons)
+ * 0f 04 - undefined
+ * 0f 05 - syscall
+ * 0f 06 - clts (CPL0 insn)
+ * 0f 07 - sysret
+ * 0f 08 - invd (CPL0 insn)
+ * 0f 09 - wbinvd (CPL0 insn)
+ * 0f 0a - undefined
+ * 0f 0b - ud2
+ * 0f 0c - undefined
+ * 0f 0d - prefetchFOO (amd prefetch insns)
+ * 0f 18 - prefetchBAR (intel prefetch insns)
+ * 0f 24 - mov from test regs (perhaps entire 20-27 area can be disabled (special reg ops))
+ * 0f 25 - undefined
+ * 0f 26 - mov to test regs
+ * 0f 27 - undefined
+ * 0f 30 - wrmsr (CPL0 insn)
+ * 0f 34 - sysenter
+ * 0f 35 - sysexit
+ * 0f 36 - undefined
+ * 0f 37 - getsec
+ * 0f 38-3f - 3-byte opcodes (why?? all look safe)
+ * 0f 78 - vmread
+ * 0f 79 - vmwrite
+ * 0f 7a - undefined
+ * 0f 7b - undefined
+ * 0f 7c - undefined
+ * 0f 7d - undefined
+ * 0f a6 - undefined
+ * 0f a7 - undefined
+ * 0f b8 - popcnt (why?? it's an ordinary ALU op)
+ * 0f d0 - undefined
+ * 0f f0 - lddqu (why?? it's an ordinary vector load op)
+ * 0f ff - undefined
+ */
static volatile u32 good_2byte_insns[256 / 32] = {
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
/* ---------------------------------------------- */
@@ -148,23 +280,6 @@ static volatile u32 good_2byte_insns[256 / 32] = {
#undef W

/*
- * opcodes we'll probably never support:
- *
- * 6c-6d, e4-e5, ec-ed - in
- * 6e-6f, e6-e7, ee-ef - out
- * cc, cd - int3, int
- * cf - iret
- * d6 - illegal instruction
- * f1 - int1/icebp
- * f4 - hlt
- * fa, fb - cli, sti
- * 0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2
- *
- * invalid opcodes in 64-bit mode:
- *
- * 06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5
- * 63 - we support this opcode in x86_64 but not in i386.
- *
* opcodes we may need to refine support for:
*
* 0f - 2-byte instructions: For many of these instructions, the validity
--
1.8.1.4

2014-07-04 13:04:22

by Denys Vlasenko

[permalink] [raw]

Subject: [PATCH 3/3] uprobe: fix 2-byte opcode table

Enabled probing of lar, lsl, popcnt, lddqu, prefetch insns.
They should be safe to probe, they throw no exceptions.

Enabled probing of 3-byte opcodes 0f 38-3f xx - these are
vector isns, so should be safe.

Enabled probing of many currently undefined 0f xx insns.
At the rate new vector instructions are getting added,
we don't want to constantly enable more bits.
We want to only occasionally *disable* ones which
for some reason can't be probed.
This includes 0f 24,26 opcodes, which are undefined
since Pentium. On 486, they were "mov to/from test register".

Explained more fully what 0f 78,79 opcodes are.

Explained what 0f ae opcode is. (It's unclear why we don't allow
probing it, but let's not change it for now).

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Jim Keniston <[email protected]>
CC: Masami Hiramatsu <[email protected]>
CC: Srikar Dronamraju <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 52 ++++++++++++++++-------------------------------
1 file changed, 17 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 40dfb4e..11619e1 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -189,61 +189,43 @@ static volatile u32 good_insns_64[256 / 32] = {
* 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group.
* Also encodes tons of other system insns if mod=11.
* Some are in fact non-system: xend, xtest, rdtscp, maybe more
- * 0f 02 - lar (why? should be safe, it throws no exceptipons)
- * 0f 03 - lsl (why? should be safe, it throws no exceptipons)
- * 0f 04 - undefined
* 0f 05 - syscall
* 0f 06 - clts (CPL0 insn)
* 0f 07 - sysret
* 0f 08 - invd (CPL0 insn)
* 0f 09 - wbinvd (CPL0 insn)
- * 0f 0a - undefined
* 0f 0b - ud2
- * 0f 0c - undefined
- * 0f 0d - prefetchFOO (amd prefetch insns)
- * 0f 18 - prefetchBAR (intel prefetch insns)
- * 0f 24 - mov from test regs (perhaps entire 20-27 area can be disabled (special reg ops))
- * 0f 25 - undefined
- * 0f 26 - mov to test regs
- * 0f 27 - undefined
- * 0f 30 - wrmsr (CPL0 insn)
+ * 0f 30 - wrmsr (CPL0 insn) (then why rdmsr is allowed, it's also CPL0 insn?)
* 0f 34 - sysenter
* 0f 35 - sysexit
- * 0f 36 - undefined
* 0f 37 - getsec
- * 0f 38-3f - 3-byte opcodes (why?? all look safe)
- * 0f 78 - vmread
- * 0f 79 - vmwrite
- * 0f 7a - undefined
- * 0f 7b - undefined
- * 0f 7c - undefined
- * 0f 7d - undefined
- * 0f a6 - undefined
- * 0f a7 - undefined
- * 0f b8 - popcnt (why?? it's an ordinary ALU op)
- * 0f d0 - undefined
- * 0f f0 - lddqu (why?? it's an ordinary vector load op)
- * 0f ff - undefined
+ * 0f 78 - vmread (Intel VMX. CPL0 insn)
+ * 0f 79 - vmwrite (Intel VMX. CPL0 insn)
+ * Note: with prefixes, these two opcodes are
+ * extrq/insertq/AVX512 convert vector ops.
+ * 0f ae - group15: [f]xsave,[f]xrstor,[v]{ld,st}mxcsr,clflush[opt],
+ * {rd,wr}{fs,gs}base,{s,l,m}fence.
+ * Why? They are all user-executable.
*/
static volatile u32 good_2byte_insns[256 / 32] = {
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
/* ---------------------------------------------- */
- W(0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1) | /* 00 */
- W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1) , /* 10 */
- W(0x20, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */
- W(0x30, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) , /* 30 */
+ W(0x00, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1) | /* 00 */
+ W(0x10, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 10 */
+ W(0x20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */
+ W(0x30, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) , /* 30 */
W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 60 */
- W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1) , /* 70 */
+ W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1) , /* 70 */
W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
- W(0xa0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* a0 */
- W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
+ W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* a0 */
+ W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* c0 */
- W(0xd0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
+ W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
W(0xe0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* e0 */
- W(0xf0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0) /* f0 */
+ W(0xf0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) /* f0 */
/* ---------------------------------------------- */
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
};
--
1.8.1.4

2014-07-04 13:04:55

by Denys Vlasenko

[permalink] [raw]

Subject: [PATCH 2/3] uprobes: fix 1-byte opcode tables

This change fixes 1-byte opcode tables so that only insns
for which we have real reasons to disallow probing are marked
with unset bits.

To that end:

Set bits for all prefix bytes. Their setting is ignored anyway -
we check the bitmap against OPCODE1(insn), not against first byte.
Keeping them set to 0 only confuses code reader with
"why we don't support that opcode" question.

Thus: enable bytes c4,c5 in 64-bit mode (VEX prefixes).
Byte 62 (EVEX prefix) is not yet enabled since insn decoder
does not support that yet.

For 32-bit mode, enable probing of opcodes 63 (arpl) and d6 (salc).
They don't require any special handling.

For 64-bit mode, disable 9a and ea - these undefined opcodes
were mistakenly left enabled.

Signed-off-by: Denys Vlasenko <[email protected]>
CC: Jim Keniston <[email protected]>
CC: Masami Hiramatsu <[email protected]>
CC: Srikar Dronamraju <[email protected]>
CC: Ingo Molnar <[email protected]>
CC: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 66 +++++++++++++----------------------------------
1 file changed, 18 insertions(+), 48 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9ba2fb2..40dfb4e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -67,18 +67,6 @@
* to keep gcc from statically optimizing it out, as variable_test_bit makes
* some versions of gcc to think only *(unsigned long*) is used.
*
- * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
- * won't report *prefixes* as OPCODE1(insn).
- * 0f - 2-byte opcode prefix
- * 26,2e,36,3e - es:/cs:/ss:/ds:
- * 64 - fs: (marked as "good", why?)
- * 65 - gs: (marked as "good", why?)
- * 66 - operand-size prefix
- * 67 - address-size prefix
- * f0 - lock prefix
- * f2 - repnz (marked as "good", why?)
- * f3 - rep/repz (marked as "good", why?)
- *
* Opcodes we'll probably never support:
* 6c-6f - ins,outs. SEGVs if used in userspace
* e4-e7 - in,out imm. SEGVs if used in userspace
@@ -105,31 +93,27 @@
* Not supported since kernel's handling of userspace single-stepping
* (TF flag) is fragile.
* cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
- *
- * Opcodes which can be enabled right away:
- * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
- * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
*/
#if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
static volatile u32 good_insns_32[256 / 32] = {
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
/* ---------------------------------------------- */
- W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 00 */
+ W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 00 */
W(0x10, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 10 */
- W(0x20, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* 20 */
- W(0x30, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) , /* 30 */
+ W(0x20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */
+ W(0x30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 30 */
W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
- W(0x60, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
+ W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
- W(0xd0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
+ W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
- W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
+ W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
/* ---------------------------------------------- */
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
};
@@ -139,19 +123,6 @@ static volatile u32 good_insns_32[256 / 32] = {

/* Good-instruction tables for 64-bit apps.
*
- * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
- * won't report *prefixes* as OPCODE1(insn).
- * 0f - 2-byte opcode prefix
- * 26,2e,36,3e - es:/cs:/ss:/ds:
- * 40-4f - rex prefixes
- * 64 - fs: (marked as "good", why?)
- * 65 - gs: (marked as "good", why?)
- * 66 - operand-size prefix
- * 67 - address-size prefix
- * f0 - lock prefix
- * f2 - repnz (marked as "good", why?)
- * f3 - rep/repz (marked as "good", why?)
- *
* Genuinely invalid opcodes:
* 06,07 - formerly push/pop es
* 0e - formerly push cs
@@ -159,14 +130,13 @@ static volatile u32 good_insns_32[256 / 32] = {
* 1e,1f - formerly push/pop ds
* 27,2f,37,3f - formerly daa/das/aaa/aas
* 60,61 - formerly pusha/popa
- * 62 - formerly bound. EVEX prefix for AVX512
+ * 62 - formerly bound. EVEX prefix for AVX512 (not yet supported)
* 82 - formerly redundant encoding of Group1
- * 9a - formerly call seg:ofs (marked as "supported"???)
- * c4,c5 - formerly les/lds. VEX prefixes for AVX
+ * 9a - formerly call seg:ofs
* ce - formerly into
* d4,d5 - formerly aam/aad
* d6 - formerly undocumented salc
- * ea - formerly jmp seg:ofs (marked as "supported"???)
+ * ea - formerly jmp seg:ofs
*
* Opcodes we'll probably never support:
* 6c-6f - ins,outs. SEGVs if used in userspace
@@ -190,22 +160,22 @@ static volatile u32 good_insns_32[256 / 32] = {
static volatile u32 good_insns_64[256 / 32] = {
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
/* ---------------------------------------------- */
- W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 00 */
+ W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* 00 */
W(0x10, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 10 */
- W(0x20, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 20 */
- W(0x30, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 30 */
- W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
+ W(0x20, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 20 */
+ W(0x30, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 30 */
+ W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
- W(0x60, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
+ W(0x60, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
W(0x80, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
- W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
+ W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1) , /* 90 */
W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
- W(0xc0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
+ W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
W(0xd0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
- W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
- W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
+ W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0) | /* e0 */
+ W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
/* ---------------------------------------------- */
/* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
};
--
1.8.1.4

2014-07-07 17:12:32

by Jim Keniston

[permalink] [raw]

Subject: Re: [PATCH 1/3] uprobes: add comment with insn opcodes, mnemonics and why we dont support them

On Fri, 2014-07-04 at 15:03 +0200, Denys Vlasenko wrote:
> After adding these, it's clear we have some awkward choices there.
> Some valid instructions are prohibited from uprobing while
> several invalid ones are allowed.
>
> Hopefully future edits to the good-opcode tables will fix wrong bits
> or explain why those bits are not wrong.
>
> No actual code changes.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: Jim Keniston <[email protected]>
> CC: Masami Hiramatsu <[email protected]>
> CC: Srikar Dronamraju <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Oleg Nesterov <[email protected]>
> ---
> arch/x86/kernel/uprobes.c | 153 ++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 134 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 5d1cbfe..9ba2fb2 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -66,6 +66,49 @@
> * Good-instruction tables for 32-bit apps. This is non-const and volatile
> * to keep gcc from statically optimizing it out, as variable_test_bit makes
> * some versions of gcc to think only *(unsigned long*) is used.
> + *
> + * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
> + * won't report *prefixes* as OPCODE1(insn).
> + * 0f - 2-byte opcode prefix
> + * 26,2e,36,3e - es:/cs:/ss:/ds:
> + * 64 - fs: (marked as "good", why?)
> + * 65 - gs: (marked as "good", why?)
> + * 66 - operand-size prefix
> + * 67 - address-size prefix
> + * f0 - lock prefix
> + * f2 - repnz (marked as "good", why?)
> + * f3 - rep/repz (marked as "good", why?)

Quoting my email from May 5:
-----
The good-instruction tables date back 2006-2007. Back then, the
philosophy was to disallow any questionable opcodes, and add them back
into the "good" tables only when a need was demonstrated (i.e., somebody
needed to probe that particular instruction) and we could verify that
probing that instruction worked.

This was before the instruction decoder, so we tended to allow/disallow
prefixes as if they were opcodes. The fs, gs, and (I think) repz and
repnz prefixes were initially marked "bad," but were later changed and
verified as good when the need arose to probe instructions with those
prefixes.

And lacking the instruction decoder, a two-byte opcode was allowed if
any form of it was legitimate.

So (1) I acknowledge that the good/bad-opcode decisions could be much
more precise, and welcome any improvements in that regard; and (2) once
again I think somebody should erect a statue of Masami for the
painstaking work he did on the instruction decoder. Maybe add Masami
AND Denys to Mt. Rushmore. :-)
-----

Also, as noted below, some of the opcodes that were invalid back in
2006-2007 are now valid. However, arpl, lar, and lsl still look
questionable to me.

> + *
> + * Opcodes we'll probably never support:
> + * 6c-6f - ins,outs. SEGVs if used in userspace
> + * e4-e7 - in,out imm. SEGVs if used in userspace
> + * ec-ef - in,out acc. SEGVs if used in userspace
> + * cc - int3. SIGTRAP if used in userspace
> + * ce - into. Not used in userspace - no kernel support to make it useful. SEGVs
> + * (why we support bound (62) then? it's similar, and similarly unused...)

It was probably because I mistakenly thought into produced an exception
unconditionally.

> + * f1 - int1. SIGTRAP if used in userspace
> + * f4 - hlt. SEGVs if used in userspace
> + * fa - cli. SEGVs if used in userspace
> + * fb - sti. SEGVs if used in userspace
> + *
> + * Opcodes which need some work to be supported:
> + * 07,17,1f - pop es/ss/ds
> + * Normally not used in userspace, but would execute if used.
> + * Can cause GP or stack exception if tries to load wrong segment descriptor.
> + * We hesitate to run them under single step since kernel's handling
> + * of userspace single-stepping (TF flag) is fragile.
> + * We can easily refuse to support push es/cs/ss/ds (06/0e/16/1e)
> + * on the same grounds that they are never used.
> + * cd - int N.
> + * Used by userspace for "int 80" syscall entry. (Other "int N"
> + * cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
> + * Not supported since kernel's handling of userspace single-stepping
> + * (TF flag) is fragile.
> + * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
> + *
> + * Opcodes which can be enabled right away:
> + * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).

The AMD64 Architecture Programmer's Manual documents arpl as a system
instruction ("Adjust Requestor Privilege Level"). Is this really valid
for user-space apps?

> + * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
> */
> #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
> static volatile u32 good_insns_32[256 / 32] = {
> @@ -94,7 +137,55 @@ static volatile u32 good_insns_32[256 / 32] = {
> #define good_insns_32 NULL
> #endif
>
> -/* Good-instruction tables for 64-bit apps */
> +/* Good-instruction tables for 64-bit apps.
> + *
> + * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
> + * won't report *prefixes* as OPCODE1(insn).
> + * 0f - 2-byte opcode prefix
> + * 26,2e,36,3e - es:/cs:/ss:/ds:
> + * 40-4f - rex prefixes
> + * 64 - fs: (marked as "good", why?)
> + * 65 - gs: (marked as "good", why?)
> + * 66 - operand-size prefix
> + * 67 - address-size prefix
> + * f0 - lock prefix
> + * f2 - repnz (marked as "good", why?)
> + * f3 - rep/repz (marked as "good", why?)

See above.

> + *
> + * Genuinely invalid opcodes:
> + * 06,07 - formerly push/pop es
> + * 0e - formerly push cs
> + * 16,17 - formerly push/pop ss
> + * 1e,1f - formerly push/pop ds
> + * 27,2f,37,3f - formerly daa/das/aaa/aas
> + * 60,61 - formerly pusha/popa
> + * 62 - formerly bound. EVEX prefix for AVX512
> + * 82 - formerly redundant encoding of Group1
> + * 9a - formerly call seg:ofs (marked as "supported"???)

Correct. This should be invalid for x86_64.

> + * c4,c5 - formerly les/lds. VEX prefixes for AVX
> + * ce - formerly into
> + * d4,d5 - formerly aam/aad
> + * d6 - formerly undocumented salc
> + * ea - formerly jmp seg:ofs (marked as "supported"???)

Correct. This should be invalid for x86_64.

> + *
> + * Opcodes we'll probably never support:
> + * 6c-6f - ins,outs. SEGVs if used in userspace
> + * e4-e7 - in,out imm. SEGVs if used in userspace
> + * ec-ef - in,out acc. SEGVs if used in userspace
> + * cc - int3. SIGTRAP if used in userspace
> + * f1 - int1. SIGTRAP if used in userspace
> + * f4 - hlt. SEGVs if used in userspace
> + * fa - cli. SEGVs if used in userspace
> + * fb - sti. SEGVs if used in userspace
> + *
> + * Opcodes which need some work to be supported:
> + * cd - int N.
> + * Used by userspace for "int 80" syscall entry. (Other "int N"
> + * cause GP -> SEGV since their IDT gates don't allow calls from CPL 3).
> + * Not supported since kernel's handling of userspace single-stepping
> + * (TF flag) is fragile.
> + * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
> + */
> #if defined(CONFIG_X86_64)
> static volatile u32 good_insns_64[256 / 32] = {
> /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
> @@ -122,7 +213,48 @@ static volatile u32 good_insns_64[256 / 32] = {
> #define good_insns_64 NULL
> #endif
>
> -/* Using this for both 64-bit and 32-bit apps */
> +/* Using this for both 64-bit and 32-bit apps.
> + * Opcodes we don't support:
> + * 0f 00 - SLDT/STR/LLDT/LTR/VERR/VERW/-/- group. System insns
> + * 0f 01 - SGDT/SIDT/LGDT/LIDT/SMSW/-/LMSW/INVLPG group.
> + * Also encodes tons of other system insns if mod=11.
> + * Some are in fact non-system: xend, xtest, rdtscp, maybe more
> + * 0f 02 - lar (why? should be safe, it throws no exceptipons)
> + * 0f 03 - lsl (why? should be safe, it throws no exceptipons)

lar (load access right byte) and lsl (load segment limit) look
questionable to me, like arpl.

> + * 0f 04 - undefined
> + * 0f 05 - syscall
> + * 0f 06 - clts (CPL0 insn)
> + * 0f 07 - sysret
> + * 0f 08 - invd (CPL0 insn)
> + * 0f 09 - wbinvd (CPL0 insn)
> + * 0f 0a - undefined
> + * 0f 0b - ud2
> + * 0f 0c - undefined
> + * 0f 0d - prefetchFOO (amd prefetch insns)
> + * 0f 18 - prefetchBAR (intel prefetch insns)
> + * 0f 24 - mov from test regs (perhaps entire 20-27 area can be disabled (special reg ops))
> + * 0f 25 - undefined
> + * 0f 26 - mov to test regs
> + * 0f 27 - undefined
> + * 0f 30 - wrmsr (CPL0 insn)
> + * 0f 34 - sysenter
> + * 0f 35 - sysexit
> + * 0f 36 - undefined
> + * 0f 37 - getsec
> + * 0f 38-3f - 3-byte opcodes (why?? all look safe)

My AMD64 reference back then (AMD64 Architecture Programmer's Manual Vol
3, dated September 2003) showed all those opcodes as invalid.

> + * 0f 78 - vmread
> + * 0f 79 - vmwrite
> + * 0f 7a - undefined
> + * 0f 7b - undefined
> + * 0f 7c - undefined
> + * 0f 7d - undefined
> + * 0f a6 - undefined
> + * 0f a7 - undefined
> + * 0f b8 - popcnt (why?? it's an ordinary ALU op)hinhinkyky

Invalid in 2003 AMD64 ref.

> + * 0f d0 - undefined
> + * 0f f0 - lddqu (why?? it's an ordinary vector load op)

Invalid in 2003 AMD64 ref.

> + * 0f ff - undefined
> + */
> static volatile u32 good_2byte_insns[256 / 32] = {
> /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
> /* ---------------------------------------------- */
> @@ -148,23 +280,6 @@ static volatile u32 good_2byte_insns[256 / 32] = {
> #undef W
>
> /*
> - * opcodes we'll probably never support:
> - *
> - * 6c-6d, e4-e5, ec-ed - in
> - * 6e-6f, e6-e7, ee-ef - out
> - * cc, cd - int3, int
> - * cf - iret
> - * d6 - illegal instruction
> - * f1 - int1/icebp
> - * f4 - hlt
> - * fa, fb - cli, sti
> - * 0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2
> - *
> - * invalid opcodes in 64-bit mode:
> - *
> - * 06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5
> - * 63 - we support this opcode in x86_64 but not in i386.
> - *
> * opcodes we may need to refine support for:
> *
> * 0f - 2-byte instructions: For many of these instructions, the validity

Jim

2014-07-07 18:43:12

by Denys Vlasenko

[permalink] [raw]

Subject: Re: [PATCH 1/3] uprobes: add comment with insn opcodes, mnemonics and why we dont support them

On 07/07/2014 07:12 PM, Jim Keniston wrote:
> Also, as noted below, some of the opcodes that were invalid back in
> 2006-2007 are now valid. However, arpl, lar, and lsl still look
> questionable to me.

arpl, lar, and lsl never throw any exceptions
apart from those which any memory-referencing
arithmetic insn can throw.

In particular, they don't throw exceptions
if their operand is an invalid segment selector.

No additional code needed for them to work
when probed.

Thus I think no harm done enabling them.

>> + * Opcodes which can be enabled right away:
>> + * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
>
> The AMD64 Architecture Programmer's Manual documents arpl as a system
> instruction ("Adjust Requestor Privilege Level"). Is this really valid
> for user-space apps?

"arpl r1,r2" does the following:

"If bits 0-1 of r2 are less than same bits of r1,
set them to bits from r1 and set ZF, else clear ZF.

I doubt anyone uses this insn, but it *is* just a funky
arith op. It doesn't even check whether register values
are valid selectors.

The idea was that it helps to perform "lowering"
of selector privileges when code in comforming
code segment uses "untrusted" (caller-supplied)
selector values to access data.
Example: if CPL-2 caller gave us CPL-0 selector,
convert it to CPL-2 selector.
Some misguided guy at Intel thought it's vitally important to save
a few measly CPU cycles doing this, and created a special insn.

>> + * 0f 38-3f - 3-byte opcodes (why?? all look safe)
>
> My AMD64 reference back then (AMD64 Architecture Programmer's Manual Vol
> 3, dated September 2003) showed all those opcodes as invalid.
>
>> + * 0f 78 - vmread
>> + * 0f 79 - vmwrite
>> + * 0f 7a - undefined
>> + * 0f 7b - undefined
>> + * 0f 7c - undefined
>> + * 0f 7d - undefined
>> + * 0f a6 - undefined
>> + * 0f a7 - undefined
>> + * 0f b8 - popcnt (why?? it's an ordinary ALU op)hinhinkyky
>
> Invalid in 2003 AMD64 ref.
>
>> + * 0f d0 - undefined
>> + * 0f f0 - lddqu (why?? it's an ordinary vector load op)
>
> Invalid in 2003 AMD64 ref.

Correct. There insns are newer than 2003.

2014-07-07 20:43:03

by Jim Keniston

[permalink] [raw]

Subject: Re: [PATCH 2/3] uprobes: fix 1-byte opcode tables

I've at least verified that all your code changes are consistent with
your comments. Ditto for patch #3.

Patches 1-3:
Reviewed-by: Jim Keniston <[email protected]>

On Fri, 2014-07-04 at 15:03 +0200, Denys Vlasenko wrote:
> This change fixes 1-byte opcode tables so that only insns
> for which we have real reasons to disallow probing are marked
> with unset bits.
>
> To that end:
>
> Set bits for all prefix bytes. Their setting is ignored anyway -
> we check the bitmap against OPCODE1(insn), not against first byte.
> Keeping them set to 0 only confuses code reader with
> "why we don't support that opcode" question.
>
> Thus: enable bytes c4,c5 in 64-bit mode (VEX prefixes).
> Byte 62 (EVEX prefix) is not yet enabled since insn decoder
> does not support that yet.
>
> For 32-bit mode, enable probing of opcodes 63 (arpl) and d6 (salc).
> They don't require any special handling.
>
> For 64-bit mode, disable 9a and ea - these undefined opcodes
> were mistakenly left enabled.
>
> Signed-off-by: Denys Vlasenko <[email protected]>
> CC: Jim Keniston <[email protected]>
> CC: Masami Hiramatsu <[email protected]>
> CC: Srikar Dronamraju <[email protected]>
> CC: Ingo Molnar <[email protected]>
> CC: Oleg Nesterov <[email protected]>
> ---
> arch/x86/kernel/uprobes.c | 66 +++++++++++++----------------------------------
> 1 file changed, 18 insertions(+), 48 deletions(-)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 9ba2fb2..40dfb4e 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -67,18 +67,6 @@
> * to keep gcc from statically optimizing it out, as variable_test_bit makes
> * some versions of gcc to think only *(unsigned long*) is used.
> *
> - * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
> - * won't report *prefixes* as OPCODE1(insn).
> - * 0f - 2-byte opcode prefix
> - * 26,2e,36,3e - es:/cs:/ss:/ds:
> - * 64 - fs: (marked as "good", why?)
> - * 65 - gs: (marked as "good", why?)
> - * 66 - operand-size prefix
> - * 67 - address-size prefix
> - * f0 - lock prefix
> - * f2 - repnz (marked as "good", why?)
> - * f3 - rep/repz (marked as "good", why?)
> - *
> * Opcodes we'll probably never support:
> * 6c-6f - ins,outs. SEGVs if used in userspace
> * e4-e7 - in,out imm. SEGVs if used in userspace
> @@ -105,31 +93,27 @@
> * Not supported since kernel's handling of userspace single-stepping
> * (TF flag) is fragile.
> * cf - iret. Normally not used in userspace. Doesn't SEGV unless arguments are bad
> - *
> - * Opcodes which can be enabled right away:
> - * 63 - arpl. This insn has no unusual exceptions (it's basically an arith op).
> - * d6 - salc. Undocumented "sign-extend carry flag to AL" insn
> */
> #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
> static volatile u32 good_insns_32[256 / 32] = {
> /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
> /* ---------------------------------------------- */
> - W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 00 */
> + W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1) | /* 00 */
> W(0x10, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 10 */
> - W(0x20, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) | /* 20 */
> - W(0x30, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1) , /* 30 */
> + W(0x20, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 20 */
> + W(0x30, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 30 */
> W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
> W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
> - W(0x60, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
> + W(0x60, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
> W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
> W(0x80, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
> W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
> W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
> W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
> W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
> - W(0xd0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
> + W(0xd0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
> W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
> - W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
> + W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
> /* ---------------------------------------------- */
> /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
> };
> @@ -139,19 +123,6 @@ static volatile u32 good_insns_32[256 / 32] = {
>
> /* Good-instruction tables for 64-bit apps.
> *
> - * Prefixes. Most marked as "bad", but it doesn't matter, since insn decoder
> - * won't report *prefixes* as OPCODE1(insn).
> - * 0f - 2-byte opcode prefix
> - * 26,2e,36,3e - es:/cs:/ss:/ds:
> - * 40-4f - rex prefixes
> - * 64 - fs: (marked as "good", why?)
> - * 65 - gs: (marked as "good", why?)
> - * 66 - operand-size prefix
> - * 67 - address-size prefix
> - * f0 - lock prefix
> - * f2 - repnz (marked as "good", why?)
> - * f3 - rep/repz (marked as "good", why?)
> - *
> * Genuinely invalid opcodes:
> * 06,07 - formerly push/pop es
> * 0e - formerly push cs
> @@ -159,14 +130,13 @@ static volatile u32 good_insns_32[256 / 32] = {
> * 1e,1f - formerly push/pop ds
> * 27,2f,37,3f - formerly daa/das/aaa/aas
> * 60,61 - formerly pusha/popa
> - * 62 - formerly bound. EVEX prefix for AVX512
> + * 62 - formerly bound. EVEX prefix for AVX512 (not yet supported)
> * 82 - formerly redundant encoding of Group1
> - * 9a - formerly call seg:ofs (marked as "supported"???)
> - * c4,c5 - formerly les/lds. VEX prefixes for AVX
> + * 9a - formerly call seg:ofs
> * ce - formerly into
> * d4,d5 - formerly aam/aad
> * d6 - formerly undocumented salc
> - * ea - formerly jmp seg:ofs (marked as "supported"???)
> + * ea - formerly jmp seg:ofs
> *
> * Opcodes we'll probably never support:
> * 6c-6f - ins,outs. SEGVs if used in userspace
> @@ -190,22 +160,22 @@ static volatile u32 good_insns_32[256 / 32] = {
> static volatile u32 good_insns_64[256 / 32] = {
> /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
> /* ---------------------------------------------- */
> - W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 00 */
> + W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1) | /* 00 */
> W(0x10, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 10 */
> - W(0x20, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 20 */
> - W(0x30, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) , /* 30 */
> - W(0x40, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0) | /* 40 */
> + W(0x20, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 20 */
> + W(0x30, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) , /* 30 */
> + W(0x40, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 40 */
> W(0x50, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 50 */
> - W(0x60, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
> + W(0x60, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* 60 */
> W(0x70, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 70 */
> W(0x80, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* 80 */
> - W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* 90 */
> + W(0x90, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1) , /* 90 */
> W(0xa0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) | /* a0 */
> W(0xb0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* b0 */
> - W(0xc0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
> + W(0xc0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0) | /* c0 */
> W(0xd0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1) , /* d0 */
> - W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0) | /* e0 */
> - W(0xf0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
> + W(0xe0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0) | /* e0 */
> + W(0xf0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1) /* f0 */
> /* ---------------------------------------------- */
> /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */
> };

2014-07-28 15:27:06

by Oleg Nesterov

[permalink] [raw]

Subject: Re: [PATCH 2/3] uprobes: fix 1-byte opcode tables

On 07/07, Jim Keniston wrote:
>
> I've at least verified that all your code changes are consistent with
> your comments. Ditto for patch #3.
>
> Patches 1-3:
> Reviewed-by: Jim Keniston <[email protected]>

Sorry for delay.

I am going to apply this series. I'll ask to pull this later, probably
we willhave more changes.

Thanks Denys and Jim.

Oleg.