2022-09-08 01:38:24

by Masami Hiramatsu

Subject: [PATCH v2 0/2] x86/kprobes: Fixes for CONFIG_RETHUNK

Hi,

Here is the 2nd version of the patches to fix kprobes and optprobe with
CONFIG_RETHUNK and CONFIG_SLS.
Previous version is here:

https://lore.kernel.org/all/166251211081.632004.1842371136165709807.stgit@devnote2/

In this version, I updated the Fixes tag and the decoding function to
memorize all branch targets and decode those code blocks, based on
Peter's idea. (Thanks!)

With these configs, kernel functions may include INT3 instructions for
stopping speculative execution in the function code block (body), in
addition to the gaps between functions.

Since kprobes on x86 has to ensure the probe address is on an instruction
boundary, it decodes the instructions in the function up to that address.
If it finds an INT3 that was not inserted by kprobes, it stops decoding,
because usually such an INT3 is used for debugging as a software
breakpoint and replaces the first byte of an original instruction.
Without recovering that instruction, kprobes cannot continue decoding.
Thus kprobes returns -EILSEQ as below.


# echo "p:probe/vfs_truncate_L19 vfs_truncate+98" >> kprobe_events
sh: write error: Invalid or incomplete multibyte or wide character


Actually, those INT3s are just for padding and can be ignored.

To avoid this issue, memorize the branch target addresses during decoding,
and when an INT3 is found, restart decoding from an unchecked target
address so that all the code blocks in the function are decoded (in both
kprobes and optprobe).
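
To illustrate the idea outside the kernel, here is a tiny user-space C
sketch with an invented toy instruction format (three opcodes only, not
the real x86 decoder); like the patches below, it memorizes branch
targets while decoding and restarts from an unchecked target whenever it
hits an INT3:

#include <stdio.h>
#include <stdbool.h>

#define FUNC_SIZE 16
#define OP_NOP   0x01		/* 1-byte instruction */
#define OP_BR    0x02		/* 2-byte branch: 0x02 <rel8 from next insn> */
#define OP_INT3  0xCC		/* padding / speculation-stop INT3 */

static bool boundary[FUNC_SIZE];	/* decoded instruction boundaries */
static int queue[FUNC_SIZE];		/* code-block start addresses */
static int qlen;

static void push_block(int addr)
{
	for (int i = 0; i < qlen; i++)
		if (queue[i] == addr)
			return;			/* already queued */
	queue[qlen++] = addr;
}

static void decode_all(const unsigned char *code)
{
	push_block(0);				/* function entry */
	for (int qi = 0; qi < qlen; qi++) {
		int addr = queue[qi];
		/* stop when running into already-decoded code */
		while (addr < FUNC_SIZE && !boundary[addr]) {
			boundary[addr] = true;
			if (code[addr] == OP_INT3)
				break;	/* filler INT3: try the next unchecked block */
			if (code[addr] == OP_BR && addr + 1 < FUNC_SIZE) {
				int target = addr + 2 + (signed char)code[addr + 1];

				if (target > 0 && target < FUNC_SIZE)
					push_block(target);	/* memorize the target */
				addr += 2;
			} else {
				addr += 1;
			}
		}
	}
}

int main(void)
{
	/* 0: nop, 1: br +5 (-> 8), 3..5: nop, 6..7: int3, 8..: cold block */
	unsigned char code[FUNC_SIZE] = {
		OP_NOP, OP_BR, 5, OP_NOP, OP_NOP, OP_NOP, OP_INT3, OP_INT3,
		OP_NOP, OP_NOP, OP_NOP, OP_NOP, OP_NOP, OP_NOP, OP_NOP, OP_NOP,
	};

	decode_all(code);
	/* A linear decode stops at offset 6; the memorized target keeps going. */
	printf("offset 8 is %sa known instruction boundary\n",
	       boundary[8] ? "" : "not ");
	return 0;
}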

With these fixes, kprobe and optprobe can probe the kernel again with
CONFIG_RETHUNK=y.


# echo "p:probe/vfs_truncate_L19 vfs_truncate+98" >> kprobe_events
# echo 1 > events/probe/vfs_truncate_L19/enable
# cat /sys/kernel/debug/kprobes/list
ffffffff81307b52 k vfs_truncate+0x62 [OPTIMIZED]


Thank you,

---

Masami Hiramatsu (Google) (2):
x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK
x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK


arch/x86/kernel/kprobes/common.h | 28 +++++
arch/x86/kernel/kprobes/core.c | 227 +++++++++++++++++++++++++++++++++-----
arch/x86/kernel/kprobes/opt.c | 96 ++++------------
3 files changed, 249 insertions(+), 102 deletions(-)

--
Masami Hiramatsu (Google) <[email protected]>


2022-09-08 01:38:40

by Masami Hiramatsu

Subject: [PATCH v2 2/2] x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK

From: Masami Hiramatsu (Google) <[email protected]>

Since CONFIG_RETHUNK and CONFIG_SLS use INT3 for stopping speculative
execution after a function return, kprobe jump optimization always fails
on functions that have such an INT3 inside the function body. (It already
checks the INT3 padding between functions, but not inside the function.)

To avoid this issue, decode all the code blocks in the function, as
kprobes does, to check whether the kprobe can be optimized.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Cc: [email protected]
---
Changes in v2:
- Reuse the kprobes decoding function.
---
arch/x86/kernel/kprobes/opt.c | 73 +++++++++++++----------------------------
1 file changed, 24 insertions(+), 49 deletions(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 2e41850cab06..14f8d2c6630a 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -260,25 +260,36 @@ static int insn_is_indirect_jump(struct insn *insn)
return ret;
}

-static bool is_padding_int3(unsigned long addr, unsigned long eaddr)
+static int optimize_check_cb(struct insn *insn, void *data)
{
- unsigned char ops;
+ unsigned long paddr = (unsigned long)data;

- for (; addr < eaddr; addr++) {
- if (get_kernel_nofault(ops, (void *)addr) < 0 ||
- ops != INT3_INSN_OPCODE)
- return false;
- }
+ if (search_exception_tables((unsigned long)insn->kaddr))
+ /*
+ * Since some fixup code will jumps into this function,
+ * we can't optimize kprobe in this function.
+ */
+ return 1;
+
+ /* Check any instructions don't jump into target */
+ if (insn_is_indirect_jump(insn) ||
+ insn_jump_into_range(insn, paddr + INT3_INSN_SIZE,
+ DISP32_SIZE))
+ return 1;
+
+ /* Check whether an INT3 is involved. */
+ if (insn->opcode.bytes[0] == INT3_INSN_OPCODE &&
+ paddr <= (unsigned long)insn->kaddr &&
+ (unsigned long)insn->kaddr <= paddr + JMP32_INSN_SIZE)
+ return 1;

- return true;
+ return 0;
}

/* Decode whole function to ensure any instructions don't jump into target */
static int can_optimize(unsigned long paddr)
{
- unsigned long addr, size = 0, offset = 0;
- struct insn insn;
- kprobe_opcode_t buf[MAX_INSN_SIZE];
+ unsigned long size = 0, offset = 0;

/* Lookup symbol including addr */
if (!kallsyms_lookup_size_offset(paddr, &size, &offset))
@@ -296,44 +307,8 @@ static int can_optimize(unsigned long paddr)
if (size - offset < JMP32_INSN_SIZE)
return 0;

- /* Decode instructions */
- addr = paddr - offset;
- while (addr < paddr - offset + size) { /* Decode until function end */
- unsigned long recovered_insn;
- int ret;
-
- if (search_exception_tables(addr))
- /*
- * Since some fixup code will jumps into this function,
- * we can't optimize kprobe in this function.
- */
- return 0;
- recovered_insn = recover_probed_instruction(buf, addr);
- if (!recovered_insn)
- return 0;
-
- ret = insn_decode_kernel(&insn, (void *)recovered_insn);
- if (ret < 0)
- return 0;
-
- /*
- * In the case of detecting unknown breakpoint, this could be
- * a padding INT3 between functions. Let's check that all the
- * rest of the bytes are also INT3.
- */
- if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
- return is_padding_int3(addr, paddr - offset + size) ? 1 : 0;
-
- /* Recover address */
- insn.kaddr = (void *)addr;
- insn.next_byte = (void *)(addr + insn.length);
- /* Check any instructions don't jump into target */
- if (insn_is_indirect_jump(&insn) ||
- insn_jump_into_range(&insn, paddr + INT3_INSN_SIZE,
- DISP32_SIZE))
- return 0;
- addr += insn.length;
- }
+ if (every_insn_in_func(paddr, optimize_check_cb, (void *)paddr))
+ return 0;

return 1;
}

2022-09-08 02:02:56

by Masami Hiramatsu

Subject: [PATCH v2 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK

From: Masami Hiramatsu (Google) <[email protected]>

Since CONFIG_RETHUNK and CONFIG_SLS use INT3 for stopping speculative
execution after a RET instruction, kprobes always fails to check the
probed instruction boundary by decoding the function body if the probed
address is after such a sequence. (Note that some conditional code blocks
will be placed after the function return if the compiler decides they are
not on the hot path.)

This is because kprobes expects that someone (e.g. kgdb) has put the INT3
there as a software breakpoint, replacing the original instruction. But
these INT3s serve no such purpose, so there is no original instruction to
recover.

To avoid this issue, memorize the branch target addresses during decoding,
and when an INT3 is found, restart decoding from an unchecked target
address.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Cc: [email protected]
---
Changes in v2:
- Use GNU AS names for mnemonics in comments
- Decode all code blocks in the function to find the instruction boundary
- Update Fixes tag to specify CONFIG_SLS.
---
arch/x86/kernel/kprobes/common.h | 28 +++++
arch/x86/kernel/kprobes/core.c | 227 +++++++++++++++++++++++++++++++++-----
arch/x86/kernel/kprobes/opt.c | 23 ----
3 files changed, 225 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kernel/kprobes/common.h b/arch/x86/kernel/kprobes/common.h
index c993521d4933..c0505e22c0db 100644
--- a/arch/x86/kernel/kprobes/common.h
+++ b/arch/x86/kernel/kprobes/common.h
@@ -92,6 +92,34 @@ extern int __copy_instruction(u8 *dest, u8 *src, u8 *real, struct insn *insn);
extern void synthesize_reljump(void *dest, void *from, void *to);
extern void synthesize_relcall(void *dest, void *from, void *to);

+/* Return the jump target address or 0 */
+static inline unsigned long insn_get_branch_addr(struct insn *insn)
+{
+ switch (insn->opcode.bytes[0]) {
+ case 0xe0: /* loopne */
+ case 0xe1: /* loope */
+ case 0xe2: /* loop */
+ case 0xe3: /* Jcxz */
+ case 0xe9: /* JMP.d32 */
+ case 0xeb: /* JMP.d8 */
+ break;
+ case 0x0f:
+ if ((insn->opcode.bytes[1] & 0xf0) == 0x80) /* Jcc.d32 */
+ break;
+ return 0;
+ case 0x70 ... 0x7f: /* Jcc.d8 */
+ break;
+
+ default:
+ return 0;
+ }
+ return (unsigned long)insn->next_byte + insn->immediate.value;
+}
+
+int every_insn_in_func(unsigned long faddr,
+ int (*callback)(struct insn *, void *),
+ void *data);
+
#ifdef CONFIG_OPTPROBES
extern int setup_detour_execution(struct kprobe *p, struct pt_regs *regs, int reenter);
extern unsigned long __recover_optprobed_insn(kprobe_opcode_t *buf, unsigned long addr);
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4c3c27b6aea3..36e8a3de8f92 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -252,47 +252,210 @@ unsigned long recover_probed_instruction(kprobe_opcode_t *buf, unsigned long add
return __recover_probed_insn(buf, addr);
}

-/* Check if paddr is at an instruction boundary */
-static int can_probe(unsigned long paddr)
+/* Code block queue */
+struct cbqueue {
+ int size;
+ int next;
+ unsigned long *addr;
+};
+#define INIT_CBQ_SIZE 32
+/* The top most bit is used for unchecked bit. */
+#define CBQ_ADDR_MASK ((-1UL) >> 1)
+#define CBQ_UNCHK_BIT (~CBQ_ADDR_MASK)
+
+static struct cbqueue *cbq_alloc(int size)
{
- unsigned long addr, __addr, offset = 0;
- struct insn insn;
- kprobe_opcode_t buf[MAX_INSN_SIZE];
+ struct cbqueue *q;

- if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
- return 0;
+ q = kzalloc(sizeof(*q), GFP_KERNEL);
+ if (!q)
+ return NULL;
+ q->size = size;
+ q->addr = kcalloc(size, sizeof(unsigned long), GFP_KERNEL);
+ if (!q->addr) {
+ kfree(q);
+ return NULL;
+ }
+ return q;
+}

- /* Decode instructions */
- addr = paddr - offset;
- while (addr < paddr) {
- int ret;
+static void cbq_free(struct cbqueue *q)
+{
+ kfree(q->addr);
+ kfree(q);
+}

- /*
- * Check if the instruction has been modified by another
- * kprobe, in which case we replace the breakpoint by the
- * original instruction in our buffer.
- * Also, jump optimization will change the breakpoint to
- * relative-jump. Since the relative-jump itself is
- * normally used, we just go through if there is no kprobe.
- */
- __addr = recover_probed_instruction(buf, addr);
- if (!__addr)
- return 0;
+static int cbq_expand(struct cbqueue *q, int newsize)
+{
+ if (q->size > newsize)
+ return -ENOSPC;

- ret = insn_decode_kernel(&insn, (void *)__addr);
- if (ret < 0)
- return 0;
+ q->addr = krealloc_array(q->addr,
+ newsize, sizeof(unsigned long), GFP_KERNEL);
+ if (!q->addr)
+ return -ENOMEM;
+ q->size = newsize;
+ return 0;
+}

- /*
- * Another debugging subsystem might insert this breakpoint.
- * In that case, we can't recover it.
- */
- if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
+/* Add a new code block address */
+static int cbq_push(struct cbqueue *q, unsigned long addr)
+{
+ int i;
+
+ /* Check the addr exists */
+ for (i = 0; i < q->next; i++)
+ if ((CBQ_ADDR_MASK & (q->addr[i] ^ addr)) == 0)
return 0;
- addr += insn.length;
+
+ if (q->next == q->size &&
+ cbq_expand(q, q->size * 2) < 0)
+ return -ENOMEM;
+
+ WARN_ON_ONCE(!(CBQ_UNCHK_BIT & addr));
+ q->addr[q->next++] = addr;
+ return 0;
+}
+
+/* Return the first unchecked code block address */
+static unsigned long cbq_pop(struct cbqueue *q)
+{
+ unsigned long addr = 0;
+ int i;
+
+ for (i = 0; i < q->next; i++) {
+ if (CBQ_UNCHK_BIT & q->addr[i]) {
+ addr = q->addr[i];
+ q->addr[i] &= CBQ_ADDR_MASK;
+ break;
+ }
+ }
+
+ return addr;
+}
+
+/* Mark the address is checked, and return true if it is already checked. */
+static bool cbq_check(struct cbqueue *q, unsigned long addr)
+{
+ int i;
+
+ for (i = 0; i < q->next; i++) {
+ if ((CBQ_ADDR_MASK & (q->addr[i] ^ addr)) == 0) {
+ if (!(CBQ_UNCHK_BIT & q->addr[i]))
+ return true;
+ q->addr[i] &= CBQ_ADDR_MASK;
+ break;
+ }
+ }
+ return false;
+}
+
+static void __decode_insn(struct insn *insn, kprobe_opcode_t *buf,
+ unsigned long addr)
+{
+ unsigned long recovered_insn;
+
+ /*
+ * Check if the instruction has been modified by another
+ * kprobe, in which case we replace the breakpoint by the
+ * original instruction in our buffer.
+ * Also, jump optimization will change the breakpoint to
+ * relative-jump. Since the relative-jump itself is
+ * normally used, we just go through if there is no kprobe.
+ */
+ recovered_insn = recover_probed_instruction(buf, addr);
+ if (!recovered_insn ||
+ insn_decode_kernel(insn, (void *)recovered_insn) < 0) {
+ insn->kaddr = NULL;
+ } else {
+ /* Recover address */
+ insn->kaddr = (void *)addr;
+ insn->next_byte = (void *)(addr + insn->length);
}
+}
+
+/* Iterate instructions in [saddr, eaddr), insn->next_byte is loop cursor. */
+#define for_each_insn(insn, saddr, eaddr, buf) \
+ for (__decode_insn(insn, buf, saddr); \
+ (insn)->kaddr && (unsigned long)(insn)->next_byte < eaddr; \
+ __decode_insn(insn, buf, (unsigned long)(insn)->next_byte))
+
+/**
+ * every_insn_in_func - iterate over every instruction in the function
+ * @faddr: Address in the target function (no need to be the entry address)
+ * @callback: Callback function on each instruction
+ * @data: Data to be passed to @callback
+ *
+ * Return 0, an error (< 0), or the value returned by @callback.
+ * If @callback returns !0, this stops decoding and returns that value.
+ * The decoding address order is not always incremental because it follows
+ * branch targets in the function.
+ */
+int every_insn_in_func(unsigned long faddr,
+ int (*callback)(struct insn *, void *),
+ void *data)
+{
+ unsigned long start, entry, end, dest, offset = 0, size = 0;
+ kprobe_opcode_t buf[MAX_INSN_SIZE];
+ struct cbqueue *q;
+ struct insn insn;
+ int ret;
+
+ ret = kallsyms_lookup_size_offset(faddr, &size, &offset);
+ if (ret < 0)
+ return ret;

- return (addr == paddr);
+ q = cbq_alloc(INIT_CBQ_SIZE);
+ if (!q)
+ return -ENOMEM;
+
+ entry = faddr - offset;
+ end = faddr - offset + size;
+ cbq_push(q, entry);
+
+ while ((start = cbq_pop(q))) {
+ for_each_insn(&insn, start, end, buf) {
+ /*
+ * If this instruction is already checked, decode
+ * other unchecked code blocks.
+ */
+ if (start != (unsigned long)insn.kaddr &&
+ cbq_check(q, (unsigned long)insn.kaddr))
+ break;
+
+ ret = callback(&insn, data);
+ if (ret)
+ goto end;
+
+ dest = insn_get_branch_addr(&insn);
+ if (entry < dest && dest < end)
+ cbq_push(q, dest);
+
+ /*
+ * Hit an INT3, which cannot be decoded because it might
+ * be installed by another debug feature, or it is just
+ * there for trapping speculative execution.
+ * So, let's decode the other unchecked code blocks.
+ */
+ if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
+ break;
+ }
+ }
+ ret = 0;
+end:
+ cbq_free(q);
+ return ret;
+}
+
+static int insn_boundary_cb(struct insn *insn, void *paddr)
+{
+ return insn->kaddr == paddr;
+}
+
+/* Check if paddr is at an instruction boundary */
+static int can_probe(unsigned long paddr)
+{
+ return !(every_insn_in_func(paddr, insn_boundary_cb, (void *)paddr) <= 0);
}

/* If x86 supports IBT (ENDBR) it must be skipped. */
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index e6b8c5362b94..2e41850cab06 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -235,28 +235,9 @@ static int __insn_is_indirect_jump(struct insn *insn)
/* Check whether insn jumps into specified address range */
static int insn_jump_into_range(struct insn *insn, unsigned long start, int len)
{
- unsigned long target = 0;
-
- switch (insn->opcode.bytes[0]) {
- case 0xe0: /* loopne */
- case 0xe1: /* loope */
- case 0xe2: /* loop */
- case 0xe3: /* jcxz */
- case 0xe9: /* near relative jump */
- case 0xeb: /* short relative jump */
- break;
- case 0x0f:
- if ((insn->opcode.bytes[1] & 0xf0) == 0x80) /* jcc near */
- break;
- return 0;
- default:
- if ((insn->opcode.bytes[0] & 0xf0) == 0x70) /* jcc short */
- break;
- return 0;
- }
- target = (unsigned long)insn->next_byte + insn->immediate.value;
+ unsigned long target = insn_get_branch_addr(insn);

- return (start <= target && target <= start + len);
+ return target ? (start <= target && target <= start + len) : 0;
}

static int insn_is_indirect_jump(struct insn *insn)

2022-09-08 05:42:08

by Josh Poimboeuf

Subject: Re: [PATCH v2 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK

On Thu, Sep 08, 2022 at 10:34:43AM +0900, Masami Hiramatsu (Google) wrote:
> From: Masami Hiramatsu (Google) <[email protected]>
>
> Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
> speculative execution after RET instruction, kprobes always failes to
> check the probed instruction boundary by decoding the function body if
> the probed address is after such sequence. (Note that some conditional
> code blocks will be placed after function return, if compiler decides
> it is not on the hot path.)
>
> This is because kprobes expects someone (e.g. kgdb) puts the INT3 as
> a software breakpoint and it will replace the original instruction.
> But these INT3 are not such purpose, it doesn't need to recover the
> original instruction.
>
> To avoid this issue, memorize the branch target address during decoding
> and if there is INT3, restart decoding from unchecked target address.

Hm, is kprobes conflicting with kgdb actually a realistic concern?
Seems like a dangerous combination.

Either way, this feels overengineered. Sort of like implementing
objtool in the kernel.

And it's incomplete: for a switch statement jump table (or C goto jump
table like in BPF), you can't detect the potential targets of the
indirect branch.

Wouldn't it be much simpler to just encode the knowledge that

if (CONFIG_RETHUNK && !X86_FEATURE_RETHUNK)
// all rets are followed by four INT3s
else if (CONFIG_SLS)
// all rets are followed by one INT3

?

--
Josh

2022-09-08 09:41:13

by Peter Zijlstra

Subject: Re: [PATCH v2 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK

On Wed, Sep 07, 2022 at 10:08:55PM -0700, Josh Poimboeuf wrote:
> On Thu, Sep 08, 2022 at 10:34:43AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
> > speculative execution after RET instruction, kprobes always failes to
> > check the probed instruction boundary by decoding the function body if
> > the probed address is after such sequence. (Note that some conditional
> > code blocks will be placed after function return, if compiler decides
> > it is not on the hot path.)
> >
> > This is because kprobes expects someone (e.g. kgdb) puts the INT3 as
> > a software breakpoint and it will replace the original instruction.
> > But these INT3 are not such purpose, it doesn't need to recover the
> > original instruction.
> >
> > To avoid this issue, memorize the branch target address during decoding
> > and if there is INT3, restart decoding from unchecked target address.
>
> Hm, is kprobes conflicting with kgdb actually a realistic concern?
> Seems like a dangerous combination

Yeah, not in my book. If you use kgdb you'd better be ready to hold
pieces anyway, that thing is terrible.

> Wouldn't it be much simpler to just encode the knowledge that
>
> if (CONFIG_RETHUNK && !X86_FEATURE_RETHUNK)
> // all rets are followed by four INT3s
> else if (CONFIG_SLS)
> // all rets are followed by one INT3

At the very least that doesn't deal with the INT3 after JMP thing the
compilers should do once they catch up. This issue seems to keep getting
lost, but is now part of the AMD Branch Type Confusion (it was disclosed
in an earlier thing, but I keep forgetting what that was called).

Once that lands the rules are:

  0-5 INT3 after RET:  !CONFIG_RETHUNK && !CONFIG_SLS: 0
                       CONFIG_SLS: 1
                       CONFIG_RETHUNK: 4-5 depending on compiler version

  0-1 INT3 after RET:  !CONFIG_SLS: 0
                       CONFIG_SLS: 0-1 depending on compiler version

Now, given we know the compiler version at build time, this could be
determined and used in kprobes, but meh.

We also *should* put an INT3 after indirect jumps when patching the
retpolines. We don't appear to do so, but that's recommended even for
Intel I think.

Let me go do a patch.

2022-09-08 10:39:07

by Peter Zijlstra

Subject: [PATCH] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

On Thu, Sep 08, 2022 at 11:30:41AM +0200, Peter Zijlstra wrote:
> Let me go do a patch.

---
Subject: x86,retpoline: Be sure to emit INT3 after JMP *%\reg

Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure
to emit one when rewriting the retpoline JMP irrespective of compiler
SLS options or even CONFIG_SLS.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/kernel/alternative.c | 9 +++++++++
arch/x86/net/bpf_jit_comp.c | 3 ++-
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 62f6b8b7c4a5..68d84cf8e001 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -453,6 +453,15 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
return ret;
i += ret;

+ /*
+ * The compiler is supposed to EMIT an INT3 after every unconditional
+ * JMP instruction due to AMD BTC. However, if the compiler is too old
+ * or SLS isn't enabled, we still need an INT3 after indirect JMPs
+ * even on Intel.
+ */
+ if (op == JMP32_INSN_OPCODE && i < insn->length)
+ bytes[i++] = INT3_INSN_OPCODE;
+
for (; i < insn->length;)
bytes[i++] = BYTES_NOP1;

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c1f6c1c51d99..37f821dee68f 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -419,7 +419,8 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
OPTIMIZER_HIDE_VAR(reg);
emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
} else {
- EMIT2(0xFF, 0xE0 + reg);
+ EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
+ EMIT1(0xCC); /* int3 */
}

*pprog = prog;

2022-09-08 10:53:14

by Peter Zijlstra

Subject: Re: [PATCH v2 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK

On Thu, Sep 08, 2022 at 11:30:41AM +0200, Peter Zijlstra wrote:
> Once that lands the rules are:
>
> 0-5 INT3 after RET, !CONFIG_RETHUNK && !CONFIG_SLS: 0
> CONFIG_SLS: 1
> CONFIG_RETHUNK: 4-5 depending on compiler version
>
> 0-1 INT3 after RET: !CONFIG_SLS: 0
> CONFIG_SLS: 0-1 depending on compiler version
>
> Now, given we know the compiler version at build time, this could be
> determined and used in kprobes, but meh.

Oh also, for giggles, we have a number of sites that have ret;int3
independent of CONFIG_SLS because that was easier than figuring out what
all should be done.

2022-09-08 13:40:33

by Masami Hiramatsu

Subject: Re: [PATCH v2 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK

On Wed, 7 Sep 2022 22:08:55 -0700
Josh Poimboeuf <[email protected]> wrote:

> On Thu, Sep 08, 2022 at 10:34:43AM +0900, Masami Hiramatsu (Google) wrote:
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
> > speculative execution after RET instruction, kprobes always failes to
> > check the probed instruction boundary by decoding the function body if
> > the probed address is after such sequence. (Note that some conditional
> > code blocks will be placed after function return, if compiler decides
> > it is not on the hot path.)
> >
> > This is because kprobes expects someone (e.g. kgdb) puts the INT3 as
> > a software breakpoint and it will replace the original instruction.
> > But these INT3 are not such purpose, it doesn't need to recover the
> > original instruction.
> >
> > To avoid this issue, memorize the branch target address during decoding
> > and if there is INT3, restart decoding from unchecked target address.
>
> Hm, is kprobes conflicting with kgdb actually a realistic concern?
> Seems like a dangerous combination

I'm actually not sure; I don't recommend it. But it is safe just to have
the fail-safe.

>
> Either way, this feels overengineered. Sort of like implementing
> objtool in the kernel.
>
> And it's incomplete: for a switch statement jump table (or C goto jump
> table like in BPF), you can't detect the potential targets of the
> indirect branch.

In that case, it just fails to detect the instruction boundary (and anyway
optprobe just stops the optimization if it finds an indirect jump). So it
is still fail-safe.

>
> Wouldn't it be much simpler to just encode the knowledge that
>
> if (CONFIG_RETHUNK && !X86_FEATURE_RETHUNK)
> // all rets are followed by four INT3s
> else if (CONFIG_SLS)
> // all rets are followed by one INT3

Maybe we should just ask kgdb whether it has a breakpoint on that
function, and if so, just reject the kprobe on it. Then, all other INT3s
can just be skipped. That may be a more realistic solution.
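
For example, something like this inside the can_probe()/can_optimize()
decode loop (just a rough sketch of the idea, not a tested patch):

#ifdef CONFIG_KGDB
	/* An INT3 written by kgdb is a real software breakpoint: give up. */
	if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
	    kgdb_has_hit_break(addr))
		return 0;
#endif
	/*
	 * Any other INT3 (CONFIG_RETHUNK/CONFIG_SLS filler) can be decoded
	 * as a one-byte instruction and simply skipped.
	 */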

Thank you,

--
Masami Hiramatsu (Google) <[email protected]>

2022-09-08 14:18:41

by Alexei Starovoitov

Subject: Re: [PATCH] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

On Thu, Sep 8, 2022 at 3:07 AM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Sep 08, 2022 at 11:30:41AM +0200, Peter Zijlstra wrote:
> > Let me go do a patch.
>
> ---
> Subject: x86,retpoline: Be sure to emit INT3 after JMP *%\reg
>
> Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure
> to emit one when rewriting the retpoline JMP irrespective of compiler
> SLS options or even CONFIG_SLS.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> arch/x86/kernel/alternative.c | 9 +++++++++
> arch/x86/net/bpf_jit_comp.c | 3 ++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 62f6b8b7c4a5..68d84cf8e001 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -453,6 +453,15 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
> return ret;
> i += ret;
>
> + /*
> + * The compiler is supposed to EMIT an INT3 after every unconditional
> + * JMP instruction due to AMD BTC. However, if the compiler is too old
> + * or SLS isn't enabled, we still need an INT3 after indirect JMPs
> + * even on Intel.
> + */
> + if (op == JMP32_INSN_OPCODE && i < insn->length)
> + bytes[i++] = INT3_INSN_OPCODE;
> +
> for (; i < insn->length;)
> bytes[i++] = BYTES_NOP1;
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index c1f6c1c51d99..37f821dee68f 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -419,7 +419,8 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
> OPTIMIZER_HIDE_VAR(reg);
> emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
> } else {
> - EMIT2(0xFF, 0xE0 + reg);
> + EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
> + EMIT1(0xCC); /* int3 */

Hmm. Why is this unconditional?
Shouldn't it be guarded with CONFIG_xx or cpu_feature_enabled?
People that don't care about hw speculation vulnerabilities
shouldn't pay the price of increased code size.

2022-09-08 15:14:38

by Masami Hiramatsu

Subject: [PATCH v3 0/2] x86/kprobes: Fixes for CONFIG_RETHUNK

Hi Peter and Josh,

So here is the 3rd version of the patches to fix kprobes and optprobe with
CONFIG_RETHUNK and CONFIG_SLS.
Previous version is here:

https://lore.kernel.org/all/166260087224.759381.4170102827490658262.stgit@devnote2/

In this version, I simplified the code so that it just checks whether the
INT3 comes from kgdb or not. Other INT3s are treated as one-byte
instructions.

With CONFIG_RETHUNK/CONFIG_SLS, kernel functions may include INT3
instructions for stopping speculative execution in the function code block
(body), in addition to the gaps between functions.

Since kprobes on x86 has to ensure the probe address is on an instruction
boundary, it decodes the instructions in the function up to that address.
If it finds an INT3 that was not inserted by kprobes, it stops decoding,
because usually such an INT3 is used for debugging as a software
breakpoint and replaces the first byte of an original instruction.
Without recovering that instruction, kprobes cannot continue decoding.
Thus kprobes returns -EILSEQ as below.


# echo "p:probe/vfs_truncate_L19 vfs_truncate+98" >> kprobe_events
sh: write error: Invalid or incomplete multibyte or wide character


Actually, such INT3s can be ignored, except for an INT3 installed by kgdb.

To avoid this issue, just check whether the INT3 is owned by kgdb or not,
and if so, stop decoding and return failure.

With these fixes, kprobe and optprobe can probe the kernel again with
CONFIG_RETHUNK=y.


# echo "p:probe/vfs_truncate_L19 vfs_truncate+98" >> kprobe_events
# echo 1 > events/probe/vfs_truncate_L19/enable
# cat /sys/kernel/debug/kprobes/list
ffffffff81307b52 k vfs_truncate+0x62 [OPTIMIZED]


Thank you,

---

Masami Hiramatsu (Google) (2):
x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK
x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK


arch/x86/kernel/kprobes/core.c | 10 +++++++---
arch/x86/kernel/kprobes/opt.c | 28 ++++++++--------------------
2 files changed, 15 insertions(+), 23 deletions(-)

--
Masami Hiramatsu (Google) <[email protected]>

2022-09-08 15:14:50

by Masami Hiramatsu

Subject: [PATCH v3 2/2] x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK

From: Masami Hiramatsu (Google) <[email protected]>

Since CONFIG_RETHUNK and CONFIG_SLS use INT3 for stopping speculative
execution after a function return, kprobe jump optimization always fails
on functions that have such an INT3 inside the function body. (It already
checks the INT3 padding between functions, but not inside the function.)

To avoid this issue, as with kprobes, check whether the INT3 comes from
kgdb or not, and if so, stop decoding and make it fail. The other INT3s
come from CONFIG_RETHUNK/CONFIG_SLS and can be treated as one-byte
instructions.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Cc: [email protected]
---
arch/x86/kernel/kprobes/opt.c | 28 ++++++++--------------------
1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index e6b8c5362b94..e57e07b0edb6 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -15,6 +15,7 @@
#include <linux/extable.h>
#include <linux/kdebug.h>
#include <linux/kallsyms.h>
+#include <linux/kgdb.h>
#include <linux/ftrace.h>
#include <linux/objtool.h>
#include <linux/pgtable.h>
@@ -279,19 +280,6 @@ static int insn_is_indirect_jump(struct insn *insn)
return ret;
}

-static bool is_padding_int3(unsigned long addr, unsigned long eaddr)
-{
- unsigned char ops;
-
- for (; addr < eaddr; addr++) {
- if (get_kernel_nofault(ops, (void *)addr) < 0 ||
- ops != INT3_INSN_OPCODE)
- return false;
- }
-
- return true;
-}
-
/* Decode whole function to ensure any instructions don't jump into target */
static int can_optimize(unsigned long paddr)
{
@@ -334,15 +322,15 @@ static int can_optimize(unsigned long paddr)
ret = insn_decode_kernel(&insn, (void *)recovered_insn);
if (ret < 0)
return 0;
-
+#ifdef CONFIG_KGDB
/*
- * In the case of detecting unknown breakpoint, this could be
- * a padding INT3 between functions. Let's check that all the
- * rest of the bytes are also INT3.
+ * If there is a dynamically installed kgdb sw breakpoint,
+ * this function should not be probed.
*/
- if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
- return is_padding_int3(addr, paddr - offset + size) ? 1 : 0;
-
+ if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
+ kgdb_has_hit_break(addr))
+ return 0;
+#endif
/* Recover address */
insn.kaddr = (void *)addr;
insn.next_byte = (void *)(addr + insn.length);

2022-09-08 15:17:17

by Masami Hiramatsu

Subject: [PATCH v3 1/2] x86/kprobes: Fix kprobes instruction boundary check with CONFIG_RETHUNK

From: Masami Hiramatsu (Google) <[email protected]>

Since CONFIG_RETHUNK and CONFIG_SLS use INT3 for stopping speculative
execution after a RET instruction, kprobes always fails to check the
probed instruction boundary by decoding the function body if the probed
address is after such a sequence. (Note that some conditional code blocks
will be placed after the function return if the compiler decides they are
not on the hot path.)

This is because kprobes expects that kgdb has put the INT3 there as a
software breakpoint, replacing the original instruction. But these INT3s
serve no such purpose, so there is no original instruction to recover.

To avoid this issue, kprobes checks whether the INT3 is owned by kgdb or
not, and if so, stops decoding and makes it fail. The other INT3s come
from CONFIG_RETHUNK/CONFIG_SLS and can be treated as one-byte
instructions.

Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
Suggested-by: Peter Zijlstra <[email protected]>
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
Cc: [email protected]
---
arch/x86/kernel/kprobes/core.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 4c3c27b6aea3..c6dd7ae68c8f 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -37,6 +37,7 @@
#include <linux/extable.h>
#include <linux/kdebug.h>
#include <linux/kallsyms.h>
+#include <linux/kgdb.h>
#include <linux/ftrace.h>
#include <linux/kasan.h>
#include <linux/moduleloader.h>
@@ -283,12 +284,15 @@ static int can_probe(unsigned long paddr)
if (ret < 0)
return 0;

+#ifdef CONFIG_KGDB
/*
- * Another debugging subsystem might insert this breakpoint.
- * In that case, we can't recover it.
+ * If there is a dynamically installed kgdb sw breakpoint,
+ * this function should not be probed.
*/
- if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
+ if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
+ kgdb_has_hit_break(addr))
return 0;
+#endif
addr += insn.length;
}


2022-09-08 19:36:53

by Josh Poimboeuf

Subject: Re: [PATCH v3 0/2] x86/kprobes: Fixes for CONFIG_RETHUNK

On Fri, Sep 09, 2022 at 12:01:11AM +0900, Masami Hiramatsu (Google) wrote:
> Hi Peter and Josh,
>
> So here is 3rd version of the patches to fix kprobes and optprobe with
> CONFIG_RETHUNK and CONFIG_SLS.
> Previous version is here;
>
> https://lore.kernel.org/all/166260087224.759381.4170102827490658262.stgit@devnote2/
>
> In this version, I simplified all code and just checks the INT3 comes
> from kgdb or not. Other INT3 are treated as one-byte instruction.

Looks good to me.

I was confused by the naming of kgdb_has_hit_break(), because in this
case, with the function being called from outside the stopped kgdb
context, the breakpoint hasn't actually been hit. But it still seems to
do the right thing: it checks for BP_ACTIVE which means the breakpoint
has been written.

Reviewed-by: Josh Poimboeuf <[email protected]>

--
Josh

2022-09-09 08:38:45

by Peter Zijlstra

Subject: Re: [PATCH] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

On Thu, Sep 08, 2022 at 07:01:12AM -0700, Alexei Starovoitov wrote:

> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index c1f6c1c51d99..37f821dee68f 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -419,7 +419,8 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
> > OPTIMIZER_HIDE_VAR(reg);
> > emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
> > } else {
> > - EMIT2(0xFF, 0xE0 + reg);
> > + EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
> > + EMIT1(0xCC); /* int3 */
>
> Hmm. Why is this unconditional?
> Shouldn't it be guarded with CONFIG_xx or cpu_feature_enabled ?
> People that don't care about hw speculation vulnerabilities
> shouldn't pay the price of increased code size.

Sure, like so then?

---
Subject: x86,retpoline: Be sure to emit INT3 after JMP *%\reg
From: Peter Zijlstra <[email protected]>
Date: Thu, 8 Sep 2022 12:04:50 +0200

Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure
to emit one when rewriting the retpoline JMP irrespective of compiler
SLS options or even CONFIG_SLS.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---

arch/x86/kernel/alternative.c | 9 +++++++++
arch/x86/net/bpf_jit_comp.c | 4 +++-
2 files changed, 12 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -453,6 +453,15 @@ static int patch_retpoline(void *addr, s
return ret;
i += ret;

+ /*
+ * The compiler is supposed to EMIT an INT3 after every unconditional
+ * JMP instruction due to AMD BTC. However, if the compiler is too old
+ * or SLS isn't enabled, we still need an INT3 after indirect JMPs
+ * even on Intel.
+ */
+ if (op == JMP32_INSN_OPCODE && i < insn->length)
+ bytes[i++] = INT3_INSN_OPCODE;
+
for (; i < insn->length;)
bytes[i++] = BYTES_NOP1;

--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -419,7 +419,9 @@ static void emit_indirect_jump(u8 **ppro
OPTIMIZER_HIDE_VAR(reg);
emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
} else {
- EMIT2(0xFF, 0xE0 + reg);
+ EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
+ if (IS_ENABLED(CONFIG_RETPOLINE) || IS_ENABLED(CONFIG_SLS))
+ EMIT1(0xCC); /* int3 */
}

*pprog = prog;

2022-09-09 14:28:39

by Alexei Starovoitov

Subject: Re: [PATCH] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

On Fri, Sep 9, 2022 at 1:16 AM Peter Zijlstra <[email protected]> wrote:
>
> On Thu, Sep 08, 2022 at 07:01:12AM -0700, Alexei Starovoitov wrote:
>
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index c1f6c1c51d99..37f821dee68f 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -419,7 +419,8 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
> > > OPTIMIZER_HIDE_VAR(reg);
> > > emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
> > > } else {
> > > - EMIT2(0xFF, 0xE0 + reg);
> > > + EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
> > > + EMIT1(0xCC); /* int3 */
> >
> > Hmm. Why is this unconditional?
> > Shouldn't it be guarded with CONFIG_xx or cpu_feature_enabled ?
> > People that don't care about hw speculation vulnerabilities
> > shouldn't pay the price of increased code size.
>
> Sure, like so then?
>
> ---
> Subject: x86,retpoline: Be sure to emit INT3 after JMP *%\reg
> From: Peter Zijlstra <[email protected]>
> Date: Thu, 8 Sep 2022 12:04:50 +0200
>
> Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure
> to emit one when rewriting the retpoline JMP irrespective of compiler
> SLS options or even CONFIG_SLS.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
>
> arch/x86/kernel/alternative.c | 9 +++++++++
> arch/x86/net/bpf_jit_comp.c | 4 +++-
> 2 files changed, 12 insertions(+), 1 deletion(-)
>
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -453,6 +453,15 @@ static int patch_retpoline(void *addr, s
> return ret;
> i += ret;
>
> + /*
> + * The compiler is supposed to EMIT an INT3 after every unconditional
> + * JMP instruction due to AMD BTC. However, if the compiler is too old
> + * or SLS isn't enabled, we still need an INT3 after indirect JMPs
> + * even on Intel.
> + */
> + if (op == JMP32_INSN_OPCODE && i < insn->length)
> + bytes[i++] = INT3_INSN_OPCODE;
> +
> for (; i < insn->length;)
> bytes[i++] = BYTES_NOP1;
>
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -419,7 +419,9 @@ static void emit_indirect_jump(u8 **ppro
> OPTIMIZER_HIDE_VAR(reg);
> emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
> } else {
> - EMIT2(0xFF, 0xE0 + reg);
> + EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
> + if (IS_ENABLED(CONFIG_RETPOLINE) || IS_ENABLED(CONFIG_SLS))
> + EMIT1(0xCC); /* int3 */

Looks better. Ack.

2022-09-09 17:19:50

by Josh Poimboeuf

Subject: Re: [PATCH] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

On Fri, Sep 09, 2022 at 10:16:13AM +0200, Peter Zijlstra wrote:
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -419,7 +419,9 @@ static void emit_indirect_jump(u8 **ppro
> OPTIMIZER_HIDE_VAR(reg);
> emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
> } else {
> - EMIT2(0xFF, 0xE0 + reg);
> + EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
> + if (IS_ENABLED(CONFIG_RETPOLINE) || IS_ENABLED(CONFIG_SLS))
> + EMIT1(0xCC); /* int3 */
> }

Hm, if you have retpolines disabled at runtime, why would you want this.

--
Josh

2022-09-11 15:41:43

by Peter Zijlstra

Subject: Re: [PATCH] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

On Fri, Sep 09, 2022 at 09:48:09AM -0700, Josh Poimboeuf wrote:
> On Fri, Sep 09, 2022 at 10:16:13AM +0200, Peter Zijlstra wrote:
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -419,7 +419,9 @@ static void emit_indirect_jump(u8 **ppro
> > OPTIMIZER_HIDE_VAR(reg);
> > emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
> > } else {
> > - EMIT2(0xFF, 0xE0 + reg);
> > + EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
> > + if (IS_ENABLED(CONFIG_RETPOLINE) || IS_ENABLED(CONFIG_SLS))
> > + EMIT1(0xCC); /* int3 */
> > }
>
> Hm, if you have retpolines disabled at runtime, why would you want this.

Because I don't think eIBRS guarantees it will not SLS.

Subject: [tip: x86/core] x86,retpoline: Be sure to emit INT3 after JMP *%\reg

The following commit has been merged into the x86/core branch of tip:

Commit-ID: 8c03af3e090e9d57d90f482d344563dd4bae1e66
Gitweb: https://git.kernel.org/tip/8c03af3e090e9d57d90f482d344563dd4bae1e66
Author: Peter Zijlstra <[email protected]>
AuthorDate: Thu, 08 Sep 2022 12:04:50 +02:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Thu, 15 Sep 2022 16:13:53 +02:00

x86,retpoline: Be sure to emit INT3 after JMP *%\reg

Both AMD and Intel recommend using INT3 after an indirect JMP. Make sure
to emit one when rewriting the retpoline JMP irrespective of compiler
SLS options or even CONFIG_SLS.

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/kernel/alternative.c | 9 +++++++++
arch/x86/net/bpf_jit_comp.c | 4 +++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 62f6b8b..68d84cf 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -453,6 +453,15 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes)
return ret;
i += ret;

+ /*
+ * The compiler is supposed to EMIT an INT3 after every unconditional
+ * JMP instruction due to AMD BTC. However, if the compiler is too old
+ * or SLS isn't enabled, we still need an INT3 after indirect JMPs
+ * even on Intel.
+ */
+ if (op == JMP32_INSN_OPCODE && i < insn->length)
+ bytes[i++] = INT3_INSN_OPCODE;
+
for (; i < insn->length;)
bytes[i++] = BYTES_NOP1;

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index c1f6c1c..4922517 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -419,7 +419,9 @@ static void emit_indirect_jump(u8 **pprog, int reg, u8 *ip)
OPTIMIZER_HIDE_VAR(reg);
emit_jump(&prog, &__x86_indirect_thunk_array[reg], ip);
} else {
- EMIT2(0xFF, 0xE0 + reg);
+ EMIT2(0xFF, 0xE0 + reg); /* jmp *%\reg */
+ if (IS_ENABLED(CONFIG_RETPOLINE) || IS_ENABLED(CONFIG_SLS))
+ EMIT1(0xCC); /* int3 */
}

*pprog = prog;

2022-12-15 03:40:40

by Nadav Amit

Subject: Re: [PATCH v3 2/2] x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK


> On Sep 8, 2022, at 8:01 AM, Masami Hiramatsu (Google) <[email protected]> wrote:
>
> From: Masami Hiramatsu (Google) <[email protected]>
>
> Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
> speculative execution after function return, kprobe jump optimization
> always fails on the functions with such INT3 inside the function body.
> (It already checks the INT3 padding between functions, but not inside
> the function)
>
> To avoid this issue, as same as kprobes, check whether the INT3 comes
> from kgdb or not, and if so, stop decoding and make it fail. The other
> INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be
> treated as a one-byte instruction.
>
> Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
> Cc: [email protected]
> ---
> arch/x86/kernel/kprobes/opt.c | 28 ++++++++--------------------
> 1 file changed, 8 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> index e6b8c5362b94..e57e07b0edb6 100644
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -15,6 +15,7 @@
> #include <linux/extable.h>
> #include <linux/kdebug.h>
> #include <linux/kallsyms.h>
> +#include <linux/kgdb.h>
> #include <linux/ftrace.h>
> #include <linux/objtool.h>
> #include <linux/pgtable.h>
> @@ -279,19 +280,6 @@ static int insn_is_indirect_jump(struct insn *insn)
> return ret;
> }
>
> -static bool is_padding_int3(unsigned long addr, unsigned long eaddr)
> -{
> - unsigned char ops;
> -
> - for (; addr < eaddr; addr++) {
> - if (get_kernel_nofault(ops, (void *)addr) < 0 ||
> - ops != INT3_INSN_OPCODE)
> - return false;
> - }
> -
> - return true;
> -}
> -
> /* Decode whole function to ensure any instructions don't jump into target */
> static int can_optimize(unsigned long paddr)
> {
> @@ -334,15 +322,15 @@ static int can_optimize(unsigned long paddr)
> ret = insn_decode_kernel(&insn, (void *)recovered_insn);
> if (ret < 0)
> return 0;
> -
> +#ifdef CONFIG_KGDB
> /*
> - * In the case of detecting unknown breakpoint, this could be
> - * a padding INT3 between functions. Let's check that all the
> - * rest of the bytes are also INT3.
> + * If there is a dynamically installed kgdb sw breakpoint,
> + * this function should not be probed.
> */
> - if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
> - return is_padding_int3(addr, paddr - offset + size) ? 1 : 0;
> -
> + if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
> + kgdb_has_hit_break(addr))
> + return 0;
> +#endif
> /* Recover address */
> insn.kaddr = (void *)addr;
> insn.next_byte = (void *)(addr + insn.length);

Hi Masami,

I encountered a similar issue with can_probe(). I see that your
patches were not upstreamed, at least to 6.1.

So I was wondering whether they are going to be upstreamed, and whether
you also want to update can_probe().

Thanks,
Nadav

2022-12-18 15:03:51

by Masami Hiramatsu

Subject: Re: [PATCH v3 2/2] x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK

On Thu, 15 Dec 2022 03:31:25 +0000
Nadav Amit <[email protected]> wrote:

>
> > On Sep 8, 2022, at 8:01 AM, Masami Hiramatsu (Google) <[email protected]> wrote:
> >
> > From: Masami Hiramatsu (Google) <[email protected]>
> >
> > Since the CONFIG_RETHUNK and CONFIG_SLS will use INT3 for stopping
> > speculative execution after function return, kprobe jump optimization
> > always fails on the functions with such INT3 inside the function body.
> > (It already checks the INT3 padding between functions, but not inside
> > the function)
> >
> > To avoid this issue, as same as kprobes, check whether the INT3 comes
> > from kgdb or not, and if so, stop decoding and make it fail. The other
> > INT3 will come from CONFIG_RETHUNK/CONFIG_SLS and those can be
> > treated as a one-byte instruction.
> >
> > Signed-off-by: Masami Hiramatsu (Google) <[email protected]>
> > Suggested-by: Peter Zijlstra <[email protected]>
> > Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation")
> > Cc: [email protected]
> > ---
> > arch/x86/kernel/kprobes/opt.c | 28 ++++++++--------------------
> > 1 file changed, 8 insertions(+), 20 deletions(-)
> >
> > diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
> > index e6b8c5362b94..e57e07b0edb6 100644
> > --- a/arch/x86/kernel/kprobes/opt.c
> > +++ b/arch/x86/kernel/kprobes/opt.c
> > @@ -15,6 +15,7 @@
> > #include <linux/extable.h>
> > #include <linux/kdebug.h>
> > #include <linux/kallsyms.h>
> > +#include <linux/kgdb.h>
> > #include <linux/ftrace.h>
> > #include <linux/objtool.h>
> > #include <linux/pgtable.h>
> > @@ -279,19 +280,6 @@ static int insn_is_indirect_jump(struct insn *insn)
> > return ret;
> > }
> >
> > -static bool is_padding_int3(unsigned long addr, unsigned long eaddr)
> > -{
> > - unsigned char ops;
> > -
> > - for (; addr < eaddr; addr++) {
> > - if (get_kernel_nofault(ops, (void *)addr) < 0 ||
> > - ops != INT3_INSN_OPCODE)
> > - return false;
> > - }
> > -
> > - return true;
> > -}
> > -
> > /* Decode whole function to ensure any instructions don't jump into target */
> > static int can_optimize(unsigned long paddr)
> > {
> > @@ -334,15 +322,15 @@ static int can_optimize(unsigned long paddr)
> > ret = insn_decode_kernel(&insn, (void *)recovered_insn);
> > if (ret < 0)
> > return 0;
> > -
> > +#ifdef CONFIG_KGDB
> > /*
> > - * In the case of detecting unknown breakpoint, this could be
> > - * a padding INT3 between functions. Let's check that all the
> > - * rest of the bytes are also INT3.
> > + * If there is a dynamically installed kgdb sw breakpoint,
> > + * this function should not be probed.
> > */
> > - if (insn.opcode.bytes[0] == INT3_INSN_OPCODE)
> > - return is_padding_int3(addr, paddr - offset + size) ? 1 : 0;
> > -
> > + if (insn.opcode.bytes[0] == INT3_INSN_OPCODE &&
> > + kgdb_has_hit_break(addr))
> > + return 0;
> > +#endif
> > /* Recover address */
> > insn.kaddr = (void *)addr;
> > insn.next_byte = (void *)(addr + insn.length);
>
> Hi Masami,
>
> I encountered a similar issue with can_probe(). I see that your
> patches were not upstreamed, at least to 6.1.

Oops, I should update and resend the series. This patch must go through
the x86 (tip) tree since it is an x86-specific issue. Moreover, this is a
kind of bugfix, because kprobes doesn't work correctly due to this issue.

>
> So I was wondering whether it they are going to be upstreamed, and
> whether you want also to update can_probe().

Yeah, I want to push it.

Thank you!


>
> Thanks,
> Nadav

--
Masami Hiramatsu (Google) <[email protected]>