2014-04-04 18:50:57

by Oleg Nesterov

Subject: [PATCH v2 0/9] uprobes/x86: preparations to fix the riprel jmp/call handling.

Hello.

Changes:

- "Conditionalize the usage of handle_riprel_insn()" is moved after
"Introduce uprobe_xol_ops and arch_uprobe->ops" to make the reason
more clear, update the changelog.

Masami, I hope this addresses your comments? I would be happy to
add your acks to 6/9 and 7/9 ;)

- update the comment in 4/9, as pointed out by Jim

- fix some typos in the changelogs/naming

- 8-9 are new (the same patches I sent yesterday)

Since the changes in 1-7 are purely cosmetic, I preserved the acks I got.

Srikar, you acked 1, 2, 3, and 5 but didn't comment on the other changes.
Please let me know if I missed your emails. Do you agree with this series,
or do you have some concerns?

Now let me send the draft RFC patch which fixes the "call" handling...

Oleg.

arch/x86/include/asm/uprobes.h | 7 +-
arch/x86/kernel/uprobes.c | 360 ++++++++++++++++++++--------------------
kernel/events/uprobes.c | 31 +---
3 files changed, 199 insertions(+), 199 deletions(-)


2014-04-04 18:51:16

by Oleg Nesterov

Subject: [PATCH v2 1/9] uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep()

UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
not only does it not help, it actually hurts.

It can only help to avoid arch_uprobe_skip_sstep() if it was already
called before and failed. But this is ugly: if we want to know whether
we can emulate this instruction or not, we should do this analysis in
arch_uprobe_analyze_insn(), not when we hit this probe for the first
time.

And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
fail or succeed depending on the task/register state. If this insn can
be emulated but, say, put_user() fails, we need to xol it this time,
but this doesn't mean we shouldn't try to emulate it when this or
another thread hits this bp next time.

And this is the actual reason for this change. We need to emulate the
"call" insn, but push(return-address) can obviously fail.

Per-arch notes:

x86: __skip_sstep() can only emulate "rep;nop". With this
change it will be called every time and most probably
for no reason.

This will be fixed by the next changes. We need to
change this suboptimal code anyway.

arm: Should not be affected. It has its own "bool simulate"
flag checked in arch_uprobe_skip_sstep().

ppc: Looks like it can emulate almost everything. Does it
actually need to record the fact that emulate_step()
failed? Hopefully not. But if it does, it can add a ppc-
specific flag to arch_uprobe.

TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn().

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Reviewed-by: David A. Long <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
---
kernel/events/uprobes.c | 23 ++---------------------
1 files changed, 2 insertions(+), 21 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 307d87c..7a3e14e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -60,8 +60,6 @@ static struct percpu_rw_semaphore dup_mmap_sem;

/* Have a copy of original instruction */
#define UPROBE_COPY_INSN 0
-/* Can skip singlestep */
-#define UPROBE_SKIP_SSTEP 1

struct uprobe {
struct rb_node rb_node; /* node in the rb tree */
@@ -491,12 +489,9 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset)
uprobe->offset = offset;
init_rwsem(&uprobe->register_rwsem);
init_rwsem(&uprobe->consumer_rwsem);
- /* For now assume that the instruction need not be single-stepped */
- __set_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);

/* add to uprobes_tree, sorted on inode:offset */
cur_uprobe = insert_uprobe(uprobe);
-
/* a uprobe exists for this inode:offset combination */
if (cur_uprobe) {
kfree(uprobe);
@@ -1628,20 +1623,6 @@ bool uprobe_deny_signal(void)
return true;
}

-/*
- * Avoid singlestepping the original instruction if the original instruction
- * is a NOP or can be emulated.
- */
-static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs)
-{
- if (test_bit(UPROBE_SKIP_SSTEP, &uprobe->flags)) {
- if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
- return true;
- clear_bit(UPROBE_SKIP_SSTEP, &uprobe->flags);
- }
- return false;
-}
-
static void mmf_recalc_uprobes(struct mm_struct *mm)
{
struct vm_area_struct *vma;
@@ -1859,13 +1840,13 @@ static void handle_swbp(struct pt_regs *regs)
goto out;

handler_chain(uprobe, regs);
- if (can_skip_sstep(uprobe, regs))
+ if (arch_uprobe_skip_sstep(&uprobe->arch, regs))
goto out;

if (!pre_ssout(uprobe, regs, bp_vaddr))
return;

- /* can_skip_sstep() succeeded, or restart if can't singlestep */
+ /* arch_uprobe_skip_sstep() succeeded, or restart if can't singlestep */
out:
put_uprobe(uprobe);
}
--
1.5.5.1

2014-04-04 18:51:22

by Oleg Nesterov

Subject: [PATCH v2 3/9] uprobes/x86: Kill the "ia32_compat" check in handle_riprel_insn(), remove "mm" arg

Kill the "mm->context.ia32_compat" check in handle_riprel_insn(), if
it is true insn_rip_relative() must return false. validate_insn_bits()
passed "ia32_compat" as !x86_64 to insn_init(), and insn_rip_relative()
checks insn->x86_64.

Also, remove the no longer needed "struct mm_struct *mm" argument and
the unnecessary "return" at the end.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
---
arch/x86/kernel/uprobes.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 098e56e..963c121 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -253,14 +253,11 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
* - The displacement is always 4 bytes.
*/
static void
-handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
+handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
{
u8 *cursor;
u8 reg;

- if (mm->context.ia32_compat)
- return;
-
if (!insn_rip_relative(insn))
return;

@@ -314,7 +311,6 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct ins
cursor++;
memmove(cursor, cursor + insn->displacement.nbytes, insn->immediate.nbytes);
}
- return;
}

static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn)
@@ -343,7 +339,7 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
return validate_insn_64bits(auprobe, insn);
}
#else /* 32-bit: */
-static void handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
+static void handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
{
/* No RIP-relative addressing on 32-bit */
}
@@ -376,7 +372,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
* and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
* is either zero or it reflects rip-related fixups.
*/
- handle_riprel_insn(auprobe, mm, &insn);
+ handle_riprel_insn(auprobe, &insn);

switch (OPCODE1(&insn)) {
case 0x9d: /* popf */
--
1.5.5.1

2014-04-04 18:51:37

by Oleg Nesterov

Subject: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
set of methods and change arch_uprobe_pre/post_xol() accordingly.

This way we can add new uprobe_xol_ops to handle the insns which
need special processing (rip-relative jmp/call at least).

TODO: An error from adjust_ret_addr() shouldn't be silently ignored;
we should teach the arch_uprobe_post_xol() or handle_singlestep() paths
to restart the probed insn in this case. And probably "adjust" can
be simplified and turned into set_ret_addr(). It seems that we do
not really need copy_from_user(); we can always calculate the value
we need to write into *regs->sp.
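To make the last point concrete, a hypothetical set_ret_addr() (the name,
like the idea, is only a sketch and not part of this series) could store
the precomputed return address with a single copy_to_user(), no
read-modify-write:

	static int set_ret_addr(unsigned long sp, unsigned long ret_addr)
	{
		int rasize = is_ia32_task() ? 4 : 8;

		/* overwrite the slot the out-of-line "call" pushed */
		if (copy_to_user((void __user *)sp, &ret_addr, rasize))
			return -EFAULT;
		return 0;
	}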

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
---
arch/x86/include/asm/uprobes.h | 7 ++-
arch/x86/kernel/uprobes.c | 107 +++++++++++++++++++++++++--------------
2 files changed, 74 insertions(+), 40 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 3087ea9..9f8210b 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -33,12 +33,17 @@ typedef u8 uprobe_opcode_t;
#define UPROBE_SWBP_INSN 0xcc
#define UPROBE_SWBP_INSN_SIZE 1

+struct uprobe_xol_ops;
+
struct arch_uprobe {
- u16 fixups;
union {
u8 insn[MAX_UINSN_BYTES];
u8 ixol[MAX_UINSN_BYTES];
};
+
+ u16 fixups;
+ const struct uprobe_xol_ops *ops;
+
#ifdef CONFIG_X86_64
unsigned long rip_rela_target_address;
#endif
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 3bb4198..13ad8a3 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -402,6 +402,64 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
}
#endif /* CONFIG_X86_64 */

+struct uprobe_xol_ops {
+ bool (*emulate)(struct arch_uprobe *, struct pt_regs *);
+ int (*pre_xol)(struct arch_uprobe *, struct pt_regs *);
+ int (*post_xol)(struct arch_uprobe *, struct pt_regs *);
+};
+
+static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ pre_xol_rip_insn(auprobe, regs, &current->utask->autask);
+ return 0;
+}
+
+/*
+ * Adjust the return address pushed by a call insn executed out of line.
+ */
+static int adjust_ret_addr(unsigned long sp, long correction)
+{
+ int rasize, ncopied;
+ long ra = 0;
+
+ if (is_ia32_task())
+ rasize = 4;
+ else
+ rasize = 8;
+
+ ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
+ if (unlikely(ncopied))
+ return -EFAULT;
+
+ ra += correction;
+ ncopied = copy_to_user((void __user *)sp, &ra, rasize);
+ if (unlikely(ncopied))
+ return -EFAULT;
+
+ return 0;
+}
+
+static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ struct uprobe_task *utask = current->utask;
+ long correction = (long)(utask->vaddr - utask->xol_vaddr);
+ int ret = 0;
+
+ handle_riprel_post_xol(auprobe, regs, &correction);
+ if (auprobe->fixups & UPROBE_FIX_IP)
+ regs->ip += correction;
+
+ if (auprobe->fixups & UPROBE_FIX_CALL)
+ ret = adjust_ret_addr(regs->sp, correction);
+
+ return ret;
+}
+
+static struct uprobe_xol_ops default_xol_ops = {
+ .pre_xol = default_pre_xol_op,
+ .post_xol = default_post_xol_op,
+};
+
/**
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
@@ -464,6 +522,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (fix_call)
auprobe->fixups |= UPROBE_FIX_CALL;

+ auprobe->ops = &default_xol_ops;
return 0;
}

@@ -485,33 +544,8 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
if (test_tsk_thread_flag(current, TIF_BLOCKSTEP))
set_task_blockstep(current, false);

- pre_xol_rip_insn(auprobe, regs, &utask->autask);
- return 0;
-}
-
-/*
- * This function is called by arch_uprobe_post_xol() to adjust the return
- * address pushed by a call instruction executed out of line.
- */
-static int adjust_ret_addr(unsigned long sp, long correction)
-{
- int rasize, ncopied;
- long ra = 0;
-
- if (is_ia32_task())
- rasize = 4;
- else
- rasize = 8;
-
- ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
- if (unlikely(ncopied))
- return -EFAULT;
-
- ra += correction;
- ncopied = copy_to_user((void __user *)sp, &ra, rasize);
- if (unlikely(ncopied))
- return -EFAULT;
-
+ if (auprobe->ops->pre_xol)
+ return auprobe->ops->pre_xol(auprobe, regs);
return 0;
}

@@ -560,11 +594,8 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
struct uprobe_task *utask = current->utask;
- long correction;
- int result = 0;

WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
-
current->thread.trap_nr = utask->autask.saved_trap_nr;
/*
* arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
@@ -576,15 +607,9 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
else if (!(auprobe->fixups & UPROBE_FIX_SETF))
regs->flags &= ~X86_EFLAGS_TF;

- correction = (long)(utask->vaddr - utask->xol_vaddr);
- handle_riprel_post_xol(auprobe, regs, &correction);
- if (auprobe->fixups & UPROBE_FIX_IP)
- regs->ip += correction;
-
- if (auprobe->fixups & UPROBE_FIX_CALL)
- result = adjust_ret_addr(regs->sp, correction);
-
- return result;
+ if (auprobe->ops->post_xol)
+ return auprobe->ops->post_xol(auprobe, regs);
+ return 0;
}

/* callback routine for handling exceptions. */
@@ -642,6 +667,10 @@ static bool __skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
int i;

+ if (auprobe->ops->emulate)
+ return auprobe->ops->emulate(auprobe, regs);
+
+ /* TODO: move this code into ->emulate() hook */
for (i = 0; i < MAX_UINSN_BYTES; i++) {
if (auprobe->insn[i] == 0x66)
continue;
--
1.5.5.1

2014-04-04 18:51:46

by Oleg Nesterov

Subject: [PATCH v2 7/9] uprobes/x86: Conditionalize the usage of handle_riprel_insn()

arch_uprobe_analyze_insn() calls handle_riprel_insn() at the start,
but only the "0xff" and "default" cases need the UPROBE_FIX_RIP_ logic.
Move the call site into the "default" case and change the "0xff" case
to fall through.

We are going to add various hooks to handle the rip-relative
jmp/call instructions (and more); we need this change to ensure that
the new code can not conflict with the handle_riprel_insn() logic
which, after this change, can only be used by default_xol_ops.

Note: arch_uprobe_abort_xol() still calls handle_riprel_post_xol()
directly. This is fine unless another _xol_ops we may add later needs
to reuse the "UPROBE_FIX_RIP_AX|UPROBE_FIX_RIP_CX" bits in ->fixups.
In that case we can add a uprobe_xol_ops->abort() hook, which (perhaps)
we will need anyway in the long term.
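Purely as an illustration of that note (nothing here implements it yet),
the hook could look like the sketch below, with default_xol_ops keeping
today's behaviour and arch_uprobe_abort_xol() calling ->abort() instead
of handle_riprel_post_xol() directly:

	struct uprobe_xol_ops {
		bool	(*emulate)(struct arch_uprobe *, struct pt_regs *);
		int	(*pre_xol)(struct arch_uprobe *, struct pt_regs *);
		int	(*post_xol)(struct arch_uprobe *, struct pt_regs *);
		void	(*abort)(struct arch_uprobe *, struct pt_regs *);
	};

	static void default_abort_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
	{
		/* same cleanup arch_uprobe_abort_xol() does today */
		handle_riprel_post_xol(auprobe, regs, NULL);
	}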

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
---
arch/x86/kernel/uprobes.c | 6 ++----
1 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 13ad8a3..08cdb82 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -482,8 +482,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
* and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
* is either zero or it reflects rip-related fixups.
*/
- handle_riprel_insn(auprobe, &insn);
-
switch (OPCODE1(&insn)) {
case 0x9d: /* popf */
auprobe->fixups |= UPROBE_FIX_SETF;
@@ -512,9 +510,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
case 4: case 5: /* jmp or ljmp, indirect */
fix_ip = false;
}
- break;
+ /* fall through */
default:
- break;
+ handle_riprel_insn(auprobe, &insn);
}

if (fix_ip)
--
1.5.5.1

2014-04-04 18:51:43

by Oleg Nesterov

Subject: [PATCH v2 8/9] uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails

Currently the error from arch_uprobe_post_xol() is silently ignored.
This doesn't look good, and it can lead to hard-to-debug problems.

1. Change handle_singlestep() to loudly complain and send SIGILL.

Note: this only affects x86, ppc/arm can't fail.

2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
avoid the TF games if it is going to return an error.

This can help to analyze the problem; if nothing else, we should
not report ->ip = xol_slot in the core file.

Note: this means that handle_riprel_post_xol() can be called twice,
but this is fine because it is idempotent.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 16 ++++++++++++----
kernel/events/uprobes.c | 8 +++++++-
2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 08cdb82..e72903e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -594,6 +594,15 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
struct uprobe_task *utask = current->utask;

WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
+
+ if (auprobe->ops->post_xol) {
+ int err = auprobe->ops->post_xol(auprobe, regs);
+ if (err) {
+ arch_uprobe_abort_xol(auprobe, regs);
+ return err;
+ }
+ }
+
current->thread.trap_nr = utask->autask.saved_trap_nr;
/*
* arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
@@ -605,8 +614,6 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
else if (!(auprobe->fixups & UPROBE_FIX_SETF))
regs->flags &= ~X86_EFLAGS_TF;

- if (auprobe->ops->post_xol)
- return auprobe->ops->post_xol(auprobe, regs);
return 0;
}

@@ -641,8 +648,9 @@ int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val,

/*
* This function gets called when XOL instruction either gets trapped or
- * the thread has a fatal signal, so reset the instruction pointer to its
- * probed address.
+ * the thread has a fatal signal, or if arch_uprobe_post_xol() failed.
+ * Reset the instruction pointer to its probed address for the potential
+ * restart or for post mortem analysis.
*/
void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 7a3e14e..2adbc97 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1858,10 +1858,11 @@ out:
static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
{
struct uprobe *uprobe;
+ int err = 0;

uprobe = utask->active_uprobe;
if (utask->state == UTASK_SSTEP_ACK)
- arch_uprobe_post_xol(&uprobe->arch, regs);
+ err = arch_uprobe_post_xol(&uprobe->arch, regs);
else if (utask->state == UTASK_SSTEP_TRAPPED)
arch_uprobe_abort_xol(&uprobe->arch, regs);
else
@@ -1875,6 +1876,11 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
spin_lock_irq(&current->sighand->siglock);
recalc_sigpending(); /* see uprobe_deny_signal() */
spin_unlock_irq(&current->sighand->siglock);
+
+ if (unlikely(err)) {
+ uprobe_warn(current, "execute the probed insn, sending SIGILL.");
+ force_sig_info(SIGILL, SEND_SIG_FORCED, current);
+ }
}

/*
--
1.5.5.1

2014-04-04 18:52:28

by Oleg Nesterov

Subject: [PATCH v2 9/9] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible

SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort; we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application is buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can even be correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART, even though currently
this is the only possible error.

Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index e72903e..b820668 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -443,16 +443,17 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
{
struct uprobe_task *utask = current->utask;
long correction = (long)(utask->vaddr - utask->xol_vaddr);
- int ret = 0;

handle_riprel_post_xol(auprobe, regs, &correction);
if (auprobe->fixups & UPROBE_FIX_IP)
regs->ip += correction;

- if (auprobe->fixups & UPROBE_FIX_CALL)
- ret = adjust_ret_addr(regs->sp, correction);
+ if (auprobe->fixups & UPROBE_FIX_CALL) {
+ if (adjust_ret_addr(regs->sp, correction))
+ return -ERESTART;
+ }

- return ret;
+ return 0;
}

static struct uprobe_xol_ops default_xol_ops = {
@@ -599,6 +600,12 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
int err = auprobe->ops->post_xol(auprobe, regs);
if (err) {
arch_uprobe_abort_xol(auprobe, regs);
+ /*
+ * Restart the probed insn. ->post_xol() must ensure
+ * this is really possible if it returns -ERESTART.
+ */
+ if (err == -ERESTART)
+ return 0;
return err;
}
}
--
1.5.5.1

2014-04-04 18:51:30

by Oleg Nesterov

Subject: [PATCH v2 5/9] uprobes/x86: move the UPROBE_FIX_{RIP,IP,CALL} code to the end of pre/post hooks

No functional changes. Preparation to simplify the review of the next
change. Just reorder the code in arch_uprobe_pre/post_xol() functions
so that UPROBE_FIX_{RIP_*,IP,CALL} logic goes to the end.

Also change arch_uprobe_pre_xol() to use utask instead of autask, to
make the code more symmetrical with arch_uprobe_post_xol().

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
---
arch/x86/kernel/uprobes.c | 30 ++++++++++++++----------------
1 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index c52c30f..3bb4198 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -474,19 +474,18 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
*/
int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
- struct arch_uprobe_task *autask;
+ struct uprobe_task *utask = current->utask;

- autask = &current->utask->autask;
- autask->saved_trap_nr = current->thread.trap_nr;
+ regs->ip = utask->xol_vaddr;
+ utask->autask.saved_trap_nr = current->thread.trap_nr;
current->thread.trap_nr = UPROBE_TRAP_NR;
- regs->ip = current->utask->xol_vaddr;
- pre_xol_rip_insn(auprobe, regs, autask);

- autask->saved_tf = !!(regs->flags & X86_EFLAGS_TF);
+ utask->autask.saved_tf = !!(regs->flags & X86_EFLAGS_TF);
regs->flags |= X86_EFLAGS_TF;
if (test_tsk_thread_flag(current, TIF_BLOCKSTEP))
set_task_blockstep(current, false);

+ pre_xol_rip_insn(auprobe, regs, &utask->autask);
return 0;
}

@@ -560,22 +559,13 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
*/
int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
- struct uprobe_task *utask;
+ struct uprobe_task *utask = current->utask;
long correction;
int result = 0;

WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);

- utask = current->utask;
current->thread.trap_nr = utask->autask.saved_trap_nr;
- correction = (long)(utask->vaddr - utask->xol_vaddr);
- handle_riprel_post_xol(auprobe, regs, &correction);
- if (auprobe->fixups & UPROBE_FIX_IP)
- regs->ip += correction;
-
- if (auprobe->fixups & UPROBE_FIX_CALL)
- result = adjust_ret_addr(regs->sp, correction);
-
/*
* arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
* so we can get an extra SIGTRAP if we do not clear TF. We need
@@ -586,6 +576,14 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
else if (!(auprobe->fixups & UPROBE_FIX_SETF))
regs->flags &= ~X86_EFLAGS_TF;

+ correction = (long)(utask->vaddr - utask->xol_vaddr);
+ handle_riprel_post_xol(auprobe, regs, &correction);
+ if (auprobe->fixups & UPROBE_FIX_IP)
+ regs->ip += correction;
+
+ if (auprobe->fixups & UPROBE_FIX_CALL)
+ result = adjust_ret_addr(regs->sp, correction);
+
return result;
}

--
1.5.5.1

2014-04-04 18:55:22

by Oleg Nesterov

Subject: [PATCH v2 4/9] uprobes/x86: Gather "riprel" functions together

Cosmetic. Move pre_xol_rip_insn() and handle_riprel_post_xol() up,
next to the closely related handle_riprel_insn(). This way the code is
simpler to read and understand, and it reduces the number of ifdefs.

While at it, update the comment in handle_riprel_post_xol() as Jim
suggested.

TODO: rename them somehow to make the naming consistent.

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
---
arch/x86/kernel/uprobes.c | 118 ++++++++++++++++++++-------------------------
1 files changed, 53 insertions(+), 65 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 963c121..c52c30f 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -313,6 +313,48 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
}
}

+/*
+ * If we're emulating a rip-relative instruction, save the contents
+ * of the scratch register and store the target address in that register.
+ */
+static void
+pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
+ struct arch_uprobe_task *autask)
+{
+ if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
+ autask->saved_scratch_register = regs->ax;
+ regs->ax = current->utask->vaddr;
+ regs->ax += auprobe->rip_rela_target_address;
+ } else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
+ autask->saved_scratch_register = regs->cx;
+ regs->cx = current->utask->vaddr;
+ regs->cx += auprobe->rip_rela_target_address;
+ }
+}
+
+static void
+handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
+{
+ if (auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) {
+ struct arch_uprobe_task *autask;
+
+ autask = &current->utask->autask;
+ if (auprobe->fixups & UPROBE_FIX_RIP_AX)
+ regs->ax = autask->saved_scratch_register;
+ else
+ regs->cx = autask->saved_scratch_register;
+
+ /*
+ * The original instruction includes a displacement, and so
+ * is 4 bytes longer than what we've just single-stepped.
+ * Caller may need to apply other fixups to handle stuff
+ * like "jmpq *...(%rip)" and "callq *...(%rip)".
+ */
+ if (correction)
+ *correction += 4;
+ }
+}
+
static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn)
{
insn_init(insn, auprobe->insn, true);
@@ -339,9 +381,19 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
return validate_insn_64bits(auprobe, insn);
}
#else /* 32-bit: */
+/*
+ * No RIP-relative addressing on 32-bit
+ */
static void handle_riprel_insn(struct arch_uprobe *auprobe, struct insn *insn)
{
- /* No RIP-relative addressing on 32-bit */
+}
+static void pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
+ struct arch_uprobe_task *autask)
+{
+}
+static void handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs,
+ long *correction)
+{
}

static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn)
@@ -415,34 +467,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
return 0;
}

-#ifdef CONFIG_X86_64
-/*
- * If we're emulating a rip-relative instruction, save the contents
- * of the scratch register and store the target address in that register.
- */
-static void
-pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
- struct arch_uprobe_task *autask)
-{
- if (auprobe->fixups & UPROBE_FIX_RIP_AX) {
- autask->saved_scratch_register = regs->ax;
- regs->ax = current->utask->vaddr;
- regs->ax += auprobe->rip_rela_target_address;
- } else if (auprobe->fixups & UPROBE_FIX_RIP_CX) {
- autask->saved_scratch_register = regs->cx;
- regs->cx = current->utask->vaddr;
- regs->cx += auprobe->rip_rela_target_address;
- }
-}
-#else
-static void
-pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs,
- struct arch_uprobe_task *autask)
-{
- /* No RIP-relative addressing on 32-bit */
-}
-#endif
-
/*
* arch_uprobe_pre_xol - prepare to execute out of line.
* @auprobe: the probepoint information.
@@ -492,42 +516,6 @@ static int adjust_ret_addr(unsigned long sp, long correction)
return 0;
}

-#ifdef CONFIG_X86_64
-static bool is_riprel_insn(struct arch_uprobe *auprobe)
-{
- return ((auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) != 0);
-}
-
-static void
-handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
-{
- if (is_riprel_insn(auprobe)) {
- struct arch_uprobe_task *autask;
-
- autask = &current->utask->autask;
- if (auprobe->fixups & UPROBE_FIX_RIP_AX)
- regs->ax = autask->saved_scratch_register;
- else
- regs->cx = autask->saved_scratch_register;
-
- /*
- * The original instruction includes a displacement, and so
- * is 4 bytes longer than what we've just single-stepped.
- * Fall through to handle stuff like "jmpq *...(%rip)" and
- * "callq *...(%rip)".
- */
- if (correction)
- *correction += 4;
- }
-}
-#else
-static void
-handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction)
-{
- /* No RIP-relative addressing on 32-bit */
-}
-#endif
-
/*
* If xol insn itself traps and generates a signal(Say,
* SIGILL/SIGSEGV/etc), then detect the case where a singlestepped
--
1.5.5.1

2014-04-04 18:56:04

by Oleg Nesterov

Subject: [PATCH v2 2/9] uprobes/x86: Fold prepare_fixups() into arch_uprobe_analyze_insn()

No functional changes, preparation.

Shift the code from prepare_fixups() to arch_uprobe_analyze_insn()
with the following modifications:

- Do not call insn_get_opcode() again; it was already called
by validate_insn_bits().

- Move "case 0xea" up. This way "case 0xff" can fall through
to the default case.

- Change "case 0xff" to use a nested "switch (MODRM_REG)";
this way the code looks a bit simpler.

- Make the comments look consistent.

While at it, kill the initialization of rip_rela_target_address and
->fixups; we can rely on kzalloc(). We will be adding new members to
arch_uprobe, so it is better to assume that everything is zero by
default.

TODO: cleanup/fix the mess in validate_insn_bits() paths:

- validate_insn_64bits() and validate_insn_32bits() should be
unified (see the sketch after this list).

- "ifdef" is not used consistently; if good_insns_64 depends
on CONFIG_X86_64, then probably good_insns_32 should depend
on CONFIG_X86_32/EMULATION

- the usage of mm->context.ia32_compat looks wrong if the task
is TIF_X32.
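For the first TODO item, a rough sketch of what the unification could look
like (illustrative only; it assumes good_insns_64 becomes visible regardless
of CONFIG_X86_64 and reuses the existing is_prefix_bad()/good_2byte_insns
helpers from this file):

	static int validate_insn(struct arch_uprobe *auprobe, struct insn *insn, bool x86_64)
	{
		insn_init(insn, auprobe->insn, x86_64);
		insn_get_opcode(insn);

		/* reject instructions with "bad" prefixes, as today */
		if (is_prefix_bad(insn))
			return -ENOTSUPP;

		if (test_bit(OPCODE1(insn), (unsigned long *)
				(x86_64 ? good_insns_64 : good_insns_32)))
			return 0;

		if (insn->opcode.nbytes == 2 &&
		    test_bit(OPCODE2(insn), (unsigned long *)good_2byte_insns))
			return 0;

		return -ENOTSUPP;
	}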

Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Masami Hiramatsu <[email protected]>
Reviewed-by: Jim Keniston <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
---
arch/x86/kernel/uprobes.c | 110 +++++++++++++++++++--------------------------
1 files changed, 47 insertions(+), 63 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 2ed8459..098e56e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -53,7 +53,7 @@
#define OPCODE1(insn) ((insn)->opcode.bytes[0])
#define OPCODE2(insn) ((insn)->opcode.bytes[1])
#define OPCODE3(insn) ((insn)->opcode.bytes[2])
-#define MODRM_REG(insn) X86_MODRM_REG(insn->modrm.value)
+#define MODRM_REG(insn) X86_MODRM_REG((insn)->modrm.value)

#define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
@@ -229,63 +229,6 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
return -ENOTSUPP;
}

-/*
- * Figure out which fixups arch_uprobe_post_xol() will need to perform, and
- * annotate arch_uprobe->fixups accordingly. To start with,
- * arch_uprobe->fixups is either zero or it reflects rip-related fixups.
- */
-static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
-{
- bool fix_ip = true, fix_call = false; /* defaults */
- int reg;
-
- insn_get_opcode(insn); /* should be a nop */
-
- switch (OPCODE1(insn)) {
- case 0x9d:
- /* popf */
- auprobe->fixups |= UPROBE_FIX_SETF;
- break;
- case 0xc3: /* ret/lret */
- case 0xcb:
- case 0xc2:
- case 0xca:
- /* ip is correct */
- fix_ip = false;
- break;
- case 0xe8: /* call relative - Fix return addr */
- fix_call = true;
- break;
- case 0x9a: /* call absolute - Fix return addr, not ip */
- fix_call = true;
- fix_ip = false;
- break;
- case 0xff:
- insn_get_modrm(insn);
- reg = MODRM_REG(insn);
- if (reg == 2 || reg == 3) {
- /* call or lcall, indirect */
- /* Fix return addr; ip is correct. */
- fix_call = true;
- fix_ip = false;
- } else if (reg == 4 || reg == 5) {
- /* jmp or ljmp, indirect */
- /* ip is correct. */
- fix_ip = false;
- }
- break;
- case 0xea: /* jmp absolute -- ip is correct */
- fix_ip = false;
- break;
- default:
- break;
- }
- if (fix_ip)
- auprobe->fixups |= UPROBE_FIX_IP;
- if (fix_call)
- auprobe->fixups |= UPROBE_FIX_CALL;
-}
-
#ifdef CONFIG_X86_64
/*
* If arch_uprobe->insn doesn't use rip-relative addressing, return
@@ -318,7 +261,6 @@ handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct ins
if (mm->context.ia32_compat)
return;

- auprobe->rip_rela_target_address = 0x0;
if (!insn_rip_relative(insn))
return;

@@ -421,16 +363,58 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
*/
int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
{
- int ret;
struct insn insn;
+ bool fix_ip = true, fix_call = false;
+ int ret;

- auprobe->fixups = 0;
ret = validate_insn_bits(auprobe, mm, &insn);
- if (ret != 0)
+ if (ret)
return ret;

+ /*
+ * Figure out which fixups arch_uprobe_post_xol() will need to perform,
+ * and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
+ * is either zero or it reflects rip-related fixups.
+ */
handle_riprel_insn(auprobe, mm, &insn);
- prepare_fixups(auprobe, &insn);
+
+ switch (OPCODE1(&insn)) {
+ case 0x9d: /* popf */
+ auprobe->fixups |= UPROBE_FIX_SETF;
+ break;
+ case 0xc3: /* ret or lret -- ip is correct */
+ case 0xcb:
+ case 0xc2:
+ case 0xca:
+ fix_ip = false;
+ break;
+ case 0xe8: /* call relative - Fix return addr */
+ fix_call = true;
+ break;
+ case 0x9a: /* call absolute - Fix return addr, not ip */
+ fix_call = true;
+ fix_ip = false;
+ break;
+ case 0xea: /* jmp absolute -- ip is correct */
+ fix_ip = false;
+ break;
+ case 0xff:
+ insn_get_modrm(&insn);
+ switch (MODRM_REG(&insn)) {
+ case 2: case 3: /* call or lcall, indirect */
+ fix_call = true;
+ case 4: case 5: /* jmp or ljmp, indirect */
+ fix_ip = false;
+ }
+ break;
+ default:
+ break;
+ }
+
+ if (fix_ip)
+ auprobe->fixups |= UPROBE_FIX_IP;
+ if (fix_call)
+ auprobe->fixups |= UPROBE_FIX_CALL;

return 0;
}
--
1.5.5.1

2014-04-04 19:32:30

by Oleg Nesterov

Subject: Re: [PATCH v2 0/9] uprobes/x86: preparations to fix the riprel jmp/call handling.

On 04/04, Oleg Nesterov wrote:
>
> Now let me send the draft RFC patch which fixes the "call" handling...

Damn. Apparently I can't understand lib/insn.c...

Please see the draft below. Let's ignore 32-bit tasks, let's ignore jmp's,
and please ignore how the (pseudo) code is written; I'll change it anyway.

Questions:

1. Why doesn't insn_get_displacement() work? See "HELP!!!"
below.

2. Do I use lib/insn.c correctly (ignoring the displacement)?

In particular, is 'turn this insn into "1: call 1b;"'
below correct?

3. Jim, do you still think it would be better to rewrite the
call insns using a scratch register?

4. Are there other call insns with OPCODE1() != 0xe8 which
should be fixed too?

Thanks,

Oleg.


diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 9f8210b..cca62c5 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -44,9 +44,15 @@ struct arch_uprobe {
u16 fixups;
const struct uprobe_xol_ops *ops;

+ union {
#ifdef CONFIG_X86_64
- unsigned long rip_rela_target_address;
+ unsigned long rip_rela_target_address;
#endif
+ struct {
+ s32 disp;
+ u8 ilen;
+ } ttt;
+ };
};

struct arch_uprobe_task {
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index b820668..423ae86 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -461,6 +461,52 @@ static struct uprobe_xol_ops default_xol_ops = {
.post_xol = default_post_xol_op,
};

+static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+
+ if (put_user(regs->ip + auprobe->ttt.ilen, (long __user *)(regs->sp - 8)))
+ return false;
+
+ regs->sp -= 8;
+ regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
+
+ return true;
+}
+
+static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ regs->sp += 8;
+ if (ttt_emulate_op(auprobe, regs))
+ return 0;
+ return -ERESTART;
+}
+
+static struct uprobe_xol_ops ttt_xol_ops = {
+ .emulate = ttt_emulate_op,
+ .post_xol = ttt_post_xol_op,
+};
+
+static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
+{
+ s32 *disp;
+
+ insn_get_length(insn);
+ auprobe->ttt.ilen = insn->length;
+
+ insn_get_displacement(insn);
+ auprobe->ttt.disp = insn->displacement.value;
+ // HELP!!! the above doesn't work, ->displacement.value == 0
+ auprobe->ttt.disp = *(s32 *)(auprobe->insn + 1);
+
+ // turn this insn into "1: call 1b;", we only need to xol it
+ // to expand the stack if ->emulate() fails.
+ disp = (void *)auprobe->insn + insn_offset_displacement(insn);
+ *disp = -(s32)auprobe->ttt.ilen;
+
+ auprobe->ops = &ttt_xol_ops;
+ return 0;
+}
+
/**
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
@@ -484,6 +530,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
* is either zero or it reflects rip-related fixups.
*/
switch (OPCODE1(&insn)) {
+ case 0xe8: /* call relative - has its own xol_ops */
+ return ttt_setup_xol_ops(auprobe, &insn);
+
case 0x9d: /* popf */
auprobe->fixups |= UPROBE_FIX_SETF;
break;
@@ -493,9 +542,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
case 0xca:
fix_ip = false;
break;
- case 0xe8: /* call relative - Fix return addr */
- fix_call = true;
- break;
case 0x9a: /* call absolute - Fix return addr, not ip */
fix_call = true;
fix_ip = false;

2014-04-04 19:52:04

by Oleg Nesterov

Subject: Re: [PATCH v2 0/9] uprobes/x86: preparations to fix the riprel jmp/call handling.

On 04/04, Oleg Nesterov wrote:
>
> +static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + regs->sp += 8;
> + if (ttt_emulate_op(auprobe, regs))
> + return 0;
> + return -ERESTART;
> +}

This forgets to update ->ip before calling ttt_emulate_op(). Or we can
simply return -ERESTART after sp += 8. Please ignore this bug; this
part is trivial.
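For reference, the simpler of the two fixes would look like the sketch
below (this is also what the later RFC patch in this thread ends up doing;
64-bit sizes only):

	static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
	{
		/*
		 * The out-of-line "call" already pushed a useless return
		 * address; drop it and restart the original insn, which
		 * re-enters ->emulate() with a correct ->ip.
		 */
		regs->sp += 8;
		return -ERESTART;
	}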

Oleg.

2014-04-04 21:56:53

by Jim Keniston

Subject: Re: [PATCH v2 9/9] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible

On Fri, 2014-04-04 at 20:51 +0200, Oleg Nesterov wrote:
> SIGILL after the failed arch_uprobe_post_xol() should only be used as
> a last resort, we should try to restart the probed insn if possible.
>
> Currently only adjust_ret_addr() can fail, and this can only happen if
> another thread unmapped our stack after we executed "call" out-of-line.
> Most probably the application if buggy, but even in this case it can
> have a handler for SIGSEGV/etc. And in theory it can be even correct
> and do something non-trivial with its memory.
>
> Of course we can't restart unconditionally, so arch_uprobe_post_xol()
> does this only if ->post_xol() returns -ERESTART even if currently this
> is the only possible error.

When re-executing the call instruction, I'd think the stack pointer
would be wrong the second time around, unless you pop off the return
address from the first try.

>
> Note: this is not "perfect", we do not want the extra handler_chain()
> after restart, but I think this is the best solution we can realistically
> do without too much uglifications.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> arch/x86/kernel/uprobes.c | 15 +++++++++++----
> 1 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index e72903e..b820668 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -443,16 +443,17 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
> {
> struct uprobe_task *utask = current->utask;
> long correction = (long)(utask->vaddr - utask->xol_vaddr);
> - int ret = 0;
>
> handle_riprel_post_xol(auprobe, regs, &correction);
> if (auprobe->fixups & UPROBE_FIX_IP)
> regs->ip += correction;
>
> - if (auprobe->fixups & UPROBE_FIX_CALL)
> - ret = adjust_ret_addr(regs->sp, correction);
> + if (auprobe->fixups & UPROBE_FIX_CALL) {
> + if (adjust_ret_addr(regs->sp, correction))
> + return -ERESTART;
> + }
>
> - return ret;
> + return 0;
> }
>
> static struct uprobe_xol_ops default_xol_ops = {
> @@ -599,6 +600,12 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> int err = auprobe->ops->post_xol(auprobe, regs);
> if (err) {
> arch_uprobe_abort_xol(auprobe, regs);
> + /*
> + * Restart the probed insn. ->post_xol() must ensure
> + * this is really possible if it returns -ERESTART.
> + */
> + if (err == -ERESTART)
> + return 0;
> return err;
> }
> }

2014-04-04 23:44:36

by Jim Keniston

Subject: Re: [PATCH v2 0/9] uprobes/x86: preparations to fix the riprel jmp/call handling.

On Fri, 2014-04-04 at 21:32 +0200, Oleg Nesterov wrote:
> On 04/04, Oleg Nesterov wrote:
> >
> > Now let me send the draft RFC patch which fixes the "call" handling...
>
> Damn. apparently I can't understand lib/insn.c...
>
> Please see the draft below. Lets ignore 32bit tasks, lets ignore jmp's,
> please ignore how the (pseudo) code written, I'll change it anyway.
>
> Questions:

So far, I have answers for just #1 and #2.

>
> 1. Why insn_get_displacement() doesn't work? See "HELP!!!"
> below.

insn->moffset1.value seems to be what you want.

>
> 2. Do I use lib/insn.c correctly (ignoring displacement) ?

insn_get_length() has the side-effect of processing the entire
instruction, so just calling that should be sufficient. Looks OK
otherwise -- but I checked very quickly.

More in a day or two.

Jim

>
> In particular, is 'turn this insn into "1: call 1b;"'
> below correct?
>
> 3. Jim, do you still think it would be better to rewrite the
> call insns using a scratch register ?
>
> 4. Is there other call insns with OPCODE1() != 0xe8 which
> should be fixed too?
>
> Thanks,
>
> Oleg.
>
>
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index 9f8210b..cca62c5 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -44,9 +44,15 @@ struct arch_uprobe {
> u16 fixups;
> const struct uprobe_xol_ops *ops;
>
> + union {
> #ifdef CONFIG_X86_64
> - unsigned long rip_rela_target_address;
> + unsigned long rip_rela_target_address;
> #endif
> + struct {
> + s32 disp;
> + u8 ilen;
> + } ttt;
> + };
> };
>
> struct arch_uprobe_task {
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index b820668..423ae86 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -461,6 +461,52 @@ static struct uprobe_xol_ops default_xol_ops = {
> .post_xol = default_post_xol_op,
> };
>
> +static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> +
> + if (put_user(regs->ip + auprobe->ttt.ilen, (long __user *)(regs->sp - 8)))
> + return false;
> +
> + regs->sp -= 8;
> + regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
> +
> + return true;
> +}
> +
> +static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + regs->sp += 8;
> + if (ttt_emulate_op(auprobe, regs))
> + return 0;
> + return -ERESTART;
> +}
> +
> +static struct uprobe_xol_ops ttt_xol_ops = {
> + .emulate = ttt_emulate_op,
> + .post_xol = ttt_post_xol_op,
> +};
> +
> +static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> +{
> + s32 *disp;
> +
> + insn_get_length(insn);
> + auprobe->ttt.ilen = insn->length;
> +
> + insn_get_displacement(insn);
> + auprobe->ttt.disp = insn->displacement.value;
> + // HELP!!! the above doesn't work, ->displacement.value == 0
> + auprobe->ttt.disp = *(s32 *)(auprobe->insn + 1);
> +
> + // turn this insn into "1: call 1b;", we only need to xol it
> + // to expand the stack if ->emulate() fails.
> + disp = (void *)auprobe->insn + insn_offset_displacement(insn);
> + *disp = -(s32)auprobe->ttt.ilen;
> +
> + auprobe->ops = &ttt_xol_ops;
> + return 0;
> +}
> +
> /**
> * arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
> * @mm: the probed address space.
> @@ -484,6 +530,9 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> * is either zero or it reflects rip-related fixups.
> */
> switch (OPCODE1(&insn)) {
> + case 0xe8: /* call relative - has its own xol_ops */
> + return ttt_setup_xol_ops(auprobe, &insn);
> +
> case 0x9d: /* popf */
> auprobe->fixups |= UPROBE_FIX_SETF;
> break;
> @@ -493,9 +542,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> case 0xca:
> fix_ip = false;
> break;
> - case 0xe8: /* call relative - Fix return addr */
> - fix_call = true;
> - break;
> case 0x9a: /* call absolute - Fix return addr, not ip */
> fix_call = true;
> fix_ip = false;
>

2014-04-05 12:46:21

by Oleg Nesterov

Subject: Re: [PATCH v2 9/9] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible

On 04/04, Jim Keniston wrote:
>
> On Fri, 2014-04-04 at 20:51 +0200, Oleg Nesterov wrote:
> >
> > Currently only adjust_ret_addr() can fail, and this can only happen if
> > another thread unmapped our stack after we executed "call" out-of-line.
> > Most probably the application if buggy, but even in this case it can
> > have a handler for SIGSEGV/etc. And in theory it can be even correct
> > and do something non-trivial with its memory.
> >
> > Of course we can't restart unconditionally, so arch_uprobe_post_xol()
> > does this only if ->post_xol() returns -ERESTART even if currently this
> > is the only possible error.
>
> When re-executing the call instruction, I'd think the stack pointer
> would be wrong the second time around, unless you pop off the return
> address from the first try.

Of course! Like ttt_post_xol_op() in the next patch does; I can't
understand how I forgot.

Thanks a lot! Please see v3 below.

I also updated the changelog. Please do not ask me to clean up the games
with ->sp now. I'll try to do this later, when we finish the bug fixes.
The current code should be unified with the code we will add. And I think
that adjust_ret_addr() should die. What we need is uprobe_push(); there
is no need for copy_from_user() afaics. The value we need to push is
utask->vaddr plus a correction calculated at analyze time.
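To illustrate the idea (the name and signature are only a guess, nothing
here implements it yet), such a helper would only ever write a value we
already know:

	static int uprobe_push(struct pt_regs *regs, unsigned long val, int size)
	{
		unsigned long new_sp = regs->sp - size;

		/* size is 4 or 8 depending on the task; little-endian store */
		if (copy_to_user((void __user *)new_sp, &val, size))
			return -EFAULT;

		regs->sp = new_sp;
		return 0;
	}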


-------------------------------------------------------------------------------
Subject: [PATCH v3 9/9] uprobes/x86: Teach arch_uprobe_post_xol() to restart if possible

SIGILL after the failed arch_uprobe_post_xol() should only be used as
a last resort; we should try to restart the probed insn if possible.

Currently only adjust_ret_addr() can fail, and this can only happen if
another thread unmapped our stack after we executed "call" out-of-line.
Most probably the application is buggy, but even in this case it can
have a handler for SIGSEGV/etc. And in theory it can even be correct
and do something non-trivial with its memory.

Of course we can't restart unconditionally, so arch_uprobe_post_xol()
does this only if ->post_xol() returns -ERESTART, even though currently
this is the only possible error.

default_post_xol_op(UPROBE_FIX_CALL) can always restart, but as Jim
pointed out it should not forget to pop off the return address pushed
by this insn executed out-of-line.

Note: this is not "perfect", we do not want the extra handler_chain()
after restart, but I think this is the best solution we can realistically
do without too much uglifications.

TODO: This adds yet another is_ia32_task() check, and the next patches
will add more. I will try to clean this up later, after we fix the pending
problems. And as a reminder, it seems that adjust_ret_addr() should die.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 20 ++++++++++++++++----
1 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index e72903e..cdd6909 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -443,16 +443,22 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
{
struct uprobe_task *utask = current->utask;
long correction = (long)(utask->vaddr - utask->xol_vaddr);
- int ret = 0;

handle_riprel_post_xol(auprobe, regs, &correction);
if (auprobe->fixups & UPROBE_FIX_IP)
regs->ip += correction;

- if (auprobe->fixups & UPROBE_FIX_CALL)
- ret = adjust_ret_addr(regs->sp, correction);
+ if (auprobe->fixups & UPROBE_FIX_CALL) {
+ if (adjust_ret_addr(regs->sp, correction)) {
+ if (is_ia32_task())
+ regs->sp += 4;
+ else
+ regs->sp += 8;
+ return -ERESTART;
+ }
+ }

- return ret;
+ return 0;
}

static struct uprobe_xol_ops default_xol_ops = {
@@ -599,6 +605,12 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
int err = auprobe->ops->post_xol(auprobe, regs);
if (err) {
arch_uprobe_abort_xol(auprobe, regs);
+ /*
+ * Restart the probed insn. ->post_xol() must ensure
+ * this is really possible if it returns -ERESTART.
+ */
+ if (err == -ERESTART)
+ return 0;
return err;
}
}
--
1.5.5.1

2014-04-06 20:15:35

by Oleg Nesterov

Subject: [RFC PATCH 0/6] uprobes/x86: fix the riprel jmp/call handling

On 04/04, Jim Keniston wrote:
>
> On Fri, 2014-04-04 at 21:32 +0200, Oleg Nesterov wrote:
> >
> > 1. Why insn_get_displacement() doesn't work? See "HELP!!!"
> > below.
>
> insn->moffset1.value seems to be what you want.

Works! Thanks a lot.

Still, I can't understand why displacement.nbytes == 0 in this case...
Never mind.

OK. Please see the RFC changes. Obviously not for inclusion yet. And
totally untested, except I verified that the test-case from 4/6 works.

Please comment.

Oleg.

2014-04-06 20:16:28

by Oleg Nesterov

Subject: [RFC PATCH 1/6] uprobes/x86: Emulate unconditional rip-relative jmp's

0xeb and 0xe9. Anything else?

TODO: move ->fixups into the union along with rip_rela_target_address.

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/uprobes.h | 8 +++++++-
arch/x86/kernel/uprobes.c | 40 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 47 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 9f8210b..cca62c5 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -44,9 +44,15 @@ struct arch_uprobe {
u16 fixups;
const struct uprobe_xol_ops *ops;

+ union {
#ifdef CONFIG_X86_64
- unsigned long rip_rela_target_address;
+ unsigned long rip_rela_target_address;
#endif
+ struct {
+ s32 disp;
+ u8 ilen;
+ } ttt;
+ };
};

struct arch_uprobe_task {
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 1ea7e1a..32ab147 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -466,6 +466,42 @@ static struct uprobe_xol_ops default_xol_ops = {
.post_xol = default_post_xol_op,
};

+static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
+ return true;
+}
+
+static struct uprobe_xol_ops ttt_xol_ops = {
+ .emulate = ttt_emulate_op,
+};
+
+static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
+{
+
+ switch (OPCODE1(insn)) {
+ case 0xeb: /* jmp 8 */
+ case 0xe9: /* jmp 32 */
+ break;
+ default:
+ return -ENOSYS;
+ }
+
+ /* has the side-effect of processing the entire instruction */
+ insn_get_length(insn);
+ if (WARN_ON_ONCE(!insn_complete(insn)))
+ return -ENOEXEC;
+
+ auprobe->ttt.ilen = insn->length;
+ auprobe->ttt.disp = insn->moffset1.value;
+ /* so far we assume that it fits into ->moffset1 */
+ if (WARN_ON_ONCE(insn->moffset2.nbytes))
+ return -ENOEXEC;
+
+ auprobe->ops = &ttt_xol_ops;
+ return 0;
+}
+
/**
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
@@ -483,6 +519,10 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (ret)
return ret;

+ ret = ttt_setup_xol_ops(auprobe, &insn);
+ if (ret == 0 || ret != -ENOSYS)
+ return ret;
+
/*
* Figure out which fixups arch_uprobe_post_xol() will need to perform,
* and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
--
1.5.5.1

2014-04-06 20:16:33

by Oleg Nesterov

Subject: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

0xe8. Anything else?

Emulating a rip-relative call is trivial; we only need to additionally
push the return address. If this fails, we execute this instruction out
of line, and this should trigger the trap: the probed application should
die, or the same insn will be restarted if a signal handler expands the
stack. We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case we hit the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add a ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan, who finally found the standalone reproducer;
otherwise I would never have resolved the "random SIGSEGV's under
systemtap" bug report. Now that the problem is clear, we can write a
simplified test-case:

void probe_func(void), callee(void);

int failed = 1;

asm (
".text\n"
".align 4096\n"
".globl probe_func\n"
"probe_func:\n"
"call callee\n"
"ret"
);

/*
* This assumes that:
*
* - &probe_func = 0x401000 + a_bit, aligned = 0x402000
*
* - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
* as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
*
* so we can target the non-canonical address from xol_vma using
* the simple math below, 100 * 4096 is just the random offset
*/
asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n");

void callee(void)
{
failed = 0;
}

int main(void)
{
probe_func();
return failed;
}

It SIGSEGV's if you probe "probe_func" (although it's not very reliable;
randomize_va_space etc. can change the placement of the xol area).

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/uprobes.h | 1 +
arch/x86/kernel/uprobes.c | 69 ++++++++++++++++++++++++++++++++++------
2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index cca62c5..9528117 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -51,6 +51,7 @@ struct arch_uprobe {
struct {
s32 disp;
u8 ilen;
+ u8 opc1;
} ttt;
};
};
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 3bcc121..9283024 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -461,33 +461,85 @@ static struct uprobe_xol_ops default_xol_ops = {
.post_xol = default_post_xol_op,
};

+static bool ttt_is_call(struct arch_uprobe *auprobe)
+{
+ return auprobe->ttt.opc1 == 0xe8;
+}
+
static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
- regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
+ unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
+
+ if (ttt_is_call(auprobe)) {
+ unsigned long new_sp = regs->sp - sizeof_long();
+ if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
+ return false;
+ regs->sp = new_sp;
+ }
+
+ regs->ip = new_ip + auprobe->ttt.disp;
return true;
}

+static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ BUG_ON(!ttt_is_call(auprobe));
+ /*
+ * We can only get here if ttt_emulate_op() failed to push the return
+ * address _and_ another thread expanded our stack before the (mangled)
+ * "call" insn was executed out-of-line. Just restore ->sp and restart.
+ * We could also restore ->ip and try to call ttt_emulate_op() again.
+ */
+ regs->sp += sizeof_long();
+ return -ERESTART;
+}
+
+static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)
+{
+ /*
+ * Turn this insn into "call 1f; 1:", this is what we will execute
+ * out-of-line if ->emulate() fails.
+ *
+ * In the likely case this will lead to arch_uprobe_abort_xol(), but
+ * see the comment in ->emulate(). So we need to ensure that the new
+ * ->ip can't fall into non-canonical area and trigger #GP.
+ *
+ * We could turn it into (say) "pushf", but then we would need to
+ * divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
+ * of ->insn[] for set_orig_insn().
+ */
+ memset(auprobe->insn + insn_offset_displacement(insn),
+ 0, insn->moffset1.nbytes);
+}
+
static struct uprobe_xol_ops ttt_xol_ops = {
.emulate = ttt_emulate_op,
+ .post_xol = ttt_post_xol_op,
};

static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
{
+ u8 opc1 = OPCODE1(insn);
+
+ /* has the side-effect of processing the entire instruction */
+ insn_get_length(insn);
+ if (WARN_ON_ONCE(!insn_complete(insn)))
+ return -ENOEXEC;

- switch (OPCODE1(insn)) {
+ switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
case 0x90: /* prefix* + nop; same as jmp with .disp = 0 */
break;
+
+ case 0xe8: /* call relative */
+ ttt_clear_displacement(auprobe, insn);
+ auprobe->ttt.opc1 = opc1;
+ break;
default:
return -ENOSYS;
}

- /* has the side-effect of processing the entire instruction */
- insn_get_length(insn);
- if (WARN_ON_ONCE(!insn_complete(insn)))
- return -ENOEXEC;
-
auprobe->ttt.ilen = insn->length;
auprobe->ttt.disp = insn->moffset1.value;
/* so far we assume that it fits into ->moffset1 */
@@ -534,9 +586,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
case 0xca:
fix_ip = false;
break;
- case 0xe8: /* call relative - Fix return addr */
- fix_call = true;
- break;
case 0x9a: /* call absolute - Fix return addr, not ip */
fix_call = true;
fix_ip = false;
--
1.5.5.1

2014-04-06 20:16:48

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH 2/6] uprobes/x86: Emulate nop's using ops->emulate()

Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change ttt_setup_xol_ops() to treat "nop" as a jmp to the next insn.

Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 32ab147..dd5f51a 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -482,6 +482,7 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
switch (OPCODE1(insn)) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
+ case 0x90: /* prefix* + nop; same as jmp with .disp = 0 */
break;
default:
return -ENOSYS;
@@ -717,29 +718,10 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
regs->flags &= ~X86_EFLAGS_TF;
}

-/*
- * Skip these instructions as per the currently known x86 ISA.
- * rep=0x66*; nop=0x90
- */
static bool __skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
- int i;
-
if (auprobe->ops->emulate)
return auprobe->ops->emulate(auprobe, regs);
-
- /* TODO: move this code into ->emulate() hook */
- for (i = 0; i < MAX_UINSN_BYTES; i++) {
- if (auprobe->insn[i] == 0x66)
- continue;
-
- if (auprobe->insn[i] == 0x90) {
- regs->ip += i + 1;
- return true;
- }
-
- break;
- }
return false;
}

--
1.5.5.1

2014-04-06 20:16:43

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH 3/6] uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()

1. Add the trivial sizeof_long() helper and change other callers of
is_ia32_task() to use it.

TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
not necessarily mean 32bit. Fortunately syscall-like insns can't be
probed so it actually works, but it would be better to rename and
use is_ia32_frame().

2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
and adjust_ret_addr() should be named "nleft". And in fact only the
last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
needs to inspect the non-zero error code.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 37 +++++++++++++++----------------------
1 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index dd5f51a..3bcc121 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -408,6 +408,11 @@ struct uprobe_xol_ops {
int (*post_xol)(struct arch_uprobe *, struct pt_regs *);
};

+static inline int sizeof_long(void)
+{
+ return is_ia32_task() ? 4 : 8;
+}
+
static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
pre_xol_rip_insn(auprobe, regs, &current->utask->autask);
@@ -419,21 +424,14 @@ static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
*/
static int adjust_ret_addr(unsigned long sp, long correction)
{
- int rasize, ncopied;
- long ra = 0;
-
- if (is_ia32_task())
- rasize = 4;
- else
- rasize = 8;
+ int rasize = sizeof_long();
+ long ra;

- ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
- if (unlikely(ncopied))
+ if (copy_from_user(&ra, (void __user *)sp, rasize))
return -EFAULT;

ra += correction;
- ncopied = copy_to_user((void __user *)sp, &ra, rasize);
- if (unlikely(ncopied))
+ if (copy_to_user((void __user *)sp, &ra, rasize))
return -EFAULT;

return 0;
@@ -450,10 +448,7 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs

if (auprobe->fixups & UPROBE_FIX_CALL) {
if (adjust_ret_addr(regs->sp, correction)) {
- if (is_ia32_task())
- regs->sp += 4;
- else
- regs->sp += 8;
+ regs->sp += sizeof_long();
return -ERESTART;
}
}
@@ -738,23 +733,21 @@ if (ret) pr_crit("EMULATE: %lx -> %lx\n", ip, regs->ip);
unsigned long
arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs)
{
- int rasize, ncopied;
+ int rasize = sizeof_long(), nleft;
unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */

- rasize = is_ia32_task() ? 4 : 8;
- ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize);
- if (unlikely(ncopied))
+ if (copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize))
return -1;

/* check whether address has been already hijacked */
if (orig_ret_vaddr == trampoline_vaddr)
return orig_ret_vaddr;

- ncopied = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
- if (likely(!ncopied))
+ nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
+ if (likely(!nleft))
return orig_ret_vaddr;

- if (ncopied != rasize) {
+ if (nleft != rasize) {
pr_err("uprobe: return address clobbered: pid=%d, %%sp=%#lx, "
"%%ip=%#lx\n", current->pid, regs->sp, regs->ip);

--
1.5.5.1

2014-04-06 20:16:39

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

Incomplete, lacks "jcxz". Simple to fix. Anything else?

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 46 ++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9283024..797b8a4 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -466,18 +466,56 @@ static bool ttt_is_call(struct arch_uprobe *auprobe)
return auprobe->ttt.opc1 == 0xe8;
}

+#define X86_COND_OPCODES \
+ CASE(70, 71, XF(OF)) \
+ CASE(72, 73, XF(CF)) \
+ CASE(74, 75, XF(ZF)) \
+ CASE(78, 79, XF(SF)) \
+ CASE(7a, 7b, XF(PF)) \
+ CASE(76, 77, XF(CF) || XF(ZF)) \
+ CASE(7c, 7d, XF(SF) != XF(OF)) \
+ CASE(7e, 7f, XF(ZF) || XF(SF) != XF(OF))
+
+static bool ck_cond_opcode(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ unsigned long flags = regs->flags;
+ bool ret;
+
+ switch (auprobe->ttt.opc1) {
+ case 0x00: /* not a conditional jmp */
+ return true;
+
+ #define XF(xf) (!!(flags & X86_EFLAGS_ ## xf))
+ #define CASE(op_y, op_n, cond) \
+ case 0x ## op_y: ret = (cond) != 0; break; \
+ case 0x ## op_n: ret = (cond) == 0; break;
+
+ X86_COND_OPCODES
+ #undef CASE
+ #undef XF
+
+ default:
+ BUG();
+ }
+
+ return ret;
+}
+
static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
+ unsigned long disp = auprobe->ttt.disp;

if (ttt_is_call(auprobe)) {
unsigned long new_sp = regs->sp - sizeof_long();
if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
return false;
regs->sp = new_sp;
+ } else if (!ck_cond_opcode(auprobe, regs)) {
+ disp = 0;
}

- regs->ip = new_ip + auprobe->ttt.disp;
+ regs->ip = new_ip + disp;
return true;
}

@@ -534,8 +572,14 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)

case 0xe8: /* call relative */
ttt_clear_displacement(auprobe, insn);
+ /* fall through */
+ #define CASE(op_y, op_n, cond) \
+ case 0x ## op_y: case 0x ## op_n:
+ X86_COND_OPCODES
+ #undef CASE
auprobe->ttt.opc1 = opc1;
break;
+
default:
return -ENOSYS;
}
--
1.5.5.1

2014-04-06 20:17:32

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

It seems that a 16bit conditional jmp is just
0x0f + the short-jump opcode incremented by 0x10.

But I'll try to cleanup this patch...

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 18 +++++++++++++++---
1 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 797b8a4..8f92cbf 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -564,6 +564,19 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
if (WARN_ON_ONCE(!insn_complete(insn)))
return -ENOEXEC;

+#define CASE(op_y, op_n, cond) \
+ case 0x ## op_y: case 0x ## op_n:
+
+ if (insn->opcode.nbytes == 2 && opc1 == 0x0f) {
+ opc1 = OPCODE2(insn) - 0x10;
+
+ switch (opc1) {
+ X86_COND_OPCODES
+ default:
+ return -ENOSYS;
+ }
+ }
+
switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
@@ -573,10 +586,7 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
case 0xe8: /* call relative */
ttt_clear_displacement(auprobe, insn);
/* fall through */
- #define CASE(op_y, op_n, cond) \
- case 0x ## op_y: case 0x ## op_n:
X86_COND_OPCODES
- #undef CASE
auprobe->ttt.opc1 = opc1;
break;

@@ -584,6 +594,8 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
return -ENOSYS;
}

+#undef CASE
+
auprobe->ttt.ilen = insn->length;
auprobe->ttt.disp = insn->moffset1.value;
/* so far we assume that it fits into ->moffset1 */
--
1.5.5.1

2014-04-07 14:27:30

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

On 04/06, Oleg Nesterov wrote:
>
> Incomplete, lacks "jcxz". Simple to fix. Anything else?

Please see v2 below. Simplify the preprocessor hacks.

-------------------------------------------------------------------------------
Subject: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

Incomplete, lacks "jcxz". Simple to fix. Anything else?

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 61 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 9283024..3865d8b 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -466,18 +466,72 @@ static bool ttt_is_call(struct arch_uprobe *auprobe)
return auprobe->ttt.opc1 == 0xe8;
}

+#define CASE_COND \
+ COND(70, 71, XF(OF)) \
+ COND(72, 73, XF(CF)) \
+ COND(74, 75, XF(ZF)) \
+ COND(78, 79, XF(SF)) \
+ COND(7a, 7b, XF(PF)) \
+ COND(76, 77, XF(CF) || XF(ZF)) \
+ COND(7c, 7d, XF(SF) != XF(OF)) \
+ COND(7e, 7f, XF(ZF) || XF(SF) != XF(OF))
+
+#define COND(op_y, op_n, expr) \
+ case 0x ## op_y: DO((expr) != 0) \
+ case 0x ## op_n: DO((expr) == 0)
+
+#define XF(xf) (!!(flags & X86_EFLAGS_ ## xf))
+
+static bool is_cond_jmp_opcode(u8 opcode)
+{
+ switch (opcode) {
+ #define DO(expr) \
+ return true;
+ CASE_COND
+ #undef DO
+
+ default:
+ return false;
+ }
+}
+
+static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ unsigned long flags = regs->flags;
+
+ switch (auprobe->ttt.opc1) {
+ case 0x00: /* not a conditional jmp */
+ return true;
+
+ #define DO(expr) \
+ return expr;
+ CASE_COND
+ #undef DO
+
+ default:
+ BUG();
+ }
+}
+
+#undef XF
+#undef COND
+#undef CASE_COND
+
static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
+ unsigned long disp = auprobe->ttt.disp;

if (ttt_is_call(auprobe)) {
unsigned long new_sp = regs->sp - sizeof_long();
if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
return false;
regs->sp = new_sp;
+ } else if (!check_jmp_cond(auprobe, regs)) {
+ disp = 0;
}

- regs->ip = new_ip + auprobe->ttt.disp;
+ regs->ip = new_ip + disp;
return true;
}

@@ -536,8 +590,11 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
ttt_clear_displacement(auprobe, insn);
auprobe->ttt.opc1 = opc1;
break;
+
default:
- return -ENOSYS;
+ if (!is_cond_jmp_opcode(opc1))
+ return -ENOSYS;
+ auprobe->ttt.opc1 = opc1;
}

auprobe->ttt.ilen = insn->length;
--
1.5.5.1

2014-04-07 14:28:28

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

On 04/06, Oleg Nesterov wrote:
>
> But I'll try to cleanup this patch...

See v2 below.

-------------------------------------------------------------------------------
Subject: [RFC PATCH v2 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

It seems that 16bit conditional jmp is just 0x0f + short_jump_opc_plus_0x10.
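
For example (opcode bytes straight from the SDM, shown here only as an
illustration):

	74 cb		jz  rel8	; opc1 == 0x74
	0f 84 cd	jz  rel32	; OPCODE2() == 0x84 == 0x74 + 0x10

and the same 0x10 offset holds for the whole 0f 80..8f range vs 70..7f.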

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 3865d8b..dae02f9 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -591,6 +591,10 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
auprobe->ttt.opc1 = opc1;
break;

+ case 0x0f:
+ if (insn->opcode.nbytes != 2)
+ return -ENOSYS;
+ opc1 = OPCODE2(insn) - 0x10;
default:
if (!is_cond_jmp_opcode(opc1))
return -ENOSYS;
--
1.5.5.1

2014-04-07 16:41:51

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

On 04/07, Oleg Nesterov wrote:
>
> > Incomplete, lacks "jcxz". Simple to fix. Anything else?
>
> Please see v2 below. Simplify the preprocessor hacks.

Finally some testing. This all even seems to work... Although so far
I only tested jz/jnz 8/32bit with this test-case:

asm (
".text\n"
".globl test_0\n"
"test_0:\n"

"movq $0, %rax\n"
"cmpq $0, %rax\n"

".globl t_0_n; t_0_n:\n"
"jnz 1f\n"

".globl t_0_y; t_0_y:\n"
"jz 2f\n"

"1: ud2\n"
"2: retq\n"
);

asm (
".text\n"
".globl test_1\n"
"test_1:\n"

"movq $1, %rax\n"
"cmpq $0, %rax\n"

".globl t_1_y; t_1_y:\n"
"jz 1f\n"

".globl t_1_n; t_1_n:\n"
"jnz 2f\n"

".org . + 1024\n"

"1: ud2\n"
"2: retq\n"
);

extern void test_0(void), test_1(void);

int main(void)
{
test_0();
test_1();
return 0;
}

it runs fine with t_{0,1}_{y,n} probed.

As for "jcxz" support, it must be simple but I am still googling for
"jcxz for dummies". Will do tomorrow, probably I'll make a separate patch
for this to simplify the review.

Oleg.

2014-04-07 18:54:21

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 0/6] uprobes/x86: fix the reprel jmp/call handling

On Sun, 2014-04-06 at 22:15 +0200, Oleg Nesterov wrote:
> On 04/04, Jim Keniston wrote:
> >
> > On Fri, 2014-04-04 at 21:32 +0200, Oleg Nesterov wrote:
> > >
> > > 1. Why insn_get_displacement() doesn't work? See "HELP!!!"
> > > below.
> >
> > insn->moffset1.value seems to be what you want.
>
> Works! Thanks a lot.
>
> Still I can't understand why displacement.nbytes == 0 in this case...
> Nevermind.

Looking at Masami's arch/x86/lib/x86-opcode-map.txt and related code, I
see that the operands to the Jcc and JMP instructions are treated as
immediate values. So insn->immediate.value (which is in the same union
as insn->moffset1.value) is more appropriate, and insn->immediate.nbytes
should get you the correct size. Again, insn_get_length() finishes
parsing the whole instruction as necessary, so insn_get_immediate() gets
called as a side effect.
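
As a byte-level illustration (the displacement values here are made up, not
taken from your patches):

	eb 05			jmp short .+7		; immediate.nbytes == 1, value == 0x05
	e9 d2 04 00 00		jmp near  .+0x4d7	; immediate.nbytes == 4, value == 0x4d2
	0f 84 d2 04 00 00	jz  near  .+0x4d8	; 2-byte opcode, value == 0x4d2

In each case the relative offset is filed under insn->immediate;
insn->displacement stays empty, and insn->moffset1 only happens to alias it
through the union.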

>
> OK. Please see the RFC changes. Obviously not for inclusion yet. And
> totally untested, except I verified that the test-case from 4/6 works.
>
> Please comment.

I'll look at the new patches today.

>
> Oleg.

Jim

2014-04-07 20:34:32

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()

On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> 1. Add the trivial sizeof_long() helper and change other callers of
> is_ia32_task() to use it.
>
...

This hunk #3 doesn't apply for me. I can't find in your patch sets
where you added the lines being replaced (and they weren't there
originally).

After I fixed up this hunk, this patch and the rest applied OK.

> @@ -450,10 +448,7 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
>
> if (auprobe->fixups & UPROBE_FIX_CALL) {
> if (adjust_ret_addr(regs->sp, correction)) {
> - if (is_ia32_task())
> - regs->sp += 4;
> - else
> - regs->sp += 8;
> + regs->sp += sizeof_long();
> return -ERESTART;
> }
> }
> @@ -738,23 +733,21 @@ if (ret) pr_crit("EMULATE: %lx -> %lx\n", ip, regs->ip);
...

This modified hunk worked for me.
@@ -450,7 +448,9 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs

if (auprobe->fixups & UPROBE_FIX_CALL) {
- if (adjust_ret_addr(regs->sp, correction))
+ if (adjust_ret_addr(regs->sp, correction)) {
+ regs->sp += sizeof_long();
return -ERESTART;
+ }
}

return 0;

Jim

2014-04-07 20:42:56

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 3/6] uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()

On Mon, 2014-04-07 at 13:34 -0700, Jim Keniston wrote:
> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> > 1. Add the trivial sizeof_long() helper and change other callers of
> > is_ia32_task() to use it.
> >
> ...
>
> This hunk #3 doesn't apply for me. I can't find in your patch sets
> where you added the lines being replaced (and they weren't there
> originally).

False alarm, I think. I didn't have your patch #9 v3 from Saturday.
Jim

>
> After I fixed up this hunk, this patch and the rest applied OK.
>
> > @@ -450,10 +448,7 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
> >
> > if (auprobe->fixups & UPROBE_FIX_CALL) {
> > if (adjust_ret_addr(regs->sp, correction)) {
> > - if (is_ia32_task())
> > - regs->sp += 4;
> > - else
> > - regs->sp += 8;
> > + regs->sp += sizeof_long();
> > return -ERESTART;
> > }
> > }
> > @@ -738,23 +733,21 @@ if (ret) pr_crit("EMULATE: %lx -> %lx\n", ip, regs->ip);
> ...
>
> This modified hunk worked for me.
> @@ -450,7 +448,9 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs
>
> if (auprobe->fixups & UPROBE_FIX_CALL) {
> - if (adjust_ret_addr(regs->sp, correction))
> + if (adjust_ret_addr(regs->sp, correction)) {
> + regs->sp += sizeof_long();
> return -ERESTART;
> + }
> }
>
> return 0;
>
> Jim

Subject: Re: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

(2014/04/05 3:51), Oleg Nesterov wrote:
> Introduce arch_uprobe->ops pointing to the "struct uprobe_xol_ops",
> move the current UPROBE_FIX_{RIP*,IP,CALL} code into the default
> set of methods and change arch_uprobe_pre/post_xol() accordingly.
>
> This way we can add the new uprobe_xol_ops's to handle the insns
> which need the special processing (rip-relative jmp/call at least).
>
> TODO: An error from adjust_ret_addr() shouldn't be silently ignored,
> we should teach arch_uprobe_post_xol() or handle_singlestep() paths
> to restart the probed insn in this case. And probably "adjust" can
> be simplified and turned into set_ret_addr(). It seems that we do
> not really need copy_from_user(), we can always calculate the value
> we need to write into *regs->sp.

It seems that you fixed this in 8/9, we don't need the TODO list in
the description.

>
> Signed-off-by: Oleg Nesterov <[email protected]>
> Reviewed-by: Jim Keniston <[email protected]>
> ---
> arch/x86/include/asm/uprobes.h | 7 ++-
> arch/x86/kernel/uprobes.c | 107 +++++++++++++++++++++++++--------------
> 2 files changed, 74 insertions(+), 40 deletions(-)
>
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index 3087ea9..9f8210b 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -33,12 +33,17 @@ typedef u8 uprobe_opcode_t;
> #define UPROBE_SWBP_INSN 0xcc
> #define UPROBE_SWBP_INSN_SIZE 1
>
> +struct uprobe_xol_ops;
> +
> struct arch_uprobe {
> - u16 fixups;
> union {
> u8 insn[MAX_UINSN_BYTES];
> u8 ixol[MAX_UINSN_BYTES];
> };
> +
> + u16 fixups;
> + const struct uprobe_xol_ops *ops;
> +
> #ifdef CONFIG_X86_64
> unsigned long rip_rela_target_address;
> #endif
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 3bb4198..13ad8a3 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -402,6 +402,64 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
> }
> #endif /* CONFIG_X86_64 */
>
> +struct uprobe_xol_ops {
> + bool (*emulate)(struct arch_uprobe *, struct pt_regs *);
> + int (*pre_xol)(struct arch_uprobe *, struct pt_regs *);
> + int (*post_xol)(struct arch_uprobe *, struct pt_regs *);
> +};
> +
> +static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + pre_xol_rip_insn(auprobe, regs, &current->utask->autask);
> + return 0;
> +}
> +
> +/*
> + * Adjust the return address pushed by a call insn executed out of line.
> + */
> +static int adjust_ret_addr(unsigned long sp, long correction)
> +{
> + int rasize, ncopied;
> + long ra = 0;
> +
> + if (is_ia32_task())
> + rasize = 4;
> + else
> + rasize = 8;
> +
> + ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
> + if (unlikely(ncopied))
> + return -EFAULT;
> +
> + ra += correction;
> + ncopied = copy_to_user((void __user *)sp, &ra, rasize);
> + if (unlikely(ncopied))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> +static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + struct uprobe_task *utask = current->utask;
> + long correction = (long)(utask->vaddr - utask->xol_vaddr);
> + int ret = 0;
> +
> + handle_riprel_post_xol(auprobe, regs, &correction);
> + if (auprobe->fixups & UPROBE_FIX_IP)
> + regs->ip += correction;
> +
> + if (auprobe->fixups & UPROBE_FIX_CALL)
> + ret = adjust_ret_addr(regs->sp, correction);
> +
> + return ret;
> +}
> +
> +static struct uprobe_xol_ops default_xol_ops = {
> + .pre_xol = default_pre_xol_op,
> + .post_xol = default_post_xol_op,
> +};
> +
> /**
> * arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
> * @mm: the probed address space.
> @@ -464,6 +522,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> if (fix_call)
> auprobe->fixups |= UPROBE_FIX_CALL;
>
> + auprobe->ops = &default_xol_ops;
> return 0;
> }
>
> @@ -485,33 +544,8 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> if (test_tsk_thread_flag(current, TIF_BLOCKSTEP))
> set_task_blockstep(current, false);
>
> - pre_xol_rip_insn(auprobe, regs, &utask->autask);
> - return 0;
> -}
> -
> -/*
> - * This function is called by arch_uprobe_post_xol() to adjust the return
> - * address pushed by a call instruction executed out of line.
> - */
> -static int adjust_ret_addr(unsigned long sp, long correction)
> -{
> - int rasize, ncopied;
> - long ra = 0;
> -
> - if (is_ia32_task())
> - rasize = 4;
> - else
> - rasize = 8;
> -
> - ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
> - if (unlikely(ncopied))
> - return -EFAULT;
> -
> - ra += correction;
> - ncopied = copy_to_user((void __user *)sp, &ra, rasize);
> - if (unlikely(ncopied))
> - return -EFAULT;
> -
> + if (auprobe->ops->pre_xol)
> + return auprobe->ops->pre_xol(auprobe, regs);
> return 0;
> }
>
> @@ -560,11 +594,8 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
> int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> {
> struct uprobe_task *utask = current->utask;
> - long correction;
> - int result = 0;
>
> WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR);
> -
> current->thread.trap_nr = utask->autask.saved_trap_nr;
> /*
> * arch_uprobe_pre_xol() doesn't save the state of TIF_BLOCKSTEP
> @@ -576,15 +607,9 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
> else if (!(auprobe->fixups & UPROBE_FIX_SETF))
> regs->flags &= ~X86_EFLAGS_TF;
>
> - correction = (long)(utask->vaddr - utask->xol_vaddr);
> - handle_riprel_post_xol(auprobe, regs, &correction);
> - if (auprobe->fixups & UPROBE_FIX_IP)
> - regs->ip += correction;
> -
> - if (auprobe->fixups & UPROBE_FIX_CALL)
> - result = adjust_ret_addr(regs->sp, correction);
> -
> - return result;
> + if (auprobe->ops->post_xol)
> + return auprobe->ops->post_xol(auprobe, regs);
> + return 0;
> }
>
> /* callback routine for handling exceptions. */
> @@ -642,6 +667,10 @@ static bool __skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
> {
> int i;
>
> + if (auprobe->ops->emulate)
> + return auprobe->ops->emulate(auprobe, regs);
> +
> + /* TODO: move this code into ->emulate() hook */

If you think this can move into the emulate(), you should do it in this
patch. Since the following code runs by default, there should be
no problem to do that.

> for (i = 0; i < MAX_UINSN_BYTES; i++) {
> if (auprobe->insn[i] == 0x66)
> continue;
>

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC PATCH 0/6] uprobes/x86: fix the reprel jmp/call handling

(2014/04/07 5:15), Oleg Nesterov wrote:
> On 04/04, Jim Keniston wrote:
>>
>> On Fri, 2014-04-04 at 21:32 +0200, Oleg Nesterov wrote:
>>>
>>> 1. Why insn_get_displacement() doesn't work? See "HELP!!!"
>>> below.
>>
>> insn->moffset1.value seems to be what you want.
>
> Works! Thanks a lot.
>
> Still I can't understand why displacement.nbytes == 0 in this case...
> Nevermind.

I guess that you are misunderstanding what the displacement means.
insn->displacement means what the x86 instruction encoding calls the
"displacement" bytes, which are an extension of the addressing mode.

The relative jumps are JMP (near), whose operand is Jz,
and JMP (short), whose operand is Jb, according to the Intel SDM 2b
appendix A2.

According to SDM2b A.2.1 and A.2.2, the J means
----
The instruction contains a relative offset to be added to the instruction
pointer register (for example, JMP (0E9), LOOP).
----

and z and b means
----
b Byte, regardless of operand-size attribute.
z Word for 16-bit operand-size or doubleword for 32 or 64-bit operand-size.
----
Ok, so these have one immediate operand of 1, 2 (word), or
4 (doubleword) bytes.
In that case, you should use insn->immediate instead of insn->moffset1, which
is only for MOV (A0-A3) on x86-64 (please see Intel SDM2a 2.2.1.4).

Thank you,

> OK. Please see the RFC changes. Obviously not for inclusion yet. And
> totally untested, except I verified that the test-case from 4/6 works.
>
> Please comment.
>
> Oleg.
>
>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-08 16:12:48

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

On 04/08, Masami Hiramatsu wrote:
>
> (2014/04/05 3:51), Oleg Nesterov wrote:
> >
> > TODO: An error from adjust_ret_addr() shouldn't be silently ignored,
> > we should teach arch_uprobe_post_xol() or handle_singlestep() paths
> > to restart the probed insn in this case. And probably "adjust" can
> > be simplified and turned into set_ret_addr(). It seems that we do
> > not really need copy_from_user(), we can always calculate the value
> > we need to write into *regs->sp.
>
> It seems that you fixed this in 8/9, we don't need the TODO list in
> the description.

Well, OK, I'll update the changelog and remove the "error ... ignored"
part. Although to be honest, I do not understand why you think it
is bad to document the other problems you found while you were writing
the patch.

> > + if (auprobe->ops->emulate)
> > + return auprobe->ops->emulate(auprobe, regs);
> > +
> > + /* TODO: move this code into ->emulate() hook */
>
> If you think this can move into the emulate(),

Yes sure,

> you should do it in this
> patch.

No, sorry, I strongly disagree, this should come as a separate change,
and only after "Emulate jmp's".

> Since the following code runs by default, there should be
> no problem to do that.

Hmm. Not sure I understand "by default". If you meant that this should
go into default_emulate_op() (which we do not have), then I strongly
disagree again.

It should not, if nothing else we need to record insn->length somewhere,
this should go into ttt_emulate_op() we add later. And it simply looks
much more natural to handle jmp's and nop's together.

Masami, this time I simply can't understand your objections, please
clarify.

Thanks,

Oleg.

2014-04-08 16:28:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 0/6] uprobes/x86: fix the reprel jmp/call handling

On 04/08, Masami Hiramatsu wrote:
>
> (2014/04/07 5:15), Oleg Nesterov wrote:
> >
> > Still I can't understand why displacement.nbytes == 0 in this case...
> > Nevermind.
>
> I guess that you are misunderstanding what the displacement means.

You are kidding ;) It is not that I misunderstood ->displacement in
particular, so far I understand almost nothing in this area!

> insn->displacement means
> [... snip ...]

Thanks a lot for your explanation,

> In that case, you should use insn->immediate, instead of insn->moffset1

OK... but I'm afraid I'll ask a stupid question before I update this
series accordingly.

Thanks.

Oleg.

2014-04-08 18:11:01

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

On 04/08, Oleg Nesterov wrote:
>
> On 04/08, Masami Hiramatsu wrote:
> >
> > (2014/04/05 3:51), Oleg Nesterov wrote:
> > >
> > > TODO: An error from adjust_ret_addr() shouldn't be silently ignored,
> > > we should teach arch_uprobe_post_xol() or handle_singlestep() paths
> > > to restart the probed insn in this case. And probably "adjust" can
> > > be simplified and turned into set_ret_addr(). It seems that we do
> > > not really need copy_from_user(), we can always calculate the value
> > > we need to write into *regs->sp.
> >
> > It seems that you fixed this in 8/9, we don't need the TODO list in
> > the description.
>
> Well, OK, I'll update the changelog and remove the "error ... ignored"
> part. Although to be honest, I do not understand why you think it
> is bad to document the other problems you found while you were writing
> the patch.

I'll remove it altogether. But I'll update the changelog in
"Introduce sizeof_long() ..." from the next series to explain why I think
adjust_ret_addr() must die.

Oleg.

2014-04-08 19:26:10

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 0/6] uprobes/x86: fix the reprel jmp/call handling

On 04/08, Oleg Nesterov wrote:
>
> OK... but I'm afraid I'll ask a stupid question before I update this
> series accordingly.

And I guess I should also use insn_offset_immediate() in ttt_clear_displacement().
Which should be renamed, but I have no idea how.

OK. Unless I am totally confused (very possible) the necessary changes are
trivial. I do not want to spam lkml, so let me just show the cumulative diff
(1/6 and 4/6 should be trivially updated).

Anything else I missed?

Let's ignore j*cxz. I tried to read the Intel docs and it seems that this
insn is always rel8, so we do not need to emulate it to fix the problem.
But I'll make the "Emulate j*cxz" change later anyway, just for completeness.

Oleg.

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index dae02f9..f0a8afa 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -562,8 +562,8 @@ static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *ins
* divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
* of ->insn[] for set_orig_insn().
*/
- memset(auprobe->insn + insn_offset_displacement(insn),
- 0, insn->moffset1.nbytes);
+ memset(auprobe->insn + insn_offset_immediate(insn),
+ 0, insn->immediate.nbytes);
}

static struct uprobe_xol_ops ttt_xol_ops = {
@@ -602,10 +602,7 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
}

auprobe->ttt.ilen = insn->length;
- auprobe->ttt.disp = insn->moffset1.value;
- /* so far we assume that it fits into ->moffset1 */
- if (WARN_ON_ONCE(insn->moffset2.nbytes))
- return -ENOEXEC;
+ auprobe->ttt.disp = insn->immediate.value;

auprobe->ops = &ttt_xol_ops;
return 0;

2014-04-08 20:36:58

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 1/6] uprobes/x86: Emulate unconditional rip-relative jmp's

On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> 0xeb and 0xe9. Anything else?

For unconditional rip-relative jumps, yes, I'm pretty sure that's it.

>
> TODO: move ->fixup into the union along with rip_rela_target_address.
>
> Reported-by: Jonathan Lebon <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> arch/x86/include/asm/uprobes.h | 8 +++++++-
> arch/x86/kernel/uprobes.c | 40 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 47 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index 9f8210b..cca62c5 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -44,9 +44,15 @@ struct arch_uprobe {
> u16 fixups;
> const struct uprobe_xol_ops *ops;
>
> + union {
> #ifdef CONFIG_X86_64
> - unsigned long rip_rela_target_address;
> + unsigned long rip_rela_target_address;
> #endif
> + struct {
> + s32 disp;
> + u8 ilen;
> + } ttt;

Are you planning to stick with ttt as the name/prefix for all this
jump-emulation code? Seems like you could come up with something more
descriptive without adding a lot of characters.

> + };
> };
>
> struct arch_uprobe_task {
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 1ea7e1a..32ab147 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -466,6 +466,42 @@ static struct uprobe_xol_ops default_xol_ops = {
> .post_xol = default_post_xol_op,
> };
>
> +static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
> + return true;
> +}
> +
> +static struct uprobe_xol_ops ttt_xol_ops = {
> + .emulate = ttt_emulate_op,
> +};
> +
> +static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> +{
> +
> + switch (OPCODE1(insn)) {
> + case 0xeb: /* jmp 8 */
> + case 0xe9: /* jmp 32 */
> + break;
> + default:
> + return -ENOSYS;

-ENOSYS looks like an error return, as opposed to what it is, the normal
return when the probed instruction is something other than a jump. This
gets more bewildering as you add patches and this switch grows and gets
more complicated. Add a comment here?

> + }
> +
> + /* has the side-effect of processing the entire instruction */
> + insn_get_length(insn);
> + if (WARN_ON_ONCE(!insn_complete(insn)))
> + return -ENOEXEC;
> +
> + auprobe->ttt.ilen = insn->length;
> + auprobe->ttt.disp = insn->moffset1.value;
> + /* so far we assume that it fits into ->moffset1 */
> + if (WARN_ON_ONCE(insn->moffset2.nbytes))
> + return -ENOEXEC;

s/moffset1/immediate/ -- which you've already addressed.

> +
> + auprobe->ops = &ttt_xol_ops;
> + return 0;
> +}
> +
> /**
> * arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
> * @mm: the probed address space.
> @@ -483,6 +519,10 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> if (ret)
> return ret;
>
> + ret = ttt_setup_xol_ops(auprobe, &insn);
> + if (ret == 0 || ret != ENOSYS)

This looks wrong in a couple of ways:
a. I think you intend to compare against -ENOSYS, not ENOSYS.
b. Given the (ret != [-]ENOSYS) test, the (ret == 0) test is
superfluous.


> + return ret;
> +
> /*
> * Figure out which fixups arch_uprobe_post_xol() will need to perform,
> * and annotate arch_uprobe->fixups accordingly. To start with, ->fixups

Jim

2014-04-08 22:26:12

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> 0xe8. Anything else?

No, I think e8 is the only call instruction uprobes will see.

>

The following couple of paragraphs should be included in the code,
perhaps merged with some of the related comments already there. Without
it, the code is pretty hard to follow.

> Emulating of rip-relative call is trivial, we only need to additionally
> push the ret-address. If this fails, we execute this instruction out of
> line and this should trigger the trap, the probed application should die
> or the same insn will be restarted if a signal handler expands the stack.
> We do not even need ->post_xol() for this case.
>
> But there is a corner (and almost theoretical) case: another thread can
> expand the stack right before we execute this insn out of line. In this
> case it hit the same problem we are trying to solve. So we simply turn
> the probed insn into "call 1f; 1:" and add ->post_xol() which restores
> ->sp and restarts.

Can you explain under what circumstances emulation of the call
instruction would fail? Will the copy_to_user() call that writes the
return address to the stack grow the stack if necessary? Will
single-stepping the mangled call instruction expand the stack?

>
> Many thanks to Jonathan who finally found the standalone reproducer,
...
> Reported-by: Jonathan Lebon <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> arch/x86/include/asm/uprobes.h | 1 +
> arch/x86/kernel/uprobes.c | 69 ++++++++++++++++++++++++++++++++++------
> 2 files changed, 60 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
> index cca62c5..9528117 100644
> --- a/arch/x86/include/asm/uprobes.h
> +++ b/arch/x86/include/asm/uprobes.h
> @@ -51,6 +51,7 @@ struct arch_uprobe {
> struct {
> s32 disp;
> u8 ilen;
> + u8 opc1;
> } ttt;
> };
> };
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 3bcc121..9283024 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -461,33 +461,85 @@ static struct uprobe_xol_ops default_xol_ops = {
> .post_xol = default_post_xol_op,
> };
>
> +static bool ttt_is_call(struct arch_uprobe *auprobe)
> +{
> + return auprobe->ttt.opc1 == 0xe8;
> +}
> +
> static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> {
> - regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
> + unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
> +
> + if (ttt_is_call(auprobe)) {
> + unsigned long new_sp = regs->sp - sizeof_long();
> + if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
> + return false;
> + regs->sp = new_sp;
> + }
> +
> + regs->ip = new_ip + auprobe->ttt.disp;
> return true;
> }
>
> +static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + BUG_ON(!ttt_is_call(auprobe));
> + /*
> + * We can only get here if ttt_emulate_op() failed to push the return
> + * address _and_ another thread expanded our stack before the (mangled)
> + * "call" insn was executed out-of-line. Just restore ->sp and restart.
> + * We could also restore ->ip and try to call ttt_emulate_op() again.
> + */

As I understand it, this comment assumes that the single-stepped call
instruction can't grow the stack on its own, and will instead die with a
SEGV or some such. As noted above, I don't know if that's correct. (If
the single-stepped call doesn't die and doesn't grow the stack -- I'm
not sure how that would happen -- then it seems like we have an infinite
loop.)

> + regs->sp += sizeof_long();
> + return -ERESTART;
> +}
> +
> +static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)
> +{
> + /*
> + * Turn this insn into "call 1f; 1:", this is what we will execute
> + * out-of-line if ->emulate() fails.

Add comment: In the unlikely event that this mangled instruction is
successfully single-stepped, we'll reset regs->ip to the beginning of
the instruction so the probepoint is hit again and the original
instruction is emulated (successfully this time, we assume).

> + *
> + * In the likely case this will lead to arch_uprobe_abort_xol(), but
> + * see the comment in ->emulate(). So we need to ensure that the new
> + * ->ip can't fall into non-canonical area and trigger #GP.
> + *
> + * We could turn it into (say) "pushf", but then we would need to
> + * divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
> + * of ->insn[] for set_orig_insn().
> + */
> + memset(auprobe->insn + insn_offset_displacement(insn),
> + 0, insn->moffset1.nbytes);

s/displacement/immediate/
s/moffset1/immediate/
Both already fixed in today's patch.

> +}
> +
> static struct uprobe_xol_ops ttt_xol_ops = {
> .emulate = ttt_emulate_op,
> + .post_xol = ttt_post_xol_op,
> };
>
> static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> {
> + u8 opc1 = OPCODE1(insn);
> +
> + /* has the side-effect of processing the entire instruction */
> + insn_get_length(insn);
> + if (WARN_ON_ONCE(!insn_complete(insn)))
> + return -ENOEXEC;
>
> - switch (OPCODE1(insn)) {
> + switch (opc1) {
> case 0xeb: /* jmp 8 */
> case 0xe9: /* jmp 32 */
> case 0x90: /* prefix* + nop; same as jmp with .disp = 0 */
> break;
> +
> + case 0xe8: /* call relative */
> + ttt_clear_displacement(auprobe, insn);
> + auprobe->ttt.opc1 = opc1;

You set ttt.opc1 for call, and later for conditional jumps, but not for
nop and unconditional jumps. Might confuse a maintainer down the road?

> + break;
> default:
> return -ENOSYS;
> }
>
> - /* has the side-effect of processing the entire instruction */
> - insn_get_length(insn);
> - if (WARN_ON_ONCE(!insn_complete(insn)))
> - return -ENOEXEC;
> -
> auprobe->ttt.ilen = insn->length;
> auprobe->ttt.disp = insn->moffset1.value;
> /* so far we assume that it fits into ->moffset1 */
> @@ -534,9 +586,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> case 0xca:
> fix_ip = false;
> break;
> - case 0xe8: /* call relative - Fix return addr */
> - fix_call = true;
> - break;
> case 0x9a: /* call absolute - Fix return addr, not ip */
> fix_call = true;
> fix_ip = false;

2014-04-08 22:53:15

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

On Mon, 2014-04-07 at 16:27 +0200, Oleg Nesterov wrote:
> On 04/06, Oleg Nesterov wrote:
> >
> > Incomplete, lacks "jcxz". Simple to fix. Anything else?
>
> Please see v2 below. Simplify the preprocessor hacks.
>
> -------------------------------------------------------------------------------
> Subject: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's
>
> Incomplete, lacks "jcxz". Simple to fix. Anything else?
>
> Reported-by: Jonathan Lebon <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> arch/x86/kernel/uprobes.c | 61 +++++++++++++++++++++++++++++++++++++++++++-
> 1 files changed, 59 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 9283024..3865d8b 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -466,18 +466,72 @@ static bool ttt_is_call(struct arch_uprobe *auprobe)
> return auprobe->ttt.opc1 == 0xe8;
> }
>
> +#define CASE_COND \
> + COND(70, 71, XF(OF)) \
> + COND(72, 73, XF(CF)) \
> + COND(74, 75, XF(ZF)) \
> + COND(78, 79, XF(SF)) \
> + COND(7a, 7b, XF(PF)) \
> + COND(76, 77, XF(CF) || XF(ZF)) \
> + COND(7c, 7d, XF(SF) != XF(OF)) \
> + COND(7e, 7f, XF(ZF) || XF(SF) != XF(OF))
> +
> +#define COND(op_y, op_n, expr) \
> + case 0x ## op_y: DO((expr) != 0) \
> + case 0x ## op_n: DO((expr) == 0)
> +
> +#define XF(xf) (!!(flags & X86_EFLAGS_ ## xf))

All this macro magic seems way more clever than it is legible.

Given that you're mapping 0f 8x to 7x (patch #6), is_cond_jmp_opcode()
could just be
return (0x70 <= opcode && opcode <= 0x7f);

I would keep the XF macro (although the !! operation to convert non-zero
to 1 isn't strictly needed) and just do an explicit 16-case switch for
check_jmp_cond().
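
For illustration only (a sketch, not tested, and it assumes the XF() macro
stays defined rather than being #undef'ed right away), the explicit switch
could look like:

static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
	unsigned long flags = regs->flags;

	switch (auprobe->ttt.opc1) {
	case 0x00: return true;					/* not a conditional jmp */
	case 0x70: return XF(OF);				/* jo  */
	case 0x71: return !XF(OF);				/* jno */
	case 0x72: return XF(CF);				/* jb  */
	case 0x73: return !XF(CF);				/* jae */
	case 0x74: return XF(ZF);				/* je  */
	case 0x75: return !XF(ZF);				/* jne */
	case 0x76: return XF(CF) || XF(ZF);			/* jbe */
	case 0x77: return !(XF(CF) || XF(ZF));			/* ja  */
	case 0x78: return XF(SF);				/* js  */
	case 0x79: return !XF(SF);				/* jns */
	case 0x7a: return XF(PF);				/* jp  */
	case 0x7b: return !XF(PF);				/* jnp */
	case 0x7c: return XF(SF) != XF(OF);			/* jl  */
	case 0x7d: return XF(SF) == XF(OF);			/* jge */
	case 0x7e: return XF(ZF) || XF(SF) != XF(OF);		/* jle */
	case 0x7f: return !(XF(ZF) || XF(SF) != XF(OF));	/* jg  */
	default:   BUG();
	}
}

Less clever and more lines, but every case reads directly off the opcode table.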

> +
> +static bool is_cond_jmp_opcode(u8 opcode)
> +{
> + switch (opcode) {
> + #define DO(expr) \
> + return true;
> + CASE_COND
> + #undef DO
> +
> + default:
> + return false;
> + }
> +}
> +
> +static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs)
> +{
> + unsigned long flags = regs->flags;
> +
> + switch (auprobe->ttt.opc1) {
> + case 0x00: /* not a conditional jmp */
> + return true;
> +
> + #define DO(expr) \
> + return expr;
> + CASE_COND
> + #undef DO
> +
> + default:
> + BUG();
> + }
> +}
> +
> +#undef XF
> +#undef COND
> +#undef CASE_COND
> +
> static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> {
> unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
> + unsigned long disp = auprobe->ttt.disp;

Looks like a negative ttt.disp will get sign-extended like you want, but
still, making disp unsigned here doesn't seem quite right.
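
A standalone illustration (not from the patch, assuming the kernel's s32 type)
of why the unsigned disp still yields the right target on x86-64:

	s32 disp = -5;
	unsigned long d = disp;	/* sign-extended, then converted: 0xfffffffffffffffb */
	/* new_ip + d wraps modulo 2^64, so it equals new_ip - 5 */

A signed long would express the same thing more directly.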

>
> if (ttt_is_call(auprobe)) {
> unsigned long new_sp = regs->sp - sizeof_long();
> if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
> return false;
> regs->sp = new_sp;
> + } else if (!check_jmp_cond(auprobe, regs)) {
> + disp = 0;
> }
>
> - regs->ip = new_ip + auprobe->ttt.disp;
> + regs->ip = new_ip + disp;
> return true;
> }
>
> @@ -536,8 +590,11 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> ttt_clear_displacement(auprobe, insn);
> auprobe->ttt.opc1 = opc1;
> break;
> +
> default:
> - return -ENOSYS;
> + if (!is_cond_jmp_opcode(opc1))
> + return -ENOSYS;
> + auprobe->ttt.opc1 = opc1;
> }
>
> auprobe->ttt.ilen = insn->length;

2014-04-08 23:07:24

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

On Mon, 2014-04-07 at 16:28 +0200, Oleg Nesterov wrote:
> On 04/06, Oleg Nesterov wrote:
> >
> > But I'll try to cleanup this patch...
>
> See v2 below.
>
> -------------------------------------------------------------------------------
> Subject: [RFC PATCH v2 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's
>
> It seems that 16bit conditional jmp is just 0x0f + short_jump_opc_plus_0x10.

Yes, but the code could use a comment to that effect. See below.

Searching for "jump" in the AMD manual, I see that there are 3 other
instructions that are essentially conditional branches: loop,
loope/loopz, and loopne/loopnz. They decrement ecx/rcx and then
conditionally branch. The offset is always 8 bits.
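
For reference, the opcode bytes (per the manual):

	e2 cb	loop rel8
	e1 cb	loope/loopz rel8
	e0 cb	loopne/loopnz rel8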

BTW, patches 2 and 3 look fine to me.

>
> Reported-by: Jonathan Lebon <[email protected]>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> arch/x86/kernel/uprobes.c | 4 ++++
> 1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
> index 3865d8b..dae02f9 100644
> --- a/arch/x86/kernel/uprobes.c
> +++ b/arch/x86/kernel/uprobes.c
> @@ -591,6 +591,10 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> auprobe->ttt.opc1 = opc1;
> break;
>
> + case 0x0f:
> + if (insn->opcode.nbytes != 2)
> + return -ENOSYS;
/*
* Map 0f 8x (Jcc with 32-bit displacement) to 7x
* (Jcc with 8-bit displacement). insn lib maps both
* to 32 bits.
*/
> + opc1 = OPCODE2(insn) - 0x10;
> default:
> if (!is_cond_jmp_opcode(opc1))
> return -ENOSYS;

Jim

Subject: Re: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

(2014/04/09 1:10), Oleg Nesterov wrote:
> On 04/08, Masami Hiramatsu wrote:
>>
>> (2014/04/05 3:51), Oleg Nesterov wrote:
>>>
>>> TODO: An error from adjust_ret_addr() shouldn't be silently ignored,
>>> we should teach arch_uprobe_post_xol() or handle_singlestep() paths
>>> to restart the probed insn in this case. And probably "adjust" can
>>> be simplified and turned into set_ret_addr(). It seems that we do
>>> not really need copy_from_user(), we can always calculate the value
>>> we need to write into *regs->sp.
>>
>> It seems that you fixed this in 8/9, we don't need the TODO list in
>> the description.
>
> Well, OK, I'll update the changelog and remove the "error ... ignored"
> part. Although to be honest, I do not understand why you think it
> is bad to document the other problems you found while you were writing
> the patch.

Because you know how to fix that and you can just do that in the following
patches :). In that case, you don't need to state it here.


>>> + if (auprobe->ops->emulate)
>>> + return auprobe->ops->emulate(auprobe, regs);
>>> +
>>> + /* TODO: move this code into ->emulate() hook */
>>
>> If you think this can move into the emulate(),
>
> Yes sure,
>
>> you should do it in this
>> patch.
>
> No, sorry, I strongly disagree, this should come as a separate change,
> and only after "Emulate jmp's".

Ah, I see, with your RFC series. :)

>> Since the following code runs by default, there should be
>> no problem to do that.
>
> Hmm. Not sure I understand "by default".

I meant that the auprobe->ops->emulate() is always skipped and the
old code is always run, since the default_emulate_op() is NULL at
this point.

> If you meant that this should
> go into default_emulate_op() (which we do not have), then I strongly
> disagree again.
>
> It should not, if nothing else we need to record insn->length somewhere,
> this should go into ttt_emulate_op() we add later. And it simply looks
> much more natural to handle jmp's and nop's together.

I see, OK with ttt_emulate_op() series.

Reviewed-by: Masami Hiramatsu <[email protected]>

Thank you :)

>
> Masami, this time I simply can't understand your objections, please
> clarify.
>
> Thanks,
>
> Oleg.
>
>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-09 14:47:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 1/6] uprobes/x86: Emulate unconditional rip-relative jmp's

On 04/08, Jim Keniston wrote:
>
> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> > 0xeb and 0xe9. Anything else?
>
> For unconditional rip-relative jumps, yes, I'm pretty sure that's it.

Great, thanks.

> > --- a/arch/x86/include/asm/uprobes.h
> > +++ b/arch/x86/include/asm/uprobes.h
> > @@ -44,9 +44,15 @@ struct arch_uprobe {
> > u16 fixups;
> > const struct uprobe_xol_ops *ops;
> >
> > + union {
> > #ifdef CONFIG_X86_64
> > - unsigned long rip_rela_target_address;
> > + unsigned long rip_rela_target_address;
> > #endif
> > + struct {
> > + s32 disp;
> > + u8 ilen;
> > + } ttt;
>
> Are you planning to stick with ttt as the name/prefix for all this
> jump-emulation code? Seems like you could come up with something more
> descriptive without adding a lot of characters.

Yes sure. How about s/ttt/branch/ ? I agree with any naming. I used
"ttt" because it allows me to change the naming in one step.

> > +static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> > +{
> > +
> > + switch (OPCODE1(insn)) {
> > + case 0xeb: /* jmp 8 */
> > + case 0xe9: /* jmp 32 */
> > + break;
> > + default:
> > + return -ENOSYS;
>
> -ENOSYS looks like an error return, as opposed to what it is, the normal
> return when the probed instruction is something other than a jump. This
> gets more bewildering as you add patches and this switch grows and gets
> more complicated. Add a comment here?

OK, I added a short comment above this function,

/* Returns -ENOSYS if ttt_xol_ops doesn't handle this insn */
static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
...

> > + /* has the side-effect of processing the entire instruction */
> > + insn_get_length(insn);
> > + if (WARN_ON_ONCE(!insn_complete(insn)))
> > + return -ENOEXEC;
> > +
> > + auprobe->ttt.ilen = insn->length;
> > + auprobe->ttt.disp = insn->moffset1.value;
> > + /* so far we assume that it fits into ->moffset1 */
> > + if (WARN_ON_ONCE(insn->moffset2.nbytes))
> > + return -ENOEXEC;
>
> s/moffset1/immediate/ -- which you've already addressed.

Yes, done, and I removed the ->moffset2 check.

> > @@ -483,6 +519,10 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > if (ret)
> > return ret;
> >
> > + ret = ttt_setup_xol_ops(auprobe, &insn);
> > + if (ret == 0 || ret != ENOSYS)
>
> This looks wrong in a couple of ways:
> a. I think you intend to compare against -ENOSYS, not ENOSYS.

OOPS! fixed.

> b. Given the (ret != [-]ENOSYS) test, the (ret == 0) test is
> superfluous.

I thought that the additional "ret == 0" (removed by gcc anyway) could
help the code reader... But yes, you are right, probably it only adds
more confusion.

- if (ret == 0 || ret != ENOSYS)
+ if (ret != -ENOSYS)

Thanks!

Oleg.

2014-04-09 15:43:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/08, Jim Keniston wrote:
>
> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> > 0xe8. Anything else?
>
> No, I think e8 is the only call instruction uprobes will see.

Good.

> The following couple of paragraphs should be included in the code,
> perhaps merged with some of the related comments already there. Without
> it, the code is pretty hard to follow.

OK, I'll try to improve the comments somehow...

> > Emulating of rip-relative call is trivial, we only need to additionally
> > push the ret-address. If this fails, we execute this instruction out of
> > line and this should trigger the trap, the probed application should die
> > or the same insn will be restarted if a signal handler expands the stack.
> > We do not even need ->post_xol() for this case.
> >
> > But there is a corner (and almost theoretical) case: another thread can
> > expand the stack right before we execute this insn out of line. In this
> > case it hit the same problem we are trying to solve. So we simply turn
> > the probed insn into "call 1f; 1:" and add ->post_xol() which restores
> > ->sp and restarts.
>
> Can you explain under what circumstances emulation of the call
> instruction would fail? Will the copy_to_user() call that writes the
> return address to the stack grow the stack if necessary?

If necessary and if possible.

> Will
> single-stepping the mangled call instruction expand the stack?

In the likely case it won't. If copy_to_user() fails, then "call" should
fail too (again, ignoring the potential race with another thread which can
populate the memory ->sp - 8 points to).

> > +static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > +{
> > + BUG_ON(!ttt_is_call(auprobe));
> > + /*
> > + * We can only get here if ttt_emulate_op() failed to push the return
> > + * address _and_ another thread expanded our stack before the (mangled)
> > + * "call" insn was executed out-of-line. Just restore ->sp and restart.
> > + * We could also restore ->ip and try to call ttt_emulate_op() again.
> > + */
>
> As I understand it, this comment assumes that the single-stepped call
> instruction can't grow the stack on its own, and will instead die with a
> SEGV or some such.

Or the signal handler expands the stack.

> As noted above, I don't know if that's correct. (If
> the single-stepped call doesn't die and doesn't grow the stack -- I'm
> not sure how that would happen -- then it seems like we have an infinite
> loop.)

This is fine. And this doesn't differ from the current situation when we
do not try to emulate "call" but always execute it out-of-line. And this
is not even specific to "call" insn.

Please forget about uprobes for the moment. Consider this application:

#include <signal.h>
#include <stddef.h>

void sigh(int sig)
{
}

int main(void)
{
signal(SIGSEGV, sigh);
*(int *)NULL = 0;
return 0;
}

it will run the endless "page_fault -> sighandler -> restart" loop. Nothing
bad, it can be killed, the application is wrong.

But note that sigh() can actually do, say, mmap(NULL, MAP_FIXED); in this
case the restarted insn will write to the memory and this app will exit.

Now suppose that this "*(int *)NULL = 0" insn is probed. Everything is fine
again. The kernel executes this "mov" insn out of line, this triggers the trap,
arch_uprobe_abort_xol() restores ->ip, the process returns to user-mode with
the pending SIGSEGV. If it does not have a handler for SIGSEGV - it dies.
If it does, it restarts the same insn after return from the signal handler.
Will the restarted insn succeed or will we have an infinite loop? This, again,
depends on what the signal handler does, but an infinite loop is fine.

And this is how the "call" emulation should work. Suppose that ->emulate()
fails because we can't push the return address. It can run out of memory,
->sp can be corrupted, or RLIMIT_STACK doesn't allow the stack to grow, or
this stack is not MAP_GROWSDOWN and ->sp - 8 misses the mmapped area, or
whatever else. We simply do not care.

We execute this insn out-of-line just to trigger the trap, so that the probed
application receives the correct signal with the properly filled siginfo.
After that it can restart the same insn if it has a handler, or die. It can
run an endless loop if the signal handler does nothing useful, or the next time
->emulate() will succeed because the stack was expanded by the signal handler.
Once again, this doesn't really differ from the example above (well, ignoring
the fact that in this case the probed app obviously needs sigaltstack).

The only difference is the corner case, the race with another thread which
can populate ->sp - 8 right before we execute the failed insn out-of-line.
So we turn it into "call 1f; 1:" to ensure it can't hit the same bug out-
of-line, and restart it.

See?
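
To make the "handler fixes the memory, the faulted insn is restarted and
succeeds" pattern concrete, here is a standalone user-space sketch (not
uprobes code; it uses mprotect() instead of stack growth, purely for
illustration, and the names sigh/page are made up):

    /* illustration only: the store faults once, the handler makes the
     * page writable, the kernel restarts the insn and it succeeds */
    #include <signal.h>
    #include <sys/mman.h>
    #include <stdio.h>

    static char *page;

    static void sigh(int sig)
    {
        /* "fix" the memory the faulted insn targets */
        mprotect(page, 4096, PROT_READ | PROT_WRITE);
    }

    int main(void)
    {
        page = mmap(NULL, 4096, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        signal(SIGSEGV, sigh);
        *page = 1;              /* SIGSEGV, sigh(), restart, success */
        printf("done\n");
        return 0;
    }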

> > +static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)
> > +{
> > + /*
> > + * Turn this insn into "call 1f; 1:", this is what we will execute
> > + * out-of-line if ->emulate() fails.
>
> Add comment: In the unlikely event that this mangled instruction is
> successfully single-stepped, we'll reset regs->ip to the beginning of
> the instruction so the probepoint is hit again and the original
> instruction is emulated (successfully this time, we assume).

OK, will try...

> > static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
> > {
> > + u8 opc1 = OPCODE1(insn);
> > +
> > + /* has the side-effect of processing the entire instruction */
> > + insn_get_length(insn);
> > + if (WARN_ON_ONCE(!insn_complete(insn)))
> > + return -ENOEXEC;
> >
> > - switch (OPCODE1(insn)) {
> > + switch (opc1) {
> > case 0xeb: /* jmp 8 */
> > case 0xe9: /* jmp 32 */
> > case 0x90: /* prefix* + nop; same as jmp with .disp = 0 */
> > break;
> > +
> > + case 0xe8: /* call relative */
> > + ttt_clear_displacement(auprobe, insn);
> > + auprobe->ttt.opc1 = opc1;
>
> You set ttt.opc1 for call, and later for conditional jumps, but not for
> nop and unconditional jumps. Might confuse a maintainer down the road?

Yes. This lets us have the fast-path "opc1 == 0" check in check_jmp_cond().
And it lets us add "BUG()" into check_jmp_cond() ;)

OK, if you think this is confusing, I'll set ttt.opc1 unconditionally
and change check_jmp_cond() to simply return "true" in the "default"
case.

Oleg.

2014-04-09 16:42:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

On 04/08, Jim Keniston wrote:
>
> On Mon, 2014-04-07 at 16:27 +0200, Oleg Nesterov wrote:
> >
> > +#define CASE_COND \
> > + COND(70, 71, XF(OF)) \
> > + COND(72, 73, XF(CF)) \
> > + COND(74, 75, XF(ZF)) \
> > + COND(78, 79, XF(SF)) \
> > + COND(7a, 7b, XF(PF)) \
> > + COND(76, 77, XF(CF) || XF(ZF)) \
> > + COND(7c, 7d, XF(SF) != XF(OF)) \
> > + COND(7e, 7f, XF(ZF) || XF(SF) != XF(OF))
> > +
> > +#define COND(op_y, op_n, expr) \
> > + case 0x ## op_y: DO((expr) != 0) \
> > + case 0x ## op_n: DO((expr) == 0)
> > +
> > +#define XF(xf) (!!(flags & X86_EFLAGS_ ## xf))
>
> All this macro magic seems way more clever than it is legible.

No-no-no, please do not ask me to remove it ;) I understand that this
is subjective, but to me it helps. Please see below.

> Given that you're mapping 0f 8x to 7x (patch #6), is_cond_jmp_opcode()
> could just be
> return (0x70 <= opcode && opcode <= 0x7f);

Heh. I blindly copied the opcodes from the manual, I didn't even realize
that the range is contiguous.

But gcc is more clever than me, I removed "static" and

is_cond_jmp_opcode:
pushq %rbp #
movq %rsp, %rbp #,
call mcount
leave
subl $112, %edi #, tmp62
cmpb $15, %dil #, tmp62
setbe %al #, tmp64
ret

shows that at least this doesn't make the code worse.


> I would keep the XF macro (although the !! operation to convert non-zero
> to 1 isn't strictly needed)

Well, "!!" is commonly used to make clear the fact we need a boolean,
even if this is not strictly needed.

Besides, in this case it is needed for "!=" above, or we would need to do

- XF(SF) != XF(OF)
+ !!XF(SF) != !!XF(OF)
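
For illustration, a minimal standalone sketch (user-space C, not kernel
code; the #defines are just the standard EFLAGS bit values) of why the
boolean conversion matters when both flags are set:

    #include <stdio.h>

    #define X86_EFLAGS_SF 0x0080UL
    #define X86_EFLAGS_OF 0x0800UL

    int main(void)
    {
        unsigned long flags = X86_EFLAGS_SF | X86_EFLAGS_OF;  /* both set */

        /* raw masked bits: 0x80 != 0x800 -> 1, i.e. "jl taken" (wrong) */
        printf("%d\n", (flags & X86_EFLAGS_SF) != (flags & X86_EFLAGS_OF));
        /* booleans: 1 != 1 -> 0, correct, SF == OF means jl is not taken */
        printf("%d\n", !!(flags & X86_EFLAGS_SF) != !!(flags & X86_EFLAGS_OF));
        return 0;
    }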

> and just do an explicit 16-case switch for
> check_jmp_cond().

I'd prefer to keep these macros. They are only used by is_cond_jmp_opcode()
and check_jmp_cond(), and I really think they make the code more readable.

And more editable. As a reminder, I am going to add support for j*cxz/loop
later, just for completeness; we do not need this to fix the bug. In that
case I will simply add

case 0xe3: DO(check_rcx(auprobe, regs))

at the end of CASE_COND and that is all.
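
Just to illustrate the shape, a hypothetical sketch of that helper
(check_rcx() does not exist in the posted series, and this ignores the
0x67 address-size override which selects cx/ecx instead of rcx):

    static bool check_rcx(struct arch_uprobe *auprobe, struct pt_regs *regs)
    {
        /* 0xe3 is jcxz/jecxz/jrcxz: the branch is taken iff the count register is 0 */
        return regs->cx == 0;
    }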

> > static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
> > {
> > unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
> > + unsigned long disp = auprobe->ttt.disp;
>
> Looks like a negative ttt.disp will get sign-extended like you want, but
> still, making disp unsigned here doesn't seem quite right.

OK. I changed this line

unsigned long disp = (long)auprobe->ttt.disp;

to make it clear that yes, disp can be negative. Or we could simply use s32,
but then

regs->ip = new_ip + disp;

could look equally confusing.
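
For what it's worth, a tiny standalone sketch (user-space C with made-up
values, not part of the patch) showing that both forms sign-extend to the
same target, so the explicit cast is purely documentation:

    #include <stdio.h>

    int main(void)
    {
        int s32_disp = -16;                     /* e.g. a short backward jmp */
        unsigned long ip = 0x400500;

        unsigned long disp1 = s32_disp;         /* implicit conversion */
        unsigned long disp2 = (long)s32_disp;   /* explicit cast */

        printf("%#lx %#lx\n", ip + disp1, ip + disp2);  /* 0x4004f0 twice */
        return 0;
    }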

Thanks,

Oleg.

2014-04-09 16:50:55

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

On 04/08, Jim Keniston wrote:
>
> On Mon, 2014-04-07 at 16:28 +0200, Oleg Nesterov wrote:
> >
> > It seems that 16bit condi jmp is just 0x0f + short_jump_opc_plus_0x10.
>
> Yes, but the code could use a comment to that effect. See below.

OK, will do.

> Searching for "jump" in the AMD manual, I see that there are 3 other
> instructions that are essentially conditional branches: loop,
> loope/loopz, and loopne/loopnz.

Yes, I know, Denys already informed me privately ;)

> The offset is always 8 bits.

And thus this series can ignore them. But as I already said, I'll try
to support them later just for completeness.

> BTW, patches 2 and 3 look fine to me.

Great!

Thanks again Jim for your helpful review.

Oleg.

2014-04-09 16:55:32

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

On 04/09, Masami Hiramatsu wrote:
>
> Reviewed-by: Masami Hiramatsu <[email protected]>

Thanks!

Does this mean that I can add the same into 6-8 you didn't ack
explicitly?

Oleg.

2014-04-09 19:44:41

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 4/6] uprobes/x86: Emulate rip-relative call's

0xe8. Anything else?

Emulating a rip-relative call is trivial, we only need to additionally
push the ret-address. If this fails, we execute this instruction out of
line and this should trigger the trap; the probed application should die
or the same insn will be restarted if a signal handler expands the stack.
We do not even need ->post_xol() for this case.

But there is a corner (and almost theoretical) case: another thread can
expand the stack right before we execute this insn out of line. In this
case it hits the same problem we are trying to solve. So we simply turn
the probed insn into "call 1f; 1:" and add ->post_xol() which restores
->sp and restarts.

Many thanks to Jonathan who finally found the standalone reproducer,
otherwise I would never have resolved the "random SIGSEGV's under systemtap"
bug-report. Now that the problem is clear we can write the simplified
test-case:

void probe_func(void), callee(void);

int failed = 1;

asm (
".text\n"
".align 4096\n"
".globl probe_func\n"
"probe_func:\n"
"call callee\n"
"ret"
);

/*
* This assumes that:
*
* - &probe_func = 0x401000 + a_bit, aligned = 0x402000
*
* - xol_vma->vm_start = TASK_SIZE_MAX - PAGE_SIZE = 0x7fffffffe000
* as xol_add_vma() asks; the 1st slot = 0x7fffffffe080
*
* so we can target the non-canonical address from xol_vma using
* the simple math below, 100 * 4096 is just the random offset
*/
asm (".org . + 0x800000000000 - 0x7fffffffe080 - 5 - 1 + 100 * 4096\n");

void callee(void)
{
failed = 0;
}

int main(void)
{
probe_func();
return failed;
}

It SIGSEGV's if you probe "probe_func" (although it's not very reliable,
randomize_va_space/etc can change the placement of the xol area).

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/uprobes.h | 1 +
arch/x86/kernel/uprobes.c | 69 ++++++++++++++++++++++++++++++++++------
2 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index cca62c5..9528117 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -51,6 +51,7 @@ struct arch_uprobe {
struct {
s32 disp;
u8 ilen;
+ u8 opc1;
} ttt;
};
};
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 23a16a5..5a88022 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -461,34 +461,86 @@ static struct uprobe_xol_ops default_xol_ops = {
.post_xol = default_post_xol_op,
};

+static bool ttt_is_call(struct arch_uprobe *auprobe)
+{
+ return auprobe->ttt.opc1 == 0xe8;
+}
+
static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
- regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
+ unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
+
+ if (ttt_is_call(auprobe)) {
+ unsigned long new_sp = regs->sp - sizeof_long();
+ if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
+ return false;
+ regs->sp = new_sp;
+ }
+
+ regs->ip = new_ip + auprobe->ttt.disp;
return true;
}

+static int ttt_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ BUG_ON(!ttt_is_call(auprobe));
+ /*
+ * We can only get here if ttt_emulate_op() failed to push the return
+ * address _and_ another thread expanded our stack before the (mangled)
+ * "call" insn was executed out-of-line. Just restore ->sp and restart.
+ * We could also restore ->ip and try to call ttt_emulate_op() again.
+ */
+ regs->sp += sizeof_long();
+ return -ERESTART;
+}
+
+static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)
+{
+ /*
+ * Turn this insn into "call 1f; 1:", this is what we will execute
+ * out-of-line if ->emulate() fails.
+ *
+ * In the likely case this will lead to arch_uprobe_abort_xol(), but
+ * see the comment in ->emulate(). So we need to ensure that the new
+ * ->ip can't fall into non-canonical area and trigger #GP.
+ *
+ * We could turn it into (say) "pushf", but then we would need to
+ * divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
+ * of ->insn[] for set_orig_insn().
+ */
+ memset(auprobe->insn + insn_offset_immediate(insn),
+ 0, insn->immediate.nbytes);
+}
+
static struct uprobe_xol_ops ttt_xol_ops = {
.emulate = ttt_emulate_op,
+ .post_xol = ttt_post_xol_op,
};

/* Returns -ENOSYS if ttt_xol_ops doesn't handle this insn */
static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
{
+ u8 opc1 = OPCODE1(insn);
+
+ /* has the side-effect of processing the entire instruction */
+ insn_get_length(insn);
+ if (WARN_ON_ONCE(!insn_complete(insn)))
+ return -ENOEXEC;

- switch (OPCODE1(insn)) {
+ switch (opc1) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
case 0x90: /* prefix* + nop; same as jmp with .disp = 0 */
break;
+
+ case 0xe8: /* call relative */
+ ttt_clear_displacement(auprobe, insn);
+ break;
default:
return -ENOSYS;
}

- /* has the side-effect of processing the entire instruction */
- insn_get_length(insn);
- if (WARN_ON_ONCE(!insn_complete(insn)))
- return -ENOEXEC;
-
+ auprobe->ttt.opc1 = opc1;
auprobe->ttt.ilen = insn->length;
auprobe->ttt.disp = insn->immediate.value;

@@ -532,9 +584,6 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
case 0xca:
fix_ip = false;
break;
- case 0xe8: /* call relative - Fix return addr */
- fix_call = true;
- break;
case 0x9a: /* call absolute - Fix return addr, not ip */
fix_call = true;
fix_ip = false;
--
1.5.5.1

2014-04-09 19:44:47

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 5/6] uprobes/x86: Emulate rip-relative conditional "short" jmp's

Incomplete, lacks "jcxz". Simple to fix. Anything else?

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 5a88022..ab9342a 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -466,18 +466,69 @@ static bool ttt_is_call(struct arch_uprobe *auprobe)
return auprobe->ttt.opc1 == 0xe8;
}

+#define CASE_COND \
+ COND(70, 71, XF(OF)) \
+ COND(72, 73, XF(CF)) \
+ COND(74, 75, XF(ZF)) \
+ COND(78, 79, XF(SF)) \
+ COND(7a, 7b, XF(PF)) \
+ COND(76, 77, XF(CF) || XF(ZF)) \
+ COND(7c, 7d, XF(SF) != XF(OF)) \
+ COND(7e, 7f, XF(ZF) || XF(SF) != XF(OF))
+
+#define COND(op_y, op_n, expr) \
+ case 0x ## op_y: DO((expr) != 0) \
+ case 0x ## op_n: DO((expr) == 0)
+
+#define XF(xf) (!!(flags & X86_EFLAGS_ ## xf))
+
+static bool is_cond_jmp_opcode(u8 opcode)
+{
+ switch (opcode) {
+ #define DO(expr) \
+ return true;
+ CASE_COND
+ #undef DO
+
+ default:
+ return false;
+ }
+}
+
+static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ unsigned long flags = regs->flags;
+
+ switch (auprobe->ttt.opc1) {
+ #define DO(expr) \
+ return expr;
+ CASE_COND
+ #undef DO
+
+ default: /* not a conditional jmp */
+ return true;
+ }
+}
+
+#undef XF
+#undef COND
+#undef CASE_COND
+
static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
+ unsigned long disp = (long)auprobe->ttt.disp;

if (ttt_is_call(auprobe)) {
unsigned long new_sp = regs->sp - sizeof_long();
if (copy_to_user((void __user *)new_sp, &new_ip, sizeof_long()))
return false;
regs->sp = new_sp;
+ } else if (!check_jmp_cond(auprobe, regs)) {
+ disp = 0;
}

- regs->ip = new_ip + auprobe->ttt.disp;
+ regs->ip = new_ip + disp;
return true;
}

@@ -536,8 +587,10 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
case 0xe8: /* call relative */
ttt_clear_displacement(auprobe, insn);
break;
+
default:
- return -ENOSYS;
+ if (!is_cond_jmp_opcode(opc1))
+ return -ENOSYS;
}

auprobe->ttt.opc1 = opc1;
--
1.5.5.1

2014-04-09 19:44:51

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

It seems that 16bit condi jmp is just 0x0f + short_jump_opc_plus_0x10.

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index ab9342a..cdad38d 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -588,6 +588,14 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
ttt_clear_displacement(auprobe, insn);
break;

+ case 0x0f:
+ if (insn->opcode.nbytes != 2)
+ return -ENOSYS;
+ /*
+ * If it is a "near" conditional jmp, OPCODE2() - 0x10 matches
+ * OPCODE1() of the "short" jmp which checks the same condition.
+ */
+ opc1 = OPCODE2(insn) - 0x10;
default:
if (!is_cond_jmp_opcode(opc1))
return -ENOSYS;
--
1.5.5.1

2014-04-09 19:44:38

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 3/6] uprobes/x86: Emulate nop's using ops->emulate()

Finally we can kill the ugly (and very limited) code in __skip_sstep().
Just change ttt_setup_xol_ops() to treat "nop" as a jmp to the next insn.

Thanks to lib/insn.c, it is clever enough. OPCODE1() == 0x90 includes
"(rep;)+ nop;" at least, and (afaics) much more.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 1cdc379..23a16a5 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -478,6 +478,7 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
switch (OPCODE1(insn)) {
case 0xeb: /* jmp 8 */
case 0xe9: /* jmp 32 */
+ case 0x90: /* prefix* + nop; same as jmp with .disp = 0 */
break;
default:
return -ENOSYS;
@@ -710,29 +711,10 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
regs->flags &= ~X86_EFLAGS_TF;
}

-/*
- * Skip these instructions as per the currently known x86 ISA.
- * rep=0x66*; nop=0x90
- */
static bool __skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
- int i;
-
if (auprobe->ops->emulate)
return auprobe->ops->emulate(auprobe, regs);
-
- /* TODO: move this code into ->emulate() hook */
- for (i = 0; i < MAX_UINSN_BYTES; i++) {
- if (auprobe->insn[i] == 0x66)
- continue;
-
- if (auprobe->insn[i] == 0x90) {
- regs->ip += i + 1;
- return true;
- }
-
- break;
- }
return false;
}

--
1.5.5.1

2014-04-09 19:44:35

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 2/6] uprobes/x86: Emulate unconditional rip-relative jmp's

0xeb and 0xe9. Anything else?

TODO: move ->fixup into the union along with rip_rela_target_address.

Reported-by: Jonathan Lebon <[email protected]>
Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/include/asm/uprobes.h | 8 +++++++-
arch/x86/kernel/uprobes.c | 38 ++++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 9f8210b..cca62c5 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -44,9 +44,15 @@ struct arch_uprobe {
u16 fixups;
const struct uprobe_xol_ops *ops;

+ union {
#ifdef CONFIG_X86_64
- unsigned long rip_rela_target_address;
+ unsigned long rip_rela_target_address;
#endif
+ struct {
+ s32 disp;
+ u8 ilen;
+ } ttt;
+ };
};

struct arch_uprobe_task {
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index ef59ef9..1cdc379 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -461,6 +461,40 @@ static struct uprobe_xol_ops default_xol_ops = {
.post_xol = default_post_xol_op,
};

+static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ regs->ip += auprobe->ttt.ilen + auprobe->ttt.disp;
+ return true;
+}
+
+static struct uprobe_xol_ops ttt_xol_ops = {
+ .emulate = ttt_emulate_op,
+};
+
+/* Returns -ENOSYS if ttt_xol_ops doesn't handle this insn */
+static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
+{
+
+ switch (OPCODE1(insn)) {
+ case 0xeb: /* jmp 8 */
+ case 0xe9: /* jmp 32 */
+ break;
+ default:
+ return -ENOSYS;
+ }
+
+ /* has the side-effect of processing the entire instruction */
+ insn_get_length(insn);
+ if (WARN_ON_ONCE(!insn_complete(insn)))
+ return -ENOEXEC;
+
+ auprobe->ttt.ilen = insn->length;
+ auprobe->ttt.disp = insn->immediate.value;
+
+ auprobe->ops = &ttt_xol_ops;
+ return 0;
+}
+
/**
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
@@ -478,6 +512,10 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (ret)
return ret;

+ ret = ttt_setup_xol_ops(auprobe, &insn);
+ if (ret != -ENOSYS)
+ return ret;
+
/*
* Figure out which fixups arch_uprobe_post_xol() will need to perform,
* and annotate arch_uprobe->fixups accordingly. To start with, ->fixups
--
1.5.5.1

2014-04-09 19:44:31

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 1/6] uprobes/x86: Introduce sizeof_long(), cleanup adjust_ret_addr() and arch_uretprobe_hijack_return_addr()

1. Add the trivial sizeof_long() helper and change other callers of
is_ia32_task() to use it.

TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
not necessarily mean 32bit. Fortunately syscall-like insns can't be
probed so it actually works, but it would be better to rename and
use is_ia32_frame().

2. As Jim pointed out "ncopied" in arch_uretprobe_hijack_return_addr()
and adjust_ret_addr() should be named "nleft". And in fact only the
last copy_to_user() in arch_uretprobe_hijack_return_addr() actually
needs to inspect the non-zero error code.

TODO: adjust_ret_addr() should die. We can always calculate the value
we need to write into *regs->sp, just UPROBE_FIX_CALL should record
insn->length.

Signed-off-by: Oleg Nesterov <[email protected]>
---
arch/x86/kernel/uprobes.c | 37 +++++++++++++++----------------------
1 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 1ea7e1a..ef59ef9 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -408,6 +408,11 @@ struct uprobe_xol_ops {
int (*post_xol)(struct arch_uprobe *, struct pt_regs *);
};

+static inline int sizeof_long(void)
+{
+ return is_ia32_task() ? 4 : 8;
+}
+
static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
pre_xol_rip_insn(auprobe, regs, &current->utask->autask);
@@ -419,21 +424,14 @@ static int default_pre_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
*/
static int adjust_ret_addr(unsigned long sp, long correction)
{
- int rasize, ncopied;
- long ra = 0;
-
- if (is_ia32_task())
- rasize = 4;
- else
- rasize = 8;
+ int rasize = sizeof_long();
+ long ra;

- ncopied = copy_from_user(&ra, (void __user *)sp, rasize);
- if (unlikely(ncopied))
+ if (copy_from_user(&ra, (void __user *)sp, rasize))
return -EFAULT;

ra += correction;
- ncopied = copy_to_user((void __user *)sp, &ra, rasize);
- if (unlikely(ncopied))
+ if (copy_to_user((void __user *)sp, &ra, rasize))
return -EFAULT;

return 0;
@@ -450,10 +448,7 @@ static int default_post_xol_op(struct arch_uprobe *auprobe, struct pt_regs *regs

if (auprobe->fixups & UPROBE_FIX_CALL) {
if (adjust_ret_addr(regs->sp, correction)) {
- if (is_ia32_task())
- regs->sp += 4;
- else
- regs->sp += 8;
+ regs->sp += sizeof_long();
return -ERESTART;
}
}
@@ -716,23 +711,21 @@ if (ret) pr_crit("EMULATE: %lx -> %lx\n", ip, regs->ip);
unsigned long
arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs)
{
- int rasize, ncopied;
+ int rasize = sizeof_long(), nleft;
unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */

- rasize = is_ia32_task() ? 4 : 8;
- ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize);
- if (unlikely(ncopied))
+ if (copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize))
return -1;

/* check whether address has been already hijacked */
if (orig_ret_vaddr == trampoline_vaddr)
return orig_ret_vaddr;

- ncopied = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
- if (likely(!ncopied))
+ nleft = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
+ if (likely(!nleft))
return orig_ret_vaddr;

- if (ncopied != rasize) {
+ if (nleft != rasize) {
pr_err("uprobe: return address clobbered: pid=%d, %%sp=%#lx, "
"%%ip=%#lx\n", current->pid, regs->sp, regs->ip);

--
1.5.5.1

2014-04-09 19:44:10

by Oleg Nesterov

[permalink] [raw]
Subject: [RFC PATCH v2 0/6] uprobes/x86: fix the reprel jmp/call handling

On 04/06, Oleg Nesterov wrote:
>
> OK. Please see the RFC changes. Obviously not for inclusion yet. And
> totally untested, except I verified that the test-case from 4/6 works.

Still not really tested, but seems to work.

Please see v2. All changes are minor except the s/ENOSYS/-ENOSYS/ fix.
Please see the interdiff below. I also moved "Introduce sizeof_long() ..."
to the head of this series.

Jim, I am still thinking about how I can improve the comments in 4/6 as you
asked me, and I obviously need to write the changelogs and change the "ttt"
prefix.

Will you agree with s/ttt/branch/?

Do you think the code is fine in v2?

Oleg.


--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -500,16 +500,13 @@ static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs)
unsigned long flags = regs->flags;

switch (auprobe->ttt.opc1) {
- case 0x00: /* not a conditional jmp */
- return true;
-
#define DO(expr) \
return expr;
CASE_COND
#undef DO

- default:
- BUG();
+ default: /* not a conditional jmp */
+ return true;
}
}

@@ -520,7 +517,7 @@ static bool check_jmp_cond(struct arch_uprobe *auprobe, struct pt_regs *regs)
static bool ttt_emulate_op(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
unsigned long new_ip = regs->ip += auprobe->ttt.ilen;
- unsigned long disp = auprobe->ttt.disp;
+ unsigned long disp = (long)auprobe->ttt.disp;

if (ttt_is_call(auprobe)) {
unsigned long new_sp = regs->sp - sizeof_long();
@@ -562,8 +559,8 @@ static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *ins
* divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
* of ->insn[] for set_orig_insn().
*/
- memset(auprobe->insn + insn_offset_displacement(insn),
- 0, insn->moffset1.nbytes);
+ memset(auprobe->insn + insn_offset_immediate(insn),
+ 0, insn->immediate.nbytes);
}

static struct uprobe_xol_ops ttt_xol_ops = {
@@ -571,6 +568,7 @@ static struct uprobe_xol_ops ttt_xol_ops = {
.post_xol = ttt_post_xol_op,
};

+/* Returns -ENOSYS if ttt_xol_ops doesn't handle this insn */
static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)
{
u8 opc1 = OPCODE1(insn);
@@ -588,24 +586,24 @@ static int ttt_setup_xol_ops(struct arch_uprobe *auprobe, struct insn *insn)

case 0xe8: /* call relative */
ttt_clear_displacement(auprobe, insn);
- auprobe->ttt.opc1 = opc1;
break;

case 0x0f:
if (insn->opcode.nbytes != 2)
return -ENOSYS;
+ /*
+ * If it is a "near" conditional jmp, OPCODE2() - 0x10 matches
+ * OPCODE1() of the "short" jmp which checks the same condition.
+ */
opc1 = OPCODE2(insn) - 0x10;
default:
if (!is_cond_jmp_opcode(opc1))
return -ENOSYS;
- auprobe->ttt.opc1 = opc1;
}

+ auprobe->ttt.opc1 = opc1;
auprobe->ttt.ilen = insn->length;
- auprobe->ttt.disp = insn->moffset1.value;
- /* so far we assume that it fits into ->moffset1 */
- if (WARN_ON_ONCE(insn->moffset2.nbytes))
- return -ENOEXEC;
+ auprobe->ttt.disp = insn->immediate.value;

auprobe->ops = &ttt_xol_ops;
return 0;
@@ -629,7 +627,7 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm,
return ret;

ret = ttt_setup_xol_ops(auprobe, &insn);
- if (ret == 0 || ret != ENOSYS)
+ if (ret != -ENOSYS)
return ret;

/*

2014-04-09 21:25:42

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Wed, 2014-04-09 at 17:43 +0200, Oleg Nesterov wrote:
> On 04/08, Jim Keniston wrote:
> >
> > On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> > > 0xe8. Anything else?

Oleg, thanks for responding to (if not necessarily agreeing with) all my
suggestions. I think your code is solid, but I think we were wrong
about one thing. See below.

> >
> > No, I think e8 is the only call instruction uprobes will see.
>
> Good.
>
> > The following couple of paragraphs should be included in the code,
> > perhaps merged with some of the related comments already there. Without
> > it, the code is pretty hard to follow.
>
> OK, I'll try to improve the comments somehow...
>
> > > Emulating of rip-relative call is trivial, we only need to additionally
> > > push the ret-address. If this fails, we execute this instruction out of
> > > line and this should trigger the trap, the probed application should die
> > > or the same insn will be restarted if a signal handler expands the stack.
> > > We do not even need ->post_xol() for this case.
> > >
> > > But there is a corner (and almost theoretical) case: another thread can
> > > expand the stack right before we execute this insn out of line. In this
> > > case it hit the same problem we are trying to solve. So we simply turn
> > > the probed insn into "call 1f; 1:" and add ->post_xol() which restores
> > > ->sp and restarts.
> >
> > Can you explain under what circumstances emulation of the call
> > instruction would fail? Will the copy_to_user() call that writes the
> > return address to the stack grow the stack if necessary?
>
> If necessary and if possible.
>
> > Will
> > single-stepping the mangled call instruction expand the stack?
>
> In the likely case it won't. If copy_to_user() failes, then "call" should
> fail too (again, ignoring the potential race with another thread which can
> populate the memory ->sp - 8 points to).


I've experimented with putting a uprobe on a call instruction (in a
tight loop, with no ret) and a kprobe on expand_stack(), and my results
seem to indicate that a single-stepped call instruction will
successfully grow the stack vma whereas a call to copy_to_user() won't.

So your trick of single-stepping the mangled call after the failed
emulation, and then retrying the emulation should come into play in more
than just theoretical situations. We'd need it any time the probed call
needs to grow the stack vma.

I can send you the test kprobe/uprobe modules I used, if you like.

Bottom line: no further code changes, but this would affect your
comments.

Jim

2014-04-09 21:34:09

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] uprobes/x86: fix the reprel jmp/call handling

On Wed, 2014-04-09 at 21:44 +0200, Oleg Nesterov wrote:
> On 04/06, Oleg Nesterov wrote:
> >
> > OK. Please see the RFC changes. Obviously not for inclusion yet. And
> > totally untested, except I verified that the test-case from 4/6 works.
>
> Still not really tested, but seems to work.
>
> Please see v2. All changes are minor except the s/ENOSYS/-ENOSYS/ fix.
> Please see the intediff below. I also move "Introduce sizeof_long() ..."
> to the head of this series.
>
> Jim, I am still thinking how I can improve the commenents in 4/6 as you
> asked me, and I obviously need to write the changelogs and change "ttt"
> prefix.
>
> Will you agree with s/ttt/branch/ ?

Yes.

>
> Do you think the code is fine in v2 ?

Yes, except for certain comments wrt call emulation. See my reply about
patch #4 from a few minutes ago.

Once that's resolved, I consider all your patches so far
Reviewed-by: Jim Keniston <[email protected]>
and
Acked-by: Jim Keniston <[email protected]>

>
> Oleg.
...

Jim

2014-04-10 04:06:04

by Jim Keniston

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Wed, 2014-04-09 at 14:25 -0700, Jim Keniston wrote:
> On Wed, 2014-04-09 at 17:43 +0200, Oleg Nesterov wrote:
> > On 04/08, Jim Keniston wrote:
> > >
> > > On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> > > > 0xe8. Anything else?
>
> Oleg, thanks for responding to (if not necessarily agreeing with) all my
> suggestions. I think your code is solid, but I think we were wrong
> about one thing. See below.

Oops. Nope. You were right and I was wrong. See below.

>
> > >
> > > No, I think e8 is the only call instruction uprobes will see.
> >
> > Good.
> >
> > > The following couple of paragraphs should be included in the code,
> > > perhaps merged with some of the related comments already there. Without
> > > it, the code is pretty hard to follow.
> >
> > OK, I'll try to improve the comments somehow...
> >
> > > > Emulating of rip-relative call is trivial, we only need to additionally
> > > > push the ret-address. If this fails, we execute this instruction out of
> > > > line and this should trigger the trap, the probed application should die
> > > > or the same insn will be restarted if a signal handler expands the stack.
> > > > We do not even need ->post_xol() for this case.
> > > >
> > > > But there is a corner (and almost theoretical) case: another thread can
> > > > expand the stack right before we execute this insn out of line. In this
> > > > case it hit the same problem we are trying to solve. So we simply turn
> > > > the probed insn into "call 1f; 1:" and add ->post_xol() which restores
> > > > ->sp and restarts.
> > >
> > > Can you explain under what circumstances emulation of the call
> > > instruction would fail? Will the copy_to_user() call that writes the
> > > return address to the stack grow the stack if necessary?
> >
> > If necessary and if possible.
> >
> > > Will
> > > single-stepping the mangled call instruction expand the stack?
> >
> > In the likely case it won't. If copy_to_user() failes, then "call" should
> > fail too (again, ignoring the potential race with another thread which can
> > populate the memory ->sp - 8 points to).
>
>
> I've experimented with putting a uprobe on a call instruction (in a
> tight loop, with no ret) and a kprobe on expand_stack(), and my results
> seem to indicate that a single-stepped call instruction will
> successfully grow the stack vma where a call to copy_to_user() won't.

Nope. There was a bug in my test. copy_to_user() can indeed grow the
stack vma. So never mind.
>
> So your trick of single-stepping the mangled call after the failed
> emulation, and then retrying the emulation should come into play in more
> than just theoretical situations. We'd need it any time the probed call
> needs to grow the stack vma.
>
> I can send you the test kprobe/uprobe modules I used, if you like.
>
> Bottom line: no further code changes, but this would affect your
> comments.
>
> Jim

Jim

2014-04-10 12:28:47

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH v2 0/6] uprobes/x86: fix the reprel jmp/call handling

On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
> On 04/06, Oleg Nesterov wrote:
> uprobes/x86: fix the reprel jmp/call handling

In x86 asm-speak, relative jumps and calls are called
simply "relative" (meaning that instructions contain
an offset relative to the current instruction pointer).

I propose to use the "relative" term when you refer
to such branch instructions.
Not, say, "rip-relative" - that term is used in x86-speak
only for instruction-pointer-relative operands.

MOV 0x123456(%RIP), %RCX

*This* is a rip-relative offset.

2014-04-10 12:38:21

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/6] uprobes/x86: Emulate unconditional rip-relative jmp's

On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
> uprobes/x86: Emulate unconditional rip-relative jmp's

I propose "Emulate unconditional relative jmp's"


> + union {
> #ifdef CONFIG_X86_64
> - unsigned long rip_rela_target_address;
> + unsigned long rip_rela_target_address;
> #endif
> + struct {
> + s32 disp;

In x86-speak, jumps and calls have "offsets", not "displacements"

> + u8 ilen;
> + } ttt;
> + };
> };

2014-04-10 12:49:50

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH v2 6/6] uprobes/x86: Emulate rip-relative conditional "near" jmp's

On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
> + case 0x0f:
> + if (insn->opcode.nbytes != 2)
> + return -ENOSYS;
> + /*
> + * If it is a "near" conditional jmp, OPCODE2() - 0x10 matches
> + * OPCODE1() of the "short" jmp which checks the same condition.
> + */

I propose:

/*
* Near conditional jump opcodes: 0f 80..8f
* Short conditional jump opcodes: 70..7f
* Convert OPCODE2() to corresponding short jump opcode:
*/

> + opc1 = OPCODE2(insn) - 0x10;
> default:

2014-04-10 12:54:00

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/6] uprobes/x86: Emulate rip-relative call's

On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
> +static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)

A branch instruction's offset isn't called a "displacement"
on x86.

How about ttt_clear_branch_offset?

> +{
> + /*
> + * Turn this insn into "call 1f; 1:", this is what we will execute
> + * out-of-line if ->emulate() fails.
> + *
> + * In the likely case this will lead to arch_uprobe_abort_xol(), but
> + * see the comment in ->emulate(). So we need to ensure that the new
> + * ->ip can't fall into non-canonical area and trigger #GP.
> + *
> + * We could turn it into (say) "pushf", but then we would need to
> + * divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
> + * of ->insn[] for set_orig_insn().
> + */
> + memset(auprobe->insn + insn_offset_immediate(insn),
> + 0, insn->immediate.nbytes);
> +}

Subject: Re: [RFC PATCH v2 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/10 21:53), Denys Vlasenko wrote:
> On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
>> +static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)
>
> Branch instruction's offset isn't called "displacement"
> on x86.
>
> How about ttt_clear_branch_offset?

I like his idea. "displacement" on x86 is so confusing, especially
when using the x86 insn decoder.

Thank you,

>
>> +{
>> + /*
>> + * Turn this insn into "call 1f; 1:", this is what we will execute
>> + * out-of-line if ->emulate() fails.
>> + *
>> + * In the likely case this will lead to arch_uprobe_abort_xol(), but
>> + * see the comment in ->emulate(). So we need to ensure that the new
>> + * ->ip can't fall into non-canonical area and trigger #GP.
>> + *
>> + * We could turn it into (say) "pushf", but then we would need to
>> + * divorce ->insn[] and ->ixol[]. We need to preserve the 1st byte
>> + * of ->insn[] for set_orig_insn().
>> + */
>> + memset(auprobe->insn + insn_offset_immediate(insn),
>> + 0, insn->immediate.nbytes);
>> +}
>
>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-10 13:41:52

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 4/6] uprobes/x86: Emulate rip-relative call's

On 04/10, Masami Hiramatsu wrote:
>
> (2014/04/10 21:53), Denys Vlasenko wrote:
> > On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
> >> +static void ttt_clear_displacement(struct arch_uprobe *auprobe, struct insn *insn)
> >
> > Branch instruction's offset isn't called "displacement"
> > on x86.
> >
> > How about ttt_clear_branch_offset?
>
> I like his idea.

Me too. Thanks Denys!

Except ttt_clear_branch_offset() won't look nice after s/ttt/branch/ ;)
I'll rename it to branch_clear_offset().

> "displacement" on x86 is so confused especially
> with using x86 insn decoder.

Yes. The naming just mirrors my initial misunderstanding of lib/insn.c.

Thanks,

Oleg.

2014-04-10 13:42:06

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/09/2014 05:43 PM, Oleg Nesterov wrote:
> On 04/08, Jim Keniston wrote:
>>
>> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
>>> 0xe8. Anything else?
>>
>> No, I think e8 is the only call instruction uprobes will see.
>
> Good.

There is this monstrosity, "16-bit override for branches" in 64-mode:

66 e8 nn nn callw <offset16>

Nobody sane uses it because it truncates instruction pointer.

Or rather, *I think* it should truncate it (i.e. zero-extend to full width),
but conceivably some CPUs can be buggy wrt that:
they can decide to modify only lower 16 bits of IP,
or even they can decide to use signed <offset16> but apply it
to full-width RIP.

AMD manuals are not clear on what exactly should happen.

I am sure no one sane uses this form of branch instructions
in 32-bit and 64-bit code.

I don't think we should be trying to support it "correctly"
(we can just let the program crash with SIGILL or something),
we only need to make sure we don't overlook its existence
and thus are not tricked into touching or modifying unrelated data.


Imagine that 66 e8 nn nn bytes are exactly at the end of
a page, and we wrongly assume that offset is 32-bit, not 16-bit.

2014-04-10 13:47:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH v2 2/6] uprobes/x86: Emulate unconditional rip-relative jmp's

On 04/10, Denys Vlasenko wrote:
>
> On 04/09/2014 09:44 PM, Oleg Nesterov wrote:
> > uprobes/x86: Emulate unconditional rip-relative jmp's
>
> I propose "Emulate unconditional relative jmp's"

Thanks, will do. And I also agree with your reply to 0/6, will update
the code/subjects.

>
> > + union {
> > #ifdef CONFIG_X86_64
> > - unsigned long rip_rela_target_address;
> > + unsigned long rip_rela_target_address;
> > #endif
> > + struct {
> > + s32 disp;
>
> In x86-speak, jumps and calls have "offsets", not "displacements"

OK. s/disp/offs/.

Oleg.

Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/10 22:41), Denys Vlasenko wrote:
> On 04/09/2014 05:43 PM, Oleg Nesterov wrote:
>> On 04/08, Jim Keniston wrote:
>>>
>>> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
>>>> 0xe8. Anything else?
>>>
>>> No, I think e8 is the only call instruction uprobes will see.
>>
>> Good.
>
> There is this monstrosity, "16-bit override for branches" in 64-mode:
>
> 66 e8 nn nn callw <offset16>
>
> Nobody sane uses it because it truncates instruction pointer.

No problem, insn.c can handle that too. :)

Thank you,

>
> Or rather, *I think* it should truncate it (i.e. zero-extend to full width),
> but conceivably some CPUs can be buggy wrt that:
> they can decide to modify only lower 16 bits of IP,
> or even they can decided to use signed <offset16> but apply it
> to full-width RIP.
>
> AMD manuals are not clear on what exactly should happen.
>
> I am sure no one sane uses this form of branch instructions
> in 32-bit and 64-bit code.
>
> I don't think we should be trying to support it "correctly"
> (we can just let program crash with SIGILL or something),
> we only need to make sure we don't overlook its existence
> and thus are not tricked into touching or modifying unrelated data.
>
>
> Imagine that 66 e8 nn nn bytes are exactly at the end of
> a page, and we wrongly assume that offset is 32-bit, not 16-bit.
>
>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [PATCH v2 6/9] uprobes/x86: Introduce uprobe_xol_ops and arch_uprobe->ops

(2014/04/10 1:55), Oleg Nesterov wrote:
> On 04/09, Masami Hiramatsu wrote:
>>
>> Reviewed-by: Masami Hiramatsu <[email protected]>
>
> Thanks!
>
> Does this mean that I can add the same into 6-8 you didn't ack
> explicitely ?

Ah, yes! You can add my reviewed-by to other patches in the series.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-10 14:18:16

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/10, Denys Vlasenko wrote:
>
> There is this monstrosity, "16-bit override for branches" in 64-mode:
>
> 66 e8 nn nn callw <offset16>
>
> Nobody sane uses it because it truncates instruction pointer.
>
> Or rather, *I think* it should truncate it (i.e. zero-extend to full width),
> but conceivably some CPUs can be buggy wrt that:
> they can decide to modify only lower 16 bits of IP,
> or even they can decided to use signed <offset16> but apply it
> to full-width RIP.
>
> AMD manuals are not clear on what exactly should happen.
>
> I am sure no one sane uses this form of branch instructions
> in 32-bit and 64-bit code.
>
> I don't think we should be trying to support it "correctly"
> (we can just let program crash with SIGILL or something),
> we only need to make sure we don't overlook its existence
> and thus are not tricked into touching or modifying unrelated data.

And after the quick check it seems that lib/insn.c doesn't parse
"66 e8 nn nn" correctly. It treats the next 2 bytes as the part
of 32bit offset.

> Imagine that 66 e8 nn nn bytes are exactly at the end of
> a page,

this doesn't matter. We always read MAX_UINSN_BYTES bytes, so

> and we wrongly assume that offset is 32-bit, not 16-bit.
> 66 e8 nn nn.

see above, this will always happen.

So I think branch_setup_xol_ops() should simply return -ENOSYS in
this case, and let the default_ code execute it out of line. Given
that nobody should use this insn, this is probably not too bad.

We can teach branch_xol_ops to handle this insn correctly later.

Now the question is, how can I detect this insn correctly? I mean,
using the "right" helpers from lib/insn.c?

Oleg.

2014-04-10 14:21:10

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/10/2014 03:57 PM, Masami Hiramatsu wrote:
> (2014/04/10 22:41), Denys Vlasenko wrote:
>> On 04/09/2014 05:43 PM, Oleg Nesterov wrote:
>>> On 04/08, Jim Keniston wrote:
>>>>
>>>> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
>>>>> 0xe8. Anything else?
>>>>
>>>> No, I think e8 is the only call instruction uprobes will see.
>>>
>>> Good.
>>
>> There is this monstrosity, "16-bit override for branches" in 64-mode:
>>
>> 66 e8 nn nn callw <offset16>
>>
>> Nobody sane uses it because it truncates instruction pointer.
>
> No problem, insn.c can handle that too. :)

That's good that we decode it correctly,
but there is more to it.

Call insn pushes return address to stack.

This "mutant 16-bit call", what should it push?
Full RIP?
Truncated 16-bit IP? If yes, by how much does it
advance RSP? +2? +8?
Hmm. Does it affect RSP or only its 16-bit lower part?

It's a can of worms! :)

2014-04-10 14:28:35

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/10, Masami Hiramatsu wrote:
>
> (2014/04/10 22:41), Denys Vlasenko wrote:
> > On 04/09/2014 05:43 PM, Oleg Nesterov wrote:
> >> On 04/08, Jim Keniston wrote:
> >>>
> >>> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
> >>>> 0xe8. Anything else?
> >>>
> >>> No, I think e8 is the only call instruction uprobes will see.
> >>
> >> Good.
> >
> > There is this monstrosity, "16-bit override for branches" in 64-mode:
> >
> > 66 e8 nn nn callw <offset16>
> >
> > Nobody sane uses it because it truncates instruction pointer.
>
> No problem, insn.c can handle that too. :)

Does it?

"callw 1f; 1:\n"
"rep; nop\n"

objdump:

66 e8 00 00 callw 485 <_init-0x3ffed3>
f3 90 pause


if we probe this "callw", we copy MAX_UINSN_BYTES into auprobe->insn,
and after insn_get_length() (insn_complete() == T)

// this is correct
OPCODE1() == e8

// this all looks wrong
insn->length == 6
insn->immediate.value == -1863122944
insn->immediate.nbytes == 4

so it seems that lib/insn.c treats the next "pause" insn as the high
16 bits of address.

Oleg.

2014-04-10 14:31:08

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/10/2014 04:18 PM, Oleg Nesterov wrote:
> On 04/10, Denys Vlasenko wrote:
>>
>> There is this monstrosity, "16-bit override for branches" in 64-mode:
>>
>> 66 e8 nn nn callw <offset16>
>>
>> Nobody sane uses it because it truncates instruction pointer.
>>
>> Or rather, *I think* it should truncate it (i.e. zero-extend to full width),
>> but conceivably some CPUs can be buggy wrt that:
>> they can decide to modify only lower 16 bits of IP,
>> or even they can decided to use signed <offset16> but apply it
>> to full-width RIP.
>>
>> AMD manuals are not clear on what exactly should happen.
>>
>> I am sure no one sane uses this form of branch instructions
>> in 32-bit and 64-bit code.
>>
>> I don't think we should be trying to support it "correctly"
>> (we can just let program crash with SIGILL or something),
>> we only need to make sure we don't overlook its existence
>> and thus are not tricked into touching or modifying unrelated data.
>
> And after the quick check it seems that lib/insn.c doesn't parse
> "66 e8 nn nn" correctly. It treats the next 2 bytes as the part
> of 32bit offset.

I didn't run-test it yet. By code inspection, it seems to work...

x86-opcode-map.txt:
e8: CALL Jz (f64)

gen-insn-attr-x86.awk:
imm_flag["Jz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"


insn.c:
case INAT_IMM_VWORD32:
if (!__get_immv32(insn))
goto err_out;
...
static int __get_immv32(struct insn *insn)
{
switch (insn->opnd_bytes) {
case 2:
insn->immediate.value = get_next(short, insn);
insn->immediate.nbytes = 2;
break;
case 4:
case 8:
insn->immediate.value = get_next(int, insn);
insn->immediate.nbytes = 4;
break;


...until I noticed this code:

void insn_get_modrm(struct insn *insn)
{
...
if (insn->x86_64 && inat_is_force64(insn->attr))
insn->opnd_bytes = 8;


The (f64) modifier in x86-opcode-map.txt means that inat_is_force64()
is true for the call opcode. So we won't reach "case 2:" in __get_immv32():
insn_get_prefixes() did set insn->opnd_bytes to 2 when it saw the 0x66 prefix,
but that was before we reached this place, and here we override it.
This is a bug in the insn decoder.

2014-04-10 17:03:07

by Denys Vlasenko

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Thursday 10 April 2014 16:30, Denys Vlasenko wrote:
> On 04/10/2014 04:18 PM, Oleg Nesterov wrote:
> > On 04/10, Denys Vlasenko wrote:
> >>
> >> There is this monstrosity, "16-bit override for branches" in 64-mode:
> >>
> >> 66 e8 nn nn callw <offset16>
> >>
> >> Nobody sane uses it because it truncates instruction pointer.
> >>
> >> Or rather, *I think* it should truncate it (i.e. zero-extend to full width),
> >> but conceivably some CPUs can be buggy wrt that:
> >> they can decide to modify only lower 16 bits of IP,
> >> or even they can decided to use signed <offset16> but apply it
> >> to full-width RIP.
> >>
> >> AMD manuals are not clear on what exactly should happen.
> >>
> >> I am sure no one sane uses this form of branch instructions
> >> in 32-bit and 64-bit code.
> >>
> >> I don't think we should be trying to support it "correctly"
> >> (we can just let program crash with SIGILL or something),
> >> we only need to make sure we don't overlook its existence
> >> and thus are not tricked into touching or modifying unrelated data.
> >
> > And after the quick check it seems that lib/insn.c doesn't parse
> > "66 e8 nn nn" correctly. It treats the next 2 bytes as the part
> > of 32bit offset.
>
> I didn't run-test it yet. By code inspection, it seems to work...
>
> x86-opcode-map.txt:
> e8: CALL Jz (f64)
>
> gen-insn-attr-x86.awk:
> imm_flag["Jz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
>
>
> insn.c:
> case INAT_IMM_VWORD32:
> if (!__get_immv32(insn))
> goto err_out;
> ...
> static int __get_immv32(struct insn *insn)
> {
> switch (insn->opnd_bytes) {
> case 2:
> insn->immediate.value = get_next(short, insn);
> insn->immediate.nbytes = 2;
> break;
> case 4:
> case 8:
> insn->immediate.value = get_next(int, insn);
> insn->immediate.nbytes = 4;
> break;
>
>
> ...until I notice this code:
>
> void insn_get_modrm(struct insn *insn)
> {
> ...
> if (insn->x86_64 && inat_is_force64(insn->attr))
> insn->opnd_bytes = 8;
>
>
> The (f64) modifier in x86-opcode-map.txt means that inat_is_force64()
> is true for call opcode. So we won't reach "case 2:" in __get_immv32():
> insn_get_prefixes() did set insn->opnd_bytes to 2 when it saw 0x66 prefix,
> but it was before we reach this place, and here we overrode it.
> This is a bug in insn decoder.

I tested it on both Intel and AMD CPUs and my worst fears came true:
this instruction has different widths on different CPUs.

This program:

# compile with: gcc -nostartfiles -nostdlib -o int3 int3.S
_start: .globl _start
.byte 0x66,0xe9,0,0
.byte 0,0
1: jmp 1b

compiles to this:

00000000004000b0 <_start>:
4000b0: 66 e9 00 00 jmpw b4 <_start-0x3ffffc>
4000b4: 00 00 add %al,(%rax)
4000b6: eb fe jmp 4000b6 <_start+0x6>

and it will reach 0x4000b6 on Intel CPU.
IOW, Intel SandyBridge CPU thinks that insn is in fact 66 e9 00 00 00 00,
no RIP truncation occurs.

On AMD K10 CPU, the very same binary jumps to 0x00b4
and gets SIGSEGV with MAPERR.
AMD thinks that the insn is 66 e9 00 00 as shown above.

Thus, the insn.c decoder implements Intel's idea of this insn
while binutils (objdump, gdb) implements the AMD decode.

This same program can be compiled to 32-bit code;
in this mode both CPUs treat the insn as 66 e9 00 00.

Oleg, I'm sure you are very sympathetic by now to the idea
of just not supporting this insn at all. ;)

You can check whether insn had any prefix by checking
insn->prefixes->nbytes != 0...

...but there is a problem with that. P4 introduced branch hints,
which are implemented using segment prefixes on conditional jumps.
Meaning that some compilers produce

2e 0f 82 nn nn nn nn

as (hint not taken) JB <offset32> insn.

2e is the CS segment prefix. insn->prefixes->nbytes == 1 for this insn.
The DS prefix (3e) hints that the branch is taken.

They were nearly useless on P4 anyway and ignored by all CPUs
before or since, but they can be seen in some programs.

Looks like we'll need this:

/*
* 16-bit overrides such as CALLW (66 e8 nn nn) are not supported.
* Intel and AMD behavior differ in 64-bit mode: Intel ignores 66 prefix.
* No one uses these insns.
* To filter them out, reject any branch insns with prefixes...
*/
if (insn->prefixes->nbytes > 1)
bail_out;
/*
* ...Except a single 3e or 2e "branch taken/not taken" hint prefix.
* These are (rarely) used, but ignored by any CPU except P4.
* Example: 2e 0f 82 nn nn nn nn is JB,PN <offset32>
*/
if (insn->prefixes->nbytes == 1 && (insn->prefixes->bytes[0] | 0x10) != 0x3e)
bail_out;



--
vda

2014-04-10 17:12:43

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On 04/10, Oleg Nesterov wrote:
>
> On 04/10, Masami Hiramatsu wrote:
> >
> > (2014/04/10 22:41), Denys Vlasenko wrote:
> > > There is this monstrosity, "16-bit override for branches" in 64-mode:
> > >
> > > 66 e8 nn nn callw <offset16>
> > >
> > > Nobody sane uses it because it truncates instruction pointer.
> >
> > No problem, insn.c can handle that too. :)
>
> Does it?
>
> "callw 1f; 1:\n"
> "rep; nop\n"
>
> objdump:
>
> 66 e8 00 00 callw 485 <_init-0x3ffed3>
> f3 90 pause
>
>
> if we probe this "callw", we copy MAX_INSN_BYTES into auprobe->insn,
> and after insn_get_length() (insn_complete() == T)
>
> // this is correct
> OPCODE1() == e8
>
> // this all looks wrong
> insn->length == 6
> insn->immediate.value == -1863122944
> insn->immediate.nbytes == 4
>
> so it seems that lib/insn.c treats the next "pause" insn as the high
> 16 bits of address.

Or perhaps lib/insn.c is fine but objdump is wrong? And everything
should work correctly? Although in this case I do not understand what
this "callw" actually does.

int main(void)
{
asm (
"nop\n"

"callw 1f; 1:\n"
".byte 0\n"
".byte 0\n"
);

return 0;
}

this runs just fine. With or without gdb. And gdb shows that ->ip is
incremented by 6 after "callw".

int main(void)
{
asm (
"nop\n"

"callw 1f; 1:\n"
".byte 10\n"
".byte 20\n"
);

return 0;
}

objdump:

000000000040047c <main>:
40047c: 55 push %rbp
40047d: 48 89 e5 mov %rsp,%rbp
400480: 90 nop
400481: 66 e8 00 00 callw 485 <_init-0x3ffed3>
400485: 0a 14 b8 or (%rax,%rdi,4),%dl
400488: 00 00 add %al,(%rax)
40048a: 00 00 add %al,(%rax)
40048c: c9 leaveq
40048d: c3 retq

run:

$ ./t
Segmentation fault (core dumped)

$ gdb ./t core.*
...
#0 0x00000000144a0487 in ?? ()

0x144a0487 - 0x400481 == 0x140a0006, this matches the additional 2 .bytes treated
as offset.

So I am totally confused.

Oleg.

Subject: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/10 23:28), Oleg Nesterov wrote:
> On 04/10, Masami Hiramatsu wrote:
>>
>> (2014/04/10 22:41), Denys Vlasenko wrote:
>>> On 04/09/2014 05:43 PM, Oleg Nesterov wrote:
>>>> On 04/08, Jim Keniston wrote:
>>>>>
>>>>> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
>>>>>> 0xe8. Anything else?
>>>>>
>>>>> No, I think e8 is the only call instruction uprobes will see.
>>>>
>>>> Good.
>>>
>>> There is this monstrosity, "16-bit override for branches" in 64-mode:
>>>
>>> 66 e8 nn nn callw <offset16>
>>>
>>> Nobody sane uses it because it truncates instruction pointer.
>>
>> No problem, insn.c can handle that too. :)
>
> Does it?
>
> "callw 1f; 1:\n"
> "rep; nop\n"
>
> objdump:
>
> 66 e8 00 00 callw 485 <_init-0x3ffed3>
> f3 90 pause
>
>
> if we probe this "callw", we copy MAX_INSN_BYTES into auprobe->insn,
> and after insn_get_length() (insn_complete() == T)
>
> // this is correct
> OPCODE1() == e8
>
> // this all looks wrong
> insn->length == 6
> insn->immediate.value == -1863122944
> insn->immediate.nbytes == 4



Oops, that should be a bug in insn.c!
I'll fix that asap!

Thank you,


>
> so it seems that lib/insn.c treats the next "pause" insn as the high
> 16 bits of address.
>
> Oleg.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/11 2:00), Oleg Nesterov wrote:
> On 04/10, Oleg Nesterov wrote:
>>
>> On 04/10, Masami Hiramatsu wrote:
>>>
>>> (2014/04/10 22:41), Denys Vlasenko wrote:
>>>> There is this monstrosity, "16-bit override for branches" in 64-mode:
>>>>
>>>> 66 e8 nn nn callw <offset16>
>>>>
>>>> Nobody sane uses it because it truncates instruction pointer.
>>>
>>> No problem, insn.c can handle that too. :)
>>
>> Does it?
>>
>> "callw 1f; 1:\n"
>> "rep; nop\n"
>>
>> objdump:
>>
>> 66 e8 00 00 callw 485 <_init-0x3ffed3>
>> f3 90 pause
>>
>>
>> if we probe this "callw", we copy MAX_INSN_BYTES into auprobe->insn,
>> and after insn_get_length() (insn_complete() == T)
>>
>> // this is correct
>> OPCODE1() == e8
>>
>> // this all looks wrong
>> insn->length == 6
>> insn->immediate.value == -1863122944
>> insn->immediate.nbytes == 4
>>
>> so it seems that lib/insn.c treats the next "pause" insn as the high
>> 16 bits of address.
>
> Or perhaps lib/insn.c is fine but objdump is wrong? And everything
> should work correctly? Although in this case I do not understand what
> this "callw" actually does.

Yeah, I think objdump is wrong. See below.

>
> int main(void)
> {
> asm (
> "nop\n"
>
> "callw 1f; 1:\n"
> ".byte 0\n"
> ".byte 0\n"
> );
>
> return 0;
> }
>
> this runs just fine. With or without gdb. And gdb shows that ->ip is
> incremented by 6 after "callw".
>
> int main(void)
> {
> asm (
> "nop\n"
>
> "callw 1f; 1:\n"
> ".byte 10\n"
> ".byte 20\n"
> );
>
> return 0;
> }
>
> objdump:
>
> 000000000040047c <main>:
> 40047c: 55 push %rbp
> 40047d: 48 89 e5 mov %rsp,%rbp
> 400480: 90 nop
> 400481: 66 e8 00 00 callw 485 <_init-0x3ffed3>
> 400485: 0a 14 b8 or (%rax,%rdi,4),%dl
> 400488: 00 00 add %al,(%rax)
> 40048a: 00 00 add %al,(%rax)
> 40048c: c9 leaveq
> 40048d: c3 retq
>
> run:
>
> $ ./t
> Segmentation fault (core dumped)
>
> $ gdb ./t core.*
> ...
> #0 0x00000000144a0487 in ?? ()
>
> 0x144a0487 - 0x400481 == 0x140a0006, this matches the additional 2 .bytes treated
> as offset.

Ah, OK. I had also forgotten about the operand-size prefix on x86/x86-64.

The callw is actually a "call with operand-size override prefix".
The operand-size prefix (0x66) changes the number of bytes (the size)
of its operand, including the immediate, but the effect depends on the opcode.

As I said, opcode 0xe8 (call) takes a Jz operand (a jump-offset
immediate of 16-bit or 32-bit size, depending on the operand size).
On x86-32 the default operand size is 4 bytes, and it changes to
2 bytes with the 0x66 prefix. This is why "66 e8" becomes "callw"
on x86-32.

However, please see Intel SDM 2b, appendix A,
A.2.5 Superscripts Utilized in Opcode Tables
----
f64
The operand size is forced to a 64-bit operand size when in 64-bit mode
(prefixes that change operand size are ignored for this instruction in 64-bit
mode).
----
And Table A-2. One-byte Opcode Map: (08H to FFH)
-----
CALL(f64) Jz
-----

Ah, I got it. This means that "call" on x86-64 can never be "callw",
because the operand size is always 64-bit, and in that case the
immediate is 4 bytes(*).

(*) again, at A.2.2 Codes for Operand Type
---
z Word for 16-bit operand-size or doubleword for 32 or 64-bit operand-size.
---

So, at least for this case, objdump is wrong. lib/insn.c is correct. :)
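
(Just to restate the rule as code; this is only an illustration of the
SDM text quoted above, not the actual insn.c logic:)

/* Size of the Jz immediate of e8/e9 (call/jmp rel). */
static int jz_imm_bytes(int x86_64, int has_66_prefix)
{
	if (x86_64)
		return 4;	/* f64: operand size forced to 64-bit, prefix ignored */
	return has_66_prefix ? 2 : 4;	/* 32-bit mode honours the 0x66 override */
}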

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/10 23:20), Denys Vlasenko wrote:
> On 04/10/2014 03:57 PM, Masami Hiramatsu wrote:
>> (2014/04/10 22:41), Denys Vlasenko wrote:
>>> On 04/09/2014 05:43 PM, Oleg Nesterov wrote:
>>>> On 04/08, Jim Keniston wrote:
>>>>>
>>>>> On Sun, 2014-04-06 at 22:16 +0200, Oleg Nesterov wrote:
>>>>>> 0xe8. Anything else?
>>>>>
>>>>> No, I think e8 is the only call instruction uprobes will see.
>>>>
>>>> Good.
>>>
>>> There is this monstrosity, "16-bit override for branches" in 64-mode:
>>>
>>> 66 e8 nn nn callw <offset16>
>>>
>>> Nobody sane uses it because it truncates instruction pointer.
>>
>> No problem, insn.c can handle that too. :)
>
> That's good that we decode it correctly,
> but there is more to it.
>
> Call insn pushes return address to stack.
>
> This "mutant 16-bit call", what should it push?
> Full RIP?
> Truncated 16-bit IP? If yes, by how much does it
> advance RSP? +2? +8?
> Hmm. Does it affect RSP or only its 16-bit lower part?

At least, if we can trust the Intel SDM, it says that it depends
on the operand size (insn->opnd_bytes) and the stack segment
descriptor. Please check the SDM vol.1, 6.2.2 Stack Alignment,
and vol.2a, 3.2 Instructions (A-M), CALL--Call Procedure.
But we'd better check it on x86-32.

Thank you!

>
> It's a can of worms! :)


--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-11 12:23:50

by Denys Vlasenko

[permalink] [raw]
Subject: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Fri, Apr 11, 2014 at 5:03 AM, Masami Hiramatsu
<[email protected]> wrote:
> At least, if we can trust Intel SDM, it says that depends
> on the operand-size (insn->opnd_bytes) and stack segment
> descriptor. Please check the SDM vol.1 6.2.2 Stack Alignment
> and vol.2a, 3.2 Instructions (A-M), CALL--Call Procedure.
> But we'd better check it on x86-32.

I am past trusting CPU manuals on this one:

By now I verified on the real hardware that AMD and Intel CPUs
handle this insn differently in 64-bit mode: Intel ignores 0x66 prefix.
AMD treats this insn the same as in 32-bit mode: as 16-bit insn.

(Should I submit a patch adding comment about it
in x86-opcode-map.txt?)

So there is no universally "correct" way to emulate it.

We, theoretically, can decode it differently *depending
on actual CPU(s) on the system*... do we really want
to go *that* far? I guess not.

--
vda

Subject: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/11 2:02), Denys Vlasenko wrote:

>> The (f64) modifier in x86-opcode-map.txt means that inat_is_force64()
>> is true for call opcode. So we won't reach "case 2:" in __get_immv32():
>> insn_get_prefixes() did set insn->opnd_bytes to 2 when it saw 0x66 prefix,
>> but it was before we reach this place, and here we overrode it.
>> This is a bug in insn decoder.
>
> I tested it on both Intel and AMD CPUs and my worst fears came true:
> this instruction has different widths on different CPUs.
>
> This program:
>
> # compile with: gcc -nostartfiles -nostdlib -o int3 int3.S
> _start: .globl _start
> .byte 0x66,0xe9,0,0
> .byte 0,0
> 1: jmp 1b
>
> compiles to this:
>
> 00000000004000b0 <_start>:
> 4000b0: 66 e9 00 00 jmpw b4 <_start-0x3ffffc>
> 4000b4: 00 00 add %al,(%rax)
> 4000b6: eb fe jmp 4000b6 <_start+0x6>
>
> and it will reach 0x4000b6 on Intel CPU.
> IOW, Intel SandyBridge CPU thinks that insn is in fact 66 e9 00 00 00 00,
> no RIP truncation occurs.
>
> On AMD K10 CPU, the very same binary jumps to 0x00b4
> and gets SIGSEGV with MAPERR.
> AMD thinks that the insn is 66 e9 00 00 as shown above.

Hmm, interesting.

> Thus, insn.c decoder implements Intel's idea of this insn
> while binutils (objdump, gdb) implement AMD decode.

Yeah, insn.c relies on Intel's opcode map.

> This same program can be compiled to 32-bit code,
> in this mode both CPUs treat insn as 66 e9 00 00.

In 32-bit mode, insn.c correctly treats it as 66 e9 00 00.

> Oleg, I'm sure you are very sympathetic by now to the idea
> of just not supporting this insn at all. ;)
>
> You can check whether insn had any prefix by checking
> insn->prefixes->nbytes != 0...

No, since there are other prefixes (and it may be meaningless)
you should find 0x66 in insn->prefixes->bytes[].
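
(Sketch only, the helper name is made up; "prefixes" is an embedded
field of struct insn, so the accessor is insn->prefixes.bytes[]:)

/* Scan the decoded legacy prefixes for an operand-size override. */
static bool has_operand_size_prefix(struct insn *insn)
{
	int i;

	for (i = 0; i < insn->prefixes.nbytes; i++)
		if (insn->prefixes.bytes[i] == 0x66)
			return true;
	return false;
}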

> ..but there is a problem with that. P4 introduced branch hints,
> which are implemented using segment prefixes on conditional jumps.
> Meaning that some compilers produce
>
> 2e 0f 82 nn nn nn nn
>
> as (hint not taken) JB <offset32> insn.
>
> 2e is CS segment prefix. insn->prefixes->nbytes == 1 for this insn.
> DS prefix (3e) hints that branch is taken.
>
> They were nearly useless on P4 anyway and ignored by all CPUs
> before or since, but they can be seen in some programs.

Hm, this could be done.

>
> Looks like we'll need this:
>
> /*
> * 16-bit overrides such as CALLW (66 e8 nn nn) are not supported.
> * Intel and AMD behavior differ in 64-bit mode: Intel ignores 66 prefix.
> * No one uses these insns.
> * To filter them out, reject any branch insns with prefixes...
> */
> if (insn->prefixes->nbytes > 1)
> bail_out;

As I said above, check insn->prefixes.bytes[0..nbytes].

> /*
> * ...Except a single 3e or 2e "branch taken/not taken" hint prefix.
> * These are (rarely) used, but ignored by any CPU except P4.
> * Example: 2e 0f 82 nn nn nn nn is JB,PN <offset32>
> */
> if (insn->prefixes->nbytes == 1 && (insn->prefixes->bytes[0] | 0x10) != 0x3e)
> bail_out;
>

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-14 12:24:57

by Denys Vlasenko

[permalink] [raw]
Subject: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Mon, Apr 14, 2014 at 7:14 AM, Masami Hiramatsu
<[email protected]> wrote:
>> You can check whether insn had any prefix by checking
>> insn->prefixes->nbytes != 0...
>
> No, since there are other prefixes (and it may be meaningless)
> you should find 0x66 in insn->prefixes->bytes[].

What "no"? Is my statement not true? Please reread it.
I'm not talking about checking for 0x66 prefix.
Here, I am talking about checking that insn has *any* prefix.

Subject: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/14 21:24), Denys Vlasenko wrote:
> On Mon, Apr 14, 2014 at 7:14 AM, Masami Hiramatsu
> <[email protected]> wrote:
>>> You can check whether insn had any prefix by checking
>>> insn->prefixes->nbytes != 0...
>>
>> No, since there are other prefixes (and it may be meaningless)
>> you should find 0x66 in insn->prefixes->bytes[].
>
> What "no"? Is my statement not true? Please reread it.
> I'm not talking about checking for 0x66 prefix.
> Here, I am talking about checking that insn has *any* prefix.

Ah, I see now, sorry. That's OK for me.

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

Subject: Re: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

(2014/04/11 21:23), Denys Vlasenko wrote:
> On Fri, Apr 11, 2014 at 5:03 AM, Masami Hiramatsu
> <[email protected]> wrote:
>> At least, if we can trust Intel SDM, it says that depends
>> on the operand-size (insn->opnd_bytes) and stack segment
>> descriptor. Please check the SDM vol.1 6.2.2 Stack Alignment
>> and vol.2a, 3.2 Instructions (A-M), CALL--Call Procedure.
>> But we'd better check it on x86-32.
>
> I am past trusting CPU manuals on this one:
>
> By now I verified on the real hardware that AMD and Intel CPUs
> handle this insn differently in 64-bit mode: Intel ignores 0x66 prefix.
> AMD treats this insn the same as in 32-bit mode: as 16-bit insn.
>
> (Should I submit a patch adding comment about it
> in x86-opcode-map.txt?)

Yeah, feel free to do so :)

> So there is no universally "correct" way to emulate it.
>
> We, theoretically, can decode it differently *depending
> on actual CPU(s) on the system*... do we really want
> to go *that* far? I guess not.

No, I guess not either. If we started doing that, we would need to
cover other x86 variants too... And jmpw/callw are not normal
instructions anyway.

BTW, I can see that difference in the AMD64 Architecture Programmer's
Manual vol.3, appendix A. It seems that the difference is by
design...

Thank you,

--
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: [email protected]

2014-04-18 15:18:04

by Denys Vlasenko

[permalink] [raw]
Subject: Re: Re: Re: [RFC PATCH 4/6] uprobes/x86: Emulate rip-relative call's

On Mon, Apr 14, 2014 at 4:22 PM, Masami Hiramatsu
<[email protected]> wrote:
> (2014/04/11 21:23), Denys Vlasenko wrote:
>> On Fri, Apr 11, 2014 at 5:03 AM, Masami Hiramatsu
>> <[email protected]> wrote:
>>> At least, if we can trust Intel SDM, it says that depends
>>> on the operand-size (insn->opnd_bytes) and stack segment
>>> descriptor. Please check the SDM vol.1 6.2.2 Stack Alignment
>>> and vol.2a, 3.2 Instructions (A-M), CALL--Call Procedure.
>>> But we'd better check it on x86-32.
>>
>> I am past trusting CPU manuals on this one:
>>
>> By now I verified on the real hardware that AMD and Intel CPUs
>> handle this insn differently in 64-bit mode: Intel ignores 0x66 prefix.
>> AMD treats this insn the same as in 32-bit mode: as 16-bit insn.
>>
>> (Should I submit a patch adding comment about it
>> in x86-opcode-map.txt?)
>
> Yeah, feel free to do so :)

I sent a patch (in another email), please apply.