Subject: uprobe: single step over uprobe & global breakpoints

We have three pieces here:

[PATCH 1/5] uprobes: Use a helper instead of ptrace's single step
[PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

following Oleg's suggestion to allow single stepping over an uprobe.

[PATCH 3/5] uprobes: remove check for uprobe variable in
[PATCH 4/5] uprobes: probe definiton can only start with 'p' and '-'

category cleanup.

[RFC 5/5] uprobes: add global breakpoints

global breakpoints. There is no gdb interface available in terms of "how do I
know that that a program hit a global breakpoint" and not doing ps or
"cat trace" all the time.

Sebastian


Subject: [PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

The arch specific implementation behaves like user_enable_single_step()
except that it does not disable single stepping if it was already
enabled. This allows the debugger to single step over an uprobe.
The state of block stepping is not restored. It makes only sense
together with TF and if that was enabled then the debugger is notified.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/uprobes.h | 2 ++
arch/x86/kernel/uprobes.c | 42 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 44 insertions(+)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index f3971bb..cee5862 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -46,6 +46,8 @@ struct arch_uprobe_task {
#ifdef CONFIG_X86_64
unsigned long saved_scratch_register;
#endif
+#define UPROBE_CLEAR_TF (1 << 0)
+ unsigned int restore_flags;
};

extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 36fd420..8df1479 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -673,3 +673,45 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
}
return false;
}
+
+static int insn_changes_flags(struct arch_uprobe *auprobe)
+{
+ /* popf reads flags from stack */
+ if (auprobe->insn[0] == 0x9d)
+ return 1;
+ /*
+ * lock popf is not a valid opcode
+ * iret, sysret is not an opcode allowed in userland
+ */
+ return 0;
+}
+
+void arch_uprobe_enable_step(struct arch_uprobe *auprobe)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ autask->restore_flags = 0;
+ if (!test_tsk_thread_flag(current, TIF_SINGLESTEP) &&
+ !insn_changes_flags(auprobe))
+ autask->restore_flags |= UPROBE_CLEAR_TF;
+ /*
+ * The state of TIF_BLOCKSTEP is not saved. With the TF flag set we
+ * would to examine the opcode and the flags to make it right. Without
+ * TF block stepping makes no sense. Instead we wakeup the debugger via
+ * SIGTRAP in case TF was set. This has the side effect that the
+ * debugger gets woken up even if the opcode normally wouldn't do so.
+ */
+ user_enable_single_step(current);
+}
+
+void arch_uprobe_disable_step(struct arch_uprobe *auprobe)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ if (autask->restore_flags & UPROBE_CLEAR_TF)
+ user_disable_single_step(current);
+ else
+ send_sig(SIGTRAP, current, 0);
+}
--
1.7.10.4

Subject: [PATCH 4/5] uprobes: probe definiton can only start with 'p' and '-'

'r' and ' ' is not supported according to current code.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/trace/trace_uprobe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 03003cd..f3c3811 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -189,7 +189,7 @@ static int create_trace_uprobe(int argc, char **argv)
if (argv[0][0] == '-')
is_delete = true;
else if (argv[0][0] != 'p') {
- pr_info("Probe definition must be started with 'p', 'r' or" " '-'.\n");
+ pr_info("Probe definition must be started with 'p' or '-'.\n");
return -EINVAL;
}

--
1.7.10.4

Subject: [PATCH 1/5] uprobes: Use a helper instead of ptrace's single step enable

As Oleg pointed out in [0] utrace should not use the ptrace interface
for enabling/disabling single stepping.

[0] http://lkml.kernel.org/[email protected]

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/uprobes.h | 2 ++
kernel/events/uprobes.c | 14 ++++++++++++--
2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index efe4b33..0fc6585 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -111,6 +111,8 @@ extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsig
extern void uprobe_free_utask(struct task_struct *t);
extern void uprobe_copy_process(struct task_struct *t);
extern unsigned long __weak uprobe_get_swbp_addr(struct pt_regs *regs);
+extern void __weak arch_uprobe_enable_step(struct arch_uprobe *arch);
+extern void __weak arch_uprobe_disable_step(struct arch_uprobe *arch);
extern int uprobe_post_sstep_notifier(struct pt_regs *regs);
extern int uprobe_pre_sstep_notifier(struct pt_regs *regs);
extern void uprobe_notify_resume(struct pt_regs *regs);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c08a22d..41a2555 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1463,6 +1463,16 @@ static struct uprobe *find_active_uprobe(unsigned long bp_vaddr, int *is_swbp)
return uprobe;
}

+void __weak arch_uprobe_enable_step(struct arch_uprobe *arch)
+{
+ user_enable_single_step(current);
+}
+
+void __weak arch_uprobe_disable_step(struct arch_uprobe *arch)
+{
+ user_disable_single_step(current);
+}
+
/*
* Run handler and ask thread to singlestep.
* Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1509,7 +1519,7 @@ static void handle_swbp(struct pt_regs *regs)

utask->state = UTASK_SSTEP;
if (!pre_ssout(uprobe, regs, bp_vaddr)) {
- user_enable_single_step(current);
+ arch_uprobe_enable_step(&uprobe->arch);
return;
}

@@ -1550,7 +1560,7 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
put_uprobe(uprobe);
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
- user_disable_single_step(current);
+ arch_uprobe_disable_step(&uprobe->arch);
xol_free_insn_slot(current);

spin_lock_irq(&current->sighand->siglock);
--
1.7.10.4

Subject: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

by the time we get here (after we pass cleanup_ret) uprobe is always is
set. If it is NULL we leave very early in the code.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
kernel/events/uprobes.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 41a2555..c8e5204 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1528,17 +1528,15 @@ cleanup_ret:
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
}
- if (uprobe) {
- if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+ if (!(uprobe->flags & UPROBE_SKIP_SSTEP))

- /*
- * cannot singlestep; cannot skip instruction;
- * re-execute the instruction.
- */
- instruction_pointer_set(regs, bp_vaddr);
+ /*
+ * cannot singlestep; cannot skip instruction;
+ * re-execute the instruction.
+ */
+ instruction_pointer_set(regs, bp_vaddr);

- put_uprobe(uprobe);
- }
+ put_uprobe(uprobe);
}

/*
--
1.7.10.4

Subject: [RFC 5/5] uprobes: add global breakpoints

By setting an uprobe tracepoint, one learns whenever a certain point
within a program is reached / passed. This is recorded and the
application continues.
This patch adds the ability to hold the program once this point has been
passed and the user may attach to the program via ptrace.
First, setup a global breakpoint which is very similar to a uprobe trace
point:

|echo 'g /home/bigeasy/sample:0x0000044d %ip %ax' > uprobe_events

This is exactly what uprobe does except that it starts with the letter
'g' instead of 'p'.

Step two is to enable it:
|echo 1 > events/uprobes/enable

Lets assume you execute ./sample and the breakpoint is hit. In ps you will
see:
|1938 pts/1 t+ 0:00 ./sample

Now you can attach gdb via 'gdb -p 1938'. The gdb can now interact with
the tracee and inspect its registers, its stack, single step, let it
run…
In case the process is not of great interest, the user may continue
without gdb by writting its pid into the uprobe_gp_wakeup file

|echo 1938 > uprobe_gp_wakeup

What I miss right now is an interface to tell the user/gdb that there is a
program that hit a global breakpoint and is waiting for further instructions.
A "tail -f trace" does not work and may contain also a lot of other
informations. I've been thinking about a poll()able file which returns pids of
tasks which are put on hold. Other suggestions?

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
include/linux/uprobes.h | 10 +++++++
kernel/events/uprobes.c | 29 +++++++++++++++++--
kernel/ptrace.c | 4 ++-
kernel/trace/trace_uprobe.c | 67 +++++++++++++++++++++++++++++++++++++++++--
4 files changed, 104 insertions(+), 6 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0fc6585..991a665 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -63,6 +63,9 @@ enum uprobe_task_state {
UTASK_SSTEP,
UTASK_SSTEP_ACK,
UTASK_SSTEP_TRAPPED,
+ UTASK_TRACE_SLEEP,
+ UTASK_TRACE_WOKEUP_NORMAL,
+ UTASK_TRACE_WOKEUP_TRACED,
};

/*
@@ -76,6 +79,7 @@ struct uprobe_task {

unsigned long xol_vaddr;
unsigned long vaddr;
+ int skip_handler;
};

/*
@@ -120,6 +124,8 @@ extern bool uprobe_deny_signal(void);
extern bool __weak arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs);
extern void uprobe_clear_state(struct mm_struct *mm);
extern void uprobe_reset_state(struct mm_struct *mm);
+extern int uprobe_wakeup_task(struct task_struct *t, int traced);
+
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
@@ -163,5 +169,9 @@ static inline void uprobe_clear_state(struct mm_struct *mm)
static inline void uprobe_reset_state(struct mm_struct *mm)
{
}
+static inline int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+ return 0;
+}
#endif /* !CONFIG_UPROBES */
#endif /* _LINUX_UPROBES_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c8e5204..f1326a2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1473,6 +1473,22 @@ void __weak arch_uprobe_disable_step(struct arch_uprobe *arch)
user_disable_single_step(current);
}

+int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+ struct uprobe_task *utask;
+
+ utask = t->utask;
+ if (!utask)
+ return -EINVAL;
+ if (utask->state != UTASK_TRACE_SLEEP)
+ return -EINVAL;
+
+ utask->state = traced ?
+ UTASK_TRACE_WOKEUP_TRACED : UTASK_TRACE_WOKEUP_NORMAL;
+ wake_up_state(t, __TASK_TRACED);
+ return 0;
+}
+
/*
* Run handler and ask thread to singlestep.
* Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1513,7 +1529,16 @@ static void handle_swbp(struct pt_regs *regs)
goto cleanup_ret;
}
utask->active_uprobe = uprobe;
- handler_chain(uprobe, regs);
+ if (utask->skip_handler)
+ utask->skip_handler = 0;
+ else
+ handler_chain(uprobe, regs);
+
+ if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
+ send_sig(SIGTRAP, current, 0);
+ utask->skip_handler = 1;
+ goto cleanup_ret;
+ }
if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs))
goto cleanup_ret;

@@ -1528,7 +1553,7 @@ cleanup_ret:
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
}
- if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+ if (!(uprobe->flags & UPROBE_SKIP_SSTEP) || utask->skip_handler)

/*
* cannot singlestep; cannot skip instruction;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a232bb5..5d6d3ed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
__ptrace_link(task, current);

/* SEIZE doesn't trap tracee on attach */
- if (!seize)
+ if (!seize) {
send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+ uprobe_wakeup_task(task, 1);
+ }

spin_lock(&task->sighand->siglock);

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f3c3811..0aabee4 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -48,6 +48,7 @@ struct trace_uprobe {
unsigned int flags; /* For TP_FLAG_* */
ssize_t size; /* trace entry size */
unsigned int nr_args;
+ bool is_gb;
struct probe_arg args[];
};

@@ -177,19 +178,24 @@ static int create_trace_uprobe(int argc, char **argv)
struct path path;
unsigned long offset;
bool is_delete;
+ bool is_gb;
int i, ret;

inode = NULL;
ret = 0;
is_delete = false;
+ is_gb = false;
event = NULL;
group = NULL;

/* argc must be >= 1 */
if (argv[0][0] == '-')
is_delete = true;
+ else if (argv[0][0] == 'g')
+ is_gb = true;
else if (argv[0][0] != 'p') {
- pr_info("Probe definition must be started with 'p' or '-'.\n");
+ pr_info("Probe definition must be started with 'p', 'g' or "
+ "'-'.\n");
return -EINVAL;
}

@@ -277,7 +283,8 @@ static int create_trace_uprobe(int argc, char **argv)
if (ptr)
*ptr = '\0';

- snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset);
+ snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx",
+ is_gb ? 'g' : 'p', tail, offset);
event = buf;
kfree(tail);
}
@@ -298,6 +305,8 @@ static int create_trace_uprobe(int argc, char **argv)
goto error;
}

+ tu->is_gb = is_gb;
+
/* parse arguments */
ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
@@ -394,8 +403,12 @@ static int probes_seq_show(struct seq_file *m, void *v)
{
struct trace_uprobe *tu = v;
int i;
+ char type = 'p';
+
+ if (tu->is_gb)
+ type = 'g';

- seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
+ seq_printf(m, "%c:%s/%s", type, tu->call.class->system, tu->call.name);
seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);

for (i = 0; i < tu->nr_args; i++)
@@ -435,6 +448,38 @@ static const struct file_operations uprobe_events_ops = {
.write = probes_write,
};

+static ssize_t uprobes_gp_wakeup_write(struct file *filp,
+ const char __user *ubuf, size_t count, loff_t *ppos)
+{
+ struct task_struct *child;
+ unsigned long pid;
+ int ret;
+
+ ret = kstrtoul_from_user(ubuf, count, 0, &pid);
+ if (ret)
+ return ret;
+
+ rcu_read_lock();
+ child = find_task_by_vpid(pid);
+ if (child)
+ get_task_struct(child);
+ rcu_read_unlock();
+
+ if (!child)
+ return -EINVAL;
+
+ ret = uprobe_wakeup_task(child, 0);
+ put_task_struct(child);
+ return ret ? ret : count;
+}
+
+static const struct file_operations uprobe_gp_wakeup_ops = {
+ .owner = THIS_MODULE,
+ .open = simple_open,
+ .llseek = noop_llseek,
+ .write = uprobes_gp_wakeup_write,
+};
+
/* Probes profiling interfaces */
static int probes_profile_seq_show(struct seq_file *m, void *v)
{
@@ -704,6 +749,17 @@ int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type,
return 0;
}

+static void uprobe_wait_traced(struct trace_uprobe *tu)
+{
+ struct uprobe_task *utask;
+
+ utask = current->utask;
+ utask->state = UTASK_TRACE_SLEEP;
+
+ set_current_state(TASK_TRACED);
+ schedule();
+}
+
static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
{
struct uprobe_trace_consumer *utc;
@@ -721,6 +777,9 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
if (tu->flags & TP_FLAG_PROFILE)
uprobe_perf_func(tu, regs);
#endif
+ if (tu->is_gb)
+ uprobe_wait_traced(tu);
+
return 0;
}

@@ -779,6 +838,8 @@ static __init int init_uprobe_trace(void)

trace_create_file("uprobe_events", 0644, d_tracer,
NULL, &uprobe_events_ops);
+ trace_create_file("uprobe_gb_wakeup", 0644, d_tracer,
+ NULL, &uprobe_gp_wakeup_ops);
/* Profile interface */
trace_create_file("uprobe_profile", 0444, d_tracer,
NULL, &uprobe_profile_ops);
--
1.7.10.4

2012-08-08 09:11:21

by suzuki

[permalink] [raw]
Subject: Re: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

On 08/07/2012 09:42 PM, Sebastian Andrzej Siewior wrote:
> by the time we get here (after we pass cleanup_ret) uprobe is always is
> set. If it is NULL we leave very early in the code.
>
> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/events/uprobes.c | 16 +++++++---------
> 1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 41a2555..c8e5204 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1528,17 +1528,15 @@ cleanup_ret:
> utask->active_uprobe = NULL;
> utask->state = UTASK_RUNNING;
> }
> - if (uprobe) {
> - if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
> + if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
>
Shouldn't we check uprobe != NULL before we check the uprobe->flags ?
i.e, shouldn't the above line be :

if (uprobe && ! (uprobe->flags & UPROBE_SKIP_SSTEP)) ?
> - /*
> - * cannot singlestep; cannot skip instruction;
> - * re-execute the instruction.
> - */
> - instruction_pointer_set(regs, bp_vaddr);
> + /*
> + * cannot singlestep; cannot skip instruction;
> + * re-execute the instruction.
> + */
> + instruction_pointer_set(regs, bp_vaddr);
>
> - put_uprobe(uprobe);
> - }
> + put_uprobe(uprobe);
> }

Thanks
Suzuki

Subject: Re: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

On 08/08/2012 11:10 AM, Suzuki K. Poulose wrote:
>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1528,17 +1528,15 @@ cleanup_ret:
>> utask->active_uprobe = NULL;
>> utask->state = UTASK_RUNNING;
>> }
>> - if (uprobe) {
>> - if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
>> + if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
>>
> Shouldn't we check uprobe != NULL before we check the uprobe->flags ?
> i.e, shouldn't the above line be :
>
> if (uprobe && ! (uprobe->flags & UPROBE_SKIP_SSTEP)) ?

The function starts like this:

if (!uprobe) {
if (is_swbp > 0) {
send_sig(SIGTRAP, current, 0);
} else {
instruction_pointer_set(regs, bp_vaddr);
}
return;
}

Which makes uprobe != NULL by the time we get there, no?

Sebastian

2012-08-08 13:02:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

On 08/07, Sebastian Andrzej Siewior wrote:
>
> by the time we get here (after we pass cleanup_ret) uprobe is always is
> set. If it is NULL we leave very early in the code.

Reviewed-by: Oleg Nesterov <[email protected]>

> Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
> ---
> kernel/events/uprobes.c | 16 +++++++---------
> 1 file changed, 7 insertions(+), 9 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 41a2555..c8e5204 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1528,17 +1528,15 @@ cleanup_ret:
> utask->active_uprobe = NULL;
> utask->state = UTASK_RUNNING;
> }
> - if (uprobe) {
> - if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
> + if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
>
> - /*
> - * cannot singlestep; cannot skip instruction;
> - * re-execute the instruction.
> - */
> - instruction_pointer_set(regs, bp_vaddr);
> + /*
> + * cannot singlestep; cannot skip instruction;
> + * re-execute the instruction.
> + */
> + instruction_pointer_set(regs, bp_vaddr);
>
> - put_uprobe(uprobe);
> - }
> + put_uprobe(uprobe);
> }
>
> /*
> --
> 1.7.10.4
>

2012-08-08 13:13:13

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/07, Sebastian Andrzej Siewior wrote:
>
> The arch specific implementation behaves like user_enable_single_step()
> except that it does not disable single stepping if it was already
> enabled. This allows the debugger to single step over an uprobe.
> The state of block stepping is not restored. It makes only sense
> together with TF and if that was enabled then the debugger is notified.

I'll try to read this series later, just one nit for now...

> +static int insn_changes_flags(struct arch_uprobe *auprobe)
> +{
> + /* popf reads flags from stack */
> + if (auprobe->insn[0] == 0x9d)
> + return 1;

Ah, somehow I didn't think about this before.

->insn[0] doesn't look right, we should skip the prefixes.

Srikar, could you help? Perhaps validate_insn_bits() paths can
detect "popf" and do auprobe->fixups |= UPROBE_FIX_TF ?

This way we also do not need the new member in utask.

> +void arch_uprobe_enable_step(struct arch_uprobe *auprobe)
> +{
> + struct uprobe_task *utask = current->utask;
> + struct arch_uprobe_task *autask = &utask->autask;
> +
> + autask->restore_flags = 0;
> + if (!test_tsk_thread_flag(current, TIF_SINGLESTEP) &&
> + !insn_changes_flags(auprobe))
> + autask->restore_flags |= UPROBE_CLEAR_TF;
> + /*
> + * The state of TIF_BLOCKSTEP is not saved. With the TF flag set we
> + * would to examine the opcode and the flags to make it right. Without
> + * TF block stepping makes no sense. Instead we wakeup the debugger via
> + * SIGTRAP in case TF was set. This has the side effect that the
> + * debugger gets woken up even if the opcode normally wouldn't do so.
> + */
> + user_enable_single_step(current);

OK, once we have set_task_blockstep() we can change this.

Oleg.

Subject: Re: [PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/08/2012 02:57 PM, Oleg Nesterov wrote:
>> +static int insn_changes_flags(struct arch_uprobe *auprobe)
>> +{
>> + /* popf reads flags from stack */
>> + if (auprobe->insn[0] == 0x9d)
>> + return 1;
>
> Ah, somehow I didn't think about this before.
>
> ->insn[0] doesn't look right, we should skip the prefixes.

Why? I tried 'lock popf' and I got invalid instruction. The same for
'rep popf'.

> Oleg.
>

Sebastian

2012-08-08 13:19:17

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC 5/5] uprobes: add global breakpoints

On 08/07, Sebastian Andrzej Siewior wrote:
>
> By setting an uprobe tracepoint, one learns whenever a certain point
> within a program is reached / passed. This is recorded and the
> application continues.
> This patch adds the ability to hold the program once this point has been
> passed and the user may attach to the program via ptrace.
> First, setup a global breakpoint which is very similar to a uprobe trace
> point:
>
> |echo 'g /home/bigeasy/sample:0x0000044d %ip %ax' > uprobe_events
>
> This is exactly what uprobe does except that it starts with the letter
> 'g' instead of 'p'.
>
> Step two is to enable it:
> |echo 1 > events/uprobes/enable
>
> Lets assume you execute ./sample and the breakpoint is hit. In ps you will
> see:
> |1938 pts/1 t+ 0:00 ./sample
>
> Now you can attach gdb via 'gdb -p 1938'. The gdb can now interact with
> the tracee and inspect its registers, its stack, single step, let it
> run…
> In case the process is not of great interest, the user may continue
> without gdb by writting its pid into the uprobe_gp_wakeup file
>
> |echo 1938 > uprobe_gp_wakeup
>
> What I miss right now is an interface to tell the user/gdb that there is a
> program that hit a global breakpoint and is waiting for further instructions.
> A "tail -f trace" does not work and may contain also a lot of other
> informations. I've been thinking about a poll()able file which returns pids of
> tasks which are put on hold. Other suggestions?

Honestly, I am not sure this is that useful...

OK, I'll try to read this patch later. But, at first glance,

> +int uprobe_wakeup_task(struct task_struct *t, int traced)
> +{
> + struct uprobe_task *utask;
> +
> + utask = t->utask;
> + if (!utask)
> + return -EINVAL;
> + if (utask->state != UTASK_TRACE_SLEEP)
> + return -EINVAL;
> +
> + utask->state = traced ?
> + UTASK_TRACE_WOKEUP_TRACED : UTASK_TRACE_WOKEUP_NORMAL;
> + wake_up_state(t, __TASK_TRACED);
> + return 0;
> +}

This can obviously race with uprobe_wait_traced(), see below

> @@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
> __ptrace_link(task, current);
>
> /* SEIZE doesn't trap tracee on attach */
> - if (!seize)
> + if (!seize) {
> send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
> + uprobe_wakeup_task(task, 1);
> + }

Can't understand why uprobe_wakeup_task() depends on !PTRACE_SEIZE

> +static void uprobe_wait_traced(struct trace_uprobe *tu)
> +{
> + struct uprobe_task *utask;
> +
> + utask = current->utask;
> + utask->state = UTASK_TRACE_SLEEP;

WINDOW

> +
> + set_current_state(TASK_TRACED);
> + schedule();
> +}

Suppose that uprobe_wakeup_task() is called in the WINDOW above.

OTOH, uprobe_wakeup_task() can race with itself if it is called
twice at the same time, say from uprobes_gp_wakeup_write() and
ptrace_attach().

Oleg.

2012-08-08 14:57:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/08, Sebastian Andrzej Siewior wrote:
>
> On 08/08/2012 02:57 PM, Oleg Nesterov wrote:
>>> +static int insn_changes_flags(struct arch_uprobe *auprobe)
>>> +{
>>> + /* popf reads flags from stack */
>>> + if (auprobe->insn[0] == 0x9d)
>>> + return 1;
>>
>> Ah, somehow I didn't think about this before.
>>
>> ->insn[0] doesn't look right, we should skip the prefixes.
>
> Why? I tried 'lock popf' and I got invalid instruction. The same for
> 'rep popf'.

int main(void)
{
asm volatile ("pushf; rep; popf");

return 0;
}

objdump:

00000000040047c <main>:
40047c: 55 push %rbp
40047d: 48 89 e5 mov %rsp,%rbp
400480: 9c pushfq
400481: f3 9d repz popfq
400483: b8 00 00 00 00 mov $0x0,%eax
400488: c9 leaveq
400489: c3 retq



OK, probably nobody should do this (although the kernel should not
assume this imho), but

asm volatile ("pushfw; popfw");

doesn't look bad and the code is

000000000040047c <main>:
40047c: 55 push %rbp
40047d: 48 89 e5 mov %rsp,%rbp
400480: 66 9c pushfw
400482: 66 9d popfw
400484: b8 00 00 00 00 mov $0x0,%eax
400489: c9 leaveq
40048a: c3 retq



And in any case it would be better to re-use auprobe->fixups.

Oleg.

Subject: Re: [PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/08/2012 04:53 PM, Oleg Nesterov wrote:
>> Why? I tried 'lock popf' and I got invalid instruction. The same for
>> 'rep popf'.
>
> int main(void)
> {
> asm volatile ("pushf; rep; popf");
>
> return 0;
> }
>

Just tested and it works. Hmm.

> OK, probably nobody should do this (although the kernel should not
> assume this imho), but
>
> asm volatile ("pushfw; popfw");
>
> doesn't look bad and the code is
>
> 000000000040047c<main>:
> 40047c: 55 push %rbp
> 40047d: 48 89 e5 mov %rsp,%rbp
> 400480: 66 9c pushfw
> 400482: 66 9d popfw
> 400484: b8 00 00 00 00 mov $0x0,%eax
> 400489: c9 leaveq
> 40048a: c3 retq

Yes, that one works as well.

> And in any case it would be better to re-use auprobe->fixups.

Okay.

> Oleg.
>

Sebastian

Subject: Re: [PATCH 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On Wed, Aug 08, 2012 at 04:53:45PM +0200, Oleg Nesterov wrote:
> On 08/08, Sebastian Andrzej Siewior wrote:

...

> >> ->insn[0] doesn't look right, we should skip the prefixes.

insn_init()
insn_get_opcode()
if (OPCODE1() == 0x9d)

is always the right way of doing it.

...

> And in any case it would be better to re-use auprobe->fixups.

Agreed.

Ananth

Subject: [PATCH v2 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

The arch specific implementation behaves like user_enable_single_step()
except that it does not disable single stepping if it was already
enabled. This allows the debugger to single step over an uprobe.
The state of block stepping is not restored. It makes only sense
together with TF and if that was enabled then the debugger is notified.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
v1..v2: re-use auprobe->fixups for fixups

arch/x86/include/asm/uprobes.h | 2 ++
arch/x86/kernel/uprobes.c | 41 +++++++++++++++++++++++++++++++++++++++-
2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index f3971bb..cee5862 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -46,6 +46,8 @@ struct arch_uprobe_task {
#ifdef CONFIG_X86_64
unsigned long saved_scratch_register;
#endif
+#define UPROBE_CLEAR_TF (1 << 0)
+ unsigned int restore_flags;
};

extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 36fd420..5a43606 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -41,6 +41,9 @@
/* Adjust the return address of a call insn */
#define UPROBE_FIX_CALL 0x2

+/* Instruction will modify TF, don't change it */
+#define UPROBE_TF_CHANGES 0x4
+
#define UPROBE_FIX_RIP_AX 0x8000
#define UPROBE_FIX_RIP_CX 0x4000

@@ -233,12 +236,16 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn)
*/
static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
{
- bool fix_ip = true, fix_call = false; /* defaults */
+ bool fix_ip = true, fix_call = false, fix_tf = false; /* defaults */
int reg;

insn_get_opcode(insn); /* should be a nop */

switch (OPCODE1(insn)) {
+ case 0x9d:
+ /* popf */
+ fix_tf = true;
+ break;
case 0xc3: /* ret/lret */
case 0xcb:
case 0xc2:
@@ -277,6 +284,8 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
auprobe->fixups |= UPROBE_FIX_IP;
if (fix_call)
auprobe->fixups |= UPROBE_FIX_CALL;
+ if (fix_tf)
+ auprobe->fixups |= UPROBE_TF_CHANGES;
}

#ifdef CONFIG_X86_64
@@ -673,3 +682,33 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
}
return false;
}
+
+void arch_uprobe_enable_step(struct arch_uprobe *auprobe)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ autask->restore_flags = 0;
+ if (!test_tsk_thread_flag(current, TIF_SINGLESTEP) &&
+ !(auprobe->fixups & UPROBE_TF_CHANGES))
+ autask->restore_flags |= UPROBE_CLEAR_TF;
+ /*
+ * The state of TIF_BLOCKSTEP is not saved. With the TF flag set we
+ * would to examine the opcode and the flags to make it right. Without
+ * TF block stepping makes no sense. Instead we wakeup the debugger via
+ * SIGTRAP in case TF was set. This has the side effect that the
+ * debugger gets woken up even if the opcode normally wouldn't do so.
+ */
+ user_enable_single_step(current);
+}
+
+void arch_uprobe_disable_step(struct arch_uprobe *auprobe)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ if (autask->restore_flags & UPROBE_CLEAR_TF)
+ user_disable_single_step(current);
+ else
+ send_sig(SIGTRAP, current, 0);
+}
--
1.7.10.4

Subject: Re: [RFC 5/5] uprobes: add global breakpoints

* Oleg Nesterov | 2012-08-08 15:14:57 [+0200]:

>> What I miss right now is an interface to tell the user/gdb that there is a
>> program that hit a global breakpoint and is waiting for further instructions.
>> A "tail -f trace" does not work and may contain also a lot of other
>> informations. I've been thinking about a poll()able file which returns pids of
>> tasks which are put on hold. Other suggestions?
>
>Honestly, I am not sure this is that useful...

How would you notify gdb that there is a new task that hit a breakpoint?
Or learn yourself?

>OK, I'll try to read this patch later. But, at first glance,
Thank you.

>> @@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
>> __ptrace_link(task, current);
>>
>> /* SEIZE doesn't trap tracee on attach */
>> - if (!seize)
>> + if (!seize) {
>> send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
>> + uprobe_wakeup_task(task, 1);
>> + }
>
>Can't understand why uprobe_wakeup_task() depends on !PTRACE_SEIZE

because in the SEIZE case the task isn't halted, it continues to run. Or
do you want to use PTRACE_SEIZE for tasks which hit the global
breakpoint and you have no interrest in them and want them to continue
like nothing happend?

>> +
>> + set_current_state(TASK_TRACED);
>> + schedule();
>> +}
>
>Suppose that uprobe_wakeup_task() is called in the WINDOW above.
>
>OTOH, uprobe_wakeup_task() can race with itself if it is called
>twice at the same time, say from uprobes_gp_wakeup_write() and
>ptrace_attach().
Okay, I'm going to close the window.

>
>Oleg.

Sebastian

2012-08-10 05:23:45

by suzuki

[permalink] [raw]
Subject: Re: [PATCH 3/5] uprobes: remove check for uprobe variable in handle_swbp()

On 08/08/2012 03:05 PM, Sebastian Andrzej Siewior wrote:
> On 08/08/2012 11:10 AM, Suzuki K. Poulose wrote:
>>> --- a/kernel/events/uprobes.c
>>> +++ b/kernel/events/uprobes.c
>>> @@ -1528,17 +1528,15 @@ cleanup_ret:
>>> utask->active_uprobe = NULL;
>>> utask->state = UTASK_RUNNING;
>>> }
>>> - if (uprobe) {
>>> - if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
>>> + if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
>>>
>> Shouldn't we check uprobe != NULL before we check the uprobe->flags ?
>> i.e, shouldn't the above line be :
>>
>> if (uprobe && ! (uprobe->flags & UPROBE_SKIP_SSTEP)) ?
>
> The function starts like this:
>
> if (!uprobe) {
> if (is_swbp > 0) {
> send_sig(SIGTRAP, current, 0);
> } else {
> instruction_pointer_set(regs, bp_vaddr);
> }
> return;
> }
>
> Which makes uprobe != NULL by the time we get there, no?
>
My bad, was looking at an older version of the function. Also,
the removal of the if (uprobe), check triggered the above question.

Thanks
Suzuki

2012-08-13 11:35:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [RFC 5/5] uprobes: add global breakpoints

On Tue, 2012-08-07 at 18:12 +0200, Sebastian Andrzej Siewior wrote:
> By setting an uprobe tracepoint, one learns whenever a certain point
> within a program is reached / passed. This is recorded and the
> application continues.
> This patch adds the ability to hold the program once this point has been
> passed and the user may attach to the program via ptrace.
> First, setup a global breakpoint which is very similar to a uprobe trace
> point:
>
> |echo 'g /home/bigeasy/sample:0x0000044d %ip %ax' > uprobe_events
>
> This is exactly what uprobe does except that it starts with the letter
> 'g' instead of 'p'.
>
> Step two is to enable it:
> |echo 1 > events/uprobes/enable
>
> Lets assume you execute ./sample and the breakpoint is hit. In ps you will
> see:
> |1938 pts/1 t+ 0:00 ./sample

This seems particularly dangerous.. suppose you tag a frequent function
(say malloc) and the entire userspace freezes, including your shell.

> Now you can attach gdb via 'gdb -p 1938'. The gdb can now interact with
> the tracee and inspect its registers, its stack, single step, let it
> run…
> In case the process is not of great interest, the user may continue
> without gdb by writting its pid into the uprobe_gp_wakeup file
>
> |echo 1938 > uprobe_gp_wakeup
>
> What I miss right now is an interface to tell the user/gdb that there is a
> program that hit a global breakpoint and is waiting for further instructions.
> A "tail -f trace" does not work and may contain also a lot of other
> informations. I've been thinking about a poll()able file which returns pids of
> tasks which are put on hold. Other suggestions?

I'm not really happy with any of this. I would suggest limiting this
stuff much further, like say only have it affect ptraced
processes/tasks. That way you cannot accidentally freeze the entire
system into oblivion.

GDB (and assorted stuff) can already track an entire process hierarchy
with fork follow stuffs.

2012-08-13 13:20:31

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC 5/5] uprobes: add global breakpoints

On 08/09, Sebastian Andrzej Siewior wrote:
>
> * Oleg Nesterov | 2012-08-08 15:14:57 [+0200]:
>
> >> What I miss right now is an interface to tell the user/gdb that there is a
> >> program that hit a global breakpoint and is waiting for further instructions.
> >> A "tail -f trace" does not work and may contain also a lot of other
> >> informations. I've been thinking about a poll()able file which returns pids of
> >> tasks which are put on hold. Other suggestions?
> >
> >Honestly, I am not sure this is that useful...
>
> How would you notify gdb that there is a new task that hit a breakpoint?
> Or learn yourself?

But why do we need this?

OK, you do not need to convince me, I try to never argue with
new features.

However, I certainly dislike TASK_TRACED in uprobe_wait_traced().
And sleeping in ->handler() is not fair to other consumers.

And I do not think you should modify ptrace_attach() at all.
gdb/user can wakeup the task after PTRACE_ATTACH itself.

Oleg.

2012-08-13 13:28:44

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/09, Sebastian Andrzej Siewior wrote:
>
> v1..v2: re-use auprobe->fixups for fixups

Yes, but

> @@ -46,6 +46,8 @@ struct arch_uprobe_task {
> #ifdef CONFIG_X86_64
> unsigned long saved_scratch_register;
> #endif
> +#define UPROBE_CLEAR_TF (1 << 0)
> + unsigned int restore_flags;
> };

this patch still adds restore_flags into arch_uprobe_task.

> static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
> {
> - bool fix_ip = true, fix_call = false; /* defaults */
> + bool fix_ip = true, fix_call = false, fix_tf = false; /* defaults */
> int reg;
>
> insn_get_opcode(insn); /* should be a nop */
>
> switch (OPCODE1(insn)) {
> + case 0x9d:
> + /* popf */
> + fix_tf = true;
> + break;
> case 0xc3: /* ret/lret */
> case 0xcb:
> case 0xc2:
> @@ -277,6 +284,8 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
> auprobe->fixups |= UPROBE_FIX_IP;
> if (fix_call)
> auprobe->fixups |= UPROBE_FIX_CALL;
> + if (fix_tf)
> + auprobe->fixups |= UPROBE_TF_CHANGES;
> }

I won't insist, but do we really need fix_tf? "case 0x9d" could simply
add UPROBE_TF_CHANGES.

Oleg.

Subject: Re: [PATCH v2 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/13/2012 03:24 PM, Oleg Nesterov wrote:
> On 08/09, Sebastian Andrzej Siewior wrote:
>>
>> v1..v2: re-use auprobe->fixups for fixups
>
> Yes, but
>
>> @@ -46,6 +46,8 @@ struct arch_uprobe_task {
>> #ifdef CONFIG_X86_64
>> unsigned long saved_scratch_register;
>> #endif
>> +#define UPROBE_CLEAR_TF (1<< 0)
>> + unsigned int restore_flags;
>> };
>
> this patch still adds restore_flags into arch_uprobe_task.

Yes, but

>> static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
>> {
>> - bool fix_ip = true, fix_call = false; /* defaults */
>> + bool fix_ip = true, fix_call = false, fix_tf = false; /* defaults */
>> int reg;
>>
>> insn_get_opcode(insn); /* should be a nop */
>>
>> switch (OPCODE1(insn)) {
>> + case 0x9d:
>> + /* popf */
>> + fix_tf = true;
>> + break;
>> case 0xc3: /* ret/lret */
>> case 0xcb:
>> case 0xc2:
>> @@ -277,6 +284,8 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
>> auprobe->fixups |= UPROBE_FIX_IP;
>> if (fix_call)
>> auprobe->fixups |= UPROBE_FIX_CALL;
>> + if (fix_tf)
>> + auprobe->fixups |= UPROBE_TF_CHANGES;
>> }
>
> I won't insist, but do we really need fix_tf? "case 0x9d" could simply
> add UPROBE_TF_CHANGES.

if it is not 0x9d (in most cases) we need to decide on per-process
basis (not per-breakpoint) whether the task has gdb watching it or not.
So this code is evaluated once (by the time the breakpoint is
installed) but it may be executed two times: once with gdb and once
without it. On first execution the SIGTRAP will wakeup gdb, on the
second the SIGTRAP will terminate the program because there is no TRAP
handler installed.

> Oleg.

Sebastian

Subject: Re: [RFC 5/5] uprobes: add global breakpoints

On 08/13/2012 03:16 PM, Oleg Nesterov wrote:
> On 08/09, Sebastian Andrzej Siewior wrote:
>>
>> * Oleg Nesterov | 2012-08-08 15:14:57 [+0200]:
>>
>>>> What I miss right now is an interface to tell the user/gdb that there is a
>>>> program that hit a global breakpoint and is waiting for further instructions.
>>>> A "tail -f trace" does not work and may contain also a lot of other
>>>> informations. I've been thinking about a poll()able file which returns pids of
>>>> tasks which are put on hold. Other suggestions?
>>>
>>> Honestly, I am not sure this is that useful...
>>
>> How would you notify gdb that there is a new task that hit a breakpoint?
>> Or learn yourself?
>
> But why do we need this?

Shouldn't we learn somehow that a process hits a breakpoint? The task
was not yet monitored by gdb.

> OK, you do not need to convince me, I try to never argue with
> new features.

If there is a simple mechanism, I would switch to it. Right now I think
about using this "notification mechanism" to auto-exlude the listener
(and its parents) from the list of possible targets. So I don't freeze
the whole system while I have a breakpoint at malloc() in libc.

> However, I certainly dislike TASK_TRACED in uprobe_wait_traced().
> And sleeping in ->handler() is not fair to other consumers.

I added it as the last task in current consumer. I could move it out of
the consumer loop and freeze it after all consumer are handled but then
I lose the filter member (which is currently NULL, I know).

> And I do not think you should modify ptrace_attach() at all.
> gdb/user can wakeup the task after PTRACE_ATTACH itself.

I see. gdb / strace --pid $num" gdb does PTRACE_ATTACH and waits
afterwords in wait() indefinitely for the SIGSTOP which is blocked
since the process is already in TASK_TRACED. This is nice since the
signals are blocked and are delivered once the task is unfrozed.

> Oleg.

Sebastian

2012-08-14 14:31:21

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 2/5] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/14, Sebastian Andrzej Siewior wrote:
>
> On 08/13/2012 03:24 PM, Oleg Nesterov wrote:
>>
>> this patch still adds restore_flags into arch_uprobe_task.
>
> Yes, but

OOPS. Yes, we need a new member in ->utask now to record the state
of TIF_SINGLESTEP (X86_EFLAGS_TF actually).

I meant that, since the patch still uses TIF_SINGLESTEP,
arch_uprobe_disable_step() can check it but somehow I forgot that
since arch_uprobe_enable_step() still does user_enable_single_step()
TIF_SINGLESTEP is always set.

>>> static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
>>> {
>>> - bool fix_ip = true, fix_call = false; /* defaults */
>>> + bool fix_ip = true, fix_call = false, fix_tf = false; /* defaults */
>>> int reg;
>>>
>>> insn_get_opcode(insn); /* should be a nop */
>>>
>>> switch (OPCODE1(insn)) {
>>> + case 0x9d:
>>> + /* popf */
>>> + fix_tf = true;
>>> + break;
>>> case 0xc3: /* ret/lret */
>>> case 0xcb:
>>> case 0xc2:
>>> @@ -277,6 +284,8 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
>>> auprobe->fixups |= UPROBE_FIX_IP;
>>> if (fix_call)
>>> auprobe->fixups |= UPROBE_FIX_CALL;
>>> + if (fix_tf)
>>> + auprobe->fixups |= UPROBE_TF_CHANGES;
>>> }
>>
>> I won't insist, but do we really need fix_tf? "case 0x9d" could simply
>> add UPROBE_TF_CHANGES.
>
> if it is not 0x9d (in most cases) we need to decide on per-process
> basis (not per-breakpoint) whether the task has gdb watching it or not.

Yes, yes, I see, thanks.

But this doesn't explain why do we need to add the new variable, fix_tf.

case 0x9d:
auprobe->fixups |= UPROBE_TF_CHANGES;
break;

seems enough.

Oleg.

Subject: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

The arch specific implementation behaves like user_enable_single_step()
except that it does not disable single stepping if it was already
enabled. This allows the debugger to single step over an uprobe.
The state of block stepping is not restored. It makes only sense
together with TF and if that was enabled then the debugger is notified.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
v2..v3: don't introduce a bool var in prepare_fixups(), do it in within
case switch
v1..v2: re-use auprobe->fixups for fixups
arch/x86/include/asm/uprobes.h | 2 ++
arch/x86/kernel/uprobes.c | 37 +++++++++++++++++++++++++++++++++++++
2 files changed, 39 insertions(+)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index f3971bb..cee5862 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -46,6 +46,8 @@ struct arch_uprobe_task {
#ifdef CONFIG_X86_64
unsigned long saved_scratch_register;
#endif
+#define UPROBE_CLEAR_TF (1 << 0)
+ unsigned int restore_flags;
};

extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 36fd420..54182bd 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -41,6 +41,9 @@
/* Adjust the return address of a call insn */
#define UPROBE_FIX_CALL 0x2

+/* Instruction will modify TF, don't change it */
+#define UPROBE_TF_CHANGES 0x4
+
#define UPROBE_FIX_RIP_AX 0x8000
#define UPROBE_FIX_RIP_CX 0x4000

@@ -239,6 +242,10 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
insn_get_opcode(insn); /* should be a nop */

switch (OPCODE1(insn)) {
+ case 0x9d:
+ /* popf */
+ auprobe->fixups |= UPROBE_TF_CHANGES;
+ break;
case 0xc3: /* ret/lret */
case 0xcb:
case 0xc2:
@@ -673,3 +680,33 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
}
return false;
}
+
+void arch_uprobe_enable_step(struct arch_uprobe *auprobe)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ autask->restore_flags = 0;
+ if (!test_tsk_thread_flag(current, TIF_SINGLESTEP) &&
+ !(auprobe->fixups & UPROBE_TF_CHANGES))
+ autask->restore_flags |= UPROBE_CLEAR_TF;
+ /*
+ * The state of TIF_BLOCKSTEP is not saved. With the TF flag set we
+ * would to examine the opcode and the flags to make it right. Without
+ * TF block stepping makes no sense. Instead we wakeup the debugger via
+ * SIGTRAP in case TF was set. This has the side effect that the
+ * debugger gets woken up even if the opcode normally wouldn't do so.
+ */
+ user_enable_single_step(current);
+}
+
+void arch_uprobe_disable_step(struct arch_uprobe *auprobe)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ if (autask->restore_flags & UPROBE_CLEAR_TF)
+ user_disable_single_step(current);
+ else
+ send_sig(SIGTRAP, current, 0);
+}
--
1.7.10.4

Subject: Re: [RFC 5/5] uprobes: add global breakpoints

On 08/13/2012 01:34 PM, Peter Zijlstra wrote:
> I'm not really happy with any of this. I would suggest limiting this
> stuff much further, like say only have it affect ptraced
> processes/tasks. That way you cannot accidentally freeze the entire
> system into oblivion.

I'be been browsing over the cgroup Documentation and it seems to look
usefull. What I have in mind is the following:
/sys/fs/cgroup/gb

The root group is the default one where a breakpoint triggers. Below
you could have two groups:
- excluded
- active

The excluded group would never trigger a breakpoint.
Once a task in the root set triggers a breakpoint it will be moved into
to the active set. The eventfd notifcation API of cgroups could be used
to learn about this change.

The whole concept fails if the user does not move a single task into
the excluded group. To be overprotective here, I could try not do
anything until we have at least one pid in the "excluded" set.

So far I like this, it could be heat though.

Sebastian

Subject: [RFC 5/5 v2] uprobes: add global breakpoints

By setting an uprobe tracepoint, one learns whenever a certain point
within a program is reached / passed. This is recorded and the
application continues.
This patch adds the ability to hold the program once this point has been
passed and the user may attach to the program via ptrace.
First, setup a global breakpoint which is very similar to a uprobe trace
point:

|echo 'g /home/bigeasy/uprobetest/sample:0x0000044d %ip %ax %bx' > uprobe_events

This is exactly what uprobe does except that it starts with the letter
'g' instead of 'p'.

Step two is to enable it:
|echo 1 > events/uprobes/enable

Step three is to add pids of prcocess which are excluded from global
breakpoints even if the process would hit one. This should ensure that
the debugger remains active and the global breakpoint on system libc's
malloc() does not freeze the system. A pid can be excluded by
| echo e $pid > uprobe_gb_exclude

You need atleast one pid in the exlude list. An entry can be removed by
| echo a $pid > uprobe_gb_exclude

Lets assume you execute ./sample and the breakpoint is hit. In ps you will
see:
|1938 pts/1 t+ 0:00 ./sample

Now you can attach gdb via 'gdb -p 1938'. The gdb now can interact with
the tracee and inspect its registers or its stack, single step, let it
run…
In case the process is not of great interest, the user may continue
without gdb by writting its pid into the uprobe_gp_wakeup file

|echo 1938 > uprobe_gp_wakeup

Cc: [email protected]
Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
v1..v2:
- closed the window between set state / check state
- tried to address Peters review / concern:
- added "uprobe_gb_exclude". This file contains a list of pids which
are excluded from the "global breakpoint" behavior. The idea is to
whitelist programs which are essential and must not hit a
breakpoint. An empty list is invalid and _no_ global breakpoint will
hit.
- added "uprobe_gb_active". This file contains a list of pids which
hit the global breakpoint. The user can poll() here and wait for
the next victim. The size of the list limited. This is step two to
ensure a global system lock up does not occur. If a java program is
beeing debugged and the size of the list is too small then the list
could be allocated at runtime with more entries.

I've been thinking about alterntives to the approach above:
- cgroups
Would solve some problems. It would be very easy for the user to
group tasks in two groups: "root" group with "allowed" tasks and
sub group "excluded" for tasks which are excluded from the global
breakpoint(s). A third group would be required to put the "halted"
tasks. I would need one file to set the type of the group (root is
easy, "allowed" and "halted" have to be set). The notification
mechanism works on per file basis. So I would have to add file with
no content just to let the user that the task file has new entries.
All in all this looks like a abuse of cgroups just to follow forks
on the exclude list and maintain the list.
- auto exclude the read()er / poll()er of uprobe_gb_active
This sounds lovely but has two short commings:
- the pid of the process that opened it may change after fork()
since the initial owner may exit
- they may be two+ childs after fork() which read() / poll(). Both
should be excluded since I don't kwnow which one is which. I don't
know which one terminates because ->release() is called by last
process that closes the fd. That means in this scenario I would
add more entries to the while list than remove.
- having a list of tasks which currently poll() the file would
solve the problem with this endless growing list. However once
poll() is done (one process just hit the global breakpoint) I have
an empty list since no one can poll() now. That means that I
would exclude every further process which hits the global breakpoint
before someone poll()s again.

Oleg: The change in ptrace_attach() is still as it was. I tried to
address Peter concern here.
Now what options do I have here:
- not putting the task in TASK_TRACED but simply halt. This would work
without a change to ptrace_attach() but the task continues on any
signal. So a signal friendly task would continue and not notice a
thing.
- putting the TASK_TRACED and not touching ptrace_attach(). Each
ptrace() user would have to kick the task itself which means changes
to gdb / strace. If this is the prefered way then I guess it can be
done :)

include/linux/uprobes.h | 10 ++
kernel/events/uprobes.c | 13 +-
kernel/ptrace.c | 4 +-
kernel/trace/trace_uprobe.c | 414 ++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 435 insertions(+), 6 deletions(-)

diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 0fc6585..991a665 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -63,6 +63,9 @@ enum uprobe_task_state {
UTASK_SSTEP,
UTASK_SSTEP_ACK,
UTASK_SSTEP_TRAPPED,
+ UTASK_TRACE_SLEEP,
+ UTASK_TRACE_WOKEUP_NORMAL,
+ UTASK_TRACE_WOKEUP_TRACED,
};

/*
@@ -76,6 +79,7 @@ struct uprobe_task {

unsigned long xol_vaddr;
unsigned long vaddr;
+ int skip_handler;
};

/*
@@ -120,6 +124,8 @@ extern bool uprobe_deny_signal(void);
extern bool __weak arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs);
extern void uprobe_clear_state(struct mm_struct *mm);
extern void uprobe_reset_state(struct mm_struct *mm);
+extern int uprobe_wakeup_task(struct task_struct *t, int traced);
+
#else /* !CONFIG_UPROBES */
struct uprobes_state {
};
@@ -163,5 +169,9 @@ static inline void uprobe_clear_state(struct mm_struct *mm)
static inline void uprobe_reset_state(struct mm_struct *mm)
{
}
+static inline int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+ return 0;
+}
#endif /* !CONFIG_UPROBES */
#endif /* _LINUX_UPROBES_H */
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index c8e5204..c140e03 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1513,7 +1513,16 @@ static void handle_swbp(struct pt_regs *regs)
goto cleanup_ret;
}
utask->active_uprobe = uprobe;
- handler_chain(uprobe, regs);
+ if (utask->skip_handler)
+ utask->skip_handler = 0;
+ else
+ handler_chain(uprobe, regs);
+
+ if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
+ send_sig(SIGTRAP, current, 0);
+ utask->skip_handler = 1;
+ goto cleanup_ret;
+ }
if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs))
goto cleanup_ret;

@@ -1528,7 +1537,7 @@ cleanup_ret:
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
}
- if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
+ if (!(uprobe->flags & UPROBE_SKIP_SSTEP) || utask->skip_handler)

/*
* cannot singlestep; cannot skip instruction;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index a232bb5..5d6d3ed 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -286,8 +286,10 @@ static int ptrace_attach(struct task_struct *task, long request,
__ptrace_link(task, current);

/* SEIZE doesn't trap tracee on attach */
- if (!seize)
+ if (!seize) {
send_sig_info(SIGSTOP, SEND_SIG_FORCED, task);
+ uprobe_wakeup_task(task, 1);
+ }

spin_lock(&task->sighand->siglock);

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index f3c3811..693c50a 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -22,6 +22,8 @@
#include <linux/uaccess.h>
#include <linux/uprobes.h>
#include <linux/namei.h>
+#include <linux/poll.h>
+#include <linux/sort.h>

#include "trace_probe.h"

@@ -48,6 +50,7 @@ struct trace_uprobe {
unsigned int flags; /* For TP_FLAG_* */
ssize_t size; /* trace entry size */
unsigned int nr_args;
+ bool is_gb;
struct probe_arg args[];
};

@@ -177,19 +180,24 @@ static int create_trace_uprobe(int argc, char **argv)
struct path path;
unsigned long offset;
bool is_delete;
+ bool is_gb;
int i, ret;

inode = NULL;
ret = 0;
is_delete = false;
+ is_gb = false;
event = NULL;
group = NULL;

/* argc must be >= 1 */
if (argv[0][0] == '-')
is_delete = true;
+ else if (argv[0][0] == 'g')
+ is_gb = true;
else if (argv[0][0] != 'p') {
- pr_info("Probe definition must be started with 'p' or '-'.\n");
+ pr_info("Probe definition must be started with 'p', 'g' or "
+ "'-'.\n");
return -EINVAL;
}

@@ -277,7 +285,8 @@ static int create_trace_uprobe(int argc, char **argv)
if (ptr)
*ptr = '\0';

- snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset);
+ snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx",
+ is_gb ? 'g' : 'p', tail, offset);
event = buf;
kfree(tail);
}
@@ -298,6 +307,8 @@ static int create_trace_uprobe(int argc, char **argv)
goto error;
}

+ tu->is_gb = is_gb;
+
/* parse arguments */
ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
@@ -394,8 +405,12 @@ static int probes_seq_show(struct seq_file *m, void *v)
{
struct trace_uprobe *tu = v;
int i;
+ char type = 'p';
+
+ if (tu->is_gb)
+ type = 'g';

- seq_printf(m, "p:%s/%s", tu->call.class->system, tu->call.name);
+ seq_printf(m, "%c:%s/%s", type, tu->call.class->system, tu->call.name);
seq_printf(m, " %s:0x%p", tu->filename, (void *)tu->offset);

for (i = 0; i < tu->nr_args; i++)
@@ -435,6 +450,366 @@ static const struct file_operations uprobe_events_ops = {
.write = probes_write,
};

+static int pidt_cmp(const void *a, const void *b)
+{
+ const pid_t *ap = a;
+ const pid_t *bp = b;
+
+ if (*ap != *bp)
+ return *ap > *bp ? 1 : -1;
+ return 0;
+}
+
+static pid_t* gb_pid_find(pid_t *first, pid_t *last, pid_t pid)
+{
+ while (first <= last) {
+ pid_t *mid;
+
+ mid = ((last - first) >> 1) + first;
+
+ if (*mid < pid)
+ first = mid + 1;
+ else if (*mid > pid)
+ last = mid - 1;
+ else
+ return mid;
+ }
+ return NULL;
+}
+
+static loff_t gb_read_reset(struct file *file, loff_t offset, int origin)
+{
+ if (offset != 0)
+ return -EINVAL;
+ if (origin != SEEK_SET)
+ return -EINVAL;
+ file->f_pos = 0;
+ return file->f_pos;
+}
+
+static DEFINE_MUTEX(gb_pid_lock);
+static DEFINE_MUTEX(gb_state_lock);
+
+static ssize_t gb_read(char __user *buffer, size_t count, loff_t *ppos,
+ pid_t *pids, u8 num_pids)
+{
+ char buf[800];
+ int left;
+ size_t wrote = 0;
+ int i;
+ int ret;
+
+ if (*ppos)
+ return 0;
+
+ left = min(sizeof(buf), count);
+
+ mutex_lock(&gb_pid_lock);
+ for (i = 0; (i < num_pids) && left - wrote > 0; i++) {
+ wrote += snprintf(&buf[wrote], left - wrote, "%d\n",
+ pids[i]);
+ }
+ mutex_unlock(&gb_pid_lock);
+
+ wrote = min(wrote, count);
+ ret = copy_to_user(buffer, buf, wrote);
+ if (ret)
+ return -EFAULT;
+ *ppos = 1;
+ return wrote;
+}
+
+static DECLARE_WAIT_QUEUE_HEAD(gb_hit_ev_queue);
+static pid_t active_pids[64];
+static u8 num_active_pids;
+
+static int uprobe_gb_record(void)
+{
+ mutex_lock(&gb_pid_lock);
+ if (WARN_ON_ONCE(num_active_pids > ARRAY_SIZE(active_pids))) {
+ mutex_unlock(&gb_pid_lock);
+ return -ENOSPC;
+ }
+
+ active_pids[num_active_pids] = current->pid;
+ num_active_pids++;
+
+ sort(active_pids, num_active_pids, sizeof(pid_t),
+ pidt_cmp, NULL);
+ mutex_unlock(&gb_pid_lock);
+
+ wake_up_interruptible(&gb_hit_ev_queue);
+ return 0;
+}
+
+static pid_t* gb_active_find(pid_t pid)
+{
+ return gb_pid_find(&active_pids[0],
+ &active_pids[num_active_pids], pid);
+}
+
+static int uprobe_gb_remove_active(pid_t pid)
+{
+ pid_t *match;
+ u8 entry;
+
+ mutex_lock(&gb_pid_lock);
+ match = gb_active_find(pid);
+ if (!match) {
+ mutex_unlock(&gb_pid_lock);
+ return -EINVAL;
+ }
+
+ num_active_pids--;
+ entry = match - active_pids;
+ memcpy(&active_pids[entry], &active_pids[entry + 1],
+ (num_active_pids - entry) * sizeof(pid_t));
+ mutex_unlock(&gb_pid_lock);
+ return 0;
+}
+
+static unsigned int gb_poll(struct file *file, struct poll_table_struct *wait)
+{
+ poll_wait(file, &gb_hit_ev_queue, wait);
+ if (num_active_pids)
+ return POLLIN | POLLRDNORM;
+ return 0;
+}
+
+static ssize_t gb_active_read(struct file *file, char __user * buffer, size_t count,
+ loff_t *ppos)
+{
+ int ret;
+ ret = gb_read(buffer, count, ppos, active_pids, num_active_pids);
+ return ret;
+}
+
+int uprobe_wakeup_task(struct task_struct *t, int traced)
+{
+ struct uprobe_task *utask;
+ int ret = -EINVAL;
+
+ utask = t->utask;
+ if (!utask)
+ return ret;
+ mutex_lock(&gb_state_lock);
+ if (utask->state != UTASK_TRACE_SLEEP)
+ goto out;
+
+ uprobe_gb_remove_active(t->pid);
+
+ utask->state = traced ?
+ UTASK_TRACE_WOKEUP_TRACED : UTASK_TRACE_WOKEUP_NORMAL;
+ wake_up_state(t, __TASK_TRACED);
+ ret = 0;
+out:
+ mutex_unlock(&gb_state_lock);
+ return ret;
+}
+
+static int gp_continue_pid(const char *buf)
+{
+ struct task_struct *child;
+ unsigned long pid;
+ int ret;
+
+ if (isspace(*buf))
+ buf++;
+
+ ret = kstrtoul(buf, 0, &pid);
+ if (ret)
+ return ret;
+
+ rcu_read_lock();
+ child = find_task_by_vpid(pid);
+ if (child)
+ get_task_struct(child);
+ rcu_read_unlock();
+
+ if (!child)
+ return -EINVAL;
+
+ ret = uprobe_wakeup_task(child, 0);
+ put_task_struct(child);
+ return ret;
+}
+
+static ssize_t gp_active_write(struct file *filp,
+ const char __user *ubuf, size_t count, loff_t *ppos)
+{
+ char buf[32];
+ int ret;
+
+ if (count >= sizeof(buf))
+ return -ERANGE;
+ ret = copy_from_user(buf, ubuf, count);
+ if (ret)
+ return -EFAULT;
+ buf[count] = '\0';
+
+ switch (buf[0]) {
+ case 'c':
+ ret = gp_continue_pid(&buf[1]);
+ break;
+
+ default:
+ ret = -EINVAL;
+ };
+
+ if (ret < 0)
+ return ret;
+ return count;
+}
+
+static const struct file_operations uprobe_gp_active_ops = {
+ .owner = THIS_MODULE,
+ .open = simple_open,
+ .llseek = gb_read_reset,
+ .read = gb_active_read,
+ .write = gp_active_write,
+ .poll = gb_poll,
+};
+
+static pid_t exlcuded_pids[64];
+static u8 num_exlcuded_pids;
+
+static pid_t* gb_exclude_find(pid_t pid)
+{
+ return gb_pid_find(&exlcuded_pids[0],
+ &exlcuded_pids[num_exlcuded_pids], pid);
+}
+
+static int uprobe_gb_allowed(void)
+{
+ pid_t *match;
+
+ if (!num_exlcuded_pids) {
+ pr_err_once("Need atleast one PID which is excluded from the "
+ "global breakpoint. This should be the "
+ "debugging tool.\n");
+ return -EINVAL;
+ }
+ mutex_lock(&gb_pid_lock);
+ match = gb_exclude_find(current->pid);
+ mutex_unlock(&gb_pid_lock);
+ if (match)
+ return -EPERM;
+ return 0;
+}
+
+static int gp_exclude_pid(const char *buf)
+{
+ unsigned long pid;
+ pid_t *match;
+ int ret;
+
+ if (isspace(*buf))
+ buf++;
+ ret = kstrtoul(buf, 0, &pid);
+ if (ret)
+ return ret;
+
+ mutex_lock(&gb_pid_lock);
+ if (num_exlcuded_pids >= ARRAY_SIZE(exlcuded_pids)) {
+ ret = -E2BIG;
+ goto out;
+ }
+
+ match = gb_exclude_find(pid);
+ if (match) {
+ ret = 0;
+ goto out;
+ }
+
+ exlcuded_pids[num_exlcuded_pids] = pid;
+ num_exlcuded_pids++;
+
+ sort(exlcuded_pids, num_exlcuded_pids, sizeof(pid_t),
+ pidt_cmp, NULL);
+out:
+ mutex_unlock(&gb_pid_lock);
+ return ret;
+}
+
+static int gp_allow_pid(const char *buf)
+{
+ unsigned long pid;
+ pid_t *match;
+ u8 entry;
+ int ret;
+
+ if (isspace(*buf))
+ buf++;
+
+ ret = kstrtoul(buf, 0, &pid);
+ if (ret)
+ return ret;
+
+ mutex_lock(&gb_pid_lock);
+ match = gb_exclude_find(pid);
+ if (!match) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ num_exlcuded_pids--;
+ entry = match - exlcuded_pids;
+ memcpy(&exlcuded_pids[entry], &exlcuded_pids[entry + 1],
+ (num_exlcuded_pids - entry) * sizeof(pid_t));
+ ret = 0;
+out:
+ mutex_unlock(&gb_pid_lock);
+ return ret;
+}
+
+static ssize_t gp_exclude_write(struct file *filp,
+ const char __user *ubuf, size_t count, loff_t *ppos)
+{
+ char buf[32];
+ int ret;
+
+ if (count >= sizeof(buf))
+ return -ERANGE;
+ ret = copy_from_user(buf, ubuf, count);
+ if (ret)
+ return -EFAULT;
+ buf[count] = '\0';
+
+ switch (buf[0]) {
+ case 'e':
+ ret = gp_exclude_pid(&buf[1]);
+ break;
+
+ case 'a':
+ ret = gp_allow_pid(&buf[1]);
+ break;
+
+ default:
+ ret = -EINVAL;
+ };
+
+ if (ret < 0)
+ return ret;
+ return count;
+}
+
+static ssize_t gb_exclude_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ int ret;
+
+ ret = gb_read(buffer, count, ppos, exlcuded_pids, num_exlcuded_pids);
+ return ret;
+}
+
+static const struct file_operations uprobe_gp_exclude_ops = {
+ .owner = THIS_MODULE,
+ .open = simple_open,
+ .llseek = gb_read_reset,
+ .read = gb_exclude_read,
+ .write = gp_exclude_write,
+};
+
/* Probes profiling interfaces */
static int probes_profile_seq_show(struct seq_file *m, void *v)
{
@@ -704,6 +1079,32 @@ int trace_uprobe_register(struct ftrace_event_call *event, enum trace_reg type,
return 0;
}

+static void uprobe_wait_traced(struct trace_uprobe *tu)
+{
+ struct uprobe_task *utask;
+ int ret;
+
+ ret = uprobe_gb_allowed();
+ if (ret)
+ return;
+
+ mutex_lock(&gb_state_lock);
+ utask = current->utask;
+ utask->state = UTASK_TRACE_SLEEP;
+
+ set_current_state(TASK_TRACED);
+ ret = uprobe_gb_record();
+ if (ret < 0) {
+ utask->state = UTASK_TRACE_WOKEUP_NORMAL;
+ set_current_state(TASK_RUNNING);
+ mutex_unlock(&gb_state_lock);
+ return;
+ }
+ mutex_unlock(&gb_state_lock);
+
+ schedule();
+}
+
static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
{
struct uprobe_trace_consumer *utc;
@@ -721,6 +1122,9 @@ static int uprobe_dispatcher(struct uprobe_consumer *con, struct pt_regs *regs)
if (tu->flags & TP_FLAG_PROFILE)
uprobe_perf_func(tu, regs);
#endif
+ if (tu->is_gb)
+ uprobe_wait_traced(tu);
+
return 0;
}

@@ -779,6 +1183,10 @@ static __init int init_uprobe_trace(void)

trace_create_file("uprobe_events", 0644, d_tracer,
NULL, &uprobe_events_ops);
+ trace_create_file("uprobe_gb_exclude", 0644, d_tracer,
+ NULL, &uprobe_gp_exclude_ops);
+ trace_create_file("uprobe_gb_active", 0644, d_tracer,
+ NULL, &uprobe_gp_active_ops);
/* Profile interface */
trace_create_file("uprobe_profile", 0444, d_tracer,
NULL, &uprobe_profile_ops);
--
1.7.10.4

2012-08-22 13:53:10

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC 5/5 v2] uprobes: add global breakpoints

On 08/21, Sebastian Andrzej Siewior wrote:
>
> This patch adds the ability to hold the program once this point has been
> passed and the user may attach to the program via ptrace.

Sorry Sebastian, I didn't even try to read the patch ;) Fortunately I am
not maintainer, I can only reapeat that you do not need to convince me.

> Oleg: The change in ptrace_attach() is still as it was. I tried to
> address Peter concern here.
> Now what options do I have here:
> - not putting the task in TASK_TRACED but simply halt. This would work
> without a change to ptrace_attach() but the task continues on any
> signal. So a signal friendly task would continue and not notice a
> thing.

TASK_KILLABLE

> - putting the TASK_TRACED

This is simply wrong, in many ways.

For example, what if the probed task is already ptraced? Or debugger
attaches via PTRACE_SEIZE? How can debugger know it is stopped?
uprobe_wait_traced() goes to sleep in TASK_TRACED without notification.
And it does not set ->exit_code, this means do_wait() won't work.
And note ptrace_stop()->recalc_sigpending_tsk().

> @@ -76,6 +79,7 @@ struct uprobe_task {
>
> unsigned long xol_vaddr;
> unsigned long vaddr;
> + int skip_handler;

I am trying to guess what this skip_handler does...

> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -1513,7 +1513,16 @@ static void handle_swbp(struct pt_regs *regs)
> goto cleanup_ret;
> }
> utask->active_uprobe = uprobe;
> - handler_chain(uprobe, regs);
> + if (utask->skip_handler)
> + utask->skip_handler = 0;
> + else
> + handler_chain(uprobe, regs);
> +
> + if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
> + send_sig(SIGTRAP, current, 0);
> + utask->skip_handler = 1;
> + goto cleanup_ret;
> + }
> if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs))
> goto cleanup_ret;
>
> @@ -1528,7 +1537,7 @@ cleanup_ret:
> utask->active_uprobe = NULL;
> utask->state = UTASK_RUNNING;
> }
> - if (!(uprobe->flags & UPROBE_SKIP_SSTEP))
> + if (!(uprobe->flags & UPROBE_SKIP_SSTEP) || utask->skip_handler)

Am I understand correctly?

If it was woken by PTRACE_ATTACH we set utask->skip_handler = 1 and
re-execute the instruction (yes, SIGTRAP, but this doesn't matter).
When the task hits this bp again we skip handler_chain() because it
was already reported.

Yes? If yes, I don't think this can work. Suppose that the task
dequeues a signal before it returns to the usermode to re-execute
and enters the signal handler which can hit another uprobe.

And this can race with uprobe_register() afaics.

Oleg.

2012-08-22 14:07:50

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/20, Sebastian Andrzej Siewior wrote:
>
> The arch specific implementation behaves like user_enable_single_step()
> except that it does not disable single stepping if it was already
> enabled.

Sebastian, we have other uprobes patches in flight, I'll returns to
this after we push them.

As I said, personally I mostly agree with this change... but may be
I'll try to convince you to make v4 ;)

Oleg.

Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/22/2012 04:03 PM, Oleg Nesterov wrote:
> On 08/20, Sebastian Andrzej Siewior wrote:
>>
>> The arch specific implementation behaves like user_enable_single_step()
>> except that it does not disable single stepping if it was already
>> enabled.
>
> Sebastian, we have other uprobes patches in flight, I'll returns to
> this after we push them.
>
> As I said, personally I mostly agree with this change... but may be
> I'll try to convince you to make v4 ;)

Ehm. Is there anything I missed to do? Or are you speculating on
changes which will clash with these here?

>
> Oleg.
>

Sebastian

2012-08-22 16:04:02

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/22, Sebastian Andrzej Siewior wrote:
>
> On 08/22/2012 04:03 PM, Oleg Nesterov wrote:
>>
>> Sebastian, we have other uprobes patches in flight, I'll returns to
>> this after we push them.
>>
>> As I said, personally I mostly agree with this change... but may be
>> I'll try to convince you to make v4 ;)
>
> Ehm. Is there anything I missed to do? Or are you speculating on
> changes which will clash with these here?

If we have task_set_blockstep(), then perhaps it mmakes sense to
avoid user_enable_singlestep()/TIF_SINGLESTEP from the start.
We will see.

Oleg.

Subject: Re: [RFC 5/5 v2] uprobes: add global breakpoints

On 08/22/2012 03:48 PM, Oleg Nesterov wrote:
> On 08/21, Sebastian Andrzej Siewior wrote:
>>
>> This patch adds the ability to hold the program once this point has been
>> passed and the user may attach to the program via ptrace.
>
> Sorry Sebastian, I didn't even try to read the patch ;) Fortunately I am
> not maintainer, I can only reapeat that you do not need to convince me.

At least for the ptrace part I would prefer to have your blessing
instead something that seems to work but is wrong.

>> Oleg: The change in ptrace_attach() is still as it was. I tried to
>> address Peter concern here.
>> Now what options do I have here:
>> - not putting the task in TASK_TRACED but simply halt. This would work
>> without a change to ptrace_attach() but the task continues on any
>> signal. So a signal friendly task would continue and not notice a
>> thing.
>
> TASK_KILLABLE

That would help but would require a change in ptrace_attach() or
something in gdb/strace/?

One thing I just noticed: If I don't register a handler for SIGUSR1 and
send one to the application while it is in TASK_KILLABLE then the
signal gets delivered. If I register a signal handler for it than it
gets blocked and delivered once I resume the task.
Shouldn't it get blocked even if I don't register a handler for it?

>> - putting the TASK_TRACED
>
> This is simply wrong, in many ways.
>
> For example, what if the probed task is already ptraced? Or debugger
> attaches via PTRACE_SEIZE? How can debugger know it is stopped?
> uprobe_wait_traced() goes to sleep in TASK_TRACED without notification.
> And it does not set ->exit_code, this means do_wait() won't work.
> And note ptrace_stop()->recalc_sigpending_tsk().

Okay, okay. It looks like it is better to stick with TASK_KILLABLE
instead of fixing the issues you pointed out.

>> --- a/kernel/events/uprobes.c
>> +++ b/kernel/events/uprobes.c
>> @@ -1513,7 +1513,16 @@ static void handle_swbp(struct pt_regs *regs)
>> goto cleanup_ret;
>> }
>> utask->active_uprobe = uprobe;
>> - handler_chain(uprobe, regs);
>> + if (utask->skip_handler)
>> + utask->skip_handler = 0;
>> + else
>> + handler_chain(uprobe, regs);
>> +
>> + if (utask->state == UTASK_TRACE_WOKEUP_TRACED) {
>> + send_sig(SIGTRAP, current, 0);
>> + utask->skip_handler = 1;
>> + goto cleanup_ret;
>> + }
>> if (uprobe->flags& UPROBE_SKIP_SSTEP&& can_skip_sstep(uprobe, regs))
>> goto cleanup_ret;
>>
>> @@ -1528,7 +1537,7 @@ cleanup_ret:
>> utask->active_uprobe = NULL;
>> utask->state = UTASK_RUNNING;
>> }
>> - if (!(uprobe->flags& UPROBE_SKIP_SSTEP))
>> + if (!(uprobe->flags& UPROBE_SKIP_SSTEP) || utask->skip_handler)
>
> Am I understand correctly?
>
> If it was woken by PTRACE_ATTACH we set utask->skip_handler = 1 and
> re-execute the instruction (yes, SIGTRAP, but this doesn't matter).
> When the task hits this bp again we skip handler_chain() because it
> was already reported.
>
> Yes? If yes, I don't think this can work. Suppose that the task
> dequeues a signal before it returns to the usermode to re-execute
> and enters the signal handler which can hit another uprobe.

ach, those signals make everything complicated. I though signals are
blocked until the single step is done but my test just showed my
something different. Okay, what now? A simple nested struct uprobe_task
and struct uprobe? Blocking signals isn't probably a good idea.

> And this can race with uprobe_register() afaics.

> Oleg.

Sebastian

2012-08-29 15:48:07

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [RFC 5/5 v2] uprobes: add global breakpoints

On 08/27, Sebastian Andrzej Siewior wrote:
>
> On 08/22/2012 03:48 PM, Oleg Nesterov wrote:
>> On 08/21, Sebastian Andrzej Siewior wrote:
>>>
>>> - not putting the task in TASK_TRACED but simply halt. This would work
>>> without a change to ptrace_attach() but the task continues on any
>>> signal. So a signal friendly task would continue and not notice a
>>> thing.
>>
>> TASK_KILLABLE
>
> That would help but would require a change in ptrace_attach() or
> something in gdb/strace/…

Well, I still think you should not touch ptrace_attach() at all.

> One thing I just noticed: If I don't register a handler for SIGUSR1 and
> send one to the application while it is in TASK_KILLABLE then the
> signal gets delivered.

Not really delivered... OK, it can be delivered (dequeued) before
the task sees SIGKILL, but this can be changed.

In short: in this case the task is correctly SIGKILL'ed. See sig_fatal()
in complete_signal().

> If I register a signal handler for it than it
> gets blocked and delivered once I resume the task.

Sure, if you have a handler, the signal is not fatal.

> Shouldn't it get blocked even if I don't register a handler for it?

No.

>> Am I understand correctly?
>>
>> If it was woken by PTRACE_ATTACH we set utask->skip_handler = 1 and
>> re-execute the instruction (yes, SIGTRAP, but this doesn't matter).
>> When the task hits this bp again we skip handler_chain() because it
>> was already reported.
>>
>> Yes? If yes, I don't think this can work. Suppose that the task
>> dequeues a signal before it returns to the usermode to re-execute
>> and enters the signal handler which can hit another uprobe.
>
> ach, those signals make everything complicated. I though signals are
> blocked until the single step is done

Yes, see uprobe_deny_signal().

> but my test just showed my
> something different.

I guess you missed the UTASK_SSTEP_TRAPPED logic.

But this doesn't matter. Surely we must not "block" signals _after_
the single step is done, and this is the problem.

> Okay, what now?

IMHO: don't do this ;)

> Blocking signals isn't probably a good idea.

This is bad and wrong idea, I think.

And, once again. Whatever you do, you can race with uprobe_register().
I mean, you must never expect that the task will hit the same uprobe
again, even if you are going to re-execute the same insn.

Oleg.

2012-08-29 17:35:49

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/22, Oleg Nesterov wrote:
>
> > Ehm. Is there anything I missed to do? Or are you speculating on
> > changes which will clash with these here?
>
> If we have task_set_blockstep(), then perhaps it mmakes sense to
> avoid user_enable_singlestep()/TIF_SINGLESTEP from the start.
> We will see.

But it is not clear when we will have task_set_blockstep.

So I am starting to think it would be better to apply your 1-2 and
change the code later. Still not correct, but better than nothing.



But. The more I think about the current code, the more I dislike it.
And I am starting to think we do not need yet another "weak arch*"
hook for single-stepping. Yes, it was me who suggested it, but this
is because I didn't want to complicate the merging of powerpc port.

However.

Ananth, Sebastian, what if we start with the patch below? Then
we can change arch/x86/kernel/uprobes.c to use the static
uprobe_*_step() helpers from the 2nd patch.

If we agree this code should be per-arch, then why do need other
hooks? This is just ugly, we already have arch_pre/post_xol.

The only problem is the pending powerpc patches, the change below
obviously breaks them. Were they already applied? If not, then
probably Ananth can do v6 on top of the patch below ;) The necessary
fixup is trivial.

What do you think?

Oleg.


--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -1440,10 +1440,8 @@ static void handle_swbp(struct pt_regs *
goto cleanup_ret;

utask->state = UTASK_SSTEP;
- if (!pre_ssout(uprobe, regs, bp_vaddr)) {
- user_enable_single_step(current);
+ if (!pre_ssout(uprobe, regs, bp_vaddr))
return;
- }

cleanup_ret:
if (utask) {
@@ -1480,7 +1478,6 @@ static void handle_singlestep(struct upr
put_uprobe(uprobe);
utask->active_uprobe = NULL;
utask->state = UTASK_RUNNING;
- user_disable_single_step(current);
xol_free_insn_slot(current);

spin_lock_irq(&current->sighand->siglock);
--- x/arch/x86/kernel/uprobes.c
+++ x/arch/x86/kernel/uprobes.c
@@ -471,6 +471,7 @@ int arch_uprobe_pre_xol(struct arch_upro
regs->ip = current->utask->xol_vaddr;
pre_xol_rip_insn(auprobe, regs, autask);

+ user_enable_single_step(current);
return 0;
}

@@ -596,6 +597,7 @@ int arch_uprobe_post_xol(struct arch_upr
if (auprobe->fixups & UPROBE_FIX_CALL)
result = adjust_ret_addr(regs->sp, correction);

+ user_disable_single_step(current);
return result;
}

@@ -640,6 +642,7 @@ void arch_uprobe_abort_xol(struct arch_u
current->thread.trap_nr = utask->autask.saved_trap_nr;
handle_riprel_post_xol(auprobe, regs, NULL);
instruction_pointer_set(regs, utask->vaddr);
+ user_disable_single_step(current);
}

/*

Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On Wed, Aug 29, 2012 at 07:37:48PM +0200, Oleg Nesterov wrote:
> On 08/22, Oleg Nesterov wrote:
> >
> > > Ehm. Is there anything I missed to do? Or are you speculating on
> > > changes which will clash with these here?
> >
> > If we have task_set_blockstep(), then perhaps it mmakes sense to
> > avoid user_enable_singlestep()/TIF_SINGLESTEP from the start.
> > We will see.
>
> But it is not clear when we will have task_set_blockstep.
>
> So I am starting to think it would be better to apply your 1-2 and
> change the code later. Still not correct, but better than nothing.
>
>
>
> But. The more I think about the current code, the more I dislike it.
> And I am starting to think we do not need yet another "weak arch*"
> hook for single-stepping. Yes, it was me who suggested it, but this
> is because I didn't want to complicate the merging of powerpc port.
>
> However.
>
> Ananth, Sebastian, what if we start with the patch below? Then
> we can change arch/x86/kernel/uprobes.c to use the static
> uprobe_*_step() helpers from the 2nd patch.

In principle I am fine with the change.

> If we agree this code should be per-arch, then why do need other
> hooks? This is just ugly, we already have arch_pre/post_xol.
>
> The only problem is the pending powerpc patches, the change below
> obviously breaks them. Were they already applied? If not, then
> probably Ananth can do v6 on top of the patch below ;) The necessary
> fixup is trivial.

They are under review. I can do the change, but since this change is
trivial enough, unless there is a pressing need to move the
user_*_single_step() right away, can't we hold off till 3.6? This can be
a simple enough cleanup then.

If not, I can spin a v6....

Ananth

Subject: [PATCH] x86/uprobes: don't disable single stepping if it was already on

This change checks if single stepping was already activated and if so,
it will leave it enabled. This allows the debugger to single step over
an uprobe. The state of block stepping is not restored. It makes only
sense together with opcode & flags inspection (is this a jump that
will be taken).

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
---
arch/x86/include/asm/uprobes.h | 2 ++
arch/x86/kernel/uprobes.c | 33 +++++++++++++++++++++++++++++++--
2 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index f3971bb..cf73dbf 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -46,6 +46,8 @@ struct arch_uprobe_task {
#ifdef CONFIG_X86_64
unsigned long saved_scratch_register;
#endif
+#define UPROBE_CLEAR_TF (1 << 0)
+ unsigned int restore_flags;
};

extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index f18ea64..8aac090 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -41,6 +41,9 @@
/* Adjust the return address of a call insn */
#define UPROBE_FIX_CALL 0x2

+/* Instruction will modify TF, don't change it */
+#define UPROBE_TF_CHANGES 0x4
+
#define UPROBE_FIX_RIP_AX 0x8000
#define UPROBE_FIX_RIP_CX 0x4000

@@ -239,6 +242,10 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn)
insn_get_opcode(insn); /* should be a nop */

switch (OPCODE1(insn)) {
+ case 0x9d:
+ /* popf */
+ auprobe->fixups |= UPROBE_TF_CHANGES;
+ break;
case 0xc3: /* ret/lret */
case 0xcb:
case 0xc2:
@@ -471,6 +478,17 @@ int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
regs->ip = current->utask->xol_vaddr;
pre_xol_rip_insn(auprobe, regs, autask);

+ autask->restore_flags = 0;
+ if (!test_tsk_thread_flag(current, TIF_SINGLESTEP) &&
+ !(auprobe->fixups & UPROBE_TF_CHANGES))
+ autask->restore_flags |= UPROBE_CLEAR_TF;
+ /*
+ * The state of TIF_BLOCKSTEP is not saved. With the TF flag set we
+ * would to examine the opcode and the flags to make it right. Without
+ * TF block stepping makes no sense. Instead we wakeup the debugger via
+ * SIGTRAP in case TF was set. This has the side effect that the
+ * debugger gets woken up even if the opcode normally wouldn't do so.
+ */
user_enable_single_step(current);
return 0;
}
@@ -555,6 +573,17 @@ bool arch_uprobe_xol_was_trapped(struct task_struct *t)
return false;
}

+static void disable_single_step(void)
+{
+ struct uprobe_task *utask = current->utask;
+ struct arch_uprobe_task *autask = &utask->autask;
+
+ if (autask->restore_flags & UPROBE_CLEAR_TF)
+ user_disable_single_step(current);
+ else
+ send_sig(SIGTRAP, current, 0);
+}
+
/*
* Called after single-stepping. To avoid the SMP problems that can
* occur when we temporarily put back the original opcode to
@@ -597,7 +626,7 @@ int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
if (auprobe->fixups & UPROBE_FIX_CALL)
result = adjust_ret_addr(regs->sp, correction);

- user_disable_single_step(current);
+ disable_single_step();
return result;
}

@@ -642,7 +671,7 @@ void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
current->thread.trap_nr = utask->autask.saved_trap_nr;
handle_riprel_post_xol(auprobe, regs, NULL);
instruction_pointer_set(regs, utask->vaddr);
- user_disable_single_step(current);
+ disable_single_step();
}

/*
--
1.7.10.4

2012-08-30 14:35:25

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/30, Ananth N Mavinakayanahalli wrote:
>
> On Wed, Aug 29, 2012 at 07:37:48PM +0200, Oleg Nesterov wrote:
> >
> > Ananth, Sebastian, what if we start with the patch below? Then
> > we can change arch/x86/kernel/uprobes.c to use the static
> > uprobe_*_step() helpers from the 2nd patch.
>
> In principle I am fine with the change.

OK, good.

> > If we agree this code should be per-arch, then why do need other
> > hooks? This is just ugly, we already have arch_pre/post_xol.
> >
> > The only problem is the pending powerpc patches, the change below
> > obviously breaks them. Were they already applied? If not, then
> > probably Ananth can do v6 on top of the patch below ;) The necessary
> > fixup is trivial.
>
> They are under review.

OK, I understand that v6 can confuse the maintainer and complicate the
merging process, please forget about v6.

And yes, this is really minor problem, still it would be nice to avoid
the unnecessary hooks/complications...

So. We can add "weak arch_uprobe" hooks, fix x86, and after powerpc is
merged change both powerpc and x86 in one patch (remove "weak" hooks
and move enable/disable into arch_pre/post_xol).

Or. We can apply the patch I sent right now, you can fix powerpc later,
when it is merged. This all is for 3.7 anyway, and fixup is trivial.

I agree either way. Which way do you prefer?




Sebastian, thanks for v4 you sent. I am still not sure what should we
do, but in any case I'll make the series which includes either 1-2 you
sent previously or this new patch.

Oleg.

Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On Thu, Aug 30, 2012 at 04:37:24PM +0200, Oleg Nesterov wrote:
> On 08/30, Ananth N Mavinakayanahalli wrote:
> >
> > On Wed, Aug 29, 2012 at 07:37:48PM +0200, Oleg Nesterov wrote:
> > >
> > > Ananth, Sebastian, what if we start with the patch below? Then
> > > we can change arch/x86/kernel/uprobes.c to use the static
> > > uprobe_*_step() helpers from the 2nd patch.
> >
> > In principle I am fine with the change.
>
> OK, good.
>
> > > If we agree this code should be per-arch, then why do need other
> > > hooks? This is just ugly, we already have arch_pre/post_xol.
> > >
> > > The only problem is the pending powerpc patches, the change below
> > > obviously breaks them. Were they already applied? If not, then
> > > probably Ananth can do v6 on top of the patch below ;) The necessary
> > > fixup is trivial.
> >
> > They are under review.
>
> OK, I understand that v6 can confuse the maintainer and complicate the
> merging process, please forget about v6.
>
> And yes, this is really minor problem, still it would be nice to avoid
> the unnecessary hooks/complications...
>
> So. We can add "weak arch_uprobe" hooks, fix x86, and after powerpc is
> merged change both powerpc and x86 in one patch (remove "weak" hooks
> and move enable/disable into arch_pre/post_xol).
>
> Or. We can apply the patch I sent right now, you can fix powerpc later,
> when it is merged. This all is for 3.7 anyway, and fixup is trivial.
>
> I agree either way. Which way do you prefer?

I prefer fixing both together later, just so nothing breaks while intial
testing, etc.

Ananth

2012-08-30 15:09:40

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v3] x86/uprobes: implement x86 specific arch_uprobe_*_step

On 08/30, Ananth N Mavinakayanahalli wrote:
>
> On Thu, Aug 30, 2012 at 04:37:24PM +0200, Oleg Nesterov wrote:
> >
> > So. We can add "weak arch_uprobe" hooks, fix x86, and after powerpc is
> > merged change both powerpc and x86 in one patch (remove "weak" hooks
> > and move enable/disable into arch_pre/post_xol).
> >
> > Or. We can apply the patch I sent right now, you can fix powerpc later,
> > when it is merged. This all is for 3.7 anyway, and fixup is trivial.
> >
> > I agree either way. Which way do you prefer?
>
> I prefer fixing both together later, just so nothing breaks while intial
> testing, etc.

OK.

Oleg.

Subject: Re: [RFC 5/5 v2] uprobes: add global breakpoints

On 08/29/2012 05:49 PM, Oleg Nesterov wrote:
>> That would help but would require a change in ptrace_attach() or
>> something in gdb/strace/…
>
> Well, I still think you should not touch ptrace_attach() at all.

Okay.

>> One thing I just noticed: If I don't register a handler for SIGUSR1 and
>> send one to the application while it is in TASK_KILLABLE then the
>> signal gets delivered.
>
> Not really delivered... OK, it can be delivered (dequeued) before
> the task sees SIGKILL, but this can be changed.
>
> In short: in this case the task is correctly SIGKILL'ed. See sig_fatal()
> in complete_signal().
>
>> If I register a signal handler for it than it
>> gets blocked and delivered once I resume the task.
>
> Sure, if you have a handler, the signal is not fatal.
>
>> Shouldn't it get blocked even if I don't register a handler for it?
>
> No.

Now, that I read again it looks like a brain fart on my side.

>> ach, those signals make everything complicated. I though signals are
>> blocked until the single step is done
>
> Yes, see uprobe_deny_signal().
>
>> but my test just showed my
>> something different.
>
> I guess you missed the UTASK_SSTEP_TRAPPED logic.
>
> But this doesn't matter. Surely we must not "block" signals _after_
> the single step is done, and this is the problem.
>
>> Okay, what now?
>
> IMHO: don't do this ;)
>
>> Blocking signals isn't probably a good idea.
>
> This is bad and wrong idea, I think.
>
> And, once again. Whatever you do, you can race with uprobe_register().
> I mean, you must never expect that the task will hit the same uprobe
> again, even if you are going to re-execute the same insn.

After witting why I think you are wrong I understood what you meant :)
So let me try to get this right…

>
> Oleg.

Sebastian