Hello All,

this is the core implementation of uretprobes. It enables a breakpoint
on a function's return for tools such as perf.

Introduce an additional handler for the uprobe consumer that makes it
possible to distinguish a uprobe from a uretprobe. Note that behind every
uretprobe there is a regular uprobe with a return probe handler
(rp_handler). Once a uprobe that has an rp_handler is hit, we hijack the
return address of the probed function and replace it with the address of
the trampoline. The trampoline is a preallocated page in the probed
task's xol area, filled with the breakpoint opcode. In turn, when the
return breakpoint is hit, the rp_handler is invoked.
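For illustration only, a minimal sketch of how an in-kernel user might
register such a consumer once this series is applied (the handler names
and the inode/offset values are hypothetical, not part of the patchset):

	static int entry_handler(struct uprobe_consumer *self, struct pt_regs *regs)
	{
		/* runs when the probed instruction is hit */
		return 0;
	}

	static int ret_handler(struct uprobe_consumer *self, struct pt_regs *regs)
	{
		/* runs when the hijacked return address leads into the trampoline */
		return 0;
	}

	static struct uprobe_consumer uc = {
		.handler    = entry_handler,
		.rp_handler = ret_handler,
	};

	/* inode + offset identify the probed instruction in the binary */
	err = uprobe_register(inode, offset, &uc);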
The patchset shouldn't be difficult to read, and hopefully the commit
comments will help.

Please review.
patchset in git:
http://github.com/arapov/linux-aa/commits/uretprobes_v0
RFC reviews:
v4: https://lkml.org/lkml/2013/3/4/246
v3: https://lkml.org/lkml/2013/2/28/148
v2: https://lkml.org/lkml/2013/1/9/157
v1: https://lkml.org/lkml/2012/12/21/133
thanks,
Anton.
Anton Arapov (7):
uretprobes: preparation patch
uretprobes: extract fill_page() and trampoline implementation
uretprobes/x86: hijack return address
uretprobes: return probe entry, prepare_uretprobe()
uretprobes: return probe exit, invoke handlers
uretprobes: limit the depth of return probe nestedness
uretprobes: remove -ENOSYS as return probes implemented
arch/x86/include/asm/uprobes.h | 1 +
arch/x86/kernel/uprobes.c | 29 ++++++
include/linux/uprobes.h | 5 +
kernel/events/uprobes.c | 205 ++++++++++++++++++++++++++++++++++++++---
4 files changed, 229 insertions(+), 11 deletions(-)
--
1.8.1.4
Prepare for the return probes implementation: introduce ->rp_handler() and
update the existing code to rely on ->handler() *and* ->rp_handler() for
uprobes and uretprobes respectively.
RFCv5 changes:
- don't remove the uprobe when a consumer has no handler set,
see the handler_chain() changes.
RFCv3 changes: (the patch was introduced in v3)
- check whether at least one of the consumer's handlers is set.
- add a 'TODO' cap that will be removed once return probes are implemented.
- introduce ->rp_handler().
Signed-off-by: Anton Arapov <[email protected]>
---
include/linux/uprobes.h | 1 +
kernel/events/uprobes.c | 14 +++++++++++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 02b83db..a28bdee 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -46,6 +46,7 @@ enum uprobe_filter_ctx {
struct uprobe_consumer {
int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs);
+ int (*rp_handler)(struct uprobe_consumer *self, struct pt_regs *regs);
bool (*filter)(struct uprobe_consumer *self,
enum uprobe_filter_ctx ctx,
struct mm_struct *mm);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 26bc2e2..3205a2e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -828,6 +828,14 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
struct uprobe *uprobe;
int ret;
+ /* Uprobe must have at least one set consumer */
+ if (!uc->handler && !uc->rp_handler)
+ return -EINVAL;
+
+ /* TODO: Implement return probes */
+ if (uc->rp_handler)
+ return -ENOSYS;
+
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
@@ -1488,10 +1496,14 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
{
struct uprobe_consumer *uc;
int remove = UPROBE_HANDLER_REMOVE;
+ int rc = 0;
down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next) {
- int rc = uc->handler(uc, regs);
+ if (uc->handler)
+ rc = uc->handler(uc, regs);
+ else
+ remove = 0;
WARN(rc & ~UPROBE_HANDLER_MASK,
"bad rc=0x%x from %pf()\n", rc, uc->handler);
--
1.8.1.4
When a uprobe with a return probe consumer is hit, the prepare_uretprobe()
function is invoked. It creates a return_instance, hijacks the return
address and replaces it with the trampoline address.

- Return instances are kept as a stack, per uprobed task.
- A return instance is dirty when the original return address is the
trampoline page's vaddr (e.g. on a recursive call of the probed function).

N.B. it might be a good idea to introduce get_uprobe() to mirror
put_uprobe() later, but that is not a subject of this patchset.
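To make the 'dirty' notion concrete, here is a sketch (illustrative
addresses only, not from the patch) of the per-task stack after a probed
function calls itself once before returning:

	/*
	 * caller at address A calls func(); func() then calls func()
	 * again before returning:
	 *
	 *   utask->return_instances
	 *     -> { dirty = true,  orig_ret_vaddr = A }   (nested call)
	 *     -> { dirty = false, orig_ret_vaddr = A }   (first call)
	 *     -> NULL
	 *
	 * The nested instance found the trampoline address already in
	 * the return slot, so it inherits the original address A and
	 * is marked dirty.
	 */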
RFCv6 changes:
- rework the prepare_uretprobe() logic in order to make further unwinding
in handle_uretprobe() simpler.
- introduce the 'dirty' field.
RFCv5 changes:
- switch from an hlist to a singly linked list for tracking ->*return_uprobes.
- revert (*) from v4.
- preallocate the first xol_area slot for return probes, see the
get_xol_area() changes.
- add the get_trampoline_vaddr() helper, to emphasize the area->vaddr overload.
RFCv4 changes:
- get rid of area->rp_trampoline_vaddr as it is always the same as ->vaddr.
- clean up the ->return_uprobes list in uprobe_free_utask(), because the
task can exit from inside the ret-probe'd function(s).
- in find_active_uprobe(): once we have inserted "int3" we must ensure that
handle_swbp() will be called even if this uprobe goes away. We have
the reference but it only protects the uprobe itself, it can't protect
against delete_uprobe().
IOW, we must ensure that uprobe_pre_sstep_notifier() can't return 0.
RFCv3 changes:
- protect the uprobe with a refcounter. See the atomic_inc in
prepare_uretprobe() and put_uprobe() in a following patch, in
handle_uretprobe().
RFCv2 changes:
- get rid of the ->return_consumers member from struct uprobe, introduce
rp_handler() in the consumer.
Signed-off-by: Anton Arapov <[email protected]>
---
include/linux/uprobes.h | 1 +
kernel/events/uprobes.c | 91 ++++++++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 91 insertions(+), 1 deletion(-)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index a28bdee..145d466 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -69,6 +69,7 @@ struct uprobe_task {
enum uprobe_task_state state;
struct arch_uprobe_task autask;
+ struct return_instance *return_instances;
struct uprobe *active_uprobe;
unsigned long xol_vaddr;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 86706d1..4ea3e91 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -75,6 +75,14 @@ struct uprobe {
struct arch_uprobe arch;
};
+struct return_instance {
+ struct uprobe *uprobe;
+ unsigned long orig_ret_vaddr; /* original return address */
+ bool dirty; /* true, if instance is nested */
+
+ struct return_instance *next; /* keep as stack */
+};
+
/*
* valid_vma: Verify if the specified vma is an executable vma
* Relax restrictions while unregistering: vm_flags might have
@@ -1318,6 +1326,7 @@ unsigned long __weak uprobe_get_swbp_addr(struct pt_regs *regs)
void uprobe_free_utask(struct task_struct *t)
{
struct uprobe_task *utask = t->utask;
+ struct return_instance *ri, *tmp;
if (!utask)
return;
@@ -1325,6 +1334,15 @@ void uprobe_free_utask(struct task_struct *t)
if (utask->active_uprobe)
put_uprobe(utask->active_uprobe);
+ ri = utask->return_instances;
+ while (ri) {
+ put_uprobe(ri->uprobe);
+
+ tmp = ri;
+ ri = ri->next;
+ kfree(tmp);
+ }
+
xol_free_insn_slot(t);
kfree(utask);
t->utask = NULL;
@@ -1358,6 +1376,71 @@ static unsigned long get_trampoline_vaddr(struct xol_area *area)
return area->vaddr;
}
+static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
+{
+ struct return_instance *ri;
+ struct uprobe_task *utask;
+ struct xol_area *area;
+ unsigned long trampoline_vaddr;
+ unsigned long prev_ret_vaddr, ret_vaddr;
+
+ area = get_xol_area();
+ if (!area)
+ return;
+
+ utask = get_utask();
+ if (!utask)
+ return;
+
+ prev_ret_vaddr = -1;
+ if (utask->return_instances)
+ prev_ret_vaddr = utask->return_instances->orig_ret_vaddr;
+
+ ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
+ if (!ri)
+ return;
+
+ ri->dirty = false;
+ trampoline_vaddr = get_trampoline_vaddr(area);
+ ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
+
+ /*
+ * We don't want to keep trampoline address in stack, rather keep the
+ * original return address of first caller thru all the consequent
+ * instances. This also makes breakpoint unwrapping easier.
+ */
+ if (ret_vaddr == trampoline_vaddr) {
+ if (likely(prev_ret_vaddr != -1)) {
+ ri->dirty = true;
+ ret_vaddr = prev_ret_vaddr;
+ } else {
+ /*
+ * This situation is not possible. Likely we have an
+ * attack from user-space. Die.
+ */
+ printk(KERN_ERR "uprobe: something went wrong "
+ "pid/tgid=%d/%d", current->pid, current->tgid);
+ send_sig(SIGSEGV, current, 0);
+ kfree(ri);
+ return;
+ }
+ }
+
+ if (likely(ret_vaddr != -1)) {
+ atomic_inc(&uprobe->ref);
+ ri->uprobe = uprobe;
+ ri->orig_ret_vaddr = ret_vaddr;
+
+ /* add instance to the stack */
+ ri->next = utask->return_instances;
+ utask->return_instances = ri;
+
+ return;
+ }
+
+ kfree(ri);
+}
+
/* Prepare to single-step probed instruction out of line. */
static int
pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
@@ -1513,20 +1596,26 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
{
struct uprobe_consumer *uc;
int remove = UPROBE_HANDLER_REMOVE;
+ bool prep = false; /* prepare return uprobe, when needed */
int rc = 0;
down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next) {
if (uc->handler)
rc = uc->handler(uc, regs);
- else
+ else {
+ prep |= true;
remove = 0;
+ }
WARN(rc & ~UPROBE_HANDLER_MASK,
"bad rc=0x%x from %pf()\n", rc, uc->handler);
remove &= rc;
}
+ if (prep)
+ prepare_uretprobe(uprobe, regs); /* put bp at return */
+
if (remove && uprobe->consumers) {
WARN_ON(!uprobe_is_active(uprobe));
unapply_uprobe(uprobe, current->mm);
--
1.8.1.4
Enable the return probes implementation.
v3 changes: (the patch was introduced in v3)
- remove the 'TODO' cap, as return probes are implemented now
Signed-off-by: Anton Arapov <[email protected]>
---
kernel/events/uprobes.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5fb7809..e13f9f8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,10 +840,6 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
if (!uc->handler && !uc->rp_handler)
return -EINVAL;
- /* TODO: Implement return probes */
- if (uc->rp_handler)
- return -ENOSYS;
-
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
--
1.8.1.4
- Split out fill_page() from xol_get_insn_slot().
- Allocate the trampoline page, as the very first one in the uprobed
task's xol area, and fill it with the breakpoint opcode.
- Also introduce the get_trampoline_vaddr() helper, to wrap the
trampoline address extraction from area->vaddr. That removes
confusion and eases debugging in case the ->vaddr notion changes.
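A sketch of the resulting xol page layout (slot sizes and count are
arch-dependent; only the reserved slot 0 is specific to this patch):

	/*
	 *  area->vaddr (also the trampoline vaddr)
	 *  |
	 *  v
	 *  +------------+-----------+-----------+----
	 *  | slot 0:    | slot 1:   | slot 2:   | ...
	 *  | SWBP insn  | insn slot | insn slot |
	 *  +------------+-----------+-----------+----
	 *
	 * slot 0 is claimed via set_bit(0, area->bitmap) and filled with
	 * the breakpoint opcode; the remaining slots keep their usual
	 * out-of-line single-step role.
	 */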
Signed-off-by: Anton Arapov <[email protected]>
---
kernel/events/uprobes.c | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3205a2e..86706d1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1080,6 +1080,21 @@ void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned lon
set_bit(MMF_RECALC_UPROBES, &vma->vm_mm->flags);
}
+static void fill_page(struct page *page, unsigned long offset, uprobe_opcode_t *insn)
+{
+ void *vaddr;
+
+ vaddr = kmap_atomic(page);
+ memcpy(vaddr + offset, insn, MAX_UINSN_BYTES);
+ kunmap_atomic(vaddr);
+
+ /*
+ * We probably need flush_icache_user_range() but it needs vma.
+ * This should work on supported architectures too.
+ */
+ flush_dcache_page(page);
+}
+
/* Slot allocation for XOL */
static int xol_add_vma(struct xol_area *area)
{
@@ -1122,6 +1137,7 @@ static struct xol_area *get_xol_area(void)
{
struct mm_struct *mm = current->mm;
struct xol_area *area;
+ uprobe_opcode_t insn = UPROBE_SWBP_INSN;
area = mm->uprobes_state.xol_area;
if (area)
@@ -1139,6 +1155,10 @@ static struct xol_area *get_xol_area(void)
if (!area->page)
goto free_bitmap;
+ /* pre-allocate for return probes */
+ set_bit(0, area->bitmap);
+ fill_page(area->page, 0, &insn);
+
init_waitqueue_head(&area->wq);
if (!xol_add_vma(area))
return area;
@@ -1226,7 +1246,6 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
struct xol_area *area;
unsigned long offset;
unsigned long xol_vaddr;
- void *vaddr;
area = get_xol_area();
if (!area)
@@ -1238,14 +1257,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
/* Initialize the slot */
offset = xol_vaddr & ~PAGE_MASK;
- vaddr = kmap_atomic(area->page);
- memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES);
- kunmap_atomic(vaddr);
- /*
- * We probably need flush_icache_user_range() but it needs vma.
- * This should work on supported architectures too.
- */
- flush_dcache_page(area->page);
+ fill_page(area->page, offset, uprobe->arch.insn);
return xol_vaddr;
}
@@ -1341,6 +1353,11 @@ static struct uprobe_task *get_utask(void)
return current->utask;
}
+static unsigned long get_trampoline_vaddr(struct xol_area *area)
+{
+ return area->vaddr;
+}
+
/* Prepare to single-step probed instruction out of line. */
static int
pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long bp_vaddr)
--
1.8.1.4
Enable the return probes implementation.
Signed-off-by: Anton Arapov <[email protected]>
---
kernel/events/uprobes.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5fb7809..e13f9f8 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -840,10 +840,6 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
if (!uc->handler && !uc->rp_handler)
return -EINVAL;
- /* TODO: Implement return probes */
- if (uc->rp_handler)
- return -ENOSYS;
-
/* Racy, just to catch the obvious mistakes */
if (offset > i_size_read(inode))
return -EINVAL;
--
1.8.1.4
Unlike with kretprobes, we can't trust userspace and thus must protect
against user-space attacks; this patch limits the return probe
nestedness as a simple remedy for that.
The intention is to have a KISS, bare-minimum solution for the
initial implementation, in order not to complicate the uretprobes
code.
In the future we may come up with a more sophisticated solution that
removes this depth limitation; however, that is not an easy task
and lies beyond this patchset. It would have to consider things like: the
breakpoint address lying outside the stack, the stack growth direction,
longjmp, sigaltstack... and be able to clean up return instances.
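For a concrete case (the same one raised later in this thread), consider
a recursive probed function; each call pushes one return_instance, so
unbounded recursion would pin unbounded kernel memory:

	/* user-space function under a uretprobe (illustrative) */
	void probed_func(int x)
	{
		if (--x)
			probed_func(x);
	}

	/*
	 * every nested hit allocates a return_instance; once
	 * utask->depth reaches MAX_URETPROBE_DEPTH (64), deeper
	 * returns are simply not probed.
	 */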
Signed-off-by: Anton Arapov <[email protected]>
---
include/linux/uprobes.h | 3 +++
kernel/events/uprobes.c | 11 +++++++++++
2 files changed, 14 insertions(+)
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
index 145d466..928d72f 100644
--- a/include/linux/uprobes.h
+++ b/include/linux/uprobes.h
@@ -38,6 +38,8 @@ struct inode;
#define UPROBE_HANDLER_REMOVE 1
#define UPROBE_HANDLER_MASK 1
+#define MAX_URETPROBE_DEPTH 64
+
enum uprobe_filter_ctx {
UPROBE_FILTER_REGISTER,
UPROBE_FILTER_UNREGISTER,
@@ -70,6 +72,7 @@ struct uprobe_task {
struct arch_uprobe_task autask;
struct return_instance *return_instances;
+ unsigned int depth;
struct uprobe *active_uprobe;
unsigned long xol_vaddr;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 91edd2c..5fb7809 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1392,6 +1392,13 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
if (!utask)
return;
+ if (utask->depth >= MAX_URETPROBE_DEPTH) {
+ printk_ratelimited(KERN_INFO "uprobe: reached the return probe"
+ " depth limit pid/tgid=%d/%d\n", current->pid,
+ current->tgid);
+ return;
+ }
+
prev_ret_vaddr = -1;
if (utask->return_instances)
prev_ret_vaddr = utask->return_instances->orig_ret_vaddr;
@@ -1431,6 +1438,8 @@ static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
ri->uprobe = uprobe;
ri->orig_ret_vaddr = ret_vaddr;
+ utask->depth++;
+
/* add instance to the stack */
ri->next = utask->return_instances;
utask->return_instances = ri;
@@ -1661,6 +1670,8 @@ static void handle_uretprobe(struct xol_area *area, struct pt_regs *regs)
ri = ri->next;
kfree(tmp);
+ utask->depth--;
+
if (!ri || ri->dirty == false) {
/*
* This is the first return uprobe (chronologically)
--
1.8.1.4
Uretprobe handlers are invoked when the trampoline is hit; on completion,
the trampoline address is replaced with the saved return address and the
uretprobe instance is deleted.
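A sketch of the control flow on return, assuming the pieces from the
earlier patches in this series:

	/*
	 *   handle_swbp()
	 *     bp_vaddr == trampoline vaddr?
	 *       -> handle_uretprobe()
	 *            restore ip to ri->orig_ret_vaddr
	 *            for each popped return_instance:
	 *              handler_uretprobe_chain() -> uc->rp_handler(uc, regs)
	 *              put_uprobe(ri->uprobe); kfree(ri)
	 *            stop at the first non-dirty (non-nested) instance
	 */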
RFCv6 changes:
- rework handle_uretprobe()
RFCv5 changes:
- switch to the singly linked list ->return_uprobes
- rework handle_uretprobe()
RFCv4 changes:
- check whether utask is not NULL in handle_uretprobe()
- get rid of area->rp_trampoline_vaddr
- minor handle_uretprobe() fixups
RFCv3 changes:
- protect the uprobe with a refcounter. See put_uprobe() in
handle_uretprobe(), which reflects the increment in prepare_uretprobe()
RFCv2 changes:
- get rid of the ->return_consumers member from struct uprobe, introduce
rp_handler() in the consumer instead
Signed-off-by: Anton Arapov <[email protected]>
---
kernel/events/uprobes.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 59 insertions(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4ea3e91..91edd2c 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1623,6 +1623,56 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
up_read(&uprobe->register_rwsem);
}
+static void handler_uretprobe_chain(struct uprobe *uprobe, struct pt_regs *regs)
+{
+ struct uprobe_consumer *uc;
+
+ down_read(&uprobe->register_rwsem);
+ for (uc = uprobe->consumers; uc; uc = uc->next) {
+ if (uc->rp_handler)
+ uc->rp_handler(uc, regs);
+ }
+ up_read(&uprobe->register_rwsem);
+}
+
+static void handle_uretprobe(struct xol_area *area, struct pt_regs *regs)
+{
+ struct uprobe_task *utask;
+ struct return_instance *ri, *tmp;
+ unsigned long prev_ret_vaddr;
+
+ utask = get_utask();
+ if (!utask)
+ return;
+
+ ri = utask->return_instances;
+ if (!ri)
+ return;
+
+ instruction_pointer_set(regs, ri->orig_ret_vaddr);
+
+ while (ri) {
+ if (ri->uprobe->consumers)
+ handler_uretprobe_chain(ri->uprobe, regs);
+
+ put_uprobe(ri->uprobe);
+ tmp = ri;
+ prev_ret_vaddr = tmp->orig_ret_vaddr;
+ ri = ri->next;
+ kfree(tmp);
+
+ if (!ri || ri->dirty == false) {
+ /*
+ * This is the first return uprobe (chronologically)
+ * pushed for this particular instance of the probed
+ * function.
+ */
+ utask->return_instances = ri;
+ return;
+ }
+ }
+}
+
/*
* Run handler and ask thread to singlestep.
* Ensure all non-fatal signals cannot interrupt thread while it singlesteps.
@@ -1631,11 +1681,19 @@ static void handle_swbp(struct pt_regs *regs)
{
struct uprobe *uprobe;
unsigned long bp_vaddr;
+ struct xol_area *area;
int uninitialized_var(is_swbp);
bp_vaddr = uprobe_get_swbp_addr(regs);
- uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
+ area = get_xol_area();
+ if (area) {
+ if (bp_vaddr == get_trampoline_vaddr(area)) {
+ handle_uretprobe(area, regs);
+ return;
+ }
+ }
+ uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
if (!uprobe) {
if (is_swbp > 0) {
/* No matching uprobe; signal SIGTRAP. */
--
1.8.1.4
- Hijack the return address and replace it with the trampoline address.
RFCv5 changes:
- change the failure return code, because orig_ret_vaddr == 0 is possible
- style fixup
RFCv2 changes:
- remove the ->doomed flag, kill the task immediately
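A sketch of what the hijack does to the user stack on x86 (rasize is 4
for 32-bit tasks, 8 for 64-bit ones):

	/*
	 * at function entry, right after CALL:
	 *
	 *   regs->sp -> [ original return address ]
	 *
	 * arch_uretprobe_hijack_return_addr() reads that word with
	 * copy_from_user(), then overwrites it with the trampoline
	 * vaddr via copy_to_user():
	 *
	 *   regs->sp -> [ trampoline vaddr ]
	 *
	 * so the probed function "returns" into the trampoline page.
	 */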
Signed-off-by: Anton Arapov <[email protected]>
---
arch/x86/include/asm/uprobes.h | 1 +
arch/x86/kernel/uprobes.c | 29 +++++++++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 8ff8be7..6e51979 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -55,4 +55,5 @@ extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern unsigned long arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs);
#endif /* _ASM_UPROBES_H */
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index 0ba4cfb..05d357e 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -697,3 +697,32 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
send_sig(SIGTRAP, current, 0);
return ret;
}
+
+unsigned long
+arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs)
+{
+ int rasize, ncopied;
+ unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
+
+ rasize = is_ia32_task() ? 4 : 8;
+ ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize);
+ if (unlikely(ncopied))
+ return -1;
+
+ /* check whether address has been already hijacked */
+ if (orig_ret_vaddr == trampoline_vaddr)
+ return orig_ret_vaddr;
+
+ ncopied = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
+ if (unlikely(ncopied)) {
+ if (ncopied != rasize) {
+ printk(KERN_ERR "uprobe: return address clobbered: "
+ "pid=%d, %%sp=%#lx, %%ip=%#lx\n",
+ current->pid, regs->sp, regs->ip);
+ /* kill task immediately */
+ send_sig(SIGSEGV, current, 0);
+ }
+ }
+
+ return orig_ret_vaddr;
+}
--
1.8.1.4
please disregard this one, not sure how it managed to hide among
the rest...
sorry,
Anton.
On Fri, Mar 22, 2013 at 02:09:04PM +0100, Anton Arapov wrote:
> Enable the return probes implementation.
>
> v3 changes: (the patch was introduced in v3)
> - remove the 'TODO' cap, as return probes are implemented now
>
> Signed-off-by: Anton Arapov <[email protected]>
> ---
> kernel/events/uprobes.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 5fb7809..e13f9f8 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -840,10 +840,6 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
> if (!uc->handler && !uc->rp_handler)
> return -EINVAL;
>
> - /* TODO: Implement return probes */
> - if (uc->rp_handler)
> - return -ENOSYS;
> -
> /* Racy, just to catch the obvious mistakes */
> if (offset > i_size_read(inode))
> return -EINVAL;
> --
> 1.8.1.4
>
I'll try to read this series later. Just one note...
On 03/22, Anton Arapov wrote:
>
> IOW, we must ensure that uprobe_pre_sstep_notifier() can't return 0.
Yes, but I do not see this change?
> +static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> +{
> + struct return_instance *ri;
> + struct uprobe_task *utask;
> + struct xol_area *area;
> + unsigned long trampoline_vaddr;
> + unsigned long prev_ret_vaddr, ret_vaddr;
> +
> + area = get_xol_area();
> + if (!area)
> + return;
> +
> + utask = get_utask();
> + if (!utask)
> + return;
> +
> + prev_ret_vaddr = -1;
> + if (utask->return_instances)
> + prev_ret_vaddr = utask->return_instances->orig_ret_vaddr;
> +
> + ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
> + if (!ri)
> + return;
> +
> + ri->dirty = false;
> + trampoline_vaddr = get_trampoline_vaddr(area);
> + ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
OK, but you need to ensure that this code can be compiled on powerpc.
Oleg.
On 03/22, Anton Arapov wrote:
>
> this is core implementation of the uretprobes. This enables a breakpoint
> on function's return for the tools such as perf.
And, before we ask Ingo to pull, we should teach trace_uprobe.c to handle
"perf -x func%return".
This looks simple, but probably we need to add the additional "ulong bp_vaddr"
argument for rp_handler().
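One possible shape of that change, purely as a sketch (the argument
order and placement are guesses, nothing settled in this thread):

	int (*rp_handler)(struct uprobe_consumer *self,
			  unsigned long bp_vaddr, struct pt_regs *regs);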
Oleg.
On 03/22/2013 08:10 AM, Oleg Nesterov wrote:
> This looks simple, but probably we need to add the additional "ulong bp_vaddr"
> argument for rp_handler().
FWIW, in SystemTap we don't currently do anything that would need
bp_vaddr. The uprobe_consumer already gives the context, so I'm not
sure what else you would do with the *entry* address of the function
that just returned. Perhaps you could emulate an additional top line on
a stacktrace, but it doesn't tell you where in the function you were.
Josh
On Fri, Mar 22, 2013 at 02:40:42PM -0700, Josh Stone wrote:
> On 03/22/2013 08:10 AM, Oleg Nesterov wrote:
> > This looks simple, but probably we need to add the additional "ulong bp_vaddr"
> > argument for rp_handler().
>
> FWIW, in SystemTap we don't currently do anything that would need
> bp_vaddr. The uprobe_consumer already gives the context, so I'm not
> sure what else you would do with the *entry* address of the function
> that just returned. Perhaps you could emulate an additional top line on
> a stacktrace, but it doesn't tell you where in the function you were.
IIUC, Oleg has stressed that perf and trace events require
additional code to leverage the return probes. IOW, this patchset
implements the core only, and to use it we need to teach perf about
rp_handler().
And I don't see why we would need an additional parameter for
rp_handler() either.
Anton
On 03/22, Anton Arapov wrote:
>
> @@ -1488,10 +1496,14 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
> {
> struct uprobe_consumer *uc;
> int remove = UPROBE_HANDLER_REMOVE;
> + int rc = 0;
>
> down_read(&uprobe->register_rwsem);
> for (uc = uprobe->consumers; uc; uc = uc->next) {
> - int rc = uc->handler(uc, regs);
> + if (uc->handler)
> + rc = uc->handler(uc, regs);
> + else
> + remove = 0;
Well, this doesn't look good. Yes, we need to conditionalize the uc->handler()
and rc checks, but the code looks ugly. We touch remove twice, and the value
of rc inside the loop is bogus if ->handler == NULL.
I wouldn't have argued, but 4/7 changes the "else" branch and that change
is wrong (I'll write another email). We do not need this "else" at all.
I'd suggest the patch below.
Oleg.
--- x/kernel/events/uprobes.c
+++ x/kernel/events/uprobes.c
@@ -1491,10 +1491,13 @@ static void handler_chain(struct uprobe
down_read(&uprobe->register_rwsem);
for (uc = uprobe->consumers; uc; uc = uc->next) {
- int rc = uc->handler(uc, regs);
+ int rc = 0;
- WARN(rc & ~UPROBE_HANDLER_MASK,
- "bad rc=0x%x from %pf()\n", rc, uc->handler);
+ if (uc->handler) {
+ rc = uc->handler(uc, regs);
+ WARN(rc & ~UPROBE_HANDLER_MASK,
+ "bad rc=0x%x from %pf()\n", rc, uc->handler);
+ }
remove &= rc;
}
On 03/22, Anton Arapov wrote:
>
> @@ -1513,20 +1596,26 @@ static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs)
> {
> struct uprobe_consumer *uc;
> int remove = UPROBE_HANDLER_REMOVE;
> + bool prep = false; /* prepare return uprobe, when needed */
> int rc = 0;
>
> down_read(&uprobe->register_rwsem);
> for (uc = uprobe->consumers; uc; uc = uc->next) {
> if (uc->handler)
> rc = uc->handler(uc, regs);
> - else
> + else {
> + prep |= true;
> remove = 0;
> + }
This looks wrong (see also my reply to 1/7).
What if uc->handler != NULL and uc->rp_handler != NULL ? We need
	if (uc->rp_handler)
		need_prep = true;
Oleg.
On 03/22, Josh Stone wrote:
>
> On 03/22/2013 08:10 AM, Oleg Nesterov wrote:
> > This looks simple, but probably we need to add the additional "ulong bp_vaddr"
> > argument for rp_handler().
>
> FWIW, in SystemTap we don't currently do anything that would need
> bp_vaddr.
I know. This is not for systemtap.
> The uprobe_consumer already gives the context, so I'm not
> sure what else you would do with the *entry* address of the function
> that just returned.
context - yes, but not the address (by the time rp_handler is called).
Oleg.
On 03/23, Anton Arapov wrote:
>
> IIUC, Oleg have stressed that perf and trace events requires
> additional code to leverage the return probes. IOW this patchset
> implements core only and to use it we need to teach the perf about
> rp_handler().
Yes,
> And I don't see why we would need additional parameter for
> rp_handler() as well.
trace_uprobe should use the same fmt = "(%lx <- %lx)" to report
the "entry <- exit" info, like trace_kprobe does.
And note that, unlike with a kprobe, the same function can be mmapped at
different addresses by different (or even the same) tasks.
But mostly I just wanted to say that yes, we will also teach
trace_uprobe to use the new feature before we ask to merge this
code.
Oleg.
On 03/22, Anton Arapov wrote:
>
> +static void fill_page(struct page *page, unsigned long offset, uprobe_opcode_t *insn)
^^^^^^^^^^^^^^^
Well, it should be "u8 *insn" or "char *".
xol_get_insn_slot() passes arch.insn, and this is not uprobe_opcode_t
in general (see arch/powerpc/include/asm/uprobes.h)
> +{
> + void *vaddr;
> +
> + vaddr = kmap_atomic(page);
> + memcpy(vaddr + offset, insn, MAX_UINSN_BYTES);
^^^^^^^^^^^^^^^
OK, but see below...
> + kunmap_atomic(vaddr);
> +
> + /*
> + * We probably need flush_icache_user_range() but it needs vma.
> + * This should work on supported architectures too.
> + */
> + flush_dcache_page(page);
> +}
Please do not move flush_dcache_page() from xol_get_insn_slot(),
this is not needed.
Unlike xol_get_insn_slot(), get_xol_area() calls fill_page() before
xol_add_vma().
> static int xol_add_vma(struct xol_area *area)
> {
> @@ -1122,6 +1137,7 @@ static struct xol_area *get_xol_area(void)
> {
> struct mm_struct *mm = current->mm;
> struct xol_area *area;
> + uprobe_opcode_t insn = UPROBE_SWBP_INSN;
>
> area = mm->uprobes_state.xol_area;
> if (area)
> @@ -1139,6 +1155,10 @@ static struct xol_area *get_xol_area(void)
> if (!area->page)
> goto free_bitmap;
>
> + /* pre-allocate for return probes */
> + set_bit(0, area->bitmap);
> + fill_page(area->page, 0, &insn);
sizeof(insn) == UPROBE_SWBP_INSN_SIZE != MAX_UINSN_BYTES. See also the
comments above.
Oleg.
On 03/22, Anton Arapov wrote:
>
> +arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr, struct pt_regs *regs)
> +{
> + int rasize, ncopied;
> + unsigned long orig_ret_vaddr = 0; /* clear high bits for 32-bit apps */
> +
> + rasize = is_ia32_task() ? 4 : 8;
Hmm.. I guess you need is_compat_task() here.
> + ncopied = copy_from_user(&orig_ret_vaddr, (void __user *)regs->sp, rasize);
> + if (unlikely(ncopied))
> + return -1;
> +
> + /* check whether address has been already hijacked */
> + if (orig_ret_vaddr == trampoline_vaddr)
> + return orig_ret_vaddr;
> +
> + ncopied = copy_to_user((void __user *)regs->sp, &trampoline_vaddr, rasize);
> + if (unlikely(ncopied)) {
> + if (ncopied != rasize) {
> + printk(KERN_ERR "uprobe: return address clobbered: "
Cosmetic, but we have pr_err().
> + "pid=%d, %%sp=%#lx, %%ip=%#lx\n",
> + current->pid, regs->sp, regs->ip);
> + /* kill task immediately */
> + send_sig(SIGSEGV, current, 0);
probably force_sig_info() makes more sense, but this is minor
> + }
You need to return -1 here.
Cosmetic again, but since this function should be updated anyway,
I'd suggest
ncopied = copy_to_user(...);
if (likely(!ncopied))
return orig_ret_vaddr;
if (ncopied != rasize) {
...
}
return -1;
Oleg.
On 03/22, Anton Arapov wrote:
>
> void uprobe_free_utask(struct task_struct *t)
> {
> struct uprobe_task *utask = t->utask;
> + struct return_instance *ri, *tmp;
>
> if (!utask)
> return;
> @@ -1325,6 +1334,15 @@ void uprobe_free_utask(struct task_struct *t)
> if (utask->active_uprobe)
> put_uprobe(utask->active_uprobe);
>
> + ri = utask->return_instances;
You also need to nullify ->return_instances before returning, otherwise
we get a use-after-free later.
uprobe_free_utask() can also be called when the task execs.
> + while (ri) {
> + put_uprobe(ri->uprobe);
> +
> + tmp = ri;
> + ri = ri->next;
> + kfree(tmp);
> + }
This is really minor, but I can't resist. Both put_uprobe() and kfree()
work with the same object; it would be cleaner to use the same var.
Say,
while (ri) {
tmp = ri;
ri = ri->next;
put_uprobe(tmp->uprobe);
kfree(tmp);
}
> +static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> +{
...
> +
> + prev_ret_vaddr = -1;
> + if (utask->return_instances)
> + prev_ret_vaddr = utask->return_instances->orig_ret_vaddr;
> +
> + ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
> + if (!ri)
> + return;
> +
> + ri->dirty = false;
> + trampoline_vaddr = get_trampoline_vaddr(area);
> + ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
> +
> + /*
> + * We don't want to keep trampoline address in stack, rather keep the
> + * original return address of first caller thru all the consequent
> + * instances. This also makes breakpoint unwrapping easier.
> + */
> + if (ret_vaddr == trampoline_vaddr) {
> + if (likely(prev_ret_vaddr != -1)) {
> + ri->dirty = true;
> + ret_vaddr = prev_ret_vaddr;
> + } else {
> + /*
> + * This situation is not possible. Likely we have an
> + * attack from user-space. Die.
> + */
> + printk(KERN_ERR "uprobe: something went wrong "
> + "pid/tgid=%d/%d", current->pid, current->tgid);
> + send_sig(SIGSEGV, current, 0);
> + kfree(ri);
> + return;
> + }
> + }
> +
> + if (likely(ret_vaddr != -1)) {
> + atomic_inc(&uprobe->ref);
> + ri->uprobe = uprobe;
> + ri->orig_ret_vaddr = ret_vaddr;
> +
> + /* add instance to the stack */
> + ri->next = utask->return_instances;
> + utask->return_instances = ri;
> +
> + return;
> + }
> +
> + kfree(ri);
> +}
Anton, this really doesn't look clear/clean. Why do you need prev_ret_vaddr
in advance? Why do you need it at all? Why do you delay the "ret_vaddr == -1"
error check?
And ->dirty looks confusing... perhaps ->chained ?
ri = kzalloc(...);
if (!ri)
return;
ret_vaddr = arch_uretprobe_hijack_return_addr(...);
if (ret_vaddr == -1)
goto err;
if (ret_vaddr == trampoline_vaddr) {
if (!utask->return_instances) {
// This situation is not possible.
// (not sure we should send SIGSEGV)
pr_warn(...);
goto err;
}
ri->chained = true;
ret_vaddr = utask->return_instances->orig_ret_vaddr;
}
fill-ri-and-add-push-it;
return;
err:
kfree(ri);
return;
Oleg.
On 03/22, Anton Arapov wrote:
>
> +static void handle_uretprobe(struct xol_area *area, struct pt_regs *regs)
> +{
> + struct uprobe_task *utask;
> + struct return_instance *ri, *tmp;
> + unsigned long prev_ret_vaddr;
> +
> + utask = get_utask();
> + if (!utask)
> + return;
> +
> + ri = utask->return_instances;
> + if (!ri)
> + return;
Hmm. I am wondering what should the caller (handle_swbp) do in this
case...
> +
> + instruction_pointer_set(regs, ri->orig_ret_vaddr);
> +
> + while (ri) {
> + if (ri->uprobe->consumers)
> + handler_uretprobe_chain(ri->uprobe, regs);
I'd suggest to either remove this check or move it into
handler_uretprobe_chain().
> +
> + put_uprobe(ri->uprobe);
> + tmp = ri;
> + prev_ret_vaddr = tmp->orig_ret_vaddr;
For what? It seems that prev_ret_vaddr should be simply killed.
> + ri = ri->next;
> + kfree(tmp);
Another case where you do put_uprobe/kfree using different vars...
Once again, the code is correct, but imho a bit confusing.
> + if (!ri || ri->dirty == false) {
> + /*
> + * This is the first return uprobe (chronologically)
> + * pushed for this particular instance of the probed
> + * function.
> + */
> + utask->return_instances = ri;
> + return;
> + }
Else? We simply return without updating ->return_instances, which
points to the freed element(s)? OK, this must not be possible, but
it is not obvious...
And the fact that you check "ri != NULL" twice doesn't look very nice.
We already checked ri != NULL before while(ri); we have to do this
anyway for instruction_pointer_set(). Perhaps do/while or even
for (;;) + break would be cleaner. But this is minor.
I am not sure the logic is correct. OK, suppose that
->return_instances = NULL.
The task hits the rp breakpoint. After that
return_instances -> { .dirty = false }
The task hits the same breakpoint before return (tail call), now
we have
return_instances -> { .dirty = true } -> { .dirty = false }
Then it returns and handle_uretprobe() should unwind the whole stack.
But, it seems, the main loop will stop after the 1st iteration?
Ignoring the fact you need put_uprobe/kfree, it seems that we should
do something like this,
do {
handler_uretprobe_chain(...);
if (!ri->dirty) // not chained
break;
ri = ri->next;
} while (ri);
utask->return_instances = ri;
No?
> @@ -1631,11 +1681,19 @@ static void handle_swbp(struct pt_regs *regs)
> {
> struct uprobe *uprobe;
> unsigned long bp_vaddr;
> + struct xol_area *area;
> int uninitialized_var(is_swbp);
>
> bp_vaddr = uprobe_get_swbp_addr(regs);
> - uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
> + area = get_xol_area();
Why?
No, we do not want this heavy and potentially unnecessary get_xol_area(),
> + if (area) {
Just check uprobes_state.xol_area != NULL instead. If it is NULL
we simply should not call handle_uretprobe().
Or perhaps get_trampoline_vaddr() should simply return -1 if
->xol_area == NULL.
> + if (bp_vaddr == get_trampoline_vaddr(area)) {
I just noticed get_trampoline_vaddr() takes an argument... It should
not, I think.
Oleg.
On 03/22, Anton Arapov wrote:
>
> Unlike the kretprobes we can't trust userspace, thus must have
> protection from user space attacks,
Plus, user space has an "unlimited" stack; it looks simply
impossible to handle deep recursion correctly.
Lets consider the simplest case,
void probed_func(int x)
{
if (--x)
probed_func(x);
}
We could add ret_instance->recursion_counter to handle this case and
avoid kmalloc(). But this way we won't be able to implement the new
features like data-pouch.
> this patch limits the return
> probes nestedness as a simple remedy for it.
> The intention is to have KISS and bare minimum solution for the
> initial implementation in order to not complicate the uretprobes
> code.
>
> In the future we may come up with more sophisticated solution that
> should remove this depth limitation, however it is not easy task
> and lays beyond this patchset. It should consider things like: breakpoint
> address lays outside the stack and stack growth direction, longjmp,
> sigaltstack... be able to clean up return instances.
I agree. Perhaps we should at least try to do something more clever in
the future, but that needs another (certainly nontrivial) series; we
should not mix it with the initial implementation, which should be as
simple as possible.
Oleg.
On 03/24, Oleg Nesterov wrote:
>
> On 03/22, Anton Arapov wrote:
> >
> > +static void fill_page(struct page *page, unsigned long offset, uprobe_opcode_t *insn)
> ^^^^^^^^^^^^^^^
> Well, it should be "u8 *insn" or "char *".
Or void*.
And we have another reason for this helper.
And I really think that we need to clean up the kmap mess in uprobes.c
before we add the new abuser.
How about this simple series?
> sizeof(insn) == UPROBE_SWBP_INSN_SIZE != MAX_UINSN_BYTES. See also the
> comments above.
I think that copy_to_page() added by 4/5 is what you need, and it solves
the problems with typeof/sizeof. Plus it has another caller.
Oleg.
No functional changes. Rename copy_opcode() into copy_from_page() and
add the new "int len" argument to make it more generic for the
new users.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 8 ++++----
1 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 5c273b3..d6891cb 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -173,10 +173,10 @@ bool __weak is_swbp_insn(uprobe_opcode_t *insn)
return *insn == UPROBE_SWBP_INSN;
}
-static void copy_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode)
+static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
{
void *kaddr = kmap_atomic(page);
- memcpy(opcode, kaddr + (vaddr & ~PAGE_MASK), UPROBE_SWBP_INSN_SIZE);
+ memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
kunmap_atomic(kaddr);
}
@@ -185,7 +185,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
uprobe_opcode_t old_opcode;
bool is_swbp;
- copy_opcode(page, vaddr, &old_opcode);
+ copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
is_swbp = is_swbp_insn(&old_opcode);
if (is_swbp_insn(new_opcode)) {
@@ -1449,7 +1449,7 @@ static int is_swbp_at_addr(struct mm_struct *mm, unsigned long vaddr)
if (result < 0)
return result;
- copy_opcode(page, vaddr, &opcode);
+ copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
put_page(page);
out:
return is_swbp_insn(&opcode);
--
1.5.5.1
Change __copy_insn() to use copy_from_page() and simplify the code.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 13 ++-----------
1 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index d6891cb..0a759c6 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -477,30 +477,21 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
unsigned long nbytes, loff_t offset)
{
struct page *page;
- void *vaddr;
- unsigned long off;
- pgoff_t idx;
if (!filp)
return -EINVAL;
if (!mapping->a_ops->readpage)
return -EIO;
-
- idx = offset >> PAGE_CACHE_SHIFT;
- off = offset & ~PAGE_MASK;
-
/*
* Ensure that the page that has the original instruction is
* populated and in page-cache.
*/
- page = read_mapping_page(mapping, idx, filp);
+ page = read_mapping_page(mapping, offset >> PAGE_CACHE_SHIFT, filp);
if (IS_ERR(page))
return PTR_ERR(page);
- vaddr = kmap_atomic(page);
- memcpy(insn, vaddr + off, nbytes);
- kunmap_atomic(vaddr);
+ copy_from_page(page, offset, insn, nbytes);
page_cache_release(page);
return 0;
--
1.5.5.1
Change write_opcode() to use copy_highpage() + copy_to_page()
and simplify the code.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 12 ++----------
1 files changed, 2 insertions(+), 10 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 9d943f7..72f03d4 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -232,7 +232,6 @@ static int write_opcode(struct mm_struct *mm, unsigned long vaddr,
uprobe_opcode_t opcode)
{
struct page *old_page, *new_page;
- void *vaddr_old, *vaddr_new;
struct vm_area_struct *vma;
int ret;
@@ -253,15 +252,8 @@ retry:
__SetPageUptodate(new_page);
- /* copy the page now that we've got it stable */
- vaddr_old = kmap_atomic(old_page);
- vaddr_new = kmap_atomic(new_page);
-
- memcpy(vaddr_new, vaddr_old, PAGE_SIZE);
- memcpy(vaddr_new + (vaddr & ~PAGE_MASK), &opcode, UPROBE_SWBP_INSN_SIZE);
-
- kunmap_atomic(vaddr_new);
- kunmap_atomic(vaddr_old);
+ copy_highpage(new_page, old_page);
+ copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
ret = anon_vma_prepare(vma);
if (ret)
--
1.5.5.1
__copy_insn(filp) can only be called after valid_vma() returns T,
vma->vm_file passed as "filp" can not be NULL.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 3 ---
1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 0a759c6..e5d479f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -478,9 +478,6 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
{
struct page *page;
- if (!filp)
- return -EINVAL;
-
if (!mapping->a_ops->readpage)
return -EIO;
/*
--
1.5.5.1
Extract the kmap_atomic/memcpy/kunmap_atomic code from
xol_get_insn_slot() into the new simple helper, copy_to_page().
It will have more users soon.
Signed-off-by: Oleg Nesterov <[email protected]>
---
kernel/events/uprobes.c | 14 ++++++++------
1 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index e5d479f..9d943f7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -180,6 +180,13 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in
kunmap_atomic(kaddr);
}
+static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
+{
+ void *kaddr = kmap_atomic(page);
+ memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
+ kunmap_atomic(kaddr);
+}
+
static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode)
{
uprobe_opcode_t old_opcode;
@@ -1204,9 +1211,7 @@ static unsigned long xol_take_insn_slot(struct xol_area *area)
static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
{
struct xol_area *area;
- unsigned long offset;
unsigned long xol_vaddr;
- void *vaddr;
area = get_xol_area();
if (!area)
@@ -1217,10 +1222,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
return 0;
/* Initialize the slot */
- offset = xol_vaddr & ~PAGE_MASK;
- vaddr = kmap_atomic(area->page);
- memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES);
- kunmap_atomic(vaddr);
+ copy_to_page(area->page, xol_vaddr, uprobe->arch.insn, MAX_UINSN_BYTES);
/*
* We probably need flush_icache_user_range() but it needs vma.
* This should work on supported architectures too.
--
1.5.5.1
On Sun, Mar 24, 2013 at 07:20:40PM +0100, Oleg Nesterov wrote:
> On 03/24, Oleg Nesterov wrote:
> >
> > On 03/22, Anton Arapov wrote:
> > >
> > > +static void fill_page(struct page *page, unsigned long offset, uprobe_opcode_t *insn)
> > ^^^^^^^^^^^^^^^
> > Well, it should be "u8 *insn" or "char *".
>
> Or void*.
>
> And we have another reason for this helper.
>
> And I really think that we need to cleanup the kmap mess in uprobe.c
> before we add the new abuser.
>
> How about this simple series?
>
> > sizeof(insn) == UPROBE_SWBP_INSN_SIZE != MAX_UINSN_BYTES. See also the
> > comments above.
>
> I think that copy_to_page() added by 4/5 is what you need, and it solves
> the problems with typeof/sizeof. Plus it have another caller.
That's what I couldn't come up with myself. My fill_page() was indeed ugly.
Thanks!
Anton.
> Oleg.
>
On Sun, Mar 24, 2013 at 07:21:10PM +0100, Oleg Nesterov wrote:
> No functional changes. Rename copy_opcode() into copy_from_page() and
> add the new "int len" argument to make it more generic for the
> new users.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 5c273b3..d6891cb 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -173,10 +173,10 @@ bool __weak is_swbp_insn(uprobe_opcode_t *insn)
> return *insn == UPROBE_SWBP_INSN;
> }
>
> -static void copy_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode)
> +static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
> {
> void *kaddr = kmap_atomic(page);
> - memcpy(opcode, kaddr + (vaddr & ~PAGE_MASK), UPROBE_SWBP_INSN_SIZE);
> + memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
> kunmap_atomic(kaddr);
> }
>
> @@ -185,7 +185,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
> uprobe_opcode_t old_opcode;
> bool is_swbp;
>
> - copy_opcode(page, vaddr, &old_opcode);
> + copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
> is_swbp = is_swbp_insn(&old_opcode);
>
> if (is_swbp_insn(new_opcode)) {
> @@ -1449,7 +1449,7 @@ static int is_swbp_at_addr(struct mm_struct *mm, unsigned long vaddr)
> if (result < 0)
> return result;
>
> - copy_opcode(page, vaddr, &opcode);
> + copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
> put_page(page);
> out:
> return is_swbp_insn(&opcode);
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>
On Sun, Mar 24, 2013 at 07:21:18PM +0100, Oleg Nesterov wrote:
> __copy_insn(filp) can only be called after valid_vma() returns T,
> vma->vm_file passed as "filp" can not be NULL.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 0a759c6..e5d479f 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -478,9 +478,6 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
> {
> struct page *page;
>
> - if (!filp)
> - return -EINVAL;
> -
> if (!mapping->a_ops->readpage)
> return -EIO;
> /*
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>
On Sun, Mar 24, 2013 at 07:21:22PM +0100, Oleg Nesterov wrote:
> Extract the kmap_atomic/memcpy/kunmap_atomic code from
> xol_get_insn_slot() into the new simple helper, copy_to_page().
> It will have more users soon.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index e5d479f..9d943f7 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -180,6 +180,13 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in
> kunmap_atomic(kaddr);
> }
>
> +static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
> +{
> + void *kaddr = kmap_atomic(page);
> + memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> + kunmap_atomic(kaddr);
> +}
> +
> static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode)
> {
> uprobe_opcode_t old_opcode;
> @@ -1204,9 +1211,7 @@ static unsigned long xol_take_insn_slot(struct xol_area *area)
> static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
> {
> struct xol_area *area;
> - unsigned long offset;
> unsigned long xol_vaddr;
> - void *vaddr;
>
> area = get_xol_area();
> if (!area)
> @@ -1217,10 +1222,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
> return 0;
>
> /* Initialize the slot */
> - offset = xol_vaddr & ~PAGE_MASK;
> - vaddr = kmap_atomic(area->page);
> - memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES);
> - kunmap_atomic(vaddr);
> + copy_to_page(area->page, xol_vaddr, uprobe->arch.insn, MAX_UINSN_BYTES);
> /*
> * We probably need flush_icache_user_range() but it needs vma.
> * This should work on supported architectures too.
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>
On Sun, Mar 24, 2013 at 07:21:15PM +0100, Oleg Nesterov wrote:
> Change __copy_insn() to use copy_from_page() and simplify the code.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 13 ++-----------
> 1 files changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index d6891cb..0a759c6 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -477,30 +477,21 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
> unsigned long nbytes, loff_t offset)
> {
> struct page *page;
> - void *vaddr;
> - unsigned long off;
> - pgoff_t idx;
>
> if (!filp)
> return -EINVAL;
>
> if (!mapping->a_ops->readpage)
> return -EIO;
> -
> - idx = offset >> PAGE_CACHE_SHIFT;
> - off = offset & ~PAGE_MASK;
> -
> /*
> * Ensure that the page that has the original instruction is
> * populated and in page-cache.
> */
> - page = read_mapping_page(mapping, idx, filp);
> + page = read_mapping_page(mapping, offset >> PAGE_CACHE_SHIFT, filp);
> if (IS_ERR(page))
> return PTR_ERR(page);
>
> - vaddr = kmap_atomic(page);
> - memcpy(insn, vaddr + off, nbytes);
> - kunmap_atomic(vaddr);
> + copy_from_page(page, offset, insn, nbytes);
> page_cache_release(page);
>
> return 0;
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>
On Sun, Mar 24, 2013 at 07:21:25PM +0100, Oleg Nesterov wrote:
> Change write_opcode() to use copy_highpage() + copy_to_page()
> and simplify the code.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
> ---
> kernel/events/uprobes.c | 12 ++----------
> 1 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 9d943f7..72f03d4 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -232,7 +232,6 @@ static int write_opcode(struct mm_struct *mm, unsigned long vaddr,
> uprobe_opcode_t opcode)
> {
> struct page *old_page, *new_page;
> - void *vaddr_old, *vaddr_new;
> struct vm_area_struct *vma;
> int ret;
>
> @@ -253,15 +252,8 @@ retry:
>
> __SetPageUptodate(new_page);
>
> - /* copy the page now that we've got it stable */
> - vaddr_old = kmap_atomic(old_page);
> - vaddr_new = kmap_atomic(new_page);
> -
> - memcpy(vaddr_new, vaddr_old, PAGE_SIZE);
> - memcpy(vaddr_new + (vaddr & ~PAGE_MASK), &opcode, UPROBE_SWBP_INSN_SIZE);
> -
> - kunmap_atomic(vaddr_new);
> - kunmap_atomic(vaddr_old);
> + copy_highpage(new_page, old_page);
> + copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
>
> ret = anon_vma_prepare(vma);
> if (ret)
> --
> 1.5.5.1
>
Acked-by: Anton Arapov <[email protected]>
On 03/24, Oleg Nesterov wrote:
>
> On 03/22, Anton Arapov wrote:
> >
> > @@ -1139,6 +1155,10 @@ static struct xol_area *get_xol_area(void)
> > if (!area->page)
> > goto free_bitmap;
> >
> > + /* pre-allocate for return probes */
> > + set_bit(0, area->bitmap);
> > + fill_page(area->page, 0, &insn);
>
> sizeof(insn) == UPROBE_SWBP_INSN_SIZE != MAX_UINSN_BYTES. See also the
> comments above.
Sorry, forgot to mention... you also need to update ->slot_count.
Oleg.
The last comment, I promise ;)
On 03/24, Oleg Nesterov wrote:
>
> On 03/22, Anton Arapov wrote:
> >
> > +static void handle_uretprobe(struct xol_area *area, struct pt_regs *regs)
> > +{
> > + struct uprobe_task *utask;
> > + struct return_instance *ri, *tmp;
> > + unsigned long prev_ret_vaddr;
> > +
> > + utask = get_utask();
> > + if (!utask)
> > + return;
> > +
> > + ri = utask->return_instances;
> > + if (!ri)
> > + return;
>
> Hmm. I am wondering what should the caller (handle_swbp) do in this
> case...
And you do not actually need get_utask(), just check current->utask.
handle_uretprobe() must not be called if either ->utask or
->return_instances is NULL. This can only happen if we have a bug,
or user-space tries to fool the kernel.
Perhaps it makes sense to add pr_warn().
Oleg.
On Sun, Mar 24, 2013 at 05:28:17PM +0100, Oleg Nesterov wrote:
> On 03/22, Anton Arapov wrote:
> >
> > +static void handle_uretprobe(struct xol_area *area, struct pt_regs *regs)
> > +{
> > + struct uprobe_task *utask;
> > + struct return_instance *ri, *tmp;
> > + unsigned long prev_ret_vaddr;
> > +
> > + utask = get_utask();
> > + if (!utask)
> > + return;
> > +
> > + ri = utask->return_instances;
> > + if (!ri)
> > + return;
> Hmm. I am wondering what should the caller (handle_swbp) do in this
> case...
I'm not sure either... Will look into it.
> > +
> > + instruction_pointer_set(regs, ri->orig_ret_vaddr);
> > +
> > + while (ri) {
> > + if (ri->uprobe->consumers)
> > + handler_uretprobe_chain(ri->uprobe, regs);
> I'd suggest to either remove this check or move it into
> handler_uretprobe_chain().
>
> > +
> > + put_uprobe(ri->uprobe);
> > + tmp = ri;
> > + prev_ret_vaddr = tmp->orig_ret_vaddr;
> For what? It seems that prev_ret_vaddr should be simply killed.
Both above are leftovers I've overlooked before git-send... :(
> > + ri = ri->next;
> > + kfree(tmp);
> Another case when you do put_uprobe/kfree using the different vars...
> Once again, the code is correct but imho a bit confusing.
I agree, will change it and align it with the code in uprobe_free_utask()
> > + if (!ri || ri->dirty == false) {
> > + /*
> > + * This is the first return uprobe (chronologically)
> > + * pushed for this particular instance of the probed
> > + * function.
> > + */
> > + utask->return_instances = ri;
> > + return;
> > + }
>
> Else? we simply return without updating ->return_instances which
> points to the freed element(s) ? OK, this must not be possible but
> this is not obvious...
>
> And the fact you check "ri != NULL" twice doesn't look very nice.
> We already checked ri != NULL before while(ri), we have to do this
> anyway for instruction_pointer_set(). Perhaps do/whild or even
> for (;;) + break would be more clean. But this is minor.
>
>
> I am not sure the logic is correct. OK. suppose that
> ->return_instances = NULL.
>
> The task hits the rp breakoint. After that
>
> return_instances -> { .dirty = false }
>
> The task hits the same breakoint before return (tail call), now
> we have
>
> return_instances -> { .dirty = true } -> { .dirty = false }
>
> Then it returns and handle_uretprobe() should unwind the whole stack.
> But, it seems, the main loop will stop after the 1st iteration?
>
> Ignoring the fact you need put_uprobe/kfree, it seems that we should
> do something like this,
>
> do {
> handler_uretprobe_chain(...);
>
> if (!ri->dirty) // not chained
> break;
>
> ri = ri->next;
> } while (ri);
>
> utask->return_instances = ri;
> No?
Oleg, do you mean
do {
handler_uretprobe_chain(...);
ri = ri->next;
if (!ri->dirty) // not chained
break;
} while (ri);
utask->return_instances = ri;
otherwise we get stuck with the first instance in the stack.
...and perhaps for (;;) would be 'more beautiful' here?
>
> > @@ -1631,11 +1681,19 @@ static void handle_swbp(struct pt_regs *regs)
> > {
> > struct uprobe *uprobe;
> > unsigned long bp_vaddr;
> > + struct xol_area *area;
> > int uninitialized_var(is_swbp);
> >
> > bp_vaddr = uprobe_get_swbp_addr(regs);
> > - uprobe = find_active_uprobe(bp_vaddr, &is_swbp);
> > + area = get_xol_area();
>
> Why?
> No, we do not want this heavy and potentially unnecessary get_xol_area(),
>
> > + if (area) {
>
> Just check uprobes_state.xol_area != NULL instead. If it is NULL
> we simply should not call handle_uretprobe().
>
> Or perhaps get_trampoline_vaddr() should simply return -1 if
> ->xol_area == NULL.
right.
>
> > + if (bp_vaddr == get_trampoline_vaddr(area)) {
>
> I just noticed get_trampoline_vaddr() takes an argument... It should
> not, I think.
>
Yes, at this place we must have *area allocated. And I agree with
your arguments; I will remove the *area argument from
get_trampoline_vaddr() and handle_uretprobe(), it makes sense to me
as well.
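Something like this, perhaps (a rough sketch; I assume the trampoline
address is recorded in xol_area as ->vaddr, the exact field may differ):

	static unsigned long get_trampoline_vaddr(void)
	{
		struct xol_area *area;
		unsigned long trampoline_vaddr = -1;

		/* no xol area yet means this breakpoint can't be ours */
		area = current->mm->uprobes_state.xol_area;
		if (area)
			trampoline_vaddr = area->vaddr;

		return trampoline_vaddr;
	}

Then handle_swbp() can simply compare bp_vaddr against the returned
value, without the heavy get_xol_area() call.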
Anton
> Oleg.
>
On Sun, Mar 24, 2013 at 04:26:51PM +0100, Oleg Nesterov wrote:
> On 03/22, Anton Arapov wrote:
> >
> > void uprobe_free_utask(struct task_struct *t)
> > {
> > struct uprobe_task *utask = t->utask;
> > + struct return_instance *ri, *tmp;
> >
> > if (!utask)
> > return;
> > @@ -1325,6 +1334,15 @@ void uprobe_free_utask(struct task_struct *t)
> > if (utask->active_uprobe)
> > put_uprobe(utask->active_uprobe);
> >
> > + ri = utask->return_instances;
>
> You also need to nullify ->return_instances before return, otherwise
> it can be used after it has been freed.
>
> uprobe_free_utask() can also be called when the task execs.
>
> > + while (ri) {
> > + put_uprobe(ri->uprobe);
> > +
> > + tmp = ri;
> > + ri = ri->next;
> > + kfree(tmp);
> > + }
>
> This is really minor, but I can't resist. Both put_uprobe() and kfree()
> work with the same object, it would be more clean to use the same var.
> Say,
>
> while (ri) {
> tmp = ri;
> ri = ri->next;
>
> put_uprobe(tmp->uprobe);
> kfree(tmp);
> }
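For illustration, with the nullify fix folded in, the cleanup could end
up roughly like this (a sketch only):

	ri = utask->return_instances;
	while (ri) {
		tmp = ri;
		ri = ri->next;

		/* drop our ref and free the instance via the same var */
		put_uprobe(tmp->uprobe);
		kfree(tmp);
	}
	utask->return_instances = NULL;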
>
> > +static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> > +{
> ...
> > +
> > + prev_ret_vaddr = -1;
> > + if (utask->return_instances)
> > + prev_ret_vaddr = utask->return_instances->orig_ret_vaddr;
> > +
> > + ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
> > + if (!ri)
> > + return;
> > +
> > + ri->dirty = false;
> > + trampoline_vaddr = get_trampoline_vaddr(area);
> > + ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
> > +
> > + /*
> > + * We don't want to keep trampoline address in stack, rather keep the
> > + * original return address of first caller thru all the consequent
> > + * instances. This also makes breakpoint unwrapping easier.
> > + */
> > + if (ret_vaddr == trampoline_vaddr) {
> > + if (likely(prev_ret_vaddr != -1)) {
> > + ri->dirty = true;
> > + ret_vaddr = prev_ret_vaddr;
> > + } else {
> > + /*
> > + * This situation is not possible. Likely we have an
> > + * attack from user-space. Die.
> > + */
> > + printk(KERN_ERR "uprobe: something went wrong "
> > + "pid/tgid=%d/%d", current->pid, current->tgid);
> > + send_sig(SIGSEGV, current, 0);
> > + kfree(ri);
> > + return;
> > + }
> > + }
> > +
> > + if (likely(ret_vaddr != -1)) {
> > + atomic_inc(&uprobe->ref);
> > + ri->uprobe = uprobe;
> > + ri->orig_ret_vaddr = ret_vaddr;
> > +
> > + /* add instance to the stack */
> > + ri->next = utask->return_instances;
> > + utask->return_instances = ri;
> > +
> > + return;
> > + }
> > +
> > + kfree(ri);
> > +}
>
> Anton, this really doesn't look clear/clean. Why do you need prev_ret_vaddr
> in advance? Why do you need it at all? Why do you delay the "ret_vaddr == -1"
> error check?
>
> And ->dirty looks confusing... perhaps ->chained ?
>
> ri = kzalloc(...);
> if (!ri)
> return;
>
> ret_vaddr = arch_uretprobe_hijack_return_addr(...);
> if (ret_vaddr == -1)
> goto err;
>
> if (ret_vaddr == trampoline_vaddr) {
> if (!utask->return_instances) {
> // This situation is not possible.
> // (not sure we should send SIGSEGV)
> pr_warn(...);
> goto err;
> }
>
> ri->chained = true;
> ret_vaddr = utask->return_instances->orig_ret_vaddr;
> }
>
> fill-ri-and-add-push-it;
> return;
>
> err:
> kfree(ri);
> return;
I will do the appropriate changes. Thanks!
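For reference, fleshed out I'd expect it to look roughly like this (a
sketch only, with ->chained assumed instead of ->dirty):

	ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
	if (!ri)
		return;

	trampoline_vaddr = get_trampoline_vaddr(area);
	ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
	if (ret_vaddr == -1)
		goto err;

	if (ret_vaddr == trampoline_vaddr) {
		if (!utask->return_instances) {
			/* should be impossible, see above */
			pr_warn("uprobe: no instance to chain to, pid/tgid=%d/%d\n",
				current->pid, current->tgid);
			goto err;
		}

		/* keep the first caller's return address, not the trampoline */
		ri->chained = true;
		ret_vaddr = utask->return_instances->orig_ret_vaddr;
	}

	atomic_inc(&uprobe->ref);
	ri->uprobe = uprobe;
	ri->orig_ret_vaddr = ret_vaddr;

	/* push the instance onto the per-task stack */
	ri->next = utask->return_instances;
	utask->return_instances = ri;
	return;
err:
	kfree(ri);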
Anton.
> Oleg.
>
On 03/25, Anton Arapov wrote:
>
> On Sun, Mar 24, 2013 at 05:28:17PM +0100, Oleg Nesterov wrote:
> >
> > Ignoring the fact you need put_uprobe/kfree, it seems that we should
> > do something like this,
> >
> > do {
> > handler_uretprobe_chain(...);
> >
> > if (!ri->dirty) // not chained
> > break;
> >
> > ri = ri->next;
> > } while (ri);
> >
> > utask->return_instances = ri;
> > No?
>
> Oleg, do you mean
>
> do {
> handler_uretprobe_chain(...);
>
> ri = ri->next;
>
> if (!ri->dirty) // not chained
> break;
> } while (ri);
>
> utask->return_instances = ri;
>
> otherwise we get stuck with the first instance in the stack.
Not sure I understand... but it is very possible I missed something.
But the pseudo code I wrote is not correct, I meant
utask->return_instances = ri->next;
after the main loop.
> ...and perhaps for(;;) would be 'more beautiful' here?
Oh, I would not mind either way. In fact we do not really need the
ri != NULL check inside the loop (again, unless I am confused).
We must see a non-chained entry in the stack unless we have a
bug.
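IOW, something like this (roughly, with the put_uprobe/kfree folded
back in and ->chained instead of ->dirty):

	instruction_pointer_set(regs, ri->orig_ret_vaddr);

	for (;;) {
		handler_uretprobe_chain(ri->uprobe, regs);

		chained = ri->chained;
		put_uprobe(ri->uprobe);

		tmp = ri;
		ri = ri->next;
		kfree(tmp);

		/* the non-chained entry ends this instance of the stack */
		if (!chained)
			break;
	}

	utask->return_instances = ri;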
Oleg.
On Mon, Mar 25, 2013 at 05:38:00PM +0100, Oleg Nesterov wrote:
> On 03/25, Anton Arapov wrote:
> >
> > On Sun, Mar 24, 2013 at 05:28:17PM +0100, Oleg Nesterov wrote:
> > >
> > > Ignoring the fact you need put_uprobe/kfree, it seems that we should
> > > do something like this,
> > >
> > > do {
> > > handler_uretprobe_chain(...);
> > >
> > > if (!ri->dirty) // not chained
> > > break;
> > >
> > > ri = ri->next;
> > > } while (ri);
> > >
> > > utask->return_instances = ri;
> > > No?
> >
> > Oleg, do you mean
> >
> > do {
> > handler_uretprobe_chain(...);
> >
> > ri = ri->next;
> >
> > if (!ri->dirty) // not chained
> > break;
> > } while (ri);
> >
> > utask->return_instances = ri;
> >
> > otherwise we get stuck with the first instance in the stack.
>
> Not sure I understand... but it is very possible I missed something.
>
> But the pseudo code I wrote is not correct, I meant
>
> utask->return_instances = ri->next;
>
> after the main loop.
This all makes sense now. Thanks.
Anton.
On Sun, Mar 24, 2013 at 04:26:51PM +0100, Oleg Nesterov wrote:
> On 03/22, Anton Arapov wrote:
[snip]
> And ->dirty looks confusing... perhaps ->chained ?
>
> ri = kzalloc(...);
> if (!ri)
> return;
>
> ret_vaddr = arch_uretprobe_hijack_return_addr(...);
> if (ret_vaddr == -1)
> goto err;
>
> if (ret_vaddr == trampoline_vaddr) {
> if (!utask->return_instances) {
> // This situation is not possible.
> // (not sure we should send SIGSEGV)
> pr_warn(...);
> goto err;
> }
If we don't send SIGSEGV, does it make sense to restore the original
return address that was just hijacked? That way we would simply decline
to set the breakpoint in this case.
Anton.
>
> ri->chained = true;
> ret_vaddr = utask->return_instances->orig_ret_vaddr;
> }
>
> fill-ri-and-add-push-it;
> return;
>
> err:
> kfree(ri);
> return;
>
> Oleg.
>
On Tue, Mar 26, 2013 at 09:45:33AM +0100, Anton Arapov wrote:
> On Sun, Mar 24, 2013 at 04:26:51PM +0100, Oleg Nesterov wrote:
> > On 03/22, Anton Arapov wrote:
> [snip]
> > And ->dirty looks confusing... perhaps ->chained ?
> >
> > ri = kzalloc(...);
> > if (!ri)
> > return;
> >
> > ret_vaddr = arch_uretprobe_hijack_return_addr(...);
> > if (ret_vaddr == -1)
> > goto err;
> >
> > if (ret_vaddr == trampoline_vaddr) {
> > if (!utask->return_instances) {
> > // This situation is not possible.
> > // (not sure we should send SIGSEGV)
> > pr_warn(...);
> > goto err;
> > }
>
> If we don't send SIGSEGV, does it make sense to restore the original
> return address that was just hijacked? That way we would simply decline
> to set the breakpoint in this case.
Disregard this one, we have no address to restore at that moment. :)
> Anton.
>
> >
> > ri->chained = true;
> > ret_vaddr = utask->return_instances->orig_ret_vaddr;
> > }
> >
> > fill-ri-and-add-push-it;
> > return;
> >
> > err:
> > kfree(ri);
> > return;
> >
> > Oleg.
> >
* Oleg Nesterov <[email protected]> [2013-03-24 19:21:25]:
> Change write_opcode() to use copy_highpage() + copy_to_page()
> and simplify the code.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/events/uprobes.c | 12 ++----------
> 1 files changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 9d943f7..72f03d4 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -232,7 +232,6 @@ static int write_opcode(struct mm_struct *mm, unsigned long vaddr,
> uprobe_opcode_t opcode)
> {
> struct page *old_page, *new_page;
> - void *vaddr_old, *vaddr_new;
> struct vm_area_struct *vma;
> int ret;
>
> @@ -253,15 +252,8 @@ retry:
>
> __SetPageUptodate(new_page);
>
> - /* copy the page now that we've got it stable */
> - vaddr_old = kmap_atomic(old_page);
> - vaddr_new = kmap_atomic(new_page);
> -
> - memcpy(vaddr_new, vaddr_old, PAGE_SIZE);
> - memcpy(vaddr_new + (vaddr & ~PAGE_MASK), &opcode, UPROBE_SWBP_INSN_SIZE);
> -
> - kunmap_atomic(vaddr_new);
> - kunmap_atomic(vaddr_old);
> + copy_highpage(new_page, old_page);
> + copy_to_page(new_page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
>
> ret = anon_vma_prepare(vma);
> if (ret)
> --
> 1.5.5.1
>
--
Thanks and Regards
Srikar Dronamraju
* Oleg Nesterov <[email protected]> [2013-03-24 19:21:10]:
> No functional changes. Rename copy_opcode() into copy_from_page() and
> add the new "int len" argument to make it more generic for the
> new users.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/events/uprobes.c | 8 ++++----
> 1 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 5c273b3..d6891cb 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -173,10 +173,10 @@ bool __weak is_swbp_insn(uprobe_opcode_t *insn)
> return *insn == UPROBE_SWBP_INSN;
> }
>
> -static void copy_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *opcode)
> +static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, int len)
> {
> void *kaddr = kmap_atomic(page);
> - memcpy(opcode, kaddr + (vaddr & ~PAGE_MASK), UPROBE_SWBP_INSN_SIZE);
> + memcpy(dst, kaddr + (vaddr & ~PAGE_MASK), len);
> kunmap_atomic(kaddr);
> }
>
> @@ -185,7 +185,7 @@ static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t
> uprobe_opcode_t old_opcode;
> bool is_swbp;
>
> - copy_opcode(page, vaddr, &old_opcode);
> + copy_from_page(page, vaddr, &old_opcode, UPROBE_SWBP_INSN_SIZE);
> is_swbp = is_swbp_insn(&old_opcode);
>
> if (is_swbp_insn(new_opcode)) {
> @@ -1449,7 +1449,7 @@ static int is_swbp_at_addr(struct mm_struct *mm, unsigned long vaddr)
> if (result < 0)
> return result;
>
> - copy_opcode(page, vaddr, &opcode);
> + copy_from_page(page, vaddr, &opcode, UPROBE_SWBP_INSN_SIZE);
> put_page(page);
> out:
> return is_swbp_insn(&opcode);
> --
> 1.5.5.1
>
--
Thanks and Regards
Srikar Dronamraju
* Oleg Nesterov <[email protected]> [2013-03-24 19:21:18]:
> __copy_insn(filp) can only be called after valid_vma() returns true;
> vma->vm_file passed as "filp" cannot be NULL.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/events/uprobes.c | 3 ---
> 1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 0a759c6..e5d479f 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -478,9 +478,6 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
> {
> struct page *page;
>
> - if (!filp)
> - return -EINVAL;
> -
> if (!mapping->a_ops->readpage)
> return -EIO;
> /*
> --
> 1.5.5.1
>
--
Thanks and Regards
Srikar Dronamraju
* Oleg Nesterov <[email protected]> [2013-03-24 19:21:15]:
> Change __copy_insn() to use copy_from_page() and simplify the code.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/events/uprobes.c | 13 ++-----------
> 1 files changed, 2 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index d6891cb..0a759c6 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -477,30 +477,21 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
> unsigned long nbytes, loff_t offset)
> {
> struct page *page;
> - void *vaddr;
> - unsigned long off;
> - pgoff_t idx;
>
> if (!filp)
> return -EINVAL;
>
> if (!mapping->a_ops->readpage)
> return -EIO;
> -
> - idx = offset >> PAGE_CACHE_SHIFT;
> - off = offset & ~PAGE_MASK;
> -
> /*
> * Ensure that the page that has the original instruction is
> * populated and in page-cache.
> */
> - page = read_mapping_page(mapping, idx, filp);
> + page = read_mapping_page(mapping, offset >> PAGE_CACHE_SHIFT, filp);
> if (IS_ERR(page))
> return PTR_ERR(page);
>
> - vaddr = kmap_atomic(page);
> - memcpy(insn, vaddr + off, nbytes);
> - kunmap_atomic(vaddr);
> + copy_from_page(page, offset, insn, nbytes);
> page_cache_release(page);
>
> return 0;
> --
> 1.5.5.1
>
--
Thanks and Regards
Srikar Dronamraju
* Oleg Nesterov <[email protected]> [2013-03-24 19:21:22]:
> Extract the kmap_atomic/memcpy/kunmap_atomic code from
> xol_get_insn_slot() into the new simple helper, copy_to_page().
> It will have more users soon.
>
> Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/events/uprobes.c | 14 ++++++++------
> 1 files changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index e5d479f..9d943f7 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -180,6 +180,13 @@ static void copy_from_page(struct page *page, unsigned long vaddr, void *dst, in
> kunmap_atomic(kaddr);
> }
>
> +static void copy_to_page(struct page *page, unsigned long vaddr, const void *src, int len)
> +{
> + void *kaddr = kmap_atomic(page);
> + memcpy(kaddr + (vaddr & ~PAGE_MASK), src, len);
> + kunmap_atomic(kaddr);
> +}
> +
> static int verify_opcode(struct page *page, unsigned long vaddr, uprobe_opcode_t *new_opcode)
> {
> uprobe_opcode_t old_opcode;
> @@ -1204,9 +1211,7 @@ static unsigned long xol_take_insn_slot(struct xol_area *area)
> static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
> {
> struct xol_area *area;
> - unsigned long offset;
> unsigned long xol_vaddr;
> - void *vaddr;
>
> area = get_xol_area();
> if (!area)
> @@ -1217,10 +1222,7 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
> return 0;
>
> /* Initialize the slot */
> - offset = xol_vaddr & ~PAGE_MASK;
> - vaddr = kmap_atomic(area->page);
> - memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES);
> - kunmap_atomic(vaddr);
> + copy_to_page(area->page, xol_vaddr, uprobe->arch.insn, MAX_UINSN_BYTES);
> /*
> * We probably need flush_icache_user_range() but it needs vma.
> * This should work on supported architectures too.
> --
> 1.5.5.1
>
--
Thanks and Regards
Srikar Dronamraju
On Fri, Mar 22, 2013 at 04:02:52PM +0100, Oleg Nesterov wrote:
> I'll try to read this series later. Just one note...
>
> On 03/22, Anton Arapov wrote:
> >
> > IOW, we must ensure that uprobe_pre_sstep_notifier() can't return 0.
>
> Yes, but I do not see this change?
>
> > +static void prepare_uretprobe(struct uprobe *uprobe, struct pt_regs *regs)
> > +{
> > + struct return_instance *ri;
> > + struct uprobe_task *utask;
> > + struct xol_area *area;
> > + unsigned long trampoline_vaddr;
> > + unsigned long prev_ret_vaddr, ret_vaddr;
> > +
> > + area = get_xol_area();
> > + if (!area)
> > + return;
> > +
> > + utask = get_utask();
> > + if (!utask)
> > + return;
> > +
> > + prev_ret_vaddr = -1;
> > + if (utask->return_instances)
> > + prev_ret_vaddr = utask->return_instances->orig_ret_vaddr;
> > +
> > + ri = kzalloc(sizeof(struct return_instance), GFP_KERNEL);
> > + if (!ri)
> > + return;
> > +
> > + ri->dirty = false;
> > + trampoline_vaddr = get_trampoline_vaddr(area);
> > + ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
>
> OK, but you need to ensure that this code can be compiled on powerpc.
It does compile; the obvious exception is arch_uretprobe_hijack_return_addr(),
for which I'll perhaps look for Ananth's help.
Anton.
On 03/26, Anton Arapov wrote:
>
> On Fri, Mar 22, 2013 at 04:02:52PM +0100, Oleg Nesterov wrote:
> > > + ri->dirty = false;
> > > + trampoline_vaddr = get_trampoline_vaddr(area);
> > > + ret_vaddr = arch_uretprobe_hijack_return_addr(trampoline_vaddr, regs);
> >
> > OK, but you need to ensure that this code can be compiled on powerpc.
>
> It does compile; the obvious exception is arch_uretprobe_hijack_return_addr(),
> for which I'll perhaps look for Ananth's help.
Just make the default a weak function which returns -1.
After that Ananth can send the (hopefully simple) patch with the
powerpc implementation.
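Something like this, say (just a sketch; the signature is taken from
the x86 patch):

	/*
	 * Architectures that support return probes override this;
	 * -1 makes prepare_uretprobe() back off.
	 */
	unsigned long __weak
	arch_uretprobe_hijack_return_addr(unsigned long trampoline_vaddr,
					  struct pt_regs *regs)
	{
		return -1;
	}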
Oleg.