Subject: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

From: Ananth N Mavinakayanahalli <[email protected]>

On RISC architectures like powerpc, instructions are fixed size.
Instruction analysis on such platforms is just a matter of (insn % 4).
Pass the vaddr at which the uprobe is to be inserted so that
arch_uprobe_analyze_insn() can flag misaligned registration requests.

Changes in V2:
Pass (unsigned long)addr instead of (loff_t)vaddr to
arch_uprobe_analyze_insn(). We need the loff_t vaddr to take care of
the offset of inode:offset pair for large file sizes on 32bit.

Signed-off-by: Ananth N Mavinakaynahalli <[email protected]>
---
arch/x86/include/asm/uprobes.h | 2 +-
arch/x86/kernel/uprobes.c | 3 ++-
kernel/events/uprobes.c | 2 +-
3 files changed, 4 insertions(+), 3 deletions(-)

Index: linux-3.5-rc1/arch/x86/include/asm/uprobes.h
===================================================================
--- linux-3.5-rc1.orig/arch/x86/include/asm/uprobes.h
+++ linux-3.5-rc1/arch/x86/include/asm/uprobes.h
@@ -48,7 +48,7 @@ struct arch_uprobe_task {
#endif
};

-extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm);
+extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
Index: linux-3.5-rc1/arch/x86/kernel/uprobes.c
===================================================================
--- linux-3.5-rc1.orig/arch/x86/kernel/uprobes.c
+++ linux-3.5-rc1/arch/x86/kernel/uprobes.c
@@ -409,9 +409,10 @@ static int validate_insn_bits(struct arc
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
* @arch_uprobe: the probepoint information.
+ * @addr: virtual address at which to install the probepoint
* Return 0 on success or a -ve number on error.
*/
-int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm)
+int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
{
int ret;
struct insn insn;
Index: linux-3.5-rc1/kernel/events/uprobes.c
===================================================================
--- linux-3.5-rc1.orig/kernel/events/uprobes.c
+++ linux-3.5-rc1/kernel/events/uprobes.c
@@ -697,7 +697,7 @@ install_breakpoint(struct uprobe *uprobe
if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
return -EEXIST;

- ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
+ ret = arch_uprobe_analyze_insn(&uprobe->arch, mm, addr);
if (ret)
return ret;


Subject: [PATCH v2 2/2] [POWERPC] uprobes: powerpc port

From: Ananth N Mavinakayanahalli <[email protected]>

This is the port of uprobes to powerpc. Usage is similar to x86.

One TODO in this port compared to x86 is the uprobe abort_xol() logic.
x86 depends on the thread_struct.trap_nr (absent in powerpc) to determine
if a signal was caused when the uprobed instruction was single-stepped/
emulated, in which case, we reset the instruction pointer to the probed
address and retry the probe again.

Tested on POWER6; I don't see anything here that should stop it from
working on a ppc32; since I don't have access to a ppc32 machine, it
would be good if somoene could verify that part.

[root@xxxx ~]# ./bin/perf probe -x /lib64/libc.so.6 malloc
Added new event:
probe_libc:malloc (on 0xb4860)

You can now use it in all perf tools, such as:

perf record -e probe_libc:malloc -aR sleep 1

[root@xxxx ~]# ./bin/perf record -e probe_libc:malloc -aR sleep 20
[ perf record: Woken up 22 times to write data ]
[ perf record: Captured and wrote 5.843 MB perf.data (~255302 samples) ]
[root@xxxx ~]# ./bin/perf report --stdio
# ========
# captured on: Mon Jun 4 05:26:31 2012
# hostname : xxxx.ibm.com
# os release : 3.4.0-uprobe
# perf version : 3.4.0
# arch : ppc64
# nrcpus online : 4
# nrcpus avail : 4
# cpudesc : POWER6 (raw), altivec supported
# cpuid : 62,769
# total memory : 7310528 kB
# cmdline : /root/bin/perf record -e probe_libc:malloc -aR sleep 20
# event : name = probe_libc:malloc, type = 2, config = 0x124, config1 = 0x0, con
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# ========
#
# Samples: 83K of event 'probe_libc:malloc'
# Event count (approx.): 83484
#
# Overhead Command Shared Object Symbol
# ........ ............ ............. ..........
#
69.05% tar libc-2.12.so [.] malloc
28.57% rm libc-2.12.so [.] malloc
1.32% avahi-daemon libc-2.12.so [.] malloc
0.58% bash libc-2.12.so [.] malloc
0.28% sshd libc-2.12.so [.] malloc
0.08% irqbalance libc-2.12.so [.] malloc
0.05% bzip2 libc-2.12.so [.] malloc
0.04% sleep libc-2.12.so [.] malloc
0.03% multipathd libc-2.12.so [.] malloc
0.01% sendmail libc-2.12.so [.] malloc
0.01% automount libc-2.12.so [.] malloc

V2:
a. arch_uprobe_analyze_insn() now gets unsigned long addr.
b. Verified that mtmsr[d] and rfi[d] are handled correctly by
emulate_step() (no changes to this patch).

Signed-off-by: Ananth N Mavinakayanahalli <[email protected]>
---
arch/powerpc/Kconfig | 3
arch/powerpc/include/asm/thread_info.h | 4
arch/powerpc/include/asm/uprobes.h | 49 +++++++++
arch/powerpc/kernel/Makefile | 1
arch/powerpc/kernel/signal.c | 6 +
arch/powerpc/kernel/uprobes.c | 164 +++++++++++++++++++++++++++++++++
6 files changed, 226 insertions(+), 1 deletion(-)

Index: linux-3.5-rc1/arch/powerpc/include/asm/thread_info.h
===================================================================
--- linux-3.5-rc1.orig/arch/powerpc/include/asm/thread_info.h
+++ linux-3.5-rc1/arch/powerpc/include/asm/thread_info.h
@@ -96,6 +96,7 @@ static inline struct thread_info *curren
#define TIF_RESTOREALL 11 /* Restore all regs (implies NOERROR) */
#define TIF_NOERROR 12 /* Force successful syscall return */
#define TIF_NOTIFY_RESUME 13 /* callback before returning to user */
+#define TIF_UPROBE 14 /* breakpointed or single-stepping */
#define TIF_SYSCALL_TRACEPOINT 15 /* syscall tracepoint instrumentation */

/* as above, but as bit values */
@@ -112,12 +113,13 @@ static inline struct thread_info *curren
#define _TIF_RESTOREALL (1<<TIF_RESTOREALL)
#define _TIF_NOERROR (1<<TIF_NOERROR)
#define _TIF_NOTIFY_RESUME (1<<TIF_NOTIFY_RESUME)
+#define _TIF_UPROBE (1<<TIF_UPROBE)
#define _TIF_SYSCALL_TRACEPOINT (1<<TIF_SYSCALL_TRACEPOINT)
#define _TIF_SYSCALL_T_OR_A (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
_TIF_SECCOMP | _TIF_SYSCALL_TRACEPOINT)

#define _TIF_USER_WORK_MASK (_TIF_SIGPENDING | _TIF_NEED_RESCHED | \
- _TIF_NOTIFY_RESUME)
+ _TIF_NOTIFY_RESUME | _TIF_UPROBE)
#define _TIF_PERSYSCALL_MASK (_TIF_RESTOREALL|_TIF_NOERROR)

/* Bits in local_flags */
Index: linux-3.5-rc1/arch/powerpc/include/asm/uprobes.h
===================================================================
--- /dev/null
+++ linux-3.5-rc1/arch/powerpc/include/asm/uprobes.h
@@ -0,0 +1,49 @@
+#ifndef _ASM_UPROBES_H
+#define _ASM_UPROBES_H
+/*
+ * User-space Probes (UProbes) for powerpc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2007-2012
+ *
+ * Adapted from the x86 port by Ananth N Mavinakayanahalli <[email protected]>
+ */
+
+#include <linux/notifier.h>
+
+typedef unsigned int uprobe_opcode_t;
+
+#define MAX_UINSN_BYTES 4
+#define UPROBE_XOL_SLOT_BYTES (MAX_UINSN_BYTES)
+
+#define UPROBE_SWBP_INSN 0x7fe00008
+#define UPROBE_SWBP_INSN_SIZE 4 /* swbp insn size in bytes */
+
+struct arch_uprobe {
+ u8 insn[MAX_UINSN_BYTES];
+};
+
+struct arch_uprobe_task {
+};
+
+extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
+extern unsigned long uprobe_get_swbp_addr(struct pt_regs *regs);
+extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
+extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data);
+extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs);
+#endif /* _ASM_UPROBES_H */
Index: linux-3.5-rc1/arch/powerpc/kernel/Makefile
===================================================================
--- linux-3.5-rc1.orig/arch/powerpc/kernel/Makefile
+++ linux-3.5-rc1/arch/powerpc/kernel/Makefile
@@ -96,6 +96,7 @@ obj-$(CONFIG_MODULES) += ppc_ksyms.o
obj-$(CONFIG_BOOTX_TEXT) += btext.o
obj-$(CONFIG_SMP) += smp.o
obj-$(CONFIG_KPROBES) += kprobes.o
+obj-$(CONFIG_UPROBES) += uprobes.o
obj-$(CONFIG_PPC_UDBG_16550) += legacy_serial.o udbg_16550.o
obj-$(CONFIG_STACKTRACE) += stacktrace.o
obj-$(CONFIG_SWIOTLB) += dma-swiotlb.o
Index: linux-3.5-rc1/arch/powerpc/kernel/signal.c
===================================================================
--- linux-3.5-rc1.orig/arch/powerpc/kernel/signal.c
+++ linux-3.5-rc1/arch/powerpc/kernel/signal.c
@@ -11,6 +11,7 @@

#include <linux/tracehook.h>
#include <linux/signal.h>
+#include <linux/uprobes.h>
#include <linux/key.h>
#include <asm/hw_breakpoint.h>
#include <asm/uaccess.h>
@@ -157,6 +158,11 @@ static int do_signal(struct pt_regs *reg

void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)
{
+ if (thread_info_flags & _TIF_UPROBE) {
+ clear_thread_flag(TIF_UPROBE);
+ uprobe_notify_resume(regs);
+ }
+
if (thread_info_flags & _TIF_SIGPENDING)
do_signal(regs);

Index: linux-3.5-rc1/arch/powerpc/kernel/uprobes.c
===================================================================
--- /dev/null
+++ linux-3.5-rc1/arch/powerpc/kernel/uprobes.c
@@ -0,0 +1,164 @@
+/*
+ * User-space Probes (UProbes) for powerpc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2007-2012
+ *
+ * Adapted from the x86 port by Ananth N Mavinakayanahalli <[email protected]>
+ */
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/ptrace.h>
+#include <linux/uprobes.h>
+#include <linux/uaccess.h>
+
+#include <linux/kdebug.h>
+#include <asm/sstep.h>
+
+/**
+ * arch_uprobe_analyze_insn
+ * @mm: the probed address space.
+ * @arch_uprobe: the probepoint information.
+ * @addr: vaddr to probe.
+ * Return 0 on success or a -ve number on error.
+ */
+int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
+{
+ if (addr & 0x03)
+ return -EINVAL;
+ return 0;
+}
+
+/*
+ * arch_uprobe_pre_xol - prepare to execute out of line.
+ * @auprobe: the probepoint information.
+ * @regs: reflects the saved user state of current task.
+ */
+int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ /* FIXME: We don't support abort_xol on powerpc for now */
+ regs->nip = current->utask->xol_vaddr;
+ return 0;
+}
+
+/**
+ * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs
+ * @regs: Reflects the saved state of the task after it has hit a breakpoint
+ * instruction.
+ * Return the address of the breakpoint instruction.
+ */
+unsigned long uprobe_get_swbp_addr(struct pt_regs *regs)
+{
+ return instruction_pointer(regs);
+}
+
+/*
+ * If xol insn itself traps and generates a signal (SIGILL/SIGSEGV/etc),
+ * then detect the case where a singlestepped instruction jumps back to its
+ * own address. It is assumed that anything like do_page_fault/do_trap/etc
+ * sets thread.trap_nr != -1.
+ *
+ * FIXME: powerpc however doesn't have thread.trap_nr yet.
+ *
+ * arch_uprobe_pre_xol/arch_uprobe_post_xol save/restore thread.trap_nr,
+ * arch_uprobe_xol_was_trapped() simply checks that ->trap_nr is not equal to
+ * UPROBE_TRAP_NR == -1 set by arch_uprobe_pre_xol().
+ */
+bool arch_uprobe_xol_was_trapped(struct task_struct *t)
+{
+ /* FIXME: We don't support abort_xol on powerpc for now */
+ return false;
+}
+
+/*
+ * Called after single-stepping. To avoid the SMP problems that can
+ * occur when we temporarily put back the original opcode to
+ * single-step, we single-stepped a copy of the instruction.
+ *
+ * This function prepares to resume execution after the single-step.
+ */
+int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ /* FIXME: We don't support abort_xol on powerpc for now */
+
+ /*
+ * On powerpc, except for loads and stores, most instructions
+ * including ones that alter code flow (branches, calls, returns)
+ * are emulated in the kernel. We get here only if the emulation
+ * support doesn't exist and have to fix-up the next instruction
+ * to be executed.
+ */
+ regs->nip = current->utask->vaddr + MAX_UINSN_BYTES;
+ return 0;
+}
+
+/* callback routine for handling exceptions. */
+int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+ struct die_args *args = data;
+ struct pt_regs *regs = args->regs;
+ int ret = NOTIFY_DONE;
+
+ /* We are only interested in userspace traps */
+ if (regs && !user_mode(regs))
+ return NOTIFY_DONE;
+
+ switch (val) {
+ case DIE_BPT:
+ if (uprobe_pre_sstep_notifier(regs))
+ ret = NOTIFY_STOP;
+ break;
+ case DIE_SSTEP:
+ if (uprobe_post_sstep_notifier(regs))
+ ret = NOTIFY_STOP;
+ default:
+ break;
+ }
+ return ret;
+}
+
+/*
+ * This function gets called when XOL instruction either gets trapped or
+ * the thread has a fatal signal, so reset the instruction pointer to its
+ * probed address.
+ */
+void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ /* FIXME: We don't support abort_xol on powerpc for now */
+ return;
+}
+
+/*
+ * See if the instruction can be emulated.
+ * Returns true if instruction was emulated, false otherwise.
+ */
+bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs)
+{
+ int ret;
+ unsigned int insn;
+
+ memcpy(&insn, auprobe->insn, MAX_UINSN_BYTES);
+
+ /*
+ * emulate_step() returns 1 if the insn was successfully emulated.
+ * For all other cases, we need to single-step in hardware.
+ */
+ ret = emulate_step(regs, insn);
+ if (ret > 0)
+ return true;
+
+ return false;
+}
Index: linux-3.5-rc1/arch/powerpc/Kconfig
===================================================================
--- linux-3.5-rc1.orig/arch/powerpc/Kconfig
+++ linux-3.5-rc1/arch/powerpc/Kconfig
@@ -237,6 +237,9 @@ config PPC_OF_PLATFORM_PCI
config ARCH_SUPPORTS_DEBUG_PAGEALLOC
def_bool y

+config ARCH_SUPPORTS_UPROBES
+ def_bool y
+
config PPC_ADV_DEBUG_REGS
bool
depends on 40x || BOOKE

Subject: [tip:perf/core] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

Commit-ID: 7eb9ba5ed312ec6ed9d22259c5da1acb7cf4bd29
Gitweb: http://git.kernel.org/tip/7eb9ba5ed312ec6ed9d22259c5da1acb7cf4bd29
Author: Ananth N Mavinakayanahalli <[email protected]>
AuthorDate: Fri, 8 Jun 2012 15:02:57 +0530
Committer: Ingo Molnar <[email protected]>
CommitDate: Fri, 8 Jun 2012 12:22:27 +0200

uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

On RISC architectures like powerpc, instructions are fixed size.
Instruction analysis on such platforms is just a matter of
(insn % 4). Pass the vaddr at which the uprobe is to be inserted so
that arch_uprobe_analyze_insn() can flag misaligned registration
requests.

Signed-off-by: Ananth N Mavinakaynahalli <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Paul Mackerras <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Srikar Dronamraju <[email protected]>
Cc: Jim Keniston <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/uprobes.h | 2 +-
arch/x86/kernel/uprobes.c | 3 ++-
kernel/events/uprobes.c | 2 +-
3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h
index 1e9bed1..f3971bb 100644
--- a/arch/x86/include/asm/uprobes.h
+++ b/arch/x86/include/asm/uprobes.h
@@ -48,7 +48,7 @@ struct arch_uprobe_task {
#endif
};

-extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm);
+extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long addr);
extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs);
extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk);
diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c
index dc4e910..36fd420 100644
--- a/arch/x86/kernel/uprobes.c
+++ b/arch/x86/kernel/uprobes.c
@@ -409,9 +409,10 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm,
* arch_uprobe_analyze_insn - instruction analysis including validity and fixups.
* @mm: the probed address space.
* @arch_uprobe: the probepoint information.
+ * @addr: virtual address at which to install the probepoint
* Return 0 on success or a -ve number on error.
*/
-int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm)
+int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long addr)
{
int ret;
struct insn insn;
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 8c5e043..b52376d 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -706,7 +706,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
return -EEXIST;

- ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
+ ret = arch_uprobe_analyze_insn(&uprobe->arch, mm, addr);
if (ret)
return ret;

2012-06-11 16:15:08

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

Ananth, Srikar,

I think the patch is correct and I am sorry for nit-picking,
this is really minor.

But,

On 06/08, Ananth N Mavinakayanahalli wrote:
>
> Changes in V2:
> Pass (unsigned long)addr

Well, IMHO, this is confusing.

First of all, why do we have this "addr" or even "vaddr"? It should
not exists. We pass it to copy_insn(), but for what?? copy_insn()
should simply use uprobe->offset, the virtual address for this
particular mapping does not matter at all. I am going to send
the cleanup.

Note also that we should move this !UPROBE_COPY_INSN from
install_breakpoint() to somewhere near alloc_uprobe(). This code
is called only once, it looks a bit strange to use the "random" mm
(the first mm vma_prio_tree_foreach() finds) and its mapping to
verify the insn. In fact this is simply not correct and should be
fixed, note that on x86 arch_uprobe_analyze_insn() checks
mm->context.ia32_compat.

IOW, Perhaps uprobe->offset makes more sense?

> --- linux-3.5-rc1.orig/kernel/events/uprobes.c
> +++ linux-3.5-rc1/kernel/events/uprobes.c
> @@ -697,7 +697,7 @@ install_breakpoint(struct uprobe *uprobe
> if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
> return -EEXIST;
>
> - ret = arch_uprobe_analyze_insn(&uprobe->arch, mm);
> + ret = arch_uprobe_analyze_insn(&uprobe->arch, mm, addr);

Just fyi, this conflicts with
"[PATCH 1/3] uprobes: install_breakpoint() should fail if is_swbp_insn() == T"
I sent, but the conflict is trivial.

Oleg.

2012-06-11 19:11:35

by Oleg Nesterov

[permalink] [raw]
Subject: Q: a_ops->readpage() && struct file

On 06/11, Oleg Nesterov wrote:
>
> Note also that we should move this !UPROBE_COPY_INSN from
> install_breakpoint() to somewhere near alloc_uprobe().

The main problem is, uprobe_register() doesn't have struct file
for read_mapping_page().

Stupid question. I'm afraid the answer is "no" but I'll ask anyway.
Is it safe to pass filp == NULL to mapping->readpage()? In fact
I do not understand why it needs "struct file*" and I do not see
any example of actual usage. Yes, I didn't try to grep very much
and I understand that the filesystem can do something special.
Say it can use file->private_data...

However. There is read_cache_page_gfp() which does use
a_ops->readpage(filp => NULL), and the comment says nothing about
the riskiness.


If not, is there any other way
uprobe_register(struct inode *inode, loff_t offset) can read the
page at this offset?



And btw, read_mapping_page() accepts "void *data". Why? it uses
filler == a_ops->readpage, it shouldn't accept anything but file
pointer?


Please help, I know nothing about vfs.

Oleg.

2012-06-12 17:06:44

by Srikar Dronamraju

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

>
> Well, IMHO, this is confusing.
>
> First of all, why do we have this "addr" or even "vaddr"? It should
> not exists. We pass it to copy_insn(), but for what?? copy_insn()
> should simply use uprobe->offset, the virtual address for this
> particular mapping does not matter at all. I am going to send
> the cleanup.
>

Yes, we can use uprobe->offset instead of vaddr/addr.

> Note also that we should move this !UPROBE_COPY_INSN from
> install_breakpoint() to somewhere near alloc_uprobe(). This code
> is called only once, it looks a bit strange to use the "random" mm
> (the first mm vma_prio_tree_foreach() finds) and its mapping to
> verify the insn. In fact this is simply not correct and should be
> fixed, note that on x86 arch_uprobe_analyze_insn() checks

The reason we "delay" the copy_insn to the first insert is because
we have to get access to mm. For archs like x86, we want to know if the
executable is 32 bit or not (since we have a different valid set of
instructions for 32 bit and 64 bit). So in effect, if we get access to
struct file corresponding to the inode and if the inode corresponds to
32 bit executable file or 64 bit executable file during register, then
we can move it around alloc_uprobe().

2012-06-12 17:46:33

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

On 06/12, Srikar Dronamraju wrote:
> >
> > Note also that we should move this !UPROBE_COPY_INSN from
> > install_breakpoint() to somewhere near alloc_uprobe(). This code
> > is called only once, it looks a bit strange to use the "random" mm
> > (the first mm vma_prio_tree_foreach() finds) and its mapping to
> > verify the insn. In fact this is simply not correct and should be
> > fixed, note that on x86 arch_uprobe_analyze_insn() checks
>
> The reason we "delay" the copy_insn to the first insert is because
> we have to get access to mm. For archs like x86, we want to know if the
> executable is 32 bit or not

Yes. And this is wrong afaics.

Once again. This !UPROBE_COPY_INSN code is called only once, and it
uses the "random" mm. After that install_breakpoint() just calls
set_swbp(another_mm) while the insn can be invalid because
another_mm->ia32_compat != mm->ia32_compat.

> So in effect, if we get access to
> struct file corresponding to the inode and if the inode corresponds to
> 32 bit executable file or 64 bit executable file during register, then
> we can move it around alloc_uprobe().

I don't think this can work. I have another simple fix in mind, I'll
write another email later.

Oleg.

2012-06-13 09:58:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: Q: a_ops->readpage() && struct file

On Mon, 2012-06-11 at 21:09 +0200, Oleg Nesterov wrote:
> Stupid question. I'm afraid the answer is "no" but I'll ask anyway.
> Is it safe to pass filp == NULL to mapping->readpage()? In fact
> I do not understand why it needs "struct file*" and I do not see
> any example of actual usage.

Looking at afs_readpage it looks like its OK to pass in NULL. Same for
nfs_readpage. They use the file, if provided, to avoid some lookups, but
seem to deal with not having it.

This answer by example is of course not authorative nor complete.

2012-06-13 19:18:11

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

On 06/12, Oleg Nesterov wrote:
>
> On 06/12, Srikar Dronamraju wrote:
> > >
> > > Note also that we should move this !UPROBE_COPY_INSN from
> > > install_breakpoint() to somewhere near alloc_uprobe(). This code
> > > is called only once, it looks a bit strange to use the "random" mm
> > > (the first mm vma_prio_tree_foreach() finds) and its mapping to
> > > verify the insn. In fact this is simply not correct and should be
> > > fixed, note that on x86 arch_uprobe_analyze_insn() checks
> >
> > The reason we "delay" the copy_insn to the first insert is because
> > we have to get access to mm. For archs like x86, we want to know if the
> > executable is 32 bit or not
>
> Yes. And this is wrong afaics.
>
> Once again. This !UPROBE_COPY_INSN code is called only once, and it
> uses the "random" mm. After that install_breakpoint() just calls
> set_swbp(another_mm) while the insn can be invalid because
> another_mm->ia32_compat != mm->ia32_compat.
>
> > So in effect, if we get access to
> > struct file corresponding to the inode and if the inode corresponds to
> > 32 bit executable file or 64 bit executable file during register, then
> > we can move it around alloc_uprobe().
>
> I don't think this can work. I have another simple fix in mind, I'll
> write another email later.

For example. Suppose there is some instruction in /lib64/libc.so which
is valid for 64-bit, but not for 32-bit.

Suppose that a 32-bit application does mmap("/lib64/libc.so", PROT_EXEC).

Now. If vma_prio_tree_foreach() finds this 32-bit mm first, uprobe_register()
fails even if there are other 64-bit applications which could be traced.

Or. uprobe_register() succeeds because it finds a 64-bit mm first, and
then that 32-bit application actually executes the invalid insn.

We can move arch_uprobe_analyze_insn() outside of !UPROBE_COPY_INSN block.

Or, perhaps, validate_insn_bits() should call both
validate_insn_32bits() and validate_insn_64bits(), and set the
UPROBE_VALID_IF_32 / UPROBE_VALID_IF_64 flags. install_breakpoint()
should do the additinal check before set_swbp() and verify that
.ia32_compat matches UPROBE_VALID_IF_*.

What do you think?

Oleg.

2012-06-13 19:21:42

by Oleg Nesterov

[permalink] [raw]
Subject: Re: Q: a_ops->readpage() && struct file

On 06/13, Peter Zijlstra wrote:
>
> On Mon, 2012-06-11 at 21:09 +0200, Oleg Nesterov wrote:
> > Stupid question. I'm afraid the answer is "no" but I'll ask anyway.
> > Is it safe to pass filp == NULL to mapping->readpage()? In fact
> > I do not understand why it needs "struct file*" and I do not see
> > any example of actual usage.
>
> Looking at afs_readpage it looks like its OK to pass in NULL. Same for
> nfs_readpage. They use the file, if provided, to avoid some lookups, but
> seem to deal with not having it.

Yes, and reiser4 does the same.

> This answer by example is of course not authorative nor complete.

Yes... perhaps we should simply change this code to use NULL and
collect the bug-reports ;)

Oleg.

2012-06-14 11:49:38

by Srikar Dronamraju

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

* Oleg Nesterov <[email protected]> [2012-06-13 21:15:19]:

> On 06/12, Oleg Nesterov wrote:
> >
> > On 06/12, Srikar Dronamraju wrote:
> > > >
> > > > Note also that we should move this !UPROBE_COPY_INSN from
> > > > install_breakpoint() to somewhere near alloc_uprobe(). This code
> > > > is called only once, it looks a bit strange to use the "random" mm
> > > > (the first mm vma_prio_tree_foreach() finds) and its mapping to
> > > > verify the insn. In fact this is simply not correct and should be
> > > > fixed, note that on x86 arch_uprobe_analyze_insn() checks
> > >
> > > The reason we "delay" the copy_insn to the first insert is because
> > > we have to get access to mm. For archs like x86, we want to know if the
> > > executable is 32 bit or not
> >
> > Yes. And this is wrong afaics.
> >
> > Once again. This !UPROBE_COPY_INSN code is called only once, and it
> > uses the "random" mm. After that install_breakpoint() just calls
> > set_swbp(another_mm) while the insn can be invalid because
> > another_mm->ia32_compat != mm->ia32_compat.
> >
> > > So in effect, if we get access to
> > > struct file corresponding to the inode and if the inode corresponds to
> > > 32 bit executable file or 64 bit executable file during register, then
> > > we can move it around alloc_uprobe().
> >
> > I don't think this can work. I have another simple fix in mind, I'll
> > write another email later.
>
> For example. Suppose there is some instruction in /lib64/libc.so which
> is valid for 64-bit, but not for 32-bit.
>
> Suppose that a 32-bit application does mmap("/lib64/libc.so", PROT_EXEC).
>

How correct is it to have a 32 bit binary link to a 64 bit binary/library?
what if the 64 bit binary/executable were to make a jump to a 64 bit
address?


> Now. If vma_prio_tree_foreach() finds this 32-bit mm first, uprobe_register()
> fails even if there are other 64-bit applications which could be traced.
>
> Or. uprobe_register() succeeds because it finds a 64-bit mm first, and
> then that 32-bit application actually executes the invalid insn.
>
> We can move arch_uprobe_analyze_insn() outside of !UPROBE_COPY_INSN block.
>
> Or, perhaps, validate_insn_bits() should call both
> validate_insn_32bits() and validate_insn_64bits(), and set the
> UPROBE_VALID_IF_32 / UPROBE_VALID_IF_64 flags. install_breakpoint()
> should do the additinal check before set_swbp() and verify that
> .ia32_compat matches UPROBE_VALID_IF_*.
>

> What do you think?
>

Lets say we do find a 32 bit app and 64 bit app using the same library
and the underlying instruction is valid for tracing in 64 bit and not 32
bit. So when we are registering, and failed to insert a breakpoint for
the 32 bit app, should we just bail out or should we return a failure?

I would probably prefer to read the underlying file something similar to
what exec does and based on the magic decipher if we should verify for
32 bit instructions or 64 bit instructions.

I didnt find a good way to read the first few bytes just by looking at
the inode. So one option is to read the underlying file at the first
insertion (i.e similar to what we do now .. but instead of depending on
ia32_compat of a random mm, depend on the exec magic)

Better option would be read the underlying file at the register time and
return an error even without attempting an insert if the instruction
wasnt valid. But I am still ignorant on how to do this because we need a
struct file to do this.

--
Thanks and Regards
Srikar

2012-06-14 18:21:53

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

On 06/14, Srikar Dronamraju wrote:
>
> * Oleg Nesterov <[email protected]> [2012-06-13 21:15:19]:
>
> > For example. Suppose there is some instruction in /lib64/libc.so which
> > is valid for 64-bit, but not for 32-bit.
> >
> > Suppose that a 32-bit application does mmap("/lib64/libc.so", PROT_EXEC).
> >
>
> How correct is it to have a 32 bit binary link to a 64 bit binary/library?

No, I didn't mean this. I guess you misunderstood my point, see below.

> > Now. If vma_prio_tree_foreach() finds this 32-bit mm first, uprobe_register()
> > fails even if there are other 64-bit applications which could be traced.
> >
> > Or. uprobe_register() succeeds because it finds a 64-bit mm first, and
> > then that 32-bit application actually executes the invalid insn.
> >
> > We can move arch_uprobe_analyze_insn() outside of !UPROBE_COPY_INSN block.
> >
> > Or, perhaps, validate_insn_bits() should call both
> > validate_insn_32bits() and validate_insn_64bits(), and set the
> > UPROBE_VALID_IF_32 / UPROBE_VALID_IF_64 flags. install_breakpoint()
> > should do the additinal check before set_swbp() and verify that
> > .ia32_compat matches UPROBE_VALID_IF_*.
> >
>
> > What do you think?
> >
>
> Lets say we do find a 32 bit app and 64 bit app using the same library
> and the underlying instruction is valid for tracing in 64 bit and not 32
> bit. So when we are registering, and failed to insert a breakpoint for
> the 32 bit app, should we just bail out or should we return a failure?

I do not really know, I tend to think we should not fail. But this is
another story...

Look. Suppose that a 32-bit app starts after uprobe_register() succeeds.
In this case we have no option, uprobe_mmap()->install_breakpoint()
should "silently" fail. Currently it doesn't, this is one of the reasons
why I think the validation logic is wrong.

And. if install_breakpoint() can fail later anyway (in this case), then
I think uprobe_register() should not fail.

But probably this needs more discussion.


> I would probably prefer to read the underlying file something similar to
> what exec does and based on the magic decipher if we should verify for
> 32 bit instructions or 64 bit instructions.

But this can't protect from the malicious user who does
mmap(64-bit-code, PROT_EXEC) from a 32-bit app, and this can confuse
uprobes even if that 32-bit app never tries to actually execute that
64-bit-code.

That is why I think we need the additional (and arch-dependant) check
before every set_swbp(), but arch_uprobe_analyze_insn/etc should not
depend on task/mm/vaddr/whatever.

Oleg.

2012-06-15 12:37:21

by Srikar Dronamraju

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

> >
> > Lets say we do find a 32 bit app and 64 bit app using the same library
> > and the underlying instruction is valid for tracing in 64 bit and not 32
> > bit. So when we are registering, and failed to insert a breakpoint for
> > the 32 bit app, should we just bail out or should we return a failure?
>
> I do not really know, I tend to think we should not fail. But this is
> another story...
>
> Look. Suppose that a 32-bit app starts after uprobe_register() succeeds.
> In this case we have no option, uprobe_mmap()->install_breakpoint()
> should "silently" fail. Currently it doesn't, this is one of the reasons
> why I think the validation logic is wrong.
>
> And. if install_breakpoint() can fail later anyway (in this case), then
> I think uprobe_register() should not fail.
>
> But probably this needs more discussion.
>
>
> > I would probably prefer to read the underlying file something similar to
> > what exec does and based on the magic decipher if we should verify for
> > 32 bit instructions or 64 bit instructions.
>
> But this can't protect from the malicious user who does
> mmap(64-bit-code, PROT_EXEC) from a 32-bit app, and this can confuse
> uprobes even if that 32-bit app never tries to actually execute that
> 64-bit-code.
>

So if we read just after we allocate uprobe struct but before
probe insertion, then we dont need to check this for each process.

So if the library was 64 bit mapped in 32 bit process and has a valid
instruction for 64 bit, then we just check for valid 64 bit instructions
and allow insertion of the breakpoint even in the 32 bit process.

So in this case, the behaviour of such processes would be similar with
or without breakpoints.

> That is why I think we need the additional (and arch-dependant) check
> before every set_swbp(), but arch_uprobe_analyze_insn/etc should not
> depend on task/mm/vaddr/whatever.
>

Here is a very crude implementation of the same.
Also this depends on read_mapping_page taking NULL as an valid argument
for file. As a side-effect we can do away with UPROBE_COPY_INSN which
was set and read at just one place.

1. Move the copy_insn to just after alloc_uprobe.
2. Assume that copy_insn can work without struct file
3. Read the elfhdr for the file.
4. Pass the elfhdr to the arch specific analyze insn
5. Move the analyze instruction to before the actual probe insertion.

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 94c46af..41effbc 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -32,6 +32,7 @@
#include <linux/swap.h> /* try_to_free_swap */
#include <linux/ptrace.h> /* user_enable_single_step */
#include <linux/kdebug.h> /* notifier mechanism */
+#include <linux/elf.h>

#include <linux/uprobes.h>

@@ -578,16 +579,14 @@ static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc)
}

static int
-__copy_insn(struct address_space *mapping, struct file *filp, char *insn,
- unsigned long nbytes, loff_t offset)
+__copy_insn(struct address_space *mapping, char *insn, unsigned long nbytes,
+ loff_t offset)
{
struct page *page;
void *vaddr;
unsigned long off;
pgoff_t idx;

- if (!filp)
- return -EINVAL;
if (!mapping->a_ops->readpage)
return -EIO;

@@ -598,7 +597,7 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
* Ensure that the page that has the original instruction is
* populated and in page-cache.
*/
- page = read_mapping_page(mapping, idx, filp);
+ page = read_mapping_page(mapping, idx, NULL);
if (IS_ERR(page))
return PTR_ERR(page);

@@ -610,7 +609,7 @@ __copy_insn(struct address_space *mapping, struct file *filp, char *insn,
return 0;
}

-static int copy_insn(struct uprobe *uprobe, struct file *filp)
+static int copy_insn(struct uprobe *uprobe)
{
struct address_space *mapping;
unsigned long nbytes;
@@ -627,13 +626,13 @@ static int copy_insn(struct uprobe *uprobe, struct file *filp)

/* Instruction at the page-boundary; copy bytes in second page */
if (nbytes < bytes) {
- int err = __copy_insn(mapping, filp, uprobe->arch.insn + nbytes,
+ int err = __copy_insn(mapping, uprobe->arch.insn + nbytes,
bytes - nbytes, uprobe->offset + nbytes);
if (err)
return err;
bytes = nbytes;
}
- return __copy_insn(mapping, filp, uprobe->arch.insn, bytes, uprobe->offset);
+ return __copy_insn(mapping, uprobe->arch.insn, bytes, uprobe->offset);
}

/*
@@ -675,25 +674,6 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm,
if (!uprobe->consumers)
return -EEXIST;

- if (!(uprobe->flags & UPROBE_COPY_INSN)) {
- ret = copy_insn(uprobe, vma->vm_file);
- if (ret)
- return ret;
-
- if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn))
- return -ENOTSUPP;
-
- ret = arch_uprobe_analyze_insn(&uprobe->arch, mm, vaddr);
- if (ret)
- return ret;
-
- /* write_opcode() assumes we don't cross page boundary */
- BUG_ON((uprobe->offset & ~PAGE_MASK) +
- UPROBE_SWBP_INSN_SIZE > PAGE_SIZE);
-
- uprobe->flags |= UPROBE_COPY_INSN;
- }
-
/*
* Ideally, should be updating the probe count after the breakpoint
* has been successfully inserted. However a thread could hit the
@@ -897,6 +877,7 @@ static void __uprobe_unregister(struct uprobe *uprobe)
*/
int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc)
{
+ struct elfhdr elf_ex;
struct uprobe *uprobe;
int ret;

@@ -906,10 +887,29 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
if (offset > i_size_read(inode))
return -EINVAL;

- ret = 0;
+ if ((offset & ~PAGE_MASK) + UPROBE_SWBP_INSN_SIZE > PAGE_SIZE)
+ return -EINVAL;
+
mutex_lock(uprobes_hash(inode));
uprobe = alloc_uprobe(inode, offset);

+ ret = copy_insn(uprobe);
+ if (ret)
+ goto out;
+
+ if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn)) {
+ ret = -ENOTSUPP;
+ goto out;
+ }
+
+ ret = __copy_insn(uprobe->inode->i_mapping, &elf_ex, sizeof(elf_ex), 0);
+ if (ret)
+ goto out;
+
+ ret = arch_uprobe_analyze_insn(&uprobe->arch, uprobe->offset, &elf_ex);
+ if (ret)
+ goto out;
+
if (uprobe && !consumer_add(uprobe, uc)) {
ret = __uprobe_register(uprobe);
if (ret) {
@@ -920,6 +920,7 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *
}
}

+out:
mutex_unlock(uprobes_hash(inode));
put_uprobe(uprobe);

2012-06-16 18:08:10

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

Srikar,

To clarify: I am not arguing, I am asking because I know nothing about
asm/opcodes/etc.

On 06/15, Srikar Dronamraju wrote:
>
> > But this can't protect from the malicious user who does
> > mmap(64-bit-code, PROT_EXEC) from a 32-bit app, and this can confuse
> > uprobes even if that 32-bit app never tries to actually execute that
> > 64-bit-code.
> >
>
> So if we read just after we allocate uprobe struct but before
> probe insertion, then we dont need to check this for each process.
>
> So if the library was 64 bit mapped in 32 bit process and has a valid
> instruction for 64 bit, then we just check for valid 64 bit instructions
> and allow insertion of the breakpoint even in the 32 bit process.

And what happens if this insn is not valid from validate_insn_32bits()
pov and that 32-bit application tries to execute it? Or vice versa,
see below.

> Here is a very crude implementation of the same.
> Also this depends on read_mapping_page taking NULL as an valid argument
> for file. As a side-effect we can do away with UPROBE_COPY_INSN which
> was set and read at just one place.
>
> 1. Move the copy_insn to just after alloc_uprobe.
> 2. Assume that copy_insn can work without struct file
> ...
> 5. Move the analyze instruction to before the actual probe insertion.

OK, this is what I think we should do anyway (at least try to do).

> 3. Read the elfhdr for the file.
> 4. Pass the elfhdr to the arch specific analyze insn

This assumes that everything is elf. Why? An application is free to
create a file in any format and do mmap(PROT_EXEC).

But OK, probably we can restrict uprobe_register() to work only with
elf files which do not mix 32/64 bits.




My concern is, are you sure an evil user can't confuse uprobes and
do something bad?

Just to explain what I mean. For example, we certainly do not want
to allow to probe the "syscall" insn, at least with the current
implementation. So I assume that validate_insn_64bits("syscall")
must fail.

Are you sure that validate_insn_32bits("syscall") will fail too?

Of course, I am not asking about "syscall" in particular. In general,
suppose that, say, validate_insn_64bits() returns true. Are you sure
this insn can't do something different and harmful if it is executed
by __USER32_CS task?

Oleg.

2012-06-18 12:11:27

by Srikar Dronamraju

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

* Oleg Nesterov <[email protected]> [2012-06-16 20:05:55]:

> Srikar,
>
> To clarify: I am not arguing, I am asking because I know nothing about
> asm/opcodes/etc.

Certainly not, I think you are talking about a valid concern.

>
> On 06/15, Srikar Dronamraju wrote:
> >
> > > But this can't protect from the malicious user who does
> > > mmap(64-bit-code, PROT_EXEC) from a 32-bit app, and this can confuse
> > > uprobes even if that 32-bit app never tries to actually execute that
> > > 64-bit-code.
> > >
> >
> > So if we read just after we allocate uprobe struct but before
> > probe insertion, then we dont need to check this for each process.
> >
> > So if the library was 64 bit mapped in 32 bit process and has a valid
> > instruction for 64 bit, then we just check for valid 64 bit instructions
> > and allow insertion of the breakpoint even in the 32 bit process.
>
> And what happens if this insn is not valid from validate_insn_32bits()
> pov and that 32-bit application tries to execute it? Or vice versa,
> see below.
>
> > Here is a very crude implementation of the same.
> > Also this depends on read_mapping_page taking NULL as an valid argument
> > for file. As a side-effect we can do away with UPROBE_COPY_INSN which
> > was set and read at just one place.
> >
> > 1. Move the copy_insn to just after alloc_uprobe.
> > 2. Assume that copy_insn can work without struct file
> > ...
> > 5. Move the analyze instruction to before the actual probe insertion.
>
> OK, this is what I think we should do anyway (at least try to do).
>
> > 3. Read the elfhdr for the file.
> > 4. Pass the elfhdr to the arch specific analyze insn
>
> This assumes that everything is elf. Why? An application is free to
> create a file in any format and do mmap(PROT_EXEC).
>
> But OK, probably we can restrict uprobe_register() to work only with
> elf files which do not mix 32/64 bits.
>

We should probably be able to fix file type part, As I said it was just
a thought.

>
> My concern is, are you sure an evil user can't confuse uprobes and
> do something bad?
>
> Just to explain what I mean. For example, we certainly do not want
> to allow to probe the "syscall" insn, at least with the current
> implementation. So I assume that validate_insn_64bits("syscall")
> must fail.
>
> Are you sure that validate_insn_32bits("syscall") will fail too?
>
> Of course, I am not asking about "syscall" in particular. In general,
> suppose that, say, validate_insn_64bits() returns true. Are you sure
> this insn can't do something different and harmful if it is executed
> by __USER32_CS task?
>

validate_insn_64bits can return fail for two cases.
1. Few opcodes that uprobes refuses to place probes.
2. opcodes that are invalid from a 64 perspective.

validate_insn_32bits() can return fail for similar reasons.

The first set is a common set between validate_insn_64bits /
validate_insn_32bits. This includes the syscall, lock prefix, etc.

Coming to the second set, there can be an instruction that is valid for
64 bit and not valid for 32 bit.

If the instruction is valid for 32 bit set but invalid instruction for
64 bit, and is part of a 32 bit executable file but was mapped by a 64
bit process. We would allow it to be probed since we only check for 32
bit set. [Assuming it runs till a breakpoint hit;] I assume singlestep
should generate a SIGILL signal since its not a valid 64 bit
instruction. However this behaviour is on par with the behaviour if the
probe was not hit too. i.e Once this invalid instruction was executed,
It would have generated SIGILL. The same should hold true for a 32 bit
invalid instruction in a 64 bit executable mapped into 32 bit process.

Please do let me know if my understanding is incorrect or if there are
loopholes

Again, this is all dependent on the ability to detect the executable
type from the inode. If there is any case we cant detect the executable
type, I think we should just skip this discussion and go with your
suggestion.

--
Thanks and Regards
Srikar

2012-06-20 17:18:59

by Oleg Nesterov

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] uprobes: Pass probed vaddr to arch_uprobe_analyze_insn()

On 06/18, Srikar Dronamraju wrote:
>
> > My concern is, are you sure an evil user can't confuse uprobes and
> > do something bad?
> >
> > Just to explain what I mean. For example, we certainly do not want
> > to allow to probe the "syscall" insn, at least with the current
> > implementation. So I assume that validate_insn_64bits("syscall")
> > must fail.
> >
> > Are you sure that validate_insn_32bits("syscall") will fail too?
> >
> > Of course, I am not asking about "syscall" in particular. In general,
> > suppose that, say, validate_insn_64bits() returns true. Are you sure
> > this insn can't do something different and harmful if it is executed
> > by __USER32_CS task?
> >
>
> validate_insn_64bits can return fail for two cases.
> 1. Few opcodes that uprobes refuses to place probes.
> 2. opcodes that are invalid from a 64 perspective.
>
> validate_insn_32bits() can return fail for similar reasons.
>
> The first set is a common set between validate_insn_64bits /
> validate_insn_32bits. This includes the syscall, lock prefix, etc.
>
> Coming to the second set, there can be an instruction that is valid for
> 64 bit and not valid for 32 bit.
>
> If the instruction is valid for 32 bit set but invalid instruction for
> 64 bit, and is part of a 32 bit executable file but was mapped by a 64
> bit process. We would allow it to be probed since we only check for 32
> bit set. [Assuming it runs till a breakpoint hit;] I assume singlestep
> should generate a SIGILL signal since its not a valid 64 bit
> instruction. However this behaviour is on par with the behaviour if the
> probe was not hit too. i.e Once this invalid instruction was executed,
> It would have generated SIGILL. The same should hold true for a 32 bit
> invalid instruction in a 64 bit executable mapped into 32 bit process.

SIGILL (invalid insn) is fine, I was worried about the possibility
to allow to execute the valid (from CPU pov) but "dangerous" (from
uprobes pov) insn.

> Please do let me know if my understanding is incorrect or if there are
> loopholes

Well, you understand this indefinitely better than me ;) If you do not
see any hole then everything should be fine, I think.

> Again, this is all dependent on the ability to detect the executable
> type from the inode.

Yes... I am still not sure about this. But again, I am not arguing,
just I do not know.

Oleg.