2009-03-06 15:35:28

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 0/4] Text edit lock and atomic text_poke()

Hi,

Here is a series of patches which introduce text_mutex for protecting
editing kernel text from each other subsystems, and make text_poke()
atomic by using fixmap.

BTW,

> Paravirt and alternatives are always done when SMP is inactive, so there is no
> need to use locks.

Mathieu, I'm not sure that means. alternatives will be called from module
init code and other place where the system has already been running multi-
-threads(and they use smp_alt mutex). So, is it possible that those functions
will sleep or yield to another process?

Anyway, I added a new patch which locks text_mutex in alternative_smp_*.

Thank you,

arch/x86/include/asm/fixmap.h | 2 ++
arch/x86/kernel/alternative.c | 29 ++++++++++++++++++++---------
include/linux/memory.h | 6 ++++++
kernel/kprobes.c | 15 +++++++++++++--
mm/memory.c | 9 +++++++++
5 files changed, 50 insertions(+), 11 deletions(-)

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]


2009-03-06 15:36:32

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 1/4] Text Edit Lock - Architecture Independent Code (v2)

From: Mathieu Desnoyers <[email protected]>

This is an architecture independant synchronization around kernel text
modifications through use of a global mutex.

A mutex has been chosen so that kprobes, the main user of this, can sleep during
memory allocation between the memory read of the instructions it must replace
and the memory write of the breakpoint.

Other user of this interface: immediate values.

Paravirt and alternatives are always done when SMP is inactive, so there is no
need to use locks.

Changelog :
Export text_mutex directly.

Signed-off-by: Mathieu Desnoyers <[email protected]>
CC: Andi Kleen <[email protected]>
CC: Ingo Molnar <[email protected]>
---
include/linux/memory.h | 6 ++++++
mm/memory.c | 9 +++++++++
2 files changed, 15 insertions(+)

Index: 2.6.29-rc7/include/linux/memory.h
===================================================================
--- 2.6.29-rc7.orig/include/linux/memory.h
+++ 2.6.29-rc7/include/linux/memory.h
@@ -99,4 +99,10 @@ enum mem_add_context { BOOT, HOTPLUG };
#define hotplug_memory_notifier(fn, pri) do { } while (0)
#endif

+/*
+ * Kernel text modification mutex, used for code patching. Users of this lock
+ * can sleep.
+ */
+extern struct mutex text_mutex;
+
#endif /* _LINUX_MEMORY_H_ */
Index: 2.6.29-rc7/mm/memory.c
===================================================================
--- 2.6.29-rc7.orig/mm/memory.c
+++ 2.6.29-rc7/mm/memory.c
@@ -48,6 +48,8 @@
#include <linux/rmap.h>
#include <linux/module.h>
#include <linux/delayacct.h>
+#include <linux/kprobes.h>
+#include <linux/mutex.h>
#include <linux/init.h>
#include <linux/writeback.h>
#include <linux/memcontrol.h>
@@ -99,6 +101,13 @@ int randomize_va_space __read_mostly =
2;
#endif

+/*
+ * mutex protecting text section modification (dynamic code patching).
+ * some users need to sleep (allocating memory...) while they hold this lock.
+ */
+DEFINE_MUTEX(text_mutex);
+EXPORT_SYMBOL_GPL(text_mutex);
+
static int __init disable_randmaps(char *s)
{
randomize_va_space = 0;
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 15:37:07

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 2/4]Text Edit Lock - kprobes architecture independent support (v3)

From: Mathieu Desnoyers <[email protected]>

Use the mutual exclusion provided by the text edit lock in the kprobes code. It
allows coherent manipulation of the kernel code by other subsystems.

Changelog:

Move the kernel_text_lock/unlock out of the for loops.
Use text_mutex directly instead of a function.
Remove whitespace modifications.

(note : kprobes_mutex is always taken outside of text_mutex)

Signed-off-by: Mathieu Desnoyers <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
CC: [email protected]
CC: [email protected]
CC: [email protected]
CC: Roel Kluin <[email protected]>
---
kernel/kprobes.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

Index: 2.6.29-rc7/kernel/kprobes.c
===================================================================
--- 2.6.29-rc7.orig/kernel/kprobes.c
+++ 2.6.29-rc7/kernel/kprobes.c
@@ -43,6 +43,7 @@
#include <linux/seq_file.h>
#include <linux/debugfs.h>
#include <linux/kdebug.h>
+#include <linux/memory.h>

#include <asm-generic/sections.h>
#include <asm/cacheflush.h>
@@ -699,9 +700,10 @@ int __kprobes register_kprobe(struct kpr
goto out;
}

+ mutex_lock(&text_mutex);
ret = arch_prepare_kprobe(p);
if (ret)
- goto out;
+ goto out_unlock_text;

INIT_HLIST_NODE(&p->hlist);
hlist_add_head_rcu(&p->hlist,
@@ -710,6 +712,8 @@ int __kprobes register_kprobe(struct kpr
if (kprobe_enabled)
arch_arm_kprobe(p);

+out_unlock_text:
+ mutex_unlock(&text_mutex);
out:
mutex_unlock(&kprobe_mutex);

@@ -746,8 +750,11 @@ valid_p:
* enabled and not gone - otherwise, the breakpoint would
* already have been removed. We save on flushing icache.
*/
- if (kprobe_enabled && !kprobe_gone(old_p))
+ if (kprobe_enabled && !kprobe_gone(old_p)) {
+ mutex_lock(&text_mutex);
arch_disarm_kprobe(p);
+ mutex_unlock(&text_mutex);
+ }
hlist_del_rcu(&old_p->hlist);
} else {
if (p->break_handler && !kprobe_gone(p))
@@ -1280,12 +1287,14 @@ static void __kprobes enable_all_kprobes
if (kprobe_enabled)
goto already_enabled;

+ mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist)
if (!kprobe_gone(p))
arch_arm_kprobe(p);
}
+ mutex_unlock(&text_mutex);

kprobe_enabled = true;
printk(KERN_INFO "Kprobes globally enabled\n");
@@ -1310,6 +1319,7 @@ static void __kprobes disable_all_kprobe

kprobe_enabled = false;
printk(KERN_INFO "Kprobes globally disabled\n");
+ mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
@@ -1318,6 +1328,7 @@ static void __kprobes disable_all_kprobe
}
}

+ mutex_unlock(&text_mutex);
mutex_unlock(&kprobe_mutex);
/* Allow all currently running kprobes to complete */
synchronize_sched();
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 15:37:46

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support

Use the mutual exclusion provided by the text edit lock in alternatives code.
Since alternative_smp_* will be called from module init code, etc,
we'd better protect it from other subsystems.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/kernel/alternative.c | 5 +++++
1 file changed, 5 insertions(+)

Index: 2.6.29-rc7/arch/x86/kernel/alternative.c
===================================================================
--- 2.6.29-rc7.orig/arch/x86/kernel/alternative.c
+++ 2.6.29-rc7/arch/x86/kernel/alternative.c
@@ -5,6 +5,7 @@
#include <linux/kprobes.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
+#include <linux/memory.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **s
{
u8 **ptr;

+ mutex_lock(&text_mutex);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
@@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **s
/* turn DS segment override prefix into lock prefix */
text_poke(*ptr, ((unsigned char []){0xf0}), 1);
};
+ mutex_unlock(&text_mutex);
}

static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
@@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 *
if (noreplace_smp)
return;

+ mutex_lock(&text_mutex);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
@@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 *
/* turn lock prefix into DS segment override prefix */
text_poke(*ptr, ((unsigned char []){0x3E}), 1);
};
+ mutex_unlock(&text_mutex);
}

struct smp_alt_module {
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 15:38:21

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 4/4] Atomic text_poke() with fixmap

Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fixmap.h | 2 ++
arch/x86/kernel/alternative.c | 24 +++++++++++++++---------
2 files changed, 17 insertions(+), 9 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -13,7 +13,9 @@
#include <asm/nmi.h>
#include <asm/vsyscall.h>
#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
#include <asm/io.h>
+#include <asm/fixmap.h>

#define MAX_PATCH_LEN (255-1)

@@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
* It means the size must be writable atomically and the address must be aligned
* in a way that permits an atomic write. It also makes sure we fit on a single
* page.
+ *
+ * Note: Must be called under text_mutex.
*/
void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
+ unsigned long flags;
char *vaddr;
- int nr_pages = 2;
struct page *pages[2];
int i;

- might_sleep();
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
- if (!pages[1])
- nr_pages = 1;
- vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- BUG_ON(!vaddr);
- local_irq_disable();
+ set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
+ if (pages[1])
+ set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
+ vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
+ local_irq_save(flags);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- local_irq_enable();
- vunmap(vaddr);
+ local_irq_restore(flags);
+ clear_fixmap(FIX_TEXT_POKE0);
+ if (pages[1])
+ clear_fixmap(FIX_TEXT_POKE1);
+ local_flush_tlb();
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
===================================================================
--- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
+++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
@@ -111,6 +111,8 @@ enum fixed_addresses {
#ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
#endif
+ FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
+ FIX_TEXT_POKE1,
__end_of_permanent_fixed_addresses,
#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
FIX_OHCI1394_BASE,
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 18:09:27

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 0/4] Text edit lock and atomic text_poke()

* Masami Hiramatsu ([email protected]) wrote:
> Hi,
>
> Here is a series of patches which introduce text_mutex for protecting
> editing kernel text from each other subsystems, and make text_poke()
> atomic by using fixmap.
>
> BTW,
>
> > Paravirt and alternatives are always done when SMP is inactive, so there is no
> > need to use locks.
>
> Mathieu, I'm not sure that means. alternatives will be called from module
> init code and other place where the system has already been running multi-
> -threads(and they use smp_alt mutex). So, is it possible that those functions
> will sleep or yield to another process?
>

As I remember, module code uses text_poke_early when modifying the
module text _before_ the module is made available to the rest of the
kernel. Therefore, it behaves as if it was in a UP system, and that's
ok, because only one CPU has to touch the text. Therefore, there is no
need to use text_poke, and therefore no need to take the text mutex.

Or maybe has it changed ? But I doubt so.

Mathieu

> Anyway, I added a new patch which locks text_mutex in alternative_smp_*.
>
> Thank you,
>
> arch/x86/include/asm/fixmap.h | 2 ++
> arch/x86/kernel/alternative.c | 29 ++++++++++++++++++++---------
> include/linux/memory.h | 6 ++++++
> kernel/kprobes.c | 15 +++++++++++++--
> mm/memory.c | 9 +++++++++
> 5 files changed, 50 insertions(+), 11 deletions(-)
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-06 18:12:32

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH -tip 0/4] Text edit lock and atomic text_poke()


On Fri, 6 Mar 2009, Mathieu Desnoyers wrote:

> * Masami Hiramatsu ([email protected]) wrote:
> > Hi,
> >
> > Here is a series of patches which introduce text_mutex for protecting
> > editing kernel text from each other subsystems, and make text_poke()
> > atomic by using fixmap.
> >
> > BTW,
> >
> > > Paravirt and alternatives are always done when SMP is inactive, so there is no
> > > need to use locks.
> >
> > Mathieu, I'm not sure that means. alternatives will be called from module
> > init code and other place where the system has already been running multi-
> > -threads(and they use smp_alt mutex). So, is it possible that those functions
> > will sleep or yield to another process?
> >
>
> As I remember, module code uses text_poke_early when modifying the
> module text _before_ the module is made available to the rest of the
> kernel. Therefore, it behaves as if it was in a UP system, and that's
> ok, because only one CPU has to touch the text. Therefore, there is no
> need to use text_poke, and therefore no need to take the text mutex.
>
> Or maybe has it changed ? But I doubt so.
>

That is correct. Currently ftrace simply does a probe_kernel_write on
module code to convert the calls to mcount into nops. This is done before
the module is available, and does not require stop_machine nor break
points. Since modules are not included in the DEBUG_RODATA there's no need
to play games with the page tables either.

-- Steve

2009-03-06 18:16:57

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support

* Masami Hiramatsu ([email protected]) wrote:
> Use the mutual exclusion provided by the text edit lock in alternatives code.
> Since alternative_smp_* will be called from module init code, etc,
> we'd better protect it from other subsystems.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
> arch/x86/kernel/alternative.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> Index: 2.6.29-rc7/arch/x86/kernel/alternative.c
> ===================================================================
> --- 2.6.29-rc7.orig/arch/x86/kernel/alternative.c
> +++ 2.6.29-rc7/arch/x86/kernel/alternative.c
> @@ -5,6 +5,7 @@
> #include <linux/kprobes.h>
> #include <linux/mm.h>
> #include <linux/vmalloc.h>
> +#include <linux/memory.h>
> #include <asm/alternative.h>
> #include <asm/sections.h>
> #include <asm/pgtable.h>
> @@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **s
> {
> u8 **ptr;
>
> + mutex_lock(&text_mutex);
> for (ptr = start; ptr < end; ptr++) {
> if (*ptr < text)
> continue;
> @@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **s
> /* turn DS segment override prefix into lock prefix */
> text_poke(*ptr, ((unsigned char []){0xf0}), 1);
> };
> + mutex_unlock(&text_mutex);
> }
>
> static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
> @@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 *
> if (noreplace_smp)
> return;
>
> + mutex_lock(&text_mutex);
> for (ptr = start; ptr < end; ptr++) {
> if (*ptr < text)
> continue;
> @@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 *
> /* turn lock prefix into DS segment override prefix */
> text_poke(*ptr, ((unsigned char []){0x3E}), 1);
> };
> + mutex_unlock(&text_mutex);

And for these cases, the "alternatives_smp_lock" and
"alternatives_smp_unlock", the system is running as uniprocessor.
Therefore it's enough to disable interrupts within text_poke to make it
work correctly.

Mathieu

> }
>
> struct smp_alt_module {
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-06 18:19:18

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap

* Masami Hiramatsu ([email protected]) wrote:
> Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
> and delayed unmapping.
>
> At the result of above change, text_poke() becomes atomic and can be called
> from stop_machine() etc.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Acked-by: Mathieu Desnoyers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
> arch/x86/include/asm/fixmap.h | 2 ++
> arch/x86/kernel/alternative.c | 24 +++++++++++++++---------
> 2 files changed, 17 insertions(+), 9 deletions(-)
>
> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> @@ -13,7 +13,9 @@
> #include <asm/nmi.h>
> #include <asm/vsyscall.h>
> #include <asm/cacheflush.h>
> +#include <asm/tlbflush.h>
> #include <asm/io.h>
> +#include <asm/fixmap.h>
>
> #define MAX_PATCH_LEN (255-1)
>
> @@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
> * It means the size must be writable atomically and the address must be aligned
> * in a way that permits an atomic write. It also makes sure we fit on a single
> * page.
> + *
> + * Note: Must be called under text_mutex.
> */
> void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> {
> + unsigned long flags;
> char *vaddr;
> - int nr_pages = 2;
> struct page *pages[2];
> int i;
>
> - might_sleep();
> if (!core_kernel_text((unsigned long)addr)) {
> pages[0] = vmalloc_to_page(addr);
> pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> pages[1] = virt_to_page(addr + PAGE_SIZE);
> }
> BUG_ON(!pages[0]);
> - if (!pages[1])
> - nr_pages = 1;
> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> - BUG_ON(!vaddr);
> - local_irq_disable();
> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));

Can the set_fixmap/clear_fixmap/local_flush_tlb be called within
local_irq_save ? If yes, that would be better, especially for the SMP
alternatives code, which would rely on interrupt disabling in text_poke
for consistency (the mutex is not needed there).

Mathieu

> + if (pages[1])
> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> + local_irq_save(flags);
> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> - local_irq_enable();
> - vunmap(vaddr);
> + local_irq_restore(flags);
> + clear_fixmap(FIX_TEXT_POKE0);
> + if (pages[1])
> + clear_fixmap(FIX_TEXT_POKE1);
> + local_flush_tlb();
> sync_core();
> /* Could also do a CLFLUSH here to speed up CPU recovery; but
> that causes hangs on some VIA CPUs. */
> Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
> +++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
> @@ -111,6 +111,8 @@ enum fixed_addresses {
> #ifdef CONFIG_PARAVIRT
> FIX_PARAVIRT_BOOTMAP,
> #endif
> + FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
> + FIX_TEXT_POKE1,
> __end_of_permanent_fixed_addresses,
> #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
> FIX_OHCI1394_BASE,
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-06 18:19:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap


* Mathieu Desnoyers <[email protected]> wrote:

> * Masami Hiramatsu ([email protected]) wrote:
> > Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
> > and delayed unmapping.
> >
> > At the result of above change, text_poke() becomes atomic and can be called
> > from stop_machine() etc.
> >
> > Signed-off-by: Masami Hiramatsu <[email protected]>
> > Acked-by: Mathieu Desnoyers <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > ---
> > arch/x86/include/asm/fixmap.h | 2 ++
> > arch/x86/kernel/alternative.c | 24 +++++++++++++++---------
> > 2 files changed, 17 insertions(+), 9 deletions(-)
> >
> > Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> > ===================================================================
> > --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> > +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> > @@ -13,7 +13,9 @@
> > #include <asm/nmi.h>
> > #include <asm/vsyscall.h>
> > #include <asm/cacheflush.h>
> > +#include <asm/tlbflush.h>
> > #include <asm/io.h>
> > +#include <asm/fixmap.h>
> >
> > #define MAX_PATCH_LEN (255-1)
> >
> > @@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
> > * It means the size must be writable atomically and the address must be aligned
> > * in a way that permits an atomic write. It also makes sure we fit on a single
> > * page.
> > + *
> > + * Note: Must be called under text_mutex.
> > */
> > void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> > {
> > + unsigned long flags;
> > char *vaddr;
> > - int nr_pages = 2;
> > struct page *pages[2];
> > int i;
> >
> > - might_sleep();
> > if (!core_kernel_text((unsigned long)addr)) {
> > pages[0] = vmalloc_to_page(addr);
> > pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> > @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> > pages[1] = virt_to_page(addr + PAGE_SIZE);
> > }
> > BUG_ON(!pages[0]);
> > - if (!pages[1])
> > - nr_pages = 1;
> > - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > - BUG_ON(!vaddr);
> > - local_irq_disable();
> > + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>
> Can the set_fixmap/clear_fixmap/local_flush_tlb be called
> within local_irq_save ? If yes, that would be better,
> especially for the SMP alternatives code, which would rely on
> interrupt disabling in text_poke for consistency (the mutex is
> not needed there).

yes, it is atomic.

Ingo

2009-03-06 18:32:55

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2

Mathieu Desnoyers wrote:
>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>> pages[1] = virt_to_page(addr + PAGE_SIZE);
>> }
>> BUG_ON(!pages[0]);
>> - if (!pages[1])
>> - nr_pages = 1;
>> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>> - BUG_ON(!vaddr);
>> - local_irq_disable();
>> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>
> Can the set_fixmap/clear_fixmap/local_flush_tlb be called within
> local_irq_save ? If yes, that would be better, especially for the SMP
> alternatives code, which would rely on interrupt disabling in text_poke
> for consistency (the mutex is not needed there).

Yes, that just changes pte.

---
Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/include/asm/fixmap.h | 2 ++
arch/x86/kernel/alternative.c | 24 +++++++++++++++---------
2 files changed, 17 insertions(+), 9 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -13,7 +13,9 @@
#include <asm/nmi.h>
#include <asm/vsyscall.h>
#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
#include <asm/io.h>
+#include <asm/fixmap.h>

#define MAX_PATCH_LEN (255-1)

@@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
* It means the size must be writable atomically and the address must be aligned
* in a way that permits an atomic write. It also makes sure we fit on a single
* page.
+ *
+ * Note: Must be called under text_mutex.
*/
void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
+ unsigned long flags;
char *vaddr;
- int nr_pages = 2;
struct page *pages[2];
int i;

- might_sleep();
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
- if (!pages[1])
- nr_pages = 1;
- vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- BUG_ON(!vaddr);
- local_irq_disable();
+ local_irq_save(flags);
+ set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
+ if (pages[1])
+ set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
+ vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- local_irq_enable();
- vunmap(vaddr);
+ clear_fixmap(FIX_TEXT_POKE0);
+ if (pages[1])
+ clear_fixmap(FIX_TEXT_POKE1);
+ local_flush_tlb();
+ local_irq_restore(flags);
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
===================================================================
--- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
+++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
@@ -111,6 +111,8 @@ enum fixed_addresses {
#ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
#endif
+ FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
+ FIX_TEXT_POKE1,
__end_of_permanent_fixed_addresses,
#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
FIX_OHCI1394_BASE,
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 18:42:33

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip 3/4]Text Edit Lock - Smp alternatives support

Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Use the mutual exclusion provided by the text edit lock in alternatives code.
>> Since alternative_smp_* will be called from module init code, etc,
>> we'd better protect it from other subsystems.
>>
>> Signed-off-by: Masami Hiramatsu <[email protected]>
>> Cc: Mathieu Desnoyers <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> ---
>> arch/x86/kernel/alternative.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> Index: 2.6.29-rc7/arch/x86/kernel/alternative.c
>> ===================================================================
>> --- 2.6.29-rc7.orig/arch/x86/kernel/alternative.c
>> +++ 2.6.29-rc7/arch/x86/kernel/alternative.c
>> @@ -5,6 +5,7 @@
>> #include <linux/kprobes.h>
>> #include <linux/mm.h>
>> #include <linux/vmalloc.h>
>> +#include <linux/memory.h>
>> #include <asm/alternative.h>
>> #include <asm/sections.h>
>> #include <asm/pgtable.h>
>> @@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **s
>> {
>> u8 **ptr;
>>
>> + mutex_lock(&text_mutex);
>> for (ptr = start; ptr < end; ptr++) {
>> if (*ptr < text)
>> continue;
>> @@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **s
>> /* turn DS segment override prefix into lock prefix */
>> text_poke(*ptr, ((unsigned char []){0xf0}), 1);
>> };
>> + mutex_unlock(&text_mutex);
>> }
>>
>> static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
>> @@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 *
>> if (noreplace_smp)
>> return;
>>
>> + mutex_lock(&text_mutex);
>> for (ptr = start; ptr < end; ptr++) {
>> if (*ptr < text)
>> continue;
>> @@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 *
>> /* turn lock prefix into DS segment override prefix */
>> text_poke(*ptr, ((unsigned char []){0x3E}), 1);
>> };
>> + mutex_unlock(&text_mutex);
>
> And for these cases, the "alternatives_smp_lock" and
> "alternatives_smp_unlock", the system is running as uniprocessor.
> Therefore it's enough to disable interrupts within text_poke to make it
> work correctly.

Hmm, you are right. With my atomic-text_poke() take2 patch,
we don't need this patch.

Thank you,

>
> Mathieu
>
>> }
>>
>> struct smp_alt_module {
>> --
>> Masami Hiramatsu
>>
>> Software Engineer
>> Hitachi Computer Products (America) Inc.
>> Software Solutions Division
>>
>> e-mail: [email protected]
>>
>

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 18:45:38

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2

* Masami Hiramatsu ([email protected]) wrote:
> Mathieu Desnoyers wrote:
> >> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >> pages[1] = virt_to_page(addr + PAGE_SIZE);
> >> }
> >> BUG_ON(!pages[0]);
> >> - if (!pages[1])
> >> - nr_pages = 1;
> >> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >> - BUG_ON(!vaddr);
> >> - local_irq_disable();
> >> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >
> > Can the set_fixmap/clear_fixmap/local_flush_tlb be called within
> > local_irq_save ? If yes, that would be better, especially for the SMP
> > alternatives code, which would rely on interrupt disabling in text_poke
> > for consistency (the mutex is not needed there).
>
> Yes, that just changes pte.
>
> ---
> Use fixmaps instead of vmap/vunmap in text_poke() for avoiding page allocation
> and delayed unmapping.
>
> At the result of above change, text_poke() becomes atomic and can be called
> from stop_machine() etc.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Acked-by: Mathieu Desnoyers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
> arch/x86/include/asm/fixmap.h | 2 ++
> arch/x86/kernel/alternative.c | 24 +++++++++++++++---------
> 2 files changed, 17 insertions(+), 9 deletions(-)
>
> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> @@ -13,7 +13,9 @@
> #include <asm/nmi.h>
> #include <asm/vsyscall.h>
> #include <asm/cacheflush.h>
> +#include <asm/tlbflush.h>
> #include <asm/io.h>
> +#include <asm/fixmap.h>
>
> #define MAX_PATCH_LEN (255-1)
>
> @@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const
> * It means the size must be writable atomically and the address must be aligned
> * in a way that permits an atomic write. It also makes sure we fit on a single
> * page.
> + *
> + * Note: Must be called under text_mutex.

Maybe you could add "or in specific cases ensuring UP behavior".

The rest looks fine.

Acked-by: Mathieu Desnoyers <[email protected]>

> */
> void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> {
> + unsigned long flags;
> char *vaddr;
> - int nr_pages = 2;
> struct page *pages[2];
> int i;
>
> - might_sleep();
> if (!core_kernel_text((unsigned long)addr)) {
> pages[0] = vmalloc_to_page(addr);
> pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> pages[1] = virt_to_page(addr + PAGE_SIZE);
> }
> BUG_ON(!pages[0]);
> - if (!pages[1])
> - nr_pages = 1;
> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> - BUG_ON(!vaddr);
> - local_irq_disable();
> + local_irq_save(flags);
> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> + if (pages[1])
> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> - local_irq_enable();
> - vunmap(vaddr);
> + clear_fixmap(FIX_TEXT_POKE0);
> + if (pages[1])
> + clear_fixmap(FIX_TEXT_POKE1);
> + local_flush_tlb();
> + local_irq_restore(flags);
> sync_core();
> /* Could also do a CLFLUSH here to speed up CPU recovery; but
> that causes hangs on some VIA CPUs. */
> Index: linux-2.6-tip/arch/x86/include/asm/fixmap.h
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/include/asm/fixmap.h
> +++ linux-2.6-tip/arch/x86/include/asm/fixmap.h
> @@ -111,6 +111,8 @@ enum fixed_addresses {
> #ifdef CONFIG_PARAVIRT
> FIX_PARAVIRT_BOOTMAP,
> #endif
> + FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
> + FIX_TEXT_POKE1,
> __end_of_permanent_fixed_addresses,
> #ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
> FIX_OHCI1394_BASE,
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-06 19:09:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2


* Masami Hiramatsu <[email protected]> wrote:

> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> pages[1] = virt_to_page(addr + PAGE_SIZE);
> }
> BUG_ON(!pages[0]);
> - if (!pages[1])
> - nr_pages = 1;
> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> - BUG_ON(!vaddr);
> - local_irq_disable();
> + local_irq_save(flags);
> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> + if (pages[1])
> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> - local_irq_enable();
> - vunmap(vaddr);
> + clear_fixmap(FIX_TEXT_POKE0);
> + if (pages[1])
> + clear_fixmap(FIX_TEXT_POKE1);
> + local_flush_tlb();
> + local_irq_restore(flags);
> sync_core();

I'm not sure at all about this widening of the irq-atomic
section and the idea of allowing non-locked access on single-CPU
situations - we dont really want to micro-optimize any of this
on such a level, holding the text lock is a robust rule all code
should be listening to. (Creating locking assymetry always
inserts a certain amount of fragility - adding to an already
fragile concept here.)

And note that there's no reason why text_poke could not be used
in stop_machine_run() - the stop_machine_run() handler must not
take the text_lock of course - but outside code calling
stop_machine_run() can do it and can hence serialize properly.

Note that even if we did this then your v2 patch is not fully
correct: you need to move the sync_core() at the end of the
sequence inside the critical section too. (right now this is
mostly harmless because the INVLPG inside the clear_fixmap()
happens to be serializing so it has an implicit sync_core()
property - but nevertheless we better do this straight away to
not cause problems later down the line.)

Ingo

2009-03-06 19:20:37

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2

* Ingo Molnar ([email protected]) wrote:
>
> * Masami Hiramatsu <[email protected]> wrote:
>
> > @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> > pages[1] = virt_to_page(addr + PAGE_SIZE);
> > }
> > BUG_ON(!pages[0]);
> > - if (!pages[1])
> > - nr_pages = 1;
> > - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > - BUG_ON(!vaddr);
> > - local_irq_disable();
> > + local_irq_save(flags);
> > + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> > + if (pages[1])
> > + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> > + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> > memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > - local_irq_enable();
> > - vunmap(vaddr);
> > + clear_fixmap(FIX_TEXT_POKE0);
> > + if (pages[1])
> > + clear_fixmap(FIX_TEXT_POKE1);
> > + local_flush_tlb();
> > + local_irq_restore(flags);
> > sync_core();
>
> I'm not sure at all about this widening of the irq-atomic
> section and the idea of allowing non-locked access on single-CPU
> situations - we dont really want to micro-optimize any of this
> on such a level, holding the text lock is a robust rule all code
> should be listening to. (Creating locking assymetry always
> inserts a certain amount of fragility - adding to an already
> fragile concept here.)
>
> And note that there's no reason why text_poke could not be used
> in stop_machine_run() - the stop_machine_run() handler must not
> take the text_lock of course - but outside code calling
> stop_machine_run() can do it and can hence serialize properly.
>
> Note that even if we did this then your v2 patch is not fully
> correct: you need to move the sync_core() at the end of the
> sequence inside the critical section too. (right now this is
> mostly harmless because the INVLPG inside the clear_fixmap()
> happens to be serializing so it has an implicit sync_core()
> property - but nevertheless we better do this straight away to
> not cause problems later down the line.)
>
> Ingo

Agreed. The alternatives_smp_lock/alternatives_smp_unlock specific case
does not bring us much if it has no perceivable performance impact. It's
better to keep a standard interface and clear requirements.

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-06 19:23:58

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH -tip 4/4] Atomic text_poke() with fixmap take2


* Mathieu Desnoyers <[email protected]> wrote:

> * Ingo Molnar ([email protected]) wrote:
> >
> > * Masami Hiramatsu <[email protected]> wrote:
> >
> > > @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> > > pages[1] = virt_to_page(addr + PAGE_SIZE);
> > > }
> > > BUG_ON(!pages[0]);
> > > - if (!pages[1])
> > > - nr_pages = 1;
> > > - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> > > - BUG_ON(!vaddr);
> > > - local_irq_disable();
> > > + local_irq_save(flags);
> > > + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> > > + if (pages[1])
> > > + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> > > + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> > > memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > > - local_irq_enable();
> > > - vunmap(vaddr);
> > > + clear_fixmap(FIX_TEXT_POKE0);
> > > + if (pages[1])
> > > + clear_fixmap(FIX_TEXT_POKE1);
> > > + local_flush_tlb();
> > > + local_irq_restore(flags);
> > > sync_core();
> >
> > I'm not sure at all about this widening of the irq-atomic
> > section and the idea of allowing non-locked access on single-CPU
> > situations - we dont really want to micro-optimize any of this
> > on such a level, holding the text lock is a robust rule all code
> > should be listening to. (Creating locking assymetry always
> > inserts a certain amount of fragility - adding to an already
> > fragile concept here.)
> >
> > And note that there's no reason why text_poke could not be used
> > in stop_machine_run() - the stop_machine_run() handler must not
> > take the text_lock of course - but outside code calling
> > stop_machine_run() can do it and can hence serialize properly.
> >
> > Note that even if we did this then your v2 patch is not fully
> > correct: you need to move the sync_core() at the end of the
> > sequence inside the critical section too. (right now this is
> > mostly harmless because the INVLPG inside the clear_fixmap()
> > happens to be serializing so it has an implicit sync_core()
> > property - but nevertheless we better do this straight away to
> > not cause problems later down the line.)
> >
> > Ingo
>
> Agreed. The alternatives_smp_lock/alternatives_smp_unlock
> specific case does not bring us much if it has no perceivable
> performance impact. It's better to keep a standard interface
> and clear requirements.

Note that i dont object to another aspect of this same change:
the fact that it makes the whole sequence more atomic and more
defensive [which is never bad of fragile interfaces].

I only got worried about the "lets use this without the text
lock" ideas.

So if Masami-san sends a delta patch with a different changelog
and with the sync_core() bit moved inside the critical section,
i'll apply that too.

Ingo

2009-03-06 20:25:26

by Masami Hiramatsu

[permalink] [raw]
Subject: [PATCH -tip 5/4] Expands irq-off region in text_poke()

Ingo Molnar wrote:
> * Mathieu Desnoyers <[email protected]> wrote:
>
>> * Ingo Molnar ([email protected]) wrote:
>>> * Masami Hiramatsu <[email protected]> wrote:
>>>
>>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>>>> pages[1] = virt_to_page(addr + PAGE_SIZE);
>>>> }
>>>> BUG_ON(!pages[0]);
>>>> - if (!pages[1])
>>>> - nr_pages = 1;
>>>> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>>>> - BUG_ON(!vaddr);
>>>> - local_irq_disable();
>>>> + local_irq_save(flags);
>>>> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>>>> + if (pages[1])
>>>> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>>>> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>>>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>>>> - local_irq_enable();
>>>> - vunmap(vaddr);
>>>> + clear_fixmap(FIX_TEXT_POKE0);
>>>> + if (pages[1])
>>>> + clear_fixmap(FIX_TEXT_POKE1);
>>>> + local_flush_tlb();
>>>> + local_irq_restore(flags);
>>>> sync_core();
>>> I'm not sure at all about this widening of the irq-atomic
>>> section and the idea of allowing non-locked access on single-CPU
>>> situations - we dont really want to micro-optimize any of this
>>> on such a level, holding the text lock is a robust rule all code
>>> should be listening to. (Creating locking assymetry always
>>> inserts a certain amount of fragility - adding to an already
>>> fragile concept here.)
>>>
>>> And note that there's no reason why text_poke could not be used
>>> in stop_machine_run() - the stop_machine_run() handler must not
>>> take the text_lock of course - but outside code calling
>>> stop_machine_run() can do it and can hence serialize properly.
>>>
>>> Note that even if we did this then your v2 patch is not fully
>>> correct: you need to move the sync_core() at the end of the
>>> sequence inside the critical section too. (right now this is
>>> mostly harmless because the INVLPG inside the clear_fixmap()
>>> happens to be serializing so it has an implicit sync_core()
>>> property - but nevertheless we better do this straight away to
>>> not cause problems later down the line.)
>>>
>>> Ingo
>> Agreed. The alternatives_smp_lock/alternatives_smp_unlock
>> specific case does not bring us much if it has no perceivable
>> performance impact. It's better to keep a standard interface
>> and clear requirements.
>
> Note that i dont object to another aspect of this same change:
> the fact that it makes the whole sequence more atomic and more
> defensive [which is never bad of fragile interfaces].
>
> I only got worried about the "lets use this without the text
> lock" ideas.
>
> So if Masami-san sends a delta patch with a different changelog
> and with the sync_core() bit moved inside the critical section,
> i'll apply that too.

OK, here is the delta patch.

Expand irq-atomic region to cover fixmap using code and sync_core.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/kernel/alternative.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
+ local_irq_save(flags);
set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
if (pages[1])
set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
- local_irq_save(flags);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- local_irq_restore(flags);
clear_fixmap(FIX_TEXT_POKE0);
if (pages[1])
clear_fixmap(FIX_TEXT_POKE1);
@@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */
+ local_irq_restore(flags);
for (i = 0; i < len; i++)
BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
return addr;
--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-06 21:06:45

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()

* Masami Hiramatsu ([email protected]) wrote:
> Ingo Molnar wrote:
> > * Mathieu Desnoyers <[email protected]> wrote:
> >
> >> * Ingo Molnar ([email protected]) wrote:
> >>> * Masami Hiramatsu <[email protected]> wrote:
> >>>
> >>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >>>> pages[1] = virt_to_page(addr + PAGE_SIZE);
> >>>> }
> >>>> BUG_ON(!pages[0]);
> >>>> - if (!pages[1])
> >>>> - nr_pages = 1;
> >>>> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >>>> - BUG_ON(!vaddr);
> >>>> - local_irq_disable();
> >>>> + local_irq_save(flags);
> >>>> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >>>> + if (pages[1])
> >>>> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >>>> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >>>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >>>> - local_irq_enable();
> >>>> - vunmap(vaddr);
> >>>> + clear_fixmap(FIX_TEXT_POKE0);
> >>>> + if (pages[1])
> >>>> + clear_fixmap(FIX_TEXT_POKE1);
> >>>> + local_flush_tlb();
> >>>> + local_irq_restore(flags);
> >>>> sync_core();
> >>> I'm not sure at all about this widening of the irq-atomic
> >>> section and the idea of allowing non-locked access on single-CPU
> >>> situations - we dont really want to micro-optimize any of this
> >>> on such a level, holding the text lock is a robust rule all code
> >>> should be listening to. (Creating locking assymetry always
> >>> inserts a certain amount of fragility - adding to an already
> >>> fragile concept here.)
> >>>
> >>> And note that there's no reason why text_poke could not be used
> >>> in stop_machine_run() - the stop_machine_run() handler must not
> >>> take the text_lock of course - but outside code calling
> >>> stop_machine_run() can do it and can hence serialize properly.
> >>>
> >>> Note that even if we did this then your v2 patch is not fully
> >>> correct: you need to move the sync_core() at the end of the
> >>> sequence inside the critical section too. (right now this is
> >>> mostly harmless because the INVLPG inside the clear_fixmap()
> >>> happens to be serializing so it has an implicit sync_core()
> >>> property - but nevertheless we better do this straight away to
> >>> not cause problems later down the line.)
> >>>
> >>> Ingo
> >> Agreed. The alternatives_smp_lock/alternatives_smp_unlock
> >> specific case does not bring us much if it has no perceivable
> >> performance impact. It's better to keep a standard interface
> >> and clear requirements.
> >
> > Note that i dont object to another aspect of this same change:
> > the fact that it makes the whole sequence more atomic and more
> > defensive [which is never bad of fragile interfaces].
> >
> > I only got worried about the "lets use this without the text
> > lock" ideas.
> >
> > So if Masami-san sends a delta patch with a different changelog
> > and with the sync_core() bit moved inside the critical section,
> > i'll apply that too.
>
> OK, here is the delta patch.
>
> Expand irq-atomic region to cover fixmap using code and sync_core.
>
> Signed-off-by: Masami Hiramatsu <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> ---
> arch/x86/kernel/alternative.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> @@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
> pages[1] = virt_to_page(addr + PAGE_SIZE);
> }
> BUG_ON(!pages[0]);
> + local_irq_save(flags);
> set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> if (pages[1])
> set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> - local_irq_save(flags);
> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> - local_irq_restore(flags);
> clear_fixmap(FIX_TEXT_POKE0);
> if (pages[1])
> clear_fixmap(FIX_TEXT_POKE1);
> @@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
> sync_core();
> /* Could also do a CLFLUSH here to speed up CPU recovery; but
> that causes hangs on some VIA CPUs. */
> + local_irq_restore(flags);
> for (i = 0; i < len; i++)
> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);

I think irq off should cover the BUG_ON too. This safety check assumes
we are the only ones modifying "addr".

Mathieu

> return addr;
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-06 21:57:43

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()



Mathieu Desnoyers wrote:
> * Masami Hiramatsu ([email protected]) wrote:
>> Ingo Molnar wrote:
>>> * Mathieu Desnoyers <[email protected]> wrote:
>>>
>>>> * Ingo Molnar ([email protected]) wrote:
>>>>> * Masami Hiramatsu <[email protected]> wrote:
>>>>>
>>>>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
>>>>>> pages[1] = virt_to_page(addr + PAGE_SIZE);
>>>>>> }
>>>>>> BUG_ON(!pages[0]);
>>>>>> - if (!pages[1])
>>>>>> - nr_pages = 1;
>>>>>> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
>>>>>> - BUG_ON(!vaddr);
>>>>>> - local_irq_disable();
>>>>>> + local_irq_save(flags);
>>>>>> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>>>>>> + if (pages[1])
>>>>>> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>>>>>> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>>>>>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>>>>>> - local_irq_enable();
>>>>>> - vunmap(vaddr);
>>>>>> + clear_fixmap(FIX_TEXT_POKE0);
>>>>>> + if (pages[1])
>>>>>> + clear_fixmap(FIX_TEXT_POKE1);
>>>>>> + local_flush_tlb();
>>>>>> + local_irq_restore(flags);
>>>>>> sync_core();
>>>>> I'm not sure at all about this widening of the irq-atomic
>>>>> section and the idea of allowing non-locked access on single-CPU
>>>>> situations - we dont really want to micro-optimize any of this
>>>>> on such a level, holding the text lock is a robust rule all code
>>>>> should be listening to. (Creating locking assymetry always
>>>>> inserts a certain amount of fragility - adding to an already
>>>>> fragile concept here.)
>>>>>
>>>>> And note that there's no reason why text_poke could not be used
>>>>> in stop_machine_run() - the stop_machine_run() handler must not
>>>>> take the text_lock of course - but outside code calling
>>>>> stop_machine_run() can do it and can hence serialize properly.
>>>>>
>>>>> Note that even if we did this then your v2 patch is not fully
>>>>> correct: you need to move the sync_core() at the end of the
>>>>> sequence inside the critical section too. (right now this is
>>>>> mostly harmless because the INVLPG inside the clear_fixmap()
>>>>> happens to be serializing so it has an implicit sync_core()
>>>>> property - but nevertheless we better do this straight away to
>>>>> not cause problems later down the line.)
>>>>>
>>>>> Ingo
>>>> Agreed. The alternatives_smp_lock/alternatives_smp_unlock
>>>> specific case does not bring us much if it has no perceivable
>>>> performance impact. It's better to keep a standard interface
>>>> and clear requirements.
>>> Note that i dont object to another aspect of this same change:
>>> the fact that it makes the whole sequence more atomic and more
>>> defensive [which is never bad of fragile interfaces].
>>>
>>> I only got worried about the "lets use this without the text
>>> lock" ideas.
>>>
>>> So if Masami-san sends a delta patch with a different changelog
>>> and with the sync_core() bit moved inside the critical section,
>>> i'll apply that too.
>> OK, here is the delta patch.
>>
>> Expand irq-atomic region to cover fixmap using code and sync_core.
>>
>> Signed-off-by: Masami Hiramatsu <[email protected]>
>> Cc: Mathieu Desnoyers <[email protected]>
>> Cc: Ingo Molnar <[email protected]>
>> ---
>> arch/x86/kernel/alternative.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
>> ===================================================================
>> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
>> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
>> @@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
>> pages[1] = virt_to_page(addr + PAGE_SIZE);
>> }
>> BUG_ON(!pages[0]);
>> + local_irq_save(flags);
>> set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
>> if (pages[1])
>> set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
>> vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
>> - local_irq_save(flags);
>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
>> - local_irq_restore(flags);
>> clear_fixmap(FIX_TEXT_POKE0);
>> if (pages[1])
>> clear_fixmap(FIX_TEXT_POKE1);
>> @@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
>> sync_core();
>> /* Could also do a CLFLUSH here to speed up CPU recovery; but
>> that causes hangs on some VIA CPUs. */
>> + local_irq_restore(flags);
>> for (i = 0; i < len; i++)
>> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
>
> I think irq off should cover the BUG_ON too. This safety check assumes
> we are the only ones modifying "addr".

I think others don't change without text_mutex, don't it?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-07 01:42:39

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()

* Masami Hiramatsu ([email protected]) wrote:
>
>
> Mathieu Desnoyers wrote:
> > * Masami Hiramatsu ([email protected]) wrote:
> >> Ingo Molnar wrote:
> >>> * Mathieu Desnoyers <[email protected]> wrote:
> >>>
> >>>> * Ingo Molnar ([email protected]) wrote:
> >>>>> * Masami Hiramatsu <[email protected]> wrote:
> >>>>>
> >>>>>> @@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, co
> >>>>>> pages[1] = virt_to_page(addr + PAGE_SIZE);
> >>>>>> }
> >>>>>> BUG_ON(!pages[0]);
> >>>>>> - if (!pages[1])
> >>>>>> - nr_pages = 1;
> >>>>>> - vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> >>>>>> - BUG_ON(!vaddr);
> >>>>>> - local_irq_disable();
> >>>>>> + local_irq_save(flags);
> >>>>>> + set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >>>>>> + if (pages[1])
> >>>>>> + set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >>>>>> + vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >>>>>> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >>>>>> - local_irq_enable();
> >>>>>> - vunmap(vaddr);
> >>>>>> + clear_fixmap(FIX_TEXT_POKE0);
> >>>>>> + if (pages[1])
> >>>>>> + clear_fixmap(FIX_TEXT_POKE1);
> >>>>>> + local_flush_tlb();
> >>>>>> + local_irq_restore(flags);
> >>>>>> sync_core();
> >>>>> I'm not sure at all about this widening of the irq-atomic
> >>>>> section and the idea of allowing non-locked access on single-CPU
> >>>>> situations - we dont really want to micro-optimize any of this
> >>>>> on such a level, holding the text lock is a robust rule all code
> >>>>> should be listening to. (Creating locking assymetry always
> >>>>> inserts a certain amount of fragility - adding to an already
> >>>>> fragile concept here.)
> >>>>>
> >>>>> And note that there's no reason why text_poke could not be used
> >>>>> in stop_machine_run() - the stop_machine_run() handler must not
> >>>>> take the text_lock of course - but outside code calling
> >>>>> stop_machine_run() can do it and can hence serialize properly.
> >>>>>
> >>>>> Note that even if we did this then your v2 patch is not fully
> >>>>> correct: you need to move the sync_core() at the end of the
> >>>>> sequence inside the critical section too. (right now this is
> >>>>> mostly harmless because the INVLPG inside the clear_fixmap()
> >>>>> happens to be serializing so it has an implicit sync_core()
> >>>>> property - but nevertheless we better do this straight away to
> >>>>> not cause problems later down the line.)
> >>>>>
> >>>>> Ingo
> >>>> Agreed. The alternatives_smp_lock/alternatives_smp_unlock
> >>>> specific case does not bring us much if it has no perceivable
> >>>> performance impact. It's better to keep a standard interface
> >>>> and clear requirements.
> >>> Note that i dont object to another aspect of this same change:
> >>> the fact that it makes the whole sequence more atomic and more
> >>> defensive [which is never bad of fragile interfaces].
> >>>
> >>> I only got worried about the "lets use this without the text
> >>> lock" ideas.
> >>>
> >>> So if Masami-san sends a delta patch with a different changelog
> >>> and with the sync_core() bit moved inside the critical section,
> >>> i'll apply that too.
> >> OK, here is the delta patch.
> >>
> >> Expand irq-atomic region to cover fixmap using code and sync_core.
> >>
> >> Signed-off-by: Masami Hiramatsu <[email protected]>
> >> Cc: Mathieu Desnoyers <[email protected]>
> >> Cc: Ingo Molnar <[email protected]>
> >> ---
> >> arch/x86/kernel/alternative.c | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> Index: linux-2.6-tip/arch/x86/kernel/alternative.c
> >> ===================================================================
> >> --- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
> >> +++ linux-2.6-tip/arch/x86/kernel/alternative.c
> >> @@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
> >> pages[1] = virt_to_page(addr + PAGE_SIZE);
> >> }
> >> BUG_ON(!pages[0]);
> >> + local_irq_save(flags);
> >> set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> >> if (pages[1])
> >> set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> >> vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> >> - local_irq_save(flags);
> >> memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> >> - local_irq_restore(flags);
> >> clear_fixmap(FIX_TEXT_POKE0);
> >> if (pages[1])
> >> clear_fixmap(FIX_TEXT_POKE1);
> >> @@ -540,6 +539,7 @@ void *__kprobes text_poke(void *addr, co
> >> sync_core();
> >> /* Could also do a CLFLUSH here to speed up CPU recovery; but
> >> that causes hangs on some VIA CPUs. */
> >> + local_irq_restore(flags);
> >> for (i = 0; i < len; i++)
> >> BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
> >
> > I think irq off should cover the BUG_ON too. This safety check assumes
> > we are the only ones modifying "addr".
>
> I think others don't change without text_mutex, don't it?
>

They shouldn't, but given we decided to grow the irq off region to
contain all the code that needs to be executed atomically, it should
also contain the BUG_ON, because it is expected to be as atomic as the
rest of the code.

Mathieu

> Thank you,
>
> --
> Masami Hiramatsu
>
> Software Engineer
> Hitachi Computer Products (America) Inc.
> Software Solutions Division
>
> e-mail: [email protected]
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-03-08 15:52:17

by Mathieu Desnoyers

[permalink] [raw]
Subject: [tip:tracing/core] tracing, Text Edit Lock - Architecture Independent Code

Commit-ID: 0e39ac444636ff5be39b26f1cb56d79594654dda
Gitweb: http://git.kernel.org/tip/0e39ac444636ff5be39b26f1cb56d79594654dda
Author: "Mathieu Desnoyers" <[email protected]>
AuthorDate: Fri, 6 Mar 2009 10:35:52 -0500
Commit: Ingo Molnar <[email protected]>
CommitDate: Fri, 6 Mar 2009 16:48:59 +0100

tracing, Text Edit Lock - Architecture Independent Code

This is an architecture independant synchronization around kernel text
modifications through use of a global mutex.

A mutex has been chosen so that kprobes, the main user of this, can sleep
during memory allocation between the memory read of the instructions it
must replace and the memory write of the breakpoint.

Other user of this interface: immediate values.

Paravirt and alternatives are always done when SMP is inactive, so there
is no need to use locks.

Signed-off-by: Mathieu Desnoyers <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
include/linux/memory.h | 6 ++++++
mm/memory.c | 10 ++++++++++
2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/include/linux/memory.h b/include/linux/memory.h
index 3fdc108..86a6c0f 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -99,4 +99,10 @@ enum mem_add_context { BOOT, HOTPLUG };
#define hotplug_memory_notifier(fn, pri) do { } while (0)
#endif

+/*
+ * Kernel text modification mutex, used for code patching. Users of this lock
+ * can sleep.
+ */
+extern struct mutex text_mutex;
+
#endif /* _LINUX_MEMORY_H_ */
diff --git a/mm/memory.c b/mm/memory.c
index baa999e..05fab3b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -48,6 +48,8 @@
#include <linux/rmap.h>
#include <linux/module.h>
#include <linux/delayacct.h>
+#include <linux/kprobes.h>
+#include <linux/mutex.h>
#include <linux/init.h>
#include <linux/writeback.h>
#include <linux/memcontrol.h>
@@ -99,6 +101,14 @@ int randomize_va_space __read_mostly =
2;
#endif

+/*
+ * mutex protecting text section modification (dynamic code patching).
+ * some users need to sleep (allocating memory...) while they hold this lock.
+ *
+ * NOT exported to modules - patching kernel text is a really delicate matter.
+ */
+DEFINE_MUTEX(text_mutex);
+
static int __init disable_randmaps(char *s)
{
randomize_va_space = 0;

2009-03-08 15:52:33

by Mathieu Desnoyers

[permalink] [raw]
Subject: [tip:tracing/core] tracing, Text Edit Lock - kprobes architecture independent support

Commit-ID: 4460fdad85becd569f11501ad5b91814814335ff
Gitweb: http://git.kernel.org/tip/4460fdad85becd569f11501ad5b91814814335ff
Author: "Mathieu Desnoyers" <[email protected]>
AuthorDate: Fri, 6 Mar 2009 10:36:38 -0500
Commit: Ingo Molnar <[email protected]>
CommitDate: Fri, 6 Mar 2009 16:49:00 +0100

tracing, Text Edit Lock - kprobes architecture independent support

Use the mutual exclusion provided by the text edit lock in the kprobes code. It
allows coherent manipulation of the kernel code by other subsystems.

Changelog:

Move the kernel_text_lock/unlock out of the for loops.
Use text_mutex directly instead of a function.
Remove whitespace modifications.

(note : kprobes_mutex is always taken outside of text_mutex)

Signed-off-by: Mathieu Desnoyers <[email protected]>
Acked-by: Ananth N Mavinakayanahalli <[email protected]>
Acked-by: Masami Hiramatsu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
kernel/kprobes.c | 15 +++++++++++++--
1 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 7ba8cd9..479d4d5 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -43,6 +43,7 @@
#include <linux/seq_file.h>
#include <linux/debugfs.h>
#include <linux/kdebug.h>
+#include <linux/memory.h>

#include <asm-generic/sections.h>
#include <asm/cacheflush.h>
@@ -699,9 +700,10 @@ int __kprobes register_kprobe(struct kprobe *p)
goto out;
}

+ mutex_lock(&text_mutex);
ret = arch_prepare_kprobe(p);
if (ret)
- goto out;
+ goto out_unlock_text;

INIT_HLIST_NODE(&p->hlist);
hlist_add_head_rcu(&p->hlist,
@@ -710,6 +712,8 @@ int __kprobes register_kprobe(struct kprobe *p)
if (kprobe_enabled)
arch_arm_kprobe(p);

+out_unlock_text:
+ mutex_unlock(&text_mutex);
out:
mutex_unlock(&kprobe_mutex);

@@ -746,8 +750,11 @@ valid_p:
* enabled and not gone - otherwise, the breakpoint would
* already have been removed. We save on flushing icache.
*/
- if (kprobe_enabled && !kprobe_gone(old_p))
+ if (kprobe_enabled && !kprobe_gone(old_p)) {
+ mutex_lock(&text_mutex);
arch_disarm_kprobe(p);
+ mutex_unlock(&text_mutex);
+ }
hlist_del_rcu(&old_p->hlist);
} else {
if (p->break_handler && !kprobe_gone(p))
@@ -1280,12 +1287,14 @@ static void __kprobes enable_all_kprobes(void)
if (kprobe_enabled)
goto already_enabled;

+ mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist)
if (!kprobe_gone(p))
arch_arm_kprobe(p);
}
+ mutex_unlock(&text_mutex);

kprobe_enabled = true;
printk(KERN_INFO "Kprobes globally enabled\n");
@@ -1310,6 +1319,7 @@ static void __kprobes disable_all_kprobes(void)

kprobe_enabled = false;
printk(KERN_INFO "Kprobes globally disabled\n");
+ mutex_lock(&text_mutex);
for (i = 0; i < KPROBE_TABLE_SIZE; i++) {
head = &kprobe_table[i];
hlist_for_each_entry_rcu(p, node, head, hlist) {
@@ -1318,6 +1328,7 @@ static void __kprobes disable_all_kprobes(void)
}
}

+ mutex_unlock(&text_mutex);
mutex_unlock(&kprobe_mutex);
/* Allow all currently running kprobes to complete */
synchronize_sched();

2009-03-08 15:52:51

by Masami Hiramatsu

[permalink] [raw]
Subject: [tip:tracing/core] tracing, Text Edit Lock - SMP alternatives support

Commit-ID: 3945dab45aa8c89014893bfa8eb1e1661a409cef
Gitweb: http://git.kernel.org/tip/3945dab45aa8c89014893bfa8eb1e1661a409cef
Author: "Masami Hiramatsu" <[email protected]>
AuthorDate: Fri, 6 Mar 2009 10:37:22 -0500
Commit: Ingo Molnar <[email protected]>
CommitDate: Fri, 6 Mar 2009 16:49:00 +0100

tracing, Text Edit Lock - SMP alternatives support

Use the mutual exclusion provided by the text edit lock in alternatives code.
Since alternative_smp_* will be called from module init code, etc,
we'd better protect it from other subsystems.

Signed-off-by: Masami Hiramatsu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/kernel/alternative.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 4c80f15..092a7b8 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -5,6 +5,7 @@
#include <linux/kprobes.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
+#include <linux/memory.h>
#include <asm/alternative.h>
#include <asm/sections.h>
#include <asm/pgtable.h>
@@ -226,6 +227,7 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
{
u8 **ptr;

+ mutex_lock(&text_mutex);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
@@ -234,6 +236,7 @@ static void alternatives_smp_lock(u8 **start, u8 **end, u8 *text, u8 *text_end)
/* turn DS segment override prefix into lock prefix */
text_poke(*ptr, ((unsigned char []){0xf0}), 1);
};
+ mutex_unlock(&text_mutex);
}

static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end)
@@ -243,6 +246,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
if (noreplace_smp)
return;

+ mutex_lock(&text_mutex);
for (ptr = start; ptr < end; ptr++) {
if (*ptr < text)
continue;
@@ -251,6 +255,7 @@ static void alternatives_smp_unlock(u8 **start, u8 **end, u8 *text, u8 *text_end
/* turn lock prefix into DS segment override prefix */
text_poke(*ptr, ((unsigned char []){0x3E}), 1);
};
+ mutex_unlock(&text_mutex);
}

struct smp_alt_module {

2009-03-08 15:53:13

by Masami Hiramatsu

[permalink] [raw]
Subject: [tip:tracing/core] x86: implement atomic text_poke() via fixmap

Commit-ID: 78ff7fae04554b49d29226ed12536268c2500d1f
Gitweb: http://git.kernel.org/tip/78ff7fae04554b49d29226ed12536268c2500d1f
Author: "Masami Hiramatsu" <[email protected]>
AuthorDate: Fri, 6 Mar 2009 10:37:54 -0500
Commit: Ingo Molnar <[email protected]>
CommitDate: Fri, 6 Mar 2009 16:49:01 +0100

x86: implement atomic text_poke() via fixmap

Use fixmaps instead of vmap/vunmap in text_poke() for avoiding
page allocation and delayed unmapping.

At the result of above change, text_poke() becomes atomic and can be called
from stop_machine() etc.

Signed-off-by: Masami Hiramatsu <[email protected]>
Acked-by: Mathieu Desnoyers <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/include/asm/fixmap.h | 2 ++
arch/x86/kernel/alternative.c | 24 +++++++++++++++---------
2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 63a79c7..81937a5 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -111,6 +111,8 @@ enum fixed_addresses {
#ifdef CONFIG_PARAVIRT
FIX_PARAVIRT_BOOTMAP,
#endif
+ FIX_TEXT_POKE0, /* reserve 2 pages for text_poke() */
+ FIX_TEXT_POKE1,
__end_of_permanent_fixed_addresses,
#ifdef CONFIG_PROVIDE_OHCI1394_DMA_INIT
FIX_OHCI1394_BASE,
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 092a7b8..2d903b7 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -13,7 +13,9 @@
#include <asm/nmi.h>
#include <asm/vsyscall.h>
#include <asm/cacheflush.h>
+#include <asm/tlbflush.h>
#include <asm/io.h>
+#include <asm/fixmap.h>

#define MAX_PATCH_LEN (255-1)

@@ -505,15 +507,16 @@ void *text_poke_early(void *addr, const void *opcode, size_t len)
* It means the size must be writable atomically and the address must be aligned
* in a way that permits an atomic write. It also makes sure we fit on a single
* page.
+ *
+ * Note: Must be called under text_mutex.
*/
void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
{
+ unsigned long flags;
char *vaddr;
- int nr_pages = 2;
struct page *pages[2];
int i;

- might_sleep();
if (!core_kernel_text((unsigned long)addr)) {
pages[0] = vmalloc_to_page(addr);
pages[1] = vmalloc_to_page(addr + PAGE_SIZE);
@@ -523,14 +526,17 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
- if (!pages[1])
- nr_pages = 1;
- vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
- BUG_ON(!vaddr);
- local_irq_disable();
+ set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
+ if (pages[1])
+ set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
+ vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
+ local_irq_save(flags);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- local_irq_enable();
- vunmap(vaddr);
+ local_irq_restore(flags);
+ clear_fixmap(FIX_TEXT_POKE0);
+ if (pages[1])
+ clear_fixmap(FIX_TEXT_POKE1);
+ local_flush_tlb();
sync_core();
/* Could also do a CLFLUSH here to speed up CPU recovery; but
that causes hangs on some VIA CPUs. */

2009-03-09 17:41:20

by Masami Hiramatsu

[permalink] [raw]
Subject: Re: [PATCH -tip 5/4] Expands irq-off region in text_poke()

Mathieu Desnoyers wrote:
>>> I think irq off should cover the BUG_ON too. This safety check assumes
>>> we are the only ones modifying "addr".
>> I think others don't change without text_mutex, don't it?
>>
>
> They shouldn't, but given we decided to grow the irq off region to
> contain all the code that needs to be executed atomically, it should
> also contain the BUG_ON, because it is expected to be as atomic as the
> rest of the code.

Hm, sure, I updated the delta patch.

Expand irq-off region to cover fixmap using code and cache synchronizing.

Signed-off-by: Masami Hiramatsu <[email protected]>
Cc: Mathieu Desnoyers <[email protected]>
Cc: Ingo Molnar <[email protected]>
---
arch/x86/kernel/alternative.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6-tip/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6-tip.orig/arch/x86/kernel/alternative.c
+++ linux-2.6-tip/arch/x86/kernel/alternative.c
@@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, co
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
+ local_irq_save(flags);
set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
if (pages[1])
set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
- local_irq_save(flags);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- local_irq_restore(flags);
clear_fixmap(FIX_TEXT_POKE0);
if (pages[1])
clear_fixmap(FIX_TEXT_POKE1);
@@ -542,5 +541,6 @@ void *__kprobes text_poke(void *addr, co
that causes hangs on some VIA CPUs. */
for (i = 0; i < len; i++)
BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+ local_irq_restore(flags);
return addr;
}

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: [email protected]

2009-03-10 21:58:31

by Masami Hiramatsu

[permalink] [raw]
Subject: [tip:tracing/core] x86: expand irq-off region in text_poke()

Commit-ID: 7cf49427042400d40bdc80b5c3399b6b5945afa8
Gitweb: http://git.kernel.org/tip/7cf49427042400d40bdc80b5c3399b6b5945afa8
Author: "Masami Hiramatsu" <[email protected]>
AuthorDate: Mon, 9 Mar 2009 12:40:40 -0400
Commit: Ingo Molnar <[email protected]>
CommitDate: Tue, 10 Mar 2009 16:24:23 +0100

x86: expand irq-off region in text_poke()

Expand irq-off region to cover fixmap using code and cache synchronizing.

Signed-off-by: Masami Hiramatsu <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>


---
arch/x86/kernel/alternative.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 2d903b7..f576587 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -526,13 +526,12 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
pages[1] = virt_to_page(addr + PAGE_SIZE);
}
BUG_ON(!pages[0]);
+ local_irq_save(flags);
set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
if (pages[1])
set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
- local_irq_save(flags);
memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
- local_irq_restore(flags);
clear_fixmap(FIX_TEXT_POKE0);
if (pages[1])
clear_fixmap(FIX_TEXT_POKE1);
@@ -542,5 +541,6 @@ void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
that causes hangs on some VIA CPUs. */
for (i = 0; i < len; i++)
BUG_ON(((char *)addr)[i] != ((char *)opcode)[i]);
+ local_irq_restore(flags);
return addr;
}