2010-11-23 21:28:24

by Jason Baron

[permalink] [raw]
Subject: [PATCH 0/3] jump label: updates for 2.6.37

Hi,

A few jump label patches that I want considered for 2.6.37. Patches are against
the latest -tip tree.

The first one, which adds 'state' to the jump label mechanism is the most
important. Essentially, it ensures that if jump labels are enabled/disabled in
the core kernel but the actual call sites are in modules, we properly honor the
state of the jump label. This also works for jump labels which may be defined in
one module but made use of in another module.

There has been some discussion about using the 'key' variable to store the
enabled/disabled state for each jump label. However, I think a better design
will be to use the 'key' variable to store a pointer to the appropriate jump
label tables. In this way, we can enable/disable jump labels, without the
hashing that I'm currently doing. However, I didn't want to propose these more
invasive changes until 2.6.38.

thanks,

-Jason

Jason Baron (3):
jump label: add enabled/disabled state to jump label key entries
jump label: move jump table to r/w section
jump label: add docs

Documentation/jump-label.txt | 126 +++++++++++++++++++++++++++++++++++++
include/asm-generic/vmlinux.lds.h | 14 +---
kernel/jump_label.c | 101 +++++++++++++++++++++++-------
3 files changed, 209 insertions(+), 32 deletions(-)
create mode 100644 Documentation/jump-label.txt


2010-11-23 21:28:19

by Jason Baron

[permalink] [raw]
Subject: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

By storing the state of the jump label with each key, we make
sure that when modules are added, they are updated to the correct
state. For example, if the kmalloc tracepoint is enabled and
a module is added which has kmalloc, we make sure that the tracepoint
is properly enabled on module load.

Also, if jump_label_enable(key), is called but the key has not yet
been added to the hashtable of jump label keys, add 'key' to the table.
In this way, if key value has its state updated, but we have not
yet encountered a JUMP_LABEL() definition for it (if its located in
a module), we ensure that the jump label is set to the correct
state when it finally is encountered.

When modules are unloaded, we traverse the jump label hashtable,
and remove any entries that have a key value that is contained
by that module's text section. In this way key values are properly
unregistered, and can be re-used.

Signed-off-by: Jason Baron <[email protected]>
---
kernel/jump_label.c | 101 ++++++++++++++++++++++++++++++++++++++++-----------
1 files changed, 79 insertions(+), 22 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3b79bd9..88e853a 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -26,10 +26,11 @@ static DEFINE_MUTEX(jump_label_mutex);
struct jump_label_entry {
struct hlist_node hlist;
struct jump_entry *table;
- int nr_entries;
/* hang modules off here */
struct hlist_head modules;
unsigned long key;
+ u32 nr_entries : 31,
+ enabled : 1;
};

struct jump_label_module_entry {
@@ -108,6 +109,7 @@ add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
e->key = key;
e->table = table;
e->nr_entries = nr_entries;
+ e->enabled = 0;
INIT_HLIST_HEAD(&(e->modules));
hlist_add_head(&e->hlist, head);
return e;
@@ -164,27 +166,37 @@ void jump_label_update(unsigned long key, enum jump_label_type type)

jump_label_lock();
entry = get_jump_label_entry((jump_label_t)key);
- if (entry) {
- count = entry->nr_entries;
- iter = entry->table;
+ if (!entry) {
+ if (type == JUMP_LABEL_ENABLE) {
+ entry = add_jump_label_entry(key, 0, NULL);
+ if (IS_ERR(entry))
+ goto out;
+ } else
+ goto out;
+ }
+ if (type == JUMP_LABEL_ENABLE)
+ entry->enabled = 1;
+ else
+ entry->enabled = 0;
+ count = entry->nr_entries;
+ iter = entry->table;
+ while (count--) {
+ if (kernel_text_address(iter->code))
+ arch_jump_label_transform(iter, type);
+ iter++;
+ }
+ /* enable/disable jump labels in modules */
+ hlist_for_each_entry(e_module, module_node, &(entry->modules),
+ hlist) {
+ count = e_module->nr_entries;
+ iter = e_module->table;
while (count--) {
- if (kernel_text_address(iter->code))
+ if (iter->key && kernel_text_address(iter->code))
arch_jump_label_transform(iter, type);
iter++;
}
- /* eanble/disable jump labels in modules */
- hlist_for_each_entry(e_module, module_node, &(entry->modules),
- hlist) {
- count = e_module->nr_entries;
- iter = e_module->table;
- while (count--) {
- if (iter->key &&
- kernel_text_address(iter->code))
- arch_jump_label_transform(iter, type);
- iter++;
- }
- }
}
+out:
jump_label_unlock();
}

@@ -316,6 +328,41 @@ add_jump_label_module_entry(struct jump_label_entry *entry,
return e;
}

+static void update_jump_label_module(struct module *mod)
+{
+ struct hlist_head *head;
+ struct hlist_node *node, *node_next, *module_node, *module_node_next;
+ struct jump_label_entry *e;
+ struct jump_label_module_entry *e_module;
+ struct jump_entry *iter;
+ int i, count;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return;
+
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ if (!e->enabled)
+ continue;
+ hlist_for_each_entry_safe(e_module, module_node,
+ module_node_next,
+ &(e->modules), hlist) {
+ if (e_module->mod != mod)
+ continue;
+ count = e_module->nr_entries;
+ iter = e_module->table;
+ while (count--) {
+ arch_jump_label_transform(iter,
+ JUMP_LABEL_ENABLE);
+ iter++;
+ }
+ }
+ }
+ }
+}
+
static int add_jump_label_module(struct module *mod)
{
struct jump_entry *iter, *iter_begin;
@@ -349,6 +396,9 @@ static int add_jump_label_module(struct module *mod)
if (IS_ERR(module_entry))
return PTR_ERR(module_entry);
}
+ /* update new entries to the correct state */
+ update_jump_label_module(mod);
+
return 0;
}

@@ -360,10 +410,6 @@ static void remove_jump_label_module(struct module *mod)
struct jump_label_module_entry *e_module;
int i;

- /* if the module doesn't have jump label entries, just return */
- if (!mod->num_jump_entries)
- return;
-
for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
head = &jump_label_table[i];
hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
@@ -375,10 +421,21 @@ static void remove_jump_label_module(struct module *mod)
kfree(e_module);
}
}
+ }
+ }
+ /* now check if any keys can be removed */
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ if (!within_module_core(e->key, mod))
+ continue;
if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
hlist_del(&e->hlist);
kfree(e);
+ continue;
}
+ WARN(1, KERN_ERR "jump label: "
+ "tyring to remove used key: %lu !\n", e->key);
}
}
}
@@ -470,7 +527,7 @@ void jump_label_apply_nops(struct module *mod)

struct notifier_block jump_label_module_nb = {
.notifier_call = jump_label_module_notify,
- .priority = 0,
+ .priority = 1, /* higher than tracepoints */
};

static __init int init_jump_label_module(void)
--
1.7.1

2010-11-23 21:28:12

by Jason Baron

[permalink] [raw]
Subject: [PATCH 2/3] jump label: move jump table to r/w section

Since we writing the jump table it should be be in R/W kernel
section. Move it to DATA_DATA

Signed-off-by: Jason Baron <[email protected]>
---
include/asm-generic/vmlinux.lds.h | 14 ++++----------
1 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index bd69d79..9ca894d 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -161,6 +161,10 @@
VMLINUX_SYMBOL(__start___tracepoints) = .; \
*(__tracepoints) \
VMLINUX_SYMBOL(__stop___tracepoints) = .; \
+ . = ALIGN(8); \
+ VMLINUX_SYMBOL(__start___jump_table) = .; \
+ *(__jump_table) \
+ VMLINUX_SYMBOL(__stop___jump_table) = .; \
/* implement dynamic printk debug */ \
. = ALIGN(8); \
VMLINUX_SYMBOL(__start___verbose) = .; \
@@ -221,8 +225,6 @@
\
BUG_TABLE \
\
- JUMP_TABLE \
- \
/* PCI quirks */ \
.pci_fixup : AT(ADDR(.pci_fixup) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start_pci_fixups_early) = .; \
@@ -566,14 +568,6 @@
#define BUG_TABLE
#endif

-#define JUMP_TABLE \
- . = ALIGN(8); \
- __jump_table : AT(ADDR(__jump_table) - LOAD_OFFSET) { \
- VMLINUX_SYMBOL(__start___jump_table) = .; \
- *(__jump_table) \
- VMLINUX_SYMBOL(__stop___jump_table) = .; \
- }
-
#ifdef CONFIG_PM_TRACE
#define TRACEDATA \
. = ALIGN(4); \
--
1.7.1

2010-11-23 21:28:11

by Jason Baron

[permalink] [raw]
Subject: [PATCH 3/3] jump label: add docs

Add jump label docs as: Documentation/jump-label.txt

Signed-off-by: Jason Baron <[email protected]>
---
Documentation/jump-label.txt | 126 ++++++++++++++++++++++++++++++++++++++++++
1 files changed, 126 insertions(+), 0 deletions(-)
create mode 100644 Documentation/jump-label.txt

diff --git a/Documentation/jump-label.txt b/Documentation/jump-label.txt
new file mode 100644
index 0000000..920b4c0
--- /dev/null
+++ b/Documentation/jump-label.txt
@@ -0,0 +1,126 @@
+ Jump Label
+ ----------
+
+By: Jason Baron <[email protected]>
+
+
+1) Motivation
+
+
+Currently, tracepoints are implemented using a conditional. The conditional
+check requires checking a global variable for each tracepoint. Although
+the overhead of this check is small, it increases when the memory cache comes
+under pressure (memory cache lines for these global variables may be shared
+with other memory accesses). As we increase the number of tracepoints in the
+kernel this overhead may become more of an issue. In addition, tracepoints are
+often dormant (disabled) and provide no direct kernel functionality. Thus, it
+is highly desirable to reduce their impact as much as possible. Although
+tracepoints are the original motivation for this work, other kernel code paths
+should be able to make use of the jump label optimization.
+
+
+2) Jump label description/usage
+
+
+gcc (v4.5) adds a new 'asm goto' statement that allows branching to a label.
+http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01556.html
+
+Thus, this patch set introduces an architecture specific 'JUMP_LABEL()' macro as
+follows (x86):
+
+# define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
+
+# define JUMP_LABEL(key, label) \
+ do { \
+ asm goto("1:" \
+ JUMP_LABEL_INITIAL_NOP \
+ ".pushsection __jump_table, \"a\" \n\t"\
+ _ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
+ ".popsection \n\t" \
+ : : "i" (key) : : label); \
+ } while (0)
+
+
+For architectures that have not yet introduced jump label support it's simply:
+
+#define JUMP_LABEL(key, label) \
+ if (unlikely(*key)) \
+ goto label;
+
+which then can be used as:
+
+ ....
+ JUMP_LABEL(key, trace_label);
+ printk("not doing tracing\n");
+ return;
+trace_label:
+ printk("doing tracing: %d\n", file);
+ ....
+
+The 'key' argument is thus a pointer to a conditional argument that can be used
+if the optimization is not enabled. Otherwise, this address serves as a unique
+key to identify the particular instance of the jump label.
+
+Thus, when tracing is disabled, we simply have a no-op in the fast path (when
+CONFIG_CC_OPTIMIZE_FOR_SIZE is set the no-op is currently followed by a jump
+around the dormant (disabled) tracing code). The 'JUMP_LABEL()' macro produces
+a 'jump_table', which has the following format:
+
+[instruction address] [jump target address] [key]
+
+Thus, to enable a tracepoint, we simply patch the 'instruction address' with
+a jump to the 'jump target.'
+
+The calls to enable and disable a jump label are: enable_jump_label(key) and
+disable_jump_label(key).
+
+
+3) Architecture interface
+
+
+There are a few functions and macros that architectures must implement in order
+to take advantage of this optimization. As previously mentioned, if there is no
+architecture support, we simply fall back to a traditional, load, test, and
+jump sequence.
+
+* add "HAVE_ARCH_JUMP_LABEL" to arch/<arch>/Kconfig to indicate support
+
+* #define JUMP_LABEL_NOP_SIZE, arch/x86/include/asm/jump_label.h
+
+* #define "JUMP_LABEL(key, label)", arch/x86/include/asm/jump_label.h
+
+* void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type)
+ see: arch/x86/kernel/jump_label.c
+
+* void arch_jump_label_text_poke_early(jump_label_t addr)
+ see: arch/x86/kernel/jump_label.c
+
+* finally add a definition for "struct jump_entry".
+ see: arch/x86/include/asm/jump_label.h
+
+
+4) Jump label analysis (x86)
+
+
+I've tested the performance of using 'get_cycles()' calls around the
+tracepoint call sites. For an Intel Core 2 Quad cpu (in cycles, averages):
+
+ idle after tbench run
+ ---- ----------------
+old code 32 88
+new code 2 4
+
+
+The performance improvement can be reproduced reliably on both Intel and AMD
+hardware.
+
+
+5) Acknowledgements
+
+
+Thanks to Roland McGrath and Richard Henderson for helping come up with the
+initial 'asm goto' and jump label design.
+
+Thanks to Mathieu Desnoyers and H. Peter Anvin for calling attention to this
+issue, and outlining the requirements of a solution. Mathieu also implemented a
+solution in the form of the "Immediate Values" work.
--
1.7.1

2010-11-23 21:38:27

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On 11/23/2010 01:27 PM, Jason Baron wrote:
> Hi,
>
> A few jump label patches that I want considered for 2.6.37. Patches are against
> the latest -tip tree.
>
> The first one, which adds 'state' to the jump label mechanism is the most
> important. Essentially, it ensures that if jump labels are enabled/disabled in
> the core kernel but the actual call sites are in modules, we properly honor the
> state of the jump label. This also works for jump labels which may be defined in
> one module but made use of in another module.
>
> There has been some discussion about using the 'key' variable to store the
> enabled/disabled state for each jump label. However, I think a better design
> will be to use the 'key' variable to store a pointer to the appropriate jump
> label tables. In this way, we can enable/disable jump labels, without the
> hashing that I'm currently doing. However, I didn't want to propose these more
> invasive changes until 2.6.38.
>

I would also like to see a change in the API, preferrably something
closer to the "SWITCH_POINT" interface I discussed with Stephen before
Kernel Summit.

-hpa

2010-11-23 21:42:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> Hi,
>
> A few jump label patches that I want considered for 2.6.37. Patches are against
> the latest -tip tree.

I can see patch 2 and 3 going to 2.6.37, but patch 1 seems a bit too
big. If it is not a true fix for anything and is just a design change,
then lets hold off till 2.6.38.


>
> The first one, which adds 'state' to the jump label mechanism is the most
> important. Essentially, it ensures that if jump labels are enabled/disabled in
> the core kernel but the actual call sites are in modules, we properly honor the
> state of the jump label. This also works for jump labels which may be defined in
> one module but made use of in another module.

What happens if we don't do this. What does "honoring" the state
actually mean?


>
> There has been some discussion about using the 'key' variable to store the
> enabled/disabled state for each jump label. However, I think a better design
> will be to use the 'key' variable to store a pointer to the appropriate jump
> label tables. In this way, we can enable/disable jump labels, without the
> hashing that I'm currently doing. However, I didn't want to propose these more
> invasive changes until 2.6.38.

I'm thinking even patch 1 is too invasive. Unless it crashes or truly
breaks something without the change.

Patch 2 is small and reasonable, and patch 3 is just documentation
changes. Both OK for an -rc update.

-- Steve

2010-11-23 21:56:53

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, Nov 23, 2010 at 04:42:07PM -0500, Steven Rostedt wrote:
> On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > Hi,
> >
> > A few jump label patches that I want considered for 2.6.37. Patches are against
> > the latest -tip tree.
>
> I can see patch 2 and 3 going to 2.6.37, but patch 1 seems a bit too
> big. If it is not a true fix for anything and is just a design change,
> then lets hold off till 2.6.38.
>
>
> >
> > The first one, which adds 'state' to the jump label mechanism is the most
> > important. Essentially, it ensures that if jump labels are enabled/disabled in
> > the core kernel but the actual call sites are in modules, we properly honor the
> > state of the jump label. This also works for jump labels which may be defined in
> > one module but made use of in another module.
>
> What happens if we don't do this. What does "honoring" the state
> actually mean?
>

So for example, let's say we do:

1) echo 1 > /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
2) load module 'A' that has the 'kmalloc' tracepoint.

Without patch #1, we would fail to trace the 'kmalloc' in module 'A'.
The only way to get the output from the 'kmalloc' tracepoint in module
'A', would be to issue the echo command again.

To me this is a correctness patch, and should be included.

thanks,

-Jason

2010-11-23 23:10:59

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, 2010-11-23 at 16:56 -0500, Jason Baron wrote:
> On Tue, Nov 23, 2010 at 04:42:07PM -0500, Steven Rostedt wrote:
> > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > Hi,
> > >
> > > A few jump label patches that I want considered for 2.6.37. Patches are against
> > > the latest -tip tree.
> >
> > I can see patch 2 and 3 going to 2.6.37, but patch 1 seems a bit too
> > big. If it is not a true fix for anything and is just a design change,
> > then lets hold off till 2.6.38.
> >
> >
> > >
> > > The first one, which adds 'state' to the jump label mechanism is the most
> > > important. Essentially, it ensures that if jump labels are enabled/disabled in
> > > the core kernel but the actual call sites are in modules, we properly honor the
> > > state of the jump label. This also works for jump labels which may be defined in
> > > one module but made use of in another module.
> >
> > What happens if we don't do this. What does "honoring" the state
> > actually mean?
> >
>
> So for example, let's say we do:
>
> 1) echo 1 > /sys/kernel/debug/tracing/events/kmem/kmalloc/enable
> 2) load module 'A' that has the 'kmalloc' tracepoint.
>
> Without patch #1, we would fail to trace the 'kmalloc' in module 'A'.
> The only way to get the output from the 'kmalloc' tracepoint in module
> 'A', would be to issue the echo command again.
>
> To me this is a correctness patch, and should be included.
>

It's not what you think that matters, it's what Linus thinks ;-)
He just wants bug fixes.

Anyway, I just tried what you explained with the current kernel, with
and without jump labels and, without jump labels, the module has its
kmalloc tracepoint traced, but with jump labels it does not. So we can
treat this as a regression, which is something that can go into an -rc.

The change log must state that this _is_ a regression, or Linus may not
accept it.

I'll test out your patches.

-- Steve

2010-11-23 23:11:58

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, 2010-11-23 at 13:36 -0800, H. Peter Anvin wrote:
> On 11/23/2010 01:27 PM, Jason Baron wrote:
> > Hi,
> >
> > A few jump label patches that I want considered for 2.6.37. Patches are against
> > the latest -tip tree.
> >
> > The first one, which adds 'state' to the jump label mechanism is the most
> > important. Essentially, it ensures that if jump labels are enabled/disabled in
> > the core kernel but the actual call sites are in modules, we properly honor the
> > state of the jump label. This also works for jump labels which may be defined in
> > one module but made use of in another module.
> >
> > There has been some discussion about using the 'key' variable to store the
> > enabled/disabled state for each jump label. However, I think a better design
> > will be to use the 'key' variable to store a pointer to the appropriate jump
> > label tables. In this way, we can enable/disable jump labels, without the
> > hashing that I'm currently doing. However, I didn't want to propose these more
> > invasive changes until 2.6.38.
> >
>
> I would also like to see a change in the API, preferrably something
> closer to the "SWITCH_POINT" interface I discussed with Stephen before
> Kernel Summit.
>

Could you explain in more detail what you would like to see.

Thanks,

-- Steve

2010-11-23 23:34:21

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On 11/23/2010 03:11 PM, Steven Rostedt wrote:
>> I would also like to see a change in the API, preferrably something
>> closer to the "SWITCH_POINT" interface I discussed with Stephen before
>> Kernel Summit.
>
> Could you explain in more detail what you would like to see.

The JUMP_LABEL() macro is rather ugly, and I found from the
static_cpu_has() work that inlines like (somewhat pseudocode here):

static inline bool SWITCH_POINT(void *metadata)
{
asm goto("1: <5 byte nop>\n"
".section \".metadata\",\"a\"\n"
".long 1b, %p0\n"
".previous\n"
: : "i" (metadata)
: : l_yes);
return false;
l_yes:
return true;
}

... work quite well; with the resulting SWITCH_POINT() being usable like
any other boolean expression in the kernel, i.e. as part of if, while,
etc. Most of the time, gcc is smart enough to just use the flow of
control provided, and it also permits backwards compatibility with older
gcc by patching in a one-byte immediate instead.

There are some instances where it double-jumps; those can be avoided by
always jumping (allowing the patch code to replace the jump with a
5-byte NOP opportunistically a posteori) but unfortunately current gcc
tends to not order the sequentially next code afterwards if one does that.

-hpa

2010-11-24 00:00:35

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Tue, 2010-11-23 at 18:43 -0500, Mathieu Desnoyers wrote:
> * Jason Baron ([email protected]) wrote:
> [...]
> > +static void update_jump_label_module(struct module *mod)
> > +{
> > + struct hlist_head *head;
> > + struct hlist_node *node, *node_next, *module_node, *module_node_next;
> > + struct jump_label_entry *e;
> > + struct jump_label_module_entry *e_module;
> > + struct jump_entry *iter;
> > + int i, count;
> > +
> > + /* if the module doesn't have jump label entries, just return */
> > + if (!mod->num_jump_entries)
> > + return;
> > +
> > + for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> > + head = &jump_label_table[i];
> > + hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> > + if (!e->enabled)
> > + continue;
> > + hlist_for_each_entry_safe(e_module, module_node,
> > + module_node_next,
> > + &(e->modules), hlist) {
> > + if (e_module->mod != mod)
> > + continue;
>
> Ouch.
>
> Could you iterate on the loaded/unloaded module jump labels and do hash
> table lookups rather than doing this O(n^3) iteration ?

This does not look O(n^3) to me.

It's a hash table, thus the first two loops is just iterating O(n) items
in the hash.

And the third loop is all the modules that use a particular event.

So it is O(n*m) where n is the number of events, and m is the number of
modules attached to the events. And that's only if all events are used
by those modules. The actual case is much smaller.

-- Steve

2010-11-24 00:04:12

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/3] jump label: move jump table to r/w section

On Tue, 2010-11-23 at 18:55 -0500, Mathieu Desnoyers wrote:
> * Jason Baron ([email protected]) wrote:
> > Since we writing the jump table it should be be in R/W kernel
> > section. Move it to DATA_DATA
> >
> > Signed-off-by: Jason Baron <[email protected]>
> > ---
> > include/asm-generic/vmlinux.lds.h | 14 ++++----------
> > 1 files changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index bd69d79..9ca894d 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -161,6 +161,10 @@
> > VMLINUX_SYMBOL(__start___tracepoints) = .; \
> > *(__tracepoints) \
> > VMLINUX_SYMBOL(__stop___tracepoints) = .; \
> > + . = ALIGN(8); \
>
> Past churn with various architectures and compiler with tracepoints,
> markers and immediate values lead me to hint at the following approach
> for jump label structure alignment:
>
> . = ALIGN(32);
>
> and to modify jump_label.h to have:
>
> struct jump_entry {
> jump_label_t code;
> jump_label_t target;
> jump_label_t key;
> } __attribute__((aligned(32)));
>
> Otherwise, the compiler is free to choose on which value it prefers to
> align the jump_entry structures, which might not match the address at
> which the linker scripts puts the beginning of the jump table.
>
> In this case, given that we put put the jump label table after the
> tracepoint table, we should be already aligned on 32 bytes. But I would
> recommend to put the . = ALIGN(32) in the linker script anyway, just for
> documentation purpose (and it should not add any padding in this case).
>
> This is not a problem introduced by this patch, it also applies to the
> current jump label code.

Agreed, but this change could probably wait for 2.6.38.

Also, if this is done, then an it should be wrapped in a
#ifdef CONFIG_JUMP_LABEL and only inserted if we are using jump labels.
Otherwise we may add a 32 byte hole for nothing. I know it's small, but
why waste it if you don't need to.

-- Steve

2010-11-24 00:10:21

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, 2010-11-23 at 15:32 -0800, H. Peter Anvin wrote:
> On 11/23/2010 03:11 PM, Steven Rostedt wrote:
> >> I would also like to see a change in the API, preferrably something
> >> closer to the "SWITCH_POINT" interface I discussed with Stephen before
> >> Kernel Summit.
> >
> > Could you explain in more detail what you would like to see.
>
> The JUMP_LABEL() macro is rather ugly, and I found from the
> static_cpu_has() work that inlines like (somewhat pseudocode here):
>
> static inline bool SWITCH_POINT(void *metadata)
> {
> asm goto("1: <5 byte nop>\n"
> ".section \".metadata\",\"a\"\n"
> ".long 1b, %p0\n"
> ".previous\n"
> : : "i" (metadata)
> : : l_yes);
> return false;
> l_yes:
> return true;
> }
>
> ... work quite well; with the resulting SWITCH_POINT() being usable like
> any other boolean expression in the kernel, i.e. as part of if, while,
> etc. Most of the time, gcc is smart enough to just use the flow of
> control provided, and it also permits backwards compatibility with older
> gcc by patching in a one-byte immediate instead.
>
> There are some instances where it double-jumps; those can be avoided by
> always jumping (allowing the patch code to replace the jump with a
> 5-byte NOP opportunistically a posteori) but unfortunately current gcc
> tends to not order the sequentially next code afterwards if one does that.
>

So you would rather have it as an if statement? Something like this:

if (unlikely(JUMP_LABEL(key)))
__DO_TRACE(....);

(Note, I like the above better too)

-- Steve

2010-11-24 00:34:17

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

* Steven Rostedt ([email protected]) wrote:
> On Tue, 2010-11-23 at 18:43 -0500, Mathieu Desnoyers wrote:
> > * Jason Baron ([email protected]) wrote:
> > [...]
> > > +static void update_jump_label_module(struct module *mod)
> > > +{
> > > + struct hlist_head *head;
> > > + struct hlist_node *node, *node_next, *module_node, *module_node_next;
> > > + struct jump_label_entry *e;
> > > + struct jump_label_module_entry *e_module;
> > > + struct jump_entry *iter;
> > > + int i, count;
> > > +
> > > + /* if the module doesn't have jump label entries, just return */
> > > + if (!mod->num_jump_entries)
> > > + return;
> > > +
> > > + for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> > > + head = &jump_label_table[i];
> > > + hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> > > + if (!e->enabled)
> > > + continue;
> > > + hlist_for_each_entry_safe(e_module, module_node,
> > > + module_node_next,
> > > + &(e->modules), hlist) {
> > > + if (e_module->mod != mod)
> > > + continue;
> >
> > Ouch.
> >
> > Could you iterate on the loaded/unloaded module jump labels and do hash
> > table lookups rather than doing this O(n^3) iteration ?
>
> This does not look O(n^3) to me.
>
> It's a hash table, thus the first two loops is just iterating O(n) items
> in the hash.

Good point.

>
> And the third loop is all the modules that use a particular event.
>
> So it is O(n*m) where n is the number of events, and m is the number of
> modules attached to the events. And that's only if all events are used
> by those modules. The actual case is much smaller.

Still, I wonder if the O(n) iteration on all events could be shrinked to
an interation on only the events present in the loaded/unloaded module ?
I think I did something like that for immediate values already. It might
apply (or not) here, just a thought.

Thanks,

Mathieu


--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-11-24 00:34:04

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

* Jason Baron ([email protected]) wrote:
[...]
> +static void update_jump_label_module(struct module *mod)
> +{
> + struct hlist_head *head;
> + struct hlist_node *node, *node_next, *module_node, *module_node_next;
> + struct jump_label_entry *e;
> + struct jump_label_module_entry *e_module;
> + struct jump_entry *iter;
> + int i, count;
> +
> + /* if the module doesn't have jump label entries, just return */
> + if (!mod->num_jump_entries)
> + return;
> +
> + for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> + head = &jump_label_table[i];
> + hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> + if (!e->enabled)
> + continue;
> + hlist_for_each_entry_safe(e_module, module_node,
> + module_node_next,
> + &(e->modules), hlist) {
> + if (e_module->mod != mod)
> + continue;

Ouch.

Could you iterate on the loaded/unloaded module jump labels and do hash
table lookups rather than doing this O(n^3) iteration ?

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-11-24 00:35:44

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/3] jump label: move jump table to r/w section

On Tue, 2010-11-23 at 19:27 -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:

> The lack of proper alignment did cause nasty hard-to-identify/reproduce
> regressions based on the compiler used, target platform and data layout.
> It's up to you, but I'd be much more comfortable merging this in 2.6.37.

Hmm, I may add a patch on top to apply it.

-- Steve

2010-11-24 00:36:35

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, 2010-11-23 at 19:10 -0500, Steven Rostedt wrote:

> So you would rather have it as an if statement? Something like this:
>
> if (unlikely(JUMP_LABEL(key)))
> __DO_TRACE(....);
>
> (Note, I like the above better too)
>

I do like this better, but this change should wait till 2.6.38.

-- Steve

2010-11-24 00:38:50

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On 11/23/2010 04:36 PM, Steven Rostedt wrote:
> On Tue, 2010-11-23 at 19:10 -0500, Steven Rostedt wrote:
>
>> So you would rather have it as an if statement? Something like this:
>>
>> if (unlikely(JUMP_LABEL(key)))
>> __DO_TRACE(....);
>>
>> (Note, I like the above better too)
>>
>
> I do like this better, but this change should wait till 2.6.38.
>

Yes.

-hpa

2010-11-24 00:40:26

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 2/3] jump label: move jump table to r/w section

* Jason Baron ([email protected]) wrote:
> Since we writing the jump table it should be be in R/W kernel
> section. Move it to DATA_DATA
>
> Signed-off-by: Jason Baron <[email protected]>
> ---
> include/asm-generic/vmlinux.lds.h | 14 ++++----------
> 1 files changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index bd69d79..9ca894d 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -161,6 +161,10 @@
> VMLINUX_SYMBOL(__start___tracepoints) = .; \
> *(__tracepoints) \
> VMLINUX_SYMBOL(__stop___tracepoints) = .; \
> + . = ALIGN(8); \

Past churn with various architectures and compiler with tracepoints,
markers and immediate values lead me to hint at the following approach
for jump label structure alignment:

. = ALIGN(32);

and to modify jump_label.h to have:

struct jump_entry {
jump_label_t code;
jump_label_t target;
jump_label_t key;
} __attribute__((aligned(32)));

Otherwise, the compiler is free to choose on which value it prefers to
align the jump_entry structures, which might not match the address at
which the linker scripts puts the beginning of the jump table.

In this case, given that we put put the jump label table after the
tracepoint table, we should be already aligned on 32 bytes. But I would
recommend to put the . = ALIGN(32) in the linker script anyway, just for
documentation purpose (and it should not add any padding in this case).

This is not a problem introduced by this patch, it also applies to the
current jump label code.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-11-24 01:07:14

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 2/3] jump label: move jump table to r/w section

* Steven Rostedt ([email protected]) wrote:
> On Tue, 2010-11-23 at 18:55 -0500, Mathieu Desnoyers wrote:
> > * Jason Baron ([email protected]) wrote:
> > > Since we writing the jump table it should be be in R/W kernel
> > > section. Move it to DATA_DATA
> > >
> > > Signed-off-by: Jason Baron <[email protected]>
> > > ---
> > > include/asm-generic/vmlinux.lds.h | 14 ++++----------
> > > 1 files changed, 4 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > index bd69d79..9ca894d 100644
> > > --- a/include/asm-generic/vmlinux.lds.h
> > > +++ b/include/asm-generic/vmlinux.lds.h
> > > @@ -161,6 +161,10 @@
> > > VMLINUX_SYMBOL(__start___tracepoints) = .; \
> > > *(__tracepoints) \
> > > VMLINUX_SYMBOL(__stop___tracepoints) = .; \
> > > + . = ALIGN(8); \
> >
> > Past churn with various architectures and compiler with tracepoints,
> > markers and immediate values lead me to hint at the following approach
> > for jump label structure alignment:
> >
> > . = ALIGN(32);
> >
> > and to modify jump_label.h to have:
> >
> > struct jump_entry {
> > jump_label_t code;
> > jump_label_t target;
> > jump_label_t key;
> > } __attribute__((aligned(32)));
> >
> > Otherwise, the compiler is free to choose on which value it prefers to
> > align the jump_entry structures, which might not match the address at
> > which the linker scripts puts the beginning of the jump table.
> >
> > In this case, given that we put put the jump label table after the
> > tracepoint table, we should be already aligned on 32 bytes. But I would
> > recommend to put the . = ALIGN(32) in the linker script anyway, just for
> > documentation purpose (and it should not add any padding in this case).
> >
> > This is not a problem introduced by this patch, it also applies to the
> > current jump label code.
>
> Agreed, but this change could probably wait for 2.6.38.
>
> Also, if this is done, then an it should be wrapped in a
> #ifdef CONFIG_JUMP_LABEL and only inserted if we are using jump labels.
> Otherwise we may add a 32 byte hole for nothing. I know it's small, but
> why waste it if you don't need to.

The lack of proper alignment did cause nasty hard-to-identify/reproduce
regressions based on the compiler used, target platform and data layout.
It's up to you, but I'd be much more comfortable merging this in 2.6.37.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-11-24 02:18:59

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/3] jump label: move jump table to r/w section

On Tue, 2010-11-23 at 18:55 -0500, Mathieu Desnoyers wrote:
> * Jason Baron ([email protected]) wrote:
> > Since we writing the jump table it should be be in R/W kernel
> > section. Move it to DATA_DATA
> >
> > Signed-off-by: Jason Baron <[email protected]>
> > ---
> > include/asm-generic/vmlinux.lds.h | 14 ++++----------
> > 1 files changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > index bd69d79..9ca894d 100644
> > --- a/include/asm-generic/vmlinux.lds.h
> > +++ b/include/asm-generic/vmlinux.lds.h
> > @@ -161,6 +161,10 @@
> > VMLINUX_SYMBOL(__start___tracepoints) = .; \
> > *(__tracepoints) \
> > VMLINUX_SYMBOL(__stop___tracepoints) = .; \
> > + . = ALIGN(8); \
>
> Past churn with various architectures and compiler with tracepoints,
> markers and immediate values lead me to hint at the following approach
> for jump label structure alignment:
>
> . = ALIGN(32);
>
> and to modify jump_label.h to have:
>
> struct jump_entry {
> jump_label_t code;
> jump_label_t target;
> jump_label_t key;
> } __attribute__((aligned(32)));
>
> Otherwise, the compiler is free to choose on which value it prefers to
> align the jump_entry structures, which might not match the address at
> which the linker scripts puts the beginning of the jump table.
>
> In this case, given that we put put the jump label table after the
> tracepoint table, we should be already aligned on 32 bytes. But I would
> recommend to put the . = ALIGN(32) in the linker script anyway, just for
> documentation purpose (and it should not add any padding in this case).
>
> This is not a problem introduced by this patch, it also applies to the
> current jump label code.
>

Looking at this we have much bigger issues. That alignment to the
structure wont do anything but break things. This is because the
structure is not used in assigning that section. It's done in assembly:

# define JUMP_LABEL(key, label) \
do { \
asm goto("1:" \
JUMP_LABEL_INITIAL_NOP \
".pushsection __jump_table, \"a\" \n\t"\
_ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
".popsection \n\t" \
: : "i" (key) : : label); \
} while (0)



That __ASM_PTR "1b, %l[" #label "], %c0 \n\t" is assigning the section
the address of label 1, the address of the label #label, and the key.

We either need to fix this by allocating an array of jump_entrys and
having each arch copy the data into it, or we need to do something
similar to exception tables.

-- Steve

2010-11-24 02:59:56

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 2/3] jump label: move jump table to r/w section

[ Calling attention to David Miller ]

On Tue, 2010-11-23 at 21:18 -0500, Steven Rostedt wrote:
> On Tue, 2010-11-23 at 18:55 -0500, Mathieu Desnoyers wrote:
> > * Jason Baron ([email protected]) wrote:
> > > Since we writing the jump table it should be be in R/W kernel
> > > section. Move it to DATA_DATA
> > >
> > > Signed-off-by: Jason Baron <[email protected]>
> > > ---
> > > include/asm-generic/vmlinux.lds.h | 14 ++++----------
> > > 1 files changed, 4 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> > > index bd69d79..9ca894d 100644
> > > --- a/include/asm-generic/vmlinux.lds.h
> > > +++ b/include/asm-generic/vmlinux.lds.h
> > > @@ -161,6 +161,10 @@
> > > VMLINUX_SYMBOL(__start___tracepoints) = .; \
> > > *(__tracepoints) \
> > > VMLINUX_SYMBOL(__stop___tracepoints) = .; \
> > > + . = ALIGN(8); \
> >
> > Past churn with various architectures and compiler with tracepoints,
> > markers and immediate values lead me to hint at the following approach
> > for jump label structure alignment:
> >
> > . = ALIGN(32);
> >
> > and to modify jump_label.h to have:
> >
> > struct jump_entry {
> > jump_label_t code;
> > jump_label_t target;
> > jump_label_t key;
> > } __attribute__((aligned(32)));
> >
> > Otherwise, the compiler is free to choose on which value it prefers to
> > align the jump_entry structures, which might not match the address at
> > which the linker scripts puts the beginning of the jump table.
> >
> > In this case, given that we put put the jump label table after the
> > tracepoint table, we should be already aligned on 32 bytes. But I would
> > recommend to put the . = ALIGN(32) in the linker script anyway, just for
> > documentation purpose (and it should not add any padding in this case).
> >
> > This is not a problem introduced by this patch, it also applies to the
> > current jump label code.
> >
>
> Looking at this we have much bigger issues. That alignment to the
> structure wont do anything but break things. This is because the
> structure is not used in assigning that section. It's done in assembly:
>
> # define JUMP_LABEL(key, label) \
> do { \
> asm goto("1:" \
> JUMP_LABEL_INITIAL_NOP \
> ".pushsection __jump_table, \"a\" \n\t"\
> _ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
> ".popsection \n\t" \
> : : "i" (key) : : label); \
> } while (0)
>
>
>
> That __ASM_PTR "1b, %l[" #label "], %c0 \n\t" is assigning the section
> the address of label 1, the address of the label #label, and the key.
>
> We either need to fix this by allocating an array of jump_entrys and
> having each arch copy the data into it, or we need to do something
> similar to exception tables.

Talking with Mathieu about this, we may have the simple solution of
adding the packed attribute to all arch declarations of struct
jump_entry, and that should fix it. And having this on an 8 byte
alignment should be fine.

I just want to make sure this is fine with sparc, as it is one of the
more exotic archs.

-- Steve

2010-11-24 08:20:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> struct hlist_head modules;
> unsigned long key;
> + u32 nr_entries : 31,
> + enabled : 1;
> };

I still don't see why you do this, why not simply mandate that the key
is of type atomic_t* and use *key as enabled state?

2010-11-24 08:29:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Tue, 2010-11-23 at 18:10 -0500, Steven Rostedt wrote:

> Anyway, I just tried what you explained with the current kernel, with
> and without jump labels and, without jump labels, the module has its
> kmalloc tracepoint traced, but with jump labels it does not. So we can
> treat this as a regression, which is something that can go into an -rc.
>
> The change log must state that this _is_ a regression, or Linus may not
> accept it.

I really dislike the first patch... Preferably I'd simply fully revert
all the jump-label stuff and try again next round with a saner
interface.

There's a really good simple fix for this, simply disable the gcc
trickery for .37 and use the fallback.

Then for .38, mandate the key type to be atomic_t * and switch to the
SWITCH_POINT() interface from hpa.

2010-11-24 09:21:31

by Andi Kleen

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Wed, Nov 24, 2010 at 09:29:22AM +0100, Peter Zijlstra wrote:
> On Tue, 2010-11-23 at 18:10 -0500, Steven Rostedt wrote:
>
> > Anyway, I just tried what you explained with the current kernel, with
> > and without jump labels and, without jump labels, the module has its
> > kmalloc tracepoint traced, but with jump labels it does not. So we can
> > treat this as a regression, which is something that can go into an -rc.
> >
> > The change log must state that this _is_ a regression, or Linus may not
> > accept it.
>
> I really dislike the first patch... Preferably I'd simply fully revert
> all the jump-label stuff and try again next round with a saner
> interface.

Agreed.

>
> There's a really good simple fix for this, simply disable the gcc
> trickery for .37 and use the fallback.
>
> Then for .38, mandate the key type to be atomic_t * and switch to the
> SWITCH_POINT() interface from hpa.

This would also allow getting rid of the hash tables and use
binary search, which is imho much cleaner and simpler.

I will refresh my older patches to do that.

-Andi

--
[email protected] -- Speaking for myself only.

2010-11-24 12:47:42

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Wed, 2010-11-24 at 09:29 +0100, Peter Zijlstra wrote:
> On Tue, 2010-11-23 at 18:10 -0500, Steven Rostedt wrote:
>
> > Anyway, I just tried what you explained with the current kernel, with
> > and without jump labels and, without jump labels, the module has its
> > kmalloc tracepoint traced, but with jump labels it does not. So we can
> > treat this as a regression, which is something that can go into an -rc.
> >
> > The change log must state that this _is_ a regression, or Linus may not
> > accept it.
>
> I really dislike the first patch... Preferably I'd simply fully revert
> all the jump-label stuff and try again next round with a saner
> interface.

Well, it is configurable and default off. Lets leave it as is for 37
then, and we can rework it for 38.

>
> There's a really good simple fix for this, simply disable the gcc
> trickery for .37 and use the fallback.

Again, its default off. Unless you specifically enable it, it does the
old style tracepoints.

>
> Then for .38, mandate the key type to be atomic_t * and switch to the
> SWITCH_POINT() interface from hpa.

I agree about the SWITCH_POINT() type interface, but I'm still not sure
about forcing atomic_t. I'll have to think about that one.

-- Steve

2010-11-24 13:49:34

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 0/3] jump label: updates for 2.6.37

On Wed, 2010-11-24 at 07:47 -0500, Steven Rostedt wrote:

> Again, its default off. Unless you specifically enable it, it does the
> old style tracepoints.

Thinking about this more, perhaps the best thing to do is just make the
jump label config depend on experimental.

-- Steve

2010-11-24 14:55:03

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > struct hlist_head modules;
> > unsigned long key;
> > + u32 nr_entries : 31,
> > + enabled : 1;
> > };
>
> I still don't see why you do this, why not simply mandate that the key
> is of type atomic_t* and use *key as enabled state?
>

Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
That way we don't need any hashing.

thanks,

-Jason

2010-11-24 15:11:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > struct hlist_head modules;
> > > unsigned long key;
> > > + u32 nr_entries : 31,
> > > + enabled : 1;
> > > };
> >
> > I still don't see why you do this, why not simply mandate that the key
> > is of type atomic_t* and use *key as enabled state?
> >
>
> Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> That way we don't need any hashing.

But but but, you're doing a friggin stop_machine to poke text, that's
way more expensive than anything else.

You can do away with the hash by using the bsearch stuff Andi has been
proposing.

Also, I'd actually like to have more than 1 bit of storage, I'm using it
as a general refcount.

2010-11-24 15:15:09

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > struct hlist_head modules;
> > > unsigned long key;
> > > + u32 nr_entries : 31,
> > > + enabled : 1;
> > > };
> >
> > I still don't see why you do this, why not simply mandate that the key
> > is of type atomic_t* and use *key as enabled state?
> >
>
> Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> That way we don't need any hashing.

I'm curious, how are you going to get the keys to point to the jump
label structures?

-- Steve

2010-11-24 15:20:30

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, Nov 24, 2010 at 04:11:18PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> > On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > > struct hlist_head modules;
> > > > unsigned long key;
> > > > + u32 nr_entries : 31,
> > > > + enabled : 1;
> > > > };
> > >
> > > I still don't see why you do this, why not simply mandate that the key
> > > is of type atomic_t* and use *key as enabled state?
> > >
> >
> > Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> > In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> > That way we don't need any hashing.
>
> But but but, you're doing a friggin stop_machine to poke text, that's
> way more expensive than anything else.
>

Yes, but other arches do not require stop_machine(). Also, there is work
for x86 to make the code patching happen without stop_machine().

> You can do away with the hash by using the bsearch stuff Andi has been
> proposing.
>
> Also, I'd actually like to have more than 1 bit of storage, I'm using it
> as a general refcount.
>
>

Not a problem, we can have a few fields, the first of which can be a
pointer, the rest can be used for other purposes.

thanks,

-Jason

2010-11-24 15:22:00

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, Nov 24, 2010 at 10:15:02AM -0500, Steven Rostedt wrote:
> On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> > On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > > struct hlist_head modules;
> > > > unsigned long key;
> > > > + u32 nr_entries : 31,
> > > > + enabled : 1;
> > > > };
> > >
> > > I still don't see why you do this, why not simply mandate that the key
> > > is of type atomic_t* and use *key as enabled state?
> > >
> >
> > Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> > In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> > That way we don't need any hashing.
>
> I'm curious, how are you going to get the keys to point to the jump
> label structures?
>
> -- Steve
>
>

The keys are simply the address of a variable or structure (so as to be
unique). We can put pointers or anything else in the variable.

thanks,

-Jason

2010-11-24 15:22:05

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 10:15 -0500, Steven Rostedt wrote:
> On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> > On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > > struct hlist_head modules;
> > > > unsigned long key;
> > > > + u32 nr_entries : 31,
> > > > + enabled : 1;
> > > > };
> > >
> > > I still don't see why you do this, why not simply mandate that the key
> > > is of type atomic_t* and use *key as enabled state?
> > >
> >
> > Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> > In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> > That way we don't need any hashing.
>
> I'm curious, how are you going to get the keys to point to the jump
> label structures?

What jump label structure, we've got the section with:

{key-address, text-address}

tuples, keep that sorted and binary search it,. voila!

You don't need extra data, but since we have to have some data to have a
key-address, use that storage for a simple refcount.

2010-11-24 15:24:03

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 10:19 -0500, Jason Baron wrote:
> On Wed, Nov 24, 2010 at 04:11:18PM +0100, Peter Zijlstra wrote:
> > On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> > > On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > > > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > > > struct hlist_head modules;
> > > > > unsigned long key;
> > > > > + u32 nr_entries : 31,
> > > > > + enabled : 1;
> > > > > };
> > > >
> > > > I still don't see why you do this, why not simply mandate that the key
> > > > is of type atomic_t* and use *key as enabled state?
> > > >
> > >
> > > Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> > > In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> > > That way we don't need any hashing.
> >
> > But but but, you're doing a friggin stop_machine to poke text, that's
> > way more expensive than anything else.
> >
>
> Yes, but other arches do not require stop_machine(). Also, there is work
> for x86 to make the code patching happen without stop_machine().

Even without stop machine you're sending IPIs to all CPUs, that's not
free either.

And I think the only arch where you can do text pokes without cross-cpu
synchronization is one that doesn't have SMP support.

2010-11-24 15:25:18

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 10:21 -0500, Jason Baron wrote:
>
> The keys are simply the address of a variable or structure (so as to be
> unique). We can put pointers or anything else in the variable.

If you keep the key type fixed, like say an atomic_t, you can easily
read it and know if an jump-label is enabled or disabled if you load its
text later (modules).

2010-11-24 15:43:18

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, Nov 24, 2010 at 04:24:05PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-24 at 10:19 -0500, Jason Baron wrote:
> > On Wed, Nov 24, 2010 at 04:11:18PM +0100, Peter Zijlstra wrote:
> > > On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
> > > > On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
> > > > > On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
> > > > > > struct hlist_head modules;
> > > > > > unsigned long key;
> > > > > > + u32 nr_entries : 31,
> > > > > > + enabled : 1;
> > > > > > };
> > > > >
> > > > > I still don't see why you do this, why not simply mandate that the key
> > > > > is of type atomic_t* and use *key as enabled state?
> > > > >
> > > >
> > > > Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
> > > > In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
> > > > That way we don't need any hashing.
> > >
> > > But but but, you're doing a friggin stop_machine to poke text, that's
> > > way more expensive than anything else.
> > >
> >
> > Yes, but other arches do not require stop_machine(). Also, there is work
> > for x86 to make the code patching happen without stop_machine().
>
> Even without stop machine you're sending IPIs to all CPUs, that's not
> free either.
>
> And I think the only arch where you can do text pokes without cross-cpu
> synchronization is one that doesn't have SMP support.
>
>

is this really true?

The powerpc implementation uses patch_instruction():


arch/powerpc/lib/code-patching.c:

void patch_instruction(unsigned int *addr, unsigned int instr)
{
*addr = instr;
asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r"
(addr));
}


And sparc does uses flushi():


include/asm/system_64.h:

#define flushi(addr) __asm__ __volatile__ ("flush %0" : : "r" (addr)
: "memory")

thanks,

-Jason

2010-11-24 15:53:21

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 10:42 -0500, Jason Baron wrote:

> > And I think the only arch where you can do text pokes without cross-cpu
> > synchronization is one that doesn't have SMP support.
> >
> >
>
> is this really true?
>
> The powerpc implementation uses patch_instruction():
>
>
> arch/powerpc/lib/code-patching.c:
>
> void patch_instruction(unsigned int *addr, unsigned int instr)
> {
> *addr = instr;
> asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r"
> (addr));
> }

Is this ever called outside of boot up? After SMP is enabled? (besides
for creating trampolines, before they are used).

-- Steve

2010-11-24 16:56:11

by David Daney

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On 11/24/2010 07:42 AM, Jason Baron wrote:
> On Wed, Nov 24, 2010 at 04:24:05PM +0100, Peter Zijlstra wrote:
>> On Wed, 2010-11-24 at 10:19 -0500, Jason Baron wrote:
>>> On Wed, Nov 24, 2010 at 04:11:18PM +0100, Peter Zijlstra wrote:
>>>> On Wed, 2010-11-24 at 09:54 -0500, Jason Baron wrote:
>>>>> On Wed, Nov 24, 2010 at 09:20:09AM +0100, Peter Zijlstra wrote:
>>>>>> On Tue, 2010-11-23 at 16:27 -0500, Jason Baron wrote:
>>>>>>> struct hlist_head modules;
>>>>>>> unsigned long key;
>>>>>>> + u32 nr_entries : 31,
>>>>>>> + enabled : 1;
>>>>>>> };
>>>>>>
>>>>>> I still don't see why you do this, why not simply mandate that the key
>>>>>> is of type atomic_t* and use *key as enabled state?
>>>>>>
>>>>>
>>>>> Because I want to use *key as a pointer directly to 'struct jump_label_entry'.
>>>>> In this way jump_label_enable(), jump_label_disable(), become O(1) operations.
>>>>> That way we don't need any hashing.
>>>>
>>>> But but but, you're doing a friggin stop_machine to poke text, that's
>>>> way more expensive than anything else.
>>>>
>>>
>>> Yes, but other arches do not require stop_machine(). Also, there is work
>>> for x86 to make the code patching happen without stop_machine().
>>
>> Even without stop machine you're sending IPIs to all CPUs, that's not
>> free either.
>>
>> And I think the only arch where you can do text pokes without cross-cpu
>> synchronization is one that doesn't have SMP support.
>>
>>
>
> is this really true?
>

For MIPS SMP it is. ICache invalidation is done per CPU. If you need
the change to be visible on all CPUs, you have to do an IPI to get all
CPUs to do their own ICache invalidation.


> The powerpc implementation uses patch_instruction():
>
>
> arch/powerpc/lib/code-patching.c:
>
> void patch_instruction(unsigned int *addr, unsigned int instr)
> {
> *addr = instr;
> asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r"
> (addr));
> }
>
>
> And sparc does uses flushi():
>
>
> include/asm/system_64.h:
>
> #define flushi(addr) __asm__ __volatile__ ("flush %0" : : "r" (addr)
> : "memory")
>

I don't know how SPARC and PPC work, but does that really work on SMP
(or even NUMA) machines?

David Daney

2010-11-24 18:25:14

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Tue, Nov 23, 2010 at 07:24:11PM -0500, Mathieu Desnoyers wrote:
> * Steven Rostedt ([email protected]) wrote:
> > On Tue, 2010-11-23 at 18:43 -0500, Mathieu Desnoyers wrote:
> > > * Jason Baron ([email protected]) wrote:
> > > [...]
> > > > +static void update_jump_label_module(struct module *mod)
> > > > +{
> > > > + struct hlist_head *head;
> > > > + struct hlist_node *node, *node_next, *module_node, *module_node_next;
> > > > + struct jump_label_entry *e;
> > > > + struct jump_label_module_entry *e_module;
> > > > + struct jump_entry *iter;
> > > > + int i, count;
> > > > +
> > > > + /* if the module doesn't have jump label entries, just return */
> > > > + if (!mod->num_jump_entries)
> > > > + return;
> > > > +
> > > > + for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
> > > > + head = &jump_label_table[i];
> > > > + hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
> > > > + if (!e->enabled)
> > > > + continue;
> > > > + hlist_for_each_entry_safe(e_module, module_node,
> > > > + module_node_next,
> > > > + &(e->modules), hlist) {
> > > > + if (e_module->mod != mod)
> > > > + continue;
> > >
> > > Ouch.
> > >
> > > Could you iterate on the loaded/unloaded module jump labels and do hash
> > > table lookups rather than doing this O(n^3) iteration ?
> >
> > This does not look O(n^3) to me.
> >
> > It's a hash table, thus the first two loops is just iterating O(n) items
> > in the hash.
>
> Good point.
>
> >
> > And the third loop is all the modules that use a particular event.
> >
> > So it is O(n*m) where n is the number of events, and m is the number of
> > modules attached to the events. And that's only if all events are used
> > by those modules. The actual case is much smaller.
>
> Still, I wonder if the O(n) iteration on all events could be shrinked to
> an interation on only the events present in the loaded/unloaded module ?
> I think I did something like that for immediate values already. It might
> apply (or not) here, just a thought.
>
> Thanks,
>
> Mathieu
>
>

indeed. here's a re-spun patch that only iterates over the events in
loaded module.

In practice, at least in my testing, most of these iterations tend to be
O(n), since almost all the tracepoints are used in one location. The
exceptions being kmalloc (which I believe there are patches pending to
put it in a centralized location), and module_get. In addition they are
on the slow module load/unload paths.

thanks,

-Jason

By storing the state of the jump label with each key, we make
sure that when modules are added, they are updated to the correct
state. For example, if the kmalloc tracepoint is enabled and
a module is added which has kmalloc, we make sure that the tracepoint
is properly enabled on module load.

Also, if jump_label_enable(key), is called but the key has not yet
been added to the hashtable of jump label keys, add 'key' to the table.
In this way, if key value has its state updated, but we have not
yet encountered a JUMP_LABEL() definition for it (if its located in
a module), we ensure that the jump label is set to the correct
state when it finally is encountered.

When modules are unloaded, we traverse the jump label hashtable,
and remove any entries that have a key value that is contained
by that module's text section. In this way key values are properly
unregistered, and can be re-used.

Signed-off-by: Jason Baron <[email protected]>
---
kernel/jump_label.c | 71 +++++++++++++++++++++++++++++++++++----------------
1 files changed, 49 insertions(+), 22 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 3b79bd9..c27e004 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -26,10 +26,11 @@ static DEFINE_MUTEX(jump_label_mutex);
struct jump_label_entry {
struct hlist_node hlist;
struct jump_entry *table;
- int nr_entries;
/* hang modules off here */
struct hlist_head modules;
unsigned long key;
+ u32 nr_entries : 31,
+ enabled : 1;
};

struct jump_label_module_entry {
@@ -108,6 +109,7 @@ add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
e->key = key;
e->table = table;
e->nr_entries = nr_entries;
+ e->enabled = 0;
INIT_HLIST_HEAD(&(e->modules));
hlist_add_head(&e->hlist, head);
return e;
@@ -164,27 +166,37 @@ void jump_label_update(unsigned long key, enum jump_label_type type)

jump_label_lock();
entry = get_jump_label_entry((jump_label_t)key);
- if (entry) {
- count = entry->nr_entries;
- iter = entry->table;
+ if (!entry) {
+ if (type == JUMP_LABEL_ENABLE) {
+ entry = add_jump_label_entry(key, 0, NULL);
+ if (IS_ERR(entry))
+ goto out;
+ } else
+ goto out;
+ }
+ if (type == JUMP_LABEL_ENABLE)
+ entry->enabled = 1;
+ else
+ entry->enabled = 0;
+ count = entry->nr_entries;
+ iter = entry->table;
+ while (count--) {
+ if (kernel_text_address(iter->code))
+ arch_jump_label_transform(iter, type);
+ iter++;
+ }
+ /* enable/disable jump labels in modules */
+ hlist_for_each_entry(e_module, module_node, &(entry->modules),
+ hlist) {
+ count = e_module->nr_entries;
+ iter = e_module->table;
while (count--) {
- if (kernel_text_address(iter->code))
+ if (iter->key && kernel_text_address(iter->code))
arch_jump_label_transform(iter, type);
iter++;
}
- /* eanble/disable jump labels in modules */
- hlist_for_each_entry(e_module, module_node, &(entry->modules),
- hlist) {
- count = e_module->nr_entries;
- iter = e_module->table;
- while (count--) {
- if (iter->key &&
- kernel_text_address(iter->code))
- arch_jump_label_transform(iter, type);
- iter++;
- }
- }
}
+out:
jump_label_unlock();
}

@@ -305,6 +317,7 @@ add_jump_label_module_entry(struct jump_label_entry *entry,
int count, struct module *mod)
{
struct jump_label_module_entry *e;
+ struct jump_entry *iter;

e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
if (!e)
@@ -313,6 +326,13 @@ add_jump_label_module_entry(struct jump_label_entry *entry,
e->nr_entries = count;
e->table = iter_begin;
hlist_add_head(&e->hlist, &entry->modules);
+ if (entry->enabled) {
+ iter = iter_begin;
+ while (count--) {
+ arch_jump_label_transform(iter, JUMP_LABEL_ENABLE);
+ iter++;
+ }
+ }
return e;
}

@@ -360,10 +380,6 @@ static void remove_jump_label_module(struct module *mod)
struct jump_label_module_entry *e_module;
int i;

- /* if the module doesn't have jump label entries, just return */
- if (!mod->num_jump_entries)
- return;
-
for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
head = &jump_label_table[i];
hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
@@ -375,10 +391,21 @@ static void remove_jump_label_module(struct module *mod)
kfree(e_module);
}
}
+ }
+ }
+ /* now check if any keys can be removed */
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ if (!within_module_core(e->key, mod))
+ continue;
if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
hlist_del(&e->hlist);
kfree(e);
+ continue;
}
+ WARN(1, KERN_ERR "jump label: "
+ "tyring to remove used key: %lu !\n", e->key);
}
}
}
@@ -470,7 +497,7 @@ void jump_label_apply_nops(struct module *mod)

struct notifier_block jump_label_module_nb = {
.notifier_call = jump_label_module_notify,
- .priority = 0,
+ .priority = 1, /* higher than tracepoints */
};

static __init int init_jump_label_module(void)
--
1.7.1

2010-11-24 18:39:35

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 13:24 -0500, Jason Baron wrote:
> By storing the state of the jump label with each key, we make
> sure that when modules are added, they are updated to the correct
> state. For example, if the kmalloc tracepoint is enabled and
> a module is added which has kmalloc, we make sure that the tracepoint
> is properly enabled on module load.
>
> Also, if jump_label_enable(key), is called but the key has not yet
> been added to the hashtable of jump label keys, add 'key' to the table.
> In this way, if key value has its state updated, but we have not
> yet encountered a JUMP_LABEL() definition for it (if its located in
> a module), we ensure that the jump label is set to the correct
> state when it finally is encountered.
>
> When modules are unloaded, we traverse the jump label hashtable,
> and remove any entries that have a key value that is contained
> by that module's text section. In this way key values are properly
> unregistered, and can be re-used.

So why again are we adding all this complexity? Does this really need to
be optimized in the face of how expensive text pokes are?

2010-11-24 19:07:56

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, Nov 24, 2010 at 07:39:30PM +0100, Peter Zijlstra wrote:
> On Wed, 2010-11-24 at 13:24 -0500, Jason Baron wrote:
> > By storing the state of the jump label with each key, we make
> > sure that when modules are added, they are updated to the correct
> > state. For example, if the kmalloc tracepoint is enabled and
> > a module is added which has kmalloc, we make sure that the tracepoint
> > is properly enabled on module load.
> >
> > Also, if jump_label_enable(key), is called but the key has not yet
> > been added to the hashtable of jump label keys, add 'key' to the table.
> > In this way, if key value has its state updated, but we have not
> > yet encountered a JUMP_LABEL() definition for it (if its located in
> > a module), we ensure that the jump label is set to the correct
> > state when it finally is encountered.
> >
> > When modules are unloaded, we traverse the jump label hashtable,
> > and remove any entries that have a key value that is contained
> > by that module's text section. In this way key values are properly
> > unregistered, and can be re-used.
>
> So why again are we adding all this complexity? Does this really need to
> be optimized in the face of how expensive text pokes are?

perhaps, not. But I would like to better understand how expensive text
pokes are on some other arches such as powerpc first.

Obviously, I want the best implementation - the last posted version of
binary search, had a deadlock with the module_mutex, which I saw no
resolution for...

So, with this 1 additinal patch, we have a complete working implemenation of
jump labels within the current framework. We can iterate from there if ppl want
to simpify/optimize various parts. That's how I envision proceeding.

thanks,

-Jason

2010-11-24 19:10:31

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 10:21 -0500, Jason Baron wrote:

> The keys are simply the address of a variable or structure (so as to be
> unique). We can put pointers or anything else in the variable.

Note, there is not a 1 to 1 with keys and places that need to be
patched.

Doing the following objdump:

objdump -dr vmlinux | grep 'jmpq.*<trace_kmalloc' | wc -l
375

That's 375 instances of kmalloc tracepoints[*] and one key. How do you
handle this?

-- Steve




[*] this should be fixed, we probably should find a way to move the
tracepoint into the kmalloc functions that are not inlined.

2010-11-24 19:18:48

by Jason Baron

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, Nov 24, 2010 at 10:57:26AM -0500, Steven Rostedt wrote:
> On Wed, 2010-11-24 at 10:21 -0500, Jason Baron wrote:
>
> > The keys are simply the address of a variable or structure (so as to be
> > unique). We can put pointers or anything else in the variable.
>
> Note, there is not a 1 to 1 with keys and places that need to be
> patched.
>
> Doing the following objdump:
>
> objdump -dr vmlinux | grep 'jmpq.*<trace_kmalloc' | wc -l
> 375
>
> That's 375 instances of kmalloc tracepoints[*] and one key. How do you
> handle this?
>

so there is 1 key associated with a kmalloc, call it key 'a'.
Then, each time a trace_kmalloc is found in the text it generates a
entry in the jump label section:

[to be patched address i] [address to jump to j] [key a]
[to be patched address k] [address to jump to l] [key a]
.
.

So when we do jump_label_enable(key a), we patch all addresses
associated with key a. Does this make sense?


> -- Steve
>
>
>
>
> [*] this should be fixed, we probably should find a way to move the
> tracepoint into the kmalloc functions that are not inlined.
>

I believe there are patches pending for this.

thanks,

-Jason

2010-11-25 02:39:25

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Wed, 2010-11-24 at 10:53 -0500, Steven Rostedt wrote:
> On Wed, 2010-11-24 at 10:42 -0500, Jason Baron wrote:
>
> > > And I think the only arch where you can do text pokes without cross-cpu
> > > synchronization is one that doesn't have SMP support.
> > >
> > >
> >
> > is this really true?
> >
> > The powerpc implementation uses patch_instruction():
> >
> >
> > arch/powerpc/lib/code-patching.c:
> >
> > void patch_instruction(unsigned int *addr, unsigned int instr)
> > {
> > *addr = instr;
> > asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r"
> > (addr));
> > }
>
> Is this ever called outside of boot up? After SMP is enabled? (besides
> for creating trampolines, before they are used).

It is now :)

AFAIK it works fine, the icbi invalidates across all processors. The
only issue is that it's not precise, ie. another CPU might not see the
update immediately, but as soon as it takes an interrupt or something it
will.

What would suit us would be to have an arch callback that is called
after all the transforms for a particular jump label key have been made.
That way we could optimise the individual patches, and do a sync step at
the end, ie. when we want the effect of the patching to be globally
visible.

cheers



Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-11-25 06:52:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Thu, 2010-11-25 at 13:39 +1100, Michael Ellerman wrote:
> > > arch/powerpc/lib/code-patching.c:
> > >
> > > void patch_instruction(unsigned int *addr, unsigned int instr)
> > > {
> > > *addr = instr;
> > > asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r"
> > > (addr));
> > > }
> >
> > Is this ever called outside of boot up? After SMP is enabled? (besides
> > for creating trampolines, before they are used).
>
> It is now :)
>
> AFAIK it works fine, the icbi invalidates across all processors. The
> only issue is that it's not precise, ie. another CPU might not see the
> update immediately, but as soon as it takes an interrupt or something it
> will.

Ooh, nice, so the CPUs won't get all confused because you change code
from under their ifetch cache?

How expensive is this icbi ins?

> What would suit us would be to have an arch callback that is called
> after all the transforms for a particular jump label key have been made.
> That way we could optimise the individual patches, and do a sync step at
> the end, ie. when we want the effect of the patching to be globally
> visible.

I think such a sync-barrier is desired (possibly only on the enable
path) so we can actually say the tracepoints are on.

Which would mean sending IPIs to all CPUs and waiting for them to
acknowledge them. Which, while not quite as expensive as stop_machine,
its not really cheap either.

2010-11-25 13:42:47

by Michael Ellerman

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Thu, 2010-11-25 at 07:52 +0100, Peter Zijlstra wrote:
> On Thu, 2010-11-25 at 13:39 +1100, Michael Ellerman wrote:
> > > > arch/powerpc/lib/code-patching.c:
> > > >
> > > > void patch_instruction(unsigned int *addr, unsigned int instr)
> > > > {
> > > > *addr = instr;
> > > > asm ("dcbst 0, %0; sync; icbi 0,%0; sync; isync" : : "r"
> > > > (addr));
> > > > }
> > >
> > > Is this ever called outside of boot up? After SMP is enabled? (besides
> > > for creating trampolines, before they are used).
> >
> > It is now :)
> >
> > AFAIK it works fine, the icbi invalidates across all processors. The
> > only issue is that it's not precise, ie. another CPU might not see the
> > update immediately, but as soon as it takes an interrupt or something it
> > will.
>
> Ooh, nice, so the CPUs won't get all confused because you change code
> from under their ifetch cache?

Apparently not, at least according to the architecture.

> How expensive is this icbi ins?

Well apparently it's free :) But it has the effect of causing the
subsequent isync to do the real work, so that is probably expensive.
Benh can tell you all the gory details.

> > What would suit us would be to have an arch callback that is called
> > after all the transforms for a particular jump label key have been made.
> > That way we could optimise the individual patches, and do a sync step at
> > the end, ie. when we want the effect of the patching to be globally
> > visible.
>
> I think such a sync-barrier is desired (possibly only on the enable
> path) so we can actually say the tracepoints are on.

It might be nice on disable too, presumably the tracepoint code is happy
with them firing more or less any time, but other users of jump labels
might want a firmer guarantee that they are disabled.

> Which would mean sending IPIs to all CPUs and waiting for them to
> acknowledge them. Which, while not quite as expensive as stop_machine,
> its not really cheap either.

True, and presumably the sync point is once per jump label, ie. once per
tracepoint. For tracepoints you'd actually be happy with eg. perf
enabling a whole bunch, and then syncing just at the end, but that's
probably premature optimisation.

cheers



Attachments:
signature.asc (198.00 B)
This is a digitally signed message part

2010-11-25 14:00:12

by Mathieu Desnoyers

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

* Peter Zijlstra ([email protected]) wrote:
[...]
> > What would suit us would be to have an arch callback that is called
> > after all the transforms for a particular jump label key have been made.
> > That way we could optimise the individual patches, and do a sync step at
> > the end, ie. when we want the effect of the patching to be globally
> > visible.
>
> I think such a sync-barrier is desired (possibly only on the enable
> path) so we can actually say the tracepoints are on.
>
> Which would mean sending IPIs to all CPUs and waiting for them to
> acknowledge them. Which, while not quite as expensive as stop_machine,
> its not really cheap either.

Yep, although this can be batched when enabling many tracepoints en
masse. May I suggest that you guys benchmark the two approaches so we
can figure out at how many tracepoints we start hitting a latency wall ?
100, 1000 and 10000 tracepoints should give interesting measurement
points. If we are still below 2 seconds on common hardware when enabling
10000 tracepoints, then the binary search might be fine.

Please note that HPA recommended the use of a perfect hash. It would
make sense, although there seems to be a non-null probability that the
perfect hash cannot be generated. There are techniques that will retry
with a different seed, but the kernel build time then becomes slightly
harder to predict (for very, very rare occurences, so maybe we don't
care).

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

2010-11-25 21:27:25

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH 1/3] jump label: add enabled/disabled state to jump label key entries

On Fri, 2010-11-26 at 00:42 +1100, Michael Ellerman wrote:
> > Ooh, nice, so the CPUs won't get all confused because you change
> code
> > from under their ifetch cache?
>
> Apparently not, at least according to the architecture.

As long as you are atomically changing one word fully aligned (remember,
no variable length instructions on ppc :-) you are fine. The other CPU
will see either the old or the new value, not something in between.

The dcbf/sync/icbi/isync is really only necessary on older processors.
dcbf will broadcast a request to flush that line out of D, sync will
wait for that to complete, icbi will broadcast an invalidate of that
line out of I, sync will wait for that to have gone out and isync will
locally synchronize the pipeline (toss prefetch).

Now, P5 and later have a HW snoop of I/D, so dcbf isn't useful. You need
at least an isync tho to ensure prefetched stuff has been tossed or you
may still execute the "old" instructions for a little while.

On P7 (I'm not sure about 5 and 6 here), additionally, they have sneaky
optimisations in isync (bcs some people abuse it as a read barrier)
using a scoreboard to decide what to do. In essence, that means that
alone, it won't toss prefetch unless scoreboarded to do so by a previous
icbi. So one icbi (regardless of how much you want to invalidate and
with any address) followed by isync will do.

An interrupt will do too tho :-)

Cheers,
Ben.