Subject: [PATCH 0/3] kmemtrace over tracepoints

Hi everybody,

As suggested by Mathieu Desnoyers, here's kmemtrace using tracepoints
instead of markers. The RCU side has been giving me some headaches because
of circular header dependencies, but I think I've got it right now.

I'm currently working on another of Mathieu's suggestions: variable-length
event fields, so we don't waste space on non-64-bit arches.
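
Just to give an idea of what I have in mind (nothing more than a sketch
with made-up names, and not part of this series): instead of always
logging fixed 64-bit slots, an event could record only as many bytes as
the values actually need, along the lines of:

        /* Purely illustrative; the field names are invented. */
        struct kmemtrace_var_event {
                u8      event_id;
                u8      field_sizes;    /* packed size codes, one per field */
                u8      data[];         /* call_site, ptr, sizes..., packed */
        } __attribute__ ((__packed__));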

Pekka, please merge this if it looks okay. I'll try to finish the work
described above in time; however, should I not be ready, please ask Linus
to git-pull whatever we've got at that point. Hopefully, it'll get in this
time.

Alexey Dobriyan, please check whether the problem you reported when
kmemtrace was using markers is still there.

And everybody: please test this and report any problems you run into, if
possible.


Happy holidays,
Eduard

Eduard - Gabriel Munteanu (3):
RCU: Move some definitions to minimal headers.
tracepoints: Include only minimal RCU headers in linux/tracepoint.h.
kmemtrace: Use tracepoints instead of markers.

include/linux/kmemtrace.h | 76 +++------------
include/linux/rcuclassic.h | 40 +--------
include/linux/rcuclassic_min.h | 77 +++++++++++++++
include/linux/rcupdate.h | 160 +-------------------------------
include/linux/rcupdate_min.h | 205 ++++++++++++++++++++++++++++++++++++++++
include/linux/rcupreempt.h | 32 +------
include/linux/rcupreempt_min.h | 68 +++++++++++++
include/linux/slab_def.h | 11 +-
include/linux/slub_def.h | 16 ++--
include/linux/tracepoint.h | 2 +-
lib/Kconfig.debug | 2 +-
mm/kmemtrace.c | 50 +++++-----
mm/slab.c | 27 +++---
mm/slob.c | 32 +++---
mm/slub.c | 36 ++++----
15 files changed, 457 insertions(+), 377 deletions(-)
create mode 100644 include/linux/rcuclassic_min.h
create mode 100644 include/linux/rcupdate_min.h
create mode 100644 include/linux/rcupreempt_min.h


Subject: [PATCH 1/3] RCU: Move some definitions to minimal headers.

linux/rcupdate.h used to include SLAB headers, which got in the way of
switching kmemtrace from markers to tracepoints. The resulting circular
header inclusion made it impossible to use tracepoints in the SLAB
allocator inlines.

This moves some header code into separate files. As an added bonus,
preprocessing is faster if care is taken to use these minimal headers,
since no effort is spent on including SLAB headers.
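
To illustrate (this is only a sketch with a hypothetical helper, not code
from this patch): a header that needs nothing beyond the RCU read-side
primitives can now include the minimal header and avoid dragging in SLAB
headers, and with them the circular dependency:

        #include <linux/rcupdate_min.h> /* rcu_read_lock(), rcu_dereference(), ... */

        /* Hypothetical inline that would be safe to use from SLAB headers. */
        static inline void *example_rcu_lookup(void **slot)
        {
                void *p;

                rcu_read_lock();
                p = rcu_dereference(*slot);
                rcu_read_unlock();

                return p;
        }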

Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
---
include/linux/rcuclassic.h | 40 +--------
include/linux/rcuclassic_min.h | 77 +++++++++++++++
include/linux/rcupdate.h | 160 +-------------------------------
include/linux/rcupdate_min.h | 205 ++++++++++++++++++++++++++++++++++++++++
include/linux/rcupreempt.h | 32 +------
include/linux/rcupreempt_min.h | 68 +++++++++++++
6 files changed, 353 insertions(+), 229 deletions(-)
create mode 100644 include/linux/rcuclassic_min.h
create mode 100644 include/linux/rcupdate_min.h
create mode 100644 include/linux/rcupreempt_min.h

diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
index 5f89b62..32f1512 100644
--- a/include/linux/rcuclassic.h
+++ b/include/linux/rcuclassic.h
@@ -39,6 +39,7 @@
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
+#include <linux/rcuclassic_min.h>

#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
#define RCU_SECONDS_TILL_STALL_CHECK ( 3 * HZ) /* for rcp->jiffies_stall */
@@ -131,45 +132,6 @@ static inline void rcu_bh_qsctr_inc(int cpu)
extern int rcu_pending(int cpu);
extern int rcu_needs_cpu(int cpu);

-#ifdef CONFIG_DEBUG_LOCK_ALLOC
-extern struct lockdep_map rcu_lock_map;
-# define rcu_read_acquire() \
- lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
-# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
-#else
-# define rcu_read_acquire() do { } while (0)
-# define rcu_read_release() do { } while (0)
-#endif
-
-#define __rcu_read_lock() \
- do { \
- preempt_disable(); \
- __acquire(RCU); \
- rcu_read_acquire(); \
- } while (0)
-#define __rcu_read_unlock() \
- do { \
- rcu_read_release(); \
- __release(RCU); \
- preempt_enable(); \
- } while (0)
-#define __rcu_read_lock_bh() \
- do { \
- local_bh_disable(); \
- __acquire(RCU_BH); \
- rcu_read_acquire(); \
- } while (0)
-#define __rcu_read_unlock_bh() \
- do { \
- rcu_read_release(); \
- __release(RCU_BH); \
- local_bh_enable(); \
- } while (0)
-
-#define __synchronize_sched() synchronize_rcu()
-
-#define call_rcu_sched(head, func) call_rcu(head, func)
-
extern void __rcu_init(void);
#define rcu_init_sched() do { } while (0)
extern void rcu_check_callbacks(int cpu, int user);
diff --git a/include/linux/rcuclassic_min.h b/include/linux/rcuclassic_min.h
new file mode 100644
index 0000000..7ea051b
--- /dev/null
+++ b/include/linux/rcuclassic_min.h
@@ -0,0 +1,77 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion (classic version)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2001
+ *
+ * Author: Dipankar Sarma <[email protected]>
+ *
+ * Based on the original work by Paul McKenney <[email protected]>
+ * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
+ * Papers:
+ * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
+ * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * Documentation/RCU
+ *
+ */
+
+#ifndef _LINUX_RCUCLASSIC_MIN_H
+#define _LINUX_RCUCLASSIC_MIN_H
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#include <linux/lockdep.h>
+extern struct lockdep_map rcu_lock_map;
+# define rcu_read_acquire() \
+ lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
+# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
+#else
+# define rcu_read_acquire() do { } while (0)
+# define rcu_read_release() do { } while (0)
+#endif
+
+#define __rcu_read_lock() \
+ do { \
+ preempt_disable(); \
+ __acquire(RCU); \
+ rcu_read_acquire(); \
+ } while (0)
+#define __rcu_read_unlock() \
+ do { \
+ rcu_read_release(); \
+ __release(RCU); \
+ preempt_enable(); \
+ } while (0)
+#define __rcu_read_lock_bh() \
+ do { \
+ local_bh_disable(); \
+ __acquire(RCU_BH); \
+ rcu_read_acquire(); \
+ } while (0)
+#define __rcu_read_unlock_bh() \
+ do { \
+ rcu_read_release(); \
+ __release(RCU_BH); \
+ local_bh_enable(); \
+ } while (0)
+
+#define __synchronize_sched() synchronize_rcu()
+
+#define call_rcu_sched(head, func) call_rcu(head, func)
+
+#endif /* _LINUX_RCUCLASSIC_MIN_H */
+
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 86f1f5e..d7659e8 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -41,16 +41,7 @@
#include <linux/seqlock.h>
#include <linux/lockdep.h>
#include <linux/completion.h>
-
-/**
- * struct rcu_head - callback structure for use with RCU
- * @next: next update requests in a list
- * @func: actual update function to call after the grace period.
- */
-struct rcu_head {
- struct rcu_head *next;
- void (*func)(struct rcu_head *head);
-};
+#include <linux/rcupdate_min.h>

#ifdef CONFIG_CLASSIC_RCU
#include <linux/rcuclassic.h>
@@ -64,131 +55,6 @@ struct rcu_head {
(ptr)->next = NULL; (ptr)->func = NULL; \
} while (0)

-/**
- * rcu_read_lock - mark the beginning of an RCU read-side critical section.
- *
- * When synchronize_rcu() is invoked on one CPU while other CPUs
- * are within RCU read-side critical sections, then the
- * synchronize_rcu() is guaranteed to block until after all the other
- * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
- * on one CPU while other CPUs are within RCU read-side critical
- * sections, invocation of the corresponding RCU callback is deferred
- * until after the all the other CPUs exit their critical sections.
- *
- * Note, however, that RCU callbacks are permitted to run concurrently
- * with RCU read-side critical sections. One way that this can happen
- * is via the following sequence of events: (1) CPU 0 enters an RCU
- * read-side critical section, (2) CPU 1 invokes call_rcu() to register
- * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
- * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
- * callback is invoked. This is legal, because the RCU read-side critical
- * section that was running concurrently with the call_rcu() (and which
- * therefore might be referencing something that the corresponding RCU
- * callback would free up) has completed before the corresponding
- * RCU callback is invoked.
- *
- * RCU read-side critical sections may be nested. Any deferred actions
- * will be deferred until the outermost RCU read-side critical section
- * completes.
- *
- * It is illegal to block while in an RCU read-side critical section.
- */
-#define rcu_read_lock() __rcu_read_lock()
-
-/**
- * rcu_read_unlock - marks the end of an RCU read-side critical section.
- *
- * See rcu_read_lock() for more information.
- */
-
-/*
- * So where is rcu_write_lock()? It does not exist, as there is no
- * way for writers to lock out RCU readers. This is a feature, not
- * a bug -- this property is what provides RCU's performance benefits.
- * Of course, writers must coordinate with each other. The normal
- * spinlock primitives work well for this, but any other technique may be
- * used as well. RCU does not care how the writers keep out of each
- * others' way, as long as they do so.
- */
-#define rcu_read_unlock() __rcu_read_unlock()
-
-/**
- * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
- *
- * This is equivalent of rcu_read_lock(), but to be used when updates
- * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
- * consider completion of a softirq handler to be a quiescent state,
- * a process in RCU read-side critical section must be protected by
- * disabling softirqs. Read-side critical sections in interrupt context
- * can use just rcu_read_lock().
- *
- */
-#define rcu_read_lock_bh() __rcu_read_lock_bh()
-
-/*
- * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
- *
- * See rcu_read_lock_bh() for more information.
- */
-#define rcu_read_unlock_bh() __rcu_read_unlock_bh()
-
-/**
- * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
- *
- * Should be used with either
- * - synchronize_sched()
- * or
- * - call_rcu_sched() and rcu_barrier_sched()
- * on the write-side to insure proper synchronization.
- */
-#define rcu_read_lock_sched() preempt_disable()
-
-/*
- * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
- *
- * See rcu_read_lock_sched for more information.
- */
-#define rcu_read_unlock_sched() preempt_enable()
-
-
-
-/**
- * rcu_dereference - fetch an RCU-protected pointer in an
- * RCU read-side critical section. This pointer may later
- * be safely dereferenced.
- *
- * Inserts memory barriers on architectures that require them
- * (currently only the Alpha), and, more importantly, documents
- * exactly which pointers are protected by RCU.
- */
-
-#define rcu_dereference(p) ({ \
- typeof(p) _________p1 = ACCESS_ONCE(p); \
- smp_read_barrier_depends(); \
- (_________p1); \
- })
-
-/**
- * rcu_assign_pointer - assign (publicize) a pointer to a newly
- * initialized structure that will be dereferenced by RCU read-side
- * critical sections. Returns the value assigned.
- *
- * Inserts memory barriers on architectures that require them
- * (pretty much all of them other than x86), and also prevents
- * the compiler from reordering the code that initializes the
- * structure after the pointer assignment. More importantly, this
- * call documents which pointers will be dereferenced by RCU read-side
- * code.
- */
-
-#define rcu_assign_pointer(p, v) \
- ({ \
- if (!__builtin_constant_p(v) || \
- ((v) != NULL)) \
- smp_wmb(); \
- (p) = (v); \
- })
-
/* Infrastructure to implement the synchronize_() primitives. */

struct rcu_synchronize {
@@ -211,24 +77,6 @@ void name(void) \
}

/**
- * synchronize_sched - block until all CPUs have exited any non-preemptive
- * kernel code sequences.
- *
- * This means that all preempt_disable code sequences, including NMI and
- * hardware-interrupt handlers, in progress on entry will have completed
- * before this primitive returns. However, this does not guarantee that
- * softirq handlers will have completed, since in some kernels, these
- * handlers can run in process context, and can block.
- *
- * This primitive provides the guarantees made by the (now removed)
- * synchronize_kernel() API. In contrast, synchronize_rcu() only
- * guarantees that rcu_read_lock() sections will have completed.
- * In "classic RCU", these two guarantees happen to be one and
- * the same, but can differ in realtime RCU implementations.
- */
-#define synchronize_sched() __synchronize_sched()
-
-/**
* call_rcu - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
* @func: actual update function to be invoked after the grace period
@@ -263,12 +111,6 @@ extern void call_rcu(struct rcu_head *head,
extern void call_rcu_bh(struct rcu_head *head,
void (*func)(struct rcu_head *head));

-/* Exported common interfaces */
-extern void synchronize_rcu(void);
-extern void rcu_barrier(void);
-extern void rcu_barrier_bh(void);
-extern void rcu_barrier_sched(void);
-
/* Internal to kernel */
extern void rcu_init(void);
extern int rcu_needs_cpu(int cpu);
diff --git a/include/linux/rcupdate_min.h b/include/linux/rcupdate_min.h
new file mode 100644
index 0000000..35ca45d
--- /dev/null
+++ b/include/linux/rcupdate_min.h
@@ -0,0 +1,205 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright IBM Corporation, 2001
+ *
+ * Author: Dipankar Sarma <[email protected]>
+ *
+ * Based on the original work by Paul McKenney <[email protected]>
+ * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
+ * Papers:
+ * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
+ * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
+ *
+ * This is a minimal header. Use it if you don't require all definitions,
+ * as it doesn't end up including slab stuff and lots of other headers.
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * http://lse.sourceforge.net/locking/rcupdate.html
+ *
+ */
+
+#ifndef __LINUX_RCUPDATE_MIN_H
+#define __LINUX_RCUPDATE_MIN_H
+
+/**
+ * struct rcu_head - callback structure for use with RCU
+ * @next: next update requests in a list
+ * @func: actual update function to call after the grace period.
+ */
+struct rcu_head {
+ struct rcu_head *next;
+ void (*func)(struct rcu_head *head);
+};
+
+#ifdef CONFIG_CLASSIC_RCU
+#include <linux/rcuclassic_min.h>
+#else /* #ifdef CONFIG_CLASSIC_RCU */
+#include <linux/rcupreempt_min.h>
+#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
+
+/**
+ * rcu_read_lock - mark the beginning of an RCU read-side critical section.
+ *
+ * When synchronize_rcu() is invoked on one CPU while other CPUs
+ * are within RCU read-side critical sections, then the
+ * synchronize_rcu() is guaranteed to block until after all the other
+ * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
+ * on one CPU while other CPUs are within RCU read-side critical
+ * sections, invocation of the corresponding RCU callback is deferred
+ * until after the all the other CPUs exit their critical sections.
+ *
+ * Note, however, that RCU callbacks are permitted to run concurrently
+ * with RCU read-side critical sections. One way that this can happen
+ * is via the following sequence of events: (1) CPU 0 enters an RCU
+ * read-side critical section, (2) CPU 1 invokes call_rcu() to register
+ * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
+ * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
+ * callback is invoked. This is legal, because the RCU read-side critical
+ * section that was running concurrently with the call_rcu() (and which
+ * therefore might be referencing something that the corresponding RCU
+ * callback would free up) has completed before the corresponding
+ * RCU callback is invoked.
+ *
+ * RCU read-side critical sections may be nested. Any deferred actions
+ * will be deferred until the outermost RCU read-side critical section
+ * completes.
+ *
+ * It is illegal to block while in an RCU read-side critical section.
+ */
+#define rcu_read_lock() __rcu_read_lock()
+
+/**
+ * rcu_read_unlock - marks the end of an RCU read-side critical section.
+ *
+ * See rcu_read_lock() for more information.
+ */
+
+/*
+ * So where is rcu_write_lock()? It does not exist, as there is no
+ * way for writers to lock out RCU readers. This is a feature, not
+ * a bug -- this property is what provides RCU's performance benefits.
+ * Of course, writers must coordinate with each other. The normal
+ * spinlock primitives work well for this, but any other technique may be
+ * used as well. RCU does not care how the writers keep out of each
+ * others' way, as long as they do so.
+ */
+#define rcu_read_unlock() __rcu_read_unlock()
+
+/**
+ * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
+ *
+ * This is equivalent of rcu_read_lock(), but to be used when updates
+ * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
+ * consider completion of a softirq handler to be a quiescent state,
+ * a process in RCU read-side critical section must be protected by
+ * disabling softirqs. Read-side critical sections in interrupt context
+ * can use just rcu_read_lock().
+ *
+ */
+#define rcu_read_lock_bh() __rcu_read_lock_bh()
+
+/*
+ * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
+ *
+ * See rcu_read_lock_bh() for more information.
+ */
+#define rcu_read_unlock_bh() __rcu_read_unlock_bh()
+
+/**
+ * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
+ *
+ * Should be used with either
+ * - synchronize_sched()
+ * or
+ * - call_rcu_sched() and rcu_barrier_sched()
+ * on the write-side to insure proper synchronization.
+ */
+#define rcu_read_lock_sched() preempt_disable()
+
+/*
+ * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
+ *
+ * See rcu_read_lock_sched for more information.
+ */
+#define rcu_read_unlock_sched() preempt_enable()
+
+
+
+/**
+ * rcu_dereference - fetch an RCU-protected pointer in an
+ * RCU read-side critical section. This pointer may later
+ * be safely dereferenced.
+ *
+ * Inserts memory barriers on architectures that require them
+ * (currently only the Alpha), and, more importantly, documents
+ * exactly which pointers are protected by RCU.
+ */
+
+#define rcu_dereference(p) ({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); \
+ (_________p1); \
+ })
+
+/**
+ * rcu_assign_pointer - assign (publicize) a pointer to a newly
+ * initialized structure that will be dereferenced by RCU read-side
+ * critical sections. Returns the value assigned.
+ *
+ * Inserts memory barriers on architectures that require them
+ * (pretty much all of them other than x86), and also prevents
+ * the compiler from reordering the code that initializes the
+ * structure after the pointer assignment. More importantly, this
+ * call documents which pointers will be dereferenced by RCU read-side
+ * code.
+ */
+
+#define rcu_assign_pointer(p, v) \
+ ({ \
+ if (!__builtin_constant_p(v) || \
+ ((v) != NULL)) \
+ smp_wmb(); \
+ (p) = (v); \
+ })
+
+/**
+ * synchronize_sched - block until all CPUs have exited any non-preemptive
+ * kernel code sequences.
+ *
+ * This means that all preempt_disable code sequences, including NMI and
+ * hardware-interrupt handlers, in progress on entry will have completed
+ * before this primitive returns. However, this does not guarantee that
+ * softirq handlers will have completed, since in some kernels, these
+ * handlers can run in process context, and can block.
+ *
+ * This primitive provides the guarantees made by the (now removed)
+ * synchronize_kernel() API. In contrast, synchronize_rcu() only
+ * guarantees that rcu_read_lock() sections will have completed.
+ * In "classic RCU", these two guarantees happen to be one and
+ * the same, but can differ in realtime RCU implementations.
+ */
+#define synchronize_sched() __synchronize_sched()
+
+/* Exported common interfaces */
+extern void synchronize_rcu(void);
+extern void rcu_barrier(void);
+extern void rcu_barrier_bh(void);
+extern void rcu_barrier_sched(void);
+
+#endif /* __LINUX_RCUPDATE_MIN_H */
+
diff --git a/include/linux/rcupreempt.h b/include/linux/rcupreempt.h
index 3e05c09..1f1dcf1 100644
--- a/include/linux/rcupreempt.h
+++ b/include/linux/rcupreempt.h
@@ -39,6 +39,7 @@
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
+#include <linux/rcupreempt_min.h>

struct rcu_dyntick_sched {
int dynticks;
@@ -58,37 +59,6 @@ static inline void rcu_qsctr_inc(int cpu)
}
#define rcu_bh_qsctr_inc(cpu)

-/*
- * Someone might want to pass call_rcu_bh as a function pointer.
- * So this needs to just be a rename and not a macro function.
- * (no parentheses)
- */
-#define call_rcu_bh call_rcu
-
-/**
- * call_rcu_sched - Queue RCU callback for invocation after sched grace period.
- * @head: structure to be used for queueing the RCU updates.
- * @func: actual update function to be invoked after the grace period
- *
- * The update function will be invoked some time after a full
- * synchronize_sched()-style grace period elapses, in other words after
- * all currently executing preempt-disabled sections of code (including
- * hardirq handlers, NMI handlers, and local_irq_save() blocks) have
- * completed.
- */
-extern void call_rcu_sched(struct rcu_head *head,
- void (*func)(struct rcu_head *head));
-
-extern void __rcu_read_lock(void) __acquires(RCU);
-extern void __rcu_read_unlock(void) __releases(RCU);
-extern int rcu_pending(int cpu);
-extern int rcu_needs_cpu(int cpu);
-
-#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); }
-#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); }
-
-extern void __synchronize_sched(void);
-
extern void __rcu_init(void);
extern void rcu_init_sched(void);
extern void rcu_check_callbacks(int cpu, int user);
diff --git a/include/linux/rcupreempt_min.h b/include/linux/rcupreempt_min.h
new file mode 100644
index 0000000..37d3693
--- /dev/null
+++ b/include/linux/rcupreempt_min.h
@@ -0,0 +1,68 @@
+/*
+ * Read-Copy Update mechanism for mutual exclusion (RT implementation)
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2006
+ *
+ * Author: Paul McKenney <[email protected]>
+ *
+ * Based on the original work by Paul McKenney <[email protected]>
+ * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
+ * Papers:
+ * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
+ * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
+ *
+ * For detailed explanation of Read-Copy Update mechanism see -
+ * Documentation/RCU
+ *
+ */
+
+#ifndef __LINUX_RCUPREEMPT_MIN_H
+#define __LINUX_RCUPREEMPT_MIN_H
+
+/*
+ * Someone might want to pass call_rcu_bh as a function pointer.
+ * So this needs to just be a rename and not a macro function.
+ * (no parentheses)
+ */
+#define call_rcu_bh call_rcu
+
+/**
+ * call_rcu_sched - Queue RCU callback for invocation after sched grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual update function to be invoked after the grace period
+ *
+ * The update function will be invoked some time after a full
+ * synchronize_sched()-style grace period elapses, in other words after
+ * all currently executing preempt-disabled sections of code (including
+ * hardirq handlers, NMI handlers, and local_irq_save() blocks) have
+ * completed.
+ */
+extern void call_rcu_sched(struct rcu_head *head,
+ void (*func)(struct rcu_head *head));
+
+extern void __rcu_read_lock(void) __acquires(RCU);
+extern void __rcu_read_unlock(void) __releases(RCU);
+extern int rcu_pending(int cpu);
+extern int rcu_needs_cpu(int cpu);
+
+#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); }
+#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); }
+
+extern void __synchronize_sched(void);
+
+#endif /* __LINUX_RCUPREEMPT_MIN_H */
+
--
1.6.0.6

Subject: [PATCH 2/3] tracepoints: Include only minimal RCU headers in linux/tracepoint.h.

This allows kmemtrace to work nicely with tracepoints, since the RCU
headers no longer end up including SLAB headers. It might also speed up
compilation, because less preprocessing is involved.

Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
---
include/linux/tracepoint.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index c5bb39c..86b0e8a 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -15,7 +15,7 @@
*/

#include <linux/types.h>
-#include <linux/rcupdate.h>
+#include <linux/rcupdate_min.h>

struct module;
struct tracepoint;
--
1.6.0.6

Subject: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

kmemtrace now uses tracepoints instead of markers, so we no longer need
format specifiers. This also drops kmemtrace_mark_alloc_node(), since it's
easy to pass -1 as the NUMA node instead of keeping a needless, long-named
variant of the same function.
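
For reference, a typical call-site conversion (this one taken from the
slub changes below) looks roughly like this:

        /* before: marker-based helper, node implied */
        kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
                             s->objsize, s->size, gfpflags);

        /* after: tracepoint, with -1 meaning "no particular NUMA node" */
        trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
                              s->objsize, s->size, gfpflags, -1);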

Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
---
include/linux/kmemtrace.h | 76 +++++++++-----------------------------------
include/linux/slab_def.h | 11 +++---
include/linux/slub_def.h | 16 +++++-----
lib/Kconfig.debug | 2 +-
mm/kmemtrace.c | 50 ++++++++++++++---------------
mm/slab.c | 27 ++++++++--------
mm/slob.c | 32 +++++++++---------
mm/slub.c | 36 ++++++++++----------
8 files changed, 103 insertions(+), 147 deletions(-)

diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h
index 5bea8ea..7b38245 100644
--- a/include/linux/kmemtrace.h
+++ b/include/linux/kmemtrace.h
@@ -9,8 +9,8 @@

#ifdef __KERNEL__

+#include <linux/tracepoint.h>
#include <linux/types.h>
-#include <linux/marker.h>

enum kmemtrace_type_id {
KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
@@ -18,67 +18,23 @@ enum kmemtrace_type_id {
KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
};

-#ifdef CONFIG_KMEMTRACE
-
extern void kmemtrace_init(void);

-static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags,
- int node)
-{
- trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu "
- "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d",
- type_id, call_site, (unsigned long) ptr,
- (unsigned long) bytes_req, (unsigned long) bytes_alloc,
- (unsigned long) gfp_flags, node);
-}
-
-static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr)
-{
- trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu",
- type_id, call_site, (unsigned long) ptr);
-}
-
-#else /* CONFIG_KMEMTRACE */
-
-static inline void kmemtrace_init(void)
-{
-}
-
-static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags,
- int node)
-{
-}
-
-static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr)
-{
-}
-
-#endif /* CONFIG_KMEMTRACE */
-
-static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
- unsigned long call_site,
- const void *ptr,
- size_t bytes_req,
- size_t bytes_alloc,
- gfp_t gfp_flags)
-{
- kmemtrace_mark_alloc_node(type_id, call_site, ptr,
- bytes_req, bytes_alloc, gfp_flags, -1);
-}
+DEFINE_TRACE(kmemtrace_alloc,
+ TPPROTO(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags,
+ int node),
+ TPARGS(type_id, call_site, ptr,
+ bytes_req, bytes_alloc, gfp_flags, node));
+DEFINE_TRACE(kmemtrace_free,
+ TPPROTO(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr),
+ TPARGS(type_id, call_site, ptr));

#endif /* __KERNEL__ */

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 7555ce9..e5a8b60 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -76,8 +76,9 @@ found:

ret = kmem_cache_alloc_notrace(cachep, flags);

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
- size, slab_buffer_size(cachep), flags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
+ size, slab_buffer_size(cachep),
+ flags, -1);

return ret;
}
@@ -134,9 +135,9 @@ found:

ret = kmem_cache_alloc_node_notrace(cachep, flags, node);

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
- ret, size, slab_buffer_size(cachep),
- flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
+ ret, size, slab_buffer_size(cachep),
+ flags, node);

return ret;
}
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index dc28432..ce69c68 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -220,8 +220,8 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
unsigned int order = get_order(size);
void *ret = (void *) __get_free_pages(flags | __GFP_COMP, order);

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
- size, PAGE_SIZE << order, flags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
+ size, PAGE_SIZE << order, flags, -1);

return ret;
}
@@ -242,9 +242,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)

ret = kmem_cache_alloc_notrace(s, flags);

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
- _THIS_IP_, ret,
- size, s->size, flags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ _THIS_IP_, ret,
+ size, s->size, flags, -1);

return ret;
}
@@ -283,9 +283,9 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)

ret = kmem_cache_alloc_node_notrace(s, flags, node);

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
- _THIS_IP_, ret,
- size, s->size, flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ _THIS_IP_, ret,
+ size, s->size, flags, node);

return ret;
}
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index b5417e2..9cf8eb8 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -805,7 +805,7 @@ config FIREWIRE_OHCI_REMOTE_DMA

config KMEMTRACE
bool "Kernel memory tracer (kmemtrace)"
- depends on RELAY && DEBUG_FS && MARKERS
+ depends on RELAY && DEBUG_FS && TRACEPOINTS
help
kmemtrace provides tracing for slab allocator functions, such as
kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected
diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c
index 2a70a80..7bc81b2 100644
--- a/mm/kmemtrace.c
+++ b/mm/kmemtrace.c
@@ -58,8 +58,13 @@ struct kmemtrace_stats_alloc {
s32 numa_node;
} __attribute__ ((__packed__));

-static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
- const char *format, va_list *args)
+static void kmemtrace_probe_alloc(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr,
+ size_t bytes_req,
+ size_t bytes_alloc,
+ gfp_t gfp_flags,
+ int node)
{
unsigned long flags;
struct kmemtrace_event *ev;
@@ -81,25 +86,26 @@ static void kmemtrace_probe_alloc(void *probe_data, void *call_data,

ev = buf;
ev->event_id = KMEMTRACE_EVENT_ALLOC;
- ev->type_id = va_arg(*args, int);
+ ev->type_id = type_id;
ev->event_size = sizeof(struct kmemtrace_event) +
sizeof(struct kmemtrace_stats_alloc);
ev->seq_num = atomic_add_return(1, &kmemtrace_seq_num);
- ev->call_site = va_arg(*args, unsigned long);
- ev->ptr = va_arg(*args, unsigned long);
+ ev->call_site = call_site;
+ ev->ptr = (unsigned long) ptr;

stats = buf + sizeof(struct kmemtrace_event);
- stats->bytes_req = va_arg(*args, unsigned long);
- stats->bytes_alloc = va_arg(*args, unsigned long);
- stats->gfp_flags = va_arg(*args, unsigned long);
- stats->numa_node = va_arg(*args, int);
+ stats->bytes_req = bytes_req;
+ stats->bytes_alloc = bytes_alloc;
+ stats->gfp_flags = gfp_flags;
+ stats->numa_node = node;

failed:
local_irq_restore(flags);
}

-static void kmemtrace_probe_free(void *probe_data, void *call_data,
- const char *format, va_list *args)
+static void kmemtrace_probe_free(enum kmemtrace_type_id type_id,
+ unsigned long call_site,
+ const void *ptr)
{
unsigned long flags;
struct kmemtrace_event *ev;
@@ -115,11 +121,11 @@ static void kmemtrace_probe_free(void *probe_data, void *call_data,
* C99 does not guarantee the rvalues evaluation order.
*/
ev->event_id = KMEMTRACE_EVENT_FREE;
- ev->type_id = va_arg(*args, int);
+ ev->type_id = type_id;
ev->event_size = sizeof(struct kmemtrace_event);
ev->seq_num = atomic_add_return(1, &kmemtrace_seq_num);
- ev->call_site = va_arg(*args, unsigned long);
- ev->ptr = va_arg(*args, unsigned long);
+ ev->call_site = call_site;
+ ev->ptr = (unsigned long) ptr;

failed:
local_irq_restore(flags);
@@ -173,26 +179,18 @@ static int kmemtrace_start_probes(void)
{
int err;

- err = marker_probe_register("kmemtrace_alloc", "type_id %d "
- "call_site %lu ptr %lu "
- "bytes_req %lu bytes_alloc %lu "
- "gfp_flags %lu node %d",
- kmemtrace_probe_alloc, NULL);
+ err = register_trace_kmemtrace_alloc(kmemtrace_probe_alloc);
if (err)
return err;
- err = marker_probe_register("kmemtrace_free", "type_id %d "
- "call_site %lu ptr %lu",
- kmemtrace_probe_free, NULL);
+ err = register_trace_kmemtrace_free(kmemtrace_probe_free);

return err;
}

static void kmemtrace_stop_probes(void)
{
- marker_probe_unregister("kmemtrace_alloc",
- kmemtrace_probe_alloc, NULL);
- marker_probe_unregister("kmemtrace_free",
- kmemtrace_probe_free, NULL);
+ unregister_trace_kmemtrace_alloc(kmemtrace_probe_alloc);
+ unregister_trace_kmemtrace_free(kmemtrace_probe_free);
}

static int kmemtrace_enabled_get(void *data, u64 *val)
diff --git a/mm/slab.c b/mm/slab.c
index 1fcb32b..74d405d 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3624,8 +3624,9 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
{
void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
- obj_size(cachep), cachep->buffer_size, flags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
+ obj_size(cachep), cachep->buffer_size,
+ flags, -1);

return ret;
}
@@ -3686,9 +3687,9 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
void *ret = __cache_alloc_node(cachep, flags, nodeid,
__builtin_return_address(0));

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
- obj_size(cachep), cachep->buffer_size,
- flags, nodeid);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
+ obj_size(cachep), cachep->buffer_size,
+ flags, nodeid);

return ret;
}
@@ -3716,9 +3717,9 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller)
return cachep;
ret = kmem_cache_alloc_node_notrace(cachep, flags, node);

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
- (unsigned long) caller, ret,
- size, cachep->buffer_size, flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ (unsigned long) caller, ret,
+ size, cachep->buffer_size, flags, node);

return ret;
}
@@ -3768,9 +3769,9 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
return cachep;
ret = __cache_alloc(cachep, flags, caller);

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
- (unsigned long) caller, ret,
- size, cachep->buffer_size, flags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ (unsigned long) caller, ret,
+ size, cachep->buffer_size, flags, -1);

return ret;
}
@@ -3816,7 +3817,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
__cache_free(cachep, objp);
local_irq_restore(flags);

- kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
+ trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
}
EXPORT_SYMBOL(kmem_cache_free);

@@ -3844,7 +3845,7 @@ void kfree(const void *objp)
__cache_free(c, (void *)objp);
local_irq_restore(flags);

- kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, objp);
+ trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, objp);
}
EXPORT_SYMBOL(kfree);

diff --git a/mm/slob.c b/mm/slob.c
index 0f1a49f..15f812c 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -477,9 +477,9 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
*m = size;
ret = (void *)m + align;

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
- _RET_IP_, ret,
- size, size + align, gfp, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ _RET_IP_, ret,
+ size, size + align, gfp, node);
} else {
unsigned int order = get_order(size);

@@ -490,9 +490,9 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
page->private = size;
}

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
- _RET_IP_, ret,
- size, PAGE_SIZE << order, gfp, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ _RET_IP_, ret,
+ size, PAGE_SIZE << order, gfp, node);
}

return ret;
@@ -514,7 +514,7 @@ void kfree(const void *block)
} else
put_page(&sp->page);

- kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, block);
+ trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, block);
}
EXPORT_SYMBOL(kfree);

@@ -585,16 +585,16 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)

if (c->size < PAGE_SIZE) {
b = slob_alloc(c->size, flags, c->align, node);
- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE,
- _RET_IP_, b, c->size,
- SLOB_UNITS(c->size) * SLOB_UNIT,
- flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE,
+ _RET_IP_, b, c->size,
+ SLOB_UNITS(c->size) * SLOB_UNIT,
+ flags, node);
} else {
b = slob_new_page(flags, get_order(c->size), node);
- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE,
- _RET_IP_, b, c->size,
- PAGE_SIZE << get_order(c->size),
- flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE,
+ _RET_IP_, b, c->size,
+ PAGE_SIZE << get_order(c->size),
+ flags, node);
}

if (c->ctor)
@@ -632,7 +632,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
__kmem_cache_free(b, c->size);
}

- kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b);
+ trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b);
}
EXPORT_SYMBOL(kmem_cache_free);

diff --git a/mm/slub.c b/mm/slub.c
index f756915..3a2ec3a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1620,8 +1620,8 @@ void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
void *ret = slab_alloc(s, gfpflags, -1, _RET_IP_);

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
- s->objsize, s->size, gfpflags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
+ s->objsize, s->size, gfpflags, -1);

return ret;
}
@@ -1640,8 +1640,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
{
void *ret = slab_alloc(s, gfpflags, node, _RET_IP_);

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
- s->objsize, s->size, gfpflags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
+ s->objsize, s->size, gfpflags, node);

return ret;
}
@@ -1766,7 +1766,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)

slab_free(s, page, x, _RET_IP_);

- kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x);
+ trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x);
}
EXPORT_SYMBOL(kmem_cache_free);

@@ -2694,8 +2694,8 @@ void *__kmalloc(size_t size, gfp_t flags)

ret = slab_alloc(s, flags, -1, _RET_IP_);

- kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
- size, s->size, flags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
+ size, s->size, flags, -1);

return ret;
}
@@ -2721,10 +2721,10 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
if (unlikely(size > PAGE_SIZE)) {
ret = kmalloc_large_node(size, flags, node);

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
- _RET_IP_, ret,
- size, PAGE_SIZE << get_order(size),
- flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
+ _RET_IP_, ret,
+ size, PAGE_SIZE << get_order(size),
+ flags, node);

return ret;
}
@@ -2736,8 +2736,8 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)

ret = slab_alloc(s, flags, node, _RET_IP_);

- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
- size, s->size, flags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
+ size, s->size, flags, node);

return ret;
}
@@ -2798,7 +2798,7 @@ void kfree(const void *x)
}
slab_free(page->slab, page, object, _RET_IP_);

- kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, x);
+ trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, x);
}
EXPORT_SYMBOL(kfree);

@@ -3272,8 +3272,8 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
ret = slab_alloc(s, gfpflags, -1, caller);

/* Honor the call site pointer we recieved. */
- kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret, size,
- s->size, gfpflags);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret, size,
+ s->size, gfpflags, -1);

return ret;
}
@@ -3295,8 +3295,8 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
ret = slab_alloc(s, gfpflags, node, caller);

/* Honor the call site pointer we recieved. */
- kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, caller, ret,
- size, s->size, gfpflags, node);
+ trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret,
+ size, s->size, gfpflags, node);

return ret;
}
--
1.6.0.6

From: Pekka Enberg
Date: 2008-12-29 09:41:50
Subject: Re: [PATCH 0/3] kmemtrace over tracepoints

Hi Eduard,

On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
<[email protected]> wrote:
> Pekka, please merge this if it looks okay. I'll try to finish the work
> described above in time; however, should I not be ready, please ask Linus
> to git-pull whatever we've got at that point. Hopefully, it'll get in this
> time.

The RCU and tracepoint changes need an ACK from the relevant
maintainers before I can merge this. The conversion to tracepoints in
slab allocators looks good to me though.

Pekka

From: Pekka Enberg
Date: 2008-12-29 09:42:56
Subject: Re: [PATCH 1/3] RCU: Move some definitions to minimal headers.

On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
<[email protected]> wrote:
> linux/rcupdate.h used to include SLAB headers, which got in the way of
> switching kmemtrace from markers to tracepoints. The resulting circular
> header inclusion made it impossible to use tracepoints in the SLAB
> allocator inlines.
>
> This moves some header code into separate files. As an added bonus,
> preprocessing is faster if care is taken to use these minimal headers,
> since no effort is spent on including SLAB headers.
>
> Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>

I'm okay with this, but let's cc Paul for an ack/nak.

> ---
> include/linux/rcuclassic.h | 40 +--------
> include/linux/rcuclassic_min.h | 77 +++++++++++++++
> include/linux/rcupdate.h | 160 +-------------------------------
> include/linux/rcupdate_min.h | 205 ++++++++++++++++++++++++++++++++++++++++
> include/linux/rcupreempt.h | 32 +------
> include/linux/rcupreempt_min.h | 68 +++++++++++++
> 6 files changed, 353 insertions(+), 229 deletions(-)
> create mode 100644 include/linux/rcuclassic_min.h
> create mode 100644 include/linux/rcupdate_min.h
> create mode 100644 include/linux/rcupreempt_min.h
>
> diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
> index 5f89b62..32f1512 100644
> --- a/include/linux/rcuclassic.h
> +++ b/include/linux/rcuclassic.h
> @@ -39,6 +39,7 @@
> #include <linux/percpu.h>
> #include <linux/cpumask.h>
> #include <linux/seqlock.h>
> +#include <linux/rcuclassic_min.h>
>
> #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
> #define RCU_SECONDS_TILL_STALL_CHECK ( 3 * HZ) /* for rcp->jiffies_stall */
> @@ -131,45 +132,6 @@ static inline void rcu_bh_qsctr_inc(int cpu)
> extern int rcu_pending(int cpu);
> extern int rcu_needs_cpu(int cpu);
>
> -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> -extern struct lockdep_map rcu_lock_map;
> -# define rcu_read_acquire() \
> - lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
> -# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
> -#else
> -# define rcu_read_acquire() do { } while (0)
> -# define rcu_read_release() do { } while (0)
> -#endif
> -
> -#define __rcu_read_lock() \
> - do { \
> - preempt_disable(); \
> - __acquire(RCU); \
> - rcu_read_acquire(); \
> - } while (0)
> -#define __rcu_read_unlock() \
> - do { \
> - rcu_read_release(); \
> - __release(RCU); \
> - preempt_enable(); \
> - } while (0)
> -#define __rcu_read_lock_bh() \
> - do { \
> - local_bh_disable(); \
> - __acquire(RCU_BH); \
> - rcu_read_acquire(); \
> - } while (0)
> -#define __rcu_read_unlock_bh() \
> - do { \
> - rcu_read_release(); \
> - __release(RCU_BH); \
> - local_bh_enable(); \
> - } while (0)
> -
> -#define __synchronize_sched() synchronize_rcu()
> -
> -#define call_rcu_sched(head, func) call_rcu(head, func)
> -
> extern void __rcu_init(void);
> #define rcu_init_sched() do { } while (0)
> extern void rcu_check_callbacks(int cpu, int user);
> diff --git a/include/linux/rcuclassic_min.h b/include/linux/rcuclassic_min.h
> new file mode 100644
> index 0000000..7ea051b
> --- /dev/null
> +++ b/include/linux/rcuclassic_min.h
> @@ -0,0 +1,77 @@
> +/*
> + * Read-Copy Update mechanism for mutual exclusion (classic version)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright IBM Corporation, 2001
> + *
> + * Author: Dipankar Sarma <[email protected]>
> + *
> + * Based on the original work by Paul McKenney <[email protected]>
> + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> + * Papers:
> + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
> + *
> + * For detailed explanation of Read-Copy Update mechanism see -
> + * Documentation/RCU
> + *
> + */
> +
> +#ifndef _LINUX_RCUCLASSIC_MIN_H
> +#define _LINUX_RCUCLASSIC_MIN_H
> +
> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> +#include <linux/lockdep.h>
> +extern struct lockdep_map rcu_lock_map;
> +# define rcu_read_acquire() \
> + lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
> +# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
> +#else
> +# define rcu_read_acquire() do { } while (0)
> +# define rcu_read_release() do { } while (0)
> +#endif
> +
> +#define __rcu_read_lock() \
> + do { \
> + preempt_disable(); \
> + __acquire(RCU); \
> + rcu_read_acquire(); \
> + } while (0)
> +#define __rcu_read_unlock() \
> + do { \
> + rcu_read_release(); \
> + __release(RCU); \
> + preempt_enable(); \
> + } while (0)
> +#define __rcu_read_lock_bh() \
> + do { \
> + local_bh_disable(); \
> + __acquire(RCU_BH); \
> + rcu_read_acquire(); \
> + } while (0)
> +#define __rcu_read_unlock_bh() \
> + do { \
> + rcu_read_release(); \
> + __release(RCU_BH); \
> + local_bh_enable(); \
> + } while (0)
> +
> +#define __synchronize_sched() synchronize_rcu()
> +
> +#define call_rcu_sched(head, func) call_rcu(head, func)
> +
> +#endif /* _LINUX_RCUCLASSIC_MIN_H */
> +
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 86f1f5e..d7659e8 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -41,16 +41,7 @@
> #include <linux/seqlock.h>
> #include <linux/lockdep.h>
> #include <linux/completion.h>
> -
> -/**
> - * struct rcu_head - callback structure for use with RCU
> - * @next: next update requests in a list
> - * @func: actual update function to call after the grace period.
> - */
> -struct rcu_head {
> - struct rcu_head *next;
> - void (*func)(struct rcu_head *head);
> -};
> +#include <linux/rcupdate_min.h>
>
> #ifdef CONFIG_CLASSIC_RCU
> #include <linux/rcuclassic.h>
> @@ -64,131 +55,6 @@ struct rcu_head {
> (ptr)->next = NULL; (ptr)->func = NULL; \
> } while (0)
>
> -/**
> - * rcu_read_lock - mark the beginning of an RCU read-side critical section.
> - *
> - * When synchronize_rcu() is invoked on one CPU while other CPUs
> - * are within RCU read-side critical sections, then the
> - * synchronize_rcu() is guaranteed to block until after all the other
> - * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
> - * on one CPU while other CPUs are within RCU read-side critical
> - * sections, invocation of the corresponding RCU callback is deferred
> - * until after the all the other CPUs exit their critical sections.
> - *
> - * Note, however, that RCU callbacks are permitted to run concurrently
> - * with RCU read-side critical sections. One way that this can happen
> - * is via the following sequence of events: (1) CPU 0 enters an RCU
> - * read-side critical section, (2) CPU 1 invokes call_rcu() to register
> - * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
> - * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
> - * callback is invoked. This is legal, because the RCU read-side critical
> - * section that was running concurrently with the call_rcu() (and which
> - * therefore might be referencing something that the corresponding RCU
> - * callback would free up) has completed before the corresponding
> - * RCU callback is invoked.
> - *
> - * RCU read-side critical sections may be nested. Any deferred actions
> - * will be deferred until the outermost RCU read-side critical section
> - * completes.
> - *
> - * It is illegal to block while in an RCU read-side critical section.
> - */
> -#define rcu_read_lock() __rcu_read_lock()
> -
> -/**
> - * rcu_read_unlock - marks the end of an RCU read-side critical section.
> - *
> - * See rcu_read_lock() for more information.
> - */
> -
> -/*
> - * So where is rcu_write_lock()? It does not exist, as there is no
> - * way for writers to lock out RCU readers. This is a feature, not
> - * a bug -- this property is what provides RCU's performance benefits.
> - * Of course, writers must coordinate with each other. The normal
> - * spinlock primitives work well for this, but any other technique may be
> - * used as well. RCU does not care how the writers keep out of each
> - * others' way, as long as they do so.
> - */
> -#define rcu_read_unlock() __rcu_read_unlock()
> -
> -/**
> - * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
> - *
> - * This is equivalent of rcu_read_lock(), but to be used when updates
> - * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
> - * consider completion of a softirq handler to be a quiescent state,
> - * a process in RCU read-side critical section must be protected by
> - * disabling softirqs. Read-side critical sections in interrupt context
> - * can use just rcu_read_lock().
> - *
> - */
> -#define rcu_read_lock_bh() __rcu_read_lock_bh()
> -
> -/*
> - * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
> - *
> - * See rcu_read_lock_bh() for more information.
> - */
> -#define rcu_read_unlock_bh() __rcu_read_unlock_bh()
> -
> -/**
> - * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
> - *
> - * Should be used with either
> - * - synchronize_sched()
> - * or
> - * - call_rcu_sched() and rcu_barrier_sched()
> - * on the write-side to insure proper synchronization.
> - */
> -#define rcu_read_lock_sched() preempt_disable()
> -
> -/*
> - * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
> - *
> - * See rcu_read_lock_sched for more information.
> - */
> -#define rcu_read_unlock_sched() preempt_enable()
> -
> -
> -
> -/**
> - * rcu_dereference - fetch an RCU-protected pointer in an
> - * RCU read-side critical section. This pointer may later
> - * be safely dereferenced.
> - *
> - * Inserts memory barriers on architectures that require them
> - * (currently only the Alpha), and, more importantly, documents
> - * exactly which pointers are protected by RCU.
> - */
> -
> -#define rcu_dereference(p) ({ \
> - typeof(p) _________p1 = ACCESS_ONCE(p); \
> - smp_read_barrier_depends(); \
> - (_________p1); \
> - })
> -
> -/**
> - * rcu_assign_pointer - assign (publicize) a pointer to a newly
> - * initialized structure that will be dereferenced by RCU read-side
> - * critical sections. Returns the value assigned.
> - *
> - * Inserts memory barriers on architectures that require them
> - * (pretty much all of them other than x86), and also prevents
> - * the compiler from reordering the code that initializes the
> - * structure after the pointer assignment. More importantly, this
> - * call documents which pointers will be dereferenced by RCU read-side
> - * code.
> - */
> -
> -#define rcu_assign_pointer(p, v) \
> - ({ \
> - if (!__builtin_constant_p(v) || \
> - ((v) != NULL)) \
> - smp_wmb(); \
> - (p) = (v); \
> - })
> -
> /* Infrastructure to implement the synchronize_() primitives. */
>
> struct rcu_synchronize {
> @@ -211,24 +77,6 @@ void name(void) \
> }
>
> /**
> - * synchronize_sched - block until all CPUs have exited any non-preemptive
> - * kernel code sequences.
> - *
> - * This means that all preempt_disable code sequences, including NMI and
> - * hardware-interrupt handlers, in progress on entry will have completed
> - * before this primitive returns. However, this does not guarantee that
> - * softirq handlers will have completed, since in some kernels, these
> - * handlers can run in process context, and can block.
> - *
> - * This primitive provides the guarantees made by the (now removed)
> - * synchronize_kernel() API. In contrast, synchronize_rcu() only
> - * guarantees that rcu_read_lock() sections will have completed.
> - * In "classic RCU", these two guarantees happen to be one and
> - * the same, but can differ in realtime RCU implementations.
> - */
> -#define synchronize_sched() __synchronize_sched()
> -
> -/**
> * call_rcu - Queue an RCU callback for invocation after a grace period.
> * @head: structure to be used for queueing the RCU updates.
> * @func: actual update function to be invoked after the grace period
> @@ -263,12 +111,6 @@ extern void call_rcu(struct rcu_head *head,
> extern void call_rcu_bh(struct rcu_head *head,
> void (*func)(struct rcu_head *head));
>
> -/* Exported common interfaces */
> -extern void synchronize_rcu(void);
> -extern void rcu_barrier(void);
> -extern void rcu_barrier_bh(void);
> -extern void rcu_barrier_sched(void);
> -
> /* Internal to kernel */
> extern void rcu_init(void);
> extern int rcu_needs_cpu(int cpu);
> diff --git a/include/linux/rcupdate_min.h b/include/linux/rcupdate_min.h
> new file mode 100644
> index 0000000..35ca45d
> --- /dev/null
> +++ b/include/linux/rcupdate_min.h
> @@ -0,0 +1,205 @@
> +/*
> + * Read-Copy Update mechanism for mutual exclusion
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright IBM Corporation, 2001
> + *
> + * Author: Dipankar Sarma <[email protected]>
> + *
> + * Based on the original work by Paul McKenney <[email protected]>
> + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> + * Papers:
> + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
> + *
> + * This is a minimal header. Use it if you don't require all definitions,
> + * as it doesn't end up including slab stuff and lots of other headers.
> + *
> + * For detailed explanation of Read-Copy Update mechanism see -
> + * http://lse.sourceforge.net/locking/rcupdate.html
> + *
> + */
> +
> +#ifndef __LINUX_RCUPDATE_MIN_H
> +#define __LINUX_RCUPDATE_MIN_H
> +
> +/**
> + * struct rcu_head - callback structure for use with RCU
> + * @next: next update requests in a list
> + * @func: actual update function to call after the grace period.
> + */
> +struct rcu_head {
> + struct rcu_head *next;
> + void (*func)(struct rcu_head *head);
> +};
> +
> +#ifdef CONFIG_CLASSIC_RCU
> +#include <linux/rcuclassic_min.h>
> +#else /* #ifdef CONFIG_CLASSIC_RCU */
> +#include <linux/rcupreempt_min.h>
> +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
> +
> +/**
> + * rcu_read_lock - mark the beginning of an RCU read-side critical section.
> + *
> + * When synchronize_rcu() is invoked on one CPU while other CPUs
> + * are within RCU read-side critical sections, then the
> + * synchronize_rcu() is guaranteed to block until after all the other
> + * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
> + * on one CPU while other CPUs are within RCU read-side critical
> + * sections, invocation of the corresponding RCU callback is deferred
> + * until after the all the other CPUs exit their critical sections.
> + *
> + * Note, however, that RCU callbacks are permitted to run concurrently
> + * with RCU read-side critical sections. One way that this can happen
> + * is via the following sequence of events: (1) CPU 0 enters an RCU
> + * read-side critical section, (2) CPU 1 invokes call_rcu() to register
> + * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
> + * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
> + * callback is invoked. This is legal, because the RCU read-side critical
> + * section that was running concurrently with the call_rcu() (and which
> + * therefore might be referencing something that the corresponding RCU
> + * callback would free up) has completed before the corresponding
> + * RCU callback is invoked.
> + *
> + * RCU read-side critical sections may be nested. Any deferred actions
> + * will be deferred until the outermost RCU read-side critical section
> + * completes.
> + *
> + * It is illegal to block while in an RCU read-side critical section.
> + */
> +#define rcu_read_lock() __rcu_read_lock()
> +
> +/**
> + * rcu_read_unlock - marks the end of an RCU read-side critical section.
> + *
> + * See rcu_read_lock() for more information.
> + */
> +
> +/*
> + * So where is rcu_write_lock()? It does not exist, as there is no
> + * way for writers to lock out RCU readers. This is a feature, not
> + * a bug -- this property is what provides RCU's performance benefits.
> + * Of course, writers must coordinate with each other. The normal
> + * spinlock primitives work well for this, but any other technique may be
> + * used as well. RCU does not care how the writers keep out of each
> + * others' way, as long as they do so.
> + */
> +#define rcu_read_unlock() __rcu_read_unlock()
> +
> +/**
> + * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
> + *
> + * This is equivalent of rcu_read_lock(), but to be used when updates
> + * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
> + * consider completion of a softirq handler to be a quiescent state,
> + * a process in RCU read-side critical section must be protected by
> + * disabling softirqs. Read-side critical sections in interrupt context
> + * can use just rcu_read_lock().
> + *
> + */
> +#define rcu_read_lock_bh() __rcu_read_lock_bh()
> +
> +/*
> + * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
> + *
> + * See rcu_read_lock_bh() for more information.
> + */
> +#define rcu_read_unlock_bh() __rcu_read_unlock_bh()
> +
> +/**
> + * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
> + *
> + * Should be used with either
> + * - synchronize_sched()
> + * or
> + * - call_rcu_sched() and rcu_barrier_sched()
> + * on the write-side to insure proper synchronization.
> + */
> +#define rcu_read_lock_sched() preempt_disable()
> +
> +/*
> + * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
> + *
> + * See rcu_read_lock_sched for more information.
> + */
> +#define rcu_read_unlock_sched() preempt_enable()
> +
> +
> +
> +/**
> + * rcu_dereference - fetch an RCU-protected pointer in an
> + * RCU read-side critical section. This pointer may later
> + * be safely dereferenced.
> + *
> + * Inserts memory barriers on architectures that require them
> + * (currently only the Alpha), and, more importantly, documents
> + * exactly which pointers are protected by RCU.
> + */
> +
> +#define rcu_dereference(p) ({ \
> + typeof(p) _________p1 = ACCESS_ONCE(p); \
> + smp_read_barrier_depends(); \
> + (_________p1); \
> + })
> +
> +/**
> + * rcu_assign_pointer - assign (publicize) a pointer to a newly
> + * initialized structure that will be dereferenced by RCU read-side
> + * critical sections. Returns the value assigned.
> + *
> + * Inserts memory barriers on architectures that require them
> + * (pretty much all of them other than x86), and also prevents
> + * the compiler from reordering the code that initializes the
> + * structure after the pointer assignment. More importantly, this
> + * call documents which pointers will be dereferenced by RCU read-side
> + * code.
> + */
> +
> +#define rcu_assign_pointer(p, v) \
> + ({ \
> + if (!__builtin_constant_p(v) || \
> + ((v) != NULL)) \
> + smp_wmb(); \
> + (p) = (v); \
> + })
> +
> +/**
> + * synchronize_sched - block until all CPUs have exited any non-preemptive
> + * kernel code sequences.
> + *
> + * This means that all preempt_disable code sequences, including NMI and
> + * hardware-interrupt handlers, in progress on entry will have completed
> + * before this primitive returns. However, this does not guarantee that
> + * softirq handlers will have completed, since in some kernels, these
> + * handlers can run in process context, and can block.
> + *
> + * This primitive provides the guarantees made by the (now removed)
> + * synchronize_kernel() API. In contrast, synchronize_rcu() only
> + * guarantees that rcu_read_lock() sections will have completed.
> + * In "classic RCU", these two guarantees happen to be one and
> + * the same, but can differ in realtime RCU implementations.
> + */
> +#define synchronize_sched() __synchronize_sched()
> +
> +/* Exported common interfaces */
> +extern void synchronize_rcu(void);
> +extern void rcu_barrier(void);
> +extern void rcu_barrier_bh(void);
> +extern void rcu_barrier_sched(void);
> +
> +#endif /* __LINUX_RCUPDATE_MIN_H */
> +
> diff --git a/include/linux/rcupreempt.h b/include/linux/rcupreempt.h
> index 3e05c09..1f1dcf1 100644
> --- a/include/linux/rcupreempt.h
> +++ b/include/linux/rcupreempt.h
> @@ -39,6 +39,7 @@
> #include <linux/percpu.h>
> #include <linux/cpumask.h>
> #include <linux/seqlock.h>
> +#include <linux/rcupreempt_min.h>
>
> struct rcu_dyntick_sched {
> int dynticks;
> @@ -58,37 +59,6 @@ static inline void rcu_qsctr_inc(int cpu)
> }
> #define rcu_bh_qsctr_inc(cpu)
>
> -/*
> - * Someone might want to pass call_rcu_bh as a function pointer.
> - * So this needs to just be a rename and not a macro function.
> - * (no parentheses)
> - */
> -#define call_rcu_bh call_rcu
> -
> -/**
> - * call_rcu_sched - Queue RCU callback for invocation after sched grace period.
> - * @head: structure to be used for queueing the RCU updates.
> - * @func: actual update function to be invoked after the grace period
> - *
> - * The update function will be invoked some time after a full
> - * synchronize_sched()-style grace period elapses, in other words after
> - * all currently executing preempt-disabled sections of code (including
> - * hardirq handlers, NMI handlers, and local_irq_save() blocks) have
> - * completed.
> - */
> -extern void call_rcu_sched(struct rcu_head *head,
> - void (*func)(struct rcu_head *head));
> -
> -extern void __rcu_read_lock(void) __acquires(RCU);
> -extern void __rcu_read_unlock(void) __releases(RCU);
> -extern int rcu_pending(int cpu);
> -extern int rcu_needs_cpu(int cpu);
> -
> -#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); }
> -#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); }
> -
> -extern void __synchronize_sched(void);
> -
> extern void __rcu_init(void);
> extern void rcu_init_sched(void);
> extern void rcu_check_callbacks(int cpu, int user);
> diff --git a/include/linux/rcupreempt_min.h b/include/linux/rcupreempt_min.h
> new file mode 100644
> index 0000000..37d3693
> --- /dev/null
> +++ b/include/linux/rcupreempt_min.h
> @@ -0,0 +1,68 @@
> +/*
> + * Read-Copy Update mechanism for mutual exclusion (RT implementation)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2006
> + *
> + * Author: Paul McKenney <[email protected]>
> + *
> + * Based on the original work by Paul McKenney <[email protected]>
> + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> + * Papers:
> + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
> + *
> + * For detailed explanation of Read-Copy Update mechanism see -
> + * Documentation/RCU
> + *
> + */
> +
> +#ifndef __LINUX_RCUPREEMPT_MIN_H
> +#define __LINUX_RCUPREEMPT_MIN_H
> +
> +/*
> + * Someone might want to pass call_rcu_bh as a function pointer.
> + * So this needs to just be a rename and not a macro function.
> + * (no parentheses)
> + */
> +#define call_rcu_bh call_rcu
> +
> +/**
> + * call_rcu_sched - Queue RCU callback for invocation after sched grace period.
> + * @head: structure to be used for queueing the RCU updates.
> + * @func: actual update function to be invoked after the grace period
> + *
> + * The update function will be invoked some time after a full
> + * synchronize_sched()-style grace period elapses, in other words after
> + * all currently executing preempt-disabled sections of code (including
> + * hardirq handlers, NMI handlers, and local_irq_save() blocks) have
> + * completed.
> + */
> +extern void call_rcu_sched(struct rcu_head *head,
> + void (*func)(struct rcu_head *head));
> +
> +extern void __rcu_read_lock(void) __acquires(RCU);
> +extern void __rcu_read_unlock(void) __releases(RCU);
> +extern int rcu_pending(int cpu);
> +extern int rcu_needs_cpu(int cpu);
> +
> +#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); }
> +#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); }
> +
> +extern void __synchronize_sched(void);
> +
> +#endif /* __LINUX_RCUPREEMPT_MIN_H */
> +
> --
> 1.6.0.6
>

2008-12-29 09:44:45

by Pekka Enberg

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
<[email protected]> wrote:
> kmemtrace now uses tracepoints instead of markers. We no longer need to
> use format specifiers. Also dropped kmemtrace_mark_alloc_node(), since
> it's easy to pass -1 as the NUMA node without providing a needless and
> long-named variant of the same function.
>
> Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>

Looks good to me. Mathieu, does this look sane to you as well?

> ---
> include/linux/kmemtrace.h | 76 +++++++++-----------------------------------
> include/linux/slab_def.h | 11 +++---
> include/linux/slub_def.h | 16 +++++-----
> lib/Kconfig.debug | 2 +-
> mm/kmemtrace.c | 50 ++++++++++++++---------------
> mm/slab.c | 27 ++++++++--------
> mm/slob.c | 32 +++++++++---------
> mm/slub.c | 36 ++++++++++----------
> 8 files changed, 103 insertions(+), 147 deletions(-)
>
> diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h
> index 5bea8ea..7b38245 100644
> --- a/include/linux/kmemtrace.h
> +++ b/include/linux/kmemtrace.h
> @@ -9,8 +9,8 @@
>
> #ifdef __KERNEL__
>
> +#include <linux/tracepoint.h>
> #include <linux/types.h>
> -#include <linux/marker.h>
>
> enum kmemtrace_type_id {
> KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
> @@ -18,67 +18,23 @@ enum kmemtrace_type_id {
> KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
> };
>
> -#ifdef CONFIG_KMEMTRACE
> -
> extern void kmemtrace_init(void);
>
> -static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr,
> - size_t bytes_req,
> - size_t bytes_alloc,
> - gfp_t gfp_flags,
> - int node)
> -{
> - trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu "
> - "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d",
> - type_id, call_site, (unsigned long) ptr,
> - (unsigned long) bytes_req, (unsigned long) bytes_alloc,
> - (unsigned long) gfp_flags, node);
> -}
> -
> -static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr)
> -{
> - trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu",
> - type_id, call_site, (unsigned long) ptr);
> -}
> -
> -#else /* CONFIG_KMEMTRACE */
> -
> -static inline void kmemtrace_init(void)
> -{
> -}
> -
> -static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr,
> - size_t bytes_req,
> - size_t bytes_alloc,
> - gfp_t gfp_flags,
> - int node)
> -{
> -}
> -
> -static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr)
> -{
> -}
> -
> -#endif /* CONFIG_KMEMTRACE */
> -
> -static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr,
> - size_t bytes_req,
> - size_t bytes_alloc,
> - gfp_t gfp_flags)
> -{
> - kmemtrace_mark_alloc_node(type_id, call_site, ptr,
> - bytes_req, bytes_alloc, gfp_flags, -1);
> -}
> +DEFINE_TRACE(kmemtrace_alloc,
> + TPPROTO(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr,
> + size_t bytes_req,
> + size_t bytes_alloc,
> + gfp_t gfp_flags,
> + int node),
> + TPARGS(type_id, call_site, ptr,
> + bytes_req, bytes_alloc, gfp_flags, node));
> +DEFINE_TRACE(kmemtrace_free,
> + TPPROTO(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr),
> + TPARGS(type_id, call_site, ptr));
>
> #endif /* __KERNEL__ */
>
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 7555ce9..e5a8b60 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -76,8 +76,9 @@ found:
>
> ret = kmem_cache_alloc_notrace(cachep, flags);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> - size, slab_buffer_size(cachep), flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> + size, slab_buffer_size(cachep),
> + flags, -1);
>
> return ret;
> }
> @@ -134,9 +135,9 @@ found:
>
> ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> - ret, size, slab_buffer_size(cachep),
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> + ret, size, slab_buffer_size(cachep),
> + flags, node);
>
> return ret;
> }
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index dc28432..ce69c68 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -220,8 +220,8 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
> unsigned int order = get_order(size);
> void *ret = (void *) __get_free_pages(flags | __GFP_COMP, order);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> - size, PAGE_SIZE << order, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> + size, PAGE_SIZE << order, flags, -1);
>
> return ret;
> }
> @@ -242,9 +242,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
>
> ret = kmem_cache_alloc_notrace(s, flags);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
> - _THIS_IP_, ret,
> - size, s->size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _THIS_IP_, ret,
> + size, s->size, flags, -1);
>
> return ret;
> }
> @@ -283,9 +283,9 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
>
> ret = kmem_cache_alloc_node_notrace(s, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _THIS_IP_, ret,
> - size, s->size, flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _THIS_IP_, ret,
> + size, s->size, flags, node);
>
> return ret;
> }
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index b5417e2..9cf8eb8 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -805,7 +805,7 @@ config FIREWIRE_OHCI_REMOTE_DMA
>
> config KMEMTRACE
> bool "Kernel memory tracer (kmemtrace)"
> - depends on RELAY && DEBUG_FS && MARKERS
> + depends on RELAY && DEBUG_FS && TRACEPOINTS
> help
> kmemtrace provides tracing for slab allocator functions, such as
> kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected
> diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c
> index 2a70a80..7bc81b2 100644
> --- a/mm/kmemtrace.c
> +++ b/mm/kmemtrace.c
> @@ -58,8 +58,13 @@ struct kmemtrace_stats_alloc {
> s32 numa_node;
> } __attribute__ ((__packed__));
>
> -static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
> - const char *format, va_list *args)
> +static void kmemtrace_probe_alloc(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr,
> + size_t bytes_req,
> + size_t bytes_alloc,
> + gfp_t gfp_flags,
> + int node)
> {
> unsigned long flags;
> struct kmemtrace_event *ev;
> @@ -81,25 +86,26 @@ static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
>
> ev = buf;
> ev->event_id = KMEMTRACE_EVENT_ALLOC;
> - ev->type_id = va_arg(*args, int);
> + ev->type_id = type_id;
> ev->event_size = sizeof(struct kmemtrace_event) +
> sizeof(struct kmemtrace_stats_alloc);
> ev->seq_num = atomic_add_return(1, &kmemtrace_seq_num);
> - ev->call_site = va_arg(*args, unsigned long);
> - ev->ptr = va_arg(*args, unsigned long);
> + ev->call_site = call_site;
> + ev->ptr = (unsigned long) ptr;
>
> stats = buf + sizeof(struct kmemtrace_event);
> - stats->bytes_req = va_arg(*args, unsigned long);
> - stats->bytes_alloc = va_arg(*args, unsigned long);
> - stats->gfp_flags = va_arg(*args, unsigned long);
> - stats->numa_node = va_arg(*args, int);
> + stats->bytes_req = bytes_req;
> + stats->bytes_alloc = bytes_alloc;
> + stats->gfp_flags = gfp_flags;
> + stats->numa_node = node;
>
> failed:
> local_irq_restore(flags);
> }
>
> -static void kmemtrace_probe_free(void *probe_data, void *call_data,
> - const char *format, va_list *args)
> +static void kmemtrace_probe_free(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr)
> {
> unsigned long flags;
> struct kmemtrace_event *ev;
> @@ -115,11 +121,11 @@ static void kmemtrace_probe_free(void *probe_data, void *call_data,
> * C99 does not guarantee the rvalues evaluation order.
> */
> ev->event_id = KMEMTRACE_EVENT_FREE;
> - ev->type_id = va_arg(*args, int);
> + ev->type_id = type_id;
> ev->event_size = sizeof(struct kmemtrace_event);
> ev->seq_num = atomic_add_return(1, &kmemtrace_seq_num);
> - ev->call_site = va_arg(*args, unsigned long);
> - ev->ptr = va_arg(*args, unsigned long);
> + ev->call_site = call_site;
> + ev->ptr = (unsigned long) ptr;
>
> failed:
> local_irq_restore(flags);
> @@ -173,26 +179,18 @@ static int kmemtrace_start_probes(void)
> {
> int err;
>
> - err = marker_probe_register("kmemtrace_alloc", "type_id %d "
> - "call_site %lu ptr %lu "
> - "bytes_req %lu bytes_alloc %lu "
> - "gfp_flags %lu node %d",
> - kmemtrace_probe_alloc, NULL);
> + err = register_trace_kmemtrace_alloc(kmemtrace_probe_alloc);
> if (err)
> return err;
> - err = marker_probe_register("kmemtrace_free", "type_id %d "
> - "call_site %lu ptr %lu",
> - kmemtrace_probe_free, NULL);
> + err = register_trace_kmemtrace_free(kmemtrace_probe_free);
>
> return err;
> }
>
> static void kmemtrace_stop_probes(void)
> {
> - marker_probe_unregister("kmemtrace_alloc",
> - kmemtrace_probe_alloc, NULL);
> - marker_probe_unregister("kmemtrace_free",
> - kmemtrace_probe_free, NULL);
> + unregister_trace_kmemtrace_alloc(kmemtrace_probe_alloc);
> + unregister_trace_kmemtrace_free(kmemtrace_probe_free);
> }
>
> static int kmemtrace_enabled_get(void *data, u64 *val)
> diff --git a/mm/slab.c b/mm/slab.c
> index 1fcb32b..74d405d 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3624,8 +3624,9 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
> {
> void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - obj_size(cachep), cachep->buffer_size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + obj_size(cachep), cachep->buffer_size,
> + flags, -1);
>
> return ret;
> }
> @@ -3686,9 +3687,9 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> void *ret = __cache_alloc_node(cachep, flags, nodeid,
> __builtin_return_address(0));
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - obj_size(cachep), cachep->buffer_size,
> - flags, nodeid);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + obj_size(cachep), cachep->buffer_size,
> + flags, nodeid);
>
> return ret;
> }
> @@ -3716,9 +3717,9 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller)
> return cachep;
> ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - (unsigned long) caller, ret,
> - size, cachep->buffer_size, flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + (unsigned long) caller, ret,
> + size, cachep->buffer_size, flags, node);
>
> return ret;
> }
> @@ -3768,9 +3769,9 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
> return cachep;
> ret = __cache_alloc(cachep, flags, caller);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
> - (unsigned long) caller, ret,
> - size, cachep->buffer_size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + (unsigned long) caller, ret,
> + size, cachep->buffer_size, flags, -1);
>
> return ret;
> }
> @@ -3816,7 +3817,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
> __cache_free(cachep, objp);
> local_irq_restore(flags);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> @@ -3844,7 +3845,7 @@ void kfree(const void *objp)
> __cache_free(c, (void *)objp);
> local_irq_restore(flags);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, objp);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, objp);
> }
> EXPORT_SYMBOL(kfree);
>
> diff --git a/mm/slob.c b/mm/slob.c
> index 0f1a49f..15f812c 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -477,9 +477,9 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> *m = size;
> ret = (void *)m + align;
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _RET_IP_, ret,
> - size, size + align, gfp, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _RET_IP_, ret,
> + size, size + align, gfp, node);
> } else {
> unsigned int order = get_order(size);
>
> @@ -490,9 +490,9 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> page->private = size;
> }
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _RET_IP_, ret,
> - size, PAGE_SIZE << order, gfp, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _RET_IP_, ret,
> + size, PAGE_SIZE << order, gfp, node);
> }
>
> return ret;
> @@ -514,7 +514,7 @@ void kfree(const void *block)
> } else
> put_page(&sp->page);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, block);
> +	trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, block);
> }
> EXPORT_SYMBOL(kfree);
>
> @@ -585,16 +585,16 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
>
> if (c->size < PAGE_SIZE) {
> b = slob_alloc(c->size, flags, c->align, node);
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE,
> - _RET_IP_, b, c->size,
> - SLOB_UNITS(c->size) * SLOB_UNIT,
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE,
> + _RET_IP_, b, c->size,
> + SLOB_UNITS(c->size) * SLOB_UNIT,
> + flags, node);
> } else {
> b = slob_new_page(flags, get_order(c->size), node);
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE,
> - _RET_IP_, b, c->size,
> - PAGE_SIZE << get_order(c->size),
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE,
> + _RET_IP_, b, c->size,
> + PAGE_SIZE << get_order(c->size),
> + flags, node);
> }
>
> if (c->ctor)
> @@ -632,7 +632,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
> __kmem_cache_free(b, c->size);
> }
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f756915..3a2ec3a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1620,8 +1620,8 @@ void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> {
> void *ret = slab_alloc(s, gfpflags, -1, _RET_IP_);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - s->objsize, s->size, gfpflags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + s->objsize, s->size, gfpflags, -1);
>
> return ret;
> }
> @@ -1640,8 +1640,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> {
> void *ret = slab_alloc(s, gfpflags, node, _RET_IP_);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - s->objsize, s->size, gfpflags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + s->objsize, s->size, gfpflags, node);
>
> return ret;
> }
> @@ -1766,7 +1766,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
>
> slab_free(s, page, x, _RET_IP_);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> @@ -2694,8 +2694,8 @@ void *__kmalloc(size_t size, gfp_t flags)
>
> ret = slab_alloc(s, flags, -1, _RET_IP_);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> - size, s->size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> + size, s->size, flags, -1);
>
> return ret;
> }
> @@ -2721,10 +2721,10 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
> if (unlikely(size > PAGE_SIZE)) {
> ret = kmalloc_large_node(size, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _RET_IP_, ret,
> - size, PAGE_SIZE << get_order(size),
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _RET_IP_, ret,
> + size, PAGE_SIZE << get_order(size),
> + flags, node);
>
> return ret;
> }
> @@ -2736,8 +2736,8 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
>
> ret = slab_alloc(s, flags, node, _RET_IP_);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> - size, s->size, flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> + size, s->size, flags, node);
>
> return ret;
> }
> @@ -2798,7 +2798,7 @@ void kfree(const void *x)
> }
> slab_free(page->slab, page, object, _RET_IP_);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, x);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, x);
> }
> EXPORT_SYMBOL(kfree);
>
> @@ -3272,8 +3272,8 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
> ret = slab_alloc(s, gfpflags, -1, caller);
>
> /* Honor the call site pointer we recieved. */
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret, size,
> - s->size, gfpflags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret, size,
> + s->size, gfpflags, -1);
>
> return ret;
> }
> @@ -3295,8 +3295,8 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> ret = slab_alloc(s, gfpflags, node, caller);
>
> /* Honor the call site pointer we recieved. */
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, caller, ret,
> - size, s->size, gfpflags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret,
> + size, s->size, gfpflags, node);
>
> return ret;
> }
> --
> 1.6.0.6
>

2008-12-29 12:33:03

by Ingo Molnar

Subject: Re: [PATCH 1/3] RCU: Move some definitions to minimal headers.


* Pekka Enberg <[email protected]> wrote:

> On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
> <[email protected]> wrote:
> > linux/rcupdate.h used to include SLAB headers, which got in the way of
> > switching kmemtrace to use tracepoints instead of markers. The circular
> > header inclusion pattern that appeared was making it impossible to use
> > tracepoints in SLAB allocator inlines.
> >
> > This moves some header code into separate files. As an added bonus,
> > preprocessing is faster if care is taken to use these minimal headers,
> > since no effort is spent on including SLAB headers.
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
>
> I'm okay with this but lets cc Paul for an ack/nak.

This should go via tip/core/rcu, I suspect, which has pending RCU bits
(already sent to Linus, but not merged yet).

Ingo

2008-12-29 13:44:15

by Pekka Enberg

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

Hi Eduard,

On Mon, 2008-12-29 at 03:40 +0200, Eduard - Gabriel Munteanu wrote:
> diff --git a/mm/slab.c b/mm/slab.c
> index 1fcb32b..74d405d 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3624,8 +3624,9 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
> {
> void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - obj_size(cachep), cachep->buffer_size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + obj_size(cachep), cachep->buffer_size,
> + flags, -1);

One minor nit: I'd prefer we call these trace_kmem_cache_alloc(),
trace_kmalloc(), and so on. The trace_kmemtrace prefix doesn't really
make much sense to me.
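
For illustration, here is a rough sketch of how the declarations in
include/linux/kmemtrace.h could look under that naming. This is purely
hypothetical (not part of the posted series); it reuses the TPPROTO/TPARGS
form from the patch and drops the type_id argument, since the event name
would already identify the allocator family:

DEFINE_TRACE(kmalloc,
	TPPROTO(unsigned long call_site,
		const void *ptr,
		size_t bytes_req,
		size_t bytes_alloc,
		gfp_t gfp_flags,
		int node),
	TPARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node));
DEFINE_TRACE(kmem_cache_alloc,
	TPPROTO(unsigned long call_site,
		const void *ptr,
		size_t bytes_req,
		size_t bytes_alloc,
		gfp_t gfp_flags,
		int node),
	TPARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node));
DEFINE_TRACE(kfree,
	TPPROTO(unsigned long call_site, const void *ptr),
	TPARGS(call_site, ptr));

Call sites would then read, e.g., trace_kmalloc(_THIS_IP_, ret, size,
s->size, flags, -1) instead of
trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, ...).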

2008-12-29 14:56:59

by Ingo Molnar

Subject: Re: [PATCH 1/3] RCU: Move some definitions to minimal headers.


* Eduard - Gabriel Munteanu <[email protected]> wrote:

> linux/rcupdate.h used to include SLAB headers, which got in the way of
> switching kmemtrace to use tracepoints instead of markers. The circular
> header inclusion pattern that appeared was making it impossible to use
> tracepoints in SLAB allocator inlines.
>
> This moves some header code into separate files. As an added bonus,
> preprocessing is faster if care is taken to use these minimal headers,
> since no effort is spent on including SLAB headers.
>
> Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
> ---
> include/linux/rcuclassic.h | 40 +--------
> include/linux/rcuclassic_min.h | 77 +++++++++++++++
> include/linux/rcupdate.h | 160 +-------------------------------
> include/linux/rcupdate_min.h | 205 ++++++++++++++++++++++++++++++++++++++++
> include/linux/rcupreempt.h | 32 +------
> include/linux/rcupreempt_min.h | 68 +++++++++++++
> 6 files changed, 353 insertions(+), 229 deletions(-)
> create mode 100644 include/linux/rcuclassic_min.h
> create mode 100644 include/linux/rcupdate_min.h
> create mode 100644 include/linux/rcupreempt_min.h

OK, I like the idea - I'm just not sure this is the best approach. If we
want to do a split-up, it should be along the lines of spinlock.h and
spinlock_types.h.

The principle there is that the type declarations and the inline function
definitions are kept separate. The 'type' headers are lightweight with
minimal dependencies - the 'inline function' headers are more heavyweight.

For RCU ... can't we get away with something much simpler, by removing the
headers from rcupdate.h that cause the SLAB dependency hell? Does
rcupdate.h really need to include all of:

#include <linux/cache.h>
#include <linux/spinlock.h>
#include <linux/threads.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>
#include <linux/seqlock.h>
#include <linux/lockdep.h>
#include <linux/completion.h>

AFAICS it should be able to get away with spinlock_types.h, and I'm not
sure it needs cache.h. The other headers are mostly 'data type' headers.

Which one is causing the trouble for you - cache.h? If you send a patch
that removes that troublesome include I can run wide build-coverage tests
to make sure it's OK to remove it.

But an rcu_types.h would be nice to have too - I'm just not sure it's
really what you need. (You seem to need certain RCU functions as well,
right?)
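
For illustration only, a minimal rcu_types.h in the spirit of
spinlock_types.h might carry nothing but the data type (the file name and
contents below are a sketch, not something from the posted series):

#ifndef __LINUX_RCU_TYPES_H
#define __LINUX_RCU_TYPES_H

/*
 * Only the type lives here; the reader and updater primitives stay in
 * rcupdate.h, which can then include this without dragging in the
 * heavyweight headers.
 */
struct rcu_head {
	struct rcu_head *next;
	void (*func)(struct rcu_head *head);
};

#endif /* __LINUX_RCU_TYPES_H */

As noted, anything that also needs rcu_read_lock()/rcu_dereference() would
still have to get the function definitions from somewhere, so a types-only
header by itself may not be enough for the kmemtrace case.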

Ingo

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Mon, Dec 29, 2008 at 03:43:56PM +0200, Pekka Enberg wrote:
> Hi Eduard,
>
> On Mon, 2008-12-29 at 03:40 +0200, Eduard - Gabriel Munteanu wrote:
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 1fcb32b..74d405d 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3624,8 +3624,9 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
> > {
> > void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));
> >
> > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> > - obj_size(cachep), cachep->buffer_size, flags);
> > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> > + obj_size(cachep), cachep->buffer_size,
> > + flags, -1);
>
> One minor nit: I'd prefer we call these trace_kmem_cache_alloc(),
> trace_kmalloc(), and so on. The trace_kmemtrace prefix doesn't really
> make much sense to me.

Sure, I'll also move the KMEMTRACE_TYPE_* enum to mm/kmemtrace.c.
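
For illustration, a sketch of what that could look like once the
tracepoints are named per allocator family (hypothetical; the probe name
below is made up): the enum stops being part of the public header and each
probe supplies the type id itself.

/* mm/kmemtrace.c */
enum kmemtrace_type_id {
	KMEMTRACE_TYPE_KMALLOC = 0,	/* kmalloc() or kfree() */
	KMEMTRACE_TYPE_CACHE,		/* kmem_cache_*() */
	KMEMTRACE_TYPE_PAGES,		/* __get_free_pages() and friends */
};

static void kmemtrace_probe_kmalloc(unsigned long call_site, const void *ptr,
				    size_t bytes_req, size_t bytes_alloc,
				    gfp_t gfp_flags, int node)
{
	/* same body as kmemtrace_probe_alloc() above, with the type id
	   fixed to KMEMTRACE_TYPE_KMALLOC */
}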

2008-12-30 06:31:26

by Paul E. McKenney

Subject: Re: [PATCH 1/3] RCU: Move some definitions to minimal headers.

On Mon, Dec 29, 2008 at 11:42:43AM +0200, Pekka Enberg wrote:
> On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
> <[email protected]> wrote:
> > linux/rcupdate.h used to include SLAB headers, which got in the way of
> > switching kmemtrace to use tracepoints instead of markers. The circular
> > header inclusion pattern that appeared was making it impossible to use
> > tracepoints in SLAB allocator inlines.
> >
> > This moves some header code into separate files. As an added bonus,
> > preprocessing is faster if care is taken to use these minimal headers,
> > since no effort is spent on including SLAB headers.
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
>
> I'm okay with this but lets cc Paul for an ack/nak.

First let me make sure that I understand the intent. The idea, as I see
it, is to split rcuclassic.h into an rcuclassic_min.h that contains only
those definitions needed by a pure RCU reader, leaving the rest of the
internal-to-RCU stuff in rcuclassic.h. Correct?

If so, then rcupreempt.h and rcutree.h would need to be similarly split,
correct? (Not sure rcupreempt.h splits so nicely, will need to look.)

I have not yet thought through what would need to happen in rcupdate.h.
My guess is that very few files that include rcupdate.h need anything
more than what you put in rcuclassic_min.h along with rcu_assign_pointer()
and rcu_dereference() and various function declarations from rcupdate.h.

If this really is the case, then it seems like it would be best for
rcupdate.h to give the minimal RCU-reader definitions and for something
like rcupdate_internal.h to give the works. Not sure that this really
works nicely for all of the RCU implementations, though... Will dig
into it some more.
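
Whichever way the split is done, the target is that a plain RCU reader
compiles against the minimal definitions alone. A made-up example (struct
foo and gp are illustrative, not from the kernel):

struct foo {
	int val;
	struct rcu_head rcu;
};

static struct foo *gp;

static int read_foo_val(void)
{
	struct foo *p;
	int val = -1;

	rcu_read_lock();
	p = rcu_dereference(gp);
	if (p)
		val = p->val;
	rcu_read_unlock();

	return val;
}

Everything this reader touches - struct rcu_head, rcu_read_lock(),
rcu_read_unlock() and rcu_dereference() - is exactly what ends up in the
_min headers (or, in the alternative layout, in a slimmed-down rcupdate.h).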

Thanx, Paul

> > ---
> > include/linux/rcuclassic.h | 40 +--------
> > include/linux/rcuclassic_min.h | 77 +++++++++++++++
> > include/linux/rcupdate.h | 160 +-------------------------------
> > include/linux/rcupdate_min.h | 205 ++++++++++++++++++++++++++++++++++++++++
> > include/linux/rcupreempt.h | 32 +------
> > include/linux/rcupreempt_min.h | 68 +++++++++++++
> > 6 files changed, 353 insertions(+), 229 deletions(-)
> > create mode 100644 include/linux/rcuclassic_min.h
> > create mode 100644 include/linux/rcupdate_min.h
> > create mode 100644 include/linux/rcupreempt_min.h
> >
> > diff --git a/include/linux/rcuclassic.h b/include/linux/rcuclassic.h
> > index 5f89b62..32f1512 100644
> > --- a/include/linux/rcuclassic.h
> > +++ b/include/linux/rcuclassic.h
> > @@ -39,6 +39,7 @@
> > #include <linux/percpu.h>
> > #include <linux/cpumask.h>
> > #include <linux/seqlock.h>
> > +#include <linux/rcuclassic_min.h>
> >
> > #ifdef CONFIG_RCU_CPU_STALL_DETECTOR
> > #define RCU_SECONDS_TILL_STALL_CHECK ( 3 * HZ) /* for rcp->jiffies_stall */
> > @@ -131,45 +132,6 @@ static inline void rcu_bh_qsctr_inc(int cpu)
> > extern int rcu_pending(int cpu);
> > extern int rcu_needs_cpu(int cpu);
> >
> > -#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > -extern struct lockdep_map rcu_lock_map;
> > -# define rcu_read_acquire() \
> > - lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
> > -# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
> > -#else
> > -# define rcu_read_acquire() do { } while (0)
> > -# define rcu_read_release() do { } while (0)
> > -#endif
> > -
> > -#define __rcu_read_lock() \
> > - do { \
> > - preempt_disable(); \
> > - __acquire(RCU); \
> > - rcu_read_acquire(); \
> > - } while (0)
> > -#define __rcu_read_unlock() \
> > - do { \
> > - rcu_read_release(); \
> > - __release(RCU); \
> > - preempt_enable(); \
> > - } while (0)
> > -#define __rcu_read_lock_bh() \
> > - do { \
> > - local_bh_disable(); \
> > - __acquire(RCU_BH); \
> > - rcu_read_acquire(); \
> > - } while (0)
> > -#define __rcu_read_unlock_bh() \
> > - do { \
> > - rcu_read_release(); \
> > - __release(RCU_BH); \
> > - local_bh_enable(); \
> > - } while (0)
> > -
> > -#define __synchronize_sched() synchronize_rcu()
> > -
> > -#define call_rcu_sched(head, func) call_rcu(head, func)
> > -
> > extern void __rcu_init(void);
> > #define rcu_init_sched() do { } while (0)
> > extern void rcu_check_callbacks(int cpu, int user);
> > diff --git a/include/linux/rcuclassic_min.h b/include/linux/rcuclassic_min.h
> > new file mode 100644
> > index 0000000..7ea051b
> > --- /dev/null
> > +++ b/include/linux/rcuclassic_min.h
> > @@ -0,0 +1,77 @@
> > +/*
> > + * Read-Copy Update mechanism for mutual exclusion (classic version)
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright IBM Corporation, 2001
> > + *
> > + * Author: Dipankar Sarma <[email protected]>
> > + *
> > + * Based on the original work by Paul McKenney <[email protected]>
> > + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> > + * Papers:
> > + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> > + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
> > + *
> > + * For detailed explanation of Read-Copy Update mechanism see -
> > + * Documentation/RCU
> > + *
> > + */
> > +
> > +#ifndef _LINUX_RCUCLASSIC_MIN_H
> > +#define _LINUX_RCUCLASSIC_MIN_H
> > +
> > +#ifdef CONFIG_DEBUG_LOCK_ALLOC
> > +#include <linux/lockdep.h>
> > +extern struct lockdep_map rcu_lock_map;
> > +# define rcu_read_acquire() \
> > + lock_acquire(&rcu_lock_map, 0, 0, 2, 1, NULL, _THIS_IP_)
> > +# define rcu_read_release() lock_release(&rcu_lock_map, 1, _THIS_IP_)
> > +#else
> > +# define rcu_read_acquire() do { } while (0)
> > +# define rcu_read_release() do { } while (0)
> > +#endif
> > +
> > +#define __rcu_read_lock() \
> > + do { \
> > + preempt_disable(); \
> > + __acquire(RCU); \
> > + rcu_read_acquire(); \
> > + } while (0)
> > +#define __rcu_read_unlock() \
> > + do { \
> > + rcu_read_release(); \
> > + __release(RCU); \
> > + preempt_enable(); \
> > + } while (0)
> > +#define __rcu_read_lock_bh() \
> > + do { \
> > + local_bh_disable(); \
> > + __acquire(RCU_BH); \
> > + rcu_read_acquire(); \
> > + } while (0)
> > +#define __rcu_read_unlock_bh() \
> > + do { \
> > + rcu_read_release(); \
> > + __release(RCU_BH); \
> > + local_bh_enable(); \
> > + } while (0)
> > +
> > +#define __synchronize_sched() synchronize_rcu()
> > +
> > +#define call_rcu_sched(head, func) call_rcu(head, func)
> > +
> > +#endif /* _LINUX_RCUCLASSIC_MIN_H */
> > +
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 86f1f5e..d7659e8 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -41,16 +41,7 @@
> > #include <linux/seqlock.h>
> > #include <linux/lockdep.h>
> > #include <linux/completion.h>
> > -
> > -/**
> > - * struct rcu_head - callback structure for use with RCU
> > - * @next: next update requests in a list
> > - * @func: actual update function to call after the grace period.
> > - */
> > -struct rcu_head {
> > - struct rcu_head *next;
> > - void (*func)(struct rcu_head *head);
> > -};
> > +#include <linux/rcupdate_min.h>
> >
> > #ifdef CONFIG_CLASSIC_RCU
> > #include <linux/rcuclassic.h>
> > @@ -64,131 +55,6 @@ struct rcu_head {
> > (ptr)->next = NULL; (ptr)->func = NULL; \
> > } while (0)
> >
> > -/**
> > - * rcu_read_lock - mark the beginning of an RCU read-side critical section.
> > - *
> > - * When synchronize_rcu() is invoked on one CPU while other CPUs
> > - * are within RCU read-side critical sections, then the
> > - * synchronize_rcu() is guaranteed to block until after all the other
> > - * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
> > - * on one CPU while other CPUs are within RCU read-side critical
> > - * sections, invocation of the corresponding RCU callback is deferred
> > - * until after the all the other CPUs exit their critical sections.
> > - *
> > - * Note, however, that RCU callbacks are permitted to run concurrently
> > - * with RCU read-side critical sections. One way that this can happen
> > - * is via the following sequence of events: (1) CPU 0 enters an RCU
> > - * read-side critical section, (2) CPU 1 invokes call_rcu() to register
> > - * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
> > - * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
> > - * callback is invoked. This is legal, because the RCU read-side critical
> > - * section that was running concurrently with the call_rcu() (and which
> > - * therefore might be referencing something that the corresponding RCU
> > - * callback would free up) has completed before the corresponding
> > - * RCU callback is invoked.
> > - *
> > - * RCU read-side critical sections may be nested. Any deferred actions
> > - * will be deferred until the outermost RCU read-side critical section
> > - * completes.
> > - *
> > - * It is illegal to block while in an RCU read-side critical section.
> > - */
> > -#define rcu_read_lock() __rcu_read_lock()
> > -
> > -/**
> > - * rcu_read_unlock - marks the end of an RCU read-side critical section.
> > - *
> > - * See rcu_read_lock() for more information.
> > - */
> > -
> > -/*
> > - * So where is rcu_write_lock()? It does not exist, as there is no
> > - * way for writers to lock out RCU readers. This is a feature, not
> > - * a bug -- this property is what provides RCU's performance benefits.
> > - * Of course, writers must coordinate with each other. The normal
> > - * spinlock primitives work well for this, but any other technique may be
> > - * used as well. RCU does not care how the writers keep out of each
> > - * others' way, as long as they do so.
> > - */
> > -#define rcu_read_unlock() __rcu_read_unlock()
> > -
> > -/**
> > - * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
> > - *
> > - * This is equivalent of rcu_read_lock(), but to be used when updates
> > - * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
> > - * consider completion of a softirq handler to be a quiescent state,
> > - * a process in RCU read-side critical section must be protected by
> > - * disabling softirqs. Read-side critical sections in interrupt context
> > - * can use just rcu_read_lock().
> > - *
> > - */
> > -#define rcu_read_lock_bh() __rcu_read_lock_bh()
> > -
> > -/*
> > - * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
> > - *
> > - * See rcu_read_lock_bh() for more information.
> > - */
> > -#define rcu_read_unlock_bh() __rcu_read_unlock_bh()
> > -
> > -/**
> > - * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
> > - *
> > - * Should be used with either
> > - * - synchronize_sched()
> > - * or
> > - * - call_rcu_sched() and rcu_barrier_sched()
> > - * on the write-side to insure proper synchronization.
> > - */
> > -#define rcu_read_lock_sched() preempt_disable()
> > -
> > -/*
> > - * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
> > - *
> > - * See rcu_read_lock_sched for more information.
> > - */
> > -#define rcu_read_unlock_sched() preempt_enable()
> > -
> > -
> > -
> > -/**
> > - * rcu_dereference - fetch an RCU-protected pointer in an
> > - * RCU read-side critical section. This pointer may later
> > - * be safely dereferenced.
> > - *
> > - * Inserts memory barriers on architectures that require them
> > - * (currently only the Alpha), and, more importantly, documents
> > - * exactly which pointers are protected by RCU.
> > - */
> > -
> > -#define rcu_dereference(p) ({ \
> > - typeof(p) _________p1 = ACCESS_ONCE(p); \
> > - smp_read_barrier_depends(); \
> > - (_________p1); \
> > - })
> > -
> > -/**
> > - * rcu_assign_pointer - assign (publicize) a pointer to a newly
> > - * initialized structure that will be dereferenced by RCU read-side
> > - * critical sections. Returns the value assigned.
> > - *
> > - * Inserts memory barriers on architectures that require them
> > - * (pretty much all of them other than x86), and also prevents
> > - * the compiler from reordering the code that initializes the
> > - * structure after the pointer assignment. More importantly, this
> > - * call documents which pointers will be dereferenced by RCU read-side
> > - * code.
> > - */
> > -
> > -#define rcu_assign_pointer(p, v) \
> > - ({ \
> > - if (!__builtin_constant_p(v) || \
> > - ((v) != NULL)) \
> > - smp_wmb(); \
> > - (p) = (v); \
> > - })
> > -
> > /* Infrastructure to implement the synchronize_() primitives. */
> >
> > struct rcu_synchronize {
> > @@ -211,24 +77,6 @@ void name(void) \
> > }
> >
> > /**
> > - * synchronize_sched - block until all CPUs have exited any non-preemptive
> > - * kernel code sequences.
> > - *
> > - * This means that all preempt_disable code sequences, including NMI and
> > - * hardware-interrupt handlers, in progress on entry will have completed
> > - * before this primitive returns. However, this does not guarantee that
> > - * softirq handlers will have completed, since in some kernels, these
> > - * handlers can run in process context, and can block.
> > - *
> > - * This primitive provides the guarantees made by the (now removed)
> > - * synchronize_kernel() API. In contrast, synchronize_rcu() only
> > - * guarantees that rcu_read_lock() sections will have completed.
> > - * In "classic RCU", these two guarantees happen to be one and
> > - * the same, but can differ in realtime RCU implementations.
> > - */
> > -#define synchronize_sched() __synchronize_sched()
> > -
> > -/**
> > * call_rcu - Queue an RCU callback for invocation after a grace period.
> > * @head: structure to be used for queueing the RCU updates.
> > * @func: actual update function to be invoked after the grace period
> > @@ -263,12 +111,6 @@ extern void call_rcu(struct rcu_head *head,
> > extern void call_rcu_bh(struct rcu_head *head,
> > void (*func)(struct rcu_head *head));
> >
> > -/* Exported common interfaces */
> > -extern void synchronize_rcu(void);
> > -extern void rcu_barrier(void);
> > -extern void rcu_barrier_bh(void);
> > -extern void rcu_barrier_sched(void);
> > -
> > /* Internal to kernel */
> > extern void rcu_init(void);
> > extern int rcu_needs_cpu(int cpu);
> > diff --git a/include/linux/rcupdate_min.h b/include/linux/rcupdate_min.h
> > new file mode 100644
> > index 0000000..35ca45d
> > --- /dev/null
> > +++ b/include/linux/rcupdate_min.h
> > @@ -0,0 +1,205 @@
> > +/*
> > + * Read-Copy Update mechanism for mutual exclusion
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright IBM Corporation, 2001
> > + *
> > + * Author: Dipankar Sarma <[email protected]>
> > + *
> > + * Based on the original work by Paul McKenney <[email protected]>
> > + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> > + * Papers:
> > + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> > + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
> > + *
> > + * This is a minimal header. Use it if you don't require all definitions,
> > + * as it doesn't end up including slab stuff and lots of other headers.
> > + *
> > + * For detailed explanation of Read-Copy Update mechanism see -
> > + * http://lse.sourceforge.net/locking/rcupdate.html
> > + *
> > + */
> > +
> > +#ifndef __LINUX_RCUPDATE_MIN_H
> > +#define __LINUX_RCUPDATE_MIN_H
> > +
> > +/**
> > + * struct rcu_head - callback structure for use with RCU
> > + * @next: next update requests in a list
> > + * @func: actual update function to call after the grace period.
> > + */
> > +struct rcu_head {
> > + struct rcu_head *next;
> > + void (*func)(struct rcu_head *head);
> > +};
> > +
> > +#ifdef CONFIG_CLASSIC_RCU
> > +#include <linux/rcuclassic_min.h>
> > +#else /* #ifdef CONFIG_CLASSIC_RCU */
> > +#include <linux/rcupreempt_min.h>
> > +#endif /* #else #ifdef CONFIG_CLASSIC_RCU */
> > +
> > +/**
> > + * rcu_read_lock - mark the beginning of an RCU read-side critical section.
> > + *
> > + * When synchronize_rcu() is invoked on one CPU while other CPUs
> > + * are within RCU read-side critical sections, then the
> > + * synchronize_rcu() is guaranteed to block until after all the other
> > + * CPUs exit their critical sections. Similarly, if call_rcu() is invoked
> > + * on one CPU while other CPUs are within RCU read-side critical
> > + * sections, invocation of the corresponding RCU callback is deferred
> > + * until after the all the other CPUs exit their critical sections.
> > + *
> > + * Note, however, that RCU callbacks are permitted to run concurrently
> > + * with RCU read-side critical sections. One way that this can happen
> > + * is via the following sequence of events: (1) CPU 0 enters an RCU
> > + * read-side critical section, (2) CPU 1 invokes call_rcu() to register
> > + * an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
> > + * (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
> > + * callback is invoked. This is legal, because the RCU read-side critical
> > + * section that was running concurrently with the call_rcu() (and which
> > + * therefore might be referencing something that the corresponding RCU
> > + * callback would free up) has completed before the corresponding
> > + * RCU callback is invoked.
> > + *
> > + * RCU read-side critical sections may be nested. Any deferred actions
> > + * will be deferred until the outermost RCU read-side critical section
> > + * completes.
> > + *
> > + * It is illegal to block while in an RCU read-side critical section.
> > + */
> > +#define rcu_read_lock() __rcu_read_lock()
> > +
> > +/**
> > + * rcu_read_unlock - marks the end of an RCU read-side critical section.
> > + *
> > + * See rcu_read_lock() for more information.
> > + */
> > +
> > +/*
> > + * So where is rcu_write_lock()? It does not exist, as there is no
> > + * way for writers to lock out RCU readers. This is a feature, not
> > + * a bug -- this property is what provides RCU's performance benefits.
> > + * Of course, writers must coordinate with each other. The normal
> > + * spinlock primitives work well for this, but any other technique may be
> > + * used as well. RCU does not care how the writers keep out of each
> > + * others' way, as long as they do so.
> > + */
> > +#define rcu_read_unlock() __rcu_read_unlock()
> > +
> > +/**
> > + * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
> > + *
> > + * This is equivalent of rcu_read_lock(), but to be used when updates
> > + * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
> > + * consider completion of a softirq handler to be a quiescent state,
> > + * a process in RCU read-side critical section must be protected by
> > + * disabling softirqs. Read-side critical sections in interrupt context
> > + * can use just rcu_read_lock().
> > + *
> > + */
> > +#define rcu_read_lock_bh() __rcu_read_lock_bh()
> > +
> > +/*
> > + * rcu_read_unlock_bh - marks the end of a softirq-only RCU critical section
> > + *
> > + * See rcu_read_lock_bh() for more information.
> > + */
> > +#define rcu_read_unlock_bh() __rcu_read_unlock_bh()
> > +
> > +/**
> > + * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
> > + *
> > + * Should be used with either
> > + * - synchronize_sched()
> > + * or
> > + * - call_rcu_sched() and rcu_barrier_sched()
> > + * on the write-side to insure proper synchronization.
> > + */
> > +#define rcu_read_lock_sched() preempt_disable()
> > +
> > +/*
> > + * rcu_read_unlock_sched - marks the end of a RCU-classic critical section
> > + *
> > + * See rcu_read_lock_sched for more information.
> > + */
> > +#define rcu_read_unlock_sched() preempt_enable()
> > +
> > +
> > +
> > +/**
> > + * rcu_dereference - fetch an RCU-protected pointer in an
> > + * RCU read-side critical section. This pointer may later
> > + * be safely dereferenced.
> > + *
> > + * Inserts memory barriers on architectures that require them
> > + * (currently only the Alpha), and, more importantly, documents
> > + * exactly which pointers are protected by RCU.
> > + */
> > +
> > +#define rcu_dereference(p) ({ \
> > + typeof(p) _________p1 = ACCESS_ONCE(p); \
> > + smp_read_barrier_depends(); \
> > + (_________p1); \
> > + })
> > +
> > +/**
> > + * rcu_assign_pointer - assign (publicize) a pointer to a newly
> > + * initialized structure that will be dereferenced by RCU read-side
> > + * critical sections. Returns the value assigned.
> > + *
> > + * Inserts memory barriers on architectures that require them
> > + * (pretty much all of them other than x86), and also prevents
> > + * the compiler from reordering the code that initializes the
> > + * structure after the pointer assignment. More importantly, this
> > + * call documents which pointers will be dereferenced by RCU read-side
> > + * code.
> > + */
> > +
> > +#define rcu_assign_pointer(p, v) \
> > + ({ \
> > + if (!__builtin_constant_p(v) || \
> > + ((v) != NULL)) \
> > + smp_wmb(); \
> > + (p) = (v); \
> > + })
> > +
> > +/**
> > + * synchronize_sched - block until all CPUs have exited any non-preemptive
> > + * kernel code sequences.
> > + *
> > + * This means that all preempt_disable code sequences, including NMI and
> > + * hardware-interrupt handlers, in progress on entry will have completed
> > + * before this primitive returns. However, this does not guarantee that
> > + * softirq handlers will have completed, since in some kernels, these
> > + * handlers can run in process context, and can block.
> > + *
> > + * This primitive provides the guarantees made by the (now removed)
> > + * synchronize_kernel() API. In contrast, synchronize_rcu() only
> > + * guarantees that rcu_read_lock() sections will have completed.
> > + * In "classic RCU", these two guarantees happen to be one and
> > + * the same, but can differ in realtime RCU implementations.
> > + */
> > +#define synchronize_sched() __synchronize_sched()
> > +
> > +/* Exported common interfaces */
> > +extern void synchronize_rcu(void);
> > +extern void rcu_barrier(void);
> > +extern void rcu_barrier_bh(void);
> > +extern void rcu_barrier_sched(void);
> > +
> > +#endif /* __LINUX_RCUPDATE_MIN_H */
> > +
> > diff --git a/include/linux/rcupreempt.h b/include/linux/rcupreempt.h
> > index 3e05c09..1f1dcf1 100644
> > --- a/include/linux/rcupreempt.h
> > +++ b/include/linux/rcupreempt.h
> > @@ -39,6 +39,7 @@
> > #include <linux/percpu.h>
> > #include <linux/cpumask.h>
> > #include <linux/seqlock.h>
> > +#include <linux/rcupreempt_min.h>
> >
> > struct rcu_dyntick_sched {
> > int dynticks;
> > @@ -58,37 +59,6 @@ static inline void rcu_qsctr_inc(int cpu)
> > }
> > #define rcu_bh_qsctr_inc(cpu)
> >
> > -/*
> > - * Someone might want to pass call_rcu_bh as a function pointer.
> > - * So this needs to just be a rename and not a macro function.
> > - * (no parentheses)
> > - */
> > -#define call_rcu_bh call_rcu
> > -
> > -/**
> > - * call_rcu_sched - Queue RCU callback for invocation after sched grace period.
> > - * @head: structure to be used for queueing the RCU updates.
> > - * @func: actual update function to be invoked after the grace period
> > - *
> > - * The update function will be invoked some time after a full
> > - * synchronize_sched()-style grace period elapses, in other words after
> > - * all currently executing preempt-disabled sections of code (including
> > - * hardirq handlers, NMI handlers, and local_irq_save() blocks) have
> > - * completed.
> > - */
> > -extern void call_rcu_sched(struct rcu_head *head,
> > - void (*func)(struct rcu_head *head));
> > -
> > -extern void __rcu_read_lock(void) __acquires(RCU);
> > -extern void __rcu_read_unlock(void) __releases(RCU);
> > -extern int rcu_pending(int cpu);
> > -extern int rcu_needs_cpu(int cpu);
> > -
> > -#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); }
> > -#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); }
> > -
> > -extern void __synchronize_sched(void);
> > -
> > extern void __rcu_init(void);
> > extern void rcu_init_sched(void);
> > extern void rcu_check_callbacks(int cpu, int user);
> > diff --git a/include/linux/rcupreempt_min.h b/include/linux/rcupreempt_min.h
> > new file mode 100644
> > index 0000000..37d3693
> > --- /dev/null
> > +++ b/include/linux/rcupreempt_min.h
> > @@ -0,0 +1,68 @@
> > +/*
> > + * Read-Copy Update mechanism for mutual exclusion (RT implementation)
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your option) any later version.
> > + *
> > + * This program is distributed in the hope that it will be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write to the Free Software
> > + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
> > + *
> > + * Copyright (C) IBM Corporation, 2006
> > + *
> > + * Author: Paul McKenney <[email protected]>
> > + *
> > + * Based on the original work by Paul McKenney <[email protected]>
> > + * and inputs from Rusty Russell, Andrea Arcangeli and Andi Kleen.
> > + * Papers:
> > + * http://www.rdrop.com/users/paulmck/paper/rclockpdcsproof.pdf
> > + * http://lse.sourceforge.net/locking/rclock_OLS.2001.05.01c.sc.pdf (OLS2001)
> > + *
> > + * For detailed explanation of Read-Copy Update mechanism see -
> > + * Documentation/RCU
> > + *
> > + */
> > +
> > +#ifndef __LINUX_RCUPREEMPT_MIN_H
> > +#define __LINUX_RCUPREEMPT_MIN_H
> > +
> > +/*
> > + * Someone might want to pass call_rcu_bh as a function pointer.
> > + * So this needs to just be a rename and not a macro function.
> > + * (no parentheses)
> > + */
> > +#define call_rcu_bh call_rcu
> > +
> > +/**
> > + * call_rcu_sched - Queue RCU callback for invocation after sched grace period.
> > + * @head: structure to be used for queueing the RCU updates.
> > + * @func: actual update function to be invoked after the grace period
> > + *
> > + * The update function will be invoked some time after a full
> > + * synchronize_sched()-style grace period elapses, in other words after
> > + * all currently executing preempt-disabled sections of code (including
> > + * hardirq handlers, NMI handlers, and local_irq_save() blocks) have
> > + * completed.
> > + */
> > +extern void call_rcu_sched(struct rcu_head *head,
> > + void (*func)(struct rcu_head *head));
> > +
> > +extern void __rcu_read_lock(void) __acquires(RCU);
> > +extern void __rcu_read_unlock(void) __releases(RCU);
> > +extern int rcu_pending(int cpu);
> > +extern int rcu_needs_cpu(int cpu);
> > +
> > +#define __rcu_read_lock_bh() { rcu_read_lock(); local_bh_disable(); }
> > +#define __rcu_read_unlock_bh() { local_bh_enable(); rcu_read_unlock(); }
> > +
> > +extern void __synchronize_sched(void);
> > +
> > +#endif /* __LINUX_RCUPREEMPT_MIN_H */
> > +
> > --
> > 1.6.0.6
> >

2009-01-02 20:54:10, by Mathieu Desnoyers

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

* Eduard - Gabriel Munteanu ([email protected]) wrote:
> kmemtrace now uses tracepoints instead of markers. We no longer need to
> use format specifiers. Also dropped kmemtrace_mark_alloc_node(), since
> it's easy to pass -1 as the NUMA node without providing a needless and
> long-named variant of the same function.
>
> Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
> ---
> include/linux/kmemtrace.h | 76 +++++++++-----------------------------------
> include/linux/slab_def.h | 11 +++---
> include/linux/slub_def.h | 16 +++++-----
> lib/Kconfig.debug | 2 +-
> mm/kmemtrace.c | 50 ++++++++++++++---------------
> mm/slab.c | 27 ++++++++--------
> mm/slob.c | 32 +++++++++---------
> mm/slub.c | 36 ++++++++++----------
> 8 files changed, 103 insertions(+), 147 deletions(-)
>
> diff --git a/include/linux/kmemtrace.h b/include/linux/kmemtrace.h
> index 5bea8ea..7b38245 100644
> --- a/include/linux/kmemtrace.h
> +++ b/include/linux/kmemtrace.h
> @@ -9,8 +9,8 @@
>
> #ifdef __KERNEL__
>
> +#include <linux/tracepoint.h>
> #include <linux/types.h>
> -#include <linux/marker.h>
>
> enum kmemtrace_type_id {
> KMEMTRACE_TYPE_KMALLOC = 0, /* kmalloc() or kfree(). */
> @@ -18,67 +18,23 @@ enum kmemtrace_type_id {
> KMEMTRACE_TYPE_PAGES, /* __get_free_pages() and friends. */
> };
>
> -#ifdef CONFIG_KMEMTRACE
> -
> extern void kmemtrace_init(void);
>
> -static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,

Why do you need the enum kmemtrace_type_id type_id exactly ? Can you
remove this parameter by adding more specific tracepoints ?

> - unsigned long call_site,
> - const void *ptr,
> - size_t bytes_req,
> - size_t bytes_alloc,
> - gfp_t gfp_flags,
> - int node)
> -{
> - trace_mark(kmemtrace_alloc, "type_id %d call_site %lu ptr %lu "
> - "bytes_req %lu bytes_alloc %lu gfp_flags %lu node %d",
> - type_id, call_site, (unsigned long) ptr,
> - (unsigned long) bytes_req, (unsigned long) bytes_alloc,
> - (unsigned long) gfp_flags, node);
> -}
> -
> -static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr)
> -{
> - trace_mark(kmemtrace_free, "type_id %d call_site %lu ptr %lu",
> - type_id, call_site, (unsigned long) ptr);
> -}
> -
> -#else /* CONFIG_KMEMTRACE */
> -
> -static inline void kmemtrace_init(void)
> -{
> -}
> -
> -static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr,
> - size_t bytes_req,
> - size_t bytes_alloc,
> - gfp_t gfp_flags,
> - int node)
> -{
> -}
> -
> -static inline void kmemtrace_mark_free(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr)
> -{
> -}
> -
> -#endif /* CONFIG_KMEMTRACE */
> -
> -static inline void kmemtrace_mark_alloc(enum kmemtrace_type_id type_id,
> - unsigned long call_site,
> - const void *ptr,
> - size_t bytes_req,
> - size_t bytes_alloc,
> - gfp_t gfp_flags)
> -{
> - kmemtrace_mark_alloc_node(type_id, call_site, ptr,
> - bytes_req, bytes_alloc, gfp_flags, -1);
> -}
> +DEFINE_TRACE(kmemtrace_alloc,
> + TPPROTO(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr,
> + size_t bytes_req,
> + size_t bytes_alloc,
> + gfp_t gfp_flags,
> + int node),
> + TPARGS(type_id, call_site, ptr,
> + bytes_req, bytes_alloc, gfp_flags, node));
> +DEFINE_TRACE(kmemtrace_free,
> + TPPROTO(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr),
> + TPARGS(type_id, call_site, ptr));
>
> #endif /* __KERNEL__ */
>
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 7555ce9..e5a8b60 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -76,8 +76,9 @@ found:
>
> ret = kmem_cache_alloc_notrace(cachep, flags);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> - size, slab_buffer_size(cachep), flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> + size, slab_buffer_size(cachep),

You'll probably want to give "cachep" as parameter and do the
slab_buffer_size(cachep) within the probe to remove unneeded code from
the caller site.
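
Something along these lines, roughly (untested sketch, just to illustrate
the idea, not code from the patch):

	trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
			      size, cachep, flags, -1);

and on the probe side:

	stats->bytes_alloc = slab_buffer_size(cachep);

so the dereference is only done when a probe is actually connected.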

> + flags, -1);
>
> return ret;
> }
> @@ -134,9 +135,9 @@ found:
>
> ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> - ret, size, slab_buffer_size(cachep),
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> + ret, size, slab_buffer_size(cachep),
> + flags, node);
>
> return ret;
> }
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index dc28432..ce69c68 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -220,8 +220,8 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
> unsigned int order = get_order(size);
> void *ret = (void *) __get_free_pages(flags | __GFP_COMP, order);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> - size, PAGE_SIZE << order, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> + size, PAGE_SIZE << order, flags, -1);

Same here for order.

Why do you need to pass -1 ? Can you have a more specific tracepoint
instead ?
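
For instance (hypothetical names, not in the patch):

	trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
			      size, PAGE_SIZE << order, flags);
	trace_kmemtrace_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
				   size, PAGE_SIZE << order, flags, node);

i.e. keep a separate tracepoint for the NUMA-aware callers so the common
case never carries the extra argument.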

>
> return ret;
> }
> @@ -242,9 +242,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
>
> ret = kmem_cache_alloc_notrace(s, flags);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
> - _THIS_IP_, ret,
> - size, s->size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _THIS_IP_, ret,
> + size, s->size, flags, -1);
>

Pass s as parameter and do the s->size dereference within the probe.

The same applies to many other caller sites below.

Mathieu

> return ret;
> }
> @@ -283,9 +283,9 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
>
> ret = kmem_cache_alloc_node_notrace(s, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _THIS_IP_, ret,
> - size, s->size, flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _THIS_IP_, ret,
> + size, s->size, flags, node);
>
> return ret;
> }
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index b5417e2..9cf8eb8 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -805,7 +805,7 @@ config FIREWIRE_OHCI_REMOTE_DMA
>
> config KMEMTRACE
> bool "Kernel memory tracer (kmemtrace)"
> - depends on RELAY && DEBUG_FS && MARKERS
> + depends on RELAY && DEBUG_FS && TRACEPOINTS
> help
> kmemtrace provides tracing for slab allocator functions, such as
> kmalloc, kfree, kmem_cache_alloc, kmem_cache_free etc.. Collected
> diff --git a/mm/kmemtrace.c b/mm/kmemtrace.c
> index 2a70a80..7bc81b2 100644
> --- a/mm/kmemtrace.c
> +++ b/mm/kmemtrace.c
> @@ -58,8 +58,13 @@ struct kmemtrace_stats_alloc {
> s32 numa_node;
> } __attribute__ ((__packed__));
>
> -static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
> - const char *format, va_list *args)
> +static void kmemtrace_probe_alloc(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr,
> + size_t bytes_req,
> + size_t bytes_alloc,
> + gfp_t gfp_flags,
> + int node)
> {
> unsigned long flags;
> struct kmemtrace_event *ev;
> @@ -81,25 +86,26 @@ static void kmemtrace_probe_alloc(void *probe_data, void *call_data,
>
> ev = buf;
> ev->event_id = KMEMTRACE_EVENT_ALLOC;
> - ev->type_id = va_arg(*args, int);
> + ev->type_id = type_id;
> ev->event_size = sizeof(struct kmemtrace_event) +
> sizeof(struct kmemtrace_stats_alloc);
> ev->seq_num = atomic_add_return(1, &kmemtrace_seq_num);
> - ev->call_site = va_arg(*args, unsigned long);
> - ev->ptr = va_arg(*args, unsigned long);
> + ev->call_site = call_site;
> + ev->ptr = (unsigned long) ptr;
>
> stats = buf + sizeof(struct kmemtrace_event);
> - stats->bytes_req = va_arg(*args, unsigned long);
> - stats->bytes_alloc = va_arg(*args, unsigned long);
> - stats->gfp_flags = va_arg(*args, unsigned long);
> - stats->numa_node = va_arg(*args, int);
> + stats->bytes_req = bytes_req;
> + stats->bytes_alloc = bytes_alloc;
> + stats->gfp_flags = gfp_flags;
> + stats->numa_node = node;
>
> failed:
> local_irq_restore(flags);
> }
>
> -static void kmemtrace_probe_free(void *probe_data, void *call_data,
> - const char *format, va_list *args)
> +static void kmemtrace_probe_free(enum kmemtrace_type_id type_id,
> + unsigned long call_site,
> + const void *ptr)
> {
> unsigned long flags;
> struct kmemtrace_event *ev;
> @@ -115,11 +121,11 @@ static void kmemtrace_probe_free(void *probe_data, void *call_data,
> * C99 does not guarantee the rvalues evaluation order.
> */
> ev->event_id = KMEMTRACE_EVENT_FREE;
> - ev->type_id = va_arg(*args, int);
> + ev->type_id = type_id;
> ev->event_size = sizeof(struct kmemtrace_event);
> ev->seq_num = atomic_add_return(1, &kmemtrace_seq_num);
> - ev->call_site = va_arg(*args, unsigned long);
> - ev->ptr = va_arg(*args, unsigned long);
> + ev->call_site = call_site;
> + ev->ptr = (unsigned long) ptr;
>
> failed:
> local_irq_restore(flags);
> @@ -173,26 +179,18 @@ static int kmemtrace_start_probes(void)
> {
> int err;
>
> - err = marker_probe_register("kmemtrace_alloc", "type_id %d "
> - "call_site %lu ptr %lu "
> - "bytes_req %lu bytes_alloc %lu "
> - "gfp_flags %lu node %d",
> - kmemtrace_probe_alloc, NULL);
> + err = register_trace_kmemtrace_alloc(kmemtrace_probe_alloc);
> if (err)
> return err;
> - err = marker_probe_register("kmemtrace_free", "type_id %d "
> - "call_site %lu ptr %lu",
> - kmemtrace_probe_free, NULL);
> + err = register_trace_kmemtrace_free(kmemtrace_probe_free);
>
> return err;
> }
>
> static void kmemtrace_stop_probes(void)
> {
> - marker_probe_unregister("kmemtrace_alloc",
> - kmemtrace_probe_alloc, NULL);
> - marker_probe_unregister("kmemtrace_free",
> - kmemtrace_probe_free, NULL);
> + unregister_trace_kmemtrace_alloc(kmemtrace_probe_alloc);
> + unregister_trace_kmemtrace_free(kmemtrace_probe_free);
> }
>
> static int kmemtrace_enabled_get(void *data, u64 *val)
> diff --git a/mm/slab.c b/mm/slab.c
> index 1fcb32b..74d405d 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3624,8 +3624,9 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
> {
> void *ret = __cache_alloc(cachep, flags, __builtin_return_address(0));
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - obj_size(cachep), cachep->buffer_size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + obj_size(cachep), cachep->buffer_size,
> + flags, -1);
>
> return ret;
> }
> @@ -3686,9 +3687,9 @@ void *kmem_cache_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid)
> void *ret = __cache_alloc_node(cachep, flags, nodeid,
> __builtin_return_address(0));
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - obj_size(cachep), cachep->buffer_size,
> - flags, nodeid);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + obj_size(cachep), cachep->buffer_size,
> + flags, nodeid);
>
> return ret;
> }
> @@ -3716,9 +3717,9 @@ __do_kmalloc_node(size_t size, gfp_t flags, int node, void *caller)
> return cachep;
> ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - (unsigned long) caller, ret,
> - size, cachep->buffer_size, flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + (unsigned long) caller, ret,
> + size, cachep->buffer_size, flags, node);
>
> return ret;
> }
> @@ -3768,9 +3769,9 @@ static __always_inline void *__do_kmalloc(size_t size, gfp_t flags,
> return cachep;
> ret = __cache_alloc(cachep, flags, caller);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
> - (unsigned long) caller, ret,
> - size, cachep->buffer_size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + (unsigned long) caller, ret,
> + size, cachep->buffer_size, flags, -1);
>
> return ret;
> }
> @@ -3816,7 +3817,7 @@ void kmem_cache_free(struct kmem_cache *cachep, void *objp)
> __cache_free(cachep, objp);
> local_irq_restore(flags);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, objp);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> @@ -3844,7 +3845,7 @@ void kfree(const void *objp)
> __cache_free(c, (void *)objp);
> local_irq_restore(flags);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, objp);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, objp);
> }
> EXPORT_SYMBOL(kfree);
>
> diff --git a/mm/slob.c b/mm/slob.c
> index 0f1a49f..15f812c 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -477,9 +477,9 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> *m = size;
> ret = (void *)m + align;
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _RET_IP_, ret,
> - size, size + align, gfp, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _RET_IP_, ret,
> + size, size + align, gfp, node);
> } else {
> unsigned int order = get_order(size);
>
> @@ -490,9 +490,9 @@ void *__kmalloc_node(size_t size, gfp_t gfp, int node)
> page->private = size;
> }
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _RET_IP_, ret,
> - size, PAGE_SIZE << order, gfp, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _RET_IP_, ret,
> + size, PAGE_SIZE << order, gfp, node);
> }
>
> return ret;
> @@ -514,7 +514,7 @@ void kfree(const void *block)
> } else
> put_page(&sp->page);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, block);
> +	trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, block);
> }
> EXPORT_SYMBOL(kfree);
>
> @@ -585,16 +585,16 @@ void *kmem_cache_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
>
> if (c->size < PAGE_SIZE) {
> b = slob_alloc(c->size, flags, c->align, node);
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE,
> - _RET_IP_, b, c->size,
> - SLOB_UNITS(c->size) * SLOB_UNIT,
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE,
> + _RET_IP_, b, c->size,
> + SLOB_UNITS(c->size) * SLOB_UNIT,
> + flags, node);
> } else {
> b = slob_new_page(flags, get_order(c->size), node);
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE,
> - _RET_IP_, b, c->size,
> - PAGE_SIZE << get_order(c->size),
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE,
> + _RET_IP_, b, c->size,
> + PAGE_SIZE << get_order(c->size),
> + flags, node);
> }
>
> if (c->ctor)
> @@ -632,7 +632,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
> __kmem_cache_free(b, c->size);
> }
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, b);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> diff --git a/mm/slub.c b/mm/slub.c
> index f756915..3a2ec3a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1620,8 +1620,8 @@ void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
> {
> void *ret = slab_alloc(s, gfpflags, -1, _RET_IP_);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - s->objsize, s->size, gfpflags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + s->objsize, s->size, gfpflags, -1);
>
> return ret;
> }
> @@ -1640,8 +1640,8 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node)
> {
> void *ret = slab_alloc(s, gfpflags, node, _RET_IP_);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> - s->objsize, s->size, gfpflags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_CACHE, _RET_IP_, ret,
> + s->objsize, s->size, gfpflags, node);
>
> return ret;
> }
> @@ -1766,7 +1766,7 @@ void kmem_cache_free(struct kmem_cache *s, void *x)
>
> slab_free(s, page, x, _RET_IP_);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_CACHE, _RET_IP_, x);
> }
> EXPORT_SYMBOL(kmem_cache_free);
>
> @@ -2694,8 +2694,8 @@ void *__kmalloc(size_t size, gfp_t flags)
>
> ret = slab_alloc(s, flags, -1, _RET_IP_);
>
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> - size, s->size, flags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> + size, s->size, flags, -1);
>
> return ret;
> }
> @@ -2721,10 +2721,10 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
> if (unlikely(size > PAGE_SIZE)) {
> ret = kmalloc_large_node(size, flags, node);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC,
> - _RET_IP_, ret,
> - size, PAGE_SIZE << get_order(size),
> - flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> + _RET_IP_, ret,
> + size, PAGE_SIZE << get_order(size),
> + flags, node);
>
> return ret;
> }
> @@ -2736,8 +2736,8 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node)
>
> ret = slab_alloc(s, flags, node, _RET_IP_);
>
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> - size, s->size, flags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, ret,
> + size, s->size, flags, node);
>
> return ret;
> }
> @@ -2798,7 +2798,7 @@ void kfree(const void *x)
> }
> slab_free(page->slab, page, object, _RET_IP_);
>
> - kmemtrace_mark_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, x);
> + trace_kmemtrace_free(KMEMTRACE_TYPE_KMALLOC, _RET_IP_, x);
> }
> EXPORT_SYMBOL(kfree);
>
> @@ -3272,8 +3272,8 @@ void *__kmalloc_track_caller(size_t size, gfp_t gfpflags, unsigned long caller)
> ret = slab_alloc(s, gfpflags, -1, caller);
>
> /* Honor the call site pointer we recieved. */
> - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret, size,
> - s->size, gfpflags);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret, size,
> + s->size, gfpflags, -1);
>
> return ret;
> }
> @@ -3295,8 +3295,8 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t gfpflags,
> ret = slab_alloc(s, gfpflags, node, caller);
>
> /* Honor the call site pointer we recieved. */
> - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, caller, ret,
> - size, s->size, gfpflags, node);
> + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, caller, ret,
> + size, s->size, gfpflags, node);
>
> return ret;
> }
> --
> 1.6.0.6
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

2009-01-02 20:54:44, by Mathieu Desnoyers

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

* Pekka Enberg ([email protected]) wrote:
> On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
> <[email protected]> wrote:
> > kmemtrace now uses tracepoints instead of markers. We no longer need to
> > use format specifiers. Also dropped kmemtrace_mark_alloc_node(), since
> > it's easy to pass -1 as the NUMA node without providing a needless and
> > long-named variant of the same function.
> >
> > Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
>
> Looks good to me. Mathieu, does this look sane to you as well?
>

Just got back from vacation. Just replied to it.

Happy new year :)

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Fri, Jan 02, 2009 at 03:53:45PM -0500, Mathieu Desnoyers wrote:
> * Eduard - Gabriel Munteanu ([email protected]) wrote:
> >
> > -#ifdef CONFIG_KMEMTRACE
> > -
> > extern void kmemtrace_init(void);
> >
> > -static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
>
> Why do you need the enum kmemtrace_type_id type_id exactly ? Can you
> remove this parameter by adding more specific tracepoints ?
>

Already done and moved that enum into the .c file. I've resubmitted the
whole thing (2 patches instead of 3). I'll provide an URL if you didn't get
those e-mails.

> > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > index 7555ce9..e5a8b60 100644
> > --- a/include/linux/slab_def.h
> > +++ b/include/linux/slab_def.h
> > @@ -76,8 +76,9 @@ found:
> >
> > ret = kmem_cache_alloc_notrace(cachep, flags);
> >
> > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > - size, slab_buffer_size(cachep), flags);
> > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > + size, slab_buffer_size(cachep),
>
> You'll probably want to give "cachep" as parameter and do the
> slab_buffer_size(cachep) within the probe to remove unneeded code from
> the caller site.
>

Hmm, I don't see how this could be an improvement. Slab code hackers are
supposed to see this and keep kmemtrace in line with slab internals.
Moving this away only makes things more awkward.

Besides, having a probe for each allocator and each function hardly
makes any sense.

Anyway, one could argue for an inline within slab code to handle such
internals, which does make more sense.

> > + flags, -1);
> >
> > return ret;
> > }
> > @@ -134,9 +135,9 @@ found:
> >
> > ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
> >
> > - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> > - ret, size, slab_buffer_size(cachep),
> > - flags, node);
> > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> > + ret, size, slab_buffer_size(cachep),
> > + flags, node);
> >
> > return ret;
> > }
> > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > index dc28432..ce69c68 100644
> > --- a/include/linux/slub_def.h
> > +++ b/include/linux/slub_def.h
> > @@ -220,8 +220,8 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
> > unsigned int order = get_order(size);
> > void *ret = (void *) __get_free_pages(flags | __GFP_COMP, order);
> >
> > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > - size, PAGE_SIZE << order, flags);
> > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > + size, PAGE_SIZE << order, flags, -1);
>
> Same here for order.

Same goes here.

> Why do you need to pass -1 ? Can you have a more specific tracepoint
> instead ?
>

I've noticed probe functions tend to have really long names. -1 is just
very convenient, we could make it typo-proof by taking all negative values as
meaning "same node" in userspace, or clamp / warn about it in probes. Anyway,
I've seen SLOB does it in its inlines, e.g. kmem_cache_alloc() calls
kmem_cache_alloc_node(..., -1).

> >
> > return ret;
> > }
> > @@ -242,9 +242,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
> >
> > ret = kmem_cache_alloc_notrace(s, flags);
> >
> > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
> > - _THIS_IP_, ret,
> > - size, s->size, flags);
> > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> > + _THIS_IP_, ret,
> > + size, s->size, flags, -1);
> >
>
> Pass s as parameter and do the s->size dereference within the probe.
>
> The same applies to many other caller sites below.

Same as first comment, it would entail having specific kmemtrace
functions for specific allocator functions within specific allocators.


Eduard

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Fri, Jan 02, 2009 at 03:54:26PM -0500, Mathieu Desnoyers wrote:
> * Pekka Enberg ([email protected]) wrote:
> > On Mon, Dec 29, 2008 at 3:40 AM, Eduard - Gabriel Munteanu
> > <[email protected]> wrote:
> > > kmemtrace now uses tracepoints instead of markers. We no longer need to
> > > use format specifiers. Also dropped kmemtrace_mark_alloc_node(), since
> > > it's easy to pass -1 as the NUMA node without providing a needless and
> > > long-named variant of the same function.
> > >
> > > Signed-off-by: Eduard - Gabriel Munteanu <[email protected]>
> >
> > Looks good to me. Mathieu, does this look sane to you as well?
> >
>
> Just got back from vacation. Just replied to it.
>
> Happy new year :)
>
> Mathieu

Thanks for pitching in. Happy new year!


Cheers,
Eduard

2009-01-02 23:43:14, by Mathieu Desnoyers

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

* Eduard - Gabriel Munteanu ([email protected]) wrote:
> On Fri, Jan 02, 2009 at 03:53:45PM -0500, Mathieu Desnoyers wrote:
> > * Eduard - Gabriel Munteanu ([email protected]) wrote:
> > >
> > > -#ifdef CONFIG_KMEMTRACE
> > > -
> > > extern void kmemtrace_init(void);
> > >
> > > -static inline void kmemtrace_mark_alloc_node(enum kmemtrace_type_id type_id,
> >
> > Why do you need the enum kmemtrace_type_id type_id exactly ? Can you
> > remove this parameter by adding more specific tracepoints ?
> >
>
> Already done and moved that enum into the .c file. I've resubmitted the
> whole thing (2 patches instead of 3). I'll provide an URL if you didn't get
> those e-mails.
>

I got those, I just replied a little bit late. Thanks. I am not talking
about moving the enum to a .c file here, I am simply asking why we don't
use the already existing tracepoint naming rather than adding another
level of event description in an event field.

> > > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > > index 7555ce9..e5a8b60 100644
> > > --- a/include/linux/slab_def.h
> > > +++ b/include/linux/slab_def.h
> > > @@ -76,8 +76,9 @@ found:
> > >
> > > ret = kmem_cache_alloc_notrace(cachep, flags);
> > >
> > > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > > - size, slab_buffer_size(cachep), flags);
> > > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > > + size, slab_buffer_size(cachep),
> >
> > You'll probably want to give "cachep" as parameter and do the
> > slab_buffer_size(cachep) within the probe to remove unneeded code from
> > the caller site.
> >
>
> Hmm, I don't see how this could be an improvement.

Because whatever slab_buffer_size() does will be done on the fastpath of
the instrumented code *even when instrumentation is disabled*, and this
is something we need to avoid above all.

> Slab code hackers are
> supposed to see this and keep kmemtrace in line with slab internals.
> Moving this away only makes things more awkward.
>

I am just saying this should be moved _into_ the kmemtrace probe. This
code will have to be kept in sync with the kernel anyway.

> Besides, having a probe for each allocator and each function hardly
> makes any sense.

Why ?

>
> Anyway, one could argue for an inline within slab code to handle such
> internals, which does make more sense.
>

Please keep the logic _outside_ of the memory subsystem fastpath. This
can be done by putting it in the kmemtrace probe callbacks as I propose.
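
I.e. something like this, as a rough sketch (not meant to be the final
signature):

	static void kmemtrace_probe_alloc(unsigned long call_site,
					  const void *ptr, size_t bytes_req,
					  struct kmem_cache *cachep,
					  gfp_t gfp_flags, int node)
	{
		size_t bytes_alloc = slab_buffer_size(cachep);

		/* fill in the relay event as before, using bytes_alloc */
	}

so the extra computation is only paid when a probe is registered.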

> > > + flags, -1);
> > >
> > > return ret;
> > > }
> > > @@ -134,9 +135,9 @@ found:
> > >
> > > ret = kmem_cache_alloc_node_notrace(cachep, flags, node);
> > >
> > > - kmemtrace_mark_alloc_node(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> > > - ret, size, slab_buffer_size(cachep),
> > > - flags, node);
> > > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_,
> > > + ret, size, slab_buffer_size(cachep),
> > > + flags, node);
> > >
> > > return ret;
> > > }
> > > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > > index dc28432..ce69c68 100644
> > > --- a/include/linux/slub_def.h
> > > +++ b/include/linux/slub_def.h
> > > @@ -220,8 +220,8 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
> > > unsigned int order = get_order(size);
> > > void *ret = (void *) __get_free_pages(flags | __GFP_COMP, order);
> > >
> > > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > > - size, PAGE_SIZE << order, flags);
> > > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > > + size, PAGE_SIZE << order, flags, -1);
> >
> > Same here for order.
>
> Same goes here.
>
> > Why do you need to pass -1 ? Can you have a more specific tracepoint
> > instead ?
> >
>
> I've noticed probe functions tend to have really long names. -1 is just
> very convenient, we could make it typo-proof by taking all negative values as
> meaning "same node" in userspace, or clamp / warn about it in probes. Anyway,
> I've seen SLOB does it in its inlines, e.g. kmem_cache_alloc() calls
> kmem_cache_alloc_node(..., -1).
>

With all due respect, your argument is bogus because it does not take
into account something far more important than "long function names",
which is what the resulting assembly looks like, and how much overhead
you add. Here, you are adding the overhead of 1 more parameter *per
call* just to keep tracepoint names smaller. The same thing applies to
your added enumeration. Is it just me, or does that not sound like a good
tradeoff ? Remember that tracepoints in the memory allocator code are
meant to be called *very* often.

Mathieu

> > >
> > > return ret;
> > > }
> > > @@ -242,9 +242,9 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
> > >
> > > ret = kmem_cache_alloc_notrace(s, flags);
> > >
> > > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC,
> > > - _THIS_IP_, ret,
> > > - size, s->size, flags);
> > > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC,
> > > + _THIS_IP_, ret,
> > > + size, s->size, flags, -1);
> > >
> >
> > Pass s as parameter and do the s->size dereference within the probe.
> >
> > The same applies to many other caller sites below.
>
> Same as first comment, it would entail having specific kmemtrace
> functions for specific allocator functions within specific allocators.
>
>
> Eduard
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Fri, Jan 02, 2009 at 06:42:54PM -0500, Mathieu Desnoyers wrote:
> * Eduard - Gabriel Munteanu ([email protected]) wrote:
> > >
> > > Why do you need the enum kmemtrace_type_id type_id exactly ? Can you
> > > remove this parameter by adding more specific tracepoints ?
> > >
> >
> > Already done and moved that enum into the .c file. I've resubmitted the
> > whole thing (2 patches instead of 3). I'll provide a URL if you didn't get
> > those e-mails.
> >
>
> I got those, I just replied a little bit late. Thanks. I am not talking
> about moving the enum to a .c file here, I am simply asking why we don't
> use the already existing tracepoint naming rather than adding another
> level of event description in an event field.
>

Yes, already done (not just moved things around) in case I wasn't clear.

> > > > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > > > index 7555ce9..e5a8b60 100644
> > > > --- a/include/linux/slab_def.h
> > > > +++ b/include/linux/slab_def.h
> > > > @@ -76,8 +76,9 @@ found:
> > > >
> > > > ret = kmem_cache_alloc_notrace(cachep, flags);
> > > >
> > > > - kmemtrace_mark_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > > > - size, slab_buffer_size(cachep), flags);
> > > > + trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
> > > > + size, slab_buffer_size(cachep),
> > >
> > > You'll probably want to give "cachep" as parameter and do the
> > > slab_buffer_size(cachep) within the probe to remove unneeded code from
> > > the caller site.
> > >
> >
> > Hmm, I don't see how this could be an improvement.
>
> Because whatever slab_buffer_size() does will be done on the fastpath of
> the instrumented code *even when instrumentation is disabled*, and this
> is something we need to avoid above all.
>

Oh, I see. You mean the case when kmemtrace is compiled in but not
enabled. For slab_buffer_size() this is no problem, but things like
(1 << order) do result in more computation being performed. I'll fix it.
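
To be concrete, here's roughly the shape of the change I have in mind for
the kmalloc_large() case. The trace_kmalloc_large / probe_kmalloc_large
names below are only illustrative, not from the actual patch:

/* Call site today: the shift is an argument expression, so it may be
 * evaluated even when tracing is disabled. */
trace_kmemtrace_alloc(KMEMTRACE_TYPE_KMALLOC, _THIS_IP_, ret,
                      size, PAGE_SIZE << order, flags, -1);

/* Sketch: pass the raw order and do the shift inside the probe, which
 * only runs when the tracepoint is enabled. */
trace_kmalloc_large(_THIS_IP_, ret, size, order, flags);

/* ... with the probe living in mm/kmemtrace.c: */
static void probe_kmalloc_large(unsigned long call_site, const void *ptr,
                                size_t bytes_req, unsigned int order,
                                gfp_t gfp_flags)
{
        size_t bytes_alloc = PAGE_SIZE << order;

        /* record call_site, ptr, bytes_req, bytes_alloc, gfp_flags */
}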

Heh, I sometimes wish C were more like Haskell (lazy, outermost evaluation) :P.

> > Slab code hackers are
> > supposed to see this and keep kmemtrace in line with slab internals.
> > Moving this away only makes things more awkward.
> >
>
> I am just saying this should be moved _into_ the kmemtrace probe. This
> code will have to be kept in sync with the kernel anyway.
>

I got that, but I'll try to keep any logic within slab code files. The
kmemtrace probe should call such a function to perform computation, so
developers won't ever need to look at mm/kmemtrace.c.

> > Besides, having a probe for each allocator and each function hardly
> > makes any sense.
>
> Why ?
>

Hmm, having some get_req_size() & get_alloc_size() defined for all slab
allocators might make more sense, just like we do now to handle the
different kmalloc() variants.

I really feel like having to change both slab code and kmemtrace code to
support some new allocator isn't the way to go. Ideally, one should only
insert calls to probes and provide that get_*_size() interface.
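
To illustrate (names and signatures here are only a sketch), the call site
in slab_def.h would pass the cache itself,

trace_kmalloc(_THIS_IP_, ret, size, cachep, flags);

the allocator would define the helper,

/* include/linux/slab_def.h (SLAB); SLUB would return s->size instead */
static inline size_t get_alloc_size(struct kmem_cache *cachep)
{
        return slab_buffer_size(cachep);
}

and the probe in mm/kmemtrace.c would stay allocator-agnostic:

static void probe_kmalloc(unsigned long call_site, const void *ptr,
                          size_t bytes_req, struct kmem_cache *cachep,
                          gfp_t gfp_flags)
{
        size_t bytes_alloc = get_alloc_size(cachep);

        /* record call_site, ptr, bytes_req, bytes_alloc, gfp_flags */
}

Supporting a new allocator would then mean only inserting the trace calls
and providing these helpers.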

I'll see what can be done. But please comment if you think otherwise.

> > > Why do you need to pass -1 ? Can you have a more specific tracepoint
> > > instead ?
> > >
> >
> > I've noticed probe functions tend to have really long names. -1 is just
> > very convenient, we could make it typo-proof by taking all negative values as
> > meaning "same node" in userspace, or clamp / warn about it in probes. Anyway,
> > I've seen SLOB does it in its inlines, e.g. kmem_cache_alloc() calls
> > kmem_cache_alloc_node(..., -1).
> >
>
> With all due respect, your argument is bogus because it does not take
> into account something far more important than "long function names",
> which is what the resulting assembly looks like, and how much overhead
> you add. Here, you are adding the overhead of 1 more parameter *per
> call* just to keep tracepoint names smaller. The same thing applies to
> your added enumeration. Is it just me, or does that not sound like a good
> tradeoff? Remember that tracepoints in the memory allocator code are
> meant to be called *very* often.

Yes, it really is bogus, sorry. Will fix.
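
Concretely, I'm splitting this into per-event tracepoints, roughly along
these lines (the declarations below are a sketch; the final names may
differ):

DECLARE_TRACE(kmalloc,
        TPPROTO(unsigned long call_site, const void *ptr, size_t bytes_req,
                size_t bytes_alloc, gfp_t gfp_flags),
        TPARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags));

DECLARE_TRACE(kmalloc_node,
        TPPROTO(unsigned long call_site, const void *ptr, size_t bytes_req,
                size_t bytes_alloc, gfp_t gfp_flags, int node),
        TPARGS(call_site, ptr, bytes_req, bytes_alloc, gfp_flags, node));

That drops both the type_id argument and the node argument from the common
(non-_node) paths.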


Cheers,
Eduard

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Fri, Jan 02, 2009 at 06:42:54PM -0500, Mathieu Desnoyers wrote:
> Because whatever slab_buffer_size() does will be done on the fastpath of
> the instrumented code *even when instrumentation is disabled*, and this
> is something we need to avoid above all.

I had doubts about this, so I tried it myself. It seems that when using
-O2 it generates optimal code, without computing a << b unnecessarily.
It only precomputes it at -O0. Here's how I tested...

#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int do_something_enabled;

static void print_that(unsigned long num)
{
        printf("input << 5 == %lu\n", num);
}

static inline void do_something(unsigned long num)
{
        if (unlikely(do_something_enabled)) /* Like DEFINE_TRACE does. */
                print_that(num);
}

static void call_do_something(unsigned long in)
{
        do_something(in << 5);
}

int main(int argc, char **argv)
{
        unsigned long in;

        if (argc != 3) {
                printf("Wrong number of arguments!\n");
                return 0;
        }

        sscanf(argv[1], "%d", &do_something_enabled);
        sscanf(argv[2], "%lu", &in);

        call_do_something(in);

        return 0;
}


Snippet of objdump output when using -O2:

static inline void do_something(unsigned long num)
{
if (unlikely(do_something_enabled))
400635: 85 c0 test %eax,%eax
400637: 74 be je 4005f7 <main+0x17>
print_that():
/home/edi/prj/src/inlineargs/inlineargs.c:9

static int do_something_enabled;

static void print_that(unsigned long num)
{
printf("input << 5 == %lu\n", num);
400639: 48 c1 e6 05 shl $0x5,%rsi
40063d: bf 6e 07 40 00 mov $0x40076e,%edi
400642: 31 c0 xor %eax,%eax
400644: e8 5f fe ff ff callq 4004a8 <printf@plt>


Snippet of objdump output when using -O0:

static void call_do_something(unsigned long in)
{
4005fd: 55 push %rbp
4005fe: 48 89 e5 mov %rsp,%rbp
400601: 48 83 ec 10 sub $0x10,%rsp
400605: 48 89 7d f8 mov %rdi,-0x8(%rbp)
/home/edi/prj/src/inlineargs/inlineargs.c:20
do_something(in << 5);
400609: 48 8b 45 f8 mov -0x8(%rbp),%rax
40060d: 48 89 c7 mov %rax,%rdi
400610: 48 c1 e7 05 shl $0x5,%rdi
400614: e8 02 00 00 00 callq 40061b <do_something>


Look at that shl; it's the left-shift (<< 5). In the first case it's
deferred as much as possible, whereas in the second case it's done
before calling that inline. I also confirmed this under GDB, with a
breakpoint on that shl.

Can we take this as general behaviour, i.e. that fn(a()), where fn() is
inlined and a() has no side effects, will only compute a() when needed, at
least on GCC with -O2 in effect?

It only seems natural to me that GCC would do this on a regular basis.


Cheers,
Eduard

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

* Eduard - Gabriel Munteanu ([email protected]) wrote:
> On Fri, Jan 02, 2009 at 06:42:54PM -0500, Mathieu Desnoyers wrote:
> > Because whatever slab_buffer_size() does will be done on the fastpath of
> > the instrumented code *even when instrumentation is disabled*, and this
> > is something we need to avoid above all.
>
> I had doubts about this, so I tried it myself. It seems that when using
> -O2 it generates optimal code, without computing a << b unnecessarily.
> It only precomputes it at -O0. Here's how I tested...
>
> #include <stdio.h>
>
> #define unlikely(x) __builtin_expect(!!(x), 0)
>
> static int do_something_enabled;
>
> static void print_that(unsigned long num)
> {
>         printf("input << 5 == %lu\n", num);
> }
>
> static inline void do_something(unsigned long num)
> {
>         if (unlikely(do_something_enabled)) /* Like DEFINE_TRACE does. */
>                 print_that(num);
> }
>
> static void call_do_something(unsigned long in)
> {
>         do_something(in << 5);
> }
>
> int main(int argc, char **argv)
> {
>         unsigned long in;
>
>         if (argc != 3) {
>                 printf("Wrong number of arguments!\n");
>                 return 0;
>         }
>
>         sscanf(argv[1], "%d", &do_something_enabled);
>         sscanf(argv[2], "%lu", &in);
>
>         call_do_something(in);
>
>         return 0;
> }
>
>
> Snippet of objdump output when using -O2:
>
> static inline void do_something(unsigned long num)
> {
> if (unlikely(do_something_enabled))
> 400635: 85 c0 test %eax,%eax
> 400637: 74 be je 4005f7 <main+0x17>
> print_that():
> /home/edi/prj/src/inlineargs/inlineargs.c:9
>
> static int do_something_enabled;
>
> static void print_that(unsigned long num)
> {
> printf("input << 5 == %lu\n", num);
> 400639: 48 c1 e6 05 shl $0x5,%rsi
> 40063d: bf 6e 07 40 00 mov $0x40076e,%edi
> 400642: 31 c0 xor %eax,%eax
> 400644: e8 5f fe ff ff callq 4004a8 <printf@plt>
>
>
> Snippet of objdump output when using -O0:
>
> static void call_do_something(unsigned long in)
> {
> 4005fd: 55 push %rbp
> 4005fe: 48 89 e5 mov %rsp,%rbp
> 400601: 48 83 ec 10 sub $0x10,%rsp
> 400605: 48 89 7d f8 mov %rdi,-0x8(%rbp)
> /home/edi/prj/src/inlineargs/inlineargs.c:20
> do_something(in << 5);
> 400609: 48 8b 45 f8 mov -0x8(%rbp),%rax
> 40060d: 48 89 c7 mov %rax,%rdi
> 400610: 48 c1 e7 05 shl $0x5,%rdi
> 400614: e8 02 00 00 00 callq 40061b <do_something>
>
>
> Look at that shl; it's the left-shift (<< 5). In the first case it's
> deferred as much as possible, whereas in the second case it's done
> before calling that inline. I also confirmed this under GDB, with a
> breakpoint on that shl.
>
> Can we take this as general behaviour, i.e. that fn(a()), where fn() is
> inlined and a() has no side effects, will only compute a() when needed, at
> least on GCC with -O2 in effect?
>
> It only seems natural to me that GCC would do this on a regular basis.
>

Hopefully it does, especially when there are no side-effects. Can you
also try with -Os ?

Mathieu


>
> Cheers,
> Eduard
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

On Mon, Jan 05, 2009 at 11:05:34AM -0500, Mathieu Desnoyers wrote:
> Hopefully it does, especially when there are no side-effects. Can you
> also try with -Os ?
>
> Mathieu

Here's the disassembled code when using -Os. It seems it's optimised, as
with -O2. My GCC version is 4.3.2 (Gentoo Linux).

If you want to test it yourself, the output was generated with 'objdump -d
-S -a'.
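
Roughly, this should reproduce it:

gcc -Os -g -o inlineargs inlineargs.c
objdump -d -S -a inlineargs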

What do you think?

static void print_that(unsigned long num)
{
printf("input << 5 == %lu\n", num);
40062d: 48 c1 e6 05 shl $0x5,%rsi
400631: bf 5e 07 40 00 mov $0x40075e,%edi
400636: 31 c0 xor %eax,%eax
400638: e8 6b fe ff ff callq 4004a8 <printf@plt>
sscanf(argv[2], "%lu", &in);

call_do_something(in);

return 0;
}
40063d: 5a pop %rdx
40063e: 59 pop %rcx
40063f: 31 c0 xor %eax,%eax
400641: 5b pop %rbx
400642: c3 retq


Eduard

Subject: Re: [PATCH 3/3] kmemtrace: Use tracepoints instead of markers.

* Eduard - Gabriel Munteanu ([email protected]) wrote:
> On Mon, Jan 05, 2009 at 11:05:34AM -0500, Mathieu Desnoyers wrote:
> > Hopefully it does, especially when there are no side-effects. Can you
> > also try with -Os ?
> >
> > Mathieu
>
> Here's the disassembled code when using -Os. It seems it's optimised, as
> with -O2. My GCC version is 4.3.2 (Gentoo Linux).
>
> If you want to test it yourself, the output was generated with 'objdump -d
> -S -a'.
>
> What do you think?
>
> static void print_that(unsigned long num)
> {
> printf("input << 5 == %lu\n", num);
> 40062d: 48 c1 e6 05 shl $0x5,%rsi
> 400631: bf 5e 07 40 00 mov $0x40075e,%edi
> 400636: 31 c0 xor %eax,%eax
> 400638: e8 6b fe ff ff callq 4004a8 <printf@plt>
> sscanf(argv[2], "%lu", &in);
>
> call_do_something(in);
>
> return 0;
> }
> 40063d: 5a pop %rdx
> 40063e: 59 pop %rcx
> 40063f: 31 c0 xor %eax,%eax
> 400641: 5b pop %rbx
> 400642: c3 retq
>
>
> Eduard
>

It looks good, although I wonder whether gcc will still optimize this in more
complicated functions. Just to be 100% sure, I would recommend testing
it in larger functions like schedule() in the kernel tree. But so far it
looks like it does not hurt much to leave a small supplementary operation
in the unlikely() branch.

Mathieu

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68