Hi all,
While fuzzing with trinity inside a KVM tools guest running the latest -next
kernel, I've stumbled on the following spew. It seems to cause an actual
lockup, as hung task messages followed soon after.
[ 690.762537] =============================================
[ 690.764196] [ INFO: possible recursive locking detected ]
[ 690.765247] 3.16.0-rc1-next-20140618-sasha-00029-g9e4acf8-dirty #664 Tainted: G W
[ 690.766457] ---------------------------------------------
[ 690.767237] kworker/u95:0/256 is trying to acquire lock:
[ 690.767886] (&(&n->list_lock)->rlock){-.-.-.}, at: get_partial_node.isra.35 (mm/slub.c:1630)
[ 690.769162]
[ 690.769162] but task is already holding lock:
[ 690.769851] (&(&n->list_lock)->rlock){-.-.-.}, at: kmem_cache_close (mm/slub.c:3209 mm/slub.c:3233)
[ 690.770137]
[ 690.770137] other info that might help us debug this:
[ 690.770137] Possible unsafe locking scenario:
[ 690.770137]
[ 690.770137] CPU0
[ 690.770137] ----
[ 690.770137] lock(&(&n->list_lock)->rlock);
[ 690.770137] lock(&(&n->list_lock)->rlock);
[ 690.770137]
[ 690.770137] *** DEADLOCK ***
[ 690.770137]
[ 690.770137] May be due to missing lock nesting notation
[ 690.770137]
[ 690.770137] 7 locks held by kworker/u95:0/256:
[ 690.770137] #0: ("%s"("netns")){.+.+.+}, at: process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:599 kernel/workqueue.c:626 kernel/workqueue.c:2074)
[ 690.770137] #1: (net_cleanup_work){+.+.+.}, at: process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:599 kernel/workqueue.c:626 kernel/workqueue.c:2074)
[ 690.770137] #2: (net_mutex){+.+.+.}, at: cleanup_net (net/core/net_namespace.c:287)
[ 690.770137] #3: (cpu_hotplug.lock){++++++}, at: get_online_cpus (kernel/cpu.c:90)
[ 690.770137] #4: (mem_hotplug.lock){.+.+.+}, at: get_online_mems (mm/memory_hotplug.c:83)
[ 690.770137] #5: (slab_mutex){+.+.+.}, at: kmem_cache_destroy (mm/slab_common.c:343)
[ 690.770137] #6: (&(&n->list_lock)->rlock){-.-.-.}, at: kmem_cache_close (mm/slub.c:3209 mm/slub.c:3233)
[ 690.770137]
[ 690.770137] stack backtrace:
[ 690.770137] CPU: 18 PID: 256 Comm: kworker/u95:0 Tainted: G W 3.16.0-rc1-next-20140618-sasha-00029-g9e4acf8-dirty #664
[ 690.770137] Workqueue: netns cleanup_net
[ 690.770137] ffff8808a172b000 ffff8808a1737628 ffffffff9d5179a0 0000000000000003
[ 690.770137] ffffffffa0b499c0 ffff8808a1737728 ffffffff9a1cac52 ffff8808a1737668
[ 690.770137] ffffffff9a1a74f8 23e00d8075e32f12 ffff8808a172b000 23e00d8000000001
[ 690.770137] Call Trace:
[ 690.770137] dump_stack (lib/dump_stack.c:52)
[ 690.770137] __lock_acquire (kernel/locking/lockdep.c:3034 kernel/locking/lockdep.c:3180)
[ 690.770137] ? sched_clock_cpu (kernel/sched/clock.c:311)
[ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 690.770137] lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
[ 690.770137] ? get_partial_node.isra.35 (mm/slub.c:1630)
[ 690.770137] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:305)
[ 690.770137] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 690.770137] ? get_partial_node.isra.35 (mm/slub.c:1630)
[ 690.770137] get_partial_node.isra.35 (mm/slub.c:1630)
[ 690.770137] ? __slab_alloc (mm/slub.c:2304)
[ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
[ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
[ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
[ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
[ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[ 690.770137] debug_object_init (lib/debugobjects.c:365)
[ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
[ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
[ 690.770137] ? discard_slab (mm/slub.c:1486)
[ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
[ 690.770137] call_rcu (kernel/rcu/tree_plugin.h:679)
[ 690.770137] discard_slab (mm/slub.c:1515 mm/slub.c:1523)
[ 690.770137] kmem_cache_close (mm/slub.c:3212 mm/slub.c:3233)
[ 690.770137] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
[ 690.770137] __kmem_cache_shutdown (mm/slub.c:3245)
[ 690.770137] kmem_cache_destroy (mm/slab_common.c:349)
[ 690.770137] nf_conntrack_cleanup_net_list (net/netfilter/nf_conntrack_core.c:1569 (discriminator 2))
[ 690.770137] nf_conntrack_pernet_exit (net/netfilter/nf_conntrack_standalone.c:558)
[ 690.770137] ops_exit_list.isra.1 (net/core/net_namespace.c:135)
[ 690.770137] cleanup_net (net/core/net_namespace.c:302 (discriminator 2))
[ 690.770137] process_one_work (kernel/workqueue.c:2081 include/linux/jump_label.h:115 include/trace/events/workqueue.h:111 kernel/workqueue.c:2086)
[ 690.770137] ? process_one_work (include/linux/workqueue.h:185 kernel/workqueue.c:599 kernel/workqueue.c:626 kernel/workqueue.c:2074)
[ 690.770137] worker_thread (kernel/workqueue.c:2213)
[ 690.770137] ? rescuer_thread (kernel/workqueue.c:2157)
[ 690.770137] kthread (kernel/kthread.c:210)
[ 690.770137] ? kthread_create_on_node (kernel/kthread.c:176)
[ 690.770137] ret_from_fork (arch/x86/kernel/entry_64.S:349)
Thanks,
Sasha
On Thu, 19 Jun 2014, Sasha Levin wrote:
> [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [ 690.770137] debug_object_init (lib/debugobjects.c:365)
> [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> [ 690.770137] ? discard_slab (mm/slub.c:1486)
> [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
__call_rcu does a slab allocation? This means __call_rcu can no longer be
used in slab allocators? What happened?
On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Sasha Levin wrote:
>
> > [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > [ 690.770137] debug_object_init (lib/debugobjects.c:365)
> > [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > [ 690.770137] ? discard_slab (mm/slub.c:1486)
> > [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
>
> __call_rcu does a slab allocation? This means __call_rcu can no longer be
> used in slab allocators? What happened?
My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
call_rcu_bh(), or call_srcu().
Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
That would be unfortunate...
Thanx, Paul
On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> > On Thu, 19 Jun 2014, Sasha Levin wrote:
> >
> > > [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > > [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > > [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > > [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > [ 690.770137] debug_object_init (lib/debugobjects.c:365)
> > > [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > > [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > > [ 690.770137] ? discard_slab (mm/slub.c:1486)
> > > [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> >
> > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> > used in slab allocators? What happened?
>
> My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> call_rcu_bh(), or call_srcu().
>
> Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> That would be unfortunate...
Well, no. Look at the callchain:
__call_rcu
debug_object_activate
rcuhead_fixup_activate
debug_object_init
kmem_cache_alloc
So call rcu activates the object, but the object has no reference in
the debug objects code so the fixup code is called which inits the
object and allocates a reference ....
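In slightly simplified form (a sketch, not the exact kernel code), with
SLUB already holding n->list_lock when it discards the slab:

void debug_object_activate(void *addr, struct debug_obj_descr *descr)
{
	struct debug_obj *obj = lookup_object(addr, get_bucket((unsigned long)addr));

	if (!obj && descr->fixup_activate)
		/* The rcu_head was never handed to debug_object_init() ... */
		descr->fixup_activate(addr, ODEBUG_STATE_NOTAVAILABLE);
}

static int rcuhead_fixup_activate(void *addr, enum debug_obj_state state)
{
	/* ... so "fix" that up by initializing it on the fly ... */
	debug_object_init(addr, &rcuhead_debug_descr);
	/*
	 * ... and debug_object_init() allocates its tracking struct via
	 * kmem_cache_alloc(), which recurses into SLUB and tries to take
	 * another n->list_lock. Hence the lockdep splat above.
	 */
	return 1;
}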
Thanks,
tglx
On Thu, 19 Jun 2014, Thomas Gleixner wrote:
> Well, no. Look at the callchain:
>
> __call_rcu
> debug_object_activate
> rcuhead_fixup_activate
> debug_object_init
> kmem_cache_alloc
>
> So call rcu activates the object, but the object has no reference in
> the debug objects code so the fixup code is called which inits the
> object and allocates a reference ....
So we need to init the object in the page struct before the __call_rcu?
On Thu, 19 Jun 2014, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Thomas Gleixner wrote:
>
> > Well, no. Look at the callchain:
> >
> > __call_rcu
> > debug_object_activate
> > rcuhead_fixup_activate
> > debug_object_init
> > kmem_cache_alloc
> >
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
>
> So we need to init the object in the page struct before the __call_rcu?
Looks like RCU is lazily relying on the state callback to initialize
the objects.
There is an unused debug_init_rcu_head() inline in kernel/rcu/update.c
Paul????
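If that were exported, the slab setup path could do something like this
(sketch only, placement hypothetical):

	/*
	 * Register the page's rcu_head with debug objects when the slab
	 * is created, so the activate-time fixup (and its allocation)
	 * never fires under n->list_lock.
	 */
	if (unlikely(s->flags & SLAB_DESTROY_BY_RCU))
		debug_init_rcu_head(&page->rcu_head);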
On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
>
> > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> > > On Thu, 19 Jun 2014, Sasha Levin wrote:
> > >
> > > > [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> > > > [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> > > > [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> > > > [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> > > > [ 690.770137] debug_object_init (lib/debugobjects.c:365)
> > > > [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> > > > [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> > > > [ 690.770137] ? discard_slab (mm/slub.c:1486)
> > > > [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> > >
> > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> > > used in slab allocators? What happened?
> >
> > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> > call_rcu_bh(), or call_srcu().
> >
> > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> > That would be unfortunate...
>
> Well, no. Look at the callchain:
>
> __call_rcu
> debug_object_activate
> rcuhead_fixup_activate
> debug_object_init
> kmem_cache_alloc
>
> So call rcu activates the object, but the object has no reference in
> the debug objects code so the fixup code is called which inits the
> object and allocates a reference ....
OK, got it. And you are right, call_rcu() has done this for a very
long time, so not sure what changed. But it seems like the right
approach is to provide a debug-object-free call_rcu_alloc() for use
by the memory allocators.
Seem reasonable? If so, please see the following patch.
Thanx, Paul
------------------------------------------------------------------------
rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion
The sl*b allocators use call_rcu() to manage object lifetimes, but
call_rcu() can use debug-objects, which in turn invokes the sl*b
allocators. These allocators are not prepared for this sort of
recursion, which can result in failures.
This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
which act as their call_rcu() and call_rcu_sched() counterparts, but
which avoid invoking debug-objects. These new API members are intended
only for use by the sl*b allocators, and this commit makes the sl*b
allocators use call_rcu_alloc(). Why call_rcu_sched_alloc()? Because
in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(), so
therefore call_rcu_alloc() must map to call_rcu_sched_alloc().
Reported-by: Sasha Levin <[email protected]>
Set-straight-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index d5e40a42cc43..1f708a7f9e7d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -140,13 +140,24 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
* if CPU A and CPU B are the same CPU (but again only if the system has
* more than one CPU).
*/
-void call_rcu(struct rcu_head *head,
- void (*func)(struct rcu_head *head));
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *head));
+
+/**
+ * call_rcu_alloc() - Queue an RCU callback for invocation after a grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * Similar to call_rcu(), but avoids invoking debug-objects. This permits
+ * it to be called from allocators without needing to worry about
+ * recursive calls into those allocators for debug-objects allocations.
+ */
+void call_rcu_alloc(struct rcu_head *head, void (*func)(struct rcu_head *rcu));
#else /* #ifdef CONFIG_PREEMPT_RCU */
/* In classic RCU, call_rcu() is just call_rcu_sched(). */
#define call_rcu call_rcu_sched
+#define call_rcu_alloc call_rcu_sched_alloc
#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
@@ -196,6 +207,19 @@ void call_rcu_bh(struct rcu_head *head,
void call_rcu_sched(struct rcu_head *head,
void (*func)(struct rcu_head *rcu));
+/**
+ * call_rcu_sched_alloc() - Queue RCU for invocation after sched grace period.
+ * @head: structure to be used for queueing the RCU updates.
+ * @func: actual callback function to be invoked after the grace period
+ *
+ * Similar to call_rcu_sched(), but avoids invoking debug-objects.
+ * This permits it to be called from allocators without needing to
+ * worry about recursive calls into those allocators for debug-objects
+ * allocations.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu));
+
void synchronize_sched(void);
#ifdef CONFIG_PREEMPT_RCU
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index d9efcc13008c..515e60067c53 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -338,15 +338,14 @@ void synchronize_sched(void)
EXPORT_SYMBOL_GPL(synchronize_sched);
/*
- * Helper function for call_rcu() and call_rcu_bh().
+ * Helper for call_rcu() and friends that avoids invoking debug-objects.
*/
-static void __call_rcu(struct rcu_head *head,
- void (*func)(struct rcu_head *rcu),
- struct rcu_ctrlblk *rcp)
+static void __call_rcu_nodo(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp)
{
unsigned long flags;
- debug_rcu_head_queue(head);
head->func = func;
head->next = NULL;
@@ -358,6 +357,17 @@ static void __call_rcu(struct rcu_head *head,
}
/*
+ * Helper function for call_rcu() and call_rcu_bh().
+ */
+static void __call_rcu(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp)
+{
+ debug_rcu_head_queue(head);
+ __call_rcu_nodo(head, func, rcp);
+}
+
+/*
* Post an RCU callback to be invoked after the end of an RCU-sched grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
@@ -369,6 +379,18 @@ void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
EXPORT_SYMBOL_GPL(call_rcu_sched);
/*
+ * Similar to call_rcu_sched(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that sort
+ * of recursion.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu_nodo(head, func, &rcu_sched_ctrlblk);
+}
+EXPORT_SYMBOL_GPL(call_rcu_sched_alloc);
+
+/*
* Post an RCU bottom-half callback to be invoked after any subsequent
* quiescent state.
*/
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 8c47d04ecdea..593195d38850 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2640,25 +2640,16 @@ static void rcu_leak_callback(struct rcu_head *rhp)
}
/*
- * Helper function for call_rcu() and friends. The cpu argument will
- * normally be -1, indicating "currently running CPU". It may specify
- * a CPU only if that CPU is a no-CBs CPU. Currently, only _rcu_barrier()
- * is expected to specify a CPU.
+ * Helper for call_rcu() and friends that avoids invoking debug-objects.
*/
static void
-__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
- struct rcu_state *rsp, int cpu, bool lazy)
+__call_rcu_nodo(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+ struct rcu_state *rsp, int cpu, bool lazy)
{
unsigned long flags;
struct rcu_data *rdp;
WARN_ON_ONCE((unsigned long)head & 0x1); /* Misaligned rcu_head! */
- if (debug_rcu_head_queue(head)) {
- /* Probable double call_rcu(), so leak the callback. */
- ACCESS_ONCE(head->func) = rcu_leak_callback;
- WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
- return;
- }
head->func = func;
head->next = NULL;
@@ -2704,6 +2695,25 @@ __call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
}
/*
+ * Helper function for call_rcu() and friends. The cpu argument will
+ * normally be -1, indicating "currently running CPU". It may specify
+ * a CPU only if that CPU is a no-CBs CPU. Currently, only _rcu_barrier()
+ * is expected to specify a CPU.
+ */
+static void
+__call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu),
+ struct rcu_state *rsp, int cpu, bool lazy)
+{
+ if (debug_rcu_head_queue(head)) {
+ /* Probable double call_rcu(), so leak the callback. */
+ ACCESS_ONCE(head->func) = rcu_leak_callback;
+ WARN_ONCE(1, "__call_rcu(): Leaked duplicate callback\n");
+ return;
+ }
+ __call_rcu_nodo(head, func, rsp, cpu, lazy);
+}
+
+/*
* Queue an RCU-sched callback for invocation after a grace period.
*/
void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
@@ -2713,6 +2723,18 @@ void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
EXPORT_SYMBOL_GPL(call_rcu_sched);
/*
+ * Similar to call_rcu_sched(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that sort
+ * of recursion.
+ */
+void call_rcu_sched_alloc(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu_nodo(head, func, &rcu_sched_state, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu_sched_alloc);
+
+/*
* Queue an RCU callback for invocation after a quicker grace period.
*/
void call_rcu_bh(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 569b390daa15..e9362d7f8328 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -679,6 +679,17 @@ void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
}
EXPORT_SYMBOL_GPL(call_rcu);
+/*
+ * Similar to call_rcu(), but avoids debug-objects and thus avoids
+ * calling into the memory allocators, which don't appreciate that sort
+ * of recursion.
+ */
+void call_rcu_alloc(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ __call_rcu_nodo(head, func, &rcu_preempt_state, -1, 0);
+}
+EXPORT_SYMBOL_GPL(call_rcu_alloc);
+
/**
* synchronize_rcu - wait until a grace period has elapsed.
*
diff --git a/mm/slab.c b/mm/slab.c
index 9ca3b87edabc..1e5de0d39701 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1994,7 +1994,7 @@ static void slab_destroy(struct kmem_cache *cachep, struct page *page)
* we can use it safely.
*/
head = (void *)&page->rcu_head;
- call_rcu(head, kmem_rcu_free);
+ call_rcu_alloc(head, kmem_rcu_free);
} else {
kmem_freepages(cachep, page);
diff --git a/mm/slob.c b/mm/slob.c
index 21980e0f39a8..47ad4a43521a 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -605,7 +605,7 @@ void kmem_cache_free(struct kmem_cache *c, void *b)
struct slob_rcu *slob_rcu;
slob_rcu = b + (c->size - sizeof(struct slob_rcu));
slob_rcu->size = c->size;
- call_rcu(&slob_rcu->head, kmem_rcu_free);
+ call_rcu_alloc(&slob_rcu->head, kmem_rcu_free);
} else {
__kmem_cache_free(b, c->size);
}
diff --git a/mm/slub.c b/mm/slub.c
index b2b047327d76..7f01e57fd99f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1512,7 +1512,7 @@ static void free_slab(struct kmem_cache *s, struct page *page)
head = (void *)&page->lru;
}
- call_rcu(head, rcu_free_slab);
+ call_rcu_alloc(head, rcu_free_slab);
} else
__free_slab(s, page);
}
On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
>> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
>> >
>>> > > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
>>>> > > > On Thu, 19 Jun 2014, Sasha Levin wrote:
>>>> > > >
>>>>> > > > > [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
>>>>> > > > > [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
>>>>> > > > > [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
>>>>> > > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
>>>>> > > > > [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
>>>>> > > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
>>>>> > > > > [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
>>>>> > > > > [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
>>>>> > > > > [ 690.770137] debug_object_init (lib/debugobjects.c:365)
>>>>> > > > > [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
>>>>> > > > > [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
>>>>> > > > > [ 690.770137] ? discard_slab (mm/slub.c:1486)
>>>>> > > > > [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
>>>> > > >
>>>> > > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
>>>> > > > used in slab allocators? What happened?
>>> > >
>>> > > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
>>> > > call_rcu_bh(), or call_srcu().
>>> > >
>>> > > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
>>> > > That would be unfortunate...
>> >
>> > Well, no. Look at the callchain:
>> >
>> > __call_rcu
>> > debug_object_activate
>> > rcuhead_fixup_activate
>> > debug_object_init
>> > kmem_cache_alloc
>> >
>> > So call rcu activates the object, but the object has no reference in
>> > the debug objects code so the fixup code is called which inits the
>> > object and allocates a reference ....
> OK, got it. And you are right, call_rcu() has done this for a very
> long time, so not sure what changed.
It's probably my fault. I've introduced clone() and unshare() fuzzing.
Those two are full of issues, and I've been holding off on enabling them
until the rest of the kernel could survive trinity for more than an hour.
Thanks,
Sasha
On Thu, Jun 19, 2014 at 03:19:39PM -0500, Christoph Lameter wrote:
> On Thu, 19 Jun 2014, Thomas Gleixner wrote:
>
> > Well, no. Look at the callchain:
> >
> > __call_rcu
> > debug_object_activate
> > rcuhead_fixup_activate
> > debug_object_init
> > kmem_cache_alloc
> >
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
>
> So we need to init the object in the page struct before the __call_rcu?
Good point. The patch I just sent will complain at callback-invocation
time because the debug-object information won't be present.
One way to handle this would be for rcu_do_batch() to avoid complaining
if it gets a callback that has not been through call_rcu()'s
debug_rcu_head_queue(). One way to do that would be to have an
alternative to debug_object_deactivate() that does not complain
if it is handed an unactivated object.
Another way to handle this would be for me to put the definition of
debug_rcu_head_queue() somewhere where the sl*b allocator could get
at it, and have the sl*b allocators invoke it both at initialization
and within the RCU callback.
Other thoughts?
Thanx, Paul
On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > Well, no. Look at the callchain:
> >
> > __call_rcu
> > debug_object_activate
> > rcuhead_fixup_activate
> > debug_object_init
> > kmem_cache_alloc
> >
> > So call rcu activates the object, but the object has no reference in
> > the debug objects code so the fixup code is called which inits the
> > object and allocates a reference ....
>
> OK, got it. And you are right, call_rcu() has done this for a very
> long time, so not sure what changed. But it seems like the right
> approach is to provide a debug-object-free call_rcu_alloc() for use
> by the memory allocators.
>
> Seem reasonable? If so, please see the following patch.
Not really, you're torpedoing the whole purpose of debugobjects :)
So, why can't we just init the rcu head when the stuff is created?
If that's impossible due to other memory allocator constraints, then
instead of inventing a whole new API we can simply flag the relevant
data in the memory allocator as we do with the debug objects mem cache
itself (SLAB_DEBUG_OBJECTS).
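For reference, that is roughly how the tracker already protects its own
cache in debug_objects_mem_init():

	/*
	 * The cache backing the tracker is flagged so that allocations
	 * from it are not themselves tracked, which avoids exactly this
	 * kind of recursion.
	 */
	obj_cache = kmem_cache_create("debug_objects_cache",
				      sizeof(struct debug_obj), 0,
				      SLAB_DEBUG_OBJECTS, NULL);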
Thanks,
tglx
On Thu, Jun 19, 2014 at 04:32:38PM -0400, Sasha Levin wrote:
> On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> >> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> >> >
> >>> > > On Thu, Jun 19, 2014 at 10:03:04AM -0500, Christoph Lameter wrote:
> >>>> > > > On Thu, 19 Jun 2014, Sasha Levin wrote:
> >>>> > > >
> >>>>> > > > > [ 690.770137] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> >>>>> > > > > [ 690.770137] __slab_alloc (mm/slub.c:1732 mm/slub.c:2205 mm/slub.c:2369)
> >>>>> > > > > [ 690.770137] ? __lock_acquire (kernel/locking/lockdep.c:3189)
> >>>>> > > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> >>>>> > > > > [ 690.770137] kmem_cache_alloc (mm/slub.c:2442 mm/slub.c:2484 mm/slub.c:2489)
> >>>>> > > > > [ 690.770137] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> >>>>> > > > > [ 690.770137] ? debug_object_activate (lib/debugobjects.c:439)
> >>>>> > > > > [ 690.770137] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> >>>>> > > > > [ 690.770137] debug_object_init (lib/debugobjects.c:365)
> >>>>> > > > > [ 690.770137] rcuhead_fixup_activate (kernel/rcu/update.c:231)
> >>>>> > > > > [ 690.770137] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> >>>>> > > > > [ 690.770137] ? discard_slab (mm/slub.c:1486)
> >>>>> > > > > [ 690.770137] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 2) kernel/rcu/tree.c:2585 (discriminator 2))
> >>>> > > >
> >>>> > > > __call_rcu does a slab allocation? This means __call_rcu can no longer be
> >>>> > > > used in slab allocators? What happened?
> >>> > >
> >>> > > My guess is that the root cause is a double call_rcu(), call_rcu_sched(),
> >>> > > call_rcu_bh(), or call_srcu().
> >>> > >
> >>> > > Perhaps the DEBUG_OBJECTS code now allocates memory to report errors?
> >>> > > That would be unfortunate...
> >> >
> >> > Well, no. Look at the callchain:
> >> >
> >> > __call_rcu
> >> > debug_object_activate
> >> > rcuhead_fixup_activate
> >> > debug_object_init
> >> > kmem_cache_alloc
> >> >
> >> > So call rcu activates the object, but the object has no reference in
> >> > the debug objects code so the fixup code is called which inits the
> >> > object and allocates a reference ....
> > OK, got it. And you are right, call_rcu() has done this for a very
> > long time, so not sure what changed.
>
> It's probably my fault. I've introduced clone() and unshare() fuzzing.
>
> Those two are full of issues, and I've been holding off on enabling them
> until the rest of the kernel could survive trinity for more than an hour.
Well, that might explain why I haven't seen it in my testing. ;-)
Thanx, Paul
On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion
>
> The sl*b allocators use call_rcu() to manage object lifetimes, but
> call_rcu() can use debug-objects, which in turn invokes the sl*b
> allocators. These allocators are not prepared for this sort of
> recursion, which can result in failures.
>
> This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
> which act as their call_rcu() and call_rcu_sched() counterparts, but
> which avoid invoking debug-objects. These new API members are intended
> only for use by the sl*b allocators, and this commit makes the sl*b
> allocators use call_rcu_alloc(). Why call_rcu_sched_alloc()? Because
> in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(), so
> therefore call_rcu_alloc() must map to call_rcu_sched_alloc().
>
> Reported-by: Sasha Levin <[email protected]>
> Set-straight-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
Paul, what is this patch based on? It won't apply cleanly on -next
or Linus's tree.
Thanks,
Sasha
On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > Well, no. Look at the callchain:
> > >
> > > __call_rcu
> > > debug_object_activate
> > > rcuhead_fixup_activate
> > > debug_object_init
> > > kmem_cache_alloc
> > >
> > > So call rcu activates the object, but the object has no reference in
> > > the debug objects code so the fixup code is called which inits the
> > > object and allocates a reference ....
> >
> > OK, got it. And you are right, call_rcu() has done this for a very
> > long time, so not sure what changed. But it seems like the right
> > approach is to provide a debug-object-free call_rcu_alloc() for use
> > by the memory allocators.
> >
> > Seem reasonable? If so, please see the following patch.
>
> Not really, you're torpedoing the whole purpose of debugobjects :)
>
> So, why can't we just init the rcu head when the stuff is created?
That would allow me to keep my code unchanged, so I am in favor. ;-)
Thanx, Paul
> If that's impossible due to other memory allocator constraints, then
> instead of inventing a whole new API we can simply flag the relevant
> data in the memory allocator as we do with the debug objects mem cache
> itself (SLAB_DEBUG_OBJECTS).
>
> Thanks,
>
> tglx
>
On Thu, Jun 19, 2014 at 04:42:11PM -0400, Sasha Levin wrote:
> On 06/19/2014 04:29 PM, Paul E. McKenney wrote:
> > rcu: Provide call_rcu_alloc() and call_rcu_sched_alloc() to avoid recursion
> >
> > The sl*b allocators use call_rcu() to manage object lifetimes, but
> > call_rcu() can use debug-objects, which in turn invokes the sl*b
> > allocators. These allocators are not prepared for this sort of
> > recursion, which can result in failures.
> >
> > This commit therefore creates call_rcu_alloc() and call_rcu_sched_alloc(),
> > which act as their call_rcu() and call_rcu_sched() counterparts, but
> > which avoid invoking debug-objects. These new API members are intended
> > only for use by the sl*b allocators, and this commit makes the sl*b
> > allocators use call_rcu_alloc(). Why call_rcu_sched_alloc()? Because
> > in CONFIG_PREEMPT=n kernels, call_rcu() maps to call_rcu_sched(), so
> > therefore call_rcu_alloc() must map to call_rcu_sched_alloc().
> >
> > Reported-by: Sasha Levin <[email protected]>
> > Set-straight-by: Thomas Gleixner <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
>
> Paul, what is this patch based on? It won't apply cleanly on -next
> or Linus's tree.
On my -rcu tree, but I think that Thomas's approach is better.
Thanx, Paul
On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > Well, no. Look at the callchain:
> > > >
> > > > __call_rcu
> > > > debug_object_activate
> > > > rcuhead_fixup_activate
> > > > debug_object_init
> > > > kmem_cache_alloc
> > > >
> > > > So call rcu activates the object, but the object has no reference in
> > > > the debug objects code so the fixup code is called which inits the
> > > > object and allocates a reference ....
> > >
> > > OK, got it. And you are right, call_rcu() has done this for a very
> > > long time, so not sure what changed. But it seems like the right
> > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > by the memory allocators.
> > >
> > > Seem reasonable? If so, please see the following patch.
> >
> > Not really, you're torpedoing the whole purpose of debugobjects :)
> >
> > So, why can't we just init the rcu head when the stuff is created?
>
> That would allow me to keep my code unchanged, so I am in favor. ;-)
Almost unchanged. You need to provide a function to do so, i.e. make
use of
debug_init_rcu_head()
Thanks,
tglx
On Thu, Jun 19, 2014 at 11:32:41PM +0200, Thomas Gleixner wrote:
>
>
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
>
> > On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > Well, no. Look at the callchain:
> > > > >
> > > > > __call_rcu
> > > > > debug_object_activate
> > > > > rcuhead_fixup_activate
> > > > > debug_object_init
> > > > > kmem_cache_alloc
> > > > >
> > > > > So call rcu activates the object, but the object has no reference in
> > > > > the debug objects code so the fixup code is called which inits the
> > > > > object and allocates a reference ....
> > > >
> > > > OK, got it. And you are right, call_rcu() has done this for a very
> > > > long time, so not sure what changed. But it seems like the right
> > > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > > by the memory allocators.
> > > >
> > > > Seem reasonable? If so, please see the following patch.
> > >
> > > Not really, you're torpedoing the whole purpose of debugobjects :)
> > >
> > > So, why can't we just init the rcu head when the stuff is created?
> >
> > That would allow me to keep my code unchanged, so I am in favor. ;-)
>
> Almost unchanged. You need to provide a function to do so, i.e. make
> use of
>
> debug_init_rcu_head()
You mean like this?
Thanx, Paul
------------------------------------------------------------------------
rcu: Export debug_init_rcu_head() and debug_rcu_head_free()
Currently, call_rcu() relies on implicit allocation and initialization
for the debug-objects handling of RCU callbacks. If you hammer the
kernel hard enough with Sasha's modified version of trinity, you can end
up with the sl*b allocators recursing into themselves via this implicit
call_rcu() allocation.
This commit therefore exports the debug_init_rcu_head() and
debug_rcu_head_free() functions, which permits the allocators to allocate
and pre-initialize the debug-objects information, so that there is no longer
any need for call_rcu() to do that initialization, which in turn prevents
the recursion into the memory allocators.
Reported-by: Sasha Levin <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 063a6bf1a2b6..34ae5c376e35 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -358,9 +358,19 @@ void wait_rcu_gp(call_rcu_func_t crf);
* initialization.
*/
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+void debug_init_rcu_head(struct rcu_head *head);
+void debug_rcu_head_free(struct rcu_head *head);
void init_rcu_head_on_stack(struct rcu_head *head);
void destroy_rcu_head_on_stack(struct rcu_head *head);
#else /* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+static inline void debug_init_rcu_head(struct rcu_head *head)
+{
+}
+
+static inline void debug_rcu_head_free(struct rcu_head *head)
+{
+}
+
static inline void init_rcu_head_on_stack(struct rcu_head *head)
{
}
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a2aeb4df0f60..a41c81a26506 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -200,12 +200,12 @@ void wait_rcu_gp(call_rcu_func_t crf)
EXPORT_SYMBOL_GPL(wait_rcu_gp);
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-static inline void debug_init_rcu_head(struct rcu_head *head)
+void debug_init_rcu_head(struct rcu_head *head)
{
debug_object_init(head, &rcuhead_debug_descr);
}
-static inline void debug_rcu_head_free(struct rcu_head *head)
+void debug_rcu_head_free(struct rcu_head *head)
{
debug_object_free(head, &rcuhead_debug_descr);
}
On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> On Thu, Jun 19, 2014 at 11:32:41PM +0200, Thomas Gleixner wrote:
> >
> >
> > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> >
> > > On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > > Well, no. Look at the callchain:
> > > > > >
> > > > > > __call_rcu
> > > > > > debug_object_activate
> > > > > > rcuhead_fixup_activate
> > > > > > debug_object_init
> > > > > > kmem_cache_alloc
> > > > > >
> > > > > > So call rcu activates the object, but the object has no reference in
> > > > > > the debug objects code so the fixup code is called which inits the
> > > > > > object and allocates a reference ....
> > > > >
> > > > > OK, got it. And you are right, call_rcu() has done this for a very
> > > > > long time, so not sure what changed. But it seems like the right
> > > > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > > > by the memory allocators.
> > > > >
> > > > > Seem reasonable? If so, please see the following patch.
> > > >
> > > > Not really, you're torpedoing the whole purpose of debugobjects :)
> > > >
> > > > So, why can't we just init the rcu head when the stuff is created?
> > >
> > > That would allow me to keep my code unchanged, so I am in favor. ;-)
> >
> > Almost unchanged. You need to provide a function to do so, i.e. make
> > use of
> >
> > debug_init_rcu_head()
>
> You mean like this?
I'd rather name it init_rcu_head() and free_rcu_head() w/o the debug_
prefix, so it's consistent with init_rcu_head_on_stack /
destroy_rcu_head_on_stack. But either way works for me.
Acked-by: Thomas Gleixner <[email protected]>
On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> This commit therefore exports the debug_init_rcu_head() and
> debug_rcu_head_free() functions, which permits the allocators to allocate
> and pre-initialize the debug-objects information, so that there is no longer
> any need for call_rcu() to do that initialization, which in turn prevents
> the recursion into the memory allocators.
Looks-good-to: Christoph Lameter <[email protected]>
On Fri, Jun 20, 2014 at 10:17:32AM +0200, Thomas Gleixner wrote:
> On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > On Thu, Jun 19, 2014 at 11:32:41PM +0200, Thomas Gleixner wrote:
> > >
> > >
> > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > >
> > > > On Thu, Jun 19, 2014 at 10:37:17PM +0200, Thomas Gleixner wrote:
> > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > > On Thu, Jun 19, 2014 at 09:29:08PM +0200, Thomas Gleixner wrote:
> > > > > > > On Thu, 19 Jun 2014, Paul E. McKenney wrote:
> > > > > > > Well, no. Look at the callchain:
> > > > > > >
> > > > > > > __call_rcu
> > > > > > > debug_object_activate
> > > > > > > rcuhead_fixup_activate
> > > > > > > debug_object_init
> > > > > > > kmem_cache_alloc
> > > > > > >
> > > > > > > So call rcu activates the object, but the object has no reference in
> > > > > > > the debug objects code so the fixup code is called which inits the
> > > > > > > object and allocates a reference ....
> > > > > >
> > > > > > OK, got it. And you are right, call_rcu() has done this for a very
> > > > > > long time, so not sure what changed. But it seems like the right
> > > > > > approach is to provide a debug-object-free call_rcu_alloc() for use
> > > > > > by the memory allocators.
> > > > > >
> > > > > > Seem reasonable? If so, please see the following patch.
> > > > >
> > > > > Not really, you're torpedoing the whole purpose of debugobjects :)
> > > > >
> > > > > So, why can't we just init the rcu head when the stuff is created?
> > > >
> > > > That would allow me to keep my code unchanged, so I am in favor. ;-)
> > >
> > > Almost unchanged. You need to provide a function to do so, i.e. make
> > > use of
> > >
> > > debug_init_rcu_head()
> >
> > You mean like this?
>
> I'd rather name it init_rcu_head() and free_rcu_head() w/o the debug_
> prefix, so it's consistent with init_rcu_head_on_stack /
> destroy_rcu_head_on_stack. But either way works for me.
>
> Acked-by: Thomas Gleixner <[email protected]>
So just drop the _on_stack() from the other names, then. Please see
below.
Thanx, Paul
------------------------------------------------------------------------
rcu: Export init_rcu_head() and destroy_rcu_head()
Currently, call_rcu() relies on implicit allocation and initialization
for the debug-objects handling of RCU callbacks. If you hammer the
kernel hard enough with Sasha's modified version of trinity, you can end
up with the sl*b allocators recursing into themselves via this implicit
call_rcu() allocation.
This commit therefore exports the init_rcu_head() and destroy_rcu_head()
functions, which permits the allocators to allocate and pre-initialize
the debug-objects information, so that there is no longer
any need for call_rcu() to do that initialization, which in turn prevents
the recursion into the memory allocators.
Reported-by: Sasha Levin <[email protected]>
Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
Acked-by: Thomas Gleixner <[email protected]>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 063a6bf1a2b6..37c92cfef9ec 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -358,9 +358,19 @@ void wait_rcu_gp(call_rcu_func_t crf);
* initialization.
*/
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
+void init_rcu_head(struct rcu_head *head);
+void destroy_rcu_head(struct rcu_head *head);
void init_rcu_head_on_stack(struct rcu_head *head);
void destroy_rcu_head_on_stack(struct rcu_head *head);
#else /* !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
+static inline void init_rcu_head(struct rcu_head *head)
+{
+}
+
+static inline void destroy_rcu_head(struct rcu_head *head)
+{
+}
+
static inline void init_rcu_head_on_stack(struct rcu_head *head)
{
}
diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
index a2aeb4df0f60..0fb691e63ce6 100644
--- a/kernel/rcu/update.c
+++ b/kernel/rcu/update.c
@@ -200,12 +200,12 @@ void wait_rcu_gp(call_rcu_func_t crf)
EXPORT_SYMBOL_GPL(wait_rcu_gp);
#ifdef CONFIG_DEBUG_OBJECTS_RCU_HEAD
-static inline void debug_init_rcu_head(struct rcu_head *head)
+void init_rcu_head(struct rcu_head *head)
{
debug_object_init(head, &rcuhead_debug_descr);
}
-static inline void debug_rcu_head_free(struct rcu_head *head)
+void destroy_rcu_head(struct rcu_head *head)
{
debug_object_free(head, &rcuhead_debug_descr);
}
On 06/20/2014 11:40 AM, Paul E. McKenney wrote:
> rcu: Export init_rcu_head() and destroy_rcu_head()
>
> Currently, call_rcu() relies on implicit allocation and initialization
> for the debug-objects handling of RCU callbacks. If you hammer the
> kernel hard enough with Sasha's modified version of trinity, you can end
> up with the sl*b allocators recursing into themselves via this implicit
> call_rcu() allocation.
>
> This commit therefore exports the init_rcu_head() and destroy_rcu_head()
> functions, which permits the allocators to allocate and pre-initialize
> the debug-objects information, so that there is no longer
> any need for call_rcu() to do that initialization, which in turn prevents
> the recursion into the memory allocators.
>
> Reported-by: Sasha Levin <[email protected]>
> Suggested-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Paul E. McKenney <[email protected]>
> Acked-by: Thomas Gleixner <[email protected]>
Hi Paul,
Oddly enough, I still see the issue in -next (I made sure that this patch
was in the tree):
[ 393.810123] =============================================
[ 393.810123] [ INFO: possible recursive locking detected ]
[ 393.810123] 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 Not tainted
[ 393.810123] ---------------------------------------------
[ 393.810123] trinity-c32/9762 is trying to acquire lock:
[ 393.810123] (&(&n->list_lock)->rlock){-.-...}, at: get_partial_node.isra.39 (mm/slub.c:1628)
[ 393.810123]
[ 393.810123] but task is already holding lock:
[ 393.810123] (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[ 393.810123]
[ 393.810123] other info that might help us debug this:
[ 393.810123] Possible unsafe locking scenario:
[ 393.810123]
[ 393.810123] CPU0
[ 393.810123] ----
[ 393.810123] lock(&(&n->list_lock)->rlock);
[ 393.810123] lock(&(&n->list_lock)->rlock);
[ 393.810123]
[ 393.810123] *** DEADLOCK ***
[ 393.810123]
[ 393.810123] May be due to missing lock nesting notation
[ 393.810123]
[ 393.810123] 5 locks held by trinity-c32/9762:
[ 393.810123] #0: (net_mutex){+.+.+.}, at: copy_net_ns (net/core/net_namespace.c:254)
[ 393.810123] #1: (cpu_hotplug.lock){++++++}, at: get_online_cpus (kernel/cpu.c:90)
[ 393.810123] #2: (mem_hotplug.lock){.+.+.+}, at: get_online_mems (mm/memory_hotplug.c:83)
[ 393.810123] #3: (slab_mutex){+.+.+.}, at: kmem_cache_destroy (mm/slab_common.c:344)
[ 393.810123] #4: (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[ 393.810123]
[ 393.810123] stack backtrace:
[ 393.810123] CPU: 32 PID: 9762 Comm: trinity-c32 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
[ 393.843284] ffff880bc26730e0 0000000000000000 ffffffffb4ae7ff0 ffff880bc26a3848
[ 393.843284] ffffffffb0e47068 ffffffffb4ae7ff0 ffff880bc26a38f0 ffffffffac258586
[ 393.843284] ffff880bc2673e30 000000050000000a ffffffffb444dee0 ffff880bc2673e48
[ 393.843284] Call Trace:
[ 393.843284] dump_stack (lib/dump_stack.c:52)
[ 393.843284] __lock_acquire (kernel/locking/lockdep.c:1739 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
[ 393.843284] lock_acquire (kernel/locking/lockdep.c:3602)
[ 393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
[ 393.843284] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
[ 393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
[ 393.843284] get_partial_node.isra.39 (mm/slub.c:1628)
[ 393.843284] ? check_irq_usage (kernel/locking/lockdep.c:1638)
[ 393.843284] ? __slab_alloc (mm/slub.c:2307)
[ 393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 393.843284] __slab_alloc (mm/slub.c:1730 mm/slub.c:2208 mm/slub.c:2372)
[ 393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[ 393.843284] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
[ 393.843284] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:304)
[ 393.843284] kmem_cache_alloc (mm/slub.c:2445 mm/slub.c:2487 mm/slub.c:2492)
[ 393.843284] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
[ 393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[ 393.843284] ? check_chain_key (kernel/locking/lockdep.c:2188)
[ 393.843284] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
[ 393.843284] ? _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
[ 393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
[ 393.843284] debug_object_init (lib/debugobjects.c:365)
[ 393.843284] rcuhead_fixup_activate (kernel/rcu/update.c:260)
[ 393.843284] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
[ 393.843284] ? preempt_count_sub (kernel/sched/core.c:2600)
[ 393.843284] ? slab_cpuup_callback (mm/slub.c:1484)
[ 393.843284] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 8) kernel/rcu/tree.c:2665 (discriminator 8))
[ 393.843284] ? __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[ 393.843284] call_rcu (kernel/rcu/tree_plugin.h:679)
[ 393.843284] discard_slab (mm/slub.c:1522)
[ 393.843284] __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
[ 393.843284] kmem_cache_destroy (mm/slab_common.c:350)
[ 393.843284] nf_conntrack_cleanup_net_list (net/netfilter/nf_conntrack_core.c:1569 (discriminator 3))
[ 393.843284] nf_conntrack_pernet_exit (net/netfilter/nf_conntrack_standalone.c:558)
[ 393.843284] ops_exit_list.isra.1 (net/core/net_namespace.c:135)
[ 393.843284] setup_net (net/core/net_namespace.c:180 (discriminator 3))
[ 393.843284] copy_net_ns (net/core/net_namespace.c:255)
[ 393.843284] create_new_namespaces (kernel/nsproxy.c:95)
[ 393.843284] unshare_nsproxy_namespaces (kernel/nsproxy.c:190 (discriminator 4))
[ 393.843284] SyS_unshare (kernel/fork.c:1865 kernel/fork.c:1814)
[ 393.843284] tracesys (arch/x86/kernel/entry_64.S:542)
Thanks,
Sasha
On Sat, Jul 12, 2014 at 02:03:57PM -0400, Sasha Levin wrote:
> On 06/20/2014 11:40 AM, Paul E. McKenney wrote:
> > rcu: Export init_rcu_head() and destroy_rcu_head()
> >
> > Currently, call_rcu() relies on implicit allocation and initialization
> > for the debug-objects handling of RCU callbacks. If you hammer the
> > kernel hard enough with Sasha's modified version of trinity, you can end
> > up with the sl*b allocators recursing into themselves via this implicit
> > call_rcu() allocation.
> >
> > This commit therefore exports the init_rcu_head() and destroy_rcu_head()
> > functions, which permits the allocators to allocate and pre-initialize
> > the debug-objects information, so that there is no longer
> > any need for call_rcu() to do that initialization, which in turn prevents
> > the recursion into the memory allocators.
> >
> > Reported-by: Sasha Levin <[email protected]>
> > Suggested-by: Thomas Gleixner <[email protected]>
> > Signed-off-by: Paul E. McKenney <[email protected]>
> > Acked-by: Thomas Gleixner <[email protected]>
>
> Hi Paul,
>
> Oddly enough, I still see the issue in -next (I made sure that this patch
> was in the tree):
Hello, Sasha,
This commit is only part of the solution. The allocators need to change
to make use of it.
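For SLUB, that change would look something like the following (untested
sketch, ignoring the SLAB_DESTROY_BY_RCU reserve case):

/*
 * Pair init_rcu_head() at slab creation with destroy_rcu_head() in the
 * RCU callback, so that call_rcu() never has to lazily initialize the
 * head from within the allocator.
 */
static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
{
	struct page *page = alloc_pages_node(node, flags, oo_order(s->oo));

	if (page && unlikely(s->flags & SLAB_DESTROY_BY_RCU))
		init_rcu_head(&page->rcu_head);
	return page;
}

static void rcu_free_slab(struct rcu_head *h)
{
	struct page *page = container_of(h, struct page, rcu_head);

	destroy_rcu_head(&page->rcu_head);
	__free_slab(page->slab_cache, page);
}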
Thanx, Paul
> [ 393.810123] =============================================
> [ 393.810123] [ INFO: possible recursive locking detected ]
> [ 393.810123] 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813 Not tainted
> [ 393.810123] ---------------------------------------------
> [ 393.810123] trinity-c32/9762 is trying to acquire lock:
> [ 393.810123] (&(&n->list_lock)->rlock){-.-...}, at: get_partial_node.isra.39 (mm/slub.c:1628)
> [ 393.810123]
> [ 393.810123] but task is already holding lock:
> [ 393.810123] (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [ 393.810123]
> [ 393.810123] other info that might help us debug this:
> [ 393.810123] Possible unsafe locking scenario:
> [ 393.810123]
> [ 393.810123] CPU0
> [ 393.810123] ----
> [ 393.810123] lock(&(&n->list_lock)->rlock);
> [ 393.810123] lock(&(&n->list_lock)->rlock);
> [ 393.810123]
> [ 393.810123] *** DEADLOCK ***
> [ 393.810123]
> [ 393.810123] May be due to missing lock nesting notation
> [ 393.810123]
> [ 393.810123] 5 locks held by trinity-c32/9762:
> [ 393.810123] #0: (net_mutex){+.+.+.}, at: copy_net_ns (net/core/net_namespace.c:254)
> [ 393.810123] #1: (cpu_hotplug.lock){++++++}, at: get_online_cpus (kernel/cpu.c:90)
> [ 393.810123] #2: (mem_hotplug.lock){.+.+.+}, at: get_online_mems (mm/memory_hotplug.c:83)
> [ 393.810123] #3: (slab_mutex){+.+.+.}, at: kmem_cache_destroy (mm/slab_common.c:344)
> [ 393.810123] #4: (&(&n->list_lock)->rlock){-.-...}, at: __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [ 393.810123]
> [ 393.810123] stack backtrace:
> [ 393.810123] CPU: 32 PID: 9762 Comm: trinity-c32 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
> [ 393.843284] ffff880bc26730e0 0000000000000000 ffffffffb4ae7ff0 ffff880bc26a3848
> [ 393.843284] ffffffffb0e47068 ffffffffb4ae7ff0 ffff880bc26a38f0 ffffffffac258586
> [ 393.843284] ffff880bc2673e30 000000050000000a ffffffffb444dee0 ffff880bc2673e48
> [ 393.843284] Call Trace:
> [ 393.843284] dump_stack (lib/dump_stack.c:52)
> [ 393.843284] __lock_acquire (kernel/locking/lockdep.c:1739 kernel/locking/lockdep.c:1783 kernel/locking/lockdep.c:2115 kernel/locking/lockdep.c:3182)
> [ 393.843284] lock_acquire (kernel/locking/lockdep.c:3602)
> [ 393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
> [ 393.843284] _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
> [ 393.843284] ? get_partial_node.isra.39 (mm/slub.c:1628)
> [ 393.843284] get_partial_node.isra.39 (mm/slub.c:1628)
> [ 393.843284] ? check_irq_usage (kernel/locking/lockdep.c:1638)
> [ 393.843284] ? __slab_alloc (mm/slub.c:2307)
> [ 393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 393.843284] __slab_alloc (mm/slub.c:1730 mm/slub.c:2208 mm/slub.c:2372)
> [ 393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [ 393.843284] ? kvm_clock_read (./arch/x86/include/asm/preempt.h:90 arch/x86/kernel/kvmclock.c:86)
> [ 393.843284] ? sched_clock (./arch/x86/include/asm/paravirt.h:192 arch/x86/kernel/tsc.c:304)
> [ 393.843284] kmem_cache_alloc (mm/slub.c:2445 mm/slub.c:2487 mm/slub.c:2492)
> [ 393.843284] ? debug_smp_processor_id (lib/smp_processor_id.c:57)
> [ 393.843284] ? __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [ 393.843284] ? check_chain_key (kernel/locking/lockdep.c:2188)
> [ 393.843284] __debug_object_init (lib/debugobjects.c:100 lib/debugobjects.c:312)
> [ 393.843284] ? _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:160 kernel/locking/spinlock.c:191)
> [ 393.843284] ? __this_cpu_preempt_check (lib/smp_processor_id.c:63)
> [ 393.843284] debug_object_init (lib/debugobjects.c:365)
> [ 393.843284] rcuhead_fixup_activate (kernel/rcu/update.c:260)
> [ 393.843284] debug_object_activate (lib/debugobjects.c:280 lib/debugobjects.c:439)
> [ 393.843284] ? preempt_count_sub (kernel/sched/core.c:2600)
> [ 393.843284] ? slab_cpuup_callback (mm/slub.c:1484)
> [ 393.843284] __call_rcu (kernel/rcu/rcu.h:76 (discriminator 8) kernel/rcu/tree.c:2665 (discriminator 8))
> [ 393.843284] ? __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [ 393.843284] call_rcu (kernel/rcu/tree_plugin.h:679)
> [ 393.843284] discard_slab (mm/slub.c:1522)
> [ 393.843284] __kmem_cache_shutdown (mm/slub.c:3210 mm/slub.c:3233 mm/slub.c:3244)
> [ 393.843284] kmem_cache_destroy (mm/slab_common.c:350)
> [ 393.843284] nf_conntrack_cleanup_net_list (net/netfilter/nf_conntrack_core.c:1569 (discriminator 3))
> [ 393.843284] nf_conntrack_pernet_exit (net/netfilter/nf_conntrack_standalone.c:558)
> [ 393.843284] ops_exit_list.isra.1 (net/core/net_namespace.c:135)
> [ 393.843284] setup_net (net/core/net_namespace.c:180 (discriminator 3))
> [ 393.843284] copy_net_ns (net/core/net_namespace.c:255)
> [ 393.843284] create_new_namespaces (kernel/nsproxy.c:95)
> [ 393.843284] unshare_nsproxy_namespaces (kernel/nsproxy.c:190 (discriminator 4))
> [ 393.843284] SyS_unshare (kernel/fork.c:1865 kernel/fork.c:1814)
> [ 393.843284] tracesys (arch/x86/kernel/entry_64.S:542)
>
>
> Thanks,
> Sasha
>