Greetings,
FYI, we noticed the following commit (built with gcc-9):
commit: 82e310033d7c21a7a88427f14e0dad78d731a5cd ("rcutorture: Enable multiple concurrent callback-flood kthreads")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
in testcase: rcutorture
version:
with the following parameters:
runtime: 300s
test: default
torture_type: rcu
test-description: rcutorture is an rcutorture kernel module load/unload test.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
+-------------------------------------------------------------------------------+------------+------------+
| | 12e885433d | 82e310033d |
+-------------------------------------------------------------------------------+------------+------------+
| boot_successes | 95 | 47 |
| boot_failures | 31 | 25 |
| invoked_oom-killer:gfp_mask=0x | 5 | 4 |
| Mem-Info | 10 | 16 |
| WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_oom_notify[rcutorture] | 24 | 15 |
| EIP:rcutorture_oom_notify | 24 | 15 |
| page_allocation_failure:order:#,mode:#(GFP_NOWAIT|__GFP_COMP),nodemask=(null) | 5 | 12 |
| WARNING:possible_recursive_locking_detected | 0 | 15 |
| WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog.cold[rcutorture] | 0 | 6 |
| EIP:rcu_torture_fwd_prog.cold | 0 | 6 |
+-------------------------------------------------------------------------------+------------+------------+
If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <[email protected]>
[ 63.586627][ T2] ============================================
[ 63.586883][ T2] WARNING: possible recursive locking detected
[ 63.587141][ T2] 5.16.0-rc1-00016-g82e310033d7c #1 Tainted: G W
[ 63.587457][ T2] --------------------------------------------
[ 63.587712][ T2] kthreadd/2 is trying to acquire lock:
[ 63.587943][ T2] f941d6f8 (rcu_fwd_mutex){+.+.}-{3:3}, at: rcu_torture_fwd_cb_hist+0x40/0x110 [rcutorture]
[ 63.588368][ T2]
[ 63.588368][ T2] but task is already holding lock:
[ 63.588675][ T2] f941d6f8 (rcu_fwd_mutex){+.+.}-{3:3}, at: rcutorture_oom_notify+0x1d/0x100 [rcutorture]
[ 63.589092][ T2]
[ 63.589092][ T2] other info that might help us debug this:
[ 63.589427][ T2] Possible unsafe locking scenario:
[ 63.589427][ T2]
[ 63.589736][ T2] CPU0
[ 63.589873][ T2] ----
[ 63.590011][ T2] lock(rcu_fwd_mutex);
[ 63.590189][ T2] lock(rcu_fwd_mutex);
[ 63.590366][ T2]
[ 63.590366][ T2] *** DEADLOCK ***
[ 63.590366][ T2]
[ 63.590704][ T2] May be due to missing lock nesting notation
[ 63.590704][ T2]
[ 63.591049][ T2] 3 locks held by kthreadd/2:
[ 63.591247][ T2] #0: c5579e38 (oom_lock){+.+.}-{3:3}, at: __alloc_pages_slowpath+0x4f7/0xe40
[ 63.591664][ T2] #1: c5579cbc ((oom_notify_list).rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x29/0x80
[ 63.592107][ T2] #2: f941d6f8 (rcu_fwd_mutex){+.+.}-{3:3}, at: rcutorture_oom_notify+0x1d/0x100 [rcutorture]
[ 63.592544][ T2]
[ 63.592544][ T2] stack backtrace:
[ 63.592789][ T2] CPU: 1 PID: 2 Comm: kthreadd Tainted: G W 5.16.0-rc1-00016-g82e310033d7c #1
[ 63.593210][ T2] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 63.593591][ T2] Call Trace:
[ 63.593729][ T2] dump_stack_lvl+0x44/0x57
[ 63.593920][ T2] dump_stack+0xd/0x10
[ 63.594091][ T2] validate_chain.cold+0xff/0x12f
[ 63.594303][ T2] ? irq_work_claim+0x51/0x80
[ 63.594500][ T2] ? wake_up_klogd+0x95/0xb0
[ 63.594718][ T2] __lock_acquire+0x4ab/0xa40
[ 63.594916][ T2] lock_acquire+0x94/0x280
[ 63.595101][ T2] ? rcu_torture_fwd_cb_hist+0x40/0x110 [rcutorture]
[ 63.595384][ T2] ? report_bug+0x9d/0xe0
[ 63.595568][ T2] __mutex_lock+0x5a/0xe50
[ 63.595753][ T2] ? rcu_torture_fwd_cb_hist+0x40/0x110 [rcutorture]
[ 63.596035][ T2] ? exc_invalid_op+0x1b/0x60
[ 63.596231][ T2] ? handle_exception+0x133/0x133
[ 63.596442][ T2] ? rcu_torture_fwd_prog+0xf0/0xf0 [rcutorture]
[ 63.596708][ T2] ? memory_bm_create+0x1ab/0x3e0
[ 63.596920][ T2] ? rcu_torture_fwd_prog+0xf0/0xf0 [rcutorture]
[ 63.597186][ T2] mutex_lock_nested+0x19/0x20
[ 63.597386][ T2] ? rcu_torture_fwd_cb_hist+0x40/0x110 [rcutorture]
[ 63.597666][ T2] rcu_torture_fwd_cb_hist+0x40/0x110 [rcutorture]
[ 63.597940][ T2] ? rcu_torture_fwd_prog+0xf0/0xf0 [rcutorture]
[ 63.598206][ T2] ? rcu_torture_fwd_prog+0xf0/0xf0 [rcutorture]
[ 63.598472][ T2] rcutorture_oom_notify+0x98/0x100 [rcutorture]
[ 63.598739][ T2] ? down_read+0x76/0x280
[ 63.598920][ T2] ? rcu_torture_fwd_prog+0xf0/0xf0 [rcutorture]
[ 63.599188][ T2] blocking_notifier_call_chain+0x5d/0x80
[ 63.599427][ T2] out_of_memory+0x1bf/0x3c0
[ 63.599621][ T2] __alloc_pages_slowpath+0xd7f/0xe40
[ 63.599890][ T2] __alloc_pages+0x3ce/0x450
[ 63.600083][ T2] cache_grow_begin+0x247/0x320
[ 63.600330][ T2] cache_alloc_refill+0x2f0/0x540
[ 63.600541][ T2] ? check_preemption_disabled+0x38/0xf0
[ 63.600778][ T2] kmem_cache_alloc+0x7a9/0x8f0
[ 63.600983][ T2] copy_process+0xc19/0x1d30
[ 63.601175][ T2] ? check_preemption_disabled+0x38/0xf0
[ 63.601412][ T2] kernel_clone+0x9b/0x8b0
[ 63.601598][ T2] ? check_preemption_disabled+0x38/0xf0
[ 63.601834][ T2] ? set_kthread_struct+0x60/0x60
[ 63.602045][ T2] kernel_thread+0x4b/0x60
[ 63.602230][ T2] ? set_kthread_struct+0x60/0x60
[ 63.602441][ T2] kthreadd+0x11e/0x170
[ 63.602616][ T2] ? kthread_is_per_cpu+0x30/0x30
[ 63.602826][ T2] ret_from_fork+0x1c/0x28
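For reference, the splat above boils down to a classic self-deadlock: rcutorture_oom_notify() acquires rcu_fwd_mutex and then calls rcu_torture_fwd_cb_hist(), which tries to acquire the same non-recursive mutex again. A minimal sketch of the pattern, using hypothetical demo_* names rather than the actual rcutorture code:

#include <linux/mutex.h>

static DEFINE_MUTEX(demo_mutex);

/*
 * Helper that serializes itself.  This is fine when the caller does
 * not already hold demo_mutex...
 */
static void demo_hist(void)
{
	mutex_lock(&demo_mutex);
	/* ... emit histogram ... */
	mutex_unlock(&demo_mutex);
}

/*
 * ...but the OOM notifier already holds demo_mutex when it calls
 * demo_hist(), so the second mutex_lock() never returns.
 */
static int demo_oom_notify(void)
{
	mutex_lock(&demo_mutex);
	demo_hist();	/* mutex_lock(&demo_mutex) again: deadlock */
	mutex_unlock(&demo_mutex);
	return 0;
}

Because kernel mutexes are not recursive, lockdep flags the second acquisition as "possible recursive locking detected".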
To reproduce:
# build kernel
cd linux
cp config-5.16.0-rc1-00016-g82e310033d7c .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# If you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
---
0DAY/LKP+ Test Infrastructure                         Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]        Intel Corporation
Thanks,
Oliver Sang
On Tue, Dec 28, 2021 at 11:11:35PM +0800, kernel test robot wrote:
>
> Greetings,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: 82e310033d7c21a7a88427f14e0dad78d731a5cd ("rcutorture: Enable multiple concurrent callback-flood kthreads")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [ ... ]
>
> If you fix the issue, kindly add the following tag:
> Reported-by: kernel test robot <[email protected]>
Good catch! Does the following patch address it?
Thanx, Paul
------------------------------------------------------------------------
commit dd47cbdcc2f72ba3df1248fb7fe210acca18d09c
Author: Paul E. McKenney <[email protected]>
Date: Tue Dec 28 15:59:38 2021 -0800
rcutorture: Fix rcu_fwd_mutex deadlock
The rcu_torture_fwd_cb_hist() function acquires rcu_fwd_mutex, but is
invoked from the rcutorture_oom_notify() function, which holds this same
mutex across that call. This commit fixes the resulting deadlock.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 918a2ea34ba13..9190dce686208 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -2184,7 +2184,6 @@ static void rcu_torture_fwd_cb_hist(struct rcu_fwd *rfp)
for (i = ARRAY_SIZE(rfp->n_launders_hist) - 1; i > 0; i--)
if (rfp->n_launders_hist[i].n_launders > 0)
break;
- mutex_lock(&rcu_fwd_mutex); // Serialize histograms.
pr_alert("%s: Callback-invocation histogram %d (duration %lu jiffies):",
__func__, rfp->rcu_fwd_id, jiffies - rfp->rcu_fwd_startat);
gps_old = rfp->rcu_launder_gp_seq_start;
@@ -2197,7 +2196,6 @@ static void rcu_torture_fwd_cb_hist(struct rcu_fwd *rfp)
gps_old = gps;
}
pr_cont("\n");
- mutex_unlock(&rcu_fwd_mutex);
}
/* Callback function for continuous-flood RCU callbacks. */
@@ -2435,7 +2433,9 @@ static void rcu_torture_fwd_prog_cr(struct rcu_fwd *rfp)
n_launders, n_launders_sa,
n_max_gps, n_max_cbs, cver, gps);
atomic_long_add(n_max_cbs, &rcu_fwd_max_cbs);
+ mutex_lock(&rcu_fwd_mutex); // Serialize histograms.
rcu_torture_fwd_cb_hist(rfp);
+ mutex_unlock(&rcu_fwd_mutex);
}
schedule_timeout_uninterruptible(HZ); /* Let CBs drain. */
tick_dep_clear_task(current, TICK_DEP_BIT_RCU);
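In other words, the patch hoists the lock out of rcu_torture_fwd_cb_hist() and into the one call site that was not already holding it, so that every caller now owns rcu_fwd_mutex exactly once around the call. A minimal sketch of the resulting locking discipline, again with hypothetical demo_* names (the lockdep_assert_held() line is not in the actual patch, just one way to document the new rule):

#include <linux/mutex.h>

static DEFINE_MUTEX(demo_mutex);

/* Callee no longer takes the lock; every caller must hold it. */
static void demo_hist(void)
{
	lockdep_assert_held(&demo_mutex);	/* documents the caller-holds rule */
	/* ... emit histogram ... */
}

/* Caller that was previously lock-free now serializes explicitly. */
static void demo_fwd_prog_cr(void)
{
	mutex_lock(&demo_mutex);	/* Serialize histograms. */
	demo_hist();
	mutex_unlock(&demo_mutex);
}

/* The OOM notifier's existing critical section now suffices. */
static int demo_oom_notify(void)
{
	mutex_lock(&demo_mutex);
	demo_hist();	/* safe: lock held once, never re-acquired */
	mutex_unlock(&demo_mutex);
	return 0;
}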
hi Paul,
On Tue, Dec 28, 2021 at 04:06:09PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 28, 2021 at 11:11:35PM +0800, kernel test robot wrote:
> > [ ... ]
> >
> > If you fix the issue, kindly add the following tag:
> > Reported-by: kernel test robot <[email protected]>
>
> Good catch! Does the following patch address it?
We applied the patch below on top of next-20211224 and confirmed that the
"WARNING:possible_recursive_locking_detected" no longer appears after the patch.
=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type:
gcc-9/i386-randconfig-a002-20211003/debian-i386-20191205.cgz/300s/vm-snb-i386/default/rcutorture/rcu
commit:
next-20211224
917801238dcca ("rcutorture: Fix rcu_fwd_mutex deadlock")
next-20211224 917801238dccaf61a63ffe77890
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
3:100 -1% 2:100 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
2:100 -2% :100 dmesg.BUG:workqueue_lockup-pool
1:100 -1% :100 dmesg.EIP:clear_user
2:100 2% 4:100 dmesg.EIP:rcu_torture_fwd_prog_cr
15:100 -3% 12:100 dmesg.EIP:rcutorture_oom_notify
2:100 -2% :100 dmesg.Kernel_panic-not_syncing:System_is_deadlocked_on_memory
9:100 -5% 4:100 dmesg.Mem-Info
2:100 -2% :100 dmesg.Out_of_memory_and_no_killable_processes
2:100 2% 4:100 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_cr[rcutorture]
15:100 -3% 12:100 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_oom_notify[rcutorture]
15:100 -15% :100 dmesg.WARNING:possible_recursive_locking_detected <--------
3:100 -1% 2:100 dmesg.invoked_oom-killer:gfp_mask=0x
6:100 -4% 2:100 dmesg.page_allocation_failure:order:#,mode:#(GFP_NOWAIT|__GFP_COMP),nodemask=(null)
100:100 0% 100:100 dmesg.sysctl_table_check_failed
On Wed, Dec 29, 2021 at 10:01:21PM +0800, Oliver Sang wrote:
> hi Paul,
>
> On Tue, Dec 28, 2021 at 04:06:09PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 28, 2021 at 11:11:35PM +0800, kernel test robot wrote:
> > >
> > >
> > > Greeting,
> > >
> > > FYI, we noticed the following commit (built with gcc-9):
> > >
> > > commit: 82e310033d7c21a7a88427f14e0dad78d731a5cd ("rcutorture: Enable multiple concurrent callback-flood kthreads")
> > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > >
> > > in testcase: rcutorture
> > > version:
> > > with following parameters:
> > >
> > > runtime: 300s
> > > test: default
> > > torture_type: rcu
> > >
> > > test-description: rcutorture is rcutorture kernel module load/unload test.
> > > test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
> > >
> > >
> > > on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >
> > >
> > > +-------------------------------------------------------------------------------+------------+------------+
> > > | | 12e885433d | 82e310033d |
> > > +-------------------------------------------------------------------------------+------------+------------+
> > > | boot_successes | 95 | 47 |
> > > | boot_failures | 31 | 25 |
> > > | invoked_oom-killer:gfp_mask=0x | 5 | 4 |
> > > | Mem-Info | 10 | 16 |
> > > | WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_oom_notify[rcutorture] | 24 | 15 |
> > > | EIP:rcutorture_oom_notify | 24 | 15 |
> > > | page_allocation_failure:order:#,mode:#(GFP_NOWAIT|__GFP_COMP),nodemask=(null) | 5 | 12 |
> > > | WARNING:possible_recursive_locking_detected | 0 | 15 |
> > > | WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog.cold[rcutorture] | 0 | 6 |
> > > | EIP:rcu_torture_fwd_prog.cold | 0 | 6 |
> > > +-------------------------------------------------------------------------------+------------+------------+
> > >
> > >
> > > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot <[email protected]>
> >
> > Good catch! Does this following patch address it?
>
> we applied below patch upon next-20211224,
> confirmed no "WARNING:possible_recursive_locking_detected" after patch.
>
> =========================================================================================
> compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type:
> gcc-9/i386-randconfig-a002-20211003/debian-i386-20191205.cgz/300s/vm-snb-i386/default/rcutorture/rcu
>
> commit:
> next-20211224
> 917801238dcca ("rcutorture: Fix rcu_fwd_mutex deadlock")
>
> next-20211224 917801238dccaf61a63ffe77890
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> 3:100 -1% 2:100 dmesg.BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c
> 2:100 -2% :100 dmesg.BUG:workqueue_lockup-pool
> 1:100 -1% :100 dmesg.EIP:clear_user
> 2:100 2% 4:100 dmesg.EIP:rcu_torture_fwd_prog_cr
> 15:100 -3% 12:100 dmesg.EIP:rcutorture_oom_notify
> 2:100 -2% :100 dmesg.Kernel_panic-not_syncing:System_is_deadlocked_on_memory
> 9:100 -5% 4:100 dmesg.Mem-Info
> 2:100 -2% :100 dmesg.Out_of_memory_and_no_killable_processes
> 2:100 2% 4:100 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcu_torture_fwd_prog_cr[rcutorture]
> 15:100 -3% 12:100 dmesg.WARNING:at_kernel/rcu/rcutorture.c:#rcutorture_oom_notify[rcutorture]
> 15:100 -15% :100 dmesg.WARNING:possible_recursive_locking_detected <--------
> 3:100 -1% 2:100 dmesg.invoked_oom-killer:gfp_mask=0x
> 6:100 -4% 2:100 dmesg.page_allocation_failure:order:#,mode:#(GFP_NOWAIT|__GFP_COMP),nodemask=(null)
> 100:100 0% 100:100 dmesg.sysctl_table_check_failed
Good to hear! May I add your Tested-by?
Many of the remainder appear to be due to memory exhaustion, FWIW.
Thanx, Paul
hi Paul,
On Wed, Dec 29, 2021 at 09:24:41AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 29, 2021 at 10:01:21PM +0800, Oliver Sang wrote:
> > hi Paul,
> >
> > We applied the patch below on top of next-20211224 and confirmed that the
> > "WARNING:possible_recursive_locking_detected" no longer appears after the patch.
> >
>
> Good to hear! May I add your Tested-by?
sure (:
Tested-by: Oliver Sang <[email protected]>
>
> Many of the remainder appear to be due to memory exhaustion, FWIW.
Thanks for the information.