2019-09-28 12:39:59

by Ingo Molnar

Subject: [GIT PULL] scheduler fixes

Linus,

Please pull the latest sched-urgent-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus

# HEAD: 4892f51ad54ddff2883a60b6ad4323c1f632a9d6 sched/fair: Avoid redundant EAS calculation

The changes are:

- Apply a number of membarrier related fixes and cleanups, which fix a
use-after-free race in the membarrier code.

- Introduce proper RCU protection for tasks on the runqueue - to get rid
of the subtle task_rcu_dereference() interface that was easy to get
wrong.

- Misc fixes, but also an EAS speedup.

Thanks,

Ingo

------------------>
Eric W. Biederman (4):
tasks: Add a count of task RCU users
tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code
tasks, sched/core: RCUify the assignment of rq->curr

KeMeng Shi (1):
sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()

Mathieu Desnoyers (7):
sched/membarrier: Fix private expedited registration check
sched/membarrier: Remove redundant check
sched/membarrier: Call sync_core only before usermode for same mm
sched/membarrier: Fix p->mm->membarrier_state racy load
selftests, sched/membarrier: Add multi-threaded test
sched/membarrier: Skip IPIs when mm->mm_users == 1
sched/membarrier: Return -ENOMEM to userspace on memory allocation failure

Qian Cai (3):
sched/fair: Remove unused cfs_rq_clock_task() function
sched/core: Convert vcpu_is_preempted() from macro to an inline function
sched/fair: Fix -Wunused-but-set-variable warnings

Quentin Perret (1):
sched/fair: Avoid redundant EAS calculation

Valentin Schneider (2):
sched/core: Fix preempt_schedule() interrupt return comment
sched/core: Remove double update_max_interval() call on CPU startup


fs/exec.c | 2 +-
include/linux/mm_types.h | 14 +-
include/linux/rcuwait.h | 20 +-
include/linux/sched.h | 10 +-
include/linux/sched/mm.h | 10 +-
include/linux/sched/task.h | 2 +-
kernel/exit.c | 74 +------
kernel/fork.c | 8 +-
kernel/sched/core.c | 28 +--
kernel/sched/fair.c | 39 +---
kernel/sched/membarrier.c | 239 +++++++++++++--------
kernel/sched/sched.h | 34 +++
tools/testing/selftests/membarrier/.gitignore | 3 +-
tools/testing/selftests/membarrier/Makefile | 5 +-
.../{membarrier_test.c => membarrier_test_impl.h} | 40 ++--
.../membarrier/membarrier_test_multi_thread.c | 73 +++++++
.../membarrier/membarrier_test_single_thread.c | 24 +++
17 files changed, 375 insertions(+), 250 deletions(-)
rename tools/testing/selftests/membarrier/{membarrier_test.c => membarrier_test_impl.h} (95%)
create mode 100644 tools/testing/selftests/membarrier/membarrier_test_multi_thread.c
create mode 100644 tools/testing/selftests/membarrier/membarrier_test_single_thread.c


2019-09-28 20:51:42

by pr-tracker-bot

Subject: Re: [GIT PULL] scheduler fixes

The pull request you sent on Sat, 28 Sep 2019 14:39:05 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9c5efe9ae7df78600c0ee7bcce27516eb687fa6e

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

2019-09-30 23:46:42

by John Stultz

Subject: Re: [GIT PULL] scheduler fixes

On Sat, Sep 28, 2019 at 5:40 AM Ingo Molnar <[email protected]> wrote:
>
> Please pull the latest sched-urgent-for-linus git tree from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git sched-urgent-for-linus
>
> # HEAD: 4892f51ad54ddff2883a60b6ad4323c1f632a9d6 sched/fair: Avoid redundant EAS calculation
>
> The changes are:
>
> - Apply a number of membarrier related fixes and cleanups, which fixes a
> use-after-free race in the membarrier code.
>
> - Introduce proper RCU protection for tasks on the runqueue - to get rid
> of the subtle task_rcu_dereference() interface that was easy to get
> wrong.
>
> - Misc fixes, but also an EAS speedup.
>
> Thanks,
>
> Ingo
>
> ------------------>
> Eric W. Biederman (4):
> tasks: Add a count of task RCU users
> tasks, sched/core: Ensure tasks are available for a grace period after leaving the runqueue
> tasks, sched/core: With a grace period after finish_task_switch(), remove unnecessary code
> tasks, sched/core: RCUify the assignment of rq->curr
>
> KeMeng Shi (1):
> sched/core: Fix migration to invalid CPU in __set_cpus_allowed_ptr()
>
> Mathieu Desnoyers (7):
> sched/membarrier: Fix private expedited registration check
> sched/membarrier: Remove redundant check
> sched/membarrier: Call sync_core only before usermode for same mm
> sched/membarrier: Fix p->mm->membarrier_state racy load
> selftests, sched/membarrier: Add multi-threaded test
> sched/membarrier: Skip IPIs when mm->mm_users == 1
> sched/membarrier: Return -ENOMEM to userspace on memory allocation failure
>
> Qian Cai (3):
> sched/fair: Remove unused cfs_rq_clock_task() function
> sched/core: Convert vcpu_is_preempted() from macro to an inline function
> sched/fair: Fix -Wunused-but-set-variable warnings
>
> Quentin Perret (1):
> sched/fair: Avoid redundant EAS calculation
>
> Valentin Schneider (2):
> sched/core: Fix preempt_schedule() interrupt return comment
> sched/core: Remove double update_max_interval() call on CPU startup

Hey all,
After rebasing my hikey960 patches onto v5.4-rc1, I started seeing
boot hangs/stalls when trying to boot AOSP:

[ 9.788182] ------------[ cut here ]------------
[ 9.792829] WARNING: CPU: 7 PID: 516 at kernel/rcu/tree_plugin.h:293 rcu_note_context_switch+0x48/0x4a8
[ 9.802229] Modules linked in:
[ 9.805298] CPU: 7 PID: 516 Comm: Jit thread pool Not tainted 5.3.0-13104-g0dbefe07634f #1126
[ 9.813822] Hardware name: HiKey960 (DT)
[ 9.817742] pstate: 20400085 (nzCv daIf +PAN -UAO)
[ 9.822530] pc : rcu_note_context_switch+0x48/0x4a8
[ 9.827403] lr : rcu_note_context_switch+0x1c/0x4a8
[ 9.832273] sp : ffffffc012ee3a60
[ 9.835581] x29: ffffffc012ee3a60 x28: ffffff82192d4140
[ 9.840889] x27: 0000000000000000 x26: ffffff821f7b38c0
[ 9.846195] x25: 00000000efb51cf8 x24: ffffffc0117ba000
[ 9.851501] x23: 0000000000000000 x22: ffffff82192d4140
[ 9.856806] x21: 0000000000000000 x20: ffffff821f7b38c0
[ 9.862111] x19: ffffff821f7b44c0 x18: 0000000000000000
[ 9.867416] x17: 0000000000000000 x16: 0000000000000000
[ 9.872721] x15: 0000000000000000 x14: 0000000000000000
[ 9.878026] x13: 0000000000000000 x12: 0000000000000000
[ 9.883331] x11: 0000000000000000 x10: 0000000000000000
[ 9.888636] x9 : 0000000000000000 x8 : ffffffc012ee3c60
[ 9.893941] x7 : ffffffc012ee3c70 x6 : ffffff8219026788
[ 9.899246] x5 : 00000000014a2000 x4 : 0000000000000000
[ 9.904551] x3 : ffffffc20e1fe000 x2 : 0000000000000001
[ 9.909856] x1 : ffffffc0117ba428 x0 : 0000000000000023
[ 9.915163] Call trace:
[ 9.917605] rcu_note_context_switch+0x48/0x4a8
[ 9.922134] __schedule+0x90/0x7d8
[ 9.925530] schedule+0x38/0xc0
[ 9.928667] futex_wait_queue_me+0xc0/0x140
[ 9.932847] futex_wait+0xe0/0x210
[ 9.936242] do_futex+0x618/0xdf8
[ 9.939551] __arm64_sys_futex_time32+0xfc/0x148
[ 9.944167] el0_svc_common.constprop.1+0x64/0x188
[ 9.948955] el0_svc_compat_handler+0x18/0x38
[ 9.953307] el0_svc_compat+0x8/0x2c
[ 9.956876] ---[ end trace cdf2ffd45270a24d ]---

Usually followed by:
[ 30.807092] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 30.813207] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P521 P519
[ 30.819998] (detected by 4, t=5255 jiffies, g=169, q=5967)
[ 30.825568] Jit thread pool S 0 521 1 0x00000000
[ 30.831050] Call trace:
[ 30.833498] __switch_to+0xd4/0x230
[ 30.836984] __schedule+0x320/0x7d8
[ 30.840464] schedule+0x38/0xc0
[ 30.843600] futex_wait_queue_me+0xc0/0x140
[ 30.847776] futex_wait+0xe0/0x210
[ 30.851169] do_futex+0x618/0xdf8
[ 30.854476] __arm64_sys_futex+0xfc/0x148
[ 30.858479] el0_svc_common.constprop.1+0x64/0x188
[ 30.863262] el0_svc_handler+0x20/0x80
[ 30.867003] el0_svc+0x8/0xc
[ 30.869876] Jit thread pool S 0 519 1 0x00400000
[ 30.875353] Call trace:
[ 30.877790] __switch_to+0xd4/0x230
[ 30.881271] __schedule+0x320/0x7d8
[ 30.884750] schedule+0x38/0xc0
[ 30.887883] futex_wait_queue_me+0xc0/0x140
[ 30.892057] futex_wait+0xe0/0x210
[ 30.895450] do_futex+0x618/0xdf8
[ 30.898755] __arm64_sys_futex_time32+0xfc/0x148
[ 30.903364] el0_svc_common.constprop.1+0x64/0x188
[ 30.908146] el0_svc_compat_handler+0x18/0x38
[ 30.912494] el0_svc_compat+0x8/0x2c
[ 31.711121] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P521 P519 } 5440 jiffies s: 77 root: 0x0/T
[ 31.722030] rcu: blocking rcu_node structures:

None of which seems particularly informative as to what might be going
awry. So I bisected the regression down to this merge.

Reverting the following patches:
"sched/membarrier: Return -ENOMEM to userspace on memory allocation failure"
"sched/membarrier: Skip IPIs when mm->mm_users == 1"
"sched/membarrier: Fix p->mm->membarrier_state racy load"

Seems to get things working again, but I've not been able to narrow it
down further yet as I start hitting build issues.

Not sure what's wrong here, but I'm happy to try any patches or help
with debugging this.

thanks
-john

2019-10-01 07:20:03

by Peter Zijlstra

Subject: Re: [GIT PULL] scheduler fixes

On Mon, Sep 30, 2019 at 04:45:49PM -0700, John Stultz wrote:
> Reverting the following patches:

> "sched/membarrier: Fix p->mm->membarrier_state racy load"

ARGH, I fudged it... please try:


diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index a39bed2c784f..168479a7d61b 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -174,7 +174,6 @@ static int membarrier_private_expedited(int flags)
 		 */
 		if (cpu == raw_smp_processor_id())
 			continue;
-		rcu_read_lock();
 		p = rcu_dereference(cpu_rq(cpu)->curr);
 		if (p && p->mm == mm)
 			__cpumask_set_cpu(cpu, tmpmask);
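
For readers following the fix, here is a minimal userspace model of why
the removed line matters. It is a sketch, not kernel code: the counter
and the model_*() and scan_*() names are hypothetical, and it assumes
the loop is already enclosed by an outer rcu_read_lock() /
rcu_read_unlock() pair, which the hunk above does not show. Under that
assumption, each extra lock taken inside the loop has no matching
unlock, so the task leaves the function with a non-zero read-side
nesting depth, which would be consistent with the
rcu_note_context_switch() warning and the stalls reported earlier in
the thread.

```c
#include <assert.h>

/*
 * Userspace model only: a plain counter standing in for the per-task
 * RCU read-side nesting depth. This is not the kernel implementation.
 */
static int nesting;

static void model_rcu_read_lock(void)   { nesting++; }
static void model_rcu_read_unlock(void) { nesting--; }

/*
 * Assumed shape of the loop before the fix: one lock outside the loop
 * plus one stray lock for every remote CPU that is scanned.
 * Returns the nesting depth left behind after the final unlock.
 */
static int scan_with_stray_lock(int nr_cpus, int self)
{
	nesting = 0;
	model_rcu_read_lock();
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == self)
			continue;
		model_rcu_read_lock();	/* the line the patch removes */
		/* ... inspect the remote CPU's current task here ... */
	}
	model_rcu_read_unlock();
	return nesting;
}

/* Same loop with the stray lock removed: lock and unlock are paired. */
static int scan_balanced(int nr_cpus, int self)
{
	nesting = 0;
	model_rcu_read_lock();
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu == self)
			continue;
		/* ... inspect the remote CPU's current task here ... */
	}
	model_rcu_read_unlock();
	return nesting;
}

/* Sanity check of the model: 8 CPUs, scanning from CPU 7. */
static void check_model(void)
{
	assert(scan_with_stray_lock(8, 7) == 7);
	assert(scan_balanced(8, 7) == 0);
}
```

In this model, scanning 8 CPUs from CPU 7 leaves a depth of 7 with the
stray lock and 0 without it.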

2019-10-01 18:18:16

by John Stultz

Subject: Re: [GIT PULL] scheduler fixes

On Tue, Oct 1, 2019 at 12:19 AM Peter Zijlstra <[email protected]> wrote:
>
> On Mon, Sep 30, 2019 at 04:45:49PM -0700, John Stultz wrote:
> > Reverting the following patches:
>
> > "sched/membarrier: Fix p->mm->membarrier_state racy load"
>
> ARGH, I fudged it... please try:
>
>
> diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> index a39bed2c784f..168479a7d61b 100644
> --- a/kernel/sched/membarrier.c
> +++ b/kernel/sched/membarrier.c
> @@ -174,7 +174,6 @@ static int membarrier_private_expedited(int flags)
> */
> if (cpu == raw_smp_processor_id())
> continue;
> - rcu_read_lock();
> p = rcu_dereference(cpu_rq(cpu)->curr);
> if (p && p->mm == mm)
> __cpumask_set_cpu(cpu, tmpmask);


Yep. Looks like that solves it!
Tested-by: John Stultz <[email protected]>

Thanks so much for the quick turnaround!
-john

2019-10-02 04:37:46

by Joel Fernandes

Subject: Re: [GIT PULL] scheduler fixes

On Tue, Oct 01, 2019 at 11:15:01AM -0700, John Stultz wrote:
> On Tue, Oct 1, 2019 at 12:19 AM Peter Zijlstra <[email protected]> wrote:
> >
> > On Mon, Sep 30, 2019 at 04:45:49PM -0700, John Stultz wrote:
> > > Reverting the following patches:
> >
> > > "sched/membarrier: Fix p->mm->membarrier_state racy load"
> >
> > ARGH, I fudged it... please try:
> >
> >
> > diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
> > index a39bed2c784f..168479a7d61b 100644
> > --- a/kernel/sched/membarrier.c
> > +++ b/kernel/sched/membarrier.c
> > @@ -174,7 +174,6 @@ static int membarrier_private_expedited(int flags)
> > */
> > if (cpu == raw_smp_processor_id())
> > continue;
> > - rcu_read_lock();
> > p = rcu_dereference(cpu_rq(cpu)->curr);
> > if (p && p->mm == mm)
> > __cpumask_set_cpu(cpu, tmpmask);
>
>
> Yep. Looks like that solves it!
> Tested-by: John Stultz <[email protected]>

Makes sense.

And here I was wondering yesterday why I was seeing bug reports with
t->rcu_read_lock_nesting as non-zero when the task was interrupted in user mode ;-)

Reviewed-by: Joel Fernandes (Google) <[email protected]>

thanks,

- Joel