2022-02-20 16:31:20

by Chengming Zhou

[permalink] [raw]
Subject: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chengming Zhou <[email protected]>
---
include/linux/cgroup.h | 2 --
kernel/sched/cpuacct.c | 4 +---
2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c151413fda..9a109c6ac0e0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,

cpuacct_charge(task, delta_exec);

- rcu_read_lock();
cgrp = task_dfl_cgroup(task);
if (cgroup_parent(cgrp))
__cgroup_account_cputime(cgrp, delta_exec);
- rcu_read_unlock();
}

static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 307800586ac8..f79f88456d72 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
unsigned int cpu = task_cpu(tsk);
struct cpuacct *ca;

- rcu_read_lock();
+ lockdep_assert_rq_held(cpu_rq(cpu));

for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
- rcu_read_unlock();
}

/*
--
2.20.1


Subject: [tip: sched/core] sched/cpuacct: Optimize away RCU read lock

The following commit has been merged into the sched/core branch of tip:

Commit-ID: dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Gitweb: https://git.kernel.org/tip/dc6e0818bc9a0336d9accf3ea35d146d72aa7a18
Author: Chengming Zhou <[email protected]>
AuthorDate: Sun, 20 Feb 2022 13:14:25 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 01 Mar 2022 16:18:38 +01:00

sched/cpuacct: Optimize away RCU read lock

Since cpuacct_charge() is called from the scheduler update_curr(),
we must already have rq lock held, then the RCU read lock can
be optimized away.

And do the same thing in it's wrapper cgroup_account_cputime(),
but we can't use lockdep_assert_rq_held() there, which defined
in kernel/sched/sched.h.

Suggested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chengming Zhou <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
include/linux/cgroup.h | 2 --
kernel/sched/cpuacct.c | 4 +---
2 files changed, 1 insertion(+), 5 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c1514..9a109c6 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,

cpuacct_charge(task, delta_exec);

- rcu_read_lock();
cgrp = task_dfl_cgroup(task);
if (cgroup_parent(cgrp))
__cgroup_account_cputime(cgrp, delta_exec);
- rcu_read_unlock();
}

static inline void cgroup_account_cputime_field(struct task_struct *task,
diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
index 3078005..f79f884 100644
--- a/kernel/sched/cpuacct.c
+++ b/kernel/sched/cpuacct.c
@@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
unsigned int cpu = task_cpu(tsk);
struct cpuacct *ca;

- rcu_read_lock();
+ lockdep_assert_rq_held(cpu_rq(cpu));

for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
*per_cpu_ptr(ca->cpuusage, cpu) += cputime;
-
- rcu_read_unlock();
}

/*

2022-03-09 02:14:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> > > > Signed-off-by: Chengming Zhou <[email protected]>
> > >
> > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > found that it triggers a following warning in the early boot stage:
> > >
> > > Calibrating delay loop (skipped), value calculated using timer
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> >
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> >
> > Paul?
>
> That would reduce the number of warnings, but it also would hide bugs.
>
> So, are you sure you really want this?

Of course, if you are designing a use case that really expects multiple
types of readers...

Another approach might be rcu_dereference_brs(), but hopefully with a
better name, that checks for either rcu_read_lock(), disabled BH, or
disabled preemption. Or if you are looking only for rcu_read_lock()
or disabled preemption, rcu_dereference_rs(), again hopefully with a
better name.

Is that what you had in mind?

Thanx, Paul


2022-03-09 02:21:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
> > Since cpuacct_charge() is called from the scheduler update_curr(),
> > we must already have rq lock held, then the RCU read lock can
> > be optimized away.
> >
> > And do the same thing in it's wrapper cgroup_account_cputime(),
> > but we can't use lockdep_assert_rq_held() there, which defined
> > in kernel/sched/sched.h.
> >
> > Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> > Signed-off-by: Chengming Zhou <[email protected]>
>
> This patch landed recently in linux-next as commit dc6e0818bc9a
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> found that it triggers a following warning in the early boot stage:
>
> Calibrating delay loop (skipped), value calculated using timer
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
>
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

Arguably, with the flavours folded again, rcu_dereference_check() ought
to default include rcu_read_lock_sched_held() or its equivalent I
suppose.

Paul?

2022-03-09 02:29:06

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > we must already have rq lock held, then the RCU read lock can
> > > be optimized away.
> > >
> > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > but we can't use lockdep_assert_rq_held() there, which defined
> > > in kernel/sched/sched.h.
> > >
> > > Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> > > Signed-off-by: Chengming Zhou <[email protected]>
> >
> > This patch landed recently in linux-next as commit dc6e0818bc9a
> > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > found that it triggers a following warning in the early boot stage:
> >
> > Calibrating delay loop (skipped), value calculated using timer
> > frequency.. 48.00 BogoMIPS (lpj=240000)
> > pid_max: default: 32768 minimum: 301
> > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > CPU: Testing write buffer coherency: ok
> > CPU0: Spectre v2: using BPIALL workaround
> >
> > =============================
> > WARNING: suspicious RCU usage
> > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > -----------------------------
> > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>
> Arguably, with the flavours folded again, rcu_dereference_check() ought
> to default include rcu_read_lock_sched_held() or its equivalent I
> suppose.
>
> Paul?

That would reduce the number of warnings, but it also would hide bugs.

So, are you sure you really want this?

Thanx, Paul

2022-03-09 02:35:02

by Marek Szyprowski

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On 20.02.2022 06:14, Chengming Zhou wrote:
> Since cpuacct_charge() is called from the scheduler update_curr(),
> we must already have rq lock held, then the RCU read lock can
> be optimized away.
>
> And do the same thing in it's wrapper cgroup_account_cputime(),
> but we can't use lockdep_assert_rq_held() there, which defined
> in kernel/sched/sched.h.
>
> Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Chengming Zhou <[email protected]>

This patch landed recently in linux-next as commit dc6e0818bc9a
("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
found that it triggers a following warning in the early boot stage:

Calibrating delay loop (skipped), value calculated using timer
frequency.. 48.00 BogoMIPS (lpj=240000)
pid_max: default: 32768 minimum: 301
Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
CPU: Testing write buffer coherency: ok
CPU0: Spectre v2: using BPIALL workaround

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
2 locks held by kthreadd/2:
 #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
 #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
raw_spin_rq_lock_nested+0x24/0x34

stack backtrace:
CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a
#11458
Hardware name: Samsung Exynos (Flattened Device Tree)
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from update_curr+0x1bc/0x35c
 update_curr from dequeue_task_fair+0xb0/0x8e8
 dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
 __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
 __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
 __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
 kthreadd from ret_from_fork+0x14/0x2c
Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
9fa0:                                     00000000 00000000 00000000
00000000
9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
cblist_init_generic: Setting adjustable number of callback queues.
cblist_init_generic: Setting shift to 1 and lim to 1.
Running RCU-tasks wait API self tests
Setting up static identity map for 0x40100000 - 0x40100060
rcu: Hierarchical SRCU implementation.

=============================
WARNING: suspicious RCU usage
5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
-----------------------------
./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!

other info that might help us debug this:


rcu_scheduler_active = 1, debug_locks = 1
1 lock held by migration/0/13:
 #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
raw_spin_rq_lock_nested+0x24/0x34

stack backtrace:
CPU: 0 PID: 13 Comm: migration/0 Not tainted
5.17.0-rc5-00050-gdc6e0818bc9a #11458
Hardware name: Samsung Exynos (Flattened Device Tree)
Stopper: 0x0 <- 0x0
 unwind_backtrace from show_stack+0x10/0x14
 show_stack from dump_stack_lvl+0x58/0x70
 dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
 put_prev_task_stop from __schedule+0x698/0x964
 __schedule from schedule+0x54/0xe0
 schedule from smpboot_thread_fn+0x218/0x288
 smpboot_thread_fn from kthread+0xf0/0x134
 kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc1ccffb0 to 0xc1ccfff8)
ffa0:                                     00000000 00000000 00000000
00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
smp: Bringing up secondary CPUs ...
CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
CPU1: Spectre v2: using BPIALL workaround
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).

The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.

> ---
> include/linux/cgroup.h | 2 --
> kernel/sched/cpuacct.c | 4 +---
> 2 files changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
> index 75c151413fda..9a109c6ac0e0 100644
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>
> cpuacct_charge(task, delta_exec);
>
> - rcu_read_lock();
> cgrp = task_dfl_cgroup(task);
> if (cgroup_parent(cgrp))
> __cgroup_account_cputime(cgrp, delta_exec);
> - rcu_read_unlock();
> }
>
> static inline void cgroup_account_cputime_field(struct task_struct *task,
> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
> index 307800586ac8..f79f88456d72 100644
> --- a/kernel/sched/cpuacct.c
> +++ b/kernel/sched/cpuacct.c
> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
> unsigned int cpu = task_cpu(tsk);
> struct cpuacct *ca;
>
> - rcu_read_lock();
> + lockdep_assert_rq_held(cpu_rq(cpu));
>
> for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
> *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
> -
> - rcu_read_unlock();
> }
>
> /*

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

2022-03-09 03:14:49

by Chengming Zhou

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On 2022/3/9 7:20 上午, Marek Szyprowski wrote:
> On 20.02.2022 06:14, Chengming Zhou wrote:
>> Since cpuacct_charge() is called from the scheduler update_curr(),
>> we must already have rq lock held, then the RCU read lock can
>> be optimized away.
>>
>> And do the same thing in it's wrapper cgroup_account_cputime(),
>> but we can't use lockdep_assert_rq_held() there, which defined
>> in kernel/sched/sched.h.
>>
>> Suggested-by: Peter Zijlstra (Intel) <[email protected]>
>> Signed-off-by: Chengming Zhou <[email protected]>
>
> This patch landed recently in linux-next as commit dc6e0818bc9a
> ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> found that it triggers a following warning in the early boot stage:

Hi, thanks for the report. I've send a fix patch[1] for review.

[1] https://lore.kernel.org/lkml/[email protected]/

>
> Calibrating delay loop (skipped), value calculated using timer
> frequency.. 48.00 BogoMIPS (lpj=240000)
> pid_max: default: 32768 minimum: 301
> Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> CPU: Testing write buffer coherency: ok
> CPU0: Spectre v2: using BPIALL workaround
>
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 1
> 2 locks held by kthreadd/2:
>  #0: c1d7972c (&p->pi_lock){....}-{2:2}, at: task_rq_lock+0x30/0x118
>  #1: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
> raw_spin_rq_lock_nested+0x24/0x34
>
> stack backtrace:
> CPU: 0 PID: 2 Comm: kthreadd Not tainted 5.17.0-rc5-00050-gdc6e0818bc9a
> #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
>  unwind_backtrace from show_stack+0x10/0x14
>  show_stack from dump_stack_lvl+0x58/0x70
>  dump_stack_lvl from update_curr+0x1bc/0x35c
>  update_curr from dequeue_task_fair+0xb0/0x8e8
>  dequeue_task_fair from __do_set_cpus_allowed+0x19c/0x258
>  __do_set_cpus_allowed from __set_cpus_allowed_ptr_locked+0x130/0x1d8
>  __set_cpus_allowed_ptr_locked from __set_cpus_allowed_ptr+0x48/0x64
>  __set_cpus_allowed_ptr from kthreadd+0x44/0x16c
>  kthreadd from ret_from_fork+0x14/0x2c
> Exception stack(0xc1cb9fb0 to 0xc1cb9ff8)
> 9fa0:                                     00000000 00000000 00000000
> 00000000
> 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> CPU0: thread -1, cpu 0, socket 9, mpidr 80000900
> cblist_init_generic: Setting adjustable number of callback queues.
> cblist_init_generic: Setting shift to 1 and lim to 1.
> Running RCU-tasks wait API self tests
> Setting up static identity map for 0x40100000 - 0x40100060
> rcu: Hierarchical SRCU implementation.
>
> =============================
> WARNING: suspicious RCU usage
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> -----------------------------
> ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
>
> other info that might help us debug this:
>
>
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by migration/0/13:
>  #0: ef7b52d0 (&rq->__lock){-...}-{2:2}, at:
> raw_spin_rq_lock_nested+0x24/0x34
>
> stack backtrace:
> CPU: 0 PID: 13 Comm: migration/0 Not tainted
> 5.17.0-rc5-00050-gdc6e0818bc9a #11458
> Hardware name: Samsung Exynos (Flattened Device Tree)
> Stopper: 0x0 <- 0x0
>  unwind_backtrace from show_stack+0x10/0x14
>  show_stack from dump_stack_lvl+0x58/0x70
>  dump_stack_lvl from put_prev_task_stop+0x16c/0x25c
>  put_prev_task_stop from __schedule+0x698/0x964
>  __schedule from schedule+0x54/0xe0
>  schedule from smpboot_thread_fn+0x218/0x288
>  smpboot_thread_fn from kthread+0xf0/0x134
>  kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xc1ccffb0 to 0xc1ccfff8)
> ffa0:                                     00000000 00000000 00000000
> 00000000
> ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 00000000
> ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
> smp: Bringing up secondary CPUs ...
> CPU1: thread -1, cpu 1, socket 9, mpidr 80000901
> CPU1: Spectre v2: using BPIALL workaround
> smp: Brought up 1 node, 2 CPUs
> SMP: Total of 2 processors activated (96.00 BogoMIPS).
>
> The above log comes from ARM 32bit Samsung Exnyos4210 based Trats board.
>
>> ---
>> include/linux/cgroup.h | 2 --
>> kernel/sched/cpuacct.c | 4 +---
>> 2 files changed, 1 insertion(+), 5 deletions(-)
>>
>> diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
>> index 75c151413fda..9a109c6ac0e0 100644
>> --- a/include/linux/cgroup.h
>> +++ b/include/linux/cgroup.h
>> @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task,
>>
>> cpuacct_charge(task, delta_exec);
>>
>> - rcu_read_lock();
>> cgrp = task_dfl_cgroup(task);
>> if (cgroup_parent(cgrp))
>> __cgroup_account_cputime(cgrp, delta_exec);
>> - rcu_read_unlock();
>> }
>>
>> static inline void cgroup_account_cputime_field(struct task_struct *task,
>> diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c
>> index 307800586ac8..f79f88456d72 100644
>> --- a/kernel/sched/cpuacct.c
>> +++ b/kernel/sched/cpuacct.c
>> @@ -337,12 +337,10 @@ void cpuacct_charge(struct task_struct *tsk, u64 cputime)
>> unsigned int cpu = task_cpu(tsk);
>> struct cpuacct *ca;
>>
>> - rcu_read_lock();
>> + lockdep_assert_rq_held(cpu_rq(cpu));
>>
>> for (ca = task_ca(tsk); ca; ca = parent_ca(ca))
>> *per_cpu_ptr(ca->cpuusage, cpu) += cputime;
>> -
>> - rcu_read_unlock();
>> }
>>
>> /*
>
> Best regards

2022-03-10 11:42:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > we must already have rq lock held, then the RCU read lock can
> > > > be optimized away.
> > > >
> > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > in kernel/sched/sched.h.
> > > >
> > > > Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> > > > Signed-off-by: Chengming Zhou <[email protected]>
> > >
> > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > found that it triggers a following warning in the early boot stage:
> > >
> > > Calibrating delay loop (skipped), value calculated using timer
> > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > pid_max: default: 32768 minimum: 301
> > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > CPU: Testing write buffer coherency: ok
> > > CPU0: Spectre v2: using BPIALL workaround
> > >
> > > =============================
> > > WARNING: suspicious RCU usage
> > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > -----------------------------
> > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> >
> > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > to default include rcu_read_lock_sched_held() or its equivalent I
> > suppose.
> >
> > Paul?
>
> That would reduce the number of warnings, but it also would hide bugs.
>
> So, are you sure you really want this?

I don't understand... Since the flavours got merged regular RCU has it's
quescent state held off by preempt_disable. So how can relying on that
cause bugs?

And if we can rely on that, then surely rcu_dereferenced_check() ought
to play by the same rules, otherwise we get silly warnings like these at
hand.

Specifically, we removed the rcu_read_lock() here because this has
rq->lock held, which is a raw_spinlock_t which very much implies preempt
disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
be disabled.

There is no possible way for RCU to make progress.

2022-03-10 23:46:02

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Thu, Mar 10, 2022 at 09:45:17AM +0100, Peter Zijlstra wrote:
> On Tue, Mar 08, 2022 at 03:44:03PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 09, 2022 at 12:32:25AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 09, 2022 at 12:20:33AM +0100, Marek Szyprowski wrote:
> > > > On 20.02.2022 06:14, Chengming Zhou wrote:
> > > > > Since cpuacct_charge() is called from the scheduler update_curr(),
> > > > > we must already have rq lock held, then the RCU read lock can
> > > > > be optimized away.
> > > > >
> > > > > And do the same thing in it's wrapper cgroup_account_cputime(),
> > > > > but we can't use lockdep_assert_rq_held() there, which defined
> > > > > in kernel/sched/sched.h.
> > > > >
> > > > > Suggested-by: Peter Zijlstra (Intel) <[email protected]>
> > > > > Signed-off-by: Chengming Zhou <[email protected]>
> > > >
> > > > This patch landed recently in linux-next as commit dc6e0818bc9a
> > > > ("sched/cpuacct: Optimize away RCU read lock"). On my test systems I
> > > > found that it triggers a following warning in the early boot stage:
> > > >
> > > > Calibrating delay loop (skipped), value calculated using timer
> > > > frequency.. 48.00 BogoMIPS (lpj=240000)
> > > > pid_max: default: 32768 minimum: 301
> > > > Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear)
> > > > CPU: Testing write buffer coherency: ok
> > > > CPU0: Spectre v2: using BPIALL workaround
> > > >
> > > > =============================
> > > > WARNING: suspicious RCU usage
> > > > 5.17.0-rc5-00050-gdc6e0818bc9a #11458 Not tainted
> > > > -----------------------------
> > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > >
> > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > suppose.
> > >
> > > Paul?
> >
> > That would reduce the number of warnings, but it also would hide bugs.
> >
> > So, are you sure you really want this?
>
> I don't understand... Since the flavours got merged regular RCU has it's
> quescent state held off by preempt_disable. So how can relying on that
> cause bugs?

Somene forgets an rcu_read_lock() and there happens to be something
like a preempt_disable() that by coincidence covers that particular
rcu_dereference(). The kernel therefore doesn't complain. That someone
goes on to other things, maybe even posthumously. Then some time later
the preempt_disable() goes away, for good and sufficient reasons.

Good luck figuring out where to put the needed rcu_read_lock() and
rcu_read_unlock().

> And if we can rely on that, then surely rcu_dereferenced_check() ought
> to play by the same rules, otherwise we get silly warnings like these at
> hand.
>
> Specifically, we removed the rcu_read_lock() here because this has
> rq->lock held, which is a raw_spinlock_t which very much implies preempt
> disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> be disabled.
>
> There is no possible way for RCU to make progress.

Then let's have that particular rcu_dereference_check() explicitly state
what it needs, which seems to be either rcu_read_lock() on the one hand.
Right now, that could be just this:

p = rcu_dereference_check(gp, rcu_read_lock_sched_held());

Or am I missing something here?

Thanx, Paul

2022-03-12 13:03:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:

> > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > >
> > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > suppose.
> > > >
> > > > Paul?
> > >
> > > That would reduce the number of warnings, but it also would hide bugs.
> > >
> > > So, are you sure you really want this?
> >
> > I don't understand... Since the flavours got merged regular RCU has it's
> > quescent state held off by preempt_disable. So how can relying on that
> > cause bugs?
>
> Somene forgets an rcu_read_lock() and there happens to be something
> like a preempt_disable() that by coincidence covers that particular
> rcu_dereference(). The kernel therefore doesn't complain. That someone
> goes on to other things, maybe even posthumously. Then some time later
> the preempt_disable() goes away, for good and sufficient reasons.
>
> Good luck figuring out where to put the needed rcu_read_lock() and
> rcu_read_unlock().

Well, that's software engineering for you. Also in that case the warning
will work as expected. Then figuring out how to fix it is not the
problem of the warning -- that worked as advertised.

(also, I don't think it'll be too hard, you just gotta figure out which
object is rcu protected -- the warning gives you this, where the lookup
happens -- again the warning helps, and how long it's used for, all
relatively well definted things)

I don't see a problem. No bugs hidden.

> > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > to play by the same rules, otherwise we get silly warnings like these at
> > hand.
> >
> > Specifically, we removed the rcu_read_lock() here because this has
> > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > be disabled.
> >
> > There is no possible way for RCU to make progress.
>
> Then let's have that particular rcu_dereference_check() explicitly state
> what it needs, which seems to be either rcu_read_lock() on the one hand.
> Right now, that could be just this:
>
> p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
>
> Or am I missing something here?

That will work; I just don't agree with it. Per the rules of RCU it is
entirely correct to mix rcu_read_lock() and preempt_disable() (or
anything that implies the same). So I strongly feel that
rcu_dereference() should not warn about obviously correct code. Why
would we need to special case this ?

2022-03-12 22:10:59

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v3 2/3] sched/cpuacct: optimize away RCU read lock

On Sat, Mar 12, 2022 at 01:15:33PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 10, 2022 at 07:01:52AM -0800, Paul E. McKenney wrote:
>
> > > > > > ./include/linux/cgroup.h:481 suspicious rcu_dereference_check() usage!
> > > > >
> > > > > Arguably, with the flavours folded again, rcu_dereference_check() ought
> > > > > to default include rcu_read_lock_sched_held() or its equivalent I
> > > > > suppose.
> > > > >
> > > > > Paul?
> > > >
> > > > That would reduce the number of warnings, but it also would hide bugs.
> > > >
> > > > So, are you sure you really want this?
> > >
> > > I don't understand... Since the flavours got merged regular RCU has it's
> > > quescent state held off by preempt_disable. So how can relying on that
> > > cause bugs?
> >
> > Somene forgets an rcu_read_lock() and there happens to be something
> > like a preempt_disable() that by coincidence covers that particular
> > rcu_dereference(). The kernel therefore doesn't complain. That someone
> > goes on to other things, maybe even posthumously. Then some time later
> > the preempt_disable() goes away, for good and sufficient reasons.
> >
> > Good luck figuring out where to put the needed rcu_read_lock() and
> > rcu_read_unlock().
>
> Well, that's software engineering for you.

My point exactly!!!

> Also in that case the warning
> will work as expected. Then figuring out how to fix it is not the
> problem of the warning -- that worked as advertised.
>
> (also, I don't think it'll be too hard, you just gotta figure out which
> object is rcu protected -- the warning gives you this, where the lookup
> happens -- again the warning helps, and how long it's used for, all
> relatively well definted things)

Without in any way agreeing with that assessment of difficulty, especially
in the general case... It is -way- easier just to tell RCU what your
design rules are for the code in question.

> I don't see a problem. No bugs hidden.

C'mon, Peter!

There really was a bug hidden. That someone intended to add some
calls to rcu_read_lock() and rcu_read_unlock() in the proper places.
Their failure to add them really was a bug.

That bug was hidden by: (1) There being a preempt_disable() or
whatever that by coincidence happened to be covering the part of the
code containing the rcu_dereference() and (2) Your proposed change that
would make rcu_dereference() unable to detect that bug.

And that bug can be quite bad. Given your proposed change, RCU
cannot detect this bug:


/* Preemption is enabled. */
/* There should be an rcu_read_lock() here. */
preempt_disable();
p = rcu_dereference(gp);
do_something_with(p);
preempt_enable();
/* Without the rcu_read_lock(), *p is history. */
do_something_else_with(p);
/* There should be an rcu_read_unlock() here. */

> > > And if we can rely on that, then surely rcu_dereferenced_check() ought
> > > to play by the same rules, otherwise we get silly warnings like these at
> > > hand.
> > >
> > > Specifically, we removed the rcu_read_lock() here because this has
> > > rq->lock held, which is a raw_spinlock_t which very much implies preempt
> > > disable, on top of that, it's also an IRQ-safe lock and thus IRQs will
> > > be disabled.
> > >
> > > There is no possible way for RCU to make progress.
> >
> > Then let's have that particular rcu_dereference_check() explicitly state
> > what it needs, which seems to be either rcu_read_lock() on the one hand.
> > Right now, that could be just this:
> >
> > p = rcu_dereference_check(gp, rcu_read_lock_sched_held());
> >
> > Or am I missing something here?
>
> That will work; I just don't agree with it. Per the rules of RCU it is
> entirely correct to mix rcu_read_lock() and preempt_disable() (or
> anything that implies the same). So I strongly feel that
> rcu_dereference() should not warn about obviously correct code. Why
> would we need to special case this ?

This use case might well be entirely correct, but it is most certainly
not the common case.

Therefore, my answer to this requested chance in rcu_dereference()
semantics is "no".

Thanx, Paul