2022-09-01 23:17:34

by Joel Fernandes

Subject: [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes

Here is v5 of call_rcu_lazy() based on the latest RCU -dev branch. Main changes are:
- moved length field into rcu_data (Frederic suggestion)
- added new traces to aid debugging and testing.
- the new trace patch (along with the rcuscale and rcutorture tests)
gives confidence that the patches work well. It has also been tested on
real ChromeOS hardware, and boot time looks good even though lazy
callbacks are being queued (i.e., the lazy callbacks do not affect the
synchronous non-lazy ones that affect boot time).
- rewrote some parts of the core patch.
- for rcutop, please apply the diff in the following link to the BCC repo:
https://lore.kernel.org/r/[email protected]
Then, cd libbpf-tools/ and run make to build the rcutop static binary.
(If you need an x86 binary, ping me and I'll email you.)
In the future, I will attempt to make rcutop buildable within the kernel repo.
This has already been done for another tool (see tools/bpf/runqslower), so it is doable.

The 2 mm patches are what Vlastimil pulled into slab-next. I included them in
this series so that the tracing patch builds.

Previous series was posted here:
https://lore.kernel.org/all/[email protected]/

Linked below [1] is some power data I collected with Turbostat on an x86
ChromeOS ADL machine. The numbers are not based on -next, but rather on the 5.19
kernel, as that is what booted on my ChromeOS machine.

These numbers are output by Turbostat, by running:
turbostat -S -s PkgWatt,CorWatt --interval 5
PkgWatt - package power in watts, summarized over each 5-second interval.
CorWatt - core power in watts, summarized over each 5-second interval.

[1] https://lore.kernel.org/r/[email protected]

Joel Fernandes (Google) (15):
rcu/tree: Use READ_ONCE() for lockless read of rnp->qsmask
rcu: Fix late wakeup when flush of bypass cblist happens
rcu: Move trace_rcu_callback() before bypassing
rcu: Introduce call_rcu_lazy() API implementation
rcu: Add per-CB tracing for queuing, flush and invocation.
rcuscale: Add laziness and kfree tests
rcutorture: Add test code for call_rcu_lazy()
fs: Move call_rcu() to call_rcu_lazy() in some paths
cred: Move call_rcu() to call_rcu_lazy()
security: Move call_rcu() to call_rcu_lazy()
net/core: Move call_rcu() to call_rcu_lazy()
kernel: Move various core kernel usages to call_rcu_lazy()
lib: Move call_rcu() to call_rcu_lazy()
i915: Move call_rcu() to call_rcu_lazy()
fork: Move thread_stack_free_rcu() to call_rcu_lazy()

Vineeth Pillai (1):
rcu: shrinker for lazy rcu

Vlastimil Babka (2):
mm/slub: perform free consistency checks before call_rcu
mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head

drivers/gpu/drm/i915/gem/i915_gem_object.c | 2 +-
fs/dcache.c | 4 +-
fs/eventpoll.c | 2 +-
fs/file_table.c | 2 +-
fs/inode.c | 2 +-
include/linux/rcupdate.h | 6 +
include/linux/types.h | 44 +++
include/trace/events/rcu.h | 69 ++++-
kernel/cred.c | 2 +-
kernel/exit.c | 2 +-
kernel/fork.c | 6 +-
kernel/pid.c | 2 +-
kernel/rcu/Kconfig | 19 ++
kernel/rcu/rcu.h | 12 +
kernel/rcu/rcu_segcblist.c | 23 +-
kernel/rcu/rcu_segcblist.h | 8 +
kernel/rcu/rcuscale.c | 74 ++++-
kernel/rcu/rcutorture.c | 60 +++-
kernel/rcu/tree.c | 187 ++++++++----
kernel/rcu/tree.h | 13 +-
kernel/rcu/tree_nocb.h | 282 +++++++++++++++---
kernel/time/posix-timers.c | 2 +-
lib/radix-tree.c | 2 +-
lib/xarray.c | 2 +-
mm/slab.h | 54 ++--
mm/slub.c | 20 +-
net/core/dst.c | 2 +-
security/security.c | 2 +-
security/selinux/avc.c | 4 +-
.../selftests/rcutorture/configs/rcu/CFLIST | 1 +
.../selftests/rcutorture/configs/rcu/TREE11 | 18 ++
.../rcutorture/configs/rcu/TREE11.boot | 8 +
32 files changed, 783 insertions(+), 153 deletions(-)
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11
create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot

--
2.37.2.789.g6183377224-goog


2022-09-01 23:18:38

by Joel Fernandes

Subject: [PATCH v5 05/18] rcu: Move trace_rcu_callback() before bypassing

If any CB is queued into the bypass list, then trace_rcu_callback() does
not show it. This makes it unclear when a callback was actually
queued, as you only end up getting a trace_rcu_invoke_callback() trace.
Fix it by moving trace_rcu_callback() before
rcu_nocb_try_bypass().

Signed-off-by: Joel Fernandes (Google) <[email protected]>
---
kernel/rcu/tree.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 5ec97e3f7468..9fe581be8696 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2809,10 +2809,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
}

check_cb_ovld(rdp);
- if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
- return; // Enqueued onto ->nocb_bypass, so just leave.
- // If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
- rcu_segcblist_enqueue(&rdp->cblist, head);
+
if (__is_kvfree_rcu_offset((unsigned long)func))
trace_rcu_kvfree_callback(rcu_state.name, head,
(unsigned long)func,
@@ -2821,6 +2818,11 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func)
trace_rcu_callback(rcu_state.name, head,
rcu_segcblist_n_cbs(&rdp->cblist));

+ if (rcu_nocb_try_bypass(rdp, head, &was_alldone, flags))
+ return; // Enqueued onto ->nocb_bypass, so just leave.
+ // If no-CBs CPU gets here, rcu_nocb_try_bypass() acquired ->nocb_lock.
+ rcu_segcblist_enqueue(&rdp->cblist, head);
+
trace_rcu_segcb_stats(&rdp->cblist, TPS("SegCBQueued"));

/* Go handle any RCU core processing required. */
--
2.37.2.789.g6183377224-goog

2022-09-03 15:49:42

by Paul E. McKenney

Subject: Re: [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes

On Thu, Sep 01, 2022 at 10:17:02PM +0000, Joel Fernandes (Google) wrote:

Thank you for all your work on this!

I have pulled these in for testing and review. Some work is required
for them to be ready for mainline.

> Joel Fernandes (Google) (15):

I took these:

> rcu/tree: Use READ_ONCE() for lockless read of rnp->qsmask
> rcu: Fix late wakeup when flush of bypass cblist happens
> rcu: Move trace_rcu_callback() before bypassing
> rcu: Introduce call_rcu_lazy() API implementation

You and Frederic need to come to agreement on "rcu: Fix late wakeup when
flush of bypass cblist happens".

These have some difficulties, so I put them on top of the stack:

> rcu: Add per-CB tracing for queuing, flush and invocation.
This one breaks 32-bit MIPS.
> rcuscale: Add laziness and kfree tests
I am concerned that this one has OOM problems.
> rcutorture: Add test code for call_rcu_lazy()
This one does not need a separate scenario, but instead a separate
rcutorture.gp_lazy boot parameter. For a rough template for this
sort of change, please see the rcutorture changes in this commit:

76ea364161e7 ("rcu: Add full-sized polling for start_poll()")

The key point is that every rcutorture.torture_type=rcu test then
becomes a call_rcu_lazy() test. (Unless explicitly overridden.)

I took these, though they need to go up their respective trees or get acks
from their respective maintainers.

> fs: Move call_rcu() to call_rcu_lazy() in some paths
> cred: Move call_rcu() to call_rcu_lazy()
> security: Move call_rcu() to call_rcu_lazy()
> net/core: Move call_rcu() to call_rcu_lazy()
> kernel: Move various core kernel usages to call_rcu_lazy()
> lib: Move call_rcu() to call_rcu_lazy()
> i915: Move call_rcu() to call_rcu_lazy()
> fork: Move thread_stack_free_rcu() to call_rcu_lazy()
>
> Vineeth Pillai (1):

I took this:

> rcu: shrinker for lazy rcu
>
> Vlastimil Babka (2):

As noted earlier, I took these strictly for testing. I do not expect
to push them into -next, let alone into mainline.

> mm/slub: perform free consistency checks before call_rcu
> mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head

These all are on -rcu not-yet-for-mainline branch lazy.2022.09.03b.

Thanx, Paul
