2024-06-09 08:28:14

by Julia Lawall

Subject: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed, it is not necessary to use call_rcu
when the callback only performs kmem_cache_free. Use
kfree_rcu() directly.

The changes were done using the following Coccinelle semantic patch.
This semantic patch is designed to ignore cases where the callback
function is used in another way.
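For illustration only (a hypothetical sketch, not code from this series; the struct and cache names are made up), the shape of code this transformation targets, and its result, is:

```c
/* Hypothetical example of the pattern this series rewrites. */
struct foo {
	int data;
	struct rcu_head rcu;	/* must be a field of the freed object */
};

/* Before: an RCU callback whose only job is kmem_cache_free(). */
static void foo_free_rcu(struct rcu_head *head)
{
	struct foo *f = container_of(head, struct foo, rcu);

	kmem_cache_free(foo_cache, f);
}
	...
	call_rcu(&f->rcu, foo_free_rcu);

/* After: the callback is deleted and the call site becomes: */
	kfree_rcu(f, rcu);
```

The semantic patch below makes exactly this change, skipping any callback that is referenced anywhere other than the one call_rcu() site.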

// <smpl>
@r@
expression e;
local idexpression e2;
identifier cb,f;
position p;
@@

(
call_rcu(...,e2)
|
call_rcu(&e->f,cb@p)
)

@r1@
type T;
identifier x,r.cb;
@@

cb(...) {
(
kmem_cache_free(...);
|
T x = ...;
kmem_cache_free(...,x);
|
T x;
x = ...;
kmem_cache_free(...,x);
)
}

@s depends on r1@
position p != r.p;
identifier r.cb;
@@

cb@p

@script:ocaml@
cb << r.cb;
p << s.p;
@@

Printf.eprintf "Other use of %s at %s:%d\n"
cb (List.hd p).file (List.hd p).line

@depends on r1 && !s@
expression e;
identifier r.cb,f;
position r.p;
@@

- call_rcu(&e->f,cb@p)
+ kfree_rcu(e,f)

@r1a depends on !s@
type T;
identifier x,r.cb;
@@

- cb(...) {
(
- kmem_cache_free(...);
|
- T x = ...;
- kmem_cache_free(...,x);
|
- T x;
- x = ...;
- kmem_cache_free(...,x);
)
- }
// </smpl>

Signed-off-by: Julia Lawall <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>

---

arch/powerpc/kvm/book3s_mmu_hpte.c | 8 +-------
block/blk-ioc.c | 9 +--------
drivers/net/wireguard/allowedips.c | 9 ++-------
fs/ecryptfs/dentry.c | 8 +-------
fs/nfsd/nfs4state.c | 9 +--------
fs/tracefs/inode.c | 10 +---------
kernel/time/posix-timers.c | 9 +--------
kernel/workqueue.c | 8 +-------
net/bridge/br_fdb.c | 9 +--------
net/can/gw.c | 13 +++----------
net/ipv4/fib_trie.c | 8 +-------
net/ipv4/inetpeer.c | 9 ++-------
net/ipv6/ip6_fib.c | 9 +--------
net/ipv6/xfrm6_tunnel.c | 8 +-------
net/kcm/kcmsock.c | 10 +---------
net/netfilter/nf_conncount.c | 10 +---------
net/netfilter/nf_conntrack_expect.c | 10 +---------
net/netfilter/xt_hashlimit.c | 9 +--------
18 files changed, 22 insertions(+), 143 deletions(-)


2024-06-09 08:30:54

by Julia Lawall

Subject: [PATCH 08/14] nfsd: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

Since SLOB was removed, it is not necessary to use call_rcu
when the callback only performs kmem_cache_free. Use
kfree_rcu() directly.

The changes were done using the following Coccinelle semantic patch.
This semantic patch is designed to ignore cases where the callback
function is used in another way.

// <smpl>
@r@
expression e;
local idexpression e2;
identifier cb,f;
position p;
@@

(
call_rcu(...,e2)
|
call_rcu(&e->f,cb@p)
)

@r1@
type T;
identifier x,r.cb;
@@

cb(...) {
(
kmem_cache_free(...);
|
T x = ...;
kmem_cache_free(...,x);
|
T x;
x = ...;
kmem_cache_free(...,x);
)
}

@s depends on r1@
position p != r.p;
identifier r.cb;
@@

cb@p

@script:ocaml@
cb << r.cb;
p << s.p;
@@

Printf.eprintf "Other use of %s at %s:%d\n"
cb (List.hd p).file (List.hd p).line

@depends on r1 && !s@
expression e;
identifier r.cb,f;
position r.p;
@@

- call_rcu(&e->f,cb@p)
+ kfree_rcu(e,f)

@r1a depends on !s@
type T;
identifier x,r.cb;
@@

- cb(...) {
(
- kmem_cache_free(...);
|
- T x = ...;
- kmem_cache_free(...,x);
|
- T x;
- x = ...;
- kmem_cache_free(...,x);
)
- }
// </smpl>

Signed-off-by: Julia Lawall <[email protected]>
Reviewed-by: Paul E. McKenney <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>

---
fs/nfsd/nfs4state.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a20c2c9d7d45..eba5083504c7 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -571,13 +571,6 @@ opaque_hashval(const void *ptr, int nbytes)
return x;
}

-static void nfsd4_free_file_rcu(struct rcu_head *rcu)
-{
- struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu);
-
- kmem_cache_free(file_slab, fp);
-}
-
void
put_nfs4_file(struct nfs4_file *fi)
{
@@ -585,7 +578,7 @@ put_nfs4_file(struct nfs4_file *fi)
nfsd4_file_hash_remove(fi);
WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
- call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
+ kfree_rcu(fi, fi_rcu);
}
}



2024-06-09 15:44:03

by Chuck Lever III

Subject: Re: [PATCH 08/14] nfsd: replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Sun, Jun 09, 2024 at 10:27:20AM +0200, Julia Lawall wrote:
> Since SLOB was removed, it is not necessary to use call_rcu
> when the callback only performs kmem_cache_free. Use
> kfree_rcu() directly.
>
> The changes were done using the following Coccinelle semantic patch.
> This semantic patch is designed to ignore cases where the callback
> function is used in another way.
>
> // <smpl>
> @r@
> expression e;
> local idexpression e2;
> identifier cb,f;
> position p;
> @@
>
> (
> call_rcu(...,e2)
> |
> call_rcu(&e->f,cb@p)
> )
>
> @r1@
> type T;
> identifier x,r.cb;
> @@
>
> cb(...) {
> (
> kmem_cache_free(...);
> |
> T x = ...;
> kmem_cache_free(...,x);
> |
> T x;
> x = ...;
> kmem_cache_free(...,x);
> )
> }
>
> @s depends on r1@
> position p != r.p;
> identifier r.cb;
> @@
>
> cb@p
>
> @script:ocaml@
> cb << r.cb;
> p << s.p;
> @@
>
> Printf.eprintf "Other use of %s at %s:%d\n"
> cb (List.hd p).file (List.hd p).line
>
> @depends on r1 && !s@
> expression e;
> identifier r.cb,f;
> position r.p;
> @@
>
> - call_rcu(&e->f,cb@p)
> + kfree_rcu(e,f)
>
> @r1a depends on !s@
> type T;
> identifier x,r.cb;
> @@
>
> - cb(...) {
> (
> - kmem_cache_free(...);
> |
> - T x = ...;
> - kmem_cache_free(...,x);
> |
> - T x;
> - x = ...;
> - kmem_cache_free(...,x);
> )
> - }
> // </smpl>
>
> Signed-off-by: Julia Lawall <[email protected]>
> Reviewed-by: Paul E. McKenney <[email protected]>
> Reviewed-by: Vlastimil Babka <[email protected]>
>
> ---
> fs/nfsd/nfs4state.c | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a20c2c9d7d45..eba5083504c7 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -571,13 +571,6 @@ opaque_hashval(const void *ptr, int nbytes)
> return x;
> }
>
> -static void nfsd4_free_file_rcu(struct rcu_head *rcu)
> -{
> - struct nfs4_file *fp = container_of(rcu, struct nfs4_file, fi_rcu);
> -
> - kmem_cache_free(file_slab, fp);
> -}
> -
> void
> put_nfs4_file(struct nfs4_file *fi)
> {
> @@ -585,7 +578,7 @@ put_nfs4_file(struct nfs4_file *fi)
> nfsd4_file_hash_remove(fi);
> WARN_ON_ONCE(!list_empty(&fi->fi_clnt_odstate));
> WARN_ON_ONCE(!list_empty(&fi->fi_delegations));
> - call_rcu(&fi->fi_rcu, nfsd4_free_file_rcu);
> + kfree_rcu(fi, fi_rcu);
> }
> }
>
>

Acked-by: Chuck Lever <[email protected]>

--
Chuck Lever

2024-06-12 21:33:19

by Jakub Kicinski

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> Since SLOB was removed, it is not necessary to use call_rcu
> when the callback only performs kmem_cache_free. Use
> kfree_rcu() directly.
>
> The changes were done using the following Coccinelle semantic patch.
> This semantic patch is designed to ignore cases where the callback
> function is used in another way.

How does the discussion on:
[PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
https://lore.kernel.org/all/[email protected]/
reflect on this series? IIUC we should hold off..

2024-06-12 22:38:04

by Paul E. McKenney

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > Since SLOB was removed, it is not necessary to use call_rcu
> > when the callback only performs kmem_cache_free. Use
> > kfree_rcu() directly.
> >
> > The changes were done using the following Coccinelle semantic patch.
> > This semantic patch is designed to ignore cases where the callback
> > function is used in another way.
>
> How does the discussion on:
> [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> https://lore.kernel.org/all/[email protected]/
> reflect on this series? IIUC we should hold off..

We do need to hold off for the ones in kernel modules (such as 07/14)
where the kmem_cache is destroyed during module unload.

OK, I might as well go through them...

[PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Needs to wait, see wg_allowedips_slab_uninit().

[PATCH 02/14] net: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
I don't immediately see the rcu_barrier(), but if there isn't
one in there somewhere there probably should be. Caution
suggests a need to wait.

[PATCH 03/14] KVM: PPC: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
I don't immediately see the rcu_barrier(), but if there isn't
one in there somewhere there probably should be. Caution
suggests a need to wait.

[PATCH 04/14] xfrm6_tunnel: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Needs to wait, see xfrm6_tunnel_fini().

[PATCH 05/14] tracefs: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
This one is fine because the tracefs_inode_cachep kmem_cache
is created at boot and never destroyed.

[PATCH 06/14] eCryptfs: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
I don't see a kmem_cache_destroy(), but then again, I also don't
see the kmem_cache_create(). Unless someone can see what I am
not seeing, let's wait.

[PATCH 07/14] net: bridge: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Needs to wait, see br_fdb_fini() and br_deinit().

[PATCH 08/14] nfsd: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
I don't immediately see the rcu_barrier(), but if there isn't
one in there somewhere there probably should be. Caution
suggests a need to wait.

[PATCH 09/14] block: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
I don't see a kmem_cache_destroy(), but then again, I also don't
see the kmem_cache_create(). Unless someone can see what I am
not seeing, let's wait.

[PATCH 10/14] can: gw: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Needs to wait, see cgw_module_exit().

[PATCH 11/14] posix-timers: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
This one is fine because the posix_timers_cache kmem_cache is
created at boot and never destroyed.

[PATCH 12/14] workqueue: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
This one is fine because the pwq_cache kmem_cache is created at
boot and never destroyed.

[PATCH 13/14] kcm: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
I don't immediately see the rcu_barrier(), but if there isn't
one in there somewhere there probably should be. Caution
suggests a need to wait.

[PATCH 14/14] netfilter: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
Needs to wait, see hashlimit_mt_exit().

So 05/14, 11/14 and 12/14 are OK and can go ahead. The rest need some
help.

Apologies for my having gotten overly enthusiastic about this change!

Thanx, Paul

2024-06-12 22:46:31

by Jakub Kicinski

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Wed, 12 Jun 2024 15:37:55 -0700 Paul E. McKenney wrote:
> So 05/14, 11/14 and 12/14 are OK and can go ahead. The rest need some
> help.

Thank you for the breakdown!

2024-06-12 22:53:08

by Jens Axboe

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On 6/12/24 4:37 PM, Paul E. McKenney wrote:
> [PATCH 09/14] block: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> I don't see a kmem_cache_destroy(), but then again, I also don't
> see the kmem_cache_create(). Unless someone can see what I am
> not seeing, let's wait.

It's in that same file:

blk_ioc_init()

the cache itself never goes away, as the ioc code is not unloadable. So
I think the change there should be fine.

--
Jens Axboe


2024-06-12 23:04:26

by Paul E. McKenney

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Wed, Jun 12, 2024 at 04:52:57PM -0600, Jens Axboe wrote:
> On 6/12/24 4:37 PM, Paul E. McKenney wrote:
> > [PATCH 09/14] block: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> > I don't see a kmem_cache_destroy(), but then again, I also don't
> > see the kmem_cache_create(). Unless someone can see what I am
> > not seeing, let's wait.
>
> It's in that same file:
>
> blk_ioc_init()
>
> the cache itself never goes away, as the ioc code is not unloadable. So
> I think the change there should be fine.

Thank you, Jens! (And to Jakub for motivating me to go look.)

So to update the scorecard, 05/14, 09/14, 11/14 and 12/14 are OK and
can go ahead.

Thanx, Paul

2024-06-12 23:32:15

by Jason A. Donenfeld

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Wed, Jun 12, 2024 at 03:37:55PM -0700, Paul E. McKenney wrote:
> On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> > On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > > Since SLOB was removed, it is not necessary to use call_rcu
> > > when the callback only performs kmem_cache_free. Use
> > > kfree_rcu() directly.
> > >
> > > The changes were done using the following Coccinelle semantic patch.
> > > This semantic patch is designed to ignore cases where the callback
> > > function is used in another way.
> >
> > How does the discussion on:
> > [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> > https://lore.kernel.org/all/[email protected]/
> > reflect on this series? IIUC we should hold off..
>
> We do need to hold off for the ones in kernel modules (such as 07/14)
> where the kmem_cache is destroyed during module unload.
>
> OK, I might as well go through them...
>
> [PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> Needs to wait, see wg_allowedips_slab_uninit().

Right, this has exactly the same pattern as the batman-adv issue:

void wg_allowedips_slab_uninit(void)
{
rcu_barrier();
kmem_cache_destroy(node_cache);
}

I'll hold off on sending that up until this matter is resolved.

Jason

2024-06-13 00:32:16

by Jason A. Donenfeld

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 01:31:57AM +0200, Jason A. Donenfeld wrote:
> On Wed, Jun 12, 2024 at 03:37:55PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> > > On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > > > Since SLOB was removed, it is not necessary to use call_rcu
> > > > when the callback only performs kmem_cache_free. Use
> > > > kfree_rcu() directly.
> > > >
> > > > The changes were done using the following Coccinelle semantic patch.
> > > > This semantic patch is designed to ignore cases where the callback
> > > > function is used in another way.
> > >
> > > How does the discussion on:
> > > [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> > > https://lore.kernel.org/all/[email protected]/
> > > reflect on this series? IIUC we should hold off..
> >
> > We do need to hold off for the ones in kernel modules (such as 07/14)
> > where the kmem_cache is destroyed during module unload.
> >
> > OK, I might as well go through them...
> >
> > [PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> > Needs to wait, see wg_allowedips_slab_uninit().
>
> Right, this has exactly the same pattern as the batman-adv issue:
>
> void wg_allowedips_slab_uninit(void)
> {
> rcu_barrier();
> kmem_cache_destroy(node_cache);
> }
>
> I'll hold off on sending that up until this matter is resolved.

BTW, I think this whole thing might be caused by:

a35d16905efc ("rcu: Add basic support for kfree_rcu() batching")

The commit message there mentions:

There is an implication with rcu_barrier() with this patch. Since the
kfree_rcu() calls can be batched, and may not be handed yet to the RCU
machinery in fact, the monitor may not have even run yet to do the
queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
to wait for those kfree_rcu()s that are already made. So this means a
kfree_rcu() followed by an rcu_barrier() does not imply that memory will
be freed once rcu_barrier() returns.

Before that, a kfree_rcu() used to just add a normal call_rcu() into the
list, but with the function offset < 4096 as a special marker. So the
kfree_rcu() calls would be treated alongside the other call_rcu() ones
and thus affected by rcu_barrier(). Looks like that behavior is no more
since this commit.

Rather than getting rid of the batching, which seems good for
efficiency, I wonder if the right fix to this would be adding a
`should_destroy` boolean to kmem_cache, which kmem_cache_destroy() sets
to true. And then right after it checks `if (number_of_allocations == 0)
actually_destroy()`, and likewise on each kmem_cache_free(), it could
check `if (should_destroy && number_of_allocations == 0)
actually_destroy()`. This way, the work is delayed until it's safe to do
so. This might also mitigate other lurking bugs of bad code that calls
kmem_cache_destroy() before kmem_cache_free().

Jason

2024-06-13 03:39:14

by Paul E. McKenney

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 02:31:53AM +0200, Jason A. Donenfeld wrote:
> On Thu, Jun 13, 2024 at 01:31:57AM +0200, Jason A. Donenfeld wrote:
> > On Wed, Jun 12, 2024 at 03:37:55PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> > > > On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > > > > Since SLOB was removed, it is not necessary to use call_rcu
> > > > > when the callback only performs kmem_cache_free. Use
> > > > > kfree_rcu() directly.
> > > > >
> > > > > The changes were done using the following Coccinelle semantic patch.
> > > > > This semantic patch is designed to ignore cases where the callback
> > > > > function is used in another way.
> > > >
> > > > How does the discussion on:
> > > > [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> > > > https://lore.kernel.org/all/[email protected]/
> > > > reflect on this series? IIUC we should hold off..
> > >
> > > We do need to hold off for the ones in kernel modules (such as 07/14)
> > > where the kmem_cache is destroyed during module unload.
> > >
> > > OK, I might as well go through them...
> > >
> > > [PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> > > Needs to wait, see wg_allowedips_slab_uninit().
> >
> > Right, this has exactly the same pattern as the batman-adv issue:
> >
> > void wg_allowedips_slab_uninit(void)
> > {
> > rcu_barrier();
> > kmem_cache_destroy(node_cache);
> > }
> >
> > I'll hold off on sending that up until this matter is resolved.
>
> BTW, I think this whole thing might be caused by:
>
> a35d16905efc ("rcu: Add basic support for kfree_rcu() batching")
>
> The commit message there mentions:
>
> There is an implication with rcu_barrier() with this patch. Since the
> kfree_rcu() calls can be batched, and may not be handed yet to the RCU
> machinery in fact, the monitor may not have even run yet to do the
> queue_rcu_work(), there seems no easy way of implementing rcu_barrier()
> to wait for those kfree_rcu()s that are already made. So this means a
> kfree_rcu() followed by an rcu_barrier() does not imply that memory will
> be freed once rcu_barrier() returns.
>
> Before that, a kfree_rcu() used to just add a normal call_rcu() into the
> list, but with the function offset < 4096 as a special marker. So the
> kfree_rcu() calls would be treated alongside the other call_rcu() ones
> and thus affected by rcu_barrier(). Looks like that behavior is no more
> since this commit.

You might well be right, and thank you for digging into this!

> Rather than getting rid of the batching, which seems good for
> efficiency, I wonder if the right fix to this would be adding a
> `should_destroy` boolean to kmem_cache, which kmem_cache_destroy() sets
> to true. And then right after it checks `if (number_of_allocations == 0)
> actually_destroy()`, and likewise on each kmem_cache_free(), it could
> check `if (should_destroy && number_of_allocations == 0)
> actually_destroy()`. This way, the work is delayed until it's safe to do
> so. This might also mitigate other lurking bugs of bad code that calls
> kmem_cache_destroy() before kmem_cache_free().

Here are the current options being considered, including those that
are completely brain-dead:

o Document current state. (Must use call_rcu() if module
destroys slab of RCU-protected objects.)

Need to review Julia's and Uladzislau's series of patches
that change call_rcu() of slab objects to kfree_rcu().

o Make rcu_barrier() wait for kfree_rcu() objects. (This is
surprisingly complex and will wait unnecessarily in some cases.
However, it does preserve current code.)

o Make a kfree_rcu_barrier() that waits for kfree_rcu() objects.
(This avoids the unnecessary waits, but adds complexity to
kfree_rcu(). This is harder than it looks, but could be done,
for example by maintaining pairs of per-CPU counters and handling
them in an SRCU-like fashion. Need some way of communicating the
index, though.)

(There might be use cases where both rcu_barrier() and
kfree_rcu_barrier() would need to be invoked.)

A simpler way to implement this is to scan all of the in-flight
objects, and queue each (either separately or in bulk) using
call_rcu(). This still has problems with kfree_rcu_mightsleep()
under low-memory conditions, in which case there are a bunch
of synchronize_rcu() instances waiting. These instances could
use SRCU-like per-CPU arrays of counters. Or just protect the
calls to synchronize_rcu() and the later frees with an SRCU
reader, then have the other end call synchronize_srcu().

o Make the current kmem_cache_destroy() asynchronously wait for
all memory to be returned, then complete the destruction.
(This gets rid of a valuable debugging technique because
in normal use, it is a bug to attempt to destroy a kmem_cache
that has objects still allocated.)

o Make a kmem_cache_destroy_rcu() that asynchronously waits for
all memory to be returned, then completes the destruction.
(This raises the question of what to do if it takes a "long time"
for the objects to be freed.)

o Make a kmem_cache_free_barrier() that blocks until all
objects in the specified kmem_cache have been freed.

o Make a kmem_cache_destroy_wait() that waits for all memory to
be returned, then does the destruction. This is equivalent to:

kmem_cache_free_barrier(&mycache);
kmem_cache_destroy(&mycache);

Uladzislau has started discussions on the last few of these:
https://lore.kernel.org/all/ZmnL4jkhJLIW924W@pc636/

I have also added this information to a Google Document for
easier tracking:
https://docs.google.com/document/d/1v0rcZLvvjVGejT3523W0rDy_sLFu2LWc_NR3fQItZaA/edit?usp=sharing

Other thoughts?

Thanx, Paul

2024-06-13 12:24:53

by Jason A. Donenfeld

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Wed, Jun 12, 2024 at 08:38:02PM -0700, Paul E. McKenney wrote:
> o Make the current kmem_cache_destroy() asynchronously wait for
> all memory to be returned, then complete the destruction.
> (This gets rid of a valuable debugging technique because
> in normal use, it is a bug to attempt to destroy a kmem_cache
> that has objects still allocated.)
>
> o Make a kmem_cache_destroy_rcu() that asynchronously waits for
> all memory to be returned, then completes the destruction.
> (This raises the question of what to do if it takes a "long time"
> for the objects to be freed.)

These seem like the best two options.

> o Make a kmem_cache_free_barrier() that blocks until all
> objects in the specified kmem_cache have been freed.
>
> o Make a kmem_cache_destroy_wait() that waits for all memory to
> be returned, then does the destruction. This is equivalent to:
>
> kmem_cache_free_barrier(&mycache);
> kmem_cache_destroy(&mycache);

These also seem fine, but I'm less keen about blocking behavior.

Though, along the ideas of kmem_cache_destroy_rcu(), you might also
consider renaming this last one to kmem_cache_destroy_rcu_wait/barrier().
This way, it's RCU focused, and you can deal directly with the question
of, "how long is too long to block/to memleak?"

Specifically what I mean is that we can still claim a memory leak has
occurred if one batched kfree_rcu freeing grace period has elapsed since
the last call to kmem_cache_destroy_rcu_wait/barrier() or
kmem_cache_destroy_rcu(). In that case, you quit blocking, or you quit
asynchronously waiting, and then you splat about a memleak like we have
now.

But then, if that mechanism generally works, we don't really need a new
function and we can just go with the first option of making
kmem_cache_destroy() asynchronously wait. It'll wait, as you described,
but then we adjust the tail of every kfree_rcu batch freeing cycle to
check if there are _still_ any old outstanding kmem_cache_destroy()
requests. If so, then we can splat and keep the old debugging info we
currently have for finding memleaks.

Jason

2024-06-13 12:46:18

by Paul E. McKenney

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 02:22:41PM +0200, Jason A. Donenfeld wrote:
> On Wed, Jun 12, 2024 at 08:38:02PM -0700, Paul E. McKenney wrote:
> > o Make the current kmem_cache_destroy() asynchronously wait for
> > all memory to be returned, then complete the destruction.
> > (This gets rid of a valuable debugging technique because
> > in normal use, it is a bug to attempt to destroy a kmem_cache
> > that has objects still allocated.)
> >
> > o Make a kmem_cache_destroy_rcu() that asynchronously waits for
> > all memory to be returned, then completes the destruction.
> > (This raises the question of what to do if it takes a "long time"
> > for the objects to be freed.)
>
> These seem like the best two options.

I like them myself, but much depends on how much violence they do to
the slab subsystem and to debuggability.

> > o Make a kmem_cache_free_barrier() that blocks until all
> > objects in the specified kmem_cache have been freed.
> >
> > o Make a kmem_cache_destroy_wait() that waits for all memory to
> > be returned, then does the destruction. This is equivalent to:
> >
> > kmem_cache_free_barrier(&mycache);
> > kmem_cache_destroy(&mycache);
>
> These also seem fine, but I'm less keen about blocking behavior.

One advantage of the blocking behavior is that it pinpoints memory
leaks from that slab. On the other hand, one can argue that you want
this to block during testing but to be asynchronous in production.
Borrowing someone else's hand, there are probably lots of other arguments
one can make.

> Though, along the ideas of kmem_cache_destroy_rcu(), you might also
> consider renaming this last one to kmem_cache_destroy_rcu_wait/barrier().
> This way, it's RCU focused, and you can deal directly with the question
> of, "how long is too long to block/to memleak?"

Good point!

> Specifically what I mean is that we can still claim a memory leak has
> occurred if one batched kfree_rcu freeing grace period has elapsed since
> the last call to kmem_cache_destroy_rcu_wait/barrier() or
> kmem_cache_destroy_rcu(). In that case, you quit blocking, or you quit
> asynchronously waiting, and then you splat about a memleak like we have
> now.

How about a kmem_cache_destroy_rcu() that marks that specified cache
for destruction, and then a kmem_cache_destroy_barrier() that waits?

I took the liberty of adding your name to the Google document [1] and
adding this section:

kmem_cache_destroy_rcu/_barrier()

The idea here is to provide an asynchronous
kmem_cache_destroy_rcu() as described above along with a
kmem_cache_destroy_barrier() that waits for the destruction
of all prior kmem_cache instances previously passed
to kmem_cache_destroy_rcu().  Alternatively, kmem_cache_destroy_rcu()
could return a cookie that could be passed into a later call to
kmem_cache_destroy_barrier().  This alternative has the
advantage of isolating which kmem_cache instance is suffering
the memory leak.

Please let me know if either liberty is in any way problematic.

> But then, if that mechanism generally works, we don't really need a new
> function and we can just go with the first option of making
> kmem_cache_destroy() asynchronously wait. It'll wait, as you described,
> but then we adjust the tail of every kfree_rcu batch freeing cycle to
> check if there are _still_ any old outstanding kmem_cache_destroy()
> requests. If so, then we can splat and keep the old debugging info we
> currently have for finding memleaks.

The mechanism can always be sabotaged by memory-leak bugs on the part
of the user of the kmem_cache structure in play, right?

OK, but I see your point. I added this to the existing
"kmem_cache_destroy() Lingers for kfree_rcu()" section:

One way of preserving this debugging information is to splat if
all of the slab’s memory has not been freed within a reasonable
timeframe, perhaps the same 21 seconds that causes an RCU CPU
stall warning.

Does that capture it?

Thanx, Paul

[1] https://docs.google.com/document/d/1v0rcZLvvjVGejT3523W0rDy_sLFu2LWc_NR3fQItZaA/edit?usp=sharing

2024-06-13 12:47:52

by Paul E. McKenney

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 01:58:59PM +0200, Jason A. Donenfeld wrote:
> On Wed, Jun 12, 2024 at 03:37:55PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> > > On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > > > Since SLOB was removed, it is not necessary to use call_rcu
> > > > when the callback only performs kmem_cache_free. Use
> > > > kfree_rcu() directly.
> > > >
> > > > The changes were done using the following Coccinelle semantic patch.
> > > > This semantic patch is designed to ignore cases where the callback
> > > > function is used in another way.
> > >
> > > How does the discussion on:
> > > [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> > > https://lore.kernel.org/all/[email protected]/
> > > reflect on this series? IIUC we should hold off..
> >
> > We do need to hold off for the ones in kernel modules (such as 07/14)
> > where the kmem_cache is destroyed during module unload.
> >
> > OK, I might as well go through them...
> >
> > [PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> > Needs to wait, see wg_allowedips_slab_uninit().
>
> Also, notably, this patch needs additionally:
>
> diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
> index e4e1638fce1b..c95f6937c3f1 100644
> --- a/drivers/net/wireguard/allowedips.c
> +++ b/drivers/net/wireguard/allowedips.c
> @@ -377,7 +377,6 @@ int __init wg_allowedips_slab_init(void)
>
> void wg_allowedips_slab_uninit(void)
> {
> - rcu_barrier();
> kmem_cache_destroy(node_cache);
> }
>
> Once kmem_cache_destroy has been fixed to be deferrable.
>
> I assume the other patches are similar -- an rcu_barrier() can be
> removed. So some manual meddling of these might be in order.

Assuming that the deferrable kmem_cache_destroy() is the option chosen,
agreed.

Thanx, Paul

2024-06-13 13:07:32

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 05:47:08AM -0700, Paul E. McKenney wrote:
> [...]
>
> Assuming that the deferrable kmem_cache_destroy() is the option chosen,
> agreed.
>
<snip>
void kmem_cache_destroy(struct kmem_cache *s)
{
	int err = -EBUSY;
	bool rcu_set;

	if (unlikely(!s) || !kasan_check_byte(s))
		return;

	cpus_read_lock();
	mutex_lock(&slab_mutex);

	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;

	s->refcount--;
	if (s->refcount)
		goto out_unlock;

	err = shutdown_cache(s);
	WARN(err, "%s %s: Slab cache still has objects when called from %pS",
	     __func__, s->name, (void *)_RET_IP_);
	...
	cpus_read_unlock();
	if (!err && !rcu_set)
		kmem_cache_release(s);
}
<snip>

So we already have the SLAB_TYPESAFE_BY_RCU flag, which defers freeing
slab pages and the cache itself by a grace period. A similar flag could
be added, say SLAB_DESTROY_ONCE_FULLY_FREED; in that case a worker would
rearm itself if there are still objects that should be freed.

Any thoughts here?

--
Uladzislau Rezki

2024-06-13 14:17:46

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Wed, 12 Jun 2024 20:38:02 -0700 Paul E. McKenney wrote:
> o Make rcu_barrier() wait for kfree_rcu() objects. (This is
> surprisingly complex and will wait unnecessarily in some cases.
> However, it does preserve current code.)

Not sure how much mental capacity for API variations we expect from
people using caches, but I feel like this would score the highest
on Rusty's API scale. I'd even venture an opinion that it's less
confusing to require cache users to have their own (trivial) callbacks
than add API variants we can't error check even at runtime...

2024-06-13 14:53:31

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 07:17:38AM -0700, Jakub Kicinski wrote:
> On Wed, 12 Jun 2024 20:38:02 -0700 Paul E. McKenney wrote:
> > o Make rcu_barrier() wait for kfree_rcu() objects. (This is
> > surprisingly complex and will wait unnecessarily in some cases.
> > However, it does preserve current code.)
>
> Not sure how much mental capacity for API variations we expect from
> people using caches, but I feel like this would score the highest
> on Rusty's API scale. I'd even venture an opinion that it's less
> confusing to require cache users to have their own (trivial) callbacks
> than add API variants we can't error check even at runtime...

Fair point, though please see Jason's emails.

And the underlying within-RCU mechanism is the same either way, so that
API decision can be deferred for some time.

But the within-slab mechanism does have the advantage of also possibly
simplifying reference-counting and the potential upcoming hazard pointers.
On the other hand, I currently have no idea what level of violence this
change would make to the slab subsystem.

Thanx, Paul

2024-06-13 15:06:38

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 03:06:54PM +0200, Uladzislau Rezki wrote:
> [...]
>
> So we already have the SLAB_TYPESAFE_BY_RCU flag, which defers freeing
> slab pages and the cache itself by a grace period. A similar flag could
> be added, say SLAB_DESTROY_ONCE_FULLY_FREED; in that case a worker would
> rearm itself if there are still objects that should be freed.
>
> Any thoughts here?

Wouldn't we also need some additional code to later check for all objects
being freed to the slab, whether or not that code is initiated from
kmem_cache_destroy()?

Either way, I am adding the SLAB_DESTROY_ONCE_FULLY_FREED possibility,
thank you! [1]

Thanx, Paul

[1] https://docs.google.com/document/d/1v0rcZLvvjVGejT3523W0rDy_sLFu2LWc_NR3fQItZaA/edit?usp=sharing

2024-06-13 17:46:13

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 07:38:59PM +0200, Uladzislau Rezki wrote:
> > [...]
> > Wouldn't we also need some additional code to later check for all objects
> > being freed to the slab, whether or not that code is initiated from
> > kmem_cache_destroy()?
> >
> Same way as SLAB_TYPESAFE_BY_RCU is handled from the kmem_cache_destroy() function:
> it checks that flag and, if it is set, an extra worker is scheduled to perform a
> deferred (instead of immediate) destroy after rcu_barrier() finishes.

Like this?

SLAB_DESTROY_ONCE_FULLY_FREED

Instead of adding a new kmem_cache_destroy_rcu()
or kmem_cache_destroy_wait() API member, instead add a
SLAB_DESTROY_ONCE_FULLY_FREED flag that can be passed to the
existing kmem_cache_destroy() function. Use of this flag would
suppress any warnings that would otherwise be issued if there
was still slab memory yet to be freed, and it would also spawn
workqueues (or timers or whatever) to do any needed cleanup work.

Thanx, Paul

2024-06-13 17:58:51

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 10:45:59AM -0700, Paul E. McKenney wrote:
> [...]
>
> Like this?
>
> SLAB_DESTROY_ONCE_FULLY_FREED
>
> Instead of adding a new kmem_cache_destroy_rcu()
> or kmem_cache_destroy_wait() API member, instead add a
> SLAB_DESTROY_ONCE_FULLY_FREED flag that can be passed to the
> existing kmem_cache_destroy() function.  Use of this flag would
> suppress any warnings that would otherwise be issued if there
> was still slab memory yet to be freed, and it would also spawn
> workqueues (or timers or whatever) to do any needed cleanup work.
>
>
The flag is passed like all the others when creating a cache:

slab = kmem_cache_create(name, size, ..., SLAB_DESTROY_ONCE_FULLY_FREED | OTHER_FLAGS, NULL);

The rest of the description looks correct to me.

--
Uladzislau Rezki

2024-06-13 18:14:23

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 07:58:17PM +0200, Uladzislau Rezki wrote:
> [...]
> The flag is passed like all the others when creating a cache:
>
> slab = kmem_cache_create(name, size, ..., SLAB_DESTROY_ONCE_FULLY_FREED | OTHER_FLAGS, NULL);
>
> The rest of the description looks correct to me.

Good catch, fixed, thank you!

Thanx, Paul

2024-06-14 12:35:48

by Uladzislau Rezki

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Thu, Jun 13, 2024 at 11:13:52AM -0700, Paul E. McKenney wrote:
> [...]
>
> Good catch, fixed, thank you!
>
And here we go with a prototype (untested):

<snip>
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 7247e217e21b..700b8a909f8a 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -59,6 +59,7 @@ enum _slab_flag_bits {
 #ifdef CONFIG_SLAB_OBJ_EXT
 	_SLAB_NO_OBJ_EXT,
 #endif
+	_SLAB_DEFER_DESTROY,
 	_SLAB_FLAGS_LAST_BIT
 };
 
@@ -139,6 +140,7 @@ enum _slab_flag_bits {
  */
 /* Defer freeing slabs to RCU */
 #define SLAB_TYPESAFE_BY_RCU	__SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU)
+#define SLAB_DEFER_DESTROY	__SLAB_FLAG_BIT(_SLAB_DEFER_DESTROY)
 /* Trace allocations and frees */
 #define SLAB_TRACE	__SLAB_FLAG_BIT(_SLAB_TRACE)
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 1560a1546bb1..99458a0197b5 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -45,6 +45,11 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work);
 static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
 		    slab_caches_to_rcu_destroy_workfn);
 
+static LIST_HEAD(slab_caches_defer_destroy);
+static void slab_caches_defer_destroy_workfn(struct work_struct *work);
+static DECLARE_DELAYED_WORK(slab_caches_defer_destroy_work,
+			    slab_caches_defer_destroy_workfn);
+
 /*
  * Set of flags that will prevent slab merging
  */
@@ -448,6 +453,31 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
 	}
 }
 
+static void
+slab_caches_defer_destroy_workfn(struct work_struct *work)
+{
+	struct kmem_cache *s, *s2;
+
+	mutex_lock(&slab_mutex);
+	list_for_each_entry_safe(s, s2, &slab_caches_defer_destroy, list) {
+		if (__kmem_cache_empty(s)) {
+			/* free asan quarantined objects */
+			kasan_cache_shutdown(s);
+			(void) __kmem_cache_shutdown(s);
+
+			list_del(&s->list);
+
+			debugfs_slab_release(s);
+			kfence_shutdown_cache(s);
+			kmem_cache_release(s);
+		}
+	}
+	mutex_unlock(&slab_mutex);
+
+	if (!list_empty(&slab_caches_defer_destroy))
+		schedule_delayed_work(&slab_caches_defer_destroy_work, HZ);
+}
+
 static int shutdown_cache(struct kmem_cache *s)
 {
 	/* free asan quarantined objects */
@@ -493,6 +523,13 @@ void kmem_cache_destroy(struct kmem_cache *s)
 	if (s->refcount)
 		goto out_unlock;
 
+	/* Should a destroy process be deferred? */
+	if (s->flags & SLAB_DEFER_DESTROY) {
+		list_move_tail(&s->list, &slab_caches_defer_destroy);
+		schedule_delayed_work(&slab_caches_defer_destroy_work, HZ);
+		goto out_unlock;
+	}
+
 	err = shutdown_cache(s);
 	WARN(err, "%s %s: Slab cache still has objects when called from %pS",
 	     __func__, s->name, (void *)_RET_IP_);
<snip>

Thanks!

--
Uladzislau Rezki

2024-06-14 14:18:00

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Fri, Jun 14, 2024 at 02:35:33PM +0200, Uladzislau Rezki wrote:
> On Thu, Jun 13, 2024 at 11:13:52AM -0700, Paul E. McKenney wrote:
> > On Thu, Jun 13, 2024 at 07:58:17PM +0200, Uladzislau Rezki wrote:
> > > On Thu, Jun 13, 2024 at 10:45:59AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Jun 13, 2024 at 07:38:59PM +0200, Uladzislau Rezki wrote:
> > > > > On Thu, Jun 13, 2024 at 08:06:30AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Jun 13, 2024 at 03:06:54PM +0200, Uladzislau Rezki wrote:
> > > > > > > On Thu, Jun 13, 2024 at 05:47:08AM -0700, Paul E. McKenney wrote:
> > > > > > > > On Thu, Jun 13, 2024 at 01:58:59PM +0200, Jason A. Donenfeld wrote:
> > > > > > > > > On Wed, Jun 12, 2024 at 03:37:55PM -0700, Paul E. McKenney wrote:
> > > > > > > > > > On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> > > > > > > > > > > On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > > > > > > > > > > > Since SLOB was removed, it is not necessary to use call_rcu
> > > > > > > > > > > > when the callback only performs kmem_cache_free. Use
> > > > > > > > > > > > kfree_rcu() directly.
> > > > > > > > > > > >
> > > > > > > > > > > > The changes were done using the following Coccinelle semantic patch.
> > > > > > > > > > > > This semantic patch is designed to ignore cases where the callback
> > > > > > > > > > > > function is used in another way.
> > > > > > > > > > >
> > > > > > > > > > > How does the discussion on:
> > > > > > > > > > > [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> > > > > > > > > > > https://lore.kernel.org/all/[email protected]/
> > > > > > > > > > > reflect on this series? IIUC we should hold off..
> > > > > > > > > >
> > > > > > > > > > We do need to hold off for the ones in kernel modules (such as 07/14)
> > > > > > > > > > where the kmem_cache is destroyed during module unload.
> > > > > > > > > >
> > > > > > > > > > OK, I might as well go through them...
> > > > > > > > > >
> > > > > > > > > > [PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> > > > > > > > > > Needs to wait, see wg_allowedips_slab_uninit().
> > > > > > > > >
> > > > > > > > > Also, notably, this patch needs additionally:
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
> > > > > > > > > index e4e1638fce1b..c95f6937c3f1 100644
> > > > > > > > > --- a/drivers/net/wireguard/allowedips.c
> > > > > > > > > +++ b/drivers/net/wireguard/allowedips.c
> > > > > > > > > @@ -377,7 +377,6 @@ int __init wg_allowedips_slab_init(void)
> > > > > > > > >
> > > > > > > > > void wg_allowedips_slab_uninit(void)
> > > > > > > > > {
> > > > > > > > > - rcu_barrier();
> > > > > > > > > kmem_cache_destroy(node_cache);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > Once kmem_cache_destroy has been fixed to be deferrable.
> > > > > > > > >
> > > > > > > > > I assume the other patches are similar -- an rcu_barrier() can be
> > > > > > > > > removed. So some manual meddling of these might be in order.
> > > > > > > >
> > > > > > > > Assuming that the deferrable kmem_cache_destroy() is the option chosen,
> > > > > > > > agreed.
> > > > > > > >
> > > > > > > <snip>
> > > > > > > void kmem_cache_destroy(struct kmem_cache *s)
> > > > > > > {
> > > > > > > int err = -EBUSY;
> > > > > > > bool rcu_set;
> > > > > > >
> > > > > > > if (unlikely(!s) || !kasan_check_byte(s))
> > > > > > > return;
> > > > > > >
> > > > > > > cpus_read_lock();
> > > > > > > mutex_lock(&slab_mutex);
> > > > > > >
> > > > > > > rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
> > > > > > >
> > > > > > > s->refcount--;
> > > > > > > if (s->refcount)
> > > > > > > goto out_unlock;
> > > > > > >
> > > > > > > err = shutdown_cache(s);
> > > > > > > WARN(err, "%s %s: Slab cache still has objects when called from %pS",
> > > > > > > __func__, s->name, (void *)_RET_IP_);
> > > > > > > ...
> > > > > > > cpus_read_unlock();
> > > > > > > if (!err && !rcu_set)
> > > > > > > kmem_cache_release(s);
> > > > > > > }
> > > > > > > <snip>
> > > > > > >
> > > > > > > so we have the SLAB_TYPESAFE_BY_RCU flag that defers freeing slab pages
> > > > > > > and the cache by a grace period. A similar flag can be added, like
> > > > > > > SLAB_DESTROY_ONCE_FULLY_FREED; in this case a worker rearms itself
> > > > > > > if there are still objects which should be freed.
> > > > > > >
> > > > > > > Any thoughts here?
> > > > > >
> > > > > > Wouldn't we also need some additional code to later check for all objects
> > > > > > being freed to the slab, whether or not that code is initiated from
> > > > > > kmem_cache_destroy()?
> > > > > >
> > > > > Same way as SLAB_TYPESAFE_BY_RCU is handled from the kmem_cache_destroy() function.
> > > > > It checks that flag, and if it is set, an extra worker is scheduled to perform a
> > > > > deferred destroy (instead of destroying right away) after rcu_barrier() finishes.
> > > >
> > > > Like this?
> > > >
> > > > SLAB_DESTROY_ONCE_FULLY_FREED
> > > >
> > > > Instead of adding a new kmem_cache_destroy_rcu()
> > > > or kmem_cache_destroy_wait() API member, instead add a
> > > > SLAB_DESTROY_ONCE_FULLY_FREED flag that can be passed to the
> > > > existing kmem_cache_destroy() function. Use of this flag would
> > > > suppress any warnings that would otherwise be issued if there
> > > > was still slab memory yet to be freed, and it would also spawn
> > > > workqueues (or timers or whatever) to do any needed cleanup work.
> > > >
> > > >
> > > The flag is passed as all others during creating a cache:
> > >
> > > slab = kmem_cache_create(name, size, ..., SLAB_DESTROY_ONCE_FULLY_FREED | OTHER_FLAGS, NULL);
> > >
> > > the rest description is correct to me.
> >
> > Good catch, fixed, thank you!
> >
> And here we go with a prototype (untested):

Thank you for putting this together! It looks way simpler than I would
have guessed, and quite a bit simpler than I would expect it would be
to extend rcu_barrier() to cover kfree_rcu().

> <snip>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 7247e217e21b..700b8a909f8a 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -59,6 +59,7 @@ enum _slab_flag_bits {
> #ifdef CONFIG_SLAB_OBJ_EXT
> _SLAB_NO_OBJ_EXT,
> #endif
> + _SLAB_DEFER_DESTROY,
> _SLAB_FLAGS_LAST_BIT
> };
>
> @@ -139,6 +140,7 @@ enum _slab_flag_bits {
> */
> /* Defer freeing slabs to RCU */
> #define SLAB_TYPESAFE_BY_RCU __SLAB_FLAG_BIT(_SLAB_TYPESAFE_BY_RCU)
> +#define SLAB_DEFER_DESTROY __SLAB_FLAG_BIT(_SLAB_DEFER_DESTROY)
> /* Trace allocations and frees */
> #define SLAB_TRACE __SLAB_FLAG_BIT(_SLAB_TRACE)
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 1560a1546bb1..99458a0197b5 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -45,6 +45,11 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work);
> static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
> slab_caches_to_rcu_destroy_workfn);
>
> +static LIST_HEAD(slab_caches_defer_destroy);
> +static void slab_caches_defer_destroy_workfn(struct work_struct *work);
> +static DECLARE_DELAYED_WORK(slab_caches_defer_destroy_work,
> + slab_caches_defer_destroy_workfn);
> +
> /*
> * Set of flags that will prevent slab merging
> */
> @@ -448,6 +453,31 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
> }
> }
>
> +static void
> +slab_caches_defer_destroy_workfn(struct work_struct *work)
> +{
> + struct kmem_cache *s, *s2;
> +
> + mutex_lock(&slab_mutex);
> + list_for_each_entry_safe(s, s2, &slab_caches_defer_destroy, list) {
> + if (__kmem_cache_empty(s)) {
> + /* free asan quarantined objects */
> + kasan_cache_shutdown(s);
> + (void) __kmem_cache_shutdown(s);
> +
> + list_del(&s->list);
> +
> + debugfs_slab_release(s);
> + kfence_shutdown_cache(s);
> + kmem_cache_release(s);
> + }

My guess is that there would want to be a splat if the slab stuck around
for too long, but maybe that should instead be handled elsewhere or in
some other way? I must defer to you guys on that one.

Thanx, Paul

> + }
> + mutex_unlock(&slab_mutex);
> +
> + if (!list_empty(&slab_caches_defer_destroy))
> + schedule_delayed_work(&slab_caches_defer_destroy_work, HZ);
> +}
> +
> static int shutdown_cache(struct kmem_cache *s)
> {
> /* free asan quarantined objects */
> @@ -493,6 +523,13 @@ void kmem_cache_destroy(struct kmem_cache *s)
> if (s->refcount)
> goto out_unlock;
>
> + /* Should a destroy process be deferred? */
> + if (s->flags & SLAB_DEFER_DESTROY) {
> + list_move_tail(&s->list, &slab_caches_defer_destroy);
> + schedule_delayed_work(&slab_caches_defer_destroy_work, HZ);
> + goto out_unlock;
> + }
> +
> err = shutdown_cache(s);
> WARN(err, "%s %s: Slab cache still has objects when called from %pS",
> __func__, s->name, (void *)_RET_IP_);
> <snip>



2024-06-14 14:51:00

by Uladzislau Rezki

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Fri, Jun 14, 2024 at 07:17:29AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 14, 2024 at 02:35:33PM +0200, Uladzislau Rezki wrote:
> > On Thu, Jun 13, 2024 at 11:13:52AM -0700, Paul E. McKenney wrote:
> > > On Thu, Jun 13, 2024 at 07:58:17PM +0200, Uladzislau Rezki wrote:
> > > > On Thu, Jun 13, 2024 at 10:45:59AM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Jun 13, 2024 at 07:38:59PM +0200, Uladzislau Rezki wrote:
> > > > > > On Thu, Jun 13, 2024 at 08:06:30AM -0700, Paul E. McKenney wrote:
> > > > > > > On Thu, Jun 13, 2024 at 03:06:54PM +0200, Uladzislau Rezki wrote:
> > > > > > > > On Thu, Jun 13, 2024 at 05:47:08AM -0700, Paul E. McKenney wrote:
> > > > > > > > > On Thu, Jun 13, 2024 at 01:58:59PM +0200, Jason A. Donenfeld wrote:
> > > > > > > > > > On Wed, Jun 12, 2024 at 03:37:55PM -0700, Paul E. McKenney wrote:
> > > > > > > > > > > On Wed, Jun 12, 2024 at 02:33:05PM -0700, Jakub Kicinski wrote:
> > > > > > > > > > > > On Sun, 9 Jun 2024 10:27:12 +0200 Julia Lawall wrote:
> > > > > > > > > > > > > Since SLOB was removed, it is not necessary to use call_rcu
> > > > > > > > > > > > > when the callback only performs kmem_cache_free. Use
> > > > > > > > > > > > > kfree_rcu() directly.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The changes were done using the following Coccinelle semantic patch.
> > > > > > > > > > > > > This semantic patch is designed to ignore cases where the callback
> > > > > > > > > > > > > function is used in another way.
> > > > > > > > > > > >
> > > > > > > > > > > > How does the discussion on:
> > > > > > > > > > > > [PATCH] Revert "batman-adv: prefer kfree_rcu() over call_rcu() with free-only callbacks"
> > > > > > > > > > > > https://lore.kernel.org/all/[email protected]/
> > > > > > > > > > > > reflect on this series? IIUC we should hold off..
> > > > > > > > > > >
> > > > > > > > > > > We do need to hold off for the ones in kernel modules (such as 07/14)
> > > > > > > > > > > where the kmem_cache is destroyed during module unload.
> > > > > > > > > > >
> > > > > > > > > > > OK, I might as well go through them...
> > > > > > > > > > >
> > > > > > > > > > > [PATCH 01/14] wireguard: allowedips: replace call_rcu by kfree_rcu for simple kmem_cache_free callback
> > > > > > > > > > > Needs to wait, see wg_allowedips_slab_uninit().
> > > > > > > > > >
> > > > > > > > > > Also, notably, this patch needs additionally:
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c
> > > > > > > > > > index e4e1638fce1b..c95f6937c3f1 100644
> > > > > > > > > > --- a/drivers/net/wireguard/allowedips.c
> > > > > > > > > > +++ b/drivers/net/wireguard/allowedips.c
> > > > > > > > > > @@ -377,7 +377,6 @@ int __init wg_allowedips_slab_init(void)
> > > > > > > > > >
> > > > > > > > > > void wg_allowedips_slab_uninit(void)
> > > > > > > > > > {
> > > > > > > > > > - rcu_barrier();
> > > > > > > > > > kmem_cache_destroy(node_cache);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > Once kmem_cache_destroy has been fixed to be deferrable.
> > > > > > > > > >
> > > > > > > > > > I assume the other patches are similar -- an rcu_barrier() can be
> > > > > > > > > > removed. So some manual meddling of these might be in order.
> > > > > > > > >
> > > > > > > > > Assuming that the deferrable kmem_cache_destroy() is the option chosen,
> > > > > > > > > agreed.
> > > > > > > > >
> > > > > > > > <snip>
> > > > > > > > void kmem_cache_destroy(struct kmem_cache *s)
> > > > > > > > {
> > > > > > > > int err = -EBUSY;
> > > > > > > > bool rcu_set;
> > > > > > > >
> > > > > > > > if (unlikely(!s) || !kasan_check_byte(s))
> > > > > > > > return;
> > > > > > > >
> > > > > > > > cpus_read_lock();
> > > > > > > > mutex_lock(&slab_mutex);
> > > > > > > >
> > > > > > > > rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
> > > > > > > >
> > > > > > > > s->refcount--;
> > > > > > > > if (s->refcount)
> > > > > > > > goto out_unlock;
> > > > > > > >
> > > > > > > > err = shutdown_cache(s);
> > > > > > > > WARN(err, "%s %s: Slab cache still has objects when called from %pS",
> > > > > > > > __func__, s->name, (void *)_RET_IP_);
> > > > > > > > ...
> > > > > > > > cpus_read_unlock();
> > > > > > > > if (!err && !rcu_set)
> > > > > > > > kmem_cache_release(s);
> > > > > > > > }
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > so we have the SLAB_TYPESAFE_BY_RCU flag that defers freeing slab pages
> > > > > > > > and the cache by a grace period. A similar flag can be added, like
> > > > > > > > SLAB_DESTROY_ONCE_FULLY_FREED; in this case a worker rearms itself
> > > > > > > > if there are still objects which should be freed.
> > > > > > > >
> > > > > > > > Any thoughts here?
> > > > > > >
> > > > > > > Wouldn't we also need some additional code to later check for all objects
> > > > > > > being freed to the slab, whether or not that code is initiated from
> > > > > > > kmem_cache_destroy()?
> > > > > > >
> > > > > > Same way as SLAB_TYPESAFE_BY_RCU is handled from the kmem_cache_destroy() function.
> > > > > > It checks that flag, and if it is set, an extra worker is scheduled to perform a
> > > > > > deferred destroy (instead of destroying right away) after rcu_barrier() finishes.
> > > > >
> > > > > Like this?
> > > > >
> > > > > SLAB_DESTROY_ONCE_FULLY_FREED
> > > > >
> > > > > Instead of adding a new kmem_cache_destroy_rcu()
> > > > > or kmem_cache_destroy_wait() API member, instead add a
> > > > > SLAB_DESTROY_ONCE_FULLY_FREED flag that can be passed to the
> > > > > existing kmem_cache_destroy() function.  Use of this flag would
> > > > > suppress any warnings that would otherwise be issued if there
> > > > > was still slab memory yet to be freed, and it would also spawn
> > > > > workqueues (or timers or whatever) to do any needed cleanup work.
> > > > >
> > > > >
> > > > The flag is passed as all others during creating a cache:
> > > >
> > > > slab = kmem_cache_create(name, size, ..., SLAB_DESTROY_ONCE_FULLY_FREED | OTHER_FLAGS, NULL);
> > > >
> > > > the rest description is correct to me.
> > >
> > > Good catch, fixed, thank you!
> > >
> > And here we go with a prototype (untested):
>
> Thank you for putting this together! It looks way simpler than I would
> have guessed, and quite a bit simpler than I would expect it would be
> to extend rcu_barrier() to cover kfree_rcu().
>
Yep, it should be pretty straightforward. The slab mechanism does not
currently support deferred destruction, i.e. it is not allowed to destroy
a cache whose objects have not all been freed:

<snip>
void kmem_cache_destroy(struct kmem_cache *s)
{
...
err = shutdown_cache(s);
WARN(err, "%s %s: Slab cache still has objects when called from %pS",
__func__, s->name, (void *)_RET_IP_);
...
<snip>

So, this patch extends it.
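To make the deferred-destroy flow easier to follow, here is a minimal
userspace sketch of what the rearming worker in the patch above does.
The names (fake_cache, defer_destroy_tick) are illustrative stand-ins,
not kernel API; the real worker walks slab_caches_defer_destroy under
slab_mutex and reschedules itself with schedule_delayed_work().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for a kmem_cache flagged SLAB_DEFER_DESTROY. */
struct fake_cache {
	int nr_objects;      /* outstanding allocations */
	bool defer_destroy;  /* SLAB_DEFER_DESTROY analogue */
	bool destroyed;
};

/*
 * One pass of the delayed worker: tear the cache down if it is empty,
 * otherwise report that the work must be rescheduled (return true),
 * mirroring the schedule_delayed_work(..., HZ) rearm in the patch.
 */
static bool defer_destroy_tick(struct fake_cache *c)
{
	if (!c->defer_destroy || c->destroyed)
		return false;
	if (c->nr_objects == 0) {
		/* stands in for __kmem_cache_shutdown() + kmem_cache_release() */
		c->destroyed = true;
		return false;
	}
	return true;	/* still has objects: rearm */
}
```

The worker therefore polls: as long as objects remain, each tick leaves
the cache on the deferred list and requests another pass a second later.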

> >
> > +static void
> > +slab_caches_defer_destroy_workfn(struct work_struct *work)
> > +{
> > + struct kmem_cache *s, *s2;
> > +
> > + mutex_lock(&slab_mutex);
> > + list_for_each_entry_safe(s, s2, &slab_caches_defer_destroy, list) {
> > + if (__kmem_cache_empty(s)) {
> > + /* free asan quarantined objects */
> > + kasan_cache_shutdown(s);
> > + (void) __kmem_cache_shutdown(s);
> > +
> > + list_del(&s->list);
> > +
> > + debugfs_slab_release(s);
> > + kfence_shutdown_cache(s);
> > + kmem_cache_release(s);
> > + }
>
> My guess is that there would want to be a splat if the slab stuck around
> for too long, but maybe that should instead be handled elsewhere or in
> some other way? I must defer to you guys on that one.
>
Probably yes.
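One way such a splat could look, sketched in userspace: count how many
times the worker has rearmed for a given cache and warn once past a
threshold. The threshold value and the helper names here are purely
hypothetical, not from the posted patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stall threshold: ~60 rearms at HZ intervals, i.e. a minute. */
#define DEFER_DESTROY_WARN_AFTER 60

struct defer_state {
	int rearms;	/* how many times the worker has rescheduled */
	bool warned;	/* warn only once per cache */
};

/*
 * Called on each worker pass that leaves the cache on the deferred list.
 * Returns true exactly once, when the cache has been stuck too long;
 * the caller would then WARN() with the cache name.
 */
static bool defer_destroy_check_stall(struct defer_state *st)
{
	st->rearms++;
	if (!st->warned && st->rearms > DEFER_DESTROY_WARN_AFTER) {
		st->warned = true;
		return true;
	}
	return false;
}
```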

--
Uladzislau Rezki

2024-06-14 19:34:05

by Jason A. Donenfeld

Subject: Re: [PATCH 00/14] replace call_rcu by kfree_rcu for simple kmem_cache_free callback

On Fri, Jun 14, 2024 at 02:35:33PM +0200, Uladzislau Rezki wrote:
> + /* Should a destroy process be deferred? */
> + if (s->flags & SLAB_DEFER_DESTROY) {
> + list_move_tail(&s->list, &slab_caches_defer_destroy);
> + schedule_delayed_work(&slab_caches_defer_destroy_work, HZ);
> + goto out_unlock;
> + }

Wouldn't it be smoother to have the actual kmem_cache_free() function
check to see if it's been marked for destruction and the refcount is
zero, rather than polling every second? I mentioned this approach
in: https://lore.kernel.org/all/[email protected]/ -

I wonder if the right fix to this would be adding a `should_destroy`
boolean to kmem_cache, which kmem_cache_destroy() sets to true. And
then right after it checks `if (number_of_allocations == 0)
actually_destroy()`, and likewise on each kmem_cache_free(), it
could check `if (should_destroy && number_of_allocations == 0)
actually_destroy()`.
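That alternative can be sketched the same way: instead of a polling
worker, the free path itself completes the destruction when the last
object goes away. All names below (fake_cache, actually_destroy, etc.)
are illustrative userspace stand-ins for the suggested should_destroy
scheme, not existing slab API.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_cache {
	int nr_objects;       /* outstanding allocations */
	bool should_destroy;  /* set by the destroy call */
	bool destroyed;
};

/* Stands in for the real shutdown + release of the cache. */
static void actually_destroy(struct fake_cache *c)
{
	c->destroyed = true;
}

/* kmem_cache_destroy() analogue: mark, and finish now only if empty. */
static void fake_cache_destroy(struct fake_cache *c)
{
	c->should_destroy = true;
	if (c->nr_objects == 0)
		actually_destroy(c);
}

/* kmem_cache_free() analogue: the last free completes a pending destroy. */
static void fake_cache_free(struct fake_cache *c)
{
	c->nr_objects--;
	if (c->should_destroy && c->nr_objects == 0)
		actually_destroy(c);
}
```

The trade-off versus the delayed worker is a branch on every free in
exchange for destroying the cache immediately when it empties, with no
periodic wakeups.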

Jason