Changes since v2:
Addressed feedback from Vlastimil. Thanks!
Dropped the kmalloc subsystem generalization patches.
I did more work generalizing the kmalloc subsystem and those
patches have grown quite a bit, so I'll send them as a separate
RFC series.
Changes since v1:
Addressed comments from Matthew, Vlastimil, and Rientjes.
Thank you for the feedback!
Hello, these are cleanup patches for SLUB.
Please consider them for slab-next :)
Any comments will be appreciated.
Thanks.
Hyeonggon Yoo (2):
mm/slub: limit number of node partial slabs only in cache creation
mm/slub: refactor deactivate_slab()
mm/slub.c | 105 +++++++++++++++++++++---------------------------------
1 file changed, 41 insertions(+), 64 deletions(-)
--
2.33.1
Simplify deactivate_slab() by unlocking n->list_lock and retrying
cmpxchg_double() when cmpxchg_double() fails, and by performing
add_{partial,full} only when it succeeds.
Releasing and taking n->list_lock again here is not harmful as SLUB
avoids deactivating slabs as much as possible.
[ [email protected]: perform add_{partial,full} when cmpxchg_double()
succeeds.
count deactivating full slabs even if the debugging flag is not set. ]
Signed-off-by: Hyeonggon Yoo <[email protected]>
---
mm/slub.c | 91 +++++++++++++++++++++++--------------------------------
1 file changed, 38 insertions(+), 53 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 1ce09b0347ad..f0cb9d0443ac 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2348,10 +2348,10 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
void *freelist)
{
- enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
+ enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE, M_FULL_NOLIST };
struct kmem_cache_node *n = get_node(s, slab_nid(slab));
- int lock = 0, free_delta = 0;
- enum slab_modes l = M_NONE, m = M_NONE;
+ int free_delta = 0;
+ enum slab_modes mode = M_NONE;
void *nextfree, *freelist_iter, *freelist_tail;
int tail = DEACTIVATE_TO_HEAD;
unsigned long flags = 0;
@@ -2393,14 +2393,10 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
* Ensure that the slab is unfrozen while the list presence
* reflects the actual number of objects during unfreeze.
*
- * We setup the list membership and then perform a cmpxchg
- * with the count. If there is a mismatch then the slab
- * is not unfrozen but the slab is on the wrong list.
- *
- * Then we restart the process which may have to remove
- * the slab from the list that we just put it on again
- * because the number of objects in the slab may have
- * changed.
+ * We first perform cmpxchg holding the lock and insert into the
+ * list only when it succeeds. If there is a mismatch then the
+ * slab is not unfrozen and the number of objects in the slab may
+ * have changed. Then release the lock and retry the cmpxchg.
*/
redo:
@@ -2420,61 +2416,50 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
new.frozen = 0;
if (!new.inuse && n->nr_partial >= s->min_partial)
- m = M_FREE;
+ mode = M_FREE;
else if (new.freelist) {
- m = M_PARTIAL;
- if (!lock) {
- lock = 1;
- /*
- * Taking the spinlock removes the possibility that
- * acquire_slab() will see a slab that is frozen
- */
- spin_lock_irqsave(&n->list_lock, flags);
- }
- } else {
- m = M_FULL;
- if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
- lock = 1;
- /*
- * This also ensures that the scanning of full
- * slabs from diagnostic functions will not see
- * any frozen slabs.
- */
- spin_lock_irqsave(&n->list_lock, flags);
- }
- }
-
- if (l != m) {
- if (l == M_PARTIAL)
- remove_partial(n, slab);
- else if (l == M_FULL)
- remove_full(s, n, slab);
+ mode = M_PARTIAL;
+ /*
+ * Taking the spinlock removes the possibility that
+ * acquire_slab() will see a slab that is frozen
+ */
+ spin_lock_irqsave(&n->list_lock, flags);
+ } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
+ mode = M_FULL;
+ /*
+ * This also ensures that the scanning of full
+ * slabs from diagnostic functions will not see
+ * any frozen slabs.
+ */
+ spin_lock_irqsave(&n->list_lock, flags);
+ } else
+ mode = M_FULL_NOLIST;
- if (m == M_PARTIAL)
- add_partial(n, slab, tail);
- else if (m == M_FULL)
- add_full(s, n, slab);
- }
- l = m;
if (!cmpxchg_double_slab(s, slab,
old.freelist, old.counters,
new.freelist, new.counters,
- "unfreezing slab"))
+ "unfreezing slab")) {
+ if (mode == M_PARTIAL || mode == M_FULL)
+ spin_unlock_irqrestore(&n->list_lock, flags);
goto redo;
+ }
- if (lock)
- spin_unlock_irqrestore(&n->list_lock, flags);
- if (m == M_PARTIAL)
+ if (mode == M_PARTIAL) {
+ add_partial(n, slab, tail);
+ spin_unlock_irqrestore(&n->list_lock, flags);
stat(s, tail);
- else if (m == M_FULL)
- stat(s, DEACTIVATE_FULL);
- else if (m == M_FREE) {
+ } else if (mode == M_FREE) {
stat(s, DEACTIVATE_EMPTY);
discard_slab(s, slab);
stat(s, FREE_SLAB);
- }
+ } else if (mode == M_FULL) {
+ add_full(s, n, slab);
+ spin_unlock_irqrestore(&n->list_lock, flags);
+ stat(s, DEACTIVATE_FULL);
+ } else if (mode == M_FULL_NOLIST)
+ stat(s, DEACTIVATE_FULL);
}
#ifdef CONFIG_SLUB_CPU_PARTIAL
--
2.33.1
SLUB sets the minimum number of partial slabs to keep per node
(min_partial) using set_min_partial(). SLUB keeps at least min_partial
slabs around even if they're empty, to avoid excessive use of the page
allocator.
set_min_partial() clamps the value of min_partial between MIN_PARTIAL
and MAX_PARTIAL. As set_min_partial() can also be called from
min_partial_store(), clamp the value of min_partial only in
kmem_cache_open() so that it can later be changed to any value a user
wants.
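For reference, the clamping now open-coded in kmem_cache_open() is
equivalent to the sketch below (illustrative only: the helper and its
name are not part of the patch, and min_partial_store() stores the
user's value without any clamp):
        /* Hypothetical helper, equivalent to the min_t()/max_t() pair in the diff */
        static unsigned long initial_min_partial(unsigned int size)
        {
                unsigned long min = ilog2(size) / 2;

                if (min < MIN_PARTIAL)
                        min = MIN_PARTIAL;
                else if (min > MAX_PARTIAL)
                        min = MAX_PARTIAL;

                return min;
        }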
[ [email protected]: Fold set_min_partial() into its callers ]
Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
---
mm/slub.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/mm/slub.c b/mm/slub.c
index 261474092e43..1ce09b0347ad 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4000,15 +4000,6 @@ static int init_kmem_cache_nodes(struct kmem_cache *s)
return 1;
}
-static void set_min_partial(struct kmem_cache *s, unsigned long min)
-{
- if (min < MIN_PARTIAL)
- min = MIN_PARTIAL;
- else if (min > MAX_PARTIAL)
- min = MAX_PARTIAL;
- s->min_partial = min;
-}
-
static void set_cpu_partial(struct kmem_cache *s)
{
#ifdef CONFIG_SLUB_CPU_PARTIAL
@@ -4215,7 +4206,8 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags)
* The larger the object size is, the more slabs we want on the partial
* list to avoid pounding the page allocator excessively.
*/
- set_min_partial(s, ilog2(s->size) / 2);
+ s->min_partial = min_t(unsigned long, MAX_PARTIAL, ilog2(s->size) / 2);
+ s->min_partial = max_t(unsigned long, MIN_PARTIAL, s->min_partial);
set_cpu_partial(s);
@@ -5396,7 +5388,7 @@ static ssize_t min_partial_store(struct kmem_cache *s, const char *buf,
if (err)
return err;
- set_min_partial(s, min);
+ s->min_partial = min;
return length;
}
SLAB_ATTR(min_partial);
--
2.33.1
On 3/7/22 08:40, Hyeonggon Yoo wrote:
> Simplify deactivate_slab() by unlocking n->list_lock and retrying
> cmpxchg_double() when cmpxchg_double() fails, and perform
> add_{partial,full} only when it succeed.
>
> Releasing and taking n->list_lock again here is not harmful as SLUB
> avoids deactivating slabs as much as possible.
>
> [ [email protected]: perform add_{partial,full} when cmpxchg_double()
> succeed.
>
> count deactivating full slabs even if debugging flag is not set. ]
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
adding both to slab-next. Fixed up some nits myself, see below:
>
> @@ -2420,61 +2416,50 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> new.frozen = 0;
>
> if (!new.inuse && n->nr_partial >= s->min_partial)
> - m = M_FREE;
> + mode = M_FREE;
> else if (new.freelist) {
This was against kernel style even before the patch - we use { } in the
'else if' branch, thus all branches should use { } even if one-line.
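For example (illustrative snippet only, not from the patch; the
cond_*/do_* names are placeholders), kernel style wants:
        if (cond_a) {
                do_a();
        } else if (cond_b) {
                do_b();
                do_b_more();
        } else {
                do_c();
        }
rather than braces on some branches and none on the single-statement
ones.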
> - m = M_PARTIAL;
> - if (!lock) {
> - lock = 1;
> - /*
> - * Taking the spinlock removes the possibility that
> - * acquire_slab() will see a slab that is frozen
> - */
> - spin_lock_irqsave(&n->list_lock, flags);
> - }
> - } else {
> - m = M_FULL;
> - if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> - lock = 1;
> - /*
> - * This also ensures that the scanning of full
> - * slabs from diagnostic functions will not see
> - * any frozen slabs.
> - */
> - spin_lock_irqsave(&n->list_lock, flags);
> - }
> - }
> -
> - if (l != m) {
> - if (l == M_PARTIAL)
> - remove_partial(n, slab);
> - else if (l == M_FULL)
> - remove_full(s, n, slab);
> + mode = M_PARTIAL;
> + /*
> + * Taking the spinlock removes the possibility that
> + * acquire_slab() will see a slab that is frozen
> + */
> + spin_lock_irqsave(&n->list_lock, flags);
> + } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> + mode = M_FULL;
> + /*
> + * This also ensures that the scanning of full
> + * slabs from diagnostic functions will not see
> + * any frozen slabs.
> + */
> + spin_lock_irqsave(&n->list_lock, flags);
> + } else
> + mode = M_FULL_NOLIST;
Ditto here (this is new).
> - if (m == M_PARTIAL)
> - add_partial(n, slab, tail);
> - else if (m == M_FULL)
> - add_full(s, n, slab);
> - }
>
> - l = m;
> if (!cmpxchg_double_slab(s, slab,
> old.freelist, old.counters,
> new.freelist, new.counters,
> - "unfreezing slab"))
> + "unfreezing slab")) {
> + if (mode == M_PARTIAL || mode == M_FULL)
> + spin_unlock_irqrestore(&n->list_lock, flags);
> goto redo;
> + }
>
> - if (lock)
> - spin_unlock_irqrestore(&n->list_lock, flags);
>
> - if (m == M_PARTIAL)
> + if (mode == M_PARTIAL) {
> + add_partial(n, slab, tail);
> + spin_unlock_irqrestore(&n->list_lock, flags);
> stat(s, tail);
> - else if (m == M_FULL)
> - stat(s, DEACTIVATE_FULL);
> - else if (m == M_FREE) {
> + } else if (mode == M_FREE) {
> stat(s, DEACTIVATE_EMPTY);
> discard_slab(s, slab);
> stat(s, FREE_SLAB);
> - }
> + } else if (mode == M_FULL) {
> + add_full(s, n, slab);
> + spin_unlock_irqrestore(&n->list_lock, flags);
> + stat(s, DEACTIVATE_FULL);
> + } else if (mode == M_FULL_NOLIST)
> + stat(s, DEACTIVATE_FULL);
And here.
> }
>
> #ifdef CONFIG_SLUB_CPU_PARTIAL
On Tue, Mar 08, 2022 at 09:40:07AM +0800, Xiongwei Song wrote:
> Hello,
>
> On Mon, Mar 7, 2022 at 3:41 PM Hyeonggon Yoo <[email protected]> wrote:
> >
> > Simplify deactivate_slab() by unlocking n->list_lock and retrying
> > cmpxchg_double() when cmpxchg_double() fails, and perform
> > add_{partial,full} only when it succeed.
> >
> > Releasing and taking n->list_lock again here is not harmful as SLUB
> > avoids deactivating slabs as much as possible.
> >
> > [ [email protected]: perform add_{partial,full} when cmpxchg_double()
> > succeed.
> >
> > count deactivating full slabs even if debugging flag is not set. ]
> >
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > ---
> > mm/slub.c | 91 +++++++++++++++++++++++--------------------------------
> > 1 file changed, 38 insertions(+), 53 deletions(-)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 1ce09b0347ad..f0cb9d0443ac 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2348,10 +2348,10 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
> > static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > void *freelist)
> > {
> > - enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
> > + enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE, M_FULL_NOLIST };
> > struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > - int lock = 0, free_delta = 0;
> > - enum slab_modes l = M_NONE, m = M_NONE;
> > + int free_delta = 0;
> > + enum slab_modes mode = M_NONE;
> > void *nextfree, *freelist_iter, *freelist_tail;
> > int tail = DEACTIVATE_TO_HEAD;
> > unsigned long flags = 0;
> > @@ -2393,14 +2393,10 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > * Ensure that the slab is unfrozen while the list presence
> > * reflects the actual number of objects during unfreeze.
> > *
> > - * We setup the list membership and then perform a cmpxchg
> > - * with the count. If there is a mismatch then the slab
> > - * is not unfrozen but the slab is on the wrong list.
> > - *
> > - * Then we restart the process which may have to remove
> > - * the slab from the list that we just put it on again
> > - * because the number of objects in the slab may have
> > - * changed.
> > + * We first perform cmpxchg holding lock and insert to list
> > + * when it succeed. If there is mismatch then the slab is not
> > + * unfrozen and number of objects in the slab may have changed.
> > + * Then release lock and retry cmpxchg again.
> > */
> > redo:
> >
> > @@ -2420,61 +2416,50 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > new.frozen = 0;
> >
> > if (!new.inuse && n->nr_partial >= s->min_partial)
> > - m = M_FREE;
> > + mode = M_FREE;
> > else if (new.freelist) {
> > - m = M_PARTIAL;
> > - if (!lock) {
> > - lock = 1;
> > - /*
> > - * Taking the spinlock removes the possibility that
> > - * acquire_slab() will see a slab that is frozen
> > - */
> > - spin_lock_irqsave(&n->list_lock, flags);
> > - }
> > - } else {
> > - m = M_FULL;
> > - if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> > - lock = 1;
> > - /*
> > - * This also ensures that the scanning of full
> > - * slabs from diagnostic functions will not see
> > - * any frozen slabs.
> > - */
> > - spin_lock_irqsave(&n->list_lock, flags);
> > - }
> > - }
> > -
> > - if (l != m) {
> > - if (l == M_PARTIAL)
> > - remove_partial(n, slab);
> > - else if (l == M_FULL)
> > - remove_full(s, n, slab);
> > + mode = M_PARTIAL;
> > + /*
> > + * Taking the spinlock removes the possibility that
> > + * acquire_slab() will see a slab that is frozen
> > + */
> > + spin_lock_irqsave(&n->list_lock, flags);
> > + } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> > + mode = M_FULL;
> > + /*
> > + * This also ensures that the scanning of full
> > + * slabs from diagnostic functions will not see
> > + * any frozen slabs.
> > + */
> > + spin_lock_irqsave(&n->list_lock, flags);
> > + } else
> > + mode = M_FULL_NOLIST;
> >
> > - if (m == M_PARTIAL)
> > - add_partial(n, slab, tail);
> > - else if (m == M_FULL)
> > - add_full(s, n, slab);
> > - }
> >
> > - l = m;
> > if (!cmpxchg_double_slab(s, slab,
> > old.freelist, old.counters,
> > new.freelist, new.counters,
> > - "unfreezing slab"))
> > + "unfreezing slab")) {
> > + if (mode == M_PARTIAL || mode == M_FULL)
> > + spin_unlock_irqrestore(&n->list_lock, flags);
>
> The slab doesn't belong to any node here, should we remove locking/unlocking
> spin for cmpxchg_double_slab() call? Just calling spin_lock_irqsave() before
> add_partial()/add_full call is fine?
>
I thought about that, and tested it, but that is not okay.
Taking the spinlock around the cmpxchg prevents a race between
__slab_free() and deactivate_slab(). The list can be corrupted
without the spinlock.
Think about the case below (when SLAB_STORE_USER is set):
__slab_free()                   deactivate_slab()
=================               =================
                                (deactivating full slab)
                                cmpxchg_double()

spin_lock_irqsave()
cmpxchg_double()

/* not in full list yet */
remove_full()
add_partial()
spin_unlock_irqrestore()
                                spin_lock_irqsave()
                                add_full()
                                spin_unlock_irqrestore()
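In other words, for the M_FULL case the patch relies on roughly this
ordering (simplified sketch based on the hunk above, not the verbatim
mm/slub.c code):
        spin_lock_irqsave(&n->list_lock, flags);  /* taken before the cmpxchg */
        if (!cmpxchg_double_slab(s, slab,
                                 old.freelist, old.counters,
                                 new.freelist, new.counters,
                                 "unfreezing slab")) {
                /* counters changed under us: drop the lock and retry */
                spin_unlock_irqrestore(&n->list_lock, flags);
                goto redo;
        }
        /* only now, still under list_lock, does the slab appear on the list */
        add_full(s, n, slab);
        spin_unlock_irqrestore(&n->list_lock, flags);
        stat(s, DEACTIVATE_FULL);
so a concurrent __slab_free() cannot observe the unfrozen slab before
it is linked on the full list.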
> > goto redo;
>
> How about do {...} while(!cmpxchg_double_slab())? The readability looks better?
>
This will be:
do {
        if (mode == M_PARTIAL || mode == M_FULL)
                spin_unlock_irqrestore();
        [...]
} while (!cmpxchg_double_slab());
I think the goto version is better for reading?
Thanks!
> Regards,
> Xiongwei
>
> > + }
> >
> > - if (lock)
> > - spin_unlock_irqrestore(&n->list_lock, flags);
> >
> > - if (m == M_PARTIAL)
> > + if (mode == M_PARTIAL) {
> > + add_partial(n, slab, tail);
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > stat(s, tail);
> > - else if (m == M_FULL)
> > - stat(s, DEACTIVATE_FULL);
> > - else if (m == M_FREE) {
> > + } else if (mode == M_FREE) {
> > stat(s, DEACTIVATE_EMPTY);
> > discard_slab(s, slab);
> > stat(s, FREE_SLAB);
> > - }
> > + } else if (mode == M_FULL) {
> > + add_full(s, n, slab);
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > + stat(s, DEACTIVATE_FULL);
> > + } else if (mode == M_FULL_NOLIST)
> > + stat(s, DEACTIVATE_FULL);
> > }
> >
> > #ifdef CONFIG_SLUB_CPU_PARTIAL
> > --
> > 2.33.1
> >
> >
--
Thank you, You are awesome!
Hyeonggon :-)
On Mon, Mar 07, 2022 at 05:40:42PM +0100, Vlastimil Babka wrote:
> On 3/7/22 08:40, Hyeonggon Yoo wrote:
> > Simplify deactivate_slab() by unlocking n->list_lock and retrying
> > cmpxchg_double() when cmpxchg_double() fails, and perform
> > add_{partial,full} only when it succeed.
> >
> > Releasing and taking n->list_lock again here is not harmful as SLUB
> > avoids deactivating slabs as much as possible.
> >
> > [ [email protected]: perform add_{partial,full} when cmpxchg_double()
> > succeed.
> >
> > count deactivating full slabs even if debugging flag is not set. ]
> >
> > Signed-off-by: Hyeonggon Yoo <[email protected]>
>
> Reviewed-by: Vlastimil Babka <[email protected]>
>
> adding both to slab-next. Fixed up some nits myself, see below:
>
> >
> > @@ -2420,61 +2416,50 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > new.frozen = 0;
> >
> > if (!new.inuse && n->nr_partial >= s->min_partial)
> > - m = M_FREE;
> > + mode = M_FREE;
> > else if (new.freelist) {
>
> This was against kernel style even before the patch - we use { } in the
> 'else if' branch, thus all branches should use { } even if one-line.
>
Ah, you are right. I agree with this change.
The "remove unnecessary braces" rule does not apply here.
> > - m = M_PARTIAL;
> > - if (!lock) {
> > - lock = 1;
> > - /*
> > - * Taking the spinlock removes the possibility that
> > - * acquire_slab() will see a slab that is frozen
> > - */
> > - spin_lock_irqsave(&n->list_lock, flags);
> > - }
> > - } else {
> > - m = M_FULL;
> > - if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> > - lock = 1;
> > - /*
> > - * This also ensures that the scanning of full
> > - * slabs from diagnostic functions will not see
> > - * any frozen slabs.
> > - */
> > - spin_lock_irqsave(&n->list_lock, flags);
> > - }
> > - }
> > -
> > - if (l != m) {
> > - if (l == M_PARTIAL)
> > - remove_partial(n, slab);
> > - else if (l == M_FULL)
> > - remove_full(s, n, slab);
> > + mode = M_PARTIAL;
> > + /*
> > + * Taking the spinlock removes the possibility that
> > + * acquire_slab() will see a slab that is frozen
> > + */
> > + spin_lock_irqsave(&n->list_lock, flags);
> > + } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> > + mode = M_FULL;
> > + /*
> > + * This also ensures that the scanning of full
> > + * slabs from diagnostic functions will not see
> > + * any frozen slabs.
> > + */
> > + spin_lock_irqsave(&n->list_lock, flags);
> > + } else
> > + mode = M_FULL_NOLIST;
>
> Ditto here (this is new).
Yes.
>
> > - if (m == M_PARTIAL)
> > - add_partial(n, slab, tail);
> > - else if (m == M_FULL)
> > - add_full(s, n, slab);
> > - }
> >
> > - l = m;
> > if (!cmpxchg_double_slab(s, slab,
> > old.freelist, old.counters,
> > new.freelist, new.counters,
> > - "unfreezing slab"))
> > + "unfreezing slab")) {
> > + if (mode == M_PARTIAL || mode == M_FULL)
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > goto redo;
> > + }
> >
> > - if (lock)
> > - spin_unlock_irqrestore(&n->list_lock, flags);
> >
> > - if (m == M_PARTIAL)
> > + if (mode == M_PARTIAL) {
> > + add_partial(n, slab, tail);
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > stat(s, tail);
> > - else if (m == M_FULL)
> > - stat(s, DEACTIVATE_FULL);
> > - else if (m == M_FREE) {
> > + } else if (mode == M_FREE) {
> > stat(s, DEACTIVATE_EMPTY);
> > discard_slab(s, slab);
> > stat(s, FREE_SLAB);
> > - }
> > + } else if (mode == M_FULL) {
> > + add_full(s, n, slab);
> > + spin_unlock_irqrestore(&n->list_lock, flags);
> > + stat(s, DEACTIVATE_FULL);
> > + } else if (mode == M_FULL_NOLIST)
> > + stat(s, DEACTIVATE_FULL);
>
> And here.
>
Yes.
> > }
> >
> > #ifdef CONFIG_SLUB_CPU_PARTIAL
>
Thanks!
--
Thank you, You are awesome!
Hyeonggon :-)
Hello,
On Mon, Mar 7, 2022 at 3:41 PM Hyeonggon Yoo <[email protected]> wrote:
>
> Simplify deactivate_slab() by unlocking n->list_lock and retrying
> cmpxchg_double() when cmpxchg_double() fails, and perform
> add_{partial,full} only when it succeed.
>
> Releasing and taking n->list_lock again here is not harmful as SLUB
> avoids deactivating slabs as much as possible.
>
> [ [email protected]: perform add_{partial,full} when cmpxchg_double()
> succeed.
>
> count deactivating full slabs even if debugging flag is not set. ]
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> ---
> mm/slub.c | 91 +++++++++++++++++++++++--------------------------------
> 1 file changed, 38 insertions(+), 53 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 1ce09b0347ad..f0cb9d0443ac 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2348,10 +2348,10 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
> static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> void *freelist)
> {
> - enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
> + enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE, M_FULL_NOLIST };
> struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> - int lock = 0, free_delta = 0;
> - enum slab_modes l = M_NONE, m = M_NONE;
> + int free_delta = 0;
> + enum slab_modes mode = M_NONE;
> void *nextfree, *freelist_iter, *freelist_tail;
> int tail = DEACTIVATE_TO_HEAD;
> unsigned long flags = 0;
> @@ -2393,14 +2393,10 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> * Ensure that the slab is unfrozen while the list presence
> * reflects the actual number of objects during unfreeze.
> *
> - * We setup the list membership and then perform a cmpxchg
> - * with the count. If there is a mismatch then the slab
> - * is not unfrozen but the slab is on the wrong list.
> - *
> - * Then we restart the process which may have to remove
> - * the slab from the list that we just put it on again
> - * because the number of objects in the slab may have
> - * changed.
> + * We first perform cmpxchg holding lock and insert to list
> + * when it succeed. If there is mismatch then the slab is not
> + * unfrozen and number of objects in the slab may have changed.
> + * Then release lock and retry cmpxchg again.
> */
> redo:
>
> @@ -2420,61 +2416,50 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> new.frozen = 0;
>
> if (!new.inuse && n->nr_partial >= s->min_partial)
> - m = M_FREE;
> + mode = M_FREE;
> else if (new.freelist) {
> - m = M_PARTIAL;
> - if (!lock) {
> - lock = 1;
> - /*
> - * Taking the spinlock removes the possibility that
> - * acquire_slab() will see a slab that is frozen
> - */
> - spin_lock_irqsave(&n->list_lock, flags);
> - }
> - } else {
> - m = M_FULL;
> - if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> - lock = 1;
> - /*
> - * This also ensures that the scanning of full
> - * slabs from diagnostic functions will not see
> - * any frozen slabs.
> - */
> - spin_lock_irqsave(&n->list_lock, flags);
> - }
> - }
> -
> - if (l != m) {
> - if (l == M_PARTIAL)
> - remove_partial(n, slab);
> - else if (l == M_FULL)
> - remove_full(s, n, slab);
> + mode = M_PARTIAL;
> + /*
> + * Taking the spinlock removes the possibility that
> + * acquire_slab() will see a slab that is frozen
> + */
> + spin_lock_irqsave(&n->list_lock, flags);
> + } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> + mode = M_FULL;
> + /*
> + * This also ensures that the scanning of full
> + * slabs from diagnostic functions will not see
> + * any frozen slabs.
> + */
> + spin_lock_irqsave(&n->list_lock, flags);
> + } else
> + mode = M_FULL_NOLIST;
>
> - if (m == M_PARTIAL)
> - add_partial(n, slab, tail);
> - else if (m == M_FULL)
> - add_full(s, n, slab);
> - }
>
> - l = m;
> if (!cmpxchg_double_slab(s, slab,
> old.freelist, old.counters,
> new.freelist, new.counters,
> - "unfreezing slab"))
> + "unfreezing slab")) {
> + if (mode == M_PARTIAL || mode == M_FULL)
> + spin_unlock_irqrestore(&n->list_lock, flags);
The slab doesn't belong to any node here, should we remove locking/unlocking
spin for cmpxchg_double_slab() call? Just calling spin_lock_irqsave() before
add_partial()/add_full call is fine?
> goto redo;
How about do {...} while(!cmpxchg_double_slab())? The readability looks better?
Regards,
Xiongwei
> + }
>
> - if (lock)
> - spin_unlock_irqrestore(&n->list_lock, flags);
>
> - if (m == M_PARTIAL)
> + if (mode == M_PARTIAL) {
> + add_partial(n, slab, tail);
> + spin_unlock_irqrestore(&n->list_lock, flags);
> stat(s, tail);
> - else if (m == M_FULL)
> - stat(s, DEACTIVATE_FULL);
> - else if (m == M_FREE) {
> + } else if (mode == M_FREE) {
> stat(s, DEACTIVATE_EMPTY);
> discard_slab(s, slab);
> stat(s, FREE_SLAB);
> - }
> + } else if (mode == M_FULL) {
> + add_full(s, n, slab);
> + spin_unlock_irqrestore(&n->list_lock, flags);
> + stat(s, DEACTIVATE_FULL);
> + } else if (mode == M_FULL_NOLIST)
> + stat(s, DEACTIVATE_FULL);
> }
>
> #ifdef CONFIG_SLUB_CPU_PARTIAL
> --
> 2.33.1
>
>
On Tue, Mar 8, 2022 at 11:50 AM Hyeonggon Yoo <[email protected]> wrote:
>
> On Tue, Mar 08, 2022 at 09:40:07AM +0800, Xiongwei Song wrote:
> > Hello,
> >
> > On Mon, Mar 7, 2022 at 3:41 PM Hyeonggon Yoo <[email protected]> wrote:
> > >
> > > Simplify deactivate_slab() by unlocking n->list_lock and retrying
> > > cmpxchg_double() when cmpxchg_double() fails, and perform
> > > add_{partial,full} only when it succeed.
> > >
> > > Releasing and taking n->list_lock again here is not harmful as SLUB
> > > avoids deactivating slabs as much as possible.
> > >
> > > [ [email protected]: perform add_{partial,full} when cmpxchg_double()
> > > succeed.
> > >
> > > count deactivating full slabs even if debugging flag is not set. ]
> > >
> > > Signed-off-by: Hyeonggon Yoo <[email protected]>
> > > ---
> > > mm/slub.c | 91 +++++++++++++++++++++++--------------------------------
> > > 1 file changed, 38 insertions(+), 53 deletions(-)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 1ce09b0347ad..f0cb9d0443ac 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2348,10 +2348,10 @@ static void init_kmem_cache_cpus(struct kmem_cache *s)
> > > static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > > void *freelist)
> > > {
> > > - enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
> > > + enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE, M_FULL_NOLIST };
> > > struct kmem_cache_node *n = get_node(s, slab_nid(slab));
> > > - int lock = 0, free_delta = 0;
> > > - enum slab_modes l = M_NONE, m = M_NONE;
> > > + int free_delta = 0;
> > > + enum slab_modes mode = M_NONE;
> > > void *nextfree, *freelist_iter, *freelist_tail;
> > > int tail = DEACTIVATE_TO_HEAD;
> > > unsigned long flags = 0;
> > > @@ -2393,14 +2393,10 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > > * Ensure that the slab is unfrozen while the list presence
> > > * reflects the actual number of objects during unfreeze.
> > > *
> > > - * We setup the list membership and then perform a cmpxchg
> > > - * with the count. If there is a mismatch then the slab
> > > - * is not unfrozen but the slab is on the wrong list.
> > > - *
> > > - * Then we restart the process which may have to remove
> > > - * the slab from the list that we just put it on again
> > > - * because the number of objects in the slab may have
> > > - * changed.
> > > + * We first perform cmpxchg holding lock and insert to list
> > > + * when it succeed. If there is mismatch then the slab is not
> > > + * unfrozen and number of objects in the slab may have changed.
> > > + * Then release lock and retry cmpxchg again.
> > > */
> > > redo:
> > >
> > > @@ -2420,61 +2416,50 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
> > > new.frozen = 0;
> > >
> > > if (!new.inuse && n->nr_partial >= s->min_partial)
> > > - m = M_FREE;
> > > + mode = M_FREE;
> > > else if (new.freelist) {
> > > - m = M_PARTIAL;
> > > - if (!lock) {
> > > - lock = 1;
> > > - /*
> > > - * Taking the spinlock removes the possibility that
> > > - * acquire_slab() will see a slab that is frozen
> > > - */
> > > - spin_lock_irqsave(&n->list_lock, flags);
> > > - }
> > > - } else {
> > > - m = M_FULL;
> > > - if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
> > > - lock = 1;
> > > - /*
> > > - * This also ensures that the scanning of full
> > > - * slabs from diagnostic functions will not see
> > > - * any frozen slabs.
> > > - */
> > > - spin_lock_irqsave(&n->list_lock, flags);
> > > - }
> > > - }
> > > -
> > > - if (l != m) {
> > > - if (l == M_PARTIAL)
> > > - remove_partial(n, slab);
> > > - else if (l == M_FULL)
> > > - remove_full(s, n, slab);
> > > + mode = M_PARTIAL;
> > > + /*
> > > + * Taking the spinlock removes the possibility that
> > > + * acquire_slab() will see a slab that is frozen
> > > + */
> > > + spin_lock_irqsave(&n->list_lock, flags);
> > > + } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
> > > + mode = M_FULL;
> > > + /*
> > > + * This also ensures that the scanning of full
> > > + * slabs from diagnostic functions will not see
> > > + * any frozen slabs.
> > > + */
> > > + spin_lock_irqsave(&n->list_lock, flags);
> > > + } else
> > > + mode = M_FULL_NOLIST;
> > >
> > > - if (m == M_PARTIAL)
> > > - add_partial(n, slab, tail);
> > > - else if (m == M_FULL)
> > > - add_full(s, n, slab);
> > > - }
> > >
> > > - l = m;
> > > if (!cmpxchg_double_slab(s, slab,
> > > old.freelist, old.counters,
> > > new.freelist, new.counters,
> > > - "unfreezing slab"))
> > > + "unfreezing slab")) {
> > > + if (mode == M_PARTIAL || mode == M_FULL)
> > > + spin_unlock_irqrestore(&n->list_lock, flags);
> >
> > The slab doesn't belong to any node here, should we remove locking/unlocking
> > spin for cmpxchg_double_slab() call? Just calling spin_lock_irqsave() before
> > add_partial()/add_full call is fine?
> >
>
> I thought about that, and tested, but that is not okay.
>
> taking spinlock around cmpxchg prevents race between __slab_free() and
> deactivate_slab(). list can be corrupted without spinlock.
>
>
> think about case below: (when SLAB_STORE_USER is set)
>
> __slab_free()                   deactivate_slab()
> =================               =================
>                                 (deactivating full slab)
>                                 cmpxchg_double()
>
>
> spin_lock_irqsave()
> cmpxchg_double()
>
> /* not in full list yet */
> remove_full()
> add_partial()
> spin_unlock_irqrestore()
>                                 spin_lock_irqsave()
>                                 add_full()
>                                 spin_unlock_irqrestore()
>
Oh... Looks reasonable. Thanks for the detailed explanation.
>
>
> > > goto redo;
> >
> > How about do {...} while(!cmpxchg_double_slab())? The readability looks better?
> >
>
> This will be:
>
> do {
>         if (mode == M_PARTIAL || mode == M_FULL)
>                 spin_unlock_irqrestore();
>
> [...]
>
> } while (!cmpxchg_double_slab());
>
I saw __slab_free() is doing so. Not a big deal.
Regards,
Xiongwei
> I think goto version is better for reading?
>
> Thanks!
>
> > Regards,
> > Xiongwei
> >
> > > + }
> > >
> > > - if (lock)
> > > - spin_unlock_irqrestore(&n->list_lock, flags);
> > >
> > > - if (m == M_PARTIAL)
> > > + if (mode == M_PARTIAL) {
> > > + add_partial(n, slab, tail);
> > > + spin_unlock_irqrestore(&n->list_lock, flags);
> > > stat(s, tail);
> > > - else if (m == M_FULL)
> > > - stat(s, DEACTIVATE_FULL);
> > > - else if (m == M_FREE) {
> > > + } else if (mode == M_FREE) {
> > > stat(s, DEACTIVATE_EMPTY);
> > > discard_slab(s, slab);
> > > stat(s, FREE_SLAB);
> > > - }
> > > + } else if (mode == M_FULL) {
> > > + add_full(s, n, slab);
> > > + spin_unlock_irqrestore(&n->list_lock, flags);
> > > + stat(s, DEACTIVATE_FULL);
> > > + } else if (mode == M_FULL_NOLIST)
> > > + stat(s, DEACTIVATE_FULL);
> > > }
> > >
> > > #ifdef CONFIG_SLUB_CPU_PARTIAL
> > > --
> > > 2.33.1
> > >
> > >
>
> --
> Thank you, You are awesome!
> Hyeonggon :-)
On Mon, Mar 07, 2022 at 07:40:56AM +0000, Hyeonggon Yoo wrote:
> Simplify deactivate_slab() by unlocking n->list_lock and retrying
> cmpxchg_double() when cmpxchg_double() fails, and perform
> add_{partial,full} only when it succeed.
>
> Releasing and taking n->list_lock again here is not harmful as SLUB
> avoids deactivating slabs as much as possible.
>
> [ [email protected]: perform add_{partial,full} when cmpxchg_double()
> succeed.
>
> count deactivating full slabs even if debugging flag is not set. ]
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
Reviewed-by: Roman Gushchin <[email protected]>
Thanks!
On Mon, Mar 07, 2022 at 07:40:55AM +0000, Hyeonggon Yoo wrote:
> SLUB sets number of minimum partial slabs for node (min_partial)
> using set_min_partial(). SLUB holds at least min_partial slabs even if
> they're empty to avoid excessive use of page allocator.
>
> set_min_partial() limits value of min_partial limits value of
> min_partial MIN_PARTIAL and MAX_PARTIAL. As set_min_partial() can be
> called by min_partial_store() too, Only limit value of min_partial
> in kmem_cache_open() so that it can be changed to value that a user wants.
>
> [ [email protected]: Fold set_min_partial() into its callers ]
>
> Signed-off-by: Hyeonggon Yoo <[email protected]>
> Reviewed-by: Vlastimil Babka <[email protected]>
LGTM!
Reviewed-by: Roman Gushchin <[email protected]>