2019-09-12 01:02:57

by Kirill A. Shutemov

Subject: Re: [PATCH 2/3] mm: avoid slub allocation while holding list_lock

On Wed, Sep 11, 2019 at 06:29:28PM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Instead, statically allocate bitmap in struct kmem_cache_node. Given
> currently page->objects has 15 bits, we bloat the per-node struct by
> 4K. So we waste some memory but only do so when slub debug is on.

Why not have a single page in total, protected by a lock?

Listing objects from two pages at the same time doesn't make sense anyway.
Concurrent validation is not something sane to do.

--
Kirill A. Shutemov


2019-09-12 01:34:04

by Yu Zhao

Subject: Re: [PATCH 2/3] mm: avoid slub allocation while holding list_lock

On Thu, Sep 12, 2019 at 03:44:01AM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 06:29:28PM -0600, Yu Zhao wrote:
> > If we are already under list_lock, don't call kmalloc(). Otherwise we
> > will run into deadlock because kmalloc() also tries to grab the same
> > lock.
> >
> > Instead, statically allocate bitmap in struct kmem_cache_node. Given
> > currently page->objects has 15 bits, we bloat the per-node struct by
> > 4K. So we waste some memory but only do so when slub debug is on.
>
> Why not have a single page in total, protected by a lock?
>
> Listing objects from two pages at the same time doesn't make sense anyway.
> Concurrent validation is not something sane to do.

Okay, cutting down to static global bitmap.

2019-09-12 02:33:52

by Yu Zhao

Subject: [PATCH v2 1/4] mm: correct mask size for slub page->objects

Mask of slub objects per page shouldn't be larger than what
page->objects can hold.

It requires more than 2^15 objects to hit the problem, and I don't
think anybody would. It'd be nice to have the mask fixed, but not
really worth cc'ing the stable.

Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..62053ceb4464 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -187,7 +187,7 @@ static inline bool kmem_cache_has_cpu_partial(struct kmem_cache *s)
*/
#define DEBUG_METADATA_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER)

-#define OO_SHIFT 16
+#define OO_SHIFT 15
#define OO_MASK ((1 << OO_SHIFT) - 1)
#define MAX_OBJS_PER_PAGE 32767 /* since page.objects is u15 */

@@ -343,6 +343,8 @@ static inline unsigned int oo_order(struct kmem_cache_order_objects x)

static inline unsigned int oo_objects(struct kmem_cache_order_objects x)
{
+ BUILD_BUG_ON(OO_MASK > MAX_OBJS_PER_PAGE);
+
return x.x & OO_MASK;
}

--
2.23.0.162.g0b9fbb3734-goog
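
As a side note, the invariant the patch adds can be illustrated with a small
user-space sketch. The constants below mirror the diff; the bit-field struct is
only a hypothetical stand-in for the packed counters in struct page, and the
static_assert() plays the role of the BUILD_BUG_ON():

#include <assert.h>
#include <stdio.h>

#define OO_SHIFT		15
#define OO_MASK			((1 << OO_SHIFT) - 1)
#define MAX_OBJS_PER_PAGE	32767	/* since page.objects is a 15-bit field */

/* Hypothetical stand-in for the packed counters in struct page. */
struct fake_page_counters {
	unsigned inuse:16;
	unsigned objects:15;	/* the field OO_MASK extracts into */
	unsigned frozen:1;
};

int main(void)
{
	/* Same invariant the patch encodes as a BUILD_BUG_ON(). */
	static_assert(OO_MASK <= MAX_OBJS_PER_PAGE,
		      "mask is wider than the objects field");

	struct fake_page_counters c = { .objects = MAX_OBJS_PER_PAGE };

	/* With OO_SHIFT == 16 the mask would also cover bit 15, which the
	 * 15-bit field can never hold. */
	printf("max objects representable: %u, mask: %#x\n",
	       (unsigned)c.objects, (unsigned)OO_MASK);
	return 0;
}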

2019-09-12 02:33:52

by Yu Zhao

Subject: [PATCH v2 2/4] mm: clean up validate_slab()

The function doesn't need to return any value, and the check can be
done in one pass.

There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.

Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 62053ceb4464..7b7e1ee264ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4386,31 +4386,26 @@ static int count_total(struct page *page)
#endif

#ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
unsigned long *map)
{
void *p;
void *addr = page_address(page);

- if (!check_slab(s, page) ||
- !on_freelist(s, page, NULL))
- return 0;
+ if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+ return;

/* Now we know that a valid freelist exists */
bitmap_zero(map, page->objects);

get_map(s, page, map);
for_each_object(p, s, addr, page->objects) {
- if (test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_INACTIVE))
- return 0;
- }
+ u8 val = test_bit(slab_index(p, s, addr), map) ?
+ SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;

- for_each_object(p, s, addr, page->objects)
- if (!test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_ACTIVE))
- return 0;
- return 1;
+ if (!check_object(s, page, p, val))
+ break;
+ }
}

static void validate_slab_slab(struct kmem_cache *s, struct page *page,
--
2.23.0.162.g0b9fbb3734-goog
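
To make the one-pass check concrete, here is a stand-alone sketch of the same
pattern; the helpers and data are hypothetical and only the control flow mirrors
the patch: choose the expected state for each object from the free-object
bitmap, check each object exactly once, and stop at the first failure.

#include <stdbool.h>
#include <stdio.h>

enum obj_state { RED_ACTIVE, RED_INACTIVE };

#define NR_OBJECTS 8

/* Stand-in for SLUB's check_object(): compare expected vs. actual state. */
static bool check_object(int idx, enum obj_state expected,
			 const enum obj_state *actual)
{
	return actual[idx] == expected;
}

static void validate(const bool *free_map, const enum obj_state *actual)
{
	for (int i = 0; i < NR_OBJECTS; i++) {
		/* Free objects must look INACTIVE, allocated ones ACTIVE. */
		enum obj_state val = free_map[i] ? RED_INACTIVE : RED_ACTIVE;

		if (!check_object(i, val, actual)) {
			printf("object %d corrupted\n", i);
			break;	/* stop at the first invalid object */
		}
	}
}

int main(void)
{
	bool free_map[NR_OBJECTS] = { 1, 0, 1, 0, 0, 1, 0, 0 };
	enum obj_state actual[NR_OBJECTS] = {
		RED_INACTIVE, RED_ACTIVE, RED_ACTIVE,	/* index 2 is wrong */
		RED_ACTIVE, RED_ACTIVE, RED_INACTIVE, RED_ACTIVE, RED_ACTIVE,
	};

	validate(free_map, actual);
	return 0;
}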

2019-09-12 02:35:22

by Yu Zhao

Subject: [PATCH v2 4/4] mm: lock slub page when listing objects

Though I have no idea what the side effect of such a race would be,
apparently we want to prevent the free list from being changed
while debugging the objects.

Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index baa60dd73942..1c9726c28f0b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4608,11 +4608,15 @@ static void process_slab(struct loc_track *t, struct kmem_cache *s,
void *p;
unsigned long *map;

+ slab_lock(page);
+
map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
put_map(map);
+
+ slab_unlock(page);
}

static int list_locations(struct kmem_cache *s, char *buf,
--
2.23.0.162.g0b9fbb3734-goog

2019-09-12 02:37:47

by Yu Zhao

Subject: [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Fixing the problem by using a static bitmap instead.

WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);

*** DEADLOCK ***

Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 7b7e1ee264ef..baa60dd73942 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -443,19 +443,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}

#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);

+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}

static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3685,13 +3704,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);

- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {

if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3699,8 +3717,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}

@@ -4386,19 +4405,19 @@ static int count_total(struct page *page)
#endif

#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);

if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;

/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4406,18 +4425,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}

static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4426,7 +4440,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);

list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4437,7 +4451,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;

list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4454,15 +4468,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;

flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4592,18 +4602,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}

static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;

- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}

static int list_locations(struct kmem_cache *s, char *buf,
@@ -4614,11 +4623,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);

- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4633,9 +4640,9 @@ static int list_locations(struct kmem_cache *s, char *buf,

spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}

@@ -4684,7 +4691,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}

free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.23.0.162.g0b9fbb3734-goog
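
The core idea of the patch can be sketched in user space as follows: one
statically allocated scratch bitmap, sized for MAX_OBJS_PER_PAGE (32767 bits,
roughly 4KB of longs), is handed out under a lock instead of being kmalloc()ed
per call. The names get_map()/put_map() mirror the diff; the pthread mutex and
everything else are simplified stand-ins, not the kernel implementation.

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define MAX_OBJS_PER_PAGE 32767
#define BITS_PER_LONG	  (8 * sizeof(long))
#define BITS_TO_LONGS(n)  (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
static pthread_mutex_t object_map_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hand out the shared bitmap; the caller must return it with put_map(). */
static unsigned long *get_map(unsigned int nr_objects)
{
	pthread_mutex_lock(&object_map_lock);
	memset(object_map, 0, BITS_TO_LONGS(nr_objects) * sizeof(long));
	return object_map;
}

static void put_map(unsigned long *map)
{
	(void)map;	/* there is only one map; just drop the lock */
	pthread_mutex_unlock(&object_map_lock);
}

int main(void)
{
	unsigned long *map = get_map(64);

	map[0] |= 1UL << 3;	/* mark object 3 as free, as get_map() would */
	printf("bitmap size: %zu bytes\n", sizeof(object_map));
	put_map(map);
	return 0;
}

Compile with -lpthread; sizeof(object_map) prints 4096 on a 64-bit build,
matching the 4K figure mentioned earlier in the thread.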

2019-09-12 09:43:02

by Kirill A. Shutemov

Subject: Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects

On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> Mask of slub objects per page shouldn't be larger than what
> page->objects can hold.
>
> It requires more than 2^15 objects to hit the problem, and I don't
> think anybody would. It'd be nice to have the mask fixed, but not
> really worth cc'ing the stable.
>
> Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> Signed-off-by: Yu Zhao <[email protected]>

I don't think the patch fixes anything.

Yes, we have one spare bit between the order and the number of objects that is
not used and is always zero. So what?

I can imagine that for some microarchitectures accessing the upper 16 bits of
an int is cheaper than shifting by 15.

--
Kirill A. Shutemov

2019-09-12 09:48:50

by Kirill A. Shutemov

Subject: Re: [PATCH v2 2/4] mm: clean up validate_slab()

On Wed, Sep 11, 2019 at 08:31:09PM -0600, Yu Zhao wrote:
> The function doesn't need to return any value, and the check can be
> done in one pass.
>
> There is a behavior change: before the patch, we stop at the first
> invalid free object; after the patch, we stop at the first invalid
> object, free or in use. This shouldn't matter because the original
> behavior isn't intended anyway.
>
> Signed-off-by: Yu Zhao <[email protected]>
> ---
> mm/slub.c | 21 ++++++++-------------
> 1 file changed, 8 insertions(+), 13 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 62053ceb4464..7b7e1ee264ef 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4386,31 +4386,26 @@ static int count_total(struct page *page)
> #endif
>
> #ifdef CONFIG_SLUB_DEBUG
> -static int validate_slab(struct kmem_cache *s, struct page *page,
> +static void validate_slab(struct kmem_cache *s, struct page *page,
> unsigned long *map)
> {
> void *p;
> void *addr = page_address(page);
>
> - if (!check_slab(s, page) ||
> - !on_freelist(s, page, NULL))
> - return 0;
> + if (!check_slab(s, page) || !on_freelist(s, page, NULL))
> + return;
>
> /* Now we know that a valid freelist exists */
> bitmap_zero(map, page->objects);
>
> get_map(s, page, map);
> for_each_object(p, s, addr, page->objects) {
> - if (test_bit(slab_index(p, s, addr), map))
> - if (!check_object(s, page, p, SLUB_RED_INACTIVE))
> - return 0;
> - }
> + u8 val = test_bit(slab_index(p, s, addr), map) ?
> + SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;

Proper 'if' would be more readable.

Other than that look fine to me.

>
> - for_each_object(p, s, addr, page->objects)
> - if (!test_bit(slab_index(p, s, addr), map))
> - if (!check_object(s, page, p, SLUB_RED_ACTIVE))
> - return 0;
> - return 1;
> + if (!check_object(s, page, p, val))
> + break;
> + }
> }
>
> static void validate_slab_slab(struct kmem_cache *s, struct page *page,
> --
> 2.23.0.162.g0b9fbb3734-goog
>

--
Kirill A. Shutemov

2019-09-12 10:10:03

by Kirill A. Shutemov

Subject: Re: [PATCH v2 3/4] mm: avoid slub allocation while holding list_lock

On Wed, Sep 11, 2019 at 08:31:10PM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Fixing the problem by using a static bitmap instead.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&n->list_lock)->rlock);
> lock(&(&n->list_lock)->rlock);
>
> *** DEADLOCK ***
>
> Signed-off-by: Yu Zhao <[email protected]>

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kirill A. Shutemov

2019-09-12 10:11:43

by Kirill A. Shutemov

Subject: Re: [PATCH v2 4/4] mm: lock slub page when listing objects

On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote:
> Though I have no idea what the side effect of such a race would be,
> apparently we want to prevent the free list from being changed
> while debugging the objects.

Codewise looks good to me. But commit message can be better.

Can we rephrase it to indicate that slab_lock is required to protect
page->objects?

--
Kirill A. Shutemov

2019-09-12 22:24:33

by Yu Zhao

Subject: Re: [PATCH v2 4/4] mm: lock slub page when listing objects

On Thu, Sep 12, 2019 at 01:06:42PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 08:31:11PM -0600, Yu Zhao wrote:
> > Though I have no idea what the side effect of such a race would be,
> > apparently we want to prevent the free list from being changed
> > while debugging the objects.
>
> Codewise looks good to me. But commit message can be better.
>
> Can we rephrase it to indicate that slab_lock is required to protect
> page->objects?

Will do.

2019-09-12 22:34:57

by Kirill A. Shutemov

Subject: Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects

On Thu, Sep 12, 2019 at 03:11:14PM -0600, Yu Zhao wrote:
> On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> > > Mask of slub objects per page shouldn't be larger than what
> > > page->objects can hold.
> > >
> > > It requires more than 2^15 objects to hit the problem, and I don't
> > > think anybody would. It'd be nice to have the mask fixed, but not
> > > really worth cc'ing the stable.
> > >
> > > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> > > Signed-off-by: Yu Zhao <[email protected]>
> >
> > I don't think the patch fixes anything.
>
> Technically it does. It makes no sense for a mask to have more bits
> than the variable that holds the masked value. I had to look up the
> commit history to find out why and go through the code to make sure
> it doesn't actually cause any problem.
>
> My hope is that nobody else would have to go through the same trouble.

Just put some comments then.

--
Kirill A. Shutemov

2019-09-13 00:34:09

by Yu Zhao

Subject: Re: [PATCH v2 1/4] mm: correct mask size for slub page->objects

On Thu, Sep 12, 2019 at 12:40:35PM +0300, Kirill A. Shutemov wrote:
> On Wed, Sep 11, 2019 at 08:31:08PM -0600, Yu Zhao wrote:
> > Mask of slub objects per page shouldn't be larger than what
> > page->objects can hold.
> >
> > It requires more than 2^15 objects to hit the problem, and I don't
> > think anybody would. It'd be nice to have the mask fixed, but not
> > really worth cc'ing the stable.
> >
> > Fixes: 50d5c41cd151 ("slub: Do not use frozen page flag but a bit in the page counters")
> > Signed-off-by: Yu Zhao <[email protected]>
>
> I don't think the patch fixes anything.

Technically it does. It makes no sense for a mask to have more bits
than the variable that holds the masked value. I had to look up the
commit history to find out why and go through the code to make sure
it doesn't actually cause any problem.

My hope is that nobody else would have to go through the same trouble.

> Yes, we have one spare bit between the order and the number of objects that is
> not used and is always zero. So what?
>
> I can imagine that for some microarchitectures accessing the upper 16 bits of
> an int is cheaper than shifting by 15.

Well, I highly doubt the inconsistency is intended for such an optimization,
even if it existed.

Subject: Re: [PATCH v2 4/4] mm: lock slub page when listing objects

On Wed, 11 Sep 2019, Yu Zhao wrote:

> Though I have no idea what the side effect of such a race would be,
> apparently we want to prevent the free list from being changed
> while debugging the objects.

process_slab() is called under the list_lock, which prevents any allocation
from the freelist in the slab page. New objects can still be added to the
freelist; that happens by updating the freelist pointer in the slab page with
a pointer to the newly freed object, which in turn contains the old freelist
pointer.

It is therefore possible to safely traverse the objects on the freelist
after the pointer has been retrieved.

NAK.
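
For reference, the freelist layout described above can be modelled in a few
lines of user-space code (hypothetical types, not the real struct page): the
page holds the head pointer and each free object stores a pointer to the next
free object, so a free prepends at the head and a walk from a previously
retrieved head never sees objects freed afterwards.

#include <stdio.h>

struct object {
	struct object *next_free;	/* stored inside the free object */
};

struct fake_page {
	struct object *freelist;	/* head of the free list */
	struct object objs[4];
};

static void free_object(struct fake_page *page, struct object *obj)
{
	obj->next_free = page->freelist;	/* new head points at old head */
	page->freelist = obj;
}

int main(void)
{
	struct fake_page page = { .freelist = NULL };

	free_object(&page, &page.objs[2]);
	free_object(&page, &page.objs[0]);

	/* Snapshot the head, much like get_map() effectively does ... */
	struct object *snap = page.freelist;

	/* ... later frees only change page.freelist, not the snapshot. */
	free_object(&page, &page.objs[3]);

	for (struct object *p = snap; p; p = p->next_free)
		printf("free object at index %td\n", p - page.objs);
	return 0;
}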

2019-09-14 14:22:14

by Yu Zhao

Subject: [PATCH v3 1/2] mm: clean up validate_slab()

The function doesn't need to return any value, and the check can be
done in one pass.

There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.

Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..445ef8b2aec0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4384,31 +4384,26 @@ static int count_total(struct page *page)
#endif

#ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
unsigned long *map)
{
void *p;
void *addr = page_address(page);

- if (!check_slab(s, page) ||
- !on_freelist(s, page, NULL))
- return 0;
+ if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+ return;

/* Now we know that a valid freelist exists */
bitmap_zero(map, page->objects);

get_map(s, page, map);
for_each_object(p, s, addr, page->objects) {
- if (test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_INACTIVE))
- return 0;
- }
+ u8 val = test_bit(slab_index(p, s, addr), map) ?
+ SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;

- for_each_object(p, s, addr, page->objects)
- if (!test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_ACTIVE))
- return 0;
- return 1;
+ if (!check_object(s, page, p, val))
+ break;
+ }
}

static void validate_slab_slab(struct kmem_cache *s, struct page *page,
--
2.23.0.237.gc6a4ce50a0-goog

2019-09-14 16:17:25

by Yu Zhao

Subject: [PATCH v3 2/2] mm: avoid slub allocation while holding list_lock

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Fixing the problem by using a static bitmap instead.

WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);

*** DEADLOCK ***

Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 445ef8b2aec0..c1de01730648 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}

#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);

+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}

static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3683,13 +3702,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);

- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {

if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3697,8 +3715,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}

@@ -4384,19 +4403,19 @@ static int count_total(struct page *page)
#endif

#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);

if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;

/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4404,18 +4423,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}

static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4424,7 +4438,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);

list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4435,7 +4449,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;

list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4452,15 +4466,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;

flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4590,18 +4600,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}

static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;

- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}

static int list_locations(struct kmem_cache *s, char *buf,
@@ -4612,11 +4621,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);

- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4631,9 +4638,9 @@ static int list_locations(struct kmem_cache *s, char *buf,

spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}

@@ -4682,7 +4689,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}

free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.23.0.237.gc6a4ce50a0-goog
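
The recursive-locking scenario in the lockdep report above can be reproduced in
miniature in user space; an error-checking pthread mutex stands in for
n->list_lock and reports the self-deadlock instead of hanging, and the function
names are only placeholders for the two kernel paths involved.

#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t list_lock;

static void nested_alloc_path(void)
{
	/* Stands in for kmalloc() -> ___slab_alloc() taking n->list_lock. */
	int err = pthread_mutex_lock(&list_lock);

	if (err)
		printf("nested lock attempt failed: %s\n", strerror(err));
	else
		pthread_mutex_unlock(&list_lock);
}

int main(void)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&list_lock, &attr);

	pthread_mutex_lock(&list_lock);	/* __kmem_cache_shutdown() path */
	nested_alloc_path();		/* with a real spinlock this would deadlock */
	pthread_mutex_unlock(&list_lock);
	return 0;
}

Link with -lpthread; the nested attempt prints a "Resource deadlock avoided"
style error rather than spinning forever.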

2019-09-16 10:36:37

by Kirill A. Shutemov

Subject: Re: [PATCH v3 1/2] mm: clean up validate_slab()

On Fri, Sep 13, 2019 at 06:07:42PM -0600, Yu Zhao wrote:
> The function doesn't need to return any value, and the check can be
> done in one pass.
>
> There is a behavior change: before the patch, we stop at the first
> invalid free object; after the patch, we stop at the first invalid
> object, free or in use. This shouldn't matter because the original
> behavior isn't intended anyway.
>
> Signed-off-by: Yu Zhao <[email protected]>

Acked-by: Kirill A. Shutemov <[email protected]>

--
Kirill A. Shutemov

2019-11-08 19:41:23

by Yu Zhao

Subject: [PATCH v4 1/2] mm: clean up validate_slab()

The function doesn't need to return any value, and the check can be
done in one pass.

There is a behavior change: before the patch, we stop at the first
invalid free object; after the patch, we stop at the first invalid
object, free or in use. This shouldn't matter because the original
behavior isn't intended anyway.

Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 21 ++++++++-------------
1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index b25c807a111f..6930c3febad7 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4404,31 +4404,26 @@ static int count_total(struct page *page)
#endif

#ifdef CONFIG_SLUB_DEBUG
-static int validate_slab(struct kmem_cache *s, struct page *page,
+static void validate_slab(struct kmem_cache *s, struct page *page,
unsigned long *map)
{
void *p;
void *addr = page_address(page);

- if (!check_slab(s, page) ||
- !on_freelist(s, page, NULL))
- return 0;
+ if (!check_slab(s, page) || !on_freelist(s, page, NULL))
+ return;

/* Now we know that a valid freelist exists */
bitmap_zero(map, page->objects);

get_map(s, page, map);
for_each_object(p, s, addr, page->objects) {
- if (test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_INACTIVE))
- return 0;
- }
+ u8 val = test_bit(slab_index(p, s, addr), map) ?
+ SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;

- for_each_object(p, s, addr, page->objects)
- if (!test_bit(slab_index(p, s, addr), map))
- if (!check_object(s, page, p, SLUB_RED_ACTIVE))
- return 0;
- return 1;
+ if (!check_object(s, page, p, val))
+ break;
+ }
}

static void validate_slab_slab(struct kmem_cache *s, struct page *page,
--
2.24.0.rc1.363.gb1bccd3e3d-goog

2019-11-08 19:43:25

by Yu Zhao

Subject: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Fixing the problem by using a static bitmap instead.

WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

other info that might help us debug this:
Possible unsafe locking scenario:

CPU0
----
lock(&(&n->list_lock)->rlock);
lock(&(&n->list_lock)->rlock);

*** DEADLOCK ***

Acked-by: Kirill A. Shutemov <[email protected]>
Signed-off-by: Yu Zhao <[email protected]>
---
mm/slub.c | 88 +++++++++++++++++++++++++++++--------------------------
1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 6930c3febad7..7a4ec3c4b4d9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -441,19 +441,38 @@ static inline bool cmpxchg_double_slab(struct kmem_cache *s, struct page *page,
}

#ifdef CONFIG_SLUB_DEBUG
+static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
+static DEFINE_SPINLOCK(object_map_lock);
+
/*
* Determine a map of object in use on a page.
*
* Node listlock must be held to guarantee that the page does
* not vanish from under us.
*/
-static void get_map(struct kmem_cache *s, struct page *page, unsigned long *map)
+static unsigned long *get_map(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);

+ VM_BUG_ON(!irqs_disabled());
+
+ spin_lock(&object_map_lock);
+
+ bitmap_zero(object_map, page->objects);
+
for (p = page->freelist; p; p = get_freepointer(s, p))
- set_bit(slab_index(p, s, addr), map);
+ set_bit(slab_index(p, s, addr), object_map);
+
+ return object_map;
+}
+
+static void put_map(unsigned long *map)
+{
+ VM_BUG_ON(map != object_map);
+ lockdep_assert_held(&object_map_lock);
+
+ spin_unlock(&object_map_lock);
}

static inline unsigned int size_from_object(struct kmem_cache *s)
@@ -3695,13 +3714,12 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
+ unsigned long *map;
+
slab_err(s, page, text, s->name);
slab_lock(page);

- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {

if (!test_bit(slab_index(p, s, addr), map)) {
@@ -3709,8 +3727,9 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
print_tracking(s, p);
}
}
+ put_map(map);
+
slab_unlock(page);
- bitmap_free(map);
#endif
}

@@ -4404,19 +4423,19 @@ static int count_total(struct page *page)
#endif

#ifdef CONFIG_SLUB_DEBUG
-static void validate_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
+static void validate_slab(struct kmem_cache *s, struct page *page)
{
void *p;
void *addr = page_address(page);
+ unsigned long *map;
+
+ slab_lock(page);

if (!check_slab(s, page) || !on_freelist(s, page, NULL))
- return;
+ goto unlock;

/* Now we know that a valid freelist exists */
- bitmap_zero(map, page->objects);
-
- get_map(s, page, map);
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects) {
u8 val = test_bit(slab_index(p, s, addr), map) ?
SLUB_RED_INACTIVE : SLUB_RED_ACTIVE;
@@ -4424,18 +4443,13 @@ static void validate_slab(struct kmem_cache *s, struct page *page,
if (!check_object(s, page, p, val))
break;
}
-}
-
-static void validate_slab_slab(struct kmem_cache *s, struct page *page,
- unsigned long *map)
-{
- slab_lock(page);
- validate_slab(s, page, map);
+ put_map(map);
+unlock:
slab_unlock(page);
}

static int validate_slab_node(struct kmem_cache *s,
- struct kmem_cache_node *n, unsigned long *map)
+ struct kmem_cache_node *n)
{
unsigned long count = 0;
struct page *page;
@@ -4444,7 +4458,7 @@ static int validate_slab_node(struct kmem_cache *s,
spin_lock_irqsave(&n->list_lock, flags);

list_for_each_entry(page, &n->partial, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != n->nr_partial)
@@ -4455,7 +4469,7 @@ static int validate_slab_node(struct kmem_cache *s,
goto out;

list_for_each_entry(page, &n->full, slab_list) {
- validate_slab_slab(s, page, map);
+ validate_slab(s, page);
count++;
}
if (count != atomic_long_read(&n->nr_slabs))
@@ -4472,15 +4486,11 @@ static long validate_slab_cache(struct kmem_cache *s)
int node;
unsigned long count = 0;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
-
- if (!map)
- return -ENOMEM;

flush_all(s);
for_each_kmem_cache_node(s, node, n)
- count += validate_slab_node(s, n, map);
- bitmap_free(map);
+ count += validate_slab_node(s, n);
+
return count;
}
/*
@@ -4610,18 +4620,17 @@ static int add_location(struct loc_track *t, struct kmem_cache *s,
}

static void process_slab(struct loc_track *t, struct kmem_cache *s,
- struct page *page, enum track_item alloc,
- unsigned long *map)
+ struct page *page, enum track_item alloc)
{
void *addr = page_address(page);
void *p;
+ unsigned long *map;

- bitmap_zero(map, page->objects);
- get_map(s, page, map);
-
+ map = get_map(s, page);
for_each_object(p, s, addr, page->objects)
if (!test_bit(slab_index(p, s, addr), map))
add_location(t, s, get_track(s, p, alloc));
+ put_map(map);
}

static int list_locations(struct kmem_cache *s, char *buf,
@@ -4632,11 +4641,9 @@ static int list_locations(struct kmem_cache *s, char *buf,
struct loc_track t = { 0, 0, NULL };
int node;
struct kmem_cache_node *n;
- unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);

- if (!map || !alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
- GFP_KERNEL)) {
- bitmap_free(map);
+ if (!alloc_loc_track(&t, PAGE_SIZE / sizeof(struct location),
+ GFP_KERNEL)) {
return sprintf(buf, "Out of memory\n");
}
/* Push back cpu slabs */
@@ -4651,9 +4658,9 @@ static int list_locations(struct kmem_cache *s, char *buf,

spin_lock_irqsave(&n->list_lock, flags);
list_for_each_entry(page, &n->partial, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
list_for_each_entry(page, &n->full, slab_list)
- process_slab(&t, s, page, alloc, map);
+ process_slab(&t, s, page, alloc);
spin_unlock_irqrestore(&n->list_lock, flags);
}

@@ -4702,7 +4709,6 @@ static int list_locations(struct kmem_cache *s, char *buf,
}

free_loc_track(&t);
- bitmap_free(map);
if (!t.count)
len += sprintf(buf, "No data\n");
return len;
--
2.24.0.rc1.363.gb1bccd3e3d-goog

Subject: Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

On Fri, 8 Nov 2019, Yu Zhao wrote:

> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.

How did this happen? The kmalloc needs to be always done before the
list_lock is taken.

> Fixing the problem by using a static bitmap instead.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&(&n->list_lock)->rlock);
> lock(&(&n->list_lock)->rlock);
>
> *** DEADLOCK ***


Ahh. list_slab_objects() in shutdown?

There is a much easier fix for this:



[FIX] slub: Remove kmalloc under list_lock from list_slab_objects()

list_slab_objects() is called when a slab is destroyed and there are objects still left,
in order to list those objects in the syslog. This is a pretty rare event.

And there, it seems, we take the list_lock and call kmalloc() while holding that lock.

Perform the allocation in free_partial() before the list_lock is taken.

Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
Signed-off-by: Christoph Lameter <[email protected]>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
+++ linux/mm/slub.c 2019-11-09 20:43:52.374187381 +0000
@@ -3690,14 +3690,11 @@ error:
}

static void list_slab_objects(struct kmem_cache *s, struct page *page,
- const char *text)
+ const char *text, unsigned long *map)
{
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
- if (!map)
- return;
slab_err(s, page, text, s->name);
slab_lock(page);

@@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
{
LIST_HEAD(discard);
struct page *page, *h;
+ unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
+
+ if (!map)
+ return;

BUG_ON(irqs_disabled());
spin_lock_irq(&n->list_lock);
@@ -3732,7 +3733,8 @@ static void free_partial(struct kmem_cac
list_add(&page->slab_list, &discard);
} else {
list_slab_objects(s, page,
- "Objects remaining in %s on __kmem_cache_shutdown()");
+ "Objects remaining in %s on __kmem_cache_shutdown()",
+ map);
}
}
spin_unlock_irq(&n->list_lock);
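
The pattern applied here, allocate before taking the lock and pass the buffer
down, can be sketched out of context like this (hypothetical names; a pthread
mutex stands in for list_lock):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static void list_objects(const char *text, unsigned long *map)
{
	if (!map)	/* allocation failed earlier; just skip the report */
		return;
	printf("%s (using preallocated map %p)\n", text, (void *)map);
}

static void free_partial(void)
{
	/* Allocate while no lock is held ... */
	unsigned long *map = calloc(512, sizeof(unsigned long));

	/* ... then take the lock and only use the buffer inside it. */
	pthread_mutex_lock(&list_lock);
	list_objects("Objects remaining on shutdown", map);
	pthread_mutex_unlock(&list_lock);

	free(map);
}

int main(void)
{
	free_partial();
	return 0;
}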

2019-11-09 23:03:33

by Yu Zhao

Subject: Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

On Sat, Nov 09, 2019 at 08:52:29PM +0000, Christopher Lameter wrote:
> On Fri, 8 Nov 2019, Yu Zhao wrote:
>
> > If we are already under list_lock, don't call kmalloc(). Otherwise we
> > will run into deadlock because kmalloc() also tries to grab the same
> > lock.
>
> How did this happen? The kmalloc needs to be always done before the
> list_lock is taken.
>
> > Fixing the problem by using a static bitmap instead.
> >
> > WARNING: possible recursive locking detected
> > --------------------------------------------
> > mount-encrypted/4921 is trying to acquire lock:
> > (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >
> > but task is already holding lock:
> > (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&(&n->list_lock)->rlock);
> > lock(&(&n->list_lock)->rlock);
> >
> > *** DEADLOCK ***
>
>
> Ahh. list_slab_objects() in shutdown?
>
> There is a much easier fix for this:
>
>
>
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects()
>
> list_slab_objects() is called when a slab is destroyed and there are objects still left,
> in order to list those objects in the syslog. This is a pretty rare event.
>
> And there, it seems, we take the list_lock and call kmalloc() while holding that lock.
>
> Perform the allocation in free_partial() before the list_lock is taken.
>
> Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
> Signed-off-by: Christoph Lameter <[email protected]>
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c 2019-11-09 20:43:52.374187381 +0000
> @@ -3690,14 +3690,11 @@ error:
> }
>
> static void list_slab_objects(struct kmem_cache *s, struct page *page,
> - const char *text)
> + const char *text, unsigned long *map)
> {
> #ifdef CONFIG_SLUB_DEBUG
> void *addr = page_address(page);
> void *p;
> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> - if (!map)
> - return;
> slab_err(s, page, text, s->name);
> slab_lock(page);
>
> @@ -3723,6 +3720,10 @@ static void free_partial(struct kmem_cac
> {
> LIST_HEAD(discard);
> struct page *page, *h;
> + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +
> + if (!map)
> + return;

What would happen if we are trying to allocate from the slab that is
being shut down? And shouldn't the allocation be conditional (i.e.,
only when CONFIG_SLUB_DEBUG=y)?

Subject: Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

On Sat, 9 Nov 2019, Yu Zhao wrote:

> > struct page *page, *h;
> > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > +
> > + if (!map)
> > + return;
>
> What would happen if we are trying to allocate from the slab that is
> being shut down? And shouldn't the allocation be conditional (i.e.,
> only when CONFIG_SLUB_DEBUG=y)?

Kmalloc slabs are never shut down.

The allocation does not hurt and CONFIG_SLUB_DEBUG is on in most
configurations.


2019-11-10 18:50:02

by Yu Zhao

Subject: Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> On Sat, 9 Nov 2019, Yu Zhao wrote:
>
> > > struct page *page, *h;
> > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > +
> > > + if (!map)
> > > + return;
> >
> > What would happen if we are trying to allocate from the slab that is
> > being shut down? And shouldn't the allocation be conditional (i.e.,
> > only when CONFIG_SLUB_DEBUG=y)?
>
> Kmalloc slabs are never shut down.

Maybe I'm not thinking straight -- isn't it what caused the deadlock in
the first place?

Kmalloc slabs can be shut down when memcg is on.

Subject: Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

On Sun, 10 Nov 2019, Yu Zhao wrote:

> On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> > On Sat, 9 Nov 2019, Yu Zhao wrote:
> >
> > > > struct page *page, *h;
> > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > > +
> > > > + if (!map)
> > > > + return;
> > >
> > > What would happen if we are trying to allocate from the slab that is
> > > being shut down? And shouldn't the allocation be conditional (i.e.,
> > > only when CONFIG_SLUB_DEBUG=y)?
> >
> > Kmalloc slabs are never shut down.
>
> Maybe I'm not thinking straight -- isn't it what caused the deadlock in
> the first place?

Well if kmalloc allocations become a problem then we have numerous
issues all over the kernel to fix.

> Kmalloc slabs can be shut down when memcg is on.

Kmalloc needs to work even during shutdown of a memcg.

Maybe we need to fix memcg to not allocate from the current memcg during
shutdown?


Subject: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

Regardless of the issue with memcgs allowing allocations from their
kmalloc arrays during shutdown: this patch cleans things up and properly
allocates the bitmap outside of the list_lock.


[FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly.

list_slab_objects() is called when a slab is destroyed and there are objects still left,
in order to list those objects in the syslog. This is a pretty rare event.

And there, it seems, we take the list_lock and call kmalloc() while holding that lock.

Perform the allocation in free_partial() before the list_lock is taken.

Fixes: bbd7d57bfe852d9788bae5fb171c7edb4021d8ac ("slub: Potential stack overflow")
Signed-off-by: Christoph Lameter

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
+++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
@@ -3690,14 +3690,15 @@ error:
}

static void list_slab_objects(struct kmem_cache *s, struct page *page,
- const char *text)
+ const char *text, unsigned long *map)
{
#ifdef CONFIG_SLUB_DEBUG
void *addr = page_address(page);
void *p;
- unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+
if (!map)
return;
+
slab_err(s, page, text, s->name);
slab_lock(page);

@@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme
}
}
slab_unlock(page);
- bitmap_free(map);
#endif
}

@@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
{
LIST_HEAD(discard);
struct page *page, *h;
+ unsigned long *map = NULL;
+
+#ifdef CONFIG_SLUB_DEBUG
+ map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
+#endif

BUG_ON(irqs_disabled());
spin_lock_irq(&n->list_lock);
@@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac
list_add(&page->slab_list, &discard);
} else {
list_slab_objects(s, page,
- "Objects remaining in %s on __kmem_cache_shutdown()");
+ "Objects remaining in %s on __kmem_cache_shutdown()",
+ map);
}
}
spin_unlock_irq(&n->list_lock);

+#ifdef CONFIG_SLUB_DEBUG
+ bitmap_free(map);
+#endif
+
list_for_each_entry_safe(page, h, &discard, slab_list)
discard_slab(s, page);
}

2019-11-11 18:17:01

by Shakeel Butt

Subject: Re: [PATCH v4 2/2] mm: avoid slub allocation while holding list_lock

+Roman Gushchin

On Mon, Nov 11, 2019 at 7:47 AM Christopher Lameter <[email protected]> wrote:
>
> On Sun, 10 Nov 2019, Yu Zhao wrote:
>
> > On Sat, Nov 09, 2019 at 11:16:28PM +0000, Christopher Lameter wrote:
> > > On Sat, 9 Nov 2019, Yu Zhao wrote:
> > >
> > > > > struct page *page, *h;
> > > > > + unsigned long *map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> > > > > +
> > > > > + if (!map)
> > > > > + return;
> > > >
> > > > What would happen if we are trying to allocate from the slab that is
> > > > being shut down? And shouldn't the allocation be conditional (i.e.,
> > > > only when CONFIG_SLUB_DEBUG=y)?
> > >
> > > Kmalloc slabs are never shut down.
> >
> > Maybe I'm not thinking straight -- isn't it what caused the deadlock in
> > the first place?
>
> Well if kmalloc allocations become a problem then we have numerous
> issues all over the kernel to fix.
>
> > Kmalloc slabs can be shut down when memcg is on.
>
> Kmalloc needs to work even during shutdown of a memcg.
>
> Maybe we need to fix memcg to not allocate from the current memcg during
> shutdown?
>
>

Roman recently added reparenting of memcg kmem caches on memcg offline
and can comment in more detail, but we don't shut down a kmem cache
until all the in-flight memcg allocations are resolved. Also, the
allocation here does not look like a __GFP_ACCOUNT allocation.

2019-11-30 23:10:54

by Andrew Morton

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On Mon, 11 Nov 2019 15:55:05 +0000 (UTC) Christopher Lameter <[email protected]> wrote:

> Regardless of the issue with memcgs allowing allocations from their
> kmalloc arrays during shutdown: this patch cleans things up and properly
> allocates the bitmap outside of the list_lock.
>
>
> [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
>
> V1->V2 : Properly handle CONFIG_SLUB_DEBUG. Handle bitmap free correctly.
>
> list_slab_objects() is called when a slab is destroyed and there are objects still left,
> in order to list those objects in the syslog. This is a pretty rare event.
>
> And there, it seems, we take the list_lock and call kmalloc() while holding that lock.
>
> Perform the allocation in free_partial() before the list_lock is taken.

No response here? It looks a lot simpler than the originally proposed
patch?

> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
> @@ -3690,14 +3690,15 @@ error:
> }
>
> static void list_slab_objects(struct kmem_cache *s, struct page *page,
> - const char *text)
> + const char *text, unsigned long *map)
> {
> #ifdef CONFIG_SLUB_DEBUG
> void *addr = page_address(page);
> void *p;
> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
> +
> if (!map)
> return;
> +
> slab_err(s, page, text, s->name);
> slab_lock(page);
>
> @@ -3710,7 +3711,6 @@ static void list_slab_objects(struct kme
> }
> }
> slab_unlock(page);
> - bitmap_free(map);
> #endif
> }
>
> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
> {
> LIST_HEAD(discard);
> struct page *page, *h;
> + unsigned long *map = NULL;
> +
> +#ifdef CONFIG_SLUB_DEBUG
> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);
> +#endif
>
> BUG_ON(irqs_disabled());
> spin_lock_irq(&n->list_lock);
> @@ -3732,11 +3737,16 @@ static void free_partial(struct kmem_cac
> list_add(&page->slab_list, &discard);
> } else {
> list_slab_objects(s, page,
> - "Objects remaining in %s on __kmem_cache_shutdown()");
> + "Objects remaining in %s on __kmem_cache_shutdown()",
> + map);
> }
> }
> spin_unlock_irq(&n->list_lock);
>
> +#ifdef CONFIG_SLUB_DEBUG
> + bitmap_free(map);
> +#endif
> +
> list_for_each_entry_safe(page, h, &discard, slab_list)
> discard_slab(s, page);
> }

2019-12-01 01:19:39

by Tetsuo Handa

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On 2019/12/01 8:09, Andrew Morton wrote:
>> Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here? It looks a lot simpler than the originally proposed
> patch?
>
>> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
>> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
>> @@ -3690,14 +3690,15 @@ error:
>> }
>>
>> static void list_slab_objects(struct kmem_cache *s, struct page *page,
>> - const char *text)
>> + const char *text, unsigned long *map)
>> {
>> #ifdef CONFIG_SLUB_DEBUG
>> void *addr = page_address(page);
>> void *p;
>> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);

Changing from !(__GFP_IO | __GFP_FS) allocation to

>> +
>> if (!map)
>> return;
>> +
>> slab_err(s, page, text, s->name);
>> slab_lock(page);
>>
>> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
>> {
>> LIST_HEAD(discard);
>> struct page *page, *h;
>> + unsigned long *map = NULL;
>> +
>> +#ifdef CONFIG_SLUB_DEBUG
>> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);

__GFP_IO | __GFP_FS allocation.
How is this path guaranteed to be safe to perform __GFP_IO | __GFP_FS reclaim?

>> +#endif
>>
>> BUG_ON(irqs_disabled());
>> spin_lock_irq(&n->list_lock);

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On Sat, 30 Nov 2019, Andrew Morton wrote:

> > Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here? It looks a lot simpler than the originally proposed
> patch?

Yup. I prefer this one, but it's my own patch so I cannot Ack it.

2019-12-07 22:05:04

by Yu Zhao

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
> On Sat, 30 Nov 2019, Andrew Morton wrote:
>
> > > Perform the allocation in free_partial() before the list_lock is taken.
> >
> > No response here? It looks a lot simpler than the originally proposed
> > patch?
>
> Yup. I prefer this one, but it's my own patch so I cannot Ack it.

Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
once it's addressed.

2020-01-10 14:13:41

by Vlastimil Babka

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On 12/7/19 11:03 PM, Yu Zhao wrote:
> On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
>> On Sat, 30 Nov 2019, Andrew Morton wrote:
>>
>>>> Perform the allocation in free_partial() before the list_lock is taken.
>>>
>>> No response here? It looks a lot simpler than the originally proposed
>>> patch?
>>
>> Yup. I prefer this one, but it's my own patch so I cannot Ack it.
>
> Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
>> once it's addressed.

Tetsuo's mails haven't been reaching linux-mm for a while, and he has given up
trying to do something about it. That makes it hard to discuss anything outside
the direct CC group. I don't know what the pending question is, for example.

2020-01-12 11:07:27

by Tetsuo Handa

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On 2020/01/10 23:11, Vlastimil Babka wrote:
> On 12/7/19 11:03 PM, Yu Zhao wrote:
>> On Mon, Dec 02, 2019 at 03:12:20PM +0000, Christopher Lameter wrote:
>>> On Sat, 30 Nov 2019, Andrew Morton wrote:
>>>
>>>>> Perform the allocation in free_partial() before the list_lock is taken.
>>>>
>>>> No response here? It looks a lot simpler than the originally proposed
>>>> patch?
>>>
>>> Yup. I prefer this one, but it's my own patch so I cannot Ack it.
>>
>> Hi, there is a pending question from Tetsuo-san. I'd be happy to ack
>> once it's addressed.
>
> Tetsuo's mails haven't been reaching linux-mm for a while, and he has given up
> trying to do something about it. That makes it hard to discuss anything outside
> the direct CC group. I don't know what the pending question is, for example.
>

Hmm, this one? Even non-ML destinations are sometimes rejected (e.g.
554 5.7.1 Service unavailable; Client host [202.181.97.72] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=202.181.97.72
). Anyway, I was just worried about whether it is really safe to do a memory
allocation which might involve memory reclaim. You MM guys know better...

-------- Forwarded Message --------
Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2
Message-ID: <[email protected]>
Date: Sun, 1 Dec 2019 10:17:38 +0900

On 2019/12/01 8:09, Andrew Morton wrote:
>> Perform the allocation in free_partial() before the list_lock is taken.
>
> No response here? It looks a lot simpler than the originally proposed
> patch?
>
>> --- linux.orig/mm/slub.c 2019-10-15 13:54:57.032655296 +0000
>> +++ linux/mm/slub.c 2019-11-11 15:52:11.616397853 +0000
>> @@ -3690,14 +3690,15 @@ error:
>> }
>>
>> static void list_slab_objects(struct kmem_cache *s, struct page *page,
>> - const char *text)
>> + const char *text, unsigned long *map)
>> {
>> #ifdef CONFIG_SLUB_DEBUG
>> void *addr = page_address(page);
>> void *p;
>> - unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);

Changing from !(__GFP_IO | __GFP_FS) allocation to

>> +
>> if (!map)
>> return;
>> +
>> slab_err(s, page, text, s->name);
>> slab_lock(page);
>>
>> @@ -3723,6 +3723,11 @@ static void free_partial(struct kmem_cac
>> {
>> LIST_HEAD(discard);
>> struct page *page, *h;
>> + unsigned long *map = NULL;
>> +
>> +#ifdef CONFIG_SLUB_DEBUG
>> + map = bitmap_alloc(oo_objects(s->max), GFP_KERNEL);

__GFP_IO | __GFP_FS allocation.
How is this path guaranteed to be safe to perform __GFP_IO | __GFP_FS reclaim?

>> +#endif
>>
>> BUG_ON(irqs_disabled());
>> spin_lock_irq(&n->list_lock);

Subject: Re: [FIX] slub: Remove kmalloc under list_lock from list_slab_objects() V2

On Sun, 12 Jan 2020, Tetsuo Handa wrote:

> On 2020/01/10 23:11, Vlastimil Babka wrote:
> Hmm, this one? Even non-ML destinations are sometimes rejected (e.g.
> 554 5.7.1 Service unavailable; Client host [202.181.97.72] blocked using b.barracudacentral.org; http://www.barracudanetworks.com/reputation/?pr=1&ip=202.181.97.72
> ). Anyway, I was just worried about whether it is really safe to do a memory
> allocation which might involve memory reclaim. You MM guys know better...

We are talking about a call to destroy the kmem_cache. This is not done
under any lock. The lock was taken inside that function before the call to
list_slab_objects. That can be avoided.