In our production experience the percpu memory allocator sometimes struggles
to return memory to the system. A typical example is the creation of several
thousand memory cgroups (each uses several chunks of percpu data for vmstats,
vmevents, ref counters etc.). Deleting and completely releasing these cgroups
doesn't always shrink the percpu memory, so sometimes several GB of memory
are wasted.

The underlying problem is fragmentation: to release an underlying chunk, all
percpu allocations in it have to be released first. The percpu allocator
tends to top up existing chunks to improve utilization, so new small-ish
allocations (e.g. percpu ref counters) are placed into almost-full old-ish
chunks, effectively pinning them in memory.

This patchset aims to solve this problem by implementing a partial
depopulation of percpu chunks: chunks with many empty pages are
asynchronously depopulated and the pages are returned to the system.

To illustrate the problem, the following script can be used:
--
#!/bin/bash

cd /sys/fs/cgroup
mkdir percpu_test
echo "+memory" > percpu_test/cgroup.subtree_control

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
	mkdir percpu_test/cg_"${i}"
	for j in `seq 1 10`; do
		mkdir percpu_test/cg_"${i}"_"${j}"
	done
done

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
	for j in `seq 1 10`; do
		rmdir percpu_test/cg_"${i}"_"${j}"
	done
done

sleep 10

cat /proc/meminfo | grep Percpu

for i in `seq 1 1000`; do
	rmdir percpu_test/cg_"${i}"
done

rmdir percpu_test
--

The script creates 11000 memory cgroups and then removes 10 out of every 11.
It prints the initial size of the percpu memory, the size after creating
all cgroups, and the size after deleting most of them.

Results:
vanilla:
./percpu_test.sh
Percpu: 7488 kB
Percpu: 481152 kB
Percpu: 481152 kB
with this patchset applied:
./percpu_test.sh
Percpu: 7488 kB
Percpu: 481408 kB
Percpu: 135552 kB

So the total size of the percpu memory was reduced by a factor of more
than 3.5.

v2:
- depopulated chunks are sidelined
- depopulation happens in the reverse order
- depopulate list made per-chunk type
- better results due to better heuristics
v1:
- depopulation heuristics changed and optimized
- chunks are put into a separate list, and the depopulation scans this list
- chunk->isolated is introduced, chunk->depopulate is dropped
- rearranged patches a bit
- fixed a panic discovered by krobot
- made pcpu_nr_empty_pop_pages per chunk type
- minor fixes
rfc:
https://lwn.net/Articles/850508/
Roman Gushchin (5):
percpu: fix a comment about the chunks ordering
percpu: split __pcpu_balance_workfn()
percpu: make pcpu_nr_empty_pop_pages per chunk type
percpu: generalize pcpu_balance_populated()
percpu: implement partial chunk depopulation
mm/percpu-internal.h | 4 +-
mm/percpu-stats.c | 9 +-
mm/percpu.c | 282 ++++++++++++++++++++++++++++++++++++-------
3 files changed, 246 insertions(+), 49 deletions(-)
--
2.30.2
This patch implements partial depopulation of percpu chunks.

As of now, a chunk can be depopulated only as part of its final
destruction, when there are no more outstanding allocations. However,
to minimize memory waste it might be useful to depopulate a partially
filled chunk, if a small number of outstanding allocations prevents
the chunk from being fully reclaimed.

This patch implements the following depopulation process: it scans
over the chunk pages, looks for ranges of empty populated pages, and
performs the depopulation. To avoid races with new allocations, the
chunk is isolated beforehand. After the depopulation the chunk is
sidelined to a special list or freed. New allocations can't be served
from a sidelined chunk. A chunk can be moved back to a corresponding
slot if there are not enough chunks with empty populated pages.

The depopulation is scheduled on the free path. A chunk is a good
target for depopulation if it:
 1) has more than 1/4 of its total pages free and populated,
 2) leaves the system with enough free percpu pages aside from it,
 3) isn't the reserved chunk,
 4) isn't the first chunk,
 5) isn't entirely free.
If a chunk is already depopulated but has gained new free populated
pages, it's a good target too (see the condensed check below).
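
Condensed, the free-path check boils down to roughly the following
sketch (the exact code is in the free_percpu() hunk below, where the
"isn't entirely free" part is handled by the preceding if-branch):

	/* is this chunk a good depopulation target? */
	if (chunk != pcpu_first_chunk && chunk != pcpu_reserved_chunk &&
	    !chunk->isolated &&
	    chunk->free_bytes != pcpu_unit_size &&	/* not entirely free */
	    /* enough empty populated pages aside from this chunk */
	    pcpu_nr_empty_pop_pages[type] >
		PCPU_EMPTY_POP_PAGES_HIGH + chunk->nr_empty_pop_pages &&
	    /* 1/4+ of pages empty, or depopulated before and freed again */
	    ((chunk->depopulated && chunk->nr_empty_pop_pages) ||
	     chunk->nr_empty_pop_pages >= chunk->nr_pages / 4)) {
		/* isolate the chunk and schedule async depopulation */
	}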

The chunk is then moved to a special pcpu_depopulate_list, the
chunk->isolated flag is set and the async balancing is scheduled.

The async balancing moves pcpu_depopulate_list to a local list
(because pcpu_depopulate_list can change while pcpu_lock is released)
and then tries to depopulate each chunk. The depopulation is performed
in the reverse direction, to keep populated pages close to the
beginning of the chunk, and stops once the global number of empty
populated pages reaches the threshold. Depopulated chunks are
sidelined to prevent further allocations. Skipped and fully empty
chunks are returned to the corresponding slot.

On the allocation path, if no suitable chunk is found, the list of
sidelined chunks is scanned before a new chunk is created. If there is
a good sidelined chunk, it's placed back into the corresponding slot
and the scanning is restarted.

Many thanks to Dennis Zhou for his great ideas and a very constructive
discussion which led to many improvements in this patchset!
Signed-off-by: Roman Gushchin <[email protected]>
---
mm/percpu-internal.h | 2 +
mm/percpu.c | 164 ++++++++++++++++++++++++++++++++++++++++++-
2 files changed, 164 insertions(+), 2 deletions(-)
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 095d7eaa0db4..8e432663c41e 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -67,6 +67,8 @@ struct pcpu_chunk {
void *data; /* chunk data */
bool immutable; /* no [de]population allowed */
+ bool isolated; /* isolated from chunk slot lists */
+ bool depopulated; /* sidelined after depopulation */
int start_offset; /* the overlap with the previous
region to have a page aligned
base_addr */
diff --git a/mm/percpu.c b/mm/percpu.c
index e20119668c42..0a5a5e84e0a4 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -181,6 +181,19 @@ static LIST_HEAD(pcpu_map_extend_chunks);
*/
int pcpu_nr_empty_pop_pages[PCPU_NR_CHUNK_TYPES];
+/*
+ * List of chunks with a lot of free pages. Used to depopulate them
+ * asynchronously.
+ */
+static struct list_head pcpu_depopulate_list[PCPU_NR_CHUNK_TYPES];
+
+/*
+ * List of previously depopulated chunks. They are not usually used for new
+ * allocations, but can be returned back to service if a need arises.
+ */
+static struct list_head pcpu_sideline_list[PCPU_NR_CHUNK_TYPES];
+
+
/*
* The number of populated pages in use by the allocator, protected by
* pcpu_lock. This number is kept per a unit per chunk (i.e. when a page gets
@@ -542,6 +555,12 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
{
int nslot = pcpu_chunk_slot(chunk);
+ /*
+ * Keep isolated and depopulated chunks on a sideline.
+ */
+ if (chunk->isolated || chunk->depopulated)
+ return;
+
if (oslot != nslot)
__pcpu_chunk_move(chunk, nslot, oslot < nslot);
}
@@ -1778,6 +1797,25 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
}
}
+ /* search through sidelined depopulated chunks */
+ list_for_each_entry(chunk, &pcpu_sideline_list[type], list) {
+ struct pcpu_block_md *chunk_md = &chunk->chunk_md;
+ int bit_off;
+
+ /*
+ * If the allocation can fit in the chunk's contig hint,
+ * place the chunk back into corresponding slot and restart
+ * the scanning.
+ */
+ bit_off = ALIGN(chunk_md->contig_hint_start, align) -
+ chunk_md->contig_hint_start;
+ if (bit_off + bits > chunk_md->contig_hint) {
+ chunk->depopulated = false;
+ pcpu_chunk_relocate(chunk, -1);
+ goto restart;
+ }
+ }
+
spin_unlock_irqrestore(&pcpu_lock, flags);
/*
@@ -2048,6 +2086,106 @@ static void pcpu_grow_populated(enum pcpu_chunk_type type, int nr_to_pop)
}
}
+/**
+ * pcpu_shrink_populated - scan chunks and release unused pages to the system
+ * @type: chunk type
+ *
+ * Scan over chunks in the depopulate list, try to release unused populated
+ * pages to the system. Depopulated chunks are sidelined so they aren't used
+ * for new allocations unless necessary. Skipped and fully free chunks are returned
+ * to corresponding slots. Stop depopulating if the number of empty populated
+ * pages reaches the threshold. Each chunk is scanned in the reverse order to
+ * keep populated pages close to the beginning of the chunk.
+ */
+static void pcpu_shrink_populated(enum pcpu_chunk_type type)
+{
+ struct pcpu_block_md *block;
+ struct pcpu_chunk *chunk, *tmp;
+ LIST_HEAD(to_depopulate);
+ bool depopulated;
+ int i, end;
+
+ spin_lock_irq(&pcpu_lock);
+
+ list_splice_init(&pcpu_depopulate_list[type], &to_depopulate);
+
+ list_for_each_entry_safe(chunk, tmp, &to_depopulate, list) {
+ WARN_ON(chunk->immutable);
+ depopulated = false;
+
+ /*
+ * Scan chunk's pages in the reverse order to keep populated
+ * pages close to the beginning of the chunk.
+ */
+ for (i = chunk->nr_pages - 1, end = -1; i >= 0; i--) {
+ /*
+ * If the chunk has no empty pages or
+ * we're short on empty pages in general,
+ * just put the chunk back into the original slot.
+ */
+ if (!chunk->nr_empty_pop_pages ||
+ pcpu_nr_empty_pop_pages[type] <=
+ PCPU_EMPTY_POP_PAGES_HIGH)
+ break;
+
+ /*
+ * If the page is empty and populated, start or
+ * extend the (i, end) range. If i == 0, decrease
+ * i and perform the depopulation to cover the last
+ * (first) page in the chunk.
+ */
+ block = chunk->md_blocks + i;
+ if (block->contig_hint == PCPU_BITMAP_BLOCK_BITS &&
+ test_bit(i, chunk->populated)) {
+ if (end == -1)
+ end = i;
+ if (i > 0)
+ continue;
+ i--;
+ }
+
+ /*
+ * Otherwise check if there is an active range,
+ * and if yes, depopulate it.
+ */
+ if (end == -1)
+ continue;
+
+ depopulated = true;
+
+ spin_unlock_irq(&pcpu_lock);
+ pcpu_depopulate_chunk(chunk, i + 1, end + 1);
+ cond_resched();
+ spin_lock_irq(&pcpu_lock);
+
+ pcpu_chunk_depopulated(chunk, i + 1, end + 1);
+
+ /*
+ * Reset the range and continue.
+ */
+ end = -1;
+ }
+
+ chunk->isolated = false;
+ if (chunk->free_bytes == pcpu_unit_size || !depopulated) {
+ /*
+ * If the chunk is empty or hasn't been depopulated,
+ * return it to the original slot.
+ */
+ pcpu_chunk_relocate(chunk, -1);
+ } else {
+ /*
+ * Otherwise put the chunk to the list of depopulated
+ * chunks.
+ */
+ chunk->depopulated = true;
+ list_move(&chunk->list, &pcpu_sideline_list[type]);
+ }
+ }
+
+ spin_unlock_irq(&pcpu_lock);
+}
+
/**
* pcpu_balance_populated - manage the amount of populated pages
* @type: chunk type
@@ -2078,6 +2216,8 @@ static void pcpu_balance_populated(enum pcpu_chunk_type type)
} else if (pcpu_nr_empty_pop_pages[type] < PCPU_EMPTY_POP_PAGES_HIGH) {
nr_to_pop = PCPU_EMPTY_POP_PAGES_HIGH - pcpu_nr_empty_pop_pages[type];
pcpu_grow_populated(type, nr_to_pop);
+ } else if (!list_empty(&pcpu_depopulate_list[type])) {
+ pcpu_shrink_populated(type);
}
}
@@ -2135,7 +2275,13 @@ void free_percpu(void __percpu *ptr)
pcpu_memcg_free_hook(chunk, off, size);
- /* if there are more than one fully free chunks, wake up grim reaper */
+ /*
+	 * If there is more than one fully free chunk, wake up the grim reaper.
+ * Otherwise if at least 1/4 of its pages are empty and there is no
+ * system-wide shortage of empty pages aside from this chunk, isolate
+ * the chunk and schedule an async depopulation. If the chunk was
+ * depopulated previously and got free pages, depopulate it too.
+ */
if (chunk->free_bytes == pcpu_unit_size) {
struct pcpu_chunk *pos;
@@ -2144,6 +2290,16 @@ void free_percpu(void __percpu *ptr)
need_balance = true;
break;
}
+ } else if (chunk != pcpu_first_chunk && chunk != pcpu_reserved_chunk &&
+ !chunk->isolated &&
+ pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
+ PCPU_EMPTY_POP_PAGES_HIGH + chunk->nr_empty_pop_pages &&
+ ((chunk->depopulated && chunk->nr_empty_pop_pages) ||
+ (chunk->nr_empty_pop_pages >= chunk->nr_pages / 4))) {
+ list_move(&chunk->list, &pcpu_depopulate_list[pcpu_chunk_type(chunk)]);
+ chunk->isolated = true;
+ chunk->depopulated = false;
+ need_balance = true;
}
trace_percpu_free_percpu(chunk->base_addr, off, ptr);
@@ -2571,10 +2727,14 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
pcpu_nr_slots * sizeof(pcpu_chunk_lists[0]) *
PCPU_NR_CHUNK_TYPES);
- for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++)
+ for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) {
for (i = 0; i < pcpu_nr_slots; i++)
INIT_LIST_HEAD(&pcpu_chunk_list(type)[i]);
+ INIT_LIST_HEAD(&pcpu_depopulate_list[type]);
+ INIT_LIST_HEAD(&pcpu_sideline_list[type]);
+ }
+
/*
* The end of the static region needs to be aligned with the
* minimum allocation size as this offsets the reserved and
--
2.30.2
Hello,
On Wed, Apr 07, 2021 at 11:26:18AM -0700, Roman Gushchin wrote:
> This patch implements partial depopulation of percpu chunks.
>
> [...]
>
> @@ -1778,6 +1797,25 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
> }
> }
>
> + /* search through sidelined depopulated chunks */
> + list_for_each_entry(chunk, &pcpu_sideline_list[type], list) {
> + struct pcpu_block_md *chunk_md = &chunk->chunk_md;
> + int bit_off;
> +
> + /*
> + * If the allocation can fit in the chunk's contig hint,
> + * place the chunk back into corresponding slot and restart
> + * the scanning.
> + */
> + bit_off = ALIGN(chunk_md->contig_hint_start, align) -
> + chunk_md->contig_hint_start;
> + if (bit_off + bits > chunk_md->contig_hint) {
> + chunk->depopulated = false;
> + pcpu_chunk_relocate(chunk, -1);
> + goto restart;
> + }
This check should be bit_off + bits <= chunk_md->contig_hint.

Can you please factor that out into a helper, something like:

	static bool pcpu_check_chunk_hint(struct pcpu_block_md *chunk_md,
					  int bits, size_t align)
	{
		int bit_off = ALIGN(chunk_md->contig_hint_start, align) -
			      chunk_md->contig_hint_start;

		return bit_off + bits <= chunk_md->contig_hint;
	}

Then your use case can just call pcpu_check_chunk_hint() and the other
user, pcpu_find_block_fit(), can use !pcpu_check_chunk_hint().
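
An untested sketch of how the sideline scan would then read (assuming
the helper takes align as a parameter, since align isn't stored in
struct pcpu_block_md):

	/* search through sidelined depopulated chunks */
	list_for_each_entry(chunk, &pcpu_sideline_list[type], list) {
		/* the allocation fits into the chunk's contig hint */
		if (pcpu_check_chunk_hint(&chunk->chunk_md, bits, align)) {
			chunk->depopulated = false;
			pcpu_chunk_relocate(chunk, -1);
			goto restart;
		}
	}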
>
> [...]
>
> @@ -2144,6 +2290,16 @@ void free_percpu(void __percpu *ptr)
> need_balance = true;
> break;
> }
> + } else if (chunk != pcpu_first_chunk && chunk != pcpu_reserved_chunk &&
> + !chunk->isolated &&
> + pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
> + PCPU_EMPTY_POP_PAGES_HIGH + chunk->nr_empty_pop_pages &&
nit: can you add parentheses around this condition?
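
Something like this, purely regrouping, no logic change:

	(pcpu_nr_empty_pop_pages[pcpu_chunk_type(chunk)] >
	 PCPU_EMPTY_POP_PAGES_HIGH + chunk->nr_empty_pop_pages) &&
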
> + ((chunk->depopulated && chunk->nr_empty_pop_pages) ||
> + (chunk->nr_empty_pop_pages >= chunk->nr_pages / 4))) {
> + list_move(&chunk->list, &pcpu_depopulate_list[pcpu_chunk_type(chunk)]);
> + chunk->isolated = true;
> + chunk->depopulated = false;
> + need_balance = true;
> }
>
> [...]
>
Thanks,
Dennis