2023-01-05 00:50:37

by Paul E. McKenney

Subject: [PATCH rcu 0/8] kvfree_rcu updates for v6.3

Hello!

This series contains a few kvfree_rcu() updates:

1. Refactor kvfree_call_rcu() and high-level helpers, courtesy of "Uladzislau Rezki (Sony)".

2. Switch to a generic linked list API, courtesy of "Uladzislau Rezki (Sony)".

3. Move bulk/list reclaim to separate functions, courtesy of "Uladzislau Rezki (Sony)".

4. Move need_offload_krc() out of krcp->lock, courtesy of "Uladzislau Rezki (Sony)".

5. Use a polled API to speedup a reclaim process, courtesy of "Uladzislau Rezki (Sony)".

6. Use READ_ONCE() when access to krcp->head, courtesy of "Uladzislau Rezki (Sony)".

7. Carefully reset number of objects in krcp, courtesy of "Uladzislau Rezki (Sony)".

8. Split ready for reclaim objects from a batch, courtesy of "Uladzislau Rezki (Sony)".

Thanx, Paul

------------------------------------------------------------------------

 b/include/linux/rcupdate.h |    5
 b/include/linux/rcutiny.h  |   12 -
 b/include/linux/rcutree.h  |    2
 b/kernel/rcu/tiny.c        |    9 -
 b/kernel/rcu/tree.c        |   29 +--
 kernel/rcu/tree.c          |  392 +++++++++++++++++++++++++++------------------
 6 files changed, 261 insertions(+), 188 deletions(-)


2023-01-05 00:51:01

by Paul E. McKenney

Subject: [PATCH rcu 5/8] rcu/kvfree: Use a polled API to speedup a reclaim process

From: "Uladzislau Rezki (Sony)" <[email protected]>

Currently all objects placed into a batch wait for a full grace period
to elapse after that batch is ready to send to RCU. However, this
can unnecessarily delay freeing of the first objects that were added
to the batch. After all, several RCU grace periods might have elapsed
since those objects were added, and if so, there is no point in further
deferring their freeing.

This commit therefore adds per-page grace-period snapshots which are
obtained from get_state_synchronize_rcu(). When the batch is ready
to be passed to call_rcu(), each page's snapshot is checked by passing
it to poll_state_synchronize_rcu(). If a given page's RCU grace period
has already elapsed, its objects are freed immediately by kvfree_rcu_bulk().
Otherwise, these objects are freed after a call to synchronize_rcu().

This approach requires that the pages be traversed in reverse order,
that is, the oldest ones first.
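
[Editor's note: for readers unfamiliar with the polled grace-period API, here is a
minimal kernel-context sketch of the pattern this patch applies. The struct and
function names (demo_page, demo_mark_page, demo_reclaim) are invented for
illustration; only get_state_synchronize_rcu(), poll_state_synchronize_rcu(),
synchronize_rcu(), and the <linux/list.h> helpers are real APIs.]

#include <linux/rcupdate.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical per-page bookkeeping, loosely modeled on kvfree_rcu_bulk_data. */
struct demo_page {
	struct list_head list;
	unsigned long gp_snap;	/* cookie from get_state_synchronize_rcu() */
};

/* Record the current grace-period state when a page is filled. */
static void demo_mark_page(struct demo_page *p)
{
	p->gp_snap = get_state_synchronize_rcu();
}

/*
 * Free pages whose grace period has already elapsed, oldest first.
 * Any pages still needing a grace period are handled after a single
 * synchronize_rcu() call covering all of them.
 */
static void demo_reclaim(struct list_head *pages)
{
	struct demo_page *p, *n;

	list_for_each_entry_safe_reverse(p, n, pages, list) {
		if (!poll_state_synchronize_rcu(p->gp_snap))
			break;	/* newer pages cannot be ready either */
		list_del_init(&p->list);
		kfree(p);	/* stand-in for kvfree_rcu_bulk() */
	}

	if (!list_empty(pages))
		synchronize_rcu();	/* one extra GP covers the rest */

	list_for_each_entry_safe(p, n, pages, list) {
		list_del_init(&p->list);
		kfree(p);
	}
}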

Test example:

kvm.sh --memory 10G --torture rcuscale --allcpus --duration 1 \
--kconfig CONFIG_NR_CPUS=64 \
--kconfig CONFIG_RCU_NOCB_CPU=y \
--kconfig CONFIG_RCU_NOCB_CPU_DEFAULT_ALL=y \
--kconfig CONFIG_RCU_LAZY=n \
--bootargs "rcuscale.kfree_rcu_test=1 rcuscale.kfree_nthreads=16 \
rcuscale.holdoff=20 rcuscale.kfree_loops=10000 \
torture.disable_onoff_at_boot" --trust-make

Before this commit:

Total time taken by all kfree'ers: 8535693700 ns, loops: 10000, batches: 1188, memory footprint: 2248MB
Total time taken by all kfree'ers: 8466933582 ns, loops: 10000, batches: 1157, memory footprint: 2820MB
Total time taken by all kfree'ers: 5375602446 ns, loops: 10000, batches: 1130, memory footprint: 6502MB
Total time taken by all kfree'ers: 7523283832 ns, loops: 10000, batches: 1006, memory footprint: 3343MB
Total time taken by all kfree'ers: 6459171956 ns, loops: 10000, batches: 1150, memory footprint: 6549MB

After this commit:

Total time taken by all kfree'ers: 8560060176 ns, loops: 10000, batches: 1787, memory footprint: 61MB
Total time taken by all kfree'ers: 8573885501 ns, loops: 10000, batches: 1777, memory footprint: 93MB
Total time taken by all kfree'ers: 8320000202 ns, loops: 10000, batches: 1727, memory footprint: 66MB
Total time taken by all kfree'ers: 8552718794 ns, loops: 10000, batches: 1790, memory footprint: 75MB
Total time taken by all kfree'ers: 8601368792 ns, loops: 10000, batches: 1724, memory footprint: 62MB

The reduction in memory footprint is well in excess of an order of
magnitude.

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 47 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 0c42fce4efe32..735312f78e980 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2877,11 +2877,13 @@ EXPORT_SYMBOL_GPL(call_rcu);
/**
* struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
* @list: List node. All blocks are linked between each other
+ * @gp_snap: Snapshot of RCU state for objects placed to this bulk
* @nr_records: Number of active pointers in the array
* @records: Array of the kvfree_rcu() pointers
*/
struct kvfree_rcu_bulk_data {
struct list_head list;
+ unsigned long gp_snap;
unsigned long nr_records;
void *records[];
};
@@ -2898,13 +2900,15 @@ struct kvfree_rcu_bulk_data {
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
* @head_free: List of kfree_rcu() objects waiting for a grace period
+ * @head_free_gp_snap: Snapshot of RCU state for objects placed to "@head_free"
* @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
* @krcp: Pointer to @kfree_rcu_cpu structure
*/

struct kfree_rcu_cpu_work {
- struct rcu_work rcu_work;
+ struct work_struct rcu_work;
struct rcu_head *head_free;
+ unsigned long head_free_gp_snap;
struct list_head bulk_head_free[FREE_N_CHANNELS];
struct kfree_rcu_cpu *krcp;
};
@@ -3100,10 +3104,11 @@ static void kfree_rcu_work(struct work_struct *work)
struct rcu_head *head;
struct kfree_rcu_cpu *krcp;
struct kfree_rcu_cpu_work *krwp;
+ unsigned long head_free_gp_snap;
int i;

- krwp = container_of(to_rcu_work(work),
- struct kfree_rcu_cpu_work, rcu_work);
+ krwp = container_of(work,
+ struct kfree_rcu_cpu_work, rcu_work);
krcp = krwp->krcp;

raw_spin_lock_irqsave(&krcp->lock, flags);
@@ -3114,12 +3119,29 @@ static void kfree_rcu_work(struct work_struct *work)
// Channel 3.
head = krwp->head_free;
krwp->head_free = NULL;
+ head_free_gp_snap = krwp->head_free_gp_snap;
raw_spin_unlock_irqrestore(&krcp->lock, flags);

// Handle the first two channels.
- for (i = 0; i < FREE_N_CHANNELS; i++)
+ for (i = 0; i < FREE_N_CHANNELS; i++) {
+ // Start from the tail page, so a GP is likely passed for it.
+ list_for_each_entry_safe_reverse(bnode, n, &bulk_head[i], list) {
+ // Not yet ready? Bail out since we need one more GP.
+ if (!poll_state_synchronize_rcu(bnode->gp_snap))
+ break;
+
+ list_del_init(&bnode->list);
+ kvfree_rcu_bulk(krcp, bnode, i);
+ }
+
+ // Please note a request for one more extra GP can
+ // occur only once for all objects in this batch.
+ if (!list_empty(&bulk_head[i]))
+ synchronize_rcu();
+
list_for_each_entry_safe(bnode, n, &bulk_head[i], list)
kvfree_rcu_bulk(krcp, bnode, i);
+ }

/*
* This is used when the "bulk" path can not be used for the
@@ -3128,7 +3150,10 @@ static void kfree_rcu_work(struct work_struct *work)
* queued on a linked list through their rcu_head structures.
* This list is named "Channel 3".
*/
- kvfree_rcu_list(head);
+ if (head) {
+ cond_synchronize_rcu(head_free_gp_snap);
+ kvfree_rcu_list(head);
+ }
}

static bool
@@ -3195,6 +3220,11 @@ static void kfree_rcu_monitor(struct work_struct *work)
if (!krwp->head_free) {
krwp->head_free = krcp->head;
WRITE_ONCE(krcp->head, NULL);
+
+ // Take a snapshot for this krwp. Please note no more
+ // any objects can be added to attached head_free channel
+ // therefore fixate a GP for it here.
+ krwp->head_free_gp_snap = get_state_synchronize_rcu();
}

WRITE_ONCE(krcp->count, 0);
@@ -3204,7 +3234,7 @@ static void kfree_rcu_monitor(struct work_struct *work)
// be that the work is in the pending state when
// channels have been detached following by each
// other.
- queue_rcu_work(system_wq, &krwp->rcu_work);
+ queue_work(system_wq, &krwp->rcu_work);
}
}

@@ -3332,8 +3362,9 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
list_add(&bnode->list, &(*krcp)->bulk_head[idx]);
}

- /* Finally insert. */
+ // Finally insert and update the GP for this page.
bnode->records[bnode->nr_records++] = ptr;
+ bnode->gp_snap = get_state_synchronize_rcu();
return true;
}

@@ -4783,7 +4814,7 @@ static void __init kfree_rcu_batch_init(void)
struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

for (i = 0; i < KFREE_N_BATCHES; i++) {
- INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
+ INIT_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
krcp->krw_arr[i].krcp = krcp;

for (j = 0; j < FREE_N_CHANNELS; j++)
--
2.31.1.189.g2e36527f23

2023-01-05 01:10:40

by Paul E. McKenney

Subject: [PATCH rcu 3/8] rcu/kvfree: Move bulk/list reclaim to separate functions

From: "Uladzislau Rezki (Sony)" <[email protected]>

The kvfree_rcu() code maintains lists of pages of pointers, but also a
singly linked list, with the latter being used when memory allocation
fails. Traversal of these two types of lists is currently open coded.
This commit simplifies the code by providing kvfree_rcu_bulk() and
kvfree_rcu_list() functions, respectively, to traverse these two types
of lists. This patch does not introduce any functional change.
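
[Editor's note: the singly linked fallback list ("Channel 3") works by storing the
original object pointer in head->func and recovering the offset of the embedded
rcu_head as head minus ptr, which is then sanity-checked with
__is_kvfree_rcu_offset() before kvfree(), as the kvfree_rcu_list() hunk below shows.
The following userspace-runnable sketch illustrates only that pointer arithmetic;
struct demo_obj, struct demo_rcu_head, and demo_is_offset() are invented for this
example.]

#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for the kernel's rcu_head and offset check. */
struct demo_rcu_head {
	struct demo_rcu_head *next;
	void *func;			/* holds the enclosing object pointer */
};

struct demo_obj {
	int payload[32];
	struct demo_rcu_head rh;	/* embedded head, as with kfree_rcu(obj, rh) */
};

/* Mirrors __is_kvfree_rcu_offset(): a "callback" value below 4096 is an offset. */
static int demo_is_offset(unsigned long offset)
{
	return offset < 4096;
}

int main(void)
{
	struct demo_obj *obj = malloc(sizeof(*obj));
	struct demo_rcu_head *head = &obj->rh;

	/* Encode, as the kvfree_rcu() slow path does: head->func = ptr. */
	head->func = obj;

	/* Decode, as kvfree_rcu_list() does: recover ptr, sanity-check the offset. */
	void *ptr = head->func;
	unsigned long offset = (unsigned long)((char *)head - (char *)ptr);

	if (demo_is_offset(offset))
		free(ptr);	/* stand-in for kvfree(ptr) */
	else
		printf("bogus offset %lu\n", offset);

	return 0;
}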

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 114 ++++++++++++++++++++++++++--------------------
1 file changed, 65 insertions(+), 49 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 4088b34ce9610..839e617f6c370 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3029,6 +3029,65 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)
return freed;
}

+static void
+kvfree_rcu_bulk(struct kfree_rcu_cpu *krcp,
+ struct kvfree_rcu_bulk_data *bnode, int idx)
+{
+ unsigned long flags;
+ int i;
+
+ debug_rcu_bhead_unqueue(bnode);
+
+ rcu_lock_acquire(&rcu_callback_map);
+ if (idx == 0) { // kmalloc() / kfree().
+ trace_rcu_invoke_kfree_bulk_callback(
+ rcu_state.name, bnode->nr_records,
+ bnode->records);
+
+ kfree_bulk(bnode->nr_records, bnode->records);
+ } else { // vmalloc() / vfree().
+ for (i = 0; i < bnode->nr_records; i++) {
+ trace_rcu_invoke_kvfree_callback(
+ rcu_state.name, bnode->records[i], 0);
+
+ vfree(bnode->records[i]);
+ }
+ }
+ rcu_lock_release(&rcu_callback_map);
+
+ raw_spin_lock_irqsave(&krcp->lock, flags);
+ if (put_cached_bnode(krcp, bnode))
+ bnode = NULL;
+ raw_spin_unlock_irqrestore(&krcp->lock, flags);
+
+ if (bnode)
+ free_page((unsigned long) bnode);
+
+ cond_resched_tasks_rcu_qs();
+}
+
+static void
+kvfree_rcu_list(struct rcu_head *head)
+{
+ struct rcu_head *next;
+
+ for (; head; head = next) {
+ void *ptr = (void *) head->func;
+ unsigned long offset = (void *) head - ptr;
+
+ next = head->next;
+ debug_rcu_head_unqueue((struct rcu_head *)ptr);
+ rcu_lock_acquire(&rcu_callback_map);
+ trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
+
+ if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
+ kvfree(ptr);
+
+ rcu_lock_release(&rcu_callback_map);
+ cond_resched_tasks_rcu_qs();
+ }
+}
+
/*
* This function is invoked in workqueue context after a grace period.
* It frees all the objects queued on ->bulk_head_free or ->head_free.
@@ -3038,10 +3097,10 @@ static void kfree_rcu_work(struct work_struct *work)
unsigned long flags;
struct kvfree_rcu_bulk_data *bnode, *n;
struct list_head bulk_head[FREE_N_CHANNELS];
- struct rcu_head *head, *next;
+ struct rcu_head *head;
struct kfree_rcu_cpu *krcp;
struct kfree_rcu_cpu_work *krwp;
- int i, j;
+ int i;

krwp = container_of(to_rcu_work(work),
struct kfree_rcu_cpu_work, rcu_work);
@@ -3058,38 +3117,9 @@ static void kfree_rcu_work(struct work_struct *work)
raw_spin_unlock_irqrestore(&krcp->lock, flags);

// Handle the first two channels.
- for (i = 0; i < FREE_N_CHANNELS; i++) {
- list_for_each_entry_safe(bnode, n, &bulk_head[i], list) {
- debug_rcu_bhead_unqueue(bnode);
-
- rcu_lock_acquire(&rcu_callback_map);
- if (i == 0) { // kmalloc() / kfree().
- trace_rcu_invoke_kfree_bulk_callback(
- rcu_state.name, bnode->nr_records,
- bnode->records);
-
- kfree_bulk(bnode->nr_records, bnode->records);
- } else { // vmalloc() / vfree().
- for (j = 0; j < bnode->nr_records; j++) {
- trace_rcu_invoke_kvfree_callback(
- rcu_state.name, bnode->records[j], 0);
-
- vfree(bnode->records[j]);
- }
- }
- rcu_lock_release(&rcu_callback_map);
-
- raw_spin_lock_irqsave(&krcp->lock, flags);
- if (put_cached_bnode(krcp, bnode))
- bnode = NULL;
- raw_spin_unlock_irqrestore(&krcp->lock, flags);
-
- if (bnode)
- free_page((unsigned long) bnode);
-
- cond_resched_tasks_rcu_qs();
- }
- }
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ list_for_each_entry_safe(bnode, n, &bulk_head[i], list)
+ kvfree_rcu_bulk(krcp, bnode, i);

/*
* This is used when the "bulk" path can not be used for the
@@ -3098,21 +3128,7 @@ static void kfree_rcu_work(struct work_struct *work)
* queued on a linked list through their rcu_head structures.
* This list is named "Channel 3".
*/
- for (; head; head = next) {
- void *ptr = (void *) head->func;
- unsigned long offset = (void *) head - ptr;
-
- next = head->next;
- debug_rcu_head_unqueue((struct rcu_head *)ptr);
- rcu_lock_acquire(&rcu_callback_map);
- trace_rcu_invoke_kvfree_callback(rcu_state.name, head, offset);
-
- if (!WARN_ON_ONCE(!__is_kvfree_rcu_offset(offset)))
- kvfree(ptr);
-
- rcu_lock_release(&rcu_callback_map);
- cond_resched_tasks_rcu_qs();
- }
+ kvfree_rcu_list(head);
}

static bool
--
2.31.1.189.g2e36527f23

2023-01-05 01:24:45

by Paul E. McKenney

Subject: [PATCH rcu 2/8] rcu/kvfree: Switch to a generic linked list API

From: "Uladzislau Rezki (Sony)" <[email protected]>

This commit improves the readability and maintainability of the
kvfree_rcu() code by switching from an open-coded linked list to
the standard Linux-kernel circular doubly linked list. This patch
does not introduce any functional change.
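
[Editor's note: a brief kernel-context sketch of the <linux/list.h> primitives this
patch adopts. struct demo_node, demo_head, and demo_usage() are invented; list_add(),
list_first_entry_or_null(), list_replace_init(), list_for_each_entry_safe(),
list_del(), and LIST_HEAD() are the real generic list API.]

#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical node type; kvfree_rcu_bulk_data embeds list_head the same way. */
struct demo_node {
	struct list_head list;
	int value;
};

static LIST_HEAD(demo_head);

static void demo_usage(void)
{
	struct demo_node *node, *n;
	LIST_HEAD(detached);

	/* Newest entries go to the front, so the tail holds the oldest one. */
	node = kmalloc(sizeof(*node), GFP_KERNEL);
	if (node)
		list_add(&node->list, &demo_head);

	/* Peek at the newest entry without removing it. */
	node = list_first_entry_or_null(&demo_head, struct demo_node, list);

	/* Detach the whole list in O(1), as kfree_rcu_monitor() detaches bulk_head. */
	list_replace_init(&demo_head, &detached);

	/* Traverse safely while freeing, as kfree_rcu_work() does. */
	list_for_each_entry_safe(node, n, &detached, list) {
		list_del(&node->list);
		kfree(node);
	}
}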

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
---
kernel/rcu/tree.c | 89 +++++++++++++++++++++++------------------------
1 file changed, 43 insertions(+), 46 deletions(-)

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 7d222acd85bfd..4088b34ce9610 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2876,13 +2876,13 @@ EXPORT_SYMBOL_GPL(call_rcu);

/**
* struct kvfree_rcu_bulk_data - single block to store kvfree_rcu() pointers
+ * @list: List node. All blocks are linked between each other
* @nr_records: Number of active pointers in the array
- * @next: Next bulk object in the block chain
* @records: Array of the kvfree_rcu() pointers
*/
struct kvfree_rcu_bulk_data {
+ struct list_head list;
unsigned long nr_records;
- struct kvfree_rcu_bulk_data *next;
void *records[];
};

@@ -2898,21 +2898,21 @@ struct kvfree_rcu_bulk_data {
* struct kfree_rcu_cpu_work - single batch of kfree_rcu() requests
* @rcu_work: Let queue_rcu_work() invoke workqueue handler after grace period
* @head_free: List of kfree_rcu() objects waiting for a grace period
- * @bkvhead_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
+ * @bulk_head_free: Bulk-List of kvfree_rcu() objects waiting for a grace period
* @krcp: Pointer to @kfree_rcu_cpu structure
*/

struct kfree_rcu_cpu_work {
struct rcu_work rcu_work;
struct rcu_head *head_free;
- struct kvfree_rcu_bulk_data *bkvhead_free[FREE_N_CHANNELS];
+ struct list_head bulk_head_free[FREE_N_CHANNELS];
struct kfree_rcu_cpu *krcp;
};

/**
* struct kfree_rcu_cpu - batch up kfree_rcu() requests for RCU grace period
* @head: List of kfree_rcu() objects not yet waiting for a grace period
- * @bkvhead: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
+ * @bulk_head: Bulk-List of kvfree_rcu() objects not yet waiting for a grace period
* @krw_arr: Array of batches of kfree_rcu() objects waiting for a grace period
* @lock: Synchronize access to this structure
* @monitor_work: Promote @head to @head_free after KFREE_DRAIN_JIFFIES
@@ -2936,7 +2936,7 @@ struct kfree_rcu_cpu_work {
*/
struct kfree_rcu_cpu {
struct rcu_head *head;
- struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS];
+ struct list_head bulk_head[FREE_N_CHANNELS];
struct kfree_rcu_cpu_work krw_arr[KFREE_N_BATCHES];
raw_spinlock_t lock;
struct delayed_work monitor_work;
@@ -3031,12 +3031,13 @@ drain_page_cache(struct kfree_rcu_cpu *krcp)

/*
* This function is invoked in workqueue context after a grace period.
- * It frees all the objects queued on ->bkvhead_free or ->head_free.
+ * It frees all the objects queued on ->bulk_head_free or ->head_free.
*/
static void kfree_rcu_work(struct work_struct *work)
{
unsigned long flags;
- struct kvfree_rcu_bulk_data *bkvhead[FREE_N_CHANNELS], *bnext;
+ struct kvfree_rcu_bulk_data *bnode, *n;
+ struct list_head bulk_head[FREE_N_CHANNELS];
struct rcu_head *head, *next;
struct kfree_rcu_cpu *krcp;
struct kfree_rcu_cpu_work *krwp;
@@ -3048,10 +3049,8 @@ static void kfree_rcu_work(struct work_struct *work)

raw_spin_lock_irqsave(&krcp->lock, flags);
// Channels 1 and 2.
- for (i = 0; i < FREE_N_CHANNELS; i++) {
- bkvhead[i] = krwp->bkvhead_free[i];
- krwp->bkvhead_free[i] = NULL;
- }
+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ list_replace_init(&krwp->bulk_head_free[i], &bulk_head[i]);

// Channel 3.
head = krwp->head_free;
@@ -3060,36 +3059,33 @@ static void kfree_rcu_work(struct work_struct *work)

// Handle the first two channels.
for (i = 0; i < FREE_N_CHANNELS; i++) {
- for (; bkvhead[i]; bkvhead[i] = bnext) {
- bnext = bkvhead[i]->next;
- debug_rcu_bhead_unqueue(bkvhead[i]);
+ list_for_each_entry_safe(bnode, n, &bulk_head[i], list) {
+ debug_rcu_bhead_unqueue(bnode);

rcu_lock_acquire(&rcu_callback_map);
if (i == 0) { // kmalloc() / kfree().
trace_rcu_invoke_kfree_bulk_callback(
- rcu_state.name, bkvhead[i]->nr_records,
- bkvhead[i]->records);
+ rcu_state.name, bnode->nr_records,
+ bnode->records);

- kfree_bulk(bkvhead[i]->nr_records,
- bkvhead[i]->records);
+ kfree_bulk(bnode->nr_records, bnode->records);
} else { // vmalloc() / vfree().
- for (j = 0; j < bkvhead[i]->nr_records; j++) {
+ for (j = 0; j < bnode->nr_records; j++) {
trace_rcu_invoke_kvfree_callback(
- rcu_state.name,
- bkvhead[i]->records[j], 0);
+ rcu_state.name, bnode->records[j], 0);

- vfree(bkvhead[i]->records[j]);
+ vfree(bnode->records[j]);
}
}
rcu_lock_release(&rcu_callback_map);

raw_spin_lock_irqsave(&krcp->lock, flags);
- if (put_cached_bnode(krcp, bkvhead[i]))
- bkvhead[i] = NULL;
+ if (put_cached_bnode(krcp, bnode))
+ bnode = NULL;
raw_spin_unlock_irqrestore(&krcp->lock, flags);

- if (bkvhead[i])
- free_page((unsigned long) bkvhead[i]);
+ if (bnode)
+ free_page((unsigned long) bnode);

cond_resched_tasks_rcu_qs();
}
@@ -3125,7 +3121,7 @@ need_offload_krc(struct kfree_rcu_cpu *krcp)
int i;

for (i = 0; i < FREE_N_CHANNELS; i++)
- if (krcp->bkvhead[i])
+ if (!list_empty(&krcp->bulk_head[i]))
return true;

return !!krcp->head;
@@ -3162,21 +3158,20 @@ static void kfree_rcu_monitor(struct work_struct *work)
for (i = 0; i < KFREE_N_BATCHES; i++) {
struct kfree_rcu_cpu_work *krwp = &(krcp->krw_arr[i]);

- // Try to detach bkvhead or head and attach it over any
+ // Try to detach bulk_head or head and attach it over any
// available corresponding free channel. It can be that
// a previous RCU batch is in progress, it means that
// immediately to queue another one is not possible so
// in that case the monitor work is rearmed.
- if ((krcp->bkvhead[0] && !krwp->bkvhead_free[0]) ||
- (krcp->bkvhead[1] && !krwp->bkvhead_free[1]) ||
+ if ((!list_empty(&krcp->bulk_head[0]) && list_empty(&krwp->bulk_head_free[0])) ||
+ (!list_empty(&krcp->bulk_head[1]) && list_empty(&krwp->bulk_head_free[1])) ||
(krcp->head && !krwp->head_free)) {
+
// Channel 1 corresponds to the SLAB-pointer bulk path.
// Channel 2 corresponds to vmalloc-pointer bulk path.
for (j = 0; j < FREE_N_CHANNELS; j++) {
- if (!krwp->bkvhead_free[j]) {
- krwp->bkvhead_free[j] = krcp->bkvhead[j];
- krcp->bkvhead[j] = NULL;
- }
+ if (list_empty(&krwp->bulk_head_free[j]))
+ list_replace_init(&krcp->bulk_head[j], &krwp->bulk_head_free[j]);
}

// Channel 3 corresponds to both SLAB and vmalloc
@@ -3288,10 +3283,11 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
return false;

idx = !!is_vmalloc_addr(ptr);
+ bnode = list_first_entry_or_null(&(*krcp)->bulk_head[idx],
+ struct kvfree_rcu_bulk_data, list);

/* Check if a new block is required. */
- if (!(*krcp)->bkvhead[idx] ||
- (*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
+ if (!bnode || bnode->nr_records == KVFREE_BULK_MAX_ENTR) {
bnode = get_cached_bnode(*krcp);
if (!bnode && can_alloc) {
krc_this_cpu_unlock(*krcp, *flags);
@@ -3315,18 +3311,13 @@ add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
if (!bnode)
return false;

- /* Initialize the new block. */
+ // Initialize the new block and attach it.
bnode->nr_records = 0;
- bnode->next = (*krcp)->bkvhead[idx];
-
- /* Attach it to the head. */
- (*krcp)->bkvhead[idx] = bnode;
+ list_add(&bnode->list, &(*krcp)->bulk_head[idx]);
}

/* Finally insert. */
- (*krcp)->bkvhead[idx]->records
- [(*krcp)->bkvhead[idx]->nr_records++] = ptr;
-
+ bnode->records[bnode->nr_records++] = ptr;
return true;
}

@@ -4761,7 +4752,7 @@ struct workqueue_struct *rcu_gp_wq;
static void __init kfree_rcu_batch_init(void)
{
int cpu;
- int i;
+ int i, j;

/* Clamp it to [0:100] seconds interval. */
if (rcu_delay_page_cache_fill_msec < 0 ||
@@ -4781,8 +4772,14 @@ static void __init kfree_rcu_batch_init(void)
for (i = 0; i < KFREE_N_BATCHES; i++) {
INIT_RCU_WORK(&krcp->krw_arr[i].rcu_work, kfree_rcu_work);
krcp->krw_arr[i].krcp = krcp;
+
+ for (j = 0; j < FREE_N_CHANNELS; j++)
+ INIT_LIST_HEAD(&krcp->krw_arr[i].bulk_head_free[j]);
}

+ for (i = 0; i < FREE_N_CHANNELS; i++)
+ INIT_LIST_HEAD(&krcp->bulk_head[i]);
+
INIT_DELAYED_WORK(&krcp->monitor_work, kfree_rcu_monitor);
INIT_DELAYED_WORK(&krcp->page_cache_work, fill_page_cache_func);
krcp->initialized = true;
--
2.31.1.189.g2e36527f23

2023-02-03 01:43:16

by Paul E. McKenney

Subject: [PATCH rcu 9/8] Add kvfree_rcu_mightsleep() and kfree_rcu_mightsleep()

The kvfree_rcu() and kfree_rcu() APIs are hazardous in that if you forget
the second argument, it works, but might sleep. This sleeping can be a
correctness bug from atomic contexts, and even in non-atomic contexts
it might introduce unacceptable latencies. This commit therefore adds
kvfree_rcu_mightsleep() and kfree_rcu_mightsleep(), which will replace
the single-argument kvfree_rcu() and kfree_rcu(), respectively.

This commit enables a series of commits that switch from single-argument
kvfree_rcu() and kfree_rcu() to their _mightsleep() counterparts. Once
all of these commits land, the single-argument versions will be removed.
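
[Editor's note: a usage illustration. The struct and function here are hypothetical;
only the kfree_rcu() and kfree_rcu_mightsleep() macros come from the kernel and this
patch. The two-argument form needs an rcu_head embedded in the object and never
sleeps; the single-argument form needs no rcu_head but may block, so it must only be
used from sleepable context.]

#include <linux/rcupdate.h>
#include <linux/slab.h>

/* Hypothetical object types for illustration only. */
struct demo_with_head {
	int data;
	struct rcu_head rcu;	/* storage for the non-sleeping form */
};

struct demo_without_head {
	int data;		/* no rcu_head, so only the sleeping form works */
};

static void demo_free(struct demo_with_head *a, struct demo_without_head *b)
{
	/*
	 * Two-argument form: queues the object via its embedded rcu_head,
	 * never sleeps, safe from atomic context.
	 */
	kfree_rcu(a, rcu);

	/*
	 * Single-argument form: no rcu_head needed, but it may block (for
	 * example when memory for a pointer page cannot be allocated), so
	 * the caller must be able to sleep. After this series, spell it out:
	 */
	kfree_rcu_mightsleep(b);
}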

Signed-off-by: Uladzislau Rezki (Sony) <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f38d4469d7f30..84433600885a6 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -1004,6 +1004,9 @@ static inline notrace void rcu_read_unlock_sched_notrace(void)
#define kvfree_rcu(...) KVFREE_GET_MACRO(__VA_ARGS__, \
kvfree_rcu_arg_2, kvfree_rcu_arg_1)(__VA_ARGS__)

+#define kvfree_rcu_mightsleep(ptr) kvfree_rcu_arg_1(ptr)
+#define kfree_rcu_mightsleep(ptr) kvfree_rcu_mightsleep(ptr)
+
#define KVFREE_GET_MACRO(_1, _2, NAME, ...) NAME
#define kvfree_rcu_arg_2(ptr, rhf) \
do { \