2024-04-11 19:23:39

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next v3 0/4] net: dqs: optimize if stall threshold is not set

Here are four patches aimed at enhancing the Dynamic Queue Limit (DQL)
subsystem within the networking stack.

The first two commits involve code refactoring, while the third patch
introduces the actual change. The fourth patch just improves the cache
locality.

Typically, when DQL is enabled, stall information is always populated
through dql_queue_stall(). However, this information is only necessary
if a stall threshold is set, which is stored in struct dql->stall_thrs.

Although dql_queue_stall() is relatively inexpensive, it is not entirely
free due to memory barriers and similar overheads.

To optimize performance, refrain from calling dql_queue_stall() when no
stall threshold is set, thus avoiding the processing of unnecessary
information.

Changelog:

v3:
* Read stall_thrs earlier, at dql_completed(), to avoid any
performance penalty, as suggested by Jakub.
v2:
* Moved the stall_thrs to the very first cache line, as a new
patch. Suggested by Eric Dumazet.
v1:
* https://lore.kernel.org/all/[email protected]/


Breno Leitao (4):
net: dql: Avoid calling BUG() when WARN() is enough
net: dql: Separate queue function responsibilities
net: dql: Optimize stall information population
net: dqs: make struct dql more cache efficient

include/linux/dynamic_queue_limits.h | 50 +++++++++++++++++-----------
lib/dynamic_queue_limits.c | 13 +++++---
2 files changed, 39 insertions(+), 24 deletions(-)

--
2.43.0



2024-04-11 19:23:49

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next v3 1/4] net: dql: Avoid calling BUG() when WARN() is enough

If the dql_queued() function receives an invalid argument, WARN about it
and continue, instead of crashing the kernel.

This was raised by checkpatch, when I am refactoring this code (see
following patch/commit)

WARNING: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5693a4be0d9a..ff9c65841ae8 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -91,7 +91,8 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
{
unsigned long map, now, now_hi, i;

- BUG_ON(count > DQL_MAX_OBJECT);
+ if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
+ return;

dql->last_obj_cnt = count;

--
2.43.0


2024-04-11 19:24:44

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next v3 4/4] net: dqs: make struct dql more cache efficient

With the previous change, struct dqs->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.

The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, let's move this field (stall_thrs) to the
very first cache line, since there is a hole there.

This does not change the structure size, since it moves an short (2
bytes) to 4-bytes whole in the first cache line.

This is the new structure format now:

struct dql {
unsigned int num_queued;
unsigned int last_obj_cnt;
..
short unsigned int stall_thrs;
/* XXX 2 bytes hole, try to pack */
..
/* --- cacheline 1 boundary (64 bytes) --- */
..
/* Longest stall detected, reported to user */
short unsigned int stall_max;
/* XXX 2 bytes hole, try to pack */
};

Also, read the stall_thrs (now in the very first cache line) earlier,
together with dql->num_queued (also in the first cache line).

Suggested-by: Jakub Kicinski <[email protected]>
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 5 +++--
lib/dynamic_queue_limits.c | 13 +++++++++----
2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 869afb800ea1..281298e77a15 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -50,6 +50,9 @@ struct dql {
unsigned int adj_limit; /* limit + num_completed */
unsigned int last_obj_cnt; /* Count at last queuing */

+ /* Stall threshold (in jiffies), defined by user */
+ unsigned short stall_thrs;
+
unsigned long history_head; /* top 58 bits of jiffies */
/* stall entries, a bit per entry */
unsigned long history[DQL_HIST_LEN];
@@ -71,8 +74,6 @@ struct dql {
unsigned int min_limit; /* Minimum limit */
unsigned int slack_hold_time; /* Time to measure slack */

- /* Stall threshold (in jiffies), defined by user */
- unsigned short stall_thrs;
/* Longest stall detected, reported to user */
unsigned short stall_max;
unsigned long last_reap; /* Last reap (in jiffies) */
diff --git a/lib/dynamic_queue_limits.c b/lib/dynamic_queue_limits.c
index a1389db1c30a..e49deddd3de9 100644
--- a/lib/dynamic_queue_limits.c
+++ b/lib/dynamic_queue_limits.c
@@ -15,12 +15,10 @@
#define POSDIFF(A, B) ((int)((A) - (B)) > 0 ? (A) - (B) : 0)
#define AFTER_EQ(A, B) ((int)((A) - (B)) >= 0)

-static void dql_check_stall(struct dql *dql)
+static void dql_check_stall(struct dql *dql, unsigned short stall_thrs)
{
- unsigned short stall_thrs;
unsigned long now;

- stall_thrs = READ_ONCE(dql->stall_thrs);
if (!stall_thrs)
return;

@@ -86,9 +84,16 @@ void dql_completed(struct dql *dql, unsigned int count)
{
unsigned int inprogress, prev_inprogress, limit;
unsigned int ovlimit, completed, num_queued;
+ unsigned short stall_thrs;
bool all_prev_completed;

num_queued = READ_ONCE(dql->num_queued);
+ /* Read stall_thrs in advance since it belongs to the same (first)
+ * cache line as ->num_queued. This way, dql_check_stall() does not
+ * need to touch the first cache line again later, reducing the window
+ * of possible false sharing.
+ */
+ stall_thrs = READ_ONCE(dql->stall_thrs);

/* Can't complete more than what's in queue */
BUG_ON(count > num_queued - dql->num_completed);
@@ -178,7 +183,7 @@ void dql_completed(struct dql *dql, unsigned int count)
dql->num_completed = completed;
dql->prev_num_queued = num_queued;

- dql_check_stall(dql);
+ dql_check_stall(dql, stall_thrs);
}
EXPORT_SYMBOL(dql_completed);

--
2.43.0


2024-04-11 19:31:40

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next v3 3/4] net: dql: Optimize stall information population

When Dynamic Queue Limit (DQL) is set, it always populate stall
information through dql_queue_stall(). However, this information is
only necessary if a stall threshold is set, stored in struct
dql->stall_thrs.

dql_queue_stall() is cheap, but not free, since it does have memory
barriers and so forth.

Do not call dql_queue_stall() if there is no stall threshold set, and
save some CPU cycles.

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 9980df0b7247..869afb800ea1 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -137,7 +137,9 @@ static inline void dql_queued(struct dql *dql, unsigned int count)

dql->num_queued += count;

- dql_queue_stall(dql);
+ /* Only populate stall information if the threshold is set */
+ if (READ_ONCE(dql->stall_thrs))
+ dql_queue_stall(dql);
}

/* Returns how many objects can be queued, < 0 indicates over limit. */
--
2.43.0


2024-04-11 19:46:48

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next v3 2/4] net: dql: Separate queue function responsibilities

The dql_queued() function currently handles both queuing object counts
and populating bitmaps for reporting stalls.

This commit splits the bitmap population into a separate function,
allowing for conditional invocation in scenarios where the feature is
disabled.

This refactor maintains functionality while improving code
organization.

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 44 ++++++++++++++++------------
1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index ff9c65841ae8..9980df0b7247 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -83,28 +83,11 @@ struct dql {
#define DQL_MAX_OBJECT (UINT_MAX / 16)
#define DQL_MAX_LIMIT ((UINT_MAX / 2) - DQL_MAX_OBJECT)

-/*
- * Record number of objects queued. Assumes that caller has already checked
- * availability in the queue with dql_avail.
- */
-static inline void dql_queued(struct dql *dql, unsigned int count)
+/* Populate the bitmap to be processed later in dql_check_stall() */
+static inline void dql_queue_stall(struct dql *dql)
{
unsigned long map, now, now_hi, i;

- if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
- return;
-
- dql->last_obj_cnt = count;
-
- /* We want to force a write first, so that cpu do not attempt
- * to get cache line containing last_obj_cnt, num_queued, adj_limit
- * in Shared state, but directly does a Request For Ownership
- * It is only a hint, we use barrier() only.
- */
- barrier();
-
- dql->num_queued += count;
-
now = jiffies;
now_hi = now / BITS_PER_LONG;

@@ -134,6 +117,29 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
WRITE_ONCE(DQL_HIST_ENT(dql, now_hi), map | BIT_MASK(now));
}

+/*
+ * Record number of objects queued. Assumes that caller has already checked
+ * availability in the queue with dql_avail.
+ */
+static inline void dql_queued(struct dql *dql, unsigned int count)
+{
+ if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
+ return;
+
+ dql->last_obj_cnt = count;
+
+ /* We want to force a write first, so that cpu do not attempt
+ * to get cache line containing last_obj_cnt, num_queued, adj_limit
+ * in Shared state, but directly does a Request For Ownership
+ * It is only a hint, we use barrier() only.
+ */
+ barrier();
+
+ dql->num_queued += count;
+
+ dql_queue_stall(dql);
+}
+
/* Returns how many objects can be queued, < 0 indicates over limit. */
static inline int dql_avail(const struct dql *dql)
{
--
2.43.0


2024-04-15 18:40:40

by patchwork-bot+netdevbpf

[permalink] [raw]
Subject: Re: [PATCH net-next v3 0/4] net: dqs: optimize if stall threshold is not set

Hello:

This series was applied to netdev/net-next.git (main)
by Jakub Kicinski <[email protected]>:

On Thu, 11 Apr 2024 12:22:28 -0700 you wrote:
> Here are four patches aimed at enhancing the Dynamic Queue Limit (DQL)
> subsystem within the networking stack.
>
> The first two commits involve code refactoring, while the third patch
> introduces the actual change. The fourth patch just improves the cache
> locality.
>
> [...]

Here is the summary with links:
- [net-next,v3,1/4] net: dql: Avoid calling BUG() when WARN() is enough
https://git.kernel.org/netdev/net-next/c/4854b463c4b2
- [net-next,v3,2/4] net: dql: Separate queue function responsibilities
https://git.kernel.org/netdev/net-next/c/cbe481a1b741
- [net-next,v3,3/4] net: dql: Optimize stall information population
https://git.kernel.org/netdev/net-next/c/721f076b62cb
- [net-next,v3,4/4] net: dqs: make struct dql more cache efficient
https://git.kernel.org/netdev/net-next/c/4ba67ef3a1fb

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html