2024-04-04 15:00:29

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next 0/3] net: dqs: optimize if stall threshold is not set

Here are three patches aimed at enhancing the Dynamic Queue Limit (DQL)
subsystem within the networking stack.

The first two commits involve code refactoring, while the final patch
introduces the actual change.

Typically, when DQL is enabled, stall information is always populated
through dql_queue_stall(). However, this information is only necessary
if a stall threshold is set, which is stored in struct dql->stall_thrs.

Although dql_queue_stall() is relatively inexpensive, it is not entirely
free due to memory barriers and similar overheads.

To optimize performance, refrain from calling dql_queue_stall() when no
stall threshold is set, thus avoiding the processing of unnecessary
information.

Breno Leitao (3):
net: dql: Avoid calling BUG() when WARN() is enough
net: dql: Separate queue function responsibilities
net: dql: Optimize stall information population

include/linux/dynamic_queue_limits.h | 45 +++++++++++++++++-----------
1 file changed, 27 insertions(+), 18 deletions(-)

--
2.43.0



2024-04-04 15:00:48

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next 1/3] net: dql: Avoid calling BUG() when WARN() is enough

If the dql_queued() function receives an invalid argument, WARN about it
and continue, instead of crashing the kernel.

This was raised by checkpatch, when I am refactoring this code (see
following patch/commit)

WARNING: Do not crash the kernel unless it is absolutely unavoidable--use WARN_ON_ONCE() plus recovery code (if feasible) instead of BUG() or variants

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 5693a4be0d9a..ff9c65841ae8 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -91,7 +91,8 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
{
unsigned long map, now, now_hi, i;

- BUG_ON(count > DQL_MAX_OBJECT);
+ if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
+ return;

dql->last_obj_cnt = count;

--
2.43.0


2024-04-04 15:01:08

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next 2/3] net: dql: Separate queue function responsibilities

The dql_queued() function currently handles both queuing object counts
and populating bitmaps for reporting stalls.

This commit splits the bitmap population into a separate function,
allowing for conditional invocation in scenarios where the feature is
disabled.

This refactor maintains functionality while improving code
organization.

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 44 ++++++++++++++++------------
1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index ff9c65841ae8..9980df0b7247 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -83,28 +83,11 @@ struct dql {
#define DQL_MAX_OBJECT (UINT_MAX / 16)
#define DQL_MAX_LIMIT ((UINT_MAX / 2) - DQL_MAX_OBJECT)

-/*
- * Record number of objects queued. Assumes that caller has already checked
- * availability in the queue with dql_avail.
- */
-static inline void dql_queued(struct dql *dql, unsigned int count)
+/* Populate the bitmap to be processed later in dql_check_stall() */
+static inline void dql_queue_stall(struct dql *dql)
{
unsigned long map, now, now_hi, i;

- if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
- return;
-
- dql->last_obj_cnt = count;
-
- /* We want to force a write first, so that cpu do not attempt
- * to get cache line containing last_obj_cnt, num_queued, adj_limit
- * in Shared state, but directly does a Request For Ownership
- * It is only a hint, we use barrier() only.
- */
- barrier();
-
- dql->num_queued += count;
-
now = jiffies;
now_hi = now / BITS_PER_LONG;

@@ -134,6 +117,29 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
WRITE_ONCE(DQL_HIST_ENT(dql, now_hi), map | BIT_MASK(now));
}

+/*
+ * Record number of objects queued. Assumes that caller has already checked
+ * availability in the queue with dql_avail.
+ */
+static inline void dql_queued(struct dql *dql, unsigned int count)
+{
+ if (WARN_ON_ONCE(count > DQL_MAX_OBJECT))
+ return;
+
+ dql->last_obj_cnt = count;
+
+ /* We want to force a write first, so that cpu do not attempt
+ * to get cache line containing last_obj_cnt, num_queued, adj_limit
+ * in Shared state, but directly does a Request For Ownership
+ * It is only a hint, we use barrier() only.
+ */
+ barrier();
+
+ dql->num_queued += count;
+
+ dql_queue_stall(dql);
+}
+
/* Returns how many objects can be queued, < 0 indicates over limit. */
static inline int dql_avail(const struct dql *dql)
{
--
2.43.0


2024-04-04 15:01:45

by Breno Leitao

[permalink] [raw]
Subject: [PATCH net-next 3/3] net: dql: Optimize stall information population

When Dynamic Queue Limit (DQL) is set, it always populate stall
information through dql_queue_stall(). However, this information is
only necessary if a stall threshold is set, stored in struct
dql->stall_thrs.

dql_queue_stall() is cheap, but not free, since it does have memory
barriers and so forth.

Do not call dql_queue_stall() if there is no stall threshold set, and
save some CPU cycles.

Signed-off-by: Breno Leitao <[email protected]>
---
include/linux/dynamic_queue_limits.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
index 9980df0b7247..869afb800ea1 100644
--- a/include/linux/dynamic_queue_limits.h
+++ b/include/linux/dynamic_queue_limits.h
@@ -137,7 +137,9 @@ static inline void dql_queued(struct dql *dql, unsigned int count)

dql->num_queued += count;

- dql_queue_stall(dql);
+ /* Only populate stall information if the threshold is set */
+ if (READ_ONCE(dql->stall_thrs))
+ dql_queue_stall(dql);
}

/* Returns how many objects can be queued, < 0 indicates over limit. */
--
2.43.0


2024-04-04 16:36:29

by Eric Dumazet

[permalink] [raw]
Subject: Re: [PATCH net-next 3/3] net: dql: Optimize stall information population

On Thu, Apr 4, 2024 at 5:00 PM Breno Leitao <[email protected]> wrote:
>
> When Dynamic Queue Limit (DQL) is set, it always populate stall
> information through dql_queue_stall(). However, this information is
> only necessary if a stall threshold is set, stored in struct
> dql->stall_thrs.
>
> dql_queue_stall() is cheap, but not free, since it does have memory
> barriers and so forth.
>
> Do not call dql_queue_stall() if there is no stall threshold set, and
> save some CPU cycles.
>
> Signed-off-by: Breno Leitao <[email protected]>
> ---
> include/linux/dynamic_queue_limits.h | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> index 9980df0b7247..869afb800ea1 100644
> --- a/include/linux/dynamic_queue_limits.h
> +++ b/include/linux/dynamic_queue_limits.h
> @@ -137,7 +137,9 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
>
> dql->num_queued += count;
>
> - dql_queue_stall(dql);
> + /* Only populate stall information if the threshold is set */
> + if (READ_ONCE(dql->stall_thrs))
> + dql_queue_stall(dql);

Note that this field is not in the first cache line of 'struct dql',
we will have some false sharing.

Perhaps we could copy it in a hole of the first cache line (used by producers).

struct dql {
unsigned int num_queued; /* 0 0x4 */
unsigned int adj_limit; /* 0x4 0x4 */
unsigned int last_obj_cnt; /* 0x8 0x4 */

/* XXX 4 bytes hole, try to pack */

unsigned long history_head; /* 0x10 0x8 */
unsigned long history[4]; /* 0x18 0x20 */

/* XXX 8 bytes hole, try to pack */

/* --- cacheline 1 boundary (64 bytes) --- */
unsigned int limit __attribute__((__aligned__(64))); /*
0x40 0x4 */
unsigned int num_completed; /* 0x44 0x4 */
unsigned int prev_ovlimit; /* 0x48 0x4 */
unsigned int prev_num_queued; /* 0x4c 0x4 */
unsigned int prev_last_obj_cnt; /* 0x50 0x4 */
unsigned int lowest_slack; /* 0x54 0x4 */
unsigned long slack_start_time; /* 0x58 0x8 */
unsigned int max_limit; /* 0x60 0x4 */
unsigned int min_limit; /* 0x64 0x4 */
unsigned int slack_hold_time; /* 0x68 0x4 */
unsigned short stall_thrs; /* 0x6c 0x2 */
unsigned short stall_max; /* 0x6e 0x2 */
unsigned long last_reap; /* 0x70 0x8 */
unsigned long stall_cnt; /* 0x78 0x8 */

/* size: 128, cachelines: 2, members: 19 */
/* sum members: 116, holes: 2, sum holes: 12 */
/* forced alignments: 1, forced holes: 1, sum forced holes: 8 */
};

2024-04-08 16:45:27

by Breno Leitao

[permalink] [raw]
Subject: Re: [PATCH net-next 3/3] net: dql: Optimize stall information population

On Thu, Apr 04, 2024 at 06:36:00PM +0200, Eric Dumazet wrote:
> On Thu, Apr 4, 2024 at 5:00 PM Breno Leitao <[email protected]> wrote:
> >
> > When Dynamic Queue Limit (DQL) is set, it always populate stall
> > information through dql_queue_stall(). However, this information is
> > only necessary if a stall threshold is set, stored in struct
> > dql->stall_thrs.
> >
> > dql_queue_stall() is cheap, but not free, since it does have memory
> > barriers and so forth.
> >
> > Do not call dql_queue_stall() if there is no stall threshold set, and
> > save some CPU cycles.
> >
> > Signed-off-by: Breno Leitao <[email protected]>
> > ---
> > include/linux/dynamic_queue_limits.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/dynamic_queue_limits.h b/include/linux/dynamic_queue_limits.h
> > index 9980df0b7247..869afb800ea1 100644
> > --- a/include/linux/dynamic_queue_limits.h
> > +++ b/include/linux/dynamic_queue_limits.h
> > @@ -137,7 +137,9 @@ static inline void dql_queued(struct dql *dql, unsigned int count)
> >
> > dql->num_queued += count;
> >
> > - dql_queue_stall(dql);
> > + /* Only populate stall information if the threshold is set */
> > + if (READ_ONCE(dql->stall_thrs))
> > + dql_queue_stall(dql);
>
> Note that this field is not in the first cache line of 'struct dql',
> we will have some false sharing.
>
> Perhaps we could copy it in a hole of the first cache line (used by producers).

That is a good point. Let me move stall_thrs to the first hole.

Thanks