2023-03-11 16:36:30

by Jason Xing

[permalink] [raw]
Subject: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

From: Jason Xing <[email protected]>

When we encounter some performance issue and then get lost on how
to tune the budget limit and time limit in net_rx_action() function,
we can separately counting both of them to avoid the confusion.

Signed-off-by: Jason Xing <[email protected]>
---
note: this commit is based on the link as below:
https://lore.kernel.org/lkml/[email protected]/
---
include/linux/netdevice.h | 1 +
net/core/dev.c | 12 ++++++++----
net/core/net-procfs.c | 9 ++++++---
3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6a14b7b11766..5736311a2133 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3157,6 +3157,7 @@ struct softnet_data {
/* stats */
unsigned int processed;
unsigned int time_squeeze;
+ unsigned int budget_squeeze;
#ifdef CONFIG_RPS
struct softnet_data *rps_ipi_list;
#endif
diff --git a/net/core/dev.c b/net/core/dev.c
index 253584777101..bed7a68fdb5d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
unsigned long time_limit = jiffies +
usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
int budget = READ_ONCE(netdev_budget);
+ bool is_continue = true;
LIST_HEAD(list);
LIST_HEAD(repoll);

@@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
list_splice_init(&sd->poll_list, &list);
local_irq_enable();

- for (;;) {
+ for (; is_continue;) {
struct napi_struct *n;

skb_defer_free_flush(sd);
@@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
* Allow this to run for 2 jiffies since which will allow
* an average latency of 1.5/HZ.
*/
- if (unlikely(budget <= 0 ||
- time_after_eq(jiffies, time_limit))) {
+ if (unlikely(budget <= 0)) {
+ sd->budget_squeeze++;
+ is_continue = false;
+ }
+ if (unlikely(time_after_eq(jiffies, time_limit))) {
sd->time_squeeze++;
- break;
+ is_continue = false;
}
}

diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index 97a304e1957a..4d1a499d7c43 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
*/
seq_printf(seq,
"%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
- "%08x %08x\n",
- sd->processed, sd->dropped, sd->time_squeeze, 0,
+ "%08x %08x %08x %08x\n",
+ sd->processed, sd->dropped,
+ 0, /* was old way to count time squeeze */
+ 0,
0, 0, 0, 0, /* was fastroute */
0, /* was cpu_collision */
sd->received_rps, flow_limit_count,
0, /* was len of two backlog queues */
(int)seq->index,
- softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
+ softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
+ sd->time_squeeze, sd->budget_squeeze);
return 0;
}

--
2.37.3



2023-03-11 17:14:19

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Sun, 12 Mar 2023 00:36:14 +0800
Jason Xing <[email protected]> wrote:

> - for (;;) {
> + for (; is_continue;) {


Easier to read this as a
while (is_continue) {

but what is wrong with using break; instead?

2023-03-12 00:05:08

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Sun, Mar 12, 2023 at 1:14 AM Stephen Hemminger
<[email protected]> wrote:
>
> On Sun, 12 Mar 2023 00:36:14 +0800
> Jason Xing <[email protected]> wrote:
>
> > - for (;;) {
> > + for (; is_continue;) {
>
>
> Easier to read this as a
> while (is_continue) {
>
> but what is wrong with using break; instead?

If we hit the budget limit and 'break;' immediately, we may miss the
collection when we also hit the time limit. That's why I would like to
know if we hit both of them.

Thank,
Jason

2023-03-13 02:06:04

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <[email protected]> wrote:
>
> From: Jason Xing <[email protected]>
>
> When we encounter some performance issue and then get lost on how
> to tune the budget limit and time limit in net_rx_action() function,
> we can separately counting both of them to avoid the confusion.
>
> Signed-off-by: Jason Xing <[email protected]>
> ---
> note: this commit is based on the link as below:
> https://lore.kernel.org/lkml/[email protected]/
> ---
> include/linux/netdevice.h | 1 +
> net/core/dev.c | 12 ++++++++----
> net/core/net-procfs.c | 9 ++++++---
> 3 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6a14b7b11766..5736311a2133 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3157,6 +3157,7 @@ struct softnet_data {
> /* stats */
> unsigned int processed;
> unsigned int time_squeeze;
> + unsigned int budget_squeeze;
> #ifdef CONFIG_RPS
> struct softnet_data *rps_ipi_list;
> #endif
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 253584777101..bed7a68fdb5d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> unsigned long time_limit = jiffies +
> usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> int budget = READ_ONCE(netdev_budget);
> + bool is_continue = true;

I kept thinking during these days, I think it looks not that concise
and elegant and also the name is not that good though the function can
work.

In the next submission, I'm going to choose to use 'while()' instead
of 'for()' suggested by Stephen.

Does anyone else have some advice about this?

Thanks,
Jason

> LIST_HEAD(list);
> LIST_HEAD(repoll);
>
> @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> list_splice_init(&sd->poll_list, &list);
> local_irq_enable();
>
> - for (;;) {
> + for (; is_continue;) {
> struct napi_struct *n;
>
> skb_defer_free_flush(sd);
> @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> * Allow this to run for 2 jiffies since which will allow
> * an average latency of 1.5/HZ.
> */
> - if (unlikely(budget <= 0 ||
> - time_after_eq(jiffies, time_limit))) {
> + if (unlikely(budget <= 0)) {
> + sd->budget_squeeze++;
> + is_continue = false;
> + }
> + if (unlikely(time_after_eq(jiffies, time_limit))) {
> sd->time_squeeze++;
> - break;
> + is_continue = false;
> }
> }
>
> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> index 97a304e1957a..4d1a499d7c43 100644
> --- a/net/core/net-procfs.c
> +++ b/net/core/net-procfs.c
> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> */
> seq_printf(seq,
> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> - "%08x %08x\n",
> - sd->processed, sd->dropped, sd->time_squeeze, 0,
> + "%08x %08x %08x %08x\n",
> + sd->processed, sd->dropped,
> + 0, /* was old way to count time squeeze */
> + 0,
> 0, 0, 0, 0, /* was fastroute */
> 0, /* was cpu_collision */
> sd->received_rps, flow_limit_count,
> 0, /* was len of two backlog queues */
> (int)seq->index,
> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> + sd->time_squeeze, sd->budget_squeeze);
> return 0;
> }
>
> --
> 2.37.3
>

2023-03-13 20:07:33

by Simon Horman

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Mon, Mar 13, 2023 at 10:05:18AM +0800, Jason Xing wrote:
> On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <[email protected]> wrote:
> >
> > From: Jason Xing <[email protected]>
> >
> > When we encounter some performance issue and then get lost on how
> > to tune the budget limit and time limit in net_rx_action() function,
> > we can separately counting both of them to avoid the confusion.
> >
> > Signed-off-by: Jason Xing <[email protected]>
> > ---
> > note: this commit is based on the link as below:
> > https://lore.kernel.org/lkml/[email protected]/
> > ---
> > include/linux/netdevice.h | 1 +
> > net/core/dev.c | 12 ++++++++----
> > net/core/net-procfs.c | 9 ++++++---
> > 3 files changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 6a14b7b11766..5736311a2133 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3157,6 +3157,7 @@ struct softnet_data {
> > /* stats */
> > unsigned int processed;
> > unsigned int time_squeeze;
> > + unsigned int budget_squeeze;
> > #ifdef CONFIG_RPS
> > struct softnet_data *rps_ipi_list;
> > #endif
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 253584777101..bed7a68fdb5d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > unsigned long time_limit = jiffies +
> > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> > int budget = READ_ONCE(netdev_budget);
> > + bool is_continue = true;
>
> I kept thinking during these days, I think it looks not that concise
> and elegant and also the name is not that good though the function can
> work.
>
> In the next submission, I'm going to choose to use 'while()' instead
> of 'for()' suggested by Stephen.
>
> Does anyone else have some advice about this?

What about:

int done = false

while (!done) {
...
}

Or:

for (;;) {
int done = false;

...
if (done)
break;
}

>
> Thanks,
> Jason
>
> > LIST_HEAD(list);
> > LIST_HEAD(repoll);
> >
> > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > list_splice_init(&sd->poll_list, &list);
> > local_irq_enable();
> >
> > - for (;;) {
> > + for (; is_continue;) {
> > struct napi_struct *n;
> >
> > skb_defer_free_flush(sd);
> > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > * Allow this to run for 2 jiffies since which will allow
> > * an average latency of 1.5/HZ.
> > */
> > - if (unlikely(budget <= 0 ||
> > - time_after_eq(jiffies, time_limit))) {
> > + if (unlikely(budget <= 0)) {
> > + sd->budget_squeeze++;
> > + is_continue = false;
> > + }
> > + if (unlikely(time_after_eq(jiffies, time_limit))) {
> > sd->time_squeeze++;
> > - break;
> > + is_continue = false;
> > }
> > }
> >
> > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > index 97a304e1957a..4d1a499d7c43 100644
> > --- a/net/core/net-procfs.c
> > +++ b/net/core/net-procfs.c
> > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> > */
> > seq_printf(seq,
> > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > - "%08x %08x\n",
> > - sd->processed, sd->dropped, sd->time_squeeze, 0,
> > + "%08x %08x %08x %08x\n",
> > + sd->processed, sd->dropped,
> > + 0, /* was old way to count time squeeze */
> > + 0,
> > 0, 0, 0, 0, /* was fastroute */
> > 0, /* was cpu_collision */
> > sd->received_rps, flow_limit_count,
> > 0, /* was len of two backlog queues */
> > (int)seq->index,
> > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> > + sd->time_squeeze, sd->budget_squeeze);
> > return 0;
> > }
> >
> > --
> > 2.37.3
> >
>

2023-03-13 21:59:26

by Kui-Feng Lee

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior



On 3/11/23 08:36, Jason Xing wrote:
> From: Jason Xing <[email protected]>
>
> When we encounter some performance issue and then get lost on how
> to tune the budget limit and time limit in net_rx_action() function,
> we can separately counting both of them to avoid the confusion.
>
> Signed-off-by: Jason Xing <[email protected]>
> ---
> note: this commit is based on the link as below:
> https://lore.kernel.org/lkml/[email protected]/
> ---
> include/linux/netdevice.h | 1 +
> net/core/dev.c | 12 ++++++++----
> net/core/net-procfs.c | 9 ++++++---
> 3 files changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6a14b7b11766..5736311a2133 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3157,6 +3157,7 @@ struct softnet_data {
> /* stats */
> unsigned int processed;
> unsigned int time_squeeze;
> + unsigned int budget_squeeze;
> #ifdef CONFIG_RPS
> struct softnet_data *rps_ipi_list;
> #endif
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 253584777101..bed7a68fdb5d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> unsigned long time_limit = jiffies +
> usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> int budget = READ_ONCE(netdev_budget);
> + bool is_continue = true;
> LIST_HEAD(list);
> LIST_HEAD(repoll);
>
> @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> list_splice_init(&sd->poll_list, &list);
> local_irq_enable();
>
> - for (;;) {
> + for (; is_continue;) {
> struct napi_struct *n;
>
> skb_defer_free_flush(sd);
> @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> * Allow this to run for 2 jiffies since which will allow
> * an average latency of 1.5/HZ.
> */
> - if (unlikely(budget <= 0 ||
> - time_after_eq(jiffies, time_limit))) {
> + if (unlikely(budget <= 0)) {
> + sd->budget_squeeze++;
> + is_continue = false;
> + }
> + if (unlikely(time_after_eq(jiffies, time_limit))) {
> sd->time_squeeze++;
> - break;
> + is_continue = false;
> }
> }
>
> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> index 97a304e1957a..4d1a499d7c43 100644
> --- a/net/core/net-procfs.c
> +++ b/net/core/net-procfs.c
> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> */
> seq_printf(seq,
> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> - "%08x %08x\n",
> - sd->processed, sd->dropped, sd->time_squeeze, 0,
> + "%08x %08x %08x %08x\n",
> + sd->processed, sd->dropped,
> + 0, /* was old way to count time squeeze */

Should we show a proximate number? For example,
sd->time_squeeze + sd->bud_squeeze.


> + 0,
> 0, 0, 0, 0, /* was fastroute */
> 0, /* was cpu_collision */
> sd->received_rps, flow_limit_count,
> 0, /* was len of two backlog queues */
> (int)seq->index,
> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> + sd->time_squeeze, sd->budget_squeeze);
> return 0;
> }
>

2023-03-14 01:57:23

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Tue, Mar 14, 2023 at 4:07 AM Simon Horman <[email protected]> wrote:
>
> On Mon, Mar 13, 2023 at 10:05:18AM +0800, Jason Xing wrote:
> > On Sun, Mar 12, 2023 at 12:36 AM Jason Xing <[email protected]> wrote:
> > >
> > > From: Jason Xing <[email protected]>
> > >
> > > When we encounter some performance issue and then get lost on how
> > > to tune the budget limit and time limit in net_rx_action() function,
> > > we can separately counting both of them to avoid the confusion.
> > >
> > > Signed-off-by: Jason Xing <[email protected]>
> > > ---
> > > note: this commit is based on the link as below:
> > > https://lore.kernel.org/lkml/[email protected]/
> > > ---
> > > include/linux/netdevice.h | 1 +
> > > net/core/dev.c | 12 ++++++++----
> > > net/core/net-procfs.c | 9 ++++++---
> > > 3 files changed, 15 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 6a14b7b11766..5736311a2133 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -3157,6 +3157,7 @@ struct softnet_data {
> > > /* stats */
> > > unsigned int processed;
> > > unsigned int time_squeeze;
> > > + unsigned int budget_squeeze;
> > > #ifdef CONFIG_RPS
> > > struct softnet_data *rps_ipi_list;
> > > #endif
> > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > index 253584777101..bed7a68fdb5d 100644
> > > --- a/net/core/dev.c
> > > +++ b/net/core/dev.c
> > > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > > unsigned long time_limit = jiffies +
> > > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> > > int budget = READ_ONCE(netdev_budget);
> > > + bool is_continue = true;
> >
> > I kept thinking during these days, I think it looks not that concise
> > and elegant and also the name is not that good though the function can
> > work.
> >
> > In the next submission, I'm going to choose to use 'while()' instead
> > of 'for()' suggested by Stephen.
> >
> > Does anyone else have some advice about this?
>
> What about:
>
> int done = false
>
> while (!done) {
> ...
> }
>
> Or:
>
> for (;;) {
> int done = false;
>
> ...
> if (done)
> break;
> }
>

Great, that looks much better:)

Thanks,
Jason

> >
> > Thanks,
> > Jason
> >
> > > LIST_HEAD(list);
> > > LIST_HEAD(repoll);
> > >
> > > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > > list_splice_init(&sd->poll_list, &list);
> > > local_irq_enable();
> > >
> > > - for (;;) {
> > > + for (; is_continue;) {
> > > struct napi_struct *n;
> > >
> > > skb_defer_free_flush(sd);
> > > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > > * Allow this to run for 2 jiffies since which will allow
> > > * an average latency of 1.5/HZ.
> > > */
> > > - if (unlikely(budget <= 0 ||
> > > - time_after_eq(jiffies, time_limit))) {
> > > + if (unlikely(budget <= 0)) {
> > > + sd->budget_squeeze++;
> > > + is_continue = false;
> > > + }
> > > + if (unlikely(time_after_eq(jiffies, time_limit))) {
> > > sd->time_squeeze++;
> > > - break;
> > > + is_continue = false;
> > > }
> > > }
> > >
> > > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > > index 97a304e1957a..4d1a499d7c43 100644
> > > --- a/net/core/net-procfs.c
> > > +++ b/net/core/net-procfs.c
> > > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> > > */
> > > seq_printf(seq,
> > > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > > - "%08x %08x\n",
> > > - sd->processed, sd->dropped, sd->time_squeeze, 0,
> > > + "%08x %08x %08x %08x\n",
> > > + sd->processed, sd->dropped,
> > > + 0, /* was old way to count time squeeze */
> > > + 0,
> > > 0, 0, 0, 0, /* was fastroute */
> > > 0, /* was cpu_collision */
> > > sd->received_rps, flow_limit_count,
> > > 0, /* was len of two backlog queues */
> > > (int)seq->index,
> > > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> > > + sd->time_squeeze, sd->budget_squeeze);
> > > return 0;
> > > }
> > >
> > > --
> > > 2.37.3
> > >
> >

2023-03-14 01:58:50

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <[email protected]> wrote:
>
>
>
> On 3/11/23 08:36, Jason Xing wrote:
> > From: Jason Xing <[email protected]>
> >
> > When we encounter some performance issue and then get lost on how
> > to tune the budget limit and time limit in net_rx_action() function,
> > we can separately counting both of them to avoid the confusion.
> >
> > Signed-off-by: Jason Xing <[email protected]>
> > ---
> > note: this commit is based on the link as below:
> > https://lore.kernel.org/lkml/[email protected]/
> > ---
> > include/linux/netdevice.h | 1 +
> > net/core/dev.c | 12 ++++++++----
> > net/core/net-procfs.c | 9 ++++++---
> > 3 files changed, 15 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 6a14b7b11766..5736311a2133 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -3157,6 +3157,7 @@ struct softnet_data {
> > /* stats */
> > unsigned int processed;
> > unsigned int time_squeeze;
> > + unsigned int budget_squeeze;
> > #ifdef CONFIG_RPS
> > struct softnet_data *rps_ipi_list;
> > #endif
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 253584777101..bed7a68fdb5d 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -6637,6 +6637,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > unsigned long time_limit = jiffies +
> > usecs_to_jiffies(READ_ONCE(netdev_budget_usecs));
> > int budget = READ_ONCE(netdev_budget);
> > + bool is_continue = true;
> > LIST_HEAD(list);
> > LIST_HEAD(repoll);
> >
> > @@ -6644,7 +6645,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > list_splice_init(&sd->poll_list, &list);
> > local_irq_enable();
> >
> > - for (;;) {
> > + for (; is_continue;) {
> > struct napi_struct *n;
> >
> > skb_defer_free_flush(sd);
> > @@ -6662,10 +6663,13 @@ static __latent_entropy void net_rx_action(struct softirq_action *h)
> > * Allow this to run for 2 jiffies since which will allow
> > * an average latency of 1.5/HZ.
> > */
> > - if (unlikely(budget <= 0 ||
> > - time_after_eq(jiffies, time_limit))) {
> > + if (unlikely(budget <= 0)) {
> > + sd->budget_squeeze++;
> > + is_continue = false;
> > + }
> > + if (unlikely(time_after_eq(jiffies, time_limit))) {
> > sd->time_squeeze++;
> > - break;
> > + is_continue = false;
> > }
> > }
> >
> > diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> > index 97a304e1957a..4d1a499d7c43 100644
> > --- a/net/core/net-procfs.c
> > +++ b/net/core/net-procfs.c
> > @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> > */
> > seq_printf(seq,
> > "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> > - "%08x %08x\n",
> > - sd->processed, sd->dropped, sd->time_squeeze, 0,
> > + "%08x %08x %08x %08x\n",
> > + sd->processed, sd->dropped,
> > + 0, /* was old way to count time squeeze */
>
> Should we show a proximate number? For example,
> sd->time_squeeze + sd->bud_squeeze.

Yeah, It does make sense. Let the old way to display untouched.

>
>
> > + 0,
> > 0, 0, 0, 0, /* was fastroute */
> > 0, /* was cpu_collision */
> > sd->received_rps, flow_limit_count,
> > 0, /* was len of two backlog queues */
> > (int)seq->index,
> > - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> > + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> > + sd->time_squeeze, sd->budget_squeeze);
> > return 0;
> > }
> >

2023-03-14 08:41:29

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior


On 14/03/2023 02.57, Jason Xing wrote:
> On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <[email protected]> wrote:
>>
>> On 3/11/23 08:36, Jason Xing wrote:
>>> From: Jason Xing <[email protected]>
>>>
>>> When we encounter some performance issue and then get lost on how
>>> to tune the budget limit and time limit in net_rx_action() function,
>>> we can separately counting both of them to avoid the confusion.
>>>
>>> Signed-off-by: Jason Xing <[email protected]>
>>> ---
>>> note: this commit is based on the link as below:
>>> https://lore.kernel.org/lkml/[email protected]/
>>> ---
[...]
>>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
>>> index 97a304e1957a..4d1a499d7c43 100644
>>> --- a/net/core/net-procfs.c
>>> +++ b/net/core/net-procfs.c
>>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
>>> */
>>> seq_printf(seq,
>>> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
>>> - "%08x %08x\n",
>>> - sd->processed, sd->dropped, sd->time_squeeze, 0,
>>> + "%08x %08x %08x %08x\n",
>>> + sd->processed, sd->dropped,
>>> + 0, /* was old way to count time squeeze */
>>
>> Should we show a proximate number? For example,
>> sd->time_squeeze + sd->bud_squeeze.
>
> Yeah, It does make sense. Let the old way to display untouched.
>

Yes, I don't think we can/should remove this squeeze stat because
several tools e.g. my own[1] captures these stats (and I know Willem
also have his own tool).
I like the sd->time_squeeze + sd->budget_squeeze suggestion.

[1]
https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl


>>
>>
>>> + 0,
>>> 0, 0, 0, 0, /* was fastroute */
>>> 0, /* was cpu_collision */
>>> sd->received_rps, flow_limit_count,
>>> 0, /* was len of two backlog queues */
>>> (int)seq->index,
>>> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
>>> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
>>> + sd->time_squeeze, sd->budget_squeeze);
>>> return 0;
>>> }
>>>

We recently had a very long troubleshooting session around a latency
issue (Cc Simon) where we used the tool[1]. The issue was NIC hardware
RX queue was backlogged, but we didn't see any squeeze events, which
confused us. (This happens because budget was 300 and two NICs using 64
budget each doesn't exceed 300).

We were/are missing another counter to tell us net_rx_action() "repoll"
is happening (as code !list_empty(&repoll)). That were the case and it
would have "told" us that hardware RX ring was full (larger than 64).

We worked around this limitation by using the tracepoint for napi_poll,
and manually deduced that 64 bulking must mean that "repoll" were happening.

Oneliner bpftrace script:

bpftrace -e 'tracepoint:napi:napi_poll {
@napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }'

We used this script (that also measures softirq latency):


https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt


I do wonder is it would be valuable to *also* add a tracepoint to
net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and
repoll-not-empty.

--Jesper


2023-03-14 09:22:43

by Jason Xing

[permalink] [raw]
Subject: Re: [PATCH net-next] net: introduce budget_squeeze to help us tune rx behavior

On Tue, Mar 14, 2023 at 4:41 PM Jesper Dangaard Brouer
<[email protected]> wrote:
>
>
> On 14/03/2023 02.57, Jason Xing wrote:
> > On Tue, Mar 14, 2023 at 5:58 AM Kui-Feng Lee <[email protected]> wrote:
> >>
> >> On 3/11/23 08:36, Jason Xing wrote:
> >>> From: Jason Xing <[email protected]>
> >>>
> >>> When we encounter some performance issue and then get lost on how
> >>> to tune the budget limit and time limit in net_rx_action() function,
> >>> we can separately counting both of them to avoid the confusion.
> >>>
> >>> Signed-off-by: Jason Xing <[email protected]>
> >>> ---
> >>> note: this commit is based on the link as below:
> >>> https://lore.kernel.org/lkml/[email protected]/
> >>> ---
> [...]
> >>> diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
> >>> index 97a304e1957a..4d1a499d7c43 100644
> >>> --- a/net/core/net-procfs.c
> >>> +++ b/net/core/net-procfs.c
> >>> @@ -174,14 +174,17 @@ static int softnet_seq_show(struct seq_file *seq, void *v)
> >>> */
> >>> seq_printf(seq,
> >>> "%08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x %08x "
> >>> - "%08x %08x\n",
> >>> - sd->processed, sd->dropped, sd->time_squeeze, 0,
> >>> + "%08x %08x %08x %08x\n",
> >>> + sd->processed, sd->dropped,
> >>> + 0, /* was old way to count time squeeze */
> >>
> >> Should we show a proximate number? For example,
> >> sd->time_squeeze + sd->bud_squeeze.
> >
> > Yeah, It does make sense. Let the old way to display untouched.
> >
>
[...]
> Yes, I don't think we can/should remove this squeeze stat because
> several tools e.g. my own[1] captures these stats (and I know Willem
> also have his own tool).
> I like the sd->time_squeeze + sd->budget_squeeze suggestion.

So do I. Therefore I followed this suggestion in the next submission.

[1]
https://lore.kernel.org/lkml/[email protected]/

>
> [1]
> https://github.com/netoptimizer/network-testing/blob/master/bin/softnet_stat.pl
>
>
> >>
> >>
> >>> + 0,
> >>> 0, 0, 0, 0, /* was fastroute */
> >>> 0, /* was cpu_collision */
> >>> sd->received_rps, flow_limit_count,
> >>> 0, /* was len of two backlog queues */
> >>> (int)seq->index,
> >>> - softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd));
> >>> + softnet_input_pkt_queue_len(sd), softnet_process_queue_len(sd),
> >>> + sd->time_squeeze, sd->budget_squeeze);
> >>> return 0;
> >>> }
> >>>
>
[...]
> We recently had a very long troubleshooting session around a latency
> issue (Cc Simon) where we used the tool[1]. The issue was NIC hardware
> RX queue was backlogged, but we didn't see any squeeze events, which
> confused us. (This happens because budget was 300 and two NICs using 64
> budget each doesn't exceed 300).

I recently found some users running on our production environment hit
the time_squeeze very often which aroused my interests.
Env:
1) budget is 300;
2) eth0 is virtio_net which only registers 32 input interrupts (32
queue pairs) with a larger number of cpus online.

>
> We were/are missing another counter to tell us net_rx_action() "repoll"
> is happening (as code !list_empty(&repoll)). That were the case and it
> would have "told" us that hardware RX ring was full (larger than 64).
>
> We worked around this limitation by using the tracepoint for napi_poll,
> and manually deduced that 64 bulking must mean that "repoll" were happening.
>
> Oneliner bpftrace script:
>
> bpftrace -e 'tracepoint:napi:napi_poll {
> @napi_rx_bulk[str(args->dev_name)] = lhist(args->work, 0, 64, 4); }'
>
> We used this script (that also measures softirq latency):
>
>
> https://github.com/xdp-project/xdp-project/blob/master/areas/latency/napi_monitor.bt
>
>
[...]
> I do wonder is it would be valuable to *also* add a tracepoint to
> net_rx_action, that expose sd->time_squeeze, sd->budget_squeeze and
> repoll-not-empty.

I believe it's useful that we can show more details in softnet_data,
but I'm confused about how to display them.
This morning I submitted one patch[1] and chose to do such things when
reading the softnet_stat file.

Could we add more data in the softnet_stat file while also tracing
those three important points? I'm not sure.

Thanks,
Jason

>
> --Jesper
>