2024-04-30 14:19:23

by Praveen Kumar Kannoju

[permalink] [raw]
Subject: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time

Applications are sensitive to long network latency, particularly
heartbeat monitoring ones. Longer the tx timeout recovery higher the
risk with such applications on a production machines. This patch
remedies, yet honoring device set tx timeout.

Modify watchdog next timeout to be shorter than the device specified.
Compute the next timeout be equal to device watchdog timeout less the
how long ago queue stop had been done. At next watchdog timeout tx
timeout handler is called into if still in stopped state. Either called
or not called, restore the watchdog timeout back to device specified.

For example, mellanox driver with 15 sec watchdog timeout on its
interfaces will be called for handling timeout at random times as shown
below:

mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 40, SQ: 0x13106, CQ: 0x5b5, SQ Cons: 0x1 SQ Prod: 0x1, usecs since last trans: 28559000
mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 59, SQ: 0x132a4, CQ: 0x4f3, SQ Cons: 0x1 SQ Prod: 0x1, usecs since last trans: 21424000
mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 42, SQ: 0x12688, CQ: 0x5b5, SQ Cons: 0x1 SQ Prod: 0x1, usecs since last trans: 15919000

whereas with the proposed fix, timeout handler is always called at
appropriate time set into the watchdog by the driver as shown below:

mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 42, SQ: 0x122d6, CQ: 0x4af, SQ Cons: 0x4 SQ Prod: 0x4, usecs since last trans: 15137000
mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 33, SQ: 0x1410f, CQ: 0x48d, SQ Cons: 0xb SQ Prod: 0xb, usecs since last trans: 15568000
mlx5_core 0000:af:00.0 enp175s0f0: TX timeout on queue: 33, SQ: 0x1424a, CQ: 0x599, SQ Cons: 0x8 SQ Prod: 0x8, usecs since last trans: 15539000

Signed-off-by: Praveen Kumar Kannoju <[email protected]>
---
net/sched/sch_generic.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 4a2c763e2d11..64e31f8b4ac1 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t)
unsigned int timedout_ms = 0;
unsigned int i;
unsigned long trans_start;
+ unsigned long next_check = 0;
+ unsigned long current_jiffies;

for (i = 0; i < dev->num_tx_queues; i++) {
struct netdev_queue *txq;
+ current_jiffies = jiffies;

txq = netdev_get_tx_queue(dev, i);
trans_start = READ_ONCE(txq->trans_start);
- if (netif_xmit_stopped(txq) &&
- time_after(jiffies, (trans_start +
- dev->watchdog_timeo))) {
- timedout_ms = jiffies_to_msecs(jiffies - trans_start);
- atomic_long_inc(&txq->trans_timeout);
- break;
+ if (netif_xmit_stopped(txq)) {
+ if (time_after(current_jiffies, (trans_start +
+ dev->watchdog_timeo))) {
+ timedout_ms = jiffies_to_msecs(current_jiffies -
+ trans_start);
+ atomic_long_inc(&txq->trans_timeout);
+ break;
+ }
+ next_check = trans_start + dev->watchdog_timeo -
+ current_jiffies;
}
}

@@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t)
dev->netdev_ops->ndo_tx_timeout(dev, i);
netif_unfreeze_queues(dev);
}
+ if (!next_check)
+ next_check = dev->watchdog_timeo;
if (!mod_timer(&dev->watchdog_timer,
round_jiffies(jiffies +
- dev->watchdog_timeo)))
+ next_check)))
release = false;
}
}
--
2.31.1



2024-05-01 22:13:05

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time

On Tue, 30 Apr 2024 19:30:10 +0530 Praveen Kumar Kannoju wrote:
> Applications are sensitive to long network latency, particularly
> heartbeat monitoring ones. Longer the tx timeout recovery higher the
> risk with such applications on a production machines. This patch
> remedies, yet honoring device set tx timeout.
>
> Modify watchdog next timeout to be shorter than the device specified.
> Compute the next timeout be equal to device watchdog timeout less the
> how long ago queue stop had been done. At next watchdog timeout tx
> timeout handler is called into if still in stopped state. Either called
> or not called, restore the watchdog timeout back to device specified.

Idea makes sense, some comments on the code below.

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 4a2c763e2d11..64e31f8b4ac1 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t)
> unsigned int timedout_ms = 0;
> unsigned int i;
> unsigned long trans_start;
> + unsigned long next_check = 0;
> + unsigned long current_jiffies;
>
> for (i = 0; i < dev->num_tx_queues; i++) {
> struct netdev_queue *txq;
> + current_jiffies = jiffies;

Not sure why you save current jiffies.

> txq = netdev_get_tx_queue(dev, i);
> trans_start = READ_ONCE(txq->trans_start);
> - if (netif_xmit_stopped(txq) &&
> - time_after(jiffies, (trans_start +
> - dev->watchdog_timeo))) {
> - timedout_ms = jiffies_to_msecs(jiffies - trans_start);
> - atomic_long_inc(&txq->trans_timeout);
> - break;
> + if (netif_xmit_stopped(txq)) {

please use continue instead of adding another indentation level

> + if (time_after(current_jiffies, (trans_start +

wrap at 80 characters

> + dev->watchdog_timeo))) {
> + timedout_ms = jiffies_to_msecs(current_jiffies -
> + trans_start);
> + atomic_long_inc(&txq->trans_timeout);
> + break;
> + }
> + next_check = trans_start + dev->watchdog_timeo -
> + current_jiffies;

this will give us "next_check" for last queue. Let's instead find the
oldest trans_start in the loop. Do:

unsigned long oldest_start = jiffies;

then in the loop:

oldest_start = min(...)

> }
> }
>
> @@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t)
> dev->netdev_ops->ndo_tx_timeout(dev, i);
> netif_unfreeze_queues(dev);
> }
> + if (!next_check)
> + next_check = dev->watchdog_timeo;
> if (!mod_timer(&dev->watchdog_timer,
> round_jiffies(jiffies +
> - dev->watchdog_timeo)))
> + next_check)))

then here you just need to swap jiffies for oldest_start

> release = false;
> }
> }


2024-05-03 14:28:41

by Praveen Kumar Kannoju

[permalink] [raw]
Subject: RE: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time

Hi Jakub,
Thank you very much for the review and positive feedback.
We've incorporated your review comments except, using "continue" instead of adding another indentation level.
Request you to elaborate your comment on it.

-
Praveen.

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: 02 May 2024 03:43 AM
> To: Praveen Kannoju <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Rajesh Sivaramasubramaniom <[email protected]>; Rama Nichanamatlu
> <[email protected]>; Manjunath Patil <manjunath.b.patil@oraclecom>
> Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time
>
> On Tue, 30 Apr 2024 19:30:10 +0530 Praveen Kumar Kannoju wrote:
> > Applications are sensitive to long network latency, particularly
> > heartbeat monitoring ones. Longer the tx timeout recovery higher the
> > risk with such applications on a production machines. This patch
> > remedies, yet honoring device set tx timeout.
> >
> > Modify watchdog next timeout to be shorter than the device specified.
> > Compute the next timeout be equal to device watchdog timeout less the
> > how long ago queue stop had been done. At next watchdog timeout tx
> > timeout handler is called into if still in stopped state. Either
> > called or not called, restore the watchdog timeout back to device specified.
>
> Idea makes sense, some comments on the code below.
>
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index
> > 4a2c763e2d11..64e31f8b4ac1 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -506,18 +506,25 @@ static void dev_watchdog(struct timer_list *t)
> > unsigned int timedout_ms = 0;
> > unsigned int i;
> > unsigned long trans_start;
> > + unsigned long next_check = 0;
> > + unsigned long current_jiffies;
> >
> > for (i = 0; i < dev->num_tx_queues; i++) {
> > struct netdev_queue *txq;
> > + current_jiffies = jiffies;
>
> Not sure why you save current jiffies.
>
> > txq = netdev_get_tx_queue(dev, i);
> > trans_start = READ_ONCE(txq->trans_start);
> > - if (netif_xmit_stopped(txq) &&
> > - time_after(jiffies, (trans_start +
> > - dev->watchdog_timeo))) {
> > - timedout_ms = jiffies_to_msecs(jiffies - trans_start);
> > - atomic_long_inc(&txq->trans_timeout);
> > - break;
> > + if (netif_xmit_stopped(txq)) {
>
> please use continue instead of adding another indentation level

We need to take decision on whether to break out of loop or modify "oldest_start" only when
Queue is stopped. Hence one more level of indentation is needed. Can you please elaborate
on using "continue" in existing condition instead of adding a new indentation level.

>
> > + if (time_after(current_jiffies, (trans_start +
>
> wrap at 80 characters
>
> > + dev->watchdog_timeo))) {
> > + timedout_ms = jiffies_to_msecs(current_jiffies -
> > + trans_start);
> > + atomic_long_inc(&txq->trans_timeout);
> > + break;
> > + }
> > + next_check = trans_start + dev->watchdog_timeo -
> > + current_jiffies;
>
> this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do:
>
> unsigned long oldest_start = jiffies;
>
> then in the loop:
>
> oldest_start = min(...)
>
> > }
> > }
> >
> > @@ -530,9 +537,11 @@ static void dev_watchdog(struct timer_list *t)
> > dev->netdev_ops->ndo_tx_timeout(dev, i);
> > netif_unfreeze_queues(dev);
> > }
> > + if (!next_check)
> > + next_check = dev->watchdog_timeo;
> > if (!mod_timer(&dev->watchdog_timer,
> > round_jiffies(jiffies +
> > - dev->watchdog_timeo)))
> > + next_check)))
>
> then here you just need to swap jiffies for oldest_start
>
> > release = false;
> > }
> > }


2024-05-03 19:32:23

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time

On Fri, 3 May 2024 14:28:13 +0000 Praveen Kannoju wrote:
> > > txq = netdev_get_tx_queue(dev, i);
> > > trans_start = READ_ONCE(txq->trans_start);
> > > - if (netif_xmit_stopped(txq) &&
> > > - time_after(jiffies, (trans_start +
> > > - dev->watchdog_timeo))) {
> > > - timedout_ms = jiffies_to_msecs(jiffies - trans_start);
> > > - atomic_long_inc(&txq->trans_timeout);
> > > - break;
> > > + if (netif_xmit_stopped(txq)) {
> >
> > please use continue instead of adding another indentation level
>
> We need to take decision on whether to break out of loop or modify "oldest_start" only when
> Queue is stopped. Hence one more level of indentation is needed. Can you please elaborate
> on using "continue" in existing condition instead of adding a new indentation level.

If the queue is not stopped, continue. Split the condition into
multiple ifs.

> > > + dev->watchdog_timeo))) {
> > > + timedout_ms = jiffies_to_msecs(current_jiffies -
> > > + trans_start);
> > > + atomic_long_inc(&txq->trans_timeout);
> > > + break;
> > > + }
> > > + next_check = trans_start + dev->watchdog_timeo -
> > > + current_jiffies;
> >
> > this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do:
> >
> > unsigned long oldest_start = jiffies;
> >
> > then in the loop:
> >
> > oldest_start = min(...)

BTW, the min() I suggested here needs to be a if (time_after(...)),
we can't use bare min() to compare jiffies, because they may wrap.

2024-05-06 14:03:16

by Praveen Kumar Kannoju

[permalink] [raw]
Subject: RE: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time

Thank you, Jakub.
Your comments have been addressed in the v2 patch. We've tested it internally and the patch works as expected. Please review and let us know If any additional changes are needed.

-
Praveen.

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: 04 May 2024 01:00 AM
> To: Praveen Kannoju <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Rajesh Sivaramasubramaniom <[email protected]>; Rama Nichanamatlu
> <[email protected]>; Manjunath Patil <manjunath.b.patil@oraclecom>
> Subject: Re: [PATCH RFC] net/sched: adjust device watchdog timer to detect stopped queue at right time
>
> On Fri, 3 May 2024 14:28:13 +0000 Praveen Kannoju wrote:
> > > > txq = netdev_get_tx_queue(dev, i);
> > > > trans_start = READ_ONCE(txq->trans_start);
> > > > - if (netif_xmit_stopped(txq) &&
> > > > - time_after(jiffies, (trans_start +
> > > > - dev->watchdog_timeo))) {
> > > > - timedout_ms = jiffies_to_msecs(jiffies - trans_start);
> > > > - atomic_long_inc(&txq->trans_timeout);
> > > > - break;
> > > > + if (netif_xmit_stopped(txq)) {
> > >
> > > please use continue instead of adding another indentation level
> >
> > We need to take decision on whether to break out of loop or modify
> > "oldest_start" only when Queue is stopped. Hence one more level of
> > indentation is needed. Can you please elaborate on using "continue" in existing condition instead of adding a new indentation level.
>
> If the queue is not stopped, continue. Split the condition into multiple ifs.
>
> > > > + dev->watchdog_timeo))) {
> > > > + timedout_ms = jiffies_to_msecs(current_jiffies -
> > > > + trans_start);
> > > > + atomic_long_inc(&txq->trans_timeout);
> > > > + break;
> > > > + }
> > > > + next_check = trans_start + dev->watchdog_timeo -
> > > > + current_jiffies;
> > >
> > > this will give us "next_check" for last queue. Let's instead find the oldest trans_start in the loop. Do:
> > >
> > > unsigned long oldest_start = jiffies;
> > >
> > > then in the loop:
> > >
> > > oldest_start = min(...)
>
> BTW, the min() I suggested here needs to be a if (time_after(...)), we can't use bare min() to compare jiffies, because they may wrap.