by Lennart Sorensen

[permalink] [raw]

On 2017/07/21 11:36, Benjamin Poirier wrote:
> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
> adapter (when operating in msix mode) raises the Other interrupt to signal
> Receiver Overrun.
>
> This is a double problem because 1) at the moment e1000_msix_other()
> assumes that it is only called in case of Link Status Change and 2) if the
> condition persists, the interrupt is repeatedly raised again in quick
> succession.
>
> Ideally we would configure the Other interrupt to not be raised in case of
> receiver overrun but this doesn't seem possible on this adapter. Instead,
> we handle the first part of the problem by reverting to the practice of
> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
> anymore. We handle the second part of the problem by not re-enabling the
> Other interrupt right away when there is overrun. Instead, we wait until
> traffic subsides, napi polling mode is exited and interrupts are
> re-enabled.
>
> Reported-by: Lennart Sorensen <[email protected]>
> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
> Signed-off-by: Benjamin Poirier <[email protected]>

What's the status on these patches please? One month later they still
show up as "new" in patchwork.

2017-09-15 00:18:51

by Brown, Aaron F

[permalink] [raw]

Subject: RE: [Intel-wired-lan] [PATCH 1/5] e1000e: Fix error path in link detection

2017-09-15 00:20:21

by Brown, Aaron F

[permalink] [raw]

Subject: RE: [Intel-wired-lan] [PATCH 3/5] e1000e: Fix return value test

2017-09-15 00:27:50

by Brown, Aaron F

[permalink] [raw]

Subject: RE: [Intel-wired-lan] [PATCH 4/5] e1000e: Separate signaling for link check/link up

2017-09-15 00:38:26

by Brown, Aaron F

[permalink] [raw]

Subject: RE: [Intel-wired-lan] [PATCH 5/5] e1000e: Avoid receiver overrun interrupt bursts

> From: Intel-wired-lan [mailto:[email protected]] On Behalf
> Of Benjamin Poirier
> Sent: Friday, July 21, 2017 11:36 AM
> To: Kirsher, Jeffrey T <[email protected]>
> Cc: [email protected]; [email protected]; linux-
> [email protected]; Lennart Sorensen <[email protected]>
> Subject: [Intel-wired-lan] [PATCH 5/5] e1000e: Avoid receiver overrun
> interrupt bursts
>
> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
> adapter (when operating in msix mode) raises the Other interrupt to signal
> Receiver Overrun.
>
> This is a double problem because 1) at the moment e1000_msix_other()
> assumes that it is only called in case of Link Status Change and 2) if the
> condition persists, the interrupt is repeatedly raised again in quick
> succession.
>
> Ideally we would configure the Other interrupt to not be raised in case of
> receiver overrun but this doesn't seem possible on this adapter. Instead,
> we handle the first part of the problem by reverting to the practice of
> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
> anymore. We handle the second part of the problem by not re-enabling the
> Other interrupt right away when there is overrun. Instead, we wait until
> traffic subsides, napi polling mode is exited and interrupts are
> re-enabled.
>
> Reported-by: Lennart Sorensen <[email protected]>
> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
> Signed-off-by: Benjamin Poirier <[email protected]>
> ---
> drivers/net/ethernet/intel/e1000e/defines.h | 1 +
> drivers/net/ethernet/intel/e1000e/netdev.c | 33
> +++++++++++++++++++++++------
> 2 files changed, 27 insertions(+), 7 deletions(-)
>

I get an error and a few warnings out of checkpatch from this, but I think the error is false (thinking the reference to a commit in the description is this commit, a fixes commit or something like that) and I'm more concerned with the fix than the warnings, so...

Tested-by: Aaron Brown <[email protected]>

Here is the checkpatch output in case anyone has a different opinion on the severity:
-------------
u1484:[0]/usr/src/kernels/next-queue> git format-patch d81d1e6 -1 --stdout|./scripts/checkpatch.pl -
ERROR: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask")'
#20:
0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME

WARNING: braces {} are not necessary for single statement blocks
#73: FILE: drivers/net/ethernet/intel/e1000e/netdev.c:1931:
+ if (!test_bit(__E1000_DOWN, &adapter->state)) {
+ mod_timer(&adapter->watchdog_timer, jiffies + 1);
+ }

WARNING: braces {} are not necessary for single statement blocks
#83: FILE: drivers/net/ethernet/intel/e1000e/netdev.c:1936:
+ if (enable && !test_bit(__E1000_DOWN, &adapter->state)) {
ew32(IMS, E1000_IMS_OTHER);
}

total: 1 errors, 2 warnings, 0 checks, 59 lines checked

NOTE: For some of the reported defects, checkpatch may be able to
mechanically convert to the typical style using --fix or --fix-inplace.

Your patch has style problems, please review.

NOTE: If any of the errors are false positives, please report
them to the maintainer, see CHECKPATCH in MAINTAINERS.
u1484:[0]/usr/src/kernels/next-queue>

2017-09-19 00:13:45

by Brown, Aaron F

[permalink] [raw]

Subject: RE: [Intel-wired-lan] [PATCH 2/5] e1000e: Fix wrong comment related to link detection

2017-09-19 18:38:12

by Philip Prindeville

[permalink] [raw]

Subject: Re: [5/5] e1000e: Avoid receiver overrun interrupt bursts

Hi.

We’ve been running this patchset (all 5) for about as long as they’ve been under review… about 2 months. And in a burn-in lab with heavy traffic.

We’ve not seen a single link-flap in hundreds of ours of saturated traffic.

Would love to see some resolution soon on this as we don’t want to ship a release with unsanctioned patches.

Is there an estimate on when that might be?

Thanks,

-Philip

> On Jul 21, 2017, at 12:36 PM, Benjamin Poirier <[email protected]> wrote:
>
> When e1000e_poll() is not fast enough to keep up with incoming traffic, the
> adapter (when operating in msix mode) raises the Other interrupt to signal
> Receiver Overrun.
>
> This is a double problem because 1) at the moment e1000_msix_other()
> assumes that it is only called in case of Link Status Change and 2) if the
> condition persists, the interrupt is repeatedly raised again in quick
> succession.
>
> Ideally we would configure the Other interrupt to not be raised in case of
> receiver overrun but this doesn't seem possible on this adapter. Instead,
> we handle the first part of the problem by reverting to the practice of
> reading ICR in the other interrupt handler, like before commit 16ecba59bc33
> ("e1000e: Do not read ICR in Other interrupt"). Thanks to commit
> 0a8047ac68e5 ("e1000e: Fix msi-x interrupt automask") which cleared IAME
> from CTRL_EXT, reading ICR doesn't interfere with RxQ0, TxQ0 interrupts
> anymore. We handle the second part of the problem by not re-enabling the
> Other interrupt right away when there is overrun. Instead, we wait until
> traffic subsides, napi polling mode is exited and interrupts are
> re-enabled.
>
> Reported-by: Lennart Sorensen <[email protected]>
> Fixes: 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt")
> Signed-off-by: Benjamin Poirier <[email protected]>
> Tested-by: Aaron Brown <[email protected]>
> ---
> drivers/net/ethernet/intel/e1000e/defines.h | 1 +
> drivers/net/ethernet/intel/e1000e/netdev.c | 33 +++++++++++++++++++++++------
> 2 files changed, 27 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h b/drivers/net/ethernet/intel/e1000e/defines.h
> index 0641c0098738..afb7ebe20b24 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -398,6 +398,7 @@
> #define E1000_ICR_LSC 0x00000004 /* Link Status Change */
> #define E1000_ICR_RXSEQ 0x00000008 /* Rx sequence error */
> #define E1000_ICR_RXDMT0 0x00000010 /* Rx desc min. threshold (0) */
> +#define E1000_ICR_RXO 0x00000040 /* Receiver Overrun */
> #define E1000_ICR_RXT0 0x00000080 /* Rx timer intr (ring 0) */
> #define E1000_ICR_ECCER 0x00400000 /* Uncorrectable ECC Error */
> /* If this bit asserted, the driver should claim the interrupt */
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 5a8ab1136566..803edd1a6401 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1910,12 +1910,30 @@ static irqreturn_t e1000_msix_other(int __always_unused irq, void *data)
> struct net_device *netdev = data;
> struct e1000_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = &adapter->hw;
> + u32 icr;
> + bool enable = true;
> +
> + icr = er32(ICR);
> + if (icr & E1000_ICR_RXO) {
> + ew32(ICR, E1000_ICR_RXO);
> + enable = false;
> + /* napi poll will re-enable Other, make sure it runs */
> + if (napi_schedule_prep(&adapter->napi)) {
> + adapter->total_rx_bytes = 0;
> + adapter->total_rx_packets = 0;
> + __napi_schedule(&adapter->napi);
> + }
> + }
> + if (icr & E1000_ICR_LSC) {
> + ew32(ICR, E1000_ICR_LSC);
> + hw->mac.get_link_status = true;
> + /* guard against interrupt when we're going down */
> + if (!test_bit(__E1000_DOWN, &adapter->state)) {
> + mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + }
> + }
>
> - hw->mac.get_link_status = true;
> -
> - /* guard against interrupt when we're going down */
> - if (!test_bit(__E1000_DOWN, &adapter->state)) {
> - mod_timer(&adapter->watchdog_timer, jiffies + 1);
> + if (enable && !test_bit(__E1000_DOWN, &adapter->state)) {
> ew32(IMS, E1000_IMS_OTHER);
> }
>
> @@ -2687,7 +2705,8 @@ static int e1000e_poll(struct napi_struct *napi, int weight)
> napi_complete_done(napi, work_done);
> if (!test_bit(__E1000_DOWN, &adapter->state)) {
> if (adapter->msix_entries)
> - ew32(IMS, adapter->rx_ring->ims_val);
> + ew32(IMS, adapter->rx_ring->ims_val |
> + E1000_IMS_OTHER);
> else
> e1000_irq_enable(adapter);
> }
> @@ -4204,7 +4223,7 @@ static void e1000e_trigger_lsc(struct e1000_adapter *adapter)
> struct e1000_hw *hw = &adapter->hw;
>
> if (adapter->msix_entries)
> - ew32(ICS, E1000_ICS_OTHER);
> + ew32(ICS, E1000_ICS_LSC | E1000_ICS_OTHER);
> else
> ew32(ICS, E1000_ICS_LSC);
> }

2017-09-19 19:41:07

by Benjamin Poirier

[permalink] [raw]

Subject: Re: [5/5] e1000e: Avoid receiver overrun interrupt bursts

On 2017/09/19 12:38, Philip Prindeville wrote:
> Hi.
>
> We’ve been running this patchset (all 5) for about as long as they’ve been under review… about 2 months. And in a burn-in lab with heavy traffic.
>
> We’ve not seen a single link-flap in hundreds of ours of saturated traffic.
>
> Would love to see some resolution soon on this as we don’t want to ship a release with unsanctioned patches.
>
> Is there an estimate on when that might be?

The patches have been added to Jeff Kirsher's next-queue tree. I guess
they will be submitted for v4.15 which might be released in early
2018...
http://phb-crystal-ball.org/