LinuxLists.cc - [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

2008-01-15 05:25:26

Subject: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

After compiling v2.6.24-rc7-163-g1a1b285 (x86_64) yesterday I suddenly see this error
repeatedly:
kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
kernel: Tx Queue <0>
kernel: TDH <a>
kernel: TDT <a>
kernel: next_to_use <a>
kernel: next_to_clean <ff>
kernel: buffer_info[next_to_clean]
kernel: time_stamp <10002738a>
kernel: next_to_watch <ff>
kernel: jiffies <1000275b4>
kernel: next_to_watch.status <1>

My previous kernel was v2.6.24-rc7 and with that this error did not occur. I
have also never seen it with earlier kernels.

The values for "TX Queue" and "next_to_watch.status" are constant, the
others vary.

My NIC is:
01:00.0 Ethernet controller [0200]: Intel Corporation 82573E Gigabit Ethernet Controller (Copper) (rev 03)

01:00.0 0200: 8086:108c (rev 03)
Subsystem: 8086:3096
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 1273
Region 0: Memory at 90200000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at 90100000 (32-bit, non-prefetchable) [size=1M]
Region 2: I/O ports at 1000 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
Address: 00000000fee0300c Data: 41a9
Capabilities: [e0] Express Endpoint IRQ 0
Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
Device: Latency L0s <512ns, L1 <64us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x1, ASPM unknown, Port 0
Link: Latency L0s <128ns, L1 <64us
Link: ASPM Disabled RCB 64 bytes CommClk+ ExtSynch-
Link: Speed 2.5Gb/s, Width x1

The system is an Intel D945GCZ main board with
Intel(R) Pentium(R) D CPU 3.20GHz (dual core) processor.

Cheers,
FJP

2008-01-15 05:53:33

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Frans Pop <[email protected]>
Date: Tue, 15 Jan 2008 06:25:10 +0100

> kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang

Does this make the problem go away?

(Note this isn't the final correct patch we should apply. There
is no reason why this revert back to the older ->poll() logic
here should have any effect on the TX hang triggering...)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..cada32c 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_work = 0, work_done = 0;

/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3929,8 +3929,8 @@ e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring[0] is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter,
- &adapter->tx_ring[0]);
+ tx_work = e1000_clean_tx_irq(adapter,
+ &adapter->tx_ring[0]);
spin_unlock(&adapter->tx_queue_lock);
}

@@ -3938,7 +3938,7 @@ e1000_clean(struct napi_struct *napi, int budget)
&work_done, budget);

/* If budget not fully consumed, exit the polling mode */
- if (work_done < budget) {
+ if (!tx_work && (work_done < budget)) {
if (likely(adapter->itr_setting & 3))
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);

2008-01-15 06:17:46

by Frans Pop

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

Wow. That's fast! :-)

On Tuesday 15 January 2008, David Miller wrote:
> From: Frans Pop <[email protected]>
>
> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> Does this make the problem go away?

I'm compiling a kernel with the patch now. Will let you know the result.
May take a while as I don't know how to trigger the bug, so I'll just have
to let it run for some time.

2008-01-15 14:04:24

by Frans Pop

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

On Tuesday 15 January 2008, David Miller wrote:
> From: Frans Pop <[email protected]>
> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>
> Does this make the problem go away?

Yes, it very much looks like that solves it.
I ran with the patch for 6 hours or so without any errors. I then switched
back to an unpatched kernel and they reappeared immediately.

> (Note this isn't the final correct patch we should apply. There
> is no reason why this revert back to the older ->poll() logic
> here should have any effect on the TX hang triggering...)

s/no reason/no obvious reason/ ? ;-)

Cheers,
FJP

2008-01-15 16:16:32

by Badalian Vyacheslav

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

Quoting Frans Pop <[email protected]>:

> On Tuesday 15 January 2008, David Miller wrote:
>> From: Frans Pop <[email protected]>
>> > kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>
>> Does this make the problem go away?
>
> Yes, it very much looks like that solves it.
> I ran with the patch for 6 hours or so without any errors. I then switched
> back to an unpatched kernel and they reappeared immediately.
>
>> (Note this isn't the final correct patch we should apply. There
>> is no reason why this revert back to the older ->poll() logic
>> here should have any effect on the TX hang triggering...)
>
> s/no reason/no obvious reason/ ? ;-)
>
> Cheers,
> FJP
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

Hello.

I also try your patch (apply to 2.6.24-rc7-git2)

I catch this message in dmesg
[ 1771.796954] e1000: eth1: e1000_clean_tx_irq: Detected Tx Unit Hang
[ 1771.796957] Tx Queue <0>
[ 1771.796958] TDH <54>
[ 1771.796959] TDT <54>
[ 1771.796960] next_to_use <54>
[ 1771.796961] next_to_clean <a9>
[ 1771.796962] buffer_info[next_to_clean]
[ 1771.796963] time_stamp <14d72e>
[ 1771.796964] next_to_watch <a9>
[ 1771.796965] jiffies <14ddd3>
[ 1771.796966] next_to_watch.status <1>

Thanks.

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

2008-01-15 21:54:12

by Jesse Brandeburg

[permalink] [raw]

Subject: RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

[email protected] wrote:
> Quoting Frans Pop <[email protected]>:
>>> (Note this isn't the final correct patch we should apply. There is
>>> no reason why this revert back to the older ->poll() logic here
>>> should have any effect on the TX hang triggering...)
>>
>> s/no reason/no obvious reason/ ? ;-)

The tx code has an "early exit" that tries to limit the amount of tx
packets handled in a single poll loop and requires napi or interrupt
rescheduling based on the return value from e1000_clean_tx_irq.

see this code in e1000_clean_tx_irq

4005 #ifdef CONFIG_E1000_NAPI
4006 #define E1000_TX_WEIGHT 64
4007 > > /* weight of a sort for tx, to avoid endless
transmit cleanup */
4008 > > if (count++ == E1000_TX_WEIGHT) break;
4009 #endif

I think that is probably related. For a test you could apply the
original patch, and remove this "break" just by commenting out line
4008. This would guarantee all tx work is cleaned at every e1000_clean

Jesse

2008-01-16 05:02:29

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: "Brandeburg, Jesse" <[email protected]>
Date: Tue, 15 Jan 2008 13:53:43 -0800

> The tx code has an "early exit" that tries to limit the amount of tx
> packets handled in a single poll loop and requires napi or interrupt
> rescheduling based on the return value from e1000_clean_tx_irq.

That explains everything, thanks Jesse.

Ok, here is the patch I'll propose to fix this. The goal is to make
it as simple as possible without regressing the thing we were trying
to fix.

Something more sophisticated can be done later.

Three of the 5 Intel drivers had the TX breakout logic. e1000,
e1000e, and ixgbe. e100 and ixgb did not, so they don't have any
problems we need to fix here.

What the fix does is behave as if the budget was fully consumed if
*_clean_tx_irq() returns true.

The only valid way to return from ->poll() without copleting the NAPI
poll is by returning work_done == budget. That signals to the caller
that the NAPI instance has not been descheduled and therefore the
caller fully owns the NAPI context.

This does mean that for these drivers any time TX work is done, we'll
loop at least one extra time in the ->poll() loop of net_rx_work() but
that is historically what these drivers have caused to happen for
years.

For 2.6.25 or similar I would suggest investigating courses of action
to bring closure and consistency to this:

1) Determine whether the loop breakout is actually necessary.
Jesse explained to me that they had seen a case where a
thread on one cpu feeding the TX ring could keep a thread
on another cpu constantly running the *_clean_tx_irq() code
in a loop.

I find this hard to believe since even the slowest CPU should be
able to free up TX entries faster than they can be transmitted on
gigabit links :-)

2) If the investigation in #1 deems the breakout logic is necessary,
then consistently amongst all the 5 drivers a policy should be
implemented which is integrated with the NAPI budgetting logic.
For example, the simplest thing to do is to pass the budget and the
"work_done" thing down into *_clean_tx_irq() and break out if it is
exceeded.

As a further refinement we can say that TX work is about 1/4 the
expense of RX work and adjust the budget checking logic to match
that.

[NET]: Fix TX timeout regression in Intel drivers.

This fixes a regression added by changeset
53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
breakout consistent in Intel ethernet drivers.")

As pointed out by Jesse Brandeburg, for three of the drivers edited
above there is breakout logic in the *_clean_tx_irq() code to prevent
running TX reclaim forever. If this occurs, we have to elide NAPI
poll completion or else those TX events will never be serviced.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..0c9a6f7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;

/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring[0] is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter,
- &adapter->tx_ring[0]);
+ tx_cleaned = e1000_clean_tx_irq(adapter,
+ &adapter->tx_ring[0]);
spin_unlock(&adapter->tx_queue_lock);
}

adapter->clean_rx(adapter, &adapter->rx_ring[0],
&work_done, budget);

+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (likely(adapter->itr_setting & 3))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4a6fc74..2ab3bfb 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;

/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter);
+ tx_cleaned = e1000_clean_tx_irq(adapter);
spin_unlock(&adapter->tx_queue_lock);
}

adapter->clean_rx(adapter, &work_done, budget);

+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (adapter->itr_setting & 3)
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a564916..de3f45e 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
struct ixgbe_adapter *adapter = container_of(napi,
struct ixgbe_adapter, napi);
struct net_device *netdev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;

/* In non-MSIX case, there is no multi-Tx/Rx queue */
- ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
+ tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
budget);

+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
netif_rx_complete(netdev, napi);

2008-01-16 08:56:23

by Frans Pop

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

On Wednesday 16 January 2008, David Miller wrote:
> Ok, here is the patch I'll propose to fix this. The goal is to make
> it as simple as possible without regressing the thing we were trying
> to fix.

Looks good to me. Tested with -rc8.

Cheers,
FJP

2008-01-16 09:02:53

by Badalian Vyacheslav

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

applied to 2.6.24-rc7-git2
Have messages
Also have regression after apply patch.
System may do above 800mbs traffic before patch. After its "exit polling
mode?" (4 CPU, 1 cpu get 100% si (process ksoftirqd/0), 3 CPU is IDLE)
After patch system was go to "exit polling mode" at above 600mbs.

Thanks.

> From: Frans Pop <[email protected]>
> Date: Tue, 15 Jan 2008 06:25:10 +0100
>
>
>> kernel: e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
>>
>
> Does this make the problem go away?
>
> (Note this isn't the final correct patch we should apply. There
> is no reason why this revert back to the older ->poll() logic
> here should have any effect on the TX hang triggering...)
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 13d57b0..cada32c 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
> {
> struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
> struct net_device *poll_dev = adapter->netdev;
> - int work_done = 0;
> + int tx_work = 0, work_done = 0;
>
> /* Must NOT use netdev_priv macro here. */
> adapter = poll_dev->priv;
> @@ -3929,8 +3929,8 @@ e1000_clean(struct napi_struct *napi, int budget)
> * simultaneously. A failure obtaining the lock means
> * tx_ring[0] is currently being cleaned anyway. */
> if (spin_trylock(&adapter->tx_queue_lock)) {
> - e1000_clean_tx_irq(adapter,
> - &adapter->tx_ring[0]);
> + tx_work = e1000_clean_tx_irq(adapter,
> + &adapter->tx_ring[0]);
> spin_unlock(&adapter->tx_queue_lock);
> }
>
> @@ -3938,7 +3938,7 @@ e1000_clean(struct napi_struct *napi, int budget)
> &work_done, budget);
>
> /* If budget not fully consumed, exit the polling mode */
> - if (work_done < budget) {
> + if (!tx_work && (work_done < budget)) {
> if (likely(adapter->itr_setting & 3))
> e1000_set_itr(adapter);
> netif_rx_complete(poll_dev, napi);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

2008-01-16 10:30:21

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Frans Pop <[email protected]>
Date: Wed, 16 Jan 2008 09:56:08 +0100

> On Wednesday 16 January 2008, David Miller wrote:
> > Ok, here is the patch I'll propose to fix this. The goal is to make
> > it as simple as possible without regressing the thing we were trying
> > to fix.
>
> Looks good to me. Tested with -rc8.

Thanks for testing.

2008-01-16 12:25:31

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Badalian Vyacheslav <[email protected]>
Date: Wed, 16 Jan 2008 12:02:28 +0300

> applied to 2.6.24-rc7-git2
> Have messages
> Also have regression after apply patch.
> System may do above 800mbs traffic before patch. After its "exit polling
> mode?" (4 CPU, 1 cpu get 100% si (process ksoftirqd/0), 3 CPU is IDLE)
> After patch system was go to "exit polling mode" at above 600mbs.

What do you mean by 'system was go to "exit polling mode"'?

Please be more clear about your situation, in particular
provide every detail about what happens so that we can
properly debug this.

THanks.

2008-01-16 12:28:29

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Badalian Vyacheslav <[email protected]>
Date: Wed, 16 Jan 2008 12:02:28 +0300

> Also have regression after apply patch.

BTW, if you are using the e1000e driver then this initial
patch will not work.

My more recent patch posting for this problem, will.

I include it again below for you:

[NET]: Fix TX timeout regression in Intel drivers.

This fixes a regression added by changeset
53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
breakout consistent in Intel ethernet drivers.")

As pointed out by Jesse Brandeburg, for three of the drivers edited
above there is breakout logic in the *_clean_tx_irq() code to prevent
running TX reclaim forever. If this occurs, we have to elide NAPI
poll completion or else those TX events will never be serviced.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 13d57b0..0c9a6f7 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;

/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring[0] is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter,
- &adapter->tx_ring[0]);
+ tx_cleaned = e1000_clean_tx_irq(adapter,
+ &adapter->tx_ring[0]);
spin_unlock(&adapter->tx_queue_lock);
}

adapter->clean_rx(adapter, &adapter->rx_ring[0],
&work_done, budget);

+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (likely(adapter->itr_setting & 3))
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 4a6fc74..2ab3bfb 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
{
struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
struct net_device *poll_dev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;

/* Must NOT use netdev_priv macro here. */
adapter = poll_dev->priv;
@@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
* simultaneously. A failure obtaining the lock means
* tx_ring is currently being cleaned anyway. */
if (spin_trylock(&adapter->tx_queue_lock)) {
- e1000_clean_tx_irq(adapter);
+ tx_cleaned = e1000_clean_tx_irq(adapter);
spin_unlock(&adapter->tx_queue_lock);
}

adapter->clean_rx(adapter, &work_done, budget);

+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
if (adapter->itr_setting & 3)
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index a564916..de3f45e 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
struct ixgbe_adapter *adapter = container_of(napi,
struct ixgbe_adapter, napi);
struct net_device *netdev = adapter->netdev;
- int work_done = 0;
+ int tx_cleaned = 0, work_done = 0;

/* In non-MSIX case, there is no multi-Tx/Rx queue */
- ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
+ tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
budget);

+ if (tx_cleaned)
+ work_done = budget;
+
/* If budget not fully consumed, exit the polling mode */
if (work_done < budget) {
netif_rx_complete(netdev, napi);

2008-01-16 17:07:53

by Robert Olsson

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

David Miller writes:
> > On Wednesday 16 January 2008, David Miller wrote:
> > > Ok, here is the patch I'll propose to fix this. The goal is to make
> > > it as simple as possible without regressing the thing we were trying
> > > to fix.
> >
> > Looks good to me. Tested with -rc8.
>
> Thanks for testing.

Yes that code looks nice. I'm using the patch but I've noticed another
phenomena with the current e1000 driver. There is a race when taking a
device down at high traffic loads. I've tracked and instrumented and it
seems like occasionly irq_sem can get bump up so interrupts can't be
enabled again.

eth0 e1000_irq_enable sem = 1 <- High netload
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1
eth0 e1000_irq_enable sem = 1 <- ifconfig eth0 down
eth0 e1000_irq_disable sem = 2

**e1000_open <- ifconfig eth0 up
eth0 e1000_irq_disable sem = 3 Dead. irq's can't be enabled
e1000_irq_enable miss
eth0 e1000_irq_enable sem = 2
e1000_irq_enable miss
eth0 e1000_irq_enable sem = 1
ADDRCONF(NETDEV_UP): eth0: link is not ready

Cheers
--ro

static void
e1000_irq_disable(struct e1000_adapter *adapter)
{
atomic_inc(&adapter->irq_sem);
E1000_WRITE_REG(&adapter->hw, IMC, ~0);
E1000_WRITE_FLUSH(&adapter->hw);
synchronize_irq(adapter->pdev->irq);

if(adapter->netdev->ifindex == 3)
printk("%s e1000_irq_disable sem = %d\n", adapter->netdev->name,
atomic_read(&adapter->irq_sem));
}

static void
e1000_irq_enable(struct e1000_adapter *adapter)
{
if (likely(atomic_dec_and_test(&adapter->irq_sem))) {
E1000_WRITE_REG(&adapter->hw, IMS, IMS_ENABLE_MASK);
E1000_WRITE_FLUSH(&adapter->hw);
}
else
printk("e1000_irq_enable miss\n");

if(adapter->netdev->ifindex == 3)
printk("%s e1000_irq_enable sem = %d\n", adapter->netdev->name,
atomic_read(&adapter->irq_sem));
}

2008-01-17 07:10:04

by Jesse Brandeburg

[permalink] [raw]

Subject: RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

David Miller wrote:
> From: "Brandeburg, Jesse" <[email protected]>
> Date: Tue, 15 Jan 2008 13:53:43 -0800
>
>> The tx code has an "early exit" that tries to limit the amount of tx
>> packets handled in a single poll loop and requires napi or interrupt
>> rescheduling based on the return value from e1000_clean_tx_irq.
>
> That explains everything, thanks Jesse.
>
> Ok, here is the patch I'll propose to fix this. The goal is to make
> it as simple as possible without regressing the thing we were trying
> to fix.

We spent Wednesday trying to reproduce (without the patch) these issues
without much luck, and have applied the patch cleanly and will continue
testing it. Given the simplicity of the changes, and the community
testing, I'll give my ack and we will continue testing.

I think we should fix Robert's (unrelated, but in this thread) reported
issue before 2.6.24 final if we can, and I'll look at that tonight and
tomorrow.

Thanks for your work on this Dave,
Jesse

Acked-by: Jesse Brandeburg <[email protected]>

2008-01-17 07:20:55

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: "Brandeburg, Jesse" <[email protected]>
Date: Wed, 16 Jan 2008 23:09:47 -0800

> We spent Wednesday trying to reproduce (without the patch) these issues
> without much luck, and have applied the patch cleanly and will continue
> testing it. Given the simplicity of the changes, and the community
> testing, I'll give my ack and we will continue testing.

You need a slow CPU, and you need to make sure you do actually
trigger the TX limiting code there.

I bet your cpus are fast enough that it simply never triggers.
:-)

> Acked-by: Jesse Brandeburg <[email protected]>

Thanks for reviewing Jesse.

2008-01-17 07:52:18

by Frans Pop

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

On Thursday 17 January 2008, David Miller wrote:
> From: "Brandeburg, Jesse" <[email protected]>
>
> > We spent Wednesday trying to reproduce (without the patch) these issues
> > without much luck, and have applied the patch cleanly and will continue
> > testing it. Given the simplicity of the changes, and the community
> > testing, I'll give my ack and we will continue testing.
>
> You need a slow CPU, and you need to make sure you do actually
> trigger the TX limiting code there.

Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?

2008-01-17 08:00:22

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Frans Pop <[email protected]>
Date: Thu, 17 Jan 2008 08:51:55 +0100

> On Thursday 17 January 2008, David Miller wrote:
> > From: "Brandeburg, Jesse" <[email protected]>
> >
> > > We spent Wednesday trying to reproduce (without the patch) these issues
> > > without much luck, and have applied the patch cleanly and will continue
> > > testing it. Given the simplicity of the changes, and the community
> > > testing, I'll give my ack and we will continue testing.
> >
> > You need a slow CPU, and you need to make sure you do actually
> > trigger the TX limiting code there.
>
> Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?

No of course :-) I guess it therefore depends upon the load
as well.

2008-01-17 09:40:35

by Arnaldo Carvalho de Melo

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

Em Thu, Jan 17, 2008 at 12:00:02AM -0800, David Miller escreveu:
> From: Frans Pop <[email protected]>
> Date: Thu, 17 Jan 2008 08:51:55 +0100
>
> > On Thursday 17 January 2008, David Miller wrote:
> > > From: "Brandeburg, Jesse" <[email protected]>
> > >
> > > > We spent Wednesday trying to reproduce (without the patch) these issues
> > > > without much luck, and have applied the patch cleanly and will continue
> > > > testing it. Given the simplicity of the changes, and the community
> > > > testing, I'll give my ack and we will continue testing.
> > >
> > > You need a slow CPU, and you need to make sure you do actually
> > > trigger the TX limiting code there.
> >
> > Hmmm. Is a dual core Pentium D 3.20GHz considered slow these days?
>
> No of course :-) I guess it therefore depends upon the load
> as well.

I saw it just once, yesterday:

[root@doppio ~]# uname -r
2.6.24-rc5
e1000: eth0: e1000_clean_tx_irq: Detected Tx Unit Hang
Tx Queue <0>
TDH <58>
TDT <8f>
next_to_use <8f>
next_to_clean <55>
buffer_info[next_to_clean]
time_stamp <105e973a9>
next_to_watch <56>
jiffies <105e97992>
next_to_watch.status <1>
[root@doppio ~]#

on a lenovo T60W, core2duo machine (2GHz), when using it to stress test
another machine, I was using netperf TCP_STREAM ranging from 1 to 8
streams + a ping -f using various packet sizes.

I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again
to reproduce.

I also applied David's patch while trying some RT experiments on
another, 8 way machine used as a server, but on this machine I didn't
experience the Tx Unit Hang message with or without the patch.

- Arnaldo

2008-01-17 09:45:46

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Arnaldo Carvalho de Melo <[email protected]>
Date: Thu, 17 Jan 2008 07:40:07 -0200

> I'll update this machine today to 2.6.24-rc8-git + net-2.6 and try again
> to reproduce.

Thanks for the datapoints and testing.

2008-01-18 12:11:55

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Robert Olsson <[email protected]>
Date: Wed, 16 Jan 2008 18:07:38 +0100

>
> eth0 e1000_irq_enable sem = 1 <- High netload
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1
> eth0 e1000_irq_enable sem = 1 <- ifconfig eth0 down
> eth0 e1000_irq_disable sem = 2
>
> **e1000_open <- ifconfig eth0 up
> eth0 e1000_irq_disable sem = 3 Dead. irq's can't be enabled
> e1000_irq_enable miss
> eth0 e1000_irq_enable sem = 2
> e1000_irq_enable miss
> eth0 e1000_irq_enable sem = 1
> ADDRCONF(NETDEV_UP): eth0: link is not ready

Yes, this semaphore thing is highly problematic. In the most crucial
areas where network driver consistency matters the most for ease of
understanding and debugging, the Intel drivers choose to be different
:-(

The way the napi_disable() logic breaks out from high packet load in
net_rx_action() is it simply returns even leaving interrupts disabled
when a pending napi_disable() is pending.

This is what trips up the semaphore logic.

Robert, give this patch a try.

In the long term this semaphore should be completely eliminated,
there is no justification for it.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 0c9a6f7..76c0fa6 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)

#ifdef CONFIG_E1000_NAPI
napi_disable(&adapter->napi);
+ atomic_set(&adapter->irq_sem, 0);
#endif
e1000_irq_disable(adapter);

diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 2ab3bfb..9cc5a6b 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
msleep(10);

napi_disable(&adapter->napi);
+ atomic_set(&adapter->irq_sem, 0);
e1000_irq_disable(adapter);

del_timer_sync(&adapter->watchdog_timer);
diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
index d2fb88d..4f63839 100644
--- a/drivers/net/ixgb/ixgb_main.c
+++ b/drivers/net/ixgb/ixgb_main.c
@@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
{
struct net_device *netdev = adapter->netdev;

+#ifdef CONFIG_IXGB_NAPI
+ napi_disable(&adapter->napi);
+ atomic_set(&adapter->irq_sem, 0);
+#endif
+
ixgb_irq_disable(adapter);
free_irq(adapter->pdev->irq, netdev);

@@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)

if(kill_watchdog)
del_timer_sync(&adapter->watchdog_timer);
-#ifdef CONFIG_IXGB_NAPI
- napi_disable(&adapter->napi);
-#endif
+
adapter->link_speed = 0;
adapter->link_duplex = 0;
netif_carrier_off(netdev);
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index de3f45e..a4265bc 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
IXGBE_WRITE_FLUSH(&adapter->hw);
msleep(10);

+ napi_disable(&adapter->napi);
+ atomic_set(&adapter->irq_sem, 0);
+
ixgbe_irq_disable(adapter);

- napi_disable(&adapter->napi);
del_timer_sync(&adapter->watchdog_timer);

netif_carrier_off(netdev);

2008-01-18 13:01:20

by Robert Olsson

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

David Miller writes:

> > eth0 e1000_irq_enable sem = 1 <- ifconfig eth0 down
> > eth0 e1000_irq_disable sem = 2
> >
> > **e1000_open <- ifconfig eth0 up
> > eth0 e1000_irq_disable sem = 3 Dead. irq's can't be enabled
> > e1000_irq_enable miss
> > eth0 e1000_irq_enable sem = 2
> > e1000_irq_enable miss
> > eth0 e1000_irq_enable sem = 1
> > ADDRCONF(NETDEV_UP): eth0: link is not ready
>
> Yes, this semaphore thing is highly problematic. In the most crucial
> areas where network driver consistency matters the most for ease of
> understanding and debugging, the Intel drivers choose to be different

I don't understand the idea with semaphore for enabling/disabling
irq's either the overall logic must safer/better without it.

> The way the napi_disable() logic breaks out from high packet load in
> net_rx_action() is it simply returns even leaving interrupts disabled
> when a pending napi_disable() is pending.
>
> This is what trips up the semaphore logic.
>
> Robert, give this patch a try.
>
> In the long term this semaphore should be completely eliminated,
> there is no justification for it.

It's on the testing list...

Cheers
--ro

>
> Signed-off-by: David S. Miller <[email protected]>
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 0c9a6f7..76c0fa6 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
>
> #ifdef CONFIG_E1000_NAPI
> napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> #endif
> e1000_irq_disable(adapter);
>
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 2ab3bfb..9cc5a6b 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
> msleep(10);
>
> napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> e1000_irq_disable(adapter);
>
> del_timer_sync(&adapter->watchdog_timer);
> diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
> index d2fb88d..4f63839 100644
> --- a/drivers/net/ixgb/ixgb_main.c
> +++ b/drivers/net/ixgb/ixgb_main.c
> @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
> {
> struct net_device *netdev = adapter->netdev;
>
> +#ifdef CONFIG_IXGB_NAPI
> + napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> +#endif
> +
> ixgb_irq_disable(adapter);
> free_irq(adapter->pdev->irq, netdev);
>
> @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
>
> if(kill_watchdog)
> del_timer_sync(&adapter->watchdog_timer);
> -#ifdef CONFIG_IXGB_NAPI
> - napi_disable(&adapter->napi);
> -#endif
> +
> adapter->link_speed = 0;
> adapter->link_duplex = 0;
> netif_carrier_off(netdev);
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index de3f45e..a4265bc 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
> IXGBE_WRITE_FLUSH(&adapter->hw);
> msleep(10);
>
> + napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> +
> ixgbe_irq_disable(adapter);
>
> - napi_disable(&adapter->napi);
> del_timer_sync(&adapter->watchdog_timer);
>
> netif_carrier_off(netdev);
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-01-18 13:37:33

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Robert Olsson <[email protected]>
Date: Fri, 18 Jan 2008 14:00:57 +0100

> I don't understand the idea with semaphore for enabling/disabling
> irq's either the overall logic must safer/better without it.

They must have had code paths where they didn't know if IRQs were
enabled or not already, so they tried to create something which
approximates the:

local_irq_save(flags);
local_irq_restore(flags);

constructs we have for CPU interrupts, so they could go:

e1000_irq_disable();
/* ... */
e1000_irq_enable();

and this would work even if the caller was running
with e1000 interrupts disabled already.

Or, something like that... it is indeed confusing.

Anyways, yes it's totally bogus and should be removed.

2008-01-20 09:20:39

by Jesse Brandeburg

[permalink] [raw]

Subject: RE: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

David Miller wrote:
> From: Robert Olsson <[email protected]>
> Date: Fri, 18 Jan 2008 14:00:57 +0100
>
>> I don't understand the idea with semaphore for enabling/disabling
>> irq's either the overall logic must safer/better without it.
>
> They must have had code paths where they didn't know if IRQs were
> enabled or not already, so they tried to create something which
> approximates the:
>
> local_irq_save(flags);
> local_irq_restore(flags);
>
> constructs we have for CPU interrupts, so they could go:
>
> e1000_irq_disable();
> /* ... */
> e1000_irq_enable();
>
> and this would work even if the caller was running
> with e1000 interrupts disabled already.
>
> Or, something like that... it is indeed confusing.
>
> Anyways, yes it's totally bogus and should be removed.

I agree, bogus, and in fact I've already removed it from our development
version of ixgbe. Right now I wanted to report I can't remove e1000 at
all on 2.6.24-rc8+git

I continually get the
kernel: unregister_netdevice: waiting for eth2 to become free. Usage
count = 1

Where 2.6.24-rc5 e1000 is okay still. Seems like maybe we are still
missing a netif_rx_complete or a napi_disable somewhere.

I don't think this problem has anything to do with the irq_sem right
now. Something is still badly broken. I am just using the interface
regularly (no heavy load) and I can't unload the module.

Jesse

2008-01-20 10:02:27

by Andrey Rahmatullin

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

On Sun, Jan 20, 2008 at 01:20:11AM -0800, Brandeburg, Jesse wrote:
> I continually get the
> kernel: unregister_netdevice: waiting for eth2 to become free. Usage
> count = 1
http://bugzilla.kernel.org/show_bug.cgi?id=9778

--
WBR, wRAR (ALT Linux Team)

Attachments:

(No filename) (255.00 B)
signature.asc (197.00 B)
Digital signature Download all attachments

2008-01-21 06:55:06

by Badalian Vyacheslav

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

Hello. Its work, thanks for resend it!
Sorry, i understand that patch 53e52c729cc169db82a6105fac7a166e10c2ec36
("[NET]: Make ->poll() breakout consistent in Intel ethernet drivers.")
have regression and rollback it, i not see your patch.
Sorry again.

Thanks!
> From: Badalian Vyacheslav <[email protected]>
> Date: Wed, 16 Jan 2008 12:02:28 +0300
>
>
>> Also have regression after apply patch.
>>
>
> BTW, if you are using the e1000e driver then this initial
> patch will not work.
>
> My more recent patch posting for this problem, will.
>
> I include it again below for you:
>
> [NET]: Fix TX timeout regression in Intel drivers.
>
> This fixes a regression added by changeset
> 53e52c729cc169db82a6105fac7a166e10c2ec36 ("[NET]: Make ->poll()
> breakout consistent in Intel ethernet drivers.")
>
> As pointed out by Jesse Brandeburg, for three of the drivers edited
> above there is breakout logic in the *_clean_tx_irq() code to prevent
> running TX reclaim forever. If this occurs, we have to elide NAPI
> poll completion or else those TX events will never be serviced.
>
> Signed-off-by: David S. Miller <[email protected]>
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 13d57b0..0c9a6f7 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -3919,7 +3919,7 @@ e1000_clean(struct napi_struct *napi, int budget)
> {
> struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
> struct net_device *poll_dev = adapter->netdev;
> - int work_done = 0;
> + int tx_cleaned = 0, work_done = 0;
>
> /* Must NOT use netdev_priv macro here. */
> adapter = poll_dev->priv;
> @@ -3929,14 +3929,17 @@ e1000_clean(struct napi_struct *napi, int budget)
> * simultaneously. A failure obtaining the lock means
> * tx_ring[0] is currently being cleaned anyway. */
> if (spin_trylock(&adapter->tx_queue_lock)) {
> - e1000_clean_tx_irq(adapter,
> - &adapter->tx_ring[0]);
> + tx_cleaned = e1000_clean_tx_irq(adapter,
> + &adapter->tx_ring[0]);
> spin_unlock(&adapter->tx_queue_lock);
> }
>
> adapter->clean_rx(adapter, &adapter->rx_ring[0],
> &work_done, budget);
>
> + if (tx_cleaned)
> + work_done = budget;
> +
> /* If budget not fully consumed, exit the polling mode */
> if (work_done < budget) {
> if (likely(adapter->itr_setting & 3))
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 4a6fc74..2ab3bfb 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -1384,7 +1384,7 @@ static int e1000_clean(struct napi_struct *napi, int budget)
> {
> struct e1000_adapter *adapter = container_of(napi, struct e1000_adapter, napi);
> struct net_device *poll_dev = adapter->netdev;
> - int work_done = 0;
> + int tx_cleaned = 0, work_done = 0;
>
> /* Must NOT use netdev_priv macro here. */
> adapter = poll_dev->priv;
> @@ -1394,12 +1394,15 @@ static int e1000_clean(struct napi_struct *napi, int budget)
> * simultaneously. A failure obtaining the lock means
> * tx_ring is currently being cleaned anyway. */
> if (spin_trylock(&adapter->tx_queue_lock)) {
> - e1000_clean_tx_irq(adapter);
> + tx_cleaned = e1000_clean_tx_irq(adapter);
> spin_unlock(&adapter->tx_queue_lock);
> }
>
> adapter->clean_rx(adapter, &work_done, budget);
>
> + if (tx_cleaned)
> + work_done = budget;
> +
> /* If budget not fully consumed, exit the polling mode */
> if (work_done < budget) {
> if (adapter->itr_setting & 3)
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index a564916..de3f45e 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -1468,13 +1468,16 @@ static int ixgbe_clean(struct napi_struct *napi, int budget)
> struct ixgbe_adapter *adapter = container_of(napi,
> struct ixgbe_adapter, napi);
> struct net_device *netdev = adapter->netdev;
> - int work_done = 0;
> + int tx_cleaned = 0, work_done = 0;
>
> /* In non-MSIX case, there is no multi-Tx/Rx queue */
> - ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
> + tx_cleaned = ixgbe_clean_tx_irq(adapter, adapter->tx_ring);
> ixgbe_clean_rx_irq(adapter, &adapter->rx_ring[0], &work_done,
> budget);
>
> + if (tx_cleaned)
> + work_done = budget;
> +
> /* If budget not fully consumed, exit the polling mode */
> if (work_done < budget) {
> netif_rx_complete(netdev, napi);
>
>

2008-01-21 13:27:36

by Robert Olsson

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

David Miller writes:

> Yes, this semaphore thing is highly problematic. In the most crucial
> areas where network driver consistency matters the most for ease of
> understanding and debugging, the Intel drivers choose to be different
> :-(
>
> The way the napi_disable() logic breaks out from high packet load in
> net_rx_action() is it simply returns even leaving interrupts disabled
> when a pending napi_disable() is pending.
>
> This is what trips up the semaphore logic.
>
> Robert, give this patch a try.

Yes it works. e1000 tested for ~3 hours with high very high load and
interface up/down every 5:th sec. Without the patch the irq's gets
disabled within a couple of seconds

A resolute way of handling the semaphores. :)

Signed-off-by: Robert Olsson <[email protected]>

Cheers
--ro

> In the long term this semaphore should be completely eliminated,
> there is no justification for it.
>
> Signed-off-by: David S. Miller <[email protected]>
>
> diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
> index 0c9a6f7..76c0fa6 100644
> --- a/drivers/net/e1000/e1000_main.c
> +++ b/drivers/net/e1000/e1000_main.c
> @@ -632,6 +632,7 @@ e1000_down(struct e1000_adapter *adapter)
>
> #ifdef CONFIG_E1000_NAPI
> napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> #endif
> e1000_irq_disable(adapter);
>
> diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
> index 2ab3bfb..9cc5a6b 100644
> --- a/drivers/net/e1000e/netdev.c
> +++ b/drivers/net/e1000e/netdev.c
> @@ -2183,6 +2183,7 @@ void e1000e_down(struct e1000_adapter *adapter)
> msleep(10);
>
> napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> e1000_irq_disable(adapter);
>
> del_timer_sync(&adapter->watchdog_timer);
> diff --git a/drivers/net/ixgb/ixgb_main.c b/drivers/net/ixgb/ixgb_main.c
> index d2fb88d..4f63839 100644
> --- a/drivers/net/ixgb/ixgb_main.c
> +++ b/drivers/net/ixgb/ixgb_main.c
> @@ -296,6 +296,11 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
> {
> struct net_device *netdev = adapter->netdev;
>
> +#ifdef CONFIG_IXGB_NAPI
> + napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> +#endif
> +
> ixgb_irq_disable(adapter);
> free_irq(adapter->pdev->irq, netdev);
>
> @@ -304,9 +309,7 @@ ixgb_down(struct ixgb_adapter *adapter, boolean_t kill_watchdog)
>
> if(kill_watchdog)
> del_timer_sync(&adapter->watchdog_timer);
> -#ifdef CONFIG_IXGB_NAPI
> - napi_disable(&adapter->napi);
> -#endif
> +
> adapter->link_speed = 0;
> adapter->link_duplex = 0;
> netif_carrier_off(netdev);
> diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
> index de3f45e..a4265bc 100644
> --- a/drivers/net/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ixgbe/ixgbe_main.c
> @@ -1409,9 +1409,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
> IXGBE_WRITE_FLUSH(&adapter->hw);
> msleep(10);
>
> + napi_disable(&adapter->napi);
> + atomic_set(&adapter->irq_sem, 0);
> +
> ixgbe_irq_disable(adapter);
>
> - napi_disable(&adapter->napi);
> del_timer_sync(&adapter->watchdog_timer);
>
> netif_carrier_off(netdev);

2008-01-21 13:29:48

by David Miller

[permalink] [raw]

Subject: Re: [REGRESSION] 2.6.24-rc7: e1000: Detected Tx Unit Hang

From: Robert Olsson <[email protected]>
Date: Mon, 21 Jan 2008 14:27:13 +0100

> Yes it works. e1000 tested for ~3 hours with high very high load and
> interface up/down every 5:th sec. Without the patch the irq's gets
> disabled within a couple of seconds
>
> A resolute way of handling the semaphores. :)
>
> Signed-off-by: Robert Olsson <[email protected]>

Thanks for testing Robert.

I sent off that fix to Linus an hour or so ago, hopefully
he will pick it up some time today.