2008-12-08 11:24:21

by Zdenek Kabelac

Subject: BUG: bad unlock balance detected! e1000e

Hi

During an occasional scan of the message log, I found this BUG, which
happened on Dec 3 with the -rc7 from that day.
(So if it's now fixed in current git feel free to ignore :))

My machine is a T61 (C2D, 2GB, 64-bit kernel). The message appeared during
shutdown and I didn't actually notice it at the time...


NetworkManager: <WARN> nm_signal_handler(): Caught signal 15,
shutting down normally.
NetworkManager: <info> (eth0): now unmanaged
NetworkManager: <info> (eth0): device state change: 3 -> 1
NetworkManager: <info> (eth0): cleaning up...
NetworkManager: <info> (eth0): taking down device.

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------
NetworkManager/2058 is trying to release lock (nvm_mutex) at:
[<ffffffff8052acb9>] mutex_unlock+0x9/0x10
but there are no more locks to release!

other info that might help us debug this:
1 lock held by NetworkManager/2058:
#0: (rtnl_mutex){--..}, at: [<ffffffff804b09ba>] rtnetlink_rcv+0x1a/0x40

stack backtrace:
Pid: 2058, comm: NetworkManager Not tainted 2.6.28-rc7 #90
Call Trace:
[<ffffffff8026bf6b>] print_unlock_inbalance_bug+0xfb/0x120
[<ffffffff8026db76>] ? mark_held_locks+0x56/0xa0
[<ffffffff802500ba>] ? try_to_del_timer_sync+0x5a/0x70
[<ffffffff8052ce23>] ? _spin_unlock_irqrestore+0x43/0x70
[<ffffffff8026f468>] lock_release_non_nested+0x1a8/0x2a0
[<ffffffff802500ba>] ? try_to_del_timer_sync+0x5a/0x70
[<ffffffff8052acb9>] ? mutex_unlock+0x9/0x10
[<ffffffff8026f61b>] lock_release+0xbb/0x200
[<ffffffff8052abc6>] __mutex_unlock_slowpath+0x86/0x170
[<ffffffff8052acb9>] mutex_unlock+0x9/0x10
[<ffffffffa019cf45>] e1000_release_swflag_ich8lan+0x35/0x40 [e1000e]
[<ffffffffa019de19>] e1000_reset_hw_ich8lan+0xa9/0x130 [e1000e]
[<ffffffffa01ab140>] e1000e_reset+0xf0/0x260 [e1000e]
[<ffffffff802500ea>] ? del_timer_sync+0x1a/0x30
[<ffffffffa01ab4d8>] e1000e_down+0x178/0x180 [e1000e]
[<ffffffffa01ad0b6>] e1000_close+0x26/0xd0 [e1000e]
[<ffffffff804a59b6>] dev_close+0x76/0xc0
[<ffffffff804a7bd6>] dev_change_flags+0x96/0x1e0
[<ffffffff804b1b4c>] do_setlink+0x2ac/0x440
[<ffffffff8052cd46>] ? _read_unlock+0x26/0x30
[<ffffffff804b1ded>] rtnl_setlink+0x10d/0x150
[<ffffffff8052aeb2>] ? mutex_lock_nested+0x1f2/0x300
[<ffffffff804b09ba>] ? rtnetlink_rcv+0x1a/0x40
[<ffffffff804b0b6d>] rtnetlink_rcv_msg+0x18d/0x240
[<ffffffff804b09e0>] ? rtnetlink_rcv_msg+0x0/0x240
[<ffffffff804bba29>] netlink_rcv_skb+0x89/0xb0
[<ffffffff804b09c9>] rtnetlink_rcv+0x29/0x40
[<ffffffff804bb74d>] netlink_unicast+0x2bd/0x2d0
[<ffffffff8049e3ee>] ? __alloc_skb+0x6e/0x150
[<ffffffff804bc6c4>] netlink_sendmsg+0x204/0x2f0
[<ffffffff80499ed8>] ? sock_def_readable+0x68/0x70
[<ffffffff80498077>] sock_sendmsg+0x107/0x130
[<ffffffff8025c690>] ? autoremove_wake_function+0x0/0x40
[<ffffffff80213853>] ? native_sched_clock+0x13/0x60
[<ffffffff802d60e6>] ? fget_light+0x106/0x110
[<ffffffff80497987>] ? move_addr_to_kernel+0x57/0x60
[<ffffffff804a087f>] ? verify_iovec+0x3f/0xe0
[<ffffffff80498229>] sys_sendmsg+0x189/0x320
[<ffffffff8049851d>] ? sys_sendto+0xfd/0x120
[<ffffffff802e6650>] ? d_free+0x50/0x60
[<ffffffff802d6691>] ? __fput+0x171/0x1e0
[<ffffffff8026dda3>] ? trace_hardirqs_on_caller+0x133/0x190
[<ffffffff8028e02f>] ? audit_syscall_entry+0x15f/0x190
[<ffffffff8052c906>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff8020c57b>] system_call_fastpath+0x16/0x1b


2008-12-09 11:04:12

by Frederik Deweerdt

Subject: Re: BUG: bad unlock balance detected! e1000e

Hello Zdenek,

This could be due to 717d438d1fde94decef874b9808379d1f4523453
"e1000e: debug contention on NVM SWFLAG"
Error handling is missing from e1000_reset_hw_ich8lan(), so if the card
times out we may end up not holding the nvm_mutex when we release it.
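To make the imbalance concrete, here is a minimal user-space sketch, assuming the behaviour visible later in this thread: the acquire helper drops nvm_mutex itself when the SWFLAG times out, and the reset path calls the release helper unconditionally. All toy_* names, nvm_depth and bad_unlock_detected are hypothetical stand-ins, not e1000e code; a hold-depth counter replaces the real mutex so the extra release can be detected the way lockdep does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for nvm_mutex: a hold-depth counter instead of a
 * real mutex, so an unbalanced release can be detected (as lockdep does). */
static int nvm_depth;
static bool bad_unlock_detected;

static void toy_lock(void)
{
    nvm_depth++;
}

static void toy_unlock(void)
{
    if (nvm_depth == 0) {
        /* lockdep would report "bad unlock balance detected!" here */
        bad_unlock_detected = true;
        return;
    }
    nvm_depth--;
}

/* Models the acquire helper: it takes the lock, but on a SWFLAG timeout it
 * drops the lock again before returning an error. */
static int toy_acquire_swflag(bool hw_times_out)
{
    toy_lock();
    if (hw_times_out) {
        toy_unlock();
        return -1; /* stands in for -E1000_ERR_CONFIG */
    }
    return 0;
}

/* Models the release helper: it always unlocks. */
static void toy_release_swflag(void)
{
    toy_unlock();
}

/* Models the reset path before any fix: the acquire result is ignored. */
static void toy_reset_hw(bool hw_times_out)
{
    (void)toy_acquire_swflag(hw_times_out);
    /* ... global reset would be issued here ... */
    toy_release_swflag(); /* unbalanced if the acquire timed out */
}
```

After a successful acquire the counter returns to zero; after a simulated timeout the final release finds nothing held, which corresponds to the "no more locks to release!" report above.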

Adding Thomas to CC.

Regards,
Frederik

Is some error checking missing in "e1000e: debug contention on NVM
SWFLAG"?
On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> Hi
>
> During occasional scan of message log - I've found out this BUG which
> happened on Dec3 with the -rc7 from that day.
> (So if it's now fixed in current git feel free to ignore :))
>
> My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
> shutdown and was actually not noticed by me...
>
>
> NetworkManager: <WARN> nm_signal_handler(): Caught signal 15,
> shutting down normally.
> NetworkManager: <info> (eth0): now unmanaged
> NetworkManager: <info> (eth0): device state change: 3 -> 1
> NetworkManager: <info> (eth0): cleaning up...
> NetworkManager: <info> (eth0): taking down device.
>
> =====================================
> [ BUG: bad unlock balance detected! ]
> -------------------------------------

2008-12-09 23:08:31

by Andrew Morton

Subject: Re: BUG: bad unlock balance detected! e1000e

On Tue, 9 Dec 2008 12:03:37 +0100
Frederik Deweerdt <[email protected]> wrote:

> It some error checking is missing in e1000e: debug contention on NVM
> SWFLAG
> On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> > Hi
> >
> > During occasional scan of message log - I've found out this BUG which
> > happened on Dec3 with the -rc7 from that day.
> > (So if it's now fixed in current git feel free to ignore :))
> >
> > My machine T61 - C2D, 2GB, 64bit kernel - message appeared during
> > shutdown and was actually not noticed by me...
> >
> >
> > NetworkManager: <WARN> nm_signal_handler(): Caught signal 15,
> > shutting down normally.
> > NetworkManager: <info> (eth0): now unmanaged
> > NetworkManager: <info> (eth0): device state change: 3 -> 1
> > NetworkManager: <info> (eth0): cleaning up...
> > NetworkManager: <info> (eth0): taking down device.
> >
> > =====================================
> > [ BUG: bad unlock balance detected! ]
> > -------------------------------------

(top-posting repaired. Please don't do that!!!).

> Hello Zdenek,
>
> This could be due to 717d438d1fde94decef874b9808379d1f4523453
> "e1000e: debug contention on NVM SWFLAG"
> Error handling is missing from e1000_reset_hw_ich8lan so it may happen
> that we don't acquire the nvm_mutex if the card times out.
>
> Adding Thomas to CC.

yup. 2.6.27 needs fixing also.

Like this?

From: Andrew Morton <[email protected]>

=====================================
[ BUG: bad unlock balance detected! ]
-------------------------------------

A regression added by 717d438d1fde94decef874b9808379d1f4523453 ("e1000e:
debug contention on NVM SWFLAG").

Reported-by: "Zdenek Kabelac" <[email protected]>
Cc: Frederik Deweerdt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Jesse Brandeburg <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

drivers/net/e1000e/ich8lan.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff -puN drivers/net/e1000e/ich8lan.c~drivers-net-e1000e-ich8lanc-fix-locking drivers/net/e1000e/ich8lan.c
--- a/drivers/net/e1000e/ich8lan.c~drivers-net-e1000e-ich8lanc-fix-locking
+++ a/drivers/net/e1000e/ich8lan.c
@@ -400,6 +400,7 @@ static s32 e1000_acquire_swflag_ich8lan(
{
u32 extcnf_ctrl;
u32 timeout = PHY_CFG_TIMEOUT;
+ s32 ret = 0;

might_sleep();

@@ -427,11 +428,11 @@ static s32 e1000_acquire_swflag_ich8lan(
extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
ew32(EXTCNF_CTRL, extcnf_ctrl);
nvm_owner = -1;
- mutex_unlock(&nvm_mutex);
- return -E1000_ERR_CONFIG;
+ ret = -E1000_ERR_CONFIG;
}

- return 0;
+ mutex_unlock(&nvm_mutex);
+ return ret;
}

/**
_

2008-12-09 23:44:00

by Frederik Deweerdt

Subject: Re: BUG: bad unlock balance detected! e1000e

On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
> On Tue, 9 Dec 2008 12:03:37 +0100
> Frederik Deweerdt <[email protected]> wrote:
>
> > It some error checking is missing in e1000e: debug contention on NVM
> > SWFLAG
> > On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
>
> (top-posting repaired. Please don't do that!!!).
Yep, sorry.
>
> > Hello Zdenek,
> >
> > This could be due to 717d438d1fde94decef874b9808379d1f4523453
> > "e1000e: debug contention on NVM SWFLAG"
> > Error handling is missing from e1000_reset_hw_ich8lan so it may happen
> > that we don't acquire the nvm_mutex if the card times out.
> >
> > Adding Thomas to CC.
>
> yup. 2.6.27 needs fixing also.
>
> Like this?
I don't think so: e1000_acquire_swflag_ich8lan() locks and
e1000_release_swflag_ich8lan() unlocks. I think it is more along the
lines of:


diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 523b971..f971b83 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
*/
ctrl |= E1000_CTRL_PHY_RST;
}
+
ret_val = e1000_acquire_swflag_ich8lan(hw);
+ if (ret_val) {
+ hw_dbg(hw, "Failed to acquire NVM swflag");
+ return ret_val;
+ }
+
hw_dbg(hw, "Issuing a global reset to ich8lan");
ew32(CTRL, (ctrl | E1000_CTRL_RST));
msleep(20);


But I'm not sure we should cancel the ongoing reset if the card times
out...


Regards,
Frederik


2008-12-09 23:57:30

by Andrew Morton

Subject: Re: BUG: bad unlock balance detected! e1000e

On Wed, 10 Dec 2008 00:43:46 +0100
Frederik Deweerdt <[email protected]> wrote:

> On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
> > On Tue, 9 Dec 2008 12:03:37 +0100
> > Frederik Deweerdt <[email protected]> wrote:
> >
> > > It some error checking is missing in e1000e: debug contention on NVM
> > > SWFLAG
> > > On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
> >
> > (top-posting repaired. Please don't do that!!!).
> Yep, sorry.
> >
> > > Hello Zdenek,
> > >
> > > This could be due to 717d438d1fde94decef874b9808379d1f4523453
> > > "e1000e: debug contention on NVM SWFLAG"
> > > Error handling is missing from e1000_reset_hw_ich8lan so it may happen
> > > that we don't acquire the nvm_mutex if the card times out.
> > >
> > > Adding Thomas to CC.
> >
> > yup. 2.6.27 needs fixing also.
> >
> > Like this?
> I don't think so, e1000_acquire_swflag_ich8lan() locks and
> e1000_release_swflag_ich8lan() unlocks.

urgh, OK, I made the mistake of reading the comments.

> I think it is more along the
> lines of:
>
>
> diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
> index 523b971..f971b83 100644
> --- a/drivers/net/e1000e/ich8lan.c
> +++ b/drivers/net/e1000e/ich8lan.c
> @@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
> */
> ctrl |= E1000_CTRL_PHY_RST;
> }
> +
> ret_val = e1000_acquire_swflag_ich8lan(hw);
> + if (ret_val) {
> + hw_dbg(hw, "Failed to acquire NVM swflag");
> + return ret_val;
> + }
> +
> hw_dbg(hw, "Issuing a global reset to ich8lan");
> ew32(CTRL, (ctrl | E1000_CTRL_RST));
> msleep(20);
>
>
> But I'm not sure we should cancel the ongoing reset if the card times
> out...
>

Yes, something like that. Or something like

--- a/drivers/net/e1000e/ich8lan.c~a
+++ a/drivers/net/e1000e/ich8lan.c
@@ -1940,12 +1940,14 @@ static s32 e1000_reset_hw_ich8lan(struct
ctrl |= E1000_CTRL_PHY_RST;
}
ret_val = e1000_acquire_swflag_ich8lan(hw);
- hw_dbg(hw, "Issuing a global reset to ich8lan\n");
- ew32(CTRL, (ctrl | E1000_CTRL_RST));
- msleep(20);
+ if (!ret_val) {
+ hw_dbg(hw, "Issuing a global reset to ich8lan\n");
+ ew32(CTRL, (ctrl | E1000_CTRL_RST));
+ msleep(20);

- /* release the swflag because it is not reset by hardware reset */
- e1000_release_swflag_ich8lan(hw);
+ /* release the swflag because it is not reset by hardware reset */
+ e1000_release_swflag_ich8lan(hw);
+ }

ret_val = e1000e_get_auto_rd_done(hw);
if (ret_val) {
_


Dunno. It's e1000-developer-summoning-dance time.
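The two proposals so far differ mainly in what happens after a failed acquire. The sketch below is an illustration of control flow only, in a hypothetical toy model: toy_* helpers, a depth counter standing in for nvm_mutex, and counters standing in for the CTRL_RST register write and for e1000e_get_auto_rd_done(). Frederik's shape returns before the reset; Andrew's shape skips just the reset and the release but still runs the post-reset steps.

```c
#include <assert.h>
#include <stdbool.h>

static int nvm_depth;            /* hypothetical stand-in for nvm_mutex */
static bool bad_unlock_detected; /* set on an unbalanced release */
static int resets_issued;        /* simulated writes of CTRL | E1000_CTRL_RST */
static int post_reset_steps;     /* simulated e1000e_get_auto_rd_done() calls */

static void toy_unlock(void)
{
    if (nvm_depth == 0)
        bad_unlock_detected = true;
    else
        nvm_depth--;
}

/* Takes the lock; on timeout drops it again and reports failure. */
static int toy_acquire_swflag(bool hw_times_out)
{
    nvm_depth++;
    if (hw_times_out) {
        toy_unlock();
        return -1;
    }
    return 0;
}

/* Frederik's shape: return early, skipping the reset and everything after it. */
static int toy_reset_early_return(bool hw_times_out)
{
    int ret_val = toy_acquire_swflag(hw_times_out);

    if (ret_val)
        return ret_val;
    resets_issued++;
    toy_unlock(); /* release the swflag */
    post_reset_steps++;
    return 0;
}

/* Andrew's shape: skip only the reset and the release, then carry on. */
static void toy_reset_guarded(bool hw_times_out)
{
    if (!toy_acquire_swflag(hw_times_out)) {
        resets_issued++;
        toy_unlock(); /* release the swflag */
    }
    post_reset_steps++;
}
```

Both keep the lock balanced, and neither resets the hardware after a timeout, which is exactly the behaviour Frederik was unsure about.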

2008-12-11 00:38:09

by Jeff Kirsher

Subject: Re: [E1000-devel] BUG: bad unlock balance detected! e1000e

On Tue, Dec 9, 2008 at 3:56 PM, Andrew Morton <[email protected]> wrote:
> On Wed, 10 Dec 2008 00:43:46 +0100
> Frederik Deweerdt <[email protected]> wrote:
>
>> On Tue, Dec 09, 2008 at 03:08:01PM -0800, Andrew Morton wrote:
>> > On Tue, 9 Dec 2008 12:03:37 +0100
>> > Frederik Deweerdt <[email protected]> wrote:
>> >
>> > > It some error checking is missing in e1000e: debug contention on NVM
>> > > SWFLAG
>> > > On Mon, Dec 08, 2008 at 12:24:09PM +0100, Zdenek Kabelac wrote:
>> >
>> > (top-posting repaired. Please don't do that!!!).
>> Yep, sorry.
>> >
>> > > Hello Zdenek,
>> > >
>> > > This could be due to 717d438d1fde94decef874b9808379d1f4523453
>> > > "e1000e: debug contention on NVM SWFLAG"
>> > > Error handling is missing from e1000_reset_hw_ich8lan so it may happen
>> > > that we don't acquire the nvm_mutex if the card times out.
>> > >
>> > > Adding Thomas to CC.
>> >
>> > yup. 2.6.27 needs fixing also.
>> >
>> > Like this?
>> I don't think so, e1000_acquire_swflag_ich8lan() locks and
>> e1000_release_swflag_ich8lan() unlocks.
>
> urgh, OK, I made the mistake of reading the comments.
>
>> I think it is more along the
>> lines of:
>>
>>
>> diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
>> index 523b971..f971b83 100644
>> --- a/drivers/net/e1000e/ich8lan.c
>> +++ b/drivers/net/e1000e/ich8lan.c
>> @@ -1892,7 +1892,13 @@ static s32 e1000_reset_hw_ich8lan(struct e1000_hw *hw)
>> */
>> ctrl |= E1000_CTRL_PHY_RST;
>> }
>> +
>> ret_val = e1000_acquire_swflag_ich8lan(hw);
>> + if (ret_val) {
>> + hw_dbg(hw, "Failed to acquire NVM swflag");
>> + return ret_val;
>> + }
>> +
>> hw_dbg(hw, "Issuing a global reset to ich8lan");
>> ew32(CTRL, (ctrl | E1000_CTRL_RST));
>> msleep(20);
>>
>>
>> But I'm not sure we should cancel the ongoing reset if the card times
>> out...
>>
>
> Yes, something like that. Or something like
>
> --- a/drivers/net/e1000e/ich8lan.c~a
> +++ a/drivers/net/e1000e/ich8lan.c
> @@ -1940,12 +1940,14 @@ static s32 e1000_reset_hw_ich8lan(struct
> ctrl |= E1000_CTRL_PHY_RST;
> }
> ret_val = e1000_acquire_swflag_ich8lan(hw);
> - hw_dbg(hw, "Issuing a global reset to ich8lan\n");
> - ew32(CTRL, (ctrl | E1000_CTRL_RST));
> - msleep(20);
> + if (!ret_val) {
> + hw_dbg(hw, "Issuing a global reset to ich8lan\n");
> + ew32(CTRL, (ctrl | E1000_CTRL_RST));
> + msleep(20);
>
> - /* release the swflag because it is not reset by hardware reset */
> - e1000_release_swflag_ich8lan(hw);
> + /* release the swflag because it is not reset by hardware reset */
> + e1000_release_swflag_ich8lan(hw);
> + }
>
> ret_val = e1000e_get_auto_rd_done(hw);
> if (ret_val) {
> _
>
>
> Dunno. It's e1000-developer-summoning-dance time.
>

Actually, if we time out trying to acquire the swflag, we still want
to reset the part because we are most likely in an unrecoverable
state.

So I would suggest the following:
--- a/drivers/net/e1000e/ich8lan.c~a
+++ a/drivers/net/e1000e/ich8lan.c
@@ -1940,9 +1940,10 @@ static s32 e1000_reset_hw_ich8lan(struct
ctrl |= E1000_CTRL_PHY_RST;
}
ret_val = e1000_acquire_swflag_ich8lan(hw);
hw_dbg(hw, "Issuing a global reset to ich8lan\n");
ew32(CTRL, (ctrl | E1000_CTRL_RST));
msleep(20);
+ if (!ret_val) {
-
- /* release the swflag because it is not reset by hardware reset */
- e1000_release_swflag_ich8lan(hw);
+ /* release the swflag because it is not reset by hardware reset */
+ e1000_release_swflag_ich8lan(hw);
+ }

Of course, we will want to add a comment noting that we still reset the
part even if we have not acquired the lock, because we are in an
unrecoverable state.
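Jeff's suggestion can be sketched in the same kind of hypothetical toy model: the global reset is always issued, and the flag is released only when the acquire actually succeeded, so the lock stays balanced on both paths. As before, the toy_* names, nvm_depth and the counters are illustrative assumptions, not e1000e code.

```c
#include <assert.h>
#include <stdbool.h>

static int nvm_depth;            /* hypothetical stand-in for nvm_mutex */
static bool bad_unlock_detected; /* set on an unbalanced release */
static int resets_issued;        /* simulated writes of CTRL | E1000_CTRL_RST */

static void toy_unlock(void)
{
    if (nvm_depth == 0)
        bad_unlock_detected = true;
    else
        nvm_depth--;
}

static int toy_acquire_swflag(bool hw_times_out)
{
    nvm_depth++;
    if (hw_times_out) {
        toy_unlock(); /* the acquire helper drops the lock on timeout */
        return -1;
    }
    return 0;
}

/* Jeff's shape: always reset, release only what was actually acquired. */
static void toy_reset_always(bool hw_times_out)
{
    int ret_val = toy_acquire_swflag(hw_times_out);

    /* Reset even without the flag: a timeout most likely means the part
     * is already in an unrecoverable state. */
    resets_issued++;
    if (!ret_val)
        toy_unlock(); /* release the swflag */
}
```

The design choice here is to tie the release to the acquire's return value rather than to whether the reset ran, which is what removes the unbalanced unlock while keeping the reset unconditional.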

I can provide a patch in a few minutes.

--
Cheers,
Jeff