LinuxLists.cc - Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

2008-10-02 14:30:21

Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Mon, 29 Sep 2008, Jesse Brandeburg wrote:

> From: Thomas Gleixner <[email protected]>
>
> This patch adds a mutex to the e1000e driver that would help
> catch any collisions of two e1000e threads accessing hardware
> at the same time.
>
> description and patch updated by Jesse
>
> Signed-off-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Jesse Brandeburg <[email protected]>
> ---
>
> drivers/net/e1000e/ich8lan.c | 17 +++++++++++++++++
> 1 files changed, 17 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
> index a076079..57c6d2f 100644
> --- a/drivers/net/e1000e/ich8lan.c
> +++ b/drivers/net/e1000e/ich8lan.c
> @@ -366,6 +366,9 @@ static s32 e1000_get_variants_ich8lan(struct e1000_adapter *adapter)
> return 0;
> }
>
> +static DEFINE_MUTEX(nvm_mutex);
> +static pid_t nvm_owner = -1;
> +
> /**
> * e1000_acquire_swflag_ich8lan - Acquire software control flag
> * @hw: pointer to the HW structure
> @@ -379,6 +382,15 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
> u32 extcnf_ctrl;
> u32 timeout = PHY_CFG_TIMEOUT;
>
> + WARN_ON(preempt_count());
> +
> + if (!mutex_trylock(&nvm_mutex)) {
> + WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
> + nvm_owner);
> + mutex_lock(&nvm_mutex);
> + }
> + nvm_owner = current->pid;
> +
> while (timeout) {
> extcnf_ctrl = er32(EXTCNF_CTRL);
> extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
> @@ -393,6 +405,8 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
>
> if (!timeout) {
> hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
> + nvm_owner = -1;
> + mutex_unlock(&nvm_mutex);
> return -E1000_ERR_CONFIG;
> }
>
> @@ -414,6 +428,9 @@ static void e1000_release_swflag_ich8lan(struct e1000_hw *hw)
> extcnf_ctrl = er32(EXTCNF_CTRL);
> extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
> ew32(EXTCNF_CTRL, extcnf_ctrl);
> +
> + nvm_owner = -1;
> + mutex_unlock(&nvm_mutex);
> }

A few minutes ago, I have actually just hit this, while debugging the
issue on a kernel that had this patch included.

I was not successful reproducing it yet though, but still it might be a
pointer into direction where the real bug is.

15:49:07 linux-pr0e dhclient: Listening on LPF/eth1/00:15:58:c6:4a:ff
15:49:07 linux-pr0e dhclient: Sending on LPF/eth1/00:15:58:c6:4a:ff
15:49:07 linux-pr0e dhclient: Sending on Socket/fallback
15:49:07 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
15:49:10 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
15:49:18 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
15:49:27 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9
15:49:36 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 17
15:49:53 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12
15:50:05 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
15:50:08 linux-pr0e dhclient: No DHCPOFFERS received.
15:50:08 linux-pr0e dhclient: No working leases in persistent database - sleeping.
15:50:52 linux-pr0e kernel: ------------[ cut here ]------------
15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
15:50:52 linux-pr0e kernel: Modules linked in: af_packet i915 drm ipv6 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq microcode fuse loop dm_mod tulip arc4 ecb snd_hda_intel snd_pcm crypto_blkcipher rtc_cmos snd_timer ppdev iwl3945 thinkpad_acpi pcmcia uvcvideo parport_pc rtc_core snd_page_alloc video rfkill i2c_i801 mac80211 iTCO_wdt compat_ioctl32 rtc_lib yenta_socket pcspkr joydev ohci1394 snd_hwdep rsrc_nonstatic output i2c_core btusb parport battery led_class videodev ac ieee1394 v4l1_compat e1000e wmi iTCO_vendor_support pcmcia_core button snd soundcore intel_agp cfg80211 bluetooth sg sr_mod cdrom sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic ata_piix ahci pata_acpi libata scsi_mod dock thermal processor
15:50:52 linux-pr0e kernel: Pid: 7, comm: events/0 Tainted: G 2.6.27-rc7-7.10-default #1
15:50:52 linux-pr0e kernel:
15:50:52 linux-pr0e kernel: Call Trace:
15:50:52 linux-pr0e kernel: [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
15:50:52 linux-pr0e kernel: [<ffffffff80493716>] dump_stack+0x69/0x6f
15:50:52 linux-pr0e kernel: [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
15:50:52 linux-pr0e kernel: [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
15:50:52 linux-pr0e kernel: [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
15:50:52 linux-pr0e kernel: [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
15:50:52 linux-pr0e kernel: [<ffffffff80251fe5>] kthread+0x47/0x73
15:50:52 linux-pr0e kernel: [<ffffffff8020d7a9>] child_rip+0xa/0x11
15:50:52 linux-pr0e kernel:
15:50:52 linux-pr0e kernel: ---[ end trace 6f68a3c748ede326 ]---
15:51:25 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3
15:51:28 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 8
15:51:36 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
15:51:49 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 13
15:52:02 linux-pr0e dhclient: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 18
15:52:15 linux-pr0e kernel: Machine check events logged

--
Jiri Kosina
SUSE Labs

2008-10-02 15:03:58

by Olaf Kirch

[permalink] [raw]

Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Thursday 02 October 2008 16:28:42 Jiri Kosina wrote:
>
> 15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
> 15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
> 15:50:52 linux-pr0e kernel: Call Trace:
> 15:50:52 linux-pr0e kernel: [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
> 15:50:52 linux-pr0e kernel: [<ffffffff80493716>] dump_stack+0x69/0x6f
> 15:50:52 linux-pr0e kernel: [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
> 15:50:52 linux-pr0e kernel: [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
> 15:50:52 linux-pr0e kernel: [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
> 15:50:52 linux-pr0e kernel: [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
> 15:50:52 linux-pr0e kernel: [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
> 15:50:52 linux-pr0e kernel: [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
> 15:50:52 linux-pr0e kernel: [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
> 15:50:52 linux-pr0e kernel: [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
> 15:50:52 linux-pr0e kernel: [<ffffffff80251fe5>] kthread+0x47/0x73
> 15:50:52 linux-pr0e kernel: [<ffffffff8020d7a9>] child_rip+0xa/0x11

Looks like the e1000 watchdog racing with some dhclient activity (upping the interface).

I just noticed that the driver actually uses register pages. So it looks like it's
possible to have something like this without the mutex:

process A selects page A
process B selects page B
process A writes to register at offset A'

So we may end up writing to the wrong register. I think I heard Vojtech mention
that the e1000e also has a register based interface to erase/rewrite the NVM
programmatically. Do we know at which offsets these registers live?

Olaf
--
Neo didn't bring down the Matrix. SOA did.
--soafacts.com

2008-10-02 16:27:23

by Jesse Brandeburg

[permalink] [raw]

Subject: RE: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

Olaf Kirch wrote:
> Looks like the e1000 watchdog racing with some dhclient activity
> (upping the interface).

> I just noticed that the driver actually uses register pages. So it
> looks like it's possible to have something like this without the
> mutex:
>
> process A selects page A
> process B selects page B
> process A writes to register at offset A'

I think that is possible, which is why the mutex patch would be good for
the future. However we have not shown that to be happening as a root
cause, but I don't rule it out.

so, why now? Drivers since before the e1000/e1000e split had this same
code, with no reports of problems. This code has been heavily tested,
and one of the platforms easily reproducing this has been available for
3 years now (ich8), with code that is basically unchanged in the driver.

> So we may end up writing to the wrong register. I think I heard
> Vojtech mention
> that the e1000e also has a register based interface to erase/rewrite
> the NVM programmatically. Do we know at which offsets these registers
> live?

The flash control registers are documented in the ICH documentation, and
are located at physical memory location indicated by BAR1 in the config
space of device 0:19.0

I wonder if we couldn't put a check in to see if the value we end up
reading from the register controlling the operation matches the
operation we were expecting (read vs write vs block erase)

2008-10-02 17:33:36

by Olaf Kirch

[permalink] [raw]

Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Thursday 02 October 2008 18:27:12 Brandeburg, Jesse wrote:
> so, why now? Drivers since before the e1000/e1000e split had this same
> code, with no reports of problems. This code has been heavily tested,
> and one of the platforms easily reproducing this has been available for
> 3 years now (ich8), with code that is basically unchanged in the driver.

Possibly the dhcp client is doing something differently, or at a much higher
frequency. At any rate, it seems we're seeing this now even when we just
use init level 3, without X involvement. Karsten reports NVM corruption
after 34 reboots into init level 3.

> The flash control registers are documented in the ICH documentation, and
> are located at physical memory location indicated by BAR1 in the config
> space of device 0:19.0
>
> I wonder if we couldn't put a check in to see if the value we end up
> reading from the register controlling the operation matches the
> operation we were expecting (read vs write vs block erase)

That may help, yes.

Olaf
--
Neo didn't bring down the Matrix. SOA did.
--soafacts.com

2008-10-02 18:02:58

by Thomas Gleixner

[permalink] [raw]

Subject: RE: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Thu, 2 Oct 2008, Brandeburg, Jesse wrote:

> Olaf Kirch wrote:
> > Looks like the e1000 watchdog racing with some dhclient activity
> > (upping the interface).
>
> > I just noticed that the driver actually uses register pages. So it
> > looks like it's possible to have something like this without the
> > mutex:
> >
> > process A selects page A
> > process B selects page B
> > process A writes to register at offset A'
>
> I think that is possible, which is why the mutex patch would be good for
> the future. However we have not shown that to be happening as a root
> cause, but I don't rule it out.

Nevertheless I vote strongly for putting that check in _NOW_. It has
proven that there is concurrent access and that's definitely a bug by
all means.

> so, why now? Drivers since before the e1000/e1000e split had this same
> code, with no reports of problems. This code has been heavily tested,
> and one of the platforms easily reproducing this has been available for
> 3 years now (ich8), with code that is basically unchanged in the driver.

Well, timing of events changes slightly over time and we definitely
had some major changes in the last three years which influence timing
(high res timers, dynticks, NAPI ....)

Thanks,

tglx

2008-10-02 18:59:27

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Thu, 2 Oct 2008, Olaf Kirch wrote:

> On Thursday 02 October 2008 18:27:12 Brandeburg, Jesse wrote:
> > so, why now? Drivers since before the e1000/e1000e split had this same
> > code, with no reports of problems. This code has been heavily tested,
> > and one of the platforms easily reproducing this has been available for
> > 3 years now (ich8), with code that is basically unchanged in the driver.
>
> Possibly the dhcp client is doing something differently, or at a much higher
> frequency. At any rate, it seems we're seeing this now even when we just
> use init level 3, without X involvement. Karsten reports NVM corruption
> after 34 reboots into init level 3.

Had Karsten the mutex patch applied or not ?

tglx

2008-10-02 19:07:43

by Olaf Kirch

[permalink] [raw]

Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Thursday 02 October 2008 20:58:43 Thomas Gleixner wrote:
> On Thu, 2 Oct 2008, Olaf Kirch wrote:
>
> > On Thursday 02 October 2008 18:27:12 Brandeburg, Jesse wrote:
> > > so, why now? Drivers since before the e1000/e1000e split had this same
> > > code, with no reports of problems. This code has been heavily tested,
> > > and one of the platforms easily reproducing this has been available for
> > > 3 years now (ich8), with code that is basically unchanged in the driver.
> >
> > Possibly the dhcp client is doing something differently, or at a much higher
> > frequency. At any rate, it seems we're seeing this now even when we just
> > use init level 3, without X involvement. Karsten reports NVM corruption
> > after 34 reboots into init level 3.
>
> Had Karsten the mutex patch applied or not ?

That was openSuse 11.1 without any patches

Olaf
--
Neo didn't bring down the Matrix. SOA did.
--soafacts.com

2008-10-02 19:08:35

by Olaf Kirch

[permalink] [raw]

Subject: Re: [RFC PATCH 07/12] e1000e: debug contention on NVM SWFLAG

On Thursday 02 October 2008 21:07:22 Olaf Kirch wrote:
> That was openSuse 11.1 without any patches

I should have said 11.1 beta1.

Olaf
--
Neo didn't bring down the Matrix. SOA did.
--soafacts.com

2008-10-02 23:43:41

by Thomas Gleixner

[permalink] [raw]

Subject: [PATCH] e1000e: prevent concurrent access to NVRAM

On Thu, 2 Oct 2008, Olaf Kirch wrote:

> On Thursday 02 October 2008 16:28:42 Jiri Kosina wrote:
> >
> > 15:50:52 linux-pr0e kernel: WARNING: at drivers/net/e1000e/ich8lan.c:424 e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]()
> > 15:50:52 linux-pr0e kernel: e1000e mutex contention. Owned by pid 4162
> > 15:50:52 linux-pr0e kernel: Call Trace:
> > 15:50:52 linux-pr0e kernel: [<ffffffff8020e41e>] show_trace_log_lvl+0x41/0x58
> > 15:50:52 linux-pr0e kernel: [<ffffffff80493716>] dump_stack+0x69/0x6f
> > 15:50:52 linux-pr0e kernel: [<ffffffff8023ee54>] warn_slowpath+0xb4/0xdc
> > 15:50:52 linux-pr0e kernel: [<ffffffffa022ce2e>] e1000_acquire_swflag_ich8lan+0x5a/0xdc [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa02317ba>] e1000e_read_phy_reg_igp+0x19/0x64 [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa02319f8>] e1000e_phy_has_link_generic+0x50/0xcc [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa02306f9>] e1000e_check_for_copper_link+0x24/0x86 [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffffa0236982>] e1000_watchdog_task+0x5c/0x5eb [e1000e]
> > 15:50:52 linux-pr0e kernel: [<ffffffff8024ecdb>] run_workqueue+0xa4/0x14c
> > 15:50:52 linux-pr0e kernel: [<ffffffff8024ee5b>] worker_thread+0xd8/0xe7
> > 15:50:52 linux-pr0e kernel: [<ffffffff80251fe5>] kthread+0x47/0x73
> > 15:50:52 linux-pr0e kernel: [<ffffffff8020d7a9>] child_rip+0xa/0x11
>
> Looks like the e1000 watchdog racing with some dhclient activity (upping the interface).
>
> I just noticed that the driver actually uses register pages. So it looks like it's
> possible to have something like this without the mutex:
>
> process A selects page A
> process B selects page B
> process A writes to register at offset A'
>
> So we may end up writing to the wrong register. I think I heard Vojtech mention
> that the e1000e also has a register based interface to erase/rewrite the NVM
> programmatically. Do we know at which offsets these registers live?

Linus,

can you please apply the patch below which prevents the concurrent
access to the NVRAM. It triggered the trace which Olaf reported above.

I worked out that patch on Sept 24th and it triggered a couple of
problems in the e1000e code right away. These have been addressed by
Jesse and are part of the patch series he posted in the last days.

Still, all we have in mainline is some "hopefully bug preventing"
patch which does not address the root cause at all.

The confirmed bugs where the nvram acquire code was called
concurrently are still in your tree and the prevention patch along
with the resulting bugfixes are stuck in some obscure intel QA
process.

Please apply at least the bug prevention patch below.

Thanks,

tglx
---
Subject: e1000e prevent concurrent access and debug contention on NVM SWFALG
Date: Wed, 24 Sep 2008 00:45:08 -0700
From: Thomas Gleixner <[email protected]>

Signed-off-by: Thomas Gleixner <[email protected]>
Cc: [email protected]
---

drivers/net/e1000e/ich8lan.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

Index: linux-2.6/drivers/net/e1000e/ich8lan.c
===================================================================
--- linux-2.6.orig/drivers/net/e1000e/ich8lan.c
+++ linux-2.6/drivers/net/e1000e/ich8lan.c
@@ -382,6 +382,9 @@ static s32 e1000_get_variants_ich8lan(st
return 0;
}

+static DEFINE_MUTEX(nvm_mutex);
+static pid_t nvm_owner = -1;
+
/**
* e1000_acquire_swflag_ich8lan - Acquire software control flag
* @hw: pointer to the HW structure
@@ -395,6 +398,15 @@ static s32 e1000_acquire_swflag_ich8lan(
u32 extcnf_ctrl;
u32 timeout = PHY_CFG_TIMEOUT;

+ WARN_ON(preempt_count());
+
+ if (!mutex_trylock(&nvm_mutex)) {
+ WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
+ nvm_owner);
+ mutex_lock(&nvm_mutex);
+ }
+ nvm_owner = current->pid;
+
while (timeout) {
extcnf_ctrl = er32(EXTCNF_CTRL);
extcnf_ctrl |= E1000_EXTCNF_CTRL_SWFLAG;
@@ -409,6 +421,8 @@ static s32 e1000_acquire_swflag_ich8lan(

if (!timeout) {
hw_dbg(hw, "FW or HW has locked the resource for too long.\n");
+ nvm_owner = -1;
+ mutex_unlock(&nvm_mutex);
return -E1000_ERR_CONFIG;
}

@@ -430,6 +444,9 @@ static void e1000_release_swflag_ich8lan
extcnf_ctrl = er32(EXTCNF_CTRL);
extcnf_ctrl &= ~E1000_EXTCNF_CTRL_SWFLAG;
ew32(EXTCNF_CTRL, extcnf_ctrl);
+
+ nvm_owner = -1;
+ mutex_unlock(&nvm_mutex);
}

/**

2008-10-03 00:19:18

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: [PATCH] e1000e: prevent concurrent access to NVRAM

On Thu, Oct 2, 2008 at 4:42 PM, Thomas Gleixner <[email protected]> wrote:
> The confirmed bugs where the nvram acquire code was called
> concurrently are still in your tree and the prevention patch along
> with the resulting bugfixes are stuck in some obscure intel QA
> process.
>
> Please apply at least the bug prevention patch below.

This is the same patch I posted 7 minutes ago, except that this patch
without the e1000e changes applied before it will cause all sorts of
WARN's to be printed during normal operation. If at all possible I
think they should stay together as a group to prevent un-necessary
noise in the logs.

2008-10-03 00:29:27

by Thomas Gleixner

[permalink] [raw]

Subject: Re: [PATCH] e1000e: prevent concurrent access to NVRAM

On Thu, 2 Oct 2008, Jesse Brandeburg wrote:

> On Thu, Oct 2, 2008 at 4:42 PM, Thomas Gleixner <[email protected]> wrote:
> > The confirmed bugs where the nvram acquire code was called
> > concurrently are still in your tree and the prevention patch along
> > with the resulting bugfixes are stuck in some obscure intel QA
> > process.
> >
> > Please apply at least the bug prevention patch below.
>
> This is the same patch I posted 7 minutes ago, except that this patch
> without the e1000e changes applied before it will cause all sorts of
> WARN's to be printed during normal operation. If at all possible I
> think they should stay together as a group to prevent un-necessary
> noise in the logs.

Sure, I'm all for the combo of those. I just wanted to get some motion
into this stale process.

tglx