Hello Thomas,
I have e1000e compiled into my kernel and
commit 717d438d1fde94decef874b9808379d1f4523453
Author: Thomas Gleixner <[email protected]>
Date: Thu Oct 2 16:33:40 2008 -0700
e1000e: debug contention on NVM SWFLAG
Causes a storm of
[ 15.600387] ------------[ cut here ]------------
[ 15.600388] WARNING: at drivers/net/e1000e/ich8lan.c:399
e1000_acquire_swflag_ich8lan+0xde/0xf0()
[ 15.600389] Modules linked in:
[ 15.600390] Pid: 1, comm: swapper Tainted: G W
2.6.27-rc8-00055-gb5ff7df #26
[ 15.600391] [<c01421cf>] warn_on_slowpath+0x5f/0xa0
[ 15.600394] [<c0464999>] __devinet_sysctl_register+0xc9/0x100
[ 15.600396] [<c015bf0e>] sched_clock_cpu+0xde/0x180
[ 15.600399] [<c015ae98>] down_trylock+0x28/0x40
[ 15.600400] [<c04df645>] _spin_unlock+0x5/0x20
[ 15.600402] [<c02944a4>] delay_tsc+0x84/0xb0
[ 15.600404] [<c031bd6e>] e1000_acquire_swflag_ich8lan+0xde/0xf0
[ 15.600406] [<c031b716>] e1000_read_flash_word_ich8lan+0x76/0xb0
[ 15.600408] [<c031c05b>] e1000_read_nvm_ich8lan+0x5b/0xf0
[ 15.600410] [<c031ecb4>] e1000e_read_pba_num+0x64/0x80
[ 15.600412] [<c04d2158>] e1000_probe+0xb98/0xc20
[ 15.600414] [<c02a4f0e>] pci_device_probe+0x5e/0x80
[ 15.600416] [<c030d416>] driver_probe_device+0x86/0x1a0
[ 15.600418] [<c04df373>] _spin_lock_irqsave+0x33/0x50
[ 15.600420] [<c030d5a1>] __driver_attach+0x71/0x80
[ 15.600422] [<c02a4e50>] pci_device_remove+0x0/0x40
[ 15.600424] [<c030cd44>] bus_for_each_dev+0x44/0x70
[ 15.600426] [<c02a4e50>] pci_device_remove+0x0/0x40
[ 15.600427] [<c030d2a6>] driver_attach+0x16/0x20
[ 15.600430] [<c030d530>] __driver_attach+0x0/0x80
[ 15.600432] [<c030c70f>] bus_add_driver+0x19f/0x220
[ 15.600434] [<c02a4e50>] pci_device_remove+0x0/0x40
[ 15.600435] [<c030d73c>] driver_register+0x5c/0x130
[ 15.600437] [<c06aa88f>] thinkpad_acpi_module_init+0x7b2/0x983
[ 15.600439] [<c06aaa60>] e1000_init_module+0x0/0x70
[ 15.600441] [<c02a5157>] __pci_register_driver+0x47/0x90
[ 15.600443] [<c06aaaa5>] e1000_init_module+0x45/0x70
[ 15.600445] [<c01012ea>] do_one_initcall+0x2a/0x190
[ 15.600446] [<c01defb4>] create_proc_entry+0x54/0xa0
[ 15.600449] [<c0175411>] register_irq_proc+0xc1/0xe0
[ 15.600451] [<c0175478>] init_irq_proc+0x48/0x60
[ 15.600452] [<c068e85d>] kernel_init+0x11a/0x17d
[ 15.600454] [<c068e743>] kernel_init+0x0/0x17d
[ 15.600456] [<c011d48b>] kernel_thread_helper+0x7/0x1c
[ 15.600458] =======================
[ 15.600459] ---[ end trace 1caa30bae2a6fa92 ]---
This is caused by holding a spinlock (__driver_attach) and checking for
preempt_count (e1000_acquire_swflag_ich8lan).
I suggest to revert this commit, since we cannot take a mutex while holding a
spinlock.
The simple solution of replacing the mutex with a spinlock does not work,
since we call msleep on several places in the code. Replacing all that code
doesnt look like 2.6.27 material.
Christian
On Fri, 3 Oct 2008, Christian Borntraeger wrote:
>
> This is caused by holding a spinlock (__driver_attach) and checking for
> preempt_count (e1000_acquire_swflag_ich8lan).
Which spinlock is it? It sure shouldn't be anything in the generic drievr
layer: __driver_attach takes a lock, but it's a semaphore (dev->sem), not
a spinlock. In fact, al lot of drivers will sleep or do allocations, so if
the driver layer were to call the probe function with a spinlock held,
things would be really broken.
The callchain has "spin_lock_irqsave" in it, but that's stale stack data.
Linus
On Fri, 3 Oct 2008 10:41:49 +0200
Christian Borntraeger <[email protected]> wrote:
> This is caused by holding a spinlock (__driver_attach) and checking
> for preempt_count (e1000_acquire_swflag_ich8lan).
>
> I suggest to revert this commit, since we cannot take a mutex while
> holding a spinlock.
> The simple solution of replacing the mutex with a spinlock does not
> work, since we call msleep on several places in the code. Replacing
> all that code doesnt look like 2.6.27 material.
hi,
could you do a run with frame pointers enabled?
that should clear up the stack noise....
--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org
Am Freitag, 3. Oktober 2008 schrieben Sie:
> Which spinlock is it? It sure shouldn't be anything in the generic drievr
> layer: __driver_attach takes a lock, but it's a semaphore (dev->sem), not
> a spinlock. In fact, al lot of drivers will sleep or do allocations, so if
> the driver layer were to call the probe function with a spinlock held,
> things would be really broken.
>
> The callchain has "spin_lock_irqsave" in it, but that's stale stack data.
Very good point indeed. Seems my analysis was wrong. I made a short test and
preempt_count() returns 1 when the WARN_ON triggers. So its either a spinlock
or something else that increased my preempt_count.
On Fri, 3 Oct 2008, Christian Borntraeger wrote:
>
> Very good point indeed. Seems my analysis was wrong. I made a short test and
> preempt_count() returns 1 when the WARN_ON triggers. So its either a spinlock
> or something else that increased my preempt_count.
Ok, I can repro it. I couldn't see it before because I usually don't have
PREEMPT even enabled. Enabling PREEMPT and PREEMPT_DEBUG gets me the
messages too. Jesse?
The call trace is much better with frame pointers enabled, loke so (with
the '?' entries removed as stale):
Call Trace:
warn_on_slowpath+0x53/0x7a
e1000_acquire_swflag_ich8lan+0x2d/0xe1
e1000_read_nvm_ich8lan+0x4f/0xee
e1000e_read_pba_num+0x4e/0x69
e1000_probe+0xa32/0xb71
pci_device_probe+0x4c/0x72
driver_probe_device+0xc0/0x16e
__driver_attach+0x4f/0x79
bus_for_each_dev+0x4f/0x89
driver_attach+0x1c/0x1e
bus_add_driver+0xb7/0x1f4
driver_register+0xa8/0x128
__pci_register_driver+0x66/0x9d
e1000_init_module+0x4f/0x6b
_stext+0x47/0x13f
kernel_init+0x127/0x17b
child_rip+0xa/0x11
and yes, there's quite a lot of them. I'm not seeing what disables
preemption, though. I'll look at it some more, but I decided I should
post that cleaned-up call chain so that others can do so too..
Linus
Am Freitag, 3. Oktober 2008 schrieb Linus Torvalds:
> Ok, I can repro it. I couldn't see it before because I usually don't have
> PREEMPT even enabled. Enabling PREEMPT and PREEMPT_DEBUG gets me the
> messages too. Jesse?
I ow enabled FRAMEBUFFERS and increased the log size to actually get the first
one. Its different from yours, but should be the same problem.
According to grep I have 265 warnings in dmesg.
[ 0.853027] e1000e: Intel(R) PRO/1000 Network Driver - 0.3.3.3-k6
[ 0.867831] e1000e: Copyright (c) 1999-2008 Intel Corporation.
[ 0.882588] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 0.897300] e1000e 0000:00:19.0: setting latency timer to 64
[ 0.897456] ------------[ cut here ]------------
[ 0.912011] WARNING: at drivers/net/e1000e/ich8lan.c:399
e1000_acquire_swflag_ich8lan+0xe7/0xf0()
[ 0.926825] Modules linked in:
[ 0.941486] Pid: 1, comm: swapper Not tainted 2.6.27-rc8-00055-gb5ff7df #34
[ 0.956340] [<c04da2c1>] ? printk+0x18/0x1f
[ 0.971205] [<c0141ae4>] warn_on_slowpath+0x54/0x80
[ 0.986110] [<c013741b>] ? kunmap_atomic+0x3b/0xc0
[ 1.001016] [<c0134dd3>] ? change_page_attr_set_clr+0xc3/0x1b0
[ 1.015966] [<c0319a67>] e1000_acquire_swflag_ich8lan+0xe7/0xf0
[ 1.030827] [<c013507f>] ? _set_memory_uc+0x1f/0x30
[ 1.045530] [<c031e48c>] e1000e_read_phy_reg_igp+0x1c/0x70
[ 1.060097] [<c031e54e>] e1000e_get_phy_id+0x1e/0x80
[ 1.074481] [<c031e5cf>] e1000e_determine_phy_address+0x1f/0x50
[ 1.088705] [<c031a87f>] e1000_get_variants_ich8lan+0xdf/0x250
[ 1.102774] [<c04cea3f>] e1000_probe+0x40f/0xc80
[ 1.116727] [<c01e5056>] ? sysfs_addrm_finish+0x16/0x200
[ 1.130557] [<c01e4833>] ? sysfs_add_one+0x13/0x50
[ 1.144120] [<c01e48c1>] ? sysfs_addrm_start+0x51/0x90
[ 1.144128] [<c02a0d65>] ? pci_match_device+0xb5/0xc0
[ 1.144136] [<c02a156e>] pci_device_probe+0x5e/0x80
[ 1.144143] [<c030aeb2>] driver_probe_device+0x82/0x1a0
[ 1.144152] [<c030b041>] __driver_attach+0x71/0x80
[ 1.144159] [<c030a7c3>] bus_for_each_dev+0x53/0x80
[ 1.144165] [<c02a14b0>] ? pci_device_remove+0x0/0x40
[ 1.144172] [<c030ad39>] driver_attach+0x19/0x20
[ 1.144179] [<c030afd0>] ? __driver_attach+0x0/0x80
[ 1.144186] [<c030a197>] bus_add_driver+0x1a7/0x220
[ 1.144193] [<c02a14b0>] ? pci_device_remove+0x0/0x40
[ 1.144200] [<c030b1e9>] driver_register+0x69/0x150
[ 1.144206] [<c0315040>] ? fan_write+0x0/0x3c0
[ 1.144213] [<c06a8080>] ? e1000_init_module+0x0/0x70
[ 1.144221] [<c02a178a>] __pci_register_driver+0x4a/0xa0
[ 1.144228] [<c06a8080>] ? e1000_init_module+0x0/0x70
[ 1.144235] [<c06a80c8>] e1000_init_module+0x48/0x70
[ 1.144241] [<c01012f6>] do_one_initcall+0x26/0x170
[ 1.144248] [<c01ddab4>] ? create_proc_entry+0x54/0xb0
[ 1.144255] [<c017502e>] ? register_irq_proc+0xbe/0xe0
[ 1.144264] [<c0175098>] ? init_irq_proc+0x48/0x60
[ 1.144271] [<c068b89d>] kernel_init+0x11a/0x17d
[ 1.144279] [<c068b783>] ? kernel_init+0x0/0x17d
[ 1.144285] [<c011d207>] kernel_thread_helper+0x7/0x10
I got an idea,
kernel_init does lock_kernel (which does preempt_disable).
The kernel_unlock is done in init_post. As far as I can see, this happens
after the driver init calls.
Christian
On Fri, 3 Oct 2008, Christian Borntraeger wrote:
>
> kernel_init does lock_kernel (which does preempt_disable).
> The kernel_unlock is done in init_post. As far as I can see, this happens
> after the driver init calls.
Yes. I just came to the same conclusion: preempt_count is 1 already in
do_one_initcall().
So this whole bug is actually because that debug test is just _broken_. It
shouldn't be WARN_ON(preempt_count()), it should be 'might_sleep()', which
does it right.
The right thing to check is "in_atomic() || irqs_disabled()".
Duh. That was wasted effort for a buggy test.
Linus
Linus Torvalds wrote:
> So this whole bug is actually because that debug test is just
> _broken_. It shouldn't be WARN_ON(preempt_count()), it should be
> 'might_sleep()', which does it right.
>
> The right thing to check is "in_atomic() || irqs_disabled()".
okay, do you want me to post a patch? I think Thomas is probably
offline due to fridaynightitis. Change the code to:?
WARN_ON(in_atomic() || irqs_disabled);
> Duh. That was wasted effort for a buggy test.
Apologies for not catching it.
On Fri, 3 Oct 2008, Linus Torvalds wrote:
>
> So this whole bug is actually because that debug test is just _broken_. It
> shouldn't be WARN_ON(preempt_count()), it should be 'might_sleep()', which
> does it right.
Tested and confirmed. Committed and pushed out,
Linus
This patch seems to fix it. Comments?
kernel_init uses lock_kernel to serialize the initialization. If the driver
is compiled into the kernel, the probe function of e1000e is called under
this lock and therefore the WARN_ON(preempt_count()) triggers. We can now
1. remove this WARN_ON
2. check for SYSTEM_RUNNING
This patch implements 2.
Signed-off-by: Christian Borntraeger <[email protected]>
diff --git a/drivers/net/e1000e/ich8lan.c b/drivers/net/e1000e/ich8lan.c
index 0b6095b..cca9731 100644
--- a/drivers/net/e1000e/ich8lan.c
+++ b/drivers/net/e1000e/ich8lan.c
@@ -396,7 +396,7 @@ static s32 e1000_acquire_swflag_ich8lan(struct e1000_hw *hw)
u32 extcnf_ctrl;
u32 timeout = PHY_CFG_TIMEOUT;
- WARN_ON(preempt_count());
+ WARN_ON(system_state == SYSTEM_RUNNING && preempt_count());
if (!mutex_trylock(&nvm_mutex)) {
WARN(1, KERN_ERR "e1000e mutex contention. Owned by pid %d\n",
On Fri, 3 Oct 2008, Brandeburg, Jesse wrote:
>
> Linus Torvalds wrote:
> > So this whole bug is actually because that debug test is just
> > _broken_. It shouldn't be WARN_ON(preempt_count()), it should be
> > 'might_sleep()', which does it right.
> >
> > The right thing to check is "in_atomic() || irqs_disabled()".
>
> okay, do you want me to post a patch? I think Thomas is probably
> offline due to fridaynightitis. Change the code to:?
>
> WARN_ON(in_atomic() || irqs_disabled);
No, literally just "might_sleep()". It then does everything right,
including taking into account that we allow certain behavior during bootup
that we don't allow later.
Linus
On Fri, 3 Oct 2008, Linus Torvalds wrote:
> On Fri, 3 Oct 2008, Christian Borntraeger wrote:
> >
> > kernel_init does lock_kernel (which does preempt_disable).
> > The kernel_unlock is done in init_post. As far as I can see, this happens
> > after the driver init calls.
>
> Yes. I just came to the same conclusion: preempt_count is 1 already in
> do_one_initcall().
>
> So this whole bug is actually because that debug test is just _broken_. It
> shouldn't be WARN_ON(preempt_count()), it should be 'might_sleep()', which
> does it right.
>
> The right thing to check is "in_atomic() || irqs_disabled()".
>
> Duh. That was wasted effort for a buggy test.
Stupid me. Yes I should have used might_sleep. It just did not trigger
here as I did not have the driver compiled into the kernel.
Sorry,
tglx