2009-03-13 22:15:22

by Andrew Morton

Subject: Re: [stable] Linux 2.6.27.19 2.6.28.7


I fired this kernel up on my FC8 laptop and I see
http://userweb.kernel.org/~akpm/p3130212.jpg

On the next two boot attempts, the kernel came up OK.



Also, during boot the e1000e driver has conniptions:

------------[ cut here ]------------
WARNING: at drivers/net/e1000e/ich8lan.c:408 e1000_acquire_swflag_ich8lan+0x51/0xf2()
e1000e mutex contention. Owned by pid 10
Modules linked in:
Pid: 9, comm: events/0 Not tainted 2.6.28.7 #1
Call Trace:
[<ffffffff8103a810>] warn_slowpath+0xae/0xcd
[<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
[<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
[<ffffffff8105d63f>] ? __lock_acquire+0x702/0x760
[<ffffffff8105bfc6>] ? mark_held_locks+0x50/0x6d
[<ffffffff812dd950>] ? mutex_trylock+0x104/0x118
[<ffffffff8105c170>] ? trace_hardirqs_on_caller+0xf8/0x123
[<ffffffff8105c1a8>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff811ed838>] e1000_acquire_swflag_ich8lan+0x51/0xf2
[<ffffffff811f2fe9>] e1000e_read_kmrn_reg+0x1b/0x69
[<ffffffff811f63c5>] ? e1000e_downshift_workaround+0x0/0x12
[<ffffffff811ed1e9>] e1000e_gig_downshift_workaround_ich8lan+0x2c/0x71
[<ffffffff811f63d5>] e1000e_downshift_workaround+0x10/0x12
[<ffffffff8104a6ed>] run_workqueue+0xf5/0x1fd
[<ffffffff8104a697>] ? run_workqueue+0x9f/0x1fd
[<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
[<ffffffff8104a9d1>] worker_thread+0xdb/0xe8
[<ffffffff8104de14>] ? autoremove_wake_function+0x0/0x36
[<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
[<ffffffff8104db1a>] kthread+0x44/0x6b
[<ffffffff8100cf59>] child_rip+0xa/0x11
[<ffffffff8100c474>] ? restore_args+0x0/0x30
[<ffffffff8104dad6>] ? kthread+0x0/0x6b
[<ffffffff8100cf4f>] ? child_rip+0x0/0x11
---[ end trace 09c88554f9900e8b ]---

Config: http://userweb.kernel.org/~akpm/config-t61p.txt
dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p.txt


2009-03-13 22:23:24

by Greg KH

Subject: Re: [stable] Linux 2.6.27.19 2.6.28.7

On Fri, Mar 13, 2009 at 03:10:51PM -0700, Andrew Morton wrote:
>
> I fired this kernel up on my FC8 laptop and I see
> http://userweb.kernel.org/~akpm/p3130212.jpg
>
> On the next two boot attempts, the kernel came up OK.
>

That's weird, it can't find the root filesystem?

What driver controls the disk for /dev/root? It's not the USB
controller, is it? Is that normally sda?

> Also, during boot the e1000e driver has conniptions:
>
> ------------[ cut here ]------------
> WARNING: at drivers/net/e1000e/ich8lan.c:408 e1000_acquire_swflag_ich8lan+0x51/0xf2()
> e1000e mutex contention. Owned by pid 10
> Modules linked in:
> Pid: 9, comm: events/0 Not tainted 2.6.28.7 #1
> Call Trace:
> [<ffffffff8103a810>] warn_slowpath+0xae/0xcd
> [<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
> [<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
> [<ffffffff8105d63f>] ? __lock_acquire+0x702/0x760
> [<ffffffff8105bfc6>] ? mark_held_locks+0x50/0x6d
> [<ffffffff812dd950>] ? mutex_trylock+0x104/0x118
> [<ffffffff8105c170>] ? trace_hardirqs_on_caller+0xf8/0x123
> [<ffffffff8105c1a8>] ? trace_hardirqs_on+0xd/0xf
> [<ffffffff811ed838>] e1000_acquire_swflag_ich8lan+0x51/0xf2
> [<ffffffff811f2fe9>] e1000e_read_kmrn_reg+0x1b/0x69
> [<ffffffff811f63c5>] ? e1000e_downshift_workaround+0x0/0x12
> [<ffffffff811ed1e9>] e1000e_gig_downshift_workaround_ich8lan+0x2c/0x71
> [<ffffffff811f63d5>] e1000e_downshift_workaround+0x10/0x12
> [<ffffffff8104a6ed>] run_workqueue+0xf5/0x1fd
> [<ffffffff8104a697>] ? run_workqueue+0x9f/0x1fd
> [<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
> [<ffffffff8104a9d1>] worker_thread+0xdb/0xe8
> [<ffffffff8104de14>] ? autoremove_wake_function+0x0/0x36
> [<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
> [<ffffffff8104db1a>] kthread+0x44/0x6b
> [<ffffffff8100cf59>] child_rip+0xa/0x11
> [<ffffffff8100c474>] ? restore_args+0x0/0x30
> [<ffffffff8104dad6>] ? kthread+0x0/0x6b
> [<ffffffff8100cf4f>] ? child_rip+0x0/0x11
> ---[ end trace 09c88554f9900e8b ]---
>
> Config: http://userweb.kernel.org/~akpm/config-t61p.txt
> dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p.txt

Weird, I don't know.

thanks,

greg k-h

2009-03-13 23:51:21

by Andrew Morton

Subject: Re: [stable] Linux 2.6.27.19 2.6.28.7

On Fri, 13 Mar 2009 15:20:44 -0700 Greg KH wrote:

> On Fri, Mar 13, 2009 at 03:10:51PM -0700, Andrew Morton wrote:
> >
> > I fired this kernel up on my FC8 laptop and I see
> > http://userweb.kernel.org/~akpm/p3130212.jpg
> >
> > On the next two boot attempts, the kernel came up OK.
> >
>
> That's weird, it can't find the root filesystem?
>
> What driver controls the disk for /dev/root? It's not the USB
> controller, is it? Is that normally sda?

The dmesg output is down there.

ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
ata1.00: ATA-7: HTS721010G9SA00, MCZIC14V, max UDMA/100
ata1.00: 195371568 sectors, multi 16: LBA48
ata1.00: ACPI cmd f5/00:00:00:00:00:a0 filtered out
ata1.00: ACPI cmd ef/10:03:00:00:00:a0 filtered out
ata1.00: configured for UDMA/100
ata1.00: configured for UDMA/100
ata1: EH complete
scsi 0:0:0:0: Direct-Access ATA HTS721010G9SA00 MCZI PQ: 0 ANSI: 5
sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 0:0:0:0: [sda] 195371568 512-byte hardware sectors: (100 GB/93.1 GiB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

> > Also, during boot the e1000e driver has conniptions:
> >
> > ------------[ cut here ]------------
> > WARNING: at drivers/net/e1000e/ich8lan.c:408 e1000_acquire_swflag_ich8lan+0x51/0xf2()
> > e1000e mutex contention. Owned by pid 10
> > Modules linked in:
> > Pid: 9, comm: events/0 Not tainted 2.6.28.7 #1
> > Call Trace:
> > [<ffffffff8103a810>] warn_slowpath+0xae/0xcd
> > [<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
> > [<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
> > [<ffffffff8105d63f>] ? __lock_acquire+0x702/0x760
> > [<ffffffff8105bfc6>] ? mark_held_locks+0x50/0x6d
> > [<ffffffff812dd950>] ? mutex_trylock+0x104/0x118
> > [<ffffffff8105c170>] ? trace_hardirqs_on_caller+0xf8/0x123
> > [<ffffffff8105c1a8>] ? trace_hardirqs_on+0xd/0xf
> > [<ffffffff811ed838>] e1000_acquire_swflag_ich8lan+0x51/0xf2
> > [<ffffffff811f2fe9>] e1000e_read_kmrn_reg+0x1b/0x69
> > [<ffffffff811f63c5>] ? e1000e_downshift_workaround+0x0/0x12
> > [<ffffffff811ed1e9>] e1000e_gig_downshift_workaround_ich8lan+0x2c/0x71
> > [<ffffffff811f63d5>] e1000e_downshift_workaround+0x10/0x12
> > [<ffffffff8104a6ed>] run_workqueue+0xf5/0x1fd
> > [<ffffffff8104a697>] ? run_workqueue+0x9f/0x1fd
> > [<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
> > [<ffffffff8104a9d1>] worker_thread+0xdb/0xe8
> > [<ffffffff8104de14>] ? autoremove_wake_function+0x0/0x36
> > [<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
> > [<ffffffff8104db1a>] kthread+0x44/0x6b
> > [<ffffffff8100cf59>] child_rip+0xa/0x11
> > [<ffffffff8100c474>] ? restore_args+0x0/0x30
> > [<ffffffff8104dad6>] ? kthread+0x0/0x6b
> > [<ffffffff8100cf4f>] ? child_rip+0x0/0x11
> > ---[ end trace 09c88554f9900e8b ]---
> >
> > Config: http://userweb.kernel.org/~akpm/config-t61p.txt
> > dmesg: http://userweb.kernel.org/~akpm/dmesg-t61p.txt
>
> Weird, I don't know.
>

Also in dmesg:

0000:00:19.0: eth0: 10/100 speed: disabling TSO
0000:00:19.0: eth0: Detected Tx Unit Hang:
TDH <7d>
TDT <ec>
next_to_use <ec>
next_to_clean <41>
buffer_info[next_to_clean]:
time_stamp <fffb80d4>
next_to_watch <41>
jiffies <fffb86ed>
next_to_watch.status <1>
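
For readers decoding the dump above, here is a small C sketch of the arithmetic on the reported values. It is only an illustration: the struct and function names are made up for this example, the head/tail subtraction assumes the ring has not wrapped between TDH and TDT (true for these values), and treating bit 0 of the status field as "descriptor done" is an assumption about the descriptor layout rather than something stated in the log.

```c
#include <stdint.h>

/* Assumption: bit 0 of the descriptor status means "descriptor done". */
#define TXD_STAT_DD 0x1u

/* Hypothetical container for the fields printed in the hang report. */
struct tx_dump {
    uint32_t tdh;            /* hardware head index */
    uint32_t tdt;            /* hardware tail index */
    uint32_t next_to_use;    /* driver's next free slot */
    uint32_t next_to_clean;  /* driver's next slot to reclaim */
    uint32_t time_stamp;     /* jiffies when the buffer was queued */
    uint32_t jiffies;        /* jiffies when the report was printed */
    uint32_t status;         /* next_to_watch.status */
};

/* Values copied from the dump above. */
static const struct tx_dump reported = {
    .tdh = 0x7d, .tdt = 0xec,
    .next_to_use = 0xec, .next_to_clean = 0x41,
    .time_stamp = 0xfffb80d4, .jiffies = 0xfffb86ed,
    .status = 0x1,
};

/* Descriptors between head and tail; valid only while tail >= head. */
static uint32_t hw_pending(const struct tx_dump *d)
{
    return d->tdt - d->tdh;
}

/* Descriptors the driver has queued but not yet reclaimed (no wrap). */
static uint32_t sw_unreclaimed(const struct tx_dump *d)
{
    return d->next_to_use - d->next_to_clean;
}

/* Unsigned subtraction stays correct even if the counter wrapped. */
static uint32_t age_in_jiffies(const struct tx_dump *d)
{
    return d->jiffies - d->time_stamp;
}

static int descriptor_done(const struct tx_dump *d)
{
    return (d->status & TXD_STAT_DD) != 0;
}
```

With the values from the dump this gives 111 descriptors between head and tail, 171 not yet reclaimed by the driver, and a buffer age of 1561 jiffies. Converting that age to seconds depends on the kernel's HZ setting, which the dump itself does not show.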

2009-03-14 01:01:59

by Jesse Brandeburg

Subject: RE: [E1000-devel] [stable] Linux 2.6.27.19 2.6.28.7

Greg KH wrote:
> On Fri, Mar 13, 2009 at 03:10:51PM -0700, Andrew Morton wrote:
>>
>> I fired this kernel up on my FC8 laptop and I see
>> http://userweb.kernel.org/~akpm/p3130212.jpg
>>
>> On the next two boot attempts, the kernel came up OK.
>>

root issue:
It seems that the newer 2.6 kernels don't get along with some of Fedora's nash tooling. mkinitrd and friends were updated multiple times to work with these newer kernels in the Fedora 10 I was using. I worked around it by changing root=LABEL to root=/dev/foo in grub.conf.

>>
>> ------------[ cut here ]------------
>> WARNING: at drivers/net/e1000e/ich8lan.c:408 e1000_acquire_swflag_ich8lan+0x51/0xf2()
>> e1000e mutex contention. Owned by pid 10
>> Modules linked in:
>> Pid: 9, comm: events/0 Not tainted 2.6.28.7 #1
>> Call Trace:
>> [<ffffffff8103a810>] warn_slowpath+0xae/0xcd
>> [<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
>> [<ffffffff8104394b>] ? lock_timer_base+0x26/0x4a
>> [<ffffffff8105d63f>] ? __lock_acquire+0x702/0x760
>> [<ffffffff8105bfc6>] ? mark_held_locks+0x50/0x6d
>> [<ffffffff812dd950>] ? mutex_trylock+0x104/0x118
>> [<ffffffff8105c170>] ? trace_hardirqs_on_caller+0xf8/0x123
>> [<ffffffff8105c1a8>] ? trace_hardirqs_on+0xd/0xf
>> [<ffffffff811ed838>] e1000_acquire_swflag_ich8lan+0x51/0xf2
>> [<ffffffff811f2fe9>] e1000e_read_kmrn_reg+0x1b/0x69
>> [<ffffffff811f63c5>] ? e1000e_downshift_workaround+0x0/0x12
>> [<ffffffff811ed1e9>] e1000e_gig_downshift_workaround_ich8lan+0x2c/0x71
>> [<ffffffff811f63d5>] e1000e_downshift_workaround+0x10/0x12
>> [<ffffffff8104a6ed>] run_workqueue+0xf5/0x1fd
>> [<ffffffff8104a697>] ? run_workqueue+0x9f/0x1fd
>> [<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
>> [<ffffffff8104a9d1>] worker_thread+0xdb/0xe8
>> [<ffffffff8104de14>] ? autoremove_wake_function+0x0/0x36
>> [<ffffffff8104a8f6>] ? worker_thread+0x0/0xe8
>> [<ffffffff8104db1a>] kthread+0x44/0x6b
>> [<ffffffff8100cf59>] child_rip+0xa/0x11
>> [<ffffffff8100c474>] ? restore_args+0x0/0x30
>> [<ffffffff8104dad6>] ? kthread+0x0/0x6b
>> [<ffffffff8100cf4f>] ? child_rip+0x0/0x11

Newer kernels have this fixed. This really is just a warning: it only tells you that the driver had to wait for the mutex (but that the mutex worked!)

We've isolated all these warnings down to known SMP-safe paths (and fixed the relevant issues) and have posted a patch to current net-next that removes the warning. I don't have the commit handy but could probably chase it down.

So, the WARNING is noisy but okay.
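
A minimal userspace sketch of the pattern the trace suggests (mutex_trylock followed by warn_slowpath inside e1000_acquire_swflag_ich8lan): try the lock, complain if someone else already holds it, then wait for it anyway. This is an analogy written with POSIX threads, not the driver's code; swflag_mutex, acquire_swflag, release_swflag and contention_count are made-up names for illustration.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the driver's software-flag mutex. */
static pthread_mutex_t swflag_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Number of times a caller found the mutex already held. */
static int contention_count;

/*
 * Try the lock first. If another thread holds it, print a warning and
 * then block until it is released. Returns 1 if the caller had to
 * wait, 0 if the lock was free.
 */
static int acquire_swflag(void)
{
    if (pthread_mutex_trylock(&swflag_mutex) == 0)
        return 0;

    contention_count++;
    fprintf(stderr, "mutex contention: waiting for the current owner\n");
    pthread_mutex_lock(&swflag_mutex);
    return 1;
}

static void release_swflag(void)
{
    pthread_mutex_unlock(&swflag_mutex);
}
```

The point of the sketch is that the warning fires on the contended path only, and the caller still ends up holding the lock, which is why the message is noisy rather than a sign of broken locking.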

The Tx hang you see is bad (it appears to be a false hang, since status is set correctly and you don't get a NETDEV_WATCHDOG).

Jesse

PS: please include netdev on network-related issues. :-)

2009-03-14 02:05:32

by Tetsuo Handa

Subject: Re: [E1000-devel] [stable] Linux 2.6.27.19 2.6.28.7

Hello.

Brandeburg, Jesse wrote:
> Greg KH wrote:
> > On Fri, Mar 13, 2009 at 03:10:51PM -0700, Andrew Morton wrote:
> >>
> >> I fired this kernel up on my FC8 laptop and I see
> >> http://userweb.kernel.org/~akpm/p3130212.jpg
> >>
> >> On the next two boot attempts, the kernel came up OK.
> >>
That picture may be caused by /dev/root disappearing automatically.
https://bugzilla.redhat.com/show_bug.cgi?id=488679

> root issue:
> seems that something with the 2.6.newer doesn't like some of the stuff with the fedora nash stuff.
> mkinitrd and friends were updated multiple times to work with these newer kernels in the fedora 10
> I was using. I worked around by changing root=LABEL to use root=/dev/foo in grub.conf

I don't experience this problem with Debian Sarge.
I think it is Fedora's nash problem.