[1.] One line summary of the problem:
Applied patch-2.6.9.bz2 on top of the linux 2.6.8 tree and rebooted;
the laptop then suddenly freezes.
[2.] Full description of the problem/report:
The same .config was used to compile 2.6.9 rc1 and 2.6.9 rc2
(see http://www.gemtek.lt/~zilvinas/oops/ for kern.log and .config).
The laptop booted as usual; I logged in to KDE and started Evolution.
The mouse froze and the keyboard seemed dead, although sysrq-s, sysrq-u
and sysrq-b worked just fine. After the reboot I found many repeated
messages like:
Sep 13 18:51:24 evo800 kernel: Warning: kfree_skb on hard IRQ 000000d8
Sep 13 18:51:24 evo800 kernel: bad: scheduling while atomic!
Sep 13 18:51:24 evo800 kernel: [schedule+1208/1213] schedule+0x4b8/0x4bd
Sep 13 18:51:24 evo800 kernel: [sys_time+22/80] sys_time+0x16/0x50
Sep 13 18:51:24 evo800 kernel: [work_resched+5/22] work_resched+0x5/0x16
or
Sep 13 18:51:24 evo800 kernel: Warning: kfree_skb on hard IRQ 000000d8
Sep 13 18:51:24 evo800 kernel: bad: scheduling while atomic!
Sep 13 18:51:24 evo800 kernel: [schedule+1208/1213] schedule+0x4b8/0x4bd
Sep 13 18:51:24 evo800 kernel: [schedule_timeout+96/179] schedule_timeout+0x60/0xb3
Sep 13 18:51:24 evo800 kernel: [__get_free_pages+31/59] __get_free_pages+0x1f/0x3b
Sep 13 18:51:24 evo800 kernel: [process_timeout+0/5] process_timeout+0x0/0x5
Sep 13 18:51:24 evo800 kernel: [do_select+394/698] do_select+0x18a/0x2ba
Sep 13 18:51:24 evo800 kernel: [__pollwait+0/192] __pollwait+0x0/0xc0
Sep 13 18:51:24 evo800 kernel: [print_context_stack+35/93] print_context_stack+0x23/0x5d
Sep 13 18:51:24 evo800 kernel: [sys_select+670/1176] sys_select+0x29e/0x498
Sep 13 18:51:24 evo800 kernel: [sys_time+22/80] sys_time+0x16/0x50
Sep 13 18:51:24 evo800 kernel: [syscall_call+7/11] syscall_call+0x7/0xb
[3.] Keywords (i.e., modules, networking, kernel):
Modules Loaded nfs esp4 nfsd exportfs lockd sunrpc nsc_ircc
ipt_state iptable_filter iptable_nat crypto_null microcode ehci_hcd
ohci_hcd floppy irtty_sir sir_dev irda crc_ccitt 8250_pnp khazad
twofish sha512 sha256 sha1 serpent md5 md4 des deflate zlib_deflate
zlib_inflate cast6 cast5 blowfish arc4 aes_i586 xfrm_user
ip_conntrack_irc ip_conntrack_ftp ip_conntrack ip_tables ide_cd cdrom
8250 serial_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi
snd_seq_device snd soundcore yenta_socket radeon intel_agp agpgart
[4.] Kernel version (from /proc/version):
see http://www.gemtek.lt/~zilvinas/oops/
[7.1.] Software (add the output of the ver_linux script here)
sh scripts/ver_linux
Linux swoop 2.6.9-rc1 #1 Wed Aug 25 10:52:32 EEST 2004 i686 GNU/Linux
Gnu C 3.3.4
Gnu make 3.80
binutils 2.15
util-linux 2.12
mount 2.12
module-init-tools 3.1-pre5
e2fsprogs 1.35
reiserfsprogs 3.6.18
reiser4progs line
xfsprogs 2.6.20
pcmcia-cs 3.2.5
nfs-utils 1.0.6
Linux C Library 2.3.2
Dynamic linker (ldd) 2.3.2
Procps 3.2.3
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.2.1
[7.2.] Processor information (from /proc/cpuinfo):
see http://www.gemtek.lt/~zilvinas/oops/
Thank you
Zilvinas Valinskas wrote:
> [3.] Keywords (i.e., modules, networking, kernel):
> Modules Loaded nfs esp4 nfsd exportfs lockd sunrpc nsc_ircc
> ipt_state iptable_filter iptable_nat crypto_null microcode ehci_hcd
> ohci_hcd floppy irtty_sir sir_dev irda crc_ccitt 8250_pnp khazad
> twofish sha512 sha256 sha1 serpent md5 md4 des deflate zlib_deflate
> zlib_inflate cast6 cast5 blowfish arc4 aes_i586 xfrm_user
> ip_conntrack_irc ip_conntrack_ftp ip_conntrack ip_tables ide_cd cdrom
> 8250 serial_core snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
> snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi
> snd_seq_device snd soundcore yenta_socket radeon intel_agp agpgart
I'm totally blind, because I don't see your network driver in that big
list of modules.
Your network driver should probably be doing dev_kfree_skb_any()
somewhere, but isn't.
Jeff
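For readers unfamiliar with the suggestion above: the "kfree_skb on hard
IRQ" warning fires when a buffer is freed directly from hard interrupt
context, and dev_kfree_skb_any() exists to pick a safe free path based on
the calling context. The following is only a userspace sketch of that
decision, not kernel code. The names ctx_model, free_path and
model_kfree_skb_any are hypothetical stand-ins invented for illustration;
the condition mirrors my understanding of the in_irq() || irqs_disabled()
check the helper performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's execution context. */
struct ctx_model {
    bool in_hard_irq;    /* models in_irq() */
    bool irqs_disabled;  /* models irqs_disabled() */
};

/* Which free path a driver would end up taking. */
enum free_path {
    FREE_NOW,       /* models dev_kfree_skb(): free immediately */
    FREE_DEFERRED   /* models dev_kfree_skb_irq(): defer the free to softirq */
};

/*
 * Models the choice dev_kfree_skb_any() makes: defer the free whenever
 * the caller is in hard IRQ context or has interrupts disabled.
 */
enum free_path model_kfree_skb_any(const struct ctx_model *c)
{
    assert(c != NULL);
    if (c->in_hard_irq || c->irqs_disabled)
        return FREE_DEFERRED;
    return FREE_NOW;
}
```

A driver that may free buffers from both process and interrupt context
would use the "any" variant so the choice is made at run time.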
On Mon, Sep 13, 2004 at 01:12:13PM -0400, Jeff Garzik wrote:
> I'm totally blind, because I don't see your network driver in that big
> list of modules.
>
> Your network driver should probably be doing dev_kfree_skb_any()
> somewhere, but isn't.
>
> Jeff
>
It is compiled in, see :
CONFIG_E100=y
CONFIG_E100_NAPI=y
Can it be IPsec related ?
On Mon, Sep 13, 2004 at 01:12:13PM -0400, Jeff Garzik wrote:
> I'm totally blind, because I don't see your network driver in that big
> list of modules.
>
> Your network driver should probably be doing dev_kfree_skb_any()
> somewhere, but isn't.
>
> Jeff
It is the e100 network driver, compiled in.
CONFIG_E100=y
CONFIG_E100_NAPI=y
See http://www.gemtek.lt/~zilvinas/oops/ ;
the full boot logs for 2.6.9 rc1 and rc2 are there.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On Mon, 13.09.2004 at 19:16, Zilvinas Valinskas wrote:
> On Mon, Sep 13, 2004 at 01:12:13PM -0400, Jeff Garzik wrote:
> > I'm totally blind, because I don't see your network driver in that big
> > list of modules.
> >
> > Your network driver should probably be doing dev_kfree_skb_any()
> > somewhere, but isn't.
> >
> > Jeff
> >
> It is compiled in, see :
>
> CONFIG_E100=y
> CONFIG_E100_NAPI=y
>
> Can it be IPsec related ?
I have a similar problem here. I am running 2.6.9-rc2 with the ACPI patch.
I have an e1000; IPsec is compiled in, the modules are loaded and racoon is
started, but no tunnels are configured.
The system freezes when I type apt-get update, at the moment apt-get
tries to connect to all the mirrors or resolve them.
I did not see any messages, and sysrq was not compiled in, so I cannot check
whether it still works.
On Wed, Sep 15, 2004 at 10:25:50AM +0200, Erik Tews wrote:
> On Mon, 13.09.2004 at 19:16, Zilvinas Valinskas wrote:
> > On Mon, Sep 13, 2004 at 01:12:13PM -0400, Jeff Garzik wrote:
> > > I'm totally blind, because I don't see your network driver in that big
> > > list of modules.
> > >
> > > Your network driver should probably be doing dev_kfree_skb_any()
> > > somewhere, but isn't.
> > >
> > > Jeff
> > >
> > It is compiled in, see :
> >
> > CONFIG_E100=y
> > CONFIG_E100_NAPI=y
> >
> > Can it be IPsec related ?
>
> I got a similar problem here, I am running 2.6.9-rc2 with acpi patch. I
> got an e1000, ipsec is compiled in, modules loaded, racoon started but
> no tunnels configured.
>
> The system freezes when I type apt-get update, in the moment apt-get
> tries to connect all the mirrors or resolves them.
That was my first impression too. When I rebooted back to 2.6.9-rc1
I went through /var/log/kern.log and found the messages I sent earlier.
>
> I did not see any messages, sysrq was not compiled in, so I cannot check
> if it still works.
In my case, DHCP is enabled and racoon is running. If I set up policies
via this script:
#!/usr/sbin/setkey -f
flush;
spdflush;
spdadd 0.0.0.0 0.0.0.0[500] udp -P out none;
spdadd 0.0.0.0[500] 0.0.0.0 udp -P in none;
spdadd 192.168.3.3 192.168.3.2 any -P out ipsec
esp/transport//require;
spdadd 192.168.3.2 192.168.3.3 any -P in ipsec
esp/transport//require;
My laptop's IP address is 192.168.3.3, and when 192.168.3.2 connects,
my laptop freezes. The last kernel I used was 2.6.9-rc1-bk16
and it was OK; 2.6.9-rc2 freezes the laptop.
Perhaps that is a mixture of PREEMPT=y and ipsec? Dunno ...
On Wed, 15 Sep 2004, Zilvinas Valinskas wrote:
>Perhaps that is mixture of PREEMPT=y and ipsec ? dunno ...
No mixture necessary. PREEMPT is uber-screwed up. Try rebuilding your
kernel/modules with it disabled. (make clean first; the kernel deps don't
track CONFIG_PREEMPT correctly.)
--Ricky
On Wed, 2004-09-15 at 10:55, Ricky Beam wrote:
> On Wed, 15 Sep 2004, Zilvinas Valinskas wrote:
> >Perhaps that is mixture of PREEMPT=y and ipsec ? dunno ...
>
> No mixture necessary. PREEMPT is uber-screwed up. Try rebuilding your
> kernel/modules with it disabled. (make clean first; the kernel deps don't
> track CONFIG_PREEMPT correctly.)
Um, PREEMPT works just fine. Anything that breaks on PREEMPT will also
break on SMP. And the kernel deps do track CONFIG_PREEMPT correctly.
Maybe you are doing it wrong.
Lee
Lee Revell wrote:
> On Wed, 2004-09-15 at 10:55, Ricky Beam wrote:
>
>>On Wed, 15 Sep 2004, Zilvinas Valinskas wrote:
>>
>>>Perhaps that is mixture of PREEMPT=y and ipsec ? dunno ...
>>
>>No mixture necessary. PREEMPT is uber-screwed up. Try rebuilding your
>>kernel/modules with it disabled. (make clean first; the kernel deps don't
>>track CONFIG_PREEMPT correctly.)
>
>
> Um, PREEMPT works just fine. Anything that breaks on PREEMPT will also
> break on SMP. And the kernel deps do track CONFIG_PREEMPT correctly.
PREEMPT is a hack. I do not recommend using it on production servers.
Jeff
Lee Revell wrote:
> Anyway, if you are running anything on your server that breaks under
> PREEMPT, it will break anyway as soon as you add another processor.
Incorrect. The spinlock behavior is very different.
That's why we had net stack problems in the past under preempt but not
under SMP.
Jeff
On Wed, 2004-09-15 at 11:58, Jeff Garzik wrote:
> Lee Revell wrote:
> > On Wed, 2004-09-15 at 10:55, Ricky Beam wrote:
> >
> >>On Wed, 15 Sep 2004, Zilvinas Valinskas wrote:
> >>
> >>>Perhaps that is mixture of PREEMPT=y and ipsec ? dunno ...
> >>
> >>No mixture necessary. PREEMPT is uber-screwed up. Try rebuilding your
> >>kernel/modules with it disabled. (make clean first; the kernel deps don't
> >>track CONFIG_PREEMPT correctly.)
> >
> >
> > Um, PREEMPT works just fine. Anything that breaks on PREEMPT will also
> > break on SMP. And the kernel deps do track CONFIG_PREEMPT correctly.
>
>
> PREEMPT is a hack. I do not recommend using it on production servers.
>
Not every Linux machine is a server. Just because you can't bang a
square peg through a round hole does not mean the peg is defective.
Anyway, if you are running anything on your server that breaks under
PREEMPT, it will break anyway as soon as you add another processor.
Lee
On Wed, Sep 15, 2004 at 12:06:48PM -0400, Lee Revell wrote:
> Anyway, if you are running anything on your server that breaks under
> PREEMPT, it will break anyway as soon as you add another processor.
Wrong. Code can be SMP safe but not preempt safe.
This is why we have get_cpu()/put_cpu(), and
preempt_disable()/preempt_enable() pairs around certain parts of code.
Anything using per-CPU data like MSRs for example needs explicit
protection against preemption.
Dave
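The get_cpu()/put_cpu() point above can be illustrated with a small
userspace model. This is a sketch under stated assumptions, not kernel
code: cpu_model and every model_* function are hypothetical names, and
task migration is reduced to changing one integer. What it tries to show
is that code holding a CPU number must keep preemption disabled until it
is done with that CPU's data, otherwise the task could be moved to another
CPU in between.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one task's view of preemption state. */
struct cpu_model {
    int preempt_count;  /* > 0 means preemption is disabled */
    int current_cpu;    /* CPU the task is currently running on */
};

void model_preempt_disable(struct cpu_model *m)
{
    m->preempt_count++;
}

void model_preempt_enable(struct cpu_model *m)
{
    assert(m->preempt_count > 0);  /* enable must pair with a disable */
    m->preempt_count--;
}

/* Models get_cpu(): disable preemption, then report the current CPU. */
int model_get_cpu(struct cpu_model *m)
{
    model_preempt_disable(m);
    return m->current_cpu;
}

/* Models put_cpu(): re-enable preemption. */
void model_put_cpu(struct cpu_model *m)
{
    model_preempt_enable(m);
}

/*
 * Models the scheduler trying to move the task to another CPU.
 * It only succeeds while the task is preemptible.
 */
bool model_try_migrate(struct cpu_model *m, int new_cpu)
{
    if (m->preempt_count != 0)
        return false;
    m->current_cpu = new_cpu;
    return true;
}
```

Between model_get_cpu() and model_put_cpu() a migration attempt is
refused, so per-CPU state read in that window still belongs to the CPU
the task is running on. A spinlock on a non-preempt SMP kernel gives no
such guarantee for data that is not protected by the lock.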
On Wed, 15 Sep 2004, Jeff Garzik wrote:
>Lee Revell wrote:
>> Anyway, if you are running anything on your server that breaks under
>> PREEMPT, it will break anyway as soon as you add another processor.
>
>Incorrect. The spinlock behavior is very different.
Indeed. Enable PREEMPT (my default for some time now) and the machine
will lock up after spewing pages of "scheduling while atomic" messages.
Disable PREEMPT and the machine is stable again:
[jfbeam:pts/2{2}]gir:~/[12:55pm]:uname -a
Linux gir [email protected] #71 SMP BK[20040914173940] Tue Sep 14 16:14:33 EDT 2004 i686 athlon i386 GNU/Linux
[jfbeam:pts/2{2}]gir:~/[12:55pm]:uptime
12:55pm up 19:54, 2 users, load average: 0.01, 0.02, 0.00
[jfbeam:pts/2{2}]gir:~/[12:55pm]:grep ^proc /proc/cpuinfo
processor : 0
processor : 1
--Ricky
On Wed, 2004-09-15 at 12:58, Ricky Beam wrote:
> On Wed, 15 Sep 2004, Jeff Garzik wrote:
> >Lee Revell wrote:
> >> Anyway, if you are running anything on your server that breaks under
> >> PREEMPT, it will break anyway as soon as you add another processor.
> >
> >Incorrect. The spinlock behavior is very different.
>
> Indeed. Enable PREEMPT (my default for some time now) and the machine
> will lockup after spewing pages of scheduling while atomic's. Disable
> PREEMPT and the machine is stable again:
>
Interesting. Still, this looks like a specific bug that needs fixing;
it doesn't imply that preemption is a hack. For many workloads
preemption is a necessity.
Lee
Lee Revell wrote:
> Interesting. Still, this looks like a specific bug that needs fixing,
> it doesn't imply that preemption is a hack. For many workloads
> preemption is a necessity.
For any workload that you feel preemption is a necessity, that indicates
a latency problem in the kernel that should be solved.
Preemption is a hack that hides broken drivers, IMHO.
I would rather directly address any latency problems that appear.
Jeff
On Wed, 2004-09-15 at 13:52, Jeff Garzik wrote:
> Lee Revell wrote:
> > Interesting. Still, this looks like a specific bug that needs fixing,
> > it doesn't imply that preemption is a hack. For many workloads
> > preemption is a necessity.
>
>
> For any workload that you feel preemption is a necessity, that indicates
> a latency problem in the kernel that should be solved.
>
> Preemption is a hack that hides broken drivers, IMHO.
>
> I would rather directly address any latency problems that appear.
>
Please explain. I was under the impression that there was a 1:1
correspondence between latency problems and long non-preemptible code
paths. The latency problem is solved by making the code path
preemptible.
How else are you going to schedule in the high priority process quickly
if you don't preempt something?
Lee
Jeff Garzik wrote:
> Lee Revell wrote:
>
>> Interesting. Still, this looks like a specific bug that needs fixing,
>> it doesn't imply that preemption is a hack. For many workloads
>> preemption is a necessity.
>
>
>
> For any workload that you feel preemption is a necessity, that
> indicates a latency problem in the kernel that should be solved.
>
> Preemption is a hack that hides broken drivers, IMHO.
>
> I would rather directly address any latency problems that appear.
Current preempt is broken, sure. But having robust preempt
would allow code simplification. Long loops outside critical
sections would be ok - no time or code spent testing for a need for
rescheduling because you'll be preempted when necessary anyway.
Or am I missing something? Other than that current preempt isn't up to
this and might be hard to get there?
Helge Hafting
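The "code spent testing for a need for rescheduling" mentioned above
refers to long loops that voluntarily check whether they should yield,
which in the kernel is what cond_resched() is for. Below is a userspace
sketch of that pattern only. The sched_model struct and the model_*
functions are hypothetical, and the timer tick is faked by setting a flag
every resched_every iterations.

```c
#include <assert.h>

/* Hypothetical model of the scheduler state seen by one long loop. */
struct sched_model {
    int need_resched;  /* set when a higher priority task is waiting */
    int yields;        /* how many times the loop gave up the CPU */
};

/* Models cond_resched(): yield only if someone asked for the CPU. */
void model_cond_resched(struct sched_model *s)
{
    if (s->need_resched) {
        s->need_resched = 0;
        s->yields++;
    }
}

/*
 * A long loop without preemption: it must poll for rescheduling itself.
 * Every resched_every iterations a pretend timer tick requests a switch.
 */
long model_long_loop(struct sched_model *s, int n, int resched_every)
{
    long sum = 0;

    assert(s != 0);
    for (int i = 0; i < n; i++) {
        sum += i;  /* stand-in for the real work */
        if (resched_every > 0 && (i + 1) % resched_every == 0)
            s->need_resched = 1;
        model_cond_resched(s);  /* the explicit check preempt would remove */
    }
    return sum;
}
```

With a robust preemptible kernel the model_cond_resched() call could in
principle be dropped from loops outside critical sections, which is the
simplification being argued for.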
On Thu, Sep 16, 2004 at 10:39:10AM +0200, Helge Hafting wrote:
> Jeff Garzik wrote:
>
> >Lee Revell wrote:
> >
> >>Interesting. Still, this looks like a specific bug that needs fixing,
> >>it doesn't imply that preemption is a hack. For many workloads
> >>preemption is a necessity.
> >
> >
> >
> >For any workload that you feel preemption is a necessity, that
> >indicates a latency problem in the kernel that should be solved.
> >
> >Preemption is a hack that hides broken drivers, IMHO.
> >
> >I would rather directly address any latency problems that appear.
>
> Current preempt is broken, sure. But having robust preempt
> would allow code simplification. Long loops outside critical
> sections would be ok - no time or code spent testing for a need for
> rescheduling because you'll be preempted when necessary anyway.
Could be the case. This morning I turned off PREEMPT support in the
linux 2.6.9-rc2 kernel. It booted just fine, I ran apt-get update, and
everything seemed OK.
Then I set up IPsec policies and pinged the remote end; racoon tried to
negotiate with the remote end and ... the laptop froze again (this time
without PREEMPT).
At the time I was in X and couldn't capture the oops, and after the reboot
/var/log/kern.log is empty ... :(
So it doesn't seem to be PREEMPT related, I think now.
On Fri, Sep 17, 2004 at 11:05:04AM +0300, Zilvinas Valinskas wrote:
> On Thu, Sep 16, 2004 at 10:39:10AM +0200, Helge Hafting wrote:
> > Jeff Garzik wrote:
> >
> > >Lee Revell wrote:
> > >
> > >>Interesting. Still, this looks like a specific bug that needs fixing,
> > >>it doesn't imply that preemption is a hack. For many workloads
> > >>preemption is a necessity.
> > >
> > >
> > >
> > >For any workload that you feel preemption is a necessity, that
> > >indicates a latency problem in the kernel that should be solved.
> > >
> > >Preemption is a hack that hides broken drivers, IMHO.
> > >
> > >I would rather directly address any latency problems that appear.
> >
> > Current preempt is broken, sure. But having robust preempt
> > would allow code simplification. Long loops outside critical
> > sections would be ok - no time or code spent testing for a need for
> > rescheduling because you'll be preempted when necessary anyway.
>
> Could be the case. This morning I've turned off PREEMPT support in
> linux 2.6.9-rc2 kernel, booted just fine, ran apt-get update ... it
> seemed everything is ok.
>
> Then setup IPsec policies, ping remote end, racoon has tried to negotiate
> with a remote end and ... laptop freezes again (this time without
> PREEMPT).
>
> At a time I was in X, couldn't capture the OOPS, after reboot
> /var/log/kern.log is empty ... :(
Here is the backtrace (with PREEMPT turned off):
bad: scheduling while atomic!
[<c030cd3e>] schedule+0x446/0x44b
[<c010595b>] do_IRQ+0xdd/0x14b
[<c0103d36>] work_resched+0x5/0x16
This backtrace is repeated 4 times.
bad: scheduling while atomic!
[<c030cd3e>] schedule+0x446/0x44b
[<c0112f82>] sys_sched_yield+0x45/0x57
[<c014ceaa>] coredump_wait+0x32/0x97
[<c014cfd7>] do_coredump+0xc8/0x189
[<c0256b44>] complement_pos+0x1e/0x16e
[<c011cb13>] __dequeue_signal+0xc2/0x154
[<c011cbc8>] dequeue_signal+0x23/0x75
[<c011e12e>] get_signal_to_deliver+0x1d4/0x2c0
[<c0103b04>] do_signal+0x8e/0x10d
[<c010595b>] do_IRQ+0xdd/0x14b
[<c0103e7c>] common_interrupt+0x18/0x20
[<c030cb76>] schedule+0x27e/0x44b
[<c010595b>] do_IRQ+0xdd/0x14b
[<c0110f98>] do_page_fault+0x0/0x544
[<c0103bb8>] do_notify_resume+0x35/0x39
[<c0103d5a>] work_notifysig+0x13/0x15
Kernel panic - not syncing: Aiee, killing interrupt handler!
Any ideas ?
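Each backtrace entry above has the form "[<address>] symbol+0xoffset/0xsize",
where offset is the distance of the return address from the start of the
function and size is the length of the function. A small userspace parser
can make the entries easier to compare. This is a sketch only: trace_entry,
parse_trace_entry and bytes_to_end are hypothetical names, and the format
string assumes the exact layout shown in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One decoded backtrace line (hypothetical helper type). */
struct trace_entry {
    unsigned long addr;   /* address printed inside [<...>] */
    char symbol[64];      /* function name */
    unsigned int offset;  /* bytes from the start of the function */
    unsigned int size;    /* total size of the function in bytes */
};

/* Parses "[<c0103d36>] work_resched+0x5/0x16". Returns 1 on success. */
int parse_trace_entry(const char *line, struct trace_entry *out)
{
    int n;

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    /* %x accepts the optional 0x prefix; %63[^+] stops at the '+' sign. */
    n = sscanf(line, " [<%lx>] %63[^+]+%x/%x",
               &out->addr, out->symbol, &out->offset, &out->size);
    return n == 4;
}

/* How far the address is from the end of the function. */
unsigned int bytes_to_end(const struct trace_entry *e)
{
    return e->size - e->offset;
}
```

Lines that do not match the layout, such as the "bad: scheduling while
atomic!" header itself, make parse_trace_entry() return 0.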