2004-04-02 10:21:54

by Marco Fais

Subject: kernel BUG at page_alloc.c:98 -- compiling with distcc

Hi!


[1.] Kernel panic while using distcc

[2.] We have 5-6 Linux development systems that run without problems
under a normal development workload. When we use distcc to speed up
compilation, we get a fully reproducible kernel panic within seconds of
starting the build. The kernel panic happens *only* on systems that are
"remotely controlled" (the distcc daemon receives source files from
remote systems, compiles them and sends back the compiled objects).
When compiling with distcc the local system doesn't
show any kernel panic, while the same system used as a "remote compiler
system" dies very quickly.

[3.] Keywords: distcc BUG page_alloc.c

[4.] Linux version 2.4.25 (root@test1) (gcc version 3.2 20020903 (Red
Hat Linux 8.0 3.2-7)) #1 mer mar 31 10:28:36 CEST 2004

[5.]
ksymoops 2.4.5 on i686 2.4.25. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.25/ (default)
-m /boot/System.map-2.4.25 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

kernel BUG at page_alloc.c:98!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c01372ae>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000002 ebx: c14b3f00 ecx: c14b3f00 edx: 00000000
esi: 00000000 edi: dec11340 ebp: c02f1d04 esp: c02f1cd4
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c02f1000)
Stack: ddd46000 c02f1cfc c0135a76 c158f6f0 de9bcdf4 ddd45800 de9bcdf4
005207dc
ddd45800 00000001 dec4d894 dec11340 c02f1d18 c021667b 00000282
dec4d894
dec4d8c4 c02f1d2c c02166b4 dec4d894 dec4d894 dec4d894 c02f1d44
c0216816
Call Trace: [<c0135a76>] [<c021667b>] [<c02166b4>] [<c0216816>]
[<c023be39>]
[<c023c385>] [<c023f51c>] [<c02465a9>] [<c0246a76>] [<c022dad0>]
[<c022dc25>]
[<c0222780>] [<c022dad0>] [<c022d88f>] [<c022dad0>] [<c022de3a>]
[<e08d7eab>]
[<c021ad14>] [<c021ae3f>] [<c021af5a>] [<c0121cd7>] [<c010a66d>]
[<c01070a0>]
[<c010cb58>] [<c01070a0>] [<c01070c6>] [<c0107142>] [<c0105000>]
Code: 0f 0b 62 00 f7 60 27 c0 e9 ad fd ff ff 90 8d 74 26 00 55 89


>>EIP; c01372ae <__free_pages_ok+26e/280> <=====

>>ebx; c14b3f00 <_end+116e728/204d48a8>
>>ecx; c14b3f00 <_end+116e728/204d48a8>
>>edi; dec11340 <_end+1e8cbb68/204d48a8>
>>ebp; c02f1d04 <init_task_union+1d04/2000>
>>esp; c02f1cd4 <init_task_union+1cd4/2000>

Trace; c0135a76 <kmem_cache_free_one+f6/210>
Trace; c021667b <skb_release_data+6b/90>
Trace; c02166b4 <kfree_skbmem+14/70>
Trace; c0216816 <__kfree_skb+106/160>
Trace; c023be39 <tcp_clean_rtx_queue+139/330>
Trace; c023c385 <tcp_ack+c5/380>
Trace; c023f51c <tcp_rcv_state_process+19c/a90>
Trace; c02465a9 <tcp_v4_do_rcv+a9/130>
Trace; c0246a76 <tcp_v4_rcv+446/560>
Trace; c022dad0 <ip_local_deliver_finish+0/180>
Trace; c022dc25 <ip_local_deliver_finish+155/180>
Trace; c0222780 <nf_hook_slow+b0/170>
Trace; c022dad0 <ip_local_deliver_finish+0/180>
Trace; c022d88f <ip_local_deliver+4f/70>
Trace; c022dad0 <ip_local_deliver_finish+0/180>
Trace; c022de3a <ip_rcv_finish+1ea/270>
Trace; e08d7eab <[8139too]rtl8139_rx_interrupt+6b/3b0>
Trace; c021ad14 <netif_receive_skb+c4/180>
Trace; c021ae3f <process_backlog+6f/120>
Trace; c021af5a <net_rx_action+6a/100>
Trace; c0121cd7 <do_softirq+97/a0>
Trace; c010a66d <do_IRQ+bd/f0>
Trace; c01070a0 <default_idle+0/30>
Trace; c010cb58 <call_do_IRQ+5/d>
Trace; c01070a0 <default_idle+0/30>
Trace; c01070c6 <default_idle+26/30>
Trace; c0107142 <cpu_idle+42/60>
Trace; c0105000 <_stext+0/0>

Code; c01372ae <__free_pages_ok+26e/280>
00000000 <_EIP>:
Code; c01372ae <__free_pages_ok+26e/280> <=====
0: 0f 0b ud2a <=====
Code; c01372b0 <__free_pages_ok+270/280>
2: 62 00 bound %eax,(%eax)
Code; c01372b2 <__free_pages_ok+272/280>
4: f7 60 27 mull 0x27(%eax)
Code; c01372b5 <__free_pages_ok+275/280>
7: c0 e9 ad shr $0xad,%cl
Code; c01372b8 <__free_pages_ok+278/280>
a: fd std
Code; c01372b9 <__free_pages_ok+279/280>
b: ff (bad)
Code; c01372ba <__free_pages_ok+27a/280>
c: ff 90 8d 74 26 00 call *0x26748d(%eax)
Code; c01372c0 <rmqueue+0/230>
12: 55 push %ebp
Code; c01372c1 <rmqueue+1/230>
13: 89 00 mov %eax,(%eax)

<0>Kernel panic: Aiee, killing interrupt handler!

1 warning issued. Results may not be reliable.
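
A note on reading the Code: line above. Assuming the 2.4 i386 BUG() macro, built with verbose BUG reporting, emits ud2 followed by a 16-bit line number and a 32-bit pointer to the file-name string, the bytes after 0f 0b are data rather than instructions. That would explain why ksymoops disassembles them as odd-looking bound/mull opcodes. A minimal sketch of decoding them follows; decode_bug and struct bug_info are hypothetical names used for illustration only and are not part of ksymoops or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of the data assumed to follow a 2.4-era i386 BUG() trap. */
struct bug_info {
    uint16_t line;      /* source line number (16-bit word after ud2) */
    uint32_t file_ptr;  /* kernel address of the file-name string */
};

/*
 * Decode "0f 0b <line: 16-bit LE> <file: 32-bit LE>" from an oops Code: dump.
 * Returns 0 on success, -1 if the buffer is too short or does not start
 * with the ud2 opcode (0f 0b).
 */
int decode_bug(const uint8_t *code, size_t len, struct bug_info *out)
{
    assert(code != NULL && out != NULL);
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;
    out->line = (uint16_t)(code[2] | (code[3] << 8));
    out->file_ptr = (uint32_t)code[4] |
                    ((uint32_t)code[5] << 8) |
                    ((uint32_t)code[6] << 16) |
                    ((uint32_t)code[7] << 24);
    return 0;
}
```

For the dump above (0f 0b 62 00 f7 60 27 c0) this gives line 0x0062 = 98, which agrees with "page_alloc.c:98", and a string address of 0xc02760f7.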


[6.] Launch distccd --daemon on the affected system. Then, on a second
host, set DISTCC_HOSTS="<problematic remote system>" and start, for
example, a kernel compile: make -j2 CC=distcc bzImage.

[7.] All systems are Athlon XP 2600+ on a VIA KT400 chipset (various
motherboard vendors). All use ext3 filesystems, with various Red Hat
distributions (8.0, 9, RHEL3 -- not using NPTL)

[7.1.]
Gnu C 3.2
Gnu make 3.79.1
util-linux 2.11r
mount 2.11r
modutils 2.4.18
e2fsprogs 1.27
jfsutils 1.0.17
reiserfsprogs 3.6.2
pcmcia-cs 3.1.31
quota-tools 3.06.
PPP 2.4.1
isdn4k-utils 3.1pre4
Linux C Library 2.3.2
Dynamic linker (ldd) 2.3.2
Procps 2.0.7
Net-tools 1.60
Kbd 1.06
Sh-utils 2.0.12

[7.2.] Processor information (from /proc/cpuinfo):

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 8
model name : AMD Athlon(tm) XP 2600+
stepping : 1
cpu MHz : 2075.355
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips : 4141.87

[7.3.] Module information (from /proc/modules):

Module Size Used by Not tainted
nfs 78968 1 (autoclean)
binfmt_misc 7304 1
nfsd 80304 8 (autoclean)
lockd 58480 1 (autoclean) [nfs nfsd]
sunrpc 84188 1 (autoclean) [nfs nfsd lockd]
8139too 19784 2
mii 3944 0 [8139too]
crc32 3680 0 [8139too]
iptable_filter 2412 0 (autoclean) (unused)
ip_tables 15392 1 [iptable_filter]
ohci1394 33608 0 (unused)
ieee1394 64676 0 [ohci1394]
mousedev 5428 0 (unused)
keybdev 3072 0 (unused)
input 5824 0 [mousedev keybdev]
hid 12248 0 (unused)
rtc 8764 0 (autoclean)

[7.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)

0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial(auto)
0cf8-0cff : PCI conf1
c000-c0ff : Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
c000-c0ff : 8139too
c400-c4ff : Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (#2)
c400-c4ff : 8139too
d400-d47f : VIA Technologies, Inc. IEEE 1394 Host Controller
d800-d8ff : C-Media Electronics Inc CM8738
dc00-dc1f : VIA Technologies, Inc. USB
e000-e01f : VIA Technologies, Inc. USB (#2)
e400-e41f : VIA Technologies, Inc. USB (#3)
e800-e80f : VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT823x/A/C
PIPC Bus Master IDE
e800-e807 : ide0
e808-e80f : ide1

00000000-0009ffff : System RAM
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-1ffeffff : System RAM
00100000-002667cb : Kernel code
002667cc-002ef563 : Kernel data
1fff0000-1fff2fff : ACPI Non-volatile Storage
1fff3000-1fffffff : ACPI Tables
d0000000-dfffffff : PCI Bus #01
d0000000-d7ffffff : nVidia Corporation NV17 [GeForce4 MX 440]
d8000000-d807ffff : nVidia Corporation NV17 [GeForce4 MX 440]
e0000000-e3ffffff : VIA Technologies, Inc. VT8377 [KT400 AGP] Host Bridge
e4000000-e5ffffff : PCI Bus #01
e4000000-e4ffffff : nVidia Corporation NV17 [GeForce4 MX 440]
e6020000-e60200ff : Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (#2)
e6020000-e60200ff : 8139too
e6022000-e60220ff : Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
e6022000-e60220ff : 8139too
e6023000-e60237ff : VIA Technologies, Inc. IEEE 1394 Host Controller
e6023000-e60237ff : ohci1394
e6024000-e60240ff : VIA Technologies, Inc. USB 2.0
fec00000-fec00fff : reserved
fee00000-fee00fff : reserved
ffff0000-ffffffff : reserved

[7.5.] PCI information

00:00.0 Host bridge: VIA Technologies, Inc.: Unknown device 3189
Subsystem: VIA Technologies, Inc.: Unknown device 3189
Flags: bus master, 66Mhz, medium devsel, latency 8
Memory at e0000000 (32-bit, prefetchable) [size=64M]
Capabilities: [a0] AGP version 2.0
Capabilities: [c0] Power Management version 2

00:01.0 PCI bridge: VIA Technologies, Inc.: Unknown device b168 (prog-if
00 [Normal decode])
Flags: bus master, 66Mhz, medium devsel, latency 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
Memory behind bridge: e4000000-e5ffffff
Prefetchable memory behind bridge: d0000000-dfffffff
Capabilities: [80] Power Management version 2

00:09.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Flags: bus master, medium devsel, latency 32, IRQ 17
I/O ports at c000 [size=256]
Memory at e6022000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL-8139/8139C/8139C+ (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Flags: bus master, medium devsel, latency 32, IRQ 19
I/O ports at c400 [size=256]
Memory at e6020000 (32-bit, non-prefetchable) [size=256]
Capabilities: [50] Power Management version 2

00:0e.0 FireWire (IEEE 1394): VIA Technologies, Inc. IEEE 1394 Host
Controller (rev 46) (prog-if 10 [OHCI])
Subsystem: Biostar Microtech Int'l Corp: Unknown device 4200
Flags: bus master, medium devsel, latency 32, IRQ 18
Memory at e6023000 (32-bit, non-prefetchable) [size=2K]
I/O ports at d400 [size=128]
Capabilities: [50] Power Management version 2

00:0f.0 Multimedia audio controller: C-Media Electronics Inc CM8738 (rev 10)
Subsystem: Biostar Microtech Int'l Corp: Unknown device 8738
Flags: bus master, medium devsel, latency 32, IRQ 19
I/O ports at d800 [size=256]
Capabilities: [c0] Power Management version 2

00:10.0 USB Controller: VIA Technologies, Inc. USB (rev 80) (prog-if 00
[UHCI])
Subsystem: VIA Technologies, Inc. USB
Flags: bus master, medium devsel, latency 32, IRQ 21
I/O ports at dc00 [size=32]
Capabilities: [80] Power Management version 2

00:10.1 USB Controller: VIA Technologies, Inc. USB (rev 80) (prog-if 00
[UHCI])
Subsystem: VIA Technologies, Inc. USB
Flags: bus master, medium devsel, latency 32, IRQ 21
I/O ports at e000 [size=32]
Capabilities: [80] Power Management version 2

00:10.2 USB Controller: VIA Technologies, Inc. USB (rev 80) (prog-if 00
[UHCI])
Subsystem: VIA Technologies, Inc. USB
Flags: bus master, medium devsel, latency 32, IRQ 21
I/O ports at e400 [size=32]
Capabilities: [80] Power Management version 2

00:10.3 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 82) (prog-if
20 [EHCI])
Subsystem: VIA Technologies, Inc. USB 2.0
Flags: bus master, medium devsel, latency 32, IRQ 19
Memory at e6024000 (32-bit, non-prefetchable) [size=256]
Capabilities: [80] Power Management version 2

00:11.0 ISA bridge: VIA Technologies, Inc. VT8233A ISA Bridge
Subsystem: VIA Technologies, Inc. VT8233A ISA Bridge
Flags: bus master, stepping, medium devsel, latency 0
Capabilities: [c0] Power Management version 2

00:11.1 IDE interface: VIA Technologies, Inc. VT82C586B PIPC Bus Master
IDE (rev 06) (prog-if 8a [Master SecP PriP])
Subsystem: VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at e800 [size=16]
Capabilities: [c0] Power Management version 2

01:00.0 VGA compatible controller: nVidia Corporation NV17 [GeForce4
MX440] (rev a3) (prog-if 00 [VGA])
Flags: bus master, 66Mhz, medium devsel, latency 32, IRQ 16
Memory at e4000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (32-bit, prefetchable) [size=128M]
Memory at d8000000 (32-bit, prefetchable) [size=512K]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [60] Power Management version 2
Capabilities: [44] AGP version 2.0

CPU0
0: 292363 IO-APIC-edge timer
1: 3 IO-APIC-edge keyboard
2: 0 XT-PIC cascade
8: 1 IO-APIC-edge rtc
12: 0 XT-PIC PS/2 Mouse
14: 8958 IO-APIC-edge ide0
15: 4 IO-APIC-edge ide1
17: 6482 IO-APIC-level eth0
18: 2 IO-APIC-level ohci1394
19: 28 IO-APIC-level eth1
NMI: 0
LOC: 292280
ERR: 0
MIS: 0

[7.7.] Other information that might be relevant to the problem

Other systems (DL-360G3 dual Xeon 2.8 GHz, RHEL3, SMP or UP kernel)
don't show the problem.





2004-04-02 13:16:16

by Marco Roeland

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Friday April 2nd 2004 Marco Fais wrote:

> [...]

> When compiling with distcc the local system doesn't show any kernel
> panic, while the same system used as a "remote compiler system" dies
> very quickly.

> >>EIP; c01372ae <__free_pages_ok+26e/280> <=====
> ...
> Trace; e08d7eab <[8139too]rtl8139_rx_interrupt+6b/3b0>

> <0>Kernel panic: Aiee, killing interrupt handler!

From a very superficial examination of your data, it looks like there is
something going wrong in the interrupt handling of the driver for (one
of) the network cards.

Distcc can generate a lot of network traffic. You might experiment with
switching the role of the two network cards (in case there might be
something wrong with the hardware of one of them) or use the '--listen'
directive in the distccd configuration to do so.

If the panic is indeed caused by the network driver, then it should also
be possible to trigger and debug this with a tool like netcat (listen on
the panicking box with 'nc -l someport' and send some stuff from another
box with 'cat /dev/zero | nc panicker someport', or vice versa).

Sadly, none of this will solve your problem of course, but it might
pinpoint the cause somewhat more accurately, hopefully leading to a
solution!
--
Marco Roeland

2004-04-02 15:06:00

by Marco Roeland

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Friday April 2nd 2004 Marco Fais wrote:

> Mmmh, all the servers use an RTL-8139 compatible card, with the same
> 8139too driver. So this can be the problem.

Hey, I'm by no means an expert. Suggesting the driver is to blame was
mostly based on the fact that compiling locally worked, while compiling
from a remote machine triggered a panic. The rest of your description
below indicates that it probably *isn't* the driver.

> But in this moment I'm doing a kernel compile while receiving and sending
> huge amounts of data using netcat, as you suggested... and works perfectly.

> Ok, next I will test the second network card on the server, just to avoid
> the possibility of an hardware failure -- but I have other 4 servers that
> show the same behaviour, so I don't think it's caused by faulty hardware.

If 4 other servers show the same behaviour, and netcatting a lot of data
doesn't panic the machine, that strongly suggests that the network card
and driver are innocent! I thought only one machine had the problem.

> Running this test for about an hour, using all the available bandwidth on
> the NIC, while compiling the kernel in a loop... no problem. Using distcc,
> compiling the same files, cause a kernel panic in a few seconds.
> So this test doesn't show the problem, but I think that anyway the network
> card driver (or the hardware) is involved.

Why do you think so, it seems there's nothing wrong with it; you've just
tested that?

One last suggestion:

Have you tried a local distcc compile, but specifying the host by its IP
address or its real name? Distcc treats 'localhost' differently,
but if it sees an IP address it will use the network route. As specified
in the man page this is slower, but if there's something peculiar in
the interaction of distcc with the network layer, then perhaps this
triggers it. You can also use the '--verbose' option on distccd; perhaps
it reports something useful before panicking.
--
Marco Roeland

2004-04-02 23:34:24

by Andrew Morton

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc


(linux-2.4.25)

Marco Fais wrote:
>
> kernel BUG at page_alloc.c:98!
>

uh-oh.

>
> >>EIP; c01372ae <__free_pages_ok+26e/280> <=====
>
> >>ebx; c14b3f00 <_end+116e728/204d48a8>
> >>ecx; c14b3f00 <_end+116e728/204d48a8>
> >>edi; dec11340 <_end+1e8cbb68/204d48a8>
> >>ebp; c02f1d04 <init_task_union+1d04/2000>
> >>esp; c02f1cd4 <init_task_union+1cd4/2000>
>
> Trace; c0135a76 <kmem_cache_free_one+f6/210>
> Trace; c021667b <skb_release_data+6b/90>
> Trace; c02166b4 <kfree_skbmem+14/70>
> Trace; c0216816 <__kfree_skb+106/160>
> Trace; c023be39 <tcp_clean_rtx_queue+139/330>
> Trace; c023c385 <tcp_ack+c5/380>
> Trace; c023f51c <tcp_rcv_state_process+19c/a90>
> Trace; c02465a9 <tcp_v4_do_rcv+a9/130>
> Trace; c0246a76 <tcp_v4_rcv+446/560>
> Trace; c022dad0 <ip_local_deliver_finish+0/180>
> Trace; c022dc25 <ip_local_deliver_finish+155/180>
> Trace; c0222780 <nf_hook_slow+b0/170>
> Trace; c022dad0 <ip_local_deliver_finish+0/180>
> Trace; c022d88f <ip_local_deliver+4f/70>
> Trace; c022dad0 <ip_local_deliver_finish+0/180>
> Trace; c022de3a <ip_rcv_finish+1ea/270>
> Trace; e08d7eab <[8139too]rtl8139_rx_interrupt+6b/3b0>
> Trace; c021ad14 <netif_receive_skb+c4/180>
> Trace; c021ae3f <process_backlog+6f/120>
> Trace; c021af5a <net_rx_action+6a/100>
> Trace; c0121cd7 <do_softirq+97/a0>
> Trace; c010a66d <do_IRQ+bd/f0>

distcc uses sendfile(). The 8139too hardware and driver are
zerocopy-capable so the kernel uses zerocopy direct-from-user-pages for
sendfile().

The bug is that the networking layer is releasing the final ref to user
pages from softirq context. Those pages are still on the page LRU so
__free_pages_ok() will take them off.

Problem is, removing these pages from the LRU requires that the
pagemap_lru_lock be taken, and that lock may not be taken from interrupt
context. So we go BUG instead.

This was all discussed fairly extensively a couple of years back and I
thought it ended up being fixed.
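
The failure mode described above can be modelled in a few lines of user-space C. This is only a toy sketch of the reasoning, not the kernel's code: struct toy_page, toy_put_page and the result names are hypothetical, and the real check lives in __free_pages_ok() as the oops shows. It assumes, as the message says, that the final reference to a page still on the LRU must not be dropped from interrupt or softirq context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the state that matters here. */
struct toy_page {
    int  refcount;  /* outstanding references */
    bool on_lru;    /* still linked on the (modelled) page LRU */
};

enum put_result {
    PUT_STILL_REFERENCED,  /* other users remain, nothing freed */
    PUT_FREED,             /* last reference dropped, page freed */
    PUT_WOULD_BUG          /* last ref to an LRU page dropped in IRQ context */
};

/*
 * Drop one reference. in_irq models "running in interrupt or softirq
 * context", where the LRU lock may not be taken.
 */
enum put_result toy_put_page(struct toy_page *page, bool in_irq)
{
    assert(page != NULL && page->refcount > 0);
    if (--page->refcount > 0)
        return PUT_STILL_REFERENCED;
    if (page->on_lru) {
        if (in_irq)
            return PUT_WOULD_BUG;  /* the real kernel goes BUG here */
        page->on_lru = false;      /* process context: safe to unlink */
    }
    return PUT_FREED;
}
```

In this model the panic needs an ordering coincidence: every other holder of the page lets go first, so the network layer's release, running from softirq when the TCP ACK arrives, becomes the final put on a page that is still on the LRU.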

2004-04-05 10:42:14

by Marco Fais

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Marco Roeland wrote:

>>Mmmh, all the servers use an RTL-8139 compatible card, with the same
>>8139too driver. So this can be the problem.
> Hey, I'm by no means an expert. Suggesting the driver is to blame was
> mostly based on the fact that compiling locally worked, while compiling
> from a remote machine triggered a panic. The rest of your description
> below indicates that it probably *isn't* the driver.

I was not saying *this is the problem*, just noticing that all the
systems that show this problem have this network card, while the other
systems that are working perfectly are using other network hardware
(e100 driver) :)

>>Ok, next I will test the second network card on the server, just to avoid
>>the possibility of an hardware failure -- but I have other 4 servers that
>>show the same behaviour, so I don't think it's caused by faulty hardware.
> If 4 other servers show the same behaviour, and netcatting a lot of data
> doesn't panic the machine, that strongly suggests that the network card
> and driver are innocent! I thought only one machine had the problem.

If you read Andrew's message, it seems that distcc uses a function that
triggers the problem -- sendfile() -- so, if netcat doesn't use it, it's
clear why it doesn't panic the kernel.

> Have you tried a local distcc compile, but specifying the host by its IP
> address or its real name? Distcc treats 'localhost' differently,
> but if it sees an IP address it will use the network route. As specified

Good test.

Yeah, kernel panic in a few seconds. Using localhost instead, the compile
runs perfectly for hours.
So it's definitely an issue related to distcc AND networking (and
probably the interaction between the network driver and the kernel).

Thank you again for your advice!

2004-04-05 10:47:44

by Marco Fais

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Andrew Morton wrote:

>>kernel BUG at page_alloc.c:98!
> uh-oh.

That's the same thing I said when I saw all the LEDs blinking
on *all* the keyboards ... :)

> distcc uses sendfile(). The 8139too hardware and driver are
> zerocopy-capable so the kernel uses zerocopy direct-from-user-pages for
> sendfile().

Ok. Other servers with the e100 driver don't show the problem. Does this
mean that they're not "zerocopy-capable"?

> This was all discussed fairly extensively a couple of years back and I
> thought it ended up being fixed.

Are there any workarounds for this, until the problem is corrected?

Thank you very much.


2004-04-05 10:56:38

by Andrew Morton

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Marco Fais wrote:
>
> Andrew Morton wrote:
>
> >>kernel BUG at page_alloc.c:98!
> > uh-oh.
>
> That's the same thing I said when I saw all the LEDs blinking
> on *all* the keyboards ... :)
>
> > distcc uses sendfile(). The 8139too hardware and driver are
> > zerocopy-capable so the kernel uses zerocopy direct-from-user-pages for
> > sendfile().
>
> Ok. Other servers with the e100 driver don't show the problem. Does this
> mean that they're not "zerocopy-capable"?

They are. It could be a timing thing.

> > This was all discussed fairly extensively a couple of years back and I
> > thought it ended up being fixed.
>
> Are there any workarounds for this, until the problem is corrected?

This will probably make it go away.

--- linux-2.4.26-rc1/drivers/net/8139too.c 2004-03-27 22:06:18.000000000 -0800
+++ 24/drivers/net/8139too.c 2004-04-05 03:54:50.478692968 -0700
@@ -983,7 +983,7 @@ static int __devinit rtl8139_init_one (s
 	 * through the use of skb_copy_and_csum_dev we enable these
 	 * features
 	 */
-	dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA;
+	dev->features |= NETIF_F_SG | NETIF_F_HIGHDMA;
 
 	dev->irq = pdev->irq;
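
The idea behind the patch can be sketched as a check on the device's feature bits. A later message in this thread describes the change as reducing the advertised capabilities so that the kernel assumes the card isn't zero-copy capable. The sketch below assumes the zero-copy send path is taken only when a device advertises both scatter-gather and some checksum capability. That rule, the TOY_F_* bit values and both function names are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration; the real NETIF_F_* constants
 * are defined in the kernel headers and may differ. */
#define TOY_F_SG      0x1u  /* scatter-gather I/O */
#define TOY_F_HW_CSUM 0x2u  /* checksum offload capability */
#define TOY_F_HIGHDMA 0x4u  /* DMA to high memory */

/* Assumed rule: zero-copy sends need scatter-gather AND checksum offload. */
bool toy_zerocopy_eligible(unsigned features)
{
    return (features & TOY_F_SG) && (features & TOY_F_HW_CSUM);
}

/* Model of the patch: stop advertising checksum offload. */
unsigned toy_drop_hw_csum(unsigned features)
{
    unsigned out = features & ~TOY_F_HW_CSUM;
    assert(!toy_zerocopy_eligible(out));
    return out;
}
```

Under that assumption, the unpatched mask (SG, HW_CSUM and HIGHDMA) is eligible for zero-copy and the patched mask (SG and HIGHDMA) is not, so sendfile() data would be copied instead of sent directly from the original pages.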


2004-04-05 11:47:13

by Marco Roeland

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Monday April 5th 2004 Marco Fais wrote:

> I was not saying *this is the problem*, just noticing that all the
> systems that show this problem have this network card, while the other
> systems that are working perfectly are using other network hardware
> (e100 driver) :)

Yes, my conclusion was too hasty, it *is* driver related! ;-)

With hindsight we also should have tried, of course, a 'strace distccd
--no-detach' in a crashing and a non-crashing situation. This would
probably have shown that 'sendfile()' was the first missing system call
(and therefore likely the culprit) in the crashing situation. Oh, well...

> If you read Andrew's message, it seems that distcc uses a function that
> triggers the problem -- sendfile() -- so, if netcat doesn't use it, it's
> clear why it doesn't panic the kernel.

Yes, sendfile() in combination with the 8139too driver seems to be
causing the trouble. Until that is fixed, there doesn't seem to be an
easy workaround. At the moment it looks like there is no simple
configuration option to turn *off* zero_copy functionality,
either in the kernel or in distcc.

There is an '8139cp' driver too, it's supposed to be working better
as well, perhaps that one might not free the pages that are to be
zero_copied across the network before they are sent?! That is the real
problem if I understand Andrew's mail correctly.

You might send a 'linux 8139too sendfile() panic' kind of bugreport
to the '[email protected]' mailing list. That is the list where the
networking gurus are supposed to be hanging out. Although IMVHO this bug
is more on the kernel than on the network side. Also filing an entry to
bugzilla.kernel.org might speed up someone fixing the real problem.

Easiest workaround might be to just use a customised distcc for the
machines involved: just download the source from 'distcc.samba.org', do
a regular './configure', and then in the generated 'src/config.h' hand
edit '#undef HAVE_SENDFILE' and '#undef HAVE_SYS_SENDFILE_H'. That
should stop distcc from using sendfile().
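
What such a compile-time switch amounts to can be sketched as follows. This is not distcc's actual source: pump_file is a hypothetical helper, and HAVE_SENDFILE is used here only to mirror the config.h macro named above. With the macro undefined, the data goes through an ordinary read()/write() loop and a user-space buffer, so the zero-copy path is never entered.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef HAVE_SENDFILE
#include <sys/sendfile.h>
#endif

/*
 * Copy count bytes from in_fd (from its current offset) to out_fd.
 * in_fd must be a regular file when the sendfile() path is compiled in.
 * Returns 0 on success, -1 on error or premature end of input.
 */
int pump_file(int out_fd, int in_fd, size_t count)
{
    assert(out_fd >= 0 && in_fd >= 0);
#ifdef HAVE_SENDFILE
    /* Zero-copy path: the kernel hands file pages to the socket. */
    while (count > 0) {
        ssize_t n = sendfile(out_fd, in_fd, NULL, count);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        count -= (size_t)n;
    }
#else
    /* Fallback path: data is copied through a user-space buffer. */
    char buf[4096];
    while (count > 0) {
        size_t want = count < sizeof buf ? count : sizeof buf;
        ssize_t n = read(in_fd, buf, want);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w <= 0)
                return -1;
            off += w;
        }
        count -= (size_t)n;
    }
#endif
    return 0;
}
```

The fallback costs an extra copy per block, which is the price of avoiding the code path that triggers the BUG.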
--
Marco Roeland

2004-04-05 13:58:29

by Marco Fais

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Hola Andrew!

Andrew Morton wrote:

>>Are there any workarounds for this, until the problem is corrected?
> This will probably make it go away.
>
> --- linux-2.4.26-rc1/drivers/net/8139too.c 2004-03-27 22:06:18.000000000 -0800
> +++ 24/drivers/net/8139too.c 2004-04-05 03:54:50.478692968 -0700
> @@ -983,7 +983,7 @@ static int __devinit rtl8139_init_one (s
> * through the use of skb_copy_and_csum_dev we enable these
> * features
> */
> - dev->features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_HIGHDMA;
> + dev->features |= NETIF_F_SG | NETIF_F_HIGHDMA;
>
> dev->irq = pdev->irq;

Unfortunately, this doesn't solve the problem. It seems that the panic is
triggered a little later (1-2 minutes instead of a few seconds), but
I still get a kernel panic every time, even with this patch.

The oops trace looks very similar to the one I posted on the
linux-kernel list.

Thank you Andrew, bye!

2004-04-05 14:08:35

by Marco Fais

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Marco Roeland wrote:

> There is an '8139cp' driver too, it's supposed to be working better
> as well, perhaps that one might not free the pages that are to be
> zero_copied across the network before they are sent?! That is the real
> problem if I understand Andrew's mail correctly.

Just tried that; unfortunately this network card isn't supported by the
8139cp driver.

> You might send a 'linux 8139too sendfile() panic' kind of bugreport
> to the '[email protected]' mailing list. That is the list where the
> networking gurus are supposed to be hanging out. Although IMVHO this bug

Andrew's messages are in CC: to the [email protected] list, so I think
they're already aware of the problem.

> is more on the kernel than on the network side. Also filing an entry to
> bugzilla.kernel.org might speed up someone fixing the real problem.

Ok, let's see if we get a patch from this discussion; otherwise I'll file
a new bugzilla entry.

> Easiest workaround might be to just use a customised distcc for the
> machines involved: just download the source from 'distcc.samba.org', do
> a regular './configure', and then in the generated 'src/config.h' hand
> edit '#undef HAVE_SENDFILE' and '#undef HAVE_SYS_SENDFILE_H'. That
> should stop distcc from using sendfile().

Great! I'm going to test that right now, surely better than deploying
customized kernels in all servers until an "official" patch comes out.

Thank you very much, Marco.

2004-04-05 14:36:46

by Marco Roeland

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Monday April 5th 2004 Marco Fais wrote:

> Ok, let's see if we get a patch from this discussion; otherwise I'll file
> a new bugzilla entry.

Perhaps the fact that you have *two* cards in each machine that crashes
with the 8139too driver could be important? I have two Athlon XP 2000+
machines with Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+
cards that run distcc quite a lot and have never crashed. But network
topology and timings might just trigger the panic in your situation and
not in others...

> [building distcc without sendfile()]
> Great! I'm going to test that right now, surely better than deploying
> customized kernels in all servers until an "official" patch comes out.

Yeah, although that viewpoint might not be very popular on this mailing
list. ;-) By the way the patch looks quite alright and applies (with
an offset) to 2.6.5 as well. If you build 8139too modular, you might
even make two modules, a modified one with the reduced advertised
capabilities (so that the kernel assumes the card isn't zero-copy
capable) under another name perhaps like 8139too-nosendfile, and the
standard one. You can then at least distribute one kernel package, and
only on the affected machines modprobe the bugfix module.

Anyway, installing a distcc that doesn't use sendfile() first will let
you (distcc-)build patched kernels much faster in the future. ;-)
--
Marco Roeland

2004-04-05 17:03:29

by Max Valdez

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

I sent an email a couple of weeks ago about the same issue.

But it wasn't as well documented and organized.

I can say that the card and hardware are innocent; maybe it's the driver. The
"remote" machines that hang are using the latest Fedora stable kernel.

I would need good pointers on the procedure to debug the problem; I'm
no expert on anything kernel-related.

I think it's a problem in the network handling because it happens on different
kernels, on different hardware. It started a couple of months ago
(we got a new, faster network "architecture") and the problem seems to be
triggered by fast file transfers over NTF, and by distcc. I remember having a
crash using scp too, for some ISO files.

If needed I can help track this problem down, but I need some hints on the
procedure.

Max

--
Linux garaged 2.6.5-rc2-mm3 #1 Fri Mar 26 11:07:16 CST 2004 i686 Intel(R)
Pentium(R) 4 CPU 2.80GHz GenuineIntel GNU/Linux
-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GS/S d- s: a-29 C++(+++) ULAHI+++ P+ L++>+++ E--- W++ N* o-- K- w++++ O- M--
V-- PS+ PE Y-- PGP++ t- 5- X+ R tv++ b+ DI+++ D- G++ e++ h+ r+ z**
------END GEEK CODE BLOCK------
gpg-key: http://garaged.homeip.net/gpg-key.txt

2004-04-23 22:33:30

by Carson Gaspar

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

FYI, we see the exact same panic with the tg3 driver using 2.4.25 and
distcc with sendfile(). The bcm5700 driver also panics, but I haven't
captured a panic message to be certain it's the same bug.

kernel BUG at page_alloc.c:98!
invalid operand: 0000
CPU: 1
EIP: 0010:[<c0139492>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000001 ebx: c294dcb0 ecx: 00000001 edx: 00000020
esi: edb6e2e0 edi: 00000000 ebp: 00000004 esp: c55af9b4
ds: 0018 es: 0018 ss: 0018
Process cc1plus (pid: 21186, stackpage=c55af000)
Stack: c022e9ee f6fb1000 c022aa9c 00000287 00000206 00000286 db5a9600
00000001
edb6e2e0 edb6e2e0 00000004 c022aa4e edb6e2e0 f3716100 c022aa9c
edb6e2e0
f371623c f3716100 c022ac25 edb6e2e0 00000000 c025423a edb6e2e0
c55ae000
Call Trace: [<c022e9ee>] [<c022aa9c>] [<c022aa4e>] [<c022aa9c>]
[<c022ac25>]
[<c025423a>] [<c0247d28>] [<c024be53>] [<c025675b>] [<c02547c8>]
[<c0256bdf>]
[<c0138175>] [<c022aa9c>] [<c0254307>] [<c0258a67>] [<c022aa9c>]
[<c0254307>]
[<c025ef5b>] [<c025f4ad>] [<c022ac25>] [<c0256bec>] [<c01550dc>]
[<c014ba00>]
[<c02449a3>] [<c02449a3>] [<c0244da6>] [<c025ef5b>] [<c0139c05>]
[<c025f4ad>]
[<c022a8af>] [<c022f189>] [<c022a8af>] [<f8990d48>] [<c02449a3>]
[<f8990ef9>]
[<c022f3a3>] [<c0122c5b>] [<c010a74e>] [<c0131a04>] [<c012e232>]
[<c0131487>]
[<c0119e06>] [<c0131b08>] [<c0131990>] [<c01410d6>] [<c012e72a>]
[<c0108b5f>]
Code: 0f 0b 62 00 bd 35 2a c0 89 d8 e8 5f ed ff ff 8b 6b 28 85 ed

>>EIP; c0139492 <__free_pages_ok+32/2b0> <=====
Trace; c022e9ee <dev_queue_xmit+14e/320>
Trace; c022aa9c <kfree_skbmem+c/70>
Trace; c022aa4e <skb_release_data+4e/90>
Trace; c022aa9c <kfree_skbmem+c/70>
Trace; c022ac25 <__kfree_skb+125/130>
Trace; c025423a <tcp_clean_rtx_queue+15a/310>
Trace; c0247d28 <ip_queue_xmit+3d8/550>
Trace; c024be53 <tcp_write_space+53/80>
Trace; c025675b <tcp_new_space+7b/80>
Trace; c02547c8 <tcp_ack+138/360>
Trace; c0256bdf <tcp_rcv_established+ef/8b0>
Trace; c0138175 <lru_cache_add+75/80>
Trace; c022aa9c <kfree_skbmem+c/70>
Trace; c0254307 <tcp_clean_rtx_queue+227/310>
Trace; c0258a67 <tcp_transmit_skb+567/620>
Trace; c022aa9c <kfree_skbmem+c/70>
Trace; c0254307 <tcp_clean_rtx_queue+227/310>
Trace; c025ef5b <tcp_v4_do_rcv+3b/120>
Trace; c025f4ad <tcp_v4_rcv+46d/6f0>
Trace; c022ac25 <__kfree_skb+125/130>
Trace; c0256bec <tcp_rcv_established+fc/8b0>
Trace; c01550dc <dput+1c/160>
Trace; c014ba00 <cached_lookup+10/50>
Trace; c02449a3 <ip_local_deliver+f3/190>
Trace; c02449a3 <ip_local_deliver+f3/190>
Trace; c0244da6 <ip_rcv+366/400>
Trace; c025ef5b <tcp_v4_do_rcv+3b/120>
Trace; c0139c05 <__alloc_pages+75/2f0>
Trace; c025f4ad <tcp_v4_rcv+46d/6f0>
Trace; c022a8af <alloc_skb+ef/1c0>
Trace; c022f189 <netif_receive_skb+189/1c0>
Trace; c022a8af <alloc_skb+ef/1c0>
Trace; f8990d48 <[usbcore]__kstrtab_usb_hcd_giveback_urb+52f8/6a50>
Trace; c02449a3 <ip_local_deliver+f3/190>
Trace; f8990ef9 <[usbcore]__kstrtab_usb_hcd_giveback_urb+54a9/6a50>
Trace; c022f3a3 <net_rx_action+b3/170>
Trace; c0119e06 <do_page_fault+1a6/4eb>
Trace; c0131b08 <generic_file_read+88/170>
Trace; c0131990 <file_read_actor+0/f0>
Trace; c01410d6 <sys_read+96/110>
Trace; c012e72a <sys_brk+ba/f0>
Trace; c0108b5f <system_call+33/38>
Code; c0139492 <__free_pages_ok+32/2b0>
00000000 <_EIP>:
Code; c0139492 <__free_pages_ok+32/2b0> <=====
0: 0f 0b ud2a <=====
Code; c0139494 <__free_pages_ok+34/2b0>
2: 62 00 bound %eax,(%eax)
Code; c0139496 <__free_pages_ok+36/2b0>
4: bd 35 2a c0 89 mov $0x89c02a35,%ebp
Code; c013949b <__free_pages_ok+3b/2b0>
9: d8 e8 fsubr %st(0),%st
Code; c013949d <__free_pages_ok+3d/2b0>
b: 5f pop %edi
Code; c013949e <__free_pages_ok+3e/2b0>
c: ed in (%dx),%eax
Code; c013949f <__free_pages_ok+3f/2b0>
d: ff (bad)
Code; c01394a0 <__free_pages_ok+40/2b0>
e: ff 8b 6b 28 85 ed decl 0xed85286b(%ebx)

<0>Kernel panic: Aiee, killing interrupt handler!

2004-04-28 02:03:44

by Jeff Moyer

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc


>FYI, we see the exact same panic with the tg3 driver using 2.4.25 and
>distcc with sendfile(). The bcm5700 driver also panics, but I haven't
>captured a panic message to be certain it's the same bug.

>kernel BUG at page_alloc.c:98!

Andrea fixed this in his tree by deferring the page free to process context
instead of BUG()ing on PageLRU(page).

-Jeff

2004-04-29 21:14:05

by Marcelo Tosatti

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Tue, Apr 27, 2004 at 10:02:11PM -0400, Jeff Moyer wrote:
>
> >FYI, we see the exact same panic with the tg3 driver using 2.4.25 and
> >distcc with sendfile(). The bcm5700 driver also panics, but I haven't
> >captured a panic message to be certain it's the same bug.
>
> >kernel BUG at page_alloc.c:98!
>
> Andrea fixed this in his tree by deferring the page free to process context
> instead of BUG()ing on PageLRU(page).

Yeap, his fix looks OK.

Could the people seeing the oops please try this patch from Andrea (on top of 2.4.26):

--- a/mm/page_alloc.c.orig 2004-04-29 17:38:14.184021976 -0300
+++ b/mm/page_alloc.c 2004-04-29 17:47:27.906843312 -0300
@@ -46,6 +46,34 @@

 int vm_gfp_debug = 0;

+static void FASTCALL(__free_pages_ok (struct page *page, unsigned int order));
+
+static spinlock_t free_pages_ok_no_irq_lock = SPIN_LOCK_UNLOCKED;
+struct page * free_pages_ok_no_irq_head;
+
+static void do_free_pages_ok_no_irq(void * arg)
+{
+	struct page * page, * __page;
+
+	spin_lock_irq(&free_pages_ok_no_irq_lock);
+
+	page = free_pages_ok_no_irq_head;
+	free_pages_ok_no_irq_head = NULL;
+
+	spin_unlock_irq(&free_pages_ok_no_irq_lock);
+
+	while (page) {
+		__page = page;
+		page = page->next_hash;
+		__free_pages_ok(__page, __page->index);
+	}
+}
+
+static struct tq_struct free_pages_ok_no_irq_task = {
+	.routine = do_free_pages_ok_no_irq,
+};
+
+
 /*
  * Temporary debugging check.
  */
@@ -81,7 +109,6 @@
  * -- wli
  */

-static void FASTCALL(__free_pages_ok (struct page *page, unsigned int order));
 static void __free_pages_ok (struct page *page, unsigned int order)
 {
 	unsigned long index, page_idx, mask, flags;
@@ -94,8 +121,20 @@
 	 * a reference to a page in order to pin it for io. -ben
 	 */
 	if (PageLRU(page)) {
-		if (unlikely(in_interrupt()))
-			BUG();
+		if (unlikely(in_interrupt())) {
+			unsigned long flags;
+
+			spin_lock_irqsave(&free_pages_ok_no_irq_lock, flags);
+			page->next_hash = free_pages_ok_no_irq_head;
+			free_pages_ok_no_irq_head = page;
+			page->index = order;
+
+			spin_unlock_irqrestore(&free_pages_ok_no_irq_lock, flags);
+
+			schedule_task(&free_pages_ok_no_irq_task);
+			return;
+		}
+
 		lru_cache_del(page);
 	}

2004-04-29 21:27:40

by Andrew Morton

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Marcelo Tosatti wrote:
>
> > Andrea fixed this in his tree by deferring the page free to process context
> > instead of BUG()ing on PageLRU(page).
>
> Yeap, his fix looks OK.

It does.

It would be nice to change

if (in_interrupt())

to

if (in_interrupt() || ((count++ % 10000) == 0))

just to exercise that code path a bit more.

2004-04-29 22:51:46

by Andrea Arcangeli

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Thu, Apr 29, 2004 at 02:28:07PM -0700, Andrew Morton wrote:
> just to exercise that code path a bit more.

what's the point of exercising that code path more? are you worried that
there are bugs in it?

2004-04-29 23:24:17

by Andrew Morton

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

Andrea Arcangeli wrote:
>
> On Thu, Apr 29, 2004 at 02:28:07PM -0700, Andrew Morton wrote:
> > just to exercise that code path a bit more.
>
> what's the point of exercising that code path more? are you worried that
> there are bugs in it?

The only application which we know will exercise that code is the distcc
server. Making that little change while testing the patch will increase
the chance of shaking out any problems.

2004-04-30 00:15:21

by Andrea Arcangeli

Subject: Re: kernel BUG at page_alloc.c:98 -- compiling with distcc

On Thu, Apr 29, 2004 at 04:26:32PM -0700, Andrew Morton wrote:
> The only application which we know will exercise that code is the distcc
> server. Making that little change while testing the patch will increase
> the chance of shaking out any problems.

if you're scared it has bugs I think it'd be more useful to change it to
"|| 1" and run it under some stress test, and then remove the "|| 1".
the aio code in unmap_kvec is also a big user of that. a schedule every
40M of ram freed isn't too nice to my eyes (but I doubt it can be
measured).