2002-06-23 01:23:25

by Alexandre Pereira Nunes

Subject: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Hi, I'm using kernel 2.4.19-pre10-ac2 and it just printed the pretty
message below. Other pertinent information follows.

Please CC me on replies if needed, because I'm not subscribed to the list.

Cheers,

Alexandre
--

kernel BUG at page_alloc.c:131!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c014177e>] Tainted: P
EFLAGS: 00010286
eax: cba53f48 ebx: c144955c ecx: c025f6dc edx: 00000000
esi: 00000000 edi: 138de070 ebp: 00001000 esp: c41b7ec8
ds: 0018 es: 0018 ss: 0018
Process X (pid: 3924, stackpage=c41b7000)
Stack: c144955c c0142fca c02ba2a0 00000000 00000001 c144955c 00000000 138de070
d38de070 00000000 138de070 00001000 c01351c6 c144955c 0000000e d2fca688
0000000e 00000001 40400000 cd841404 4001d000 00000000 c013320b d5decd40
Call Trace: [<c0142fca>] [<c01351c6>] [<c013320b>] [<c01367a1>] [<c01368ca>]
[<c010ba0f>]

Code: 0f 0b 83 00 53 b0 23 c0 8b 43 18 c6 43 24 05 89 f1 89 dd 83
<3>X[3924] exited with preempt_count 1



--

The output of scripts/ver_linux:

If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

Linux PolesApart 2.4.19-pre10-ac2 #1 Sat Jun 22 17:58:33 BRT 2002 i686 unknown

Gnu C gcc (GCC) 3.1 Copyright (C) 2002 Free Software
Foundation, Inc. This is free software; see the source for copying
conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.
Gnu make 3.79.1
util-linux 2.11r
mount 2.11r
modutils 2.4.16
e2fsprogs 1.27
pcmcia-cs 3.1.33
PPP 2.4.1
Linux C Library 2.2.5
Dynamic linker (ldd) 2.2.5
Linux C++ Library 4.0.0
Procps 2.0.7
Net-tools 1.60
Kbd 1.06
Sh-utils 2.0
Modules Loaded ide-cd agpgart NVdriver cs4281 soundcore autofs
8139too mii sr_mod scsi_mod cdrom


# Output from /proc/cpuinfo

processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) Processor
stepping : 2
cpu MHz : 908.105
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 1808.79


# Output from /proc/modules

ide-cd 34072 0 (autoclean)
agpgart 15080 3 (autoclean)
NVdriver 999648 12 (autoclean)
cs4281 49512 0
soundcore 4068 3 [cs4281]
autofs 12388 1 (autoclean)
8139too 15784 1
mii 1280 0 [8139too]
sr_mod 15064 0 (autoclean) (unused)
scsi_mod 59636 1 (autoclean) [sr_mod]
cdrom 29056 0 (autoclean) [ide-cd sr_mod]

# Output from /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
0170-0177 : ide1
01f0-01f7 : ide0
0376-0376 : ide1
03c0-03df : vga+
03f6-03f6 : ide0
0cf8-0cff : PCI conf1
a400-a4ff : Realtek Semiconductor Co., Ltd. RTL-8139/8139C
a400-a4ff : 8139too
d000-d01f : VIA Technologies, Inc. UHCI USB (#2)
d400-d41f : VIA Technologies, Inc. UHCI USB
d800-d80f : VIA Technologies, Inc. Bus Master IDE
d800-d807 : ide0
d808-d80f : ide1
e200-e27f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
e400-e4ff : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
e800-e80f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]

# Output from /proc/iomem
00000000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000f0000-000fffff : System ROM
00100000-15febfff : System RAM
00100000-00230c70 : Kernel code
00230c71-0026995f : Kernel data
15fec000-15feefff : ACPI Tables
15fef000-15ffefff : reserved
15fff000-15ffffff : ACPI Non-volatile Storage
d4800000-d48000ff : Realtek Semiconductor Co., Ltd. RTL-8139/8139C
d4800000-d48000ff : 8139too
d5000000-d500ffff : Cirrus Logic Crystal CS4281 PCI Audio
d5800000-d5800fff : Cirrus Logic Crystal CS4281 PCI Audio
d6000000-d7dfffff : PCI Bus #01
d6000000-d6ffffff : nVidia Corporation NV11 (GeForce2 MX DDR)
d7f00000-dfffffff : PCI Bus #01
d8000000-dfffffff : nVidia Corporation NV11 (GeForce2 MX DDR)
e0000000-e7ffffff : VIA Technologies, Inc. VT8363/8365 [KT133/KM133]
ffff0000-ffffffff : reserved

# lspci -vvv
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 02)
Subsystem: Asustek Computer, Inc. A7V Mainboard
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 8
Region 0: Memory at e0000000 (32-bit, prefetchable) [size=128M]
Capabilities: [a0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW+ Rate=x1,x2,x4
Command: RQ=0 SBA- AGP+ 64bit- FW+ Rate=x4
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR+
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000e000-0000dfff
Memory behind bridge: d6000000-d7dfffff
Prefetchable memory behind bridge: d7f00000-dfffffff
BridgeCtl: Parity- SERR- NoISA- VGA+ MAbort- >Reset- FastB2B-
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 22)
Subsystem: Asustek Computer, Inc. A7V Mainboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0

00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 10) (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 4: I/O ports at d800 [size=16]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:04.2 USB Controller: VIA Technologies, Inc. USB (rev 10) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Interrupt: pin D routed to IRQ 9
Region 4: I/O ports at d400 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:04.3 USB Controller: VIA Technologies, Inc. USB (rev 10) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, cache line size 08
Interrupt: pin D routed to IRQ 9
Region 4: I/O ports at d000 [size=32]
Capabilities: [80] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:04.4 Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 30)
Subsystem: Asustek Computer, Inc. A7V Mainboard
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin ? routed to IRQ 9
Capabilities: [68] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0a.0 Multimedia audio controller: Cirrus Logic Crystal CS4281 PCI Audio (rev 01)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (1000ns min, 6000ns max)
Interrupt: pin A routed to IRQ 5
Region 0: Memory at d5800000 (32-bit, non-prefetchable) [size=4K]
Region 1: Memory at d5000000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [40] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:0b.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RT8139
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (8000ns min, 16000ns max)
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at a400 [size=256]
Region 1: Memory at d4800000 (32-bit, non-prefetchable) [size=256]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0-,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

01:00.0 VGA compatible controller: nVidia Corporation NV11 [GeForce2 MX DDR] (rev b2) (prog-if 00 [VGA])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 248 (1250ns min, 250ns max)
Interrupt: pin A routed to IRQ 11
Region 0: Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
Region 1: Memory at d8000000 (32-bit, prefetchable) [size=128M]
Expansion ROM at d7ff0000 [disabled] [size=64K]
Capabilities: [60] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [44] AGP version 2.0
Status: RQ=31 SBA- 64bit- FW+ Rate=x1,x2,x4
Command: RQ=31 SBA- AGP+ 64bit- FW+ Rate=x4



# XFree86 is the version bundled with Slackware 8.1
XFree86 Version 4.2.0 / X Window System
(protocol Version 11, revision 0, vendor release 6600)
Release Date: 18 January 2002
If the server is older than 6-12 months, or if your card is
newer than the above date, look for a newer version before
reporting problems. (See http://www.XFree86.Org/)
Build Operating System: Linux 2.4.18 i686 [ELF]
Module Loader present


2002-06-23 11:05:37

by Diego Calleja

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Sat, 22 Jun 2002 22:13:52 -0300 (BRT)
Alexandre Pereira Nunes <[email protected]> wrote:
>
> kernel BUG at page_alloc.c:131!
> invalid operand: 0000
> CPU: 0
> EIP: 0010:[<c014177e>] Tainted: P
> EFLAGS: 00010286
> eax: cba53f48 ebx: c144955c ecx: c025f6dc edx: 00000000
> esi: 00000000 edi: 138de070 ebp: 00001000 esp: c41b7ec8
> ds: 0018 es: 0018 ss: 0018
> Process X (pid: 3924, stackpage=c41b7000)
> Stack: c144955c c0142fca c02ba2a0 00000000 00000001 c144955c 00000000 138de070
> d38de070 00000000 138de070 00001000 c01351c6 c144955c 0000000e d2fca688
> 0000000e 00000001 40400000 cd841404 4001d000 00000000 c013320b d5decd40
> Call Trace: [<c0142fca>] [<c01351c6>] [<c013320b>] [<c01367a1>] [<c01368ca>]
> [<c010ba0f>]
>
> Code: 0f 0b 83 00 53 b0 23 c0 8b 43 18 c6 43 24 05 89 f1 89 dd 83
> <3>X[3924] exited with preempt_count 1

It'd be nice if you could pass this oops through ksymoops...

2002-06-23 11:41:02

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Sun, 23 Jun 2002, Diego Calleja wrote:

>
> It'd be nice if you could pass this oops through ksymoops...
>

Ok, here it is...

Cheers,

Alexandre

>---------------------------------------------------

ksymoops 2.4.5 on i686 2.4.19-pre10-ac2. Options used
-v /usr/src/linux-2.4.19-pre/vmlinux (specified)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.19-pre10-ac2/ (default)
-m /usr/src/linux/System.map (default)

kernel BUG at page_alloc.c:131!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c014177e>] Tainted: P
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: cba53f48 ebx: c144955c ecx: c025f6dc edx: 00000000
esi: 00000000 edi: 138de070 ebp: 00001000 esp: c41b7ec8
ds: 0018 es: 0018 ss: 0018
Process X (pid: 3924, stackpage=c41b7000)
Stack: c144955c c0142fca c02ba2a0 00000000 00000001 c144955c 00000000 138de070
d38de070 00000000 138de070 00001000 c01351c6 c144955c 0000000e d2fca688
0000000e 00000001 40400000 cd841404 4001d000 00000000 c013320b d5decd40
Call Trace: [<c0142fca>] [<c01351c6>] [<c013320b>] [<c01367a1>] [<c01368ca>]
[<c010ba0f>]
Code: 0f 0b 83 00 53 b0 23 c0 8b 43 18 c6 43 24 05 89 f1 89 dd 83


>>EIP; c014177e <__free_pages_ok+be/2a0> <=====

>>eax; cba53f48 <_end+b77b478/165f9530>
>>ebx; c144955c <_end+1170a8c/165f9530>
>>ecx; c025f6dc <contig_page_data+dc/3c0>
>>edi; 138de070 Before first symbol
>>ebp; 00001000 Before first symbol
>>esp; c41b7ec8 <_end+3edf3f8/165f9530>

Trace; c0142fca <remove_exclusive_swap_page+ba/120>
Trace; c01351c6 <zap_pte_range+106/260>
Trace; c013320b <zap_page_range+9b/120>
Trace; c01367a1 <do_munmap+221/300>
Trace; c01368ca <sys_munmap+4a/70>
Trace; c010ba0f <system_call+33/38>

Code; c014177e <__free_pages_ok+be/2a0>
00000000 <_EIP>:
Code; c014177e <__free_pages_ok+be/2a0> <=====
0: 0f 0b ud2a <=====
Code; c0141780 <__free_pages_ok+c0/2a0>
2: 83 00 53 addl $0x53,(%eax)
Code; c0141783 <__free_pages_ok+c3/2a0>
5: b0 23 mov $0x23,%al
Code; c0141785 <__free_pages_ok+c5/2a0>
7: c0 8b 43 18 c6 43 24 rorb $0x24,0x43c61843(%ebx)
Code; c014178c <__free_pages_ok+cc/2a0>
e: 05 89 f1 89 dd add $0xdd89f189,%eax
Code; c0141791 <__free_pages_ok+d1/2a0>
13: 83 00 00 addl $0x0,(%eax)

--
Life would be so much easier if we could just look at the source code.

2002-06-24 12:23:25

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Markus Schoder wrote:

>Hi Alexandre,
>
>you are using the proprietary nVidia module (NVdriver), so you would have
>to address any kernel problems to nVidia since they are the ones who have the
>source.
>
>However lots of people (including myself) have seen the same problem when
>using the nVidia module. For me it went away when upgrading to the 1.0-2960
>drivers. If you haven't done so yet this would be the first thing to try.
>
>Hope this helps,
>Markus
>
>
>
I'm already using the 1.0-2960 version, and that's the version in the
report. It's possible that the NVdriver module is the cause of the
problem, but the bug shows up in the kernel's VM, in a place where, as far
as I understand, it isn't supposed to. So either the module does something
very ugly, or the kernel really has a bug, or it's unrelated to the
NVdriver altogether. Unfortunately, the backtrace doesn't help me figure
that out, since I'm no VM expert, but perhaps someone else can. I may
try to forward this to the Nvidia folks, but reporting a bug that has only
shown up once, and on a "pre" series kernel, may hurt their feelings...

Thanks,

Alexandre


2002-06-24 13:55:53

by Alan

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

> report. It's possible that the NVdriver module is the cause of the
> problem, but the bug shows up in the kernel's VM, in a place where, as far
> as I understand, it isn't supposed to. So either the module does something
> very ugly, or the kernel really has a bug, or it's unrelated to the
> NVdriver altogether. Unfortunately, the backtrace doesn't help me figure
> that out, since I'm no VM expert, but perhaps someone else can. I may
> try to forward this to the Nvidia folks, but reporting a bug that has only
> shown up once, and on a "pre" series kernel, may hurt their feelings...

Their problem - they have our source, we don't have theirs. If it occurs
with nvdriver ever loaded in that boot, send it to nvidia, or duplicate it
from a cold boot without the driver ever loading.
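
A quick way to apply this rule to a 2.4 oops is to look at the taint marker on the EIP line: "Tainted: P" means a proprietary module was loaded at some point since boot (the flag stays set even if the module is later unloaded), while an untainted kernel prints "Not tainted". The helper below is only a sketch of that triage step. The function names and the "vendor"/"lkml" labels are made up for illustration, and it deliberately interprets no flag other than P.

```python
import re


def oops_taint_flags(oops_text):
    """Return the set of taint flag letters found in an oops, or an empty set."""
    match = re.search(r"Tainted:\s*(\S+)", oops_text)
    if match is None:
        # Covers "Not tainted" as well as text with no taint marker at all.
        return set()
    return set(match.group(1))


def report_target(oops_text):
    """Hypothetical triage: 'vendor' if a proprietary module was ever loaded."""
    return "vendor" if "P" in oops_taint_flags(oops_text) else "lkml"


if __name__ == "__main__":
    reported = "EIP: 0010:[<c014177e>] Tainted: P\nEFLAGS: 00010286"
    print(report_target(reported))
```

Run against the EIP line from this report, the helper points at the vendor; the same oops reproduced from a cold boot without NVdriver would show "Not tainted" and be fair game for the list.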

2002-06-24 15:06:41

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Alan Cox wrote:

>>report. It's possible that the NVdriver module is the cause of the
>>problem, but the bug shows up in the kernel's VM, in a place where, as far
>>as I understand, it isn't supposed to. So either the module does something
>>very ugly, or the kernel really has a bug, or it's unrelated to the
>>NVdriver altogether. Unfortunately, the backtrace doesn't help me figure
>>that out, since I'm no VM expert, but perhaps someone else can. I may
>>try to forward this to the Nvidia folks, but reporting a bug that has only
>>shown up once, and on a "pre" series kernel, may hurt their feelings...
>>
>>
>
>Their problem - they have our source, we don't have theirs. If it occurs
>with nvdriver ever loaded in that boot, send it to nvidia, or duplicate it
>from a cold boot without the driver ever loading.
>
>
I sent them an email.
I can't try to reproduce it, with or without the module loaded, since I
have no access to the machine in question right now. If I get the chance,
I may try. Since it only happened once, after swap had become practically
full, I guess that will be hard (there was no OOM report from the kernel,
though).

Maybe I got it wrong, but it seems to me that from your point of
view, as long as a proprietary driver is in use, it's nobody's
problem but the vendor's, even if the bug could turn out to be in the
kernel. Is that right? If so, does everyone else on this list who could
try to fix this (again assuming it could be something related to the
kernel and not to the proprietary driver) necessarily share your opinion?
(I'm not flaming here, just trying to find the right path.)


Thank you,

Alexandre

2002-06-24 15:11:05

by David Miller

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

From: "Alexandre P. Nunes" <[email protected]>
Date: Mon, 24 Jun 2002 12:06:32 -0300

Maybe I got it wrong, but it seems to me that from your point of
view, as long as a proprietary driver is in use, it's nobody's
problem but the vendor's, even if the bug could turn out to be in the
kernel. Is that right? If so, does everyone else on this list who could
try to fix this (again assuming it could be something related to the
kernel and not to the proprietary driver) necessarily share your opinion?
(I'm not flaming here, just trying to find the right path.)

This has to do with facts, not opinions. Since we lack the source to
their drivers, we have no idea if some bug in their driver is
scribbling over (i.e. corrupting) memory. It is therefore an unknown
which makes it a waste of time for us to pursue the bug report.
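
The reasoning can be illustrated with a deliberately tiny toy model. Nothing below is kernel code: the memory layout, field offsets and function names are all invented for the sketch. It shows only the general shape of the problem, namely that an out-of-bounds write in one component surfaces later as a failed sanity check in a different, innocent one.

```python
# Toy model only: layout and names are invented for illustration and do not
# correspond to real kernel structures.
BUF_START, BUF_SIZE = 0, 16        # hypothetical driver-owned buffer
MAPPING_FIELD = slice(16, 20)      # hypothetical allocator field right after it


class AllocatorBug(Exception):
    """Stands in for a BUG() check firing inside the allocator."""


def new_memory():
    # All zeroes, so the allocator's field starts out in a valid state.
    return bytearray(32)


def buggy_driver_write(memory, data):
    # No bounds check: anything longer than BUF_SIZE spills into the
    # neighbouring allocator field.
    memory[BUF_START:BUF_START + len(data)] = data


def free_page(memory):
    # The allocator can only see that its own invariant no longer holds;
    # it has no way of telling who overwrote the field.
    if int.from_bytes(memory[MAPPING_FIELD], "little") != 0:
        raise AllocatorBug("page still looks mapped at free time")
    return True


if __name__ == "__main__":
    mem = new_memory()
    buggy_driver_write(mem, b"\x01" * 20)  # 4 bytes past the buffer
    try:
        free_page(mem)
    except AllocatorBug as exc:
        print("check failed in the allocator:", exc)
```

The failure is reported from `free_page`, yet the defect lives in `buggy_driver_write`. Without the writer's source, the report on its own cannot distinguish this case from a genuine allocator bug.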

2002-06-24 15:47:53

by Zwane Mwaikambo

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Mon, 24 Jun 2002, Alexandre P. Nunes wrote:

> Maybe I got it wrong, but it seems to me that from your point of
> view, as long as a proprietary driver is in use, it's nobody's
> problem but the vendor's, even if the bug could turn out to be in the
> kernel. Is that right? If so, does everyone else on this list who could
> try to fix this (again assuming it could be something related to the
> kernel and not to the proprietary driver) necessarily share your opinion?
> (I'm not flaming here, just trying to find the right path.)

This particular one has cropped up a multitude of times; I can assure you
that you're not the first. Try searching the archives for
__free_pages_ok, nvidia and a few other keywords. I've seen that
particular bug first hand and it's definitely the work of the nvidia
driver. Try their 2314 driver; I can't recall seeing it with that
particular version.

Cheers,
Zwane

--
http://function.linuxpower.ca


2002-06-24 16:47:58

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

David S. Miller wrote:

> From: "Alexandre P. Nunes" <[email protected]>
> Date: Mon, 24 Jun 2002 12:06:32 -0300
>
> Maybe I got it wrong, but it seems to me that from your point of
> view, as long as a proprietary driver is in use, it's nobody's
> problem but the vendor's, even if the bug could turn out to be in the
> kernel. Is that right? If so, does everyone else on this list who could
> try to fix this (again assuming it could be something related to the
> kernel and not to the proprietary driver) necessarily share your opinion?
> (I'm not flaming here, just trying to find the right path.)
>
>This has to do with facts, not opinions. Since we lack the source to
>their drivers, we have no idea if some bug in their driver is
>scribbling over (i.e. corrupting) memory. It is therefore an unknown
>which makes it a waste of time for us to pursue the bug report.
>
>
Thanks, now with your reply (and in fact Zwane's, which made clear that
this bug scenario has been seen before) it's all clear to me. I
don't subscribe to this list, nor do I usually read its archives, so I
couldn't get the full picture without someone actually pointing this out.

I'll direct the pressure at nvidia, and in the meantime I'll read the
relevant articles in the archives, which seems the way to avoid
unnecessary redundancy ...

Sorry folks if I bothered you too much, I'll try to avoid that :-)

Thanks to all of you,

Alexandre


2002-06-26 20:48:35

by William Lee Irwin III

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Sat, Jun 22, 2002 at 10:13:52PM -0300, Alexandre Pereira Nunes wrote:
> Hi, I'm using kernel 2.4.19-pre10-ac2 and it just printed the pretty
> message below. Other pertinent information follows.
> CC: if needed, because I'm not in the list.
> Cheers,
> Alexandre
> <3>X[3924] exited with preempt_count 1

Would you mind showing us the patch you used to merge preempt into -ac?


Thanks,
Bill

2002-06-26 20:58:15

by Bongani Hlope

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

IIRC the preemptive patch is now part of -ac

On Wed, 2002-06-26 at 22:47, William Lee Irwin III wrote:
> On Sat, Jun 22, 2002 at 10:13:52PM -0300, Alexandre Pereira Nunes wrote:
> > Hi, I'm using kernel 2.4.19-pre10-ac2 and it just printed the pretty
> > message below. Other pertinent information follows.
> > CC: if needed, because I'm not in the list.
> > Cheers,
> > Alexandre
> > <3>X[3924] exited with preempt_count 1
>
> Would you mind showing us the patch you used to merge preempt into -ac?
>
>
> Thanks,
> Bill
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/


2002-06-26 21:41:37

by khromy

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Wed, Jun 26, 2002 at 11:00:08PM +0200, Bongani wrote:
> IIRC the preemptive patch is now part of -ac

I don't think that's correct. I currently have to apply rml's
preempt-kernel-rml-2.4.19-pre9-ac3-a.patch on top of 2.4.19-pre10-ac2 in
order to get preempt support.

--
L1: khromy ;khromy(at)lnuxlab.ath.cx

2002-06-26 21:59:49

by Robert Love

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Wed, 2002-06-26 at 17:00, Bongani wrote:
> IIRC the preemptive patch is now part of -ac

The preemptive kernel is not part of 2.4-ac.

Btw, fwiw, I do not think this problem has anything to do with
preemption. The "exited with preempt_count" message just means the task
exited with preemption disabled. It is not a problem if the task died
abnormally.
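
A rough model of why the message appears after the BUG, under the assumption that the BUG fired while a lock was held (on a preemptible kernel, taking a spinlock also disables preemption). The class and function names below are invented for the sketch and are not the kernel's. The point is only that the unlock, and with it the matching re-enable, is never reached when the task dies in the middle of the critical section.

```python
class KernelBug(Exception):
    """Stands in for BUG() killing the current task."""


class Task:
    def __init__(self, name, pid):
        self.name = name
        self.pid = pid
        self.preempt_count = 0  # 0 means the task may be preempted


def preempt_disable(task):
    task.preempt_count += 1


def preempt_enable(task):
    task.preempt_count -= 1


def free_page_under_lock(task, page_ok):
    preempt_disable(task)          # models taking a spinlock
    if not page_ok:
        raise KernelBug("sanity check failed")  # the unlock below never runs
    preempt_enable(task)           # models releasing the spinlock


def do_exit(task):
    """Return the warning printed at exit, or None if the count is balanced."""
    if task.preempt_count:
        return f"{task.name}[{task.pid}] exited with preempt_count {task.preempt_count}"
    return None


def run(task, page_ok):
    try:
        free_page_under_lock(task, page_ok)
    except KernelBug:
        pass  # the task is being killed; fall through to the exit path
    return do_exit(task)


if __name__ == "__main__":
    print(run(Task("X", 3924), page_ok=False))
```

In this model the nonzero count is a consequence of the abnormal death, not its cause, which matches the reading given above.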

Robert Love

2002-06-26 22:10:43

by Bongani Hlope

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Sorry I was confusing the preemptive patch with the O(1) patch
I just remember that you worked on a patch for -ac ;)



On Wed, 2002-06-26 at 23:54, Robert Love wrote:
> On Wed, 2002-06-26 at 17:00, Bongani wrote:
> > IIRC the preemptive patch is now part of -ac
>
> The preemptive kernel is not part of 2.4-ac.
>
> Btw, fwiw, I do not think this problem has anything to do with
> preemption. The "exited with preempt_count" message just means the task
> exited with preemption disabled. It is not a problem if the task died
> abnormally.
>
> Robert Love
>


2002-06-26 22:13:41

by Robert Love

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Wed, 2002-06-26 at 18:12, Bongani wrote:
> Sorry I was confusing the preemptive patch with the O(1) patch
> I just remember that you worked on a patch for -ac ;)

Ah yes O(1) is in 2.4-ac and I have been working on bits for Alan :)

Robert Love

2002-06-27 00:55:50

by William Lee Irwin III

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Wed, 2002-06-26 at 17:00, Bongani wrote:
> The preemptive kernel is not part of 2.4-ac.

On Wed, Jun 26, 2002 at 05:54:13PM -0400, Robert Love wrote:
> Btw, fwiw, I do not think this problem has anything to do with
> preemption. The "exited with preempt_count" message just means the task
> exited with preemption disabled. It is not a problem if the task died
> abnormally.

Well, my concern here is for the pte_chain_lock() / pte_chain_unlock()
bits. Teaching them about preemption should be all that's needed there.


Cheers,
Bill

2002-06-27 15:46:47

by Robert Love

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Wed, 2002-06-26 at 20:54, William Lee Irwin III wrote:

> Well, my concern here is for the pte_chain_lock() / pte_chain_unlock()
> bits. Teaching them about preemption should be all that's needed there.

The newest patch should have the code I shared with you. So we are OK,
no?

Robert Love

2002-06-27 15:48:28

by William Lee Irwin III

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Wed, 2002-06-26 at 20:54, William Lee Irwin III wrote:
>> Well, my concern here is for the pte_chain_lock() / pte_chain_unlock()
>> bits. Teaching them about preemption should be all that's needed there.

On Thu, Jun 27, 2002 at 11:40:39AM -0400, Robert Love wrote:
> The newest patch should have the code I shared with you. So we are OK,
> no?

That should cover it, yes. The only questions left are if the user is
using the right version and where the bug is.

Cheers,
Bill

2002-06-27 18:28:20

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

William Lee Irwin III wrote:

>On Wed, 2002-06-26 at 20:54, William Lee Irwin III wrote:
>
>
>>>Well, my concern here is for the pte_chain_lock() / pte_chain_unlock()
>>>bits. Teaching them about preemption should be all that's needed there.
>>>
>>>
>
>On Thu, Jun 27, 2002 at 11:40:39AM -0400, Robert Love wrote:
>
>
>>The newest patch should have the code I shared with you. So we are OK,
>>no?
>>
>>
>
>That should cover it, yes. The only questions left are if the user is
>using the right version and where the bug is.
>
>Cheers,
>Bill
>
>
The user (oops, that is me) was using
preempt-kernel-rml-2.4.19-pre10-ac2-1 at the time of the report.

Cheers,

Alexandre

2002-06-27 18:35:00

by Robert Love

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Thu, 2002-06-27 at 14:29, Alexandre P. Nunes wrote:

> The user (oops, that is me) was using
> preempt-kernel-rml-2.4.19-pre10-ac2-1 at the time of the report.

Ah try preempt-kernel-rml-2.4.19-pre10-ac2-2 which has the bit lock
safing William mentioned.

Robert Love

2002-06-27 18:51:11

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Robert Love wrote:

>On Thu, 2002-06-27 at 14:29, Alexandre P. Nunes wrote:
>
>
>
>>The user (oops, that is me) was using
>>preempt-kernel-rml-2.4.19-pre10-ac2-1 at the time of the report.
>>
>>
>
>Ah try preempt-kernel-rml-2.4.19-pre10-ac2-2 which has the bit lock
>safing William mentioned.
>
> Robert Love
>
>
>
I will, possibly on the weekend, since the machine is at home and I
usually don't use it on business days. If I find something relevant,
I'll send feedback.

Thank you, I'll upgrade my machine at work anyway. :-)

Alexandre

P.S.: Does anyone know whether 2.4.19-rc1 has O(1) applied?



2002-06-27 20:03:47

by Joe

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Alexandre P. Nunes wrote:

>
>
> P.S.: Does anyone know whether 2.4.19-rc1 has O(1) applied?
>
That would be nice... but no -

both -aa and -ac have it though, and it seems
solid, so maybe there's hope for 2.4 mainline
getting it eventually.

Joe


2002-07-01 18:46:12

by Pavel Machek

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Hi!

> Maybe I got it wrong, but it seems to me that from your point of
> view, as long as a proprietary driver is in use, it's nobody's
> problem but the vendor's, even if the bug could turn out to be in the
> kernel. Is that right? If so, does everyone else on this list who could
> try to fix this (again assuming it could be something related to the
> kernel and not to the proprietary driver) necessarily share your opinion?
> (I'm not flaming here, just trying to find the right path.)
>
> This has to do with facts, not opinions. Since we lack the source to
> their drivers, we have no idea if some bug in their driver is
> scribbling over (i.e. corrupting) memory. It is therefore an unknown
> which makes it a waste of time for us to pursue the bug report.

Actually, then you should taint the kernel for starting X, too... Anything
running with root privileges can scribble over memory.
Pavel
PS: Not that I'm advocating nvidia junk, and of course it is way
easier to cause corruption from the kernel.
--
(about SSSCA) "I don't say this lightly. However, I really think that the U.S.
no longer is classifiable as a democracy, but rather as a plutocracy." --hpa

2002-07-02 01:37:50

by Horst von Brand

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Pavel Machek <[email protected]> said:

[...]

> Actually, then you should taint the kernel for starting X, too... Anything
> running with root privileges can scribble over memory.

Come on, a wild pointer in a random program running as root won't ever
bring the kernel down, as a wild pointer in a module certainly can/will.
--
Horst von Brand [email protected]
Casilla 9G, Vin~a del Mar, Chile +56 32 672616

2002-07-02 16:32:57

by Bill Davidsen

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Mon, 24 Jun 2002, David S. Miller wrote:

> This has to do with facts, not opinions. Since we lack the source to
> their drivers, we have no idea if some bug in their driver is
> scribbling over (i.e. corrupting) memory. It is therefore an unknown
> which makes it a waste of time for us to pursue the bug report.

By that logic if source is freely available the kernel should not be
marked tainted, even if the source license is not GPL, as in you can get
it and use it to debug, but the license is something like BSD, or the
Kermit limited redistribution, etc.

I'm asking in general, not about just one particular binary-only driver.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-07-02 17:02:27

by Alexandre Pereira Nunes

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

Bill Davidsen wrote:

>On Mon, 24 Jun 2002, David S. Miller wrote:
>
>
>
>>This has to do with facts, not opinions. Since we lack the source to
>>their drivers, we have no idea if some bug in their driver is
>>scribbling over (i.e. corrupting) memory. It is therefore an unknown
>>which makes it a waste of time for us to pursue the bug report.
>>
>>
>
>By that logic if source is freely available the kernel should not be
>marked tainted, even if the source license is not GPL, as in you can get
>it and use it to debug, but the license is something like BSD, or the
>Kermit limited redistribution, etc.
>
>I'm asking in general, not about just one particular binary-only driver.
>
>
>

How does this taint stuff actually work? Is it just a marker, or does it
impose any restrictions?

While I made every effort to send nvidia all the information pertinent to
the reported bug, I also found that the source to the OS-dependent parts is
in fact (at least partially) available, albeit under an absurdly restrictive
license. If someone else is interested in looking at it, one of the
files in the distribution contains the mm code and all the general
interfacing to the kernel.

I agree it's nvidia's responsibility to check its own source, but help
is always welcome when it's true help after all.

Last weekend I patched 2.4.19-pre10-ac2 with the latest preempt-kernel
patch, and since then I have been unable to reproduce the crash, though I
didn't stress the machine enough for lack of time, so this is just an
informative report in case someone wants to try.

Cheers,

Alexandre


2002-07-02 23:23:07

by Matthias Andree

Subject: Re: PROBLEM: 2.4.19-pre10-ac2 bug in page_alloc.c:131

On Tue, 02 Jul 2002, Alexandre P. Nunes wrote:

> How does this taint stuff actually work? Is it just a marker, or does it
> impose any restrictions?
>
> While I made every effort to send nvidia all the information pertinent to
> the reported bug, I also found that the source to the OS-dependent parts is
> in fact (at least partially) available, albeit under an absurdly restrictive
> license. If someone else is interested in looking at it, one of the
> files in the distribution contains the mm code and all the general
> interfacing to the kernel.
>
> I agree it's nvidia's responsibility to check its own source, but help
> is always welcome when it's true help after all.
>
> Last weekend I patched 2.4.19-pre10-ac2 with the latest preempt-kernel
> patch, and since then I have been unable to reproduce the crash, though I
> didn't stress the machine enough for lack of time, so this is just an
> informative report in case someone wants to try.

I didn't get the start of this thread, but I have seen bugs at
page_alloc.c:131 and :117 with a "stock" 2.4.19-pre10-ac2. It's not really
Alan's version because I have Chris Mason's reiserfs logging patches, a
BSD super.c fix and the bridge-netfilter patch.

The bug usually strikes at shutdown of the X server, and I have yet to
see this with the open-source nv X11 driver.

Does providing Nvidia with information do any good? Do they respond to bug
reports and actually fix bugs? It might be that they just need to track
new VM behaviour, but I can save the time of writing a bug report if
it's either no good or if someone already did.

--
Matthias Andree