2001-11-15 17:00:31

by Gerd Knorr

[permalink] [raw]
Subject: kiobuf / vm bug

Hi,

I think I have found a kiobuf-related bug in the VM of recent linux
kernels. 2.4.13 is fine, 2.4.14-pre1 doesn't boot my machine,
2.4.14-pre2 + newer kernels are broken.

/me runs a kernel with a few v4l-related patches and my current 0.8.x
bttv version (available from http://bytesex.org/patches/ +
http://bytesex.org/bttv/).

With this kernel I can trigger the following BUG():


ksymoops 2.4.3 on i686 2.4.15-pre4. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.4.15-pre4/ (default)
-m /boot/System.map-2.4.15-pre4 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Error (expand_objects): cannot stat(/modules/ide-disk.o) for ide-disk
Error (expand_objects): cannot stat(/modules/ide-probe-mod.o) for ide-probe-mod
Error (expand_objects): cannot stat(/modules/ide-mod.o) for ide-mod
Error (expand_objects): cannot stat(/modules/ext2.o) for ext2
Warning (map_ksym_to_module): cannot match loaded module ext2 to a unique module object. Trace may not be reliable.
kernel BUG at page_alloc.c:84!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0129e5b>] Tainted: PF
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 0000001f ebx: c1f1e300 ecx: c01e22a0 edx: 00003528
esi: c1f1e300 edi: f94e2000 ebp: 00000000 esp: f3da7ebc
ds: 0018 es: 0018 ss: 0018
Process setiathome (pid: 738, stackpage=f3da7000)
Stack: c01c0ee7 00000054 c1f1e300 00000000 f94e2000 00000000 f2ee69a0 f2ee69c4
00000000 f94e2060 f94e2000 c012a6f3 c0121120 f2ee69ec f2ee69a0 f94cdfa0
f94bd6d8 f94e2000 f2ee69ec f94c6b4e f2ee69ec f7ffb800 f2ee69ec f7ffb800
Call Trace: [<c012a6f3>] [<c0121120>] [<f94cdfa0>] [<f94bd6d8>] [<f94c6b4e>]
[<f94cdfa0>] [<f94c3b3d>] [<f94cdfa0>] [<f94cdfa0>] [<f94c3c7c>] [<f94cdfa0>]
[<c0107f9d>] [<f94cdfa0>] [<c0108106>]
Code: 0f 0b 83 c4 08 8b 46 18 a8 80 74 11 6a 56 68 e7 0e 1c c0 e8

>>EIP; c0129e5a <__free_pages_ok+aa/29c> <=====
Trace; c012a6f2 <__free_pages+1a/1c>
Trace; c0121120 <unmap_kiobuf+34/48>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94bd6d8 <.bss.end+4e1a/????>
Trace; f94c6b4e <END_OF_CODE+e290/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94c3b3c <END_OF_CODE+b27e/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; f94c3c7c <END_OF_CODE+b3be/????>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; c0107f9c <handle_IRQ_event+30/5c>
Trace; f94cdfa0 <END_OF_CODE+156e2/????>
Trace; c0108106 <do_IRQ+6a/a8>
Code; c0129e5a <__free_pages_ok+aa/29c>
00000000 <_EIP>:
Code; c0129e5a <__free_pages_ok+aa/29c> <=====
0: 0f 0b ud2a <=====
Code; c0129e5c <__free_pages_ok+ac/29c>
2: 83 c4 08 add $0x8,%esp
Code; c0129e5e <__free_pages_ok+ae/29c>
5: 8b 46 18 mov 0x18(%esi),%eax
Code; c0129e62 <__free_pages_ok+b2/29c>
8: a8 80 test $0x80,%al
Code; c0129e64 <__free_pages_ok+b4/29c>
a: 74 11 je 1d <_EIP+0x1d> c0129e76 <__free_pages_ok+c6/29c>
Code; c0129e66 <__free_pages_ok+b6/29c>
c: 6a 56 push $0x56
Code; c0129e68 <__free_pages_ok+b8/29c>
e: 68 e7 0e 1c c0 push $0xc01c0ee7
Code; c0129e6c <__free_pages_ok+bc/29c>
13: e8 00 00 00 00 call 18 <_EIP+0x18> c0129e72 <__free_pages_ok+c2/29c>

<0>Kernel panic: Aiee, killing interrupt handler!
cpu: 0, clocks: 1002240, slice: 501120

2 warnings and 4 errors issued. Results may not be reliable.


The Oops seems to be triggered by the following actions:

(1) the application maps /dev/video0. bttv 0.8.x simply returns some
shared anonymous memory to as mapping.
(2) the application asks the driver to capture a frame. bttv will lock
down the anonymous memory using kiobufs for I/O and prepare
everything for DMA xfer.
(3) The applications exits for some reason, i.e. the anonymous memory
will be unmapped while the DMA transfer is active and the pages are
locked down for I/O.
(4) The DMA xfer is done and bttv's irq handler cleans up everything.
This includes calling unlock_kiovec+unmap_kiobuf for the locked
pages. The unmap_kiobuf call at this point triggeres the Oops
listed above ...

Anyone has a idea what is going wrong here?

Gerd

--
Gerd Knorr <[email protected]>


2001-11-15 17:35:26

by Gerd Knorr

[permalink] [raw]
Subject: Re: kiobuf / vm bug

Gerd Knorr wrote:
> Hi,
>
> I think I have found a kiobuf-related bug in the VM of recent linux
> kernels. 2.4.13 is fine, 2.4.14-pre1 doesn't boot my machine,
> 2.4.14-pre2 + newer kernels are broken.

[ ... ]

ok, the patch below (suggested by Andrea) fixes it.

Gerd

--------------------- cut here -----------------
--- 2.4.15-pre4/mm/memory.c~ Tue Nov 13 10:52:01 2001
+++ 2.4.15-pre4/mm/memory.c Thu Nov 15 18:24:16 2001
@@ -588,7 +588,7 @@
if (map) {
if (iobuf->locked)
UnlockPage(map);
- __free_page(map);
+ page_cache_release(map);
}
}

2001-11-15 17:47:49

by Hugh Dickins

[permalink] [raw]
Subject: [PATCH] Re: kiobuf / vm bug

On Thu, 15 Nov 2001, Gerd Knorr wrote:
>
> I think I have found a kiobuf-related bug in the VM of recent linux
> kernels. 2.4.13 is fine, 2.4.14-pre1 doesn't boot my machine,
> 2.4.14-pre2 + newer kernels are broken.
>
> /me runs a kernel with a few v4l-related patches and my current 0.8.x
> bttv version (available from http://bytesex.org/patches/ +
> http://bytesex.org/bttv/).
>
> With this kernel I can trigger the following BUG():
> ksymoops 2.4.3 on i686 2.4.15-pre4. Options used
> kernel BUG at page_alloc.c:84!
> >>EIP; c0129e5a <__free_pages_ok+aa/29c> <=====
> Trace; c012a6f2 <__free_pages+1a/1c>
> Trace; c0121120 <unmap_kiobuf+34/48>
>
> The Oops seems to be triggered by the following actions:
>
> (1) the application maps /dev/video0. bttv 0.8.x simply returns some
> shared anonymous memory to as mapping.
> (2) the application asks the driver to capture a frame. bttv will lock
> down the anonymous memory using kiobufs for I/O and prepare
> everything for DMA xfer.
> (3) The applications exits for some reason, i.e. the anonymous memory
> will be unmapped while the DMA transfer is active and the pages are
> locked down for I/O.
> (4) The DMA xfer is done and bttv's irq handler cleans up everything.
> This includes calling unlock_kiovec+unmap_kiobuf for the locked
> pages. The unmap_kiobuf call at this point triggeres the Oops
> listed above ...
>
> Anyone has a idea what is going wrong here?

Yes, page_alloc.c line 84 is the "if (PageLRU(page)) BUG();" check.

I think I know what's going on here, it passed right through my mind
(and out the other side) when Linus made the changes for anon pages
on LRU. map_user_kiobuf and unmap_kiobuf are not using the standard
get/release primitives used elsewhere in memory.c, so they've slipped
through the net of what needed to be fixed up for anon pages on LRU.

Please try patch below and let us all know if it solves your problem
(and doesn't introduce others). The first hunk is irrelevant, just
satisfies my preference for using complementarily-named operations.

Hugh

--- 2.4.15-pre4/mm/memory.c Sun Nov 4 17:29:14 2001
+++ linux/mm/memory.c Thu Nov 15 17:30:36 2001
@@ -520,7 +520,7 @@
map = get_page_map(map);
if (map) {
flush_dcache_page(map);
- atomic_inc(&map->count);
+ page_cache_get(map);
} else
printk (KERN_INFO "Mapped page missing [%d]\n", i);
spin_unlock(&mm->page_table_lock);
@@ -588,7 +588,7 @@
if (map) {
if (iobuf->locked)
UnlockPage(map);
- __free_page(map);
+ page_cache_release(map);
}
}


2001-11-15 21:42:29

by Keith Owens

[permalink] [raw]
Subject: ksymoops and initrd (was kiobuf / vm bug)

On Thu, 15 Nov 2001 17:55:31 +0100,
Gerd Knorr <[email protected]> wrote:
>ksymoops 2.4.3 on i686 2.4.15-pre4. Options used
>Error (expand_objects): cannot stat(/modules/ide-disk.o) for ide-disk
>Error (expand_objects): cannot stat(/modules/ide-probe-mod.o) for ide-probe-mod
>Error (expand_objects): cannot stat(/modules/ide-mod.o) for ide-mod
>Error (expand_objects): cannot stat(/modules/ext2.o) for ext2
>Warning (map_ksym_to_module): cannot match loaded module ext2 to a unique module object. Trace may not be reliable.

You are using ksymoops to decode an oops from a system booted with
modules in initrd. The module pathnames in initrd do not match the
pathnames for the on disk modules which causes those warning messages.
When decoding an oops from an initrd boot, use 'ksymoops -i'.