2003-01-27 15:31:46

by Martin MOKREJŠ

[permalink] [raw]
Subject: 2.4.21-pre3 kernel crash

Hi,
yesterday I've hit a bug on my laptopo ASUS L3800C ;)
with linux kernel 2.4.21-pre3. I think the crash is related to my tunes
with hdparm() (enabling 32bit, enabling DMA, enabling udma5), but I'm not
sure, although I could check, if someone asks. Currently, I can 100%
reproduce the bug (i.e. cannot boot with that kernel).

The crash I actually observe while booting. One of the init scripts starts up also
network and adds default gateway. That's where it crashes. Please Cc: me
in replies.

# ksymoops -K -o /lib/modules/2.4.21-pre3/ -m /boot/System.map-2.4.21-pre3 kernel.crash
ksymoops 2.4.8 on i686 2.4.19-gentoo-r10. Options used
-V (default)
-K (specified)
-l /proc/modules (default)
-o /lib/modules/2.4.21-pre3/ (specified)
-m /boot/System.map-2.4.21-pre3 (specified)

No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Kernel BUG at panic.c: 141!
invalid operand: 0000
CPU: 0
EIP: 0010:[<c011a3e7>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000026 ebx: 00000000 ecx: 00000001 edx: 00000001
esi: c03e5920 edi: 00001000 ebp: 00000000 esp: f764fbf0
ds: 0018 es: 0018 ss: 0018
Process runscript.sh (pid: 1038, stackpage=f764f000)
Stack: c02fee40 0000009b c024dfc1 0000009b 00000000 00000014 00000014 00000001
f7e32000 f7e35000 c03e59d0 00000000 c03e5920 c024e1b4 c03e5920 f7e2d700
00000002 f79f6c80 00000000 c03e5920 c03e5920 c03e59d0 f7e2d700 c03e5920
Call trace: [<c024dfc1>] [<c024e1b4>] [<c024e6df>] [<c024a11d>] [<c01b2380>]
[<c023fc5e>] [<c01b2380>] [<c0248d71>] [<c0248eeb>] [<c024902a>] [<c021b186>]
[<c011f3ca>] [<c013f5bf>] [<c012c41d>] [<c012ccde>] [<c012d0d0>] [<c012d258>]
[<c012d0d0>] [<c0143312>] [<c01437a6>] [<c0143cb5>] [<c0105cc0>] [<c010744f>]
Code: 0f 0b 8d 00 5f fb 2f c0 90 eb fe 90 90 90 90 90 90 90 90 90


>>EIP; c011a3e7 <__out_of_line_bug+17/30> <=====

>>esi; c03e5920 <ide_hwifs+0/2af8>

Trace; c024dfc1 <ide_build_sglist+181/1a0>
Trace; c024e1b4 <ide_build_dmatable+54/1a0>
Trace; c024e6df <__ide_dma_read+3f/150>
Trace; c024a11d <do_rw_disk+1ed/740>
Trace; c01b2380 <reiserfs_find_actor+0/20>
Trace; c023fc5e <ide_wait_stat+fe/140>
Trace; c01b2380 <reiserfs_find_actor+0/20>
Trace; c0248d71 <start_request+1c1/240>
Trace; c0248eeb <ide_do_request+ab/1a0>
Trace; c024902a <do_ide_request+1a/20>
Trace; c021b186 <generic_unplug_device+36/40>
Trace; c011f3ca <__run_task_queue+5a/70>
Trace; c013f5bf <block_sync_page+1f/30>
Trace; c012c41d <___wait_on_page+bd/c0>
Trace; c012ccde <do_generic_file_read+2fe/450>
Trace; c012d0d0 <file_read_actor+0/e0>
Trace; c012d258 <generic_file_read+a8/150>
Trace; c012d0d0 <file_read_actor+0/e0>
Trace; c0143312 <kernel_read+72/80>
Trace; c01437a6 <prepare_binprm+136/150>
Trace; c0143cb5 <do_execve+e5/220>
Trace; c0105cc0 <sys_execve+50/80>
Trace; c010744f <system_call+33/38>

Code; c011a3e7 <__out_of_line_bug+17/30>
00000000 <_EIP>:
Code; c011a3e7 <__out_of_line_bug+17/30> <=====
0: 0f 0b ud2a <=====
Code; c011a3e9 <__out_of_line_bug+19/30>
2: 8d 00 lea (%eax),%eax
Code; c011a3eb <__out_of_line_bug+1b/30>
4: 5f pop %edi
Code; c011a3ec <__out_of_line_bug+1c/30>
5: fb sti
Code; c011a3ed <__out_of_line_bug+1d/30>
6: 2f das
Code; c011a3ee <__out_of_line_bug+1e/30>
7: c0 90 eb fe 90 90 90 rclb $0x90,0x9090feeb(%eax)
Code; c011a3f5 <__out_of_line_bug+25/30>
e: 90 nop
Code; c011a3f6 <__out_of_line_bug+26/30>
f: 90 nop
Code; c011a3f7 <__out_of_line_bug+27/30>
10: 90 nop
Code; c011a3f8 <__out_of_line_bug+28/30>
11: 90 nop
Code; c011a3f9 <__out_of_line_bug+29/30>
12: 90 nop
Code; c011a3fa <__out_of_line_bug+2a/30>
13: 90 nop

--
Martin Mokrejs <[email protected]>, <[email protected]>
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics <http://mips.gsf.de>
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax:?+49-89-3187 3585


2003-01-27 16:44:18

by Ross Biro

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

This looks like the same problem I ran into with IDE and highmem not
getting along. Try compiling your kernel with out highmem enabled and
see what happenes.

Ross

Martin MOKREJ? wrote:

>
>
>
>Trace; c024dfc1 <ide_build_sglist+181/1a0>
>Trace; c024e1b4 <ide_build_dmatable+54/1a0>
>Trace; c024e6df <__ide_dma_read+3f/150>
>
>
>


2003-01-27 17:04:42

by Martin MOKREJŠ

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, 27 Jan 2003, Ross Biro wrote:

> This looks like the same problem I ran into with IDE and highmem not
> getting along. Try compiling your kernel with out highmem enabled and
> see what happenes.

Yes, that "fixes" it. Any "better solution"? ;-)

> >Trace; c024dfc1 <ide_build_sglist+181/1a0>
> >Trace; c024e1b4 <ide_build_dmatable+54/1a0>
> >Trace; c024e6df <__ide_dma_read+3f/150>

--
Martin Mokrejs <[email protected]>, <[email protected]>
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics <http://mips.gsf.de>
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax:?+49-89-3187 3585

2003-01-27 18:52:16

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, 2003-01-27 at 17:53, Ross Biro wrote:
> This looks like the same problem I ran into with IDE and highmem not
> getting along. Try compiling your kernel with out highmem enabled and
> see what happenes.

Indeed, looking at the code, it seems ide_build_sglist() doesn't worry
much about highmem, just picks bh->b_data, assume it's a virtual
address, and gives that to pci_map_sg(). I beleive, at least for highmem
pages, it should rather pick bh->b_page and bh_offset(bh)

I can hack something, maybe tonight, but I can't test HIGHMEM for a while
here. Interestingly, I had no problem report on PPC from users using IDE
with highmem though.

Ben.

2003-01-27 19:10:37

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

Ok, after a second look, 2.4.20 seem correct. Here's a patch against 2.4.21-pre3
doing it like 2.4.20 did.

Let me know if it helps


===== drivers/ide/ide-dma.c 1.7 vs edited =====
--- 1.7/drivers/ide/ide-dma.c Tue Dec 10 21:21:57 2002
+++ edited/drivers/ide/ide-dma.c Mon Jan 27 20:17:35 2003
@@ -249,36 +249,54 @@
{
struct buffer_head *bh;
struct scatterlist *sg = hwif->sg_table;
+ unsigned long lastdataend = ~0UL;
int nents = 0;

if (hwif->sg_dma_active)
BUG();

+ hwif->sg_dma_direction = ddir;
+
bh = rq->bh;
do {
- unsigned char *virt_addr = bh->b_data;
- unsigned int size = bh->b_size;
+ struct scatterlist *sge;
+
+ /*
+ * continue segment from before?
+ */
+ if (bh_phys(bh) == lastdataend) {
+ sg[nents - 1].length += bh->b_size;
+ lastdataend += bh->b_size;
+ continue;
+ }

+ /*
+ * start new segment
+ */
if (nents >= PRD_ENTRIES)
return 0;

- while ((bh = bh->b_reqnext) != NULL) {
- if ((virt_addr + size) != (unsigned char *) bh->b_data)
- break;
- size += bh->b_size;
+ sge = &sg[nents];
+ memset(sge, 0, sizeof(*sge));
+
+ if (bh->b_page) {
+ sge->page = bh->b_page;
+ sge->offset = bh_offset(bh);
+ } else {
+ if (((unsigned long) bh->b_data) < PAGE_SIZE)
+ BUG();
+
+ sge->address = bh->b_data;
}
- memset(&sg[nents], 0, sizeof(*sg));
- sg[nents].address = virt_addr;
- sg[nents].length = size;
- if(size == 0)
- BUG();
+
+ sge->length = bh->b_size;
+ lastdataend = bh_phys(bh) + bh->b_size;
nents++;
- } while (bh != NULL);
+ } while ((bh = bh->b_reqnext) != NULL);

if(nents == 0)
BUG();

- hwif->sg_dma_direction = ddir;
return pci_map_sg(hwif->pci_dev, sg, nents, ddir);
}


2003-01-27 19:16:39

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, Jan 27 2003, Benjamin Herrenschmidt wrote:
> On Mon, 2003-01-27 at 17:53, Ross Biro wrote:
> > This looks like the same problem I ran into with IDE and highmem not
> > getting along. Try compiling your kernel with out highmem enabled and
> > see what happenes.
>
> Indeed, looking at the code, it seems ide_build_sglist() doesn't worry
> much about highmem, just picks bh->b_data, assume it's a virtual
> address, and gives that to pci_map_sg(). I beleive, at least for highmem
> pages, it should rather pick bh->b_page and bh_offset(bh)

Exactly, just take a look at 2.4.20.

> I can hack something, maybe tonight, but I can't test HIGHMEM for a while
> here. Interestingly, I had no problem report on PPC from users using IDE
> with highmem though.

Then noone is using highmem on ppc with 2.4.21-pre yet, it will
definitely barf on the very first highmem bh->b_page

--
Jens Axboe

2003-01-27 19:14:34

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, Jan 27 2003, Martin MOKREJ? wrote:
> On Mon, 27 Jan 2003, Ross Biro wrote:
>
> > This looks like the same problem I ran into with IDE and highmem not
> > getting along. Try compiling your kernel with out highmem enabled and
> > see what happenes.
>
> Yes, that "fixes" it. Any "better solution"? ;-)
>
> > >Trace; c024dfc1 <ide_build_sglist+181/1a0>
> > >Trace; c024e1b4 <ide_build_dmatable+54/1a0>
> > >Trace; c024e6df <__ide_dma_read+3f/150>

Someone completely lost the highmem capable scatterlist setup, *boggle*.
This should fix it.

===== drivers/ide/ide-dma.c 1.7 vs edited =====
--- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
+++ edited/drivers/ide/ide-dma.c Mon Jan 27 20:22:06 2003
@@ -249,6 +249,7 @@
{
struct buffer_head *bh;
struct scatterlist *sg = hwif->sg_table;
+ unsigned long lastdataend = ~0UL;
int nents = 0;

if (hwif->sg_dma_active)
@@ -256,22 +257,28 @@

bh = rq->bh;
do {
- unsigned char *virt_addr = bh->b_data;
- unsigned int size = bh->b_size;
+ if (bh_phys(bh) == lastdataend) {
+ sg[nents - 1].length += bh->b_size;
+ lastdataend += bh->b_size;
+ continue;
+ }

if (nents >= PRD_ENTRIES)
return 0;

- while ((bh = bh->b_reqnext) != NULL) {
- if ((virt_addr + size) != (unsigned char *) bh->b_data)
- break;
- size += bh->b_size;
- }
memset(&sg[nents], 0, sizeof(*sg));
- sg[nents].address = virt_addr;
- sg[nents].length = size;
- if(size == 0)
- BUG();
+ if (bh->b_page) {
+ sg[nents].page = bh->b_page;
+ sg[nents].offset = bh_offset(bh);
+ } else {
+ if (((unsigned long) bh->b_data) < PAGE_SIZE)
+ BUG();
+
+ sg[nents].address = bh->b_data;
+ }
+
+ sg[nents].length = bh->b_size;
+ lastdataend = bh_phys(bh) + bh->b_size;
nents++;
} while (bh != NULL);


--
Jens Axboe

2003-01-27 19:35:56

by Alan Cox

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

> On Mon, 2003-01-27 at 17:53, Ross Biro wrote:
> > This looks like the same problem I ran into with IDE and highmem not
> > getting along. Try compiling your kernel with out highmem enabled and
> > see what happenes.
>
> Indeed, looking at the code, it seems ide_build_sglist() doesn't worry
> much about highmem, just picks bh->b_data, assume it's a virtual
> address, and gives that to pci_map_sg(). I beleive, at least for highmem
> pages, it should rather pick bh->b_page and bh_offset(bh)
>
> I can hack something, maybe tonight, but I can't test HIGHMEM for a while
> here. Interestingly, I had no problem report on PPC from users using IDE
> with highmem though.

I don't see how 2.4 IDE would be getting highmem pages. 2.5 IDE does handle
this and does need to

2003-01-27 19:40:17

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, Jan 27 2003, Alan Cox wrote:
> > On Mon, 2003-01-27 at 17:53, Ross Biro wrote:
> > > This looks like the same problem I ran into with IDE and highmem not
> > > getting along. Try compiling your kernel with out highmem enabled and
> > > see what happenes.
> >
> > Indeed, looking at the code, it seems ide_build_sglist() doesn't worry
> > much about highmem, just picks bh->b_data, assume it's a virtual
> > address, and gives that to pci_map_sg(). I beleive, at least for highmem
> > pages, it should rather pick bh->b_page and bh_offset(bh)
> >
> > I can hack something, maybe tonight, but I can't test HIGHMEM for a while
> > here. Interestingly, I had no problem report on PPC from users using IDE
> > with highmem though.
>
> I don't see how 2.4 IDE would be getting highmem pages. 2.5 IDE does handle
> this and does need to

?

The block-highmem patch is in the 2.4 kernels since 2.4.20-pre2/3 (I
forget which). __ide_dma_on() calls ide_toggle_bounce() which turns on
full 32-bit dma for that drive, if it's a disk. So IDE will be getting
highmem pages for io if you have them.

--
Jens Axboe

2003-01-27 20:18:28

by Edward Tandi

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

FYI It works!

Thanks, I actually reported high-memory and DMA problems on the
13th-15th of Jan (under the title of Linux 2.4.21-pre3-ac3 and KT400)
but I thought it was VIA specific. I knew it was only time before a
fix...

The VIA audio issue is still there though ;-)

Ed-T.

On Mon, 2003-01-27 at 19:23, Jens Axboe wrote:
> On Mon, Jan 27 2003, Martin MOKREJ? wrote:
> > On Mon, 27 Jan 2003, Ross Biro wrote:
> >
> > > This looks like the same problem I ran into with IDE and highmem not
> > > getting along. Try compiling your kernel with out highmem enabled and
> > > see what happenes.
> >
> > Yes, that "fixes" it. Any "better solution"? ;-)
> >
> > > >Trace; c024dfc1 <ide_build_sglist+181/1a0>
> > > >Trace; c024e1b4 <ide_build_dmatable+54/1a0>
> > > >Trace; c024e6df <__ide_dma_read+3f/150>
>
> Someone completely lost the highmem capable scatterlist setup, *boggle*.
> This should fix it.
>
> ===== drivers/ide/ide-dma.c 1.7 vs edited =====
> --- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
> +++ edited/drivers/ide/ide-dma.c Mon Jan 27 20:22:06 2003
> @@ -249,6 +249,7 @@
> {
> struct buffer_head *bh;
> struct scatterlist *sg = hwif->sg_table;
> + unsigned long lastdataend = ~0UL;
> int nents = 0;
>
> if (hwif->sg_dma_active)
> @@ -256,22 +257,28 @@
>
> bh = rq->bh;
> do {
> - unsigned char *virt_addr = bh->b_data;
> - unsigned int size = bh->b_size;
> + if (bh_phys(bh) == lastdataend) {
> + sg[nents - 1].length += bh->b_size;
> + lastdataend += bh->b_size;
> + continue;
> + }
>
> if (nents >= PRD_ENTRIES)
> return 0;
>
> - while ((bh = bh->b_reqnext) != NULL) {
> - if ((virt_addr + size) != (unsigned char *) bh->b_data)
> - break;
> - size += bh->b_size;
> - }
> memset(&sg[nents], 0, sizeof(*sg));
> - sg[nents].address = virt_addr;
> - sg[nents].length = size;
> - if(size == 0)
> - BUG();
> + if (bh->b_page) {
> + sg[nents].page = bh->b_page;
> + sg[nents].offset = bh_offset(bh);
> + } else {
> + if (((unsigned long) bh->b_data) < PAGE_SIZE)
> + BUG();
> +
> + sg[nents].address = bh->b_data;
> + }
> +
> + sg[nents].length = bh->b_size;
> + lastdataend = bh_phys(bh) + bh->b_size;
> nents++;
> } while (bh != NULL);
>

2003-01-27 20:27:15

by Jeff Garzik

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, Jan 27, 2003 at 08:27:43PM +0000, Edward Tandi wrote:
> The VIA audio issue is still there though ;-)

What issue?

Alan has patches that should address support for VT8233 in
via82cxxx_audio driver, in this -ac kernel.

Jeff



2003-01-27 20:38:40

by Edward Tandi

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, 2003-01-27 at 20:36, Jeff Garzik wrote:
> On Mon, Jan 27, 2003 at 08:27:43PM +0000, Edward Tandi wrote:
> > The VIA audio issue is still there though ;-)
>
> What issue?
>
> Alan has patches that should address support for VT8233 in
> via82cxxx_audio driver, in this -ac kernel.
>
> Jeff

>From the original e-mail (-pre3-ac3) slightly altered:

I am running Linux on an ASUS A7V8X, VIA KT400 chipset motherboard. The
processor is a 1.5GHz Athlon XP. I started experimenting with new-ish
kernels again because of the general lack of kernel support for this
chipset in stock kernels.

...

2) The OSS audio driver. It works and this is the main reason why I use
this version of the kernel. The issue I have with it, is that if I start
certain applications (gaim, macromedia flash player 6 for example), esd
gets itself into some kind of hung/blocked state. When this happens, I
need to kill esd and re-start it. Games and xmms work however. The
reason I ask about this is that the downloaded driver from the viaarena
works on a stock kernel without this glitch. Is this a known problem?

The chip is a VT8235 and I'm happy that it mostly works in pre3 too. The
alsa driver reportedly works OK.

While I'm here, any chance of getting the bcm4400 NIC driver into the
2.4 kernel?

Ed-T.


2003-01-27 20:48:06

by Jeff Garzik

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, Jan 27, 2003 at 08:47:57PM +0000, Edward Tandi wrote:
> 2) The OSS audio driver. It works and this is the main reason why I use
> this version of the kernel. The issue I have with it, is that if I start
> certain applications (gaim, macromedia flash player 6 for example), esd
> gets itself into some kind of hung/blocked state. When this happens, I
> need to kill esd and re-start it. Games and xmms work however. The
> reason I ask about this is that the downloaded driver from the viaarena
> works on a stock kernel without this glitch. Is this a known problem?
>
> The chip is a VT8235 and I'm happy that it mostly works in pre3 too. The
> alsa driver reportedly works OK.

hmmm, not a known problem to me. Alan?



> While I'm here, any chance of getting the bcm4400 NIC driver into the
> 2.4 kernel?

Nope, no chance at all. It's more of BroadCom's abstraction layer,
bug-filled crap.

The route here is to fix up "b44" in 2.5, and then put it into 2.4.

Jeff



2003-01-27 23:09:29

by J.A. Magallon

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash


On 2003.01.27 Jens Axboe wrote:
> On Mon, Jan 27 2003, Martin MOKREJ? wrote:
> > On Mon, 27 Jan 2003, Ross Biro wrote:
> >
> > > This looks like the same problem I ran into with IDE and highmem not
> > > getting along. Try compiling your kernel with out highmem enabled and
> > > see what happenes.
> >
> > Yes, that "fixes" it. Any "better solution"? ;-)
> >
> > > >Trace; c024dfc1 <ide_build_sglist+181/1a0>
> > > >Trace; c024e1b4 <ide_build_dmatable+54/1a0>
> > > >Trace; c024e6df <__ide_dma_read+3f/150>
>
> Someone completely lost the highmem capable scatterlist setup, *boggle*.
> This should fix it.
>

Applied on top of 2.4.21-pre3-aa (no highmem), it makes my box hang on drive
detection:

PIIX4: IDE controller at PCI slot 00:07.1
PIIX4: chipset revision 1
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
hda:

<hangs here>

normal startup says:

hda: Conner Peripherals 1080MB - CFS1081A, ATA DISK drive
hdb: TOSHIBA DVD-ROM SD-M1712, ATAPI CD/DVD-ROM drive
blk: queue 403386e0, I/O limit 4095Mb (mask 0xffffffff)
hdc: YAMAHA CRW8424E, ATAPI CD/DVD-ROM drive
hdd: IOMEGA ZIP 250 ATAPI, ATAPI FLOPPY drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
hda: task_no_data_intr: error=0x04 { DriveStatusError }
hda: 2114180 sectors (1082 MB), CHS=524/64/63, DMA

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.21-pre3-jam4 (gcc 3.2.1 (Mandrake Linux 9.1 3.2.1-4mdk))

2003-01-27 23:14:57

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Tue, Jan 28 2003, J.A. Magallon wrote:
>
> On 2003.01.27 Jens Axboe wrote:
> > On Mon, Jan 27 2003, Martin MOKREJ? wrote:
> > > On Mon, 27 Jan 2003, Ross Biro wrote:
> > >
> > > > This looks like the same problem I ran into with IDE and highmem not
> > > > getting along. Try compiling your kernel with out highmem enabled and
> > > > see what happenes.
> > >
> > > Yes, that "fixes" it. Any "better solution"? ;-)
> > >
> > > > >Trace; c024dfc1 <ide_build_sglist+181/1a0>
> > > > >Trace; c024e1b4 <ide_build_dmatable+54/1a0>
> > > > >Trace; c024e6df <__ide_dma_read+3f/150>
> >
> > Someone completely lost the highmem capable scatterlist setup, *boggle*.
> > This should fix it.
> >
>
> Applied on top of 2.4.21-pre3-aa (no highmem), it makes my box hang on drive
> detection:
>
> PIIX4: IDE controller at PCI slot 00:07.1
> PIIX4: chipset revision 1
> PIIX4: not 100% native mode: will probe irqs later
> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA
> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
> hda:
>
> <hangs here>
>
> normal startup says:
>
> hda: Conner Peripherals 1080MB - CFS1081A, ATA DISK drive
> hdb: TOSHIBA DVD-ROM SD-M1712, ATAPI CD/DVD-ROM drive
> blk: queue 403386e0, I/O limit 4095Mb (mask 0xffffffff)
> hdc: YAMAHA CRW8424E, ATAPI CD/DVD-ROM drive
> hdd: IOMEGA ZIP 250 ATAPI, ATAPI FLOPPY drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: task_no_data_intr: error=0x04 { DriveStatusError }
> hda: 2114180 sectors (1082 MB), CHS=524/64/63, DMA

Reviewing the patch, it did have a nasty bug, didn't iterate
buffer_heads at all so a clustered request will fail. Attached version
should work.

===== drivers/ide/ide-dma.c 1.7 vs edited =====
--- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
+++ edited/drivers/ide/ide-dma.c Tue Jan 28 00:23:45 2003
@@ -249,6 +249,7 @@
{
struct buffer_head *bh;
struct scatterlist *sg = hwif->sg_table;
+ unsigned long lastdataend = ~0UL;
int nents = 0;

if (hwif->sg_dma_active)
@@ -256,24 +257,30 @@

bh = rq->bh;
do {
- unsigned char *virt_addr = bh->b_data;
- unsigned int size = bh->b_size;
+ if (bh_phys(bh) == lastdataend) {
+ sg[nents - 1].length += bh->b_size;
+ lastdataend += bh->b_size;
+ continue;
+ }

if (nents >= PRD_ENTRIES)
return 0;

- while ((bh = bh->b_reqnext) != NULL) {
- if ((virt_addr + size) != (unsigned char *) bh->b_data)
- break;
- size += bh->b_size;
- }
memset(&sg[nents], 0, sizeof(*sg));
- sg[nents].address = virt_addr;
- sg[nents].length = size;
- if(size == 0)
- BUG();
+ if (bh->b_page) {
+ sg[nents].page = bh->b_page;
+ sg[nents].offset = bh_offset(bh);
+ } else {
+ if (((unsigned long) bh->b_data) < PAGE_SIZE)
+ BUG();
+
+ sg[nents].address = bh->b_data;
+ }
+
+ sg[nents].length = bh->b_size;
+ lastdataend = bh_phys(bh) + bh->b_size;
nents++;
- } while (bh != NULL);
+ } while ((bh = bh->b_reqnext) != NULL);

if(nents == 0)
BUG();

--
Jens Axboe

2003-01-27 23:52:52

by J.A. Magallon

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash


On 2003.01.28 Jens Axboe wrote:
> On Tue, Jan 28 2003, J.A. Magallon wrote:
[...]
> > Applied on top of 2.4.21-pre3-aa (no highmem), it makes my box hang on drive
> > detection:
> >
> > PIIX4: IDE controller at PCI slot 00:07.1
> > PIIX4: chipset revision 1
> > PIIX4: not 100% native mode: will probe irqs later
> > ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA
> > ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
> > hda:
> >
[...]
>
> Reviewing the patch, it did have a nasty bug, didn't iterate
> buffer_heads at all so a clustered request will fail. Attached version
> should work.
>

Thanks, it works both on a box without himem and on one with 1Gb.

--
J.A. Magallon <[email protected]> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.1 (Cooker) for i586
Linux 2.4.21-pre3-jam4 (gcc 3.2.1 (Mandrake Linux 9.1 3.2.1-4mdk))

2003-01-28 08:59:36

by Alan Cox

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

> > certain applications (gaim, macromedia flash player 6 for example), esd
> > gets itself into some kind of hung/blocked state. When this happens, I
> > need to kill esd and re-start it. Games and xmms work however. The
> > reason I ask about this is that the downloaded driver from the viaarena
> > works on a stock kernel without this glitch. Is this a known problem?
> >
> > The chip is a VT8235 and I'm happy that it mostly works in pre3 too. The
> > alsa driver reportedly works OK.
>
> hmmm, not a known problem to me. Alan?

I have an idea actually, and if so a quickfix for Arjan. What happens is

app -> open /dev/audio (gets the 6 channel audio)
app2 (esd) -> open /dev/audio1 (gets the secondary dsp)

or app2 opens /dev/audio and we have a close v open wakeup bug.

I will investigate both paths

2003-01-28 10:40:48

by Martin MOKREJŠ

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Mon, 27 Jan 2003, Edward Tandi wrote:

> FYI It works!
>
> > ===== drivers/ide/ide-dma.c 1.7 vs edited =====
> > --- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
> > +++ edited/drivers/ide/ide-dma.c Mon Jan 27 20:22:06 2003

Hi,
yes, for me too. Thanks! BTW: Someone will have a look who removed that
part and what else got removed too. ;)

--
Martin Mokrejs <[email protected]>, <[email protected]>
PGP5.0i key is at http://www.natur.cuni.cz/~mmokrejs
MIPS / Institute for Bioinformatics <http://mips.gsf.de>
GSF - National Research Center for Environment and Health
Ingolstaedter Landstrasse 1, D-85764 Neuherberg, Germany
tel.: +49-89-3187 3683 , fax:?+49-89-3187 3585

2003-01-28 16:32:11

by Larry Sendlosky

[permalink] [raw]
Subject: RE: 2.4.21-pre3 kernel crash

I was glad to see the physical page support in 2.4.20.
(and also noticed that the current BK tree clobbered it
on a patch set from Alan).

One question,

+ lastdataend = bh_phys(bh) + bh->b_size;

bh_phys(x) uses bh->b_page. Does it make a difference
if bh->b_page is zero? What if someone combines virt and phys
buffer addresses in bh list?

thanks
larry


-----Original Message-----
From: Jens Axboe [mailto:[email protected]]
Sent: Monday, January 27, 2003 6:24 PM
To: J.A. Magallon
Cc: [email protected]
Subject: Re: 2.4.21-pre3 kernel crash


On Tue, Jan 28 2003, J.A. Magallon wrote:
>
> On 2003.01.27 Jens Axboe wrote:
> > On Mon, Jan 27 2003, Martin MOKREJ? wrote:
> > > On Mon, 27 Jan 2003, Ross Biro wrote:
> > >
> > > > This looks like the same problem I ran into with IDE and highmem not
> > > > getting along. Try compiling your kernel with out highmem enabled and
> > > > see what happenes.
> > >
> > > Yes, that "fixes" it. Any "better solution"? ;-)
> > >
> > > > >Trace; c024dfc1 <ide_build_sglist+181/1a0>
> > > > >Trace; c024e1b4 <ide_build_dmatable+54/1a0>
> > > > >Trace; c024e6df <__ide_dma_read+3f/150>
> >
> > Someone completely lost the highmem capable scatterlist setup, *boggle*.
> > This should fix it.
> >
>
> Applied on top of 2.4.21-pre3-aa (no highmem), it makes my box hang on drive
> detection:
>
> PIIX4: IDE controller at PCI slot 00:07.1
> PIIX4: chipset revision 1
> PIIX4: not 100% native mode: will probe irqs later
> ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:DMA
> ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
> hda:
>
> <hangs here>
>
> normal startup says:
>
> hda: Conner Peripherals 1080MB - CFS1081A, ATA DISK drive
> hdb: TOSHIBA DVD-ROM SD-M1712, ATAPI CD/DVD-ROM drive
> blk: queue 403386e0, I/O limit 4095Mb (mask 0xffffffff)
> hdc: YAMAHA CRW8424E, ATAPI CD/DVD-ROM drive
> hdd: IOMEGA ZIP 250 ATAPI, ATAPI FLOPPY drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> ide1 at 0x170-0x177,0x376 on irq 15
> hda: task_no_data_intr: status=0x51 { DriveReady SeekComplete Error }
> hda: task_no_data_intr: error=0x04 { DriveStatusError }
> hda: 2114180 sectors (1082 MB), CHS=524/64/63, DMA

Reviewing the patch, it did have a nasty bug, didn't iterate
buffer_heads at all so a clustered request will fail. Attached version
should work.

===== drivers/ide/ide-dma.c 1.7 vs edited =====
--- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
+++ edited/drivers/ide/ide-dma.c Tue Jan 28 00:23:45 2003
@@ -249,6 +249,7 @@
{
struct buffer_head *bh;
struct scatterlist *sg = hwif->sg_table;
+ unsigned long lastdataend = ~0UL;
int nents = 0;

if (hwif->sg_dma_active)
@@ -256,24 +257,30 @@

bh = rq->bh;
do {
- unsigned char *virt_addr = bh->b_data;
- unsigned int size = bh->b_size;
+ if (bh_phys(bh) == lastdataend) {
+ sg[nents - 1].length += bh->b_size;
+ lastdataend += bh->b_size;
+ continue;
+ }

if (nents >= PRD_ENTRIES)
return 0;

- while ((bh = bh->b_reqnext) != NULL) {
- if ((virt_addr + size) != (unsigned char *) bh->b_data)
- break;
- size += bh->b_size;
- }
memset(&sg[nents], 0, sizeof(*sg));
- sg[nents].address = virt_addr;
- sg[nents].length = size;
- if(size == 0)
- BUG();
+ if (bh->b_page) {
+ sg[nents].page = bh->b_page;
+ sg[nents].offset = bh_offset(bh);
+ } else {
+ if (((unsigned long) bh->b_data) < PAGE_SIZE)
+ BUG();
+
+ sg[nents].address = bh->b_data;
+ }
+
+ sg[nents].length = bh->b_size;
+ lastdataend = bh_phys(bh) + bh->b_size;
nents++;
- } while (bh != NULL);
+ } while ((bh = bh->b_reqnext) != NULL);

if(nents == 0)
BUG();

--
Jens Axboe

2003-01-28 21:57:09

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Tue, Jan 28 2003, Larry Sendlosky wrote:
> I was glad to see the physical page support in 2.4.20.
> (and also noticed that the current BK tree clobbered it
> on a patch set from Alan).
>
> One question,
>
> + lastdataend = bh_phys(bh) + bh->b_size;
>
> bh_phys(x) uses bh->b_page. Does it make a difference
> if bh->b_page is zero? What if someone combines virt and phys
> buffer addresses in bh list?

Yes good catch! New version attached.

===== drivers/ide/ide-dma.c 1.7 vs edited =====
--- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
+++ edited/drivers/ide/ide-dma.c Tue Jan 28 23:04:32 2003
@@ -249,6 +249,7 @@
{
struct buffer_head *bh;
struct scatterlist *sg = hwif->sg_table;
+ unsigned long lastdataend = ~0UL;
int nents = 0;

if (hwif->sg_dma_active)
@@ -256,24 +257,42 @@

bh = rq->bh;
do {
- unsigned char *virt_addr = bh->b_data;
- unsigned int size = bh->b_size;
+ int contig = 0;
+
+ if (bh->b_page) {
+ if (bh_phys(bh) == lastdataend)
+ contig = 1;
+ } else {
+ if ((unsigned long) bh->b_data == lastdataend)
+ contig = 1;
+ }
+
+ if (contig) {
+ sg[nents - 1].length += bh->b_size;
+ lastdataend += bh->b_size;
+ continue;
+ }

if (nents >= PRD_ENTRIES)
return 0;

- while ((bh = bh->b_reqnext) != NULL) {
- if ((virt_addr + size) != (unsigned char *) bh->b_data)
- break;
- size += bh->b_size;
- }
memset(&sg[nents], 0, sizeof(*sg));
- sg[nents].address = virt_addr;
- sg[nents].length = size;
- if(size == 0)
- BUG();
+
+ if (bh->b_page) {
+ sg[nents].page = bh->b_page;
+ sg[nents].offset = bh_offset(bh);
+ lastdataend = bh_phys(bh) + bh->b_size;
+ } else {
+ if ((unsigned long) bh->b_data < PAGE_SIZE)
+ BUG();
+
+ sg[nents].address = bh->b_data;
+ lastdataend = (unsigned long) bh->b_data + bh->b_size;
+ }
+
+ sg[nents].length = bh->b_size;
nents++;
- } while (bh != NULL);
+ } while ((bh = bh->b_reqnext) != NULL);

if(nents == 0)
BUG();

--
Jens Axboe

2003-01-29 08:14:15

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Tue, 2003-01-28 at 23:06, Jens Axboe wrote:
> On Tue, Jan 28 2003, Larry Sendlosky wrote:
> > I was glad to see the physical page support in 2.4.20.
> > (and also noticed that the current BK tree clobbered it
> > on a patch set from Alan).
> >
> > One question,
> >
> > + lastdataend = bh_phys(bh) + bh->b_size;
> >
> > bh_phys(x) uses bh->b_page. Does it make a difference
> > if bh->b_page is zero? What if someone combines virt and phys
> > buffer addresses in bh list?
>
> Yes good catch! New version attached.

That's interesting. I wasn't awaye you could have a request
containing such a "mixed" set of bh without valid pages.
Actually, I though b_page was always valid. Looking at
other drivers (typically the the csiss.c driver), it also
unconditionally use b_page & bh_phys(). So either we are
looking at a false problem, or that driver need fixing as
well.

Now assuming that mix can happen, I don't like the fact that
your new version will use lastdataend to compare against both
physical and virtual addresses.

While I agree they should usually not collide (as those virtual
addresses are supposed to be grok-able by pci_map_* they are
in the linear mapping while the physical ones will typically
be smaller than c0000000), there is still some risk of
collision especially with HIGHMEM, or did I miss something ?

Maybe the solution is to have an additional variable indicating
if the last bh was virtual or physical, and reset lastdataend
to ~0 when the current one is different...

Ben.

> ===== drivers/ide/ide-dma.c 1.7 vs edited =====
> --- 1.7/drivers/ide/ide-dma.c Wed Nov 20 18:46:24 2002
> +++ edited/drivers/ide/ide-dma.c Tue Jan 28 23:04:32 2003
> @@ -249,6 +249,7 @@
> {
> struct buffer_head *bh;
> struct scatterlist *sg = hwif->sg_table;
> + unsigned long lastdataend = ~0UL;
> int nents = 0;
>
> if (hwif->sg_dma_active)
> @@ -256,24 +257,42 @@
>
> bh = rq->bh;
> do {
> - unsigned char *virt_addr = bh->b_data;
> - unsigned int size = bh->b_size;
> + int contig = 0;
> +
> + if (bh->b_page) {
> + if (bh_phys(bh) == lastdataend)
> + contig = 1;
> + } else {
> + if ((unsigned long) bh->b_data == lastdataend)
> + contig = 1;
> + }
> +
> + if (contig) {
> + sg[nents - 1].length += bh->b_size;
> + lastdataend += bh->b_size;
> + continue;
> + }
>
> if (nents >= PRD_ENTRIES)
> return 0;
>
> - while ((bh = bh->b_reqnext) != NULL) {
> - if ((virt_addr + size) != (unsigned char *) bh->b_data)
> - break;
> - size += bh->b_size;
> - }
> memset(&sg[nents], 0, sizeof(*sg));
> - sg[nents].address = virt_addr;
> - sg[nents].length = size;
> - if(size == 0)
> - BUG();
> +
> + if (bh->b_page) {
> + sg[nents].page = bh->b_page;
> + sg[nents].offset = bh_offset(bh);
> + lastdataend = bh_phys(bh) + bh->b_size;
> + } else {
> + if ((unsigned long) bh->b_data < PAGE_SIZE)
> + BUG();
> +
> + sg[nents].address = bh->b_data;
> + lastdataend = (unsigned long) bh->b_data + bh->b_size;
> + }
> +
> + sg[nents].length = bh->b_size;
> nents++;
> - } while (bh != NULL);
> + } while ((bh = bh->b_reqnext) != NULL);
>
> if(nents == 0)
> BUG();
--
Benjamin Herrenschmidt <[email protected]>

2003-01-29 08:44:24

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Wed, Jan 29 2003, Benjamin Herrenschmidt wrote:
> On Tue, 2003-01-28 at 23:06, Jens Axboe wrote:
> > On Tue, Jan 28 2003, Larry Sendlosky wrote:
> > > I was glad to see the physical page support in 2.4.20.
> > > (and also noticed that the current BK tree clobbered it
> > > on a patch set from Alan).
> > >
> > > One question,
> > >
> > > + lastdataend = bh_phys(bh) + bh->b_size;
> > >
> > > bh_phys(x) uses bh->b_page. Does it make a difference
> > > if bh->b_page is zero? What if someone combines virt and phys
> > > buffer addresses in bh list?
> >
> > Yes good catch! New version attached.
>
> That's interesting. I wasn't awaye you could have a request
> containing such a "mixed" set of bh without valid pages.
> Actually, I though b_page was always valid. Looking at
> other drivers (typically the the csiss.c driver), it also
> unconditionally use b_page & bh_phys(). So either we are
> looking at a false problem, or that driver need fixing as
> well.

b_page is not always valid for IDE, this is a special case. ide-scsi
fabricates its own buffer_heads. cciss etc can rely on valid b_page
always.

> Now assuming that mix can happen, I don't like the fact that
> your new version will use lastdataend to compare against both
> physical and virtual addresses.

They should not be mixed in one call of build_sglist().

> Maybe the solution is to have an additional variable indicating
> if the last bh was virtual or physical, and reset lastdataend
> to ~0 when the current one is different...

That should be a BUG(), if anything.

--
Jens Axboe

2003-01-29 09:34:15

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Wed, 2003-01-29 at 09:53, Jens Axboe wrote:
> On Wed, Jan 29 2003, Benjamin Herrenschmidt wrote:
> > On Tue, 2003-01-28 at 23:06, Jens Axboe wrote:
> > > On Tue, Jan 28 2003, Larry Sendlosky wrote:
> > > > I was glad to see the physical page support in 2.4.20.
> > > > (and also noticed that the current BK tree clobbered it
> > > > on a patch set from Alan).
> > > >
> > > > One question,
> > > >
> > > > + lastdataend = bh_phys(bh) + bh->b_size;
> > > >
> > > > bh_phys(x) uses bh->b_page. Does it make a difference
> > > > if bh->b_page is zero? What if someone combines virt and phys
> > > > buffer addresses in bh list?
> > >
> > > Yes good catch! New version attached.
> >
> > That's interesting. I wasn't awaye you could have a request
> > containing such a "mixed" set of bh without valid pages.
> > Actually, I though b_page was always valid. Looking at
> > other drivers (typically the the csiss.c driver), it also
> > unconditionally use b_page & bh_phys(). So either we are
> > looking at a false problem, or that driver need fixing as
> > well.
>
> b_page is not always valid for IDE, this is a special case. ide-scsi
> fabricates its own buffer_heads. cciss etc can rely on valid b_page
> always.

Ok. Well, looking at ide-scsi, I see that:

#if 1
bh->b_data = sg->address;
#else
if (sg->address) {
bh->b_page = virt_to_page(sg->address);
bh->b_data = (char *) ((unsigned long) sg->address & ~PAGE_MASK);
} else if (sg->page) {
bh->b_page = sg->page;
bh->b_data = (char *) sg->offset;
}
#endif

Can't we just turn that #if 1 into #if 0 ? The case of non segmented
requests remains, where we have

bh->b_data = pc->scsi_cmd->request_buffer;
bh->b_size = pc->request_transfer;

But then, can't we use the same approach as above using virt_to_page() ?

I don't say we should do it now. I'm mostly asking for my own eductation
about all this, though I admit I don't like the "mixing" done in
ide_build_sglist.

Ben.

2003-01-29 10:07:37

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.4.21-pre3 kernel crash

On Wed, Jan 29 2003, Benjamin Herrenschmidt wrote:
> On Wed, 2003-01-29 at 09:53, Jens Axboe wrote:
> > On Wed, Jan 29 2003, Benjamin Herrenschmidt wrote:
> > > On Tue, 2003-01-28 at 23:06, Jens Axboe wrote:
> > > > On Tue, Jan 28 2003, Larry Sendlosky wrote:
> > > > > I was glad to see the physical page support in 2.4.20.
> > > > > (and also noticed that the current BK tree clobbered it
> > > > > on a patch set from Alan).
> > > > >
> > > > > One question,
> > > > >
> > > > > + lastdataend = bh_phys(bh) + bh->b_size;
> > > > >
> > > > > bh_phys(x) uses bh->b_page. Does it make a difference
> > > > > if bh->b_page is zero? What if someone combines virt and phys
> > > > > buffer addresses in bh list?
> > > >
> > > > Yes good catch! New version attached.
> > >
> > > That's interesting. I wasn't awaye you could have a request
> > > containing such a "mixed" set of bh without valid pages.
> > > Actually, I though b_page was always valid. Looking at
> > > other drivers (typically the the csiss.c driver), it also
> > > unconditionally use b_page & bh_phys(). So either we are
> > > looking at a false problem, or that driver need fixing as
> > > well.
> >
> > b_page is not always valid for IDE, this is a special case. ide-scsi
> > fabricates its own buffer_heads. cciss etc can rely on valid b_page
> > always.
>
> Ok. Well, looking at ide-scsi, I see that:
>
> #if 1
> bh->b_data = sg->address;
> #else
> if (sg->address) {
> bh->b_page = virt_to_page(sg->address);
> bh->b_data = (char *) ((unsigned long) sg->address & ~PAGE_MASK);
> } else if (sg->page) {
> bh->b_page = sg->page;
> bh->b_data = (char *) sg->offset;
> }
> #endif
>
> Can't we just turn that #if 1 into #if 0 ? The case of non segmented
> requests remains, where we have
>
> bh->b_data = pc->scsi_cmd->request_buffer;
> bh->b_size = pc->request_transfer;
>
> But then, can't we use the same approach as above using virt_to_page() ?
>
> I don't say we should do it now. I'm mostly asking for my own eductation
> about all this, though I admit I don't like the "mixing" done in
> ide_build_sglist.

Sure that is a possibility, there's definitely nothing wrong with doing
so.

--
Jens Axboe