2002-09-23 21:47:37

by Paul Larson

[permalink] [raw]
Subject: Bug in last night's bk test

The automated nightly testing turned up a bug on one of the test
machines last night. The system that had the problem was running ltp
and was a 2-way PII-550, 2GB ram, ext2. Here is the ksymoops dump:

ksymoops 2.4.5 on i686 2.4.18. Options used
-V (default)
-K (specified)
-L (specified)
-O (specified)
-m System.map (specified)

kernel BUG at ll_rw_blk.c:1802!
invalid operand: 0000
CPU: 0
EIP: 0060:[<c0215971>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010202
eax: 00000080 ebx: f7e3a418 ecx: f7dc8f60 edx: 000000a0
esi: f71f3960 edi: 000000a0 ebp: 00008cd8 esp: ef18bbf0
ds: 0068 es: 0068 ss: 0068
Stack: 00000000 00000000 00000000 0000000c c02159fd f71f3960 ef18bc4c c015b4b1
00000000 f71f3960 ef18bc4c c015b683 ef18bc4c 0809d000 ef18bc4c c015bd52
ef18bc4c 00000fff f7295a54 00000000 f71db900 ef18bc4c 00014000 f71f3960
Call Trace: [<c02159fd>] [<c015b4b1>] [<c015b683>] [<c015bd52>] [<c017decc>]
[<c0266b05>] [<c02674a3>] [<c0265c78>] [<c0248d20>] [<c0263a16>] [<c024889c>]
[<c01157f6>] [<c0214ba0>] [<c012d689>] [<c015be13>] [<c017decc>] [<c017df2e>]
[<c017decc>] [<c015be4f>] [<c012e0dd>] [<c012f502>] [<c012f54e>] [<c013f95a>]
[<c012e1c0>] [<c012c4c9>] [<c012123b>] [<c013efb0>] [<c013f2af>] [<c013faa6>]
[<c0107073>]
Code: 0f 0b 0a 07 ac a1 36 c0 8d b4 26 00 00 00 00 3b 49 44 74 0b


>>EIP; c0215971 <generic_make_request+e1/118> <=====

>>ebx; f7e3a418 <END_OF_CODE+3795a15c/????>
>>ecx; f7dc8f60 <END_OF_CODE+378e8ca4/????>
>>esi; f71f3960 <END_OF_CODE+36d136a4/????>
>>ebp; 00008cd8 Before first symbol
>>esp; ef18bbf0 <END_OF_CODE+2ecab934/????>

Trace; c02159fd <submit_bio+55/60>
Trace; c015b4b1 <dio_bio_submit+29/44>
Trace; c015b683 <dio_await_completion+13/44>
Trace; c015bd52 <direct_io_worker+1ce/1f4>
Trace; c017decc <ext2_get_blocks+0/38>
Trace; c0266b05 <ips_send_cmd+685/690>
Trace; c02674a3 <ips_getscb+3f/60>
Trace; c0265c78 <ips_next+718/7d8>
Trace; c0248d20 <scsi_done+0/90>
Trace; c0263a16 <ips_queue+246/2a4>
Trace; c024889c <scsi_dispatch_cmd+ec/17c>
Trace; c01157f6 <schedule+35e/3b0>
Trace; c0214ba0 <blk_run_queues+8c/9c>
Trace; c012d689 <wait_on_page_bit+c1/cc>
Trace; c015be13 <generic_direct_IO+9b/a8>
Trace; c017decc <ext2_get_blocks+0/38>
Trace; c017df2e <ext2_direct_IO+2a/30>
Trace; c017decc <ext2_get_blocks+0/38>
Trace; c015be4f <generic_file_direct_IO+2f/4f>
Trace; c012e0dd <__generic_file_aio_read+f1/1a4>
Trace; c012f502 <generic_file_readv+5e/78>
Trace; c012f54e <generic_file_writev+32/48>
Trace; c013f95a <do_readv_writev+186/278>
Trace; c012e1c0 <generic_file_read+0/88>
Trace; c012c4c9 <do_brk+109/1e4>
Trace; c012123b <update_process_times+27/30>
Trace; c013efb0 <generic_file_llseek+0/d8>
Trace; c013f2af <sys_lseek+6f/98>
Trace; c013faa6 <sys_readv+5a/6c>
Trace; c0107073 <syscall_call+7/b>

Code; c0215971 <generic_make_request+e1/118>
00000000 <_EIP>:
Code; c0215971 <generic_make_request+e1/118> <=====
0: 0f 0b ud2a <=====
Code; c0215973 <generic_make_request+e3/118>
2: 0a 07 or (%edi),%al
Code; c0215975 <generic_make_request+e5/118>
4: ac lods %ds:(%esi),%al
Code; c0215976 <generic_make_request+e6/118>
5: a1 36 c0 8d b4 mov 0xb48dc036,%eax
Code; c021597b <generic_make_request+eb/118>
a: 26 00 00 add %al,%es:(%eax)
Code; c021597e <generic_make_request+ee/118>
d: 00 00 add %al,(%eax)
Code; c0215980 <generic_make_request+f0/118>
f: 3b 49 44 cmp 0x44(%ecx),%ecx
Code; c0215983 <generic_make_request+f3/118>
12: 74 0b je 1f <_EIP+0x1f> c0215990 <generic_make_request+100/118>


It didn't hang the machine, or crash it. Just showed up in the logs.
This error did not show up in the previous night's test. Please let me
know if any other information would be helpful.

Thanks,
Paul Larson


2002-09-23 23:08:50

by Andrew Morton

[permalink] [raw]
Subject: Re: Bug in last night's bk test

Badari Pulavarty wrote:
>
> >
> > ...
> > Until that's done you'll need to set BIO_MAX_PAGES to 16 in
> > include/linux/bio.h
> >
>
> I am little confused here. I thought IPS driver can handle 64K IO.
> Infact, IPS_MAX_SG is set to 17. So it should be able to handle 68K.
> I have been told that it can handle more than that.. but for some
> reason it was set to 17.
>
> Paul, what kernel are u running ? 2.5.38 ?
>

Current bitkeeper has

#define BIO_MAX_PAGES (256)

That's a megabyte. It works fine with mpage.c. But direct-io.c
is still using BIO_MAX_PAGES. It really is building 1 megabyte
BIOs, which will break just about every device out there.

I think we just ask Linus to do the below until we get it fixed up?


--- 2.5.38-bk2/fs/direct-io.c~direct-io-size Mon Sep 23 16:12:25 2002
+++ 2.5.38-bk2-akpm/fs/direct-io.c Mon Sep 23 16:12:47 2002
@@ -26,7 +26,7 @@
* The largest-sized BIO which this code will assemble, in bytes. Set this
* to PAGE_SIZE if your drivers are broken.
*/
-#define DIO_BIO_MAX_SIZE BIO_MAX_SIZE
+#define DIO_BIO_MAX_SIZE (16*1024)

/*
* How many user pages to map in one call to get_user_pages(). This determines

.

2002-09-23 23:32:07

by Badari Pulavarty

[permalink] [raw]
Subject: Re: Bug in last night's bk test

Ok !! Making direct-io code use bio_add_page() is little tricky.
We operate on the same page multiple times to update the length of
the IO (in case of raw device). I will look at it closely.

Thanks,
Badari

> Current bitkeeper has
>
> #define BIO_MAX_PAGES (256)
>
> That's a megabyte. It works fine with mpage.c. But direct-io.c
> is still using BIO_MAX_PAGES. It really is building 1 megabyte
> BIOs, which will break just about every device out there.
>
> I think we just ask Linus to do the below until we get it fixed up?
>
>
> --- 2.5.38-bk2/fs/direct-io.c~direct-io-size Mon Sep 23 16:12:25 2002
> +++ 2.5.38-bk2-akpm/fs/direct-io.c Mon Sep 23 16:12:47 2002
> @@ -26,7 +26,7 @@
> * The largest-sized BIO which this code will assemble, in bytes. Set this
> * to PAGE_SIZE if your drivers are broken.
> */
> -#define DIO_BIO_MAX_SIZE BIO_MAX_SIZE
> +#define DIO_BIO_MAX_SIZE (16*1024)
>
> /*
> * How many user pages to map in one call to get_user_pages(). This determines
>
> .