The current elevator_linus() doesn't obey true elevator
scheduling, and causes I/O livelock under frequent random write
traffic. In such an environment, I/O (read/write) transactions may be
delayed almost indefinitely (more than 1 hour).
Problem:
The current elevator_linus() traverses the I/O request queue from the
tail to the head. When the current request has a smaller sector number
than the request at the head of the queue, it is always placed just after
the head.
This means that if requests in some sector range are continuously
generated, a request with a larger sector number is always placed
last and has no chance to move to the front, i.e. it is never scheduled.
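In rough pseudo-form the insertion scan looks like this (a simplified
sketch of the logic for illustration, not the exact elevator.c source):

/*
 * Simplified sketch (not the exact source): walk from the tail toward
 * the head, passing over queued requests until one is found that is
 * IN_ORDER before the new request.
 */
entry = real_head;
while ((entry = entry->prev) != head) {
	tmp = blkdev_entry_to_request(entry);
	if (IN_ORDER(tmp, req))		/* tmp belongs before req */
		break;
	tmp->elevator_sequence--;	/* tmp is passed over once more */
}
/*
 * If req has the smallest sector number, the loop never breaks and req
 * is inserted right behind the head.  A steady stream of low-sector
 * requests therefore keeps jumping in front of an already queued
 * high-sector request, which starves.
 */
list_add(&req->queue, entry);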
This is not hypothetical but actually observed. Running a random
disk write benchmark can completely suppress other disk I/O for this
reason.
The following patch fixes this problem. It still doesn't implement
strict elevator scheduling, but it does much better. Additionally, it
may be better to give reads extra priority over writes to obtain
better response times, but this patch doesn't do that.
diff -ru linux-2.4.0-test11-pre2/drivers/block/elevator.c linux-2.4.0-test11-pre2-test5/drivers/block/elevator.c
--- linux-2.4.0-test11-pre2/drivers/block/elevator.c Wed Aug 23 14:33:46 2000
+++ linux-2.4.0-test11-pre2-test5/drivers/block/elevator.c Tue Nov 21 15:32:01 2000
@@ -47,6 +47,11 @@
break;
tmp->elevator_sequence--;
}
+ if (entry == head) {
+ tmp = blkdev_entry_to_request(entry);
+ if (IN_ORDER(req, tmp))
+ entry = real_head->prev;
+ }
list_add(&req->queue, entry);
}
To implement complete elevator scheduling, preparing an alternate
waiting queue would be better, I think.
--
Computer Systems Laboratory, Fujitsu Labs.
[email protected]
On Tue, Nov 21 2000, [email protected] wrote:
> The current elevator_linus() doesn't obey true elevator
> scheduling, and causes I/O livelock under frequent random write
> traffic. In such an environment, I/O (read/write) transactions may be
> delayed almost indefinitely (more than 1 hour).
>
> Problem:
> The current elevator_linus() traverses the I/O request queue from the
> tail to the head. When the current request has a smaller sector number
> than the request at the head of the queue, it is always placed just after
> the head.
> This means that if requests in some sector range are continuously
> generated, a request with a larger sector number is always placed
> last and has no chance to move to the front, i.e. it is never scheduled.
Believe it or not, but this is intentional. In that regard, the
function name is a misnomer -- call it i/o scheduler instead :-)
The current settings in test11 cause this behaviour, because the
starting request sequence numbers are a 'bit' too high.
I'd be very interested if you could repeat your test with my
block patch applied. It has, among other things, a more fair (and
faster) insertion.
*.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.0-test11/blk-11.bz2
> [...] Additionally, it may be better to give reads extra priority over
> writes to obtain better response times, but this patch doesn't do that.
READs do have higher priority; they start out with lower sequence
numbers than WRITEs do:
latency = elevator_request_latency(elevator, rw);
With my patch, READ sequence start is now 8192. WRITEs are twice
that.
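That helper just picks the per-direction starting value, roughly like
this (a sketch of the idea; the field names are assumptions, not the
exact source):

/* Sketch only -- field names here are assumptions. */
static inline int elevator_request_latency(elevator_t *elevator, int rw)
{
	if (rw == READ)
		return elevator->read_latency;	/* e.g. 8192 with blk-11 */
	return elevator->write_latency;		/* e.g. 16384, twice the READ value */
}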
--
* Jens Axboe <[email protected]>
* SuSE Labs
Jens Axboe writes:
> > Problem:
> > The current elevator_linus() traverses the I/O request queue from the
> > tail to the head. When the current request has a smaller sector number
> > than the request at the head of the queue, it is always placed just after
> > the head.
> > This means that if requests in some sector range are continuously
> > generated, a request with a larger sector number is always placed
> > last and has no chance to move to the front, i.e. it is never scheduled.
>
> Believe it or not, but this is intentional. In that regard, the
> function name is a misnomer -- call it i/o scheduler instead :-)
I can't believe it is intentional. If it is true, the current kernel
is vulnerable to a kind of DoS attack. Yes, I'm actually a
victim of it.
Running ZD's ServerBench not only brings performance down, but my
machine blocks execution of all commands, including /bin/ps, /bin/ls, ...,
and they cannot be interrupted with ^C unless the benchmark is stopped. Those
commands have to be read from disk, but their requests are waiting at the end of
the I/O queue, so they are never executed.
Anyway, I'll try your patch.
--
Computer Systems Laboratory, Fujitsu Labs.
[email protected]
On Tue, Nov 21 2000, [email protected] wrote:
> > Believe it or not, but this is intentional. In that regard, the
> > function name is a misnomer -- call it i/o scheduler instead :-)
>
> I can't believe it is intentional. If it is true, the current kernel
> is vulnerable to a kind of DoS attack. Yes, I'm actually a
> victim of it.
The problem is caused by the too high sequence numbers in stock
kernel, as I said. Plus, the sequence decrementing doesn't take
request/buffer size into account. So the starvation _is_ limited,
the limit is just too high.
> Running ZD's ServerBench not only brings performance down, but my
> machine blocks execution of all commands, including /bin/ps, /bin/ls, ...,
> and they cannot be interrupted with ^C unless the benchmark is stopped. Those
> commands have to be read from disk, but their requests are waiting at the end of
> the I/O queue, so they are never executed.
If performance is down, then that problem is most likely elsewhere.
I/O limited benchmarking typically thrives on lots of request
latency -- with that comes better throughput for individual threads.
> Anyway, I'll try your patch.
Thanks
--
* Jens Axboe <[email protected]>
* SuSE Labs
Jens Axboe writes:
> On Tue, Nov 21 2000, [email protected] wrote:
> > I can't believe it is intentional. If it is true, the current kernel
> > is vulnerable to a kind of DoS attack. Yes, I'm actually a
> > victim of it.
>
> The problem is caused by the too high sequence numbers in stock
> kernel, as I said. Plus, the sequence decrementing doesn't take
> request/buffer size into account. So the starvation _is_ limited,
> the limit is just too high.
Yes, the current limit is 1000000, and if the I/O subsystem can manage
200 req/s, it will expire 5000 seconds later. That is why I said
"infinite (more than 1 hour)".
Why do you give extreme priority to lower-sector accesses, which breaks
the elevator scheduling idea?
> If performance is down, then that problem is most likely elsewhere.
> I/O limited benchmarking typically thrives on lots of request
> latency -- with that comes better throughput for individual threads.
No, the performance drop is caused by exactly this point. The server
benchmark has a standard-configuration workload which consists of several
kinds of tasks, such as CPU-intensive work, disk sequential read,
sequential write, random read, and random write.
The benchmark invokes lots of processes, each corresponding to a client,
and each accesses a different portion of a few large files. We have
enough memory to hold all dirty data at once (1GB without highmem), so
if no I/O blocking occurs, all processes can run simultaneously with
a limited amount of dirty-flush I/O.
If some processes eagerly access relatively low blocks, and another
process unfortunately requests a read of a higher block, that process is
blocked. Eventually this happens to a large portion of the processes, and
performance drops dramatically.
During measurements on test10 or test11, performance fluctuates heavily
and lots of idle time is observed with vmstat. Such instability
is not observed on test1 or test2.
--
Computer Systems Laboratory, Fujitsu Labs.
[email protected]
On Tue, Nov 21 2000, [email protected] wrote:
> > The problem is caused by the too high sequence numbers in stock
> > kernel, as I said. Plus, the sequence decrementing doesn't take
> > request/buffer size into account. So the starvation _is_ limited,
> > the limit is just too high.
>
> Yes, the current limit is 1000000, and if the I/O subsystem can manage
> 200 req/s, it will expire 5000 seconds later. That is why I said
> "infinite (more than 1 hour)".
> Why do you give extreme priority to lower-sector accesses, which breaks
> the elevator scheduling idea?
Look at how it works in my blk-11 patch. It's not adding extreme
priority to low-sector requests, it's always trying to sort sector-wise
in ascending order (which of course then tends to put lower sectors
at the front of the queue). blk-11 does it a bit differently though:
the sequence number is in sector-size units, and the queue scan
applies simple aging to requests just sitting there.
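Very roughly, the aging part of the scan amounts to something like this
(a sketch of the approach, not the blk-11 code itself):

/*
 * Sketch only, not the actual blk-11 code: age each passed-over request
 * by the size of the incoming request, so the passover budget is spent
 * in sectors rather than request counts, and stop passing a request
 * over once its budget is exhausted.
 */
while ((entry = entry->prev) != head) {
	tmp = blkdev_entry_to_request(entry);
	if (tmp->elevator_sequence <= 0)	/* tmp has waited long enough */
		break;				/* don't pass it over again */
	if (IN_ORDER(tmp, req))			/* found the sorted position */
		break;
	tmp->elevator_sequence -= req->nr_sectors;
}
list_add(&req->queue, entry);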
> > If performance is down, then that problem is most likely elsewhere.
> > I/O limited benchmarking typically thrives on lots of request
> > latency -- with that comes better throughput for individual threads.
>
> No, the performance drop is caused by exactly this point. The server
> benchmark has a standard-configuration workload which consists of several
> kinds of tasks, such as CPU-intensive work, disk sequential read,
> sequential write, random read, and random write.
>
> The benchmark invokes lots of processes, each corresponding to a client,
> and each accesses a different portion of a few large files. We have
> enough memory to hold all dirty data at once (1GB without highmem), so
> if no I/O blocking occurs, all processes can run simultaneously with
> a limited amount of dirty-flush I/O.
Flushing that much dirty data will always end up blocking waiting
for request slots.
> If some processes eagerly access relatively low blocks, and another
> process unfortunately requests a read of a higher block, that process is
> blocked. Eventually this happens to a large portion of the processes, and
> performance drops dramatically.
> During measurements on test10 or test11, performance fluctuates heavily
> and lots of idle time is observed with vmstat. Such instability
> is not observed on test1 or test2.
So check why there's much idle time -- the test2 elevator is identical
to the one in test11... Or check where it breaks exactly, what kernel
revision. Comparing test8 and test9 would be interesting.
--
* Jens Axboe <[email protected]>
* SuSE Labs
On Tue, Nov 21, 2000 at 05:28:40PM +0900, [email protected] wrote:
> @@ -47,6 +47,11 @@
> break;
> tmp->elevator_sequence--;
> }
> + if (entry == head) {
> + tmp = blkdev_entry_to_request(entry);
> + if (IN_ORDER(req, tmp))
> + entry = real_head->prev;
> + }
> list_add(&req->queue, entry);
> }
This patch is buggy: with SCSI, head doesn't point to a request, so it
doesn't make sense to compare against it.
> To implement complete elevator scheduling, preparing an alternate
Complete elevator scheduling has _just_ been implemented, but it's entirely
disabled. You should always enable it before running a 2.4.x kernel. To do
that, use elvtune or apply this patch:
--- 2.4.0-test11-pre6/include/linux/elevator.h.~1~ Wed Jul 19 06:43:10 2000
+++ 2.4.0-test11-pre6/include/linux/elevator.h Tue Nov 21 15:57:51 2000
@@ -100,8 +100,8 @@
((elevator_t) { \
0, /* not used */ \
\
- 1000000, /* read passovers */ \
- 2000000, /* write passovers */ \
+ 500, /* read passovers */ \
+ 1000, /* write passovers */ \
0, /* max_bomb_segments */ \
\
0, /* not used */ \
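With elvtune, something along the lines of 'elvtune -r 500 -w 1000 /dev/sda'
should give the same effect (see the elvtune usage output for the exact
option names).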
The "DoS" attack is the bug that is been fixed by implementing the new elevator
with proper scheduling.
Andrea
Jens Axboe writes:
> > The benchmark invokes lots of processes, each corresponding to a client,
> > and each accesses a different portion of a few large files. We have
> > enough memory to hold all dirty data at once (1GB without highmem), so
> > if no I/O blocking occurs, all processes can run simultaneously with
> > a limited amount of dirty-flush I/O.
>
> Flushing that much dirty data will always end up blocking waiting
> for request slots.
Yes, such benchmarks need a moderately long startup time, like 10 minutes
or more. During that period, physical read requests are issued and
wait on the queue, resulting in low performance numbers.
As file data accumulates in memory, CPU usage goes
up. Finally, once all required data is in the buffer cache, the system can
use 100% CPU.
Here is an example of "vmstat 10" output during the test.
It covers the beginning, startup, and full-usage states.
With the current settings, the hard threshold of dirty cache, beyond which
synchronous flushing is needed, is over 300MB, and this test doesn't reach
that limit.
During the startup period, "cache" steadily increases, but until the READ
traffic disappears, CPU usage stays very low.
If some of the read requests have lower priority than others, those
requests take an (extremely) long time to be served, and the startup
process never ends in a reasonable time.
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 0 0 0 863424 1760 13200 0 0 0 0 101 6 0 0 100
START CLIENTS
0 16 0 0 843448 2052 25796 0 0 322 0 587 458 1 2 97
0 16 0 0 827988 2052 40836 0 0 376 0 619 491 1 0 99
0 16 0 0 813276 2052 55120 0 0 357 0 630 489 1 0 98
DIRTY FLUSH STARTS
0 16 0 0 802488 2052 65608 0 0 262 34 541 420 1 0 99
9 more lines
0 16 0 0 717824 2052 148040 0 0 151 73 572 442 1 0 99
9 more lines
0 16 1 0 657216 2052 206980 0 0 119 71 790 602 2 1 98
9 more lines
0 16 1 0 652484 2052 211592 0 0 115 72 815 613 2 0 98
9 more lines
0 16 1 0 623256 2052 240072 0 0 20 119 432 424 1 0 99
9 more lines
0 16 1 0 598600 2052 264088 0 0 16 119 884 719 2 1 97
9 more lines
1 15 1 0 592156 2052 270332 0 0 15 119 1609 1185 5 1 94
0 16 1 0 591540 2052 270916 0 0 15 119 1475 1102 4 1 95
1 15 1 0 590920 2052 271512 0 0 15 119 1669 1243 5 1 94
1 15 1 0 590344 2052 272072 0 0 14 122 2091 1500 7 2 92
0 16 1 0 589752 2052 272644 0 0 14 120 2484 1767 8 2 91
1 15 1 0 589180 2052 273196 0 0 14 120 2674 1905 8 2 90
2 14 1 0 588608 2052 273748 0 0 14 122 3059 2142 10 2 88
0 16 1 0 588044 2052 274288 0 0 14 122 4430 3037 16 3 82
0 16 1 0 587524 2052 274800 0 0 13 123 5892 3913 21 4 75
1 14 2 0 587036 2052 275236 0 0 11 121 8525 6062 31 7 62
10 4 2 0 586688 2052 275576 0 0 9 124 12162 8902 47 10 43
14 1 2 0 586528 2052 275708 0 0 3 129 17605 7710 80 18 2
FULL CPU USAGE (No physical read request is issued)
15 0 2 0 586492 2052 275740 0 0 1 131 18843 6652 82 18 0
14 0 2 0 586484 2052 275748 0 0 0 132 19210 6340 82 18 0
CONTINUE TO THE END
--
Computer Systems Laboratory, Fujitsu Labs.
[email protected]
Jens Axboe writes:
> I'd be very interested if you could repeat your test with my
> block patch applied. It has, among other things, a more fair (and
> faster) insertion.
>
> *.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.0-test11/blk-11.bz2
This patch fixes the "DoS" behavior of the current queueing mechanism.
Even if I set the "pass-over" values to a very large number (1000000), it
runs stably.
Thank you for your patch.
From my understanding, the passover value is an option of the elevator
scheduler to prioritize long-waiting requests and improve interactive
response. In this test, the passover setting doesn't affect the
benchmark number, which is reasonable.
Will the patch be included in the next kernel?
BTW,
the major performance difference between test1 and test2 was caused by
whether the hard_dirty_limit is hit or not.
The current Linux has a lot of difficult-to-set parameters in
/proc/sys.
Once a system goes beyond some of these settable limits, its behavior
changes very sharply. bdf_prm.nfract in fs/buffer.c is one of the
difficult parameters. I hope for a tool to monitor or set these values.
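(For what it's worth, the bdflush parameters, including nfract, can at
least be inspected and changed through /proc/sys/vm/bdflush; the field
order is whatever bdf_prm defines in fs/buffer.c.)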
--
Computer Systems Laboratory, Fujitsu Labs.
[email protected]
On Wed, 22 Nov 2000 [email protected] wrote:
> The current Linux has a lot of difficult-to-set parameters in
> /proc/sys.
> Once a system goes beyond some of these settable limits, its behavior
> changes very sharply. bdf_prm.nfract in fs/buffer.c is one of the
> difficult parameters. I hope for a tool to monitor or set these values.
http://www.powertweak.org
(See CVS version). Helpful(?) advice, profiles, and easy to use UI.
If we missed anything, we take patches, and can always use extra hands :)
regards,
Davej.
--
| Dave Jones <[email protected]> http://www.suse.de/~davej
| SuSE Labs
Jens Axboe wrote:
> On Tue, Nov 21 2000, [email protected] wrote:
> > > Believe it or not, but this is intentional. In that regard, the
> > > function name is a misnomer -- call it i/o scheduler instead :-)
> >
> > I can't believe it is intentional. If it is true, the current kernel
> > is vulnerable to a kind of DoS attack. Yes, I'm actually a
> > victim of it.
>
> The problem is caused by the too high sequence numbers in stock
> kernel, as I said. Plus, the sequence decrementing doesn't take
> request/buffer size into account. So the starvation _is_ limited,
> the limit is just too high.
>
> > Running ZD's ServerBench not only brings performance down, but my
> > machine blocks execution of all commands, including /bin/ps, /bin/ls, ...,
> > and they cannot be interrupted with ^C unless the benchmark is stopped. Those
> > commands have to be read from disk, but their requests are waiting at the end of
> > the I/O queue, so they are never executed.
>
> If performance is down, then that problem is most likely elsewhere.
> I/O limited benchmarking typically thrives on lots of request
> latency -- with that comes better throughput for individual threads.
>
> > Anyway, I'll try your patch.
Well, this patch does help with the request starvation problem.
Unfortunately, it has introduced another problem.
I'm running 4 doio programs on an XFS partition with kiobuf I/O turned on.
I did see something about problems with the aic7xxx driver in test11, so this
may not be related to your patch.
I'm going to run without kiobuf to see if the problem still occurs.
XFS (dev: 8/17) mounting with KIOBUFIO
Start mounting filesystem: sd(8,17)
Ending clean XFS mount for filesystem: sd(8,17)
NMI Watchdog detected LOCKUP on CPU1, registers:
CPU: 1
EIP: 0010:[<c0217a9f>]
EFLAGS: 00000082
eax: c01b21ac ebx: c197b078 ecx: 00000000 edx: 00000012
esi: 00000286 edi: dfff7f70 ebp: dfff7f20 esp: dfff7f14
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=dfff7000)
Stack: c190fb20 04000001 00000020 dfff7f40 c010c539 00000012 c197b078
dfff7f70
00000240 c0331a40 00000012 dfff7f68 c010c73d 00000012 dfff7f70
c190fb20
c0108960 dfff6000 c0108960 c190fb20 00000001 dfff7fa4 c010a8c8
c0108960
Call Trace: [<c010c539>] [<c010c73d>] [<c0108960>] [<c0108960>] [<c010a8c8>]
[<c0108960>] [<c0108960>]
[<c0100018>] [<c010898f>] [<c0108a02>] [<c010a9be>]
Code: f3 90 7e f5 e9 1b a7 f9 ff 80 3d e4 e4 2e c0 00 f3 90 7e f5
Entering kdb (current=0xdfff6000, pid 0) on processor 1 due to WatchDog
Interrupt @ 0xc0217a9f
eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x00000000 edx = 0x00000012
esi = 0x00000286 edi = 0xdfff7f70 esp = 0xdfff7f14 eip = 0xc0217a9f
ebp = 0xdfff7f20 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00000082
xds = 0xc1970018 xes = 0xdfff0018 origeax = 0xc01b21ac &regs = 0xdfff7ee0
[1]kdb> bt
EBP EIP Function(args)
0x00000000c0217a9f stext_lock+0x43af
kernel .text.lock 0xc02136f0 0xc02136f0
0xc02197c0
0xdfff7f20 0x00000000c01b21c3 do_aic7xxx_isr+0x17 (0x12, 0xc197b078,
0xdfff7f70, 0x240, 0xc0331a40)
kernel .text 0xc0100000 0xc01b21ac 0xc01b225c
0xdfff7f40 0x00000000c010c539 handle_IRQ_event+0x4d (0x12, 0xdfff7f70,
0xc190fb20, 0xc0108960, 0xdfff6000)
kernel .text 0xc0100000 0xc010c4ec 0xc010c568
0xdfff7f68 0x00000000c010c73d do_IRQ+0x99 (0xc0108960, 0x0, 0xdfff6000,
0xdfff6000, 0xc0108960)
kernel .text 0xc0100000 0xc010c6a4 0xc010c790
0x00000000c010a8c8 ret_from_intr
kernel .text 0xc0100000 0xc010a8c8 0xc010a8e8
Interrupt registers:
eax = 0x00000000 ebx = 0xc0108960 ecx = 0x00000000 edx = 0xdfff6000
esi = 0xdfff6000 edi = 0xc0108960 esp = 0xdfff7fa4 eip = 0xc010898f
ebp = 0xdfff7fa4 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00000246
xds = 0xc0100018 xes = 0xdfff0018 origeax = 0xffffff12 &regs = 0xdfff7f70
0x00000000c010898f default_idle+0x2f
kernel .text 0xc0100000 0xc0108960 0xc0108998
0xdfff7fb8 0x00000000c0108a02 cpu_idle+0x42
kernel .text 0xc0100000 0xc01089c0 0xc0108a18
0xdfff7fc0 0x00000000c02fb5b9 start_secondary+0x25
kernel .text.init 0xc02f6000 0xc02fb594
0xc02fb5c0
On Fri, Dec 01 2000, Russell Cattelan wrote:
> > If performance is down, then that problem is most likely elsewhere.
> > I/O limited benchmarking typically thrives on lots of request
> > latency -- with that comes better throughput for individual threads.
> >
> > > Anyway, I'll try your patch.
>
> Well, this patch does help with the request starvation problem.
> Unfortunately, it has introduced another problem.
> I'm running 4 doio programs on an XFS partition with kiobuf I/O turned on.
This looks like a generic aic7xxx problem, and not block related. Since
you are doing such nice traces, what is the other CPU doing? CPU1
seems to be stuck grabbing the io_request_lock (for reasons not entirely
clear from reading the aic7xxx source...)
--
* Jens Axboe <[email protected]>
* SuSE Labs
Jens Axboe wrote:
> On Fri, Dec 01 2000, Russell Cattelan wrote:
> > > If performance is down, then that problem is most likely elsewhere.
> > > I/O limited benchmarking typically thrives on lots of request
> > > latency -- with that comes better throughput for individual threads.
> > >
> > > > Anyway, I'll try your patch.
> >
> > Well, this patch does help with the request starvation problem.
> > Unfortunately, it has introduced another problem.
> > I'm running 4 doio programs on an XFS partition with kiobuf I/O turned on.
>
> This looks like a generic aic7xxx problem, and not block related. Since
> you are doing such nice traces, what is the other CPU doing? CPU1
> seems to be stuck grabbing the io_request_lock (for reasons not entirely
> clear from reading the aic7xxx source...)
>
Sorry, I haven't been able to get a decent backtrace of the other processor.
According to Keith Owens, the maintainer of kdb, there is a race condition in
kdb and the NMI loop detection code that results in not being able to
switch CPUs.
I'll keep trying to dig up some more info.
I'm also seeing various other panics in XFS (well, pagebuf to be specific)
with this patch.
Nothing seems to be very consistent at this point.
OK, I did manage to switch processors.
Ok I did manage to switch processors.
Entering kdb (current=0xd7c0a000, pid 645) on processor 1 due to cpu switch
[1]kdb> bt
EBP EIP Function(args)
0x00000000c0216594 stext_lock+0x2ea4
kernel .text.lock 0xc02136f0 0xc02136f0
0xc02197c0
0xd7c0bf98 0x00000000c0155964 ext2_sync_file+0x2c (0xd8257560, 0xd7348220,
0x0, 0xd7c0a000)
kernel .text 0xc0100000 0xc0155938
0xc0155a40
0xd7c0bfbc 0x00000000c0136064 sys_fsync+0x54 (0x1, 0xbffff020, 0x0,
0xbffff048, 0x8051738)
kernel .text 0xc0100000 0xc0136010
0xc0136088
0x00000000c010a807 system_call+0x33
kernel .text 0xc0100000 0xc010a7d4
0xc010a80c
[1]kdb>
>
> --
> * Jens Axboe <[email protected]>
> * SuSE Labs
Jens Axboe wrote:
> On Fri, Dec 01 2000, Russell Cattelan wrote:
> > > If performance is down, then that problem is most likely elsewhere.
> > > I/O limited benchmarking typically thrives on lots of request
> > > latency -- with that comes better throughput for individual threads.
> > >
> > > > Anyway, I'll try your patch.
> >
> > Well, this patch does help with the request starvation problem.
> > Unfortunately, it has introduced another problem.
> > I'm running 4 doio programs on an XFS partition with kiobuf I/O turned on.
>
> This looks like a generic aic7xxx problem, and not block related. Since
> you are doing such nice traces, what is the other CPU doing? CPU1
> seems to be stuck grabbing the io_request_lock (for reasons not entirely
> clear from reading the aic7xxx source...)
>
OK, Keith gave me a quick hack to help with the race condition.
Here is the latest set of backtraces...
The actual accuracy of these backtraces... well, who knows, but it does
give us something to go on.
It doesn't make much sense to me right now, but I'm guessing the problem
starts with that do_divide_error.
I'm going to take a closer look at scsi_back_merge_fn.
This may have more to do with our/Chait's kiobuf modifications than
anything else.
XFS (dev: 8/20) mounting with KIOBUFIO
Start mounting filesystem: sd(8,20)
Ending clean XFS mount for filesystem: sd(8,20)
kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a10000
Unable to handle kernel NULL pointer dereference at virtual address
00000008
printing eip:
c019f8b5
*pde = 00000000
Entering kdb (current=0xc1910000, pid 5) on processor 1 Panic: Oops
due to panic @ 0xc019f8b5
eax = 0x00000002 ebx = 0x00000001 ecx = 0x00081478 edx = 0x00000000
esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
ebp = 0xc1911e9c xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010046
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc1911e60
[1]kdb> bt
EBP EIP Function(args)
0xc1911e9c 0x00000000c019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
0xc1957da0, 0xcfb05780, 0x80)
kernel .text 0xc0100000 0xc019f8a0
0xc019f98c
0xc1911f2c 0x00000000c016a0df __make_request+0x1af (0xc1923a98, 0x1,
0xcfb05780, 0x0, 0x814)
kernel .text 0xc0100000 0xc0169f30
0xc016a8a4
0xc1911f70 0x00000000c016a9c8 generic_make_request+0x124 (0x1, 0xcfb05780,
0x0, 0x0, 0x0)
kernel .text 0xc0100000 0xc016a8a4
0xc016aa50
0xc1911fac 0x00000000c016abde ll_rw_block+0x18e (0x1, 0x1, 0xc1911fd0, 0x0)
kernel .text 0xc0100000 0xc016aa50
0xc016ac58
0xc1911fd4 0x00000000c0138ed7 flush_dirty_buffers+0x97 (0x0, 0x10f00)
kernel .text 0xc0100000 0xc0138e40
0xc0138f24
0xc1911fec 0x00000000c01391ab bdflush+0x8f
kernel .text 0xc0100000 0xc013911c
0xc0139260
0x00000000c0108c9b kernel_thread+0x23
kernel .text 0xc0100000 0xc0108c78
0xc0108cb0
[1]kdb> go
Oops: 0000
CPU: 1
EIP: 0010:[<c019f8b5>]
EFLAGS: 00010046
eax: 00000002 ebx: 00000001 ecx: 00081478 edx: 00000000
esi: c1957da0 edi: c1923ac8 ebp: c1911e9c esp: c1911e94
ds: 0018 es: 0018 ss: 0018
Process kflushd (pid: 5, stackpage=c1911000)
Stack: cfb05780 c1923a98 c1911f2c c016a0df c1923a98 c1957da0 cfb05780
00000080
00000814 00081478 cfb05780 00000008 00000001 00000200 00000000
c1923ac0
00000000 0000000e c1910000 c1911efc c010c77e 00000246 00000814
def0e800
Call Trace: [<c016a0df>] [<c010c77e>] [<c010a8c8>] [<c016a9c8>]
[<c016abde>] [<c0138ed7>] [<c01391ab>]
[<c0108c9b>]
Code: 66 81 7a 08 00 10 0f 47 d8 8b 4a 2c 85 c9 74 19 0f b7 42 08
NMI Watchdog detected LOCKUP on CPU0, registers:
CPU: 0
EIP: 0010:[<c0217a98>]
EFLAGS: 00000086
eax: c01b21ac ebx: c197b078 ecx: 00000000 edx: 00000012
esi: 00000286 edi: c02f5f94 ebp: c02f5f44 esp: c02f5f38
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, stackpage=c02f5000)
Stack: c190fb20 04000001 00000000 c02f5f64 c010c539 00000012 c197b078
c02f5f94
00000240 c0331a40 00000012 c02f5f8c c010c73d 00000012 c02f5f94
c190fb20
c0108960 c02f4000 c0108960 c190fb20 00000000 c02f5fc8 c010a8c8
c0108960
Call Trace: [<c010c539>] [<c010c73d>] [<c0108960>] [<c0108960>]
[<c010a8c8>] [<c0108960>] [<c0108960>]
[<c0100018>] [<c010898f>] [<c0108a02>] [<c0105000>] [<c01001d0>]
Code: 80 3d 64 47 2e c0 00 f3 90 7e f5 e9 1b a7 f9 ff 80 3d 64 e3
Entering kdb (current=0xc02f4000, pid 0) on processor 0 due to WatchDog
Interrupt @ 0xc0217a98
eax = 0xc01b21ac ebx = 0xc197b078 ecx = 0x00000000 edx = 0x00000012
esi = 0x00000286 edi = 0xc02f5f94 esp = 0xc02f5f38 eip = 0xc0217a98
ebp = 0xc02f5f44 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00000086
xds = 0x00000018 xes = 0xc02f0018 origeax = 0xc01b21ac &regs = 0xc02f5f04
[0]kdb> bt
EBP EIP Function(args)
0x00000000c0217a98 stext_lock+0x43a8
kernel .text.lock 0xc02136f0 0xc02136f0
0xc02197c0
0xc02f5f44 0x00000000c01b21c3 do_aic7xxx_isr+0x17 (0x12, 0xc197b078,
0xc02f5f94, 0x240, 0xc0331a40)
kernel .text 0xc0100000 0xc01b21ac
0xc01b225c
0xc02f5f64 0x00000000c010c539 handle_IRQ_event+0x4d (0x12, 0xc02f5f94,
0xc190fb20, 0xc0108960, 0xc02f4000)
kernel .text 0xc0100000 0xc010c4ec
0xc010c568
0xc02f5f8c 0x00000000c010c73d do_IRQ+0x99 (0xc0108960, 0x0, 0xc02f4000,
0xc02f4000, 0xc0108960)
kernel .text 0xc0100000 0xc010c6a4
0xc010c790
0x00000000c010a8c8 ret_from_intr
kernel .text 0xc0100000 0xc010a8c8
0xc010a8e8
Interrupt registers:
eax = 0x00000000 ebx = 0xc0108960 ecx = 0x00000000 edx = 0xc02f4000
esi = 0xc02f4000 edi = 0xc0108960 esp = 0xc02f5fc8 eip = 0xc010898f
ebp = 0xc02f5fc8 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00000246
xds = 0xc0100018 xes = 0xc02f0018 origeax = 0xffffff12 &regs = 0xc02f5f94
0x00000000c010898f default_idle+0x2f
kernel .text 0xc0100000 0xc0108960
0xc0108998
0xc02f5fdc 0x00000000c0108a02 cpu_idle+0x42
kernel .text 0xc0100000 0xc01089c0
0xc0108a18
[0]kdb> cpu
Currently on cpu 0
Available cpus: 0, 1
[0]kdb> cpu 1
Entering kdb (current=0xc1910000, pid 5) on processor 1 due to cpu switch
[1]kdb> bt
EBP EIP Function(args)
0xc1911c88 0x00000000c010c387 __global_cli+0xb7
kernel .text 0xc0100000 0xc010c2d0
0xc010c424
0xc1911c9c 0x00000000c01793a7 rs_timer+0x37 (0x0)
kernel .text 0xc0100000 0xc0179370
0xc017946c
0xc1911cc4 0x00000000c01231b5 timer_bh+0x269 (0xc034de40, 0x20, 0x0)
kernel .text 0xc0100000 0xc0122f4c
0xc0123210
0xc1911cd8 0x00000000c0120248 bh_action+0x50 (0x0, 0x3, 0xc033a660)
kernel .text 0xc0100000 0xc01201f8
0xc01202a8
0xc1911cf0 0x00000000c012011b tasklet_hi_action+0x4f (0xc033a660, 0x260,
0xc0331a60)
kernel .text 0xc0100000 0xc01200cc
0xc0120154
0xc1911d10 0x00000000c011ffad do_softirq+0x5d (0xc1910000, 0xc02dca60)
kernel .text 0xc0100000 0xc011ff50
0xc011ffe0
0xc1911d2c 0x00000000c010c77e do_IRQ+0xda (0xc1910000, 0x0, 0x0,
0xc02dca60, 0xc1910000)
kernel .text 0xc0100000 0xc010c6a4
0xc010c790
0x00000000c010a8c8 ret_from_intr
kernel .text 0xc0100000 0xc010a8c8
0xc010a8e8
Interrupt registers:
eax = 0xc1910648 ebx = 0xc1910000 ecx = 0x00000000 edx = 0x00000000
esi = 0xc02dca60 edi = 0xc1910000 esp = 0xc1911d68 eip = 0xc011512e
ebp = 0xc1911d70 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00000246
xds = 0xc0330018 xes = 0xc0330018 origeax = 0xffffff13 &regs = 0xc1911d34
[1]more>
0x00000000c011512e exit_sighand+0x66 (0xc1910000)
kernel .text 0xc0100000 0xc01150c8
0xc0115134
0xc1911d88 0x00000000c011eef5 do_exit+0x219 (0xb, 0x0, 0x0, 0xc0114798,
0xc1911e50)
kernel .text 0xc0100000 0xc011ecdc
0xc011ef50
0xc1911da0 0x00000000c010aef0 do_divide_error (0xc1911e60, 0x0, 0x1,
0x81478, 0x0)
kernel .text 0xc0100000 0xc010aef0
0xc010af90
0x00000000c010a938 error_code+0x34
kernel .text 0xc0100000 0xc010a904
0xc010a940
Interrupt registers:
eax = 0x00000002 ebx = 0x00000001 ecx = 0x00081478 edx = 0x00000000
esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
ebp = 0xc1911e9c xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010046
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc1911e60
0x00000000c019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
0xc1957da0, 0xcfb05780, 0x80)
kernel .text 0xc0100000 0xc019f8a0
0xc019f98c
0xc1911f2c 0x00000000c016a0df __make_request+0x1af (0xc1923a98, 0x1,
0xcfb05780, 0x0, 0x814)
kernel .text 0xc0100000 0xc0169f30
0xc016a8a4
0xc1911f70 0x00000000c016a9c8 generic_make_request+0x124 (0x1, 0xcfb05780,
0x0, 0x0, 0x0)
kernel .text 0xc0100000 0xc016a8a4
0xc016aa50
0xc1911fac 0x00000000c016abde ll_rw_block+0x18e (0x1, 0x1, 0xc1911fd0, 0x0)
kernel .text 0xc0100000 0xc016aa50
0xc016ac58
0xc1911fd4 0x00000000c0138ed7 flush_dirty_buffers+0x97 (0x0, 0x10f00)
kernel .text 0xc0100000 0xc0138e40
0xc0138f24
[1]more>
0xc1911fec 0x00000000c01391ab bdflush+0x8f
kernel .text 0xc0100000 0xc013911c
0xc0139260
0x00000000c0108c9b kernel_thread+0x23
kernel .text 0xc0100000 0xc0108c78
0xc0108cb0
[1]kdb>
On Mon, Dec 04 2000, Russell Cattelan wrote:
> I'm going to take a closer look at scsi_back_merge_fn.
> This may have more to do with our/Chait's kiobuf modifications than
> anything else.
>
>
>
> XFS (dev: 8/20) mounting with KIOBUFIO
> Start mounting filesystem: sd(8,20)
> Ending clean XFS mount for filesystem: sd(8,20)
> kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a10000
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000008
> printing eip:
> c019f8b5
> *pde = 00000000
>
> Entering kdb (current=0xc1910000, pid 5) on processor 1 Panic: Oops
> due to panic @ 0xc019f8b5
> eax = 0x00000002 ebx = 0x00000001 ecx = 0x00081478 edx = 0x00000000
> esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
> ebp = 0xc1911e9c xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010046
> xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc1911e60
> [1]kdb> bt
> EBP EIP Function(args)
> 0xc1911e9c 0x00000000c019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
> 0xc1957da0, 0xcfb05780, 0x80)
> kernel .text 0xc0100000 0xc019f8a0
Ah, I see what it is now. The elevator is attempting to merge a buffer
head into a kiobuf-based request, poof. The attached diff should take
care of that in your tree.
--
* Jens Axboe <[email protected]>
* SuSE Labs
Jens Axboe wrote:
> On Mon, Dec 04 2000, Russell Cattelan wrote:
> > I'm going to take a closer look at scsi_back_merge_fn.
> > This may have more to do with our/Chait's kiobuf modifications than
> > anything else.
> >
> >
> >
> > XFS (dev: 8/20) mounting with KIOBUFIO
> > Start mounting filesystem: sd(8,20)
> > Ending clean XFS mount for filesystem: sd(8,20)
> > kmem_alloc doing a vmalloc 262144 size & PAGE_SIZE 0 rval=0xe0a10000
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 00000008
> > printing eip:
> > c019f8b5
> > *pde = 00000000
> >
> > Entering kdb (current=0xc1910000, pid 5) on processor 1 Panic: Oops
> > due to panic @ 0xc019f8b5
> > eax = 0x00000002 ebx = 0x00000001 ecx = 0x00081478 edx = 0x00000000
> > esi = 0xc1957da0 edi = 0xc1923ac8 esp = 0xc1911e94 eip = 0xc019f8b5
> > ebp = 0xc1911e9c xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010046
> > xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc1911e60
> > [1]kdb> bt
> > EBP EIP Function(args)
> > 0xc1911e9c 0x00000000c019f8b5 scsi_back_merge_fn_c+0x15 (0xc1923a98,
> > 0xc1957da0, 0xcfb05780, 0x80)
> > kernel .text 0xc0100000 0xc019f8a0
>
> Ah, I see what it is now. The elevator is attempting to merge a buffer
> head into a kiobuf-based request, poof. The attached diff should take
> care of that in your tree.
Hmm... yup, those are actually the mods made for kiobuf in our base XFS tree.
I wonder why the patch dropped them?
I should have caught that.
Thanks.
I'll let you know how things go.
>
>
> --
> * Jens Axboe <[email protected]>
> * SuSE Labs
>
> ------------------------------------------------------------------------
>
> xfs-elv-1
> Name: xfs-elv-1
> Type: Plain Text (text/plain)