2003-07-29 14:37:21

by Martin J. Bligh

Subject: Panic on 2.6.0-test1-mm1

The big box had this on the console ... looks like it was doing a
compile at the time ... sorry, only just noticed it after returning
from OLS, so don't have more context (2.6.0-test1-mm1).

kernel BUG at include/linux/list.h:149!
invalid operand: 0000 [#1]
SMP
CPU: 3
EIP: 0060:[<c0117f98>] Not tainted VLI
EFLAGS: 00010083
EIP is at pgd_dtor+0x64/0x8c
eax: c1685078 ebx: c1685060 ecx: c0288348 edx: c1685078
esi: 00000082 edi: c030b6a0 ebp: e9b9c000 esp: e9ea3ed0
ds: 007b es: 007b ss: 0068
Process cc1 (pid: 4439, threadinfo=e9ea2000 task=eac36690)
Stack: 00000000 f01fdecc c0139588 e9b9c1e0 f01fdecc 00000000 00000039 f01fdecc
0000000a f01fdf54 e9871000 c013a540 f01fdecc e9b9c000 f01e9000 00000024
f01e9010 f01fdecc c013ace8 f01fdecc f01e9010 00000024 f01fdecc f01fdfb8
Call Trace:
[<c0139588>] slab_destroy+0x40/0x124
[<c013a540>] free_block+0xfc/0x13c
[<c013ace8>] drain_array_locked+0x80/0xac
[<c013adef>] reap_timer_fnc+0xdb/0x1e0
[<c013ad14>] reap_timer_fnc+0x0/0x1e0
[<c0125aa5>] run_timer_softirq+0x13d/0x170
[<c0121f7c>] do_softirq+0x6c/0xcc
[<c01159df>] smp_apic_timer_interrupt+0x14b/0x158
[<c023a752>] apic_timer_interrupt+0x1a/0x20

Code: 80 50 26 00 00 8d 14 92 8d 1c d0 8d 53 18 8b 4a 04 39 11 74 0e 0f 0b 94 00 99 3e 24 c0 8d b6 00 00 00 00 8b 43 18 39 50 04 74 08 <0f> 0b 95 00 99 3e 24 c0 89 48 04 89 01 c7 43 18 00 01 10 00 c7
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing


2003-07-29 21:21:27

by Martin J. Bligh

Subject: Re: Panic on 2.6.0-test1-mm1

> The big box had this on the console ... looks like it was doing a
> compile at the time ... sorry, only just noticed it after returning
> from OLS, so don't have more context (2.6.0-test1-mm1).
>
> kernel BUG at include/linux/list.h:149!
> invalid operand: 0000 [#1]
> SMP
> CPU: 3
> EIP: 0060:[<c0117f98>] Not tainted VLI
> EFLAGS: 00010083
> EIP is at pgd_dtor+0x64/0x8c
> eax: c1685078 ebx: c1685060 ecx: c0288348 edx: c1685078
> esi: 00000082 edi: c030b6a0 ebp: e9b9c000 esp: e9ea3ed0
> ds: 007b es: 007b ss: 0068
> Process cc1 (pid: 4439, threadinfo=e9ea2000 task=eac36690)
> Stack: 00000000 f01fdecc c0139588 e9b9c1e0 f01fdecc 00000000 00000039 f01fdecc
> 0000000a f01fdf54 e9871000 c013a540 f01fdecc e9b9c000 f01e9000 00000024
> f01e9010 f01fdecc c013ace8 f01fdecc f01e9010 00000024 f01fdecc f01fdfb8
> Call Trace:
> [<c0139588>] slab_destroy+0x40/0x124
> [<c013a540>] free_block+0xfc/0x13c
> [<c013ace8>] drain_array_locked+0x80/0xac
> [<c013adef>] reap_timer_fnc+0xdb/0x1e0
> [<c013ad14>] reap_timer_fnc+0x0/0x1e0
> [<c0125aa5>] run_timer_softirq+0x13d/0x170
> [<c0121f7c>] do_softirq+0x6c/0xcc
> [<c01159df>] smp_apic_timer_interrupt+0x14b/0x158
> [<c023a752>] apic_timer_interrupt+0x1a/0x20
>
> Code: 80 50 26 00 00 8d 14 92 8d 1c d0 8d 53 18 8b 4a 04 39 11 74 0e 0f 0b 94 00 99 3e 24 c0 8d b6 00 00 00 00 8b 43 18 39 50 04 74 08 <0f> 0b 95 00 99 3e 24 c0 89 48 04 89 01 c7 43 18 00 01 10 00 c7
> <0>Kernel panic: Fatal exception in interrupt
> In interrupt handler - not syncing

Seems to be trivially reproducible by doing "make -j vmlinux".
I'll try your latest one to see if it's fixed already, I guess.

M.

kernel BUG at include/linux/list.h:149!
invalid operand: 0000 [#1]
SMP
CPU: 3
EIP: 0060:[<c0117f98>] Not tainted VLI
EFLAGS: 00010083
EIP is at pgd_dtor+0x64/0x8c
eax: c1573450 ebx: c1573438 ecx: c0288348 edx: c1573450
esi: 00000082 edi: c030b6a0 ebp: e2e1b000 esp: e2813ed4
ds: 007b es: 007b ss: 0068
Process cc1 (pid: 11439, threadinfo=e2812000 task=e4869980)
Stack: 00000000 f01fdecc c0139588 e2e1b1e0 f01fdecc 00000000 00000039 f01fdecc
00000017 f01fdf54 e05f4000 c013a540 f01fdecc e2e1b000 f01c6410 00000018
f01c6400 f01fdecc c013ac31 f01fdecc f01c6410 00000018 f01fdecc f01fdfb8
Call Trace:
[<c0139588>] slab_destroy+0x40/0x124
[<c013a540>] free_block+0xfc/0x13c
[<c013ac31>] drain_array+0x55/0x8c
[<c013ad14>] reap_timer_fnc+0x0/0x1e0
[<c013ada7>] reap_timer_fnc+0x93/0x1e0
[<c013ad14>] reap_timer_fnc+0x0/0x1e0
[<c0125aa5>] run_timer_softirq+0x13d/0x170
[<c0121f7c>] do_softirq+0x6c/0xcc
[<c01159df>] smp_apic_timer_interrupt+0x14b/0x158
[<c023a752>] apic_timer_interrupt+0x1a/0x20

Code: 80 50 26 00 00 8d 14 92 8d 1c d0 8d 53 18 8b 4a 04 39 11 74 0e 0f 0b 94 00
99 3e 24 c0 8d b6 00 00 00 00 8b 43 18 39 50 04 74 08 <0f> 0b 95 00 99 3e 24 c0
89 48 04 89 01 c7 43 18 00 01 10 00 c7
<0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

2003-07-30 15:01:46

by Martin J. Bligh

Subject: 2.6.0-test2-mm1 results

OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
Only noticeable thing is that -mm tree is consistently a little slower
at kernbench

Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                        Elapsed   System     User       CPU
2.5.74                    45.17    97.88   568.43   1474.75
2.5.74-mm1                45.84   109.66   568.05   1477.50
2.6.0-test1               45.25    98.63   568.45   1473.50
2.6.0-test2-mm1           45.38   101.47   569.16   1476.25
2.6.0-test2-mjb1          43.31    75.98   564.33   1478.00

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                        Elapsed   System     User       CPU
2.5.74                    45.74   114.56   571.62   1500.00
2.5.74-mm1                46.59   133.65   570.90   1511.50
2.6.0-test1               45.68   114.68   571.70   1503.00
2.6.0-test2-mm1           46.66   119.82   579.32   1497.25
2.6.0-test2-mjb1          44.03    87.85   569.97   1493.75

Kernbench: (make -j vmlinux, maximal tasks)
                        Elapsed   System     User       CPU
2.5.74                    46.11   115.86   571.77   1491.50
2.5.74-mm1                47.13   139.07   571.52   1509.25
2.6.0-test1               46.09   115.76   571.74   1491.25
2.6.0-test2-mm1           46.95   121.18   582.00   1497.50
2.6.0-test2-mjb1          44.08    85.54   570.57   1487.25


DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 1 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%       3.7%
2.5.74-mm1                 88.5%      10.9%
2.6.0-test1               103.0%       2.0%
2.6.0-test2-mm1            99.7%       3.1%
2.6.0-test2-mjb1          107.2%       3.6%

SDET 2 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%      53.7%
2.5.74-mm1                133.9%       1.4%
2.6.0-test1               136.4%       1.9%
2.6.0-test2-mm1           132.1%       4.2%
2.6.0-test2-mjb1          156.6%       1.1%

SDET 4 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%       3.9%
2.5.74-mm1                 92.5%       2.5%
2.6.0-test1                96.7%       5.7%
2.6.0-test2-mm1            70.6%      49.1%
2.6.0-test2-mjb1          134.2%       2.1%

SDET 8 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%      45.9%
2.5.74-mm1                123.5%       0.6%
2.6.0-test1                86.1%      70.7%
2.6.0-test2-mm1           127.8%       0.4%
2.6.0-test2-mjb1          158.6%       0.7%

SDET 16 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%       0.3%
2.5.74-mm1                 92.8%       0.8%
2.6.0-test1                99.3%       0.6%
2.6.0-test2-mm1            97.9%       0.5%
2.6.0-test2-mjb1          120.8%       0.6%

SDET 32 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%       0.1%
2.5.74-mm1                 94.4%       0.4%
2.6.0-test1               100.4%       0.2%
2.6.0-test2-mm1            97.9%       0.2%
2.6.0-test2-mjb1          123.2%       0.5%

SDET 64 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%       0.4%
2.5.74-mm1                 95.6%       0.3%
2.6.0-test1               101.1%       0.3%
2.6.0-test2-mm1           100.3%       0.5%
2.6.0-test2-mjb1          127.1%       0.2%

SDET 128 (see disclaimer)
                      Throughput   Std. Dev
2.5.74                    100.0%       0.1%
2.5.74-mm1                 97.6%       0.2%
2.6.0-test1               100.6%       0.6%
2.6.0-test2-mm1           101.8%       0.0%
2.6.0-test2-mjb1          127.9%       0.3%

diffprofile for kernbench (from test1 to test2-mm1, so not really
fair, but might help):

4383 2.6% total
1600 6.8% page_remove_rmap
934 61.6% do_no_page
470 13.9% __copy_from_user_ll
469 12.8% find_get_page
373 0.0% pgd_ctor
368 4.7% __d_lookup
349 6.6% __copy_to_user_ll
278 15.1% atomic_dec_and_lock
273 154.2% may_open
240 15.6% kmem_cache_free
182 30.2% __wake_up
163 11.2% schedule
152 10.4% free_hot_cold_page
148 21.2% pte_alloc_one
123 6.5% path_lookup
100 9.8% clear_page_tables
77 12.5% copy_process
76 4.2% buffered_rmqueue
70 19.0% .text.lock.file_table
70 5.8% release_pages
66 825.0% free_percpu
55 21.0% vfs_read
54 300.0% cache_grow
50 9.5% kmap_atomic
....
-57 -38.0% __generic_file_aio_read
-62 -100.0% free_pages_bulk
-255 -77.5% dentry_open
-316 -2.2% do_anonymous_page
-415 -77.3% do_page_cache_readahead
-562 -96.1% pgd_alloc
-683 -68.9% filemap_nopage
-1005 -2.0% default_idle

Someone messing with the pgd alloc stuff, perhaps?

2003-07-30 15:25:23

by Con Kolivas

Subject: Re: 2.6.0-test2-mm1 results

On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
> Only noticeable thing is that -mm tree is consistently a little slower
> at kernbench

Could conceivably be my hacks throwing the cc cpu hogs onto the expired array
more frequently.

Con

2003-07-30 16:28:16

by Martin J. Bligh

Subject: Re: 2.6.0-test2-mm1 results

> On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
>> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
>> Only noticeable thing is that -mm tree is consistently a little slower
>> at kernbench
>
> Could conceivably be my hacks throwing the cc cpu hogs onto the expired array
> more frequently.

OK, do you have that against straight mainline? I can try it broken
out if so ...

M.

2003-07-31 14:56:55

by Martin J. Bligh

Subject: Re: 2.6.0-test2-mm1 results

--Con Kolivas <[email protected]> wrote (on Thursday, July 31, 2003 01:28:49 +1000):

> On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
>> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
>> Only noticeable thing is that -mm tree is consistently a little slower
>> at kernbench
>
> Could conceivably be my hacks throwing the cc cpu hogs onto the expired array
> more frequently.

Kernbench: (make -j vmlinux, maximal tasks)
                        Elapsed   System     User       CPU
2.6.0-test2               46.05   115.20   571.75   1491.25
2.6.0-test2-con           46.98   121.02   583.55   1498.75
2.6.0-test2-mm1           46.95   121.18   582.00   1497.50

Good guess ;-)

Does this help interactivity a lot, or was it just an experiment?
Perhaps it could be less aggressive or something?

M.

2003-07-31 15:22:06

by Martin J. Bligh

Subject: Re: 2.6.0-test2-mm1 results

>> Does this help interactivity a lot, or was it just an experiment?
>> Perhaps it could be less aggressive or something?
>
> Well basically this is a side effect of selecting out the correct cpu hogs in
> the interactivity estimator. It seems to be working ;-) The more cpu hogs
> they are the lower dynamic priority (higher number) they get, and the more
> likely they are to be removed from the active array if they use up their full
> timeslice. The scheduler in its current form costs more to resurrect things
> from the expired array and restart them, and the cpu hogs will have to wait
> till other less cpu hogging tasks run.
>
> How do we get around this? I'll be brave here and say I'm not sure we need to,
> as cpu hogs have a knack of slowing things down for everyone, and it is best
> not just for interactivity for this to happen, but for fairness.
>
> I suspect a lot of people will have something to say on this one...

Well, what you want to do is prioritise interactive tasks over cpu hogs.
What *seems* to be happening is you're just switching between cpu hogs
more ... that doesn't help anyone really. I don't have an easy answer
for how to fix that, but it doesn't seem desirable to me - we need some
better way of working out what's interactive, and what's not.

M.

2003-07-31 15:09:38

by Con Kolivas

Subject: Re: 2.6.0-test2-mm1 results

On Fri, 1 Aug 2003 00:56, Martin J. Bligh wrote:
> --Con Kolivas <[email protected]> wrote (on Thursday, July 31, 2003 01:28:49 +1000):
> > On Thu, 31 Jul 2003 01:01, Martin J. Bligh wrote:
> >> OK, so test2-mm1 fixes the panic I was seeing in test1-mm1.
> >> Only noticeable thing is that -mm tree is consistently a little slower
> >> at kernbench
> >
> > Could conceivably be my hacks throwing the cc cpu hogs onto the expired
> > array more frequently.
>
> Kernbench: (make -j vmlinux, maximal tasks)
>                         Elapsed   System     User       CPU
> 2.6.0-test2               46.05   115.20   571.75   1491.25
> 2.6.0-test2-con           46.98   121.02   583.55   1498.75
> 2.6.0-test2-mm1           46.95   121.18   582.00   1497.50
>
> Good guess ;-)
>
> Does this help interactivity a lot, or was it just an experiment?
> Perhaps it could be less aggressive or something?

Well basically this is a side effect of selecting out the correct cpu hogs in
the interactivity estimator. It seems to be working ;-) The more cpu hogs
they are the lower dynamic priority (higher number) they get, and the more
likely they are to be removed from the active array if they use up their full
timeslice. The scheduler in its current form costs more to resurrect things
from the expired array and restart them, and the cpu hogs will have to wait
till other less cpu hogging tasks run.
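
[Editorial note: a toy model of the demotion Con is describing, not his patch and not the -mm scheduler source; the sleep threshold and helper names below are invented for illustration. The point is just the decision: a task that burns its whole timeslice and does not look interactive gets parked on the expired array instead of being requeued on the active array.]

#include <stdio.h>

struct task {
	const char *name;
	int sleep_avg_ms;	/* how much it sleeps vs. runs: crude interactivity proxy */
	int used_full_slice;	/* 1 if it exhausted its timeslice */
};

/* made-up threshold: tasks that sleep a lot count as interactive */
static int task_interactive(const struct task *t)
{
	return t->sleep_avg_ms > 500;
}

/* 0 = requeue on the active array, 1 = park on the expired array */
static int goes_to_expired(const struct task *t)
{
	return t->used_full_slice && !task_interactive(t);
}

int main(void)
{
	struct task cc1   = { "cc1 (cpu hog)",       10, 1 };
	struct task xterm = { "xterm (interactive)", 900, 1 };

	printf("%s -> %s\n", cc1.name,
	       goes_to_expired(&cc1) ? "expired array" : "active array");
	printf("%s -> %s\n", xterm.name,
	       goes_to_expired(&xterm) ? "expired array" : "active array");
	return 0;
}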

How do we get around this? I'll be brave here and say I'm not sure we need to,
as cpu hogs have a knack of slowing things down for everyone, and it is best
not just for interactivity for this to happen, but for fairness.

I suspect a lot of people will have something to say on this one...

Con

2003-07-31 15:33:13

by Con Kolivas

Subject: Re: 2.6.0-test2-mm1 results

On Fri, 1 Aug 2003 01:19, Martin J. Bligh wrote:
> >> Does this help interactivity a lot, or was it just an experiment?
> >> Perhaps it could be less aggressive or something?
> >
> > Well basically this is a side effect of selecting out the correct cpu
> > hogs in the interactivity estimator. It seems to be working ;-) The more
> > cpu hogs they are the lower dynamic priority (higher number) they get,
> > and the more likely they are to be removed from the active array if they
> >> > use up their full timeslice. The scheduler in its current form costs
> > more to resurrect things from the expired array and restart them, and the
> > cpu hogs will have to wait till other less cpu hogging tasks run.
> >
> > How do we get around this? I'll be brave here and say I'm not sure we
> > need to, as cpu hogs have a knack of slowing things down for everyone,
> > and it is best not just for interactivity for this to happen, but for
> > fairness.
> >
> > I suspect a lot of people will have something to say on this one...
>
> Well, what you want to do is prioritise interactive tasks over cpu hogs.
> What *seems* to be happening is you're just switching between cpu hogs
> more ... that doesn't help anyone really. I don't have an easy answer
> for how to fix that, but it doesn't seem desirable to me - we need some
> better way of working out what's interactive, and what's not.

Indeed, and now that I've thought about it some more, there are 2 other
possible contributors:

1. Tasks also round robin at 25ms. Ingo said he's not sure if that's too low,
and it definitely drops throughput measurably but slightly.
A simple experiment is to change the timeslice granularity in sched.c and see
whether that's the cause.

2. Tasks waiting for 1 second are considered starved, so cpu hogs running with
their full timeslice used up when something is waiting that long will be
expired. That used to be 10 seconds.
Changing the starvation limit will show if that contributes.

Con

2003-07-31 16:02:28

by Martin J. Bligh

Subject: Re: 2.6.0-test2-mm1 results

> On Fri, 1 Aug 2003 01:19, Martin J. Bligh wrote:
>> >> Does this help interactivity a lot, or was it just an experiment?
>> >> Perhaps it could be less aggressive or something?
>> >
>> > Well basically this is a side effect of selecting out the correct cpu
>> > hogs in the interactivity estimator. It seems to be working ;-) The more
>> > cpu hogs they are the lower dynamic priority (higher number) they get,
>> > and the more likely they are to be removed from the active array if they
>> > use up their full timeslice. The scheduler in its current form costs
>> > more to resurrect things from the expired array and restart them, and the
>> > cpu hogs will have to wait till other less cpu hogging tasks run.
>> >
>> > How do we get around this? I'll be brave here and say I'm not sure we
>> > need to, as cpu hogs have a knack of slowing things down for everyone,
>> > and it is best not just for interactivity for this to happen, but for
>> > fairness.
>> >
>> > I suspect a lot of people will have something to say on this one...
>>
>> Well, what you want to do is prioritise interactive tasks over cpu hogs.
>> What *seems* to be happening is you're just switching between cpu hogs
>> more ... that doesn't help anyone really. I don't have an easy answer
>> for how to fix that, but it doesn't seem desirable to me - we need some
>> better way of working out what's interactive, and what's not.
>
> Indeed and now that I've thought about it some more, there are 2 other
> possible contributors
>
> 1. Tasks also round robin at 25ms. Ingo said he's not sure if that's too low,
> and it definitely drops throughput measurably but slightly.
> A simple experiment is changing the timeslice granularity in sched.c and see
> if that fixes it to see if that's the cause.
>
> 2. Tasks waiting for 1 second are considered starved, so cpu hogs running with
> their full timeslice used up when something is waiting that long will be
> expired. That used to be 10 seconds.
> Changing starvation limit will show if that contributes.

Ah. If I'm doing a full "make -j" I have almost 100 tasks per cpu.
If it's a 25ms or 100ms timeslice, that's 2.5 or 10s to complete the
timeslice. Won't that make *everyone* seem starved? Not sure that's
a good idea ... reminds me of Dilbert: "we're going to focus particularly
on ... everything!" ;-)

M.


2003-07-31 16:08:44

by Con Kolivas

Subject: Re: 2.6.0-test2-mm1 results

On Fri, 1 Aug 2003 02:01, Martin J. Bligh wrote:
> > On Fri, 1 Aug 2003 01:19, Martin J. Bligh wrote:
> >> >> Does this help interactivity a lot, or was it just an experiment?
> >> >> Perhaps it could be less aggressive or something?
> >> >
> >> > Well basically this is a side effect of selecting out the correct cpu
> >> > hogs in the interactivity estimator. It seems to be working ;-) The
> >> > more cpu hogs they are the lower dynamic priority (higher number) they
> >> > get, and the more likely they are to be removed from the active array
> >> > if they use up their full timeslice. The scheduler in its current
> >> > form costs more to resurrect things from the expired array and restart
> >> > them, and the cpu hogs will have to wait till other less cpu hogging
> >> > tasks run.
> >> >
> >> > How do we get around this? I'll be brave here and say I'm not sure we
> >> > need to, as cpu hogs have a knack of slowing things down for everyone,
> >> > and it is best not just for interactivity for this to happen, but for
> >> > fairness.
> >> >
> >> > I suspect a lot of people will have something to say on this one...
> >>
> >> Well, what you want to do is prioritise interactive tasks over cpu hogs.
> >> What *seems* to be happening is you're just switching between cpu hogs
> >> more ... that doesn't help anyone really. I don't have an easy answer
> >> for how to fix that, but it doesn't seem desirable to me - we need some
> >> better way of working out what's interactive, and what's not.
> >
> > Indeed and now that I've thought about it some more, there are 2 other
> > possible contributors
> >
> > 1. Tasks also round robin at 25ms. Ingo said he's not sure if that's too
> > low, and it definitely drops throughput measurably but slightly.
> > A simple experiment is changing the timeslice granularity in sched.c and
> > see if that fixes it to see if that's the cause.
> >
> > 2. Tasks waiting for 1 second are considered starved, so cpu hogs running
> > with their full timeslice used up when something is waiting that long
> > will be expired. That used to be 10 seconds.
> > Changing starvation limit will show if that contributes.
>
> Ah. If I'm doing a full "make -j" I have almost 100 tasks per cpu.
> If it's a 25ms or 100ms timeslice, that's 2.5 or 10s to complete the
> timeslice. Won't that make *everyone* seem starved? Not sure that's
> a good idea ... reminds me of Dilbert: "we're going to focus particularly
> on ... everything!" ;-)

The starvation thingy is also dependent on the number of running tasks.

I quote from the master engineer Ingo's codebook:

#define EXPIRED_STARVING(rq) \
        (STARVATION_LIMIT && ((rq)->expired_timestamp && \
                (jiffies - (rq)->expired_timestamp >= \
                        STARVATION_LIMIT * ((rq)->nr_running) + 1)))

Where STARVATION_LIMIT is 1 second.
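
[Editorial note: plugging Martin's numbers into the macro above, assuming HZ=1000 and STARVATION_LIMIT equal to one second of jiffies. With roughly 100 running tasks on a runqueue, the expired array is only treated as starving after about 100 seconds, so the limit scales with load rather than being a flat second; this is the dependence on nr_running that Con mentions.]

#include <stdio.h>

#define HZ			1000		/* assumed tick rate */
#define STARVATION_LIMIT	(1 * HZ)	/* "1 second", in jiffies */

int main(void)
{
	unsigned long nr_running = 100;		/* ~100 tasks per cpu under "make -j" */
	unsigned long threshold = STARVATION_LIMIT * nr_running + 1;

	printf("expired array counts as starving after %lu jiffies (~%lu s)\n",
	       threshold, threshold / HZ);
	return 0;
}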

Con

2003-07-31 17:11:33

by Bill Davidsen

Subject: Re: 2.6.0-test2-mm1 results

On Fri, 1 Aug 2003, Con Kolivas wrote:


> > Does this help interactivity a lot, or was it just an experiment?
> > Perhaps it could be less aggressive or something?
>
> Well basically this is a side effect of selecting out the correct cpu hogs in
> the interactivity estimator. It seems to be working ;-) The more cpu hogs
> they are the lower dynamic priority (higher number) they get, and the more
> likely they are to be removed from the active array if they use up their full
> timeslice. The scheduler in its current form costs more to resurrect things
> from the expired array and restart them, and the cpu hogs will have to wait
> till other less cpu hogging tasks run.

If that's what it really does, fine. I'm not sure it really finds hogs,
though, or rather "finds only true hogs."

>
> How do we get around this? I'll be brave here and say I'm not sure we need to,
> as cpu hogs have a knack of slowing things down for everyone, and it is best
> not just for interactivity for this to happen, but for fairness.

While this does a good job, I'm still worried that we don't have a good
handle on which processes are really interactive in terms of interfacing
with a human. I don't think we can make the scheduler do the right thing
in every case unless it has better information.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2003-07-31 21:18:51

by William Lee Irwin III

Subject: Re: 2.6.0-test2-mm1 results

At some point in the past, Con Kolivas wrote:
>> How do we get around this? I'll be brave here and say I'm not sure
>> we need to, as cpu hogs have a knack of slowing things down for
>> everyone, and it is best not just for interactivity for this to
>> happen, but for fairness.

On Thu, Jul 31, 2003 at 08:19:01AM -0700, Martin J. Bligh wrote:
> Well, what you want to do is prioritise interactive tasks over cpu hogs.
> What *seems* to be happening is you're just switching between cpu hogs
> more ... that doesn't help anyone really. I don't have an easy answer
> for how to fix that, but it doesn't seem desirable to me - we need some
> better way of working out what's interactive, and what's not.

I don't believe so. You're describing the precise effect of finite-
quantum FB (or tiny quantum RR) on long-running tasks. Generally
multilevel queues are used to back off to a service-time dependent
queueing discipline (e.g. use RR with increasing quanta for each level
and use level promotion and demotion to discriminate interactive tasks,
which remain higher-priority since overall policy is FB) with longer
timeslices for such beasts for less context-switching overhead. I say
lengthen timeslices with service time and make priority preemption work.
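
[Editorial note: a minimal sketch of the multilevel-feedback policy wli outlines, purely for illustration and not a proposed patch; the level count, base quantum, and promotion/demotion rules are made up. Each level is round robin with a quantum that doubles as you go down, so long-running tasks end up with long slices and less context-switch overhead, while tasks that sleep early drift back toward the interactive end.]

#include <stdio.h>

#define NR_LEVELS	 5
#define BASE_QUANTUM_MS	10

static int level_quantum_ms(int level)
{
	return BASE_QUANTUM_MS << level;	/* 10, 20, 40, 80, 160 ms */
}

struct task {
	const char *name;
	int level;	/* 0 = most interactive, NR_LEVELS-1 = biggest hog */
};

/* called when the task stops running; ran_ms is how long it actually ran */
static void account_run(struct task *t, int ran_ms)
{
	if (ran_ms >= level_quantum_ms(t->level)) {
		if (t->level < NR_LEVELS - 1)
			t->level++;		/* used its whole quantum: demote */
	} else if (t->level > 0) {
		t->level--;			/* slept before the quantum was up: promote */
	}
}

int main(void)
{
	struct task cc1 = { "cc1", 0 };
	int i;

	/* a compiler keeps eating full quanta and sinks to the bottom level,
	 * where it gets long slices and less context switching */
	for (i = 0; i < 6; i++)
		account_run(&cc1, level_quantum_ms(cc1.level));

	printf("%s: level %d, quantum %d ms\n",
	       cc1.name, cc1.level, level_quantum_ms(cc1.level));
	return 0;
}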

-- wli

2003-07-31 22:40:35

by William Lee Irwin III

Subject: Re: Panic on 2.6.0-test1-mm1

On Thu, Jul 31, 2003 at 03:37:10PM -0700, William Lee Irwin III wrote:
> You've applied mingo's patch, which needs to check for PAE in certain
> places like the above. Backing out highpmd didn't make this easier, it
> just gave you performance problems because now all your pmd's are stuck
> on node 0 and another side-effect of those changes is that you're now
> pounding pgd_lock on 16x+ boxen. You could back out the preconstruction
> altogether, if you're hellbent on backing out everyone else's patches
> until your code has nothing to merge against.

I also did the merging of pgtable.c for highpmd and the preconstruction
code correctly, sent it upstream, and it got ignored in favor of code
that does it incorrectly, oopses, and by some voodoo gets something else
I wrote dropped while remaining incorrect.

You may now put the "aggravated" magnet beneath the "wli" position on
the fridge.


-- wli

2003-07-31 22:35:58

by William Lee Irwin III

Subject: Re: Panic on 2.6.0-test1-mm1

On Tue, Jul 29, 2003 at 07:37:00AM -0700, Martin J. Bligh wrote:
> The big box had this on the console ... looks like it was doing a
> compile at the time ... sorry, only just noticed it after returning
> from OLS, so don't have more context (2.6.0-test1-mm1).
> kernel BUG at include/linux/list.h:149!
> invalid operand: 0000 [#1]
> SMP
> CPU: 3
> EIP: 0060:[<c0117f98>] Not tainted VLI
> EFLAGS: 00010083
> EIP is at pgd_dtor+0x64/0x8c

This is on PAE, so you're in far deeper trouble than I could have caused:

pgd_cache = kmem_cache_create("pgd",
                PTRS_PER_PGD*sizeof(pgd_t),
                0,
                SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
                pgd_ctor,
                PTRS_PER_PMD == 1 ? pgd_dtor : NULL);

You've applied mingo's patch, which needs to check for PAE in certain
places like the above. Backing out highpmd didn't make this easier, it
just gave you performance problems because now all your pmd's are stuck
on node 0 and another side-effect of those changes is that you're now
pounding pgd_lock on 16x+ boxen. You could back out the preconstruction
altogether, if you're hellbent on backing out everyone else's patches
until your code has nothing to merge against.
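
[Editorial note: a guess at the shape of the failure being described, not the -mm1 source. If the constructor only does the pgd_list list_add in the !PAE case but the destructor runs on PAE anyway (the effect of dropping the PTRS_PER_PMD check), list_del is handed an entry that was never linked, and a debug check of the kind below is what a "kernel BUG at include/linux/list.h" from pgd_dtor looks like. The list helpers and main() here are a userspace mock-up with invented names.]

#include <stdio.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static struct list_head pgd_list = LIST_HEAD_INIT(pgd_list);

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* debug-flavoured list_del: blow up if the entry is not properly linked */
static void list_del(struct list_head *entry)
{
	if (entry->prev->next != entry || entry->next->prev != entry) {
		fprintf(stderr, "BUG: list_del on an unlinked/corrupt entry\n");
		abort();
	}
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

struct pgd_page { struct list_head lru; };

static void mock_pgd_ctor(struct pgd_page *pgd, int pae)
{
	if (!pae)				/* only the !PAE case maintains pgd_list */
		list_add(&pgd->lru, &pgd_list);
}

static void mock_pgd_dtor(struct pgd_page *pgd)
{
	list_del(&pgd->lru);			/* assumes the ctor did the list_add */
}

int main(void)
{
	struct pgd_page ok, pae;

	/* !PAE: ctor links it, dtor unlinks it, no complaint */
	mock_pgd_ctor(&ok, 0);
	mock_pgd_dtor(&ok);

	/* PAE: ctor never linked it and its list pointers hold stale junk,
	 * so running the dtor anyway trips the debug check and aborts */
	pae.lru.next = pae.lru.prev = &pgd_list;
	mock_pgd_ctor(&pae, 1);
	mock_pgd_dtor(&pae);
	return 0;
}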


-- wli

2003-07-31 22:52:14

by Andrew Morton

Subject: Re: Panic on 2.6.0-test1-mm1

William Lee Irwin III <[email protected]> wrote:
>
> You may now put the "aggravated" magnet beneath the "wli" position on
> the fridge.

I never, ever, at any stage was told that highpmd.patch offered any
benefits wrt lock contention or node locality. I was only told that it
saved a little bit of memory on highmem boxes.

It would be useful to actually tell me what your patches do. And to
provide test results which demonstrate the magnitude of the performance
benefits.

2003-08-01 00:14:24

by William Lee Irwin III

Subject: Re: Panic on 2.6.0-test1-mm1

William Lee Irwin III <[email protected]> wrote:
>> You may now put the "aggravated" magnet beneath the "wli" position on
>> the fridge.

On Thu, Jul 31, 2003 at 03:40:20PM -0700, Andrew Morton wrote:
> I never, ever, at any stage was told that highpmd.patch offered any
> benefits wrt lock contention or node locality. I was only told that it
> saved a little bit of memory on highmem boxes.

The lock contention is unrelated apart from the mangling of pgd_ctor().
The node locality is only important on systems with exaggerated NUMA
characteristics, such as the kind Martin and I bench on.


On Thu, Jul 31, 2003 at 03:40:20PM -0700, Andrew Morton wrote:
> It would be useful to actually tell me what your patches do. And to
> provide test results which demonstrate the magnitude of the performance
> benefits.

I don't believe it would be valuable to push it on the grounds of
performance, as the performance characteristics of modern midrange i386
systems don't have such high remote access penalties.

The complaint was targeted more at errors in some new incoming patch
motivating mine being backed out.


-- wli

2003-08-01 00:19:15

by William Lee Irwin III

Subject: Re: Panic on 2.6.0-test1-mm1

On Thu, Jul 31, 2003 at 05:15:38PM -0700, William Lee Irwin III wrote:
> The complaint was targeted more at errors in some new incoming patch
> motivating mine being backed out.

Oh, and mbligh's inaccurate bug reporting (failure to report the XKVA
patch being applied).


-- wli

2003-08-01 00:30:25

by Zwane Mwaikambo

Subject: Re: Panic on 2.6.0-test1-mm1

On Thu, 31 Jul 2003, William Lee Irwin III wrote:

> I don't believe it would be valuable to push it on the grounds of
> performance, as the performance characteristics of modern midrange i386
> systems don't have such high remote access penalties.

Others might be interested to know about the effects (performance, memory
consumption etc) nonetheless, regardless of how large or negligible. It
helps in finding out where to start looking when things improve (or regress).

Thanks for the work anyway,
Zwane
--
function.linuxpower.ca

2003-08-01 00:44:50

by Martin J. Bligh

Subject: Re: Panic on 2.6.0-test1-mm1

> On Tue, Jul 29, 2003 at 07:37:00AM -0700, Martin J. Bligh wrote:
>> The big box had this on the console ... looks like it was doing a
>> compile at the time ... sorry, only just noticed it after returning
>> from OLS, so don't have more context (2.6.0-test1-mm1).
>> kernel BUG at include/linux/list.h:149!
>> invalid operand: 0000 [#1]
>> SMP
>> CPU: 3
>> EIP: 0060:[<c0117f98>] Not tainted VLI
>> EFLAGS: 00010083
>> EIP is at pgd_dtor+0x64/0x8c
>
> This is on PAE, so you're in far deeper trouble than I could have caused:
>
> pgd_cache = kmem_cache_create("pgd",
>                 PTRS_PER_PGD*sizeof(pgd_t),
>                 0,
>                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
>                 pgd_ctor,
>                 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
>
> You've applied mingo's patch, which needs to check for PAE in certain
> places like the above. Backing out highpmd didn't make this easier, it
> just gave you performance problems because now all your pmd's are stuck
> on node 0 and another side-effect of those changes is that you're now
> pounding pgd_lock on 16x+ boxen. You could back out the preconstruction
> altogether, if you're hellbent on backing out everyone else's patches
> until your code has nothing to merge against.

I think this was just virgin -mm1, I can go back and double check ...
Not sure what the stuff about backing out other peoples patches was
all about, I just pointed out the crash.

Andrew had backed out highpmd for other reasons before I even mailed
this out, if that's what your knickers are all twisted about ... I have
no evidence that was causing the problem ... merely that it goes away
on -test2-mm1 ... it was Andrew's suggestion, not mine.

M.

2003-08-01 00:54:43

by Martin J. Bligh

Subject: Re: Panic on 2.6.0-test1-mm1

> At some point in the past, I wrote:
>>> pgd_cache = kmem_cache_create("pgd",
>>>                 PTRS_PER_PGD*sizeof(pgd_t),
>>>                 0,
>>>                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
>>>                 pgd_ctor,
>>>                 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);
>
> On Thu, Jul 31, 2003 at 05:47:55PM -0700, Martin J. Bligh wrote:
>> I think this was just virgin -mm1, I can go back and double check ...
>> Not sure what the stuff about backing out other people's patches was
>> all about, I just pointed out the crash.
>
> pgd_dtor() will never be called on PAE due to the above code (thanks to
> the PTRS_PER_PMD check), _unless_ mingo's patch is applied (which backs
> out the PTRS_PER_PMD check).

OK, might have made a mistake ... I can rerun it if you want, but the
latest kernel seems to work now.

M.

2003-08-01 00:51:55

by William Lee Irwin III

Subject: Re: Panic on 2.6.0-test1-mm1

At some point in the past, I wrote:
>> pgd_cache = kmem_cache_create("pgd",
>>                 PTRS_PER_PGD*sizeof(pgd_t),
>>                 0,
>>                 SLAB_HWCACHE_ALIGN | SLAB_MUST_HWCACHE_ALIGN,
>>                 pgd_ctor,
>>                 PTRS_PER_PMD == 1 ? pgd_dtor : NULL);

On Thu, Jul 31, 2003 at 05:47:55PM -0700, Martin J. Bligh wrote:
> I think this was just virgin -mm1, I can go back and double check ...
> Not sure what the stuff about backing out other people's patches was
> all about, I just pointed out the crash.

pgd_dtor() will never be called on PAE due to the above code (thanks to
the PTRS_PER_PMD check), _unless_ mingo's patch is applied (which backs
out the PTRS_PER_PMD check).


-- wli

2003-08-01 01:01:07

by William Lee Irwin III

Subject: Re: Panic on 2.6.0-test1-mm1

At some point in the past, I wrote:
>> pgd_dtor() will never be called on PAE due to the above code (thanks to
>> the PTRS_PER_PMD check), _unless_ mingo's patch is applied (which backs
>> out the PTRS_PER_PMD check).

On Thu, Jul 31, 2003 at 05:57:49PM -0700, Martin J. Bligh wrote:
> OK, might have made a mistake ... I can rerun it if you want, but the
> latest kernel seems to work now.

There was a spinlock acquisition in there, too, so if you're seeing
weird performance effects in an update (not sure if there are any yet),
generating a patch to skip that, the list op, and not install pgd_dtor()
when PTRS_PER_PMD == 1 is in order.


-- wli