LinuxLists.cc - Nick's scheduler v18

2003-11-10 17:20:14

Subject: Nick's scheduler v18

http://www.kerneltrap.org/~npiggin/v18/

Nothing exciting for desktop users. High end performance is now starting
to get better.

Has an (unimportant) accounting fix that shouldn't really be here, but
doesn't look like it will get in before 2.6.0.

Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
        bk14      bk14-v18
real    83.5s     81.7s
user    987.6s    992.5s
sys     158.0s    142.3s

Volanomark looks much better than mainline.

More testing welcome.
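
To put the kernel-compile figures above in relative terms, here is a minimal Python sketch that computes the percentage change from bk14 to bk14-v18. The helper name pct_change and the dictionary layout are illustrative assumptions; only the timings come from the post.

```python
# Timings in seconds, copied from the table in the post: (bk14, bk14-v18).
TIMINGS = {
    "real": (83.5, 81.7),
    "user": (987.6, 992.5),
    "sys": (158.0, 142.3),
}


def pct_change(before, after):
    """Return the relative change from before to after, in percent."""
    return (after - before) / before * 100.0


for name, (base, patched) in TIMINGS.items():
    # A negative value means the patched kernel spent less time.
    print(f"{name:5s} {pct_change(base, patched):+.2f}%")
```

On these numbers, system time drops by roughly 10%, wall-clock time improves by about 2%, and user time rises by about 0.5%.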

2003-11-11 22:31:51

by Tom Sightler

Subject: Re: Nick's scheduler v18

On Tue, 2003-11-11 at 17:22, Nick Piggin wrote:
> http://www.kerneltrap.org/~npiggin/v18/
>
> Nothing exciting for desktop users. High end performance is now starting
> to get better.

Hey Nick,

Was this tested against single processor? On my Dell Latitude C810 I
can boot test9 and test9-mm2 without problems, but using the identical
config with this patch my system will not even boot up all the way. It
stops at various stages during the init scripts. It seems to
consistently get further if I add elevator=deadline but it never boots
all the way up in either case.

No messages or other good info, just hangs and won't go any further.
Any thoughts?

Later,
Tom

2003-11-12 00:38:34

by Nick Piggin

Subject: Re: Nick's scheduler v18

Tom Sightler wrote:

>On Tue, 2003-11-11 at 17:22, Nick Piggin wrote:
>
>>http://www.kerneltrap.org/~npiggin/v18/
>>
>>Nothing exciting for desktop users. High end performance is now starting
>>to get better.
>>
>
>Hey Nick,
>
>Was this tested against single processor? On my Dell Latitude C810 I
>can boot test9 and test9-mm2 without problems, but using the identical
>config with this patch my system will not even boot up all the way. It
>stops at various stages during the init scripts. It seems to
>consistently get further if I add elevator=deadline but it never boots
>all the way up in either case.
>
>No messages or other good info, just hangs and won't go any further.
>Any thoughts?
>

Yeah, tested on UP. Sigh. Can I have a look at your .config? Do you have
preempt on?

Thanks

2003-11-13 18:07:20

by Mary Edie Meredith

Subject: Re: Nick's scheduler v18

Nick,

We ran your patch on STP against one of our database workloads (DBT3 on
PostgreSQL, which uses the file system rather than raw).

The test was able to compile, successfully start up the database,
successfully load the database from source file, and successfully run
the power test (single stream update/query/delete).

It failed, however, at the next stage, where it starts 8 query streams
and one stream of updates/deletes. It ran for approximately 40 minutes
(this stage usually takes over an hour to complete). The updates appear to
have completed and only queries were active at the time of failure. See
the error message below from the database log.

The queries did produce results and the query streams appear to have
failed at the same time.

.config is at:

http://khack.osdl.org/stp/282959/environment/kernel-config

iostat,vmstat data at this location:
http://khack.osdl.org/stp/282959/results/

from the database log (normal until the line beginning with "PANIC")
...
LOG: removing transaction log file 000000010000004D
LOG: removing transaction log file 0000000100000050
PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
LOG: statement: update time_statistics set e_time=current_timestamp where task_name='PERF1.THRUPUT.QS6.Q11';
LOG: server process (pid 23182) was terminated by signal 6
LOG: terminating any other active server processes
WARNING: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
...

Jenny searched the PostgreSQL site for this error and so far can't find
any more details about it. We are puzzled by the error in the log, since
at the time there should not have been any actual updates. We will
forward this question to the PostgreSQL folks.
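
The point where the database log above turns abnormal can also be located mechanically. The following is a minimal Python sketch that scans log lines for the first FATAL or PANIC entry; the function name first_fatal and the severity pattern are assumptions made for illustration, and the sample lines are copied from the log excerpt in this message.

```python
import re

# Lines copied from the database log excerpt quoted in this message.
LOG_EXCERPT = """\
LOG: removing transaction log file 000000010000004D
LOG: removing transaction log file 0000000100000050
PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
LOG: server process (pid 23182) was terminated by signal 6
LOG: terminating any other active server processes
"""

# Assumed line shape: a severity keyword, a colon, then the message text.
SEVERITY = re.compile(r"^(LOG|WARNING|ERROR|FATAL|PANIC):\s*(.*)$")


def first_fatal(lines):
    """Return (line_number, severity, message) for the first FATAL/PANIC line."""
    for number, line in enumerate(lines, start=1):
        match = SEVERITY.match(line)
        if match and match.group(1) in ("FATAL", "PANIC"):
            return number, match.group(1), match.group(2)
    return None  # nothing fatal found


print(first_fatal(LOG_EXCERPT.splitlines()))
```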

On Mon, 2003-11-10 at 09:20, Nick Piggin wrote:
> http://www.kerneltrap.org/~npiggin/v18/
>
> Nothing exciting for desktop users. High end performance is now starting
> to get better.
>
> Has an (unimportant) accounting fix that shouldn't really be here, but
> doesn't look like it will get in before 2.6.0.
>
> Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>         bk14      bk14-v18
> real    83.5s     81.7s
> user    987.6s    992.5s
> sys     158.0s    142.3s
>
> Volanomark looks much better than mainline.
>
> More testing welcome.
--
Mary Edie Meredith
Open Source Development Lab

2003-11-13 19:38:41

by Andrew Morton

Subject: Re: Nick's scheduler v18

Mary Edie Meredith wrote:
>
> Nick,
>
> We ran your patch on STP against one of our database workloads (DBT3 on
> PostgreSQL, which uses the file system rather than raw).
>
> The test was able to compile, successfully start up the database,
> successfully load the database from source file, and successfully run
> the power test (single stream update/query/delete).
>
> It failed, however, at the next stage, where it starts 8 query streams
> and one stream of updates/deletes. It ran for approximately 40 minutes
> (this stage usually takes over an hour to complete). The updates appear to
> have completed and only queries were active at the time of failure. See
> the error message below from the database log.
>
> ...
>
> PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
>

It's hard to see how a CPU scheduler change could cause fdatasync() to
return EIO.

What filesystem was being used?

If it was ext2 then perhaps you hit the recently-fixed block allocator
race. That fix was merged after test9. Please check the kernel logs for
any filesystem error messages.

Also, please retry the run, see if it is repeatable.

Thanks.
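
For readers unfamiliar with how such a failure surfaces in user space, here is a minimal Python sketch. The helper name sync_log_segment is hypothetical and this is not PostgreSQL's code; it only illustrates that a failed fdatasync() is reported to the caller as an errno value (EIO, "Input/output error", in the log above), which the application then has to handle.

```python
import errno
import os
import tempfile


def sync_log_segment(fd):
    """Flush file data to stable storage.

    Returns None on success, or the symbolic errno name (e.g. "EIO") on failure.
    """
    # os.fdatasync is not available on every platform; fall back to os.fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    try:
        sync(fd)
    except OSError as exc:
        # The kernel's error code arrives in exc.errno; map it to its name.
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None


with tempfile.TemporaryFile() as log:
    log.write(b"log record\n")
    log.flush()  # push Python's buffer to the kernel before syncing
    failure = sync_log_segment(log.fileno())
    print("sync ok" if failure is None else f"sync failed: {failure}")
```

A caller that sees "EIO" here cannot tell from the return value alone whether the device, the filesystem, or something else failed, which is why the kernel logs are the next place to look.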

2003-11-13 21:05:06

by Martin J. Bligh

Subject: Re: Nick's scheduler v18

> Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>         bk14      bk14-v18
> real    83.5s     81.7s
> user    987.6s    992.5s
> sys     158.0s    142.3s
>
> Volanomark looks much better than mainline.
>
> More testing welcome.

-noint is just backing out the interactivity patch (part of your patch)
Not sure that's helping you much really, but maybe it conflicts with
your other stuff.

Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
                       Elapsed    System      User       CPU
2.6.0-test9              45.28    100.19    568.01   1474.75
2.6.0-test9-noint        48.20     99.05    567.26   1389.00
2.6.0-test9-nick18       45.06     91.56    568.77   1467.50

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
                       Elapsed    System      User       CPU
2.6.0-test9              46.17    122.20    571.58   1501.00
2.6.0-test9-noint        46.43    117.96    577.60   1498.00
2.6.0-test9-nick18       46.90    109.05    589.77   1488.75

Kernbench: (make -j vmlinux, maximal tasks)
                       Elapsed    System      User       CPU
2.6.0-test9              45.84    120.14    570.93   1507.00
2.6.0-test9-noint        47.42    123.52    582.91   1488.75
2.6.0-test9-nick18       46.83    110.70    588.91   1494.00

It seems that you're decreasing system time significantly, but increasing
user time if you have lots of tasks ... context switch thrash, maybe?

Would be interesting if you know which of the many patches in there make
the performance difference ... the whole thing is a bit too big to pick
up and maintain easily ;-)

M.
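
The trade-off described above can be made concrete with a minimal Python sketch that compares 2.6.0-test9-nick18 against 2.6.0-test9 at each load level. The figures are copied from the Kernbench tables in this message; the RESULTS layout and the function name relative_deltas are illustrative assumptions.

```python
# (elapsed, system, user) in seconds, copied from the Kernbench tables.
RESULTS = {
    "2 x num_cpus": {
        "2.6.0-test9": (45.28, 100.19, 568.01),
        "2.6.0-test9-nick18": (45.06, 91.56, 568.77),
    },
    "16 x num_cpus": {
        "2.6.0-test9": (46.17, 122.20, 571.58),
        "2.6.0-test9-nick18": (46.90, 109.05, 589.77),
    },
    "maximal": {
        "2.6.0-test9": (45.84, 120.14, 570.93),
        "2.6.0-test9-nick18": (46.83, 110.70, 588.91),
    },
}


def relative_deltas(load):
    """Percent change of nick18 versus test9 for (elapsed, system, user)."""
    base = RESULTS[load]["2.6.0-test9"]
    patched = RESULTS[load]["2.6.0-test9-nick18"]
    return tuple((p - b) / b * 100.0 for b, p in zip(base, patched))


for load in RESULTS:
    elapsed, system, user = relative_deltas(load)
    print(f"{load:14s} elapsed {elapsed:+.2f}%  system {system:+.2f}%  user {user:+.2f}%")
```

System time falls by roughly 8 to 11 percent at every load level, while user time rises by about 3 percent only in the two heavier runs.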

2003-11-13 22:27:52

by Mike Fedyk

Subject: Re: Nick's scheduler v18

On Thu, Nov 13, 2003 at 11:39:06AM -0800, Andrew Morton wrote:
> What filesystem was being used?
>
> If it was ext2 then perhaps you hit the recently-fixed block allocator
> race. That fix was merged after test9. Please check the kernel logs for
> any filesystem error messages.
>
> Also, please retry the run, see if it is repeatable.

Did that hit ext3 also? ISTR, getting some "access beyond end of device"
while running ext3.

Interestingly enough, I didn't get this while using reiserfs3...

And me still running 2.6.0-test6-mm4 :-/

2003-11-14 02:18:08

by Nick Piggin

Subject: Re: Nick's scheduler v18

Martin J. Bligh wrote:

>>Average of 5 kernel compiles (make -j) on a 16-way 512KB cache NUMAQ gives
>>        bk14      bk14-v18
>>real    83.5s     81.7s
>>user    987.6s    992.5s
>>sys     158.0s    142.3s
>>
>>Volanomark looks much better than mainline.
>>
>>More testing welcome.
>>
>
>-noint is just backing out the interactivity patch (part of your patch)
>Not sure that's helping you much really, but maybe it conflicts with
>your other stuff.
>
>Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
>                       Elapsed    System      User       CPU
>2.6.0-test9              45.28    100.19    568.01   1474.75
>2.6.0-test9-noint        48.20     99.05    567.26   1389.00
>2.6.0-test9-nick18       45.06     91.56    568.77   1467.50
>
>Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
>                       Elapsed    System      User       CPU
>2.6.0-test9              46.17    122.20    571.58   1501.00
>2.6.0-test9-noint        46.43    117.96    577.60   1498.00
>2.6.0-test9-nick18       46.90    109.05    589.77   1488.75
>
>Kernbench: (make -j vmlinux, maximal tasks)
>                       Elapsed    System      User       CPU
>2.6.0-test9              45.84    120.14    570.93   1507.00
>2.6.0-test9-noint        47.42    123.52    582.91   1488.75
>2.6.0-test9-nick18       46.83    110.70    588.91   1494.00
>
>It seems that you're decreasing system time significantly, but increasing
>user time if you have lots of tasks ... context switch thrash, maybe?
>

OK, thanks for testing. Still not great.

My patchset does a _lot_ less SMP and NUMA balancing, although I think
that sometimes causes too much idle time. It might be doing more context
switching though.

>
>Would be interesting if you know which of the many patches in there make
>the performance difference ... the whole thing is a bit too big to pick
>up and maintain easily ;-)
>

It's not well broken out, though, unfortunately. I really need to document
and comment it better.

2003-11-14 05:45:42

by Nick Piggin

Subject: Re: Nick's scheduler v18

Andrew Morton wrote:

>Mary Edie Meredith wrote:
>
>>Nick,
>>
>>We ran your patch on STP against one of our database workloads (DBT3 on
>>PostgreSQL, which uses the file system rather than raw).
>>
>>The test was able to compile, successfully start up the database,
>>successfully load the database from source file, and successfully run
>>the power test (single stream update/query/delete).
>>
>>It failed, however, at the next stage, where it starts 8 query streams
>>and one stream of updates/deletes. It ran for approximately 40 minutes
>>(this stage usually takes over an hour to complete). The updates appear to
>>have completed and only queries were active at the time of failure. See
>>the error message below from the database log.
>>
>>...
>>
>>PANIC: fdatasync of log file 1, segment 81 failed: Input/output error
>>
>>
>
>It's hard to see how a CPU scheduler change could cause fdatasync() to
>return EIO.
>
>What filesystem was being used?
>
>If it was ext2 then perhaps you hit the recently-fixed block allocator
>race. That fix was merged after test9. Please check the kernel logs for
>any filesystem error messages.
>

The kernel tested was test9-bk14 + my patch.

I don't think it would be due to a problem in my patch. Perhaps different
scheduling patterns made some race more likely though.

>
>Also, please retry the run, see if it is repeatable.
>

I will let someone else take over from here ;) I'll run the test
again with the latest bk when I submit another round of STP tests
sometime.

2003-11-14 10:34:21

by Sven Luther

Subject: Re: Nick's scheduler v18

On Thu, Nov 13, 2003 at 02:27:51PM -0800, Mike Fedyk wrote:
> On Thu, Nov 13, 2003 at 11:39:06AM -0800, Andrew Morton wrote:
> > What filesystem was being used?
> >
> > If it was ext2 then perhaps you hit the recently-fixed block allocator
> > race. That fix was merged after test9. Please check the kernel logs for
> > any filesystem error messages.
> >
> > Also, please retry the run, see if it is repeatable.
>
> Did that hit ext3 also? ISTR, getting some "access beyond end of device"
> while running ext3.

BTW, I did encounter some problems with Amiga partitions which had some
bad values due to a bug in libparted, now fixed. The head size was
counted double or something like that, which resulted in accesses beyond
the end of the device. It had a funny effect though: the box would freeze,
and the IDE LED would flash at 1 second intervals. Not sure that is the
expected behavior. This was with a 2.4.22 kernel, both on x86 and ppc.

Friendly,

Sven Luther