2001-10-31 08:02:08

by Linus Torvalds

Subject: 2.4.14-pre6


Incredibly, I didn't get a _single_ bugreport about the fact that I had
forgotten to change the version number in pre5. Usually that's everybody's
favourite bug.. Is everybody asleep on the lists?

Anyway, pre6 is out, and now it's too late. I updated the version number.

Other changes:

Bulk of pre5->pre6 differences: the sparc and net updates from David.

Oh, and the first funny patches for the upcoming SMT P4 cores are starting
to show up. More to come.

The MM has calmed down, but the OOM killer didn't use to work. Now it
does, with heuristics that are so incredibly simple that it's almost
embarrassing.

And I dare anybody to break those OOM heuristics - either by not
triggering when they should, or by triggering too early. You'll get an
honourable mention if you can break them and tell me how ("Honourable
mention"? Yeah, I'm cheap. What else is new?)

In fact, I'd _really_ like to know of any VM loads that show bad
behaviour. If you have a pet peeve about the VM, now is the time to speak
up. Because otherwise I think I'm done.

Anybody out there with cerberus?

Linus "128MB of RAM and 1GB into swap, and happy" Torvalds

----

pre6:
- me: remember to bump the version number ;)
- Hugh Dickins: export "free_lru_page()" for modules
- Jeff Garzik: don't change nopage arguments, just make the last a dummy one
- David Miller: sparc and net updates (netfilter, VLAN etc)
- Nikita Danilov: reiserfs cleanups
- Jan Kara: quota initialization race
- Tigran Aivazian: make the x86 microcode update driver happy about
  hyperthreaded P4's
- me: shrink dcache/icache more aggressively
- me: fix up oom-killer so that it actually works

pre5:
- Andrew Morton: remove stale UnlockPage
- me: swap cache page locking update

pre4:
- Mikael Pettersson: fix P4 boot with APIC enabled
- me: fix device queuing thinko, clean up VM locking

pre3:
- René Scharfe: random bugfix
- me: block device queuing low-water-marks, VM mapped tweaking.

pre2:
- Alan Cox: more merging
- Alexander Viro: block device module race fixes
- Richard Henderson: mmap for 32-bit alpha personality
- Jeff Garzik: 8139 and natsemi update

pre1:
- Michael Warfield: computone serial driver update
- Alexander Viro: cdrom module race fixes
- David Miller: Acenic driver fix
- Andrew Grover: ACPI update
- Kai Germaschewski: ISDN update
- Tim Waugh: parport update
- David Woodhouse: JFFS garbage collect sleep


2001-10-31 09:15:11

by Andrew Morton

Subject: Re: 2.4.14-pre6

Linus Torvalds wrote:
>
> If you have a pet peeve about the VM, now is the time to speak
> up.
>

I'm peeved by the request queue changes.

Appended here is a program which creates 100,000 small files.
Using ext2 on -pre5. We see how long it takes to run

(make-many-files ; sync)

For several values of queue_nr_requests:

queue_nr_requests: 128 8192 32768
execution time: 4:43 3:25 3:20

Almost all of the execution time is in the `sync'.

This is on a disk with a 2 meg cache which does pretty aggressive
write-behind. I expect the difference would be worse with a disk
which doesn't help so much.

By restricting the number of requests in flight to 128 we're
giving new requests only a very small chance of getting merged with
an existing request. More seeking.

OK, not an interesting workload. But I suspect that there are real
workloads which will be bitten by this.

Why is the queue length so tiny now? Latency? If so, couldn't this
be addressed by giving reads higher priority versus writes?



#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

static void doit(char *name)
{
	static char stuff[4096];
	int fd;

	fd = creat(name, 0666);
	if (fd < 0) {
		perror(name);
		exit(1);
	}
	write(fd, stuff, sizeof(stuff));
	close(fd);
}

int main(void)
{
	int i, j, k, l, m;
	char buf[100];

	for (i = 0; i < 10; i++) {
		sprintf(buf, "%d", i);
		mkdir(buf, 0777);
		for (j = 0; j < 10; j++) {
			sprintf(buf, "%d/%d", i, j);
			mkdir(buf, 0777);
			printf("%s\n", buf);
			for (k = 0; k < 10; k++) {
				sprintf(buf, "%d/%d/%d", i, j, k);
				mkdir(buf, 0777);
				for (l = 0; l < 10; l++) {
					sprintf(buf, "%d/%d/%d/%d", i, j, k, l);
					mkdir(buf, 0777);
					for (m = 0; m < 10; m++) {
						sprintf(buf, "%d/%d/%d/%d/%d", i, j, k, l, m);
						doit(buf);
					}
				}
			}
		}
	}
	exit(0);
}

2001-10-31 09:30:31

by bert hubert

Subject: Re: 2.4.14-pre6

On Wed, Oct 31, 2001 at 12:00:00AM -0800, Linus Torvalds wrote:


> In fact, I'd _really_ like to know of any VM loads that show bad
> behaviour. If you have a pet peeve about the VM, now is the time to speak
> up. Because otherwise I think I'm done.

The Google case comes to mind. And we should be good for Google!

Regards,

bert

--
http://www.PowerDNS.com Versatile DNS Software & Services
Trilab The Technology People
Netherlabs BV / Rent-a-Nerd.nl - Nerd Available -
'SYN! .. SYN|ACK! .. ACK!' - the mating call of the internet

2001-10-31 09:29:21

by Jens Axboe

Subject: Re: 2.4.14-pre6

On Wed, Oct 31 2001, Andrew Morton wrote:
> Linus Torvalds wrote:
> >
> > If you have a pet peeve about the VM, now is the time to speak
> > up.
> >
>
> I'm peeved by the request queue changes.

I was too. However, it didn't seem to make too much of a difference in
real life; I guess your test case shows a bit differently.

> Appended here is a program which creates 100,000 small files.
> Using ext2 on -pre5. We see how long it takes to run
>
> (make-many-files ; sync)
>
> For several values of queue_nr_requests:
>
> queue_nr_requests: 128 8192 32768
> execution time: 4:43 3:25 3:20
>
> Almost all of the execution time is in the `sync'.
>
> This is on a disk with a 2 meg cache which does pretty aggressive
> write-behind. I expect the difference would be worse with a disk
> which doesn't help so much.
>
> By restricting the number of requests in flight to 128 we're
> giving new requests only a very small chance of getting merged with
> an existing request. More seeking.
>
> OK, not an interesting workload. But I suspect that there are real
> workloads which will be bitten by this.
>
> Why is the queue length so tiny now? Latency? If so, couldn't this
> be addressed by giving reads higher priority versus writes?

Should be possible. Try it for yourself. When you do your 100,000 small
file test with 8k or more requests, how is the interactive feel of other
programs accessing the same spindle? Play around with the READ and WRITE
initial elevator sequence numbers, repeat :-)

--
Jens Axboe

2001-10-31 16:17:38

by Linus Torvalds

Subject: Re: 2.4.14-pre6


In article <[email protected]>,
Andrew Morton <[email protected]> wrote:
>
>Appended here is a program which creates 100,000 small files.
>Using ext2 on -pre5. We see how long it takes to run
>
> (make-many-files ; sync)
>
>For several values of queue_nr_requests:
>
>queue_nr_requests: 128 8192 32768
>execution time: 4:43 3:25 3:20
>
>Almost all of the execution time is in the `sync'.

Hmm.. I don't consider "sync" to be a benchmark, and one of the things
that made me limit the queue size was in fact that Linux in the
timeframe before roughly 2.4.7 or so was _completely_ unresponsive when
you did a big "untar" followed by a "sync".

I'd rather have a machine where I don't even much notice the sync than
one where a made-up load and a "sync" that serves no purpose shows
lower throughput.

Do you actually have any real load that cares?

>By restricting the number of requests in flight to 128 we're
>giving new requests only a very small chance of getting merged with
>an existing request. More seeking.

If you can come up with alternatives that do not suck from a latency
standpoint, I'm open to ideas.

However, having tested the -ac approach, I know from personal experience
that it's just way too easy to find behaviour with so horrible latency
on a 2GB machine that it's not in the _least_ funny.

Making the elevator heavily favour reads over writes might be ok enough
to make the long queues even an option but:

>OK, not an interesting workload. But I suspect that there are real
>workloads which will be bitten by this.
>
>Why is the queue length so tiny now? Latency? If so, couldn't this
>be addressed by giving reads higher priority versus writes?

It's a write-write latency thing too, but that's probably not as strong an
argument.

Trivial example: do the above thing at the same time you have a mail agent
open that does a "fsync()" on its mail store (and depending on your mail
agent and your mail folder layout, you may have quite a lot of small
fsyncs going on).

I don't know about you, but I start up mail agents a _lot_ more often
than I do "sync". And I'd rather do "sync &" than have bad interactive
performance from the mail agent.

I'm not against making the queues larger, but on the other hand I see so
many _better_ approaches that I would rather people spent some effort on,
for example, making the dirty list itself be more ordered.

We have actually talked about some higher-level ordering of the dirty list
for at least five years, but nobody has ever done it. And I bet you $5
that you'll get (a) better throughput than by making the queues longer and
(b) you'll have fine latency while you write and (c) that you want to
order the write-queue anyway for filesystems that care about ordering.

So yes, making the queue longer is an "easy" solution, but if it then
leads to complex problems like how to make an elevator that is guaranteed
to not have bad latency behaviour, I actually think that doing some (even
just fairly rudimentary) ordering of the write queue ends up being easier
_and_ more effective.
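
To make that concrete, here is a rough sketch of an ordered dirty list -
the structure and helper are invented for illustration only, and are
nothing like the actual buffer-cache code:

/*
 * Keep dirty buffers sorted by (device, block) as they get dirtied, so
 * that later writeback walks the disk in roughly ascending order and
 * the request layer sees easily mergeable I/O.  Locking is omitted.
 */
struct dirty_buf {
	struct dirty_buf *next;
	int dev;
	unsigned long block;
};

static struct dirty_buf *dirty_head;

static void insert_dirty_sorted(struct dirty_buf *db)
{
	struct dirty_buf **p = &dirty_head;

	/* find the first entry that sorts after the new buffer */
	while (*p && ((*p)->dev < db->dev ||
		      ((*p)->dev == db->dev && (*p)->block <= db->block)))
		p = &(*p)->next;

	db->next = *p;
	*p = db;
}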

Linus

2001-10-31 18:41:02

by Andrew Morton

Subject: Re: 2.4.14-pre6

Linus Torvalds wrote:
>
> In article <[email protected]>,
> Andrew Morton <[email protected]> wrote:
> >
> >Appended here is a program which creates 100,000 small files.
> >Using ext2 on -pre5. We see how long it takes to run
> >
> > (make-many-files ; sync)
> >
> >For several values of queue_nr_requests:
> >
> >queue_nr_requests: 128 8192 32768
> >execution time: 4:43 3:25 3:20
> >
> >Almost all of the execution time is in the `sync'.
>
> Hmm.. I don't consider "sync" to be a benchmark, and one of the things
> that made me limit the queue size was in fact that Linux in the
> timeframe before roughly 2.4.7 or so was _completely_ unresponsive when
> you did a big "untar" followed by a "sync".

Sure. I chose `sync' because it's measurable. That sync took
four minutes, so the machine will be locked up seeking for four
minutes whether the writeback was initiated by /bin/sync or by
kupdate/bdflush.

> I'd rather have a machine where I don't even much notice the sync than
> one where a made-up load and a "sync" that serves no purpose shows
> lower throughput.
>
> Do you actually have any real load that cares?

All I do is compile kernels :)

Actually, ext3 journal replay can sometimes take thirty seconds
or longer - it reads maybe ten megs from the journal and then
it has to splatter it all over the platter and wait on it.

> ...
> We have actually talked about some higher-level ordering of the dirty list
> for at least five years, but nobody has ever done it. And I bet you $5
> that you'll get (a) better throughput than by making the queues longer and
> (b) you'll have fine latency while you write and (c) that you want to
> order the write-queue anyway for filesystems that care about ordering.

I'll buy that. It's not just the dirty list, either. I've seen
various incarnations of page_launder() and its successor which
were pretty suboptimal from a write clustering pov.

But it's actually quite seductive to take a huge amount of data and
just chuck it at the request layer and let Jens sort it out. This
usually works well and keeps the complexity in one place.

One does wonder whether everything is working as it should, though.
Creating those 100,000 4k files is going to require writeout of
how many blocks? 120,000? And four minutes is enough time for
34,000 seven-millisecond seeks. And ext2 is pretty good at laying
things out contiguously. These numbers don't gel.

Ah-ha. Look at the sync_inodes stuff:

	for (zillions of files) {
		filemap_fdatasync(file)
		filemap_fdatawait(file)
	}

If we turn this into

	for (zillions of files)
		filemap_fdatasync(file)
	for (zillions of files)
		filemap_fdatawait(file)

I suspect that interesting effects will be observed, yes? Especially
if we have a nice long request queue, and the results of the
preceding sync_buffers() are still available for being merged with.
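
As a sketch, the two-pass version could look something like this (using
the real filemap_fdatasync()/filemap_fdatawait() names, but with locking
and inode-lifetime issues waved away - illustrative only, not a patch):

#include <linux/fs.h>
#include <linux/list.h>
#include <linux/pagemap.h>

/*
 * Pass 1: start writeback on every dirty mapping, so the block layer
 * sees all the requests at once and can merge and sort them.
 * Pass 2: wait for the lot.  Real code needs the inode list lock and
 * has to cope with inodes going away between the two passes.
 */
static void sync_mappings_two_pass(struct list_head *dirty)
{
	struct list_head *p;

	list_for_each(p, dirty) {
		struct inode *inode = list_entry(p, struct inode, i_list);
		filemap_fdatasync(inode->i_mapping);
	}

	list_for_each(p, dirty) {
		struct inode *inode = list_entry(p, struct inode, i_list);
		filemap_fdatawait(inode->i_mapping);
	}
}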

kupdate runs this code path as well. Why is there any need for
kupdate to wait on the writes?

Anyway. I'll take a look....



2001-10-31 19:09:00

by Linus Torvalds

Subject: Re: 2.4.14-pre6


On Wed, 31 Oct 2001, Andrew Morton wrote:
>
> But it's actually quite seductive to take a huge amount of data and
> just chuck it at the request layer and let Jens sort it out. This
> usually works well and keeps the complexity in one place.

Fair enough, I see your point. How would you suggest we handle the latency
thing, though?

I'm not against making the elevator more intelligent, and you have a
good argument. But I'm very much against "allow the queues to grow with
no sense of latency".

> One does wonder whether everything is working as it should, though.
> Creating those 100,000 4k files is going to require writeout of
> how many blocks? 120,000? And four minutes is enough time for
> 34,000 seven-millisecond seeks. And ext2 is pretty good at laying
> things out contiguously. These numbers don't gel.
>
> Ah-ha. Look at the sync_inodes stuff:
>
> 	for (zillions of files) {
> 		filemap_fdatasync(file)
> 		filemap_fdatawait(file)
> 	}
>
> If we turn this into
>
> 	for (zillions of files)
> 		filemap_fdatasync(file)
> 	for (zillions of files)
> 		filemap_fdatawait(file)

Good catch, I bet you're right.

> kupdate runs this code path as well. Why is there any need for
> kupdate to wait on the writes?

At least historically (and I think it's still true in some cases),
kupdated was also in charge of trying to write out buffers under
low-memory circumstances. And without any throttling, blind writing can
make things worse.

However, the request throttle should be _plenty_ good enough, so I think
you're right.

Oh, one issue in case you're going to work on this: kupdated does need to
do the "wait_for_locked_buffers()" at some point, as that is also what
moves buffers from the locked list to the clean list. But that has nothing
to do with the fdatawait thing.

Linus

2001-10-31 19:22:07

by Michael Peddemors

Subject: Re: 2.4.14-pre6

Let's let this testing cycle go a little longer before making any
changes.. Let developers catch up..
I had to stop my own kernel patches because I couldn't keep up .... Can
we go a full month with you just hitting us over the head with a bat
yelling 'test, dammit, test', until this is tested fully before
releasing another production release?

I would like to get a chance to test this one on more than one hardware
platform :) And I want to test it under production loads as well..


On Wed, 2001-10-31 at 00:00, Linus Torvalds wrote:
>

> In fact, I'd _really_ like to know of any VM loads that show bad
> behaviour. If you have a pet peeve about the VM, now is the time to speak
> up. Because otherwise I think I'm done.
>
> Anybody out there with cerberus?
>
> Linus "128MB of RAM and 1GB into swap, and happy" Torvalds
>
--
"Catch the Magic of Linux..."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
LinuxAdministration - Internet Services
NetworkServices - Programming - Security
Wizard IT Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604)589-0037 Beautiful British Columbia, Canada

2001-10-31 19:40:49

by Linus Torvalds

Subject: Re: 2.4.14-pre6


On 31 Oct 2001, Michael Peddemors wrote:
>
> Let's let this testing cycle go a little longer before making any
> changes.. Let developers catch up..

My not-so-cunning plan is actually to try to figure out the big problems
now, then release a reasonable 2.4.14, and then just stop for a while,
refusing to take new features.

Then, 2.4.15 would be the point where I start 2.5.x, and where Alan gets
to do whatever he wants to do with 2.4.x. Including, of course, just
reverting all my and Andrea's VM changes ;)

I'm personally convinced that my tree does the right thing VM-wise, but
Alan _will_ be the maintainer, and I'm not going to butt in on his
decisions. The last thing I want to be is a micromanaging pointy-haired
boss.

(2.5.x will obviously use the new VM regardless, and I actually believe
that the new VM simply is better. I think that Alan will see the light
eventually, but at the same time I clearly admit that Alan was right on a
stability front for the last month or two ;)

> My own kernel patches I had to stop because I couldn't keep up .... Can
> we go a full month with you just hitting us over the head with a bat
> yelling 'test, dammit, test', until this is tested fully before
> releasing another production release?

I think we're really close.

[ I'd actually like to thank Gary Sandine from laclinux.com who made the
"Ultimate Linux Box" for an article by Eric Raymond for Linux Journal.
They sent me one too, and the 2GB box made it easier to test some real
highmem loads. This has given me additional load environments to test,
and let me see some of the problems people reported.. ]

But I do want to make a real 2.4.14, not just another "final" pre-kernel,
and let that be the base for a reasonably orderly switch-over at 2.4.15
(ie I'd still release 2.4.15, everything from then on is Alan).

Linus

2001-10-31 19:53:10

by Philipp Matthias Hahn

Subject: Re: 2.4.14-pre6

On Wed, 31 Oct 2001, Linus Torvalds wrote:

> Incredibly, I didn't get a _single_ bugreport about the fact that I had
> forgotten to change the version number in pre5. Usually that's everybody's
> favourite bug.. Is everybody asleep on the lists?
Message-ID: <Pine.LNX.4.32.0110302228010.17012-100000@skynet>

> Other changes:
linux/zlib_fs.h is still missing in your tree and breaks compilation of
fs/cramfs and others.

http://marc.theaimsgroup.com/?l=linux-kernel&m=100407670605760&q=raw

BYtE
Philipp
--
/ / (_)__ __ ____ __ Philipp Hahn
/ /__/ / _ \/ // /\ \/ /
/____/_/_//_/\_,_/ /_/\_\ [email protected]



2001-10-31 19:55:03

by Mike Castle

Subject: Re: 2.4.14-pre6

On Wed, Oct 31, 2001 at 11:38:44AM -0800, Linus Torvalds wrote:
> Then, 2.4.15 would be the point where I start 2.5.x, and where Alan gets
> to do whatever he wants to do with 2.4.x. Including, of course, just
> reverting all my and Andrea's VM changes ;)

There are a lot of patches applied to -ac that are not in the main line.
If many of those are applied to 2.4.16+, would they also be put into the
2.5.x line early in the process so that they will be fairly synced, plus
give you ample time to feel comfortable with their stability?

Especially patches that did not come directly from the maintainers.

mrc
--
Mike Castle [email protected] http://www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

2001-10-31 20:02:50

by Rik van Riel

Subject: Re: 2.4.14-pre6

On Wed, 31 Oct 2001, Linus Torvalds wrote:

> (2.5.x will obviously use the new VM regardless, and I actually
> believe that the new VM simply is better. I think that Alan will see
> the light eventually, but at the same time I clearly admit that Alan
> was right on a stability front for the last month or two ;)

Will you document the new VM ?

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-10-31 21:05:41

by H. Peter Anvin

Subject: Re: 2.4.14-pre6

Followup to: <[email protected]>
By author: Philipp Matthias Hahn <[email protected]>
In newsgroup: linux.dev.kernel
>
> > Other changes:
> linux/zlib_fs.h is still missing in your tree and breaks compilation of
> fs/cramfs and others.
>

I have submitted patches to Linus to make cramfs and zisofs work.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2001-10-31 23:18:01

by Erik Andersen

Subject: Re: 2.4.14-pre6

On Wed Oct 31, 2001 at 11:38:44AM -0800, Linus Torvalds wrote:
>
> On 31 Oct 2001, Michael Peddemors wrote:
> >
> > Let's let this testing cycle go a little longer before making any
> > changes.. Let developers catch up..
>
> My not-so-cunning plan is actually to try to figure out the big problems
> now, then release a reasonable 2.4.14, and then just stop for a while,
> refusing to take new features.

How about ext3 for 2.4.14?

-Erik

--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--

2001-10-31 23:40:43

by Dax Kelson

Subject: Re: 2.4.14-pre6

On Wed, 31 Oct 2001, Erik Andersen wrote:

> How about ext3 for 2.4.14?

Seconded.

2001-10-31 23:52:43

by Michael Peddemors

Subject: Re: 2.4.14-pre6

On Wed, 2001-10-31 at 15:40, Dax Kelson wrote:
> On Wed, 31 Oct 2001, Erik Andersen wrote:
>
> > How about ext3 for 2.4.14?
>
> Seconded.
>

As much as I would like to see ext3 get in, NOT IN THIS RELEASE please...
Don't put anything else in until what we've got works.. Hit him up for
2.4.15 :)

--
"Catch the Magic of Linux..."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
LinuxAdministration - Internet Services
NetworkServices - Programming - Security
Wizard IT Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604)589-0037 Beautiful British Columbia, Canada

2001-11-01 10:20:46

by NeilBrown

Subject: Re: 2.4.14-pre6

On Wednesday October 31, [email protected] wrote:
>
> We have actually talked about some higher-level ordering of the dirty list
> for at least five years, but nobody has ever done it. And I bet you $5
> that you'll get (a) better throughput than by making the queues longer and
> (b) you'll have fine latency while you write and (c) that you want to
> order the write-queue anyway for filesystems that care about ordering.
>

But what is the "right" order? A RAID5 array might well respond to a
different ordering than a JBOD.

I've thought a bit about how to best give blocks to RAID5 so that they
can be written efficiently. I suspect the issues are similar for
normal disk io:

Currently the device (or block-device-layer) doesn't see a block until
the upper levels really want the IO to happen. There is a little bit
of a grace period between the submit_bh and the run_task_queue(&tq_disk)
when re-ordering can happen, but it isn't very long. There is a bit
more grace time while waiting to get a turn on the device. But it is
still a lot less time than the amount of time that most buffers are
sitting around in cache.

What I would like is that as soon as a buffer was marked "dirty", it
would get passed down to the driver (or at least to the
block-device-layer) with something like
submit_bh(WRITEA, bh);
i.e. a write ahead. (or is it write-behind...)
The device handler (the elevator algorithm for normal disks, other
code for other devices) could keep them ordered in whatever way it
chooses, and feed them into the queues at some appropriate time.

The submit_bh(WRITE, bh) would then push the buffer out if it hadn't
gone already.

The elevator code could possibly keep two sorted lists: one of WRITEA
(or READA) requests and one of WRITE (or READ) requests.
It processes the second, merging in some of the first as it goes.
Maybe capping it to 2 -ahead blocks for every immediate block.
Probably also allowing for larger numbers of -ahead blocks if they are
contiguous with an immediate block.

RAID5 would do something a bit different. Possibly whenever it wanted
to write a stripe, it would hunt through the -ahead list (sort of like
the 2.2 code did) for other blocks that could be proactively added to
the stripe.


This would allow a nice ordering of write-behind (and read-ahead)
requests but give the driver control of latency by allowing it to
limit the extent to which write-behind/read-ahead blocks can usurp the
position of other blocks.
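
Very roughly, something like this - all the names are made up, it is
only meant to show the shape of the two-list scheme:

/*
 * Two sorted lists: "now" holds WRITE/READ requests that must go out,
 * "ahead" holds WRITEA/READA write-behind/read-ahead candidates.  For
 * each immediate request we fold in at most two -ahead requests, or
 * more while they stay contiguous with what we just dispatched.
 */
struct sk_req {
	struct sk_req *next;
	unsigned long sector;
	unsigned long nr_sectors;
};

static void sk_dispatch(struct sk_req *rq);	/* hand off to the queue proper (not shown) */

static void sk_run(struct sk_req **now, struct sk_req **ahead)
{
	while (*now) {
		struct sk_req *rq = *now;
		unsigned long end = rq->sector + rq->nr_sectors;
		int folded = 0;

		*now = rq->next;
		sk_dispatch(rq);

		while (*ahead && (folded < 2 || (*ahead)->sector == end)) {
			struct sk_req *a = *ahead;

			*ahead = a->next;
			if (a->sector == end)
				end = a->sector + a->nr_sectors;
			sk_dispatch(a);
			folded++;
		}
	}
}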

Does that make any sense? Is it conceptually simple enough?

NeilBrown

2001-11-01 19:45:26

by Pozsar Balazs

Subject: Re: 2.4.14-pre6


On Wed, 31 Oct 2001, Linus Torvalds wrote:

> Incredibly, I didn't get a _single_ bugreport about the fact that I had
> forgotten to change the version number in pre5. Usually that's everybody's
> favourite bug.. Is everybody asleep on the lists?

You should read lkml :))
Dave Airlie posted a mail with the subject "FYI: 2.4.14-pre5 issues.."
that noticed this bug on Oct 30.

--
Balazs Pozsar.

2001-11-01 21:01:08

by Andrew Morton

Subject: Re: 2.4.14-pre6

Neil Brown wrote:
>
> ...
> What I would like is that as soon as a buffer was marked "dirty", it
> would get passed down to the driver (or at least to the
> block-device-layer) with something like
> submit_bh(WRITEA, bh);
> i.e. a write ahead. (or is it write-behind...)
> The device handler (the elevator algorithm for normal disks, other
> code for other devices) could keep them ordered in whatever way it
> chooses, and feed them into the queues at some appropriate time.
>

Sounds sensible to me.

In many ways, it's similar to the current scheme when it's used
with an enormous request queue - all writeable blocks in the
system are candidates for request merging. But your proposal
is a whole lot smarter.

In particular, the current kupdate scheme of writing the
dirty block list out in six chunks, five seconds apart
does potentially miss out on a very large number of merging
opportunities. Your proposal would fix that.

Another potential microoptimisation would be to write out
clean blocks if that helps merging. So if we see a write
for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
then write it out too. I suspect this would be a win for
ATA but a loss for SCSI. Not sure.

But I have a gut feel that all this is in the noisefloor,
compared to The Big Problem. It's just a matter of identifying
and fixing TBP. Fixing the fdatasync() thing didn't help,
because ext2_write_inode() for a new file has to read the
inode block from disk. Fixing that, by doing an async preread
of the inode's block in ext2_new_inode() didn't help either,
I suspect because my working set was so large that the VM
tossed out my preread before I got to use it. A few more days
poking is needed.



Oh. I have a gripe concerning prune_icache(). The design
idea behind keventd is that it's a "process context bottom
half handler". It's used for things like cardbus hotplug
interrupt handlers, handling tty hangups, etc. It should
probably run SCHED_FIFO.

Using keventd to synchronously flush large amounts of
data out to disk constitutes gross abuse - it's being blocked
from performing its designed duties for many seconds. Can we
please not do that? We already have kswapd, kupdate, bdflush,
which should be sufficient.


2001-11-01 21:29:59

by Chris Mason

Subject: Re: 2.4.14-pre6



On Thursday, November 01, 2001 12:55:41 PM -0800 Andrew Morton
<[email protected]> wrote:

> Oh. I have a gripe concerning prune_icache(). The design
> idea behind keventd is that it's a "process context bottom
> half handler". It's used for things like cardbus hotplug
> interrupt handlers, handling tty hangups, etc. It should
> probably run SCHED_FIFO.
>
> Using keventd to synchronously flush large amounts of
> data out to disk constitutes gross abuse - it's being blocked
> from performing its designed duties for many seconds. Can we
> please not do that? We already have kswapd, kupdate, bdflush,
> which should be sufficient.

One of the worst parts of prune_icache was that if a journaled
FS needed to log dirty inodes, kswapd would wait on the log, which was
probably waiting on kswapd. Thus the dirty_inode call, which I'd like to
get rid of.

I don't think kupdate or bdflush are suitable for flushing the dirty inodes;
kupdate shouldn't do memory pressure and bdflush shouldn't wait on the log.
So how about a new kinoded?
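
A very rough sketch of what such a kinoded could look like on 2.4 -
sync_dirty_inodes() is an assumed helper and the wakeup policy is
hand-waved, so this only shows the shape, not a real implementation:

#include <linux/sched.h>
#include <linux/string.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(kinoded_wait);

/* assumed helper that writes back dirty inodes (and may wait on the log) */
extern void sync_dirty_inodes(void);

/* callers under memory pressure poke this instead of flushing themselves */
void wakeup_kinoded(void)
{
	wake_up_interruptible(&kinoded_wait);
}

/* started at boot with kernel_thread(kinoded, NULL, ...) or similar */
static int kinoded(void *unused)
{
	daemonize();
	strcpy(current->comm, "kinoded");

	for (;;) {
		interruptible_sleep_on(&kinoded_wait);
		sync_dirty_inodes();
	}
	return 0;
}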

-chris

2001-11-02 08:01:40

by Helge Hafting

Subject: Re: 2.4.14-pre6

Andrew Morton wrote:

> Another potential microoptimisation would be to write out
> clean blocks if that helps merging. So if we see a write
> for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
> then write it out too. I suspect this would be a win for
> ATA but a loss for SCSI. Not sure.
>

A not-too-stupid disk would implement the seek to block 5
as waiting for block 4 to move past, so
rewriting block 4 probably wouldn't help. It could be
interesting to see a benchmark for that though; perhaps
some drives are really dumb.

The average "half rotation delay" when seeking does not apply
when the seek _isn't_ random.

Helge Hafting

2001-11-02 16:49:57

by jogi

Subject: Re: 2.4.14-pre6

On Wed, Oct 31, 2001 at 12:00:00AM -0800, Linus Torvalds wrote:
>
> Incredibly, I didn't get a _single_ bugreport about the fact that I had
> forgotten to change the version number in pre5. Usually that's everybody's
> favourite bug.. Is everybody asleep on the lists?

I noticed but I thought everybody else would complain :-)

[...]

> The MM has calmed down, but the OOM killer didn't use to work. Now it
> does, with heuristics that are so incredibly simple that it's almost
> embarrassing.
>
> And I dare anybody to break those OOM heuristics - either by not
> triggering when they should, or by triggering too early. You'll get an
> honourable mention if you can break them and tell me how ("Honourable
> mention"? Yeah, I'm cheap. What else is new?)
>
> In fact, I'd _really_ like to know of any VM loads that show bad
> behaviour. If you have a pet peeve about the VM, now is the time to speak
> up. Because otherwise I think I'm done.

I did my usual kernel compile tests and here are the results:

j25 j50 j75 j100

2.4.13-pre5aa1: 5:02.63 5:09.18 5:26.27 5:34.36
2.4.13-pre5aa1: 4:58.80 5:12.30 5:26.23 5:32.14
2.4.13-pre5aa1: 4:57.66 5:11.29 5:45.90 6:03.53
2.4.13-pre5aa1: 4:58.39 5:13.10 5:29.32 5:44.49
2.4.13-pre5aa1: 4:57.93 5:09.76 5:24.76 5:26.79

2.4.14-pre6: 4:58.88 5:16.68 5:45.93 7:16.56
2.4.14-pre6: 4:55.72 5:34.65 5:57.94 6:50.58
2.4.14-pre6: 4:59.46 5:16.88 6:25.83 6:51.43
2.4.14-pre6: 4:56.38 5:18.88 6:15.97 6:31.72
2.4.14-pre6: 4:55.79 5:17.47 6:00.23 6:44.85

2.4.14-pre7: 4:56.39 5:22.84 6:09.05 9:56.59
2.4.14-pre7: 4:56.55 5:25.15 7:01.37 7:03.74
2.4.14-pre7: 4:59.44 5:15.10 6:06.78 12:51.39*
2.4.14-pre7: 4:58.07 5:30.55 6:15.37 *
2.4.14-pre7: 4:58.17 5:26.80 6:41.44 *


The last three runs of make -j100 with -pre7 failed since some
processes (portmap and cc1) were killed. So the OOM killer seems to
kill the wrong processes (in the case of portmap) and might trigger a little
too early. I have no data about the swap / mem usage at that time since
the script runs unattended.

Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.

I have not checked the interactivity issues, so this might be a
*feature*.


Regards,

Jogi

--

Well, yeah ... I suppose there's no point in getting greedy, is there?

<< Calvin & Hobbes >>

2001-11-03 12:48:07

by Mike Galbraith

Subject: Re: 2.4.14-pre6

On 2 Nov 2001 [email protected] wrote:

> I did my usual kernel compile tests and here are the results:
>
> j25 j50 j75 j100
>
> 2.4.13-pre5aa1: 5:02.63 5:09.18 5:26.27 5:34.36
> 2.4.13-pre5aa1: 4:58.80 5:12.30 5:26.23 5:32.14
> 2.4.13-pre5aa1: 4:57.66 5:11.29 5:45.90 6:03.53
> 2.4.13-pre5aa1: 4:58.39 5:13.10 5:29.32 5:44.49
> 2.4.13-pre5aa1: 4:57.93 5:09.76 5:24.76 5:26.79
>
> 2.4.14-pre6: 4:58.88 5:16.68 5:45.93 7:16.56
> 2.4.14-pre6: 4:55.72 5:34.65 5:57.94 6:50.58
> 2.4.14-pre6: 4:59.46 5:16.88 6:25.83 6:51.43
> 2.4.14-pre6: 4:56.38 5:18.88 6:15.97 6:31.72
> 2.4.14-pre6: 4:55.79 5:17.47 6:00.23 6:44.85
>
> 2.4.14-pre7: 4:56.39 5:22.84 6:09.05 9:56.59
> 2.4.14-pre7: 4:56.55 5:25.15 7:01.37 7:03.74
> 2.4.14-pre7: 4:59.44 5:15.10 6:06.78 12:51.39*
> 2.4.14-pre7: 4:58.07 5:30.55 6:15.37 *
> 2.4.14-pre7: 4:58.17 5:26.80 6:41.44 *

<snip>

> Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.

My box agrees. Notice pre5aa1/ac IO numbers below. I'm getting
~good %user/wallclock with pre6/pre7 despite (thrash?) IO numbers.

-Mike

fresh boot -> time make -j30 bzImage && procinfo >> /stats

2.4.13-pre2.virgin
real 8m44.484s
user 6m37.800s
sys 0m27.040s

user : 0:06:44.26 68.4% page in : 653397
nice : 0:00:00.00 0.0% page out: 617078
system: 0:01:22.68 14.0% swap in : 112202
idle : 0:01:43.93 17.6% swap out: 149382

2.4.13-pre2.aa1
real 8m5.204s
user 6m38.590s
sys 0m27.220s

user : 0:06:44.90 74.8% page in : 560202
nice : 0:00:00.00 0.0% page out: 568467
system: 0:01:09.70 12.9% swap in : 97083
idle : 0:01:06.55 12.3% swap out: 137374

2.4.13-pre5.virgin
real 9m1.709s
user 6m37.310s
sys 0m53.880s

user : 0:06:44.49 66.1% page in : 519473
nice : 0:00:00.00 0.0% page out: 521926
system: 0:01:51.32 18.2% swap in : 93794
idle : 0:01:35.91 15.7% swap out: 125145

2.4.13-pre5.aa1
real 7m30.261s
user 6m35.930s
sys 0m28.500s

user : 0:06:42.74 76.8% page in : 402421
nice : 0:00:00.00 0.0% page out: 390429
system: 0:01:21.20 15.5% swap in : 70652
idle : 0:00:40.51 7.7% swap out: 90871

2.4.13.virgin
real 9m13.976s
user 6m36.910s
sys 0m27.510s

user : 0:06:43.67 64.3% page in : 523516
nice : 0:00:00.00 0.0% page out: 547148
system: 0:00:41.29 6.6% swap in : 85945
idle : 0:03:02.39 29.1% swap out: 131574

2.4.14-pre2.virgin
real 8m0.051s
user 6m34.060s
sys 0m31.020s

user : 0:06:40.77 72.9% page in : 425768
nice : 0:00:00.00 0.0% page out: 494520
system: 0:00:44.65 8.1% swap in : 82020
idle : 0:01:44.23 19.0% swap out: 117066

2.4.14-pre2.virgin+p2p3
real 8m0.094s
user 6m35.450s
sys 0m29.810s

user : 0:06:41.38 73.2% page in : 432894
nice : 0:00:00.00 0.0% page out: 483079
system: 0:00:43.71 8.0% swap in : 82909
idle : 0:01:42.92 18.8% swap out: 113578

2.4.14-pre3.virgin
real 8m30.454s
user 6m35.760s
sys 0m29.770s

user : 0:06:42.40 69.6% page in : 430062
nice : 0:00:00.00 0.0% page out: 610021
system: 0:00:42.29 7.3% swap in : 84529
idle : 0:02:13.18 23.0% swap out: 147283

2.4.14-pre6.virgin
real 7m58.841s
user 6m37.220s
sys 0m30.370s

user : 0:06:43.37 73.6% page in : 576081
nice : 0:00:00.00 0.0% page out: 704720
system: 0:00:42.87 7.8% swap in : 120317
idle : 0:01:41.45 18.5% swap out: 170619

2.4.14-pre7.virgin
real 7m56.357s
user 6m36.580s
sys 0m30.600s

user : 0:06:42.88 74.5% page in : 646265
nice : 0:00:00.00 0.0% page out: 704490
system: 0:00:43.11 8.0% swap in : 136957
idle : 0:01:34.61 17.5% swap out: 171134

2.4.14-pre6aa1
real 8m29.484s
user 6m38.650s
sys 0m27.940s

user : 0:06:45.45 70.6% page in : 641298
nice : 0:00:00.00 0.0% page out: 634494
system: 0:00:41.73 7.3% swap in : 118869
idle : 0:02:06.90 22.1% swap out: 154141

2.4.12-ac1
real 8m12.184s
user 6m35.170s
sys 0m33.630s

user : 0:06:41.35 71.8% page in : 402144
nice : 0:00:00.00 0.0% page out: 382625
system: 0:01:44.76 18.7% swap in : 65589
idle : 0:00:53.25 9.5% swap out: 89164

2.4.12-ac3
real 8m8.200s
user 6m36.230s
sys 0m32.340s

user : 0:06:43.05 71.7% page in : 419527
nice : 0:00:00.00 0.0% page out: 385711
system: 0:00:49.29 8.8% swap in : 70491
idle : 0:01:49.46 19.5% swap out: 89771

2.4.13-ac6
real 8m15.366s
user 6m35.710s
sys 0m33.570s

user : 0:06:42.25 71.6% page in : 461270
nice : 0:00:00.00 0.0% page out: 494015
system: 0:00:49.03 8.7% swap in : 82114
idle : 0:01:50.74 19.7% swap out: 117766

2001-11-03 18:04:52

by Linus Torvalds

Subject: Re: 2.4.14-pre6


On Sat, 3 Nov 2001, Mike Galbraith wrote:
>
> > Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.
>
> My box agrees. Notice pre5aa1/ac IO numbers below. I'm getting
> ~good %user/wallclock with pre6/pre7 despite (thrash?) IO numbers.

Well, pre7 gets the second-best numbers, and the reason I really don't
like pre5aa1 is that since pre4, the virgin kernels have had all mapped
pages in the LRU queue, and can use that knowledge to decide when to
start swapping.

So in those kernels, the balance between scanning the VM tables and
scanning the regular unmapped caches is something that is strictly
deterministic, which is something I _really_ want to have.

We've had too much trouble with the "let's hope this works" approach.
Which is why I want the anonymous pages to clearly show up in the
scanning, and not have them be these virtual ghosts that only show up when
you start swapping stuff out.

Your array cut down to just the ones that made the benchmark in under 8
minutes makes it easier to read, and clearly pre6+ seems to be a bit _too_
swap-happy. I'm trying the "dynamic max_mapped" approach now.

Linus

2001-11-03 19:07:47

by Mike Galbraith

Subject: Re: 2.4.14-pre6

On Sat, 3 Nov 2001, Linus Torvalds wrote:

> On Sat, 3 Nov 2001, Mike Galbraith wrote:
> >
> > > Otherwise -pre5aa1 still seems to be the fastest kernel *in this test*.
> >
> > My box agrees. Notice pre5aa1/ac IO numbers below. I'm getting
> > ~good %user/wallclock with pre6/pre7 despite (thrash?) IO numbers.
>
> Well, pre7 gets the second-best numbers, and the reason I really don't
> like pre5aa1 is that since pre4, the virgin kernels have had all mapped
> pages in the LRU queue, and can use that knowledge to decide when to
> start swapping.
>
> So in those kernels, the balance between scanning the VM tables and
> scanning the regular unmapped caches is something that is strictly
> deterministic, which is something I _really_ want to have.
>
> We've had too much trouble with the "let's hope this works" approach.
> Which is why I want the anonymous pages to clearly show up in the
> scanning, and not have them be these virtual ghosts that only show up when
> you start swapping stuff out.
>
> Your array cut down to just the ones that made the benchmark in under 8
> minutes makes it easier to read, and clearly pre6+ seems to be a bit _too_
> swap-happy. I'm trying the "dynamic max_mapped" approach now.

Swap-happy doesn't bother this load too much. What it's really sensitive
to is pagein. Turning the cache knobs (vigorously:) in aa-latest...

2.4.14-pre6aa1
real 8m29.484s
user 6m38.650s
sys 0m27.940s

user : 0:06:45.45 70.6% page in : 641298
nice : 0:00:00.00 0.0% page out: 634494
system: 0:00:41.73 7.3% swap in : 118869
idle : 0:02:06.90 22.1% swap out: 154141

echo 2 > /proc/sys/vm/vm_mapped_ratio
echo 128 > /proc/sys/vm/vm_balance_ratio

real 7m25.069s
user 6m37.390s
sys 0m27.540s

user : 0:06:43.60 78.7% page in : 588488
nice : 0:00:00.00 0.0% page out: 514865
system: 0:00:40.47 7.9% swap in : 118738
idle : 0:01:08.92 13.4% swap out: 122340

..lowers the sleep time noticeably despite swapin remaining constant.

-Mike

2001-11-04 22:22:56

by Pavel Machek

Subject: Re: 2.4.14-pre6

Hi!

> > What I would like is that as soon as a buffer was marked "dirty", it
> > would get passed down to the driver (or at least to the
> > block-device-layer) with something like
> > submit_bh(WRITEA, bh);
> > i.e. a write ahead. (or is it write-behind...)
> > The device handler (the elevator algorithm for normal disks, other
> > code for other devices) could keep them ordered in whatever way it
> > chooses, and feed them into the queues at some appropriate time.
>
> Sounds sensible to me.
>
> In many ways, it's similar to the current scheme when it's used
> with an enormous request queue - all writeable blocks in the
> system are candidates for request merging. But your proposal
> is a whole lot smarter.
>
> In particular, the current kupdate scheme of writing the
> dirty block list out in six chunks, five seconds apart
> does potentially miss out on a very large number of merging
> opportunities. Your proposal would fix that.
>
> Another potential microoptimisation would be to write out
> clean blocks if that helps merging. So if we see a write
> for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
> then write it out too. I suspect this would be a win for
> ATA but a loss for SCSI. Not sure.

Please don't do this, it is a bug.

If the user did not ask to write somewhere, DO NOT WRITE THERE! If power
fails in the middle of the sector... Or if that is a flash card.... Just
don't do this.
Pavel
--
STOP THE WAR! Someone killed innocent Americans. That does not give
U.S. right to kill people in Afganistan.


2001-11-04 23:15:59

by Daniel Phillips

Subject: Re: 2.4.14-pre6

On November 4, 2001 11:34 pm, Pavel Machek wrote:
> > Another potential microoptimisation would be to write out
> > clean blocks if that helps merging. So if we see a write
> > for blocks 1,2,3,5,6,7 and block 4 is known to be in memory,
> > then write it out too. I suspect this would be a win for
> > ATA but a loss for SCSI. Not sure.
>
> Please don't do this, it is a bug.
>
> If the user did not ask to write somewhere, DO NOT WRITE THERE! If power
> fails in the middle of the sector... Or if that is a flash card....

or raid or nbd...

> Just don't do this.

--
Daniel

2001-11-05 20:58:03

by Johannes Erdfelt

Subject: Re: 2.4.14-pre6

On Fri, Nov 02, 2001, Pavel Machek <[email protected]> wrote:
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
>
> What is SMT P4?

Symmetric Multi Threading IIRC.

Essentially it's a virtual dual-CPU system on one die where you can
dispatch multiple programs to the different execution units. For example,
you can run an FP-intensive program at the same time as an
integer-intensive program.

Nowhere close to true dual CPU performance because of resource
contention on the execution units, but better than single CPU
performance.

JE

2001-11-05 21:04:02

by Charles Cazabon

Subject: Re: 2.4.14-pre6

Pavel Machek <[email protected]> wrote:
>
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
>
> What is SMT P4?

"Jackson" technology-enabled P4 core. Also known as "hyperthreading".

Charles
--
-----------------------------------------------------------------------
Charles Cazabon <[email protected]>
GPL'ed software available at: http://www.qcc.sk.ca/~charlesc/software/
-----------------------------------------------------------------------

2001-11-05 21:28:23

by Josh Fryman

Subject: Re: 2.4.14-pre6

> Basically, you get two virtual CPU's per die, and each CPU can run two
> threads at the same time. It slows some stuff down, because it makes for
> much more cache pressure, but Intel claims up to 30% improvement on some
> loads that scale well.
>
> The 30% is probably a marketing number (ie it might be more like 10% on
> more normal loads), but you have to give them points for interesting
> technology <)

Specifically, the 30% comes in two places. Using Intel proprietary
benchmarks (unreleased, according to the footnotes) they find that a
typical IA32 instruction mix uses some 35% of system resources in an
advanced device like the P4 with NetBurst. The rest is idle.

By using the SMT model with two virtual systems - each with complete
register sets and independent APICs, sharing only the backend exec
units - they claim you get a 30% improvement in wall-clock time. This
is supposed to be on their benchmarks *without* recompiling anything. To
get "additional" improvement, writing code to take advantage of the dual
virtual CPU nature of the chip and recompiling should give some
unquantified gain.

-josh

To help your searching if you want more details, Intel has called this
Jackson Technology aka Project Foster aka Hyper-Threading Technology,
and it is known in the rest of the world as SMT.

Intel has a whitepaper or two available for download. If you can't find
them at developer.intel.com or via Google, let me know and I've got some
copies laying around. Amusingly, they seem to be ultra scared of
releasing any real information about it. Alpha was working on a 4-way
design that seemed a bit more clever for the 21464, which appears to be
destined for the bit bucket now :(

2001-11-05 21:08:52

by Wilson

Subject: Re: 2.4.14-pre6

----- Original Message -----
From: "Pavel Machek" <[email protected]>
To: "Linus Torvalds" <[email protected]>
Cc: "Kernel Mailing List" <[email protected]>
Sent: Friday, November 02, 2001 7:01 AM
Subject: Re: 2.4.14-pre6


> Hi!
>
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
>
> What is SMT P4?
>
> > Anybody out there with cerberus?
> >
> > Linus "128MB of RAM and 1GB into swap, and happy" Torvalds
>
> Someone go and steal 64MB from Linus....
>

SMT == Simultaneous Multi-Threading:
http://www.anandtech.com/showdoc.html?i=1525&p=4

They accused us of suppressing freedom of expression.
This was a lie and we could not let them publish it.
-- Nelba Blandon, Nicaraguan Interior Ministry Director of Censorship



2001-11-05 20:39:31

by Pavel Machek

Subject: Re: 2.4.14-pre6

Hi!

> Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> to show up. More to come.

What is SMT P4?

> Anybody out there with cerberus?
>
> Linus "128MB of RAM and 1GB into swap, and happy" Torvalds

Someone go and steal 64MB from Linus....

Pavel "12MB of RAM and no space left for swap" Machek
(on my Velo thingie....)
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2001-11-05 20:52:42

by Linus Torvalds

Subject: Re: 2.4.14-pre6


On Fri, 2 Nov 2001, Pavel Machek wrote:
>
> > Oh, and the first funny patches for the upcoming SMT P4 cores are starting
> > to show up. More to come.
>
> What is SMT P4?

It's the upcoming symmetric multi-threading on the P4 chips (disabled in
hardware in currently selling stuff, but apparently various Intel contacts
already have chips to test with).

Basically, you get two virtual CPU's per die, and each CPU can run two
threads at the same time. It slows some stuff down, because it makes for
much more cache pressure, but Intel claims up to 30% improvement on some
loads that scale well.

The 30% is probably a marketing number (ie it might be more like 10% on
more normal loads), but you have to give them points for interesting
technology <)

> > Anybody out there with cerberus?
> >
> > Linus "128MB of RAM and 1GB into swap, and happy" Torvalds
>
> Someone go and steal 64MB from Linus....

Hey, hey. I actually have spent a _lot_ of time with 40MB of RAM and KDE
over the last few weeks. And this is with DRI on a graphics card that also
seems to eat up 8MB just for the direct rendering stuff, _and_ with kernel
profiling enabled, so it actually had more like 30MB of "real" memory
available. In 1600x1200, 16-bit color.

Konqueror is a pig, but it's _usable_. I did real work, including kernel
compiles, with it.

Admittedly I do like the behaviour with 2GB a lot better. That way I can
cache every kernel tree I work on, and not ever think about "diff" times.

Linus

2001-11-05 21:49:57

by Gérard Roudier

Subject: Re: 2.4.14-pre6



On Mon, 5 Nov 2001, Josh Fryman wrote:

> > Basically, you get two virtual CPU's per die, and each CPU can run two
> > threads at the same time. It slows some stuff down, because it makes for
> > much more cache pressure, but Intel claims up to 30% improvement on some
> > loads that scale well.
> >
> > The 30% is probably a marketing number (ie it might be more like 10% on
> > more normal loads), but you have to give them points for interesting
> > technology <)
>
> Specifically, the 30% comes in two places. Using Intel proprietary
> benchmarks (unreleased, according to the footnotes) they find that a
> typical IA32 instruction mix uses some 35% of system resources in an
> advanced device like the P4 with NetBurst. the rest is idle.
>
> by using the SMT model with two virtual systems - each with complete
> register sets and independent APICs, sharing only the backend exec
> units - they claim you get a 30% improvement in wall-clock time. This
> is supposed to be on their benchmarks *without* recompiling anything. To
> get "additional" improvement, using code to take advantage of the dual
> virtual CPUs nature of the chip and recompiling should give some
> unquantified gain.

All things being equal, this probably will make a NxMHz P4 be as fast as a
NxMHz PIII. But the new complexity it may require in real life may just
turn the gain into nil.

What a great improvement, indeed! :-)

Gérard.