2001-12-16 00:14:24

by Linus Torvalds

[permalink] [raw]
Subject: Re: Just a second ...


On Sat, 15 Dec 2001, Davide Libenzi wrote:
>
> when you find 10 secs free in your spare time i really would like to know
> the reason ( if any ) of your abstention from any schdeuler discussion.
> No hurry, just a few lines out of lkml.

I just don't find it very interesting. The scheduler is about 100 lines
out of however-many-million (3.8 at least count), and doesn't even impact
most normal performace very much.

We'll clearly do per-CPU runqueues or something some day. And that worries
me not one whit, compared to thigns like VM and block device layer ;)

I know a lot of people think schedulers are important, and the operating
system theory about them is overflowing - it's one of those things that
people can argue about forever, yet is conceptually simple enough that
people aren't afraid of it. I just personally never found it to be a major
issue.

Let's face it - the current scheduler has the same old basic structure
that it did almost 10 years ago, and yes, it's not optimal, but there
really aren't that many real-world loads where people really care. I'm
sorry, but it's true.

And you have to realize that there are not very many things that have
aged as well as the scheduler. Which is just another proof that scheduling
is easy.

We've rewritten the VM several times in the last ten years, and I expect
it will be changed several more times in the next few years. Withing five
years we'll almost certainly have to make the current three-level page
tables be four levels etc.

In comparison to those kinds of issues, I suspect that making the
scheduler use per-CPU queues together with some inter-CPU load balancing
logic is probably _trivial_. Patches already exist, and I don't feel that
people can screw up the few hundred lines too badly.

Linus


2001-12-17 22:47:16

by Davide Libenzi

[permalink] [raw]
Subject: Scheduler ( was: Just a second ) ...

On Sat, 15 Dec 2001, Linus Torvalds wrote:

> I just don't find it very interesting. The scheduler is about 100 lines
> out of however-many-million (3.8 at least count), and doesn't even impact
> most normal performace very much.

Linus, sharing queue and lock between CPUs for a "thing" highly frequency
( schedule()s + wakeup()s ) accessed like the scheduler it's quite ugly
and it's not that much funny. And it's not only performance wise, it's
more design wise.


> We'll clearly do per-CPU runqueues or something some day. And that worries
> me not one whit, compared to thigns like VM and block device layer ;)

Why not 2.5.x ?


> I know a lot of people think schedulers are important, and the operating
> system theory about them is overflowing - ...

It's no more important of anything else, it's just one of the remaining
scalability/design issues. No, it's not more important than VM but
there're enough people working on VM. And the hope is to get the scheduler
right with an ETA of less than 10 years.


> it's one of those things that people can argue about forever, ...

Yes, i suppose that if something is not addressed, it'll come up again and
again.


> yet is conceptually simple enough that people aren't afraid of it.
^^^^^^^^^^^^^^^^^^^

1, ...


> Let's face it - the current scheduler has the same old basic structure
> that it did almost 10 years ago, and yes, it's not optimal, but there
> really aren't that many real-world loads where people really care. I'm
> sorry, but it's true.

Moving to 4, 8, 16 CPUs the run queue load, that would be thought insane
for UP systems, starts to matter. Just to leave out cache line effects.
Just to leave out the way the current scheduler moves tasks around CPUs.
Linus, it's not only about performance benchmarks with 2451 processes
jumping on the run queue, that i could not care less about, it's just a
sum of sucky "things" that make an issue. You can look at it like a
cosmetic/design patch more than a strict performance patch if you like.


> And you have to realize that there are not very many things that have
> aged as well as the scheduler. Which is just another proof that
> scheduling is easy.
^^^^^^^^^^^^^^^^^^

..., 2, ...


> We've rewritten the VM several times in the last ten years, and I expect
> it will be changed several more times in the next few years. Withing five
> years we'll almost certainly have to make the current three-level page
> tables be four levels etc.
>
> In comparison to those kinds of issues, I suspect that making the
> scheduler use per-CPU queues together with some inter-CPU load balancing
> logic is probably _trivial_.
^^^^^^^^^

... 3, there should be a subliminal message inside but i'm not able to
get it ;)
I would not call selecting the right task to run in an SMP system trivial.
The difference between selecting the right task to run and selecting the
right page to swap is that if you screw up with the task the system
impact is lower. But, if you screw up, your design will suck in both cases.
Anyway, given that 1) real men do VM ( i thought they didn't eat quiche )
and easy-coders do scheduling 2) the schdeuler is easy/trivial and you do
not seem interested in working on it 3) whoever is doing the scheduler
cannot screw up things, why don't you give the responsibility for example
to Alan or Ingo so that a discussion ( obviously easy ) about the future
of the schdeuler can be started w/out hurting real men doing VM ?
I'm talking about, you know, that kind of discussions where people bring
solutions, code and numbers, they talk about the good and bad of certain
approaches and they finally come up ( after some sane fight ) with a much
or less widely approved solution. The scheduler, besides the real men
crap, is one of the basic components of an OS, and having a public
debate, i'm not saying every month and neither every year, but at least
once every four years ( this is the last i remember ) could be a nice thing.
And no, if you do not give to someone that you trust the "power" to
redesign the scheduler, no schdeuler discussions will start simply
because people don't like the result of a debate to be dumped to /dev/null.


> Patches already exist, and I don't feel that people can screw up the few
> hundred lines too badly.

Can you point me to a Linux patch that implements _real_independent_
( queue and locking ) CPU schedulers with global balancing policy ?
I searched very badly but i did not find anything.




Your faithfully,
Jimmy Scheduler





2001-12-17 22:55:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Mon, 17 Dec 2001, Davide Libenzi wrote:

> On Sat, 15 Dec 2001, Linus Torvalds wrote:
>
> > I just don't find it very interesting. The scheduler is about 100 lines
> > out of however-many-million (3.8 at least count), and doesn't even impact
> > most normal performace very much.
>
> Linus, sharing queue and lock between CPUs for a "thing" highly frequency
> ( schedule()s + wakeup()s ) accessed like the scheduler it's quite ugly
> and it's not that much funny. And it's not only performance wise, it's
> more design wise.

"Design wise" is highly overrated.

Simplicity is _much_ more important, if something commonly is only done a
few hundred times a second. Locking overhead is basically zero for that
case.

> > We'll clearly do per-CPU runqueues or something some day. And that worries
> > me not one whit, compared to thigns like VM and block device layer ;)
>
> Why not 2.5.x ?

Maybe. But read the rest of the sentence. There are issues that are about
a million times more important.

> Moving to 4, 8, 16 CPUs the run queue load, that would be thought insane
> for UP systems, starts to matter.

4 cpu's are "high end" today. We can probably point to tens of thousands
of UP machines for each 4-way out there. The ratio gets even worse for 8,
and 16 CPU's is basically a rounding error.

You have to prioritize. Scheduling overhead is way down the list.

Linus

2001-12-17 23:19:43

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Mon, 17 Dec 2001, Davide Libenzi wrote:
> >
> > You have to prioritize. Scheduling overhead is way down the list.
>
> You don't really have to serialize/prioritize, old Latins used to say
> "Divide Et Impera" ;)

Well, you explicitly _asked_ me why I had been silent on the issue. I told
you.

I also told you that I thought it wasn't that big of a deal, and that
patches already exist.

So I'm letting the patches fight it out among the people who _do_ care.

Then, eventually, I'll do something about it, when we have a winner.

If that isn't "Divide et Impera", I don't know _what_ is. Remember: the
romans didn't much care for their subjects. They just wanted the glory,
and the taxes.

Linus

2001-12-17 23:12:52

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:

>
> On Mon, 17 Dec 2001, Davide Libenzi wrote:
>
> > On Sat, 15 Dec 2001, Linus Torvalds wrote:
> >
> > > I just don't find it very interesting. The scheduler is about 100 lines
> > > out of however-many-million (3.8 at least count), and doesn't even impact
> > > most normal performace very much.
> >
> > Linus, sharing queue and lock between CPUs for a "thing" highly frequency
> > ( schedule()s + wakeup()s ) accessed like the scheduler it's quite ugly
> > and it's not that much funny. And it's not only performance wise, it's
> > more design wise.
>
> "Design wise" is highly overrated.
>
> Simplicity is _much_ more important, if something commonly is only done a
> few hundred times a second. Locking overhead is basically zero for that
> case.

Few hundred is a nice definition because you can basically range from 0 to
infinite. Anyway i agree that we can spend days debating about what this
"few hundred" translate to, and i do not really want to.


> 4 cpu's are "high end" today. We can probably point to tens of thousands
> of UP machines for each 4-way out there. The ratio gets even worse for 8,
> and 16 CPU's is basically a rounding error.
>
> You have to prioritize. Scheduling overhead is way down the list.

You don't really have to serialize/prioritize, old Latins used to say
"Divide Et Impera" ;)




- Davide


2001-12-17 23:37:24

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:

> So I'm letting the patches fight it out among the people who _do_ care.
>
> Then, eventually, I'll do something about it, when we have a winner.
>
> If that isn't "Divide et Impera", I don't know _what_ is. Remember: the
> romans didn't much care for their subjects. They just wanted the glory,
> and the taxes.

Just like today, everyone I talk to wants glory, and everyone I talk to
wants to _not_ pay taxes.



- Davide


2001-12-17 23:53:05

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, Dec 17, 2001 at 03:18:14PM -0800, Linus Torvalds wrote:
> Well, you explicitly _asked_ me why I had been silent on the issue. I told
> you.

Well, what about those of us who need syscall numbers assigned for which
you are the only official assigned number registry?

-ben

2001-12-18 01:12:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Mon, 17 Dec 2001, Benjamin LaHaise wrote:
> On Mon, Dec 17, 2001 at 03:18:14PM -0800, Linus Torvalds wrote:
> > Well, you explicitly _asked_ me why I had been silent on the issue. I told
> > you.
>
> Well, what about those of us who need syscall numbers assigned for which
> you are the only official assigned number registry?

I've told you a number of times that I'd like to see the preliminary
implementation publicly discussed and some uses outside of private
companies that I have no insight into..

Linus

2001-12-18 01:47:33

by H. Peter Anvin

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Followup to: <[email protected]>
By author: Linus Torvalds <[email protected]>
In newsgroup: linux.dev.kernel
>
> I've told you a number of times that I'd like to see the preliminary
> implementation publicly discussed and some uses outside of private
> companies that I have no insight into..
>

There was a group at IBM who presented on an alternate SMP scheduler
at this year's OLS; it generated quite a bit of good discussion.

-hpa

--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2001-12-18 01:55:34

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:

> You have to prioritize. Scheduling overhead is way down the list.

That's not what the profiling on my UP machine indicates,
let alone on SMP machines.

Try readprofile some day, chances are schedule() is pretty
near the top of the list.

regards,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 02:37:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Mon, 17 Dec 2001, Rik van Riel wrote:
>
> Try readprofile some day, chances are schedule() is pretty
> near the top of the list.

Ehh.. Of course I do readprofile.

But did you ever compare readprofile output to _total_ cycles spent?

The fact is, it's not even noticeable under any normal loads, and
_definitely_ not on UP except with totally made up benchmarks that just
pass tokens around or yield all the time.

Because we spend 95-99% in user space or idle. Which is as it should be.
There are _very_ few loads that are kernel-intensive, and in fact the best
way to get high system times is to do either lots of fork/exec/wait with
everything cached, or do lots of open/read/write/close with everything
cached.

Of the remaining 1-5% of time, schedule() shows up as one fairly high
thing, but on most profiles I've seen of real work it shows up long after
things like "clear_page()" and "copy_page()".

And look closely at the profile, and you'll notice that it tends to be a
_loong_ tail of stuff.

Quite frankly, I'd be a _lot_ more interested in making the scheduling
slices _shorter_ during 2.5.x, and go to a 1kHz clock on x86 instead of a
100Hz one, _despite_ the fact that it will increase scheduling load even
more. Because it improves interactive feel, and sometimes even performance
(ie being able to sleep for shorter sequences of time allows some things
that want "almost realtime" behaviour to avoid busy-looping for those
short waits - improving performace exactly _because_ they put more load on
the scheduler).

The benchmark that is just about _the_ worst on the scheduler is actually
something like "lmbench", and if you look at profiles for that you'll
notice that system call entry and exit together with the read/write path
ends up being more of a performance issue.

And you know what? From a user standpoint, improving disk latency is again
a _lot_ more noticeable than scheduler overhead.

And even more important than performance is being able to read and write
to CD-RW disks without having to know about things like "ide-scsi" etc,
and do it sanely over different bus architectures etc.

The scheduler simply isn't that important.

Linus

2001-12-18 03:17:10

by David Lang

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

one problem the current scheduler has on SMP machines (even 2 CPU ones) is
that if the system is running one big process it will bounce from CPU to
CPU and actually finish considerably slower then if you are running two
CPU intensive tasks (with less cpu hopping). I saw this a few months ago
as I was doing something as simple as gunzip on a large file, I got a 30%
speed increase by running setiathome at the same time.

I'm not trying to say that it should be the top priority, but there are
definante weaknesses showing in the current implementation.

David Lang


On Mon, 17 Dec 2001, Linus Torvalds wrote:

> Date: Mon, 17 Dec 2001 18:35:54 -0800 (PST)
> From: Linus Torvalds <[email protected]>
> To: Rik van Riel <[email protected]>
> Cc: Davide Libenzi <[email protected]>,
> Kernel Mailing List <[email protected]>
> Subject: Re: Scheduler ( was: Just a second ) ...
>
>
> On Mon, 17 Dec 2001, Rik van Riel wrote:
> >
> > Try readprofile some day, chances are schedule() is pretty
> > near the top of the list.
>
> Ehh.. Of course I do readprofile.
>
> But did you ever compare readprofile output to _total_ cycles spent?
>
> The fact is, it's not even noticeable under any normal loads, and
> _definitely_ not on UP except with totally made up benchmarks that just
> pass tokens around or yield all the time.
>
> Because we spend 95-99% in user space or idle. Which is as it should be.
> There are _very_ few loads that are kernel-intensive, and in fact the best
> way to get high system times is to do either lots of fork/exec/wait with
> everything cached, or do lots of open/read/write/close with everything
> cached.
>
> Of the remaining 1-5% of time, schedule() shows up as one fairly high
> thing, but on most profiles I've seen of real work it shows up long after
> things like "clear_page()" and "copy_page()".
>
> And look closely at the profile, and you'll notice that it tends to be a
> _loong_ tail of stuff.
>
> Quite frankly, I'd be a _lot_ more interested in making the scheduling
> slices _shorter_ during 2.5.x, and go to a 1kHz clock on x86 instead of a
> 100Hz one, _despite_ the fact that it will increase scheduling load even
> more. Because it improves interactive feel, and sometimes even performance
> (ie being able to sleep for shorter sequences of time allows some things
> that want "almost realtime" behaviour to avoid busy-looping for those
> short waits - improving performace exactly _because_ they put more load on
> the scheduler).
>
> The benchmark that is just about _the_ worst on the scheduler is actually
> something like "lmbench", and if you look at profiles for that you'll
> notice that system call entry and exit together with the read/write path
> ends up being more of a performance issue.
>
> And you know what? From a user standpoint, improving disk latency is again
> a _lot_ more noticeable than scheduler overhead.
>
> And even more important than performance is being able to read and write
> to CD-RW disks without having to know about things like "ide-scsi" etc,
> and do it sanely over different bus architectures etc.
>
> The scheduler simply isn't that important.
>
> Linus
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2001-12-18 05:43:38

by John Heil

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Thierry Forveille wrote:

> Date: Mon, 17 Dec 2001 19:11:10 -1000 (HST)
> From: Thierry Forveille <[email protected]>
> To: [email protected]
> Subject: Re: Scheduler ( was: Just a second ) ...
>
> Linus Torvalds ([email protected]) writes
> > On Mon, 17 Dec 2001, Rik van Riel wrote:
> > >
> > > Try readprofile some day, chances are schedule() is pretty
> > > near the top of the list.
> >
> > Ehh.. Of course I do readprofile.
> >
> > But did you ever compare readprofile output to _total_ cycles spent?
> >
> I have a feeling that this discussion got sidetracked: cpu cycles burnt
> in the scheduler indeed is non-issue, but big tasks being needlessly moved
> around on SMPs is worth tackling.

Given a cpu affinity facility, policy mgmt would belong in user space.
CPU affinity would be pretty simple and I think the effort is already
in flight IIRC.

Johnh

-
-----------------------------------------------------------------
John Heil
South Coast Software
Custom systems software for UNIX and IBM MVS mainframes
1-714-774-6952
[email protected]
http://www.sc-software.com
-----------------------------------------------------------------

2001-12-18 03:06:10

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:

> Quite frankly, I'd be a _lot_ more interested in making the scheduling
> slices _shorter_ during 2.5.x, and go to a 1kHz clock on x86 instead of a
> 100Hz one, _despite_ the fact that it will increase scheduling load even
> more. Because it improves interactive feel, and sometimes even performance
> (ie being able to sleep for shorter sequences of time allows some things
> that want "almost realtime" behaviour to avoid busy-looping for those
> short waits - improving performace exactly _because_ they put more load on
> the scheduler).

I'm ok with increasing HZ but not so ok with decreasing time slices.
When you switch a task you've a fixed cost ( tlb, cache image,... ) that,
if you decrease the time slice, you're going to weigh with a lower run time
highering its percent impact.
The more interactive feel can be achieved by using a real BVT
implementation :

- p->counter = (p->counter >> 1) + NICE_TO_TICKS(p->nice);
+ p->counter += NICE_TO_TICKS(p->nice);

The only problem with this is that, with certain task run patterns,
processes can run a long time ( having an high dynamic priority ) before
they get scheduled.
What i was thinking was something like, in timer.c :

if (p->counter > decay_ticks)
--p->counter;
else if (++p->timer_ticks >= MAX_RUN_TIME) {
p->counter -= p->timer_ticks;
p->timer_ticks = 0;
p->need_resched = 1;
}

Having MAX_RUN_TIME ~= NICE_TO_TICKS(0)
In this way I/O bound tasks can run with high priority giving a better
interactive feel, w/out running too much freezing the system when exiting
from a quite long I/O wait.




- Davide


2001-12-18 03:17:30

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Davide Libenzi wrote:

> What i was thinking was something like, in timer.c :
>
> if (p->counter > decay_ticks)
> --p->counter;
> else if (++p->timer_ticks >= MAX_RUN_TIME) {
> p->counter -= p->timer_ticks;
> p->timer_ticks = 0;
> p->need_resched = 1;
> }

Obviously that code doesn't work :) but the idea is to not permit the task
to run more than a maximum time consecutively.



- Davide


2001-12-18 04:29:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


[ cc'd back to Linux kernel, in case somebody wants to take a look whether
there is something wrong in the sound drivers, for example ]

On Mon, 17 Dec 2001, William Lee Irwin III wrote:
>
> This is no benchmark. This is my home machine it's taking a bite out of.
> I'm trying to websurf and play mp3's and read email here. No forkbombs.
> No databases. No made-up benchmarks. I don't know what it's doing (or
> trying to do) in there but I'd like the CPU cycles back.
>
> From a recent /proc/profile dump on 2.4.17-pre1 (no patches), my top 5
> (excluding default_idle) are:
> --------------------------------------------------------
> 22420 total 0.0168
> 4624 default_idle 96.3333
> 1280 schedule 0.6202
> 1130 handle_IRQ_event 11.7708
> 929 file_read_actor 9.6771
> 843 fast_clear_page 7.5268

The most likely cause is simply waking up after each sound interrupt: you
also have a _lot_ of time handling interrupts. Quite frankly, web surfing
and mp3 playing simply shouldn't use any noticeable amounts of CPU.

The point being that I really doubt it's the scheduler proper, it's
probably how it is _used_. And I'd suspect your sound driver (or user)
conspires to keep scheduling stuff.

For example (and this is _purely_ an example, I don't know if this is
your particular case), this sounds like a classic case of "bad buffering".
What bad buffering would do is:
- you have a sound buffer that the mp3 player tries to keep full
- your sound buffer is, let's pick a random number, 64 entries of 1024
bytes each.
- the sound card gives an interrupt every time it has emptied a buffer.
- the mp3 player is waiting on "free space"
- we wake up the mp3 player for _every_ sound fragment filled.

Do you see what this leads to? We schedule the mp3 task (which gets a high
priority because it tends to run for a really short time, filling just 1
small buffer each time) _every_ time a single buffer empties. Even though
we have 63 other full buffers.

The classic fix for these kinds of things is _not_ to make the scheduler
faster. Sure, that would help, but that's not really the problem. The
_real_ fix is to use water-marks, and make the sound driver wake up the
writing process only when (say) half the buffers have emptied.

Now the mp3 player can fill 32 of the buffers at a time, and gets
scheduled an order of magnitude less. It doesn't end up waking up every
time.

Which sound driver are you using, just in case this _is_ the reason?

Linus

2001-12-18 04:56:11

by William Lee Irwin III

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
> The most likely cause is simply waking up after each sound interrupt: you
> also have a _lot_ of time handling interrupts. Quite frankly, web surfing
> and mp3 playing simply shouldn't use any noticeable amounts of CPU.

I think we have a winner:
/proc/interrupts
------------------------------------------------
CPU0
0: 17321824 XT-PIC timer
1: 4 XT-PIC keyboard
2: 0 XT-PIC cascade
5: 46490271 XT-PIC soundblaster
9: 400232 XT-PIC usb-ohci, eth0, eth1
11: 939150 XT-PIC aic7xxx, aic7xxx
14: 13 XT-PIC ide0

Approximately 4 times more often than the timer interrupt.
That's not nice...

On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
> Which sound driver are you using, just in case this _is_ the reason?

SoundBlaster 16
A change of hardware should help verify this.


Cheers,
Bill

2001-12-18 05:11:36

by Thierry Forveille

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Linus Torvalds ([email protected]) writes
> On Mon, 17 Dec 2001, Rik van Riel wrote:
> >
> > Try readprofile some day, chances are schedule() is pretty
> > near the top of the list.
>
> Ehh.. Of course I do readprofile.
>
> But did you ever compare readprofile output to _total_ cycles spent?
>
I have a feeling that this discussion got sidetracked: cpu cycles burnt
in the scheduler indeed is non-issue, but big tasks being needlessly moved
around on SMPs is worth tackling.

2001-12-18 05:59:30

by ganesh

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

In article <[email protected]> you wrote:
: On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
:> The most likely cause is simply waking up after each sound interrupt: you
:> also have a _lot_ of time handling interrupts. Quite frankly, web surfing
:> and mp3 playing simply shouldn't use any noticeable amounts of CPU.

: I think we have a winner:
: /proc/interrupts
: ------------------------------------------------
: CPU0
: 0: 17321824 XT-PIC timer
: 1: 4 XT-PIC keyboard
: 2: 0 XT-PIC cascade
: 5: 46490271 XT-PIC soundblaster
: 9: 400232 XT-PIC usb-ohci, eth0, eth1
: 11: 939150 XT-PIC aic7xxx, aic7xxx
: 14: 13 XT-PIC ide0

: Approximately 4 times more often than the timer interrupt.
: That's not nice...

a bit offtopic, but the reason why there are so many interrupts is
that there's probably something like esd running. I've observed that idle
esd manages to generate tons of interrupts, although an strace of esd
reveals it stuck in a select(). probably one of the ioctls it issued
earlier is causing the driver to continuously read/write to the device.
the interrupts stop as soon as you kill esd.

: SoundBlaster 16
: A change of hardware should help verify this.

it happens even with cs4232 (redhat 7.2, 2.4.7-10smp), so I doubt it's
a soundblaster issue.

ganesh

2001-12-18 05:54:50

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, Dec 17, 2001 at 05:11:09PM -0800, Linus Torvalds wrote:
> I've told you a number of times that I'd like to see the preliminary
> implementation publicly discussed and some uses outside of private
> companies that I have no insight into..

Well, we've got serious chicken and egg problems then.

-ben
--
Fish.

2001-12-18 06:13:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Benjamin LaHaise wrote:

> On Mon, Dec 17, 2001 at 05:11:09PM -0800, Linus Torvalds wrote:
> > I've told you a number of times that I'd like to see the preliminary
> > implementation publicly discussed and some uses outside of private
> > companies that I have no insight into..
>
> Well, we've got serious chicken and egg problems then.

Why?

I'd rather have people playing around with new system calls and _test_
them, and then have to recompile their apps if the system calls move
later, than introduce new system calls that haven't gotten any public
testing at all..

Linus

2001-12-18 06:11:21

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Mon, 17 Dec 2001, William Lee Irwin III wrote:
>
> 5: 46490271 XT-PIC soundblaster
>
> Approximately 4 times more often than the timer interrupt.
> That's not nice...

Yeah.

Well, looking at the issue, the problem is probably not just in the sb
driver: the soundblaster driver shares the output buffer code with a
number of other drivers (there's some horrible "dmabuf.c" code in common).

And yes, the dmabuf code will wake up the writer on every single DMA
complete interrupt. Considering that you seem to have them at least 400
times a second (and probably more, unless you've literally had sound going
since the machine was booted), I think we know why your setup spends time
in the scheduler.

> On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
> > Which sound driver are you using, just in case this _is_ the reason?
>
> SoundBlaster 16
> A change of hardware should help verify this.

A number of sound drivers will use the same logic.

You may be able to change this more easily some other way, by using a
larger fragment size for example. That's up to the sw that actually feeds
the sound stream, so it might be your decoder that selects a small
fragment size.

Quite frankly I don't know the sound infrastructure well enough to make
any more intelligent suggestions about other decoders or similar to try,
at this point I just start blathering.

But yes, I bet you'll also see much less impact of this if you were to
switch to more modern hardware.

grep grep grep.. Oh, before you do that, how about changing "min_fragment"
in sb_audio.c from 5 to something bigger like 9 or 10?

That

audio_devs[devc->dev]->min_fragment = 5;

literally means that your minimum fragment size seems to be a rather
pathetic 32 bytes (which doesn't mean that your sound will be set to that,
but it _might_ be). That sounds totally ridiculous, but maybe I've
misunderstood the code.

Jeff, you've worked on the sb code at some point - does it really do
32-byte sound fragments? Why? That sounds truly insane if I really parsed
that code correctly. That's thousands of separate DMA transfers
and interrupts per second..

Raising that min_fragment thing from 5 to 10 would make the minimum DMA
buffer go from 32 bytes to 1kB, which is a _lot_ more reasonable (what,
at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer empties
in less than 1/100th of a second, but at least it should be < 200 irqs/sec
rather than >400).

Linus

2001-12-18 06:34:44

by Jeff Garzik

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Linus Torvalds wrote:
> Jeff, you've worked on the sb code at some point - does it really do
> 32-byte sound fragments? Why? That sounds truly insane if I really parsed
> that code correctly. That's thousands of separate DMA transfers
> and interrupts per second..

I do not see a hardware minimum fragment size in the HW docs... The
default hardware reset frag size is 2048 bytes. So, yes, 32 bytes is
pretty small for today's rate.

But... I wonder if the fault lies more with the application setting a
too-small fragment size and the driver actually allows it to do so, or,
the code following this comment in reorganize_buffers in
drivers/sound/audio.c needs to be revisited:
/* Compute the fragment size using the default algorithm */

Remember this code is from ancient times... probably written way before
44 Khz was common at all.

Jeff


--
Jeff Garzik | Only so many songs can be sung
Building 1024 | with two lips, two lungs, and one tongue.
MandrakeSoft | - nomeansno

2001-12-18 12:24:31

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:
> On Mon, 17 Dec 2001, William Lee Irwin III wrote:
> >
> > 5: 46490271 XT-PIC soundblaster
> >
> > Approximately 4 times more often than the timer interrupt.
> > That's not nice...

That's not nearly as much as your typical server system runs
in network packets and wakeups of the samba/database/http
daemons, though ...

> Well, looking at the issue, the problem is probably not just in the sb
> driver: the soundblaster driver shares the output buffer code with a
> number of other drivers (there's some horrible "dmabuf.c" code in common).

So you fixed it for the sound driver, nice. We still have
the issue tha the scheduler can take up lots of time on busy
server systems, though.

(though I suspect on those systems it probably spends more
time recalculating than selecting processes)

regards,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 14:00:22

by Alan

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> to CD-RW disks without having to know about things like "ide-scsi" etc,
> and do it sanely over different bus architectures etc.
>
> The scheduler simply isn't that important.

The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
That isn't going to go away by sticking heads in sand.

2001-12-18 14:21:36

by Adam Schrotenboer

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Monday 17 December 2001 23:55, William Lee Irwin III wrote:
> On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
> > The most likely cause is simply waking up after each sound interrupt: you
> > also have a _lot_ of time handling interrupts. Quite frankly, web surfing
> > and mp3 playing simply shouldn't use any noticeable amounts of CPU.
>
> I think we have a winner:
> /proc/interrupts
> ------------------------------------------------
> CPU0
> 0: 17321824 XT-PIC timer
> 1: 4 XT-PIC keyboard
> 2: 0 XT-PIC cascade
> 5: 46490271 XT-PIC soundblaster
> 9: 400232 XT-PIC usb-ohci, eth0, eth1
> 11: 939150 XT-PIC aic7xxx, aic7xxx
> 14: 13 XT-PIC ide0
>
> Approximately 4 times more often than the timer interrupt.
> That's not nice...

FWIW, I have an ES1371 based sound card, and mpg123 drives it at 172
interrupts/sec (calculated in procinfo). But that _is_ only when playing. And
(my slightly hacked) timidity drives my card w/ only 23(@48kHz sample rate;
21 @ 44.1kHz) interrupts/sec

Is this 172 figure right? (Not through esd either. i almost always turn it
off, and sp recompiled mpg123 to use the std OSS driver)

>
> On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
> > Which sound driver are you using, just in case this _is_ the reason?
>
> SoundBlaster 16
> A change of hardware should help verify this.
>
>
> Cheers,
> Bill
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2001-12-18 15:35:25

by Alan

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> Well, looking at the issue, the problem is probably not just in the sb
> driver: the soundblaster driver shares the output buffer code with a
> number of other drivers (there's some horrible "dmabuf.c" code in common).

The sb driver is fine

> A number of sound drivers will use the same logic.

Most hardware does

> Quite frankly I don't know the sound infrastructure well enough to make
> any more intelligent suggestions about other decoders or similar to try,
> at this point I just start blathering.

some of the sound stuff uses very short fragments to get accurate
audio/video synchronization. Some apps also do it gratuitously when they
should be using other API's. Its also used sensibly for things like
gnome-meeting where its worth trading CPU for latency because 1K of
buffering starts giving you earth<->moon type conversations

> But yes, I bet you'll also see much less impact of this if you were to
> switch to more modern hardware.

Not really - the app asked for an event every 32 bytes. This is an app not
kernel problem.

> at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer empties
> in less than 1/100th of a second, but at least it should be < 200 irqs/sec
> rather than >400).

With a few exceptions the applications tend to use 4K or larger DMA chunks
anyway. Very few need tiny chunks.

Alan

2001-12-18 15:36:35

by Alan

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> I have a feeling that this discussion got sidetracked: cpu cycles burnt
> in the scheduler indeed is non-issue, but big tasks being needlessly moved
> around on SMPs is worth tackling.]

Its not a non issue - 40% of an 8 way box is a lot of lost CPU. Fixing the
CPU bounce around problem also matters a lot - Ingo's speedups seen just by
improving that on the current scheduler show its worth the work


2001-12-18 15:54:35

by Martin Josefsson

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:

>
> On Mon, 17 Dec 2001, William Lee Irwin III wrote:
> >
> > 5: 46490271 XT-PIC soundblaster
> >
> > Approximately 4 times more often than the timer interrupt.
> > That's not nice...

0: 24867181 XT-PIC timer
5: 9070614 XT-PIC soundblaster

After I bootup I start X and then xmms and then my system plays mp3's
almost all the time.

> > > Which sound driver are you using, just in case this _is_ the reason?
> >
> > SoundBlaster 16

I have an old ISA SoundBlaster 16

> Raising that min_fragment thing from 5 to 10 would make the minimum DMA
> buffer go from 32 bytes to 1kB, which is a _lot_ more reasonable (what,
> at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer empties
> in less than 1/100th of a second, but at least it should be < 200 irqs/sec
> rather than >400).

After watchning /proc/interrupts with 30 second intervals I see that I
only get 43 interrupts/second when playing 16bit 44.1kHz stereo.

And according to vmstat I have 153-158 interrupts/second in total
(it's probably the networktraffic that increases it a little above 143).

/Martin

2001-12-18 16:19:46

by Roger Larsson

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

This might be of interest on linux-audio-dev too...

On Tuesday den 18 December 2001 07.09, Linus Torvalds wrote:
> On Mon, 17 Dec 2001, William Lee Irwin III wrote:
> > 5: 46490271 XT-PIC soundblaster
> >
> > Approximately 4 times more often than the timer interrupt.
> > That's not nice...
>
> Yeah.
>
> Well, looking at the issue, the problem is probably not just in the sb
> driver: the soundblaster driver shares the output buffer code with a
> number of other drivers (there's some horrible "dmabuf.c" code in common).
>
> And yes, the dmabuf code will wake up the writer on every single DMA
> complete interrupt. Considering that you seem to have them at least 400
> times a second (and probably more, unless you've literally had sound going
> since the machine was booted), I think we know why your setup spends time
> in the scheduler.
>
> > On Mon, Dec 17, 2001 at 08:27:18PM -0800, Linus Torvalds wrote:
> > > Which sound driver are you using, just in case this _is_ the reason?
> >
> > SoundBlaster 16
> > A change of hardware should help verify this.
>
> A number of sound drivers will use the same logic.
>
> You may be able to change this more easily some other way, by using a
> larger fragment size for example. That's up to the sw that actually feeds
> the sound stream, so it might be your decoder that selects a small
> fragment size.
>
> Quite frankly I don't know the sound infrastructure well enough to make
> any more intelligent suggestions about other decoders or similar to try,
> at this point I just start blathering.
>
> But yes, I bet you'll also see much less impact of this if you were to
> switch to more modern hardware.
>
> grep grep grep.. Oh, before you do that, how about changing "min_fragment"
> in sb_audio.c from 5 to something bigger like 9 or 10?
>
> That
>
> audio_devs[devc->dev]->min_fragment = 5;
>
> literally means that your minimum fragment size seems to be a rather
> pathetic 32 bytes (which doesn't mean that your sound will be set to that,
> but it _might_ be). That sounds totally ridiculous, but maybe I've
> misunderstood the code.

I think it really is 32 samples, yes that is little - but too small?
It depends on the used sample frequency...

Paul Davis wrote this on linux-audio-dev 2001-12-05
"in doing lots of testing on JACK, i've noticed that although the
trident driver now works (there were some patches from jaroslav and
myself), in general i still get xruns with the lowest possible latency
setting for that card (1.3msec per interrupt, 2.6msec buffer). with
the same settings on my hammerfall, i don't get xruns, even with
substantial system load."

>
> Jeff, you've worked on the sb code at some point - does it really do
> 32-byte sound fragments? Why? That sounds truly insane if I really parsed
> that code correctly. That's thousands of separate DMA transfers
> and interrupts per second..
>

Lets see: we have >1 GHz CPU and interrupts at >1000 Hz
=> 1 Mcycle / interrupt - is that insane?

If the hardware can support it? Why not let it? It is really up to the
applications/user to decide...

> Raising that min_fragment thing from 5 to 10 would make the minimum DMA
> buffer go from 32 bytes to 1kB, which is a _lot_ more reasonable (what,
> at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer empties
> in less than 1/100th of a second, but at least it should be < 200 irqs/sec
> rather than >400).
>

Yes, it is probably more reasonable - but if the soundcard can support it?
(I have a vision of lots of linux-audio-dev folks pulling out their new
soundcard and replacing it with their since long forgotten SB16...)

/RogerL

--
Roger Larsson
Skellefte?
Sweden

2001-12-18 16:52:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Benjamin LaHaise wrote:
> On Mon, Dec 17, 2001 at 10:10:30PM -0800, Linus Torvalds wrote:
> > > Well, we've got serious chicken and egg problems then.
> >
> > Why?
>
> The code can't go into glibc without syscall numbers being reserved.

It sure as hell can.

And I'll bet $5 USD that glibc wouldn't take the patches anyway before
the kernel interfaces are _tested_.

> I've posted the code, there are people playing with it. I can't make them
> comment.

Well, if people aren't interested, then it doesn't _ever_ go in.

Remember: we do not add features just because we can.

Quite frankly, I don't think you've told that many people. I haven't seen
any discussion about the aio stuff on linux-kernel, which may be because
you posted several announcements and nobody cared, or it may be that
you've only mentioned it fleetingly and people didn't notice.

Take a look at how long it took for ext3 to be "standard" - I put them in
my tree when I started getting real feedback that it was used and people
liked using it. I simply do not like applying patches "just to get users".
Not even reservations - because I reserve the right to _never_ apply
something if critical review ends up saying that "that doesn't make
sense".

Quite frankly, the fact that it is being tested out at places like Oracle
etc is secondary - those people will use anything. That's proven by
history. That doesn't mean that _I_ accept anything.

Now, the fact that I like the interfaces is actually secondary - it does
make me much more likely to include it even in a half-baked thing, but it
does NOT mean that I trust my own taste so much that I'd do it "under the
covers" with little open discussion, use and modification.

Where _is_ the discussion on linux-kernel?

Where are the negative comments from Al? (Al _always_ has negative
comments and suggestions for improvements, don't try to say that he also
liked it unconditionally ;)

Linus

2001-12-18 16:53:00

by Mike Kravetz

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, Dec 18, 2001 at 02:09:16PM +0000, Alan Cox wrote:
> The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
> That isn't going to go away by sticking heads in sand.

Can you be more specific as to the workload you are referring to?
As someone who has been playing with the scheduler for a while,
I am interested in all such workloads.

--
Mike

2001-12-18 16:56:52

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Linus Torvalds wrote:

> Where _is_ the discussion on linux-kernel?

Which mailing lists do you want to be subscribed to ? ;)

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 17:02:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Alan Cox wrote:
>
> The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
> That isn't going to go away by sticking heads in sand.

Did you _read_ what I said?

We _have_ patches. You apparently have your own set.

Fight it out. Don't involve me, because I don't think it's even a
challenging thing. I wrote what is _still_ largely the algorithm in 1991,
and it's damn near the only piece of code from back then that even _has_
some similarity to the original code still. All the "recompute count when
everybody has gone down to zero" was there pretty much from day 1 (*).

Which makes me say: "oh, a quick hack from 1991 works on most machines in
2001, so how hard a problem can it be?"

Fight it out. People asked whether I was interested, and I said "no". Take
a clue: do benchmarks on all the competing patches, and try to create the
best one, and present it to me as a done deal.

Linus

(*) The single biggest change from day 1 is that it used to iterate over a
global array of process slots, and for scalability reasons (not CPU
scalability, but "max nr of processes in the system" scalability) the
array was gotten rid of, giving the current doubly linked list. Everything
else that any scheduler person complains about was pretty much there
otherwise ;)

2001-12-18 17:10:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Martin Josefsson wrote:
>
> After watchning /proc/interrupts with 30 second intervals I see that I
> only get 43 interrupts/second when playing 16bit 44.1kHz stereo.

That's _exactly_ what you get with a 4kB fragment size.

You have a sane player that asks for a sane fragment size. While whatever
William uses seems to ask for a really small one..

Linus

2001-12-18 17:09:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Alan Cox wrote:
>
> > at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer empties
> > in less than 1/100th of a second, but at least it should be < 200 irqs/sec
> > rather than >400).
>
> With a few exceptions the applications tend to use 4K or larger DMA chunks
> anyway. Very few need tiny chunks.

Doing another grep seems to imply that none of the other drivers even
allow as small chunks as the sb driver does, 32 byte "events" is just
ridiculous. At simple 2-channel, 16-bits, CD-quality sound, that's a DMA
event every 0.18 msec (5500 times a second, 181 _micro_seconds appart).

I obviously agree that the app shouldn't even ask for small chunks:
whether a mp3 player reacts within 1/10th or 1/1000th of a second of the
user asking it to switch tracks, nobody can even tell. So an mp3 player
should probably use a big fragment size on the order of 4kB or similar
(that still gives max fragment latency of 0.022 seconds, faster than
humans can react).

So it sounds like a player sillyness, but I don't think the driver should
even allow such waste of resources, considering that no other driver
allows it either..

Linus

2001-12-18 17:16:51

by John Heil

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Alan Cox wrote:

> Date: Tue, 18 Dec 2001 14:09:16 +0000 (GMT)
> From: Alan Cox <[email protected]>
> To: Linus Torvalds <[email protected]>
> Cc: Rik van Riel <[email protected]>,
> Davide Libenzi <[email protected]>,
> Kernel Mailing List <[email protected]>
> Subject: Re: Scheduler ( was: Just a second ) ...
>
> > to CD-RW disks without having to know about things like "ide-scsi" etc,
> > and do it sanely over different bus architectures etc.
> >
> > The scheduler simply isn't that important.
>
> The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
> That isn't going to go away by sticking heads in sand.

What % of a std 2 cpu, do you think it eats?

> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

-
-----------------------------------------------------------------
John Heil
South Coast Software
Custom systems software for UNIX and IBM MVS mainframes
1-714-774-6952
[email protected]
http://www.sc-software.com
-----------------------------------------------------------------

2001-12-18 17:13:40

by Herman Oosthuysen

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

My tuppence worth from a real-time embedded perspective:
A shorter time slice and other real-time improvements to the scheduler will
certainly improve life to the embedded crowd. Bear in mind that 90% of
processors are used for embedded apps. Shorter time slices etc. means
smaller buffers, less RAM and lower cost.

I don't know what the current distribution is for Linux regarding embedded
vs data processing, but the embedded use of Linux is certainly growing
rapidly - we expect to make a million thingummyjigs running Linux next year
and there are many other companies doing the same. Within the next few
years, I expect embedded use of Linux to overshadow data use by a large
margin.

Since embedded processors are 'invisible' and never in the news, I would be
very happy if Linus and others will keep us poor boys in mind...
--
Herman Oosthuysen
[email protected]
Suite 300, #3016, 5th Ave NE,
Calgary, Alberta, T2A 6K4, Canada
Phone: (403) 569-5688, Fax: (403) 235-3965
----- Original Message ----- >
> Lets see: we have >1 GHz CPU and interrupts at >1000 Hz
> => 1 Mcycle / interrupt - is that insane?
>
> If the hardware can support it? Why not let it? It is really up to the
> applications/user to decide...
>
> > Raising that min_fragment thing from 5 to 10 would make the minimum DMA
> > buffer go from 32 bytes to 1kB, which is a _lot_ more reasonable (what,
> > at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer
empties
> > in less than 1/100th of a second, but at least it should be < 200
irqs/sec
> > rather than >400).
> >
>
> /RogerL
>
> --
> Roger Larsson
> Skellefte?
> Sweden


2001-12-18 17:18:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Roger Larsson wrote:
>
> Lets see: we have >1 GHz CPU and interrupts at >1000 Hz
> => 1 Mcycle / interrupt - is that insane?

Ehh.. First off, the CPU may be 1GHz, but the memory subsystem, and the
PCI subsystem definitely are _not_. Most PCI cards still run at a
(comparatively) leisurely 33MHz, and when we're talking about audio, we're
talking about actually having to _access_ that audio device.

Yes. At 33MHz, not at 1GHz.

Also, at 32-byte fragments, the frequency is actually 5.5kHz, not 1kHz.
Now, I seriously doubt the mp3-player actually used 32-byte fragments (it
probably just asked for something small, and got it), but let's say it
asked for something in the kHz range (ie 256-512 byte frags). That does
_not_ equate to "1 Mcycle". It equates to 33 _kilocycles_ in PCI-land, and
a PCI read will take several cycles.

> If the hardware can support it? Why not let it? It is really up to the
> applications/user to decide...

Well, this particular user was unhappy with the CPU spending a noticeably
amount of time on just web-surfing and mp3-playing.

So clearly the _user_ didn't ask for it.

And I suspect that the app writer just didn't even realize what he did. He
may have used another sound card that didn't even allow small fragments.

Linus

2001-12-18 17:22:20

by David Mansfield

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

>
> audio_devs[devc->dev]->min_fragment = 5;
>

Generally speaking, you want to be able to specify about a 1ms fragment,
speaking as a realtime audio programmer (no offense Victor...). However,
1ms is 128 bytes at 16bit stereo, but only 32 bytes at 8bit mono. Nobody
does 8bit mono, but that's probably why it's there. A lot of drivers seem
to have 128 byte as minimum fragment size. Even the high end stuff like
the RME hammerfall only go down to 64 byte fragment PER CHANNEL, which is
the same as 128 bytes for stereo in the SB 16.

> Raising that min_fragment thing from 5 to 10 would make the minimum DMA
> buffer go from 32 bytes to 1kB, which is a _lot_ more reasonable (what,
> at 2*2 bytes per sample and 44kHz would mean that a 1kB DMA buffer empties
> in less than 1/100th of a second, but at least it should be < 200 irqs/sec
> rather than >400).

Note that the ALSA drivers allow the app to set watermarks for wakeup,
while allowing flexibility in fragment size and number. You can
essentially say, wake me up when there are at least n fragments empty, and
put me to sleep if m fragments are full.

David

--
/==============================\
| David Mansfield |
| [email protected] |
\==============================/

2001-12-18 17:24:00

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Mike Kravetz wrote:
> On Tue, Dec 18, 2001 at 02:09:16PM +0000, Alan Cox wrote:
> > The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
> > That isn't going to go away by sticking heads in sand.
>
> Can you be more specific as to the workload you are referring to?
> As someone who has been playing with the scheduler for a while,
> I am interested in all such workloads.

Well, careful: depending on what "%" means, a 8-cpu machine has either
"100% max" or "800% max".

So are we talking about "we spend 40-60% of all CPU cycles in the
scheduler" or are we talking about "we spend 40-60% of the CPU power of
_one_ CPU out of 8 in the scheduler".

Yes, 40-60% sounds like a lot ("Wow! About half the time is spent in the
scheduler"), but I bet it's 40-60% of _one_ CPU, which really translates
to "The worst scheduler case I've ever seen under a real load spent 5-8%
of the machine CPU resources on scheduling".

And let's face it, 5-8% is bad, but we're not talking "half the CPU power"
here.

Linus

2001-12-18 17:20:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Rik van Riel wrote:
> On Tue, 18 Dec 2001, Linus Torvalds wrote:
>
> > Where _is_ the discussion on linux-kernel?
>
> Which mailing lists do you want to be subscribed to ? ;)

I'm not subscribed to any, thank you very much. I read them through a news
gateway, which gives me access to the common ones.

And if the discussion wasn't on the common ones, then it wasn't an open
discussion.

And no, I don't think IRC counts either, sorry.

Linus

2001-12-18 17:29:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, David Mansfield wrote:
> >
> > audio_devs[devc->dev]->min_fragment = 5;
> >
>
> Generally speaking, you want to be able to specify about a 1ms fragment,
> speaking as a realtime audio programmer (no offense Victor...). However,
> 1ms is 128 bytes at 16bit stereo, but only 32 bytes at 8bit mono. Nobody
> does 8bit mono, but that's probably why it's there. A lot of drivers seem
> to have 128 byte as minimum fragment size.

Good point.

Somebody should really look at "dma_set_fragment", and see whether we can
make "min_fragment" be really just a hardware minimum chunk size, but use
other heuristics like frequency to cut off the minimum size (ie just do
something like

/* We want to limit it to 1024 Hz */
min_bytes = freq*channel*bytes_per_channel >> 10;

Although I'm not sure we _have_ the frequency at that point: somebody
might set the fragment size first, and the frequency later.

Maybe the best thing to do is to educate the people who write the sound
apps for Linux (somebody was complaining about "esd" triggering this, for
example).

Linus

2001-12-18 17:48:31

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Linus Torvalds wrote:

>
> On Tue, 18 Dec 2001, Mike Kravetz wrote:
> > On Tue, Dec 18, 2001 at 02:09:16PM +0000, Alan Cox wrote:
> > > The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
> > > That isn't going to go away by sticking heads in sand.
> >
> > Can you be more specific as to the workload you are referring to?
> > As someone who has been playing with the scheduler for a while,
> > I am interested in all such workloads.
>
> Well, careful: depending on what "%" means, a 8-cpu machine has either
> "100% max" or "800% max".
>
> So are we talking about "we spend 40-60% of all CPU cycles in the
> scheduler" or are we talking about "we spend 40-60% of the CPU power of
> _one_ CPU out of 8 in the scheduler".
>
> Yes, 40-60% sounds like a lot ("Wow! About half the time is spent in the
> scheduler"), but I bet it's 40-60% of _one_ CPU, which really translates
> to "The worst scheduler case I've ever seen under a real load spent 5-8%
> of the machine CPU resources on scheduling".
>
> And let's face it, 5-8% is bad, but we're not talking "half the CPU power"
> here.

Linus, you're plain right that we can spend days debating about the
scheduler load.
You've to agree that sharing a single lock/queue for multiple CPU is,
let's say, quite crappy.
You agreed that the scheduler is easy and the fix should not take that
much time.
You said that you're going to accept the solution that is coming out from
the mailing list.
Why don't we start talking about some solution and code ?
Starting from a basic architecture down to the implementation.
Alan and Rik are quite "unloaded" now, what do You think ?



- Davide


2001-12-18 17:53:11

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Linus Torvalds wrote:

> Quite frankly, I don't think you've told that many people. I haven't seen
> any discussion about the aio stuff on linux-kernel, which may be because
> you posted several announcements and nobody cared, or it may be that
> you've only mentioned it fleetingly and people didn't notice.

This is not to ask the inclusion of /dev/epoll inside the kernel ( it can
be easily merged by users that want to use it ) but i've had its users to
prefer talking about that out of the mailing list. Maybe because they're
scared to be eaten by some gurus when asking easy questions :)




- Davide


2001-12-18 17:56:31

by Andreas Dilger

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Dec 18, 2001 09:27 -0800, Linus Torvalds wrote:
> Maybe the best thing to do is to educate the people who write the sound
> apps for Linux (somebody was complaining about "esd" triggering this, for
> example).

Yes, esd is an interrupt hog, it seems. When reading this thread, I
checked, and sure enough I was getting 190 interrupts/sec on the
sound card while not playing any sound. I killed esd (which I don't
use anyways), and interrupts went to 0/sec when not playing sound.
Still at 190/sec when using mpg123 on my ymfpci (Yamaha YMF744B DS-1S)
sound card.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-12-18 18:04:21

by Daniel Egger

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On 18 Dec, Alan Cox wrote:

> The scheduler is eating 40-60% of the machine on real world 8 cpu
> workloads. That isn't going to go away by sticking heads in sand.

What about a CONFIG_8WAY which, if set, activates a scheduler that
performs better on such nontypical machines? I see and understand
boths sides arguments yet I fail to see where the real problem is
with having a scheduler that just kicks in _iff_ we're running the
kernel on a nontypical kind of machine.
This would keep the straigtforward scheduler Linus is defending
for the single processor machines while providing more performance
to heavy SMP machines by having a more complex scheduler better suited
for this task.

--
Servus,
Daniel

2001-12-18 18:11:31

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, 17 Dec 2001, Linus Torvalds wrote:

> The most likely cause is simply waking up after each sound interrupt: you
> also have a _lot_ of time handling interrupts. Quite frankly, web surfing
> and mp3 playing simply shouldn't use any noticeable amounts of CPU.

It must be noted that wking up a task is going to take two lock operations
( and two unlock ), one in try_to_wakeup() and the other one in schedule().
This double the frequency seen by the lock.



- Davide


2001-12-18 18:25:53

by William Lee Irwin III

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Mon, Dec 17, 2001 at 10:09:22PM -0800, Linus Torvalds wrote:
> Well, looking at the issue, the problem is probably not just in the sb
> driver: the soundblaster driver shares the output buffer code with a
> number of other drivers (there's some horrible "dmabuf.c" code in common).
> And yes, the dmabuf code will wake up the writer on every single DMA
> complete interrupt. Considering that you seem to have them at least 400
> times a second (and probably more, unless you've literally had sound going
> since the machine was booted), I think we know why your setup spends time
> in the scheduler.
> A number of sound drivers will use the same logic.

I've chucked the sb32 and plugged in the emu10k1 I had been planning
to install for a while, to good effect. It's not an ISA sb16, but it
apparently uses the same driver.

I'm getting an overall 1% reduction in system load, and the following
"top 5" profile:

53374 total 0.0400
11430 default_idle 238.1250
8820 handle_IRQ_event 91.8750
2186 do_softirq 10.5096
1984 schedule 1.2525
1612 number 1.4816
1473 __generic_copy_to_user 18.4125

Oddly, I'm getting even more interrupts than I saw with the sb32...

0: 2752924 XT-PIC timer
9: 14223905 XT-PIC EMU10K1, eth1

(eth1 generates orders of magnitude fewer interrupts than the timer)

On Mon, Dec 17, 2001 at 10:09:22PM -0800, Linus Torvalds wrote:
> You may be able to change this more easily some other way, by using a
> larger fragment size for example. That's up to the sw that actually feeds
> the sound stream, so it might be your decoder that selects a small
> fragment size.
> Quite frankly I don't know the sound infrastructure well enough to make
> any more intelligent suggestions about other decoders or similar to try,
> at this point I just start blathering.

Already more insight into the problem I was experiencing than I had
before, and I must confess to those such as myself this lead certainly
seems "plucked out of the air". Good work! =)

On Mon, Dec 17, 2001 at 10:09:22PM -0800, Linus Torvalds wrote:
> But yes, I bet you'll also see much less impact of this if you were to
> switch to more modern hardware.

I hear from elsewhere the emu10k1 has a bad reputation as source of
excessive interrupts. Looks like I bought the wrong sound card(s).
Maybe I should go shopping. =)


Thanks a bunch!
Bill

2001-12-18 18:28:03

by Doug Ledford

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Andreas Dilger wrote:

> On Dec 18, 2001 09:27 -0800, Linus Torvalds wrote:
>
>>Maybe the best thing to do is to educate the people who write the sound
>>apps for Linux (somebody was complaining about "esd" triggering this, for
>>example).
>>
>
> Yes, esd is an interrupt hog, it seems. When reading this thread, I
> checked, and sure enough I was getting 190 interrupts/sec on the
> sound card while not playing any sound. I killed esd (which I don't
> use anyways), and interrupts went to 0/sec when not playing sound.
> Still at 190/sec when using mpg123 on my ymfpci (Yamaha YMF744B DS-1S)
> sound card.


Weel, evidently esd and artsd both do this (well, I assume esd does now, it
didn't do this in the past). Basically, they both transmit silence over the
sound chip when nothing else is going on. So even though you don't hear
anything, the same sound output DMA is taking place. That avoids things
like nasty pops when you start up the sound hardware for a beep and that
sort of thing. It also maintains state where as dropping output entirely
could result in things like module auto unloading and then reloading on the
next beep, etc. Personally, the interrupt count and overhead annoyed me
enough that when I started hacking on the i810 sound driver one of my
primary goals was to get overhead and interrupt count down. I think I
suceeded quite well. On my current workstation:

Context switches per second not playing any sound: 8300 - 8800
Context switches per second playing an MP3: 9200 - 9900
Interrupts per second from sound device: 86
%CPU used when not playing MP3: 0 - 3% (magicdev is a CPU pig once every 2
seconds)
%CPU used when playing MP3s: 0 - 4%

In any case, it might be worth the original poster's time in figuring out
just how much of his lost CPU is because of playing sound and how much is
actually caused by the windowing system and all the associated bloat that
comes with it now a days.





--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-12-18 18:37:43

by Mike Kravetz

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, Dec 18, 2001 at 04:34:57PM +0100, [email protected] wrote:
> What about a CONFIG_8WAY which, if set, activates a scheduler that
> performs better on such nontypical machines?

I'm pretty sure that we can create a scheduler that works well on
an 8-way, and works just as well as the current scheduler on a UP
machine. There is already a CONFIG_SMP which is all that should
be necessary to distinguish between the two.

What may be of more concern is support for different architectures
such as HMT and NUMA. What about better scheduler support for
people working in the RT embedded space? Each of these seem to
have different scheduling requirements. Do people working on these
'non-typical' machines need to create their own scheduler patches?
OR is there some 'clean' way to incorporate them into the source
tree?

--
Mike

2001-12-18 18:47:53

by Davide Libenzi

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001 [email protected] wrote:

> On 18 Dec, Alan Cox wrote:
>
> > The scheduler is eating 40-60% of the machine on real world 8 cpu
> > workloads. That isn't going to go away by sticking heads in sand.
>
> What about a CONFIG_8WAY which, if set, activates a scheduler that
> performs better on such nontypical machines? I see and understand
> boths sides arguments yet I fail to see where the real problem is
> with having a scheduler that just kicks in _iff_ we're running the
> kernel on a nontypical kind of machine.
> This would keep the straigtforward scheduler Linus is defending
> for the single processor machines while providing more performance
> to heavy SMP machines by having a more complex scheduler better suited
> for this task.

By using a multi queue scheduler with global balancing policy you can keep
the core scheduler as is and have the balancing code to take care of
distributing the load.
Obviously that code is under CONFIG_SMP, so it's not even compiled in UP.
In this way you've the same scheduler code running independently with a
lower load on the run queue and an high locality of locking.




- Davide


2001-12-18 18:52:44

by Alan

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> Maybe the best thing to do is to educate the people who write the sound
> apps for Linux (somebody was complaining about "esd" triggering this, for
> example).

esd is a culprit, and artsd to an extent. esd is scheduled to die so artsd
is the big one to tidy. Kernel side OSS is dead so its a matter for ALSA

2001-12-18 18:55:45

by Andreas Dilger

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Dec 18, 2001 13:27 -0500, Doug Ledford wrote:
> Andreas Dilger wrote:
> > Yes, esd is an interrupt hog, it seems. When reading this thread, I
> > checked, and sure enough I was getting 190 interrupts/sec on the
> > sound card while not playing any sound. I killed esd (which I don't
> > use anyways), and interrupts went to 0/sec when not playing sound.
> > Still at 190/sec when using mpg123 on my ymfpci (Yamaha YMF744B DS-1S)
> > sound card.
>
> Weel, evidently esd and artsd both do this (well, I assume esd does now, it
> didn't do this in the past). Basically, they both transmit silence over the
> sound chip when nothing else is going on. So even though you don't hear
> anything, the same sound output DMA is taking place. That avoids things
> like nasty pops when you start up the sound hardware for a beep and that
> sort of thing.

Hmm, I _do_ notice a pop when the sound hardware is first initialized at
boot time, but not when mpg123 starts/stops (without esd running) so I
personally don't get any benefit from "the sound of silence". That said,
asside from the 190 interrupts/sec from esd, it doesn't appear to use any
measurable CPU time by itself.

> Context switches per second not playing any sound: 8300 - 8800
> Context switches per second playing an MP3: 9200 - 9900

Hmm, something seems very strange there. On an idle system, I get about
100 context switches/sec, and about 150/sec when playing sound (up to 400/sec
when moving the mouse between windows). 9000 cswitches/sec is _very_ high.
This is with a text-only player which has screen output (other than the
ID3 info from the currently played song).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-12-18 19:11:34

by Alan

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> I'm not subscribed to any, thank you very much. I read them through a news
> gateway, which gives me access to the common ones.
>
> And if the discussion wasn't on the common ones, then it wasn't an open
> discussion.

If the discussion was on the l/k list then most kernel developers arent
going to read it because tey dont have time to wade through all the crap
that doesnt matter to them.

> And no, I don't think IRC counts either, sorry.

IRC is where most stuff, especially cross vendor stuff is initially
discussed nowdays, along with kernelnewbies where most of the intro
stuff is - but thats disussed rather than formally proposed and studied

2001-12-18 19:06:56

by Doug Ledford

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Andreas Dilger wrote:


> Hmm, I _do_ notice a pop when the sound hardware is first initialized at
> boot time, but not when mpg123 starts/stops (without esd running) so I
> personally don't get any benefit from "the sound of silence". That said,
> asside from the 190 interrupts/sec from esd, it doesn't appear to use any
> measurable CPU time by itself.
>
>
>>Context switches per second not playing any sound: 8300 - 8800
>>Context switches per second playing an MP3: 9200 - 9900
>>
>
> Hmm, something seems very strange there. On an idle system, I get about
> 100 context switches/sec, and about 150/sec when playing sound (up to 400/sec
> when moving the mouse between windows). 9000 cswitches/sec is _very_ high.
> This is with a text-only player which has screen output (other than the
> ID3 info from the currently played song).


I haven't taken the time to track down what's causing all the context
switches, but on my system they are indeed "normal". I suspect large
numbers of them are a result of interactions between gnome, nautilus, X,
xmms, esd, and gnome-xmms. However, I did just track down one reason for
it. It's not 8300 - 8800, its 830 - 880. There appears to be a bug in the
procinfo -n1 mode that results in an extra digit getting tacked onto the end
of the context switch line. So, take my original numbers and lop off the
last digit from the context switch numbers and that's more like what the
machine is actually doing.





--

Doug Ledford <[email protected]> http://people.redhat.com/dledford
Please check my web site for aic7xxx updates/answers before
e-mailing me about problems

2001-12-18 19:11:34

by Alan

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> > The scheduler is eating 40-60% of the machine on real world 8 cpu workloads.
> > That isn't going to go away by sticking heads in sand.
>
> Did you _read_ what I said?
>
> We _have_ patches. You apparently have your own set.

I did read that mail - but somewhat later. Right now Im scanning l/k
every few days no more.

As to my stuff - everything I propose different to ibm/davide is about
cost/speed of ordering or minor optimisations. I don't plan to compete and
duplicate work

2001-12-18 19:18:57

by Mike Galbraith

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Linus Torvalds wrote:

> And no, I don't think IRC counts either, sorry.

Well yeah.. it's synchronous IO :)

-Mike

2001-12-18 19:17:33

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Linus Torvalds wrote:

> And no, I don't think IRC counts either, sorry.

Whether you think it counts or not, IRC is where
most stuff is happening nowadays.

cheers,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 18:37:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Tue, 18 Dec 2001, Andreas Dilger wrote:
>
> Yes, esd is an interrupt hog, it seems. When reading this thread, I
> checked, and sure enough I was getting 190 interrupts/sec on the
> sound card while not playing any sound. I killed esd (which I don't
> use anyways), and interrupts went to 0/sec when not playing sound.
> Still at 190/sec when using mpg123 on my ymfpci (Yamaha YMF744B DS-1S)
> sound card.

190 interrupts / sec sounds excessive, but not wildly so. The interrupt
per se is not going to be a CPU hog unless the sound card does programmed
IO to fill the data queues, and while that is not unheard of, I don't
think such a card has been made in the last five years.

Obviously getting 190 irq's per second even when not actually _doing_
anything is a total waste of CPU, and is bad form. There may be some
reason why esd does it, most probably for good synchronization between
sound events and to avoid popping when the sound is shut down (many sound
drivers seem to pop a bit on open/close, possibly due to driver bugs, but
possibly because some hard-to-avoid-programmatically hardware glitch when
powering down the logic.

So waiting a while with the driver active may actually be a reasonable
thing to do, although I suspect that after long sequences of silence "esd"
should really shut down for a while (and "long" here is probably on the
order of seconds, not minutes).

What probably _really_ ends up hurting performance is probably not the
interrupt per se (although it is noticeable), but the fact that we wake up
and cause a schedule - which often blows any CPU caches, making the _next_
interrupt also be more expensive than it would possibly need to be.

The code for that (in the case of drivers that use the generic "dmabuf.c"
infrastructure) seems to be in "finish_output_interrupt()", and I suspect
that it could be improved with something like

dmap = adev->dmap_out;
lim = dmap->nbufs;
if (lim < 2) lim = 2;
if (dmap->qlen <= lim/2) {
...
}

around the current unconditional wakeups.

Yeah, yeah, untested, stupid example, the idea being that we only wake up
if we have at least half the frags free now, instead of waking up for
_every_ fragment that free's up.

The above is just as a suggestion for some testing, if somebody actually
feels like trying it out. It probably won't be good as-is, but as a
starting point..

Linus

2001-12-18 19:46:26

by Alexander Viro

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...



On Tue, 18 Dec 2001, Linus Torvalds wrote:

> Where are the negative comments from Al? (Al _always_ has negative
> comments and suggestions for improvements, don't try to say that he also
> liked it unconditionally ;)

Heh.

Aside of a _big_ problem with exposing async API to userland (for a
lot of reasons, including usual quality of async code in general and
event-drivel one in particular) there is more specific one - Ben's
long-promised full-async writepage() and friends. I'll believe it
when I see it and so far it didn't appear.

So for the time being I'm staying the fsck out of that - I don't like
it, but I'm sick and tired of this sort of religious wars.

2001-12-18 20:01:55

by Gerd Knorr

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

> Kernel side OSS is dead

What do you mean with "Kernel side OSS"? Only Hannu's OSS/free drivers?
Or all current kernel drivers which support the OSS API, including most
(all?) PCI sound drivers which don't use any old OSS/free code?

Gerd

--
#define ENOCLUE 125 /* userland programmer induced race condition */

2001-12-18 20:37:53

by Ingo Molnar

[permalink] [raw]
Subject: Re: in defense of the linux-kernel mailing list


On Tue, 18 Dec 2001, Rik van Riel wrote:

> > And no, I don't think IRC counts either, sorry.
>
> Whether you think it counts or not, IRC is where most stuff is
> happening nowadays.

most of the useful traffic on lkml cannot be expressed well on IRC. While
IRC might be useful as an additional form of communication channel, email
lists IMO should still be the main driving force of Linux kernel
development, else we'll only concentrate on those minute ideas that can be
expressed in 1-2 lines on irc and which are simple enough to be understood
until the next message comes. Also, the lack of reliable archiving of IRC
traffic prevents newcomers of reproducing the thought process afterwards.
While IRC might result in the seasoned kernel developer doing the next
super-patch quickly, it will in the end effect only isolate and alienate
newcomers and will only result in an aging, personality-driven elitist
old-boys network and a dying OS.

Regarding the use of IRC as the main development medium for the Linux
kernel - the fast pace of IRC often prevents deeper thoughts - while this
is definitely the point for many people who use IRC, it cannot result in a
much better kernel. [that having said, i'm using irc on a daily basis as
well so this is not irc-bashing, but i rarely use it for development
purposes.]

It's true that reading off-topic emails on lkml isnt a wise use of
developer powers either, but this has to be taken into account just like
spam - it's the price of having an open forum.

and honestly, much of the complaints about lkml's quality are exagerated.
What you dont take into account is the fact that while 3 or 5 years ago
you found perhaps every email on lkml exciting and challenging, today you
are an experienced kernel hacker and find perhaps 90% of the traffic
'boring'. I've just done a test - and perhaps i picked the wrong set of
emails - but the majority of lkml traffic is pretty legitimate, and i
would have found most of them 'interesting and exciting' just 5 years ago.
Today i know what they mean and might find them less challenging to
understand - but that is one of the bad side-effects of experience.
Today there are more people on lkml, more bugs get reported, and more
patches are discussed - so keeping up with lkml traffic is harder. Perhaps
it might make sense to separate linux-kernel into two lists:
linux-kernel-bugs and linux-kernel-devel (without moderation), but
otherwise the current form and quality of discussions (knock on wood) is
pretty OK i think.

also, more formal emails match actual source code format better than the
informal IRC traffic. So by being kindof forced to structure information
into a larger set of ASCII text, it will also be the first step towards
good kernel code.

(on IRC one might be the super-hacker with a well-known nick, entering and
exiting channels, being talked to by newbies. It might boost one's ego.
But it should not cloud one's judgement.)

Ingo

2001-12-18 21:02:54

by Larry McVoy

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Maybe I'm an old stick in the mud, but IRC seems like a big waste of
time to me. It's perfect for off the cuff answers and fairly useless
for thoughtful answers. We used to write well thought out papers and
specifications for OS work. These days if you can't do it in a paragraph
on IRC it must not be worth doing, eh?

On Tue, Dec 18, 2001 at 07:04:59PM +0000, Alan Cox wrote:
> > I'm not subscribed to any, thank you very much. I read them through a news
> > gateway, which gives me access to the common ones.
> >
> > And if the discussion wasn't on the common ones, then it wasn't an open
> > discussion.
>
> If the discussion was on the l/k list then most kernel developers arent
> going to read it because tey dont have time to wade through all the crap
> that doesnt matter to them.
>
> > And no, I don't think IRC counts either, sorry.
>
> IRC is where most stuff, especially cross vendor stuff is initially
> discussed nowdays, along with kernelnewbies where most of the intro
> stuff is - but thats disussed rather than formally proposed and studied
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-12-18 21:19:26

by Larry McVoy

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, Dec 18, 2001 at 01:14:20PM -0800, David S. Miller wrote:
> From: Larry McVoy <[email protected]>
> Date: Tue, 18 Dec 2001 13:02:28 -0800
>
> Maybe I'm an old stick in the mud, but IRC seems like a big waste of
> time to me.
>
> It's like being at a Linux conference all the time. :-)
>
> It does kind of make sense given that people are so scattered across
> the planet. Sometimes I want to just grill someone on something, and
> email would be too much back and forth, IRC is one way to accomplish
> that.

Let me introduce you to this neat invention called a telephone. It's
the black thing next to your desk, it rings, has buttons. If you push
the right buttons, well, it's magic...

:-)

--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-12-18 21:22:24

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Larry McVoy wrote:
> On Tue, Dec 18, 2001 at 01:14:20PM -0800, David S. Miller wrote:
> > From: Larry McVoy <[email protected]>
> > Date: Tue, 18 Dec 2001 13:02:28 -0800
> >
> > Maybe I'm an old stick in the mud, but IRC seems like a big waste of
> > time to me.
> >
> > It's like being at a Linux conference all the time. :-)
>
> Let me introduce you to this neat invention called a telephone. It's
> the black thing next to your desk, it rings, has buttons. If you push
> the right buttons, well, it's magic...

Yeah, but you can't scroll up a page on the phone...

(also, talking with multiple people at the same time
is kind of annoying in audio, while it's ok on irc)

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 21:17:53

by David Miller

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

From: Larry McVoy <[email protected]>
Date: Tue, 18 Dec 2001 13:02:28 -0800

Maybe I'm an old stick in the mud, but IRC seems like a big waste of
time to me.

It's like being at a Linux conference all the time. :-)

It does kind of make sense given that people are so scattered across
the planet. Sometimes I want to just grill someone on something, and
email would be too much back and forth, IRC is one way to accomplish
that.

2001-12-18 21:20:45

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Larry McVoy wrote:

> Maybe I'm an old stick in the mud, but IRC seems like a big waste of
> time to me. It's perfect for off the cuff answers and fairly useless
> for thoughtful answers. We used to write well thought out papers and
> specifications for OS work. These days if you can't do it in a
> paragraph on IRC it must not be worth doing, eh?

Actually, we tend to use multiple media at the same time.

It happens very often that because of some discussion on
IRC we end up writing up a few paragraphs and sending it
to people by email.

For other things, email is clearly too slow, so stuff is
done on IRC (eg. walking somebody through a piece of code
to identify and agree on a bug).

cheers,

Rik
--
DMCA, SSSCA, W3C? Who cares? http://thefreeworld.net/

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-18 21:33:03

by David Miller

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

From: Larry McVoy <[email protected]>
Date: Tue, 18 Dec 2001 13:17:13 -0800

Let me introduce you to this neat invention called a telephone. It's
the black thing next to your desk, it rings, has buttons. If you push
the right buttons, well, it's magic...

I'm not calling Holland every time I want to poke Jens about
something in a patch we're working on :-)

I hate telephones for technical stuff, because people can call the
fucking thing when I am not behind my computer or even worse when I AM
behind my computer and I want to concentrate on the code on my screen
without being disturbed. With IRC it is MY CHOICE to get involved in
the discussion, I can choose to respond or not respond to someone, I
can choose to be available or not available at any given time. It's
just a real-time version of email. And the "passive, I can ignore
you" part is what I like about it.

Telephones frankly suck for discussing technical topics. I can't cut
and paste pieces of code from my other editor buffer to show you over
the phone, as another example as to why.

A lot of people like to use telephones specifically because it does
not give the other party the option of ignoring you once they pick up
the phone. I value the ability to make the choice to ignore people
because a lot of ideas I don't give a crap about come under my nose.

In fact that may be one of the best parts about Linux development
compared to doing stuff at a company, one isn't required to listen to
someone's idea or to even read it. If today I don't give a crap about
Joe's filesystem idea, hey guess what I'm not going to read any of his
emails about the thing.

2001-12-19 09:17:13

by Peter Wächtler

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Doug Ledford schrieb:
>
> Andreas Dilger wrote:
>
> > On Dec 18, 2001 09:27 -0800, Linus Torvalds wrote:
> >
> >>Maybe the best thing to do is to educate the people who write the sound
> >>apps for Linux (somebody was complaining about "esd" triggering this, for
> >>example).
> >>
> >
> > Yes, esd is an interrupt hog, it seems. When reading this thread, I
> > checked, and sure enough I was getting 190 interrupts/sec on the
> > sound card while not playing any sound. I killed esd (which I don't
> > use anyways), and interrupts went to 0/sec when not playing sound.
> > Still at 190/sec when using mpg123 on my ymfpci (Yamaha YMF744B DS-1S)
> > sound card.
>
> Weel, evidently esd and artsd both do this (well, I assume esd does now, it
> didn't do this in the past). Basically, they both transmit silence over the
> sound chip when nothing else is going on. So even though you don't hear
> anything, the same sound output DMA is taking place. That avoids things
> like nasty pops when you start up the sound hardware for a beep and that
> sort of thing. It also maintains state where as dropping output entirely
> could result in things like module auto unloading and then reloading on the
> next beep, etc. Personally, the interrupt count and overhead annoyed me
> enough that when I started hacking on the i810 sound driver one of my
> primary goals was to get overhead and interrupt count down. I think I
> suceeded quite well. On my current workstation:
>
> Context switches per second not playing any sound: 8300 - 8800
> Context switches per second playing an MP3: 9200 - 9900
> Interrupts per second from sound device: 86
> %CPU used when not playing MP3: 0 - 3% (magicdev is a CPU pig once every 2
> seconds)
> %CPU used when playing MP3s: 0 - 4%
>
> In any case, it might be worth the original poster's time in figuring out
> just how much of his lost CPU is because of playing sound and how much is
> actually caused by the windowing system and all the associated bloat that
> comes with it now a days.
>

Do you really think 8000 context switches are sane?

pippin:/var/log # vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
2 0 0 100728 4424 121572 27800 0 1 6 6 61 77 98 2 0
2 0 0 100728 5448 121572 27800 0 0 0 68 112 811 93 7 0
2 0 0 100728 5448 121572 27800 0 0 0 0 101 776 95 5 0
3 0 0 100728 4928 121572 27800 0 0 0 0 101 794 92 8 0

having a load ~2.1 (2 seti@home)

2001-12-19 11:05:37

by Helge Hafting

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

Doug Ledford wrote:

> Weel, evidently esd and artsd both do this (well, I assume esd does now, it
> didn't do this in the past). Basically, they both transmit silence over the
> sound chip when nothing else is going on. So even though you don't hear
> anything, the same sound output DMA is taking place.

Uuurgh. :-(

> That avoids things
> like nasty pops when you start up the sound hardware for a beep and that

Yuk, bad hardware. Pops when you start or stop writing? You don't even
have to turn the volume off or something to get a pop? Toss it.

> sort of thing. It also maintains state where as dropping output entirely
> could result in things like module auto unloading and then reloading on the
> next beep, etc.

Much better solved by having the device open, but not writing anything.
Open devices don't unload.

Helge Hafting

2001-12-19 16:48:32

by Daniel Phillips

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On December 18, 2001 08:04 pm, Alan Cox wrote:
> > I'm not subscribed to any, thank you very much. I read them through a news
> > gateway, which gives me access to the common ones.
> >
> > And if the discussion wasn't on the common ones, then it wasn't an open
> > discussion.
>
> If the discussion was on the l/k list then most kernel developers arent
> going to read it because tey dont have time to wade through all the crap
> that doesnt matter to them.

Hi Alan,

It's AIO we're talking about, right? AIO is interesting to quite a few
people. I'd read the thread. I'd also read any background material that Ben
would be so kind as to supply.

--
Daniel

2001-12-19 17:41:58

by Daniel Phillips

[permalink] [raw]
Subject: IRC (was: Scheduler)

On December 18, 2001 10:02 pm, Larry McVoy wrote:
> Maybe I'm an old stick in the mud, but IRC seems like a big waste of
> time to me. It's perfect for off the cuff answers and fairly useless
> for thoughtful answers. We used to write well thought out papers and
> specifications for OS work. These days if you can't do it in a paragraph
> on IRC it must not be worth doing, eh?

Hi Larry,

It's a question of using the right tool for the job. As you know, email is
no substitute for a traditional everybody-in-one-room design meeting. These
days, with development distributed all over the world it's just not practical
for everyone to physically get together more than a few times a year, so what
can we do? Right, hang on IRC.

In some ways IRC is more efficient than a face-to-face meeting:

- You can do other things at the same time without offending anyone
(usually)

- Everything is logged for reference

- You can copy code examples and URLs into the channel

- It's normal to send/forward emails, perhaps with traditional papers
attached, patches, whatever, while talking on the channel, or as a
result of talking on the channel

- It's there 24 hours a day

- You can leave the meeting and do work any time you want to, as opposed to
keeping some portion of a group of highly paid engineers bored and idle
for hours at at time.

IRC also solves a big problem for distributed companies: how can you be sure
that your people are actually on the job? (You ping them on IRC and they
respond.)

While there's no doubt about IRC's value, there's also a danger: IRC is
addictive. You can easily end up spending all your time there, and doing
very little design/coding as a result. That's a matter of self-discipline.

To put this into a more immediate perspective for you, suppose you wanted to
get some traction under your SMP Clusters proposal? I'd suggest it's already
been kicked around as much as it's going to be on lkml, and you already wrote
your paper, so the next step would be to get together face-to-face with some
folks who have a clue. Well, unless you're willing to wait months for the
right people to show up in the Bay Area, IRC is the way to go.

Come on in, the water's fine ;-)

--
Daniel

2001-12-19 17:51:42

by Larry McVoy

[permalink] [raw]
Subject: Re: IRC (was: Scheduler)

On Wed, Dec 19, 2001 at 06:44:35PM +0100, Daniel Phillips wrote:
> On December 18, 2001 10:02 pm, Larry McVoy wrote:
> > Maybe I'm an old stick in the mud, but IRC seems like a big waste of
> > time to me. It's perfect for off the cuff answers and fairly useless
> > for thoughtful answers. We used to write well thought out papers and
> > specifications for OS work. These days if you can't do it in a paragraph
> > on IRC it must not be worth doing, eh?
>
> To put this into a more immediate perspective for you, suppose you wanted to
> get some traction under your SMP Clusters proposal? I'd suggest it's already
> been kicked around as much as it's going to be on lkml, and you already wrote
> your paper, so the next step would be to get together face-to-face with some
> folks who have a clue. Well, unless you're willing to wait months for the
> right people to show up in the Bay Area, IRC is the way to go.

Actually, I haven't written a paper. A paper is something which lays out

goals
architecture
milestones
design details

and should be sufficient to make the project happen should I be hit by a
bus. That's my main complaint with IRC, it requires me to keep coming
back and explaining the same thing over and over again.

Here's an idea: you go try and get some traction on the OS cluster idea.
I'll give you 6 months and we'll see what happens. If nothing has
happened, I'll produce a decent paper describing it and then we wait
another 6 months to see what happens. I'll bet you 10:1 odds I get a
lot more action from a lot more people than you do. Nope, wait, make
that 100:1 odds.

I've seen how little I manage to get done by talking. Talk is cheap.
I've also seen how much I get done when I write a paper which other
people can pass around, think about, discuss, and implement. A senior
guy at Morgan Stanley (hi marc) once told me "if you want to get things
done, write them down". And in my case, since people tend to like to
argue with me rather than listen to me (yup, it's my fault, my "style"
leaves "room for improvement" translation: sucks rocks), a paper is
far more effective. My style is pretty much removed from the equation.

I can just see me on IRC, all I'd be getting is style complaints while
people successfully avoid the real points. Look at the last 8 years
of LKML. I'd say most of the effect was from the LMbench paper and
maybe a few threads on performance which would have been more effective
if I'd written a detailed paper explaining my point of view.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2001-12-19 18:21:34

by Daniel Phillips

[permalink] [raw]
Subject: Re: IRC (was: Scheduler)

On December 19, 2001 06:51 pm, Larry McVoy wrote:
> On Wed, Dec 19, 2001 at 06:44:35PM +0100, Daniel Phillips wrote:
> > On December 18, 2001 10:02 pm, Larry McVoy wrote:
> > > Maybe I'm an old stick in the mud, but IRC seems like a big waste of
> > > time to me. It's perfect for off the cuff answers and fairly useless
> > > for thoughtful answers. We used to write well thought out papers and
> > > specifications for OS work. These days if you can't do it in a paragraph
> > > on IRC it must not be worth doing, eh?
> >
> > To put this into a more immediate perspective for you, suppose you wanted to
> > get some traction under your SMP Clusters proposal? I'd suggest it's already
> > been kicked around as much as it's going to be on lkml, and you already wrote
> > your paper, so the next step would be to get together face-to-face with some
> > folks who have a clue. Well, unless you're willing to wait months for the
> > right people to show up in the Bay Area, IRC is the way to go.
>
> Actually, I haven't written a paper. A paper is something which lays out
>
> goals
> architecture
> milestones
> design details
>
> and should be sufficient to make the project happen should I be hit by a
> bus. That's my main complaint with IRC, it requires me to keep coming
> back and explaining the same thing over and over again.
>
> Here's an idea: you go try and get some traction on the OS cluster idea.
> I'll give you 6 months and we'll see what happens. If nothing has
> happened, I'll produce a decent paper describing it and then we wait
> another 6 months to see what happens. I'll bet you 10:1 odds I get a
> lot more action from a lot more people than you do. Nope, wait, make
> that 100:1 odds.

Sorry, reverse psychology doesn't doesn't work that well on me ;)

> I've seen how little I manage to get done by talking. Talk is cheap.
> I've also seen how much I get done when I write a paper which other
> people can pass around, think about, discuss, and implement. A senior
> guy at Morgan Stanley (hi marc) once told me "if you want to get things
> done, write them down". And in my case, since people tend to like to
> argue with me rather than listen to me (yup, it's my fault, my "style"
> leaves "room for improvement" translation: sucks rocks), a paper is
> far more effective. My style is pretty much removed from the equation.
>
> I can just see me on IRC, all I'd be getting is style complaints while
> people successfully avoid the real points. Look at the last 8 years
> of LKML. I'd say most of the effect was from the LMbench paper and
> maybe a few threads on performance which would have been more effective
> if I'd written a detailed paper explaining my point of view.

By all means, write the detailed paper if you've got time, then make
sure people have read it before you talk to them. But trust me, there
are people hanging out on IRC right now who have more than a clue about
exactly the subject you're interested in, who would need no more than
a short note to be up to speed and ready to address the real issues
intelligently.

--
Daniel

2001-12-19 18:20:14

by M. Edward Borasky

[permalink] [raw]
Subject: Re: IRC (was: Scheduler)

On Wed, 19 Dec 2001, Daniel Phillips wrote:

> To put this into a more immediate perspective for you, suppose you wanted to
> get some traction under your SMP Clusters proposal? I'd suggest it's already
> been kicked around as much as it's going to be on lkml, and you already wrote
> your paper, so the next step would be to get together face-to-face with some
> folks who have a clue. Well, unless you're willing to wait months for the
> right people to show up in the Bay Area, IRC is the way to go.
>
> Come on in, the water's fine ;-)

I've watched with great interest the discussion of IRC for Linux folk
and have yet to see anyone mention server/network names and channel
names. I've been on IRC for 2.5 years -- I tracked the Y2K transition on
IRC despite all the dire warnings that evil impulses were going to shoot
down the wire and fry the LCD screen on my laptop. So -- just where
exactly *is* this water that is so fine? mIRC 5.91 and I await with
bated breath. (Yes, I do use a Windows IRC client -- wanna make
something of it? :-)
--
Ed Borasky [email protected] http://www.borasky-research.net
(sometimes known as znmeb on IRC :-)

How to Stop A Folksinger Cold # 4
"Tie me kangaroo down, sport..."
Tie your own kangaroo down -- and stop calling me "sport"!

2001-12-19 18:26:34

by Daniel Phillips

[permalink] [raw]
Subject: Re: IRC (was: Scheduler)

On December 19, 2001 07:19 pm, M. Edward (Ed) Borasky wrote:
> On Wed, 19 Dec 2001, Daniel Phillips wrote:
> I've watched with great interest the discussion of IRC for Linux folk
> and have yet to see anyone mention server/network names and channel
> names. I've been on IRC for 2.5 years -- I tracked the Y2K transition on
> IRC despite all the dire warnings that evil impulses were going to shoot
> down the wire and fry the LCD screen on my laptop. So -- just where
> exactly *is* this water that is so fine? mIRC 5.91 and I await with
> bated breath. (Yes, I do use a Windows IRC client -- wanna make
> something of it? :-)

/server irc.openprojects.net

/list

--
Daniel

2001-12-19 18:41:25

by jjs

[permalink] [raw]
Subject: Re: IRC (was: Scheduler)

"M. Edward (Ed) Borasky" wrote:

> mIRC 5.91 and I await with
> bated breath. (Yes, I do use a Windows IRC client -- wanna make
> something of it? :-)

(shrug) whatever turns you on, I guess...

I will mention that there is this cool OS
called Linux, you might have heard of it -

There are a number of very nice irc clients
available for it

;-)

jjs



2001-12-19 18:57:36

by Benjamin LaHaise

[permalink] [raw]
Subject: aio

On Wed, Dec 19, 2001 at 09:01:59AM -0800, Linus Torvalds wrote:
>
> On Wed, 19 Dec 2001, Daniel Phillips wrote:
> >
> > It's AIO we're talking about, right? AIO is interesting to quite a few
> > people. I'd read the thread. I'd also read any background material that Ben
> > would be so kind as to supply.
>
> Case closed.
>
> Dan didn't even _know_ of the patches.

He doesn't read l-k apparently.

> Ben: end of discussion. I will _not_ apply any patches for aio if they
> aren't openly discussed. We're not microsoft, and we're not Sun. We're
> "Open Source", not "cram things down peoples throat and spring new
> features on them as a fait accompli".

Discuss them then to your heart's content. I've posted announcements to
both l-k and linux-aio which are both on marc.theaimsgroup.com if you're
too lazy to get your IS to add a new list yo the internal news gateway.

> The ghost of "binary compatibility" is not an issue - if Ben or anytbody
> else finds a flaw with the design, it's a hell of a lot better to have
> that flaw fixed _before_ it's part of my kernel rather than afterwards.

Thanks for the useful feedback on the userland interface then. Evidently
nobody cares within the community about improving functionality on a
reasonable timescale. If this doesn't change soon, Linux is doomed.

-ben

2001-12-19 19:27:31

by Dan Kegel

[permalink] [raw]
Subject: Re: aio

Ben LaHaise wrote:
> > Ben: end of discussion. I will _not_ apply any patches for aio if they
> > aren't openly discussed. We're not microsoft, and we're not Sun. We're
> > "Open Source", not "cram things down peoples throat and spring new
> > features on them as a fait accompli".
>
> Discuss them then to your heart's content. I've posted announcements to
> both l-k and linux-aio which are both on marc.theaimsgroup.com ...

Ben, I think maybe we need to get people excited about your patches,
and build up a user base, before putting them in the mainline kernel.
The volume on the linux-aio list has been pretty light, and the
visibility of the patches has been pretty low.

I know I volunteered to write some doc for your aio, and haven't delivered;
thus I'm contributing to the problem. Mea culpa. But there are some
small things that could be done. A freshmeat.net entry for the project,
for instance. Shall I create one, or would you rather do it?
A home page for linux-aio would be great, too.
- Dan

2001-12-19 20:06:30

by Daniel Phillips

[permalink] [raw]
Subject: Re: aio

On December 19, 2001 07:57 pm, Ben LaHaise wrote:
> On Wed, Dec 19, 2001 at 09:01:59AM -0800, Linus Torvalds wrote:
> >
> > On Wed, 19 Dec 2001, Daniel Phillips wrote:
> > >
> > > It's AIO we're talking about, right? AIO is interesting to quite a few
> > > people. I'd read the thread. I'd also read any background material
> > > that Ben would be so kind as to supply.
> >
> > Case closed.
> >
> > Dan didn't even _know_ of the patches.
>
> He doesn't read l-k apparently.

Dan Kegel put it succinctly:

http://marc.theaimsgroup.com/?l=linux-aio&m=100879005201064&w=2

Your original patch is here, and I do remember the post at the time:

http://marc.theaimsgroup.com/?l=linux-kernel&m=98114243104171&w=2

This post provides *zero* context. Ever since, I've been expecting to see
some explanation of what the goals are, what the design principles are, what
the historical context is, etc. etc., and that hasn't happened.

I've got a fairly recent version of the patch too, it's a little too long to
just sit down and read, to reverse-engineer the above information. What's
missing here is some kind of writeup like Suparna did for Jens' bio patch
(hint, hint). There's no reason why every single person who might be
interested should have to take the time to reverse-engineer the patch without
context.

As Linus points out, the active discussion hasn't happened yet.

--
Daniel

2001-12-19 20:19:30

by Davide Libenzi

[permalink] [raw]
Subject: Re: aio

On Wed, 19 Dec 2001, Ben LaHaise wrote:

> Thanks for the useful feedback on the userland interface then. Evidently
> nobody cares within the community about improving functionality on a
> reasonable timescale. If this doesn't change soon, Linux is doomed.

Ben, maybe it's true, nobody cares :( This could be either bad or good.
On one side it could be good because this means that everyone is happy
with the kernel performance level and this could be due the fact that real
world loads does not put their applications under stress. It could be bad
because it's possible that exist applications that are currently under
stress ( yes ), but their developers do not understand that by using
different interfaces they can improve their software ( or they simply do
not understand that the application is under stress ). Or maybe application
developers are not in lk. Or maybe they're not willing to rewrite/experiment
new APIs. On one side i understand that you can have an intrinsic attitude
to push/defend your patch, while one the other side i can agree with the
Linus point to have some kind of broad discussion/adoption about it.
But if applications developers are not in this list there won't be a broad
discussion and if the patch does not go inside the mainstream kernel
"external" applications developers are not going to use it. The Linus
point could be: "why do i have to merge a new api that has had a so cold
discussion/adoption inside lk ?".
Yes egg-chicken draws the picture very well.



- Davide



2001-12-19 20:25:40

by Pete Zaitcev

[permalink] [raw]
Subject: Re: aio

> > > > It's AIO we're talking about, right? AIO is interesting to quite a few
> > > > people. I'd read the thread. I'd also read any background material
> > > > that Ben would be so kind as to supply.
> > >
> > > Case closed.
> > >
> > > Dan didn't even _know_ of the patches.

> I've got a fairly recent version of the patch too, it's a little too long to
> just sit down and read, to reverse-engineer the above information.

Heh, I agree, in a way. I did that once, did not find any major
objections and documented about 20 small things like functions
that have extra arguments which are never used, etc. Ben saw it
and said "I know about all that, never mind".
Perhaps I should have had posted it somewhere?

-- Pete

2001-12-20 00:14:37

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Ben LaHaise <[email protected]>
Date: Wed, 19 Dec 2001 13:57:08 -0500

Thanks for the useful feedback on the userland interface then. Evidently
nobody cares within the community about improving functionality on a
reasonable timescale. If this doesn't change soon, Linux is doomed.

Maybe it's because the majority of people don't care nor would ever
need to use AIO. Are you willing to accept this possibly? :-) Linux
is anything but doomed, because you will notice that the things that
actually matter for most people are in fact improved and worked on
within a reasonable timescale.

Only very specialized applications can even benefit from AIO. This
doesn't make it useless, but it does decrease the amount of interest
(and priority) anyone in the community will have in working on it.

Now, if these few and far between people who are actually interested
in AIO are willing to throw money at the problem to get it worked on,
that is how the "reasonable timescale" will be arrived at. And if
they aren't willing to toss money at the problem, how important can it
really be to them? :-)

Maybe, just maybe, most people simply do not care one iota about AIO.

Linux caters to the general concerns not the nooks and cranies, that
is why it is anything but doomed.

2001-12-20 00:21:58

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 04:13:59PM -0800, David S. Miller wrote:
> Now, if these few and far between people who are actually interested
> in AIO are willing to throw money at the problem to get it worked on,
> that is how the "reasonable timescale" will be arrived at. And if
> they aren't willing to toss money at the problem, how important can it
> really be to them? :-)

People are throwing money at the problem. We're now at a point that in
order to provide the interested people with something they can use, we
need some kind of way to protect their applications against calling an
unsuspecting new mmap syscall instead of the aio syscall specified in
the kernel they compiled against.

> Maybe, just maybe, most people simply do not care one iota about AIO.
>
> Linux caters to the general concerns not the nooks and cranies, that
> is why it is anything but doomed.

What I'm saying is that for more people to play with it, it needs to be
more widely available. The set of developers that read linux-kernel and
linux-aio aren't giving much feedback. I do not expect the code to go
into 2.5 at this point in time. All I need is a set of syscall numbers
that aren't going to change should this implementation stand up to the
test of time.

-ben
--
Fish.

2001-12-20 00:37:38

by Andrew Morton

[permalink] [raw]
Subject: Re: aio

Benjamin LaHaise wrote:
>
> All I need is a set of syscall numbers that aren't going to change
> should this implementation stand up to the test of time.

The aio_* functions are part of POSIX and SUS, so merely reserving
system call numbers for them does not seems a completely dumb
thing to do, IMO.

-

2001-12-20 00:44:49

by Davide Libenzi

[permalink] [raw]
Subject: Re: aio

On Wed, 19 Dec 2001, Benjamin LaHaise wrote:

> What I'm saying is that for more people to play with it, it needs to be
> more widely available. The set of developers that read linux-kernel and
> linux-aio aren't giving much feedback. I do not expect the code to go
> into 2.5 at this point in time. All I need is a set of syscall numbers
> that aren't going to change should this implementation stand up to the
> test of time.

It would be nice to have a cooperation between glibc and the kernel to
have syscalls mapped by name, not by number.
With name->number resolved by crtbegin.o reading a public kernel table
or accessing a fixed-ID kernel map function and filling a map.
So if internally ( at the application ) sys_getpid has index 0, the
sysmap[0] will be filled with the id retrieved inside the kernel by
looking up "sys_getpid".
Eat too spicy today ?




- Davide



2001-12-20 00:56:19

by H. Peter Anvin

[permalink] [raw]
Subject: Re: aio

Followup to: <[email protected]>
By author: Andrew Morton <[email protected]>
In newsgroup: linux.dev.kernel
>
> The aio_* functions are part of POSIX and SUS, so merely reserving
> system call numbers for them does not seems a completely dumb
> thing to do, IMO.
>

Yes, it is, unless you already have a design for how to map the aio_*
library functions onto system calls.

-hpa


--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[email protected]>

2001-12-20 01:21:36

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Bill Huey <[email protected]>
Date: Wed, 19 Dec 2001 17:16:31 -0800

Like the Java folks ? few and far between ?

Precisely, in fact. Anyone who can say that Java is going to be
relevant in a few years time, with a straight face, is only kidding
themselves.

Java is not something to justify a new kernel feature, that is for
certain.

2001-12-20 01:17:26

by Bill Huey

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 04:13:59PM -0800, David S. Miller wrote:
> Maybe it's because the majority of people don't care nor would ever
> need to use AIO. Are you willing to accept this possibly? :-) Linux
> is anything but doomed, because you will notice that the things that
> actually matter for most people are in fact improved and worked on
> within a reasonable timescale.
>
> Only very specialized applications can even benefit from AIO. This
> doesn't make it useless, but it does decrease the amount of interest
> (and priority) anyone in the community will have in working on it.

Folks doing serious server side Java and runtime internals would
definitely be able to use this stuff, namely me. It'll remove the
abuse of threading used to deal with large IO systems when NIO comes out
for 1.4. And as a JVM engineer for the FreeBSD community I'm drooling
over stuff like that.

> Now, if these few and far between people who are actually interested
> in AIO are willing to throw money at the problem to get it worked on,
> that is how the "reasonable timescale" will be arrived at. And if
> they aren't willing to toss money at the problem, how important can it
> really be to them? :-)

Like the Java folks ? few and far between ? What you're saying it just
plain outdated and from a previous generation of thinking that has
become irrelevant as the community has grown.

> Maybe, just maybe, most people simply do not care one iota about AIO.
>
> Linux caters to the general concerns not the nooks and cranies, that
> is why it is anything but doomed.

Again, Linux collectively has outgrown that thinking and the scope of
what the previous generation of engineers can responsible for, which is
why it's important for folks like Ben should be encourage to take it to
the next level.

bill

2001-12-20 02:27:17

by Bill Huey

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 05:20:46PM -0800, David S. Miller wrote:
> Precisely, in fact. Anyone who can say that Java is going to be
> relevant in a few years time, with a straight face, is only kidding
> themselves.

Oh give me coke shooting, Steeley Dan, late 70s bitter kernel
programmer break...

> Java is not something to justify a new kernel feature, that is for
> certain.

Java is here now and used extensively on server side applications.
Simply dismissing it doesn't invalidate the claim that I made before
about how this mentality is outdated.

The economic inertia of Java driven server applications should have
enough force that it is justifyable to RedHat and other commerical
organizations to support it regardless of what your current view is
on this topic.

Even within the BSD/OS group at BSDi/WindRiver, (/me former BSD/OS
engineer) some kind of dedicated async IO system inside kernel was
talked about as highly desireable and possibly a more direct way
of dealing with VM page/async IO event issues that don't map
conceptually to a scheduler context cleanly.

AIO is good, plain and simple.

bill

2001-12-20 02:37:49

by Cameron Simpson

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 05:20:46PM -0800, David S. Miller <[email protected]> wrote:
| From: Bill Huey <[email protected]>
| Like the Java folks ? few and far between ?
| Precisely, in fact. Anyone who can say that Java is going to be
| relevant in a few years time, with a straight face, is only kidding
| themselves.

Maybe. I'm good at that.

| Java is not something to justify a new kernel feature, that is for
| certain.

Of itself, maybe. (Though an attitude like yours is a core reason Java is
spreading as slowly as it is - much like Linux desktops...)

However, heavily threaded apps regardless of language are hardly likely
to disappear; threads are the natural way to write many many things. And
if the kernel implements threads as on Linux, then the scheduler will
become much more important to good performance.
--
Cameron Simpson, DoD#743 [email protected] http://www.zip.com.au/~cs/

Questions are a burden to others,
Answers, a prison for oneself.

2001-12-20 02:46:09

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Bill Huey <[email protected]>
Date: Wed, 19 Dec 2001 18:26:28 -0800

The economic inertia of Java driven server applications should have
enough force that it is justifyable to RedHat and other commerical
organizations to support it regardless of what your current view is
on this topic.

So they'll get paid to implement and support it, and that is precisely
what is happening right now. And the whole point I'm trying to make
is that that is where it's realm is right now.

If AIO was so relevant+sexy we'd be having threads of discussion about
the AIO implementation instead of threads about how relevant it is or
is not for the general populace. Wouldn't you concur? :-)

The people doing Java server applets are such a small fraction of the
Linux user community.

2001-12-20 02:48:09

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Cameron Simpson <[email protected]>
Date: Thu, 20 Dec 2001 13:37:05 +1100

(Though an attitude like yours is a core reason Java is
spreading as slowly as it is - much like Linux desktops...)

It's actually Sun's fault more than anyone else's.

However, heavily threaded apps regardless of language are hardly likely
to disappear; threads are the natural way to write many many things. And
if the kernel implements threads as on Linux, then the scheduler will
become much more important to good performance.

We are not talking about the scheduler, we are talking about
AIO.

2001-12-20 02:57:10

by Mikael Pettersson

[permalink] [raw]
Subject: Re: aio

On Wed, 19 Dec 2001 19:21:36 -0500, Benjamin LaHaise wrote:
>People are throwing money at the problem. We're now at a point that in
>order to provide the interested people with something they can use, we
>need some kind of way to protect their applications against calling an
>unsuspecting new mmap syscall instead of the aio syscall specified in
>the kernel they compiled against.

One option is to use a mechanism where the kernel selects the aio
syscall number from any of the available numbers, and publishes it
in some easily accessible location. User-space then needs to grab it
from that location at app/lib init time before doing any actual work.
(This of course goes away when/if you do get an official syscall number.)

This is at least how I intend to do the "ioctl on device" to "real
syscall" transition for my x86 performance counters driver.

/Mikael

2001-12-20 02:56:30

by Cameron Simpson

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 06:47:18PM -0800, David S. Miller <[email protected]> wrote:
| (Though an attitude like yours is a core reason Java is
| spreading as slowly as it is - much like Linux desktops...)
| It's actually Sun's fault more than anyone else's.

Debatable. But fortunately off topic.

| However, heavily threaded apps regardless of language are hardly likely
| to disappear; threads are the natural way to write many many things. And
| if the kernel implements threads as on Linux, then the scheduler will
| become much more important to good performance.
| We are not talking about the scheduler, we are talking about
| AIO.

It was in the same thread - I must have ignored the detail switch. Ignore
me in turn. But while I'm here, tell me why async I/O is important
to Java and not to anything else, which still seems the thrust of your
remarks.
--
Cameron Simpson, DoD#743 [email protected] http://www.zip.com.au/~cs/

Always code as if the guy who ends up maintaining your code will be a violent
psychopath who knows where you live.
- Martin Golding, DoD #0236, [email protected]

2001-12-20 02:59:30

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Cameron Simpson <[email protected]>
Date: Thu, 20 Dec 2001 13:52:21 +1100

tell me why async I/O is important
to Java and not to anything else, which still seems the thrust of
your remarks.

Not precisely my thrust, which is that AIO is not important to any
significant population of Linux users, it is "nook and cranny" in
scope. And that those "nook and cranny" folks who really find it
important can get paid implementation+support of AIO.

2001-12-20 03:00:42

by John Heil

[permalink] [raw]
Subject: Re: aio

On Wed, 19 Dec 2001, David S. Miller wrote:

> Date: Wed, 19 Dec 2001 18:45:27 -0800 (PST)
> From: "David S. Miller" <[email protected]>
> To: [email protected]
> Cc: [email protected], [email protected], [email protected],
> [email protected]
> Subject: Re: aio
>
> From: Bill Huey <[email protected]>
> Date: Wed, 19 Dec 2001 18:26:28 -0800
>
> The economic inertia of Java driven server applications should have
> enough force that it is justifyable to RedHat and other commerical
> organizations to support it regardless of what your current view is
> on this topic.
>
> So they'll get paid to implement and support it, and that is precisely
> what is happening right now. And the whole point I'm trying to make
> is that that is where it's realm is right now.
>
> If AIO was so relevant+sexy we'd be having threads of discussion about
> the AIO implementation instead of threads about how relevant it is or
> is not for the general populace. Wouldn't you concur? :-)
>
> The people doing Java server applets are such a small fraction of the
> Linux user community.

True for now, but if we want to expand linux into the enterprise and the
desktop to a greater degree, then we need to support the Java community to
draw them and their management in, rather than delaying beneficial
features until their number on lkml reaches critical mass for a design
discussion.


-
-----------------------------------------------------------------
John Heil
South Coast Software
Custom systems software for UNIX and IBM MVS mainframes
1-714-774-6952
[email protected]
http://www.sc-software.com
-----------------------------------------------------------------

2001-12-20 03:08:13

by Bill Huey

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 06:45:27PM -0800, David S. Miller wrote:
> So they'll get paid to implement and support it, and that is precisely
> what is happening right now. And the whole point I'm trying to make
> is that that is where it's realm is right now.
>
> If AIO was so relevant+sexy we'd be having threads of discussion about
> the AIO implementation instead of threads about how relevant it is or
> is not for the general populace. Wouldn't you concur? :-)

I attribute the lack of technical discussion to the least common denomiator
culture of the Linux community and not the merits of the actual technical
system itself. That's what linux-aio@ is for...

And using lkml as a AIO forum is probably outside of the scope of this list
and group.

> The people doing Java server applets are such a small fraction of the
> Linux user community.

Yeah, but the overall Unix community probably has something different to say
about that, certainly. Even in BSD/OS, this JVM project I've been working on is
recognized as one of the most important systems second (probably) only to the
kernel itself. And, IMO, they have a more balanced view of this language
system and the value of it economically as a money making platform instead of
showing off to their peers. It's a greatly anticipated project in all of the
BSDs.

That's my semi-obnoxious take on it. ;-)

bill

2001-12-20 03:07:22

by David Miller

[permalink] [raw]
Subject: Re: aio

From: John Heil <[email protected]>
Date: Wed, 19 Dec 2001 18:57:34 +0000 ( )

True for now, but if we want to expand linux into the enterprise and the
desktop to a greater degree, then we need to support the Java community to
draw them and their management in, rather than delaying beneficial
features until their number on lkml reaches critical mass for a design
discussion.

Firstly, you say this as if server java applets do not function at all
or with acceptable performance today. That is not true for the vast
majority of cases.

If java server applet performance in all cases is dependent upon AIO
(it is not), that would be pretty sad. But it wouldn't be the first
time I've heard crap like that. There is propaganda out there telling
people that 64-bit address spaces are needed for good java
performance. Guess where that came from? (hint: they invented java
and are in the buisness of selling 64-bit RISC processors)

2001-12-20 03:14:32

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Bill Huey <[email protected]>
Date: Wed, 19 Dec 2001 19:07:16 -0800

And using lkml as a AIO forum is probably outside of the scope of this list
and group.

This whole thread exists because Linus wants public general and
technical discussion on lkml of new features to happen before he
considers putting them into the tree, and the fact that they are not
in the tree because he isn't seeing such enthusiastic discussions
happening at all.

I don't think AIO, because of it's non-trivial impact to the tree, is
at all outside the scope of this list. This is in fact the place
where major stuff like AIO is meant to be discussed, not some special
list where only "AIO people" hang out, of course people on that list
will be enthusiastic about AIO!

Frankly, on your other comments, I don't give a rats ass what BSD/OS
people are doing about, nor how highly they rate, Java. That is
neither here nor there. Java is going to be dead in a few years, and
let's just agree to disagree about this particular point ok?

2001-12-20 03:21:54

by Bill Huey

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 07:06:29PM -0800, David S. Miller wrote:
> Firstly, you say this as if server java applets do not function at all
> or with acceptable performance today. That is not true for the vast
> majority of cases.
>
> If java server applet performance in all cases is dependent upon AIO
> (it is not), that would be pretty sad. But it wouldn't be the first

Java is pretty incomplete in this area, which should be addressed to a
great degree in the new NIO API.

The core JVM isn't dependent on this stuff per se for performance, but
it is critical to server side programs that have to deal with highly
scalable IO systems, largely number of FDs, that go beyond the current
expressiveness of select()/poll().

This is all standard fare in *any* kind of high performance networking
application where some kind of high performance kernel/userspace event
delivery system is needed, kqueue() principally.

> time I've heard crap like that. There is propaganda out there telling
> people that 64-bit address spaces are needed for good java
> performance. Guess where that came from? (hint: they invented java
> and are in the buisness of selling 64-bit RISC processors)

What ? oh god. HotSpot is a pretty amazing compiler and it performs well.
Swing does well now, but the lingering issue in Java is the shear size
of it and possibly GC issues. It pretty clear that it's going to get
larger, which is fine since memory is cheap.

bill

2001-12-20 03:33:43

by John Heil

[permalink] [raw]
Subject: Re: aio

On Wed, 19 Dec 2001, David S. Miller wrote:

> Date: Wed, 19 Dec 2001 19:06:29 -0800 (PST)
> From: "David S. Miller" <[email protected]>
> To: [email protected]
> Cc: [email protected], [email protected], [email protected],
> [email protected], [email protected]
> Subject: Re: aio
>
> From: John Heil <[email protected]>
> Date: Wed, 19 Dec 2001 18:57:34 +0000 ( )
>
> True for now, but if we want to expand linux into the enterprise and the
> desktop to a greater degree, then we need to support the Java community to
> draw them and their management in, rather than delaying beneficial
> features until their number on lkml reaches critical mass for a design
> discussion.
>
> Firstly, you say this as if server java applets do not function at all
> or with acceptable performance today. That is not true for the vast
> majority of cases.
>
> If java server applet performance in all cases is dependent upon AIO
> (it is not), that would be pretty sad. But it wouldn't be the first
> time I've heard crap like that. There is propaganda out there telling
> people that 64-bit address spaces are needed for good java
> performance. Guess where that came from? (hint: they invented java
> and are in the buisness of selling 64-bit RISC processors)
>

Agree. However, put your business hat for a minute. We want increased
market share for linux and a lot of us, you included, live by it.
If aio, the proposed implementation or some other, can provide an
adequate performance boost for Java (yet to be seen), that at least
allows the marketing folks one more argument to draw users to linux.
Do think the trade mags etc don't watch what we do? A demonstrable
advantage in Java performance is marketable and beneficial to all.


-
-----------------------------------------------------------------
John Heil
South Coast Software
Custom systems software for UNIX and IBM MVS mainframes
1-714-774-6952
[email protected]
http://www.sc-software.com
-----------------------------------------------------------------

2001-12-20 03:51:04

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tue, 18 Dec 2001, Linus Torvalds wrote:

> The thing is, I'm personally very suspicious of the "features for that
> exclusive 0.1%" mentality.

Then why do we have sendfile(), or that idiotic sys_readahead() ?

(is there _any_ use for sys_readahead() ? at all ?)

cheers,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-20 03:47:43

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 07:13:54PM -0800, David S. Miller wrote:
> I don't think AIO, because of it's non-trivial impact to the tree, is
> at all outside the scope of this list. This is in fact the place
> where major stuff like AIO is meant to be discussed, not some special
> list where only "AIO people" hang out, of course people on that list
> will be enthusiastic about AIO!

Well maybe yourself and others should make some comments about it then.

> Frankly, on your other comments, I don't give a rats ass what BSD/OS
> people are doing about, nor how highly they rate, Java. That is
> neither here nor there. Java is going to be dead in a few years, and
> let's just agree to disagree about this particular point ok?

Who cares about Java? What about high performance LDAP servers or tux-like
userspace performance? How about faster select and poll? An X server that
doesn't have to make a syscall to find out that more data has arrived? What
about nbd or iscsi servers that are in userspace and have all the benefits
that their kernel side counterparts do?

-ben
--
Fish.

2001-12-20 04:04:27

by Ryan Cumming

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On December 19, 2001 19:50, Rik van Riel wrote:
> On Tue, 18 Dec 2001, Linus Torvalds wrote:
> > The thing is, I'm personally very suspicious of the "features for that
> > exclusive 0.1%" mentality.
>
> Then why do we have sendfile(), or that idiotic sys_readahead() ?

Damn straights

sendfile(2) had an oppertunity to be a real extention of the Unix philosophy.
If it was called something like "copy" (to match "read" and "write"), and
worked on all fds (even if it didn't do zerocopy, it should still just work),
it'd fit in a lot more nicely than even BSD sockets. Alas, as it is, it's
more of a wart than an extention.

Now, sys_readahead() is pretty much the stupidest thing I've ever heard. If
we had a copy(2) syscall, we could do the same thing by: copy(sourcefile,
/dev/null, count). I don't think sys_readahead() even qualifies as a wart.

-Ryan

2001-12-20 04:04:46

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 11:29:15AM -0800, Dan Kegel wrote:
> I know I volunteered to write some doc for your aio, and haven't delivered;
> thus I'm contributing to the problem. Mea culpa. But there are some
> small things that could be done. A freshmeat.net entry for the project,
> for instance. Shall I create one, or would you rather do it?
> A home page for linux-aio would be great, too.

I've started writing some web pages, and will gratiously accept additional
docs/text/questions. Freshmeat will get an entry right after I send this
out.

-ben
--
Fish.

2001-12-20 05:07:48

by Pete Zaitcev

[permalink] [raw]
Subject: Re: aio

>[...]
> However, heavily threaded apps regardless of language are hardly likely
> to disappear; threads are the natural way to write many many things. And
> if the kernel implements threads as on Linux, then the scheduler will
> become much more important to good performance.

Cameron seems to be arguing with DaveM, but subconsciously he
only supports DaveM's point about AIO: Java cannot make use
of AIO, so that's one (large or small, important or unimportant)
group of applications down from the count.

Just trying to keep on topic :)

-- Pete

2001-12-20 05:10:49

by Cameron Simpson

[permalink] [raw]
Subject: Re: aio

On Thu, Dec 20, 2001 at 12:07:21AM -0500, Pete Zaitcev <[email protected]> wrote:
| >[...]
| > However, heavily threaded apps regardless of language are hardly likely
| > to disappear; threads are the natural way to write many many things. And
| > if the kernel implements threads as on Linux, then the scheduler will
| > become much more important to good performance.
|
| Cameron seems to be arguing with DaveM,

About the wrong things, but no matter.

| but subconsciously he
| only supports DaveM's point about AIO: Java cannot make use
| of AIO, so that's one (large or small, important or unimportant)
| group of applications down from the count.

You're sure? Java _authors_ can't make use of it, but Java _implementors_
probably have good reason to want it ...

| Just trying to keep on topic :)

Whatever for?
--
Cameron Simpson, DoD#743 [email protected] http://www.zip.com.au/~cs/

Reaching consensus in a group often is confused with finding the right
answer. - Norman Maier

2001-12-20 05:16:39

by Pete Zaitcev

[permalink] [raw]
Subject: Re: aio

> I attribute the lack of technical discussion to the least common denomiator
> culture of the Linux community and not the merits of the actual technical
> system itself. That's what linux-aio@ is for...
>
> And using lkml as a AIO forum is probably outside of the scope of this list
> and group.

Bill, who is going to read linux-aio? There are many splinter
lists. Just about a week ago Dave Gilbert hissed at me for
not posting to linux-scsi. OK, I admit, the USB cabal made me
to subscribe to linux-usb-devel - only because the subsystem
was so out of whack that I was spending all my time trying to fix
it and dealing with broken sourceforge listserver did not make
it much worse. I can make exception for Ben, out of pure respect.
But then what? Those lists proliferate like cockroaches, every day!
I wish I could subscribe to linux-aio, linux-scsi, linux-nfs,
linux-networking, linux-afs, linux-sound, an OpenGFS list,
and "open" AFS list, linux-s390, linux-on-vaio, linux-usb-user,
linux-infi-devel, Hotplug, and perhaps more.

-- Pete

2001-12-20 05:30:00

by David Miller

[permalink] [raw]
Subject: Re: aio

From: John Heil <[email protected]>
Date: Wed, 19 Dec 2001 19:30:13 +0000 ( )

Agree. However, put your business hat for a minute. We want increased
market share for linux and a lot of us, you included, live by it.

Oh my buisness hat is certainly on, which is why I keep talking about
the people who need this "paying for implementation and support of AIO
for Linux". :-)

Make no mistake, I do agree with your points though in general.

But those things are not dependent upon "standard Linus Linux" having
AIO first, this is what vendors do for differentiation by shipping
feature X in their kernel before others.

2001-12-20 05:40:51

by David Miller

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

From: Rik van Riel <[email protected]>
Date: Thu, 20 Dec 2001 01:50:36 -0200 (BRST)

On Tue, 18 Dec 2001, Linus Torvalds wrote:

> The thing is, I'm personally very suspicious of the "features for that
> exclusive 0.1%" mentality.

Then why do we have sendfile(), or that idiotic sys_readahead() ?

Sending files over sockets are %99 of what most network servers are
actually doing today, it is much more than 0.1% :-)

2001-12-20 05:40:01

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Benjamin LaHaise <[email protected]>
Date: Wed, 19 Dec 2001 22:47:17 -0500

Well maybe yourself and others should make some comments about it then.

Because, like I keep saying, it is totally uninteresting for most of
us.

Who cares about Java?

The people telling me on this list how important AIO is for Linux :-)

What about high performance LDAP servers or tux-like
userspace performance?

People have done "faster than TUX" userspace web service with the
current kernel, that is without AIO. There is no reason you can't
do a fast LDAP server with the current kernel either, any such claim
is simply rubbish. Why do we need AIO again?

How about faster select and poll?

You don't need faster select and poll as demonstrated by the
userspace "faster than TUX" example above.

An X server that doesn't have to make a syscall to find out that
more data has arrived?

Who really needs this kind of performance improvement? Like anyone
really cares if their window gets the keyboard focus or a pixel over a
AF_UNIX socket a few nanoseconds faster. How many people do you think
believe they have unacceptable X performance right now and that
select()/poll() syscalls overhead is the cause? Please get real.

People who want graphics performance are not pushing their data
through X over a filedescriptor, they are either using direct
rendering in the app itself (ala OpenGL) or they are using shared
memory for the bulk of the data (ala Xshm or Xv extensions).

What about nbd or iscsi servers that are in userspace and have all
the benefits that their kernel side counterparts do?

I do not buy this claim that it is not possible the achieve the
desired performance using existing facilities.

The only example of AIO benefitting performance I see right now are
databases.

Franks a lot,
David S. Miller
[email protected]

2001-12-20 05:49:01

by Linus Torvalds

[permalink] [raw]
Subject: Re: aio


On Wed, 19 Dec 2001, David S. Miller wrote:
>
> Not precisely my thrust, which is that AIO is not important to any
> significant population of Linux users, it is "nook and cranny" in
> scope. And that those "nook and cranny" folks who really find it
> important can get paid implementation+support of AIO.

I disagree - we can probably make the aio by Ben quite important. Done
right, it becomes a very natural way of doing event handling, and it could
very well be rather useful for many things that use select loops right
now.

So I actually like the thing as it stands now. What I don't like is how
it's been handled, with people inside Oracle etc working with it, but
_not_ people on the kernel mailing list. I don't worry about the code
nearly as much as I worry about people starting to clique together.

Linus

2001-12-20 05:54:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Thu, 20 Dec 2001, Rik van Riel wrote:
> On Tue, 18 Dec 2001, Linus Torvalds wrote:
>
> > The thing is, I'm personally very suspicious of the "features for that
> > exclusive 0.1%" mentality.
>
> Then why do we have sendfile(), or that idiotic sys_readahead() ?

Hey, I expect others to do things in their tree, and I live by the same
rules: I do my stuff openly in my tree.

The Apache people actually seemed quite interested in sendfile. Of course,
that was before apache seemed to stop worrying about trying to beat
others at performance (rightly or wrongly - I think they are right
from a pragmatic viewpoint, and wrong from a PR one).

And hey, the same way I encourage others to experiment openly with their
trees, I experiment with mine.

Linus

2001-12-20 05:58:31

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Linus Torvalds <[email protected]>
Date: Wed, 19 Dec 2001 21:47:18 -0800 (PST)

it could very well be rather useful for many things that use select
loops right now.

Then let us agree to disagree. :-) I think it's potential advantages,
and how many things really "require it" for better performance, is
being blown out of proportion.

2001-12-20 06:00:44

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 09:39:10PM -0800, David S. Miller wrote:
> How about faster select and poll?
>
> You don't need faster select and poll as demonstrated by the
> userspace "faster than TUX" example above.

Step back for a moment. I know of phttpd and zeus. They both have
a serious problem: they fall down when the load on the system exceeds
the capabilities of the cpu. If you'd bother to take a look at the
aio api I'm proposing, it has less overhead under heavy load as events
get coalesced. Even then, the overhead under light load is less than
signals or select or poll.

-ben
--
Fish.

2001-12-20 06:00:51

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 09:57:30PM -0800, David S. Miller wrote:
> Then let us agree to disagree. :-) I think it's potential advantages,
> and how many things really "require it" for better performance, is
> being blown out of proportion.

Show me how to make a single process server that can handle 100000 or more
open tcp sockets that doesn't collapse under load. I can do it with aio;
can you do it without?

-ben
--
Fish.

2001-12-20 06:00:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...


On Wed, 19 Dec 2001, David S. Miller wrote:
>
> Then why do we have sendfile(), or that idiotic sys_readahead() ?
>
> Sending files over sockets are %99 of what most network servers are
> actually doing today, it is much more than 0.1% :-)

Well, that was true when the thing was written, but whether anybody _uses_
it any more, I don't know. Tux gets the same effect on its own, and I
don't know if Apache defaults to using sendfile or not.

readahead was just a personal 5-minute experiment, we can certainly remove
that ;)

Linus

2001-12-20 06:06:21

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Benjamin LaHaise <[email protected]>
Date: Thu, 20 Dec 2001 00:59:28 -0500

Show me how to make a single process server that can handle 100000 or more
open tcp sockets that doesn't collapse under load. I can do it with aio;
can you do it without?

Why are you limiting me to a single process? :-) Can I have at least
1 per cpu possibly? :-)))

2001-12-20 06:03:41

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Benjamin LaHaise <[email protected]>
Date: Thu, 20 Dec 2001 00:58:03 -0500

Step back for a moment. I know of phttpd and zeus. They both have
a serious problem: they fall down when the load on the system exceeds
the capabilities of the cpu. If you'd bother to take a look at the
aio api I'm proposing, it has less overhead under heavy load as events
get coalesced. Even then, the overhead under light load is less than
signals or select or poll.

No I'm not talking about phttpd nor zeus, I'm talking about the guy
who did the hacks where he'd put the http headers + content into a
seperate file and just sendfile() that to the client.

I forget what his hacks were named, but there certainly was a longish
thread on this list about it about 1 year ago if memory serves.

2001-12-20 06:09:31

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 10:02:47PM -0800, David S. Miller wrote:
> Why are you limiting me to a single process? :-) Can I have at least
> 1 per cpu possibly? :-)))

1 process. 1 cpu machine. 1 gige card. As much ram as you want. No
syscalls. Must exhibit a load curve similar to:

y
| ...............
| .
|.
+----------------x

Where x == requests per second sent to the machine and y is the number
of resposes per second sent out of the machine. Hint: read the phttpd
and /dev/poll papers for an idea of the breakdown that happens for larger
values of x (make the cpu slower to cause the interesting points to move
lower). For a third dimension to the graph, make the number of total
connections the z axis.

-ben
--
Fish.

2001-12-20 06:13:31

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Benjamin LaHaise <[email protected]>
Date: Thu, 20 Dec 2001 01:07:42 -0500

1 process. 1 cpu machine. 1 gige card. As much ram as you want. No
syscalls. Must exhibit a load curve similar to:

y
| ...............
| .
|.
+----------------x

Where x == requests per second sent to the machine and y is the number
of resposes per second sent out of the machine. Hint: read the phttpd
and /dev/poll papers for an idea of the breakdown that happens for larger
values of x (make the cpu slower to cause the interesting points to move
lower). For a third dimension to the graph, make the number of total
connections the z axis.

Ok, TUX can do it. Now list for me some server that really matters
other than web and ftp? If you say databases, then I agree with you
but I will also reiterate how the people who need that level of
database performance is "nook and cranny".

I think there is nothing wrong with doing a TUX module for situations
where 1) the server is important for enough people and 2) scaling to
the levels you are talking about is a real issue for that service.

2001-12-20 06:03:41

by David Miller

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

From: Linus Torvalds <[email protected]>
Date: Wed, 19 Dec 2001 21:58:41 -0800 (PST)

Well, that was true when the thing was written, but whether anybody _uses_
it any more, I don't know. Tux gets the same effect on its own, and I
don't know if Apache defaults to using sendfile or not.

Samba uses it by default, that I know for sure :-)

2001-12-20 06:12:42

by Linus Torvalds

[permalink] [raw]
Subject: Re: aio


Could we get back on track, and possibly discuss the patches themselves,
ok? We want _constructive_ criticism of the interfaces.

I think it's clear that many people do want to have aio support. At least
as far as I'm concerned, that's not the reason I want to have public
discussion. I want to make sure that the interfaces are good for aio
users, and that the design isn't stupid.

If somebody can point to a better way of doing aio, and giving good
arguments for that, more power to him. But let's not go down the path of
"_I_ don't like aio, so _you_ must be stupid".

Linus

2001-12-20 06:25:13

by Linus Torvalds

[permalink] [raw]
Subject: Re: aio


On Wed, 19 Dec 2001, David S. Miller wrote:
>
> Ok, TUX can do it. Now list for me some server that really matters
> other than web and ftp?

Now now, that's unfair. We should be able to do it in user space.

I think the question you _should_ be lobbying at Ben and the other aio
people is how the aio stuff could do zero-copy from disk cache to the
network, ie do the things that Tux does internally where it does
nonblocking reads from disk ad then sends them out non-blocking to the
network without havign to copy the data _or_ have to use extremely
expensive TLB mapping tricks to get at it..

Ie tie the "sendfile" and "aio" threads together, and ask Ben if we can do
aio-sendfile and have thousands of asynchronous sendfiles going on at the
same time, like Tux can do. And if not, then why not? Missing or bad
interfaces?

Ben? Doing user-space IO is all well and good, but that extra copy and TLB
stuff kills you. Tell us how to do it ;)

Linus

2001-12-20 06:38:44

by Timothy Covell

[permalink] [raw]
Subject: Scheduler, Can we save some juice ...

On Wednesday 19 December 2001 21:50, Rik van Riel wrote:
> On Tue, 18 Dec 2001, Linus Torvalds wrote:
> > The thing is, I'm personally very suspicious of the "features for that
> > exclusive 0.1%" mentality.
>
> Then why do we have sendfile(), or that idiotic sys_readahead() ?
>
> (is there _any_ use for sys_readahead() ? at all ?)
>
> cheers,
>
> Rik


OK, here's another 0.1% for you. Considering how Linux SMP
doesn't have high CPU affinity, would it be possible to make a
patch such that the additional CPUs remain in deep sleep/HALT
mode until the first CPU hits a high-water mark of say 90%
utilization? I've started doing this by hand with the (x)pulse
application. My goal is to save electricity and cut down on
excess heat when I'm just browsing the web and not compiling
or seti@home'ing.


--
[email protected].

2001-12-20 06:47:13

by Mike Castle

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 10:00:40PM -0800, David S. Miller wrote:
> No I'm not talking about phttpd nor zeus, I'm talking about the guy
> who did the hacks where he'd put the http headers + content into a
> seperate file and just sendfile() that to the client.
>
> I forget what his hacks were named, but there certainly was a longish
> thread on this list about it about 1 year ago if memory serves.


Would that be Fabio Riccardi's X15 stuff?

mrc
--
Mike Castle [email protected] http://www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

2001-12-20 06:51:23

by Ryan Cumming

[permalink] [raw]
Subject: Re: Scheduler, Can we save some juice ...

On December 19, 2001 22:33, Timothy Covell wrote:
> OK, here's another 0.1% for you. Considering how Linux SMP
> doesn't have high CPU affinity, would it be possible to make a
> patch such that the additional CPUs remain in deep sleep/HALT
> mode until the first CPU hits a high-water mark of say 90%
> utilization? I've started doing this by hand with the (x)pulse
> application. My goal is to save electricity and cut down on
> excess heat when I'm just browsing the web and not compiling
> or seti@home'ing.

I seriously doubt there would be a noticable power consumption or heat
difference between two CPU's running HLT half the time, and one CPU running
HLT all the time. And I'm downright certain it isn't worth the code
complexity even if it was, there is very little (read: no) intersection
between the SMP and low-power user base.

-Ryan

2001-12-20 06:55:33

by Robert Love

[permalink] [raw]
Subject: Re: aio

On Thu, 2001-12-20 at 01:46, Mike Castle wrote:
> On Wed, Dec 19, 2001 at 10:00:40PM -0800, David S. Miller wrote:
> > No I'm not talking about phttpd nor zeus, I'm talking about the guy
> > who did the hacks where he'd put the http headers + content into a
> > seperate file and just sendfile() that to the client.
> >
> > I forget what his hacks were named, but there certainly was a longish
> > thread on this list about it about 1 year ago if memory serves.
>
> Would that be Fabio Riccardi's X15 stuff?

Yes. I was about to reply to this effect.

X15 was a userspace httpd that operated using the Tux-designed
constructs -- sendfile and such. IIRC, Ingo actually pointed out some
things Fabio did were non-RFC (sending the static headers may of been
one of them, since the timestamp was wrong) and Fabio made a lot of
changes. X15 seemed promising, especially since it trumpeted that Linux
"worked" without sticking things in kernel-space, but I don't remember
if we ever saw source (let alone a free license)?

Robert Love

2001-12-20 06:54:03

by Robert Love

[permalink] [raw]
Subject: Re: Scheduler, Can we save some juice ...

On Thu, 2001-12-20 at 01:33, Timothy Covell wrote:

> OK, here's another 0.1% for you. Considering how Linux SMP
> doesn't have high CPU affinity, would it be possible to make a
> patch such that the additional CPUs remain in deep sleep/HALT
> mode until the first CPU hits a high-water mark of say 90%
> utilization? I've started doing this by hand with the (x)pulse
> application. My goal is to save electricity and cut down on
> excess heat when I'm just browsing the web and not compiling
> or seti@home'ing.

You'd probably be better off working against load and not CPU usage,
since a single app can hit you at 100% CPU. Load average is the sort of
metric you want, since if there is more than 1 task waiting to run on
average, you will benefit from multiple CPUs.

That said, this would be easy to do in user space using the hotplug CPU
patch. Monitor load average (just like any X applet does) and when it
crosses over the threshold: "echo 1 > /proc/sys/cpu/2/online"

Another solution would be to use CPU affinity to lock init (and thus all
tasks) to 0x00000001 or whatever and then start allowing 0x00000002 or
whatever when load gets too high.

My point: it is awful easy in user space.

Robert Love

2001-12-20 07:02:23

by David Miller

[permalink] [raw]
Subject: Re: aio

From: Mike Castle <[email protected]>
Date: Wed, 19 Dec 2001 22:46:52 -0800

Would that be Fabio Riccardi's X15 stuff?

Yes, that sounds like the one.

2001-12-20 07:13:55

by Mike Castle

[permalink] [raw]
Subject: Re: aio

On Thu, Dec 20, 2001 at 01:55:25AM -0500, Robert Love wrote:
> changes. X15 seemed promising, especially since it trumpeted that Linux
> "worked" without sticking things in kernel-space, but I don't remember
> if we ever saw source (let alone a free license)?

We did, but licensing was personal use only. It's all in the archives for
those curious. Though following the URLs Fabio posted just got me to
login screen.

mrc
--
Mike Castle [email protected] http://www.netcom.com/~dalgoda/
We are all of us living in the shadow of Manhattan. -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc

2001-12-20 07:26:20

by Daniel Phillips

[permalink] [raw]
Subject: Re: aio

On December 20, 2001 06:39 am, David S. Miller wrote:
> From: Benjamin LaHaise <[email protected]>
> Date: Wed, 19 Dec 2001 22:47:17 -0500
> An X server that doesn't have to make a syscall to find out that
> more data has arrived?
>
> Who really needs this kind of performance improvement? Like anyone
> really cares if their window gets the keyboard focus or a pixel over a
> AF_UNIX socket a few nanoseconds faster. How many people do you think
> believe they have unacceptable X performance right now and that
> select()/poll() syscalls overhead is the cause? Please get real.

I care, I always like faster graphics.

> People who want graphics performance are not pushing their data
> through X over a filedescriptor, they are either using direct
> rendering in the app itself (ala OpenGL) or they are using shared
> memory for the bulk of the data (ala Xshm or Xv extensions).

You're probably overgeneralizing. Actually, I run games on my server and
display the graphics on my laptop. It works. I'd be happy if it was faster.

I don't see right off how AIO would make that happen though. Ben, could you
please enlighten me, what would be the mechanism? Are other OSes doing X
with AIO?

--
Daniel

2001-12-20 08:21:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: aio


On Wed, 19 Dec 2001, Linus Torvalds wrote:

> I think the question you _should_ be lobbying at Ben and the other aio
> people is how the aio stuff could do zero-copy from disk cache to the
> network, ie do the things that Tux does internally where it does
> nonblocking reads from disk ad then sends them out non-blocking to the
> network without havign to copy the data _or_ have to use extremely
> expensive TLB mapping tricks to get at it..

months ago i already offered Ben to port TUX to the aio interfaces once
they are available in the kernel. Unfortunately right now i cant afford
maintaining two separate TUX trees - so it's a chicken and egg thing in
this context.

But once aio is available, i *will* do it, because one of Ben's goals
fully state-machine-driven async block IO, which i'd like to use (and
test, and finetune, and improve) very much. (right now TUX does async
block IO via helper kernel threads. Async net-io is fully IRQ-driven.)
I'd also like to prove that our aio interfaces are capable.

there are two possibilities i can think of:

1) lets get Ben's patch in but do *not* export the syscalls, yet.

2) find some nice way of doing 'experimental syscalls', which are not
guaranteed to stay that way. (Perhaps this is a naive proposition,
often there is nothing more permanent than temporary solutions.)
Something like reserving 'temporary' syscalls at the end of the syscall
space, which would be frequently moved/removed/renamed just to keep
folks from relying on it. No interface is guaranteed. Perhaps some
technical solution can be find to make these syscalls truly temporary.

i'm sure people will get excited about (ie. use) aio once it's in the
kernel. Ben is very good at coding, perhaps not as good at PR, but should
such level of PR really be a natural part of Linux development?

> Ie tie the "sendfile" and "aio" threads together, and ask Ben if we
> can do aio-sendfile and have thousands of asynchronous sendfiles going
> on at the same time, like Tux can do. And if not, then why not?
> Missing or bad interfaces?

i'd love to find out. *If* it's guaranteed that some sort of sane aio will
always be available from the point on it's introduced into the kernel then
i'll switch TUX to it. (it will change TUX upside down, this is why i
cannot maintain two separate TUX trees.) TUX doesnt need stable
interfaces. While TUX might not be as important, usage-wise, it's
certainly a good playing ground for such things.

Ingo

2001-12-20 11:30:17

by Rik van Riel

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Wed, 19 Dec 2001, David S. Miller wrote:
> From: Rik van Riel <[email protected]>
> On Tue, 18 Dec 2001, Linus Torvalds wrote:
>
> > The thing is, I'm personally very suspicious of the "features for that
> > exclusive 0.1%" mentality.
>
> Then why do we have sendfile(), or that idiotic sys_readahead() ?
>
> Sending files over sockets are %99 of what most network servers are
> actually doing today, it is much more than 0.1% :-)

The same could be said for AIO, there are a _lot_ of
server programs which are heavily overthreaded because
of a lack of AIO...

cheers,

Rik
--
Shortwave goes a long way: irc.starchat.net #swl

http://www.surriel.com/ http://distro.conectiva.com/

2001-12-20 11:36:07

by David Miller

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

From: Rik van Riel <[email protected]>
Date: Thu, 20 Dec 2001 09:29:28 -0200 (BRST)

On Wed, 19 Dec 2001, David S. Miller wrote:
> Sending files over sockets are %99 of what most network servers are
> actually doing today, it is much more than 0.1% :-)

The same could be said for AIO, there are a _lot_ of
server programs which are heavily overthreaded because
of a lack of AIO...

If you read my most recent responses to Ingo's postings, you'll see
that I'm starting to completely agree with you :-)

2001-12-20 11:50:07

by William Lee Irwin III

[permalink] [raw]
Subject: Re: aio

On Thu, Dec 20, 2001 at 11:44:05AM +0100, Ingo Molnar wrote:
> we need a sane interface that covers *all* sorts of IO, not just sockets.
> I used to have exactly the same optinion as you have now, but now i'd like
> to have a common async IO interface that will cover network IO, block IO
> [or graphics IO, or whatever comes up]. We should have something saner and
> more explicit than a side-branch of fcntl() handling the socket fasync
> code.

I second this wholeheartedly. And I believe there are still more
motivations for providing asynchronous interfaces for all I/O in
the realm of assisting the userland:

(1) It would simplify the ways applications have and the kernel
overhead of responding to user input while I/O is in progress.

(2) It would provide a more efficient way to do M:N threading than
watchdogs and nonblocking poll/select in itimers.


Cheers,
Bill

2001-12-20 11:52:47

by Suparna Bhattacharya

[permalink] [raw]
Subject: Re: aio

On Wed, Dec 19, 2001 at 10:09:40PM -0800, Linus Torvalds wrote:
> Could we get back on track, and possibly discuss the patches themselves,
> ok? We want _constructive_ criticism of the interfaces.
>
> I think it's clear that many people do want to have aio support. At least

Yes, to add to the lot, even though you probably don't need any more proof :)
we have at least 3 different products requiring this. Both for scalable
communications aio (large no. of connections) as well as file/disk aio.
In fact, one of the things that I've been worried about once 2.5 opened up
and the bio changes started flowing in (much as I was delighted to see
Jens' stuff finally getting integrated), was whether this would mean a
longer timeframe before we can hope to see aio in a distribution (which is
the question which I have to respond to our product groups about).

> as far as I'm concerned, that's not the reason I want to have public
> discussion. I want to make sure that the interfaces are good for aio
> users, and that the design isn't stupid.
>

My feeling about this is that we shouldn't necessarily need to
have the entire aio code in perfect shape and bring it in all in one shot.
The thing I like about the design is that it is quite possible to split
this up into smaller core patches and bring them in slowly. And I agree
with Ingo that we should be able to start stabilizing the basic internal
mechanisms or foundations on which aio is built before we freeze the
external interfaces. In fact, these two things could happen parallely.
So I was hoping that an evolutionary approach would work. In the current
design, the aio path is handled separately from the normal i/o paths, so the
impact on existing interfaces is less. It mainly affects aio users. So it
ought to be possible to integrate it in a way that doesn't hurt regular
operations, and it should be easier to change it once it is in without
breaking too many things.

Existing aio users on other platforms that I have come across seem to use
either POSIX aio (for file /disk aio) or completion port style interfaces
(mainly for communications aio), both of which seem to be possible with
Ben's implementation. One has to explicitly associate each i/o with the
completion queue (ctx), rather than associate an fd as a whole with it so
that all completion events on that fd come to the ctx. That should be OK.
Besides with async poll support we can have per-file readiness notification
as well. I was hoping for the async poll piece being available early to
exercise the top half or event handling side of aio, so we have scalable
select/poll support, so was focussing on that part to start with.

Your point about some critical discussion of the interfaces and the design
is well-taken. We have had a few discussions on the aio mailing and more on
irc about some aspects, but not quite a thorough out and out analysis of
pros and cons of the whole design. I just started writing some points here,
but then realized that it is going to take much longer, so decided to do
that while working with Ben on the documentation, and discuss more after
that.

Regards
Suparna

2001-12-20 14:39:56

by Luigi Genoni

[permalink] [raw]
Subject: Re: aio



On Wed, 19 Dec 2001, David S. Miller wrote:

> From: Bill Huey <[email protected]>
> Date: Wed, 19 Dec 2001 19:07:16 -0800
>
> And using lkml as a AIO forum is probably outside of the scope of this list
> and group.
>
> This whole thread exists because Linus wants public general and
> technical discussion on lkml of new features to happen before he
> considers putting them into the tree, and the fact that they are not
> in the tree because he isn't seeing such enthusiastic discussions
> happening at all.
>
YES, and he is right doing so.

> I don't think AIO, because of it's non-trivial impact to the tree, is
> at all outside the scope of this list. This is in fact the place
> where major stuff like AIO is meant to be discussed, not some special
> list where only "AIO people" hang out, of course people on that list
> will be enthusiastic about AIO!
agreed
>
> Frankly, on your other comments, I don't give a rats ass what BSD/OS
> people are doing about, nor how highly they rate, Java. That is
> neither here nor there. Java is going to be dead in a few years, and
> let's just agree to disagree about this particular point ok?
mmhh, java will not be death untill a lot of commecial software will use
it for graphical interfaces.
Infact it is simpler and cheaper for them to use java, and so we have to
deal with this bad future, where a dead language will be keept alive by
software houses.
That said, should we care about this? In my opinion, NO. and why we
should? when there are no good technical reasons, political reasons should
please disappear.

Luigi

2001-12-20 16:14:18

by Dan Kegel

[permalink] [raw]
Subject: Re: aio

"David S. Miller" wrote:
> If AIO was so relevant+sexy we'd be having threads of discussion about
> the AIO implementation instead of threads about how relevant it is or
> is not for the general populace. Wouldn't you concur? :-)
>
> The people doing Java server applets are such a small fraction of the
> Linux user community.

People writing code for NT/Win2K/WinXP are being channelled into
using AIO because that's the way to do things there (NT doesn't
really support nonblocking I/O). Thus another valid economic
reason AIO is important is to make it easier to port code from NT.
I have received requests from NT folks for things like aio_recvfrom()
(and have passed them on to Ben), so I'm not just guessing here.

As should be clear from my c10k page, I love nonblocking I/O,
but I firmly believe that some form of AIO is vital.

- Dan

2001-12-20 16:30:29

by Dan Kegel

[permalink] [raw]
Subject: Re: aio

Ingo Molnar wrote:

> it's not a fair comparison. The system was set up to not exhibit any async
> IO load. So a pure, atomic sendfile() outperformed TUX slightly, where TUX
> did something slightly more complex (and more RFC-conform as well - see
> Date: caching in X12 for example). Not something i'd call a proof - this
> simply works around the async IO interface. (which RT-signal driven,
> fasync-helped async IO interface, as phttpd has proven, is not only hard
> to program and is unrobust, it also performs *very badly*.)

Proper wrapper code can make them (almost) easy to program with.
See http://www.kegel.com/dkftpbench/doc/Poller_sigio.html for an example
of a wrapper that automatically handles the fallback to poll() on overflow.
Using this wrapper I wrote ftp clients and servers which use a thin wrapper
api that lets the user choose from select, poll, /dev/poll, kqueue/kevent, and RT signals
at runtime.

That said, I think that using the RT signal queue is just plain the
wrong way to go, and I can't wait for better approaches to make it
into the standard kernel someday.

- Dan

2001-12-20 17:25:02

by henning

[permalink] [raw]
Subject: Re: aio

"David S. Miller" <[email protected]> writes:

>The people doing Java server applets are such a small fraction of the
>Linux user community.

The people doing Java Servlets are maybe a small fraction of the Linux
Kernel Hackers community. Not the user community. They simply do not
show up here because they don't care for Linux 2.5.0-rc1-prepatched.

Short head count: Who here is also a regular reader on Apache Jakarta
lists?

Kernel hacking and Java most of the times doesn't mix. And Java folks
are completly ambivalent to their OS: If Linux doesn't deliver,
well. Windows, Solaris and BSD do. That's what Java is all about.

Regards
Henning


--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [email protected]

Am Schwabachgrund 22 Fon.: 09131 / 50654-0 [email protected]
D-91054 Buckenhof Fax.: 09131 / 50654-20

2001-12-20 17:26:22

by henning

[permalink] [raw]
Subject: Re: aio

"David S. Miller" <[email protected]> writes:

>neither here nor there. Java is going to be dead in a few years, and
>let's just agree to disagree about this particular point ok?

Care to point out why? Because of Sun or because of C#?

Regards
Henning


--
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH [email protected]

Am Schwabachgrund 22 Fon.: 09131 / 50654-0 [email protected]
D-91054 Buckenhof Fax.: 09131 / 50654-20

2001-12-20 17:44:05

by Timothy Covell

[permalink] [raw]
Subject: Re: Scheduler, Can we save some juice ...

On Thursday 20 December 2001 00:52, Robert Love wrote:
> On Thu, 2001-12-20 at 01:33, Timothy Covell wrote:
> > OK, here's another 0.1% for you. Considering how Linux SMP
> > doesn't have high CPU affinity, would it be possible to make a
> > patch such that the additional CPUs remain in deep sleep/HALT
> > mode until the first CPU hits a high-water mark of say 90%
> > utilization? I've started doing this by hand with the (x)pulse
> > application. My goal is to save electricity and cut down on
> > excess heat when I'm just browsing the web and not compiling
> > or seti@home'ing.
>
> You'd probably be better off working against load and not CPU usage,
> since a single app can hit you at 100% CPU. Load average is the sort of
> metric you want, since if there is more than 1 task waiting to run on
> average, you will benefit from multiple CPUs.
>
> That said, this would be easy to do in user space using the hotplug CPU
> patch. Monitor load average (just like any X applet does) and when it
> crosses over the threshold: "echo 1 > /proc/sys/cpu/2/online"
>
> Another solution would be to use CPU affinity to lock init (and thus all
> tasks) to 0x00000001 or whatever and then start allowing 0x00000002 or
> whatever when load gets too high.
>
> My point: it is awful easy in user space.
>
> Robert Love
>

You make good points. I'll try the hotplug CPU patch to automate things
more than with my simple use of Xpulse, (whose code I could have
used if I wanted to get off my butt and write a useful C application.)


--
[email protected].

2001-12-20 18:02:57

by Davide Libenzi

[permalink] [raw]
Subject: Re: aio

On Thu, 20 Dec 2001, Dan Kegel wrote:

> Ingo Molnar wrote:
>
> > it's not a fair comparison. The system was set up to not exhibit any async
> > IO load. So a pure, atomic sendfile() outperformed TUX slightly, where TUX
> > did something slightly more complex (and more RFC-conform as well - see
> > Date: caching in X12 for example). Not something i'd call a proof - this
> > simply works around the async IO interface. (which RT-signal driven,
> > fasync-helped async IO interface, as phttpd has proven, is not only hard
> > to program and is unrobust, it also performs *very badly*.)
>
> Proper wrapper code can make them (almost) easy to program with.
> See http://www.kegel.com/dkftpbench/doc/Poller_sigio.html for an example
> of a wrapper that automatically handles the fallback to poll() on overflow.
> Using this wrapper I wrote ftp clients and servers which use a thin wrapper
> api that lets the user choose from select, poll, /dev/poll, kqueue/kevent, and RT signals
> at runtime.

Hey, you forgot /dev/epoll, the fastest one :)




- Davide


2001-12-20 18:21:49

by Robert Love

[permalink] [raw]
Subject: Re: aio

On Thu, 2001-12-20 at 05:18, Ingo Molnar wrote:

> there are two possibilities i can think of:
>
> 1) lets get Ben's patch in but do *not* export the syscalls, yet.

This is an excellent way to give aio the testing and exposure Linus
wants without getting into the commitment / syscall mess.

Stick aio in the kernel, play with it via Tux, etc. The really
interested can add temporary syscalls. aio (which I like, btw) will get
testing and in time, once proven, we can add the syscalls.

Comments?

Robert Love

2001-12-20 20:04:30

by M. Edward Borasky

[permalink] [raw]
Subject: Re: aio

On Thu, 20 Dec 2001, Henning Schmiedehausen wrote:

> "David S. Miller" <[email protected]> writes:
>
> >neither here nor there. Java is going to be dead in a few years, and
> >let's just agree to disagree about this particular point ok?
>
> Care to point out why? Because of Sun or because of C#?

Because MSFT is bigger than SUNW :-) As that great American philosopher,
Damon Runyon, once said, "The race is not always to the swift, nor the
battle to the strong -- but that's the way to bet!"
--
M. Edward Borasky

[email protected]
http://www.borasky-research.net

If I had 40 billion dollars for every software monopoly that sells an
unwieldy and hazardously complex development environment and is run by
an arrogant college dropout with delusions of grandeur who treats his
employees like serfs while he is acclaimed as a man of compelling
vision, I'd be a wealthy man.

2001-12-20 21:52:57

by Lincoln Dale

[permalink] [raw]
Subject: Re: aio

At 08:32 AM 20/12/2001 -0800, Dan Kegel wrote:
>Proper wrapper code can make them (almost) easy to program with.
>See http://www.kegel.com/dkftpbench/doc/Poller_sigio.html for an example
>of a wrapper that automatically handles the fallback to poll() on overflow.
>Using this wrapper I wrote ftp clients and servers which use a thin wrapper
>api that lets the user choose from select, poll, /dev/poll, kqueue/kevent,
>and RT signals
>at runtime.

SIGIO sucks in the real-world for a few reasons right now, most of them
unrelated to 'sigio' itself:

1. SIGIO uses signals.
look at how signals are handled on multiprocessor (SMP) boxes.
can you say "cache ping-pong", not to mention the locking and
task_struct
loop lookups?
every signal to user-space results in 512-bytes of memory-copy
from kernel-to-user-space.

2. SIGIO is very heavy
userspace only gets back one-event-per-system-call, thus you end
up with tens-of-
thousands of user<->kernel transitions per second eating up
valuable cpu resources.
there is neither: (a) aggregation of SIGIO events on a per-socket
basis, nor
(b) aggregating multiple SIGIO events from multiple sockets onto a
single system call.

3. enabling SIGIO is racy at socket-accept
multiple system calls are required to accept a connection on a
socket and then
enable SIGIO on it. packets can arrive in the meantime.
one can workaround this with a poll() but its bad.

4. in practical terms, SIGIO-based I/O isn't very good at expressing a
"no POLL_OUT" signal.

5. SIGIO is only a _notification_ mechanism. it does NOTHING for
zero-copy-i/o from/to-disk
from/to-userspace from/to-network.


cheers,

lincoln.

2001-12-20 22:01:58

by Linus Torvalds

[permalink] [raw]
Subject: Re: aio


On Thu, 20 Dec 2001, Lincoln Dale wrote:
>
> SIGIO sucks in the real-world for a few reasons right now, most of them
> unrelated to 'sigio' itself:

Well, there _is_ one big one, which definitely is fundamentally related to
sigio itself:

sigio is an asynchronous event programming model.

And let's face it, asynchronous programming models suck. They inherently
require that you handle various race conditions etc, and have extra
locking.

Note that "asynchronous programming model" is not the same as
"asynchronous IO completion". The former implies a threaded user space,
the latter implies threaded kernel IO.

And let's face it - threading is _hard_ to get right. People just don't
think well about asynchronous events.

It's much easier to have a synchronous interface to the asynchronous IO,
ie one where you do not have to worry about events happening "at the same
time".

SIGIO just isn't very nice. It's useful for some event notification (ie if
you don't actually _do_ anything in the signal handler), but let's be
honest: it's an extremely heavy notifier. Something synchronous like
"poll" or "select" will beat it just about every time (yes, they don't
scale well, but neither does SIGIO).

Linus

2001-12-20 22:35:38

by Cameron Simpson

[permalink] [raw]
Subject: Re: aio

On Thu, Dec 20, 2001 at 01:20:55PM -0500, Robert Love <[email protected]> wrote:
| On Thu, 2001-12-20 at 05:18, Ingo Molnar wrote:
| > there are two possibilities i can think of:
| > 1) lets get Ben's patch in but do *not* export the syscalls, yet.
|
| This is an excellent way to give aio the testing and exposure Linus
| wants without getting into the commitment / syscall mess.
| Stick aio in the kernel, play with it via Tux, etc. The really
| interested can add temporary syscalls. aio (which I like, btw) will get
| testing and in time, once proven, we can add the syscalls.
| Comments?

Only that it would be hard for user space people to try it - does Ben's
patch (with hypothetical syscalls) present the POSIX async interfaces out
of the box? If not, testing with in-kernel things is sufficient. But
if it does then it becomes more reasonable to transiently define some
syscall numbers (high up, in some defined as "testing and like shifting
sands" range) so user space can test the interface.

Thought: is there a meta-syscall in the kernel API for calling other syscalls?
You could have such a beast taking negative numbers for experimental calls...
--
Cameron Simpson, DoD#743 [email protected] http://www.zip.com.au/~cs/

Sometimes the only solution is to find a new problem.

2001-12-20 22:40:48

by Troels Walsted Hansen

[permalink] [raw]
Subject: RE: Scheduler ( was: Just a second ) ...

>From: David S. Miller
> From: Linus Torvalds <[email protected]>
> Well, that was true when the thing was written, but whether anybody
_uses_
> it any more, I don't know. Tux gets the same effect on its own, and
I
> don't know if Apache defaults to using sendfile or not.
>
>Samba uses it by default, that I know for sure :-)

I wish... Neither Samba 2.2.2 nor the bleeding edge 3.0alpha11 includes
the word "sendfile" in the source at least. :( Wonder why the sendfile
patches where never merged...

--
Troels Walsted Hansen

2001-12-20 22:46:40

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Fri, Dec 21, 2001 at 09:30:27AM +1100, Cameron Simpson wrote:
> Only that it would be hard for user space people to try it - does Ben's
> patch (with hypothetical syscalls) present the POSIX async interfaces out
> of the box?

No. POSIX aio does not have any concept of a completion queue. Completion
in POSIX aio comes via a thread callback, signal delivery or polling, all
of which are horrendously inefficient.

> If not, testing with in-kernel things is sufficient. But
> if it does then it becomes more reasonable to transiently define some
> syscall numbers (high up, in some defined as "testing and like shifting
> sands" range) so user space can test the interface.

Maybe. The unfortunate aspect to this is that you can't tell if a number
matches the name you expect it to be, and invariably people end up running
the wrong code on the wrong kernel. Or vendors start shipping patches to
enable these new syscalls....

> Thought: is there a meta-syscall in the kernel API for calling other
> syscalls? You could have such a beast taking negative numbers for
> experimental calls...

I'm working on something. Stay tuned.

-ben
--
Fish.

2001-12-20 23:09:00

by Lincoln Dale

[permalink] [raw]
Subject: Re: aio

At 01:59 PM 20/12/2001 -0800, Linus Torvalds wrote:

>On Thu, 20 Dec 2001, Lincoln Dale wrote:
> >
> > SIGIO sucks in the real-world for a few reasons right now, most of them
> > unrelated to 'sigio' itself:
>
>Well, there _is_ one big one, which definitely is fundamentally related to
>sigio itself:
>
>sigio is an asynchronous event programming model.
>
>And let's face it, asynchronous programming models suck. They inherently
>require that you handle various race conditions etc, and have extra
>locking.

actually, i disagree with your assertion that "asyncronous programming
models suck".

for MANY applications, it doesn't matter. the equivalent to async is to do
either:
- thread-per-connection or process-per-connection (ala apache, sendmail,
inetd-type services, ...)
- a system that blocks -- handles one-connection-at-a-time

the only time async actually starts to matter is if you start to stress the
precipitous performance characteristics associated with thousands of
concurrent tasks in a thread/process-per-connection model. (limited
processor L2 cache size, multiple tasks sharing the same cache-lines
(suboptimal cache colouring), scheduler overhead, wasted
stack-space-per-thread/process, ..).

if you care about that level of performance, then you generally move to an
async model.
moving to an async model doesn't have to be hard -- people generally start
with their own pseudo scheduler and go from there.
"harder" than non-async: yes. but "hard": no.

>SIGIO just isn't very nice. It's useful for some event notification (ie if
>you don't actually _do_ anything in the signal handler), but let's be
>honest: it's an extremely heavy notifier. Something synchronous like
>"poll" or "select" will beat it just about every time (yes, they don't
>scale well, but neither does SIGIO).

actually, my experience (circa 12 months ago) was that they were roughly equal.
poll()'s performance dropped off significantly at a few thousand FDs
whereas sigio's latency just went up.
but it was somewhat trivial to _make_ poll() go faster by being intelligent
about what fd's to poll. simple logic of "if a FD didn't have anything
active, don't poll for it on the next poll() loop" didn't increase the
latency in servicing that FD by any noticable amount but basically triples
the # of FDs one could handle.


cheers,

lincoln.
NB. sounds like you're making a case for the current trend in Java Virtual
Machines insistance on "lots of processes" being a good thing. <grin, duck,
run>

2001-12-20 23:56:13

by Chris Ricker

[permalink] [raw]
Subject: RE: Scheduler ( was: Just a second ) ...

On Thu, 20 Dec 2001, Troels Walsted Hansen wrote:

> >From: David S. Miller
> > From: Linus Torvalds <[email protected]>
> > Well, that was true when the thing was written, but whether anybody
> _uses_
> > it any more, I don't know. Tux gets the same effect on its own, and
> I
> > don't know if Apache defaults to using sendfile or not.
> >
> >Samba uses it by default, that I know for sure :-)
>
> I wish... Neither Samba 2.2.2 nor the bleeding edge 3.0alpha11 includes
> the word "sendfile" in the source at least. :( Wonder why the sendfile
> patches where never merged...

The only real-world source I've noticed actually using sendfile() are some
of the better ftp daemons (such as vsftpd).

later,
chris

--
Chris Ricker [email protected]

This is a dare to the Bush administration.
-- Thurston Moore


2001-12-20 23:54:03

by David Miller

[permalink] [raw]
Subject: Re: aio

From: [email protected] (Henning Schmiedehausen)
Date: Thu, 20 Dec 2001 17:26:05 +0000 (UTC)

Care to point out why? Because of Sun or because of C#?

That's a circular question, because C# exists due to Sun's mistakes
with handling Java. So my answer is "both".

2001-12-21 00:00:43

by CaT

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Thu, Dec 20, 2001 at 04:55:55PM -0700, Chris Ricker wrote:
> > I wish... Neither Samba 2.2.2 nor the bleeding edge 3.0alpha11 includes
> > the word "sendfile" in the source at least. :( Wonder why the sendfile
> > patches where never merged...
>
> The only real-world source I've noticed actually using sendfile() are some
> of the better ftp daemons (such as vsftpd).

proftpd uses it also.

--
CaT - A high level of technology does not a civilisation make.

2001-12-21 00:03:56

by Davide Libenzi

[permalink] [raw]
Subject: RE: Scheduler ( was: Just a second ) ...

On Thu, 20 Dec 2001, Chris Ricker wrote:

> On Thu, 20 Dec 2001, Troels Walsted Hansen wrote:
>
> > >From: David S. Miller
> > > From: Linus Torvalds <[email protected]>
> > > Well, that was true when the thing was written, but whether anybody
> > _uses_
> > > it any more, I don't know. Tux gets the same effect on its own, and
> > I
> > > don't know if Apache defaults to using sendfile or not.
> > >
> > >Samba uses it by default, that I know for sure :-)
> >
> > I wish... Neither Samba 2.2.2 nor the bleeding edge 3.0alpha11 includes
> > the word "sendfile" in the source at least. :( Wonder why the sendfile
> > patches where never merged...
>
> The only real-world source I've noticed actually using sendfile() are some
> of the better ftp daemons (such as vsftpd).

And XMail :)




- Davide


2001-12-21 00:30:14

by Bill Huey

[permalink] [raw]
Subject: Offtopic Java/C# [Re: aio]

On Thu, Dec 20, 2001 at 03:53:13PM -0800, David S. Miller wrote:
> That's a circular question, because C# exists due to Sun's mistakes
> with handling Java. So my answer is "both".

Well, they really serve different purposes and can't be compare. One
is an unified object/class model for DCOM with more static typing stuff
(boxing, parametric types), which the other is more CLish with class
reflection built closely into the language runtime, threading and other
self contained things within that system.

bill

2001-12-21 11:51:34

by Ingo Molnar

[permalink] [raw]
Subject: Re: aio


On Fri, 21 Dec 2001, Gerold Jury wrote:

> It is simply too early for sexy discussions. For me, the most
> appealing part of AIO is the socket handling. It seems a little bit
> broken in the current glibc emulation/implementation. Recv and send
> operations are ordered when used on the same socket handle. Thus a
> recv must be finished before a subsequent send will happen. Good idea
> for files, bad for sockets.

is this a fundamental limitation expressed in the interface, or just an
implementational limitation? On sockets this is indeed a big problem, HTTP
pipelining wants completely separate receive/send queues.

Ingo

2001-12-21 11:46:04

by Gerold Jury

[permalink] [raw]
Subject: Re: aio

On Thursday 20 December 2001 17:16, Dan Kegel wrote:
> "David S. Miller" wrote:
> > If AIO was so relevant+sexy we'd be having threads of discussion about
> > the AIO implementation instead of threads about how relevant it is or
> > is not for the general populace. Wouldn't you concur? :-)
> >
> > The people doing Java server applets are such a small fraction of the
> > Linux user community.
>
> reason AIO is important is to make it easier to port code from NT.
>
> but I firmly believe that some form of AIO is vital.
>
> - Dan

>From the aio-0.3.1/README
section Current State

IPv4 TCP and UDP (rx only) sockets.

It is simply too early for sexy discussions. For me, the most appealing part
of AIO is the socket handling. It seems a little bit broken in the current
glibc emulation/implementation.
Recv and send operations are ordered when used on the same socket handle.
Thus a recv must be finished before a subsequent send will happen.
Good idea for files, bad for sockets.

SGI's implementation kaio, which works perfect for me, is widespread ignored
and sufferes from the unreserved syscall problem like Ben's aio. I am sure
there is a reason for ignoring SGI-kaio, i just do not remember.

With the current state of the different implementations it is difficult to
have sex about or use them.
But i would really like tooooooooooooooooooo.

Gerold

-
The one-sig-perfd patch did not get much attention either.
No one seems to use sockets these days.

2001-12-21 15:28:18

by Gerold Jury

[permalink] [raw]
Subject: Re: aio

On Friday 21 December 2001 14:48, Ingo Molnar wrote:
> On Fri, 21 Dec 2001, Gerold Jury wrote:
> > It is simply too early for sexy discussions. For me, the most
> > appealing part of AIO is the socket handling. It seems a little bit
> > broken in the current glibc emulation/implementation. Recv and send
> > operations are ordered when used on the same socket handle. Thus a
> > recv must be finished before a subsequent send will happen. Good idea
> > for files, bad for sockets.
>
> is this a fundamental limitation expressed in the interface, or just an
> implementational limitation? On sockets this is indeed a big problem, HTTP
> pipelining wants completely separate receive/send queues.
>
> Ingo
>

That is a very good question.

The Single UNIX ? Specification, Version 2 has the following to say.

If _POSIX_SYNCHRONIZED_IO is defined and synchronised I/O is enabled on the
file associated with aiocbp->aio_fildes, the behaviour of this function is
according to the definitions of synchronised I/O data integrity completion
and synchronised I/O file integrity completion.

Maybe a was a little bit too fast in blaming glibc. I will go and look for
more documentation about disabling synchronised I/O on a socket.

Dup()licating the socket handle is an easy workaround, but now i am
convinced, a little bit man page digging will be lots of fun.

I hope the efforts of Benjamin LaHaise receive more attention and as soon as
i know more about disabling synchronised I/O on sockets i will send an other
email.

Gerold

--
I love AIO

2001-12-21 17:15:40

by Alan

[permalink] [raw]
Subject: Re: aio

> Who cares about Java? What about high performance LDAP servers or tux-like
> userspace performance? How about faster select and poll? An X server that

select/poll is a win - and Java recently discovered poll/select semantics 8)

2001-12-21 17:17:01

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: aio

On Fri, Dec 21, 2001 at 05:24:33PM +0000, Alan Cox wrote:
> select/poll is a win - and Java recently discovered poll/select semantics 8)

Anything is a win over Java's threading model.

-ben
--
Killer Attack Fish.

2001-12-21 17:19:51

by Alan

[permalink] [raw]
Subject: Re: aio

> Precisely, in fact. Anyone who can say that Java is going to be
> relevant in a few years time, with a straight face, is only kidding
> themselves.

Oh it'll be very relevant. Its leaking into all sorts of embedded uses, from
Digital TV to smartcards. Its still useless for serious high end work an
likely to stay so.

> Java is not something to justify a new kernel feature, that is for
> certain.

There we agree. Things like the current asynch/thread mess in java are
partly poor design of language and greatly stupid design of JVM.

2001-12-22 04:25:44

by Rob Landley

[permalink] [raw]
Subject: Re: Scheduler ( was: Just a second ) ...

On Tuesday 18 December 2001 01:27 pm, Doug Ledford wrote:

> Weel, evidently esd and artsd both do this (well, I assume esd does now, it
> didn't do this in the past). Basically, they both transmit silence over
> the sound chip when nothing else is going on. So even though you don't
> hear anything, the same sound output DMA is taking place. That avoids

THAT explains it.

My Dell Inspiron 3500 laptop's built-in sound (NeoMagic MagicMedia 256 AV,
uses ad1848 module) works fine when I first boot the sucker, but looses its
marbles after an APM suspend and stops receiving interrupts. (Extensive
poking around with setpci has so far failed to get it working again, but on a
shutdown and restart the bios sets it up fine. Not a clue what's up there.
The bios and module agree it's using IRQ 7, but lspci insists it's IRQ 11,
both before and after apm suspend. Boggle.)

I was confused for a while about how exactly it was failing because KDE and
mpg123 from the command line fail in different ways. mpg123 will play the
same half-second clip in a loop (ahah! no interrupt!), but sound in kde just
vanishes and I get silence and hung apps whenever I try to launch anything.

The clue is that it doesn't always fail when I suspend it without having X
up. Translation: maybe the sound card's getting hosed by being open and in
use on APM shutdown!

Hmmm... I should poke at this over the weekend...

(Nope, not a new problem. My laptop's sound has been like this since at
least 2.4.4, which I think was the first version I installed on the box. But
it's still annoying, I can go weeks without a true reboot 'cause I have a
zillion konqueror windows and such open. I have to clear my desktop to get
sound working again for a few hours. Obnoxious...)

Rob

2001-12-23 05:37:05

by Bill Huey

[permalink] [raw]
Subject: Re: aio

On Fri, Dec 21, 2001 at 12:16:45PM -0500, Benjamin LaHaise wrote:
> On Fri, Dec 21, 2001 at 05:24:33PM +0000, Alan Cox wrote:
> > select/poll is a win - and Java recently discovered poll/select semantics 8)
>
> Anything is a win over Java's threading model.
>
> -ben

Yeah, it's just another abstraction layer that lives on top of native threading
model, so you don't have to worry about stuff like spinlock contention since
it's been pushed down into the native threading implementation. It doesn't really
add a tremendous amount of overhead given how delegates all of that to the
native OS threading model.

Also, t would be nice to have some regular way of doing read-write lock without
having to implement it in Java language itself, but it's not too critical since
folks don't really push or use the JVM in that way just yet. It's certainly
important in certain high contention systems in the kernel.

bill

2001-12-23 05:48:07

by Bill Huey

[permalink] [raw]
Subject: Re: aio

On Fri, Dec 21, 2001 at 05:28:36PM +0000, Alan Cox wrote:
> > Precisely, in fact. Anyone who can say that Java is going to be
> > relevant in a few years time, with a straight face, is only kidding
> > themselves.
>
> Oh it'll be very relevant. Its leaking into all sorts of embedded uses, from
> Digital TV to smartcards. Its still useless for serious high end work an
> likely to stay so.
>
> > Java is not something to justify a new kernel feature, that is for
> > certain.
>
> There we agree. Things like the current asynch/thread mess in java are
> partly poor design of language and greatly stupid design of JVM.

It's not the fault of the JVM runtime nor the the language per se since
both are excellent. The blame should instead be placed on the political
process within Sun, which has created a lag in getting a decent IO event
model/system available in the form of an API.

This newer system is suppose to be able to scale to tens of thousands of
FDs and be able to handle heavy duty server side stuff in a more graceful
manner. It's a reasonable system from what I saw, but the implementation
of it is highly OS dependent and will be subject to those environmental
constraints. Couple this and the HotSpot compiler (supposeablly competitive
with gcc's -O3 from benchmarks) and it should be high useable for a broad
range of of server side work when intelligently engineered.

bill

2001-12-23 06:32:48

by Dan Kegel

[permalink] [raw]
Subject: Re: aio

Bill Huey wrote:
> > There we agree. Things like the current asynch/thread mess in java are
> > partly poor design of language and greatly stupid design of JVM.
>
> It's not the fault of the JVM runtime nor the the language per se since
> both are excellent. The blame should instead be placed on the political
> process within Sun, which has created a lag in getting a decent IO event
> model/system available in the form of an API.
>
> This newer system is suppose to be able to scale to tens of thousands of
> FDs and be able to handle heavy duty server side stuff in a more graceful
> manner. It's a reasonable system from what I saw, but the implementation
> of it is highly OS dependent and will be subject to those environmental
> constraints. Couple this and the HotSpot compiler (supposeablly competitive
> with gcc's -O3 from benchmarks) and it should be high useable for a broad
> range of of server side work when intelligently engineered.

I served on JSR-51, the expert group that helped design the new I/O
model. (The design was Sun's, but we had quite a bit of input.)
For network I/O, there's a Selector object which is essentially
a nice OO wrapper around the /dev/poll or kqueue/kevent abstraction.
Selector does have a distinctly Unixy feel to it, but it can probably
be implemented well on top of any reasonable OS; I'm quite sure
it can be expressed fairly well in terms of Windows NT's asych I/O
or Linux's rt signal stuff.

(I suspect the initial Linux implementations will just use poll(),
but that's something the Blackdown team can fix. And heck, it
ought to be easy to implement it on top of all the nifty poll
replacements and choose between them at jvm startup time without
any noticable overhead.)

- Dan

p.s. Davide, I didn't forget /dev/epoll, I just haven't had time to
post Poller_devepoll yet!

2001-12-23 18:41:23

by Davide Libenzi

[permalink] [raw]
Subject: Re: aio

On Sat, 22 Dec 2001, Dan Kegel wrote:

> p.s. Davide, I didn't forget /dev/epoll, I just haven't had time to
> post Poller_devepoll yet!

Yep, i just started feeling angry about this :)




- Davide


2001-12-24 11:09:22

by Gerold Jury

[permalink] [raw]
Subject: Re: aio

> On Friday 21 December 2001 14:48, Ingo Molnar wrote:
> > is this a fundamental limitation expressed in the interface, or just an
> > implementational limitation? On sockets this is indeed a big problem,
> > HTTP pipelining wants completely separate receive/send queues.
> >
> > Ingo
>

I got the _POSIX_SYNCHRONIZED_IO completely wrong.
It has nothing to do with the ordering of the aio read/write requests.

Aio_read and aio_write work with absolute positions inside the file size.
The order of requests is unspecified in SUSV V2.
SUSV V2 does neither prevent nor force the desired behaviour.

It seems like it is up to the implementation how to deal with the request
order.
As mentioned earlier, SGI-kaio does the right thing with the same interface.

I want to add, that a combination of sigwaitinfo / sigtimedwait and aio is a
very efficient way to deal with sockets. The accept may be handled with real
time signals as well by using fcntl F_SETSIG and F_SETFL FASYNC.


Gerold

2001-12-24 11:44:27

by Gerold Jury

[permalink] [raw]
Subject: Re: aio

sigtimedwait and sigwaitinfo in combination with SIGIO prevents the
asynchronous event problem and works very well for me.

Gerold

On Thursday 20 December 2001 22:59, Linus Torvalds wrote:
> It's much easier to have a synchronous interface to the asynchronous IO,
> ie one where you do not have to worry about events happening "at the same
> time".
>

2001-12-26 20:42:46

by Daniel Phillips

[permalink] [raw]
Subject: Java and Flam^H^H^H^H AIO (was: aio)

On December 23, 2001 06:46 am, Bill Huey wrote:
> On Fri, Dec 21, 2001 at 05:28:36PM +0000, Alan Cox wrote:
> > > Precisely, in fact. Anyone who can say that Java is going to be
> > > relevant in a few years time, with a straight face, is only kidding
> > > themselves.
> >
> > Oh it'll be very relevant. Its leaking into all sorts of embedded uses,
> > from Digital TV to smartcards. Its still useless for serious high end
> > work an likely to stay so.
> >
> > > Java is not something to justify a new kernel feature, that is for
> > > certain.
> >
> > There we agree. Things like the current asynch/thread mess in java are
> > partly poor design of language and greatly stupid design of JVM.
>
> It's not the fault of the JVM runtime nor the the language per se since
> both are excellent. The blame should instead be placed on the political
> process within Sun, which has created a lag in getting a decent IO event
> model/system available in the form of an API.

Hey wait, it can't be so. Sun apparently uses a boot camp system to
guarantee that every project finishes on time, every time.

* daniel ducks and runs

--
Daniel

2001-12-27 09:49:32

by Martin Dalecki

[permalink] [raw]
Subject: Re: aio

Bill Huey wrote:

>On Wed, Dec 19, 2001 at 07:06:29PM -0800, David S. Miller wrote:
>
>>Firstly, you say this as if server java applets do not function at all
>>or with acceptable performance today. That is not true for the vast
>>majority of cases.
>>
>>If java server applet performance in all cases is dependent upon AIO
>>(it is not), that would be pretty sad. But it wouldn't be the first
>>
>
>Java is pretty incomplete in this area, which should be addressed to a
>great degree in the new NIO API.
>
>The core JVM isn't dependent on this stuff per se for performance, but
>it is critical to server side programs that have to deal with highly
>scalable IO systems, largely number of FDs, that go beyond the current
>expressiveness of select()/poll().
>
>This is all standard fare in *any* kind of high performance networking
>application where some kind of high performance kernel/userspace event
>delivery system is needed, kqueue() principally.
>
>>time I've heard crap like that. There is propaganda out there telling
>>people that 64-bit address spaces are needed for good java
>>performance. Guess where that came from? (hint: they invented java
>>and are in the buisness of selling 64-bit RISC processors)
>>
>
>What ? oh god. HotSpot is a pretty amazing compiler and it performs well.
>Swing does well now, but the lingering issue in Java is the shear size
>of it and possibly GC issues. It pretty clear that it's going to get
>larger, which is fine since memory is cheap.
>
I remind you: ORACLE 9i is requiring half a gig as a minimum just due to the
use of the CRAPPY PIECE OF SHIT written in the Java, called, you guess
it: Just the
bloody damn Installer. Java is really condemned just due to the fact
that both terms: speed
and memmory usage are both allways only *relative* to other systems.

And yes GC's have only one problem - they try to give a general solution
for problems
which can be easly prooven to be mathmematically insolvable. The
resulting undeterministic
behaviour of applications is indeed the thing which is hurting most.