2002-09-27 02:23:17

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

Linus Torvalds wrote:
> For 2.6.x I care about getting the drivers _working_.

Tangent question, is it definitely to be named 2.6?

Maybe it's just my impression from development speed, but it felt more
like a 3.0 to me :)

Jeff





2002-09-27 04:52:47

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver


On Thu, 26 Sep 2002, Jeff Garzik wrote:
>
> Linus Torvalds wrote:
> > For 2.6.x I care about getting the drivers _working_.
>
> Tangent question, is it definitely to be named 2.6?

I see no real reason to call it 3.0.

The order-of-magnitude threading improvements might just come closest to
being a "new thing", but yeah, I still consider it 2.6.x. We don't have
new architectures or other really fundamental stuff. In many ways the jump
from 2.2 -> 2.4 was bigger than the 2.4 -> 2.6 thing will be, I suspect.

But hey, it's just a number. I don't feel that strongly either way. I
think version number inflation (can anybody say "distribution makers"?) is
a bit silly, and the way the kernel numbering works there is no reason to
bump the major number for regular releases.

Linus

2002-09-28 07:32:07

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver


On Thu, 26 Sep 2002, Linus Torvalds wrote:
> On Thu, 26 Sep 2002, Jeff Garzik wrote:
> > Tangent question, is it definitely to be named 2.6?
>
> I see no real reason to call it 3.0.
>
> The order-of-magnitude threading improvements might just come closest to
> being a "new thing", but yeah, I still consider it 2.6.x. We don't have
> new architectures or other really fundamental stuff. In many ways the
> jump from 2.2 -> 2.4 was bigger than the 2.4 -> 2.6 thing will be, I
> suspect.

i consider the VM and IO improvements one of the most important things
that happened in the past 5 years - and it's definitely something that
users will notice. Finally we have a top-notch VM and IO subsystem (in
addition to the already world-class networking subsystem) giving
significant improvements both on the desktop and the server - the jump
from 2.4 to 2.5 is much larger than from eg. 2.0 to 2.4.

I think due to these improvements if we dont call the next kernel 3.0 then
probably no Linux kernel in the future will deserve a major number. In 2-4
years we'll only jump to 3.0 because there's no better number available
after 2.8. That i consider to be ... boring :) [while kernel releases are
supposed to be a bit boring, i dont think they should be _that_ boring.]

Ingo

2002-09-28 09:11:23

by jw schultz

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

On Sat, Sep 28, 2002 at 09:46:35AM +0200, Ingo Molnar wrote:
>
> On Thu, 26 Sep 2002, Linus Torvalds wrote:
> > On Thu, 26 Sep 2002, Jeff Garzik wrote:
> > > Tangent question, is it definitely to be named 2.6?
> >
> > I see no real reason to call it 3.0.
> >
> > The order-of-magnitude threading improvements might just come closest to
> > being a "new thing", but yeah, I still consider it 2.6.x. We don't have
> > new architectures or other really fundamental stuff. In many ways the
> > jump from 2.2 -> 2.4 was bigger than the 2.4 -> 2.6 thing will be, I
> > suspect.
>
> i consider the VM and IO improvements one of the most important things
> that happened in the past 5 years - and it's definitely something that
> users will notice. Finally we have a top-notch VM and IO subsystem (in
> addition to the already world-class networking subsystem) giving
> significant improvements both on the desktop and the server - the jump
> from 2.4 to 2.5 is much larger than from eg. 2.0 to 2.4.
>
> I think due to these improvements if we dont call the next kernel 3.0 then
> probably no Linux kernel in the future will deserve a major number. In 2-4
> years we'll only jump to 3.0 because there's no better number available
> after 2.8. That i consider to be ... boring :) [while kernel releases are
> supposed to be a bit boring, i dont think they should be _that_ boring.]
>

Ingo, I agree with Linus. My recollection of when we moved
to 2.0 was that the major number reflected the user<->kernel
ABI. I have no problem with a version 2.42 if things stay
stable that long. I hope they don't but that is another
issue.

Version 3.0 implies incompatibility with binaries from 2.x
The distributions can play around with version numbers
reflecting the GUI interface, libraries or installers but
the kernel major version should stay the same until binary
compatibility is broken. When we move old syscalls (such as
32 bit file ops) from deprecated to unsupported is when we
increment the major number.

It may be that 2.7 will see the cruft cut out and be the end
of 2.x but 2.5 isn't that. So far 2.5 is performance
enhancement. Terrific performance enhancement, thanks to you
and many others. But it isn't adding major new features nor
is it removing old interfaces. In many ways 2.6 looks like
a sign that the 2.x kernel is getting mature. 2.6 means
users can expect improvements but don't have to make big changes.
2.6 is an upgrade, 3.0 would be a replacement.


--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-28 15:35:13

by Horst H. von Brand

[permalink] [raw]
Subject: Kernel version [Was: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver]

Ingo Molnar <[email protected]> said:
> On Thu, 26 Sep 2002, Linus Torvalds wrote:
> > On Thu, 26 Sep 2002, Jeff Garzik wrote:
> > > Tangent question, is it definitely to be named 2.6?
> >
> > I see no real reason to call it 3.0.
> >
> > The order-of-magnitude threading improvements might just come closest to
> > being a "new thing", but yeah, I still consider it 2.6.x. We don't have
> > new architectures or other really fundamental stuff. In many ways the
> > jump from 2.2 -> 2.4 was bigger than the 2.4 -> 2.6 thing will be, I
> > suspect.
>
> i consider the VM and IO improvements one of the most important things
> that happened in the past 5 years - and it's definitely something that
> users will notice. Finally we have a top-notch VM and IO subsystem (in
> addition to the already world-class networking subsystem) giving
> significant improvements both on the desktop and the server - the jump
> from 2.4 to 2.5 is much larger than from eg. 2.0 to 2.4.

But is is as large as the jump from 1.2.x to 2.0.x?

> I think due to these improvements if we dont call the next kernel 3.0 then
> probably no Linux kernel in the future will deserve a major number. In 2-4
> years we'll only jump to 3.0 because there's no better number available
> after 2.8. That i consider to be ... boring :) [while kernel releases are
> supposed to be a bit boring, i dont think they should be _that_ boring.]

What is wrong with 2.10, or 2.256 for that matter?
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria +56 32 654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513

2002-09-29 01:25:18

by Linus Torvalds

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


On Sat, 28 Sep 2002, Ingo Molnar wrote:
>
> i consider the VM and IO improvements one of the most important things
> that happened in the past 5 years - and it's definitely something that
> users will notice. Finally we have a top-notch VM and IO subsystem (in
> addition to the already world-class networking subsystem) giving
> significant improvements both on the desktop and the server - the jump
> from 2.4 to 2.5 is much larger than from eg. 2.0 to 2.4.

Hey, _if_ people actually are universally happy with the VM in the current
2.5.x tree, I'll happily call the dang thing 5.0 or whatever (just
kidding, but yeah, that would be a good enough reason to bump the major
number).

However, I'll believe that when I see it. Usually people don't complain
during a development kernel, because they think they shouldn't, and then
when it becomes stable (ie when the version number changes) they are
surprised that the behabviour didn't magically improve, and _then_ we get
tons of complaints about how bad the VM is under their load.

Am I hapyy with current 2.5.x? Sure. Are others? Apparently. But does
that mean that we have a top-notch VM and we should bump the major number?
I wish.

The block IO cleanups are important, and that was the major thing _I_
personally wanted from the 2.5.x tree when it was opened. I agree with you
there. But I don't think they are major-number-material.

Anyway, people who are having VM trouble with the current 2.5.x series,
please _complain_, and tell what your workload is. Don't sit silent and
make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
thing.

Linus

2002-09-29 06:08:57

by james

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Saturday 28 September 2002 08:31 pm, Linus Torvalds wrote:
> On Sat, 28 Sep 2002, Ingo Molnar wrote:
> > i consider the VM and IO improvements one of the most important things
> > that happened in the past 5 years - and it's definitely something that
> > users will notice. Finally we have a top-notch VM and IO subsystem (in
> > addition to the already world-class networking subsystem) giving
> > significant improvements both on the desktop and the server - the jump
> > from 2.4 to 2.5 is much larger than from eg. 2.0 to 2.4.
>
> Hey, _if_ people actually are universally happy with the VM in the current
> 2.5.x tree, I'll happily call the dang thing 5.0 or whatever (just
> kidding, but yeah, that would be a good enough reason to bump the major
> number).
>
> However, I'll believe that when I see it. Usually people don't complain
> during a development kernel, because they think they shouldn't, and then
> when it becomes stable (ie when the version number changes) they are
> surprised that the behabviour didn't magically improve, and _then_ we get
> tons of complaints about how bad the VM is under their load.
>
> Am I hapyy with current 2.5.x? Sure. Are others? Apparently. But does
> that mean that we have a top-notch VM and we should bump the major number?
> I wish.
>
> The block IO cleanups are important, and that was the major thing _I_
> personally wanted from the 2.5.x tree when it was opened. I agree with you
> there. But I don't think they are major-number-material.
>
> Anyway, people who are having VM trouble with the current 2.5.x series,
> please _complain_, and tell what your workload is. Don't sit silent and
> make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> thing.
>
How many people are sitting on the sidelines waiting for guarantee that ide is
not going to blow up on our filesystems and take our data with it. Guarantee
that ide is working and not dangerous to our data, then I bet a lot more
people will come back and bang on 2.5.

I know this whole ide mess have taken me away from the devolemental series.
And I bet a lot of others.

My vote for reason to advance to v3.0 would be more based on our filesystems
surport. .i.e. XFS and the latest Reiserfs and redoing our middle layer,
.i.e. treating a cdrw as another drive instead of an ide-scsi device and
ridding us of /dev/[hs][dg][a=z] and replacing it with a lot saner
replacement (I know this talked about it, don't know if it has been or will
be implemented.) Along with the changes others have mentioned, but I really
can't judge those because I have not used 2.5 lately for reasons stated
above.

Sincerly

James




> Linus
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2002-09-29 06:51:58

by Andre Hedrick

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, james wrote:

> How many people are sitting on the sidelines waiting for guarantee that ide is
> not going to blow up on our filesystems and take our data with it. Guarantee
> that ide is working and not dangerous to our data, then I bet a lot more
> people will come back and bang on 2.5.
>
> I know this whole ide mess have taken me away from the devolemental series.
> And I bet a lot of others.

Your points are noted and taken, and once AC and I bang out the details in
2.4-ac series they are easily brought forward. I am staying off 2.5
until I can ramp back up the learning curve on the changing API's.

I really do not want to go in and change what Jens has port forwarded
until I have a complete grasp again. There are no more major changes at
this point and only delta's as needed to constrain concerns.

The only change could be the addition of SATA II support as soon as I
receive the WG's documents.

Cheers,

Andre Hedrick
Linux Serial ATA Solutions
LAD Storage Consulting Group

2002-09-29 07:56:06

by jbradford

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

> The block IO cleanups are important, and that was the major thing _I_
> personally wanted from the 2.5.x tree when it was opened. I agree with you
> there. But I don't think they are major-number-material.

I'd definitely have voted for stable IPV6 being a 3.0.x requirement, but I guess it's a bit late now :-/

> Anyway, people who are having VM trouble with the current 2.5.x series,
> please _complain_, and tell what your workload is. Don't sit silent and
> make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> thing.

I think the broken IDE in 2.5.x has meant that it got seriously less testing overall than previous development trees :-(. Maybe after halloween when it stabilises a bit more we'll get more reports in.

John

2002-09-29 08:19:18

by David Miller

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

From: [email protected]
Date: Sun, 29 Sep 2002 08:16:23 +0100 (BST)

I'd definitely have voted for stable IPV6 being a 3.0.x
requirement, but I guess it's a bit late now :-/

Not at all, the goal is to get a full USAGI merge at a minimum
by the end of October.

2002-09-29 09:15:42

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sat, Sep 28 2002, Linus Torvalds wrote:
>
> On Sat, 28 Sep 2002, Ingo Molnar wrote:
> >
> > i consider the VM and IO improvements one of the most important things
> > that happened in the past 5 years - and it's definitely something that
> > users will notice. Finally we have a top-notch VM and IO subsystem (in
> > addition to the already world-class networking subsystem) giving
> > significant improvements both on the desktop and the server - the jump
> > from 2.4 to 2.5 is much larger than from eg. 2.0 to 2.4.
>
> Hey, _if_ people actually are universally happy with the VM in the current
> 2.5.x tree, I'll happily call the dang thing 5.0 or whatever (just
> kidding, but yeah, that would be a good enough reason to bump the major
> number).

Works For Me, at _least_ as well as 2.4.20-pre kernels. On my desktop
machine it feels better. After a few days of uptime it's fairly easy to
feel how well a kernel performs for that workload. And 2.5.39 is just
smoother than current 2.4.

> The block IO cleanups are important, and that was the major thing _I_
> personally wanted from the 2.5.x tree when it was opened. I agree with you
> there. But I don't think they are major-number-material.

Dang :-)

--
Jens Axboe, rooting for 3.x

2002-09-29 09:12:46

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, [email protected] wrote:
> > Anyway, people who are having VM trouble with the current 2.5.x series,
> > please _complain_, and tell what your workload is. Don't sit silent and
> > make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> > thing.
>
> I think the broken IDE in 2.5.x has meant that it got seriously less
> testing overall than previous development trees :-(. Maybe after
> halloween when it stabilises a bit more we'll get more reports in.

2.5 is definitely desktop stable, so please test it if you can. Until
recently there was a personal show stopper for me, the tasklist
deadline. Now 2.5 is happily running on my desktop as well.

2.5 IDE stability should be just as good as 2.4-ac.

--
Jens Axboe

2002-09-29 11:14:02

by Murray J. Root

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29, 2002 at 11:12:29AM +0200, Jens Axboe wrote:
> On Sun, Sep 29 2002, [email protected] wrote:
> > > Anyway, people who are having VM trouble with the current 2.5.x series,
> > > please _complain_, and tell what your workload is. Don't sit silent and
> > > make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> > > thing.
> >
> > I think the broken IDE in 2.5.x has meant that it got seriously less
> > testing overall than previous development trees :-(. Maybe after
> > halloween when it stabilises a bit more we'll get more reports in.
>
> 2.5 is definitely desktop stable, so please test it if you can. Until
> recently there was a personal show stopper for me, the tasklist
> deadline. Now 2.5 is happily running on my desktop as well.
>
> 2.5 IDE stability should be just as good as 2.4-ac.
>
Hmm - our definitions must be different.

ASUS P4S533 (SiS645DX chipset)
P4 2Ghz
1G PC2700 RAM

Disable SMP, enable APIC & IO APIC
Get "WARNING - Unexpected IO APIC found"
system freezes

Disable IO APIC, enable ACPI
system detects ACPI, builds table, freezes.

Disable ACPI, enable ide-scsi in the kernel
kernel panic analyzing hdc

None of these have been reported because I haven't had time to do all the
work involved in making a report that anyone on the team will read.

--
Murray J. Root
------------------------------------------------
DISCLAIMER: http://www.goldmark.org/jeff/stupid-disclaimers/
------------------------------------------------
Mandrake on irc.openprojects.net:
#mandrake & #mandrake-linux = help for newbies
#mdk-cooker = Mandrake Cooker

2002-09-29 12:54:02

by Gerhard Mack

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

nOn Sun, 29 Sep 2002, james wrote:

> How many people are sitting on the sidelines waiting for guarantee that ide is
> not going to blow up on our filesystems and take our data with it. Guarantee
> that ide is working and not dangerous to our data, then I bet a lot more
> people will come back and bang on 2.5.
> James

Some of us are waiting until it actually compiles for us ;) (see previous
bug report)

Gerhard

--
Gerhard Mack

[email protected]

<>< As a computer I find your faith in technology amusing.

2002-09-29 13:40:59

by Dave Gilbert (Home)

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


In my case I gave 2.5.x an attempt at building on my x86 box a few weeks
ago but had to give up because of the lack of LVM which I rely on.

I fancy having a go on some of my non-x86 boxen; does anyone know the
state of 2.5.x for non-x86?

(Does anyone other than some marketing bods really care if it is 2.6 or
3.0 - I definitly don't).

Dave
---------------- Have a happy GNU millennium! ----------------------
/ Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \
\ gro.gilbert @ treblig.org | MIPS,x86,ARM, SPARC and HP-PA | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/

2002-09-29 13:52:28

by Wakko Warner

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

> In my case I gave 2.5.x an attempt at building on my x86 box a few weeks
> ago but had to give up because of the lack of LVM which I rely on.
>
> I fancy having a go on some of my non-x86 boxen; does anyone know the
> state of 2.5.x for non-x86?
>
> (Does anyone other than some marketing bods really care if it is 2.6 or
> 3.0 - I definitly don't).

I thought 2.4 should be 3.0 since 1.3 went to 2.0 =)

--
Lab tests show that use of micro$oft causes cancer in lab animals

2002-09-29 14:46:35

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 10:12, Jens Axboe wrote:
> 2.5 is definitely desktop stable, so please test it if you can. Until
> recently there was a personal show stopper for me, the tasklist
> deadline. Now 2.5 is happily running on my desktop as well.

Its very hard to make that assessment when the audio layer still doesnt
work, most scsi drivers havent been ported, most other drivers are full
of 2.4 fixed problems and so on.

Most of my boxes won't even run a 2.5 tree yet. I'm sure its hardly
unique. Middle of November we may begin to find out how solid the core
code actually is, as drivers get fixed up and also in the other
direction as we eliminate numerous crashes caused by "fixed in 2.4" bugs

2002-09-29 15:13:03

by Trever L. Adams

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 02:14, james wrote:
> How many people are sitting on the sidelines waiting for guarantee that ide is
> not going to blow up on our filesystems and take our data with it. Guarantee
> that ide is working and not dangerous to our data, then I bet a lot more
> people will come back and bang on 2.5.

I can tell you right now that I am one of these. I usually would have
been involved in testing it for my situations/needs several months ago,
but I have been very leary of the IDE and block changes. I have one
machine (a router) that I could test it on if I knew that the dangers of
IDE and block were at least low and that the IPv4 and associated
networking connection tracking and NAT stuff worked.

Trever

2002-09-29 15:21:34

by Matthias Andree

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sat, 28 Sep 2002, Linus Torvalds wrote:

> Am I hapyy with current 2.5.x? Sure. Are others? Apparently. But does
> that mean that we have a top-notch VM and we should bump the major number?
> I wish.
>
> The block IO cleanups are important, and that was the major thing _I_
> personally wanted from the 2.5.x tree when it was opened. I agree with you
> there. But I don't think they are major-number-material.
>
> Anyway, people who are having VM trouble with the current 2.5.x series,
> please _complain_, and tell what your workload is. Don't sit silent and
> make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> thing.

I personally have the feeling that 2.2.x performed better than 2.4.x
does, but I cannot go figure because I'm using ReiserFS 3.6 file
systems. I'd also really like to give Linux 2.5.39 or whatever is
current a whirl, but I'm currently using LVM and I'd need anything to
read that. Which one (EVMS or LVM2) is an ignorant-proof install and
reliable enough to read old LVM1 partitions and volumes?

2002-09-29 15:29:20

by Andi Kleen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

[email protected] writes:

> > The block IO cleanups are important, and that was the major thing _I_
> > personally wanted from the 2.5.x tree when it was opened. I agree with you
> > there. But I don't think they are major-number-material.
>
> I'd definitely have voted for stable IPV6 being a 3.0.x requirement, but I guess it's a bit late now :-/

Actually current IPv6 is stable and has been for a long time, it's just not
completely standards compliant (but still quite usable for a lot of people)

If you mean stable implies the latest whizbang features you have a different
meaning of stable than me.

-Andi

2002-09-29 15:33:54

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Alan Cox wrote:
> On Sun, 2002-09-29 at 10:12, Jens Axboe wrote:
> > 2.5 is definitely desktop stable, so please test it if you can. Until
> > recently there was a personal show stopper for me, the tasklist
> > deadlock. Now 2.5 is happily running on my desktop as well.
>
> Its very hard to make that assessment when the audio layer still doesnt
> work, most scsi drivers havent been ported, most other drivers are full
> of 2.4 fixed problems and so on.

I can only talk for myself, 2.5 works fine here on my boxes. Dunno what
you mean about audio layer, emu10k works for me.

SCSI drivers can be a real problem. Not the porting of them, most of
that is _trivial_ and can be done as we enter 3.0-pre and people show up
running that on hardware that actually needs to be ported. The worst bit
is error handling, this I view as the only problem.

Update of drivers to 2.4 level is mainly a matter of Dave (or someone
else) resyncing his -dj tree and feeding it back to Linus.

> Most of my boxes won't even run a 2.5 tree yet. I'm sure its hardly
> unique. Middle of November we may begin to find out how solid the core
> code actually is, as drivers get fixed up and also in the other
> direction as we eliminate numerous crashes caused by "fixed in 2.4" bugs

Well why don't they run with 2.5?

Alan, I think you are a pessimist painting a much bleaker picture of 2.5
than it deserves. Sure lots of drivers may be broken still, I would be
naive if I thought that this is all changed in time for oct 31. Most of
these will not be fixed until people actually _use_ 2.5 (or 3.0-pre, or
whatever it will be called), and that will not happen until Linus
actually releases a -rc or similar. And so the fsck what? Noone expects
2.6-pre/3.0-pre to be perfect.

I'm not worried.

--
Jens Axboe

2002-09-29 15:38:26

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Dr. David Alan Gilbert wrote:
>
> In my case I gave 2.5.x an attempt at building on my x86 box a few weeks
> ago but had to give up because of the lack of LVM which I rely on.

This is a good point. Noone has cared enough about LVM to work on it,
looking at the code in the kernel I cannot blame them. Sistina have
abandoned 2.5 LVM.

Has anyone actually sent patches to Linus removing LVM completely from
2.5 and adding the LVM2 device mapper? If I used LVM, I would have done
exactly that long ago. Linus, what's your oppinion on this?

--
Jens Axboe

2002-09-29 15:40:12

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Trever L. Adams wrote:
> On Sun, 2002-09-29 at 02:14, james wrote:
> > How many people are sitting on the sidelines waiting for guarantee
> > that ide is not going to blow up on our filesystems and take our
> > data with it. Guarantee that ide is working and not dangerous to our
> > data, then I bet a lot more people will come back and bang on 2.5.
>
> I can tell you right now that I am one of these. I usually would have
> been involved in testing it for my situations/needs several months
> ago, but I have been very leary of the IDE and block changes. I have
> one machine (a router) that I could test it on if I knew that the
> dangers of IDE and block were at least low and that the IPv4 and
> associated networking connection tracking and NAT stuff worked.

How many accounts of the new block layer corrupting data have you been
aware of? Since 2.5.1-preX when bio was introduced, I know of one such
bug: floppy, due to the partial completion changes. Hardly critical.

--
Jens Axboe

2002-09-29 15:45:48

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Murray J. Root wrote:
> On Sun, Sep 29, 2002 at 11:12:29AM +0200, Jens Axboe wrote:
> > On Sun, Sep 29 2002, [email protected] wrote:
> > > > Anyway, people who are having VM trouble with the current 2.5.x series,
> > > > please _complain_, and tell what your workload is. Don't sit silent and
> > > > make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> > > > thing.
> > >
> > > I think the broken IDE in 2.5.x has meant that it got seriously less
> > > testing overall than previous development trees :-(. Maybe after
> > > halloween when it stabilises a bit more we'll get more reports in.
> >
> > 2.5 is definitely desktop stable, so please test it if you can. Until
> > recently there was a personal show stopper for me, the tasklist
> > deadline. Now 2.5 is happily running on my desktop as well.
> >
> > 2.5 IDE stability should be just as good as 2.4-ac.
> >
> Hmm - our definitions must be different.

Not necessarily, you may just have worse luck than me.

> ASUS P4S533 (SiS645DX chipset)
> P4 2Ghz
> 1G PC2700 RAM
>
> Disable SMP, enable APIC & IO APIC
> Get "WARNING - Unexpected IO APIC found"
> system freezes
>
> Disable IO APIC, enable ACPI
> system detects ACPI, builds table, freezes.
>
> Disable ACPI, enable ide-scsi in the kernel
> kernel panic analyzing hdc
>
> None of these have been reported because I haven't had time to do all the
> work involved in making a report that anyone on the team will read.

But you have time to write this email and complain that it doesn't work?
-> /dev/null, until you send proper reports.

--
Jens Axboe

2002-09-29 15:53:43

by Trever L. Adams

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 11:45, Jens Axboe wrote:
> How many accounts of the new block layer corrupting data have you been
> aware of? Since 2.5.1-preX when bio was introduced, I know of one such
> bug: floppy, due to the partial completion changes. Hardly critical.
>
> --
> Jens Axboe

Sorry Jens, I never meant to imply I had heard of any since that floppy
bug. I just understand there were some problems at the beginning.
Also, I haven't been able to follow LKM as well as I would have liked
lately, but a few months ago, in one of the many IDE bash sessions that
have happened in 2.5.x I read a few people blaiming some of the problems
on interactions between the new block layer and the IDE layer.

Sorry about the worries. I am just trying to be cautious. I am
guessing you are saying that the block layer is now solid? If this is
the case, it sure knocks a few of my worries out of the ball park and I
will be that much closer to trying out 2.5.x myself.

Trever ADams

2002-09-29 16:07:39

by Trever L. Adams

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 12:06, Jens Axboe wrote:
> Nah I'm saying that it's always been solid. Why would I suddenly
> destabilize it now? :-)
>

Close enough. Thank you.

> > the case, it sure knocks a few of my worries out of the ball park and I
> > will be that much closer to trying out 2.5.x myself.
>
> As always, it's untested territory so a backup may be in order. But I
> don't view testing 2.5 as any more dangerous as testing 2.4-ac.
>
> --
> Jens Axboe

I used to religiously test out ac kernels (in the 2.2, 2.3.x and early
2.4.x days). I don't anymore, so the comparison may not be valid here.
Anyway, I will try to either test 2.5.x on my router or else find a box
I can play with that doesnt' have so much important data on it. (I hate
to say it, but I haven't been able to afford, $$ wise, backup for a few
years... I know... I can't afford not to either).

Trever

2002-09-29 16:00:54

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Murray J. Root wrote:

> ASUS P4S533 (SiS645DX chipset)
> P4 2Ghz
> 1G PC2700 RAM
>
> Disable SMP, enable APIC & IO APIC
> Get "WARNING - Unexpected IO APIC found"
> system freezes

Send the subsequent messages (iirc it prints some verbose info about the
IOAPIC in question).

> Disable IO APIC, enable ACPI
> system detects ACPI, builds table, freezes.

Send messages, motherboard/chipset info..

> Disable ACPI, enable ide-scsi in the kernel
> kernel panic analyzing hdc

ditto.

> None of these have been reported because I haven't had time to do all the
> work involved in making a report that anyone on the team will read.

Shouldn't take too long, most time would be spent writing them down if you
can't retrieve via serial console.

Zwane
--
function.linuxpower.ca

2002-09-29 16:00:57

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Trever L. Adams wrote:
> On Sun, 2002-09-29 at 11:45, Jens Axboe wrote:
> > How many accounts of the new block layer corrupting data have you been
> > aware of? Since 2.5.1-preX when bio was introduced, I know of one such
> > bug: floppy, due to the partial completion changes. Hardly critical.
> >
> > --
> > Jens Axboe
>
> Sorry Jens, I never meant to imply I had heard of any since that floppy
> bug. I just understand there were some problems at the beginning.
> Also, I haven't been able to follow LKM as well as I would have liked
> lately, but a few months ago, in one of the many IDE bash sessions that
> have happened in 2.5.x I read a few people blaiming some of the problems
> on interactions between the new block layer and the IDE layer.

No worries. I can understand how people would be weary of block layer
changes, as they have the potential to corrupt your data.

> Sorry about the worries. I am just trying to be cautious. I am
> guessing you are saying that the block layer is now solid? If this is

Nah I'm saying that it's always been solid. Why would I suddenly
destabilize it now? :-)

> the case, it sure knocks a few of my worries out of the ball park and I
> will be that much closer to trying out 2.5.x myself.

As always, it's untested territory so a backup may be in order. But I
don't view testing 2.5 as any more dangerous as testing 2.4-ac.

--
Jens Axboe

2002-09-29 16:14:25

by Dave Jones

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29, 2002 at 05:42:54PM +0200, Jens Axboe wrote:

> Has anyone actually sent patches to Linus removing LVM completely from
> 2.5 and adding the LVM2 device mapper? If I used LVM, I would have done
> exactly that long ago. Linus, what's your oppinion on this?

Joe Thornber sent a patch removing LVM1, but LVM2 has yet to
make an appearance in 2.5.x patchform afair. LVM is in one of
those sneaky positions where they could theoretically cheat
the feature freeze, as whats in the tree right now is fubar,
and we need /something/ before going 2.6/3.0.

It'd be nice to get /something/ in before the feature freeze so
people can bang on this after halloween when we ramp up stability
testing instead of waiting until the last minute.

There are some patches in -dj which make the existing LVM1 code
compile and 'sort of' work, but they're not fit for inclusion imo.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-09-29 16:12:49

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Alan Cox wrote:
> On Sun, 2002-09-29 at 16:42, Jens Axboe wrote:
> > Has anyone actually sent patches to Linus removing LVM completely from
> > 2.5 and adding the LVM2 device mapper? If I used LVM, I would have done
> > exactly that long ago. Linus, what's your oppinion on this?
>
> I added LVM2 a while ago for my 2.4-ac tree and haven't looked back, its
> much nicer code and its clean and easy to understand. I wouldnt
> guarantee its bug free but its the kind of code where you can *find* a
> bug if one turns up

As far as I'm concerned that settles it for me. I'll check up on 2.5
lvm2 status tomorrow.

--
Jens Axboe

2002-09-29 16:10:42

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 16:42, Jens Axboe wrote:
> Has anyone actually sent patches to Linus removing LVM completely from
> 2.5 and adding the LVM2 device mapper? If I used LVM, I would have done
> exactly that long ago. Linus, what's your oppinion on this?

I added LVM2 a while ago for my 2.4-ac tree and haven't looked back, its
much nicer code and its clean and easy to understand. I wouldnt
guarantee its bug free but its the kind of code where you can *find* a
bug if one turns up

2002-09-29 16:12:23

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 16:26, Matthias Andree wrote:
> I personally have the feeling that 2.2.x performed better than 2.4.x
> does, but I cannot go figure because I'm using ReiserFS 3.6 file

On low end boxes the benchmarks I did show later 2.4-rmap beats 2.2. 2.0
worked suprisingly well (better than pre-rmap 2.4) and as Stephen
claimed the best code was about 2.1.100, 2.2 then dropped badly from
that point.

Low memory is of course where rmap does best, so the 2.4-rmap v 2.4
parts of such testing are not actually that useful


2002-09-29 16:22:22

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Dave Jones wrote:
> On Sun, Sep 29, 2002 at 05:42:54PM +0200, Jens Axboe wrote:
>
> > Has anyone actually sent patches to Linus removing LVM completely from
> > 2.5 and adding the LVM2 device mapper? If I used LVM, I would have done
> > exactly that long ago. Linus, what's your oppinion on this?
>
> Joe Thornber sent a patch removing LVM1, but LVM2 has yet to
> make an appearance in 2.5.x patchform afair. LVM is in one of
> those sneaky positions where they could theoretically cheat
> the feature freeze, as whats in the tree right now is fubar,
> and we need /something/ before going 2.6/3.0.

Indeed. Joe, what's the status on dm2 for 2.5? I seem to recall seeing
patches for 2.5, maybe even as long as 6 months ago.

> It'd be nice to get /something/ in before the feature freeze so
> people can bang on this after halloween when we ramp up stability
> testing instead of waiting until the last minute.

Yep, as far as I'm concerned, if a 2.5 dm2 is in decent shape then I'd
glady kill lvm1 immediately.

--
Jens Axboe

2002-09-29 16:24:38

by Dave Jones

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29, 2002 at 05:38:17PM +0200, Jens Axboe wrote:

> Update of drivers to 2.4 level is mainly a matter of Dave (or someone
> else) resyncing his -dj tree and feeding it back to Linus.

Theres still boatloads of bits in my tree (around 4MB worth),
last night I spent some time banging on it trying to get things
into a usable, testable state again. The fact it doesn't boot
on my testboxes right now is somewhat limiting, as is being
buried alive in non-2.5 work.

> > Most of my boxes won't even run a 2.5 tree yet. I'm sure its hardly
> > unique. Middle of November we may begin to find out how solid the core
> > code actually is, as drivers get fixed up and also in the other
> > direction as we eliminate numerous crashes caused by "fixed in 2.4" bugs
> Well why don't they run with 2.5?

Probably numerous reasons (as me). My laptop hangs on boot (no idea why),
my VIA C3 box dies with preemption, some other boxes are still unusable
due to broken SCSI drivers afair.

> Alan, I think you are a pessimist painting a much bleaker picture of 2.5
> than it deserves. Sure lots of drivers may be broken still, I would be
> naive if I thought that this is all changed in time for oct 31.

There's mountains of silly one liner fixes for various problems
(from compile fixes to stability to security issues) in my tree
that need pushing to Linus, the hard part right now is finding
time to do so, but lots of it can even wait until after the feature freeze.
What's important right now is getting everything in that we *need*
included, (biggest absense imo is probably a replacement LVM right now)

> Most of
> these will not be fixed until people actually _use_ 2.5 (or 3.0-pre, or
> whatever it will be called), and that will not happen until Linus
> actually releases a -rc or similar. And so the fsck what? Noone expects
> 2.6-pre/3.0-pre to be perfect.

*nods*, and with the addition of the various debugging aids that have
popped up in the last week or so, I've no doubt we're on track to nail
down a lot more hard-to-find bugs than we ever have been before long
before hitting a x.x.0 release

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs

2002-09-29 16:39:34

by Bjoern A. Zeeb

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Jens Axboe wrote:

Hi,

> On Sun, Sep 29 2002, Alan Cox wrote:
> > On Sun, 2002-09-29 at 10:12, Jens Axboe wrote:
> > > 2.5 is definitely desktop stable, so please test it if you can. Until
> > > recently there was a personal show stopper for me, the tasklist
> > > deadlock. Now 2.5 is happily running on my desktop as well.
> >
> > Its very hard to make that assessment when the audio layer still doesnt
> > work, most scsi drivers havent been ported, most other drivers are full
> > of 2.4 fixed problems and so on.
>
> I can only talk for myself, 2.5 works fine here on my boxes. Dunno what
> you mean about audio layer, emu10k works for me.
>
> SCSI drivers can be a real problem. Not the porting of them, most of
[snip]

simply replying to one of you all ...

Most important problem I currently see is that one of two kernels
do not boot on my MP machine I use as a workstation.

Apart from that and after early 2.5.3x probs were sorted out
I already had 2.5-bk-kernels running and did the following on that
MP machine:

- compiled linux-2.5-bks
- compiled X (runs with multi head)
- listend to music (emu10k)
- watched TV (bttv)
- burned CDs (SCSI)
- ran amanda: dumped multiple input streams from network to IDE disks
before writing to SCSI tape
- ran vmware (after patchwork to compile ;-)
- started looking at sym53c416 cli() removal and had the scanner
doing his work (started to debug some pnp things there too, results
to be posted)
- changed to devfs
- printing and serial are fine too
- the new input stuff now behaves properly too

often did multiple things in parallel (watching tv while compiling
a new kernel, ...)

had really few crashes (~4-6 since 2.5.34)
had some compilation probs with modules and MP but they got either
fixed too fast or patches went into bk within 1-2 days :-)

Going to check JFS (and XFS) in the near future...

So I think I am either one almost happy person with a lotta luck or
you all (did) do a very excellent job!!! ... but please get those
MP (boot) probs sorted out ;-)

Before you start asking what probs: this time it's around ACPI init.

--- snipp ---
PCI: PCI BIOS revision 2.10 entry at 0xfdb91, last bus=1
PCI: Using configuration type 1
ACPI: Subsystem revision 20020918
tbxface-0099 [03] Acpi_load_tables : ACPI Tables successfully loaded
Parsing Methods:......................................................................................................
Table [DSDT] - 309 Objects with 22 Devices 102 Methods 19 Regions
ACPI Namespace successfully loaded at root c03a741c
--- dead end where no keyboard or serial console sysreqs are answered ---


so it must be around ... and I assume it's mp_config_ioapic_for_sci()
but still have to trace ...

--- drivers/acpi/bus.c:606 ---
/*
* Get a separate copy of the FADT for use by other drivers.
*/
status = acpi_get_table(ACPI_TABLE_FADT, 1, &buffer);
if (ACPI_FAILURE(status)) {
printk(KERN_ERR PREFIX "Unable to get the FADT\n");
goto error1;
}

#ifdef CONFIG_X86
/* Ensure the SCI is set to level-triggered, active-low */
if (acpi_ioapic)
mp_config_ioapic_for_sci(acpi_fadt.sci_int);
else
eisa_set_level_irq(acpi_fadt.sci_int);
#endif

status = acpi_enable_subsystem(ACPI_FULL_INITIALIZATION);
if (ACPI_FAILURE(status)) {
printk(KERN_ERR PREFIX "Unable to start the ACPI Interpreter\n");
goto error1;
}
--- end ---

--
Greetings

Bjoern A. Zeeb bzeeb at Zabbadoz dot NeT
56 69 73 69 74 http://www.zabbadoz.net/

2002-09-29 17:02:26

by Jochen Friedrich

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

Hi Gerhard,

> Some of us are waiting until it actually compiles for us ;) (see previous
> bug report)

Ack (on Alpha), and waiting that after compiling, it also boots :-)

My Avanti (currently running 2.5.18):

cat /proc/cpuinfo
cpu : Alpha
cpu model : EV4
cpu variation : 0
cpu revision : 0
cpu serial number : Linux_is_Great!
system type : Avanti
system variation : 0
system revision : 0
system serial number : MILO-2.2-18
cycle frequency [Hz] : 166521620
timer frequency [Hz] : 1024.00
page size [bytes] : 8192
phys. address bits : 34
max. addr. space # : 63
BogoMIPS : 326.08
kernel unaligned acc : 7671003
(pc=fffffc0000954730,va=fffffc00052da056)
user unaligned acc : 252 (pc=120011758,va=12006c7e4)
platform string : N/A
cpus detected : 0

with

CONFIG_FB_ATY=y
CONFIG_FB_ATY_GX=y
CONFIG_FB_ATY_CT=y

i just get a black screen with a wild jumping cursor and than a hang. With
"normal" console, the boot dies with an zero-pointer exception.

I'll try to compile 2.5.39 and send more details about the compile
failures and boot exceptions...

--jochen

2002-09-29 17:21:31

by Jochen Friedrich

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

Hi Andi,

> Actually current IPv6 is stable and has been for a long time, it's just not
> completely standards compliant (but still quite usable for a lot of people)

For end systems (no router) with static IPv6 definitions this seems to be
true. However, for machines which use autoconfiguration (stateless as
there isn't a usable IPv6 capable DHCP server AFAIK) or act as routers,
the current state of the implementation of the default route can best be
described as buggy. (Autoconfigured machines seem to loose their default
route after some time, e.g.).

Also, there could be a better communication between the kernel and the
resolver to check if if IPv6 is available, at all. Currently, on IPv4 only
kernels, we often see dialogs like this:

ssh -v mail.scram.de
OpenSSH_3.4p1 Debian 1:3.4p1-2.1, SSH protocols 1.5/2.0, OpenSSL
0x0090607f
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Rhosts Authentication disabled, originating port will not be
trusted.
debug1: ssh_connect: needpriv 0
debug1: Connecting to mail.scram.de [3ffe:400:470:1::1:1] port 22.
socket: Address family not supported by protocol
debug1: Connecting to mail.scram.de [195.226.127.117] port 22.
debug1: Connection established.

So IPv6 is returned by the resolver even though IPv6 isn't available in
the kernel. The default of the resolver options should be dependent
on the presence or absence of IPv6 in the currently running kernel IMHO.

Finally, IPv6 sockets which also communicate over IPv4 using mapped
addresses are considered bad nowadays ;-)

Cheers,
--jochen

2002-09-29 17:35:19

by Linus Torvalds

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


On Sun, 29 Sep 2002, james wrote:
>
> How many people are sitting on the sidelines waiting for guarantee that ide is
> not going to blow up on our filesystems and take our data with it. Guarantee
> that ide is working and not dangerous to our data, then I bet a lot more
> people will come back and bang on 2.5.

How the hell can I _guarantee_ anything like that?

I can say that the IDE code is the same code that is in 2.4.x, so if
you're comfortable with 2.4.x wrt IDE, then you should be comfy with
2.5.x too.

Linus

2002-09-29 17:56:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


On 29 Sep 2002, Alan Cox wrote:
>
> Its very hard to make that assessment when the audio layer still doesnt
> work,

Which reminds me: it would be good to have somebody try to merge stuff
from the ALSA tree.

ALSA never got out of their CVS mentality, and apparently nobody bothers
to do incrementeal merges. Is anybody interested and listening?

Linus

2002-09-29 17:50:25

by Rik van Riel

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Linus Torvalds wrote:

> How the hell can I _guarantee_ anything like that?

"Quality IDE code, or your disk space back"

No wait, that didn't come out quite right...

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-29 18:09:06

by Jaroslav Kysela

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Linus Torvalds wrote:

>
> On 29 Sep 2002, Alan Cox wrote:
> >
> > Its very hard to make that assessment when the audio layer still doesnt
> > work,
>
> Which reminds me: it would be good to have somebody try to merge stuff
> from the ALSA tree.
>
> ALSA never got out of their CVS mentality, and apparently nobody bothers
> to do incrementeal merges. Is anybody interested and listening?

I am doing that. It seems that you have rejected my big patch, so I am
trying to split our changed to small chunks. I have about 10 patches, I will
send them to you and lkml. All patches are in BK style with imported
comments from CVS.

Jaroslav

-----
Jaroslav Kysela <[email protected]>
Linux Kernel Sound Maintainer
ALSA Project http://www.alsa-project.org
SuSE Linux http://www.suse.com

2002-09-29 18:14:09

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 18:42, Linus Torvalds wrote:
> I can say that the IDE code is the same code that is in 2.4.x, so if
> you're comfortable with 2.4.x wrt IDE, then you should be comfy with
> 2.5.x too.

*NO*

The IDE code is the experimental code in 2.4-ac. It is _NOT_ the IDE
code in 2.4 and its a lot less tested. I don't think it has any
corruption bugs but it is most definitely not the base 2.4 code and has
plenty of non corruption bugs (PCMCIA hang, taskfile write hang, irq
blocking performance problems)

I use the 2.4-ac version of that code for day to day work. Thats about
as good a guarantee as I can give.

Alan

2002-09-29 19:50:19

by james

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


Upon thinking about 2.6 v3.0 argument, I think we may be looking at this
version comparison in the wrong light, it is not wether we have come far
enough from 2.4.x to make it 3.0 it is wether we have change enough from
version 2.0.x.

When I compare running linux 2.0.x to running what will be the next version we
are looking at a completely different system. For example in v2.0 the only
file system choices were ext2 or DOS, with a few others that wern't in wide
spread use. where you created small partitions to keep fsck's fast, even if
you had battery backup, you were still basicly limited to 8 gig file systems.
Today we have ext2, ext3, reiserfs, JFS, XFS, in the last four, journaling
capabilities. it is possible and expected have huge filesystems and patches
exist to break the 2 terabyte file systems exist in various stages of
testing. Not to mention we have LVM, and raid file systems, being used on
desktop as well server systems.

Networking has changed as well, we went from mostly 10mbit eternet cards and a
few 100 mbit cards, to now having 100mbit ethernet as the base of home
networking, not to mention gigabit ethernet, and ATM gaining popularity in
the server market, while they are just drivers, the real shift of thinking
comes in zero copy file transfer and a mature state of the art
firewalling/routing/bridging etc. in NAT and iptables

For video we changed from base VGA video text and X, to acellerated video
processors not just in X, but in framebuffers used as consoles.

We also have support for diverse set of buses, that change the way we think
about our system, multiple bridges on PCI, USB v1 and v2, to firewire.

I will let others more in the know in memory management, discuss the finer
points of this one, but it is a major change, in 2.0 we just killed random
programs when out of memory. today we make a slightly more educated guess as
what to kill when we are out of memory, not to mention a just one base mix of
address support, I think it was 2gig user and 2gig, Today we can choose, 1.
2, or 3 gig of kernel space. Large memory support in the Kernel , supporting
36bit memory accessing, That support more memory than I will ever see in the
near future.

we have changed from a System that barely supported smp with 2 processors with
basicly one big kernel lock to a system with finely grained locks and
semaphores and subsystem spinlocks, that has decent performance on 8+ cpu
systems. Numa system surport also appeared since version 2.0.x

In 2.0.0 we had a 15bit pid with a maximum of 1000 active ( i beleve it is
less than this) today we have a 32+bit pid on the table with support of many
more active processes. of couse we have numourous internal file systems that
did not exist, tmpfs, devfs, etc..... and changed the way we all think about
our systems.

A prempted kernel, need I say more.


well that is just a small list of the globals systems that change the way we
think of linux.

If we continue to justify major version changes based on change in minor
version to minor version, can we expect linux 2.98,x in the future? In each
minor version we rewrite one or two subsytems. And these take many months to
plan, complete and test, so big enough change in a single minor version
number to minor version may not be possible at the current size of this
devolement effort, So yes we have come far enougth from v2.0.x to justify a
version 3.0.x. If I was a marketing person I would call it linux 3.0.0
enterprize edition, if we can get LVM2, raid and break the 2 terabyte
filesystem limit along with what we allready have accomplised.

Just my opionion

James







2002-09-29 21:11:24

by Russell King

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29, 2002 at 05:38:17PM +0200, Jens Axboe wrote:
> SCSI drivers can be a real problem. Not the porting of them, most of
> that is _trivial_ and can be done as we enter 3.0-pre and people show up
> running that on hardware that actually needs to be ported. The worst bit
> is error handling, this I view as the only problem.

2.4.19 SCSI error handling leaves a lot to be desired currently. I have
a growing pile of patches that fix up that mess. They are/have been having
an airing on linux-scsi.

Unfortunately, Alan seems to be ignoring those which linux-scsi is happy
with for unknown reasons currently, so I haven't sent them to Marcelo
(even the ones linux-scsi have said should go to Marcelo; I'd prefer them
to get an airing and some feedback from elsewhere first.)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-09-29 21:22:15

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 2002-09-29 at 22:16, Russell King wrote:
> Unfortunately, Alan seems to be ignoring those which linux-scsi is happy
> with for unknown reasons currently,

Because I've been in Finland


2002-09-29 21:41:35

by Matthias Andree

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Dave Jones wrote:

> Joe Thornber sent a patch removing LVM1, but LVM2 has yet to
> make an appearance in 2.5.x patchform afair. LVM is in one of
> those sneaky positions where they could theoretically cheat
> the feature freeze, as whats in the tree right now is fubar,
> and we need /something/ before going 2.6/3.0.

Is not EVMS ready for the show? Is Linux >=2.6 going to have LVM2 and
EVMS? Or just LVM2? I'm not aware of the current status, but I do recall
having seen EVMS stable announcements (but not sure about 2.5 status).

> It'd be nice to get /something/ in before the feature freeze so
> people can bang on this after halloween when we ramp up stability
> testing instead of waiting until the last minute.

Indeed.

2002-09-29 21:46:45

by Matthias Andree

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Jens Axboe wrote:

> SCSI drivers can be a real problem. Not the porting of them, most of
> that is _trivial_ and can be done as we enter 3.0-pre and people show up
> running that on hardware that actually needs to be ported. The worst bit
> is error handling, this I view as the only problem.

And a long-standing one. This should have been fixed in 2.2, it has not
been fixed in 2.4, it's much desired for 2.6 -- and people are going to
point away from Linux (and expect J?rg Schilling speaking up again
should 2.6 be released with what he considers broken API -- I cannot
tell if all his items are right, but if a third of what he says is true,
Linux SCSI is not in good shape).

2002-09-29 21:55:02

by Matthias Andree

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Alan Cox wrote:

> On Sun, 2002-09-29 at 16:26, Matthias Andree wrote:
> > I personally have the feeling that 2.2.x performed better than 2.4.x
> > does, but I cannot go figure because I'm using ReiserFS 3.6 file
>
> On low end boxes the benchmarks I did show later 2.4-rmap beats 2.2. 2.0
> worked suprisingly well (better than pre-rmap 2.4) and as Stephen
> claimed the best code was about 2.1.100, 2.2 then dropped badly from
> that point.

Granted, but I don't expect any roll-back to happen. If Stephen can dig
up the best version VM-wise, then if somebody could benchmark 2.6pre
against 2.1.BEST, that might be a good competition to 2.6pre -- modulo
different application profile, of course.

My major concern is usability: VM can be so bad it freezes hell or so
good it brings instant world peace: It won't buy me anything if I cannot
get to my data because LVM1 is unusable and neither EVMS nor LVM2 is in.
I'd like to test-drive 2.5, but booting my kernel and mounting a small
root partition from ext3 (non-LVM) and going without /usr and /opt
(because these are in LVM) is not terribly helpful to give it a try.

It's some big things that must be fixed before the tuning (towards
stability, fixes, performance) can take place. You really can't do the
tasting before you've put the meat in.

2002-09-29 21:39:40

by steve

[permalink] [raw]
Subject: Re: v2.6 vs v3.0



We did catch flak on stability issues on 2.4 for whatever the
reasons. The way I see it we should not move to 3.0 until it's been
running stable under at least 2.6. The less technical the person
the more valuable perception becomes. By only moving to 3.0 when
2.x is seen as totally stable, more new (corporate) people will
consider it as the foundation for their infrastructure. Look at the
views of 2.2...

Besides, stability must be more important than features!

--

Steve Szmidt
______________________________________________________

2002-09-29 23:55:12

by Andi Kleen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

Jochen Friedrich <[email protected]> writes:

> Hi Andi,
>
> > Actually current IPv6 is stable and has been for a long time, it's just not
> > completely standards compliant (but still quite usable for a lot of people)
>
> For end systems (no router) with static IPv6 definitions this seems to be
> true. However, for machines which use autoconfiguration (stateless as
> there isn't a usable IPv6 capable DHCP server AFAIK) or act as routers,
> the current state of the implementation of the default route can best be
> described as buggy. (Autoconfigured machines seem to loose their default
> route after some time, e.g.).

Are you sure this is not related to the routing daemon or rdisc daemon you
use ? In the past when I had problems with lost default routes always such
a daemon was to blame.

> So IPv6 is returned by the resolver even though IPv6 isn't available in
> the kernel. The default of the resolver options should be dependent
> on the presence or absence of IPv6 in the currently running kernel IMHO.

Sounds more like an glibc issue. I would file a glibc gnats bug on this,
then it may even get fixed. The kernel has nothing to do with this at least.

> Finally, IPv6 sockets which also communicate over IPv4 using mapped
> addresses are considered bad nowadays ;-)

Hmm?

-Andi

2002-09-30 00:35:21

by Jeff Chua

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


On 29 Sep 2002, Alan Cox wrote:

> On Sun, 2002-09-29 at 16:42, Jens Axboe wrote:
> > Has anyone actually sent patches to Linus removing LVM completely from
> > 2.5 and adding the LVM2 device mapper? If I used LVM, I would have done
> > exactly that long ago. Linus, what's your oppinion on this?
>
> I added LVM2 a while ago for my 2.4-ac tree and haven't looked back, its
> much nicer code and its clean and easy to understand. I wouldnt
> guarantee its bug free but its the kind of code where you can *find* a
> bug if one turns up

I can't even get past "make apply-patches" with device-mapper.0.96.04 on
2.5.39.

Anyone running lvm2 on 2.5.3x ?

Thanks,
Jeff


2002-09-30 07:00:13

by kaih

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

[email protected] (Jens Axboe) wrote on 29.09.02 in <[email protected]>:

> On Sun, Sep 29 2002, Murray J. Root wrote:

> > None of these have been reported because I haven't had time to do all the
> > work involved in making a report that anyone on the team will read.
>
> But you have time to write this email and complain that it doesn't work?
> -> /dev/null, until you send proper reports.

That was precisely the point, no?

For some people, this goes "bake kernel, make sure nobody is doing
something critical, reboot, hang, curse, reboot to old kernel, apologize
for delay, stop fiddling with this thing for today" as the machine in
question needs to do other stuff.

That's certainly the reason why I haven't figured out yet why our damn
"new" central server doesn't boot bloody 2.4 without hanging - I certainly
don't *want* to run 2.2 on that thing. Probably config options.

MfG Kai

2002-09-30 07:01:29

by kaih

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

[email protected] (Trever L. Adams) wrote on 29.09.02 in <[email protected]>:

> I can play with that doesnt' have so much important data on it. (I hate
> to say it, but I haven't been able to afford, $$ wise, backup for a few
> years... I know... I can't afford not to either).

Tape drive cost?

One idea we've come up (and surely we're not the only ones) is to use
cheap IDE disks for backup, possibly in a cold-swappable insert. As long
as you can keep several backups per disk (say using some of those 100GB
disks), preferrably even on a different machine, that's fairly cheap.

If you want to keep daily backups for a week, weekly for a year, and all
on separate media, of course, that's *not* cheap with this method, and
even DLT or similar prices become acceptable in comparision. But it
certainly beats *no* backup!

MfG Kai

2002-09-30 07:25:41

by Tomas Szepe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

> > SCSI drivers can be a real problem. Not the porting of them, most of
> > that is _trivial_ and can be done as we enter 3.0-pre and people show up
> > running that on hardware that actually needs to be ported. The worst bit
> > is error handling, this I view as the only problem.
>
> And a long-standing one. This should have been fixed in 2.2, it has not
> been fixed in 2.4, it's much desired for 2.6 -- and people are going to
> point away from Linux (and expect J?rg Schilling speaking up again
> should 2.6 be released with what he considers broken API -- I cannot
> tell if all his items are right, but if a third of what he says is true,
> Linux SCSI is not in good shape).

As long as most of that bloke's argumentation strips down to "you don't do
it like everyone else [solaris/irix/whatever] implies you're bound to suck,"
nobody with a bit of sense is going to take him seriously regardless of how
much blah blah he posts on l-k.

T.

2002-09-30 07:51:43

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29 2002, Alan Cox wrote:
> On Sun, 2002-09-29 at 18:42, Linus Torvalds wrote:
> > I can say that the IDE code is the same code that is in 2.4.x, so if
> > you're comfortable with 2.4.x wrt IDE, then you should be comfy with
> > 2.5.x too.
>
> *NO*
>
> The IDE code is the experimental code in 2.4-ac. It is _NOT_ the IDE
> code in 2.4 and its a lot less tested. I don't think it has any
> corruption bugs but it is most definitely not the base 2.4 code and has
> plenty of non corruption bugs (PCMCIA hang, taskfile write hang, irq
> blocking performance problems)

2.5 at least does not have the taskfile hang, because I killed taskfile
io.

--
Jens Axboe

2002-09-30 09:11:17

by Denis Vlasenko

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

On 28 September 2002 07:16, jw schultz wrote:
> Ingo, I agree with Linus. My recollection of when we moved
> to 2.0 was that the major number reflected the user<->kernel
> ABI. I have no problem with a version 2.42 if things stay
> stable that long. I hope they don't but that is another
> issue.
>
> Version 3.0 implies incompatibility with binaries from 2.x
> The distributions can play around with version numbers
> reflecting the GUI interface, libraries or installers but
> the kernel major version should stay the same until binary
> compatibility is broken. When we move old syscalls (such as
> 32 bit file ops) from deprecated to unsupported is when we
> increment the major number.
>
> It may be that 2.7 will see the cruft cut out and be the end
> of 2.x but 2.5 isn't that. So far 2.5 is performance
> enhancement. Terrific performance enhancement, thanks to you
> and many others. But it isn't adding major new features nor
> is it removing old interfaces. In many ways 2.6 looks like
> a sign that the 2.x kernel is getting mature. 2.6 means
> users can expect improvements but don't have to make big changes.
> 2.6 is an upgrade, 3.0 would be a replacement.

Technically correct. Major version jump should be made when there is
a binary incompatibility. It can be made without, but it is usually
done for marketing reasons. I hope we'll never have marketing reasons
for lk. :-) We can be actually _proud_ to have 2.$BIGNUM instead of
3.0
--
vda

2002-09-30 09:50:02

by Andre Hedrick

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Mon, 30 Sep 2002, Jens Axboe wrote:

> On Sun, Sep 29 2002, Alan Cox wrote:
> > On Sun, 2002-09-29 at 18:42, Linus Torvalds wrote:
> > > I can say that the IDE code is the same code that is in 2.4.x, so if
> > > you're comfortable with 2.4.x wrt IDE, then you should be comfy with
> > > 2.5.x too.
> >
> > *NO*
> >
> > The IDE code is the experimental code in 2.4-ac. It is _NOT_ the IDE
> > code in 2.4 and its a lot less tested. I don't think it has any
> > corruption bugs but it is most definitely not the base 2.4 code and has
> > plenty of non corruption bugs (PCMCIA hang, taskfile write hang, irq
> > blocking performance problems)
>
> 2.5 at least does not have the taskfile hang, because I killed taskfile
> io.

Great :-/ Now that you have restored the "rq->wrq" aka working copy of
the request which in its past life under PIO only updated to block when
the entire request was completed. So there are no partial completions
possible given the old method in the legacy path.

One of the issues Linus kick my can over was the "requirement" of partial
completeions. What I need rom block is a way to know how much is
completed of the original total request. So whatever value is the
original rq->nr_sectors assigned to "TF.2/HF.2" or nsector_offset(s),
needs to be carried in block and updated to reflect how much more is
remaining of this CDB task.

I do not care if you call it "rq->dumbass_accounting_for_andre", but
provide this dummy accounting variable in "struct request" and I will be
happy. This has nothing to do with bio or bh segments from the kernel.
It is everything about device side accounting carried by block; whereas,
the ll_driver can use it to determine what or if there is to be another
interrupt.

Why are we getting lost interrupts?

Because there is a beautiful "data-block completion" v/s "immediate
interrupt assertion" race between the device and the kernel. So please
provide a counter which can be used to determine where the interrupt
driven partial completion model the driver is wrt the device/request.

Jens, not asking for much.

Otherwise the ADMA/VDMA is not doable period.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

2002-09-30 10:17:35

by Tomas Szepe

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

> Technically correct. Major version jump should be made when there is
> a binary incompatibility. It can be made without, but it is usually
> done for marketing reasons. I hope we'll never have marketing reasons
> for lk. :-) We can be actually _proud_ to have 2.$BIGNUM instead of
> 3.0

... and go Solaris, as in 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 7, 8, 9. :D

T.

2002-09-30 11:05:34

by jw schultz

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

On Mon, Sep 30, 2002 at 12:22:28PM +0200, Tomas Szepe wrote:
> > Technically correct. Major version jump should be made when there is
> > a binary incompatibility. It can be made without, but it is usually
> > done for marketing reasons. I hope we'll never have marketing reasons
> > for lk. :-) We can be actually _proud_ to have 2.$BIGNUM instead of
> > 3.0
>
> ... and go Solaris, as in 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 7, 8, 9. :D

I've no problem per-se with 2.6, 2.7, 2.8, 2.9, 2.10...
And i see no reason to compare it with Solaris where numbers
are mostly marketing although their major number refereed to
the codebase (bsd vs SVr4).

I can see a number of real reasons to advance to 3.x:
Finishing the block layer and VM rewrite, maybe;
making FS blocksize independent of pagesize, probably;
flexibility with regard to pagesize for archs that support
variable pagesize (if market share and performance gains add
up the CPU designers will give it to us), probably;
initramfs, perhaps;
new module interface, possibly;
hotplug everything (and i mean everything), maybe;
elimination of the 32 bit versions of system-calls
and other deprecated interfaces, absolutely;
new filesystems and device drivers, nope;
incremental performance improvements, you've got to be kidding;

It is just that right now, from what little i can see, 2.5
is part way through the process of a block layer redesign
and the VM is in a similar state. Evidently LVM is in limbo
but that has to be at least operational before code freeze.
Driverfs looks promising but the API isn't even set and
documented yet. BTW what happened to moving away from
major/minor numbers?

The developers are doing a great job and things are moving
along but 2.6 looks more than anything else like a stabilized
snapshot so the improvements become available (trustworthy)
for production. That is consistent with "release early,
release often".



--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: [email protected]

Remember Cernan and Schmitt

2002-09-30 11:12:13

by Adrian Bunk

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

On Mon, 30 Sep 2002, Tomas Szepe wrote:

> > Technically correct. Major version jump should be made when there is
> > a binary incompatibility. It can be made without, but it is usually
> > done for marketing reasons. I hope we'll never have marketing reasons
> > for lk. :-) We can be actually _proud_ to have 2.$BIGNUM instead of
> > 3.0
>
> ... and go Solaris, as in 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 7, 8, 9. :D

NetBSD still has sane version numbers: :-)

0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6

> T.


cu
Adrian

--

You only think this is a free country. Like the US the UK spends a lot of
time explaining its a free country because its a police state.
Alan Cox

2002-09-30 11:49:51

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Mon, Sep 30 2002, Andre Hedrick wrote:
> On Mon, 30 Sep 2002, Jens Axboe wrote:
>
> > On Sun, Sep 29 2002, Alan Cox wrote:
> > > On Sun, 2002-09-29 at 18:42, Linus Torvalds wrote:
> > > > I can say that the IDE code is the same code that is in 2.4.x, so if
> > > > you're comfortable with 2.4.x wrt IDE, then you should be comfy with
> > > > 2.5.x too.
> > >
> > > *NO*
> > >
> > > The IDE code is the experimental code in 2.4-ac. It is _NOT_ the IDE
> > > code in 2.4 and its a lot less tested. I don't think it has any
> > > corruption bugs but it is most definitely not the base 2.4 code and has
> > > plenty of non corruption bugs (PCMCIA hang, taskfile write hang, irq
> > > blocking performance problems)
> >
> > 2.5 at least does not have the taskfile hang, because I killed taskfile
> > io.
>
> Great :-/ Now that you have restored the "rq->wrq" aka working copy of

Make taskfile io work 2.4-ac, and it will work in 2.5 as well. The only
sensible thing to do right now was to disable it in 2.5, imo, and so I
did.

> the request which in its past life under PIO only updated to block when
> the entire request was completed. So there are no partial completions
> possible given the old method in the legacy path.

I haven't restored anything. 2.4-ac (your base) uses ->wrq copy, so does
2.5.

> One of the issues Linus kick my can over was the "requirement" of partial
> completeions. What I need rom block is a way to know how much is
> completed of the original total request. So whatever value is the
> original rq->nr_sectors assigned to "TF.2/HF.2" or nsector_offset(s),
> needs to be carried in block and updated to reflect how much more is
> remaining of this CDB task.

Now that the block layer really can do partial completions properly, I
patched ide-disk to do just that. It's not very well tested, just did it
last week as proof-of-concept.

This breaks the typical offset rules, ie

current_segment_offset = rq->hard_cur_sectors - rq->current_nr_sectors;
total_offset = rq->hard_nr_sectors - rq->nr_sectors;

Haven't though too much about that yet.

> I do not care if you call it "rq->dumbass_accounting_for_andre", but
> provide this dummy accounting variable in "struct request" and I will be
> happy. This has nothing to do with bio or bh segments from the kernel.
> It is everything about device side accounting carried by block; whereas,
> the ll_driver can use it to determine what or if there is to be another
> interrupt.

What you ask for is already there, but requires that you massage
current_nr_sectors and nr_sectors like ide has always done.

> Why are we getting lost interrupts?
>
> Because there is a beautiful "data-block completion" v/s "immediate
> interrupt assertion" race between the device and the kernel. So please
> provide a counter which can be used to determine where the interrupt
> driven partial completion model the driver is wrt the device/request.
>
> Jens, not asking for much.

Indeed, you are asking for stuff we've had for years.

===== drivers/ide/ide-disk.c 1.16 vs edited =====
--- 1.16/drivers/ide/ide-disk.c Sat Sep 21 02:32:22 2002
+++ edited/drivers/ide/ide-disk.c Mon Sep 23 17:18:48 2002
@@ -139,8 +139,8 @@
*/
static ide_startstop_t read_intr (ide_drive_t *drive)
{
- ide_hwif_t *hwif = HWIF(drive);
- int i = 0, nsect = 0, msect = drive->mult_count;
+ ide_hwif_t *hwif = HWIF(drive);
+ int nsect = 0, msect = drive->mult_count;
struct request *rq;
unsigned long flags;
u8 stat;
@@ -174,25 +174,24 @@
(unsigned long) rq->buffer+(nsect<<9), rq->nr_sectors-nsect);
#endif
ide_unmap_buffer(rq, to, &flags);
- rq->sector += nsect;
- rq->errors = 0;
- i = (rq->nr_sectors -= nsect);
- if (((long)(rq->current_nr_sectors -= nsect)) <= 0)
- ide_end_request(drive, 1, rq->hard_cur_sectors);
+
+ /*
+ * all done
+ */
+ if (!ide_end_request(drive, 1, nsect))
+ return ide_stopped;
+
/*
* Another BH Page walker and DATA INTERGRITY Questioned on ERROR.
* If passed back up on multimode read, BAD DATA could be ACKED
* to FILE SYSTEMS above ...
*/
- if (i > 0) {
- if (msect)
- goto read_next;
- if (HWGROUP(drive)->handler != NULL)
- BUG();
- ide_set_handler(drive, &read_intr, WAIT_CMD, NULL);
- return ide_started;
- }
- return ide_stopped;
+ if (msect)
+ goto read_next;
+ if (HWGROUP(drive)->handler != NULL)
+ BUG();
+ ide_set_handler(drive, &read_intr, WAIT_CMD, NULL);
+ return ide_started;
}

/*
@@ -203,7 +202,6 @@
ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
struct request *rq = hwgroup->rq;
- int i = 0;
u8 stat;

if (!OK_STAT(stat = hwif->INB(IDE_STATUS_REG),
@@ -217,23 +215,19 @@
rq->nr_sectors-1);
#endif
if ((rq->nr_sectors == 1) ^ ((stat & DRQ_STAT) != 0)) {
- rq->sector++;
- rq->errors = 0;
- i = --rq->nr_sectors;
- --rq->current_nr_sectors;
- if (((long)rq->current_nr_sectors) <= 0)
- ide_end_request(drive, 1, rq->hard_cur_sectors);
- if (i > 0) {
- unsigned long flags;
- char *to = ide_map_buffer(rq, &flags);
- taskfile_output_data(drive, to, SECTOR_WORDS);
- ide_unmap_buffer(rq, to, &flags);
- if (HWGROUP(drive)->handler != NULL)
- BUG();
- ide_set_handler(drive, &write_intr, WAIT_CMD, NULL);
- return ide_started;
- }
- return ide_stopped;
+ unsigned long flags;
+ char *to;
+
+ if (!ide_end_request(drive, 1, 1))
+ return ide_stopped;
+
+ to = ide_map_buffer(rq, &flags);
+ taskfile_output_data(drive, to, SECTOR_WORDS);
+ ide_unmap_buffer(rq, to, &flags);
+ if (HWGROUP(drive)->handler != NULL)
+ BUG();
+ ide_set_handler(drive, &write_intr, WAIT_CMD, NULL);
+ return ide_started;
}
/* the original code did this here (?) */
return ide_stopped;
===== drivers/ide/ide-taskfile.c 1.4 vs edited =====
--- 1.4/drivers/ide/ide-taskfile.c Fri Sep 20 00:13:51 2002
+++ edited/drivers/ide/ide-taskfile.c Mon Sep 23 17:04:47 2002
@@ -611,9 +611,8 @@
* BH walking or segment can only be updated after we have a good
* hwif->INB(IDE_STATUS_REG); return.
*/
- if (--rq->current_nr_sectors <= 0)
- if (!DRIVER(drive)->end_request(drive, 1, 0))
- return ide_stopped;
+ if (!DRIVER(drive)->end_request(drive, 1, 1))
+ return ide_stopped;
/*
* ERM, it is techincally legal to leave/exit here but it makes
* a mess of the code ...
@@ -669,7 +668,6 @@
taskfile_input_data(drive, pBuf, nsect * SECTOR_WORDS);
task_unmap_rq(rq, pBuf, &flags);
rq->errors = 0;
- rq->current_nr_sectors -= nsect;
msect -= nsect;
/*
* FIXME :: We really can not legally get a new page/bh
@@ -677,10 +675,8 @@
* BH walking or segment can only be updated after we have a
* good hwif->INB(IDE_STATUS_REG); return.
*/
- if (!rq->current_nr_sectors) {
- if (!DRIVER(drive)->end_request(drive, 1, 0))
- return ide_stopped;
- }
+ if (!DRIVER(drive)->end_request(drive, 1, 1))
+ return ide_stopped;
} while (msect);
if (HWGROUP(drive)->handler == NULL)
ide_set_handler(drive, &task_mulin_intr, WAIT_WORSTCASE, NULL);
@@ -740,9 +736,9 @@
* Safe to update request for partial completions.
* We have a good STATUS CHECK!!!
*/
- if (!rq->current_nr_sectors)
- if (!DRIVER(drive)->end_request(drive, 1, 0))
- return ide_stopped;
+ if (!DRIVER(drive)->end_request(drive, 1, 1))
+ return ide_stopped;
+
if ((rq->current_nr_sectors==1) ^ (stat & DRQ_STAT)) {
rq = HWGROUP(drive)->rq;
pBuf = task_map_rq(rq, &flags);
@@ -802,13 +798,10 @@
msect -= nsect;
taskfile_output_data(drive, pBuf, nsect * SECTOR_WORDS);
task_unmap_rq(rq, pBuf, &flags);
- rq->current_nr_sectors -= nsect;
- if (!rq->current_nr_sectors) {
- if (!DRIVER(drive)->end_request(drive, 1, 0))
- if (!rq->bio) {
- stat = hwif->INB(IDE_STATUS_REG);
- return ide_stopped;
- }
+ if (!DRIVER(drive)->end_request(drive, 1, 1)) {
+ /* stat for...? */
+ stat = hwif->INB(IDE_STATUS_REG);
+ return ide_stopped;
}
} while (msect);
rq->errors = 0;
@@ -922,18 +915,14 @@
msect -= nsect;
taskfile_output_data(drive, pBuf, nsect * SECTOR_WORDS);
task_unmap_rq(rq, pBuf, &flags);
- rq->current_nr_sectors -= nsect;
/*
* FIXME :: We really can not legally get a new page/bh
* regardless, if this is the end of our segment.
* BH walking or segment can only be updated after we
* have a good hwif->INB(IDE_STATUS_REG); return.
*/
- if (!rq->current_nr_sectors) {
- if (!DRIVER(drive)->end_request(drive, 1, 0))
- if (!rq->bio)
- return ide_stopped;
- }
+ if (!DRIVER(drive)->end_request(drive, 1, 1))
+ return ide_stopped;
} while (msect);
rq->errors = 0;
if (HWGROUP(drive)->handler == NULL)

--
Jens Axboe

2002-09-30 12:47:25

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Mon, 2002-09-30 at 08:56, Jens Axboe wrote:
> 2.5 at least does not have the taskfile hang, because I killed taskfile
> io.

Thats not exactly a fix 8). 2.5 certainly has the others. Taskfile I/O
is pretty low on my fix list. The fix isnt trivial because we set the
IRQ handler late - so the IRQ can beat us setting the handler, but
equally if we set it early we get to worry about all the old races in
2.3.x

2002-09-30 13:00:09

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Mon, Sep 30 2002, Alan Cox wrote:
> On Mon, 2002-09-30 at 08:56, Jens Axboe wrote:
> > 2.5 at least does not have the taskfile hang, because I killed taskfile
> > io.
>
> Thats not exactly a fix 8). 2.5 certainly has the others. Taskfile I/O

I didn't claim it was, I just don't want a user setting taskfile io to
'y' because he thinks its cool when we know its broken.

> is pretty low on my fix list. The fix isnt trivial because we set the
> IRQ handler late - so the IRQ can beat us setting the handler, but
> equally if we set it early we get to worry about all the old races in
> 2.3.x

Where exactly is the race?

--
Jens Axboe

2002-09-30 15:27:54

by Jan Harkes

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, Sep 29, 2002 at 05:38:17PM +0200, Jens Axboe wrote:
> On Sun, Sep 29 2002, Alan Cox wrote:
> > Most of my boxes won't even run a 2.5 tree yet. I'm sure its hardly
> > unique. Middle of November we may begin to find out how solid the core
> > code actually is, as drivers get fixed up and also in the other
> > direction as we eliminate numerous crashes caused by "fixed in 2.4" bugs
>
> Well why don't they run with 2.5?
>
> Alan, I think you are a pessimist painting a much bleaker picture of 2.5
> than it deserves. Sure lots of drivers may be broken still, I would be
> naive if I thought that this is all changed in time for oct 31. Most of
> these will not be fixed until people actually _use_ 2.5 (or 3.0-pre, or
> whatever it will be called), and that will not happen until Linus
> actually releases a -rc or similar. And so the fsck what? Noone expects
> 2.6-pre/3.0-pre to be perfect.

Ok, after losing a disk in the early 2.5 series, and not being able to
compile pretty much any kernel since 2.5.33, I decided to give 2.5.39 a
try last weekend.

Built kernel, rebooted, almost seems to get stuch during the ide-probing
(10 seconds wait is a conservative estimate), but it came up in single
user. Checking for errors in /proc/kmsg, nothing. Great reboot
multiuser start X open a window lose all access to my keyboard. Completely
log in remotely with ssh, hmm kernel errors about unknown scancodes.

Reboot, just don't use X for the moment, maybe I can catch an oops,
lockup during boot while loading the uhci usb driver. Alt-sysrq works,
another fsck later (these seem to take a lot longer, but that could be
subjective). Disable hotplug/usb during startup, reboot, within 2
minutes orinoco_cs driver locks up and starts throwing debugging goo
about transmit timeouts and resetting card. Nice, except for the fact
that interrupts seem to be disabled and this time magic-sysrq doesn't
work.

Pull the battery out to be able to reboot the laptop, and went back to
2.4.20-latest for now. 2.5.33 did work mostly (after fixing up a bunch
of compile fixes and the oss cs4281 driver), but seems to last only
about 1 hour on battery life vs. the solid 3 1/2 hours with a 2.4 kernel.
All of this is on a Thinkpad X20, which doesn't have a serial console.

Using APM, not ACPI. But this is not a bugreport, because I haven't even
got a chance to isolate any single problem in a way that I can create a
useful report.

> I'm not worried.

I am a bit worried, at least as far as Coda is concerned, there is a lot
of unmerged stuff, and as long as I can't do any testing of the changes
it is a bit useless to send them off to Linus. I hope things stabilize
before the feature freeze.

Jan

2002-09-30 16:28:24

by jbradford

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

> > How many people are sitting on the sidelines waiting for guarantee that ide is
> > not going to blow up on our filesystems and take our data with it. Guarantee
> > that ide is working and not dangerous to our data, then I bet a lot more
> > people will come back and bang on 2.5.
>
> How the hell can I _guarantee_ anything like that?

You don't need to - just post "2.5.x ide is working, and not dangerous to your data", and loads of people will start using it. That way, we get it tested a decent amount.

Of course when somebody's root fs get fsck'ed, (pun intended), the list is bound to get a flamewar^Whelpfully worded bug report.

The false rumors that IDE was fubar for a long time in 2.5.x, coupled with the fact that a lot of recent 2.5.x kernels don't compile, seem to have scared off people which is rediculous.

John.

2002-09-30 16:42:21

by Pau Aliagas

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, james wrote:

> I know this whole ide mess have taken me away from the devolemental series.
> And I bet a lot of others.

That is precisely what has kept me out of 2.5. I do not want to risk my
data due to the IDE problems; otherwise I'd be happy testing 2.5 all
around in all kind of machines I had available.

Pau

2002-09-30 18:43:36

by Bill Davidsen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On 30 Sep 2002, Kai Henningsen wrote:

> One idea we've come up (and surely we're not the only ones) is to use
> cheap IDE disks for backup, possibly in a cold-swappable insert. As long
> as you can keep several backups per disk (say using some of those 100GB
> disks), preferrably even on a different machine, that's fairly cheap.
>
> If you want to keep daily backups for a week, weekly for a year, and all
> on separate media, of course, that's *not* cheap with this method, and
> even DLT or similar prices become acceptable in comparision. But it
> certainly beats *no* backup!

I do that, but it doesn't make for a storage medium I can easily use on
another system. The cost of DVD writers is coming down, and non-magnetic
media may have some advantages as well. Still, thay're small compared to
disk sizes.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-30 18:39:18

by Bill Davidsen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sat, 28 Sep 2002, Linus Torvalds wrote:

> However, I'll believe that when I see it. Usually people don't complain
> during a development kernel, because they think they shouldn't, and then
> when it becomes stable (ie when the version number changes) they are
> surprised that the behabviour didn't magically improve, and _then_ we get
> tons of complaints about how bad the VM is under their load.

Part of this is because people who complain often get answers which sound
a lot like "what do you expect, it's a test kernel," or "you have the
source, go fix it," or even "if you don't like go run Windows." This list
is FAR more cordial than newsgroups, but I have seen people who suggested
an improvement get invited to submit a patch.

The other reason is the "it must be me" effect, if something doesn't work
for the user there is a general reaction that something must be configured
wrong.

Anyway that's my impression of why the complaints come as you say, I think
it's going to happen regardless of the version number.

For what it's worth the changes feel more like 2.2 to 2.4 than 1.2.13 to
2.0, but as long as you don't call it Windows I don't really care;-)

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-30 19:05:35

by Bill Davidsen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On 29 Sep 2002, Alan Cox wrote:

> On Sun, 2002-09-29 at 16:26, Matthias Andree wrote:
> > I personally have the feeling that 2.2.x performed better than 2.4.x
> > does, but I cannot go figure because I'm using ReiserFS 3.6 file
>
> On low end boxes the benchmarks I did show later 2.4-rmap beats 2.2. 2.0
> worked suprisingly well (better than pre-rmap 2.4) and as Stephen
> claimed the best code was about 2.1.100, 2.2 then dropped badly from
> that point.

I might have said 2.1.106 (I'm still running that on one box), but that's
the general sweet spot.

> Low memory is of course where rmap does best, so the 2.4-rmap v 2.4
> parts of such testing are not actually that useful

In the 2.4-ac vs. 2.4-aa tests I did in the spring, rmap was better on
small memory, -aa was better with large memory and heavy write load. I
expect ioscheduling to address this, and when I get a totally expendable
large machine I'll try 2.5 again.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-30 19:34:50

by Bill Davidsen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Sun, 29 Sep 2002, Jens Axboe wrote:

> On Sun, Sep 29 2002, [email protected] wrote:
> > > Anyway, people who are having VM trouble with the current 2.5.x series,
> > > please _complain_, and tell what your workload is. Don't sit silent and
> > > make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> > > thing.
> >
> > I think the broken IDE in 2.5.x has meant that it got seriously less
> > testing overall than previous development trees :-(. Maybe after
> > halloween when it stabilises a bit more we'll get more reports in.
>
> 2.5 is definitely desktop stable, so please test it if you can. Until
> recently there was a personal show stopper for me, the tasklist
> deadline. Now 2.5 is happily running on my desktop as well.

2.5.38-mm2 has been stable for me on uni, what is the status of SMP? I had
what looked like logical to physical mapping problems on a BP6 and Abit
dual P5C-166, resulting in syslog data on every drive including those with
no Linux partition. That was somewhere around 2.5.22 to 2.5.26.

> 2.5 IDE stability should be just as good as 2.4-ac.

A laudable goal.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-09-30 19:43:05

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

On Mon, 30 Sep 2002, Tomas Szepe wrote:

> > for lk. :-) We can be actually _proud_ to have 2.$BIGNUM instead of
> > 3.0
>
> ... and go Solaris, as in 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 7, 8, 9. :D

I wonder what SunOS 6.0 is going to be called ;)

Rik
--
Bravely reimplemented by the knights who say "NIH".

http://www.surriel.com/ http://distro.conectiva.com/

Spamtraps of the month: [email protected] [email protected]

2002-09-30 20:25:49

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice driver

On Mon, Sep 30, 2002 at 04:48:00PM -0300, Rik van Riel wrote:
> On Mon, 30 Sep 2002, Tomas Szepe wrote:
>
> > > for lk. :-) We can be actually _proud_ to have 2.$BIGNUM instead of
> > > 3.0
> >
> > ... and go Solaris, as in 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 7, 8, 9. :D
>
> I wonder what SunOS 6.0 is going to be called ;)

Solaris .COM

2002-10-01 02:14:12

by Andre Hedrick

[permalink] [raw]
Subject: Re: v2.6 vs v3.0


First an apology to Russell for bring him into this thread.

On Mon, 30 Sep 2002, Jens Axboe wrote:

> On Mon, Sep 30 2002, Alan Cox wrote:
> > On Mon, 2002-09-30 at 08:56, Jens Axboe wrote:
> > > 2.5 at least does not have the taskfile hang, because I killed taskfile
> > > io.
> >
> > Thats not exactly a fix 8). 2.5 certainly has the others. Taskfile I/O
>
> I didn't claim it was, I just don't want a user setting taskfile io to
> 'y' because he thinks its cool when we know its broken.
>
> > is pretty low on my fix list. The fix isnt trivial because we set the
> > IRQ handler late - so the IRQ can beat us setting the handler, but
> > equally if we set it early we get to worry about all the old races in
> > 2.3.x
>
> Where exactly is the race?

As soon as you complete read or writing the final byte in a pio state
diagram, the device can interrupt instantly! I do mean instantly.


ide_startstop_t task_out_intr (ide_drive_t *drive)
{
ide_hwif_t *hwif = HWIF(drive);
struct request *rq = HWGROUP(drive)->rq;
char *pBuf = NULL;
unsigned long flags;
u8 stat;

if (!OK_STAT(stat = hwif->INB(IDE_STATUS_REG),
DRIVE_READY, drive->bad_wstat)) {
DTF("%s: WRITE attempting to recover last " \
"sector counter status=0x%02x\n",
drive->name, stat);
rq->current_nr_sectors++;
return DRIVER(drive)->error(drive, "task_out_intr", stat);
}
/*
* Safe to update request for partial completions.
* We have a good STATUS CHECK!!!
*/
if (!rq->current_nr_sectors)
if (!DRIVER(drive)->end_request(drive, 1))
return ide_stopped;
if ((rq->current_nr_sectors==1) ^ (stat & DRQ_STAT)) {
rq = HWGROUP(drive)->rq;
pBuf = task_map_rq(rq, &flags);
DTF("write: %p, rq->current_nr_sectors: %d\n",
pBuf, (int) rq->current_nr_sectors);
taskfile_output_data(drive, pBuf, SECTOR_WORDS);
KABOOM! The RACE is on! (The handler start point)
task_unmap_rq(rq, pBuf, &flags);
rq->errors = 0;
rq->current_nr_sectors--;
}
if (HWGROUP(drive)->handler == NULL)
ide_set_handler(drive, &task_out_intr, WAIT_WORSTCASE, NULL);
Driver WINS!
return ide_started;
}

If the device issues an interrupt to the host controller before we can arm
the handler we are dead.

void taskfile_output_data (ide_drive_t *drive, void *buffer, u32 wcount)
{
if (drive->bswap) {
ata_bswap_data(buffer, wcount);
HWIF(drive)->ata_output_data(drive, buffer, wcount);
KABOOM! The RACE is on! (The Second fake start point)
ata_bswap_data(buffer, wcount);
} else {
HWIF(drive)->ata_output_data(drive, buffer, wcount);
KABOOM! The RACE is on! (The Second fake start point)
}
}

void ata_output_data (ide_drive_t *drive, void *buffer, u32 wcount)
{
ide_hwif_t *hwif = HWIF(drive);
u8 io_32bit = drive->io_32bit;

if (io_32bit) {
if (io_32bit & 2) {
unsigned long flags;
local_irq_save(flags);
ata_vlb_sync(drive, IDE_NSECTOR_REG);
hwif->OUTSL(IDE_DATA_REG, buffer, wcount);
local_irq_restore(flags);
} else
hwif->OUTSL(IDE_DATA_REG, buffer, wcount);
} else {
hwif->OUTSW(IDE_DATA_REG, buffer, wcount<<1);
}
KABOOM! The RACE is on! (The Real start point)
}


If we are having to lollygag in the kernel for a byteswap or a bounce
buffer (aka memcpy/free) we can/will loose the interrupt. The old code
would push the handler early resulting in timeouts and double handlers
added.

Now the question is how to addresss the race.

At this point we have two paths each with bugs.
The old legacy path can allow for the wrong handler to be executed for a
given interrupt. The old path can with the above bug can potentially crap
data. Specifically wrong handle execution.

The new path can miss setting the handler in time.

It can be fixed and maybe the account process stuff is already present,
and we are at another communication delay but it shall be worked through
calmly, not like the past where nothing gets done and people just become
offended.

Cheers,


Andre Hedrick
LAD Storage Consulting Group



2002-10-01 06:24:22

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Mon, Sep 30 2002, Bill Davidsen wrote:
> On Sun, 29 Sep 2002, Jens Axboe wrote:
>
> > On Sun, Sep 29 2002, [email protected] wrote:
> > > > Anyway, people who are having VM trouble with the current 2.5.x series,
> > > > please _complain_, and tell what your workload is. Don't sit silent and
> > > > make us think we're good to go.. And if Ingo is right, I'll do the 3.0.x
> > > > thing.
> > >
> > > I think the broken IDE in 2.5.x has meant that it got seriously less
> > > testing overall than previous development trees :-(. Maybe after
> > > halloween when it stabilises a bit more we'll get more reports in.
> >
> > 2.5 is definitely desktop stable, so please test it if you can. Until
> > recently there was a personal show stopper for me, the tasklist
> > deadline. Now 2.5 is happily running on my desktop as well.
>
> 2.5.38-mm2 has been stable for me on uni, what is the status of SMP? I had
> what looked like logical to physical mapping problems on a BP6 and Abit
> dual P5C-166, resulting in syslog data on every drive including those with
> no Linux partition. That was somewhere around 2.5.22 to 2.5.26.

Well I do all my 2.5 testing on SMP, I don't even remember when I last
compiled a UP 2.5 kernel. Well works for me as I wrote earlier, I don't
keep the deskop up more than a few days at the time though. Then I boot
a newer 2.5 on it.

> > 2.5 IDE stability should be just as good as 2.4-ac.
>
> A laudable goal.

If you know of any points where this is currently not true, I'd like to
hear about it. I'm considering this goal reached. Whether 2.4-ac is at
the level we want is a different story.

--
Jens Axboe

2002-10-01 07:49:27

by Mikael Pettersson

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

Jens Axboe writes:
> On Mon, Sep 30 2002, Bill Davidsen wrote:
> > On Sun, 29 Sep 2002, Jens Axboe wrote:
> > > 2.5 IDE stability should be just as good as 2.4-ac.
> >
> > A laudable goal.
>
> If you know of any points where this is currently not true, I'd like to
> hear about it. I'm considering this goal reached. Whether 2.4-ac is at
> the level we want is a different story.

2.5.39 IDE is nowhere near as stable as 2.4.20-pre8:

- I have several boxes with decent PCI chipsets (BX, HX) but old disks.
With 2.5.39, they tend to spew a couple of ..._intr errors on boot.
(Sorry, can't be more specific right now. I won't be near those
boxes until Saturday.)

- Same ..._intr errors on my 486 with a qd6580 VLB controller.
It also has, in post-2.5.36 kernels, an instant-reboot problem which
occurs whenever I pass the ide0=qd65xx kernel option required to
activate its chipset support. (I _believe_ this is because the code
does something, like a kmalloc, which is illegal at the early
point IDE's __setup runs.) With 2.5.3x kernels, this box also sees
a steady stream of spurious interrupts while doing a kernel recompile,
something it doesn't see in older kernels.

- My Intel AL440LX box (440LX chipset, 20G Quantum Fireball) worked
brilliantly up to 2.5.36, but hangs *hard* with 2.5.39 as soon
as I tar zxf the kernel source tarball.
(May or may not be IDE. I'll try a minimal 2.5.39 tonight.)

All of these work perfectly with 2.4.20-pre8, indeed all previous 2.4
standard kernels, 2.2 + Andre's ide-patch, and with the exception of
the ..._intr errors, 2.5.36.

OTOH, I have three boxes which do appear to work fine with 2.5.39.

/Mikael

2002-10-01 08:22:02

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Tue, Oct 01 2002, Mikael Pettersson wrote:
> Jens Axboe writes:
> > On Mon, Sep 30 2002, Bill Davidsen wrote:
> > > On Sun, 29 Sep 2002, Jens Axboe wrote:
> > > > 2.5 IDE stability should be just as good as 2.4-ac.
> > >
> > > A laudable goal.
> >
> > If you know of any points where this is currently not true, I'd like to
> > hear about it. I'm considering this goal reached. Whether 2.4-ac is at
> > the level we want is a different story.
>
> 2.5.39 IDE is nowhere near as stable as 2.4.20-pre8:

Common misconception. I wrote 2.4-ac, not 2.4 vanilla tre. 2.4-ac is in
flux, 2.5 is too. There are some quirks, most of the 'doesnt work'
nature and not the 'corrupting data' kind.

> - I have several boxes with decent PCI chipsets (BX, HX) but old disks.
> With 2.5.39, they tend to spew a couple of ..._intr errors on boot.
> (Sorry, can't be more specific right now. I won't be near those
> boxes until Saturday.)

But they come up?

> - Same ..._intr errors on my 486 with a qd6580 VLB controller.
> It also has, in post-2.5.36 kernels, an instant-reboot problem which
> occurs whenever I pass the ide0=qd65xx kernel option required to
> activate its chipset support. (I _believe_ this is because the code
> does something, like a kmalloc, which is illegal at the early
> point IDE's __setup runs.) With 2.5.3x kernels, this box also sees
> a steady stream of spurious interrupts while doing a kernel recompile,
> something it doesn't see in older kernels.

Ok this is a new one, at least to me

> - My Intel AL440LX box (440LX chipset, 20G Quantum Fireball) worked
> brilliantly up to 2.5.36, but hangs *hard* with 2.5.39 as soon
> as I tar zxf the kernel source tarball.
> (May or may not be IDE. I'll try a minimal 2.5.39 tonight.)

Probably not ide, no important changes in there in between 2.6.36 and
present.

> All of these work perfectly with 2.4.20-pre8, indeed all previous 2.4
> standard kernels, 2.2 + Andre's ide-patch, and with the exception of
> the ..._intr errors, 2.5.36.

If you (or anyone else for that matter) come across ide oddities in 2.5,
please try 2.4.20-pre-ac kernels and see if you can reproduce.

--
Jens Axboe

2002-10-01 08:30:44

by jbradford

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

> > - My Intel AL440LX box (440LX chipset, 20G Quantum Fireball) worked
> > brilliantly up to 2.5.36, but hangs *hard* with 2.5.39 as soon
> > as I tar zxf the kernel source tarball.
> > (May or may not be IDE. I'll try a minimal 2.5.39 tonight.)
>
> Probably not ide, no important changes in there in between 2.6.36 and
> present.

Where can I get the 2.6.x tree, then? :-)

John.

2002-10-01 11:20:44

by Jens Axboe

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Tue, Oct 01 2002, Alan Cox wrote:
> > - Same ..._intr errors on my 486 with a qd6580 VLB controller.
> > It also has, in post-2.5.36 kernels, an instant-reboot problem which
> > occurs whenever I pass the ide0=qd65xx kernel option required to
>
> Seems to be specific to the 2.5.x version of the new ide so I guess its
> a port error (or just bad luck it now breaks and was iffy before)

ok, I'll try it in 2.5 then

> > - My Intel AL440LX box (440LX chipset, 20G Quantum Fireball) worked
> > brilliantly up to 2.5.36, but hangs *hard* with 2.5.39 as soon
> > as I tar zxf the kernel source tarball.
> > (May or may not be IDE. I'll try a minimal 2.5.39 tonight.)
>
> Thats PIIX, which should be the most boringly stable configuration of
> the lot 8(

There's no evidence that this is an ide error yet. I'd like to see some
serial console or similar on that beast. I have no LX board here, but
piix is rock solid.

--
Jens Axboe

2002-10-01 11:18:43

by Alan

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Tue, 2002-10-01 at 08:54, Mikael Pettersson wrote:
> - I have several boxes with decent PCI chipsets (BX, HX) but old disks.
> With 2.5.39, they tend to spew a couple of ..._intr errors on boot.
> (Sorry, can't be more specific right now. I won't be near those
> boxes until Saturday.)

Thats fine. Its issuing commands the drives reject. Right now we dont do
it quietly that is all.

> - Same ..._intr errors on my 486 with a qd6580 VLB controller.
> It also has, in post-2.5.36 kernels, an instant-reboot problem which
> occurs whenever I pass the ide0=qd65xx kernel option required to

Seems to be specific to the 2.5.x version of the new ide so I guess its
a port error (or just bad luck it now breaks and was iffy before)

> - My Intel AL440LX box (440LX chipset, 20G Quantum Fireball) worked
> brilliantly up to 2.5.36, but hangs *hard* with 2.5.39 as soon
> as I tar zxf the kernel source tarball.
> (May or may not be IDE. I'll try a minimal 2.5.39 tonight.)

Thats PIIX, which should be the most boringly stable configuration of
the lot 8(

2002-10-01 12:33:17

by Matthias Andree

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Mon, 30 Sep 2002, Bill Davidsen wrote:

> I do that, but it doesn't make for a storage medium I can easily use on
> another system. The cost of DVD writers is coming down, and non-magnetic
> media may have some advantages as well. Still, thay're small compared to
> disk sizes.

There are big drives available if you really want one (and can afford
one, which is the bigger problem usually).

Tandberg has some big SLR drives (50 GB native data, maybe even more,
didn't check for some months), many companies have DLT and SuperDLT that
store several dozen GB each, then there's Ultrium, and if you're after
cheap stuff, there's also ADR (but there are some that require the osst
driver, which is not helpful if you need to support other OSs beyond
Windows and Linux). This list is not complete, and it deliberately omits
helical scan technologies such as DDS.

2002-10-01 19:22:38

by Petr Baudis

[permalink] [raw]
Subject: IPv6 stability (success story ;)

Dear diary, on Sun, Sep 29, 2002 at 07:26:37PM CEST, I got a letter,
where Jochen Friedrich <[email protected]> told me, that...
> Hi Andi,
>
> > Actually current IPv6 is stable and has been for a long time, it's just not
> > completely standards compliant (but still quite usable for a lot of people)
>
> For end systems (no router) with static IPv6 definitions this seems to be
> true. However, for machines which use autoconfiguration (stateless as
> there isn't a usable IPv6 capable DHCP server AFAIK) or act as routers,
> the current state of the implementation of the default route can best be
> described as buggy. (Autoconfigured machines seem to loose their default
> route after some time, e.g.).

Well, I maintain Point of Presence for XS26 at Prague running on linux
(2.4.19), and it works with almost no problems routing about 20 kilobytes per
second through about 520 interfaces (tunnels) and with routing table consisting
of cca 2100 entries (there's zebra, ospf6d and bgpd running there ;). The only
one real problem we had was neighbour discovery bug up to 2.4.18 which was
fixed along the way to 2.4.19. There are no crashes, no routing instabilities,
we are absolutely happy with linux there ;-) (in fact, we have frequently much
more problems with the *BSDs running at some other PoPs).

Oh, of course, I must thank Alexey a lot for providing excellent support for us
:).

--

Petr "Pasky" Baudis

* ELinks maintainer * IPv6 guy (XS26 co-coordinator)
* IRCnet operator * FreeCiv AI occassional hacker
.
<Beeth> Girls are like internet domain names, the ones I like are already taken.
<honx> Well, you can still get one from a strange country :-P
.
Public PGP key && geekcode && homepage: http://pasky.ji.cz/~pasky/

2002-10-03 15:40:55

by jbradford

[permalink] [raw]
Subject: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

> > Tangent question, is it definitely to be named 2.6?
>
> I see no real reason to call it 3.0.
>
> The order-of-magnitude threading improvements might just come closest to
> being a "new thing", but yeah, I still consider it 2.6.x. We don't have
> new architectures or other really fundamental stuff. In many ways the jump
> from 2.2 -> 2.4 was bigger than the 2.4 -> 2.6 thing will be, I suspect.

I think we should stick to incrementing the major number when binary compatibility is broken.

> But hey, it's just a number. I don't feel that strongly either way. I
> think version number inflation (can anybody say "distribution makers"?) is
> a bit silly, and the way the kernel numbering works there is no reason to
> bump the major number for regular releases.

Psycologically and sub-conciously, this kind of thing _does_ make people stand up and take notice.

For example, SNK made the NeoGeo arcade games print things like:

NEO GEO
MAX 330 MEGA
PRO GEAR SPEC

on start up and in attract mode.

As far as I know, the 330 MEGA means absolutely nothing, and pro gear spec is just an arbitrary name for the addressing system used.

John.

2002-10-03 16:03:34

by jbradford

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem

> On Thu, 3 Oct 2002 [email protected] wrote:
> >
> > I think we should stick to incrementing the major number when binary
> > compatibility is broken.
>
> "Stick to"? We've never had that as any criteria for major numbers in the
> kernel. Binary compatibility has _never_ been broken as a release policy,
> only as a "that code is old, and we've given people 5 years to migrate to
> the new system calls, the old ones are TOAST".

Ah, I was getting confused, I thought that the move to 2.0 was when we moved from a.out to elf. I didn't really follow kernel development very closely at all back then, to be truthful.

> The only policy for major numbers has always been "major capability
> changes".

Then it definitely shouldn't be 3.0 yet then.

> 1.0 was "networking is stable and generally usable" (by the
> standards of that time), while 2.0 was "SMP and true multi-architecture
> support". My planned point for 3.0 was NuMA support, but while we actually
> have some of that, the hardware just isn't relevant enough to matter.

Hmmm, then for 3.0 I'd vote for fully working and proven stable:

* High memory support,
* IPV6
* IDE-SCSI
* Bluetooth
* USB (2)
* IEEE 1394

> The memory management issues would qualify for 3.0, but my argument there
> is really that I doubt everybody really is happy yet. Which was why I
> asked for people to test it and complain about VM behaviour - and we've
> had some ccomplaints ("too swap-happy") although they haven't sounded like
> really horrible problems.

To be completely honest, I dont't see any improvement in 2.5.x over 2.4.x on my boxes that are running both :-(.

John.

2002-10-03 16:51:00

by Alan

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

> We still need some work for low memory boxes (where low isn't
> necessarily all that low). On my 128MB laptop I can lock up the box
> for a minute or two at a time by doing two things at the same time,
> like a bk pull, and switching desktops.
>
> I dread to think how a 16 or 32MB box performs these days..

On 2.4.1x with rmap, better than 2.2. A 32Mb box with rmap vm on 2.4,
running the xfce/rox desktop and sylpheed is very snappy indeed

2002-10-03 16:03:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)


On Thu, 3 Oct 2002 [email protected] wrote:
>
> I think we should stick to incrementing the major number when binary
> compatibility is broken.

"Stick to"? We've never had that as any criteria for major numbers in the
kernel. Binary compatibility has _never_ been broken as a release policy,
only as a "that code is old, and we've given people 5 years to migrate to
the new system calls, the old ones are TOAST".

The only policy for major numbers has always been "major capability
changes". 1.0 was "networking is stable and generally usable" (by the
standards of that time), while 2.0 was "SMP and true multi-architecture
support". My planned point for 3.0 was NuMA support, but while we actually
have some of that, the hardware just isn't relevant enough to matter.

The memory management issues would qualify for 3.0, but my argument there
is really that I doubt everybody really is happy yet. Which was why I
asked for people to test it and complain about VM behaviour - and we've
had some ccomplaints ("too swap-happy") although they haven't sounded like
really horrible problems.

Linus

2002-10-03 16:24:56

by Alan

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, 2002-10-03 at 16:57, Linus Torvalds wrote:
>
> On Thu, 3 Oct 2002 [email protected] wrote:
> >
> > I think we should stick to incrementing the major number when binary
> > compatibility is broken.
>
> "Stick to"? We've never had that as any criteria for major numbers in the
> kernel. Binary compatibility has _never_ been broken as a release policy,
> only as a "that code is old, and we've given people 5 years to migrate to
> the new system calls, the old ones are TOAST".

We've generally done better than that. Libc 2.2.2 stil works

2002-10-03 16:44:09

by Dave Jones

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, Oct 03, 2002 at 08:57:13AM -0700, Linus Torvalds wrote:

> The memory management issues would qualify for 3.0, but my argument there
> is really that I doubt everybody really is happy yet. Which was why I
> asked for people to test it and complain about VM behaviour - and we've
> had some ccomplaints ("too swap-happy") although they haven't sounded like
> really horrible problems.

We still need some work for low memory boxes (where low isn't
necessarily all that low). On my 128MB laptop I can lock up the box
for a minute or two at a time by doing two things at the same time,
like a bk pull, and switching desktops.

I dread to think how a 16 or 32MB box performs these days..

Dave

--
| Dave Jones. http://www.codemonkey.org.uk

2002-10-03 16:49:56

by Linus Torvalds

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)


On 3 Oct 2002, Alan Cox wrote:
> >
> > "Stick to"? We've never had that as any criteria for major numbers in the
> > kernel. Binary compatibility has _never_ been broken as a release policy,
> > only as a "that code is old, and we've given people 5 years to migrate to
> > the new system calls, the old ones are TOAST".
>
> We've generally done better than that. Libc 2.2.2 stil works

We have removed _some_ stuff, and we've definitely broken some of the more
esoteric configuration stuff (ie things like "top" and "ps" and "ifconfig"
have broken multiple times over the last 11 years).

And that "old_stat()" thing really ought to go some day.. It's not much of
a support burden, and yeah, we can point people to "that old a.out binary
from 1993 still works fine", so I guess we'll keep it another ten years,
but at this point that has less to do with technical judgement than with
sentimentality, I think ;^)

But yeah, I think on the whole we've done pretty well on being binary
compatible.

Linus

2002-10-03 17:28:04

by Alan

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, 2002-10-03 at 17:56, Linus Torvalds wrote:
> And that "old_stat()" thing really ought to go some day.. It's not much of
> a support burden, and yeah, we can point people to "that old a.out binary
> from 1993 still works fine", so I guess we'll keep it another ten years,
> but at this point that has less to do with technical judgement than with
> sentimentality, I think ;^)
>
> But yeah, I think on the whole we've done pretty well on being binary
> compatible.

Im not sure we want to throw those things out. However all the stuff
that went out before libc5 could go into a legacy.c file that is only
liked if a.out loaders are present

2002-10-03 19:46:40

by Rik van Riel

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, 3 Oct 2002, Linus Torvalds wrote:

> The memory management issues would qualify for 3.0, but my argument
> there is really that I doubt everybody really is happy yet.

I'm absolutely convinced some people won't be happy, simply
because of the fundamental limitations of global page replacement.
However, Andrew Morton has done a great job and the 2.5 VM seems
to be looking as good as anything we've had before.

For me 3.0 arguments would be Ingo's threading stuff, not anything
else.

regards,

Rik
--
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/ http://distro.conectiva.com/

2002-10-03 20:32:00

by James Lewis Nance

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, Oct 03, 2002 at 09:56:42AM -0700, Linus Torvalds wrote:

> And that "old_stat()" thing really ought to go some day.. It's not much of
> a support burden, and yeah, we can point people to "that old a.out binary
> from 1993 still works fine", so I guess we'll keep it another ten years,
> but at this point that has less to do with technical judgement than with
> sentimentality, I think ;^)
>
> But yeah, I think on the whole we've done pretty well on being binary
> compatible.

My wife still uses Applix, which I purchased when Red Hat first
started selling it. The kernel runs it just fine. Interestingly
enough, Red Hat no longer ships the shared libs that it uses,
but installing the necessary rpms from Red Hat 6.0 makes it work.
I looked at the dates on the binaries and they are from 1996,
but I am pretty sure they are substantially older than that.
I do appreciate you putting effort into binary compatability.
My wife would be quite upset with me if Applix quit working :-)

Jim

2002-10-03 22:27:46

by Greg KH

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem

On Thu, Oct 03, 2002 at 05:16:10PM +0100, [email protected] wrote:
>
> Hmmm, then for 3.0 I'd vote for fully working and proven stable:

Hm, how do you "prove" any of these are stable :)

> * Bluetooth

Been there since 2.4

> * USB (2)

Present in 2.5 (and 2.4 now too)

> * IEEE 1394

Been there since 2.4.

thanks,

greg k-h

2002-10-04 06:19:54

by jbradford

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem

> > Hmmm, then for 3.0 I'd vote for fully working and proven stable:
>
> Hm, how do you "prove" any of these are stable :)

Hmm, yeah, I see what you mean, but for me, proved stable is a couple of years of being in a major distribution, with people actually using it.

Now that major distributions no longer ship development kernels, (Slackware used to - I have slackware CDs with 1.3.x trees on them, for example), this is a less valid point.

> > * Bluetooth
>
> Been there since 2.4

..and I'm sure the three people actually using it haven't found any bugs yet ;-)

> > * USB (2)
>
> Present in 2.5 (and 2.4 now too)

..and yet there are still complaints that it doesn't work every day on the list.

> > * IEEE 1394
>
> Been there since 2.4.

Still marked as experimental, though. Not stable yet.

John.

2002-10-04 06:34:59

by Greg KH

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem

On Fri, Oct 04, 2002 at 07:33:58AM +0100, [email protected] wrote:
> > > Hmmm, then for 3.0 I'd vote for fully working and proven stable:
> >
> > Hm, how do you "prove" any of these are stable :)
>
> Hmm, yeah, I see what you mean, but for me, proved stable is a couple
> of years of being in a major distribution, with people actually using
> it.

Ah, so no one actually uses those things in your list. So glad to hear
that...

> > > * USB (2)
> >
> > Present in 2.5 (and 2.4 now too)
>
> ..and yet there are still complaints that it doesn't work every day on the list.

Hm, must have missed those. I haven't seen any USB 2.0 complaints in
quite some time. The majority of USB "issues" are crappy usb storage
devices that don't match the USB storage spec, or PCI IRQ routing
problems.

But hey, no one cares about USB, I'm used to it :)

greg k-h

2002-10-04 07:04:01

by jbradford

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem

> > > > Hmmm, then for 3.0 I'd vote for fully working and proven stable:
> > >
> > > Hm, how do you "prove" any of these are stable :)
> >
> > Hmm, yeah, I see what you mean, but for me, proved stable is a couple
> > of years of being in a major distribution, with people actually using
> > it.
>
> Ah, so no one actually uses those things in your list. So glad to hear
> that...

Whatever. I wouldn't call them 3.0 material yet - would you?

> > > > * USB (2)
> > >
> > > Present in 2.5 (and 2.4 now too)
> >
> > ..and yet there are still complaints that it doesn't work every day on the list.
>
> Hm, must have missed those. I haven't seen any USB 2.0 complaints in
> quite some time. The majority of USB "issues" are crappy usb storage
> devices that don't match the USB storage spec, or PCI IRQ routing
> problems.

We have to code for the devices that are out there. Big deal if we follow the spec to the letter - if Mr Average plugs in his USB device and it doesn't work, well, it doesn't work. It's no good lecturing him on the spec. I don't usually take that view, but when there are a large number of broken devices, what are the other options?

> But hey, no one cares about USB, I'm used to it :)

I certainly don't care about USB, I don't even have a USB port on my main box, but if you're saying that the current support is 3.0 material, then I totally disagree.

I started this thread because I'd originally thought that 1.x.x -> 2.x.x happened due to moving from a.out to elf as the standard binary format. Linus corrected me on that one, and pointed out that it was major feature enhancements that dictate the major version number change. Given that, I am not in any hurry to see it move to 3.0.0 :-).

John.

2002-10-04 07:27:32

by Greg KH

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem

On Fri, Oct 04, 2002 at 08:17:58AM +0100, [email protected] wrote:
> > Hm, must have missed those. I haven't seen any USB 2.0 complaints in
> > quite some time. The majority of USB "issues" are crappy usb storage
> > devices that don't match the USB storage spec, or PCI IRQ routing
> > problems.
>
> We have to code for the devices that are out there. Big deal if we
> follow the spec to the letter - if Mr Average plugs in his USB device
> and it doesn't work, well, it doesn't work. It's no good lecturing
> him on the spec. I don't usually take that view, but when there are a
> large number of broken devices, what are the other options?

I agree, we must make these devices work. But when your dealing with
odd devices, that violate the spec in random ways, and you don't have
documentation on how these devices are broken, and you aren't getting
paid to provide support for these devices, development can be a bit slow
at times. And because of these factors, we will almost always lag
behind the OSes that manufacturers directly support.

> > But hey, no one cares about USB, I'm used to it :)
>
> I certainly don't care about USB, I don't even have a USB port on my
> main box, but if you're saying that the current support is 3.0
> material, then I totally disagree.

I didn't say that it was "3.0 material", you did.

What is pretty major is the core device model. Lots of driver api
changes and cleanups have happened in 2.5. It's almost starting to look
sane in places :)

greg k-h

2002-10-04 20:01:33

by Bill Davidsen

[permalink] [raw]
Subject: Re: v2.6 vs v3.0

On Tue, 1 Oct 2002, Matthias Andree wrote:

> On Mon, 30 Sep 2002, Bill Davidsen wrote:
>
> > I do that, but it doesn't make for a storage medium I can easily use on
> > another system. The cost of DVD writers is coming down, and non-magnetic
> > media may have some advantages as well. Still, thay're small compared to
> > disk sizes.
>
> There are big drives available if you really want one (and can afford
> one, which is the bigger problem usually).

The real problem is that the media is expensive. DVD media is <$10 and
encourages taking backups fairly often. In the long run that's most
important, not the initial cost. Trying to get a client to take an
incremental and store it off-site daily is easier at $5-8 than $50+.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-10-04 22:25:13

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (NUMA)

> The only policy for major numbers has always been "major capability
> changes". 1.0 was "networking is stable and generally usable" (by the
> standards of that time), while 2.0 was "SMP and true multi-architecture
> support". My planned point for 3.0 was NuMA support, but while we actually
> have some of that, the hardware just isn't relevant enough to matter.

When you say we have "some of" that (NuMA support) ... what else would you
like to see? The main things on the planned list as far as I'm concerned are:

1. NUMA aware scheduler.
2. multipath IO with NUMA support
3. per-node slabcache.
4. NUMA aware multidrop networking.

The first 3 of these three are floating around as patches, and I'm still hoping to get
them merged before 2.5 (none are quite ready for merge yet, but should be in time).
I'll admit that people weren't desperately keen on doing multipath IO in the SCSI
layer, but it seems like the only feasible way short term ....

I'd be most curious as to what else you think should be done (short or long term)
in this area, and any comments on the above 4 items?

Thanks,

Martin.

2002-10-04 23:07:14

by Linus Torvalds

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (NUMA)


On Fri, 4 Oct 2002, Martin J. Bligh wrote:
>
> When you say we have "some of" that (NuMA support) ... what else would you
> like to see?

The main thing that I think is lacking is any relevance to any significant
user base, thanks to lack of interesting hardware. So even if Linux itself
was doing everything perfectly, as long as there is no wide hw base and
users, it's all pretty much academic, the same way SMP was during the
early 1.x days.

And I'm not trying to put you or any of the Linux NuMA work down here, I'm
just saying that what makes it not important as a "3.0 feature" is just
that deployment doesn't merit it yet.

Linus

2002-10-05 00:20:20

by Martin J. Bligh

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (NUMA)

> The main thing that I think is lacking is any relevance to any significant
> user base, thanks to lack of interesting hardware. So even if Linux itself
> was doing everything perfectly, as long as there is no wide hw base and
> users, it's all pretty much academic, the same way SMP was during the
> early 1.x days.
>
> And I'm not trying to put you or any of the Linux NuMA work down here, I'm
> just saying that what makes it not important as a "3.0 feature" is just
> that deployment doesn't merit it yet.

Fair enough, I appreciate it's not a wide market segment right now.
It's not a quick and easy project though, so there's a long-ish ramp up time.
It would be nice to have it all working and in place by the time Hammer arrives
and makes this much more widespread ;-)

Just an order of magnitude figure for you ... number of seconds spent in kernel
space across all CPUs during a kernel compile on a 16-way NUMA-Q ...

2.4 with every patch I had (including O(1) sched + NUMA mods) ... 120s.
On 2.5.40-mm1 with one small NUMA scheduler patch ... 38s.

Personally, I think that's pretty impressive - lots of very good things have been
happening, from Andrew in particular, the NUMA people, and VM people in general.
IMHO, the NUMA code is also much more readable and less buggy ;-)

M.

2002-10-05 00:29:29

by Linus Torvalds

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (NUMA)


On Fri, 4 Oct 2002, Martin J. Bligh wrote:
>
> It would be nice to have it all working and in place by the time Hammer arrives
> and makes this much more widespread ;-)

I agree, the Hammer is going to be interesting. But one of the most
interesting things to do will be to see if using it as a per-CPU memory
NUMA machine is slower or faster than using it with the memory interleaved
across CPU's (in which case it won't look NUMA at all).

My personal guess (assuming hypertransport works well) is that you'd
actually en dup interleaving at least for dual setups, and quite possibly
for quads as well. The per-node non-interleaved setup probably makes for
best _aggregate_ memory throughput if you have a load that has very
NUMA-friendly behaviour, but interleaving should make for best sustained
throughput for not-very-balanced-loads.

> Just an order of magnitude figure for you ... number of seconds spent in kernel
> space across all CPUs during a kernel compile on a 16-way NUMA-Q ...
>
> 2.4 with every patch I had (including O(1) sched + NUMA mods) ... 120s.
> On 2.5.40-mm1 with one small NUMA scheduler patch ... 38s.

Yeah, looking good..

Linus

2002-10-05 01:21:19

by Michael Hohnbaum

[permalink] [raw]
Subject: Re: [OT] 2.6 not 3.0 - (NUMA)



On Fri, 2002-10-04 at 17:36, Linus Torvalds wrote:

> > Just an order of magnitude figure for you ... number of seconds spent in kernel
> > space across all CPUs during a kernel compile on a 16-way NUMA-Q ...
> >
> > 2.4 with every patch I had (including O(1) sched + NUMA mods) ... 120s.
> > On 2.5.40-mm1 with one small NUMA scheduler patch ... 38s.
>
> Yeah, looking good..
>
Now if we could get the "one small NUMA scheduler patch" into the
kernel...

> Linus
>
>
--
Michael Hohnbaum 503-578-5486
[email protected] T/L 775-5486

2002-10-06 01:25:18

by Rob Landley

[permalink] [raw]
Subject: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Friday 04 October 2002 07:13 pm, Linus Torvalds wrote:
> On Fri, 4 Oct 2002, Martin J. Bligh wrote:
> > When you say we have "some of" that (NuMA support) ... what else would
> > you like to see?
>
> The main thing that I think is lacking is any relevance to any significant
> user base, thanks to lack of interesting hardware. So even if Linux itself
> was doing everything perfectly, as long as there is no wide hw base and
> users, it's all pretty much academic, the same way SMP was during the
> early 1.x days.
>
> And I'm not trying to put you or any of the Linux NuMA work down here, I'm
> just saying that what makes it not important as a "3.0 feature" is just
> that deployment doesn't merit it yet.

Linux isn't going to get a new order of magnitude surge from the server
space, because there isn't an order of magnitude left. The figures I've seen
from several sources broadly agree that Linux currently has somewhere between
a fifth and a third of the server market, has been doing quite well on that
score for some time, and continues to make steady incremental advances
(taking about equal amounts of market share away from proprietary unixen and
NT boxen). 2.4 is already pretty darn good on a server (assuming you never
hit swap. :). Even 2.2 wasn't at all bad at it.

The new uncharted territory for Linux, and the next major order-of-magnitude
jump in the installed base, is the desktop. A kernel that could make a
credible stab at the desktop would certainly be 3.0 material. And the work
that matters for the desktop is LATENCY work. Not SMP, not throughput, not
more memory. Latency. O(1), deadline I/O scheduler, rmap, preempt, shorter
clock ticks,

Yeah, a lot of the necessary work is user space stuff. But not all. We've
focused on the "MP3 skipping/cd burner underrun" type stuff, which is
important, but in reality an awful lot of the windows "look and feel" issues
boil down to the simple fact that enough of their windowing system is welded
into the kernel that their mouse pointer keeps updating smoothly no matter
how heavily loaded the system is, and when you click on a window its Z-order
gets promoted snappily under just about all circumstances. That's it.
That's the big secret. The mouse pointer doesn't stall, and the windows
respond immediately when you click on 'em.

This may not be a USEFUL response, but it's an immediate one. The inside of
the window may not redraw for 30 seconds, and the pulldown menus and buttons
will just ignore you for a while after that, but what the user EXPERIENCES is
snappy response to commands and smooth interactive feel. Just from those two
things. The system is listening. It may not do anything but drool in
response, but you can see that it's LISTENING. And it's not just a cosmetic
thing: try using a touchpad or nipple mouse on a laptop when the pointer
stalls: you have to wait for it to start up again or you overshoot your
target. It's not a question of "queue up the next three clicks and wait for
it to get around to them", you need interactive feedback to get your mouse in
the right place. Having it stall is really annoying in that case. The
instant an app blocks on a swapped out page, or any other read, and then I/O
starvation occurs with reads blocked by a ton of writes... BANG. User
twiddles thumb while their mouse pointer ignores them. (Speculatively
swapping out a page or two of the X server because it's easy to swap them
back in doesn't help if reads are blocked behind three seconds worth of
writes and your mouse pointer stalls at the edge of the window because of
this.)

Now to fake this in Linux, you theoretically just need to run your X server
and your window manager at a priority of -10 (and somebody needs to club the
distributions on the head until they start DOING this). But in the past,
that wouldn't guarantee your mouse cursor didn't do a half-second pause at a
window boundary when the swap file went nuts. There was NOTHING you could do
under the first dozen 2.4 kernels to make sure your mouse pointer wouldn't
stall at a window boundary, or go into la-la land for five minutes for that
matter. (It improved noticeably after that, but by then most people's
opinions of 2.4's desktop suitability were already formed. And it's STILL
not fully fixed in 2.4: the instant an app blocks on a swapped out page and
then I/O starvation happens with reads blocked by writes... BANG. User
twiddles thumb while their mouse pointer ignores them. Solution? Never do
anything disk intensive in the background unless you want interactive feel to
go into the toilet.)

The new deadline I/O scheduler directly addresses this, and the ability to
get "nice" to affect I/O priority is going to be a big win as well. Andrea
and Rik's VM work help here: rmap adds a lot of future tuning potential, such
as the ability to make SWAP care about niceness (swap out pages from the
nice+20 process before the nice-20 process). The O(1) scheduler helps here
by making niceness levels more meaningful in general. All of these help X11
at nice level -10 to not stall. The faster clock tick helps here too, the
low latency work at the start of 2.5 helps here, and preempt helps here.
There has been a LOT of work on general latency improvement and interactive
feel.

Even the new threading work can potentially help X spin off a dedicated
high-priority "update the mouse position, and manipulate window borders and z
order, and never swap this thread out" thread. (I remember the way OS/2 used
to cheat and give extra time slices to anything that got a Presentation
Manager window event, so you could literally speed up your program on a
loaded system by "scrubbing" the mouse across it repeatedly. The resulting
perception was a snappy desktop, whatever the reality was.)

Sure there's a psychological "third time's the charm" thing that MS has
conditioned the unwashed masses into believing, and a 3.0 kernel would make a
bigger marketing splash than a 2.6 kernel. And for that reason we should NOT
go to 3.0 until we ARE ready for a horde of desktop users to give Linux a try
(and potentially get burned and run away and hide and never look at us
again). But 2.5 DOES contain some significant attempts at addressing the
needs of desktop (and laptop) users. And THAT is what makes it 3.0 material.
To me, anyway. :)

Rob

(P.S. The fact Apple's conditioning the market to take unix seriously on the
desktop with OS X is just a case of convenient timing. And now that floppies
have gone the way of the dodo, the conceptual incompatabilty between "mount"
and removable media is largely a question of CDs, which are software
ejected...)

(P.P.S. There was some argument way back that 2.4 should have been 3.0 due
to the amount of new stuff in it. Old hat now, but the residue is a tendency
to compare 2.5 and 2.3 and say "if we didn't do it then, why do it now". But
looking at it the other way, doesn't that just make the jump between 2.0 and
2.5 even bigger, and INCREASE the rationale for calling the new release 3.0?)

(P.P.P.S. I'll stop now. :)

2002-10-06 06:29:37

by Martin J. Bligh

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

> Linux isn't going to get a new order of magnitude surge from the
> server space, because there isn't an order of magnitude left.

Depends on how you define the word "server". If you mean a PC being
a webserver, I'd agree. If you mean "large database server", I wouldn't.
The term is so broad as to be useless in this context.

> The new uncharted territory for Linux, and the next major
> order-of-magnitude jump in the installed base, is the desktop.
> A kernel that could make a credible stab at the desktop would
> certainly be 3.0 material. And the work that matters for the
> desktop is LATENCY work. Not SMP, not throughput, not more memory.
> Latency. O(1), deadline I/O scheduler, rmap, preempt, shorter
> clock ticks,

I'd agree there are definitely some improvements to be made in this
space. My laptop skipping on xmms whilst I compile the kernel pisses
me off. But that's not why Linux is not sucessful on the the desktop
...

> Yeah, a lot of the necessary work is user space stuff.

... that is. Userspace sucks. X-windows is a pig, and a monumental
pain in the ass to configure - I've been doing it for 10 years, and
I still hate it every single damned time I have to do it. After
years of utter crap I finally have a browser that more or less
works (Galeon, yay), though it still has some stupid annoying bugs.
Fonts are still a pain. Laptops are a minefield of turd-covered
banana skins.

Yeah, I may play with the kernel all day, can debug stuff if I have
to, and can figure out how to set things up by staring at documentation
or source code for ages if it's really necessary. But I don't want to.
I want things that are easy to use for the basic stuff, and just
frigging work out of the box. I don't want to be asked bunches of
questions that really don't matter that much perioidically throughout
a Debian install. Spending all day playing with desktop nonsense isn't
fun, I just want to get on with real work.

It's getting better. But the reason Linux is not a desktop hit has
very little to do with interactive scheduler response, or other kernel
niceties. The kernel blows the competition out of the water, even if
it does have a few problems here and there. It's to do with applications,
proprietary file formats, and commercial support.

> important, but in reality an awful lot of the windows "look and feel"
> issues boil down to the simple fact that enough of their windowing
> system is welded into the kernel that their mouse pointer keeps
> updating smoothly no matter how heavily loaded the system is,
> and when you click on a window its Z-order gets promoted snappily
> under just about all circumstances. That's it.

Pft. What OS are you talking about here? Surely not Microsoft?
Send me your copy, it's obviously very different from mine.

> (P.S. The fact Apple's conditioning the market to take unix
> seriously on the desktop with OS X is just a case of convenient
> timing.

You really think the market gives a damn that there's UNIX underneath
the hood of Apple machines? I beg to differ. They like the fact that
it actually works, and can really multitask, maybe ... which is an
indirect effect. But they don't care (on the whole) about the fact
that it's UNIX.

M.

2002-10-07 05:23:17

by John Alvord

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Sat, 5 Oct 2002 16:30:32 -0400, Rob Landley <[email protected]>
wrote:

>On Friday 04 October 2002 07:13 pm, Linus Torvalds wrote:
>> On Fri, 4 Oct 2002, Martin J. Bligh wrote:
>> > When you say we have "some of" that (NuMA support) ... what else would
>> > you like to see?
>>
>> The main thing that I think is lacking is any relevance to any significant
>> user base, thanks to lack of interesting hardware. So even if Linux itself
>> was doing everything perfectly, as long as there is no wide hw base and
>> users, it's all pretty much academic, the same way SMP was during the
>> early 1.x days.
>>
>> And I'm not trying to put you or any of the Linux NuMA work down here, I'm
>> just saying that what makes it not important as a "3.0 feature" is just
>> that deployment doesn't merit it yet.
>
>Linux isn't going to get a new order of magnitude surge from the server
>space, because there isn't an order of magnitude left. The figures I've seen
>from several sources broadly agree that Linux currently has somewhere between
>a fifth and a third of the server market, has been doing quite well on that
>score for some time, and continues to make steady incremental advances
>(taking about equal amounts of market share away from proprietary unixen and
>NT boxen). 2.4 is already pretty darn good on a server (assuming you never
>hit swap. :). Even 2.2 wasn't at all bad at it.
>
>The new uncharted territory for Linux, and the next major order-of-magnitude
>jump in the installed base, is the desktop. A kernel that could make a
>credible stab at the desktop would certainly be 3.0 material. And the work
>that matters for the desktop is LATENCY work. Not SMP, not throughput, not
>more memory. Latency. O(1), deadline I/O scheduler, rmap, preempt, shorter
>clock ticks,

The big drag on making progress on the desktop is the inertia of
existing applications. Speed/Performance is rarely a problem... just
wait a few months for more power or lower price. PCs are already
overpowered for the typical desktop workload.

Progress in that area is always possible but it will be very slow and
marginal.

john alvord

2002-10-07 08:33:55

by Giuliano Pochini

[permalink] [raw]
Subject: RE: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 n


> important, but in reality an awful lot of the windows "look and feel" issues
> boil down to the simple fact that enough of their windowing system is welded
> into the kernel that their mouse pointer keeps updating smoothly no matter
> how heavily loaded the system is, and when you click on a window its Z-order
> gets promoted snappily under just about all circumstances. That's it.

I feel linux more responsive than M$ windos. But AmigaOS was better. In
AmigaOS the GUI was handled is a different way. UI, widgets, windows, etc.
run in a separate process, so even if the application is busy you can press
buttons, and the events are queued. GTK, QT, etc.. have a different behaviour
and you can't interact with the UI while the application is busy. It is
possible, but it requires a lot of extra work for the developer and almost
nobody does it. To get more GUI responsiveness, the right way is to change
UI toolkits. The kernel works just fine now.

And about sound skipping, I found that libtool is the most offender. I
don't know why (it's a shell script...), but it it. It causes a short
pause of everything. I use a ppc, perhaps on other archs it's harmless.


Bye.

2002-10-07 13:52:43

by Jesse Pollard

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Saturday 05 October 2002 03:30 pm, Rob Landley wrote:
> On Friday 04 October 2002 07:13 pm, Linus Torvalds wrote:
[snip]
> Now to fake this in Linux, you theoretically just need to run your X server
> and your window manager at a priority of -10 (and somebody needs to club
> the distributions on the head until they start DOING this). But in the
> past, that wouldn't guarantee your mouse cursor didn't do a half-second
> pause at a window boundary when the swap file went nuts. There was NOTHING
> you could do under the first dozen 2.4 kernels to make sure your mouse
> pointer wouldn't stall at a window boundary, or go into la-la land for five
> minutes for that matter. (It improved noticeably after that, but by then
> most people's opinions of 2.4's desktop suitability were already formed.
> And it's STILL not fully fixed in 2.4: the instant an app blocks on a
> swapped out page and then I/O starvation happens with reads blocked by
> writes... BANG. User twiddles thumb while their mouse pointer ignores
> them. Solution? Never do anything disk intensive in the background unless
> you want interactive feel to go into the toilet.)

In other words... don't swap. If an application has to be swapped out, all
bets are off on response time. There are X events that REQUIRE the
application to be in memory if they are going to be handled. (example:
focus follows mouse, auto raise window on focus, app must redraw exposed
area... or worse: app grabs mouse to put it in the workspace on entry to a
status display. Guess what can happen to the mouse.)

> The new deadline I/O scheduler directly addresses this, and the ability to
> get "nice" to affect I/O priority is going to be a big win as well. Andrea
> and Rik's VM work help here: rmap adds a lot of future tuning potential,
> such as the ability to make SWAP care about niceness (swap out pages from
> the nice+20 process before the nice-20 process). The O(1) scheduler helps
> here by making niceness levels more meaningful in general. All of these
> help X11 at nice level -10 to not stall. The faster clock tick helps here
> too, the low latency work at the start of 2.5 helps here, and preempt
> helps here. There has been a LOT of work on general latency improvement and
> interactive feel.

It will still stall everytime the mouse crosses the window border IF the
application has specified "enter/leave" event notification. This requires the
application to be swapped in to recieve the event. The only fix is locking
the application/X libraries into memory.

> Even the new threading work can potentially help X spin off a dedicated
> high-priority "update the mouse position, and manipulate window borders and
> z order, and never swap this thread out" thread. (I remember the way OS/2
> used to cheat and give extra time slices to anything that got a
> Presentation Manager window event, so you could literally speed up your
> program on a loaded system by "scrubbing" the mouse across it repeatedly.
> The resulting perception was a snappy desktop, whatever the reality was.)

Not really - the application may want the mouse pointer changed, update data
based on where the mouse is located (see what happens to a rule bar on
image/word processors). There is also the possibility that multiple processes
are watching the mouse.

The only "fix" that would help this out is to lock the X shared libraries and
X server into memory, and to use a multi-threaded X server, OR have
enough memory available to not swap.

The major difference between M$ window handling and X is that X gives the
users app control over what happens to the mouse. M$ has already defined
what the actions are, it is NOT up to the application. X does not implement
application policy. That is up to the application.

Even M$ Windows will lockup when it swaps out the application. The mouse
might move... but then the entire system hangs (at least under ME).

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-10-07 18:25:56

by Daniel Phillips

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Monday 07 October 2002 15:56, Jesse Pollard wrote:
> [the mouse] will still stall everytime the mouse crosses the window border IF the
> application has specified "enter/leave" event notification. This requires the
> application to be swapped in to recieve the event. The only fix is locking
> the application/X libraries into memory.

That one could be punted with an hourglass cursor, until the events start flowing.
Well. Not sure how much this has to do with the kernel...

--
Daniel

2002-10-07 18:57:59

by Rob Landley

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Monday 07 October 2002 09:56 am, Jesse Pollard wrote:

> In other words... don't swap.

"Don't swap this bit", anyway.

> If an application has to be swapped out, all
> bets are off on response time.

Alright, breaking the problem down into specific, bite-sized chunks, seeing
what's easily measurable, and then picking the lowest hanging fruit:

The frequency of mouse pointer stalls, and the worst case response time, is
probably something an automated benchmark could measure. (Z-order's a
tricker problem because the window manager's involved, but mouse stalls are
EASY to cause.)

On my laptop (with 256 megs ram and 256 megs swap). Open up 30 or 40
konqueror windows of a "this page looks interesting, I'll read it offline"
variety until memory's full and you're about 2/3 of the way into swap.
(KTimeMon makes this easy to see.) then do something swap-happy in the
background (including downloading a huge file, which causes disk cache to
grow and evict stuff, or of course running a big compile).

No matter how much ram the system has, with six desktops full of open windows
I can usually drive it DEEP into swap, without even picking an easy target
like star/openoffice. (Yeah, KDE sucketh. And X should be able to figure
out that windows not currently being displayed at all (completely behind
other windows, on another desktop, etc) can be swapped out. But it's just
not designed that way...)

> > Even the new threading work can potentially help X spin off a dedicated
> > high-priority "update the mouse position, and manipulate window borders
> > and z order, and never swap this thread out" thread. (I remember the way
> > OS/2 used to cheat and give extra time slices to anything that got a
> > Presentation Manager window event, so you could literally speed up your
> > program on a loaded system by "scrubbing" the mouse across it repeatedly.
> > The resulting perception was a snappy desktop, whatever the reality was.)
>
> Not really - the application may want the mouse pointer changed, update
> data based on where the mouse is located (see what happens to a rule bar on
> image/word processors). There is also the possibility that multiple
> processes are watching the mouse.

You may notice that in mozilla when your rat moves over a link, the mouse
pointer turns into a hand anywhere up to several seconds later on a
pathologically loaded system. This usually doesn't stop the pointer from
moving if you just want to wander past the link and continue on. "Tooltips"
take two or three seconds to pop up, and this is a GOOD thing...

if the mouse movement stalls, you can't navigate with a nipple mouse or
touchpad (which is all you get on a laptop), 'cause you'll overshoot. Having
the button under the mouse highlight is secondary to being able to get the
mouse over the button.

When the system isn't loaded anymore (went away while a compile finished or a
file downloaded), you get one or two small (1/4 second) stalls as stuff swaps
back in and then life is good. It's when you swap stuff in and then it swaps
back out after 3 seconds of inactivity that it gets to be a real pain
(something the deadline I/O scheduler is supposed to help)...

Maybe the correct thing here is a user space fix, with X throwing certain
event handlers into an mlocked shared library, just so your mouse pointer
always updates smoothly. But I do know a lot of work has gone into making
more intelligent swapping decisions (fundamentally, that's all VM work really
is), and it's certainly a heck of a lot better than the 2.4.6 days where you
had to go get a beverage when it went swap-happy and it could be 30 seconds
between pointer updates.

> Even M$ Windows will lockup when it swaps out the application. The mouse
> might move... but then the entire system hangs (at least under ME).

The amazing number of things windows manages to screw up should not be used
to prevent discussiona about the small number of things they successfully
copied from the macintosh. :)

Rob

2002-10-08 08:15:39

by Jan Hudec

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Mon, Oct 07, 2002 at 08:22:41PM +0200, Daniel Phillips wrote:
> On Monday 07 October 2002 15:56, Jesse Pollard wrote:
> > [the mouse] will still stall everytime the mouse crosses the window border IF the
> > application has specified "enter/leave" event notification. This requires the
> > application to be swapped in to recieve the event. The only fix is locking
> > the application/X libraries into memory.
>
> That one could be punted with an hourglass cursor, until the events start flowing.
> Well. Not sure how much this has to do with the kernel...

Nothing. It's X. And it will take another X protocol extension (so it
will suck yet more).

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <[email protected]>

2002-10-08 22:12:04

by Jesse Pollard

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Monday 07 October 2002 09:03 am, Rob Landley wrote:
> On Monday 07 October 2002 09:56 am, Jesse Pollard wrote:
> > In other words... don't swap.
>
> "Don't swap this bit", anyway.
>
> > If an application has to be swapped out, all
> > bets are off on response time.
>
> Alright, breaking the problem down into specific, bite-sized chunks, seeing
> what's easily measurable, and then picking the lowest hanging fruit:
>
> The frequency of mouse pointer stalls, and the worst case response time, is
> probably something an automated benchmark could measure. (Z-order's a
> tricker problem because the window manager's involved, but mouse stalls are
> EASY to cause.)
>
> On my laptop (with 256 megs ram and 256 megs swap). Open up 30 or 40
> konqueror windows of a "this page looks interesting, I'll read it offline"
> variety until memory's full and you're about 2/3 of the way into swap.
> (KTimeMon makes this easy to see.) then do something swap-happy in the
> background (including downloading a huge file, which causes disk cache to
> grow and evict stuff, or of course running a big compile).

Out of curiosity, does it also happen if you have no swap?
It is my understanding that this change will prevent much (not all) of the
swap activity, giving a quicker response to the mouse events. It should
increase the amount of actual swap activity, but each activiation will be of
shorter duration, giving a "better" apparent interactive response.

> No matter how much ram the system has, with six desktops full of open
> windows I can usually drive it DEEP into swap, without even picking an easy
> target like star/openoffice. (Yeah, KDE sucketh. And X should be able to
> figure out that windows not currently being displayed at all (completely
> behind other windows, on another desktop, etc) can be swapped out. But
> it's just not designed that way...)

partly depends on whether the X window buffers are page aligned... If they
were then that should be the result. I bet they arn't page aligned.

> > > Even the new threading work can potentially help X spin off a dedicated
> > > high-priority "update the mouse position, and manipulate window borders
> > > and z order, and never swap this thread out" thread. (I remember the
> > > way OS/2 used to cheat and give extra time slices to anything that got
> > > a Presentation Manager window event, so you could literally speed up
> > > your program on a loaded system by "scrubbing" the mouse across it
> > > repeatedly. The resulting perception was a snappy desktop, whatever the
> > > reality was.)
> >
> > Not really - the application may want the mouse pointer changed, update
> > data based on where the mouse is located (see what happens to a rule bar
> > on image/word processors). There is also the possibility that multiple
> > processes are watching the mouse.
>
> You may notice that in mozilla when your rat moves over a link, the mouse
> pointer turns into a hand anywhere up to several seconds later on a
> pathologically loaded system. This usually doesn't stop the pointer from
> moving if you just want to wander past the link and continue on.
> "Tooltips" take two or three seconds to pop up, and this is a GOOD thing...

I was thinking more about switching pointer on window entry. I don't think
a link is implemented as a window. (I thought is was a proximity check in an
already loaded event). Or places that do pointer grabs (fortunately for me
most of the dialog boxes I see in X don't do this).

Also the "tooltips" thing is implemented as a mouse window entry event
which in turn sets a timer event. A mouse window exit event generates
a timer cancel.

One of the most amazing thing to me is the total number
of events that occur on something a simple as a scroll bar. Entering a
window can generate 8-10 events depending which toolkit is used.
First the pointer character is changed, then events cascade since the
border of a scrollbar may actually have 2 or 3 windows, each with
a different requirement, but requesting a window entry/exit event.

> if the mouse movement stalls, you can't navigate with a nipple mouse or
> touchpad (which is all you get on a laptop), 'cause you'll overshoot.
> Having the button under the mouse highlight is secondary to being able to
> get the mouse over the button.
>
> When the system isn't loaded anymore (went away while a compile finished or
> a file downloaded), you get one or two small (1/4 second) stalls as stuff
> swaps back in and then life is good. It's when you swap stuff in and then
> it swaps back out after 3 seconds of inactivity that it gets to be a real
> pain (something the deadline I/O scheduler is supposed to help)...

This is where a slightly different method of handling background processes
(and I/O requests). A background process should have a lower processing
priority. The I/O activity generated by that background process should also
have a lower priority. The deadline I/O scheduler should/would/could then
keep the forground processes (X server, apps with exposed windows) running
by processing their I/O first.

This also assumes that the X server MIGHT be able to change the priority of
processes attached to hidden windows (iconified/covered). It doesn't address
those processes that may be running detached (cron or started by terminal
emulators) which would act like foreground processes. Though the terminal
emulators could be detected, and have all subprocesses of the controlling
pty reduced in priority.... Also have to recognize when they should again
be elevated too... (or even if they should be. These things can take a LOT
of resources). It would also have to be under the control of the user, since
the user may need the background compile done ASAP (even if the user
DOES run a solitare game covering the terminal window...)

> Maybe the correct thing here is a user space fix, with X throwing certain
> event handlers into an mlocked shared library, just so your mouse pointer
> always updates smoothly. But I do know a lot of work has gone into making
> more intelligent swapping decisions (fundamentally, that's all VM work
> really is), and it's certainly a heck of a lot better than the 2.4.6 days
> where you had to go get a beverage when it went swap-happy and it could be
> 30 seconds between pointer updates.

Unfortunately, X cannot control the event handlers. That is the rest of the
application, and you end up locking the entire application in memory.

--
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: [email protected]

Any opinions expressed are solely my own.

2002-10-09 00:06:21

by Rob Landley

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Tuesday 08 October 2002 06:14 pm, Jesse Pollard wrote:

> > On my laptop (with 256 megs ram and 256 megs swap). Open up 30 or 40
> > konqueror windows of a "this page looks interesting, I'll read it
> > offline" variety until memory's full and you're about 2/3 of the way into
> > swap. (KTimeMon makes this easy to see.) then do something swap-happy in
> > the background (including downloading a huge file, which causes disk
> > cache to grow and evict stuff, or of course running a big compile).
>
> Out of curiosity, does it also happen if you have no swap?

I'd trigger the OOM killer a lot easier? (Done it more than once without
meaning to...)

It used to go into REAL swap meltdown once the swap file was full, because
it'd start paging out executables and libraries back into their files. I've
actually tried to avoid testing that recently, for obvious reasons. :)

As soon as I read, take notes from, index, and close about 40 open web pages,
I can reboot the sucker without swap. (I could try to swapoff a heavily
loaded running system, but I tried that once and the results were NOT
pretty...)

> It is my understanding that this change will prevent much (not all) of the
> swap activity, giving a quicker response to the mouse events. It should
> increase the amount of actual swap activity, but each activiation will be
> of shorter duration, giving a "better" apparent interactive response.

I haven't been brave enough to run 2.5 on my laptop yet. (Soon. I've
downloaded it, compiled it, but haven't made it through the "what do I need
to upgrade" list yet. This sucker's still running 2.4.19 inserted in a
modified red hat 7.2.) My test machine at home's an old pentium pro 180 with
96 megs of ram, so I haven't exactly got the world's highest interactive
expectations there.

> > You may notice that in mozilla when your rat moves over a link, the mouse
> > pointer turns into a hand anywhere up to several seconds later on a
> > pathologically loaded system. This usually doesn't stop the pointer from
> > moving if you just want to wander past the link and continue on.
> > "Tooltips" take two or three seconds to pop up, and this is a GOOD
> > thing...
>
> I was thinking more about switching pointer on window entry. I don't think
> a link is implemented as a window. (I thought is was a proximity check in
> an already loaded event). Or places that do pointer grabs (fortunately for
> me most of the dialog boxes I see in X don't do this).

All sorts of things can cause a stall at the edge of the window. I've seen
it happen at the edge of the little animated mozilla logo.

To drive a 2.4 system to its knees, all you have to do is "cat /dev/zero >
bigfile" on a partition with a few gigabytes free, and then scrub the mouse a
bit.

Tried it on a friend's workstation a minute ago. The result was NOT pretty.
2.4.19 is a lot better about this than whatever shipped with his SuSE box,
but if you want to make desktop interactive feel suck, try running this in
the following on a system that's a ways into swap. (It needs 4 gigs of disk
space, which should be more ram than most people have...)

while true
do
dd if=/dev/zero of=tempfile bs=65536 count=65536
rm tempfile
done

It's certainly improving. On 2.4.19, the mouse cursor only really seems to
get truly jerky when you exhaust the swap so badly it pages to the executable
files. (Then again, every few minutes it goes consistently jerky for several
seconds.)

But by the same token, I have a server running 2.4.19 that when receiving a
big file transfer through the 100baseT and blasting it to disk, goes
completely into la-la land and won't allow new ssh connections until the
transfer ends. (I've given it a 4 gigabyte transfer and waited minutes. The
prompt shows up about one second after the transfer ends, and I had more than
one machine queued waiting like that...)

I'm hoping 2.5 categorically fixes this, but haven't put it on a production
machine yet. Maybe I'll be able to slap together an appropriate spare box in
a few days. (P.S. Did make meuconfig crashing when you tried to enter the
ALSA menu ever get fixed? Set me back half an hour, that did...)

> Also the "tooltips" thing is implemented as a mouse window entry event
> which in turn sets a timer event. A mouse window exit event generates
> a timer cancel.
>
> One of the most amazing thing to me is the total number
> of events that occur on something a simple as a scroll bar. Entering a
> window can generate 8-10 events depending which toolkit is used.
> First the pointer character is changed, then events cascade since the
> border of a scrollbar may actually have 2 or 3 windows, each with
> a different requirement, but requesting a window entry/exit event.

Not exactly an easy problem to solve from kernel space, no. But when
unrelated processes can seriously interact with each other, you can't help
but think the kernel is involved somehow... :)

> This is where a slightly different method of handling background processes
> (and I/O requests). A background process should have a lower processing
> priority.

1) This doesn't affect I/O.

2) Swapping, running executables, stating files... all I/O the high priority
process may need to do.

Hmmm... You know,it might be a good idea to rip the swap file out of that
SERVER (which has 256 megs of ram also, that should be plenty) and see if
that makes the incoming transfer hang go away...

> The I/O activity generated by that background process should also
> have a lower priority. The deadline I/O scheduler should/would/could then
> keep the forground processes (X server, apps with exposed windows) running
> by processing their I/O first.

This is what I'm hoping. This is not the 2.4 reality, I'll tell you that. :)

> This also assumes that the X server MIGHT be able to change the priority of
> processes attached to hidden windows (iconified/covered).

Ingo was thinking about letting normal processes nice themselves up a couple
of levels. Enough that abuse wouldn't matter too much, but so that processes
intended to be interactive could identify themselves as such.

Part of the problem is that "nice" is really trying to say two things. "I
want more CPU time" and "I want lower latency". In theory, interactive
processes could get SHORTER time slices (subject to some minimum), they just
need to be dispatched more rapidly when they unblock. Possibly the scheduler
needs some kind of hint in addition to just a number.

> It doesn't
> address those processes that may be running detached (cron or started by
> terminal emulators) which would act like foreground processes. Though the
> terminal emulators could be detected, and have all subprocesses of the
> controlling pty reduced in priority.... Also have to recognize when they
> should again be elevated too... (or even if they should be. These things
> can take a LOT of resources). It would also have to be under the control of
> the user, since the user may need the background compile done ASAP (even if
> the user DOES run a solitare game covering the terminal window...)

Again, two separate scheduler problems. A process that wants big long
timeslices but doesn't care about gaps between them, and a process that wants
short time slices in 30 miliseconds or less or it's free. :)

An artifact of the current O(1) scheduler is that if you nice a process way
the heck DOWN it may finish slightly faster, because its timeslices are
longer when it gets them, so the cache stays hot.

Strange but true, or at least "worked for me"...

Rob

2002-10-09 08:11:48

by Alexander Kellett

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Mon, Oct 07, 2002 at 10:03:16AM -0400, Rob Landley wrote:
> The frequency of mouse pointer stalls, and the worst case response time, is
> probably something an automated benchmark could measure. (Z-order's a
> tricker problem because the window manager's involved, but mouse stalls are
> EASY to cause.)

actually with low-latency, preempt and X's new silken mouse
stuff i haven't had any real mouse pointer stalls in a while.
well, apart from my maxtor drive stalling my entire system
(vmstat included) for 2 seconds at a time when i get it to
pump its full 20mb/s (and yes, dma is enabled).

Alex

2002-10-11 23:47:47

by Hans Reiser

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

Rob Landley wrote:

>The new uncharted territory for Linux, and the next major order-of-magnitude
>jump in the installed base, is the desktop. A kernel that could make a
>credible stab at the desktop would certainly be 3.0 material. And the work
>that matters for the desktop is LATENCY work. Not SMP, not throughput, not
>more memory. Latency. O(1), deadline I/O scheduler, rmap, preempt, shorter
>clock ticks,
>
>
>
I must confess to thinking that namespace work is the most strategic
upcoming battle between Linux and Windows, but probably I am biased in
this regard.;-) MS seems to think it also, given the rumors that OFS is
where they are shifting their focus away from the browser and over to
for Longhorn....


Hans

2002-10-12 01:21:51

by Rob Landley

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Friday 11 October 2002 07:53 pm, Hans Reiser wrote:
> Rob Landley wrote:
> >The new uncharted territory for Linux, and the next major
> > order-of-magnitude jump in the installed base, is the desktop. A kernel
> > that could make a credible stab at the desktop would certainly be 3.0
> > material. And the work that matters for the desktop is LATENCY work.
> > Not SMP, not throughput, not more memory. Latency. O(1), deadline I/O
> > scheduler, rmap, preempt, shorter clock ticks,
>
> I must confess to thinking that namespace work is the most strategic
> upcoming battle between Linux and Windows, but probably I am biased in
> this regard.;-) MS seems to think it also, given the rumors that OFS is
> where they are shifting their focus away from the browser and over to
> for Longhorn....

If you're talking about driverfs (kfs, kernelfs, kernfs... i think my vote
really is for patfs here, actually :), it is indeed seriously cool, but most
of it's potential coolness rather than active (kinetic?) coolness. It's
infrastructure for cool things to be built on top of.

For example, handling removable media and transient network resources has
always been a bit of a sore spot for unix derivatives. "mount' doesn't
combine well with ejecting a floppy, and hacks like mcopy would have to be
built into the shell, or some kind of library to be sufficiently generic.
(Your web browser can't right click->save as to "a:".) And most cd-roms I've
tried still won't eject when you hit the button unless you unmount the
filesystem first. there was talking about fixing this back in 2.3. Can't
say i've really thumped on it in 2.5, IDE hasn't been working long enough
yet. NFS has a "don't hang my entire OS" mount option, which I'm told is a
kludge of biblical proportions, but I've mostly stayed away from NFS, so I
really couldn't say.)

MS has been trying and failing to have a coherent naming policy for years.
Two years ago, the active directory hype. I still haven't seen a better
naming system than the amiga (where you could dynamically create a ramdisk by
just copying something to "ram:", that was cool.)

A little side project I'm working on now (in my copious free time) is mount
point relocation support. (You can mount the same filesystem a second time
in another location (mount --bind makes this easy), and they share a
superblock so open files should be happy, but you still can't detach the
first mount point. Not with a hacksaw, or explosives...) It's more an
excuse to learn the new VFS layer than anything else, but it's functionality
I would in fact have a use for, strange enough...

I'm also looking for an "unmount --force" option that works on something
other than NFS. Close all active filehandles (the programs using it can just
deal with EBADF or whatever), flush the buffers to disk, and unmount. None
of this "oh I can't do that, you have a zombie process with an open file...",
I want "guillotine this filesystem pronto, capice?" behavior.

Of course loopback mounts would be kind of upset about this, but to be
honest: tough. The loopback block device gives them an I/O error, and the
filesystem should just cope. Floppies do this all the time with dust and cat
hair and stuff...

Of course I don't yet know 1/10 as much about the VFS as I need to, but I'm
learning. Slowly...

Rob

2002-10-12 04:08:21

by Nick LeRoy

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Friday 11 October 2002 03:26 pm, Rob Landley wrote:
> On Friday 11 October 2002 07:53 pm, Hans Reiser wrote:
<snip>
> A little side project I'm working on now (in my copious free time) is mount
> point relocation support. (You can mount the same filesystem a second time
> in another location (mount --bind makes this easy), and they share a
> superblock so open files should be happy, but you still can't detach the
> first mount point. Not with a hacksaw, or explosives...) It's more an
> excuse to learn the new VFS layer than anything else, but it's
> functionality I would in fact have a use for, strange enough...

Not quite sure that I'm following the _why_ of this one, but maybe I'm just
slow.

> I'm also looking for an "unmount --force" option that works on something
> other than NFS. Close all active filehandles (the programs using it can
> just deal with EBADF or whatever), flush the buffers to disk, and unmount.
> None of this "oh I can't do that, you have a zombie process with an open
> file...", I want "guillotine this filesystem pronto, capice?" behavior.

Now _this_ I *like*. I've wanted this for _a long time_. Not that I have
that much spare time, but I'd like to help if I can!

> Of course loopback mounts would be kind of upset about this, but to be
> honest: tough. The loopback block device gives them an I/O error, and the
> filesystem should just cope. Floppies do this all the time with dust and
> cat hair and stuff...

Yup. This is required sometimes. Ever have a CD mounted that the (#$)#
kernel won't let you umount even though lsof and /proc insist that's there's
nothing open, but all you can do is an fscking reboot?!!!

Thanks

-Nick

2002-10-12 09:58:03

by Hans Reiser

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

Rob Landley wrote:

>I'm also looking for an "unmount --force" option that works on something
>other than NFS. Close all active filehandles (the programs using it can just
>deal with EBADF or whatever), flush the buffers to disk, and unmount. None
>of this "oh I can't do that, you have a zombie process with an open file...",
>I want "guillotine this filesystem pronto, capice?" behavior.
>
This sounds useful. It would be nice if umount prompted you rather than
refusing.

>
>Of course loopback mounts would be kind of upset about this, but to be
>honest: tough. The loopback block device gives them an I/O error, and the
>filesystem should just cope. Floppies do this all the time with dust and cat
>hair and stuff...
>
>Of course I don't yet know 1/10 as much about the VFS as I need to, but I'm
>learning. Slowly...
>
>Rob
>
>
>
>



2002-10-12 11:36:22

by Matthias Andree

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Fri, 11 Oct 2002, Rob Landley wrote:

> I'm also looking for an "unmount --force" option that works on something
> other than NFS. Close all active filehandles (the programs using it can just
> deal with EBADF or whatever), flush the buffers to disk, and unmount. None
> of this "oh I can't do that, you have a zombie process with an open file...",
> I want "guillotine this filesystem pronto, capice?" behavior.

Seconded.

The patch at the URL below used to work back with 2.4.9, I did not track
what has become of it, if it still applies, I haven't needed it recently
or if so, Alt-SysRq was fair enough for me. Maybe just updating this
badfs and forced umount patch for 2.4.20 would suffice:

http://www.moses.uklinux.net/patches/forced-umount-2.4.9.patch

It gives me one reject in fs/super.c that I don't know how to fix:

***************
*** 1145,1150 ****
return retval;
}

spin_lock(&dcache_lock);

if (atomic_read(&sb->s_active) > 1) {
--- 1172,1180 ----
return retval;
}

+ if (flags&MNT_FORCE)
+ quiesce_filesystem(mnt);
+
spin_lock(&dcache_lock);

if (atomic_read(&sb->s_active) > 1) {

2002-10-12 14:50:18

by Hugh Dickins

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Sat, 12 Oct 2002, Matthias Andree wrote:
> On Fri, 11 Oct 2002, Rob Landley wrote:
> > I'm also looking for an "unmount --force" option that works on something
> > other than NFS. Close all active filehandles (the programs using it can just
> > deal with EBADF or whatever), flush the buffers to disk, and unmount. None
> > of this "oh I can't do that, you have a zombie process with an open file...",
> > I want "guillotine this filesystem pronto, capice?" behavior.
>
> Seconded.
>
> The patch at the URL below used to work back with 2.4.9, I did not track
> what has become of it, if it still applies, I haven't needed it recently
> or if so, Alt-SysRq was fair enough for me. Maybe just updating this
> badfs and forced umount patch for 2.4.20 would suffice:
>
> http://www.moses.uklinux.net/patches/forced-umount-2.4.9.patch
>
> It gives me one reject in fs/super.c that I don't know how to fix:

Tigran did update his forced umount patch to 2.4.18,
here's a built but untested patch against 2.4.20-pre10 ...

--- 2.4.20-pre10/fs/Makefile Wed Oct 9 11:53:45 2002
+++ forcedumount/fs/Makefile Sat Oct 12 15:24:11 2002
@@ -68,6 +68,7 @@
subdir-$(CONFIG_SUN_OPENPROMFS) += openpromfs
subdir-$(CONFIG_BEFS_FS) += befs
subdir-$(CONFIG_JFS_FS) += jfs
+subdir-y += badfs


obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
--- 2.4.20-pre10/fs/badfs/Makefile Thu Jan 1 00:00:00 1970
+++ forcedumount/fs/badfs/Makefile Sat Oct 12 15:24:11 2002
@@ -0,0 +1,8 @@
+#
+# Makefile for badfs filesystem.
+#
+
+O_TARGET := badfs.o
+obj-y := inode.o
+
+include $(TOPDIR)/Rules.make
--- 2.4.20-pre10/fs/badfs/inode.c Thu Jan 1 00:00:00 1970
+++ forcedumount/fs/badfs/inode.c Sat Oct 12 15:24:11 2002
@@ -0,0 +1,275 @@
+/*
+ * badfs - the Bad Filesystem
+ *
+ * Author - Tigran Aivazian <[email protected]>
+ *
+ * Thanks to:
+ * Manfred Spraul <[email protected]>, for useful comments.
+ *
+ * This file is released under the GPL.
+ *
+ * The badfs filesystem is used by forced umount ('umount -f' command)
+ * to move inodes that keep the filesystem being umounted busy to it.
+ *
+ * The entry point into this module is via quiesce_filesystem() called
+ * from fs/super.c:do_umount() if MNT_FORCE is passed.
+ */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/file.h>
+
+#define BADFS_MAGIC 0xBADF5001
+
+static struct super_block *badfs_read_super(struct super_block *,void *,int);
+
+#define FS_FLAGS_BADFS (FS_NOMOUNT | FS_SINGLE)
+static DECLARE_FSTYPE(badfs_fs_type,"badfs",badfs_read_super,FS_FLAGS_BADFS);
+
+static struct vfsmount *badfs_mnt; /* returned by kern_mount() */
+static struct super_block *badfs_sb; /* badfs_mnt->mnt_sb */
+static struct dentry *badfs_root; /* badfs_sb->s_root */
+
+static int __init init_badfs_fs(void)
+{
+ int err = register_filesystem(&badfs_fs_type);
+
+ if (!err) {
+ badfs_mnt = kern_mount(&badfs_fs_type);
+ if (IS_ERR(badfs_mnt)) {
+ err = PTR_ERR(badfs_mnt);
+ unregister_filesystem(&badfs_fs_type);
+ } else {
+ badfs_sb = badfs_mnt->mnt_sb;
+ err = 0;
+ }
+ }
+ return err;
+}
+
+static struct inode *badfs_get_inode(struct super_block *sb, int mode)
+{
+ struct inode *inode = get_empty_inode();
+
+ if (inode) {
+ make_bad_inode(inode);
+ inode->i_sb = sb;
+ inode->i_dev = sb->s_dev;
+ inode->i_mode = mode;
+ inode->i_nlink = 1;
+ inode->i_size = 0;
+ inode->i_blocks = 0;
+ }
+ return inode;
+}
+
+/* VFS ->read_super() method */
+static struct super_block *badfs_read_super(struct super_block * sb,
+ void * data, int silent)
+{
+ static struct super_operations badfs_ops = {};
+ struct inode * root = badfs_get_inode(sb, S_IFDIR|S_IRUSR|S_IWUSR);
+
+ if (!root)
+ return NULL;
+ sb->s_blocksize = 1024;
+ sb->s_blocksize_bits = 10;
+ sb->s_magic = BADFS_MAGIC;
+ sb->s_op = &badfs_ops;
+ badfs_root = sb->s_root = d_alloc(NULL,
+ &(const struct qstr){ "bad:", 5, 0});
+ if (!badfs_root) {
+ iput(root);
+ return NULL;
+ }
+ sb->s_root->d_sb = sb;
+ sb->s_root->d_parent = sb->s_root;
+ d_instantiate(sb->s_root, root);
+ return sb;
+}
+
+static void disable_pwd(struct fs_struct *fs)
+{
+ struct inode *inode;
+ struct dentry *dentry;
+
+ inode = badfs_get_inode(badfs_sb, S_IFDIR|0755);
+ if (!inode) {
+ printk(KERN_ERR "disable_pwd(): can't allocate inode\n");
+ return;
+ }
+ dentry = d_alloc(badfs_root, &(const struct qstr){"dead_pwd", 8, 0});
+ if (!dentry) {
+ iput(inode);
+ printk(KERN_ERR "disable_pwd(): can't allocate dentry\n");
+ return;
+ }
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ set_fs_pwd(fs, badfs_mnt, dentry);
+}
+
+static void disable_root(struct fs_struct *fs)
+{
+ struct inode *inode;
+ struct dentry *dentry;
+
+ inode = badfs_get_inode(badfs_sb, S_IFDIR|0755);
+ if (!inode) {
+ printk(KERN_ERR "disable_root(): can't allocate inode\n");
+ return;
+ }
+ dentry = d_alloc(badfs_root, &(const struct qstr){"dead_root", 9, 0});
+ if (!dentry) {
+ iput(inode);
+ printk(KERN_ERR "disable_root(): can't allocate dentry\n");
+ return;
+ }
+ d_instantiate(dentry, inode);
+ dget(dentry);
+ set_fs_root(fs, badfs_mnt, dentry);
+}
+
+/* called from do_umount() if MNT_FORCE is specified */
+void quiesce_filesystem(struct vfsmount *mnt)
+{
+ struct task_struct *p;
+ struct file *file;
+ struct inode *inode;
+
+ /* we do three passes through the task list, examining:
+ * 1. p->fs{->pwd,root} that can keep this mnt busy
+ * 2. p->files, i.e. open files (we do_close them)
+ * 3. p->mm, i.e. mmaped files (we simply do_munmap them)
+ * There is no guarantee that by the time we restart the loop
+ * the amount of work to do in the loop has not increased.
+ */
+repeat1:
+ read_lock(&tasklist_lock);
+ for_each_task(p) {
+ struct fs_struct *fs;
+
+ /* get a reference to p->fs */
+ task_lock(p);
+ fs = p->fs;
+ if (!fs) {
+ task_unlock(p);
+ continue;
+ } else
+ atomic_inc(&fs->count);
+ task_unlock(p);
+
+ if (fs->pwdmnt == mnt) {
+ read_unlock(&tasklist_lock);
+ disable_pwd(fs); /* may sleep */
+ put_fs_struct(fs);
+ goto repeat1;
+ }
+ if (fs->rootmnt == mnt) {
+ read_unlock(&tasklist_lock);
+ disable_root(fs); /* may sleep */
+ put_fs_struct(fs);
+ goto repeat1;
+ }
+ put_fs_struct(fs);
+ }
+ read_unlock(&tasklist_lock);
+
+repeat2:
+ read_lock(&tasklist_lock);
+ for_each_task(p) {
+ unsigned int fd, j = 0;
+ struct files_struct *files;
+
+ /* get a reference to p->files */
+ task_lock(p);
+ files = p->files;
+ if (!files) {
+ task_unlock(p);
+ continue;
+ } else {
+ atomic_inc(&files->count);
+ write_lock(&files->file_lock);
+ }
+ task_unlock(p);
+
+ /* check if this process has open files here */
+ while (1) {
+ unsigned long set;
+
+ fd = j * __NFDBITS;
+ if (fd >= files->max_fdset || fd >= files->max_fds)
+ break;
+ set = files->open_fds->fds_bits[j++];
+ while (set) {
+ if (set & 1) {
+ file = files->fd[fd];
+ if (file) {
+ inode = file->f_dentry->d_inode;
+ if (inode && (file->f_vfsmnt==mnt)) {
+ files->fd[fd] = NULL;
+ FD_CLR(fd, files->close_on_exec);
+ __put_unused_fd(files, fd);
+ write_unlock(&files->file_lock);
+ read_unlock(&tasklist_lock);
+ put_files_struct(files);
+ filp_close(file, files);
+ goto repeat2;
+ }
+ }
+ }
+ fd++;
+ set >>= 1;
+ }
+ }
+ write_unlock(&files->file_lock);
+ put_files_struct(files);
+ }
+ read_unlock(&tasklist_lock);
+
+repeat3:
+ read_lock(&tasklist_lock);
+ for_each_task(p) {
+ struct mm_struct *mm;
+ struct vm_area_struct *vma;
+
+ /* get a reference to p->mm */
+ task_lock(p);
+ mm = p->mm;
+ if (!mm) {
+ task_unlock(p);
+ continue;
+ } else
+ atomic_inc(&mm->mm_users);
+ task_unlock(p);
+
+ /* check for mmap'd files and unmap them */
+ spin_lock(&mm->page_table_lock);
+ for (vma = mm->mmap; vma; vma=vma->vm_next) {
+ file = vma->vm_file;
+ if (!file)
+ continue;
+ inode = file->f_dentry->d_inode;
+ if (!inode || !inode->i_sb)
+ continue;
+ if (file->f_vfsmnt == mnt) {
+ spin_unlock(&mm->page_table_lock);
+ read_unlock(&tasklist_lock);
+ down_write(&mm->mmap_sem);
+ do_munmap(mm, vma->vm_start,
+ vma->vm_end - vma->vm_start);
+ up_write(&mm->mmap_sem);
+ mmput(mm);
+ goto repeat3;
+ }
+ }
+ spin_unlock(&mm->page_table_lock);
+ mmput(mm);
+ }
+ read_unlock(&tasklist_lock);
+}
+
+module_init(init_badfs_fs)
+MODULE_LICENSE("GPL");
--- 2.4.20-pre10/fs/namespace.c Wed Oct 9 11:53:48 2002
+++ forcedumount/fs/namespace.c Sat Oct 12 15:24:11 2002
@@ -298,10 +298,14 @@
* about for the moment.
*/

- lock_kernel();
- if( (flags&MNT_FORCE) && sb->s_op->umount_begin)
- sb->s_op->umount_begin(sb);
- unlock_kernel();
+ if (flags & MNT_FORCE) {
+ lock_kernel();
+ if (mnt != current->fs->rootmnt)
+ quiesce_filesystem(mnt);
+ if (sb->s_op->umount_begin)
+ sb->s_op->umount_begin(sb);
+ unlock_kernel();
+ }

/*
* No sense to grab the lock for this test, but test itself looks
--- 2.4.20-pre10/include/linux/fs.h Wed Oct 9 11:58:21 2002
+++ forcedumount/include/linux/fs.h Sat Oct 12 15:24:11 2002
@@ -1479,6 +1479,8 @@
extern kdev_t ROOT_DEV;
extern char root_device_name[];

+/* fs/badfs/inode.c - used by forced umount */
+extern void quiesce_filesystem(struct vfsmount *);

extern void show_buffers(void);


2002-10-13 22:36:52

by Rob Landley

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Saturday 12 October 2002 12:14 am, Nick LeRoy wrote:
> On Friday 11 October 2002 03:26 pm, Rob Landley wrote:
> > On Friday 11 October 2002 07:53 pm, Hans Reiser wrote:
>
> <snip>
>
> > A little side project I'm working on now (in my copious free time) is
> > mount point relocation support. (You can mount the same filesystem a
> > second time in another location (mount --bind makes this easy), and they
> > share a superblock so open files should be happy, but you still can't
> > detach the first mount point. Not with a hacksaw, or explosives...)
> > It's more an excuse to learn the new VFS layer than anything else, but
> > it's
> > functionality I would in fact have a use for, strange enough...
>
> Not quite sure that I'm following the _why_ of this one, but maybe I'm just
> slow.

I posted it earlier:

Root filesystem is a loopback mounted zisofs image. The file to be loopback
mounted lives in the partition that will become /var.

An initial ramdisk mounts the partition on /initrd/var, calls losetup to
associate /dev/loop0 with the correct file, and exits to let the boot process
continue. The boot process remounts /var in the appropriate place.

/var is now mounted twice. The initrd can't be released because it's got an
active mount point under it. That mount point can't be released because the
root filesystem is loopback mounted from within it, so it has to stay open.

Logically, the second /var mount should be "mount --move /initrd/var /var",
followed by "umount /initrd" to free up the initrd memory. Right now it's
doing "mount -n --bind /initrd/var /var", because /etc is a symlink into /var
(has to remain editable, you see), and this way the information about which
partition var actually is can be kept in one place. (This is an
implementation detail: I could have used volume labels instead.)

The point is, right now I can't free the initial ramdisk because it has an
active mount point under it..

> > I'm also looking for an "unmount --force" option that works on something
> > other than NFS. Close all active filehandles (the programs using it can
> > just deal with EBADF or whatever), flush the buffers to disk, and
> > unmount. None of this "oh I can't do that, you have a zombie process with
> > an open file...", I want "guillotine this filesystem pronto, capice?"
> > behavior.
>
> Now _this_ I *like*. I've wanted this for _a long time_. Not that I have
> that much spare time, but I'd like to help if I can!

I have no spare time at the moment either (hopefully next week), and I
started out studying the 2.4 vfs layer which seems a bit different in 2.5
(can't tell how much yet), but I'll get there...

> > Of course loopback mounts would be kind of upset about this, but to be
> > honest: tough. The loopback block device gives them an I/O error, and
> > the filesystem should just cope. Floppies do this all the time with dust
> > and cat hair and stuff...
>
> Yup. This is required sometimes. Ever have a CD mounted that the (#$)#
> kernel won't let you umount even though lsof and /proc insist that's
> there's nothing open, but all you can do is an fscking reboot?!!!

Yes. And some scratched CDs can give REALLY interesting results...

Rob

2002-10-13 22:36:49

by Rob Landley

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Saturday 12 October 2002 06:03 am, Hans Reiser wrote:
> Rob Landley wrote:
> >I'm also looking for an "unmount --force" option that works on something
> >other than NFS. Close all active filehandles (the programs using it can
> > just deal with EBADF or whatever), flush the buffers to disk, and
> > unmount. None of this "oh I can't do that, you have a zombie process
> > with an open file...", I want "guillotine this filesystem pronto,
> > capice?" behavior.
>
> This sounds useful. It would be nice if umount prompted you rather than
> refusing.

The problem here is that umount(2) doesn't take a flag. I'd be happy to have
it fail unless called with the WITH_EXTREME_PREJUDICE flag or some such, but
that's an API change.

Of course I haven't gotten that far yet, but eventually this will have to be
dealt with...

Rob

2002-10-13 23:45:47

by Hans Reiser

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

Rob Landley wrote:

>On Saturday 12 October 2002 06:03 am, Hans Reiser wrote:
>
>
>>Rob Landley wrote:
>>
>>
>>>I'm also looking for an "unmount --force" option that works on something
>>>other than NFS. Close all active filehandles (the programs using it can
>>>just deal with EBADF or whatever), flush the buffers to disk, and
>>>unmount. None of this "oh I can't do that, you have a zombie process
>>>with an open file...", I want "guillotine this filesystem pronto,
>>>capice?" behavior.
>>>
>>>
>>This sounds useful. It would be nice if umount prompted you rather than
>>refusing.
>>
>>
>
>The problem here is that umount(2) doesn't take a flag. I'd be happy to have
>it fail unless called with the WITH_EXTREME_PREJUDICE flag or some such, but
>that's an API change.
>
>Of course I haven't gotten that far yet, but eventually this will have to be
>dealt with...
>
>Rob
>
>
>
>
Call it forcedumount().

What apps need to know about how to call it besides umount anyway?

Not a lot that need a lot of worry.....

Hans


2002-10-14 07:05:05

by Nikita Danilov

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

Rob Landley writes:
> On Saturday 12 October 2002 06:03 am, Hans Reiser wrote:
> > Rob Landley wrote:
> > >I'm also looking for an "unmount --force" option that works on something
> > >other than NFS. Close all active filehandles (the programs using it can
> > > just deal with EBADF or whatever), flush the buffers to disk, and
> > > unmount. None of this "oh I can't do that, you have a zombie process
> > > with an open file...", I want "guillotine this filesystem pronto,
> > > capice?" behavior.
> >
> > This sounds useful. It would be nice if umount prompted you rather than
> > refusing.
>
> The problem here is that umount(2) doesn't take a flag. I'd be happy to have
> it fail unless called with the WITH_EXTREME_PREJUDICE flag or some such, but
> that's an API change.
>
> Of course I haven't gotten that far yet, but eventually this will have to be
> dealt with...

There were several patches to do this. If I remember correctly Tigran
Aivazian wrote one, for example.

>
> Rob

Nikita.

2002-10-14 21:46:00

by Rob Landley

[permalink] [raw]
Subject: Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA))

On Sunday 13 October 2002 07:51 pm, Hans Reiser wrote:

> Call it forcedumount().
>
> What apps need to know about how to call it besides umount anyway?
>
> Not a lot that need a lot of worry.....

Actually, looking at the umount.c user space app thingy, it turns out there's
a umount2() glibc call that doesn't have a man page associated with it.
(Suspected there might be, since the existing -f had to get into the kernel
some how...)

The new patch Hugh Dickens posted looks interesting, but of course real life
has decided to intrude for a couple of days, looks like... :)

> Hans

Rob

2002-10-21 15:30:20

by Calin A. Culianu

[permalink] [raw]
Subject: [OT] Please don't call it 3.0!! (was Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA)))


So what's the verdict? Are we calling it 3.0 or 2.6? Who am I to say
this, but I really feel calling it kernel 3.0 is not fully justified. We
should stick with the 2.x series until major ABI or API changes break the
C library in massive ways, at which point we increment the major version
number.

Although its tempting to appeal to the mainstream by inflating the version
number artificially (what's Redhat up to now? 8.0?? sheesh!!), we have to
respect ourselves as developers. Why call it 3.0, other than to stroke
our own egos?

2002-10-21 16:05:22

by Wakko Warner

[permalink] [raw]
Subject: Re: [OT] Please don't call it 3.0!! (was Re: The reason to call it 3.0 is the desktop (was Re: [OT] 2.6 not 3.0 - (NUMA)))

> So what's the verdict? Are we calling it 3.0 or 2.6? Who am I to say
> this, but I really feel calling it kernel 3.0 is not fully justified. We
> should stick with the 2.x series until major ABI or API changes break the
> C library in massive ways, at which point we increment the major version
> number.
>
> Although its tempting to appeal to the mainstream by inflating the version
> number artificially (what's Redhat up to now? 8.0?? sheesh!!), we have to
> respect ourselves as developers. Why call it 3.0, other than to stroke
> our own egos?

what about when they jumped from 1.3.x to 2.0.x? I suggested around the pre
2.4 days it be called 3.0 becuase of that jump there. IIRC it was slackware
that jumped to be versioned up there with redhat. There've only been 2
major releases.

--
Lab tests show that use of micro$oft causes cancer in lab animals