2005-12-18 22:11:29

by J.A. Magallon

Subject: About 4k kernel stack size....

Hi...

I'm following the intense thread about stack size, and I see only one
solution.

Ship the next release of everything (stable, development, -mm) with a
maximum kernel/interrupt stack usage meter. Then poll everyone to send
/proc/sys/stack_usage_max to a mailing list.

Until then, you won't know whether current code is brushing against the
4k limit or never gets past 1K.

Just one idea, to try to end this endless flamewar.

BTW, I have run 4k stacks on 3 boxes for a long time and have had 0 (zero)
problems, including NFS/AFP over ext3 on md.

--
J.A. Magallon <jamagallon()able!es> \ Software is like sex:
werewolf!able!es \ It's better when it's free
Mandriva Linux release 2006.1 (Cooker) for i586
Linux 2.6.14-jam5 (gcc 4.0.2 (4.0.2-1mdk for Mandriva Linux release 2006.1))
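
The usage meter proposed above could work along the following lines. This is only a minimal user-space sketch, not kernel code: it assumes the stack is filled with a known byte before use and that the high-water mark is found by scanning for the first overwritten byte. The names stack_poison, simulate_use and STACK_POISON are hypothetical, and stack_usage_max is borrowed from the proposed /proc entry purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE   4096   /* one 4k stack, as discussed in the thread */
#define STACK_POISON 0xA5   /* arbitrary fill byte (assumption) */

/* Fill a fresh stack with the poison pattern before it is used. */
static void stack_poison(unsigned char *stack)
{
    memset(stack, STACK_POISON, STACK_SIZE);
}

/*
 * Stacks grow downward on x86, so the lowest addresses are touched last.
 * Count untouched poison bytes from the bottom; the rest has been used.
 */
static size_t stack_usage_max(const unsigned char *stack)
{
    size_t untouched = 0;

    while (untouched < STACK_SIZE && stack[untouched] == STACK_POISON)
        untouched++;
    return STACK_SIZE - untouched;
}

/* Simulate a code path that consumed `depth` bytes from the top of the stack. */
static void simulate_use(unsigned char *stack, size_t depth)
{
    assert(depth <= STACK_SIZE);
    memset(stack + STACK_SIZE - depth, 0, depth);
}
```

A scan like this can under-report if the deepest bytes written happen to equal the poison value, and it only ever reports the paths that were actually exercised.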



2005-12-20 02:52:52

by Mark Lord

Subject: Re: About 4k kernel stack size....

J.A. Magallon wrote:
> Hi...
>
> I'm following the intense thread about stack size, and I see only one
> solution.
>
> Ship the next release of everything (stable, development, -mm) with a
> maximum kernel/interrupt stack usage meter. Then poll everyone to send
> /proc/sys/stack_usage_max to a mailing list.

That won't do it.

The mainline code paths are undoubtedly fine with 4K stacks.
It's the *error paths* that are most likely to go deeper on the stack,
and those are rarely exercised by anyone. And those are the paths
that we *really* need to be reliable.

Cheers

2005-12-20 13:37:31

by Adrian Bunk

Subject: Re: About 4k kernel stack size....

On Mon, Dec 19, 2005 at 09:52:53PM -0500, Mark Lord wrote:
>...
> The mainline code paths are undoubtedly fine with 4K stacks.
> It's the *error paths* that are most likely to go deeper on the stack,
> and those are rarely exercised by anyone. And those are the paths
> that we *really* need to be reliable.

"most likely" is a strong sentence, especially considering that the
automatic analysis of all possible call chains can identify such problems
and has already identified several (which were fixed many months
ago).

We might not be getting 100% security against stack overflows, but that's
not fundamentally different from the current situation with 6 kB stacks.

> Cheers

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2005-12-20 14:37:30

by Mike Snitzer

Subject: Re: About 4k kernel stack size....

On 12/20/05, Adrian Bunk wrote:
> On Mon, Dec 19, 2005 at 09:52:53PM -0500, Mark Lord wrote:
> >...
> > The mainline code paths are undoubtedly fine with 4K stacks.
> > It's the *error paths* that are most likely to go deeper on the stack,
> > and those are rarely exercised by anyone. And those are the paths
> > that we *really* need to be reliable.
>
> "most likely" is a strong sentence, especially considering that the
> automatic analysis of all possible call chains can identify such problems
> and has already identified several (which were fixed many months
> ago).
>
> We might not be getting 100% security against stack overflows, but that's
> not fundamentally different from the current situation with 6 kB stacks.

Given this last statement, why is it that Matt Mackall's suggestion in
the "Light-weight dynamically extended stacks" thread didn't get any
_real_ discussion from the big 4K stack advocates? For all intents
and purposes, Matt was dismissed with the same Bunk: "Ever since
neilb's patch there are 0 bugs.. blah blah". 4K, 8K (aka "6 kB")
aside; having more stack safety in the Linux kernel is a "good thing"
no? Aren't dynamic stacks a viable means of imposing 4K (but doing so
with real safety)?

Mike

2005-12-20 15:00:36

by Adrian Bunk

Subject: Re: About 4k kernel stack size....

On Tue, Dec 20, 2005 at 09:37:28AM -0500, Mike Snitzer wrote:
> On 12/20/05, Adrian Bunk wrote:
> > On Mon, Dec 19, 2005 at 09:52:53PM -0500, Mark Lord wrote:
> > >...
> > > The mainline code paths are undoubtedly fine with 4K stacks.
> > > It's the *error paths* that are most likely to go deeper on the stack,
> > > and those are rarely exercised by anyone. And those are the paths
> > > that we *really* need to be reliable.
> >
> > "most likely" is a strong sentence, especially considering that the
> > automatic analysis of all possible call chains can identify such problems
> > and has already identified several (which were fixed many months
> > ago).
> >
> > We might not be getting 100% security against stack overflows, but that's
> > not fundamentally different from the current situation with 6 kB stacks.
>
> Given this last statement, why is it that Matt Mackall's suggestion in
> the "Light-weight dynamically extended stacks" thread didn't get any
> _real_ discussion from the big 4K stack advocates? For all intents
> and purposes, Matt was dismissed with the same Bunk: "Ever since
> neilb's patch there are 0 bugs.. blah blah". 4K, 8K (aka "6 kB")
> aside; having more stack safety in the Linux kernel is a "good thing"
> no? Aren't dynamic stacks a viable means to imposing 4K (but doing so
> with real safety)?

Besides the fact that I still don't think it's required, Matt's
suggestion would work only randomly for functions using more than 1 kB
stack.

But discussing hypothetical patches is silly - code talks.

If someone sends a patch implementing Matt's suggestion and measurements
show that this patch doesn't impose a performance penalty, we'd
have a basis for a real discussion.

> Mike

cu
Adrian

--

"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed

2005-12-20 15:56:54

by Sean

Subject: Re: About 4k kernel stack size....

On Tue, December 20, 2005 9:37 am, Mike Snitzer said:

> Given this last statement, why is it that Matt Mackall's suggestion in
> the "Light-weight dynamically extended stacks" thread didn't get any
> _real_ discussion from the big 4K stack advocates? For all intents
> and purposes, Matt was dismissed with the same Bunk: "Ever since
> neilb's patch there are 0 bugs.. blah blah". 4K, 8K (aka "6 kB")
> aside; having more stack safety in the Linux kernel is a "good thing"
> no? Aren't dynamic stacks a viable means to imposing 4K (but doing so
> with real safety)?

The so-called 4K stack patch does add more stack safety, by avoiding the
possibility of allocation failures due to memory fragmentation. Besides,
the patch is really misnamed; it should have been called the split-stack
patch (i.e. 4K + 4K). Since nobody can show any area in the mainline code
where the split-stack scheme introduces a problem, the old setup should be
removed, as it is no longer needed by the mainline code.
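
The fragmentation point can be illustrated with a toy page map. This is a hedged sketch only: find_free_block and fragment are hypothetical helpers, not the kernel's buddy allocator, and the 16-page map is an arbitrary assumption. It shows that when free pages are scattered, a single page (order 0) can still be found while two contiguous, aligned pages (order 1, as needed for an 8K stack) cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 16   /* size of the toy page map (assumption) */

/*
 * used[i] is true when page i is allocated.
 * Find 2^order contiguous free pages aligned to 2^order.
 * Returns the index of the first page, or -1 if no such block exists.
 */
static int find_free_block(const bool used[NPAGES], unsigned order)
{
    assert(order <= 4);
    size_t span = (size_t)1 << order;

    for (size_t start = 0; start + span <= NPAGES; start += span) {
        bool all_free = true;

        for (size_t i = 0; i < span; i++)
            if (used[start + i])
                all_free = false;
        if (all_free)
            return (int)start;
    }
    return -1;
}

/* Mark every even page used: half of memory is free, but no two free pages touch. */
static void fragment(bool used[NPAGES])
{
    for (size_t i = 0; i < NPAGES; i++)
        used[i] = (i % 2 == 0);
}
```

After fragment(), an order-0 request succeeds (page 1 is free) while an order-1 request fails even though half the pages are free.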

I for one hope those silly bastards using ndiswrapper fix up that code to
work with the new kernel so that we can stop hearing all these wannabe
complaints against this progress.

Sean

2005-12-20 17:03:24

by linux-os (Dick Johnson)

Subject: Re: About 4k kernel stack size....


On Tue, 20 Dec 2005, Sean wrote:

> On Tue, December 20, 2005 9:37 am, Mike Snitzer said:
>
>> Given this last statement, why is it that Matt Mackall's suggestion in
>> the "Light-weight dynamically extended stacks" thread didn't get any
>> _real_ discussion from the big 4K stack advocates? For all intents
>> and purposes, Matt was dismissed with the same Bunk: "Ever since
>> neilb's patch there are 0 bugs.. blah blah". 4K, 8K (aka "6 kB")
>> aside; having more stack safety in the Linux kernel is a "good thing"
>> no? Aren't dynamic stacks a viable means to imposing 4K (but doing so
>> with real safety)?
>
> The so called 4K stack patch does add more stack safety. Avoiding the
> possibility of allocation failures due to memory fragmentation. Besides,
> the patch is really misnamed; it should have been called the split-stack
> (ie. 4K + 4K). Since nobody can show any area in the mainline code where
> the split stack scheme introduces a problem the old setup should be
> removed as it is no longer needed by the mainline code.
>
> I for one hope those silly bastards using ndiswrapper fix up that code to
> work with the new kernel so that we can stop hearing all these wannabe
> complaints against this progress.
>
> Sean


Since it's been determined that the kernel will
use 4k stacks, solely because "smaller is better",
I decided to look through kernel code and
find out what the minimum stack-size required
is. First I looked at the number of arguments
passed to various procedures. The maximum I
could find by looking in kernel headers was
7 parameters. There are probably some well-
hidden procedures that use more. However, let's
make a rule that nobody can use more than 8
parameters. This rule will be justified by
the loudest shouter of "should have been a
pointer to a struct".

Anyway, that's 8 parameters plus the return
address, or 9 size_t-sized elements on the stack.
For ix86, that's 9 * sizeof(size_t) = 36
bytes of stack-space required for the procedure
call. Now, let's make a rule that there can't
be more than 32 size_t sized things in local
space on the stack. This rule can be justified
by shouting the loudest that anybody who
codes buffers or structures on the stack is
an idiot.

Anyway, that's 32 * sizeof(size_t) = 128 bytes
required. We add this to the first 36 and we
have 164 bytes of stack-space required. So,
the maximum stack-space allowed will be defined
to be 164 bytes since we have proved that this
is all that's required (interrupts happen on
another stack). You can also make a rule that
recursion isn't allowed since a simple state-
machine can be proved to work as a substitute.
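
The arithmetic above can be checked with a few lines of C. This is an illustrative sketch: WORD_SIZE is fixed at 4 to match sizeof(size_t) on 32-bit ix86 as the post assumes, and frame_bytes and frames_that_fit are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define WORD_SIZE 4   /* sizeof(size_t) on 32-bit ix86, per the post */

/* Bytes for one call: parameters plus return address, plus local words. */
static size_t frame_bytes(size_t n_params, size_t n_locals)
{
    assert(n_params <= 8 && n_locals <= 32);   /* the two "rules" above */
    return (n_params + 1) * WORD_SIZE + n_locals * WORD_SIZE;
}

/* How many such frames fit in a stack of the given size. */
static size_t frames_that_fit(size_t stack_bytes, size_t frame)
{
    assert(frame > 0);
    return stack_bytes / frame;
}
```

frame_bytes(8, 32) gives the 164 bytes quoted above. That figure bounds a single frame only; nested calls each add their own frame, and about 24 such worst-case frames fit in 4096 bytes.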

Since the ix86 processor won't allow us to make
pages less than 4k, we need to poison the rest
of the stack-page and if a kernel monitor, operating
off a timer-tick, detects that anybody is violating
this rule, an automatic daemon, kernel thread, shall
be awakened to spam the violator to death.

To enforce this new-world-order, we need to
require that all procedures contain the email
address of the writer. This shall be handled
with a macro, EXPORT_WRITER([email protected]).
Or, alternatively you can send your email address,
Linux-kernel version, and procedure name to:
[email protected]
Maybe you won't need to because the NSA already
has that information.

See, isn't rule-making fun? This whole 4k stack-
thing is really dumb. Other operating systems
use paged virtual memory for stacks, except
for the interrupt stack. If Linux used paged
virtual memory for stacks, the pages would not
have to be contiguous so dynamic stack allocation
would practically never fail. But Linux doesn't
use paged virtual memory for stacks. So, there
needs to be some rule to control the amount
of kernel stack allocated to each task when it
executes a system call.

This means, in the limit, that there are two
possibilities:

(1) Implement paged virtual memory for stack.
(2) Use a single page.

The NDIS driver problem can be fixed by switching
stacks. The NDIS compatibility device is single-
threaded, no need for a dynamic allocation of a stack.
The stack can be switched using simple assembly and
a buffer/stack that was allocated when the device-driver
was installed. It is a driver problem, not a kernel
problem.
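
The stack-switching idea can be sketched in user space with the POSIX ucontext calls (getcontext, makecontext, swapcontext) as provided by glibc on Linux. This is only an analogue under stated assumptions, not the in-kernel assembly the post has in mind: the 64 KiB buffer size and the names private_stack, hungry_function and call_on_private_stack are hypothetical, and the sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <ucontext.h>

#define PRIVATE_STACK_SIZE (64 * 1024)   /* arbitrary size for the sketch */

static unsigned char private_stack[PRIVATE_STACK_SIZE];  /* allocated once, up front */
static ucontext_t caller_ctx, callee_ctx;
static int ran_on_private_stack;

/* Stand-in for a stack-hungry driver entry point. */
static void hungry_function(void)
{
    volatile unsigned char big_local[16 * 1024];
    uintptr_t p, lo, hi;

    big_local[0] = 1;
    /* Record whether our locals really live inside the private buffer. */
    p  = (uintptr_t)&big_local[0];
    lo = (uintptr_t)private_stack;
    hi = lo + PRIVATE_STACK_SIZE;
    ran_on_private_stack = (p >= lo && p < hi);
}

/* Run fn on the preallocated buffer, then resume the caller's own stack. */
static int call_on_private_stack(void (*fn)(void))
{
    assert(fn != NULL);
    if (getcontext(&callee_ctx) != 0)
        return -1;
    callee_ctx.uc_stack.ss_sp = private_stack;
    callee_ctx.uc_stack.ss_size = sizeof private_stack;
    callee_ctx.uc_link = &caller_ctx;    /* where to continue when fn returns */
    makecontext(&callee_ctx, fn, 0);
    return swapcontext(&caller_ctx, &callee_ctx);
}
```

Calling call_on_private_stack(hungry_function) returns 0 on success, with the 16 KiB local living in the static buffer rather than on the caller's stack. A single shared buffer like this only works when calls are never concurrent.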

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.56 BogoMips).
Warning : 98.36% of all statistics are fiction.
.

****************************************************************
The information transmitted in this message is confidential and may be privileged. Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited. If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to [email protected] - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

2005-12-20 18:06:41

by Chase Venters

Subject: Re: About 4k kernel stack size....

On Tue, 20 Dec 2005, linux-os \(Dick Johnson\) wrote:
> See, isn't rule-making fun? This whole 4k stack-
> thing is really dumb. Other operating systems
> use paged virtual memory for stacks, except
> for the interrupt stack. If Linux used paged
> virtual memory for stacks, the pages would not
> have to be contiguous so dynamic stack allocation
> would practically never fail. But Linux doesn't
> use paged virtual memory for stacks. So, there
> needs to be some rule to control the amount
> of kernel stack allocated to each task when it
> executes a system call.

Pardon, but why should "Other operating systems use paged virtual memory
for stacks" have anything to do with the design of Linux? Other operating
systems also look for a file called AUTORUN.INF whenever you insert a CD,
and they'll happily run arbitrary code... which is great when you're a
motherboard manufacturer providing crappy drivers on a crappy CD with
crappy artwork and you want to play a jingle before slapping a hideous GUI
up in front of your unsuspecting user; or perhaps you're Sony and you want
to hook people's kernel such that you become a sort of media hypervisor.
And this is the most deployed OS in the game...

Linux is a kernel - not a perl script. Programmer laziness is about the
only excuse I've been able to spot in this discussion that has been raised
in support of big stacks. (Perhaps all the arguments against aren't worded
as such; but as far as I've seen they all reduce to it).

If Linux used 4k stacks, we wouldn't have to worry about virtual
memory *or* contiguous allocations.

- Chase

2005-12-20 18:37:39

by linux-os (Dick Johnson)

Subject: Re: About 4k kernel stack size....


On Tue, 20 Dec 2005, Chase Venters wrote:

> On Tue, 20 Dec 2005, linux-os \(Dick Johnson\) wrote:
>> See, isn't rule-making fun? This whole 4k stack-
>> thing is really dumb. Other operating systems
>> use paged virtual memory for stacks, except
>> for the interrupt stack. If Linux used paged
>> virtual memory for stacks, the pages would not
>> have to be contiguous so dynamic stack allocation
>> would practically never fail. But Linux doesn't
>> use paged virtual memory for stacks. So, there
>> needs to be some rule to control the amount
>> of kernel stack allocated to each task when it
>> executes a system call.
>
> Pardon, but why should "Other operating systems use paged virtual memory
> for stacks" have anything to do with the design of Linux? Other operating
> systems also look for a file called AUTORUN.INF whenever you insert a CD,
> and they'll happily run arbitrary code...

Sorry, you must be talking about M$ stuff. I wasn't. There are
real operating systems that work. They solved a lot of problems
by doing things correctly, learning from the mistakes of others.



> which is great when you're a
> motherboard manufacturer providing crappy drivers on a crappy CD with
> crappy artwork and you want to play a jingle before slapping a hideous GUI
> up in front of your unsuspecting user; or perhaps you're Sony and you want
> to hook people's kernel such that you become a sort of media hypervisor.
> And this is the most deployed OS in the game...
>

Also, the M$ __kernel__ doesn't look for any files of any kind except
for its page file, which it locates without the file-system, BTW.
If you have the misfortune of using some contraption that uses M$,
just bring up the "Task Manager". Look at the "processes". One
of them looks for new disks/mounts/etc. at 1-second intervals.
Can you guess which one? Hint: you can't figure it out from its name!

> Linux is a kernel - not a perl script. Programmer laziness is about the
> only excuse I've been able to spot in this discussion that has been raised
> in support of big stacks. (Perhaps all the arguments against aren't worded
> as such; but as far as I've seen they all reduce to it).
>

A kernel stack is simply an implementation detail. Somebody made
an early decision to use non-paged memory for stacks. From that
point on, we have to either live with it or change it. The
change doesn't involve size. It involves kind.

> If Linux used 4k stacks, we wouldn't have to worry about virtual
> memory *or* contiguous allocations.
>
> - Chase
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.56 BogoMips).
Warning : 98.36% of all statistics are fiction.
.


2005-12-20 18:43:41

by Arjan van de Ven

Subject: Re: About 4k kernel stack size....


> A kernel stack is simply an implementation detail. Somebody made
> an early decision to use non-paged memory for stacks. From that
> point on, we have to either live with it or change it. The
> change doesn't involve size. It involves kind.

it involves a whole lot, like banning dma from the stack, and to make it
swappable or kmapped you'd even need to fix all the places that put
things like wait queues on the stack, as well as many other similar data
structures. Staying at 4Kb is a lot easier than that ;)


2005-12-20 18:59:42

by linux-os (Dick Johnson)

Subject: Re: About 4k kernel stack size....


On Tue, 20 Dec 2005, Arjan van de Ven wrote:

>
>> A kernel stack is simply an implementation detail. Somebody made
>> an early decision to use non-paged memory for stacks. From that
>> point on, we have to either live with it or change it. The
>> change doesn't involve size. It involves kind.
>
> it involves a whole lot, like banning dma from the stack, and to make it
> swappable or kmapped you'd even need to fix all the places that put
> things like wait queues on the stack, as well as many other similar data
> structures. Staying at 4Kb is a lot easier than that ;)
>
Yes. No question about it. Once that decision was made, it defined a
lot of kernel internals. It just might be why Linux is such a good
performer, too. There are a lot of good things that might be caused
by the non-paged stack.
>

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.56 BogoMips).
Warning : 98.36% of all statistics are fiction.
.


2005-12-20 20:15:13

by Alan

Subject: Re: About 4k kernel stack size....

On Llu, 2005-12-19 at 21:52 -0500, Mark Lord wrote:
> The mainline code paths are undoubtedly fine with 4K stacks.
> It's the *error paths* that are most likely to go deeper on the stack,
> and those are rarely exercised by anyone. And those are the paths
> that we *really* need to be reliable.

Very few error paths are that deep. The obvious complex exception is the
SCSI one, but that's a separate thread. Also, the same argument about
reliability is why going to 4K stack + IRQ stacks helps: it makes the
stack usage predictable.

2005-12-20 22:32:53

by Alan

Subject: Re: About 4k kernel stack size....

On Maw, 2005-12-20 at 19:43 +0100, Arjan van de Ven wrote:
> it involves a whole lot, like banning dma from the stack, and to make it
> swappable or kmapped you'd even need to fix all the places that put
> things like wait queues on the stack, as well as many other similar data
> structures. Staying at 4Kb is a lot easier than that ;)

If you look at something like the early Unix design books, it's very deep
in the design of the most basic behaviour and primitives. It would be
possible to fix that in Linux but probably not worth it. The 1 page you
need for stack is now cheap.

I did look at fixing it for ELKS where a big part of the 64K DS space is
kernel stacks but fortunately something useful needed doing instead ;)

Alan

2005-12-20 22:54:04

by Nikita Danilov

Subject: Re: About 4k kernel stack size....

linux-os \(Dick Johnson\) writes:
>

[...]

> See, isn't rule-making fun? This whole 4k stack-
> thing is really dumb. Other operating systems
> use paged virtual memory for stacks, except
> for the interrupt stack. If Linux used paged
> virtual memory for stacks,

... then spin-locks couldn't be held across function calls.

> the pages would not
> have to be contiguous so dynamic stack allocation
> would practically never fail. But Linux doesn't
> use paged virtual memory for stacks. So, there
> needs to be some rule to control the amount
> of kernel stack allocated to each task when it
> executes a system call.
>
> This means, in the limit, that there are two
> possibilities:
>
> (1) Implement paged virtual memory for stack.

As an exercise: subscribe to an NT kernel development mailing list, and see
the fun they have when page-in code trips over a paged-out kernel text
page. As a rule, even code cannot be made pageable without very involved
and fragile analysis, to say nothing of the stack.

Nikita.

2005-12-21 14:03:55

by linux-os (Dick Johnson)

Subject: Re: About 4k kernel stack size....


On Tue, 20 Dec 2005, Nikita Danilov wrote:

> linux-os \(Dick Johnson\) writes:
> >
>
> [...]
>
> > See, isn't rule-making fun? This whole 4k stack-
> > thing is really dumb. Other operating systems
> > use paged virtual memory for stacks, except
> > for the interrupt stack. If Linux used paged
> > virtual memory for stacks,
>
> ... then spin-locks couldn't be held across function calls.
>

Sure they can! In ix86 machines the local 'cli' within the
spin-lock code doesn't affect traps and faults, only the
activity of hardware (INTR). With a page-not-present fault,
(read stack-not-present fault) everything necessary to
restart is saved in EFLAGS and EIP pushed onto the existing
stack so that the faulting instruction can be restarted.
In other words, the address of the instruction that caused
the fault (perhaps CALL), is what will be put into EIP
once the fault-handler returns with an IRET. The fault
itself occurs on a completely different stack.

If one were to implement paged stacks, then the page-fault handler
would have to be modified. Currently, the handler reads/writes
swap which, of course, it can't do with the interrupts disabled.

So one would have to use the concept of the 'free list' like
RSX-11 and VAX/VMS did. The "swapper" needs to keep some free
pages resident in memory and evict dirty pages whenever it can
to maintain this free list.
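
The free-list idea can be sketched as a small resident reserve that the fault path pops from without blocking, while a background task refills it. This is a toy model under stated assumptions, not RSX-11, VMS or Linux code: struct free_list, take_page, refill and RESERVE_PAGES are hypothetical names, and page frames are represented by plain integers.

```c
#include <assert.h>
#include <stddef.h>

#define RESERVE_PAGES 4   /* size of the resident free list (assumption) */

struct free_list {
    int pages[RESERVE_PAGES];   /* page frame numbers held in reserve */
    size_t count;
};

/* Fault path: must not block, so it only pops from the reserve.
 * Returns a page frame number, or -1 if the reserve is empty. */
static int take_page(struct free_list *fl)
{
    assert(fl != NULL);
    if (fl->count == 0)
        return -1;
    return fl->pages[--fl->count];
}

/* Background "swapper": tops the reserve back up when it is allowed to sleep.
 * Returns the number of pages added. */
static size_t refill(struct free_list *fl, int next_pfn)
{
    size_t added = 0;

    assert(fl != NULL);
    while (fl->count < RESERVE_PAGES) {
        fl->pages[fl->count++] = next_pfn++;
        added++;
    }
    return added;
}
```

The sketch also makes the limitation visible: if faults arrive faster than refill() runs, take_page() returns -1 and the fault cannot be satisfied without blocking.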

In the case of spin-locks, the 'flags' variable is the only
thing that can be stored in local (stack) data. The lock object
needs to be global so others can access it. The flags variable
will certainly be restored as will all other stack data even
if it was evicted to the disk.

> > the pages would not
> > have to be contiguous so dynamic stack allocation
> > would practically never fail. But Linux doesn't
> > use paged virtual memory for stacks. So, there
> > needs to be some rule to control the amount
> > of kernel stack allocated to each task when it
> > executes a system call.
> >
> > This means, in the limit, that there are two
> > possibilities:
> >
> > (1) Implement paged virtual memory for stack.
>
> As an exercise: subscribe to an NT kernel development mailing list, and see
> the fun they have when page-in code trips over a paged-out kernel text
> page. As a rule, even code cannot be made pageable without very involved
> and fragile analysis, to say nothing of the stack.
>

NT is a poor copy of VAX/VMS. The DIGITAL Project Engineer for
VAX/VMS was hired as a consultant by Microsoft to develop it.
If he had copied VAX/VMS, it would have been essentially bug-free
because VMS had been successful for many years. Unfortunately,
copying was illegal, immoral, and fattening. So, you get something
'like' VMS for NT. It has bugs, perhaps more than any other
OS. This does not mean that the proved concepts used by
VMS for about 20 years are not valid. Note that it was the
business model of DIGITAL that killed it, not its technology.

> Nikita.
>

I am not advocating "N"k stacks. I'm just explaining that it
can be done and has been done, so the response that says it
can't is wrong. Further, it may well be that the
"dragged-down" feeling that you get on NT and others when
there is a lot of activity is caused by this paging.
OTOH, Linux seems to work fine with no such dragged-down
response until, abruptly, it stops working by killing off
the very tasks that you needed to complete!

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.


2005-12-21 14:18:20

by Nikita Danilov

Subject: Re: About 4k kernel stack size....

linux-os (Dick Johnson) writes:
>
> On Tue, 20 Dec 2005, Nikita Danilov wrote:
>
> > linux-os \(Dick Johnson\) writes:
> > >
> >
> > [...]
> >
> > > See, isn't rule-making fun? This whole 4k stack-
> > > thing is really dumb. Other operating systems
> > > use paged virtual memory for stacks, except
> > > for the interrupt stack. If Linux used paged
> > > virtual memory for stacks,
> >
> > ... then spin-locks couldn't be held across function calls.
> >
>
> Sure they can! In ix86 machines the local 'cli' within the

Sure they cannot: one cannot schedule with a spin-lock held, and a major
page fault will block for I/O.

[...]

>
> NT is a poor copy of VAX/VMS. The DIGITAL Project Engineer for
> VAX/VMS was hired as a consultant by Microsoft to develop it.

Thank you, I know who D. Cutler is, and I have used RSX11/RT11/VMS, and
I am choosing Linux because of its technical superiority.

[...]

>

Nikita.

2005-12-21 14:25:15

by linux-os (Dick Johnson)

Subject: Re: About 4k kernel stack size....


On Wed, 21 Dec 2005, Nikita Danilov wrote:

> linux-os (Dick Johnson) writes:
> >
> > On Tue, 20 Dec 2005, Nikita Danilov wrote:
> >
> > > linux-os \(Dick Johnson\) writes:
> > > >
> > >
> > > [...]
> > >
> > > > See, isn't rule-making fun? This whole 4k stack-
> > > > thing is really dumb. Other operating systems
> > > > use paged virtual memory for stacks, except
> > > > for the interrupt stack. If Linux used paged
> > > > virtual memory for stacks,
> > >
> > > ... then spin-locks couldn't be held across function calls.
> > >
> >
> > Sure they can! In ix86 machines the local 'cli' within the
>
> Sure they cannot: one cannot schedule with spin-lock held, and major
> page fault will block for IO.
>
> [...]
>

Read the text you deleted and you will learn how.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
Warning : 98.36% of all statistics are fiction.
.


2005-12-21 15:13:29

by Giridhar Pemmasani

Subject: Re: About 4k kernel stack size....

Sean wrote:

> I for one hope those silly bastards using ndiswrapper fix up that code to
^^^^^^^^^^^^^^
It is despicable that some of the proponents of this 4k-only stack size have
resorted to such epithets. As I see it, people who rely on ndiswrapper
(since there is no other way they could use the hardware that they have)
will not be able to use their wireless cards once this patch is merged
unless they patch the kernel, yet only a few comments have been raised
about that. Other people have raised concerns not related to ndiswrapper.
Lumping everyone who raises a concern about this patch into one group and
calling them names is pathetic and despondent.

While I am at it, let me _repeat_ that ndiswrapper itself is 4k-ready and a
few Windows drivers work with 4k stacks. Supporting private stacks, as
some people have suggested, may be possible in theory, but it is _very
hard_, considering that Windows uses different calling conventions
(fastcall, stdcall, cdecl) and a driver can use more than one thread. It is
futile on this thread to suggest that someone come up with a patch to
implement private stacks in such an environment. And let me also repeat
that I have been working on implementing NDIS in user space, although there
are a few issues with that too.

Giri



2005-12-21 15:19:02

by Nikita Danilov

Subject: Re: About 4k kernel stack size....

linux-os (Dick Johnson) writes:
>
> On Wed, 21 Dec 2005, Nikita Danilov wrote:
>
> > linux-os (Dick Johnson) writes:
> > >
> > > On Tue, 20 Dec 2005, Nikita Danilov wrote:
> > >
> > > > linux-os \(Dick Johnson\) writes:
> > > > >
> > > >
> > > > [...]
> > > >
> > > > > See, isn't rule-making fun? This whole 4k stack-
> > > > > thing is really dumb. Other operating systems
> > > > > use paged virtual memory for stacks, except
> > > > > for the interrupt stack. If Linux used paged
> > > > > virtual memory for stacks,
> > > >
> > > > ... then spin-locks couldn't be held across function calls.
> > > >
> > >
> > > Sure they can! In ix86 machines the local 'cli' within the
> >
> > Sure they cannot: one cannot schedule with spin-lock held, and major
> > page fault will block for IO.
> >
> > [...]
> >
>
> Read the text you deleted and you will learn how.

I am afraid I'd better not:

- spin-locks do not imply disabled interrupts;

- how can the "swapper" guarantee that there are enough pages in the free
  list to satisfy stack page faults atomically? The only way is to keep
  a free page for each thread. But then it's so much easier to just use
  this reserved page for the stack from the very beginning.

Note that RSX/RT didn't have "kernel threads" at all: it was
implemented as a non-blocking state machine serving user requests on
per-cpu stacks (at least the pdp-15 versions).

>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.13.4 on an i686 machine (5589.55 BogoMips).
> Warning : 98.36% of all statistics are fiction.
> .

Nikita.

2005-12-21 15:37:16

by Kyle Moffett

Subject: Re: About 4k kernel stack size....

On Dec 21, 2005, at 10:07, Giridhar Pemmasani wrote:
> Sean wrote:
>
>> I for one hope those silly bastards using ndiswrapper fix up that
>> code to
> ^^^^^^^^^^^^^^
> It is despicable that some of the proponents of this 4k-only stack
> size have resorted to such epithets.


> As I see, although people that rely on ndiswrapper (since there is
> no other way they could use the hardware that they have) will not
> be able to use their wireless cards when this patch gets merged
> without having to patch the kernel, only a few comments have been
> raised about it.

Not true. This is (IIRC) the _third_ flamewar during which a large
proportion of the comments were either directly or indirectly one of
the following: "You are intentionally breaking ndiswrapper", "What's
wrong with having an 8k option?", and "This makes things more-fragile
or isn't well tested".

> There are other people that have raised concern not related to
> ndiswrapper. Branding everyone that is raising a concern about this
> patch into one group and calling them names is pathetic and
> despondent.

To date, all of the above concerns have been addressed:

"You are intentionally breaking ndiswrapper":
Yes, we know, and we don't care, because it's a bad solution and
already broken (12k Windows stacks, preempt, etc.). If it matters to
you, go fix it permanently (which I gather you are already trying to
do, good work).

"What's wrong with having an 8k option?":
It complicates the code, it means that bugs do not get reported
because disabling 4k (or enabling 8k) fixes it, and this is the only
excuse for the askers of the third question. It also makes things
more fragile because it means we need per-process order-1 allocations
instead of order-0 allocations. This option also makes crashes much
harder to reproduce depending on interrupt load and a wide variety of
other factors. This means that _users_ may see no problem with an 8k
option, but to the developers it's not such a great idea.

"This makes things more-fragile or isn't well tested.":
Not true! With the current situation in -mm, there are _0_
unresolved 4k-stacks bugs. If you have a problem, please report it
so we can get it fixed. However, since this does have technical
advantages (see above), we want to _force_ this option _in -mm_,
*precisely* so we can get it better tested and work out the few
remaining bugs. Furthermore, this does *not* change the overall
amount of stack! 8k == 4k + 4k!!! It only makes stack overflows
guaranteed and easy to debug in the specific call scenarios (instead
of making them probabilistic and hard to reproduce).
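
The order-1 versus order-0 point above can be illustrated with a toy
model. This is only a sketch, assuming a buddy-style allocator in which an
order-n request needs 2**n physically contiguous, naturally aligned page
frames; the function name and the frame numbers are made up for
illustration and are not kernel code.

```python
def can_allocate(free_frames, order):
    """Return True if an aligned block of 2**order contiguous free
    page frames exists in free_frames (a collection of frame numbers)."""
    size = 1 << order
    free = set(free_frames)
    return any(
        # A block must start on a multiple of its size and be fully free.
        start % size == 0 and all(start + i in free for i in range(size))
        for start in free
    )


# Hypothetical fragmented memory: five free frames, none forming an
# aligned pair (frames 1 and 2 are adjacent but not aligned).
fragmented = {1, 2, 5, 8, 11}
print(can_allocate(fragmented, 0))  # one 4k page (order-0)
print(can_allocate(fragmented, 1))  # two contiguous pages (order-1)
```

In this model a 4k stack can be satisfied as long as any single frame is
free, while an 8k stack also depends on how fragmented the free frames are.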

> And supporting private stacks, as some people have suggested, may
> be possible in theory, but it is _very hard_, considering that
> Windows uses different calling conventions (fastcall, stdcall,
> cdecl) and a driver can use more than one thread.

Windows drivers like that using more than one thread are basically
inherently racy under current Linux, and probably would not handle
preemption at all. If some mess like that breaks due to any in-
kernel change, you get to keep all 42 pieces.

> It is futile on this thread to suggest to someone to come up with a
> patch to implement private stacks in such an environment. And let
> me also repeat that I have been working on implementing NDIS in
> user space, although there are few issues with that too.

Great, you will probably make a lot of people happy with that.

Cheers,
Kyle Moffett

--
Simple things should be simple and complex things should be possible
-- Alan Kay



2005-12-21 16:30:46

by Giridhar Pemmasani

Subject: Re: About 4k kernel stack size....

Kyle Moffett wrote:

> Not true. This is (IIRC) the _third_ flamewar during which a large
> proportion of the comments were either directly or indirectly one of
> the following: "You are intentionally breaking ndiswrapper", "What's
> wrong with having an 8k option?", and "This makes things more-fragile
> or isn't well tested".

If you had paid any attention to the contents of my article, you would know
why it has become a flame war. I didn't discuss any technical issues
regarding the 4k-stack proposal, nor did I request a debatable summary of
what has been going on with this proposal. I didn't have anything new to
discuss, and neither does the rest of your article. I objected to some
people (including you) who, by calling people names, are the reason a valid
discussion is turning into a flame war.

> Windows drivers like that using more than one thread are basically
> inherently racy under current Linux, and probably would not handle
> preemption at all. If some mess like that breaks due to any in-
> kernel change, you get to keep all 42 pieces.

Take a look at it to understand before commenting on it. ndiswrapper works
fine with preemption and even SMP with certain drivers. It may not work
with SMP with certain drivers, because I don't have hardware to test and
understand the issues. If you have any concerns about ndiswrapper, raise
them on ndiswrapper's mailing list.

> Great, you will probably make a lot of people happy with that.

Again. I am a developer of ndiswrapper and I am doing what I can so people
have a way of using many wireless cards that don't have open source
projects. Just because you don't agree on "moral issues" and what not about
ndiswrapper doesn't mean you have to force users to give up what they may
consider important. Besides, ndiswrapper is about choice; if someone like
you doesn't want to use ndiswrapper, no one is forcing you to, but there
are plenty of users that are aware of issues with using ndiswrapper that
are comfortable with it. I am not interested in discussing this further,
lest it be perceived as stoking a flame war. I would rather focus on
constructive issues and help others (as I see it, anyway).

Giri


2005-12-21 16:46:50

by Christoph Hellwig

Subject: Re: About 4k kernel stack size....

Please sod off with your ndiswrapper whining. It's entirely offtopic here.

2005-12-21 20:14:35

by Krzysztof Halasa

Subject: Re: About 4k kernel stack size....

Giridhar Pemmasani writes:

> As I see, although people that rely on
> ndiswrapper (since there is no other way they could use the hardware that
> they have) will not be able to use their wireless cards when this patch
> gets merged without having to patch the kernel

Huh? -mm is already a patch, so I'm not sure which users you are talking
about. End-users (non-developers) using an -mm kernel (binary?) provided
by their distribution? Why would they want to use ndiswrapper (= binary
drivers, which make all bug reports and in fact all development
pointless) with a devel kernel?
--
Krzysztof Halasa

2005-12-21 21:39:30

by linux-os (Dick Johnson)

Subject: Re: About 4k kernel stack size....


On Wed, 21 Dec 2005, Krzysztof Halasa wrote:

> Giridhar Pemmasani writes:
>
>> As I see, although people that rely on
>> ndiswrapper (since there is no other way they could use the hardware that
>> they have) will not be able to use their wireless cards when this patch
>> gets merged without having to patch the kernel
>
> Huh? -mm is already a patch so I'm not sure what users are you talking
> about. End-users (non-developers) using -mm kernel (binary?) provided
> by their distribution? Why would they want to use ndiswrapper (= binary
> drivers, which make all bug reports and in fact all development
> pointless) with devel kernel?
> --
> Krzysztof Halasa

The attached patch will poison the user-to-kernel stack with the letter
'Q', starting on each page boundary. It does one page only so it will
work with any sized stack. One can run the machine with the usual
work, then do:

# cat /dev/mem | strings >junk.dat

Somebody, if interested, could make a program that reads /dev/kmem and
looks for a string of 'Q's starting on each page boundary. Anyway,
using `vim` and searching for "QQQQQQQQQQQQQQ", you can see how
much of the existing stack is being used. A cursory check shows that,
out of every instance of a string of such poison, the smallest string
was about 48 bytes and the largest was too long to bother counting.
This shows that there is (probably) about 48 bytes of overhead when
using 1 page stacks.
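
A program of the kind suggested above might look roughly like the
following. It is a minimal sketch, not a tested tool: it assumes 4096-byte
pages, works on a memory dump already read into a bytes object, and treats
any page that begins with a long enough run of 'Q' as a poisoned stack
page. The names `poison_run_lengths` and `min_run` are hypothetical, and
the sketch ignores anything the kernel itself keeps at the bottom of a
stack page.

```python
PAGE_SIZE = 4096  # assumed page size (i386)
POISON = b"Q"     # poison byte written by the patch


def poison_run_lengths(dump, page_size=PAGE_SIZE, min_run=16):
    """Map page offset -> number of consecutive poison bytes found at
    the start of that page, for pages with at least min_run of them."""
    runs = {}
    for base in range(0, len(dump), page_size):
        page = dump[base:base + page_size]
        # Leading poison bytes = page length minus what is left after
        # stripping them from the front.
        run = len(page) - len(page.lstrip(POISON))
        if run >= min_run:
            runs[base] = run
    return runs


# Synthetic three-page dump: 100 untouched bytes, an unrelated page,
# and a fully untouched page.
dump = b"Q" * 100 + b"x" * (PAGE_SIZE - 100) + b"x" * PAGE_SIZE + b"Q" * PAGE_SIZE
for offset, run in sorted(poison_run_lengths(dump).items()):
    print(hex(offset), run)
```

The smallest run reported would correspond to the deepest stack usage
seen since the page was last poisoned.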


The mailer mangles patches that are not attached, but here is a copy
inline for review. It is running on this system so
it doesn't break anything (probably slows syscalls down, though).

--- linux-2.6.13.4/arch/i386/kernel/entry.S.orig 2005-12-21 15:29:05.000000000 -0500
+++ linux-2.6.13.4/arch/i386/kernel/entry.S 2005-12-21 16:09:08.000000000 -0500
@@ -75,6 +75,27 @@
NT_MASK = 0x00004000
VM_MASK = 0x00020000

+poison:
+ pushl %eax
+ pushl %ecx
+ pushl %edi
+ pushl %es
+ pushl %ss
+ popl %es
+ movl %esp, %edi
+ movl %edi, %ecx
+ andl $~0x1000, %edi
+ subl %edi, %ecx
+ movb $'Q', %al
+ rep stosb
+ popl %es
+ popl %edi
+ popl %ecx
+ popl %eax
+ ret
+
+
+
#ifdef CONFIG_PREEMPT
#define preempt_stop cli
#else
@@ -93,6 +114,7 @@
pushl %edx; \
pushl %ecx; \
pushl %ebx; \
+ call poison; \
movl $(__USER_DS), %edx; \
movl %edx, %ds; \
movl %edx, %es;
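
The address arithmetic in the `poison` routine can be checked with a few
lines of Python. This is only a model of the `andl`/`subl` pair feeding
`rep stosb`, with made-up stack pointer values. It suggests that the mask
as posted, `~0x1000`, clears bit 12 only, so the fill length comes out as
either 0 or 4096 bytes, whereas a mask of `~0xfff` would round down to the
start of the 4k page. Whether that was the intent is an assumption here.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide on i386


def fill_range(esp, mask):
    """Model 'movl %esp,%edi; movl %edi,%ecx; andl $mask,%edi;
    subl %edi,%ecx': return (start address, byte count) for rep stosb."""
    start = esp & mask & MASK32
    return start, esp - start


# Hypothetical stack pointers, one with bit 12 set and one without.
for esp in (0xC1235F40, 0xC1234F40):
    for mask in (~0x1000, ~0xFFF):
        start, count = fill_range(esp, mask)
        print(hex(esp), hex(mask & MASK32), hex(start), count)
```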



Cheers,
Dick Johnson
Penguin : Linux version 2.6.13.4 on an i686 machine (5591.09 BogoMips).
Warning : 98.36% of all statistics are fiction.
.



Attachments:
entry.patch (725.00 B)

2005-12-21 22:50:23

by Jan Engelhardt

Subject: Re: About 4k kernel stack size....

> > > > > > See, isn't rule-making fun? This whole 4k stack-
> > > > > > thing is really dumb. Other operating systems
> > > > > > use paged virtual memory for stacks, except
> > > > > > for the interrupt stack. If Linux used paged
> > > > > > virtual memory for stacks,
> > > > >
> > > > > ... then spin-locks couldn't be held across function calls.
> > > >
> > > > Sure they can! In ix86 machines the local 'cli' within the
> > >
> > > Sure they cannot: one cannot schedule with spin-lock held, and major
> > > page fault will block for IO.
> > > [...]
> > [...]
> [...]

Without me knowing every single detail of this matter, just try to hold a
mutex over function calls in the BSD kernel. While you can acquire a mutex
(=spinlock) (local to the module implementing the chardev) in e.g. the
open() routine of a chardev in Linux, and release it upon close(), you'll
get a segfault on BSD. OK, Linux has nothing to do with BSD, but that's
what I remember from porting some code, and it resembles what is discussed
above.
(http://unix.derkeiler.com/Mailing-Lists/FreeBSD/hackers/2004-12/0337.html)



Jan Engelhardt
--

2005-12-22 09:14:11

by Romano Giannetti

Subject: Re: About 4k kernel stack size....

On Wed, Dec 21, 2005 at 10:07:01AM -0500, Giridhar Pemmasani wrote:
> Sean wrote:
>
> > I for one hope those silly bastards using ndiswrapper fix up that code to
> ^^^^^^^^^^^^^^
> It is despicable that some of the proponents of this 4k-only stack size have
> resorted to such epithets.

Yes, a bit sad it is.

Being one of those silly bastards who have to use ndiswrapper to connect to
my Uni wifi, I want to publicly thank Giri for his work. I'd love a
world where wifi cards have native Linux open source drivers, but _now_
my only other option is to boot into Windows --- which I do not even have on
my laptop.

Nevertheless, I think that if the bulk of kernel developers feel the need to
go to a 4k kernel stack, they probably have *technical* reasons to do it
(which I, with my limited understanding, cannot discuss). That means that
ndiswrapper will need a kernel patch; that's not a problem for me
nor for the majority of people on this list.

For the other people, let the distribution maintainers decide. I see
ndiswrapper nicely integrated in a lot of distros now, and I really do not
see Mandriva or whoever dropping almost all of the "wifi compatible"
hardware list. Maybe they will be pushed to develop open source drivers (I
hope). Maybe they will simply ship a patched kernel. Let's see.

Romano


--
Romano Giannetti - Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416 fax +34 915 596 569
http://www.dea.icai.upcomillas.es/romano/