2005-02-23 23:12:58

by Chad N. Tindel

Subject: Xterm Hangs - Possible scheduler defect?

Hello-

We have hit a defect where an exiting xterm process will hang. This is running
on a 2-cpu IA-64 box. We have a multithreaded application, where one thread
is SCHED_FIFO and is running with priority 98, and the other thread is just
a normal SCHED_OTHER thread. The SCHED_FIFO thread is in a CPU bound tight
loop, but I wouldn't expect that to cause problems since there are 2 CPUs.
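
For anyone who wants to reproduce the setup described above, here is a minimal sketch; it is not the original application. It assumes Linux and Python 3. The names try_set_fifo, spin and run_demo are made up for illustration, and the spin is deliberately bounded so the sketch cannot wedge a CPU. Without root (or CAP_SYS_NICE) the scheduler call fails and the thread simply stays SCHED_OTHER.

```python
import os
import threading
import time


def try_set_fifo(priority):
    """Try to make the calling thread SCHED_FIFO; return True on success."""
    try:
        # On Linux, pid 0 means "the calling thread".
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
        return True
    except (OSError, AttributeError):
        # Not privileged, or not a Linux build of Python: stay as we are.
        return False


def spin(seconds, priority, result):
    """CPU-bound loop: never sleeps or blocks until the deadline passes."""
    result["fifo"] = try_set_fifo(priority)
    deadline = time.monotonic() + seconds
    loops = 0
    while time.monotonic() < deadline:
        loops += 1
    result["loops"] = loops


def run_demo(seconds=0.2, priority=98):
    """Start one would-be SCHED_FIFO hog next to the normal main thread."""
    result = {}
    hog = threading.Thread(target=spin, args=(seconds, priority, result))
    hog.start()
    hog.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

To observe the hang itself you would have to run as root and raise `seconds` well beyond the time it takes to close an xterm; that is left out on purpose.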

However, it does seem to cause some problems. For example, if you ssh into
the system and run an Xterm using X11 forwarding, when you type "exit" in
the xterm window, the window hangs and doesn't close. Killing the CPU-bound
app causes the window to exit immediately. The sysrq output shows the
following:

xterm D a0000001000bef60 0 2905 2876 (NOTLB)

Call Trace:
[<a0000001004ac480>] schedule+0xca0/0x1300
sp=e000000012257d20 bsp=e000000012251080
[<a0000001000bef60>] flush_cpu_workqueue+0x1a0/0x4a0
sp=e000000012257d30 bsp=e000000012251020
[<a0000001000bf360>] flush_workqueue+0x100/0x160
sp=e000000012257d90 bsp=e000000012250fe8
[<a0000001000bfd60>] flush_scheduled_work+0x20/0x40
sp=e000000012257d90 bsp=e000000012250fd0
[<a0000001002e2060>] release_dev+0x8e0/0x1100
sp=e000000012257d90 bsp=e000000012250f20
[<a0000001002e3350>] tty_release+0x30/0x60
sp=e000000012257e30 bsp=e000000012250ef8
[<a00000010012d430>] __fput+0x330/0x340
sp=e000000012257e30 bsp=e000000012250ea8
[<a00000010012d0e0>] fput+0x40/0x60
sp=e000000012257e30 bsp=e000000012250e88
[<a00000010012a1b0>] filp_close+0xd0/0x160
sp=e000000012257e30 bsp=e000000012250e58
[<a00000010012a380>] sys_close+0x140/0x1a0
sp=e000000012257e30 bsp=e000000012250dd8
[<a00000010000aba0>] ia64_ret_from_syscall+0x0/0x20
sp=e000000012257e30 bsp=e000000012250dd8

So it would appear that xterm is hung in close() trying to shut down a tty.
The comment says that it is calling flush_scheduled_work() to
"Wait for ->hangup_work and ->flip.work handlers to terminate". Perhaps there
is some locking issue that is causing these to not run and complete?

I'm a bit out of my space here... does anybody have any ideas? I've tried
this on both 2.6.8 and 2.6.10 with the same problem resulting.

Please make sure to CC me in any responses.

Regards,

Chad


2005-02-24 02:38:45

by Andrew Morton

Subject: Re: Xterm Hangs - Possible scheduler defect?

"Chad N. Tindel" <[email protected]> wrote:
>
> We have hit a defect where an exiting xterm process will hang. This is running
> on a 2-cpu IA-64 box. We have a multithreaded application, where one thread
> is SCHED_FIFO and is running with priority 98, and the other thread is just
> a normal SCHED_OTHER thread. The SCHED_FIFO thread is in a CPU bound tight
> loop, but I wouldn't expect that to cause problems since there are 2 CPUs.
>
> However, it does seem to cause some problems. For example, if you ssh into
> the system and run an Xterm using X11 forwarding, when you type "exit" in
> the xterm window, the window hangs and doesn't close. Killing the CPU-bound
> app causes the window to exit immediately. The sysrq output shows the
> following:
>
> xterm D a0000001000bef60 0 2905 2876 (NOTLB)
>
> Call Trace:
> [<a0000001004ac480>] schedule+0xca0/0x1300
> sp=e000000012257d20 bsp=e000000012251080
> [<a0000001000bef60>] flush_cpu_workqueue+0x1a0/0x4a0
> sp=e000000012257d30 bsp=e000000012251020
> [<a0000001000bf360>] flush_workqueue+0x100/0x160
> sp=e000000012257d90 bsp=e000000012250fe8
> [<a0000001000bfd60>] flush_scheduled_work+0x20/0x40
> sp=e000000012257d90 bsp=e000000012250fd0
> [<a0000001002e2060>] release_dev+0x8e0/0x1100
> sp=e000000012257d90 bsp=e000000012250f20
> [<a0000001002e3350>] tty_release+0x30/0x60
> sp=e000000012257e30 bsp=e000000012250ef8
> [<a00000010012d430>] __fput+0x330/0x340
> sp=e000000012257e30 bsp=e000000012250ea8
> [<a00000010012d0e0>] fput+0x40/0x60
> sp=e000000012257e30 bsp=e000000012250e88
> [<a00000010012a1b0>] filp_close+0xd0/0x160
> sp=e000000012257e30 bsp=e000000012250e58
> [<a00000010012a380>] sys_close+0x140/0x1a0
> sp=e000000012257e30 bsp=e000000012250dd8
> [<a00000010000aba0>] ia64_ret_from_syscall+0x0/0x20
> sp=e000000012257e30 bsp=e000000012250dd8
>
> So it would appear that xterm is hung in close() trying to shut down a tty.
> The comment says that it is calling flush_scheduled_work() to
> "Wait for ->hangup_work and ->flip.work handlers to terminate". Perhaps there
> is some locking issue that is causing these to not run and complete?

`xterm' is waiting for the other CPU to schedule a kernel thread (which is
bound to that CPU). Once that kernel thread has done a little bit of work,
`xterm' can terminate.

But kernel threads don't run with realtime policy, so your userspace app
has permanently starved that kernel thread.
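
One way to check which policy the kernel threads on a given machine actually run with is sketched below. It assumes Linux with /proc mounted; parse_stat and kernel_thread_policies are hypothetical helper names. Kernel threads are recognised by the PF_KTHREAD bit in the flags field of /proc/<pid>/stat. Which threads show up as realtime will differ between kernel versions, so treat the output as a description of your own box only.

```python
import os

PF_KTHREAD = 0x00200000  # task flag the kernel sets on kernel threads


def parse_stat(stat_line):
    """Return (pid, comm, flags) from the text of one /proc/<pid>/stat file."""
    lparen = stat_line.index("(")
    rparen = stat_line.rindex(")")   # comm may itself contain ')' or spaces
    pid = int(stat_line[:lparen])
    comm = stat_line[lparen + 1:rparen]
    rest = stat_line[rparen + 2:].split()   # rest[0] is the state (field 3)
    flags = int(rest[6])                    # field 9 of the stat line
    return pid, comm, flags


def kernel_thread_policies(proc="/proc"):
    """List (pid, name, policy) for every kernel thread we can see."""
    names = {getattr(os, n): n
             for n in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR",
                       "SCHED_BATCH", "SCHED_IDLE") if hasattr(os, n)}
    found = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return found
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, flags = parse_stat(f.read())
            if flags & PF_KTHREAD:
                policy = os.sched_getscheduler(pid)
                found.append((pid, comm, names.get(policy, str(policy))))
        except (OSError, ValueError, IndexError):
            continue   # task exited or could not be read
    return sorted(found)


if __name__ == "__main__":
    for pid, comm, policy in kernel_thread_policies():
        print(pid, comm, policy)
```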

It's potentially quite a problem, really. For example it could prevent
various tty operations from completing, it will prevent kjournald from ever
writing back anything (on uniprocessor, etc). I've been waiting for
someone to complain ;)

But the other side of the coin is that a SCHED_FIFO userspace task
presumably has extreme latency requirements, so it doesn't *want* to be
preempted by some routine kernel operation. People would get irritated if
we were to do that.

So what to do?

2005-02-24 05:23:40

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> `xterm' is waiting for the other CPU to schedule a kernel thread (which is
> bound to that CPU). Once that kernel thread has done a little bit of work,
> `xterm' can terminate.
>
> But kernel threads don't run with realtime policy, so your userspace app
> has permanently starved that kernel thread.
>
> It's potentially quite a problem, really. For example it could prevent
> various tty operations from completing, it will prevent kjournald from ever
> writing back anything (on uniprocessor, etc). I've been waiting for
> someone to complain ;)
>
> But the other side of the coin is that a SCHED_FIFO userspace task
> presumably has extreme latency requirements, so it doesn't *want* to be
> preempted by some routine kernel operation. People would get irritated if
> we were to do that.
>
> So what to do?

It shouldn't need to preempt the kernel operation. Why is the design such that
the necessary kernel thread can't run on the other CPU?

Chad

2005-02-24 05:26:33

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> But the other side of the coin is that a SCHED_FIFO userspace task
> presumably has extreme latency requirements, so it doesn't *want* to be
> preempted by some routine kernel operation. People would get irritated if
> we were to do that.

Just to follow up a bit. People writing apps that run at SCHED_FIFO know
that they aren't getting hard real-time, and they are OK with that. If they
wanted something more they'd run on RTLinux. Why would it be wrong to preempt
the SCHED_FIFO process in this case, assuming that it is too hard to fix a broken
design that doesn't allow the necessary kernel threads to run on any CPU?

Chad

2005-02-24 06:51:13

by Andrew Morton

Subject: Re: Xterm Hangs - Possible scheduler defect?

"Chad N. Tindel" <[email protected]> wrote:
>
> > `xterm' is waiting for the other CPU to schedule a kernel thread (which is
> > bound to that CPU). Once that kernel thread has done a little bit of work,
> > `xterm' can terminate.
> >
> > But kernel threads don't run with realtime policy, so your userspace app
> > has permanently starved that kernel thread.
> >
> > It's potentially quite a problem, really. For example it could prevent
> > various tty operations from completing, it will prevent kjournald from ever
> > writing back anything (on uniprocessor, etc). I've been waiting for
> > someone to complain ;)
> >
> > But the other side of the coin is that a SCHED_FIFO userspace task
> > presumably has extreme latency requirements, so it doesn't *want* to be
> > preempted by some routine kernel operation. People would get irritated if
> > we were to do that.
> >
> > So what to do?
>
> It shouldn't need to preempt the kernel operation. Why is the design such that
> the necessary kernel thread can't run on the other CPU?
>

This particular kernel function is implemented via a kernel thread per CPU,
with each thread bound to each CPU. The xterm-does-exit cleanup code is
waiting for the thread which is bound to the busy CPU to do something.

No other CPU can, or is allowed to, do that thread's work. If it were to
do so, the implicit locking which we get from the per-cpuness would be
violated.

I don't know if any clients of the workqueue code rely upon the
pinned-to-cpu feature.
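
The behaviour described above can be illustrated with a toy model. This is a user-space simulation, not the kernel's workqueue code: PerCpuWorkqueue, cpu_free and the timeout on flush() are all invented for the demonstration (the real flush waits without a timeout, which is why xterm sits in D state). Each simulated CPU has its own queue and its own worker, and flush() has to wait for every one of them.

```python
import queue
import threading


class PerCpuWorkqueue:
    """Toy model: one worker thread and one queue per simulated CPU."""

    def __init__(self, ncpus):
        self.queues = [queue.Queue() for _ in range(ncpus)]
        # A cleared event models "this CPU is monopolised, its worker
        # never gets to run".
        self.cpu_free = [threading.Event() for _ in range(ncpus)]
        for event in self.cpu_free:
            event.set()
        for cpu in range(ncpus):
            threading.Thread(target=self._worker, args=(cpu,),
                             daemon=True).start()

    def _worker(self, cpu):
        while True:
            item = self.queues[cpu].get()
            self.cpu_free[cpu].wait()   # starved while the CPU is hogged
            item.set()                  # "run" the work item: mark it done

    def flush(self, timeout):
        """Queue a marker on every CPU and wait for all of them.

        Returns False if some CPU's worker did not get to its marker
        within `timeout` seconds.
        """
        markers = []
        for q in self.queues:
            done = threading.Event()
            q.put(done)
            markers.append(done)
        return all(marker.wait(timeout) for marker in markers)


if __name__ == "__main__":
    wq = PerCpuWorkqueue(2)
    print(wq.flush(0.5))     # both workers can run
    wq.cpu_free[1].clear()   # CPU 1 is taken over by a hog
    print(wq.flush(0.5))     # stalls on CPU 1's marker
    wq.cpu_free[1].set()     # hog killed
    print(wq.flush(0.5))     # completes again
```

Note that the idle worker on CPU 0 cannot help: the marker sits in CPU 1's queue, and only CPU 1's worker ever looks there.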

2005-02-24 13:24:29

by Helge Hafting

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chad N. Tindel wrote:

>>But the other side of the coin is that a SCHED_FIFO userspace task
>>presumably has extreme latency requirements, so it doesn't *want* to be
>>preempted by some routine kernel operation. People would get irritated if
>>we were to do that.
>>
>>
>
>Just to follow up a bit. People writing apps that run at SCHED_FIFO know
>that they aren't getting hard real-time, and they are OK with that. If they
>wanted something more they'd run on RTLinux. Why would it be wrong to preempt
>the SCHED_FIFO process in this case, assuming that it is too hard to fix a broken
>design that doesn't allow the necessary kernel threads to run on any CPU?
>
>
Why would anyone write a thread that uses exactly 100% of one cpu?
It seems wrong to me. It is unsafe if they really need that much
processing power, what if an interrupt happens? Then they get slightly less.
If they got a 10% faster cpu, would this thread suddenly drop to 90%
usage (and no problems with kernel threads)?
If it stays at 100% then that suggests they are using some
sort of busy waiting, which is bad programming. Get a better hw
device that will provide an interrupt at the right time, and write a
driver for that. Or find some spot in the code where a small delay is
acceptable, and set a short timer and sleep on it. It doesn't take much
to get this kernel thread going.
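
The "short timer and sleep" suggestion might look roughly like the sketch below. poll_until and its parameters are hypothetical, and the burst and nap lengths are arbitrary illustration values; the only point is that the thread blocks briefly at regular intervals, which gives lower-priority tasks on the same CPU a window to run.

```python
import time


def poll_until(ready, burst=0.010, nap=0.001, timeout=1.0):
    """Busy-poll ready() in bursts, sleeping `nap` seconds between bursts.

    Returns (result, naps): whether ready() became true before `timeout`,
    and how many times the caller voluntarily gave up the CPU.
    """
    deadline = time.monotonic() + timeout
    naps = 0
    while True:
        burst_end = min(time.monotonic() + burst, deadline)
        while time.monotonic() < burst_end:
            if ready():
                return True, naps
        if time.monotonic() >= deadline:
            return False, naps
        time.sleep(nap)   # blocks: other tasks on this CPU can run now
        naps += 1


if __name__ == "__main__":
    start = time.monotonic()
    print(poll_until(lambda: time.monotonic() - start > 0.03))
```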

Helge Hafting

2005-02-24 17:34:03

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> Why would anyone write a thread that uses exactly 100% of one cpu?
> It seems wrong to me. It is unsafe if they really need that much
> processing power, what if an interrupt happens? Then they get slightly less.
> If they got a 10% faster cpu, would this thread suddenly drop to 90%
> usage (and no problems with kernel threads)?
> If it stays at 100% then that suggests they are using some
> sort of busy waiting, which is bad programming. Get a better hw
> device that will provide an interrupt at the right time, and write a
> driver for that. Or find some spot in the code where a small delay is
> acceptable, and set a short timer and sleep on it. It doesn't take much
> to get this kernel thread going.

I would make the following assertion for any kernel:

No single userspace thread of execution running on an SMP system should be
able to hose a box by going CPU-bound, bug in the software or no bug. Any
kernel should be able to handle this case and shift general work over to
other processors.

While I can't speak for all commercial Unixes, I know that HP-UX handles this
case just fine. I'd be extremely surprised if Solaris and AIX didn't handle
it fine too. What I can't understand is why you want to cop-out and say
"Oh well this is just a bug in the application... the programmer shouldn't
shoot himself in the foot." If that were the attitude that kernel programmers
had, why have the kernel send SIGSEGV when applications reference invalid
memory? Why not just let them corrupt the memory of other apps and possibly
bring the whole system down?

It is the kernel's job to protect itself and userspace applications from
runaway applications whenever possible. While this might not be possible for
this case on a UP box, it certainly is for an SMP box.

Chad

2005-02-24 17:54:56

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> > Hmmm... Are you suggesting it is OK for a kernel to get nearly completely
> > hosed and to not fully utilize all the processors in the system because
> > of one SCHED_FIFO thread?
>
> Sure. You specifically directed the scheduler to run your thread at a
> higher priority than anything else. The way I see it, you used root's
> prerogative to shoot himself in the foot. You could also have used root's
> prerogative to don steel-toed shoes (set important kernel threads to a higher
> priority) before pulling the trigger.

No, I specifically directed the scheduler to run my thread at a higher
priority than any other userspace application. The fact that I wrote it
in userspace and not in kernel space implies that I am OK with the kernel
stopping me sometimes when _it_ has work to do. If I wanted something
higher priority than the kernel I would have written something in kernel
space instead.

> > SCHED_FIFO threads are supposed to preempt
> > all other userspace threads... not the kernel itself.
>
> Not so. The scheduler makes no distinction between user and kernel threads
> of execution.

That is SOOOO broken it isn't even funny.

> If you think that's broken, you'll _love_ Ingo's IRQ threads. While testing
> one of his recent (slightly buggy) unprivileged-user-does-RT patches in an
> IRQ threadified kernel, I ran a user SCHED_FIFO task at higher than the IRQ0
> thread... if my box had been an embedded washing machine controller instead
> of a desktop box, it'd have been a classic case of "No tickie no washie" :))

Yeah, that's broken too.

Perhaps I don't understand this philosophy you have where the kernel
isn't more important than everything else. It seems to me like there needs
to be a rigid hierarchy for scheduling, lest you get into deadlock problems:

1. Kernel preempts all. There may be some hierarchy of kernel priorities
too, but it isn't important here.
2. SCHED_FIFO processes preempt all userspace applications.
3. SCHED_RR.
4. SCHED_OTHER.

Under no circumstances should any single CPU-bound userspace thread completely
hose a 64-way SMP box.

Can somebody educate me on why it is correct to do it any other way?

Chad

2005-02-24 18:20:27

by Chris Friesen

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chad N. Tindel wrote:

> 1. Kernel preempts all. There may be some hierarchy of kernel priorities
> too, but it isn't important here.
> 2. SCHED_FIFO processes preempt all userspace applications.
> 3. SCHED_RR.
> 4. SCHED_OTHER.
>
> Under no circumstances should any single CPU-bound userspace thread completely
> hose a 64-way SMP box.
>
> Can somebody educate me on why it is correct to do it any other way?

Low-latency userspace apps. The audio guys, for instance, are trying to
get latencies down to the 100us range.

If random kernel threads can preempt userspace at any time, they wreak
havoc with latency as seen by userspace.

Chris

2005-02-24 18:40:47

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> Low-latency userspace apps. The audio guys, for instance, are trying to
> get latencies down to the 100us range.
>
> If random kernel threads can preempt userspace at any time, they wreak
> havoc with latency as seen by userspace.

Come now. There is no such thing as a random kernel thread. Any General
Purpose kernel needs the ability to do work that keeps the entire system from
grinding to a halt.

Chad

2005-02-24 19:04:34

by Paulo Marques

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chad N. Tindel wrote:
>>Low-latency userspace apps. The audio guys, for instance, are trying to
>>get latencies down to the 100us range.
>>
>>If random kernel threads can preempt userspace at any time, they wreak
>>havoc with latency as seen by userspace.
>
>
> Come now. There is no such thing as a random kernel thread. Any General
> Purpose kernel needs the ability to do work that keeps the entire system from
> grinding to a halt.

FYI most kernel threads do background work, that doesn't have hard
real-time constraints. Why should my audio recording session get
interrupted (read: "sent to the trashcan") just because the swap daemon
decided that it was a good time to write some pages out? Couldn't it
have waited just a few more milliseconds?

You don't seem to realize that you have just arrived on this mailing
list and missed years of discussions on kernel architecture.

If you keep a learning attitude, there is a chance for this discussion
to go on. However, if you keep the "Come now, don't bullshit me, this is
a broken architecture and you're just trying to cover up" attitude,
you're just going to get discarded as a troll.

I personally like the linux way: "root has the ability to shoot himself
in the foot if he wants to". This is my computer, damn it, I am the one
who tells it what to do.

This is much, much better than the "users are stupid, we must protect
them from themselves" kind of way that other OS'es use.

Just my 0.02 euros,

--
Paulo Marques - http://www.grupopie.com

All that is necessary for the triumph of evil is that good men do nothing.
Edmund Burke (1729 - 1797)

2005-02-24 19:22:53

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> If you keep a learning attitude, there is a chance for this discussion
> to go on. However, if you keep the "Come now, don't bullshit me, this is
> a broken architecture and you're just trying to cover up" attitude,
> you're just going to get discarded as a troll.

I'm not trying to troll here; I suppose I'm just coming from a different
background. I'll try to adjust my tone.

> I personally like the linux way: "root has the ability to shoot himself
> in the foot if he wants to". This is my computer, damn it, I am the one
> who tells it what to do.

I'm all for allowing people to shoot themselves in the foot. That doesn't
mean that it is OK for a single userspace thread to mess up a 64-way box.

> This is much, much better than the "users are stupid, we must protect
> them from themselves" kind of way that other OS'es use.

Isn't this what the kernel attempts to do in many other places? What else
could possibly be the point of sending SIGSEGV and causing applications
to dump core when they make a mistake referencing memory? Isn't it the
kernel's job to protect one application from another?

I see what you're saying about the swap daemon. How about if I make my
statement less black and white. Some kernel threads should always have
priority over userspace.

I believe the mindset required for a home system that is doing audio recordings
is different than the one for enterprise-level systems. How do we unify
the two?

Chad

2005-02-24 19:46:55

by Chris Friesen

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chad N. Tindel wrote:

>>I personally like the linux way: "root has the ability to shoot himself
>>in the foot if he wants to". This is my computer, damn it, I am the one
>>who tells it what to do.

> I'm all for allowing people to shoot themselves in the foot. That doesn't
> mean that it is OK for a single userspace thread to mess up a 64-way box.

If root has explicitly stated that the thread in question is the highest
priority thing on the entire machine, why would it not be okay? The
fact that root made a mistake is the issue here, not that the system
didn't protect itself.

>>This is much, much better than the "users are stupid, we must protect
>>them from themselves" kind of way that other OS'es use.

> Isn't this what the kernel attempts to do in many other places? What else
> could possibly be the point of sending SIGSEGV and causing applications
> to dump core when they make a mistake referencing memory? Isn't it the
> kernel's job to protect one application from another?

Yes. But at the same time, if root wants to do something, the kernel
should let them. There are many, many ways that root could trash the
system. "cat /dev/urandom > /dev/kcore" would do quite nicely.

> I see what you're saying about the swap daemon. How about if I make my
> statement less black and white. Some kernel threads should always have
> priority over userspace.

Not necessarily. The latest latency-reduction patches allow root to
specify exactly what the priorities should be.

> I believe the mindset required for a home system that is doing audio recordings
> is different than the one for enterprise-level systems. How do we unify
> the two?

There are professionals who use linux for audio as well, it's not just
home systems. That said, you unify them with reasonable defaults, and
the ability for root to override them.

Chris

2005-02-24 19:53:08

by Barry K. Nathan

Subject: Re: Xterm Hangs - Possible scheduler defect?

> > This is much, much better than the "users are stupid, we must protect
> > them from themselves" kind of way that other OS'es use.
>
> Isn't this what the kernel attempts to do in many other places? What else
> could possibly be the point of sending SIGSEGV and causing applications
> to dump core when they make a mistake referencing memory? Isn't it the
> kernel's job to protect one application from another?

A related example: Typically, when a program (even when running as
root) attempts to access I/O ports directly, it gets killed as you
describe. However, the X server, running as root, can use ioperm or iopl
to request permission to access the video card's I/O ports directly. When
it gets that permission, it can do that and no longer gets killed. It
also means the X server is capable of bringing the entire system down via
errant I/O port accesses if it wishes (or if it misbehaves).

The general philosophy is to protect one application from another,
unless an application has been specifically granted sufficient power
to wreck the system.


I don't remember off the top of my head whether SCHED_FIFO tasks are
supposed to be able to take SMP systems down, if the # of SCHED_FIFO
tasks is less than the # of CPU's. I imagine someone has thought about
this in the past and answered the question one way or another, but I
don't happen to know the answer.

-Barry K. Nathan <[email protected]>

2005-02-24 20:08:32

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> >I'm all for allowing people to shoot themselves in the foot. That doesn't
> >mean that it is OK for a single userspace thread to mess up a 64-way box.
>
> If root has explicitly stated that the thread in question is the highest
> priority thing on the entire machine, why would it not be okay? The
> fact that root made a mistake is the issue here, not that the system
> didn't protect itself.

Yeah, I realized when I left for lunch that this statement wasn't as clear
as I would like it to be.

I think what we have is the need for two levels of applications:

1. That which wishes to be the highest priority userspace application, and
wishes to preempt all other userspace applications. Such an application is
OK being preempted by the kernel when the kernel needs to do work. IMHO, this
should be the default behavior for any SCHED_FIFO application. If one of these
has a bug and goes CPU-bound, the worst it can do is prevent other apps from
ever using the CPU it is on.

2. Applications which actually want to be the highest priority thing on
the system, including being higher than the kernel. These applications are
OK with the fact that they may cause system hangs and deadlocks, and are
careful not to shoot themselves in the foot.

> There are professionals who use linux for audio as well, it's not just
> home systems. That said, you unify them with reasonable defaults, and
> the ability for root to override them.

OK. Would you say it would be a reasonable default to have SCHED_FIFO userspace
threads running at a lower priority than essential kernel threads (say, the
load balancer and the events thread), and give root the ability to explicitly
have userspace threads preempt the kernel?

Chad

2005-02-24 20:29:36

by Chris Friesen

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chad N. Tindel wrote:

> OK. Would you say it would be a reasonable default to have SCHED_FIFO userspace
> threads running at a lower priority than essential kernel threads (say, the
> load balancer and the events thread), and give root the ability to explicitly
> have userspace threads preempt the kernel?

The current scheduler has a 1-100 priority range for soft realtime
tasks. To insert a task into a realtime class, you need to have root
privileges.

As long as you make sure that kernel threads get set to higher
priorities than your user threads, then you get the above behaviour.
Ultimately, however, the administrator is responsible for ensuring that
everything is running with sane priority levels.
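
The priority range that user space can actually request can be queried rather than remembered. The sketch below assumes Linux and Python 3; rt_priority_ranges is a made-up name. On mainline Linux the realtime policies are expected to report 1 through 99, and SCHED_OTHER a single static priority of 0, but the program prints whatever the running kernel says.

```python
import os


def rt_priority_ranges():
    """Return {policy name: (min, max)} static priorities per policy."""
    ranges = {}
    for name in ("SCHED_FIFO", "SCHED_RR", "SCHED_OTHER"):
        policy = getattr(os, name)
        ranges[name] = (os.sched_get_priority_min(policy),
                        os.sched_get_priority_max(policy))
    return ranges


if __name__ == "__main__":
    for name, (low, high) in rt_priority_ranges().items():
        print(f"{name}: {low}..{high}")
```

Keeping user threads below the kernel threads then amounts to picking a user priority strictly lower than the one the kernel threads were given.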

Chris

2005-02-24 22:26:28

by Peter Chubb

Subject: Re: Xterm Hangs - Possible scheduler defect?

>>>>> "Chad" == Chad N Tindel <[email protected]> writes:

Chad> I would make the following assertion for any kernel:

Chad> No single userspace thread of execution running on an SMP system
Chad> should be able to hose a box by going CPU-bound, bug in the
Chad> software or no bug. Any kernel should be able to handle this
Chad> case and shift general work over to other processors.

In many Unices, crucial kernel threads run at realtime priority with a
static priority higher than is accessible to user code.

That being said, however, you've got to be a privileged user to set
real time very high priority on a thread, and if you do, you'd better
know what you're doing. Any SCHED_FIFO thread should run for a time,
then sleep for a time, or it *will* DOS everything else on the
processor.

--
Dr Peter Chubb http://www.gelato.unsw.edu.au peterc AT gelato.unsw.edu.au
The technical we do immediately, the political takes *forever*

2005-02-24 22:40:55

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> In many Unices, crucial kernel threads run at realtime priority with a
> static priority higher than is accessible to user code.

Yep.

> That being said, however, you've got to be a privileged user to set
> real time very high priority on a thread, and if you do, you'd better
> know what you're doing. Any SCHED_FIFO thread should run for a time,
> then sleep for a time, or it *will* DOS everything else on the
> processor.

This is only true if you're not doing what you said in your first paragraph,
i.e. running crucial kernel threads higher than any user thread.

Chad

2005-02-24 23:00:52

by Andrew Morton

Subject: Re: Xterm Hangs - Possible scheduler defect?

"Chad N. Tindel" <[email protected]> wrote:
>
> I would make the following assertion for any kernel:
>
> No single userspace thread of execution running on an SMP system should be
> able to hose a box by going CPU-bound, bug in the software or no bug.

But if we were to enforce that policy, realtime policy would become less
useful. You haven't even acknowledged that such a tradeoff exists, let
alone demonstrated that we're on the wrong side of it.

Here's a quicky which will convert all your kernel threads to SCHED_RR,
priority 99. Please test.

#!/bin/sh

PIDS=$(ps axo pid,command | grep ' \[.*\]$' | sed -e 's/ \[.*\]$//')

for i in $PIDS
do
chrt -r 99 -9 $i
done

2005-02-24 23:23:29

by Chris Friesen

Subject: Re: Xterm Hangs - Possible scheduler defect?

Andrew Morton wrote:

> #!/bin/sh
>
> PIDS=$(ps axo pid,command | grep ' \[.*\]$' | sed -e 's/ \[.*\]$//')
>
> for i in $PIDS
> do
> chrt -r 99 -9 $i
> done


For the unaware, "chrt" is part of the schedutils package. (I didn't
know about it till just now...figured I'd save others the trouble of
searching.)

Chris

2005-02-24 23:40:14

by Andrew Morton

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chris Friesen <[email protected]> wrote:
>
> Andrew Morton wrote:
>
> > chrt -r 99 -9 $i

Make that

chrt -r 99 -p $i
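
For reference, the same idea (with the -p correction applied) can be expressed without ps, grep and sed. This is only a sketch under assumptions: bracketed_pids and boost are hypothetical names, Linux with /proc is assumed, and "empty cmdline" is used as the selection rule because that is the case in which ps prints a name in brackets (it can also match zombies). It does nothing unless run as root with --apply; by default it only reports what it would touch.

```python
import os
import sys


def bracketed_pids(proc="/proc"):
    """PIDs whose cmdline is empty, i.e. what ps prints as [name]."""
    pids = []
    try:
        entries = os.listdir(proc)
    except OSError:
        return pids
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                if f.read() == b"":
                    pids.append(int(entry))
        except OSError:
            continue   # task exited while we were looking
    return sorted(pids)


def boost(pids, priority=99, apply=False):
    """Set each pid to SCHED_RR at `priority`; with apply=False only report.

    Returns (selected_or_changed, failed).
    """
    changed, failed = [], []
    for pid in pids:
        if not apply:
            changed.append(pid)
            continue
        try:
            os.sched_setscheduler(pid, os.SCHED_RR, os.sched_param(priority))
            changed.append(pid)
        except OSError:
            failed.append(pid)
    return changed, failed


if __name__ == "__main__":
    done, failed = boost(bracketed_pids(), apply="--apply" in sys.argv)
    print(f"{len(done)} task(s) selected, {len(failed)} failed")
```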

2005-02-25 00:51:07

by Kyle Moffett

Subject: Re: Xterm Hangs - Possible scheduler defect?

On Feb 24, 2005, at 18:00, Andrew Morton wrote:
> Here's a quicky which will convert all your kernel threads to SCHED_RR,
> priority 99. Please test.

We have a bunch of workstations here where we run a similar thing during
boot, as well as starting a SCHED_RR @ 99 sulogin-type process on tty12.
It makes blasting the occasional annoying fork-bomb or
CPU-chewing-crashed-X a lot nicer.

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r
!y?(-)
------END GEEK CODE BLOCK------


2005-02-25 00:58:54

by Ingo Oeser

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chad N. Tindel wrote:
> I think what we have is the need for two levels of applications:
>
> 1. That which wishes to be the highest priority userspace application, and
> wishes to preempt all other userspace applications. Such an application is
> OK being preempted by the kernel when the kernel needs to do work. IMHO,
> this should be the default behavior for any SCHED_FIFO application. If one
> of these has a bug and goes CPU-bound, the worst it can do is prevent other
> apps from ever using the CPU it is on.

That is basically what you do with SCHED_RR.
(Be preempted after maximum quantum, even if having work to do)

> 2. Applications which actually want to be the highest priority thing on
> the system, including being higher than the kernel. These applications are
> OK with the fact that they may cause system hangs and deadlocks, and are
> careful not to shoot themselves in the foot.

This is SCHED_FIFO.
(Strict priority scheduling, allowed to starve anything below)
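
A caveat on the two policies, as far as the sched(7) description goes: when a SCHED_RR task uses up its quantum it goes to the back of the run list for its own priority level, so the quantum only shares the CPU among tasks of equal priority. The toy scheduler below is a simulation, not kernel code; run, its task tuples and the tick-based quantum are invented for illustration, and every task is assumed to be CPU-bound.

```python
from collections import deque


def run(tasks, ticks, policy, quantum=2):
    """Simulate one CPU; tasks is a list of (name, priority) pairs.

    Returns the name of the task that ran in each tick.
    """
    queues = {}
    for name, prio in tasks:
        queues.setdefault(prio, deque()).append(name)
    top = queues[max(queues)]   # only the highest non-empty level ever runs
    timeline = []
    used = 0
    for _ in range(ticks):
        timeline.append(top[0])
        used += 1
        if policy == "RR" and used == quantum:
            top.rotate(-1)      # quantum expired: back of *this* level only
            used = 0
    return timeline


if __name__ == "__main__":
    tasks = [("a", 50), ("b", 50), ("low", 10)]
    print(run(tasks, 6, "RR"))
    print(run(tasks, 6, "FIFO"))
```

In both runs the task called low never appears in the timeline: a CPU-bound realtime task at a higher priority keeps it off the CPU under either policy.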

So just try to use the right scheduler for your application right now, ok?

If your system is busy with a top-priority task, why should the kernel disturb
it?

Things will stop anyway if your high-priority task needs a resource
that is blocked. Then it becomes unrunnable and other tasks have a
chance to continue. Kernel threads are likely to execute then, because they
are likely runnable then. Your task could even migrate, if a lot of kernel
tasks are waiting on one CPU and your task is NOT bound to a specific CPU.

So the system is not brought down, but just busy in an unfortunate way.
Stupid applications can starve other applications for a while, but not
forever, because the kernel is still running and deciding.


Regards

Ingo Oeser


2005-02-25 04:26:24

by Mike Galbraith

Subject: Re: Xterm Hangs - Possible scheduler defect?

At 12:53 PM 2/24/2005 -0500, Chad N. Tindel wrote:
> > > Hmmm... Are you suggesting it is OK for a kernel to get nearly completely
> > > hosed and to not fully utilize all the processors in the system because
> > > of one SCHED_FIFO thread?
> >
> > Sure. You specifically directed the scheduler to run your thread at a
> > higher priority than anything else. The way I see it, you used root's
> > prerogative to shoot himself in the foot. You could also have used root's
> > prerogative to don steel-toed shoes (set important kernel threads to a higher
> > priority) before pulling the trigger.
>
>No, I specifically directed the scheduler to run my thread at a higher
>priority than any other userspace application. The fact that I wrote it
>in userspace and not in kernel space implies that I am OK with the kernel
>stopping me sometimes when _it_ has work to do. If I wanted something
>higher priority than the kernel I would have written something in kernel
>space instead.

Nope. You may have _thought_ you told it that, but the reality is as I
described it.

> > > SCHED_FIFO threads are supposed to preempt
> > > all other userspace threads... not the kernel itself.
> >
> > Not so. The scheduler makes no distinction between user and kernel threads
> > of execution.
>
>That is SOOOO broken it isn't even funny.

I heartily disagree. I call it flexible/powerful.

> > If you think that's broken, you'll _love_ Ingo's IRQ threads...
>
>Yeah, that's broken too.

(You're not noticing the added power it gives you.)

>Perhaps I don't understand this philosophy you have where the kernel
>isn't more important than everything else. It seems to me like there needs
>to be a rigid hierarchy for scheduling, lest you get into deadlock problems:

Some kernel thread flushing buffers should be more important than my
userland trigger-pacemaker thread?

>Under no circumstances should any single CPU-bound userspace thread
>completely hose a 64-way SMP box.

I can certainly agree that any service which is required across processor
borders wants to be very high priority indeed, and I can further agree that
this crossing of borders would not exist in a perfect world.

-Mike

2005-02-25 15:14:10

by Chris Friesen

Subject: Re: Xterm Hangs - Possible scheduler defect?

Ingo Oeser wrote:

> Stupid applications can starve other applications for a while, but not
> forever, because the kernel is still running and deciding.

Not so.



task 1: sched_rr, priority 1, takes mutex
task 2: sched_rr, priority 2, cpu hog, infinite loop
task 3: sched_rr, priority 99, tries to get mutex

And now tasks 1 and 3 are starved forever. Arguably bad application
design, but it demonstrates a case where applications can starve other
applications.
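
The three-task case above can be played through in a toy model. This is only a sketch under stated assumptions: a single CPU, strict priority selection on every tick, and no priority inheritance. The `Task` class and `simulate` function are made up for illustration and are not kernel interfaces.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int            # higher number wins, as with SCHED_RR/SCHED_FIFO
    wants_mutex: bool = False
    holds_mutex: bool = False
    ticks_run: int = 0


def runnable(task, tasks):
    """A task is blocked only while it waits for a mutex another task holds."""
    if task.wants_mutex and not task.holds_mutex:
        return not any(t.holds_mutex for t in tasks if t is not task)
    return True


def simulate(tasks, ticks):
    """Toy single-CPU model: each tick, run the highest-priority runnable task."""
    for _ in range(ticks):
        candidates = [t for t in tasks if runnable(t, tasks)]
        winner = max(candidates, key=lambda t: t.priority)
        winner.ticks_run += 1
    return {t.name: t.ticks_run for t in tasks}


tasks = [
    Task("task1", priority=1, wants_mutex=True, holds_mutex=True),  # owns the mutex
    Task("task2", priority=2),                                      # CPU hog
    Task("task3", priority=99, wants_mutex=True),                   # blocked on the mutex
]
result = simulate(tasks, ticks=1000)
print(result)  # {'task1': 0, 'task2': 1000, 'task3': 0}
```

Task 1 never gets a tick, so it never reaches the point where it would release the mutex, and task 3 stays blocked behind it no matter how long the model runs.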

Chris

2005-02-25 15:40:22

by Ingo Oeser

Subject: Re: Xterm Hangs - Possible scheduler defect?

Chris Friesen wrote:
> Ingo Oeser wrote:
> > Stupid applications can starve other applications for a while, but not
> > forever, because the kernel is still running and deciding.
>
> Not so.
>
>
>
> task 1: sched_rr, priority 1, takes mutex
> task 2: sched_rr, priority 2, cpu hog, infinite loop
> task 3: sched_rr, priority 99, tries to get mutex
>
> And now tasks 1 and 3 are starved forever. Arguably bad application
> design, but it demonstrates a case where applications can starve other
> applications.

You are right.

In "If a SCHED_RR process has been running for a time period equal to or
longer than the time quantum, it will be put at the end of the list for
its priority" I missed the "for its priority" part.

You would need to change the priority of task 1 until it releases the
mutex. Ideally the owner gets the maximum of its own priority and the
priorities of all tasks waiting on the mutex, and regains its old priority
once it releases the mutex. But this priority elevation applies only while
the owner is runnable. If it is not, it gets its old priority back until it
is runnable again.

But then again you just need to grab a mutex shared with a high priority
task and consume CPU.

Since this behavior is not defined in POSIX AFAIK, you just have
to write your applications properly or use SCHED_OTHER for CPU hogging.
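
The owner-boosting rule described above can be sketched with the same three tasks. This is a toy model, not how any kernel implements it: one CPU, one tick per scheduling decision, and a hypothetical `hold_ticks` parameter for how long task 1 needs inside its critical section. The "only while runnable" caveat is left out to keep it short.

```python
def effective_priority(base, waiter_priorities):
    """Owner runs at the max of its own priority and that of every waiter."""
    return max([base, *waiter_priorities])


def simulate(inherit, ticks=10, hold_ticks=3):
    base = {"task1": 1, "task2": 2, "task3": 99}
    ran = {name: 0 for name in base}
    owner = "task1"       # task1 holds the mutex at the start
    waiting = {"task3"}   # task3 is blocked on it
    held_for = 0
    for _ in range(ticks):
        prio = dict(base)
        if owner is not None and inherit:
            prio[owner] = effective_priority(base[owner],
                                             [base[w] for w in waiting])
        runnable = [n for n in base if n not in waiting]
        winner = max(runnable, key=lambda n: prio[n])
        ran[winner] += 1
        if winner == owner:
            held_for += 1
            if held_for == hold_ticks:  # critical section done: release
                owner = None
                waiting.clear()         # task3 wakes; task1 is back at priority 1
    return ran


print(simulate(inherit=False))  # {'task1': 0, 'task2': 10, 'task3': 0}
print(simulate(inherit=True))   # {'task1': 3, 'task2': 0, 'task3': 7}
```

With the boost, task 1 borrows priority 99 for exactly the three ticks it needs, releases the mutex, and task 3 runs from then on. Without it, task 2 keeps the CPU for the whole run.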


Regards

Ingo Oeser

2005-02-25 15:54:24

by Paulo Marques

Subject: Re: Xterm Hangs - Possible scheduler defect?

Ingo Oeser wrote:
> Chris Friesen wrote:
>
>>Ingo Oeser wrote:
>>[...]
> You would need to change the priority of task 1 until it releases the
> mutex. Ideally the owner gets the maximum of its own priority and the
> priorities of all tasks waiting on the mutex, and regains its old priority
> once it releases the mutex. But this priority elevation applies only while
> the owner is runnable. If it is not, it gets its old priority back until it
> is runnable again.

This is called a "priority inversion" problem, and there was some work
done by Ingo Molnar to make the scheduler aware of such cases and handle
them appropriately.

You can follow this thread for more info:

http://marc.theaimsgroup.com/?l=linux-kernel&m=110106915415886&w=2

I really don't know what the current state is, but this is nothing new...

--
Paulo Marques - http://www.grupopie.com

All that is necessary for the triumph of evil is that good men do nothing.
Edmund Burke (1729 - 1797)

2005-02-25 16:24:32

by Lee Revell

Subject: Re: Xterm Hangs - Possible scheduler defect?

On Fri, 2005-02-25 at 15:53 +0000, Paulo Marques wrote:
> Ingo Oeser wrote:
> > Chris Friesen wrote:
> >
> >>Ingo Oeser wrote:
> >>[...]
> > You would need to change the priority of task 1 until it releases the
> > mutex. Ideally the owner gets the maximum of its own priority and the
> > priorities of all tasks waiting on the mutex, and regains its old priority
> > once it releases the mutex. But this priority elevation applies only while
> > the owner is runnable. If it is not, it gets its old priority back until it
> > is runnable again.
>
> This is called a "priority inversion" problem, and there was some work
> done by Ingo Molnar to make the scheduler aware of such cases and handle
> them appropriately.
>
> You can follow this thread for more info:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=110106915415886&w=2
>

The solution to your problem (which is as old as the hills) involves
priority inheriting mutexes which are available in the RT preempt patch
(if you build with CONFIG_PREEMPT_RT). This should be usable for hard
realtime applications.

http://people.redhat.com/mingo/realtime-preempt

If you just need very good soft realtime performance I recommend
PREEMPT_DESKTOP.

Lee

2005-02-25 17:08:44

by Chris Friesen

Subject: Re: Xterm Hangs - Possible scheduler defect?

Lee Revell wrote:

> The solution to your problem (which is as old as the hills) involves
> priority inheriting mutexes which are available in the RT preempt patch
> (if you build with CONFIG_PREEMPT_RT). This should be usable for hard
> realtime applications.

Yup. I was just pointing out that userspace apps *can* block other
userspace apps.

> http://people.redhat.com/mingo/realtime-preempt
>
> If you just need very good soft realtime performance I recommend
> PREEMPT_DESKTOP.

How does this compare with Inaky's "robust mutexes" patch?

Chris

2005-02-25 20:23:00

by Helge Hafting

Subject: Re: Xterm Hangs - Possible scheduler defect?

On Thu, Feb 24, 2005 at 02:22:37PM -0500, Chad N. Tindel wrote:
> > If you keep a learning attitude, there is a chance for this discussion
> > to go on. However, if you keep the "Come now, don't bullshit me, this is
> > a broken architecture and you're just trying to cover up" attitude,
> > you're just going to get discarded as a troll.
>
> I'm not trying to troll here; I suppose I'm just coming from a different
> background. I'll try to adjust my tone.
>
> > I personally like the linux way: "root has the ability to shoot himself
> > in the foot if he wants to". This is my computer, damn it, I am the one
> > who tells it what to do.
>
> I'm all for allowing people to shoot themselves in the foot. That doesn't
> mean that it is OK for a single userspace thread to mess up a 64-way box.
>
What's so special about a 64-way box?

Note that the box wasn't messed up - the thread merely used too much CPU. It
is perfectly ok - even on a 64-way box - to have a thread that runs with
higher priority than all the kernel threads - *if* it occasionally sleeps.
That means the thread can get very low latency work done, and the kernel
threads will simply wait a little. Then the thread sleeps, and those
crucial kernel threads move on. A high-priority thread that doesn't
run all the time is no problem, and it may need the ability to preempt
kernel threads occasionally due to timing constraints.
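
To put rough numbers on the difference, here is a toy model, assuming one CPU and a high-priority task that alternates a fixed burst of work with a fixed sleep. The function name and the tick counts are invented for illustration; nothing here measures a real system.

```python
def kernel_thread_ticks(busy_ticks, sleep_ticks, total_ticks):
    """Count the ticks a lower-priority kernel thread gets on one CPU.

    The high-priority task works for busy_ticks, then sleeps for sleep_ticks,
    and repeats. The kernel thread runs only while that task is asleep.
    busy_ticks must be at least 1.
    """
    period = busy_ticks + sleep_ticks
    ran = 0
    for tick in range(total_ticks):
        rt_task_is_running = (tick % period) < busy_ticks
        if not rt_task_is_running:
            ran += 1
    return ran


print(kernel_thread_ticks(busy_ticks=9, sleep_ticks=1, total_ticks=1000))   # 100
print(kernel_thread_ticks(busy_ticks=10, sleep_ticks=0, total_ticks=1000))  # 0
```

Even a task that is busy 90% of the time leaves the kernel thread a steady trickle of CPU. Only the task that never sleeps drives that to zero.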

In the case mentioned, the high-priority thread ran all the time. That's bad,
but there is no way the kernel can guess that it was a bad idea in that case.
The kernel does what it is told. An ordinary user can't use such priorities,
so there is no security problem here. Only root can, and root has the
power to disrupt service anyway (shutdown, kill any process, delete any file).

Someone who runs as root is _trusted_ to do the right thing; this trust
might be outside the scope of the OS. In other words, some people are
allowed to run special processes by the machine owner. Some get
the root password - and they are supposed to be above the "crowds" and
not crash the machine just because they can.

Helge Hafting

2005-02-25 21:02:29

by Chad N. Tindel

Subject: Re: Xterm Hangs - Possible scheduler defect?

> What's so special about a 64-way box?

They're expensive and customers don't expect a single userspace thread to
tie up the other 63 CPUs no matter how buggy it is. It is intuitively obvious
that a buggy kernel can bring a system to its knees, but it is not intuitively
obvious that a buggy userspace app can do the same thing. It is more of a
supportability issue than anything, because you expect the other processors
to function properly so you can get in and live-debug the application when it
hits a bug that makes it CPU-bound. This is especially important if the box
is, say, in a remote jungle of China or something where you don't have access
to the console.

The horse is dead, so let's not beat it anymore for the time being. It is
quite clear that people want Linux to (by default) keep the gun
cocked and pointed at the application developer's feet. People who want a
kernel that doesn't hang in the face of bad-acting userspace apps can change
the priority of important kernel threads, which seems like a reasonable
workaround for now.

Regards,

Chad

2005-02-25 23:24:50

by Lee Revell

Subject: Re: Xterm Hangs - Possible scheduler defect?

On Fri, 2005-02-25 at 16:02 -0500, Chad N. Tindel wrote:
> They're expensive and customers don't expect a single userspace thread to
> tie up the other 63 CPUs no matter how buggy it is. It is intuitively obvious
> that a buggy kernel can bring a system to its knees, but it is not intuitively
> obvious that a buggy userspace app can do the same thing. It is more of a
> supportability issue than anything, because you expect the other processors
> to function properly so you can get in and live-debug the application when it
> hits a bug that makes it CPU-bound. This is especially important if the box
> is, say, in a remote jungle of China or something where you don't have access
> to the console.

"Unix policy is to not stop root from doing stupid things because
that would also stop him from doing clever things." - Andi Kleen

"It's such a fine line between stupid and clever" - Derek Smalls

Lee

2005-02-26 11:55:39

by Helge Hafting

Subject: Re: Xterm Hangs - Possible scheduler defect?

On Fri, Feb 25, 2005 at 04:02:26PM -0500, Chad N. Tindel wrote:
> > What's so special about a 64-way box?
>
> They're expensive and customers don't expect a single userspace thread to
> tie up the other 63 CPUs no matter how buggy it is. It is intuitively obvious
> that a buggy kernel can bring a system to its knees, but it is not intuitively
> obvious that a buggy userspace app can do the same thing. It is more of a
> supportability issue than anything, because you expect the other processors
> to function properly so you can get in and live-debug the application when it
> hits a bug that makes it CPU-bound. This is especially important if the box
> is, say, in a remote jungle of China or something where you don't have access
> to the console.
>
These are very good points. And the solution exists - if you want these
options then simply run the program at a lower priority than the
kernel threads. Doing this is not a problem.

You _can_ run a process at highest priority, but you don't have to!

> The horse is dead, so let's not beat it anymore for the time being. It is
> quite clear that people want Linux to (by default) keep the gun
> cocked and pointed at the application developer's feet.

Linux is safe, and you bring up a non-issue. So what if the app couldn't
get higher priority than kernel threads? You could then implement
it as a kernel thread and get the same problem anyway. No difference.

> People who want a
> kernel that doesn't hang in the face of bad-acting userspace apps can change
> the priority of important kernel threads, which seems like a reasonable
> workaround for now.
>
Yes, or they can simply run the app at a slightly lower priority until
it is fully tested so they know it can be trusted.
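
For what it's worth, here is a minimal sketch of how an application could pick such a priority for itself. It assumes the admin has already given the important kernel threads a known real-time priority (the value 50 in the comment is hypothetical), and it uses Python's `os.sched_setscheduler`, which is available on Linux and needs root or CAP_SYS_NICE to select SCHED_FIFO. The helper names are made up.

```python
import os


def priority_below(kernel_thread_priority, lowest=1):
    """Return the real-time priority one step under the kernel threads."""
    if kernel_thread_priority <= lowest:
        raise ValueError("no real-time priority left below the kernel threads")
    return kernel_thread_priority - 1


def run_below_kernel_threads(kernel_thread_priority):
    """Switch the calling process to SCHED_FIFO just under the given priority.

    Linux only. Returns False when the caller lacks the privilege to do so.
    """
    prio = priority_below(kernel_thread_priority,
                          lowest=os.sched_get_priority_min(os.SCHED_FIFO))
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio))
    except PermissionError:
        return False
    return True


# Hypothetical setup: kernel threads were raised to priority 50 beforehand,
# so the application would ask for 49.
print(priority_below(50))  # 49
```

The application stays ahead of every ordinary process, but a kernel thread that becomes runnable still preempts it.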

People sometimes need to not be delayed by kernel threads, and that
is not a problem as long as the application gives up the CPU after
it finishes doing the time-critical work. We want Linux to be
able to do this kind of work too.

Saying that the OS doesn't have control does not make sense. The
OS will give away a CPU - but only if _you_ let it.

Helge Hafting