2006-03-10 13:36:38

by Tomasz Chmielewski

Subject: can I bring Linux down by running "renice -20 cpu_intensive_process"?

I have a Linux server (kernel 2.6.8.1 + Linux RAID1) which is a "backup"
machine: it gets files from other servers, compresses them, writes them to
tape, checks md5sums, etc.

It's been running for quite a while, with no stability problems so far.

Yesterday, something happened though.

I was logged in remotely, and the system was running md5sum against a 30
GB file.

I wanted things to speed up a bit, so I ran "renice -20 <md5sum_pid>".

A few minutes after that I couldn't start any process, so I thought I had
made the system so busy with renice -20 that my SSH session had probably
disconnected.

In the morning, the system was still unavailable: I could ping it and
telnet to any of the open ports, but nothing more happened.

SSH was waiting forever after:

debug1: identity file /root/.ssh/identity type -1
debug1: identity file /root/.ssh/id_rsa type -1
debug1: identity file /root/.ssh/id_dsa type -1


Nothing was displayed on the monitor (all black).

When I restarted the machine, I saw that the logs ended a few minutes
after I changed the priority of md5sum to -20.


So here is my question: is it possible to bring down the machine by
simply doing "renice -20 cpu_intensive_process"?
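
For reference, "renice -20 <pid>" boils down to a single setpriority(2) call on the target process. A minimal C sketch, assuming Linux/glibc; renice_pid() and read_nice() are hypothetical helper names made up for this illustration:

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Hypothetical helper: set the nice value of one process, roughly what
 * "renice <nice_value> <pid>" does (pid 0 means the calling process).
 * Returns 0 on success, -1 with errno set on failure, e.g. when lowering
 * the nice value without root privileges. */
int renice_pid(pid_t pid, int nice_value)
{
    return setpriority(PRIO_PROCESS, (id_t)pid, nice_value) == 0 ? 0 : -1;
}

/* Hypothetical helper: read a process's nice value into *out.
 * getpriority() may legitimately return -1 (nice -1), so errno is
 * cleared first and checked afterwards to tell that apart from failure. */
int read_nice(pid_t pid, int *out)
{
    errno = 0;
    int value = getpriority(PRIO_PROCESS, (id_t)pid);
    if (value == -1 && errno != 0)
        return -1;
    *out = value;
    return 0;
}
```

Run as root, renice_pid(md5sum_pid, -20) corresponds to the command above. Without privilege, lowering the nice value fails, while raising it (e.g. renice_pid(0, 19) for the calling process) is allowed.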

As I said, this machine does heavy compression and md5sum calculations
of big files every day, and had been stable all the time, but it stopped
responding after I changed the priority of a CPU-intensive process to -20.

Coincidence and a hardware failure?

--
Tomasz Chmielewski
http://wpkg.org


2006-03-10 14:44:16

by Jan Engelhardt

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?


>Subject: can I bring Linux down by running "renice -20 cpu_intensive_process"?
>
Depends on what the cpu_intensive_process does. If it tries to allocate
lots of memory, maybe. If it's _just_ CPU (as in `perl -e '1 while 1'`),
you get a chance that you can input some commands on a terminal to kill it.
SCHED_FIFO'ing or SCHED_RR'ing such a process is sudden death of course.
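
A minimal C sketch of what "SCHED_FIFO'ing" a process amounts to, assuming the standard sched_setscheduler(2) interface; make_fifo() is a hypothetical name. On the 2.6 kernels discussed in this thread, even at the lowest FIFO priority such a task preempts every SCHED_OTHER task on its CPU (nice -20 ones included) and keeps running until it blocks or yields, which is why a pure CPU loop under this policy starves everything else on that CPU.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: switch process `pid` (0 = caller) to SCHED_FIFO at
 * the lowest realtime priority the system allows. Needs root privileges.
 * Returns 0 on success, -1 with errno set on failure. */
int make_fifo(pid_t pid)
{
    struct sched_param param = {0};

    param.sched_priority = sched_get_priority_min(SCHED_FIFO);
    if (param.sched_priority == -1)
        return -1;

    /* From here on the task only yields the CPU to other tasks when it
     * blocks, yields, or a higher-priority realtime task becomes runnable. */
    return sched_setscheduler(pid, SCHED_FIFO, &param) == 0 ? 0 : -1;
}
```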

> I have a Linux server (kernel 2.6.8.1 + Linux RAID1) which is a "backup"
> machine: it gets files from other servers, compresses them, writes them to
> tape, checks md5sums, etc.
>
> It's been running for quite a while, with no stability problems so far.
>
Why would you need it to run at -20 anyway?

> When I restarted the machine, I saw that the logs ended a few minutes after I
> changed the priority of md5sum to -20.
>
> So here is my question: is it possible to bring down the machine by simply
> doing "renice -20 cpu_intensive_process"?
>
In case of md5sum: it should not be. At least you should have been able to
unblank the console by pressing any key, or have sysrq available.

> As I said, this machine does heavy compression and md5sum calculations of big
> files every day, and had been stable all the time, but it stopped responding
> after I changed the priority of a CPU-intensive process to -20.
>
> Coincidence and a hardware failure?
>
Sysrq+T (and/or +P) will tell you where the CPU is running.
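
While a shell still works (e.g. over SSH), the same SysRq commands can be fired without a keyboard by writing the key letter to /proc/sysrq-trigger as root; the output goes to the kernel log. A minimal C sketch, where sysrq_trigger() is a hypothetical helper and the path is a parameter only so it can be tried against a scratch file:

```c
#include <stdio.h>

/* Hypothetical helper: write one SysRq command letter to `path`.
 * With path = "/proc/sysrq-trigger" (root only) this has the same effect
 * as pressing Alt+SysRq+<key> on the console: 't' dumps the list of tasks
 * and their state, 'p' dumps the current CPU's registers and flags, both
 * to the kernel log. Returns 0 on success, -1 on failure. */
int sysrq_trigger(const char *path, char key)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;

    int ok = fputc(key, f) != EOF;

    /* The byte is only guaranteed to be written when the stream is
     * flushed, so a failing fclose() also counts as failure. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, sysrq_trigger("/proc/sysrq-trigger", 't') requests the task dump mentioned above. This only helps while userspace can still run a program at all.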


Jan Engelhardt
--

2006-03-10 14:52:48

by Tomasz Chmielewski

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Jan Engelhardt wrote:
>> Subject: can I bring Linux down by running "renice -20 cpu_intensive_process"?
>>
> Depends on what the cpu_intensive_process does. If it tries to allocate
> lots of memory, maybe. If it's _just_ CPU (as in `perl -e '1 while 1'`),
> you get a chance that you can input some commands on a terminal to kill it.
> SCHED_FIFO'ing or SCHED_RR'ing such a process is sudden death of course.
>
>> I have a Linux server (kernel 2.6.8.1 + Linux RAID1) which is a "backup"
>> machine: it gets files from other servers, compresses them, writes them to
>> tape, checks md5sums, etc.
>>
>> It's been running for quite a while, with no stability problems so far.
>>
> Why would you need it to run at -20 anyway?

Hmm, I hoped the md5sum of the 30 GB file would finish before I left work,
so that I could start writing new data to tape...


>> When I restarted the machine, I saw that the logs ended a few minutes after I
>> changed the priority of md5sum to -20.
>>
>> So here is my question: is it possible to bring down the machine by simply
>> doing "renice -20 cpu_intensive_process"?
>>
> In case of md5sum: it should not be. At least you should have been able to
> unblank the console by pressing any key, or have sysrq available.

So in my case it just died for some reason (the console didn't unblank,
and the md5sum process should have finished a long time ago).
On the other hand, the machine was responding to pings, and the ports
were open, so it wasn't totally dead.

Hmm, so we can only speculate about what it was.


--
Tomasz Chmielewski

2006-03-10 22:02:42

by Måns Rullgård

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Jan Engelhardt <[email protected]> writes:

>>Subject: can I bring Linux down by running "renice -20
>>cpu_intensive_process"?
>>
> Depends on what the cpu_intensive_process does. If it tries to allocate
> lots of memory, maybe. If it's _just_ CPU (as in `perl -e '1 while 1'`),
> you get a chance that you can input some commands on a terminal to kill it.
> SCHED_FIFO'ing or SCHED_RR'ing such a process is sudden death of course.

Sysrq+n changes all realtime tasks to normal priority.

--
Måns Rullgård
[email protected]

2006-03-10 22:06:55

by Jeffrey Hundstad

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Måns Rullgård wrote:

>Jan Engelhardt <[email protected]> writes:
>
>>>Subject: can I bring Linux down by running "renice -20
>>>cpu_intensive_process"?
>>>
>>Depends on what the cpu_intensive_process does. If it tries to allocate
>>lots of memory, maybe. If it's _just_ CPU (as in `perl -e '1 while 1'`),
>>you get a chance that you can input some commands on a terminal to kill it.
>>SCHED_FIFO'ing or SCHED_RR'ing such a process is sudden death of course.
>>
>Sysrq+n changes all realtime tasks to normal priority.
>

Patient: "Doctor, when I poke myself in the eye, it hurts."
Doctor: "Don't do that, then."

--
Jeffrey Hundstad

2006-03-11 10:42:40

by Måns Rullgård

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Jeffrey Hundstad <[email protected]> writes:

> Måns Rullgård wrote:
>
>>Jan Engelhardt <[email protected]> writes:
>>
>>>>Subject: can I bring Linux down by running "renice -20
>>>>cpu_intensive_process"?
>>>>
>>> Depends on what the cpu_intensive_process does. If it tries to
>>> allocate lots of memory, maybe. If it's _just_ CPU (as in `perl -e
>>> '1 while 1'`), you get a chance that you can input some commands on
>>> a terminal to kill it.
>>>SCHED_FIFO'ing or SCHED_RR'ing such a process is sudden death of course.
>>
>>Sysrq+n changes all realtime tasks to normal priority.
>
> Patient: "Doctor, when I poke myself in the eye, it hurts."
> Doctor: "Don't do that, then."

A bug might cause an otherwise well-behaved realtime process to start
spinning in a loop or something. Having a way to stop it is good,
IMHO.

--
Måns Rullgård
[email protected]

2006-03-12 01:34:59

by Luke Dashjr

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

On Friday 10 March 2006 22:01, Måns Rullgård wrote:
> Sysrq+n changes all realtime tasks to normal priority.

Would the kernel's main loop (where I presume SysRq is handled) get a chance
to run with a constantly busy realtime task?

2006-03-12 03:44:39

by Lee Revell

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

On Fri, 2006-03-10 at 22:01 +0000, Måns Rullgård wrote:
> Jan Engelhardt <[email protected]> writes:
>
> >>Subject: can I bring Linux down by running "renice -20
> >>cpu_intensive_process"?
> >>
> > Depends on what the cpu_intensive_process does. If it tries to allocate
> > lots of memory, maybe. If it's _just_ CPU (as in `perl -e '1 while 1'`),
> > you get a chance that you can input some commands on a terminal to kill it.
> > SCHED_FIFO'ing or SCHED_RR'ing such a process is sudden death of course.
>
> Sysrq+n changes all realtime tasks to normal priority.
>

A nice -20 SCHED_OTHER task is not realtime, only SCHED_FIFO and
SCHED_RR.
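
The distinction above, sketched from user space in C: whether a task counts as realtime depends only on its scheduling policy, as reported by sched_getscheduler(2), not on its nice value. is_realtime() is a hypothetical helper for illustration, not a kernel function.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: 1 if process `pid` (0 = caller) runs under a
 * realtime policy (SCHED_FIFO or SCHED_RR), 0 for any other policy,
 * including a SCHED_OTHER task at nice -20; -1 with errno set on error. */
int is_realtime(pid_t pid)
{
    int policy = sched_getscheduler(pid);
    if (policy == -1)
        return -1;
    return policy == SCHED_FIFO || policy == SCHED_RR;
}
```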

Lee

2006-03-12 03:46:45

by Lee Revell

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

On Sun, 2006-03-12 at 01:41 +0000, Luke-Jr wrote:
> On Friday 10 March 2006 22:01, Måns Rullgård wrote:
> > Sysrq+n changes all realtime tasks to normal priority.
>
> Would the kernel's main loop (where I presume SysRq is handled) get a chance
> to run with a constantly busy realtime task?

No, other tasks will not get to run at all. A SCHED_FIFO task that
spins forever is a fatal bug. Making it less dangerous would require
making it less useful.

Lee

2006-03-12 03:50:41

by Lee Revell

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

On Sun, 2006-03-12 at 01:41 +0000, Luke-Jr wrote:
> On Friday 10 March 2006 22:01, Måns Rullgård wrote:
> > Sysrq+n changes all realtime tasks to normal priority.
>
> Would the kernel's main loop (where I presume SysRq is handled) get a chance
> to run with a constantly busy realtime task?

Sorry, I was thinking of the -rt kernel in my previous post; in mainline
this would be effective. In the -rt kernel you are screwed if the
spinning RT task is higher priority than the keyboard IRQ thread.

Lee

2006-03-12 12:00:21

by Måns Rullgård

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Lee Revell <[email protected]> writes:

> On Fri, 2006-03-10 at 22:01 +0000, Måns Rullgård wrote:
>> Jan Engelhardt <[email protected]> writes:
>>
>> >>Subject: can I bring Linux down by running "renice -20
>> >>cpu_intensive_process"?
>> >>
>> > Depends on what the cpu_intensive_process does. If it tries to
>> > allocate lots of memory, maybe. If it's _just_ CPU (as in `perl
>> > -e '1 while 1'`), you get a chance that you can input some
>> > commands on a terminal to kill it. SCHED_FIFO'ing or
>> > SCHED_RR'ing such a process is sudden death of course.
>>
>> Sysrq+n changes all realtime tasks to normal priority.
>>
>
> A nice -20 SCHED_OTHER task is not realtime, only SCHED_FIFO and
> SCHED_RR.

Maybe extending sysrq+n to lower the priority of -20 tasks would be a
good idea.

--
Måns Rullgård
[email protected]

2006-03-16 21:13:50

by Bill Davidsen

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Måns Rullgård wrote:
> Lee Revell <[email protected]> writes:
>
>> On Fri, 2006-03-10 at 22:01 +0000, Måns Rullgård wrote:
>>> Jan Engelhardt <[email protected]> writes:
>>>
>>>>> Subject: can I bring Linux down by running "renice -20
>>>>> cpu_intensive_process"?
>>>>>
>>>> Depends on what the cpu_intensive_process does. If it tries to
>>>> allocate lots of memory, maybe. If it's _just_ CPU (as in `perl
>>>> -e '1 while 1'`), you get a chance that you can input some
>>>> commands on a terminal to kill it. SCHED_FIFO'ing or
>>>> SCHED_RR'ing such a process is sudden death of course.
>>> Sysrq+n changes all realtime tasks to normal priority.
>>>
>> A nice -20 SCHED_OTHER task is not realtime, only SCHED_FIFO and
>> SCHED_RR.
>
> Maybe extending sysrq+n to lower the priority of -20 tasks would be a
> good idea.
>
If it runs before the keyboard thread it doesn't matter... But why
should this hang anything, when there should be enough i/o to get out of
the user process. There's a good fix for this, don't give this guy root
any more ;-)

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2006-03-16 22:11:58

by Lee Revell

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

On Thu, 2006-03-16 at 16:15 -0500, Bill Davidsen wrote:
> There's a good fix for this, don't give this guy root
> any more ;-)

Minor nit: s/root/realtime privileges/. Since 2.6.12 these have been
decoupled. No official distro release supports it OOTB yet (the
upcoming Ubuntu Dapper will).
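
As far as I know, the decoupling mentioned here is done through two resource limits added in 2.6.12: RLIMIT_RTPRIO (the highest SCHED_FIFO/SCHED_RR priority an unprivileged process may request) and RLIMIT_NICE (how far it may lower its nice value, where the lowest allowed nice value is 20 - rlim_cur). A minimal C sketch reading both with getrlimit(2); rt_limits() is a hypothetical helper name, and root is not bound by either limit.

```c
#include <sys/resource.h>

/* Hypothetical helper: report the caller's realtime-related limits.
 * *max_rtprio receives RLIMIT_RTPRIO (0 = no realtime policy allowed).
 * *lowest_nice receives the lowest nice value the caller may set,
 * computed as 20 - RLIMIT_NICE; a result of 20 means it may not lower
 * its nice value at all. Returns 0 on success, -1 with errno set. */
int rt_limits(rlim_t *max_rtprio, int *lowest_nice)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_RTPRIO, &rl) != 0)
        return -1;
    *max_rtprio = rl.rlim_cur;

    if (getrlimit(RLIMIT_NICE, &rl) != 0)
        return -1;

    /* The useful range of RLIMIT_NICE is 0..40; anything larger
     * (including RLIM_INFINITY) already allows nice -20. */
    if (rl.rlim_cur >= 40)
        *lowest_nice = -20;
    else
        *lowest_nice = 20 - (int)rl.rlim_cur;
    return 0;
}
```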

Lee

2006-03-16 22:51:33

by Måns Rullgård

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Bill Davidsen <[email protected]> writes:

> Måns Rullgård wrote:
>> Lee Revell <[email protected]> writes:
>>
>>> On Fri, 2006-03-10 at 22:01 +0000, Måns Rullgård wrote:
>>>> Jan Engelhardt <[email protected]> writes:
>>>>
>>>>>> Subject: can I bring Linux down by running "renice -20
>>>>>> cpu_intensive_process"?
>>>>>>
>>>>> Depends on what the cpu_intensive_process does. If it tries to
>>>>> allocate lots of memory, maybe. If it's _just_ CPU (as in `perl
>>>>> -e '1 while 1'`), you get a chance that you can input some
>>>>> commands on a terminal to kill it. SCHED_FIFO'ing or
>>>>> SCHED_RR'ing such a process is sudden death of course.
>>>> Sysrq+n changes all realtime tasks to normal priority.
>>>>
>>> A nice -20 SCHED_OTHER task is not realtime, only SCHED_FIFO and
>>> SCHED_RR.
>> Maybe extending sysrq+n to lower the priority of -20 tasks would be a
>> good idea.
>>
> If it runs before the keyboard thread it doesn't matter...

Of course not, but that's not generally the case.

> But why should this hang anything, when there should be enough i/o
> to get out of the user process. There's a good fix for this, don't
> give this guy root any more ;-)

Ever heard of bugs? Anyone developing a program can make a mistake.
If the program runs with realtime scheduling a bug that makes it enter
an infinite loop (or do something else that hogs the CPU) can be
difficult to find since it rather efficiently locks you out.

--
Måns Rullgård
[email protected]

2006-03-17 06:04:16

by Mike Galbraith

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

On Thu, 2006-03-16 at 22:51 +0000, Måns Rullgård wrote:
> Bill Davidsen <[email protected]> writes:
>
> > Måns Rullgård wrote:
> >> Maybe extending sysrq+n to lower the priority of -20 tasks would be a
> >> good idea.
> >>
> > If it runs before the keyboard thread it doesn't matter...
>
> Of course not, but that's not generally the case.
>
> > But why should this hang anything, when there should be enough i/o
> > to get out of the user process. There's a good fix for this, don't
> > give this guy root any more ;-)
>
> Ever heard of bugs? Anyone developing a program can make a mistake.
> If the program runs with realtime scheduling a bug that makes it enter
> an infinite loop (or do something else that hogs the CPU) can be
> difficult to find since it rather efficiently locks you out.

Given that someone has already determined that installing a safety valve
for RT tasks was worthwhile, and given that there is practically no
difference between a nice -20 task and the lowest RT priority, it seems to
me that extending that safety valve to cover reniced tasks is the
obviously-correct thing to do. I think you should submit a patch.
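
To put "practically no difference" in numbers: a sketch of how nice values and realtime priorities land on the kernel's single internal priority scale, where a smaller number means more urgent. It assumes the 2.6 O(1) scheduler's definitions (MAX_USER_RT_PRIO = MAX_RT_PRIO = 100, MAX_PRIO = 140); the two functions are illustrative re-statements, not kernel code.

```c
/* Illustrative re-statement (not kernel code) of the 2.6 O(1)
 * scheduler's priority constants. Internal priorities run from 0 (most
 * urgent) to MAX_PRIO - 1; 0..99 are realtime, 100..139 are nice -20..19. */
#define MAX_USER_RT_PRIO 100
#define MAX_RT_PRIO      MAX_USER_RT_PRIO
#define MAX_PRIO         (MAX_RT_PRIO + 40)

/* Static priority of a SCHED_OTHER task with the given nice value
 * (what the kernel calls NICE_TO_PRIO()). */
int nice_to_prio(int nice)
{
    return MAX_RT_PRIO + nice + 20;
}

/* Internal priority of a SCHED_FIFO/SCHED_RR task with user-visible
 * priority rt_priority (1..99): a higher rt_priority gives a smaller,
 * more urgent number. */
int rt_to_prio(int rt_priority)
{
    return MAX_USER_RT_PRIO - 1 - rt_priority;
}
```

With these definitions nice -20 maps to 100 and the lowest realtime priority (1) to 98, so the two are neighbours on the scale. The mapping does not capture the behavioural difference, though: a SCHED_OTHER task is still time-sliced, while a SCHED_FIFO task keeps the CPU until it blocks or yields (or a higher-priority realtime task wakes).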

-Mike

2006-03-17 22:22:26

by Måns Rullgård

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Mike Galbraith <[email protected]> writes:

> On Thu, 2006-03-16 at 22:51 +0000, Måns Rullgård wrote:
>> Bill Davidsen <[email protected]> writes:
>>
>> > Måns Rullgård wrote:
>> >> Maybe extending sysrq+n to lower the priority of -20 tasks would be a
>> >> good idea.
>> >>
>> > If it runs before the keyboard thread it doesn't matter...
>>
>> Of course not, but that's not generally the case.
>>
>> > But why should this hang anything, when there should be enough i/o
>> > to get out of the user process. There's a good fix for this, don't
>> > give this guy root any more ;-)
>>
>> Ever heard of bugs? Anyone developing a program can make a mistake.
>> If the program runs with realtime scheduling a bug that makes it enter
>> an infinite loop (or do something else that hogs the CPU) can be
>> difficult to find since it rather efficiently locks you out.
>
> Given that someone has already determined that installing a safety valve
> for RT tasks was worthwhile, and given that there is practically no
> difference between a nice -20 task and the lowest RT priority, it seems to
> me that extending that safety valve to cover reniced tasks is the
> obviously-correct thing to do. I think you should submit a patch.

Something like this ought to do it (untested):

--- kernel/sched.c.orig 2006-02-09 23:41:57.000000000 +0000
+++ kernel/sched.c 2006-03-17 22:16:46.257298014 +0000
@@ -5681,21 +5681,22 @@

read_lock_irq(&tasklist_lock);
for_each_process (p) {
- if (!rt_task(p))
- continue;
+ if (rt_task(p)) {
+ rq = task_rq_lock(p, &flags);

- rq = task_rq_lock(p, &flags);
-
- array = p->array;
- if (array)
- deactivate_task(p, task_rq(p));
- __setscheduler(p, SCHED_NORMAL, 0);
- if (array) {
- __activate_task(p, task_rq(p));
- resched_task(rq->curr);
+ array = p->array;
+ if (array)
+ deactivate_task(p, task_rq(p));
+ __setscheduler(p, SCHED_NORMAL, 0);
+ if (array) {
+ __activate_task(p, task_rq(p));
+ resched_task(rq->curr);
+ }
+
+ task_rq_unlock(rq, &flags);
+ } else if (TASK_NICE(p) == -20) {
+ set_user_nice(p, 0);
}
-
- task_rq_unlock(rq, &flags);
}
read_unlock_irq(&tasklist_lock);
}
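
A user-space model of the per-task decision the (untested) patch makes, for reasoning about it rather than for use: the enum, struct and function names are hypothetical, and plain integers stand in for task_struct fields. One consequence of the else-if worth noting: a realtime task that had also been reniced to -20 is demoted to SCHED_NORMAL but keeps nice -20, because __setscheduler() leaves the static priority alone.

```c
/* Hypothetical model: plain integers stand in for task_struct fields. */
enum policy { POLICY_NORMAL, POLICY_FIFO, POLICY_RR };

struct task_model {
    enum policy policy;
    int nice;
};

/* Apply the patched SysRq+n rule to one task. */
void sysrq_n_patched(struct task_model *t)
{
    if (t->policy == POLICY_FIFO || t->policy == POLICY_RR) {
        /* rt_task(p) branch: demote to SCHED_NORMAL. __setscheduler()
         * does not touch static_prio, so the nice value is kept. */
        t->policy = POLICY_NORMAL;
    } else if (t->nice == -20) {
        /* New branch: a nice -20 normal task is reset to nice 0. */
        t->nice = 0;
    }
}
```

Only exactly -20 is matched, so a normal task at nice -19 is left untouched.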


--
Måns Rullgård
[email protected]

2006-03-18 12:36:55

by Bill Davidsen

Subject: Re: can I bring Linux down by running "renice -20 cpu_intensive_process"?

Måns Rullgård wrote:

>Bill Davidsen <[email protected]> writes:
>
>>Måns Rullgård wrote:
>>
>>>Lee Revell <[email protected]> writes:
>>>
>>>>On Fri, 2006-03-10 at 22:01 +0000, Måns Rullgård wrote:
>>>>
>>>>>Jan Engelhardt <[email protected]> writes:
>>>>>
>>>>>>>Subject: can I bring Linux down by running "renice -20
>>>>>>>cpu_intensive_process"?
>>>>>>>
>>>>>>Depends on what the cpu_intensive_process does. If it tries to
>>>>>>allocate lots of memory, maybe. If it's _just_ CPU (as in `perl
>>>>>>-e '1 while 1'`), you get a chance that you can input some
>>>>>>commands on a terminal to kill it. SCHED_FIFO'ing or
>>>>>>SCHED_RR'ing such a process is sudden death of course.
>>>>>>
>>>>>Sysrq+n changes all realtime tasks to normal priority.
>>>>>
>>>>A nice -20 SCHED_OTHER task is not realtime, only SCHED_FIFO and
>>>>SCHED_RR.
>>>>
>>>Maybe extending sysrq+n to lower the priority of -20 tasks would be a
>>>good idea.
>>>
>>If it runs before the keyboard thread it doesn't matter...
>>
>
>Of course not, but that's not generally the case.
>
>>But why should this hang anything, when there should be enough i/o
>>to get out of the user process. There's a good fix for this, don't
>>give this guy root any more ;-)
>>
>
>Ever heard of bugs? Anyone developing a program can make a mistake.
>If the program runs with realtime scheduling a bug that makes it enter
>an infinite loop (or do something else that hogs the CPU) can be
>difficult to find since it rather efficiently locks you out.
>
Please google "emoticons" and find out what those funny characters at
the end of the paragraph you quoted really mean. Sheesh!

--
bill davidsen <[email protected]>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979