2007-05-07 10:11:39

by Satoru Takeuchi

Subject: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

Hi,

I found a bug in the 2.6.21 cpu-hotplug code.

When process A on CPU0 tries to offline CPU1, on which process B, a
realtime process (its task->policy == SCHED_FIFO or SCHED_RR), is running
without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
the following code in __stop_machine_run().

struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
				       unsigned int cpu)
{
	...
	p = kthread_create(do_stop, &smdata, "kstopmachine");
	if (!IS_ERR(p)) {
		kthread_bind(p, cpu);
		wake_up_process(p);
		wait_for_completion(&smdata.done);
	}
	...
}

kstopmachine is created, bound to CPU1, and woken up here, but this
process can't start running because no reschedule occurs on CPU1.
Hence CPU0 hangs as well, because it is waiting for the completion of
CPU1's offline work.

Thanks,

Sat


2007-05-07 10:47:28

by Srivatsa Vaddagiri

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Mon, May 07, 2007 at 07:10:05PM +0900, Satoru Takeuchi wrote:
> Hi,
>
> I found a bug in the 2.6.21 cpu-hotplug code.
>
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (its task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang.

One could argue that this can be tackled in userspace by SIGSTOPping all
such real-time threads before hotplugging CPUs and SIGCONTing them after
hotplug is complete.

Would this simple solution be acceptable?
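For illustration, the userspace workaround could look roughly like the C sketch below. The helpers stop_children() and resume_children() are hypothetical, not part of any existing tool. To keep the sketch verifiable, it only handles tasks that are children of the caller, so waitpid() with WUNTRACED can confirm that each SIGSTOP has actually taken effect (kill() only queues the signal). Stopping arbitrary RT tasks would need some other way to confirm the stop, for example checking the task state in /proc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Resume the first n tasks in pids[] with SIGCONT. */
static void resume_children(const pid_t *pids, size_t n)
{
	for (size_t i = 0; i < n; i++)
		kill(pids[i], SIGCONT);
}

/*
 * Send SIGSTOP to every child in pids[] and wait until each one has really
 * stopped.  Returns 0 when all are stopped.  On any failure, resumes the
 * ones already handled and returns -1.
 */
static int stop_children(const pid_t *pids, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int status;

		if (kill(pids[i], SIGSTOP) != 0 ||
		    waitpid(pids[i], &status, WUNTRACED) != pids[i] ||
		    !WIFSTOPPED(status)) {
			resume_children(pids, i + 1);
			return -1;
		}
	}
	return 0;
}
```

A wrapper would call stop_children(), write 0 to /sys/devices/system/cpu/cpuN/online, and then call resume_children().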

Otherwise, we need:

1. __stop_machine_run() to set the priority/policy of the first kthread
   (do_stop) to MAX_RT_PRIO-1/SCHED_FIFO *before* waking it up

2. the scheduler to provide some API to add a thread to the /front/ of the
   runqueue (enqueue_task_head is internal to sched.c), and to use that API
   when activating all stop_machine-related threads.

> This is caused by the following code in __stop_machine_run().
>
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> 				       unsigned int cpu)
> {
> 	...
> 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 	if (!IS_ERR(p)) {
> 		kthread_bind(p, cpu);
> 		wake_up_process(p);
> 		wait_for_completion(&smdata.done);
> 	}
> 	...
> }
>
> kstopmachine is created, bound to CPU1, and woken up here, but this
> process can't start running because no reschedule occurs on CPU1.
> Hence CPU0 hangs as well, because it is waiting for the completion of
> CPU1's offline work.

--
Regards,
vatsa

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

Hi Satoru,

On Mon, May 07, 2007 at 07:10:05PM +0900, Satoru Takeuchi wrote:
> Hi,
>
> I found a bug in the 2.6.21 cpu-hotplug code.

IIRC, __stop_machine_run is used by subsystems other than cpu-hotplug,
so cpu-hotplug is not the only one affected.

>
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (its task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
> the following code in __stop_machine_run().
>
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> 				       unsigned int cpu)
> {
> 	...
> 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 	if (!IS_ERR(p)) {
> 		kthread_bind(p, cpu);
> 		wake_up_process(p);
> 		wait_for_completion(&smdata.done);
> 	}
> 	...
> }
>
> kstopmachine is created, bound to CPU1, and woken up here, but this
> process can't start running because no reschedule occurs on CPU1.
> Hence CPU0 hangs as well, because it is waiting for the completion of
> CPU1's offline work.


But each of these stop_machine_run threads runs at MAX_RT_PRIO - 1
with SCHED_FIFO. So unless B is also running at MAX_RT_PRIO - 1,
there should not be a hang. Moreover, I doubt that we have any kernel
threads (B) which run at MAX_RT_PRIO - 1.

Nevertheless, with the freezer-based approach that we're experimenting with,
this problem shouldn't arise. We expect the whole system to get frozen
before we actually do a cpu_down() (which will then call
__stop_machine_run). So any such rogue RT task will have to first fail
the freezer (which it will), but that's ok, since on a freezer-fail,
we just thaw all the processes and get the system up and running again.
Yeah, the cpu-hotplug operation will fail though.
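The "freeze everything or roll back" behaviour described above can be sketched as a toy model in C. This is not the kernel freezer: struct toy_task and freeze_all_or_thaw() are hypothetical, and whether a task manages to freeze is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy task: whether it would reach a freeze point is a flag. */
struct toy_task {
	bool can_freeze;
	bool frozen;
};

/*
 * Freeze every task.  If any task fails to freeze, thaw the ones frozen so
 * far and return false: the system keeps running and the hotplug request
 * fails.  Returns true when everything is frozen and cpu_down() may proceed.
 */
static bool freeze_all_or_thaw(struct toy_task *tasks, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].can_freeze) {
			for (size_t j = 0; j < i; j++)
				tasks[j].frozen = false; /* thaw */
			return false;
		}
		tasks[i].frozen = true;
	}
	return true;
}
```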


>
> Thanks,
>
> Sat

Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

2007-05-07 10:54:54

by Srivatsa Vaddagiri

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Mon, May 07, 2007 at 04:17:24PM +0530, Gautham R Shenoy wrote:
> Nevertheless, with the freezer-based approach that we're experimenting with,
> this problem shouldn't arise. We expect the whole system to get frozen
> before we actually do a cpu_down() (which will then call
> __stop_machine_run). So any such rogue RT task will have to first fail
> the freezer (which it will),

From what I understand of the freezer, if the RT task is running in user
space (which seems to be the case in this thread), it should get frozen even
if it is a forever-running SCHED_FIFO task at MAX_RT_PRIO - 1 priority?

> but that's ok, since on a freezer-fail,
> we just thaw all the processes and get the system up and running again.
> Yeah, the cpu-hotplug operation will fail though.

--
Regards,
vatsa

2007-05-07 10:56:14

by Kamezawa Hiroyuki

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Mon, 07 May 2007 19:10:05 +0900
Satoru Takeuchi <[email protected]> wrote:


> kstopmachine is created, bound to CPU1, and woken up here, but this
> process can't start running because no reschedule occurs on CPU1.
> Hence CPU0 hangs as well, because it is waiting for the completion of
> CPU1's offline work.
>
Is this a bug? It seems the system works as designed...

Hmm, would it be OK to add stop_machine_run_interruptible(), which uses
wait_for_completion_interruptible() instead of wait_for_completion()?
Then we could abort a cpu hot-unplug with a signal. Is this okay for you?
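As a user-space analogy of the difference between wait_for_completion() and wait_for_completion_interruptible(), the sketch below models a completion as a pipe. toy_complete() writes one byte, and the waiter blocks in read(). All toy_* names and install_interrupting_handler() are hypothetical, and this is not the kernel API. Because the handler is installed without SA_RESTART, a blocked read() returns EINTR when the signal arrives. That is the behaviour an interruptible wait would give the caller of a stop_machine_run_interruptible().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* User-space analogy only: a "completion" is a pipe. */
struct toy_completion {
	int fds[2];
};

static int toy_init_completion(struct toy_completion *c)
{
	return pipe(c->fds);
}

/* Plays the role of complete(): make one waiter runnable. */
static int toy_complete(struct toy_completion *c)
{
	char byte = 1;

	return write(c->fds[1], &byte, 1) == 1 ? 0 : -1;
}

/* Modelled on wait_for_completion_interruptible(): 0 once completed,
 * -EINTR if a signal handler ran while we were blocked. */
static int toy_wait_interruptible(struct toy_completion *c)
{
	char byte;
	ssize_t r = read(c->fds[0], &byte, 1);

	if (r == 1)
		return 0;
	return (r < 0 && errno == EINTR) ? -EINTR : -EIO;
}

static void note_signal(int sig)
{
	(void)sig; /* only needs to run so that read() is interrupted */
}

/* Install a handler WITHOUT SA_RESTART so a blocked read() fails with EINTR. */
static int install_interrupting_handler(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = note_signal;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	return sigaction(sig, &sa, NULL);
}
```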

-Kame

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Mon, May 07, 2007 at 04:32:56PM +0530, Srivatsa Vaddagiri wrote:
> On Mon, May 07, 2007 at 04:17:24PM +0530, Gautham R Shenoy wrote:
> > Nevertheless, with the freezer-based approach that we're experimenting with,
> > this problem shouldn't arise. We expect the whole system to get frozen
> > before we actually do a cpu_down() (which will then call
> > __stop_machine_run). So any such rogue RT task will have to first fail
> > the freezer (which it will),
>
> From what I understand of the freezer, if the RT task is running in user
> space (which seems to be the case in this thread), it should get frozen even
> if it is a forever-running SCHED_FIFO task at MAX_RT_PRIO - 1 priority?

Yes, you are right. It will end up getting the fake signal.
So yeah, the freezer pretty much solves the problem for cpu hotplug.

But I now wonder whether module removal will have a problem if we
have a high-prio SCHED_FIFO task in the system.

>
> --
> Regards,
> vatsa

Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

2007-05-07 13:43:19

by Rusty Russell

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Mon, 2007-05-07 at 19:10 +0900, Satoru Takeuchi wrote:
> Hi,
>
> I found a bug in the 2.6.21 cpu-hotplug code.
>
> When process A on CPU0 tries to offline CPU1, on which process B, a
> realtime process (its task->policy == SCHED_FIFO or SCHED_RR), is running
> without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
> the following code in __stop_machine_run().
>
> struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> 				       unsigned int cpu)
> {
> 	...
> 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> 	if (!IS_ERR(p)) {
> 		kthread_bind(p, cpu);
> 		wake_up_process(p);
> 		wait_for_completion(&smdata.done);
> 	}
> 	...
> }
>
> kstopmachine is created, bound to CPU1, and woken up here, but this
> process can't start running because no reschedule occurs on CPU1.
> Hence CPU0 hangs as well, because it is waiting for the completion of
> CPU1's offline work.

Yes, we should probably move the sched_setscheduler() call in stop_machine()
(where the thread up-prioritizes itself) to before wake_up_process(p),
to avoid this happening.
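Here is a toy model of why the ordering matters, written in C under simplifying assumptions. There is one CPU. Internal priorities follow the rule that a lower value wins: prio = MAX_RT_PRIO-1 - sched_priority for RT tasks, and kstopmachine is assumed to start as an ordinary nice-0 task (prio 120). A wakeup preempts the running task only if the woken task's priority is strictly higher, which is how I understand the 2.6.21 O(1) scheduler's wakeup check. Because the hog never sleeps or yields, a thread that does not preempt at wakeup never gets to run the code that would raise its own priority. kstop_runs() and the other names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code.  Internal priority: lower value = higher
 * priority.  RT: prio = MAX_RT_PRIO-1 - sched_priority; nice 0: 120. */
#define MAX_RT_PRIO 100
#define NICE0_PRIO  120

static int rt_to_prio(int sched_priority)
{
	return MAX_RT_PRIO - 1 - sched_priority;
}

/* Wakeup preemption: only a strictly higher-priority task preempts. */
static bool wakeup_preempts(int woken_prio, int curr_prio)
{
	return woken_prio < curr_prio;
}

/*
 * Does kstopmachine ever run on a CPU owned by a SCHED_FIFO hog that never
 * sleeps or yields?  If raise_before_wakeup is false, the thread is woken
 * as a nice-0 task and would only raise itself to MAX_RT_PRIO-1 once it
 * runs.  If true, it already has that priority when it is woken.
 */
static bool kstop_runs(int hog_sched_priority, bool raise_before_wakeup)
{
	int hog = rt_to_prio(hog_sched_priority);
	int kstop = raise_before_wakeup ? rt_to_prio(MAX_RT_PRIO - 1) : NICE0_PRIO;

	/* No preemption at wakeup means no later chance: the hog keeps the CPU. */
	return wakeup_preempts(kstop, hog);
}
```

With a hog at `chrt -f 98`, kstop_runs(98, false) is false and kstop_runs(98, true) is true. A hog at sched_priority 99 still wins either way, which is the equal-priority case discussed later in the thread.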

Others have suggested we use the freezer; I've always distrusted that
code. It's much trickier than stop_machine().

I look forward to your patch!
Rusty.

2007-05-08 02:43:59

by Satoru Takeuchi

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

At Mon, 07 May 2007 23:42:53 +1000,
Rusty Russell wrote:
>
> On Mon, 2007-05-07 at 19:10 +0900, Satoru Takeuchi wrote:
> > Hi,
> >
> > I found a bug in the 2.6.21 cpu-hotplug code.
> >
> > When process A on CPU0 tries to offline CPU1, on which process B, a
> > realtime process (its task->policy == SCHED_FIFO or SCHED_RR), is running
> > without sleeping or yielding, both CPU0 and CPU1 hang. This is caused by
> > the following code in __stop_machine_run().
> >
> > struct task_struct *__stop_machine_run(int (*fn)(void *), void *data,
> > 				       unsigned int cpu)
> > {
> > 	...
> > 	p = kthread_create(do_stop, &smdata, "kstopmachine");
> > 	if (!IS_ERR(p)) {
> > 		kthread_bind(p, cpu);
> > 		wake_up_process(p);
> > 		wait_for_completion(&smdata.done);
> > 	}
> > 	...
> > }
> >
> > kstopmachine is created, bound to CPU1, and woken up here, but this
> > process can't start running because no reschedule occurs on CPU1.
> > Hence CPU0 hangs as well, because it is waiting for the completion of
> > CPU1's offline work.
>
> Yes, we should probably move the sched_setscheduler() call in stop_machine()
> (where the thread up-prioritizes itself) to before wake_up_process(p),
> to avoid this happening.
>
> Others have suggested we use the freezer; I've always distrusted that
> code. It's much trickier than stop_machine().
>
> I look forward to your patch!
> Rusty.

Thanks, I'll do it. It will probably take several days, including testing.

BTW, how should I handle an RT process with the max priority, as Gautham
mentioned? He said it's OK unless such a kernel thread exists. However,
MAX_USER_RT_PRIO is currently equal to MAX_RT_PRIO, so a user process can
also cause this problem. Is Srivatsa's idea 2 acceptable? Or should we just
apply a "don't abuse the highest RT priority" rule?

Thanks,

Satoru

2007-05-08 03:08:09

by Rusty Russell

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Tue, 2007-05-08 at 11:41 +0900, Satoru Takeuchi wrote:
> At Mon, 07 May 2007 23:42:53 +1000,
> Rusty Russell wrote:
> > I look forward to your patch!
> > Rusty.
>
> Thanks, I'll do it. It will probably take several days, including testing.

Excellent.

> BTW, how should I handle an RT process with the max priority, as Gautham
> mentioned? He said it's OK unless such a kernel thread exists. However,
> MAX_USER_RT_PRIO is currently equal to MAX_RT_PRIO, so a user process can
> also cause this problem. Is Srivatsa's idea 2 acceptable? Or should we just
> apply a "don't abuse the highest RT priority" rule?

We used to be able to create kernel threads higher than any userspace
priority. If this is no longer true, I think that's OK: equal priority
still means we'll get scheduled, right?

Cheers,
Rusty.


2007-05-08 03:31:10

by Satoru Takeuchi

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

At Tue, 08 May 2007 13:02:25 +1000,
Rusty Russell wrote:
>
> On Tue, 2007-05-08 at 11:41 +0900, Satoru Takeuchi wrote:
> > At Mon, 07 May 2007 23:42:53 +1000,
> > Rusty Russell wrote:
> > > I look forward to your patch!
> > > Rusty.
> >
> > Thanks, I'll do it. It will probably take several days, including testing.
>
> Excellent.
>
> > BTW, how should I handle an RT process with the max priority, as Gautham
> > mentioned? He said it's OK unless such a kernel thread exists. However,
> > MAX_USER_RT_PRIO is currently equal to MAX_RT_PRIO, so a user process can
> > also cause this problem. Is Srivatsa's idea 2 acceptable? Or should we just
> > apply a "don't abuse the highest RT priority" rule?
>
> We used to be able to create kernel threads higher than any userspace
> priority. If this is no longer true, I think that's OK: equal priority
> still means we'll get scheduled, right?

If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process has no
timeslice and only relinquishes the CPU voluntarily.

# Hence this problem is complicated ;-(
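The SCHED_RR/SCHED_FIFO difference at equal priority can be shown with a toy tick loop in C. This is a hypothetical model, not scheduler code. The hog is running, a second task of the same priority is queued behind it, and the timeslice length is an arbitrary parameter. With SCHED_RR the hog is rotated to the tail when its slice expires. With SCHED_FIFO it keeps the CPU indefinitely.

```c
#include <assert.h>

enum toy_policy { TOY_FIFO, TOY_RR };

/*
 * Toy model of one priority level on one CPU: a hog that never blocks is
 * running, and a task of the SAME priority is queued behind it.  Returns the
 * tick at which the queued task first gets the CPU, or -1 if that does not
 * happen within max_ticks.  timeslice (in ticks) only matters for SCHED_RR.
 */
static int first_tick_waiter_runs(enum toy_policy hog_policy, int timeslice,
				  int max_ticks)
{
	int slice_left = timeslice;

	for (int tick = 0; tick < max_ticks; tick++) {
		/* The hog runs for this tick. */
		if (hog_policy == TOY_RR && --slice_left == 0)
			return tick + 1; /* slice expired: hog goes to the tail */
		/* SCHED_FIFO has no timeslice, so the hog simply continues. */
	}
	return -1;
}
```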

Thanks,

Satoru

2007-05-08 04:02:51

by Srivatsa Vaddagiri

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Tue, May 08, 2007 at 12:29:19PM +0900, Satoru Takeuchi wrote:
> > We used to be able to create kernel threads higher than any userspace
> > priority. If this is no longer true, I think that's OK: equal priority
> > still means we'll get scheduled, right?
>
> If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process has no
> timeslice and only relinquishes the CPU voluntarily.

Yeah, this is truly a problem if a SCHED_FIFO user-space CPU hog is
running at the highest user priority, MAX_USER_RT_PRIO - 1, which is also
the highest real-time priority kernel threads can attain, since
MAX_USER_RT_PRIO == MAX_RT_PRIO.

One option is to make MAX_USER_RT_PRIO < MAX_RT_PRIO. I am not sure what
semantics that will break (perhaps the real-time folks can clarify
that).

The other, easier option is to add a wake_up_process_to_front() API in
sched.c, which essentially wakes up the process and enqueues the task at
the front of the runqueue.
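A toy C model of what enqueuing at the front changes for a single priority list. It assumes, as in the O(1) scheduler, that the running hog stays on the list and that the scheduler picks the head on its next reschedule. One further assumption applies. An equal-priority wakeup does not preempt the running task, so a wake_up_process_to_front() would presumably also need to force a reschedule on that CPU. The model only covers queue order. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy run list for ONE priority level; ids[0] is the head, which is what
 * the scheduler picks on its next reschedule.  The running task is assumed
 * to stay on the list, so a tail-queued task sits behind it. */
#define TOY_LIST_MAX 8

struct toy_list {
	int ids[TOY_LIST_MAX];
	size_t n;
};

static int toy_enqueue_tail(struct toy_list *q, int id)
{
	if (q->n == TOY_LIST_MAX)
		return -1;
	q->ids[q->n++] = id;
	return 0;
}

/* Roughly what the proposed wake_up_process_to_front() would do to the list. */
static int toy_enqueue_head(struct toy_list *q, int id)
{
	if (q->n == TOY_LIST_MAX)
		return -1;
	memmove(&q->ids[1], &q->ids[0], q->n * sizeof(q->ids[0]));
	q->ids[0] = id;
	q->n++;
	return 0;
}

static int toy_pick_next(const struct toy_list *q)
{
	return q->n ? q->ids[0] : -1;
}
```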

> # Hence this problem is complicated ;-(

--
Regards,
vatsa

2007-05-08 04:04:33

by Rusty Russell

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Tue, 2007-05-08 at 12:29 +0900, Satoru Takeuchi wrote:
> At Tue, 08 May 2007 13:02:25 +1000,
> Rusty Russell wrote:
> > We used to be able to create kernel threads higher than any userspace
> > priority. If this is no longer true, I think that's OK: equal priority
> > still means we'll get scheduled, right?
>
> If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process has no
> timeslice and only relinquishes the CPU voluntarily.
>
> # Hence this problem is complicated ;-(

OK, well, I agree with "don't do that" then 8)

Thanks,
Rusty.

2007-05-08 07:17:50

by Satoru Takeuchi

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

At Tue, 8 May 2007 09:40:33 +0530,
Srivatsa Vaddagiri wrote:
>
> On Tue, May 08, 2007 at 12:29:19PM +0900, Satoru Takeuchi wrote:
> > > We used to be able to create kernel threads higher than any userspace
> > > priority. If this is no longer true, I think that's OK: equal priority
> > > still means we'll get scheduled, right?
> >
> > If SCHED_RR, yes. However, if SCHED_FIFO, no. Such a process has no
> > timeslice and only relinquishes the CPU voluntarily.
>
> Yeah, this is truly a problem if a SCHED_FIFO user-space CPU hog is
> running at the highest user priority, MAX_USER_RT_PRIO - 1, which is also
> the highest real-time priority kernel threads can attain, since
> MAX_USER_RT_PRIO == MAX_RT_PRIO.
>
> One option is to make MAX_USER_RT_PRIO < MAX_RT_PRIO. I am not sure what
> semantics that will break (perhaps the real-time folks can clarify
> that).

I sometimes wonder about prio_array. It has 140 entries (from 0 to 139),
and the meaning of each entry is as follows, I think.

+-----------+-----------------------------------------------+
| index     | usage                                         |
+-----------+-----------------------------------------------+
| 0 - 98    | RT processes are here. They are in the entry  |
|           | whose index is 99 - sched_priority.           |
+-----------+-----------------------------------------------+
| 99        | No one uses it? CMIIW.                        |
+-----------+-----------------------------------------------+
| 100 - 139 | Ordinary processes are here. They are in the  |
|           | entry whose index is (nice+120) +/- 5         |
+-----------+-----------------------------------------------+

What's the purpose of prio_array[99]? I briefly explored the source tree
and couldn't find any kernel thread which uses this entry.
Does anybody know?
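The mapping in the table can be checked with a few lines of C. The constants and formulas below mirror what I believe 2.6.21 uses: prio = MAX_RT_PRIO-1 - sched_priority for RT tasks, static priority 120 + nice for other tasks, and MAX_PRIO = MAX_RT_PRIO + 40. The function names are hypothetical. Because valid RT sched_priority values are 1..99, no value lands on index 99.

```c
#include <assert.h>

/* Constants and formulas as I understand them for 2.6.21; the helper
 * function names are hypothetical.  Lower index = higher priority. */
#define MAX_RT_PRIO 100
#define MAX_PRIO    (MAX_RT_PRIO + 40)

/* RT task: prio = MAX_RT_PRIO-1 - sched_priority (valid sched_priority 1..99). */
static int rt_index(int sched_priority)
{
	return MAX_RT_PRIO - 1 - sched_priority;
}

/* Normal task: static priority = MAX_RT_PRIO + 20 + nice (nice -20..19). */
static int nice_index(int nice)
{
	return MAX_RT_PRIO + 20 + nice;
}

/* How many valid RT sched_priority values map to prio_array index idx? */
static int rt_users_of_index(int idx)
{
	int users = 0;

	for (int sp = 1; sp <= MAX_RT_PRIO - 1; sp++)
		if (rt_index(sp) == idx)
			users++;
	return users;
}
```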

Regards,

Satoru

2007-05-08 16:41:12

by Srivatsa Vaddagiri

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
> I sometimes wonder about prio_array. It has 140 entries (from 0 to 139),
> and the meaning of each entry is as follows, I think.
>
> +-----------+-----------------------------------------------+
> | index     | usage                                         |
> +-----------+-----------------------------------------------+
> | 0 - 98    | RT processes are here. They are in the entry  |
> |           | whose index is 99 - sched_priority.           |
From sched.h:

/*
 * Priority of a process goes from 0..MAX_PRIO-1, valid RT
 * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
 * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.

So shouldn't the index for RT processes be 0 - 99, given that
MAX_RT_PRIO = 100?

> +-----------+-----------------------------------------------+
> | 99        | No one uses it? CMIIW.                        |
> +-----------+-----------------------------------------------+
> | 100 - 139 | Ordinary processes are here. They are in the  |
> |           | entry whose index is (nice+120) +/- 5         |
> +-----------+-----------------------------------------------+
>
> What's the purpose of prio_array[99]? I briefly explored the source tree
> and couldn't find any kernel thread which uses this entry.
> Does anybody know?

--
Regards,
vatsa

2007-05-09 00:42:39

by Satoru Takeuchi

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

At Tue, 8 May 2007 22:18:50 +0530,
Srivatsa Vaddagiri wrote:
>
> On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
> > I sometimes wonder about prio_array. It has 140 entries (from 0 to 139),
> > and the meaning of each entry is as follows, I think.
> >
> > +-----------+-----------------------------------------------+
> > | index     | usage                                         |
> > +-----------+-----------------------------------------------+
> > | 0 - 98    | RT processes are here. They are in the entry  |
> > |           | whose index is 99 - sched_priority.           |
>
> From sched.h:
>
> /*
>  * Priority of a process goes from 0..MAX_PRIO-1, valid RT
>  * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
>  * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
>
> so shouldn't the index for RT processes be 0 - 99, given that
> MAX_RT_PRIO = 100?

However, `man sched_setscheduler' says...


Processes scheduled with SCHED_OTHER or SCHED_BATCH must
be assigned the static priority 0. Processes scheduled
under SCHED_FIFO or SCHED_RR can have a static priority
in the range 1 to 99. The system calls
sched_get_priority_min() and sched_get_priority_max() can
be used to find out the valid priority range for a
scheduling policy in a portable way on all POSIX.1-2001
conforming systems.


and here is the check in kernel/sched.c:


int sched_setscheduler(struct task_struct *p, int policy,
		       struct sched_param *param)
{
	...
	/*
	 * Valid priorities for SCHED_FIFO and SCHED_RR are
	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL and
	 * SCHED_BATCH is 0.
	 */
	if (param->sched_priority < 0 ||
	    (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
	    (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
		return -EINVAL;
	if (is_rt_policy(policy) != (param->sched_priority != 0))
		return -EINVAL;
	...
}


So if I want to set the RT priority of a kernel thread, this entry can't
be used unless t->prio is set to 99 directly. I don't know whether we are
allowed to write code that bypasses sched_setscheduler(). Moreover, even if
a kernel thread could use this index, I can't see its purpose: it can only
be used by the kernel, yet its priority is LOWER than that of any realtime
thread.

If the rule can be changed to the following...

+-----------+-----------------------------------------------+
| index     | usage                                         |
+-----------+-----------------------------------------------+
| 0         | RT processes are here. Only the kernel can    |
|           | use this entry.                               |
+-----------+-----------------------------------------------+
| 1 - 99    | RT processes are here. They are in the entry  |
|           | whose index is 100 - sched_priority.          |
+-----------+-----------------------------------------------+
| 100 - 139 | Ordinary processes are here. They are in the  |
|           | entry whose index is (nice+120) +/- 5         |
+-----------+-----------------------------------------------+

... there will be an entry used only by the kernel whose priority is
HIGHER than that of any user process, and I'll be happy :-)
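One way to realise the proposed table is sketched below in C. User RT priorities 1..99 map to index MAX_RT_PRIO - sched_priority (99..1), which leaves index 0 free for kernel threads such as kstopmachine. This is a hypothetical mapping for illustration, not existing kernel code, and the names are made up. The check confirms that the reserved slot outranks every user priority (lower index = higher priority).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RT_PRIO 100
/* Hypothetical: prio_array index reserved for kernel threads. */
#define KERNEL_RESERVED_INDEX 0

/* Hypothetical proposed mapping: sched_priority 1..99 -> index 99..1. */
static int proposed_rt_index(int sched_priority)
{
	return MAX_RT_PRIO - sched_priority;
}

/* True if the reserved index is numerically below every user RT index. */
static bool reserved_outranks_all_users(void)
{
	for (int sp = 1; sp <= MAX_RT_PRIO - 1; sp++)
		if (proposed_rt_index(sp) <= KERNEL_RESERVED_INDEX)
			return false;
	return true;
}
```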

Thanks,

Satoru

>
> > +-----------+-----------------------------------------------+
> > | 99        | No one uses it? CMIIW.                        |
> > +-----------+-----------------------------------------------+
> > | 100 - 139 | Ordinary processes are here. They are in the  |
> > |           | entry whose index is (nice+120) +/- 5         |
> > +-----------+-----------------------------------------------+
> >
> > What's the purpose of prio_array[99]? I briefly explored the source tree
> > and couldn't find any kernel thread which uses this entry.
> > Does anybody know?
>
> --
> Regards,
> vatsa

2007-05-09 00:48:17

by Nick Piggin

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

Satoru Takeuchi wrote:
> At Tue, 8 May 2007 22:18:50 +0530,
> Srivatsa Vaddagiri wrote:
>
>>On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
>>
>>>I sometimes wonder about prio_array. It has 140 entries (from 0 to 139),
>>>and the meaning of each entry is as follows, I think.
>>>
>>>+-----------+-----------------------------------------------+
>>>| index     | usage                                         |
>>>+-----------+-----------------------------------------------+
>>>| 0 - 98    | RT processes are here. They are in the entry  |
>>>|           | whose index is 99 - sched_priority.           |
>>
>>From sched.h:
>>
>>/*
>> * Priority of a process goes from 0..MAX_PRIO-1, valid RT
>> * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
>> * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
>>
>>so shouldn't the index for RT processes be 0 - 99, given that
>>MAX_RT_PRIO = 100?
>
>
> However, `man sched_setscheduler' says...
>
>
> Processes scheduled with SCHED_OTHER or SCHED_BATCH must
> be assigned the static priority 0. Processes scheduled
> under SCHED_FIFO or SCHED_RR can have a static priority
> in the range 1 to 99. The system calls
> sched_get_priority_min() and sched_get_priority_max() can
> be used to find out the valid priority range for a
> scheduling policy in a portable way on all POSIX.1-2001
> conforming systems.
>
>
> and here is the check in kernel/sched.c:
>
>
> int sched_setscheduler(struct task_struct *p, int policy,
> 		       struct sched_param *param)
> {
> 	...
> 	/*
> 	 * Valid priorities for SCHED_FIFO and SCHED_RR are
> 	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL and
> 	 * SCHED_BATCH is 0.
> 	 */
> 	if (param->sched_priority < 0 ||
> 	    (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
> 	    (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
> 		return -EINVAL;
> 	if (is_rt_policy(policy) != (param->sched_priority != 0))
> 		return -EINVAL;
> 	...
> }
>
>
> So if I want to set the RT priority of a kernel thread, this entry can't
> be used unless t->prio is set to 99 directly. I don't know whether we are
> allowed to write code that bypasses sched_setscheduler(). Moreover, even if
> a kernel thread could use this index, I can't see its purpose: it can only
> be used by the kernel, yet its priority is LOWER than that of any realtime
> thread.
>
> If the rule can be changed to the following...
>
> +-----------+-----------------------------------------------+
> | index     | usage                                         |
> +-----------+-----------------------------------------------+
> | 0         | RT processes are here. Only the kernel can    |
> |           | use this entry.                               |
> +-----------+-----------------------------------------------+
> | 1 - 99    | RT processes are here. They are in the entry  |
> |           | whose index is 100 - sched_priority.          |
> +-----------+-----------------------------------------------+
> | 100 - 139 | Ordinary processes are here. They are in the  |
> |           | entry whose index is (nice+120) +/- 5         |
> +-----------+-----------------------------------------------+
>
> ... there will be an entry used only by the kernel whose priority is
> HIGHER than that of any user process, and I'll be happy :-)

We've seen the same problem with other stop_machine_run sites in the kernel.
Module removal was one.

Reserving the top priority slot for stop machine (and migration thread, I
guess) isn't a bad idea.

--
SUSE Labs, Novell Inc.

2007-05-09 06:33:52

by Satoru Takeuchi

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

At Wed, 09 May 2007 10:47:50 +1000,
Nick Piggin wrote:
>
> Satoru Takeuchi wrote:
> > At Tue, 8 May 2007 22:18:50 +0530,
> > Srivatsa Vaddagiri wrote:
> >
> >>On Tue, May 08, 2007 at 04:16:06PM +0900, Satoru Takeuchi wrote:
> >>
> >>>I sometimes wonder about prio_array. It has 140 entries (from 0 to 139),
> >>>and the meaning of each entry is as follows, I think.
> >>>
> >>>+-----------+-----------------------------------------------+
> >>>| index     | usage                                         |
> >>>+-----------+-----------------------------------------------+
> >>>| 0 - 98    | RT processes are here. They are in the entry  |
> >>>|           | whose index is 99 - sched_priority.           |
> >>
> >>From sched.h:
> >>
> >>/*
> >> * Priority of a process goes from 0..MAX_PRIO-1, valid RT
> >> * priority is 0..MAX_RT_PRIO-1, and SCHED_NORMAL/SCHED_BATCH
> >> * tasks are in the range MAX_RT_PRIO..MAX_PRIO-1.
> >>
> >>so shouldn't the index for RT processes be 0 - 99, given that
> >>MAX_RT_PRIO = 100?
> >
> >
> > However, `man sched_setscheduler' says...
> >
> >
> > Processes scheduled with SCHED_OTHER or SCHED_BATCH must
> > be assigned the static priority 0. Processes scheduled
> > under SCHED_FIFO or SCHED_RR can have a static priority
> > in the range 1 to 99. The system calls
> > sched_get_priority_min() and sched_get_priority_max() can
> > be used to find out the valid priority range for a
> > scheduling policy in a portable way on all POSIX.1-2001
> > conforming systems.
> >
> >
> > and here is the check in kernel/sched.c:
> >
> >
> > int sched_setscheduler(struct task_struct *p, int policy,
> > 		       struct sched_param *param)
> > {
> > 	...
> > 	/*
> > 	 * Valid priorities for SCHED_FIFO and SCHED_RR are
> > 	 * 1..MAX_USER_RT_PRIO-1, valid priority for SCHED_NORMAL and
> > 	 * SCHED_BATCH is 0.
> > 	 */
> > 	if (param->sched_priority < 0 ||
> > 	    (p->mm && param->sched_priority > MAX_USER_RT_PRIO-1) ||
> > 	    (!p->mm && param->sched_priority > MAX_RT_PRIO-1))
> > 		return -EINVAL;
> > 	if (is_rt_policy(policy) != (param->sched_priority != 0))
> > 		return -EINVAL;
> > 	...
> > }
> >
> >
> > So if I want to set the RT priority of a kernel thread, this entry can't
> > be used unless t->prio is set to 99 directly. I don't know whether we are
> > allowed to write code that bypasses sched_setscheduler(). Moreover, even if
> > a kernel thread could use this index, I can't see its purpose: it can only
> > be used by the kernel, yet its priority is LOWER than that of any realtime
> > thread.
> >
> > If the rule can be changed to the following...
> >
> > +-----------+-----------------------------------------------+
> > | index     | usage                                         |
> > +-----------+-----------------------------------------------+
> > | 0         | RT processes are here. Only the kernel can    |
> > |           | use this entry.                               |
> > +-----------+-----------------------------------------------+
> > | 1 - 99    | RT processes are here. They are in the entry  |
> > |           | whose index is 100 - sched_priority.          |
> > +-----------+-----------------------------------------------+
> > | 100 - 139 | Ordinary processes are here. They are in the  |
> > |           | entry whose index is (nice+120) +/- 5         |
> > +-----------+-----------------------------------------------+
> >
> > ... there will be an entry used only by the kernel whose priority is
> > HIGHER than that of any user process, and I'll be happy :-)
>
> We've seen the same problem with other stop_machine_run sites in the kernel.
> Module removal was one.
>
> Reserving the top priority slot for stop machine (and migration thread, I
> guess) isn't a bad idea.

For the time being, I'll try to write a patch implementing this idea after
submitting the stop_machine_run() fix. I'll probably post an RFC within a week.

Thanks,
Satoru

>
> --
> SUSE Labs, Novell Inc.

Subject: Re: [BUG] cpu-hotplug: Can't offline the CPU with naughty realtime processes

On Wed, May 09, 2007 at 10:47:50AM +1000, Nick Piggin wrote:
>
> We've seen the same problem with other stop_machine_run sites in the kernel.
> Module removal was one.
>
> Reserving the top priority slot for stop machine (and migration thread, I
> guess) isn't a bad idea.

I second this thought.
The process freezer, if used, will only safeguard cpu-hotplug, not the
other sites which use stop_machine_run.

>
> --
> SUSE Labs, Novell Inc.

Regards
gautham.
--
Gautham R Shenoy
Linux Technology Center
IBM India.
"Freedom comes with a price tag of responsibility, which is still a bargain,
because Freedom is priceless!"

2007-05-11 08:51:05

by Satoru Takeuchi

Subject: [PATCH 2/2] cpu hotplug: fix ksoftirqd termination on cpu hotplug with naughty realtime process

Fix ksoftirqd termination on cpu hotplug with a naughty realtime process.

Consider the following case:

- A task on CPU1 tries to hot-remove CPU2.
- There is a realtime process on CPU2, and that process never sleeps.
- That RT process and ksoftirqd/2 are migrated to CPU0.

Then ksoftirqd/2 can't stop because the RT process runs forever on CPU0,
and CPU1, which is waiting for ksoftirqd/2 to terminate, hangs. To fix this
problem, raise ksoftirqd/2 to the maximum RT priority before calling
kthread_stop().

Signed-off-by: Satoru Takeuchi <[email protected]>

Index: linux-2.6.21/kernel/softirq.c
===================================================================
--- linux-2.6.21.orig/kernel/softirq.c 2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/softirq.c 2007-05-11 17:19:12.000000000 +0900
@@ -590,6 +590,7 @@ static int __cpuinit cpu_callback(struct
{
int hotcpu = (unsigned long)hcpu;
struct task_struct *p;
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };

switch (action) {
case CPU_UP_PREPARE:
@@ -614,6 +615,7 @@ static int __cpuinit cpu_callback(struct
case CPU_DEAD:
p = per_cpu(ksoftirqd, hotcpu);
per_cpu(ksoftirqd, hotcpu) = NULL;
+ sched_setscheduler(p, SCHED_FIFO, &param);
kthread_stop(p);
takeover_tasklets(hotcpu);
break;

2007-05-11 08:51:26

by Satoru Takeuchi

Subject: [PATCH 1/2] Fix stop_machine_run problem with naughty real time process

Hi,

I wrote patches which fix the problems with stop_machine_run() and
cpu hotplug.

stop_machine_run() can't accomplish its work if there is a realtime process
on the CPU on which the "kstopmachine" kernel thread is to run. For more
details, please refer to the following thread:

http://lkml.org/lkml/2007/5/7/41

TEST RESULT:

I did the following test on my ia64 box. It works fine:

-------------------------------------------------------------------------------
# cat loop.sh
#!/bin/sh
while true ; do
	:
done
-------------------------------------------------------------------------------
# cat test_stop_machine_run_with_rt_proc.sh
#!/bin/sh

taskset 0x2 chrt -f 98 ./loop.sh &
PID=${!}
echo 0 >/sys/devices/system/cpu/cpu1/online
kill ${PID}
echo 1 >/sys/devices/system/cpu/cpu1/online
-------------------------------------------------------------------------------

To do the test, just issue the following command.

# ./test_stop_machine_run_with_rt_proc.sh
#

TODO list
=========

Some more work is still needed:

- If there is a SCHED_FIFO process with the maximum priority, stop_machine_run()
still doesn't work because kstopmachine is never scheduled.

-> I'm trying to fix this problem; see the following:

http://lkml.org/lkml/2007/5/8/620

I will submit RFC patches within a week.

- On CPU hot removal, if the RT process is migrated to the CPU on which
stop_machine_run() is running, stop_machine_run() can't continue.

-> I'm trying to fix this problem.

- Other `stop_machine_run() with FIFO` problems might exist.

-> I haven't yet examined the other subsystems that use stop_machine_run().


# FYI, I'll be offline for 2 days.

Thanks,

Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work in the "kstopmachine" thread, which runs with the
maximum priority. However, that thread only gets this priority after it has been
woken up and has started running. Therefore, in the following case ...

- "kstopmachine" tries to run on CPU1
- There is a real-time process on CPU1 which never voluntarily relinquishes the CPU

... "kstopmachine" can't start to run, and the CPU on which stop_machine_run() is
running hangs. To fix this problem, call sched_setscheduler() before waking up that thread.

Signed-off-by: Satoru Takeuchi <[email protected]>

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c 2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c 2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
static int stop_machine(void)
{
int i, ret = 0;
- struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
- /* One high-prio thread per cpu. We'll do this one. */
- sched_setscheduler(current, SCHED_FIFO, &param);

atomic_set(&stopmachine_thread_ack, 0);
stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i

p = kthread_create(do_stop, &smdata, "kstopmachine");
if (!IS_ERR(p)) {
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+ /* One high-prio thread per cpu. We'll do this one. */
+ sched_setscheduler(p, SCHED_FIFO, &param);
kthread_bind(p, cpu);
wake_up_process(p);
wait_for_completion(&smdata.done);

2007-05-11 09:19:22

by Satoru Takeuchi

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix stop_machine_run problem with naughty real time process

At Fri, 11 May 2007 17:49:20 +0900,
Satoru Takeuchi wrote:
>
> Hi,
>
> I wrote patches which fix the problem with stop_machine_run() and
> CPU hotplug.

Sorry, there were extra tabs. Fixed.

Thanks,

Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work in the "kstopmachine" thread, which runs with the
maximum priority. However, that thread only gets this priority after it has been
woken up and has started running. Therefore, in the following case ...

- "kstopmachine" tries to run on CPU1
- There is a real-time process on CPU1 which never voluntarily relinquishes the CPU

... "kstopmachine" can't start to run, and the CPU on which stop_machine_run() is
running hangs. To fix this problem, call sched_setscheduler() before waking up that thread.

Signed-off-by: Satoru Takeuchi <[email protected]>

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c 2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c 2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
static int stop_machine(void)
{
int i, ret = 0;
- struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
- /* One high-prio thread per cpu. We'll do this one. */
- sched_setscheduler(current, SCHED_FIFO, &param);

atomic_set(&stopmachine_thread_ack, 0);
stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i

p = kthread_create(do_stop, &smdata, "kstopmachine");
if (!IS_ERR(p)) {
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+ /* One high-prio thread per cpu. We'll do this one. */
+ sched_setscheduler(p, SCHED_FIFO, &param);
kthread_bind(p, cpu);
wake_up_process(p);
wait_for_completion(&smdata.done);