Date: Fri, 11 May 2007 17:49:20 +0900
Message-ID: <87veezeodr.wl%takeuchi_satoru@jp.fujitsu.com>
From: Satoru Takeuchi
To: Rusty Russell
Cc: Satoru Takeuchi, Linux Kernel, Srivatsa Vaddagiri, Zwane Mwaikambo,
    Nathan Lynch, Joel Schopp, Ashok Raj, Heiko Carstens, Gautham R Shenoy
Subject: [PATCH 1/2] Fix stop_machine_run problem with naughty real time process

Hi,

I wrote patches which fix the problem regarding stop_machine_run() and
CPU hotplug. stop_machine_run() can't accomplish its work if there is a
real time process on the CPU on which the "kstopmachine" kernel thread is
running.

For more details, please refer to the following thread:

  http://lkml.org/lkml/2007/5/7/41

TEST RESULT:

I did the following test on my ia64 box.
It works fine:

-------------------------------------------------------------------------------
# cat loop.sh
while true ; do
	:
done
-------------------------------------------------------------------------------
# cat test_stop_machine_run_with_rt_proc.sh
#!/bin/sh

taskset 0x2 chrt -f 98 ./loop.sh &
PID=${!}
echo 0 >/sys/devices/system/cpu/cpu1/online
kill ${PID}
echo 1 >/sys/devices/system/cpu/cpu1/online
-------------------------------------------------------------------------------

To do the test, just issue the following command.

# ./test_stop_machine_run_with_rt_proc.sh
#

TODO list
=========

Some more work is needed. See the TODO list.

 - If there is a SCHED_FIFO process having max priority, stop_machine_run
   doesn't work because kstopmachine doesn't get scheduled.

   -> I'm trying to fix this problem, see the following:

      http://lkml.org/lkml/2007/5/8/620

      I will submit RFC patches within a week.

 - On CPU hot removal, if that RT process is migrated to the CPU on which
   stop_machine_run() is running, stop_machine_run can't continue to run.

   -> I'm trying to fix this problem.

 - Other `stop_machine_run() with FIFO` problems might exist.

   -> I haven't researched other subsystems using stop_machine_run yet.

# FYI, I'll be offline for 2 days.

Thanks,
Satoru

---
Fix stop_machine_run() problem with naughty real time process

stop_machine_run() does its work on the "kstopmachine" thread, which has
max priority. However, that thread gets that priority only after it has
been woken up. Therefore, in the following case ...

 - "kstopmachine" tries to run on CPU1
 - There is a real time process on CPU1 which doesn't relinquish CPU time
   voluntarily

... "kstopmachine" can't start to run and the CPU on which
stop_machine_run() is running hangs.

To fix this problem, call sched_setscheduler() before waking up that thread.

Signed-off-by: Satoru Takeuchi

Index: linux-2.6.21/kernel/stop_machine.c
===================================================================
--- linux-2.6.21.orig/kernel/stop_machine.c	2007-05-11 13:45:34.000000000 +0900
+++ linux-2.6.21/kernel/stop_machine.c	2007-05-11 14:49:17.000000000 +0900
@@ -89,10 +89,6 @@ static void stopmachine_set_state(enum s
 static int stop_machine(void)
 {
 	int i, ret = 0;
-	struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
-
-	/* One high-prio thread per cpu.  We'll do this one. */
-	sched_setscheduler(current, SCHED_FIFO, &param);
 
 	atomic_set(&stopmachine_thread_ack, 0);
 	stopmachine_num_threads = 0;
@@ -184,6 +180,10 @@ struct task_struct *__stop_machine_run(i
 
 	p = kthread_create(do_stop, &smdata, "kstopmachine");
 	if (!IS_ERR(p)) {
+		struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
+
+		/* One high-prio thread per cpu.  We'll do this one. */
+		sched_setscheduler(p, SCHED_FIFO, &param);
 		kthread_bind(p, cpu);
 		wake_up_process(p);
 		wait_for_completion(&smdata.done);
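
For readers who want to picture the ordering problem from the changelog
without a test box, here is a toy userspace model. It is only a sketch and
not kernel code: every name in it (toy_woken_task_runs(), toy_old_ordering(),
toy_new_ordering(), the TOY_* constants) is hypothetical, and it assumes a
single CPU, a hog that never yields, and a SCHED_FIFO-like rule where a woken
task preempts only if its priority is strictly higher.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, NOT kernel code. All names are hypothetical.
 * One CPU; the running task ("hog") never yields; a woken task
 * preempts it only if its priority is strictly higher.
 */
enum { TOY_NORMAL_PRIO = 0, TOY_MAX_RT_PRIO = 100, TOY_TICKS = 1000 };

/* Returns true if the woken task gets the CPU within 'ticks' ticks. */
bool toy_woken_task_runs(int hog_prio, int prio_at_wakeup, int ticks)
{
	assert(ticks > 0);

	for (int t = 0; t < ticks; t++) {
		/* The scheduler re-evaluates on every tick. */
		if (prio_at_wakeup > hog_prio)
			return true;	/* the woken task preempts the hog */
		/*
		 * Otherwise the hog keeps the CPU. The woken task never
		 * runs, so it can never reach code that would raise its
		 * own priority.
		 */
	}
	return false;
}

/* Old ordering: wake first; the thread would raise its priority itself. */
bool toy_old_ordering(int hog_prio)
{
	return toy_woken_task_runs(hog_prio, TOY_NORMAL_PRIO, TOY_TICKS);
}

/* New ordering: the creator sets MAX_RT_PRIO-1 first, then wakes. */
bool toy_new_ordering(int hog_prio)
{
	return toy_woken_task_runs(hog_prio, TOY_MAX_RT_PRIO - 1, TOY_TICKS);
}
```

With a priority-98 hog (as in the test script), toy_old_ordering(98) is
false and toy_new_ordering(98) is true. toy_new_ordering(99) is still false,
which corresponds to the first item of the TODO list.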