Message-Id: <201404240832.s3O8WHd0011014@toshiba.co.jp>
Date: Thu, 24 Apr 2014 17:31:55 +0900
From: Daniel Sangorrin
To: Viresh Kumar, Daniel Sangorrin
Cc: Thomas Gleixner, Frédéric Weisbecker, Peter Zijlstra, Ingo Molnar, Tejun Heo, Li Zefan, Lists linaro-kernel, Linux Kernel Mailing List, Cgroups
Subject: Re: [RFC 0/4] Migrate timers away from cpuset on setting cpuset.quiesce
References: <201404240725.s3O7PrUv003720@toshiba.co.jp>

On 2014/04/24 16:43, Viresh Kumar wrote:
> On 24 April 2014 12:55, Daniel Sangorrin wrote:
>> I tried your set of patches for isolating particular CPU cores from unpinned
>> timers. On x86_64 they were working fine, however I found out that on ARM
>> they would fail under the following test:
>
> I am happy that these drew attention from somebody at least :)

Thank you for your hard work.

>> # mount -t cpuset none /cpuset
>> # cd /cpuset
>> # mkdir rt
>> # cd rt
>> # echo 1 > cpus
>> # echo 1 > cpu_exclusive
>> # cd
>> # taskset 0x2 ./setquiesce.sh <--- contains "echo 1 > /cpuset/rt/quiesce"
>> [ 75.622375] ------------[ cut here ]------------
>> [ 75.627258] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2595 __migrate_hrtimers+0x17c/0x1bc()
>> [ 75.636840] DEBUG_LOCKS_WARN_ON(current->hardirq_context)
>> [ 75.636840] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.14.0-rc1-37710-g23c8f02 #1
>> [ 75.649627] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
>> [ 75.649627] [] (show_stack) from [] (dump_stack+0x78/0x94)
>> [ 75.662689] [] (dump_stack) from [] (warn_slowpath_common+0x60/0x84)
>> [ 75.670410] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x30/0x40)
>> [ 75.677673] [] (warn_slowpath_fmt) from [] (__migrate_hrtimers+0x17c/0x1bc)
>> [ 75.677673] [] (__migrate_hrtimers) from [] (generic_smp_call_function_single_interrupt+0x8c/0x104)
>> [ 75.699645] [] (generic_smp_call_function_single_interrupt) from [] (handle_IPI+0xa4/0x16c)
>> [ 75.706970] [] (handle_IPI) from [] (gic_handle_irq+0x54/0x5c)
>> [ 75.715087] [] (gic_handle_irq) from [] (__irq_svc+0x44/0x5c)
>> [ 75.725311] Exception stack(0xc08a3f58 to 0xc08a3fa0)
>
> I couldn't understand why we went via an interrupt here? Probably CPU1
> was idle and was woken up with an IPI and then this happened. But in
> that case too, shouldn't the script run from process context instead?

In kernel/cpuset.c:quiesce_cpuset() you are using the function
'smp_call_function_any', which asks one of the CPU cores in 'cpumask' to
execute the functions 'hrtimer_quiesce_cpu' and 'timer_quiesce_cpu'.

In the case above, 'cpumask' corresponds to core 0. Since I'm forcing the
call to be made from core 1 (by using taskset), an inter-processor
interrupt is sent to core 0 so that those functions are executed there,
in interrupt context.

>> I also backported your patches to Linux 3.10.y and found the same problem
>> on both ARM and x86_64.
>
> There are very few changes between 3.10 and the latest kernel for
> timers/hrtimers, so things are expected to be the same.
>
>> However, I think I figured out the reason for those
>> errors. Please, could you check the patch below (it applies on top of
>> your tree, branch isolate-cpusets) and let me know what you think?
>
> Okay, just to let you know, I have also found some issues and they are
> now pushed in my tree. Also it is rebased over 3.15-rc2 now.

OK, thank you! I see that you have already fixed the problem. I tested
your tree on ARM and now it seems to work correctly.

>
>> -------------------------PATCH STARTS HERE---------------------------------
>> cpuset: quiesce: replace irq disable/enable with irq save/restore
>>
>> The function __migrate_timers can be called under interrupt context
>> or thread context depending on the core where the system call was
>> executed. In case it executes under interrupt context, it
>
> How exactly?

See my reply above.

>> seems a bad idea to leave interrupts enabled after migrating the
>> timers. In fact, this caused kernel errors on the ARM architecture and
>> on the x86_64 architecture with the 3.10 kernel (backported version
>> of the cpuset-quiesce patch).
>
> I can't keep it as a separate patch and so would be required to merge
> it into my original patch.
>
> Thanks for your input :)
>

Thanks,
Daniel

--
Toshiba Corporate Software Engineering Center
Daniel SANGORRIN
E-mail: daniel.sangorrin@toshiba.co.jp
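
P.S. To illustrate the reasoning in the patch description, here is a small
userspace model in Python. It is only a sketch, not kernel code: SimCpu and
the two migrate_* helpers are hypothetical names, and the methods merely
stand in for the kernel's local_irq_disable()/local_irq_enable() and
local_irq_save()/local_irq_restore(). It assumes the migration function is
entered with interrupts already off when it runs from the IPI handler.

```python
class SimCpu:
    """Toy model of one CPU's local interrupt-enable flag (not kernel code)."""

    def __init__(self, irqs_enabled=True):
        self.irqs_enabled = irqs_enabled

    def irq_disable(self):
        # Stands in for local_irq_disable(): turn interrupts off unconditionally.
        self.irqs_enabled = False

    def irq_enable(self):
        # Stands in for local_irq_enable(): turn interrupts on unconditionally.
        self.irqs_enabled = True

    def irq_save(self):
        # Stands in for local_irq_save(flags): remember the old state, then disable.
        flags = self.irqs_enabled
        self.irqs_enabled = False
        return flags

    def irq_restore(self, flags):
        # Stands in for local_irq_restore(flags): put back whatever was saved.
        self.irqs_enabled = flags


def migrate_with_disable_enable(cpu):
    """Hypothetical migration path using the disable/enable pair."""
    cpu.irq_disable()
    # ... timers would be migrated here ...
    cpu.irq_enable()
    return cpu.irqs_enabled


def migrate_with_save_restore(cpu):
    """Hypothetical migration path using the save/restore pair."""
    flags = cpu.irq_save()
    # ... timers would be migrated here ...
    cpu.irq_restore(flags)
    return cpu.irqs_enabled


if __name__ == "__main__":
    # Entered from the IPI handler: interrupts are assumed to be off already.
    print("disable/enable:", migrate_with_disable_enable(SimCpu(irqs_enabled=False)))
    print("save/restore:  ", migrate_with_save_restore(SimCpu(irqs_enabled=False)))
```

In this model the disable/enable variant returns with interrupts on even
though it was entered with them off, which is the situation the patch
description calls a bad idea. The save/restore variant returns in whatever
state it was entered, so it behaves the same from thread context and from
interrupt context.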