Message-ID: <5547ED60.4020707@ezchip.com>
Date: Mon, 4 May 2015 18:06:24 -0400
From: Chris Metcalf
To: Frederic Weisbecker
CC: Andrew Morton, Don Zickus, Ingo Molnar, Andrew Jones, Ulrich Obergfell, Fabian Frederick, Aaron Tomlin, Ben Zhang, Christoph Lameter, Gilad Ben-Yossef, Steven Rostedt, Jonathan Corbet, Thomas Gleixner, Peter Zijlstra
Subject: Re: [PATCH v10 1/3] smpboot: allow excluding cpus from the smpboot threads
References: <1430422766-19703-1-git-send-email-cmetcalf@ezchip.com> <1430422766-19703-2-git-send-email-cmetcalf@ezchip.com> <20150501085356.GA14149@lerouge> <5543DABF.4060303@ezchip.com> <20150501212329.GA4179@lerouge>
In-Reply-To: <20150501212329.GA4179@lerouge>
X-Mailing-List: linux-kernel@vger.kernel.org

On 5/1/2015 5:23 PM, Frederic Weisbecker wrote:
> On Fri, May 01, 2015 at 03:57:51PM -0400, Chris Metcalf wrote:
>
>> For example, booting with only cpu 0 as a housekeeping core (and
>> therefore all watchdogs 1-35 on my 36-core tilegx are parked), and
>> immediately doing "echo 0 > /proc/sys/kernel/watchdog", I see
>> (via SysRq ^O-l) the first parked watchdog, on cpu 1, hung with:
>>
>> frame 0: 0xfffffff7000f2928 lock_hrtimer_base+0xb8/0xc0
>> frame 1: 0xfffffff7000f2a28 hrtimer_try_to_cancel+0x40/0x170
>> frame 2: 0xfffffff7000f2a28 hrtimer_try_to_cancel+0x40/0x170
>> frame 3: 0xfffffff7000f2b98 hrtimer_cancel+0x40/0x68
>> frame 4: 0xfffffff70014cce0 watchdog_disable+0x50/0x70
>> frame 5: 0xfffffff70008c2d0 smpboot_thread_fn+0x350/0x438
>> frame 6: 0xfffffff700084b28 kthread+0x160/0x178
> Have you tried to do that before your patchset?

Yes, it works fine. It requires the presence of the parked
threads to trigger the issue.

>> The config does not have NO_HZ_FULL_ALL or NO_HZ_FULL_SYSIDLE
>> set, and does have RCU_FAST_NO_HZ and RCU_NOCB_CPU_ALL.
>>
>> I don't really know how to start debugging this, but I do know that
>> unparking the threads first avoids the issue :-)
> Do you have CONFIG_PROVE_LOCKING=y ?

There seems to be some skew between the community version, which is
throwing a bunch of errors when I enable PROVE_LOCKING, and our
internal version where some things are not yet upstreamed but
PROVE_LOCKING works :-)

I'll try to set aside some time to reconcile the two to figure it out.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com