Subject: Re: REGRESSION: boot stalls on several old dual core Intel CPUs
From: Siegfried Metz
To: Peter Zijlstra, Thomas Gleixner
Cc: Kevin Shanahan, linux-kernel@vger.kernel.org,
 rafael.j.wysocki@intel.com, len.brown@intel.com, rjw@rjwysocki.net,
 diego.viola@gmail.com, rui.zhang@intel.com,
 viktor_jaegerskuepper@freenet.de
Date: Mon, 3 Sep 2018 23:34:56 +0200
Message-ID: <14c49f11-4df3-a07b-b2aa-dacb998dcd89@mailbox.org>
In-Reply-To: <20180903093305.GC24142@hirez.programming.kicks-ass.net>
References: <74c5abc8-7430-5bc9-2f8a-a2205608bee7@mailbox.org>
 <20180830130439.GM24082@hirez.programming.kicks-ass.net>
 <20180901022125.GO4941@tuon.disenchant.local>
 <20180903072506.GS24124@hirez.programming.kicks-ass.net>
 <20180903085423.GU24124@hirez.programming.kicks-ass.net>
 <20180903093305.GC24142@hirez.programming.kicks-ass.net>

On 9/3/18 11:33 AM, Peter Zijlstra wrote:
> On Mon, Sep 03, 2018 at 10:54:23AM +0200, Peter Zijlstra wrote:
>> On Mon, Sep 03, 2018 at 09:38:15AM +0200, Thomas Gleixner wrote:
>>> On Mon, 3 Sep 2018, Peter Zijlstra wrote:
>>>> On Sat, Sep 01, 2018 at 11:51:26AM +0930, Kevin Shanahan wrote:
>>>>> commit 01548f4d3e8e94caf323a4f664eb347fd34a34ab
>>>>> Author: Martin Schwidefsky
>>>>> Date:   Tue Aug 18 17:09:42 2009 +0200
>>>>>
>>>>>     clocksource: Avoid clocksource watchdog circular locking dependency
>>>>>
>>>>>     stop_machine from a multithreaded workqueue is not allowed because
>>>>>     of a circular locking dependency between cpu_down and the workqueue
>>>>>     execution.
>>>>>     Use a kernel thread to do the clocksource downgrade.
>>>>
>>>> I cannot find stop_machine usage there; either it went away or I need to
>>>> like wake up.
>>>
>>> timekeeping_notify() which is involved in switching clock source uses stomp
>>> machine.
>>
>> ARGH... OK, lemme see if I can come up with something other than
>> endlessly spawning that kthread.
>>
>> A special purpose kthread_worker would make more sense than that.
>
> Can someone test this?
>
> ---
>  kernel/time/clocksource.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index f74fb00d8064..898976d0082a 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -112,13 +112,28 @@ static int finished_booting;
>  static u64 suspend_start;
>
>  #ifdef CONFIG_CLOCKSOURCE_WATCHDOG
> -static void clocksource_watchdog_work(struct work_struct *work);
> +static void clocksource_watchdog_work(struct kthread_work *work);
>  static void clocksource_select(void);
>
>  static LIST_HEAD(watchdog_list);
>  static struct clocksource *watchdog;
>  static struct timer_list watchdog_timer;
> -static DECLARE_WORK(watchdog_work, clocksource_watchdog_work);
> +
> +/*
> + * We must use a kthread_worker here, because:
> + *
> + *   clocksource_watchdog_work()
> + *     clocksource_select()
> + *       __clocksource_select()
> + *         timekeeping_notify()
> + *           stop_machine()
> + *
> + * cannot be called from a regular workqueue, because of deadlocks between
> + * workqueue and stopmachine.
> + */
> +static struct kthread_worker *watchdog_worker;
> +static DEFINE_KTHREAD_WORK(watchdog_work, clocksource_watchdog_work);
> +
>  static DEFINE_SPINLOCK(watchdog_lock);
>  static int watchdog_running;
>  static atomic_t watchdog_reset_pending;
> @@ -158,7 +173,7 @@ static void __clocksource_unstable(struct clocksource *cs)
>
>  	/* kick clocksource_watchdog_work() */
>  	if (finished_booting)
> -		schedule_work(&watchdog_work);
> +		kthread_queue_work(watchdog_worker, &watchdog_work);
>  }
>
>  /**
> @@ -199,7 +214,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>  		/* Clocksource already marked unstable? */
>  		if (cs->flags & CLOCK_SOURCE_UNSTABLE) {
>  			if (finished_booting)
> -				schedule_work(&watchdog_work);
> +				kthread_queue_work(watchdog_worker, &watchdog_work);
>  			continue;
>  		}
>
> @@ -269,7 +284,7 @@ static void clocksource_watchdog(struct timer_list *unused)
>  			 */
>  			if (cs != curr_clocksource) {
>  				cs->flags |= CLOCK_SOURCE_RESELECT;
> -				schedule_work(&watchdog_work);
> +				kthread_queue_work(watchdog_worker, &watchdog_work);
>  			} else {
>  				tick_clock_notify();
>  			}
> @@ -418,7 +433,7 @@ static int __clocksource_watchdog_work(void)
>  	return select;
>  }
>
> -static void clocksource_watchdog_work(struct work_struct *work)
> +static void clocksource_watchdog_work(struct kthread_work *work)
>  {
>  	mutex_lock(&clocksource_mutex);
>  	if (__clocksource_watchdog_work())
> @@ -806,6 +821,7 @@ static int __init clocksource_done_booting(void)
>  {
>  	mutex_lock(&clocksource_mutex);
>  	curr_clocksource = clocksource_default_clock();
> +	watchdog_worker = kthread_create_worker(0, "cs-watchdog");
>  	finished_booting = 1;
>  	/*
>  	 * Run the watchdog first to eliminate unstable clock sources
>

I successfully booted my Intel Core 2 Duo with the patch applied on top
of 4.18.5 (based on the default Arch Linux config).

I ran at least 8 boots in total and all of them succeeded: with no
additional kernel boot parameters, and also with "quiet" and "debug".
No problems seen so far.

Thank you for your effort in developing this patch.

Siegfried
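
For readers following the thread, the shape of the quoted patch (one dedicated thread that runs the watchdog work, instead of a shared workqueue) can be sketched in userspace. The following is only an analogy built on POSIX threads, not the kernel's kthread_worker API: `struct worker`, `struct work`, `worker_queue()`, `demo()` and the other names are hypothetical and exist only for this illustration. It assumes a single work item, and it treats queueing an already pending item as a no-op, modelled on the documented behaviour of kthread_queue_work(), which returns false when the work is already pending.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel API. */
struct work {
	void (*fn)(struct work *);
	bool pending;		/* true while queued and not yet picked up */
};

struct worker {
	pthread_t thread;
	pthread_mutex_t lock;
	pthread_cond_t wake;
	struct work *queued;	/* single slot: this sketch has one work item */
	bool stop;
};

/* Body of the dedicated thread: sleep until work or a stop request arrives. */
static void *worker_main(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	for (;;) {
		while (!w->queued && !w->stop)
			pthread_cond_wait(&w->wake, &w->lock);
		if (!w->queued)		/* stop requested, nothing left to run */
			break;
		struct work *item = w->queued;
		w->queued = NULL;
		item->pending = false;
		pthread_mutex_unlock(&w->lock);
		item->fn(item);		/* runs in this thread, lock dropped */
		pthread_mutex_lock(&w->lock);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static void worker_init(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wake, NULL);
	w->queued = NULL;
	w->stop = false;
}

static int worker_start(struct worker *w)
{
	return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Returns true if the item was queued, false if it was already pending. */
static bool worker_queue(struct worker *w, struct work *item)
{
	bool queued = false;

	assert(item->fn != NULL);
	pthread_mutex_lock(&w->lock);
	if (!item->pending) {
		item->pending = true;
		w->queued = item;
		queued = true;
		pthread_cond_signal(&w->wake);
	}
	pthread_mutex_unlock(&w->lock);
	return queued;
}

/* Ask the thread to exit once pending work has run, then wait for it. */
static void worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_signal(&w->wake);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->thread, NULL);
	pthread_mutex_destroy(&w->lock);
	pthread_cond_destroy(&w->wake);
}

static int runs;	/* written by the worker thread, read after the join */

static void count_run(struct work *item)
{
	(void)item;
	runs++;
}

/* Queue the same item three times before the thread starts, then drain. */
int demo(int *accepted)
{
	struct worker w;
	struct work item = { .fn = count_run, .pending = false };

	runs = 0;
	*accepted = 0;
	worker_init(&w);
	for (int i = 0; i < 3; i++)
		if (worker_queue(&w, &item))
			(*accepted)++;
	if (worker_start(&w) != 0)
		return -1;
	worker_stop(&w);
	return runs;
}
```

Calling `demo()` queues the item three times while nothing is consuming it, so only the first request is accepted and the handler runs once after the thread starts. Repeated kicks from a timer therefore collapse into one run on a long-lived thread, which is the alternative to "endlessly spawning that kthread" mentioned in the quoted discussion.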