Date: Mon, 16 Oct 2023 16:03:47 -0700
From: "Paul E. McKenney"
To: Thomas Gleixner
Cc: John Stultz, Tetsuo Handa, Stephen Boyd, LKML, Sebastian Andrzej Siewior, x86@kernel.org
Subject: Re: [PATCH] clocksource: disable irq when holding watchdog_lock.
Reply-To: paulmck@kernel.org
References: <80ff5036-8449-44a6-ba2f-0130d3be6b57@I-love.SAKURA.ne.jp> <878r826xys.ffs@tglx>
In-Reply-To: <878r826xys.ffs@tglx>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Oct 16, 2023 at 11:47:55PM +0200, Thomas Gleixner wrote:
> On Mon, Oct 16 2023 at 10:46, John Stultz wrote:
> > On Fri, Oct 13, 2023 at 7:51 AM Tetsuo Handa wrote:
> >>
> >> Lockdep found that spin_lock(&watchdog_lock) from call_timer_fn()
> >> is not safe. Use spin_lock_irqsave(&watchdog_lock, flags) instead.
> >>
> >> [ 0.378387] TSC synchronization [CPU#0 -> CPU#1]:
> >> [ 0.378387] Measured 55060 cycles TSC warp between CPUs, turning off TSC clock.

[ . . . ]

> Something like the uncompiled/untested below should cure it for real.
> It really does not matter whether the TSC unstable event happens a bit
> later. The system is unhappy no matter what.

This does pass my acceptance tests:

Tested-by: Paul E. McKenney

> That said, this whole clocksource watchdog mess wants a proper
> overhaul. It has become a pile of warts and duct tape by now and after
> staring at it long enough there is no real reason to run it in a timer
> callback anymore. It just can move into delayed work and the whole
> locking problem can be reduced to the clocksource_mutex and some well
> thought out atomic operations to handle the mark unstable case. But
> that's a different story and not relevant for curing the problem at
> hand.

Moving the code to delayed work seems quite reasonable.

But Thomas, you do understand that the way things have been going for
the clocksource watchdog, pushing it out to delayed work will no doubt
add yet more hair on large busy systems, right? Yeah, yeah, I know,
delayed work shouldn't be any worse than ksoftirqd. The key word of
course being "shouldn't". ;-)

							Thanx, Paul

> Thanks,
>
>         tglx
> ---
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -15,6 +15,7 @@
>   * ( The serial nature of the boot logic and the CPU hotplug lock
>   *   protects against more than 2 CPUs entering this code. )
>   */
> +#include
>  #include
>  #include
>  #include
> @@ -342,6 +343,13 @@ static inline unsigned int loop_timeout(
>  	return (cpumask_weight(topology_core_cpumask(cpu)) > 1) ? 2 : 20;
>  }
>
> +static void tsc_sync_mark_tsc_unstable(struct work_struct *work)
> +{
> +	mark_tsc_unstable("check_tsc_sync_source failed");
> +}
> +
> +static DECLARE_WORK(tsc_sync_work, tsc_sync_mark_tsc_unstable);
> +
>  /*
>   * The freshly booted CPU initiates this via an async SMP function call.
>   */
> @@ -395,7 +403,7 @@ static void check_tsc_sync_source(void *
>  			"turning off TSC clock.\n", max_warp);
>  		if (random_warps)
>  			pr_warn("TSC warped randomly between CPUs\n");
> -		mark_tsc_unstable("check_tsc_sync_source failed");
> +		schedule_work(&tsc_sync_work);
>  	}
>
>  /*
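
The deferral in the quoted patch can be illustrated with a small userspace
model. This is only a sketch under assumptions: it is not kernel code, every
name in it (toy_lock, mark_unstable_model, schedule_work_model, and so on) is
hypothetical, and the "lock" is a plain flag that reports a would-be deadlock
instead of spinning. It assumes the hazard is a handler that interrupts a
context already holding the lock and then needs that same lock. Calling the
marking function directly from such a handler fails in the model, while
recording pending work and running it later succeeds.

```c
/* Userspace analogy only; none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: taking it while held signals a would-be deadlock. */
struct toy_lock {
	bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;	/* a real spinlock would spin forever here */
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	assert(l->held);	/* unlocking a free lock is a bug in the model */
	l->held = false;
}

static struct toy_lock watchdog_lock_model;	/* models watchdog_lock */
static bool tsc_unstable;			/* result of the marking step */
static bool work_pending;			/* models a queued work item */

/* Models mark_tsc_unstable(): needs the lock; false means it would deadlock. */
static bool mark_unstable_model(void)
{
	if (!toy_trylock(&watchdog_lock_model))
		return false;
	tsc_unstable = true;
	toy_unlock(&watchdog_lock_model);
	return true;
}

/* Models schedule_work(): takes no lock, only records that work is pending. */
static void schedule_work_model(void)
{
	work_pending = true;
}

/* Models the worker: runs later, after the interrupted context dropped the lock. */
static bool run_pending_work_model(void)
{
	if (!work_pending)
		return true;
	work_pending = false;
	return mark_unstable_model();
}

/* Models a handler that fires while another context holds the lock. */
static bool interrupt_while_locked(bool defer)
{
	bool ok;

	toy_trylock(&watchdog_lock_model);	/* the interrupted lock holder */
	if (defer) {
		schedule_work_model();		/* patched behaviour: defer */
		ok = true;
	} else {
		ok = mark_unstable_model();	/* old behaviour: call directly */
	}
	toy_unlock(&watchdog_lock_model);
	return ok;
}
```

In the model the cost of deferring is only that tsc_unstable is set a little
later, which matches the quoted remark that it does not matter whether the TSC
unstable event happens a bit later.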