Received: by 2002:a05:6358:5282:b0:b5:90e7:25cb with SMTP id g2csp3325638rwa; Tue, 23 Aug 2022 02:53:44 -0700 (PDT) X-Google-Smtp-Source: AA6agR5jk5bSg1N6Zy5fn6ScokRjWJwr9mXYleCdARSdJNcZmC4HPPWU9YW8XOY8zyKjAU6Q8GOC X-Received: by 2002:a05:6402:360d:b0:445:bd16:803b with SMTP id el13-20020a056402360d00b00445bd16803bmr2914854edb.318.1661248424621; Tue, 23 Aug 2022 02:53:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1661248424; cv=none; d=google.com; s=arc-20160816; b=zsx8NWRvILCcygmSB6z4gDu1vfkylg338EsQCdeK7vy6eD2+M0BwgvYp4g749iALIM ujO1UWeiZGFRsXRArSsN3oFmaaqAZfEjrVcC0heW7S/FOK0u+RWdNlWpoP5ztHtPI80W GLr/lu7UA5V2Vc/gFv0oL/KCAtG9u8NoqFUpl/46Pk7VA/s4dz0kA026ShwW8oCXI00b joFWju8Kr1kmpf73T4oQoY3TVMIFnnr5w6h3D2cF+gU/j7R5e5LJUrYn5w9Hs52WbPh+ PSv9tE9duUAdM+/H3ivtuAHBS2LetJgKA1OLbl0miyKsonNOVYAW9XRaOq2A5PKWEz1/ n7RQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=1gcWyB5rwE3HzeFNVi2rwQsXDKBainJqtShZDGuPBro=; b=jeQSLK1oQNB7qTYT+ZtysnA3kLX+IR5HUazLLtZh1iF7OhCEj6KjJEnZ18kU/QaLQm t6uekd2K8FTlxIxoCRJCFZ14azsgISxAOY9cPXKiDiAB8u7VnJTOulpdVQRx7dkmmE74 q/sJgqX87OlpcWYXATlyP+2t6jWBw9VUK6BVpxz3wWbWytAdeiIoIByfMS7jftzTa3Z1 EEqH2rbbGOKFmU8ismWBGHDKaUzQbrhiL4uKEBB/N2AHu8I8id11asi4hxMW1MsKp2a3 6D0s/2DEnbkN8FBCvzozvCEUZ67fvXQo9mpJQpfGNa15GGtPiPs6RTIYkE6kSEQKg6Bq UC1Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=iNxUGgz1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id r18-20020a05640251d200b004472aa2affdsi663755edd.464.2022.08.23.02.53.19; Tue, 23 Aug 2022 02:53:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=iNxUGgz1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1351478AbiHWJhn (ORCPT + 99 others); Tue, 23 Aug 2022 05:37:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52398 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1351825AbiHWJgB (ORCPT ); Tue, 23 Aug 2022 05:36:01 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 159F177EBD; Tue, 23 Aug 2022 01:40:11 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id DF4C2B81C5C; Tue, 23 Aug 2022 08:34:11 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 17ECCC433D6; Tue, 23 Aug 2022 08:34:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1661243650; bh=E3o1NL9M7MMxOvh5P8/+p5pkrfDEPsN6Qcy7OMxtyBg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iNxUGgz1wYS+xdqLHgod7z0VcqYGjCZdtQ7pR/+osCssFe921qG1Sn0mkUYDGz6ZC kBzQHABFPWsVyxutdnNNNIP99uTuJcth2hEJEli1g/0n9oDWqopPOYHxq7RYYVR/bd bH2tJA2haZsdZkbYvminTtYMaL4kwRf9VemOoQ/g= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Laurent Dufour , Nicholas Piggin , Michael Ellerman , Sasha Levin Subject: [PATCH 5.19 345/365] powerpc/pseries/mobility: set NMI watchdog factor during an LPM Date: Tue, 23 Aug 2022 10:04:06 +0200 Message-Id: <20220823080132.671776851@linuxfoundation.org> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20220823080118.128342613@linuxfoundation.org> References: <20220823080118.128342613@linuxfoundation.org> User-Agent: quilt/0.67 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Laurent Dufour [ Upstream commit 118b1366930c8c833b8b36abef657f40d4e26610 ] During an LPM, while the memory transfer is in progress on the arrival side, some latencies are generated when accessing not yet transferred pages on the arrival side. Thus, the NMI watchdog may be triggered too frequently, which increases the risk to hit an NMI interrupt in a bad place in the kernel, leading to a kernel panic. Disabling the Hard Lockup Watchdog until the memory transfer could be a too strong work around, some users would want this timeout to be eventually triggered if the system is hanging even during an LPM. Introduce a new sysctl variable nmi_watchdog_factor. It allows to apply a factor to the NMI watchdog timeout during an LPM. Just before the CPUs are stopped for the switchover sequence, the NMI watchdog timer is set to watchdog_thresh + factor% A value of 0 has no effect. The default value is 200, meaning that the NMI watchdog is set to 30s during LPM (based on a 10s watchdog_thresh value). Once the memory transfer is achieved, the factor is reset to 0. Setting this value to a high number is like disabling the NMI watchdog during an LPM. Signed-off-by: Laurent Dufour Reviewed-by: Nicholas Piggin Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220713154729.80789-5-ldufour@linux.ibm.com Signed-off-by: Sasha Levin --- Documentation/admin-guide/sysctl/kernel.rst | 12 ++++++ arch/powerpc/platforms/pseries/mobility.c | 43 +++++++++++++++++++++ 2 files changed, 55 insertions(+) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index ddccd1077462..9b7fa1baf225 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -592,6 +592,18 @@ to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst). +nmi_wd_lpm_factor (PPC only) +============================ + +Factor to apply to the NMI watchdog timeout (only when ``nmi_watchdog`` is +set to 1). This factor represents the percentage added to +``watchdog_thresh`` when calculating the NMI watchdog timeout during an +LPM. The soft lockup timeout is not impacted. + +A value of 0 means no change. The default value is 200 meaning the NMI +watchdog is set to 30s (based on ``watchdog_thresh`` equal to 10). + + numa_balancing ============== diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c index 78f3f74c7056..cbe0989239bf 100644 --- a/arch/powerpc/platforms/pseries/mobility.c +++ b/arch/powerpc/platforms/pseries/mobility.c @@ -48,6 +48,39 @@ struct update_props_workarea { #define MIGRATION_SCOPE (1) #define PRRN_SCOPE -2 +#ifdef CONFIG_PPC_WATCHDOG +static unsigned int nmi_wd_lpm_factor = 200; + +#ifdef CONFIG_SYSCTL +static struct ctl_table nmi_wd_lpm_factor_ctl_table[] = { + { + .procname = "nmi_wd_lpm_factor", + .data = &nmi_wd_lpm_factor, + .maxlen = sizeof(int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + }, + {} +}; +static struct ctl_table nmi_wd_lpm_factor_sysctl_root[] = { + { + .procname = "kernel", + .mode = 0555, + .child = nmi_wd_lpm_factor_ctl_table, + }, + {} +}; + +static int __init register_nmi_wd_lpm_factor_sysctl(void) +{ + register_sysctl_table(nmi_wd_lpm_factor_sysctl_root); + + return 0; +} +device_initcall(register_nmi_wd_lpm_factor_sysctl); +#endif /* CONFIG_SYSCTL */ +#endif /* CONFIG_PPC_WATCHDOG */ + static int mobility_rtas_call(int token, char *buf, s32 scope) { int rc; @@ -665,19 +698,29 @@ static int pseries_suspend(u64 handle) static int pseries_migrate_partition(u64 handle) { int ret; + unsigned int factor = 0; +#ifdef CONFIG_PPC_WATCHDOG + factor = nmi_wd_lpm_factor; +#endif ret = wait_for_vasi_session_suspending(handle); if (ret) return ret; vas_migration_handler(VAS_SUSPEND); + if (factor) + watchdog_nmi_set_timeout_pct(factor); + ret = pseries_suspend(handle); if (ret == 0) post_mobility_fixup(); else pseries_cancel_migration(handle, ret); + if (factor) + watchdog_nmi_set_timeout_pct(0); + vas_migration_handler(VAS_RESUME); return ret; -- 2.35.1