Date: Thu, 20 Jan 2022 16:14:33 +0100
From: Petr Mladek
To: "Guilherme G.
Piccoli"
Cc: kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	dyoung@redhat.com, linux-doc@vger.kernel.org, bhe@redhat.com,
	vgoyal@redhat.com, stern@rowland.harvard.edu,
	akpm@linux-foundation.org, andriy.shevchenko@linux.intel.com,
	corbet@lwn.net, halves@canonical.com, kernel@gpiccoli.net,
	Will Deacon, Kees Cook, Steven Rostedt, Hidehiro Kawai,
	Vitaly Kuznetsov, HATAYAMA Daisuke, Masami Hiramatsu, John Ogness,
	"Paul E. McKenney", Peter Zijlstra, Juergen Gross
Subject: Re: [PATCH V4] notifier/panic: Introduce panic_notifier_filter
References: <20220108153451.195121-1-gpiccoli@igalia.com>
In-Reply-To: <20220108153451.195121-1-gpiccoli@igalia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Adding some more people into Cc. Some modified the logic in the past.
Some are familiar with some interesting areas where the panic
notifiers are used.

On Sat 2022-01-08 12:34:51, Guilherme G. Piccoli wrote:
> The kernel notifier infrastructure allows function callbacks to be
> added in multiple lists, which are then called in the proper time,
> like in a reboot or panic event. The panic_notifier_list specifically
> contains the callbacks that are executed during a panic event. As any
> other notifier list, the panic one has no filtering and all functions
> previously registered are executed.
>
> The kdump infrastructure, on the other hand, enables users to set
> a crash kernel that is kexec'ed in a panic event, and vmcore/logs
> are collected in such crash kernel. When kdump is set, by default
> the panic notifiers are ignored - the kexec jumps to the crash kernel
> before the list is checked and callbacks executed.
>
> There are some cases though in which kdump users might want to
> allow panic notifier callbacks to execute _before_ the kexec to
> the crash kernel, for a variety of reasons - for example, users
> may think kexec is very prone to fail and want to give a chance
> to kmsg dumpers to run (and save logs using pstore),

Yes, this seems to be the original intention of the
"crash_kexec_post_notifiers" option, see the commit
f06e5153f4ae2e2f3b0300f ("kernel/panic.c: add
"crash_kexec_post_notifiers" option for kdump after panic_notifiers").

> some panic notifier is required to properly quiesce some hardware
> that must be used to the crash kernel.

Do you have any example, please? The above mentioned commit
says that "crash_kexec_post_notifiers" actually increases the risk
of kdump failure.

Note that kmsg_dump() is called after the notifiers only because
some are printing more information, see the commit
6723734cdff15211bb78a ("panic: call panic handlers before kmsg_dump").
They might still increase the chance that kmsg_dump() will never
be called.

> But there's a problem: currently it's an "all-or-nothing" situation,
> the kdump user choice is either to execute all panic notifiers or
> none of them. Given that panic notifiers may increase the risk of a
> kdump failure, this is a tough decision and may affect the debug of
> hard to reproduce bugs, if for some reason the user choice is to
> enable panic notifiers, but kdump then fails.
>
> So, this patch aims to ease this decision: we hereby introduce a filter
> for the panic notifier list, in which users may select specifically
> which callbacks they wish to run, allowing a safer kdump. The allowlist
> should be provided using the parameter "panic_notifier_filter=a,b,..."
> where a, b are valid callback names. Invalid symbols are discarded.

I am afraid that this is an almost unusable solution:

   + requires deep knowledge of what each notifier does
   + might need debugging what notifier causes problems
   + the list might need to be updated when new notifiers are added
   + function names are implementation detail and might change
   + requires kallsyms

It is only a workaround for a real problem. The problem is that
"panic_notifier_list" is used for many purposes that break
each other.

I checked some notifiers and found a few groups:

   + disable watchdogs:
      + hung_task_panic()
      + rcu_panic()

   + dump information:
      + kernel_offset_notifier()
      + trace_panic_handler()	(duplicate of panic_print=0x10)

   + inform hypervisor
      + xen_panic_event()
      + pvpanic_panic_notify()
      + hyperv_panic_event()

   + misc cleanup / flush / blinking
      + panic_event()	in ipmi_msghandler.c
      + panic_happened()	in heartbeat.c
      + led_trigger_panic_notifier()

IMHO, the right solution is to split the callbacks into 2 or more
notifier lists. Then we might rework panic() to do:

void panic(void)
{
	[...]

	/* stop watchdogs + extra info */
	atomic_notifier_call_chain(&panic_disable_watchdogs_notifier_list, 0, buf);
	atomic_notifier_call_chain(&panic_info_notifier_list, 0, buf);
	panic_print_sys_info();

	/* crash_kexec + kmsg_dump in configurable order */
	if (!_crash_kexec_post_kmsg_dump) {
		__crash_kexec(NULL);
		smp_send_stop();
	} else {
		crash_smp_send_stop();
	}

	kmsg_dump();
	if (_crash_kexec_post_kmsg_dump)
		__crash_kexec(NULL);

	/* infinite loop or reboot */
	atomic_notifier_call_chain(&panic_hypervisor_notifier_list, 0, buf);
	atomic_notifier_call_chain(&panic_rest_notifier_list, 0, buf);

	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

	if (panic_timeout >= 0) {
		timeout();
		emergency_restart();
	}

	for (i = 0; ; i += PANIC_TIMER_STEP) {
		if (i >= i_next) {
			i += panic_blink(state ^= 1);
			i_next = i + 3600 / PANIC_BLINK_SPD;
		}
		mdelay(PANIC_TIMER_STEP);
	}
}

Two notifier lists might be enough in the above scenario.
I would call them:

	panic_pre_dump_notifier_list
	panic_post_dump_notifier_list

It is a real solution that will help everyone. It is more complicated
now but it will make things much easier in the long term. And it might
be done step by step:

     1. introduce the two notifier lists
     2. convert all users: one by one
     3. remove the original notifier list when there is no user

Best Regards,
Petr
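
For illustration only, the two-list ordering suggested above can be
modeled in user space. This is a minimal sketch and not kernel code:
struct notifier_block, chain_register() and chain_call() are simplified
stand-ins for the kernel's atomic notifier API, and the three callbacks
and register_model_notifiers() are hypothetical names that only record
the order in which they run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
	struct notifier_block *next;
	int priority;		/* higher priority runs first */
};

/* List heads named as suggested in the mail (assumed, not merged code). */
static struct notifier_block *panic_pre_dump_notifier_list;
static struct notifier_block *panic_post_dump_notifier_list;

/* Log of executed steps so that the resulting order can be inspected. */
static char panic_trace[128];

static void trace(const char *what)
{
	strncat(panic_trace, what,
		sizeof(panic_trace) - strlen(panic_trace) - 1);
}

/* Insert nb into the chain, keeping it sorted by descending priority. */
static void chain_register(struct notifier_block **head,
			   struct notifier_block *nb)
{
	assert(nb && nb->notifier_call);
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Call every callback on the chain, in list order. */
static void chain_call(struct notifier_block *head, void *data)
{
	for (; head; head = head->next)
		head->notifier_call(head, 0, data);
}

/* Hypothetical callbacks standing in for real panic notifiers. */
static int stop_watchdog(struct notifier_block *nb, unsigned long a, void *d)
{
	(void)nb; (void)a; (void)d;
	trace("watchdog;");
	return 0;
}

static int dump_info(struct notifier_block *nb, unsigned long a, void *d)
{
	(void)nb; (void)a; (void)d;
	trace("info;");
	return 0;
}

static int tell_hypervisor(struct notifier_block *nb, unsigned long a, void *d)
{
	(void)nb; (void)a; (void)d;
	trace("hypervisor;");
	return 0;
}

static struct notifier_block watchdog_nb = {
	.notifier_call = stop_watchdog, .priority = 10,
};
static struct notifier_block info_nb = { .notifier_call = dump_info };
static struct notifier_block hv_nb = { .notifier_call = tell_hypervisor };

/* Each user picks the list that matches what its callback does. */
static void register_model_notifiers(void)
{
	chain_register(&panic_pre_dump_notifier_list, &info_nb);
	chain_register(&panic_pre_dump_notifier_list, &watchdog_nb);
	chain_register(&panic_post_dump_notifier_list, &hv_nb);
}

/* Model of the reworked flow: pre-dump list, dump step, post-dump list. */
static const char *model_panic(void)
{
	panic_trace[0] = '\0';
	chain_call(panic_pre_dump_notifier_list, NULL);
	trace("kmsg_dump;");	/* placeholder for crash_kexec/kmsg_dump */
	chain_call(panic_post_dump_notifier_list, NULL);
	return panic_trace;
}
```

After register_model_notifiers(), model_panic() returns
"watchdog;info;kmsg_dump;hypervisor;". The point of the split is that
the position of a callback relative to the dump step is decided by the
list it registers on, not by a user-supplied list of function names.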