Date: Fri, 1 Dec 2023 13:35:41 +0100
From: Jiri Bohac
To: Baoquan He
Cc: Michal Hocko, Pingfan Liu, Tao Liu, Vivek Goyal, Dave Young,
    kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

On Thu, Nov 30, 2023 at 12:01:36PM +0800, Baoquan He wrote:
> On 11/29/23 at 11:51am, Jiri Bohac wrote:
> > We get a lot of problems reported by partners testing kdump on
> > their setups prior to release. But even if we tune the reserved
> > size up, OOM is still the most common reason for kdump to fail
> > when the product starts getting used in real life. It's been
> > pretty frustrating for a long time.
>
> I remember SUSE engineers ever told you will boot kernel and do an
> estimation of kdump kernel usage, then set the crashkernel according to
> the estimation. OOM will be triggered even that way is taken? Just
> curious, not questioning the benefit of using ,cma to save memory.

Yes, we do that during the kdump package build. We use this to
find some baseline for memory requirements of the kdump kernel
and tools on that specific product. Using these numbers we
estimate the requirements on the system where kdump is
configured by adding extra memory for the size of RAM, number of
SCSI devices, etc.

But apparently we get this wrong in too many cases, because the
actual hardware differs too much from the virtual environment
which we used to get the baseline numbers. We've been adding
silly constants to the calculations and we still get OOMs on one
hand and people hesitant to sacrifice the calculated amount of
memory on the other.

The result is that kdump basically cannot be trusted unless the
user verifies that the sacrificed memory is still enough after
every major upgrade.

This is the main motivation behind the CMA idea: to safely give
kdump enough memory, including a safe margin, without sacrificing
too much memory.

> > I feel the exact opposite about VMs. Reserving hundreds of MB for
> > crash kernel on _every_ VM on a busy VM host wastes the most
> > memory.
> > VMs are often tuned to well defined task and can be set
> > up with very little memory, so the ~256 MB can be a huge part of
> > that. And while it's theoretically better to dump from the
> > hypervisor, users still often prefer kdump because the hypervisor
> > may not be under their control. Also, in a VM it should be much
> > easier to be sure the machine is safe WRT the potential DMA
> > corruption as it has less HW drivers. So I actually thought the
> > CMA reservation could be most useful on VMs.
>
> Hmm, we ever discussed this in upstream with David Hildend who works in
> virt team. VMs problem is much easier to solve if they complain the
> default crashkernel value is wasteful. The shrinking interface is for
> them. The crashkernel value can't be enlarged, but shrinking existing
> crashkernel memory is functioning smoothly well. They can adjust that in
> script in a very simple way.

The shrinking does not solve this problem at all. It solves a
different problem: the virtual hardware configuration can easily
vary between boots and so will the crashkernel size requirements.
And since crashkernel needs to be passed on the command line, once
the system is booted it's impossible to change it without a
reboot. Here the shrinking mechanism comes in handy: we reserve
enough for all configurations on the command line, and during boot
the requirements for the currently booted configuration can be
determined and the reservation shrunk to the determined value.
But determining this value is the same unsolved problem as above,
and CMA could help in exactly the same way.

> Anyway, let's discuss and figure out any risk of ,cma. If finally all
> worries and concerns are proved unnecessary, then let's have a new great
> feature. But we can't afford the risk if the ,cma area could be entangled
> with 1st kernel's on-going action. As we know, not like kexec reboot, we
> only shutdown CPUs, interrupt, most of devices are alive.
> And many of them could be not reset and initialized in kdump kernel if
> the relevant driver is not added in.

Well, since my patchset makes the use of ,cma completely optional
and has _absolutely_ _no_ _effect_ on users that don't opt to use
it, I think you're not taking any risk at all.

We will never know how much DMA is a problem in practice unless
we give users or distros a way to try it and come up with good
ways of determining whether it's safe on a specific system, based
on the hardware, drivers, etc.

I've successfully tested the patches on a few systems, physical
and virtual. Of course this is not proof that the DMA problem
does not exist, but it shows that it may be a solution that mostly
works. If nothing else, for systems where sacrificing ~400 MB of
memory is something that prevents the user from having any dump
at all, having a dump that mostly works with a sacrifice of ~100
MB may be useful.

Thanks,

-- 
Jiri Bohac
SUSE Labs, Prague, Czechia
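
For illustration only, here is a minimal Python sketch of the
estimate-then-shrink flow discussed in this message: a baseline plus
extra memory for RAM size and SCSI devices, followed by shrinking an
over-sized reservation at boot. The function names, the per-GiB and
per-device constants, and the margin are hypothetical and are not the
numbers any distribution actually uses. The only real interface
assumed is /sys/kernel/kexec_crash_size, which reports the reserved
crashkernel size in bytes and accepts a smaller value to shrink it
(root required).

```python
from pathlib import Path

MIB = 1024 * 1024
# Real sysfs file; reading returns the reserved size in bytes.
CRASH_SIZE_PATH = Path("/sys/kernel/kexec_crash_size")


def estimate_crash_bytes(baseline_mib, ram_gib, scsi_devices,
                         per_ram_gib_mib=1.0, per_scsi_mib=2.0,
                         margin=1.25):
    """Estimate kdump memory needs in bytes.

    All scaling constants are hypothetical placeholders for the
    "extra memory for the size of RAM, number of SCSI devices, etc."
    """
    need_mib = (baseline_mib
                + ram_gib * per_ram_gib_mib
                + scsi_devices * per_scsi_mib)
    return int(need_mib * margin) * MIB


def shrink_target(reserved_bytes, needed_bytes):
    """Return the size to shrink to, or None if shrinking cannot help.

    The reservation can only be made smaller, never larger.
    """
    if needed_bytes >= reserved_bytes:
        return None
    return needed_bytes


def apply_shrink(needed_bytes, path=CRASH_SIZE_PATH):
    """Shrink the reservation behind `path` if it is larger than needed."""
    reserved = int(path.read_text())
    target = shrink_target(reserved, needed_bytes)
    if target is not None:
        path.write_text(str(target))
    return target
```

A boot-time script could call apply_shrink(estimate_crash_bytes(...))
once the hardware is known. The sketch also shows where the difficulty
sits: the shrink step is mechanical, while the quality of the result
depends entirely on the constants fed into the estimate.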