Received: by 2002:ac0:a5a7:0:0:0:0:0 with SMTP id m36-v6csp1424721imm;
        Thu, 19 Jul 2018 01:09:06 -0700 (PDT)
X-Google-Smtp-Source: AAOMgpfx88mGvLhM+eY9zIO8Hvm/t3E2xO6kpYyhN2bTw8VWFE21lTCLoGBTfD4Lw0NJEWAHp1jr
X-Received: by 2002:a63:de10:: with SMTP id f16-v6mr8841382pgg.97.1531987746259;
        Thu, 19 Jul 2018 01:09:06 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1531987746; cv=none;
        d=google.com; s=arc-20160816;
        b=t3VG4RbCOVX+I+5W+K1yoLZUjW4W+mmvj8aa2GP2Q+B3AtmQrqFRk2/1aXcKaVVUcW
         ezxKdxwLvtAbnXwFxkQlnERFu7Yt3je5hEpo3nTr4B9wtzbnV6hct185FhJn6gUcZjuj
         9YJwr7lKHDzXIqL9/njskOqDwFC6G26egkhL8mOQI+BHPlhGS497TlTlPF7vmRN0Xoj3
         62AUh2QzTiysbcKwmCjCprzYV0KQZhW1QEwfs884qmn1r93ICROjmSoqlwj0W7rRU9Vr
         5YOFbzfAaU6qHlII3huZ4DlhVSrJ4oxAzuFsPBCcFe0gSWbZU9mz+zYRcefmfDEppmmb
         dG7w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816;
        h=list-id:precedence:sender:user-agent:in-reply-to
         :content-disposition:mime-version:references:message-id:subject:cc
         :to:from:date:arc-authentication-results;
        bh=Gq5bcaKOUSU68bcSWbf0oAg1YZZKRAlYyNj/+qfHbjQ=;
        b=fQKA63k8+DabJsNHRaAMNxsE1pJ/YRXvQrSh8cZP6oSLnOueJgcV1y7Rvume5RvaLr
         AFCSqXcML+UBUMAnNWWOWhB/L+lO44wtKws9x8tsAbdJoaAW9BMlKX0ABc3yRwBO0E9s
         /NRwUJ7gpmpEJ+XB1KexmymLV8Y3OKWO8u2YYKtCCz7mR3NvRPpW8yV+7Zp0H38t+n8l
         01NVppJ37Ha3l27MjbZ4TfUryrVXUZ89Y9bdaoyLqEZUT29k8r502OSAdAWEaQlBz+kG
         l/CsvcXSvyjotP5+TvXzAY2dkLMn20/UUEMv8PDow+8JTQxRC8x9XL6FEZhgBgz3Cyp1
         W+pw==
ARC-Authentication-Results: i=1; mx.google.com;
        spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
        dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org
Return-Path:
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
        by mx.google.com with ESMTP id y7-v6si4875550pgp.551.2018.07.19.01.08.51;
        Thu, 19 Jul 2018 01:09:06 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67;
Authentication-Results: mx.google.com;
        spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org;
        dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1731422AbeGSIuE (ORCPT + 99 others);
        Thu, 19 Jul 2018 04:50:04 -0400
Received: from mx2.suse.de ([195.135.220.15]:56794 "EHLO mx1.suse.de"
        rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
        id S1730565AbeGSIuE (ORCPT );
        Thu, 19 Jul 2018 04:50:04 -0400
X-Virus-Scanned: by amavisd-new at test-mx.suse.de
Received: from relay1.suse.de (unknown [195.135.220.254])
        by mx1.suse.de (Postfix) with ESMTP id ADD0BADB9;
        Thu, 19 Jul 2018 08:08:06 +0000 (UTC)
Date: Thu, 19 Jul 2018 10:08:05 +0200
From: Michal Hocko
To: Mahesh Jagannath Salgaonkar
Cc: linuxppc-dev, Linux Kernel, Hari Bathini,
        Ananth N Mavinakayanahalli, Srikar Dronamraju,
        "Aneesh Kumar K.V", Anshuman Khandual, Andrew Morton,
        Joonsoo Kim, Ananth Narayan, kernelfans@gmail.com
Subject: Re: [RFC PATCH v6 0/4] powerpc/fadump: Improvements and fixes for firmware-assisted dump.
Message-ID: <20180719080805.GM7193@dhcp22.suse.cz>
References: <153172096333.29252.4376707071382727345.stgit@jupiter.in.ibm.com>
        <20180716082646.GF17280@dhcp22.suse.cz>
        <20180717115232.GF7193@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.0 (2018-05-17)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed 18-07-18 21:52:17, Mahesh Jagannath Salgaonkar wrote:
> On 07/17/2018 05:22 PM, Michal Hocko wrote:
> > On Tue 17-07-18 16:58:10, Mahesh Jagannath Salgaonkar wrote:
> >> On 07/16/2018 01:56 PM, Michal Hocko wrote:
> >>> On Mon 16-07-18 11:32:56, Mahesh J Salgaonkar wrote:
> >>>> One of the primary issues with Firmware Assisted Dump (fadump) on Power
> >>>> is that it needs a large amount of memory to be reserved. This reserved
> >>>> memory is used for saving the contents of the old crashed kernel's memory
> >>>> before the fadump capture kernel uses the old kernel's memory area to boot.
> >>>> However, this reserved memory area stays unused until a system crash and
> >>>> isn't available for the production kernel to use.
> >>>
> >>> How much memory are we talking about? Regular kernel dump process needs
> >>> some reserved memory as well. Why is that not a big problem?
> >>
> >> We reserve around 5% of total system RAM. On large systems with
> >> terabytes of memory, this reservation can be quite significant.
> >>
> >> The regular kernel dump uses the kexec method to boot into the capture
> >> kernel and it can control the parameters that are being passed to the
> >> capture kernel. This makes it possible to strip down the parameters,
> >> which can help lower the memory requirement for the capture kernel to
> >> boot. This allows regular kdump to reserve less memory to start with.
> >>
> >> Whereas fadump depends on power firmware (pHyp) to load the capture
> >> kernel after full reset and boots like a regular kernel.
> >> It needs the same amount of memory to boot as the production kernel.
> >> On large systems the production kernel needs a significant amount of
> >> memory to boot. Hence fadump needs to reserve enough memory for the
> >> capture kernel to boot successfully and execute dump capturing
> >> operations. By default fadump reserves 5% of total system RAM and in
> >> most cases this has worked flawlessly on a variety of system
> >> configurations. Optionally, 'crashkernel=X' can also be used to specify
> >> a more fine-tuned memory size for reservation.
> >
> > So why do we even care about fadump when regular kexec provides
> > (presumably) same functionality with a smaller memory footprint? Or is
> > there any reason why kexec doesn't work well on ppc?
>
> Kexec-based kdump is loaded by the crashing kernel. When the OS crashes,
> the system is in an inconsistent state, especially the devices. In some
> cases, a rogue DMA or ill-behaving device drivers can cause the kdump
> capture to fail.
>
> On the power platform, fadump solves these issues by taking help from
> power firmware to fully reset the system and load a fresh copy of the
> same kernel to capture the dump with PCI and I/O devices reinitialized,
> making it more reliable.

Thanks for the clarification.

> Fadump does a full system reset, booting the system through the regular
> boot options, i.e. the dump capture kernel is booted in the same fashion
> and doesn't have a specialized kernel command line option. This implies
> we need to give more memory for the system boot. Since the new kernel
> boots from the same memory location as the crashed kernel, we reserve 5%
> of memory where power firmware moves the crashed kernel's memory content.
> This reserved memory is completely removed from the available memory. For
> large memory systems like 64TB systems, this amounts to ~3TB, which is
> a significant chunk of memory the production kernel is deprived of.
> Hence, this patch adds an improvement to the existing fadump feature to
> make the reserved memory available to the system for use, using zone
> movable.

Is the 5% a reasonable estimate or more of a ballpark number? I find it a
bit strange to require 3TB of memory to boot a kernel just to dump the
crashed kernel image. Shouldn't you rather look into this estimate than
spread ZONE_MOVABLE abuse? Larger systems need more memory to dump even
with the regular kexec kdump, but I have never seen any use more than 1G
or something like that.
-- 
Michal Hocko
SUSE Labs
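
For a sense of scale, the figures in this thread can be worked through directly. The following is a minimal sketch, assuming the reservation is a flat 5% of total RAM as stated above; the function name is hypothetical, and any alignment, rounding, or upper/lower bounds the real fadump code may apply are deliberately ignored.

```python
TIB = 1 << 40  # bytes in one tebibyte
GIB = 1 << 30  # bytes in one gibibyte


def default_reservation_bytes(total_ram_bytes, percent=5):
    """Hypothetical helper: flat percentage of RAM, per the 5% default
    quoted in the thread. Not the kernel's actual sizing logic."""
    return total_ram_bytes * percent // 100


total = 64 * TIB                       # the 64TB system from the thread
reserved = default_reservation_bytes(total)

# Size of the reservation, and how it compares with the ~1G that
# regular kexec-based kdump is said to need at most.
print(f"reserved: {reserved / TIB:.1f} TiB")
print(f"ratio to 1 GiB: {reserved / GIB:.0f}x")
```

Under these assumptions the 64TB case reserves a little over 3 TiB, which is three orders of magnitude above the roughly 1G figure mentioned for kexec kdump.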