Date: Thu, 30 Nov 2023 20:31:44 +0800
From: Baoquan He
To: Michal Hocko
Cc: Donald Dutile, Jiri Bohac, Pingfan Liu, Tao Liu, Vivek Goyal, Dave Young, kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 0/4] kdump: crashkernel reservation from CMA
References: <91a31ce5-63d1-7470-18f7-92b039fda8e6@redhat.com>

Hi Michal,

On 11/30/23 at 08:04pm, Baoquan He wrote:
> On 11/30/23 at 11:16am, Michal Hocko wrote:
> > On Thu 30-11-23 11:00:48, Baoquan He wrote:
> > [...]
> > > Now, we are worried if there's risk if the CMA area is retaken into kdump
> > > kernel as system RAM. E.g is it possible that 1st kernel's ongoing RDMA
> > > or DMA will interfere with kdump kernel's normal memory accessing?
> > > Because kdump kernel usually only reset and initialize the needed
> > > device, e.g dump target. Those unneeded devices will be unshutdown and
> > > let go.

Having re-read your mail, I see we are saying the same thing. Please
ignore the words at the bottom of my last mail.

> >
> > I do not really want to discount your concerns but I am bit confused why
> > this matters so much. First of all, if there is a buggy RDMA driver

This is not about a buggy DMA or RDMA driver. It is determined by the
kdump mechanism itself. When we do a kexec reboot, we shut down the
CPUs, the interrupt controllers and all devices. When we do kdump, we
only shut down the CPUs and the interrupt controllers.

> > which doesn't use the proper pinning API (which would migrate away from
> > the CMA) then what is the worst case? We will get crash kernel corrupted
> > potentially and fail to take a proper kernel crash, right? Is this
> > worrisome? Yes. Is it a real roadblock? I do not think so. The problem

If we may fail to capture a proper kernel crash dump, why isn't that a
roadblock? We already have a stable way that costs a little more
memory, so why would we take the risk of another way just to save
memory? Usually only high-end servers need a large crashkernel
reservation, and those servers usually have a huge amount of system
RAM, so the reservation is a very small percentage of it.

> > seems theoretical to me and it is not CMA usage at fault here IMHO. It
> > is the said theoretical driver that needs fixing anyway.

What we want to make clear is whether this is only a theoretical
possibility or something likely to happen. In the past two years we
have met several cases of in-flight DMA stomping on the kexec kernel's
initrd because a device driver did not provide a proper shutdown()
method.
For kdump, once it happens, the pain is that we don't know how to debug
it. For kexec reboot, customers allow us to log in to their systems to
reproduce and track down the stomping. For kdump, system corruption
rarely happens, and the stomping would rarely happen too, which makes
it hard to reproduce.

The code change looks simple and the benefit is very attractive. I
would surely like it if people can finally confirm there is no risk. As
I said, we can't afford to take the risk if it can possibly happen. But
I don't object if other people would rather take the risk; we can let
it land in the kernel. This is just my personal opinion. Thanks for
sharing your thoughts.

> >
> > Now, it is really fair to mention that CMA backed crash kernel memory
> > has some limitations
> > 	- CMA reservation can only be used by the userspace in the
> > 	  primary kernel. If the size is overshot this might have
> > 	  negative impact on kernel allocations
> > 	- userspace memory dumping in the crash kernel is fundamentally
> > 	  incomplete.
>
> I am not sure if we are talking about the same thing. My concern is:
> ====================================================================
> 1) system corruption happened, crash dumping is prepared, cpu and
>    interrupt controllers are shutdown;
> 2) all pci devices are kept alive;
> 3) kdump kernel boot up, initialization is only done on those devices
>    which drivers are added into kdump kernel's initrd;
> 4) those on-flight DMA engine could be still working if their kernel
>    module is not loaded;
>
> In this case, if the DMA's destination is located in crashkernel=,cma
> region, the DMA writing could continue even when kdump kernel has put
> important kernel data into the area. Is this possible or absolutely not
> possible with DMA, RDMA, or any other stuff which could keep accessing
> that area?
>
> The existing crashkernel= syntax can guarantee the reserved crashkernel
> area for kdump kernel is safe.
> =======================================================================
>
> The 1st kernel's data in the ,cma area is ignored once crashkernel=,cma
> is taken.
>
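
For illustration only, the four-step concern quoted above can be
modeled as a toy simulation. This is not kernel code and says nothing
about whether real hardware can behave this way (that is exactly the
open question). Every name here (ToyDevice, simulate_kdump, the
offsets and sizes) is hypothetical and exists only for this sketch. A
bytearray stands in for the crashkernel=,cma region and a "device"
keeps writing to its DMA destination until something resets it.

```python
# Toy model of the quoted concern. All names are hypothetical; nothing
# here corresponds to a real kernel interface.

class ToyDevice:
    """A device with an in-flight DMA that targets a fixed offset."""

    def __init__(self, dma_offset, length):
        self.dma_offset = dma_offset
        self.length = length
        self.active = True  # DMA was started by the 1st kernel

    def reset(self):
        # What a driver loaded in the kdump kernel would do on probe.
        self.active = False

    def dma_tick(self, memory):
        # An unreset device keeps writing to its old destination.
        if self.active:
            end = self.dma_offset + self.length
            memory[self.dma_offset:end] = b"\xff" * self.length


def simulate_kdump(driver_in_initrd):
    """Return True if the kdump kernel's data survives, False if stomped."""
    region = bytearray(64)  # stands in for the crashkernel=,cma region

    # Steps 1 and 2: crash happens, CPUs and interrupt controllers are
    # shut down, but the device is left alive with DMA into the region.
    dev = ToyDevice(dma_offset=16, length=8)

    # Step 3: the kdump kernel boots and places its own data in the region.
    kdump_data = b"KDUMPDAT"
    region[16:24] = kdump_data
    if driver_in_initrd:
        dev.reset()  # only devices with a driver in the initrd get reset

    # Step 4: any device that was not reset may still be writing.
    dev.dma_tick(region)

    return bytes(region[16:24]) == kdump_data
```

In this model simulate_kdump(True) keeps the data intact, while
simulate_kdump(False) shows the stomping case. With the existing
crashkernel= reservation the 1st kernel never hands the region to a
device as a DMA destination, which is the case the toy deliberately
does not cover.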