Date: Tue, 30 Aug 2022 11:24:37 +0800
From: Baoquan He
To: Mike Rapoport
Cc: linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, kexec@lists.infradead.org, Catalin Marinas, Guanghui Feng, Mark Rutland, Mike Rapoport, Will Deacon, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Mike Rapoport
Subject: Re: [PATCH 0/5] arm64/mm: remap crash kernel with base pages even if rodata_full disabled
References: <20220819041156.873873-1-rppt@kernel.org>

On 08/29/22 at 05:31pm, Mike Rapoport wrote:
> On Sun, Aug 28, 2022 at 04:37:29PM +0800, Baoquan He wrote:
> > On 08/25/22 at 10:48am, Mike Rapoport wrote:
> > ......
> > > > > There were several rounds of discussion how to remap with base pages only
> > > > > the crash kernel area, the latest one here:
> > > > >
> > > > > https://lore.kernel.org/all/1656777473-73887-1-git-send-email-guanghuifeng@linux.alibaba.com
> > > > >
> > > > > and this is my attempt to allow having both large pages in the linear map
> > > > > and protection for the crash kernel memory.
> > > > >
> > > > > For server systems it is important to protect crash kernel memory for
> > > > > post-mortem analysis, and for that protection to work the crash kernel
> > > > > memory should be mapped with base pages in the linear map.
> > > > >
> > > > > On the systems with ZONE_DMA/DMA32 enabled, crash kernel reservation
> > > > > happens after the linear map is created and the current code forces using
> > > > > base pages for the entire linear map, which results in performance
> > > > > degradation.
> > > > >
> > > > > These patches enable remapping of the crash kernel area with base pages
> > > > > while keeping large pages in the rest of the linear map.
> > > > >
> > > > > The idea is to align crash kernel reservation to PUD boundaries, remap that
> > > > > PUD and then free the extra memory.
> > > >
> > > > Hi Mike,
> > > >
> > > > Thanks for the effort to work on this issue. While I have to say this
> > > > isn't good because it can only be made relying on a prerequisite that
> > > > there's big enough memory. If on a system, say 2G memory, it's not easy
> > > > to succeed on getting one 1G memory. While we only require far smaller
> > > > region than 1G, e.g. about 200M which should be easy to get. So the way
> > > > taken in this patchset is too quirky and will cause regression on
> > > > systems with small memory. This kind of systems with small memory exists
> > > > widely on virt guest instance.
> > >
> > > I don't agree there is a regression.
> > > If the PUD-aligned allocation fails,
> > > there is a fallback to the allocation of the exact size requested for crash
> > > kernel. This allocation just won't get protected.
> >
> > Sorry, I misunderstood it. I just went through the log and didn't
> > look into codes.
> >
> > But honestly, if we accept the fallback which doesn't do the protection,
> > we should be able to take off the protection completely, right?
> > Otherwise, the reservation code is a little complicated.
>
> We don't do protection of the crash kernel for most architectures
> supporting kexec ;-)

Yeah. The protection was first introduced on x86 by a former colleague
of mine at Red Hat as an enhancement, and people later ported it to
arm64. We have a signature verification mechanism to check whether the
loaded kdump kernel has been corrupted. In fact, a panic is a
low-probability event, and accidental corruption of kdump kernel data is
even less likely. The protection is icing on the cake. But if it brings
a mess and there is no way to clean that mess up, it is better to take
the protection away.

>
> My goal was to allow large systems with ZONE_DMA/DMA32 have block mappings
> in the linear map and crash kernel protection without breaking backward
> compatibility for the existing systems.
>
> > > Also please note, that the changes are only for the case when user didn't
> > > force base-size pages in the linear map, so anything that works now will
> > > work the same way with this set applied.
> > >
> > > > The crashkernel reservation happens after linear map because the
> > > > reservation needs to know the dma zone boundary, arm64_dma_phys_limit.
> > > > If we can deduce that before bootmem_init(), the reservation can be
> > > > done before linear map. I will make an attempt on that. If still can't
> > > > be accepted, we would like to take off the crashkernel region protection
> > > > on arm64 for now.
> > >
> > > I doubt it would be easy because arm64_dma_phys_limit is determined after
> > > parsing of the device tree and there might be memory allocations of
> > > possibly unmapped memory during the parsing.
> >
> > I have sent out the patches with an attempt, it's pretty straightforward
> > and simple. Because arm64 only has one exception, namely Raspberry Pi 4,
> > on which some peripherals can only address 30bit range. That is a corner
> > case, to be honest. And kdump is a necessary feature on server, but may
> > not be so expected on Raspberry Pi 4, a system for computer education
> > and hobbyists. And kdump only cares whether the dump target devices can
> > address 32bit range, namely storage device or network card on server.
> > If finally confirmed that storage devices can only address 30bit range
> > on Raspberry Pi 4, people still can have crashkernel=xM@yM method to
> > reserve crashkernel regions.
>
> I hope you are right and Raspberry Pi 4 is the only system that limits
> DMA'able range to 30 bits. But with diversity of arm64 chips and boards I
> won't be surprised that there are other variants with a similar problem.

We still need people to confirm whether the storage disk or NIC on RPi4
is able to address the 32-bit range. In his patch log and cover letter,
Nicolas said that not all devices on RPi4 are limited to 30-bit
addressing.

It is possible that a new arm64 chip comes out with devices limited to
30-bit addressing, even though arm64 servers are usually deployed with
devices whose DMA addressing ability is wider than 32 bits. And I don't
think users of such a chip will care about kdump. Kdump is relied on
more in enterprise-level systems.

On x86, we have ignored ISA devices in the kdump kernel from the
beginning. As you can see, the current kdump kernel has no available
physical pages in the DMA zone on x86. If people have an ISA device in
an x86_64 system and want to set it as the dump target, it doesn't work
at all. We don't support that corner case.
If we want to cover everything, we can only limp along, covered in
patches.
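
For a sense of scale, the sizes discussed in this thread can be worked through with a small sketch. It is illustrative arithmetic only, not the kernel implementation: it assumes arm64 with 4K pages, where one PUD entry maps 1 GiB, and the helper names (align_up, pud_aligned_reservation, dma_limit) are hypothetical and do not exist in the kernel.

```python
# Illustrative arithmetic only; this is not the kernel implementation.
MIB = 1 << 20
PUD_SIZE = 1 << 30  # assumption: arm64 with 4K pages, one PUD entry maps 1 GiB


def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def pud_aligned_reservation(crash_size):
    """Return (bytes reserved at PUD granularity, bytes freed again afterwards)."""
    reserved = align_up(crash_size, PUD_SIZE)
    return reserved, reserved - crash_size


def dma_limit(bits):
    """Exclusive upper bound of the range a device with `bits` address bits can reach."""
    return 1 << bits


# A 200M crash kernel needs a free, PUD-aligned 1 GiB region up front,
# of which 824M is handed back after the remap.
reserved, freed = pud_aligned_reservation(200 * MIB)
print(reserved // MIB, freed // MIB)               # 1024 824

# 30-bit devices reach the first 1 GiB, 32-bit devices the first 4 GiB.
print(dma_limit(30) // MIB, dma_limit(32) // MIB)  # 1024 4096
```

Under these assumptions, a guest with 2G of memory has to find a whole free 1 GiB region to get protection for a 200M crash kernel, which is the concern raised about small systems; when that fails, the fallback described in the thread reserves only the exact size, without protection.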