From: "Huang, Ying"
To: Vishal Verma
Cc: "Rafael J. Wysocki", Len Brown, Andrew Morton, David Hildenbrand,
    Oscar Salvador, Dan Williams, Dave Jiang, Dave Hansen
Subject: Re: [PATCH 3/3] dax/kmem: Always enroll hotplugged memory for memmap_on_memory
Date: Fri, 16 Jun 2023 14:42:43 +0800
Message-ID: <87zg4zewm4.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <20230613-vv-kmem_memmap-v1-3-f6de9c6af2c6@intel.com>
References: <20230613-vv-kmem_memmap-v1-0-f6de9c6af2c6@intel.com>
    <20230613-vv-kmem_memmap-v1-3-f6de9c6af2c6@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Vishal Verma writes:

> With DAX memory regions originating from CXL memory expanders or
> NVDIMMs, the kmem driver may be hot-adding huge amounts of system memory
> on a system without enough 'regular' main memory to support the memmap
> for it. To avoid this, ensure that all kmem managed hotplugged memory is
> added with the MHP_MEMMAP_ON_MEMORY flag to place the memmap on the
> new memory region being hot added.
>
> To do this, call add_memory() in chunks of memory_block_size_bytes() as
> that is a requirement for memmap_on_memory. Additionally, Use the
> mhp_flag to force the memmap_on_memory checks regardless of the
> respective module parameter setting.
>
> Cc: "Rafael J. Wysocki"
> Cc: Len Brown
> Cc: Andrew Morton
> Cc: David Hildenbrand
> Cc: Oscar Salvador
> Cc: Dan Williams
> Cc: Dave Jiang
> Cc: Dave Hansen
> Cc: Huang Ying
> Signed-off-by: Vishal Verma
> ---
>  drivers/dax/kmem.c | 49 ++++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 36 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
> index 7b36db6f1cbd..0751346193ef 100644
> --- a/drivers/dax/kmem.c
> +++ b/drivers/dax/kmem.c
> @@ -12,6 +12,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include "dax-private.h"
>  #include "bus.h"
>
> @@ -105,6 +106,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  	data->mgid = rc;
>
>  	for (i = 0; i < dev_dax->nr_range; i++) {
> +		u64 cur_start, cur_len, remaining;
>  		struct resource *res;
>  		struct range range;
>
> @@ -137,21 +139,42 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
>  		res->flags = IORESOURCE_SYSTEM_RAM;
>
>  		/*
> -		 * Ensure that future kexec'd kernels will not treat
> -		 * this as RAM automatically.
> +		 * Add memory in chunks of memory_block_size_bytes() so that
> +		 * it is considered for MHP_MEMMAP_ON_MEMORY
> +		 * @range has already been aligned to memory_block_size_bytes(),
> +		 * so the following loop will always break it down cleanly.
>  		 */
> -		rc = add_memory_driver_managed(data->mgid, range.start,
> -				range_len(&range), kmem_name, MHP_NID_IS_MGID);
> +		cur_start = range.start;
> +		cur_len = memory_block_size_bytes();
> +		remaining = range_len(&range);
> +		while (remaining) {
> +			mhp_t mhp_flags = MHP_NID_IS_MGID;
>
> -		if (rc) {
> -			dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
> -					i, range.start, range.end);
> -			remove_resource(res);
> -			kfree(res);
> -			data->res[i] = NULL;
> -			if (mapped)
> -				continue;
> -			goto err_request_mem;
> +			if (mhp_supports_memmap_on_memory(cur_len,
> +							  MHP_MEMMAP_ON_MEMORY))
> +				mhp_flags |= MHP_MEMMAP_ON_MEMORY;
> +			/*
> +			 * Ensure that future kexec'd kernels will not treat
> +			 * this as RAM automatically.
> +			 */
> +			rc = add_memory_driver_managed(data->mgid, cur_start,
> +						       cur_len, kmem_name,
> +						       mhp_flags);
> +
> +			if (rc) {
> +				dev_warn(dev,
> +					 "mapping%d: %#llx-%#llx memory add failed\n",
> +					 i, cur_start, cur_start + cur_len - 1);
> +				remove_resource(res);
> +				kfree(res);
> +				data->res[i] = NULL;
> +				if (mapped)
> +					continue;
> +				goto err_request_mem;
> +			}
> +
> +			cur_start += cur_len;
> +			remaining -= cur_len;
>  		}
>  		mapped++;
>  	}

It appears that we need to hot-remove memory at the granularity of
memory_block_size_bytes() too, according to try_remove_memory().  If
so, wouldn't it be better to allocate one dax_kmem_data.res[] element
for each memory block instead of one for each dax region?

Best Regards,
Huang, Ying
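
To illustrate the per-block bookkeeping suggested above, here is a
minimal user-space sketch in plain C.  It is not kernel code and is not
taken from the patch: struct block_res, split_range(), remove_block()
and the 128 MiB BLOCK_SIZE are all hypothetical names and values chosen
for illustration, with BLOCK_SIZE standing in for whatever
memory_block_size_bytes() returns on a given system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: example block size only; the real value is system dependent. */
#define BLOCK_SIZE (UINT64_C(128) << 20)

/* Hypothetical per-block record, one slot per memory block. */
struct block_res {
	uint64_t start;
	uint64_t len;
	int added;	/* 1 while the block is considered hot-added */
};

/*
 * Split [start, start + len) into block_size chunks, filling one slot
 * per chunk.  Returns the number of slots filled, or -1 if the range is
 * not block aligned or does not fit in out[].
 */
static long split_range(uint64_t start, uint64_t len, uint64_t block_size,
			struct block_res *out, size_t max)
{
	size_t n = 0;

	if (!block_size || start % block_size || len % block_size)
		return -1;
	if (len / block_size > max)
		return -1;

	for (uint64_t cur = start, remaining = len; remaining;
	     cur += block_size, remaining -= block_size) {
		out[n].start = cur;
		out[n].len = block_size;
		out[n].added = 1;
		n++;
	}
	return (long)n;
}

/*
 * Mark a single block as removed.  Returns 0 on success, or -1 if idx
 * is out of range or the block is not currently added.
 */
static int remove_block(struct block_res *res, size_t n, size_t idx)
{
	if (idx >= n || !res[idx].added)
		return -1;
	res[idx].added = 0;
	return 0;
}
```

With one slot per block, a failure partway through the add loop or a
later removal can be handled block by block, without assuming that a
whole range was added or removed as a unit.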