From: Jia He
To: Dan Williams, Vishal Verma, Mike Rapoport, David Hildenbrand
Cc: Catalin Marinas, Will Deacon, Greg Kroah-Hartman, "Rafael J. 
    Wysocki", Dave Jiang, Andrew Morton, Steve Capper, Mark Rutland,
    Logan Gunthorpe, Anshuman Khandual, Hsin-Yi Wang, Jason Gunthorpe,
    Dave Hansen, Kees Cook, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org,
    linux-mm@kvack.org, Wei Yang, Pankaj Gupta, Ira Weiny, Kaly Xin, Jia He
Subject: [RFC PATCH 5/6] device-dax: relax the memblock size alignment for kmem_start
Date: Wed, 29 Jul 2020 11:34:23 +0800
Message-Id: <20200729033424.2629-6-justin.he@arm.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20200729033424.2629-1-justin.he@arm.com>
References: <20200729033424.2629-1-justin.he@arm.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Previously, kmem_start in dev_dax_kmem_probe() had to be aligned to the
memory block size, which on arm64 is 1G (SECTION_SIZE_BITS is 30). Even
Dan Williams' sub-section patch series does not help when adding the dax
pmem kmem to memblock:

    $ ndctl create-namespace -e namespace0.0 --mode=devdax --map=dev -s 2g -f -a 2M
    $ echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
    $ echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
    $ cat /proc/iomem
    ...
    23c000000-23fffffff : System RAM
      23dd40000-23fecffff : reserved
      23fed0000-23fffffff : reserved
    240000000-33fdfffff : Persistent Memory
      240000000-2403fffff : namespace0.0
      280000000-2bfffffff : dax0.0          <- boundaries are aligned to 1G
        280000000-2bfffffff : System RAM (kmem)
    $ lsmem
    RANGE                                  SIZE  STATE REMOVABLE BLOCK
    0x0000000040000000-0x000000023fffffff    8G online       yes   1-8
    0x0000000280000000-0x00000002bfffffff    1G online       yes    10

    Memory block size:         1G
    Total online memory:       9G
    Total offline memory:      0B
    ...

Hence there is a big gap between 0x2403fffff and 0x280000000 due to the
1G alignment on arm64. Worse, only 1G of memory is returned while 2G was
requested. On x86 the gap is relatively small because SECTION_SIZE_BITS
is 27 (128M).
Besides decreasing SECTION_SIZE_BITS on arm64, we can relax the alignment
when adding the kmem. After this patch:

    240000000-33fdfffff : Persistent Memory
      240000000-2421fffff : namespace0.0
      242400000-2bfffffff : dax0.0
        242400000-2bfffffff : System RAM (kmem)
    $ lsmem
    RANGE                                  SIZE  STATE REMOVABLE BLOCK
    0x0000000040000000-0x00000002bfffffff   10G online       yes  1-10

    Memory block size:         1G
    Total online memory:      10G
    Total offline memory:      0B

Note that blocks 9-10 are the newly hot-added ones.

This patch removes the strict alignment constraint of
memory_block_size_bytes(), but still keeps the constraint from
online_pages_range().

Signed-off-by: Jia He
---
 drivers/dax/kmem.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index d77786dc0d92..849d0706dfe0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -30,9 +30,20 @@ int dev_dax_kmem_probe(struct device *dev)
 	const char *new_res_name;
 	int numa_node;
 	int rc;
+	int order;
 
-	/* Hotplug starting at the beginning of the next block: */
-	kmem_start = ALIGN(res->start, memory_block_size_bytes());
+	/* kmem_start needn't be aligned with memory_block_size_bytes().
+	 * But given the constraint in online_pages_range(), adjust the
+	 * alignment of kmem_start and kmem_size
+	 */
+	kmem_size = resource_size(res);
+	order = min_t(int, MAX_ORDER - 1, get_order(kmem_size));
+	kmem_start = ALIGN(res->start, 1ul << (order + PAGE_SHIFT));
+	/* Adjust the size down to compensate for moving up kmem_start: */
+	kmem_size -= kmem_start - res->start;
+	/* Align the size down to cover only complete blocks: */
+	kmem_size &= ~((1ul << (order + PAGE_SHIFT)) - 1);
+	kmem_end = kmem_start + kmem_size;
 
 	/*
 	 * Ensure good NUMA information for the persistent memory.
@@ -48,13 +59,6 @@ int dev_dax_kmem_probe(struct device *dev)
 			numa_node, res);
 	}
 
-	kmem_size = resource_size(res);
-	/* Adjust the size down to compensate for moving up kmem_start: */
-	kmem_size -= kmem_start - res->start;
-	/* Align the size down to cover only complete blocks: */
-	kmem_size &= ~(memory_block_size_bytes() - 1);
-	kmem_end = kmem_start + kmem_size;
-
 	new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
 	if (!new_res_name)
 		return -ENOMEM;
-- 
2.17.1