Subject: [PATCH 08/12] device-dax: Add resize support
From: Dan Williams
To: linux-mm@kvack.org
Cc: vishal.l.verma@intel.com, dave.hansen@linux.intel.com, hch@lst.de,
    linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org, jmoyer@redhat.com
Date: Mon, 23 Mar 2020 16:55:18 -0700
Message-ID: <158500771845.2088294.637621783660044227.stgit@dwillia2-desk3.amr.corp.intel.com>
In-Reply-To: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com>
References: <158500767138.2088294.17131646259803932461.stgit@dwillia2-desk3.amr.corp.intel.com>
User-Agent: StGit/0.18-3-g996c

Make the device-dax 'size' attribute writable to allow capacity to be
split between multiple instances in a region. The intended consumers of
this capability are users that want to split a scarce memory resource
between device-dax and System-RAM access, or users that want to have
multiple security domains for a large region.

By default the hmem instance provider allocates an entire region to the
first instance. The process of creating a new instance (assuming a
region-id of 0) is to find the region and write to its 'create'
attribute, which yields an empty instance to configure.
For example:

  cd /sys/bus/dax/devices
  echo dax0.0 > dax0.0/driver/unbind
  echo $new_size > dax0.0/size
  echo 1 > $(readlink -f dax0.0)../dax_region/create
  seed=$(cat $(readlink -f dax0.0)../dax_region/seed)
  echo $new_size > $seed/size
  echo dax0.0 > ../drivers/{device_dax,kmem}/bind
  echo dax0.1 > ../drivers/{device_dax,kmem}/bind

Instances can be destroyed by:

  echo $device > $(readlink -f $device)../dax_region/delete

Signed-off-by: Dan Williams
---
 drivers/dax/bus.c |  186 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 176 insertions(+), 10 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 8db771feed3d..6eb77127bb7d 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "dax-private.h"
 #include "bus.h"
@@ -541,7 +542,8 @@ struct dax_region *alloc_dax_region(struct device *parent, int region_id,
 }
 EXPORT_SYMBOL_GPL(alloc_dax_region);
 
-static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
+static int __alloc_dev_dax_range(struct dev_dax *dev_dax, u64 start,
+		resource_size_t size)
 {
 	struct dax_region *dax_region = dev_dax->region;
 	struct resource *res = &dax_region->res;
@@ -550,8 +552,34 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
 
 	device_lock_assert(dax_region->dev);
 
+	if (dev_WARN_ONCE(&dev_dax->dev, !size, "non-zero size required\n"))
+		return -EINVAL;
+
+	/* allow default @start when the resource tree is empty */
+	if (start == U64_MAX && !res->child)
+		start = res->start;
+	if (start == U64_MAX)
+		return -EINVAL;
+
+	alloc = __request_region(res, start, size, dev_name(dev), 0);
+	if (!alloc)
+		return -ENOMEM;
+
+	dev_dax->range = (struct range) {
+		.start = alloc->start,
+		.end = alloc->end,
+	};
+
+	return 0;
+}
+
+static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
+{
 	/* handle the seed alloc special case */
 	if (!size) {
+		struct dax_region *dax_region = dev_dax->region;
+		struct resource *res = &dax_region->res;
+
 		dev_dax->range = (struct range) {
 			.start = res->start,
 			.end = res->start - 1,
@@ -559,18 +587,29 @@ static int alloc_dev_dax_range(struct dev_dax *dev_dax, resource_size_t size)
 		return 0;
 	}
 
-	/* TODO: handle multiple allocations per region */
-	if (res->child)
-		return -ENOMEM;
+	return __alloc_dev_dax_range(dev_dax, U64_MAX, size);
+}
 
-	alloc = __request_region(res, res->start, size, dev_name(dev), 0);
+static int __adjust_dev_dax_range(struct dev_dax *dev_dax, struct resource *res,
+		resource_size_t size)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+	int rc = 0;
 
-	if (!alloc)
-		return -ENOMEM;
+	device_lock_assert(dax_region->dev);
+
+	if (size)
+		rc = adjust_resource(res, range->start, size);
+	else
+		__release_region(&dax_region->res, range->start,
+				range_len(range));
+	if (rc)
+		return rc;
 
 	dev_dax->range = (struct range) {
-		.start = alloc->start,
-		.end = alloc->end,
+		.start = range->start,
+		.end = range->start + size - 1,
 	};
 
 	return 0;
@@ -584,7 +623,131 @@ static ssize_t size_show(struct device *dev,
 
 	return sprintf(buf, "%llu\n", size);
 }
-static DEVICE_ATTR_RO(size);
+
+static bool alloc_is_aligned(struct dax_region *dax_region,
+		resource_size_t size)
+{
+	/*
+	 * The minimum mapping granularity for a device instance is a
+	 * single subsection, unless the arch says otherwise.
+	 */
+	return IS_ALIGNED(size, max_t(unsigned long, dax_region->align,
+				memremap_compat_align()));
+}
+
+static int dev_dax_shrink(struct dev_dax *dev_dax, resource_size_t size)
+{
+	struct dax_region *dax_region = dev_dax->region;
+	struct range *range = &dev_dax->range;
+	struct resource *res, *adjust = NULL;
+	struct device *dev = &dev_dax->dev;
+
+	for_each_dax_region_resource(dax_region, res)
+		if (strcmp(res->name, dev_name(dev)) == 0
+				&& res->start == range->start) {
+			adjust = res;
+			break;
+		}
+
+	if (dev_WARN_ONCE(dev, !adjust, "failed to find matching resource\n"))
+		return -ENXIO;
+	return __adjust_dev_dax_range(dev_dax, adjust, size);
+}
+
+static ssize_t dev_dax_resize(struct dax_region *dax_region,
+		struct dev_dax *dev_dax, resource_size_t size)
+{
+	resource_size_t avail = dax_region_avail_size(dax_region), to_alloc;
+	resource_size_t dev_size = range_len(&dev_dax->range);
+	struct resource *region_res = &dax_region->res;
+	struct device *dev = &dev_dax->dev;
+	const char *name = dev_name(dev);
+	struct resource *res, *first;
+
+	if (dev->driver)
+		return -EBUSY;
+	if (size == dev_size)
+		return 0;
+	if (size > dev_size && size - dev_size > avail)
+		return -ENOSPC;
+	if (size < dev_size)
+		return dev_dax_shrink(dev_dax, size);
+
+	to_alloc = size - dev_size;
+	if (dev_WARN_ONCE(dev, !alloc_is_aligned(dax_region, to_alloc),
+			"resize of %pa misaligned\n", &to_alloc))
+		return -ENXIO;
+
+	/*
+	 * Expand the device into the unused portion of the region. This
+	 * may involve adjusting the end of an existing resource, or
+	 * allocating a new resource.
+	 */
+	first = region_res->child;
+	if (!first)
+		return __alloc_dev_dax_range(dev_dax, dax_region->res.start,
+				to_alloc);
+	for (res = first; to_alloc && res; res = res->sibling) {
+		struct resource *next = res->sibling;
+		resource_size_t free;
+
+		/* space at the beginning of the region */
+		free = 0;
+		if (res == first && res->start > dax_region->res.start)
+			free = res->start - dax_region->res.start;
+		if (free >= to_alloc && dev_size == 0)
+			return __alloc_dev_dax_range(dev_dax,
+					dax_region->res.start, to_alloc);
+
+		free = 0;
+		/* space between allocations */
+		if (next && next->start > res->end + 1)
+			free = next->start - res->end + 1;
+
+		/* space at the end of the region */
+		if (free < to_alloc && !next && res->end < region_res->end)
+			free = region_res->end - res->end;
+
+		if (free >= to_alloc && strcmp(name, res->name) == 0)
+			return __adjust_dev_dax_range(dev_dax, res,
+					resource_size(res) + to_alloc);
+		else if (free >= to_alloc && dev_size == 0)
+			return __alloc_dev_dax_range(dev_dax, res->end + 1,
+					to_alloc);
+	}
+	return -ENOSPC;
+}
+
+static ssize_t size_store(struct device *dev, struct device_attribute *attr,
+		const char *buf, size_t len)
+{
+	ssize_t rc;
+	unsigned long long val;
+	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
+
+	rc = kstrtoull(buf, 0, &val);
+	if (rc)
+		return rc;
+
+	if (!alloc_is_aligned(dax_region, val)) {
+		dev_dbg(dev, "%s: size: %lld misaligned\n", __func__, val);
+		return -EINVAL;
+	}
+
+	device_lock(dax_region->dev);
+	if (!dax_region->dev->driver) {
+		device_unlock(dax_region->dev);
+		return -ENXIO;
+	}
+	device_lock(dev);
+	rc = dev_dax_resize(dax_region, dev_dax, val);
+	device_unlock(dev);
+	device_unlock(dax_region->dev);
+
+	return rc == 0 ? len : rc;
+}
+static DEVICE_ATTR_RW(size);
 
 static int dev_dax_target_node(struct dev_dax *dev_dax)
 {
@@ -633,11 +796,14 @@ static umode_t dev_dax_visible(struct kobject *kobj, struct attribute *a, int n)
 {
 	struct device *dev = container_of(kobj, struct device, kobj);
 	struct dev_dax *dev_dax = to_dev_dax(dev);
+	struct dax_region *dax_region = dev_dax->region;
 
 	if (a == &dev_attr_target_node.attr && dev_dax_target_node(dev_dax) < 0)
 		return 0;
 	if (a == &dev_attr_numa_node.attr && !IS_ENABLED(CONFIG_NUMA))
 		return 0;
+	if (a == &dev_attr_size.attr && is_static(dax_region))
+		return 0444;
 	return a->mode;
 }