From: Dan Williams
Date: Tue, 15 Jun 2021 17:46:19 -0700
Subject: Re: [PATCH v4 02/10] dax: Introduce holder for dax_device
To: Shiyang Ruan
Cc: Linux Kernel Mailing List, linux-xfs, linux-nvdimm, Linux MM,
    linux-fsdevel, device-mapper development, "Darrick J. Wong", david,
    Christoph Hellwig, Alasdair Kergon, Mike Snitzer, Goldwyn Rodrigues
References: <20210604011844.1756145-1-ruansy.fnst@fujitsu.com>
    <20210604011844.1756145-3-ruansy.fnst@fujitsu.com>
In-Reply-To: <20210604011844.1756145-3-ruansy.fnst@fujitsu.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 3, 2021 at 6:19 PM Shiyang Ruan wrote:
>
> To easily track filesystem from a pmem device, we introduce a holder for
> dax_device structure, and also its operation.  This holder is used to
> remember who is using this dax_device:
>  - When it is the backend of a filesystem, the holder will be the
>    superblock of this filesystem.
>  - When this pmem device is one of the targets in a mapped device, the
>    holder will be this mapped device.  In this case, the mapped device
>    has its own dax_device and it will follow the first rule.
>    So that we can finally track to the filesystem we needed.
>
> The holder and holder_ops will be set when filesystem is being mounted,
> or an target device is being activated.
>
> Signed-off-by: Shiyang Ruan
> ---
>  drivers/dax/super.c | 38 ++++++++++++++++++++++++++++++++++++++
>  include/linux/dax.h | 10 ++++++++++
>  2 files changed, 48 insertions(+)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 5fa6ae9dbc8b..d118e2a7dc70 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -222,8 +222,10 @@ struct dax_device {
>         struct cdev cdev;
>         const char *host;
>         void *private;

@private is likely too generic a name now; it would be better to call
this @parent.

> +       void *holder;

This should probably be called holder_data, and this structure could
use some kernel-doc to clarify what the fields mean.

>         unsigned long flags;
>         const struct dax_operations *ops;
> +       const struct dax_holder_operations *holder_ops;
>  };
>
>  static ssize_t write_cache_show(struct device *dev,
> @@ -373,6 +375,24 @@ int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff,
>  }
>  EXPORT_SYMBOL_GPL(dax_zero_page_range);
>
> +int dax_corrupted_range(struct dax_device *dax_dev, struct block_device *bdev,
> +               loff_t offset, size_t size, void *data)

Why is @bdev an argument to this routine? The primary motivation for a
'struct dax_device' is to break the association with 'struct
block_device'. The filesystem may know that the logical addresses
associated with a given dax_dev alias with the logical addresses of a
given bdev, but that knowledge need not leak into the API.

> +{
> +       int rc = -ENXIO;
> +       if (!dax_dev)
> +               return rc;
> +
> +       if (dax_dev->holder) {
> +               rc = dax_dev->holder_ops->corrupted_range(dax_dev, bdev, offset,
> +                                                         size, data);

A bikeshed comment, but I do not like the name corrupted_range(),
because "corrupted" implies a permanent state. The source of this
notification is memory_failure(), and that does not convey "permanent"
vs "transient"; it just reports "failure". So, to keep the naming
consistent with the pgmap notification callback, let's call this one
"notify_failure".

> +               if (rc == -ENODEV)
> +                       rc = -ENXIO;
> +       } else
> +               rc = -EOPNOTSUPP;
> +       return rc;
> +}
> +EXPORT_SYMBOL_GPL(dax_corrupted_range);

dax_holder_notify_failure() makes it clearer that this is communicating
a failure up the holder stack.

> +
>  #ifdef CONFIG_ARCH_HAS_PMEM_API
>  void arch_wb_cache_pmem(void *addr, size_t size);
>  void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
> @@ -624,6 +644,24 @@ void put_dax(struct dax_device *dax_dev)
>  }
>  EXPORT_SYMBOL_GPL(put_dax);
>
> +void dax_set_holder(struct dax_device *dax_dev, void *holder,
> +               const struct dax_holder_operations *ops)
> +{
> +       if (!dax_dev)
> +               return;
> +       dax_dev->holder = holder;
> +       dax_dev->holder_ops = ops;

I think there needs to be some synchronization here, perhaps a global
dax_dev_rwsem that is taken for read in the notification path and for
write when adding a holder to the chain.

I also wonder if this should be an event that triggers a dax_dev stack
to re-report any failure notifications. For example, the pmem driver
may have recorded a list of bad blocks at the beginning of time. Likely
the filesystem or other holder would like to get that pre-existing list
of failures at first registration. Have you given thought to how the
filesystem is told about pre-existing badblocks?

> +}
> +EXPORT_SYMBOL_GPL(dax_set_holder);
> +
> +void *dax_get_holder(struct dax_device *dax_dev)
> +{
> +       if (!dax_dev)
> +               return NULL;
> +       return dax_dev->holder;
> +}
> +EXPORT_SYMBOL_GPL(dax_get_holder);

Where is this used?