2021-10-18 04:41:37

by Christoph Hellwig

[permalink] [raw]
Subject: futher decouple DAX from block devices

Hi Dan,

this series cleans up and simplifies the association between DAX and block
devices in preparation of allowing to mount file systems directly on DAX
devices without a detour through block devices.

Diffstat:
drivers/dax/Kconfig | 4
drivers/dax/bus.c | 2
drivers/dax/super.c | 220 +++++--------------------------------------
drivers/md/dm-linear.c | 51 +++------
drivers/md/dm-log-writes.c | 44 +++-----
drivers/md/dm-stripe.c | 65 +++---------
drivers/md/dm-table.c | 22 ++--
drivers/md/dm-writecache.c | 2
drivers/md/dm.c | 29 -----
drivers/md/dm.h | 4
drivers/nvdimm/Kconfig | 2
drivers/nvdimm/pmem.c | 9 -
drivers/s390/block/Kconfig | 2
drivers/s390/block/dcssblk.c | 12 +-
fs/dax.c | 13 ++
fs/erofs/super.c | 11 +-
fs/ext2/super.c | 6 -
fs/ext4/super.c | 9 +
fs/fuse/Kconfig | 2
fs/fuse/virtio_fs.c | 2
fs/xfs/xfs_super.c | 54 +++++-----
include/linux/dax.h | 30 ++---
22 files changed, 185 insertions(+), 410 deletions(-)


2021-10-18 04:41:37

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 05/11] dax: move the partition alignment check into fs_dax_get_by_bdev

fs_dax_get_by_bdev is the primary interface to find a dax device for a
block device, so move the partition alignment check there instead of
wiring it up through ->dax_supported.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/dax/super.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 04fc680542e8d..482fe775324a4 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;

+ if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
+ (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
+ pr_info("%pg: error: unaligned partition for dax\n", bdev);
+ return NULL;
+ }
+
id = dax_read_lock();
dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
@@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
{
- pgoff_t pgoff, pgoff_end;
- sector_t last_page;
- int err;
-
if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
return false;
@@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}

- err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
- if (err) {
- pr_info("%pg: error: unaligned partition for dax\n", bdev);
- return false;
- }
-
- last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
- err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
- if (err) {
- pr_info("%pg: error: unaligned partition for dax\n", bdev);
- return false;
- }
-
return true;
}
EXPORT_SYMBOL_GPL(generic_fsdax_supported);
--
2.30.2

2021-10-18 04:41:38

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 03/11] dax: simplify the dax_device <-> gendisk association

Replace the dax_host_hash with an xarray indexed by the pointer value
of the gendisk, and require explicitl calls from the block drivers that
want to associate their gendisk with a dax_device.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/dax/bus.c | 2 +-
drivers/dax/super.c | 106 +++++++++--------------------------
drivers/md/dm.c | 6 +-
drivers/nvdimm/pmem.c | 8 ++-
drivers/s390/block/dcssblk.c | 11 +++-
fs/fuse/virtio_fs.c | 2 +-
include/linux/dax.h | 19 +++++--
7 files changed, 60 insertions(+), 94 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 6cc4da4c713d9..6d91b0186e3be 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -1326,7 +1326,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
* No 'host' or dax_operations since there is no access to this
* device outside of mmap of the resulting character device.
*/
- dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
+ dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
if (IS_ERR(dax_dev)) {
rc = PTR_ERR(dax_dev);
goto err_alloc_dax;
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e20d0cef10a18..9383c11b21853 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -7,10 +7,8 @@
#include <linux/mount.h>
#include <linux/pseudo_fs.h>
#include <linux/magic.h>
-#include <linux/genhd.h>
#include <linux/pfn_t.h>
#include <linux/cdev.h>
-#include <linux/hash.h>
#include <linux/slab.h>
#include <linux/uio.h>
#include <linux/dax.h>
@@ -26,10 +24,8 @@
* @flags: state and boolean properties
*/
struct dax_device {
- struct hlist_node list;
struct inode inode;
struct cdev cdev;
- const char *host;
void *private;
unsigned long flags;
const struct dax_operations *ops;
@@ -42,10 +38,6 @@ static DEFINE_IDA(dax_minor_ida);
static struct kmem_cache *dax_cache __read_mostly;
static struct super_block *dax_superblock __read_mostly;

-#define DAX_HASH_SIZE (PAGE_SIZE / sizeof(struct hlist_head))
-static struct hlist_head dax_host_list[DAX_HASH_SIZE];
-static DEFINE_SPINLOCK(dax_host_lock);
-
int dax_read_lock(void)
{
return srcu_read_lock(&dax_srcu);
@@ -58,13 +50,22 @@ void dax_read_unlock(int id)
}
EXPORT_SYMBOL_GPL(dax_read_unlock);

-static int dax_host_hash(const char *host)
+#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
+#include <linux/blkdev.h>
+
+static DEFINE_XARRAY(dax_hosts);
+
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
{
- return hashlen_hash(hashlen_string("DAX", host)) % DAX_HASH_SIZE;
+ return xa_insert(&dax_hosts, (unsigned long)disk, dax_dev, GFP_KERNEL);
}
+EXPORT_SYMBOL_GPL(dax_add_host);

-#if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX)
-#include <linux/blkdev.h>
+void dax_remove_host(struct gendisk *disk)
+{
+ xa_erase(&dax_hosts, (unsigned long)disk);
+}
+EXPORT_SYMBOL_GPL(dax_remove_host);

int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
pgoff_t *pgoff)
@@ -82,40 +83,23 @@ EXPORT_SYMBOL(bdev_dax_pgoff);

/**
* dax_get_by_host() - temporary lookup mechanism for filesystem-dax
- * @host: alternate name for the device registered by a dax driver
+ * @bdev: block device to find a dax_device for
*/
-static struct dax_device *dax_get_by_host(const char *host)
+struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
{
- struct dax_device *dax_dev, *found = NULL;
- int hash, id;
+ struct dax_device *dax_dev;
+ int id;

- if (!host)
+ if (!blk_queue_dax(bdev->bd_disk->queue))
return NULL;

- hash = dax_host_hash(host);
-
id = dax_read_lock();
- spin_lock(&dax_host_lock);
- hlist_for_each_entry(dax_dev, &dax_host_list[hash], list) {
- if (!dax_alive(dax_dev)
- || strcmp(host, dax_dev->host) != 0)
- continue;
-
- if (igrab(&dax_dev->inode))
- found = dax_dev;
- break;
- }
- spin_unlock(&dax_host_lock);
+ dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
+ if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
+ dax_dev = NULL;
dax_read_unlock(id);

- return found;
-}
-
-struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
-{
- if (!blk_queue_dax(bdev->bd_disk->queue))
- return NULL;
- return dax_get_by_host(bdev->bd_disk->disk_name);
+ return dax_dev;
}
EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);

@@ -361,12 +345,7 @@ void kill_dax(struct dax_device *dax_dev)
return;

clear_bit(DAXDEV_ALIVE, &dax_dev->flags);
-
synchronize_srcu(&dax_srcu);
-
- spin_lock(&dax_host_lock);
- hlist_del_init(&dax_dev->list);
- spin_unlock(&dax_host_lock);
}
EXPORT_SYMBOL_GPL(kill_dax);

@@ -398,8 +377,6 @@ static struct dax_device *to_dax_dev(struct inode *inode)
static void dax_free_inode(struct inode *inode)
{
struct dax_device *dax_dev = to_dax_dev(inode);
- kfree(dax_dev->host);
- dax_dev->host = NULL;
if (inode->i_rdev)
ida_simple_remove(&dax_minor_ida, iminor(inode));
kmem_cache_free(dax_cache, dax_dev);
@@ -474,54 +451,25 @@ static struct dax_device *dax_dev_get(dev_t devt)
return dax_dev;
}

-static void dax_add_host(struct dax_device *dax_dev, const char *host)
-{
- int hash;
-
- /*
- * Unconditionally init dax_dev since it's coming from a
- * non-zeroed slab cache
- */
- INIT_HLIST_NODE(&dax_dev->list);
- dax_dev->host = host;
- if (!host)
- return;
-
- hash = dax_host_hash(host);
- spin_lock(&dax_host_lock);
- hlist_add_head(&dax_dev->list, &dax_host_list[hash]);
- spin_unlock(&dax_host_lock);
-}
-
-struct dax_device *alloc_dax(void *private, const char *__host,
- const struct dax_operations *ops, unsigned long flags)
+struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
+ unsigned long flags)
{
struct dax_device *dax_dev;
- const char *host;
dev_t devt;
int minor;

- if (ops && !ops->zero_page_range) {
- pr_debug("%s: error: device does not provide dax"
- " operation zero_page_range()\n",
- __host ? __host : "Unknown");
+ if (WARN_ON_ONCE(ops && !ops->zero_page_range))
return ERR_PTR(-EINVAL);
- }
-
- host = kstrdup(__host, GFP_KERNEL);
- if (__host && !host)
- return ERR_PTR(-ENOMEM);

minor = ida_simple_get(&dax_minor_ida, 0, MINORMASK+1, GFP_KERNEL);
if (minor < 0)
- goto err_minor;
+ return ERR_PTR(-ENOMEM);

devt = MKDEV(MAJOR(dax_devt), minor);
dax_dev = dax_dev_get(devt);
if (!dax_dev)
goto err_dev;

- dax_add_host(dax_dev, host);
dax_dev->ops = ops;
dax_dev->private = private;
if (flags & DAXDEV_F_SYNC)
@@ -531,8 +479,6 @@ struct dax_device *alloc_dax(void *private, const char *__host,

err_dev:
ida_simple_remove(&dax_minor_ida, minor);
- err_minor:
- kfree(host);
return ERR_PTR(-ENOMEM);
}
EXPORT_SYMBOL_GPL(alloc_dax);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 79737aee516b1..a0a4703620650 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1683,6 +1683,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
bioset_exit(&md->io_bs);

if (md->dax_dev) {
+ dax_remove_host(md->disk);
kill_dax(md->dax_dev);
put_dax(md->dax_dev);
md->dax_dev = NULL;
@@ -1784,10 +1785,11 @@ static struct mapped_device *alloc_dev(int minor)
sprintf(md->disk->disk_name, "dm-%d", minor);

if (IS_ENABLED(CONFIG_FS_DAX)) {
- md->dax_dev = alloc_dax(md, md->disk->disk_name,
- &dm_dax_ops, 0);
+ md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
if (IS_ERR(md->dax_dev))
goto bad;
+ if (dax_add_host(md->dax_dev, md->disk))
+ goto bad;
}

format_dev_t(md->name, MKDEV(_major, minor));
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 4f187ee86bf71..5628afb808f41 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -399,6 +399,7 @@ static void pmem_release_disk(void *__pmem)
{
struct pmem_device *pmem = __pmem;

+ dax_remove_host(pmem->disk);
kill_dax(pmem->dax_dev);
put_dax(pmem->dax_dev);
del_gendisk(pmem->disk);
@@ -524,10 +525,11 @@ static int pmem_attach_disk(struct device *dev,

if (is_nvdimm_sync(nd_region))
flags = DAXDEV_F_SYNC;
- dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops, flags);
- if (IS_ERR(dax_dev)) {
+ dax_dev = alloc_dax(pmem, &pmem_dax_ops, flags);
+ if (IS_ERR(dax_dev))
return PTR_ERR(dax_dev);
- }
+ if (dax_add_host(dax_dev, disk))
+ return -ENOMEM;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;

diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 27ab888b44d0a..657e492f2bc26 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -687,18 +687,21 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
if (rc)
goto put_dev;

- dev_info->dax_dev = alloc_dax(dev_info, dev_info->gd->disk_name,
- &dcssblk_dax_ops, DAXDEV_F_SYNC);
+ dev_info->dax_dev = alloc_dax(dev_info, &dcssblk_dax_ops,
+ DAXDEV_F_SYNC);
if (IS_ERR(dev_info->dax_dev)) {
rc = PTR_ERR(dev_info->dax_dev);
dev_info->dax_dev = NULL;
goto put_dev;
}
+ rc = dax_add_host(dev_info->dax_dev, dev_info->gd);
+ if (rc)
+ goto out_dax;

get_device(&dev_info->dev);
rc = device_add_disk(&dev_info->dev, dev_info->gd, NULL);
if (rc)
- goto out_dax;
+ goto out_dax_host;

switch (dev_info->segment_type) {
case SEG_TYPE_SR:
@@ -714,6 +717,8 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
rc = count;
goto out;

+out_dax_host:
+ dax_remove_host(dev_info->gd);
out_dax:
put_device(&dev_info->dev);
kill_dax(dev_info->dax_dev);
diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
index 0ad89c6629d74..eda41390bb02a 100644
--- a/fs/fuse/virtio_fs.c
+++ b/fs/fuse/virtio_fs.c
@@ -850,7 +850,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx len 0x%llx\n",
__func__, fs->window_kaddr, cache_reg.addr, cache_reg.len);

- fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops, 0);
+ fs->dax_dev = alloc_dax(fs, &virtio_fs_dax_ops, 0);
if (IS_ERR(fs->dax_dev))
return PTR_ERR(fs->dax_dev);

diff --git a/include/linux/dax.h b/include/linux/dax.h
index 8623caa673889..e2e9a67004cbd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -11,9 +11,11 @@

typedef unsigned long dax_entry_t;

+struct dax_device;
+struct gendisk;
struct iomap_ops;
struct iomap;
-struct dax_device;
+
struct dax_operations {
/*
* direct_access: translate a device-relative
@@ -39,8 +41,8 @@ struct dax_operations {
};

#if IS_ENABLED(CONFIG_DAX)
-struct dax_device *alloc_dax(void *private, const char *host,
- const struct dax_operations *ops, unsigned long flags);
+struct dax_device *alloc_dax(void *private, const struct dax_operations *ops,
+ unsigned long flags);
void put_dax(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
void dax_write_cache(struct dax_device *dax_dev, bool wc);
@@ -68,7 +70,7 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
return dax_synchronous(dax_dev);
}
#else
-static inline struct dax_device *alloc_dax(void *private, const char *host,
+static inline struct dax_device *alloc_dax(void *private,
const struct dax_operations *ops, unsigned long flags)
{
/*
@@ -107,6 +109,8 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
struct writeback_control;
int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
#if IS_ENABLED(CONFIG_FS_DAX)
+int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
+void dax_remove_host(struct gendisk *disk);
bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors);
@@ -128,6 +132,13 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t st
dax_entry_t dax_lock_page(struct page *page);
void dax_unlock_page(struct page *page, dax_entry_t cookie);
#else
+static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
+{
+ return 0;
+}
+static inline void dax_remove_host(struct gendisk *disk)
+{
+}
#define generic_fsdax_supported NULL

static inline bool dax_supported(struct dax_device *dax_dev,
--
2.30.2

2021-10-18 04:41:45

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 08/11] dm-linear: add a linear_dax_pgoff helper

Add a helper to perform the entire remapping for DAX accesses. This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/md/dm-linear.c | 49 +++++++++++++-----------------------------
1 file changed, 15 insertions(+), 34 deletions(-)

diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index 32fbab11bf90c..bf03f73fd0f36 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -164,63 +164,44 @@ static int linear_iterate_devices(struct dm_target *ti,
}

#if IS_ENABLED(CONFIG_FS_DAX)
+static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
+{
+ struct linear_c *lc = ti->private;
+ sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
+
+ *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
+ return lc->dev->dax_dev;
+}
+
static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
{
- long ret;
- struct linear_c *lc = ti->private;
- struct block_device *bdev = lc->dev->bdev;
- struct dax_device *dax_dev = lc->dev->dax_dev;
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
- dev_sector = linear_map_sector(ti, sector);
- ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
- if (ret)
- return ret;
+ struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
}

static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
- struct linear_c *lc = ti->private;
- struct block_device *bdev = lc->dev->bdev;
- struct dax_device *dax_dev = lc->dev->dax_dev;
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+ struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);

- dev_sector = linear_map_sector(ti, sector);
- if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
- return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
}

static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
- struct linear_c *lc = ti->private;
- struct block_device *bdev = lc->dev->bdev;
- struct dax_device *dax_dev = lc->dev->dax_dev;
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
+ struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);

- dev_sector = linear_map_sector(ti, sector);
- if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
- return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
}

static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
size_t nr_pages)
{
- int ret;
- struct linear_c *lc = ti->private;
- struct block_device *bdev = lc->dev->bdev;
- struct dax_device *dax_dev = lc->dev->dax_dev;
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
-
- dev_sector = linear_map_sector(ti, sector);
- ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
- if (ret)
- return ret;
+ struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
+
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
}

--
2.30.2

2021-10-18 04:41:49

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 07/11] dax: remove dax_capable

Just open code the block size and dax_dev == NULL checks in the callers.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/dax/super.c | 36 ------------------------------------
drivers/md/dm-table.c | 22 +++++++++++-----------
drivers/md/dm.c | 21 ---------------------
drivers/md/dm.h | 4 ----
drivers/nvdimm/pmem.c | 1 -
drivers/s390/block/dcssblk.c | 1 -
fs/erofs/super.c | 11 +++++++----
fs/ext2/super.c | 6 ++++--
fs/ext4/super.c | 9 ++++++---
fs/xfs/xfs_super.c | 21 ++++++++-------------
include/linux/dax.h | 14 --------------
11 files changed, 36 insertions(+), 110 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 482fe775324a4..803942586d1b6 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
return dax_dev;
}
EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
-
-bool generic_fsdax_supported(struct dax_device *dax_dev,
- struct block_device *bdev, int blocksize, sector_t start,
- sector_t sectors)
-{
- if (blocksize != PAGE_SIZE) {
- pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
- return false;
- }
-
- if (!dax_dev) {
- pr_debug("%pg: error: dax unsupported by block device\n", bdev);
- return false;
- }
-
- return true;
-}
-EXPORT_SYMBOL_GPL(generic_fsdax_supported);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
- int blocksize, sector_t start, sector_t len)
-{
- bool ret = false;
- int id;
-
- if (!dax_dev)
- return false;
-
- id = dax_read_lock();
- if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
- ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
- start, len);
- dax_read_unlock(id);
- return ret;
-}
-EXPORT_SYMBOL_GPL(dax_supported);
#endif /* CONFIG_BLOCK && CONFIG_FS_DAX */

enum dax_device_flags {
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 1fa4d5582dca5..4ae671c2168ea 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
EXPORT_SYMBOL_GPL(dm_table_set_type);

/* validate the dax capability of the target device span */
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
+static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
- int blocksize = *(int *) data;
+ if (dev->dax_dev)
+ return false;

- return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
+ pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);
+ return true;
}

/* Check devices support synchronous DAX */
@@ -822,8 +824,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
}

-bool dm_table_supports_dax(struct dm_table *t,
- iterate_devices_callout_fn iterate_fn, int *blocksize)
+static bool dm_table_supports_dax(struct dm_table *t,
+ iterate_devices_callout_fn iterate_fn)
{
struct dm_target *ti;
unsigned i;
@@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t,
return false;

if (!ti->type->iterate_devices ||
- ti->type->iterate_devices(ti, iterate_fn, blocksize))
+ ti->type->iterate_devices(ti, iterate_fn, NULL))
return false;
}

@@ -863,7 +865,6 @@ static int dm_table_determine_type(struct dm_table *t)
struct dm_target *tgt;
struct list_head *devices = dm_table_get_devices(t);
enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
- int page_size = PAGE_SIZE;

if (t->type != DM_TYPE_NONE) {
/* target already set the table's type */
@@ -907,7 +908,7 @@ static int dm_table_determine_type(struct dm_table *t)
verify_bio_based:
/* We must use this table as bio-based */
t->type = DM_TYPE_BIO_BASED;
- if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
+ if (dm_table_supports_dax(t, device_not_dax_capable) ||
(list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
t->type = DM_TYPE_DAX_BIO_BASED;
}
@@ -1981,7 +1982,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
struct queue_limits *limits)
{
bool wc = false, fua = false;
- int page_size = PAGE_SIZE;
int r;

/*
@@ -2015,9 +2015,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
}
blk_queue_write_cache(q, wc, fua);

- if (dm_table_supports_dax(t, device_not_dax_capable, &page_size)) {
+ if (dm_table_supports_dax(t, device_not_dax_capable)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
- if (dm_table_supports_dax(t, device_not_dax_synchronous_capable, NULL))
+ if (dm_table_supports_dax(t, device_not_dax_synchronous_capable))
set_dax_synchronous(t->md->dax_dev);
}
else
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a0a4703620650..f896ad29a67a7 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1027,26 +1027,6 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
return ret;
}

-static bool dm_dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
- int blocksize, sector_t start, sector_t len)
-{
- struct mapped_device *md = dax_get_private(dax_dev);
- struct dm_table *map;
- bool ret = false;
- int srcu_idx;
-
- map = dm_get_live_table(md, &srcu_idx);
- if (!map)
- goto out;
-
- ret = dm_table_supports_dax(map, device_not_dax_capable, &blocksize);
-
-out:
- dm_put_live_table(md, srcu_idx);
-
- return ret;
-}
-
static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
@@ -3050,7 +3030,6 @@ static const struct block_device_operations dm_rq_blk_dops = {

static const struct dax_operations dm_dax_ops = {
.direct_access = dm_dax_direct_access,
- .dax_supported = dm_dax_supported,
.copy_from_iter = dm_dax_copy_from_iter,
.copy_to_iter = dm_dax_copy_to_iter,
.zero_page_range = dm_dax_zero_page_range,
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 742d9c80efe19..9013dc1a7b002 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -73,10 +73,6 @@ bool dm_table_bio_based(struct dm_table *t);
bool dm_table_request_based(struct dm_table *t);
void dm_table_free_md_mempools(struct dm_table *t);
struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
-bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
- int *blocksize);
-int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data);

void dm_lock_md_type(struct mapped_device *md);
void dm_unlock_md_type(struct mapped_device *md);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 5628afb808f41..428a485800058 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -321,7 +321,6 @@ static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,

static const struct dax_operations pmem_dax_ops = {
.direct_access = pmem_dax_direct_access,
- .dax_supported = generic_fsdax_supported,
.copy_from_iter = pmem_copy_from_iter,
.copy_to_iter = pmem_copy_to_iter,
.zero_page_range = pmem_dax_zero_page_range,
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 657e492f2bc26..e65e83764d1ce 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -72,7 +72,6 @@ static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,

static const struct dax_operations dcssblk_dax_ops = {
.direct_access = dcssblk_dax_direct_access,
- .dax_supported = generic_fsdax_supported,
.copy_from_iter = dcssblk_dax_copy_from_iter,
.copy_to_iter = dcssblk_dax_copy_to_iter,
.zero_page_range = dcssblk_dax_zero_page_range,
diff --git a/fs/erofs/super.c b/fs/erofs/super.c
index b8f042c3e7e67..530d7b1e0f138 100644
--- a/fs/erofs/super.c
+++ b/fs/erofs/super.c
@@ -649,10 +649,13 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
if (err)
return err;

- if (test_opt(&sbi->opt, DAX_ALWAYS) &&
- !dax_supported(sbi->dax_dev, sb->s_bdev, EROFS_BLKSIZ, 0, bdev_nr_sectors(sb->s_bdev))) {
- errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
- clear_opt(&sbi->opt, DAX_ALWAYS);
+ if (test_opt(&sbi->opt, DAX_ALWAYS)) {
+ BUILD_BUG_ON(EROFS_BLKSIZ != PAGE_SIZE);
+
+ if (!sbi->dax_dev) {
+ errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
+ clear_opt(&sbi->opt, DAX_ALWAYS);
+ }
}
sb->s_flags |= SB_RDONLY | SB_NOATIME;
sb->s_maxbytes = MAX_LFS_FILESIZE;
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index d8d580b609baa..a964066a80aa7 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -946,11 +946,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);

if (test_opt(sb, DAX)) {
- if (!dax_supported(dax_dev, sb->s_bdev, blocksize, 0,
- bdev_nr_sectors(sb->s_bdev))) {
+ if (!dax_dev) {
ext2_msg(sb, KERN_ERR,
"DAX unsupported by block device. Turning off DAX.");
clear_opt(sbi->s_mount_opt, DAX);
+ } else if (blocksize != PAGE_SIZE) {
+ ext2_msg(sb, KERN_ERR, "unsupported blocksize for DAX\n");
+ clear_opt(sbi->s_mount_opt, DAX);
}
}

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6361ea1f97bc5..f571be3a6252b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4291,9 +4291,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount;
}

- if (dax_supported(dax_dev, sb->s_bdev, blocksize, 0,
- bdev_nr_sectors(sb->s_bdev)))
- set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
+ if (dax_dev) {
+ if (blocksize == PAGE_SIZE)
+ set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
+ else
+ ext4_msg(sb, KERN_ERR, "unsupported blocksize for DAX\n");
+ }

if (sbi->s_mount_opt & EXT4_MOUNT_DAX_ALWAYS) {
if (ext4_has_feature_inline_data(sb)) {
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index d07020a8eb9e3..163ceafbd8fd2 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -330,28 +330,23 @@ xfs_set_inode_alloc(
return xfs_is_inode32(mp) ? maxagi : agcount;
}

-static bool
-xfs_buftarg_is_dax(
- struct super_block *sb,
- struct xfs_buftarg *bt)
-{
- return dax_supported(bt->bt_daxdev, bt->bt_bdev, sb->s_blocksize, 0,
- bdev_nr_sectors(bt->bt_bdev));
-}
-
static int
xfs_setup_dax(
struct xfs_mount *mp)
{
- struct super_block *sb = mp->m_super;
-
- if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
- (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+ if (!mp->m_ddev_targp->bt_daxdev &&
+ (!mp->m_rtdev_targp || !mp->m_rtdev_targp->bt_daxdev)) {
xfs_alert(mp,
"DAX unsupported by block device. Turning off DAX.");
goto disable_dax;
}

+ if (mp->m_super->s_blocksize != PAGE_SIZE) {
+ xfs_alert(mp,
+ "DAX not supported for blocksize. Turning off DAX.\n");
+ goto disable_dax;
+ }
+
if (xfs_has_reflink(mp)) {
xfs_alert(mp, "DAX and reflink cannot be used together!");
return -EINVAL;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index e2e9a67004cbd..439c3c70e347b 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -111,12 +111,6 @@ int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
#if IS_ENABLED(CONFIG_FS_DAX)
int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
void dax_remove_host(struct gendisk *disk);
-bool generic_fsdax_supported(struct dax_device *dax_dev,
- struct block_device *bdev, int blocksize, sector_t start,
- sector_t sectors);
-
-bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
- int blocksize, sector_t start, sector_t len);

static inline void fs_put_dax(struct dax_device *dax_dev)
{
@@ -139,14 +133,6 @@ static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
static inline void dax_remove_host(struct gendisk *disk)
{
}
-#define generic_fsdax_supported NULL
-
-static inline bool dax_supported(struct dax_device *dax_dev,
- struct block_device *bdev, int blocksize, sector_t start,
- sector_t len)
-{
- return false;
-}

static inline void fs_put_dax(struct dax_device *dax_dev)
{
--
2.30.2

2021-10-18 04:41:54

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper

Add a helper to perform the entire remapping for DAX accesses. This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/md/dm-log-writes.c | 42 +++++++++++++++-----------------------
1 file changed, 17 insertions(+), 25 deletions(-)

diff --git a/drivers/md/dm-log-writes.c b/drivers/md/dm-log-writes.c
index 6d694526881d0..5aac60c1b774c 100644
--- a/drivers/md/dm-log-writes.c
+++ b/drivers/md/dm-log-writes.c
@@ -949,17 +949,21 @@ static int log_dax(struct log_writes_c *lc, sector_t sector, size_t bytes,
return 0;
}

+static struct dax_device *log_writes_dax_pgoff(struct dm_target *ti,
+ pgoff_t *pgoff)
+{
+ struct log_writes_c *lc = ti->private;
+
+ *pgoff += (get_start_sect(lc->dev->bdev) >> PAGE_SECTORS_SHIFT);
+ return lc->dev->dax_dev;
+}
+
static long log_writes_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
{
- struct log_writes_c *lc = ti->private;
- sector_t sector = pgoff * PAGE_SECTORS;
- int ret;
+ struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);

- ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages * PAGE_SIZE, &pgoff);
- if (ret)
- return ret;
- return dax_direct_access(lc->dev->dax_dev, pgoff, nr_pages, kaddr, pfn);
+ return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
}

static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
@@ -968,11 +972,9 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
{
struct log_writes_c *lc = ti->private;
sector_t sector = pgoff * PAGE_SECTORS;
+ struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);
int err;

- if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
- return 0;
-
/* Don't bother doing anything if logging has been disabled */
if (!lc->logging_enabled)
goto dax_copy;
@@ -983,34 +985,24 @@ static size_t log_writes_dax_copy_from_iter(struct dm_target *ti,
return 0;
}
dax_copy:
- return dax_copy_from_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+ return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
}

static size_t log_writes_dax_copy_to_iter(struct dm_target *ti,
pgoff_t pgoff, void *addr, size_t bytes,
struct iov_iter *i)
{
- struct log_writes_c *lc = ti->private;
- sector_t sector = pgoff * PAGE_SECTORS;
+ struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);

- if (bdev_dax_pgoff(lc->dev->bdev, sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
- return 0;
- return dax_copy_to_iter(lc->dev->dax_dev, pgoff, addr, bytes, i);
+ return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
}

static int log_writes_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
size_t nr_pages)
{
- int ret;
- struct log_writes_c *lc = ti->private;
- sector_t sector = pgoff * PAGE_SECTORS;
+ struct dax_device *dax_dev = log_writes_dax_pgoff(ti, &pgoff);

- ret = bdev_dax_pgoff(lc->dev->bdev, sector, nr_pages << PAGE_SHIFT,
- &pgoff);
- if (ret)
- return ret;
- return dax_zero_page_range(lc->dev->dax_dev, pgoff,
- nr_pages << PAGE_SHIFT);
+ return dax_zero_page_range(dax_dev, pgoff, nr_pages << PAGE_SHIFT);
}

#else
--
2.30.2

2021-10-18 04:41:57

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper

Add a helper to perform the entire remapping for DAX accesses. This
helper open codes bdev_dax_pgoff given that the alignment checks have
already been done by the submitting file system and don't need to be
repeated.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/md/dm-stripe.c | 63 ++++++++++--------------------------------
1 file changed, 15 insertions(+), 48 deletions(-)

diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
index f084607220293..50dba3f39274c 100644
--- a/drivers/md/dm-stripe.c
+++ b/drivers/md/dm-stripe.c
@@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
}

#if IS_ENABLED(CONFIG_FS_DAX)
-static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
- long nr_pages, void **kaddr, pfn_t *pfn)
+static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
{
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
struct stripe_c *sc = ti->private;
- struct dax_device *dax_dev;
struct block_device *bdev;
+ sector_t dev_sector;
uint32_t stripe;
- long ret;

- stripe_map_sector(sc, sector, &stripe, &dev_sector);
+ stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
dev_sector += sc->stripe[stripe].physical_start;
- dax_dev = sc->stripe[stripe].dev->dax_dev;
bdev = sc->stripe[stripe].dev->bdev;

- ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
- if (ret)
- return ret;
+ *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
+ return sc->stripe[stripe].dev->dax_dev;
+}
+
+static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+ long nr_pages, void **kaddr, pfn_t *pfn)
+{
+ struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
+
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
}

static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
- struct stripe_c *sc = ti->private;
- struct dax_device *dax_dev;
- struct block_device *bdev;
- uint32_t stripe;
-
- stripe_map_sector(sc, sector, &stripe, &dev_sector);
- dev_sector += sc->stripe[stripe].physical_start;
- dax_dev = sc->stripe[stripe].dev->dax_dev;
- bdev = sc->stripe[stripe].dev->bdev;
+ struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);

- if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
- return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
}

static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
- struct stripe_c *sc = ti->private;
- struct dax_device *dax_dev;
- struct block_device *bdev;
- uint32_t stripe;
-
- stripe_map_sector(sc, sector, &stripe, &dev_sector);
- dev_sector += sc->stripe[stripe].physical_start;
- dax_dev = sc->stripe[stripe].dev->dax_dev;
- bdev = sc->stripe[stripe].dev->bdev;
+ struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);

- if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
- return 0;
return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
}

static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
size_t nr_pages)
{
- int ret;
- sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
- struct stripe_c *sc = ti->private;
- struct dax_device *dax_dev;
- struct block_device *bdev;
- uint32_t stripe;
+ struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);

- stripe_map_sector(sc, sector, &stripe, &dev_sector);
- dev_sector += sc->stripe[stripe].physical_start;
- dax_dev = sc->stripe[stripe].dev->dax_dev;
- bdev = sc->stripe[stripe].dev->bdev;
-
- ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
- if (ret)
- return ret;
return dax_zero_page_range(dax_dev, pgoff, nr_pages);
}

--
2.30.2

2021-10-18 04:41:59

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 11/11] dax: move bdev_dax_pgoff to fs/dax.c

No functional changet, but this will allow for a tighter integration
with the iomap code, including possible passing the partition offset
in the iomap in the future. For now it mostly avoids growing more
callers outside of fs/dax.c.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/dax/super.c | 14 --------------
fs/dax.c | 13 +++++++++++++
include/linux/dax.h | 1 -
3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 803942586d1b6..c0910687fbcb2 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
}
EXPORT_SYMBOL_GPL(dax_remove_host);

-int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
- pgoff_t *pgoff)
-{
- sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
- phys_addr_t phys_off = (start_sect + sector) * 512;
-
- if (pgoff)
- *pgoff = PHYS_PFN(phys_off);
- if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
- return -EINVAL;
- return 0;
-}
-EXPORT_SYMBOL(bdev_dax_pgoff);
-
/**
* dax_get_by_host() - temporary lookup mechanism for filesystem-dax
* @bdev: block device to find a dax_device for
diff --git a/fs/dax.c b/fs/dax.c
index 4e3e5a283a916..eb715363fd667 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -709,6 +709,19 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
return __dax_invalidate_entry(mapping, index, false);
}

+static int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
+ pgoff_t *pgoff)
+{
+ sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
+ phys_addr_t phys_off = (start_sect + sector) * 512;
+
+ if (pgoff)
+ *pgoff = PHYS_PFN(phys_off);
+ if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
+ return -EINVAL;
+ return 0;
+}
+
static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
sector_t sector, struct page *to, unsigned long vaddr)
{
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 439c3c70e347b..324363b798ecd 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -107,7 +107,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
#endif

struct writeback_control;
-int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
#if IS_ENABLED(CONFIG_FS_DAX)
int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
void dax_remove_host(struct gendisk *disk);
--
2.30.2

2021-10-18 04:42:27

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 04/11] dax: remove the pgmap sanity checks in generic_fsdax_supported

Drivers that register a dax_dev should make sure it works, no need
to double check from the file system.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/dax/super.c | 49 +--------------------------------------------
1 file changed, 1 insertion(+), 48 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 9383c11b21853..04fc680542e8d 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
struct block_device *bdev, int blocksize, sector_t start,
sector_t sectors)
{
- bool dax_enabled = false;
pgoff_t pgoff, pgoff_end;
- void *kaddr, *end_kaddr;
- pfn_t pfn, end_pfn;
sector_t last_page;
- long len, len2;
- int err, id;
+ int err;

if (blocksize != PAGE_SIZE) {
pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
@@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
return false;
}

- id = dax_read_lock();
- len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
- len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
-
- if (len < 1 || len2 < 1) {
- pr_info("%pg: error: dax access failed (%ld)\n",
- bdev, len < 1 ? len : len2);
- dax_read_unlock(id);
- return false;
- }
-
- if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
- /*
- * An arch that has enabled the pmem api should also
- * have its drivers support pfn_t_devmap()
- *
- * This is a developer warning and should not trigger in
- * production. dax_flush() will crash since it depends
- * on being able to do (page_address(pfn_to_page())).
- */
- WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
- dax_enabled = true;
- } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
- struct dev_pagemap *pgmap, *end_pgmap;
-
- pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
- end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
- if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
- && pfn_t_to_page(pfn)->pgmap == pgmap
- && pfn_t_to_page(end_pfn)->pgmap == pgmap
- && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
- && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))
- dax_enabled = true;
- put_dev_pagemap(pgmap);
- put_dev_pagemap(end_pgmap);
-
- }
- dax_read_unlock(id);
-
- if (!dax_enabled) {
- pr_info("%pg: error: dax support not enabled\n", bdev);
- return false;
- }
return true;
}
EXPORT_SYMBOL_GPL(generic_fsdax_supported);
--
2.30.2

2021-10-18 04:43:34

by Christoph Hellwig

[permalink] [raw]
Subject: [PATCH 06/11] xfs: factor out a xfs_setup_dax helper

Factor out another DAX setup helper to simplify future changes. Also
move the experimental warning after the checks to not clutter the log
too much if the setup failed.

Signed-off-by: Christoph Hellwig <[email protected]>
---
fs/xfs/xfs_super.c | 47 +++++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index c4e0cd1c1c8ca..d07020a8eb9e3 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
bdev_nr_sectors(bt->bt_bdev));
}

+static int
+xfs_setup_dax(
+ struct xfs_mount *mp)
+{
+ struct super_block *sb = mp->m_super;
+
+ if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
+ (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
+ xfs_alert(mp,
+ "DAX unsupported by block device. Turning off DAX.");
+ goto disable_dax;
+ }
+
+ if (xfs_has_reflink(mp)) {
+ xfs_alert(mp, "DAX and reflink cannot be used together!");
+ return -EINVAL;
+ }
+
+ xfs_warn(mp, "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
+ return 0;
+
+disable_dax:
+ xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
+ return 0;
+}
+
STATIC int
xfs_blkdev_get(
xfs_mount_t *mp,
@@ -1592,26 +1618,9 @@ xfs_fs_fill_super(
sb->s_flags |= SB_I_VERSION;

if (xfs_has_dax_always(mp)) {
- bool rtdev_is_dax = false, datadev_is_dax;
-
- xfs_warn(mp,
- "DAX enabled. Warning: EXPERIMENTAL, use at your own risk");
-
- datadev_is_dax = xfs_buftarg_is_dax(sb, mp->m_ddev_targp);
- if (mp->m_rtdev_targp)
- rtdev_is_dax = xfs_buftarg_is_dax(sb,
- mp->m_rtdev_targp);
- if (!rtdev_is_dax && !datadev_is_dax) {
- xfs_alert(mp,
- "DAX unsupported by block device. Turning off DAX.");
- xfs_mount_set_dax_mode(mp, XFS_DAX_NEVER);
- }
- if (xfs_has_reflink(mp)) {
- xfs_alert(mp,
- "DAX and reflink cannot be used together!");
- error = -EINVAL;
+ error = xfs_setup_dax(mp);
+ if (error)
goto out_filestream_unmount;
- }
}

if (xfs_has_discard(mp)) {
--
2.30.2

2021-10-18 12:29:07

by Gao Xiang

[permalink] [raw]
Subject: Re: [PATCH 07/11] dax: remove dax_capable

On Mon, Oct 18, 2021 at 06:40:50AM +0200, Christoph Hellwig wrote:
> Just open code the block size and dax_dev == NULL checks in the callers.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/super.c | 36 ------------------------------------
> drivers/md/dm-table.c | 22 +++++++++++-----------
> drivers/md/dm.c | 21 ---------------------
> drivers/md/dm.h | 4 ----
> drivers/nvdimm/pmem.c | 1 -
> drivers/s390/block/dcssblk.c | 1 -
> fs/erofs/super.c | 11 +++++++----

For erofs part,

Reviewed-by: Gao Xiang <[email protected]>

Thanks,
Gao Xiang

2021-10-19 15:45:45

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH 07/11] dax: remove dax_capable

On Mon, Oct 18, 2021 at 06:40:50AM +0200, Christoph Hellwig wrote:
> Just open code the block size and dax_dev == NULL checks in the callers.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/super.c | 36 ------------------------------------
> drivers/md/dm-table.c | 22 +++++++++++-----------
> drivers/md/dm.c | 21 ---------------------
> drivers/md/dm.h | 4 ----
> drivers/nvdimm/pmem.c | 1 -
> drivers/s390/block/dcssblk.c | 1 -
> fs/erofs/super.c | 11 +++++++----
> fs/ext2/super.c | 6 ++++--
> fs/ext4/super.c | 9 ++++++---
> fs/xfs/xfs_super.c | 21 ++++++++-------------
> include/linux/dax.h | 14 --------------
> 11 files changed, 36 insertions(+), 110 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 482fe775324a4..803942586d1b6 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
> return dax_dev;
> }
> EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> -
> -bool generic_fsdax_supported(struct dax_device *dax_dev,
> - struct block_device *bdev, int blocksize, sector_t start,
> - sector_t sectors)
> -{
> - if (blocksize != PAGE_SIZE) {
> - pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> - return false;
> - }
> -
> - if (!dax_dev) {
> - pr_debug("%pg: error: dax unsupported by block device\n", bdev);
> - return false;
> - }
> -
> - return true;
> -}
> -EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> -
> -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> - int blocksize, sector_t start, sector_t len)
> -{
> - bool ret = false;
> - int id;
> -
> - if (!dax_dev)
> - return false;
> -
> - id = dax_read_lock();
> - if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
> - ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
> - start, len);
> - dax_read_unlock(id);
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(dax_supported);
> #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>
> enum dax_device_flags {
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 1fa4d5582dca5..4ae671c2168ea 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
> EXPORT_SYMBOL_GPL(dm_table_set_type);
>
> /* validate the dax capability of the target device span */
> -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> +static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> sector_t start, sector_t len, void *data)
> {
> - int blocksize = *(int *) data;
> + if (dev->dax_dev)
> + return false;
>
> - return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> + pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);
> + return true;
> }
>
> /* Check devices support synchronous DAX */
> @@ -822,8 +824,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
> return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
> }
>
> -bool dm_table_supports_dax(struct dm_table *t,
> - iterate_devices_callout_fn iterate_fn, int *blocksize)
> +static bool dm_table_supports_dax(struct dm_table *t,
> + iterate_devices_callout_fn iterate_fn)
> {
> struct dm_target *ti;
> unsigned i;
> @@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t,
> return false;
>
> if (!ti->type->iterate_devices ||
> - ti->type->iterate_devices(ti, iterate_fn, blocksize))
> + ti->type->iterate_devices(ti, iterate_fn, NULL))
> return false;
> }
>
> @@ -863,7 +865,6 @@ static int dm_table_determine_type(struct dm_table *t)
> struct dm_target *tgt;
> struct list_head *devices = dm_table_get_devices(t);
> enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
> - int page_size = PAGE_SIZE;
>
> if (t->type != DM_TYPE_NONE) {
> /* target already set the table's type */
> @@ -907,7 +908,7 @@ static int dm_table_determine_type(struct dm_table *t)
> verify_bio_based:
> /* We must use this table as bio-based */
> t->type = DM_TYPE_BIO_BASED;
> - if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
> + if (dm_table_supports_dax(t, device_not_dax_capable) ||
> (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
> t->type = DM_TYPE_DAX_BIO_BASED;
> }
> @@ -1981,7 +1982,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> struct queue_limits *limits)
> {
> bool wc = false, fua = false;
> - int page_size = PAGE_SIZE;
> int r;
>
> /*
> @@ -2015,9 +2015,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> }
> blk_queue_write_cache(q, wc, fua);
>
> - if (dm_table_supports_dax(t, device_not_dax_capable, &page_size)) {
> + if (dm_table_supports_dax(t, device_not_dax_capable)) {
> blk_queue_flag_set(QUEUE_FLAG_DAX, q);
> - if (dm_table_supports_dax(t, device_not_dax_synchronous_capable, NULL))
> + if (dm_table_supports_dax(t, device_not_dax_synchronous_capable))
> set_dax_synchronous(t->md->dax_dev);
> }
> else
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index a0a4703620650..f896ad29a67a7 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1027,26 +1027,6 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> return ret;
> }
>
> -static bool dm_dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> - int blocksize, sector_t start, sector_t len)
> -{
> - struct mapped_device *md = dax_get_private(dax_dev);
> - struct dm_table *map;
> - bool ret = false;
> - int srcu_idx;
> -
> - map = dm_get_live_table(md, &srcu_idx);
> - if (!map)
> - goto out;
> -
> - ret = dm_table_supports_dax(map, device_not_dax_capable, &blocksize);
> -
> -out:
> - dm_put_live_table(md, srcu_idx);
> -
> - return ret;
> -}
> -
> static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
> {
> @@ -3050,7 +3030,6 @@ static const struct block_device_operations dm_rq_blk_dops = {
>
> static const struct dax_operations dm_dax_ops = {
> .direct_access = dm_dax_direct_access,
> - .dax_supported = dm_dax_supported,
> .copy_from_iter = dm_dax_copy_from_iter,
> .copy_to_iter = dm_dax_copy_to_iter,
> .zero_page_range = dm_dax_zero_page_range,
> diff --git a/drivers/md/dm.h b/drivers/md/dm.h
> index 742d9c80efe19..9013dc1a7b002 100644
> --- a/drivers/md/dm.h
> +++ b/drivers/md/dm.h
> @@ -73,10 +73,6 @@ bool dm_table_bio_based(struct dm_table *t);
> bool dm_table_request_based(struct dm_table *t);
> void dm_table_free_md_mempools(struct dm_table *t);
> struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
> -bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
> - int *blocksize);
> -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> - sector_t start, sector_t len, void *data);
>
> void dm_lock_md_type(struct mapped_device *md);
> void dm_unlock_md_type(struct mapped_device *md);
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 5628afb808f41..428a485800058 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -321,7 +321,6 @@ static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
>
> static const struct dax_operations pmem_dax_ops = {
> .direct_access = pmem_dax_direct_access,
> - .dax_supported = generic_fsdax_supported,
> .copy_from_iter = pmem_copy_from_iter,
> .copy_to_iter = pmem_copy_to_iter,
> .zero_page_range = pmem_dax_zero_page_range,
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index 657e492f2bc26..e65e83764d1ce 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -72,7 +72,6 @@ static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,
>
> static const struct dax_operations dcssblk_dax_ops = {
> .direct_access = dcssblk_dax_direct_access,
> - .dax_supported = generic_fsdax_supported,
> .copy_from_iter = dcssblk_dax_copy_from_iter,
> .copy_to_iter = dcssblk_dax_copy_to_iter,
> .zero_page_range = dcssblk_dax_zero_page_range,
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index b8f042c3e7e67..530d7b1e0f138 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -649,10 +649,13 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> if (err)
> return err;
>
> - if (test_opt(&sbi->opt, DAX_ALWAYS) &&
> - !dax_supported(sbi->dax_dev, sb->s_bdev, EROFS_BLKSIZ, 0, bdev_nr_sectors(sb->s_bdev))) {
> - errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
> - clear_opt(&sbi->opt, DAX_ALWAYS);
> + if (test_opt(&sbi->opt, DAX_ALWAYS)) {
> + BUILD_BUG_ON(EROFS_BLKSIZ != PAGE_SIZE);
> +
> + if (!sbi->dax_dev) {
> + errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
> + clear_opt(&sbi->opt, DAX_ALWAYS);
> + }
> }
> sb->s_flags |= SB_RDONLY | SB_NOATIME;
> sb->s_maxbytes = MAX_LFS_FILESIZE;
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index d8d580b609baa..a964066a80aa7 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -946,11 +946,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
> blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
>
> if (test_opt(sb, DAX)) {
> - if (!dax_supported(dax_dev, sb->s_bdev, blocksize, 0,
> - bdev_nr_sectors(sb->s_bdev))) {
> + if (!dax_dev) {
> ext2_msg(sb, KERN_ERR,
> "DAX unsupported by block device. Turning off DAX.");
> clear_opt(sbi->s_mount_opt, DAX);
> + } else if (blocksize != PAGE_SIZE) {
> + ext2_msg(sb, KERN_ERR, "unsupported blocksize for DAX\n");
> + clear_opt(sbi->s_mount_opt, DAX);
> }
> }
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 6361ea1f97bc5..f571be3a6252b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4291,9 +4291,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> goto failed_mount;
> }
>
> - if (dax_supported(dax_dev, sb->s_bdev, blocksize, 0,
> - bdev_nr_sectors(sb->s_bdev)))
> - set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
> + if (dax_dev) {
> + if (blocksize == PAGE_SIZE)
> + set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
> + else
> + ext4_msg(sb, KERN_ERR, "unsupported blocksize for DAX\n");
> + }
>
> if (sbi->s_mount_opt & EXT4_MOUNT_DAX_ALWAYS) {
> if (ext4_has_feature_inline_data(sb)) {
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d07020a8eb9e3..163ceafbd8fd2 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -330,28 +330,23 @@ xfs_set_inode_alloc(
> return xfs_is_inode32(mp) ? maxagi : agcount;
> }
>
> -static bool
> -xfs_buftarg_is_dax(
> - struct super_block *sb,
> - struct xfs_buftarg *bt)
> -{
> - return dax_supported(bt->bt_daxdev, bt->bt_bdev, sb->s_blocksize, 0,
> - bdev_nr_sectors(bt->bt_bdev));
> -}
> -
> static int
> xfs_setup_dax(
> struct xfs_mount *mp)
> {
> - struct super_block *sb = mp->m_super;
> -
> - if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
> - (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
> + if (!mp->m_ddev_targp->bt_daxdev &&
> + (!mp->m_rtdev_targp || !mp->m_rtdev_targp->bt_daxdev)) {
> xfs_alert(mp,
> "DAX unsupported by block device. Turning off DAX.");
> goto disable_dax;
> }
>
> + if (mp->m_super->s_blocksize != PAGE_SIZE) {
> + xfs_alert(mp,
> + "DAX not supported for blocksize. Turning off DAX.\n");

Newlines aren't needed at the end of extX_msg/xfs_alert format strings.

Aside from that, the fs/dax/pmem changes look good to me. I didn't look
as carefully at the dm parts, though as an fs developer, nothing bad
stood out to me.

With the format strings fixed,

Reviewed-by: Darrick J. Wong <[email protected]>

(Sorry I didn't notice this patch had fs changes in it until I saw that
Gao replied to it...)

--D

> + goto disable_dax;
> + }
> +
> if (xfs_has_reflink(mp)) {
> xfs_alert(mp, "DAX and reflink cannot be used together!");
> return -EINVAL;
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index e2e9a67004cbd..439c3c70e347b 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -111,12 +111,6 @@ int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
> #if IS_ENABLED(CONFIG_FS_DAX)
> int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
> void dax_remove_host(struct gendisk *disk);
> -bool generic_fsdax_supported(struct dax_device *dax_dev,
> - struct block_device *bdev, int blocksize, sector_t start,
> - sector_t sectors);
> -
> -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> - int blocksize, sector_t start, sector_t len);
>
> static inline void fs_put_dax(struct dax_device *dax_dev)
> {
> @@ -139,14 +133,6 @@ static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
> static inline void dax_remove_host(struct gendisk *disk)
> {
> }
> -#define generic_fsdax_supported NULL
> -
> -static inline bool dax_supported(struct dax_device *dax_dev,
> - struct block_device *bdev, int blocksize, sector_t start,
> - sector_t len)
> -{
> - return false;
> -}
>
> static inline void fs_put_dax(struct dax_device *dax_dev)
> {
> --
> 2.30.2
>
> --
> dm-devel mailing list
> [email protected]
> https://listman.redhat.com/mailman/listinfo/dm-devel
>

2021-10-27 21:33:46

by Dan Williams

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

[ add sfr ]

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> Hi Dan,
>
> this series cleans up and simplifies the association between DAX and block
> devices in preparation of allowing to mount file systems directly on DAX
> devices without a detour through block devices.

So I notice that this is based on linux-next while libnvdimm-for-next
is based on v5.15-rc4. Since I'm not Andrew I went ahead and rebased
these onto v5.15-rc4, tested that, and then merged with linux-next to
resolve the conflicts and tested again.

My merge resolution is here [1]. Christoph, please have a look. The
rebase and the merge result are both passing my test and I'm now going
to review the individual patches. However, while I do that and collect
acks from DM and EROFS folks, I want to give Stephen a heads up that
this is coming. Primarily I want to see if someone sees a better
strategy to merge this, please let me know, but if not I plan to walk
Stephen and Linus through the resolution.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/commit/?id=c3894cf6eb8f


>
> Diffstat:
> drivers/dax/Kconfig | 4
> drivers/dax/bus.c | 2
> drivers/dax/super.c | 220 +++++--------------------------------------
> drivers/md/dm-linear.c | 51 +++------
> drivers/md/dm-log-writes.c | 44 +++-----
> drivers/md/dm-stripe.c | 65 +++---------
> drivers/md/dm-table.c | 22 ++--
> drivers/md/dm-writecache.c | 2
> drivers/md/dm.c | 29 -----
> drivers/md/dm.h | 4
> drivers/nvdimm/Kconfig | 2
> drivers/nvdimm/pmem.c | 9 -
> drivers/s390/block/Kconfig | 2
> drivers/s390/block/dcssblk.c | 12 +-
> fs/dax.c | 13 ++
> fs/erofs/super.c | 11 +-
> fs/ext2/super.c | 6 -
> fs/ext4/super.c | 9 +
> fs/fuse/Kconfig | 2
> fs/fuse/virtio_fs.c | 2
> fs/xfs/xfs_super.c | 54 +++++-----
> include/linux/dax.h | 30 ++---
> 22 files changed, 185 insertions(+), 410 deletions(-)

2021-10-27 22:18:08

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 03/11] dax: simplify the dax_device <-> gendisk association

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> Replace the dax_host_hash with an xarray indexed by the pointer value
> of the gendisk, and require explicitl calls from the block drivers that

s/explicitl/explicitl/

I've fixed that up locally.

> want to associate their gendisk with a dax_device.

This looks good. 0day-robot is now chewing on it via the test merge
with linux-next (libnvdimm-pending).

2021-10-27 23:03:34

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 04/11] dax: remove the pgmap sanity checks in generic_fsdax_supported

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> Drivers that register a dax_dev should make sure it works, no need
> to double check from the file system.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/super.c | 49 +--------------------------------------------
> 1 file changed, 1 insertion(+), 48 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 9383c11b21853..04fc680542e8d 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -107,13 +107,9 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> struct block_device *bdev, int blocksize, sector_t start,
> sector_t sectors)
> {
> - bool dax_enabled = false;
> pgoff_t pgoff, pgoff_end;
> - void *kaddr, *end_kaddr;
> - pfn_t pfn, end_pfn;
> sector_t last_page;
> - long len, len2;
> - int err, id;
> + int err;
>
> if (blocksize != PAGE_SIZE) {
> pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> @@ -138,49 +134,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> return false;
> }
>
> - id = dax_read_lock();
> - len = dax_direct_access(dax_dev, pgoff, 1, &kaddr, &pfn);
> - len2 = dax_direct_access(dax_dev, pgoff_end, 1, &end_kaddr, &end_pfn);
> -
> - if (len < 1 || len2 < 1) {
> - pr_info("%pg: error: dax access failed (%ld)\n",
> - bdev, len < 1 ? len : len2);
> - dax_read_unlock(id);
> - return false;
> - }
> -
> - if (IS_ENABLED(CONFIG_FS_DAX_LIMITED) && pfn_t_special(pfn)) {
> - /*
> - * An arch that has enabled the pmem api should also
> - * have its drivers support pfn_t_devmap()
> - *
> - * This is a developer warning and should not trigger in
> - * production. dax_flush() will crash since it depends
> - * on being able to do (page_address(pfn_to_page())).
> - */
> - WARN_ON(IS_ENABLED(CONFIG_ARCH_HAS_PMEM_API));
> - dax_enabled = true;
> - } else if (pfn_t_devmap(pfn) && pfn_t_devmap(end_pfn)) {
> - struct dev_pagemap *pgmap, *end_pgmap;
> -
> - pgmap = get_dev_pagemap(pfn_t_to_pfn(pfn), NULL);
> - end_pgmap = get_dev_pagemap(pfn_t_to_pfn(end_pfn), NULL);
> - if (pgmap && pgmap == end_pgmap && pgmap->type == MEMORY_DEVICE_FS_DAX
> - && pfn_t_to_page(pfn)->pgmap == pgmap
> - && pfn_t_to_page(end_pfn)->pgmap == pgmap
> - && pfn_t_to_pfn(pfn) == PHYS_PFN(__pa(kaddr))
> - && pfn_t_to_pfn(end_pfn) == PHYS_PFN(__pa(end_kaddr)))

This is effectively a self-test for a regression that was found while
manipulating the 'struct page' memmap metadata reservation for PMEM
namespaces.

I guess it's just serving as a security-blanket now and I should find
a way to move it out to a self-test. I'll take a look.

> - dax_enabled = true;
> - put_dev_pagemap(pgmap);
> - put_dev_pagemap(end_pgmap);
> -
> - }
> - dax_read_unlock(id);
> -
> - if (!dax_enabled) {
> - pr_info("%pg: error: dax support not enabled\n", bdev);
> - return false;
> - }
> return true;
> }
> EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> --
> 2.30.2
>

2021-10-27 23:07:49

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 05/11] dax: move the partition alignment check into fs_dax_get_by_bdev

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> fs_dax_get_by_bdev is the primary interface to find a dax device for a
> block device, so move the partition alignment check there instead of
> wiring it up through ->dax_supported.

Looks good.

>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/super.c | 23 ++++++-----------------
> 1 file changed, 6 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 04fc680542e8d..482fe775324a4 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -93,6 +93,12 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
> if (!blk_queue_dax(bdev->bd_disk->queue))
> return NULL;
>
> + if ((get_start_sect(bdev) * SECTOR_SIZE) % PAGE_SIZE ||
> + (bdev_nr_sectors(bdev) * SECTOR_SIZE) % PAGE_SIZE) {
> + pr_info("%pg: error: unaligned partition for dax\n", bdev);
> + return NULL;
> + }
> +
> id = dax_read_lock();
> dax_dev = xa_load(&dax_hosts, (unsigned long)bdev->bd_disk);
> if (!dax_dev || !dax_alive(dax_dev) || !igrab(&dax_dev->inode))
> @@ -107,10 +113,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> struct block_device *bdev, int blocksize, sector_t start,
> sector_t sectors)
> {
> - pgoff_t pgoff, pgoff_end;
> - sector_t last_page;
> - int err;
> -
> if (blocksize != PAGE_SIZE) {
> pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> return false;
> @@ -121,19 +123,6 @@ bool generic_fsdax_supported(struct dax_device *dax_dev,
> return false;
> }
>
> - err = bdev_dax_pgoff(bdev, start, PAGE_SIZE, &pgoff);
> - if (err) {
> - pr_info("%pg: error: unaligned partition for dax\n", bdev);
> - return false;
> - }
> -
> - last_page = PFN_DOWN((start + sectors - 1) * 512) * PAGE_SIZE / 512;
> - err = bdev_dax_pgoff(bdev, last_page, PAGE_SIZE, &pgoff_end);
> - if (err) {
> - pr_info("%pg: error: unaligned partition for dax\n", bdev);
> - return false;
> - }
> -
> return true;
> }
> EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> --
> 2.30.2
>

2021-10-27 23:37:40

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 06/11] xfs: factor out a xfs_setup_dax helper

On Tue, Oct 19, 2021 at 12:24 AM Christoph Hellwig <[email protected]> wrote:
>
> On Mon, Oct 18, 2021 at 09:43:51AM -0700, Darrick J. Wong wrote:
> > > --- a/fs/xfs/xfs_super.c
> > > +++ b/fs/xfs/xfs_super.c
> > > @@ -339,6 +339,32 @@ xfs_buftarg_is_dax(
> > > bdev_nr_sectors(bt->bt_bdev));
> > > }
> > >
> > > +static int
> > > +xfs_setup_dax(
> >
> > /me wonders if this should be named xfs_setup_dax_always, since this
> > doesn't handle the dax=inode mode?
>
> Sure, why not.

I went ahead and made that change locally.

>
> > The only reason I bring that up is that Eric reminded me a while ago
> > that we don't actually print any kind of EXPERIMENTAL warning for the
> > auto-detection behavior.
>
> Yes, I actually noticed that as well when preparing this series.

The rest looks good to me.

2021-10-28 00:17:18

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 07/11] dax: remove dax_capable

I am going to change the subject of this patch to:

dax: remove ->dax_supported()

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>

I'll add a bit more background to help others review this.

The ->dax_supported() operation arranges for a stack of devices to
each answer the question "is dax operational". That request routes to
generic_fsdax_supported() at last level device and that attempted an
actual dax_direct_access() call and did some sanity checks. However,
those sanity checks can be validated in other ways and with those
removed the only question to answer is "has each block device driver
in the stack performed dax_add_host()". That can be validated without
a dax_operation. So, just open code the block size and dax_dev == NULL
checks in the callers, and delete ->dax_supported().

Mike, let me know if you have any concerns.

> Just open code the block size and dax_dev == NULL checks in the callers.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/super.c | 36 ------------------------------------
> drivers/md/dm-table.c | 22 +++++++++++-----------
> drivers/md/dm.c | 21 ---------------------
> drivers/md/dm.h | 4 ----
> drivers/nvdimm/pmem.c | 1 -
> drivers/s390/block/dcssblk.c | 1 -
> fs/erofs/super.c | 11 +++++++----
> fs/ext2/super.c | 6 ++++--
> fs/ext4/super.c | 9 ++++++---
> fs/xfs/xfs_super.c | 21 ++++++++-------------
> include/linux/dax.h | 14 --------------
> 11 files changed, 36 insertions(+), 110 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 482fe775324a4..803942586d1b6 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
> return dax_dev;
> }
> EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> -
> -bool generic_fsdax_supported(struct dax_device *dax_dev,
> - struct block_device *bdev, int blocksize, sector_t start,
> - sector_t sectors)
> -{
> - if (blocksize != PAGE_SIZE) {
> - pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> - return false;
> - }
> -
> - if (!dax_dev) {
> - pr_debug("%pg: error: dax unsupported by block device\n", bdev);
> - return false;
> - }
> -
> - return true;
> -}
> -EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> -
> -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> - int blocksize, sector_t start, sector_t len)
> -{
> - bool ret = false;
> - int id;
> -
> - if (!dax_dev)
> - return false;
> -
> - id = dax_read_lock();
> - if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
> - ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
> - start, len);
> - dax_read_unlock(id);
> - return ret;
> -}
> -EXPORT_SYMBOL_GPL(dax_supported);
> #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
>
> enum dax_device_flags {
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index 1fa4d5582dca5..4ae671c2168ea 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
> EXPORT_SYMBOL_GPL(dm_table_set_type);
>
> /* validate the dax capability of the target device span */
> -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> +static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> sector_t start, sector_t len, void *data)
> {
> - int blocksize = *(int *) data;
> + if (dev->dax_dev)
> + return false;
>
> - return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> + pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);
> + return true;
> }
>
> /* Check devices support synchronous DAX */
> @@ -822,8 +824,8 @@ static int device_not_dax_synchronous_capable(struct dm_target *ti, struct dm_de
> return !dev->dax_dev || !dax_synchronous(dev->dax_dev);
> }
>
> -bool dm_table_supports_dax(struct dm_table *t,
> - iterate_devices_callout_fn iterate_fn, int *blocksize)
> +static bool dm_table_supports_dax(struct dm_table *t,
> + iterate_devices_callout_fn iterate_fn)
> {
> struct dm_target *ti;
> unsigned i;
> @@ -836,7 +838,7 @@ bool dm_table_supports_dax(struct dm_table *t,
> return false;
>
> if (!ti->type->iterate_devices ||
> - ti->type->iterate_devices(ti, iterate_fn, blocksize))
> + ti->type->iterate_devices(ti, iterate_fn, NULL))
> return false;
> }
>
> @@ -863,7 +865,6 @@ static int dm_table_determine_type(struct dm_table *t)
> struct dm_target *tgt;
> struct list_head *devices = dm_table_get_devices(t);
> enum dm_queue_mode live_md_type = dm_get_md_type(t->md);
> - int page_size = PAGE_SIZE;
>
> if (t->type != DM_TYPE_NONE) {
> /* target already set the table's type */
> @@ -907,7 +908,7 @@ static int dm_table_determine_type(struct dm_table *t)
> verify_bio_based:
> /* We must use this table as bio-based */
> t->type = DM_TYPE_BIO_BASED;
> - if (dm_table_supports_dax(t, device_not_dax_capable, &page_size) ||
> + if (dm_table_supports_dax(t, device_not_dax_capable) ||
> (list_empty(devices) && live_md_type == DM_TYPE_DAX_BIO_BASED)) {
> t->type = DM_TYPE_DAX_BIO_BASED;
> }
> @@ -1981,7 +1982,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> struct queue_limits *limits)
> {
> bool wc = false, fua = false;
> - int page_size = PAGE_SIZE;
> int r;
>
> /*
> @@ -2015,9 +2015,9 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> }
> blk_queue_write_cache(q, wc, fua);
>
> - if (dm_table_supports_dax(t, device_not_dax_capable, &page_size)) {
> + if (dm_table_supports_dax(t, device_not_dax_capable)) {
> blk_queue_flag_set(QUEUE_FLAG_DAX, q);
> - if (dm_table_supports_dax(t, device_not_dax_synchronous_capable, NULL))
> + if (dm_table_supports_dax(t, device_not_dax_synchronous_capable))
> set_dax_synchronous(t->md->dax_dev);
> }
> else
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index a0a4703620650..f896ad29a67a7 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1027,26 +1027,6 @@ static long dm_dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> return ret;
> }
>
> -static bool dm_dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> - int blocksize, sector_t start, sector_t len)
> -{
> - struct mapped_device *md = dax_get_private(dax_dev);
> - struct dm_table *map;
> - bool ret = false;
> - int srcu_idx;
> -
> - map = dm_get_live_table(md, &srcu_idx);
> - if (!map)
> - goto out;
> -
> - ret = dm_table_supports_dax(map, device_not_dax_capable, &blocksize);
> -
> -out:
> - dm_put_live_table(md, srcu_idx);
> -
> - return ret;
> -}
> -
> static size_t dm_dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
> {
> @@ -3050,7 +3030,6 @@ static const struct block_device_operations dm_rq_blk_dops = {
>
> static const struct dax_operations dm_dax_ops = {
> .direct_access = dm_dax_direct_access,
> - .dax_supported = dm_dax_supported,
> .copy_from_iter = dm_dax_copy_from_iter,
> .copy_to_iter = dm_dax_copy_to_iter,
> .zero_page_range = dm_dax_zero_page_range,
> diff --git a/drivers/md/dm.h b/drivers/md/dm.h
> index 742d9c80efe19..9013dc1a7b002 100644
> --- a/drivers/md/dm.h
> +++ b/drivers/md/dm.h
> @@ -73,10 +73,6 @@ bool dm_table_bio_based(struct dm_table *t);
> bool dm_table_request_based(struct dm_table *t);
> void dm_table_free_md_mempools(struct dm_table *t);
> struct dm_md_mempools *dm_table_get_md_mempools(struct dm_table *t);
> -bool dm_table_supports_dax(struct dm_table *t, iterate_devices_callout_fn fn,
> - int *blocksize);
> -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> - sector_t start, sector_t len, void *data);
>
> void dm_lock_md_type(struct mapped_device *md);
> void dm_unlock_md_type(struct mapped_device *md);
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 5628afb808f41..428a485800058 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -321,7 +321,6 @@ static size_t pmem_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff,
>
> static const struct dax_operations pmem_dax_ops = {
> .direct_access = pmem_dax_direct_access,
> - .dax_supported = generic_fsdax_supported,
> .copy_from_iter = pmem_copy_from_iter,
> .copy_to_iter = pmem_copy_to_iter,
> .zero_page_range = pmem_dax_zero_page_range,
> diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
> index 657e492f2bc26..e65e83764d1ce 100644
> --- a/drivers/s390/block/dcssblk.c
> +++ b/drivers/s390/block/dcssblk.c
> @@ -72,7 +72,6 @@ static int dcssblk_dax_zero_page_range(struct dax_device *dax_dev,
>
> static const struct dax_operations dcssblk_dax_ops = {
> .direct_access = dcssblk_dax_direct_access,
> - .dax_supported = generic_fsdax_supported,
> .copy_from_iter = dcssblk_dax_copy_from_iter,
> .copy_to_iter = dcssblk_dax_copy_to_iter,
> .zero_page_range = dcssblk_dax_zero_page_range,
> diff --git a/fs/erofs/super.c b/fs/erofs/super.c
> index b8f042c3e7e67..530d7b1e0f138 100644
> --- a/fs/erofs/super.c
> +++ b/fs/erofs/super.c
> @@ -649,10 +649,13 @@ static int erofs_fc_fill_super(struct super_block *sb, struct fs_context *fc)
> if (err)
> return err;
>
> - if (test_opt(&sbi->opt, DAX_ALWAYS) &&
> - !dax_supported(sbi->dax_dev, sb->s_bdev, EROFS_BLKSIZ, 0, bdev_nr_sectors(sb->s_bdev))) {
> - errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
> - clear_opt(&sbi->opt, DAX_ALWAYS);
> + if (test_opt(&sbi->opt, DAX_ALWAYS)) {
> + BUILD_BUG_ON(EROFS_BLKSIZ != PAGE_SIZE);
> +
> + if (!sbi->dax_dev) {
> + errorfc(fc, "DAX unsupported by block device. Turning off DAX.");
> + clear_opt(&sbi->opt, DAX_ALWAYS);
> + }
> }
> sb->s_flags |= SB_RDONLY | SB_NOATIME;
> sb->s_maxbytes = MAX_LFS_FILESIZE;
> diff --git a/fs/ext2/super.c b/fs/ext2/super.c
> index d8d580b609baa..a964066a80aa7 100644
> --- a/fs/ext2/super.c
> +++ b/fs/ext2/super.c
> @@ -946,11 +946,13 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
> blocksize = BLOCK_SIZE << le32_to_cpu(sbi->s_es->s_log_block_size);
>
> if (test_opt(sb, DAX)) {
> - if (!dax_supported(dax_dev, sb->s_bdev, blocksize, 0,
> - bdev_nr_sectors(sb->s_bdev))) {
> + if (!dax_dev) {
> ext2_msg(sb, KERN_ERR,
> "DAX unsupported by block device. Turning off DAX.");
> clear_opt(sbi->s_mount_opt, DAX);
> + } else if (blocksize != PAGE_SIZE) {
> + ext2_msg(sb, KERN_ERR, "unsupported blocksize for DAX\n");
> + clear_opt(sbi->s_mount_opt, DAX);
> }
> }
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 6361ea1f97bc5..f571be3a6252b 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4291,9 +4291,12 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
> goto failed_mount;
> }
>
> - if (dax_supported(dax_dev, sb->s_bdev, blocksize, 0,
> - bdev_nr_sectors(sb->s_bdev)))
> - set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
> + if (dax_dev) {
> + if (blocksize == PAGE_SIZE)
> + set_bit(EXT4_FLAGS_BDEV_IS_DAX, &sbi->s_ext4_flags);
> + else
> + ext4_msg(sb, KERN_ERR, "unsupported blocksize for DAX\n");
> + }
>
> if (sbi->s_mount_opt & EXT4_MOUNT_DAX_ALWAYS) {
> if (ext4_has_feature_inline_data(sb)) {
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index d07020a8eb9e3..163ceafbd8fd2 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -330,28 +330,23 @@ xfs_set_inode_alloc(
> return xfs_is_inode32(mp) ? maxagi : agcount;
> }
>
> -static bool
> -xfs_buftarg_is_dax(
> - struct super_block *sb,
> - struct xfs_buftarg *bt)
> -{
> - return dax_supported(bt->bt_daxdev, bt->bt_bdev, sb->s_blocksize, 0,
> - bdev_nr_sectors(bt->bt_bdev));
> -}
> -
> static int
> xfs_setup_dax(
> struct xfs_mount *mp)
> {
> - struct super_block *sb = mp->m_super;
> -
> - if (!xfs_buftarg_is_dax(sb, mp->m_ddev_targp) &&
> - (!mp->m_rtdev_targp || !xfs_buftarg_is_dax(sb, mp->m_rtdev_targp))) {
> + if (!mp->m_ddev_targp->bt_daxdev &&
> + (!mp->m_rtdev_targp || !mp->m_rtdev_targp->bt_daxdev)) {
> xfs_alert(mp,
> "DAX unsupported by block device. Turning off DAX.");
> goto disable_dax;
> }
>
> + if (mp->m_super->s_blocksize != PAGE_SIZE) {
> + xfs_alert(mp,
> + "DAX not supported for blocksize. Turning off DAX.\n");
> + goto disable_dax;
> + }
> +
> if (xfs_has_reflink(mp)) {
> xfs_alert(mp, "DAX and reflink cannot be used together!");
> return -EINVAL;
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index e2e9a67004cbd..439c3c70e347b 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -111,12 +111,6 @@ int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
> #if IS_ENABLED(CONFIG_FS_DAX)
> int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
> void dax_remove_host(struct gendisk *disk);
> -bool generic_fsdax_supported(struct dax_device *dax_dev,
> - struct block_device *bdev, int blocksize, sector_t start,
> - sector_t sectors);
> -
> -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> - int blocksize, sector_t start, sector_t len);
>
> static inline void fs_put_dax(struct dax_device *dax_dev)
> {
> @@ -139,14 +133,6 @@ static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk)
> static inline void dax_remove_host(struct gendisk *disk)
> {
> }
> -#define generic_fsdax_supported NULL
> -
> -static inline bool dax_supported(struct dax_device *dax_dev,
> - struct block_device *bdev, int blocksize, sector_t start,
> - sector_t len)
> -{
> - return false;
> -}
>
> static inline void fs_put_dax(struct dax_device *dax_dev)
> {
> --
> 2.30.2
>

2021-10-28 00:21:09

by Dan Williams

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH 07/11] dax: remove dax_capable

On Tue, Oct 19, 2021 at 8:45 AM Darrick J. Wong <[email protected]> wrote:
>
> On Mon, Oct 18, 2021 at 06:40:50AM +0200, Christoph Hellwig wrote:
> > Just open code the block size and dax_dev == NULL checks in the callers.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
> > ---
> > drivers/dax/super.c | 36 ------------------------------------
> > drivers/md/dm-table.c | 22 +++++++++++-----------
> > drivers/md/dm.c | 21 ---------------------
> > drivers/md/dm.h | 4 ----
> > drivers/nvdimm/pmem.c | 1 -
> > drivers/s390/block/dcssblk.c | 1 -
> > fs/erofs/super.c | 11 +++++++----
> > fs/ext2/super.c | 6 ++++--
> > fs/ext4/super.c | 9 ++++++---
> > fs/xfs/xfs_super.c | 21 ++++++++-------------
> > include/linux/dax.h | 14 --------------
> > 11 files changed, 36 insertions(+), 110 deletions(-)
> >
[..] if (ext4_has_feature_inline_data(sb)) {
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index d07020a8eb9e3..163ceafbd8fd2 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
[..]
> > + if (mp->m_super->s_blocksize != PAGE_SIZE) {
> > + xfs_alert(mp,
> > + "DAX not supported for blocksize. Turning off DAX.\n");
>
> Newlines aren't needed at the end of extX_msg/xfs_alert format strings.

Thanks Darrick, I fixed those up.

2021-10-28 01:34:00

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 08/11] dm-linear: add a linear_dax_pgoff helper

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> Add a helper to perform the entire remapping for DAX accesses. This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.

Looks good.

Mike, ack?

>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/md/dm-linear.c | 49 +++++++++++++-----------------------------
> 1 file changed, 15 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
> index 32fbab11bf90c..bf03f73fd0f36 100644
> --- a/drivers/md/dm-linear.c
> +++ b/drivers/md/dm-linear.c
> @@ -164,63 +164,44 @@ static int linear_iterate_devices(struct dm_target *ti,
> }
>
> #if IS_ENABLED(CONFIG_FS_DAX)
> +static struct dax_device *linear_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
> +{
> + struct linear_c *lc = ti->private;
> + sector_t sector = linear_map_sector(ti, *pgoff << PAGE_SECTORS_SHIFT);
> +
> + *pgoff = (get_start_sect(lc->dev->bdev) + sector) >> PAGE_SECTORS_SHIFT;
> + return lc->dev->dax_dev;
> +}
> +
> static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> long nr_pages, void **kaddr, pfn_t *pfn)
> {
> - long ret;
> - struct linear_c *lc = ti->private;
> - struct block_device *bdev = lc->dev->bdev;
> - struct dax_device *dax_dev = lc->dev->dax_dev;
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -
> - dev_sector = linear_map_sector(ti, sector);
> - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
> - if (ret)
> - return ret;
> + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
> +
> return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
> }
>
> static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
> {
> - struct linear_c *lc = ti->private;
> - struct block_device *bdev = lc->dev->bdev;
> - struct dax_device *dax_dev = lc->dev->dax_dev;
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
>
> - dev_sector = linear_map_sector(ti, sector);
> - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> - return 0;
> return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
> }
>
> static size_t linear_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
> {
> - struct linear_c *lc = ti->private;
> - struct block_device *bdev = lc->dev->bdev;
> - struct dax_device *dax_dev = lc->dev->dax_dev;
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
>
> - dev_sector = linear_map_sector(ti, sector);
> - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> - return 0;
> return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
> }
>
> static int linear_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
> size_t nr_pages)
> {
> - int ret;
> - struct linear_c *lc = ti->private;
> - struct block_device *bdev = lc->dev->bdev;
> - struct dax_device *dax_dev = lc->dev->dax_dev;
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> -
> - dev_sector = linear_map_sector(ti, sector);
> - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
> - if (ret)
> - return ret;
> + struct dax_device *dax_dev = linear_dax_pgoff(ti, &pgoff);
> +
> return dax_zero_page_range(dax_dev, pgoff, nr_pages);
> }
>
> --
> 2.30.2
>

2021-10-28 01:36:54

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> Add a helper to perform the entire remapping for DAX accesses. This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.

Looks good.

Mike, ack?

2021-10-28 01:42:02

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> Add a helper to perform the entire remapping for DAX accesses. This
> helper open codes bdev_dax_pgoff given that the alignment checks have
> already been done by the submitting file system and don't need to be
> repeated.

Again, looks good. Kind of embarrassing when the open-coded version is
less LOC than using the helper.

Mike, ack?

>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/md/dm-stripe.c | 63 ++++++++++--------------------------------
> 1 file changed, 15 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/md/dm-stripe.c b/drivers/md/dm-stripe.c
> index f084607220293..50dba3f39274c 100644
> --- a/drivers/md/dm-stripe.c
> +++ b/drivers/md/dm-stripe.c
> @@ -301,83 +301,50 @@ static int stripe_map(struct dm_target *ti, struct bio *bio)
> }
>
> #if IS_ENABLED(CONFIG_FS_DAX)
> -static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> - long nr_pages, void **kaddr, pfn_t *pfn)
> +static struct dax_device *stripe_dax_pgoff(struct dm_target *ti, pgoff_t *pgoff)
> {
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> struct stripe_c *sc = ti->private;
> - struct dax_device *dax_dev;
> struct block_device *bdev;
> + sector_t dev_sector;
> uint32_t stripe;
> - long ret;
>
> - stripe_map_sector(sc, sector, &stripe, &dev_sector);
> + stripe_map_sector(sc, *pgoff * PAGE_SECTORS, &stripe, &dev_sector);
> dev_sector += sc->stripe[stripe].physical_start;
> - dax_dev = sc->stripe[stripe].dev->dax_dev;
> bdev = sc->stripe[stripe].dev->bdev;
>
> - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages * PAGE_SIZE, &pgoff);
> - if (ret)
> - return ret;
> + *pgoff = (get_start_sect(bdev) + dev_sector) >> PAGE_SECTORS_SHIFT;
> + return sc->stripe[stripe].dev->dax_dev;
> +}
> +
> +static long stripe_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
> + long nr_pages, void **kaddr, pfn_t *pfn)
> +{
> + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
> +
> return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
> }
>
> static size_t stripe_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
> {
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> - struct stripe_c *sc = ti->private;
> - struct dax_device *dax_dev;
> - struct block_device *bdev;
> - uint32_t stripe;
> -
> - stripe_map_sector(sc, sector, &stripe, &dev_sector);
> - dev_sector += sc->stripe[stripe].physical_start;
> - dax_dev = sc->stripe[stripe].dev->dax_dev;
> - bdev = sc->stripe[stripe].dev->bdev;
> + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
>
> - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> - return 0;
> return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
> }
>
> static size_t stripe_dax_copy_to_iter(struct dm_target *ti, pgoff_t pgoff,
> void *addr, size_t bytes, struct iov_iter *i)
> {
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> - struct stripe_c *sc = ti->private;
> - struct dax_device *dax_dev;
> - struct block_device *bdev;
> - uint32_t stripe;
> -
> - stripe_map_sector(sc, sector, &stripe, &dev_sector);
> - dev_sector += sc->stripe[stripe].physical_start;
> - dax_dev = sc->stripe[stripe].dev->dax_dev;
> - bdev = sc->stripe[stripe].dev->bdev;
> + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
>
> - if (bdev_dax_pgoff(bdev, dev_sector, ALIGN(bytes, PAGE_SIZE), &pgoff))
> - return 0;
> return dax_copy_to_iter(dax_dev, pgoff, addr, bytes, i);
> }
>
> static int stripe_dax_zero_page_range(struct dm_target *ti, pgoff_t pgoff,
> size_t nr_pages)
> {
> - int ret;
> - sector_t dev_sector, sector = pgoff * PAGE_SECTORS;
> - struct stripe_c *sc = ti->private;
> - struct dax_device *dax_dev;
> - struct block_device *bdev;
> - uint32_t stripe;
> + struct dax_device *dax_dev = stripe_dax_pgoff(ti, &pgoff);
>
> - stripe_map_sector(sc, sector, &stripe, &dev_sector);
> - dev_sector += sc->stripe[stripe].physical_start;
> - dax_dev = sc->stripe[stripe].dev->dax_dev;
> - bdev = sc->stripe[stripe].dev->bdev;
> -
> - ret = bdev_dax_pgoff(bdev, dev_sector, nr_pages << PAGE_SHIFT, &pgoff);
> - if (ret)
> - return ret;
> return dax_zero_page_range(dax_dev, pgoff, nr_pages);
> }
>
> --
> 2.30.2
>

2021-10-28 01:44:44

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 11/11] dax: move bdev_dax_pgoff to fs/dax.c

On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
>
> No functional changet, but this will allow for a tighter integration

s/changet/changes/

> with the iomap code, including possible passing the partition offset

s/possible/possibly/

> in the iomap in the future. For now it mostly avoids growing more

s/now/now,/

...all of the above fixed up locally.

Other than that, it looks good to me.

> callers outside of fs/dax.c.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/super.c | 14 --------------
> fs/dax.c | 13 +++++++++++++
> include/linux/dax.h | 1 -
> 3 files changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index 803942586d1b6..c0910687fbcb2 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -67,20 +67,6 @@ void dax_remove_host(struct gendisk *disk)
> }
> EXPORT_SYMBOL_GPL(dax_remove_host);
>
> -int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
> - pgoff_t *pgoff)
> -{
> - sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
> - phys_addr_t phys_off = (start_sect + sector) * 512;
> -
> - if (pgoff)
> - *pgoff = PHYS_PFN(phys_off);
> - if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> - return -EINVAL;
> - return 0;
> -}
> -EXPORT_SYMBOL(bdev_dax_pgoff);
> -
> /**
> * dax_get_by_host() - temporary lookup mechanism for filesystem-dax
> * @bdev: block device to find a dax_device for
> diff --git a/fs/dax.c b/fs/dax.c
> index 4e3e5a283a916..eb715363fd667 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -709,6 +709,19 @@ int dax_invalidate_mapping_entry_sync(struct address_space *mapping,
> return __dax_invalidate_entry(mapping, index, false);
> }
>
> +static int bdev_dax_pgoff(struct block_device *bdev, sector_t sector, size_t size,
> + pgoff_t *pgoff)
> +{
> + sector_t start_sect = bdev ? get_start_sect(bdev) : 0;
> + phys_addr_t phys_off = (start_sect + sector) * 512;
> +
> + if (pgoff)
> + *pgoff = PHYS_PFN(phys_off);
> + if (phys_off % PAGE_SIZE || size % PAGE_SIZE)
> + return -EINVAL;
> + return 0;
> +}
> +
> static int copy_cow_page_dax(struct block_device *bdev, struct dax_device *dax_dev,
> sector_t sector, struct page *to, unsigned long vaddr)
> {
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 439c3c70e347b..324363b798ecd 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -107,7 +107,6 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> #endif
>
> struct writeback_control;
> -int bdev_dax_pgoff(struct block_device *, sector_t, size_t, pgoff_t *pgoff);
> #if IS_ENABLED(CONFIG_FS_DAX)
> int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk);
> void dax_remove_host(struct gendisk *disk);
> --
> 2.30.2
>

2021-10-28 23:53:25

by Stephen Rothwell

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

Hi Dan,

On Wed, 27 Oct 2021 13:46:31 -0700 Dan Williams <[email protected]> wrote:
>
> My merge resolution is here [1]. Christoph, please have a look. The
> rebase and the merge result are both passing my test and I'm now going
> to review the individual patches. However, while I do that and collect
> acks from DM and EROFS folks, I want to give Stephen a heads up that
> this is coming. Primarily I want to see if someone sees a better
> strategy to merge this, please let me know, but if not I plan to walk
> Stephen and Linus through the resolution.

It doesn't look to bad to me (however it is a bit late in the cycle :-(
). Once you are happy, just put it in your tree (some of the conflicts
are against the current -rc3 based version of your tree anyway) and I
will cope with it on Monday.

You could do a test merge against next-<date>^^ (that leaves out
Andrew's patch series) and if you think there is anything tricky please
send me a "git diff-tree --cc HEAD" after you have resolved the
conflicts to your satisfaction and committed the test merge or just
point me at the test merge in a tree somewhere (like this one).

--
Cheers,
Stephen Rothwell


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-10-29 04:58:03

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH 03/11] dax: simplify the dax_device <-> gendisk association

On Mon, Oct 18, 2021 at 06:40:46AM +0200, Christoph Hellwig wrote:
> Replace the dax_host_hash with an xarray indexed by the pointer value
> of the gendisk, and require explicitl calls from the block drivers that
> want to associate their gendisk with a dax_device.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> ---
> drivers/dax/bus.c | 2 +-
> drivers/dax/super.c | 106 +++++++++--------------------------
> drivers/md/dm.c | 6 +-
> drivers/nvdimm/pmem.c | 8 ++-
> drivers/s390/block/dcssblk.c | 11 +++-
> fs/fuse/virtio_fs.c | 2 +-
> include/linux/dax.h | 19 +++++--
> 7 files changed, 60 insertions(+), 94 deletions(-)
>
> diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
> index 6cc4da4c713d9..6d91b0186e3be 100644
> --- a/drivers/dax/bus.c
> +++ b/drivers/dax/bus.c
> @@ -1326,7 +1326,7 @@ struct dev_dax *devm_create_dev_dax(struct dev_dax_data *data)
> * No 'host' or dax_operations since there is no access to this
> * device outside of mmap of the resulting character device.
> */

NIT: this comment needs to be updated as well.

Ira

> - dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
> + dax_dev = alloc_dax(dev_dax, NULL, DAXDEV_F_SYNC);
> if (IS_ERR(dax_dev)) {
> rc = PTR_ERR(dax_dev);
> goto err_alloc_dax;

[snip]

2021-10-29 15:43:17

by Dan Williams

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Oct 28, 2021 at 4:52 PM Stephen Rothwell <[email protected]> wrote:
>
> Hi Dan,
>
> On Wed, 27 Oct 2021 13:46:31 -0700 Dan Williams <[email protected]> wrote:
> >
> > My merge resolution is here [1]. Christoph, please have a look. The
> > rebase and the merge result are both passing my test and I'm now going
> > to review the individual patches. However, while I do that and collect
> > acks from DM and EROFS folks, I want to give Stephen a heads up that
> > this is coming. Primarily I want to see if someone sees a better
> > strategy to merge this, please let me know, but if not I plan to walk
> > Stephen and Linus through the resolution.
>
> It doesn't look to bad to me (however it is a bit late in the cycle :-(
> ). Once you are happy, just put it in your tree (some of the conflicts
> are against the current -rc3 based version of your tree anyway) and I
> will cope with it on Monday.

Christoph, Darrick, Shiyang,

I'm losing my nerve to try to jam this into v5.16 this late in the
cycle. I do want to get dax+reflink squared away as soon as possible,
but that looks like something that needs to build on top of a
v5.16-rc1 at this point. If Linus does a -rc8 then maybe it would have
enough soak time, but otherwise I want to take the time to collect the
acks and queue up some more follow-on cleanups to prepare for
block-less-dax.

2021-10-29 15:56:19

by Darrick J. Wong

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Fri, Oct 29, 2021 at 08:42:29AM -0700, Dan Williams wrote:
> On Thu, Oct 28, 2021 at 4:52 PM Stephen Rothwell <[email protected]> wrote:
> >
> > Hi Dan,
> >
> > On Wed, 27 Oct 2021 13:46:31 -0700 Dan Williams <[email protected]> wrote:
> > >
> > > My merge resolution is here [1]. Christoph, please have a look. The
> > > rebase and the merge result are both passing my test and I'm now going
> > > to review the individual patches. However, while I do that and collect
> > > acks from DM and EROFS folks, I want to give Stephen a heads up that
> > > this is coming. Primarily I want to see if someone sees a better
> > > strategy to merge this, please let me know, but if not I plan to walk
> > > Stephen and Linus through the resolution.
> >
> > It doesn't look to bad to me (however it is a bit late in the cycle :-(
> > ). Once you are happy, just put it in your tree (some of the conflicts
> > are against the current -rc3 based version of your tree anyway) and I
> > will cope with it on Monday.
>
> Christoph, Darrick, Shiyang,
>
> I'm losing my nerve to try to jam this into v5.16 this late in the
> cycle.

Always a solid choice to hold off for a little more testing and a little
less anxiety. :)

I don't usually accept new code patches for iomap after rc4 anyway.

> I do want to get dax+reflink squared away as soon as possible,
> but that looks like something that needs to build on top of a
> v5.16-rc1 at this point. If Linus does a -rc8 then maybe it would have
> enough soak time, but otherwise I want to take the time to collect the
> acks and queue up some more follow-on cleanups to prepare for
> block-less-dax.

I think that hwpoison-calls-xfs-rmap patchset is a prerequisite for
dax+reflink anyway, right? /me had concluded both were 5.17 things.

--D

2021-10-29 16:17:20

by Dan Williams

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Fri, Oct 29, 2021 at 8:55 AM Darrick J. Wong <[email protected]> wrote:
>
> On Fri, Oct 29, 2021 at 08:42:29AM -0700, Dan Williams wrote:
> > On Thu, Oct 28, 2021 at 4:52 PM Stephen Rothwell <[email protected]> wrote:
> > >
> > > Hi Dan,
> > >
> > > On Wed, 27 Oct 2021 13:46:31 -0700 Dan Williams <[email protected]> wrote:
> > > >
> > > > My merge resolution is here [1]. Christoph, please have a look. The
> > > > rebase and the merge result are both passing my test and I'm now going
> > > > to review the individual patches. However, while I do that and collect
> > > > acks from DM and EROFS folks, I want to give Stephen a heads up that
> > > > this is coming. Primarily I want to see if someone sees a better
> > > > strategy to merge this, please let me know, but if not I plan to walk
> > > > Stephen and Linus through the resolution.
> > >
> > > It doesn't look to bad to me (however it is a bit late in the cycle :-(
> > > ). Once you are happy, just put it in your tree (some of the conflicts
> > > are against the current -rc3 based version of your tree anyway) and I
> > > will cope with it on Monday.
> >
> > Christoph, Darrick, Shiyang,
> >
> > I'm losing my nerve to try to jam this into v5.16 this late in the
> > cycle.
>
> Always a solid choice to hold off for a little more testing and a little
> less anxiety. :)
>
> I don't usually accept new code patches for iomap after rc4 anyway.
>
> > I do want to get dax+reflink squared away as soon as possible,
> > but that looks like something that needs to build on top of a
> > v5.16-rc1 at this point. If Linus does a -rc8 then maybe it would have
> > enough soak time, but otherwise I want to take the time to collect the
> > acks and queue up some more follow-on cleanups to prepare for
> > block-less-dax.
>
> I think that hwpoison-calls-xfs-rmap patchset is a prerequisite for
> dax+reflink anyway, right? /me had concluded both were 5.17 things.

Ok, cool, sounds like a plan.

2021-11-01 16:18:21

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 07/11] dax: remove dax_capable

On Wed, Oct 27 2021 at 8:16P -0400,
Dan Williams <[email protected]> wrote:

> I am going to change the subject of this patch to:
>
> dax: remove ->dax_supported()
>
> On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
> >
>
> I'll add a bit more background to help others review this.
>
> The ->dax_supported() operation arranges for a stack of devices to
> each answer the question "is dax operational". That request routes to
> generic_fsdax_supported() at last level device and that attempted an
> actual dax_direct_access() call and did some sanity checks. However,
> those sanity checks can be validated in other ways and with those
> removed the only question to answer is "has each block device driver
> in the stack performed dax_add_host()". That can be validated without
> a dax_operation. So, just open code the block size and dax_dev == NULL
> checks in the callers, and delete ->dax_supported().
>
> Mike, let me know if you have any concerns.

Thanks for your additional background, it helped.


>
> > Just open code the block size and dax_dev == NULL checks in the callers.
> >
> > Signed-off-by: Christoph Hellwig <[email protected]>
> > ---
> > drivers/dax/super.c | 36 ------------------------------------
> > drivers/md/dm-table.c | 22 +++++++++++-----------
> > drivers/md/dm.c | 21 ---------------------
> > drivers/md/dm.h | 4 ----
> > drivers/nvdimm/pmem.c | 1 -
> > drivers/s390/block/dcssblk.c | 1 -
> > fs/erofs/super.c | 11 +++++++----
> > fs/ext2/super.c | 6 ++++--
> > fs/ext4/super.c | 9 ++++++---
> > fs/xfs/xfs_super.c | 21 ++++++++-------------
> > include/linux/dax.h | 14 --------------
> > 11 files changed, 36 insertions(+), 110 deletions(-)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index 482fe775324a4..803942586d1b6 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -108,42 +108,6 @@ struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev)
> > return dax_dev;
> > }
> > EXPORT_SYMBOL_GPL(fs_dax_get_by_bdev);
> > -
> > -bool generic_fsdax_supported(struct dax_device *dax_dev,
> > - struct block_device *bdev, int blocksize, sector_t start,
> > - sector_t sectors)
> > -{
> > - if (blocksize != PAGE_SIZE) {
> > - pr_info("%pg: error: unsupported blocksize for dax\n", bdev);
> > - return false;
> > - }
> > -
> > - if (!dax_dev) {
> > - pr_debug("%pg: error: dax unsupported by block device\n", bdev);
> > - return false;
> > - }
> > -
> > - return true;
> > -}
> > -EXPORT_SYMBOL_GPL(generic_fsdax_supported);
> > -
> > -bool dax_supported(struct dax_device *dax_dev, struct block_device *bdev,
> > - int blocksize, sector_t start, sector_t len)
> > -{
> > - bool ret = false;
> > - int id;
> > -
> > - if (!dax_dev)
> > - return false;
> > -
> > - id = dax_read_lock();
> > - if (dax_alive(dax_dev) && dax_dev->ops->dax_supported)
> > - ret = dax_dev->ops->dax_supported(dax_dev, bdev, blocksize,
> > - start, len);
> > - dax_read_unlock(id);
> > - return ret;
> > -}
> > -EXPORT_SYMBOL_GPL(dax_supported);
> > #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */
> >
> > enum dax_device_flags {
> > diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> > index 1fa4d5582dca5..4ae671c2168ea 100644
> > --- a/drivers/md/dm-table.c
> > +++ b/drivers/md/dm-table.c
> > @@ -807,12 +807,14 @@ void dm_table_set_type(struct dm_table *t, enum dm_queue_mode type)
> > EXPORT_SYMBOL_GPL(dm_table_set_type);
> >
> > /* validate the dax capability of the target device span */
> > -int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> > +static int device_not_dax_capable(struct dm_target *ti, struct dm_dev *dev,
> > sector_t start, sector_t len, void *data)
> > {
> > - int blocksize = *(int *) data;
> > + if (dev->dax_dev)
> > + return false;
> >
> > - return !dax_supported(dev->dax_dev, dev->bdev, blocksize, start, len);
> > + pr_debug("%pg: error: dax unsupported by block device\n", dev->bdev);

Would prefer the use of DMDEBUG() here (which doesn't need trailing \n)

But otherwise:
Acked-by: Mike Snitzer <[email protected]>

2021-11-01 16:19:29

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 08/11] dm-linear: add a linear_dax_pgoff helper

On Wed, Oct 27 2021 at 9:32P -0400,
Dan Williams <[email protected]> wrote:

> On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
> >
> > Add a helper to perform the entire remapping for DAX accesses. This
> > helper open codes bdev_dax_pgoff given that the alignment checks have
> > already been done by the submitting file system and don't need to be
> > repeated.
>
> Looks good.
>
> Mike, ack?

Acked-by: Mike Snitzer <[email protected]>

2021-11-01 16:19:46

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 09/11] dm-log-writes: add a log_writes_dax_pgoff helper

On Wed, Oct 27 2021 at 9:36P -0400,
Dan Williams <[email protected]> wrote:

> On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
> >
> > Add a helper to perform the entire remapping for DAX accesses. This
> > helper open codes bdev_dax_pgoff given that the alignment checks have
> > already been done by the submitting file system and don't need to be
> > repeated.
>
> Looks good.
>
> Mike, ack?
>

Acked-by: Mike Snitzer <[email protected]>

2021-11-01 16:21:46

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 10/11] dm-stripe: add a stripe_dax_pgoff helper

On Wed, Oct 27 2021 at 9:41P -0400,
Dan Williams <[email protected]> wrote:

> On Sun, Oct 17, 2021 at 9:41 PM Christoph Hellwig <[email protected]> wrote:
> >
> > Add a helper to perform the entire remapping for DAX accesses. This
> > helper open codes bdev_dax_pgoff given that the alignment checks have
> > already been done by the submitting file system and don't need to be
> > repeated.
>
> Again, looks good. Kind of embarrassing when the open-coded version is
> less LOC than using the helper.
>
> Mike, ack?

Acked-by: Mike Snitzer <[email protected]>

2021-11-01 16:29:33

by Mike Snitzer

[permalink] [raw]
Subject: Re: [PATCH 03/11] dax: simplify the dax_device <-> gendisk association

On Mon, Oct 18 2021 at 12:40P -0400,
Christoph Hellwig <[email protected]> wrote:

> Replace the dax_host_hash with an xarray indexed by the pointer value
> of the gendisk, and require explicitl calls from the block drivers that
> want to associate their gendisk with a dax_device.
>
> Signed-off-by: Christoph Hellwig <[email protected]>

...

> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 79737aee516b1..a0a4703620650 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1683,6 +1683,7 @@ static void cleanup_mapped_device(struct mapped_device *md)
> bioset_exit(&md->io_bs);
>
> if (md->dax_dev) {
> + dax_remove_host(md->disk);
> kill_dax(md->dax_dev);
> put_dax(md->dax_dev);
> md->dax_dev = NULL;
> @@ -1784,10 +1785,11 @@ static struct mapped_device *alloc_dev(int minor)
> sprintf(md->disk->disk_name, "dm-%d", minor);
>
> if (IS_ENABLED(CONFIG_FS_DAX)) {
> - md->dax_dev = alloc_dax(md, md->disk->disk_name,
> - &dm_dax_ops, 0);
> + md->dax_dev = alloc_dax(md, &dm_dax_ops, 0);
> if (IS_ERR(md->dax_dev))
> goto bad;
> + if (dax_add_host(md->dax_dev, md->disk))
> + goto bad;
> }
>
> format_dev_t(md->name, MKDEV(_major, minor));

Acked-by: Mike Snitzer <[email protected]>

2021-11-03 17:59:58

by Eric Sandeen

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On 10/17/21 11:40 PM, Christoph Hellwig wrote:
> Hi Dan,
>
> this series cleans up and simplifies the association between DAX and block
> devices in preparation of allowing to mount file systems directly on DAX
> devices without a detour through block devices.

Christoph, can I ask what the end game looks like, here? If dax is completely
decoupled from block devices, are there user-visible changes? If I want to
run fs-dax on a pmem device - what do I point mkfs at, if not a block device?

Thanks,
-Eric

2021-11-04 08:21:35

by Christoph Hellwig

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Wed, Nov 03, 2021 at 12:59:31PM -0500, Eric Sandeen wrote:
> Christoph, can I ask what the end game looks like, here? If dax is completely
> decoupled from block devices, are there user-visible changes?

Yes.

> If I want to
> run fs-dax on a pmem device - what do I point mkfs at, if not a block device?

The rough plan is to use the device dax character devices. I'll hopefully
have a draft version in the next days.

2021-11-04 17:35:18

by Darrick J. Wong

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Nov 04, 2021 at 09:17:40AM +0100, Christoph Hellwig wrote:
> On Wed, Nov 03, 2021 at 12:59:31PM -0500, Eric Sandeen wrote:
> > Christoph, can I ask what the end game looks like, here? If dax is completely
> > decoupled from block devices, are there user-visible changes?
>
> Yes.
>
> > If I want to
> > run fs-dax on a pmem device - what do I point mkfs at, if not a block device?
>
> The rough plan is to use the device dax character devices. I'll hopefully
> have a draft version in the next days.

/me wonders, are block devices going away? Will mkfs.xfs have to learn
how to talk to certain chardevs? I guess jffs2 and others already do
that kind of thing... but I suppose I can wait for the real draft to
show up to ramble further. ;)

--D

2021-11-04 17:36:51

by Christoph Hellwig

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Nov 04, 2021 at 10:34:17AM -0700, Darrick J. Wong wrote:
> /me wonders, are block devices going away? Will mkfs.xfs have to learn
> how to talk to certain chardevs? I guess jffs2 and others already do
> that kind of thing... but I suppose I can wait for the real draft to
> show up to ramble further. ;)

Right now I've mostly been looking into the kernel side. An no, I
do not expect /dev/pmem* to go away as you'll still need it for a
not DAX aware file system and/or application (such as mkfs initially).

But yes, just pointing mkfs to the chardev should be doable with very
little work. We can point it to a regular file after all.

2021-11-04 18:12:05

by Dan Williams

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Nov 4, 2021 at 10:36 AM Christoph Hellwig <[email protected]> wrote:
>
> On Thu, Nov 04, 2021 at 10:34:17AM -0700, Darrick J. Wong wrote:
> > /me wonders, are block devices going away? Will mkfs.xfs have to learn
> > how to talk to certain chardevs? I guess jffs2 and others already do
> > that kind of thing... but I suppose I can wait for the real draft to
> > show up to ramble further. ;)
>
> Right now I've mostly been looking into the kernel side. An no, I
> do not expect /dev/pmem* to go away as you'll still need it for a
> not DAX aware file system and/or application (such as mkfs initially).
>
> But yes, just pointing mkfs to the chardev should be doable with very
> little work. We can point it to a regular file after all.

Note that I've avoided implementing read/write fops for dax devices
partly out of concern for not wanting to figure out shared-mmap vs
write coherence issues, but also because of a bet with Dave Hansen
that device-dax not grow features like what happened to hugetlbfs. So
it would seem mkfs would need to switch to mmap I/O, or bite the
bullet and implement read/write fops in the driver.

2021-11-04 19:06:01

by Darrick J. Wong

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Nov 04, 2021 at 11:10:19AM -0700, Dan Williams wrote:
> On Thu, Nov 4, 2021 at 10:36 AM Christoph Hellwig <[email protected]> wrote:
> >
> > On Thu, Nov 04, 2021 at 10:34:17AM -0700, Darrick J. Wong wrote:
> > > /me wonders, are block devices going away? Will mkfs.xfs have to learn
> > > how to talk to certain chardevs? I guess jffs2 and others already do
> > > that kind of thing... but I suppose I can wait for the real draft to
> > > show up to ramble further. ;)
> >
> > Right now I've mostly been looking into the kernel side. An no, I
> > do not expect /dev/pmem* to go away as you'll still need it for a
> > not DAX aware file system and/or application (such as mkfs initially).
> >
> > But yes, just pointing mkfs to the chardev should be doable with very
> > little work. We can point it to a regular file after all.
>
> Note that I've avoided implementing read/write fops for dax devices
> partly out of concern for not wanting to figure out shared-mmap vs
> write coherence issues, but also because of a bet with Dave Hansen
> that device-dax not grow features like what happened to hugetlbfs. So
> it would seem mkfs would need to switch to mmap I/O, or bite the
> bullet and implement read/write fops in the driver.

That ... would require a fair amount of userspace changes, though at
least e2fsprogs has pluggable io drivers, which would make mmapping a
character device not too awful.

xfsprogs would be another story -- porting the buffer cache mignt not be
too bad, but mkfs and repair seem to issue pread/pwrite calls directly.
Note that xfsprogs explicitly screens out chardevs.

--D

2021-11-05 04:06:27

by Theodore Ts'o

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Nov 04, 2021 at 12:04:43PM -0700, Darrick J. Wong wrote:
> > Note that I've avoided implementing read/write fops for dax devices
> > partly out of concern for not wanting to figure out shared-mmap vs
> > write coherence issues, but also because of a bet with Dave Hansen
> > that device-dax not grow features like what happened to hugetlbfs. So
> > it would seem mkfs would need to switch to mmap I/O, or bite the
> > bullet and implement read/write fops in the driver.
>
> That ... would require a fair amount of userspace changes, though at
> least e2fsprogs has pluggable io drivers, which would make mmapping a
> character device not too awful.
>
> xfsprogs would be another story -- porting the buffer cache mignt not be
> too bad, but mkfs and repair seem to issue pread/pwrite calls directly.
> Note that xfsprogs explicitly screens out chardevs.

It's not just e2fsprogs and xfsprogs. There's also udev, blkid,
potententially systemd unit generators to kick off fsck runs, etc.
There are probably any number of user scripts which assume that file
systems are mounted on block devices --- for example, by looking at
the output of lsblk, etc.

Also note that block devices have O_EXCL support to provide locking
against attempts to run mkfs on a mounted file system. If you move
dax file systems to be mounted on a character mode device, that would
have to be replicated as well, etc. So I suspect that a large number
of subtle things would break, and I'd strongly recommend against going
down that path.

- Ted

2021-11-05 04:28:50

by Matthew Wilcox

[permalink] [raw]
Subject: Re: futher decouple DAX from block devices

On Thu, Nov 04, 2021 at 11:09:19PM -0400, Theodore Ts'o wrote:
> On Thu, Nov 04, 2021 at 12:04:43PM -0700, Darrick J. Wong wrote:
> > > Note that I've avoided implementing read/write fops for dax devices
> > > partly out of concern for not wanting to figure out shared-mmap vs
> > > write coherence issues, but also because of a bet with Dave Hansen
> > > that device-dax not grow features like what happened to hugetlbfs. So
> > > it would seem mkfs would need to switch to mmap I/O, or bite the
> > > bullet and implement read/write fops in the driver.
> >
> > That ... would require a fair amount of userspace changes, though at
> > least e2fsprogs has pluggable io drivers, which would make mmapping a
> > character device not too awful.
> >
> > xfsprogs would be another story -- porting the buffer cache mignt not be
> > too bad, but mkfs and repair seem to issue pread/pwrite calls directly.
> > Note that xfsprogs explicitly screens out chardevs.
>
> It's not just e2fsprogs and xfsprogs. There's also udev, blkid,
> potententially systemd unit generators to kick off fsck runs, etc.
> There are probably any number of user scripts which assume that file
> systems are mounted on block devices --- for example, by looking at
> the output of lsblk, etc.
>
> Also note that block devices have O_EXCL support to provide locking
> against attempts to run mkfs on a mounted file system. If you move
> dax file systems to be mounted on a character mode device, that would
> have to be replicated as well, etc. So I suspect that a large number
> of subtle things would break, and I'd strongly recommend against going
> down that path.

Agreed. There were reasons we decided to present pmem as "block
device with extra functionality" rather than try to cram all the block
layer functionality (eg submitting BIOs for filesystem metadata) into a
character device. Some of those assumptions might be worth re-examining,
but I haven't seen anything that makes me say "this is obviously better
than what we did at the time".