2018-06-05 23:59:09

by Ross Zwisler

Subject: [PATCH v2 1/3] libnvdimm: unconditionally deep flush on *sync

Prior to this commit we would only do a "deep flush" in response to an
msync/fsync/sync call if nvdimm_has_cache() returned true at the time
we were setting up the request queue. This happens due to the write cache
value passed in to blk_queue_write_cache(). We do have a "write_cache"
sysfs entry for namespaces, i.e.:

/sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache

which can be used to control whether or not the kernel thinks a given
namespace has a write cache, but this didn't modify the deep flush behavior
that we set up when the driver was initialized. Instead, it only modified
whether or not DAX would flush CPU caches in response to *sync calls.

Simplify this by making the *sync "deep flush" always happen, regardless of
the write cache setting of a namespace. The DAX CPU cache flushing will be
controlled by a combination of the write_cache setting as well as whether
the platform supports flush-on-fail CPU caches.
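
As an illustration only, the behavior change described above can be modeled
in plain userspace C. This is a sketch under stated assumptions, not kernel
code: struct sync_policy, policy_before() and policy_after() are hypothetical
names invented for this example, and the two booleans stand in for the value
of nvdimm_has_cache() sampled at attach time and the current write_cache
sysfs setting.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct sync_policy {
	bool deep_flush;      /* issue the nvdimm "deep flush" on *sync */
	bool cpu_cache_flush; /* let DAX write back CPU caches on *sync */
};

/*
 * Before the patch: deep flush followed the cache state sampled when the
 * request queue was set up, while the sysfs write_cache knob only affected
 * the DAX CPU cache flush.
 */
static struct sync_policy policy_before(bool has_cache_at_attach,
					bool write_cache_now)
{
	return (struct sync_policy){
		.deep_flush = has_cache_at_attach,
		.cpu_cache_flush = write_cache_now,
	};
}

/*
 * After the patch: deep flush is unconditional; the CPU cache flush still
 * follows the write_cache setting.
 */
static struct sync_policy policy_after(bool write_cache_now)
{
	return (struct sync_policy){
		.deep_flush = true,
		.cpu_cache_flush = write_cache_now,
	};
}
```

In this model, policy_before(false, true) reports no deep flush even though
write_cache was later turned on, which is the mismatch the patch removes.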

Signed-off-by: Ross Zwisler <[email protected]>
Suggested-by: Dan Williams <[email protected]>
---
drivers/nvdimm/pmem.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..a152dd9e4134 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -299,7 +299,7 @@ static int pmem_attach_disk(struct device *dev,
{
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
struct nd_region *nd_region = to_nd_region(dev->parent);
- int nid = dev_to_node(dev), fua, wbc;
+ int nid = dev_to_node(dev), fua;
struct resource *res = &nsio->res;
struct resource bb_res;
struct nd_pfn *nd_pfn = NULL;
@@ -335,7 +335,6 @@ static int pmem_attach_disk(struct device *dev,
dev_warn(dev, "unable to guarantee persistence of writes\n");
fua = 0;
}
- wbc = nvdimm_has_cache(nd_region);

if (!devm_request_mem_region(dev, res->start, resource_size(res),
dev_name(&ndns->dev))) {
@@ -382,7 +381,7 @@ static int pmem_attach_disk(struct device *dev,
return PTR_ERR(addr);
pmem->virt_addr = addr;

- blk_queue_write_cache(q, wbc, fua);
+ blk_queue_write_cache(q, true, fua);
blk_queue_make_request(q, pmem_make_request);
blk_queue_physical_block_size(q, PAGE_SIZE);
blk_queue_logical_block_size(q, pmem_sector_size(ndns));
@@ -413,7 +412,7 @@ static int pmem_attach_disk(struct device *dev,
put_disk(disk);
return -ENOMEM;
}
- dax_write_cache(dax_dev, wbc);
+ dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;

gendev = disk_to_dev(disk);
--
2.14.4



2018-06-05 23:58:47

by Ross Zwisler

Subject: [PATCH v2 2/3] libnvdimm: use dax_write_cache* helpers

Use dax_write_cache() and dax_write_cache_enabled() instead of open coding
the bit operations.

Signed-off-by: Ross Zwisler <[email protected]>
---
drivers/dax/super.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 2b2332b605e4..c2c46f96b18c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -182,8 +182,7 @@ static ssize_t write_cache_show(struct device *dev,
if (!dax_dev)
return -ENXIO;

- rc = sprintf(buf, "%d\n", !!test_bit(DAXDEV_WRITE_CACHE,
- &dax_dev->flags));
+ rc = sprintf(buf, "%d\n", !!dax_write_cache_enabled(dax_dev));
put_dax(dax_dev);
return rc;
}
@@ -201,10 +200,8 @@ static ssize_t write_cache_store(struct device *dev,

if (rc)
len = rc;
- else if (write_cache)
- set_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags);
else
- clear_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags);
+ dax_write_cache(dax_dev, write_cache);

put_dax(dax_dev);
return len;
@@ -286,7 +283,7 @@ EXPORT_SYMBOL_GPL(dax_copy_from_iter);
void arch_wb_cache_pmem(void *addr, size_t size);
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
{
- if (unlikely(!test_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags)))
+ if (unlikely(!dax_write_cache_enabled(dax_dev)))
return;

arch_wb_cache_pmem(addr, size);
--
2.14.4


2018-06-05 23:59:28

by Ross Zwisler

Subject: [PATCH v2 3/3] libnvdimm: don't flush power-fail protected CPU caches

This commit:

5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")

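intended to make sure that deep flush was always available even on
platforms which support a power-fail protected CPU cache. An unintended
side effect of this change was that we also lost the ability to skip
flushing CPU caches on those power-fail protected CPU caches.

Restore that ability by adding a DAXDEV_FLUSH_ON_SYNC flag, which is set
only when the region does not have ND_REGION_PERSIST_CACHE, and by having
dax_flush() return early when the flag is clear.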
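
For readers who want to see the gating logic in isolation, here is a small
userspace sketch of the two-flag check this version of the patch introduces.
It is an assumption-laden model, not the kernel implementation: the
model_* names and the plain bitmask are hypothetical stand-ins for
dax_dev->flags and the kernel's set_bit()/clear_bit()/test_bit() helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two flag bits; not the kernel's enum. */
enum model_flag {
	MODEL_WRITE_CACHE   = 1u << 0,
	MODEL_FLUSH_ON_SYNC = 1u << 1,
};

struct model_dax_dev {
	unsigned int flags;
};

/* Set or clear one flag, mirroring the shape of dax_flush_on_sync(). */
static void model_set_flag(struct model_dax_dev *dev, enum model_flag flag,
			   bool on)
{
	if (on)
		dev->flags |= (unsigned int)flag;
	else
		dev->flags &= ~(unsigned int)flag;
}

/*
 * Mirrors the early-return test in the patched dax_flush(): the CPU cache
 * write-back happens only when both flags are set.
 */
static bool model_should_flush(const struct model_dax_dev *dev)
{
	if (!(dev->flags & MODEL_WRITE_CACHE))
		return false;
	if (!(dev->flags & MODEL_FLUSH_ON_SYNC))
		return false;
	return true;
}
```

With this model, a device that has MODEL_WRITE_CACHE set but sits on a
platform with a power-fail protected CPU cache (MODEL_FLUSH_ON_SYNC clear)
skips the write-back.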

Signed-off-by: Ross Zwisler <[email protected]>
Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
---
drivers/dax/super.c | 14 +++++++++++++-
drivers/nvdimm/pmem.c | 2 ++
include/linux/dax.h | 4 ++++
3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index c2c46f96b18c..80253c531a9b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -152,6 +152,8 @@ enum dax_device_flags {
DAXDEV_ALIVE,
/* gate whether dax_flush() calls the low level flush routine */
DAXDEV_WRITE_CACHE,
+ /* only flush the CPU caches if they are not power fail protected */
+ DAXDEV_FLUSH_ON_SYNC,
};

/**
@@ -283,7 +285,8 @@ EXPORT_SYMBOL_GPL(dax_copy_from_iter);
void arch_wb_cache_pmem(void *addr, size_t size);
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
{
- if (unlikely(!dax_write_cache_enabled(dax_dev)))
+ if (unlikely(!dax_write_cache_enabled(dax_dev)) ||
+ !test_bit(DAXDEV_FLUSH_ON_SYNC, &dax_dev->flags))
return;

arch_wb_cache_pmem(addr, size);
@@ -310,6 +313,15 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
}
EXPORT_SYMBOL_GPL(dax_write_cache_enabled);

+void dax_flush_on_sync(struct dax_device *dax_dev, bool flush)
+{
+ if (flush)
+ set_bit(DAXDEV_FLUSH_ON_SYNC, &dax_dev->flags);
+ else
+ clear_bit(DAXDEV_FLUSH_ON_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_flush_on_sync);
+
bool dax_alive(struct dax_device *dax_dev)
{
lockdep_assert_held(&dax_srcu);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index a152dd9e4134..e8c2795bf766 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -413,6 +413,8 @@ static int pmem_attach_disk(struct device *dev,
return -ENOMEM;
}
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+ dax_flush_on_sync(dax_dev,
+ !test_bit(ND_REGION_PERSIST_CACHE, &nd_region->flags));
pmem->dax_dev = dax_dev;

gendev = disk_to_dev(disk);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index f9eb22ad341e..4575742508b0 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -32,6 +32,7 @@ void put_dax(struct dax_device *dax_dev);
void kill_dax(struct dax_device *dax_dev);
void dax_write_cache(struct dax_device *dax_dev, bool wc);
bool dax_write_cache_enabled(struct dax_device *dax_dev);
+void dax_flush_on_sync(struct dax_device *dax_dev, bool flush);
#else
static inline struct dax_device *dax_get_by_host(const char *host)
{
@@ -59,6 +60,9 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
{
return false;
}
+static inline void dax_flush_on_sync(struct dax_device *dax_dev, bool flush)
+{
+}
#endif

struct writeback_control;
--
2.14.4


2018-06-06 02:00:54

by Dan Williams

Subject: Re: [PATCH v2 3/3] libnvdimm: don't flush power-fail protected CPU caches

On Tue, Jun 5, 2018 at 4:58 PM, Ross Zwisler
<[email protected]> wrote:
> This commit:
>
> 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
>
> intended to make sure that deep flush was always available even on
> platforms which support a power-fail protected CPU cache. An unintended
> side effect of this change was that we also lost the ability to skip
> flushing CPU caches on those power-fail protected CPU cache.
>
> Signed-off-by: Ross Zwisler <[email protected]>
> Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
> ---
> drivers/dax/super.c | 14 +++++++++++++-
> drivers/nvdimm/pmem.c | 2 ++
> include/linux/dax.h | 4 ++++
> 3 files changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> index c2c46f96b18c..80253c531a9b 100644
> --- a/drivers/dax/super.c
> +++ b/drivers/dax/super.c
> @@ -152,6 +152,8 @@ enum dax_device_flags {
> DAXDEV_ALIVE,
> /* gate whether dax_flush() calls the low level flush routine */
> DAXDEV_WRITE_CACHE,
> + /* only flush the CPU caches if they are not power fail protected */
> + DAXDEV_FLUSH_ON_SYNC,

I'm not grokking why we need DAXDEV_FLUSH_ON_SYNC. The power fail
protected status of the cache only determines the default for
DAXDEV_WRITE_CACHE.

2018-06-06 17:49:46

by Ross Zwisler

Subject: Re: [PATCH v2 3/3] libnvdimm: don't flush power-fail protected CPU caches

On Tue, Jun 05, 2018 at 07:00:14PM -0700, Dan Williams wrote:
> On Tue, Jun 5, 2018 at 4:58 PM, Ross Zwisler
> <[email protected]> wrote:
> > This commit:
> >
> > 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
> >
> > intended to make sure that deep flush was always available even on
> > platforms which support a power-fail protected CPU cache. An unintended
> > side effect of this change was that we also lost the ability to skip
> > flushing CPU caches on those power-fail protected CPU cache.
> >
> > Signed-off-by: Ross Zwisler <[email protected]>
> > Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
> > ---
> > drivers/dax/super.c | 14 +++++++++++++-
> > drivers/nvdimm/pmem.c | 2 ++
> > include/linux/dax.h | 4 ++++
> > 3 files changed, 19 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dax/super.c b/drivers/dax/super.c
> > index c2c46f96b18c..80253c531a9b 100644
> > --- a/drivers/dax/super.c
> > +++ b/drivers/dax/super.c
> > @@ -152,6 +152,8 @@ enum dax_device_flags {
> > DAXDEV_ALIVE,
> > /* gate whether dax_flush() calls the low level flush routine */
> > DAXDEV_WRITE_CACHE,
> > + /* only flush the CPU caches if they are not power fail protected */
> > + DAXDEV_FLUSH_ON_SYNC,
>
> I'm not grokking why we need DAXDEV_FLUSH_ON_SYNC. The power fail
> protected status of the cache only determines the default for
> DAXDEV_WRITE_CACHE.

Ah, yea, that's much cleaner. I'll send out a v3.
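
One way to read the suggestion above, sketched as a userspace model: a single
write_cache flag whose default is derived from the platform's power-fail
protection, with the sysfs knob still able to override it. This is only a
guess at the shape of the idea, since the v3 patch is not part of this
thread; one_flag_dev, one_flag_attach(), one_flag_store() and
one_flag_should_flush() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical single-flag model; not taken from any posted patch. */
struct one_flag_dev {
	bool write_cache;
};

/*
 * At attach time the platform's power-fail protection only picks the
 * default: a protected CPU cache means no write-back is needed.
 */
static void one_flag_attach(struct one_flag_dev *dev,
			    bool cpu_cache_power_fail_protected)
{
	dev->write_cache = !cpu_cache_power_fail_protected;
}

/* Models a later write to the write_cache sysfs attribute. */
static void one_flag_store(struct one_flag_dev *dev, bool write_cache)
{
	dev->write_cache = write_cache;
}

/* The flush decision then needs to consult only the one flag. */
static bool one_flag_should_flush(const struct one_flag_dev *dev)
{
	return dev->write_cache;
}
```

Compared with the two-flag approach, an administrator's override through
sysfs is the final word in this model, and no second flag has to be kept in
sync with the first.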