LinuxLists.cc - [PATCH v3 1/4] pmem, libnvdimm: complete REQ_FLUSH => REQ

2018-06-06 16:46:22

Subject: [PATCH v3 1/4] pmem, libnvdimm: complete REQ_FLUSH => REQ_PREFLUSH

Complete the move from REQ_FLUSH to REQ_PREFLUSH that apparently started
way back in v4.8.

Signed-off-by: Ross Zwisler <[email protected]>
---
drivers/nvdimm/pmem.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 9d714926ecf5..252adfab1e47 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -164,11 +164,6 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
return rc;
}

-/* account for REQ_FLUSH rename, replace with REQ_PREFLUSH after v4.8-rc1 */
-#ifndef REQ_FLUSH
-#define REQ_FLUSH REQ_PREFLUSH
-#endif
-
static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
{
blk_status_t rc = 0;
@@ -179,7 +174,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
struct pmem_device *pmem = q->queuedata;
struct nd_region *nd_region = to_region(pmem);

- if (bio->bi_opf & REQ_FLUSH)
+ if (bio->bi_opf & REQ_PREFLUSH)
nvdimm_flush(nd_region);

do_acct = nd_iostat_start(bio, &start);
--
2.14.4

2018-06-06 16:49:16

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v3 3/4] libnvdimm: use dax_write_cache* helpers

Use dax_write_cache() and dax_write_cache_enabled() instead of open coding
the bit operations.

Signed-off-by: Ross Zwisler <[email protected]>
---
drivers/dax/super.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 2b2332b605e4..c2c46f96b18c 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -182,8 +182,7 @@ static ssize_t write_cache_show(struct device *dev,
if (!dax_dev)
return -ENXIO;

- rc = sprintf(buf, "%d\n", !!test_bit(DAXDEV_WRITE_CACHE,
- &dax_dev->flags));
+ rc = sprintf(buf, "%d\n", !!dax_write_cache_enabled(dax_dev));
put_dax(dax_dev);
return rc;
}
@@ -201,10 +200,8 @@ static ssize_t write_cache_store(struct device *dev,

if (rc)
len = rc;
- else if (write_cache)
- set_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags);
else
- clear_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags);
+ dax_write_cache(dax_dev, write_cache);

put_dax(dax_dev);
return len;
@@ -286,7 +283,7 @@ EXPORT_SYMBOL_GPL(dax_copy_from_iter);
void arch_wb_cache_pmem(void *addr, size_t size);
void dax_flush(struct dax_device *dax_dev, void *addr, size_t size)
{
- if (unlikely(!test_bit(DAXDEV_WRITE_CACHE, &dax_dev->flags)))
+ if (unlikely(!dax_write_cache_enabled(dax_dev)))
return;

arch_wb_cache_pmem(addr, size);
--
2.14.4

2018-06-06 18:27:50

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v3 2/4] libnvdimm: unconditionally deep flush on *sync

Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
write to each of the flush hints for a region) in response to an
msync/fsync/sync call if the nvdimm_has_cache() returned true at the time
we were setting up the request queue. This happens due to the write cache
value passed in to blk_queue_write_cache(), which then causes the block
layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set. We do have a
"write_cache" sysfs entry for namespaces, i.e.:

/sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache

which can be used to control whether or not the kernel thinks a given
namespace has a write cache, but this didn't modify the deep flush behavior
that we set up when the driver was initialized. Instead, it only modified
whether or not DAX would flush CPU caches via dax_flush() in response to
*sync calls.

Simplify this by making the *sync deep flush always happen, regardless of
the write cache setting of a namespace. The DAX CPU cache flushing will
still be controlled the write_cache setting of the namespace.

Signed-off-by: Ross Zwisler <[email protected]>
Suggested-by: Dan Williams <[email protected]>
---
drivers/nvdimm/pmem.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 252adfab1e47..97b4c39a9267 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -294,7 +294,7 @@ static int pmem_attach_disk(struct device *dev,
{
struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
struct nd_region *nd_region = to_nd_region(dev->parent);
- int nid = dev_to_node(dev), fua, wbc;
+ int nid = dev_to_node(dev), fua;
struct resource *res = &nsio->res;
struct resource bb_res;
struct nd_pfn *nd_pfn = NULL;
@@ -330,7 +330,6 @@ static int pmem_attach_disk(struct device *dev,
dev_warn(dev, "unable to guarantee persistence of writes\n");
fua = 0;
}
- wbc = nvdimm_has_cache(nd_region);

if (!devm_request_mem_region(dev, res->start, resource_size(res),
dev_name(&ndns->dev))) {
@@ -377,7 +376,7 @@ static int pmem_attach_disk(struct device *dev,
return PTR_ERR(addr);
pmem->virt_addr = addr;

- blk_queue_write_cache(q, wbc, fua);
+ blk_queue_write_cache(q, true, fua);
blk_queue_make_request(q, pmem_make_request);
blk_queue_physical_block_size(q, PAGE_SIZE);
blk_queue_logical_block_size(q, pmem_sector_size(ndns));
@@ -408,7 +407,7 @@ static int pmem_attach_disk(struct device *dev,
put_disk(disk);
return -ENOMEM;
}
- dax_write_cache(dax_dev, wbc);
+ dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
pmem->dax_dev = dax_dev;

gendev = disk_to_dev(disk);
--
2.14.4

2018-06-06 18:27:51

by Ross Zwisler

[permalink] [raw]

Subject: [PATCH v3 4/4] libnvdimm: don't flush power-fail protected CPU caches

This commit:

5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")

intended to make sure that deep flush was always available even on
platforms which support a power-fail protected CPU cache. An unintended
side effect of this change was that we also lost the ability to skip
flushing CPU caches on those power-fail protected CPU cache.

Fix this by skipping the low level cache flushing in dax_flush() if we have
CPU caches which are power-fail protected. The user can still override this
behavior by manually setting the write_cache state of a namespace. See
libndctl's ndctl_namespace_write_cache_is_enabled(),
ndctl_namespace_enable_write_cache() and
ndctl_namespace_disable_write_cache() functions.

Signed-off-by: Ross Zwisler <[email protected]>
Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices via fsync()")
---
drivers/nvdimm/region_devs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index a612be6f019d..ec3543b83330 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1132,7 +1132,8 @@ EXPORT_SYMBOL_GPL(nvdimm_has_flush);

int nvdimm_has_cache(struct nd_region *nd_region)
{
- return is_nd_pmem(&nd_region->dev);
+ return is_nd_pmem(&nd_region->dev) &&
+ !test_bit(ND_REGION_PERSIST_CACHE, &nd_region->flags);
}
EXPORT_SYMBOL_GPL(nvdimm_has_cache);

--
2.14.4

2018-06-06 19:06:49

by Dan Williams

[permalink] [raw]

Subject: Re: [PATCH v3 1/4] pmem, libnvdimm: complete REQ_FLUSH => REQ_PREFLUSH

On Wed, Jun 6, 2018 at 9:45 AM, Ross Zwisler
<[email protected]> wrote:
> Complete the move from REQ_FLUSH to REQ_PREFLUSH that apparently started
> way back in v4.8.
>
> Signed-off-by: Ross Zwisler <[email protected]>

Yup, long overdue. Applied for 4.18.

2018-06-06 19:09:11

by Dan Williams

[permalink] [raw]

Subject: Re: [PATCH v3 2/4] libnvdimm: unconditionally deep flush on *sync

On Wed, Jun 6, 2018 at 9:45 AM, Ross Zwisler
<[email protected]> wrote:
> Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
> write to each of the flush hints for a region) in response to an
> msync/fsync/sync call if the nvdimm_has_cache() returned true at the time
> we were setting up the request queue. This happens due to the write cache
> value passed in to blk_queue_write_cache(), which then causes the block
> layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set. We do have a
> "write_cache" sysfs entry for namespaces, i.e.:
>
> /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
>
> which can be used to control whether or not the kernel thinks a given
> namespace has a write cache, but this didn't modify the deep flush behavior
> that we set up when the driver was initialized. Instead, it only modified
> whether or not DAX would flush CPU caches via dax_flush() in response to
> *sync calls.
>
> Simplify this by making the *sync deep flush always happen, regardless of
> the write cache setting of a namespace. The DAX CPU cache flushing will
> still be controlled the write_cache setting of the namespace.
>
> Signed-off-by: Ross Zwisler <[email protected]>
> Suggested-by: Dan Williams <[email protected]>

Looks, good. I believe we want this one and ["PATCH v3 4/4] libnvdimm:
don't flush power-fail protected CPU caches" marked for -stable and
tagged with:

Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices
via fsync()")

...any concerns with that?

2018-06-06 19:25:11

by Dan Williams

[permalink] [raw]

Subject: Re: [PATCH v3 2/4] libnvdimm: unconditionally deep flush on *sync

On Wed, Jun 6, 2018 at 11:16 AM, Ross Zwisler
<[email protected]> wrote:
> On Wed, Jun 06, 2018 at 10:57:59AM -0700, Dan Williams wrote:
>> On Wed, Jun 6, 2018 at 9:45 AM, Ross Zwisler
>> <[email protected]> wrote:
>> > Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
>> > write to each of the flush hints for a region) in response to an
>> > msync/fsync/sync call if the nvdimm_has_cache() returned true at the time
>> > we were setting up the request queue. This happens due to the write cache
>> > value passed in to blk_queue_write_cache(), which then causes the block
>> > layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set. We do have a
>> > "write_cache" sysfs entry for namespaces, i.e.:
>> >
>> > /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
>> >
>> > which can be used to control whether or not the kernel thinks a given
>> > namespace has a write cache, but this didn't modify the deep flush behavior
>> > that we set up when the driver was initialized. Instead, it only modified
>> > whether or not DAX would flush CPU caches via dax_flush() in response to
>> > *sync calls.
>> >
>> > Simplify this by making the *sync deep flush always happen, regardless of
>> > the write cache setting of a namespace. The DAX CPU cache flushing will
>> > still be controlled the write_cache setting of the namespace.
>> >
>> > Signed-off-by: Ross Zwisler <[email protected]>
>> > Suggested-by: Dan Williams <[email protected]>
>>
>> Looks, good. I believe we want this one and ["PATCH v3 4/4] libnvdimm:
>> don't flush power-fail protected CPU caches" marked for -stable and
>> tagged with:
>>
>> Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices
>> via fsync()")
>>
>> ...any concerns with that?
>
> Nope, sounds good. Can you fix that up when you apply, or would it be helpful
> for me to send another revision with those tags?

I'll fix it up, thanks Ross.

2018-06-06 20:08:09

by Ross Zwisler

[permalink] [raw]

Subject: Re: [PATCH v3 2/4] libnvdimm: unconditionally deep flush on *sync

On Wed, Jun 06, 2018 at 10:57:59AM -0700, Dan Williams wrote:
> On Wed, Jun 6, 2018 at 9:45 AM, Ross Zwisler
> <[email protected]> wrote:
> > Prior to this commit we would only do a "deep flush" (have nvdimm_flush()
> > write to each of the flush hints for a region) in response to an
> > msync/fsync/sync call if the nvdimm_has_cache() returned true at the time
> > we were setting up the request queue. This happens due to the write cache
> > value passed in to blk_queue_write_cache(), which then causes the block
> > layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set. We do have a
> > "write_cache" sysfs entry for namespaces, i.e.:
> >
> > /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache
> >
> > which can be used to control whether or not the kernel thinks a given
> > namespace has a write cache, but this didn't modify the deep flush behavior
> > that we set up when the driver was initialized. Instead, it only modified
> > whether or not DAX would flush CPU caches via dax_flush() in response to
> > *sync calls.
> >
> > Simplify this by making the *sync deep flush always happen, regardless of
> > the write cache setting of a namespace. The DAX CPU cache flushing will
> > still be controlled the write_cache setting of the namespace.
> >
> > Signed-off-by: Ross Zwisler <[email protected]>
> > Suggested-by: Dan Williams <[email protected]>
>
> Looks, good. I believe we want this one and ["PATCH v3 4/4] libnvdimm:
> don't flush power-fail protected CPU caches" marked for -stable and
> tagged with:
>
> Fixes: 5fdf8e5ba566 ("libnvdimm: re-enable deep flush for pmem devices
> via fsync()")
>
> ...any concerns with that?

Nope, sounds good. Can you fix that up when you apply, or would it be helpful
for me to send another revision with those tags?