Introduce dax device flag DAXDEV_RECOVERY to indicate a device
that is capable of recoverying from media poison.
For MD raid DAX devices, the capability is allowed for partial
device as oppose to the entire device. And the final poison
detection and repair rely on the provisioning base drivers.
Signed-off-by: Jane Chu <[email protected]>
---
drivers/dax/super.c | 24 ++++++++++++++++++++++++
drivers/nvdimm/pmem.c | 1 +
include/linux/dax.h | 14 ++++++++++++++
3 files changed, 39 insertions(+)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index e3029389d809..f4f607d9698b 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -109,6 +109,8 @@ enum dax_device_flags {
DAXDEV_NOCACHE,
/* handle CPU fetch exceptions during reads */
DAXDEV_NOMC,
+ /* flag to indicate device capable of poison recovery */
+ DAXDEV_RECOVERY,
};
/**
@@ -311,6 +313,28 @@ static void dax_destroy_inode(struct inode *inode)
"kill_dax() must be called before final iput()\n");
}
+void set_dax_recovery(struct dax_device *dax_dev)
+{
+ set_bit(DAXDEV_RECOVERY, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(set_dax_recovery);
+
+bool dax_recovery_capable(struct dax_device *dax_dev)
+{
+ return test_bit(DAXDEV_RECOVERY, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_recovery_capable);
+
+int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
+{
+ if (dax_recovery_capable(dax_dev)) {
+ set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
+ return 0;
+ }
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(dax_prep_recovery);
+
static const struct super_operations dax_sops = {
.statfs = simple_statfs,
.alloc_inode = dax_alloc_inode,
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 58d95242a836..e8823813a8df 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -487,6 +487,7 @@ static int pmem_attach_disk(struct device *dev,
if (rc)
goto out_cleanup_dax;
dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
+ set_dax_recovery(dax_dev);
pmem->dax_dev = dax_dev;
rc = device_add_disk(dev, disk, pmem_attribute_groups);
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 9fc5f99a0ae2..2fc776653c6e 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -40,6 +40,9 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc);
bool dax_write_cache_enabled(struct dax_device *dax_dev);
bool dax_synchronous(struct dax_device *dax_dev);
void set_dax_synchronous(struct dax_device *dax_dev);
+void set_dax_recovery(struct dax_device *dax_dev);
+bool dax_recovery_capable(struct dax_device *dax_dev);
+int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr);
/*
* Check if given mapping is supported by the file / underlying device.
*/
@@ -87,6 +90,17 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
{
return !(vma->vm_flags & VM_SYNC);
}
+static inline void set_dax_recovery(struct dax_device *dax_dev)
+{
+}
+static inline bool dax_recovery_capable(struct dax_device *dax_dev)
+{
+ return false;
+}
+static inline int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
+{
+ return -EINVAL;
+}
#endif
void set_dax_nocache(struct dax_device *dax_dev);
--
2.18.4
On 2/2/2022 5:23 AM, Christoph Hellwig wrote:
> On Fri, Jan 28, 2022 at 02:31:45PM -0700, Jane Chu wrote:
>> +int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
>> +{
>> + if (dax_recovery_capable(dax_dev)) {
>> + set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
>> + return 0;
>> + }
>> + return -EINVAL;
>
> Setting a random bit on a passed in memory address looks a little
> dangerous to me.
Yeah, I see. Would you suggest a way to pass the indication from
dax_iomap_iter to dax_direct_access that the caller intends the
callee to ignore poison in the range because the caller intends
to do recovery_write? We tried adding a flag to dax_direct_access, and
that wasn't liked if I recall.
>
> Also I'd return early for the EINVAL case to make the flow a little
> more clear.
Agreed, will do.
thanks!
-jane
On Fri, Jan 28, 2022 at 02:31:45PM -0700, Jane Chu wrote:
> +int dax_prep_recovery(struct dax_device *dax_dev, void **kaddr)
> +{
> + if (dax_recovery_capable(dax_dev)) {
> + set_bit(DAXDEV_RECOVERY, (unsigned long *)kaddr);
> + return 0;
> + }
> + return -EINVAL;
Setting a random bit on a passed in memory address looks a little
dangerous to me.
Also I'd return early for the EINVAL case to make the flow a little
more clear.
On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> Yeah, I see. Would you suggest a way to pass the indication from
> dax_iomap_iter to dax_direct_access that the caller intends the
> callee to ignore poison in the range because the caller intends
> to do recovery_write? We tried adding a flag to dax_direct_access, and
> that wasn't liked if I recall.
To me a flag seems cleaner than this magic, but let's wait for Dan to
chime in.
On 2/3/2022 9:32 PM, Dan Williams wrote:
> On Thu, Feb 3, 2022 at 9:17 PM Dan Williams <[email protected]> wrote:
>>
>> On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <[email protected]> wrote:
>>>
>>> On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
>>>> Yeah, I see. Would you suggest a way to pass the indication from
>>>> dax_iomap_iter to dax_direct_access that the caller intends the
>>>> callee to ignore poison in the range because the caller intends
>>>> to do recovery_write? We tried adding a flag to dax_direct_access, and
>>>> that wasn't liked if I recall.
>>>
>>> To me a flag seems cleaner than this magic, but let's wait for Dan to
>>> chime in.
>>
>> So back in November I suggested modifying the kaddr, mainly to avoid
>> touching all the dax_direct_access() call sites [1]. However, now
>> seeing the code and Chrisoph's comment I think this either wants type
>> safety (e.g. 'dax_addr_t *'), or just add a new flag. Given both of
>> those options involve touching all dax_direct_access() call sites and
>> a @flags operation is more extensible if any other scenarios arrive
>> lets go ahead and plumb a flag and skip the magic.
>
> Just to be clear we are talking about a flow like:
>
> flags = 0;
> rc = dax_direct_access(..., &kaddr, flags, ...);
> if (unlikely(rc)) {
> flags |= DAX_RECOVERY;
> dax_direct_access(..., &kaddr, flags, ...);
> return dax_recovery_{read,write}(..., kaddr, ...);
> }
> return copy_{mc_to_iter,from_iter_flushcache}(...);
Okay, will go with a flag.
thanks!
-jane
On Thu, Feb 3, 2022 at 9:17 PM Dan Williams <[email protected]> wrote:
>
> On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <[email protected]> wrote:
> >
> > On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> > > Yeah, I see. Would you suggest a way to pass the indication from
> > > dax_iomap_iter to dax_direct_access that the caller intends the
> > > callee to ignore poison in the range because the caller intends
> > > to do recovery_write? We tried adding a flag to dax_direct_access, and
> > > that wasn't liked if I recall.
> >
> > To me a flag seems cleaner than this magic, but let's wait for Dan to
> > chime in.
>
> So back in November I suggested modifying the kaddr, mainly to avoid
> touching all the dax_direct_access() call sites [1]. However, now
> seeing the code and Chrisoph's comment I think this either wants type
> safety (e.g. 'dax_addr_t *'), or just add a new flag. Given both of
> those options involve touching all dax_direct_access() call sites and
> a @flags operation is more extensible if any other scenarios arrive
> lets go ahead and plumb a flag and skip the magic.
Just to be clear we are talking about a flow like:
flags = 0;
rc = dax_direct_access(..., &kaddr, flags, ...);
if (unlikely(rc)) {
flags |= DAX_RECOVERY;
dax_direct_access(..., &kaddr, flags, ...);
return dax_recovery_{read,write}(..., kaddr, ...);
}
return copy_{mc_to_iter,from_iter_flushcache}(...);
On Thu, Feb 3, 2022 at 5:43 AM Christoph Hellwig <[email protected]> wrote:
>
> On Wed, Feb 02, 2022 at 09:27:42PM +0000, Jane Chu wrote:
> > Yeah, I see. Would you suggest a way to pass the indication from
> > dax_iomap_iter to dax_direct_access that the caller intends the
> > callee to ignore poison in the range because the caller intends
> > to do recovery_write? We tried adding a flag to dax_direct_access, and
> > that wasn't liked if I recall.
>
> To me a flag seems cleaner than this magic, but let's wait for Dan to
> chime in.
So back in November I suggested modifying the kaddr, mainly to avoid
touching all the dax_direct_access() call sites [1]. However, now
seeing the code and Chrisoph's comment I think this either wants type
safety (e.g. 'dax_addr_t *'), or just add a new flag. Given both of
those options involve touching all dax_direct_access() call sites and
a @flags operation is more extensible if any other scenarios arrive
lets go ahead and plumb a flag and skip the magic.