The ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
of the error range actually cleared, which may be smaller than the
requested 'len'.
Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
'clear_err.cleared' when this value is valid.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Vishal Verma <[email protected]>
---
Based on 'libnvdimm-for-next'.
---
drivers/nvdimm/bus.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index d214ac44..43ddfd4 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
if (cmd_rc < 0)
return cmd_rc;
- nvdimm_forget_poison(nvdimm_bus, phys, len);
+ if (clear_err.cleared > 0)
+ nvdimm_forget_poison(nvdimm_bus, phys, clear_err.cleared);
+
return clear_err.cleared;
}
EXPORT_SYMBOL_GPL(nvdimm_clear_poison);
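
The effect of the change above can be modeled outside the kernel. What follows is a minimal user-space sketch, not the libnvdimm code: `forget_range` and `range_to_forget()` are hypothetical names introduced for illustration, and `cleared` is modeled as an unsigned 64-bit byte count.

```c
#include <stdint.h>

/* Hypothetical model of the byte range passed to nvdimm_forget_poison(). */
struct forget_range {
	uint64_t start;	/* first physical address to drop from the poison list */
	uint64_t len;	/* number of bytes to drop; 0 means "drop nothing" */
};

/*
 * Mirror the patched logic: only the bytes reported as cleared are
 * forgotten, never the full requested length.
 */
static struct forget_range range_to_forget(uint64_t phys, uint64_t cleared)
{
	struct forget_range r = { .start = phys, .len = 0 };

	if (cleared > 0)
		r.len = cleared;
	return r;
}
```

If a caller asks for 512 bytes but only 256 are reported as cleared, the model forgets 256 bytes, so the rest of the range stays on the poison list.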
Badblocks are tracked at both region and device levels.
pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
and then badblocks_clear() to clear badblocks at the device level.
However, this path does not update badblocks at the region level, which
makes the two levels inconsistent.
Change nvdimm_clear_poison() to update badblocks at the region
level to keep them consistent.
Signed-off-by: Toshi Kani <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Vishal Verma <[email protected]>
---
Based on 'libnvdimm-for-next'.
---
drivers/nvdimm/bus.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 43ddfd4..998332d 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -179,6 +179,7 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
struct nvdimm_bus_descriptor *nd_desc;
struct nd_cmd_clear_error clear_err;
struct nd_cmd_ars_cap ars_cap;
+ struct resource res;
u32 clear_err_unit, mask;
int cmd_rc, rc;
@@ -222,6 +223,14 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
if (clear_err.cleared > 0)
nvdimm_forget_poison(nvdimm_bus, phys, clear_err.cleared);
+ if (clear_err.cleared > 0 && clear_err.cleared / 512) {
+ nvdimm_bus_lock(&nvdimm_bus->dev);
+ res.start = phys;
+ res.end = phys + clear_err.cleared - 1;
+ __nvdimm_bus_badblocks_clear(nvdimm_bus, &res);
+ nvdimm_bus_unlock(&nvdimm_bus->dev);
+ }
+
return clear_err.cleared;
}
EXPORT_SYMBOL_GPL(nvdimm_clear_poison);
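
The range arithmetic in the hunk above can be sketched in user space as follows. This is an illustration under stated assumptions: `byte_range` is a hypothetical stand-in for the start/end fields of `struct resource`, `region_clear_range()` is not a kernel function, and locking is left out entirely.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical stand-in for the start/end fields of struct resource. */
struct byte_range {
	uint64_t start;
	uint64_t end;	/* inclusive, as in the patch */
};

/*
 * Fill *res with the range to clear at the region level.
 * Returns false when fewer than 512 bytes were cleared, matching the
 * "clear_err.cleared / 512" check: badblocks are tracked in 512-byte
 * sectors, so a smaller clear covers no whole sector.
 */
static bool region_clear_range(uint64_t phys, uint64_t cleared,
			       struct byte_range *res)
{
	if (cleared / SECTOR_SIZE == 0)
		return false;

	res->start = phys;
	res->end = phys + cleared - 1;
	return true;
}
```

Because the end address is inclusive, clearing 512 bytes at 0x1000 gives the range 0x1000 through 0x11ff.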
On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
> ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
> of error actually cleared, which may be smaller than its requested
> 'len'.
>
> Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
> 'clear_err.cleared' when this value is valid.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Vishal Verma <[email protected]>
> ---
> Based on 'libnvdimm-for-next'.
> ---
> drivers/nvdimm/bus.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index d214ac44..43ddfd4 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
> if (cmd_rc < 0)
> return cmd_rc;
>
> - nvdimm_forget_poison(nvdimm_bus, phys, len);
> + if (clear_err.cleared > 0)
> + nvdimm_forget_poison(nvdimm_bus, phys, clear_err.cleared);
> +
> return clear_err.cleared;
Looks good. We need to mark this for -stable since the bug is also
present in current mainline.
Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list when clearing badblocks")
On Fri, 2017-04-28 at 14:48 -0700, Dan Williams wrote:
> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]>
> wrote:
> > ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
> > of error actually cleared, which may be smaller than its requested
> > 'len'.
> >
> > Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
> > 'clear_err.cleared' when this value is valid.
> >
> > Signed-off-by: Toshi Kani <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Dave Jiang <[email protected]>
> > Cc: Vishal Verma <[email protected]>
> > ---
> > Based on 'libnvdimm-for-next'.
> > ---
> > drivers/nvdimm/bus.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> > index d214ac44..43ddfd4 100644
> > --- a/drivers/nvdimm/bus.c
> > +++ b/drivers/nvdimm/bus.c
> > @@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev,
> > phys_addr_t phys,
> > if (cmd_rc < 0)
> > return cmd_rc;
> >
> > - nvdimm_forget_poison(nvdimm_bus, phys, len);
> > + if (clear_err.cleared > 0)
> > + nvdimm_forget_poison(nvdimm_bus, phys,
> > clear_err.cleared);
> > +
> > return clear_err.cleared;
>
> Looks good. We need to mark this for -stable since the bug is also
> present in current mainline.
>
> Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list when
> clearing badblocks")
Shall I send a patch based on the current mainline with cc to -stable?
The func name is nvdimm_clear_from_poison_list() in the mainline.
Thanks,
-Toshi
On Fri, Apr 28, 2017 at 3:29 PM, Kani, Toshimitsu <[email protected]> wrote:
> On Fri, 2017-04-28 at 14:48 -0700, Dan Williams wrote:
>> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]>
>> wrote:
>> > ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
>> > of error actually cleared, which may be smaller than its requested
>> > 'len'.
>> >
>> > Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
>> > 'clear_err.cleared' when this value is valid.
>> >
>> > Signed-off-by: Toshi Kani <[email protected]>
>> > Cc: Dan Williams <[email protected]>
>> > Cc: Dave Jiang <[email protected]>
>> > Cc: Vishal Verma <[email protected]>
>> > ---
>> > Based on 'libnvdimm-for-next'.
>> > ---
>> > drivers/nvdimm/bus.c | 4 +++-
>> > 1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>> > index d214ac44..43ddfd4 100644
>> > --- a/drivers/nvdimm/bus.c
>> > +++ b/drivers/nvdimm/bus.c
>> > @@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev,
>> > phys_addr_t phys,
>> > if (cmd_rc < 0)
>> > return cmd_rc;
>> >
>> > - nvdimm_forget_poison(nvdimm_bus, phys, len);
>> > + if (clear_err.cleared > 0)
>> > + nvdimm_forget_poison(nvdimm_bus, phys,
>> > clear_err.cleared);
>> > +
>> > return clear_err.cleared;
>>
>> Looks good. We need to mark this for -stable since the bug is also
>> present in current mainline.
>>
>> Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list when
>> clearing badblocks")
>
> Shall I send a patch based on the current mainline with cc to -stable?
> The func name is nvdimm_clear_from_poison_list() in the mainline.
I think it's too late to get a fix into 4.11, especially since this
went in broken and is not a regression. I'll just tag this for -stable
and handle the backport manually.
On Fri, 2017-04-28 at 15:39 -0700, Dan Williams wrote:
> On Fri, Apr 28, 2017 at 3:29 PM, Kani, Toshimitsu <[email protected]
> > wrote:
> > On Fri, 2017-04-28 at 14:48 -0700, Dan Williams wrote:
> > > On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]>
:
> > > >
> > > > - nvdimm_forget_poison(nvdimm_bus, phys, len);
> > > > + if (clear_err.cleared > 0)
> > > > + nvdimm_forget_poison(nvdimm_bus, phys,
> > > > clear_err.cleared);
> > > > +
> > > > return clear_err.cleared;
> > >
> > > Looks good. We need to mark this for -stable since the bug is
> > > also present in current mainline.
> > >
> > > Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list
> > > when clearing badblocks")
> >
> > Shall I send a patch based on the current mainline with cc to
> > -stable? The func name is nvdimm_clear_from_poison_list() in the
> > mainline.
>
> I think it's too late to get a fix into 4.11, especially since this
> went in broken and is not a regression. I'll just tag this for -stable
> and handle the backport manually.
Sounds great. Thanks Dan!
-Toshi
On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
> Badblocks are tracked at both region and device levels.
> pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
> and then badblocks_clear() to clear badblocks at the device level.
> However, it does not update badblocks at the region level, which
> makes them inconsistent.
>
> Change nvdimm_clear_poison() to update badblocks at the region
> level to keep them consistent.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Vishal Verma <[email protected]>
This looks good, and it seems we have a bug in the other location that
does this in __nd_ioctl(). That other one is missing the
"clear_err.cleared / 512" check. Can you respin this and define a
common helper that both locations can call?
On Fri, Apr 28, 2017 at 5:10 PM, Dan Williams <[email protected]> wrote:
> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
>> Badblocks are tracked at both region and device levels.
>> pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
>> and then badblocks_clear() to clear badblocks at the device level.
>> However, it does not update badblocks at the region level, which
>> makes them inconsistent.
>>
>> Change nvdimm_clear_poison() to update badblocks at the region
>> level to keep them consistent.
>>
>> Signed-off-by: Toshi Kani <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Dave Jiang <[email protected]>
>> Cc: Vishal Verma <[email protected]>
>
> This looks good, and it seems we have a bug in the other location that
> does this in __nd_ioctl(). That other one is missing the
> "clear_err.cleared / 512" check. Can you respin this and define a
> common helper that both locations can call?
On second thought, I'll take this and spin my own cleanup / fix on top.
Thanks Toshi!
On Fri, Apr 28, 2017 at 5:12 PM, Dan Williams <[email protected]> wrote:
> On Fri, Apr 28, 2017 at 5:10 PM, Dan Williams <[email protected]> wrote:
>> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
>>> Badblocks are tracked at both region and device levels.
>>> pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
>>> and then badblocks_clear() to clear badblocks at the device level.
>>> However, it does not update badblocks at the region level, which
>>> makes them inconsistent.
>>>
>>> Change nvdimm_clear_poison() to update badblocks at the region
>>> level to keep them consistent.
>>>
>>> Signed-off-by: Toshi Kani <[email protected]>
>>> Cc: Dan Williams <[email protected]>
>>> Cc: Dave Jiang <[email protected]>
>>> Cc: Vishal Verma <[email protected]>
>>
>> This looks good, and it seems we have a bug in the other location that
>> does this in __nd_ioctl(). That other one is missing the
>> "clear_err.cleared / 512" check. Can you respin this and define a
>> common helper that both locations can call?
>
> On second thought, I'll take this and spin my own cleanup / fix on top.
>
> Thanks Toshi!
...and I need to drop it again because it causes this:
[ 106.974889] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[ 106.977328] in_atomic(): 1, irqs_disabled(): 0, pid: 5584, name: dd
[ 106.978845] 1 lock held by dd/5584:
[ 106.979923] #0: (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff812f4937>] __blkdev_put+0x47/0x370
[ 106.982221] CPU: 29 PID: 5584 Comm: dd Tainted: G O 4.11.0-rc4+ #105
[ 106.984329] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 106.986638] Call Trace:
[ 106.987530] dump_stack+0x86/0xc3
[ 106.988555] ___might_sleep+0x17d/0x250
[ 106.989650] __might_sleep+0x4a/0x80
[ 106.990718] __mutex_lock+0x58/0x980
[ 106.991788] ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
[ 106.993059] ? _raw_spin_unlock+0x27/0x40
[ 106.994181] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 106.995430] mutex_lock_nested+0x1b/0x20
[ 106.996552] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
[ 106.997804] nvdimm_clear_poison+0x11a/0x150 [libnvdimm]
[ 106.999138] nsio_rw_bytes+0x18f/0x280 [libnvdimm]
[ 107.000390] btt_write_pg+0x1d4/0x3c0 [nd_btt]