2017-04-27 22:57:49

by Kani, Toshimitsu

Subject: [PATCH 1/2] libnvdimm: fix clear length of nvdimm_forget_poison()

ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
of error actually cleared, which may be smaller than its requested
'len'.

Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
'clear_err.cleared' when this value is valid.

Signed-off-by: Toshi Kani <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Vishal Verma <[email protected]>
---
Based on 'libnvdimm-for-next'.
---
drivers/nvdimm/bus.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index d214ac44..43ddfd4 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
        if (cmd_rc < 0)
                return cmd_rc;

-       nvdimm_forget_poison(nvdimm_bus, phys, len);
+       if (clear_err.cleared > 0)
+               nvdimm_forget_poison(nvdimm_bus, phys, clear_err.cleared);
+
        return clear_err.cleared;
 }
 EXPORT_SYMBOL_GPL(nvdimm_clear_poison);
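
As a rough user-space illustration of why the forgotten length matters, the sketch below models a single tracked poison range and a clear operation that may clear less than requested. The names struct poison_range, forget_poison() and clear_poison() are hypothetical stand-ins, not the kernel's data structures, and the model only handles a clear that begins at the start of the range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one tracked poison range (not the kernel struct). */
struct poison_range {
    uint64_t start;   /* first poisoned byte */
    uint64_t length;  /* number of poisoned bytes, 0 means empty */
};

/*
 * Forget 'len' bytes starting at 'addr' from a single tracked range.
 * Only the simple case is handled: the cleared span begins at the
 * start of the range.
 */
static void forget_poison(struct poison_range *r, uint64_t addr, uint64_t len)
{
    if (len == 0 || r->length == 0 || addr != r->start)
        return;
    if (len >= r->length) {
        r->length = 0;
        return;
    }
    r->start += len;
    r->length -= len;
}

/*
 * Model of the fixed flow: 'cleared' is what the simulated firmware
 * reports, and only that much is dropped from the tracked range.
 */
static long clear_poison(struct poison_range *r, uint64_t addr,
                         uint64_t requested, uint64_t cleared)
{
    (void)requested;  /* the pre-fix code forgot this length instead */
    if (cleared > 0)
        forget_poison(r, addr, cleared);
    return (long)cleared;
}
```

With a 1024-byte range at 0x1000 and a clear that reports only 512 bytes, the remaining 512 bytes at 0x1200 stay tracked. Forgetting the requested length instead would drop poison that is still present.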


2017-04-27 22:57:58

by Kani, Toshimitsu

Subject: [PATCH 2/2] libnvdimm: clear region badblock in nvdimm_clear_poison()

Badblocks are tracked at both region and device levels.
pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
and then badblocks_clear() to clear badblocks at the device level.
However, it does not update badblocks at the region level, which
makes them inconsistent.

Change nvdimm_clear_poison() to update badblocks at the region
level to keep them consistent.

Signed-off-by: Toshi Kani <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Jiang <[email protected]>
Cc: Vishal Verma <[email protected]>
---
Based on 'libnvdimm-for-next'.
---
drivers/nvdimm/bus.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 43ddfd4..998332d 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -179,6 +179,7 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
        struct nvdimm_bus_descriptor *nd_desc;
        struct nd_cmd_clear_error clear_err;
        struct nd_cmd_ars_cap ars_cap;
+       struct resource res;
        u32 clear_err_unit, mask;
        int cmd_rc, rc;

@@ -222,6 +223,14 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
        if (clear_err.cleared > 0)
                nvdimm_forget_poison(nvdimm_bus, phys, clear_err.cleared);

+       if (clear_err.cleared > 0 && clear_err.cleared / 512) {
+               nvdimm_bus_lock(&nvdimm_bus->dev);
+               res.start = phys;
+               res.end = phys + clear_err.cleared - 1;
+               __nvdimm_bus_badblocks_clear(nvdimm_bus, &res);
+               nvdimm_bus_unlock(&nvdimm_bus->dev);
+       }
+
        return clear_err.cleared;
 }
 EXPORT_SYMBOL_GPL(nvdimm_clear_poison);
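
The range arithmetic in the hunk above can be tried in isolation. The following is a minimal user-space sketch, assuming the cleared length is an unsigned byte count; struct byte_range and cleared_range() are hypothetical names that mirror the start/end pair filled in by the patch, and no locking is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical inclusive byte range, modeled on the start/end pair in the patch. */
struct byte_range {
    uint64_t start;
    uint64_t end;   /* inclusive */
};

/*
 * Fill 'res' with the span that was cleared and report whether it is
 * large enough to cover at least one 512-byte sector. 'res' is left
 * untouched when the cleared length is smaller than a sector.
 */
static bool cleared_range(uint64_t phys, uint64_t cleared, struct byte_range *res)
{
    if (cleared / SECTOR_SIZE == 0)
        return false;
    res->start = phys;
    res->end = phys + cleared - 1;
    return true;
}
```

For example, clearing 1024 bytes at 0x1000 yields the inclusive range 0x1000 to 0x13ff, while a 256-byte clear is skipped because it does not span a full sector.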

2017-04-28 21:48:11

by Dan Williams

Subject: Re: [PATCH 1/2] libnvdimm: fix clear length of nvdimm_forget_poison()

On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
> ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
> of error actually cleared, which may be smaller than its requested
> 'len'.
>
> Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
> 'clear_err.cleared' when this value is valid.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Vishal Verma <[email protected]>
> ---
> Based on 'libnvdimm-for-next'.
> ---
> drivers/nvdimm/bus.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> index d214ac44..43ddfd4 100644
> --- a/drivers/nvdimm/bus.c
> +++ b/drivers/nvdimm/bus.c
> @@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev, phys_addr_t phys,
> if (cmd_rc < 0)
> return cmd_rc;
>
> - nvdimm_forget_poison(nvdimm_bus, phys, len);
> + if (clear_err.cleared > 0)
> + nvdimm_forget_poison(nvdimm_bus, phys, clear_err.cleared);
> +
> return clear_err.cleared;

Looks good. We need to mark this for -stable since the bug is also
present in current mainline.

Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list when clearing badblocks")

2017-04-28 22:29:30

by Kani, Toshimitsu

Subject: Re: [PATCH 1/2] libnvdimm: fix clear length of nvdimm_forget_poison()

On Fri, 2017-04-28 at 14:48 -0700, Dan Williams wrote:
> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]>
> wrote:
> > ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
> > of error actually cleared, which may be smaller than its requested
> > 'len'.
> >
> > Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
> > 'clear_err.cleared' when this value is valid.
> >
> > Signed-off-by: Toshi Kani <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Dave Jiang <[email protected]>
> > Cc: Vishal Verma <[email protected]>
> > ---
> > Based on 'libnvdimm-for-next'.
> > ---
> >  drivers/nvdimm/bus.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
> > index d214ac44..43ddfd4 100644
> > --- a/drivers/nvdimm/bus.c
> > +++ b/drivers/nvdimm/bus.c
> > @@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev,
> > phys_addr_t phys,
> >         if (cmd_rc < 0)
> >                 return cmd_rc;
> >
> > -       nvdimm_forget_poison(nvdimm_bus, phys, len);
> > +       if (clear_err.cleared > 0)
> > +               nvdimm_forget_poison(nvdimm_bus, phys,
> > clear_err.cleared);
> > +
> >         return clear_err.cleared;
>
> Looks good. We need to mark this for -stable since the bug is also
> present in current mainline.
>
> Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list when
> clearing badblocks")

Shall I send a patch based on the current mainline with cc to -stable?
The func name is nvdimm_clear_from_poison_list() in the mainline.

Thanks,
-Toshi

2017-04-28 22:39:09

by Dan Williams

Subject: Re: [PATCH 1/2] libnvdimm: fix clear length of nvdimm_forget_poison()

On Fri, Apr 28, 2017 at 3:29 PM, Kani, Toshimitsu <[email protected]> wrote:
> On Fri, 2017-04-28 at 14:48 -0700, Dan Williams wrote:
>> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]>
>> wrote:
>> > ND_CMD_CLEAR_ERROR command returns 'clear_err.cleared', the length
>> > of error actually cleared, which may be smaller than its requested
>> > 'len'.
>> >
>> > Change nvdimm_clear_poison() to call nvdimm_forget_poison() with
>> > 'clear_err.cleared' when this value is valid.
>> >
>> > Signed-off-by: Toshi Kani <[email protected]>
>> > Cc: Dan Williams <[email protected]>
>> > Cc: Dave Jiang <[email protected]>
>> > Cc: Vishal Verma <[email protected]>
>> > ---
>> > Based on 'libnvdimm-for-next'.
>> > ---
>> > drivers/nvdimm/bus.c | 4 +++-
>> > 1 file changed, 3 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
>> > index d214ac44..43ddfd4 100644
>> > --- a/drivers/nvdimm/bus.c
>> > +++ b/drivers/nvdimm/bus.c
>> > @@ -219,7 +219,9 @@ long nvdimm_clear_poison(struct device *dev,
>> > phys_addr_t phys,
>> > if (cmd_rc < 0)
>> > return cmd_rc;
>> >
>> > - nvdimm_forget_poison(nvdimm_bus, phys, len);
>> > + if (clear_err.cleared > 0)
>> > + nvdimm_forget_poison(nvdimm_bus, phys,
>> > clear_err.cleared);
>> > +
>> > return clear_err.cleared;
>>
>> Looks good. We need to mark this for -stable since the bug is also
>> present in current mainline.
>>
>> Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list when
>> clearing badblocks")
>
> Shall I send a patch based on the current mainline with cc to -stable?
> The func name is nvdimm_clear_from_poison_list() in the mainline.

I think it's too late to get a fix into 4.11, especially since this
went in broken and is not a regression. I'll just tag this for -stable
and handle the backport manually.

2017-04-28 22:42:02

by Kani, Toshimitsu

Subject: Re: [PATCH 1/2] libnvdimm: fix clear length of nvdimm_forget_poison()

On Fri, 2017-04-28 at 15:39 -0700, Dan Williams wrote:
> On Fri, Apr 28, 2017 at 3:29 PM, Kani, Toshimitsu <[email protected]>
> wrote:
> > On Fri, 2017-04-28 at 14:48 -0700, Dan Williams wrote:
> > > On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]>
:
> > > >
> > > > -       nvdimm_forget_poison(nvdimm_bus, phys, len);
> > > > +       if (clear_err.cleared > 0)
> > > > +               nvdimm_forget_poison(nvdimm_bus, phys,
> > > > clear_err.cleared);
> > > > +
> > > >         return clear_err.cleared;
> > >
> > > Looks good. We need to mark this for -stable since the bug is
> > > also present in current mainline.
> > >
> > > Fixes: e046114af5fc ("libnvdimm: clear the internal poison_list
> > > when clearing badblocks")
> >
> > Shall I send a patch based on the current mainline with cc to
> > -stable? The func name is nvdimm_clear_from_poison_list() in the
> > mainline.
>
> I think it's too late to get a fix into 4.11, especially since this
> went in broken and is not a regression. I'll just tag this for -stable
> and handle the backport manually.

Sounds great. Thanks Dan!
-Toshi



2017-04-29 00:10:37

by Dan Williams

Subject: Re: [PATCH 2/2] libnvdimm: clear region badblock in nvdimm_clear_poison()

On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
> Badblocks are tracked at both region and device levels.
> pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
> and then badblocks_clear() to clear badblocks at the device level.
> However, it does not update badblocks at the region level, which
> makes them inconsistent.
>
> Change nvdimm_clear_poison() to update badblocks at the region
> level to keep them consistent.
>
> Signed-off-by: Toshi Kani <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Cc: Vishal Verma <[email protected]>

This looks good, and it seems we have a bug in the other location that
does this in __nd_ioctl(). That other one is missing the
"clear_err.cleared / 512" check. Can you respin this and define a
common helper that both locations can call?
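
One possible shape for such a shared helper is sketched below as a user-space model. It is an assumption for illustration only, not the cleanup that was eventually written: struct region_badblocks, clear_region_badblocks() and the two caller stand-ins are hypothetical, and the tracker is reduced to a plain sector count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region-level tracker: only counts sectors still marked bad. */
struct region_badblocks {
    uint64_t bad_sectors;
};

/*
 * Hypothetical shared helper. Both call sites pass the cleared byte
 * count here, so the "at least one full sector" check lives in one
 * place. Returns the number of whole sectors removed from the tracker.
 */
static uint64_t clear_region_badblocks(struct region_badblocks *bb, uint64_t cleared)
{
    uint64_t sectors = cleared / 512;

    if (sectors == 0)
        return 0;               /* less than one sector: nothing to update */
    if (sectors > bb->bad_sectors)
        sectors = bb->bad_sectors;
    bb->bad_sectors -= sectors;
    return sectors;
}

/* Stand-in for the nvdimm_clear_poison() call site. */
static uint64_t clear_poison_path(struct region_badblocks *bb, uint64_t cleared)
{
    return clear_region_badblocks(bb, cleared);
}

/* Stand-in for the __nd_ioctl() call site. */
static uint64_t ioctl_path(struct region_badblocks *bb, uint64_t cleared)
{
    return clear_region_badblocks(bb, cleared);
}
```

Because the sector check sits inside the helper, neither caller can omit it, which is the inconsistency described above.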

2017-04-29 00:12:41

by Dan Williams

Subject: Re: [PATCH 2/2] libnvdimm: clear region badblock in nvdimm_clear_poison()

On Fri, Apr 28, 2017 at 5:10 PM, Dan Williams <[email protected]> wrote:
> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
>> Badblocks are tracked at both region and device levels.
>> pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
>> and then badblocks_clear() to clear badblocks at the device level.
>> However, it does not update badblocks at the region level, which
>> makes them inconsistent.
>>
>> Change nvdimm_clear_poison() to update badblocks at the region
>> level to keep them consistent.
>>
>> Signed-off-by: Toshi Kani <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Dave Jiang <[email protected]>
>> Cc: Vishal Verma <[email protected]>
>
> This looks good, and it seems we have a bug in the other location that
> does this in __nd_ioctl(). That other one is missing the
> "clear_err.cleared / 512" check. Can you respin this and define a
> common helper that both locations can call?

On second thought, I'll take this and spin my own cleanup / fix on top.

Thanks Toshi!

2017-04-29 00:36:09

by Dan Williams

Subject: Re: [PATCH 2/2] libnvdimm: clear region badblock in nvdimm_clear_poison()

On Fri, Apr 28, 2017 at 5:12 PM, Dan Williams <[email protected]> wrote:
> On Fri, Apr 28, 2017 at 5:10 PM, Dan Williams <[email protected]> wrote:
>> On Thu, Apr 27, 2017 at 3:57 PM, Toshi Kani <[email protected]> wrote:
>>> Badblocks are tracked at both region and device levels.
>>> pmem_clear_poison() and nsio_rw_bytes() call nvdimm_clear_poison()
>>> and then badblocks_clear() to clear badblocks at the device level.
>>> However, it does not update badblocks at the region level, which
>>> makes them inconsistent.
>>>
>>> Change nvdimm_clear_poison() to update badblocks at the region
>>> level to keep them consistent.
>>>
>>> Signed-off-by: Toshi Kani <[email protected]>
>>> Cc: Dan Williams <[email protected]>
>>> Cc: Dave Jiang <[email protected]>
>>> Cc: Vishal Verma <[email protected]>
>>
>> This looks good, and it seems we have a bug in the other location that
>> does this in __nd_ioctl(). That other one is missing the
>> "clear_err.cleared / 512" check. Can you respin this and define a
>> common helper that both locations can call?
>
> On second thought, I'll take this and spin my own cleanup / fix on top.
>
> Thanks Toshi!

...and I need to drop it again because it causes this:

[ 106.974889] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:747
[ 106.977328] in_atomic(): 1, irqs_disabled(): 0, pid: 5584, name: dd
[ 106.978845] 1 lock held by dd/5584:
[ 106.979923] #0: (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff812f4937>] __blkdev_put+0x47/0x370
[ 106.982221] CPU: 29 PID: 5584 Comm: dd Tainted: G O 4.11.0-rc4+ #105
[ 106.984329] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 106.986638] Call Trace:
[ 106.987530] dump_stack+0x86/0xc3
[ 106.988555] ___might_sleep+0x17d/0x250
[ 106.989650] __might_sleep+0x4a/0x80
[ 106.990718] __mutex_lock+0x58/0x980
[ 106.991788] ? nvdimm_bus_lock+0x21/0x30 [libnvdimm]
[ 106.993059] ? _raw_spin_unlock+0x27/0x40
[ 106.994181] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 106.995430] mutex_lock_nested+0x1b/0x20
[ 106.996552] nvdimm_bus_lock+0x21/0x30 [libnvdimm]
[ 106.997804] nvdimm_clear_poison+0x11a/0x150 [libnvdimm]
[ 106.999138] nsio_rw_bytes+0x18f/0x280 [libnvdimm]
[ 107.000390] btt_write_pg+0x1d4/0x3c0 [nd_btt]