When user unbinds a BTT disk and binds again with a different
sector size without wiping out the disk, a BTT disk is created
with a wrong size.
This is because the bind operation keeps the previous metadata,
which leads nd_btt->lbasize inconsistent with internal_lbasize
and external_lbasize in the arena. A reboot also reattaches
the BTT from the previous metadata.
Change nd_btt_arena_is_valid() to check if nd_btt->uuid matches
with super->uuid when a new UUID is set for binding. This
assures the bind operation writes the metadata with the values
specified by user.
Cc: Vishal Verma <[email protected]>
Cc: Dan Williams <[email protected]>
Reported-by: Micah Parrish <[email protected]>
Signed-off-by: Toshi Kani <[email protected]>
---
drivers/nvdimm/btt_devs.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvdimm/btt_devs.c b/drivers/nvdimm/btt_devs.c
index cb47751..176ea25 100644
--- a/drivers/nvdimm/btt_devs.c
+++ b/drivers/nvdimm/btt_devs.c
@@ -218,6 +218,8 @@ static bool uuid_is_null(u8 *uuid)
* Check consistency of the btt info block with itself by validating
* the checksum, and with the parent namespace by verifying the
* parent_uuid contained in the info block with the one supplied in.
+ * When nd_btt->uuid is set for binding, verify if the metadata is
+ * stale.
*
* Returns:
* false for an invalid info block, true for a valid one
@@ -234,6 +236,10 @@ bool nd_btt_arena_is_valid(struct nd_btt *nd_btt, struct btt_sb *super)
if (memcmp(super->parent_uuid, parent_uuid, 16) != 0)
return false;
+ if (nd_btt->uuid)
+ if (memcmp(super->uuid, nd_btt->uuid, 16) != 0)
+ return false;
+
checksum = le64_to_cpu(super->checksum);
super->checksum = 0;
if (checksum != nd_sb_checksum((struct nd_gen_sb *) super))
On Thu, 2015-12-17 at 16:00 -0700, Toshi Kani wrote:
> When user unbinds a BTT disk and binds again with a different
> sector size without wiping out the disk, a BTT disk is created
> with a wrong size.
I think this is an incorrect usage model in the first place. You
shouldn't expect to disable a BTT, change the sector size or uuid
behind it, and expect it to work with the new sector size on re-
enabling.
While this patch makes the BTT see the right size, it is really just an
illusion, because if you try to read the pre-sector-size-change data,
it will be scrambled, and thus practically lost.
Even with this patch, I can skip changing the UUID, just change the
sector size, and re-enable it, and the total available size will appear
to have changed.
For the case of legacy (non-nfit) namespaces, the only way to change a
BTT's properties is to recreate it using 'force_raw'. For non-legacy
namespaces, the recommended way is to recreate the namespace with a new
uuid, and this will cause BTT to react to the parent_uuid change and
not try to bind itself to stale metadata. Both cases will lose any data
on the old BTT.
Ideally, changing BTT properties shouldn't be allowed till the parent
namespaces is recreated, but I'm not sure there is an easy way to
enforce this -- Dan?
Also, I wonder if this problem is solved by using libndctl to manage
BTTs.
Thanks,
-Vishal
On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote:
> On Thu, 2015-12-17 at 16:00 -0700, Toshi Kani wrote:
> > When user unbinds a BTT disk and binds again with a different
> > sector size without wiping out the disk, a BTT disk is created
> > with a wrong size.
>
> I think this is an incorrect usage model in the first place. You
> shouldn't expect to disable a BTT, change the sector size or uuid
> behind it, and expect it to work with the new sector size on re-
> enabling.
Well, users do test with multiple sector sizes when they evaluate BTT. So,
this is a legitimate use-case with a missing step to wipe out the data in
between. I believe BTT needs to be protected from this case in one way or
the other.
> While this patch makes the BTT see the right size, it is really just an
> illusion, because if you try to read the pre-sector-size-change data,
> it will be scrambled, and thus practically lost.
When user binds a new BTT, he/she does not expect any valid data left in
BTT. mkfs is then performed before using the BTT disk. This use-case is
similar with re-partitioning.
> Even with this patch, I can skip changing the UUID, just change the
> sector size, and re-enable it, and the total available size will appear
> to have changed.
Using the same UUID requires deliberate effort since uuidgen won't generate
the same UUID again. We can check the sector size in addition to the UUID,
if that makes it any safer. I am not sure how much we need to protect from
such deliberate effort, though.
> For the case of legacy (non-nfit) namespaces, the only way to change a
> BTT's properties is to recreate it using 'force_raw'. For non-legacy
> namespaces, the recommended way is to recreate the namespace with a new
> uuid, and this will cause BTT to react to the parent_uuid change and
> not try to bind itself to stale metadata. Both cases will lose any data
> on the old BTT.
Losing the data is expected after newly binding BTT, which is not an issue
here. The issue is that BTT operates in split sector sizes when user
missed the step to wipe out the data before binding. Yes, this patch
follows the same approach of the parent_uuid check.
> Ideally, changing BTT properties shouldn't be allowed till the parent
> namespaces is recreated, but I'm not sure there is an easy way to
> enforce this -- Dan?
>
> Also, I wonder if this problem is solved by using libndctl to manage
> BTTs.
I have not tested with libndctl yet, but I think our bind/unbind scripts do
the same procedures.
Thanks,
-Toshi
On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <[email protected]> wrote:
> On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote:
>> Also, I wonder if this problem is solved by using libndctl to manage
>> BTTs.
>
> I have not tested with libndctl yet, but I think our bind/unbind scripts do
> the same procedures.
We loop through all combinations of sector size in our unit test. If
you want to change the sector size the expectation is that the
namespace is destroyed and fully re-created, especially due to the
fact that changing sector size invalidates all data on the namespace.
See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c
On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote:
> On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <[email protected]> wrote:
> > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote:
> > > Also, I wonder if this problem is solved by using libndctl to manage
> > > BTTs.
> >
> > I have not tested with libndctl yet, but I think our bind/unbind
> > scripts do the same procedures.
>
> We loop through all combinations of sector size in our unit test. If
> you want to change the sector size the expectation is that the
> namespace is destroyed and fully re-created, especially due to the
> fact that changing sector size invalidates all data on the namespace.
>
> See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c
The parent_uuid is not set on our NVDIMM-N systems. I do not see 'uuid'
file under sysfs 'namespaceX.X' per namespace_visible(), either. This
concept of creating/destroying a namespace is a bit foreign to us since
we've never needed to do. Can you elaborate how it's supposed to work for
NVDIMM-N?
Thanks,
-Toshi
On Fri, Dec 18, 2015 at 10:52 AM, Toshi Kani <[email protected]> wrote:
> On Fri, 2015-12-18 at 09:54 -0800, Dan Williams wrote:
>> On Fri, Dec 18, 2015 at 7:15 AM, Toshi Kani <[email protected]> wrote:
>> > On Fri, 2015-12-18 at 01:34 -0700, Vishal Verma wrote:
>> > > Also, I wonder if this problem is solved by using libndctl to manage
>> > > BTTs.
>> >
>> > I have not tested with libndctl yet, but I think our bind/unbind
>> > scripts do the same procedures.
>>
>> We loop through all combinations of sector size in our unit test. If
>> you want to change the sector size the expectation is that the
>> namespace is destroyed and fully re-created, especially due to the
>> fact that changing sector size invalidates all data on the namespace.
>>
>> See: https://github.com/pmem/ndctl/blob/master/lib/test-libndctl.c
>
> The parent_uuid is not set on our NVDIMM-N systems. I do not see 'uuid'
> file under sysfs 'namespaceX.X' per namespace_visible(), either. This
> concept of creating/destroying a namespace is a bit foreign to us since
> we've never needed to do. Can you elaborate how it's supposed to work for
> NVDIMM-N?
>
Ugh, yes. An oversight on my part, let me give this some thought.
Whatever we decide, I want the libndctl api to be identical for the
two cases.
I expect the simplest option is to have
ndctl_namespace_disable_invalidate() destroy the btt-info block in the
NVDIMM-N case.