2022-09-12 15:41:36

by Trond Myklebust

[permalink] [raw]
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

On Mon, 2022-09-12 at 14:56 +0000, Trond Myklebust wrote:
> On Mon, 2022-09-12 at 10:50 -0400, J. Bruce Fields wrote:
> > On Mon, Sep 12, 2022 at 02:15:16PM +0000, Trond Myklebust wrote:
> > > On Mon, 2022-09-12 at 09:51 -0400, J. Bruce Fields wrote:
> > > > On Mon, Sep 12, 2022 at 08:55:04AM -0400, Jeff Layton wrote:
> > > > > Because of the "seen" flag, we have a 63 bit counter to play
> > > > > with.
> > > > > Could
> > > > > we use a similar scheme to the one we use to handle when
> > > > > "jiffies"
> > > > > wraps? Assume that we'd never compare two values that were
> > > > > more
> > > > > than
> > > > > 2^62 apart? We could add i_version_before/i_version_after
> > > > > macros to
> > > > > make
> > > > > it simple to handle this.
> > > >
> > > > As far as I recall the protocol just assumes it can never
> > > > wrap. 
> > > > I
> > > > guess
> > > > you could add a new change_attr_type that works the way you
> > > > describe.
> > > > But without some new protocol clients aren't going to know what
> > > > to do
> > > > with a change attribute that wraps.
> > > >
> > > > I think this just needs to be designed so that wrapping is
> > > > impossible
> > > > in
> > > > any realistic scenario.  I feel like that's doable?
> > > >
> > > > If we feel we have to catch that case, the only 100% correct
> > > > behavior
> > > > would probably be to make the filesystem readonly.
> > > >
> > >
> > > Which protocol? If you're talking about basic NFSv4, it doesn't
> > > assume
> > > anything about the change attribute and wrapping.
> > >
> > > The NFSv4.2 protocol did introduce the optional attribute
> > > 'change_attr_type' that tries to describe the change attribute
> > > behaviour to the client. It tells you if the behaviour is
> > > monotonically
> > > increasing, but doesn't say anything about the behaviour when the
> > > attribute value overflows.
> > >
> > > That said, the Linux NFSv4.2 client, which uses that
> > > change_attr_type
> > > attribute does deal with overflow by assuming standard uint64_t
> > > wrap
> > > around rules. i.e. it assumes bit values > 63 are truncated,
> > > meaning
> > > that the value obtained by incrementing (2^64-1) is 0.
> >
> > Yeah, it was the MONOTONIC_INCRE case I was thinking of.  That's
> > interesting, I didn't know the client did that.
> >
>
> If you look at where we compare version numbers, it is always some
> variant of the following:
>
> static int nfs_inode_attrs_cmp_monotonic(const struct nfs_fattr
> *fattr,
>                                          const struct inode *inode)
> {
>         s64 diff = fattr->change_attr -
> inode_peek_iversion_raw(inode);
>         if (diff > 0)
>                 return 1;
>         return diff == 0 ? 0 : -1;
> }
>
> i.e. we do an unsigned 64-bit subtraction, and then cast it to the
> signed 64-bit equivalent in order to figure out which is the more
> recent value.
>

...and by the way, yes this does mean that if you suddenly add a value
of 2^63 to the change attribute, then you are likely to cause the
client to think that you just handed it an old value.

i.e. you're better off having the crash counter increment the change
attribute by a relatively small value. One that is guaranteed to be
larger than the values that may have been lost, but that is not
excessively large.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]



2022-09-12 16:02:38

by Jeffrey Layton

[permalink] [raw]
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field

On Mon, 2022-09-12 at 15:32 +0000, Trond Myklebust wrote:
> On Mon, 2022-09-12 at 14:56 +0000, Trond Myklebust wrote:
> > On Mon, 2022-09-12 at 10:50 -0400, J. Bruce Fields wrote:
> > > On Mon, Sep 12, 2022 at 02:15:16PM +0000, Trond Myklebust wrote:
> > > > On Mon, 2022-09-12 at 09:51 -0400, J. Bruce Fields wrote:
> > > > > On Mon, Sep 12, 2022 at 08:55:04AM -0400, Jeff Layton wrote:
> > > > > > Because of the "seen" flag, we have a 63 bit counter to play
> > > > > > with.
> > > > > > Could
> > > > > > we use a similar scheme to the one we use to handle when
> > > > > > "jiffies"
> > > > > > wraps??Assume that we'd never compare two values that were
> > > > > > more
> > > > > > than
> > > > > > 2^62 apart? We could add i_version_before/i_version_after
> > > > > > macros to
> > > > > > make
> > > > > > it simple to handle this.
> > > > >
> > > > > As far as I recall the protocol just assumes it can never
> > > > > wrap.?
> > > > > I
> > > > > guess
> > > > > you could add a new change_attr_type that works the way you
> > > > > describe.
> > > > > But without some new protocol clients aren't going to know what
> > > > > to do
> > > > > with a change attribute that wraps.
> > > > >
> > > > > I think this just needs to be designed so that wrapping is
> > > > > impossible
> > > > > in
> > > > > any realistic scenario.? I feel like that's doable?
> > > > >
> > > > > If we feel we have to catch that case, the only 100% correct
> > > > > behavior
> > > > > would probably be to make the filesystem readonly.
> > > > >
> > > >
> > > > Which protocol? If you're talking about basic NFSv4, it doesn't
> > > > assume
> > > > anything about the change attribute and wrapping.
> > > >
> > > > The NFSv4.2 protocol did introduce the optional attribute
> > > > 'change_attr_type' that tries to describe the change attribute
> > > > behaviour to the client. It tells you if the behaviour is
> > > > monotonically
> > > > increasing, but doesn't say anything about the behaviour when the
> > > > attribute value overflows.
> > > >
> > > > That said, the Linux NFSv4.2 client, which uses that
> > > > change_attr_type
> > > > attribute does deal with overflow by assuming standard uint64_t
> > > > wrap
> > > > around rules. i.e. it assumes bit values > 63 are truncated,
> > > > meaning
> > > > that the value obtained by incrementing (2^64-1) is 0.
> > >
> > > Yeah, it was the MONOTONIC_INCRE case I was thinking of.? That's
> > > interesting, I didn't know the client did that.
> > >
> >
> > If you look at where we compare version numbers, it is always some
> > variant of the following:
> >
> > static int nfs_inode_attrs_cmp_monotonic(const struct nfs_fattr
> > *fattr,
> > ???????????????????????????????????????? const struct inode *inode)
> > {
> > ??????? s64 diff = fattr->change_attr -
> > inode_peek_iversion_raw(inode);
> > ??????? if (diff > 0)
> > ??????????????? return 1;
> > ??????? return diff == 0 ? 0 : -1;
> > }
> >
> > i.e. we do an unsigned 64-bit subtraction, and then cast it to the
> > signed 64-bit equivalent in order to figure out which is the more
> > recent value.
> >

Good! This seems like the reasonable thing to do, given that the spec
doesn't really say that the change attribute has to start at low values.

>
> ...and by the way, yes this does mean that if you suddenly add a value
> of 2^63 to the change attribute, then you are likely to cause the
> client to think that you just handed it an old value.
>
> i.e. you're better off having the crash counter increment the change
> attribute by a relatively small value. One that is guaranteed to be
> larger than the values that may have been lost, but that is not
> excessively large.
>

Yeah.

Like with jiffies, you need to make sure the samples you're comparing
aren't _too_ far off. That should be doable here -- 62 bits is plenty of
room to store a lot of change values.

My benchmark (maybe wrong, but maybe good enough) is to figure on an
increment per nanosecond for a worst-case scenario. With that, 2^40
nanoseconds is >12 days. Maybe that's overkill.

2^32 ns is about an hour and 20 mins. That's probably a reasonable value
to use. If we can't get a a new value onto disk in that time then
something is probably very wrong.
--
Jeff Layton <[email protected]>