LinuxLists.cc - out of order v3 write replies and cache invalidation

2016-03-29 19:57:54

Subject: out of order v3 write replies and cache invalidation

Is it always the case that cache invalidation is unavoidable when the
client receives out-of-order replies back from the server? I believe
it is because the change attribute mismatch is unavoidable but I'd
like to check if my understanding is correct.

Here's what I mean:
1 write call 0-1024
2 write call 1024-2048
3 write call 2048-4096
4 write reply to 1
5 write reply to 3
6 write reply to 2

When #5 is received, its "before" attributes do not match the
"after" attributes of reply #4, and that leads to cache invalidation
(this is what I'm seeing in the current code).

Thank you.

2016-03-30 17:40:42

by J. Bruce Fields

Subject: Re: out of order v3 write replies and cache invalidation

On Tue, Mar 29, 2016 at 03:57:53PM -0400, Olga Kornievskaia wrote:
> Is it always the case that cache invalidation is unavoidable when the
> client receives out-of-order replies back from the server? I believe
> it is because the change attribute mismatch is unavoidable but I'd
> like to check if my understanding is correct.
>
> Here's what I mean:
> 1 write call 0-1024
> 2 write call 1024-2048
> 3 write call 2048-4096
> 4 write reply to 1
> 5 write reply to 3
> 6 write reply to 2
>
> When #5 is received, its "before" attributes do not match the
> "after" attributes of reply #4, and that leads to cache invalidation
> (this is what I'm seeing in the current code).

In theory, couldn't the client handle these situations by
remembering some (before, after) pairs? Then in the above case:

    assume file starts with change attribute A
> 1 write call 0-1024
    new change attribute after first write is B
> 2 write call 1024-2048
    new change attribute after second write is C
> 3 write call 2048-4096
    new change attribute after third write is D
> 4 write reply to 1
    returns (before, after) == (A, B): mark our cache as
    representing the state of the file at change attribute B.
> 5 write reply to 3
    returns (before, after) == (C, D): our cache is now untrusted,
    but would be trusted again if we saw (B, C).
> 6 write reply to 2
    returns (before, after) == (B, C): now we've seen both (B, C),
    and (C, D), so we can mark our cache as representing the state
    of the file at change attribute D.

In general, at a given point:

- remember the last change attribute about which we had complete
  information.
- remember a list of change attribute intervals which we've seen
  in replies. Consolidate any pairs with common endpoints
  (e.g., [(B,C),(C,D)] can be replaced by [(B,D)]).
- if the result is a pair whose left endpoint matches the last
  known-good change attribute, then delete that pair and just record
  the right endpoint as the new known-good change attribute.

In practice, to keep it manageable, don't record more than a few such
intervals; give up and invalidate the cache if that isn't enough. Maybe
even just one interval would be enough to catch most cases.

I don't know if that's worth it.
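
The bookkeeping described above can be sketched roughly as follows. This
is only an illustration, not code from any NFS client: the class name,
the method names, and the max_pending cap are all hypothetical, and it
relies on the same assumption that the (before, after) attributes are
read atomically with respect to the write.

```python
class ChangeAttrTracker:
    """Hypothetical client-side tracker for (before, after) change attribute pairs."""

    def __init__(self, known_good, max_pending=2):
        self.known_good = known_good    # last attribute we had complete information about
        self.pending = {}               # before -> after, for intervals not yet linked up
        self.max_pending = max_pending  # assumed small cap on remembered intervals
        self.valid = True               # False once we give up and invalidate

    def on_write_reply(self, before, after):
        """Record one reply's pair; return True if the cache is trusted afterwards."""
        if not self.valid:
            return False
        if self.pending.get(before, after) != after:
            # Two different successors for the same state: treated here as
            # a sign of another writer, so give up.
            self._invalidate()
            return False
        self.pending[before] = after
        # Following the chain from known_good both consolidates adjacent
        # intervals and advances the known-good attribute.
        while self.known_good in self.pending:
            self.known_good = self.pending.pop(self.known_good)
        if len(self.pending) > self.max_pending:
            self._invalidate()
            return False
        return not self.pending

    def _invalidate(self):
        self.valid = False
        self.pending.clear()


tracker = ChangeAttrTracker("A")
tracker.on_write_reply("A", "B")  # reply to write 1: trusted at B
tracker.on_write_reply("C", "D")  # reply to write 3: untrusted until (B, C) arrives
tracker.on_write_reply("B", "C")  # reply to write 2: trusted again, now at D
```

Keeping the intervals in a dict keyed by the "before" value means the
consolidation step never has to be done explicitly; the loop simply
walks whatever chain currently starts at the known-good attribute.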

Also, it all depends on the assumption that the change attributes are
read atomically with respect to the write, which isn't really true.
But it sounds like we're already making that assumption.

If we assume no other writers until we close, couldn't you on close wait
for all writes, send a final getattr for change attribute, and trust
that? If the extra getattr's too much, then you'd need some algorithm
like the above to determine which change attribute is the last. Or
implement
https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
on client and server and just track the maximum returned value when the
server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.

--b.

>
> Thank you.

2016-03-30 18:20:41

by Trond Myklebust

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
> If we assume no other writers until we close, couldn't you on close wait
> for all writes, send a final getattr for change attribute, and trust
> that? If the extra getattr's too much, then you'd need some algorithm
> like the above to determine which change attribute is the last. Or
> implement
> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
> on client and server and just track the maximum returned value when the
> server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
>

The correct tool to use for resolving these caching issues is
ultimately a write delegation.

You can also eliminate a lot of invalidations if you know that the
server implements change_attr_type ==
NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
predict what the attribute should be after a change.
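
As a rough illustration of that prediction: with a version-counter
change attribute, the value goes up by one per atomic change, so a
client that knows its starting value can check whether the replies it
received are explained by its own writes alone. The function name below
is hypothetical, and the sketch assumes each WRITE counts as exactly
one change on the server, which a real server may not guarantee.

```python
def explained_by_own_writes(cached_change, after_values):
    """Return True if our own writes account for the highest change attribute seen.

    cached_change: change attribute the cache was last known to match.
    after_values:  "after" change attributes from all of this client's
                   write replies, in whatever order they arrived.
    Assumes (hypothetically) one counter increment per WRITE.
    """
    if not after_values:
        return True  # nothing written, nothing to explain
    predicted = cached_change + len(after_values)
    return max(after_values) == predicted


# Replies arriving out of order do not matter, only the maximum does.
print(explained_by_own_writes(10, [11, 13, 12]))
# A value beyond the prediction suggests some other change intervened.
print(explained_by_own_writes(10, [11, 14, 13]))
```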

Cheers
Trond

2016-03-30 18:39:19

by Olga Kornievskaia

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 2:20 PM, Trond Myklebust
<[email protected]> wrote:
> On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
>> If we assume no other writers until we close, couldn't you on close wait
>> for all writes, send a final getattr for change attribute, and trust
>> that? If the extra getattr's too much, then you'd need some algorithm
>> like the above to determine which change attribute is the last. Or
>> implement
>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
>> on client and server and just track the maximum returned value when the
>> server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
>>
>
> The correct tool to use for resolving these caching issues is
> ultimately a write delegation.
>
> You can also eliminate a lot of invalidations if you know that the
> server implements change_attr_type ==
> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
> predict what the attribute should be after a change.

Thanks for all the info. But let me highlight that I was asking about
v3. I don't see that the code has issues with cache invalidation for
nfsv4 when receiving out-of-order RPCs.

I am not sure if it's worth implementing something like what Bruce
suggests. I just wanted to make sure that what I'm seeing is
"expected" behavior (because it's v3) and not a bug.

2016-03-30 18:43:47

by Trond Myklebust

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 2:39 PM, Olga Kornievskaia <[email protected]> wrote:
> On Wed, Mar 30, 2016 at 2:20 PM, Trond Myklebust
> <[email protected]> wrote:
>> On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
>>> If we assume no other writers until we close, couldn't you on close wait
>>> for all writes, send a final getattr for change attribute, and trust
>>> that? If the extra getattr's too much, then you'd need some algorithm
>>> like the above to determine which change attribute is the last. Or
>>> implement
>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
>>> on client and server and just track the maximum returned value when the
>>> server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
>>>
>>
>> The correct tool to use for resolving these caching issues is
>> ultimately a write delegation.
>>
>> You can also eliminate a lot of invalidations if you know that the
>> server implements change_attr_type ==
>> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
>> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
>> predict what the attribute should be after a change.
>
> Thanks for all the info. But let me highlight that I was asking about
> v3. I don't see that the code has issues with cache invalidation for
> nfsv4 when receiving out-of-order RPCs.
>
> I am not sure if it's worth implementing something that Bruce
> suggests. I just wanted to make sure that what I'm seeing is
> "expected" behavior (because it's v3) and not a bug.

Yes. The design does expect the occasional false positive cache
invalidation due to RPC request reordering.

Cheers
Trond

2016-03-30 18:48:16

by J. Bruce Fields

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 02:20:40PM -0400, Trond Myklebust wrote:
> On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
> > If we assume no other writers until we close, couldn't you on close wait
> > for all writes, send a final getattr for change attribute, and trust
> > that? If the extra getattr's too much, then you'd need some algorithm
> > like the above to determine which change attribute is the last. Or
> > implement
> > https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
> > on client and server and just track the maximum returned value when the
> > server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
> >
>
> The correct tool to use for resolving these caching issues is
> ultimately a write delegation.
>
> You can also eliminate a lot of invalidations if you know that the
> server implements change_attr_type ==
> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
> predict what the attribute should be after a change.

Do we know of any implementations?

It looks difficult (and possibly not yet sufficiently well-defined) to
me, but I haven't thought it through.

--b.

2016-03-30 18:50:27

by J. Bruce Fields

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 02:43:38PM -0400, Trond Myklebust wrote:
> On Wed, Mar 30, 2016 at 2:39 PM, Olga Kornievskaia <[email protected]> wrote:
> > On Wed, Mar 30, 2016 at 2:20 PM, Trond Myklebust
> > <[email protected]> wrote:
> >> On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
> >>> If we assume no other writers until we close, couldn't you on close wait
> >>> for all writes, send a final getattr for change attribute, and trust
> >>> that? If the extra getattr's too much, then you'd need some algorithm
> >>> like the above to determine which change attribute is the last. Or
> >>> implement
> >>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
> >>> on client and server and just track the maximum returned value when the
> >>> server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
> >>>
> >>
> >> The correct tool to use for resolving these caching issues is
> >> ultimately a write delegation.
> >>
> >> You can also eliminate a lot of invalidations if you know that the
> >> server implements change_attr_type ==
> >> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
> >> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
> >> predict what the attribute should be after a change.
> >
> > Thanks for all the info. But let me highlight that I was asking about
> > v3. I don't see that the code has issues with cache invalidation for
> > nfsv4 when receiving out-of-order RPCs.
> >
> > I am not sure if it's worth implementing something that Bruce
> > suggests. I just wanted to make sure that what I'm seeing is
> > "expected" behavior (because it's v3) and not a bug.
>
> Yes. The design does expect the occasional false positive cache
> invalidation due to RPC request reordering.

In the v3 and close-to-open case, since the ctime's monotonically
increasing, why couldn't we just keep track of the maximum ctime seen
before close?

--b.

2016-03-30 18:51:49

by Trond Myklebust

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 2:48 PM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Mar 30, 2016 at 02:20:40PM -0400, Trond Myklebust wrote:
>> On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
>> > If we assume no other writers until we close, couldn't you on close wait
>> > for all writes, send a final getattr for change attribute, and trust
>> > that? If the extra getattr's too much, then you'd need some algorithm
>> > like the above to determine which change attribute is the last. Or
>> > implement
>> > https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
>> > on client and server and just track the maximum returned value when the
>> > server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
>> >
>>
>> The correct tool to use for resolving these caching issues is
>> ultimately a write delegation.
>>
>> You can also eliminate a lot of invalidations if you know that the
>> server implements change_attr_type ==
>> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
>> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
>> predict what the attribute should be after a change.
>
> Do we know of any implementations?

Yes. The AFS client does this.

> It looks difficult (and possibly not yet sufficiently well-defined) to
> me, but I haven't thought it through.
>
> --b.

2016-03-30 18:53:37

by Trond Myklebust

Subject: Re: out of order v3 write replies and cache invalidation

On Wed, Mar 30, 2016 at 2:50 PM, J. Bruce Fields <[email protected]> wrote:
> On Wed, Mar 30, 2016 at 02:43:38PM -0400, Trond Myklebust wrote:
>> On Wed, Mar 30, 2016 at 2:39 PM, Olga Kornievskaia <[email protected]> wrote:
>> > On Wed, Mar 30, 2016 at 2:20 PM, Trond Myklebust
>> > <[email protected]> wrote:
>> >> On Wed, Mar 30, 2016 at 1:40 PM, J. Bruce Fields <[email protected]> wrote:
>> >>> If we assume no other writers until we close, couldn't you on close wait
>> >>> for all writes, send a final getattr for change attribute, and trust
>> >>> that? If the extra getattr's too much, then you'd need some algorithm
>> >>> like the above to determine which change attribute is the last. Or
>> >>> implement
>> >>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#section-12.2.3
>> >>> on client and server and just track the maximum returned value when the
>> >>> server returns something other than NFS4_CHANGE_TYPE_IS_UNDEFINED.
>> >>>
>> >>
>> >> The correct tool to use for resolving these caching issues is
>> >> ultimately a write delegation.
>> >>
>> >> You can also eliminate a lot of invalidations if you know that the
>> >> server implements change_attr_type ==
>> >> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER or
>> >> NFS4_CHANGE_TYPE_IS_VERSION_COUNTER_NOPNFS, since that allows you to
>> >> predict what the attribute should be after a change.
>> >
>> > Thanks for all the info. But let me highlight that I was asking about
>> > v3. I don't see that the code has issues with cache invalidation for
>> > nfsv4 when receiving out-of-order RPCs.
>> >
>> > I am not sure if it's worth implementing something that Bruce
>> > suggests. I just wanted to make sure that what I'm seeing is
>> > "expected" behavior (because it's v3) and not a bug.
>>
>> Yes. The design does expect the occasional false positive cache
>> invalidation due to RPC request reordering.
>
> In the v3 and close-to-open case, since the ctime's monotonically
> increasing, why couldn't we just keep track of the maximum ctime seen
> before close?

We can do that, but in theory NFSv3 is supposed to observe weak cache
consistency rather than strict close-to-open. In practice, we probably
don't care.