I've been observing an issue where an NFS client reading (tail -f, to
be precise) a file to which another NFS client is writing (or rather
appending; it's the log file of a cluster job) gets stuck and no longer
updates its view of the file.
On the server, new log lines are appended to the file as expected.
On the reading client, the file stays frozen in the state it was in at
the moment it was first read, and remains that way until the writing
client closes it and some time passes after that.
With the NFS server on 5.4, the issue did not appear; on 5.10, it does.
So I bisected the server kernel with the following setup:
One client opens and appends to a file:
> ( for i in $(seq 1 60); do date; sleep 1; done ) > testfile
The other just runs "tail -f" on the same file.
On the good side of the bisect, the file updates as expected, and tail
-f shows new lines as they are added. On the bad side, it just gets
stuck and never updates. The file size in "ls -l" also gets stuck.
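For completeness, this is roughly how I watched the reader side go
stale (the mount point is just an example); on the bad kernels both
commands freeze at whatever size the file had when it was first opened
there:

  # on the reading client, in two shells
  tail -f /mnt/scratch/testfile
  watch -n1 'ls -l /mnt/scratch/testfile'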
At the end of that bisect, git pointed me at commit
94415b06eb8aed13481646026dc995f04a3a534a.
> nfsd4: a client's own opens needn't prevent delegations
And indeed, reverting that commit on top of a current 5.10 kernel makes
the issue go away.
Thanks for bisecting this and reporting the results.
The behavior you describe is probably not a bug: NFS close-to-open
caching semantics don't guarantee you'll see updates on the reading
client in this case.
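(Roughly, what close-to-open does promise: a fresh open that happens
after the writer's close will see the written data. The paths below are
placeholders.)

  # writing client: write and close
  echo "a line" >> /mnt/shared/job.log
  # reading client: an open *after* the writer's close sees that line;
  # an already-open "tail -f" gets no such guarantee
  cat /mnt/shared/job.log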
Nevertheless, I don't understand why the behavior changed with that
commit, and I'd like to.
The only change in behavior I'd expect would be that the writing client
would be granted a read delegation. But I wouldn't expect that to
change how it writes back data, or how the reading client checks for
file changes. (The reading client still shouldn't be granted a
delegation.)
I'll take a look.
--b.
On Sun, Feb 28, 2021 at 11:27:53PM +0100, Timo Rothenpieler wrote:
> I've been observing an issue where an NFS client reading (tail -f,
> to be precise) a file to which another NFS client is writing (or
> rather appending; it's the log file of a cluster job) gets stuck and
> no longer updates its view of the file.
>
> On the server, new log lines are appended to the file as expected.
> On the reading client, the file stays frozen in the state it was in
> at the moment it was first read, and remains that way until the
> writing client closes it and some time passes after that.
>
> With the NFS server on 5.4, the issue did not appear; on 5.10, it
> does. So I bisected the server kernel with the following setup:
>
> One client opens and appends to a file:
>
> > ( for i in $(seq 1 60); do date; sleep 1; done ) > testfile
>
> The other just runs "tail -f" on the same file.
>
> On the good side of the bisect, the file updates as expected, and
> tail -f shows new lines as they are added. On the bad side, it just
> gets stuck and never updates. The file size in "ls -l" also gets
> stuck.
>
> At the end of that bisect, git pointed me at commit
> 94415b06eb8aed13481646026dc995f04a3a534a.
>
> > nfsd4: a client's own opens needn't prevent delegations
>
> And indeed, reverting that commit on top of a current 5.10 kernel
> makes the issue go away.
>
On 01.03.2021 19:46, J. Bruce Fields wrote:
> Thanks for bisecting this and reporting the results.
>
> The behavior you describe is probably not a bug: NFS close-to-open
> caching semantics don't guarantee you'll see updates on the reading
> client in this case.
>
> Nevertheless, I don't understand why the behavior changed with that
> commit, and I'd like to.
>
> The only change in behavior I'd expect would be that the writing client
> would be granted a read delegation. But I wouldn't expect that to
> change how it writes back data, or how the reading client checks for
> file changes.
The writing side of things does not seem to be affected.
At any point during testing, the file on the server's filesystem is in
the state I'd expect it to be, growing by one line per second.
What's also a bit odd is that after both clients have closed the file,
the reading client still sees an older version of it: the state it was
in when it was first opened on that client.
The file size reported by stat()/ls -l differs between the reading
client and the server.
It stays in that state for some seemingly arbitrary amount of time, and
observing the file in any way (cat/tail, or even just ls) appears to
extend the wait.
After leaving the file alone for long enough on the reading client, it
snaps back to reality.
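To make that concrete, this is roughly how I compared the two views
(paths are examples):

  # on the server, against the exported filesystem directly
  stat -c '%s %y' /srv/export/testfile
  # on the reading client: reports a stale, smaller size, and keeps
  # doing so for minutes after both sides have closed the file
  stat -c '%s %y' /mnt/testfile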
This change in behavior, while clearly an edge case, is quite
devastating for our use case, and also somewhat counter-intuitive.
The files in question are the log output of cluster jobs sent off to
compute nodes by the users, who then watch what a job is doing via
tail -f on the log file on the login node.
On Tue, Mar 02, 2021 at 02:03:58AM +0100, Timo Rothenpieler wrote:
> On 01.03.2021 19:46, J. Bruce Fields wrote:
> >Thanks for bisecting this and reporting the results.
> >
> >The behavior you describe is probably not a bug: NFS close-to-open
> >caching semantics don't guarantee you'll see updates on the reading
> >client in this case.
> >
> >Nevertheless, I don't understand why the behavior changed with that
> >commit, and I'd like to.
> >
> >The only change in behavior I'd expect would be that the writing client
> >would be granted a read delegation. But I wouldn't expect that to
> >change how it writes back data, or how the reading client checks for
> >file changes.
>
> The writing side of things does not seem to be affected.
> At any point during testing, the file on the server's filesystem is
> in the state I'd expect it to be, growing by one line per second.
>
> What's also a bit odd is that after both clients have closed the
> file, the reading client still sees an older version of it: the
> state it was in when it was first opened on that client.
> The file size reported by stat()/ls -l differs between the reading
> client and the server.
> It stays in that state for some seemingly arbitrary amount of time,
> and observing the file in any way (cat/tail, or even just ls)
> appears to extend the wait.
> After leaving the file alone for long enough on the reading client,
> it snaps back to reality.
Hm, I wonder if we're (incorrectly) giving out a delegation to the
reading client. Is this 100% reproducible?
--b.
> This change in behavior, while clearly an edge case, is quite
> devastating for our use case, and also somewhat counter-intuitive.
>
> The files in question are the log output of cluster jobs sent off to
> compute nodes by the users, who then watch what a job is doing via
> tail -f on the log file on the login node.
>
On 02.03.2021 02:06, J. Bruce Fields wrote:
> On Tue, Mar 02, 2021 at 02:03:58AM +0100, Timo Rothenpieler wrote:
>> On 01.03.2021 19:46, J. Bruce Fields wrote:
>>> Thanks for bisecting this and reporting the results.
>>>
>>> The behavior you describe is probably not a bug: NFS close-to-open
>>> caching semantics don't guarantee you'll see updates on the reading
>>> client in this case.
>>>
>>> Nevertheless, I don't understand why the behavior changed with that
>>> commit, and I'd like to.
>>>
>>> The only change in behavior I'd expect would be that the writing client
>>> would be granted a read delegation. But I wouldn't expect that to
>>> change how it writes back data, or how the reading client checks for
>>> file changes.
>>
>> The writing side of things does not seem to be affected.
>> At any point during testing, the file on the server's filesystem is
>> in the state I'd expect it to be, growing by one line per second.
>>
>> What's also a bit odd is that after both clients have closed the
>> file, the reading client still sees an older version of it: the
>> state it was in when it was first opened on that client.
>> The file size reported by stat()/ls -l differs between the reading
>> client and the server.
>> It stays in that state for some seemingly arbitrary amount of time,
>> and observing the file in any way (cat/tail, or even just ls)
>> appears to extend the wait.
>> After leaving the file alone for long enough on the reading client,
>> it snaps back to reality.
>
> Hm, I wonder if we're (incorrectly) giving out a delegation to the
> reading client. Is this 100% reproducible?
Server and reading client disagreeing about the file's size and
contents is 100% reproducible, even after the writing client has long
since closed the file.
I'm not 100% certain about the exact timing, or by how much observing
the file extends it, since I never tested that intensively. But the
client and server stay mismatched like that for at least several
minutes.
On Tue, Mar 02, 2021 at 02:23:05AM +0100, Timo Rothenpieler wrote:
> Server and reading client disagreeing about the file's size and
> contents is 100% reproducible, even after the writing client has
> long since closed the file.
OK, I reproduced this myself and watched the network traffic in
Wireshark.
The server is *not* giving out a read delegation to the writer (which is
what the commit you bisected to should have allowed). It *is* giving
out a delegation to the reader. Which is a truly awful bug.
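(For anyone retracing this: a plain capture on the server while running
the reproducer is enough; the interface name below is a placeholder.
Open the result in Wireshark and look at the delegation type in the
OPEN replies sent to the reading client.)

  tcpdump -i eth0 -s 0 -w nfs-deleg.pcap port 2049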
Investigating....
--b.
On Thu, Mar 04, 2021 at 09:42:28AM -0500, J. Bruce Fields wrote:
> On Tue, Mar 02, 2021 at 02:23:05AM +0100, Timo Rothenpieler wrote:
> > Server and reading client disagreeing about the file's size and
> > contents is 100% reproducible, even after the writing client has
> > long since closed the file.
>
> OK, I reproduced this myself and watched the network traffic in
> Wireshark.
>
> The server is *not* giving out a read delegation to the writer (which is
> what the commit you bisected to should have allowed). It *is* giving
> out a delegation to the reader. Which is a truly awful bug.
> Investigating....
Yeah, it's completely failing to provide close-to-open cache consistency
there in most cases. Arghh.
I have some idea how to fix it, but for now it should just be reverted.
I'll post that pending some testing.
I've also added a pynfs test for this case (4.1 test DELEG9). We should
probably have some more.
So, thanks for the report.
I'd still be a little cautious about your use case--NFS can't give you
a lot of guarantees when you've got a file open for read and write
simultaneously. The writing client can delay those writes at least
until the writing application closes, and the reading client isn't
necessarily going to revalidate its cache until the reading application
closes and reopens. And weirder behavior is possible: for example, when
the writer does write back, I don't think there's any guarantee it will
write in order, so you could end up temporarily with NULLs in the
middle of the file.
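(If the write-back delay on the writing side ever matters in practice,
one knob--assuming you can change that client's mount--is to make its
writes synchronous, at some cost in throughput; server and export path
below are placeholders.)

  # on the writing client only
  mount -t nfs -o sync server:/export /mnt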
But that particular regression was my fault, thanks again for bisecting
and reporting.
--b.
On 05.03.2021 23:31, J. Bruce Fields wrote:
> Yeah, it's completely failing to provide close-to-open cache consistency
> there in most cases. Arghh.
>
> I have some idea how to fix it, but for now it should just be reverted.
> I'll post that pending some testing.
>
> I've also added a pynfs test for this case (4.1 test DELEG9). We should
> probably have some more.
>
> So, thanks for the report.
>
> I'd still be a little cautious about your use case--NFS can't give you
> a lot of guarantees when you've got a file open for read and write
> simultaneously. The writing client can delay those writes at least
> until the writing application closes, and the reading client isn't
> necessarily going to revalidate its cache until the reading application
> closes and reopens. And weirder behavior is possible: for example, when
> the writer does write back, I don't think there's any guarantee it will
> write in order, so you could end up temporarily with NULLs in the
> middle of the file.
As long as it doesn't get stuck for extended amounts of time, it being
a bit janky is fine.
A close and re-open bringing it back up to date is all I can ask for.
I'm not sure how "tail -f" operates internally. It does seem to have
special handling for remote files/network filesystems though, so it
might already be doing the right thing.
But as long as repeatedly calling "cat" on the file works, that's fine.
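(A minimal sketch of what I mean, re-opening the file on every pass so
each open triggers close-to-open revalidation; path and interval are
arbitrary:)

  old=0
  while sleep 1; do
      new=$(wc -c < /mnt/shared/job.log)
      if [ "$new" -gt "$old" ]; then
          # print only the bytes appended since the last pass
          tail -c +"$((old + 1))" /mnt/shared/job.log
          old=$new
      fi
  done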
> But that particular regression was my fault, thanks again for bisecting
> and reporting.
>
I've been running a 5.10 kernel with the commit reverted for the whole
week now, and haven't run into any issues so far.