2020-04-13 15:36:47

by David Howells

[permalink] [raw]
Subject: [GIT PULL] afs: Fixes

Hi Linus,

Here are some fixes for the afs filesystem:

(1) Fix the decoding of fetched file status records do that in advances
the xdr pointer under all circumstances.

(2) Fix the decoding of a fetched file status record that indicates an
inline abort code (ie. an error) so that it sets the flag saying that
it got an error.

(3) Fix the decoding of the result of the rename operation so that it
doesn't skip the decoding of the second fetched file status (ie. that
of the dest dir) in the case that the source and dest dirs were the
same as this causes the xdr pointer not to be advanced, leading to
incorrect decoding of subsequent parts of the reply.

(4) Fix the dump of a bad YFSFetchStatus record to dump the full length.

(5) Fix a race between local editing of directory contents and accessing
the dir for reading or d_revalidate by using the same lock in both.

(6) Fix afs_d_revalidate() to not accidentally reverse the version on a
dentry when it's meant to be bringing it forward.

David
---
The following changes since commit 8f3d9f354286745c751374f5f1fcafee6b3f3136:

Linux 5.7-rc1 (2020-04-12 12:35:55 -0700)

are available in the Git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-fixes-20200413

for you to fetch changes up to 40fc81027f892284ce31f8b6de1e497f5b47e71f:

afs: Fix afs_d_validate() to set the right directory version (2020-04-13 15:09:01 +0100)

----------------------------------------------------------------
AFS fixes

----------------------------------------------------------------
David Howells (6):
afs: Fix missing XDR advance in xdr_decode_{AFS,YFS}FSFetchStatus()
afs: Fix decoding of inline abort codes from version 1 status records
afs: Fix rename operation status delivery
afs: Fix length of dump of bad YFSFetchStatus record
afs: Fix race between post-modification dir edit and readdir/d_revalidate
afs: Fix afs_d_validate() to set the right directory version

fs/afs/dir.c | 108 +++++++++++++++++++++++++++++++++--------------------
fs/afs/dir_silly.c | 22 +++++++----
fs/afs/fsclient.c | 27 ++++++++------
fs/afs/yfsclient.c | 26 +++++++------
4 files changed, 112 insertions(+), 71 deletions(-)


2020-04-14 18:25:17

by David Howells

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

David Howells <[email protected]> wrote:

> (1) Fix the decoding of fetched file status records do that in advances
> the xdr pointer under all circumstances.

As Willy points out, this isn't very English. Let me try that again.

David

2020-04-15 13:13:00

by pr-tracker-bot

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

The pull request you sent on Mon, 13 Apr 2020 15:50:15 +0100:

> git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/afs-fixes-20200413

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/f4cd66682b2728734b2fc44f5f1e83a5c740b5cf

Thank you!

--
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

2017-11-25 23:20:20

by David Howells

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

Linus Torvalds <[email protected]> wrote:

> However, even when you do that, the page can be writable in other
> mappings. At least fork(), for example, only clears the dirty bit,
> doesn't mark it write-protected.

I assumed the rmap walk done by page_mkclean() would take care of that but I'm
not really clear on what the code does.

> I just hope that the inconsistency isn't fatal to the afs client or
> server code. For example, if you retry writes forever when a checksum
> were to not match the data, that would be bad.

Shouldn't be a problem for the the in-Linux client. Data is copied into
sk_bufs preparatory to doing further things to it like checksumming,
encryption or transmission (actually, in future, I would like to use the
encryption process to save on the copy, but this shouldn't bother that
either).

AFAIK, the servers are all userspace jobs that don't let anyone else touch
their storage so that they can maintain correctness on the data version number
of each vnode.

> so I just wanted to bring this up as a potential issue, not
> necessarily as a big problem.

Thanks.

David

From 1585080506133673956@xxx Sat Nov 25 22:56:08 +0000 2017
X-GM-THRID: 1584957627109406089
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-25 22:56:08

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

On Sat, Nov 25, 2017 at 12:35 PM, David Howells <[email protected]> wrote:
>
> Doesn't clear_page_dirty_for_io() write-protect the PTE for the page to be
> written out, in which case ->page_mkwrite() will get called again before the
> page is redirtied?

No, it literally just sets the dirty bit (and does accounting).

But I think you may be right that we always write-protect he page when
we move the dirty bit from the page tables to the 'struct page'
(page_mkclean_one()).

However, even when you do that, the page can be writable in other
mappings. At least fork(), for example, only clears the dirty bit,
doesn't mark it write-protected.

So there is some rate-limiting of dirty pages, but I do not believe
that we've ever really *serialized* writes.

>> (b) can cause some really nasty latency issues
>
> True, but I think the most common case is a file being opened, written start
> to finish and then closed. Actually, the worst-handled thing I've seen is a
> shell script appending a bunch of things to a file because ->flush() syncs the
> file each time it is closed:-/
>
> What would you recommend instead? I'm currently trying and keep track of what
> needs to be written so that I only write what's changed to the server, rather
> than writing only whole pages.

I don't think that what you are doing is necessarily wrong, I'm just
pointing out that you may still see mmap'ed pages being modified
concurrently with the actual IO, and that this will potentially mean
(for example) that things like checksums won't be reliably unless you
do the checksum as you copy the data to a network packet or something.

Of course, if that kind of inconsistency happens, a later write-back
will also happen, and eventually fix it. So the server may see
temporarily 'wrong' data, but it won't be final.

I just hope that the inconsistency isn't fatal to the afs client or
server code. For example, if you retry writes forever when a checksum
were to not match the data, that would be bad.

And yes, this can be

(a) really hard to trigger in practice

(b) very hard to debug due to a combination of very specific timing
and behavior.

so I just wanted to bring this up as a potential issue, not
necessarily as a big problem.

Linus

From 1585080059653432227@xxx Sat Nov 25 22:49:02 +0000 2017
X-GM-THRID: 1584957627109406089
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-25 22:49:02

by Dave Chinner

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

On Sat, Nov 25, 2017 at 10:35:43PM +0000, David Howells wrote:
> Linus Torvalds <[email protected]> wrote:
>
> > So I see in the commit message why afs needs to do this, but it's
> > worth pointing out that it's
> >
> > (a) impossible to avoid the "inconsistent data" case for writable mmap'ed
> > pages
>
> Doesn't clear_page_dirty_for_io() write-protect the PTE for the page to be
> written out, in which case ->page_mkwrite() will get called again before the
> page is redirtied?

Yes, but page_mkwrite will only block on writeback in progress is if
the backing device says it needs stable pages. See
wait_for_stable_page(). e.g. stable pages are required if RAID is
in use, otherwise modification during IO can result in broken
on-disk parity/mirroring....

Cheers,

Dave.
--
Dave Chinner
[email protected]

From 1585079273454583556@xxx Sat Nov 25 22:36:32 +0000 2017
X-GM-THRID: 1584957627109406089
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-25 22:36:32

by David Howells

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

Linus Torvalds <[email protected]> wrote:

> So I see in the commit message why afs needs to do this, but it's
> worth pointing out that it's
>
> (a) impossible to avoid the "inconsistent data" case for writable mmap'ed
> pages

Doesn't clear_page_dirty_for_io() write-protect the PTE for the page to be
written out, in which case ->page_mkwrite() will get called again before the
page is redirtied?

> (b) can cause some really nasty latency issues

True, but I think the most common case is a file being opened, written start
to finish and then closed. Actually, the worst-handled thing I've seen is a
shell script appending a bunch of things to a file because ->flush() syncs the
file each time it is closed:-/

What would you recommend instead? I'm currently trying and keep track of what
needs to be written so that I only write what's changed to the server, rather
than writing only whole pages.

David

From 1585062261113906907@xxx Sat Nov 25 18:06:08 +0000 2017
X-GM-THRID: 1584957627109406089
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread

2017-11-25 18:06:08

by Linus Torvalds

[permalink] [raw]
Subject: Re: [GIT PULL] afs: Fixes

On Fri, Nov 24, 2017 at 4:22 AM, David Howells <[email protected]> wrote:
>
> (2) Don't write to a page that's being written out, but wait for it to
> complete.

So I see in the commit message why afs needs to do this, but it's
worth pointing out that it's

(a) impossible to avoid the "inconsistent data" case for writable mmap'ed pages

(b) can cause some really nasty latency issues

So the "page->private" issue does mean that write-vs-writeback clearly
needs that serialization (and that's not an issue for the mapped page
being changed at the same time), but in general filesystem people
should be aware that data under writeback may still be further dirtied
while the writeback is active (and the page marked dirty again). It
can obviously screw with things like checksums etc, and make for
temporarily inconsistent state if the filesystem does those kinds of
things.

Linus

From 1584957627109406089@xxx Fri Nov 24 14:23:01 +0000 2017
X-GM-THRID: 1584957627109406089
X-Gmail-Labels: Inbox,Category Forums,HistoricalUnread