2022-06-14 04:04:27

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

Dominique Martinet wrote on Tue, Jun 14, 2022 at 12:38:02PM +0900:
> cached operations sometimes need to do invalid operations (e.g. read
> on a write only file)
> Historic fscache had added a "writeback fid" for this, but the conversion
> to new fscache somehow lost usage of it: use the writeback fid instead
> of normal one.
>
> Note that the way this works (writeback fid being linked to inode) means
> we might use overprivileged fid for some operations, e.g. write as root
> when we shouldn't.
> Ideally we should keep both fids handy, and only use the writeback fid
> when really required e.g. reads to a write-only file to fill in the page
> cache (read-modify-write); but this is the situation we've always had
> and this commit only fixes an issue we've had for too long.
>
> Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching")
> Cc: [email protected]
> Cc: David Howells <[email protected]>
> Reported-By: Christian Schoenebeck <[email protected]>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---
> Ok so finally had time to look at this, and it's not a lot so this is
> the most straight forward way to do: just reverting to how the old
> fscache worked.
>
> This appears to work from quick testing, Chiristian could you test it?
>
> I think the warnings you added in p9_client_read/write that check
> fid->mode might a lot of sense, if you care to resend it as
> WARN_ON((fid->mode & ACCMODE) == O_xyz);
> instead I'll queue that for 5.20
>
>
> @Stable people, I've checked it applies to 5.17 and 5.18 so should be
> good to grab once I submit it for inclusion (that commit was included in
> 5.16, which is no longer stable)
>
>
> fs/9p/vfs_addr.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index 7382c5227e94..262968d02f55 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
> */
> static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
> {
> - struct p9_fid *fid = file->private_data;
> + struct inode *inode = file_inode(file);
> + struct v9fs_inode *v9inode = V9FS_I(inode);
> + struct p9_fid *fid = v9inode->writeback_fid;
> +

Sorry for mails back-to-back (grmbl I hate git commit --amend not
warning I only have unstaged changes), this is missing the following
here:

+ /* If there is no writeback fid this file only ever has had
+ * read-only opens, so we can use file's fid which should
+ * always be set instead */
+ if (!fid)
+ fid = file->private_data;

Christian, you can find it here to test:
https://github.com/martinetd/linux/commit/a6e033c41cc9f0ec105f5d208b0a820118e2bda8

> + BUG_ON(!fid);
>
> p9_fid_get(fid);
> rreq->netfs_priv = fid;

Thanks,
--
Dominique


2022-06-14 13:30:39

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

On Dienstag, 14. Juni 2022 05:41:40 CEST Dominique Martinet wrote:
> Dominique Martinet wrote on Tue, Jun 14, 2022 at 12:38:02PM +0900:
> > cached operations sometimes need to do invalid operations (e.g. read
> > on a write only file)
> > Historic fscache had added a "writeback fid" for this, but the conversion
> > to new fscache somehow lost usage of it: use the writeback fid instead
> > of normal one.
> >
> > Note that the way this works (writeback fid being linked to inode) means
> > we might use overprivileged fid for some operations, e.g. write as root
> > when we shouldn't.
> > Ideally we should keep both fids handy, and only use the writeback fid
> > when really required e.g. reads to a write-only file to fill in the page
> > cache (read-modify-write); but this is the situation we've always had
> > and this commit only fixes an issue we've had for too long.
> >
> > Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do
> > reads and caching") Cc: [email protected]
> > Cc: David Howells <[email protected]>
> > Reported-By: Christian Schoenebeck <[email protected]>
> > Signed-off-by: Dominique Martinet <[email protected]>
> > ---
> > Ok so finally had time to look at this, and it's not a lot so this is
> > the most straight forward way to do: just reverting to how the old
> > fscache worked.
> >
> > This appears to work from quick testing, Chiristian could you test it?
> >
> > I think the warnings you added in p9_client_read/write that check
> > fid->mode might a lot of sense, if you care to resend it as
> > WARN_ON((fid->mode & ACCMODE) == O_xyz);
> > instead I'll queue that for 5.20
> >
> >
> > @Stable people, I've checked it applies to 5.17 and 5.18 so should be
> > good to grab once I submit it for inclusion (that commit was included in
> > 5.16, which is no longer stable)
> >
> > fs/9p/vfs_addr.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> > index 7382c5227e94..262968d02f55 100644
> > --- a/fs/9p/vfs_addr.c
> > +++ b/fs/9p/vfs_addr.c
> > @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest
> > *subreq)>
> > */
> >
> > static int v9fs_init_request(struct netfs_io_request *rreq, struct file
> > *file) {
> >
> > - struct p9_fid *fid = file->private_data;
> > + struct inode *inode = file_inode(file);
> > + struct v9fs_inode *v9inode = V9FS_I(inode);
> > + struct p9_fid *fid = v9inode->writeback_fid;
> > +
>
> Sorry for mails back-to-back (grmbl I hate git commit --amend not
> warning I only have unstaged changes), this is missing the following
> here:

I think git does actually. It shows you staged and unstaged changes as comment
below the commit log text inside the editor. Not as a big fat warning, but the
info is there.

> + /* If there is no writeback fid this file only ever has had
> + * read-only opens, so we can use file's fid which should
> + * always be set instead */
> + if (!fid)
> + fid = file->private_data;
>
> Christian, you can find it here to test:
> https://github.com/martinetd/linux/commit/a6e033c41cc9f0ec105f5d208b0a820118
> e2bda8
> > + BUG_ON(!fid);
> >
> > p9_fid_get(fid);
> > rreq->netfs_priv = fid;

It definitely goes into the right direction, but I think it's going a bit too
far by using writeback_fid also in cases where it is not necessary and wasn't
used before in the past.

What about something like this in v9fs_init_request() (yet untested):

/* writeback_fid is always opened O_RDWR (instead of just O_WRONLY)
* explicitly for this case: partial write backs that require a read
* prior to actual write and therefore requires a fid with read
* capability.
*/
if (rreq->origin == NETFS_READ_FOR_WRITE)
fid = v9inode->writeback_fid;

If desired, this could be further constrained later on like:

if (rreq->origin == NETFS_READ_FOR_WRITE &&
(fid->mode & O_ACCMODE) == O_WRONLY)
{
fid = v9inode->writeback_fid;
}

I will definitely give these options some test spins here, a short feedback
ahead would be appreciated though.

Thanks Dominique!

Best regards,
Christian Schoenebeck


2022-06-14 13:36:48

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

Christian Schoenebeck wrote on Tue, Jun 14, 2022 at 02:10:01PM +0200:
> It definitely goes into the right direction, but I think it's going a bit too
> far by using writeback_fid also in cases where it is not necessary and wasn't
> used before in the past.

Would help if I had an idea of what was used where in the past.. :)

From a quick look at the code, checking out v5.10,
v9fs_vfs_writepage_locked() used the writeback fid always for all writes
v9fs_vfs_readpages is a bit more complex but only seems to be using the
"direct" private_data fid for reads...
It took me a bit of time but I think the reads you were seeing on
writeback fid come from v9fs_write_begin that does some readpage on the
writeback fid to populate the page before a non-filling write happens.

> What about something like this in v9fs_init_request() (yet untested):
>
> /* writeback_fid is always opened O_RDWR (instead of just O_WRONLY)
> * explicitly for this case: partial write backs that require a read
> * prior to actual write and therefore requires a fid with read
> * capability.
> */
> if (rreq->origin == NETFS_READ_FOR_WRITE)
> fid = v9inode->writeback_fid;

... Which seems to be exactly what this origin is about, so if that
works I'm all for it.

> If desired, this could be further constrained later on like:
>
> if (rreq->origin == NETFS_READ_FOR_WRITE &&
> (fid->mode & O_ACCMODE) == O_WRONLY)
> {
> fid = v9inode->writeback_fid;
> }

That also makes sense, if the fid mode has read permissions we might as
well use these as the writeback fid would needlessly be doing root IOs.

> I will definitely give these options some test spins here, a short feedback
> ahead would be appreciated though.

Please let me know how that works out, I'd be happy to use either of
your versions instead of mine.
If I can be greedy though I'd like to post it together with the other
couple of fixes next week, so having something before the end of the
week would be great -- I think even my first overkill version early and
building on it would make sense at this point.

But I think you've got the right end, so hopefully won't be needing to
delay


Cheers,
--
Dominique

2022-06-14 14:38:56

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

On Dienstag, 14. Juni 2022 14:45:38 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Tue, Jun 14, 2022 at 02:10:01PM +0200:
> > It definitely goes into the right direction, but I think it's going a bit
> > too far by using writeback_fid also in cases where it is not necessary
> > and wasn't used before in the past.
>
> Would help if I had an idea of what was used where in the past.. :)
>
> From a quick look at the code, checking out v5.10,
> v9fs_vfs_writepage_locked() used the writeback fid always for all writes
> v9fs_vfs_readpages is a bit more complex but only seems to be using the
> "direct" private_data fid for reads...
> It took me a bit of time but I think the reads you were seeing on
> writeback fid come from v9fs_write_begin that does some readpage on the
> writeback fid to populate the page before a non-filling write happens.

Yes, the overall picture in the past was not clear to me either.

To be more specific, I was reading your patch as if it would e.g. also use the
writeback_fid if somebody explicitly called read() (i.e. not an implied read
caused by a partial write back), and was concerned about a potential privilege
escalation. Maybe it's just a theoretical issue, as this case is probably
already catched on a higher, general fs handling level, but worth
consideration.

> > What about something like this in v9fs_init_request() (yet untested):
> > /* writeback_fid is always opened O_RDWR (instead of just O_WRONLY)
> >
> > * explicitly for this case: partial write backs that require a read
> > * prior to actual write and therefore requires a fid with read
> > * capability.
> > */
> >
> > if (rreq->origin == NETFS_READ_FOR_WRITE)
> >
> > fid = v9inode->writeback_fid;
>
> ... Which seems to be exactly what this origin is about, so if that
> works I'm all for it.
>
> > If desired, this could be further constrained later on like:
> > if (rreq->origin == NETFS_READ_FOR_WRITE &&
> >
> > (fid->mode & O_ACCMODE) == O_WRONLY)
> >
> > {
> >
> > fid = v9inode->writeback_fid;
> >
> > }
>
> That also makes sense, if the fid mode has read permissions we might as
> well use these as the writeback fid would needlessly be doing root IOs.
>
> > I will definitely give these options some test spins here, a short
> > feedback
> > ahead would be appreciated though.
>
> Please let me know how that works out, I'd be happy to use either of
> your versions instead of mine.
> If I can be greedy though I'd like to post it together with the other
> couple of fixes next week, so having something before the end of the
> week would be great -- I think even my first overkill version early and
> building on it would make sense at this point.
>
> But I think you've got the right end, so hopefully won't be needing to
> delay

I need a day or two for testing, then I will report back for sure. So it
should perfectly fit into your intended schedule.

Thanks!

Best regards,
Christian Schoenebeck


2022-06-16 13:39:11

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

On Dienstag, 14. Juni 2022 16:11:35 CEST Christian Schoenebeck wrote:
> On Dienstag, 14. Juni 2022 14:45:38 CEST Dominique Martinet wrote:
[...]
> > Please let me know how that works out, I'd be happy to use either of
> > your versions instead of mine.
> > If I can be greedy though I'd like to post it together with the other
> > couple of fixes next week, so having something before the end of the
> > week would be great -- I think even my first overkill version early and
> > building on it would make sense at this point.
> >
> > But I think you've got the right end, so hopefully won't be needing to
> > delay
>
> I need a day or two for testing, then I will report back for sure. So it
> should perfectly fit into your intended schedule.

Two things:

1. your EBADF patch is based on you recent get/put refactoring patch, so it won't apply on stable.

2. I fixed the conflict and gave your patch a test spin, and it triggers
the BUG_ON(!fid); that you added with that patch. Backtrace based on
30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."):

[ 2.211473] kernel BUG at fs/9p/vfs_addr.c:65!
...
[ 2.244415] netfs_alloc_request (fs/netfs/objects.c:42) netfs
[ 2.245438] netfs_readahead (fs/netfs/buffered_read.c:166) netfs
[ 2.246392] read_pages (./include/linux/pagemap.h:1264 ./include/linux/pagemap.h:1306 mm/readahead.c:164)
[ 2.247120] ? folio_add_lru (./arch/x86/include/asm/preempt.h:103 mm/swap.c:468)
[ 2.247911] page_cache_ra_unbounded (./include/linux/fs.h:808 mm/readahead.c:264)
[ 2.248875] filemap_get_pages (mm/filemap.c:2594)
[ 2.249723] filemap_read (mm/filemap.c:2679)
[ 2.250478] ? ptep_set_access_flags (./arch/x86/include/asm/paravirt.h:441 arch/x86/mm/pgtable.c:493)
[ 2.251417] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186)
[ 2.252253] ? do_wp_page (mm/memory.c:3293 mm/memory.c:3393)
[ 2.253012] ? aa_file_perm (security/apparmor/file.c:604)
[ 2.253824] new_sync_read (fs/read_write.c:402 (discriminator 1))
[ 2.254616] vfs_read (fs/read_write.c:482)
[ 2.255313] ksys_read (fs/read_write.c:620)
[ 2.256000] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 2.256764] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)

Did your patch work there for you? I mean I have not applied the other pending
9p patches, but they should not really make difference, right? I won't have
time today, but I will continue to look at it tomorrow. If you already had
some thoughts on this, that would be great of course.

Best regards,
Christian Schoenebeck


2022-06-16 14:25:33

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 03:35:59PM +0200:
> 2. I fixed the conflict and gave your patch a test spin, and it triggers
> the BUG_ON(!fid); that you added with that patch. Backtrace based on
> 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."):

hm, that's probably the version I sent without the fallback to
private_data fid if writeback fid was sent (I've only commented without
sending a v2)

> 1. your EBADF patch is based on you recent get/put refactoring patch, so it won't apply on stable.

ugh, you are correct, that was wrong as well in the version I sent by
mail... I've hurried that way too much.

The patch that's currently on the tip of my 9p-next branch should be
alright though, I'll resend it now so you can apply cleanly if you don't
want to fetch https://github.com/martinetd/linux/commits/9p-next

> Did your patch work there for you? I mean I have not applied the other pending
> 9p patches, but they should not really make difference, right? I won't have
> time today, but I will continue to look at it tomorrow. If you already had
> some thoughts on this, that would be great of course.

Yes, my version passes basic tests at least, and I could no longer
reproduce the problem.

Without the if (!fid) fid = file->private_data though it does fail
horribly like you've found out.

--
Dominique

2022-06-16 14:35:52

by Dominique Martinet

[permalink] [raw]
Subject: [PATCH v2] 9p: fix EBADF errors in cached mode

cached operations sometimes need to do invalid operations (e.g. read
on a write only file)
Historic fscache had added a "writeback fid" for this, but the conversion
to new fscache somehow lost usage of it: use the writeback fid instead
of normal one.

Note that the way this works (writeback fid being linked to inode) means
we might use overprivileged fid for some operations, e.g. write as root
when we shouldn't.
Ideally we should keep both fids handy, and only use the writeback fid
when really required e.g. reads to a write-only file to fill in the page
cache (read-modify-write); but this is the situation we've always had
and this commit only fixes an issue we've had for too long.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching")
Cc: [email protected]
Cc: David Howells <[email protected]>
Reported-By: Christian Schoenebeck <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
---
fs/9p/vfs_addr.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index a8f512b44a85..7f924e671e3e 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -58,7 +58,17 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
*/
static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
{
- struct p9_fid *fid = file->private_data;
+ struct inode *inode = file_inode(file);
+ struct v9fs_inode *v9inode = V9FS_I(inode);
+ struct p9_fid *fid = v9inode->writeback_fid;
+
+ /* If there is no writeback fid this file only ever has had
+ * read-only opens, so we can use file's fid which should
+ * always be set instead */
+ if (!fid)
+ fid = file->private_data;
+
+ BUG_ON(!fid);

refcount_inc(&fid->count);
rreq->netfs_priv = fid;
--
2.35.1

2022-06-16 15:12:35

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

Dominique Martinet wrote on Thu, Jun 16, 2022 at 10:51:31PM +0900:
> > Did your patch work there for you? I mean I have not applied the other pending
> > 9p patches, but they should not really make difference, right? I won't have
> > time today, but I will continue to look at it tomorrow. If you already had
> > some thoughts on this, that would be great of course.
>
> Yes, my version passes basic tests at least, and I could no longer
> reproduce the problem.

For what it's worth I've also tested a version of your patch:

-----
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index a8f512b44a85..d0833fa69faf 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
*/
static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
{
+ struct inode *inode = file_inode(file);
+ struct v9fs_inode *v9inode = V9FS_I(inode);
struct p9_fid *fid = file->private_data;

+ BUG_ON(!fid);
+
+ /* we might need to read from a fid that was opened write-only
+ * for read-modify-write of page cache, use the writeback fid
+ * for that */
+ if (rreq->origin == NETFS_READ_FOR_WRITE &&
+ (fid->mode & O_ACCMODE) == O_WRONLY) {
+ fid = v9inode->writeback_fid;
+ BUG_ON(!fid);
+ }
+
refcount_inc(&fid->count);
rreq->netfs_priv = fid;
return 0;
-----

And this also seems to work alright.

I was about to ask why the original code did writes with the writeback
fid, but I'm noticing now the current code still does (through
v9fs_vfs_write_folio_locked()), so that part hasn't changed from the old
code, and init_request will only be getting reads? Which actually makes
sense now I'm thinking about it because I recall David saying he's
working on netfs writes now...

So that minimal version is probably what we want, give or take style
adjustments (only initializing inode/v9inode in the if case or not) -- I
sure hope compilers optimizes it away when not needed.


I'll let you test one or both versions and will fixup the commit message
again/credit you/resend if we go with this version, unless you want to
send it.

--
Dominique

2022-06-16 20:19:03

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

On Donnerstag, 16. Juni 2022 15:51:31 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 03:35:59PM +0200:
> > 2. I fixed the conflict and gave your patch a test spin, and it triggers
> > the BUG_ON(!fid); that you added with that patch. Backtrace based on
>
> > 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."):
> hm, that's probably the version I sent without the fallback to
> private_data fid if writeback fid was sent (I've only commented without
> sending a v2)

Right, I forgot that you queued another version, sorry. With your already
queued patch (today's v2) that's fine now.

On Donnerstag, 16. Juni 2022 16:11:16 CEST Dominique Martinet wrote:
> Dominique Martinet wrote on Thu, Jun 16, 2022 at 10:51:31PM +0900:
> > > Did your patch work there for you? I mean I have not applied the other
> > > pending 9p patches, but they should not really make difference, right?
> > > I won't have time today, but I will continue to look at it tomorrow. If
> > > you already had some thoughts on this, that would be great of course.
> >
> > Yes, my version passes basic tests at least, and I could no longer
> > reproduce the problem.
>
> For what it's worth I've also tested a version of your patch:
>
> -----
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index a8f512b44a85..d0833fa69faf 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest
> *subreq) */
> static int v9fs_init_request(struct netfs_io_request *rreq, struct file
> *file) {
> + struct inode *inode = file_inode(file);
> + struct v9fs_inode *v9inode = V9FS_I(inode);
> struct p9_fid *fid = file->private_data;
>
> + BUG_ON(!fid);
> +
> + /* we might need to read from a fid that was opened write-only
> + * for read-modify-write of page cache, use the writeback fid
> + * for that */
> + if (rreq->origin == NETFS_READ_FOR_WRITE &&
> + (fid->mode & O_ACCMODE) == O_WRONLY) {
> + fid = v9inode->writeback_fid;
> + BUG_ON(!fid);
> + }
> +
> refcount_inc(&fid->count);
> rreq->netfs_priv = fid;
> return 0;
> -----
>
> And this also seems to work alright.
>
> I was about to ask why the original code did writes with the writeback
> fid, but I'm noticing now the current code still does (through
> v9fs_vfs_write_folio_locked()), so that part hasn't changed from the old
> code, and init_request will only be getting reads? Which actually makes
> sense now I'm thinking about it because I recall David saying he's
> working on netfs writes now...
>
> So that minimal version is probably what we want, give or take style
> adjustments (only initializing inode/v9inode in the if case or not) -- I
> sure hope compilers optimizes it away when not needed.
>
>
> I'll let you test one or both versions and will fixup the commit message
> again/credit you/resend if we go with this version, unless you want to
> send it.
>
> --
> Dominique

I tested all 3 variants today, and they were all behaving correctly (no EBADF
errors anymore, no other side effects observed).

The minimalistic version (i.e. your initial suggestion) performed 20% slower
in my tests, but that could be due to the fact that it was simply the 1st
version I tested, so caching on host side might be the reason. If necessary I
can check the performance aspect more thoroughly.

Personally I would at least use the NETFS_READ_FOR_WRITE version, but that's
up to you. On doubt, clarify with David's plans.

Feel free to add my RB and TB tags to any of the 3 version(s) you end up
queuing:

Reviewed-by: Christian Schoenebeck <[email protected]>
Tested-by: Christian Schoenebeck <[email protected]>

Best regards,
Christian Schoenebeck


2022-06-16 21:01:17

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH] 9p: fix EBADF errors in cached mode

Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 10:14:16PM +0200:
> I tested all 3 variants today, and they were all behaving correctly (no EBADF
> errors anymore, no other side effects observed).

Thanks!

> The minimalistic version (i.e. your initial suggestion) performed 20% slower
> in my tests, but that could be due to the fact that it was simply the 1st
> version I tested, so caching on host side might be the reason. If necessary I
> can check the performance aspect more thoroughly.

hmm, yeah we open the writeback fids anyway so I'm not sure what would
be really different performance-wise, but I'd tend to go with the most
restricted change anyway.

> Personally I would at least use the NETFS_READ_FOR_WRITE version, but that's
> up to you. On doubt, clarify with David's plans.
>
> Feel free to add my RB and TB tags to any of the 3 version(s) you end up
> queuing:
>
> Reviewed-by: Christian Schoenebeck <[email protected]>
> Tested-by: Christian Schoenebeck <[email protected]>

Thanks, I'll add these and resend the last version for archival on the
list / commit message wording check.

At last that issue closed...
--
Dominique

2022-06-16 21:32:54

by Dominique Martinet

[permalink] [raw]
Subject: [PATCH v3] 9p: fix EBADF errors in cached mode

cached operations sometimes need to do invalid operations (e.g. read
on a write only file)
Historic fscache had added a "writeback fid", a special handle opened
RW as root, for this. The conversion to new fscache missed that bit.

This commit reinstates a slightly lesser variant of the original code
that uses the writeback fid for partial pages backfills if the regular
user fid had been open as WRONLY, and thus would lack read permissions.

Link: https://lkml.kernel.org/r/[email protected]
Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching")
Cc: [email protected]
Cc: David Howells <[email protected]>
Reported-By: Christian Schoenebeck <[email protected]>
Reviewed-by: Christian Schoenebeck <[email protected]>
Tested-by: Christian Schoenebeck <[email protected]>
Signed-off-by: Dominique Martinet <[email protected]>
---
v3: use the least permissive version of the patch that only uses
writeback fid when really required

If no problem shows up by then I'll post this patch around Wed 23 (next
week) with the other stable fixes.

fs/9p/vfs_addr.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
index a8f512b44a85..d0833fa69faf 100644
--- a/fs/9p/vfs_addr.c
+++ b/fs/9p/vfs_addr.c
@@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
*/
static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file)
{
+ struct inode *inode = file_inode(file);
+ struct v9fs_inode *v9inode = V9FS_I(inode);
struct p9_fid *fid = file->private_data;

+ BUG_ON(!fid);
+
+ /* we might need to read from a fid that was opened write-only
+ * for read-modify-write of page cache, use the writeback fid
+ * for that */
+ if (rreq->origin == NETFS_READ_FOR_WRITE &&
+ (fid->mode & O_ACCMODE) == O_WRONLY) {
+ fid = v9inode->writeback_fid;
+ BUG_ON(!fid);
+ }
+
refcount_inc(&fid->count);
rreq->netfs_priv = fid;
return 0;
--
2.35.1

2022-06-20 12:55:04

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [PATCH v3] 9p: fix EBADF errors in cached mode

On Donnerstag, 16. Juni 2022 23:10:25 CEST Dominique Martinet wrote:
> cached operations sometimes need to do invalid operations (e.g. read
> on a write only file)
> Historic fscache had added a "writeback fid", a special handle opened
> RW as root, for this. The conversion to new fscache missed that bit.
>
> This commit reinstates a slightly lesser variant of the original code
> that uses the writeback fid for partial pages backfills if the regular
> user fid had been open as WRONLY, and thus would lack read permissions.
>
> Link:
> https://lkml.kernel.org/r/[email protected]
> Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads
> and caching") Cc: [email protected]
> Cc: David Howells <[email protected]>
> Reported-By: Christian Schoenebeck <[email protected]>
> Reviewed-by: Christian Schoenebeck <[email protected]>
> Tested-by: Christian Schoenebeck <[email protected]>
> Signed-off-by: Dominique Martinet <[email protected]>
> ---
> v3: use the least permissive version of the patch that only uses
> writeback fid when really required
>
> If no problem shows up by then I'll post this patch around Wed 23 (next
> week) with the other stable fixes.
>
> fs/9p/vfs_addr.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c
> index a8f512b44a85..d0833fa69faf 100644
> --- a/fs/9p/vfs_addr.c
> +++ b/fs/9p/vfs_addr.c
> @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest
> *subreq) */
> static int v9fs_init_request(struct netfs_io_request *rreq, struct file
> *file) {
> + struct inode *inode = file_inode(file);
> + struct v9fs_inode *v9inode = V9FS_I(inode);
> struct p9_fid *fid = file->private_data;
>
> + BUG_ON(!fid);
> +
> + /* we might need to read from a fid that was opened write-only
> + * for read-modify-write of page cache, use the writeback fid
> + * for that */
> + if (rreq->origin == NETFS_READ_FOR_WRITE &&
> + (fid->mode & O_ACCMODE) == O_WRONLY) {
> + fid = v9inode->writeback_fid;
> + BUG_ON(!fid);
> + }
> +
> refcount_inc(&fid->count);
> rreq->netfs_priv = fid;
> return 0;

Some more tests this weekend; all looks fine. It appears that this also fixed
the performance degradation that I reported early in this thread. Again,
benchmarks compiling a bunch of sources:

Case Linux kernel version msize cache duration (average)

A) EBADF fix only [1] 512000 loose 31m 14s
B) EBADF fix only [1] 512000 mmap 44m 1s
C) EBADF fix + clunk fixes [2] 512000 loose 29m 32s
D) EBADF fix + clunk fixes [2] 512000 mmap 44m 0s
E) 5.10.84 512000 loose 35m 5s
F) 5.10.84 512000 mmap 65m 5s

[1] 5.19.0-rc2 + EBADF fix v3 patch (alone):
https://lore.kernel.org/lkml/[email protected]/

[2] 5.19.0-rc2 + EBADF fix v3 patch + clunk fix patches, a.k.a. 9p-next:
https://github.com/martinetd/linux/commit/b0017602fdf6bd3f344dd49eaee8b6ffeed6dbac

Conclusion: all thumbs in my possession pointing upwards. :)

Thanks Dominique!

Best regards,
Christian Schoenebeck


2022-06-20 20:56:36

by Dominique Martinet

[permalink] [raw]
Subject: Re: [PATCH v3] 9p: fix EBADF errors in cached mode

Christian Schoenebeck wrote on Mon, Jun 20, 2022 at 02:47:24PM +0200:
> Some more tests this weekend; all looks fine. It appears that this also fixed
> the performance degradation that I reported early in this thread.

wow, I wouldn't have expected the EBADF fix patch to have any impact on
performance. Maybe the build just behaved differently enough to take
more time with the errors?

> Again, benchmarks compiling a bunch of sources:
>
> Case Linux kernel version msize cache duration (average)
>
> A) EBADF fix only [1] 512000 loose 31m 14s
> B) EBADF fix only [1] 512000 mmap 44m 1s
> C) EBADF fix + clunk fixes [2] 512000 loose 29m 32s
> D) EBADF fix + clunk fixes [2] 512000 mmap 44m 0s
> E) 5.10.84 512000 loose 35m 5s
> F) 5.10.84 512000 mmap 65m 5s
>
> [1] 5.19.0-rc2 + EBADF fix v3 patch (alone):
> https://lore.kernel.org/lkml/[email protected]/
>
> [2] 5.19.0-rc2 + EBADF fix v3 patch + clunk fix patches, a.k.a. 9p-next:
> https://github.com/martinetd/linux/commit/b0017602fdf6bd3f344dd49eaee8b6ffeed6dbac
>
> Conclusion: all thumbs in my possession pointing upwards. :)
>
> Thanks Dominique!

Great news, thanks for the tests! :)

--
Dominique

2022-06-21 12:18:55

by Christian Schoenebeck

[permalink] [raw]
Subject: Re: [PATCH v3] 9p: fix EBADF errors in cached mode

On Montag, 20. Juni 2022 22:34:38 CEST Dominique Martinet wrote:
> Christian Schoenebeck wrote on Mon, Jun 20, 2022 at 02:47:24PM +0200:
> > Some more tests this weekend; all looks fine. It appears that this also
> > fixed the performance degradation that I reported early in this thread.
>
> wow, I wouldn't have expected the EBADF fix patch to have any impact on
> performance. Maybe the build just behaved differently enough to take
> more time with the errors?

Maybe. It could also be less overhead using writeback_fid vs. dedicated fid,
i.e. no walking and fid cloning required when just using the writeback_fid
which is already there (reduced latency).

Probably also overall reduced total amount of fids might have some (smaller)
impact, as on QEMU 9p server side we still have a simple linked list for fids
which is iterated on each fid lookup. A proc-like interface for statistics
(e.g. max. amount of fids) would be useful.

But honestly, all these things still don't really explain to me such a
difference from performance PoV in regards to this patch, as the particular
case handled by this patch does not appear to happen often.

Anyway, my plan is to identify performance bottlenecks in general more
analytically this year. Now that we have macOS support for 9p in QEMU, I'll
probably use Xcode's "Instruments" tool which really has a great way to
graphically investigate complex performance aspects in a very intuitive and
customizable way, which goes beyond standard profiling. Then I can hunt down
performance issues by weight.

Best regards,
Christian Schoenebeck