2023-09-25 09:15:36

by Jens Axboe

[permalink] [raw]
Subject: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

overlayfs copies the kiocb flags when it sets up a new kiocb to handle
a write, but it doesn't properly support dealing with the deferred
caller completions of the kiocb. This means it doesn't get the final
write completion value, and hence will complete the write with '0' as
the result.

We could support the caller completions in overlayfs, but for now let's
just disable them in the generated write kiocb.

Reported-by: Zorro Lang <[email protected]>
Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
Fixes: 8c052fb3002e ("iomap: support IOCB_DIO_CALLER_COMP")
Signed-off-by: Jens Axboe <[email protected]>

---

diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4193633c4c7a..693971d20280 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -391,6 +391,12 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
if (!ovl_should_sync(OVL_FS(inode->i_sb)))
ifl &= ~(IOCB_DSYNC | IOCB_SYNC);

+ /*
+ * Overlayfs doesn't support deferred completions, don't copy
+ * this property in case it is set by the issuer.
+ */
+ ifl &= ~IOCB_DIO_CALLER_COMP;
+
old_cred = ovl_override_creds(file_inode(file)->i_sb);
if (is_sync_kiocb(iocb)) {
file_start_write(real.file);

--
Jens Axboe



2023-09-25 09:44:56

by Amir Goldstein

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

On Mon, Sep 25, 2023 at 9:21 AM Jens Axboe <[email protected]> wrote:
>
> overlayfs copies the kiocb flags when it sets up a new kiocb to handle
> a write, but it doesn't properly support dealing with the deferred
> caller completions of the kiocb. This means it doesn't get the final
> write completion value, and hence will complete the write with '0' as
> the result.
>
> We could support the caller completions in overlayfs, but for now let's
> just disable them in the generated write kiocb.
>
> Reported-by: Zorro Lang <[email protected]>
> Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
> Fixes: 8c052fb3002e ("iomap: support IOCB_DIO_CALLER_COMP")
> Signed-off-by: Jens Axboe <[email protected]>
>

Thanks for fixing this Jens!
If you or Christian want to send this fix to Linus, you have my ACK.

On the bright side, I am glad that you are aware of the overlayfs
"kiocb_clone" use case, which delegates/forwards the io request to
another file in another fs.

I have already posted an RFC [1] for moving this functionality to
common vfs code. My main goal was to expose it to other filesystem
(fuse), but a very desired side effect is that this functionality gets
more vfs reviewer eyes and then the chances of catching a regression
like this one during review of vfs changes hopefully increases.

As for test coverage, I need to check why my tests did not catch
this - I suspect fsx may not have been rebuilt with io_uring support,
but not sure (not near workstation atm).

If you would like to add overlayfs to your test coverage, as Zorro
explained, it is as simple as running ./check -overlay with your
existing fstests config.
./check -overlay is a relatively faster test run because many of the
tests do _notrun on overlayfs.
I don't have to tell you that io_uring code will end up running on
overlayfs in many container workloads, so it is not a niche setup.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/[email protected]/

> ---
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 4193633c4c7a..693971d20280 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -391,6 +391,12 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> if (!ovl_should_sync(OVL_FS(inode->i_sb)))
> ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
>
> + /*
> + * Overlayfs doesn't support deferred completions, don't copy
> + * this property in case it is set by the issuer.
> + */
> + ifl &= ~IOCB_DIO_CALLER_COMP;
> +
> old_cred = ovl_override_creds(file_inode(file)->i_sb);
> if (is_sync_kiocb(iocb)) {
> file_start_write(real.file);
>
> --
> Jens Axboe
>
>

2023-09-25 09:52:17

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

> No problem - and thanks, maybe Christian can pick this one up? I

Snatched it I've got a pile I need to send to Linus this week anyway.
(Thanks for the Cc, Amir!)

2023-09-25 09:54:19

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

On Mon, 25 Sep 2023 00:21:35 -0600, Jens Axboe wrote:
> overlayfs copies the kiocb flags when it sets up a new kiocb to handle
> a write, but it doesn't properly support dealing with the deferred
> caller completions of the kiocb. This means it doesn't get the final
> write completion value, and hence will complete the write with '0' as
> the result.
>
> We could support the caller completions in overlayfs, but for now let's
> just disable them in the generated write kiocb.
>
> [...]

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] ovl: disable IOCB_DIO_CALLER_COMP
https://git.kernel.org/vfs/vfs/c/2d1b3bbc3dd5

2023-09-25 10:44:16

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

On 9/25/23 3:18 AM, Amir Goldstein wrote:
> On Mon, Sep 25, 2023 at 9:21?AM Jens Axboe <[email protected]> wrote:
>>
>> overlayfs copies the kiocb flags when it sets up a new kiocb to handle
>> a write, but it doesn't properly support dealing with the deferred
>> caller completions of the kiocb. This means it doesn't get the final
>> write completion value, and hence will complete the write with '0' as
>> the result.
>>
>> We could support the caller completions in overlayfs, but for now let's
>> just disable them in the generated write kiocb.
>>
>> Reported-by: Zorro Lang <[email protected]>
>> Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
>> Fixes: 8c052fb3002e ("iomap: support IOCB_DIO_CALLER_COMP")
>> Signed-off-by: Jens Axboe <[email protected]>
>>
>
> Thanks for fixing this Jens!
> If you or Christian want to send this fix to Linus, you have my ACK.

No problem - and thanks, maybe Christian can pick this one up? I
tentatively queued it up here just so I don't forget it:

https://git.kernel.dk/cgit/linux/log/?h=ovl-kiocb

> On the bright side, I am glad that you are aware of the overlayfs
> "kiocb_clone" use case, which delegates/forwards the io request to
> another file in another fs.
>
> I have already posted an RFC [1] for moving this functionality to
> common vfs code. My main goal was to expose it to other filesystem
> (fuse), but a very desired side effect is that this functionality gets
> more vfs reviewer eyes and then the chances of catching a regression
> like this one during review of vfs changes hopefully increases.

Ah that's great! Yeah it's a bit hidden in there if you don't know about
it, and I did grep today when writing this patch to ensure we didn't
have any others like it. So I think we're good for now, at least.

> As for test coverage, I need to check why my tests did not catch
> this - I suspect fsx may not have been rebuilt with io_uring support,
> but not sure (not near workstation atm).

I'm guessing it's because you don't have liburing installed on the test
box, then fsx etc don't get built with io_uring support in xfstests.

> If you would like to add overlayfs to your test coverage, as Zorro
> explained, it is as simple as running ./check -overlay with your
> existing fstests config.
> ./check -overlay is a relatively faster test run because many of the
> tests do _notrun on overlayfs.
> I don't have to tell you that io_uring code will end up running on
> overlayfs in many container workloads, so it is not a niche setup.

Will add it to the mix! Thanks for the details.

--
Jens Axboe

2023-09-25 17:21:00

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

On Sep 25, 2023, at 11:39 AM, Christian Brauner <[email protected]> wrote:
>
> 
>>
>> No problem - and thanks, maybe Christian can pick this one up? I
>
> Snatched it I've got a pile I need to send to Linus this week anyway.
> (Thanks for the Cc, Amir!)

Perfect, thanks!

2023-09-26 19:35:41

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

On 9/26/23 9:23 AM, Zorro Lang wrote:
> On Mon, Sep 25, 2023 at 12:21:35AM -0600, Jens Axboe wrote:
>> overlayfs copies the kiocb flags when it sets up a new kiocb to handle
>> a write, but it doesn't properly support dealing with the deferred
>> caller completions of the kiocb. This means it doesn't get the final
>> write completion value, and hence will complete the write with '0' as
>> the result.
>>
>> We could support the caller completions in overlayfs, but for now let's
>> just disable them in the generated write kiocb.
>>
>> Reported-by: Zorro Lang <[email protected]>
>> Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
>> Fixes: 8c052fb3002e ("iomap: support IOCB_DIO_CALLER_COMP")
>> Signed-off-by: Jens Axboe <[email protected]>
>>
>> ---
>
> Thanks Jens, the fstests generic/617 works on latest linux kernel with
> this patch now.

Excellent, thanks for re-testing.

--
Jens Axboe

2023-09-28 10:49:47

by Zorro Lang

[permalink] [raw]
Subject: Re: [PATCH] ovl: disable IOCB_DIO_CALLER_COMP

On Mon, Sep 25, 2023 at 12:21:35AM -0600, Jens Axboe wrote:
> overlayfs copies the kiocb flags when it sets up a new kiocb to handle
> a write, but it doesn't properly support dealing with the deferred
> caller completions of the kiocb. This means it doesn't get the final
> write completion value, and hence will complete the write with '0' as
> the result.
>
> We could support the caller completions in overlayfs, but for now let's
> just disable them in the generated write kiocb.
>
> Reported-by: Zorro Lang <[email protected]>
> Link: https://lore.kernel.org/io-uring/20230924142754.ejwsjen5pvyc32l4@dell-per750-06-vm-08.rhts.eng.pek2.redhat.com/
> Fixes: 8c052fb3002e ("iomap: support IOCB_DIO_CALLER_COMP")
> Signed-off-by: Jens Axboe <[email protected]>
>
> ---

Thanks Jens, the fstests generic/617 works on latest linux kernel with this
patch now.

>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 4193633c4c7a..693971d20280 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -391,6 +391,12 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> if (!ovl_should_sync(OVL_FS(inode->i_sb)))
> ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
>
> + /*
> + * Overlayfs doesn't support deferred completions, don't copy
> + * this property in case it is set by the issuer.
> + */
> + ifl &= ~IOCB_DIO_CALLER_COMP;
> +
> old_cred = ovl_override_creds(file_inode(file)->i_sb);
> if (is_sync_kiocb(iocb)) {
> file_start_write(real.file);
>
> --
> Jens Axboe
>
>
>