2020-08-31 15:34:08

by Rich Felker

[permalink] [raw]
Subject: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

The pwrite function, originally defined by POSIX (thus the "p"), is
defined to ignore O_APPEND and write at the offset passed as its
argument. However, historically Linux honored O_APPEND if set and
ignored the offset. This cannot be changed due to stability policy,
but is documented in the man page as a bug.

Now that there's a pwritev2 syscall providing a superset of the pwrite
functionality that has a flags argument, the conforming behavior can
be offered to userspace via a new flag. Since pwritev2 checks flag
validity (in kiocb_set_rw_flags) and reports unknown ones with
EOPNOTSUPP, callers will not get wrong behavior on old kernels that
don't support the new flag; the error is reported and the caller can
decide how to handle it.

Signed-off-by: Rich Felker <[email protected]>
---

Changes in v2: I've added a check to ensure that RWF_NOAPPEND does not
override O_APPEND for S_APPEND (chattr +a) inodes, and fixed conflicts
with 1752f0adea98ef85, which optimized kiocb_set_rw_flags to work with
a local copy of flags. Unfortunately the same optimization does not
work for RWF_NOAPPEND since it needs to remove flags from the original
set at function entry.

If desired, I could further change this so that kiocb_flags is
initialized to ki->ki_flags, with assignment-back in place of |= at
the end of the function. This would allow the same local variable
pattern in the RWF_NOAPPEND code path, which might be more elegant,
but I'm not sure if the emitted code would improve or get worse.


include/linux/fs.h | 7 +++++++
include/uapi/linux/fs.h | 5 ++++-
2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7519ae003a08..924e17ac8e7e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3321,6 +3321,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
return 0;
if (unlikely(flags & ~RWF_SUPPORTED))
return -EOPNOTSUPP;
+ if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
+ return -EINVAL;

if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
@@ -3335,6 +3337,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
kiocb_flags |= (IOCB_DSYNC | IOCB_SYNC);
if (flags & RWF_APPEND)
kiocb_flags |= IOCB_APPEND;
+ if ((flags & RWF_NOAPPEND) && (ki->ki_flags & IOCB_APPEND)) {
+ if (IS_APPEND(file_inode(ki->ki_filp)))
+ return -EPERM;
+ ki->ki_flags &= ~IOCB_APPEND;
+ }

ki->ki_flags |= kiocb_flags;
return 0;
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index f44eb0a04afd..d5e54e0742cf 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -300,8 +300,11 @@ typedef int __bitwise __kernel_rwf_t;
/* per-IO O_APPEND */
#define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)

+/* per-IO negation of O_APPEND */
+#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020)
+
/* mask of flags supported by the kernel */
#define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
- RWF_APPEND)
+ RWF_APPEND | RWF_NOAPPEND)

#endif /* _UAPI_LINUX_FS_H */
--
2.21.0


2020-08-31 15:48:24

by Jann Horn

[permalink] [raw]
Subject: Re: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

On Mon, Aug 31, 2020 at 5:32 PM Rich Felker <[email protected]> wrote:
> The pwrite function, originally defined by POSIX (thus the "p"), is
> defined to ignore O_APPEND and write at the offset passed as its
> argument. However, historically Linux honored O_APPEND if set and
> ignored the offset. This cannot be changed due to stability policy,
> but is documented in the man page as a bug.
>
> Now that there's a pwritev2 syscall providing a superset of the pwrite
> functionality that has a flags argument, the conforming behavior can
> be offered to userspace via a new flag. Since pwritev2 checks flag
> validity (in kiocb_set_rw_flags) and reports unknown ones with
> EOPNOTSUPP, callers will not get wrong behavior on old kernels that
> don't support the new flag; the error is reported and the caller can
> decide how to handle it.
>
> Signed-off-by: Rich Felker <[email protected]>

Reviewed-by: Jann Horn <[email protected]>

Note that if this lands, Michael Kerrisk will probably be happy if you
send a corresponding patch for the manpage man2/readv.2.

Btw, I'm not really sure whose tree this should go through - VFS is
normally Al Viro's turf, but it looks like the most recent
modifications to this function have gone through Jens Axboe's tree?

> ---
>
> Changes in v2: I've added a check to ensure that RWF_NOAPPEND does not
> override O_APPEND for S_APPEND (chattr +a) inodes, and fixed conflicts
> with 1752f0adea98ef85, which optimized kiocb_set_rw_flags to work with
> a local copy of flags. Unfortunately the same optimization does not
> work for RWF_NOAPPEND since it needs to remove flags from the original
> set at function entry.
>
> If desired, I could further change this so that kiocb_flags is
> initialized to ki->ki_flags, with assignment-back in place of |= at
> the end of the function. This would allow the same local variable
> pattern in the RWF_NOAPPEND code path, which might be more elegant,
> but I'm not sure if the emitted code would improve or get worse.
>
>
> include/linux/fs.h | 7 +++++++
> include/uapi/linux/fs.h | 5 ++++-
> 2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7519ae003a08..924e17ac8e7e 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3321,6 +3321,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
> return 0;
> if (unlikely(flags & ~RWF_SUPPORTED))
> return -EOPNOTSUPP;
> + if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
> + return -EINVAL;
>
> if (flags & RWF_NOWAIT) {
> if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
> @@ -3335,6 +3337,11 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
> kiocb_flags |= (IOCB_DSYNC | IOCB_SYNC);
> if (flags & RWF_APPEND)
> kiocb_flags |= IOCB_APPEND;
> + if ((flags & RWF_NOAPPEND) && (ki->ki_flags & IOCB_APPEND)) {
> + if (IS_APPEND(file_inode(ki->ki_filp)))
> + return -EPERM;
> + ki->ki_flags &= ~IOCB_APPEND;
> + }
>
> ki->ki_flags |= kiocb_flags;
> return 0;
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index f44eb0a04afd..d5e54e0742cf 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -300,8 +300,11 @@ typedef int __bitwise __kernel_rwf_t;
> /* per-IO O_APPEND */
> #define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)
>
> +/* per-IO negation of O_APPEND */
> +#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020)
> +
> /* mask of flags supported by the kernel */
> #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> - RWF_APPEND)
> + RWF_APPEND | RWF_NOAPPEND)
>
> #endif /* _UAPI_LINUX_FS_H */
> --
> 2.21.0
>

2020-08-31 20:27:57

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

On 8/31/20 9:46 AM, Jann Horn wrote:
> On Mon, Aug 31, 2020 at 5:32 PM Rich Felker <[email protected]> wrote:
>> The pwrite function, originally defined by POSIX (thus the "p"), is
>> defined to ignore O_APPEND and write at the offset passed as its
>> argument. However, historically Linux honored O_APPEND if set and
>> ignored the offset. This cannot be changed due to stability policy,
>> but is documented in the man page as a bug.
>>
>> Now that there's a pwritev2 syscall providing a superset of the pwrite
>> functionality that has a flags argument, the conforming behavior can
>> be offered to userspace via a new flag. Since pwritev2 checks flag
>> validity (in kiocb_set_rw_flags) and reports unknown ones with
>> EOPNOTSUPP, callers will not get wrong behavior on old kernels that
>> don't support the new flag; the error is reported and the caller can
>> decide how to handle it.
>>
>> Signed-off-by: Rich Felker <[email protected]>
>
> Reviewed-by: Jann Horn <[email protected]>
>
> Note that if this lands, Michael Kerrisk will probably be happy if you
> send a corresponding patch for the manpage man2/readv.2.
>
> Btw, I'm not really sure whose tree this should go through - VFS is
> normally Al Viro's turf, but it looks like the most recent
> modifications to this function have gone through Jens Axboe's tree?

Should probably go through Al's tree, I've only carried them when
they've been associated with io_uring in some shape or form.

--
Jens Axboe

2024-01-18 15:57:45

by Rich Felker

[permalink] [raw]
Subject: Re: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

On Mon, Aug 31, 2020 at 11:05:34AM -0600, Jens Axboe wrote:
> On 8/31/20 9:46 AM, Jann Horn wrote:
> > On Mon, Aug 31, 2020 at 5:32 PM Rich Felker <[email protected]> wrote:
> >> The pwrite function, originally defined by POSIX (thus the "p"), is
> >> defined to ignore O_APPEND and write at the offset passed as its
> >> argument. However, historically Linux honored O_APPEND if set and
> >> ignored the offset. This cannot be changed due to stability policy,
> >> but is documented in the man page as a bug.
> >>
> >> Now that there's a pwritev2 syscall providing a superset of the pwrite
> >> functionality that has a flags argument, the conforming behavior can
> >> be offered to userspace via a new flag. Since pwritev2 checks flag
> >> validity (in kiocb_set_rw_flags) and reports unknown ones with
> >> EOPNOTSUPP, callers will not get wrong behavior on old kernels that
> >> don't support the new flag; the error is reported and the caller can
> >> decide how to handle it.
> >>
> >> Signed-off-by: Rich Felker <[email protected]>
> >
> > Reviewed-by: Jann Horn <[email protected]>
> >
> > Note that if this lands, Michael Kerrisk will probably be happy if you
> > send a corresponding patch for the manpage man2/readv.2.
> >
> > Btw, I'm not really sure whose tree this should go through - VFS is
> > normally Al Viro's turf, but it looks like the most recent
> > modifications to this function have gone through Jens Axboe's tree?
>
> Should probably go through Al's tree, I've only carried them when
> they've been associated with io_uring in some shape or form.

This appears to have slipped through the cracks. Do I need to send an
updated rebase of it? Were there any objections to it I missed?

Rich

2024-01-18 16:02:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

On 1/18/24 8:57 AM, Rich Felker wrote:
> On Mon, Aug 31, 2020 at 11:05:34AM -0600, Jens Axboe wrote:
>> On 8/31/20 9:46 AM, Jann Horn wrote:
>>> On Mon, Aug 31, 2020 at 5:32 PM Rich Felker <[email protected]> wrote:
>>>> The pwrite function, originally defined by POSIX (thus the "p"), is
>>>> defined to ignore O_APPEND and write at the offset passed as its
>>>> argument. However, historically Linux honored O_APPEND if set and
>>>> ignored the offset. This cannot be changed due to stability policy,
>>>> but is documented in the man page as a bug.
>>>>
>>>> Now that there's a pwritev2 syscall providing a superset of the pwrite
>>>> functionality that has a flags argument, the conforming behavior can
>>>> be offered to userspace via a new flag. Since pwritev2 checks flag
>>>> validity (in kiocb_set_rw_flags) and reports unknown ones with
>>>> EOPNOTSUPP, callers will not get wrong behavior on old kernels that
>>>> don't support the new flag; the error is reported and the caller can
>>>> decide how to handle it.
>>>>
>>>> Signed-off-by: Rich Felker <[email protected]>
>>>
>>> Reviewed-by: Jann Horn <[email protected]>
>>>
>>> Note that if this lands, Michael Kerrisk will probably be happy if you
>>> send a corresponding patch for the manpage man2/readv.2.
>>>
>>> Btw, I'm not really sure whose tree this should go through - VFS is
>>> normally Al Viro's turf, but it looks like the most recent
>>> modifications to this function have gone through Jens Axboe's tree?
>>
>> Should probably go through Al's tree, I've only carried them when
>> they've been associated with io_uring in some shape or form.
>
> This appears to have slipped through the cracks. Do I need to send an
> updated rebase of it? Were there any objections to it I missed?

Let's add Christian.

--
Jens Axboe



2024-01-19 14:36:18

by Christian Brauner

[permalink] [raw]
Subject: Re: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

On Mon, 31 Aug 2020 11:32:08 -0400, Rich Felker wrote:
> The pwrite function, originally defined by POSIX (thus the "p"), is
> defined to ignore O_APPEND and write at the offset passed as its
> argument. However, historically Linux honored O_APPEND if set and
> ignored the offset. This cannot be changed due to stability policy,
> but is documented in the man page as a bug.
>
> Now that there's a pwritev2 syscall providing a superset of the pwrite
> functionality that has a flags argument, the conforming behavior can
> be offered to userspace via a new flag. Since pwritev2 checks flag
> validity (in kiocb_set_rw_flags) and reports unknown ones with
> EOPNOTSUPP, callers will not get wrong behavior on old kernels that
> don't support the new flag; the error is reported and the caller can
> decide how to handle it.
>
> [...]

The RWF_* and IOCB_* flags were
aligned so they could be set in one operation. So there was a merge
conflict when applying. I've resolved it. Please take a look and make
sure that it's all correct.

---

Applied to the vfs.misc branch of the vfs/vfs.git tree.
Patches in the vfs.misc branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.misc

[1/1] vfs: add RWF_NOAPPEND flag for pwritev2
https://git.kernel.org/vfs/vfs/c/31081ab305a1

2024-01-19 19:10:56

by Rich Felker

[permalink] [raw]
Subject: Re: [PATCH v2] vfs: add RWF_NOAPPEND flag for pwritev2

On Fri, Jan 19, 2024 at 03:33:32PM +0100, Christian Brauner wrote:
> On Mon, 31 Aug 2020 11:32:08 -0400, Rich Felker wrote:
> > The pwrite function, originally defined by POSIX (thus the "p"), is
> > defined to ignore O_APPEND and write at the offset passed as its
> > argument. However, historically Linux honored O_APPEND if set and
> > ignored the offset. This cannot be changed due to stability policy,
> > but is documented in the man page as a bug.
> >
> > Now that there's a pwritev2 syscall providing a superset of the pwrite
> > functionality that has a flags argument, the conforming behavior can
> > be offered to userspace via a new flag. Since pwritev2 checks flag
> > validity (in kiocb_set_rw_flags) and reports unknown ones with
> > EOPNOTSUPP, callers will not get wrong behavior on old kernels that
> > don't support the new flag; the error is reported and the caller can
> > decide how to handle it.
> >
> > [...]
>
> The RWF_* and IOCB_* flags were
> aligned so they could be set in one operation. So there was a merge
> conflict when applying. I've resolved it. Please take a look and make
> sure that it's all correct.
>
> ---
>
> Applied to the vfs.misc branch of the vfs/vfs.git tree.
> Patches in the vfs.misc branch should appear in linux-next soon.
>
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
>
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
>
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
>
> tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.misc
>
> [1/1] vfs: add RWF_NOAPPEND flag for pwritev2
> https://git.kernel.org/vfs/vfs/c/31081ab305a1

LGTM. Thanks!