2020-01-24 00:18:04

by Rich Felker

[permalink] [raw]
Subject: Proposal to fix pwrite with O_APPEND via pwritev2 flag

There's a longstanding unfixable (due to API stability) bug in the
pwrite syscall:

http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS

whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
offset. Now that there's a pwritev2 syscall that takes a flags
argument, it's possible to fix this without breaking stability by
adding a new RWF_NOAPPEND flag, which callers that want the fixed
behavior can then pass.

I have a completely untested patch to add such a flag, but would like
to get a feel for whether the concept is acceptable before putting
time into testing it. If so, I'll submit this as a proper patch with
detailed commit message etc. Draft is below.

Rich



diff --git a/include/linux/fs.h b/include/linux/fs.h
index e0d909d35763..3a769a972f79 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3397,6 +3397,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
{
if (unlikely(flags & ~RWF_SUPPORTED))
return -EOPNOTSUPP;
+ if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
+ return -EINVAL;

if (flags & RWF_NOWAIT) {
if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
@@ -3411,6 +3413,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
if (flags & RWF_APPEND)
ki->ki_flags |= IOCB_APPEND;
+ if (flags & RWF_NOAPPEND)
+ ki->ki_flags &= ~IOCB_APPEND;
return 0;
}

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 379a612f8f1d..591357d9b3c9 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t;
/* per-IO O_APPEND */
#define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)

+/* per-IO negation of O_APPEND */
+#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020)
+
/* mask of flags supported by the kernel */
#define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
- RWF_APPEND)
+ RWF_APPEND | RWF_NOAPPEND)

#endif /* _UAPI_LINUX_FS_H */


2020-01-24 09:38:58

by Florian Weimer

[permalink] [raw]
Subject: Re: Proposal to fix pwrite with O_APPEND via pwritev2 flag

* Rich Felker:

> There's a longstanding unfixable (due to API stability) bug in the
> pwrite syscall:
>
> http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS
>
> whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
> offset. Now that there's a pwritev2 syscall that takes a flags
> argument, it's possible to fix this without breaking stability by
> adding a new RWF_NOAPPEND flag, which callers that want the fixed
> behavior can then pass.
>
> I have a completely untested patch to add such a flag, but would like
> to get a feel for whether the concept is acceptable before putting
> time into testing it. If so, I'll submit this as a proper patch with
> detailed commit message etc. Draft is below.

Has this come up before?

I had already written a test case and it turns out that an O_APPEND
descriptor does not protect the previously written data in the file:

openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
write(3, "@", 1) = 1
close(3) = 0
openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_WRONLY|O_APPEND) = 3
ftruncate(3, 0) = 0

So at least it looks like there is no security issue in adding a
RWF_NOAPPEND flag.

Thanks,
Florian

2020-01-24 14:38:19

by Rich Felker

[permalink] [raw]
Subject: Re: Proposal to fix pwrite with O_APPEND via pwritev2 flag

On Fri, Jan 24, 2020 at 10:37:22AM +0100, Florian Weimer wrote:
> * Rich Felker:
>
> > There's a longstanding unfixable (due to API stability) bug in the
> > pwrite syscall:
> >
> > http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS
> >
> > whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
> > offset. Now that there's a pwritev2 syscall that takes a flags
> > argument, it's possible to fix this without breaking stability by
> > adding a new RWF_NOAPPEND flag, which callers that want the fixed
> > behavior can then pass.
> >
> > I have a completely untested patch to add such a flag, but would like
> > to get a feel for whether the concept is acceptable before putting
> > time into testing it. If so, I'll submit this as a proper patch with
> > detailed commit message etc. Draft is below.
>
> Has this come up before?

I'm not sure if there's an open glibc bug for it or not, but it's come
up in musl community before that the kernel is non-conforming here for
historical reasons (preserving the original bug in case any software
is depending on it) and we've always wanted to have a fix, but
couldn't find one short of just erroring out if O_APPEND is set when
pwrite is called. That's what the fallback will do (rather than
silently write data at the wrong place) if pwritev2+RWF_NOAPPEND is
not supported on the system at runtime.

> I had already written a test case and it turns out that an O_APPEND
> descriptor does not protect the previously written data in the file:
>
> openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_RDWR|O_CREAT|O_EXCL, 0600) = 3
> write(3, "@", 1) = 1
> close(3) = 0
> openat(AT_FDCWD, "/tmp/append-truncateuoRexJ", O_WRONLY|O_APPEND) = 3
> ftruncate(3, 0) = 0
>
> So at least it looks like there is no security issue in adding a
> RWF_NOAPPEND flag.

Indeed, if you have the file open you can just use fcntl to remove
O_APPEND (but of course using that in an emulation would be racy), so
it's not a security boundary. Someone could try to "make it into one"
with seccomp, blocking fcntl that would remove O_APPEND and blocking
ftruncate, mmap, and all other ways you could modify the existing part
of the file, but that sounds fragile, and if they really want to do
that they can block pwritev2 as well (or at least block it with
RWF_NOAPPEND or future/unknown flags).

Rich

2020-02-05 04:45:12

by Rich Felker

[permalink] [raw]
Subject: Re: Proposal to fix pwrite with O_APPEND via pwritev2 flag

On Thu, Jan 23, 2020 at 07:02:43PM -0500, Rich Felker wrote:
> There's a longstanding unfixable (due to API stability) bug in the
> pwrite syscall:
>
> http://man7.org/linux/man-pages/man2/pwrite.2.html#BUGS
>
> whereby it wrongly honors O_APPEND if set, ignoring the caller-passed
> offset. Now that there's a pwritev2 syscall that takes a flags
> argument, it's possible to fix this without breaking stability by
> adding a new RWF_NOAPPEND flag, which callers that want the fixed
> behavior can then pass.
>
> I have a completely untested patch to add such a flag, but would like
> to get a feel for whether the concept is acceptable before putting
> time into testing it. If so, I'll submit this as a proper patch with
> detailed commit message etc. Draft is below.

I went ahead and tested this, and it works as intended, so I'll post a
proper patch with commit message.

Rich



> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e0d909d35763..3a769a972f79 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3397,6 +3397,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
> {
> if (unlikely(flags & ~RWF_SUPPORTED))
> return -EOPNOTSUPP;
> + if (unlikely((flags & RWF_APPEND) && (flags & RWF_NOAPPEND)))
> + return -EINVAL;
>
> if (flags & RWF_NOWAIT) {
> if (!(ki->ki_filp->f_mode & FMODE_NOWAIT))
> @@ -3411,6 +3413,8 @@ static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
> ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
> if (flags & RWF_APPEND)
> ki->ki_flags |= IOCB_APPEND;
> + if (flags & RWF_NOAPPEND)
> + ki->ki_flags &= ~IOCB_APPEND;
> return 0;
> }
>
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index 379a612f8f1d..591357d9b3c9 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -299,8 +299,11 @@ typedef int __bitwise __kernel_rwf_t;
> /* per-IO O_APPEND */
> #define RWF_APPEND ((__force __kernel_rwf_t)0x00000010)
>
> +/* per-IO negation of O_APPEND */
> +#define RWF_NOAPPEND ((__force __kernel_rwf_t)0x00000020)
> +
> /* mask of flags supported by the kernel */
> #define RWF_SUPPORTED (RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
> - RWF_APPEND)
> + RWF_APPEND | RWF_NOAPPEND)
>
> #endif /* _UAPI_LINUX_FS_H */