If an operation's flag `needs_file` is set, the function
io_req_set_file() calls io_file_get() to obtain a `struct file*`.
This fails for `O_PATH` file descriptors, because those have no
`struct file*`, causing io_req_set_file() to return `-EBADF`. This
breaks the operations `openat`, `openat2` and `statx`, where `O_PATH`
file descriptors are commonly used.
The solution is to simply remove `needs_file` (and the accompanying
flag `fd_non_neg`). This flag was never needed because those
operations use the numeric file descriptor and don't use the `struct
file*` obtained by io_req_set_file().
Signed-off-by: Max Kellermann <[email protected]>
Cc: [email protected]
---
fs/io_uring.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index a46de2cfc28e..d24f8e33323c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -693,8 +693,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1,
},
[IORING_OP_OPENAT] = {
- .needs_file = 1,
- .fd_non_neg = 1,
.file_table = 1,
.needs_fs = 1,
},
@@ -708,8 +706,6 @@ static const struct io_op_def io_op_defs[] = {
},
[IORING_OP_STATX] = {
.needs_mm = 1,
- .needs_file = 1,
- .fd_non_neg = 1,
.needs_fs = 1,
},
[IORING_OP_READ] = {
@@ -739,8 +735,6 @@ static const struct io_op_def io_op_defs[] = {
.unbound_nonreg_file = 1,
},
[IORING_OP_OPENAT2] = {
- .needs_file = 1,
- .fd_non_neg = 1,
.file_table = 1,
.needs_fs = 1,
},
--
2.20.1
On 5/7/20 12:57 PM, Max Kellermann wrote:
> If an operation's flag `needs_file` is set, the function
> io_req_set_file() calls io_file_get() to obtain a `struct file*`.
>
> This fails for `O_PATH` file descriptors, because those have no
> `struct file*`, causing io_req_set_file() to return `-EBADF`. This
> breaks the operations `openat`, `openat2` and `statx`, where `O_PATH`
> file descriptors are commonly used.
>
> The solution is to simply remove `needs_file` (and the accompanying
> flag `fd_non_neg`). This flag was never needed because those
> operations use the numeric file descriptor and don't use the `struct
> file*` obtained by io_req_set_file().
Do you happen to have a liburing test addition for this as well?
--
Jens Axboe
On 2020/05/07 20:58, Jens Axboe <[email protected]> wrote:
> Do you happen to have a liburing test addition for this as well?
No, I'll write one tomorrow. GitHub PR or email preferred?
Max
On Thu, May 07, 2020 at 08:57:25PM +0200, Max Kellermann wrote:
> If an operation's flag `needs_file` is set, the function
> io_req_set_file() calls io_file_get() to obtain a `struct file*`.
>
> This fails for `O_PATH` file descriptors, because those have no
> `struct file*`
O_PATH descriptors most certainly *do* have that. What the hell
are you talking about?
On 5/7/20 1:01 PM, Al Viro wrote:
> On Thu, May 07, 2020 at 08:57:25PM +0200, Max Kellermann wrote:
>> If an operation's flag `needs_file` is set, the function
>> io_req_set_file() calls io_file_get() to obtain a `struct file*`.
>>
>> This fails for `O_PATH` file descriptors, because those have no
>> `struct file*`
>
> O_PATH descriptors most certainly *do* have that. What the hell
> are you talking about?
Yeah, hence I was interested in the test case. Since this is
bypassing that part, was assuming we'd have some logic error
that attempted a file grab for a case where we shouldn't.
--
Jens Axboe
On 2020/05/07 21:01, Al Viro <[email protected]> wrote:
> On Thu, May 07, 2020 at 08:57:25PM +0200, Max Kellermann wrote:
> > If an operation's flag `needs_file` is set, the function
> > io_req_set_file() calls io_file_get() to obtain a `struct file*`.
> >
> > This fails for `O_PATH` file descriptors, because those have no
> > `struct file*`
>
> O_PATH descriptors most certainly *do* have that. What the hell
> are you talking about?
Oh, then my patch description (and my understanding of the root
problem) is wrong. In my debugging session, io_file_get() on that fd
returned NULL, so I assumed O_PATH doesn't have that, but maybe there
are other reasons.
In any case, with a "real" fd, io_uring openat() succeeds, and my
patch makes the problem with O_PATH go away.
I guess I need to learn more about what happens inside io_file_get().
Max
On 2020/05/07 21:05, Jens Axboe <[email protected]> wrote:
> On 5/7/20 1:01 PM, Al Viro wrote:
> > On Thu, May 07, 2020 at 08:57:25PM +0200, Max Kellermann wrote:
> >> If an operation's flag `needs_file` is set, the function
> >> io_req_set_file() calls io_file_get() to obtain a `struct file*`.
> >>
> >> This fails for `O_PATH` file descriptors, because those have no
> >> `struct file*`
> >
> > O_PATH descriptors most certainly *do* have that. What the hell
> > are you talking about?
>
> Yeah, hence I was interested in the test case. Since this is
> bypassing that part, was assuming we'd have some logic error
> that attempted a file grab for a case where we shouldn't.
Reproduce this by patching liburing/test/lfs-openat.c:
- int dfd = open("/tmp", O_RDONLY | O_DIRECTORY);
+ int dfd = open("/tmp", O_PATH);
$ ./test/lfs-openat
io_uring openat failed: Bad file descriptor
GH PR: https://github.com/axboe/liburing/pull/130
Max
On Thu, May 07, 2020 at 01:05:23PM -0600, Jens Axboe wrote:
> On 5/7/20 1:01 PM, Al Viro wrote:
> > On Thu, May 07, 2020 at 08:57:25PM +0200, Max Kellermann wrote:
> >> If an operation's flag `needs_file` is set, the function
> >> io_req_set_file() calls io_file_get() to obtain a `struct file*`.
> >>
> >> This fails for `O_PATH` file descriptors, because those have no
> >> `struct file*`
> >
> > O_PATH descriptors most certainly *do* have that. What the hell
> > are you talking about?
>
> Yeah, hence I was interested in the test case. Since this is
> bypassing that part, was assuming we'd have some logic error
> that attempted a file grab for a case where we shouldn't.
Just in case - you do realize that you should either resolve the
descriptor yourself (and use the resulting struct file *, without
letting anyone even look at the descriptor) *or* pass the
descriptor as-is and don't even look at the descriptor table?
Once more, with feeling:
Descriptor tables are inherently sharable objects. You can't resolve
a descriptor twice and assume you'll get the same thing both times.
You can't insert something into descriptor table and assume that the
same slot will be holding the same struct file reference after
the descriptor table has been unlocked.
Again, resolving the descriptor more than once in course of syscall
is almost always a serious bug; there are very few exceptions and
none of the mentioned in that patch are anywhere near those.
IOW, that patch will either immediately break things on O_PATH
(if you are really passing struct file *) or it's probably correct,
but the reason is entirely different - it's that you are passing
descriptor, which gets resolved by whatever you are calling, in
which case io_uring has no business resolving it. And if that's
the case, you are limited to real descriptors - your descriptor
table lookalikes won't be of any use.
On 2020/05/07 21:29, Al Viro <[email protected]> wrote:
> Again, resolving the descriptor more than once in course of syscall
> is almost always a serious bug;
.. and that is what Linux currently does for those three operations,
and yes, it's buggy. The generic preparation code looks up the fd,
but later in the implementation, only the numeric fd is used.
My patch removes this duplication, by removing the first lookup.
Max
On 5/7/20 1:29 PM, Al Viro wrote:
> On Thu, May 07, 2020 at 01:05:23PM -0600, Jens Axboe wrote:
>> On 5/7/20 1:01 PM, Al Viro wrote:
>>> On Thu, May 07, 2020 at 08:57:25PM +0200, Max Kellermann wrote:
>>>> If an operation's flag `needs_file` is set, the function
>>>> io_req_set_file() calls io_file_get() to obtain a `struct file*`.
>>>>
>>>> This fails for `O_PATH` file descriptors, because those have no
>>>> `struct file*`
>>>
>>> O_PATH descriptors most certainly *do* have that. What the hell
>>> are you talking about?
>>
>> Yeah, hence I was interested in the test case. Since this is
>> bypassing that part, was assuming we'd have some logic error
>> that attempted a file grab for a case where we shouldn't.
>
> Just in case - you do realize that you should either resolve the
> descriptor yourself (and use the resulting struct file *, without
> letting anyone even look at the descriptor) *or* pass the
> descriptor as-is and don't even look at the descriptor table?
>
> Once more, with feeling:
>
> Descriptor tables are inherently sharable objects. You can't resolve
> a descriptor twice and assume you'll get the same thing both times.
> You can't insert something into descriptor table and assume that the
> same slot will be holding the same struct file reference after
> the descriptor table has been unlocked.
>
> Again, resolving the descriptor more than once in course of syscall
> is almost always a serious bug; there are very few exceptions and
> none of the mentioned in that patch are anywhere near those.
>
> IOW, that patch will either immediately break things on O_PATH
> (if you are really passing struct file *) or it's probably correct,
> but the reason is entirely different - it's that you are passing
> descriptor, which gets resolved by whatever you are calling, in
> which case io_uring has no business resolving it. And if that's
> the case, you are limited to real descriptors - your descriptor
> table lookalikes won't be of any use.
I think the patch is correct as-is, I took a good look at how we're
currently handling it. None of those three ops should fiddle with
the fd at all, and all of them do forbid the use of fixed files (the
descriptor table look-alikes), so that part is fine, too.
There's some low hanging fruit around optimizing and improving it,
I'm including an updated version below. Max, can you double check
with your testing?
diff --git a/fs/io_uring.c b/fs/io_uring.c
index dd680eb153cb..979d9f977409 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -680,8 +680,6 @@ struct io_op_def {
unsigned needs_mm : 1;
/* needs req->file assigned */
unsigned needs_file : 1;
- /* needs req->file assigned IFF fd is >= 0 */
- unsigned fd_non_neg : 1;
/* hash wq insertion if file is a regular file */
unsigned hash_reg_file : 1;
/* unbound wq insertion if file is a non-regular file */
@@ -784,8 +782,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1,
},
[IORING_OP_OPENAT] = {
- .needs_file = 1,
- .fd_non_neg = 1,
.file_table = 1,
.needs_fs = 1,
},
@@ -799,8 +795,6 @@ static const struct io_op_def io_op_defs[] = {
},
[IORING_OP_STATX] = {
.needs_mm = 1,
- .needs_file = 1,
- .fd_non_neg = 1,
.needs_fs = 1,
.file_table = 1,
},
@@ -837,8 +831,6 @@ static const struct io_op_def io_op_defs[] = {
.buffer_select = 1,
},
[IORING_OP_OPENAT2] = {
- .needs_file = 1,
- .fd_non_neg = 1,
.file_table = 1,
.needs_fs = 1,
},
@@ -5368,15 +5360,6 @@ static void io_wq_submit_work(struct io_wq_work **workptr)
io_steal_work(req, workptr);
}
-static int io_req_needs_file(struct io_kiocb *req, int fd)
-{
- if (!io_op_defs[req->opcode].needs_file)
- return 0;
- if ((fd == -1 || fd == AT_FDCWD) && io_op_defs[req->opcode].fd_non_neg)
- return 0;
- return 1;
-}
-
static inline struct file *io_file_from_index(struct io_ring_ctx *ctx,
int index)
{
@@ -5414,14 +5397,11 @@ static int io_file_get(struct io_submit_state *state, struct io_kiocb *req,
}
static int io_req_set_file(struct io_submit_state *state, struct io_kiocb *req,
- int fd, unsigned int flags)
+ int fd)
{
bool fixed;
- if (!io_req_needs_file(req, fd))
- return 0;
-
- fixed = (flags & IOSQE_FIXED_FILE);
+ fixed = (req->flags & REQ_F_FIXED_FILE) != 0;
if (unlikely(!fixed && req->needs_fixed_file))
return -EBADF;
@@ -5798,7 +5778,7 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
struct io_submit_state *state, bool async)
{
unsigned int sqe_flags;
- int id, fd;
+ int id;
/*
* All io need record the previous position, if LINK vs DARIN,
@@ -5850,8 +5830,10 @@ static int io_init_req(struct io_ring_ctx *ctx, struct io_kiocb *req,
IOSQE_ASYNC | IOSQE_FIXED_FILE |
IOSQE_BUFFER_SELECT | IOSQE_IO_LINK);
- fd = READ_ONCE(sqe->fd);
- return io_req_set_file(state, req, fd, sqe_flags);
+ if (!io_op_defs[req->opcode].needs_file)
+ return 0;
+
+ return io_req_set_file(state, req, READ_ONCE(sqe->fd));
}
static int io_submit_sqes(struct io_ring_ctx *ctx, unsigned int nr,
--
Jens Axboe
On Thu, May 07, 2020 at 02:53:30PM -0600, Jens Axboe wrote:
> I think the patch is correct as-is, I took a good look at how we're
> currently handling it. None of those three ops should fiddle with
> the fd at all, and all of them do forbid the use of fixed files (the
> descriptor table look-alikes), so that part is fine, too.
>
> There's some low hanging fruit around optimizing and improving it,
> I'm including an updated version below. Max, can you double check
> with your testing?
<looks>
Could you explain WTF is io_issue_sqe() doing in case of IORING_OP_CLOSE?
Specifically, what is the value of
req->close.fd = READ_ONCE(sqe->fd);
if (req->file->f_op == &io_uring_fops ||
req->close.fd == req->ctx->ring_fd)
return -EBADF;
in io_close_prep()? And what does happen if some joker does dup2()
of something with io_uring_fops into our slot right after that check?
Before the subsequent
ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
that is.
On 5/7/20 4:06 PM, Al Viro wrote:
> On Thu, May 07, 2020 at 02:53:30PM -0600, Jens Axboe wrote:
>
>> I think the patch is correct as-is, I took a good look at how we're
>> currently handling it. None of those three ops should fiddle with
>> the fd at all, and all of them do forbid the use of fixed files (the
>> descriptor table look-alikes), so that part is fine, too.
>>
>> There's some low hanging fruit around optimizing and improving it,
>> I'm including an updated version below. Max, can you double check
>> with your testing?
>
> <looks>
>
> Could you explain WTF is io_issue_sqe() doing in case of IORING_OP_CLOSE?
> Specifically, what is the value of
> req->close.fd = READ_ONCE(sqe->fd);
> if (req->file->f_op == &io_uring_fops ||
> req->close.fd == req->ctx->ring_fd)
> return -EBADF;
> in io_close_prep()? And what does happen if some joker does dup2()
> of something with io_uring_fops into our slot right after that check?
> Before the subsequent
> ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
> that is.
I agree, there's a gap there. We should do the check in the op handler,
and under the files_struct lock. How about something like the below?
diff --git a/fs/file.c b/fs/file.c
index c8a4e4c86e55..50ee73b76d17 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -646,18 +646,13 @@ int __close_fd(struct files_struct *files, unsigned fd)
}
EXPORT_SYMBOL(__close_fd); /* for ksys_close() */
-/*
- * variant of __close_fd that gets a ref on the file for later fput.
- * The caller must ensure that filp_close() called on the file, and then
- * an fput().
- */
-int __close_fd_get_file(unsigned int fd, struct file **res)
+int __close_fd_get_file_locked(struct files_struct *files, unsigned int fd,
+ struct file **res)
+ __releases(&files->file_lock)
{
- struct files_struct *files = current->files;
struct file *file;
struct fdtable *fdt;
- spin_lock(&files->file_lock);
fdt = files_fdtable(files);
if (fd >= fdt->max_fds)
goto out_unlock;
@@ -677,6 +672,19 @@ int __close_fd_get_file(unsigned int fd, struct file **res)
return -ENOENT;
}
+/*
+ * variant of __close_fd that gets a ref on the file for later fput.
+ * The caller must ensure that filp_close() called on the file, and then
+ * an fput().
+ */
+int __close_fd_get_file(unsigned int fd, struct file **res)
+{
+ struct files_struct *files = current->files;
+
+ spin_lock(&files->file_lock);
+ return __close_fd_get_file_locked(files, fd, res);
+}
+
void do_close_on_exec(struct files_struct *files)
{
unsigned i;
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 979d9f977409..740547106717 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -3399,10 +3399,6 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return -EBADF;
req->close.fd = READ_ONCE(sqe->fd);
- if (req->file->f_op == &io_uring_fops ||
- req->close.fd == req->ctx->ring_fd)
- return -EBADF;
-
return 0;
}
@@ -3430,10 +3426,19 @@ static void io_close_finish(struct io_wq_work **workptr)
static int io_close(struct io_kiocb *req, bool force_nonblock)
{
+ struct files_struct *files = current->files;
int ret;
req->close.put_file = NULL;
- ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
+ spin_lock(&files->file_lock);
+ if (req->file->f_op == &io_uring_fops ||
+ req->close.fd == req->ctx->ring_fd) {
+ spin_unlock(&files->file_lock);
+ return -EBADF;
+ }
+
+ ret = __close_fd_get_file_locked(files, req->close.fd,
+ &req->close.put_file);
if (ret < 0)
return ret;
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index f07c55ea0c22..11d19303af46 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -122,6 +122,8 @@ extern void __fd_install(struct files_struct *files,
extern int __close_fd(struct files_struct *files,
unsigned int fd);
extern int __close_fd_get_file(unsigned int fd, struct file **res);
+extern int __close_fd_get_file_locked(struct files_struct *files,
+ unsigned int fd, struct file **res);
extern struct kmem_cache *files_cachep;
--
Jens Axboe
On Thu, May 07, 2020 at 04:25:24PM -0600, Jens Axboe wrote:
> static int io_close(struct io_kiocb *req, bool force_nonblock)
> {
> + struct files_struct *files = current->files;
> int ret;
>
> req->close.put_file = NULL;
> - ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
> + spin_lock(&files->file_lock);
> + if (req->file->f_op == &io_uring_fops ||
> + req->close.fd == req->ctx->ring_fd) {
> + spin_unlock(&files->file_lock);
> + return -EBADF;
> + }
> +
> + ret = __close_fd_get_file_locked(files, req->close.fd,
> + &req->close.put_file);
Pointless. By that point req->file might have nothing in common with
anything in any descriptor table.
Al, carefully _not_ saying anything about the taste and style of the
entire thing...
On 5/7/20 4:44 PM, Al Viro wrote:
> On Thu, May 07, 2020 at 04:25:24PM -0600, Jens Axboe wrote:
>
>> static int io_close(struct io_kiocb *req, bool force_nonblock)
>> {
>> + struct files_struct *files = current->files;
>> int ret;
>>
>> req->close.put_file = NULL;
>> - ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
>> + spin_lock(&files->file_lock);
>> + if (req->file->f_op == &io_uring_fops ||
>> + req->close.fd == req->ctx->ring_fd) {
>> + spin_unlock(&files->file_lock);
>> + return -EBADF;
>> + }
>> +
>> + ret = __close_fd_get_file_locked(files, req->close.fd,
>> + &req->close.put_file);
>
> Pointless. By that point req->file might have nothing in common with
> anything in any descriptor table.
How about the below then? Stop using req->file, defer the lookup until
we're in the handler instead. Not sure the 'fd' check makes sense
at this point, but at least we should be consistent in terms of
when we look up the file and check the f_op.
> Al, carefully _not_ saying anything about the taste and style of the
> entire thing...
It's just a quickie, didn't put much concern into the style and naming
of the locked helper. What do you prefer there? Normally I'd do __,
but it's already that, so... There's only one other user of it, so
we could just make the regular one be close_fd_get_file() and use
the __ prefix for the new locked variant.
But I figured it was more important to get the details right first,
the style is easier to polish.
diff --git a/fs/file.c b/fs/file.c
index c8a4e4c86e55..50ee73b76d17 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -646,18 +646,13 @@ int __close_fd(struct files_struct *files, unsigned fd)
}
EXPORT_SYMBOL(__close_fd); /* for ksys_close() */
-/*
- * variant of __close_fd that gets a ref on the file for later fput.
- * The caller must ensure that filp_close() called on the file, and then
- * an fput().
- */
-int __close_fd_get_file(unsigned int fd, struct file **res)
+int __close_fd_get_file_locked(struct files_struct *files, unsigned int fd,
+ struct file **res)
+ __releases(&files->file_lock)
{
- struct files_struct *files = current->files;
struct file *file;
struct fdtable *fdt;
- spin_lock(&files->file_lock);
fdt = files_fdtable(files);
if (fd >= fdt->max_fds)
goto out_unlock;
@@ -677,6 +672,19 @@ int __close_fd_get_file(unsigned int fd, struct file **res)
return -ENOENT;
}
+/*
+ * variant of __close_fd that gets a ref on the file for later fput.
+ * The caller must ensure that filp_close() called on the file, and then
+ * an fput().
+ */
+int __close_fd_get_file(unsigned int fd, struct file **res)
+{
+ struct files_struct *files = current->files;
+
+ spin_lock(&files->file_lock);
+ return __close_fd_get_file_locked(files, fd, res);
+}
+
void do_close_on_exec(struct files_struct *files)
{
unsigned i;
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 979d9f977409..54ef10240bf3 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -786,7 +786,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_fs = 1,
},
[IORING_OP_CLOSE] = {
- .needs_file = 1,
.file_table = 1,
},
[IORING_OP_FILES_UPDATE] = {
@@ -3399,10 +3398,6 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return -EBADF;
req->close.fd = READ_ONCE(sqe->fd);
- if (req->file->f_op == &io_uring_fops ||
- req->close.fd == req->ctx->ring_fd)
- return -EBADF;
-
return 0;
}
@@ -3430,10 +3425,21 @@ static void io_close_finish(struct io_wq_work **workptr)
static int io_close(struct io_kiocb *req, bool force_nonblock)
{
+ struct files_struct *files = current->files;
+ struct file *file;
int ret;
req->close.put_file = NULL;
- ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
+ spin_lock(&files->file_lock);
+ if (req->close.fd == req->ctx->ring_fd)
+ goto badf;
+
+ file = fcheck_files(files, req->close.fd);
+ if (!file || file->f_op == &io_uring_fops)
+ goto badf;
+
+ ret = __close_fd_get_file_locked(files, req->close.fd,
+ &req->close.put_file);
if (ret < 0)
return ret;
@@ -3458,6 +3464,9 @@ static int io_close(struct io_kiocb *req, bool force_nonblock)
*/
__io_close_finish(req);
return 0;
+badf:
+ spin_unlock(&files->file_lock);
+ return -EBADF;
}
static int io_prep_sfr(struct io_kiocb *req, const struct io_uring_sqe *sqe)
diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h
index f07c55ea0c22..11d19303af46 100644
--- a/include/linux/fdtable.h
+++ b/include/linux/fdtable.h
@@ -122,6 +122,8 @@ extern void __fd_install(struct files_struct *files,
extern int __close_fd(struct files_struct *files,
unsigned int fd);
extern int __close_fd_get_file(unsigned int fd, struct file **res);
+extern int __close_fd_get_file_locked(struct files_struct *files,
+ unsigned int fd, struct file **res);
extern struct kmem_cache *files_cachep;
--
Jens Axboe
On Thu, May 07, 2020 at 05:03:17PM -0600, Jens Axboe wrote:
> On 5/7/20 4:44 PM, Al Viro wrote:
> > On Thu, May 07, 2020 at 04:25:24PM -0600, Jens Axboe wrote:
> >
> >> static int io_close(struct io_kiocb *req, bool force_nonblock)
> >> {
> >> + struct files_struct *files = current->files;
> >> int ret;
> >>
> >> req->close.put_file = NULL;
> >> - ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
> >> + spin_lock(&files->file_lock);
> >> + if (req->file->f_op == &io_uring_fops ||
> >> + req->close.fd == req->ctx->ring_fd) {
> >> + spin_unlock(&files->file_lock);
> >> + return -EBADF;
> >> + }
> >> +
> >> + ret = __close_fd_get_file_locked(files, req->close.fd,
> >> + &req->close.put_file);
> >
> > Pointless. By that point req->file might have nothing in common with
> > anything in any descriptor table.
>
> How about the below then? Stop using req->file, defer the lookup until
> we're in the handler instead. Not sure the 'fd' check makes sense
> at this point, but at least we should be consistent in terms of
> when we look up the file and check the f_op.
Actually, what _is_ the reason for that check? Note, BTW, that if the
file in question happens to be an AF_UNIX socket, closing it will
close all references held in SCM_RIGHTS datagrams sitting in its queue,
which might very well include io_uring files.
IOW, if it tries to avoid something really unpleasant, it's not enough.
And if it doesn't, then what is it for?
On 5/7/20 5:31 PM, Al Viro wrote:
> On Thu, May 07, 2020 at 05:03:17PM -0600, Jens Axboe wrote:
>> On 5/7/20 4:44 PM, Al Viro wrote:
>>> On Thu, May 07, 2020 at 04:25:24PM -0600, Jens Axboe wrote:
>>>
>>>> static int io_close(struct io_kiocb *req, bool force_nonblock)
>>>> {
>>>> + struct files_struct *files = current->files;
>>>> int ret;
>>>>
>>>> req->close.put_file = NULL;
>>>> - ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
>>>> + spin_lock(&files->file_lock);
>>>> + if (req->file->f_op == &io_uring_fops ||
>>>> + req->close.fd == req->ctx->ring_fd) {
>>>> + spin_unlock(&files->file_lock);
>>>> + return -EBADF;
>>>> + }
>>>> +
>>>> + ret = __close_fd_get_file_locked(files, req->close.fd,
>>>> + &req->close.put_file);
>>>
>>> Pointless. By that point req->file might have nothing in common with
>>> anything in any descriptor table.
>>
>> How about the below then? Stop using req->file, defer the lookup until
>> we're in the handler instead. Not sure the 'fd' check makes sense
>> at this point, but at least we should be consistent in terms of
>> once we lookup the file and check the f_op.
>
> Actually, what _is_ the reason for that check? Note, BTW, that if the
> file in question happens to be an AF_UNIX socket, closing it will
> close all references held in SCM_RIGHTS datagrams sitting in its queue,
> which might very well include io_uring files.
>
> IOW, if it tries to avoid something really unpleasant, it's not enough.
> And if it doesn't, then what is it for?
Maybe there is no issue at all, the point was obviously to not have
io_uring close itself. But we might just need an ordering of the
fput vs put_request to make that just fine. Let me experiment a bit
and see what's going on.
--
Jens Axboe
On 5/7/20 8:28 PM, Jens Axboe wrote:
> On 5/7/20 5:31 PM, Al Viro wrote:
>> On Thu, May 07, 2020 at 05:03:17PM -0600, Jens Axboe wrote:
>>> On 5/7/20 4:44 PM, Al Viro wrote:
>>>> On Thu, May 07, 2020 at 04:25:24PM -0600, Jens Axboe wrote:
>>>>
>>>>> static int io_close(struct io_kiocb *req, bool force_nonblock)
>>>>> {
>>>>> + struct files_struct *files = current->files;
>>>>> int ret;
>>>>>
>>>>> req->close.put_file = NULL;
>>>>> - ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
>>>>> + spin_lock(&files->file_lock);
>>>>> + if (req->file->f_op == &io_uring_fops ||
>>>>> + req->close.fd == req->ctx->ring_fd) {
>>>>> + spin_unlock(&files->file_lock);
>>>>> + return -EBADF;
>>>>> + }
>>>>> +
>>>>> + ret = __close_fd_get_file_locked(files, req->close.fd,
>>>>> + &req->close.put_file);
>>>>
>>>> Pointless. By that point req->file might have nothing in common with
>>>> anything in any descriptor table.
>>>
>>> How about the below then? Stop using req->file, defer the lookup until
>>> we're in the handler instead. Not sure the 'fd' check makes sense
>>> at this point, but at least we should be consistent in terms of
>>> once we lookup the file and check the f_op.
>>
>> Actually, what _is_ the reason for that check? Note, BTW, that if the
>> file in question happens to be an AF_UNIX socket, closing it will
>> close all references held in SCM_RIGHTS datagrams sitting in its queue,
>> which might very well include io_uring files.
>>
>> IOW, if it tries to avoid something really unpleasant, it's not enough.
>> And if it doesn't, then what is it for?
>
> Maybe there is no issue at all, the point was obviously to not have
> io_uring close itself. But we might just need an ordering of the
> fput vs put_request to make that just fine. Let me experiment a bit
> and see what's going on.
Ran various bits of testing and tracing with just the below, and
I don't see anything wrong. Even verified the same cases with
pending poll requests and an async read (punted to thread), and
it works and doesn't complain with KASAN either.
And I did think this would work after looking at it. The ctx
referencing should handle this just fine. Hence it seems to me
that my initial attempts at blocking the ring closing itself
were not needed at all.
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 979d9f977409..9099a9362ad4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -786,7 +786,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_fs = 1,
},
[IORING_OP_CLOSE] = {
- .needs_file = 1,
.file_table = 1,
},
[IORING_OP_FILES_UPDATE] = {
@@ -3399,10 +3398,6 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return -EBADF;
req->close.fd = READ_ONCE(sqe->fd);
- if (req->file->f_op == &io_uring_fops ||
- req->close.fd == req->ctx->ring_fd)
- return -EBADF;
-
return 0;
}
--
Jens Axboe
On 5/8/20 9:29 AM, Hillf Danton wrote:
> Dunno if what's missing makes it grumpy.
>
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3439,6 +3439,11 @@ static void io_close_finish(struct io_wq
> static int io_close(struct io_kiocb *req, bool force_nonblock)
> {
> int ret;
> + struct fd f;
> +
> + f = fdget(req->close.fd);
> + if (!f.file)
> + return -EBADF;
>
> req->close.put_file = NULL;
> ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
Can you expand? With the last patch posted, we don't do that fget/fdget
at all. __close_fd_get_file() will error out if we don't have a file
there. It does change the close error from -EBADF to -ENOENT, so maybe we
just need to improve that?
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 979d9f977409..9fd1257c8404 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -786,7 +786,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_fs = 1,
},
[IORING_OP_CLOSE] = {
- .needs_file = 1,
.file_table = 1,
},
[IORING_OP_FILES_UPDATE] = {
@@ -3399,10 +3398,6 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return -EBADF;
req->close.fd = READ_ONCE(sqe->fd);
- if (req->file->f_op == &io_uring_fops ||
- req->close.fd == req->ctx->ring_fd)
- return -EBADF;
-
return 0;
}
@@ -3434,8 +3429,11 @@ static int io_close(struct io_kiocb *req, bool force_nonblock)
req->close.put_file = NULL;
ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
- if (ret < 0)
+ if (ret < 0) {
+ if (ret == -ENOENT)
+ ret = -EBADF;
return ret;
+ }
/* if the file has a flush method, be safe and punt to async */
if (req->close.put_file->f_op->flush && force_nonblock) {
--
Jens Axboe