2023-02-15 00:42:32

by Josh Triplett

Subject: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
of the opcode) to treat the fd as a registered index rather than a file
descriptor.

This makes it possible for a library to open an io_uring, register the
ring fd, close the ring fd, and subsequently use the ring entirely via
registered index.

Signed-off-by: Josh Triplett <[email protected]>
---

v2: Rebase. Change io_uring_register to extract the flag from the opcode first.

include/uapi/linux/io_uring.h | 6 +++++-
io_uring/io_uring.c | 34 +++++++++++++++++++++++++++-------
2 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 2780bce62faf..35e6f8046b9b 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -470,6 +470,7 @@ struct io_uring_params {
#define IORING_FEAT_RSRC_TAGS (1U << 10)
#define IORING_FEAT_CQE_SKIP (1U << 11)
#define IORING_FEAT_LINKED_FILE (1U << 12)
+#define IORING_FEAT_REG_REG_RING (1U << 13)

/*
* io_uring_register(2) opcodes and arguments
@@ -517,7 +518,10 @@ enum {
IORING_REGISTER_FILE_ALLOC_RANGE = 25,

/* this goes last */
- IORING_REGISTER_LAST
+ IORING_REGISTER_LAST,
+
+ /* flag added to the opcode to use a registered ring fd */
+ IORING_REGISTER_USE_REGISTERED_RING = 1U << 31
};

/* io-wq worker categories */
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index db623b3185c8..1fb743ecba5a 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3663,7 +3663,7 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
IORING_FEAT_POLL_32BITS | IORING_FEAT_SQPOLL_NONFIXED |
IORING_FEAT_EXT_ARG | IORING_FEAT_NATIVE_WORKERS |
IORING_FEAT_RSRC_TAGS | IORING_FEAT_CQE_SKIP |
- IORING_FEAT_LINKED_FILE;
+ IORING_FEAT_LINKED_FILE | IORING_FEAT_REG_REG_RING;

if (copy_to_user(params, p, sizeof(*p))) {
ret = -EFAULT;
@@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
struct io_ring_ctx *ctx;
long ret = -EBADF;
struct fd f;
+ bool use_registered_ring;
+
+ use_registered_ring = !!(opcode & IORING_REGISTER_USE_REGISTERED_RING);
+ opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;

if (opcode >= IORING_REGISTER_LAST)
return -EINVAL;

- f = fdget(fd);
- if (!f.file)
- return -EBADF;
+ if (use_registered_ring) {
+ /*
+ * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
+ * need only dereference our task private array to find it.
+ */
+ struct io_uring_task *tctx = current->io_uring;

- ret = -EOPNOTSUPP;
- if (!io_is_uring_fops(f.file))
- goto out_fput;
+ if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
+ return -EINVAL;
+ fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
+ f.file = tctx->registered_rings[fd];
+ f.flags = 0;
+ if (unlikely(!f.file))
+ return -EBADF;
+ opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
+ } else {
+ f = fdget(fd);
+ if (unlikely(!f.file))
+ return -EBADF;
+ ret = -EOPNOTSUPP;
+ if (!io_is_uring_fops(f.file))
+ goto out_fput;
+ }

ctx = f.file->private_data;

--
2.39.1
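
The opcode handling in the patch can be modeled in userspace roughly as follows. This is a minimal sketch, not kernel code: the two constants are local copies of the values in the diff (IORING_REGISTER_LAST follows IORING_REGISTER_FILE_ALLOC_RANGE = 25 in the enum, so it is 26 here), while struct decoded_opcode, decode_register_opcode() and opcode_in_range() are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of values from the patch; not the real uapi header. */
#define IORING_REGISTER_LAST                26u
#define IORING_REGISTER_USE_REGISTERED_RING (1U << 31)

/* The flag must never collide with a valid opcode value. */
static_assert((IORING_REGISTER_LAST & IORING_REGISTER_USE_REGISTERED_RING) == 0,
              "flag bit overlaps the opcode range");

/* Hypothetical helper type: the two pieces packed into the opcode argument. */
struct decoded_opcode {
	bool use_registered_ring;
	unsigned int opcode;
};

/* Extract the flag first, then clear it, as the v2 patch does. */
static struct decoded_opcode decode_register_opcode(unsigned int raw)
{
	struct decoded_opcode d;

	d.use_registered_ring = !!(raw & IORING_REGISTER_USE_REGISTERED_RING);
	d.opcode = raw & ~IORING_REGISTER_USE_REGISTERED_RING;
	return d;
}

/* Mirrors the "opcode >= IORING_REGISTER_LAST" range check. */
static bool opcode_in_range(unsigned int opcode)
{
	return opcode < IORING_REGISTER_LAST;
}
```

Because the flag is stripped before the range check, an opcode with the high bit set is validated exactly like the same opcode without it.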



2023-02-15 17:44:57

by Jens Axboe

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

On 2/14/23 5:42 PM, Josh Triplett wrote:
> Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
> of the opcode) to treat the fd as a registered index rather than a file
> descriptor.
>
> This makes it possible for a library to open an io_uring, register the
> ring fd, close the ring fd, and subsequently use the ring entirely via
> registered index.

This looks pretty straightforward to me, only real question I had
was whether using the top bit of the register opcode for this is the
best choice. But I can't think of better ways to do it, and the space
is definitely big enough to do that, so looks fine to me.

One more comment below:

> + if (use_registered_ring) {
> + /*
> + * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
> + * need only dereference our task private array to find it.
> + */
> + struct io_uring_task *tctx = current->io_uring;

I need to double check if it's guaranteed we always have current->io_uring
assigned here. If the ring is registered we certainly will have it, but
what if someone calls io_uring_register(2) without having a ring setup
upfront?

IOW, I think we need a NULL check here and failing the request at that
point.

--
Jens Axboe



2023-02-15 20:33:15

by Josh Triplett

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

On Wed, Feb 15, 2023 at 10:44:38AM -0700, Jens Axboe wrote:
> On 2/14/23 5:42 PM, Josh Triplett wrote:
> > Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
> > of the opcode) to treat the fd as a registered index rather than a file
> > descriptor.
> >
> > This makes it possible for a library to open an io_uring, register the
> > ring fd, close the ring fd, and subsequently use the ring entirely via
> > registered index.
>
> This looks pretty straightforward to me, only real question I had
> was whether using the top bit of the register opcode for this is the
> best choice. But I can't think of better ways to do it, and the space
> is definitely big enough to do that, so looks fine to me.

It seemed like the cleanest way available given the ABI of
io_uring_register, yeah.

> One more comment below:
>
> > + if (use_registered_ring) {
> > + /*
> > + * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
> > + * need only dereference our task private array to find it.
> > + */
> > + struct io_uring_task *tctx = current->io_uring;
>
> I need to double check if it's guaranteed we always have current->io_uring
> assigned here. If the ring is registered we certainly will have it, but
> what if someone calls io_uring_register(2) without having a ring setup
> upfront?
>
> IOW, I think we need a NULL check here and failing the request at that
> point.

The next line is:

+ if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))

The first part of that condition is the NULL check you're looking for,
right?

- Josh Triplett
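
The checks discussed here (NULL task context, index bound, empty slot) can be sketched as a small userspace model. This is an illustration under stated assumptions, not the kernel implementation: struct model_tctx and lookup_registered_ring() are hypothetical, IO_RINGFD_REG_MAX is assumed to be 16 for the sketch, and the array_index_nospec() speculation barrier from the patch has no equivalent here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed table size for this sketch. */
#define IO_RINGFD_REG_MAX 16

/* Hypothetical stand-in for the per-task table of registered rings. */
struct model_tctx {
	void *registered_rings[IO_RINGFD_REG_MAX];
};

/*
 * Returns 0 and stores the ring in *out on success.
 * Returns -EINVAL if there is no task context or the index is out of
 * range, and -EBADF if the slot is empty, matching the patch's errors.
 */
static int lookup_registered_ring(const struct model_tctx *tctx,
				  unsigned int index, void **out)
{
	assert(out != NULL);

	/* The "!tctx" half is the NULL check asked about in review. */
	if (!tctx || index >= IO_RINGFD_REG_MAX)
		return -EINVAL;
	if (!tctx->registered_rings[index])
		return -EBADF;

	*out = tctx->registered_rings[index];
	return 0;
}
```

A task that never set up a ring corresponds to passing NULL for tctx, which fails with -EINVAL before the table is touched.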

2023-02-15 21:39:09

by Jens Axboe

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

On 2/15/23 1:33 PM, Josh Triplett wrote:
> On Wed, Feb 15, 2023 at 10:44:38AM -0700, Jens Axboe wrote:
>> On 2/14/23 5:42 PM, Josh Triplett wrote:
>>> Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
>>> of the opcode) to treat the fd as a registered index rather than a file
>>> descriptor.
>>>
>>> This makes it possible for a library to open an io_uring, register the
>>> ring fd, close the ring fd, and subsequently use the ring entirely via
>>> registered index.
>>
>> This looks pretty straightforward to me, only real question I had
>> was whether using the top bit of the register opcode for this is the
>> best choice. But I can't think of better ways to do it, and the space
>> is definitely big enough to do that, so looks fine to me.
>
> It seemed like the cleanest way available given the ABI of
> io_uring_register, yeah.
>
>> One more comment below:
>>
>>> + if (use_registered_ring) {
>>> + /*
>>> + * Ring fd has been registered via IORING_REGISTER_RING_FDS, we
>>> + * need only dereference our task private array to find it.
>>> + */
>>> + struct io_uring_task *tctx = current->io_uring;
>>
>> I need to double check if it's guaranteed we always have current->io_uring
>> assigned here. If the ring is registered we certainly will have it, but
>> what if someone calls io_uring_register(2) without having a ring setup
>> upfront?
>>
>> IOW, I think we need a NULL check here and failing the request at that
>> point.
>
> The next line is:
>
> + if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
>
> The first part of that condition is the NULL check you're looking for,
> right?

Ah yeah, I'm just blind... Looks fine!

--
Jens Axboe


2023-02-16 03:24:34

by Jens Axboe

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd


On Tue, 14 Feb 2023 16:42:22 -0800, Josh Triplett wrote:
> Add a new flag IORING_REGISTER_USE_REGISTERED_RING (set via the high bit
> of the opcode) to treat the fd as a registered index rather than a file
> descriptor.
>
> This makes it possible for a library to open an io_uring, register the
> ring fd, close the ring fd, and subsequently use the ring entirely via
> registered index.
>
> [...]

Applied, thanks!

[1/1] io_uring: Support calling io_uring_register with a registered ring fd
commit: 04eb372cac91a4f70c9b921c1b86758f5553d311

Best regards,
--
Jens Axboe




2023-02-16 09:35:52

by Dylan Yudaken

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

On Tue, 2023-02-14 at 16:42 -0800, Josh Triplett wrote:
> @@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned
> int, fd, unsigned int, opcode,
>         struct io_ring_ctx *ctx;
>         long ret = -EBADF;
>         struct fd f;
> +       bool use_registered_ring;
> +
> +       use_registered_ring = !!(opcode &
> IORING_REGISTER_USE_REGISTERED_RING);
> +       opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>  
>         if (opcode >= IORING_REGISTER_LAST)
>                 return -EINVAL;
>  
> -       f = fdget(fd);
> -       if (!f.file)
> -               return -EBADF;
> +       if (use_registered_ring) {
> +               /*
> +                * Ring fd has been registered via
> IORING_REGISTER_RING_FDS, we
> +                * need only dereference our task private array to
> find it.
> +                */
> +               struct io_uring_task *tctx = current->io_uring;
>  
> -       ret = -EOPNOTSUPP;
> -       if (!io_is_uring_fops(f.file))
> -               goto out_fput;
> +               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
> +                       return -EINVAL;
> +               fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
> +               f.file = tctx->registered_rings[fd];
> +               f.flags = 0;
> +               if (unlikely(!f.file))
> +                       return -EBADF;
> +               opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;

^ this line looks duplicated at the top of the function?


Also - is there a liburing regression test for this?

2023-02-16 12:06:03

by Josh Triplett

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

On Thu, Feb 16, 2023 at 09:35:44AM +0000, Dylan Yudaken wrote:
> On Tue, 2023-02-14 at 16:42 -0800, Josh Triplett wrote:
> > @@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned
> > int, fd, unsigned int, opcode,
> >         struct io_ring_ctx *ctx;
> >         long ret = -EBADF;
> >         struct fd f;
> > +       bool use_registered_ring;
> > +
> > +       use_registered_ring = !!(opcode &
> > IORING_REGISTER_USE_REGISTERED_RING);
> > +       opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
> >
> >         if (opcode >= IORING_REGISTER_LAST)
> >                 return -EINVAL;
> >
> > -       f = fdget(fd);
> > -       if (!f.file)
> > -               return -EBADF;
> > +       if (use_registered_ring) {
> > +               /*
> > +                * Ring fd has been registered via
> > IORING_REGISTER_RING_FDS, we
> > +                * need only dereference our task private array to
> > find it.
> > +                */
> > +               struct io_uring_task *tctx = current->io_uring;
> >
> > -       ret = -EOPNOTSUPP;
> > -       if (!io_is_uring_fops(f.file))
> > -               goto out_fput;
> > +               if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
> > +                       return -EINVAL;
> > +               fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
> > +               f.file = tctx->registered_rings[fd];
> > +               f.flags = 0;
> > +               if (unlikely(!f.file))
> > +                       return -EBADF;
> > +               opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>
> ^ this line looks duplicated at the top of the function?

Good catch!

Jens, since you've already applied this, can you remove this line or
would you like a patch doing so?

> Also - is there a liburing regression test for this?

Userspace, including test: https://github.com/axboe/liburing/pull/664
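
On the userspace side, a library would need to choose between the real fd and the registered index each time it calls io_uring_register(2). The following is a hedged sketch of that decision only, not the code in the pull request above: struct model_ring, struct register_call and build_register_call() are hypothetical names, and the two constants are local copies of the values from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of values from the patch; not the real uapi header. */
#define IORING_FEAT_REG_REG_RING            (1U << 13)
#define IORING_REGISTER_USE_REGISTERED_RING (1U << 31)

static_assert(IORING_REGISTER_USE_REGISTERED_RING == 0x80000000u,
              "flag is the high bit of a 32-bit opcode");

/* Hypothetical per-ring bookkeeping a library might keep. */
struct model_ring {
	int ring_fd;            /* real file descriptor */
	unsigned int reg_index; /* index obtained when registering the ring fd */
	bool ring_registered;   /* true once the ring fd has been registered */
	unsigned int features;  /* feature bits reported at ring setup */
};

/* The first two arguments that would be passed to io_uring_register(2). */
struct register_call {
	unsigned int fd;
	unsigned int opcode;
};

static struct register_call build_register_call(const struct model_ring *r,
						unsigned int opcode)
{
	struct register_call c;

	if (r->ring_registered && (r->features & IORING_FEAT_REG_REG_RING)) {
		/* Kernel advertises support: pass the index and set the flag. */
		c.fd = r->reg_index;
		c.opcode = opcode | IORING_REGISTER_USE_REGISTERED_RING;
	} else {
		/* Fall back to the ordinary file descriptor path. */
		c.fd = (unsigned int)r->ring_fd;
		c.opcode = opcode;
	}
	return c;
}
```

Checking IORING_FEAT_REG_REG_RING first keeps the sketch safe on kernels without this patch, where the high opcode bit would be rejected with -EINVAL by the range check.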

2023-02-16 13:10:52

by Jens Axboe

Subject: Re: [PATCHv2] io_uring: Support calling io_uring_register with a registered ring fd

On 2/16/23 5:05 AM, Josh Triplett wrote:
> On Thu, Feb 16, 2023 at 09:35:44AM +0000, Dylan Yudaken wrote:
>> On Tue, 2023-02-14 at 16:42 -0800, Josh Triplett wrote:
>>> @@ -4177,17 +4177,37 @@ SYSCALL_DEFINE4(io_uring_register, unsigned
>>> int, fd, unsigned int, opcode,
>>> struct io_ring_ctx *ctx;
>>> long ret = -EBADF;
>>> struct fd f;
>>> + bool use_registered_ring;
>>> +
>>> + use_registered_ring = !!(opcode &
>>> IORING_REGISTER_USE_REGISTERED_RING);
>>> + opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>>>
>>> if (opcode >= IORING_REGISTER_LAST)
>>> return -EINVAL;
>>>
>>> - f = fdget(fd);
>>> - if (!f.file)
>>> - return -EBADF;
>>> + if (use_registered_ring) {
>>> + /*
>>> + * Ring fd has been registered via
>>> IORING_REGISTER_RING_FDS, we
>>> + * need only dereference our task private array to
>>> find it.
>>> + */
>>> + struct io_uring_task *tctx = current->io_uring;
>>>
>>> - ret = -EOPNOTSUPP;
>>> - if (!io_is_uring_fops(f.file))
>>> - goto out_fput;
>>> + if (unlikely(!tctx || fd >= IO_RINGFD_REG_MAX))
>>> + return -EINVAL;
>>> + fd = array_index_nospec(fd, IO_RINGFD_REG_MAX);
>>> + f.file = tctx->registered_rings[fd];
>>> + f.flags = 0;
>>> + if (unlikely(!f.file))
>>> + return -EBADF;
>>> + opcode &= ~IORING_REGISTER_USE_REGISTERED_RING;
>>
>> ^ this line looks duplicated at the top of the function?
>
> Good catch!

Indeed!

> Jens, since you've already applied this, can you remove this line or
> would you like a patch doing so?

It's still top-of-tree, I just amended it.

--
Jens Axboe